Using SAS DATA steps and PROC SQL to conduct a t-test

/* Split the data by engagement status, then apply the t-test */
data kaz20;set kaz2;
if engaged=0;
run;

data kaz21;set kaz2;
if engaged=1;
run;

proc sort data=kaz20;by subgroup variable;run;
proc sort data=kaz21;by subgroup variable;run;

proc sql;
create table new as
select
a.subgroup as school_level,
a.Variable as variable_name,
a.n as engage0_n,
a.Mean as engage0_mean,
a.StdDev as engage0_SD,
a.Min as engage0_min,
a.Max as engage0_max,
b.n as engage1_n,
b.Mean as engage1_mean,
b.StdDev as engage1_SD,
b.Min as engage1_min,
b.Max as engage1_max

from kaz20 a
join kaz21 b
on a.subgroup=b.subgroup and a.Variable=b.Variable
;
quit;

data new2;set new;
/*QC
engage1_n=100;
engage1_mean=.5;
engage1_SD=.2;
engage0_n=120;
engage0_mean=.55;
engage0_SD=.2;
*/

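/*t-test*/
/* Declare the length up front; otherwise sig takes the length of its first assigned value and "***" can be truncated */
length sig $3;
difference=engage1_mean-engage0_mean;
/*https://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm*/
/* Despite its name, this is the unpooled (unequal-variance) standard error of the difference */
POOLED_SE=sqrt( ( (engage1_SD*engage1_SD) / engage1_n ) + ( (engage0_SD*engage0_SD ) / engage0_n ) );

T_value=abs(difference)/POOLED_SE;

/* Two-sided p-value from the standard normal distribution, i.e., a large-sample approximation to the t distribution */
P_value=(1-probnorm(T_value))*2;
*if P_value < 0.1 then sig="t";
if P_value < 0.05 then sig="*";
if P_value < 0.01 then sig="**";
if P_value < 0.001 then sig="***";
if P_value =. then sig="";

run;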
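
As a cross-check on the step above, here is a minimal sketch that applies the same formulas to the QC values from the commented-out block and compares the normal-based p-value with one taken from the t distribution. The Welch-Satterthwaite degrees of freedom and the names welch_check, v1, v0, df, p_t, and p_z are my additions for illustration; they are not part of the original program.

```sas
data welch_check;
  /* QC values copied from the commented-out block above */
  engage1_n = 100; engage1_mean = 0.5;  engage1_SD = 0.2;
  engage0_n = 120; engage0_mean = 0.55; engage0_SD = 0.2;

  /* Variance of each group mean */
  v1 = engage1_SD**2 / engage1_n;
  v0 = engage0_SD**2 / engage0_n;

  /* Unpooled standard error and t-value, as in the step above */
  se = sqrt(v1 + v0);
  t_value = abs(engage1_mean - engage0_mean) / se;

  /* Welch-Satterthwaite degrees of freedom (assumption: unequal variances) */
  df = (v1 + v0)**2 / ( v1**2/(engage1_n - 1) + v0**2/(engage0_n - 1) );

  p_t = 2 * (1 - probt(t_value, df));    /* two-sided p-value from the t distribution */
  p_z = 2 * (1 - probnorm(t_value));     /* normal approximation used above */
run;

proc print data=welch_check noobs;
  var se t_value df p_t p_z;
run;
```

With group sizes this large, p_t and p_z should be close. With small groups the t-based value will be noticeably larger, so the normal shortcut is best kept for large samples.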

ANOVA test using PROC GLIMMIX

%macro clem(var1=,var2=);
proc glimmix data=main.Final_year4;
class status;
model &var1=status
/dist=normal /*binomial*/ link=identity/*logit*/ s ddfm=kr;
lsmeans status / ilink diff;
ods output
/*ModelInfo=x1var1 ParameterEstimates=x2var1 CovParms=x3var1*/
Diffs=DIF_RESULT1 LSMeans=LS1;
run;

data &var1.;
set LS1 DIF_RESULT1;
outcome_name="&var1";
v="vs.";
/* Rows from LSMeans carry Mu; rows from Diffs do not */
if Mu ne . then analysis_type="Descriptive Stats";
/* Declare the length up front so that "***" is not truncated */
length sig $3;
if Mu = . then do;
analysis_type="Anova";
if Probt < 0.05 then sig="*";
if Probt < 0.01 then sig="**";
if Probt < 0.001 then sig="***";
if Probt =. then sig="";
end;
drop effect;
run;

data &var2;
retain outcome_name analysis_type status v _status;
set &var1.;
run;

%mend clem;
%clem(var1=year4_outcome1,var2=a);
%clem(var1=year4_outcome2,var2=b);
%clem(var1=year4_outcome3,var2=c);
%clem(var1=year4_outcome4,var2=d);

/* em holds a single empty row, used as a blank spacer between result blocks */
data em;
run;

data allresults;
set a em b em c em d ;
run;

How to derive standard deviation from standard error

Algorithm:

SD=Standard error * sqrt(N);

 

Reference:

http://handbook.cochrane.org/chapter_7/7_7_3_2_obtaining_standard_deviations_from_standard_errors_and.htm

 

QC: I checked the algorithm using SAS.  The result was consistent with the algorithm (i.e., SD=standard error*sqrt(N)).

proc means data=sashelp.class mean std stderr n;
var height;
run;

--------------------------------------------------
Mean             62.3368421
SD                5.1270752
Standard Error    1.1762317
N                19
--------------------------------------------------
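
The same check can be done by hand. This is a small sketch that plugs the standard error and N from the PROC MEANS output above into the formula; the variable names se, n, and sd are mine.

```sas
data _null_;
  se = 1.1762317;       /* standard error reported by PROC MEANS */
  n  = 19;              /* number of observations */
  sd = se * sqrt(n);    /* SD = standard error * sqrt(N) */
  put sd=;              /* writes the result to the log */
run;
```

The value written to the log should be approximately 5.127, which agrees with the SD that PROC MEANS reports.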

SAS T-test for proportions

data YesNo;
input Gender $ NumYes Total;
Response="Yes"; Count=NumYes; output;
Response="No "; Count=Total-NumYes; output;
datalines;
Men 30 100
Women 45 100
;

proc print noobs;
var Gender Response Count;
run;

proc freq order=data;
weight Count;
table Gender * Response / chisq riskdiff;
exact riskdiff;
run;
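
To see where the PROC FREQ numbers come from, here is a rough sketch that computes the risk difference and the two-proportion z-test by hand from the same counts. The dataset name prop_check and all variable names are hypothetical, and the pooled standard error assumes the null hypothesis of equal proportions.

```sas
data prop_check;
  /* Counts from the YesNo example above */
  yes_men   = 30; n_men   = 100;
  yes_women = 45; n_women = 100;

  p_men     = yes_men / n_men;
  p_women   = yes_women / n_women;
  risk_diff = p_men - p_women;

  /* Pooled proportion under the null hypothesis of equal proportions */
  p_pooled  = (yes_men + yes_women) / (n_men + n_women);
  se_pooled = sqrt( p_pooled * (1 - p_pooled) * (1/n_men + 1/n_women) );

  z          = risk_diff / se_pooled;
  chi_square = z**2;                         /* equals the Pearson chi-square for a 2x2 table */
  p_value    = 1 - probchi(chi_square, 1);   /* same as 2*(1 - probnorm(abs(z))) */
run;

proc print data=prop_check noobs;
  var risk_diff z chi_square p_value;
run;
```

With these counts the risk difference is -0.15 and the chi-square is 4.8, which should match the Chi-Square row of the PROC FREQ output.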

Cronbach's alpha

The UCLA site explains Cronbach's alpha as the average internal correlation among survey items.  It also says that it is not a measure of unidimensionality.  Rather, it is a measure of internal consistency.  (Intuitively, I feel that what is coherent also tends to be unidimensional.  I think the point is that the measure is designed to assess internal correlation, not dimensionality.)

http://www.ats.ucla.edu/stat/spss/faq/alpha.html

Standardized versus Raw

This SAS website says one should use the standardized version of the measure (as opposed to raw).

https://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_corr_sect032.htm

It says: "Because the variances of some variables vary widely, you should use the standardized score to estimate reliability."

A note to myself: Does this mean that if I standardize all items before the analysis, I get the same value for raw and standardized alpha?  I can test this with an experiment.
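
One way to run that experiment is sketched below. It uses the four measurement variables in sashelp.iris as stand-in "items" (an assumption made only so that the code runs anywhere; they are not survey items) and iris_std as a hypothetical output dataset name.

```sas
/* Standardize every item to mean 0 and standard deviation 1 */
proc standard data=sashelp.iris mean=0 std=1 out=iris_std;
  var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Alpha on the original items: raw and standardized values can differ */
proc corr data=sashelp.iris alpha nocorr nosimple;
  var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Alpha on the standardized items: compare the Raw and Standardized rows */
proc corr data=iris_std alpha nocorr nosimple;
  var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

After standardization the covariance matrix of the items equals their correlation matrix, so I would expect the Raw and Standardized rows to agree in the second run, and both to equal the standardized alpha from the first run.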

Excel function to replicate the t-test from SAS PROCs (e.g., GLIMMIX)

Phil of SAS helped me identify this function. Thank you.

The p-value of a t-test conducted in PROC GLIMMIX (or, most likely, in other regression procedures) can be reproduced in Excel with this function:

=2*(1-T.DIST( T_VALUE , DEG_OF_FREEDOM ,TRUE))

where T_VALUE must be the absolute value of the original t-value (e.g., if it is -2, use 2).

The function uses the CDF (cumulative distribution function) of the t distribution, not the PDF (probability density function).  I will explicitly discuss what these are in the near future.

I wanted to know how much of the statistical output (from PROC GLIMMIX in this case) depends on SAS's internal computation (i.e., I can't replicate it outside SAS) and how much can be reproduced in Excel from the values that SAS reports.
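
For comparison, the same calculation can be written in a SAS DATA step. This is a minimal sketch with a hypothetical t-value and degrees of freedom (neither comes from an actual GLIMMIX run); it also prints the CDF and PDF at that t-value to show that they are different quantities.

```sas
data t_check;
  t_value = -2;   /* hypothetical t-value */
  df      = 30;   /* hypothetical degrees of freedom */

  /* Same form as the Excel formula: 2*(1 - CDF at |t|) */
  p_two_sided = 2 * (1 - probt(abs(t_value), df));

  cdf_at_t = cdf('T', t_value, df);   /* cumulative probability P(T <= t) */
  pdf_at_t = pdf('T', t_value, df);   /* height of the density curve at t */
run;

proc print data=t_check noobs;
run;
```

If the t-value and degrees of freedom are taken from the GLIMMIX output, p_two_sided should match the Pr > |t| column there, which is the same check the Excel formula performs.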