/* Break the data into the two engagement groups and apply a t-test */
data kaz20; set kaz2; /* subset to the engage=0 group here (filtering condition to be filled in) */ run;
data kaz21; set kaz2; /* subset to the engage=1 group here */ run;
proc sort data=kaz20; by subgroup variable; run;
proc sort data=kaz21; by subgroup variable; run;
proc sql;
create table new as
select
a.subgroup as school_level,
a.Variable as variable_name,
a.n as engage0_n,
a.Mean as engage0_mean,
a.StdDev as engage0_SD,
a.Min as engage0_min,
a.Max as engage0_max,
b.n as engage1_n,
b.Mean as engage1_mean,
b.StdDev as engage1_SD,
b.Min as engage1_min,
b.Max as engage1_max
from kaz20 a
inner join kaz21 b
on a.subgroup=b.subgroup and a.Variable=b.Variable;
quit;
data new2; set new;
length sig $3;
POOLED_SE=sqrt( ( (engage1_SD*engage1_SD) / engage1_n ) + ( (engage0_SD*engage0_SD) / engage0_n ) );
/* t statistic and two-sided p-value; the df here is the conservative min(n)-1 (adjust if you prefer Welch df) */
t_value=(engage1_mean-engage0_mean)/POOLED_SE;
P_value=2*(1-probt(abs(t_value), min(engage0_n,engage1_n)-1));
*if P_value < 0.1 then sig="t";
if P_value < 0.05 then sig="* ";
if P_value < 0.01 then sig="** ";
if P_value < 0.001 then sig="***";
if P_value = . then sig="";
run;
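As a quick check outside SAS, the pooled-SE and significance-flag logic above can be sketched in Python (the numbers below are made up for illustration, not from the real data):

```python
import math

def pooled_se(sd0, n0, sd1, n1):
    """Standard error of the difference in means (unequal-variance form)."""
    return math.sqrt(sd1 ** 2 / n1 + sd0 ** 2 / n0)

def sig_flag(p_value):
    """Replicate the data-step flags; blank when the p-value is missing."""
    if p_value is None:
        return ""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""

# Illustrative values only
se = pooled_se(sd0=10.0, n0=25, sd1=12.0, n1=25)
print(round(se, 4))     # sqrt(144/25 + 100/25) = sqrt(9.76)
print(sig_flag(0.004))  # two stars
```

Note the checks run from the strictest threshold down, which is equivalent to the SAS version where each later IF overwrites the earlier flag.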
proc glimmix data=main.Final_year4;
class status;
model outcome = status / dist=normal /*binomial*/ link=identity /*logit*/ s ddfm=kr; /* "outcome" is a placeholder for the dependent variable */
lsmeans status / ilink diff;
ods output LSMeans=LS1 Diffs=DIF_RESULT1; /* other tables as needed: ModelInfo=x1var1 ParameterEstimates=x2var1 CovParms=x3var1 */
run;
data ls_and_diffs; /* placeholder name; stacks the lsmeans and diffs output */
set LS1 DIF_RESULT1;
length sig $3;
if Mu ne . then analysis_type="Descriptive Stats";
if Mu = . then do; /* rows from the diffs table: flag significance */
  if Probt < 0.05 then sig="* ";
  if Probt < 0.01 then sig="** ";
  if Probt < 0.001 then sig="***";
  if Probt = . then sig="";
end;
run;
retain outcome_name analysis_type status v _status;
set a em b em c em d; /* em: presumably an empty spacer dataset used to insert blank rows between result blocks */
SD=StdErr*sqrt(N); /* the standard-error column is named StdErr in the output dataset */
QC: I checked this algorithm using SAS. The result was consistent (i.e., SD = standard error * sqrt(N)).
proc means data=sashelp.class mean std stderr n;
var Height;
run;
For Height, PROC MEANS reports Standard Error = 1.1762317.
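This relationship is easy to verify outside SAS as well. Taking the Height column of sashelp.class (N = 19), the standard error of 1.1762317 should recover Height's standard deviation (about 5.1271):

```python
import math

n = 19                       # number of observations in sashelp.class
stderr = 1.1762317           # Standard Error of Height from PROC MEANS
sd = stderr * math.sqrt(n)   # SD = standard error * sqrt(N)
print(round(sd, 4))          # about 5.1271, Height's standard deviation
```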
data yesno; /* placeholder dataset name */
input Gender $ NumYes Total;
Response="Yes"; Count=NumYes; output;
Response="No "; Count=Total-NumYes; output;
datalines;
Men 30 100
Women 45 100
;
run;
proc print data=yesno noobs;
var Gender Response Count;
run;
proc freq data=yesno order=data;
table Gender * Response / chisq riskdiff;
weight Count; /* Count holds cell frequencies, so it must be declared as the weight */
run;
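As a sanity check on the PROC FREQ output, both the Pearson chi-square statistic and the risk difference for this 2x2 table can be computed by hand; a small Python sketch:

```python
# 2x2 table from the example: rows = Men, Women; columns = Yes, No
table = [[30, 70],   # Men:   30 Yes, 70 No
         [45, 55]]   # Women: 45 Yes, 55 No

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected
chi_sq = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n
        chi_sq += (table[i][j] - expected) ** 2 / expected

# Risk difference for column 1 ("Yes"), Row 1 minus Row 2 as in PROC FREQ
risk_diff = table[0][0] / row_totals[0] - table[1][0] / row_totals[1]

print(round(chi_sq, 4))     # 4.8
print(round(risk_diff, 4))  # -0.15
```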
Check the number of cases within each group (i.e., the treatment and control groups).
Check whether the number of treatment schools and control schools is balanced within each block.
The UCLA site explains Cronbach's alpha as the average internal correlation among survey items. It also says that alpha is not a measure of unidimensionality; rather, it is a measure of internal consistency. (Intuitively I feel that what is internally coherent also tends to be unidimensional, but I think the point is that the measure is designed to assess internal correlation, not dimensionality.)
Standardized versus Raw
This SAS website says one should use the standardized version of the measure (as opposed to raw).
It says: "Because the variances of some variables vary widely, you should use the standardized score to estimate reliability."
A note to myself: does this mean that if I standardize all items before the analysis, I get the same value for the raw and standardized versions? I can experiment with this.
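That experiment can also be run outside SAS. Below is a minimal pure-Python sketch with toy data I made up: the raw alpha is computed from item variances, the standardized alpha from the average inter-item correlation, and after z-scoring every item the raw formula should reproduce the standardized value.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):  # sample variance (ddof = 1)
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def corr(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sum((x - mx) ** 2 for x in xs) *
                           sum((y - my) ** 2 for y in ys))

def alpha_raw(items):
    """Raw Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(items)
    totals = [sum(vals) for vals in zip(*items)]  # each respondent's total score
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

def alpha_std(items):
    """Standardized alpha from the average inter-item correlation rbar."""
    k = len(items)
    rs = [corr(items[i], items[j]) for i in range(k) for j in range(i + 1, k)]
    rbar = mean(rs)
    return k * rbar / (1 + (k - 1) * rbar)

def zscore(xs):
    m, s = mean(xs), math.sqrt(var(xs))
    return [(x - m) / s for x in xs]

# Toy data: 3 items, 6 respondents (made up for illustration)
items = [[1, 2, 3, 4, 4, 5],
         [2, 2, 4, 4, 5, 5],
         [1, 3, 3, 5, 4, 4]]

print(round(alpha_raw(items), 4), round(alpha_std(items), 4))  # differ in general
print(round(alpha_raw([zscore(it) for it in items]), 4))       # matches the standardized alpha
```

For this toy data the two match after standardizing, which suggests the answer is yes (at least when items are z-scored with the sample standard deviation).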
I wrote this Excel program to calculate sample size for surveys.
www.nippondream.com/file/sample size calculation 11 30 2015.xlsx
Phil of SAS helped me identify this function. Thank you.
The t-test conducted in PROC GLIMMIX (and most likely in other regression procedures as well) can be expressed as an Excel formula:
=2*(1-T.DIST( T_VALUE , DEG_OF_FREEDOM ,TRUE))
where T_VALUE must be the absolute value of the original t-value (e.g., if the t-value is -2, use 2).
T.DIST with the third argument set to TRUE returns the CDF (cumulative distribution function), not the PDF (probability density function). I will explicitly discuss what these are in the near future.
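The same two-sided p-value can be reproduced without Excel at all, which also illustrates the CDF/PDF distinction: the CDF is just the area under the PDF. A Python sketch using simple trapezoid integration of the t density (no statistics library assumed):

```python
import math

def t_pdf(x, df):
    """Density (PDF) of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_value, df, steps=100000):
    """2*(1 - CDF(|t|)): integrate the PDF from 0 to |t| with the trapezoid rule."""
    t_abs = abs(t_value)
    h = t_abs / steps
    area = sum(t_pdf(i * h, df) for i in range(1, steps))
    area = h * (area + (t_pdf(0, df) + t_pdf(t_abs, df)) / 2)
    return 2 * (0.5 - area)  # the PDF's area from 0 to infinity is 0.5

print(round(two_sided_p(-2.0, 10), 4))  # about 0.073
```

For example, t = -2 with 10 degrees of freedom gives roughly 0.073, matching =2*(1-T.DIST(2,10,TRUE)).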
I wanted to know how much of a statistical result (from PROC GLIMMIX in this case) depends on SAS's internal computation (i.e., cannot be replicated outside SAS) and how much can be reproduced in Excel from the SAS output alone.