Statistical test – My Statistical tools

October 15, 2019

Using SAS DATA steps and PROC SQL to conduct a t-test on summary statistics

/*Split the summary statistics by engagement status, then apply the t-test*/
data kaz20;set kaz2;
if engaged=0;
run;

data kaz21;set kaz2;
if engaged=1;
run;

proc sort data=kaz20;by subgroup variable;run;
proc sort data=kaz21;by subgroup variable;run;

proc sql;
create table new as
select
a.subgroup as school_level,
a.Variable as variable_name,
a.n as engage0_n,
a.Mean as engage0_mean,
a.StdDev as engage0_SD,
a.Min as engage0_min,
a.Max as engage0_max,
b.n as engage1_n,
b.Mean as engage1_mean,
b.StdDev as engage1_SD,
b.Min as engage1_min,
b.Max as engage1_max

from kaz20 a
join kaz21 b
on a.subgroup=b.subgroup and a.Variable=b.Variable
;
quit;

data new2;set new;
/*QC: uncomment these lines to overwrite the inputs with known values and check the arithmetic
engage1_n=100;
engage1_mean=.5;
engage1_SD=.2;
engage0_n=120;
engage0_mean=.55;
engage0_SD=.2;
*/

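/*Two-sample test on summary statistics*/
difference=engage1_mean-engage0_mean;
/*https://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm*/
/*Despite its name, POOLED_SE is the unpooled (unequal-variance) standard error of the difference*/
POOLED_SE=sqrt( ( (engage1_SD*engage1_SD) / engage1_n ) + ( (engage0_SD*engage0_SD ) / engage0_n ) );

T_value=abs(difference)/POOLED_SE;

/*Two-sided p-value from the standard normal distribution, a large-sample approximation to the t distribution*/
P_value=(1-probnorm(T_value))*2;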
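length sig $3; /*set the length first; otherwise "* " fixes it at 2 characters and "***" is cut to "**"*/
*if P_value < 0.1 then sig="t";
if P_value < 0.05 then sig="* ";
if P_value < 0.01 then sig="** ";
if P_value < 0.001 then sig="***";
/*In SAS a missing value is smaller than any number, so a missing P_value passes the tests above; reset it here*/
if P_value =. then sig="";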

run;
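To check the data-step arithmetic outside SAS, here is a minimal Python sketch (Python is my choice for the check, not part of the SAS workflow) that repeats the same steps for one row: the unequal-variance standard error, the absolute test statistic, a two-sided p-value from the standard normal distribution, and the significance flags. The function name summary_z_test is hypothetical; the inputs are the QC values from the commented-out block.

```python
from math import erfc, sqrt


def summary_z_test(n1, mean1, sd1, n0, mean0, sd0):
    """Mirror the SAS data step for one row of summary statistics."""
    difference = mean1 - mean0
    # Unequal-variance (unpooled) standard error, as in the POOLED_SE line
    se = sqrt(sd1 ** 2 / n1 + sd0 ** 2 / n0)
    z = abs(difference) / se
    # 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)); same as (1-probnorm(T_value))*2
    p = erfc(z / sqrt(2))
    # Same cut-offs as the sig flags in the data step
    if p < 0.001:
        sig = "***"
    elif p < 0.01:
        sig = "**"
    elif p < 0.05:
        sig = "*"
    else:
        sig = ""
    return difference, se, z, p, sig


# QC values from the commented-out block (engaged=1 first, then engaged=0)
print(summary_z_test(100, 0.5, 0.2, 120, 0.55, 0.2))
```

With the QC values the statistic is about 1.85 and the p-value about 0.065, so no flag is set. For small groups, a t distribution with Welch degrees of freedom would give a somewhat larger p-value than this normal approximation.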

January 16, 2019

ANOVA test using PROC GLIMMIX

/*var1 = outcome variable to analyze; var2 = name of the final, column-reordered data set*/
%macro clem(var1=,var2=);
proc glimmix data=main.Final_year4;
class status;
model &var1=status
/ dist=normal /*or binomial*/ link=identity /*or logit*/ s ddfm=kr;
lsmeans status / ilink diff;
ods output
/*ModelInfo=x1var1 ParameterEstimates=x2var1 CovParms=x3var1*/
Diffs=DIF_RESULT1 LSMeans=LS1;
run;

data &var1.;
length sig $3; /*set before first use so "***" is not cut to "**"*/
set LS1 DIF_RESULT1;
outcome_name="&var1";
v="vs.";
if Mu ne . then analysis_type="Descriptive Stats";
if Mu = . then do;
analysis_type="Anova";
if Probt < 0.05 then sig="* ";
if Probt < 0.01 then sig="** ";
if Probt < 0.001 then sig="***";
if Probt =. then sig="";
end;
drop effect;
run;

data &var2;
retain outcome_name analysis_type status v _status;
set &var1.;
run;

%mend clem;
%clem(var1=year4_outcome1,var2=a);
%clem(var1=year4_outcome2,var2=b);
%clem(var1=year4_outcome3,var2=c);
%clem(var1=year4_outcome4,var2=d);

/*One observation with no variables: used below as a blank separator row between results*/
data em;
run;

data allresults;
set a em b em c em d ;
run;
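For a one-way model like this one (dist=normal, link=identity, a single CLASS effect), the LS-means are the group means, and I would expect each row of the Diffs table to match a pooled-variance t statistic that uses the residual mean square and N minus k degrees of freedom. I have not confirmed that DDFM=KR gives exactly those degrees of freedom here, so treat this as something to check against the SAS output. The Python sketch below computes the overall F statistic and the pairwise differences; the function name, group labels, and data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def one_way_anova(groups):
    """groups: dict mapping a group label to a list of outcome values."""
    values = [x for vals in groups.values() for x in vals]
    grand_mean = mean(values)
    k, n_total = len(groups), len(values)
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_between, df_within = k - 1, n_total - k
    mse = ss_within / df_within  # residual mean square
    f_stat = (ss_between / df_between) / mse
    # Pairwise differences of group means using the pooled residual variance,
    # analogous to the rows of the Diffs table
    diffs = []
    for a, b in combinations(groups, 2):
        estimate = mean(groups[a]) - mean(groups[b])
        se = sqrt(mse * (1 / len(groups[a]) + 1 / len(groups[b])))
        diffs.append((a, b, estimate, se, estimate / se, df_within))
    return f_stat, df_between, df_within, diffs


# Hypothetical data for three status groups
example = {"A": [3.1, 2.8, 3.5, 3.0], "B": [3.9, 4.2, 3.7], "C": [2.5, 2.9, 2.7, 3.2]}
f_stat, df_b, df_w, diffs = one_way_anova(example)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
for row in diffs:
    print(row)
```

With only two groups, the single pairwise t statistic squared equals F, which is a handy QC.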

April 2, 2017

How to derive standard deviation from standard error

Algorithm:

SD=Standard error * sqrt(N);

Reference:

http://handbook.cochrane.org/chapter_7/7_7_3_2_obtaining_standard_deviations_from_standard_errors_and.htm

QC: I checked the algorithm using SAS. The result was consistent with the algorithm (i.e., SD=standard error*sqrt(N)).

proc means data=sashelp.class mean std stderr n;
var height;
run;

--------------------------------------------------
Mean              62.3368421
SD                 5.1270752
Standard Error     1.1762317
N                 19
--------------------------------------------------
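The same check in a few lines of Python, using the standard error and N that PROC MEANS reported. This is a sketch; sd_from_se is a hypothetical helper name, and the formula applies to the standard error of a single-group mean.

```python
from math import sqrt


def sd_from_se(se, n):
    """SD = SE * sqrt(N) for the standard error of a single-group mean."""
    return se * sqrt(n)


# Std error and N for height in sashelp.class, as reported by PROC MEANS
print(round(sd_from_se(1.1762317, 19), 6))
```

The result is about 5.127075, which agrees with the reported SD of 5.1270752 up to the rounding of the standard error.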

February 28, 2017

SAS test for the difference between two proportions

/*One row per Gender x Response cell, with the cell count in Count*/
data YesNo;
input Gender $ NumYes Total;
Response="Yes"; Count=NumYes; output;
Response="No "; Count=Total-NumYes; output;
datalines;
Men 30 100
Women 45 100
;

proc print data=YesNo noobs;
var Gender Response Count;
run;

proc freq data=YesNo order=data;
weight Count;
table Gender * Response / chisq riskdiff;
exact riskdiff;
run;
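PROC FREQ reports a Pearson chi-square for the 2x2 table; for two proportions this equals the square of the pooled two-proportion z statistic. A minimal Python sketch with the same counts (30/100 vs. 45/100) shows both, plus a Wald-type confidence interval for the risk difference. I believe the asymptotic RISKDIFF limits use this Wald standard error by default, but that is an assumption to check against the PROC FREQ output; the function name two_proportion_test is hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist


def two_proportion_test(yes1, n1, yes2, n2, confidence=0.95):
    p1, p2 = yes1 / n1, yes2 / n2
    risk_diff = p1 - p2  # row 1 minus row 2, for the "Yes" column
    # Pooled z test; its square equals the Pearson chi-square for the 2x2 table
    pooled = (yes1 + yes2) / (n1 + n2)
    z = risk_diff / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal p-value
    # Wald-type (unpooled) standard error for a confidence interval on the difference
    se_wald = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (risk_diff - crit * se_wald, risk_diff + crit * se_wald)
    return risk_diff, z, z * z, p_value, ci


# Same counts as the YesNo data set: men 30/100, women 45/100
print(two_proportion_test(30, 100, 45, 100))
```

Here the risk difference is -0.15 and z squared is 4.8, which should match the chi-square statistic in the PROC FREQ output.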

March 17, 2016

How to QC the result of random assignment

QC 1:

Check the number of cases in each group (i.e., the treatment and control groups).

QC 2:

Check whether the numbers of treatment schools and control schools are balanced within each block.
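Both checks are simple counts. A minimal Python sketch, assuming each assigned school is stored as a (school_id, block, group) tuple with group coded "T" or "C" (a hypothetical layout), and treating a block as balanced when its treatment and control counts differ by at most one (my assumption, to allow blocks with an odd number of schools):

```python
from collections import Counter


def qc_random_assignment(assignments, tolerance=1):
    """assignments: list of (school_id, block, group) tuples, group coded "T" or "C"."""
    # QC 1: number of cases in each group
    group_counts = Counter(group for _, _, group in assignments)
    # QC 2: treatment and control counts within each block
    block_counts = Counter((block, group) for _, block, group in assignments)
    blocks = sorted({block for _, block, _ in assignments})
    unbalanced = [
        b for b in blocks
        if abs(block_counts[(b, "T")] - block_counts[(b, "C")]) > tolerance
    ]
    return group_counts, block_counts, unbalanced


# Hypothetical assignment of six schools to two blocks
example = [("s1", 1, "T"), ("s2", 1, "C"), ("s3", 2, "T"),
           ("s4", 2, "T"), ("s5", 2, "T"), ("s6", 2, "C")]
groups, by_block, flagged = qc_random_assignment(example)
print(groups)    # cases per group
print(flagged)   # blocks whose T/C counts differ by more than the tolerance
```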

December 2, 2015 (updated December 8, 2015)

Cronbach's alpha

The UCLA site explains Cronbach's alpha in terms of the average correlation among survey items. It also says that alpha is not a measure of unidimensionality; rather, it is a measure of internal consistency. Intuitively, I feel that a coherent set of items also tends to be unidimensional. I think the point is that alpha is designed to assess internal correlation, not dimensionality.

http://www.ats.ucla.edu/stat/spss/faq/alpha.html

Standardized versus Raw

This SAS documentation page recommends using the standardized version of alpha (as opposed to the raw version) when item variances differ widely.

https://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_corr_sect032.htm

It says: "Because the variances of some variables vary widely, you should use the standardized score to estimate reliability."

A note to myself: If I standardize all items before the analysis, do I get the same value for raw and standardized alpha? I can test this.
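Algebraically the answer should be yes: once every item has variance 1, the raw formula k/(k-1) * (1 - sum of item variances / variance of the total score) reduces to the standardized formula k*rbar / (1 + (k-1)*rbar), where rbar is the average inter-item correlation. A small Python sketch with hypothetical item scores lets me run the experiment. It uses these textbook formulas, which I assume match what PROC CORR reports; the function names are my own.

```python
from math import sqrt
from statistics import mean, stdev, variance


def raw_alpha(items):
    """items: list of k lists, each holding one item's scores for the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardized_alpha(items):
    """k * rbar / (1 + (k - 1) * rbar), where rbar is the mean inter-item correlation."""
    k = len(items)
    rs = [pearson(items[i], items[j]) for i in range(k) for j in range(i + 1, k)]
    rbar = mean(rs)
    return k * rbar / (1 + (k - 1) * rbar)


def zscore(col):
    m, s = mean(col), stdev(col)
    return [(v - m) / s for v in col]


# Hypothetical scores: three items on different scales, five respondents
items = [[1, 2, 3, 4, 5], [2, 2, 4, 5, 5], [10, 30, 20, 50, 40]]
print("raw:", raw_alpha(items))
print("standardized:", standardized_alpha(items))
print("raw after z-scoring:", raw_alpha([zscore(c) for c in items]))
```

The last two printed values should agree, while the first one differs because the third item has a much larger variance than the others.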

December 1, 2015

Sample size calculation using Excel sheet

I wrote this Excel workbook to calculate sample sizes for surveys.

www.nippondream.com/file/sample size calculation 11 30 2015.xlsx
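The workbook's exact method is not restated here. For reference, a common textbook formula for the sample size needed to estimate a proportion is n0 = z^2 * p(1-p) / e^2, optionally adjusted with the finite population correction n = n0 / (1 + (n0 - 1)/N). This Python sketch implements that formula; it is not necessarily what the workbook does, and the function name and defaults (95% confidence, p = 0.5) are assumptions.

```python
from math import ceil
from statistics import NormalDist


def survey_sample_size(margin, confidence=0.95, p=0.5, population=None):
    """Sample size to estimate a proportion within +/- margin (textbook formula)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n0 = z * z * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction
        n0 = n0 / (1 + (n0 - 1) / population)
    return ceil(n0)


print(survey_sample_size(0.05))                   # large population
print(survey_sample_size(0.05, population=1000))  # population of 1,000
```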

November 20, 2015 (updated November 21, 2015)

Excel function to replicate t-test p-values from SAS PROCs (e.g., GLIMMIX)

Phil of SAS helped me identify this function. Thank you.

The two-sided p-value of a t-test in PROC GLIMMIX (and most likely in other regression procedures) can be reproduced with this Excel formula:

=2*(1-T.DIST( T_VALUE , DEG_OF_FREEDOM ,TRUE))

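where T_VALUE must be the absolute value of the original t-value (e.g., if -2 then 2); writing ABS(T_VALUE) inside the formula does this. Excel's T.DIST.2T(ABS(T_VALUE), DEG_OF_FREEDOM) returns the same two-tailed p-value directly.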

The last argument, TRUE, makes T.DIST return the CDF (cumulative distribution function) rather than the PDF (probability density function), which FALSE would return. I will discuss what these are in more detail in the future.

I wanted to know how much of the statistical output (from PROC GLIMMIX in this case) depends on SAS's internal computation, which I cannot replicate outside SAS, and how much I can reproduce in Excel from the numbers SAS reports.
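To replicate the Excel formula outside both SAS and Excel, here is a minimal Python sketch that uses only the standard library. It integrates the Student t density with Simpson's rule to get the CDF and then applies the same 2*(1-CDF(|t|)) step. Numerical integration is my choice for the sketch, not how Excel computes T.DIST; with SciPy available, 2 * scipy.stats.t.sf(abs(t), df) would be the usual one-liner. The function names are hypothetical.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, df):
    # Student t density, computed on the log scale for numerical stability
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF of Student's t: 0.5 plus the integral of the density from 0 to x (Simpson's rule)."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def two_sided_p(t_value, df):
    """Same quantity as Excel =2*(1-T.DIST(ABS(t_value), df, TRUE))."""
    return 2 * (1 - t_cdf(abs(t_value), df))


print(two_sided_p(-2.0, 2))
```

For example, two_sided_p(-2.0, 2) gives about 0.1835, which matches 1 - 2/sqrt(6) from the closed-form CDF with 2 degrees of freedom.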