### Sample size needed for estimating a proportion with a certain level of precision

http://bold-ed.com/calculator.htm#calculator

http://www.surveysystem.com/sscalc.htm
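
Calculators like the two above are presumably built on the normal-approximation formula n = z²·p(1−p)/e², where e is the margin of error. A minimal sketch under that assumption (the linked pages may add corrections such as a finite-population adjustment); the data set and variable names are illustrative only:

```sas
data work.sample_size;
  conf_level = 0.95;   /* desired confidence level */
  p          = 0.5;    /* assumed proportion; 0.5 is the most conservative choice */
  margin     = 0.05;   /* desired margin of error (half-width of the interval) */

  z = probit(1 - (1 - conf_level) / 2);          /* two-sided critical value */
  n = ceil(z**2 * p * (1 - p) / margin**2);      /* round up to a whole respondent */

  put n=;   /* writes the required sample size to the log */
run;
```

With these inputs the log shows n=385. A smaller margin or a proportion closer to 0.5 increases the required n.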

### Rasch data

data raschdata;
input
ID $ 1-10
Q01 11 Q02 12 Q03 13 Q04 14 Q05 15 Q06 16 Q07 17 Q08 18 Q09 19 Q10 20
Q11 21 Q12 22 Q13 23 Q14 24 Q15 25 Q16 26 Q17 27 Q18 28;
cards ;
Richard M 111111100000000000
Tracie  F 111111111100000000
Walter  M 111111111001000000
Blaise  M 111100101000000000
Ron     M 111111111100000000
William M 111111111100000000
Susan   F 111111111111101000
Linda   F 111111111100000000
Kim     F 111111111100000000
Carol   F 111111111110000000
Pete    M 111011111000000000
Brenda  F 111110101100000000
Mike    M 111110011111000000
Zula    F 111111111110000000
Frank   M 111111111111100000
Dorothy F 111111111010000000
Rod     M 111101111100000000
Britton F 111111111100100000
Janet   F 111111111000000000
David   M 111111111100100000
Thomas  M 111111111110100000
Betty   F 111111111111000000
Bert    M 111111111100110000
Rick    M 111111111110100110
Don     M 111011000000000000
Barbara F 111111111100000000
Audrey  F 111111111010000000
Anne    F 111111001110010000
Lisa    F 111111111000000000
James   M 111111111100000000
Joe     M 111111111110000000
Martha  F 111100100100000000
Elsie   F 111111111101010000
Helen   F 111000000000000000
;
run;

### PROC CALIS to do confirmatory factor analysis or even Rasch model???

I'd like to investigate whether PROC CALIS can fit a CFA or a Rasch model.

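PROC CALIS COVARIANCE CORR RESIDUAL MODIFICATION data=one;
/* Loadings LAM1-LAM6 are freely estimated; without them every loading is fixed at 1 while the factor variances are also fixed at 1 */
LINEQS
risk_1n = LAM1 F1 + E1,
risk_2n = LAM2 F1 + E2,
risk_3n = LAM3 F1 + E3,
risk_4n = LAM4 F2 + E4,
risk_5n = LAM5 F2 + E5,
risk_6n = LAM6 F2 + E6;
STD
F1 = 1,
F2 = 1,
E1-E6 = VARE1-VARE6;
COV
F1 F2 = CF1F2;
VAR risk_1n risk_2n risk_3n risk_4n risk_5n risk_6n;
RUN;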

### PROC IMPORT & EXPORT

PROC EXPORT DATA= all3
OUTFILE= "C:\ ... \name_of_file.xlsx"
DBMS=EXCEL REPLACE;
SHEET="data check";
RUN;

PROC IMPORT OUT= WORK.asdf
DATAFILE= ".xlsx"
DBMS=EXCEL REPLACE;
GETNAMES=YES;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;

### Dummy variables in logistic regression models

Why does switching the values of a dummy variable, or using a CLASS statement in PROC LOGISTIC, change the coefficients in a logistic regression?

(1) and (2) produce the same results. (3) and (4) produce the same results.

(1)
proc logistic data=here.asdf descending ;
model college= boy ;
run;

(2)
proc logistic data=here.asdf descending ;
class girl;
model college= girl ;
run;

(3)
proc logistic data=here.asdf descending ;
model college= girl;
run;

(4)
proc logistic data=here.asdf descending ;
class boy;
model college= boy ;
run;

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Intercept | 0.5346 | 0.3645 | 0.1945 | 0.3645 |
| Predictor estimate | boy: -0.3401 | girl: -0.1701 | girl: 0.3401 | boy: 0.1701 |
| Odds ratio | 0.712 | 0.712 | 1.405 | 1.405 |
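
A likely explanation, based on PROC LOGISTIC's documented defaults rather than anything stated in these notes: the CLASS statement uses effect coding (PARAM=EFFECT) with the last level as the reference, so the design variable takes the values 1 and -1 instead of 0 and 1. The two groups are then two units apart, which is why the CLASS coefficients are half the size of the dummy-variable ones (2 × 0.1701 ≈ 0.3401) while the odds ratios stay the same. A minimal sketch, assuming the same here.asdf data set with girl coded 0/1:

```sas
/* Ask for reference (0/1 dummy) coding instead of the default effect coding */
proc logistic data=here.asdf descending;
  class girl (ref='0') / param=ref;   /* girl=0 is the reference level */
  model college = girl;
run;
```

If the explanation holds, the estimates from this run should match model (3) up to rounding, even though a CLASS statement is used.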

### Proc glimmix for logistic regression has an lsmeans option

Thanks J for this info:

Use dist=binary and link=logit for logistic regression using PROC GLIMMIX.

The LSMEANS statement is available in PROC GLIMMIX. Its estimates are on the logit scale by default; add the ILINK option to get the estimated probabilities as well.

For the difference between the groups, use the DIFF option on the LSMEANS statement. The results are also on the logit scale. If you use the OR option, you will get the odds ratios for the group effect:

lsmeans &group /diff or;

Unfortunately, the difference between the groups on the probability scale is not directly available in PROC GLIMMIX. The magnitude of the difference is easy to compute: take the estimated probabilities for each group from the ILINK output and subtract them by hand or in a data step. However, the appropriate standard errors for these differences are not available in PROC GLIMMIX.
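
The steps described above can be sketched as follows. This is an illustration only: the data set work.demo and the outcome tall are made up from SASHELP.CLASS, and the sketch relies on the ODS LSMeans table storing the ILINK probabilities in a column named Mu.

```sas
/* Hypothetical binary outcome built from SASHELP.CLASS */
data work.demo;
  set sashelp.class;
  tall = (height > 62);   /* 1 if taller than 62 inches, else 0 */
run;

proc glimmix data=work.demo;
  class sex;
  model tall(event='1') = sex / dist=binary link=logit;
  lsmeans sex / ilink diff or;      /* ILINK adds probabilities; DIFF and OR compare the groups */
  ods output LSMeans=work.lsm;      /* Mu holds the estimated probability for each group */
run;

/* Put the two probabilities side by side: columns p_F and p_M */
proc transpose data=work.lsm out=work.lsm_wide prefix=p_;
  id sex;
  var Mu;
run;

/* Point estimate of the difference only; no standard error is available this way */
data work.prob_diff;
  set work.lsm_wide;
  prob_diff = p_F - p_M;
run;
```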

### The SAS function RANUNI creates a flat (uniform) distribution

randomx=ranuni(1);

The MEANS Procedure

Analysis Variable : randomx

N Mean Std Dev Minimum Maximum
--------------------------------------------------------------------
3812 0.4995942 0.2896404 0.000147896 0.9997638
--------------------------------------------------------------------

The SAS System 19:50 Wednesday, May 29, 2013 3

The UNIVARIATE Procedure

Histogram # Boxplot
0.975+***************************************** 201 |
.*************************************** 193 |
.************************************ 178 |
.*********************************** 173 |
.**************************************** 197 |
.************************************** 186 +-----+
.**************************************** 197 | |
.******************************************** 218 | |
.************************************* 182 | |
.************************************** 188 *-----*
.**************************************** 200 | + |
.************************************* 184 | |
.**************************************** 199 | |
.************************************ 177 | |
.*********************************** 174 | |
.************************************ 176 +-----+
.****************************************** 206 |
.************************************* 183 |
.**************************************** 199 |
0.025+***************************************** 201 |
----+----+----+----+----+----+----+----+----
* may represent up to 5 counts
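
A sketch of how output like this could be produced. The data set name work.flat and the DO loop are assumptions; the original notes show only the RANUNI call. For a uniform distribution on (0, 1), the expected mean is 0.5 and the expected standard deviation is sqrt(1/12) ≈ 0.2887, which is close to the values reported above.

```sas
data work.flat;
  do i = 1 to 3812;         /* same N as in the PROC MEANS output */
    randomx = ranuni(1);    /* seed 1; values are uniform on (0, 1) */
    output;
  end;
run;

proc means data=work.flat n mean std min max;
  var randomx;
run;

/* The PLOT option requests the text histogram and box plot in listing output */
proc univariate data=work.flat plot;
  var randomx;
run;
```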

### T-test for proportions of multiple groups using SAS procedures and datasteps

/* In this example there are only two groups, but the same code runs with multiple groups. */

data kaz;
set sashelp.class;
/* Ages 12 and 13 are left with a missing outcome */
if age < 12 then THIS_IS_OUTCOME=0;
if age > 13 then THIS_IS_OUTCOME=1;
run;

%let group=sex;
%let outcome=THIS_IS_OUTCOME;
%let dataname=kaz;

ods listing;
ods trace on;

proc means data=&dataname;
where &outcome ne .;
class &group;
var &outcome;
ods output summary=kaz_mean;
run;

proc glimmix data=&dataname;
class &group;
model &outcome=&group ;
lsmeans &group /diff ;
ods output Diffs=kaz_t;
run;

data kaz_t2;
set kaz_t;
keep &group _&group estimate;
run;
proc sort;
by &group;run;

data kaz_mean1;
set kaz_mean;
prop1=&outcome._mean;
n1=&outcome._n;
keep &group prop1 n1;
run;
proc sort;by &group;run;

data kaz_mean2;
set kaz_mean;
prop2=&outcome._mean;
n2=&outcome._n;
_&group=&group;
keep _&group prop2 n2;
run;
proc sort;by _&group;run;

data mix1;
merge kaz_t2 kaz_mean1;
by &group;
run;
proc sort;
by _&group;run;

data mix2;
merge mix1 kaz_mean2;
by _&group;
if estimate ne .;
DEG_FD=N1+N2-2;
/*QC’ed
tValue=2.228;
DEG_FD=10;
*/
tValue=(prop1-prop2)/(SQRT((prop1*(1-prop1)/n1 )+prop2*(1-prop2)/ n2));
/*2 tail test*/
P=(1-probt(abs(tValue),DEG_FD))*2;

length _2TAIL_TEST $ 2;

if P < .05 then _2TAIL_TEST = "*";
group1=&group;
group2=_&group;
classvar="&group";
dif=estimate;
outcome="&outcome";
run;

data mix3;
retain outcome classvar group1 group2 n1 prop1 n2 prop2 dif p _2TAIL_TEST;
set mix2;
keep outcome classvar group1 group2 n1 prop1 n2 prop2 dif p _2TAIL_TEST;
run;
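
Written out, the statistic computed in the mix2 data step is the unpooled two-sample comparison of proportions, referred to a t distribution. Here prop1, prop2, n1, n2, and DEG_FD from the code appear as the symbols p̂₁, p̂₂, n₁, n₂, and df, and F is the cumulative t distribution that PROBT evaluates:

```latex
t = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\dfrac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}},
\qquad
df = n_1 + n_2 - 2,
\qquad
P = 2 \left[ 1 - F_{t,\,df}\!\left( |t| \right) \right]
```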

### Applying a t-test for comparison of proportions in a data step:

%let var1=female;

%let var2=male;

%let key=ENROLL_RATE;

Z=(&var1._&key-&var2._&key)/(SQRT((&var1._&key*(1-&var1._&key)/&var1._SAMPLE_N )+&var2._&key*(1-&var2._&key)/ &var2._SAMPLE_N ));

data &schoollevel.&var1;
retain group;
merge ueka1b ueka2b;
by group;

P1=M2_MET_pre_Mean;
P2=M2_MET_post1_Mean;

N1=M2_MET_pre_N;
N2=M2_MET_post1_N;

A=(P1*(1-P1))/N1;
B=(P2*(1-P2))/N2;
STDERR=sqrt(A+B);

Z=abs((P1-P2)/STDERR);
/*two tail 5%*/

P=(1-probnorm(Z))*2;

if P < 0.05 then SIG="*";

drop A B P1 P2 N1 N2;
run;
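
A self-contained version of the same calculation, for trying the formula without the merged data sets. The proportions, sample sizes, and the data set name work.ztest_demo are made up for illustration and do not come from the source data:

```sas
data work.ztest_demo;
  P1 = 0.60;  N1 = 100;   /* hypothetical group 1 proportion and sample size */
  P2 = 0.45;  N2 = 100;   /* hypothetical group 2 proportion and sample size */

  STDERR = sqrt(P1*(1-P1)/N1 + P2*(1-P2)/N2);   /* unpooled standard error */
  Z = abs((P1 - P2) / STDERR);
  P = (1 - probnorm(Z)) * 2;                    /* two-tailed p-value */

  length SIG $ 1;
  if P < 0.05 then SIG = "*";

  put Z= P= SIG=;   /* writes the results to the log */
run;
```

With these inputs Z is roughly 2.15 and P roughly 0.03, so the difference is flagged with an asterisk.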

### Statistical Citation

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models (2nd ed.). Thousand Oaks: Sage Publications.

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

<http://sitemaker.umich.edu/group-based/optimal_design_software>

Software:

Raudenbush, S. W., et al. (2011). Optimal Design Software for Multi-level and Longitudinal Research (Version 3.01) [Software]. Available from www.wtgrantfoundation.org.

Documentation:

Spybrook, J., et al. (2011). Optimal Design for Longitudinal and Multilevel Research: Documentation for the Optimal Design Software Version 3.0. Available from www.wtgrantfoundation.org.

Schochet, P. Z. (2009). Do typical RCTs of education interventions have sufficient statistical power for linking impacts on teacher practice and student achievement outcomes? National Center for Education Evaluation and Regional Assistance. http://ies.ed.gov/ncee/pdf/20094065.pdf