/*Kaz's SAS juku www.estat.us Let's try various statistical models */ this shows how various statistica model, simple to complex, are related as members of a great happy family. I wish my stat professors had been able to show this for me before doing anything else The problem is that all these models are written in different styles of equations. When I first learned Rasch model equasion, I was so confused about why it does not have */ proc concents data=sashelp.Prdsal3; run; /*how to get a mean and variance for a variable Weight*/ /*Did you know that a regression model with no predictor returns the mean value for the dependent variable?*/ /*This, I knew.*/ proc nlmixed data=sashelp.Prdsal3; predicted_value=b0; model actual ~ normal(predicted_value, variance); run; /*How to do a simple linear model, the same as OLS regression. Estimation method is different but returns the same results.*/ proc nlmixed data=sashelp.Prdsal3; predicted_value=b0 + b1*predict; model actual ~ normal(predicted_value, variance); run; /*How to do HLM?*/ /*since PROC NLMIXED does not accept CLASS statement, you need to sort*/ proc sort data=sashelp.Prdsal3 out=abc; by state; run; /*I got initial estimates by this. It is doing the same thing as the one below it by PROC NLMIXED proc mixed data=sashelp.Prdsal3 covtest noclprint; title "Non conditional model"; class state; model actual =/solution ddfm=bw; random intercept/sub=state; run;*/ proc nlmixed data=abc; parms b0=869 b1=0.00136 level2_variance=51805 variance=297579 ; predicted_value=b0 + b1*predict + u; model actual ~ normal(predicted_value, variance); random u ~ normal(0,level2_variance) subject=state; run; /*How about logistic HLM?*/ /*Logistic HLM is a misnomer because of L in the middle. Maybe multilevel logistic regression?*/ data ABC2; set sashelp.Prdsal3; ****************; sales_goal_achieved=0; if predict < actual then sales_goal_achieved=1; ****************; run; proc nlmixed data=ABC2; parms a=2.5 s2u=0.5 beta1=-0.00288; eta=a + beta1*PREDICT + u ; expeta=exp(eta); p=expeta/(1+expeta); model sales_goal_achieved~ binary(p); random u ~ normal(0,s2u) subject=state out=residual_file; run; /*To step back a little bit, what about just simple logistic regression*/ /*note what I got rid of*/ proc nlmixed data=ABC2; parms a=2.5 s2u=0.5 beta1=-0.00288; eta=a + beta1*PREDICT /*+ u*/ ; expeta=exp(eta); p=expeta/(1+expeta); model sales_goal_achieved~ binary(p); *random u ~ normal(0,s2u) subject=state out=residual_file; run; /*what about Rasch model?*/ /*We need to transform the data set in a major way*/ /*DATA TITLE='Knox Cube Test (Best Test Design p.31)'*/ /*Also read > Kamata, A. (2001). Item analysis by the hierarchical generalized linear > model. Journal of Educational Measurement, 38, 79-93. */ data troy; input Name \$ 1-8 gender \$ 9 item01 11 item02 12 item03 13 item04 14 item05 15 item06 16 item07 17 item08 18 item09 19 item10 20 item11 21 item12 22 item13 23 item14 24 item15 25 item16 26 item17 27 item18 28 ; datalines; Richard M 111111100000000000 Tracie F 111111111100000000 Walter M 111111111001000000 Blaise M 111100101000000000 Ron M 111111111100000000 William M 111111111100000000 Susan F 111111111111101000 Linda F 111111111100000000 Kim F 111111111100000000 Carol F 111111111110000000 Pete M 111011111000000000 Brenda F 111110101100000000 Mike M 111110011111000000 Zula F 111111111110000000 Frank M 111111111111100000 Dorothy F 111111111010000000 Rod M 111101111100000000 Britton F 111111111100100000 Janet F 111111111000000000 David M 111111111100100000 Thomas M 111111111110100000 Betty F 111111111111000000 Bert M 111111111100110000 Rick M 111111111110100110 Don M 111011000000000000 Barbara F 111111111100000000 Adam M 111111100000000000 Audrey F 111111111010000000 Anne F 111111001110010000 Lisa F 111111111000000000 James M 111111111100000000 Joe M 111111111110000000 Martha F 111100100100000000 Elsie F 111111111101010000 Helen F 111000000000000000 ; run; proc transpose data=troy out=troy2; var item01 -- item18; by name notsorted; run; ods trace on; proc glmmod; class _name_; model col1=_name_; ods output DesignPoints=john; run; data final; merge troy2 john; response=col1; /*just get rid of strange part from the variable names*/ %macro suji (var1=); item&var1= _NAME__item&var1;drop _NAME__item&var1; %mend suji; %suji (var1=01);%suji (var1=02);%suji (var1=03);%suji (var1=04);%suji (var1=05); %suji (var1=06);%suji (var1=07);%suji (var1=08);%suji (var1=09);%suji (var1=10); %suji (var1=11);%suji (var1=12);%suji (var1=13);%suji (var1=14);%suji (var1=15); %suji (var1=16);%suji (var1=17);%suji (var1=18); drop intercept Obsnumber COL1 _NAME_; run; proc print; title "This is the data that goes into NLMIXED"; run; /* I would have understood Rasch model immediately in my first class with Ben Right if I had seen it written in this way. This application below treats person measures as random effects, but Mike Linacre's WINSTEPS program treats both person and item measures as fixed effects (i.e., using a bunch of dummy variables. My question: PROC NLMIX does optimal weighting of means or what we call SHRINKAGE in HLM literature Means in this context refers to person measures. Is it okay for person measures to shrink to the grand mean just because variance in persons are large? THis means that people who are scoring around 50/50 are pulled towards the grand-mean. */ proc nlmixed data=final; title "See parameter estimate table for item difficulty measure"; eta= b1*item01 + b2*item02 + b3*item03 + b4*item04 + b5*item05 /* It takes too long time to run, so I just got rid of the rest + b6*item06 + b7*item07 + b8*item08 + b9*item09 + b10*item10 + b11*item11 + b12*item12 + b13*item13 + b14*item14 + b15*item15 + b16*item16 + b17*item17 + b18*item18 */ + u; expeta=exp(eta); p=expeta/(1+expeta); model response ~ binary(p); random u ~ normal(0,s2u) subject=name out=PERSON_MEASURE_FILE; /*out= part could be done at ODS line, but I could not figure it out how*/ ods output ParameterEstimates=ITEM_DIFFICULTY_FILE; run; *ods trace off; data items; set ITEM_DIFFICULTY_FILE; estimate=estimate*-1;/*note here I flipped plus minus sign to force "item easiness" to be come "item difficulty"*/; run; proc print data=items; title "ITEM DIFFICULTY MEASURES derived by PROC NLMIXED"; run; proc print data=PERSON_MEASURE_FILE; title "Person Measures derived by PROC NLMIXED)"; run; proc univariate plot data=PERSON_MEASURE_FILE; var Estimate ;run;