SAS procedures – Page 2 – My Statistical tools

October 24, 2016

PROG GLM syntax -- contrasting groups using LSMEANS

proc glimmix data=kaz2.Year2AnalysisSample;
class SCHOOL_NAME grade_level race_cate;
model
&z_post = treat
grade_level
/solution ddfm=kr dist=normal link=identity s ;
random SCHOOL_NAME;
covtest /wald;
lsmeans grade_level / diff;
run;

June 21, 2016

Cronbach Coefficient Alpha

SAS's proc cor procedure produces two types of cronbach coefficient alpha: raw value and standardized value.

proc corr alpha data=dataname_here;
var item1 item2 item3 item4 item5 item6 item7;
run;

The result table includes two values:

Cronbach Coefficient Alpha

Variables Alpha
--------------------------------
Raw 0.74
Standardized 0.75

Standardized version is based on standardized values of all variables included in the analysis. If you standardize the variables yourself by creating z-score version of items and apply the same procedure, you will get the same value for both raw and standardized values.

proc standardize data=dataname_here out=dataname_here_B mean=0 std=1;
var item1 item2 item3 item4 item5 item6 item7;
run;
proc corr alpha data=dataname_here_B;
var item1 item2 item3 item4 item5 item6 item7;
run;

Cronbach Coefficient Alpha

Variables Alpha
--------------------------------
Raw 0.75
Standardized 0.75

May 4, 2016May 4, 2016

How to edit a PROC GLIMMIX syntax into a PROC MIXED syntax

I encountered a situation where somehow PROC GLIMMIX can't produce results while PROC MIXED can. In this WORD document, I showed you how I edited the PROC GLIMMIX syntax to be a PROC MIXED syntax. Please ignore my macro %w.

https://drive.google.com/open?id=0B7AoA5fyqX_sVXQ1YzRlSlFlaW8

May 3, 2016May 12, 2016

PROC GLIMMIX error messages

While running PROC GLIMMIX with a large dataset with categorical variables explicitly treated as CLASS variables AND with weight, I got the following error message:

WARNING: Obtaining minimum variance quadratic unbiased estimates as starting values for the covariance parameters failed.

The weight and the use of categorical variables were the cause of the problem as I found out that without weight the model produces results.

SAS support staff spent time trying to figure out what the issue is/was. I sent them a dataset (though I replaced the variables with randomly generated values) and the exact syntax. They determined that (thank you so much for going extra miles to figure this out!):

"the computation of MIVQUE fails because the default singular value (1e-12) leads to the detection of singularity while sweeping of the mixed model equation" -- May 12, 2016.

They proposed that singular value should be specified as a very small value to let the conversion happen:

proc glimmix data=example singular=1e-9;
weight psweight;
class race_cate level2 ;
model outcome= race_cate / s dist=normal link=identity ddfm=kr;
random intercept/subject=level2;
run;

****

Before hearing this final response from SAS, I wrote this and said that I chose to use PROC MIXED because it produces results.

SAS Support person suggested that I use of PARMS statement after Random, so I feed in the initial values for variance-covariance values manually/directly. I did something like this:

PARMS (0.2) (0.4);

Then I get:

“ERROR: Values given in PARMS statement are not feasible.”

Out of curiosity, I used PROC MIXED to run the same model without PARMS but with a weight statement. This time, the model produced results without encountering any errors.

PROC MIXED and PROC GLIMMIX produce identical results (or almost or essentially identical results) when the same models are specified.

I *think* my solution for now is to use PROC MIXED for this specific model.

The following two produced the identical results.
proc mixed data=analysis noclprint ;
where random_number > 0.8;
weight psweight;
class level2 race_cate2 loc_cate;
model zposttest=
<I put variable names here -- including categorical variables, level2 race_cate2 loc_cate>
/ s ddfm=kr ;
random int/subject=level2;
ParameterEstimates=kaz5
Diffs=DIF_GROUP5
LSMeans=LS5
Tests3=jointtest;
run;

proc glimmix data=analysis noclprint ;
where random_number > 0.8;
weight psweight;
class level2 race_cate2 loc_cate;
model zposttest=
<I put variable names here -- including categorical variables, race_cate loc_cate>
/ s dist=normal link=identity ddfm=kr STDCOEF ;
lsmeans
race_cate2 loc_cate
/ ilink diff
at (
treat
Zpretest
male
LEP
SPECED
DISADV
)=(0 0 0 0 0 0);
random int/subject=level2;
*parms (.03 ) (.45) /* /hold=1,2*/ ;
*ods output
ParameterEstimates=kaz5
Diffs=DIF_GROUP5
LSMeans=LS5
Tests3=jointtest;
run;

February 22, 2016

Logistic regression result and odds ratio

Mathematically, you only need the coefficient for the predictor to derive an odds ratio (you don't need the intercept value).

oddsratio=exp(coefficient_size)

November 20, 2015November 20, 2015

Doing HLM and Time-series analysis at the same time using GLIMMIX

HLM (multilevel models) and econometric analyses (e.g., time series analysis, ARIMA, etc.) are treated as different approaches (the goal of which is to deal with data dependency problem), they can be implemented in the same model via. SAS PROC GLIMMIX. However, I believe doing this is computationally demanding and models may not converge.

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_glimmix_sect020.htm

For detailed discussion of this topic, see p.28 of my HLM manual:

http://www.estat.us/sas/PROCMIXED.doc

November 20, 2015November 20, 2015

Kenward-Roger Degrees of Freedom (SAS GLIMMIX)

I will edit this essay later.

Imagine I have an HLM model where level-1 are students and level-2 are schools. I can enter teacher-level variables, but people who are used to HLM software by SSI will wonder how the use of teacher-level variables is possible without making the model 3-level models (level1 students, level2 teachers, level3 schools). This is because in the old HLM tradition (people who studied HLM using HLM software in 1990s), equations are written/processed separately by levels and data must be prepared by levels (student data file and school data file).

If there are student-level and school-level, some HLM users may wonder how teacher-level variables can enter the model (or prepared in what file? student-level file or school-level file???).

People who use SAS or other software programs (maybe R?) will not wonder this because they think in terms of one whole equation and say, "of course we can enter variables of any levels."

The software programs they use adjust degree of freedom to take into consideration that teacher-level variables have a lot less number of values compared to the outcome variable.

K-R option in PROC GLIMMIX adjusts degree of freedom to account for the fact that group-level variables have a lot less possible values when compared to outcome variables.

K-R degree of freedom option seems most appropriate for multilevel modeling applied in educational evaluation studies (where typically students are nested within schools). This option kicks in only when at least one random coefficient is estimated (e.g., intercepts as random effects).

proc glimmix data=i3d;
class school ;
model y =< here variable names >
/solution ddfm=kr dist=normal link=identity s ;
random int /sub=school;
run;

After playing with data type of simulation (I will do a better and well-documented simulation in the future), I learned the following:

For student-level (level-1) variables, degree of freedom under KR option is close to the number of students (minus a small number of cases maybe close to the number of predictors). DF is larger if variance contained in the variable is greater. For binary variables, DF is the largest when variance is .50 (DF gets smaller as a proportion gets close to 0 or 1).

For school-level (level2) variables, DF is close to the number of schools minus a small number maybe close to a number of predictors. This may be also adjusted by variance of the variable.

I created a fake teacher-level predictors by creating two possible numeric values per school. DF for this variable was close to the number of teachers in the data (two per school) minus a small number. I think this is also adjusted by variance of the variable (which I didn't test exactly).

November 18, 2015November 18, 2015

Proc mixed and Proc glimmix produce identical results

PROC MIXED and PROC GLIMMIX produce identical results for the following linear model settings.

proc glimmix data=kaz.asdf;
where IMPACT_ANALYSIS=1;
class school;
model
y =
x1 x2 x3
/solution ddfm=kr dist=normal link=identity s ;
random school;
run;

proc mixed data=kaz.asdf;
where IMPACT_ANALYSIS=1;
class school;
model
y =
x1 x2 x3
/solution s ddfm=kr;
random school;
run;

November 17, 2015November 18, 2015

How to derive subgroup adjusted outcome averages and conduct pairwise stat-test

When you run a statistical interaction model (e.g., Y=TREATMENT + GENDER + TREATMENT*GENDER), you also want to run a mathematically equivalent model:
Y= T_MALE + C_MALE + T_FEMALE + C_FEMALE.

SUBGROUP defines the four groups below.

This is for linear model (HLM because I have the random statement).

proc glimmix data=asdf3 namelen=32;
class CAMPUSID SUBGROUP;
model y=
x1 x2 x3
SUBGROUP
/solution ddfm=kr dist=normal link=identity s noint;
lsmeans SUBGROUP / ilink diff;
random int / subject = CAMPUSID;
ods output ModelInfo=x1var1 ParameterEstimates=x2var1 CovParms=x3var1
nobs=x4var1 Diffs=DIF_RESULT1;
run;

This is for logistic regression model.

proc glimmix data=asdf METHOD=RSPL;
class CAMPUS_14 subgroup;
model y=x1 x2 x3 subgroup
/dist=binomial link=logit s ddfm=kr;
lsmeans group / ilink diff;
ods output ModelInfo=x1var1 ParameterEstimates=x2var1 CovParms=x3var1
Diffs=DIF_RESULT1 LSMeans=LS1;
run;

October 24, 2015October 24, 2015

Automate the choice between HLM and non-HLM

When running PROC GLIMMIX (SAS) in a macro-driven way (e.g., running similar models 100 times), what gets annoying is some HLM models do not converge and you have to comb through output and decide which models to convert to fixed effect models, which is simpler and is easier to converge. The following allows the execution of a fixed model (non-HLM) when a random effect model (HLM) fails.

The following macro (%mend checkds;) checks if the first random effect model produces one of the result files (parameter estimates) and if it doesn't exist (i.e., random effect model did not converge), it will run the model without the random effect statement.

proc glimmix data=asdf METHOD=RSPL;
class CAMPUS_14;
model &out = &main

stud_char
interX

&predictors
/dist=binomial link=logit s ddfm=kr STDCOEF;
random int / subject = CAMPUS_14;
covtest /wald;
ods output
ParameterEstimates=kaz1
CovParms=uekawa1
ModelInfo=estes
dimensions=diminfo
ConvergenceStatus=concon
FitStatistics=FITSTAT

;
run;

data hlm1;
hlm1="HLM ";
run;

/*Check if converged and if not run fixed model*/
%macro checkds(dsn);
%if %sysfunc(exist(&dsn)) %then %do;
/*there is concon created*/
%end;
%else %do;
/*delete imcomplete data from the previous proc
that did not converge*/
proc datasets;
delete kaz1 estes diminfo concon FITSTAT hlm1;
run;

proc glimmix data=asdf METHOD=RSPL;
class CAMPUS_14;
model &out = &main
stud_char
interX
&predictors
/dist=binomial link=logit s ddfm=kr;
ods output
ParameterEstimates=kaz1
/*CovParms=uekawa1*/
/*nobs=jeana */
ModelInfo=estes
dimensions=diminfo
/*ConvergenceStatus=concon*/
FitStatistics=FITSTAT;
run;

data hlm1;
hlm1="Fixed";
run;

%end;
%mend checkds;
/* Invoke the macro, pass a non-existent data set name to test */
*%checkds(work.concon);
*%checkds(work.uekawa1);
%checkds(work.FITSTAT);