R lmer and SAS PROC GLIMMIX for multilevel linear modeling and logistic modeling

The following R and SAS syntaxes returned almost identical results.  (I was a bit surprised that they used almost identical degrees of freedom, where a group-level variable gets roughly the number of groups minus something.)

LINEAR MODELING

R:

library(lme4)

kaz1 <- lmer(SAT_TOTAL ~ treat + pretest_GPA + male + minority + disadv + (1|group_school),
             data = sas_data_Sat, REML = TRUE, verbose = FALSE)
summary(kaz1)

SAS:

proc glimmix data=sample ;
where flag_SAT=1 ;
class group_school ;
model SAT_TOTAL=treat pretest_GPA male minority disadv/
solution ddfm=kr dist=normal link=identity;
random intercept /subject=group_school;
run;

 

NON-LINEAR MODELING (I don't like the degrees-of-freedom handling here, though: ddfm=kr does not work with METHOD=LAPLACE. See the syntax at the bottom of this page for specifying the DF directly.)

SAS:

proc glimmix data=sample METHOD=LAPLACE ;
*where flag_CE=1 ;
where flag_SAT=1 ;
*where flag_DE=1 ;
class group_school ;
model enroll_fall=treat pretest_GPA male minority disadv /
solution /*ddfm=kr*/ dist=binomial link=logit ;
random intercept /subject=group_school;
run;

R:

m <- glmer(enroll_fall ~ treat + pretest_GPA + male + minority + disadv +
(1 | group_school), data = sas_data_Sat, family = binomial, control = glmerControl(optimizer = "bobyqa"),
nAGQ = 10)
#print(m, corr = FALSE)
summary(m)

 

One can specify DF this way.

proc glimmix data=whole METHOD=LAPLACE ;
by subgroup;
where flag_SAT=1 ;
class group_school ;
model &var1=treat C_pretest_GPA male minority disadv /
solution /*ddfm=kr*/ dist=binomial link=logit
ddf=36.157,.,.,.;
random intercept /subject=group_school;
ods output ParameterEstimates=_&var1;
run;

 

SAS PROC GLIMMIX example

The following is an example of PROC GLIMMIX syntax.  I ran it on a fake dataset, so the results are also fake.  The outcome is an interval variable and the model is a linear model (not a non-linear model like the logistic regression model).  The RANDOM statement is what makes this model "multilevel."  Level-1 units are students and level-2 units are schools (IDs are nces_school_name).  The intercept is estimated as the grand average of the school-specific intercepts.

This is a multilevel model and differs from the OLS model because it estimates the intercept as a random effect (school-specific intercepts are derived as random effects, and the grand average of those is reported as the intercept).

proc glimmix data=temp1 ;
class nces_school_name ;
model y=x1 x2 x3/
solution ddfm=kr dist=normal link=identity ;
random intercept /subject=nces_school_name;
run;

The main result table you will need to look at is the following.  You can interpret this just like you would interpret the OLS regression result.

Solutions for Fixed Effects

Effect      Estimate   Standard Error   DF     t Value   Pr > |t|
Intercept   0.5062     0.01600          1040   31.64     <.0001
x1          0.005776   0.01717          3483   0.34      0.7365
x2          0.000465   0.01684          3486   0.03      0.9780
x3          -0.01117   0.01708          3487   -0.65     0.5133

Another table that you want to look at is this.  Residual (0.54) is Level-1 variance (student-level variance).  Intercept/nces_school_name (0.65) is Level-2 variance (school-level variance).  I made up these numbers.

Covariance Parameter Estimates

Cov Parm    Subject            Estimate   Standard Error
Intercept   nces_school_name   0.65       0.0181
Residual                       0.54       0.2002

I will focus on the first coefficient table when discussing results; however, I would also report information related to the second table.  I would report the following information:

ICC (intraclass correlation):

level-2 variance / (level-1 variance + level-2 variance)

That is:

0.65 / (0.65 + 0.54) = 0.546218

This shows the degree to which the outcome variance is located between groups (schools, clusters) as opposed to within groups (among students within the same school).
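
As a quick check, here is the same ICC arithmetic in a few lines of R, using the made-up variance estimates from the covariance table above:

# Made-up variance components from the covariance table above
level2_var <- 0.65   # Intercept / nces_school_name (school-level variance)
level1_var <- 0.54   # Residual (student-level variance)

# ICC: share of the outcome variance that lies between schools
icc <- level2_var / (level1_var + level2_var)
icc
# 0.5462185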

I would also report "variance explained."

I need to run an additional model: the analysis-of-variance (intercept-only) model, which has no covariates:

proc glimmix data=temp1 ;
class nces_school_name ;
model y=/
solution ddfm=kr dist=normal link=identity ;
random intercept /subject=nces_school_name;
run;

Let's take a look at the covariance table (I made up these values).

Covariance Parameter Estimates

Cov Parm    Subject            Estimate   Standard Error
Intercept   nces_school_name   0.92       0.181
Residual                       0.83       0.200

 

Let's combine this table with the other one in this way.

                    Analysis of variance model   Final model   Variance explained
Level 1 variance    0.83                         0.54          0.34939759
Level 2 variance    0.92                         0.65          0.293478261

 

 

Variance explained was calculated as:

(0.83-0.54)/0.83

and

(0.92-0.65)/0.92
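
The same proportional-reduction-in-variance calculation can be done in R, using the made-up variance components from the two models:

# Variance components from the analysis-of-variance (intercept-only) model
level1_null <- 0.83
level2_null <- 0.92

# Variance components from the final model (with covariates)
level1_final <- 0.54
level2_final <- 0.65

# Proportional reduction in variance at each level ("variance explained")
(level1_null - level1_final) / level1_null   # 0.3493976
(level2_null - level2_final) / level2_null   # 0.2934783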

 

I would make the final table look like this.  I didn't round the numbers, but you should.

                             Estimate   Standard error   p-value   Statistical test
Intercept                    0.5062     0.016            <.0001    ***
x1                           0.00578    0.01717          0.7365
x2                           0.00047    0.01684          0.978
x3                           -0.0112    0.01708          0.5133
Level-1 variance             0.83
Level-2 variance             0.92
ICC                          0.55
Level-1 variance explained   0.35
Level-2 variance explained   0.3
Notes: *** if p < 0.001, ** if p < 0.01, * if p < 0.05.

 

How to interpret coefficients in the table

In the table above, x1's coefficient is 0.00578.  This means that a one-unit increase in X1 leads to an increase of 0.00578 in Y.  The p-value associated with this is 0.74, so the coefficient is not statistically significant at alpha = 0.05.

What a one-unit increase in X1 means depends on the variable.  If X is height in meters (e.g., 1.5 meters, 1.7 meters), then 1 meter is one unit.  If X is a binary variable (0 or 1), then a one-unit increase means going from 0 to 1.

For my work, I almost always have a variable called TREATMENT, which is 1 if subjects received the treatment/intervention and 0 if they did not.  The coefficient for this variable is the "program impact effect."  If, for example, the program impact effect is 0.25, I report that and also mention that other covariates are in the model, so the program impact effect is adjusted for these factors.  An estimated program effect of 0.25 means that the difference between the two groups (treatment vs. control) in the outcome is 0.25.

I also want to provide a standardized version of the program effect.  I would run the same statistical model with the z-score version of the outcome variable.  To do this, I usually use proc standard:

data abc2; set abc1;
Z_Y=Y;
run;

proc standard data=abc2 out=abc3 mean=0 std=1;
var Z_Y;
run;

Another approach would be to code this by hand in a DATA step.  If the mean of Y is 0.42 and the SD is 0.5:

data abc2; set abc1;
Z_Y = (Y - 0.42)/0.5;
run;
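
If the analysis is in R instead, here is a minimal sketch of the same standardization (the dataset and variable names are just placeholders mirroring the SAS example):

# z-score the outcome (mean 0, SD 1), analogous to proc standard
abc3 <- abc1
abc3$Z_Y <- as.numeric(scale(abc3$Y))

# or by hand, using a known mean (0.42) and SD (0.5)
abc3$Z_Y_hand <- (abc3$Y - 0.42) / 0.5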

 

 

 

Using R to run multilevel models

I'm learning how to run multilevel models in R.

I tried the analysis-of-variance model, a.k.a. the intercept-only model.

library(nlme)

fit <- lme(post_test ~ 1, random = ~1|school, data = mySASData, control = list(opt = "optim"))
summary(fit)
anova(fit)
VarCorr(fit)
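
For comparison, here is a minimal sketch of the same intercept-only model fit with lme4 (assuming the same data frame and variable names as above):

library(lme4)

# Intercept-only (unconditional) model: random intercept for school
fit0 <- lmer(post_test ~ 1 + (1 | school), data = mySASData, REML = TRUE)
summary(fit0)
VarCorr(fit0)   # level-1 (residual) and level-2 (school) variance components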

 

I ran this in SAS and got the same results, except that the degrees of freedom were not the same.

proc glimmix data=sashlm.core_2014_4_years;
class school;
model post_test=/solution ddfm=kr dist=normal link=identity;
random intercept /subject=school;
run;

HLM: What happens if I enter level2 variables at level 1?

The goal of this exercise is to find out what happens if I intentionally use a level-2 variable at level 1 in HLM.  I found that the coefficients and standard errors remain about the same.  The only parameter that differed was the degrees of freedom, which was consistent with my expectation.

Using my old NELS dataset, I ran two different HLM models using Bryk and Raudenbush’s software (See model 1 and model 2 equations in the table below).

  • The URBAN as level-1 covariate model (I entered the level-2 variable URBAN, incorrectly, at level 1)
  • The URBAN as level-2 covariate model (I entered the level-2 variable URBAN, correctly, at level 2)

The outcome variable is the achievement composite (POSTTEST), students are level 1 and schools are level 2.  When expressed as mixed models, the two models are identical, which is why I expected most parameters to come out the same.

POSTTESTij = γ00 + γ10*URBANij + u0j + rij

The first model (MODEL 1; see below) included URBAN (students are in urban school) as a level 1 predictor.  Of course this is a wrong specification because urban is a school characteristic.  In the second model (MODEL 2), I used it at the expected level, which is at level 2 (school level).

These models look different, but AGAIN when expressed as mixed models, they are identical.  As the third model (MODEL 3), I replicated the same HLM model using SAS PROC GLIMMIX.  SAS requires that the equation be expressed as a mixed model.

Results showed that the coefficients and standard errors are more or less the same across the three models.  The only thing that was different was the degrees of freedom.

Conclusion: As long as variables enter the model as fixed effects, as done here, there is nothing magical about the HLM model.  The HLM software and SAS PROC GLIMMIX (with the ddfm=kr option) adjust the degrees of freedom, accounting for the fact that URBAN is a school-level variable and thus should not be awarded too many degrees of freedom.  Notice that under the correct specification (MODEL 2 and MODEL 3), the degrees of freedom for URBAN are close to the number of schools, not to the number of students.

Thanks for any comments you may have.

MODEL 1: URBAN as level-1 covariate

Level-1 Model
POSTTESTij = β0j + β1j*(URBANij) + rij

Level-2 Model
β0j = γ00 + u0j
β1j = γ10

Mixed Model
POSTTESTij = γ00 + γ10*URBANij + u0j + rij

MODEL 2: URBAN as level-2 covariate

Level-1 Model
POSTTESTij = β0j + rij

Level-2 Model
β0j = γ00 + γ01*(URBAN_LEj) + u0j

Mixed Model
POSTTESTij = γ00 + γ01*URBAN_LEj + u0j + rij

MODEL 3: SAS PROC GLIMMIX

proc glimmix data=kaz.level1;
class schoolID;
model posttest = urban /solution ddfm=kr dist=normal link=identity;
random schoolID;
run;

 

Results from Model 1 (HLM; URBAN as level-1 covariate)

Final estimation of fixed effects (with robust standard errors)

Fixed Effect          Coefficient   Standard error   t-ratio   Approx. d.f.   p-value
For INTRCPT1, β0
    INTRCPT2, γ00     52.643432     0.526139         100.056   125            <0.001
For URBAN slope, β1
    INTRCPT2, γ10     -0.450022     1.157924         -0.389    692            0.698

 

Final estimation of variance components

Random Effect    Standard Deviation   Variance Component   d.f.   χ2          p-value
INTRCPT1, u0     3.76951              14.20923             125    292.48369   <0.001
level-1, r       8.39004              70.39271


 

Results from Model 2 (HLM; URBAN as level-2 covariate)

Final estimation of fixed effects (with robust standard errors)

Fixed Effect          Coefficient   Standard error   t-ratio   Approx. d.f.   p-value
For INTRCPT1, β0
    INTRCPT2, γ00     52.643459     0.526140         100.056   124            <0.001
    URBAN_LE, γ01     -0.449983     1.157920         -0.389    124            0.698

 

Final estimation of variance components

Random Effect    Standard Deviation   Variance Component   d.f.   χ2          p-value
INTRCPT1, u0     3.76919              14.20678             124    292.48068   <0.001
level-1, r       8.39007              70.39334


 

Results from PROC GLIMMIX (MODEL 3)

Solutions for Fixed Effects

Effect      Estimate   Standard Error   DF      t Value   Pr > |t|
Intercept   52.6434    0.5460           95.01   96.41     <.0001
urban       -0.4501    1.1095           132.4   -0.41     0.6856
 

 

Covariance Parameter Estimates

Cov Parm   Estimate   Standard Error
schoolID   14.2150    3.4610
Residual   70.3913    3.7455

 

Datasets:

www.nippondream.com/file/datafiles_HLM.zip


 

Random effects vs. Fixed Effects

Random effects:

Effects in this context refer to group outcome averages estimated by a regression model.  In regular regression models (OLS, logistic regression), these are often estimated as the coefficients of a series of dummy variables representing the group units (e.g., school A, school B, ..., each coded 0 or 1).  These are fixed effects when estimated by a non-HLM model.  If I belong to a school whose test score average is 500 and that value is estimated as a fixed effect, it is determined solely by the information obtained from that school (I used the word "solely" to emphasize the point, but this may not be exactly correct because predictors from the whole dataset help derive the coefficients, which in turn help adjust the average outcome).

HLM does something similar, but after the initial group averages are estimated, they are adjusted by the reliability of the group estimates (this process is called Bayesian shrinkage).  They are adjusted such that the estimates are pulled toward the grand mean.  A reliably measured group estimate will stay close to the original fixed-effect value; a less reliable estimate will be pulled toward the grand mean.  The idea is to blend the grand mean and the group mean so that an unreliable group estimate does not stray far from the mean.

Reliability of the group average is a function of (a) the number of subjects in the group and (b) the outcome variance.  I think the intraclass correlation may be part of the algorithm, but I will check.
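
Here is a minimal sketch of the shrinkage idea in R.  This is the textbook empirical Bayes formula for a random-intercept model, not necessarily the exact algorithm any particular software uses; the function and the example numbers are made up for illustration.

# Empirical Bayes shrinkage for one group j:
#   tau2   = level-2 (between-group) variance
#   sigma2 = level-1 (within-group) variance
#   n_j    = number of subjects in group j
shrunken_mean <- function(group_mean, grand_mean, tau2, sigma2, n_j) {
  reliability <- tau2 / (tau2 + sigma2 / n_j)   # weight given to the group's own mean
  reliability * group_mean + (1 - reliability) * grand_mean
}

# A small school (n = 5) gets pulled noticeably toward the grand mean
shrunken_mean(group_mean = 520, grand_mean = 500, tau2 = 100, sigma2 = 900, n_j = 5)
# reliability = 100 / (100 + 180) = 0.36, so the estimate is about 507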

When to use HLM (Hierarchical Linear Modeling)

Not common: If your data are census data (everyone is in the dataset, like the US Census), you do not need to use HLM.  You do not even need statistical testing, because every estimate you get is the true population value.  (Note: a friend wrote to me and said he disagrees.  He said that even if he had data on everyone, he would still want to do statistical testing to compare males and females when the difference is very small.  Again, I disagree.)

Common: If your data are a sample and the data are hierarchically structured, so that errors are dependent (a violation of the independence assumption), you may consider HLM to alleviate the clustering problem.  Examples of hierarchically structured data are:

  • Students nested within schools (2-levels)
  • Repeated measures nested within subjects who are nested within schools (3-levels)

It is sometimes said that the motivation for using HLM must be whether the group units are a random sample of the population.  The argument claims that if, for example, the schools in the sample are a convenience sample, one cannot use HLM.  This is not exactly correct.  I state the following using RCT (randomized controlled trial) or QED (quasi-experimental design) impact studies as the context.

If the group units are randomly sampled, one can generalize the result of the impact analysis to the whole population.  If intervention program A was found effective (or not effective) and the sample was a random sample of the US population, the finding is generalizable.  If the impact estimation relied on a convenience sample, one cannot generalize it to the whole population.  In an RCT, however, the random-sample vs. convenience-sample difference should not affect the internal validity of the impact estimate.

There is a tricky case.  HLM is inappropriate if the group units are, for lack of a better word, distinct groups with apparent identities, and you as a researcher are genuinely interested in the pure group estimates.  This is the case where the exact group estimates, derived as fixed effects rather than as random effects, are of interest.

For example, if the group units are US states, HLM is most likely inappropriate.  State-specific estimates should be interpreted as such and should not be treated as random effects.  Imagine the outcome of interest is income level and the state you live in has an average income of 50,000.  Just because your state had a small number of survey respondents (and thus the reliability of the estimate is lower, so HLM will pull your average closer to the grand average), you do not want to see your state's average changed to look more like the national average.  Another example would be a study of 20 regional hospitals: you would be interested in the fixed estimates of the hospital outcomes.

When HLM treats schools as random effects, we are treating the school units somewhat instrumentally (a bit rude, :>) in order to obtain the best value for the intercept (the grand average of the group-specific effects estimated as random effects).  So if you are a school, you may feel HLM is treating you without respect.  HLM will not respect your data if your sample size is small and the outcome variance in your school is large.  But HLM is respecting you in a different way: your data are unreliable, so let me adjust them to be more normal, so you won't embarrass yourself.

PROC GLIMMIX error messages

While running PROC GLIMMIX on a large dataset with categorical variables explicitly treated as CLASS variables AND with a WEIGHT statement, I got the following message:

WARNING: Obtaining minimum variance quadratic unbiased estimates as starting values for the covariance parameters failed.

The weight and the use of categorical variables were the cause of the problem; I found that without the weight the model produced results.

SAS support staff spent time trying to figure out what the issue was.  I sent them a dataset (though I replaced the variables with randomly generated values) and the exact syntax.  They determined the following (thank you so much for going the extra mile to figure this out!):

"the computation of MIVQUE fails because the default singular value (1e-12) leads to the detection of singularity while sweeping of the mixed model equation"  -- May 12, 2016.

They proposed specifying the singularity tolerance (SINGULAR=) explicitly to let the estimation proceed:

proc glimmix data=example singular=1e-9;
weight psweight;
class race_cate level2 ;
model outcome= race_cate / s dist=normal link=identity ddfm=kr;
random intercept/subject=level2;
run;

****

Before hearing this final response from SAS, I had written the following and said that I chose to use PROC MIXED because it produces results.

 

The SAS support person suggested that I use a PARMS statement after the RANDOM statement, so that I feed in the initial values for the variance-covariance parameters manually/directly.  I did something like this:

PARMS (0.2) (0.4);

Then I get:

“ERROR: Values given in PARMS statement are not feasible.”

Out of curiosity, I used PROC MIXED to run the same model without PARMS but with a weight statement.  This time, the model produced results without encountering any errors.

PROC MIXED and PROC GLIMMIX produce identical results (or almost or essentially identical results) when the same models are specified.

I *think* my solution for now is to use PROC MIXED for this specific model.

The following two produced identical results.
proc mixed data=analysis noclprint ;
where random_number > 0.8;
weight psweight;
class level2 race_cate2 loc_cate;
model zposttest=
<I put variable names here -- including categorical variables, level2 race_cate2 loc_cate>
/ s ddfm=kr ;
random int/subject=level2;
ods output
ParameterEstimates=kaz5
Diffs=DIF_GROUP5
LSMeans=LS5
Tests3=jointtest;
run;

proc glimmix data=analysis noclprint ;
where random_number > 0.8;
weight psweight;
class level2 race_cate2 loc_cate;
model zposttest=
<I put variable names here -- including categorical variables, race_cate loc_cate>
/ s dist=normal link=identity ddfm=kr STDCOEF ;
lsmeans
race_cate2 loc_cate
/ ilink diff
at (
treat
Zpretest
male
LEP
SPECED
DISADV
)=(0 0 0 0 0 0);
random int/subject=level2;
*parms (.03 ) (.45) /* /hold=1,2*/ ;
*ods output
ParameterEstimates=kaz5
Diffs=DIF_GROUP5
LSMeans=LS5
Tests3=jointtest;
run;