PROC GLIMMIX – My Statistical tools

November 24, 2022November 24, 2022

How to create a bar graph off logistic regression results

This is how you can create a graph comparing the treatment group and the comparison group's %s -- based on findings from the logistic regression model.

Based on the final multivariate model, you get the program effect in logit. For example:

-0.223

You also run the simpler model with the treatment program indicator only. Get the intercept value. For example:

1.7081

When this logic is converted into a %, it will be the unadjusted % of the comparison group.

Use these two numbers to derive %s for the treatment group and comparison group.

https://docs.google.com/spreadsheets/d/0B7AoA5fyqX_sN0RUc0E5aFowb00/edit?usp=sharing&ouid=116965977650967783537&resourcekey=0-nGn3k4fyPrVKP2f4Xm-fOA&rtpof=true&sd=true

The resulting graph fixes the % of the comparison group to the simple % of the comparison group and shows the % of the treatment group based on the program effect adjusted for covariates.

If you use the intercept value and the program effect value from the final multivariate model, the meaning of percentages become non-intuitive. If you center predictors (except for the treatment indicator), the intercept will represent a typical person in the dataset. But this is a bit difficult to undestand.

Without centering, the intercept value will represent someone who is, for example, female, white, etc., depending on omitted categories of the variables. If continuous variables are included and if the value of 0 in that variable is not intuitive, the meaning of the intercept is a difficult to interpret.

I recommend using the intercept from the simple model (only the treatment group is the predictor), so the comparison group % will be fixed at a simple descriptive % of the comparison group.

March 17, 2022November 3, 2022

R lmer and SAS PROC GLIMMIX for multilevel linear modeling and logistic modeling

The following R and SAS syntaxes returned almost identical results. (I was a bit surprised that they used almost identical degree of freedom - where group variables uses the n of groups minus something.)

LINEAR MODELING

library(lme4)

kaz1 <- lmer(formula=SAT_TOTAL ~ treat + pretest_GPA + male + minority + disadv + (1|group_school),
data= sas_data_Sat, family = gaussian, REML = TRUE, verbose = FALSE)
summary(kaz1)

SAS:

proc glimmix data=sample ;
where flag_SAT=1 ;
class group_school ;
model SAT_TOTAL=treat pretest_GPA male minority disadv/
solution ddfm=kr dist=normal link=identity;
random intercept /subject=group_school;
run;

NON-LINEAR MODELING (I don't like the degree of freedom option, though. KR is not working. See the syntax at the bottom of this page for specifying DF)

SAS:

proc glimmix data=sample METHOD=LAPLACE ;
*where flag_CE=1 ;
where flag_SAT=1 ;
*where flag_DE=1 ;
class group_school ;
model enroll_fall=treat pretest_GPA male minority disadv /
solution /*ddfm=kr*/ dist=binomial link=logit ;
random intercept /subject=group_school;
run;

m <- glmer(enroll_fall ~ treat + pretest_GPA + male + minority + disadv +
(1 | group_school), data = sas_data_Sat, family = binomial, control = glmerControl(optimizer = "bobyqa"),
nAGQ = 10)
#print(m, corr = FALSE)
summary(m)

One can specify DF this way.

proc glimmix data=whole METHOD=LAPLACE ;
by subgroup;
where flag_SAT=1 ;
class group_school ;
model &var1=treat C_pretest_GPA male minority disadv /
solution /*ddfm=kr*/ dist=binomial link=logit
ddf=36.157,.,.,.;
random intercept /subject=group_school;
ods output ParameterEstimates=_&var1;
run;

October 22, 2019October 22, 2019

SAS PROC GLIMMIX example

The following is a proc glimmix example syntax. I ran it using a fake dataset, so the results are also fake. The outcome is an interval variable and the model is a linear model (not a non-linear model like the logistic regression model). The random statement makes this model "multilevel." Level-1 units are students and level-2 units are schools (IDs are nces_school_name). The intercept is being estimated as the grand average of school-specific intercepts.

This model is a multilevel model and different from the OLS model because it estimates intercept as random effects (school-specific intercepts are derived as random effects and the grand average of those are reported as the intercept).

proc glimmix data=temp1 ;
class nces_school_name ;
model y=x1 x2 x3/
solution ddfm=kr dist=normal link=identity ;
random intercept /subject=nces_school_name;
run;

The main result table you will need to look at is the following. You can interpret this just like you would interpret the OLS regression result.

Solutions for Fixed Effects
Effect	Estimate	Standard Error	DF	t Value	Pr > \|t\|
Intercept	0.5062	0.01600	1040	31.64	<.0001
x1	0.005776	0.01717	3483	0.34	0.7365
x2	0.000465	0.01684	3486	0.03	0.9780
x3	-0.01117	0.01708	3487	-0.65	0.5133

Another table that you want to look at is this. Residual (0.54) is Level-1 variance (student-level variance). Intercept/nces_school_name (0.65) is Level-2 variance (school-level variance). I made up these numbers.

Covariance Parameter Estimates
Cov Parm	Subject	Estimate	Standard Error
Intercept	nces_school_name	0.65	0.0181
Residual		0.54	0.2002

I will focus on the first coefficient table when discussing results; however, I would also report information related to the second table. I would report the following information:

ICC (intraclass correlation):

level2 variance / (level1+level2 variance)

That is:

0.65 / (0.65 + 0.54) =0.546218

This shows the degree to which outcome variance is located between groups (schools, clusters) as opposed to within individuals.

I would also report "variance explained."

I need to run the additional model, which is the analysis-of-variance model where I have no covariate in the model:

proc glimmix data=temp1 ;
class nces_school_name ;
model y=/
solution ddfm=kr dist=normal link=identity ;
random intercept /subject=nces_school_name;
run;

Let's take a look at the covariance table (I made up these values).

Covariance Parameter Estimates
Cov Parm	Subject	Estimate	Standard Error
Intercept	nces_school_name	0.92	0.181
Residual		0.83	0.200

Let's combine this table with the other one in this way.

	Analysis of variance model	Final model	Variance explained
Level 1 variance	0.83	0.54	0.34939759
Level 2 variance	0.92	0.65	0.293478261

Variance explained was calculated as:

(0.83-0.54)/0.83

and

(0.92-0.65)/0.92

I would make the final table look look like this. I didn't round numbers, but you should.

	Estimates	Standard error	p-value	statistical test
Intercept	0.5062	0.016	<.0001	***
x1	0.00578	0.01717	0.7365
x2	0.00047	0.01684	0.978
x3	-0.0112	0.01708	0.5133

level-1 variance	0.83
level-2 variance	0.92
ICC	0.55
Level-1 variance explained	0.35
Level-2 variance explained	0.3
Notes: * if p < 0.001, if p < 0.01, * if p < 0.05.

How to interpret coefficients in the table

In the table above, x1's coefficient is 0.00578. This means that one unit increase in X1 will lead to an increase of 0.00578 in Y. The p-value associated with this is 0.74. So the coefficient here is not statistically significant at alpha=0.05.

One unit increase in X1 means ... if X is about height in meters (e.g., 1.5 meter, 1.7 meter), then 1 meter is 1 unit increase. If X is a binary variable (0 or 1), then one unit increase means "0 to 1 increase".

For my work, I almost always have a variable called TREATMENT which is 1 if subjects received treatment/intervention and 0 if the subjects did not. The coefficient for this is called "program impact effect." If, for example, the program impact effect is 0.25, I just say that and I also mention that other covariates are in the model and the program impact effect is adjusted for these factors. If the estimated program effect is 0.25, it means that the difference between the two groups (treatment vs. control) is 0.25 in outcome.

I also want to provide a standardized version of the program effect. I would run the same statistical model with the z-score version of the outcome variable. To do this, I usually use proc standard:

data abc2; set abc1;

Z_Y=Y;

run;

proc standard data=abc2 out=abc3 mean=0 std=1;

var Z_Y;

run;

Another approach would be to code this by hand in a datastep. if the mean of Y is -0.42 and SD is 0.5:

data abc2; set abc1;

Z_Y= (Y - 0.42)/0.5 ;

run;

January 16, 2019

Anova test using PROC GLIMMIX

%macro clem(var1=,var2=);
proc glimmix data=main.Final_year4;
class status;
model &var1=status
/dist=normal /*binomial*/ link=identity/*logit*/ s ddfm=kr;
lsmeans status / ilink diff;
ods output
/*ModelInfo=x1var1 ParameterEstimates=x2var1 CovParms=x3var1*/
Diffs=DIF_RESULT1 LSMeans=LS1;
run;

data &var1.;
set LS1 DIF_RESULT1;
outcome_name="&var1";
v="vs.";
if Mu ne . then analysis_type="Descriptive Stats";
if Mu = . then do;
analysis_type="Anova";
if Probt < 0.05 then sig="* ";
if Probt < 0.01 then sig="** ";
if Probt < 0.001 then sig="***";
if Probt =. then sig="";
end;
drop effect;
run;

data &var2;
retain outcome_name analysis_type status v _status;
set &var1.;
run;

%mend clem;
%clem(var1=year4_outcome1,var2=a);
%clem(var1=year4_outcome2,var2=b);
%clem(var1=year4_outcome3,var2=c);
%clem(var1=year4_outcome4,var2=d);

data em;
run;

data allresults;
set a em b em c em d ;
run;

September 27, 2018October 24, 2018

Statistical joint test of categorical variables when expressed as a series of dummy variables

When I have multiple subgroup represented in a series of dummy variables (e.g., race groups, grade levels, etc.), I want to know if dummy variables as a system contribute to the model with statistical significance. This may be called a joint test because I want to know if, for example, race groups together (not separately) make a differences to the model.

The easiest way to do this is to treat those variables as classification variables. You will get a joint statistical test in one of the result tables.

proc glimmix ..;

class race grade_level;

....

run;

In my applications I almost always use numeric version of variables, i.e., dummy variables (coded as 0 or 1). I like this approach because I can use PROC MEANS on them to create a descriptive statistics table.

The question is how I get joint statistical tests when all of my predictors are numerically coded and thus I can't rely on the class statement (shown above in the syntax example).

The GLIMMIX syntax below treats race groups and grade levels as numerically coded dummy variables (if YES 1, else 0).

The parameter estimate tables will show coefficients derived for each of the numeric variables; however, I wouldn't know if race groups as a group matters to the model or grade levels as a system matters to the model. For example, even when the coefficient derived for subjects being black is statistically significant, that is only about how black students are different from white students (reference group in this example). We don't know if race as a group matters and race groups jointly make a statistically significant contribution to the model.

Contrast statements will do the trick.

proc glimmix data=usethis namelen=32;
class groupunit;
model Y= treat black hispanic other grade09 grade10 grade11/
solution ddfm=kr dist=&dist link=&link ;
output out=&outcome.gmxout residual=resid;
random intercept /subject=groupunit;
CONTRAST 'Joint F-Test Race groups ' Black 1, Hispanic 1, other 1;
CONTRAST 'Joint F-Test Grade levels' grade09 1, grade10 1, grade11 1,

ods output
ParameterEstimates=_3_&outcome.result covparms=_3_&outcome.cov
Contrasts=cont&outcome;
run;

December 6, 2017

PROC GLIMMIX's option: lsmeans group / ilink diff;

proc glimmix data=asdf METHOD=RSPL;

class CAMPUS_14 subgroup;

model y=x1 x2 x3 subgroup

/dist=binomial link=logit s ddfm=kr;

lsmeans group / ilink diff;

ods output ModelInfo=x1var1 ParameterEstimates=x2var1 CovParms=x3var1

Diffs=DIF_RESULT1 LSMeans=LS1;

run;

November 18, 2016November 18, 2016

HLM: What happens if I enter level2 variables at level 1?

The goal of this exercise is to find out what happens if I intentionally use a level2 variable at level 1 in HLM. I found that the coefficients and standard errors remain about the same. The parameter that differed was just the degree of freedom, which was consistent with my expectation.

Using my old NELS dataset, I ran two different HLM models using Bryk and Raudenbush’s software (See model 1 and model 2 equations in the table below).

The URBAN as level 1 covariate model (I entered the level2 variable URBAN wrongly at level 1)
The URBAN as level 2 covariate model (I entered the level2 variable URBAN correctly at level 2)

The outcome variable is the achievement composite (POSTTEST), students are level 1 and schools are level 2. When expressed as mixed models, the two models are identical, which is why I expected most parameters to come out the same.

POSTTEST_ij = γ₀₀
+ γ₁₀*URBAN_ij + u_0j+ r_ij

The first model (MODEL 1; see below) included URBAN (students are in urban school) as a level 1 predictor. Of course this is a wrong specification because urban is a school characteristic. In the second model (MODEL 2), I used it at the expected level, which is at level 2 (school level).

These models look different, but AGAIN when expressed as mixed models, they are identical. As the third model (MODEL 3), I replicated the same HLM model using SAS PROC GLIMMIX. SAS requires that the equation be expressed as a mixed model.

Results showed that coefficients and standard errors are more or less the same across three models. The only one thing that was different was degree of freedom.

Conclusion: As long as variables enter the model as fixed effects as done here, there is nothing magical about the HLM model. HLM software or SAS PROC GLIMMIX (option ddfm=kr) adjust degree of freedom values, accounting for the fact that URBAN is a school-level variable and thus should not be awarded a value that is too large. Notice that under the correct specification (MODEL 2 and MODEL 3), the degree of freedom for URBAN is close to the number of schools, not to the number of students.

Thanks for any comments you may have.

MODEL 1

MODEL 2

MODEL 3

URBAN as LEVEL 1 covariate

URBAN as level 2 covariate

SAS PROC GLIMMIX

Level-1 Model

POSTTEST_ij = β_0j + β_1j*(URBAN_ij) + r_ij

Level-2 Model

β_0j = γ₀₀ + u_0j
β_1j = γ₁₀

Mixed Model

POSTTEST_ij = γ₀₀
+ γ₁₀*URBAN_ij + u_0j+ r_ij

Level-1 Model

POSTTEST_ij = β_0j + r_ij

Level-2 Model

β_0j = γ₀₀ + γ₀₁*(URBAN_LE_j) + u_0j

Mixed Model

POSTTEST_ij = γ₀₀ + γ₀₁*URBAN_LE_j + u_0j+ r_ij

proc glimmix data=kaz.level1;

class schoolID;

model

posttest =

urban

/solution ddfm=kr dist=normal link=identity s ;

random schoolID;

run;

Final estimation of fixed effects
(with robust standard errors)

Fixed Effect	Coefficient	Standard error	t-ratio	Approx. d.f.	p-value
For INTRCPT1, β₀
INTRCPT2, γ₀₀	52.643432	0.526139	100.056	125	<0.001
For URBAN slope, β₁
INTRCPT2, γ₁₀	-0.450022	1.157924	-0.389	692	0.698

Final estimation of variance components

Random Effect	Standard Deviation	Variance Component	d.f.	χ²	p-value
INTRCPT1, u₀	3.76951	14.20923	125	292.48369	<0.001
level-1, r	8.39004	70.39271

Statistics for current covariance components model

Final estimation of fixed effects
(with robust standard errors)

Fixed Effect	Coefficient	Standard error	t-ratio	Approx. d.f.	p-value
For INTRCPT1, β₀
INTRCPT2, γ₀₀	52.643459	0.526140	100.056	124	<0.001
URBAN_LE, γ₀₁	-0.449983	1.157920	-0.389	124	0.698

Final estimation of variance components

Random Effect	Standard Deviation	Variance Component	d.f.	χ²	p-value
INTRCPT1, u₀	3.76919	14.20678	124	292.48068	<0.001
level-1, r	8.39007	70.39334

Statistics for current covariance components model

Solutions for Fixed Effects
Effect	Estimate		Standard Error		DF	t Value	Pr > \|t\|
Intercept	52.6434		0.5460		95.01	96.41	<.0001
urban	-0.4501		1.1095		132.4	-0.41	0.6856

Covariance Parameter Estimates
Cov Parm		Estimate		Standard Error
schoolID		14.2150		3.4610
Residual		70.3913		3.7455

Datasets:

www.nippondream.com/file/datafiles_HLM.zip

Results from Model 1

Results from Model 2

Results from PROC GLIMMIX

October 24, 2016

PROG GLM syntax -- contrasting groups using LSMEANS

proc glimmix data=kaz2.Year2AnalysisSample;
class SCHOOL_NAME grade_level race_cate;
model
&z_post = treat
grade_level
/solution ddfm=kr dist=normal link=identity s ;
random SCHOOL_NAME;
covtest /wald;
lsmeans grade_level / diff;
run;

May 4, 2016May 4, 2016

How to edit a PROC GLIMMIX syntax into a PROC MIXED syntax

I encountered a situation where somehow PROC GLIMMIX can't produce results while PROC MIXED can. In this WORD document, I showed you how I edited the PROC GLIMMIX syntax to be a PROC MIXED syntax. Please ignore my macro %w.

https://drive.google.com/open?id=0B7AoA5fyqX_sVXQ1YzRlSlFlaW8

May 3, 2016May 12, 2016

PROC GLIMMIX error messages

While running PROC GLIMMIX with a large dataset with categorical variables explicitly treated as CLASS variables AND with weight, I got the following error message:

WARNING: Obtaining minimum variance quadratic unbiased estimates as starting values for the covariance parameters failed.

The weight and the use of categorical variables were the cause of the problem as I found out that without weight the model produces results.

SAS support staff spent time trying to figure out what the issue is/was. I sent them a dataset (though I replaced the variables with randomly generated values) and the exact syntax. They determined that (thank you so much for going extra miles to figure this out!):

"the computation of MIVQUE fails because the default singular value (1e-12) leads to the detection of singularity while sweeping of the mixed model equation" -- May 12, 2016.

They proposed that singular value should be specified as a very small value to let the conversion happen:

proc glimmix data=example singular=1e-9;
weight psweight;
class race_cate level2 ;
model outcome= race_cate / s dist=normal link=identity ddfm=kr;
random intercept/subject=level2;
run;

****

Before hearing this final response from SAS, I wrote this and said that I chose to use PROC MIXED because it produces results.

SAS Support person suggested that I use of PARMS statement after Random, so I feed in the initial values for variance-covariance values manually/directly. I did something like this:

PARMS (0.2) (0.4);

Then I get:

“ERROR: Values given in PARMS statement are not feasible.”

Out of curiosity, I used PROC MIXED to run the same model without PARMS but with a weight statement. This time, the model produced results without encountering any errors.

PROC MIXED and PROC GLIMMIX produce identical results (or almost or essentially identical results) when the same models are specified.

I *think* my solution for now is to use PROC MIXED for this specific model.

The following two produced the identical results.
proc mixed data=analysis noclprint ;
where random_number > 0.8;
weight psweight;
class level2 race_cate2 loc_cate;
model zposttest=
<I put variable names here -- including categorical variables, level2 race_cate2 loc_cate>
/ s ddfm=kr ;
random int/subject=level2;
ParameterEstimates=kaz5
Diffs=DIF_GROUP5
LSMeans=LS5
Tests3=jointtest;
run;

proc glimmix data=analysis noclprint ;
where random_number > 0.8;
weight psweight;
class level2 race_cate2 loc_cate;
model zposttest=
<I put variable names here -- including categorical variables, race_cate loc_cate>
/ s dist=normal link=identity ddfm=kr STDCOEF ;
lsmeans
race_cate2 loc_cate
/ ilink diff
at (
treat
Zpretest
male
LEP
SPECED
DISADV
)=(0 0 0 0 0 0);
random int/subject=level2;
*parms (.03 ) (.45) /* /hold=1,2*/ ;
*ods output
ParameterEstimates=kaz5
Diffs=DIF_GROUP5
LSMeans=LS5
Tests3=jointtest;
run;