Calculating Odds Ratios from Logistic Regression Results

Odds ratios can be obtained from the results of a logistic regression model.  The resulting odds ratios are adjusted for the other predictors in the model and describe the relationship between two groups (e.g., treatment and control group) and a binary outcome.  I wrote the following Excel document that calculates the odds ratio from the logit coefficients for the intercept and the predictor of interest (a binary one: e.g., impact coefficient, gender effect, etc.).

The appendix (p. 27) of the following document includes a description of odds ratios.
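The same calculation can be sketched outside Excel.  The short Python function below is only an illustration of the arithmetic, not a copy of the spreadsheet: the function name, the dictionary keys, and the example coefficients are made up, and it assumes all other predictors are held at zero (or are centered) when the probabilities are computed.  The odds ratio itself does not depend on that assumption, since it equals exp(coefficient).

```python
import math


def odds_ratio_summary(intercept, coef):
    """Turn two logit coefficients into probabilities, odds, and an odds ratio.

    intercept: logit for the reference group (e.g., control), other predictors at 0
    coef:      logit coefficient of the binary predictor (e.g., treatment = 1)
    """
    # Inverse logit gives the predicted probability for each group.
    p_reference = 1.0 / (1.0 + math.exp(-intercept))
    p_group = 1.0 / (1.0 + math.exp(-(intercept + coef)))

    # Odds = p / (1 - p) for each group.
    odds_reference = p_reference / (1.0 - p_reference)
    odds_group = p_group / (1.0 - p_group)

    return {
        "p_reference": p_reference,
        "p_group": p_group,
        "odds_reference": odds_reference,
        "odds_group": odds_group,
        # Ratio of the two odds; algebraically the same as math.exp(coef).
        "odds_ratio": odds_group / odds_reference,
    }


# Hypothetical coefficients, for illustration only.
summary = odds_ratio_summary(intercept=-0.5, coef=0.8)
```

Reporting the two predicted probabilities next to the odds ratio is often easier for readers to interpret than the odds ratio alone.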

If between-group variance increased instead of decreased in HLM

In a non-HLM model (e.g., OLS), the residual sum of squares can only go down (or stay the same) as you add predictors, so the unexplained variance of the outcome shrinks.

Variance is about how the residuals are distributed.  If the predictors explain the outcome very well, the residual variance is small.  If the predictors do NOT explain the outcome well, the residual variance is large.  Sometimes the outcome does not have enough variance to begin with (e.g., trying to explain graduation when only 3 people out of 100,000 subjects graduate from high school).

In HLM, the variance specific to a level (level-1 variance, level-2 variance) can increase, which is counter-intuitive.  For example, when modeling student achievement as the outcome, you add the pretest score to the model and all of a sudden the between-school variance increases.  Below is an example of how this can happen.

I will state my conclusions first.

a) It is theoretically and empirically possible to see group-level variance increase when individual-level predictor(s) are added to the model.  You will need to look at the data to understand why it happened, so you can explain it when your audience questions you.  A case-by-case or group-average (e.g., school average) comparison of residuals from the unconditional model and the conditional model will help.  Some situations will give you meaningful explanations (as in the housing price and location explanation in the PDF file referenced above).  Other situations will give you boring, matter-of-fact explanations (it just happens because level-1 predictors change the values of the group mean estimates, i.e., the random intercepts).

b) Before settling on an explanation, always suspect an error in the data.  Errors in the data can be behind the between-group variance increase (I will provide an example of this in situation 2).


Situation 1

The model was a multi-level logistic regression where:

  • The outcome: 0 or 1 (passing the post test or not passing)
  • Pretest score: interval score
  • 2-level model: Students (=subjects) are nested within schools
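Written out, a random-intercept model of this kind might look like the following.  This is a sketch of the generic form only; the symbols are my notation, and the actual model may have included other predictors (e.g., an impact indicator).

```latex
% Student i in school j; Y_{ij} = 1 if the student passed the posttest.
\begin{align*}
\operatorname{logit}\,\Pr(Y_{ij} = 1) &= \beta_{0j} + \beta_{1}\,\mathrm{pretest}_{ij} \\
\beta_{0j} &= \gamma_{00} + u_{0j}, \qquad u_{0j} \sim N(0, \tau_{00})
\end{align*}
% The unconditional (ANOVA) model drops the pretest term:
% \operatorname{logit}\,\Pr(Y_{ij} = 1) = \gamma_{00} + u_{0j}
```

Here the between-school variance is tau_00, the variance of the school-level deviations u_0j.  One thing to keep in mind with a logit link is that there is no separately estimated level-1 residual variance (on the latent scale it is fixed at pi^2/3), so adding a level-1 predictor can rescale the other estimates, including tau_00.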

What happened was:

The between-school variance increased when we entered the level-1 pretest score.  We compared the variance estimates from the ANOVA model (the unconditional model, with no predictors) and the conditional model (the model that includes predictors), and the between-group variance was larger in the conditional model.

This is how we solved it:

  • a) I identified which predictor was causing the situation.  We quickly found that it was the pretest, simply by adding one predictor at a time and checking how the between-school variance changed.
  • b) I examined the group-level residuals from the model that doesn't include the problem predictor (pretest) and from the model that includes it.  When the two columns were plotted against each other, two observations sat apart from the rest of the points, indicating that after the predictor was entered into the model, these two groups' errors (deviations from the mean) increased in size.
  • c) I examined how the outcome variable is related to the problem predictor.  Although the two are positively correlated overall, the two problem groups showed an unexpected association between them.  Although the two groups had low pretest scores, which would predict low outcome scores, they had relatively high outcome scores.
  • d) This means that the two groups have exceptionally high outcome scores compared to the prediction, which increases the size of the errors associated with subjects in these two groups.
  • e) Two alternative solutions
    • Remove the outliers and make a note of the situation for readers
    • Keep the outliers and check the consistency of the results; if the results do not change in substantive meaning (e.g., the impact coefficient stays more or less the same), make a note of the situation for readers
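Step b) can also be done without a plot.  The Python sketch below assumes you have already exported the school-level residuals (random intercept estimates) from both models into two dictionaries keyed by school; the function name, the threshold, and the numbers are all made up for illustration and are not the values from this analysis.

```python
def flag_growing_residuals(unconditional, conditional, min_increase=0.0):
    """List groups whose residual grew in absolute size after adding the predictor.

    unconditional: {group: residual from the model without the predictor}
    conditional:   {group: residual from the model with the predictor}
    Returns (group, increase) pairs, largest increase first.
    """
    flagged = []
    for group, before in unconditional.items():
        after = conditional[group]
        increase = abs(after) - abs(before)
        if increase > min_increase:
            flagged.append((group, increase))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Made-up school-level residuals, for illustration only.
residuals_without_pretest = {"A": 0.10, "B": -0.20, "C": 0.15, "D": 0.30}
residuals_with_pretest = {"A": 0.05, "B": -0.10, "C": 0.60, "D": 0.80}

suspects = flag_growing_residuals(
    residuals_without_pretest, residuals_with_pretest, min_increase=0.2
)
```

The groups that come back are the ones to inspect in step c), where you check whether their pretest and outcome scores line up the way the rest of the data does.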

Situation 2



That is about it, but I may come back and edit.

I also want to create a fake dataset to replicate this problem.  I will create an outcome variable and a predictor that are positively correlated (e.g., math posttest and math pretest).  For ease of interpretation I will convert them into z-scores (proc standard data=abc mean=0 std=1;var pretest posttest;run;).  I will then take 10% of the data and flip the sign of the predictor, so the association for this subgroup will be the opposite of that in the remaining 90%.  I *think* this will replicate the situation, but I am not 100% sure.
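The data-generation part of that plan might look like the Python sketch below (written with the standard library only, rather than SAS).  Everything specific here is an assumption: the sample size, the correlation of 0.7, the seed, and the function names are arbitrary, and the sketch stops before the step that matters most, which is assigning students to schools and fitting the two HLM models.  So it shows the flipped association, not the variance increase itself.

```python
import random
import statistics


def zscore(values):
    """Standardize to mean 0, SD 1 (sample SD, n - 1 divisor)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=1000, rho=0.7, flip_share=0.10, seed=1):
    """Create correlated pretest/posttest z-scores, then flip the pretest
    sign for a random subset so its association with posttest reverses."""
    rng = random.Random(seed)
    pretest = [rng.gauss(0, 1) for _ in range(n)]
    noise_sd = (1 - rho ** 2) ** 0.5
    posttest = [rho * p + noise_sd * rng.gauss(0, 1) for p in pretest]
    pretest, posttest = zscore(pretest), zscore(posttest)

    # Indices of the cases whose predictor sign is flipped.
    flipped = set(rng.sample(range(n), int(n * flip_share)))
    pretest = [-p if i in flipped else p for i, p in enumerate(pretest)]
    return pretest, posttest, flipped


pretest, posttest, flipped = simulate()
rest = [i for i in range(len(pretest)) if i not in flipped]
r_flipped = pearson([pretest[i] for i in flipped], [posttest[i] for i in flipped])
r_rest = pearson([pretest[i] for i in rest], [posttest[i] for i in rest])
```

To tie this back to between-school variance, the flipped cases would probably need to be concentrated in a few schools rather than scattered at random, as in Situation 1 where two groups drove the result.  That is a guess, not something tested here.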

PROC GLIMMIX non-convergence problem solutions

"Tips and Strategies for Mixed Modeling with SAS/STAT® Procedures," by Kathleen Kiernan, Jill Tao, and Phil Gibbs, SAS Institute Inc., Cary, NC, USA

ABSTRACT Inherently, mixed modeling with SAS/STAT® procedures, such as GLIMMIX, MIXED, and NLMIXED is computationally intensive. Therefore, considerable memory and CPU time can be required. The default algorithms in these procedures might fail to converge for some data sets and models. This paper provides recommendations for circumventing memory problems and reducing execution times for your mixed modeling analyses. This paper also shows how the new HPMIXED procedure can be beneficial for certain situations, as with large sparse mixed models. Lastly, the discussion focuses on the best way to interpret and address common notes, warnings, and error messages that can occur with the estimation of mixed models in SAS software.

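/* xxx = placeholders for your own data set and variable names */
proc glimmix data=xxx method=rspl itdetails;  /* itdetails prints the iteration history */
  class xxx;
  /* binary outcome with a logit link; s prints the fixed-effect solutions */
  model xxx = xxx / dist=binomial link=logit s ddfm=kr;
  /* random intercept for each level-2 unit */
  random int / subject=xxx;
run;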

How to explain HLM (response to an email)

Hello XXX, thanks for your email! I recently taught a primer on HLM, so I hope I can update my website with new material.

This time, I tried to say that HLM is simple, but that the equations make it difficult to learn (which is also the case when learning a musical instrument, like the piano).

In the attached graph, I tried to compare my intuitive model (mental representation model) and a formal equation. When it is in my mind, coefficients are either random (red font) or fixed (blue). The red coefficients are adjusted for precision/reliability, while the blue ones are not (like OLS coefficients). I tried to show how the equation expresses the same idea, but I feel it quickly gets annoying.

HLM slide sample