Kenward-Roger Degrees of Freedom (SAS GLIMMIX)

I will edit this essay later.

Imagine an HLM model where level 1 is students and level 2 is schools.  I can enter teacher-level variables, but people who are used to the HLM software by SSI will wonder how teacher-level variables can be used without making the model a three-level model (level 1 students, level 2 teachers, level 3 schools).  This is because in the old HLM tradition (people who learned HLM using the HLM software in the 1990s), equations are written and processed separately by level, and data must be prepared by level (a student data file and a school data file).

If the model has only a student level and a school level, some HLM users may wonder how teacher-level variables can enter the model (and in which file they should be prepared: the student-level file or the school-level file?).

People who use SAS or other software programs (maybe R?) will not wonder about this, because they think in terms of one whole equation and say, "of course we can enter variables from any level."

The software programs they use adjust degrees of freedom to take into account that teacher-level variables have far fewer distinct values than the outcome variable.

The Kenward-Roger (KR) option in PROC GLIMMIX adjusts degrees of freedom to account for the fact that group-level variables have far fewer possible values than the outcome variable.

The KR degrees-of-freedom option seems most appropriate for multilevel modeling applied in educational evaluation studies (where students are typically nested within schools).  The option kicks in only when at least one random coefficient is estimated (e.g., random intercepts).

proc glimmix data=i3d;
  class school;
  model y = <predictor names here>
        / solution ddfm=kr dist=normal link=identity;
  random int / sub=school;
run;

After playing with data in a simulation of sorts (I will do a better and well-documented simulation in the future), I learned the following:

For student-level (level-1) variables, the degrees of freedom under the KR option are close to the number of students (minus a small number, perhaps close to the number of predictors).  DF is larger when the variable contains more variance.  For binary variables, DF is largest when the proportion is .50 (DF gets smaller as the proportion approaches 0 or 1).

For school-level (level-2) variables, DF is close to the number of schools minus a small number, perhaps close to the number of predictors.  This may also be adjusted by the variance of the variable.

I created a fake teacher-level predictor by assigning two possible numeric values per school.  DF for this variable was close to the number of teachers in the data (two per school) minus a small number.  I think this is also adjusted by the variance of the variable (which I didn't test exactly).
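
A rough sketch of how such a fake teacher-level predictor can be constructed (assuming, for illustration only, that the student file i3d has a numeric student_id and schools numbered 1 to 50; both are my assumptions):

data teachers;
  /* two teachers per school, each carrying a single numeric value */
  call streaminit(20240);
  do school = 1 to 50;
    do teacher = 1 to 2;
      teacher_x = rand('normal');   /* one value per teacher */
      output;
    end;
  end;
run;

data students;
  set i3d;
  teacher = mod(student_id, 2) + 1;   /* split each school's students between the two teachers */
run;

proc sort data=students; by school teacher; run;
proc sort data=teachers; by school teacher; run;

data i3d_teacher;
  /* attach the teacher-level value to every student of that teacher */
  merge students(in=a) teachers;
  by school teacher;
  if a;
run;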

 

Cluster effect: What does HLM solve?

When I first learned HLM (hierarchical linear modeling) in a graduate program in 1994/95, I struggled with the following expression:

Errors are correlated.

Up to that point in Stat 101, correlation was about two columns of data (e.g., math test score and science test score).  Errors in the context of regression analysis are residuals from the model, and they are stored in one column.  I had conceptual difficulty understanding how values contained in one column (one variable) could be correlated.

Later, when I learned about geostatistics at a workshop, the model was supposed to correct a data dependence issue caused by geographical proximity.  This time, it was about how the temperature of town A, for example, is similar to that of an adjacent town B, and thus observations are dependent on one another.

I also learned about the econometric approach of dealing with the fact that observations over time are correlated (my test score tomorrow depends on my test score today).

After hearing again and again about statisticians' attempts to correct for data dependence, correlation of data, etc., I finally realized that data can be correlated within one column of data.  If you and someone else are from the same school, your outcome data are correlated.

Traditional statistical modeling techniques, such as the OLS regression model, rely on the assumption that observations are uncorrelated (observation 1 and observation 2 are not related to one another at all).  If this assumption is violated, we can no longer trust the results of statistical tests.  In fact, in the presence of a data dependence problem, results of statistical tests will be over-optimistic (too many statistically significant results).

I also learned that using HLM is one thing you can do to improve the situation, but the group dependence it handles may be just one of many dependence problems in your data.  Student test scores may also be related within friendship networks, and typically we do not have data on that membership.

In the same model, you can try to deal with both group dependence (via HLM) and time dependence (via an ARIMA-type model, for example).  This is not impossible, but handling the two at the same time is computationally challenging.  You will have to choose your battle and fix one thing at a time.
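
As a sketch of what handling both at once could look like (dataset and variable names are hypothetical; the data are assumed to be in long format, one row per student per time point):

proc mixed data=long;
  class school student wave;
  model y = wave treat / solution ddfm=kr;
  random intercept / subject=school;                     /* group dependence: students within schools  */
  repeated wave / subject=student(school) type=ar(1);    /* time dependence: AR(1) within each student */
run;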

What are vertically equated scores?

In the educational evaluation field, we often have access to vertically equated scales.  Scales here means scores, measures, points.  Vertically equated scores, in the context of education, are scores that are comparable across grades, which means you can pick a score from a 5th grader and a score from an 8th grader and consider them to be measuring the same construct, such as math ability, on the same scale.  I can state the concept, or elaborate on it, in a couple of different ways:

  • Vertically equated scales allow you to compare students of different grades on a common scale.
  • If a 4th grader got a score of 50 and a 9th grader also got a score of 50, they have the same ability level.
  • The 10-point difference between 5th graders' scores (e.g., 50 and 60) and the 10-point difference between 8th graders' scores (e.g., 60 and 70) are considered equal.

Instead of providing a detailed methodological note, I'd like to use a metaphor to explain why equating is possible across different grades.

<Under construction>

Mediator and Moderator

Mediator: X is related to Y, and when the mediator variable is included in the model, X's effect diminishes.

Moderator: this involves a statistical interaction, e.g., the effectiveness of the intervention depends on a student's demographic characteristics.
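
A minimal sketch of the mediator idea, with hypothetical variable names (x, m, y); moderation, the interaction case, is illustrated in the interaction section below:

proc reg data=abc;
  model y = x;      /* total effect of X on Y                          */
  model y = x m;    /* X's effect after the mediator M enters;         */
                    /* if X's coefficient shrinks, M mediates part of  */
                    /* the X -> Y relationship                         */
run;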

Calculating Odds Ratios from Logistic Regression Results

One can obtain odds ratios from the results of a logistic regression model.  The derived odds ratios are adjusted for the predictors included in the model and describe the relationship between group membership (e.g., treatment versus control group) and a binary outcome.  I wrote the following Excel document, which calculates an odds ratio based on logit coefficients for the intercept and a binary predictor of interest (e.g., the impact coefficient, a gender effect, etc.).

https://drive.google.com/file/d/0B7AoA5fyqX_sN0RUc0E5aFowb00/view?usp=sharing
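
The arithmetic behind the spreadsheet is small; here is the same calculation sketched as a SAS data step (the coefficient values below are made up for illustration):

data or_calc;
  b0 = -0.25;                                 /* intercept (logit for the control group)       */
  b1 =  0.40;                                 /* logit coefficient for the binary predictor    */
  odds_control = exp(b0);                     /* odds of the outcome for the control group     */
  odds_treat   = exp(b0 + b1);                /* odds of the outcome for the treatment group   */
  odds_ratio   = odds_treat / odds_control;   /* this equals exp(b1)                           */
  put odds_ratio=;
run;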

The appendix (p. 27) of the following document includes a description of odds ratios.

http://www.doe.k12.de.us/cms/lib09/DE01922744/Centricity/Domain/91/MA1275TAFINAL508.pdf

What does "the intercept being statistically significant" mean?

What does "the intercept being statistically significant" mean?

You can safely ignore that information.

A statistical test is a test of whether a coefficient is different from zero, so in the most general context, the intercept being different from zero doesn't have much meaning in and of itself.  We can force it to mean something, though.  If a) we standardize the outcome score such that the average is 0 (in SAS: proc standard mean=0; var outcome; run;) and b) the intercept is significant, it means that whatever the intercept represents (e.g., Hispanic students' average score) is significantly different from the overall average score (I'm using "significantly different" in a sloppy way to focus on the main point of my explanation).
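
Here is a sketch of that setup (dataset and dummy variable names are hypothetical; Hispanic students are treated as the omitted reference category, so the intercept is their mean):

proc standard data=abc mean=0 out=abc_std;   /* center the outcome at the overall average */
  var outcome;
run;

proc reg data=abc_std;
  /* with Hispanic as the omitted reference group, the intercept is the  */
  /* Hispanic mean, and its test asks whether that mean differs from the */
  /* overall average of 0                                                */
  model outcome = white black asian;
run;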

Having said that, I made this up just as an example.  Generally speaking, we should ignore the intercept's significance level.

In contrast, the intervention effect being significant means something important and often exciting.  If the intervention effect is .20, the statistical test examines whether .20 is different from 0.  Zero means no effect, so if the impact coefficient is statistically different from 0, that's good news (you would also want to examine the size of the coefficient, too).

How to understand and test statistical interaction effect

Using the regression model framework, an analyst can test whether the effect of X depends on another predictor.  If the outcome is student achievement and the most important independent variable is the intervention variable (1 if a student received treatment; 0 otherwise), one can further ask whether the program effect depends on students' demographic factors, such as gender, race and ethnicity, and important student statuses (e.g., special education, English learner).  If the main research question is whether the program has an effect on the student outcome, we often ask the next question: "Does the program effect depend on student demographic factors or student status variables?"  If, for example, an educational intervention program works only for boys but not for girls, we expect to see a statistical interaction between intervention and gender.  You can say this in different ways:

  1. The program impact varies by gender
  2. The effect of the program depends on gender
  3. The program and gender interact
  4. Gosh, this program is effective particularly for boys!

I will continue to use gender as an example throughout this text.
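
In the two-level setup used earlier, the test looks something like this (a sketch with hypothetical variable names: treat, female, and pretest; treat and female are 1/0 indicators, and the treat*female coefficient carries the interaction):

proc glimmix data=i3d;
  class school;
  model y = treat female treat*female pretest
        / solution ddfm=kr dist=normal link=identity;
  random int / sub=school;
run;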

Read the rest in the following document (MS-WORD):

How to test and understand statistical interaction effect

If between-group variance increases instead of decreasing in HLM

In a non-HLM model (e.g., OLS regression), the unexplained outcome variance will always decrease (or at least not increase) as you add predictors.

This variance is about how residuals are distributed.  If the predictors explain the outcome very well, the residual variance is small.  If the predictors do NOT explain the outcome well, the residual variance is large.  Sometimes the outcome does not have enough variance to begin with (e.g., trying to explain graduation when only 3 out of 100,000 subjects graduate from high school).

In HLM, variance specific to a level (level-1 variance, level-2 variance) can increase, which is counter-intuitive.  For example, when modeling student achievement as an outcome, you add the pretest score to the model and all of a sudden the between-school variance increases.  The document below gives an example of how this can happen.

https://www.statmodel.com/download/Level-2%20R-square%20decreasing%20when%20adding%20level1-covariate.pdf

I will state my conclusions first.

a) It is theoretically and empirically possible to see group-level variance increase when individual-level predictor(s) are added to the model.  By looking at the data (a case-to-case or group-average <e.g., school average> comparison of residuals from an unconditional model and a conditional model will help), you will need to understand why it happened (so you can explain it when your audience questions you).  Some situations will give you meaningful explanations (as in the housing price and location explanation in the PDF file referenced above).  Other situations will give boring, matter-of-fact explanations (it just happens as level-1 predictors change the value of the group mean estimates <i.e., random intercepts>).

b) Before reaching for an explanation, always suspect an error in the data.  Errors in the data can be related to the between-group variance increase (I will provide an example of this in Situation 2).

 

Situation 1

The model was a multi-level logistic regression where:

  • The outcome: 0 or 1 (passing the posttest or not)
  • Pretest score: an interval-level score
  • A 2-level model: students (= subjects) are nested within schools

What happened was:

Between-school variance increased as we entered the level-1 pretest score.  We had variance estimates from the ANOVA model (the unconditional model) and from the conditional model (the model that includes predictors), and the between-group variance was larger in the conditional model.

This is how we solved it:

  • a) I identified which predictor was causing the situation.  We quickly determined that it was the pretest by testing how the between-school variance changed as predictors were entered one at a time.
  • b) I examined residuals from the model that does not include the problem predictor (pretest) and from the model that includes it (see the sketch after this list).  When the two data columns were plotted against each other, two observations were off from the rest of the points, indicating that after the predictor entered the model, these two groups' errors (deviations from the mean) increased in size.
  • c) I examined how the outcome variable is related to the problem predictor.  Although the two are positively correlated overall, the two problem groups showed an unexpected association between the two variables: despite having low pretest scores, which would predict low outcome scores, they had relatively high outcome scores.
  • d) This means that the two groups had exceptionally high scores compared to the predictions, which increased the size of the errors associated with subjects in these two groups.
  • e) Two alternative solutions:
    • Remove the outliers and make a note of the situation for readers
    • Keep the outliers, check the consistency of results, and if the results do not change in substantive meaning (e.g., the impact coefficient stays more or less the same), make a note of the situation for readers
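
Here is a sketch of step b) for a model like the one above (dataset and variable names are hypothetical).  It is one way to do the group-level comparison: fit the model without and with the pretest, save the school-level random-intercept estimates, and plot one set against the other.

proc glimmix data=i3d;
  class school;
  model pass(event='1') = / solution dist=binary link=logit;
  random int / sub=school solution;
  ods output SolutionR=eblup_null;          /* school deviations, unconditional model */
run;

proc glimmix data=i3d;
  class school;
  model pass(event='1') = pretest / solution dist=binary link=logit;
  random int / sub=school solution;
  ods output SolutionR=eblup_cond;          /* school deviations, conditional model */
run;

data both;
  /* the two tables list schools in the same order, so a side-by-side merge works */
  merge eblup_null(keep=subject estimate rename=(estimate=u_null))
        eblup_cond(keep=estimate rename=(estimate=u_cond));
run;

proc sgplot data=both;
  scatter x=u_null y=u_cond;                /* schools far off the diagonal are the problem groups */
run;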

Situation 2

 

 

That is about it, but I may come back and edit.

I also want to create a fake dataset to replicate this problem.  I will create an outcome variable and a predictor that are positively correlated (e.g., math posttest and math pretest).  For ease of interpretation, I will convert them into z-scores (proc standard data=abc mean=0 std=1; var pretest posttest; run;).  I will take 10% of the data and flip the sign of the predictor, so the association for this subgroup will be the opposite of that for the other 90%.  I *think* this will replicate the situation, but I am not 100% sure.
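
Here is a rough sketch of that plan (everything is made up; the 50-school, 30-student structure is my own addition, and I apply the 10% flip to the first five schools):

data sim;
  call streaminit(2024);
  do school = 1 to 50;
    u0 = rand('normal', 0, 0.5);                              /* school effect        */
    do student = 1 to 30;
      pretest  = rand('normal');
      posttest = 0.6*pretest + u0 + rand('normal', 0, 0.8);   /* positive association */
      output;
    end;
  end;
run;

proc standard data=sim mean=0 std=1 out=sim_z;                /* convert both to z-scores */
  var pretest posttest;
run;

data sim_flip;
  set sim_z;
  if school <= 5 then pretest = -pretest;    /* flip the sign for 10% of the data */
run;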