I will edit this essay later.
Imagine I have an HLM model where level 1 is students and level 2 is schools. I can enter teacher-level variables in this model, but people who are used to the HLM software by SSI may wonder how that is possible without turning it into a 3-level model (level 1 students, level 2 teachers, level 3 schools). This is because in the old HLM tradition (people who learned HLM with the HLM software in the 1990s), equations are written and processed separately by level, and data must be prepared by level (a student data file and a school data file).
If there are only a student level and a school level, some HLM users may wonder how teacher-level variables can enter the model (or which file they should be prepared in: the student-level file or the school-level file?).
People who use SAS or other software programs (maybe R?) will not wonder about this, because they think in terms of one whole equation and say, "Of course we can enter variables of any level."
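To make the "one whole equation" idea concrete, here is a sketch of a combined two-level model with a teacher-level predictor. The notation is my own assumption (it is not taken from any software manual): X is a student-level variable, T is a teacher-level variable, W is a school-level variable, u is the school random intercept, and e is the student-level residual.

```latex
Y_{ij} = \gamma_{00} + \gamma_{10} X_{ij} + \gamma_{20} T_{ij} + \gamma_{01} W_{j} + u_{0j} + e_{ij},
\qquad u_{0j} \sim N(0, \tau_{00}), \quad e_{ij} \sim N(0, \sigma^{2})
```

Here i indexes students and j indexes schools. T is indexed by ij because it varies within a school, but it takes the same value for all students who share a teacher. In a one-file setup, T is just another column in the student-level data, repeated across the students of the same teacher.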
The software programs they use adjust the degrees of freedom to take into account that teacher-level variables have far fewer distinct values than the outcome variable.
In PROC GLIMMIX, the K-R (Kenward-Roger) option, DDFM=KR, makes this adjustment: it accounts for the fact that group-level variables have far fewer distinct values than the outcome variable.
The K-R degrees-of-freedom option seems most appropriate for multilevel modeling in educational evaluation studies (where students are typically nested within schools). This option kicks in only when at least one random coefficient is estimated (e.g., intercepts treated as random effects).
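proc glimmix data=i3d;
class school;
/* replace the placeholder with the predictor names (any level) */
model y = < here variable names >
/ solution ddfm=kr dist=normal link=identity;
/* random intercept for each school */
random int / subject=school;
run;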
After playing with simulation-type data (I will do a better, well-documented simulation in the future), I learned the following:
For student-level (level-1) variables, the degrees of freedom (DF) under the KR option are close to the number of students (minus a small number, maybe close to the number of predictors). DF is larger when the variance of the variable is greater. For binary variables, DF is largest when the proportion is .50, which is where the variance is largest (DF gets smaller as the proportion gets close to 0 or 1).
For school-level (level-2) variables, DF is close to the number of schools minus a small number, maybe close to the number of predictors. This may also be adjusted by the variance of the variable.
I created fake teacher-level predictors by creating two possible numeric values per school. DF for this variable was close to the number of teachers in the data (two per school) minus a small number. I think this is also adjusted by the variance of the variable (which I didn't test exactly).
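For anyone who wants to try something similar, here is a minimal sketch of this kind of simulation. It is not the code I actually ran. The data set name (sim), the variable names (x_student, t_teacher, w_school), the sample sizes (50 schools, 2 teachers per school, 20 students per teacher), the seed, and the coefficients are all made-up assumptions for illustration.

```sas
/* Hypothetical simulated data: students nested within teachers within schools. */
/* All names, sizes, and coefficients below are assumptions for illustration.   */
data sim;
  call streaminit(12345);                 /* fixed seed so the run is repeatable */
  do school = 1 to 50;
    u = rand('normal', 0, 1);             /* school random intercept */
    w_school = rand('normal', 0, 1);      /* school-level predictor: one value per school */
    do teacher = 1 to 2;
      t_teacher = rand('normal', 0, 1);   /* teacher-level predictor: one value per teacher */
      do student = 1 to 20;
        x_student = rand('normal', 0, 1); /* student-level predictor: one value per student */
        y = 0.3*x_student + 0.3*t_teacher + 0.3*w_school + u + rand('normal', 0, 2);
        output;                           /* one row per student */
      end;
    end;
  end;
run;

/* Two-level model (students in schools) with predictors from all three levels. */
proc glimmix data=sim;
  class school;
  model y = x_student t_teacher w_school
        / solution ddfm=kr dist=normal link=identity;
  random int / subject=school;
run;
```

The DF column in the "Solutions for Fixed Effects" table is where to compare the three predictors. Changing the number of teachers per school or the variance of a predictor and rerunning is a simple way to see how the KR degrees of freedom respond.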