HLM is a type of mixed model developed and refined by Bryk and Raudenbush, as well as their colleagues. When data is hierarchically structured like education data (e.g., students sampled from the set of schools), independence of errors cannot be assumed; so classical statistical models, such as OLS, would not be good. Standard errors would be underestimated, making it overly easy to attain statistically significant results.
Bryk and Raudenbush have their software out through Scientific Software (R). I use SAS proc mixed because I find it easier to use than HLM software. I have not experienced a bug with SAS PROC MIXED.
I demonstrate how HLM works using Korean and American data of 8th graders. TIMSS or Third International Math and Science Study collected mathematics scores of these students, as well as lots of other things. I also show SAS syntax, so it is easy to replicate what I did hereat least for SAS users.
Download SAS DATA . Here I assume that you put it in your "my document" folder.
1. Run this program just once to create library that contains tables for variables.
This is an interceptonly model. The purpose is to get a grandmean (=intercept) and the size of variance at several levels. In this case, the variances of math score are estimated at individuallevel and schoollevel.
Level1: Math Score= b0+error_ij Level2: b00=g00+error00_j

/*Anova model or interceptonly model*/ proc mixed data=both2 covtest noclprint; by IDcntry; model bsmpv01= / solution ddfm=bw ; random intercept/ sub= IDSCHOOL ; run; 
See SAS outputs for anova models for two countries
This is where you find variance estimates for Korea. Note how highlighted parts go to the table below.
Covariance Parameter Estimates Standard Z Cov Parm Subject Estimate Error Value Pr Z Intercept IDSCHOOL 0.06161 0.01268 4.86 <.0001 Residual 0.9077 0.02448 37.08 <.0001
This is for fixed effects.
Solution for Fixed Effects
Standard
Effect Estimate Error DF t Value Pr > t
Intercept 0.7409 0.02692 149 27.52 <.0001
KOREA HLM Anova models  USA HLMAnova models  
KOREA  US  
Coefficient  Error  P  Parameter Variance  Error  P  Coefficient  Error  P  Parameter Variance  Error  P  
Intercept  0.7409  (0.03)  < .0001  0.0616  (0.02)  <.0001  0.3061  (0.04)  < .0001  0.2141  (0.02)  <.0001  
Variance  
Level 1  0.9077  (0.01)  <.0001  0.4769  (0.01)  <.0001  
ICC  7%  45%  
Sample Size  2901  6944  
N of Schools  150  183  
2 RES LOG L  8081.8  15088.5  
AIC  8085.8  15092.5  
AICC  8085.8  15092.5  
BIC  8091.8  15098.9 
Comparing intercepts, we learn that Koreans do a lot better than Americans in this test. Comparison variances, we learn that in Korea students differ in scores mostly within schools, while schools are very similar to each other. American schools are very different in average scores.
We could do this sort of graph for presentation.
OLS Regression  HLM (Random Coefficient model) 
OLS coefficients are just one set of them, while for HLM coefficients are esimated for each group unit (i.e., school)  
Math Score = b0 + b1* college + b2* boy +
error.

Level1: Math Score= b0+b1*college+b2*boy+error_ij Level2: b00=g00+error00_j Level2:b10=g10+error10_j Level2:b20=g20+error20_j

PROC REG and PROC MIXED are very similar. The crucial line in PROC MIXED is RANDOM statement. You put variable names (including intercept) whose coefficients you want to estimate for each group unit.  
/*OLS regression*/ /*by IDcntry> to run a regression separately by nation*/ proc reg data=both2; by IDcntry; model BSMPV01= college2 boy2; run; 
/*college and boy as predictors*/ /*Coefficients of these two predictors are set RANDOM*/ proc mixed data=both2 covtest noclprint; by IDcntry; class college boy; model bsmpv01= college boy/ solution ddfm=bw ; random intercept college boy/ sub= IDSCHOOL s; ods output solutionR=sol; run; highlighted in blue are optional part of the programming.

OLS Results  
Korea  US  
Coefficient  Error  pvalue  Coefficient  Error  pvalue  
Intercept  0.55  (0.03)  ***  0.435  (0.02)  ***  
College Educated Parent  0.53  (0.04)  ***  0.357  (0.02)  ***  
BOY  0.13  (0.04)  ***  0.033  (0.02)  
Sample Size  2900  6943  
R2  0.05  0.039  
Math Score is Zscore. Korean and American samples are merged first and then standardized.  
*** p<0.001; ** p<0.01; * p<0.05 
HLM Results  
Korea  US  
Parameters  Parameter Variance  Parameters  Parameter Variance  
Error  Error  Error  Error  
Intercept  0.56  (0.03)  ***  0.02  (0.01)  *  Intercept  0.38  (0.04)  ***  0.20  (0.02)  ***  
College Educated Parent  0.49  (0.05)  ***  0.01  (0.01)  College Educated Parent  0.16  (0.02)  ***  0.02  (0.01)  **  
BOY  0.14  (0.04)  **  FIXED  BOY  0.05  (0.02)  **  FIXED  
Residual  0.88  (0.02)  ***  Residual  0.47  (0.01)  ***  
ICC  3%  ICC  30%  
*** p<0.001; ** p<0.01; * p<0.05 
Generally speaking, standard errors are bigger for HLM results. Also US results look different probably reflecting that fact that the ICC for US data is large. US data suffers more from correlated errors within schools, which probably is why when the correlated error is fixed in HLM, the results look different.
It's good to look at the random effects. You can have a feel for the data in this way.Because I requested that results for random effects be saved in a data set by saying "ods output solutionR=sol;" I get a data set "sol" that has coefficients for all schools. I did proc print to print out the coefficients for parents' education variables. Notice when we set coefficients RANDOM, all schools (variable subject indicates ID) get coeffcients. For example, at school 1, the effect of parent being a college educated is .03051 (baseline = parent graduated middle school).
/*Eyeball the results*/ proc print data=sol; where effect="college" and estimate > 9 and DF > 9; run;
The SAS System 21:56 Monday, December 8, 2003 63 StdErr Obs IDCNTRY Effect college boy Subject Estimate Pred DF tValue Probt 2 410 college 2college 1 0.03051 0.09281 2898 0.33 0.7423 3 410 college high school 1 0.01327 0.09111 2898 0.15 0.8842 7 410 college 2college 2 0.00905 0.09191 2898 0.10 0.9215 8 410 college high school 2 0.00407 0.09187 2898 0.04 0.9646 12 410 college 2college 3 0.01475 0.09165 2898 0.16 0.8722 13 410 college high school 3 0.00559 0.09247 2898 0.06 0.9518 17 410 college 2college 4 0.01486 0.09473 2898 0.16 0.8753 18 410 college high school 4 0.01260 0.09036 2898 0.14 0.8891 22 410 college 2college 5 0.01241 0.09039 2898 0.14 0.8908 23 410 college high school 5 0.01070 0.09429 2898 0.11 0.9097 27 410 college 2college 6 0.02322 0.09249 2898 0.25 0.8018 28 410 college high school 6 0.02978 0.09134 2898 0.33 0.7444 32 410 college 2college 7 0.001952 0.09430 2898 0.02 0.9835 33 410 college high school 7 0.02067 0.09074 2898 0.23 0.8198 37 410 college 2college 8 0.08799 0.09122 2898 0.96 0.3349 38 410 college high school 8 0.02738 0.09389 2898 0.29 0.7706 REST OMMITTED 
Below I am plotting coefficients, the first one for Korea and the second one for the US.
proc univariate data=sol plot; where effect="college" and estimate > 9 and DF > 9; by IDcntry; var estimate;run;
Histogram # Boxplot 0.085+* 1 0 . .* 2 0 .* 2  .***** 9  .********* 18  .************ 24  .********************* 42 ++ .******************************* 62 *+* .*********************** 45   .********************* 42 ++ .*********** 21  .********* 17  .**** 7  .* 2  .* 2 0 .* 1 0 0.085+** 3 0 ++++++ * may represent up to 2 counts Histogram # Boxplot 0.19+* 1 0 .* 1 0 . .** 4  .***** 10  .*** 5  .*************** 30  .*************** 29  0.03+*********************** 45 ++ .*************************** 53  +  .******************************** 63 ** .*********************** 45 ++ .***************** 34  .************ 24  .***** 10  .**** 7  0.13+*** 5 0 ++++++ * may represent up to 2 counts

I think one insight to derive here is that US has a slightly larger dispersion of college effect. Interesting to think of what this may mean.
Under development