Kaz's SAS, HLM, and Rasch Model
What is "error"?
Introduction
 
The concept of error was confusing to me in the beginning because I couldn't tell where the story of error begins and ends.  There are two parts to a statistical model: the structural part (which is NOT about errors) and the error part (which is about errors).  Statistics teachers are usually talking about one or the other, but I don't think they realize that students get confused precisely because the distinction is never made clear.  Stat teachers should say, "now I am talking about the structural part" or "now I am talking about errors."  They should also admit that 99% of their talk is about errors.  I think this is true.  If you program an OLS regression model in a matrix language, beta=(inv(t(x)*x))*(t(x)*y); is the only line you need for the structural part of the model.  The rest is all about how to deal with errors.  The other source of confusion is that "error" can mean a lot of different things.  I think we should use the term "residuals" for what we often call errors.
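To see this, here is a minimal sketch in SAS/IML (assuming a hypothetical data set named xy with a predictor column x and an outcome column y).  The one beta= line is the entire structural part; the residuals are everything else statisticians worry about:

proc iml;
use xy;                               /* hypothetical data set with columns x and y */
read all var {x} into x0;
read all var {y} into y;
close xy;
x = j(nrow(x0), 1, 1) || x0;          /* add a column of 1s for the intercept */
beta = (inv(t(x)*x))*(t(x)*y);        /* the structural part: one line */
residual = y - x*beta;                /* the rest of statistics is about this */
print beta residual;
quit;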
 
What is "error/residual" in the context of obtaining an average score?
If John's math score is 70, Mike's is 80, and Luke's is 90, the average score is 80.  The residuals for John, Mike, and Luke, respectively, are -10, 0, and +10.  Residuals are about how far each score is from the average score (and in which direction).
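As a quick sketch, a SAS data step can compute these residuals directly (the 80 is hard-coded here just for illustration):

data scores;
input Name $ score;
residual = score - 80;   /* distance (and direction) from the average, 80 */
cards;
John   70
Mike   80
Luke   90
;
run;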
 
What is "residual" in the context of an OLS regression model?
An OLS regression model is usually the first statistical model you learn in STAT 101.  The equation looks like this:
Y= intercept + beta*X + residual.
 
Before thinking about how this equation works, we should look at a model that is a lot simpler:
Y= intercept + residual.
This model has no predictor.  And this model is the same as the procedure that obtains an average.  If the data set contains math scores of John, Mike, and Luke, it will look like this:
Y= [70, 80, 90]
intercept= 80
residual=[-10, 0, +10]
 
OLS is a technique to obtain values (the intercept, or an average score in this case) that minimize the size of the residuals.  Imagine I completely ignored the algorithm and guessed the average.  I say the average is 70 just because I feel like it!  Then observe the size of the residuals (they get bigger).
 
Y= [70, 80, 90]
intercept= 70  
residual=[0, +10, +20]
 
Compare:
residual=[-10, 0, +10]
VS
residual=[0, +10, +20]
 
Can you tell the residuals got bigger because I guessed the average/intercept without relying on a correct algorithm?  OLS provides an algorithm that minimizes the size of the residuals.  You can google for the exact algorithm, but it seriously is one short line.  It is obviously always more accurate than my random guessing.
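To be precise, the "size" that OLS minimizes is the sum of squared residuals.  A quick sketch comparing the two residual vectors above:

data sse_compare;
sse_ols   = (-10)**2 + 0**2 + 10**2;   /* intercept = 80 (the OLS answer): 200 */
sse_guess = 0**2 + 10**2 + 20**2;      /* intercept = 70 (my guess): 500 */
put sse_ols= sse_guess=;
run;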

TRY THIS IN SAS AND STUDY THE TABLE "INFLUENCE DIAGNOSTICS":
data this;
input Name $ score;
cards;
John   70
Mike   80
Luke   90
;
run;

proc mixed data=this;
model score= /s influence;   /* s prints the intercept (=the average); influence requests influence diagnostics */
run;
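The "Solution for Fixed Effects" table should show an intercept of 80 (the average) with a standard error of about 5.77 (we will derive that number below), and the influence diagnostics show how much each student's score pulls on that average.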

What is the correlated error problem?
I continue with the example from above.  Our OLS regression model doesn't have a predictor, so the intercept returns the average value.  Let's now think about the standard error of the intercept.  The standard error of the intercept tells you how precisely the average is estimated (the bigger the standard error, the worse the precision).  There is an algorithm to obtain a standard error, and it is based on only two things: the variance of the residuals and the number of observations.
     Let's confirm what these two things are.  What is the variance of the residuals?  In the example from above, it is the variance of the scores 70, 80, and 90.  For now you can use Excel to get this value by typing =VAR(70,80,90).  The number of observations is 3.  Using these two pieces of information, you can derive the standard error.
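Concretely, the standard error of an average is the square root of (variance / n).  A minimal sketch in SAS:

data se_example;
s2 = var(70, 80, 90);   /* sample variance of the three scores = 100 */
n  = 3;                 /* number of observations */
se = sqrt(s2 / n);      /* standard error of the average, about 5.77 */
put s2= n= se=;
run;

This should match the intercept's standard error in the PROC MIXED output above.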
     You can google the exact derivation, but here is the point: the obtained standard error is good only when observations are independent and normally distributed.  If John and Luke are brothers, their scores are not independent.  If John copied Luke's answers during the test, their scores are not independent.  If John and Luke learn from the same teacher and Mike goes to a different school, their scores are not independent.  So statistical decisions about whether something is significant or not (a test that relies on standard errors) are only correct when observations are independent.  Observations are obviously dependent in education research and other research settings, which is why we have a lot of techniques to solve this "correlated error problem."
     When I heard the expression "errors are correlated," I had the hardest time understanding it.  For me, "correlation" required at least two variables.  I was used to the notion that, for example, height and weight are correlated.  But I didn't immediately understand when I was told "errors can be correlated."  I was used to seeing residuals/errors as something shown in one column, i.e., sort of like one variable.  What does it mean to have errors that are correlated?  It turns out that observations in one column of data can indeed be correlated.  If John and Luke went to the same schools and learned math from the same teachers, their scores are alike (=correlated).
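Here is a minimal sketch with simulated (entirely hypothetical) data: students in the same school share a "school effect," so the scores, all sitting in one column, are correlated within schools.  Pairing up schoolmates and correlating their scores makes it visible:

data schools;
call streaminit(1234);
do school = 1 to 50;
   school_effect = rand('normal', 0, 5);   /* shared by schoolmates */
   do student = 1 to 2;
      score = 80 + school_effect + rand('normal', 0, 5);
      output;
   end;
end;
run;

/* put each school's two scores side by side and correlate them */
proc transpose data=schools out=wide prefix=score;
by school;
var score;
run;

proc corr data=wide;
var score1 score2;   /* with these settings the correlation should come out near 0.5 */
run;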
 
 

How does HLM solve the correlated error problem?
 
Under construction
 
How does Econometrics solve this problem?
Under construction


Copyright 2005 KU
For information inquiry (AT) estat.us