/*******************************************************************\
|SAS PROGRAM to teach OLS regression July 7th 2002 |
|by Kazuaki Uekawa, Ph.D. |
|Independent SAS consultant |
|(ueka@src.uchicago.edu) |
|http://www.src.uchicago.edu/users/ueka |
|http://www.geocities.com/sastip |
********************************************************************/
/*We have a model:
Y = beta0 + beta1*X + error
We estimate the betas from data.
First, we compute the betas.
Second, we compute the standard errors of the betas so we can evaluate them -- that is, test whether they are statistically significant.
Once we understand these two steps, we can go on to worry about R-squares and model comparison.
*/
proc iml;
/*how can we derive coefficients from Y and X?*/
y={2, 4, 6, 8, 10};
x ={1 1.1 ,
1 2.2 ,
1 3.1 ,
1 4.3 ,
1 5.1 };
/*here is the famous ordinary least squares formula (the normal equations): beta = (X'X)^(-1) X'Y*/
beta=(inv(t(x)*x))*(t(x)*y);
/*look at beta*/
print beta;
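/*The IML step above can be checked outside SAS. The following is a numpy translation of the same normal-equations computation on the same toy data (a sketch for verification, not part of the original SAS program):*/

```python
import numpy as np

# Same toy data as the IML step above.
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
X = np.array([[1, 1.1],
              [1, 2.2],
              [1, 3.1],
              [1, 4.3],
              [1, 5.1]])

# Normal equations: beta = (X'X)^(-1) X'y
beta = np.linalg.inv(X.T @ X) @ (X.T @ y)
print(beta)  # [intercept, slope]
```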
***************************************************************************************;
/*Now the rest is all about evaluation of the betas derived above*/
/*Are the betas derived above statistically significant?*/
/*To answer that, we need to learn something about the errors, which will eventually give us the standard errors of the betas*/
/*(1) Get predicted values for Y*/
predicted_y=x*beta;
/*If we subtract the predicted Y values from the real Y values, we get the errors (residuals)*/
errors=y-predicted_y;
print errors;
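/*The fitted values and residuals above can be reproduced in numpy as well. A useful sanity check: with an intercept in the model, OLS residuals always sum to zero and are orthogonal to the columns of X (numpy sketch, not part of the original SAS program):*/

```python
import numpy as np

y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
X = np.array([[1, 1.1], [1, 2.2], [1, 3.1], [1, 4.3], [1, 5.1]])
beta = np.linalg.inv(X.T @ X) @ (X.T @ y)

predicted_y = X @ beta      # fitted values
errors = y - predicted_y    # residuals
print(errors)
```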
/*(2) Get the mean squared error*/
/*(2a) Get the sum of squared errors*/
sum_squared_errors=t(errors)*errors;
print sum_squared_errors;
/*Did you understand what the product of a transposed matrix and the original matrix is? If not, try this toy example*/
Q={1,
2,
3};
R=t(Q)*Q;
print Q R ;
/*get it?*/
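/*The same toy example in numpy: for a column vector, the transpose times the original collapses to a single number, the sum of squares of its entries (numpy sketch, not part of the original SAS program):*/

```python
import numpy as np

Q = np.array([1, 2, 3])
R = Q @ Q  # 1*1 + 2*2 + 3*3 -- the sum of squares
print(R)   # 14
```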
/*(2b) Get the degrees of freedom*/
n_of_observation=nrow(Y);
n_of_parameters=ncol(X); /*number of estimated betas, including the intercept -- the column of all 1s in X is there to estimate it*/
degree_of_freedom=n_of_observation-n_of_parameters;
/*(2c) Now divide: (sum of squared errors)/(degrees of freedom)*/
mean_squared_error=sum_squared_errors/degree_of_freedom;
print mean_squared_error;
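/*The same step in numpy. Note that the residual degrees of freedom count every estimated beta, intercept included, so with 5 observations and 2 betas we have 5 - 2 = 3 (numpy sketch, not part of the original SAS program):*/

```python
import numpy as np

y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
X = np.array([[1, 1.1], [1, 2.2], [1, 3.1], [1, 4.3], [1, 5.1]])
beta = np.linalg.inv(X.T @ X) @ (X.T @ y)
errors = y - X @ beta

sum_squared_errors = errors @ errors
degrees_of_freedom = len(y) - X.shape[1]  # n minus number of estimated betas
mean_squared_error = sum_squared_errors / degrees_of_freedom
print(mean_squared_error)
```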
/*(3) now get the covariance of coefficients*/
/*The covariance matrix is a key step toward the standard errors of the coefficients; without those standard errors we cannot test whether the betas are statistically significant*/
covariance_of_betas=inv(t(x)*x)#mean_squared_error;
print covariance_of_betas;
/*From this covariance matrix we only need the diagonal cells, which hold the variances of the betas*/
/*In the following, "vecdiag" picks the numbers from the diagonal cells; we then take the square root to turn variances into standard errors*/
standard_errors=sqrt(vecdiag(covariance_of_betas));
print beta standard_errors;
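/*The whole pipeline in numpy, ending with the standard errors and the t-ratios beta/SE that would be compared against a t distribution with n - 2 degrees of freedom (numpy sketch under the same toy data; the t-ratio step is the usual next move, not shown in the SAS code above):*/

```python
import numpy as np

y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
X = np.array([[1, 1.1], [1, 2.2], [1, 3.1], [1, 4.3], [1, 5.1]])
beta = np.linalg.inv(X.T @ X) @ (X.T @ y)
errors = y - X @ beta
mse = (errors @ errors) / (len(y) - X.shape[1])

# Scalar mse times the (X'X) inverse gives the covariance matrix of the betas.
covariance_of_betas = np.linalg.inv(X.T @ X) * mse
standard_errors = np.sqrt(np.diag(covariance_of_betas))
t_ratios = beta / standard_errors
print(beta, standard_errors, t_ratios)
```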
***************************************************************************************;