/*******************************************************************\
|SAS PROGRAM to teach OLS regression                July 7th 2002   |
|by Kazuaki Uekawa, Ph.D.                                           |
|Independent SAS consultant                                         |
|(ueka@src.uchicago.edu)                                            |
|http://www.src.uchicago.edu/users/ueka                             |
|http://www.geocities.com/sastip                                    |
\*******************************************************************/

/*We have a model: Y = beta0 + beta1*X + error.
  We derive the betas from data.
  First, we obtain the betas.
  Second, we obtain the standard errors of the betas in order to
  evaluate them -- whether they are statistically significant or not.
  After understanding these two steps, we can go on to worry about
  R-squares and comparisons of models.*/

proc iml;

/*How can we derive coefficients from Y and X?*/
y={2, 4, 6, 8, 10};

x={1 1.1,
   1 2.2,
   1 3.1,
   1 4.3,
   1 5.1};

/*Here is a powerful formula -- the ordinary least squares estimator,
  a major achievement of 20th-century statistics*/
beta=inv(t(x)*x)*(t(x)*y);

/*look at beta*/
print beta;

***************************************************************************************;
/*Now the rest is all about evaluating the betas derived above*/
/*Are the betas derived above statistically significant?*/
/*We now need to know something about the errors, which will eventually
  give us the standard errors of the betas*/

/*(1) Get predicted values for Y*/
predicted_y=x*beta;

/*If we subtract the predicted Y values from the real Y values,
  we get the errors (residuals)*/
errors=y-predicted_y;
print errors;

/*(2) Get the mean squared error*/
/*(2a) Get the sum of squared errors*/
sum_squared_errors=t(errors)*errors;
print sum_squared_errors;

/*Did you understand what the product of a transposed matrix and the
  original matrix is? If not, try this toy example*/
Q={1, 2, 3};
R=t(Q)*Q;
print Q R;
/*get it?*/

/*(2b) Get the degrees of freedom*/
n_of_observation=nrow(y);
n_of_parameters=ncol(x);
/*ncol(x) counts every estimated coefficient, including the intercept,
  whose column of all 1s is part of x. The residual degrees of freedom
  are n minus the number of estimated parameters (here 5 - 2 = 3)*/
degree_of_freedom=n_of_observation-n_of_parameters;

/*(2c) Now divide: (sum of squared errors)/(degrees of freedom)*/
mean_squared_error=sum_squared_errors/degree_of_freedom;
print mean_squared_error;

/*(3) Now get the covariance matrix of the coefficients*/
/*The covariance matrix is an important step toward the standard errors
  of the coefficients, without which we cannot test whether the betas
  are statistically significant*/
covariance_of_betas=inv(t(x)*x)#mean_squared_error;
/*"#" multiplies every element of the matrix by the scalar MSE*/
print covariance_of_betas;

/*From this covariance matrix, we only need the information located in
  the diagonal cells*/
/*In the following, "vecdiag" picks the numbers from the diagonal cells,
  which are the variances of the betas. We then take their square roots
  to get the standard errors of the betas*/
standard_errors=sqrt(vecdiag(covariance_of_betas));
print beta standard_errors;
***************************************************************************************;
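The program stops at the standard errors, but the significance test it promises is one line away. A minimal sketch of that final step, assuming the `beta`, `standard_errors`, and `degree_of_freedom` values computed above:

```sas
/*Extension (a sketch, not part of the original program): with the betas
  and their standard errors in hand, the t-statistic is
  t = beta / standard error; "/" divides elementwise in IML*/
t_values=beta/standard_errors;
/*two-sided p-values from the t-distribution, using the residual
  degrees of freedom computed above (Base SAS functions such as
  PROBT apply elementwise to IML matrices)*/
p_values=2*(1-probt(abs(t_values),degree_of_freedom));
print beta standard_errors t_values p_values;
```

By convention, a beta is called statistically significant when its p-value falls below 0.05.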