The meaning of intercept and centering of predictor variables

The result table of a regression model includes, among other things, a column of coefficients. The intercept value, shown in the top cell of the coefficient column, may look mysterious and even arbitrary. The intercept is the predicted value for a subject whose values on all predictors in the model are 0. If the regression model includes gender as the only predictor (coded as 1 if male, else 0), the intercept indicates the average outcome value for female subjects. If the model includes gender and body weight, the intercept indicates the predicted outcome value for females who have a body weight of zero. Nobody’s weight is 0; thus, the intercept in this case has no sensible interpretation. An analyst who is not particularly interested in giving the intercept a substantive meaning can ignore it and safely interpret the rest of the coefficients.

Personally, I want all values in my result tables to have a substantive, interpretable meaning. As mentioned, when the model includes only dummy variables (coded as 1 or 0), the intercept already has a meaning: it is the predicted value for the group coded 0 on every dummy.

If the model includes continuous variables, however, I recommend centering those variables around their average values. If the variable in question is a test score that ranges from 0 to 100 and the average score is 65, I would subtract 65 from each subject’s test score (if a test score is 60, the centered score is 60 - 65 = -5). In SAS, you can do:

/* Center testscore at its mean; abc2 holds the centered values */
proc standard data=abc out=abc2 mean=0;
  var testscore;
run;
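One thing to keep in mind is that PROC STANDARD replaces the original values of the listed variable in the output dataset. If you would rather keep the raw score next to the centered one, a minimal sketch along these lines should work. It assumes a dataset named abc with a variable named testscore; testscore_c is just a name I made up for the centered copy.

```sas
/* Sketch only: abc and testscore are assumed names,                 */
/* testscore_c is a hypothetical name for the centered copy.         */
proc sql;
  create table abc2 as
  select *,
         testscore - mean(testscore) as testscore_c  /* raw minus overall mean */
  from abc;
quit;
```

PROC SQL remerges the overall mean back onto every row here, so each subject gets his/her own score minus the same average. The log will show a note about remerging summary statistics, which is expected.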

 

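With centering, the intercept obtains a meaning: it indicates the predicted value for a subject whose test score equals the average score (and whose other predictors are 0). Centering does not affect the coefficients of the predictors, including that of the centered variable itself, or the fit of the model; only the intercept changes. (This holds as long as the model has no interaction or polynomial terms involving the centered variable.)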
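If you want to see this for yourself, the following sketch fits the same model before and after centering. The toy data, the outcome name y, and the dummy name male are all made up for illustration; only the PROC STANDARD step comes from the example above.

```sas
/* Hypothetical toy data: y = outcome, male = dummy (1 if male, else 0), */
/* testscore = score on a 0-100 scale (its average here is 65).          */
data abc;
  input y male testscore;
  datalines;
52 0 60
61 1 70
58 0 65
70 1 80
49 0 55
66 1 60
;
run;

/* Center testscore; the centered values overwrite testscore in abc2 */
proc standard data=abc out=abc2 mean=0;
  var testscore;
run;

/* Model with the raw test score */
proc reg data=abc;
  model y = male testscore;
run;
quit;

/* Same model with the centered test score */
proc reg data=abc2;
  model y = male testscore;
run;
quit;
```

The two outputs should show identical coefficients for male and testscore. The intercepts should differ by the testscore coefficient times 65, because the centered intercept is the prediction at the average score rather than at a score of 0.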

You can also center a predictor’s values and fix its standard deviation at 1. In SAS, you can do:

/* Standardize testscore to mean 0 and standard deviation 1 (z-score) */
proc standard data=abc out=abc2 mean=0 std=1;
  var testscore;
run;

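The resulting value is called a “z-score.” The z-score may be better known than the concept of centering. Z-scoring is one specific type of centering: the mean is zero (as all values are centered around the average value) and the standard deviation is fixed at 1. Unlike plain centering, z-scoring also changes the coefficient of the rescaled variable, which now expresses the change in the outcome per one-standard-deviation increase in the predictor.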
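A quick way to confirm that the standardization did what you expect is to summarize the new variable. This sketch assumes abc2 is the output of a PROC STANDARD step with mean=0 and std=1, and that the standardized variable is named testscore.

```sas
/* Sanity check (assumed names: abc2, testscore): after z-scoring,      */
/* the mean should be 0, apart from tiny rounding error, and the        */
/* standard deviation should be 1.                                      */
proc means data=abc2 n mean std min max;
  var testscore;
run;
```

The minimum and maximum are also handy here, since they tell you how many standard deviations the most extreme subjects sit from the average.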

I typically apply “z-scoring” to a pretest variable whose scores are large numbers (e.g., 953 or 405). Without this adjustment, the derived coefficients may be too small to read in the table (e.g., 0.00000014).

 

 

 
