Logit coefficients from logistic regression model

How do we interpret logit coefficients estimated by a logistic regression model? The following is a hypothetical result:

log(p/(1-p)) = 0.3 + 0.2*Male + 0.4*TREATMENT

One use of this result is to see whether the Male effect and the TREATMENT effect are statistically significant. We also want to know what the coefficients, such as 0.2 and 0.4, mean. Because the left side of the equation is the log of the odds, p/(1-p), where p is the probability of the outcome, it is not immediately clear what 0.2 or 0.4 means.
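A standard way to read these coefficients is to exponentiate them. Assuming Male and TREATMENT are 0/1 indicators (the example implies this but does not say so), exponentiating both sides shows that each exponentiated coefficient is an odds ratio, holding the other variable constant:

```latex
\log\frac{p}{1-p} = 0.3 + 0.2\,\mathrm{Male} + 0.4\,\mathrm{TREATMENT}
\quad\Longrightarrow\quad
\frac{p}{1-p} = e^{0.3}\, e^{0.2\,\mathrm{Male}}\, e^{0.4\,\mathrm{TREATMENT}}

% Odds ratio for TREATMENT (Male held fixed):
\frac{\mathrm{odds}(\mathrm{TREATMENT}=1)}{\mathrm{odds}(\mathrm{TREATMENT}=0)} = e^{0.4} \approx 1.49

% Odds ratio for Male (TREATMENT held fixed):
\frac{\mathrm{odds}(\mathrm{Male}=1)}{\mathrm{odds}(\mathrm{Male}=0)} = e^{0.2} \approx 1.22

% Back to probabilities (Male = 0): comparison vs. treatment
p = \frac{1}{1+e^{-0.3}} \approx 0.574, \qquad
p = \frac{1}{1+e^{-(0.3+0.4)}} \approx 0.668
```

Read this way, the treatment multiplies the odds of the outcome by about 1.49 (a 49% increase in the odds, not in the probability), and being male multiplies the odds by about 1.22. The last line shows why the probability scale is harder to summarize: the same 0.4 would move the probability by a different amount from a different starting point.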

<Under construction>

Effect size by Cohen

Effect size (Cohen's d) of:

  • .2 Small effect
  • .5 Medium effect
  • .8 Large effect
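The .2/.5/.8 values are on the scale of Cohen's d, a standardized difference between two group means. One common form, using the pooled standard deviation (other variants exist), is:

```latex
d = \frac{\bar{x}_1 - \bar{x}_2}{s_p},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
```

For instance, with hypothetical group means that differ by 5 points and a pooled standard deviation of 25, d = 5/25 = 0.2, a "small" effect by these labels.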

Reference:

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Page 5 of  http://www.wmich.edu/evalphd/wp-content/uploads/2010/05/Effect_Size_Substantive_Interpretation_Guidelines.pdf

Quotation (from the section "Cohen’s benchmarks"):

"Cohen (1988) attempted to address the issue of interpreting effect size estimates relative to other effect sizes. He suggested some general definitions for small, medium, and large effect sizes in the social sciences. However, Cohen chose these quantities to reflect the typical effect sizes encountered in the behavioral sciences as a whole -- he warned against using his labels to interpret relationship magnitudes within particular social science disciplines or topic areas. His general labels, however, illustrate how to go about interpreting relative effects. Cohen labeled an effect size small if d = .20 or r = .10. He wrote, 'Many effects sought in personality, social, and clinical-psychological research are likely to be small . . . because of the attenuation in validity of the measures employed and the subtlety of the issue frequently involved' (p. 13). Large effects, according to Cohen, are frequently 'at issue in such fields as sociology, economics, and experimental and physiological psychology, fields characterized by the study of potent variables or the presence of good experimental control or both' (p. 13). Cohen suggested large magnitudes of effect were d = .80 or r = .50. Medium-sized effects were placed between these two extremes, that is d = .50 or r = .30. A caution against using Cohen’s benchmarks as generic descriptors of the magnitude of effect size is implied above. Because some areas, like education, are likely to have smaller effect sizes than others, using Cohen’s labels may be misleading."

 

Odds ratio (Explanation)

An odds ratio can summarize, in a single number, the result of an intervention that would otherwise take several percentages to describe. For example, imagine that one group of high school students received a mentoring intervention and another group did not. The on-time high school graduation results were:

  • Group T: 85% graduated; 15% did not graduate
  • Group C: 75% graduated; 25% did not graduate

This is a lot of information to communicate. I could shorten it like this, but it still takes two numbers and several words:

  • Group T: 85% graduated
  • Group C: 75% graduated

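An odds ratio can express this with one value:

odds ratio = (P1/(1-P1)) / (P2/(1-P2))

where P1 is the proportion of the treatment group (Group T) that graduated and P2 is the proportion of the comparison group (Group C) that graduated. Each P/(1-P) is the odds of graduating in that group.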

To plug in numbers from the graduation example:

odds ratio = (.85/(1-.85)) / (.75/(1-.75)) = 5.667 / 3 = 1.889 (exactly 17/9 = 1.8888...)

For readers who are not used to mathematical notation:

  • / means "divided by" (e.g., 30/3 = 10).
  • The formula uses proportions rather than percentages (.85, not 85).

I recommend replicating this result in Excel. Enter the following values and formulas in the top-left corner of a worksheet and confirm that cell A3 returns 1.888...:

A1: 0.85            B1: 0.75
A2: =A1/(1-A1)      B2: =B1/(1-B1)
A3: =A2/B2

For Excel beginners, A1 means the cell defined by Column A and Row 1 of the Excel sheet.

As you do this replication, try to understand conceptually what the resulting value means. Change the values in A1 and B1 from .85 and .75 to other values to see how the formula responds, and confirm the following:

  • An odds ratio can range from 0 to infinity (it has no upper limit).
  • If the odds ratio is greater than 1, the treatment group had higher odds of graduating than the comparison group.
  • If the odds ratio is 1, the two groups had the same odds, i.e., the program made no difference. Enter the same value for P1 and P2 to see this.
  • If the odds ratio is smaller than 1, the treatment group had lower odds of graduating, i.e., the situation was worse for the program group.
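The same replication can be sketched as a SAS data step. Only the first pair of proportions comes from the graduation example; the other two pairs are arbitrary illustrative values, chosen to produce an odds ratio above 1, equal to 1, and below 1.

```sas
/* Replicate the Excel odds-ratio calculation for several (P1, P2) pairs.
   Row 1 is the graduation example; rows 2 and 3 are illustrative only. */
data odds_demo;
  input p1 p2;
  odds1      = p1 / (1 - p1);   /* odds of graduating, treatment group  */
  odds2      = p2 / (1 - p2);   /* odds of graduating, comparison group */
  odds_ratio = odds1 / odds2;   /* > 1 favors treatment; 1 = no difference */
  datalines;
0.85 0.75
0.75 0.75
0.75 0.85
;
run;

proc print data=odds_demo;
run;
```

The three rows should give odds ratios of about 1.889 (17/9), exactly 1, and about 0.529 (9/17). Swapping P1 and P2 turns an odds ratio into its reciprocal.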

Finally, one advantage of the odds ratio is that its value alone tells you whether the treatment group had a more favorable result than the comparison group. When the treatment group is in the numerator, as above, and the outcome is a favorable one (such as graduating), an odds ratio greater than 1 means the treatment group performed better. If it is less than 1, the comparison group did better.
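This ties back to logit coefficients: in a logistic regression with a single 0/1 treatment indicator, the estimated coefficient is the log of this odds ratio, so exponentiating it recovers the odds ratio. A sketch in SAS, assuming (hypothetically) 100 students per group, since the example gives only percentages:

```sas
/* Hypothetical counts: 100 students per group (not stated in the example). */
data grad;
  input group $ treat graduated n;   /* treat = 1 for Group T, 0 for Group C */
  datalines;
T 1 85 100
C 0 75 100
;
run;

/* Events/trials syntax models the probability of graduating.
   The treat coefficient should be log(17/9), about 0.636, and the
   Odds Ratio Estimates table should show about 1.889. */
proc logistic data=grad;
  model graduated/n = treat;
run;
```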

Adding a note to SAS results

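data NOTES;
infile datalines truncover;   /* read short lines without flowing to the next line */
input Notes $ 1-100;          /* the whole line, up to 100 characters, is one note */
datalines;
This is my note
;
run;

proc print data=NOTES;
run;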

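*****************;

/* Assumes data set n_level_info already exists with a numeric
   variable NLevels (e.g., saved from PROC FREQ with NLEVELS). */
data _null_;
set n_level_info;
call symputx("NLevels", NLevels);   /* SYMPUTX strips leading and trailing blanks */
run;

data NOTES;
infile datalines truncover;
input Notes $ 1-100;
/* Macro references in DATALINES are not resolved automatically;
   RESOLVE() resolves them when the data step executes. */
textResolved=dequote(resolve(quote(Notes)));
datalines;
This is the way I add a note in a data step.
This is an example of how I can use a macro --> &NLevels .
;
run;

data notes2;
set notes;
keep textResolved;
run;

proc print data=notes2;
run;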
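The code above expects a data set named n_level_info with a numeric variable NLevels, but the source does not show where it comes from. One plausible origin, sketched here purely as an assumption, is the NLevels table that PROC FREQ produces with the NLEVELS option, saved through ODS OUTPUT. SASHELP.CLASS is SAS's bundled sample data set.

```sas
/* Hypothetical setup: save PROC FREQ's NLevels table as n_level_info. */
ods output NLevels=n_level_info;
proc freq data=sashelp.class nlevels;
  tables sex;
run;

/* Store the number of levels in macro variable NLevels */
data _null_;
  set n_level_info;
  call symputx("NLevels", NLevels);
run;

%put NOTE: NLevels resolves to &NLevels;
```

With SEX having the values F and M, the macro variable should resolve to 2.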

Data editing in SAS

My data looks like this:

VAR1
TITLE A
APPLICATION #1
APPLICATION #2
APPLICATION #3
TITLE B
APPLICATION #4
APPLICATION #5
APPLICATION #6
TITLE C
APPLICATION #4
APPLICATION #5
APPLICATION #6

I’d like the result to look like VAR2 below:

VAR1              VAR2
TITLE A           TITLE A
APPLICATION #1    TITLE A
APPLICATION #2    TITLE A
APPLICATION #3    TITLE A
TITLE B           TITLE B
APPLICATION #4    TITLE B
APPLICATION #5    TITLE B
APPLICATION #6    TITLE B
TITLE C           TITLE C
APPLICATION #4    TITLE C
APPLICATION #5    TITLE C
APPLICATION #6    TITLE C

To be more exact, what I ultimately want is the following (title rows removed), but if I can get the result above, I can produce this myself:

VAR1              VAR2
APPLICATION #1    TITLE A
APPLICATION #2    TITLE A
APPLICATION #3    TITLE A
APPLICATION #4    TITLE B
APPLICATION #5    TITLE B
APPLICATION #6    TITLE B
APPLICATION #4    TITLE C
APPLICATION #5    TITLE C
APPLICATION #6    TITLE C

Thanks Charly:

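*****;
data a;
input VAR1 & $30.;   /* & allows single embedded blanks, e.g., "TITLE A" */
cards;
TITLE A
APPLICATION #1
APPLICATION #2
APPLICATION #3
TITLE B
APPLICATION #4
APPLICATION #5
APPLICATION #6
TITLE C
APPLICATION #4
APPLICATION #5
APPLICATION #6
;
run;

data b;
set a;
length var2 $30;
retain var2;                         /* carry the latest title down to later rows */
if var1 =: 'TITLE' then var2=var1;   /* =: means "begins with" here */
else output;                         /* write only the application rows */
run;

proc print ; run;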
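If you want the first layout instead (title rows kept, with VAR2 repeating the title), one way is to drop the explicit OUTPUT so that every row is written at the end of the step. A sketch, using a shortened copy of the example data so it runs on its own:

```sas
/* Shortened copy of the example data */
data a_short;
  input VAR1 & $30.;
  cards;
TITLE A
APPLICATION #1
APPLICATION #2
TITLE B
APPLICATION #4
;
run;

data b_all;
  set a_short;
  length var2 $30;
  retain var2;
  if var1 =: 'TITLE' then var2 = var1;   /* new title: remember it */
  /* no explicit OUTPUT: every row, title or application, is written */
run;

proc print data=b_all;
run;
```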

Creating a series of dummy variables

Thanks Russ:

Given each school's lowest and highest grade, the following creates a series of dummy variables, grade1 through grade12, indicating whether the school serves each grade level.

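data one;
input ID LOWEST_GRADE HIGHEST_GRADE;
cards;
1 4 9
2 9 12
;

data new;
array grades(*) grade1-grade12;   /* one dummy variable per grade */
set one;
do i =1 to dim(grades);
/* 1 if grade i falls within the school's range, otherwise 0 */
if i ge lowest_grade and i le highest_grade then grades(i)=1;
else grades(i)=0;
end;
drop i;                           /* loop counter is not needed in the output */
run;

proc print;
run;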
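Because a SAS comparison evaluates to 1 when true and 0 when false, the IF/ELSE pair can be collapsed into a single assignment. A sketch of that variant, which should produce the same dummies as the code above:

```sas
data one;
  input ID LOWEST_GRADE HIGHEST_GRADE;
  cards;
1 4 9
2 9 12
;
run;

data new2;
  set one;
  array grades(*) grade1-grade12;
  do i = 1 to dim(grades);
    /* the range comparison is 1 when grade i is served, else 0 */
    grades(i) = (lowest_grade <= i <= highest_grade);
  end;
  drop i;
run;

proc print data=new2;
run;
```

One caveat of this shortcut: a missing LOWEST_GRADE compares as smaller than any number, so the data should be checked for missing values first.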