Dummy variables in logistic regression models

Why do switching the values of a dummy variable and using the CLASS statement in PROC LOGISTIC change the coefficients in logistic regression?

(1) and (2) produce the same odds ratio, as do (3) and (4), but the intercepts and coefficients differ.

(1)
proc logistic data=here.asdf descending ;
model college= boy ;
run;

(2)
proc logistic data=here.asdf descending ;
class girl;
model college= girl ;
run;

(3)
proc logistic data=here.asdf descending ;
model college= girl;
run;

(4)
proc logistic data=here.asdf descending ;
class boy;
model college= boy ;
run;

             (1)        (2)        (3)        (4)
Intercept    0.5346     0.3645     0.1945     0.3645
Predictor    boy        girl       girl       boy
Estimate     -0.3401    -0.1701    0.3401     0.1701
Odds ratio   0.712      0.712      1.405      1.405
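
A likely explanation, offered as a sketch rather than a verified run on here.asdf: by default the CLASS statement in PROC LOGISTIC uses effect coding (PARAM=EFFECT) with the last level as the reference, so a 0/1 variable enters the model as +1/-1 instead of 0/1. The coefficient is then half the dummy-coded difference in logits, and the intercept is the average of the two group logits. The estimates above agree with this, assuming girl = 1 - boy: -0.3401 / 2 is about -0.1701, and (0.5346 + 0.1945) / 2 is about 0.3645. The odds ratio is unchanged because the two groups are still two coding units apart: exp(2 * -0.1701) is about 0.712. Requesting reference coding should bring the CLASS version back in line with the plain dummy model:

```sas
/* Sketch: reference (dummy) coding instead of the default effect coding. */
/* Assumes girl is coded 0/1 and girl = 1 - boy, as the results suggest.  */
proc logistic data=here.asdf descending;
  class girl (param=ref ref='1');  /* girl=1 is the reference level */
  model college = girl;            /* expected to match model (1)   */
run;

/* Likewise, class boy (param=ref ref='1') should match model (3). */
```

With PARAM=REF the reported coefficient compares girl=0 (boys) with girl=1 (girls), which is the same contrast as the boy dummy in (1).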

Proc glimmix for logistic regression has an lsmeans option

Thanks J for this info:

Use dist=binary and link=logit for logistic regression using PROC GLIMMIX.

The LSMEANS statement is available in this procedure, and it can produce estimated probabilities. Use the ILINK option:

lsmeans &group / ilink ;

For differences between the groups, use the DIFF option on the LSMEANS statement. The results are also on the logit scale. If you use the OR option, you will get the odds ratios for the group effect --

lsmeans &group /diff or;

Unfortunately, the difference between the groups on the probability scale is not directly available in PROC GLIMMIX. The magnitude of the difference is easy to compute: the ILINK option output gives you the estimated probability in each group, and you can compute the difference by hand or in a DATA step. However, the appropriate standard errors for these differences are not available in PROC GLIMMIX.
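
The "by hand or in a DATA step" route could look like the sketch below. Everything in it is illustrative and not from the original email: the demo data set, the tall outcome, and the 62-inch cut are made up, and the code assumes the ILINK probabilities arrive in the ODS LSMeans table in a column named Mu (check with PROC CONTENTS on your own output). It only works as written for a two-level group variable.

```sas
/* Hypothetical example data: a binary outcome built from sashelp.class */
data demo;
  set sashelp.class;
  tall = (height > 62);   /* 1 if taller than 62 inches, else 0 */
run;

%let group=sex;

proc glimmix data=demo;
  class &group;
  model tall(event='1') = &group / dist=binary link=logit;
  lsmeans &group / ilink;
  ods output LSMeans=lsm;  /* assumed: ILINK probabilities are in column Mu */
run;

/* Difference in estimated probabilities: first group minus second group */
data prob_diff;
  set lsm end=last;
  retain p_first;
  if _n_ = 1 then p_first = Mu;
  if last then do;
    diff = p_first - Mu;
    output;
  end;
  keep diff;
run;

proc print data=prob_diff;
run;
```

This gives the point estimate of the difference only. It does not supply a standard error for that difference.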

How to explain HLM (response to an email)

Hello XXX, thanks for your email! I recently taught a primer on HLM, so I hope I can update my website with new material.

This time, I tried to say that HLM is simple, but the equations make it difficult to learn (which is also the case for learning musical instruments, like the piano).

In the attached graph, I tried to compare my intuitive model (mental representation model) and a formal equation. When it is in my mind, coefficients are either random (red font) or fixed (blue). The red coefficients are adjusted for precision/reliability, while the blue ones are not (like OLS coefficients). I tried to show how the equation expresses the same idea, but I feel it quickly gets annoying.

HLM slide sample

Logistic regression and comparison of group means in the model (using PROC GLIMMIX)

For logistic regression models in PROC GLIMMIX, you need to use dist=binary link=logit option on the MODEL statement (sorry I missed pointing this out in your program in my previous response). So please add these two options.

For logistic regression models, the estimates are all on the logit scale, and so are the results of the LSMEANS statement. To get the lsmeans on the original scale (the probability scale), you can use the ILINK option --

 

lsmeans &group / ilink ;

 

For differences between the groups, use the DIFF option on the LSMEANS statement. The results are also on the logit scale. If you use the OR option, you will get the odds ratios for the group effect --

lsmeans &group /diff  or;

Unfortunately, the difference between the groups on the probability scale is not directly available in PROC GLIMMIX. The magnitude of the difference is easy to compute: the ILINK option output gives you the estimated probability in each group, and you can compute the difference by hand or in a DATA step. However, the appropriate standard errors for these differences are not available in PROC GLIMMIX.

Thanks, JT.

PROC TTEST macro that creates a result dataset

/*Ttest Macro*/

/*Creates a result SAS data set t_test_results*/
/*Find it in a temp folder and click-open it as an excel file*/

%let dataname=sashelp.class;
%let varlist=weight height age;
%let group=sex;

proc ttest data=&dataname;
class &group;
var
&varlist
;
ods output statistics=kaz1 ttests=kaz2 equality=kaz3;
run;

data kaz3b;
set kaz3;
if ProbF < 0.05 then unequal=1;
if unequal=1;
keep Variable unequal;
run;
proc sort data=kaz3b;by Variable ;run;
proc sort data=kaz2;by Variable ;run;
data both;
merge kaz2 kaz3b;
by Variable ;
if unequal ne 1 then unequal=0;
flag=0;
if unequal=0 and variances="Equal" then flag=1;
if unequal=1 and variances="Unequal" then flag=1;
if flag=1;

SIG=" ";
if Probt < 0.05 then SIG="*";

keep Variable Probt SIG variances;
run;

data kaz1b;
set kaz1;
jun=_n_;
run;
proc sort data=kaz1b;by Variable ;run;

data t_test_results;
merge kaz1b both;
by Variable ;
this=0;
if class = "Diff (1-2)" then this=1;
if this =1 then do;
probt2=probt;
SIG2=SIG;
variances2=variances;
end;

keep Variable Class mean N probt2 sig2 variances2;

run;

 

Example: how to use ODS in PROC GLIMMIX or other procs

/*Use PROC GLIMMIX to run an OLS regression
and save the results (parameter estimates) in a
data set named "john" using ODS*/
proc glimmix data=sashelp.class;
model height=weight /dist=normal link=identity solution;
ods output ParameterEstimates=john;
run;
/*Edit the result data*/
data john2;set john;

/*Create a new variable that indicates
the level of significance*/

/*Do not forget to specify a value length*/
length asterisk $ 3;

if Probt < 0.10 then asterisk="~";
if Probt < 0.05 then asterisk="*";
if Probt < 0.01 then asterisk="**";
if Probt < 0.001 then asterisk="***";

run;

/*NOW FIND john2 in the work directory and right-click it
to open with Excel*/

/*You can also see this by PROC PRINT*/
proc print data=john2;
run;

ROC Curve Analysis using PROC LOGISTIC

/*ROC Curve Analysis Macro*/

/*a hypothetical data set*/
data asdf;set sashelp.class;
EVENT=0;
if Weight > 100 then EVENT=1;
PREDICTOR=height;
run;

/*data name*/
%let dataname=asdf;
%let outcome=EVENT;
%let ind=PREDICTOR;
%let save_graphic=C:\Documents and Settings\19702\My Documents\sas;

ods html PATH="&save_graphic" (url=none) file="&dataname &ind .html";
ods graphics on / imagename="&dataname&ind";
proc logistic data=&dataname descending OUTEST=&dataname.result;
title "&dataname";
model &outcome =
&ind
/ outroc=&dataname.kaz2 ROCEPS=0 ;
output out = m2 p = prob xbeta = logit ;
ods output ParameterEstimates=kazcoeff
Association=kazassoc
ConvergenceStatus=kazconverg(keep= reason);
run;
ods graphics off;
ods html close;

proc transpose data=kazassoc out=T1;
var cValue1;
id label1;
run;
proc transpose data=kazassoc out=T2;
var cValue2;
id label2;
run;

data kazassoc2;
merge T1 T2;
run;

/*ods trace off;*/
/*Get descriptive statistics*/

ods listing close;
proc means data=&dataname;
var
&outcome
&ind
;
ods output summary=uekawa;
run;
ods listing;
/*get significance of the independent variable*/
data kazcoeff2;
set kazcoeff;
if Variable="&ind";
keep ProbChiSq StdErr flag;
flag=1;
label ProbChiSq="P-value for the ind var effect";
label StdErr="Stderr for the ind var effect";
run;

data &dataname.kaz2;set &dataname.kaz2;
flag=1;
run;

data &dataname.result;
set &dataname.result;
flag=1;
run;

data &dataname.kaz3;merge &dataname.kaz2 &dataname.result kazcoeff2;
by flag;
run;

data &dataname.kaz4;set &dataname.kaz3;
Distance=sqrt( (0-_1MSPEC_)**2 + (1-_SENSIT_)**2 );
suji=_n_;
run;
proc sql;
create table &dataname.kaz5 as
select *,
min(distance) as minimum_distance
from &dataname.kaz4;
quit;

data optimal;
retain CUT_OFF_VALUE;
set &dataname.kaz5;
CUTOFF=0;
if distance = minimum_distance then do; CUTOFF=1; type="Dist to perfection";end;
/*if distance2 = maximum_distance2 then do; CUTOFF=1;
type="Dist to noninf";end;*/
if cutoff=1;
effect=&ind ;
LOGIT=LOG(_PROB_ / (1-_PROB_));
CUT_OFF_VALUE=((LOGIT-Intercept)/effect);
drop cutoff ;
run;
data results_of_ROC;
merge optimal uekawa kazassoc2 kazconverg;

TRUE_POSITIVE_RATE=_SENSIT_;
TRUE_NEGATIVE_RATE=1-_1MSPEC_;
AUC=C;
run;

proc print data=results_of_ROC;
title "ROC stats for &outcome";
var CUT_OFF_VALUE
TRUE_POSITIVE_RATE
TRUE_NEGATIVE_RATE
AUC ;
run;
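
As a reading aid, the cutoff selection in the macro above can be written out as below. This only restates the DATA step arithmetic. The symbols are introduced here for illustration: FPR_i stands for _1MSPEC_ and TPR_i for _SENSIT_ at the i-th ROC point, \hat{p}_i for _PROB_, and b_0, b_1 for the estimated intercept and the coefficient of the predictor.

```latex
% Distance from each ROC point to the perfect classifier at (FPR, TPR) = (0, 1)
d_i = \sqrt{(0 - \mathrm{FPR}_i)^2 + (1 - \mathrm{TPR}_i)^2},
\qquad i^{*} = \arg\min_i d_i

% Convert the probability at the chosen point back to the predictor scale
\log\frac{\hat{p}_{i^{*}}}{1 - \hat{p}_{i^{*}}} = b_0 + b_1 x^{*}
\quad\Longrightarrow\quad
x^{*} = \frac{\log\bigl(\hat{p}_{i^{*}} / (1 - \hat{p}_{i^{*}})\bigr) - b_0}{b_1}
```

Here x^{*} corresponds to CUT_OFF_VALUE, and the sensitivity and specificity reported alongside it are those of the ROC point i^{*}.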