# My Statistical Tool box

## Posts

### PHP

### How I activated a new Essential phone without visiting a Verizon store

### SAS propensity score matching procedure

### The meaning of intercept and centering of predictor variables

### WWC attrition table

### T-test in SAS datastep

### Statistical joint test of categorical variables when expressed as a series of dummy variables

### Wiring of humbuckers

### Why use METHOD=RSPL for PROC GLIMMIX

### PROC GLIMMIX’s option: lsmeans group / ilink diff;

Statistics, Research Methods, SAS, HLM, Rasch model

### How I activated a new Essential phone without visiting a Verizon store

October 17th, 2018

I dropped my Samsung phone and broke it. I was planning to buy a new phone anyway, so I ordered an Essential phone on Amazon. The price was $340 plus tax, which was substantially cheaper than an iPhone or a Samsung phone (price range $800-$1000). If you order it directly from the Essential website, the price is $499 and the phone comes with accessories.

I had always bought my phones at a Verizon service vendor, but my cousin assured me that buying a smartphone online is easy. My cousin also told me that Essential phones are made by Andy Rubin, who created the Android OS. Their website was sharp-looking and Internet reviews were positive. One review said that Essential phones are compatible with my phone provider, Verizon. To state my conclusion first:

- I got the phone started within two hours of receiving it
- I was able to get it started without calling Verizon or changing options on my Verizon online account. A Verizon store would charge me at least $20 to switch a SIM card. I did not have to order a new SIM card from my online account.
- I needed:
  - a small flat screwdriver to open the old Samsung phone’s back cover and a sharp, tiny needle-like tool to open the Essential phone’s SIM slot (the Essential phone came with a pin for this).
  - a pair of scissors to cut the SIM card down to the nano size (my original Samsung phone’s SIM card was one size larger than the nano size).

When the Essential phone arrived the next day, I took the SIM card out of my old Samsung Galaxy S4. I had to pry the back panel open using a sharp object (I used a tool from an iFixit driver kit). With the new phone, I pushed the pin into the SIM card slot until the tray popped out.

The SIM card was larger than the nano SIM card required by the Essential PH-1. Following advice from Internet discussions, I cut the plastic part of the card to make it smaller. I didn’t use the size template that people said one should use; I just used a pair of scissors. I cut it too small, so I put a piece of Scotch tape on the back of the SIM card to hold it firmly in the slot. I didn’t want the tape to touch the golden side of the SIM card too much, but my understanding is that the only essential part is the center of the gold side.

I thought hard about which side should be up, but the card can fit into the slot only in one way (because one corner of the card and the slot are both diagonally cut and they only match in one way/direction).

The phone did not start working immediately. I took the SIM card out and put it back a couple of times. At one point the phone started receiving texts, and I was also able to send texts, but the phone still did not work fully. It started working when I followed the Internet instruction “Disable Enhanced 4G LTE Mode.” This option was somewhere in the settings.

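### SAS propensity score matching procedure

    /* Greedy 1:1 propensity score matching within district.
       &outcome, &exactvar, &predictors and &caliper are macro variables defined elsewhere. */
    proc psmatch data=psm region=cs;
      where &outcome ne .;                  /* drop cases with a missing outcome */
      class FLAG districtname SCHOOLNAME ;
      psmodel FLAG(Treated="Y")= &exactvar &predictors;
      match method=greedy(k=1)/*(order=random)*/ exact=districtname stat=lps caliper=&caliper;
      output out(obs=match)=outgs lps=_Lps matchid=_matchID;   /* keep matched observations only */
    run;

    /* Sort so that the members of each matched pair sit next to each other */
    proc sort data=outgs;by _matchID;run;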
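
The PROC PSMATCH call above depends on four macro variables that are not shown. As a rough sketch, they could be set up like this before the procedure runs. Every value here (`posttest`, `pretest`, `frl`, `ell`, `0.25`) is a hypothetical placeholder, not the setting actually used. The PROC FREQ step is one simple way to check the matched file afterwards.

```sas
/* Hypothetical values; replace with your own variable names and caliper. */
%let outcome    = posttest;           /* outcome variable; cases with missing values are excluded */
%let exactvar   = districtname;       /* exact-matching variable, also entered in the PS model */
%let predictors = pretest frl ell;    /* covariates for the propensity score model */
%let caliper    = 0.25;               /* caliper width passed to CALIPER= */

/* After matching: with k=1, treated and control counts in OUTGS should be equal */
proc freq data=outgs;
  tables FLAG;
run;
```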

### The meaning of intercept and centering of predictor variables

The result table of a regression model includes, among other things, a column of coefficients. The intercept value, shown in the top cell of the coefficient column, may look mysterious and even arbitrary. The intercept is the predicted value for a subject whose values on all predictors in the model are 0. If the regression model includes gender as a predictor (coded as 1 if male, else 0), the intercept indicates the average outcome value for female subjects. If the model includes gender and body weight, the intercept indicates the predicted outcome value for females who have a body weight of zero. Nobody’s weight is 0; thus, the intercept in this case has no sensible interpretation. An analyst who is not particularly interested in giving the intercept a substantive meaning can ignore it and safely interpret the rest of the coefficients.

Personally, I want all values in my result tables to have a substantive, interpretable meaning. As mentioned, when the model includes only dummy variables (coded as 1 or 0), the intercept already has a meaning.

If the model includes continuous variables, however, I recommend centering those variables around their average values. If the variable in question is a test score that ranges from 0 to 100 and the average score is 65, I would subtract 65 from each subject’s test score (e.g., a test score of 60 becomes 60 – 65 = –5). In SAS, you can do:

    /* MEAN=0 shifts the variable so that its mean is 0 (centering) */
    proc standard data=abc out=abc2 mean=0;
      var testscore1 ;
    run;

With centering, the intercept obtains a meaning: it is the predicted value for a subject whose test score equals the average score. Centering does not affect the coefficients of the predictors (provided the model has no interaction or polynomial terms involving the centered variable) or the fit of the model; only the intercept changes.
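
To see why only the intercept moves, it may help to rewrite a one-predictor model around the mean. This is a sketch for the simplest case; the symbols $b_0$, $b_1$, $\bar{x}$ and $x_i^{c}$ are notation introduced here, not taken from any SAS output.

```latex
\begin{aligned}
y_i &= b_0 + b_1 x_i + e_i \\
    &= (b_0 + b_1 \bar{x}) + b_1 (x_i - \bar{x}) + e_i \\
    &= b_0^{*} + b_1 x_i^{c} + e_i,
\qquad b_0^{*} = b_0 + b_1 \bar{x}, \quad x_i^{c} = x_i - \bar{x}.
\end{aligned}
```

The slope $b_1$ is the same before and after centering. With an average test score of 65, the new intercept is $b_0 + 65\,b_1$, which is the predicted outcome for a subject scoring exactly 65.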

You can also center a predictor’s values and fix its standard deviation at 1. In SAS, you can do:

    /* MEAN=0 centers the variable; STD=1 rescales it to a standard deviation of 1 */
    proc standard data=abc out=abc2 mean=0 std=1;
      var testscore;
    run;

The resulting value is called a “z-score.” The z-score may be better known than the concept of centering, and it is one specific type of centering: its mean is zero (as all values are centered around the average value) and its standard deviation is fixed at 1.

I typically apply “z-scoring” for a pretest variable whose scores are large numbers (e.g., 953, 405, etc.). Without this adjustment, the derived coefficients may be too small to read in the table (e.g., 0.00000014).
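
Z-scoring makes tiny coefficients readable because the coefficient becomes the expected change in the outcome per one standard deviation of the predictor, i.e., the raw coefficient multiplied by the predictor’s standard deviation. A minimal sketch follows, using the SASHELP.CLASS sample data that ships with SAS as a stand-in (the data set and the variables `weight` and `height` are assumptions for illustration only).

```sas
/* Sketch on SASHELP.CLASS; not the data discussed in this post. */
proc standard data=sashelp.class out=class_z mean=0 std=1;
  var height;                  /* height now has mean 0 and standard deviation 1 */
run;

proc reg data=sashelp.class;   /* raw predictor */
  model weight = height;
run; quit;

proc reg data=class_z;         /* z-scored predictor */
  model weight = height;
run; quit;
```

Comparing the two outputs, the t value and p-value for `height` should be identical; the slope in the second model equals the first slope times the standard deviation of `height`, and the second intercept equals the mean of `weight`.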

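### WWC attrition table

The table below is from p. 13 of the WWC standards document (Procedures and Standards Handbook, version 3.0). For each level of overall attrition, it gives the highest differential attrition allowed under the conservative and the liberal boundary.

https://ies.ed.gov/ncee/wwc/Docs/referenceresources/wwc_procedures_v3_0_standards_handbook.pdf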

| Overall Attrition | Conservative Boundary | Liberal Boundary |
| --- | --- | --- |
| 0 | 0.057 | 0.1 |
| 0.01 | 0.058 | 0.101 |
| 0.02 | 0.059 | 0.102 |
| 0.03 | 0.059 | 0.103 |
| 0.04 | 0.06 | 0.104 |
| 0.05 | 0.061 | 0.105 |
| 0.06 | 0.062 | 0.107 |
| 0.07 | 0.063 | 0.108 |
| 0.08 | 0.063 | 0.109 |
| 0.09 | 0.063 | 0.109 |
| 0.1 | 0.063 | 0.109 |
| 0.11 | 0.062 | 0.109 |
| 0.12 | 0.062 | 0.109 |
| 0.13 | 0.061 | 0.108 |
| 0.14 | 0.06 | 0.108 |
| 0.15 | 0.059 | 0.107 |
| 0.16 | 0.059 | 0.106 |
| 0.17 | 0.058 | 0.105 |
| 0.18 | 0.057 | 0.103 |
| 0.19 | 0.055 | 0.102 |
| 0.2 | 0.054 | 0.1 |
| 0.21 | 0.053 | 0.099 |
| 0.22 | 0.052 | 0.097 |
| 0.23 | 0.051 | 0.095 |
| 0.24 | 0.049 | 0.094 |
| 0.25 | 0.048 | 0.092 |
| 0.26 | 0.047 | 0.09 |
| 0.27 | 0.045 | 0.088 |
| 0.28 | 0.044 | 0.086 |
| 0.29 | 0.043 | 0.084 |
| 0.3 | 0.041 | 0.082 |
| 0.31 | 0.04 | 0.08 |
| 0.32 | 0.038 | 0.078 |
| 0.33 | 0.036 | 0.076 |
| 0.34 | 0.035 | 0.074 |
| 0.35 | 0.033 | 0.072 |
| 0.36 | 0.032 | 0.07 |
| 0.37 | 0.031 | 0.067 |
| 0.38 | 0.029 | 0.065 |
| 0.39 | 0.028 | 0.063 |
| 0.4 | 0.026 | 0.06 |
| 0.41 | 0.025 | 0.058 |
| 0.42 | 0.023 | 0.056 |
| 0.43 | 0.021 | 0.053 |
| 0.44 | 0.02 | 0.051 |
| 0.45 | 0.018 | 0.049 |
| 0.46 | 0.016 | 0.046 |
| 0.47 | 0.015 | 0.044 |
| 0.48 | 0.013 | 0.042 |
| 0.49 | 0.012 | 0.039 |
| 0.5 | 0.01 | 0.037 |
| 0.51 | 0.009 | 0.035 |
| 0.52 | 0.007 | 0.032 |
| 0.53 | 0.006 | 0.03 |
| 0.54 | 0.004 | 0.028 |
| 0.55 | 0.003 | 0.026 |
| 0.56 | 0.002 | 0.023 |
| 0.57 | 0 | 0.021 |
| 0.58 | – | 0.019 |
| 0.59 | – | 0.016 |
| 0.6 | – | 0.014 |
| 0.61 | – | 0.011 |
| 0.62 | – | 0.009 |
| 0.63 | – | 0.007 |
| 0.64 | – | 0.005 |
| 0.65 | – | 0.003 |
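
As a rough illustration of how the table might be used, the data step below computes overall and differential attrition from made-up sample counts and compares the differential attrition with the boundaries in the 0.15 row. The counts and variable names are hypothetical, and treating a value exactly on the boundary as acceptable (`<=`) is an assumption here; the WWC document is the authority on the exact rule.

```sas
/* Sketch with hypothetical counts; not taken from the WWC document. */
data attrition;
  n_rand_t = 200;  n_anal_t = 160;   /* treatment: randomized, analyzed */
  n_rand_c = 200;  n_anal_c = 180;   /* control: randomized, analyzed */

  attr_t  = 1 - n_anal_t / n_rand_t;                            /* about 0.20 */
  attr_c  = 1 - n_anal_c / n_rand_c;                            /* about 0.10 */
  overall = 1 - (n_anal_t + n_anal_c) / (n_rand_t + n_rand_c);  /* about 0.15 */
  differential = abs(attr_t - attr_c);                          /* about 0.10 */

  /* Boundaries copied from the table row for overall attrition 0.15 */
  conservative = 0.059;
  liberal      = 0.107;

  low_conservative = (differential <= conservative);  /* 1 = within the conservative boundary */
  low_liberal      = (differential <= liberal);       /* 1 = within the liberal boundary */
  put overall= differential= low_conservative= low_liberal=;
run;
```

In this made-up case the differential attrition of about 0.10 exceeds the conservative boundary (0.059) but falls under the liberal one (0.107).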

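### T-test in SAS datastep

The following SAS code conducts a two-sample t-test from summary statistics, using functions in a data step.

    /* Descriptive statistics by treatment group: one row per group and variable */
    proc means data=both STACKODSOUTPUT n mean std min max stderr ;
      class treat ;
      var
        <Variables here>
      ;
      ods output summary=kaz2;
    run;

    /* Control-group rows (treat=0), renamed with a _c suffix */
    data c;set kaz2;
      if treat=0;
      N_c=N;
      mean_c=mean;
      StdDev_c=StdDev;
      Min_C=Min;
      Max_C=Max;
      StdErr_C=StdErr;
      keep N_C MEAN_C StdDev_c MIN_C MAX_C StdErr_C Variable label;
    run;

    /* Treatment-group rows (treat=1), renamed with a _t suffix */
    data t;set kaz2;
      if treat=1;
      N_t=N;
      mean_t=mean;
      StdDev_t=StdDev;
      Min_t=Min;
      Max_t=Max;
      StdErr_t=StdErr;
      Variable_QC=Variable;   /* kept to check that rows line up after the merge */
      keep N_T MEAN_T StdDev_t MIN_T MAX_T Variable_QC StdErr_t;
    run;

    /* One-to-one merge by row position: C and T must list the same variables in the same order */
    data merge_CT;
      length sig $3;          /* long enough for "***"; otherwise the first assignment sets the length */
      merge C T ;
      difference=MEAN_T-MEAN_C;
      /*https://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm*/
      /* Standard error of the difference (despite the name, variances are not pooled) */
      POOLED_SE=sqrt( ( (StdDev_t*StdDev_t) / N_T ) + ( (StdDev_c*StdDev_c ) / N_C ) );
      T_value=abs(difference)/POOLED_SE;
      /* Two-sided p-value from the normal approximation to the t distribution */
      P_value=(1-probnorm(T_value))*2;
      *if P_value < 0.1 then sig="t";
      if P_value < 0.05 then sig="*";
      if P_value < 0.01 then sig="**";
      if P_value < 0.001 then sig="***";
      if P_value =. then sig="";
    run;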
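
The data step above takes the p-value from the normal distribution (`probnorm`), which is close to the t-based value only when the samples are large. For smaller samples, one possible variant uses `probt` with Welch-Satterthwaite degrees of freedom, as sketched below. It assumes the `merge_CT` data set created above; the new names (`merge_CT_t`, `v_t`, `v_c`, `df`, `P_value_t`) are introduced here for illustration.

```sas
/* Sketch: t-based p-value with Satterthwaite degrees of freedom.
   Assumes MERGE_CT from the step above (N_t, N_c, StdDev_t, StdDev_c, T_value). */
data merge_CT_t;
  set merge_CT;
  v_t = StdDev_t**2 / N_t;   /* squared standard error, treatment group */
  v_c = StdDev_c**2 / N_c;   /* squared standard error, control group */
  /* Welch-Satterthwaite approximation to the degrees of freedom */
  df = (v_t + v_c)**2 / ( v_t**2/(N_t-1) + v_c**2/(N_c-1) );
  /* Two-sided p-value from the t distribution (PROBT accepts non-integer df) */
  P_value_t = (1 - probt(T_value, df)) * 2;
run;
```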

### Statistical joint test of categorical variables when expressed as a series of dummy variables

When I have a grouping factor represented as a series of dummy variables (e.g., race groups, grade levels), I also want to know whether the dummy variables, taken together as one unit, make a statistically significant contribution to the model. The easiest way to do this is to treat those variables as classification variables. You will get a joint statistical test in one of the result tables.

    proc glimmix ..;
      class race grade_level;
      ….
    run;

In my applications I almost always use numeric versions of variables, i.e., dummy variables (coded as 0 or 1). I like this approach because I can just use PROC MEANS on them to create a descriptive statistics table.

The question is how to get joint statistical tests when all of my predictors are numerically coded and thus I can’t rely on the class statement (shown above in the syntax example).

The GLIMMIX syntax below treats race groups and grade levels as numerically coded dummy variables (if YES 1, else 0).

The parameter estimate table will show a coefficient for each of the numeric variables; however, it does not tell me whether race as a whole matters to the model or whether grade level as a whole matters to the model. For example, even when the coefficient for black students is statistically significant, that is only about how black students differ from white students (the reference group in this example). We don’t know whether the race dummies jointly make a statistically significant contribution to the model.

(Again, this can be done easily by using class variables instead, as shown earlier; however, I like using numeric variables in my models.)

Contrast statements will do the trick.

    proc glimmix data=usethis namelen=32;
      class groupunit;
      model Y= treat black hispanic other grade09 grade10 grade11/
        solution ddfm=kr dist=&dist link=&link ;
      output out=&outcome.gmxout residual=resid;
      random intercept /subject=groupunit;
      /* Each comma-separated row is one restriction; the rows are tested jointly */
      CONTRAST 'Joint F-Test Race groups ' Black 1, Hispanic 1, other 1;
      CONTRAST 'Joint F-Test Grade levels' grade09 1, grade10 1, grade11 1;
      ods output
        ParameterEstimates=_3_&outcome.result covparms=_3_&outcome.cov
        Contrasts=cont&outcome;
    run;
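
In hypothesis terms, the race CONTRAST statement above can be read as follows. This is a sketch in notation introduced here: $\beta$ holds only the three race coefficients, and each comma-separated row of the CONTRAST statement becomes one row of $L$.

```latex
H_0:\ \beta_{\text{black}} = \beta_{\text{hispanic}} = \beta_{\text{other}} = 0
\quad\Longleftrightarrow\quad
L\beta = 0,
\qquad
L = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\quad
\beta = \begin{pmatrix} \beta_{\text{black}} \\ \beta_{\text{hispanic}} \\ \beta_{\text{other}} \end{pmatrix}.
```

Because $L$ has three rows, the resulting F test has three numerator degrees of freedom. The grade-level contrast works the same way with `grade09`, `grade10` and `grade11`.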

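### Why use METHOD=RSPL for PROC GLIMMIX

The reason for using the R method (residual, i.e., restricted, pseudo-likelihood) is that the alternative M method (maximum pseudo-likelihood, METHOD=MSPL) can produce biased covariance parameter estimates (the level-2 variance in our application). The bias is a real threat when the number of group units is relatively small.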
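
The simplest illustration of this bias comes from ordinary linear regression rather than from GLIMMIX’s pseudo-likelihood setting, so take it as an analogy only. With $n$ observations, $p$ fixed-effect parameters, and residual sum of squares $\mathrm{RSS}$ (all notation introduced here):

```latex
\hat{\sigma}^2_{\mathrm{ML}} = \frac{\mathrm{RSS}}{n},
\qquad
\hat{\sigma}^2_{\mathrm{REML}} = \frac{\mathrm{RSS}}{n - p},
\qquad
E\!\left[\hat{\sigma}^2_{\mathrm{ML}}\right] = \frac{n - p}{n}\,\sigma^2 < \sigma^2 .
```

The maximum likelihood estimate ignores the degrees of freedom spent on the fixed effects and is therefore too small on average; the shortfall shrinks as $n$ grows. The same logic applies to level-2 variances, where the relevant sample size is the number of group units.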

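### PROC GLIMMIX’s option: lsmeans group / ilink diff;

    proc glimmix data=asdf METHOD=RSPL;
      class CAMPUS_14 subgroup;
      model y=x1 x2 x3 subgroup
        /dist=binomial link=logit s ddfm=kr;
      /* The LSMEANS effect must be a CLASS effect that appears in the MODEL statement
         (here, subgroup). ILINK adds estimates on the probability scale;
         DIFF requests pairwise differences among the LS-means. */
      lsmeans subgroup / ilink diff;
      ods output ModelInfo=x1var1 ParameterEstimates=x2var1 CovParms=x3var1
        Diffs=DIF_RESULT1 LSMeans=LS1;
    run;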
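
With `link=logit`, the LS-means are estimated on the logit scale, and the ILINK option asks for them to be converted back to probabilities through the inverse link. The data step below sketches that conversion by hand. The two logit values and the data set name `ilink_demo` are hypothetical and are not output from the model above.

```sas
/* Sketch: what the inverse logit does to an LS-mean. Values are hypothetical. */
data ilink_demo;
  input subgroup $ lsmean_logit;
  prob = 1 / (1 + exp(-lsmean_logit));   /* inverse logit: logit scale -> probability */
  datalines;
A 0.4
B -0.2
;
run;

proc print data=ilink_demo;
run;
```

A logit of 0 corresponds to a probability of 0.5, so positive LS-means map to probabilities above 0.5 and negative ones to probabilities below 0.5.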