R mutate case_when NOT WORKING (Solved)

For some reason, only the part in bold works, and the result is either 6 or NULL. I think each line overwrites the result of the previous one, so only the final 6 survives.

wholedata %>%
mutate(year_ID = case_when(
REF_DATE >= year1start & REF_DATE <= year1end ~1,
REF_DATE >= year2start & REF_DATE <= year2end ~2,
REF_DATE >= year3start & REF_DATE <= year3end ~3,
REF_DATE >= year4start & REF_DATE <= year4end ~4,
REF_DATE >= year5start & REF_DATE <= year5end ~5,
REF_DATE >= year6start & REF_DATE <= year6end ~6,
TRUE ~ -9)-> wholedata2

 

Solved. A closing ) was missing on the last line.

wholedata %>%
mutate(year_ID = case_when(
REF_DATE >= year1start & REF_DATE <= year1end ~1,
REF_DATE >= year2start & REF_DATE <= year2end ~2,
REF_DATE >= year3start & REF_DATE <= year3end ~3,
REF_DATE >= year4start & REF_DATE <= year4end ~4,
REF_DATE >= year5start & REF_DATE <= year5end ~5,
REF_DATE >= year6start & REF_DATE <= year6end ~6,
TRUE ~ -9)) -> wholedata2

SAS PROC GLIMMIX example

The following is example syntax for PROC GLIMMIX.  I ran it on a fake dataset, so the results are also fake.  The outcome is an interval variable and the model is linear (not a non-linear model such as logistic regression).  The random statement is what makes this model "multilevel."  Level-1 units are students and level-2 units are schools (identified by nces_school_name).  The intercept is estimated as the grand average of the school-specific intercepts.

This multilevel model differs from an OLS model because it treats the intercept as a random effect: school-specific intercepts are derived as random effects, and their grand average is reported as the intercept.

proc glimmix data=temp1 ;
class nces_school_name ;
model y=x1 x2 x3/
solution ddfm=kr dist=normal link=identity ;
random intercept /subject=nces_school_name;
run;

The main result table you will need to look at is the following.  You can interpret this just like you would interpret the OLS regression result.

Solutions for Fixed Effects
Effect      Estimate    Standard Error   DF     t Value   Pr > |t|
Intercept   0.5062      0.01600          1040   31.64     <.0001
x1          0.005776    0.01717          3483   0.34      0.7365
x2          0.000465    0.01684          3486   0.03      0.9780
x3          -0.01117    0.01708          3487   -0.65     0.5133

Another table that you want to look at is this.  Residual (0.54) is Level-1 variance (student-level variance).  Intercept/nces_school_name (0.65) is Level-2 variance (school-level variance).  I made up these numbers.

Covariance Parameter Estimates
Cov Parm    Subject            Estimate   Standard Error
Intercept   nces_school_name   0.65       0.0181
Residual                       0.54       0.2002

I will focus on the first coefficient table when discussing results; however, I would also report information related to the second table.  I would report the following information:

ICC (intraclass correlation):

level2 variance / (level1+level2 variance)

That is:

0.65 / (0.65 + 0.54) =0.546218

This shows the degree to which outcome variance is located between groups (schools, clusters) as opposed to within groups (among students in the same school).  Here, about 55% of the variance is between schools.
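The same arithmetic can be done in a data step, which avoids calculator mistakes.  This is only a minimal sketch: it types in the two made-up variance estimates from the table above, and the dataset name icc_calc and the variable names are my own, not something PROC GLIMMIX produces.

```sas
/* Sketch: ICC from the two covariance parameter estimates above */
data icc_calc;
  level2_var = 0.65;   /* Intercept / nces_school_name (school level) */
  level1_var = 0.54;   /* Residual (student level) */
  icc = level2_var / (level1_var + level2_var);
  put icc= 8.6;        /* writes the ICC to the log */
run;
```

The value written to the log should agree with the hand calculation above.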

I would also report "variance explained."

To get this, I need to run one additional model, the analysis-of-variance model, which has no covariates:

proc glimmix data=temp1 ;
class nces_school_name ;
model y=/
solution ddfm=kr dist=normal link=identity ;
random intercept /subject=nces_school_name;
run;
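If you would rather not retype the estimates from the output, the covariance table can be saved as a dataset with ODS OUTPUT.  The following is a hedged sketch, not a tested recipe: CovParms is, to my knowledge, the ODS table name PROC GLIMMIX uses for the Covariance Parameter Estimates table (ODS TRACE ON will confirm it), and cov_anova and icc_anova are dataset names I made up.  Note that this computes the ICC for whichever model's table is captured, here the analysis-of-variance model.

```sas
/* Save the Covariance Parameter Estimates table as a dataset */
ods output CovParms=cov_anova;
proc glimmix data=temp1;
  class nces_school_name;
  model y= / solution ddfm=kr dist=normal link=identity;
  random intercept / subject=nces_school_name;
run;

/* Collapse the two rows (Intercept, Residual) into one row with the ICC */
data icc_anova;
  set cov_anova end=last;
  retain level1_var level2_var;
  if CovParm = "Intercept" then level2_var = Estimate;      /* school level  */
  else if CovParm = "Residual" then level1_var = Estimate;  /* student level */
  if last then do;
    icc = level2_var / (level1_var + level2_var);
    output;
  end;
  keep level1_var level2_var icc;
run;
```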

Let's take a look at the covariance table (I made up these values).

Covariance Parameter Estimates
Cov Parm    Subject            Estimate   Standard Error
Intercept   nces_school_name   0.92       0.181
Residual                       0.83       0.200

 

Let's combine this table with the other one in this way.

                   Analysis of variance model   Final model   Variance explained
Level 1 variance   0.83                         0.54          0.34939759
Level 2 variance   0.92                         0.65          0.293478261

 

 

Variance explained was calculated as:

(0.83-0.54)/0.83

and

(0.92-0.65)/0.92
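The two calculations can also be written as a small data step.  This is a sketch using the made-up variances from the two covariance tables; the dataset and variable names (var_explained, anova_level1, and so on) are hypothetical names I chose for illustration.

```sas
/* Sketch: proportion of variance explained at each level */
data var_explained;
  anova_level1 = 0.83;  final_level1 = 0.54;   /* Residual estimates  */
  anova_level2 = 0.92;  final_level2 = 0.65;   /* Intercept estimates */
  level1_explained = (anova_level1 - final_level1) / anova_level1;
  level2_explained = (anova_level2 - final_level2) / anova_level2;
  put level1_explained= 8.4 level2_explained= 8.4;  /* written to the log */
run;
```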

 

I would make the final table look like this.   I didn't round all the numbers, but you should.

                             Estimates   Standard error   p-value   statistical test
Intercept                    0.5062      0.016            <.0001    ***
x1                           0.00578     0.01717          0.7365
x2                           0.00047     0.01684          0.978
x3                           -0.0112     0.01708          0.5133
level-1 variance             0.83
level-2 variance             0.92
ICC                          0.55
Level-1 variance explained   0.35
Level-2 variance explained   0.29
Notes: *** if p < 0.001, ** if p < 0.01, * if p < 0.05.  The variances shown are from the analysis-of-variance model; the ICC is based on the final model's variances (0.65 and 0.54).

 

How to interpret coefficients in the table

In the table above, x1's coefficient is 0.00578.  This means that a one-unit increase in x1 is associated with an increase of 0.00578 in Y, holding x2 and x3 constant.   The associated p-value is 0.74, so the coefficient is not statistically significant at alpha=0.05.

What a one-unit increase means depends on how X is measured.  If X is height in meters (e.g., 1.5 meters, 1.7 meters), then a one-unit increase is 1 meter.  If X is a binary variable (0 or 1), then a one-unit increase means going from 0 to 1.

For my work, I almost always have a variable called TREATMENT, which is 1 if subjects received the treatment/intervention and 0 if they did not.  I call the coefficient for this variable the "program impact effect."  If, for example, the program impact effect is 0.25, I report that number and mention that other covariates are in the model, so the effect is adjusted for them.  An estimated program effect of 0.25 means that the adjusted difference in the outcome between the two groups (treatment vs. control) is 0.25.

I also want to provide a standardized version of the program effect.  I would run the same statistical model with the z-score version of the outcome variable.  To do this, I usually use proc standard:

data abc2; set abc1;
Z_Y=Y;
run;

proc standard data=abc2 out=abc3 mean=0 std=1;
var Z_Y;
run;

Another approach would be to code this by hand in a datastep.  If the mean of Y is 0.42 and the SD is 0.5:

data abc2; set abc1;
Z_Y= (Y - 0.42)/0.5 ;
run;
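A third option, which avoids typing the mean and SD by hand, is to let PROC SQL compute them.  This is a sketch under the assumption that abc1 contains a numeric variable Y; the output name abc2_sql is my own.  SAS will write a note to the log saying that summary statistics were remerged with the original data, which is expected here.

```sas
/* Sketch: z-score Y using its own sample mean and standard deviation */
proc sql;
  create table abc2_sql as
  select *,
         (Y - mean(Y)) / std(Y) as Z_Y   /* std() uses the n-1 denominator */
  from abc1;
quit;
```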

 

 

 

Using SAS datasteps and PROC SQL to conduct a t-test

/*break the data and apply ttest*/
data kaz20;set kaz2;
if engaged=0;
run;

data kaz21;set kaz2;
if engaged=1;
run;

proc sort data=kaz20;by subgroup variable;run;
proc sort data=kaz21;by subgroup variable;run;

proc sql;
create table new as
select
a.subgroup as school_level,
a.Variable as variable_name,
a.n as engage0_n,
a.Mean as engage0_mean,
a.StdDev as engage0_SD,
a.Min as engage0_min,
a.Max as engage0_max,
b.n as engage1_n,
b.Mean as engage1_mean,
b.StdDev as engage1_SD,
b.Min as engage1_min,
b.Max as engage1_max

from kaz20 a
join kaz21 b
on a.subgroup=b.subgroup and a.Variable=b.Variable
;
quit;

data new2;set new;
/*QC
engage1_n=100;
engage1_mean=.5;
engage1_SD=.2;
engage0_n=120;
engage0_mean=.55;
engage0_SD=.2;
*/

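/*t-test*/
length sig $3; /*so that "***" is not truncated*/
difference=engage1_mean-engage0_mean;
/*https://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm*/
/*despite the name, this is the unequal-variance (unpooled) standard error of the difference*/
POOLED_SE=sqrt( ( (engage1_SD*engage1_SD) / engage1_n ) + ( (engage0_SD*engage0_SD ) / engage0_n ) );

T_value=abs(difference)/POOLED_SE;

/*two-sided p-value from the normal distribution (large-sample approximation)*/
P_value=(1-probnorm(T_value))*2;
*if P_value < 0.1 then sig="t";
if P_value < 0.05 then sig="*";
if P_value < 0.01 then sig="**";
if P_value < 0.001 then sig="***";
/*a missing P_value compares as less than any number, so reset sig*/
if P_value =. then sig="";

run;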
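The p-value above comes from the normal distribution (probnorm), which is reasonable when both groups are large.  For smaller groups, a t distribution with Satterthwaite degrees of freedom is the usual companion to this unequal-variance standard error.  The following is a sketch of that variant, using the QC values from the commented-out block so it runs on its own; ttest_check, v1, v0, df, P_value_z and P_value_t are names I introduced for illustration.

```sas
/* Sketch: compare the normal-based and t-based two-sided p-values */
data ttest_check;
  engage1_n=100; engage1_mean=.5;  engage1_SD=.2;
  engage0_n=120; engage0_mean=.55; engage0_SD=.2;

  v1 = engage1_SD**2 / engage1_n;   /* squared SE of the mean, engaged=1 */
  v0 = engage0_SD**2 / engage0_n;   /* squared SE of the mean, engaged=0 */
  difference = engage1_mean - engage0_mean;
  T_value = abs(difference) / sqrt(v1 + v0);

  /* Satterthwaite approximation to the degrees of freedom */
  df = (v1 + v0)**2 / ( v1**2/(engage1_n-1) + v0**2/(engage0_n-1) );

  P_value_z = (1 - probnorm(T_value)) * 2;   /* same as the code above */
  P_value_t = (1 - probt(T_value, df)) * 2;  /* t distribution version */
  put T_value= df= P_value_z= P_value_t=;
run;
```

With group sizes this large the two p-values should be close; the difference matters more when a subgroup has only a handful of observations.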

MS-ACCESS "System resource exceeded"

https://www.windowscentral.com/assign-specific-processor-cores-apps-windows-10

https://social.technet.microsoft.com/Forums/office/en-US/5828a030-f27f-4001-9aaa-8d2951ef4169/quotsystem-resource-exceededquot-for-simple-select-query-in-access-2013?forum=officeitpro

 

So in my case (Windows 10):

1) Press Ctrl+Alt+Delete and choose Task Manager

2) Go to the Details tab and right-click on MSACCESS.EXE

3) Choose Set affinity

As I understand it, this setting controls which processor cores Access is allowed to run on; it does not by itself give Access more memory.

I have not tested this yet.