Anaonda
R mutate case_when NOT WORKING (Solved)
なぜか、太字のところしか機能しておらず、結果は６か、NULLになってしまいます。行ごとに、前の結果が消されていっているために最後の６しか残らないのだと思います。
wholedata %>%
mutate(year_ID = case_when(
REF_DATE >= year1start & REF_DATE <= year1end ~1,
REF_DATE >= year2start & REF_DATE <= year2end ~2,
REF_DATE >= year3start & REF_DATE <= year3end ~3,
REF_DATE >= year4start & REF_DATE <= year4end ~4,
REF_DATE >= year5start & REF_DATE <= year5end ~5,
REF_DATE >= year6start & REF_DATE <= year6end ~6,
TRUE ~ -9)-> wholedata2
解決しました。最後の行の）が足りませんでした。
wholedata %>%
mutate(year_ID = case_when(
REF_DATE >= year1start & REF_DATE <= year1end ~1,
REF_DATE >= year2start & REF_DATE <= year2end ~2,
REF_DATE >= year3start & REF_DATE <= year3end ~3,
REF_DATE >= year4start & REF_DATE <= year4end ~4,
REF_DATE >= year5start & REF_DATE <= year5end ~5,
REF_DATE >= year6start & REF_DATE <= year6end ~6,
TRUE ~ -9)) -> wholedata2
SAS PROC GLIMMIX example
The following is a proc glimmix example syntax. I ran it using a fake dataset, so the results are also fake. The outcome is an interval variable and the model is a linear model (not a non-linear model like the logistic regression model). The random statement makes this model "multilevel." Level-1 units are students and level-2 units are schools (IDs are nces_school_name). The intercept is being estimated as the grand average of school-specific intercepts.
This model is a multilevel model and different from the OLS model because it estimates intercept as random effects (school-specific intercepts are derived as random effects and the grand average of those are reported as the intercept).
proc glimmix data=temp1 ;
class nces_school_name ;
model y=x1 x2 x3/
solution ddfm=kr dist=normal link=identity ;
random intercept /subject=nces_school_name;
run;
The main result table you will need to look at is the following. You can interpret this just like you would interpret the OLS regression result.
Solutions for Fixed Effects | |||||
Effect | Estimate | Standard Error |
DF | t Value | Pr > |t| |
Intercept | 0.5062 | 0.01600 | 1040 | 31.64 | <.0001 |
x1 | 0.005776 | 0.01717 | 3483 | 0.34 | 0.7365 |
x2 | 0.000465 | 0.01684 | 3486 | 0.03 | 0.9780 |
x3 | -0.01117 | 0.01708 | 3487 | -0.65 | 0.5133 |
Another table that you want to look at is this. Residual (0.54) is Level-1 variance (student-level variance). Intercept/nces_school_name (0.65) is Level-2 variance (school-level variance). I made up these numbers.
Covariance Parameter Estimates | |||
Cov Parm | Subject | Estimate | Standard Error |
Intercept | nces_school_name | 0.65 | 0.0181 |
Residual | 0.54 | 0.2002 |
I will focus on the first coefficient table when discussing results; however, I would also report information related to the second table. I would report the following information:
ICC (intraclass correlation):
level2 variance / (level1+level2 variance)
That is:
0.65 / (0.65 + 0.54) =0.546218
This shows the degree to which outcome variance is located between groups (schools, clusters) as opposed to within individuals.
I would also report "variance explained."
I need to run the additional model, which is the analysis-of-variance model where I have no covariate in the model:
proc glimmix data=temp1 ;
class nces_school_name ;
model y=/
solution ddfm=kr dist=normal link=identity ;
random intercept /subject=nces_school_name;
run;
Let's take a look at the covariance table (I made up these values).
Covariance Parameter Estimates | |||
Cov Parm | Subject | Estimate | Standard Error |
Intercept | nces_school_name | 0.92 | 0.181 |
Residual | 0.83 | 0.200 |
Let's combine this table with the other one in this way.
Analysis of variance model | Final model | Variance explained | |
Level 1 variance | 0.83 | 0.54 | 0.34939759 |
Level 2 variance | 0.92 | 0.65 | 0.293478261 |
Variance explained was calculated as:
(0.83-0.54)/0.83
and
(0.92-0.65)/0.92
I would make the final table look look like this. I didn't round numbers, but you should.
Estimates | Standard error | p-value | statistical test | |
Intercept | 0.5062 | 0.016 | <.0001 | *** |
x1 | 0.00578 | 0.01717 | 0.7365 | |
x2 | 0.00047 | 0.01684 | 0.978 | |
x3 | -0.0112 | 0.01708 | 0.5133 | |
level-1 variance | 0.83 | |||
level-2 variance | 0.92 | |||
ICC | 0.55 | |||
Level-1 variance explained | 0.35 | |||
Level-2 variance explained | 0.3 | |||
Notes: *** if p < 0.001, ** if p < 0.01, * if p < 0.05. |
How to interpret coefficients in the table
In the table above, x1's coefficient is 0.00578. This means that one unit increase in X1 will lead to an increase of 0.00578 in Y. The p-value associated with this is 0.74. So the coefficient here is not statistically significant at alpha=0.05.
One unit increase in X1 means ... if X is about height in meters (e.g., 1.5 meter, 1.7 meter), then 1 meter is 1 unit increase. If X is a binary variable (0 or 1), then one unit increase means "0 to 1 increase".
For my work, I almost always have a variable called TREATMENT which is 1 if subjects received treatment/intervention and 0 if the subjects did not. The coefficient for this is called "program impact effect." If, for example, the program impact effect is 0.25, I just say that and I also mention that other covariates are in the model and the program impact effect is adjusted for these factors. If the estimated program effect is 0.25, it means that the difference between the two groups (treatment vs. control) is 0.25 in outcome.
I also want to provide a standardized version of the program effect. I would run the same statistical model with the z-score version of the outcome variable. To do this, I usually use proc standard:
data abc2; set abc1;
Z_Y=Y;
run;
proc standard data=abc2 out=abc3 mean=0 std=1;
var Z_Y;
run;
Another approach would be to code this by hand in a datastep. if the mean of Y is -0.42 and SD is 0.5:
data abc2; set abc1;
Z_Y= (Y - 0.42)/0.5 ;
run;
3 Akira Takasaki Licks From 1983
Using SAS datasteps and PROC SQL to conduct a t-test
/*break the data and apply ttest*/
data kaz20;set kaz2;
if engaged=0;
run;
data kaz21;set kaz2;
if engaged=1;
run;
proc sort data=kaz20;by subgroup variable;run;
proc sort data=kaz21;by subgroup variable;run;
proc sql;
create table new as
select
a.subgroup as school_level,
a.Variable as varriable_name,
a.n as engage0_n,
a.Mean as engage0_mean,
a.StdDev as engage0_SD,
a.Min as enagage0_min,
a.Max as enagage0_max,
b.n as engage1_n,
b.Mean as engage1_mean,
b.StdDev as engage1_SD,
b.Min as enagage1_min,
b.Max as enagage1_max
from kaz20 a
join kaz21 b
on a.subgroup=b.subgroup and a.Variable=b.Variable
;
data new2;set new;
/*QC
engage1_n=100;
engage1_mean=.5;
engage1_SD=.2;
engage0_n=120;
engage0_mean=.55;
engage0_SD=.2;
*/
/*t-test*/
difference=engage1_mean-engage0_mean;
/*https://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm*/
POOLED_SE=sqrt( ( (engage1_SD*engage1_SD) / engage1_n ) + ( (engage0_SD*engage0_SD ) / engage0_n ) );
T_value=abs(difference)/POOLED_SE;
P_value=(1-probnorm(T_value))*2;
*if P_value < 0.1 then sig="t";
if P_value < 0.05 then sig="* ";
if P_value < 0.01 then sig="** ";
if P_value < 0.001 then sig="***";
if P_value =. then sig="";
run;
Yahoo data breach
https://www.cnn.com/2019/10/15/business/yahoo-data-breach-settlement-trnd/index.html
Van Halen - "Eruption" Guitar (SOLO) Lesson #3 taught by Chelsea Constable
MS-ACCESS "System resource exceeded"
https://www.windowscentral.com/assign-specific-processor-cores-apps-windows-10
So in my case (Windows 10):
1) Control-alt-calc and choose Task manager
2) Go to Details tab and right-click on MSACCESS.EXE
3) Choose SET AFFINITY
I think this will let the computer to use most energy/memory/effort only on that software.
I have not tested this yet.
MS-ACCESS, not enough memory
An error message saying "there is not enough memory."
OPTIONS (You will get a small window Access Options)
General (left, top)
Disable hardware graphics acceleration -- to CLICK ON