R

String manipulation

library(stringr)

str_trim(" this is a test")

 

Calendar variables

# Preview students2 with str()
str(students2)

# Load the lubridate package
library(lubridate)

# Parse as date
dmy("17 Sep 2015")

# Parse as date and time (with no seconds!)
mdy_hm("July 15, 2012 12:56")

# Coerce dob to a date (with no time)
students2$dob <- ymd(students2$dob)

 

Change the variable type 

https://campus.datacamp.com/courses/cleaning-data-in-r/1828?ex=3

# Preview students with str()
str(students)

# Coerce Grades to character
students$Grades <- as.character(students$Grades)

# Coerce Medu to factor
students$Medu <- as.factor(students$Medu)

# Coerce Fedu to factor
students$Fedu <- as.factor(students$Fedu)
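
As a rough sketch of what these coercions do, the example below uses a small made-up data frame (students_demo and its values are hypothetical, not the course's students data). It also shows a common pitfall when going back from factor to number: as.numeric() on a factor returns the internal level codes, so convert to character first.

```r
# Hypothetical stand-in for the students data
students_demo <- data.frame(Grades = c(5, 6, 7), Medu = c(1, 4, 4))

# Coerce as in the notes above
students_demo$Grades <- as.character(students_demo$Grades)
students_demo$Medu <- as.factor(students_demo$Medu)

class(students_demo$Grades)   # "character"
levels(students_demo$Medu)    # "1" "4"

# Going back from factor to numeric
codes  <- as.numeric(students_demo$Medu)                 # level codes: 1 2 2
values <- as.numeric(as.character(students_demo$Medu))   # original values: 1 4 4
```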

 

 

Dates with lubridate

https://campus.datacamp.com/courses/cleaning-data-in-r/1828?ex=1

library(lubridate)

 

The use of na.rm=TRUE

mean(x,na.rm=TRUE)

The example above tells R to ignore missing values when computing the mean. Without na.rm=TRUE, mean() returns NA whenever x contains a missing value.
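
A small illustration with a made-up vector (x here is hypothetical) shows the difference:

```r
x <- c(2, 4, NA, 6)       # made-up vector with one missing value

mean(x)                   # NA
mean(x, na.rm = TRUE)     # 4
sum(is.na(x))             # 1 (number of missing values)
```

Other summary functions such as sum(), min(), max() and sd() accept the same na.rm argument.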

 

Combining two variables into one.

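library(dplyr)    # provides %>%, group_by(), summarize()
library(ggplot2)  # provides ggplot(), geom_col()
# Combine cohortid and treat into a single label
psmdata$cohort_t_status <- paste0(psmdata$cohortid, "-", psmdata$treat)
# Mean ACT composite score for each combined group
by_cohort <- psmdata %>%
  group_by(cohort_t_status) %>%
  summarize(meanACT = mean(act_composite))
ggplot(by_cohort, aes(x = cohort_t_status, y = meanACT)) + geom_col()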
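
The same idea can be tried without the real psmdata. The sketch below uses a made-up data frame (psm_demo and its values are hypothetical) and swaps in base R's aggregate() for the dplyr pipeline, so it runs with no extra packages.

```r
# Hypothetical stand-in for psmdata
psm_demo <- data.frame(
  cohortid      = c(1, 1, 1, 2, 2, 2),
  treat         = c(0, 0, 1, 0, 1, 1),
  act_composite = c(20, 22, 25, 18, 24, 26)
)

# Combine the two variables into one label such as "1-0"
psm_demo$cohort_t_status <- paste0(psm_demo$cohortid, "-", psm_demo$treat)

# Mean ACT score per combined group (base R alternative to group_by + summarize)
by_cohort_demo <- aggregate(act_composite ~ cohort_t_status,
                            data = psm_demo, FUN = mean)
by_cohort_demo
```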

 

 

attach() and detach()

http://www.statmethods.net/management/aggregate.html
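
A minimal sketch of how the two functions are typically used, with the built-in BOD dataset (the variable names mean_attached and mean_with are only for illustration):

```r
attach(BOD)                     # columns Time and demand become visible by name
mean_attached <- mean(demand)
detach(BOD)                     # remove BOD from the search path again

# with() gives the same access without changing the search path
mean_with <- with(BOD, mean(demand))
```

Many people prefer with() because a forgotten detach() can leave column names masking other objects.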

 

How to convert SAS and other data files into R files:

http://www.ats.ucla.edu/stat/r/faq/inputdata_R.htm

 

SQL in R

sqldf

Function example:

addition = function(num1,num2){
  answer = num1+num2
  return(answer)
}
addition(10,9)
addition(5,4)

 

Change the working directory (notice the slash is / not \ even on Windows)

setwd("C:/R")

You can check the current working directory by running:

getwd()
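
A short sketch of switching directories and coming back. It uses tempdir(), R's per-session temporary folder, so it should run on any machine; the file name in the last line is the one used later in these notes and the path is only built, not opened.

```r
old_dir <- getwd()            # remember where we started
setwd(tempdir())              # move to the temporary folder
getwd()                       # now reports the temporary folder
setwd(old_dir)                # go back

# file.path() joins path pieces with "/" on every platform
csv_path <- file.path("C:/R", "practice_data.csv")
csv_path                      # "C:/R/practice_data.csv"
```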

For more details, see:

How to set a working directory in R

 

Convert a CSV file into a readable dataset

temp <- read.csv(file="practice_data.csv",header=TRUE,sep=",")

Type the name of the dataset to print it (for a large dataset you will see many rows, though not all):

temp

This would do the same thing:

temp = read.csv(file="practice_data.csv",header=TRUE,sep=",")
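
To try read.csv() without the practice file, the sketch below writes a small made-up data frame to a temporary CSV and reads it back (csv_demo_path, demo_out and temp_demo are hypothetical names). header=TRUE and sep="," are already the defaults of read.csv(), so they can be omitted.

```r
csv_demo_path <- tempfile(fileext = ".csv")   # temporary file for the demo

# Made-up data written out as CSV
demo_out <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))
write.csv(demo_out, csv_demo_path, row.names = FALSE)

# Read it back; same as read.csv(file=..., header=TRUE, sep=",")
temp_demo <- read.csv(csv_demo_path)
str(temp_demo)
```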

 

You can list the objects (including datasets) in your current workspace with:

objects()

Extract a variable from a dataset (type temp2 afterwards to print it)

temp2 <- temp$height

Quick look at the data

str(temp)         # Describes the structure of the data.

head(temp)        # Prints the first 6 observations.

Get the descriptive summary of the data or variables
summary(temp)
summary(temp$weight)

mean(BOD$demand)        # BOD is a built-in dataset.

hist(BOD$demand)   # Creates a histogram.

hist(BOD$demand,breaks=4)   # Suggests about 4 bins; R may adjust the number.

boxplot(BOD$demand)
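
The breaks argument of hist() used above behaves differently depending on what it is given. As a sketch: a single number is only a suggested number of bins, while a vector sets the exact break points. plot=FALSE returns the computed bins without drawing anything; the object names are only for illustration.

```r
# Single number: a suggestion that R rounds to "pretty" break points
h_suggested <- hist(BOD$demand, breaks = 4, plot = FALSE)
h_suggested$breaks          # the break points R actually chose

# Vector: exact break points, here bins of width 4 from 8 to 20
h_fixed <- hist(BOD$demand, breaks = seq(8, 20, by = 4), plot = FALSE)
h_fixed$breaks              # 8 12 16 20
h_fixed$counts              # observations per bin
```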

 

Create new variables

newvar=BOD$demand*2
newvar
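
The line above creates a standalone vector. To store the new variable as a column of a dataset instead, one possible sketch is the following (bod_demo and demand_doubled are hypothetical names; a copy is used so the built-in BOD is left unchanged):

```r
bod_demo <- BOD                                  # work on a copy
bod_demo$demand_doubled <- bod_demo$demand * 2   # new column
head(bod_demo)
```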

 

How to implement an algorithm using R

Miscellaneous commands

library()       Lists the packages you have installed

search()        Lists the packages and environments currently attached

data()       Shows the available built-in datasets

If you type the name of a dataset, you can see it.  For example:

BOD

 

Reference

https://sites.google.com/site/webtextofr/  (Japanese)

Algebra and R http://atcm.mathandtech.org/EP2008/papers_full/2412008_14997.pdf