Hello, world!
This week in advanced statistics we covered Analysis of Variance (ANOVA). This week we were assigned two questions. Here is the first:
A researcher is interested in the effects of drug against stress reaction. She gives a reaction time test to three different groups of subjects: one group that is under a great deal of stress, one group under a moderate amount of stress, and a third group that is under almost no stress. The subjects of the study were instructed to take the drug test during their next stress episode and to report their stress on a scale of 1 to 10 (10 being most pain).
| High Stress | Moderate Stress | Low Stress |
| 10 | 8 | 4 |
| 9 | 10 | 6 |
| 8 | 6 | 6 |
| 9 | 7 | 4 |
| 10 | 8 | 2 |
| 8 | 8 | 2 |
Report on drug and stress level by using R. Provide a full summary report on the result of ANOVA testing and what does it mean. More specifically, report using the following R functions: Df, Sum, Sq Mean, Sq, F value, Pr(>F)
The R code used to solve this is as follows:
# Assigns the high stress data to a variable
high_stress <- c(10, 9, 8, 9, 10, 8)
# Assigns the moderate stress data to a variable
mod_stress <- c(8, 10, 6, 7, 8, 8)
#Assigns the low stress data to a variable
low_stress <- c(4, 6, 6, 4, 2, 2)
# Creates a df with all stress levels
stress_df <- as.data.frame(cbind(high_stress, mod_stress, low_stress))
# Stacks the data frame information so we can see every instance of high/low/moderate stress that pertains to a pain rating
stress_st <- stack(stress_df)
# Conducts a one-way analysis not assuming equal variances on the pain rating and stress level
stress_ow <- oneway.test(values ~ ind, data = stress_st)
# Calculates ANOVA
stress_aov <- aov(values ~ ind, data = stress_st)
summary(stress_aov)
When this code is run, we receive the following output:
Df Sum Sq Mean Sq F value Pr(>F)
ind 2 82.11 41.06 21.36 4.08e-05 ***
Residuals 15 28.83 1.92
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
So in this problem, our null hypothesis is that the drug has no impact on stress levels, and our alternative hypothesis is that the drug does impact stress levels. After conducting our ANOVA test, we can see that the P-Value is .0000408.
Since the P value associated with F is .0000408 and less that the critical value level of .05, we have evidence to reject the null hypothesis.
The second question is as follows:
The zelazo data (taken from textbook’s R package called ISwR) are in the form of a list of vectors, one for each of the four groups. Convert the data to a form suitable for the user of lm, and calculate the relevant test. Consider t tests comparing selected subgroups or obtained by combing groups.
2.1 Consider ANOVA test (one way or two-way) to this dataset (zelazo)
Despite my best efforts, I could not get the 8-week control variable dataset to cooperate with the others. Therefore, I simply compared the other 3 to each other. Here’s my R code:
# Loads the ISwR package
library(ISwR)
data("zelazo")
summary(zelazo)
attach(zelazo)
#Lines 7-13 formulate the data for analysis
d_active <- as.data.frame(zelazo$active)
d_passive <- as.data.frame(zelazo$passive)
d_none <- as.data.frame(zelazo$none)
d_ctr.8w <- as.data.frame(zelazo$ctr.8w)
d_ctr.8w[6,] <- NA
zelazo.df <- as.data.frame(c(d_active, d_passive, d_none, d_ctr.8w))
summary(zelazo.df)
# Fit linear models
a_p.lm<- lm(active ~passive, data = zelazo.df)
p_n.lm <- lm(passive ~none, data = zelazo.df)
n_a.lm<- lm(none~active, data = zelazo.df)
print(a_p.lm)
print(p_n.lm)
print(n_a.lm)
#Conduct t-tests
t.test(active, passive, data = zelazo.df)
t.test(passive, none, data = zelazo.df)
t.test(none, active, data = zelazo.df)
#ANOVA on lms
anova(a_p.lm)
anova(p_n.lm)
anova(n_a.lm)
And here is what the console outputs:
> #Lines 7-13 formulate the data for analysis
> d_active <- as.data.frame(zelazo$active)
> d_passive <- as.data.frame(zelazo$passive)
> d_none <- as.data.frame(zelazo$none)
> d_ctr.8w <- as.data.frame(zelazo$ctr.8w)
> d_ctr.8w[6,] <- NA
> zelazo.df <- as.data.frame(c(d_active, d_passive, d_none, d_ctr.8w))
> summary(zelazo.df)
zelazo.active zelazo.passive zelazo.none zelazo.ctr.8w
Min. : 9.000 Min. :10.00 Min. : 9.00 Min. :11.50
1st Qu.: 9.500 1st Qu.:10.12 1st Qu.:11.50 1st Qu.:11.50
Median : 9.625 Median :10.75 Median :11.75 Median :12.00
Mean :10.125 Mean :11.38 Mean :11.71 Mean :12.35
3rd Qu.: 9.938 3rd Qu.:11.56 3rd Qu.:12.75 3rd Qu.:13.25
Max. :13.000 Max. :15.00 Max. :13.25 Max. :13.50
NA's :1
> # Fit linear models
> a_p.lm<- lm(active ~passive, data = zelazo.df)
> p_n.lm <- lm(passive ~none, data = zelazo.df)
> n_a.lm<- lm(none~active, data = zelazo.df)
> print(a_p.lm)
Call:
lm(formula = active ~ passive, data = zelazo.df)
Coefficients:
(Intercept) passive
12.0439 -0.1687
> print(p_n.lm)
Call:
lm(formula = passive ~ none, data = zelazo.df)
Coefficients:
(Intercept) none
4.6287 0.5762
> print(n_a.lm)
Call:
lm(formula = none ~ active, data = zelazo.df)
Coefficients:
(Intercept) active
7.1445 0.4507
> #Conduct t-tests
> t.test(active, passive, data = zelazo.df)
Welch Two Sample t-test
data: active and passive
t = -1.2839, df = 9.3497, p-value = 0.2301
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.4399668 0.9399668
sample estimates:
mean of x mean of y
10.125 11.375
> t.test(passive, none, data = zelazo.df)
Welch Two Sample t-test
data: passive and none
t = -0.33603, df = 9.5489, p-value = 0.7441
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.557843 1.891177
sample estimates:
mean of x mean of y
11.37500 11.70833
> t.test(none, active, data = zelazo.df)
Welch Two Sample t-test
data: none and active
t = 1.8481, df = 9.9759, p-value = 0.09442
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.3262604 3.4929271
sample estimates:
mean of x mean of y
11.70833 10.12500
>
> #ANOVA on lms
> anova(a_p.lm)
Analysis of Variance Table
Response: active
Df Sum Sq Mean Sq F value Pr(>F)
passive 1 0.5114 0.51136 0.2054 0.6739
Residuals 4 9.9574 2.48935
> anova(p_n.lm)
Analysis of Variance Table
Response: passive
Df Sum Sq Mean Sq F value Pr(>F)
none 1 3.8353 3.8353 1.0855 0.3563
Residuals 4 14.1335 3.5334
> anova(n_a.lm)
Analysis of Variance Table
Response: none
Df Sum Sq Mean Sq F value Pr(>F)
active 1 2.1270 2.1270 0.9027 0.3959
Residuals 4 9.4251 2.3563
So, with our ANOVA tests, none of them have significant evidence to reject H0. Honestly, I’d like to do more research and see how I can run ANOVA on datasets with varying lengths, as I thought inserting a NA to fill the space would be effective. Unfortunately, this was not the case! Regardless of this shortcoming, I learned a lot, and that’s all for this week!