Analysis of Variance (ANOVA)

Hello, world!

This week in advanced statistics we covered Analysis of Variance (ANOVA). This week we were assigned two questions. Here is the first:

A researcher is interested in the effects of drug against stress reaction. She gives a reaction time test to three different groups of subjects: one group that is under a great deal of stress, one group under a moderate amount of stress, and a third group that is under almost no stress. The subjects of the study were instructed to take the drug test during their next stress episode and to report their stress on a scale of 1 to 10 (10 being most pain).

High Stress	Moderate Stress	Low Stress
10	8	4
9	10	6
8	6	6
9	7	4
10	8	2
8	8	2

Report on drug and stress level by using R. Provide a full summary report on the result of ANOVA testing and what does it mean. More specifically, report using the following R functions: Df, Sum, Sq Mean, Sq, F value, Pr(>F)

The R code used to solve this is as follows:

# Assigns the high stress data to a variable
high_stress <- c(10, 9, 8, 9, 10, 8)

# Assigns the moderate stress data to a variable
mod_stress <- c(8, 10, 6, 7, 8, 8)

#Assigns the low stress data to a variable
low_stress <- c(4, 6, 6, 4, 2, 2)

# Creates a df with all stress levels
stress_df <- as.data.frame(cbind(high_stress, mod_stress, low_stress))

# Stacks the data frame information so we can see every instance of high/low/moderate stress that pertains to a pain rating
stress_st <- stack(stress_df)

# Conducts a one-way analysis not assuming equal variances on the pain rating and stress level
stress_ow <- oneway.test(values ~ ind, data = stress_st)

# Calculates ANOVA
stress_aov <- aov(values ~ ind, data = stress_st)
summary(stress_aov)

When this code is run, we receive the following output:

                  Df Sum Sq     Mean Sq      F value        Pr(>F)    
ind               2  82.11       41.06         21.36       4.08e-05 ***
Residuals         15  28.83       1.92                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

So in this problem, our null hypothesis is that the drug has no impact on stress levels, and our alternative hypothesis is that the drug does impact stress levels. After conducting our ANOVA test, we can see that the P-Value is .0000408.

Since the P value associated with F is .0000408 and less that the critical value level of .05, we have evidence to reject the null hypothesis.

The second question is as follows:

The zelazo data (taken from textbook’s R package called ISwR) are in the form of a list of vectors, one for each of the four groups. Convert the data to a form suitable for the user of lm, and calculate the relevant test. Consider t tests comparing selected subgroups or obtained by combing groups.

2.1 Consider ANOVA test (one way or two-way) to this dataset (zelazo)

Despite my best efforts, I could not get the 8-week control variable dataset to cooperate with the others. Therefore, I simply compared the other 3 to each other. Here’s my R code:

# Loads the ISwR package
library(ISwR)
data("zelazo")
summary(zelazo)
attach(zelazo)

#Lines 7-13 formulate the data for analysis
d_active <- as.data.frame(zelazo$active)
d_passive <- as.data.frame(zelazo$passive)
d_none <- as.data.frame(zelazo$none)
d_ctr.8w <- as.data.frame(zelazo$ctr.8w)
d_ctr.8w[6,] <- NA
zelazo.df <- as.data.frame(c(d_active, d_passive, d_none, d_ctr.8w))
summary(zelazo.df)

# Fit linear models
a_p.lm<- lm(active ~passive, data = zelazo.df)
p_n.lm <- lm(passive ~none, data = zelazo.df)
n_a.lm<- lm(none~active, data = zelazo.df)
print(a_p.lm)
print(p_n.lm)
print(n_a.lm)

#Conduct t-tests
t.test(active, passive, data = zelazo.df)
t.test(passive, none, data = zelazo.df)
t.test(none, active, data = zelazo.df)

#ANOVA on lms
anova(a_p.lm)
anova(p_n.lm)
anova(n_a.lm)

And here is what the console outputs:

> #Lines 7-13 formulate the data for analysis
> d_active <- as.data.frame(zelazo$active)
> d_passive <- as.data.frame(zelazo$passive)
> d_none <- as.data.frame(zelazo$none)
> d_ctr.8w <- as.data.frame(zelazo$ctr.8w)
> d_ctr.8w[6,] <- NA
> zelazo.df <- as.data.frame(c(d_active, d_passive, d_none, d_ctr.8w))
> summary(zelazo.df)
 zelazo.active    zelazo.passive   zelazo.none    zelazo.ctr.8w  
 Min.   : 9.000   Min.   :10.00   Min.   : 9.00   Min.   :11.50  
 1st Qu.: 9.500   1st Qu.:10.12   1st Qu.:11.50   1st Qu.:11.50  
 Median : 9.625   Median :10.75   Median :11.75   Median :12.00  
 Mean   :10.125   Mean   :11.38   Mean   :11.71   Mean   :12.35  
 3rd Qu.: 9.938   3rd Qu.:11.56   3rd Qu.:12.75   3rd Qu.:13.25  
 Max.   :13.000   Max.   :15.00   Max.   :13.25   Max.   :13.50  
                                                  NA's   :1      
> # Fit linear models
> a_p.lm<- lm(active ~passive, data = zelazo.df)
> p_n.lm <- lm(passive ~none, data = zelazo.df)
> n_a.lm<- lm(none~active, data = zelazo.df)
> print(a_p.lm)
Call:
lm(formula = active ~ passive, data = zelazo.df)
Coefficients:
(Intercept)      passive  
    12.0439      -0.1687  
> print(p_n.lm)
Call:
lm(formula = passive ~ none, data = zelazo.df)
Coefficients:
(Intercept)         none  
     4.6287       0.5762  
> print(n_a.lm)
Call:
lm(formula = none ~ active, data = zelazo.df)
Coefficients:
(Intercept)       active  
     7.1445       0.4507  
> #Conduct t-tests 
> t.test(active, passive, data = zelazo.df)
Welch Two Sample t-test
data:  active and passive
t = -1.2839, df = 9.3497, p-value = 0.2301
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.4399668  0.9399668
sample estimates:
mean of x mean of y 
   10.125    11.375 
> t.test(passive, none, data = zelazo.df)
Welch Two Sample t-test
data:  passive and none
t = -0.33603, df = 9.5489, p-value = 0.7441
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.557843  1.891177
sample estimates:
mean of x mean of y 
 11.37500  11.70833 
> t.test(none, active, data = zelazo.df)
Welch Two Sample t-test
data:  none and active
t = 1.8481, df = 9.9759, p-value = 0.09442
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3262604  3.4929271
sample estimates:
mean of x mean of y 
 11.70833  10.12500 
> 
> #ANOVA on lms
> anova(a_p.lm)
Analysis of Variance Table
Response: active
          Df Sum Sq Mean Sq F value Pr(>F)
passive    1 0.5114 0.51136  0.2054 0.6739
Residuals  4 9.9574 2.48935               
> anova(p_n.lm)
Analysis of Variance Table
Response: passive
          Df  Sum Sq Mean Sq F value Pr(>F)
none       1  3.8353  3.8353  1.0855 0.3563
Residuals  4 14.1335  3.5334               
> anova(n_a.lm)
Analysis of Variance Table
Response: none
          Df Sum Sq Mean Sq F value Pr(>F)
active     1 2.1270  2.1270  0.9027 0.3959
Residuals  4 9.4251  2.3563

So, with our ANOVA tests, none of them have significant evidence to reject H0. Honestly, I’d like to do more research and see how I can run ANOVA on datasets with varying lengths, as I thought inserting a NA to fill the space would be effective. Unfortunately, this was not the case! Regardless of this shortcoming, I learned a lot, and that’s all for this week!

Brittany Bryce

Data Made Simple

Analysis of Variance (ANOVA)

Leave a comment Cancel reply

Share this:

Related

Leave a comment Cancel reply