Logistic Regression

Hello world! This week in advanced statistics we covered logistic regression. We were assigned two questions for the week, and the first is as follows:

10.1 Set up an additive model for the ashina data (see Exercise 4.6), containing additive effects of subjects, period, and treatment. Compare the results with those obtained from t tests.

Here is the R code I used for this question:

# Loads the ISwR package
library(ISwR)

# Pulls the ashina data set
data(ashina)
summary(ashina)

# Creates a "subject" column containing number 1-16 
ashina$subject <- factor(1:16)

# Attaches ashina so its columns can be used by name
attach(ashina)

# Data frame for the measurements taken on the active treatment
act <- data.frame(vas=vas.active, subject, treat=1, period=grp)

# Data frame for the measurements taken on placebo
plac <- data.frame(vas=vas.plac, subject, treat=0, period=grp)

# Stacks the two data frames into one long-format data frame
new_ashina <- rbind(act, plac)

# Attaches new_ashina so its columns can be used by name
attach(new_ashina)

# Fits a linear regression of treat on vas and period using new_ashina
lm_ashina <- lm(treat ~ vas + period, data = new_ashina)

# Runs Welch two-sample t-tests
t.test(treat, vas)
t.test(treat, period)

# Prints a summary of the linear regression model of new_ashina
summary(lm_ashina) 

And this is what R outputs when the code is run:

> t.test(treat, vas)
Welch Two Sample t-test
data:  treat and vas
t = 3.83, df = 31.006, p-value = 0.0005844
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 16.77138 54.97862
sample estimates:
mean of x mean of y 
    0.500   -35.375 
> t.test(treat, period)
Welch Two Sample t-test
data:  treat and period
t = -7, df = 61.936, p-value = 2.16e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.1248766 -0.6251234
sample estimates:
mean of x mean of y 
    0.500     1.375 
> summary(lm_ashina)
Call:
lm(formula = treat ~ vas + period, data = new_ashina)
Residuals:
     Min       1Q   Median       3Q      Max 
-0.73631 -0.42066 -0.08795  0.44323  0.80985 
Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.163771   0.284593   0.575   0.5694  
vas         -0.004301   0.001680  -2.560   0.0159 *
period       0.133889   0.180925   0.740   0.4652  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4743 on 29 degrees of freedom
Multiple R-squared:  0.1844, Adjusted R-squared:  0.1281 
F-statistic: 3.278 on 2 and 29 DF,  p-value: 0.05206

As we can see, the estimated coefficient for vas is -0.004301, and the estimated coefficient for period is 0.133889. Both Welch t-tests return p-values well below 0.05, and in the regression model the coefficient for vas is also significant at the 0.05 level (p = 0.0159). However, the coefficient for period in the linear regression model is not significant (p = 0.4652).
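
For comparison, here is a minimal sketch of the model set up the way the exercise words it, with vas as the response and additive effects of subject, period, and treatment, plus a paired t-test on the original wide-format data. It reuses the new_ashina data frame built above; one caveat worth flagging is that if grp records which period the active treatment was given in, the placebo rows would normally need the opposite period code (for example 3 - grp) rather than grp itself.

# Sketch: additive model with vas as the response, as the exercise describes
# (reuses new_ashina from above; the period coding for the placebo rows may
# need to be flipped to 3 - grp if grp marks when the active drug was given)
summary(lm(vas ~ subject + period + treat, data = new_ashina))

# Paired t-test on the original wide-format data: each subject contributes
# both an active and a placebo VAS score
t.test(ashina$vas.active, ashina$vas.plac, paired = TRUE)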

The second question is as follows:

Consider the following definitions:

a <- gl(2, 2, 8)
b <- gl(2, 4, 8)
x <- 1:8
y <- c(1:4, 8:5)
z <- rnorm(8)

Generate the model matrices for models z ~ a*b, z ~ a:b, etc. Discuss the implications. Carry out the model fits, and notice which models contain singularities. 
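
Before looking at the matrices, it helps to recall what gl() builds here: gl(n, k, length) creates a factor with n levels, each repeated in blocks of k, recycled to the requested length. A quick check of the two factors (not part of the assigned output):

# a alternates in blocks of two: 1 1 2 2 1 1 2 2
gl(2, 2, 8)
# b splits the observations into two halves: 1 1 1 1 2 2 2 2
gl(2, 4, 8)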

This is the R code I created for the question:

# Establishes variables
a <- gl(2, 2, 8)
b <- gl(2, 4, 8)
x <- 1:8
y <- c(1:4, 8:5)
z <- rnorm(8)
# Runs model matrices
model.matrix(z~a*b)
model.matrix(z~a:b)
model.matrix(z~a*x)
model.matrix(z~a:x)
model.matrix(z~a*y)
model.matrix(z~a:y)
model.matrix(z~b*x)
model.matrix(z~b:x)
model.matrix(z~b*y)
model.matrix(z~y*x)
model.matrix(z~y:x)

And here is the output R provides:

> # Establishes variables
> a <- gl(2, 2, 8)
> b <- gl(2, 4, 8)
> x <- 1:8
> y <- c(1:4, 8:5)
> z <- rnorm(8)
> # Runs model matrices
> model.matrix(z~a*b)
  (Intercept) a2 b2 a2:b2
1           1  0  0     0
2           1  0  0     0
3           1  1  0     0
4           1  1  0     0
5           1  0  1     0
6           1  0  1     0
7           1  1  1     1
8           1  1  1     1
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"
attr(,"contrasts")$b
[1] "contr.treatment"
> model.matrix(z~a:b)
  (Intercept) a1:b1 a2:b1 a1:b2 a2:b2
1           1     1     0     0     0
2           1     1     0     0     0
3           1     0     1     0     0
4           1     0     1     0     0
5           1     0     0     1     0
6           1     0     0     1     0
7           1     0     0     0     1
8           1     0     0     0     1
attr(,"assign")
[1] 0 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"
attr(,"contrasts")$b
[1] "contr.treatment"
> model.matrix(z~a*x)
  (Intercept) a2 x a2:x
1           1  0 1    0
2           1  0 2    0
3           1  1 3    3
4           1  1 4    4
5           1  0 5    0
6           1  0 6    0
7           1  1 7    7
8           1  1 8    8
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"
> model.matrix(z~a:x)
  (Intercept) a1:x a2:x
1           1    1    0
2           1    2    0
3           1    0    3
4           1    0    4
5           1    5    0
6           1    6    0
7           1    0    7
8           1    0    8
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"
> model.matrix(z~a*y)
  (Intercept) a2 y a2:y
1           1  0 1    0
2           1  0 2    0
3           1  1 3    3
4           1  1 4    4
5           1  0 8    0
6           1  0 7    0
7           1  1 6    6
8           1  1 5    5
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"
> model.matrix(z~a:y)
  (Intercept) a1:y a2:y
1           1    1    0
2           1    2    0
3           1    0    3
4           1    0    4
5           1    8    0
6           1    7    0
7           1    0    6
8           1    0    5
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"
> model.matrix(z~b*x)
  (Intercept) b2 x b2:x
1           1  0 1    0
2           1  0 2    0
3           1  0 3    0
4           1  0 4    0
5           1  1 5    5
6           1  1 6    6
7           1  1 7    7
8           1  1 8    8
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$b
[1] "contr.treatment"
> model.matrix(z~b:x)
  (Intercept) b1:x b2:x
1           1    1    0
2           1    2    0
3           1    3    0
4           1    4    0
5           1    0    5
6           1    0    6
7           1    0    7
8           1    0    8
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$b
[1] "contr.treatment"
> model.matrix(z~b*y)
  (Intercept) b2 y b2:y
1           1  0 1    0
2           1  0 2    0
3           1  0 3    0
4           1  0 4    0
5           1  1 8    8
6           1  1 7    7
7           1  1 6    6
8           1  1 5    5
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$b
[1] "contr.treatment"
> model.matrix(z~y*x)
  (Intercept) y x y:x
1           1 1 1   1
2           1 2 2   4
3           1 3 3   9
4           1 4 4  16
5           1 8 5  40
6           1 7 6  42
7           1 6 7  42
8           1 5 8  40
attr(,"assign")
[1] 0 1 2 3
> model.matrix(z~y:x)
  (Intercept) y:x
1           1   1
2           1   4
3           1   9
4           1  16
5           1  40
6           1  42
7           1  42
8           1  40
attr(,"assign")
[1] 0 1

The matrices for the two factor-only models (a*b and a:b) came out in the familiar 0/1 dummy-coding format; those columns just record which combination of a and b each observation falls in, since model.matrix() only builds the design matrix and does not test anything. Once the numeric variables x and y enter, the columns hold the actual covariate values, and the interaction columns (a2:x, b2:y, y:x, and so on) are simply products of the corresponding columns. Truth be told, I still find some of the later matrices hard to interpret, and the exercise also asks us to carry out the model fits and check for singularities, which my code above does not do.
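
To fill that gap, here is a minimal sketch of the model fits the exercise asks for (the set.seed() call is not part of the exercise; it only makes the random z reproducible). The case to watch is z ~ a:b: with an intercept present, its four 0/1 interaction columns sum to the intercept column, so the design matrix is singular and lm() leaves one coefficient as NA, which summary() reports as "not defined because of singularities".

# Sketch: carrying out the model fits, the last step the exercise asks for
set.seed(1)            # not part of the exercise; just makes z reproducible
a <- gl(2, 2, 8)
b <- gl(2, 4, 8)
x <- 1:8
y <- c(1:4, 8:5)
z <- rnorm(8)

summary(lm(z ~ a*b))   # full factorial: four parameters, no singularity
summary(lm(z ~ a:b))   # intercept plus four dummy products: one coefficient is NA
summary(lm(z ~ a*x))   # factor-by-covariate interaction: no singularity
summary(lm(z ~ y*x))   # two numeric covariates and their product: no singularity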

Anyways, that’s all for this week and I hope everyone has a wonderful week!
