Homework 6

Building Dataset

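# Simulated vacuole-formation rates, 20 observations per ink concentration.
# No seed is set, so the values differ on every run.
conc1 <- rnorm(20, 1.1, 0.3)  # 1% ink: mean 1.1, SD 0.3
conc5 <- rnorm(20, 1.8, 0.5)  # 5% ink: mean 1.8, SD 0.5

# One row per observation: the ink concentration (group label) and its rate
conc_df <- data.frame(concentration = rep(c("1%", "5%"), each = 20),
                      rate = c(conc1, conc5))
print(conc_df)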

##    concentration      rate
## 1             1% 0.2065922
## 2             1% 0.8847402
## 3             1% 1.2365687
## 4             1% 1.2464606
## 5             1% 1.0009068
## 6             1% 0.9156230
## 7             1% 1.4757131
## 8             1% 0.8147900
## 9             1% 1.1092558
## 10            1% 0.9937252
## 11            1% 0.5940598
## 12            1% 0.8155875
## 13            1% 1.0998652
## 14            1% 1.5698424
## 15            1% 0.8260020
## 16            1% 1.0070956
## 17            1% 0.8116350
## 18            1% 1.3233728
## 19            1% 0.9917221
## 20            1% 1.3565293
## 21            5% 1.8871436
## 22            5% 2.0625917
## 23            5% 1.1861448
## 24            5% 1.2144347
## 25            5% 1.7686388
## 26            5% 2.1600673
## 27            5% 2.4501962
## 28            5% 1.9599116
## 29            5% 2.3630783
## 30            5% 1.0538510
## 31            5% 2.1310850
## 32            5% 1.3864500
## 33            5% 1.8962209
## 34            5% 1.6012915
## 35            5% 2.5992420
## 36            5% 1.8872445
## 37            5% 1.9607263
## 38            5% 1.6425056
## 39            5% 1.7769087
## 40            5% 1.4680958

The data were simulated to reflect the difference in the rate of vacuole formation between Tetrahymena in 1% ink and those in 5% ink. Two vectors, conc1 and conc5, were filled with normally distributed values using rnorm(). Both samples contain 20 values, but the means and standard deviations differ to reflect how the two groups differ in reality: mean 1.1 and SD 0.3 for 1% ink, and mean 1.8 and SD 0.5 for 5% ink. The two vectors were combined into a data frame with two columns: the ink concentration and the corresponding rate.
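The same data-building steps are repeated in every section below, so they could be wrapped in a function. The sketch below uses a hypothetical helper, make_conc_df(), which is not part of the original assignment. Its defaults are the means and SDs used above, and n is the number of observations per group. Tying the label repetition to n keeps the labels and the rates the same length.

```r
# Hypothetical helper (not part of the original assignment): builds a data
# frame of simulated rates for the 1% and 5% ink groups, n values per group.
make_conc_df <- function(n = 20, mean1 = 1.1, mean5 = 1.8, sd1 = 0.3, sd5 = 0.5) {
  conc1 <- rnorm(n, mean1, sd1)
  conc5 <- rnorm(n, mean5, sd5)
  data.frame(
    # Labels are repeated n times so they always line up with the rates
    concentration = rep(c("1%", "5%"), each = n),
    rate = c(conc1, conc5)
  )
}

set.seed(1)  # arbitrary seed, only so the example is reproducible
df5 <- make_conc_df(n = 5)
table(df5$concentration)  # 5 observations in each group
```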

Initial Analysis

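# One-way ANOVA: does the rate differ between ink concentrations?
conc_ANOVA <- aov(rate ~ concentration, data = conc_df)
# summary() returns a list holding one ANOVA table; take the p-value from
# the first row (the concentration term) of its "Pr(>F)" column
p_value <- summary(conc_ANOVA)[[1]][[1, "Pr(>F)"]]
print(p_value)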

## [1] 3.507837e-08

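A one-way ANOVA was run on the data using aov() to test whether the rate differs by concentration. The p-value was extracted from the ANOVA summary table and printed. At about 3.5e-08, it is far below the conventional 0.05 threshold, so the simulated 1% and 5% groups differ significantly in rate.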
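With only two groups, a one-way ANOVA gives the same p-value as a two-sample t-test that assumes equal variances, and the F statistic equals the squared t statistic. The sketch below checks this on a freshly simulated dataset. The seed is arbitrary and only makes the example reproducible.

```r
set.seed(2)  # arbitrary seed for reproducibility
conc1 <- rnorm(20, 1.1, 0.3)
conc5 <- rnorm(20, 1.8, 0.5)
conc_df <- data.frame(concentration = rep(c("1%", "5%"), each = 20),
                      rate = c(conc1, conc5))

# ANOVA table: first row is the concentration term
aov_table <- summary(aov(rate ~ concentration, data = conc_df))[[1]]
f_value <- aov_table[1, "F value"]
p_aov   <- aov_table[1, "Pr(>F)"]

# Pooled-variance (equal-variance) two-sample t-test on the same data
t_res <- t.test(rate ~ concentration, data = conc_df, var.equal = TRUE)
p_t   <- t_res$p.value

c(F = f_value, t_squared = unname(t_res$statistic)^2)  # equal
c(p_aov = p_aov, p_t = p_t)                            # equal
```

The simulated groups have different SDs (0.3 and 0.5), so the equal-variance assumption shared by aov() and this t-test holds only approximately. Both t.test() with its default var.equal = FALSE (Welch's test) and oneway.test() relax that assumption.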

Manipulating Data

Sample Size

ss <- c(5, 10, 20, 40, 80)  # sample sizes per group to compare
for (i in seq_along(ss)) {
  conc1 <- rnorm(ss[i], 1.1, 0.3)
  conc5 <- rnorm(ss[i], 1.8, 0.5)
  # Repeat each label ss[i] times so labels and rates have the same length
  conc_df <- data.frame(concentration = rep(c("1%", "5%"), each = ss[i]),
                        rate = c(conc1, conc5))
  conc_ANOVA <- aov(rate ~ concentration, data = conc_df)
  p_value <- summary(conc_ANOVA)[[1]][[1, "Pr(>F)"]]
  print(p_value)
}

## [1] 1
## [1] 1
## [1] 3.45634e-07
## [1] 0.2714846
## [1] 0.9153569

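To test the effect of sample size on the p-value, the variable ss was first assigned a vector of five sample sizes (5, 10, 20, 40, and 80 per group). The for loop then simulated data and ran an ANOVA for each sample size, printing the p-values in the same order as ss. The group labels must be repeated ss[i] times rather than a fixed 20. When the label and rate columns differ in length, data.frame() silently recycles the shorter one if its length divides the longer one, instead of raising an error. The p-values printed above were produced with the labels hard-coded at each=20 and show this problem. At n = 5 and n = 10 the rates were recycled so that both groups held identical values, giving p = 1. At n = 40 and n = 80 the labels were recycled so that each group mixed 1% and 5% draws. Only the n = 20 result (3.46e-07) comes from correctly labeled data.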
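When the label column and the rate column differ in length, data.frame() recycles the shorter one if its length divides the longer one. This explains the p-values of 1 printed above for n = 5 and n = 10, where the labels were fixed at 20 per group. The sketch below reproduces the effect on one simulated n = 5 dataset, building it once with 40 hard-coded labels and once with labels matching n. The seed is arbitrary.

```r
set.seed(3)  # arbitrary seed, only for reproducibility
n <- 5
conc1 <- rnorm(n, 1.1, 0.3)
conc5 <- rnorm(n, 1.8, 0.5)

# Mismatched: 40 labels but only 10 rates, so data.frame() recycles the
# rates 4 times and both "groups" end up holding the same values
bad_df <- data.frame(concentration = rep(c("1%", "5%"), each = 20),
                     rate = c(conc1, conc5))
p_bad <- summary(aov(rate ~ concentration, data = bad_df))[[1]][1, "Pr(>F)"]

# Matched: each label repeated n times, giving 10 correctly labeled rows
good_df <- data.frame(concentration = rep(c("1%", "5%"), each = n),
                      rate = c(conc1, conc5))
p_good <- summary(aov(rate ~ concentration, data = good_df))[[1]][1, "Pr(>F)"]

c(rows_bad = nrow(bad_df), rows_good = nrow(good_df))
c(p_bad = p_bad, p_good = p_good)  # p_bad is (essentially) 1
```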

Mean

m <- c(1, 2, 3, 4)  # amounts added to both group means
for (i in seq_along(m)) {
  conc1 <- rnorm(20, 1.1 + m[i], 0.3)
  conc5 <- rnorm(20, 1.8 + m[i], 0.5)
  conc_df <- data.frame(concentration = rep(c("1%", "5%"), each = 20),
                        rate = c(conc1, conc5))
  conc_ANOVA <- aov(rate ~ concentration, data = conc_df)
  p_value <- summary(conc_ANOVA)[[1]][[1, "Pr(>F)"]]
  print(p_value)
}

## [1] 7.749754e-06
## [1] 8.205956e-05
## [1] 8.371917e-06
## [1] 0.001390881

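To test the effect of the mean on the p-value, the variable m was first assigned a vector of four increasing values (1 to 4). The for loop added each value to the original means of both groups, simulated new data, ran an ANOVA on each dataset, and printed the p-values in the same order as m. Because the same amount is added to both means, the expected difference between the groups stays at 0.7 (1.8 versus 1.1). The ANOVA depends only on that difference and on the spread, not on the overall level, so the variation among the printed p-values reflects random sampling rather than the shift itself.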
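Adding the same constant to both group means leaves the difference between them unchanged, so the ANOVA result should not depend on the shift. The variation among the p-values printed above comes from drawing new data in each iteration. This can be checked by shifting one dataset instead of resampling. In this sketch, p_of() is a hypothetical helper that extracts the p-value, and the shifts 0 to 4 mirror the values in m plus a zero baseline.

```r
set.seed(4)  # arbitrary seed for reproducibility
conc1 <- rnorm(20, 1.1, 0.3)
conc5 <- rnorm(20, 1.8, 0.5)
conc_df <- data.frame(concentration = rep(c("1%", "5%"), each = 20),
                      rate = c(conc1, conc5))

# Hypothetical helper: p-value of the concentration term
p_of <- function(df) summary(aov(rate ~ concentration, data = df))[[1]][1, "Pr(>F)"]

# Shift every rate by the same constant and re-run the ANOVA on the same draws
shifted_p <- sapply(c(0, 1, 2, 3, 4), function(shift) {
  shifted_df <- conc_df
  shifted_df$rate <- shifted_df$rate + shift
  p_of(shifted_df)
})
shifted_p  # identical up to floating-point rounding
```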

Standard Deviation

sd <- c(0.1, 0.2, 0.3, 0.4)  # increments added to each group's SD
for (i in seq_along(sd)) {    # iterate over the SD increments
  conc1 <- rnorm(20, 1.1, 0.3 + sd[i])
  conc5 <- rnorm(20, 1.8, 0.5 + sd[i])
  conc_df <- data.frame(concentration = rep(c("1%", "5%"), each = 20),
                        rate = c(conc1, conc5))
  conc_ANOVA <- aov(rate ~ concentration, data = conc_df)
  p_value <- summary(conc_ANOVA)[[1]][[1, "Pr(>F)"]]
  print(p_value)
}

## [1] 6.120779e-07
## [1] 0.01164283
## [1] 0.03994584
## [1] 0.01333954

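To test the effect of the standard deviation on the p-value, the variable sd was first assigned a vector of four increasing values (0.1 to 0.4). The for loop added each value to the original standard deviations of both groups, simulated new data, ran an ANOVA on each dataset, and printed the p-values in the same order as the increments in sd. Every printed p-value is larger than the one from the initial analysis. This is consistent with greater spread making the 0.7 difference between the means harder to detect, although the values do not rise strictly in order because each run uses a new random sample.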
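Each setting above is simulated only once, so sampling noise can hide the trend. One option, which is an assumption rather than part of the original assignment, is to repeat each simulation many times and summarize the p-values. The sketch below runs 200 replicates per SD increment and reports the median p-value and the fraction of runs with p < 0.05 (the estimated power). The names sim_p(), sd_add, and n_reps are hypothetical.

```r
set.seed(5)  # arbitrary seed for reproducibility
n_reps <- 200                         # replicates per setting (assumed)
sd_add <- c(0, 0.1, 0.2, 0.3, 0.4)    # 0 is the unmodified baseline

# Hypothetical helper: simulate one dataset with extra SD and return its p-value
sim_p <- function(extra_sd) {
  conc1 <- rnorm(20, 1.1, 0.3 + extra_sd)
  conc5 <- rnorm(20, 1.8, 0.5 + extra_sd)
  conc_df <- data.frame(concentration = rep(c("1%", "5%"), each = 20),
                        rate = c(conc1, conc5))
  summary(aov(rate ~ concentration, data = conc_df))[[1]][1, "Pr(>F)"]
}

# One row per SD increment: median p-value and estimated power
results <- t(sapply(sd_add, function(s) {
  p <- replicate(n_reps, sim_p(s))
  c(sd_added = s, median_p = median(p), power = mean(p < 0.05))
}))
results
```

Summarizing over replicates separates the systematic effect of the SD from run-to-run noise, at the cost of more computation (here 1,000 ANOVAs).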