Homework 7

Step 1

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

glimpse(iris)

## Rows: 150
## Columns: 5
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
## $ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
## $ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
## $ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…

The glimpse() function shows that there are 150 rows (observations) and 5 columns (variables).
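The row and column counts that glimpse() prints can also be checked with base R's dim(), nrow() and ncol(). This is a minimal sketch that uses only base R and the built-in iris data:

```r
# Base R cross-check of the dimensions that glimpse() reports
dim(iris)            # rows (observations), then columns (variables)
nrow(iris)           # number of observations
ncol(iris)           # number of variables
sapply(iris, class)  # column types, like the <dbl>/<fct> tags in glimpse()
```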

Step 2

iris1 <- iris%>%
  filter(Species == "virginica" | Species == "versicolor", Sepal.Length > 6, Sepal.Width > 2.5)
glimpse(iris1) #56 observations of 5 variables

## Rows: 56
## Columns: 5
## $ Sepal.Length <dbl> 7.0, 6.4, 6.9, 6.5, 6.3, 6.6, 6.1, 6.7, 6.1, 6.1, 6.4, 6.…
## $ Sepal.Width  <dbl> 3.2, 3.2, 3.1, 2.8, 3.3, 2.9, 2.9, 3.1, 2.8, 2.8, 2.9, 3.…
## $ Petal.Length <dbl> 4.7, 4.5, 4.9, 4.6, 4.7, 4.6, 4.7, 4.4, 4.0, 4.7, 4.3, 4.…
## $ Petal.Width  <dbl> 1.4, 1.5, 1.5, 1.5, 1.6, 1.3, 1.4, 1.4, 1.3, 1.2, 1.3, 1.…
## $ Species      <fct> versicolor, versicolor, versicolor, versicolor, versicolo…

Using filter(), iris was subset into a new dataset, iris1 (assigned with <-), that keeps only versicolor and virginica observations with a sepal length greater than 6 and a sepal width greater than 2.5. glimpse() was used to find that the dataset had 56 observations of 5 variables.
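The species condition can also be written with %in%, which tests membership in a set instead of chaining == comparisons with |. A sketch, assuming dplyr is loaded (it is attached by tidyverse in Step 1); `iris1_in` is a hypothetical name:

```r
library(dplyr)

# %in% replaces Species == "virginica" | Species == "versicolor";
# conditions separated by commas in filter() must all be TRUE (logical AND)
iris1_in <- iris %>%
  filter(Species %in% c("virginica", "versicolor"),
         Sepal.Length > 6,
         Sepal.Width > 2.5)

nrow(iris1_in)  # same row count as iris1
```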

Step 3

iris2 <- iris1%>%
  select(Species, Sepal.Length, Sepal.Width)
glimpse(iris2) #56 obs of 3 variables

## Rows: 56
## Columns: 3
## $ Species      <fct> versicolor, versicolor, versicolor, versicolor, versicolo…
## $ Sepal.Length <dbl> 7.0, 6.4, 6.9, 6.5, 6.3, 6.6, 6.1, 6.7, 6.1, 6.1, 6.4, 6.…
## $ Sepal.Width  <dbl> 3.2, 3.2, 3.1, 2.8, 3.3, 2.9, 2.9, 3.1, 2.8, 2.8, 2.9, 3.…

select() was used to keep three columns of iris1 (Species, Sepal.Length, and Sepal.Width) in a new dataset, iris2. glimpse() was used to find that the dataset had 56 observations of 3 variables.
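Instead of naming each sepal column, a tidyselect helper such as starts_with() can pick them by prefix. A sketch under the assumption that dplyr is loaded; `iris2_helper` is a hypothetical name, and the filter uses the %in% form, which is equivalent to the | version above:

```r
library(dplyr)

iris1 <- iris %>%
  filter(Species %in% c("virginica", "versicolor"), Sepal.Length > 6, Sepal.Width > 2.5)

# starts_with("Sepal") matches Sepal.Length and Sepal.Width,
# so this keeps the same three columns as listing them by name
iris2_helper <- iris1 %>%
  select(Species, starts_with("Sepal"))

names(iris2_helper)
```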

Step 4

iris3 <- iris2%>%
  arrange(desc(Sepal.Length))
head(iris3)

##     Species Sepal.Length Sepal.Width
## 1 virginica          7.9         3.8
## 2 virginica          7.7         3.8
## 3 virginica          7.7         2.6
## 4 virginica          7.7         2.8
## 5 virginica          7.7         3.0
## 6 virginica          7.6         3.0

The observations of iris2 were sorted by Sepal.Length using arrange(), with desc() specifying descending order. The ordered dataset was assigned to iris3, and its first 6 rows were viewed using head().
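Several rows share a Sepal.Length of 7.7, so their relative order is not set by the single sort key. Passing a second key to arrange() makes the order among tied rows explicit. A sketch, assuming dplyr is loaded; sorting ties by descending Sepal.Width is an illustrative choice, not part of the assignment:

```r
library(dplyr)

iris2 <- iris %>%
  filter(Species %in% c("virginica", "versicolor"), Sepal.Length > 6, Sepal.Width > 2.5) %>%
  select(Species, Sepal.Length, Sepal.Width)

# The second key only decides the order among rows that tie on the first
iris3_ties <- iris2 %>%
  arrange(desc(Sepal.Length), desc(Sepal.Width))

head(iris3_ties)
```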

Step 5

iris4 <- iris3%>%
  mutate(Sepal.Area=Sepal.Length*Sepal.Width)
glimpse(iris4) #56 obs of 4 variables

## Rows: 56
## Columns: 4
## $ Species      <fct> virginica, virginica, virginica, virginica, virginica, vi…
## $ Sepal.Length <dbl> 7.9, 7.7, 7.7, 7.7, 7.7, 7.6, 7.4, 7.3, 7.2, 7.2, 7.2, 7.…
## $ Sepal.Width  <dbl> 3.8, 3.8, 2.6, 2.8, 3.0, 3.0, 2.8, 2.9, 3.6, 3.2, 3.0, 3.…
## $ Sepal.Area   <dbl> 30.02, 29.26, 20.02, 21.56, 23.10, 22.80, 20.72, 21.17, 2…

iris4 was created from iris3 by using mutate() to add a new column, “Sepal.Area”, whose values are the product of Sepal.Length and Sepal.Width. glimpse() was used to find that the dataset had 56 observations of 4 variables.
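mutate() can add several columns in one call, and a later expression in the same call can use a column created earlier in it. A sketch, assuming dplyr is loaded; `Sepal.Area.Share` (each area as a fraction of the largest area) is a hypothetical column added only for illustration:

```r
library(dplyr)

iris_area <- iris %>%
  filter(Species %in% c("virginica", "versicolor"), Sepal.Length > 6, Sepal.Width > 2.5) %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width,
         # hypothetical column: reuses Sepal.Area created on the line above
         Sepal.Area.Share = Sepal.Area / max(Sepal.Area))

ncol(iris_area)  # the original 5 columns plus the 2 new ones
```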

Step 6

iris5 <- iris4%>%
  summarize(meanLength=mean(Sepal.Length), meanWidth=mean(Sepal.Width), ss=n())
print(iris5)

##   meanLength meanWidth ss
## 1   6.698214  3.041071 56

A summary of iris4 was made using summarize(), reporting the mean sepal length (meanLength), the mean sepal width (meanWidth), and the sample size (ss, counted with n()). The summary was then printed with print().
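The same three numbers can be cross-checked in base R with subset() and mean(), without dplyr. A minimal sketch; `sub` is a hypothetical name for the filtered data:

```r
# Base R cross-check of the summarize() results
sub <- subset(iris,
              Species %in% c("virginica", "versicolor") &
                Sepal.Length > 6 & Sepal.Width > 2.5)

c(meanLength = mean(sub$Sepal.Length),
  meanWidth  = mean(sub$Sepal.Width),
  ss         = nrow(sub))
```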

Step 7

iris6 <- iris4%>%
  group_by(Species)%>%
  summarize(meanLength=mean(Sepal.Length), meanWidth=mean(Sepal.Width), ss=n())
print(iris6)

## # A tibble: 2 × 4
##   Species    meanLength meanWidth    ss
##   <fct>           <dbl>     <dbl> <int>
## 1 versicolor       6.48      2.99    17
## 2 virginica        6.79      3.06    39

A summary of iris4 was made again with the same statistics, but calculated separately for each species rather than for all observations together. This was done by grouping the data by species with group_by() before calling summarize().
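dplyr 1.1.0 and later (the session above loads 1.1.4) also accept per-operation grouping through the `.by` argument, which avoids a separate group_by() step and leaves no grouping behind. A sketch, assuming that dplyr version; `by_species` is a hypothetical name:

```r
library(dplyr)

iris4 <- iris %>%
  filter(Species %in% c("virginica", "versicolor"), Sepal.Length > 6, Sepal.Width > 2.5)

# .by groups only for this summarize() call
by_species <- iris4 %>%
  summarize(meanLength = mean(Sepal.Length),
            meanWidth  = mean(Sepal.Width),
            ss         = n(),
            .by = Species)

by_species
```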

Step 8

irisFinal <- iris%>%
  filter(Species == "virginica" | Species == "versicolor", Sepal.Length > 6, Sepal.Width > 2.5)%>%
  select(Species, Sepal.Length, Sepal.Width)%>%
  arrange(desc(Sepal.Length))%>%
  mutate(Sepal.Area=Sepal.Length*Sepal.Width)%>%
  group_by(Species)%>%
  summarize(meanLength=mean(Sepal.Length), meanWidth=mean(Sepal.Width), ss=n())
print(irisFinal)

## # A tibble: 2 × 4
##   Species    meanLength meanWidth    ss
##   <fct>           <dbl>     <dbl> <int>
## 1 versicolor       6.48      2.99    17
## 2 virginica        6.79      3.06    39

The commands from Steps 2 through 7 were combined into a single pipeline, and the final dataset was assigned to irisFinal and printed. The result matches iris6 from Step 7.
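Because summarize() keeps only the grouping column and the new summary columns, the select(), arrange() and mutate() steps do not affect the final table. The sketch below checks this by comparing the full pipeline with a shorter one, using all.equal() so that tiny floating-point differences in mean() are tolerated. It assumes dplyr is loaded; `irisShort` is a hypothetical name:

```r
library(dplyr)

# Full pipeline from Step 8
irisFinal <- iris %>%
  filter(Species %in% c("virginica", "versicolor"), Sepal.Length > 6, Sepal.Width > 2.5) %>%
  select(Species, Sepal.Length, Sepal.Width) %>%
  arrange(desc(Sepal.Length)) %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width) %>%
  group_by(Species) %>%
  summarize(meanLength = mean(Sepal.Length), meanWidth = mean(Sepal.Width), ss = n())

# Same result without the steps that summarize() discards
irisShort <- iris %>%
  filter(Species %in% c("virginica", "versicolor"), Sepal.Length > 6, Sepal.Width > 2.5) %>%
  group_by(Species) %>%
  summarize(meanLength = mean(Sepal.Length), meanWidth = mean(Sepal.Width), ss = n())

all.equal(irisFinal, irisShort)
```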

Step 9

iris_longer <- iris%>%
  select(Species, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)%>%
  pivot_longer(cols = Sepal.Length:Petal.Width, names_to = "Measure", values_to = "Value")

print(iris_longer)

## # A tibble: 600 × 3
##    Species Measure      Value
##    <fct>   <chr>        <dbl>
##  1 setosa  Sepal.Length   5.1
##  2 setosa  Sepal.Width    3.5
##  3 setosa  Petal.Length   1.4
##  4 setosa  Petal.Width    0.2
##  5 setosa  Sepal.Length   4.9
##  6 setosa  Sepal.Width    3  
##  7 setosa  Petal.Length   1.4
##  8 setosa  Petal.Width    0.2
##  9 setosa  Sepal.Length   4.7
## 10 setosa  Sepal.Width    3.2
## # ℹ 590 more rows

iris was reshaped into a longer format using pivot_longer(). The columns to keep were first selected, and the four measurement columns to be lengthened were then specified with cols =. The names of those columns were placed in a new column, “Measure”, and their values in a new column, “Value”, while the Species column was kept as is. Each of the 150 original rows becomes 4 rows, giving 600 rows in total.
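pivot_wider() reverses this reshape, but only if each original row can be identified. iris has no ID column, so without one the values are not uniquely identified and pivot_wider() would warn and return list-columns. A sketch, assuming dplyr and tidyr are loaded; the `id` column and the object names are hypothetical:

```r
library(dplyr)
library(tidyr)

# Add a hypothetical row id so each original row can be rebuilt later
iris_long_id <- iris %>%
  mutate(id = row_number()) %>%
  pivot_longer(cols = Sepal.Length:Petal.Width,
               names_to = "Measure", values_to = "Value")

# One column per Measure, one row per (Species, id); then restore iris's
# column order, which also drops the helper id column
iris_wide_again <- iris_long_id %>%
  pivot_wider(names_from = Measure, values_from = Value) %>%
  select(all_of(names(iris)))
```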