library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
glimpse(iris)
## Rows: 150
## Columns: 5
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
## $ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
## $ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…
The glimpse() function shows that there are 150 rows (observations) and 5 columns (variables).
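As a quick cross-check, here is a minimal sketch that confirms the same row and column counts with base R's nrow(), ncol() and dim(). The variable names n_obs and n_vars are illustrative and not part of the original analysis.

```r
# Base R cross-check of the dimensions that glimpse() reports
n_obs  <- nrow(iris)  # rows = observations
n_vars <- ncol(iris)  # columns = variables
dim(iris)             # both at once, as c(rows, columns)
```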
iris1 <- iris%>%
filter(Species == "virginica" | Species == "versicolor", Sepal.Length > 6, Sepal.Width > 2.5)
glimpse(iris1) #56 observations of 5 variables
## Rows: 56
## Columns: 5
## $ Sepal.Length <dbl> 7.0, 6.4, 6.9, 6.5, 6.3, 6.6, 6.1, 6.7, 6.1, 6.1, 6.4, 6.…
## $ Sepal.Width <dbl> 3.2, 3.2, 3.1, 2.8, 3.3, 2.9, 2.9, 3.1, 2.8, 2.8, 2.9, 3.…
## $ Petal.Length <dbl> 4.7, 4.5, 4.9, 4.6, 4.7, 4.6, 4.7, 4.4, 4.0, 4.7, 4.3, 4.…
## $ Petal.Width <dbl> 1.4, 1.5, 1.5, 1.5, 1.6, 1.3, 1.4, 1.4, 1.3, 1.2, 1.3, 1.…
## $ Species <fct> versicolor, versicolor, versicolor, versicolor, versicolo…
Using filter(), iris was made into a new dataset, iris1 (assigned with <-), that included only versicolor and virginica observations with a sepal length greater than 6 and a sepal width greater than 2.5. glimpse() was used to find that the dataset had 56 observations of 5 variables.
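The species condition can also be written with %in%, which tests membership in a set instead of chaining == with |. This is a sketch of an equivalent filter, not the original code; iris1_alt is a hypothetical name used only here. Conditions separated by commas inside filter() are combined with AND.

```r
library(dplyr)

# %in% replaces Species == "virginica" | Species == "versicolor"
iris1_alt <- iris %>%
  filter(Species %in% c("versicolor", "virginica"),
         Sepal.Length > 6,   # comma-separated conditions are combined with AND
         Sepal.Width > 2.5)

nrow(iris1_alt)            # should match the 56 rows reported above
count(iris1_alt, Species)  # rows kept per species
```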
iris2 <- iris1%>%
select(Species, Sepal.Length, Sepal.Width)
glimpse(iris2) #56 obs of 3 variables
## Rows: 56
## Columns: 3
## $ Species <fct> versicolor, versicolor, versicolor, versicolor, versicolo…
## $ Sepal.Length <dbl> 7.0, 6.4, 6.9, 6.5, 6.3, 6.6, 6.1, 6.7, 6.1, 6.1, 6.4, 6.…
## $ Sepal.Width <dbl> 3.2, 3.2, 3.1, 2.8, 3.3, 2.9, 2.9, 3.1, 2.8, 2.8, 2.9, 3.…
select() was used to subset three columns (Species, Sepal.Length, and Sepal.Width) from iris1 into a new dataset, iris2. glimpse() was used to find that the dataset had 56 observations of 3 variables.
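When several columns share a prefix, a tidyselect helper such as starts_with() can name them in one step. A minimal sketch, assuming the same filter as above (iris1 is rebuilt so the block runs on its own; iris2_alt is a hypothetical name):

```r
library(dplyr)

# Rebuild iris1 so this sketch is self-contained
iris1 <- iris %>%
  filter(Species %in% c("versicolor", "virginica"), Sepal.Length > 6, Sepal.Width > 2.5)

# starts_with("Sepal") picks every column whose name begins with "Sepal"
iris2_alt <- iris1 %>%
  select(Species, starts_with("Sepal"))

names(iris2_alt)
```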
iris3 <- iris2%>%
arrange(desc(Sepal.Length))
head(iris3)
## Species Sepal.Length Sepal.Width
## 1 virginica 7.9 3.8
## 2 virginica 7.7 3.8
## 3 virginica 7.7 2.6
## 4 virginica 7.7 2.8
## 5 virginica 7.7 3.0
## 6 virginica 7.6 3.0
The observations of iris2 were ordered by Sepal.Length using arrange(), and desc() specified that the values be sorted in descending order. The ordered dataset was assigned to the variable iris3, and the first 6 rows were viewed using head().
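Several rows above share a Sepal.Length of 7.7. If a deterministic order among ties is wanted, arrange() accepts further sort keys. This sketch adds Sepal.Width as a secondary key; the choice of tie-breaker is an assumption for illustration, and iris3_ties is a hypothetical name.

```r
library(dplyr)

# Rebuild iris2 so this sketch is self-contained
iris2 <- iris %>%
  filter(Species %in% c("versicolor", "virginica"), Sepal.Length > 6, Sepal.Width > 2.5) %>%
  select(Species, Sepal.Length, Sepal.Width)

# Sort by Sepal.Length (largest first); rows tied on length are then
# ordered by Sepal.Width, also largest first
iris3_ties <- iris2 %>%
  arrange(desc(Sepal.Length), desc(Sepal.Width))

head(iris3_ties)
```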
iris4 <- iris3%>%
mutate(Sepal.Area=Sepal.Length*Sepal.Width)
glimpse(iris4) #56 obs of 4 variables
iris4 was made from iris3 by adding a new column, “Sepal.Area”, whose values are the product of each row's sepal length and sepal width. glimpse() was used to find that the dataset had 56 observations of 4 variables.
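Because mutate() works element-wise, each Sepal.Area value is simply that row's length times its width (treating the sepal as a rectangle). A small worked check on the first row of the sorted data, rebuilding iris4 so the sketch runs on its own:

```r
library(dplyr)

iris4 <- iris %>%
  filter(Species %in% c("versicolor", "virginica"), Sepal.Length > 6, Sepal.Width > 2.5) %>%
  select(Species, Sepal.Length, Sepal.Width) %>%
  arrange(desc(Sepal.Length)) %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width)  # row-by-row product

ncol(iris4)          # 3 original columns + Sepal.Area
iris4$Sepal.Area[1]  # first row: 7.9 * 3.8 = 30.02
```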
iris5 <- iris4%>%
summarize(meanLength=mean(Sepal.Length), meanWidth=mean(Sepal.Width), ss=n())
print(iris5)
## meanLength meanWidth ss
## 1 6.698214 3.041071 56
A summary of iris4 was made using summarize(), reporting the mean sepal length, the mean sepal width, and the sample size (ss, counted with n()). The summary was then printed with print().
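The three summarize() expressions have direct base R equivalents, which makes a handy sanity check. This sketch skips select(), arrange() and mutate() because they change neither the means nor the row count; iris_sub and check are hypothetical names.

```r
library(dplyr)

# Only the filter matters for these statistics
iris_sub <- iris %>%
  filter(Species %in% c("versicolor", "virginica"), Sepal.Length > 6, Sepal.Width > 2.5)

check <- c(meanLength = mean(iris_sub$Sepal.Length),
           meanWidth  = mean(iris_sub$Sepal.Width),
           ss         = nrow(iris_sub))  # n() counts rows the same way
check
```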
iris6 <- iris4%>%
group_by(Species)%>%
summarize(meanLength=mean(Sepal.Length), meanWidth=mean(Sepal.Width), ss=n())
print(iris6)
## # A tibble: 2 × 4
## Species meanLength meanWidth ss
## <fct> <dbl> <dbl> <int>
## 1 versicolor 6.48 2.99 17
## 2 virginica 6.79 3.06 39
A summary of iris4 was made again with the same statistics, but they were calculated separately for each species rather than for all observations together. This was achieved by grouping the data by species with group_by() before using summarize().
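dplyr 1.1.0 and later (1.1.4 is attached above) also accept a per-call .by argument in summarize(), which groups for that one operation and returns an ungrouped result. A sketch of the same per-species summary written that way; by_species is a hypothetical name. Unlike group_by(), .by keeps groups in order of first appearance, which here is also versicolor then virginica.

```r
library(dplyr)  # .by requires dplyr >= 1.1.0

iris_sub <- iris %>%
  filter(Species %in% c("versicolor", "virginica"), Sepal.Length > 6, Sepal.Width > 2.5)

# Group by Species for this summarize() call only
by_species <- iris_sub %>%
  summarize(meanLength = mean(Sepal.Length),
            meanWidth  = mean(Sepal.Width),
            ss         = n(),
            .by = Species)
by_species
```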
irisFinal <- iris%>%
filter(Species == "virginica" | Species == "versicolor", Sepal.Length > 6, Sepal.Width > 2.5)%>%
select(Species, Sepal.Length, Sepal.Width)%>%
arrange(desc(Sepal.Length))%>%
mutate(Sepal.Area=Sepal.Length*Sepal.Width)%>%
group_by(Species)%>%
summarize(meanLength=mean(Sepal.Length), meanWidth=mean(Sepal.Width), ss=n())
print(irisFinal)
## # A tibble: 2 × 4
## Species meanLength meanWidth ss
## <fct> <dbl> <dbl> <int>
## 1 versicolor 6.48 2.99 17
## 2 virginica 6.79 3.06 39
The commands above were combined into a single pipeline, and the final dataset was assigned to the variable irisFinal and printed. The result matches iris6.
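Note that arrange() and mutate() do not affect this particular result: means do not depend on row order, and Sepal.Area is never summarized. This sketch compares the full pipeline with a shortened one to illustrate that; full and short are hypothetical names, and the shortened form is an observation about this summary, not a recommendation to drop those steps in general.

```r
library(dplyr)

full <- iris %>%
  filter(Species %in% c("versicolor", "virginica"), Sepal.Length > 6, Sepal.Width > 2.5) %>%
  select(Species, Sepal.Length, Sepal.Width) %>%
  arrange(desc(Sepal.Length)) %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width) %>%
  group_by(Species) %>%
  summarize(meanLength = mean(Sepal.Length), meanWidth = mean(Sepal.Width), ss = n())

# Same summary without the sort and the unused Sepal.Area column
short <- iris %>%
  filter(Species %in% c("versicolor", "virginica"), Sepal.Length > 6, Sepal.Width > 2.5) %>%
  group_by(Species) %>%
  summarize(meanLength = mean(Sepal.Length), meanWidth = mean(Sepal.Width), ss = n())

all.equal(as.data.frame(full), as.data.frame(short))
```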
iris_longer <- iris%>%
select(Species, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)%>%
pivot_longer(cols = Sepal.Length:Petal.Width, names_to = "Measure", values_to = "Value")
print(iris_longer)
## # A tibble: 600 × 3
## Species Measure Value
## <fct> <chr> <dbl>
## 1 setosa Sepal.Length 5.1
## 2 setosa Sepal.Width 3.5
## 3 setosa Petal.Length 1.4
## 4 setosa Petal.Width 0.2
## 5 setosa Sepal.Length 4.9
## 6 setosa Sepal.Width 3
## 7 setosa Petal.Length 1.4
## 8 setosa Petal.Width 0.2
## 9 setosa Sepal.Length 4.7
## 10 setosa Sepal.Width 3.2
## # ℹ 590 more rows
iris was made longer using pivot_longer(). The columns to be used were first selected with select(), and the columns to be lengthened were then specified with the cols= argument. The names of those columns were put into the new column “Measure”, their values were put into the new column “Value”, and the Species column remained the same, so each of the 150 original rows became 4 rows (600 in total).
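Because every measurement name has the form Part.Dimension, pivot_longer() can also split the names into two columns with names_sep, which is interpreted like the sep argument of separate() (a regular expression, hence the escaped dot). A sketch of that variant; the column names Part and Dimension and the object name iris_split are illustrative choices, and cols = -Species is used here as another way to name all four measurement columns.

```r
library(dplyr)
library(tidyr)

iris_split <- iris %>%
  pivot_longer(cols = -Species,                    # every column except Species
               names_to = c("Part", "Dimension"),  # two new name columns
               names_sep = "\\.",                  # split "Sepal.Length" at the literal dot
               values_to = "Value")
iris_split
```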