Data Visualization in R

ggplot2 and the grammar of graphics

2020-08-22

Art by Allison Horst

Data Visualization with R

ggplot2 works well with the tidyverse and is friendly and powerful

Better plots are better communication

ggplot2: Elegant Data Visualizations in R

A Layered Grammar of Graphics

Data is mapped to aesthetics; Statistics and plot are linked

Sensible defaults; Infinitely extensible

Publication quality and beyond

https://nyti.ms/2jUp36n

http://bit.ly/2KSGZLu

# load the tidyverse, which provides as_tibble() and ggplot2
library(tidyverse)

# print prettily
as_tibble(mtcars)
## # A tibble: 32 x 11
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  21       6  160    110  3.9   2.62  16.5     0     1
##  2  21       6  160    110  3.9   2.88  17.0     0     1
##  3  22.8     4  108     93  3.85  2.32  18.6     1     1
##  4  21.4     6  258    110  3.08  3.22  19.4     1     0
##  5  18.7     8  360    175  3.15  3.44  17.0     0     0
##  6  18.1     6  225    105  2.76  3.46  20.2     1     0
##  7  14.3     8  360    245  3.21  3.57  15.8     0     0
##  8  24.4     4  147.    62  3.69  3.19  20       1     0
##  9  22.8     4  141.    95  3.92  3.15  22.9     1     0
## 10  19.2     6  168.   123  3.92  3.44  18.3     1     0
## # … with 22 more rows, and 2 more variables: gear <dbl>,
## #   carb <dbl>

# start with an empty plot
ggplot()

# add the data and map variables to the x and y axes
ggplot(mtcars,
  aes(x = mpg, y = hp))

# add a geom layer to draw the points
ggplot(mtcars,
  aes(x = mpg, y = hp)) +
  geom_point()

ggplot()

ggplot(data = <data>, mapping = aes(<mapping>)) +
  <geom_function>()

Add layers with +

Put + at the end of a line

Map aesthetics with aes()
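
As a minimal sketch, the template can be filled in with the built-in mtcars data used earlier. The object name p is only an illustrative choice and is not part of the original slides.

```r
library(ggplot2)

# Fill in the template:
#   <data>          -> mtcars
#   <mapping>       -> x = mpg, y = hp
#   <geom_function> -> geom_point
# The + ends the first line, so R knows the expression continues.
p <- ggplot(data = mtcars, mapping = aes(x = mpg, y = hp)) +
  geom_point()

# Printing the object draws the plot
p
```

If the + is moved to the start of the second line instead, R treats the first line as a complete plot and the point layer is never added to it.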

Your Turn 1

Read in the diabetes data.

Write and run the code from this slide to make a graph. Pay strict attention to spelling, capitalization, and parentheses!

ggplot(data = diabetes, mapping = aes(x = weight, y = hip)) +
  geom_point()

diabetes <- read_csv("diabetes.csv")
ggplot(data = diabetes, mapping = aes(x = weight, y = hip)) +
  geom_point()

Aesthetics: aes()

ggplot(data = <data>, mapping = aes(<mapping>)) +
  <geom_function>()

Aesthetics map the data to the plot

ggplot(mtcars, aes(x = mpg, y = hp, color = cyl)) + geom_point()
ggplot(mtcars, aes(x = mpg, y = hp, size = cyl)) + geom_point()
ggplot(mtcars, aes(x = mpg, y = hp, alpha = cyl)) + geom_point()
# shape needs a discrete variable, so convert cyl to a factor
ggplot(mtcars, aes(x = mpg, y = hp, shape = factor(cyl))) + geom_point()

Your Turn 2

Add color, size, alpha, and shape aesthetics to your graph using the gender variable. Experiment.

ggplot(
  data = diabetes,
  mapping = aes(x = weight, y = hip)
) +
  geom_point()

Try moving the mapping argument to geom_point(). Add in any aesthetics you found helpful.

ggplot(
  data = diabetes,
  mapping = aes(x = weight, y = hip, color = gender)
) +
  geom_point()

ggplot(
  data = diabetes,
  mapping = aes(x = weight, y = hip, size = gender)
) +
  geom_point()

ggplot(
  data = diabetes,
  mapping = aes(x = weight, y = hip, alpha = gender)
) +
  geom_point()

ggplot(
  data = diabetes,
  mapping = aes(x = weight, y = hip, shape = gender)
) +
  geom_point()

ggplot(data = diabetes) +
  geom_point(
    mapping = aes(
      x = weight,
      y = hip,
      color = gender,
      shape = gender
    )
  )

geoms

What shape does the data take?

geom_point()

geom_line()

geom_violin()

Check the cheatsheet!
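
A short sketch of how the geom decides the shape the data takes, assuming the built-in mtcars data and the economics data that ships with ggplot2. The object names are illustrative and not from the original slides.

```r
library(ggplot2)

# One set of data and aesthetic mappings ...
base <- ggplot(mtcars, aes(x = factor(cyl), y = mpg))

# ... drawn with two different geoms
points  <- base + geom_point()   # one point per car
violins <- base + geom_violin()  # distribution of mpg within each cyl group

# geom_line() connects observations in order of x,
# which suits a time series such as economics
lines <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()
```

Only the geom function changes between points and violins; the data and the mapping stay the same.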

Your Turn 3

Replace this scatterplot with one that draws boxplots.

ggplot(diabetes, aes(gender, chol)) + geom_point()

ggplot(diabetes, aes(gender, chol)) + geom_boxplot()

Your Turn 4

1. Make a histogram of the glyhb variable in diabetes.

2. Redo the glyhb plot as a density plot.

ggplot(diabetes, aes(x = glyhb)) +
  geom_histogram()

ggplot(diabetes, aes(x = glyhb)) +
  geom_density()

diabetes %>%
  ggplot(aes(x = frame)) +
  geom_bar()

# drop rows that contain missing values so NA is not drawn as its own bar
diabetes %>%
  drop_na() %>%
  ggplot(aes(x = frame)) +
  geom_bar()

Your Turn 5

Make a bar chart of frame colored by gender. Then, try it with the fill aesthetic instead of color.

diabetes %>%
  drop_na() %>%
  ______() +
  ______()

diabetes %>%
  drop_na() %>%
  ggplot(aes(x = frame, fill = gender)) +
  geom_bar()

Positions

geom_bar(position = "<POSITION>")

When we have aesthetics mapped, how are they positioned?

geom_bar: "dodge", "fill", "stack" (default)

geom_point: jitter
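
As a rough sketch of the bar positions listed above, the same mapping can be drawn three ways. This uses the built-in mtcars data rather than the workshop's diabetes data, and the object names are illustrative.

```r
library(ggplot2)

# Bars counted by number of cylinders, filled by transmission type (am)
bars <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am)))

stacked <- bars + geom_bar(position = "stack")  # default: groups on top of each other
dodged  <- bars + geom_bar(position = "dodge")  # groups side by side
filled  <- bars + geom_bar(position = "fill")   # stacked, rescaled to proportions
```

"stack" and "dodge" keep the counts on the y-axis, while "fill" makes every bar the same height so that only the proportions are compared.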

ggplot(mtcars, aes(x = factor(am), y = hp)) +
  geom_point()

ggplot(mtcars, aes(x = factor(am), y = hp)) +
  geom_point(position = "jitter")

# geom_jitter() is a shortcut that also lets you control the amount of jitter
ggplot(mtcars, aes(x = factor(am), y = hp)) +
  geom_jitter(width = .1, height = 0)

Your Turn 6

Take your code for the bar chart before (using the fill aesthetic). Experiment with different position values: "dodge", "fill", "stack"

diabetes %>%
  drop_na() %>%
  ggplot(aes(x = frame, fill = gender)) +
  geom_bar(position = "stack")

diabetes %>%
  drop_na() %>%
  ggplot(aes(x = frame, fill = gender)) +
  geom_bar(position = "dodge")

diabetes %>%
  drop_na() %>%
  ggplot(aes(x = frame, fill = gender)) +
  geom_bar(position = "fill")

Mapping vs setting

Cool, but how do I just make everything blue?

geom_point(aes(x, y), color = "blue")

geom_bar(aes(x), fill = "blue")

To set a color, put it outside aes()

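# setting: every point is blue, overriding the color = cyl mapping
ggplot(mtcars, aes(x = mpg, y = hp, color = cyl)) +
  geom_point(color = "blue")

# mapping: "blue" is treated as data, so the points get a default color
# and a legend entry labelled "blue"
ggplot(mtcars, aes(x = mpg, y = hp, color = cyl)) +
  geom_point(aes(color = "blue"))

# for bars, color sets the outline
ggplot(mtcars, aes(x = cyl)) +
  geom_bar(color = "blue")

# fill sets the inside of the bars
ggplot(mtcars, aes(x = cyl)) +
  geom_bar(fill = "blue")

ggplot(mtcars, aes(x = cyl)) +
  geom_bar(color = "red", fill = "blue")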

Adding layers

ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) +
  <GEOM_FUNCTION>() +
  <GEOM_FUNCTION>() +
  <SCALE_FUNCTION>() +
  <THEME_FUNCTION>()
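
One possible way to fill in the layer template, sketched with the built-in mtcars data. The choice of geom_smooth(method = "lm"), the axis title, and theme_minimal() are illustrative assumptions, not taken from the slides.

```r
library(ggplot2)

p <- ggplot(data = mtcars, mapping = aes(x = mpg, y = hp)) +
  geom_point() +                                   # geom layer 1: the raw data
  geom_smooth(method = "lm", se = FALSE) +         # geom layer 2: a fitted line
  scale_x_continuous(name = "Miles per gallon") +  # scale: x-axis title
  theme_minimal()                                  # theme: overall look

p
```

Both geoms inherit the data and the mapping from ggplot(), so each new layer only needs to state what it changes.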

Your Turn 7

Run the code after every change you make.

1. Predict what this code will do. Then run it.

2. Add a linetype aesthetic for gender. Run it again.

3. Set the color of geom_smooth() to "black"

4. Add se = FALSE to the geom_smooth()

5. It's hard to see the lines well now. How about setting alpha = .2 in geom_point()?

6. Jitter the points. You can either change the geom or change the position argument.

7. Add another layer, theme_bw(). Remember to use +.

ggplot(diabetes, aes(weight, hip)) +
  geom_point() +
  geom_smooth()

# 2. add a linetype aesthetic for gender
ggplot(diabetes, aes(weight, hip)) +
  geom_point() +
  geom_smooth(aes(linetype = gender))

# 3. set the color of geom_smooth() to "black"
ggplot(diabetes, aes(weight, hip)) +
  geom_point() +
  geom_smooth(aes(linetype = gender), color = "black")

# 4. add se = FALSE to remove the confidence bands
ggplot(diabetes, aes(weight, hip)) +
  geom_point() +
  geom_smooth(aes(linetype = gender), color = "black", se = FALSE)

# 5. make the points more transparent
ggplot(diabetes, aes(weight, hip)) +
  geom_point(alpha = .2) +
  geom_smooth(aes(linetype = gender), color = "black", se = FALSE)

# 6. jitter the points
ggplot(diabetes, aes(weight, hip)) +
  geom_jitter(alpha = .2) +
  geom_smooth(aes(linetype = gender), color = "black", se = FALSE)

# 7. add theme_bw() as another layer
ggplot(diabetes, aes(weight, hip)) +
  geom_jitter(alpha = .2) +
  geom_smooth(aes(linetype = gender), color = "black", se = FALSE) +
  theme_bw()

Facets

Easy peasy panels

facet_grid()

facet_wrap()

facet_grid(rows = vars(x), cols = vars(y))

facet_wrap(vars(x))
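
To make the difference between the two facet functions concrete, here is a small sketch using the built-in mtcars data. The variables cyl and am stand in for x and y, and the object names are illustrative.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point()

# facet_grid(): one variable defines the rows, another the columns
grid <- base + facet_grid(rows = vars(cyl), cols = vars(am))

# facet_wrap(): panels for one variable, wrapped into rows as needed
wrapped <- base + facet_wrap(vars(cyl))
```

facet_grid() always lays out the full rows-by-columns grid, even for combinations with no data, while facet_wrap() only draws the panels that exist.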

diamonds %>%
  ggplot(aes(x = carat, y = price)) +
  geom_point() +
  facet_grid(rows = vars(cut), cols = vars(clarity))

Your Turn 8

Use a facet grid by gender and location

ggplot(diabetes, aes(weight, hip)) +
  geom_point() +
  geom_smooth()

ggplot(diabetes, aes(weight, hip)) +
  geom_point() +
  geom_smooth() +
  facet_grid(rows = vars(gender), cols = vars(location))

facet_wrap()

diamonds %>%
  ggplot(aes(x = carat, y = price)) +
  geom_point() +
  facet_wrap(vars(clarity))

datasauRus

library(datasauRus)
datasaurus_dozen
## # A tibble: 1,846 x 3
##    dataset     x     y
##    <chr>   <dbl> <dbl>
##  1 dino     55.4  97.2
##  2 dino     51.5  96.0
##  3 dino     46.2  94.5
##  4 dino     42.8  91.4
##  5 dino     40.8  88.3
##  6 dino     38.7  84.9
##  7 dino     35.6  79.9
##  8 dino     33.1  77.6
##  9 dino     29.0  74.5
## 10 dino     26.2  71.4
## # … with 1,836 more rows

Your Turn 9: Challenge!

1. Load the datasauRus package. This package includes a data set called datasaurus_dozen.

2. Use dplyr to summarize the correlation between x and y. First, group the data by dataset, and then summarize with the cor() function. Call the new variable corr. What does it look like?

3. Mutate corr. Round it to 2 digits. Then, mutate it again (or wrap it around your first change) using: paste("corr:", corr)

4. Save the summary data frame as corrs.

5. Pass datasaurus_dozen to ggplot() and add a point geom

6. Use a facet (wrap) for dataset.

7. Add a text geom. For this geom, set data = corrs. You also need to use aes() in this call to set label = corr, x = 50, and y = 110.

corrs <- datasaurus_dozen %>%
  group_by(dataset) %>%
  summarize(corr = cor(x, y)) %>%
  mutate(
    corr = round(corr, 2),
    corr = paste("corr:", corr)
  )

corrs
## # A tibble: 13 x 2
##    dataset    corr
##    <chr>      <chr>
##  1 away       corr: -0.06
##  2 bullseye   corr: -0.07
##  3 circle     corr: -0.07
##  4 dino       corr: -0.06
##  5 dots       corr: -0.06
##  6 h_lines    corr: -0.06
##  7 high_lines corr: -0.07
##  8 slant_down corr: -0.07
##  9 slant_up   corr: -0.07
## 10 star       corr: -0.06
## 11 v_lines    corr: -0.07
## 12 wide_lines corr: -0.07
## 13 x_shape    corr: -0.07

datasaurus_dozen %>%
  ggplot(aes(x, y)) +
  geom_point() +
  geom_text(data = corrs, aes(label = corr, x = 50, y = 110)) +
  facet_wrap(vars(dataset))
