Best Practices in R

2020-08-22

developed by Emil Hvitfeldt

1 / 46

Welcome!

2 / 46

Change Settings

Keyboard shortcut to open settings
⌘ + , on macOS
Ctrl + , on Windows

✓ - Uncheck "Restore .RData into workspace at startup"

✓ - Set "Save workspace to .RData on exit" to "Never"

Settings window

3 / 46

Change Appearance

RStudio themes

Fonts

Font Sizes

Editor Themes

Settings window

4 / 46

Pane layouts

Change the layout of the panes

Source on top?

Source down to the right?

It's all up to you!

Settings window

5 / 46

Pane layouts

Some like having both source and console open

6 / 46

Pane layouts

...while still keeping the viewer open

7 / 46

RStudio Projects

Keep all files from one project together. Use RStudio projects.

Self contained

Project orientated

8 / 46

Keep all the files associated with a project together: input data, R scripts, analytic results, figures.

usethis

usethis::create_project("project_name")

9 / 46

RStudio Projects - Creation 1 / 4

Click File > New Project

Or click the project menu in the upper right corner of RStudio

10 / 46

RStudio Projects - Creation 2 / 4

11 / 46

RStudio Projects - Creation 3 / 4

12 / 46

RStudio Projects - Creation 4 / 4

13 / 46

Folder Structure

name_of_project
|--raw_data
    |--WhateverData.xlsx
    |--report_2017.csv
|--output_data
    |--summary2017.csv
|--rmd
    |--01-analysis.Rmd
|--docs
    |--01-analysis.html
    |--01-analysis.pdf
|--scripts
    |--exploratory_analysis.R
    |--pdf_scraper.R
|--figures
    |--weather_2017.png
|--name_of_project.Rproj
|--run_all.R
14 / 46
  1. Raw data is kept separate from cleaned data
  2. Reports and scripts are separated
  3. Generated and imported figures have their own place
  4. Files are numbered using 2 digits
  5. Reusable and easily understandable
15 / 46

Folder Structure

library(fs)
folder_names <- c("raw_data", "output_data", "rmd", "docs",
                  "scripts", "figures")
dir_create(folder_names)
16 / 46

Never modify raw data; only read it (it stays untouched forever).

Paths

library(tidyverse)
# data import
data <- read_csv("/Users/Emil/Research/Health/amazing_data.csv")
## Error: '/Users/Emil/Research/Health/amazing_data.csv' does not exist.

Only use relative paths, never absolute paths

18 / 46

Introducing the here package. It builds paths relative to the project root.

library(here)
here()
## [1] "/Users/Emil/Research/Health"
library(tidyverse)
data <- read_csv(here("raw_data", "amazing_data.csv"))
19 / 46
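
The idea behind relative paths can be sketched with base R only. This is an illustration, not part of the original slides: the folder `demo_project` and the data values are made up, and `setwd()` is used here only to simulate opening the project (with an RStudio project and here, you would not call it yourself).

```r
# Build a throwaway "project" in a temporary directory (hypothetical layout).
project_dir <- file.path(tempdir(), "demo_project")
dir.create(file.path(project_dir, "raw_data"),
           recursive = TRUE, showWarnings = FALSE)

# Write a tiny, made-up data file into raw_data/.
write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)),
          file.path(project_dir, "raw_data", "amazing_data.csv"),
          row.names = FALSE)

# Simulate "opening the project": the project root becomes the working directory.
old_wd <- setwd(project_dir)

# The relative path works no matter where the project folder lives on disk.
amazing_data <- read.csv(file.path("raw_data", "amazing_data.csv"))

# Restore the previous working directory.
setwd(old_wd)

nrow(amazing_data)
```

Because the path never mentions `/Users/Emil/...`, the same script keeps working after the project folder is moved or shared.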

Naming Things

tweet about naming

20 / 46
  • Organization
  • Ease of use
    There will be multiple slides about naming

Naming Things - Files

NO

report.pdf
reportv2.pdf
reportthisisthelastone.pages
Figure 2.png
3465-234szx.r
foo.R

YES

2018-10-01_01_report-for-cdc.pdf
01_data.rmd
01_data.pdf
02_data-filtering.rmd
02_data-filtering.pdf
21 / 46

Follow the narrative from the folder structure slide
Jenny Bryan, "Naming things"

  1. Avoid spaces, punctuation, special characters and case sensitivity
  2. Deliberate use of delimiters
  3. Describe the contents of the file
  4. Put something numeric first
  5. Left pad numbers with zeroes
  6. Use a standard date (YYYY-MM-DD)
22 / 46

These rules preserve chronological and logical ordering.
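
A small sketch of why zero padding and YYYY-MM-DD dates matter for ordering. The file names are invented for illustration, and `method = "radix"` is used only so the sort does not depend on the local collation settings.

```r
# Unpadded numbers sort as text, so 10 lands before 2.
unpadded <- paste0(c(1L, 2L, 10L), "_report.csv")
sorted_unpadded <- sort(unpadded, method = "radix")
sorted_unpadded

# Left-padding with zeroes makes text order match numeric order.
padded <- sprintf("%02d_report.csv", c(1L, 2L, 10L))
sorted_padded <- sort(padded, method = "radix")
sorted_padded

# ISO 8601 dates sort chronologically even when treated as plain text.
dates <- as.Date(c("2018-10-01", "2017-12-31", "2018-02-23"))
dated <- paste0(format(dates, "%Y-%m-%d"), "_health-study.csv")
sorted_dated <- sort(dated, method = "radix")
sorted_dated
```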

Naming Things - Files

library(fs)
# keep only the file names, without the "data/" prefix
x <- path_file(dir_ls("data/", regexp = "health-study"))
x
## 2018-02-23_health-study_power-100_group-A1.csv
## 2018-02-23_health-study_power-100_group-B1.csv
## 2018-02-23_health-study_power-100_group-C1.csv
## 2018-02-23_health-study_power-200_group-A1.csv
## 2018-02-23_health-study_power-200_group-B1.csv
## 2018-02-23_health-study_power-200_group-C1.csv
stringr::str_split_fixed(x, "[_\\.]", 5)
## [,1] [,2] [,3] [,4] [,5]
## [1,] "2018-02-23" "health-study" "power-100" "group-A1" "csv"
## [2,] "2018-02-23" "health-study" "power-100" "group-B1" "csv"
## [3,] "2018-02-23" "health-study" "power-100" "group-C1" "csv"
## [4,] "2018-02-23" "health-study" "power-200" "group-A1" "csv"
## [5,] "2018-02-23" "health-study" "power-200" "group-B1" "csv"
## [6,] "2018-02-23" "health-study" "power-200" "group-C1" "csv"
23 / 46
  • Avoid spaces, punctuation, special characters and case sensitivity
  • Deliberate use of delimiters
  • File name should describe the contents of the file
  • Put something numeric first
  • Left pad numbers with zeroes
  • Use ISO 8601 standard for dates (YYYY-MM-DD)

Naming Things - Files

library(tidyverse)
library(fs)
map_df(dir_ls("data/", regexp = "health-study"), read_csv)
# or
dir_ls("data/", regexp = "health-study") %>%
  map_df(read_csv)
24 / 46
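
For readers without fs and purrr installed, the same pattern can be sketched in base R. This is an assumed equivalent, not the slide author's code; the temporary folder `demo_data` and the file contents are made up.

```r
# Create two small, consistently named files in a temporary folder.
data_dir <- file.path(tempdir(), "demo_data")
dir.create(data_dir, showWarnings = FALSE)
for (group in c("A1", "B1")) {
  file_name <- paste0("2018-02-23_health-study_power-100_group-", group, ".csv")
  write.csv(data.frame(group = group, score = c(1, 2)),
            file.path(data_dir, file_name),
            row.names = FALSE)
}

# Consistent names let one pattern select every file of the study.
files <- list.files(data_dir, pattern = "health-study", full.names = TRUE)

# Read each file and stack the results into one data frame.
combined <- do.call(rbind, lapply(files, read.csv))
combined
```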

Naming Things - Objects

  1. Only use lowercase letters, numbers, and _
  2. Avoid jargon in names, e.g. weight instead of K
  3. Use informative names
25 / 46

Naming Things - Objects

# Bad
df
e
tuningVar
# Good
health_data
error
tuning_var
26 / 46

lowercase letters + numbers = alpha-numeric characters (ish)

What To Avoid - attach()

Never use attach()

attach(mtcars)
mean(mpg)
## [1] 20.09062

attach() loads many names onto the search path, which makes it ambiguous which object a name refers to.

Try with() or withr instead

27 / 46
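
A minimal sketch of the with() alternative. The global variable `mpg` below is a made-up distraction, added to show that with() looks inside the data frame first.

```r
# A global object that happens to share a name with a column of mtcars.
mpg <- "not the column"

# with() evaluates the expression inside mtcars, so the column is found first
# and nothing is added to the search path.
mean_with <- with(mtcars, mean(mpg))

# Explicit column access is an equally unambiguous option.
mean_dollar <- mean(mtcars$mpg)

mean_with
```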

What To Avoid - rm(list=ls())

Never use rm(list=ls())

Instead, restart the R session

CTRL+SHIFT+F10 for Windows

CMD+SHIFT+ALT+F10 for Mac OS

28 / 46
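
One reason to prefer a restart, sketched below: rm(list = ls()) only removes objects from the workspace, while options (and attached packages) stay as they were. The option name `demo_option` is invented for this illustration.

```r
# Create an object and change session state.
x <- 1:10
options(demo_option = "still set")  # hypothetical option name

# "Clean" the workspace the way the slide warns against.
rm(list = ls())

# The object is gone...
x_exists <- exists("x", inherits = FALSE)

# ...but the option set earlier is still in effect.
option_value <- getOption("demo_option")

x_exists
option_value
```

Restarting the R session resets both, which is why it gives a genuinely clean slate.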

R Markdown documents versus R scripts

You can use R scripts for simple, self-contained tasks.

source() R scripts into your R Markdown document where you will do analyses, visualizations and reporting.

29 / 46
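
A sketch of sourcing numbered scripts in order. The two scripts are written to a temporary folder so the example runs on its own; their contents and the `pipeline_env` environment are assumptions made for illustration.

```r
# Write two tiny scripts into a temporary "scripts" folder.
scripts_dir <- file.path(tempdir(), "demo_scripts")
dir.create(scripts_dir, showWarnings = FALSE)
writeLines("raw <- data.frame(Weight.KG = c(70, 82))",
           file.path(scripts_dir, "01-import.R"))
writeLines("names(raw) <- tolower(gsub('.', '_', names(raw), fixed = TRUE))",
           file.path(scripts_dir, "02-clean-names.R"))

# Numbered file names mean a plain sort gives the intended run order.
scripts <- sort(list.files(scripts_dir, pattern = "\\.R$", full.names = TRUE))

# Run each script in a dedicated environment to keep the workspace tidy.
pipeline_env <- new.env()
for (script in scripts) {
  source(script, local = pipeline_env)
}

names(pipeline_env$raw)
```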

R Markdown

- 01-import.R
- 02-clean-names.R
- 03-tidy.R
- etc

Include at the start of R Markdown file

```{r load-scripts, include = FALSE}
library(here)
source(here("scripts", "01-import.R"))
source(here("scripts", "02-clean-names.R"))
source(here("scripts", "03-tidy.R"))
```
30 / 46

Naming Chunks

Names can be placed after the comma

```{r, chunk-label, results='hide', fig.height=4}

or before

```{r chunk-label, results='hide', fig.height=4}

In general it is recommended to use alphabetic characters with words separated by - and avoid other characters. - Yihui Xie

31 / 46
  1. Makes navigating the R Markdown document easier
  2. Makes your R Markdown easier to understand
  3. Clarifies error reports or progress of knitting
  4. Caching when moving chunks around
32 / 46

The lower left corner of the RStudio source pane has a menu for jumping to sections and chunks.

Caching of unnamed chunks is based on their numbering, so moving chunks around can invalidate the cache.

Setup Chunk

In a fresh R Markdown document you see this

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

The setup chunk runs before any other code, so use it to your advantage

33 / 46

Setting figure path

```{r setup, include=FALSE}
knitr::opts_chunk$set(fig.path = "figures/")
```
34 / 46

highlight use of fig.path option

fig.path: ('figure/'; character) prefix to be used for figure filenames (fig.path and chunk labels are concatenated to make filenames)
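
How the prefix and chunk label combine can be sketched without knitting anything. The chunk label `weather-plot` is hypothetical, and the `-1.png` suffix assumes the first plot of a chunk rendered with the png device.

```r
# Prefix set through the fig.path chunk option.
fig_path <- "figures/"

# Label of the chunk that draws the plot (hypothetical).
chunk_label <- "weather-plot"

# knitr concatenates prefix and label, then appends the plot number
# and the file extension of the graphics device.
figure_file <- paste0(fig_path, chunk_label, "-1.png")
figure_file
```

This is one more reason to name chunks: an unnamed chunk would produce a file name built from an automatic, numbered label instead.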

Styling Code

Use consistent style when writing code

http://style.tidyverse.org/

Style is largely a matter of preference, but keep it consistent!

35 / 46

Give examples of styles to follow

Use the styler package to style your code for you
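
As an illustration of what consistent style buys, here is the same made-up function written twice. Both versions behave identically; only the second follows the tidyverse style guide. The function names are invented for this sketch.

```r
# Hard to read: no spacing, = for assignment, everything on one line.
Rescale01=function(x,na_rm=TRUE){low=min(x,na.rm=na_rm);high=max(x,na.rm=na_rm);(x-low)/(high-low)}

# Tidyverse style: snake_case name, <- for assignment, spaces around
# operators, one statement per line.
rescale01 <- function(x, na_rm = TRUE) {
  low <- min(x, na.rm = na_rm)
  high <- max(x, na.rm = na_rm)
  (x - low) / (high - low)
}

rescale01(c(0, 5, 10))

# With styler installed, styler::style_text() or styler::style_file()
# can apply this kind of formatting automatically (not run here).
```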

36 / 46

Keep .Rprofile Clean

Your computer contains a file called .Rprofile.

This file runs first in every session. Think of it as a configuration file.

options(stringsAsFactors = FALSE)
options(max.print = 100)
37 / 46

Keep .Rprofile Clean

Only put code meant for interactive use in .Rprofile

Yes

# add this with usethis::use_usethis()
library(usethis)

No

library(tidyverse)
38 / 46

Use it to change options and load packages
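
A sketch of what a "clean" .Rprofile might look like, assuming usethis may or may not be installed. Wrapping everything in interactive() keeps scripts and knitted reports from depending on it.

```r
# ~/.Rprofile (sketch)
if (interactive()) {
  # Development helpers for the console only; analysis code never needs them.
  if (requireNamespace("usethis", quietly = TRUE)) {
    suppressMessages(library(usethis))
  }
}
```

Packages that an analysis depends on, such as tidyverse, belong in the script or R Markdown file itself so the code also runs on someone else's machine.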

Comment Your Code

Functions: Arguments and purpose

Code: What or why, NOT how

# Takes a data.frame (data) and converts the columns listed in names
# from factor variables to character variables. Columns that are not
# factors are left unchanged.
factor_to_text <- function(data, names) {
  for (i in seq_along(names)) {
    if (is.factor(data[, names[i], drop = TRUE])) {
      data[, names[i]] <- as.character(data[, names[i], drop = TRUE])
    }
  }
  data
}
39 / 46
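
A usage sketch for the function above. It is restated in a compact form so the block runs on its own, which is an assumption about equivalent behavior rather than the author's exact code. The built-in iris data serves as the example input.

```r
# Converts the listed columns of `data` from factor to character.
# Columns that are not factors are returned unchanged.
factor_to_text <- function(data, names) {
  for (name in names) {
    if (is.factor(data[[name]])) {
      data[[name]] <- as.character(data[[name]])
    }
  }
  data
}

# Species is a factor and gets converted; Sepal.Length is numeric and is kept.
converted <- factor_to_text(iris, c("Species", "Sepal.Length"))
class(converted$Species)
class(converted$Sepal.Length)
```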

Updating R and RStudio

The most recent version of R can be downloaded from The Comprehensive R Archive Network (CRAN)

40 / 46

Updating R and RStudio

Download the most recent version of RStudio at their downloads page

41 / 46

How to ask for help (datapasta and reprex)

The reprex package helps you create a reproducible example

datapasta lets you easily copy and paste small samples of data into RStudio

42 / 46
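
When neither package is at hand, base R's dput() offers a rough way to share a small data sample as code. This is a complementary sketch, not a replacement for reprex or datapasta; the three-row subset of mtcars is just an example.

```r
# A small sample that is enough to reproduce the problem.
small <- head(mtcars[, c("mpg", "cyl")], 3)

# dput() prints R code that recreates the object; paste that into your question.
dput(small)

# The printed code rebuilds the same data on anyone's machine.
code <- paste(deparse(small), collapse = "\n")
rebuilt <- eval(parse(text = code))
all.equal(small, rebuilt)
```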

How to ask for help (reprex)

Check out the package website and RStudio webinar on creating reproducible examples

Art by Allison Horst

43 / 46

Where to get help

RStudio has a helpful community if you have questions (everyone does!)

RStudio Community:

RStudio has a dedicated forum for questions related to R and RStudio: https://community.rstudio.com/

44 / 46

Where else to get help

Stack Overflow

Check out the questions tagged r on Stack Overflow: https://stackoverflow.com/questions/tagged/r

45 / 46

#rstats on Twitter

If you have a Twitter account, check out #rstats: https://twitter.com/hashtag/rstats

Art by Allison Horst

46 / 46
