Introduction to R

Evangeline Warren

R Working Group

Jan 20th, 2023

Goals for this session

  • Learn about...

    • basic R syntax

    • different R objects (things that hold data) & indexing them

    • useful functions for working with data

  • Become familiar with R Studio & develop good coding habits

    • R Studio is an additional program that provides many useful features for working with R

    • (you need to download and install both R and R Studio)

R Studio

  • Let's dive in by starting R Studio and opening a new R script

    • menu bar:   File > New File > R Script
    • (in R:   File > New Script)
  • You should now have 4 panes open (like on the next slide)

    • Source -- Our script where we will type and save our comments & commands
    • Console -- Where we can give R commands and where the output will appear
    • Output -- File explorer, plots, help files, and more!
    • Environments -- Useful information about the R session

[Figure: screenshot of the four R Studio panes; downloaded from user guide on postit.co]

R Studio: Good Habits

  • Add a comment to our new script:

    # Comment: My R script from Working Group Session (1/20/2023)
    # (R ignores all lines that begin with a pound/hash/number sign/#)
  • Save our script

    • menu bar:   File > Save As...
  • Set our working directory

    • this is where R will start looking for & saving files (e.g., data files or plots)
    • menu bar:   Session > Set Working Directory > Choose Directory...
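
The working directory can also be set from the script itself, which keeps the step reproducible. A minimal sketch, assuming tempdir() as a stand-in for your own project folder (swap in a real path on your machine):

```r
old_wd <- getwd()       # remember the current working directory
setwd(tempdir())        # tempdir() is only a placeholder for your own folder
new_wd <- getwd()       # confirm where R will now look for & save files
new_wd
setwd(old_wd)           # switch back so nothing else is affected
```

Typing the setwd() line into the script means the next session starts in the right place without any clicking.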

Basic R Syntax

  • R syntax takes the form
# object_name <- object_value
mean_age <- 33
  • The symbol "<-" is called the assignment operator

    • we are creating a new variable called mean_age and assigning it a value of 33

    • mean_age = 33 will also work (but <- is the convention)
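
A quick sketch of the two assignment styles side by side. The object name mean_age_eq is made up for illustration and is not part of the original slides:

```r
mean_age <- 33        # the conventional assignment operator
mean_age_eq = 33      # hypothetical second object, created with = instead

# both objects hold exactly the same value
same_value <- identical(mean_age, mean_age_eq)
same_value
```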

Basic R Syntax (cont.)

If we enter the name of a variable in the Console, then R will list the value(s)

> Mean_age2 <- 22 ## note: object names are case-sensitive
> Mean_age2
## [1] 22

BUT we are in the business of good habits...

  • type this syntax into our script and (with the cursor on the same line) press the following keys together:

    • On a Mac:   <command> <enter>

    • In Windows:   <control> <enter>   (in R Studio)
                <control> r         (in the R app)

  • these keyboard shortcuts will run the syntax on the line in the Console
    (or you can highlight a region)

Basic R Syntax: functions

We have seen a simple object for holding data, but R has many useful functions

ls() # list all the objects in memory
rm(Mean_age2) # remove the object called Mean_age2
getwd() # print the working directory
dir() # list the files in the current directory
dir("../") # list the files in the parent directory
save.image("my_data.RData") # save all the objects in memory
load("my_data.RData") # load all the objects in the data file

Quick note:

  • suppose you create an object called abc that holds the value 2
  • then you load a file data.RData that also has an object named abc but holds the value 99
  • the first version of the object (abc holding 2) will get replaced
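
The quick note can be tried out safely. This is a sketch only: it uses save() on a single object and a temporary file (tmp_file, a made-up name) instead of a real data.RData:

```r
tmp_file <- tempfile(fileext = ".RData")  # hypothetical stand-in for data.RData

abc <- 99
save(abc, file = tmp_file)   # the file now contains abc holding 99

abc <- 2                     # our "first version" of abc
load(tmp_file)               # loading silently replaces it
abc                          # now 99, the 2 is gone
```

R gives no warning here, so it is worth running ls() before loading a file into a session that already holds objects.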

Basic R Syntax: help files

  • Google searches are a very effective way to find help

    • and so is asking the R Working Group 😎
  • R documentation can be accessed in the Help tab in the Output pane

  • Some additional syntax and functions

?read.csv # show the help file for the function read.csv
help.search("weighted mean") # search help files for the phrase 'weighted mean'

Data Structures in R

Data Structures: motivation

We are not going to solve the world's problems with a single number...

> all_ages <- c(22, 33, 44, 55) # c() concatenates numbers together
> all_ages
## [1] 22 33 44 55
> mean(all_ages) # calculate the mean
## [1] 38.5
> all_ed <- c("HS", "Col", "Grad Sch", "HS")
> all_ed
## [1] "HS" "Col" "Grad Sch" "HS"

Data Structures: motivation (cont.)

R handles different types of data as well

> important_data <- c("OSU", "R", "Group", 4)
> important_data
## [1] "OSU" "R" "Group" "4"

Wait, what is going on here?

  • we are mixing different types of data, but a vector can only hold one type, so R converts the 4 to the string "4" (as if we had just forgotten to wrap the 4 in quotation marks)

  • sometimes R's assumptions are useful, sometimes they are not! 🤔
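
One way to see what R did is to ask for the class of the vector. A small sketch; the objects logical_and_number and four are made up for illustration:

```r
important_data <- c("OSU", "R", "Group", 4)
class(important_data)             # "character": the 4 became "4"

logical_and_number <- c(TRUE, 4)  # mixing logical and numeric
class(logical_and_number)         # "numeric": TRUE became 1

# convert the string back to a number before doing arithmetic with it
four <- as.numeric(important_data[4])
four + 1
```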

Data Structures: motivation (cont.)

Here is another example with missing data

> test_scores <- c(88, 99, 110, 66, NA) # NA is for missing values
> mean_scores <- mean(test_scores)
> mean_scores / 100
## [1] NA

😾 Ugh! Why didn't R tell me there was a problem when I tried to calculate the mean?!?

  • another R assumption

  • can you figure out how to calculate the mean for non-missing values? (help file is helpful 😄)
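
(Spoiler for the question above.) The help file ?mean documents an na.rm argument; a minimal sketch of using it:

```r
test_scores <- c(88, 99, 110, 66, NA)

mean_with_na <- mean(test_scores)               # NA: one missing value spoils the result
mean_scores <- mean(test_scores, na.rm = TRUE)  # drop missing values first
mean_scores                                     # 90.75
```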

Data Structures: vectors

  • We have been creating vectors when we use c() to concatenate data

  • Here are some more useful functions for working with vectors

> # test that we have a vector
> is.vector(test_scores) # returns another data type: TRUE or FALSE (called logical)
## [1] TRUE
> summary(test_scores) # numerical summary (less helpful for strings)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 66.00 82.50 93.50 90.75 101.75 110.00 1

Data Structures: vectors (cont.)

> length(test_scores) # how many elements in the vector
## [1] 5
> is.na(test_scores) # test if each element is NA
## [1] FALSE FALSE FALSE FALSE TRUE
> TRUE + TRUE + FALSE # useful trick with logical objects (TRUE/FALSE)
## [1] 2
> n_missing <- sum(is.na(test_scores))
> n_missing
## [1] 1
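
The same TRUE/FALSE trick extends a little further. A sketch (the names prop_missing and n_observed are made up for illustration):

```r
test_scores <- c(88, 99, 110, 66, NA)

n_missing <- sum(is.na(test_scores))       # count of missing values
prop_missing <- mean(is.na(test_scores))   # share of missing values: 1 out of 5
n_observed <- sum(!is.na(test_scores))     # ! flips TRUE and FALSE

prop_missing
n_observed
```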

Data Structures: indexing vectors

We can access the ith element in a vector with the syntax vector_name[ i ]

> test_scores[1] # first element
## [1] 88
> test_scores[2] # second element
## [1] 99
> 1:3 # a vector of c(1, 2, 3)
## [1] 1 2 3
> # so what will test_scores[3:1] give us?

Data Structures: indexing vectors (cont.)

The syntax   3:1   gives the vector   c(3, 2, 1), so...

> test_scores[3:1] # returns 3rd element, then the 2nd, then the first
## [1] 110 99 88
> test_scores # sanity check
## [1] 88 99 110 66 NA
  • So what will the following command do? 🤔
test_scores[c(3, 5, 11)]
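
(Spoiler for the question above.) A sketch of what happens, plus one extra that is not in the original slides: a negative index drops an element instead of selecting it.

```r
test_scores <- c(88, 99, 110, 66, NA)

picked <- test_scores[c(3, 5, 11)]
picked          # 110 NA NA: element 5 is missing, element 11 does not exist

dropped_first <- test_scores[-1]   # everything except the first element
dropped_first
```

Note that R returns NA for the out-of-range index 11 rather than raising an error, another R assumption to keep in mind.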

Data Structures: changing vectors

We can use indexing to change vectors as well, e.g., reassign the first element

> test_scores[1] <- NA # change the first element to NA
> test_scores[1]
## [1] NA

Again, we can use vectors to index as well:

index_missing_scores <- is.na(test_scores) # create an index vector of TRUE & FALSE
test_scores[index_missing_scores] <- -99 # change NA to -99

Let's walk through this...
(🦉 but note a good habit would be to create a new vector, new_test_scores, so we can retain the original data!)
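
The good habit mentioned above might look like this. A sketch, assuming test_scores is in the state shown on this slide (first element already set to NA):

```r
test_scores <- c(NA, 99, 110, 66, NA)

new_test_scores <- test_scores                    # work on a copy
new_test_scores[is.na(new_test_scores)] <- -99    # recode NA to -99 in the copy only

new_test_scores
test_scores       # the original still has its NAs
```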

Data Structures: changing vectors (cont.)

> # create an index vector of TRUE & FALSE
> index_missing_scores <- is.na(test_scores)
> index_missing_scores
## [1] TRUE FALSE FALSE FALSE TRUE
> # attach these 2 vectors together as columns
> cbind(index_missing_scores, test_scores)
## index_missing_scores test_scores
## [1,] 1 NA
## [2,] 0 99
## [3,] 0 110
## [4,] 0 66
## [5,] 1 NA
  • with   cbind   we are actually creating a new data structure called a matrix
  • as we will see, matrices can only hold the same data type, so R changes TRUE/FALSE to 1/0 (respectively)
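
To see the conversion directly, compare cbind() with data.frame(), which keeps each column's own type. A sketch; data frames are only introduced properly later in the session:

```r
index_missing_scores <- c(TRUE, FALSE, FALSE, FALSE, TRUE)
test_scores <- c(NA, 99, 110, 66, NA)

m <- cbind(index_missing_scores, test_scores)
is.matrix(m)                    # TRUE
typeof(m)                       # "double": TRUE/FALSE were converted to 1/0

d <- data.frame(index_missing_scores, test_scores)
class(d$index_missing_scores)   # "logical": the column keeps its type
```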

Data Structures: changing vectors (cont.)

> test_scores[index_missing_scores] # access all of the indices with TRUE
## [1] NA NA
> # recode NA to -99
> test_scores[index_missing_scores] <- -99
> test_scores
## [1] -99 99 110 66 -99

Data Structures: changing vectors (recap)

When you want to change a vector, do the delta 2-step:

  1. create an index vector that identifies the elements you want to change

    • what data type should this vector hold?
    • logical, i.e. TRUEs and FALSEs
  2. assign new values to the vector using your vector of indices
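
The 2-step works for any condition, not just missing values. A sketch with made-up names (scores, too_high, capped) that caps scores above 100:

```r
scores <- c(88, 99, 110, 66)

# step 1: logical index vector identifying the elements to change
too_high <- scores > 100
too_high

# step 2: assign new values using the index (on a copy, to keep the original)
capped <- scores
capped[too_high] <- 100
capped
```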

Data Structures: vector recap

  • We are not going to become 💰 famous 💰 by working with a single vector

  • However, we have learned a powerful way to work with vectors, indexing, that extends to other types of data structures

  • A matrix made a brief appearance earlier, but before going further let's review a useful framework for thinking about data structures

Data Structures: overview

R has different structures for holding data, which can be organized by...

  1. How many dimensions does the structure have?

  2. Do the types of data need to be the same?

  • Example: vectors

    • only 1 dimension (it is just a single row or a column)
    • we saw earlier that R changes the elements so they all have the same data type (e.g., 4 → "4")
  • We'll now (re)introduce different data structures, and learn about different data types along the way.

Data Structures: overview (cont.)

  • Vectors

    1. 1 dimension
    2. same data type
    • special case: factor (predefined categories)
  • Matrices

    1. rows and columns
    2. same data type
  • Arrays

    1. any number of dimensions
    2. same data type

Data Structures: overview (cont.)

  • Data Frames

    1. rows and columns
    2. different data types
    • particularly useful for holding a data set with quantitative & qualitative variables
  • Lists

    1. 1 dimension
    2. different data types (or structures!)
    • actually, this is just a special type of vector (can you verify this?)
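
A sketch that builds one small example of each structure and checks the two organizing questions (dimensions and data types). All object names and values here are made up for illustration:

```r
v  <- c(1, 2, 3)                                         # vector
f  <- factor(c("HS", "Col", "HS"))                       # factor: predefined categories
m  <- matrix(1:6, nrow = 2)                              # matrix: 2 rows, 3 columns
a  <- array(1:24, dim = c(2, 3, 4))                      # array: 3 dimensions
df <- data.frame(age = c(22, 33), ed = c("HS", "Col"))   # data frame: mixed types
l  <- list(1, "a", TRUE, v)                              # list: mixed types & structures

dim(v)         # NULL: a vector has no dim attribute
dim(m)         # 2 3
dim(a)         # 2 3 4
dim(df)        # 2 2
levels(f)      # the categories of the factor
is.vector(l)   # TRUE: a list is a special type of vector
```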

Data Structures: working with data frames

  • For the rest of this session we will focus on data frames, the R structure typically used for data sets (i.e., variables as columns and an observation for each row).

  • Let's get some practice working with data frames using one of R's example data sets

> data(mtcars) ## load one of R's example data sets mtcars
> ls()
## [1] "all_ages" "all_ed" "important_data"
## [4] "index_missing_scores" "Mean_age2" "mean_scores"
## [7] "mtcars" "n_missing" "test_scores"
> is.data.frame(mtcars) ## check that mtcars is a data frame
## [1] TRUE

Data Structures: reading in data sets

Before we proceed with mtcars, a quick example of how to read in a data set.

> # write data to a CSV file called 'copy_mtcars.csv' in the working directory
> write.csv(mtcars, "copy_mtcars.csv")
> mtcars2 <- read.csv("copy_mtcars.csv") # load data set from CSV file
> ls()
## [1] "all_ages" "all_ed" "important_data"
## [4] "index_missing_scores" "Mean_age2" "mean_scores"
## [7] "mtcars" "mtcars2" "n_missing"
## [10] "test_scores"
> is.data.frame(mtcars2)
## [1] TRUE
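
One thing to watch for in this round trip: write.csv() stores the row names (the car names) as an unnamed first column, so read.csv() brings them back as an extra variable called X. A sketch, writing to a temporary folder rather than the working directory; mtcars3 and csv_path are made-up names:

```r
csv_path <- file.path(tempdir(), "copy_mtcars.csv")  # temporary location for the demo
write.csv(mtcars, csv_path)

mtcars2 <- read.csv(csv_path)
dim(mtcars2)            # one more column than mtcars
names(mtcars2)[1]       # "X": the car names came back as a variable

# row.names = 1 tells read.csv() to use the first column as row names again
mtcars3 <- read.csv(csv_path, row.names = 1)
dim(mtcars3)
```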

Data Structures: exploring data frames

  • Since data frames have 2 dimensions, the index requires 2 pieces of info: [row index, column index]
> mtcars[1, 1] # 1st observation in 1st variable
## [1] 21
  • Many times, however, we just work with one variable/column at a time, so all our skills working with vectors still apply
> # if we leave out the row part of the address, we get all rows and a vector
> is.vector(mtcars[, 1])
## [1] TRUE
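
A few more [row index, column index] patterns, as a sketch. The drop = FALSE argument is an addition not covered in the slides; it keeps a single column as a data frame instead of a vector:

```r
first_col <- mtcars[, 1]                    # all rows, 1st column: a plain vector
first_col_df <- mtcars[, 1, drop = FALSE]   # same column, kept as a data frame

first_rows <- mtcars[1:3, 1:2]              # first 3 observations, first 2 variables
first_rows

rx4_mpg <- mtcars["Mazda RX4", "mpg"]       # row and column names work as indices too
rx4_mpg
```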

Data Structures: exploring data frames

  • A useful way to access a single column in a data frame is to use $
> names(mtcars) ## print the variable names
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
> mtcars$mpg ## return the mpg variable
## [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
## [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
## [31] 15.0 21.4
  • Now we will (re)introduce several functions for exploring data frames
  • We will also see a more advanced example of indexing

Data Frames: exploring columns (cont.)

> dim(mtcars) ## print the number of rows and columns
## [1] 32 11
> str(mtcars) ## print structure of data frame
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...

Data Frames: summarizing columns

> summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000

Data Frames: exploring columns (cont.)

Alternative ways to access a data frame's variable(s):

> mtcars[["mpg"]]
## [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
## [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
## [31] 15.0 21.4
> mtcars[, c("mpg", "cyl")]
## mpg cyl
## Mazda RX4 21.0 6
## Mazda RX4 Wag 21.0 6
## Datsun 710 22.8 4
## Hornet 4 Drive 21.4 6
## Hornet Sportabout 18.7 8
## Valiant 18.1 6
## Duster 360 14.3 8
## Merc 240D 24.4 4
## Merc 230 22.8 4
## Merc 280 19.2 6
## Merc 280C 17.8 6
## Merc 450SE 16.4 8
## Merc 450SL 17.3 8
## Merc 450SLC 15.2 8
## Cadillac Fleetwood 10.4 8
## Lincoln Continental 10.4 8
## Chrysler Imperial 14.7 8
## Fiat 128 32.4 4
## Honda Civic 30.4 4
## Toyota Corolla 33.9 4
## Toyota Corona 21.5 4
## Dodge Challenger 15.5 8
## AMC Javelin 15.2 8
## Camaro Z28 13.3 8
## Pontiac Firebird 19.2 8
## Fiat X1-9 27.3 4
## Porsche 914-2 26.0 4
## Lotus Europa 30.4 4
## Ford Pantera L 15.8 8
## Ferrari Dino 19.7 6
## Maserati Bora 15.0 8
## Volvo 142E 21.4 4
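
One reason to know [[ ]] as well as $ is that the variable name can itself be stored in an object. A sketch; var_name and the other object names are made up for illustration:

```r
var_name <- "mpg"                     # the column name, held in a variable

by_dollar <- mtcars$mpg               # $ needs the name typed out
by_brackets <- mtcars[[var_name]]     # [[ ]] accepts a stored name
identical(by_dollar, by_brackets)     # TRUE: same vector either way

one_col_df <- mtcars["mpg"]           # single brackets return a data frame
is.data.frame(one_col_df)
```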

Data Frames: creating new variables

> mtcars$mpg_squared <- mtcars$mpg * mtcars$mpg
> mtcars[, c("mpg", "mpg_squared")]
## mpg mpg_squared
## Mazda RX4 21.0 441.00
## Mazda RX4 Wag 21.0 441.00
## Datsun 710 22.8 519.84
## Hornet 4 Drive 21.4 457.96
## Hornet Sportabout 18.7 349.69
## Valiant 18.1 327.61
## Duster 360 14.3 204.49
## Merc 240D 24.4 595.36
## Merc 230 22.8 519.84
## Merc 280 19.2 368.64
## Merc 280C 17.8 316.84
## Merc 450SE 16.4 268.96
## Merc 450SL 17.3 299.29
## Merc 450SLC 15.2 231.04
## Cadillac Fleetwood 10.4 108.16
## Lincoln Continental 10.4 108.16
## Chrysler Imperial 14.7 216.09
## Fiat 128 32.4 1049.76
## Honda Civic 30.4 924.16
## Toyota Corolla 33.9 1149.21
## Toyota Corona 21.5 462.25
## Dodge Challenger 15.5 240.25
## AMC Javelin 15.2 231.04
## Camaro Z28 13.3 176.89
## Pontiac Firebird 19.2 368.64
## Fiat X1-9 27.3 745.29
## Porsche 914-2 26.0 676.00
## Lotus Europa 30.4 924.16
## Ford Pantera L 15.8 249.64
## Ferrari Dino 19.7 388.09
## Maserati Bora 15.0 225.00
## Volvo 142E 21.4 457.96

Data Frames: more on indexing

When creating an index, we can also use multiple conditions

  • to satisfy EITHER condition use | (or)
  • to satisfy BOTH conditions use & (and)
> mtcars$mpg > 20 | mtcars$mpg < 25
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE
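
Every value is TRUE above because any number is either greater than 20 or less than 25. A sketch contrasting this with &, which requires both conditions at once (the names either and both are made up for illustration):

```r
either <- mtcars$mpg > 20 | mtcars$mpg < 25   # TRUE if at least one condition holds
both   <- mtcars$mpg > 20 & mtcars$mpg < 25   # TRUE only if both conditions hold

all(either)   # TRUE: every car satisfies at least one condition
sum(both)     # number of cars with mpg strictly between 20 and 25
```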

Data Frames: more on indexing (cont.)

(remember: variables are just vectors, so we can use what we learned earlier)

> cbind(mtcars$mpg, mtcars$mpg < 20 | mtcars$mpg > 30)
## [,1] [,2]
## [1,] 21.0 0
## [2,] 21.0 0
## [3,] 22.8 0
## [4,] 21.4 0
## [5,] 18.7 1
## [6,] 18.1 1
## [7,] 14.3 1
## [8,] 24.4 0
## [9,] 22.8 0
## [10,] 19.2 1
## [11,] 17.8 1
## [12,] 16.4 1
## [13,] 17.3 1
## [14,] 15.2 1
## [15,] 10.4 1
## [16,] 10.4 1
## [17,] 14.7 1
## [18,] 32.4 1
## [19,] 30.4 1
## [20,] 33.9 1
## [21,] 21.5 0
## [22,] 15.5 1
## [23,] 15.2 1
## [24,] 13.3 1
## [25,] 19.2 1
## [26,] 27.3 0
## [27,] 26.0 0
## [28,] 30.4 1
## [29,] 15.8 1
## [30,] 19.7 1
## [31,] 15.0 1
## [32,] 21.4 0

Data Frames: more on indexing (cont.)

And we can use multiple variables

> table(mtcars$mpg > 30 & mtcars$cyl == 6)
##
## FALSE
## 32
> table(mtcars$mpg > 30 & mtcars$cyl == 4)
##
## FALSE TRUE
## 28 4

Data Frames: final indexing example

> hi_mpg <- mtcars$mpg > mean(mtcars$mpg)
> hi_cyl <- mtcars$cyl == 4
> table(hi_mpg, hi_cyl)
## hi_cyl
## hi_mpg FALSE TRUE
## FALSE 18 0
## TRUE 3 11

Data Frames: final indexing example (cont.)

> mtcars$good_car <- FALSE
> mtcars$good_car[hi_mpg & hi_cyl] <- TRUE
> table(mtcars$good_car)
##
## FALSE TRUE
## 21 11
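
Once the logical variable exists, it can be used as a row index to pull out just those observations. A sketch that works on a copy (cars, a made-up name) so the original mtcars is left alone:

```r
cars <- mtcars                              # copy, so mtcars itself is unchanged
hi_mpg <- cars$mpg > mean(cars$mpg)
hi_cyl <- cars$cyl == 4

cars$good_car <- hi_mpg & hi_cyl            # TRUE/FALSE in one step

good <- cars[cars$good_car, c("mpg", "cyl")]   # rows where good_car is TRUE
nrow(good)
good
```

Assigning hi_mpg & hi_cyl directly gives the same variable as starting with FALSE and then overwriting the matching rows with TRUE.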

Data Frames: final indexing example (cont.)

Sanity check

> # cbind(mtcars$good_car, hi_mpg, hi_cyl, mtcars$mpg, mtcars$cyl)
> cbind(mtcars$good_car, hi_mpg, hi_cyl)
## hi_mpg hi_cyl
## [1,] FALSE TRUE FALSE
## [2,] FALSE TRUE FALSE
## [3,] TRUE TRUE TRUE
## [4,] FALSE TRUE FALSE
## [5,] FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE
## [7,] FALSE FALSE FALSE
## [8,] TRUE TRUE TRUE
## [9,] TRUE TRUE TRUE
## [10,] FALSE FALSE FALSE
## [11,] FALSE FALSE FALSE
## [12,] FALSE FALSE FALSE
## [13,] FALSE FALSE FALSE
## [14,] FALSE FALSE FALSE
## [15,] FALSE FALSE FALSE
## [16,] FALSE FALSE FALSE
## [17,] FALSE FALSE FALSE
## [18,] TRUE TRUE TRUE
## [19,] TRUE TRUE TRUE
## [20,] TRUE TRUE TRUE
## [21,] TRUE TRUE TRUE
## [22,] FALSE FALSE FALSE
## [23,] FALSE FALSE FALSE
## [24,] FALSE FALSE FALSE
## [25,] FALSE FALSE FALSE
## [26,] TRUE TRUE TRUE
## [27,] TRUE TRUE TRUE
## [28,] TRUE TRUE TRUE
## [29,] FALSE FALSE FALSE
## [30,] FALSE FALSE FALSE
## [31,] FALSE FALSE FALSE
## [32,] TRUE TRUE TRUE

Recap & Moving Forward

  • You should now be familiar with a few of R's data structures

    • (and with knowing when each should be used: # of dimensions & data types)
  • We have also been introduced to some useful functions for manipulating, summarizing, and exploring data

    • There are many more(!) and users contribute R packages that implement a wide range of tools, models, and methods: list of some packages on CRAN

Recap & Moving Forward (cont.)

  • R comes installed with many packages that you can explore & access with the library() function
# library() # list all the packages installed on your computer
library(stats) # load the stats package
# help(package="stats") # look at the package documentation
  • In future sessions, we will explore some of these packages that are particularly useful for

  • Please join us 😄
