
class: center, middle, inverse, title-slide

.title[
# Introduction to <img src="img/Rlogo.png" width="200" />
]
.author[
### Jason Thomas
]
.institute[
### R Working Group
]
.date[
### Sept. 25th, 2025
]

---

class:slide-font-25
# Welcome to the R Working Group

* Website [https://buckipr.github.io/R_Working_Group/](https://buckipr.github.io/R_Working_Group/)

    + we are Slackers(?) (email Jason at thomas.3912 for an invite)

* *New Theme*: Computational Demography
    
    + impacts of digitization on daily life, social interactions, where &
    what data are, capabilities of technology
    
    + taking us beyond "traditional" approach (e.g., inference from regressions
    using survey data)

* R & Computational Demography/Research?

  + closer to a general programming language
  
  + user community, extensions, & IDE

---
# Goals for this session

* Learn about...

    + basic R syntax
    
    + different R objects (things that hold data) & **indexing** them
    
    + general programming
  
* Become familiar with [R Studio](https://posit.co/download/rstudio-desktop/) &
  develop good coding habits 
  
    + R Studio is an *additional* program that provides many useful features
    for working with R
    
    + (you need to download and install both [R](https://cran.r-project.org/) and 
    [R Studio](https://posit.co/download/rstudio-desktop/))

---
class: inverse, center, middle

# R Studio

---
# R Studio

* Let's dive in by starting R Studio and opening a new R script

    + menu bar: &nbsp; `File` &rarr; `New File` &rarr; `R Script`
    + (in R: &nbsp; `File` &rarr; `New Script`)

* You should now have 4 panes open (like on the next slide)

    + **Source** -- Our script where we will type and save our comments & commands
    + **Console** -- Where we can give R commands and where the output will appear
    + **Output** -- File explorer, plots, help files, and more!
    + **Environments** -- Useful information about the R session

---
.center[<img src="img/rstudio-panes-labeled.jpeg" style="width: 75%" />]

.center[.bottom[downloaded from [user guide on posit.co](https://docs.posit.co/ide/user/ide/guide/ui/ui-panes.html)]]

---
class:slide-font-25
# R Studio: Good Habits

* Add a comment to our new script:
    
    ``` r
    #------------------------------------------------------------------------
    # File name: first_r_script.R
    # last modified: 1843-09-13
    # (start comment with # and R ignores the rest of the line)
    #------------------------------------------------------------------------
    3 + 3 # this comment is for humans (R will add 3 + 3 & ignore the rest)
    ```

* Save our script

    + menu bar: &nbsp; `File` &rarr; `Save As...`

* Set our **working directory**

 + this is where R will start looking for & saving files (e.g., data files or plots)
 
 + menu bar: &nbsp; `Session` &rarr; `Set Working Directory` &rarr; 
 &emsp; &emsp; &emsp; &emsp; `Choose Directory...`

---
class:slide-font-25
# R Studio: Terminal?

In the spirit of computational demography...

* We should (should we?) explore the Terminal
  
  + Console Pane in RStudio has multiple tabs, one of which is a Terminal

* Commands (e.g., `ls` and `pwd`) let you explore the file system on your
computer, make changes, and perform a few advanced tricks...

  + [tutorial](https://support.posit.co/hc/en-us/articles/115010737148-Using-the-RStudio-Terminal-in-the-RStudio-IDE)
  
  + *advanced tricks* include: running multiple R scripts at the same time,
  searching 100 files to see if (and where) each file includes a certain
  word or sentence

* Operating system is important, but R may try to minimize the differences

---
class: inverse, center, middle

# Basic R

---
class:slide-font-25
# Basic R Syntax

* R syntax takes the form

``` r
# object_name <- object_value 
mean_age <- 33
```

* The symbol "`<-`" is called the assignment operator

    + we are creating a new variable called `mean_age` and assigning it (a
    type and) a value of 33

    + `mean_age = 33` will also work (but `<-` is the convention)

* Useful keyboard shortcut to produce `<-`
 
 + <kbd>Alt</kbd> + <kbd>-</kbd> (Windows)
 
 + <kbd>option</kbd> + <kbd>-</kbd> (Mac)

---
class: slide-font-25
# Basic R Syntax (cont.)

If we enter the name of a variable in the `Console`, then R will list the value(s)

``` r
> Mean_age <- 22 ## note: object names are case-sensitive
> mean_age
```

```
## Error: object 'mean_age' not found
```

``` r
> Mean_age
```

```
## [1] 22
```

BUT we are in the business of good habits...

* type this syntax into our script and (with the cursor on the same line) press the following keys together:

 + On a Mac: &nbsp; <kbd>command</kbd> + <kbd>return</kbd>
 
 + In Windows: &nbsp; <kbd>Ctrl</kbd> + <kbd>Enter</kbd> &emsp; (in R Studio) 
 &emsp; &emsp; &emsp; &emsp; &ensp; <kbd>Ctrl</kbd> + <kbd>R</kbd> &emsp; &emsp; &ensp; (in the R app)

* these keyboard shortcuts will run the syntax on the line in the `Console` 
(or you can highlight a region)

---
class: slide-font-25
# Basic R Syntax: functions

We have seen a simple object for holding data, but R has many useful **functions**

``` r
ls()                         # list all the objects in memory
rm(Mean_age)                 # remove the object called Mean_age
rm(list=ls())                # deletes all objects (CAREFUL!!!)
getwd()                      # print the working directory (wd)
setwd("Thesis/Analysis")     # set the wd to the folder Thesis/Analysis
dir()                        # list the files in the current directory
dir("../")                   # list the files in the parent directory
save.image("my_data.RData")  # save all the objects in memory
# ???                        # what if you only want to save 1 thing??
load("my_data.RData")        # load all the objects in the data file
```

*Quick note*:

* suppose you create an object called `abc` that holds the value 2
* then you load `data.RData` that also has an object named `abc` but holds the value 99
* the first version of the object (`abc` holding 2) will get replaced
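
The quick note above can be checked with a small sketch. The temporary file, the use of `save()` to write a single object, and the environment name `safe_place` are assumptions made for this illustration.

``` r
# a minimal sketch of the "quick note" (file location is hypothetical)
tmp_file <- tempfile(fileext = ".RData")

abc <- 99
save(abc, file = tmp_file)   # write an object called abc (holding 99) to the file

abc <- 2                     # now abc holds 2 in memory

# safer option: load the file into its own environment first and take a look
safe_place <- new.env()
load(tmp_file, envir = safe_place)
safe_place$abc               # the version from the file
abc                          # the version in memory is untouched so far

load(tmp_file)               # loading into the workspace replaces abc without a warning
abc
```

Loading into a separate environment is one way to inspect a data file before it overwrites anything.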

---
class: codefs-50
# Basic R Syntax: help files

* Google searches are a very effective way to find help

    + and so is asking the R Working Group 😎

* R documentation can be accessed in the `Help` tab in the `Output` pane

* Some additional syntax and functions

``` r
?read.csv                     # show the help file for the function read.csv
help.search("weighted mean")  # search help files for the phrase 'weighted mean'
```

* What does the `save` function do, and how do you use it?

---
class: inverse, center, middle

# Data Structures in R

---

## **Data Structures**: motivation

We are not going to solve the world's problems with a single number...

``` r
> all_ages <- c(22, 33, 44, 55) # c() concatenates numbers together
> all_ages
```

```
## [1] 22 33 44 55
```

``` r
> mean(all_ages)                 # calculate the mean
```

```
## [1] 38.5
```

``` r
> all_ed <- c("HS", "Col", "Grad Sch", "HS")
> all_ed
```

```
## [1] "HS"       "Col"      "Grad Sch" "HS"
```

---
## **Data Structures**: motivation (cont.)

R handles different *types* of data as well

``` r
> important_data <- c("OSU", "R", "Group", 4)
> important_data
```

```
## [1] "OSU"   "R"     "Group" "4"
```

Wait, what is going on here?

* we are mixing different types of data & R assumes that we just forgot to
wrap the 4 in quotation marks
    
* sometimes R's assumptions are useful, sometimes they are not! 🤔
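
To see what R decided, we can ask for the type of each vector with `class()`. This is a small sketch; the object names `numbers_only` and `logical_mix` are made up for the example.

``` r
numbers_only   <- c(22, 33)
important_data <- c("OSU", "R", "Group", 4)
logical_mix    <- c(TRUE, 4)

class(numbers_only)     # "numeric"
class(important_data)   # "character" -- the 4 was converted to "4"
class(logical_mix)      # "numeric"   -- TRUE was converted to 1

# we can convert back when that is what we want
as.numeric(important_data[4]) + 1   # 5
```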

---
## **Data Structures**: motivation (cont.)

Here is another example with missing data

``` r
> test_scores <- c(88, 99, 110, 66, NA) # NA is for missing values
> mean_scores <- mean(test_scores)
> mean_scores / 100
```

```
## [1] NA
```

😾 Ugh! Why didn't R tell me there was a problem when I tried to calculate the mean?!?

* another R assumption
    
* can you figure out how to calculate the mean for non-missing values? (help file
is helpful 😄)

---
## **Data Structures**: vectors

* We have been creating **vectors** when we use `c()` to concatenate data

* Here are some more useful functions for working with vectors

``` r
> # test that we have a vector
> is.vector(test_scores)  # returns another data type: TRUE or FALSE (called logical)
```

```
## [1] TRUE
```

``` r
> summary(test_scores)    # numerical summary (less helpful for strings)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   66.00   82.50   93.50   90.75  101.75  110.00       1
```

---
## **Data Structures**: vectors (cont.)

``` r
> length(test_scores)     # how many elements in the vector
```

```
## [1] 5
```

``` r
> is.na(test_scores)      # test if each element is NA
```

```
## [1] FALSE FALSE FALSE FALSE  TRUE
```

``` r
> TRUE + TRUE + FALSE     # useful trick with logical objects (TRUE/FALSE)
```

```
## [1] 2
```

``` r
> n_missing <- sum(is.na(test_scores))
> n_missing
```

```
## [1] 1
```
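
The same trick extends to proportions, since the mean of a logical vector is the share of `TRUE`s. A short sketch (the names `prop_missing` and `n_observed` are only for illustration):

``` r
test_scores <- c(88, 99, 110, 66, NA)

prop_missing <- mean(is.na(test_scores))  # share of elements that are NA
n_observed   <- sum(!is.na(test_scores))  # ! flips TRUE and FALSE

prop_missing   # 0.2
n_observed     # 4
```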

---
## **Data Structures**: indexing vectors

We can access the `$i^{th}$` element in a vector with the syntax `vector_name[ i ]`

``` r
> test_scores[1]    # first element
```

```
## [1] 88
```

``` r
> test_scores[2]    # second element
```

```
## [1] 99
```

``` r
> 1:3   # a vector of c(1, 2, 3)
```

```
## [1] 1 2 3
```

``` r
>       # so what will test_scores[3:1] give us?
```

---
## **Data Structures**: indexing vectors (cont.)

The syntax &ensp; `3:1` &ensp; gives the vector &ensp; `c(3, 2, 1)`, so...

``` r
> test_scores[3:1]  # returns 3rd element, then the 2nd, then the first
```

```
## [1] 110  99  88
```

``` r
> test_scores       # sanity check
```

```
## [1]  88  99 110  66  NA
```

* So what will the following command do? 🤔

``` r
test_scores[c(3, 5, 11)]
```

---
## **Data Structures**: changing vectors

We can use indexing to change vectors as well, e.g., reassign the first element

``` r
> test_scores[1] <- NA # change the first element to NA
> test_scores[1]
```

```
## [1] NA
```

Again, we can use vectors to index as well:

``` r
index_missing_scores <- is.na(test_scores) # create an index vector of TRUE & FALSE
test_scores[index_missing_scores] <- -99 # change NA to -99
```

Let's walk through this... 
(🦉 but note a good habit would be to create a new vector,
`new_test_scores`, so we can retain the original data!)
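
The good habit mentioned above might look like the following sketch, which recodes a copy and leaves the original vector alone:

``` r
test_scores <- c(NA, 99, 110, 66, NA)

new_test_scores <- test_scores                     # work on a copy
index_missing_scores <- is.na(new_test_scores)     # index vector of TRUE & FALSE
new_test_scores[index_missing_scores] <- -99       # change NA to -99 in the copy

test_scores       # original still has its NAs
new_test_scores   # copy has -99 in positions 1 and 5
```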

---
class: slide-font-25
## **Data Structures**: changing vectors (cont.)

``` r
> # create an index vector of TRUE & FALSE
> index_missing_scores <- is.na(test_scores)
> index_missing_scores
```

```
## [1]  TRUE FALSE FALSE FALSE  TRUE
```

``` r
> # attach these 2 vectors together as columns
> cbind(index_missing_scores, test_scores)
```

```
##      index_missing_scores test_scores
## [1,]                    1          NA
## [2,]                    0          99
## [3,]                    0         110
## [4,]                    0          66
## [5,]                    1          NA
```

* with &nbsp; `cbind` &nbsp; we are actually creating a new **data structure** called a **matrix**

* as we will see, matrices can only hold the same *data type*, so R changes `TRUE`/`FALSE`
to `1`/`0` (respectively)

---
## **Data Structures**: changing vectors (cont.)

``` r
> test_scores[index_missing_scores]  #  access all of the indices with TRUE 
```

```
## [1] NA NA
```

``` r
> # recode NA to -99
> test_scores[index_missing_scores] <- -99
> test_scores
```

```
## [1] -99  99 110  66 -99
```

``` r
> # useful tool for finding the location/position of certain values
> which(test_scores == -99)
```

```
## [1] 1 5
```

---
## Strategy for changing vectors

When you want to change a vector, do the *delta 2-step*:

1. create an index vector that identifies the elements you want to change

    * what data type should this vector hold?
        * `logical`, i.e. `TRUE`s and `FALSE`s

2. assign new values to the vector using your vector of indices
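
As a sketch of the two steps with a different (made-up) task, suppose we want to top-code ages above 90. The vector `ages` and the cutoff of 90 are assumptions for this example.

``` r
ages <- c(22, 33, 44, 55, 101)   # hypothetical data

# step 1: create a logical index vector for the elements we want to change
index_over_90 <- ages > 90

# step 2: assign new values using the index (on a copy, to keep the original)
ages_topcoded <- ages
ages_topcoded[index_over_90] <- 90

ages_topcoded   # 22 33 44 55 90
```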

---
## **Data Structures**: changing vectors (tips)

Create an index with multiple conditions

  + to satisfy BOTH conditions use `&` (and)
  + to satisfy EITHER condition use `|` (or)

``` r
> cbind(test_scores,
+ test_scores > 0 & test_scores < 90,
+ test_scores < 0 | test_scores > 90)
```

```
##      test_scores    
## [1,]         -99 0 1
## [2,]          99 0 1
## [3,]         110 0 1
## [4,]          66 1 0
## [5,]         -99 0 1
```

---
class: slide-font-25
## **Data Structures**: changing vectors (tips)

Check if values belong to a set with: `%in%`.  For example, here are
some letters

``` r
> letters[1:10]
```

```
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
```

We can check if characters (e.g. "1" or "b") are included in the vector `letters`
with:

``` r
> cbind(c("1", "g", "b", "&") ,
+       c("1", "g", "b", "&") %in% letters)
```

```
##      [,1] [,2]   
## [1,] "1"  "FALSE"
## [2,] "g"  "TRUE" 
## [3,] "b"  "TRUE" 
## [4,] "&"  "FALSE"
```
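
Because `%in%` returns a logical vector, it can also serve as the index in the two-step approach. A small sketch using the education vector from earlier (the new variable `any_college` is made up for the example):

``` r
all_ed <- c("HS", "Col", "Grad Sch", "HS")

# step 1: TRUE for anyone whose education is in the set
index_college <- all_ed %in% c("Col", "Grad Sch")

# step 2: start with "No" for everyone, then change the indexed elements
any_college <- rep("No", length(all_ed))
any_college[index_college] <- "Yes"

any_college   # "No" "Yes" "Yes" "No"
```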

---
## **Data Structures**: more than vectors

* We are not going to become 💰 famous 💰 by working with
a single vector

* However, we have learned a powerful way to work with vectors, **indexing**, that extends to
other types of **data structures**

* A **matrix** made a brief appearance earlier, but before going further let's review a useful framework
for thinking about **data structures**

---
## **Data Structures**: overview

R has different structures for holding data, which can be 
organized by...

1. How many dimensions does the structure have?

2. Do the types of data need to be the same?

* Example: **vectors**

    + only 1 dimension (it is just a single row or a column)
    
    + we saw earlier that R changes the elements so they all have
    the same data type (e.g., `4` &rarr; `"4"`)

We'll now (re)introduce different data structures, and learn about
different data types along the way.

---
## **Data Structures**: overview (cont.)

* **Vectors**
  1. 1 dimension
  1. same data type
    + special case: **factor** (predefined categories)

* **Matrices**
  1. rows and columns
  1. same data type

* **Arrays** 
  1. any number of dimensions
  1. same data type

---

## **Data Structures**: overview (cont.)

* **Data Frames**
  1. rows and columns
  1. different data types
  - particularly useful for holding a data set with quantitative & qualitative variables

* **Lists**
  1. 1 dimension
  1. different data types (or structures!)
  - actually, this is just a special type of vector (can you verify this?)
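
Here is one way to create a small example of each structure from the overview. All of the object names and values are invented for the illustration.

``` r
# vector (1 dimension, same data type) and its special case, the factor
ed <- c("HS", "Col", "HS")
ed_factor <- factor(ed, levels = c("HS", "Col", "Grad Sch"))

# matrix: rows and columns, same data type
m <- matrix(1:6, nrow = 2, ncol = 3)

# array: any number of dimensions, same data type
a <- array(1:24, dim = c(2, 3, 4))

# data frame: rows and columns, columns can hold different data types
df <- data.frame(age = c(22, 33, 44), ed = ed)

# list: 1 dimension, elements can be different types (or structures!)
l <- list(numbers = 1:3, school = "OSU", survey = df)

dim(m)       # number of rows and columns
dim(a)       # size of each of the 3 dimensions
str(df)      # one line per column, showing its data type
length(l)    # number of elements in the list
```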

---
## **Data Structures**: working with data frames

* For the rest of this session we will focus on **Data frames**, the R structure
typically used for data sets (i.e., variables as columns and an observation for each row).

* Let's get some practice working with data frames using one
of R's example data sets

``` r
> data(mtcars)            ## load one of R's example data sets mtcars
> ls()
```

```
## [1] "all_ages"             "all_ed"               "important_data"      
## [4] "index_missing_scores" "Mean_age"             "mean_scores"         
## [7] "mtcars"               "n_missing"            "test_scores"
```

``` r
> is.data.frame(mtcars)   ## check that mtcars is a data frame
```

```
## [1] TRUE
```

---
## **Data Structures**: reading in data sets

Before we proceed with `mtcars`, a quick example of how to read in a data set.

``` r
> # write data to a CSV file called 'copy_mtcars.csv' in the working directory
> write.csv(mtcars, "copy_mtcars.csv") 
> mtcars2 <- read.csv("copy_mtcars.csv") # load data set from CSV file
> ls()
```

```
##  [1] "all_ages"             "all_ed"               "important_data"      
##  [4] "index_missing_scores" "Mean_age"             "mean_scores"         
##  [7] "mtcars"               "mtcars2"              "n_missing"           
## [10] "test_scores"
```

``` r
> is.data.frame(mtcars2)
```

```
## [1] TRUE
```

---
## **Data Structures**: exploring data frames

* Since **data frames** have 2 dimensions, the index requires 2 pieces of
info: `[row index, column index]`

``` r
> dim(mtcars)
## [1] 32 11
> mtcars[1, 1]  # 1st observation in 1st variable
## [1] 21
```

* Many times, however, we just work with one variable/column at a time, so all our skills
working with vectors still apply

``` r
> # if we leave out the row part of the address, we get all rows and a vector
> is.vector(mtcars[, 1])
```

```
## [1] TRUE
```
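
A few more variations on `[row index, column index]`, as a sketch (the object names on the left are only for illustration):

``` r
second_car <- mtcars[2, ]               # leave out the column part: all columns for row 2
small_df   <- mtcars[1:3, 1:2]          # first 3 rows of the first 2 columns
one_col_df <- mtcars[, 1, drop = FALSE] # drop = FALSE keeps a 1-column data frame

is.data.frame(second_car)   # TRUE
dim(small_df)               # 3 2
is.data.frame(one_col_df)   # TRUE
```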

---
class: slide-font-25
## **Data Structures**: exploring data frames

* And now, some Old School techniques for working with data frames
* One way to access a single column in a data frame is to use `$`

``` r
> names(mtcars)  ## print the variable names
> mtcars$mpg     ## return the mpg variable
```

```
##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"  
## [10] "gear" "carb"
```

```
##  [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3
## [14] 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3
## [27] 26.0 30.4 15.8 19.7 15.0 21.4
```

* Now we will (re)introduce several functions for exploring data frames
* We will also see a more advanced example of indexing

---
## **Data Frames**: exploring columns (cont.)

``` r
> dim(mtcars)    ## print the number of rows and columns
```

```
## [1] 32 11
```

``` r
> str(mtcars)    ## print structure of data frame
```

```
## 'data.frame':	32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
```

---
## **Data Frames**: summarizing columns

``` r
> summary(mtcars)
```

```
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
```

---
## **Data Frames**: exploring columns (cont.)

Alternative ways to access a data frame's variable(s):

``` r
> mtcars[["mpg"]]
```

```
##  [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3
## [14] 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3
## [27] 26.0 30.4 15.8 19.7 15.0 21.4
```

``` r
> mtcars[1:10, c("mpg", "cyl")]
```

```
##                    mpg cyl
## Mazda RX4         21.0   6
## Mazda RX4 Wag     21.0   6
## Datsun 710        22.8   4
## Hornet 4 Drive    21.4   6
## Hornet Sportabout 18.7   8
## Valiant           18.1   6
## Duster 360        14.3   8
## Merc 240D         24.4   4
## Merc 230          22.8   4
## Merc 280          19.2   6
```

---
## **Data Frames**: creating new variables

``` r
> mtcars$mpg_squared <- mtcars$mpg * mtcars$mpg
> mtcars[1:10, c("mpg", "mpg_squared")]
```

```
##                    mpg mpg_squared
## Mazda RX4         21.0      441.00
## Mazda RX4 Wag     21.0      441.00
## Datsun 710        22.8      519.84
## Hornet 4 Drive    21.4      457.96
## Hornet Sportabout 18.7      349.69
## Valiant           18.1      327.61
## Duster 360        14.3      204.49
## Merc 240D         24.4      595.36
## Merc 230          22.8      519.84
## Merc 280          19.2      368.64
```

---
## **Data Frames**: more on indexing

Recall that when creating an index, we can also use multiple conditions

* to satisfy BOTH conditions use `&` (and)
 * to satisfy EITHER condition use `|` (or)

``` r
> mtcars[mtcars$mpg < 25 & mtcars$mpg > 21, c("mpg", "cyl")]
```

```
##                 mpg cyl
## Datsun 710     22.8   4
## Hornet 4 Drive 21.4   6
## Merc 240D      24.4   4
## Merc 230       22.8   4
## Toyota Corona  21.5   4
## Volvo 142E     21.4   4
```

---
## **Data Frames**: more on indexing (cont.)

(remember: variables are just vectors, so we can use what we learned earlier)

``` r
> cbind(mtcars$mpg, mtcars$mpg < 15 | mtcars$mpg > 20)[1:10,]
```

```
##       [,1] [,2]
##  [1,] 21.0    1
##  [2,] 21.0    1
##  [3,] 22.8    1
##  [4,] 21.4    1
##  [5,] 18.7    0
##  [6,] 18.1    0
##  [7,] 14.3    1
##  [8,] 24.4    1
##  [9,] 22.8    1
## [10,] 19.2    0
```

---
## **Data Frames**: more on indexing (cont.)

And we can use multiple variables

``` r
> table(mtcars$mpg > 30 & mtcars$cyl == 6)
```

```
## 
## FALSE 
##    32
```

``` r
> table(mtcars$mpg > 30 & mtcars$cyl == 4)
```

```
## 
## FALSE  TRUE 
##    28     4
```

---
## **Data Frames**: final indexing example

``` r
> hi_mpg <- mtcars$mpg > mean(mtcars$mpg)
> hi_cyl <- mtcars$cyl == 4
> table(hi_mpg, hi_cyl)
```

```
##        hi_cyl
## hi_mpg  FALSE TRUE
##   FALSE    18    0
##   TRUE      3   11
```

---
## **Data Frames**: final indexing example (cont.)

``` r
> mtcars$good_car <- FALSE
> mtcars$good_car[hi_mpg & hi_cyl] <- TRUE
> table(mtcars$good_car)
```

```
## 
## FALSE  TRUE 
##    21    11
```

---
## **Data Frames**: final indexing example (cont.)

Sanity check

``` r
> # cbind(mtcars$good_car, hi_mpg, hi_cyl, mtcars$mpg, mtcars$cyl)
> cbind(mtcars$good_car, hi_mpg, hi_cyl)[1:15,]
```

```
##             hi_mpg hi_cyl
##  [1,] FALSE   TRUE  FALSE
##  [2,] FALSE   TRUE  FALSE
##  [3,]  TRUE   TRUE   TRUE
##  [4,] FALSE   TRUE  FALSE
##  [5,] FALSE  FALSE  FALSE
##  [6,] FALSE  FALSE  FALSE
##  [7,] FALSE  FALSE  FALSE
##  [8,]  TRUE   TRUE   TRUE
##  [9,]  TRUE   TRUE   TRUE
## [10,] FALSE  FALSE  FALSE
## [11,] FALSE  FALSE  FALSE
## [12,] FALSE  FALSE  FALSE
## [13,] FALSE  FALSE  FALSE
## [14,] FALSE  FALSE  FALSE
## [15,] FALSE  FALSE  FALSE
```

---
class: inverse, center, middle

# General Programming

---
## **General Programming**

* Coders often find themselves in the position of repeating blocks of
code (or similar blocks of code), or needing an existing tool to do
something a little different.

* For these situations we can turn to some more general features of
computer programming

  + `for` loops -- repeating the same steps, but for different
  variables/names/objects

  + functions -- creating your own tool that can take an argument,
  manipulate it, and return something useful

---
## `for` loops: basic example

``` r
> for (i in 1:10) {
+ msg <- paste("Current index:", i, sep = " ")
+ print(msg)
+ }
```

```
## [1] "Current index: 1"
## [1] "Current index: 2"
## [1] "Current index: 3"
## [1] "Current index: 4"
## [1] "Current index: 5"
## [1] "Current index: 6"
## [1] "Current index: 7"
## [1] "Current index: 8"
## [1] "Current index: 9"
## [1] "Current index: 10"
```

---
## `for` loops: data frame example

``` r
> mtcars[1:4,] <- NA
> for (index_for_col in 1:ncol(mtcars)) {
+ index_for_na <- is.na(mtcars[, index_for_col])
+ mtcars[index_for_na, index_for_col] <- -99
+ }
> mtcars
```

```
##                       mpg cyl  disp  hp   drat      wt   qsec  vs  am
## Mazda RX4           -99.0 -99 -99.0 -99 -99.00 -99.000 -99.00 -99 -99
## Mazda RX4 Wag       -99.0 -99 -99.0 -99 -99.00 -99.000 -99.00 -99 -99
## Datsun 710          -99.0 -99 -99.0 -99 -99.00 -99.000 -99.00 -99 -99
## Hornet 4 Drive      -99.0 -99 -99.0 -99 -99.00 -99.000 -99.00 -99 -99
## Hornet Sportabout    18.7   8 360.0 175   3.15   3.440  17.02   0   0
## Valiant              18.1   6 225.0 105   2.76   3.460  20.22   1   0
## Duster 360           14.3   8 360.0 245   3.21   3.570  15.84   0   0
## Merc 240D            24.4   4 146.7  62   3.69   3.190  20.00   1   0
## Merc 230             22.8   4 140.8  95   3.92   3.150  22.90   1   0
## Merc 280             19.2   6 167.6 123   3.92   3.440  18.30   1   0
## Merc 280C            17.8   6 167.6 123   3.92   3.440  18.90   1   0
## Merc 450SE           16.4   8 275.8 180   3.07   4.070  17.40   0   0
## Merc 450SL           17.3   8 275.8 180   3.07   3.730  17.60   0   0
## Merc 450SLC          15.2   8 275.8 180   3.07   3.780  18.00   0   0
## Cadillac Fleetwood   10.4   8 472.0 205   2.93   5.250  17.98   0   0
## Lincoln Continental  10.4   8 460.0 215   3.00   5.424  17.82   0   0
## Chrysler Imperial    14.7   8 440.0 230   3.23   5.345  17.42   0   0
## Fiat 128             32.4   4  78.7  66   4.08   2.200  19.47   1   1
## Honda Civic          30.4   4  75.7  52   4.93   1.615  18.52   1   1
## Toyota Corolla       33.9   4  71.1  65   4.22   1.835  19.90   1   1
## Toyota Corona        21.5   4 120.1  97   3.70   2.465  20.01   1   0
## Dodge Challenger     15.5   8 318.0 150   2.76   3.520  16.87   0   0
## AMC Javelin          15.2   8 304.0 150   3.15   3.435  17.30   0   0
## Camaro Z28           13.3   8 350.0 245   3.73   3.840  15.41   0   0
## Pontiac Firebird     19.2   8 400.0 175   3.08   3.845  17.05   0   0
## Fiat X1-9            27.3   4  79.0  66   4.08   1.935  18.90   1   1
## Porsche 914-2        26.0   4 120.3  91   4.43   2.140  16.70   0   1
## Lotus Europa         30.4   4  95.1 113   3.77   1.513  16.90   1   1
## Ford Pantera L       15.8   8 351.0 264   4.22   3.170  14.50   0   1
## Ferrari Dino         19.7   6 145.0 175   3.62   2.770  15.50   0   1
## Maserati Bora        15.0   8 301.0 335   3.54   3.570  14.60   0   1
## Volvo 142E           21.4   4 121.0 109   4.11   2.780  18.60   1   1
##                     gear carb mpg_squared good_car
## Mazda RX4            -99  -99      -99.00      -99
## Mazda RX4 Wag        -99  -99      -99.00      -99
## Datsun 710           -99  -99      -99.00      -99
## Hornet 4 Drive       -99  -99      -99.00      -99
## Hornet Sportabout      3    2      349.69        0
## Valiant                3    1      327.61        0
## Duster 360             3    4      204.49        0
## Merc 240D              4    2      595.36        1
## Merc 230               4    2      519.84        1
## Merc 280               4    4      368.64        0
## Merc 280C              4    4      316.84        0
## Merc 450SE             3    3      268.96        0
## Merc 450SL             3    3      299.29        0
## Merc 450SLC            3    3      231.04        0
## Cadillac Fleetwood     3    4      108.16        0
## Lincoln Continental    3    4      108.16        0
## Chrysler Imperial      3    4      216.09        0
## Fiat 128               4    1     1049.76        1
## Honda Civic            4    2      924.16        1
## Toyota Corolla         4    1     1149.21        1
## Toyota Corona          3    1      462.25        1
## Dodge Challenger       3    2      240.25        0
## AMC Javelin            3    2      231.04        0
## Camaro Z28             3    4      176.89        0
## Pontiac Firebird       3    2      368.64        0
## Fiat X1-9              4    1      745.29        1
## Porsche 914-2          5    2      676.00        1
## Lotus Europa           5    2      924.16        1
## Ford Pantera L         5    4      249.64        0
## Ferrari Dino           5    6      388.09        0
## Maserati Bora          5    8      225.00        0
## Volvo 142E             4    2      457.96        1
```
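
Loops can also collect results as they go. The sketch below stores the mean of each column in a vector; it works on a fresh copy of the data (called `cars` here, a made-up name) because we changed `mtcars` above, and it uses `seq_len()` as an alternative to `1:ncol()`.

``` r
cars <- datasets::mtcars               # fresh copy of the original data

col_means <- numeric(ncol(cars))       # empty vector to hold the results
names(col_means) <- names(cars)

for (index_for_col in seq_len(ncol(cars))) {
  col_means[index_for_col] <- mean(cars[, index_for_col])
}

round(col_means, 2)

# sanity check against the built-in function colMeans()
all.equal(col_means, colMeans(cars))
```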

---
## `function`: basic example

``` r
> add_one <- function(x) {
+ x_plus_one <- x + 1
+ return(x_plus_one)
+ }
> add_one(19)
```

```
## [1] 20
```

---
## `function`: return a list (1st effort)

``` r
> my_summary <- function(x, remove_na = TRUE) {
+ if (is.vector(x)) {
+ return("Sorry, I only work with vectors.")
+ }
+ out <- list()
+ out$mean <- mean(out, rm.na = remove_na)
+ out$sd <- sd(out, na.rm = remove_na)
+ out$median <- median(out, na.rm = remove_na)
+ out$msg <- "This looks like a great variable!"
+ return(msg)
+ }
> x <- c(rnorm(20), NA, NA)
> my_summary(x)
```

```
## [1] "Sorry, I only work with vectors."
```

Can you fix this?

---
## `function`: return a list (2nd effort)

``` r
> my_summary <- function(x, remove_na = TRUE) {
+ if (!is.vector(x)) {
+ return("Sorry, I only work with vectors.")
+ }
+ out <- list()
+ out$mean <- mean(x, na.rm = remove_na)
+ out$sd <- sd(x, na.rm = remove_na)
+ out$median <- median(x, na.rm = remove_na)
+ out$msg <- "This looks like a great variable!"
+ return(out)
+ }
> x <- c(rnorm(20), NA, NA)
> my_summary(x)
```

```
## $mean
## [1] -0.02018186
## 
## $sd
## [1] 1.204376
## 
## $median
## [1] 0.06062863
## 
## $msg
## [1] "This looks like a great variable!"
```
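
Since the function returns a list, we can save the result and pull out individual pieces. This sketch repeats the function so it runs on its own; `set.seed()` and the name `res` are additions for the example.

``` r
my_summary <- function(x, remove_na = TRUE) {
  if (!is.vector(x)) {
    return("Sorry, I only work with vectors.")
  }
  out <- list()
  out$mean <- mean(x, na.rm = remove_na)
  out$sd <- sd(x, na.rm = remove_na)
  out$median <- median(x, na.rm = remove_na)
  out$msg <- "This looks like a great variable!"
  return(out)
}

set.seed(2025)                 # make the random numbers reproducible
x <- c(rnorm(20), NA, NA)
res <- my_summary(x)

names(res)                     # names of the elements in the list
res$mean                       # access an element with $
res[["sd"]]                    # or with [[ ]]

my_summary(mtcars)             # a data frame is not a vector, so we get the message
```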

---
class: inverse, center, middle

# More R Features

---
## **Additional topics**

* R Markdown & Dynamic Documents

    + [previous R session](https://buckipr.github.io/R_Working_Group/r_markdown/2024_09/intro_r_markdown_np.html#1)

* (More ways of) Integrating R with other software

    + Python:  [reticulate](https://cran.r-project.org/web/packages/reticulate/index.html)
    (several vignettes)
    
    + [Transition from Stata](https://buckipr.github.io/R_Working_Group/transition2R/transition2R.html)

* GitHub & RStudio

---
# Track Changes for Code

* [**GitHub**](https://github.com) is an on-line service for tracking the history of each file in your project
(i.e., **version control**)

    + each project is stored in a *repository*, which can be made public or private (so only you and
    your team can access the files)

    + [how much space?](https://docs.github.com/en/github/managing-large-files/what-is-my-disk-quota)

        + unlimited repositories (public and private); "abundant storage"; try to keep it under 1GB per repo

* Software: [GitHub Desktop](https://desktop.github.com/) (but also hooks into fancy IDEs like R Studio 
and VS Code)

---
# GitHub: additional features

* Branches -- a separate copy of every file in your repository; make changes and (if all goes well)
merge back with the main branch

  + useful for trying new code/tools that *might* work; sensitivity analysis
  
* You can host your own website through GitHub: [R Working Group](https://buckipr.github.io/R_Working_Group/)
  
* Extensive userbase -- easy to find help online and many tools/libraries/packages are hosted on GitHub
(help with errors and new features)

* Excellent for group projects!

---
class: slide-font-25
## GitHub: concepts and workflow

* Start by creating a repository on the GitHub website

* **clone** the repository -- creates a copy of all the files on your personal computer

* Make new files and changes to your local copies, then **stage** all the files you want to keep track of

* **commit** all of the files you have staged (you can also add a comment about what was done)

  + this creates a new version (or snap shot) of your project
  
  + you can go back and look at previous commits (or revert to that version) and look at differences between
  commits

* **push** your files to GitHub so that your on-line repository has the latest version of the files

---
## **Recap & Moving Forward**

* You should now be familiar with a few of R's data structures

  + (and know when they should be used: # of dimensions & data types)
  
* We have also been introduced to some useful functions for manipulating, summarizing,
and exploring data

  + There are many more(!) and users contribute **R packages** that implement a wide
  range of tools, models, and methods: [list of some packages on CRAN](https://cran.r-project.org/)

---
## **Recap & Moving Forward** (cont.)

* R comes installed with many packages that you can explore & access with the `library()`
function

```r
# library()              # list all the packages installed on your computer
library(stats)           # load the stats package
# help(package="stats")  # look at the package documentation
```
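
Contributed packages need to be installed once before `library()` can load them. A small sketch for checking whether a package is available, using `dplyr` as the example package (the name `has_dplyr` is made up):

``` r
# TRUE if the package is installed on this computer, FALSE otherwise
has_dplyr <- requireNamespace("dplyr", quietly = TRUE)

if (has_dplyr) {
  message("dplyr is installed; load it with library(dplyr)")
} else {
  message("dplyr is not installed; try install.packages(\"dplyr\")")
}
```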

* In future sessions, we will explore some of these packages that are particularly useful
for

    + data carpentry: [dplyr](https://dplyr.tidyverse.org/)
    + making plots: [ggplot2](https://ggplot2.tidyverse.org/)

* Please join us 😄