Lecture 3: Introduction to R Programming

DATA 101: Making Predictions with Data

Dr. Irene Vrbik

University of British Columbia Okanagan

Introduction

In today’s lecture, we will go a bit deeper into programming by learning:

Operators: Comparison Operators, Logical Operators
Conditionals, e.g., if, if … else, and else if statements
Conditional Indexing and Subsetting (base R wrangling)
Loops

These concepts are fundamental for data manipulation and analysis in R

Operators

R has several operators to perform tasks
We have already seen two:
1. assignment operators (e.g., = and <-)
2. arithmetic operators (e.g., +, -, *, /, ^, %%)
Other types of operators include
- Comparison operators
- Logical Operators

Comparison Operators

Comparison operators are used to determine whether a specific relationship exists between two values or expressions (e.g., equality, inequality, greater than).
Comparison operators return a logical value, which is either TRUE (if the comparison is true) or FALSE (if the comparison is false).

🤓 You can think of comparison statements as questions.
Q: is 3 < 4 (R input: 3 < 4)?
A: yes! (R output: TRUE)

Examples

Here is a list of some handy comparison operators:

Less than: <
Greater than: >
Less than or equal to: <=
Greater than or equal to: >=
Is equal to: ==
Is NOT equal to: !=

3 < 4

[1] TRUE

3 > 4

[1] FALSE

4 >= 4

[1] TRUE

3 <= 4

[1] TRUE

4 == 4

[1] TRUE

3 != 3

[1] FALSE
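
Comparison operators also work element by element on vectors. The following is a minimal sketch; the vector `temps` and its values are hypothetical and not part of the lecture data.

```r
# hypothetical temperatures (illustrative values only)
temps <- c(12, 25, 31, 18)

# each element is compared with 20, giving one logical value per element
warm <- temps > 20
print(warm)

# comparisons also work on character strings; R is case sensitive
same <- "setosa" == "Setosa"
print(same)
```

This element-wise behaviour is what makes comparison operators useful for selecting data from vectors and data frames.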

Logical Operators

Logical operators are used to manipulate and combine logical values (i.e., TRUE and FALSE).
Logical operators are typically used to create more complex conditions by combining the results of simpler conditions.
In programming, there are three common logical operators: AND (&), OR (|), and NOT (!).

# admitted to the bar?
age = 18; hasID = TRUE
(age >= 18 & hasID)

[1] TRUE

# satisfy pre-reqs: One of STAT205, STAT230
courses <- c("STAT230", "DATA101", "DATA301")
("STAT205" %in% courses || 
    "STAT230" %in% courses)

[1] TRUE
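
The NOT operator (!) flips a logical value. A small sketch, assuming the same kind of variables as the bar example above but with `hasID` set to FALSE for illustration:

```r
# assumed values, chosen for illustration
age <- 18
hasID <- FALSE

# !hasID is TRUE because hasID is FALSE
no_id <- !hasID

# negating a combined condition: the AND fails, so its negation is TRUE
denied <- !(age >= 18 & hasID)

print(no_id)
print(denied)
```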

Scalar vs Element-wise Operators

⚠️ Warning: “long-form” versions of these operators exist (&& and ||) to provide flexibility in different use cases

& and | perform element-wise logical operations when applied to vectors, matrices, or arrays (may return a logical vector with length >1)
&& and || are designed for scalar operations (will return a logical scalar)

Examples using AND

x <- c(TRUE, FALSE, TRUE)
y <- c(FALSE, TRUE, TRUE)

Scalar usage of AND

TRUE & TRUE   # same as TRUE && TRUE
TRUE & FALSE  # same as TRUE && FALSE
FALSE & FALSE # same as FALSE && FALSE

[1] TRUE

[1] FALSE

[1] FALSE

Using & with vectors (element-wise evaluation)

x & y

[1] FALSE FALSE  TRUE

TRUE & y

[1] FALSE  TRUE  TRUE

Examples using OR

x <- c(TRUE, FALSE, TRUE)
y <- c(FALSE, TRUE, TRUE)

Scalar usage of OR:

TRUE | TRUE   # same as TRUE || TRUE 
TRUE | FALSE  # same as TRUE || FALSE 
FALSE | FALSE # same as FALSE || FALSE

[1] TRUE

[1] TRUE

[1] FALSE

Using | with vectors (element-wise evaluation)

x | y

[1] TRUE TRUE TRUE

TRUE | y

[1] TRUE TRUE TRUE

4.3.0 News

⚠️ Warning: If you’re using R 4.3.0 or higher, calling && or || with a LHS or RHS of length greater than one will produce an error (see R 4.3.0 NEWS).

In other words, the following will produce errors:

x && y      # Error since x and y are vectors
TRUE && y   # Error since y is a vector
x && TRUE   # Error since x is a vector

So && should only be used with scalar logicals:

TRUE && TRUE

[1] TRUE
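
When a single TRUE/FALSE is needed from a logical vector, one option is to collapse it first with the base R functions any() and all(). This is a sketch of that idea, not something shown on the original slides:

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(FALSE, TRUE, TRUE)

# is there at least one position where both x and y are TRUE?
both_somewhere <- any(x & y)

# are x and y both TRUE at every position?
both_everywhere <- all(x & y)

# any() and all() return length-one logicals, so && is safe here
ok <- any(x) && all(x | y)

print(both_somewhere)
print(both_everywhere)
print(ok)
```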

Conditionals

Conditional statements allow us to make decisions in R
Conditional statements allow the program to execute different code blocks or take different actions based on specific conditions.
Common conditional statements: if, else if, else
Just like a flow chart, conditional statements supply a sequence of steps, actions, or decisions in a process or system.

Flow chart

Pseudocode

# not to be run in R:

if score is greater than or equal to 80:
  set grade to A
else if score is greater than or equal to 68:
  set grade to B
else if score is greater than or equal to 55:
  set grade to C
else if score is greater than or equal to 50:
  set grade to D
else:
  set grade to F

Syntax

If you try to execute the previous pseudocode in R, you will get an error:

> if score is greater than or equal to 80:
Error: unexpected symbol in "if score"

That is because R is expecting a very specific syntax.
Syntax refers to the specific rules and conventions that dictate how code is written and structured, e.g., case sensitivity, comments, assignment operators.

`if` statements

An if statement allows you to execute different code blocks based on whether a specified condition is true or false. The basic syntax of an if statement in R is as follows:

if (condition) {
  # Code to be executed 
  # if condition is true
}

if (age >= 18 & hasID==TRUE) {
  print("admit to club")
}

[1] "admit to club"

💡 Tip: the condition is already evaluated as a logical value, so comparing it to TRUE is redundant; hasID alone does the same job as hasID==TRUE

if (age >= 18 & hasID) {
  print("admit to club")
}

[1] "admit to club"

Components

if: This is the keyword that initiates the if statement.
condition: This is a logical expression that evaluates to either TRUE or FALSE. If the condition is TRUE, the code inside the curly braces {} will be executed; otherwise, it will be skipped.
{}: Curly braces enclose the code block that should be executed when the condition is TRUE. If you have only one statement to execute, the curly braces are optional, but it’s a good practice to include them for readability.

Keywords

In R, a keyword refers to a reserved word that has a predefined meaning and cannot be used as a variable or function name.

Here are some common keywords in R:

if: Used to start an if statement for conditional branching.
else: Used in conjunction with if to provide an alternative code block to execute when the condition is false.
else if: Not a separate reserved word but the combination of else followed by if; used to specify additional conditions to check when the initial condition is false.
for: Used to create a loop that iterates over a sequence of values.
while: Used to create a loop that continues as long as a specified condition is true.
repeat: Used to create an indefinite loop that continues until explicitly stopped with a break statement.
function: Used to define a user-defined function in R.
return: Used within a function to specify the value to return from that function (strictly speaking, return() is a function rather than a reserved word).
break: Used to exit a loop prematurely.
next: Used in a loop to skip the current iteration and move to the next iteration.
NULL: Represents the null object, i.e., the absence of a value.
NA: Stands for “Not Available” and is used to represent missing or undefined values in R.
TRUE and FALSE: Represent the logical values for true and false, respectively.
Inf: Represents positive infinity.
NaN: Stands for “Not-a-Number” and represents undefined or unrepresentable numerical values.

`if ... else` statements

The if...else is used when you have a single condition, and you want to execute one block of code if the condition is true and another block if it’s false.

if (condition) {
  # statement1
} else {
  # statement2
}

if (age >= 18 & hasID) {
  print("admit")
} else { 
  print("deny")
}

💡 Tip: an if...else statement can be written on a single line when each branch contains only one statement

if (age >= 18 & hasID) print("admit") else print("deny")

[1] "admit"

`else if` statements

else if is used when you have multiple conditions to check, and you want to evaluate them in sequence until one of them is true. When the first true condition is found, the associated block of code is executed, and the rest of the conditions are not evaluated.

if (condition) {
  # Code to execute if 
  # condition is TRUE
} else if (another_condition) {
  # Code to execute if 
  # another_condition is TRUE
} else {
  # Code to execute if 
  # no conditions are TRUE
}

fakeID = FALSE
looksOld  = FALSE
if (age >= 18 & hasID) {
  print("legit admit")
} else if (age < 18 & fakeID) {
  print("sneaky admit")
} else if (looksOld) {
  print("old admit")
} else {
  print("deny")
}
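
The grade pseudocode from the Flow chart slide can be translated into R syntax along these lines. This is a sketch; the value of `score` is a hypothetical input chosen for illustration.

```r
score <- 72  # hypothetical score

# conditions are checked in order; the first TRUE one decides the grade
if (score >= 80) {
  grade <- "A"
} else if (score >= 68) {
  grade <- "B"
} else if (score >= 55) {
  grade <- "C"
} else if (score >= 50) {
  grade <- "D"
} else {
  grade <- "F"
}

print(grade)
```

Because the conditions are evaluated in sequence, a score of 72 fails the first test (>= 80) and is caught by the second (>= 68); the remaining conditions are never checked.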

Conditional Indexing

We can use these operators in some advanced indexing.
Last lecture we saw how to extract data from a vector using one or several indices (e.g., x[1], x[c(4,2)])
In practice, you often need to extract data that satisfy certain criteria.
To do this in one step, we use conditional selection.

Example

x = c("female","male","non-binary","female","male","male","female")
x=="female"

[1]  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE

x[x=="female"]

[1] "female" "female" "female"

set.seed(2023)
# sample twelve numbers from 1--10
(y = sample(10, 12, replace=TRUE))

 [1]  5  9  8  3 10  2  1  1  1  1  5  8

# return the elements that are larger than 7
y[y>7]

[1]  9  8 10  8

# return the elements that are less than 7 and odd
y[y<7 & y%%2==1]

[1] 5 3 1 1 1 1 5
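
Logical vectors can also be used to count or locate matching elements. A brief sketch, typing in the values of y printed above directly (so the result does not depend on the random number generator) and using the base R functions sum(), mean(), and which():

```r
# the values of y shown above, entered by hand
y <- c(5, 9, 8, 3, 10, 2, 1, 1, 1, 1, 5, 8)

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches
n_large <- sum(y > 7)

# mean() of a logical vector gives the proportion of matches
prop_large <- mean(y > 7)

# which() returns the positions (indices) of the TRUE elements
where_large <- which(y > 7)

print(n_large)
print(prop_large)
print(where_large)
```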

Subsetting

Instead of using operators to create logical vectors for indexing, we can also use subset()

subset(data, subset = condition)

This function returns subsets of vectors, matrices or data frames which meet conditions.
To show its utility, let’s consider a built-in data set called iris
To see a description type ?iris

Iris

This famous (Fisher’s or Anderson’s) iris data set gives the measurements (in cm) of the variables:
- sepal length and width and
- petal length and width, respectively,

for 50 flowers from each of 3 species of iris:

1. setosa, 2. versicolor, and 3. virginica.

iris

Iris Indexing

Extract the rows which correspond to the setosa species

nrow(iris) # count the number of observations

[1] 150

setosa = iris[iris$Species == "setosa",]
nrow(setosa)

[1] 50

Equivalently

setosa = subset(iris, Species=="setosa")

Extract the setosa flowers with long (>5 cm) sepal length

longSepals = subset(iris, Species=="setosa" & Sepal.Length>5)
nrow(longSepals)

[1] 22
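
For comparison, the same rows can be obtained with bracket indexing, and subset() can additionally keep only some columns through its select argument. A minimal sketch (the object names here are chosen for illustration):

```r
# subset() version: column names can be used directly
longSepals <- subset(iris, Species == "setosa" & Sepal.Length > 5)

# bracket version: columns must be referenced through iris$
sameRows <- iris[iris$Species == "setosa" & iris$Sepal.Length > 5, ]

# both approaches select the same 22 rows
print(nrow(sameRows))
print(all.equal(longSepals, sameRows))

# keep only two of the columns using the select argument
twoCols <- subset(iris, Species == "setosa" & Sepal.Length > 5,
                  select = c(Sepal.Length, Species))
print(dim(twoCols))
```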

Transforming

The transform() function also provides a quick and easy way to transform the data.
For instance, if we want to add a new column which holds the log values of the petal lengths, we could type:

dim(iris)

[1] 150   5

irisMore = transform(iris, logPL = log(Petal.Length))
dim(irisMore)

[1] 150   6

irisMore

Splitting

Another handy function is split().
split() divides a vector or data frame into a list of groups according to a grouping variable

iSpecies = split(iris, iris$Species)
names(iSpecies)

[1] "setosa"     "versicolor" "virginica"

Note that iSpecies$setosa creates the same subset as setosa defined earlier. We can verify this using:

all.equal(iSpecies$setosa,setosa)

[1] TRUE
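
One common use of split() is computing a summary per group. The sketch below splits a single column by species and applies mean() to each piece with sapply(), a base R function that is not covered in this lecture and is used here only for illustration.

```r
# split the petal lengths into one numeric vector per species
petal_by_species <- split(iris$Petal.Length, iris$Species)

# apply mean() to each element of the list
mean_petal <- sapply(petal_by_species, mean)

print(names(petal_by_species))
print(mean_petal)
```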

Order

A related function is order(), which returns the indices that put x into sorted order; that is, x[order(x)] is the sorted vector.

(o <- order(x))

[1] 1 4 7 2 5 6 3

x[o]

[1] "female"     "female"     "female"     "male"       "male"      
[6] "male"       "non-binary"

We can use order() to rearrange the rows of a data set according to the sorted values of a particular column, for instance.

Example

Example: rearrange the rows of iris so that Petal.Length is sorted from smallest to largest:

o = order(iris$Petal.Length)
head(o)

[1] 23 14 15 36  3 17

irisSorted = iris[o,]
head(irisSorted)

Comment

Note that order() can also take multiple sorting arguments.
For instance, order(gender, age) in the example on the following slide gives a main division by gender, and within each group the rows are ordered by age.

Example

gender = c("female","male","female","male","male","female")
age = c(36, 24, 25, 40, 22, 23)
df = data.frame(gender=gender, age=age)
o = order(df$gender, df$age)
df[o,]
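
To sort in descending order, one option is to negate a numeric sorting key or to use the decreasing argument of order(). A sketch reusing the same example data:

```r
gender <- c("female", "male", "female", "male", "male", "female")
age <- c(36, 24, 25, 40, 22, 23)
df <- data.frame(gender = gender, age = age)

# group by gender, then oldest first within each group
o_desc <- order(df$gender, -df$age)
print(df[o_desc, ])

# ignore gender and sort everyone from oldest to youngest
o_age_desc <- order(df$age, decreasing = TRUE)
print(df[o_age_desc, ])
```

Negating the key only works for numeric columns; the decreasing argument reverses the direction for all keys at once.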

Missing Data

In R, missing values are represented as NA (Not Available).
NaN (Not a Number) is usually the product of some arithmetic operation and represents undefined values (e.g., 0/0 or the square root of a negative number).
We can check for these using is.na(), is.nan()

Example

y = -1:3  # fills elements 1--5
y[7] = 7  # element 6 is missing
y

[1] -1  0  1  2  3 NA  7

is.na(y)

[1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE

(sy = sqrt(y))   # take the square roots

[1]      NaN 0.000000 1.000000 1.414214 1.732051       NA 2.645751

is.nan(sy)

[1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
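
A point worth noting is that comparing against NA with == does not test for missingness; the result is itself NA. The sketch below illustrates this and shows how is.na() and is.nan() treat NA and NaN differently. The vector is typed in directly to match the example above.

```r
y <- c(-1, 0, 1, 2, 3, NA, 7)

# every comparison with NA is NA, so this cannot locate missing values
cmp <- y == NA
print(cmp)

# is.na() is the right tool; sum() then counts the missing values
n_missing <- sum(is.na(y))
print(n_missing)

# is.na() is TRUE for NaN as well, but is.nan() is FALSE for NA
print(is.na(NaN))
print(is.nan(NA))
```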

Infinite values

Arithmetic operations may result in infinity (or negative infinity)
This concept is represented in R using Inf and -Inf
We can check if a number is finite/infinite using is.finite()/is.infinite()
- is.finite(NA)/is.infinite(NA) returns FALSE/FALSE
- is.finite(NaN)/is.infinite(NaN) returns FALSE/FALSE
- is.finite(Inf)/is.infinite(Inf) returns FALSE/TRUE

Example

y

[1] -1  0  1  2  3 NA  7

(ly = log(y))

[1]       NaN      -Inf 0.0000000 0.6931472 1.0986123        NA 1.9459101

is.finite(ly) # is.finite with NA/NaN/Inf all return FALSE

[1] FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE

is.infinite(ly)

[1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
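
For reference, here is a small sketch of arithmetic that produces Inf, -Inf, and NaN directly (the variable names are chosen for illustration):

```r
pos <- 1 / 0        # Inf
neg <- -1 / 0       # -Inf
undefined <- 0 / 0  # NaN: the result is not defined
diff_inf <- Inf - Inf  # also NaN

print(c(pos, neg, undefined, diff_inf))
print(is.infinite(c(pos, neg, undefined)))
```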

Missing Values

An easy way to remove rows from a data set having missing values is:

newdata <- na.omit(mydata)
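
As a concrete illustration, here is a sketch with a small made-up data frame; the contents of `mydata` are hypothetical and exist only to show which rows na.omit() drops.

```r
# hypothetical data with missing values in two different columns
mydata <- data.frame(
  id = 1:4,
  score = c(85, NA, 78, 90),
  grade = c("A", "B", NA, "A")
)

# any row containing at least one NA is removed
newdata <- na.omit(mydata)

print(newdata)
print(nrow(newdata))
```

Rows 2 and 3 each contain an NA, so only rows 1 and 4 remain.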

Missing values propagate through many calculations, so we may need to remove them before computing:

mean(y)

[1] NA

mean(na.omit(y))

[1] 2

💡 Some functions have this feature built in as an argument option:

mean(y, na.rm = TRUE)

[1] 2

NAs

It may happen that we would like to replace values that meet a certain condition with NA

# replace scores outside of allowable range with NA
student_scores <- c(85, 92, -54, 78, 90, 101, 67, 75, 88)
student_scores[student_scores>100 | student_scores<0] = NA
student_scores

[1] 85 92 NA 78 90 NA 67 75 88

On the flip side, we could easily replace NAs by some value

# replace NAs with 0s
student_scores[is.na(student_scores)] = 0

N.B. this common mistake:

> (70 < student_scores < 90)
Error: unexpected '<'

Fix:

(70 < student_scores & 
   student_scores < 90)

[1]  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE

Loops

Looping (AKA cycling or iterating) provides a way of replicating a set of statements multiple times until some condition is satisfied.
Each execution of the loop body is called an iteration.
A for loop repeats statements once for each element of a vector or list.
A while loop repeats statements while a condition is true.
A repeat loop repeats continuously until you explicitly exit it using the break statement.

`for` loop example

General Syntax

for (item in my_vector) {
  # Code to process each item in my_vector
}

Simple example:

# output not shown (to save space)
for (i in 1:5) {
  print(i) 
}

for loop combined with if statement

for (i in 1:5) {
  if (i%%2 == 0) { 
    # if i is even
    # paste() already inserts a space between its arguments
    print(paste(i, "is even"))
  }
}

[1] "2 is even"
[1] "4 is even"

Loops with lists

my_list <- list("apple", "banana", "cherry")
for (fruit in my_list) {
  print(fruit)
}

[1] "apple"
[1] "banana"
[1] "cherry"
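
Loops are often used to accumulate a result across iterations. The following sketch uses a hypothetical vector `scores` and shows two styles: looping over the values themselves, and looping over their positions with seq_along().

```r
scores <- c(85, 92, 78)  # hypothetical values

# loop over the values and keep a running total
total <- 0
for (s in scores) {
  total <- total + s
}
print(total)

# loop over the positions when the index itself is needed
for (i in seq_along(scores)) {
  print(paste("score", i, "is", scores[i]))
}
```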

`while` loop syntax/example

General syntax
while (condition) {
  # Code to be executed 
  # as long as the
  # condition is true
}

Example

count <- 1

while (count <= 5) {
  print(count)
  count <- count + 1
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

⚠️ It’s important to be cautious with while loops to avoid infinite loops. Make sure that the condition eventually becomes false, or include logic within the loop to break out of it when needed.
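
A while loop is most natural when the number of iterations is not known in advance. In this sketch (with made-up starting values), a number is halved until it is no longer greater than 1, and a counter records how many steps that took.

```r
value <- 100  # hypothetical starting value
steps <- 0

# the condition eventually becomes FALSE because value shrinks each time
while (value > 1) {
  value <- value / 2
  steps <- steps + 1
}

print(steps)
print(value)
```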

Infinite Loops

Infinite loops are caused by an incorrect loop condition or not updating values within the loop so that the loop condition will eventually be false.

n = 1
while (n <= 5){
  print(n)
}

Here we forgot to increase $n$. Since $n$ stays equal to 1, the condition is always true and we get an infinite loop (i.e., the code will print 1 forever).
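
One defensive pattern, sketched here under the assumption that a cap of 10 iterations is acceptable, is to count iterations and break once a limit is exceeded. The names `iterations` and `max_iterations` are hypothetical.

```r
n <- 1
iterations <- 0
max_iterations <- 10  # hypothetical safety limit

while (n <= 5) {
  iterations <- iterations + 1
  if (iterations > max_iterations) {
    message("Stopping: iteration limit reached")
    break
  }
  # the update of n is deliberately missing, as in the loop above
}

print(iterations)
```

The guard does not fix the underlying bug, but it turns a program that never ends into one that stops with a message.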

`repeat` loops

repeat loops are typically used when your iterative task doesn’t have a predetermined stopping point.

General Syntax

repeat {
  # Code to be executed 
  # in each iteration
  
  if (condition) {
    break  # Exit the loop when the condition is met
  }
}

Example:

count <- 1
repeat {
  # Code to execute 
  # in each iteration
  if (count > 5) {
    # Exit the loop when 
    # count exceeds 5
    break
  }
  count <- count + 1
}

Comment

Notice that the condition we place for breaking a repeat loop will be the opposite condition that we had for our while loop

Remember: if the while condition is true, we continue to the next iteration
Remember: if the break condition is false, we continue to the next iteration
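
To make the comparison concrete, the sketch below runs a while loop and a repeat loop with opposite conditions and records the values each one visits. The vectors `seen_while` and `seen_repeat` are introduced only for this illustration.

```r
# while: continue as long as count <= 5
count_while <- 1
seen_while <- c()
while (count_while <= 5) {
  seen_while <- c(seen_while, count_while)
  count_while <- count_while + 1
}

# repeat: break as soon as count > 5 (the opposite condition)
count_repeat <- 1
seen_repeat <- c()
repeat {
  if (count_repeat > 5) {
    break
  }
  seen_repeat <- c(seen_repeat, count_repeat)
  count_repeat <- count_repeat + 1
}

print(seen_while)
print(seen_repeat)
```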

`next`

Another reserved word is next.
Like break, next does not return a value; it merely transfers control within the loop.
A next statement is useful when we want to skip the current iteration of a loop without terminating it.
On encountering next, R proceeds to the next iteration of the loop (without executing any remaining statements in the current iteration).

Example

x <- c("apple", "ball","cat","dog","elephant","fish")
for (i in seq_along(x)){
  print(i)
  if (i%%2==0)
    next
  print(x[i])
} # x[i] won't be printed for even indices

[1] 1
[1] "apple"
[1] 2
[1] 3
[1] "cat"
[1] 4
[1] 5
[1] "elephant"
[1] 6
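
For contrast, break stops the loop entirely instead of skipping one iteration. This sketch searches the same vector for "cat" and stops at the first match; the variable `found_at` is hypothetical.

```r
x <- c("apple", "ball", "cat", "dog", "elephant", "fish")
found_at <- NA

for (i in seq_along(x)) {
  if (x[i] == "cat") {
    found_at <- i
    break  # leave the loop; later elements are never examined
  }
}

print(found_at)
```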
