DATA 101: Making Predictions with Data
University of British Columbia Okanagan
In today’s lecture, we will go a bit deeper into programming by learning about operators, conditional statements, subsetting and ordering data, special values, and loops.
These concepts are fundamental for data manipulation and analysis in R.
R has several operators to perform tasks. We have already seen two types:
assignment operators (<-, = and ->)
arithmetic operators (+, -, *, /, ^, %%)
Other types of operators include comparison operators and logical operators.
Comparison operators are used to determine whether a specific relationship exists between two values or expressions (e.g., equality, inequality, greater than).
Comparison operators return a logical value, which is either TRUE (if the comparison is true) or FALSE (if the comparison is false).
🤓 You can think of comparison statements as questions.
Q: Is 3 < 4? (R input: 3 < 4)
A: Yes! (R output: TRUE)
Here is a list of some handy comparison operators:
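A quick sketch of R's six standard comparison operators, applied to two arbitrary numbers (the names a and b are chosen only for illustration):

```r
a <- 3
b <- 4

a == b  # equal to:                 FALSE
a != b  # not equal to:             TRUE
a < b   # less than:                TRUE
a > b   # greater than:             FALSE
a <= b  # less than or equal to:    TRUE
a >= b  # greater than or equal to: FALSE
```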
Logical operators are used to manipulate and combine logical values (i.e., TRUE and FALSE).
Logical operators are typically used to create more complex conditions by combining the results of simpler conditions.
In programming, there are three common logical operators: AND (&), OR (|), and NOT (!).
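As a sketch of building a more complex condition from simpler ones, here is a hypothetical value x tested with each of the three operators:

```r
x <- 7  # hypothetical value

in_range <- x > 5 & x < 10   # AND: both conditions must be TRUE  -> TRUE
outside  <- x < 0 | x > 100  # OR: at least one must be TRUE      -> FALSE
not_big  <- !(x > 5)         # NOT: flips TRUE to FALSE           -> FALSE

print(c(in_range, outside, not_big))
```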
⚠️ Warning: “longform” versions of & and | exist (&& and ||) to provide flexibility in different use cases:
& and | perform element-wise logical operations when applied to vectors, matrices, or arrays (they may return a logical vector with length > 1).
&& and || are designed for scalar operations (they return a single logical value).
Scalar usage of AND:
[1] TRUE
[1] FALSE
[1] FALSE
Using & with vectors (element-wise evaluation):
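A minimal sketch of element-wise AND on two made-up logical vectors of the same length; each position is combined separately, so the result also has length 3:

```r
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)

and_result <- a & b  # compares position by position
print(and_result)    # TRUE FALSE FALSE
```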
Scalar usage of OR:
[1] TRUE
[1] TRUE
[1] FALSE
Using | with vectors (element-wise evaluation):
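The same made-up vectors combined with element-wise OR, as a sketch; a position is TRUE whenever at least one of the two inputs is TRUE there:

```r
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)

or_result <- a | b  # compares position by position
print(or_result)    # TRUE TRUE FALSE
```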
⚠️ Warning: If you’re using R 4.3.0 or higher, calling && or || with an LHS or RHS of length greater than one will produce an error (see R 4.3.0 NEWS).
In other words, the following will produce errors:
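A sketch of such an error, wrapped in tryCatch() so the script keeps running. The exact message text may differ between R versions, and R versions before 4.3.0 may give a warning (or silently use only the first elements) instead of an error:

```r
x <- c(TRUE, FALSE)
y <- c(TRUE, TRUE)

# In R >= 4.3.0, x && y is an error because x has length 2
caught <- tryCatch(x && y, error = function(e) e)

if (inherits(caught, "error")) {
  cat("Error:", conditionMessage(caught), "\n")
}
```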
So && should only be used with scalar logical values:
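A sketch with single logical values (the variables age, has_id and z are hypothetical). It also shows a practical reason to prefer && for scalars: it evaluates its right-hand side only when needed ("short-circuiting"), so the second check below is never run:

```r
age <- 25        # hypothetical values for illustration
has_id <- TRUE

# Both operands are single logical values, so && is appropriate
can_enter <- age >= 19 && has_id
print(can_enter)  # TRUE

# !is.null(z) is FALSE, so && stops there and z > 0 is never evaluated
z <- NULL
safe_check <- !is.null(z) && z > 0
print(safe_check)  # FALSE
```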
Conditional statements allow us to make decisions in R
Conditional statements allow the program to execute different code blocks or take different actions based on specific conditions.
Common conditional statements: if, else if, else
Just like a flow chart, conditional statements supply a sequence of steps, actions, or decisions in a process or system.
> if score is greater than or equal to 80:
Error: unexpected symbol in "if score"
That is because R expects a very specific syntax.
Syntax refers to the specific rules and conventions that dictate how code is written and structured (e.g., case sensitivity, comments, assignment operators).
if statements
An if statement allows you to execute different code blocks based on whether a specified condition is true or false. The basic syntax of an if statement in R is as follows:
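A minimal sketch of this syntax, using a hypothetical score variable and the same threshold of 80 as the flow-chart example above:

```r
score <- 85  # hypothetical value
feedback <- "Keep practising"

# General form: if (condition) { code to run when condition is TRUE }
if (score >= 80) {
  feedback <- "Great job!"
}

print(feedback)  # "Great job!"
```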
if: This is the keyword that initiates the if statement.
condition: This is a logical expression that evaluates to either TRUE or FALSE. If the condition is TRUE, the code inside the curly braces {} will be executed; otherwise, it will be skipped.
{}: Curly braces enclose the code block that should be executed when the condition is TRUE. If you have only one statement to execute, the curly braces are optional, but it’s good practice to include them for readability.
In R, a keyword refers to a reserved word that has a predefined meaning and cannot be used as a variable or function name.
Here are some common keywords in R:
if: Used to start an if statement for conditional branching.
else: Used in conjunction with if to provide an alternative code block to execute when the condition is false.
else if: Used after an if statement to specify additional conditions to check when the preceding conditions are false.
for: Used to create a loop that iterates over a sequence of values.
while: Used to create a loop that continues as long as a specified condition is true.
repeat: Used to create an indefinite loop that continues until explicitly stopped with a break statement.
function: Used to define a user-defined function in R.
return: Used within a function to specify the value to return from that function (strictly speaking, return is a function rather than a reserved word).
break: Used to exit a loop prematurely.
next: Used in a loop to skip the current iteration and move to the next iteration.
NULL: Represents the absence of a value (an empty object).
NA: Stands for “Not Available” and is used to represent missing or undefined values in R.
TRUE and FALSE: Represent the logical values for true and false, respectively.
Inf: Represents positive infinity.
NaN: Stands for “Not-a-Number” and represents undefined or unrepresentable numerical values.
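Because reserved words have a fixed meaning, trying to assign to one is a syntax error. A small sketch, parsing the offending code from a string so the error can be caught (the full list of reserved words is documented on R's help page ?Reserved):

```r
# parse() turns text into R code; "if <- 5" cannot be parsed because if is reserved
attempt <- tryCatch(
  parse(text = "if <- 5"),
  error = function(e) "syntax error"
)
print(attempt)  # "syntax error"
```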
if ... else statements
The if...else statement is used when you have a single condition and want to execute one block of code if the condition is true and another block if it’s false.
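A sketch with a hypothetical score and pass mark. Note that, at the top level of a script, else must appear on the same line as the closing brace of the if block; otherwise R treats the if statement as already complete:

```r
score <- 72  # hypothetical value

if (score >= 50) {
  result <- "pass"
} else {
  result <- "fail"
}

print(result)  # "pass"
```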
💡 Tip: you can write an if statement in R on a single line if the code block associated with it contains only one statement.
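A sketch of the one-line form, using a hypothetical temperature. Because if ... else returns the value of whichever branch runs, the second line can assign the result directly:

```r
temperature <- -3  # hypothetical value

if (temperature < 0) print("Below freezing")

status <- if (temperature < 0) "freezing" else "above freezing"
print(status)  # "freezing"
```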
else if statements
else if is used when you have multiple conditions to check and want to evaluate them in sequence until one of them is true. When the first true condition is found, the associated block of code is executed, and the remaining conditions are not evaluated.
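A sketch of a chain of conditions; the score and the grade cut-offs are made up for illustration. With a score of 76, the first check fails, the second succeeds, and the remaining checks are skipped:

```r
score <- 76  # hypothetical value; the cut-offs below are also made up

if (score >= 80) {
  grade <- "A"
} else if (score >= 68) {
  grade <- "B"
} else if (score >= 55) {
  grade <- "C"
} else {
  grade <- "F"
}

print(grade)  # "B"
```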
Elements of a vector can be selected with square brackets (e.g., x[1], x[c(4,2)]).
[1] TRUE FALSE FALSE TRUE FALSE FALSE TRUE
[1] "female" "female" "female"
[1] 5 9 8 3 10 2 1 1 1 1 5 8
[1] 9 8 10 8
[1] 5 3 1 1 1 1 5
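A short sketch of positional and logical indexing, using a made-up numeric vector x:

```r
x <- c(12, 7, 3, 9, 15)  # hypothetical data

x[1]          # first element: 12
x[c(4, 2)]    # fourth, then second element: 9 7
x > 8         # logical vector: TRUE FALSE FALSE TRUE TRUE
x[x > 8]      # elements where the condition is TRUE: 12 9 15
```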
subset()
We will use the built-in iris data set (run ?iris to see its help page).
This famous (Fisher’s or Anderson’s) iris data set gives the measurements (in cm) of the variables sepal length and width and petal length and width for 50 flowers from each of 3 species of iris: setosa, versicolor, and virginica.
Extract the rows which correspond to the setosa species:
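One way to do this is with subset(), which keeps the rows where a logical condition on the columns is TRUE (a sketch; the name setosa is simply our choice). The two counts are the total number of rows in iris and the number of setosa rows:

```r
# keep only the rows whose Species is "setosa"
setosa <- subset(iris, Species == "setosa")

nrow(iris)    # total number of flowers
nrow(setosa)  # number of setosa flowers
```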
[1] 150
[1] 50
Equivalently
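A sketch of the same selection written with square brackets, where the part before the comma picks rows and the blank after it keeps all columns:

```r
setosa  <- subset(iris, Species == "setosa")
setosa2 <- iris[iris$Species == "setosa", ]

all.equal(setosa, setosa2)  # TRUE: same rows, same columns
```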
Extract the setosa flowers with long (>5 cm) sepal length
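A sketch combining the two conditions with & (the name long_setosa is hypothetical), in both the subset() and square-bracket styles:

```r
# both conditions must hold for a row to be kept
long_setosa <- subset(iris, Species == "setosa" & Sepal.Length > 5)

# equivalent square-bracket version
long_setosa2 <- iris[iris$Species == "setosa" & iris$Sepal.Length > 5, ]

head(long_setosa)
```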
The transform() function also provides a quick and easy way to transform the data.
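A sketch of transform() adding a derived column. The new column name Sepal.Ratio is hypothetical, and iris itself is left unchanged because transform() returns a modified copy:

```r
# add a new column computed from existing ones
iris2 <- transform(iris, Sepal.Ratio = Sepal.Length / Sepal.Width)

head(iris2[, c("Sepal.Length", "Sepal.Width", "Sepal.Ratio")], 3)
```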
split()
split() divides data into groups according to a grouping variable and returns a list (of vectors, or of data frames when applied to a data frame).
Note that iSpecies$setosa creates the same subset as setosa defined earlier. We can verify this using:
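A sketch of that check, assuming iSpecies was created by splitting iris on its Species column:

```r
# one data frame per species, named after the factor levels
iSpecies <- split(iris, iris$Species)
names(iSpecies)  # "setosa" "versicolor" "virginica"

setosa <- subset(iris, Species == "setosa")
all.equal(iSpecies$setosa, setosa)  # TRUE
```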
order()
order(x) returns the indices that would put x in increasing order, and indexing x with them, x[order(x)], gives the sorted vector sortx.
[1] 1 4 7 2 5 6 3
[1] "female" "female" "female" "male" "male"
[6] "male" "non-binary"
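A sketch with a small made-up vector, contrasting the positions order() returns with the sorted values they produce. order() also accepts several vectors, in which case later ones break ties in earlier ones; the groups and ages below are hypothetical:

```r
x <- c(30, 10, 20)

order(x)     # positions in sorted order: 2 3 1
x[order(x)]  # sorted values: 10 20 30 (same as sort(x))

# Two sorting keys: group by the first vector, then order by the second within groups
group <- c("b", "a", "b", "a")
age   <- c(40, 35, 22, 19)
order(group, age)  # 4 2 3 1
```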
Example: rearrange the rows of iris so that Petal.Length is sorted from smallest to largest:
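A sketch of one way to do this: use order() on the column to get row positions, then use those positions as the row index (the name iris_sorted is our choice):

```r
# rows rearranged so Petal.Length increases from top to bottom
iris_sorted <- iris[order(iris$Petal.Length), ]

head(iris_sorted$Petal.Length)

# for largest to smallest, order() also takes decreasing = TRUE
iris_desc <- iris[order(iris$Petal.Length, decreasing = TRUE), ]
```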
order() can also take multiple sorting arguments: order(gender, age) in the example on the following slide gives a main division by gender, and within each group, the observations are ordered by age.
Arithmetic operations may result in infinity (or negative infinity).
This concept is represented in R using Inf and -Inf.
We can check if a number is finite/infinite using is.finite()/is.infinite().
is.finite(NA)/is.infinite(NA) returns FALSE/FALSE
is.finite(NaN)/is.infinite(NaN) returns FALSE/FALSE
is.finite(Inf)/is.infinite(Inf) returns FALSE/TRUE
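A sketch producing these special values from arithmetic and checking them all at once (the vector vals is made up for illustration; both functions work element-wise):

```r
vals <- c(1/0, -1/0, 0/0, NA, 42)

vals               # Inf -Inf NaN NA 42
is.finite(vals)    # FALSE FALSE FALSE FALSE TRUE
is.infinite(vals)  # TRUE  TRUE  FALSE FALSE FALSE
```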
NA
# replace scores outside of allowable range with NA
student_scores <- c(85, 92, -54, 78, 90, 101, 67, 75, 88)
student_scores[student_scores>100 | student_scores<0] = NA
student_scores
[1] 85 92 NA 78 90 NA 67 75 88
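Going the other way, is.na() can locate the missing entries so they can be replaced by some value. A sketch, assuming (purely for illustration) that the mean of the observed scores is an acceptable replacement:

```r
student_scores <- c(85, 92, NA, 78, 90, NA, 67, 75, 88)

# is.na() returns TRUE at the positions holding NA
is_missing <- is.na(student_scores)

# replace them with the mean of the observed scores (na.rm = TRUE ignores the NAs)
student_scores[is_missing] <- mean(student_scores, na.rm = TRUE)
round(student_scores, 1)
```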
Replacing NAs by some value
Looping (AKA cycling or iterating) provides a way of replicating a set of statements multiple times until some condition is satisfied.
Each single execution of the loop body is called an iteration.
A for loop repeats statements once for each element of a sequence (e.g., a vector or list).
A while loop repeats statements while a condition is true.
A repeat loop repeats continuously until you explicitly stop it using the break statement.
for loop example
General syntax:
Simple example:
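A minimal sketch: the general form is shown as a comment, followed by a loop that stores the square of each number from 1 to 5 (the name squares is just for illustration):

```r
# General form:
# for (variable in sequence) {
#   statements that use variable
# }

squares <- numeric(5)  # pre-allocate a vector to fill in
for (i in 1:5) {
  squares[i] <- i^2    # one iteration per element of 1:5
}
print(squares)         # 1 4 9 16 25
```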
Loops with lists
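A sketch with a hypothetical list. A for loop can visit the list's elements directly, or loop over positions with seq_along(), which is handy when storing one result per element:

```r
my_list <- list(a = 1:3, b = "hello", c = c(TRUE, FALSE))  # hypothetical list

# iterate over the elements themselves
for (item in my_list) {
  print(class(item))
}

# iterate over positions and store a result for each element
item_lengths <- integer(length(my_list))
for (i in seq_along(my_list)) {
  item_lengths[i] <- length(my_list[[i]])
}
print(item_lengths)  # 3 1 2
```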
while loop syntax/example
⚠️ It’s important to be cautious with while loops to avoid infinite loops. Make sure that the condition eventually becomes false, or include logic within the loop to break out of it when needed.
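A minimal sketch of a while loop that prints 1 to 5; the comment marks the update that keeps it from running forever:

```r
n <- 1
while (n <= 5) {
  print(n)
  n <- n + 1  # without this update, n stays 1 and the loop never ends
}
# after the loop, n is 6: the first value for which n <= 5 is FALSE
```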
Infinite loops are caused by an incorrect loop condition, or by failing to update values within the loop so that the loop condition eventually becomes false.
Here we forgot to increase \(n\). Hence we get an infinite loop (i.e., the code will print \(1, 2, 3, \ldots, \infty\)).
repeat loops
repeat loops are typically used when your iterative task doesn’t have a predetermined stopping point.
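A sketch of a repeat loop that keeps doubling a number until it first exceeds 100; the break inside the loop is the only way out:

```r
value <- 1
repeat {
  value <- value * 2
  if (value > 100) break  # stop as soon as the value exceeds 100
}
print(value)  # 128
```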
next
Like break, next does not return a value; it merely transfers control within the loop.
The next statement is useful when we want to skip the current iteration of a loop without terminating it.
x <- c("apple", "ball", "cat", "dog", "elephant", "fish")
for (i in seq_along(x)) {
  print(i)
  if (i %% 2 == 0)
    next  # skip the rest of the loop body for even indices
  print(x[i])  # not reached for even indices
}
[1] 1
[1] "apple"
[1] 2
[1] 3
[1] "cat"
[1] 4
[1] 5
[1] "elephant"
[1] 6
Comment
Notice that the condition we place for breaking a repeat loop will be the opposite of the condition we had for our while loop:
while: if the condition is true, we continue to the next iteration
break: if the condition is false, we continue to the next iteration
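A side-by-side sketch: both loops below collect the numbers 1 to 3. The while loop continues while n <= 3 is TRUE, and the repeat loop breaks when the opposite condition, n > 3, becomes TRUE:

```r
# while: keep going while the condition is TRUE
n <- 1
while_out <- numeric(0)
while (n <= 3) {
  while_out <- c(while_out, n)
  n <- n + 1
}

# repeat: stop (break) when the opposite condition becomes TRUE
n <- 1
repeat_out <- numeric(0)
repeat {
  if (n > 3) break
  repeat_out <- c(repeat_out, n)
  n <- n + 1
}

print(while_out)   # 1 2 3
print(repeat_out)  # 1 2 3
```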