DATA 101: Making Predictions with Data
University of British Columbia Okanagan
Strip charts are drawn with stripchart(). For example, let’s generate 100 observations from 1 to 10 and plot them in a strip chart.
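A minimal sketch of this example. It assumes the observations are integers drawn with sample(); the seed and the variable name x are illustrative choices rather than part of the slides.

```r
# Assumption: the 100 observations are integers sampled uniformly from 1 to 10.
set.seed(101)                                   # illustrative seed, for reproducibility
x <- sample(1:10, size = 100, replace = TRUE)   # 100 observations from 1 to 10

# Default strip chart: each observation is drawn as a point along one axis.
stripchart(x, xlab = "x")
```

Because only ten distinct values are possible, many of the 100 points land on exactly the same position.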
Problem: what happens if we have two points with the exact same value?
This is controlled in stripchart() using the method argument. The default is "overplot", which simply plots points on top of each other; see ?stripchart. Other methods include "jitter" and "stack".
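The three methods can be compared side by side in a short sketch. The simulated data, the seed, and the vector name methods are assumptions made for illustration.

```r
# Assumption: illustrative integer data with many repeated values.
set.seed(101)
x <- sample(1:10, size = 100, replace = TRUE)

# Draw one strip chart per method so the treatment of ties can be compared.
methods <- c("overplot", "jitter", "stack")
for (m in methods) {
  stripchart(x, method = m, pch = 1,
             main = paste0('method = "', m, '"'))
}
```

With "overplot" tied values hide one another, "jitter" adds a small random offset, and "stack" piles repeated values on top of each other.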
plot() in R.
Histograms are created with the hist() function, which tries to calculate reasonable bins automatically; however, we can set them manually with the breaks argument.
A density estimate can be plotted with plot(density(x)).
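As a rough illustration, the sketch below draws a histogram with automatic bins, one with manual breaks, and a density plot. Using the mpg column of the built-in mtcars data and bins of width 5 is an assumption for the example only.

```r
# Assumption: illustrate with the mpg column of the built-in mtcars data.
x <- mtcars$mpg

# Bins chosen automatically by hist().
hist(x, main = "Automatic bins")

# Bins set manually: the breaks must cover the full range of x (10.4 to 33.9).
h <- hist(x, breaks = seq(10, 35, by = 5), main = "Bins of width 5")

# Kernel density estimate of the same data.
plot(density(x), main = "Density estimate")
```

hist() invisibly returns the bin information, so h$breaks and h$counts can be inspected after plotting.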
With col=2 the boxplot turned red.
col values index the colours in R’s palette(): 1 = "black", 2 = "red", and so on.
The palette can be redefined so that, for example, 1 = "yellow" and 2 = "green".
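One way to try this out is sketched below. The names old_pal and new_pal are illustrative, and the boxplot of mtcars$mpg is only an assumed example plot.

```r
# Remember the current palette so it can be restored afterwards.
old_pal <- palette()

# Redefine the palette: colour index 1 is now yellow, index 2 is now green.
palette(c("yellow", "green"))
new_pal <- palette()

# col = 2 now picks the second entry of the redefined palette.
boxplot(mtcars$mpg, col = 2)

# Restore the palette that was in use before.
palette(old_pal)
```

Restoring the palette at the end keeps later plots from silently inheriting the changed colours.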
Just as lines() superimposed a line over our histogram, we can superimpose text in the form of titles and labels (among other things) on a plot that has already been graphed, using the title() command.
We used the lines() function to superimpose the density curve on top of the histogram.
This requires probability = TRUE (the same as prob = TRUE or freq = FALSE).
With prob = TRUE / freq = FALSE, the \(y\)-axis shows densities (proportions scaled so that the bars have total area 1) rather than frequencies.
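A small sketch of the overlay, assuming the mpg column of mtcars as example data; the colours, line width, and title text are illustrative.

```r
# Assumption: illustrate with the mpg column of the built-in mtcars data.
x <- mtcars$mpg

# Histogram on the density scale, with the main title left blank for now.
h <- hist(x, probability = TRUE, main = "", xlab = "Miles/(US) gallon")

# Superimpose the kernel density estimate on the existing histogram.
lines(density(x), col = 2, lwd = 2)

# Add a title to the plot that has already been drawn.
title(main = "Histogram of mpg with density curve")
```

On the density scale the bar areas sum to 1, which is what puts the histogram and the density curve on comparable vertical scales.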
Boxplots are drawn with the boxplot() command. The central part of the boxplot is the “box” itself.
It represents the interquartile range (IQR), which spans the middle 50% of the data.
The bottom and top edges of the box correspond to the first quartile (Q1) and the third quartile (Q3), respectively.
The height of the box is determined by the range between Q1 and Q3. The box typically contains a horizontal line inside it, representing the median (Q2) of the dataset.
Quantiles are a way to divide a dataset into equal portions. For boxplots we need:
Median (Q2 or the 50th Percentile): the middle data point when the dataset is ordered from smallest to largest. It divides the data into two equal halves, with 50% of the data falling below it and 50% above it.
First Quartile (Q1 or the 25th Percentile): Q1 divides the lowest 25% of the data from the rest. It is the data point at the 25th percentile, meaning that 25% of the data falls below it.
Third Quartile (Q3 or the 75th Percentile): Q3 divides the lowest 75% of the data from the rest. It is the data point at the 75th percentile, meaning that 75% of the data falls below it.
\[IQR = Q3 - Q1\]
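The quartiles and the IQR can be computed directly, as in the sketch below. The toy vector x is an assumption chosen so the arithmetic is easy to check by hand; quantile() uses R's default quantile definition, and other definitions can give slightly different values.

```r
# Assumption: a small toy dataset, already sorted, for easy checking by hand.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Q1, median (Q2), and Q3 using R's default quantile definition.
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
q   # 25% = 3, 50% = 5, 75% = 7

# IQR = Q3 - Q1, computed by hand and with the built-in IQR() function.
iqr <- q[["75%"]] - q[["25%"]]
iqr      # 4
IQR(x)   # 4
```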
mtcars
For demonstration purposes, let’s have a look at the mtcars dataset; see ?mtcars.
We’ll focus on:
mpg: Miles/(US) gallon
cyl: Number of cylinders
We will compare the mpg variable for cars with 4, 6, and 8 cylinders, respectively. The groups can be extracted using subset() or split().
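A possible way to build the grouped boxplot is sketched here. The object names mpg_by_cyl and mpg_4cyl are illustrative, and the axis labels are assumptions.

```r
# split() returns a list with one vector of mpg values per number of cylinders.
mpg_by_cyl <- split(mtcars$mpg, mtcars$cyl)

# boxplot() accepts such a list and draws one box per group.
boxplot(mpg_by_cyl,
        xlab = "Number of cylinders",
        ylab = "Miles/(US) gallon")

# subset() extracts a single group; here the 4-cylinder cars.
mpg_4cyl <- subset(mtcars, cyl == 4)$mpg
```

split() is convenient when every group is needed at once, while subset() is handy for pulling out one group at a time.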
location can be specific co-ordinates or a keyword: “bottom”, “bottomleft”, “left”, “topleft”, “top”, “topright”, “right”, “bottomright”, or “center”.
legend is a vector of characters to appear in the legend.
... provide the characteristics, e.g. col (colour), lty (line type), pch (plotting character), distinguishing the members in your legend.
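To make the three pieces concrete, here is a hedged sketch that colours points by transmission type (the am column of mtcars, 0 = automatic, 1 = manual). The choice of variables, colours, and the names cols and lg are assumptions for illustration.

```r
# Assumption: colour points by transmission type (am: 0 = automatic, 1 = manual).
cols <- c("black", "red")

plot(mtcars$wt, mtcars$mpg,
     col = cols[mtcars$am + 1],   # am is 0/1, so add 1 to index into cols
     pch = 19,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles/(US) gallon")

# location keyword, legend text, then the characteristics that tell groups apart.
lg <- legend("topright",
             legend = c("automatic", "manual"),
             col = cols,
             pch = 19)
```

The col and pch values passed to legend() should match those used in plot(), otherwise the legend will not describe what is actually drawn.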
The default width and height for R plots are set to 7 inches; we can change these defaults using the fig.width and fig.height code chunk options.
Graphical parameters are set using the par() function.
par() sets the global state for any graphics-related commands (the settings stay in effect until you call dev.off() or change them back to their defaults); see ?par.
To see the default settings (assuming you have not redefined them already), call par() with no arguments.
It is sometimes useful to change the defaults to remove excessive white space.
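A sketch of trimming the margins and then restoring the previous settings. The particular margin values and the names op, tight_mar, and restored_mar are illustrative assumptions.

```r
# par() returns the previous values of the parameters it changes,
# so they can be saved and restored later.
op <- par(mar = c(4, 4, 1, 1))   # smaller margins: bottom, left, top, right
tight_mar <- par("mar")

plot(mtcars$disp, mtcars$mpg,
     xlab = "Displacement (cu.in.)",
     ylab = "Miles/(US) gallon")

# Change the settings back to what they were before.
par(op)
restored_mar <- par("mar")
```

Saving the old settings in op avoids having to remember the default values by hand.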
Create a legend that labels points by the number of cylinders cyl. Use red, black, and green for the values of 4, 6, and 8, respectively.
Create a scatterplot for mpg vs. disp. Create a legend that labels points by the number of cylinders cyl. Use circles, triangles, and squares for the values of 4, 6, and 8, respectively.
Another helpful graphical parameter setting (recall the others set with par()) is mfrow, or mfcol.
This expects a vector of the form c(nr, nc).
Subsequent figures will be drawn in an nr-by-nc array on the device by columns (mfcol) or rows (mfrow), respectively.
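For instance, a 1-by-2 layout could be sketched as follows; the two plots and the names op and layout_used are assumptions chosen for illustration.

```r
# Draw the next two figures side by side: 1 row, 2 columns, filled by rows.
op <- par(mfrow = c(1, 2))
layout_used <- par("mfrow")

hist(mtcars$mpg, main = "Histogram", xlab = "Miles/(US) gallon")
boxplot(mtcars$mpg, main = "Boxplot", ylab = "Miles/(US) gallon")

# Return to a single figure per device.
par(op)
```

With only one row or one column, mfrow and mfcol give the same result; the difference shows up in layouts such as c(2, 2), where the fill order matters.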
You can save a graph in R using a variety of file output functions, such as pdf(), png(), jpeg(), svg(), and tiff().
Once you have decided what to plot, you can use one of these file output functions to save the plot to a file in the format you want. Here’s an example using the pdf() function to save a plot to a PDF file:
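A minimal sketch of such an example, assuming the plot is a boxplot of mpg by cyl from mtcars; the file name boxplot.pdf matches the comments that follow, while the axis labels are illustrative.

```r
# Open a PDF device; plots drawn from now on go to this file.
pdf("boxplot.pdf")

# Assumption: the plot being saved is a boxplot of mpg by number of cylinders.
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Number of cylinders",
        ylab = "Miles/(US) gallon")

# Close the device so the file is written to disk.
dev.off()
```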
Comments
Before we make our plot we first open a PDF file using pdf("boxplot.pdf"), and any subsequent plots will be directed to this PDF file.
After completing the plot, we close the PDF device using dev.off().
You can replace "boxplot.pdf" with the file path and name you want to use; otherwise it will get saved to your working directory.
The same approach applies to other file formats; you can use png(), jpeg(), svg(), or tiff() instead of pdf() to save plots in those formats.