Up to this point, most of our examples have used Altair, with the understanding that ggplot2 offers a similar counterpart.
Altair is relatively new, whereas ggplot2 is one of the most widely used and best documented packages in R, so ggplot2 has some functionality that Altair has yet to implement.
One such example is violin plots.
Learning Outcomes
Create density plots, box plots, and violin plots using ggplot2
Data
Below is the reprocessed movies data frame (to see how it was processed, see the accompanying ipynb).
# the above is the cleaned version
library(rjson)
library(tidyverse)
movies <- fromJSON(file = 'data/lec-movies.json') %>%
  as_tibble() %>%
  unnest(-c(countries, genres))
head(movies)
ggplot(free_both) +
  aes(x = runtime, y = genres, fill = genres) +
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75)) +
  facet_wrap(~countries)
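The learning outcomes also mention density and box plots. As a minimal sketch, the same kind of mapping can be drawn with each of the three geoms. This assumes the `mpg` data set bundled with ggplot2 as a stand-in for the movies data, with `hwy` playing the role of `runtime` and `class` the role of `genres`; the object names are only for illustration.

```r
library(ggplot2)

# mpg ships with ggplot2; it stands in for the movies data (an assumption)

# Box plot: one box per category, summarising median and quartiles
box_plot <- ggplot(mpg, aes(x = hwy, y = class, fill = class)) +
  geom_boxplot()

# Density plot: one smoothed curve per category, overlaid on a shared axis
density_plot <- ggplot(mpg, aes(x = hwy, fill = class)) +
  geom_density(alpha = 0.4)

# Violin plot: a mirrored density per category, laid out like the box plot
violin_plot <- ggplot(mpg, aes(x = hwy, y = class, fill = class)) +
  geom_violin()
```

Printing any of these objects (for example `violin_plot`) renders the plot.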
Comments
When possible, it is a good idea to have a look at where the individual data points are.
Of course, we could always layer individual points on top of our plot (using geom_point(), for example).
However, when we have a lot of data, the overlapping points can be impossible to read.
For this, we can use geom_jitter() to draw a categorical scatter plot in which the dots are spread (jittered) randomly along the non-value axis so that they don't all overlap.
Layering Points
We can layer the points onto the violin plots:
ggplot(free_both) +
  aes(x = runtime, y = genres, fill = genres) +
  geom_violin() +
  geom_point() +
  facet_wrap(~countries)
Jittering Data
Jittering adds a small amount of random noise to the location of each point.
ggplot(free_both) +
  aes(x = runtime, y = genres, fill = genres) +
  geom_violin() +
  geom_jitter() +
  facet_wrap(~countries)
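By default, jittering moves points in both directions and changes on every redraw. A minimal sketch of restricting the noise to the categorical axis and making it reproducible, assuming the `mpg` data set bundled with ggplot2 in place of the movies data (the seed value is arbitrary):

```r
library(ggplot2)

# mpg ships with ggplot2; hwy stands in for runtime, class for genres
p <- ggplot(mpg, aes(x = hwy, y = class)) +
  geom_violin() +
  geom_point(
    # width = 0 leaves the measured values (x) untouched;
    # height = 0.2 spreads points along the categorical axis only;
    # seed fixes the random offsets so the plot looks the same each run
    position = position_jitter(width = 0, height = 0.2, seed = 531),
    alpha = 0.3
  )
```

Keeping the noise off the value axis means each point still sits at its true `hwy` value, so the plot remains accurate to read.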
Order matters
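We can change the default jitter height and the order of the layers. Layers are drawn in the order they are added, so here the points end up underneath the violins.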
ggplot(free_both) +
  aes(x = runtime, y = genres, fill = genres) +
  geom_jitter(height = 0.2, alpha = 0.3) +
  geom_violin() +
  facet_wrap(~countries)
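To see the effect of layer order side by side, the following sketch builds both orderings from one shared base plot. It assumes the `mpg` data set bundled with ggplot2 as a stand-in for the movies data; the object names are hypothetical.

```r
library(ggplot2)

# Shared data and aesthetics (mpg is a stand-in for the movies data)
base <- ggplot(mpg, aes(x = hwy, y = class, fill = class))

# Points added first, violins second: the filled violins cover the points
points_under <- base +
  geom_jitter(height = 0.2, width = 0, alpha = 0.3) +
  geom_violin()

# Violins added first, points second: the points are drawn on top
points_over <- base +
  geom_violin() +
  geom_jitter(height = 0.2, width = 0, alpha = 0.3)

# A semi-transparent fill keeps points visible even under the violins
see_through <- base +
  geom_jitter(height = 0.2, width = 0, alpha = 0.3) +
  geom_violin(alpha = 0.5)
```

Printing `points_under` and `points_over` in turn shows the difference; `see_through` is one possible compromise when the violins should stay on top.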
Unfaceting
Rather than faceting, we could fill by countries:
ggplot(free_both) +
  aes(x = runtime, y = genres, fill = countries) +
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75))