Hands-on Exercise 04b - Visualising Uncertainty

Author

Ong Chae Hui

1. Visualizing the uncertainty of point estimates

A point estimate is a single number, such as a mean.
Uncertainty is expressed as standard error, confidence interval, or credible interval
Important:
- Don’t confuse the uncertainty of a point estimate with the variation in the sample

2. Getting Started

2.1. Installing and Loading the required R Packages

In this exercise using Exam_data, we will be using tidyverse, plotly, crosstalk, DT, ggdist and gganimate.

Code

pacman::p_load(tidyverse, plotly, crosstalk, DT, ggdist, gganimate)

2.2. Importing Data (Exam_data)

Code

exam <- read_csv("data/Exam_data.csv")

2.3. Visualizing the uncertainty of point estimates: ggplot2 methods

The code chunk below performs the followings:

group the observation by RACE,
computes the count of observations, mean, standard deviation and standard error of Maths by RACE, and
save the output as a tibble data table called my_sum.

Code

my_sum <- exam %>%
  group_by(RACE) %>%
  summarise(
    n=n(),
    mean=mean(MATHS),
    sd=sd(MATHS)
    ) %>%
  mutate(se=sd/sqrt(n-1))

my_sum

# A tibble: 4 × 5
  RACE        n  mean    sd    se
  <chr>   <int> <dbl> <dbl> <dbl>
1 Chinese   193  76.5  15.7  1.13
2 Indian     12  60.7  23.4  7.04
3 Malay     108  57.4  21.1  2.04
4 Others      9  69.7  10.7  3.79

Next, the code chunk below will

Code

knitr::kable(head(my_sum), format = 'html')

RACE	n	mean	sd	se
Chinese	193	76.50777	15.69040	1.132357
Indian	12	60.66667	23.35237	7.041005
Malay	108	57.44444	21.13478	2.043177
Others	9	69.66667	10.72381	3.791438

2.4. Visualizing the uncertainty of point estimates: ggplot2 methods

The code chunk below is used to reveal the standard error of mean maths score by race.

Code

ggplot(my_sum) +
  geom_errorbar(
    aes(x=RACE, 
        ymin=mean-se, 
        ymax=mean+se), 
    width=0.2, 
    colour="black", 
    alpha=0.9, 
    linewidth=0.5) +
  geom_point(aes
           (x=RACE, 
            y=mean), 
           stat="identity", 
           color="red",
           size = 1.5,
           alpha=1) +
  ggtitle("Standard error of mean maths score by race")

2.5. Visualizing the uncertainty of point estimates: ggplot2 methods

Plotting the 95% confidence interval of mean maths score by race. The error bars are sorted by the average maths scores.

Code

my_sum2 <- exam %>%
  group_by(RACE) %>%
  summarise(
    n=n(),
    mean=mean(MATHS),
    sd=sd(MATHS)
    ) %>%
  mutate(se=sd/sqrt(n-1)) %>%
  mutate(ci95= qt(c(0.05, 0.95), length(n) - 1) * se) %>%
  mutate(ci99= qt(c(0.01, 0.99), length(n) - 1) * se)

my_sum2$RACE = with(my_sum2, reorder(RACE, -mean))

ggplot(my_sum2) +
  geom_errorbar(
    aes(x=RACE, 
        ymin=mean-ci95, 
        ymax=mean+ci95), 
    width=0.2, 
    colour="black", 
    alpha=0.9, 
    linewidth=0.5) +
  geom_point(aes
           (x=RACE, 
            y=mean), 
           stat="identity", 
           color="red",
           size = 1.5,
           alpha=1) +
  ggtitle("95% Confidence Interval of mean maths score by race")

2.6. Visualizing the uncertainty of point estimates with interactive error bars

Interactive error bars for the 99% confidence interval of mean maths score by race.

Code

colnames(my_sum) <- c('Race', 'No. of pupils','Avg Scores','Std Dev','Std Error')
colnames(my_sum2) <- c('Race', 'No. of pupils','Avg Scores','Std Dev','Std Error', '95% CI', '99% CI')

DT::datatable(my_sum, class= "compact")

Code

d <- highlight_key(my_sum)

p <- ggplot(my_sum2) +
      geom_errorbar(
        aes(x=Race, 
            ymin=`Avg Scores`-`99% CI`, 
            ymax=`Avg Scores`+`99% CI`), 
        width=0.2, 
        colour="black", 
        alpha=0.9, 
        linewidth=0.5) +
      geom_point(aes
               (x=Race, 
                y=`Avg Scores`, 
                text=paste("N=",`No. of pupils`,"<br>99% CI=",`99% CI`)), 
               stat="identity", 
               color="red",
               size = 1.5,
               alpha=1) +
      ggtitle("99% Confidence Interval of \n mean maths score by race")


gg <- highlight(ggplotly(p), tooltip="text")
#                "plotly_selected")  

crosstalk::bscols(gg,               
                  DT::datatable(d), 
                  widths = 5)

3. Visualising Uncertainty: ggdist package

ggdist is an R package that provides a flexible set of ggplot2 geoms and stats designed especially for visualising distributions and uncertainty.
It is designed for both frequentist and Bayesian uncertainty visualization, taking the view that uncertainty visualization can be unified through the perspective of distribution visualization:
- for frequentist models, one visualises confidence distributions or bootstrap distributions (see vignette(“freq-uncertainty-vis”));
- for Bayesian models, one visualises probability distributions (see the tidybayes package, which builds on top of ggdist).

3.1. Visualizing the uncertainty of point estimates: ggdist methods

In the code chunk below, stat_pointinterval() of ggdist is used to build a visual for displaying distribution of maths scores by race.

NOTE: This function comes with many arguments, refer to the syntax reference here for more detail.

Code

exam %>%
  ggplot(aes(x = RACE, 
             y = MATHS)) +
  stat_pointinterval() +   
  labs(
    title = "Visualising confidence intervals of mean math score",
    subtitle = "Mean Point + Multiple-interval plot")

Code

exam %>%
  ggplot(aes(x = RACE, y = MATHS)) +
  stat_pointinterval(.width = 0.95,
  .point = median,
  .interval = qi) +
  labs(
    title = "Visualising confidence intervals of mean math score",
    subtitle = "Mean Point + Multiple-interval plot")

3.2. Visualizing the uncertainty of point estimates: ggdist methods

Showing the plots with 95% and 99% confidence intervals.

Code

exam %>%
  ggplot(aes(x = RACE, 
             y = MATHS)) +
  stat_pointinterval(
    show.legend = FALSE) +   
  labs(
    title = "Visualising confidence intervals of mean math score",
    subtitle = "Mean Point + Multiple-interval plot")

3.3. Visualizing the uncertainty of point estimates: ggdist methods

In the code chunk below, stat_gradientinterval() of ggdist is used to build a visual for displaying distribution of maths scores by race.

NOTE: This function comes with many arguments, refer to the syntax reference here for more detail.

Code

exam %>%
  ggplot(aes(x = RACE, 
             y = MATHS)) +
  stat_gradientinterval(   
    fill = "skyblue",      
    show.legend = TRUE     
  ) +                        
  labs(
    title = "Visualising confidence intervals of mean math score",
    subtitle = "Gradient + interval plot")

4. Visualising Uncertainty with Hypothetical Outcome Plots (HOPs)

Step 1: Installing ungeviz package (only need to perform this step once1)

Code

# devtools::install_github("wilkelab/ungeviz")

Step 2: Launch the application in R

Code

library(ungeviz)

Code

ggplot(data = exam, 
       (aes(x = factor(RACE), y = MATHS))) +
  geom_point(position = position_jitter(
    height = 0.3, width = 0.05), 
    size = 0.4, color = "#0072B2", alpha = 1/2) +
  geom_hpline(data = sampler(25, group = RACE), height = 0.6, color = "#D55E00") +
  theme_bw() + 
  # `.draw` is a generated column indicating the sample draw
  transition_states(.draw, 1, 3)

5. Visualising Uncertainty with Hypothetical Outcome Plots (HOPs)

Code

ggplot(data = exam, 
       (aes(x = factor(RACE), 
            y = MATHS))) +
  geom_point(position = position_jitter(
    height = 0.3, 
    width = 0.05), 
    size = 0.4, 
    color = "#0072B2", 
    alpha = 1/2) +
  geom_hpline(data = sampler(25, 
                             group = RACE), 
              height = 0.6, 
              color = "#D55E00") +
  theme_bw() + 
  transition_states(.draw, 1, 3)