Hands-on Exercise 9D: Visual Multivariate Analysis with Parallel Coordinates Plot

Author

Kristine Joy Paas

Published

June 13, 2024

Modified

June 13, 2024

1 Overview

This hands-on exercise covers Chapter 15: Visual Multivariate Analysis with Parallel Coordinates Plot.

In this exercise, I learned how to:

  • Do multivariate analysis with parallel coordinates plots

2 Getting Started

2.1 Loading the required packages

For this exercise we will use the following R packages:

pacman::p_load(GGally, parallelPlot, tidyverse)

2.2 Importing data

We will use the data from the World Happiness Report 2018. The data set is downloaded from here. The original data set is in Microsoft Excel format. It has been extracted and saved as a CSV file called WHData-2018.csv.

wh <- read_csv("data/WHData-2018.csv")

glimpse(wh)
Rows: 156
Columns: 12
$ Country                        <chr> "Albania", "Bosnia and Herzegovina", "B…
$ Region                         <chr> "Central and Eastern Europe", "Central …
$ `Happiness score`              <dbl> 4.586, 5.129, 4.933, 5.321, 6.711, 5.73…
$ `Whisker-high`                 <dbl> 4.695, 5.224, 5.022, 5.398, 6.783, 5.81…
$ `Whisker-low`                  <dbl> 4.477, 5.035, 4.844, 5.244, 6.639, 5.66…
$ Dystopia                       <dbl> 1.462, 1.883, 1.219, 1.769, 2.494, 1.45…
$ `GDP per capita`               <dbl> 0.916, 0.915, 1.054, 1.115, 1.233, 1.20…
$ `Social support`               <dbl> 0.817, 1.078, 1.515, 1.161, 1.489, 1.53…
$ `Healthy life expectancy`      <dbl> 0.790, 0.758, 0.712, 0.737, 0.854, 0.73…
$ `Freedom to make life choices` <dbl> 0.419, 0.280, 0.359, 0.380, 0.543, 0.55…
$ Generosity                     <dbl> 0.149, 0.216, 0.064, 0.120, 0.064, 0.08…
$ `Perceptions of corruption`    <dbl> 0.032, 0.000, 0.009, 0.039, 0.034, 0.17…

3 Plotting Static Parallel Coordinates Plots

We will use ggparcoord() to generate parallel coordinates plots.

3.1 Plotting simple parallel coordinates

ggparcoord(data = wh, 
           columns = c(7:12))
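The positional selection columns = c(7:12) depends on the column order of the CSV file. As a minimal sketch, the same indices can be looked up by name with base R's match(). The data frame toy and the vector axis_vars below are hypothetical stand-ins that reuse the column names shown in the glimpse() output.

```r
# Hypothetical stand-in for `wh`: one row, same 12 column names as in glimpse().
col_names <- c("Country", "Region", "Happiness score", "Whisker-high",
               "Whisker-low", "Dystopia", "GDP per capita", "Social support",
               "Healthy life expectancy", "Freedom to make life choices",
               "Generosity", "Perceptions of corruption")
toy <- as.data.frame(matrix(0, nrow = 1, ncol = length(col_names)))
names(toy) <- col_names

# Variables to draw as axes, in plotting order.
axis_vars <- c("GDP per capita", "Social support", "Healthy life expectancy",
               "Freedom to make life choices", "Generosity",
               "Perceptions of corruption")

# match() returns the position of each name; an NA would signal a misspelled name.
axis_idx <- match(axis_vars, names(toy))
stopifnot(!anyNA(axis_idx))
axis_idx
```

Here axis_idx covers positions 7 to 12, so it could be passed as columns = axis_idx in place of the hard-coded range.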

3.2 Plotting simple parallel coordinates with boxplot

Adding a boxplot to each axis reveals information about the distribution of the values.

ggparcoord(data = wh, 
           columns = c(7:12), 
           groupColumn = 2,
           scale = "uniminmax",
           alphaLines = 0.2,
           boxplot = TRUE, 
           title = "Parallel Coordinates Plot of World Happiness Variables")

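Here groupColumn = 2 colors the lines by region, and scale = "uniminmax" rescales each variable so that its minimum is 0 and its maximum is 1.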
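To make the scale = "uniminmax" option used above more concrete, here is a minimal base R sketch of min-max rescaling applied to one variable. The helper name rescale_minmax is hypothetical and is not part of GGally; it only illustrates the idea of mapping each variable onto the range 0 to 1.

```r
# Rescale a numeric vector so its minimum maps to 0 and its maximum to 1.
# rescale_minmax is a hypothetical helper name, not a GGally function.
rescale_minmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# First five `GDP per capita` values from the glimpse() output.
gdp <- c(0.916, 0.915, 1.054, 1.115, 1.233)
scaled <- rescale_minmax(gdp)
round(scaled, 3)
```

Because every variable ends up on the same 0 to 1 range, axes with different units can be compared side by side, at the cost of hiding the original magnitudes.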

3.3 Parallel coordinates with facets

ggparcoord(data = wh, 
           columns = c(7:12), 
           groupColumn = 2,
           scale = "uniminmax",
           alphaLines = 0.2,
           boxplot = TRUE, 
           title = "Multiple Parallel Coordinates Plots of World Happiness Variables by Region") +
  facet_wrap(~ Region)

3.4 Adjusting the x-axis labels

The x-axis labels overlap and are hard to read. We can rotate them by 30 degrees using theme().

ggparcoord(data = wh, 
           columns = c(7:12), 
           groupColumn = 2,
           scale = "uniminmax",
           alphaLines = 0.2,
           boxplot = TRUE, 
           title = "Multiple Parallel Coordinates Plots of World Happiness Variables by Region") +
  facet_wrap(~ Region) + 
  theme(axis.text.x = element_text(angle = 30, hjust=1))

4 Plotting Interactive Parallel Plots

parallelPlot is an R package designed to draw parallel coordinates plots using the ‘htmlwidgets’ package and d3.js. In this section, we will use functions from the parallelPlot package to build interactive parallel coordinates plots.

4.1 The Basic Plot

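# Keep only the happiness score and the six factor columns (7 to 12).
# Note that this overwrites wh, so Country and Region are dropped.
wh <- wh %>%
  select("Happiness score", c(7:12))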
parallelPlot(wh,
             width = 320,
             height = 250)

4.2 Rotate axis label

The axis labels overlap and are hard to read. We will use rotateTitle to avoid overlapping axis labels.

parallelPlot(wh,
             rotateTitle = TRUE)

4.3 Changing the color scheme

We can also change the color scheme with the continuousCS argument.

parallelPlot(wh,
             continuousCS = "YlOrRd",
             rotateTitle = TRUE)

4.4 Parallel coordinates plot with histogram

In the code chunk below, the histoVisibility argument is used to plot a histogram along the axis of each variable.

histoVisibility <- rep(TRUE, ncol(wh))
parallelPlot(wh,
             rotateTitle = TRUE,
             histoVisibility = histoVisibility)

5 Reflections

I understand that this technique is used to visualize multiple variables across many contexts (e.g., regions).

However, as presented in the exercise, both its clarity and its aesthetics are on the low end, mainly due to the sheer number of dimensions shown.

Adding a country label when hovering on a line should already improve the clarity.

For multivariate analysis, I prefer heatmaps, as they are clearer and more aesthetically pleasing.