A New Zealand Price Kaleidoscope

David Banks

University of Auckland
Department of Statistics

Paul Murrell

University of Auckland
Department of Statistics

This document is licensed under a Creative Commons Attribution 3.0 New Zealand License.

Abstract

Introduction

Drawing a Voronoi Treemap

Selecting Initial Site Locations
Obtaining target proportions from CPI data
Filling regions

Adding Labelling

Adding Interactivity

Zooming

Source code

Bibliography

Introduction

The Federal Statistical Office of Germany provides a Price Kaleidoscope to display data on how different products and services contribute to the Consumer Price Index (CPI). The Price Kaleidoscope is fundamentally a Voronoi Treemap, with the size of each region reflecting the percentage contribution of a price category to the overall CPI. However, the Price Kaleidoscope also has interactive features: tooltips display more detailed information about price categories, and it is possible to zoom in to see more detail. Figure 1, “The German Price Kaleidoscope” shows a screenshot of the complete kaleidoscope, with tooltips, and another screenshot with a zoomed view.

Figure 1. The German Price Kaleidoscope

This report describes the construction of a similar Price Kaleidoscope for New Zealand's CPI data. This task required developing code to draw the fundamental Voronoi Treemap, code to add labelling to the Treemap, and code to add interactivity such as tooltips and zooming. Each of these steps is described in the following sections.

Drawing a Voronoi Treemap

The Voronoi Treemap that forms the basis of the Price Kaleidoscope consists of an overall circular region, which is divided into sub-regions based on the percentage contributions of different price categories to the overall CPI. A set of R functions has been developed to generate additively-weighted Voronoi tessellations, and these functions can be used to generate a Voronoi Treemap.
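In an additively-weighted Voronoi tessellation, a point belongs to the site that minimises the Euclidean distance to that site minus the site's weight, so a site with a larger weight claims more territory. The minimal sketch below illustrates only this assignment rule; the helper `awNearest()` is hypothetical and is not part of the VoronoiCode files.

```r
# Hypothetical helper (not from VoronoiCode): index of the site that owns
# point (px, py) under the additively-weighted distance |p - s_i| - w_i.
awNearest <- function(px, py, sx, sy, w) {
  d <- sqrt((px - sx)^2 + (py - sy)^2) - w
  which.min(d)
}

# Two sites on the x-axis; the second site has the larger weight.
sx <- c(-1, 1)
sy <- c(0, 0)
w  <- c(0, 0.5)

# With equal weights the boundary would be at x = 0. Here it moves to
# x = -0.25 (half the weight difference) towards the lighter site.
awNearest(-0.3, 0, sx, sy, w)  # 1
awNearest(-0.2, 0, sx, sy, w)  # 2
awNearest( 0.0, 0, sx, sy, w)  # 2
```

Along the line joining the two sites, the boundary shifts by half the weight difference; away from that line the boundary is a branch of a hyperbola rather than a straight line, which is why these regions have curved edges.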

An R function called allocate() can be used to generate an additively-weighted Voronoi tessellation, given a set of starting points for the sub-regions, a set of starting weights, a description of the region to be divided up (in this case a circle), and a set of target proportions for the sub-regions. The following code sets up a simple set of these initial values to divide up a circle into 4 sub-regions, where the sub-regions occupy 10%, 20%, 30%, and 40% of the circle respectively.

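# Global 'scale' value, set before sourcing the VoronoiCode files
assign("scale", 1000, envir=.GlobalEnv)
source("VoronoiCode/util.R")
source("VoronoiCode/voronoi.R")
source("VoronoiCode/kaleidescope.R")
source("VoronoiCode/draw.R")
source("VoronoiCode/debug.R")

library("gpclib")
# A circle of radius 1000, approximated by a 99-sided polygon
# (angle 0 is dropped because it gives the same point as 2*pi)
t <- seq(0, 2*pi, length.out=100)[-1]
circle <- as(list(x=1000*cos(t), y=1000*sin(t)),
             "gpc.poly")
# One starting site in each quadrant of the circle
siteX <- c(-500, -500, 500, 500)
siteY <- c(-500, 500, 500, -500)
# Initial weights (on the order of tens) and target proportions .1, ..., .4
weights <- seq(10, 40, 10)
target <- weights/sum(weights)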

The following code calls allocate() to produce an additively-weighted Voronoi tessellation from these values and uses the drawRegions() function to draw the result (see Figure 2, “A Voronoi Treemap”).

regions <- allocate(letters[1:4], 
                    list(x=siteX, y=siteY),
                    weights, circle, target)

drawRegions(regions, label=TRUE)

Figure 2. A simple Voronoi Treemap displaying a circle divided into four regions that are allocated 10%, 20%, 30%, and 40% of the area of the circle, respectively.

This basic Voronoi Treemap has only one level, but it is a simple matter to construct further levels by taking each top-level region and further subdividing it. For example, the following code takes the fourth region from Figure 2, “A Voronoi Treemap” and divides that into 4 sub-regions (see Figure 3, “A two-level Voronoi Treemap”).

regionFour <- regions$k[[4]]
subsiteX <- c(0, 0, 500, 500)
subsiteY <- c(-500, 0, 0, -500)
subRegions <- allocate(letters[1:4], 
                       list(x=subsiteX, y=subsiteY),
                       weights, regionFour, target)

Figure 3. A two-level Voronoi Treemap displaying a circle divided into four regions that are allocated 10%, 20%, 30%, and 40% of the area of the circle, respectively, with the fourth region further divided into four sub-regions (allocated 10%, 20%, 30%, and 40% of the fourth region, respectively).
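Because each level subdivides its parent region, the share of the whole circle occupied by a lower-level region is the product of the proportions along its path. In Figure 3, the four sub-regions of the 40% region therefore occupy 4%, 8%, 12%, and 16% of the circle. The sketch below computes these shares from a nested list of weights; the `node()` and `leafShares()` helpers are hypothetical illustrations, not part of the VoronoiCode files.

```r
# Hypothetical helpers: a tree node holds a weight and optional children.
node <- function(weight, children = NULL) list(weight = weight, children = children)

# The two-level example: four regions, with the fourth split into four again.
tree <- list(node(10), node(20), node(30),
             node(40, list(node(10), node(20), node(30), node(40))))

# Share of the whole circle for every leaf region, depth first.
leafShares <- function(nodes, share = 1) {
  w <- vapply(nodes, function(n) n$weight, numeric(1))
  p <- share * w / sum(w)  # proportions within the parent, scaled by its share
  unlist(lapply(seq_along(nodes), function(i) {
    kids <- nodes[[i]]$children
    if (is.null(kids)) p[i] else leafShares(kids, p[i])
  }))
}

leafShares(tree)
# [1] 0.10 0.20 0.30 0.04 0.08 0.12 0.16
```

The same multiplication applies to the CPI data, where a class's share of the circle is its proportion within its subgroup times the subgroup's proportion within its group times the group's proportion of the CPI.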

Selecting Initial Site Locations

The allocate() function determines the final regions for the Voronoi Treemap by an iterative process. Each region is based on a location, called a site, and the iterative process adjusts these site locations so that the regions end up with the desired proportion of the circle. Figure 4, “A Voronoi Treemap animation” shows an animated GIF of this process for Figure 2, “A Voronoi Treemap”.

Figure 4. An animation of the iterative process that the allocate() function follows to generate the Voronoi Treemap.
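The loop below is a minimal sketch of the weight-adjustment half of such an iterative process, under several simplifying assumptions: the circle is the unit disk approximated by a grid of points, areas are estimated by counting points, the sites stay fixed (allocate() also moves them), and the update rule, which nudges each weight by the difference between the square roots of the target and current areas, is a simple heuristic chosen for illustration rather than the rule that allocate() uses. All names are hypothetical.

```r
# Hypothetical sketch, NOT the allocate() implementation.
# Approximate the unit disk by grid points.
g <- seq(-1, 1, length.out = 201)
pts <- expand.grid(x = g, y = g)
pts <- pts[pts$x^2 + pts$y^2 <= 1, ]

# Owner of each grid point under the distance |p - s_i| - w_i.
assignCells <- function(pts, sx, sy, w) {
  d <- sapply(seq_along(sx), function(i)
    sqrt((pts$x - sx[i])^2 + (pts$y - sy[i])^2) - w[i])
  max.col(-d, ties.method = "first")
}

# Fraction of the disk owned by each site.
cellAreas <- function(sx, sy, w)
  tabulate(assignCells(pts, sx, sy, w), nbins = length(sx)) / nrow(pts)

sx <- c(-0.5, -0.5, 0.5, 0.5)   # same layout as siteX/siteY, rescaled
sy <- c(-0.5, 0.5, 0.5, -0.5)
target <- c(0.1, 0.2, 0.3, 0.4)
w <- rep(0, 4)

for (iter in seq_len(200)) {
  area <- cellAreas(sx, sy, w)
  # Grow cells that are too small and shrink cells that are too large;
  # comparing square roots keeps the step roughly proportional to a length.
  w <- w + 0.5 * (sqrt(target) - sqrt(area))
}
area <- cellAreas(sx, sy, w)
round(area, 2)
```

Only weight differences matter for the cell boundaries, so the weights may all drift together without changing the result. Keeping the steps small matters for additively-weighted cells: if two weights differ by more than the distance between their sites, the lighter site's cell disappears entirely.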

The iterative process requires an initial set of starting locations for the sites. In the simple example shown above, with only four sites, the starting locations for the sites were chosen to be at the corners of a square that fits within the circle that is being divided up. For more complex cases, the only algorithm employed so far has been to generate a regular grid of locations (that fit within the region that is being divided up) and use an appropriately-sized sample of those locations for the sites.

For the New Zealand Price Kaleidoscope, there are 11 expenditure groups, so we generate a 4 by 4 grid of locations within the circle and then sample 11 locations from the full set of 16 locations. The following code makes use of the startSites() function to generate 16 sites and then uses sample() to select 11 of them (see Figure 5, “The initial sites”).

siteSet <- startSites(11, circle)
samp <- sample(seq_along(siteSet$x), 11)
sites <- list(x=siteSet$x[samp], y=siteSet$y[samp])

Figure 5. The initial site locations for the New Zealand Price Kaleidoscope (the selected sites are shown as black dots).
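The grid-and-sample strategy can be sketched in plain R for a circular region. This is not the startSites() function from the VoronoiCode files; the name `gridSites()` and the choice of placing a k by k grid (with k = ceiling(sqrt(n))) inside the square inscribed in the circle are assumptions that guarantee every candidate site lies inside the circle. For 11 sites this gives the 4 by 4 grid of 16 candidates described above.

```r
# Hypothetical sketch of grid-based starting sites for a circle
# (not the startSites() implementation).
gridSites <- function(n, radius) {
  k <- ceiling(sqrt(n))
  half <- radius / sqrt(2)            # half-width of the inscribed square
  # k evenly spaced interior coordinates (the square's edges are dropped)
  g <- seq(-half, half, length.out = k + 2)[-c(1, k + 2)]
  grid <- expand.grid(x = g, y = g)   # k^2 candidate locations
  samp <- sample(nrow(grid), n)       # n distinct candidates
  list(x = grid$x[samp], y = grid$y[samp])
}

set.seed(1)
sites <- gridSites(11, 1000)          # 11 of 16 grid points
```

Sampling without replacement guarantees distinct sites, and the random choice means that different runs can produce differently shaped (but equally sized) regions.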

More sophisticated algorithms for selecting starting locations are the subject of ongoing research.

Obtaining target proportions from CPI data

In addition to the site locations, the allocate() function uses a weight value for each site to determine the size and shape of each region that it allocates. Like the site locations, these site weights are adjusted at each step of the iterative process so that the regions end up occupying the required proportions of the circle. The other main inputs to the allocate() function are therefore a set of initial weights and the target proportions that determine the final division of the circle into regions. In the simple example used above, we want the regions to be allocated 10%, 20%, 30%, and 40% of the circle, respectively, so the target proportions are .1, .2, .3, and .4. A simple approach is to set the initial weights to these target values expressed as percentages (10, 20, 30, and 40), because it helps the allocate() algorithm if the weights are on the order of tens rather than tenths.

For the New Zealand Price Kaleidoscope, the target proportions for each region (just considering expenditure groups at this stage) are based on the expenditure weights for each group, as published by Statistics New Zealand (the worksheet labelled "Table 2" from the Statistics New Zealand workbook was saved in a CSV format).

groups <- read.csv("CPI-review-2011-tables1-1.csv", 
                   skip=10, nrows=11, header=FALSE)[c(1, 5)]
groups

                                 V1    V5
1                              Food 18.79
2   Alcoholic beverages and tobacco  6.91
3             Clothing and footwear  4.42
4   Housing and household utilities 23.55
5   Household contents and services  4.44
6                            Health  5.44
7                         Transport 15.12
8                     Communication  3.53
9            Recreation and culture  9.12
10                        Education  1.84
11 Miscellaneous goods and services  6.85
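The allocate() call in the next step uses these expenditure weights directly as initial weights (they are already on the order of tens) and divides them by their total to obtain the target proportions. As a quick check of that arithmetic, the sketch below types the eleven weights in from the output above so that it runs without the CSV file; the name `cpiWeights` is only for illustration. The weights sum to 100.01 rather than exactly 100, presumably because the published figures are rounded, so normalising by the sum keeps the target proportions summing to 1.

```r
# Weights typed in from the output above (the eleven expenditure groups).
cpiWeights <- c("Food" = 18.79,
                "Alcoholic beverages and tobacco" = 6.91,
                "Clothing and footwear" = 4.42,
                "Housing and household utilities" = 23.55,
                "Household contents and services" = 4.44,
                "Health" = 5.44,
                "Transport" = 15.12,
                "Communication" = 3.53,
                "Recreation and culture" = 9.12,
                "Education" = 1.84,
                "Miscellaneous goods and services" = 6.85)

sum(cpiWeights)                          # 100.01, not exactly 100
target <- cpiWeights / sum(cpiWeights)   # target proportions, summing to 1
round(target, 3)
```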

We can now use the site starting locations and CPI weights to generate a (one-level) Voronoi Treemap for New Zealand's CPI data (see Figure 6, “A CPI Treemap”).

cpiGroup <- allocate(groups[, 1],  sites,
                      groups[, 2],  circle, 
                      groups[, 2]/sum(groups[, 2]))

Figure 6. A one-level Voronoi Treemap representing contributions of expenditure groups to New Zealand's CPI.

A multi-level Voronoi Treemap can be generated easily by taking each of the regions and further subdividing it; site starting locations can again be generated by sampling a regular grid and site weights can be obtained from the more detailed information in the Statistics New Zealand workbook. The code below shows this process for the "Food" expenditure group (the worksheet labelled "Table 1" from the Statistics New Zealand workbook was saved in a CSV format) and the result is shown in Figure 7, “A two-level CPI Treemap”.

food <- cpiGroup$k[[1]]
subgroups <- read.csv("CPI-review-2011-tables1-2.csv", 
                      skip=12, nrows=19, header=FALSE)[c(2, 9)]
subgroups <- subgroups[subgroups[, 1] != "", ]
siteSet <- startSites(nrow(subgroups), food)
samp <- sample(seq_along(siteSet$x), nrow(subgroups))
sites <- list(x=siteSet$x[samp], y=siteSet$y[samp])
cpiSub <- allocate(subgroups[, 1],  sites,
                   subgroups[, 2],  food, 
                   subgroups[, 2]/sum(subgroups[, 2]))

Figure 7. A two-level Voronoi Treemap representing contributions of expenditure groups to New Zealand's CPI plus contributions of expenditure subgroups within the "Food" group.

Each expenditure subgroup can then be further subdivided into expenditure classes in a similar fashion. An example of the complete result is shown in Figure 8, “A three-level CPI Treemap”.

Figure 8. A three-level Voronoi Treemap representing contributions of expenditure groups, expenditure subgroups, and expenditure classes to New Zealand's CPI.

Filling regions

The regions within the Voronoi Treemap are filled with colours that reflect the percentage change in prices within the corresponding expenditure group.

Data for price changes were obtained from Statistics New Zealand (the worksheet called "8.01" was saved as a CSV file).

priceChange <- read.csv("cpi-sep12-all-tables-1.csv", 
                      skip=13, nrows=169, header=FALSE)[c(1:3, 6)]
groupPrice <- priceChange[priceChange[, 1] != "", c(1, 4)]
groupPrice[, 1] <- gsub(" group", "", groupPrice[, 1])
groupPrice

                                  V1   V6
1                               Food  1.1
23   Alcoholic beverages and tobacco  0.3
31             Clothing and footwear -0.3
45   Housing and household utilities  0.8
63   Household contents and services  0.4
81                            Health  0.9
93                         Transport -1.1
113                    Communication -1.6
119           Recreation and culture -0.1
142                        Education  0.1
149 Miscellaneous goods and services  1.1
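The fill colours in the drawRegions() call below are matched to regions by position, which works because the price table lists the eleven groups in the same order as the expenditure weights table. A more defensive option, sketched here with hypothetical variable names and values typed in from the output above, is to look each region's name up with match(), which also fails loudly if a cleaned-up name differs between the two tables.

```r
# Names and price changes typed in from the output above.
groupNames <- c("Food", "Alcoholic beverages and tobacco",
                "Clothing and footwear", "Housing and household utilities",
                "Household contents and services", "Health", "Transport",
                "Communication", "Recreation and culture", "Education",
                "Miscellaneous goods and services")
change <- setNames(c(1.1, 0.3, -0.3, 0.8, 0.4, 0.9, -1.1, -1.6, -0.1, 0.1, 1.1),
                   groupNames)

# Pretend the price table listed the groups in a different order ...
shuffled <- rev(change)
# ... match() still lines each value up with the right region name.
aligned <- shuffled[match(groupNames, names(shuffled))]
stopifnot(!anyNA(aligned))   # fails if a name differs between the tables
```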

The price changes are discretized into 40 different levels ...

breaks <- seq(-2, 2, .1)
priceLevel <- cut(groupPrice[, 2], breaks)

... and a colour assigned to each level ...

library(colorspace)
cols <- diverge_hcl(40)

... so that each region can be filled in with an appropriate colour (see Figure 9, “A colour CPI Treemap”).

drawRegions(cpiGroup, fill=cols[as.integer(priceLevel)], 
            top=FALSE, label=TRUE, lwd=3)

Figure 9. A one-level Voronoi Treemap representing contributions of expenditure groups to New Zealand's CPI, with fill colour used to represent change in price for each group.
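The discretisation and colour lookup can be checked without the colorspace package. In the sketch below, colorRampPalette() from base R's grDevices package is swapped in as a rough stand-in for diverge_hcl() (blue for falling prices, white near zero, red for rising prices), and three price changes are typed in from the output above. The 41 break points give 40 intervals, and each interval's integer code indexes the palette. Setting include.lowest = TRUE is an added safeguard so that a change of exactly -2 is not dropped as NA.

```r
# Price changes for three groups, typed in from the output above.
change <- c(Food = 1.1, Transport = -1.1, Communication = -1.6)

breaks <- seq(-2, 2, by = 0.1)                 # 41 break points ...
bins <- cut(change, breaks, include.lowest = TRUE)
nlevels(bins)                                  # ... give 40 intervals

# Base-R stand-in for colorspace::diverge_hcl(40).
cols <- colorRampPalette(c("blue", "white", "red"))(nlevels(bins))
fill <- cols[as.integer(bins)]                 # one colour per group
fill
```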

A slightly more detailed version of this approach can be used to represent price changes at the level of expenditure classes (see Figure 10, “A Static Kaleidoscope”).

Figure 10. A static Price Kaleidoscope representing contributions of expenditure groups, subgroups, and classes to New Zealand's CPI, with fill colour used to represent change in price for each expenditure class.

Adding Labelling

As the Voronoi Treemap becomes more complex, with multiple levels, it becomes harder to provide labelling for all of the regions in the Treemap. This section describes how we provided labels for expenditure groups by drawing the labels to either side of the Treemap.

The algorithm is as follows (see Figure 11, “Labelling a kaleidoscope”; a link to the R code implementation is provided in the section called “Source code”):

Determine the centre of each expenditure group region.
If the region centre is in the right half of the kaleidoscope, draw the region label on the right of the kaleidoscope (and similarly for the left half).
Check for overlapping labels by calculating label heights and combining those with the vertical positions of the region centres.
Resolve overlaps by adjusting the labels' vertical positions.

Figure 11. A static Price Kaleidoscope representing contributions of expenditure groups to New Zealand's CPI, with lines to indicate where labels would be drawn. The initial label locations are based on the centroids of the regions and these are then adjusted vertically if necessary to avoid overlapping labels.
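The last three steps of the algorithm can be sketched as follows. This is not the authors' labelling code (which is linked from the section called “Source code”); the helpers `stackLabels()` and `placeLabels()` are hypothetical, the region centres `cx` and `cy` are assumed to be relative to the centre of the kaleidoscope, and every label is assumed to have the same height `h`. Labels on each side are sorted by height and any label that would overlap the one below it is pushed up.

```r
# Hypothetical sketch of side labelling (not the authors' implementation).
stackLabels <- function(y, h) {
  o <- order(y)
  ys <- y[o]
  for (i in seq_along(ys)[-1]) {
    # push a label up if it would overlap the label below it
    if (ys[i] - ys[i - 1] < h) ys[i] <- ys[i - 1] + h
  }
  out <- numeric(length(y))
  out[o] <- ys                 # return positions in the original order
  out
}

placeLabels <- function(cx, cy, h) {
  side <- ifelse(cx >= 0, "right", "left")
  labelY <- cy
  for (s in c("left", "right")) {
    idx <- which(side == s)
    labelY[idx] <- stackLabels(cy[idx], h)
  }
  data.frame(side = side, y = labelY)
}

# The second label is pushed up from 0.25 to 0.3 to clear the first.
placeLabels(cx = c(0.5, 0.6, -0.4, 0.3), cy = c(0.2, 0.25, 0.2, -0.6), h = 0.1)
```

This single upward pass can push a crowded stack past the top of the plot; a fuller version might shift the whole stack back down or spread the adjustment in both directions.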

Adding Interactivity

Another approach to labelling the regions of the Price Kaleidoscope involves adding tooltips that display a text label while the mouse is over a region.

This was achieved by using the gridSVG package to add attributes to each region that respond when the mouse is over the region, plus JavaScript code to draw a tooltip. A simple example is shown below; the grid.garnish() function is used to define JavaScript actions in response to the mouse and the grid.script() function is used to attach a JavaScript file to the picture. The gridToSVG() function generates the final SVG image (see Figure 12, “A kaleidoscope with tooltips”).

drawRegions(cpiGroup, label=TRUE, top=FALSE, fill="grey90")
grid.garnish("Food", 
             onmouseover="showTooltip(evt, 'Food')",
             onmouseout="hideTooltip()")
grid.script(file="tooltip.js")
gridToSVG("tooltip.svg")

Figure 12. An interactive Price Kaleidoscope representing contributions of expenditure groups to New Zealand's CPI. When the mouse is over the "Food" region, a tooltip is shown.

The same approach can be used to highlight a region with a strong border when the mouse is over the region; all that changes is the sophistication of the JavaScript (the section called “Source code” has a link to the full JavaScript code used in the New Zealand Price Kaleidoscope).
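Garnishing every region, rather than just "Food", needs one onmouseover string per region label. A minimal sketch, assuming the showTooltip() JavaScript function from the example above: the hypothetical helper `tooltipCall()` uses base R's encodeString() to wrap each label in single quotes and escape any quote or backslash inside it, so a label containing an apostrophe still produces a valid JavaScript string.

```r
# Hypothetical helper: build the onmouseover value for one region label.
tooltipCall <- function(label)
  sprintf("showTooltip(evt, %s)", encodeString(label, quote = "'"))

tooltipCall("Food")
# [1] "showTooltip(evt, 'Food')"

# With gridSVG loaded and the regions drawn, several regions could then
# be garnished in one loop, e.g.
# for (g in c("Food", "Transport"))
#   grid.garnish(g, onmouseover = tooltipCall(g), onmouseout = "hideTooltip()")
```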

Zooming

A final technique used to help identify small regions within the Price Kaleidoscope involves zooming the view so that only one expenditure group is visible. This allows side labelling to be applied to the subgroups and tooltips to be generated for the expenditure classes.

A simple way to achieve this zooming is to create a hyperlink for each expenditure group region that links to a separate image consisting of just a single expenditure group. This is shown in the following code; the grid.hyperlink() function generates the hyperlink (see Figure 13, “A kaleidoscope with hyperlinks”).

drawRegions(cpiGroup, label=TRUE, top=FALSE, fill="grey90")
grid.hyperlink("Food", href="target.svg")
gridToSVG("hyperlink.svg")

drawRegions(cpiSub, label=TRUE, top=FALSE, fill="grey90")
gridToSVG("target.svg")

Figure 13. An interactive Price Kaleidoscope representing contributions of expenditure groups to New Zealand's CPI. When the mouse is clicked over the "Food" region, the browser will navigate to a "zoomed" view of that region.

Because the labelling procedure is automated (see the section called “Adding Labelling”), labelling the zoomed views is straightforward.
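With one zoomed SVG file per expenditure group, each group name needs a predictable file name for its hyperlink. The scheme below (lower case, runs of non-alphanumeric characters replaced by dashes) is a hypothetical choice for illustration, not the naming used in the New Zealand Price Kaleidoscope.

```r
# Hypothetical naming scheme for the zoomed views: one SVG file per group.
zoomFile <- function(group) {
  slug <- tolower(gsub("[^A-Za-z0-9]+", "-", group))  # spaces etc. -> "-"
  paste0(gsub("^-+|-+$", "", slug), ".svg")           # trim stray dashes
}

zoomFile("Alcoholic beverages and tobacco")
# [1] "alcoholic-beverages-and-tobacco.svg"

# The overview could then link each group to its own zoomed view, e.g.
# for (g in c("Food", "Transport"))
#   grid.hyperlink(g, href = zoomFile(g))
```

Because zoomFile() is vectorised, the same call also produces all eleven file names at once for the loop that draws and exports each zoomed view.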

Source code

This section contains links to the source code files. All code is free software, released under the GPL.

Utility functions
Code for generating an additively weighted Voronoi tessellation (including the allocate() function)
Code for drawing a Voronoi Treemap (including the drawRegions() function)
Debugging code

The code below is based on a couple of CSV files (exported from Statistics New Zealand workbooks), plus an R data binary file.

Positioning the labels to either side of the kaleidoscope.
Top level code to load R packages, source other R code, and generate complete Price Kaleidoscope.
Javascript code for tooltips and highlighting, plus navigation between different zoom levels and between different views (showing different quarter comparisons).