The Federal Statistical Office of Germany provides a Price Kaleidoscope to display data on how different products and services contribute to the Consumer Price Index (CPI). The Price Kaleidoscope is fundamentally a Voronoi Treemap, with the size of different regions reflecting the percentage contribution of different price categories to the overall CPI. However, the Price Kaleidoscope also has interactive features, such as tooltips that display more detailed information about price categories and the ability to zoom in to see more detail. Figure 1, “The German Price Kaleidoscope”, shows a screenshot of the complete kaleidoscope, with tooltips, and another screenshot with a zoomed view.
This report describes the construction of a similar Price Kaleidoscope for New Zealand's CPI data. This task required developing code to draw the fundamental Voronoi Treemap, code to add labelling to the Treemap, and code to add interactivity such as tooltips and zooming. Each of these steps is described in the following sections.
The Voronoi Treemap that forms the basis of the Price Kaleidoscope consists of an overall circular region, which is divided into sub-regions based on the percentage contributions of different price categories to the overall CPI. A set of R functions has been developed to generate additively-weighted Voronoi tessellations, and these functions can be used to generate a Voronoi Treemap.
An R function called allocate() can be used to generate an additively-weighted Voronoi tessellation, given a set of starting points for the sub-regions, a set of starting weights, a description of the region to be divided up (in this case a circle), and a set of target proportions for the sub-regions. The following code sets up a simple set of these initial values to divide up a circle into 4 sub-regions, where the sub-regions occupy 10%, 20%, 30%, and 40% of the circle respectively.
assign("scale", 1000, envir=.GlobalEnv)
source("VoronoiCode/util.R")
source("VoronoiCode/voronoi.R")
source("VoronoiCode/kaleidescope.R")
source("VoronoiCode/draw.R")
source("VoronoiCode/debug.R")
library("gpclib")
t <- seq(0, 2*pi, length=100)[-1]
circle <- as(list(x=1000*cos(t), y=1000*sin(t)), "gpc.poly")
siteX <- c(-500, -500, 500, 500)
siteY <- c(-500, 500, 500, -500)
weights <- seq(10, 40, 10)
target <- weights/sum(weights)
The following code calls allocate() to produce an additively-weighted Voronoi tessellation from these values and uses the drawRegions() function to draw the result (see Figure 2, “A Voronoi Treemap”).
regions <- allocate(letters[1:4], list(x=siteX, y=siteY), weights, circle, target)
Figure 2. A simple Voronoi Treemap displaying a circle divided into four regions that are allocated 10%, 20%, 30%, and 40% of the area of the circle, respectively.
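As a rough illustration of what "additively-weighted" means here, the sketch below rasterises the circle and assigns each pixel to the site with the smallest value of distance minus weight. This is an assumption about the general technique, not the report's allocate() code; the helper name awShares() and the pixel-grid approach are hypothetical, and only base R is used.

```r
# Rasterised sketch of an additively-weighted Voronoi partition of a circle.
# Assumption: a pixel belongs to the site minimising (distance - weight).
# awShares() is a hypothetical helper, not part of the report's code.
awShares <- function(siteX, siteY, weights, radius = 1000, n = 201) {
  g <- seq(-radius, radius, length.out = n)
  grid <- expand.grid(x = g, y = g)
  grid <- grid[grid$x^2 + grid$y^2 <= radius^2, ]  # keep pixels inside the circle
  # One column of weighted distances per site
  d <- sapply(seq_along(siteX), function(i) {
    sqrt((grid$x - siteX[i])^2 + (grid$y - siteY[i])^2) - weights[i]
  })
  owner <- apply(d, 1, which.min)                  # winning site for each pixel
  tabulate(owner, nbins = length(siteX)) / nrow(grid)
}

siteX <- c(-500, -500, 500, 500)
siteY <- c(-500, 500, 500, -500)

# Equal weights give an ordinary Voronoi diagram (roughly equal quarters)
equalShares <- awShares(siteX, siteY, weights = rep(0, 4))
# A larger weight on the fourth site enlarges its region
biasedShares <- awShares(siteX, siteY, weights = c(0, 0, 0, 300))

print(round(equalShares, 3))
print(round(biasedShares, 3))
```

Raising one site's weight enlarges its region at the expense of its neighbours, which is the lever that an iterative procedure can use to steer the regions towards the target proportions.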
This basic Voronoi Treemap has only one level, but it is a simple matter to construct further levels by taking each top-level region and further subdividing it. For example, the following code takes the fourth region from Figure 2, “A Voronoi Treemap” and divides that into 4 sub-regions (see Figure 3, “A two-level Voronoi Treemap”).
regionFour <- regions$k[[4]]
subsiteX <- c(0, 0, 500, 500)
subsiteY <- c(-500, 0, 0, -500)
subRegions <- allocate(letters[1:4], list(x=subsiteX, y=subsiteY), weights, regionFour, target)
Figure 3. A two-level Voronoi Treemap displaying a circle divided into four regions that are allocated 10%, 20%, 30%, and 40% of the area of the circle, respectively, with the fourth region further divided into four sub-regions (allocated 10%, 20%, 30%, and 40% of the fourth region, respectively).
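The arithmetic behind the two-level example can be made explicit: each sub-region's share of the whole circle is its parent's share multiplied by its share of the parent. The variable names below are illustrative only.

```r
# Shares of the circle taken by the four top-level regions
top <- c(a = 0.1, b = 0.2, c = 0.3, d = 0.4)
# Shares of the fourth region taken by its four sub-regions
sub <- c(a = 0.1, b = 0.2, c = 0.3, d = 0.4)

# Share of the whole circle occupied by each sub-region of region "d"
subOfCircle <- top[["d"]] * sub
print(subOfCircle)
```

The sub-region shares add up to the parent's share (40% of the circle), so subdividing a region never changes the area allocated at the level above.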
The allocate() function determines the final regions for the Voronoi Treemap by an iterative process. Each region is based on a location, called a site, and the iterative process adjusts these site locations so that the regions end up with the desired proportion of the circle. Figure 4, “A Voronoi Treemap animation”, shows an animated GIF of this process for Figure 2, “A Voronoi Treemap”.
Figure 4. An animation of the iterative process that the allocate() function follows to generate the Voronoi Treemap.
The iterative process requires an initial set of starting locations for the sites. In the simple example shown above, with only four sites, the starting locations for the sites were chosen to be at the corners of a square that fits within the circle that is being divided up. For more complex cases, the only algorithm employed so far has been to generate a regular grid of locations (that fit within the region that is being divided up) and use an appropriately-sized sample of those locations for the sites.
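One way such a grid-and-sample scheme might look is sketched below for a circular region. The helper startGrid() is hypothetical and is not the report's startSites() function; it assumes the grid is laid out over the square inscribed in the circle, so that every candidate location is guaranteed to fall inside the region.

```r
# Hypothetical helper: a k x k grid of candidate sites inside a circle,
# where k is the smallest integer with k^2 >= n.
startGrid <- function(n, radius) {
  k <- ceiling(sqrt(n))
  half <- radius / sqrt(2)                 # half-side of the inscribed square
  # Centres of k equal-width cells spanning [-half, half]
  centres <- ((seq_len(k) - 0.5) / k * 2 - 1) * half
  expand.grid(x = centres, y = centres)
}

set.seed(42)                               # only to make the sketch reproducible
candidates <- startGrid(11, 1000)          # 4 x 4 = 16 candidate locations
pick <- sample(nrow(candidates), 11)       # choose 11 of them without replacement
sites <- list(x = candidates$x[pick], y = candidates$y[pick])
str(sites)
```

For non-circular regions, such as a sub-region produced by an earlier subdivision, a point-in-polygon test would be needed instead of the inscribed-square shortcut used here.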
For the New Zealand Price Kaleidoscope, there are 11 expenditure groups, so we generate a 4 by 4 grid of locations within the circle and then sample 11 locations from the full set of 16 locations. The following code makes use of the startSites() function to generate 16 sites and then uses sample() to select 11 of them (see Figure 5, “The initial sites”).
siteSet <- startSites(11, circle)
samp <- sample(1:length(siteSet$x), 11)
sites <- list(x=siteSet$x[samp], y=siteSet$y[samp])
Figure 5. The initial site locations for the New Zealand Price Kaleidoscope (the selected sites are shown as black dots).
More sophisticated algorithms for selecting starting locations are the subject of ongoing research.
In addition to the site locations, the allocate() function uses a weight value for each site to determine the size and shape of each region that it allocates. Like the site locations, the iterative process adjusts these site weights at each step in order to end up with regions that occupy the required proportions of the circle.
So the other main inputs to the allocate() function are a set of initial weights and the target proportions that are used to determine the final division of the circle into regions.
In the simple example used above, we want the regions to end up allocated 10%, 20%, 30%, and 40% of the circle, respectively; so the target proportions are .1, .2, .3, and .4. A simple approach is to set the initial weights to these target values (although it helps the allocate() algorithm if the weights are on the order of tens rather than tenths).
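The role of the weights and target proportions can be sketched with a toy iteration. This is not the report's allocate() algorithm: it assumes a pixel-grid approximation of the circle, assigns each pixel to the site minimising distance minus weight, keeps the site locations fixed (whereas the real process also moves the sites), and uses an arbitrary, hypothetical step size.

```r
# Toy stand-in for the iterative weight adjustment (NOT the report's allocate()).
radius <- 1000
g <- seq(-radius, radius, length.out = 101)
pts <- expand.grid(x = g, y = g)
pts <- pts[pts$x^2 + pts$y^2 <= radius^2, ]      # pixels inside the circle

siteX <- c(-500, -500, 500, 500)
siteY <- c(-500, 500, 500, -500)
target <- c(0.1, 0.2, 0.3, 0.4)
weights <- target * 100                          # initial weights: 10, 20, 30, 40

# Distance from every pixel to every site (one column per site)
siteDist <- sapply(seq_along(siteX), function(i) {
  sqrt((pts$x - siteX[i])^2 + (pts$y - siteY[i])^2)
})

# Proportion of pixels won by each site for a given set of weights
areaShares <- function(w) {
  owner <- apply(sweep(siteDist, 2, w), 1, which.min)  # distance minus weight
  tabulate(owner, nbins = length(w)) / nrow(siteDist)
}

step <- 500                                      # hypothetical step size
err <- numeric(60)
for (iter in seq_along(err)) {
  share <- areaShares(weights)
  err[iter] <- max(abs(share - target))          # worst-case error this round
  # Raise the weight of regions that are too small, lower those too large
  weights <- weights + step * (target - share)
}

finalShare <- areaShares(weights)
print(round(finalShare, 3))
print(round(weights, 1))
```

In this sketch the initial weights barely affect the first partition, because they are small relative to the distances involved; it is the repeated correction towards the target proportions that does the work.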
For the New Zealand Price Kaleidoscope, the target proportions for each region (just considering expenditure groups at this stage) are based on the expenditure weights for each group, as published by Statistics New Zealand (the worksheet labelled "Table 2" from the Statistics New Zealand workbook was saved in a CSV format).
groups <- read.csv("CPI-review-2011-tables1-1.csv", skip=10, nrows=11, header=FALSE)[c(1, 5)]
capture.output(groups)
[1] " V1 V5"
[2] "1 Food 18.79"
[3] "2 Alcoholic beverages and tobacco 6.91"
[4] "3 Clothing and footwear 4.42"
[5] "4 Housing and household utilities 23.55"
[6] "5 Household contents and services 4.44"
[7] "6 Health 5.44"
[8] "7 Transport 15.12"
[9] "8 Communication 3.53"
[10] "9 Recreation and culture 9.12"
[11] "10 Education 1.84"
[12] "11 Miscellaneous goods and services 6.85"
We can now use the site starting locations and CPI weights to generate a (one-level) Voronoi Treemap for New Zealand's CPI data (see Figure 6, “A CPI Treemap”).
cpiGroup <- allocate(groups[, 1], sites, groups[, 2], circle, groups[, 2]/sum(groups[, 2]))
Figure 6. A one-level Voronoi Treemap representing contributions of expenditure groups to New Zealand's CPI.
A multi-level Voronoi Treemap can be generated easily by taking each of the regions and further subdividing it; site starting locations can again be generated by sampling a regular grid and site weights can be obtained from the more detailed information in the Statistics New Zealand workbook. The code below shows this process for the "Food" expenditure group (the worksheet labelled "Table 1" from the Statistics New Zealand workbook was saved in a CSV format) and the result is shown in Figure 7, “A two-level CPI Treemap”.
food <- cpiGroup$k[[1]]
subgroups <- read.csv("CPI-review-2011-tables1-2.csv", skip=12, nrows=19, header=FALSE)[c(2, 9)]
subgroups <- subgroups[subgroups[, 1] != "", ]
siteSet <- startSites(nrow(subgroups), food)
samp <- sample(1:length(siteSet$x), nrow(subgroups))
sites <- list(x=siteSet$x[samp], y=siteSet$y[samp])
cpiSub <- allocate(subgroups[, 1], sites, subgroups[, 2], food, subgroups[, 2]/sum(subgroups[, 2]))
Figure 7. A two-level Voronoi Treemap representing contributions of expenditure groups to New Zealand's CPI plus contributions of expenditure subgroups within the "Food" group.
Each expenditure subgroup can then be further subdivided into expenditure classes in a similar fashion. An example of the complete result is shown in Figure 8, “A three-level CPI Treemap”.
Figure 8. A three-level Voronoi Treemap representing contributions of expenditure groups, expenditure subgroups, and expenditure classes to New Zealand's CPI.
The regions within the Voronoi Treemap are filled with colours that reflect the percentage change in prices within the corresponding expenditure group.
Data for price changes were obtained from Statistics New Zealand (the worksheet called "8.01" was saved as a CSV file).
priceChange <- read.csv("cpi-sep12-all-tables-1.csv", skip=13, nrows=169, header=FALSE)[c(1:3, 6)]
groupPrice <- priceChange[priceChange[, 1] != "", c(1, 4)]
groupPrice[, 1] <- gsub(" group", "", groupPrice[, 1])
capture.output(groupPrice)
[1] " V1 V6"
[2] "1 Food 1.1"
[3] "23 Alcoholic beverages and tobacco 0.3"
[4] "31 Clothing and footwear -0.3"
[5] "45 Housing and household utilities 0.8"
[6] "63 Household contents and services 0.4"
[7] "81 Health 0.9"
[8] "93 Transport -1.1"
[9] "113 Communication -1.6"
[10] "119 Recreation and culture -0.1"
[11] "142 Education 0.1"
[12] "149 Miscellaneous goods and services 1.1"
The price changes are discretized into 40 different levels ...
breaks <- seq(-2, 2, .1)
levels <- cut(groupPrice[, 2], breaks)
... and a colour assigned to each level ...
library(colorspace)
cols <- diverge_hcl(40)
... so that each region can be filled in with an appropriate colour (see Figure 9, “A colour CPI Treemap”).
Figure 9. A one-level Voronoi Treemap representing contributions of expenditure groups to New Zealand's CPI, with fill colour used to represent change in price for each group.
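The step from discretized levels to fill colours can be sketched as a simple lookup. To keep the example self-contained, it substitutes hcl.colors() from base grDevices (available in R 3.6.0 and later) for colorspace's diverge_hcl(); the "Blue-Red" palette is an assumption, not necessarily the palette used in the figures. The price changes are copied from the output shown above.

```r
# Price changes (percent) for the 11 expenditure groups, from the output above
change <- c(1.1, 0.3, -0.3, 0.8, 0.4, 0.9, -1.1, -1.6, -0.1, 0.1, 1.1)

breaks <- seq(-2, 2, 0.1)                 # 41 break points, so 40 intervals
levels <- cut(change, breaks)             # factor with 40 possible levels

# Assumed stand-in for diverge_hcl(40): a 40-colour diverging palette
cols <- hcl.colors(40, "Blue-Red")

# Look up one fill colour per group via the integer code of its level
fill <- cols[as.integer(levels)]
print(data.frame(change, level = as.integer(levels), fill))
```

Because the price changes are multiples of 0.1, they sit on interval boundaries, and floating-point rounding in the break points can place such a value in either neighbouring interval; shifting the breaks by half a step is one way to avoid that ambiguity.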
A slightly more detailed version of this approach can be used to represent price changes at the level of expenditure classes (see Figure 10, “A Static Kaleidoscope”).
Figure 10. A static Price Kaleidoscope representing contributions of expenditure groups, subgroups, and classes to New Zealand's CPI, with fill colour used to represent change in price for each expenditure class.
As the Voronoi Treemap becomes more complex, with multiple levels, it becomes harder to provide labelling for all of the regions in the Treemap. This section describes how we provided labels for expenditure groups by drawing the labels to either side of the Treemap.
The algorithm is as follows (see Figure 11, “Labelling a kaleidoscope”; a link to the R code implementation is provided in the section called “Source code”):
Figure 11. A static Price Kaleidoscope representing contributions of expenditure groups to New Zealand's CPI, with lines to indicate where labels would be drawn. The initial label locations are based on the centroids of the regions and these are then adjusted vertically if necessary to avoid overlapping labels.
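The vertical adjustment mentioned in the caption can be sketched as follows. This is a minimal illustration, not the report's implementation: the function name spreadLabels(), the minimum gap, and the centroid coordinates are all hypothetical, and the sketch only ever pushes labels upwards.

```r
# Hypothetical helper: push labels apart so that neighbouring labels on the
# same side are at least minGap apart, keeping their vertical order.
spreadLabels <- function(y, minGap) {
  ord <- order(y)
  ys <- y[ord]
  for (i in seq_along(ys)[-1]) {
    if (ys[i] - ys[i - 1] < minGap) ys[i] <- ys[i - 1] + minGap
  }
  y[ord] <- ys                             # restore the original ordering
  y
}

# Made-up region centroids, for illustration only
cx <- c(-400, -300, 200, 500, -100)
cy <- c(100, 130, -50, -40, 400)

# Labels go to the left or right of the Treemap depending on the centroid
side <- ifelse(cx < 0, "left", "right")

# Start each label at its centroid height, then adjust within each side
labelY <- cy
for (s in unique(side)) {
  labelY[side == s] <- spreadLabels(cy[side == s], minGap = 60)
}
print(data.frame(side, centroidY = cy, labelY))
```

A fuller version would probably also keep the labels within the vertical extent of the page, for example by pushing some labels downwards rather than always upwards.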
Another approach to labelling the regions of the Price Kaleidoscope involves adding tooltips that display a text label while the mouse is over a region.
This was achieved by using the gridSVG package to add attributes to each region to provide a response when the mouse is over the region, plus JavaScript code to draw a tooltip. A simple example is shown below; the grid.garnish() function is used to define JavaScript actions in response to the mouse and the grid.script() function is used to attach a JavaScript file to the picture. The gridToSVG() function generates the final SVG image (see Figure 12, “A kaleidoscope with tooltips”).
Figure 12. An interactive Price Kaleidoscope representing contributions of expenditure groups to New Zealand's CPI. When the mouse is over the "Food" region, a tooltip is shown.
The same approach can be used to highlight a region with a strong border when the mouse is over the region; all that changes is the sophistication of the JavaScript (the section called “Source code” has a link to the full JavaScript code used in the New Zealand Price Kaleidoscope).
A final technique used to help identify small regions within the Price Kaleidoscope involves zooming the view so that only one expenditure group is visible. This allows side labelling to be applied to the subgroups and tooltips to be generated for the expenditure classes.
A simple way to achieve this zooming is to create a hyperlink for each expenditure group region that links to a separate image consisting of just a single expenditure group. This is shown in the following code; the grid.hyperlink() function generates the hyperlink (see Figure 13, “A kaleidoscope with hyperlinks”).
Figure 13. An interactive Price Kaleidoscope representing contributions of expenditure groups to New Zealand's CPI. When the mouse is clicked over the "Food" region, the browser will navigate to a "zoomed" view of that region.
Because the labelling procedure is automated (see the section called “Adding Labelling”), labelling the zoomed views is straightforward.
This section contains links to the source code files. All code is free software, released under the GPL.
allocate()
function)
drawRegions()
function)
The code below is based on a couple of CSV files (exported from Statistics New Zealand workbooks), plus an R data binary file.
dynDoc("pricekaleidoscope.xml", "HTML", preXSL = c("http://www.omegahat.org/XSL/docbook/expandDB.xsl", "resolveCodeRef.xsl", "flattenCode.xsl"), out = "pricekaleidoscope.html", force = TRUE, env = xdenv, xslParams = c(html.stylesheet = "http://stattech.wordpress.fos.auckland.ac.nz/wp-content/themes/twentyeleven/style.css customStyle.css", base.dir = "HTML", generate.toc = "article toc")) Tue Dec 18 13:13:48 2012