by Paul Murrell http://orcid.org/0000-0002-3224-8858
Sunday 29 October 2017
This document by Paul Murrell is licensed under a Creative Commons Attribution 4.0 International License.
This report describes changes in version 1.6-0 of the 'gridSVG' package for R. The most important result of these changes is that 'gridSVG' is now much faster at generating SVG files (in situations where it used to be painfully slow).
The 'gridSVG' package (Murrell and Potter) can be used to convert an R plot to SVG format. It differs from the built-in svg graphics device in several ways: 'gridSVG' only works on plots drawn using 'grid' (including 'lattice' plots and 'ggplot2' plots), though the 'gridGraphics' package (Murrell) provides a pathway for 'graphics'-based plots; the SVG that 'gridSVG' generates contains more labelling and hierarchical structure, which turns out to be useful for accessibility, for example in the 'BrailleR' package (Godfrey, Godfrey and Murrell); and 'gridSVG' provides access to SVG features that are not available in normal R graphics (e.g., masks, fill patterns and gradients, and filters).
One major drawback of the 'gridSVG' package has been that it is very slow, particularly when converting plots that contain many individual shapes, such as a scatterplot with a thousand points.
For the purposes of demonstration, 'gridSVG' version 1.5-1 has been installed as the 'gridSVGslow' package and an example of its performance is shown below.
library(gridSVGslow)
library(lattice)
p <- xyplot(runif(1000) ~ runif(1000))
print(p)
system.time(grid.export("gridSVGslow.svg"))
   user  system elapsed
  6.704   0.000   6.743
detach("package:gridSVGslow")
Version 1.6-0 of 'gridSVG' includes some internal changes that significantly speed up the export of this sort of plot.
library(gridSVG)
print(p)
system.time(grid.export("gridSVG.svg"))
   user  system elapsed
  0.536   0.000   0.587
This section describes the idea behind the internal changes for 'gridSVG' version 1.6-0.
Tauno Metsalu (private correspondence) was the first to identify that the source of the poor performance in 'gridSVG' was a for loop within the code that exports points to SVG. Profiling data showed that a large percentage of the time was being spent in repeated calls to the newXMLNode function from the 'XML' package.
The following code demonstrates a simplified version of the problem. We create a dummy XML document and add 1000 <rect> elements to it by calling newXMLNode and specifying the parent argument to add the elements to the root node of the document. This is the essence of the internal 'gridSVG' code prior to version 1.6-0.
library(XML)
buildXML <- function() {
    doc <- newXMLDoc(node=newXMLNode("dummy"))
    root <- xmlRoot(doc)
    # Add the elements to the root node one at a time
    for (i in 1:1000) {
        newXMLNode("rect", parent=root)
    }
    doc
}
system.time(loopXML <- buildXML())
   user  system elapsed
  3.632   0.000   3.635
One way to speed this process up is to generate all of the <rect> elements first and then add them all at once, with the addChildren function.
buildXMLchildren <- function() {
    doc <- newXMLDoc(node=newXMLNode("dummy"))
    root <- xmlRoot(doc)
    # Create the elements without a parent ...
    rects <- lapply(1:1000, function(i) { newXMLNode("rect") })
    # ... then add them all to the root node in a single call
    addChildren(root, kids=rects)
    doc
}
system.time(childrenXML <- buildXMLchildren())
   user  system elapsed
  0.128   0.000   0.127
identical(saveXML(loopXML), saveXML(childrenXML))
[1] TRUE
Even faster is to generate the <rect> elements as text and then parse them with the xmlParse function (and then add them all at once with the addChildren function). In the code below, <rect> has been entered as \074rect\076 to avoid conflicting with the HTML syntax of this document.
buildChar <- function() {
    doc <- newXMLDoc(node=newXMLNode("dummy"))
    root <- xmlRoot(doc)
    # Build the text for all elements, wrapped in a temporary parent element
    xmlChar <- paste("\074temp\076",
                     paste(rep("\074rect/\076", 1000), collapse=""),
                     "\074/temp\076",
                     sep="")
    # Parse the text once, then add the parsed children to the root node
    rects <- xmlParse(xmlChar)
    addChildren(root, kids=xmlChildren(xmlRoot(rects)))
    doc
}
system.time(charXML <- buildChar())
   user  system elapsed
  0.020   0.000   0.024
identical(saveXML(loopXML), saveXML(charXML))
[1] TRUE
This faster approach has been implemented in 'gridSVG' for generating <rect> elements, <circle> elements, and <use> elements (which are used to export data symbols to SVG). These are the only elements that are generated in large quantities for standard plots, so they are the main bottlenecks that have slowed down 'gridSVG' in the past.
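As a rough illustration of how the text-based approach extends to elements that carry attributes, the sketch below builds the text for several <circle> elements in one vectorised call. This is not the actual 'gridSVG' code: the function name circleText and the choice of attributes are assumptions made for the example. The resulting string has the same shape as the one passed to xmlParse in buildChar above.

```r
# Hypothetical helper (not part of 'gridSVG'): build the text for many
# <circle> elements at once, wrapped in a temporary parent element.
circleText <- function(x, y, r = 1) {
    # sprintf() is vectorised, so this yields one string per point
    circles <- sprintf('<circle cx="%s" cy="%s" r="%s"/>', x, y, r)
    # Collapse to a single string so that it can be parsed in one go
    paste0("<temp>", paste(circles, collapse = ""), "</temp>")
}

txt <- circleText(x = c(10, 20, 30), y = c(5, 15, 25), r = 2)
txt
```

The point of the design is that no XML nodes are created inside a loop; the only per-element work is string formatting.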
The approach of generating XML text and then parsing it was not applied more widely within 'gridSVG' for two reasons: it would have required a great deal of very invasive changes to the source code; and there are some very useful benefits from generating XML internal nodes (via newXMLNode). An example of the latter is the ability to build SVG content out of order. For example, it is easier to create reusable SVG content, like <symbol> elements, that appear at the start of the SVG document and are linked to from <use> elements later in the SVG document.
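The following is a minimal sketch of what building content out of order can look like with internal nodes. It is only an illustration, not code from 'gridSVG': the element layout, the id "dot", and the attribute values are assumptions, and XML namespaces are ignored for simplicity.

```r
library(XML)

# Top-level element plus an (initially empty) <defs> container at the start
svg <- newXMLNode("svg")
defs <- newXMLNode("defs", parent = svg)

# The main content is generated first and refers to a symbol
# that has not been created yet
for (x in c(10, 20, 30)) {
    newXMLNode("use", attrs = c(href = "#dot", x = x, y = 5), parent = svg)
}

# Later, the reusable symbol is added to <defs>, which sits
# before the <use> elements in the document
symbol <- newXMLNode("symbol", attrs = c(id = "dot"), parent = defs)
newXMLNode("circle", attrs = c(r = "2"), parent = symbol)

cat(saveXML(svg))
```

Because each node remains a live object, content can be attached to an earlier part of the document at any time, which is awkward to do when the document is assembled as one long character string.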
This section presents the results of a wider range of timings to evaluate the performance of the new version of 'gridSVG'. The time taken to export a simple scatterplot was measured for both old and new versions of 'gridSVG', with the number of points on the scatterplot increasing from 10^1 to 10^3.2 (with the exponent increasing in steps of 0.2).
The plot on the left below shows the timings for both old and new versions, with the old version in blue and the new version in black. This shows that the gain in performance increases as we export more points. We can also see that the performance of the old version deteriorates exponentially as the number of points increases.
The plot on the right below shows just the timings for the new version of 'gridSVG'. This suggests that, not only is the performance improved overall, but the performance is now deteriorating in a linear fashion (the gain in performance is exponential as the number of points increases). Based on profiling with the 'profvis' package (Chang and Luraschi), it appears that the instability in timings at low numbers of points is due to automatic byte compilation (the first time the 'gridSVG' functions are called) and garbage collection.
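One possible way to examine the order of growth in timings like these is to fit a straight line on a log-log scale: if the time is roughly proportional to n^k, the slope estimates k (about 1 for linear growth). The sketch below uses made-up timings, not the measurements from this report, and the names linearTime, quadTime, and growthSlope are introduced only for illustration.

```r
# Numbers of points used in the report: 10^1 to 10^3.2 in steps of 0.2
n <- 10^seq(1, 3.2, by = 0.2)

# Synthetic timings (NOT measured data): one linear in n, one quadratic in n
linearTime <- 0.0005 * n
quadTime <- 0.000005 * n^2

# The slope of log(time) against log(n) estimates the exponent k
# in time ~ n^k
growthSlope <- function(n, time) {
    unname(coef(lm(log(time) ~ log(n)))[2])
}

growthSlope(n, linearTime)
growthSlope(n, quadTime)
```

With real measurements, the timings would instead come from calls like system.time(grid.export(...)) at each value of n, and noise at small n (such as the byte compilation and garbage collection effects noted above) would make the fitted slope only approximate.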
Version 1.6-0 of the 'gridSVG' package includes internal changes that have resulted in significant speed improvements when exporting plots that contain a large number of individual elements, such as a scatterplot with many data symbols.
The examples and discussion in this document relate to version 1.6-0 of the 'gridSVG' package.
This report was generated within a Docker container (see Resources section below).
Murrell, P. (2017). Speeding up gridSVG. Technical Report 2017-04, University of Auckland.