Version 1: Wednesday 19 September 2018; original publication
Version 2: update pdf.js code (for displaying PDFs)

This report discusses some problems that can arise when attempting to import PostScript images into R, when the PostScript image contains coordinate transformations that skew the image. There is a description of some new features in the 'grImport' package for R that allow these sorts of images to be imported into R successfully.

1. Introduction

In a general-purpose graphics language, there is usually a "current transformation matrix" that transforms all graphical output. For example, the following PostScript code, stored in a file called simple.ps, describes a simple thick line and a text label.

  %!
  newpath
  50 50 moveto
  100 100 lineto
  10 setlinewidth
  stroke
  /Times-Roman findfont 12 scalefont setfont
  75 50 moveto
  (Hello) show
  showpage

If we add a coordinate transformation at the start of that PostScript code, which inverts the y-axis and skews the image by scaling the x-axis by a factor of two, the transformation affects all of the output. In particular, both the text and the line have been horizontally stretched and the text is now upside down. The code below is stored in a file called transform.ps.

  %!
  0 150 translate
  2 -1 scale 
  newpath
  50 50 moveto
  100 100 lineto
  10 setlinewidth
  stroke
  /Times-Roman findfont 12 scalefont setfont
  75 50 moveto
  (Hello) show
  showpage
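The effect of these two transformation operators on locations can be checked with a little matrix arithmetic. The following is a minimal sketch in R, assuming homogeneous 3x3 matrices and ignoring the default device transformation that a PostScript interpreter also applies; the helper names translation and scaling are invented for this illustration.

```r
# Homogeneous 3x3 matrices acting on column vectors (x, y, 1)
translation <- function(tx, ty) {
    matrix(c(1, 0, 0,
             0, 1, 0,
             tx, ty, 1), nrow=3)
}
scaling <- function(sx, sy) diag(c(sx, sy, 1))

# "0 150 translate" followed by "2 -1 scale": the most recent
# transformation is the first one applied to user coordinates
ctm <- translation(0, 150) %*% scaling(2, -1)

# End points of the line in transform.ps, one point per column
userPts <- rbind(x=c(50, 100), y=c(50, 100), 1)
devicePts <- ctm %*% userPts
devicePts[1:2, ]
```

The line that ran from (50, 50) to (100, 100) now runs from (100, 100) to (200, 50): it covers twice the horizontal distance and slopes downwards instead of upwards.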

This sort of coordinate transformation is often required in statistical graphics to represent the scales on a plot. In the plot below, the y-axis is inverted and the scaling factor on the x-axis is different than the scaling factor on the y-axis (both because of unequal scales and because of unequal physical dimensions). However, in a plot, we do not want text to be written upside down just because the y-axis scale is inverted, nor do we want text to be distorted if the scales on the plot axes do not match. Similarly, when we draw a line through data, we do not want the width of the line to be distorted by differences between the scales on the plot axes.

plot(c(1, 20), c(1, 10), type="l", lwd=20, ylim=c(10, 1), lend="butt")
text(15, 2, "Hello", cex=4, family="Times")

This means that, in R graphics, although features like the location of data symbols and the location of text labels on a plot must obey the coordinate system of the scales on the axes, features like the size of text and the thickness of lines completely ignore the plot coordinate system.

This difference in the treatment of coordinate systems creates a problem when it comes to importing general-purpose graphical images, such as PostScript images, into R. The following code uses version 0.9-1 of the 'grImport' package (made available as the package 'grImportOLD' for this report) to import the transform.ps PostScript image into R. In the resulting image, the line is not distorted in R like it was in the original PostScript image. This reflects the fact that R graphics ignores coordinate transformations when calculating the width (or shape) of the line. (The fact that the text is actually drawn correctly is something that we will come back to later.)

library(grImportOLD)
grImportOLD::PostScriptTrace("transform.ps", "transform-0.9-1.xml")
badTransform <- grImportOLD::readPicture("transform-0.9-1.xml")
grImportOLD::grid.picture(badTransform)

This report describes an update of the 'grImport' package that solves this problem, so that PostScript images that contain coordinate transformations like the one in transform.ps can be rendered faithfully in R.

2. Changes to 'grImport'

The strokepath operator

The changes to 'grImport' in version 0.9-2 are based on the strokepath operator in PostScript. This operator takes a path (a collection of lines and curves) and, instead of stroking it (drawing a line along the path), calculates a new path that describes the outline of the area that would have been drawn by stroking the original path.

The following PostScript code illustrates this idea. This code begins by describing exactly the same path as in transform.ps (a thick straight line between two points, distorted by a scaling transformation). However, just after the 10 setlinewidth operation, this code calls strokepath and then 1 setlinewidth. This means that the subsequent stroke, instead of drawing a thick straight line between two points, draws the outline of that thick straight line.

  %!
  0 150 translate
  2 -1 scale 
  newpath
  50 50 moveto
  100 100 lineto
  10 setlinewidth
  strokepath
  1 setlinewidth
  stroke
  showpage

To reproduce the original distorted straight line in R, instead of importing the original thick straight line between two points, we can import the outline of that thick straight line and fill in the outline. In other words, we turn the original stroke operation of a straight line into a fill operation of the outline of the straight line. This works because, although R graphics cannot draw a skewed thick line, it is capable of filling in an outline.
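The geometry behind this idea can be sketched directly in R. The code below is only an illustration, not part of 'grImport': strokeOutline and toDevice are hypothetical helpers, the line is assumed to have butt line ends, and toDevice hard-codes the transformation from transform.ps. It calculates the four corners of the thick line in the untransformed coordinate system and then transforms those corners.

```r
# Corners of the area covered by stroking a straight line from p1 to p2
# with line width lwd (butt line ends), all in user coordinates
strokeOutline <- function(p1, p2, lwd) {
    d <- (p2 - p1) / sqrt(sum((p2 - p1)^2))   # unit vector along the line
    n <- c(-d[2], d[1]) * lwd / 2             # half-width offset, perpendicular
    cbind(p1 + n, p2 + n, p2 - n, p1 - n)
}

# The transformation from transform.ps: (x, y) -> (2x, 150 - y)
toDevice <- function(pts) rbind(2 * pts[1, ], 150 - pts[2, ])

outline <- toDevice(strokeOutline(c(50, 50), c(100, 100), 10))

# Compare a long edge with an end edge of the transformed outline
along <- outline[, 2] - outline[, 1]
across <- outline[, 4] - outline[, 1]
sum(along * across)   # a dot product of zero would mean a right angle
```

The dot product is not zero, so the transformed outline is a parallelogram rather than a rectangle. No single line width can describe that shape, but its four corners can be filled as a polygon (for example, with grid.polygon from the 'grid' package).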

Adding strokepath to PostScriptTrace

The above shows how it is possible, by inserting a strokepath operation, to convert an original PostScript image containing a stroke that R cannot draw into a PostScript image containing a fill that R can draw. However, there remains the problem of how to add that strokepath conversion to an image that we want to import into R.

The answer to that problem goes to the heart of how the 'grImport' package works. The first step in importing a PostScript image into R is a call to the PostScriptTrace function, which converts a PostScript file into an XML file containing graphics operations that R can draw. The PostScriptTrace function works by running its own PostScript code to process the PostScript file that we want to import. The small R function below, pscode, gives some idea of how that works.

This function just generates some PostScript code, given the name of a PostScript file.

pscode <- function(psimage) {
    c("/stroke {",
      "  strokepath",
      "  fill",
      "} def",
      paste0("(", psimage, ") run"))
}

For example, if we are interested in the simple.ps PostScript image, we generate the following PostScript code.

cat(pscode("simple.ps"), sep="\n")

  /stroke {
    strokepath
    fill
  } def
  (simple.ps) run

The last line of the PostScript code, (simple.ps) run, will execute the PostScript code in the file simple.ps. In other words, the PostScript code we have generated is designed to run the PostScript code in the original image. The code before that defines a PostScript operation called stroke. When the stroke operation is encountered, this definition will be run, which will call the strokepath operator and then the fill operator.

The definition of the stroke operator is significant because it overrides the pre-existing stroke operator. This means that, when we run the PostScript code in simple.ps and hit a stroke operation, instead of stroking a path, we run strokepath and fill. This is how we can inject a strokepath operation into an existing PostScript image: by hijacking the standard PostScript stroke operator and replacing it with our own definition.

Determining when to use strokepath

In reality, we do not want to replace a stroke with a fill all the time. We only want to do this when the line has been skewed by unequal scaling factors. The R function below, scalecode, demonstrates how we can do this sort of check, using the same idea as above. We redefine the stroke operator to look at the current PostScript transformation matrix (using the currentmatrix operator) and calculate the amount of x-scaling and y-scaling in force when the stroke operator is called.

scalecode <- function(psimage) {
    c("/str 50 string def",
      "/showscale {",
      "  matrix currentmatrix aload pop pop pop",
      "  dup mul exch dup mul add sqrt 3 1 roll",
      "  dup mul exch dup mul add sqrt",
      "  (xscale=) print dup str cvs print",
      "  (, yscale=) print exch dup str cvs print",
      "} def",
      "/stroke {",
      "  showscale",
      "} def",
      paste0("(", psimage, ") run"))
}
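The stack manipulation in the showscale procedure is compact, so it may help to see the same arithmetic written out in R. This is a sketch under stated assumptions: ctmScales is a hypothetical helper, the matrix is given in the order that PostScript stores it (a, b, c, d, tx, ty), and the example matrices leave out any scaling that the output device contributes.

```r
# Transformation matrix given in PostScript order: c(a, b, c, d, tx, ty)
ctmScales <- function(m) {
    c(xscale=sqrt(m[1]^2 + m[2]^2),
      yscale=sqrt(m[3]^2 + m[4]^2))
}

# The user-space transformation from transform.ps on its own
ctmScales(c(2, 0, 0, -1, 0, 150))

# A rotation by 30 degrees combined with uniform scaling by 3
a <- 30 * pi / 180
ctmScales(3 * c(cos(a), sin(a), -sin(a), cos(a), 0, 0))
```

The first matrix gives unequal scale factors (2 and 1), while the second gives equal scale factors (3 and 3). Taking the square root of a sum of squares means that a rotation on its own does not register as a difference in scaling.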

The function below, showscale, shows how we can run Ghostscript (Mertz, 1997) from R to run the PostScript code that the scalecode function generates.

showscale <- function(psimage) {
    ps <- scalecode(psimage)
    psfile <- tempfile()
    writeLines(ps, psfile)
    cat(system(paste0("gs -dNOPAUSE -dBATCH -q ",
                      "-sDEVICE=ps2write -sOutputFile=tmp.ps ",
                      psfile), intern=TRUE), "\n")
}

The result of this function for the simple.ps PostScript image (without scaling) is shown below.

showscale("simple.ps")

  xscale=10.0, yscale=10.0

The result for the PostScript image transform.ps shows that the x-scaling and y-scaling are unequal. In the 'grImport' function PostScriptTrace, this case is detected and a strokepath plus fill is used instead of the original stroke.

showscale("transform.ps")

  xscale=20.0, yscale=10.0

The code below shows that the new version of 'grImport' correctly renders the PostScript image transform.ps in R (because it fills the outline of the line, rather than stroking the line).

library(grImport)
PostScriptTrace("transform.ps", "transform-0.9-2.xml")
goodTransform <- readPicture("transform-0.9-2.xml")
grid.picture(goodTransform)
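The two ideas in this section, overriding stroke and checking the scale factors, can be combined into a single conditional definition. The function below is only a sketch of that combination, not the code that PostScriptTrace actually runs: the function name conditionalStrokeCode, the PostScript names origstroke and scalediff, and the eps argument are all invented for this illustration, and the real function also has to record the resulting paths.

```r
conditionalStrokeCode <- function(psimage, eps=0.01) {
    c("% keep the built-in operator under another name",
      "/origstroke /stroke load def",
      "/scalediff {",
      "  matrix currentmatrix aload pop pop pop",
      "  dup mul exch dup mul add sqrt 3 1 roll",
      "  dup mul exch dup mul add sqrt",
      "  sub abs",
      "} def",
      "/stroke {",
      paste0("  scalediff ", eps, " gt"),
      "  { strokepath fill }",
      "  { origstroke }",
      "  ifelse",
      "} def",
      paste0("(", psimage, ") run"))
}

cat(conditionalStrokeCode("transform.ps"), sep="\n")
```

Saving the built-in operator with load before redefining stroke is what allows the unskewed case to fall back to an ordinary stroke.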

3. Where do skewed lines come from?

The simple PostScript examples used so far have been extremely basic, just to make the coordinate transformations clear. In practice, we are unlikely to be importing a PostScript image that we have generated by hand. The reason for wanting to import an external image into R is that we have used a different graphics system to generate the image. And the reason for using a different graphics system is that other graphics systems are much better than R at generating certain sorts of images. For example, PGF/TikZ (Tantau, 2013) is better than R for drawing diagrams.

This section briefly shows an example of a simple shape generated using MetaPost (Hobby, 1998). The point of this example is to show a graphics system where it is straightforward to describe a line with a skewed transformation.

The MetaPost code below describes a loop shape, by describing a set of points and asking MetaPost to draw a curve through the points.

  prologues := 3;
  outputtemplate := "%j.ps";
  beginfig(1);
    z0 = (0.5cm,1.5cm); 
    z1 = (2.5cm,2.5cm);
    z2 = (6.5cm,0.5cm); 
    z3 = (3.0cm,1.5cm);
  
    pickup pencircle xscaled 2mm yscaled 4mm rotated 30;
    
    draw z0..z1..z2..z3..z0..cycle;
  endfig;
  end

The most important line of code is the line starting pickup pencircle. This line describes the "pen" that is used to draw the curve and it says that the pen is taller than it is wide and that the pen is rotated anticlockwise. The shape of this pen is shown below (at five times its proper size).

This ability to draw lines with a skewed pen makes it very easy to draw skewed lines in a MetaPost image.
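To get a feel for how much the stroke varies, we can calculate the width of the mark that an elliptical pen leaves as a function of the direction of travel. The following R sketch uses the standard formula for the width of an ellipse measured along a given direction; penWidth is a hypothetical helper, angles are in degrees, and the pen diameters are taken from the pickup pencircle line above.

```r
# Width of the mark left by an elliptical pen, measured at right angles
# to the direction of travel (angles in degrees, diameters in any unit)
penWidth <- function(direction, xdiam, ydiam, rotation=0) {
    # Angle of the stroke normal relative to the pen's own x-axis
    a <- (direction + 90 - rotation) * pi / 180
    sqrt((xdiam * cos(a))^2 + (ydiam * sin(a))^2)
}

# pencircle xscaled 2mm yscaled 4mm rotated 30
directions <- c(0, 30, 75, 120)
round(penWidth(directions, 2, 4, 30), 2)
```

The stroke is 4mm wide when the curve travels at 30 degrees (along the short axis of the pen) and only 2mm wide when it travels at 120 degrees, with intermediate widths in between. A single fixed line width cannot reproduce that variation.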

If we try to import this image with version 0.9-1 of 'grImport', we get a poor result because R is stroking the line with a fixed width (because that is the best that R can do).

grImportOLD::PostScriptTrace("loop.ps", "loop-0.9-1.xml")
badLetterC <- grImportOLD::readPicture("loop-0.9-1.xml")
grImportOLD::grid.picture(badLetterC)

However, with the new version of 'grImport', we can import the MetaPost image correctly (because we detect that the line has been skewed and fill the outline of the stroke rather than trying to stroke it).

PostScriptTrace("loop.ps", "loop-0.9-2.xml")
goodLetterC <- readPicture("loop-0.9-2.xml")
grid.picture(goodLetterC)

4. Importing dashed lines

The test within the PostScriptTrace function that determines whether the x-scaling and y-scaling within a PostScript file are the same is an example of a test for equality between two floating point values. And testing for equality between floating point values is a mistake that people should make at most once in their life.

0.1 == 0.3 - 0.2

  [1] FALSE

As a consequence, the actual test that PostScriptTrace performs is whether the absolute difference between the x-scaling and the y-scaling is very small. The amount "very small" defaults to 0.01, but PostScriptTrace provides a scaleEPS argument so that the user can control what "very small" means.

This turns out to be useful in a slightly different problem that arises when importing PostScript images into R: dashed lines.

In PostScript, the setdash operator is used to set the dash pattern for stroking a path. The code below shows an example, where the dash pattern is 3 units on then 3 units off, with an offset of 1.5 (so we start half-way through the first "on").

  %!
  newpath
  50 50 moveto
  100 100 lineto
  10 setlinewidth
  [3] 1.5 setdash 
  stroke
  showpage
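To see exactly which parts of the line are inked, the dash pattern can be expanded along the length of the path. This is a small sketch in R rather than anything that 'grImport' does; dashSegments is a hypothetical helper, and it assumes the PostScript rule that the elements of the dash array are used cyclically, alternating between "on" and "off".

```r
# Start and end of each "on" piece along a path of length len
dashSegments <- function(len, pattern, offset=0) {
    # An odd-length pattern is cycled, so doubling it gives explicit on/off pairs
    if (length(pattern) %% 2 == 1) pattern <- rep(pattern, 2)
    period <- sum(pattern)
    # Enough whole periods to cover the offset plus the path length
    n <- ceiling((offset + len) / period) + 1
    breaks <- c(0, cumsum(rep(pattern, n))) - offset
    starts <- breaks[-length(breaks)]
    ends <- breaks[-1]
    on <- rep(rep(c(TRUE, FALSE), length.out=length(pattern)), n)
    # Keep the "on" pieces and clip them to the path
    keep <- on & ends > 0 & starts < len
    data.frame(start=pmax(starts[keep], 0), end=pmin(ends[keep], len))
}

# The line from (50, 50) to (100, 100), dash array [3], offset 1.5
segs <- dashSegments(sqrt(2) * 50, 3, 1.5)
head(segs, 3)
```

The first "on" piece runs from 0 to 1.5 (half a dash, because of the offset), and the following pieces run from 4.5 to 7.5, from 10.5 to 13.5, and so on.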

The line width in this example is set to 10, so the dashes are shorter than the line is wide and, unfortunately, this is something that R graphics cannot emulate. In R graphics, the lty graphics parameter can be specified as an even number of (up to eight) on-off values, but the smallest value is 1, which corresponds to the width of the line. Furthermore, R graphics has no concept of a "dash offset".

This means that 'grImport' does a poor job of importing the PostScript dashed line from above.

PostScriptTrace("dash.ps", "dash-bad.xml")
badDash <- readPicture("dash-bad.xml")
grid.picture(badDash)

However, we can select our own value of scaleEPS, and a PostScript stroke is converted to a fill whenever the absolute difference between x-scaling and y-scaling is greater than scaleEPS. If we set scaleEPS to be negative, every stroke will be converted to a fill, and 'grImport' will reproduce the dashed line perfectly (as shown below).

library(grImport)
PostScriptTrace("dash.ps", "dash-good.xml", scaleEPS=-1)
goodDash <- readPicture("dash-good.xml")
grid.picture(goodDash)
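The decision rule described in this section is simple enough to mimic in a few lines of R. The function below is a sketch of the rule as it is described above, not the implementation inside PostScriptTrace (which performs the test in PostScript); strokeToFill is a hypothetical name and the scale values are made up for illustration.

```r
# TRUE when a stroke should be replaced by a filled outline
strokeToFill <- function(xscale, yscale, scaleEPS) {
    abs(xscale - yscale) > scaleEPS
}

# Equal scales, clearly unequal scales, and scales that differ only
# because of floating point rounding
xscale <- c(10, 20, 100 * (0.3 - 0.2))
yscale <- c(10, 10, 10)

xscale == yscale                      # exact comparison
strokeToFill(xscale, yscale, 0.01)    # small positive tolerance
strokeToFill(xscale, yscale, -1)      # negative tolerance
```

The exact comparison treats the third pair as unequal, whereas the small positive tolerance converts only the second pair to a fill. With a negative tolerance, every pair is converted because an absolute difference can never be less than zero.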

5. Integrating the imported image with an R plot

All previous examples in this report have only demonstrated that it is possible to replicate an external PostScript image within R. This section describes a slightly more realistic scenario that incorporates an imported PostScript image within an R plot.

The following code produces a 'lattice' (Sarkar, 2008) plot with a small customisation: there is a text label and a line (with an arrow) from the label to a point in the plot.

library(lattice)
library(grid)
rx4 <- mtcars[1, ]
xyplot(mpg ~ disp, mtcars, pch=16,
       panel=function(...) {
           grid.text(rownames(rx4), 400, 27, just="left",
                     default.units="native")
           grid.segments(400, 27, rx4$disp, rx4$mpg,
                         default.units="native",
                         arrow=arrow())
           panel.xyplot(...)
       })

The MetaPost system, with its 'mparrows' module, has a much wider range of line styles and arrowheads than R (even taking into account packages like 'shape' (Soetaert, 2018) and 'DiagrammeR' (Iannone, 2018)). Rather than attempt to rewrite facilities similar to MetaPost's within R, we can just make use of MetaPost to draw a line and then import it into R.

The following code defines a function that generates, imports, and draws a MetaPost line between two points. It first generates MetaPost code to draw a line, based on a start point, an end point, and a single angle that sets the direction of the line at both ends. This code also draws a rectangle of a specified width and height. The MetaPost code is run to produce a PostScript file and the PostScript file is imported into R. Within R, the rectangle is removed (that was just to scale the image correctly) and then the remaining line (plus arrow) is drawn.

mpLine <- function(x1, y1, x2, y2, angle, w, h) {
    mpcode <- c("prologues := 3;",
                'outputtemplate := "%j.ps";',
                "input mparrows;",
                "setarrows(barbed);",
                "barbedarrowindent := .6;",
                "ahlength := 5mm;",
                "beginfig(1);",
                paste0("draw (0,0)-", "-(",
                       w, "cm,0)-", "-(",
                       w, "cm,", h, "cm)-", "-(",
                       "0,", h, "cm)-", "-cycle;"),
                paste0("z0 = (", x1, "cm,", y1, "cm);"),
                paste0("z1 = (", x2, "cm,", y2, "cm);"),
                paste0("drawarrow z0{dir ", angle, "}..z1{dir ", angle, "};"),
                paste0("pickup pencircle xscaled 5pt yscaled .5pt rotated ",
                       angle, ";"),
                paste0("draw z0{dir ", angle, "}..z1{dir ", angle, "};"),
                "endfig;",
                "end")
    mpfile <- "line.mp"
    psfile <- "line.ps"
    xmlfile <- "line.xml"
    writeLines(mpcode, mpfile)
    system(paste0("mpost ", mpfile))
    PostScriptTrace(psfile, xmlfile)
    pic <- readPicture(xmlfile)
    line <- pic[-1]
    line@summary@xscale <- pic@summary@xscale
    line@summary@yscale <- pic@summary@yscale
    grid.picture(line, exp=0)
}

The image below shows an example of the MetaPost image that this function creates (complete with its border rectangle).

The next code makes use of that function to add a MetaPost line to the 'lattice' plot. Some calculations are required to express the line start point and end point, and the overall size of the MetaPost image, in terms of centimetres (all within the 'grid' viewport that corresponds to the 'lattice' plot panel).

xyplot(mpg ~ disp, mtcars,
       panel=function(...) {
           panel.xyplot(...)
           textLeft <- unit(400, "native")
           textY <- unit(27, "native")
           grid.text(rownames(rx4), textLeft, textY, just="left")
           textLeftCM <- convertX(textLeft - unit(1, "mm"), "cm",
                                  valueOnly=TRUE)
           textYCM <- convertY(textY, "cm", valueOnly=TRUE)
           pointRightCM <- convertX(unit(mtcars$disp[1], "native") +
                                    unit(2, "mm"), "cm",
                                    valueOnly=TRUE)
           pointYCM <- convertY(unit(mtcars$mpg[1], "native") -
                                unit(2, "mm"), "cm",
                                valueOnly=TRUE)
           panelWidthCM <- convertWidth(unit(1, "npc"), "cm",
                                        valueOnly=TRUE)
           panelHeightCM <- convertHeight(unit(1, "npc"), "cm",
                                          valueOnly=TRUE)
           mpLine(textLeftCM, textYCM, pointRightCM, pointYCM,
                  135, panelWidthCM, panelHeightCM)
       })
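The unit conversions in the panel function can be tried out in isolation. The following sketch sets up a 'grid' viewport with made-up dimensions and scales (10cm by 5cm, with scales loosely resembling those of the plot) on a throwaway PDF device, and then converts a location on the scales into centimetres; none of the numbers come from the 'lattice' plot itself.

```r
library(grid)

pdf(NULL)   # a throwaway graphics device, so that nothing is written to disk
# A viewport 10cm wide and 5cm high, with scales like those of a plot panel
pushViewport(viewport(width=unit(10, "cm"), height=unit(5, "cm"),
                      xscale=c(0, 500), yscale=c(10, 35)))
# A location on the scales, in centimetres from the bottom-left corner
xCM <- convertX(unit(400, "native"), "cm", valueOnly=TRUE)
yCM <- convertY(unit(27, "native"), "cm", valueOnly=TRUE)
# A sum or difference of different units is resolved in the same way
xOffsetCM <- convertX(unit(400, "native") - unit(1, "mm"), "cm",
                      valueOnly=TRUE)
popViewport()
invisible(dev.off())
c(xCM, yCM, xOffsetCM)
```

In this viewport, 400 on the x-scale is 8cm from the left edge, 27 on the y-scale is 3.4cm from the bottom edge, and subtracting 1mm moves the x-location to 7.9cm. These conversions only make sense within the viewport that is current when they are performed, which is why the calculations in the panel function happen inside the panel.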

6. Discussion

Why not convert everything to a path?

The 'grImport' package successfully imports skewed lines by converting them to filled paths, but this conversion is only performed where necessary. An alternative would be to convert all shapes into paths (similar to what happens in SVG output with Cairo graphics; Packard et al., 2018), but that is not done in 'grImport' because rendering of paths is not as good as rendering of the original shapes in some cases. For example, rendering a very thin line as a path can lead to either a line that is too heavy or no line at all because, on some graphics devices at least, a line will be anti-aliased, but a path will not. The code below draws a series of thin, horizontal filled rectangles (on the left) and a series of line segments (on the right), with the "width" of the result slowly decreasing. The "widths" of the filled rectangles decrease in quantum steps (and the final one disappears), while the line segments decrease more smoothly due to anti-aliasing.

grid.rect(x=.2, width=.3, y=1:5/6, height=unit(1:5, "pt"),
          gp=gpar(col=NA, fill="black"))
grid.segments(.6, 1:5/6, .9, 1:5/6,
              gp=gpar(lwd=1:5, lineend="butt"))

Similarly, text that is rendered as filled paths rather than using proper font rendering can produce a worse result because font rendering uses techniques such as "hinting" to produce a good result and filling a path does not use these techniques.

On the other hand, importing text is tricky because it is not necessarily easy or possible to find the font that was used to draw the original image. The reason the upside-down text was imported correctly right back at the beginning of this report is because, by default, 'grImport' converts text in a PostScript image to a filled path.

Importing images versus including images

Being able to import images into R is different from being able to include an image. Importing an image means converting it into the language and data structures of the parent system. By contrast, including an image means keeping the image as a single opaque object (e.g., including a PostScript image within a LaTeX document). Importing an image is useful because it exposes the image to the facilities of the container language. For example, having imported an image into R, we can use R subsetting to extract just some portion of the original image (as we did to remove the rectangle border from a MetaPost image in the Section Integrating the imported image with an R plot).

Loss of information

When we transfer an image between higher-level languages, like MetaPost and R, going through a lower-level intermediary, like PostScript, means that we lose some of the information about the original image.

This report has described one situation where the higher-level system (R in this case) loses information when importing the lower-level intermediary (PostScript): a skewed stroke in MetaPost, which remains a skewed stroke in PostScript, becomes a filled path in R.

Loss of information can also occur in the transition from higher-level language to lower-level intermediary. For example, in MetaPost it is possible to describe a line that is drawn with a pen that varies along the length of the line (essentially producing a variable-width line). This cannot be converted directly to a line stroke in PostScript, so the PostScript that is generated from MetaPost produces a filled path, rather than a line. The MetaPost code below draws a curve between three points, with a different pen width and angle specified at each point, but this produces PostScript that is a filled path rather than a stroked line (as can be seen in the final line of the PostScript code that is shown below the image).

  prologues := 3;
  outputtemplate := "%j.ps";
  beginfig(1);
  z1 = (0, 0);
  z2 = (50, 50);
  z3 = (100, 0);
  penpos1(5, 0);
  penpos2(10, 90);
  penpos3(20, 0);
  penstroke z1e..z2e..z3e;
  endfig;
  end

  %!PS-Adobe-3.0 EPSF-3.0
  %%BoundingBox: -3 0 111 56 
  %%HiResBoundingBox: -2.5 0 110.00775 55.29723 
  %%Creator: MetaPost 1.999
  %%CreationDate: 2021.01.26:2337
  %%Pages: 1
  %%DocumentResources: procset mpost-minimal
  %%DocumentSuppliedResources: procset mpost-minimal
  %%EndComments
  %%BeginProlog
  %%BeginResource: procset mpost-minimal
  /bd{bind def}bind def/fshow {exch findfont exch scalefont setfont show}bd
  /fcp{findfont dup length dict begin{1 index/FID ne{def}{pop pop}ifelse}forall}bd
  /fmc{FontMatrix dup length array copy dup dup}bd/fmd{/FontMatrix exch def}bd
  /Amul{4 -1 roll exch mul 1000 div}bd/ExtendFont{fmc 0 get Amul 0 exch put fmd}bd
  /ScaleFont{dup fmc 0 get Amul 0 exch put dup dup 3 get Amul 3 exch put fmd}bd
  /SlantFont{fmc 2 get dup 0 eq{pop 1}if Amul FontMatrix 0 get mul 2 exch put fmd}bd
  %%EndResource
  %%EndProlog
  %%BeginSetup
  %%EndSetup
  %%Page: 1 1
   0 0 0 setrgbcolor
  newpath -2.5 0 moveto
  -1.61581 27.50935 22.67906 48.33354 50 45 curveto
  72.76758 42.22205 89.9105 22.93625 90 0 curveto
  110 0 lineto
  110.54294 32.6178 82.44759 58.37186 50 55 curveto
  22.11862 52.10265 1.30803 28.00616 2.5 0 curveto
   closepath fill
  showpage
  %%EOF

This is also how the 'vwline' package (Murrell, 2017) for drawing variable-width lines in R works; it generates paths to fill rather than lines to stroke.

Where possible, it is therefore better to be able to convert more directly between higher-level languages. This is what, for example, the 'tikzDevice' package (Sharpsteen and Bracken, 2018) does by converting R graphics output into PGF/TikZ graphics. Another example is the conversion that the 'gridGraphics' package (Murrell and Wen, 2018) performs between the 'graphics' system and the 'grid' system (entirely within R).

The special nature of PostScript

Importing PostScript is a special case because the PostScript language is Turing-complete, which means that we can write complex programs in PostScript. The 'grImport' package takes advantage of this by writing its own PostScript code to process PostScript images.

Although a more modern graphics language like SVG has more advanced graphics features than PostScript (e.g., filters, semi-transparency, and animation), there is no facility in the SVG language itself that can assist with, for example, converting an SVG line into an SVG path.

This means that, although the 'grImport2' package (Potter, 2018), which can import (a subset of) SVG images into R, is useful for importing images that have features that cannot appear in a PostScript image, we cannot import SVG images that contain skewed lines using 'grImport2' because the transformation from stroked line to filled path that 'grImport' can achieve with PostScript images is not available to 'grImport2'.

Having your cake and eating it

The original problem with importing skewed lines into R is based on the fact that R graphics does not allow coordinate transformations to impact on text and line styles. However, although this restriction within R graphics is justified, it is not absolutely necessary. The PGF/TikZ graphics system has identified the same problem, but has a more sophisticated solution, which involves maintaining more than one set of coordinate transformations. In PGF/TikZ, there is a "canvas" transformation that acts like PostScript, affecting all graphics (including skewed lines if scaling transformations are unequal), and a "coordinate" transformation that only affects locations, like in R graphics. Managing the two transformations coherently requires care, but this does allow a greater expressiveness in PGF/TikZ graphics.

Another example of a sophisticated solution to this problem is the SVG 2 Candidate Recommendation (Schulze et al., 2018), which includes a 'vector-effect' attribute. This allows different parts of the SVG transformation matrix to be ignored. For example, it is possible to specify that part of an SVG image will ignore the current rotation and/or scaling within an image.

7. Summary

R graphics, unlike general-purpose graphics languages such as PostScript, only applies coordinate system transformations selectively. This makes it difficult to import some PostScript images. Version 0.9-2 of the 'grImport' package at least partially solves this problem by converting lines to filled paths when the coordinate transformation involves unequal scaling in the x- and y-dimensions. As a bonus, this also provides a way to successfully import PostScript images that contain stroked paths with a fancy dash pattern. One example application of this new facility is the import into R of images that were produced by the MetaPost language.

8. Technical requirements

The examples and discussion in this document relate to version 0.9-2 of the 'grImport' package.

This report was generated within a Docker container (see Resources section below).

9. Resources

How to cite this document

Murrell, P. (2018). "Importing General-Purpose Graphics in R." Technical Report 2018-09, Department of Statistics, The University of Auckland.
