by Paul Murrell http://orcid.org/0000-0002-3224-8858
Version 1:
This document by Paul Murrell is licensed under a Creative Commons Attribution 4.0 International License.
This report describes the 'gggrid' package, which provides a convenient interface for making use of raw 'grid' functions in combination with 'ggplot2'.
The 'gggrid' package provides two functions, grid_panel() and grid_group(), both of which create a new layer in a 'ggplot2' plot.
The first argument to both functions is a 'grid' grob, or a function that generates a grob, and that grob is added to the plot region of the 'ggplot2' plot.
For example, the following code adds a rectangle filled with a semitransparent radial gradient to a 'ggplot2' plot.
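A minimal sketch of what such code might look like is shown below. The gradient colours are an assumption (the text above does not specify them), radialGradient() requires R version 4.1.0 or later, and the drawing step is guarded so that it only runs if 'gggrid' is installed.

```r
library(grid)

## Semitransparent radial gradient; the colours are an assumption
grad <- radialGradient(c(rgb(1, 1, 1, 0.5), rgb(0, 0, 1, 0.5)))
## A rectangle that fills whatever viewport it is drawn within
gradRect <- rectGrob(gp=gpar(fill=grad, col=NA))

## Drawing step; skipped if 'gggrid' is not installed
if (requireNamespace("gggrid", quietly=TRUE)) {
    library(ggplot2)
    library(gggrid)
    print(ggplot(mtcars) +
          geom_point(aes(disp, mpg)) +
          grid_panel(gradRect))
}
```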
The 'grid' package for R (R Core Team, 2019) provides low-level graphics functions for arranging and drawing basic shapes. One useful feature is the ability to specify the location of drawing using a combination of coordinate systems. For example, the following code describes a text label with its top-right corner exactly 5mm in from the top-right corner of wherever it is drawn.
The y value unit(1, "npc") - unit(5, "mm") is how we can say "5mm down from the top" in 'grid'.
In the following code, we draw a rectangle and then add the text label. This image is embellished with red lines to show the boundary of the text and the offset of the text from the top-right corner of the image.
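The label definition and the drawing step described above can be sketched as follows. The variable name label is my choice for illustration and the red embellishment lines are omitted.

```r
library(grid)

## Text label with its top-right corner 5mm in from the top-right
## corner of whatever viewport it is drawn within
label <- textGrob("Label",
                  x=unit(1, "npc") - unit(5, "mm"),
                  y=unit(1, "npc") - unit(5, "mm"),
                  just=c("right", "top"))

grid.newpage()
grid.rect()       ## rectangle around the whole page
grid.draw(label)  ## label positioned relative to the page
```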
The aim of this report is to explore how we can access raw 'grid' features like this in combination with data visualisations drawn by the 'ggplot2' package (Wickham, 2016).
We will spend some time in the remainder of this section establishing why there is a problem to solve, then the following section will describe a solution: The 'gggrid' package.
Suppose we want to add a text label a precise distance in from the top-right corner of a 'ggplot2' plot. For example, in the plot below, the text "Label" is exactly 5mm in from the top-right corner of the plot region.
This is not an easy result to produce in 'ggplot2' with standard geoms.
If we use geom_text(), we must position the text relative to the scales on the plot. For example, we could easily place the text at the y-location 30, but calculating "5mm from the top of the plot region" in terms of the y-axis scale is not at all straightforward.
Even the non-standard annotate() function has the same problem; the position of the text still has to be in terms of the scales on the plot.
It would be nice to be able to draw in 'ggplot2' relative to coordinate systems other than the data coordinate system within the plot region.
Yes, this could be done with annotation_custom(), but we will conveniently ignore that fact until later when we can explain the problems with that approach.
The 'lattice' package (Sarkar, 2008) is another package that, like 'ggplot2', uses 'grid' to draw its high-level plots. Can we produce a simple label annotation in 'lattice'?
In 'lattice', we can customise a plot by defining a "panel function", which is a function that gets called to draw the contents of the plot region. We are allowed to call any code within the panel function, including 'grid' code, so the label annotation is straightforward in 'lattice'.
In the code below, we draw a 'lattice' plot with a panel function that draws the text label we defined earlier.
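A sketch of such a panel function is shown below. The label is redefined here so that the example stands on its own; the choice of the mtcars variables is an assumption made to match the 'ggplot2' examples elsewhere in this report.

```r
library(grid)
library(lattice)

label <- textGrob("Label",
                  x=unit(1, "npc") - unit(5, "mm"),
                  y=unit(1, "npc") - unit(5, "mm"),
                  just=c("right", "top"))

## The panel function draws the usual points and then
## calls raw 'grid' code to add the label
p <- xyplot(mpg ~ disp, mtcars,
            panel=function(...) {
                panel.xyplot(...)
                grid.draw(label)
            })
print(p)
```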
It would be nice to have the equivalent of a 'lattice' "panel function" in 'ggplot2'.
One nice thing about a 'lattice' panel function is that it provides access to the useful work that 'lattice' does, including splitting the data into groups and panels and setting up useful coordinate systems. The panel function is run within the context of the plot region, which is a 'grid' viewport with the appropriate scales, and the panel function is provided with the data that is to be drawn within the plot region.
For example, the following code adds a label at a fixed position with a line to one of the data points in the 'lattice' plot. This makes use of a combination of the data values that are passed to the panel function, the coordinate system that is in place when the panel function is run, and absolute 'grid' positioning.
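One way this might be written is sketched below. Which data point the line leads to is an assumption (here, the point with the largest y-value), as is the colour of the line. The start of the line is given in absolute terms (relative to the label) and the end of the line is given in "native" units, which refer to the scales of the panel viewport.

```r
library(grid)
library(lattice)

label <- textGrob("Label",
                  x=unit(1, "npc") - unit(5, "mm"),
                  y=unit(1, "npc") - unit(5, "mm"),
                  just=c("right", "top"))

p2 <- xyplot(mpg ~ disp, mtcars,
             panel=function(x, y, ...) {
                 panel.xyplot(x, y, ...)
                 grid.draw(label)
                 ## Data point to connect to (an assumption)
                 i <- which.max(y)
                 ## From the left edge (vertical middle) of the label
                 ## to the data point on the panel scales
                 grid.segments(unit(1, "npc") - unit(5, "mm") -
                               grobWidth(label),
                               unit(1, "npc") - unit(5, "mm") -
                               0.5*grobHeight(label),
                               unit(x[i], "native"),
                               unit(y[i], "native"),
                               gp=gpar(col="grey40"))
             })
print(p2)
```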
It would be nice if our 'ggplot2' panel function was able to take advantage of the useful work that 'ggplot2' does, including splitting up the data and setting up useful coordinate systems.
One of the reasons why 'ggplot2' is successful is that it offers a clear paradigm or philosophy for how to construct a data visualisation, based on Leland Wilkinson's Grammar of Graphics (Wilkinson, 2005). Ideas like "geoms", "aesthetics", "stats", and "coords" are central to this paradigm, but 'grid' concepts like "units" and "viewports" are not. The 'grid' concepts may also be hidden away in 'ggplot2' because they are perceived to be too awkward or complex.
However, if we are already familiar with 'grid', rigid adherence to the 'ggplot2' paradigm can sometimes mean that some things are harder or more awkward than necessary.
It would be nice to have full access from 'ggplot2' to raw 'grid', red in tooth and claw.
For those intimately familiar with 'grid', there is a post-hoc way to work with a 'ggplot2' plot. The following code demonstrates the approach: having drawn a 'ggplot2' plot, we call the 'grid' function grid.force() to get access to all of the 'grid' grobs and viewports that 'ggplot2' created, then we can navigate to the 'grid' viewport that 'ggplot2' created for the plot region using downViewport(), and then we can draw the text label that we defined earlier within that context.
It could reasonably be argued that this approach requires deeper knowledge of 'grid' and also of 'ggplot2' than most people have or even would like to have. The grid.force() function and the viewport name "panel.7-5-7-5" are not very self-explanatory.
Furthermore, the viewport that 'ggplot2' has created does not have scales relevant to the data, so we cannot add data-based drawing. The scales on the viewport that 'ggplot2' created are just 0 to 1, not the scales that the axes show.
This means that, with this post-hoc approach, although we have full access to 'grid', we cannot, for example, draw shapes relative to the plot scales, like we did in the second 'lattice' panel function example above.
It would be nice to have access to 'grid' and access to the 'ggplot2' context at the same time.
The 'ggplot2' package does actually allow us to specify raw 'grid' grobs in some specific cases. For example, the annotation_custom() function allows any 'grid' grob to be added to the plot. However, this access to 'grid' is limited. For example, the single grob that is passed to annotation_custom() is drawn in every panel of a facetted plot, it is positioned within a region that is defined in terms of the plot scales, and it has no access to the aesthetic mappings for the plot.
The following code shows how our simple label could be added using annotation_custom().
That is simple enough, but what if we use facetting? All we can get is the same grob (in the same place) in every panel.
Furthermore, as with the post-hoc approach, we do not have access to the 'ggplot2' coordinate system, so we cannot draw a grob relative to the axis scales.
Did I mention that it would be nice to have access to 'grid' and access to the 'ggplot2' context at the same time? I want it all!
There are two functions that I know of that provide a variation on ggplot2::annotation_custom(): egg::geom_custom() and ggpmisc::geom_grob(). The reasons why these are still not what I want are left to the Discussion.
Although 'ggplot2' uses basic 'grid' shapes to draw its Geoms, there are some 'grid' shapes, or 'grid'-based packages, that are not accessible from 'ggplot2'. For example, the 'vwline' package (Murrell, 2019b) draws variable-width lines and the 'gridGeometry' package (Murrell, 2019a) provides constructive geometry operations on 'grid' grobs.
Users are currently dependent on a developer creating a Geom interface in order to use the full range of 'grid'-based shapes in a 'ggplot2' plot.
It would be nice to have instant access to ALL 'grid' shapes within a 'ggplot2' plot.
I did find some mentions online of a github package 'ggvwline' by Houyun Huang, but the links were all stale.
The idea behind the 'gggrid' package is to allow the user to compose a data visualisation from a combination of 'ggplot2' output and 'grid' output. The user should be able to make use of the advantages of 'ggplot2' where that makes sense, e.g., to describe the essential structure of a complex image at a high level, and at the same time make use of the advantages of 'grid' where that makes sense, e.g., to specify precise locations relative to a range of coordinate systems.
There are two functions in the 'gggrid' package: grid_panel() and grid_group(). Both functions add a new layer to a 'ggplot2' plot, but they deliberately do not follow the typical naming scheme of geom_* or stat_* because these functions are not trying to strictly adhere to the 'ggplot2' paradigm; they add raw 'grid' drawing to a 'ggplot2' plot.
In the simplest case, we call grid_panel() with a 'grid' grob as the only argument. For example, the following code produces our precisely positioned label from the very beginning of the report.
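A sketch of this simplest case is given below. The label is redefined so that the example is self-contained, the plot itself (points of mpg against disp from mtcars) is an assumption, and the drawing step only runs if 'gggrid' is installed.

```r
library(grid)

label <- textGrob("Label",
                  x=unit(1, "npc") - unit(5, "mm"),
                  y=unit(1, "npc") - unit(5, "mm"),
                  just=c("right", "top"))

## Drawing step; skipped if 'gggrid' is not installed
if (requireNamespace("gggrid", quietly=TRUE)) {
    library(ggplot2)
    library(gggrid)
    ## The fixed grob is drawn within the 'ggplot2' plot region
    print(ggplot(mtcars) +
          geom_point(aes(disp, mpg)) +
          grid_panel(label))
}
```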
The 'grid' grob can be more complex than a simple shape. For example, the following code adds a gTree that draws a combination of a rectangle and a text label, both within a new 'grid' viewport that is pushed within the current 'ggplot2' plot region.
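The following sketch shows one way such a gTree could be built. The name boxedLabel and the size of the viewport (15mm by 7mm) are assumptions for illustration; the viewport in the vp argument is pushed within the plot region when the gTree is drawn, and both children are drawn relative to that viewport.

```r
library(grid)

## A rectangle and a text label that share a small viewport
## positioned 5mm in from the top-right corner
boxedLabel <- gTree(children=gList(rectGrob(gp=gpar(fill="white")),
                                   textGrob("Label")),
                    vp=viewport(x=unit(1, "npc") - unit(5, "mm"),
                                y=unit(1, "npc") - unit(5, "mm"),
                                width=unit(15, "mm"),
                                height=unit(7, "mm"),
                                just=c("right", "top")))

## Drawing step; skipped if 'gggrid' is not installed
if (requireNamespace("gggrid", quietly=TRUE)) {
    library(ggplot2)
    library(gggrid)
    print(ggplot(mtcars) +
          geom_point(aes(disp, mpg)) +
          grid_panel(boxedLabel))
}
```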
It is important to note that we already have some access to the 'ggplot2' context, even when we only provide a fixed grob to grid_panel(); the grob is drawn within the 'grid' viewport that represents the 'ggplot2' plot region. For example, the following code simply adds an empty 'grid' rectangle (with a thick border). This is by default the same size as the viewport it is drawn within, so we see a rectangle around the 'ggplot2' plot region.
As we saw earlier, the scales on the 'grid' viewport that 'ggplot2' creates for the plot region do not reflect the axis scales, so when we just call grid_panel() with a grob as the first argument, we do not have full access to the 'ggplot2' context for the plot region. However, if we provide a function as the first argument to grid_panel(), that function is passed the data and the coords (the transformed data) for the plot, which provides us with enough information to start writing "panel functions" in the sense of the 'lattice' package.
As a simple example, the following code calls grid_panel() with a function that generates a 'grid' grob based on the largest and smallest data values. The coords are values that have already been transformed to the plot scales, so we can just draw a 'grid' rectangle around the minimum and maximum of those values.
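A sketch of a grob-generating function along these lines is shown below. It assumes that the x and y columns of coords are proportions of the plot region (so they can be used directly as "npc" locations) and that the plot maps disp and mpg to x and y; the drawing step only runs if 'gggrid' is installed.

```r
library(grid)

## Rectangle around the extent of the (transformed) data
rectFun <- function(data, coords) {
    left <- min(coords$x)
    bottom <- min(coords$y)
    width <- diff(range(coords$x))
    height <- diff(range(coords$y))
    rectGrob(left, bottom, width, height,
             just=c("left", "bottom"),
             gp=gpar(lwd=3, fill=NA))
}

## Drawing step; skipped if 'gggrid' is not installed
if (requireNamespace("gggrid", quietly=TRUE)) {
    library(ggplot2)
    library(gggrid)
    print(ggplot(mtcars, aes(x=disp, y=mpg)) +
          geom_point() +
          grid_panel(rectFun))
}
```

Because rectFun() is just a function of two data frames, it can also be tried out on its own, for example with rectFun(NULL, data.frame(x=c(.1, .6), y=c(.2, .9))).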
It is important to note that the column names for the data and coords that are passed to rectFun() in the above example come from the aesthetic mappings in the 'ggplot2' plot. For example, in this case, we have mapped the disp column of the mtcars data set to the x aesthetic, so both data and coords have a column named x.
Both grid_panel() and grid_group() provide a debug argument that can be a function and this can be used to inspect the values that are being passed to the grob function.
The following code provides a slightly more complex example that demonstrates the combination of 'ggplot2' context (the data values) and raw 'grid'. In this case, we are drawing a "rug" of short lines at the right edge of the plot. On one hand, the y-locations of the lines are based on the 'ggplot2' data, but on the other hand, the length of the lines (2mm) is specified using 'grid' units.
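A sketch of such a rug function follows. As before, it assumes that coords$y holds proportions of the plot region, and the plot of mpg against disp is an assumption; the drawing step only runs if 'gggrid' is installed.

```r
library(grid)

## Short horizontal lines at the right edge of the plot region:
## y-locations come from the data, the 2mm length is absolute
rug <- function(data, coords) {
    segmentsGrob(unit(1, "npc"), coords$y,
                 unit(1, "npc") - unit(2, "mm"), coords$y)
}

## Drawing step; skipped if 'gggrid' is not installed
if (requireNamespace("gggrid", quietly=TRUE)) {
    library(ggplot2)
    library(gggrid)
    print(ggplot(mtcars, aes(x=disp, y=mpg)) +
          geom_point() +
          grid_panel(rug))
}
```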
Yes, there is a geom_rug(), but this is a nice practical and easy-to-understand example to start with. Later examples will go to some places that 'ggplot2' cannot currently go.
Another advantage of specifying a grob function is that it is evaluated for each panel. The following code demonstrates this by adding facetting (but just reusing the rug() function); we get a (different) rug added to each panel.
This rule also holds for the grid_group() function. If there are distinct groups being drawn in the plot, grid_group() will call the grob function for each group. For example, the following code produces a 'ggplot2' plot with two groups of points differentiated by colour. We call grid_group() with a new "rug" grob function that colours the short lines for each group based on the colour used for the data.
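A sketch of a coloured rug is shown below. The function name rugColour and the grouping variable (am, converted to a factor) are assumptions; the sketch also assumes that the data passed to the function has a colour column containing the colour that the 'ggplot2' colour scale assigned to the group.

```r
library(grid)

## As for the plain rug, but coloured using the colour
## that 'ggplot2' has assigned to the group
rugColour <- function(data, coords) {
    segmentsGrob(unit(1, "npc"), coords$y,
                 unit(1, "npc") - unit(2, "mm"), coords$y,
                 gp=gpar(col=data$colour[1]))
}

## Drawing step; skipped if 'gggrid' is not installed
if (requireNamespace("gggrid", quietly=TRUE)) {
    library(ggplot2)
    library(gggrid)
    print(ggplot(mtcars, aes(x=disp, y=mpg, colour=factor(am))) +
          geom_point() +
          grid_group(rugColour))
}
```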
In addition to data values and transformed data values, the grid_panel() and grid_group() functions have access to any variables that are calculated by 'ggplot2' "stats". For example, the following code shows the variables that are available if we use stat_smooth.
This allows us to add 'grid' drawing based on "stat" output, as shown below. In this case, we add a label parallel to the smooth line and we calculate the angle of the text using the x and y values that come from the "stat" smooth.
We can also use aesthetic mappings to pass additional information to grid_panel(). For example, in the following code we make sure that the vehicle names are included in the data that are passed to the grob-generating function, nameVehicle(), by specifying aes(label=name). The nameVehicle() function uses this information, along with the default data and coords values, to label the most efficient vehicle. We use facetting to emphasise that the calculations occur for each panel.
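A sketch of what nameVehicle() might look like is given below. Several details are assumptions: the vehicle names are taken from the row names of mtcars and stored in a copy of the data called cars, "most efficient" is taken to mean the largest mpg value, the label is offset 2mm to the right of the data point, the facetting variable is am, and the aes() mapping is passed as the second argument to grid_panel().

```r
library(grid)

## Copy of the data with vehicle names as a column
cars <- mtcars
cars$name <- rownames(mtcars)

## Label the point with the largest y-value, using the
## 'label' column that comes from aes(label=name)
nameVehicle <- function(data, coords) {
    i <- which.max(data$y)
    textGrob(as.character(data$label[i]),
             unit(coords$x[i], "npc") + unit(2, "mm"),
             unit(coords$y[i], "npc"),
             just="left")
}

## Drawing step; skipped if 'gggrid' is not installed
if (requireNamespace("gggrid", quietly=TRUE)) {
    library(ggplot2)
    library(gggrid)
    print(ggplot(cars, aes(x=disp, y=mpg)) +
          geom_point() +
          grid_panel(nameVehicle, aes(label=name)) +
          facet_wrap(~ am))
}
```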
This example also highlights the fact that 'gggrid' is not playing by all of the 'ggplot2' rules because it generates a warning about unknown aesthetics.
The next example demonstrates the idea of accessing 'grid'-based drawing that does not (yet) have a 'ggplot2' Geom interface. The following code makes use of the 'vwline' package to draw a variation on Minard's famous map. The grid_group() function is useful here because there is no existing 'ggplot2' Geom interface to the 'vwline' package, so we need direct access to the raw 'grid'-based function vwlineGrob().
This example is also interesting because there are no geom_*() calls; grid_panel() is the only layer in the plot. Furthermore, we make good use of the 'ggplot2' infrastructure to set up the coordinate system for the plot, using coord_fixed(), and the mapping from the number of survivors to the line width, using scale_size(). This makes our grob-generating function, path(), quite straightforward.
The data for this example comes from the supplementary materials published with Wickham, 2010.
The final example demonstrates a combination of 'gggrid' and post-hoc editing of 'ggplot2' plots. This example makes use of a lot of raw 'grid' tools and techniques, so requires a little more explanation.
First, as part of the main 'ggplot2' plot, we call grid_panel() just to add a "null" grob at the location of the data symbol representing the highest mpg (in each panel), but we do not immediately draw the 'ggplot2' plot. Instead, we push a 'grid' viewport that leaves a gap at the top of the page and draw the 'ggplot2' plot in the remainder of the page. We define a text grob and draw that in the gap at the top of the page.
We then call grid.force() to make the 'grid' grobs and viewports from the 'ggplot2' plot accessible and we determine the "path"s to the "null" grob markers that we drew on the 'ggplot2' plot (and to the viewports that those markers were drawn within).
For each marker, we navigate down to the viewport that the marker was drawn within, calculate the location of the marker in terms of the entire page, navigate back up to the "root" viewport (the whole page), and draw a curved line (with an arrow) from the right edge of the text label to the marker.
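The navigate, locate, and draw steps for a single marker can be sketched using 'grid' alone. In this sketch, a plain viewport named "panel" stands in for the 'ggplot2' plot region, the marker location (0.7, 0.6) and the label text are made up, and deviceLoc() is used to express the marker location in inches on the page; the real example would instead use the viewport and grob paths found after grid.force().

```r
library(grid)

grid.newpage()
## Stand-in for the plot: a viewport that leaves a gap at the top
pushViewport(viewport(y=0, height=0.8, just="bottom", name="panel"))
grid.rect()
## Invisible marker at a location within the "panel"
grid.draw(nullGrob(x=0.7, y=0.6, name="marker"))
upViewport()

## Text label in the gap at the top of the page
lab <- textGrob("Highest value",
                x=unit(5, "mm"),
                y=unit(1, "npc") - unit(5, "mm"),
                just=c("left", "top"))
grid.draw(lab)

## Navigate down, find the marker location relative to the
## whole page (in inches), then navigate back up to the root
downViewport("panel")
loc <- deviceLoc(unit(0.7, "npc"), unit(0.6, "npc"))
upViewport(0)

## Curved arrow from the right edge of the label to the marker
grid.curve(unit(5, "mm") + grobWidth(lab),
           unit(1, "npc") - unit(5, "mm") - 0.5*grobHeight(lab),
           loc$x, loc$y,
           curvature=-0.3, square=FALSE, ncp=5,
           arrow=arrow(length=unit(3, "mm")))
```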
There is not a lot of code in the 'gggrid' package. The main contribution of the package is possibly just a change in mindset, to embrace the use of raw 'grid' in combination with 'ggplot2', rather than trying to avoid raw 'grid' as much as possible.
What the 'gggrid' package provides is full access to ALL 'grid' features, including units and viewports, gradient and pattern fills, and 'grid'-based drawing such as variable-width lines and constructive geometry.
Anyone who has developed a custom 'ggplot2' Geom may have recognised that all of the examples could have been achieved by creating a special Geom every time instead of using grid_panel() or grid_group(). Nevertheless, 'gggrid' saves on quite a bit of typing. In effect, 'gggrid' allows us to develop a new 'ggplot2' Geom on-the-fly (while flouting some of the normal rules, like having to formally declare the aesthetics that our Geom supports). From this perspective, 'gggrid' may provide a useful intermediary between naive 'ggplot2' user and hard-core 'ggplot2' Geom developer.
There are two packages with functions that allow raw 'grid' grobs to be added to 'ggplot2' plots: the geom_custom() function from the 'egg' package (Auguie, 2019) and geom_grob() from the 'ggpmisc' package (Aphalo, 2021).
The geom_custom() function allows the user to provide a grob_fun argument, which is a function that generates a grob, similar to providing a function to grid_panel(). However, with geom_custom(), that grob_fun function is called once for each row of the data set being plotted (and the x and y components of the resulting grob are then set based on the data set). Furthermore, geom_custom() requires a data aesthetic (i.e., a data column within the data set), that provides the data values that are sent to the grob_fun. This interface is designed specifically for adding a grob for each row of the data set and it is both awkward for simpler tasks, such as adding a single label, and restrictive for more complex tasks, such as adding a single label using calculations based on the entire data set.
The geom_grob() function from the 'ggpmisc' package requires the user to provide a column of 'grid' grobs in the data set. Each of these grobs is then drawn within a viewport that is based on the x and y aesthetics in the data set. Again, the design is aimed at drawing a grob for each row of the data set and again it makes simpler tasks awkward and more complex tasks quite difficult.
In both cases, the functions that allow 'grid' grobs to be added to a 'ggplot2' plot appear to be constrained by their conformance to the 'ggplot2' philosophy. The 'gggrid' package deliberately ignores parts of the standard 'ggplot2' approach in order to provide unfettered access to 'grid'.
The philosophy of the 'ggplot2' package has no room for some important 'grid' concepts, like units and viewports. This means that some things are harder to do than they need to be. The 'gggrid' package offers the opportunity to break with the orthodoxy in order to make some things easier to draw.
The goal of 'gggrid' is both to make it easy to perform simple 'grid' drawing and to make it possible to perform more complex 'grid' drawing, with full access to both 'grid' and the 'ggplot2' context.
Simple tasks are made easy by providing a single grob as the first argument to grid_panel(), in which case the 'ggplot2' data are entirely ignored, though drawing still occurs in the 'ggplot2' plot region.
Complex tasks are made possible by providing a function as the first argument to grid_panel(), in which case the 'ggplot2' data are available to base drawing on, as well as all 'ggplot2' aesthetic mappings, calculated values from "stats", and scale and coordinate transformations.
The examples and discussion in this document relate to version 0.1-0 of the 'gggrid' package. Some examples also require R version 4.1.0 or later.
This report was generated within a Docker container (see the Resources section below).
Murrell, P. (2021). "Accessing 'grid' from 'ggplot2'" Technical Report 2021-01, Department of Statistics, The University of Auckland. version 1.