LaTeX Typesetting in R: The 'xdvir' Package

by Paul Murrell http://orcid.org/0000-0002-3224-8858

Version 1:

This document by Paul Murrell is licensed under a Creative Commons Attribution 4.0 International License.

This document describes some lower-level, technical details of the CRAN package 'xdvir' for rendering LaTeX fragments as labels and annotations on R plots.

1. Introduction

The 'xdvir' package for R (Murrell, 2025) allows LaTeX fragments to be used as labels and annotations on R plots, including 'ggplot2' plots. The package vignette describes basic high-level usage and an article submitted to the R Journal explains more about the standard usage of the package and has more complex examples. The purpose of this document is to provide a description of some more advanced features of the package, including ways to extend the package, and to provide a record of the lower-level design and implementation of the package.

The package startup detects and reports on which TeX engines are available on the current system. This report is built on a system with XeTeX and an acceptable version of LuaTeX, so both of those engines are available. This means that the default engine for building this report is LuaTeX.

Rendering a LaTeX fragment in R involves three steps:

Authoring: The LaTeX fragment has to be augmented to create a complete LaTeX document.
Typesetting: The LaTeX document has to be typeset to create a file of DVI operations and that DVI file has to be read into an R data structure.
Rendering: The DVI data structure has to be turned into R graphics drawing operations.

This document begins by taking a closer look at each of those three steps.
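
As a minimal sketch of how the three steps chain together, the code below calls author(), typeset(), and render() in turn. The fragment and the haveTeX check are made up for illustration, and the code assumes that 'xdvir' is installed alongside a TeX engine that it accepts.

```r
# A hypothetical LaTeX fragment (backslashes are doubled in R strings).
tex <- "$\\bar{x} = \\frac{1}{n}\\sum_{i=1}^{n} x_i$"

# Only attempt the steps if 'xdvir' and a TeX engine appear to be available.
haveTeX <- nzchar(Sys.which("xelatex")) || nzchar(Sys.which("lualatex"))
if (requireNamespace("xdvir", quietly = TRUE) && haveTeX) {
    library(xdvir)
    doc <- author(tex)      # Authoring: fragment -> complete LaTeX document
    dvi <- typeset(doc)     # Typesetting: document -> "DVI" object
    grid::grid.newpage()
    render(dvi)             # Rendering: "DVI" object -> 'grid' drawing
}
```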

Figure: The high-level design of the 'xdvir' package.

2. Authoring

The author() function turns a LaTeX fragment into a LaTeX document. This consists of adding additional LaTeX code around the LaTeX fragment.

There are some obvious additions, like \begin{document} and \end{document}.

The \documentclass is always standalone. This is partly because that seems sensible - the LaTeX fragments are not complete documents, but typically small labels to be added to other drawing - but also because that will not produce, for example, any page numbering or headers and footers that you normally get from something like the standard article class.

The varwidth option is set for the standalone class. This is not always necessary and it may actually cause problems if the width of the label exceeds the text width of a standard article. However, the width argument to author() can be used to specify an explicit width if that is convenient and/or if that is necessary. This uses the varwidth option to set an explicit width. If given, the width is interpreted as a number of inches.

The \usepackage{unicode-math} is there to force the TeX engine to use TrueType math fonts and to produce font file paths in DVI output. Without this, you tend to get fnt_def operations in the DVI output of the form cmmi10. Those not only require a potentially complicated further mapping to get to actual font files, but they resolve to Type 1 fonts. That is a problem because the rendering of LaTeX fragments in R requires a graphics device that has support for rendering glyphs and the main set of those is the Cairo devices and they no longer support Type 1 fonts.

The comment on the first line records information about the version of 'xdvir', the TeX engine that was used to author the document, and the LaTeX packages that were specified. As noted previously, the default engine is determined on package startup. This information is used in the typesetting step to check that compatible TeX engines and packages were used in the authoring step.

We can use the engine and packages arguments to author() to explicitly set those values. Notice that the addition of packages="xcolor" adds a new \usepackage line and it adds to the comment on the first line. The engine just turns up in the first line of comments.
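
A small sketch of those arguments in use, assuming 'xdvir' and a TeX engine are available. The fragment is hypothetical; width and packages are the author() arguments described above, and the printed document should show both the extra \usepackage line and the extended first-line comment.

```r
# A hypothetical fragment that needs the xcolor package.
tex <- "\\textcolor{red}{A red label}"

haveTeX <- nzchar(Sys.which("xelatex")) || nzchar(Sys.which("lualatex"))
if (requireNamespace("xdvir", quietly = TRUE) && haveTeX) {
    library(xdvir)
    # Explicit width (interpreted as inches) plus support for xcolor.
    doc <- author(tex, width = 2, packages = "xcolor")
    print(doc)
}
```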

Some packages will add more lines. For example, the preview package adds further LaTeX code to the document preamble and wraps the fragment within a preview environment.

Functions like grid.latex() and geom_latex() also add additional code to LaTeX fragments in order to match the current font family, font face, and font size. We can easily see this if we use the texFile argument to control the name of the file that 'xdvir' writes a full LaTeX document into. For example, the following code draws a simple LaTeX fragment and automatically matches the R default font family (rather than using the default LaTeX Computer Modern font) and writes the full LaTeX document to the file "test.tex". The font matching is implemented by adding \setmainfont and \fontsize commands to the LaTeX document (plus \usepackage{fontspec}).
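
A hedged sketch of that kind of call is shown below. The label text is made up, the file is written to a temporary directory rather than the working directory, and the code assumes that 'xdvir' and a TeX engine are available.

```r
# Where to keep the complete LaTeX document that 'xdvir' generates.
texFile <- file.path(tempdir(), "test.tex")

haveTeX <- nzchar(Sys.which("xelatex")) || nzchar(Sys.which("lualatex"))
if (requireNamespace("xdvir", quietly = TRUE) && haveTeX) {
    library(xdvir)
    grid::grid.newpage()
    # Draw a hypothetical label and record the full document in texFile.
    grid.latex("A simple label", texFile = texFile)
    # Inspect the font-matching code that was added to the fragment.
    writeLines(readLines(texFile))
}
```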

Where user-level functions add a lot of additional code to the LaTeX fragment, like the examples above, there is a risk of conflicting with the LaTeX fragment and/or a risk of packages conflicting with each other. This is one possible reason for authoring a LaTeX document directly rather than relying on the author() function. However, care must be taken in that case to retain features like \usepackage{unicode-math} so that DVI output that is consumable by 'xdvir' is produced during the typesetting step.

In other words, here be dragons. On the positive side, I have yet to irretrievably bite myself with this problem.

3. Typesetting

The typeset() function processes a LaTeX document to a "DVI" object. The LaTeX document can be a "LaTeXdocument" object, as produced by author() or just a character vector (though the latter must be a complete LaTeX document, not just a LaTeX fragment).

The typeset() function generates a DVI file by writing the LaTeX document to a file and running a TeX engine to produce a DVI file. 'xdvir' makes use of latexmk() from the 'tinytex' package for this step, which automatically takes care of multiple runs of the TeX engine and even installing missing LaTeX packages in at least some cases.

The DVI file is read into R as a "DVI" object and the "DVI" object has a print method to show the contents of a DVI file in a human-readable format. Underneath, it is a list of "rawFormat" objects from the 'hexView' package, which are labelled blocks of raw bytes along with interpreted values (integers, strings, etc). For example, the first element of the dvi object is a DVI pre operation, which consists of a one-byte integer operation code (247), followed by a one-byte integer DVI version number (2), and so on.
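
To make the byte layout concrete, the following base R sketch builds a synthetic pre operation and decodes it by hand. It does not use 'hexView' or 'xdvir'; the comment string is made up, and the num, den, and mag values are the ones that TeX conventionally writes. The field order (opcode, version, num, den, mag, comment length, comment) follows the standard DVI format.

```r
# Build a synthetic DVI 'pre' operation (DVI integers are big-endian).
be32 <- function(x) writeBin(as.integer(x), raw(), size = 4, endian = "big")
comment <- charToRaw("hypothetical signature")
pre <- c(as.raw(247),               # operation code for 'pre'
         as.raw(2),                 # DVI version number
         be32(25400000),            # num
         be32(473628672),           # den
         be32(1000),                # mag
         as.raw(length(comment)),   # length of the comment that follows
         comment)

# Decode the fields again from the raw bytes.
parsePre <- function(bytes) {
    int32 <- function(from) {
        readBin(bytes[from:(from + 3)], "integer", size = 4, endian = "big")
    }
    k <- as.integer(bytes[15])
    list(opcode = as.integer(bytes[1]),
         version = as.integer(bytes[2]),
         num = int32(3),
         den = int32(7),
         mag = int32(11),
         comment = rawToChar(bytes[16:(15 + k)]))
}

info <- parsePre(pre)
str(info)
```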

Notice that typeset() embeds a "signature" in the pre operation of the DVI file that records information about the 'xdvir' version, the TeX engine used, and any LaTeX packages used. This information is used in the rendering step to check that compatible TeX engines and packages are used in each step. Warnings are given if information is missing or inconsistent.

There is an engine argument to allow the TeX engine to be specified, but it will default from information in the LaTeX document if that exists.

Debugging

The typesetting step is probably where most user problems will occur, most probably because the LaTeX fragment (or the LaTeX document) contains an error, so the call to a TeX engine fails. For example, the following code adds a $ to the previous LaTeX fragment to produce an illegal LaTeX fragment.

When we build a complete LaTeX document and attempt to typeset it, we get an error from the TeX engine.

The texFile argument is useful in cases like this because the .log file will be in the same location as the texFile, so we can easily see what the problem is. Using texFile is also a good way to easily get hold of the LaTeX document and run a TeX engine on it outside of R to debug the problem.
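
One way this debugging workflow might look is sketched below. The broken fragment and file names are hypothetical, and the code assumes that 'xdvir' and a TeX engine are available; the .log file name is derived from texFile on the assumption that the engine writes its log next to the .tex file, as described above.

```r
# A hypothetical fragment with a stray trailing $.
bad <- "$x - \\mu$$"

# Keep the LaTeX document somewhere we can find it afterwards.
texFile <- file.path(tempdir(), "broken.tex")
logFile <- sub("[.]tex$", ".log", texFile)

haveTeX <- nzchar(Sys.which("xelatex")) || nzchar(Sys.which("lualatex"))
if (requireNamespace("xdvir", quietly = TRUE) && haveTeX) {
    library(xdvir)
    # Catch the typesetting error rather than stopping.
    result <- tryCatch(typeset(author(bad), texFile = texFile),
                       error = function(e) conditionMessage(e))
    # The end of the log usually points at the offending line.
    if (file.exists(logFile)) {
        writeLines(utils::tail(readLines(logFile, warn = FALSE), 20))
    }
}
```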

Caching

'xdvir' automatically caches typesetting results. If the combination of LaTeX document, TeX engine, and LaTeX packages has been seen before then a cached "DVI" object will be used. This means that the expensive typesetting step should only occur once per unique TeX fragment (per session).

For debugging problems, it can be useful to turn off caching with options(xdvir.useDVIcache=FALSE).
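
The idea behind the cache can be illustrated with a small base R sketch. This is not the 'xdvir' implementation: slowTypeset() is a stand-in for the real typesetting step, and the cache key, function names, and useCache argument are all hypothetical.

```r
# Count how often the (pretend) expensive step actually runs.
calls <- 0
slowTypeset <- function(doc, engine, packages) {
    calls <<- calls + 1
    paste("DVI for", doc)
}

# An environment used as a per-session cache.
cache <- new.env()
cachedTypeset <- function(doc, engine = "LuaTeX", packages = character(),
                          useCache = TRUE) {
    # The key combines document, engine, and packages.
    key <- paste(c(engine, sort(packages), doc), collapse = "\n")
    if (useCache && exists(key, envir = cache, inherits = FALSE)) {
        return(get(key, envir = cache))
    }
    dvi <- slowTypeset(doc, engine, packages)
    assign(key, dvi, envir = cache)
    dvi
}

d1 <- cachedTypeset("$x$")   # typesets
d2 <- cachedTypeset("$x$")   # served from the cache
calls
```

Changing any part of the key (for example, the engine) or switching the cache off forces a fresh typesetting run, which is the behaviour that is useful when debugging.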

Manual typesetting

Because typeset() accepts a simple character vector, it is easy to author a LaTeX document outside of R and just readLines() that document to pass it to typeset().

It is also possible to perform the typesetting outside of R and just read a DVI file in using readDVI(). The important thing here is to make sure that you produce a DVI file, e.g., xelatex --no-pdf or lualatex --output-format=dvi.
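
Both routes are sketched below under several assumptions: the hand-written document is hypothetical (it keeps the standalone class and unicode-math, as recommended in the Authoring section), 'xdvir' and XeTeX are assumed to be installed, and the output file is assumed to be named after the .tex file with an .xdv suffix. Because the document carries no 'xdvir' comment or signature, warnings about missing information are to be expected.

```r
# A hypothetical hand-written LaTeX document.
texLines <- c("\\documentclass[varwidth]{standalone}",
              "\\usepackage{unicode-math}",
              "\\begin{document}",
              "A hand-written document: $x - \\mu$",
              "\\end{document}")
outDir <- tempdir()
texFile <- file.path(outDir, "manual.tex")
writeLines(texLines, texFile)

if (requireNamespace("xdvir", quietly = TRUE) &&
    nzchar(Sys.which("xelatex"))) {
    library(xdvir)
    # Route 1: read the document into R and let typeset() run the engine.
    dvi1 <- typeset(readLines(texFile))
    # Route 2: run the engine ourselves, then read the DVI output.
    system2("xelatex",
            c("--no-pdf", "-interaction=nonstopmode",
              paste0("-output-directory=", shQuote(outDir)),
              shQuote(texFile)))
    dvi2 <- readDVI(file.path(outDir, "manual.xdv"))
}
```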

4. Rendering

The render() function renders a "DVI" object. This is an alias for grid.dvi() - the rendering happens within the current 'grid' viewport.

Rendering consists of two steps internally:

First, we walk the list of DVI operations and build a list of objects, mostly either "XDVIRglyphObj" objects, which describe glyphs, the fonts they come from, and where to draw them, or "XDVIRruleObj" objects, which correspond to rectangles to draw (the only core TeX drawing operation).

This first step also resolves font definitions and gathers bounding box information, which is vital for sizing and positioning the grobs that are produced in the second step.

Second, we walk the list of objects and build a gTree of grobs, mostly either "glyphgrob"s, "rect" grobs, or "segments" grobs. Segments are used for very thin rectangles because very thin filled rectangles tend to disappear on raster devices, including on screen.

Consecutive glyph operations in the DVI file are collapsed into a single "glyphgrob", but we may end up with multiple "glyphgrob"s if, for example, a mathematical equation includes a horizontal line. In that case, in order to preserve the order of drawing, we end up with a "glyphgrob" for one or more glyphs, before a "rect" for the line, then another "glyphgrob" for one or more further glyphs.
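
The choice between "rect" and "segments" grobs for rules can be sketched with plain 'grid' code. The function name, the threshold, and the restriction to thin horizontal rules are assumptions made for illustration; this is not the 'xdvir' code.

```r
library(grid)

# Hypothetical helper: (x, y) is the bottom-left corner; all values in inches.
ruleGrob <- function(x, y, width, height, minThickness = 1/96) {
    if (height < minThickness) {
        # Very thin rule: draw a line along its centre so that it
        # does not vanish on raster devices.
        segmentsGrob(x, y + height/2, x + width, y + height/2,
                     default.units = "in",
                     gp = gpar(lwd = 1, lineend = "butt"))
    } else {
        # Otherwise a filled rectangle with no border.
        rectGrob(x, y, width, height, default.units = "in",
                 just = c("left", "bottom"),
                 gp = gpar(col = NA, fill = "black"))
    }
}

thin <- ruleGrob(1, 1, 2, 0.004)   # e.g., a fraction bar
thick <- ruleGrob(1, 1, 2, 0.1)
class(thin)
class(thick)
```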

Justification of the rendering is complicated because we potentially need to justify a combination of multiple "glyphgrob"s, plus lines and/or rectangles. The solution is to generate common "anchors" for all "glyphgrob"s based on the bounding box for the rendering and to calculate an "offset" for lines and rectangles based on how much a "dummy" glyph positioned at the bottom-left of the bounding box would have to move to satisfy the justification.

A margin and rotation can be specified for rendering and these are implemented via viewports on the overall gTree.

Packages, engines, and font libraries

It is possible to specify the packages and engine for rendering, though they will be taken from the "DVI" object if possible. Checks are made for inconsistency between the user specification and the signature in the "DVI" object, if there is one. Warnings are given if information is missing or inconsistent.

It is also possible to specify a font library, which is used to get glyph metrics (for calculating glyph positions and the overall bounding box). There is a default font library based on FreeType (The FreeType Project, 2025) and no other option is provided (on CRAN). The Font libraries section has more information.

Vectorisation

The render() function is vectorised. The dvi argument to render() can be a list of "DVI" objects, which can each be drawn at different (x, y) locations, at different rotations, and with different hjust and vjust justifications. The gp settings are also vectorised and applied to the corresponding "DVI" object. On the other hand, the packages, engine, and fontLib are fixed for all "DVI" objects in a single render() call.

Resolution

By default, DVI positions and font metrics are used at maximum resolution (typically 72*2^16 scaled points per inch and 72*10*1000 units per inch, where 10 is the font point size, respectively) and R uses its default naive infinite resolution calculations to position glyphs and to determine dimensions, such as the bounding box.

However, it is also possible to specify a dpi resolution, in which case glyph locations and font dimensions are rounded to the nearest "pixel". This uses an algorithm that produces identical results to dvitype (Knuth, 1995) at multiple resolutions on an identical DVI file (see tests/dpi.R).
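
The basic unit conversion that dvitype uses for this rounding can be written in a few lines of R. This sketch covers only the conversion from DVI units to whole pixels (dvitype also applies a drift correction while tracking the current position, which is not reproduced here), the function and variable names are made up, and the num, den, and mag defaults are the values that TeX conventionally writes in the DVI preamble.

```r
# Convert DVI units to the nearest pixel at a given resolution.
pixelRound <- function(x, dpi, num = 25400000, den = 473628672, mag = 1000) {
    # Pixels per DVI unit, as computed by dvitype.
    conv <- (num / 254000) * (dpi / den) * (mag / 1000)
    px <- conv * x
    # Round half away from zero (R's round() rounds half to even).
    sign(px) * floor(abs(px) + 0.5)
}

# With the default num and den, one inch is 72.27 * 2^16 DVI units.
oneInch <- 72.27 * 65536
pixelRound(oneInch, 300)
# Ten TeX points at 300 dpi is about 41.5 pixels.
pixelRound(10 * 65536, 300)
```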

Using dpi may make sense if you know the dpi of the target graphics device, though I have yet to see convincing evidence of improvement. This may be because the dpi rounding does not guarantee alignment with pixels on the device because there are subsequent calculations on the rounded dpi locations to satisfy justification.

Multiple pages

It is possible for a DVI file to contain multiple pages. While this is unlikely to be the typical case for LaTeX fragments, it could still occur if a LaTeX document is generated outside of R.

In such a case, readDVI() will read all pages into a "DVI" object, but only the operations on the page specified to render() will generate objects and operations on other pages will be ignored.

To be more accurate, some operations, such as positioning and setting of fonts will be listened to on all pages in case settings carry over across pages, but those operations will not generate any objects themselves.

Delayed rendering

The creation of "glyphgrob"s is delayed, in the sense of happening within a makeContent() method, because the calculation of anchors and offsets for justification has to occur at drawing time, once the rendering viewport is known. This means that grob queries, such as grobWidth() incur a penalty because they have to generate final grobs (though not necessarily "glyphgrob"s).

There can be a further layer of delay when calling grid.latex() because the author step has to occur at drawing time if the width requires converting to inches and/or the current font has to be queried (so we need to know the final rendering viewport). This delay does not occur if width is not specified and/or it is in absolute units and gp is set to NULL (so the "current" font settings are ignored). This additional delay results in additional penalties for grob queries.

5. Packages

'xdvir' provides predefined support for several LaTeX packages: fontspec, xcolor, preview, zref, and tikz (see the TikZ Section).

There are obviously many more LaTeX packages that users might want to make use of in their LaTeX fragments. One approach is to write an entire LaTeX document by hand, which means you can include whatever packages you need. However, this is not the most convenient approach and it will not work, for example, with the 'ggplot2' interfaces geom_latex() and element_latex().

So there is a mechanism for defining support for additional LaTeX packages in the form of the LaTeXpackage() function (plus the registerPackage() function).

The name argument provides convenience. If the package is registered, then it can be referred to by this name (which is what previous examples have done with predefined packages).

The preamble argument is LaTeX code that will be placed in the LaTeX document preamble in the authoring step. This is often just a \usepackage command, but the preview package example earlier showed that other LaTeX code can be included.

The prefix and suffix arguments are LaTeX code that is wrapped around the LaTeX fragment in the authoring step. A typical use would be to wrap the LaTeX fragment in a LaTeX environment (like the preview package does).

The special argument is a function. This is useful for packages that generate DVI specials, like xcolor. The following code authors and typesets a simple LaTeX fragment that draws text in red. The resulting DVI output contains DVI specials, which are marked as xxx1 operations. These have been produced by the xcolor package and contain information about changes in colour. For example color push gray 0 sets the text colour to black and color push rgb 1 0 0 sets it to red.

The rendering step calls the special function (if not NULL) for every registered package, providing the content of the special (e.g., color push gray 0) as a character value, plus a state object. The latter is just an environment that can be used to maintain state variables during the walk of DVI operations. For example, the special function for the xcolor package (in 'xdvir') maintains a stack of colours based on specials that start with color (and ignores all other specials).

The init and final arguments are also functions that are called at the start and end of walking the DVI operations. These are passed a state object and allow initialisation of state variables and anything else the package wants to do.

In summary, the LaTeXpackage() function allows the user to specify LaTeX code that will be added to a LaTeX fragment during the authoring step and a function to handle specials during the rendering step that the LaTeX package generated in DVI output during the typesetting step.

The registerPackage() function is used to register the package with 'xdvir' so that it can be referred to by name.
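
The pieces described in this section might fit together as in the sketch below. The init and special functions are simplified, hypothetical handlers that only keep a stack of colour specifications (they do not change any drawing colours and are not the handlers that 'xdvir' uses for xcolor); they assume that the state object is a plain environment and that the special function receives the special text followed by the state. The package name "colourlog" is made up.

```r
# Called at the start of the walk over DVI operations.
colourInit <- function(state) {
    assign("colourStack", character(), envir = state)
    invisible(NULL)
}

# Called for each special; ignores anything not starting with "color ".
colourSpecial <- function(special, state) {
    if (!grepl("^color ", special)) return(invisible(NULL))
    stack <- get("colourStack", envir = state)
    if (grepl("^color push ", special)) {
        stack <- c(stack, sub("^color push ", "", special))
    } else if (grepl("^color pop", special)) {
        stack <- utils::head(stack, -1)
    }
    assign("colourStack", stack, envir = state)
    invisible(NULL)
}

# Exercise the handlers by hand with some example specials.
state <- new.env()
colourInit(state)
for (s in c("color push gray 0", "color push rgb 1 0 0",
            "some other special", "color pop")) {
    colourSpecial(s, state)
}
get("colourStack", envir = state)

# Define and register the package if 'xdvir' is installed.
if (requireNamespace("xdvir", quietly = TRUE)) {
    library(xdvir)
    pkg <- LaTeXpackage(name = "colourlog",
                        preamble = "\\usepackage{xcolor}",
                        special = colourSpecial,
                        init = colourInit)
    registerPackage(pkg)
}
```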

6. TikZ

The support for the tikz package in 'xdvir' is several orders of magnitude more complex than the support for other packages, so deserves its own mention.

The LaTeX package TikZ (Tantau, 2013) provides a sophisticated graphics system within the LaTeX world. It generates graphical output by producing specials in the DVI output. For example, consider the TikZ fragment below (it is constructed in a slightly odd way to avoid 'knitr' errors). This describes a simple TikZ diagram that draws an arrow from "a" to "b".

The following code augments that to a full LaTeX document, including additional LaTeX code to load the TikZ package and wrap the fragment within a tikzpicture environment. The \def\pgfsysdriver command is very important because this provides a backend for the TikZ package that outputs TikZ specials designed for consumption by 'xdvir'. Without that command, TikZ would output specials consisting of PDF commands. This is another example of a detail that would have to be replicated if we author a complete LaTeX document (that included a TikZ picture) manually for use with 'xdvir'.

Typesetting this document generates a large number of special operations in the DVI output, all starting with xdvir-tikz:: so that the special function for the tikz package in 'xdvir' will identify them and handle them.

The special function for the tikz package in 'xdvir' calls a large number of other functions to maintain a lot of state to keep track of where drawing needs to occur. It also generates a number of special "XDVIRtikz*" objects (that are not glyph or rule objects that normal DVI operations produce) and there are functions to convert those special tikz objects into 'grid' grobs (in the second stage of the rendering step).

At the high level, all of this internal activity is hidden and we can just call grid.latex() with the TikZ fragment and make sure to load the tikz package support, as shown below.

7. Engines

The 'xdvir' package has predefined support for the XeTeX engine and the LuaTeX engine, though it insists on a recent version of the latter so that DVI output contains font file paths in font definitions and glyph indices in set_char operations.

This means that 'xdvir' cannot work with DVI output that is generated by, for example, pdfTeX or upTeX, because those engines generate DVI output with font definitions that require resolving more complex font file mappings and set_char operations that use a variety of different encodings.

The door is left ajar for future development of support for other engines by providing the TeXengine() function (and the registerEngine() function).

The name argument provides an easy way to refer to the engine if it is registered with 'xdvir'.

The version argument is a function (with no arguments) that should return the version of the engine. This is used to construct comments in the authoring step and signatures in the typesetting step to check for consistency between steps.

The command is a character value that gives the engine command (e.g., "xelatex") and options are any additional options required to produce DVI output (e.g., "--no-pdf"). dviSuffix gives the suffix used for DVI output files (e.g., .xdv).

isEngine is a function that is called with a DVI object and should return whether the engine produced that DVI object (used to help check for consistency).

The preamble is LaTeX code that should be added in the authoring step.

The fontFile and glyphIndex arguments are functions called during the rendering step to convert DVI operations into objects. fontFile is called with the content of a DVI font definition (as a character value) and should return a font file path. glyphIndex is called with the content of a set_char operation (as a set of raw bytes) and should return an integer glyph index.

The implementation of fontFile and glyphIndex for XeTeX and (acceptable) LuaTeX is relatively straightforward, but these functions offer a possible path for other engine support, even if that support may require much more complex implementations.
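
As a rough illustration of the kind of work a glyphIndex function does, the following base R sketch turns a set of raw bytes into an integer by treating them as a big-endian unsigned number. The function name is hypothetical and the byte layout is an assumption; the bytes that a particular engine writes for a set_char operation may need different handling.

```r
# Interpret raw bytes as a big-endian unsigned integer.
bytesToIndex <- function(bytes) {
    as.integer(sum(as.integer(bytes) * 256^rev(seq_along(bytes) - 1)))
}

bytesToIndex(as.raw(c(0x01, 0x2C)))   # 1 * 256 + 44
bytesToIndex(as.raw(65))
```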

8. Font libraries

The 'xdvir' package relies on FreeType to access glyph metrics (for glyph placement and for bounding box calculations). The FreeType support is defined via an internal FontLibrary() function that requires three functions to return glyph metric information. That function is not currently exported, but 'xdvir' internally defines a different font library based on TTX (FontTools Project, 2025), which can be useful for debugging.

9. Discussion

The 'xdvir' package provides a convenient high-level interface for using LaTeX fragments as plot labels and annotations in R. This document describes some of the details and internal design of the package, which may be helpful for diagnosing problems when things go wrong and possibly for extending the package.

Limitations

Rendering is slow. The typesetting step requires at least one run of a TeX engine. Caching helps the second time around, but a plot that contains multiple unique TeX fragments can be glacial.

As mentioned previously, 'xdvir' only supports XeTeX and LuaTeX, and it only supports LuaTeX if the luaotfload-tool version is > 3.15.

'xdvir' depends on R >= 4.3 because it makes use of the glyph-rendering feature that was added in that version. This extends to requiring a graphics device that provides glyph-rendering, which currently includes pdf(), quartz(), the Cairo-based devices such as cairo_pdf() and png(type="cairo"), plus devices from the 'ragg' package (Pedersen and Shemanarev, 2024).
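
A quick way to check these requirements from R is sketched below. It assumes that dev.capabilities() reports a glyphs component (as it does in R versions that have the glyph-rendering feature); isTRUE() is used so that the check simply returns FALSE if that component is absent. The variable names are made up.

```r
# Is this version of R new enough to render glyphs at all?
hasGlyphR <- getRversion() >= "4.3.0"

# Open a pdf() device that writes no file and ask what it supports.
pdf(NULL)
caps <- dev.capabilities()
invisible(dev.off())

supportsGlyphs <- hasGlyphR && isTRUE(caps$glyphs)
supportsGlyphs
```

The same check can be run after opening any other device of interest in place of pdf(NULL).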

The PDF produced by pdf() does NOT embed the fonts, so viewers may show garbage glyphs. The embedGlyphs() function can be used to embed glyphs, but a little effort is required to extract the glyph information that embedGlyphs() requires from the 'grid' gTree that rendering produces.

Prior work

The 'xdvir' package is an evolution of the 'dvir' package (Murrell, 2018; Murrell, 2020b), which only exists on GitHub. 'xdvir' has borrowed a lot of code from 'dvir' (especially for TikZ support; Murrell, 2020a), but it has streamlined some of the user interface, added integration with 'ggplot2', and massively simplified the handling of font definitions and glyph specifications in DVI output (by insisting that they contain font file paths and glyph indices, respectively). The 'xdvir' package also benefits from the glyph rendering support in R (from version 4.3), whereas 'dvir' had to struggle with the simplistic R text drawing of character values.

Related work

A number of R packages and other software perform a similar job to 'xdvir', but differ in various ways. The following list focuses on the negatives of these other approaches, but they almost all have the advantage of NOT having massive dependencies (like 'xdvir' does with its need for a TeX installation) and NOT being slow (like 'xdvir' is with its typesetting step).

'latex2exp' converts LaTeX (mathematical equation) fragments to R expressions, to be drawn using plotmath. This helps LaTeX users to write plotmath expressions, but the end result is plotmath, which produces an inferior result to real LaTeX. It also only supports LaTeX mathematical equations.
'tikzDevice' converts R graphics into LaTeX graphics. This is basically the inverse of 'xdvir'. If you want to end up in a LaTeX document, this might be the way to go. If you want to end up somewhere else, then 'xdvir' might be for you.
'marquee' typesets and renders markdown. The main difference is markdown rather than LaTeX. Markdown can be easier to type, but LaTeX gives more control.
There are some R packages that perform specific tasks, like 'geomtextpath' (text along a path) and 'ggpage' (paragraphs of text), but they tend to be much more limited in their scope.
In the Python world, there is a flag in matplotlib that switches all plot labelling to use LaTeX. This uses a similar approach to 'xdvir' by generating and reading DVI output. The main difference is where you want to produce your plots: Python/matplotlib versus R/ggplot2.
The dvi-decode javascript module renders DVI files in a web browser. It is quite general in theory, but faces challenges with resolving fonts. This is also not integrated with any plot-drawing library.
It is apparently possible to use LaTeX in plot titles (at least) in the COMSOL software, though this is a commercial product so I have not tried it myself.

10. Technical requirements

The examples and discussion in this report relate to R version 4.4.2 and 'xdvir' version 0.1-2.

This report was generated within a Docker container (see Resources section below).

11. Resources

The raw source file for this report, a valid XML transformation of the source file, a 'knitr' document generated from the XML file, two R files and the bibtex file that are used to generate the table of contents and reference sections, two XSL files and an R file that are used to transform the XML to the 'knitr' document, and a Makefile that contains code for the other transformations and coordinates everything. These materials are also available on github.
This report was generated within a Docker container. The Docker command to build the report is included in the Makefile above. The Docker image for the container is available from Docker Hub; alternatively, the image can be rebuilt from its Dockerfile.

How to cite this report

Murrell, P. (2025). "LaTeX Typesetting in R: The 'xdvir' Package". Technical Report 2025-01, Department of Statistics, The University of Auckland. Version 1.

12. References

[FontTools Project, 2025]: FontTools Project (2025). TTX: Convert Fonts to XML and Back. Accessed: 2025-03-04.
[Knuth, 1995]: Knuth, D. E. (1995). The dvitype processor.
[Murrell, 2018]: Murrell, P. (2018). Revisiting mathematical equations in R: the 'dvir' package. Technical Report 2018-08, Department of Statistics, The University of Auckland. Version 2.
[Murrell, 2020a]: Murrell, P. (2020a). Adding TikZ support to 'dvir'. Technical Report 2020-05, Department of Statistics, The University of Auckland. Version 1.
[Murrell, 2020b]: Murrell, P. (2020b). The agony and the ecstacy: Adding LuaTeX support to 'dvir'. Technical Report 2020-02, Department of Statistics, The University of Auckland. Version 1.
[Murrell, 2025]: Murrell, P. (2025). xdvir: Render 'LaTeX' in Plots. R package version 0.1-2.
[Murrell et al., 2023]: Murrell, P., Pedersen, T. L., and Urbanek, S. (2023). Rendering typeset glyphs in R graphics. Technical Report 2023-01, Department of Statistics, The University of Auckland. Version 1.
[Pedersen and Shemanarev, 2024]: Pedersen, T. L. and Shemanarev, M. (2024). ragg: Graphic Devices Based on AGG. R package version 1.3.3.
[R Core Team, 2019]: R Core Team (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
[Tantau, 2013]: Tantau, T. (2013). The TikZ and PGF Packages.
[The FreeType Project, 2025]: The FreeType Project (2025). FreeType 2. https://freetype.org/. Accessed: 2025-03-04.