by Paul Murrell
Wednesday 27 July 2016
The 'DOM' package for R provides functions to dynamically manipulate the content of a web page that is being viewed within a web browser. The package provides an R interface that is analogous to the DOM (Document Object Model) functions in javascript for manipulating the content of a web page (as defined by the W3C, WHATWG, and the Mozilla Developer Network).
library("DOM")
The function htmlPage opens a new web page within a browser window (or tab), optionally setting the initial content for the page.
page <- htmlPage("<p>Hello world!</p>")
The htmlPage function returns a unique identifier for the page, which can be used in further calls to modify the page. For example, the appendChild function can be used to add content and the removeChild function can be used to remove content.
appendChild(page, "<p>Goodbye world!</p>")
The code below removes the first <p> element from the document.
removeChild(page, "p")
The closePage function should be used once we have finished with a page.
closePage(page)
The full argument list for the appendChild function is shown below:
args(appendChild)
function (pageID, child = NULL, childRef = NULL, parentRef = "body",
    css = TRUE, async = !is.null(callback), callback = NULL, tag = getRequestID())
NULL
The first argument, pageID, identifies the web page that we are interacting with (as returned by htmlPage).
The second argument, child, can be used to specify HTML code for the new child element, as shown in the example from the previous section. In that example, we specified a <p> element explicitly, but there are many R packages that can help us to generate HTML code (e.g., 'XML', 'xtable', and 'htmltools').
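HTML for the child argument can also be assembled with base R string functions. The following is a minimal sketch only: the helper name htmlElement is hypothetical (it is not part of 'DOM' or of the packages named above), and it performs no escaping of special characters.

```r
# Hypothetical helper (not part of the 'DOM' package): wrap content in a
# single HTML element so that the result is usable as a 'child' value.
htmlElement <- function(tag, content) {
    paste0("<", tag, ">", paste(content, collapse=""), "</", tag, ">")
}

# Nested elements are fine as long as there is a single outer element.
items <- vapply(c("apple", "banana"),
                function(x) htmlElement("li", x),
                character(1), USE.NAMES=FALSE)
child <- htmlElement("ul", items)
child
# [1] "<ul><li>apple</li><li>banana</li></ul>"
```

The resulting single character value could then be passed to appendChild as the child argument.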
The third argument to appendChild, called childRef, provides an alternative way to specify the child element. This argument can be used to specify a CSS selector for an existing element in the web page. If we use this argument, we can move an element from one place to another within the web page. For example, the following code creates a web page with three paragraphs.
page <- htmlPage(paste("<p>Paragraph", 1:3, "</p>", collapse=""))
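It may help to see the string that this paste call produces. Because paste separates its arguments with a space by default, each paragraph ends with a space before the closing tag, which is why output further on shows elements such as "<p>Paragraph 2 </p>". The variable name html is used only for this illustration.

```r
# collapse="" joins the three paragraphs into one string; the default
# sep=" " puts a space between the number and the closing tag.
html <- paste("<p>Paragraph", 1:3, "</p>", collapse="")
html
# [1] "<p>Paragraph 1 </p><p>Paragraph 2 </p><p>Paragraph 3 </p>"
```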
The following call to appendChild moves the first paragraph to the end of the page.
appendChild(page, childRef="p")
Exactly one of child and childRef must be specified. In the case of child, the value must be a single character value and it must describe a single HTML element (though that element may have other elements nested within it).
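The rule can be written down as a small check in plain R. This is a hypothetical illustration (checkChild is not a function from the 'DOM' package and is not claimed to match how the package validates its arguments); it covers the "exactly one" and "single character value" conditions, but does not verify that the HTML describes a single element.

```r
# Hypothetical check, for illustration only (not the package's own code).
checkChild <- function(child = NULL, childRef = NULL) {
    # Exactly one of the two arguments may be non-NULL
    if (is.null(child) == is.null(childRef))
        stop("specify exactly one of 'child' and 'childRef'")
    if (!is.null(child) && !(is.character(child) && length(child) == 1))
        stop("'child' must be a single character value")
    invisible(TRUE)
}

checkChild(child = "<p>New paragraph</p>")   # passes
checkChild(childRef = "p")                   # passes

# A character vector of length two is rejected
tooMany <- tryCatch(checkChild(child = c("<p>a</p>", "<p>b</p>")),
                    error = function(e) conditionMessage(e))
tooMany
# [1] "'child' must be a single character value"
```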
The use of both child and childRef arguments reflects the fact that, in the real DOM API, arguments are pointers to HTML elements. In the 'DOM' package, we do not have pointers to HTML elements; R communicates with the browser by passing JSON objects back and forth over a websocket (thanks to the 'httpuv' package). If we need to specify an HTML element that is not yet part of the web page, we use HTML code (as in the child argument). If we need to specify an HTML element that is already part of the web page, we use a CSS selector (as in the childRef argument).
The fourth argument to appendChild is called parentRef. This is a CSS selector that specifies the parent element that the child should be added to. This argument defaults to "body", which means that the child is added to the end of the web page. The following code adds a <span> element to the page as a child of the second <p> element.
appendChild(page, child="<span>text</span>", parentRef="p:nth-child(2)")
The fifth argument to appendChild is called css. This is a logical value specifying whether the childRef and parentRef arguments should be interpreted as CSS selectors (the default), or as XPath expressions. The following code uses XPath expressions to move the <span> element from its current position to be a child of the third <p> element.
appendChild(page, childRef="//span", parentRef="//p[3]", css=FALSE)
The sixth argument to appendChild is called async. This is a logical value specifying whether the call is asynchronous. By default, R will block until the web browser has responded with the result of the appendChild request. This allows us to write requests to the browser in the familiar imperative programming style (do A, then do B, etc). However, requests to the browser are inherently asynchronous, so it is also possible to send a request to the browser and then run subsequent R code without waiting for a response from the browser. We will see a use for this when we get to the section on calling R from the browser.
The seventh argument to appendChild is called callback. This is either NULL or an R function. It can be used to supply an R function that will be run once the web browser has responded to the request. In combination with async, this can be used to execute R code some unknown time in the future, whenever the web browser has completed our request. Note that, in the argument list above, async defaults to !is.null(callback), so supplying a callback makes the call asynchronous unless async is set explicitly.
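The interplay of async and callback can be pictured with a toy model in plain R. Nothing here is the package's implementation: sendRequest and handleResponse are hypothetical names, the idea that a request is matched to its response by a tag is an assumption, and the "browser response" is simulated by calling handleResponse by hand.

```r
pending <- list()          # callbacks waiting for a response, keyed by tag
events <- character()      # record of what happened, in order

# Hypothetical: register a callback and return immediately (async style)
sendRequest <- function(tag, callback) {
    pending[[tag]] <<- callback
    invisible(tag)
}

# Hypothetical: run when the (simulated) browser response arrives
handleResponse <- function(tag, value) {
    callback <- pending[[tag]]
    pending[[tag]] <<- NULL
    callback(value)
}

sendRequest("req1", function(value) {
    events <<- c(events, paste("callback got", value))
})
events <- c(events, "R carried on")            # runs before the response
handleResponse("req1", "<p>Paragraph 4</p>")   # response arrives later
events
# [1] "R carried on"                    "callback got <p>Paragraph 4</p>"
```

The point of the sketch is the ordering: the code after the request runs first, and the callback runs whenever the response happens to arrive.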
The value returned by the appendChild function is the HTML code for the child that was appended. If we specify the child as HTML code, the return value should be identical to that, as shown below.
appendChild(page, child="<p>Paragraph 4</p>")
[1] "<p>Paragraph 4</p>"
However, if we specify childRef, the function returns the HTML code for the child that we moved. The code below moves the first paragraph on the page to the end of the page and returns the HTML code for the element that was moved.
appendChild(page, childRef="p")
[1] "<p>Paragraph 2 </p>"
There is also an appendChildCSS function that returns a CSS selector for the child that was appended (or moved). In the following code, we add a new paragraph to the end of the page and the return value provides us with a CSS selector to identify that new element.
appendChildCSS(page, "<p>Paragraph 5</p>")
[1] ":nth-child(5)"
When we specify a child to move, with a CSS selector in childRef, the return value gives the new position in the page, not the old position, so the returned CSS selector will not be the same as the childRef. Furthermore, the CSS selector that is returned is generated using the css-selector-generator javascript library, which attempts to produce succinct CSS selectors, so it is difficult to predict the format of the CSS selector result.
appendChildCSS(page, childRef="p")
[1] ":nth-child(5)"
The 'DOM' package has so far only implemented a tiny part of the DOM interface, but several of the most common operations are possible: adding elements, removing elements, replacing elements, selecting elements (by ID or tag or class), setting attributes, etc.
In each case, where it makes sense, arguments are provided to allow HTML elements to be specified as either HTML code (for new elements) or CSS selectors (for existing elements). For example, it is possible to use replaceChild to replace an existing element with a new element ...
replaceChild(page, newChild="<p>Paragraph 6</p>", oldChildRef="p:nth-child(2)")
[1] "<p>Paragraph 4</p>"
... or to replace an existing element with another existing element ...
replaceChild(page, newChildRef="p:nth-child(3)", oldChildRef="p:nth-child(4)")
[1] "<p>Paragraph 5</p>"
Also, where it makes sense, there are function variations that allow the return value to be either HTML code or CSS selectors. For example, it is possible to get the results of getElementsByTagName as HTML code ...
getElementsByTagName(page, "p")
[1] "<p>Paragraph 1 <span>text</span></p>" "<p>Paragraph 6</p>"
[3] "<p>Paragraph 2 </p>" "<p>Paragraph 3 </p>"
... or as CSS selectors.
getElementsByTagNameCSS(page, "p")
[1] "body > :nth-child(1)" "body > :nth-child(2)" ":nth-child(3)" ":nth-child(4)"
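The sequence of calls made since the three-paragraph page was created can be traced with a toy model in plain R that records only the order of the <p> elements. This is a bookkeeping sketch, not a use of the 'DOM' package: moveFirstToEnd is a hypothetical helper, and the model assumes that the selector "p" matches the first paragraph and that p:nth-child(n) picks the nth paragraph, which appears to hold here because the body contains only <p> elements.

```r
# Toy model: the body as an ordered vector of paragraph numbers.
moveFirstToEnd <- function(x) c(x[-1], x[1])

body <- c("1", "2", "3")       # htmlPage(paste(...))
body <- moveFirstToEnd(body)   # appendChild(page, childRef="p")      2 3 1
# The two <span> calls change paragraph content, not paragraph order.
body <- c(body, "4")           # appendChild(page, child=...)         2 3 1 4
body <- moveFirstToEnd(body)   # appendChild(page, childRef="p")      3 1 4 2
body <- c(body, "5")           # appendChildCSS(page, ...)            3 1 4 2 5
body <- moveFirstToEnd(body)   # appendChildCSS(page, childRef="p")   1 4 2 5 3
body[2] <- "6"                 # replaceChild(newChild=...)           1 6 2 5 3

# replaceChild(newChildRef="p:nth-child(3)", oldChildRef="p:nth-child(4)"):
# the third paragraph moves into the place of the fourth.
replaced <- body[4]
body[4] <- body[3]
body <- body[-3]               #                                      1 6 2 3
body
# [1] "1" "6" "2" "3"
```

The final order (1, 6, 2, 3) and the replaced paragraph (5) agree with the getElementsByTagName and replaceChild results shown above.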
In addition to interacting with a web page that is initialised from R, with htmlPage, it is possible to interact with a web page that already exists. The filePage function can be used to open a web page from the local filesystem.
page <- filePage(system.file("HTML", "RDOM.html", package="DOM"))
Adding and removing content works just like before.
removeChild(page, "p")
appendChild(page, "<p>A new paragraph</p>")
The urlPage function opens a web page from the given URL (but only for the http: protocol currently).
page <- urlPage("http://pmur002.neocities.org/index.html")
removeChild(page, "p")
appendChild(page, "<p>A new paragraph</p>")
These functions allow us to manipulate a web page that we did not create (or do not want to have to go to the effort of creating).
The extra complication with these functions is that they only work if, in addition to having the 'DOM' package installed for R, we have installed the 'RDOM.user.js' user script in the browser. This script is included with the 'DOM' package, but must be manually installed, for example, using the greasemonkey plug-in for Firefox. Furthermore, the settings for the user script will have to be modified to enable access to specific URLs (see the @include rules in 'RDOM.user.js').
It is also possible to manipulate a web page using a headless browser (PhantomJS). This works for each of htmlPage, filePage, and urlPage (without the need for a user script), by specifying headless=TRUE.
This somewhat ruins the point of dynamically modifying a web page because, with a headless browser, we cannot watch the changes being made to the web page on screen. However, the headless browser is extremely useful for testing.
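As a sketch of how the headless browser might be used in a test, the function below opens a page headlessly, appends a paragraph, and checks the return value described earlier. It assumes that the 'DOM' package and PhantomJS are both installed; the function name checkAppend is hypothetical, and the function is only defined here, not called.

```r
# Hypothetical test function; requires the 'DOM' package and PhantomJS.
checkAppend <- function() {
    if (!requireNamespace("DOM", quietly = TRUE))
        stop("the 'DOM' package is not installed")
    page <- DOM::htmlPage("<p>Hello world!</p>", headless=TRUE)
    # Make sure the page is closed even if the check fails
    on.exit(DOM::closePage(page))
    result <- DOM::appendChild(page, "<p>Goodbye world!</p>")
    # appendChild returns the HTML code for the child that was appended
    stopifnot(identical(result, "<p>Goodbye world!</p>"))
    invisible(TRUE)
}

# To run the check (not run here):
# checkAppend()
```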
Because R is communicating with the web browser via a websocket, it is also possible for the web browser to send requests to R, for example, in response to a user event, such as a mouse click.
The 'DOM' package defines a single javascript function for this purpose: RDOM.Rcall. This function must, of course, be used in javascript code within the web page, but if we wish we can create that javascript and insert it in a web page using the R functions previously described.
The RDOM.Rcall function takes three arguments: the name of an R function to call, a pointer to an HTML element, and a javascript function (a callback) to run once the call to R has completed. The R function given as the first argument to RDOM.Rcall must take 2 arguments. The first argument will contain HTML code for the HTML element that was given as the second argument to RDOM.Rcall and the second argument will contain a CSS selector for that HTML element.
The following code provides a demonstration. First, we define an R function, echo, that takes two arguments and prints them to the screen. Next, we open a browser window and append a paragraph with a <span> element embedded in it. Finally, we set the onclick attribute of the <span> element to be a call to RDOM.Rcall, with the name of our R function, "echo", as the first argument, the span element itself (this) as the second argument, and no callback.
echo <- function(elt, css) {
    cat("\n")
    cat("HTML:", elt, "\n")
    cat("CSS :", css, "\n")
}
page <- htmlPage()
appendChild(page, "<p>This text contains a <span>special</span> word</p>")
setAttribute(page, eltRef="span", attrName="onclick", attrValue='RDOM.Rcall("echo", this, null)')
If we now click on the word "special" in the web browser, the R function echo is called and we get the following output in the R console:
HTML: <span onclick="RDOM.Rcall("echo", this, null)">special</span>
CSS : span
It is important to note that the call from the web browser to R is asynchronous. R is not blocked waiting for the call to the echo function. It is possible to include a request to the browser in the R function that is called from the browser (e.g., use R to modify an element in response to a mouse click in the browser), but in that case, it is essential that the request from R is also asynchronous (using the async argument that was described earlier).
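A handler of that kind might look like the following sketch. The function name addClickNote is hypothetical, and the code assumes that the 'DOM' package is attached and that page holds the identifier returned by an earlier htmlPage call; only the appendChild arguments shown in the argument list earlier are used. The handler is defined here but nothing calls it until the browser does.

```r
# Handler to be called from the browser via RDOM.Rcall; as described
# above, it receives the element's HTML code and a CSS selector for it.
addClickNote <- function(elt, css) {
    # 'page' is assumed to exist (from htmlPage) when the handler runs.
    # async=TRUE so that R does not block waiting for the browser while
    # it is already servicing a call that came from the browser.
    appendChild(page,
                child="<p>The span was clicked</p>",
                async=TRUE)
}

# Attaching the handler would follow the same pattern as for echo (not run):
# setAttribute(page, eltRef="span", attrName="onclick",
#              attrValue='RDOM.Rcall("addClickNote", this, null)')
```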
The 'DOM' package provides a tool for generating and modifying the content of a web page on-the-fly. It does this through a web socket connection to a web browser, which allows R to send requests to the browser and allows the browser to send responses or even requests back to R.
In effect, the 'DOM' package uses a web browser as an interactive output device. We can write R code to produce output that is rendered by the browser. Furthermore, the browser can capture user events that occur on the output and call back to R.
Future development of the package will be aimed at allowing the generation and modification of SVG and CSS output so that the browser can also act as an interactive device for graphics and/or a mixture of graphics and textual content.
Several excellent packages already existed for manipulating web page content, but they did not provide exactly the right set of features: The 'XML' package, and more recently 'xml2', provide functions for manipulating XML and HTML content, but the document being modified is not associated with a web browser so changes are not dynamically visualised; packages such as 'Rapache' and 'Rook' allow R to act as a web server, but with a focus on supplying content on request from a web browser, not to allow R to drive the web browser; 'RSelenium' allows R to drive a web browser, but R cannot receive callbacks from the browser on user events; 'Shiny' allows us to create web content, including interactive elements that call back to R, but only via a higher-level framework that does not provide the level of fine control that we need. A very recent addition is the 'fiery' package, which is lower-level than 'Shiny', but very general-purpose. We may in the future explore 'fiery' as a possible basis for 'DOM' to build on (instead of 'httpuv').
The 'DOM' package (currently) has several important limitations: only a tiny fraction of the DOM interface has been implemented so far; the package has mostly only been tested on Linux, with Firefox and greasemonkey (there has been one successful test of htmlPage on Windows, with Chrome); and the 'DOM' package is only aimed at the case where R and the browser are running together on the same machine. Furthermore, the 'DOM' package is aimed at a single-user, single R session scenario (e.g., it is not difficult to clobber yourself by running two R sessions that make use of the same port for their websocket).
The 'DOM' package allows us to open a web page in a browser and to manipulate the content of the web page dynamically from R. It is also possible to arrange for R code to be run in response to user events in the browser.
The examples and discussion in this document relate to version 0.1 of the 'DOM' package.
This report was generated on Ubuntu 14.04 64-bit running R version 3.3.1 (2016-06-21) and PhantomJS version 1.9.0.
"CSS Selector Generator", Riki Fridrich, https://github.com/fczbkk/css-selector-generator, date visited: 2016-07-26.
"Document Object Model (DOM)", World Wide Web Consortium (W3C), https://www.w3.org/DOM/, date visited: 2016-07-27.
"DOM", Web Hypertext Application Technology Working Group (WHATWG), https://dom.spec.whatwg.org/, date visited: 2016-07-27.
"Document Object Model (DOM)", Mozilla Developer Network (MDN), https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model, date visited: 2016-07-27.
"fiery: A Lightweight and Flexible Web Framework", Thomas Lin Pedersen, https://cran.r-project.org/web/packages/fiery/index.html, https://github.com/thomasp85/fiery, date visited: 2016-07-26.
"htmltools: Tools for HTML", RStudio, Inc., https://cran.r-project.org/web/packages/htmltools/index.html, https://github.com/rstudio/htmltools, date visited: 2016-07-26.
"httpuv: HTTP and WebSocket Server Library", RStudio, Inc., https://cran.r-project.org/web/packages/httpuv/index.html, https://github.com/rstudio/httpuv, date visited: 2016-07-26.
"Rapache: R embedded inside Apache", Jeffrey Horner, https://github.com/jeffreyhorner/rapache, http://www.rapache.net/, date visited: 2016-07-26.
"Rook: a web server interface for R", Jeffrey Horner, https://cran.r-project.org/web/packages/Rook/index.html, date visited: 2016-07-26.
"RSelenium: R bindings for Selenium WebDriver", John Harrison, https://cran.r-project.org/web/packages/RSelenium/, http://ropensci.github.io/RSelenium, date visited: 2016-07-26.
"shiny: Web Application Framework for R", Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson, https://cran.r-project.org/web/packages/shiny/index.html, http://shiny.rstudio.com/, date visited: 2016-07-26.
"xml2: Parse XML", Hadley Wickham, James Hester, and Jeroen Ooms, https://cran.r-project.org/web/packages/xml2/index.html, https://github.com/hadley/xml2/, date visited: 2016-07-26.
"XML: Tools for Parsing and Generating XML Within R and S-Plus", Duncan Temple Lang and the CRAN Team, https://cran.r-project.org/web/packages/XML/index.html, http://www.omegahat.net/RSXML, date visited: 2016-07-26.
"xtable: Export Tables to LaTeX or HTML", David B. Dahl and David Scott, https://cran.r-project.org/web/packages/xtable/index.html, http://xtable.r-forge.r-project.org/, date visited: 2016-07-26.
An Introduction to the 'DOM' Package by Paul Murrell is licensed under a Creative Commons Attribution 4.0 International License.