by Paul Murrell http://orcid.org/0000-0002-3224-8858
Wednesday 09 November 2016
'DOM' version 0.4 by Paul Murrell is licensed under a Creative Commons Attribution 4.0 International License.
This report describes changes in version 0.4 of the 'DOM' package for R. The main change in this version is the addition of new functions that allow control over the Cascading Style Sheet (CSS) content of a web page. This provides programmatic control over the styling of HTML and SVG content on a page.
For demonstration purposes, we will work with a web page consisting of a single paragraph (a more complex example is provided later).
library(DOM)
page <- htmlPage("<p>A paragraph</p>")
Because we will be working with this paragraph multiple times, the following code creates a pointer to the paragraph element. We will be able to use this to refer to the paragraph from now on.
p <- getElementsByTagName(page, "p", response=nodePtr())
A simple way to use CSS styling on an element on a web page is to define a style attribute for the element. The existing setAttribute function in the 'DOM' package already provides support for this. The following code sets the style attribute for the paragraph so that the text turns red.
setAttribute(page, p, "style", "color: red")
However, this setAttribute approach is heavy-handed and does not provide fine control over the CSS styling because the entire style attribute has to be specified. For example, the following modification of the CSS styling replaces the previous setting; the text is now italic, but it is no longer red.
setAttribute(page, p, "style", "font-style: italic")
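The difference between replacing a whole style attribute and updating one declaration can be sketched in base R, without a browser. This is only a model of the idea: the helper names parseStyle and formatStyle are hypothetical and are not part of the 'DOM' package.

```r
# Hypothetical helpers (NOT part of 'DOM'): treat a style attribute as text.
parseStyle <- function(text) {
    # Split "name: value; name: value" into individual declarations
    decls <- trimws(strsplit(text, ";", fixed = TRUE)[[1]])
    decls <- decls[nzchar(decls)]
    # Separate each declaration at its first colon
    propNames <- trimws(sub(":.*$", "", decls))
    propValues <- trimws(sub("^[^:]*:", "", decls))
    setNames(propValues, propNames)
}

formatStyle <- function(style) {
    # Rebuild the attribute text from a named character vector
    paste(paste0(names(style), ": ", style), collapse = "; ")
}

# Replacing the whole attribute value discards the earlier declaration
attrValue <- "color: red"
attrValue <- "font-style: italic"

# Updating a single declaration keeps the others
style <- parseStyle("color: red")
style["font-style"] <- "italic"
merged <- formatStyle(style)
```

In this model, attrValue ends up holding only the italic declaration, whereas merged holds both declarations.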
Another way to access the CSS styling on an element is through the style property of the element. In version 0.4 of 'DOM' there are two new functions, getProperty and setProperty, that allow us to access and modify element properties. The following code gets the style property for the paragraph.
style <- getProperty(page, p, "style")
style
An object of class "DOM_CSSStyleDeclaration_ptr"
[1] "1"
Slot "pageID":
[1] 1
The result is a DOM_CSSStyleDeclaration_ptr. Compare that result to what we get from getAttribute (another new function in version 0.4), which is just a character vector.
getAttribute(page, p, "style")
[1] "font-style: italic"
With getProperty, we get a pointer to a style object, rather than just the text value for a style attribute. The advantage of the style object is that we can access and set individual properties of that object. For example, the following code accesses the font-style property of the paragraph style.
getProperty(page, style, "font-style")
[1] "italic"
The following code sets the color property of the style. The advantage of this, compared to setting an attribute, is that we only set the color property of the style; the font-style property (italic) is untouched.
setProperty(page, style, "color", "red")
There is also a shorthand for getting and setting properties.
p$style$color
[1] "red"
p$style$color <- "green"
In summary, with the new ability to get and set properties, we can easily access and modify individual CSS properties within the style property of an HTML element on a web page.
Another way to use CSS styling on an element is to add a style sheet to the web page, with a CSS rule that targets the element. For examples in this section, we will start with a fresh page (because CSS styling via a style sheet has a lower priority than inline CSS styling via a style attribute).
page <- htmlPage("<p>A paragraph</p>")
A style sheet can be added to a page by adding a <style> element to the <head> element of the web page. Another option would be to add a <link> element (to point to an external style sheet). The existing appendChild function can do this for us.
appendChild(page, htmlNode('<style type="text/css">p { color: red; }</style>'), parent=css("head"))
The style sheet consists of zero or more rules. In this case, there is a single rule:
p { color: red; }
Each rule consists of a selector and zero or more style declarations. The selector specifies the target of the rule (in this case, the selector p means that the rule will apply to all <p> elements in the page) and the style declarations have the same format as in the style attribute of an element: a CSS property name, followed by a colon, followed by a CSS property value (with a semi-colon between multiple style declarations).
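The anatomy of a rule described above can be illustrated with a small base R sketch that splits the text of a simple rule into its selector and its style declarations. The function parseRule is hypothetical (it is not part of 'DOM') and it assumes a single, simple rule with no nested braces or comments.

```r
# Hypothetical helper (NOT part of 'DOM'): split a simple CSS rule into parts.
parseRule <- function(rule) {
    # Text before the opening brace is the selector
    selector <- trimws(sub("[{].*$", "", rule))
    # Text between the braces holds the style declarations
    body <- sub("^[^{]*[{]", "", rule)
    body <- sub("[}][^}]*$", "", body)
    decls <- trimws(strsplit(body, ";", fixed = TRUE)[[1]])
    decls <- decls[nzchar(decls)]
    # Each declaration is "property name : property value"
    list(selector = selector,
         declarations = setNames(trimws(sub("^[^:]*:", "", decls)),
                                 trimws(sub(":.*$", "", decls))))
}

rule <- parseRule("p { color: red; }")
rule$selector
rule$declarations

# A rule may have zero declarations
emptyRule <- parseRule("p { }")
length(emptyRule$declarations)
```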
We can add more than one style sheet to a page and we can remove style sheets (with removeChild), but, as with style attributes, this is heavy-handed and does not allow fine control of the details of a style sheet.
The new styleSheets function provides access to the current style sheets on a page. The result is a DOM_CSSStyleSheet_ptr, which is one or more pointers to the style sheet objects in the browser.
sheets <- styleSheets(page)
sheets
An object of class "DOM_CSSStyleSheet_ptr"
[1] "0"
Slot "pageID":
[1] 2
Having access to these style sheet objects is useful because we can use them with the new insertRule and deleteRule functions to add/remove individual rules to/from a style sheet.
For example, the following code adds a new CSS rule, that also applies to <p> elements, so that the paragraph text becomes italic as well as red.
insertRule(page, sheets[1], "p { font-style: italic; }", 0)
However, adding and removing entire rules is still a fairly coarse level of control. Even better would be control of the components of a rule: the selector and the style declarations.
The cssRules property of a style sheet produces a DOM_CSSRule_ptr object: a vector of pointers to individual CSS rules. In this case, there are two CSS rules in the style sheet.
sheets[1]$cssRules
An object of class "DOM_CSSRule_ptr"
[1] "1" "2"
Slot "pageID":
[1] 2
We can access the style property of a CSS style rule and that gives us a DOM_CSSStyleDeclaration_ptr (just like we got from accessing the style property of an HTML element). We can then get and set the properties of that object to access and modify the style declarations in the CSS rule in the style sheet.
In the following code, we are using CSS rule number 2 to get the rule that controls color because the rule that we inserted above to control font-style was inserted at index 0 (i.e., BEFORE the color rule that was already in the style sheet).
sheets[1]$cssRules[2]$style$color
[1] "red"
sheets[1]$cssRules[2]$style$color <- "green"
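The indexing used above mixes two conventions: the index given to insertRule counts from zero, while the subsetting of cssRules in R counts from one. The following base R sketch models a style sheet as a plain character vector of rule text to show why the color rule ends up in position 2. The function insertAt is hypothetical and only imitates the ordering; it is not how 'DOM' is implemented.

```r
# Hypothetical model (NOT 'DOM' code): a style sheet as a vector of rule text.
insertAt <- function(rules, rule, index) {
    # 'index' is zero-based: the new rule goes after the first 'index' rules
    append(rules, rule, after = index)
}

# The style sheet starts with the color rule only
rules <- "p { color: red; }"
# Inserting at index 0 places the new rule before the existing one
rules <- insertAt(rules, "p { font-style: italic; }", 0)

rules[1]  # the inserted font-style rule (R subsetting is one-based)
rules[2]  # the original color rule
```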
The function propertyNames can be used to get the names of all properties in a style declaration. This does not correspond to a DOM method; it is just a convenience function.
propertyNames(page, sheets[1]$cssRules[2]$style)
[1] "color"
We can remove an existing property from a style declaration with the removeProperty function.
removeProperty(page, sheets[1]$cssRules[2]$style, "color")
[1] "green"
propertyNames(page, sheets[1]$cssRules[2]$style)
character(0)
It is also possible to access the selector for a CSS rule, but this cannot be modified; if we want a rule to control a different target, we should make a new rule.
sheets[1]$cssRules[2]$selectorText
[1] "p"
Similarly, we can view (but not edit) the full text for a CSS rule via the cssText property.
sheets[1]$cssRules[2]$cssText
[1] "p { }"
In summary, several new functions, combined with the ability to get and set properties, allow us to access and modify entire style sheets for a web page. This means that we can programmatically control the appearance of entire sets of elements at once.
Most of the examples so far have involved working with a ready-made element with a style attribute or working with a ready-made style sheet. This section briefly demonstrates how to build a style sheet for a web page from the ground up.
We will again start with a web page containing a single paragraph and no CSS styling.
page <- htmlPage("<p>A paragraph</p>")
The first step is to create an empty style sheet. We can do this by creating an empty <style> element and adding that to the page.
styleElement <- createElement(page, "style")
appendChild(page, styleElement, parent=css("head"))
An object of class "DOM_node_HTML"
[1] "<style></style>"
We can access the style sheet via the sheet property of the <style> element. The first thing we do with the style sheet is disable it so that we can build it up without affecting the page.
styleSheet <- styleElement$sheet
styleSheet$disabled <- TRUE
The next step is to add an empty rule to the style sheet. This allows us to specify just the selector for the rule.
insertRule(page, styleSheet, "p { }", 0)
[1] 0
We now create a short-cut to the new rule, to save on typing, and add style declarations to the rule.
rule1 <- styleSheet$cssRules[1]
rule1$style$color <- "red"
rule1$style$"font-style" <- "italic"
The last step is to enable the style sheet so that it can have an effect on the contents of the page.
styleSheet$disabled <- FALSE
All of the examples so far have involved styling HTML elements. Styling SVG elements is very similar, but with the added complication that individual SVG elements have presentation attributes in addition to a style attribute.
For an HTML element, a style declaration in the style attribute will override any style declarations in a style sheet that target the element.
For an SVG element, a style declaration in the style attribute will override any style declarations in a style sheet that target the element, which in turn will override any presentation attributes on the SVG element.
The following code demonstrates these rules. First of all, we add an SVG image to the page and then we set the presentation attribute fill on the element (so that it is filled blue).
appendChild(page,
            svgNode('<svg xmlns="http://www.w3.org/2000/svg" width="50" height="50"> <circle id="c" r="50"/> </svg>'),
            ns=TRUE)
circle <- getElementById(page, "c", response=nodePtr())
setAttribute(page, circle, "fill", "blue")
Now if we add a style sheet to the page that targets that SVG element and has a style declaration for fill, it overrides the element's presentation attribute, and the circle turns green.
appendChild(page, htmlNode("<style>#c { fill: green }</style>"), parent=css("head"))
An object of class "DOM_node_HTML"
[1] "<style>#c { fill: green }</style>"
Finally, if we add a style attribute to the SVG element, that overrides both the style sheet and the element's presentation attribute, and the circle turns red.
setAttribute(page, circle, "style", "fill: red")
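The order of precedence demonstrated in this section can be summarised with a small base R sketch. The function effectiveValue is a hypothetical model, not part of 'DOM', and it deliberately ignores other parts of the CSS cascade (such as selector specificity, !important, and inheritance); it only encodes the three sources discussed here.

```r
# Hypothetical model (NOT 'DOM' code) of the precedence described above.
effectiveValue <- function(presentationAttr = NULL,
                           styleSheet = NULL,
                           styleAttr = NULL) {
    # Sources listed from highest to lowest precedence
    sources <- list(styleAttr, styleSheet, presentationAttr)
    # Keep only the sources that actually supply a value
    supplied <- Filter(Negate(is.null), sources)
    if (length(supplied) == 0) return(NULL)
    supplied[[1]]
}

# Presentation attribute only: the circle is blue
step1 <- effectiveValue(presentationAttr = "blue")
# A style sheet rule overrides the presentation attribute: green
step2 <- effectiveValue(presentationAttr = "blue", styleSheet = "green")
# A style attribute overrides both: red
step3 <- effectiveValue(presentationAttr = "blue", styleSheet = "green",
                        styleAttr = "red")
```

For an HTML element, the same model applies with the presentation attribute left unset.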
This section provides a brief demonstration of the new 'DOM' features on a more realistically sized example. The following code creates a web page and adds a 'lattice' plot to the page as SVG content.
library(lattice)
library(gridSVG)
library(DOM)
library(XML)
xyplot(mpg ~ disp, mtcars, pch=16, cex=2)
svg <- grid.export(NULL)$svg
page <- htmlPage()
appendChild(page, svgNode(saveXML(svg)), ns="SVG")
The following code adds and builds a style sheet for the page that modifies the styling of the data symbols in the plot (so that they all turn red). This code takes advantage of the fact that all data symbols in the SVG that is generated by 'gridSVG' are <use> elements.
styleElement <- createElement(page, "style")
appendChild(page, styleElement, parent=css("head"))
styleSheet <- styleElement$sheet
insertRule(page, styleSheet, "use { }", 0)
styleSheet$cssRules[1]$style$fill <- "red"
Previous sections have focused on the new user-facing features in version 0.4 of the 'DOM' package: how the package should work when we are using it to perform actions that the package is designed to support. This section looks at some of the lower-level and internal changes to the package, which can still affect us if we attempt to use the package for tasks that are not explicitly supported.
The previous version of the 'DOM' package introduced the idea of DOM node pointers, which are R objects that contain a pointer to a DOM node object (an HTML or SVG element) in the browser. This idea has been generalised in version 0.4 to allow for pointers to any DOM object. In addition to things like HTML elements, we can now have an R object that points to, for example, a style sheet object within the browser.
The complete class hierarchy for 'DOM' version 0.4 is shown below.
There is now a DOM_obj root class that subsumes the existing DOM_node class. DOM objects are either a primitive type (DOM_string, DOM_number, or DOM_boolean), or something more complex (DOM_obj_ref). For everything other than HTML and SVG nodes, a DOM object in R is a pointer to a DOM object in the browser (DOM_obj_ptr). Some DOM objects (so far, only ones related to CSS) have their own specific representation in R (e.g., DOM_CSSStyleSheet_ptr).
Because we can now represent any DOM object in R, and we can get and set properties on these objects, it is now possible to access any part of the web page (as long as it is accessible via object properties). For example, if a web page has a <style> element, we can access the style sheet object relating to that element through its sheet property.
page <- htmlPage()
appendChild(page, htmlNode('<style id="s1"/>'), parent=css("head"))
An object of class "DOM_node_HTML"
[1] "<style id=\"s1\"></style>"
styleElement <- getElementById(page, "s1", response=nodePtr())
styleElement$sheet
An object of class "DOM_CSSStyleSheet_ptr"
[1] "1"
Slot "pageID":
[1] 5
In the above example, there is a specific class to represent the DOM object (in this case, DOM_CSSStyleSheet_ptr), but the 'DOM' package does not have a specific representation for all possible DOM objects. For example, the following code adds a CSS media rule to the style sheet of the page and then attempts to access the media property of this rule.
insertRule(page, styleElement$sheet, "@media screen { body { background-color: #AAA } }", 0)
[1] 0
styleElement$sheet$cssRules[1]$media
An object of class "DOM_obj_ptr"
[1] "3"
Slot "pageID":
[1] 5
The result is a generic DOM_obj_ptr. The 'DOM' package does not know exactly what sort of DOM object this is and this has two consequences. First, the 'DOM' package does not know anything about the properties of the object, so it provides less protection against doing something silly like trying to set a read-only property. This should only result in an error, so it is not a big problem, though in some cases it might just silently not do anything, which is more dangerous.
Second, and a larger problem, the 'DOM' package may not provide functions corresponding to the methods of the object. For example, the 'DOM' package knows about CSSStyleSheet objects, so it provides an insertRule function to mirror the method of that name for CSSStyleSheet objects. But the 'DOM' package does not have a specific representation for media list objects (which is what we have accessed in the code above) and that is reflected in the fact that there are no functions for working with media list objects in the 'DOM' package.
Future versions of the 'DOM' package may expand the set of supported DOM objects to cover some of these holes.
In the meantime, the generalised access to DOM objects and their properties does still allow a much greater scope for exploring and interacting with a web page.
The example below shows that, even though 'DOM' does not make a distinction between CSS style rules and CSS media rules, we can access the CSS style rule within the CSS media rule just through the properties of the DOM_obj_ptr object.
styleElement$sheet$cssRules[1]$cssRules[1]$style$"background-color"
[1] "rgb(170, 170, 170)"
The following code shows that we can now extract the complete HTML code for a page just through object properties. As an interesting side note, the result demonstrates that modifications to the DOM style sheet object are NOT reflected in the HTML content of the web page.
body <- getElementsByTagName(page, "body", response=nodePtr())
body$parentNode$outerHTML
[1] "<html><head><style id=\"s1\"></style></head><body></body></html>"
Version 0.4 of the 'DOM' package introduces several new classes and functions for working with CSS content in a web page. The most important change is the ability to get and set properties on DOM objects in the browser, including style properties on individual HTML and SVG elements. In addition, the styleSheets function provides access to style sheets on a web page, the insertRule function allows CSS rules to be added to a style sheet, and then the ability to get and set properties allows us to control the style properties of CSS rules within style sheets. The overall result is the ability to programmatically control the styling of HTML and SVG content in a web page.
The examples and discussion in this document relate to version 0.4 of the 'DOM' package.
This report was generated within a Docker container (see Resources section below).