Navigating the Tree

This slide set builds on the previous XSLT tutorials by introducing XPath, a language for navigating an XML tree. XPath is particularly important for XSLT and publication because it lets you select specific elements according to their context. For example, perhaps you want to render “quote” differently when it occurs in “epigraph” than when it occurs in “p.” XPath lets you specify the context for a given element, which makes transformations more nuanced.

Identifying the element to match: simple identification

We've already seen examples of the simple case where we name an element in the value of match, and thereby we identify an element to be processed; here we say "please match the TEI element"

We're not talking here about the flow of the stylesheet: we're just saying "when my turn comes, if there's an XX, I match it..."

In the previous tutorials, we saw examples of XSLT stylesheets where we name an element in the value of match, and thereby identify an element to be processed; in these cases we say "please match the TEI element."

In those cases, we were not concerned with the flow of the stylesheet or with the context of the element we were trying to match. We were simply saying "when my turn comes, if there's an element XX, please match it..."

The example here, in blue, says "match the TEI element."
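A minimal sketch of a complete stylesheet built around this template, assuming XSLT 1.0 and, as in the tutorial's examples, unprefixed element names (so the input elements are assumed to be in no namespace). The HTML wrapper is illustrative only:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Whenever the processor reaches a TEI element, this template fires -->
  <xsl:template match="TEI">
    <html>
      <body>
        <!-- Go on to process the children of the TEI element -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```
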

Identifying the element to match: multiples

Here's an example where naming an element actually identifies multiple instances of that element: in this case, all the div elements in the document

We can also specify multiple different elements that we want to target; the or-bar means "this or that", "head or opener"

There are also instances in which the match attribute will identify multiple elements in the document. For instance, in the first scenario, the match attribute matches all of the divs within the document.

The second example shows how to select several different elements, not just multiple instances of the same element. The or-bar specifies "this or that." In this case, either a head or an opener element should use this template. As in the first example, this will match head or opener elements anywhere in the document.

In either of these cases we might want to be more specific and say, for example, that I only want the div elements that appear within the front element, or I only want head or opener elements if they appear in the teiHeader; XPath provides a syntax for doing this.
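As a sketch, both kinds of pattern from the examples might look like this in one stylesheet (XSLT 1.0, unprefixed element names as in the tutorial; the HTML elements and class names are hypothetical):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Matches every div, wherever it occurs in the document -->
  <xsl:template match="div">
    <div class="section">
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- The or-bar (|) lets one template handle head OR opener elements -->
  <xsl:template match="head | opener">
    <p class="heading">
      <xsl:apply-templates/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

A div nested inside another div matches too, since match="div" says nothing about where the element sits.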

Identifying the element to match: context

And we can also specify particular contexts we're interested in: for instance, only the div elements that appear within front

We can specify elements by the context in which they appear. In this example, we are selecting only the div elements that are within front, and only the head elements that are within divs within front.
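A minimal sketch of these two context patterns (XSLT 1.0, unprefixed element names as in the tutorial; the output elements and class names are hypothetical):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Only div elements whose parent is front -->
  <xsl:template match="front/div">
    <section class="front-matter">
      <xsl:apply-templates/>
    </section>
  </xsl:template>

  <!-- Only head elements inside such a div -->
  <xsl:template match="front/div/head">
    <h2 class="front-heading">
      <xsl:apply-templates/>
    </h2>
  </xsl:template>

</xsl:stylesheet>
```

The pattern front/div requires the div to be a direct child of front. A div in body or back does not match, so it falls through to any more general template or to XSLT's built-in rules.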

By chaining element names into paths like these, we can reach any element in the input document tree.

Some other useful patterns: attribute values, order of siblings

We can also select contexts based on particular attribute values (similar to CSS selectors): basically saying "I'm looking at all the divs, and of those I only want the ones that have this attribute, and this value"

Also based on which sibling we are talking about: e.g. we only want the first p child in a given context.

We can also select elements based on their attribute values. In the first example, what we're saying is "We're looking at all the div elements, but we only want the ones that have this particular attribute with this particular value." The syntax for this is shown in blue, and you can substitute any element, attribute, or value: match="element[@att='value']". Note that the value inside the square brackets must be in single quotes: the match attribute itself is already delimited by double quotes, so a double quote inside it would end the attribute early and make the stylesheet an ill-formed document.
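A sketch of an attribute-value pattern, assuming XSLT 1.0 and unprefixed element names as in the tutorial. The value 'chapter' is a hypothetical example; use whatever values your own @type attributes carry:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Only div elements whose type attribute is exactly 'chapter'.
       Single quotes inside, because match="..." already uses double quotes -->
  <xsl:template match="div[@type='chapter']">
    <section class="chapter">
      <xsl:apply-templates/>
    </section>
  </xsl:template>

  <!-- Every other div -->
  <xsl:template match="div">
    <section>
      <xsl:apply-templates/>
    </section>
  </xsl:template>

</xsl:stylesheet>
```

When a div has type="chapter", both templates match, but XSLT gives a pattern with a predicate a higher default priority (0.5, versus 0 for a bare element name), so the more specific template wins.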

In the second example, we are specifying the matched element based on its position relative to its sibling elements. Adding a number in square brackets after the element name in match lets you specify which occurrence you want. So, for example, match="p[1]" matches only a p that is the first p child of its parent, that is, the first paragraph within a given element.
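A sketch of a positional pattern (XSLT 1.0, unprefixed element names as in the tutorial; the class name "first" is hypothetical):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- A p that is the first p child of its parent -->
  <xsl:template match="p[1]">
    <p class="first">
      <xsl:apply-templates/>
    </p>
  </xsl:template>

  <!-- All remaining p elements -->
  <xsl:template match="p">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

The predicate gives p[1] a higher default priority than plain p, so the first paragraph in each parent gets the first template and all the others get the second.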

How to suppress parts of the input tree

We can also suppress whole branches of the input tree, basically telling the XSLT processor to ignore them

We do this by providing a template that matches the parent element for that branch, but with no "apply templates" instruction

You can suppress parts of the input tree by writing an xsl:template whose match attribute names the element you want to suppress, and leaving the content of that template empty. Because there is no apply-templates instruction, the processor does not go on to process this part of the input tree (the matched element and all of its descendants), and nothing is output for it. You can either use the syntax given, with separate start and end tags, or the shorter empty-element form, <xsl:template match="something"/>.

In the example, the template suppresses the teiHeader and all of its descendants. You might do this if the metadata in the teiHeader is not relevant to the final output document, such as an HTML page.
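A minimal sketch of suppression in context, assuming XSLT 1.0 and unprefixed element names as in the tutorial (the HTML wrapper is illustrative):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="TEI">
    <html>
      <body>
        <!-- Processes both children of TEI: teiHeader and text -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- Empty template: nothing is output for teiHeader, and with no
       xsl:apply-templates its descendants are never processed -->
  <xsl:template match="teiHeader"/>

</xsl:stylesheet>
```

Without the empty template, XSLT's built-in rules would copy all the text inside teiHeader into the output, which is usually why this step is needed.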

More complex navigation: the context node

We've been doing some simple navigation so far, basically ignoring the question of where exactly we are in the tree

However, to do more complex navigation and selection of nodes for processing, we have to have a more complex understanding of how we navigate, which involves knowing where the XSLT processor thinks we are (and hence, how to get to where we want to go from there)

In an XSLT template, when we match an element, whatever else happens inside that template happens relative to the location of that element (i.e. the matched element in the input tree); that element is "the context node".

So for instance in this example, our template is selecting div as the context node, and as a result the p elements identified by the select attribute are limited to those within the context node: the p children of the context node

So far we’ve been doing some simple navigation and we've basically ignored the question of where exactly we are in the tree. However, to select nodes for more complex processing, we have to have a more complex understanding of how we navigate, which involves knowing where the XSLT processor thinks we are. This will help us know how to get to where we want to go from there.

In this example, our template is using the match attribute to select div as the context node, and as a result the p elements identified by the select attribute are limited to those within the context node: the p children of the context node.
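A sketch of the same idea (XSLT 1.0, unprefixed element names as in the tutorial; the section wrapper is illustrative):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- match makes each div, in turn, the context node -->
  <xsl:template match="div">
    <section>
      <!-- "p" is evaluated relative to the context node (this div),
           so only this div's own p children are processed, in order -->
      <xsl:apply-templates select="p"/>
    </section>
  </xsl:template>

  <xsl:template match="p">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

Because select names only p, any other children of the div (a head, say, or a nested div) are not processed by this template at all.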

Navigating from the root of the tree

If we want to navigate from the root, rather than relative to our current position, start the expression with a slash

If we want to navigate from the root, rather than the current node, we start our expression with a slash. Starting from the root node is useful if you want to apply templates to elements from a different part of the input document tree, as in the example here.
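A sketch of a root-relative path, assuming XSLT 1.0, unprefixed element names as in the tutorial, and a single TEI document (not a corpus). It uses the standard TEI location of the title, TEI/teiHeader/fileDesc/titleStmt/title:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="text">
    <body>
      <h1>
        <!-- Leading slash: start at the document root, so this reaches
             into teiHeader even though the context node is text -->
        <xsl:apply-templates select="/TEI/teiHeader/fileDesc/titleStmt/title"/>
      </h1>
      <!-- No slash: back to the children of the context node (text) -->
      <xsl:apply-templates/>
    </body>
  </xsl:template>

</xsl:stylesheet>
```
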

Parents, ancestors, descendants, and children

So far we've just been navigating down the tree in a very simple way: all of our matches and selections so far have used just the element name. It's time to gloss this notation in a bit more detail: an element name X, used all by itself in a select attribute, means "the X children of the context node," or "my X children."

If we want the descendants, not just the children, there's another notation: //

And if we want parents, there's another notation ../

And if we want ancestors, there's another (more verbose) notation

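If we want the descendants, not just the children, there's another notation: //. So in the first example we get all the p elements that are descendants of text, at any depth (highlighted in purple). Keep in mind that an expression that begins with //, such as //p, searches from the root of the document; to limit the search to descendants of the context node, start from the context node with .//p.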
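A sketch of selecting descendants rather than children (XSLT 1.0, unprefixed element names as in the tutorial; the wrapper div and its class name are hypothetical):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="text">
    <div class="all-paragraphs">
      <!-- .//p : every p below the context node (text) at any depth,
           e.g. inside front, body, and nested divs, not just children -->
      <xsl:apply-templates select=".//p"/>
    </div>
  </xsl:template>

</xsl:stylesheet>
```
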

And if we want parents, there's another notation: two periods, "..". In the second example we get the parent element of text, which is TEI (highlighted in orange).

To reach another child of the parent, we can combine the notations, as in ../teiHeader. In the third example, we go up to the parent element of text, which is TEI, and then select the teiHeader element that is a child of TEI (highlighted in green).
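A sketch of both parent notations from the context of text (XSLT 1.0, unprefixed element names as in the tutorial; the output wording is illustrative):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="text">
    <div>
      <!-- ".." is the parent of the context node: here, TEI -->
      <p>Parent element: <xsl:value-of select="name(..)"/></p>
      <!-- "../teiHeader": up to TEI, then down to its teiHeader child -->
      <p>Title: <xsl:value-of select="../teiHeader/fileDesc/titleStmt/title"/></p>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

</xsl:stylesheet>
```
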

Finally, if we want ancestors, there's another, more verbose notation that names the axis explicitly: ancestor::TEI. In the final example, we select TEI, an ancestor of p (highlighted in orange).
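A sketch using the ancestor axis, assuming XSLT 1.0 and unprefixed element names as in the tutorial. Copying the document title into a title attribute on each paragraph is only an illustration:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="p">
    <!-- ancestor::TEI walks up from this p through all of its ancestors
         and keeps the one named TEI, however many levels up it is.
         The braces make this an attribute value template. -->
    <p title="{ancestor::TEI/teiHeader/fileDesc/titleStmt/title}">
      <xsl:apply-templates/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

A fixed path such as ../.. would only work if every p sat exactly two levels below TEI; the ancestor axis works at any depth.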

Following and preceding siblings

The idea of parents and ancestors, children and descendants, takes us up and down the tree; we can think of these relationships as traversing axes that radiate out from the context node

We can also go across the tree along horizontal axes: to identify preceding and following nodes

A simple case: the following sibling...

Parents and ancestors, children and descendants take us up and down the tree. These relationships are like axes that radiate out from the context node. However, we can also move across the tree along the horizontal axes, identifying following and preceding nodes. The case in the example here is the following sibling: any p that shares the context node's parent and occurs after the context node (head).
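A sketch of the following-sibling axis (XSLT 1.0, unprefixed element names as in the tutorial; the output structure is hypothetical):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="div">
    <section>
      <!-- Process only the head; its template pulls in the paragraphs -->
      <xsl:apply-templates select="head"/>
    </section>
  </xsl:template>

  <xsl:template match="head">
    <h2><xsl:apply-templates/></h2>
    <!-- Relative to the context node (head): the p elements that share
         its parent and come after it -->
    <xsl:apply-templates select="following-sibling::p"/>
  </xsl:template>

  <xsl:template match="p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Because the div template selects only head, each paragraph is output once, from the head template. In this sketch, p elements before the head, and divs with no head, would be skipped.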

Following and preceding nodes

The more general case of "following" nodes is a little trickier, because you have to remember that an element's child doesn't "follow" it. Following nodes begin only after the context node has ended, so to find "following" elements you have to look outside the context node, in the siblings that come after it (or after one of its ancestors): in this example, the "following" div elements are those contained in the following siblings of the context node.

The more general case of "following" nodes is a bit trickier because you have to remember that an element's child doesn't "follow" it. In the XML file a child's start tag does come after its parent's start tag, but in the tree the child sits inside its parent, not after it.

For following and preceding nodes, we look only at elements that lie entirely outside the context node's boundaries, either after its end or before its start. So in the first example, all of the div elements that come after front are processed with the first template (in blue). In the second example, we look at all of the div elements that come before the body element; all of these are processed with the instructions in pink. Remember that following and preceding elements are never descendants (or ancestors) of the element specified as context.
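A sketch that counts the divs on each of these axes rather than rendering them (XSLT 1.0, unprefixed element names as in the tutorial; the output wording is illustrative):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="front">
    <!-- following::div: divs that start after front has ended
         (e.g. in body or back); front's own divs are not included -->
    <p>Divisions after the front matter:
      <xsl:value-of select="count(following::div)"/></p>
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="body">
    <!-- preceding::div: divs that ended before body started
         (e.g. in front); body's ancestors are never included -->
    <p>Divisions before the body:
      <xsl:value-of select="count(preceding::div)"/></p>
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```
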

Axes

For reference, a cribsheet for XPath

Let's take a step back and look at the axes themselves

The different axes that XPath employs are shown on this slide. In this tutorial we have covered ways to navigate these axes using XPath. Take a minute to look through this XML tree diagram, to acquaint yourself with the ways in which different XML elements relate to each other.
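As a hands-on companion to the diagram, a sketch that reports, for each div, what several axes select, writing out the full axis names with their abbreviations noted in comments (XSLT 1.0, unprefixed element names as in the tutorial; the output list is illustrative):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="div">
    <ul>
      <!-- child axis; abbreviated as plain "p" -->
      <li>p children: <xsl:value-of select="count(child::p)"/></li>
      <!-- descendant axis; .//p selects the same nodes here -->
      <li>p descendants: <xsl:value-of select="count(descendant::p)"/></li>
      <!-- parent axis; ".." is short for parent::node() -->
      <li>parent: <xsl:value-of select="name(parent::*)"/></li>
      <!-- ancestor axis; it has no abbreviation -->
      <li>div ancestors: <xsl:value-of select="count(ancestor::div)"/></li>
      <!-- horizontal sibling axes -->
      <li>later sibling divs: <xsl:value-of select="count(following-sibling::div)"/></li>
      <li>earlier sibling divs: <xsl:value-of select="count(preceding-sibling::div)"/></li>
      <!-- attribute axis; abbreviated as @type -->
      <li>type: <xsl:value-of select="attribute::type"/></li>
    </ul>
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```
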

This tutorial, part of the Transformation and Publication Primer, is complete.