A Gentle Introduction to XSLT

What does an XSLT stylesheet do?

Note: at this stage we are talking imprecisely and impressionistically about XSLT, so that we can learn one thing at a time. We’ll add precision later.

At its simplest, it takes an input file (an XML tree structure) and transforms it: it changes the input into some other form.

It does this by taking the input tree, piece by piece, and applying transformation rules to each piece. These rules say what to do with each piece of input.

We call these transformation rules "templates", and an XSLT stylesheet is basically a set of templates, each one with a condition of operation (when do I happen?) and its own specific task.

A template’s task takes place only if its condition of operation is met.

A very simple example

Here’s a very, very simple XML document, and a very, very simple stylesheet. Looking at this stylesheet, what do you think it’s going to do?
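A minimal pair of this kind might look like the following sketch. The element name <paragraph> is taken from the discussion further down; the text content, the <h1> boilerplate, and the exact shape of the HTML skeleton are assumptions made up for illustration.

```xml
<!-- Input document (hypothetical content) -->
<paragraph>Hello, world!</paragraph>
```

```xml
<!-- Stylesheet: one template, with one condition and one task -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Condition: "when I have a <paragraph> ..." -->
  <xsl:template match="paragraph">
    <!-- Task: build this little HTML document -->
    <html>
      <body>
        <!-- Boilerplate that does not come from the input (assumed) -->
        <h1>My document</h1>
        <p>
          <!-- Placeholder: content from the input flows in here -->
          <xsl:apply-templates/>
        </p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```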

OK, so let’s look at how it does this, breaking it down into steps.

Build an output tree …

The first thing that happens is that we build an output tree: a structural skeleton for the file we are creating. The output tree is made up of pieces of code that are embedded in the XSLT stylesheet.

So if we look at this diagram, we can see that the stylesheet contains a whole little HTML document.

When we run the stylesheet, we say in effect "I have a paragraph here; what do I do when I have a paragraph?"
The stylesheet answers "Aha! When I have a paragraph, I create an output tree like this..."
Notice that our output tree can include some pre-defined content (like boilerplate), but at this point there’s still no content from the input tree present in the output.

… then fill it with data

Once we have the output tree, we next have to populate it with content from the input. For this step, there’s a special part of the stylesheet that serves as a kind of conduit through which content comes from the input and gets placed in the output. We can think of this <xsl:apply-templates> component as being sort of like a variable or a placeholder within the stylesheet: it says "right here, content is going to flow from the input tree into this place in the output tree." Or maybe it says, more imperatively, "Go get that content, apply templates to it, and put it in the output tree!" In a very simple scenario like this one, the content is very simple and the templates being applied are very simple, so the process is easy to follow:

We go back to the input document, to the element that we matched earlier (i.e. <paragraph>), and we get the content from there.
Then we put it into the appropriate place in the output tree.
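To make the two steps concrete, assume (hypothetically) that the input is <paragraph>Hello, world!</paragraph>, and that the template matching it holds an HTML skeleton with an <h1> of boilerplate and an <xsl:apply-templates/> sitting inside a <p>. Under those assumptions the result would look roughly like this. The comments are annotations added here, not something the processor writes, and exact whitespace varies from one processor to another.

```xml
<html>
  <body>
    <!-- Step 1: skeleton and boilerplate, copied from the stylesheet -->
    <h1>My document</h1>
    <!-- Step 2: the <p> is from the stylesheet, the text is from the input -->
    <p>Hello, world!</p>
  </body>
</html>
```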

[Figure: http://2.bp.blogspot.com/-l23P6-iCPNc/TfDqRvDzWsI/AAAAAAAAAUw/JjaV5_QVzRg/s1600/fetchblog3.jpg]

A more detailed example

So now let’s take a closer look at the stylesheet...

Here’s another view of our first scenario, this time with the actual data showing in XML notation.

This slide will be useful to come back to, but for now let’s look closely at a few specific pieces in an introductory way. We’ll come back to them in more detail later on.

First, let’s look at the templates themselves:

The templates in the stylesheet (those instructions we saw) are represented by <xsl:template>.
The @match attribute tells us what element in the input tree we are dealing with. It is also the "if", the "condition" that has to be met: the @match attribute says basically "is there such an element? If so..."
Let’s also note that the value of @match is a location in the input tree. We can specify a context as well; we’ll see how to do this later, so we won’t go into it yet.
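As a small preview only, here is a sketch of what a bare @match and a @match with a context might look like side by side. The <section> element and the class attribute are hypothetical, invented for this illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Condition: "is there a <paragraph>?" (anywhere in the input) -->
  <xsl:template match="paragraph">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Condition with a context: "is there a <paragraph> whose
       parent is a <section>?" (hypothetical element name) -->
  <xsl:template match="section/paragraph">
    <p class="in-section"><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

When both conditions are met for the same <paragraph>, the more specific pattern is the one normally chosen.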

How about what a template does? Inside these templates here we have basically two things:

A snippet of the output tree (in this case, HTML): this is basically the structural skeleton of the output document.
An instruction to "apply templates": what this means is "keep going!" or "don’t stop here". Another, more precise way of interpreting this element is to say "go ahead and process any children of the matched element": in other words, keep drilling down into the input tree. Without this instruction, the stylesheet logic reaches a dead end.

Note that both of these are optional: either one, or both, might be absent from a given template.
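The three variations can be sketched as follows. The element names <byline> and <opener> appear later in these notes; <note> and the fixed text are hypothetical, chosen only to show the shape of each template.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Output snippet only: emits fixed markup; the content of the
       matched element is not processed -->
  <xsl:template match="byline">
    <p>Author name withheld</p>
  </xsl:template>

  <!-- Instruction only: adds no markup of its own, just passes
       processing on to the children -->
  <xsl:template match="opener">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Neither: the matched element and everything inside it
       produce no output at all -->
  <xsl:template match="note"/>

</xsl:stylesheet>
```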

In what order do things happen?

The trickiest thing about XSLT is perhaps understanding the order in which things happen, and the logic that determines what happens next. It’s important to understand this because otherwise you can get very puzzling results.

We’re used to thinking of computer programs as sets of steps that will happen in order, where the sequence of events is determined by the computer program itself. XSLT is different, in that the order of steps is primarily determined by the input tree:

You start with the root of the input tree (for now, we’ll pretend that’s the root element)
Then you go to the stylesheet to look for the template that matches that root.
When you find the template that matches, you do what it says.
And if it contains an instruction to continue applying templates, then you go back to the input tree and consider the children of that matched element.
And for each child, you go back to the stylesheet and see whether there are any templates that match it.
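The steps above can be sketched with a small example. The root element name <poem> and all of the text content are assumptions for illustration; <opener>, <l>, and <byline> are the element names used in these notes. The templates are deliberately listed in a different order from the input elements, to show that the input tree decides the sequence, not the stylesheet.

```xml
<!-- Input document (hypothetical) -->
<poem>
  <opener>An opening</opener>
  <l>A line of verse</l>
  <byline>An author</byline>
</poem>
```

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Listed first, but used last: <byline> is the last child in the input -->
  <xsl:template match="byline">
    <p class="byline"><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="l">
    <p class="line"><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="opener">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

  <!-- Listed last, but used first: <poem> is the root element -->
  <xsl:template match="poem">
    <html>
      <body>
        <!-- "Keep going": visit the children of <poem> in input order -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

With this input, the output should contain the opener first, then the line, then the byline, however the templates are arranged in the stylesheet.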

Recursion and dead ends

So the process is recursive, in that you keep drilling down into the input tree and processing the children, and then their children, and then their children, and so on.

But it can also come to a dead end: the drilling process continues only if we receive an instruction to keep applying templates.

So in this diagram what we’re seeing is how the "apply templates" at step 3 is really the key to accessing all of the other templates in the stylesheet. If that instruction isn’t there, nothing else happens: we never get to find out whether the <opener> or the <l> or the <byline> is matched.
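A dead end of this kind might look like the sketch below. The root element name <poem> and the heading text are assumptions; the point is only that the first template has no <xsl:apply-templates/>.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Dead end: there is no <xsl:apply-templates/> in this template,
       so the children of <poem> are never visited -->
  <xsl:template match="poem">
    <html>
      <body>
        <h1>A poem</h1>
      </body>
    </html>
  </xsl:template>

  <!-- Never reached while the template above stays as written -->
  <xsl:template match="opener">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

</xsl:stylesheet>
```

Adding <xsl:apply-templates/> inside the <body> of the first template would reopen the path down to <opener> and its siblings.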