Publishing and Transforming TEI Documents

Julia Flanders and Syd Bauman

2013-11-20

XML workflow

No one needs to convince us of the importance of the overall topic here: "transforming and publishing TEI". It’s why we create TEI data. However, we may need to do some preliminary clarification and scoping to get a full sense of what we mean, of what the possibilities are and what kinds of "publishing and transforming" they entail.

One way to orient ourselves in the landscape of "publishing and transforming" is to think about how we use our own data. If we think of the life cycle of a TEI project, there are numerous places along the timeline where we want to express different views of our data, for internal or external viewing:

Single-source publishing and XSLT

Another way to approach the topic of this workshop is to think about transformation and publishing as a variety of informational avenues that radiate out from our TEI data. Even though it’s probably a familiar concept to many of you, it’s worth noting an important assumption that underlies much of our work with XML: we’re creating a single XML source from which we are going to generate many different kinds of output.

This is important because the XML source is an expensive and valuable information object. It represents a careful modeling of our research materials; we’ve put a lot of work into it (transcription, encoding, proofreading, correction, annotation, other kinds of enhancement); and we want to exploit it in many different ways, automatically rather than by hand.

When we generate these different varieties of output, we often lose information: we erase distinctions that are present in the source (but unnecessary in the output), or we move from a representationally rich language (like TEI) to a representationally impoverished language (like HTML).

But since these output formats are generated automatically, rather than by hand, this information loss doesn’t matter: the source retains its informational richness, and it represents the full set of possibilities from which any specific option can be generated.
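To make this kind of deliberate information loss concrete, here is a minimal sketch of an XSLT template. It assumes a source that uses the TEI elements title, foreign, and emph, and an XHTML output in which all three are simply shown in italics; that choice of elements and of rendering is an illustrative assumption, not taken from any particular project.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="tei">

  <!-- Three distinct TEI elements (a title, a foreign-language phrase,
       an emphasized phrase) all collapse into the same XHTML <i>.
       The distinction is erased in the output only; the TEI source
       file is not changed by running the stylesheet. -->
  <xsl:template match="tei:title | tei:foreign | tei:emph">
    <i>
      <!-- Carry on processing whatever is inside the element. -->
      <xsl:apply-templates/>
    </i>
  </xsl:template>

</xsl:stylesheet>
```

A reader of the resulting web page cannot tell a title from a foreign phrase, but a different stylesheet run over the same source could still treat them differently.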

Some examples

A few examples:

Transformation as a power tool

A third important aspect of our topic is the idea of data as a mutable, protean substance: as a kind of plastic informational model that we can reshape and manipulate as needed.

All four of the examples here represent pretty much the same pieces of data: any one of them could be generated from any of the others. And yet the differences between them might matter in the context of some particular tool or standard way of doing things.

The point is that our data is almost never trapped in its current format: when we understand it as transformable, we gain power over it and we can use it more flexibly. If a collaborator needs some information extracted from our data, or if they put their fields in a slightly different order, or whatever, it’s not a problem.
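As a rough sketch of what extracting information and putting fields in a different order can look like in XSLT, the stylesheet below pulls personal names out of a TEI file and writes them as plain text, surname first. It assumes, purely for illustration, that names are encoded as persName elements containing forename and surname children; your own encoding may well differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Produce plain text rather than XML. -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Start at the top of the document and visit only those names
       that have both parts, so no other text reaches the output. -->
  <xsl:template match="/">
    <xsl:apply-templates
        select="//tei:persName[tei:surname and tei:forename]"/>
  </xsl:template>

  <!-- For each name, write the surname first, then the forename,
       regardless of the order they appear in the source. -->
  <xsl:template match="tei:persName">
    <xsl:value-of select="normalize-space(tei:surname[1])"/>
    <xsl:text>, </xsl:text>
    <xsl:value-of select="normalize-space(tei:forename[1])"/>
    <!-- &#10; is a line break, so each name gets its own line. -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Given a persName containing the forename "Mary" followed by the surname "Shelley", this writes the line "Shelley, Mary". Swapping the two value-of instructions is all it takes to produce the other order.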

Scope and ambition

The chief tool for doing all of these kinds of work is a programming language called XSLT, short for Extensible Stylesheet Language Transformations:

Either way, what it does is give us a way of manipulating our XML data: to extract pieces of it, reshape them, change their format, generally do whatever we want to do with them.

Let’s talk for a moment about what we’re going to cover in this seminar (and what we’re not going to cover).

This seminar is aimed at people who have TEI data and not much else: we aren’t assuming familiarity with programming, or with XML publishing tools.

Our goal is to help you learn about what’s involved in using your TEI data: in publishing it, in manipulating and transforming it into other formats, exploiting its informational potential; we’d like you to come away, first of all, with a sense of what is possible.

How about in concrete, practical terms? What are we actually going to cover? XSLT is hugely powerful (it is a full-fledged programming language), but as a result it’s a big topic:

By the end of the workshop, you should also have a good sense of whether XSLT is something you want to know more about and learn in a more systematic way, and if it is, we encourage you to take a more intensive XSLT workshop: Syd teaches one at DHSI, and Syd and David teach one at Brown every so often. This workshop is a good starting point for either of those workshops.

Simple Publication with XSLT

XSLT, the transformation part of the Extensible Stylesheet Language, allows you to transform XML documents into other XML formats (and also into HTML or plain text).

Essentially, XSLT allows you to map a given XML element onto another XML element, saying "take in the following construct, and put out this other construct."

It could be a construct in the same language, or in a different language such as XHTML, as in the example here.
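A minimal sketch of this "take in, put out" mapping, assuming a TEI source and XHTML output, might look like the following. The particular pairings (head to h2, italic hi to i) are illustrative choices, and the stylesheet is deliberately incomplete: it shows only the mapping templates, not everything needed to build a whole web page.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="tei">

  <!-- Take in a TEI <head>, put out an XHTML <h2>. -->
  <xsl:template match="tei:head">
    <h2>
      <xsl:apply-templates/>
    </h2>
  </xsl:template>

  <!-- Take in a TEI <p>, put out an XHTML <p>:
       the same name, but in a different language. -->
  <xsl:template match="tei:p">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

  <!-- Take in a TEI <hi> marked as italic, put out an XHTML <i>. -->
  <xsl:template match="tei:hi[@rend='italic']">
    <i>
      <xsl:apply-templates/>
    </i>
  </xsl:template>

</xsl:stylesheet>
```

Each template has two halves: the match attribute names the construct to take in, and the content of the template is what gets put out. The xsl:apply-templates instruction means "now do the same for whatever is inside this element." Elements with no template of their own fall through to XSLT's built-in rules, which pass along their text without any tags.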

XML Databases and Publication Frameworks

The XML database and publication framework universe

These kinds of tools are designed to manage large groups of XML files, and to provide certain kinds of advanced functionality:

How do databases fit into a larger XML publication framework? What do they do?

Within the XML publication framework, the database sits and waits for queries to come in.

XML databases exist as separate modules that can be used as the basis for XML publishing systems, for instance:

The Bigger Picture

The tools you need, and the people you need, can be imagined as a rough continuum of increasing scale, complexity, difficulty, and cost:

So considering where the three examples we looked at earlier might fit in: