Publishing TEI Documents
Julia Flanders
2007-05-13/14
Publishing
You have now learned how to create TEI documents, but
we haven’t yet said anything about what you can do with
them.
There is a very wide range of things you can do with TEI
documents “beyond search”: in-depth analysis, data
mining, and processing to discover patterns of various
sorts.
We’re going to focus on “publishing”, both in the narrow
sense of “making them readable online” and also in the
broader sense of “exploiting the encoding publicly”. But
most of the more advanced things you can do with TEI
documents use technologies similar to the ones we’re talking
about here.
Simple
The simplest approach to publishing TEI documents:
just apply a CSS stylesheet.
- same process as applying a CSS stylesheet to
HTML documents
- the browser reads the TEI file, which points to
a stylesheet
- the browser reads the stylesheet, and applies
its styling to the TEI elements as specified
- the browser displays the formatted text
- can control everything including fonts, colors,
backgrounds, layouts (where the chunks of text are
placed on the page), etc.
- modern standards-compliant browsers can all do
this
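As a minimal sketch of the steps above, the following Python snippet writes a small TEI file that points to a CSS stylesheet through an xml-stylesheet processing instruction. The file names (poem.xml, tei.css), the sample text, and the style rules are assumptions made up for illustration; the TEI header is trimmed to a bare minimum.

```python
from pathlib import Path
import tempfile

# Hypothetical style rules. TEI elements have no built-in rendering, so the
# stylesheet has to say which elements are blocks and how they should look.
CSS = """\
teiHeader { display: none; }
head { display: block; font-size: 1.5em; font-weight: bold; }
lg { display: block; margin: 1em 0; }
l { display: block; font-family: Georgia, serif; }
"""

# The xml-stylesheet processing instruction on the second line is how the
# TEI file "points to" the stylesheet.
TEI = """\
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="tei.css"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample poem</title></titleStmt>
      <publicationStmt><p>Unpublished sample</p></publicationStmt>
      <sourceDesc><p>Written for this example</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <head>Sample poem</head>
      <lg>
        <l>A first line of verse,</l>
        <l>and then a second.</l>
      </lg>
    </body>
  </text>
</TEI>
"""


def write_sample(directory):
    """Write poem.xml and tei.css into directory; return the TEI file's path."""
    directory = Path(directory)
    (directory / "tei.css").write_text(CSS, encoding="utf-8")
    tei_path = directory / "poem.xml"
    tei_path.write_text(TEI, encoding="utf-8")
    return tei_path


if __name__ == "__main__":
    # Write both files to a temporary directory and report where they are.
    print("Open this file in a browser:", write_sample(tempfile.mkdtemp()))
```

Opening the resulting poem.xml in a browser should show the heading and the two verse lines styled by tei.css, with the header hidden.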
Some limitations
This is great, but there are some limitations:
- can’t (at the moment) make links
- can’t search, except by using your browser’s
Find… command
- can’t do any sort of higher-level stuff (of the
sort that we’ll see in a minute)
Transformations with XSLT
XSLT (Extensible Stylesheet Language Transformations)
allows you to transform XML documents in many ways:
- into other XML documents, such as XHTML, TEI,
XSL-FO, DocBook, etc.
- into other formats: TeX, RTF, pretty much
anything if you can figure out how
Transformation into other XML documents can mean
several things:
- taking the entire TEI document and converting its
markup into XHTML, so that you now have an XHTML-encoded
document
- taking the entire TEI document and transforming
bits of it into XHTML: for instance, taking just the
section headings and making an XHTML-encoded TOC; or
taking a long TEI document and transforming it into
separate XHTML files, one for each chapter, accompanied
by a TOC; etc.
- transforming one kind of TEI markup into another:
for instance, if you mark up your documents using a
customized schema, but you want to exchange data with
other projects, you might convert your markup to TEI
Lite for easier interchange.
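To make the TOC case concrete, here is a rough sketch of the idea. It is written in Python with the standard xml.etree.ElementTree module rather than in XSLT, so it stands in for a stylesheet and is not one. The sample document and the chapterN.html link targets are assumptions for illustration, and the output is a bare fragment without the XHTML namespace.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical two-chapter TEI document (header omitted for brevity).
SAMPLE = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="chapter"><head>Arrival</head><p>First chapter text.</p></div>
      <div type="chapter"><head>Departure</head><p>Second chapter text.</p></div>
    </body>
  </text>
</TEI>
"""


def toc_as_xhtml(tei_source):
    """Collect each div's heading into an ordered list of links."""
    root = ET.fromstring(tei_source)
    ol = ET.Element("ol")
    heads = root.findall(".//tei:div/tei:head", TEI_NS)
    for number, head in enumerate(heads, start=1):
        li = ET.SubElement(ol, "li")
        # Assumed naming scheme: one output file per chapter.
        link = ET.SubElement(li, "a", href=f"chapter{number}.html")
        link.text = "".join(head.itertext()).strip()
    return ET.tostring(ol, encoding="unicode")


print(toc_as_xhtml(SAMPLE))
```

Only the headings are carried over; the chapter text stays behind in the TEI source, which is the point of transforming "bits" of a document.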
You can run this transformation in advance, as a separate
step, and then use its output. For instance, you might
have a set of TEI files which you transform to XHTML, and
then mount the XHTML on your web site. When you make an
update to the TEI files, you run the transformation again,
and remount the resulting XHTML.
Tools you need...
Tools you need for this kind of transformation:
- an XSLT processor (some are built into Oxygen);
there are several, with different virtues that we
won’t go into here
- an XSLT stylesheet
The processor
reads the stylesheet, and reads your XML file, and it
applies the stylesheet to the file and outputs a
result.
Then the result can be used as appropriate: styled with
CSS and viewed in a browser if HTML; viewed in a browser or
PDF reader if PDF; etc.
Transformations on the fly
You can also run these transformations on the fly, as
part of your publication system:
- your TEI files live on a server
- when a user requests a file (e.g. by clicking on
a URL), the transformation software performs the
transformation on the fly and delivers the resulting
HTML.
- the transformation might vary depending on the
request: for instance, a user clicking on the “sort by
date” link would get different output (from the same
underlying TEI file) than they would get by clicking on
the “sort by author” link
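The request-dependent case can be sketched as a single function that produces different HTML from the same TEI source, depending on a parameter taken from the request. This is a plain Python stand-in, not how Cocoon or any particular server does it; the render function, its sort_by parameter, and the sample bibliography file are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical source file held on the server (header omitted for brevity).
SAMPLE = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><listBibl>
    <bibl><author>Wroth, Mary</author><title>Urania</title><date when="1621"/></bibl>
    <bibl><author>Cavendish, Margaret</author><title>The Blazing World</title><date when="1666"/></bibl>
    <bibl><author>Behn, Aphra</author><title>Oroonoko</title><date when="1688"/></bibl>
  </listBibl></body></text>
</TEI>
"""


def render(tei_source, sort_by="date"):
    """Return an HTML list of the entries, ordered as the request asked."""
    root = ET.fromstring(tei_source)
    entries = []
    for bibl in root.findall(".//tei:bibl", TEI_NS):
        date = bibl.find("tei:date", TEI_NS)
        entries.append({
            "author": bibl.findtext("tei:author", default="", namespaces=TEI_NS),
            "title": bibl.findtext("tei:title", default="", namespaces=TEI_NS),
            "date": date.get("when", "") if date is not None else "",
        })
    # Fall back to date order for any sort key we do not recognise.
    if sort_by not in ("author", "date"):
        sort_by = "date"
    entries.sort(key=lambda entry: entry[sort_by])
    items = "".join(
        f"<li>{escape(e['author'])}, <i>{escape(e['title'])}</i> ({escape(e['date'])})</li>"
        for e in entries
    )
    return f"<ul>{items}</ul>"


# The same file, two different requests, two different pages.
print(render(SAMPLE, sort_by="date"))
print(render(SAMPLE, sort_by="author"))
```

In a live system a function like this would be called by the web server each time a link is followed, so nothing but the TEI source is stored.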
Tools you need...
Tools you use for this kind of transformation: e.g.
Cocoon
Some limitations...
This is great, but there are some limitations:
- still not much searching
- what searching there is will be slow; you’re
using a tool not designed for handling searches
efficiently
- not good for managing large aggregations of
files efficiently, or for managing them as a
group, dealing with information that cuts
across the entire aggregation
XML Databases
The XML Database universe
These kinds of tools are designed to manage large
groups of XML files, and to provide certain kinds of
advanced functionality:
- fast, efficient searching
- transformations involving groups of files: not
just transforming each file separately, but doing
transformations that take parts of different files
and create new result files: for instance, a sorted
list of the first lines from all the poems in a
collection.
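The first-lines example might look roughly like this. The sketch below is plain Python that reparses every file on each call, so it only shows the shape of the result; a real XML database would answer the same question from its indexes. The file names and verse lines are invented for illustration.

```python
import xml.etree.ElementTree as ET

TEI_URI = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI_URI}


def poem(*lines):
    """Wrap some verse lines in a minimal TEI document (header omitted)."""
    body = "".join(f"<l>{line}</l>" for line in lines)
    return f'<TEI xmlns="{TEI_URI}"><text><body><lg>{body}</lg></body></text></TEI>'


# Hypothetical collection: file name -> TEI content.
COLLECTION = {
    "poem1.xml": poem("When evening falls", "the lamps are lit"),
    "poem2.xml": poem("All day the rain", "kept to the hills"),
    "poem3.xml": poem("Mornings begin slowly", "here"),
}


def first_lines(collection):
    """Return (first line, file name) pairs, sorted by the line's text."""
    found = []
    for name, source in collection.items():
        # find() returns the first matching <l> in document order.
        first = ET.fromstring(source).find(".//tei:l", TEI_NS)
        if first is not None:
            found.append(("".join(first.itertext()).strip(), name))
    return sorted(found, key=lambda pair: pair[0].lower())


for line, name in first_lines(COLLECTION):
    print(f"{line}  [{name}]")
```

The result draws on every file in the collection but belongs to none of them, which is what distinguishes this from a file-by-file transformation.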
XML databases in the larger XML framework
How do databases fit into a larger XML publication
framework? What do they do?
- they create and store indexed information: that
is, information from the source XML files that has
been preprocessed to make it more accessible and
easier to manipulate. For instance, they might store
tables of all the document metadata (author, title,
genre, date, etc.) so that it can be searched and
sorted more quickly
- they contain a representation of the document’s
structure in a format that makes it easier to
process, so that certain kinds of navigation are
easier
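A toy version of such a metadata table is sketched below: each header is read once, and later questions are answered from the table without reopening the XML. This is only an in-memory illustration in Python, not how eXist or any other database stores its indexes. The file names are hypothetical, and the headers are trimmed to the fields used here, so they are not complete TEI headers.

```python
import xml.etree.ElementTree as ET

TEI_URI = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI_URI}


def document(title, author, date):
    """Build a tiny TEI document holding only the header fields used here."""
    return (
        f'<TEI xmlns="{TEI_URI}"><teiHeader><fileDesc>'
        f"<titleStmt><title>{title}</title><author>{author}</author></titleStmt>"
        f'<sourceDesc><bibl><date when="{date}"/></bibl></sourceDesc>'
        "</fileDesc></teiHeader></TEI>"
    )


# Hypothetical collection: file name -> TEI content.
COLLECTION = {
    "behn.xml": document("Oroonoko", "Behn, Aphra", "1688"),
    "wroth.xml": document("Urania", "Wroth, Mary", "1621"),
    "cavendish.xml": document("The Blazing World", "Cavendish, Margaret", "1666"),
}


def build_index(collection):
    """Parse each header once and keep its metadata as a row in a table."""
    table = []
    for name, source in collection.items():
        root = ET.fromstring(source)
        date = root.find(".//tei:sourceDesc/tei:bibl/tei:date", TEI_NS)
        table.append({
            "file": name,
            "title": root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS),
            "author": root.findtext(".//tei:titleStmt/tei:author", default="", namespaces=TEI_NS),
            "date": date.get("when", "") if date is not None else "",
        })
    return table


INDEX = build_index(COLLECTION)

# Later questions use the table only; the XML is not parsed again.
titles_by_date = [row["title"] for row in sorted(INDEX, key=lambda row: row["date"])]
files_by_wroth = [row["file"] for row in INDEX if row["author"] == "Wroth, Mary"]
print(titles_by_date)
print(files_by_wroth)
```

The cost of parsing is paid once, when the index is built, which is why searching and sorting afterwards can be quick.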
Within the XML publication framework, the
database sits and waits for queries to come in.
- when it receives a query, it performs the
necessary searching and returns a result (in the
form of an XML fragment, or a node set, or some
proprietary structure)
- the result can then be transformed (e.g. into
HTML for delivery to a browser, or into some other
XML format for other processing) using XSLT
XML Databases
XML databases exist as separate modules that can be
used as the basis for XML publishing systems, for
instance:
- eXist
- DBXML
- Xindice (Apache)
- TigerLogic XDMS
- TEXTML
XML publishing systems with database component
But there also exist XML publishing systems which
include a database component and also other components
which handle other aspects of the process:
- TEI Publisher (uses eXist): show a bit?
- Philologic (includes its own database, but can
also work with MySQL): show WWP site
- commercial products like Tamino