Overview of Text Encoding and the TEI

Julia Flanders


The DH Universe

OK, you are here for an introductory class, which means that for many of you this may be your first introduction to Digital Humanities as a field. So, let’s take a second to situate the TEI in the broader field of DH scholarship. This is really a rudimentary cloud of terms and sub-disciplines that you may come across in DH. I’ve separated this out into two major categories. The red category, down here, is “Making,” and the blue categories are those that are less obviously about Making.

Many definitions of DH will maintain that if you’re not making something, some product, you’re not doing DH. This is, obviously, controversial. Fields that derive from New Media Studies, for instance, will institute reading practices that are aware of the technological and material underpinnings of the texts they work with, but will not necessarily be interested in producing new objects beyond the objects that their criticism is housed in. And, for that matter, we can make the argument that all work or all labour is a form of making, and that criticism can be the product of a critical labour.

I also want to point out that the “Making” category supports what can be seen as a reading practice. Digital Textual Studies encompasses reading approaches like computational analysis or algorithmic reading, both of which depend on finding patterns in text and producing data that can be analysed. But if we look more closely at Digital Textual Studies, we can see that it narrows into Digital Textual Editing, and this is really where we start to see the utility of something like the TEI.

Markup is a way of describing something that exists in text, and we all do markup all the time, whether we’re aware of it or not. Any time you write in the margins of a book, it’s a form of markup. Any time someone publishes an editorial edition of a book, with footnotes or information on textual variants, for instance, they rely on a markup system.

XML is a descriptive markup language (descriptive is a term that we’ll return to) that primarily supports technological understandings of editing. And TEI is a flavour of XML. So this small slice, which would be even smaller if I were to fill in the more granular concerns of other disciplines, is what we’re dealing with today.

But why is this small slice the thing that marks a lot of people’s first exposure to DH? Well, a couple of reasons. The first is that TEI encoding is relatively accessible. You should be able to continue a coding practice from what you learn today, and markup, since we do it all the time, is pretty intuitive. It’s a way that we’re used to engaging with texts. The second, and maybe the more compelling, is that TEI is among the most well-established technologies in the field. It has a robust governing body around its usage, and it’s been around long enough to produce some innovative uses of the technology. So what you’re learning here is in many ways a standard practice in the discipline, and a way to start to understand how DH approaches the kinds of data we come across in text.

Text Encoding: Representing Research Objects

This workshop is really about how to represent research objects.

The TEI is not so much an attempt to answer those questions, as it is a tool for expressing our answers.

One Model, Many Outputs

The digital model of our research materials sits within a larger ecology that also involves other kinds of digitally mediated activities, like publication.

In fact, we can create many different kinds of derived models, different kinds of output, all from the same source, using automated transformations (with tools like XSLT).

And we can control their appearance through CSS stylesheets and other formatting processes.

But it’s important to remember that all of these different outputs depend on the existence of the original analytical model and on its descriptive power: on the fact that what it describes is not accidental facts of appearance, but deeper ideas about structure and content.
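
To make the "one model, many outputs" idea concrete, here is a minimal sketch in Python. It is not a real publication pipeline: a project would more likely use XSLT, as mentioned above, and plain Python is only standing in for it here. The sample sentence, the note, the function names, and the `class="note"` hook for CSS are all invented for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the TEI namespace
NOTE = f"{{{TEI_NS}}}note"              # qualified name of the TEI <note> element

# One source: a single encoded paragraph (invented sample text).
SOURCE = (
    f'<p xmlns="{TEI_NS}">The ship sailed at dawn.'
    '<note>One witness reads dusk.</note>'
    ' It returned a year later.</p>'
)


def to_html(p):
    """Output 1: HTML in which each note becomes a span that CSS can style."""
    parts = [escape(p.text or "")]
    for child in p:
        content = escape("".join(child.itertext()))
        if child.tag == NOTE:
            parts.append(f'<span class="note">{content}</span>')
        else:
            parts.append(content)
        parts.append(escape(child.tail or ""))
    return "<p>" + "".join(parts) + "</p>"


def to_reading_text(p):
    """Output 2: a plain reading text that leaves the notes out."""
    parts = [p.text or ""]
    for child in p:
        if child.tag != NOTE:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)


paragraph = ET.fromstring(SOURCE)
html_output = to_html(paragraph)
reading_output = to_reading_text(paragraph)
print(html_output)
print(reading_output)
```

Both outputs can only treat the note differently because the source says "this is a note" rather than "this is small italic text". That is the descriptive power the outputs depend on.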

How do XML and the TEI fit in?

So this is a workshop about modeling and representation, but it’s also a seminar about TEI, and you’ve heard the term XML uttered as well: how do these pieces relate to one another?

In text encoding, the concepts that underlie our modeling all come from us: from our minds, from culture, etc.

XML specifies a syntax for text encoding: a way of distinguishing the markup from the content.

TEI provides the specifics of the actual text encoding language: the controlled vocabulary of terms (like note) that we will use in marking up the text.
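
As a rough illustration of that division of labour, the sketch below uses Python's standard XML parser on a tiny fragment. The angle-bracket syntax is XML, the element names p and note come from the TEI vocabulary, and the sentence and the note text are invented for this example. Nothing here checks the fragment against the TEI schema. It only shows that software can tell markup from content.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the TEI namespace

# Invented sample: a paragraph with one editorial note inside it.
sample = (
    f'<p xmlns="{TEI_NS}">The ship sailed at dawn.'
    '<note type="editorial">One witness reads dusk.</note>'
    ' It returned a year later.</p>'
)

paragraph = ET.fromstring(sample)

# The markup: element names, with the namespace prefix stripped off.
element_names = [el.tag.split("}")[1] for el in paragraph.iter()]

# The content that the <note> markup singles out.
notes = [note.text for note in paragraph.iter(f"{{{TEI_NS}}}note")]

# All of the character content, with the markup set aside.
all_text = "".join(paragraph.itertext())

print(element_names)
print(notes)
print(all_text)
```

The parser needs no knowledge of the TEI to separate tags from text, because XML syntax alone does that. Knowing what a note means, and when to use one, is what the TEI vocabulary and our own judgement add.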

We’ll be talking about each of these things in more detail over the rest of the morning and over the next few days.

Representing Research Objects?

Looking a little more closely at this modelling and mapping process:


And now looking at the relationship between the TEI model of this source document, and the mapping of that model into HTML: What do we see when we look at this diagram? (Pause and let people look and make suggestions)

TEI and Research Materials

So, stepping back now... The TEI Guidelines is a representational system: a system of notation through which we can create digital representations of our research materials.

The TEI exists to make it possible for us to represent research materials in a way that is:

But in addition, it focuses our attention in a very particular way on the translation of observations about texts into formal information.

Formalism, Selection, Description

This is possible because languages like the TEI serve as a kind of intermediary between the brute stuff of the world and the formal analytics of a research process.

So although it looks as if what we’re doing here is reproducing a text, in fact what we’re doing is modeling it, very selectively and very strategically. We’re creating a proxy, a surrogate, sort of like a plastic model of an atom or a DNA strand, with selective, useful omissions and distortions.

This model then serves as the basis for all the things we want to do with our text in the digital world: publish it, print it, visualize it, analyse it.

Background on the TEI

So just to make sure we’re all on the same page, a little background on the TEI and on what it is.