Advanced Markup Concepts

Julia Flanders


Basic encoding

So far we have been looking mostly at two very basic functions that markup can perform: marking the boundaries of textual features, and distinguishing one kind of feature from another.

These are both really important functions—fundamental to the ways markup can be useful to us. Knowing the boundaries of things, and distinguishing between them, are an essential foundation for everything else we do.

We might think of this kind of markup as being almost like clothing: it closely follows the form of the feature it’s surrounding, it draws our attention to it, it marks out its boundaries (not always in a "natural" way...).

“Advanced” encoding

In this next session, we are going to move beyond this basic concept of markup to consider some more complex things that markup can help us do:

The analogy here would not be clothing but maybe architecture.

Notes and annotations

Notes are one very important form of structural complexity: a hypertextual sprout, fork, or jump in the textual stream.

In TEI, all types of annotation are encoded using the note element. Notes can also be classified: the resp attribute indicates responsibility (who wrote the note), and the type attribute indicates what kind of note it is, using any classification system that seems useful (e.g. annotation, correction, hypothesis, context, gloss).

We’re illustrating here several different levels of annotation:

The notes themselves can go anywhere in the document; the example here places them in a special division in the back matter.
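To make the note mechanics concrete, here is a minimal sketch (not from the workshop materials) in Python, using the standard library's ElementTree. It assumes a hypothetical fragment in which anchor elements mark the annotated spots in the body, and the notes in a back-matter division point back to them with the target attribute. The sample text, the anchor ids, the resp values (#ed1, #ed2), and the helper notes_by_anchor are all invented for illustration.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample: anchors in the body mark the annotated spots, and the
# notes themselves sit in a back-matter division, pointing back with @target.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <p>The letter was sent from Leeds<anchor xml:id="n1"/> in May.<anchor xml:id="n2"/></p>
  </body>
  <back>
    <div type="notes">
      <note target="#n1" type="gloss" resp="#ed1">A town in the north of England.</note>
      <note target="#n2" type="hypothesis" resp="#ed2">The year is not stated in the letter.</note>
    </div>
  </back>
</text>"""


def notes_by_anchor(xml_text):
    """Map each anchor id to a list of (type, resp, text) for the notes that target it."""
    root = ET.fromstring(xml_text)
    anchor_ids = {a.get(XML_ID) for a in root.iter(TEI + "anchor")}
    result = {}
    for note in root.iter(TEI + "note"):
        # @target may hold several space-separated pointers such as "#n1 #n2".
        for pointer in note.get("target", "").split():
            anchor_id = pointer.lstrip("#")
            if anchor_id in anchor_ids:
                text = " ".join("".join(note.itertext()).split())
                result.setdefault(anchor_id, []).append(
                    (note.get("type"), note.get("resp"), text)
                )
    return result


for anchor_id, notes in notes_by_anchor(SAMPLE).items():
    print(anchor_id, notes)
```

Because type and resp are plain attribute values, a processor can filter on either one, for instance showing only glosses, or only the notes by a particular editor.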

In addition, some kinds of annotation can be handled in other ways: not with note but with something more flexible. In this example we show place names linked to a gazetteer or placeography, which records additional information about each place (a regularized version of the name, a brief note about it, and a location given as latitude and longitude). The same approach could be used for the names of people as well. We will talk more about these later in the workshop.
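A minimal sketch of how software might use such links, again in Python with ElementTree. The fragment is hypothetical: placeName elements in the text point via ref to place entries in a listPlace, each with a regularized placeName, a note, and a geo value inside location. The helper names, the sample places, and the assumption that geo holds "latitude longitude" as decimals are ours, not taken from the workshop examples.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <p>We sailed from <placeName ref="#bos">Boston</placeName> to
       <placeName ref="#lon">Lunnon</placeName>.</p>
  </body>
  <back>
    <listPlace>
      <place xml:id="lon">
        <placeName>London</placeName>
        <note>Capital of England.</note>
        <location><geo>51.507 -0.128</geo></location>
      </place>
      <place xml:id="bos">
        <placeName>Boston</placeName>
        <note>Port town in Massachusetts.</note>
        <location><geo>42.358 -71.060</geo></location>
      </place>
    </listPlace>
  </back>
</text>"""


def build_gazetteer(root):
    """Index each place by xml:id: regularized name, note, and (lat, long)."""
    places = {}
    for place in root.iter(TEI + "place"):
        name = place.find(TEI + "placeName")
        note = place.find(TEI + "note")
        geo = place.find(f"{TEI}location/{TEI}geo")
        coords = None
        if geo is not None and geo.text:
            # Assumes the common "latitude longitude" decimal form for geo.
            lat, lon = (float(v) for v in geo.text.split())
            coords = (lat, lon)
        places[place.get(XML_ID)] = {
            "name": name.text if name is not None else None,
            "note": note.text if note is not None else None,
            "coords": coords,
        }
    return places


def resolve_mentions(xml_text):
    """Return (text as written, regularized name, coords) for each placeName with @ref."""
    root = ET.fromstring(xml_text)
    gazetteer = build_gazetteer(root)
    mentions = []
    for pn in root.iter(TEI + "placeName"):
        ref = pn.get("ref", "")
        if ref.startswith("#"):  # skips the names inside the gazetteer itself
            entry = gazetteer.get(ref[1:], {})
            mentions.append((pn.text, entry.get("name"), entry.get("coords")))
    return mentions


print(resolve_mentions(SAMPLE))
```

The point of the design is that the transcription keeps the spelling of the source ("Lunnon"), while the regularized name and coordinates live once in the gazetteer and are reached by pointer.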

Figures and Images


Facsimiles and Page Images

Facsimiles of pages and parts of pages:
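As a rough sketch of how these pieces can connect, assume a hypothetical document where a facsimile element holds one surface per page image (with a graphic giving the image URL) plus a zone marking part of a page, while pb and figure in the transcription point at them with the facs attribute. The file paths, coordinates, and helper functions below are invented for illustration.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface xml:id="p1">
      <graphic url="images/page1.jpg"/>
      <zone xml:id="p1-fig" ulx="200" uly="400" lrx="1800" lry="1500"/>
    </surface>
    <surface xml:id="p2">
      <graphic url="images/page2.jpg"/>
    </surface>
  </facsimile>
  <text>
    <body>
      <pb n="1" facs="#p1"/>
      <p>Text of the first page.</p>
      <figure facs="#p1-fig"/>
      <pb n="2" facs="#p2"/>
      <p>Text of the second page.</p>
    </body>
  </text>
</TEI>"""


def facsimile_index(root):
    """Map surface and zone ids to (image url, bounding box or None)."""
    index = {}
    for surface in root.iter(TEI + "surface"):
        graphic = surface.find(TEI + "graphic")
        url = graphic.get("url") if graphic is not None else None
        index[surface.get(XML_ID)] = (url, None)  # a whole page
        for zone in surface.iter(TEI + "zone"):
            # Zone corners: upper-left (ulx, uly) and lower-right (lrx, lry).
            box = tuple(float(zone.get(k)) for k in ("ulx", "uly", "lrx", "lry"))
            index[zone.get(XML_ID)] = (url, box)  # part of a page
    return index


def resolve_facs(xml_text):
    """List (element name, @n, image url, box) for every element carrying @facs."""
    root = ET.fromstring(xml_text)
    index = facsimile_index(root)
    rows = []
    for elem in root.iter():
        facs = elem.get("facs")
        if facs and facs.startswith("#"):
            url, box = index.get(facs[1:], (None, None))
            rows.append((elem.tag.replace(TEI, ""), elem.get("n"), url, box))
    return rows


for row in resolve_facs(SAMPLE):
    print(row)
```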

Representing Rendition

We’ve been talking about showing page images and facsimiles, which is a great way to give a very accurate representation of the source, but doesn’t capture how the source document looked as data we can search or process. When we want to include that data in our transcription, we can do so using the rend attribute, which is available on all TEI elements.

If you just want to say one simple thing about the appearance of an element (e.g. italic, centered, bold), you can use the simple keyword approach at the top here.

If you need to say more than one thing about the rendition of a given element, then you need to provide some internal structure inside the rend attribute. One way to do this is with something known as rendition ladders (which are not in wide use, but are fairly elegant). Another approach would be to use CSS style descriptors.

If you’re not using the CSS method, you make up the values yourself: the TEI does not provide any suggested values.
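Because the TEI leaves the content of rend up to the encoder, any software that uses it has to know which convention the project chose. The Python sketch below tells the three styles apart. The key(value) syntax it assumes for rendition ladders, and the specific keywords in the examples, are illustrative assumptions rather than anything the TEI prescribes.

```python
import re

# key(value) pairs, e.g. "slant(italic) align(center)"; this syntax is an
# assumption modeled on rendition ladders, not something TEI itself defines.
LADDER = re.compile(r"([A-Za-z][\w-]*)\(([^()]*)\)")


def parse_rend(value):
    """Classify a rend value as CSS, a rendition ladder, or plain keywords."""
    value = value.strip()
    if ":" in value:
        # CSS-style declarations: "property: value; property: value"
        declarations = {}
        for decl in value.split(";"):
            prop, sep, val = decl.partition(":")
            if sep:
                declarations[prop.strip()] = val.strip()
        return ("css", declarations)
    pairs = LADDER.findall(value)
    if pairs:
        return ("ladder", {key: val.strip() for key, val in pairs})
    # Simple keywords: whitespace-separated tokens such as "italic".
    return ("keywords", value.split())


for rend in ["italic", "slant(italic) align(center)", "font-style: italic; text-align: center"]:
    print(parse_rend(rend))
```

Whichever style you choose, the parsing rules have to be documented somewhere in the project, since the TEI schema will accept almost any string here.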

Critical apparatus

We can also represent a plurality of editorial opinions or textual witnesses as a piece of critical apparatus, using the app element. The optional lem gives the reading of the "base text", and the two rdg elements each represent a different editorial view of what the text really says.
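A minimal sketch of how a reading interface might use this structure: it flattens a paragraph to plain text, taking the lem by default or the rdg attributed to a chosen editor. It assumes, hypothetically, that each editorial view is identified by a resp value on its rdg; the sample sentence and the editor ids are invented.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical sample: a base-text lem plus two editors' alternative readings.
SAMPLE = ('<p xmlns="http://www.tei-c.org/ns/1.0">The <app>'
          '<lem>quiet</lem>'
          '<rdg resp="#editorA">quick</rdg>'
          '<rdg resp="#editorB">quaint</rdg>'
          '</app> fox slept.</p>')


def reading_text(app, resp=None):
    """Pick the rdg by the given editor; otherwise the lem, else the first rdg."""
    if resp is not None:
        for rdg in app.findall(TEI + "rdg"):
            if rdg.get("resp") == resp:
                return "".join(rdg.itertext())
    chosen = app.find(TEI + "lem")
    if chosen is None:  # lem is optional, so fall back to the first rdg
        chosen = app.find(TEI + "rdg")
    return "".join(chosen.itertext()) if chosen is not None else ""


def render(elem, resp=None):
    """Flatten mixed content to text, resolving every app along the way."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == TEI + "app":
            parts.append(reading_text(child, resp))
        else:
            parts.append(render(child, resp))
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())


p = ET.fromstring(SAMPLE)
print(render(p))               # base text (lem)
print(render(p, "#editorA"))   # editor A's view
```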

What if we have a plurality of readings because we have multiple witnesses? Here’s an example of a hypothetical text that is a critical amalgam of two separate witnesses with slight local differences. The witnesses themselves are documented in a listWit element, and the individual readings are associated with the appropriate witness using the wit attribute. Note that you can associate a given reading with more than one witness if that’s more economical.
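The wit attribute is what lets software pull a single witness back out of the amalgam. This sketch assumes a hypothetical three-witness fragment (sigla A, B, and C, invented here, in a simplified wrapper rather than a complete TEI file) and rebuilds each witness's text by choosing, in every app, the reading whose wit list includes that witness.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, simplified fragment (not a complete, schema-valid TEI file).
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <listWit>
    <witness xml:id="A">Manuscript A</witness>
    <witness xml:id="B">Manuscript B</witness>
    <witness xml:id="C">First printed edition</witness>
  </listWit>
  <p>My <app><rdg wit="#A #C">gentle</rdg><rdg wit="#B">noble</rdg></app> lord
     <app><rdg wit="#A">hath</rdg><rdg wit="#B #C">has</rdg></app> come.</p>
</div>"""

READING_TAGS = {TEI + "lem", TEI + "rdg"}


def witness_ids(root):
    """The sigla declared in listWit, in document order."""
    return [w.get(XML_ID) for w in root.iter(TEI + "witness")]


def witness_text(elem, wit_id):
    """Reconstruct the text of one witness by following @wit on lem/rdg."""
    pointer = "#" + wit_id
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == TEI + "app":
            for reading in child:
                # @wit may list several witnesses, e.g. wit="#A #C".
                if reading.tag in READING_TAGS and pointer in reading.get("wit", "").split():
                    parts.append("".join(reading.itertext()))
                    break
        else:
            parts.append(witness_text(child, wit_id))
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())


root = ET.fromstring(SAMPLE)
p = root.find(TEI + "p")
for siglum in witness_ids(root):
    print(siglum, witness_text(p, siglum))
```

Grouping witnesses in a single wit value, as in wit="#A #C", is the economy mentioned above: one rdg serves every witness that shares the reading.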

Generic markup structures

We’ve gone over a fair number of TEI elements with quite specific purposes so far, and there are hundreds more out there; the TEI has anticipated a large number of textual features that we’re going to want to encode, and created elements for them. However, the universe of texts is much larger than the universe of the TEI, and the TEI knows this: it cannot provide a dedicated element for every feature an encoder might encounter.

Instead, the TEI provides a fall-back mechanism: a set of generic elements that encoders can use to encode the unforeseen. Unlike an element with a very specific meaning (persName for a personal name, stage for a stage direction), a generic element carries almost no meaning at all: it just says "thing!" The semantics, the meaning of the element, are carried in an attribute value, which can be made up by the encoder.

As we show here, there are three main generic elements in TEI, one for each structural level:
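In the TEI, the generic elements usually cited for this role are div at the division level, ab (anonymous block) at the paragraph level, and seg (segment) at the phrase level, each taking its meaning from the type attribute. The Python sketch below uses a hypothetical recipe fragment with invented type values and gathers the typed generic elements into an inventory, which is one way to check how consistently a project's home-made categories are being applied.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# The generic elements, one per structural level:
# div (division), ab (anonymous block), seg (phrase-level segment).
GENERIC = {"div", "ab", "seg"}

SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <div type="recipe">
    <head>To make a salad</head>
    <ab type="ingredients">Take <seg type="herb">lettuce</seg>,
      <seg type="herb">sorrel</seg> and <seg type="herb">chervil</seg>.</ab>
    <ab type="method">Wash them well and dress with oil.</ab>
  </div>
</body>"""


def generic_inventory(xml_text):
    """Group the text of typed generic elements by (element, @type)."""
    root = ET.fromstring(xml_text)
    inventory = {}
    for elem in root.iter():
        local_name = elem.tag.replace(TEI, "")
        kind = elem.get("type")
        if local_name in GENERIC and kind:
            text = " ".join("".join(elem.itertext()).split())
            inventory.setdefault((local_name, kind), []).append(text)
    return inventory


for key, values in generic_inventory(SAMPLE).items():
    print(key, values)
```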

Empty elements used as milestones

The simplest option: instead of encoding a feature by enclosing it in an element, just mark its boundaries with empty elements.

The most common case of this is with milestone elements:
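Because a milestone is empty, the "content" of a page is everything between one pb and the next, wherever those fall in the hierarchy. A minimal Python sketch of recovering page-by-page text from pb milestones follows; the sample is hypothetical, and the sketch deliberately ignores finer points such as words split across a page break.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical sample: pb milestones mark page boundaries, including one
# that falls in the middle of a paragraph.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <pb n="12"/>
  <p>The end of one chapter and the</p>
  <p>beginning of <pb n="13"/>another, which runs on.</p>
</body>"""


def text_by_page(xml_text):
    """Collect text under whichever pb milestone is in effect at each point."""
    root = ET.fromstring(xml_text)
    pages = {}
    current = None  # page number in effect; None before the first pb

    def add(chunk):
        if chunk and chunk.strip():
            pages.setdefault(current, []).append(chunk.strip())

    def walk(elem):
        nonlocal current
        if elem.tag == TEI + "pb":
            current = elem.get("n")  # a milestone: switch pages, no content
            return
        add(elem.text)
        for child in elem:
            walk(child)
            add(child.tail)  # text after a pb belongs to the new page

    walk(root)
    # Note: this ignores @break="no" (words split across pages).
    return {n: " ".join(chunks) for n, chunks in pages.items()}


print(text_by_page(SAMPLE))
```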

Empty elements used as endpoints

There are also other cases where it’s handy to be able to mark the ends of a feature at arbitrary places, rather than having to fit an element neatly into the document hierarchy.

As we saw briefly yesterday, we can mark these much more effectively by putting an empty element at each end, sort of like marking out an impromptu soccer field by setting down your shoes as goalposts.

Then create a link between the two, using the pointing system we talked about yesterday...
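One way to do this in TEI is to place anchor elements at the two ends and point at them from a span element's from and to attributes. The Python sketch below assumes that pattern (the sample, its ids, and the type value are invented) and recovers the text between the anchors, even though the passage crosses a paragraph boundary.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <p>Before the passage. <anchor xml:id="s1"/>The marked passage begins here,</p>
    <p>crosses a paragraph boundary,<anchor xml:id="s2"/> and then stops.</p>
  </body>
  <back>
    <spanGrp>
      <span type="revision" from="#s1" to="#s2">Added in a later hand.</span>
    </spanGrp>
  </back>
</text>"""


def events(elem):
    """Yield ('anchor', id) and ('text', string) in document order."""
    if elem.tag == TEI + "anchor":
        yield ("anchor", elem.get(XML_ID))
    elif elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from events(child)
        if child.tail:
            yield ("text", child.tail)


def span_text(root, span):
    """Text between the anchors named by @from and @to, across element boundaries."""
    start, end = span.get("from").lstrip("#"), span.get("to").lstrip("#")
    inside, chunks = False, []
    for kind, value in events(root):
        if kind == "anchor" and value == start:
            inside = True
        elif kind == "anchor" and value == end:
            break
        elif inside and kind == "text":
            chunks.append(value)
    return " ".join("".join(chunks).split())


root = ET.fromstring(SAMPLE)
for span in root.iter(TEI + "span"):
    print(span.get("type"), "->", span_text(root, span))
```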


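Fragmentation

Sometimes we need to take what is logically a single content object and split it into pieces, because it will not fit neatly into the document hierarchy.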

TEI provides two methods for doing this; the first is the part attribute.

The part attribute can be used for serial cases:
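For example, a verse line shared between speakers can be split into fragments marked part="I" (initial), part="M" (medial), and part="F" (final). The sketch below uses a hypothetical three-speech exchange with invented speakers and rejoins such fragments. It relies on the fragments appearing in order, which is exactly the serial case part is designed for.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical sample: one metrical line shared among three speeches.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <sp><speaker>Anne</speaker><l part="I">Is it you?</l></sp>
  <sp><speaker>Tom</speaker><l part="M">It is.</l></sp>
  <sp><speaker>Anne</speaker><l part="F">Then come inside.</l></sp>
  <sp><speaker>Tom</speaker><l>I will, and gladly.</l></sp>
</div>"""


def logical_lines(xml_text):
    """Rejoin l elements marked part="I" / "M" / "F" into whole lines."""
    root = ET.fromstring(xml_text)
    lines, pending = [], []
    for line in root.iter(TEI + "l"):
        part = line.get("part", "N")
        text = " ".join("".join(line.itertext()).split())
        if part == "I":
            if pending:  # an unfinished line: keep what we have
                lines.append(" ".join(pending))
            pending = [text]
        elif part in ("M", "F") and pending:
            pending.append(text)
            if part == "F":
                lines.append(" ".join(pending))
                pending = []
        else:
            # "N", absent, "Y" (fragmented in an unspecified way), or an M/F
            # with no initial fragment: this sketch treats it as standalone.
            lines.append(text)
    if pending:
        lines.append(" ".join(pending))
    return lines


print(logical_lines(SAMPLE))
```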

Another approach to fragmentation

The next and prev attributes can be used for any kind of fragmentation, not just serial cases:
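Here each fragment carries an xml:id, and next and prev point to its neighbors, so the pieces need not be adjacent. As a minimal sketch, assuming a hypothetical quotation interrupted twice by narration, the code finds the first fragment of each chain (one with next but no prev) and follows the next pointers to rebuild the whole.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample: one quotation interrupted twice by narration.
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">
  <q xml:id="q1" next="#q2">I cannot stay,</q> she said, turning to the door,
  <q xml:id="q2" prev="#q1" next="#q3">for the coach leaves at six,</q> and then,
  after a pause, <q xml:id="q3" prev="#q2">and I must not miss it.</q>
</p>"""


def reassemble(xml_text):
    """Follow @next from each chain head (has @next, no @prev) to rebuild wholes."""
    root = ET.fromstring(xml_text)
    by_id = {e.get(XML_ID): e for e in root.iter() if e.get(XML_ID)}
    wholes = []
    for head in root.iter():
        if head.get("next") and not head.get("prev"):
            pieces, seen, current = [], set(), head
            while current is not None and id(current) not in seen:  # guard against cycles
                seen.add(id(current))
                pieces.append(" ".join("".join(current.itertext()).split()))
                pointer = current.get("next")
                current = by_id.get(pointer.lstrip("#")) if pointer else None
            wholes.append(" ".join(pieces))
    return wholes


print(reassemble(SAMPLE))
```

Unlike part, which depends on fragments appearing in sequence, this chain of pointers would still work if the pieces were scattered across the document.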