Encoding Contextual Information

Julia Flanders

2018-04-05

Contextual information in the TEI

In earlier versions of the TEI, much contextual information had to be encoded in ad hoc ways, separate from the encoded document; for the most part the TEI didn’t provide for it.

There were a few exceptions: for instance, the regularization of names and the expression of interpretive information.

P5 makes much more extensive provision for this kind of information.

Several different types:

Personography

Let’s look first at some common and straightforward examples. Personography is common because everyone has references to people (what varies is what information they want to document).

Key points:

Personography encoding

Let’s look first at personography...

Key points:

A few things to point out here:

So the essence of a personography is managing the identity of individual people, so that in our encoding of the text we can say: we’re talking about this person here.
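As a minimal sketch of that identity management (the person, the xml:id value, and the sentence are all invented for illustration, not taken from the slide), a personography entry and a reference to it from the transcription might look something like this:

```xml
<!-- Hypothetical personography entry: one person, one identifier -->
<listPerson>
  <person xml:id="jdoe">
    <persName>
      <forename>Jane</forename>
      <surname>Doe</surname>
    </persName>
  </person>
</listPerson>

<!-- Elsewhere, in the transcription: the name as written points back to the entry -->
<p>... a letter from <persName ref="#jdoe">Mrs. Doe</persName> arrived ...</p>
```

If the personography lives in a separate file, the ref value would need to include that file's location before the #, for example a relative path to a (hypothetical) personography.xml.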

More personographic detail: demographic and personal qualities

One of the things we can add to our basic identity management is what we might term simple demographic facts: that is, facts about the person that may lend themselves to formalization.

It’s important to note here that formalization is the key to making your personographic data useful. The sample paragraph at the bottom of the slide contains the same informational content (to a human reader) as the encoded sample, but it is dramatically less useful as data (can’t be searched or processed systematically).

Note as well that there are two kinds of elements here: those that are intended to capture some specific piece of information (e.g. nationality), and generic elements (trait and state) that provide a way to model unforeseen facts for which there isn’t a pre-made element.
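A rough sketch of the two kinds of elements, assuming an invented person and invented facts: nationality, faith, and occupation are purpose-built elements, while trait and state carry a project-defined type attribute for facts the TEI has no dedicated element for.

```xml
<person xml:id="jdoe">
  <persName>Jane Doe</persName>
  <birth when="1780"/>
  <death when="1845"/>
  <!-- Specific elements: each captures one foreseen kind of fact -->
  <nationality>English</nationality>
  <faith>Quaker</faith>
  <occupation>Printer</occupation>
  <!-- Generic elements: the type values here are hypothetical, project-defined labels -->
  <trait type="handedness">
    <label>Handedness</label>
    <desc>Left-handed</desc>
  </trait>
  <state type="maritalStatus" from="1802" to="1830">
    <label>Married</label>
  </state>
</person>
```

The difference between trait and state is roughly one of time: a trait is treated as a lasting characteristic, while a state is expected to hold for a limited period and so typically carries dating attributes.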

Let’s pause for a second and consider the data you have in your projects: do you have information analogous to what’s shown here that you want to model? What facts-about-people do you want to manage that aren’t represented here?

More personographic detail: biographical detail

Another important dimension of personographic information is time and biography: things that happen to people, states they pass through, facts that have a specific timespan in their lives. Here we’ve added some examples of these: for instance, education, residence, affiliations.
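For readers without the slide, time-bound biographical information of this kind might be sketched as follows. The person, places, organization, and dates are invented; the dating attributes (from, to, notBefore, when) are the standard TEI ones.

```xml
<person xml:id="jdoe">
  <persName>Jane Doe</persName>
  <!-- Each fact carries its own timespan -->
  <education from="1790" to="1795">Attended a day school in <placeName>York</placeName></education>
  <residence from="1802" to="1830">Lived in <placeName>Leeds</placeName></residence>
  <affiliation notBefore="1810">Member of a local <orgName>printers' guild</orgName></affiliation>
  <!-- A one-off occurrence; the type value is a hypothetical project-defined label -->
  <event type="publication" when="1815">
    <desc>Published her first pamphlet.</desc>
  </event>
</person>
```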

This is an area where the type and nature of the information is likely to vary a great deal from project to project, and where questions of quantity and useful granularity really come up. How much information of this kind will you use and what will you use it for?

What information of these types are people encoding? How do you plan to use it? That is, why is it worth representing in this form rather than as a short prose paragraph?

Placeography (gazetteer) encoding

Placeography is the awkward term used in the TEI world for encoded gazetteers and other aggregations of information about places.

A few key points:

The example here shows several different kinds of places and the range of information you can represent:

What kinds of place information do you have in your sights that isn’t represented here?
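As a point of comparison, a minimal placeography sketch (not the slide's example; the xml:id and type values are assumptions, and the coordinates are approximate) might nest one place inside another and attach coordinates to the more specific one:

```xml
<listPlace>
  <place xml:id="yorkshire" type="county">
    <placeName>Yorkshire</placeName>
    <country>England</country>
    <!-- A place may contain other places -->
    <place xml:id="leeds" type="city">
      <placeName>Leeds</placeName>
      <location>
        <!-- Latitude then longitude, in decimal degrees -->
        <geo>53.80 -1.55</geo>
      </location>
    </place>
  </place>
</listPlace>
```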

Other ’Ographies

In addition to these very fully-developed ographies, the TEI provides three others:

The provision for bibliography is long-standing, familiar, and detailed, and we won’t cover it here.

The provision for orgography and the generic ography is essentially the same: in each case, you’re given a few basic elements (orgName or label plus desc) with which to describe the entity and its properties. For simple data that doesn’t need much formalization, this is OK, but it doesn’t yield much analytical power (compared with the treatment of persons and places). We’ll talk tomorrow about how to handle ographies that are off the TEI’s radar in a more satisfactory way.
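To give a sense of how lightweight this is, here is a sketch of an organization entry using only orgName and desc. The organization is invented, and listOrg and org are assumed to be the containing elements.

```xml
<listOrg>
  <org xml:id="leedsPrinters">
    <orgName>Leeds Printers' Guild</orgName>
    <!-- Everything beyond the name is free prose, which limits systematic querying -->
    <desc>A hypothetical trade association of printers and booksellers, active in the early nineteenth century.</desc>
  </org>
</listOrg>
```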

Interpretive keywords and themes

Interpretive keywords are a controlled vocabulary of interpretive terms (which might be derived from some standard thesaurus or might be invented by your project); they can be grouped but they are essentially a flat structure. The idea is that these are concepts or terms representing themes in the text that you want to be able to identify consistently (for instance, to support searching in cases when the text’s own terminology is variable or where old spelling, language variation, etc. make ordinary word searching difficult).

This example is from the WWP and deals with themes that appear in the contextual essays and exhibits that comment on texts in WWO (and potentially could be used to tag these same themes in WWO texts as well).

Note that this information could also go in a separate file (if it’s to be referenced by multiple files).
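A flat, grouped keyword list of this kind could be sketched with interpGrp and interp, assuming those are the elements in play. The theme terms and identifiers below are invented and are not the WWP’s actual vocabulary.

```xml
<interpGrp type="theme">
  <desc>Themes discussed in the contextual essays</desc>
  <!-- One interp per controlled term; the xml:id is what the text will point to -->
  <interp xml:id="theme.education">education</interp>
  <interp xml:id="theme.marriage">marriage</interp>
  <interp xml:id="theme.authorship">authorship</interp>
</interpGrp>
```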

Interpretive classifications

The taxonomy element is useful when you want to create a more hierarchical set of terms, and where the different levels of the hierarchy are themselves important: for instance, users might want to search on a more general term and retrieve all items that are tagged with its child terms: e.g. all the fiction.

This example is from the WWP and deals with the genres of our texts.

Note that the TEI does provide a place for this in the header, but if it’s to be referenced from multiple files it makes more sense to maintain it in a separate file, and then either include it with XInclude (which is what we do) or reference it remotely. The reason we include it is to facilitate validation (?)
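A sketch of this arrangement, under stated assumptions: the file name genres.xml, the category identifiers, and the genre terms are all hypothetical, and this is not the WWP’s actual taxonomy. The first fragment is the shared file; the second shows one way a document header might pull it in with XInclude.

```xml
<!-- genres.xml (hypothetical file name): a hierarchical set of terms -->
<taxonomy xml:id="genre" xmlns="http://www.tei-c.org/ns/1.0">
  <category xml:id="genre.fiction">
    <catDesc>Fiction</catDesc>
    <!-- Child categories: a search on "fiction" can also retrieve these -->
    <category xml:id="genre.fiction.novel">
      <catDesc>Novel</catDesc>
    </category>
    <category xml:id="genre.fiction.tale">
      <catDesc>Tale</catDesc>
    </category>
  </category>
  <category xml:id="genre.verse">
    <catDesc>Verse</catDesc>
  </category>
</taxonomy>

<!-- In each document's header: include the shared taxonomy rather than copying it -->
<encodingDesc>
  <classDecl>
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="genres.xml"/>
  </classDecl>
</encodingDesc>
```

Once the include is resolved, the category identifiers are present in the document itself, so pointers to them can be checked locally.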

Associating interpretations and classifications with the text

This example shows a variety of ways of associating these different types of topical identifications with the text:
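Since the slide itself is not reproduced here, the following is only a sketch of some commonly used mechanisms. All identifiers are hypothetical and are assumed to be defined elsewhere (in a taxonomy, a set of interp elements, a personography, and a placeography respectively).

```xml
<!-- In the header: classify the whole text against a taxonomy category -->
<profileDesc>
  <textClass>
    <catRef scheme="#genre" target="#genre.fiction.novel"/>
  </textClass>
</profileDesc>

<!-- In the transcription: ana points to an interpretive keyword,
     ref points to a personography or placeography entry -->
<p ana="#theme.education">She was sent to school at
  <placeName ref="#leeds">Leeds</placeName>, where
  <persName ref="#jdoe">Miss Doe</persName> ...</p>
```

The design choice is roughly one of scope: catRef says something about the text as a whole, ana attaches an interpretation to a particular passage, and ref identifies a particular named entity.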