Encoding Contextual Information

Contextual information in the TEI

In earlier versions of the TEI, much of this information had to be encoded in ad hoc ways separate from the encoded document; for the most part the TEI didn’t provide for it.

There were a few exceptions: for instance, the regularization of names and the expression of interpretive information.

P5 makes much more extensive provision for this kind of information.

Several different types:

’Ographies: prosopography (personography), gazetteers (placeography), orgography, bibliography
these are like local authority lists that you create
keywords applied to the text as a whole
thematic or interpretive information applied to specific places in the text

Personography

Let’s look first at some common and straightforward examples. Personography is common because everyone has references to people (what varies is what information they want to document).

Key points:

Like a local name authority file: a place where you keep the information you want to express about the people named in your text
Can be simple or very detailed: you might want to record just a standardized version of the person’s name, or you might want to record detailed information about their life and activities: almost like a miniature biography
Can be kept in your encoded file or externally: if you have a very large list of people, or if you need to share personographic data between multiple files, it may be better to maintain it externally. For our purposes, we’re going to show how to encode it right in your encoded file.

Personography encoding

Let’s look first at personography...

A few things to point out here:

Note that each <person> element gets an identifier, and that identifier is used to create an association between the reference in the text and the appropriate person entry in the personography
Note also that we can create links between ographies (so in this case we have a link to a placeography that we haven’t looked at yet)
The @ref attribute is the mechanism for pointing from a reference to its referent; we can use @ref on all the name elements, and also on <rs> (referring string)
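
To make the pointing mechanism concrete, here is a minimal sketch of a complete file with a tiny personography. Everything in it is invented for illustration (the person, the place, and the identifiers jdoe and exampleton), and putting the lists in <sourceDesc> is just one of several places the TEI allows.

```xml
<!-- Hypothetical example: all names, dates, and identifiers are invented. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample letter (invented)</title></titleStmt>
      <publicationStmt><p>Unpublished illustration.</p></publicationStmt>
      <sourceDesc>
        <!-- The personography: one entry per person, each with its own xml:id -->
        <listPerson>
          <person xml:id="jdoe">
            <persName>Doe, Jane</persName>
            <!-- A link from the personography into the placeography -->
            <birth when="1750"><placeName ref="#exampleton">Exampleton</placeName></birth>
          </person>
        </listPerson>
        <!-- The placeography entry that the birth place points to -->
        <listPlace>
          <place xml:id="exampleton"><placeName>Exampleton</placeName></place>
        </listPlace>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- Both a name and a referring string point to the same person entry -->
      <p>Yesterday I dined with <persName ref="#jdoe">Mrs. Doe</persName>;
        <rs ref="#jdoe">my old friend</rs> was in fine spirits.</p>
    </body>
  </text>
</TEI>
```

The value of @ref is a pointer, so a reference within the same file is the identifier preceded by #. If the personography lived in a separate file, the pointer would include that file’s name before the #.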

So the essence of a personography is managing the identity of individual people, so that in our encoding of the text we can say: we’re talking about this guy here.

More personographic detail: demographic and personal qualities

One of the things we can add to our basic identity management is what we might term simple demographic facts: that is, facts about the person that may lend themselves to formalization.

It’s important to note here that formalization is the key to making your personographic data useful. The sample paragraph at the bottom of the slide contains the same informational content (to a human reader) as the encoded sample, but it is dramatically less useful as data (can’t be searched or processed systematically).

Note as well that there are two kinds of elements here: those that are intended to capture some specific piece of information (e.g. nationality), and generic elements (<trait> and <state>) that provide a way to model unforeseen facts for which there isn’t a pre-made element.
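
As a rough sketch of the two kinds of elements, here is a person entry that mixes specific elements with the generic <trait> and <state>. The person and all the facts are invented, and the @type values are our own assumptions, not values the TEI prescribes. The fragment is meant to sit inside a <listPerson>.

```xml
<!-- Hypothetical example: the person and every fact recorded here are invented. -->
<person xml:id="jdoe">
  <persName>Doe, Jane</persName>
  <!-- Specific elements: each one captures a particular kind of fact -->
  <sex>female</sex>
  <nationality>English</nationality>
  <faith>Quaker</faith>
  <occupation>schoolteacher</occupation>
  <!-- Generic elements: for facts that have no pre-made element -->
  <trait type="physical">
    <label>eye colour</label>
    <desc>brown</desc>
  </trait>
  <state type="marital" from="1775" to="1790">
    <label>married</label>
  </state>
</person>
```

The rule of thumb in the TEI is that <trait> is for characteristics that are more or less permanent and <state> is for conditions that hold for a period of time, which is why the <state> here carries dates.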

Let’s pause for a second and consider the data you have in your projects: do you have information analogous to what’s shown here that you want to model? What facts-about-people do you want to manage that aren’t represented here?

More personographic detail: biographical detail

Another important dimension of personographic information is time and biography: things that happen to people, states they pass through, facts that have a specific timespan in their lives. In this example we’ve added a few of these: for instance, education, residence, affiliations.
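
For readers without the slide in front of them, a minimal sketch of this kind of time-bound detail might look like the following. The person, the dates, and the identifiers exampleton and readingSociety are all invented, and the entries they point to are assumed to exist in a placeography and an orgography elsewhere.

```xml
<!-- Hypothetical example: all names, dates, and identifiers are invented. -->
<person xml:id="jdoe">
  <persName>Doe, Jane</persName>
  <!-- Point-in-time facts use @when -->
  <birth when="1750-03-02"><placeName ref="#exampleton">Exampleton</placeName></birth>
  <death when="1812"/>
  <!-- Facts with a timespan use @from and @to -->
  <education from="1760" to="1766">Attended a day school in
    <placeName ref="#exampleton">Exampleton</placeName></education>
  <residence from="1775" to="1790"><placeName ref="#exampleton">Exampleton</placeName></residence>
  <!-- Uncertain dates can be bounded with @notBefore and @notAfter -->
  <affiliation notBefore="1780"><orgName ref="#readingSociety">Exampleton Reading Society</orgName></affiliation>
</person>
```

The dating attributes are what make these facts usable as data: with them you can ask who lived where in a given year, which a prose paragraph can’t answer systematically.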

This is an area where the type and nature of the information is likely to vary a great deal from project to project, and where questions of quantity and useful granularity really come up. How much information of this kind will you use and what will you use it for?

What information of these types are people encoding? How do you plan to use it? That is, why is it worth representing in this form rather than as a short prose paragraph?

Placeography (gazetteer) encoding

Placeography is the awkward term used in the TEI world for encoded gazetteers and other aggregations of information about places.

A few key points:

Very similar to personography...but for places! Each place has an identifier, and the basic function here is to disambiguate (but also to provide a place to store whatever additional detail you want to represent)
It’s worth thinking here about what detail you do want to represent: compared with people, places have a lot less churn and they have more common public significance; as a result they are better documented in public places. Resources like Google Earth, Wikipedia, or geonames.org have a lot of basic data about places already (some of it very systematic and well-formalized) so if you’re storing information locally you should be clear on why that’s worth doing: could be because you have more information, could be because it’s easier to work with if it’s in your own format.
Places in a placeography can be linked to maps via geographic coordinates (latitude and longitude)

The example here shows several different kinds of places and the range of information you can represent:

Places with a formal identity (cities, states)
Places with a more individual or project-specific significance (e.g. people’s homes)
Addresses with detailed location data
Locations marked by latitude and longitude
Information about climate, population (and potentially other data as well, using <trait>)
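
A minimal sketch covering the kinds of place listed above might look like this. The places, the address, the coordinates, and the figures are all invented for illustration, and the @type values are our own assumptions.

```xml
<!-- Hypothetical example: places, coordinates, and figures are all invented. -->
<listPlace>
  <!-- A place with a formal identity -->
  <place xml:id="exampleton" type="city">
    <placeName>Exampleton</placeName>
    <location>
      <!-- Latitude then longitude, separated by a space -->
      <geo>51.50 -0.12</geo>
    </location>
    <population when="1801">
      <desc>about 12,000 inhabitants</desc>
    </population>
    <climate>
      <desc>Mild and wet</desc>
    </climate>
  </place>
  <!-- A place with project-specific significance, located by address -->
  <place xml:id="doeHouse" type="residence">
    <placeName>Home of Jane Doe</placeName>
    <location>
      <address>
        <addrLine>12 Market Street</addrLine>
        <addrLine>Exampleton</addrLine>
      </address>
    </location>
    <!-- A generic trait for a fact with no pre-made element -->
    <trait type="construction">
      <label>building material</label>
      <desc>brick</desc>
    </trait>
  </place>
</listPlace>
```

A reference in the text would point here in the same way as for people, for instance <placeName ref="#exampleton">.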

What kinds of place information do you have in your sights that isn’t represented here?

Other ’Ographies

In addition to these very fully-developed ographies, the TEI provides three others:

Orgography (concerning organizations)
Bibliography (concerning published items)
A generic ography for whatever other kind of entity you are capturing information about

The provision for bibliography is long-standing, familiar, and detailed, and we won’t cover it here.

The provision for orgography and the generic ography is essentially the same: in each case, you’re given a few basic elements (<orgName> or <label> plus <desc>) with which to describe the entity and its properties. For simple data that doesn’t need much formalization, this is OK, but it doesn’t yield much analytical power (compared with the treatment of persons and places). We’ll talk tomorrow about how to handle ographies that are off the TEI’s radar in a more satisfactory way.
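
To show how little structure is involved, here is a minimal sketch of an orgography entry. It covers only the orgography case, not the generic one, and the organization and its identifier are invented.

```xml
<!-- Hypothetical example: the organization and its identifier are invented. -->
<listOrg>
  <org xml:id="readingSociety">
    <orgName>Exampleton Reading Society</orgName>
    <!-- Everything beyond the name is prose, so it can't be queried as data -->
    <desc>A subscription library and discussion circle, active in the 1780s.</desc>
  </org>
</listOrg>
```

A mention in the text would point to this entry with <orgName ref="#readingSociety">, just as with persons and places.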

Interpretive keywords and themes

Interpretive keywords are a controlled vocabulary of interpretive terms (which might be derived from some standard thesaurus or might be invented by your project); they can be grouped but they are essentially a flat structure. The idea is that these are concepts or terms representing themes in the text that you want to be able to identify consistently: for instance, to support searching in cases where the text’s own terminology is variable, or where old spelling, language variation, and so on make ordinary word searching difficult.

This example is from the WWP and deals with themes that appear in the contextual essays and exhibits that comment on texts in WWO (and potentially could be used to tag these same themes in WWO texts as well).

Note that this information could also go in a separate file (if it’s to be referenced by multiple files)
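
For anyone reading without the slide, a flat keyword list of this kind can be sketched as follows. This is not the WWP’s actual list: the themes, the identifiers, and the @type value are invented for illustration.

```xml
<!-- Hypothetical example: these themes and identifiers are invented, not the WWP's. -->
<interpGrp type="theme">
  <!-- Each keyword gets an xml:id so that the text can point to it -->
  <interp xml:id="th.education">education</interp>
  <interp xml:id="th.marriage">marriage</interp>
  <interp xml:id="th.authorship">authorship</interp>
</interpGrp>
```

The <interpGrp> groups related keywords, but there is no hierarchy among the <interp> elements inside it.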

Interpretive classifications

The <taxonomy> element is useful when you want to create a more hierarchical set of terms, and where the different levels of the hierarchy are themselves important: for instance, users might want to search on a more general term and retrieve all items that are tagged with its child terms: e.g. all the fiction.

This example is from the WWP and deals with the genres of our texts.

Note that the TEI does provide a place for this in the header, but if it’s to be referenced from multiple files it makes more sense to maintain it in a separate file, and then either include it with XInclude (which is what we do) or reference it remotely. The reason we include it is to facilitate validation (?)
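
A hierarchical classification of this sort can be sketched as below. Again, this is not the WWP’s actual genre taxonomy: the categories and identifiers are invented. The fragment is shown inside <classDecl>, which is where it lives in the header.

```xml
<!-- Hypothetical example: these categories and identifiers are invented. -->
<classDecl>
  <taxonomy xml:id="genre">
    <!-- A general term with more specific child terms nested inside it -->
    <category xml:id="g.fiction">
      <catDesc>Fiction</catDesc>
      <category xml:id="g.novel">
        <catDesc>Novel</catDesc>
      </category>
      <category xml:id="g.tale">
        <catDesc>Tale</catDesc>
      </category>
    </category>
    <category xml:id="g.verse">
      <catDesc>Verse</catDesc>
      <category xml:id="g.ode">
        <catDesc>Ode</catDesc>
      </category>
    </category>
  </taxonomy>
</classDecl>
```

Because the child categories are nested inside their parent, a search on g.fiction can be expanded to everything tagged g.novel or g.tale. If the taxonomy is kept in a separate file, an XInclude element (xi:include, in the namespace http://www.w3.org/2001/XInclude, with an href giving the file’s location) can stand in for the <taxonomy> here.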

Associating interpretations and classifications with the text

This example shows a variety of ways of associating these different types of topical identifications with the text:

terms from our <taxonomy> can be associated with the entire document, using <textClass>
terms from either our taxonomy or our <interp> elements can be associated with the entire transcription or with specific parts of it
Note that the @ana attribute is global, which means that you can use it anywhere
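
Here is a minimal sketch of these mechanisms working together. All the identifiers (genre, g.novel, th.marriage, th.education, th.authorship) are invented, and the <taxonomy> and <interp> elements they point to are assumed to be declared elsewhere, in this file or an included one.

```xml
<!-- Hypothetical example: the text and all identifiers pointed to are invented. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample novel (invented)</title></titleStmt>
      <publicationStmt><p>Unpublished illustration.</p></publicationStmt>
      <sourceDesc><p>No source; written for this example.</p></sourceDesc>
    </fileDesc>
    <profileDesc>
      <!-- Whole-document classification: a taxonomy term via catRef -->
      <textClass>
        <catRef scheme="#genre" target="#g.novel"/>
      </textClass>
    </profileDesc>
  </teiHeader>
  <!-- A theme applied to the entire transcription -->
  <text ana="#th.marriage">
    <body>
      <!-- A theme applied to one division -->
      <div type="chapter" ana="#th.education">
        <p>Her schooling had been brief and irregular.</p>
        <!-- A theme applied to a single phrase -->
        <p>Even so, she had <seg ana="#th.authorship">resolved to print her verses</seg>
          before the year was out.</p>
      </div>
    </body>
  </text>
</TEI>
```

The @ana attribute can hold more than one pointer, separated by spaces, so a passage can carry several themes at once.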