Lists: general notes

Encoding lists, including discussion of criteria for identifying lists

At a general level, the encoding of lists is not difficult. The TEI provides a list element with a type attribute that may be used to identify different kinds of lists; we describe below some particular types that may be relevant to users of this Guide. Within list, labels (numbering or other similar marks) are encoded with the label element, and the list items themselves with the item element. If the labels are sufficiently regular, the same information can instead be recorded with the n attribute on item.
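The fragment below is a minimal sketch of the two ways of recording labels just described. The list content is invented for illustration. The second version assumes the numbering is regular enough that nothing of interest is lost by moving it into the n attribute.

```xml
<!-- Labels transcribed as label elements -->
<list>
  <label>1.</label>
  <item>Of the duties of children.</item>
  <label>2.</label>
  <item>Of the duties of servants.</item>
</list>

<!-- The same regular numbering recorded with n on item -->
<list>
  <item n="1">Of the duties of children.</item>
  <item n="2">Of the duties of servants.</item>
</list>
```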

The TEI provides two different ways of structuring a list’s contents. One method treats the list as a kind of two-column table, with optional headings for each column. In this case, the labels and items are encoded as alternating sibling label and item elements, and the column headings are encoded as headLabel and headItem, as in Example 1. The other method treats the list simply as a series of items, with an optional heading (head) for the list as a whole. In this case, label (if present) is nested within item rather than appearing as its sibling.
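The following fragment sketches both methods side by side, using invented content; it is not a reproduction of Example 1. In the first list, labels and items alternate as siblings under column headings; in the second, each label sits inside its item and the list as a whole takes a head.

```xml
<!-- Method 1: two-column table; label and item are siblings -->
<list>
  <headLabel>Abbreviation</headLabel>
  <headItem>Meaning</headItem>
  <label>Vol.</label>
  <item>Volume</item>
  <label>p.</label>
  <item>Page</item>
</list>

<!-- Method 2: series of items; label is nested inside item -->
<list>
  <head>Contents of the Second Part</head>
  <item><label>I.</label> Of Education.</item>
  <item><label>II.</label> Of Marriage.</item>
</list>
```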

In both cases, an item may contain another list, paragraphs and other paragraph-level structures, or simply text. For the sake of simplicity, we recommend nesting p inside item only when an item consists of more than one paragraph.
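A sketch of the three kinds of item content, with invented text. Note that p appears only in the item that runs to more than one paragraph; the other items contain text directly.

```xml
<list>
  <!-- An item containing only text -->
  <item>Of the Passions in general.</item>
  <!-- An item containing text followed by a nested list -->
  <item>Of the particular Passions:
    <list>
      <item>Love.</item>
      <item>Hatred.</item>
    </list>
  </item>
  <!-- A multi-paragraph item: the only case where p is nested in item -->
  <item>
    <p>First paragraph of a long item.</p>
    <p>Second paragraph of the same item.</p>
  </item>
</list>
```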

The TEI suggests using the type attribute on list to distinguish between list styles such as ordered, unordered, bulleted, and simple lists. We prefer to regard these differences as presentational, and to encode them with the rend attribute instead. For projects using this Guide’s recommended rendition ladder, the encoding would be <list rend="label(num)">, <list rend="label(alpha)">, or <list rend="label(bullet)"> (or whatever values are appropriate for the given project). See Example 2 for more detailed examples.
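A sketch of the contrast between the two approaches, with invented content. The rend value follows the form given in the paragraph above. Whether the labels themselves are also transcribed with label, as in this fragment, is assumed here to be a project decision rather than a requirement.

```xml
<!-- Avoided under this Guide's approach: style recorded with type -->
<!-- <list type="ordered"> ... </list> -->

<!-- Preferred: style recorded with rend, leaving type free -->
<list rend="label(alpha)">
  <item><label>a.</label> Prudence.</item>
  <item><label>b.</label> Temperance.</item>
</list>
```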

We recommend using the type attribute to record the functional type of a list: errata lists, subscriber lists, tables of contents, indexes, and other special kinds of list may usefully be distinguished in ways that facilitate analysis. The values included in the DTD extensions that accompany this Guide are TOC, index, errata, subscriber, gloss, and simple (the default value, which the encoder does not need to supply). These list types are described in more detail elsewhere in this Guide.
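A sketch of two functional types, using invented content. Here type carries only the functional category, so presentational information can still go in rend. A list with no type attribute is treated as simple, the default.

```xml
<!-- An errata list -->
<list type="errata">
  <head>ERRATA.</head>
  <item>Page 12, line 4, for recieve read receive.</item>
  <item>Page 30, line 9, for there read their.</item>
</list>

<!-- A subscriber list -->
<list type="subscriber">
  <head>LIST of SUBSCRIBERS.</head>
  <item>Mrs. Anne Smith.</item>
  <item>Mr. John Brown.</item>
</list>
```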

The boundary between lists and non-lists is sometimes difficult to draw: some lists are marked very explicitly with labels and formatting, while others are embedded in running prose in ways that make them hard to distinguish. For projects chiefly interested in the most overt textual structures, it is best to limit the encoding of lists to those that are completely explicit, that is, those with explicit labels (numeric or alphabetical markers rather than words like First…next…then…fourthly…), explicit list formatting, or both. For projects interested in lists as a rhetorical feature, however, it may be useful to mark less explicit lists as well. In such cases, you should establish criteria for recognizing a list and apply them consistently.
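A sketch of a list embedded in running prose, with invented content. Because it carries numeric labels, this list would qualify even under the narrower, explicit-only policy; a sequence signalled only by words such as "first" and "then" would be left as ordinary prose unless the project has adopted broader criteria. The TEI allows list to appear inside p, so the surrounding paragraph need not be broken up.

```xml
<p>The author proposes three remedies for melancholy:
  <list>
    <item><label>1.</label> exercise,</item>
    <item><label>2.</label> temperance, and</item>
    <item><label>3.</label> cheerful company.</item>
  </list>
</p>
```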