Basic Manuscript and Physical Document Encoding

Julia Flanders

2013-04-26

Some philosophical issues

Note that these are really two separate, though closely related issues:

There are some aspects of the encoding of physical document structures which are common to print and MS documents, so we treat them together

Similarly, there are some issues having to do with our perception of the physical exemplar (esp. having to do with legibility and conjecture) that are common to both.

By and large, the TEI is focused, methodologically, on the text as linguistic rather than material information: its encoding provisions for genre, language, and content are rich and detailed, while its provisions for material information are fairly minimal. Textual materiality poses some interesting conceptual problems for markup systems:

We raise these mostly as signposts to issues that may be of interest, rather than attempting an adequate treatment here; there is a lot of lively debate on this topic, and if you're interested we can provide some pointers.

For now, we’re just going to cover some practical encoding points.

A sample manuscript page

A sample manuscript page illustrating various MS features.

Basic prose tagging

These are simple, intuitive elements, many of which have direct counterparts in HTML.
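A minimal sketch of this kind of basic prose tagging (the sample text is invented for illustration); the rough HTML counterparts are div ~ div, head ~ h1, p ~ p, hi ~ em, and list/item ~ ul/li:

    <div type="chapter">
       <head>Chapter the First</head>
       <p>It was a <hi rend="italic">very</hi> dark and stormy night.</p>
       <list>
          <item>first point</item>
          <item>second point</item>
       </list>
    </div>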

Transcriptional complexities: choices

In these examples we’re still looking at parallelism, but instead of managing it through a linking mechanism, we manage it through an enclosing element.

These examples don’t actually violate the ideal document tree view, but they do make it slightly more complex: almost as if a twig has split and then rejoined.

This approach is useful for smaller, more local instances of parallel text. A number of kinds of local editorial change are commonly made in the process of transcription and editing: processes of regularization and correction that, in print, are often done silently and noted in an introduction:

In print-based editing, these choices are exclusionary: whichever kind of reading you decide to show the reader, its complementary version has to be suppressed (it could be indicated in a note or an appendix but it can’t typically be displayed as part of the regular reading surface)

In an XML transcription, however, it’s possible to represent both (or in principle multiple) readings in a data structure that shows their parallelism and treats them as alternatives, which can then be chosen (displayed, searched, etc.) when desired.

In TEI, this mechanism is the choice element, which represents a moment of textual forking, where instead of a single reading the text offers a choice of readings
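A minimal sketch of choice with the three most common TEI pairings (orig/reg for regularization, sic/corr for correction, abbr/expan for abbreviation); the sample readings are invented for illustration:

    <p>She
       <choice><orig>doe's</orig><reg>does</reg></choice> not
       <choice><sic>beleive</sic><corr>believe</corr></choice> the
       <choice><abbr>Dr</abbr><expan>Doctor</expan></choice> at all.</p>

Each pair records both the source reading and the editorial alternative, so a later process can choose which one to display or search without losing the other.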

Transcriptional complexities: revision

What’s at stake here: because the transcription of manuscript materials (and often printed texts as well) involves significant effort of decipherment, and in many cases conjecture or interpretation, and because primary sources are informationally complex (authorial revision, erasures, missing letters, illegible passages, etc.), a responsible transcription needs to capture not just the end product but also information about the process and the editorial decision-making: not a clean-looking, innocent, butter-wouldn’t-melt-in-its-mouth transcription, but one that preserves information about what was difficult or unclear.

Conventions for accomplishing this are familiar from print: carets and brackets for marking insertions and deletions, italics to indicate unclear text, footnotes to propose hypothetical readings or to describe damaged sections.

In text markup, the goal is to formalize as much of this information as possible and represent it systematically

Next: the basic encoding features for this purpose: unclear, supplied, gap, add, and del.
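A minimal sketch showing these elements together; the sample text, reasons, and measurements are invented for illustration:

    <p>The letter was
       <del rend="strikethrough">sent</del>
       <add place="above">delivered</add> on the
       <unclear reason="faded">fourth</unclear> of
       <supplied reason="omitted">June</supplied>; the rest of the line is lost:
       <gap reason="damage" quantity="12" unit="chars"/></p>

Here del and add record the author's revision, unclear and supplied record the editor's conjectures, and gap records text that cannot be transcribed at all.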

Encoding the physical document

In TEI, the primary emphasis of the encoding is on the text stream (paragraphs, divisions, and so forth)
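Physical features such as page and line breaks are therefore typically recorded as empty milestone elements (pb, lb) interleaved with that text stream; a minimal sketch, with invented sample text:

    <p>This paragraph begins near the foot of one page
       <pb n="4"/>and carries over onto the next,
       <lb/>with each new line of the source
       <lb/>marked where it occurs.</p>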