Text Encoding

To define text encoding at its most elemental, we could say that it is a way of representing whatever features of the text we find compelling, useful, or intellectually motivating. These include, most basically, the alphanumeric characters which make up the text, but might also include anything from parts of speech to narrative structure. This representation is for the benefit of both carbon and silicon intelligences: it enables the human user to yoke the computer's efficiency to the categories which are pertinent to human analysis. Thus, if in encoding the text of a play we have marked the dramatic structure (acts, scenes, speeches, lines, speakers, and stage directions) and some of the analytical categories in which we are interested (names, places, quotations, and allusions), all of these categories become available to us in our work with the text: we can find every occasion on which a certain character quotes the Bible, or every scene in which two particular characters speak in private.
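A brief sketch may make this concrete. The following fragment uses TEI-style markup for the play example; the element names (div1, div2, sp, speaker, l, stage, quote) are standard TEI, but the character and lines are invented here purely for illustration:

```xml
<div1 type="act" n="1">
  <div2 type="scene" n="2">
    <stage>Enter HARRIET, alone.</stage>
    <sp who="Harriet">
      <speaker>Harriet</speaker>
      <l>As scripture says, <quote>a house divided cannot stand</quote>;</l>
      <l>so stands my heart, divided from my home.</l>
    </sp>
  </div2>
</div1>
```

Because the act, scene, speaker, verse lines, and quotation are all explicitly marked, a program can retrieve, for instance, every quote element occurring within a speech whose who attribute names a particular character: precisely the kind of query described above.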

Text encoding has arisen as part of the new electronic technology, and yet conceptually it derives not chiefly from the computer but from the intellectual endeavors that it serves: for the scholarly editor, it renders explicit the issues that editorial theory always engages with, and the questions that scholarship needs to ask. The encoding of a given text will be useful precisely to the degree that it is systematic, principled, and adequate to the uses to which it will be put. Similarly, the encoding of a given group of texts will be useful to the degree that it enables meaningful analysis of the entire group; it follows, therefore, that projects sharing encoded textual data or contributing it to a public research effort will need to share a standard method of text encoding. Such a standard needs to be nuanced enough to express the distinctions that allow for meaningful research, and inclusive enough to accommodate the wide variety of texts that even a single discipline such as literary studies must encompass.

As text encoding becomes crucial in all areas of computerized text processing, standards have emerged to serve the various communities that need them. The most broad-based of these, and the only one which is an international standard, is Standard Generalized Markup Language (SGML). Its most useful humanities application is the encoding system developed by the Text Encoding Initiative (TEI) for encoding literary and linguistic texts. However, other non-SGML-based systems have been developed which are equally thorough and principled, with benefits and limitations suited to the particular material for which they are designed. An example of such a system is the Multi-Element Code System (MECS), which was developed by the Wittgenstein Archives. From the viewpoint of humanities text encoding projects like the Women Writers Project at Brown University, text encoding--and particularly standards like SGML and the TEI--makes it possible to create large electronic resources of previously inaccessible material, such as rare archival texts by women authors. At the same time, it makes possible an integration of responsible editing practice with the new technologies of distribution and access, such as the internet and the World Wide Web.
