Encoding Guide for Early Printed Books

Introduction

History

The modern era of text encoding began, for our purposes, in the late 1980s with two significant milestones: the establishment of the Standard Generalized Markup Language (SGML) as an international standard, and the founding of the Text Encoding Initiative. The former event created a common metalanguage and a technological basis for an intellectually responsible approach to digital text. The latter took this technological promise and began adapting it for the use of humanities scholars. In the intervening decades the technology has matured considerably, with XML superseding SGML and with the advent of the Web producing a remarkable efflorescence of digital text. The TEI has also matured: it has produced four versions of its Guidelines for Electronic Text Encoding and Interchange and is about to release a fifth in 2007. It is now the intellectual standard for high-quality text encoding in the humanities, and is in wide use by research projects, libraries, publishers, journals, and individual scholars. Text encoding, from being a fairly arcane subject two decades ago, is now arguably one of the central scholarly technologies; its impact on scholarly editing has been immense, and it underlies the development of large-scale research collections such as Early English Books Online, which are transforming how scholars do their work.

This Guide is intended to help scholars come to grips with this domain. It provides guidance on conceptual and strategic matters and background information on technical issues. In particular, it provides detailed discussion of the use of the TEI Guidelines for encoding early printed books: the rare, fragile, and inaccessible texts that are now being digitized to form the basis of so many scholarly research collections. This is an area where the TEI Guidelines provide a wealth of encoding resources, but do not always offer guidance specific enough to show the novice how to proceed confidently. This Guide supplements the TEI Guidelines with advice, detailed discussions of special cases, and rationales for making the difficult decisions that these texts inevitably pose.

What is text encoding and why should one care?

From the viewpoint of the humanities scholar, text encoding initially looks as if it comes to us from computer science: as an activity that takes place on computers and requires some technical knowledge (of software, of data standards, of encoding languages). In fact, it is more useful to think of it as drawing on some other affiliations which instead emphasize its engagement with observation, transcription, and the production of meaning. We might liken the encoder to an anthropologist in the tradition of Clifford Geertz, creating a thick, contextualized, interpretative description of the text, or to a critical editor who produces an analytical representation of the text which provides systematic, expert knowledge about it. Like so many apparently technical concepts, text markup is a more basic idea that has come to our attention because technology and the resulting media shifts make us aware of it. In fact it is an expression of motives and practices that have been around for a long time. It is a way of formalizing and externalizing the structures in a text; a way of adding further information to the text that interests us; a meta-text that comments on, interprets, or extends the meaning of a text.

If we take these lines of inheritance seriously, text encoding is thus a domain of potential interest and relevance to any academic practitioner, particularly to those who work closely with textual materials. As the research infrastructure for the humanities migrates increasingly into digital forms, an understanding of the representational systems of this new universe is as essential to the modern scholar as an understanding of scholarly editing was in the last century: not an area to master oneself, necessarily, but an area in which naïveté could be limiting or potentially disastrous. Just as most scholars know enough about how printed research materials (such as critical and documentary editions or facsimile reprints) are prepared to assess their editorial probity, so we might argue it is important to have an equivalent level of sophistication in our encounters with digital texts. Basic matters like methods of transcription, regularization practices, and conventions for text presentation have an obvious relevance, but beyond these are more difficult issues. If the text is searchable, what precisely is being searched and how should the results be interpreted? If navigational or interpretive features of the interface depend on classifications that are made in the encoding, how transparent are those classifications and what are they based on? How has the encoding represented features that are difficult to name or transcribe?
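
The question of what precisely is being searched can be made concrete with a small sketch. It assumes a TEI-style fragment in which each original spelling is paired with a regularized form using choice, orig, and reg elements; the sample sentence, the variable names, and the helper function reading are hypothetical and are not drawn from any particular project. The sketch only shows that the same encoded passage yields two different searchable texts, depending on which reading is extracted.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment: each <choice> pairs an original
# spelling (<orig>) with a regularized one (<reg>).
SAMPLE = (
    "<p>I <choice><orig>haue</orig><reg>have</reg></choice> read the "
    "<choice><orig>booke</orig><reg>book</reg></choice> of "
    "<name>Margaret</name>.</p>"
)


def reading(elem, drop):
    """Flatten an element to plain text, skipping any child named `drop`."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != drop:
            parts.append(reading(child, drop))
        # Text following a child belongs to the parent, so always keep it.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)

# Two different "texts" derived from one encoded passage.
original = " ".join(reading(root, drop="reg").split())
regularized = " ".join(reading(root, drop="orig").split())

print(original)
print(regularized)

# A word search succeeds or fails depending on which text is indexed.
print("have" in regularized.split())
print("have" in original.split())
```

A reader searching for the modern spelling will find the passage only if the interface indexes the regularized reading, which is exactly the kind of encoding decision a user of a digital collection needs to be able to ask about.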

This Guide attempts to address the question "What does one need to know about text encoding?" by considering several different kinds of readers who will engage with this domain in varying ways:

Those with a primarily theoretical interest: in other words, those whose research brings them into contact with digital materials, and those who are interested in the evolution of scholarly practices in the digital environment. For these readers, the most relevant parts of the Guide will be the opening sections, High-level Issues and Strategy and Workflow. These sections provide an overall context for thinking about text encoding, and also reveal something about how digital projects undertake the kinds of methodological decisions that affect the representational outcome. Thus even for those who will never need to make these decisions themselves, this information can illuminate the kinds of motivations and challenges that affect the ultimate shape of the scholarly resources being produced.
Those with a strategic interest: for instance, those who are involved in directing or advising a text encoding project. As above, both High-level Issues and Strategy and Workflow will be of interest, particularly because they help provide rationales for making the decisions that underlie the design of a successful text encoding project (both practically and intellectually). In addition, the Encoding section may be of interest even where specific encoding knowledge is not required, because it provides a detailed sense of the tradeoffs and representational concerns that constrain the encoding process. It also reveals the thought process required and may give a sense of the level of expertise necessary for different kinds of encoding (which may in turn affect issues such as staffing and funding).
Those with a practical and technical interest: those who are directly engaged in the encoding or in making specific decisions about encoding practice. Clearly the entire Guide is relevant for the encoding practitioner, but the last two sections (Encoding, and Technical Advice and Magic) are intended specifically for this readership.

It is important to recognize that these domains of interest do not necessarily map onto levels of technical expertise, at least to begin with. We anticipate that there may be many readers of this Guide who have a practical and technical interest in text encoding but at the moment have absolutely no experience or technical knowledge. The aim of this Guide is to enable people in this position to learn what they need, starting from scratch, and to become knowledgeable text encoders.