Encoding Guide for Early Printed Books



The term metadata identifies a broad range of information about your encoding, your texts, and your project that is typically associated with your encoded text. In the TEI world, most metadata is stored in a particular section of the encoded file, called the TEI header. This header contains all the information needed to identify and understand the file, its source, and the encoding it contains. In other words, it makes the file self-documenting, so that if it is circulated apart from your project, it can still be used and understood.
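As a rough illustration of what "self-documenting" means in practice, the sketch below uses Python's standard `xml.etree.ElementTree` module to build a bare-bones TEI header. It assumes only the three components that TEI requires inside `fileDesc` (`titleStmt`, `publicationStmt`, and `sourceDesc`); the function name `minimal_tei_header` and the title, publisher, and source text are hypothetical placeholders. It does not validate against the TEI schema, and a real project header would normally carry much more, such as an `encodingDesc` describing editorial practice.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; registering it with an empty prefix makes it the
# default namespace in serialized output.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Return a tag name qualified with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def minimal_tei_header(title, publisher, source_description):
    """Build a teiHeader holding only the required parts of fileDesc.

    Hypothetical helper for illustration; not a complete project header.
    """
    header = ET.Element(tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))

    # titleStmt: the title of the electronic file itself
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title

    # publicationStmt: who publishes or distributes the electronic file
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("publisher")).text = publisher

    # sourceDesc: the source (e.g. a printed copy) the file was derived from
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = source_description

    return header


# Placeholder values, not taken from any real project
header = minimal_tei_header(
    "A hypothetical early printed book: an electronic edition",
    "Hypothetical Encoding Project",
    "Transcribed from a hypothetical printed copy.",
)
print(ET.tostring(header, encoding="unicode"))
```

Even this skeleton lets someone who receives the file on its own learn what the file is, who issued it, and what it was transcribed from, which is the core of the self-documenting role described above.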

Metadata has been an essential part of academic research for centuries: for example, the bibliographic information on which scholars rely is represented as metadata on title pages, in catalogue records, and in bibliographies. But it plays a particularly crucial role in the digital realm, for several reasons. First, the digital research world relies on fast, accurate searching and retrieval, which are possible only if all digital objects carry accurate metadata. Second, many important aspects of digital objects (the equivalent of some of the extrinsic features of printed objects) cannot easily be discovered by the user unless they are documented as metadata. Some of these may be chiefly of interest to scholars doing the equivalent of analytical bibliography on the digital text (not as far-fetched now as it might once have seemed): for instance, the filename, file size, or character encoding system. Others may be quite important to scholarly readers more generally, such as the names of those involved in editing, transcribing, and encoding the text, and the editorial work involved in its production.

There are many different types of metadata; a few of the most important are listed here:

Because of metadata’s role in retrieval and data exchange, the use of shared standards for metadata is very important. Within a single project, idiosyncratic metadata may not pose many problems, but programs like the Open Archives Initiative (OAI, which harvests metadata automatically from participating projects and makes it possible to perform searches across multiple collections) rely on standard formats. Furthermore, using common standards means that your data can be shared intelligibly with other projects, archived for long-term preservation, or contributed to a larger resource. Inventing your own metadata system, though possible, is not usually a good use of time and rarely results in something better than the existing standards. There are several core metadata standards that form the basis for most digital scholarly projects in the humanities and are often used in conjunction. They are:

Each of these metadata standards can be used independently, and as a result they overlap somewhat with one another: all of them need to be able to account for basic bibliographic details (author, title, date, etc.) and a few other aspects of the digital text. However, because each is designed to address a different dimension of digital resource creation and management, they are often used in combination. For example, EAD and TEI are used in tandem for projects that are digitizing archival collections: EAD is used to create a finding aid that describes the structure and contents of the collection, its provenance, and other collection-level information, while TEI is used to create transcriptions of the individual items in the collection. METS may be used with either EAD or TEI (or both) to describe large digital collections: the METS records document the digital structure of the collection, the relationships between the component parts, and the behaviors associated with each one.