Final performance report: Creating a Guide for Encoding Early Printed Books
Project Director: Julia Flanders, Brown University
June 28, 2007
This report describes the development of the Brown University Women Writers Project’s Guide to Scholarly Text Encoding, originally funded by the NEH from July 2003 to June 2005, and subsequently extended through December 2006. During the three and a half years of work on this project, the WWP has completed and published the Guide, and has also begun using it in text encoding seminars and workshops. The goals of the original proposal have been largely achieved, though as this report will show, some of the goals have had to be reconceived in light of evolving ideas about scholarly text encoding and the Text Encoding Initiative. We feel that the resulting Guide is a stronger and more effective resource for its intended audience than was envisioned in the proposal, but its audience will be the ultimate judge of its usefulness.
The Guide represents a significant achievement, on several levels:
- It constitutes one of the very few comprehensive sources for information on text encoding aimed at a scholarly audience, and the only such source that combines high-level conceptual explanation with detailed technical advice. It serves audiences who are interested in learning about text encoding in the abstract, and also audiences who need practical guidance.
- It represents a complete documentation of the accumulated experience of nearly twenty years’ work on a digital research collection that is widely regarded as among the best in the field.
- It presents clear, concrete encoding decisions and instructions together with detailed rationales that make it possible for readers to learn the decision-making processes involved in text encoding of this kind; it balances the imperatives towards consistency and interpretive adequacy.
- It presents a cogent and approachable explanation of the TEI Guidelines and their significance for digital humanities scholarship, which the Guidelines themselves do not yet provide except for an expert audience.
- It offers both a pedagogical narrative and a reference structure so that it can be used both to explore the domain of text encoding and to consult for specific information during the work process.
This Guide describes encoding languages and work strategies from a field that is still rapidly developing, and even in the time since its completion we have made further updates and changes to its content. The WWP regards it as a significant ongoing publication, on par with Women Writers Online for social impact, albeit in a different frame of reference. We are committed to supporting and maintaining it as a resource for scholars and the digital humanities community, as well as for our own use in training and outreach. We welcome feedback and critique.
Context for the Project
The three-and-a-half-year development of the Guide to Scholarly Text Encoding emerges from the much longer history of the Women Writers Project at Brown University, and of the Text Encoding Initiative itself; a brief sketch of the significant points in that history may help the reader understand the significance of the Guide.
The TEI and the WWP were founded almost in the same year (the WWP in 1988 and the TEI in 1987) and WWP staff were closely involved in the early development of the TEI Guidelines. When the TEI published the first really implementable release of the Guidelines (P3, issued in 1994), the WWP undertook a major research effort to determine how to apply these new text encoding guidelines to early printed books. Although these materials fell within the purview of the TEI, our detailed engagement with these texts raised issues that the Guidelines did not address explicitly, and in some cases our texts posed challenges to the specific models and formalizations proposed by the TEI. One example shared by many projects was the TEI’s lack of provision for encoding postscripts in letters, but there are many others less obvious, such as the issue of where title pages may appear within the flow of text, or how the internal contents of a footnote may be structured. The result of our year-long research program was an initial customization of the TEI (using the TEI extension mechanism provided for this purpose) expressing the changes required by the special needs of early printed books, and a set of accompanying documentation providing the rationale for our changes and specific observations concerning how to recognize certain textual features, how to handle difficult or ambiguous cases, and how to decide which TEI encoding to use when multiple options were available. We have continued to expand this documentation and it forms the basis of our encoder training; we have also shared it informally with other text encoding projects upon request.
In 2001, WWP staff began teaching workshops on text encoding in several regular venues, including the Digital Humanities Summer Institute at the University of Victoria, Brown University, and the Graduate School of Library and Information Science at the University of Illinois, Urbana-Champaign, and occasionally in other locations. These workshops revealed what was also clear from other observations: namely that for non-technical humanities researchers such as faculty, library staff, and archivists, learning (or teaching) the TEI from the TEI’s own documentation was not an easy matter. The TEI Guidelines in their full form run to 1500 printed pages, and much of their content is fairly technical. They are clear and well-written, and contain all the information necessary to master the TEI’s encoding system, but they are not aimed at an audience unfamiliar with the underlying concepts of text markup, encoding languages, formal grammar, and digital research. Also, as the WWP had discovered in its initial attempts to apply P3, the TEI Guidelines cover the terrain at a broad rather than deep level: in order to cover all branches of humanities text encoding, they cannot address all of the issues arising in any one area. As a result they are not sufficient to guide all of the encoding decisions an individual digital humanities project must make in specifying its own encoding practice. The TEI expects that individual projects will develop their own internal guidelines, but it does not provide support for them to do so. The WWP’s workshops could help participants understand the need for such guidelines and give them a sense of how to proceed, but in the short time available these events could not provide the kind of detailed information necessary to really establish a complete encoding practice.
Because the TEI offers such a wealth of detailed options for encoding documents, and because it aims to serve a very broad audience, it is usually seen as motivating a proliferation of different encoding practices. This freedom is extremely valuable and has led to a rich tradition of experimentation and text encoding research within the TEI community that has had very important benefits for digital scholarship in the humanities, and has helped the community refine and extend its understanding of how text encoding works to support scholarship. However, there are also strong motives for developing common approaches, where these reflect common disciplinary and methodological assumptions. Individual TEI projects sometimes collaborate to share documentation, but the only common application of the TEI that is in widespread use is TEI Lite, a TEI customization that was initially designed to demonstrate the customization mechanism but has been adopted very widely in the library community because of its simplicity. For scholarly projects with an interest in detailed textual representation, TEI Lite is inadequate almost by definition, but it does provide some measure of consistency. Unfortunately it does so by diminishing the potential for detail and analytical power that constitutes the TEI’s value for scholarly projects. This tension between dispersal and cohesion, between individual interpretive freedom and shared standards, lies at the heart of the TEI and is what makes it both intellectually significant and practically challenging in enormous and equal measure. An important goal of this Guide is therefore to propose an encoding practice for projects like the WWP that is both sharable and interpretively rich. As we described the Guide in our proposal for this grant,
“It would enable individual scholars and small projects to start work more quickly and with greater confidence, without having to grapple with all of the TEI Guidelines or replicate fundamental encoding wisdom. It would also provide projects of any size with an alternative to TEI Lite, which—though compact and easy to understand—is not ideal except for very simple representations of literary data, since it lacks some important elements and constrains the use of others. The proposed guidelines would offer an option as manageable as TEI Lite, but designed specifically for the needs of scholarly text projects, with appropriate elements and structural definitions. These guidelines would also describe more complex levels of encoding for projects wishing to capture their data in more detail. Finally and most importantly, it would help ensure greater consistency between projects working on similar materials, and give them a shared basis for discussion and further refinement of their encoding practice.”
As a basis for such a guide, the WWP’s own documentation suggested a useful model at the right level of detail, and one which could be made much more useful if it could be generalized beyond the WWP’s own specific concerns. We applied for funding from two sources: first, for a small grant from the Gladys Krieble Delmas Foundation to support the publication of the WWP’s documentation, with the goal of using this published version as a basis on which to develop a much more extensive set of guidelines aimed at a broader external audience; and second, to the NEH for support in preparing this larger external guide. The Delmas Foundation gave us a generous grant of $25,000 in March 2003, which enabled us to publish the WWP’s internal documentation. The NEH proposal was funded starting in July 2003.
The overall goals of the Guide can be summed up as follows:
- to provide an approachable resource on text encoding for a scholarly audience, including those with little or no technical background
- to provide guidance and documentation for text encoding projects making detailed representations of early printed sources in the manner that the WWP has pioneered
- to make WWP knowledge and experience more publicly available, and to share the value of our long history of public funding.
These goals have been substantially met, though they have undergone some transformation during the past three and a half years, and to some extent their full realization will require more time. But the Guide as now published represents an important contribution and one which promises to be very useful to its intended audience.
The major activities conducted under this grant were to revise and expand the WWP documentation materials from which we started, and then to review and test them, with the first of these activities taking the great majority of the time and effort.
Several considerations shaped our revision and extension of the WWP’s original documentation:
Audience and function: The audience for the WWP’s original documentation was composed of encoders and staff working at the WWP. The original documentation did not attempt to cover basic technical concepts, since these were included in the encoders’ basic training and were unnecessary for staff. It also assumed a basic familiarity with the TEI, since its primary mission was to document the WWP’s differences from the TEI, and to provide institutional memory about our decision-making process. In revising these materials we needed to give a much more thorough coverage of the TEI’s encoding system as a whole, aimed at an audience with little or no knowledge of the TEI. We also needed to provide a broader explanation of text encoding as a technology of digital scholarship, again assuming no prior knowledge of the subject.
Retrieval and exploration: While the original WWP documentation was written and maintained in a database as a set of individual entries, the Guide is designed to function both as a reference tool and as a narrative document that can be read through. This derives from considerations of audience, since the Guide needs to serve an audience coming to the topic with no prior knowledge of the subject, and hence must provide information in a gradual and cumulative way, rather than requiring intellectual bootstrapping on the part of the reader.
Type and level of constraint: While the original WWP documentation described the use of a single schema (the customized WWP schema based on P4), the Guide had to be written so as to describe potentially three or even four schemas: the WWP’s P4 customization, the plain “vanilla” version of TEI P4, the upcoming release of P5 (so far as details of that could be known), and potentially also the WWP’s customization of P5, although that schema has at the moment only a notional existence. In addition, since the Guide is providing advice to projects some of which will develop their own TEI customizations, it needs to provide advice at a general strategic level as well as at the level of recommending specific elements.
Guided by these considerations, our expansion of the original WWP documentation took the following approach:
- Revise all entries to produce a consistent level of explanatory detail, and to shift the intended audience from an internal WWP audience to an external audience of project developers and humanities scholars
- Review all entries for areas of redundancy and reformulate as necessary to produce a sensible granularization of topics.
- Identify areas that require further detail and produce new entries as needed. The area still requiring further attention is the TEI header.
- Identify and address the areas of our encoding recommendations which will be affected by the changes now being made in the new release of the TEI Guidelines (P5). These changes, and the challenges they pose, are described in more detail below. In particular, an entire class of information which in P4 is expressed using attribute values is expressed as element content in P5, resulting in a more complex encoding structure. In addition, P5 offers more powerful ways to constrain the content of elements and attributes using what is known as data typing; essentially this technique allows one to require that a given element or attribute contain only content that matches certain criteria (for instance, the content must be a four-digit year after 1450, or it must be an integer, or it must be a member of a certain controlled vocabulary). The areas most significantly affected by these changes, from our point of view in writing the Guide, were in the representation of dates and in the way variant readings (such as typographical errors or abbreviations) are transcribed. In the entries affected, we include a discussion of both the P4 approach and the P5 approach, with a note reminding the reader that P5 is still not in final form and may change further. We will continue to monitor changes to P5 and update the Guide as necessary.
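The three kinds of constraint named in the last item above (a four-digit year after 1450, an integer, a member of a controlled vocabulary) can be illustrated outside any schema language. The sketch below is only an analogy in Python, not the way a TEI schema expresses data types; the function names and the sample vocabulary are invented for illustration.

```python
import re

# Hypothetical controlled vocabulary, invented for this illustration.
CERTAINTY_VALUES = {"high", "medium", "low"}


def is_year_after_1450(value: str) -> bool:
    """True if value is exactly four ASCII digits and denotes a year after 1450."""
    return re.fullmatch(r"[0-9]{4}", value) is not None and int(value) > 1450


def is_integer(value: str) -> bool:
    """True if value is an optionally signed run of ASCII digits."""
    return re.fullmatch(r"[+-]?[0-9]+", value) is not None


def is_in_vocabulary(value: str, vocabulary=CERTAINTY_VALUES) -> bool:
    """True if value is one of the permitted terms."""
    return value in vocabulary
```

A schema with data typing applies checks of this kind automatically during validation, so that a value such as "168" or "maybe" is reported as an error rather than silently accepted.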
The portion of the Guide that describes specific encoding practices (documentation of particular elements and the encoding of specific textual features) now covers the following major areas central to the representation of early printed books:
- Names and name-like features (personal names, place names, organizational names, names of objects and events, non-name references)
- Bibliographic information
- Dates and times, with a discussion of precision and ranges
- Rhetorical and linguistic markers such as foreign language and dialect words, quotations, dialog and direct speech, typographic markers of irony, emphasis, and word usage
- Large-scale textual structures and their representation of genre
- Components of verse texts, including a detailed treatment of verse types and special issues such as the treatment of partial lines and overlapping verse structures
- Components of dramatic texts, and the encoding of non-dramatic texts that use dramatic devices such as speakers and stage directions
- Components of epistolary and entry-based texts such as diaries, logs, journals, and travel narratives
- Physical document structures such as pagination, signatures, columns, lineation, and forme work
- Features of manuscript texts that appear in printed books, such as handwritten annotations and revisions, with discussion of how to represent details of handwriting
- Notes (including endnotes, footnotes, marginal notes, inline notes, and documentation notes) and the detailed linking structures they require
- Reference structures such as pagination, collation, and bibliographic citations
- Issues of transcription, including how to handle damaged, illegible, or unclear text, typographic errors in the source, abbreviations, regularization, normalization, and uncertainty
- Rendition and the representation of the appearance of the source text
- A wide variety of special topics concerning the complexities of recognizing and encoding features peculiar to early printed books
In addition to the revisions described above, we also added a very substantial amount of new material which provides context for this encoding information, including:
- A narrative section on Project Strategy and Workflow, in seven parts, with detailed advice and description of project planning, project analysis, document analysis, transcription and markup processes, error checking and correction, post-processing, and documentation
- A narrative section on fundamental text encoding concepts in seven sections, discussing what text encoding is and how it works, and introducing a set of essential concepts concerning how a text encoding system like the TEI models textual data.
- A set of general entries introducing the major textual genres and areas of the TEI from a functional point of view; these entries provide points of entry from which readers can find their way into the more detailed discussion of specific encoding topics. Topics include verse, prose, drama, letters, phrase-level encoding, rendition, physical document structures, notes and annotations, reference structures, and transcriptional details.
- A set of entries providing technical information and samples including sample texts, model templates, information on XML editors, TEI customization, publishing tools, and how to set up a TEI workflow.
Draft materials were reviewed internally by WWP staff and encoders and by Elli Mylonas of the Scholarly Technology Group, who was a consultant on this project and whose time was cost-shared by Brown.
One unanticipated dimension to the development of the Guide turned out to be the WWP’s text encoding workshops, taught regularly at Brown University, the University of Victoria, and the University of Illinois, and occasionally in other places. These workshops offered an opportunity to develop and refine explanations of the larger encoding concepts and issues of project strategy for an audience very similar to that of the Guide: one with a variety of expertise levels and disciplinary backgrounds. During this process we were able to discover which concepts are generally found to be most challenging, and which needed to be given extra reinforcement.
Issues and challenges
The most significant challenge we faced in completing this project was the changing nature of the TEI itself. When we first began planning the Guide, the current version of the TEI Guidelines was P4, which was issued in 2002 and represented an XML version of the P3 Guidelines that were published in 1994. By the time this grant was awarded in 2003, planning for the next version of the Guidelines was under way and the release of P5 was tentatively envisioned for fall 2004. It was clear that for the Guide to be useful for as long as possible, it would need to address not only P4 but also P5, and in April 2004 we applied for a supplement to our grant to support the additional work of covering both versions of the standard. Fairly soon, however, the schedule for the publication of P5 was revised and in August 2005 we applied for a one-year no-cost extension to the grant through December 2006, by which time we felt reasonably sure P5 would be completed. This estimate also proved optimistic, and at the time of this report, a 1.0 release of P5 is scheduled (fairly firmly) for November 2007. These changes of deadline, and the level of variability of P5 itself during this time, have made it very challenging to cover P5 adequately in the Guide. If we updated our materials too soon, we risked having to do the revision twice, and as a result we have waited until the last possible moment to add the P5-specific material. Some further changes to P5 in the next few months before its release are inevitable, as are changes well into 2008, and we expect that our ongoing updates to the Guide will include revisions of the P5 material for some time to come.
The differences between P4 and P5 are significant and have profound effects on the way the TEI Guidelines are shaped and maintained, but their impact on actual encoding practice (from the standpoint of a user) is more limited. The most important changes are:
- P5 is expressed as a schema rather than as a DTD; schemas are a more modern way of formalizing encoding languages, and they provide more powerful ways of constraining the data. They also support various new XML tools and technologies such as namespaces, which allow the use of multiple encoding languages in a single encoded document. From the encoder’s or scholar’s point of view, this change is essentially invisible, although its effects may be seen in the more precise constraints placed on the data.
- In P5, the mechanism for maintaining the TEI schema and describing customizations of it is much simpler and more powerful than in P4; from the encoder’s point of view, this change is invisible, but for a project manager or an individual starting a TEI project, it makes the process of specifying a TEI encoding system and customizing the TEI much simpler and easier to learn.
- In P5, the underlying organization of the TEI schema (known as the “class system”, the organization of the TEI’s elements and attributes into functional groupings) has been substantially revised and improved. From the encoder’s point of view, this change is invisible, but again, for someone developing or maintaining a TEI schema the changes will tend to make the process easier.
- In P5, there is a new mechanism for handling characters that are not part of Unicode. This mechanism requires each character of this kind to be encoded using the <g> element, which permits it to be described and documented. Because non-Unicode characters are rare, this in itself would not have much impact on most users. However, because the mechanism involves surrounding these characters with XML markup, no non-Unicode character may appear in an attribute value (where XML encoding is forbidden). This means that an entire class of TEI attributes (known as CDATA attributes, meaning that they contain character data) may not contain non-Unicode characters, making it impossible for projects dealing with such characters to use these attributes. To remedy this problem, in P5 all CDATA attributes have been removed and replaced with child elements or some other mechanism. So for instance, to encode a typographical error, instead of the P4 encoding
<sic corr="remedy">reemdy</sic>
one would use the P5 encoding
<choice><sic>reemdy</sic><corr>remedy</corr></choice>
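The move from an attribute value to element content can be sketched mechanically. The following is a minimal illustration in Python, assuming a single un-namespaced <sic> element with plain text content; the function name is hypothetical, and this is not a WWP or TEI conversion tool.

```python
import xml.etree.ElementTree as ET


def p4_sic_to_p5_choice(p4_xml: str) -> str:
    """Rewrite one P4-style <sic corr="..."> element as a P5-style <choice>."""
    sic_p4 = ET.fromstring(p4_xml)
    if sic_p4.tag != "sic" or "corr" not in sic_p4.attrib:
        raise ValueError("expected a <sic> element with a corr attribute")

    choice = ET.Element("choice")
    # The erroneous reading stays as the content of <sic>.
    sic = ET.SubElement(choice, "sic")
    sic.text = sic_p4.text
    # The correction moves from the attribute value into a <corr> element.
    # Stray surrounding whitespace in the attribute is dropped (an assumption).
    corr = ET.SubElement(choice, "corr")
    corr.text = sic_p4.get("corr").strip()
    return ET.tostring(choice, encoding="unicode")


print(p4_sic_to_p5_choice('<sic corr="remedy">reemdy</sic>'))
# prints <choice><sic>reemdy</sic><corr>remedy</corr></choice>
```

Because the correction is now element content, it can itself contain markup, which an attribute value cannot.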
In addition to these substantial architectural changes, there have also been changes to the content of the Guidelines in P5, some of which affect the WWP and similar projects more directly than others. Those having the greatest impact include:
- changes to the encoding of verse, eliminating the use of numbered <lg> elements
- changes to the handling of alternative readings, resulting from the larger change to the treatment of CDATA attributes described above; this means that all mechanisms for handling alternative readings (including abbreviations and expansions, typographical errors and corrections, and regularizations) are fundamentally changed. This also affects the encoding of names, since the name regularization mechanism cannot use the P4 reg= attribute (which was one of the CDATA attributes removed in P5), thus requiring that name elements contain some other means of regularization. The P5 solution to this issue is still not settled.
- changes to the handling of dates, date ranges, regularized date values, and errors in dates
- changes to the way foreign languages are documented
- changes to the way measurements are encoded (incidentally, the new method was developed by the WWP)
- changes to the encoding of embedded texts (i.e. entire texts which are included within some larger text, such as a nested narrative within a larger framing narrative). (This change was also proposed by the WWP.)
A second challenge in developing the Guide was the question of how to describe a recommended encoding practice at the appropriate level of abstraction. For some textual features, there is a straightforward encoding method that can be recommended without qualification (“for prose paragraphs, use the <p> element…”). In many cases, however, the approach to be taken depends a great deal on the level of detail the encoding project wants to capture, on questions of audience, or on interpretive issues that will vary from project to project: for instance, the question of whether and how to regularize names, or how to encode pagination, or how to represent handwritten material. For these situations, recommending a specific practice is much less useful than describing the rationale for making the decision, and the factors that should be taken into account. This “teaching the encoder to fish” approach not only scales up much better—producing good decisions even in cases we cannot anticipate in the Guide—but also creates an expectation in the reader’s mind that text encoding is an expression of intellectual activity rather than a set of arbitrary codes to be applied.
This approach naturally raises questions concerning consistency: would it not be better to recommend or require a specific practice in all cases, to ensure interchangeability of data? This question is of crucial importance and in fact has been central to our work on this grant. In our original proposal, we assumed that consistency and improved interchange would be a key goal of producing the Guide, and at least one of the reviewers of our proposal cited this as one of the potentially significant outcomes of the grant. We expect that the Guide as written will in fact tend to increase consistency of encoding by projects that follow its recommendations, but we no longer feel that this increased consistency is desirable in itself. Our research during this period, and in particular our contact with a wide variety of scholarly text encoding projects through consultation and training workshops, has suggested that consistency and interchangeability of data are goals arising from the standards community, which need to be reconsidered before they are applied to scholarly projects and used to evaluate their products.
In order to understand why this is so, we need to specify more clearly what we mean by interchange. If by “interchange” we mean successful automatic processing of the data by unforeseen external agents, then ensuring consistency of practice between projects would be an important goal. However, if we mean successful communication concerning the meaning of the markup and the interpretive goals that motivate it, then consistency (and constraints or prescriptions aimed at producing it) may in fact work against the larger goals of interchange. In any serious scholarly encoding work that deals with complex documentary information, both the structures to be represented and the interpretive positions articulated by the markup will vary considerably from project to project. If we constrain the encoding too tightly (for instance, through a very restrictive schema prescribed by an external authority), we can anticipate one of two outcomes: either information will be omitted and oversimplified, or, less benignly, information will be misrepresented and the encoding language will be misused in an effort to capture details that the schema has failed to anticipate. This kind of “tag abuse” (as it is called in the markup community) actually works to the detriment of interchange. The goal in defining markup practices should be to identify a level of consistency that matches the real commonalities between projects, so as to eliminate casual and meaningless divergence while retaining the opportunity for meaningful expressions of disagreement and interpretive range. The schemas used should reflect the encoding vocabulary and constraints that are needed to express the scholarly intentions of the encoding project, and the documentation of these schemas should provide whatever information is necessary for other projects to understand those intentions, so that when data is exchanged, the result is an effective informational interchange.
The practical result of these considerations for the development of the Guide was that we ended up describing three different levels of customization that might be appropriate. The first is not strictly speaking a customization at all, but rather a set of choices from among the options available in the TEI in its unmodified state. A substantial portion of the Guide falls into this category. The second level is the WWP’s customizations of the TEI, which provide additional elements and attributes and changed structural constraints to represent the features of early printed books that we believe are most important for scholarly research. These customizations are included in the WWP schema that is published with the Guide. The third level is more hypothetical, and includes the potential customizations that readers of the Guide might develop on their own (either modifying the TEI or further modifying the WWP’s schemas). We discuss these customizations when describing cases in which other projects might follow the WWP’s encoding rationales while working on different types of documents, and hence might need a somewhat different encoding approach to suit the specific materials. Projects following these rationales to produce new encoding choices are encouraged to create their own custom schemas that reflect these decisions, and the Guide provides advice on how to do so.
Changes and omissions
Some aspects of our original plan for the development of the Guide have changed during the course of the project. The most significant change is in the role of the advisory board. As originally conceived, this project involved an advisory board who were intended to act as reviewers and also as participants in our encoding research: we had imagined that we would involve them closely in discussions of specific encoding issues. We had also planned to have each advisor contribute an essay on a specific topic, and in compensation for this work we had budgeted honoraria for the advisors. In the event, their role was much less significant. Our expectations of the time commitment they could make were probably unrealistic, and in addition several of the advisors came from professional contexts (such as digital library programs) that made them skeptical of the value of detailed encoding, though they endorsed the general aims of the Guide; in response to questions about how best to represent a certain feature, they tended to suggest that such a feature was best left unrepresented. We were reluctant to disregard this advice entirely, but our view of the function of the Guide was that it should describe the encoding of any feature we could anticipate a project wanting to represent, and leave decisions about omission up to the individual project. Overall, therefore, our discussions with the advisors were not very useful in the concrete development of the Guide, and we ended up not cultivating these discussions as actively as we had planned. In addition, because of the extended duration of the grant, job and personal changes caused several of the advisors to diminish their involvement. We probably should have simply replaced them, but our sense of the need for their support had also diminished; in the event, the development of the Guide went fairly smoothly and the WWP staff were able to accomplish the work necessary.
The funding intended for the advisors’ honoraria and travel to meetings was reallocated to additional WWP staff time.
The other change from what was originally proposed concerns the case studies, which we had imagined as a set of comparative studies of samples from different encoding projects. We contacted a number of projects and began the process of assembling samples, but this aspect of the work was given a lower priority than the development of the Guide itself, and it was not completed by the end of the project. Other publications now exist that serve a similar function (for instance, “TEI by Example”, http://www.kantl.be/ctb/project/2006/tei-ex.htm), so the Guide’s omission of these is somewhat less of a loss than it might otherwise have been. However, we still feel this would be a valuable resource, and we plan to continue developing a set of case studies as time permits for eventual inclusion in the Guide.
Audience and Dissemination
The audience for the Guide includes a number of different kinds of readers whose use of the Guide will vary. The most significant audiences are as follows:
- those developing new text encoding projects in the same general area as the WWP (early printed books, especially in English), for whom the Guide may serve first as an entry point to the TEI and then as a basis for their encoding practice (used in its entirety or with local modifications)
- those developing new text encoding projects in other areas, for whom the Guide may serve as an entry point to the TEI and then as a source of information and rationales for encoding decisions as they develop their encoding practice
- those involved in established encoding projects who want to add a more complex layer of encoding to their current practice
- individual scholars interested in starting a text encoding project, or participating actively in one, for whom the TEI Guidelines may represent too steep a learning curve
One challenge for the publication of the Guide will be ensuring that it reaches these audiences in a way that makes its potential usefulness clear. We are publicizing the Guide in several ways:
- through the WWP web site: the Guide has been announced on the main WWP home page and is also linked permanently from the WWP web site at http://www.wwp.brown.edu/encoding/index.html. This mode of dissemination will reach those who come to the WWP web site to use our other documentation and also to some extent those who come to use Women Writers Online and look for information on how the WWO collection is created and edited.
- through the WWP’s workshops, seminars, and consultation: the Guide has already been circulated to participants in the WWP’s current NEH-funded text encoding seminars and is being used as part of the training offered at those events. This mode of dissemination will reach scholars and also people working on digital projects in libraries and digital humanities centers.
- through the TEI Consortium: the Guide has been announced on the TEI discussion list, TEI-L, with an invitation for commentary and suggestions for further development. We will announce the Guide again on TEI-L once P5 is completely stable and our own coverage of P5 has been updated to match. In addition, we anticipate that the Guide will be listed among the training and documentation resources on the TEI web site (the TEI site is now being overhauled and this listing will become active in the fall when the new site is made public). This mode of dissemination will reach people who are seeking to learn more about the TEI and how to use it.
- through humanities conferences and professional organizations: over the longer term we plan to contact organizations like the MLA (through its Committee on Information Technology) or the STS to alert them to the existence of the Guide and encourage them to list it among technology resources that may be helpful to their members.
Evaluating the Guide or its impact on its intended audience in any substantive way is somewhat difficult, particularly in the short term. Because it is a large and complex resource whose influence is by its nature gradual and advisory, we do not anticipate an immediate shift in practice resulting from its use. However, one more immediate metric of success would be readers’ assessment of the Guide’s accessibility, clarity, organization, and relevance to their work. To assess this success, and to help us in the future expansion and improvement of the Guide, we have designed a brief survey linked from the Guide which solicits feedback from readers.
Continuation of the project
As already suggested, the Guide will need to be updated fairly regularly in order to stay current and useful, because the TEI Guidelines themselves are in the process of ongoing development and because scholarly ideas about the role of text encoding are also evolving. In summer 2007 the WWP will be hiring a Textbase Editor who will have responsibility for developing and maintaining the WWP’s internal and external documentation, including the Guide. The results of the WWP’s further encoding research will be added to the Guide, and we will also develop areas of the Guide that were not completed during the grant, such as case studies. When P5 is stable and the WWP has migrated its collection and schema to it, we will also include a P5 version of the WWP schema, matching the P4 version that is currently available. Finally, we will add more templates and sample texts. These developments will be guided by the feedback we receive on the Guide and in particular by the use we make of the Guide in our workshops and seminars.
The WWP’s Guide to Scholarly Text Encoding is an attempt to reach—and to some extent to create—a new constituency of text encoding practitioners. Historically, the TEI has been most intensively used by computing humanists and scholars with a comparatively technical bent, and by project teams that include some fairly substantial dimension of technical expertise. The actual work of encoding has itself been seen as a technical task, at least partly because the tools (SGML editors, parsers, publication systems) necessary for this activity have tended to be borrowed from computer science and from industry. This pattern is now in the process of changing. Increasingly, as “digital humanities” gains traction and “humanities computing” recedes in emphasis, humanities scholars with a cultural rather than a technical interest in digital materials are becoming concerned with text encoding as a central methodology of modern scholarship and editorial practice. These scholars approach the TEI from a different perspective and with a different set of skills and concerns, and the kind of documentation and guidance they need is not so much technical as conceptual: an introduction to text encoding as a representational system and a mode of scholarship.
By providing such an introduction, the WWP’s Guide has the potential for long-term impact in two important ways. First, by reducing the barriers to entry for scholars wishing to learn TEI text encoding, it may increase the number of scholarly digital projects that are undertaken in TEI rather than in HTML, and may also strengthen the applicant pool for grant funding. Second, by providing detailed guidance in the areas of encoding that are trickiest and most variable, it may enable projects to undertake a more detailed encoding than they would otherwise feel confident doing, and it may also improve the quality and consistency of both their encoding decisions and their encoding process. The resulting projects should be more useful to scholars, somewhat more interoperable, and more competitive for funding.
The most substantial product of this grant is the Guide to Scholarly Text Encoding, published online through the Women Writers Project at Brown University. The Guide is written in TEI and published under a Creative Commons license.
The Guide is organized into four sections: a section on managing a text encoding project, a section on basic concepts of text encoding, a reference section, and a technical section with a set of supporting materials such as schemas and templates. A glossary provides explanations of technical concepts and terms. Each of these sections is described in more detail below. Taken as a whole, the Guide is intended to be read both as a sequence (conceptual introduction, detailed information on text encoding, and the final practical how-tos) and as a reference work (through the search interface and various indexes). The text of the Guide is authored and managed as a set of independent chunks, whose metadata allows them to be rearranged and searched in both narrative and reference fashion.
Project Strategy and Workflow
The opening section provides an overview of how a text encoding project actually works and what the basic necessary processes are, including project and document analysis, transcription, markup, error correction, and post-processing. It provides a detailed explanation of each of these stages and information about the options available at each one. These are essential concepts which are not self-evident to anyone who has not worked on a text encoding project before, and which scholars need to be aware of in order to make effective decisions and reasonable cost estimates. This section of the Guide will be further expanded over time, but even in its current form it provides a kind of practical guidance for scholars that is currently available nowhere else.
Text Encoding Concepts
The second section provides an overview of basic concepts in text encoding, with the goal of introducing readers to the kinds of representational capabilities XML markup provides, independent of the specifics of the TEI encoding language. This section covers topics like classification, rendition, alignment of parallel structures, linking, and metadata. These concepts are fundamental to digital text representation, and while they reflect larger intellectual structures that are also present in print, their digital instantiation requires some additional care and a specific understanding of how digital texts work. This section is also likely to be expanded over time.
The third section, the reference section, represents the bulk of the Guide’s content, and consists of an extensive set of detailed entries on specific encoding topics, such as the encoding of verse structures, cast lists, and catchwords. These entries provide the background information necessary to understand the encoding, a sketch of the available options where appropriate, a description of the recommended practice, and a rationale for the decision so that projects can make rational adaptations to accommodate local needs. In cases where the WWP has created a TEI customization to provide for a better encoding of specific textual features, we provide information on both the normal TEI solution and the WWP’s recommended practice, with reference to any modifications to the WWP schema (which is also published with the Guide). In cases where P4 and P5 differ, the differences are also described.
All of the encoding entries include topical keywords for searchability, and the interface to this portion of the Guide allows the reader to approach the content in a variety of ways. The entries appear in a default order that represents a rough developmental narrative (from the easier and more basic concepts to the more difficult and abstruse). In addition, the entries are grouped by topic, and introductory entries on each topic provide an overview and links to the more detailed entries on specific issues and problems. Technical terms are linked to the glossary.
Technical Advice and Magic
The final section provides a more technical context and some basic materials to help novices get started. This section includes a set of schemas (and the TEI source files from which they were generated, so that they can be further modified if desired), as follows:
- a plain TEI P4 schema that includes the modules needed for the encoding of early printed books: linking, transcription of primary sources, simple analytic mechanisms, critical editions, names and dates
- the WWP’s P4 schema that includes the modules named above, plus the WWP’s customizations which add further special-purpose elements and provide changes to the schema constraints to accommodate the needs of early printed books
- a TEI P5 schema that includes the modules needed for the encoding of early printed books
When the WWP has migrated to P5, our new schema customization will also be included in the Guide.
The technical section also includes a set of encoding templates for specific genres. These provide an initial framework of markup to help readers understand how each genre is represented, and can also serve as a starting point for developing project-specific templates.