In their initial stages, text encoding projects are like a dream world full of possibility. It is tempting, and often strategically necessary, to produce an initial project plan which is generous in scope and which demonstrates all the potential of the textual material. A plan of this sort can be useful in convincing your administration that the project is worth supporting, or in showing colleagues the long-term value of the work you are imagining. Project analysis is a reality check, in which you confront your specific constraints of time and funding so that you can choose, out of all possible avenues, the one that makes the most practical sense.
While the following set of questions is not comprehensive, it will suggest the kinds of issues that need to be considered, and how they may affect your encoding decisions.
What is the likely duration of your project: does it have a definite completion date, or is it an open-ended project that will continue to grow over time? Projects with a short and determinate life cycle need to plan their encoding carefully so that it can be completed in the time allotted, and so that it produces a coherent (even if limited) representation of the text. This may mean being deliberately cautious in deciding how much information to capture: rather than encoding from the start in a great deal of detail, it may be better to capture all of your texts at a simple level, and then add further information in a second stage.
Longer-term projects may be able to take a more ambitious or exploratory approach. If you know from the start what your goals for the encoded collection are, you may be able to identify with some confidence the textual features you need to capture and represent. Your encoding scheme can be designed to achieve these results.
Projects of longer duration may also have the opportunity to hire encoders (whether students or staff) for longer periods of time, allowing those encoders to develop greater expertise and thus making it possible to do more sophisticated encoding with greater consistency. In a project that will last a year or less, the time available for training may be quite limited, which in turn may mandate a simpler set of encoding choices.
What is your project’s reason for encoding these textual materials? Are you creating scholarly editions? Digitizing materials from a local archive? Creating a pedagogical resource? The type of project will influence the design of your encoding scheme, partly by suggesting specific encoding that will be necessary (for instance, some representation of textual variants for a scholarly edition; some representation of material document detail for archival materials) and partly by contributing to the rationale for your encoding system as a whole. That rationale might be quite simple: for instance, create a digital transcription to enable difficult handwritten documents to be read by the general public. Or it might be more complex: for instance, create a digital edition that represents the relationships between the multiple printed and manuscript sources for Text X, showing press variants, authorial revisions, and material deleted by censors, while also representing the modern editor’s analysis of the text, resulting in a clean reading copy with critical apparatus. By articulating why you are encoding the text in the first place, a rationale in these terms helps you settle individual encoding questions: for each one, you can ask which approach best supports your overall project goals.
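As a rough illustration of what "some representation of textual variants" can look like in practice, the following Python sketch parses a one-line sample and reconstructs the text of each source. It assumes TEI-style `app`, `lem`, and `rdg` elements with a `wit` attribute; the sample line, the witness labels `#A` and `#B`, and the helper `text_of_witness` are hypothetical, and the TEI namespace that a real file would carry is omitted to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: witness A reads "quiet", witness B reads "silent".
SAMPLE = (
    '<l>The <app><lem wit="#A">quiet</lem><rdg wit="#B">silent</rdg></app>'
    ' night</l>'
)

def text_of_witness(element, witness):
    """Flatten an element to plain text, following one witness through each <app>."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == "app":
            # Take the first reading whose wit attribute lists this witness.
            for reading in child:
                if witness in reading.get("wit", "").split():
                    parts.append(text_of_witness(reading, witness))
                    break
        else:
            parts.append(text_of_witness(child, witness))
        # Text following the child element belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)

line = ET.fromstring(SAMPLE)
text_a = text_of_witness(line, "#A")
text_b = text_of_witness(line, "#B")
print(text_a)
print(text_b)
```

A project whose rationale does not involve comparing sources would not need this layer of markup at all, which is the kind of decision the rationale is meant to support.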
It is also important to examine your own convictions about documents and transcription from the start, so that you can reflect these explicitly in your articulation of project goals. What is your editorial philosophy? Do you regard the original presentation of the document as important or insignificant to its meaning? Do you think that obvious errors in the source should be silently emended, or explicitly recorded? (Do you believe there is such a thing as an obvious error?) Issues like these are instantiated in the details of the markup, so to have a consistent encoding practice you need to be clear on where your project stands.
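To make the point about markup concrete, here is a minimal Python sketch of one way an apparent error can be recorded explicitly instead of being emended silently. It assumes the TEI-style `choice`, `sic`, and `corr` elements; the sample sentence and the `render` helper are hypothetical, and the TEI namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the source reads "recieve"; the editor proposes "receive".
SAMPLE = (
    "<p>I did <choice><sic>recieve</sic><corr>receive</corr></choice>"
    " your letter.</p>"
)

def render(element, prefer):
    """Flatten an element to plain text, keeping only the `prefer` child of each <choice>."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == "choice":
            chosen = child.find(prefer)
            if chosen is not None:
                parts.append(render(chosen, prefer))
        else:
            parts.append(render(child, prefer))
        # Text following the child element belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)

paragraph = ET.fromstring(SAMPLE)
diplomatic = render(paragraph, "sic")   # the text as it stands in the source
emended = render(paragraph, "corr")     # the text with the editor's correction
print(diplomatic)
print(emended)
```

Because both readings are stored, the choice between them is deferred to display time. A project that emends silently would keep only the corrected word, and the original reading could not be recovered from the encoding.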
What audience(s) are you planning to serve? Who are your readers? What is their educational background and level of subject specialization? What disciplines do they come from? Is the audience quite uniform, or will different groups expect different kinds of information or need different kinds of support? While there are many textual features that are needed no matter what the audience, there are some that may be very discipline-specific or may depend very much on the audience’s level of expertise.
Who will be doing the encoding: staff, students, faculty, volunteers? How long will you have them, and how much turnover will there be?
The duration of your encoders’ time at the project will have an enormous impact on how much training you can provide, and how much expertise and consistency you can expect. This in turn will affect how much provision you need to make for subsequent quality control and review processes. In general, hiring encoders for at least a year (and preferably more) gives you the opportunity to train them well and allows them to be productive and see some progress before they leave.
The type of staff you use (students, faculty, library staff, etc.) will also determine the kinds of information you can easily represent in your encoding. Subject experts may be more easily able to identify specialized document features, particularly in older documents, although careful training can also give other encoders this knowledge. To encode topic keywords or to do authoritative identification of names, events, and other contextual information, you will probably need to involve subject experts, either to perform the encoding of this information or to work closely with your encoders. Our experience suggests that undergraduate and graduate students can do a good job on this kind of work if they have guidance and training from the start.
What is this collection for? What activities do you want it to support? How do you envision readers using it? Is it chiefly intended as a way of providing access to rare materials, so that your goal is simply to enable readers to see the text in a legible form? Or are you trying to enable readers to perform more advanced kinds of analysis? Do you want to be able to display the text in alternate ways (for instance, with or without modernization), or do you want to present it in a single consistent manner?