In creating TEI-encoded texts, there exist all of the same opportunities for error that plague any kind of publication. There will inevitably be challenges in maintaining the consistency and accuracy of both the transcription and the encoding process, and these challenges increase with the number of people involved and also with the complexity and detail of the encoding. However, because the encoding can be read and examined with XML-aware software tools, errors and inconsistencies in the encoding can be discovered and sometimes even fixed automatically. There are also various techniques (some obvious and familiar, others less so) for catching errors of transcription. We discuss below some methods that have been used by the WWP and other text encoding projects, which may prove useful.
Both hand transcription and OCR produce texts that are likely to contain typographical errors, no matter how much care is taken to avoid them. Probably the most accurate method of text capture is double keying by a vendor, in which the text is typed in twice and the two results are compared, revealing any typographical errors; error rates from this process can be as low as one character error in 20 pages of text (an accuracy rate of 99.995%). However, even with this method you will need to be prepared to check the output you receive to make sure it meets the specified levels of accuracy. And with any other transcription method, proofreading will be an essential part of your workflow.
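Double keying works because two typists are unlikely to make the same mistake in the same place, so any disagreement between the two versions marks a likely typo. The comparison step can be sketched as a simple character-level diff; this is an illustrative sketch only (the function name and sample text are our own, not part of any vendor's actual process):

```python
import difflib

def compare_keyings(version_a: str, version_b: str) -> list[str]:
    """Report character-level discrepancies between two independently
    keyed transcriptions of the same passage (illustrative sketch)."""
    discrepancies = []
    matcher = difflib.SequenceMatcher(None, version_a, version_b)
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":
            # Each non-matching span is a place where the two keyings differ,
            # and therefore a place where at least one typist erred.
            discrepancies.append(
                f"{tag}: {version_a[a1:a2]!r} vs {version_b[b1:b2]!r}"
            )
    return discrepancies

# A disagreement flags a likely typo in one of the two keyings.
print(compare_keyings("The quick brown fox", "The qu1ck brown fox"))
```

In practice the vendor resolves each flagged disagreement against the source page, which is why the combined error rate is so much lower than that of a single typist.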
Proofreading an XML-encoded document is slightly more complex than proofreading an ordinary transcription, because some of the information you are checking may be captured in the encoding itself rather than in the content. For instance, if you are representing typographical errors in the original by encoding both the error and a corrected reading, you need to ascertain that both are correct. This means that whatever method you use to display or print the text for proofreading will need to provide access to both readings in some manner. Similarly, any information about presentation (such as font shifts, indentation, and so forth) will need to be made available for proofreading. It is not difficult to design a proofreading output (either printable or viewable online) that will show all the information you need in a meaningful way, but it may take some thought and also some training of your proofreaders to make sure that they understand what they are supposed to be looking for. You can also proofread the XML-encoded file itself (again, by printing it out or viewing it online), and this can be very helpful as a way of identifying errors in the encoding, but it may be harder to catch typographical errors in the transcription using such a view, because the presence of the markup may pose a distraction. We recommend two proofreading passes: one in which the XML itself is proofread, and a second pass using some form of formatted output to allow the proofreader to catch any errors that slipped through the first pass. In both cases, the proofreader would be comparing the text against the source copy, line by line. This is particularly important if your transcription captures old-style spellings or errors in the original, which cannot be checked without reference to the source.
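As one illustration of a proofreading output that provides access to both readings: if errors in the original are encoded TEI-style with sic and corr inside a choice element, a script can flatten each passage twice, once per reading, so proofreaders can check each view against the source. The following is a simplified Python sketch under our own assumptions (TEI namespaces omitted; the sample markup is hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment: the source reads "phisick",
# and the encoder has supplied a corrected reading as well.
SAMPLE = ("<p>She studied <choice><sic>phisick</sic>"
          "<corr>physick</corr></choice> at home.</p>")

def reading(xml_text: str, drop: str) -> str:
    """Produce a plain-text proofreading view in which every <choice>
    keeps only one of its readings (simplified: no TEI namespace)."""
    root = ET.fromstring(xml_text)
    for choice in root.iter("choice"):
        for el in list(choice):
            if el.tag == drop:
                choice.remove(el)
    return "".join(root.itertext())

# One proofreading view per reading: check each against the source page.
print(reading(SAMPLE, drop="corr"))  # view showing the original spelling
print(reading(SAMPLE, drop="sic"))   # view showing the corrected reading
```

A real proofreading output would of course be formatted rather than plain text, but the principle is the same: no encoded reading should be invisible to the proofreader.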
The simplest kinds of encoding errors are errors of invalidity and ill-formedness: cases where the XML markup is simply broken. The easiest way to deal with such errors is to prevent them before they occur. A good XML editor (some are described in the section on transcription and markup) will ensure that your XML is well-formed, and will read your schema (the rules that define the structure of your documents) and know which elements are valid. Software of this sort will help constrain your encoding so that invalid markup is identified immediately (or, better still, is prevented before it happens) and can be fixed promptly. It can also help encoders identify the correct element to use, and can thus prevent common mistakes. Creating many kinds of XML errors under these circumstances will actually take some ingenuity, or else gross inattention and negligence. As a final safeguard, it is essential to check the validity of every file as part of your regular error-checking process.
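Such a safeguard can be as simple as a script that attempts to parse each document and reports any that fail. Note that the sketch below checks well-formedness only; validating against your schema requires a schema-aware tool such as a validating XML editor or a validating library, and the function name here is our own:

```python
import xml.etree.ElementTree as ET

def well_formedness_error(xml_text: str):
    """Return None if the document parses cleanly, or the parser's error
    message if it does not. (Well-formedness only; schema validation
    requires a separate, schema-aware tool.)"""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)

# A routine check might read every file in the project and report failures.
print(well_formedness_error("<div><p>fine</p></div>"))  # no error
print(well_formedness_error("<div><p>broken</div>"))    # mismatched tag
```

Run over every file at regular intervals (for example, before each backup or release), a check like this catches broken markup before it propagates.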
If you are using good XML editing software to support your transcription, the simple kinds of encoding errors described above will be blessedly rare. However, XML software can only prevent errors involving violations of the rules established in your schema. Encoding errors that involve using the wrong element—as long as that element does not violate those rules—will not be caught. Because humanities texts are so complex, the TEI schema is necessarily complex as well, and there are typically a fairly large number of elements that are valid in any given context. If an encoder encodes a passage of text as verse rather than prose, or encodes a word as a place name rather than a personal name, the XML software has no way of noticing the error. For this reason, it’s important to have additional mechanisms in place for checking the encoding. These mechanisms can take several forms.
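One such mechanism is a set of project-specific rules, in the spirit of Schematron, that check constraints the schema cannot express. The sketch below is purely illustrative: the authority list and both rules are invented for the example, and a real project would maintain such rules in a dedicated rule language or checking tool:

```python
import xml.etree.ElementTree as ET

# Hypothetical project authority list of known personal names.
KNOWN_PERSONS = {"Katherine Philips", "Margaret Cavendish"}

def custom_checks(xml_text: str) -> list[str]:
    """Run Schematron-style checks that go beyond schema validity
    (simplified: no TEI namespace; rules are invented examples)."""
    root = ET.fromstring(xml_text)
    problems = []
    # Rule 1: every page break should carry a page number,
    # even though the schema may allow <pb> without one.
    for pb in root.iter("pb"):
        if "n" not in pb.attrib:
            problems.append("pb element missing its n attribute")
    # Rule 2: personal names should match the project authority list,
    # catching, e.g., a place name encoded as a personal name.
    for name in root.iter("persName"):
        if (name.text or "").strip() not in KNOWN_PERSONS:
            problems.append(f"unrecognized persName: {name.text!r}")
    return problems

doc = "<div><pb/><p>A letter from <persName>London</persName>.</p></div>"
print(custom_checks(doc))
```

Checks of this kind cannot prove the encoding correct, but they surface exactly the class of plausible-but-wrong markup that schema validation must let through.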