Regularization: orig and reg
In addition to the silent forms of regularization discussed in Regularization: silent, there are several kinds of information which may call for more explicit treatment. In P4, the TEI provides the orig and reg elements, which mirror one another in their function:
- the orig element is used to encode an original reading in the source; it carries a reg attribute which is used to capture the regularized reading
- the reg element is used to encode a regularized reading; it carries an orig attribute which is used to capture the original reading from the source
Using orig gives primary weight to the source reading; using reg gives primary weight to the regularized reading. These elements may be used for old spellings, typography, reference formats, representation of numbers, and other textual features which it is useful to represent in a regularized form. In cases where only the regularization is desired, you can use the reg element without the orig attribute to indicate that the reading in question has been regularized (perhaps also referring the reader to your documentation for details).
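As a rough illustration of the forms described above, the following fragment shows one word encoded each way. The sample reading "loue" and its regularization "love" are invented for this example rather than taken from any particular source, and the surrounding p elements are there only to give the fragment some context.

```xml
<!-- Hypothetical sample: the source reads "loue", regularized as "love" -->

<!-- orig: the source reading is the content; the regularization is in reg -->
<p>All for <orig reg="love">loue</orig></p>

<!-- reg: the regularized reading is the content; the source reading is in orig -->
<p>All for <reg orig="loue">love</reg></p>

<!-- reg without orig: records only that the reading has been regularized -->
<p>All for <reg>love</reg></p>
```

In the first two forms both readings are recoverable; they differ only in which reading appears as element content and which is held in the attribute.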
For the kinds of archival projects to which this Guide is addressed, we recommend using orig rather than reg, since it gives emphasis to an accurate transcription of the original text as a source of historical evidence. However, we also regard the use of reg as an important way of providing valuable extra information, which can be used to improve searching and to provide additional display options for readers.
Some particular forms of regularization that may be useful for projects using this Guide:
- We recommend regularizing old-style typography, in which i/j, u/v, and w/vv are interchanged. In these cases, the original reading is encoded as the content of orig, and the regularized version is encoded as the value of the reg attribute. We tag the smallest applicable unit of regularization (usually the letter). For more details of the encoding of early typography, see Early typography and letter substitutions.
- Modernization of spelling follows the same logic as above, but because it must typically be carried out on a much larger scale, it poses special problems. Projects may choose to modernize the use of i, j, u, v, and w without also modernizing spelling in general, since the former is a smaller and more manageable task and can be (in part) automated. If you do wish to modernize spelling, you should use the orig or reg element as above; we recommend orig.
- For projects dealing with literary texts, we recommend regularizing bibliographic references to texts for which a standard reference system exists (e.g. Homer, the Bible, Virgil, etc.). Such encoding not only produces better search and analysis possibilities, but also makes it possible to link out to digital versions of these sources in the future. You can use orig with a reg attribute to encode and standardize these citations. The reg attribute should contain a citation in a standardized format, e.g. Gen_1:13. Although it would be desirable to use a widely shared format for such citations, if one can be ascertained, the consistency of your format matters more than your use of any particular existing format. It is comparatively easy to convert from one consistent format to another.
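The recommendations in this list might be encoded along the following lines. This is only a sketch: the sample words and the source form of the biblical reference are invented for illustration, and the value Gen_1:13 simply follows the citation format mentioned above rather than any established standard.

```xml
<!-- Old-style typography: tag the smallest unit, usually a single letter -->
<p>lo<orig reg="v">u</orig>e</p>
<p><orig reg="J">I</orig>ohn</p>
<p><orig reg="w">vv</orig>hich</p>

<!-- Spelling modernization: tag the whole word (hypothetical sample) -->
<p><orig reg="music">musicke</orig></p>

<!-- Bibliographic reference: source form as content, standardized citation in reg -->
<p>as it is written, <orig reg="Gen_1:13">Gen. i. 13.</orig></p>
```

Because the original reading is always the element content, a plain transcription of the source can be recovered by ignoring the reg attributes, while a regularized view can be produced by substituting each attribute value for its element's content.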