Early typography and letter substitutions

Several typographical features of early printed books complicate transcription. Early usage of i and j, and of u and v, is variable and differs from modern usage. In capitals, there was typically only one letter for what are now I and J (with several printed forms, some resembling a modern I and some a modern J), and one letter for what are now V and U (with several printed forms, some resembling a modern V and some closer to a modern U). In lower case, j is very infrequent, and the choice between u and v depends on the letter's position within a word rather than on its phonetic value. In addition, a shortage of type sometimes results in the substitution of one glyph for another (for instance, two v characters for a w, or an inverted w for an m).

To support effective word searching, and to improve legibility for non-specialist audiences, it may be desirable to provide regularizations of these readings, even if you are not undertaking a full-scale modernization of the text. This can be accomplished dynamically, using software that either maps specific spellings onto a modern equivalent, or uses fuzzy matching to propose possible equivalents. However, if such software is impractical for any reason, information about regularization can also be encoded in the text itself. This approach allows the information to be used even in very simple publication contexts: for instance, it would allow for a simple user choice between viewing regularized and unregularized versions of the text, irrespective of what publication software is used.
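The two dynamic strategies described above (a lookup of specific spellings, and fuzzy matching that proposes possible equivalents) can be illustrated with a minimal Python sketch. It is not a description of any particular regularization tool: the table KNOWN_FORMS, the word list MODERN_WORDS, and the function regularize are hypothetical names invented for this illustration, and the fuzzy step simply uses the standard library's difflib as a stand-in for whatever matching a real system might apply.

```python
import difflib

# Hypothetical lookup table: specific early spellings mapped to modern forms.
KNOWN_FORMS = {
    "vvent": "went",
    "ivdge": "judge",
    "euill": "evil",
}

# Hypothetical list of modern word forms used as fuzzy-matching targets.
MODERN_WORDS = ["went", "judge", "evil", "dead", "love", "upon"]


def regularize(word, cutoff=0.6):
    """Return a list of candidate modern forms for an early spelling.

    A spelling found in the lookup table yields exactly one candidate.
    Otherwise, fuzzy matching proposes up to three similar modern words,
    which may be an empty list if nothing is similar enough.
    """
    key = word.lower()
    if key in KNOWN_FORMS:
        return [KNOWN_FORMS[key]]
    return difflib.get_close_matches(key, MODERN_WORDS, n=3, cutoff=cutoff)


if __name__ == "__main__":
    for early in ["vvent", "ivdge", "deade"]:
        print(early, "->", regularize(early))
```

Because the fuzzy step only proposes candidates, a real system would probably still need a reader or editor to confirm them; the cutoff value here is an arbitrary assumption.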

The decision whether to encode regularized typography directly into the text is a significant one, since the encoding is detailed and may be labor-intensive. (Experimental software exists to help automate the encoding process, but it does not provide a universal solution.) If the system through which you will publish your texts may allow dynamic regularization, hand-encoding may not repay the time invested.

If you do choose to encode regularization information directly into your texts, we recommend using the orig element with a reg attribute to encode regularized readings of this sort. The source reading is transcribed as the content of the orig element, and the regularized reading is captured as the value of reg. In some contexts this encoding serves no purpose: for instance, within catchwords and in any other place where the text will be neither interpreted nor searched.


Example 1.

I vvent to ivdge the euill deade.

would be encoded as

I <orig reg="w">vv</orig>ent to <orig reg="ju">iv</orig>dge the e<orig reg="v">u</orig>ill deade.