Encoding brevigraphs

Keywords: diacritical mark, macron, brevigraph, abbreviation, superscript

Brevigraphs are essentially contractions: single characters or character combinations which represent multiple characters or, in some cases, entire words. They have two distinctive features which require special treatment. First, they may not be representable using Unicode: they may be private conventions invented by the author (particularly in manuscript documents) or they may simply be symbols which are too rare or arcane to have been included. Second, they may have multiple expansions, depending on context. For instance, a macron or breve over a vowel in early English texts may represent either an omitted m or an omitted n.

There are thus two functions the encoding needs to perform in these cases: it needs to represent the character that appears in the text, and it also needs to represent the expanded value or meaning of the character.

For characters that are represented in Unicode, you can simply use the Unicode code point for that character. Some characters may require two Unicode code points: for instance, certain consonants with diacritics which are rare enough in combination that they have no precomposed Unicode character of their own, and must be encoded as a base letter followed by a combining diacritic. For characters that are not represented in Unicode, you can define an entity which resolves to an image of the character in question. Even for Unicode characters, you may find it more convenient to use an entity reference which resolves to the appropriate Unicode character, since this may be more mnemonic and easier for encoders to use: &omacr; instead of &#x014D;.
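
As a minimal sketch of the entity approach for Unicode characters, the declarations below show one precomposed character and one that needs two code points. The entity name omacr matches the one used in this guide's examples. The name mmacr, the document type name, and the DTD file name are illustrative assumptions, not fixed conventions, and should be adapted to your own project.

```xml
<!-- Hypothetical document type declaration; substitute your project's own DTD -->
<!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
  <!-- Precomposed character: o with macron is a single code point, U+014D -->
  <!ENTITY omacr "&#x014D;">
  <!-- No precomposed form: the letter m followed by U+0304 COMBINING MACRON -->
  <!ENTITY mmacr "m&#x0304;">
]>
```

With declarations like these in place, encoders can type &omacr; or &mmacr; in the text without needing to remember the numeric code points.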

To represent the expanded value, we recommend using the <abbr> element with its expan attribute. This treats the brevigraph as an abbreviation rather than as a kind of old-style typography; we feel that <abbr> represents its nature more exactly. However, if in general you do not expand abbreviations, or if for other reasons it makes more sense to treat brevigraphs in the same way you treat old-style usage of i/j and u/v, then using the <orig> element with its reg attribute may be the better choice.


wh<abbr expan="om">&omacr;</abbr>
<abbr expan="that">&ysupt;</abbr>
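
If your project instead follows the alternative described above and treats brevigraphs like old-style i/j and u/v usage, the same two examples might be encoded along the following lines. This is only a sketch: it assumes that reg is carried as an attribute on orig, parallel to expan on abbr, and it reuses the entity names from the examples above.

```xml
wh<orig reg="om">&omacr;</orig>
<orig reg="that">&ysupt;</orig>
```

Either way, the element content preserves the character as it appears in the source, while the attribute value carries the expanded reading.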