Measures and numbers

Encoding of numbers and measurements using measure and num

In projects where measurements or numbers are an important aspect of the text or its research use, encoding them explicitly can make that information more accessible for processing. It does so either by providing a regularized value that can be used for calculation and comparison, or by signalling what type of measurement is being represented, which allows for more specific searching. Cases that lend themselves particularly well to this kind of encoding include currency, units of land area, units of volume (for instance, in documents describing medical or culinary measurement), and units of length (in describing geographical distances). In all of these cases, explicit encoding of measurement will enable you to make comparisons between documents and to extract information such as the length of a route travelled or the total value of property in a will. If this information is encoded only in a single, isolated document, its value may be limited, but if an entire collection (for instance, a set of wills and deeds from an entire town) is being captured, the added value could be considerable.

The TEI provides a measure element, with a type attribute that identifies the unit of measurement and a reg attribute to encode a regularized value. Although the TEI does not specify values for the type attribute, the Guidelines suggest general values indicating the variety of measurement (weight, length, currency, etc.). We recommend instead using type to indicate the unit in which the original measurement is expressed, which conveys more, and more useful, information. The type attribute is particularly useful in cases where the unit is expressed idiosyncratically, or is not expressed explicitly in the text, but is known from external evidence. The reg attribute, if used, should contain both the quantity and the new unit of measurement, e.g.: <measure type="fathom" reg="12 ft">two fathomes</measure>
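As an illustration of how a regularized value might be used in processing, the following is a minimal Python sketch that collects the type and reg attributes of each measure element. It assumes un-namespaced elements and a reg value made up of a number, a space, and a unit; the sample sentence and the function names parse_reg and regularized_measures are hypothetical and not part of the TEI or of any project tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text; the <measure> encoding follows the example above.
SAMPLE = (
    '<p>The water was '
    '<measure type="fathom" reg="12 ft">two fathomes</measure>'
    ' deep.</p>'
)


def parse_reg(reg):
    """Split a reg value such as '12 ft' into a number and a unit.

    Assumes the form '<number> <unit>'; other forms raise ValueError.
    """
    quantity, unit = reg.split(None, 1)
    return float(quantity), unit


def regularized_measures(xml_text):
    """Return (original unit, regularized quantity, regularized unit) tuples."""
    root = ET.fromstring(xml_text)
    results = []
    for measure in root.iter("measure"):
        reg = measure.get("reg")
        if reg is None:
            # No regularized value was encoded for this measurement.
            continue
        quantity, unit = parse_reg(reg)
        results.append((measure.get("type"), quantity, unit))
    return results


measures = regularized_measures(SAMPLE)
```

Because every reg value carries its own unit, the quantities gathered this way can be compared or totalled across documents even when the original texts use different or idiosyncratic units.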

Where calculation based on the quantities involved is especially important, and the numerical portion of the measurement is expressed in an idiosyncratic form, it may be useful to regularize that portion with the num element, e.g.: <measure type="fathom" reg="480 ft"><num value="80">four score</num> fathomes</measure>
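To show how the value attribute on num could support calculation, here is a minimal Python sketch that multiplies the regularized number by a conversion factor chosen from the type attribute. The FEET_PER_UNIT table and the function name quantity_in_feet are hypothetical, and the sketch assumes un-namespaced elements; a real project would supply its own conversion table.

```python
import xml.etree.ElementTree as ET

# The encoded measurement from the example above.
SAMPLE = (
    '<measure type="fathom" reg="480 ft">'
    '<num value="80">four score</num> fathomes'
    '</measure>'
)

# Hypothetical conversion table: feet per original unit (1 fathom = 6 feet).
FEET_PER_UNIT = {"fathom": 6}


def quantity_in_feet(measure):
    """Compute a length in feet from num/@value and measure/@type.

    Returns None when the number or the unit cannot be resolved.
    """
    num = measure.find("num")
    if num is None or num.get("value") is None:
        return None
    unit = measure.get("type")
    if unit not in FEET_PER_UNIT:
        return None
    return float(num.get("value")) * FEET_PER_UNIT[unit]


measure = ET.fromstring(SAMPLE)
feet = quantity_in_feet(measure)
```

For the sample, the computed figure agrees with the quantity recorded in reg, so a check of this kind could also be used to spot inconsistencies between num and reg in an encoded collection.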

Another, much more prosaic reason to encode measurements is that occasionally they are printed in a different font or size. Using measure may be preferable to hi, simply because it is more informative, even if no unit or value information is captured.


Example 1.

Price <measure>1 shilling</measure>

Example 2.

Price <measure>1 s.</measure>

Note that the abbreviated forms of monetary values such as “s.” for “shilling” should not be tagged with abbr, since they are on the list of frequently occurring abbreviations which we only tag if renditionally distinct. In the cases listed above, the measure element is sufficient to account for the rendition, and hence the abbr element is not necessary.

Example 3.

<measure>2 copies</measure>

Example 4.

Sugar, <measure>3 lbs.</measure>