Repeated information: ibid, ditto, and other abbreviations

ditto, chorus, refrain, subscriber list, list, bibliography
abbr, sameAs

Encoding repetitions signalled with ditto, ibid, and similar markers, using the sameAs attribute

In many contexts we find techniques for abbreviating repeated information, usually through some indicator which refers the reader back to an original instance where the information is given in full. Indicators of repetition may include words like ditto, ibid, or a long dash (e.g. in bibliographies). All of these indicators are implicitly pointers which refer the reader to somewhere else to find the content of the item. They appear in numerous contexts: lists, choruses and refrains, bibliographies, footnotes, and no doubt others.

To encode this information so that both the original text (e.g. the word ditto) and its useful meaning (the information being pointed to) are preserved, we recommend making an explicit link between the repetition and the content being referenced. This link is encoded using the TEI sameAs attribute, which points to the ID of the element that contains the source content. If no element is present (at either end of the link), then seg may be used. This encoding essentially means "My content (i.e. that of the ditto or ibid) is the same as the content of the location I am pointing to." Each dittoed reference gets its own ID, regardless of whether that particular piece of information has already appeared and received an ID (since logically the ditto refers to the instance immediately preceding). See Example 2.
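
Because every sameAs value must match the ID of some element, the links can be checked mechanically. The following is a minimal sketch in Python, not part of any project toolchain: the sample list is hypothetical, the function name dangling_pointers is invented for illustration, and the attribute names id and sameas are assumed to be spelled as in the examples in this document.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of a subscriber list.
# The pointer "C2" deliberately has no matching id.
SAMPLE = """
<list type="subscriber">
  <item><persName>Mr. Joseph Atkinson</persName>
        <placeName id="KEN">Kendal</placeName></item>
  <item><persName>Mr. Isaac Atkinson</persName>
        <placeName sameas="KEN">ditto</placeName></item>
  <item><persName>W. Barrington</persName>
        <num sameas="C2">ditto</num></item>
</list>
"""


def dangling_pointers(root):
    """Return every sameas value that matches no id in the tree."""
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    return [
        el.get("sameas")
        for el in root.iter()
        if el.get("sameas") is not None and el.get("sameas") not in ids
    ]


root = ET.fromstring(SAMPLE)
missing = dangling_pointers(root)
print(missing)
```

An empty result would mean that every ditto or ibid in the sample can be traced back to its source content.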

The usefulness of this encoding is chiefly in cases where a small chunk of text may be extracted from the whole: for instance, a single bibliographic entry. The extracted chunk carries with it an indicator of where the missing information can be found. Similarly, if a bibliographical list is reordered (for instance, sorted by date instead of by author), the use of a long dash to indicate a repeated author name cannot by itself convey the name in the reordered version. With the sameAs attribute, the name can still be identified correctly.
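
The reordering case can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the bibliography entries are invented placeholders, a run of underscores stands in for the printed long dash, and resolved_text is a hypothetical helper rather than an established routine. It follows sameas pointers (spelled as in the examples in this document) until it reaches an element that carries its own content, so the author name survives a sort by date.

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliography; "______" stands in for the printed long dash.
BIBLIOGRAPHY = """
<listBibl>
  <bibl><author id="au01">Author A</author>
        <title>Second Title</title> <date>1801</date></bibl>
  <bibl><author sameas="au01">______</author>
        <title>First Title</title> <date>1795</date></bibl>
  <bibl><author id="au02">Author B</author>
        <title>Third Title</title> <date>1798</date></bibl>
</listBibl>
"""


def resolved_text(el, by_id):
    """Follow sameas pointers to the element holding the real content."""
    seen = set()
    while el.get("sameas") is not None:
        target = el.get("sameas")
        if target in seen or target not in by_id:
            break  # circular or dangling pointer: keep the printed text
        seen.add(target)
        el = by_id[target]
    return "".join(el.itertext()).strip()


root = ET.fromstring(BIBLIOGRAPHY)
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}

entries = []
for bibl in root.iter("bibl"):
    author = resolved_text(bibl.find("author"), by_id)
    entries.append((int(bibl.findtext("date")), author, bibl.findtext("title")))

entries.sort()  # sort by date instead of by author
for year, author, title in entries:
    print(year, author, title)
```

Following a chain of pointers, rather than a single hop, is a defensive choice in this sketch; it covers the case where the element pointed to is itself a repetition marker.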

In the case of choruses or refrains in poems (where all repetitions after the first may be printed as etc.), a single stanza is less likely to be extracted, but the possibility cannot be ruled out. If you are confident that you will only ever display poems in their entirety, then there is no need to encode a pointer using sameAs. Doing so nevertheless provides a useful safeguard against unforeseen future uses. See Example 2.


Example 1.

The source text might look like this:

   Mr. Joseph Atkinson, Kendal
   Mr. Isaac Atkinson, ditto
   Mr. W. Allport, Liverpool, 4 copies
   Miss C. Addison, London, 2 copies
   W. Barrington, Kendal, ditto

That list encoded would look like this:

   <list type="subscriber"><head rend="slant(italic)">Subscribers</head>
   <item><persName key="">Mr. Joseph Atkinson</persName>
         <placeName id="KEN">Kendal</placeName></item>
   <item><persName key="IAtkinson.wqi">Mr. Isaac Atkinson</persName>
         <placeName sameas="KEN">ditto</placeName></item>
   <item><persName key="WAllport.his">Mr. W. Allport</persName>
         <placeName>Liverpool</placeName>
         <num>4 copies</num></item>
   <item><persName key="CAddison.wei">Miss C. Addison</persName>
         <placeName>London</placeName>
         <num id="C2">2 copies</num></item>
   <item><persName key="WBarringt.poa">W. Barrington</persName>
         <placeName>Kendal</placeName>
         <num sameas="C2">ditto</num></item>
   </list>

Similarly, in a table of contents one might see the following:

Written on the Sea Shore.......19
     ________ visiting Arundel Castle.........20

This should be encoded as (somewhat simplified):

<item><rs><seg id="wr01">Written on</seg> the Sea Shore</rs>
     <ref type="pagenum">19</ref></item>
<item><rs><seg sameas="wr01">&sdash;</seg> visiting Arundel Castle</rs>
     <ref type="pagenum">20</ref></item>
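
To show what such an encoding makes possible, the following Python sketch rebuilds the full title of each contents entry by substituting the pointed-to text for the dash. It is an illustration under stated assumptions only: the wrapping list element is added so that the fragment parses on its own, a run of underscores replaces the &sdash; entity (which would need to be declared before a parser could read it), and expanded_title is a hypothetical helper that resolves a single hop.

```python
import xml.etree.ElementTree as ET

# The contents entries, wrapped in a list element (an assumption)
# and with underscores in place of the undeclared &sdash; entity.
TOC = """
<list>
  <item><rs><seg id="wr01">Written on</seg> the Sea Shore</rs>
        <ref type="pagenum">19</ref></item>
  <item><rs><seg sameas="wr01">________</seg> visiting Arundel Castle</rs>
        <ref type="pagenum">20</ref></item>
</list>
"""


def expanded_title(rs, by_id):
    """Return the text of rs with each sameas child replaced by its target."""
    parts = [rs.text or ""]
    for child in rs:
        target = by_id.get(child.get("sameas"))
        # Childless elements are falsy, so compare against None explicitly.
        source = target if target is not None else child
        parts.append("".join(source.itertext()))
        parts.append(child.tail or "")
    # Collapse line breaks and runs of spaces into single spaces.
    return " ".join("".join(parts).split())


root = ET.fromstring(TOC)
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
titles = [expanded_title(rs, by_id) for rs in root.iter("rs")]
print(titles)
```

The printed dash remains in the encoded text; only the derived title is expanded, so both the original reading and its meaning stay available.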