Tables of contents

Encoding of tables of contents with list inside div type="contents", with internal encoding to capture the functional parts of the table of contents information, such as page numbers and titles.

Of all the parts of the book, the table of contents is usually considered to have the least value in a digital representation, since it exists to provide a navigational function which can be better supported by a generated table of contents. Its documentary value may be served simply by transcribing its content, without encoding the links which are implied by the page numbers or section headings. This approach saves a great deal of work, and its only disadvantage is that readers may expect to be able to use it as a navigational tool, and may be surprised to find that it is not live.

If the table of contents is being transcribed solely for its documentary content, some basic structures should be observed but a detailed encoding is not necessary. The table of contents as a whole should be encoded as div type="contents". Inside the div, the table should be encoded as list type="contents". If the list is arranged in columns, the headings for the columns (usually something like Chapter and Page) are encoded with head. Any subsequent repetitions of the column headings caused by page breaks should be encoded (if at all) with fw type="listhead", since they are only present as a result of the page break and hence are similar to other forme work. It is also acceptable to omit these secondary headings entirely.

Each entry is encoded with item. Within item, the name of the individual section should be encoded with rs, and the page number should be encoded with ref type="pageNum". Unless you plan to create links to the actual pages in the text, there is no need to encode a target attribute on ref. If you do wish to encode such links, encode them with target; the value of the target attribute will be the ID of the text chunk (e.g. the div or text element) being named.
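
As a sketch of the entry encoding just described, a single entry might look like the following. The section title, the page number, and the ID value ch03 are invented for illustration; the target attribute can simply be omitted if no links are planned.

<item>
<rs>Of the Education of Daughters</rs>
<ref type="pageNum" target="ch03">27</ref>
</item>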

We recommend ignoring the dots, dashes, spaces, or other leaders that lie between the section name and the page number. The relative alignment of the elements (rs, ref, etc.) can be indicated using the rend attribute, and the leaders can be reproduced, if desired, through a stylesheet. The specific marks that bridge the space are in most cases not significant for any analytical purpose.

If there are internal subgroupings within the table of contents (for example, if the volume is a series of novels each of which has chapters), encode these as nested lists. The outermost item elements would be the novels, and within each novel-level item would be contained a nested list whose items are chapters.
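
A minimal sketch of such nesting, for a volume containing two novels, might look like this. The titles and page numbers are invented, and leaving the type attribute off the inner lists is an assumption rather than a stated rule; follow your project's schema on that point.

<list type="contents">
<item><rs>The First Novel</rs>
<list>
<item><rs>Chapter I</rs> <ref type="pageNum">1</ref></item>
<item><rs>Chapter II</rs> <ref type="pageNum">19</ref></item>
</list>
</item>
<item><rs>The Second Novel</rs>
<list>
<item><rs>Chapter I</rs> <ref type="pageNum">143</ref></item>
</list>
</item>
</list>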


<div type="contents">
<head>The heading for the table goes here.</head>
<list type="TOC">
<mw type="listhead">A subhead, if there is one for the page numbers, etc., goes here.</mw>
<!-- NB: use multiple <mw> elements for the headings of multiple columns. -->
<item>
<label>If there is an identifying number, such as a chapter number, it goes here.</label>
<rs>The title of the item, such as a chapter title, goes here.</rs>
<ref type="pagenum" target="[id of element here]">The page number goes here.</ref>
</item>