Introduction to XML

Syd Bauman

2011-02-18

Markup Language

By markup language I mean a vocabulary (i.e., the set of elements that have meaning) and a grammar (i.e., how those elements relate to one another).

Extensible

methods of defining markup language

syntax for expressing markup language

Boxes in Boxes Representation

Classic boxes-inside-boxes representation of a mythical book that contains an introduction, two chapters, and an index, where each chapter contains a heading and two sections

Tree Representation

Classic tree representation of a mythical book that contains an introduction, two chapters, and an index, where each chapter contains a heading and two sections
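The tree described above can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree. The element names (book, introduction, chapter, heading, section, index) are assumptions chosen for illustration; the slide does not say what the elements are called.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, chosen only to mirror the figure description:
# a book with an introduction, two chapters, and an index, where each
# chapter holds a heading and two sections.
BOOK = """
<book>
  <introduction/>
  <chapter>
    <heading/>
    <section/>
    <section/>
  </chapter>
  <chapter>
    <heading/>
    <section/>
    <section/>
  </chapter>
  <index/>
</book>
"""


def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(BOOK)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one level of the tree, or equivalently to one box nested inside another.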

Why XML?

Have to admit that when I say XML is easy, I am really referring to XML alone. In order to really use the XML universe you need a lot more. When people say "XML is hard", they usually do not mean "XML 1.0 is hard" but "XML 1.0 + namespaces in XML + XPath + DOM + XSLT + W3C XML Schema + XML Base + xml:id + XInclude + XPointer + ... is hard" and the proportion of criticism that goes to XML 1.0 itself is usually pretty low.

(Eric van der Vlist on the xml-dev mailing list, Tuesday, 12 Feb 2008 08:28:05)

Sample (simplified) Tree

Classic simplified tree diagram of the limerick Warp Speed, Ms Bright! that uses a dashed curved arrow to represent the link from the note to the term, and a dotted arrow to indicate the link from the ptr to the Wikipedia article. It is simplified because it does not contain attribute (or text) nodes, and is missing an l

boxes-in-boxes representation

Boxes-inside-boxes representation of a mythical marked-up document that contains two root book elements, each of which has a title and two chapters.

tree representation

Tree representation of a mythical marked-up document that contains two root book elements, each of which has a title and two chapters.
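XML 1.0 requires exactly one root element, so a document with two top-level book elements is not well-formed. The sketch below checks this with Python's standard-library parser; the element names and the wrapping library element are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two top-level book elements, each with a title
# and two chapters, as in the figure description.
ONE_BOOK = "<book><title/><chapter/><chapter/></book>"
TWO_ROOTS = ONE_BOOK + ONE_BOOK


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Two sibling roots are rejected by the parser.
two_roots_ok = is_well_formed(TWO_ROOTS)

# Wrapping both books in a single (hypothetical) library element
# restores the single-rooted tree.
wrapped_ok = is_well_formed("<library>" + TWO_ROOTS + "</library>")

print(two_roots_ok, wrapped_ok)
```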

boxes representation

Chinese doll (NOT!) representation of overlap
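Overlap cannot be written directly in XML, because an element must close inside the element in which it opened. As a minimal sketch, assuming hypothetical l (line) and s (sentence) elements, the parser rejects a sentence that starts in one line and ends in the next:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the sentence opens inside the first line but
# closes inside the second, so the two elements overlap.
OVERLAP = "<poem><l><s>first line</l><l>second line</s></l></poem>"

# The same text with the sentence wrapping both lines nests properly.
NESTED = "<poem><s><l>first line</l><l>second line</l></s></poem>"


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


overlap_ok = is_well_formed(OVERLAP)
nested_ok = is_well_formed(NESTED)
print(overlap_ok, nested_ok)
```

The nested version only works here because the sentence happens to contain whole lines; once a sentence boundary falls in the middle of a line, no single nesting satisfies both hierarchies.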

one solution

Of course, software that comes along and counts how many sentences you have in your poem elements will now need to be smart enough not to count this as 4 sentences.

There is an example of how to do this in my stylesheet to count metrical lines that is on the TEI wiki.

Software will also need to know that the sentence is not "... down the walkHe did not ...": a space has to be supplied where the fragments are joined.
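A minimal sketch of such fragment-aware software follows. It assumes one particular fragmentation convention that the slide does not spell out: each piece of a split sentence is its own s element carrying a part attribute, with "I" for the initial piece, "M" for a medial piece, and "F" for the final piece. The poem text is invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented text and an assumed part="I"/"M"/"F" convention: two
# sentences, each split across two lines, giving four s elements.
POEM = """
<poem>
  <l><s part="I">The cat went down the walk</s></l>
  <l><s part="F">and did not look back.</s></l>
  <l><s part="I">Nobody saw it</s></l>
  <l><s part="F">leave.</s></l>
</poem>
"""


def sentences(root):
    """Rejoin fragmented s elements into whole sentences."""
    result = []
    for s in root.iter("s"):
        text = "".join(s.itertext()).strip()
        if s.get("part") in ("M", "F") and result:
            # Continuation of the previous sentence: join with a space
            # so the words on either side of the line break stay apart.
            result[-1] = result[-1] + " " + text
        else:
            # An unfragmented sentence or the initial piece of one.
            result.append(text)
    return result


root = ET.fromstring(POEM)
s_elements = len(root.findall(".//s"))
whole = sentences(root)
print(s_elements, len(whole))
print(whole)
```

Counting s elements gives 4, while counting rejoined sentences gives 2, which is the distinction the notes above are warning about.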

Some XML pluses and minuses

Because elements are always nested inside each other, XML can be thought of as representing a boxes inside boxes model of text, or a tree structure.

some advantages

some disadvantages