<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="../../../_utils/schema/yaps.rnc" type="application/relax-ng-compact-syntax"?>
<?xml-model href="../../../_utils/schema/yaps.isosch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-stylesheet type="text/css" href="../../../_utils/stylesheets/yaps-tei.css"?>
<!-- $Id: publication_overview.xml 28651 2016-05-13 09:27:29Z syd $ -->
<TEI xmlns="http://www.wwp.northeastern.edu/ns/yaps">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Publishing and Transforming TEI Documents</title>
        <author xml:id="jhf">Julia Flanders</author>
        <author>Syd Bauman</author>
      </titleStmt>
      <editionStmt>
        <edition>Taking TEI Further: Publishing and Transforming TEI data, Brown
          University</edition>
      </editionStmt>
      <publicationStmt>
        <distributor>Women Writers Project (via website)</distributor>
        <address>
          <addrLine>url:mailto:wwp@neu.edu</addrLine>
        </address>
        <date when="2013-11-20"/>
        <availability status="restricted">
          <p>Copyright 2012 Syd Bauman, Julia Flanders, and the Women Writers Project</p>
          <p>This TEI-encoded XML file is available under the terms of the <ref
              target="http://creativecommons.org/licenses/by-sa/3.0/">Creative Commons
              Attribution-ShareAlike 3.0 (Unported)</ref> license.</p>
        </availability>
        <pubPlace>Providence, RI USA</pubPlace>
      </publicationStmt>
      <sourceDesc>
        <p>Very brief minimalist coverage of basic XML publishing tool chain (at a conceptual level)
          and framework.</p>
      </sourceDesc>
    </fileDesc>
    <revisionDesc>
      <change when="2012-12-04" who="#jhf">Created new from scratch but using parts of other
        publication presentations</change>
    </revisionDesc>
  </teiHeader>
  <text>
    <presentation>
      <abstract>
        <p>This tutorial provides an overview of XML publication platforms and outlines the basic
          framework for the rest of the tutorials in Transformation and Publication. Covered here is
          why one might want to publish TEI data, and how one might go about publishing it.</p>
      </abstract>

      <section>
        <head>
          <q>XML workflow</q>
        </head>
        <slide>
          <figure>
            <graphic height="400px" url="../../../_utils/gfx/workflow_formats.png"/>
          </figure>
        </slide>
        <lectureNote>
          <p>No one needs to convince us of the importance of the overall topic here: "transforming
            and publishing TEI". It's why we create TEI data. However, we may need to do some
            preliminary clarification and scoping to get a full sense of what we mean, of what the
            possibilities are and what kinds of "publishing and transforming" they entail. </p>
          <p>One way to orient ourselves in the landscape of "publishing and transforming" is to
            think about how we use our own data. If we think of the life cycle of a TEI project,
            there are numerous places along the timeline where we want to express different views of
            our data, for internal or external viewing: <list>
              <item>proofreading and error catching</item>
              <item>formats that extract specific structures to let us catch inconsistencies in the
                encoding</item>
              <item>web publication</item>
              <item>print publication</item>
              <item>formats for contribution to collaborative projects (where we might want to
                simplify or alter our markup to match the target encoding of those projects)</item>
              <item>metadata formats (e.g. to expose to metadata harvesters)</item>
              <item>archival formats for committing to a repository</item>
            </list>
          </p>
        </lectureNote>
        <tutorial>
          <p>Usually when we think about <q>transforming and publishing TEI data,</q> we are talking
            specifically about transforming for print or web publications. However, as this chart
            shows, we may need to transform our TEI for other reasons throughout our workflow. For
            example, we may want to transform into formats that facilitate
            proofreading and error catching, formats for contribution to collaborative projects,
            metadata formats, or archival formats.</p>
        </tutorial>
      </section>
      <section>
        <head>Single-source publishing and XSLT</head>
        <slide>
          <figure>
            <graphic height="400px" url="../../../_utils/gfx/single-source_xslt.png"/>
          </figure>
        </slide>
        <lectureNote>
          <p>Another way to approach the topic of this workshop is to think about transformation and
            publishing as a variety of informational avenues that radiate out from our TEI data.
            Even though it's probably a familiar concept to many of you, it's worth noting an
            important assumption that underlies much of our work with XML: we're creating a single
            XML source from which we are going to generate many different kinds of output. </p>
          <p>This is important because the XML source is an expensive and valuable information
            object: it represents a careful modeling of our research materials, we've put a lot of
            work into it (transcription, encoding, proofreading, correction, annotation, other kinds
            of enhancement) and we want to exploit it in many different ways, automatically, not by
            hand.</p>
          <p>When we generate these different varieties of output, we are often losing information:
            erasing distinctions that are present in the source (but unnecessary in the output), or
            moving from a representationally rich language (like TEI) to a representationally
            impoverished language (like HTML).</p>
          <p>But since these output formats are generated automatically, rather than by hand, this
            information loss doesn't matter: the source retains its informational richness and
            represents the full set of possibilities from which any specific option can be
            generated.</p>
        </lectureNote>
        <tutorial>
          <p>Another way to approach the topic of this tutorial is to think about transformation and
            publication as a variety of informational avenues that radiate out of our TEI data. When
            we create XML, we're creating a single source from which we are going to generate many
            different kinds of output.</p>
          <p>Our XML data is rich: it records a great deal of information and can support many
            uses. However, creating rich data is expensive and labor-intensive. Because our XML
            data is both valuable and costly (and time-consuming) to create, we want to preserve
            it intact. We want to create less rich output formats without altering the original
            XML. In other words, we want a way to transform our XML into other formats
            automatically, without overwriting or impoverishing our rich data.</p>
        </tutorial>
      </section>

      <section>
        <head>Some examples</head>
        <slide>
          <p><ref
              target="http://brockdenbrown.cah.ucf.edu/xtf3/view?docId=1811-00322.xml;query=;brand=default"
              >Charles Brockden Brown Archive</ref></p>
          <p><ref
              target="http://www.marktwainproject.org/xtf/search?iso-year=1853;iso-year-max=1875;category=letters;style=mtp;brand=mtp;sort=date;facet-availability=text"
              >Mark Twain Project</ref></p>
          <p><ref target="http://clover.slavic.pitt.edu:8080/exist/paul/data/paul_main.html">Paul
              the Simple</ref>, <ref
              target="http://clover.slavic.pitt.edu:8080/exist/paul/data/paul.xml">XML
            source</ref></p>
          <p><ref target="http://petrusplaoul.org">Petrus Plaoul</ref></p>
        </slide>
        <lectureNote>
          <p>A few examples: <list>
              <item>Charles Brockden Brown</item>
              <item>Mark Twain Project</item>
              <item>Paul the Simple</item>
            </list>
          </p>
        </lectureNote>
        <tutorial>
          <p>The following projects are a few examples of TEI projects that rely on this kind of
            transformation.</p>
          <p>The Charles Brockden Brown Archive allows you to download the TEI files, read the
            transcriptions, and look at the page images. As you can see, there is fairly minimal
            markup (only <gi>div</gi>, <gi>p</gi> and <gi>lb</gi>). The XSLT for this project is
            also relatively straightforward: it shows the page images (recorded with the
            <att>facs</att> attribute) and the line breaks that the encoder has marked. However,
            the site and reading interface are a bit more complicated, and a project like this
            would probably require hiring a web developer. Also, if you click the <q>search</q>
            button, you will see that the search uses XTF, which we will discuss later.</p>
          <p>The Mark Twain Project also has a site that would require a lot of time and money
            on the project's part. If you click through to one of the letters, you can see that
            there are sidebars containing notes that are highlighted when clicked. The search
            function on this site also requires someone with extensive technical experience to set
            it up.</p>
          <!--[Paul the Simple appears to be broken!! :(((( ]-->
          <!--[you now need permission to access Petrus Plaoul!]-->
        </tutorial>
      </section>

      <section>
        <head>Transformation as a power tool</head>
        <slide>
          <figure>
            <graphic height="400px" url="../../../_utils/gfx/xml_mutability.png"/>
          </figure>
        </slide>
        <lectureNote>
          <p>A third important aspect of our topic is the idea of data as a mutable, protean
            substance: as a kind of plastic informational model that we can reshape and manipulate
            as needed.</p>
          <p>In the example here, all four of these examples represent pretty much the same pieces
            of data—any one of them could be generated from any of the others. And yet these
            differences might matter in the context of some particular tool or standard way of doing
            things.</p>
          <p>The point is that our data is almost never trapped in its current format: when we
            understand it as transformable, we gain power over it and we can use it more flexibly.
            If a collaborator needs some information extracted from our data, or if they put their
            fields in a slightly different order, or whatever, it's not a problem.</p>
        </lectureNote>
        <tutorial>
          <p>For our purposes here, it is important to think of data as a mutable, protean
            substance, like a kind of plastic informational model that we can reshape and manipulate
            as needed. In this slide, all four of these examples represent pretty much the same
            pieces of data—any one of them could be generated from any of the others. And yet these
            differences might matter in the context of some particular tool or standard way of doing
            things. The point is that our data is almost never trapped in its current format: when
            we understand it as transformable, we gain power over it and we can use it more
            flexibly. If a collaborator needs some information extracted from our data, or if they
            put their fields in a slightly different order, we can easily transform our data to
            match theirs.</p>
        </tutorial>
      </section>
      <section>
        <head>Scope and ambition</head>
        <slide>
          <figure>
            <graphic height="400px" url="../../../_utils/gfx/xslt_workshop_scope.png"/>
          </figure>
        </slide>
        <lectureNote>
          <p>The chief tool for doing all of these kinds of work is a programming language called
            XSLT (Extensible Stylesheet Language Transformations): <list>
              <item>it can be used on its own to generate different kinds of transformed and
                manipulated data (such as HTML, KML, JSON, other XML formats)</item>
              <item>and it also is built into many (most? all?) of the XML publication systems that
                we use, such as XTF, as the way that they take XML data and manipulate it as part of
                their publication activities</item>
            </list> Either way, what it does is give us a way of manipulating our XML data: to
            extract pieces of it, reshape them, change their format, generally do whatever we want
            to do with them. </p>
          <p>Let's talk for a moment about what we're going to cover in this seminar (and what we're
            not going to cover).</p>
          <p>This seminar is aimed at people who have TEI data and not much else: we aren't assuming
            familiarity with programming, or with XML publishing tools.</p>
          <p>Our goal is to help you learn about what's involved in using your TEI data: in
            publishing it, in manipulating and transforming it into other formats, exploiting its
            informational potential; we'd like you to come away, first of all, with a sense of what
            is possible.</p>
          <p>How about in concrete, practical terms? What are we actually going to cover? XSLT is
            hugely powerful (it is a full-fledged programming language), but as a result it's a big
            topic: <list>
              <item>hence we are not aiming here to teach you XSLT in any kind of comprehensive
                way</item>
              <item>what we are aiming to do is give you an understanding of how XSLT works, both on
                its own and in the context of XML publishing systems</item>
              <item>on the first two days, we are going to look at a lot of examples, and we're
                going to experiment with a lot of things that XSLT can do, including generating HTML
                and KML</item>
              <item>on the third day, we are going to install and set up two fairly simple XML
                publishing tools: XTF and TEI Boilerplate</item>
              <item>so at a minimum, by the end of the workshop you will be able to take your TEI
                data and publish it on the web in some basic ways.</item>
            </list>
          </p>
          <p>By the end of the workshop, you should also have a good sense of whether XSLT is
            something you want to know more about and learn in a more systematic way, and if it is,
            we encourage you to take a more intensive XSLT workshop: Syd teaches one at DHSI, and
            Syd and David teach one at Brown every so often. This workshop is a good starting point
            for either of those workshops. </p>
        </lectureNote>
        <tutorial>
          <p>It is important to note what this primer will cover, and what it will not. We will be
            focusing on a language called XSLT (Extensible Stylesheet Language Transformations),
            and specifically on using it to transform TEI data, although the language is capable
            of transforming into and out of any XML language, as well as other types of data
            altogether. We will mostly be looking at how XSLT works in the context of publishing
            systems, using tools like XTF and TEI Boilerplate. However, there is much more that
            you can do with XSLT!</p>
        </tutorial>
      </section>



      <section>
        <head>Simple Publication with XSLT</head>
        <slide>
          <p>Extensible Stylesheet Language Transformations (XSLT) allow you to transform XML
            documents into other formats.</p>
          <figure>
            <graphic height="400px" url="../../../_utils/gfx/publication_tools_xslt.png"/>
          </figure>
        </slide>
        <lectureNote>
          <p>The Extensible Stylesheet Language allows you to transform XML documents into other XML
            formats.</p>
          <p>Essentially XSLT allows you to map a given XML element onto another XML element: saying
            "take in the following construct, and put out this other construct".</p>
          <p>It could be a construct in the same language, or in a different language such as XHTML,
            as in the example here.</p>
        </lectureNote>
        <tutorial>
          <p>XSLT allows you to transform from one XML format into another. When you write an XSLT
            stylesheet, you are essentially saying <said>take this construct and turn it into
              another construct.</said> It is important to note that the transformations can occur
            within the same language (turning one TEI element into another) or from one language to
            another (turning a TEI element into an HTML element). In the example shown, we can
            see that the TEI element <gi>text</gi> is transformed into the HTML element
              <gi>body</gi>. This is important for creating output that can be styled with
            stylesheet languages like CSS.</p>
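          <p>As a rough illustration of this kind of mapping, the following minimal XSLT 1.0 sketch
            turns a TEI <gi>text</gi> element into an HTML <gi>body</gi> element. It assumes that
            the source document uses the standard TEI namespace. The rule that turns each TEI
            <gi>p</gi> into an HTML <gi>p</gi>, and the use of the first TEI <gi>title</gi> as the
            page title, are assumptions added to make the example complete; they are not taken
            from the slide.</p>
          <eg><![CDATA[
```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <!-- Start at the document root and build the HTML shell -->
  <xsl:template match="/">
    <html>
      <head>
        <title>
          <xsl:value-of
              select="/tei:TEI/tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title[1]"/>
        </title>
      </head>
      <xsl:apply-templates select="/tei:TEI/tei:text"/>
    </html>
  </xsl:template>

  <!-- Map the TEI text element onto the HTML body element -->
  <xsl:template match="tei:text">
    <body>
      <xsl:apply-templates/>
    </body>
  </xsl:template>

  <!-- Assumed extra rule: each TEI paragraph becomes an HTML paragraph -->
  <xsl:template match="tei:p">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```
]]></eg>
          <p>Any XSLT 1.0 processor (for example xsltproc or Saxon) should be able to apply a
            stylesheet like this one to a TEI file. Elements that have no rule of their own simply
            pass their text content through to the output.</p>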
        </tutorial>
      </section>






      <section>
        <head>XML Databases and Publication Frameworks</head>
        <slide>
          <p>Tools designed to manage large groups of XML files, with more advanced functionality: <list>
              <item>fast, efficient searching</item>
              <item>transformations involving groups of files</item>
              <item>eXist, DBXML, Xindice, XTF, MarkLogic</item>
            </list>
          </p>
          <figure>
            <graphic height="400px" url="../../../_utils/gfx/publication_tools_framework.png"/>
          </figure>
        </slide>
        <lectureNote>
          <label>The XML database and publication framework universe</label>
          <p> These kinds of tools are designed to manage large groups of XML files, and to provide
            certain kinds of advanced functionality: <list>
              <item>fast, efficient searching</item>
              <item>transformations involving groups of files: not just transforming each file
                separately, but doing transformations that involve taking parts of different files
                and creating new results files: for instance, a sorted list of the first lines from
                all the poems in a collection.</item>
            </list></p>

          <p> How do databases fit into a larger XML publication framework? What do they do? <list>
              <item>they create and store indexed information: that is, information from the source
                XML files that has been preprocessed to make it more accessible and easier to
                manipulate. For instance, they might store tables of all the document metadata
                (author, title, genre, date, etc.) so that it can be searched and sorted more
                quickly</item>
              <item>they contain a representation of the document's structure in a format that makes
                it easier to process, so that certain kinds of navigation are easier</item>
            </list> Within the XML publication framework, the database sits and waits for queries to
            come in. <list>
              <item>when it receives a query, it performs the necessary searching and returns a
                result (in the form of an XML fragment, or a node set, or some proprietary
                structure) </item>
              <item>the result can then be transformed (e.g. into HTML for delivery to a browser, or
                into some other XML format for other processing) using XSLT</item>
            </list></p>
          <p>XML databases exist as separate modules that can be used as the basis for XML
            publishing systems, for instance: <list>
              <item>eXist</item>
              <item>DBXML</item>
              <item>Xindice (Apache)</item>
            </list></p>

        </lectureNote>
        <tutorial>
          <p>XML databases and publication frameworks are designed to manage large groups of XML
            files. They provide certain kinds of advanced functionality, such as fast, efficient
            searching and transformations that involve groups of files. Rather than treating each
            file separately, an XML database can take parts of different files and combine them
            into new result files. For instance, this type of tool could create a sorted list of
            the first lines of all the poems in a collection.</p>
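          <p>To give a sense of what such a cross-file transformation involves, here is a sketch of
            the first-lines example in plain XSLT 1.0 (an XML database would typically do the same
            job faster by using its indexes). The sketch rests on several assumptions that are not
            part of this tutorial: the input is a hypothetical manifest document whose root
            element <gi>files</gi> contains one <gi>file</gi> element per document, each with an
            <att>href</att> attribute pointing to a TEI file; each poem is encoded as a TEI
            <gi>div</gi> whose <att>type</att> is <q>poem</q>; and the lines of verse are encoded
            as <gi>l</gi> elements.</p>
          <eg><![CDATA[
```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <!-- The input is the hypothetical manifest that lists the TEI files -->
  <xsl:template match="/files">
    <ul>
      <!-- document() loads every listed file; we then visit each poem in turn -->
      <xsl:for-each select="document(file/@href)//tei:div[@type = 'poem']">
        <!-- Sort the poems alphabetically by the text of their first line -->
        <xsl:sort select="normalize-space(descendant::tei:l[1])"/>
        <li>
          <xsl:value-of select="normalize-space(descendant::tei:l[1])"/>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>

</xsl:stylesheet>
```
]]></eg>
          <p>Because the stylesheet reads every file each time it runs, this approach is only
            practical for small collections, which is one reason larger projects turn to a
            database.</p>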
          <p>So, how do databases fit into a larger XML publication framework, and what do they
            do?</p>
          <p>They can create and store indexed information: information from the source XML files
            that has been preprocessed to make it more accessible and easier to manipulate. For
            instance, they might store tables of all the document metadata so that it can be
            searched and sorted more quickly.</p>
          <p>XML databases contain a representation of the document's structure that makes it easier
            to process, so that certain kinds of navigation are easier. As you can see in the
            example, the words from the opening of Charles Dickens' A Tale of Two Cities are
            indexed, so that a query for the word <q>times</q> is answered more quickly than if the
            search engine had to run through the entire document character by character.</p>
          <p>Within the XML publication framework, the database sits and waits for queries to come
            in. When it receives a query, it performs the necessary searching and returns a result
            in the form of an XML fragment, node set, or some proprietary structure. The result is
            then transformed using XSLT. So, for example, the result could be transformed into HTML
            for delivery to a browser, as in the example on the slide (which renders the search hits
            in red).</p>
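          <p>The last step, transforming a query result for the browser, might look something like
            the following XSLT 1.0 sketch. The element names <gi>result</gi> and <gi>hit</gi> are
            hypothetical, chosen only for illustration; real systems such as XTF or eXist return
            their own result structures, so an actual stylesheet would need to match those
            instead.</p>
          <eg><![CDATA[
```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Hypothetical input:
       <result>It was the best of <hit>times</hit>, it was the worst of <hit>times</hit></result> -->

  <!-- Each result fragment becomes an HTML paragraph -->
  <xsl:template match="result">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

  <!-- Each search hit is wrapped in a span that the browser displays in red -->
  <xsl:template match="hit">
    <span style="color: red;">
      <xsl:apply-templates/>
    </span>
  </xsl:template>

</xsl:stylesheet>
```
]]></eg>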

        </tutorial>
      </section>
      <section>
        <head>The Bigger Picture</head>
        <slide>
          <figure>
            <graphic height="400px" url="../../../_utils/gfx/publication_spectrum.png"/>
          </figure>
        </slide>
        <lectureNote>
          <p>The tools you need, and the people you need, can be imagined as a rough continuum of
            increasing scale, complexity, difficulty, and cost: <list>
              <item>at the simplest level, there are things you can do (or learn to do) by yourself,
                with very little in the way of equipment or software: tools like XSLT and CSS will
                go a long way towards producing simple, effective interfaces for browsing and
                reading small sets of documents</item>
              <item>at a slightly more complex level, as the number of documents increases and as
                you want to do more ambitious things with them (such as visualizations, complex
                searching), you need software tools that are a little more challenging to manage:
                perfectly within the capabilities of a humanist, but requiring more time: not
                something you can do on the side of another job; this becomes someone's major job
                responsibility</item>
              <item>Going a bit further, we get to projects that need XML publication frameworks,
                which in turn require a professional systems administrator, someone who really understands
                the installation and configuration of things like web servers, XML databases, etc.
                These are the kinds of tools we need to build things like data mining or text/topic
                analysis into our publications, and also if we want to publish larger collections of
                documents that require more server power/speed</item>
              <item>For production-level publication, where you may actually be charging money for
                access (and hence need to do things like authentication) and may therefore have
                higher standards of performance and reliability, you need to start engaging with your
                institutional IT organization to make sure that things like backups, server
                maintenance, etc. are being handled at the appropriate level of professionalism;
                this is also the level of scale at which we start to be able to really work
                effectively with multiple large data sets: for instance, multiple projects of
                substantial size</item>
              <item>Finally, if we want to be able to ensure the long-term sustainability of
                projects, we need to engage with systems like institutional repositories and the
                data curators who can help us ensure that data will be maintained, migrated, etc.
                after the project itself is no longer funded.</item>
            </list>
          </p>
          <p>Consider where the three examples we looked at earlier might fit in: <list>
              <item>Paul the Simple: a single scholar, acting alone</item>
              <item>Charles Brockden Brown: a small amount of professional systems
                administration</item>
              <item>Mark Twain Project: a much larger staff, embedded in the CDL (which is where XTF
                was built)</item>
            </list>
          </p>
        </lectureNote>
        <tutorial>
          <p>The types of systems you will need in place to transform and publish TEI data exist on
            a continuum of complexity, difficulty, and cost. There's a lot you can learn to do by
            yourself! However, functionality is limited if you create and maintain everything on
            your own. More expensive and complicated frameworks need institutional support in
            order to survive, and not all of us have access to this kind of support. Moving
            forward (in this primer and with your own data), it is important to think about where
            your project fits along this continuum.</p>
          <list>
            <head>This tutorial is complete; please see the links below to continue:</head>
            <item><ref target="../xslt_intro/xslt_intro_tutorial_00.xhtml">Proceed to next tutorial
                in Transformation and Publication Primer</ref></item>
            <item><ref target="../../../../resources/transformation.html">Return to Transformation
                and Publication Primer</ref></item>
            <item><ref target="../../../../resources/tutorial_main.html">Return to main tutorial
                page</ref></item>
          </list>
        </tutorial>
      </section>


    </presentation>
  </text>
</TEI>
