Final Report: Seminars in Humanities Text Encoding with TEI

Project Director: Julia Flanders, Brown University

September 30, 2009

Introduction

Between January 2007 and June 2009, the Brown University Women Writers Project (WWP) offered a series of introductory text encoding seminars at eleven institutions, focusing on the use of the TEI Guidelines. The seminars were designed and taught by Julia Flanders and Syd Bauman, with additional support and consultation by project staff from the WWP and the Brown University Scholarly Technology Group. There were two central goals envisioned for this initiative. First, we wanted to provide humanities faculty and students with an opportunity to examine the significance of text encoding as a scholarly practice, through a combination of discussion and practical experimentation. As we observed in the grant proposal, text encoding is typically taught (particularly in a workshop setting) as a technical skill and as something having primarily to do with the use of computers. There have been very few opportunities for humanists to learn about text encoding in a way that emphasizes its theoretical and methodological significance for humanities research and teaching. In these seminars we sought to fill that need, teaching text encoding as both deeply theoretical and also concretely located in practice—something to be learned by both doing and reflecting.

Secondly, we wanted to provide resources to support humanities researchers who wanted to experiment with text encoding on their own, or who hoped to start or become involved with a digital research project. We knew that at many institutions, there is little or no support for faculty projects, and even less for the kinds of specialized problems that scholarly text encoding might pose. For faculty to pursue such projects realistically, they would need to be able to draw on consultation and ongoing advice outside of the seminar itself. Also, because text encoding is not a purely theoretical exercise—because its application depends on the specifics of the information being represented—it must be studied in practice. We anticipated that the seminars might well encourage experiments which, with a little further support, might grow into viable scholarly projects that could compete successfully for external funding.

We thus conceived the seminar series as responding to these needs in several ways that set these events apart from typical text encoding workshops. The starting point for each event was a discussion of research methods and the kinds of digital representations that support them, as a way of situating text encoding within this landscape as a specific representational strategy. In this respect the seminars resembled a very abbreviated version of the graduate “methods” course, which serves to introduce future scholars to the fundamentals of disciplinary practice, and to situate that practice critically. Because of the disciplinary diversity of the seminar participants, these discussions also highlighted differences in method: for instance, between the information that an art historian and a medievalist might each regard as crucial to a transcription of an illuminated manuscript. The core of each seminar was organized around participants’ own projects and sample texts, using these materials and participants’ interests to drive further discussion of how to create effective digital representations in TEI. The final segments of each seminar were typically aimed at giving participants an understanding of how text encoding projects are planned and conducted, including topics like digital publication tools, TEI customization, and project management. The seminars were aimed at faculty, students, and practitioners in the humanities with little or no technical experience but a strong interest in digital textuality.

The grant formally began on January 1, 2007, and our first seminar was held in March of that year; the last was completed in April 2009. As Table 2 shows, the seminars were held at a range of institutions, but most were at large universities (Wheaton College being the significant exception). All of the host institutions had some digital humanities activity under way, but at some these activities already had a strong organizational shape (for instance, the Center for Digital Research in the Humanities at the University of Nebraska-Lincoln, UCSB’s Transliteracies Project, or the Maryland Institute for Technology in the Humanities) while at others these activities were in the process of being institutionally formalized. The Digital Humanities Initiative at the University at Buffalo was actually launched during our visit, and our hosts at Texas A&M University had just completed a white paper on digital humanities as part of an institution-wide strategic planning initiative.

Attendance and recruiting strategies varied from event to event; in many cases, the seminar served a substantial local population with a few participants from outside. In others, where the seminar was held in conjunction with another event (as at Miami University and the University of Maryland) the audience was more far-flung. In all cases, however, the audience included a mix of faculty, students, and staff (including both library staff and digital humanities or IT support staff); Table 1 shows the breakdown among the groups. The seminars also varied in their duration, and (to some extent) in the material presented. The shortest event was a one-day seminar at Stanford, the first in the series; the longest were a full three days. In almost all cases we covered basic principles of text encoding, XML, and TEI (though at Stanford and UNL we did not include a hands-on component and these topics were treated in a compressed form); the additional topics we offered depended on local interests, and included project management, digital publication tools, TEI customization, and existing digital research projects. In addition, some seminars included presentations by local researchers.

Table 1: Seminar participants and roles

Role                                              Participants
Faculty and independent scholars                  83 (38%)
Students and postdoctoral fellows                 73 (33%)
Librarians                                        32 (15%)
Staff (including IT and digital project staff)    30 (14%)
Other                                             2
Total                                             220

Table 2: Seminar locations, attendance, and length

Institution                                 Attendance    Length
Stanford University                         11            1 day
University of California, Los Angeles       15            2 days
University of California, Santa Barbara     29            3 days
University of Nebraska-Lincoln              27            1 ½ days
University at Buffalo (SUNY)                14            3 days
University of Maryland                      19            2 ½ days
University of Washington                    22            2 days
Miami University of Ohio                    38            3 days
Wheaton College                             19            3 days
Northwestern University                     15            3 days
Texas A&M University                        11            3 days
Total (Average)                             220 (20)      27 days (2.5)

In all, the seminars reached 220 participants from 71 different institutions, ranging from large research institutions (UCLA, University of Illinois) to small liberal-arts colleges (Wheaton College, Augustana College). A full list of institutions is included in the appendix.

The evaluation feedback that we received was extremely positive. Most participants found the pace (81%) and length (72%) of the seminars to be right, and considered the seminars to be “highly relevant” to their work (73%). The comments included praise for both the design and conduct of the seminars, as a few representative quotes illustrate:

  • ‘I honestly loved it. It changed the way I see what I do.’
  • ‘TEI really makes you think long and hard about how aesthetic works are put together and about how best to access those constitutional blocks.’
  • ‘The seminar helped me appreciate the breadth of text encoding and that was the most useful concept I took away.’
  • ‘I found the entire time both useful and interesting. I thought in particular it was a great mix of theoretical explanations and practical coding.’
  • ‘It was wonderful!’
  • ‘Julia and Syd are an amazingly dynamic duo, who managed to make something as potentially dry as TEI seem both important and exciting. WELL DONE!’

More detailed analysis of the evaluation feedback is provided below.

Seminar Design

In designing the curriculum for these seminars, we drew on the experience the WWP has gained from teaching TEI workshops in a variety of venues over the past five years. Although early on our audience was typically library staff and digital humanities practitioners, over time an increasing proportion have been humanities faculty and graduate students, and from their participation we observed that the intellectual hooks most helpful for understanding text encoding were drawn from textual editing, anthropology, and information modeling. What these connections bring to the fore is the role that the editor (observer, modeler) plays in shaping the representation that is created, and the disciplinary, strategic, and interpretive nature of the resulting digital object. The seminar curriculum emphasizes these conceptual models and encourages participants to think explicitly about the models their research demands: in some cases more explicitly than is common in the humanities disciplines. It also raises the question of how these models can be harmonized (through a standard encoding language like TEI) so as to permit communication between researchers.

Our goal was thus for participants in the seminars to come away, at a minimum, with an understanding of how text markup works: what kinds of digital representation it can create, and how these can function in a digital scholarly context. We also wanted them to understand the role the TEI currently plays in supporting these activities, and to have some sense of how TEI data works in a digital project context: how such projects are developed and managed, and how they work as publications. To get a grounded understanding of these topics, participants would need some introduction to technical matters (for instance, the basic concepts of XML), but we wanted to be sure that the emphasis was never purely technical, and that even a discussion of XML well-formedness or data standards was contextualized so that their significance for the humanities could be easily grasped.

All of the seminars (Stanford being the one exception; see below for details) were structured similarly around a core set of topics, with several optional topics included based on local interests. The core topics were:

  • Introduction to concepts of descriptive markup and digital scholarship
  • Introduction to the TEI
  • Introduction to XML
  • Basic TEI encoding (simple vocabulary)
  • Advanced TEI encoding (more specialized vocabulary)

The optional topics included:

  • Customizing the TEI schema
  • Tools for publishing TEI-encoded texts
  • Project management
  • Digital research projects
  • Metadata and contextual information

We also included two hands-on practice sessions of two or three hours each in almost every case, with the exception of Stanford and University of Nebraska. In addition, at some of the seminars (Wheaton, University of Washington, Miami University of Ohio, University of Nebraska) we included presentations on local projects.

Some of these topics lent themselves especially well to discussion, and in our conduct of the seminars we tried to encourage this as much as possible. In the earlier seminars we tended to structure the seminars as a series of presentations and hands-on work, followed by questions and answers, but increasing familiarity with the audience enabled us to reframe some of the core topics entirely as discussions (for instance, the opening topic on concepts of descriptive markup and digital scholarship), and we also took advantage of the participants’ questions to launch discussion of broader issues and concepts. The increasing emphasis on discussion also demonstrated that (particularly for the more complex details of markup) participants learned much more effectively from responses to questions than from presentation, and this prompted us to rework our presentations so as to diminish their level of detail and increase their narrative flow. This allowed participants to grasp the overall concepts better, and to ask questions about the details that interested them.

The role of hands-on practice proved crucial in these seminars in ways that we had not predicted. It was clear to us from the start that we would need to ground our discussion of TEI markup with some opportunity for participants to practice what they had learned, and we designed the seminars around an alternation of concepts and practice: first, an opening discussion of background concepts (markup, XML, TEI), followed by hands-on practice; next, an exploration of more advanced encoding concepts (annotation, editorial markup, overlap) followed by hands-on practice; and finally, a discussion of how markup is used (project management, publication tools, schema customization). However, we discovered that the real role of the hands-on practice turned out not to be to anchor specific TEI concepts in people’s minds, or even really to teach them TEI at all, but rather to have participants use the markup process as a way of thinking about the text—thus helping to illustrate how much the process has to do with expressing the scholar’s own interests. This point was reinforced by our use of participants’ own sample texts for the hands-on practice, which ensured that participants were using texts in which they had some personal investment. In our use of the hands-on practice time we did not propose any standard of correctness in TEI encoding beyond the basic requirements of validity[1] and the ideas of economy and elegance that inform any modeling effort. Instead, we tried to reinforce the idea that the TEI can be used to express a very wide variety of research interests, and that in important ways the TEI markup reflects the information modeling that takes place within the researcher’s mind, rather than simply shaping the data to fit a standard external model. (In the seminars where TEI schema customization was covered, we were able to make this point more explicitly, since participants could see how they might modify the TEI to match their own needs more precisely.) 
The hands-on practice tended to generate questions that would not have arisen otherwise: for instance, concerning the semantics of markup, how to ensure consistency across projects, what level of encoding granularity is useful in different contexts, and similar questions of encoding strategy and its impact on the digital representation being created.

In a similar vein, and even more unexpectedly, we found that offering participants the ability to see their texts displayed had a remarkable effect on their engagement with the markup issues. In the early seminars we did not provide any mechanism for displaying the encoded texts (e.g. in a web browser) and although participants always asked “what will it look like?” we took this question as an opportunity to discuss the separation of content from formatting, and the use of descriptive markup to produce multi-purpose source data that could be displayed in a variety of ways. However, in the third seminar we began providing a simple CSS stylesheet to accompany our TEI encoding template, and showing participants how to modify the style information so that they could choose (for instance) to display all personal names highlighted in red. This experiment was a success not only in giving participants a sense of completion (of being able to “see” their texts), but also more importantly in giving them a very concrete sense of how display (and other functions) can only work with what the data provides: if one wants to see rhyme words highlighted, they first must be represented explicitly in the markup. Enabling participants to design a very simple “publication” in this way gave them a motivation to work more intensively on the markup driving it.
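The point that display can only work with what the data provides can be illustrated with a minimal sketch. The sample verse, the choice of the TEI rhyme element, and the helper name encoded_rhymes are all assumptions made for illustration; this is not material from the seminars. Only the first line has its rhyme word tagged, so only that word is available to any later display or analysis step.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample: the rhyme word is tagged in the first line only.
SAMPLE = f"""<lg xmlns="{TEI_NS}">
  <l>Shall I compare thee to a summer's <rhyme label="a">day</rhyme>?</l>
  <l>Thou art more lovely and more temperate:</l>
</lg>"""


def encoded_rhymes(xml_text):
    """Return the text of every explicitly tagged rhyme word."""
    root = ET.fromstring(xml_text)
    # ElementTree addresses namespaced elements as {namespace}localname.
    return [el.text for el in root.iter(f"{{{TEI_NS}}}rhyme")]


print(encoded_rhymes(SAMPLE))
```

The second line also ends in a potential rhyme word, but because it is not marked up, no stylesheet or query can single it out.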

Seminar Resources

To support the seminar series we developed a set of materials for our own use in teaching the seminars, and also for participants (and others) to use outside the seminar to refresh their memories or teach themselves further. Materials of this kind developed for other workshops of ours have also been used by other instructors and we hope that these may similarly be reused in other venues. All of these materials are permanently available from the WWP seminars site (http://www.wwp.northeastern.edu/encoding/seminars/) and are published under a Creative Commons license which permits free reuse, with attribution, for non-commercial purposes.[2]

Slides and Lecture Notes

The slides and lecture notes for all presentations given in these seminars (and in the other workshops we teach) are written in TEI, using a customized schema developed for this purpose. The source XML is converted to HTML slides and lecture note output using XSLT; the source and all derived files are published on the WWP site. We maintain a master set of materials which are reviewed and improved at intervals, and these are available at the WWP’s encoding resources page (http://www.wwp.northeastern.edu/encoding/resources/index.html). In addition, the specific versions used in each seminar are archived with the materials for that seminar, so that participants can go back and review the materials as they were presented at the event. This system permits us to make and preserve event-specific changes and references.

The slide schema itself and the stylesheets used to transform and display the HTML are also available from the WWP site. They provide a convenient, simple presentation system built using open-source software and standards, which also illustrates the variety of uses of TEI markup and customization.

Teaching Schema, Templates, and Stylesheets

Since hands-on practice is so central to these seminars, the schema participants use for this work plays a very important role in shaping what they can do in their encoding and how they experience the TEI language in practice.[3] Because the TEI schema is customizable,[4] we wanted to provide a customization for use in the seminars that would be particularly suited to the kinds of encoding we would be teaching: focused on the representation of historical documents and other research materials. The schema we used for the seminars evolved somewhat during the course of the series as we shifted our approach. In the earlier seminars, we provided two schemas: a very simple one for the introductory exercises containing only the most basic elements, and a more complex schema for the more advanced practice later in the seminar. However, we found that we had better results in most cases if instead of starting with a set of simple exercises, we asked participants to begin encoding their own sample documents from the start: it meant that the materials were more engaging and participants were more strongly motivated to figure out how to represent them. We still offered participants the option of starting with our simpler examples (see below) but few tended to choose this option. As a result, participants tended to “outgrow” the simple schema fairly quickly, and this occasionally introduced a slight logistical hiccup at the point where we needed to show participants how to change their schema. It proved more straightforward to provide a single schema that was complex enough to accommodate all of the features of interest, and have participants use that from the start. In practice the disadvantage we had anticipated (that participants would find the larger schema confusing) did not tend to materialize.

To accompany the schema, we also provided participants with a document template: a short, valid, but skeletal TEI file using the seminar schema. This template contains only the minimum TEI header data and document structure required for validity, and enabled participants to begin encoding without having to worry at the outset about which elements from the TEI header were required; they could simply open the file and begin transcription. In our earlier versions, this document template included some sample encoding, demonstrating the elements for basic prose, verse, drama, and letters. However, we found towards the end of the series that participants almost never used these examples but either deleted them instantly, or left them as a kind of appendage in the document. In some cases, the deletion caused difficulties when participants deleted too much or too little, leaving partial elements behind and causing errors. For this reason we finally modified the template to contain no sample material at all. For future events we will provide an encoded sample text to accompany the template for reference and illustration purposes.
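As a rough illustration of what such a skeletal file might look like, and of how a partial deletion breaks it, the sketch below embeds a minimal TEI document in Python and checks it for well-formedness. The header wording and the helper name is_well_formed are assumptions, not the actual WWP template, and the check covers well-formedness only; validity against the seminar schema would require a schema-aware validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical skeletal template: a minimal header plus an empty body paragraph.
TEMPLATE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Untitled sample text</title></titleStmt>
      <publicationStmt><p>Unpublished seminar exercise</p></publicationStmt>
      <sourceDesc><p>Transcribed from a sample document</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Begin transcription here.</p>
    </body>
  </text>
</TEI>"""


def is_well_formed(xml_text):
    """Return True if the text parses as XML (well-formedness, not validity)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Simulate deleting too much: the closing tag of the paragraph is lost.
BROKEN = TEMPLATE.replace("here.</p>", "here.")

print(is_well_formed(TEMPLATE))
print(is_well_formed(BROKEN))
```

A stray open tag of this kind is the sort of leftover that an XML editor reports as an error before any encoding has begun.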

As mentioned above, we also found that participants benefited from being able to see their encoding displayed in a browser, and we developed a skeletal CSS file for this purpose. The encoding template contained a link to this CSS file, so that when opened in a standards-compliant browser such as Firefox, the TEI file would be displayed using the CSS stylesheet. The stylesheet contained almost no style information at all: it simply listed most of the TEI elements used in the teaching schema, with a style designating whether they should be formatted as block-level or inline elements. During one of the hands-on sessions, we showed participants how to add further style information to the stylesheet (for instance, text color, indentation, font size, and so forth) so that different aspects of the encoding could be displayed: for instance, color coding of different rhyme-words or different kinds of names, appropriate formatting of poetry and paragraphing, treatment of different heading levels, highlighting of interpretive codes, display or suppression of alternative readings.
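The structure of such a stylesheet can be sketched as follows. The element lists and the function names skeletal_css and add_style are hypothetical, chosen for illustration; the actual WWP stylesheet is not reproduced here. The sketch first emits one display rule per element, then appends a rule of the kind participants added by hand.

```python
# Hypothetical element lists; the real teaching schema covered more elements.
BLOCK_ELEMENTS = ["head", "p", "lg", "l", "sp", "stage"]
INLINE_ELEMENTS = ["persName", "placeName", "rhyme", "hi"]


def skeletal_css(block, inline):
    """Build a stylesheet that only says which elements are block or inline."""
    rules = [f"{name} {{ display: block; }}" for name in block]
    rules += [f"{name} {{ display: inline; }}" for name in inline]
    return "\n".join(rules)


def add_style(css, selector, declarations):
    """Append one further rule, such as a color for a given element."""
    body = " ".join(f"{prop}: {value};" for prop, value in declarations.items())
    return f"{css}\n{selector} {{ {body} }}"


css = skeletal_css(BLOCK_ELEMENTS, INLINE_ELEMENTS)
css = add_style(css, "persName", {"color": "red"})
print(css)
```

In a browser, a file like this is normally attached to an XML document with an xml-stylesheet processing instruction at the top of the file; whether the seminar template used exactly that mechanism is an assumption here.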

The schema, template, and stylesheets were all bundled together into a .zip file and made available for download from the WWP site.[5] At some events, the local organizers were able to install these materials on the computers in advance, which was a great help; at other events, we had participants download them during the hands-on practice. In the future we would consider making these materials available on USB keys, since this would be vastly more convenient, but the cost would need to be budgeted in advance.

Handouts

We developed a number of handouts on various topics to assist participants in the hands-on practice. These included:

  • A short annotated element list, including all of the most basic TEI tags that participants would be most likely to need in their encoding. This enabled participants to look up elements quickly without having to navigate the TEI Guidelines at the outset.
  • A crib sheet of basic commands in the <oXygen/> XML editor (for inserting elements and attributes, validating the file, and so forth).
  • A crib sheet of basic CSS information, styles, and selectors, to help participants unfamiliar with CSS.
  • A step-by-step guide to creating customized TEI schemas using the TEI’s Roma web interface (http://www.tei-c.org/Roma/).
  • A set of short sample texts designed to illustrate the basic document genres (prose, verse, drama, letters). We also created a more extensive set of sample texts drawing from publicly available archival materials, to provide more challenging examples for participants who did not bring sample texts of their own.

All of the handouts are available for download from the WWP seminars site.

Seminar Descriptions

Stanford University

Duration: 1 day
Venue: Stanford Humanities Center
Hosts: Matthew Jockers and Nicole Coleman
Participants: 10 (Stanford, UCLA, UC Berkeley)
Schedule: http://www.wwp.northeastern.edu/encoding/seminars/stanford/

This seminar was somewhat atypical in two respects. For local logistical reasons the event was abbreviated to a single day, whereas the rest of the seminars were two or three days long. Also, because of the short time available we did not engage the participants in any hands-on encoding practice. We began the day by providing background on text encoding and the TEI, and then in the afternoon we examined a number of specific text encoding projects and discussed the different uses they make of TEI markup. Participants raised a number of interesting issues for discussion, including the challenges of representing graphical information in encoded form. (One participant was an art history professor who showed a Blake engraving of Laocoön surrounded by a huge complex welter of text, including Hebrew and Greek, that curves around the contours of the human figure.) Participants came away with a strong basic grounding in how text encoding works as a representational system, and how it functions in a scholarly environment as a basis for digital publication and research. However, subsequent experience with the rest of the series suggests that hands-on practice of some sort (even as a large group exercise) strengthens participants’ grasp of the theoretical issues considerably; in a shorter event like this one, it would still be desirable to find some way for participants to engage with the actual markup rather than simply looking at its published surfaces.

University of California, Los Angeles

Duration: 2 days
Venue: Digital Humanities Incubator Group
Host: Zoe Borovsky
Participants: 19 (UCLA, Getty, USC)
Schedule: http://www.wwp.northeastern.edu/encoding/seminars/UCLA/

The participants in this seminar were a mix of faculty, library and technical staff, students and postdoctoral fellows from a range of disciplinary areas, with a small cluster working in Russian studies. The first day began with an overview of the kinds of innovative research that are possible using detailed text markup, and showing some examples of projects that are exploring these research possibilities. The goal was to give the participants a sense of what is at stake and also of some of the issues (data consistency, discipline-specificity, level of granularity in encoding, metadata) that affect encoding decisions and project planning. Following this opening session, we spent a second session describing the TEI and then began working on practical text encoding, starting with an overview of XML and basic tagging, followed by some intensive hands-on practice. On the second day, we spent the first session presenting some more advanced encoding topics followed by more hands-on practice, and then discussed the various methods of publishing TEI documents and what the different publication methods offer by way of functionality and research opportunity. We concluded with a discussion of the impact of text encoding on humanities scholarship.

University of California, Santa Barbara

Duration: 3 days
Venue: Transliteracies Project, English Broadside Ballads Archive
Hosts: Alan Liu, Patricia Fumerton
Participants: 25 (UCSB, UC Northridge, UC Irvine, UCLA)
Schedule: http://www.wwp.northeastern.edu/encoding/seminars/ucsb/

Participants in this workshop were drawn largely from UCSB faculty and graduate students in English, but also included participants from history, sociology, comparative literature, education, and visual studies, as well as graduate students and postdoctoral fellows from other UC schools including UC Northridge, UC Irvine, and UCLA. Members of the library staff also took part.

This event was among the most successful in the series, at least partly because of the enthusiasm of the organizers and their personal interest in the issues being raised. Alan Liu and Patricia Fumerton, both of whom are directly engaged in running digital projects, had done a great deal of work in advance to raise interest in the event and to emphasize its continuity with research questions and discussions already ongoing among their students and colleagues. As a result the discussion was at a very high level and participants seemed to find clear and interesting connections to their own work. The members of the English Broadside Ballads Archive project met during the hands-on sessions to focus on issues specific to that project.

University of Nebraska-Lincoln

Duration: 2 days
Venue: Center for Digital Research in the Humanities
Host: Katherine Walter
Participants: 27 (UNL, University of Iowa, New York University)
Schedule: http://www.wwp.northeastern.edu/encoding/seminars/unl/

This seminar was held in conjunction with the second annual Nebraska Digital Workshop, which showcased the work of three young scholars in digital humanities. The audience for the seminar thus drew on attendance at the workshop, and participants had the opportunity to see and critique the three projects from the perspective of the issues raised in the seminar. Two of the three projects being showcased were TEI projects (one focusing on medieval manuscripts, the other on a modern documentary edition). The event was attended by 27 people largely from UNL with a few participants from the University of Iowa: a mix of librarians, faculty from various departments, and graduate students. A substantial number of the participants were from the Walt Whitman Archive and the Willa Cather Archive, both of which are digital projects located at UNL.

This event was distinctive in that we were not able to include a hands-on component, due to the limitations of the space where the event was to be held. We nonetheless wanted to expose participants to some actual encoding, and as an experiment we tried a group encoding exercise, in which we showed a very lightly encoded version of a Walt Whitman poem (to give the audience a general sense of what the TEI structures would look like) and then called for suggestions from the participants of features that would be interesting to represent. The resulting discussion was interesting, although slow to start; participants suggested encoding place names, which led to some consideration of how to recognize place names and how to encode allusive or phrasal references to places. They also suggested encoding alliterative structures, which led to a discussion of how fine-grained such an encoding should be and what purpose it would serve.

University at Buffalo (SUNY)

Duration: 3 days
Venue: Digital Humanities Initiative
Hosts: Maureen Jameson, Cristanne Miller
Participants: 14 (University at Buffalo)
Schedule: http://www.wwp.northeastern.edu/encoding/seminars/buffalo/

This seminar was hosted by Maureen Jameson and Cristanne Miller, who at the event announced the launch of a new Digital Humanities Initiative at Buffalo, with support from the University. The seminar was attended by 14 people, most of whom were humanities faculty and graduate students, with a few participants from the library and IT departments. All participants were from the University at Buffalo. The seminar covered our full range of topics, but with an emphasis on scholarly editing because of specific interests within the group, several of whom were engaged in or considering digital editing projects. At this event we also made our initial shift from a very topical organization of the encoding presentations towards a more conceptual organization based on scholarly tasks and constructs (editorial apparatus, annotation, comparison of parallel structures). The discussion at this event was exceptionally good; a few of the participants had digital projects either under way or in the early planning stages, and their questions and comments introduced topics which the other participants found compelling as well.

University of Maryland

Duration: 2 days
Venue: Maryland Institute for Technology in the Humanities
Hosts: Neil Fraistat and Matthew Kirschenbaum
Participants: 15 (UMD, Arcadia University, Shippensburg University, University of Houston, University of Birmingham [UK], National Gallery, Harvard University, Washington University in St. Louis)
Schedule: http://www.wwp.northeastern.edu/encoding/seminars/umd/

This seminar was hosted by Neil Fraistat and Matthew Kirschenbaum at the Maryland Institute for Technology in the Humanities, and was held in conjunction with their “Digital Diasporas” conference (http://www.mith2.umd.edu/diaspora2008/). There were fifteen seminar participants, almost all of them from the humanities, and many of them were at MITH specifically for the conference and had projects that were germane to the digital diaspora theme. As a result the participants represented a much broader spectrum of institutions than usual. For this event, in addition to our usual presentations and discussion, we included a segment showcasing one of MITH’s current projects, an Ajax XML encoding tool (http://www.mith2.umd.edu/mithresearch/?id=19) that offers a web-based interface for adding XML markup to text and other digital formats.

While having the seminar associated with the Digital Diasporas conference was useful in broadening the audience, it did not necessarily bring in participants with a specific interest in text encoding; the discussion at this event was somewhat less energetic and focused than in some of our other events, which reminds us that the “digital humanities” field is in fact composed of quite heterogeneous interests and disciplinary concerns. However, the survey feedback was positive, and the pace of the discussion may also have been influenced by the large size of the room, which to some extent worked against the immediacy of the conversation.

University of Washington

Duration: 2 days
Venue: Simpson Center for the Humanities
Hosts: Kathleen Woodward and Miriam Bartha
Participants: 24 (University of Washington)
Schedule: http://www.wwp.northeastern.edu/encoding/seminars/uw/

This seminar was hosted by Kathleen Woodward and Miriam Bartha at the Simpson Center for the Humanities, which, though not a digital humanities center, takes a strong interest in digital humanities issues and topics. There were 24 participants, among the largest groups we met, with faculty and graduate students from a very wide range of disciplines, as well as a few people from the library; all participants were from the University of Washington. The discussion was exceptionally lively, which we might attribute substantially to the intellectual engagement of the group, aided by an excellent seminar space (around a table rather than in rows) and the provision of advance readings.

In addition to our usual presentations and discussion, this event included a morning showcase of local digital projects, featuring five extremely interesting research projects ranging from digital library infrastructure to the representation of Near Eastern inscriptions. These presentations added a very useful dimension to the seminar, both by providing people with local real-world examples (which help to ground the subsequent discussion) and by giving a sense of the variety of uses to which scholarly digital information can be put. These projects also enabled the group to think more concretely about challenges of project design, work flow, and funding.

Miami University of Ohio

Duration: 3 days
Venue: NINES workshop
Host: Laura Mandell
Participants: 35 (Indiana University, University of Rochester, Youngstown State University, University of Iowa, Hofstra University, Texas A&M, Virginia Commonwealth University, Northern Kentucky University, Queen’s University [Canada], Dalhousie University [Canada], University of Illinois, University of Birmingham [UK], St. Francis Xavier University [Canada], University of Nebraska-Lincoln, National University of Ireland Galway, Acadia University [Canada], Louisville University, Ohio State, Wichita State University, University of Virginia, Case Western Reserve University)
Schedule: http://www.wwp.northeastern.edu/encoding/seminars/miami/

This seminar was hosted by Laura Mandell and held in conjunction with a workshop funded by NINES (the Networked Infrastructure for Nineteenth-century Electronic Scholarship). Our seminar constituted the first three days of a week-long event, and gave participants a grounding in text encoding which then formed the foundation of subsequent discussions of digitization, collection management, metadata, and other NINES-related topics. At 35 participants, the seminar was the largest we offered. Because of the NINES workshop, the participants were a much more uniform group than usual in some ways, since almost all were working (or planning to work) on a digital scholarly project with a 19th-century focus. The level of interest was also exceptionally high, since all participants were strongly motivated to learn more about how to develop their projects. However, the range of skill levels was still fairly broad, with some participants having no experience with XML or the TEI. The range of questions was wide, and the level of discussion was exceptionally high.

Because the remainder of the NINES workshop was going to cover topics such as XML publication and metadata, we were able to give greater emphasis than usual to some more advanced topics, and in particular we experimented with a detailed presentation on the TEI customization process and the design of the TEI as an encoding language. This was the first time we had presented this material at this level of detail in a TEI seminar, although we do typically cover it in our longer (5-day) workshops.

In addition to our usual presentations and discussion, this event included a presentation of an experimental XML editing interface built on top of Microsoft Word 2008 (which uses a form of XML as its document storage format), which is being developed by computer science faculty at Miami University. This prompted an interesting discussion of the relative usefulness of being able to see and edit an XML document without seeing the tags, or of having a more direct awareness of the document’s structure, and the different audiences for whom this tool might be helpful.

Wheaton College

Duration: 3 days
Venue: Wheaton College Library
Host: Scott Hamlin
Participants: 19 (Wheaton College, Dartmouth College, Dickinson College, Willamette University, Rhodes College, Harvard University, Hamilton College, University of Puget Sound)
Schedule: http://www.wwp.northeastern.edu/encoding/seminars/wheaton/

This seminar was hosted by Scott Hamlin, Director of Technology for Research and Instruction at the Wheaton College library, with whom we had collaborated previously. Wheaton has had a long relationship with the TEI, and in fact this was the third TEI event in which we have participated at Wheaton: in 2004 and 2005 Scott organized NITLE-funded workshops on the TEI for humanities faculty from small liberal-arts colleges, and these events engendered several small-scale TEI projects which have flourished over the past few years. Several of the participants from these earlier events attended this seminar as a refresher, and in addition we saw several new participants from other NITLE schools. The participants included humanities faculty and library and IT staff, and one representative from NITLE (which co-sponsored the event). In addition to the usual mix of presentations and discussion, the event included a project showcase in which participants demonstrated projects under way.

In a followup message, Scott Hamlin has let us know that as a result of the seminar, he is collaborating with Dickinson College and Mount Holyoke College on an IMLS Leadership Planning Grant proposal. If awarded, this grant would fund a service that would help small liberal arts colleges publish their TEI documents online.

Northwestern University

Duration: 2 days
Venue: Northwestern University English Department
Host: Martin Mueller
Participants: 14 (Northwestern, University of Chicago, University of Illinois, Loyola University, Augustana College, Newberry Library)
Schedule: http://www.wwp.northeastern.edu/encoding/seminars/northwestern/

The seminar’s host, Martin Mueller, is a long-standing member of the digital humanities community and has been involved in several digital projects, most notably WordHoard (a resource for exploring the major works of English literature using digital tools) and MONK (http://www.monkproject.org), which focuses on the application of data mining and visualization techniques to large corpora of humanities texts. The participants included junior and senior faculty, graduate students, and librarians. Most had had some exposure to digital projects in the past, but several professed no prior knowledge whatever. For a few participants, motivation to attend the seminar came from incipient involvement in a digital project, which led them to seek greater expertise in this area.

Texas A&M University

Duration: 3 days
Venue: Texas A&M University English Department
Hosts: Maura Ives and Amy Earhart
Participants: 11 (TAMU, Texas Tech)
Schedule: http://www.wwp.northeastern.edu/encoding/seminars/tamu/

The final seminar in our series was in certain respects the most successful, and certainly the most dramatic from a logistical viewpoint. First scheduled for September 2008, the seminar had to be cancelled at the last minute because of Hurricane Ike, and was rescheduled for April 2009. The seminar was hosted by Maura Ives and Amy Earhart, both of whom have a strong interest in digital humanities. At the time of our second visit they had just completed a white paper proposing a new digital humanities center as part of a campus-wide strategic planning initiative, and the audience for the seminar drew on several digital projects at Texas A&M, notably the Digital Donne project (http://digitaldonne.tamu.edu/) arising from the Donne Variorum, and the World Shakespeare Bibliography. In all there were eleven participants, almost all from Texas A&M, representing a mix of graduate students, faculty, and IT staff. Most had had some exposure to digital projects but not to XML or TEI. The discussion in this event was uniformly excellent; two of the participants were technically quite advanced and had a background in programming, and they were able to ask questions which, though informed by technical interests, led to a discussion of broader issues which the entire group found illuminating but which would have been difficult for the other participants to raise. In addition, the preparation of the white paper had already begun a set of conversations about the role of digital humanities research at the university which helped localize and animate the seminar discussions.

Evaluation and Analysis

Summary of Survey Results

To evaluate the seminars, we created an online survey (using the Vovici survey-building site, http://www.vovici.com) which we asked all participants to complete following each seminar. The response rate varied from event to event (probably depending partly on how much reinforcement participants received from the local organizer); the total response rate was 34% (74 responses from a total of 220 participants) which was lower than we had hoped. Responses were drawn from across the participant population, however, with faculty, students, and staff responding in roughly the same proportion as their attendance overall. We felt it was fair to assume in analyzing the survey results that responses came disproportionately from those who felt most positively about the seminar and were thus motivated to respond, so the results must be read with an admixture of caution. Nonetheless we felt the feedback was encouraging (and it was confirmed by the informal comments we received during the seminars) and suggested that the seminar program had been a success. Each faculty member or graduate student who took away some greater interest in concepts of scholarly text representation and the use of digital markup is in a position to communicate that interest to many others through teaching and other interactions; library and IT staff are similarly able to point colleagues and collaborators in the direction of TEI if appropriate. The most important effects of the series may thus operate over the long term.

To get an idea of where the comments were coming from, we asked respondents to identify their level of familiarity with digital technology generally, and with text encoding in particular. (We did not specify “TEI encoding,” so familiarity with text encoding could include experience with EAD, metadata standards, or potentially even HTML.) Surprisingly, most respondents indicated that they were ‘somewhat familiar’, ‘very familiar’, or ‘expert’ with digital technology (87% of all respondents), and even more surprisingly 63% indicated that they were “somewhat” or “very” familiar with text encoding—which seems to suggest that the seminars were drawing on an audience that was already interested in digital humanities but were less successful in bringing in those who were not already engaged in some way. The survey results mirrored our own informal observations in the seminars: there was usually a wide range of expertise in each event. The most frequent type of digital expert was a collaborator from computer science or digital support staff member; for these participants, the most important aspect of the seminar was often the exposure to scholarly approaches and functional requirements. Those most frequently identifying themselves as unfamiliar with digital technology were the senior faculty and graduate students. The full breakdown of expertise levels follows (see Figures 1 and 2).

[Figure 1: chart showing participants' familiarity with digital technology]

[Figure 2: chart showing participants' familiarity with text encoding]

As already suggested above, the overall feedback we received from the survey was very positive. Respondents indicated that the seminar was the right length and pace, and that it was pitched at the right level of technical detail; they also felt that the seminar was relevant to their work and would have an impact on their future scholarship (see Figures 3–7 for a summary of responses).

[Figure 3: chart showing participants' perception of the seminar's length]
[Figure 4: chart showing participants' perception of the seminar's pace]
[Figure 5: chart showing participants' perception of the seminar's level of detail]
[Figure 6: chart showing participants' perception of the seminar's relevance to their research]
[Figure 7: chart showing participants' perception of the seminar's impact on their research]

We also asked questions about the content of the seminar: which aspects of the seminar were most and least useful, and what additional topics should have been included. The responses to “what topics or sessions did you find most useful or interesting?” were quite varied. Many respondents mentioned the opportunity for hands-on practice, particularly in combination with the discussion of theoretical topics, as the “most useful or interesting” part (23 responses, 32% of the total). As one participant put it,

I found the ‘hands-on’ sessions to be the most useful, but they would’ve been useless if we didn’t have enough knowledge to do the work. What I’m driving at here is that the ‘hands-on’ portions were very well placed and interspersed throughout the workshop.

Other responses (sometimes by the same respondents) highlighted other areas that were felt to be useful:

  • examples and analysis of existing projects
  • TEI customization
  • discussion of project management, planning, and strategy
  • theoretical discussion of text encoding and digital representation
  • historical and conceptual overview of text encoding standards

Addressing the question “what was the most useful concept or information you took away from the seminar?”, respondents tended to answer in general rather than specific terms. The following responses were typical:

  • ‘The importance and usefulness of TEI’
  • ‘The logic of how TEI works.’
  • ‘The multiple uses of TEI encoding, that make the investment of time worth-while, and the resources identified to turn to for aid.’
  • ‘The relationship between the mechanics of text encoding and how databases are structured, searchable—the "what’s under the hood" aspect of TEI/databases, project design considerations, etc.’

These kinds of answers suggest that the seminar did succeed at its most basic and important goal of giving participants an overall appreciation of the importance of markup of this kind for humanities scholarship, and a sense of the informational complexity that underlies high-quality digital resources. Some particularly telling responses also suggested that participants understood the critical role humanities scholars need to play in developing these resources:

  • ‘that I need a schema and that I need to validate and that I need to take more ownership of the computing matters instead of continuing to rely on knowledgeable software colleagues’
  • ‘encoding as a disciplinary practice’
  • ‘That TEI really makes you think long and hard about how aesthetic works are put together and about how best to access those constitutional blocks.’
  • ‘The confidence that I can master this difficult procedure and enrich my work and the publication of that work through the internet.’
  • ‘It was exhilarating to see how consequences of decisions about encoding structures are immediately visible to practitioners in all their hermeneutic ramifications. It is exciting to envision working with a living group of like-minded scholars who get the point, rather than having all my mark-up interactions be limited to the W3C validators. I’m psyched.’

The most typical response to “what topics or sessions did you find least useful?” was some variant on “everything was useful” (17 responses, 23% of the total). 34 respondents did find some topics less useful or successful. In some cases this was because the information was too advanced or complex: for instance, the presentation of TEI customization, XSLT, or XML publication tools. In other cases, it was because the information seemed too basic, or the respondent was already familiar with the material: for instance, several respondents who indicated that they were “very familiar” with digital technology found the XML background unnecessary. These are critiques which arise from specific individual needs; what is too technical for some participants will be too basic for others, and we tried to address these concerns by covering a range of topics (technical and introductory) in each seminar. Because in general the feedback was positive (most of the respondents commenting in this area also indicated that the level of technical detail overall was “just right”) in the final analysis we felt that this class of concerns reflected the natural diversity of the audience and would be difficult to eliminate.

Some other comments did point to areas we could improve. A few respondents reported that some of the information was too abstract or could have benefited from more examples, particularly if the same examples could be used through the course of the seminar in the manner of a case study. Several suggested that providing readings in advance would be valuable. Others suggested that the hands-on practice (particularly the first practice session) should be more structured with a more specific task. These suggestions all resonate with our own observations and we will consider how to implement them in future workshops we offer.

A few other observations of interest:

  • People enjoyed learning from others; one respondent commented (in the context of space logistics) that “the best times were when we were all together and could hear what other people were coming up against and trying out”, and this idea was echoed in other responses as well.
  • For some respondents, what seemed most valuable was simply the realization that they could undertake a digital project on their own; as one person put it, “[the most useful concept I took away was] the confidence that I can master this difficult procedure and enrich my work and the publication of that work through the internet.”
  • Respondents also found value in what one of them termed the “big picture”, particularly when anchored in practical detail; several comments are revealing here: “I found the discussion around the theory of TEI and the ‘big picture’ of where scholarship in the humanities is heading, incredible useful. Our opening discussions really drew me in and sparked my interest in TEI, when I came with fresh eyes, entirely new to the program. This theory talk, coupled with the technical details teaching, and the hands-on work, all went very well together and overall was effective.” “It was a great mix of theoretical explanations and practical coding.” “I saw TEI not purely as a technical means, but as an intellectual tool to further scholarship in a way that makes the material mutable and interactive - a great thing!”

Seminar Logistics and Ergonomics

Although our focus in planning and teaching these seminars has been chiefly on the text encoding content, during the course of the series we have also made some observations concerning the peculiar ergonomics and logistics of teaching this kind of seminar—a hybrid of presentation, discussion, and hands-on practice which requires a variety of different kinds of interaction.

Certainly the technical logistics are important. To enable a group of ten or twenty people to practice text encoding, the basic technical prerequisites are:

  • a classroom with networked computers and data projection equipment
  • installed on each computer, a standards-compliant browser such as Firefox, and an XML editor (we used <oXygen/>)
  • access to some form of data storage (e.g. networked server for local participants; a USB key or other removable media for others) to enable participants to save their work
  • network access for visitors who choose to use their own laptops

These facilities were readily available at nearly all of the institutions we visited. In one case the group was split between two classrooms, and in another there was no computer classroom readily available. But for the most part, wired (or wireless) classrooms seem to be available for humanities use, and are staffed in a way that permits new software to be installed. The technical support at all the institutions was excellent.

But although the greatest logistical challenge in conducting seminars like these would at first appear to be on the technical side, in fact our experience suggests that other factors play a greater role in the success of such events. Because of the diversity of the audiences (including librarians, junior and senior faculty, students, technical staff, and others) it was important at each event to create an atmosphere where everyone could feel comfortable asking questions, expressing ignorance, and contributing a variety of expertise. The ergonomics of space turned out to have a significant impact on this level of comfort: the most successful seminars were those where the discussion took place around a central table, rather than with chairs or computers placed in rows. Similarly, very large rooms where participants were seated at a distance from one another (or with a great deal of space overhead) tended to diminish the vibrancy of discussion. These effects are not unexpected but their significance in this case was perhaps magnified, since a “seminar on text encoding” is not a genre of pedagogical encounter that participants would find familiar; we could not count on normal habits and expectations to provide a basis for interaction. When participants felt (because of proximity and sitting face to face) that they were all in this together, they seemed more inclined to take conversational risks, respond to each other’s observations, and generally work harder at the social encounter. When they were more isolated from one another, they seemed more inclined to leave the work to the instructors, and to treat the event as one of passive delivery rather than participation.

Assistance to Participant Projects

One of the goals of the seminar series was to enable participants to start or further their own text encoding projects. To support these efforts we offered participants ongoing advice and support upon request through the end of the grant period. In our initial proposal we anticipated that participants might need support of the following kinds:

  • Advice on developing grant proposals
  • Advice and assistance in developing TEI schemas and templates for use in their project
  • Assistance in developing stylesheets to transform TEI data into other formats for publication
  • Answers to ongoing encoding questions

In the event, we did receive requests for support of all of these types, though not to the degree we had expected and not immediately following the seminars. We had expected that participants might use the seminar as an opportunity to start work on a project that would sustain a certain urgency following the seminar, requiring followup and ongoing support. Instead, it appears that for many participants who had projects in mind, the actual project planning and development was driven (or held back) by external factors such as their own available time, or the availability of student labor to work on the project, or the opportunity to seek local funding. Even if support from us was available, that support was not enough to permit them to maintain momentum. This suggests that for many faculty, at least, development of a digital project (precisely because it is in so many ways a collaborative venture or at least one that requires coordination of many strands of support) requires not only support but a certain kind of infrastructure: one which can ensure the supply of resources, labor, and project organization in such a way that these are all brought to bear when needed. It may be worth noting in this context that several participants from these seminars have applied to attend one of the WWP’s NEH-funded series of advanced institutes in 2009-2011.

Below are brief notes on the participants to whom we provided some support following the seminar.

Ronan Crowley: Ronan is a graduate student in English at the University at Buffalo, working on James Joyce’s Ulysses. WWP staff provided informal followup advice.

Catherine Goebel and Michelle Richmond: Catherine Goebel is a professor of Art History at Augustana College, working on a set of press-cutting scrapbooks compiled by James McNeill Whistler. She attended our Northwestern University workshop and then urged her collaborator, Michelle Richmond (a visual arts librarian), to attend another workshop of ours (not funded under this grant). Together they are planning a digital project focusing on these press cuttings. Following the seminar, WWP staff assisted Goebel and Richmond in a variety of ways:

  • developing encoding templates to help ensure consistency and make it easier to train their student encoders
  • converting metadata from a raw input form into TEI headers
  • providing preliminary feedback and advice on draft grant proposals

Kent Hooper: A professor in the department of Languages at the University of Puget Sound, working on a bibliography of works by Ernst Barlach. WWP staff assisted him in converting his bibliography to TEI.

Susan Cole: A professor of Classics at the University at Buffalo; WWP staff assisted her in consolidating data sources for her project and doing some file format conversions necessary to getting these materials into a TEI format.

Maureen Jameson: A professor of Romance Languages and Literatures at the University at Buffalo, working on a project to develop a digital collection of texts and translations for use in language teaching. WWP staff provided informal followup advice to Jameson as she taught herself XSLT and made contacts within the TEI community.

John Bryant: A professor of English at Hofstra University, working on a digital edition of Herman Melville’s Typee and also the larger Melville Electronic Library (which received an NEH Digital Humanities Startup Grant in 2008). WWP staff provided followup advice on encoding questions and also provided feedback and advice on draft grant proposals. Bryant and a collaborator will also be attending the WWP’s upcoming advanced text encoding institute at the University of Maryland, January 2010.

Kathryn Tomasek: A professor of History at Wheaton College, working in collaboration with Zephorene Stickney of the college archives on a collection of materials on the Wheaton family and the founding of the college. WWP staff provided feedback and advice on draft grant proposals; Tomasek also attended a subsequent TEI workshop taught by WWP staff (at the University of Victoria, June 2009) with an undergraduate student who is working with her on the project.

English Broadside Ballad Archive: This project, located at the University of California, Santa Barbara, is directed by Patricia Fumerton and involves a number of graduate students in English at UCSB. This group attended our UCSB seminar; while in Santa Barbara, WWP staff met with the project team and provided some detailed consulting and advice on project strategy, technical directions, and text encoding. Members of this group will also be attending the WWP’s upcoming advanced institute being held at UCSB in September 2009.

Amelia Wong: A graduate student in American Studies at the University of Maryland; WWP staff provided informal followup advice.

Richard Wisneski: A metadata librarian at Case Western Reserve University, working on a variety of digitization and encoding projects. WWP staff provided informal followup advice.

Emma Moreton: A graduate student in corpus linguistics at the University of Birmingham (UK), working on a corpus of 19th-century American female slave narratives. WWP staff provided informal followup advice.

Appendix: Participants’ Institutions

  • Acadia University (Canada)
  • Augustana College
  • Birkbeck College, London University (UK)
  • Case Western Reserve University
  • California State University, Northridge
  • Dalhousie University (Canada)
  • Dartmouth College
  • Dickinson College
  • Gale Group
  • Getty Museum
  • Hamilton College
  • Harvard University
  • Hofstra University
  • Illinois Institute of Technology
  • Indiana University
  • Louisville University
  • Loyola University
  • Miami University of Ohio
  • Michigan State University
  • Mount Holyoke College
  • National Gallery of Art
  • Newberry Library
  • NITLE
  • Northern Illinois State University
  • Northern Kentucky University
  • Northwestern University
  • New York University
  • Ohio State University
  • Queen’s University (Canada)
  • Rhodes College
  • Shippensburg University
  • St. Francis Xavier University (Canada)
  • Stanford University
  • Texas A&M University
  • Texas Tech
  • University at Buffalo
  • University of Bergen
  • University of Birmingham (UK)
  • University of California, Berkeley
  • University of California, Irvine
  • University of California, Los Angeles
  • University of California, Santa Barbara
  • University of Chicago
  • University of Houston
  • University of Illinois, Chicago
  • University of Illinois, Urbana-Champaign
  • University of Iowa
  • University of Ireland, Galway
  • University of Maryland
  • University of Nebraska-Lincoln
  • University of Puget Sound
  • University of Rochester
  • University of Southern California
  • University of Virginia
  • University of Washington
  • Virginia Commonwealth University
  • Washington University in St. Louis
  • Western Michigan University
  • Wheaton College
  • Wichita State University
  • Willamette University
  • Youngstown State University

Notes

[1] An XML file is said to be "valid" when it conforms to the rules expressed in a given schema. In order to be valid, a typical TEI file must include certain basic metadata elements (title, publication statement, description of its source) and some simple body elements.
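
As a rough illustration of the idea in this note, the following Python sketch checks whether a TEI file contains the basic metadata and body elements mentioned above. It is only a presence check written for this report, not real schema validation (which would be done with a schema and a validating XML editor or parser). The list of required paths, the sample document, and the function name missing_elements are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Paths, relative to the root <TEI> element, for the basic elements the note
# mentions. This short list is an assumption for illustration only; a real
# schema expresses many more rules than element presence.
REQUIRED_PATHS = [
    "teiHeader/fileDesc/titleStmt/title",  # title
    "teiHeader/fileDesc/publicationStmt",  # publication statement
    "teiHeader/fileDesc/sourceDesc",       # description of the source
    "text/body",                           # simple body
]

# A tiny, hypothetical TEI document used only for the demonstration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc>
    <titleStmt><title>Sample</title></titleStmt>
    <publicationStmt><p>Unpublished.</p></publicationStmt>
    <sourceDesc><p>Born digital.</p></sourceDesc>
  </fileDesc></teiHeader>
  <text><body><p>Hello.</p></body></text>
</TEI>"""


def missing_elements(xml_text):
    """Return the required paths that are absent from the document."""
    root = ET.fromstring(xml_text)
    missing = []
    for path in REQUIRED_PATHS:
        # Prefix every step with the TEI namespace before searching.
        qualified = "/".join("tei:" + step for step in path.split("/"))
        if root.find(qualified, NS) is None:
            missing.append(path)
    return missing


if __name__ == "__main__":
    print(missing_elements(SAMPLE))
```

A file that passes this check is not necessarily valid; the sketch only shows the kind of rule that a schema makes checkable by machine.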

[2] We include in the category of “non-commercial” any activity by a non-profit organization or by an individual teacher, whether paid or not.

[3] The schema is the formal declaration of the rules governing the encoding language in question: the tags that are permitted and the ways they may interact. A schema, for instance, would constrain whether a list may appear inside a paragraph (e.g. whether a <list> element may be enclosed within a <p> element).
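
The kind of constraint described in this note can be sketched in a few lines of Python. The content model below is a toy written for illustration: it is not the TEI schema, the element names are used without namespaces, and the names PERMISSIVE, STRICT, and violations are hypothetical. It shows only how a set of rules determines whether a <list> may appear inside a <p>.

```python
import xml.etree.ElementTree as ET

# Toy content models: each element name maps to the child elements it may
# contain. Both models are assumptions for illustration, far smaller than
# any real schema.
PERMISSIVE = {
    "body": {"p", "list"},
    "p": {"list"},       # this model allows <list> inside <p>
    "list": {"item"},
    "item": set(),
}
STRICT = dict(PERMISSIVE, p=set())  # this model forbids any child of <p>


def violations(xml_text, allowed):
    """Return (parent, child) pairs that the content model does not permit."""
    root = ET.fromstring(xml_text)
    problems = []
    for parent in root.iter():
        for child in parent:
            if child.tag not in allowed.get(parent.tag, set()):
                problems.append((parent.tag, child.tag))
    return problems


DOC = "<body><p>Some text <list><item>one</item></list></p></body>"

if __name__ == "__main__":
    print(violations(DOC, PERMISSIVE))
    print(violations(DOC, STRICT))
```

The same document is acceptable under one set of rules and not under the other, which is why the choice of schema is an editorial decision and not only a technical one.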

[3] More specifically, the TEI schema must be customized to be used; every TEI schema is a customization, in the sense that users must choose which parts of the TEI language will be included in the schema they use. Some pre-built TEI schemas are available (for instance, from the TEI site, or bundled with some XML editing software) but these are simply customizations built in advance.

[4] See http://www.wwp.brown.edu/encoding/current/handouts/TEI_exercise.zip.