Methods
Starting with Women Writers Online
The “Intertextual Networks” research project was built on the existing Women Writers Online (WWO) corpus, a collection of works by women originally published between 1526 and 1850. These works have been transcribed and encoded in XML, following the guidelines of the Text Encoding Initiative (TEI) and our own editorial principles.
When we began the project, the WWO texts already included encoding for titles, quotes, and citations. WWO texts did not, however, include markup pointing to the specific works being referenced.
Compiling the bibliography
To identify the referenced works in WWO, we extracted the “intertextual gestures”—references to, or marked engagement with, another work—into actionable reports. Each report included as much information as the existing markup was able to provide, as well as XPaths that we could use to get back to the gestures as they appeared in the original WWO files.
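As an illustration of what such a report might contain, here is a minimal sketch in Python rather than the XQuery the project actually used. The sample document, the set of gesture elements, and the row fields (`xpath`, `element`, `text`) are assumptions made for this example; real WWO files are namespaced and far more richly encoded.

```python
import xml.etree.ElementTree as ET

# Toy input (hypothetical); not an actual WWO file.
SAMPLE = """<text><body>
<p>She read <title>Paradise Lost</title> and <quote>Of Man's first disobedience</quote>.</p>
<p>See <bibl><author>Milton</author>, <title>Comus</title></bibl>.</p>
</body></text>"""

# Assumed set of elements that count as intertextual gestures.
GESTURES = {"title", "quote", "bibl"}


def walk(elem, path):
    """Yield (xpath, element) pairs in document order, with positional predicates."""
    yield path, elem
    counts = {}
    for child in elem:
        counts[child.tag] = counts.get(child.tag, 0) + 1
        yield from walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")


def gesture_report(xml_string):
    """Return one report row per gesture: where it is, what it is, and its text."""
    root = ET.fromstring(xml_string)
    rows = []
    for xpath, elem in walk(root, f"/{root.tag}"):
        if elem.tag in GESTURES:
            # Collapse whitespace so the text is easier to read and compare.
            text = " ".join("".join(elem.itertext()).split())
            rows.append({"xpath": xpath, "element": elem.tag, "text": text})
    return rows


report = gesture_report(SAMPLE)
for row in report:
    print(row)
```

Each row keeps a path back to the gesture, so a researcher can return to the source file after identifying the referenced work.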
We structured our research into phases, intended to leverage the information gleaned from existing encoding. The phases were:
- We identified works referenced by title elements, when they occurred inside citations (bibl). If the citation also included an author’s name, that information was included in the spreadsheet report.
- We identified works in a spreadsheet of title elements (not in citations) which occurred more than once in WWO.
- We identified references to the “singleton titles”: those title elements which were unique in the textbase. Because these were likely to be more singular cases, the report had a web interface which showed each title in its narrative context.
- We identified quote elements when they occurred inside epigraphs, or when they had a corresponding note. As in the previous phase, our work was aided by a web interface.
- We identified biblical citations marked by the WWO element regMe. As in phases 1 and 2, using a spreadsheet allowed us to reduce repetition by condensing multiple instances into distinct strings of text.
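The split between repeated titles and singleton titles, and the condensing of multiple instances into distinct strings, can be sketched roughly as follows. This is only an illustration: the `normalize` function (whitespace collapsing plus case folding) is an assumed stand-in for the project's own text regularization, and the sample titles are invented.

```python
from collections import Counter


def normalize(title):
    """Assumed regularization: collapse whitespace and ignore case."""
    return " ".join(title.split()).casefold()


def split_titles(titles):
    """Condense titles into distinct strings, then separate repeats from singletons."""
    counts = Counter(normalize(t) for t in titles)
    repeated = {t: n for t, n in counts.items() if n > 1}
    singletons = sorted(t for t, n in counts.items() if n == 1)
    return repeated, singletons


# Invented sample data standing in for title elements pulled from the textbase.
titles = ["Paradise Lost", "paradise  lost", "Comus", "The Spectator", "The Spectator"]
repeated, singletons = split_titles(titles)
print(repeated)
print(singletons)
```

Condensing first means each distinct title only needs to be researched once, however many times it occurs.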
For each phase, we used existing tools to transform WWO documents into regularized text, then used XQuery scripts to generate reports on the texts’ gestures. We went through each report, researched the text of the gestures to find matching works, and gathered information about the earliest edition we could find. We held regular meetings to discuss unusual or difficult cases, suggest enhancements to our workflows, and share notable examples from the textbase. As we worked, our team categorized works by genre and topic. At the same time, we expanded our list of genres and topics from simple terms such as “poetry” and “drama” to more complex or historically nuanced categories such as “natural philosophy” and “chorography”.
While we had envisioned creating an XML bibliography programmatically after the first phase, our team quickly discovered that bibliographic information is often too complex to be captured by columns in a spreadsheet (or, later, by HTML form fields). In such cases, we created entries directly in the TEI bibliography file, and marked them off in the report. The spreadsheet and HTML reports were still used to gather standard bibliographic information such as title, author name, and publication details. After a phase was complete, the report entries were programmatically converted into TEI and inserted into the bibliography file.
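A conversion of this kind might look something like the sketch below, written in Python for illustration. The column names (`id`, `title`, `author`, `date`), the sample rows, and the shape of the output entries are all assumptions; the project's real entries are richer, and TEI namespaces are omitted here for brevity.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as xml:id.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical report export; columns and values are invented for this sketch.
REPORT_CSV = """id,title,author,date
milton.paradise,Paradise Lost,John Milton,1667
anon.ballad,An Old Ballad,,
"""


def row_to_bibl(row):
    """Build a simple bibl entry, skipping fields the researcher left blank."""
    bibl = ET.Element("bibl", {XML_ID: row["id"]})
    ET.SubElement(bibl, "title").text = row["title"]
    if row["author"]:
        ET.SubElement(bibl, "author").text = row["author"]
    if row["date"]:
        ET.SubElement(bibl, "date", {"when": row["date"]}).text = row["date"]
    return bibl


def report_to_list_bibl(csv_text):
    """Convert every report row into a bibl inside a listBibl wrapper."""
    list_bibl = ET.Element("listBibl")
    for row in csv.DictReader(io.StringIO(csv_text)):
        list_bibl.append(row_to_bibl(row))
    return list_bibl


list_bibl = report_to_list_bibl(REPORT_CSV)
print(ET.tostring(list_bibl, encoding="unicode"))
```

A flat row maps cleanly onto an entry like this only when the work has one title, one author, and one date, which is why more complex cases were entered by hand.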
Enhancing WWO
The reports were used to gather bibliographic information, and associate each entry with an identifier. After each report was complete, we used it to propagate the identifiers for the referenced works back into the markup of the intertextual gestures in WWO. As a result, most—but not all—of the gestures in WWO had tags to mark them out, as well as pointers to relevant bibliography entries.
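The propagation step can be pictured with a small Python sketch. It assumes each report row stores a path to the gesture and the identifier of the matching bibliography entry; the `ref` attribute name, the `#` prefix on the pointer, and the sample data are hypothetical and may not match WWO's actual encoding.

```python
import xml.etree.ElementTree as ET

# Toy document (hypothetical); the third title was never matched to a work.
SAMPLE = """<body>
<p><title>Paradise Lost</title> is praised; <title>Paradise Lost</title> is quoted.</p>
<p><title>Unknown Tract</title></p>
</body>"""

# Hypothetical completed report: a path to each gesture plus the entry identifier.
REPORT = [
    {"xpath": "./p[1]/title[1]", "id": "milton.paradise"},
    {"xpath": "./p[1]/title[2]", "id": "milton.paradise"},
]


def propagate_ids(root, report, pointer="ref"):
    """Write a pointer onto each gesture named in the report; return how many were linked."""
    linked = 0
    for row in report:
        elem = root.find(row["xpath"])
        if elem is not None:
            elem.set(pointer, "#" + row["id"])  # assumed pointer syntax
            linked += 1
    return linked


root = ET.fromstring(SAMPLE)
linked = propagate_ids(root, REPORT)
print(linked)
print(ET.tostring(root, encoding="unicode"))
```

Gestures that never appeared in a completed report, like the third title here, are left without a pointer.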
By this point, our team had substantial experience with interpreting WWO encoding, researching extant works, creating bibliography entries in TEI, and using XPath to identify features of markup. Rather than creating another spreadsheet or HTML report on the remaining gestures, we worked directly in the WWO documents. Each team member worked exclusively in the WWO files and in the XML bibliography, using XPath to find tagged gestures that did not yet have pointers to the bibliography, and working to identify them. These primarily included the remainder of the quote elements, as well as bibl elements that did not include title elements.
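To give a sense of the kind of query involved, the sketch below finds gestures that lack a pointer. In XPath this could be written as something like `//quote[not(@ref)]`, but Python's standard `xml.etree.ElementTree` does not support `not()`, so the filter is done in plain Python here. The `ref` attribute name, its value, and the sample markup are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Toy document (hypothetical): one linked title, one unlinked quote, one unlinked bibl.
SAMPLE = """<body>
<p><title ref="#milton.paradise">Paradise Lost</title></p>
<p><quote>Hope springs eternal</quote></p>
<p><bibl>Mr. Pope, in his Essay</bibl></p>
</body>"""


def unlinked_gestures(root, tags=("title", "quote", "bibl"), pointer="ref"):
    """Return gesture elements, in document order, that have no pointer attribute."""
    return [e for e in root.iter() if e.tag in tags and e.get(pointer) is None]


root = ET.fromstring(SAMPLE)
remaining = unlinked_gestures(root)
for elem in remaining:
    print(elem.tag, "".join(elem.itertext()))
```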
We also assigned identifiers to the authors and contributors named in the bibliography, ensuring that works by the same person would be linked. As part of this work, we began categorizing authors by perceived gender identity, using names, pronouns, honorifics and other titles to make a decision. These classifications are essential not only for making women’s contributions visible, but for allowing researchers to drill down into the ways WWO authors engaged with works by contributors of different genders. Still, it is important to note that we used modern sensibilities and understandings of gender when we made choices on how to classify historical people. The concepts of “sex” and “gender” are deeply personal, and they shift over time. As such, one should treat our categorization of historical individuals with a grain of salt.