Methods

Starting with Women Writers Online

The “Intertextual Networks” research project was built on the existing Women Writers Online (WWO) corpus, a collection of works by women originally published between 1526 and 1850. These works have been transcribed and encoded in XML, following the guidelines of the Text Encoding Initiative (TEI) and our own editorial principles.

When we began the project, the WWO texts already included encoding for titles, quotes, and citations. WWO texts did not, however, include markup pointing to the specific works being referenced.

Compiling the bibliography

To identify the referenced works in WWO, we extracted the “intertextual gestures”—references to, or marked engagement with, another work—into actionable reports. Each report included as much information as the existing markup was able to provide, as well as XPaths that we could use to get back to the gestures as they appeared in the original WWO files.

We structured our research into phases, intended to leverage the information gleaned from existing encoding. The phases were:

1. We identified works referenced by title elements when they occurred inside citations (bibl). If the citation also included an author’s name, that information was included in the spreadsheet report.
2. Working from a spreadsheet, we identified works referenced by titles that fell outside citations and occurred more than once in WWO.
3. We identified references to the “singleton titles,” those titles that appeared only once in the textbase. Because these were more likely to be unusual cases, the report had a web interface which showed each title in its narrative context.
4. We identified quotes when they occurred inside epigraphs, or when they had a corresponding note. As in the previous phase, our work was aided by a web interface.
5. We identified biblical citations marked by the WWO element regMe. As in phases 1 and 2, using a spreadsheet allowed us to reduce repetition by condensing multiple instances into distinct strings of text.
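
The condensing step used in the spreadsheet phases can be sketched roughly as follows. The project's own reports were generated with XQuery; this Python version is only an illustration, and the normalization rule (collapsing whitespace and lowercasing), the function names, and the sample strings are all assumptions rather than the project's actual settings.

```python
import re
from collections import Counter


def normalize(text):
    """Collapse runs of whitespace and lowercase the string (an assumed rule)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def condense(gestures):
    """Reduce many gesture strings to (distinct string, count) rows."""
    counts = Counter(normalize(g) for g in gestures)
    return counts.most_common()


# Hypothetical gesture strings, as they might be pulled from title elements.
gestures = [
    "Paradise Lost",
    "paradise  lost",
    "Psalm 23",
    "Paradise Lost\n",
    "Psalm 23",
    "The Seasons",
]

rows = condense(gestures)
# Titles seen more than once go to one report, singletons to another.
repeated = [text for text, count in rows if count > 1]
singletons = [text for text, count in rows if count == 1]

for text, count in rows:
    print(count, text)
```

Each distinct string then needs to be researched only once, however many times it occurs in the textbase.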

For each phase, we used existing tools to transform WWO documents into regularized text, then used XQuery scripts to generate reports on the texts’ gestures. We went through each report, researched the text of the gestures to find matching works, and gathered information about the earliest edition we could find. We held regular meetings to discuss unusual or difficult cases, suggest enhancements to our workflows, and share notable examples from the textbase. As we worked, our team categorized works by genre and topic. At the same time, we expanded our list of genres and topics from simple terms such as “poetry” and “drama” to more complex or historically nuanced categories such as “natural philosophy” and “chorography”.

While we had envisioned creating an XML bibliography programmatically after the first phase, our team quickly discovered that bibliographic information is often too complex to fit into the columns of a spreadsheet (or, later, into HTML form fields). In such cases, we created entries directly in the TEI bibliography file, and marked them off in the report. The spreadsheet and HTML reports were still used to gather standard bibliographic information such as title, author name, and publication details. After a phase was complete, the report entries were programmatically converted into TEI and inserted into the bibliography file.
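
As a rough illustration of converting report entries into TEI, the sketch below turns rows of a CSV report into simple bibl elements. It is not the project's conversion script: the column names (id, title, author, date), the sample rows, and the shape of the resulting entry are hypothetical, and real entries would often need richer structure than this.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(name):
    """Return a namespace-qualified TEI element name."""
    return f"{{{TEI_NS}}}{name}"


def row_to_bibl(row):
    """Build a minimal <bibl> from one report row (column names are assumed)."""
    bibl = ET.Element(tei("bibl"), {XML_ID: row["id"]})
    ET.SubElement(bibl, tei("title")).text = row["title"]
    if row.get("author"):
        ET.SubElement(bibl, tei("author")).text = row["author"]
    if row.get("date"):
        date = ET.SubElement(bibl, tei("date"), {"when": row["date"]})
        date.text = row["date"]
    return bibl


# Hypothetical report contents; a real report would be read from a file.
report = io.StringIO(
    "id,title,author,date\n"
    "b001,Paradise Lost,John Milton,1667\n"
    "b002,The Seasons,,1730\n"
)
entries = [row_to_bibl(row) for row in csv.DictReader(report)]

for entry in entries:
    print(ET.tostring(entry, encoding="unicode"))
```

Rows with empty cells simply omit the corresponding element, which is one reason complex cases were easier to write by hand in the bibliography file.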

Enhancing WWO

We used the reports to gather bibliographic information and to associate each entry with an identifier. After each report was complete, we used it to propagate the identifiers for the referenced works back into the markup of the intertextual gestures in WWO. As a result, most, but not all, of the gestures in WWO had tags to mark them out, as well as pointers to relevant bibliography entries.
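
Propagating identifiers back into the markup might look something like the following sketch. It assumes that gestures are matched to bibliography entries by their normalized text, that the pointer lives in a ref attribute, and that the bibliography is addressed as bibliography.xml; all three are illustrative assumptions, as are the function names and the sample document.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def normalize(text):
    """Collapse whitespace and lowercase, so the lookup key matches the report."""
    return " ".join(text.split()).lower()


def propagate(root, ids_by_title, prefix="bibliography.xml#"):
    """Add a pointer to every <title> whose text matches a researched entry.

    Returns the number of titles that were newly linked. Titles that
    already carry a pointer are left untouched.
    """
    linked = 0
    for title in root.iter(f"{{{TEI_NS}}}title"):
        if title.get("ref"):
            continue
        key = normalize("".join(title.itertext()))
        if key in ids_by_title:
            title.set("ref", prefix + ids_by_title[key])
            linked += 1
    return linked


# Hypothetical fragment of an encoded text.
doc = ET.fromstring(
    '<text xmlns="http://www.tei-c.org/ns/1.0">'
    "<p>She had read <title>Paradise  Lost</title> and "
    "<title>An Unknown Tract</title>.</p>"
    "</text>"
)
linked = propagate(doc, {"paradise lost": "b001"})
print(linked, "title(s) linked")
```

Gestures with no match in the lookup table stay unlinked, which corresponds to the gestures left over for later manual work.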

By this point, our team had substantial experience with interpreting WWO encoding, researching extant works, creating bibliography entries in TEI, and using XPath to identify features of markup. Rather than creating another spreadsheet or HTML report on the remaining gestures, each team member worked directly in the WWO documents and the XML bibliography, using XPath to find tagged gestures that did not yet have pointers to the bibliography and then identifying the works they referenced. The remaining gestures were primarily the rest of the quotes, as well as bibl elements that did not include titles.
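
Finding tagged gestures that lack a pointer amounts to an XPath along the lines of //quote[not(@ref)], with the attribute name here being an assumption. The sketch below does the same job in Python for illustration. Because the standard library's ElementTree supports only a subset of XPath (it has no not() function), the filtering is done in Python instead; the element list and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
GESTURES = ("title", "quote", "bibl")  # assumed set of gesture elements


def local_name(element):
    """Strip the namespace from an element's tag."""
    return element.tag.split("}")[1]


def unlinked(root, attr="ref"):
    """Return gesture elements, in document order, that have no pointer."""
    wanted = {f"{{{TEI_NS}}}{name}" for name in GESTURES}
    return [el for el in root.iter() if el.tag in wanted and not el.get(attr)]


# Hypothetical fragment: one linked title, one unlinked quote and bibl.
doc = ET.fromstring(
    """<text xmlns="http://www.tei-c.org/ns/1.0">
  <p><title ref="bibliography.xml#b001">Paradise Lost</title></p>
  <p><quote>Of man's first disobedience</quote></p>
  <p><bibl>Milton, Book I</bibl></p>
</text>"""
)
todo = unlinked(doc)

for el in todo:
    print(local_name(el), "".join(el.itertext()))
```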

We also assigned identifiers to the authors and contributors named in the bibliography, ensuring that works by the same person would be linked. As part of this work, we began categorizing authors by perceived gender identity, using names, pronouns, honorifics, and other titles to make a decision. These classifications are essential not only for making women’s contributions visible, but also for allowing researchers to examine the ways WWO authors engaged with works by contributors of different genders. Still, it is important to note that we used modern sensibilities and understandings of gender when we made choices on how to classify historical people. The concepts of “sex” and “gender” are deeply personal, and they shift over time. As such, one should treat our categorization of historical individuals with caution.