To the Right Honourable, Virtuous, Heroical Reader

This post was authored by Anna Kroon, University of New Haven class of 2019, who held an internship at the WWP during the summer of 2017. 

I came to the Women Writers Project really excited to work on such a large project with a wide variety of texts in their files. My experience was limited to Victorian shipboard newspapers, so anything not related to the ocean or intellectual boat humor was thrilling to me.

Since I had experience with XML and the TEI, but not with WWP encoding, I wanted to learn how to encode a short text that was vastly different from what I had transcribed and encoded before. To begin, I worked on Poems on Various Subjects by Elizabeth Sarah Gooch. This is a pessimistic set of poems that mostly deal with the author’s loss and sadness.

What interested me most about this text was at the very end and not even written by Gooch at all. The last poem in the collection was written to Gooch by a Mr. Anthony Pasquin, Esq. Being so new to the WWP encoding guidelines I had to ask “is there anything special I do with a poem not written by the author?” The answer was not what I had expected from such a detail-oriented project: no.

To the Author, with Love

“Non-authorial paratexts” (NAPTs) became my specific interest (and the bane of my existence). My dad would joke “para what? Pair-a-socks?” as I stumbled through explaining the encoding jargon and an oversimplified definition of my research.

Essentially these NAPTs are texts (poems, letters, and other short texts) that are published with a larger text and are written by a person who is not the author of that text. As a general rule NAPTs are written to or about the author of the main text praising her abilities, virtuousness, or life. I had many hypotheses about the purpose, authors, and significance of these texts, but I had to pare them down to match the scale of a summer internship.

I started off looking to see if Pasquin’s poem was not in fact a singular phenomenon. I took to the digital stacks of already published and in-progress texts in Women Writers Online to see what I could find. Using XPath, I tried and erred my way through many possibilities of where and what these non-authorial paratexts could be. As of writing this post, I have found over 120 unique non-authorial paratexts from 30 different larger texts.

The Process of Finding NAPTs with XPath

Full disclosure: I had never used XPath before this summer. That, combined with my beginner’s knowledge of the WWP encoding, made my XPath searching very slow to start. I used some logic and knowledge of general publication formatting to create my first few queries.

Texts are very broadly split into three main sections: front matter, main body, and back matter. Based on my initial assessment, I determined that the author generally writes everything in the main body without much contribution by others. So it wouldn’t be very fruitful to look there. Front matter comes first and often holds dedications and other prefatory materials. Even though the case that sent me to this research was in the back matter I thought it prudent to start with what was in front of me.

Within the front matter, a <div> (textual division) can have several values for the @type attribute, including “advert,” “contents,” “ded” (dedication), “frontispiece,” “prefatory,” or “prologue.” The two values for @type that seemed the most useful were “ded” and “prefatory.” For my first search, I used “prefatory” because it is the broader category and would give me more results, which I could then go on to refine.

I didn’t want to cast my net too wide on my first search, so I also decided to specify that I was looking for poetry. The XPath I used was: //front//div[@type="prefatory"]//div[@type="poem"]
essentially saying, “Look in the front matter, then look in <div>s with a @type value of ‘prefatory,’ and then look for <div>s with a @type value of ‘poem.’” This search gave me 12 poems, 4 of which were NAPTs. One text I found in this fashion was Katherine Philips’ Poems (1664), which had 7 poems written mostly by men praising Philips.
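The query above can be tried on a toy document. What follows is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which supports only a subset of XPath and expects a leading "." on the path. The sample XML is invented for illustration and has no namespace; it is not WWP data, and real project files would need namespace handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, invented for illustration; not a real WWP file.
SAMPLE = """
<text>
  <front>
    <div type="prefatory">
      <div type="poem"><head>To the Author</head><lg><l>Praise in verse</l></lg></div>
      <div type="poem"><head>The Author to her Book</head><lg><l>Go, little book</l></lg></div>
    </div>
    <div type="ded"><head>To the Queen</head><p>Madam,</p></div>
  </front>
  <body>
    <div type="poem"><head>Ode</head><lg><l>Main text</l></lg></div>
  </body>
</text>
"""

root = ET.fromstring(SAMPLE)

# ElementTree paths are relative to an element, hence the leading "."
poems = root.findall(".//front//div[@type='prefatory']//div[@type='poem']")
heads = [poem.findtext("head") for poem in poems]
print(heads)
```

Only the two poems nested inside the prefatory division are returned; the poem in the main body and the dedication are skipped.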

While I went through my first set of results I noticed some of the line groups had a @type of “para.” In the hazy fog of researching NAPTs I assumed that this was an irregularly used value for paratexts. I added this on to my original search creating: //front//div[@type="prefatory"]//div[@type="poem"]//lg[@type="para"]
I was dismayed to find that I was looking at my previous search results except broken down by line group rather than any sort of new material. Still determined that “para” stood for paratext I tried deleting the type of ‘prefatory’ from my search. This gave me more fruitful results. The Poetical Works of the late Mrs. Mary Robinson had an astounding 19 paratexts written by various important men, including the Duke of Leeds.
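The first disappointment described above can be reproduced in miniature. This is a sketch on invented sample data, again assuming Python's xml.etree.ElementTree; it shows that appending //lg[@type="para"] returns line groups from the poems already found rather than any new texts.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, invented for illustration; not a real WWP file.
SAMPLE = """
<text>
  <front>
    <div type="prefatory">
      <div type="poem">
        <head>To the Author</head>
        <lg type="para"><l>First verse paragraph</l></lg>
        <lg type="para"><l>Second verse paragraph</l></lg>
      </div>
    </div>
  </front>
</text>
"""

root = ET.fromstring(SAMPLE)

POEM_PATH = ".//front//div[@type='prefatory']//div[@type='poem']"
poems = root.findall(POEM_PATH)
# Same query with one more step: the hits are pieces of the same poem.
line_groups = root.findall(POEM_PATH + "//lg[@type='para']")

print(len(poems), "poem;", len(line_groups), "line groups inside it")
```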

This success made me sweat a little: maybe what I was researching had already been done and didn’t need me endlessly XPath searching. A quick look through the WWP encoding guidelines assuaged my fears: the value “para” actually stood for verse paragraphs. After this I went back to the model of my first query. Since “ded” was the other front matter @type value that seemed worthwhile, I simply exchanged “poem” for “ded” and searched: //front//div[@type="prefatory"]//div[@type="ded"]. I had not yet realized that these two values describe essentially the same level of textual division and that dedications would not necessarily be nested within more general prefatory textual divisions. My results helped me to see the issue with my search: there was only one resulting dedication, written by the author to a princess.

From these missteps I decided to go with a simple query: //front//div[@type="ded"]. This action was driven by frustration, but also observation. Most of the non-authorial paratexts I found were poems or letters written to or about the author. In other words, dedicated to the author. There were 143 results, many of which were actually authorial paratexts (that is, paratexts written by the author herself). Even so, from this single search I found 77 unique non-authorial paratexts. However, many of these were not the highlighted passages found by XPath. I scrolled up and down around the search result to see if there were potentially other NAPTs in that specific document. Of the 77 I found with the [@type="ded"] search, 52 were found through scrolling; in other words, about 68% were not a direct result of the XPath query.
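The difference between the nested query and the simple one can be seen on a small example. This is a sketch on invented sample data, assuming Python's xml.etree.ElementTree; the counts it produces describe only the toy document, not the WWP textbase.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, invented for illustration; not a real WWP file.
SAMPLE = """
<text>
  <front>
    <div type="ded"><head>To the Princess</head></div>
    <div type="prefatory">
      <head>Preface</head>
      <div type="ded"><head>To the Reader</head></div>
    </div>
    <div type="ded"><head>To Mrs. Author, on her Poems</head></div>
  </front>
</text>
"""

root = ET.fromstring(SAMPLE)

# Dedications that happen to sit inside a prefatory division.
nested = root.findall(".//front//div[@type='prefatory']//div[@type='ded']")
# Every dedication anywhere in the front matter.
simple = root.findall(".//front//div[@type='ded']")

print(len(nested), "nested;", len(simple), "in total")
```

Because “ded” and “prefatory” usually label sibling divisions, the nested query misses the dedications that sit directly under the front matter.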

As the simple “ded” search was very successful, I thought another simple search would be a good course of action. Searching //front//div[@type="prefatory"] yielded 432 results. While I knew that there would be many non-authorial paratexts hidden under a lot of other odd prefatory material, this was such a large grouping that it would take a single person with a time constraint far too long to parse through. To trim this list down, I looked at prefatory material that contained <signed> elements inside of <closer>s.

When scrolling through texts I quickly learned that a closing byline or signature was more common than an opening one beneath the heading. There were some rare cases where the authorial attribution was baked into the title, but as a general rule NAPTs in the WWP textbase have signatures as authorial attributions, so looking for dedications that contained <signed> elements gave me another way to refine my results. I performed several other searches, including reviews of the back matter, until I had a pool of nearly 130 unique non-authorial paratexts, which seemed like a comfortable amount to make a case for tagging these texts and a significant enough sample size to analyze.
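Narrowing prefatory material to divisions that carry a closing signature might look like the sketch below. It assumes Python's xml.etree.ElementTree and invented sample data. Because ElementTree cannot put a path inside a predicate, the filter is done in Python; in a full XPath processor the same idea could be written as //front//div[@type="prefatory"][.//closer/signed], which is an assumed form and not necessarily the query used for this research.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, invented for illustration; not a real WWP file.
SAMPLE = """
<text>
  <front>
    <div type="prefatory">
      <head>To the Author</head>
      <lg><l>Praise in verse</l></lg>
      <closer><signed>J. K.</signed></closer>
    </div>
    <div type="prefatory">
      <head>Advertisement to the Reader</head>
      <p>Unsigned note.</p>
    </div>
    <div type="prefatory">
      <head>On Reading her Poems</head>
      <lg><l>More praise</l></lg>
      <closer><signed>A Friend</signed></closer>
    </div>
  </front>
</text>
"""

root = ET.fromstring(SAMPLE)

prefatory = root.findall(".//front//div[@type='prefatory']")
# Keep only divisions that contain a <signed> inside a <closer>.
signed_divs = [d for d in prefatory if d.find(".//closer/signed") is not None]
signatures = [d.findtext(".//closer/signed") for d in signed_divs]

print(len(prefatory), "prefatory divisions;", len(signed_divs), "signed:", signatures)
```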

Features of a Non-Authorial Paratext

My lengthy process of trial and error was at times frustrating, but it did make me acutely aware of what was and what wasn’t a non-authorial paratext. The first thing I noticed was the signature or byline. When XPath pulled up a potential file, I scrolled to the very end and looked for a line that began with <signed>. If there was a signature, I would check to make sure that the author attributed was not the author herself. At times, this meant looking to the publication information or the personographic data to make sure the initials were not hers. If there were any doubts about authorship (especially for texts written by various or unknown authors) I would not catalog it.

If there was no signature line I would check the <head>, or a <head> with a @type of “sub,” for a byline. This was the less common occurrence, which is why I would look at it second even though it comes before the ending signature. I observed this phenomenon most often when the author of the non-authorial paratext was someone of importance such as a duke or a lord.
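The order of checks described above (closing signature first, heading byline second, skip when in doubt) can be sketched as a small function. This assumes Python's xml.etree.ElementTree and invented sample data. The helper name find_attribution, the MAIN_AUTHOR value, and the use of <persName> inside <head> as the byline marker are all assumptions for illustration, and the plain string comparison stands in for the manual check against publication and personography data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, invented for illustration; not a real WWP file.
SAMPLE = """
<front>
  <div type="ded">
    <head>To the Author</head>
    <lg><l>Praise in verse</l></lg>
    <closer><signed>J. Kroon</signed></closer>
  </div>
  <div type="ded">
    <head>To the Author, by <persName>Lord Example</persName></head>
    <lg><l>More praise</l></lg>
  </div>
  <div type="ded">
    <head>To the Reader</head>
    <p>A note from the author herself.</p>
    <closer><signed>A. Kroon</signed></closer>
  </div>
  <div type="ded"><head>To the Public</head><p>No attribution at all.</p></div>
</front>
"""

MAIN_AUTHOR = "A. Kroon"  # hypothetical author of the larger text


def find_attribution(div):
    """Return the closing signature if present, else a name in a heading, else None."""
    signed = div.find(".//closer/signed")
    if signed is not None:
        return "".join(signed.itertext()).strip()
    for head in div.iter("head"):
        name = head.find(".//persName")
        if name is not None:
            return "".join(name.itertext()).strip()
    return None


root = ET.fromstring(SAMPLE)
attributions = [find_attribution(d) for d in root.findall(".//div[@type='ded']")]
# Unattributed texts and texts signed by the main author are not cataloged.
candidates = [a for a in attributions if a is not None and a != MAIN_AUTHOR]
print(candidates)
```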

Another feature I touched upon in my searching was the titles of the paratexts. I focused on the subset of non-authorial paratexts that were written to the author praising her abilities. This is shown in the titles with prepositions like “to,” “on,” or “upon.” The most common first word of the title was, unsurprisingly, “to” with 71 instances or 57%. I looked at the other first words and considered a word significant if there were more than 5 occurrences. I decided to put “Impromptu” and “Sonnet” together since they are both declarations of the poem’s type and neither was significant on its own. The significant identifying words broke down thusly:

“To”: 57%
Less than 5 occurrences, various: 22%
“On/Upon”: 11%
Untitled: 6%
“Impromptu/Sonnet”: 4%
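A tally like the one above can be produced with a few lines of code. This is a minimal sketch in Python; the list of titles is invented for illustration and the resulting numbers do not reproduce the figures from the cataloged paratexts. The grouping of “On” with “Upon” and “Impromptu” with “Sonnet” follows the description above.

```python
from collections import Counter

# Hypothetical titles, invented for illustration.
titles = [
    "To the Author",
    "To Mrs. Author, on her Poems",
    "Upon the Author's Verses",
    "On Reading her Poems",
    "Sonnet",
    "Impromptu",
    "",
    "To the Ingenious Author",
    "Verses Addressed to the Author",
]

# First words that are counted together under one label.
GROUPS = {
    "on": "On/Upon",
    "upon": "On/Upon",
    "impromptu": "Impromptu/Sonnet",
    "sonnet": "Impromptu/Sonnet",
}


def first_word_label(title):
    """Label a title by its first word, or as Untitled when it is empty."""
    words = title.split()
    if not words:
        return "Untitled"
    first = words[0].strip(".,;:!").lower()
    return GROUPS.get(first, first.capitalize())


counts = Counter(first_word_label(t) for t in titles)
total = len(titles)
for label, n in counts.most_common():
    print(f"{label}: {n} ({n / total:.0%})")
```

With a real catalog, labels below the chosen threshold (more than 5 occurrences in the analysis above) could then be merged into a single “various” row.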

The keyword contents of the title and the author signature were the two most important features for identifying an NAPT. The only other useful tool was context. There were one or two occurrences where the paratext was not written to the author, but the content of the poem or letter made it clear that it wasn’t written by the author herself.

Marking Up Non-Authorial Paratexts

Since I clearly proved the existence and abundance of NAPTs in the WWP textbase, the next step was to discuss markup to make the non-authorial content distinct from the authorial content. I discussed this topic with Sarah Connell and Ashley Clark as well as some encoders working on the project. We came to the conclusion that this was not going to be decided soon, but created several potential methods of encoding.

The first was simple and logical, but would be more difficult to implement. This meant creating an attribute that would be used on a <div> with a set of values that indicated the paratext author’s assumed gender. For example, an @author attribute with values of “male,” “female,” “collective,” “unknown,” and “nonbinary.” Therefore a poem written by the author’s brother would be marked as:
<div author="male"> <head>To my beautiful sister by <persName> Joseph Kroon </persName></head> [insert poem here]</div>.

The second method followed the logic of the first, but was more practical. Rather than create a new attribute we would use @resp which indicates “the nature of a person’s intellectual responsibility, or an organization’s role in the production or distribution of a work” according to the Text Encoding Initiative’s P5 Guidelines. Instead of values that indicated the NAPT author’s gender we would create general personography entries for @resp to point to, expanding on the existing entry for “unknown author” to include unknown male, female, collective, and nonbinary authors. For example, that same poem by the author’s brother would now be encoded as:
<div resp="p:umale.agv"><head>To my beautiful sister by <persName> Joseph Kroon </persName></head> [insert poem here] </div>.

This method would also provide more specificity for paratexts where the NAPT author already has an entry in the personography:
<div resp="p:jkroon.doe"><head>To my beautiful sister by <persName> Joseph Kroon </persName></head> [insert poem here] </div>.

As of this writing, no decision has been made on the method of markup. Thinking theoretically, if we had decided upon the markup, my next step would be to implement it in a special test folder on the non-authorial paratexts I cataloged. From there we could see how the new markup interacts with the existing markup to ensure that it doesn’t cause any issues. We could also use this marked-up data to perform simpler searches for NAPTs. We would even have the ability to easily create reports with bibliographic data for analysis.
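To illustrate how much simpler searching could become, here is a sketch that assumes the second proposed method (@resp on the <div>) had been adopted, which it has not. It uses Python's xml.etree.ElementTree on invented sample data; the @resp values are the example values from the proposal above, not real personography entries confirmed for these texts.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample using the proposed (not adopted) @resp markup.
SAMPLE = """
<text>
  <front>
    <div type="ded" resp="p:jkroon.doe"><head>To my beautiful sister</head></div>
    <div type="ded"><head>To the Reader</head></div>
    <div type="prefatory" resp="p:umale.agv"><head>On her Poems</head></div>
  </front>
  <back>
    <div type="poem" resp="p:umale.agv"><head>To the Author</head></div>
  </back>
</text>
"""

root = ET.fromstring(SAMPLE)

# One query finds every marked paratext, in front or back matter alike.
napts = root.findall(".//div[@resp]")
by_resp = Counter(d.get("resp") for d in napts)

print(len(napts), "marked paratexts")
print(dict(by_resp))
```

A count grouped by @resp value is one small example of the kind of report such markup would make easy to produce.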

Analysis of the Non-Authorial Paratexts

After cataloging and marking up the paratexts, my final goal was to research a few hypotheses about the authorship of NAPTs. I brainstormed many different hypotheses that I wanted to look into, but settled on three.
1. The gender composition of paratext authorship is mostly male.
2. Most paratexts are written by people of distinction.
3. Texts with a significant number (more than 5) of NAPTs are published posthumously.

The authorship genders broke down as: 5% female, 36% male, and 59% gender unknown, thus showing my hypothesis was plausible. The considerable number of gender unknown authors comes from the fact that authors would sign their paratext with only their initials or a gender non-specific abbreviation. I did not have the resources or the time to search for mystery authors with only initials and the main author’s bibliographic data.

I defined distinction very broadly as anyone who had a title of any sort. This could be anyone from a king to a judge considered “honourable.” Of the 126 cataloged non-authorial paratexts, 24 (or 19%) were written by someone whom I defined as distinguished. The distinguished titles broke down as follows:
Duke: 4%
Marquis: 4%
Earl: 8%
Lord: 4%
Reverend/Doctor/Reverend Doctor: 13%
Sir: 4%
Honourable: 4%
Esquire: 55%
All this to say that my hypothesis was proven false. However, any of the unknown authors who signed with initials could be a titled person of distinction.

My final hypothesis was going to be researched with the use of the WWP’s counting XQuery and the experimental paratext markup. Since we were not able to come to a decision on this topic, I was not able to mark up the paratexts or have an automatic routine parse through lots of data. Curating author death dates and text publication dates for nearly 100 different texts would have been far too time-consuming. However, my text with the most non-authorial paratexts was The Poetical Works of the Late Mrs. Mary Robinson with a stunning 19 non-authorial paratexts. From the title and the contents of the individual non-authorial paratexts it is apparent that this volume was published posthumously and Mrs. Robinson was a well-known and well-loved author.

This project is still ongoing. Once a decision is made on how to tag the non-authorial paratexts and they have been tagged, there are many other topics to research. We could easily identify how many NAPTs there are per text or the genre that has the most NAPTs. With more connections to the WWP’s bibliographic information we could see if any of the NAPT authors are other female authors in the database. We could also do a full analysis of publication versus death date to see if texts published posthumously do in fact have more non-authorial paratexts than texts published during the author’s lifetime.

WWP Practicum Series

We’re delighted to announce that the WWP will be offering a new practicum series during the 2017–2018 academic year. In this series, we’ll be holding two-hour workshops focused on particular skills and tools. Each session will be held from 10am to 12pm in the Digital Scholarship Commons in Snell Library. In the fall, we will be offering:

  • October 4: File Management For Digital Humanities Researchers. This session will cover essential strategies and design considerations for organizing files and research data for the long term, including basics of using the command line to see under the hood of your hard drive.
  • November 8: Using Oxygen Like An Expert: projects, frameworks, and scenarios. This session will cover advanced topics in the use of the Oxygen XML editor, including setting up projects, frameworks, transformation and validation scenarios, and version control client plugins. Everything you’ve always wanted to know about Oxygen (but were afraid to ask…).
  • December 6: Efficient Code-Writing in RStudio. This session will share a “cheat sheet” developed to cover the most common usage scenarios for typical digital humanities data. We’ll also discuss how to write R code that can be efficiently embedded in workflows to handle repetitive tasks.

In the spring semester, we’ll look at further topics including basic and advanced XPath, regular expressions, Schematron, and XQuery.

These sessions are free and open to the public, but guests from outside the NU community should email Sarah Connell (sa.connell[at]northeastern[dot]edu) to arrange library access.

We hope to see you there!