‘To the most distant Parts’: Reading and writing about the world in The Female Spectator

This post is part of a series authored by our collaborators on the Intertextual Networks project. For more information, see here. 

By Samuel Diener, Ph.D. Candidate in English, Harvard University

In the November 1744 issue of her periodical The Female Spectator, the novelist and essayist Eliza Haywood writes:

What Clods of Earth should we have been but for Reading? —How ignorant of every thing but the Spot we tread upon? —Books are the Channel through which all useful Arts and Sciences are conveyed: —By the Help of Books we sit at Ease, and travel to the most distant Parts; behold the Customs and Manners of all the different Nations in the habitable Globe, nay take a View of Heaven itself, and traverse all the Wonders of the Skies.1

Haywood’s exclamation is an admonition to her female readers to cultivate knowledge of history, ethnography, geography, cosmography, and the art of navigation. But it is also an injunction to employ the social technology of the book to travel all over the globe. For Haywood, books offer access to the frontiers of empire. They are a ticket to the contact zone, one that enables the reader to behold the “Customs and Manners” of the national other.

Haywood suggests that her readers owe it to the mariners who bring back the luxuries of empire to journey with them vicariously: “a Sense of Gratitude, methinks, should influence us to interest ourselves in the Safety and Welfare of the gallant Sailors, . . . commiserate their Sufferings, and rejoice in their Escapes.”2 In the midst of a moment of crisis for the British empire, when its future success was the subject of anxiety, Haywood here advises her readers to confirm the notion of empire and fill a specific gendered role in the imperial project: vicarious participation. But she also suggests that women owe it to themselves to cultivate their knowledge of the globe precisely in order to contest the constraints of that gendered role in the course of interactions with men, reading “to the End they may be enabled to make an agreeable Part in Conversation [and] be qualified to judge for themselves.”3

But did Haywood herself (and other British women of the early modern period) actually engage in this kind of readerly practice? And how did they view their role in the empire’s expansion? The Women Writers Online corpus presents a potentially valuable way to approach these questions. It is coextensive with the rise of British imperialism, including many moments when the imperial project was in a precarious position, and contains texts that engage topically with the extra-European world. Since each place-name reference in the corpus is tagged with the TEI/XML element <placeName>, it is possible to map these references. As part of the Intertextual Networks Project, I will be using the <placeName> tags to explore the extent to which the women writers in the corpus engage topically with the imperial margins. Then, by examining the context of individual references (or clusters of references), I will be able to make conjectures about the networks of information in which these women were embedded, the sources they employ (like news or narratives of travel), and the uses they make of their material. As a result, I envision my project as a two-stage, mixed-method study: first tracking references at the macro level, and then following up with careful interpretation and analysis.

Computational Analysis

The first obstacle to working with the corpus at a macro level is simply accessing the data. Thankfully, there are multiple resources available for this kind of work. Having taken an excellent XSLT workshop with Northeastern University’s Syd Bauman and Julia Flanders this January, I’d recommend the language to other users of the WWO corpus; it’s straightforward, intuitive, and specifically designed for processing XML data. There is also an existing set of useful resources produced by the WWO team, including Ashley Clark’s “Counting Robot”, which is available here.

However, since I was eager to begin work and lacked any experience with XSLT at the time I began the project, I conferred with some friends who have significant coding experience and they helped me design a simple counting robot in Python that performs the same function. It extracts the contents of the <placeName> tags to a large tab-delimited table, converts special characters (like the medial S), and eliminates alternate punctuations to obtain reference totals for each work (see Figure 1). Because I am specifically interested in mapping topical engagement in the texts, I chose to exclude frontmatter and backmatter, focusing only on the body of the text itself. (I don’t mean to imply that that material doesn’t contain valuable data, but only that its significance for the questions I wanted to ask seemed harder to predict. Future versions of the project may include this data.) We then created a second data table, which lists all the unique place names and their combined totals across the texts. In all, there were 6,091 unique place-names in the corpus as it stood at the time I began my project. Each place-name was also assigned a unique 4-digit ID based on its frequency-rank.

Figure 1. Example selection from the initial dataset, with columns for author, short version of title, publication date, most common punctuation of the place-name, and count. The sixth column lists all variant punctuations and spellings, so that individual references can be traced.
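The extraction step described above can be sketched in a few lines of Python. This is a minimal illustration and not the script used for the project: the function names, the punctuation rules, and the column order are assumptions, and element names are matched without their namespace so that the sketch does not depend on how the WWO files declare it.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter

SKIPPED_SECTIONS = {"teiHeader", "front", "back"}  # metadata, frontmatter, backmatter


def local(tag):
    """Drop any XML namespace so the sketch works whatever namespace the files use."""
    return tag.rsplit("}", 1)[-1]


def normalize(name):
    """Modernize the long s, collapse whitespace, and trim stray punctuation."""
    name = name.replace("\u017f", "s")
    name = re.sub(r"\s+", " ", name).strip()
    return name.strip(".,;:!?()[]'\"")


def collect(element, in_body, counts):
    """Walk the tree, counting <placeName> contents found inside <body> only."""
    name = local(element.tag)
    if name in SKIPPED_SECTIONS:
        return
    in_body = in_body or name == "body"
    if name == "placeName" and in_body:
        place = normalize("".join(element.itertext()))
        if place:
            counts[place] += 1
    for child in element:
        collect(child, in_body, counts)


def count_place_names(xml_text):
    """Return a Counter of normalized place names for one encoded text."""
    counts = Counter()
    collect(ET.fromstring(xml_text), False, counts)
    return counts


def to_tsv(counts, author, title, year):
    """Render one tab-delimited row per place name, most frequent first."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for place, n in counts.most_common():
        writer.writerow([author, title, year, place, n])
    return out.getvalue()
```

Running `count_place_names` on each file and concatenating the `to_tsv` output would give a per-work table of the kind shown in Figure 1. Tracking the variant spellings in a sixth column, as the real dataset does, is left out here for brevity.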

Together, these two datasets form a rudimentary relational database that will let me use functions in R (my language of choice for data-analysis) both to find patterns in place-name usage over time in the corpus at large and to map the topical engagement of individual texts. Figures 2-4 show the kind of broad-brush analysis that such data makes possible. They map the shape of the data for the entire corpus. A striking dynamic emerges: a collection of just a few locations, often around the metropole (England, France, London), are referenced an enormous number of times, but the distribution curve falls off very quickly into a very long tail. Of the 6,091 unique names, only 487 are mentioned more than ten times.

Figure 2. Bar plot of place names in the WWO corpus, sorted by number of total references.
Figure 3. Histogram of frequencies. The x axis is the number of references; the height of each bar represents the number of place-names that are mentioned at that frequency. Thus the first bar shows the number of places mentioned just once.
Figure 4. Frequency histogram, omitting place names mentioned just once.
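The long-tail shape summarized in these figures comes down to a "frequency of frequencies" count over the table of unique place names. The sketch below shows that calculation in Python rather than R. The function names are assumptions, and the toy totals are invented for illustration; they are not WWO figures.

```python
from collections import Counter


def frequency_of_frequencies(totals):
    """Map each reference count to the number of place names mentioned that many times."""
    return Counter(totals.values())


def count_above(totals, threshold):
    """Number of place names mentioned more than `threshold` times."""
    return sum(1 for n in totals.values() if n > threshold)


# Invented toy totals, only to show the shape of the calculation.
toy_totals = {"London": 40, "France": 25, "Rome": 12, "Bath": 3, "Sumatra": 1, "Canada": 1}

hist = frequency_of_frequencies(toy_totals)
print(hist[1])                      # names mentioned exactly once
print(count_above(toy_totals, 10))  # names mentioned more than ten times
```

Applied to the real table, `hist[1]` would correspond to the first bar of Figure 3, and `count_above(totals, 10)` to the count of places mentioned more than ten times.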

Unfortunately, as Figure 1 illustrates, there are significant problems with this data. A glance at the text will show, for example, that the different names in the sixth column of lines 732 and 741 refer to the same place. To correct such issues, I am going through the entire second data table, editing the IDs so that alternate spellings of the same place-name are assigned the same unique ID. I will also have to look up archaic place-names to identify their geographical referent and to make distinctions between real-world places and “heaven,” “topsy-turvy,” “Abraham’s bosom,” and other fictional, mythical, or non-terrestrial locations. Finally, in order to map the geographical distribution of these places, I will have to retrieve latitude/longitude coordinates for each place (using the “ggmap” package available for R) and check them by hand.

Carrying out this labor-intensive process for every name is simply beyond the realm of possibility for a busy PhD student like myself. (I can do about 15-20 place names in an hour.) However, 3,524 of the place-names appear only once in my dataset. Trimming off this “long tail” will still give me valuable, if somewhat simplified, data, as shown in Figure 4. And a diversity test of the data, like the one shown in Figure 5, shows that nonce place names are fairly evenly distributed across the corpus. Getting rid of them only excludes a few texts, which mostly prove to have had just a small number of place-name references. (Examining these texts to see what generic or other conventions predict such less spatially localized writing might prove fascinating matter for another project.) So far, I have only worked my way through about 700 of the 2,567 place names that occur more than once in the database, so it will be quite a while before I can begin to do analysis at the aggregate level.

Figure 5. Shannon diversity plot of authors in the corpus, showing their place-name diversity (threshold >0) and how it is affected by excluding place names that occur in the corpus just once (threshold > 1), twice (threshold > 2), three times (threshold >3), etc. Authors with only the “>0” bar use no place names that appear more than once in the corpus, and thus will no longer be represented in the dataset if nonce place names are eliminated.
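For readers unfamiliar with the measure, the Shannon diversity index for one author is H = -Σ p_i ln p_i, where p_i is the share of that author's place-name references that go to place i. The sketch below shows one way the thresholded version in Figure 5 could be computed. It is an illustration under assumptions and not the code behind the figure: the row format, the function names, and the use of the natural logarithm are all choices made here.

```python
import math
from collections import Counter


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over a collection of positive counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)


def diversity_by_author(rows, threshold=0):
    """rows: a list of (author, place, count) tuples.

    Keeps only places whose corpus-wide total exceeds `threshold`, then
    computes each author's Shannon index over the places that remain.
    Authors left with no places are absent from the result.
    """
    corpus_totals = Counter()
    for _author, place, count in rows:
        corpus_totals[place] += count

    per_author = {}
    for author, place, count in rows:
        if corpus_totals[place] > threshold:
            per_author.setdefault(author, Counter())[place] += count

    return {author: shannon(c.values()) for author, c in per_author.items()}
```

Calling the function with `threshold=0` and then `threshold=1` mirrors the first two bars of the plot: an author who only uses place names that occur once in the corpus appears in the first result and drops out of the second.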

Spectator as Case Study

Since my project was inspired in part by the section of The Female Spectator that I mention above, I’ll return to that work as a test case to see what these methods can tell us about a text using the data I have so far. I’ve checked and obtained coordinates for the 192 unique place names mentioned in the four volumes of the periodical available in WWO. The distinct character of their distribution is immediately apparent, and it reveals—surprisingly, in light of the passages I quote above—a tightly localized focus. The text’s most-used place name by far (at 46) is “London,” which (by contrast) takes a distant third place in the corpus’ overall place-name distribution (see Figure 6). As Figure 7 shows, many of the other place-names mentioned in the periodical (including, for example, the street-addresses of its ostensible contributors) also cluster densely around the metropolitan area of London. Meanwhile, most of the foreign high-scorers in the corpus data set (Rome and America, for example) drop well down in The Female Spectator’s data (see Figure 6).

Figure 6. Top 20 most-referenced places in the WWO corpus (left) vs. top 20 most-referenced places in The Female Spectator (right).
Figure 7. The Female Spectator: Place-names in the vicinity of London.

I’d suggest that an explanation for this geographical localization is easily found in the structure of the work. The first periodical aimed at women authored by a woman in English, The Female Spectator was produced by Haywood in London between 1744 and 1746. It engages with debates about politics and domestic life that were topical for bourgeois and upper-class women in and around London in the period and takes the same form as many other famous periodicals of the century like The Tatler and The Spectator. It consists of one essay each month engaging with a particular topic, often including and responding to a letter ostensibly written by a reader from the same geographical area.

The periodical thus attempts to mirror formally, while also providing a medium for, a public sphere for 18th-century women living in its primary area of distribution in the environs of London. Comparing this map to the England/France map (Figure 8) and the world map (Figure 9) shows us how dramatically place-name references drop off as we go farther from the metropolitan center; for example, one occurrence of “Canada,” two of “America,” and three of “West Indies” are the only references to the Western hemisphere (unless you count two references to the Pacific and one to the South Sea).

Figure 8. Place-name distribution in Britain and France in The Female Spectator.
Figure 9. Global place-name distribution.

As Figure 9 shows, Haywood’s primary sustained engagement with the non-European, non-Mediterranean world seems to have been with the island of Sumatra in Indonesia, then the site of a small British colonial trading post called British Bencoolen. Most of these references come from a single section in the October 1745 issue of The Female Spectator, which tells the tale of a British crew shipwrecked on Sumatra. The story opens with a breakdown in Western technical prowess: the ship leaking badly, the crew deliberately runs it ashore, where it lodges fast between two rocks. To this breakdown is quickly added a reversal of the documentary gaze. The shipwrecked sailors are surrounded by indigenous locals, and kneel in surrender: “This made them withdraw their Bows . . . and draw round us in a Circle, staring as the Rabble of England would do on one of them, had we had them here in the odd Habits they wear there” (186). The inversion of roles upsets colonial hierarchies, reminding us that on the soil of another Empire—as we soon find out, the Empire of Summatra—the British seem as bizarre, and their clothes as garish, as indigenous people might seem to the British. The entire anecdote seems to be fictional: despite extensive searching, I have been able to find no corroborating sources. Haywood’s point in the tale, she states explicitly, is to contest the othering rhetoric of travel writers, who imply “that God had endued only the Europeans with reasonable Souls.”

The variety of travel-books Haywood mentions and summarizes for her readers—mainly in the July 1745 issue—suggests that she was reading voyage narratives with comprehensive deliberateness. She describes (among others) works by Aubry de la Mottraye (1674?-1743), Bernard de Montfaucon (1655-1741), William Dampier (1651-1715), Jean-Baptiste Du Halde (1674-1743), François Maximilian Misson (1650?-1722), Cornelis de Bruyn (1652-1726?), Jean-Baptiste Tavernier (1605-1689), and Jean Chardin (1643-1713). Her list concludes, “There are yet some other Books I would fain take upon me to recommend; but . . . I have been already too ample in my Detail.” It is thus particularly striking that in The Female Spectator itself, so far from enacting vicarious participation with the British imperial project, Haywood employs her mastery of the genre and the discourse of travel narrative to fabricate a fictional voyage of her own that calls into question the ideological assumptions of what was, at the time, a genre dominated almost entirely by men.

A New(ish) Approach to Markup in the Undergraduate Classroom

By Kevin G. Smith, Ph.D. Candidate in English, Northeastern University

Note: Kevin G. Smith is a pedagogical development consultant for the WWP. His dissertation research is partially supported by a grant from the NULab for Texts, Maps, and Networks.

A few summers ago, I spent my days working in Northeastern’s Digital Scholarship Commons. As is common in that space, there were nearly daily meetings of different teams of faculty, library personnel, and graduate students working on digital projects. One of these projects was The Early Caribbean Digital Archive (ECDA). During that summer the ECDA project team was working on customizing a TEI schema to encode their texts in ways that were more in line with their decolonial archival goals. As I procrastinated on my own work, I was overhearing these amazing conversations that the ECDA team was having about the meanings and applications of certain aspects of their TEI customization. How should they tag an embedded or mediated slave narrative, for example? What to do about unnamed slaves? And how might they handle commodities? What are the ethical ramifications of encoding a slave as a commodity (or not)?

As I sat, listening to these conversations, I began to realize that it was precisely because they were encoding the texts in TEI that these conversations were happening. The act of encoding literally inscribes texts with interpretation, forcing the project team to discuss just what kinds of interpretive judgments they wanted to make. And they were important conversations: about how we represent our objects of inquiry in the humanities, about the ethics of data representation. (By no means am I the first to realize this. For a compelling example, see Julia Flanders: “The Productive Unease of 21st-century Digital Scholarship.”)

The point is that I was struck by these conversations. And I began to think about how the tension of formalization, this “productive unease,” as Flanders terms it, might be leveraged in writing classrooms. Could I somehow use the TEI to intervene in students’ writing processes, to foster these kinds of conversations about their own writing? What would that even look like?

Two years later, in the summer of 2016, I taught my first markup-based writing course at Northeastern. In the intervening years my approach shifted from using the TEI to designing a built-from-scratch XML schema for each course. Thus far, I’ve taught two courses using this method (Advanced Writing for the Technical Professions in the summer of 2016 and First-year Writing in the fall of 2016). In addition to writing their assignments in XML (using Oxygen), students in these courses engage in a semester-long, collaborative writing project: the design and implementation of an XML schema that structurally and rhetorically models a range of genres of writing.

This approach—using XML to produce texts—represents a shift from the mimetic roots of XML and its primary use in humanities research, the TEI. In the rest of this post, I want to briefly discuss this shift and its implications for the study of markup.

Teaching with Markup

There are many wonderful examples of using the TEI and XML in classrooms. Kate Singer’s use of TEI for developing poetic vocabularies in an undergraduate class comes to mind, as does Trey Conatser’s use of XML in a first-year writing course at Ohio State. Though, at first blush, these two markup classrooms may appear very different (one being an upper-level literature course and the other a first-year writing course), the perceived pedagogical benefits of using markup are similar. Both pedagogues seek to foster close attention to the object of study (a poem or the student’s own writing) through what is essentially a process of annotation.

Where my approach to markup differs from these (and most traditional) classroom uses is in the thoroughly bottom-up, data-driven approach to schema design (Piez, 2001). Students begin with a (basically) bare schema and, iteratively and deliberately over the course of an entire semester, design and revise the schema for a range of writing tasks using document analysis and modeling, qualitative writing research methods, and their own experiences of authorship. The result is a shift from annotation to production, from product to process.

An example may be illuminative here. A group of students decide they would like to design a schema for movie reviews. They begin the process by researching the genre—gathering examples, examining related genres, tracing the circulation and uptake of the genre, interviewing experienced writers and readers of the genre, and so on. Based on this research, the group identifies the salient structural, rhetorical, and content-based components of the genre—a movie review includes a series of paragraphs, for example; the first of these paragraphs must, according to the students, include a component called “opinion,” which has a specific definition and different types. They name these components and write a prose pseudo-schema, including documentation, attributes, dependencies, and rules for the components. The pseudo-schema is translated into an XML schema using Relax NG (by me).

An element list from an in-class schema design session with students in the First-year Writing course of 2016.

Once the schema is drafted, each student writes an individual XML document, their own example of a movie review that responds to a unique rhetorical situation. Based on this experience, the group reconvenes to revise their schema. They might, for example, decide that the <opinion> element should be optional in the first paragraph, or decide that an additional attribute value should be added to the @type attribute, or choose to adjust the definition of the element itself. Once schema revision is complete, students revise their XML documents. And on it goes.

An example of XML markup designed for the course.
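To make the revision cycle concrete, here is a small Python sketch of how a change to one rule alters which documents pass. It is a stand-in written for this illustration and not the course's Relax NG schema: only `<opinion>` and `@type` come from the description above, while the `<review>` and `<paragraph>` element names, the allowed `@type` values, and the `check_review` function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute values; the students' actual list is not reproduced here.
ALLOWED_OPINION_TYPES = {"positive", "negative", "mixed"}


def check_review(xml_text, opinion_required=True):
    """Return a list of problems; an empty list means the document passes."""
    root = ET.fromstring(xml_text)
    paragraphs = root.findall("paragraph")
    if not paragraphs:
        return ["a review needs at least one <paragraph>"]

    problems = []
    # Rule under discussion: must the first paragraph contain an <opinion>?
    if opinion_required and paragraphs[0].find(".//opinion") is None:
        problems.append("first <paragraph> must contain an <opinion>")

    # Every <opinion>, wherever it appears, needs a recognized @type.
    for opinion in root.iter("opinion"):
        kind = opinion.get("type")
        if kind not in ALLOWED_OPINION_TYPES:
            problems.append(f"unexpected @type on <opinion>: {kind!r}")
    return problems


draft = (
    "<review>"
    "<paragraph>A summary of the plot.</paragraph>"
    '<paragraph><opinion type="mixed">Uneven but worth seeing.</opinion></paragraph>'
    "</review>"
)

print(check_review(draft))                          # strict rule: one problem reported
print(check_review(draft, opinion_required=False))  # revised rule: no problems
```

The same draft fails under the original rule and passes once the group decides that `<opinion>` is optional in the first paragraph. A real Relax NG schema would express the same decision declaratively and would be enforced by the XML editor.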

What I hope the above example illuminates is the thoroughly process-oriented approach to markup adopted in these classes. The schema is not static. It is a living document that affects and is affected by students’ experiences of composing, among other things. Neither are the student-authored XML documents static. They are repeatedly invalidated by revisions to the schema. They are subject to feedback from classmates and instructor. They must be continually revised. From a digital humanities perspective, this application of markup may seem alien. In fact, in some ways, it doesn’t even matter what the schema ends up looking like (though it can be fascinating). The object of using markup in this way is not to produce the perfect model of a genre. In fact, an understanding of genres as social actions, rather than a set of ossified textual features, is central to the theoretical framework of the course. This understanding resists the idea that genres can be accurately modeled. The point of using markup is to foster productive conversations about writing, to interrupt the normal thinking and writing processes of students in productive ways. This brings us back to the conversations I overheard in the summer of 2014, eavesdropping on the ECDA when I was supposed to be writing.

An example of a markup output document for display. The XML is transformed to HTML with custom XSLT and highlighted according to XML tags.

But this approach raises new questions. How do I know if this approach is productive in the ways that I hope? What kinds of conversations are students having in these classes? How does markup function rhetorically for students when used for authorship? Does writing in XML and designing schemas for authoring contribute to students’ understanding of their writing and reading processes? Do reading and writing practices in the markup classroom transfer to other contexts? These questions just so happen to be the basis for my dissertation research, which takes as its objects of inquiry the two markup-based writing courses.

Studying (Authorial) Markup

The questions posed above present unique methodological concerns for the study of markup. A shift from product to process raises practical questions concerning how we access students’ experience using markup in this way. How can I make claims about the rhetorical and expressive capacities of authorial markup? How can I understand the role of the schema, the markup, and the platform(s) in students’ writing, reading, and thinking processes? In short, how do I study this?

Here, a slight shift in thinking, from the digital humanities to writing studies, is helpful. While the pedagogical approach may be unconventional, my research questions are typical of writing studies research. Methods for studying student writing and experience in classroom settings are well established in the field. Although qualitative approaches to the study of markup are not typical in the digital humanities, the research questions for this project, based, as they are, on student experience, reflection, writing, and perception, necessitate the adaptation of innovative methods. To this end, I’ve employed a teacher research methodology (a systematic approach to data collection that honors the inside perspectives of teachers and students) that adapts qualitative research methods culled from ethnography, education, and writing studies research. Data for the study was gathered from direct participant observation, reflective journaling, semi-structured and directed qualitative interviews (three interviews each with nine case study students), and the collection of student writing (normal prose and XML, including version control logs for all XML files).

At this point, data collection has ended and the project is shifting to the data analysis phase. It is too soon to report results; however, early indications from student interviews point to some promising findings around student reflection and transfer, the multi-directional mediation of the schema, and students’ use of markup as a tool for generic invention and change. Here, it may be enough to assert that qualitative approaches to studying markup-based undergraduate courses may be fruitful. Indeed, digital humanities courses in general may benefit from adopting qualitative methodologies, like teacher research, to self-assess and to advocate for curricular change and institutional support.

The assignment discussed above is collected with the pilot set of teaching materials from the WWP’s pedagogical development consultants and is available here.

Humanities features an article on Mary Moody Emerson’s Almanacks

We are so delighted to share that an article on the Almanacks of Mary Moody Emerson is featured in the current issue of Humanities, the magazine of the NEH. “Mary Moody Emerson Was a Scholar, a Thinker, and an Inspiration” by Noelle A. Baker and Sandra Harbert Petrulionis, editors of The Almanacks of Mary Moody Emerson: A Scholarly Digital Edition, offers a portrait of the self-educated, undoubtedly brilliant Emerson.

Emerson’s Almanacks span over 50 years and extend to more than 1,000 pages. We’re partnering with Baker and Petrulionis to encode these Almanacks in TEI and publish them in Women Writers Online as a pilot for future manuscript publication in WWO. In December, we added a new folder to the collection, dated c. 23 July 1812–November 1813 and discussed in more detail here.

If the Humanities article has sparked your interest in this fascinating early-American, proto-Transcendentalist woman, you might also want to read “Mary Moody Emerson as Reader and Reviewer,” recently added to our open-access Women Writers in Context series. The exhibit explores Emerson’s extensive, experimental, and eclectic reading and writing practices.

A (semi-)Serious Proposal to the Linguists

God, Vertue, Ladies, and Souls

A few days ago, I came across this really interesting Language Log post, which talks about capitalization in one of our Women Writers Online texts—Mary Astell’s A Serious Proposal to the Ladies (1694). In the post, Mark Liberman asks the question: “Why did authors from Astell’s time distribute initial capital letters in the apparently erratic way that they did?” Liberman looks at sentences like this one, which describes the purpose of Astell’s proposal:

It’s aim is to fix that Beauty, to make it laſting and permanent, which Nature with all the helps of Art, cannot ſecure: And to place it out of the reach of Sickneſs and Old Age, by transferring it from a corruptible Body to an immortal Mind.

Since this is a WWO text, I decided to try a bit of experimentation and see what I might be able to uncover using not just the text itself, but also the markup. For just a bit of background, the texts in WWO are encoded according to the guidelines of the Text Encoding Initiative. You do need a subscription to access the collection, but we are always happy to offer free trials, so if you don’t have institutional access or an individual subscription and are interested in reading the texts in WWO, you can find instructions for how to set up a month-long trial here. If you’re curious about the details of our markup, those are covered in our internal documentation.

The first thing I did was enlist some help from Syd Bauman and Ashley Clark, our XML developers. Syd generated a list of all the capitalized words in Astell’s Proposal, along with their immediate ancestry (i.e., the local elements around each word). We found 2,491 capitalized words in total. Reviewing the elements in this list, I could see that it was likely many words were capitalized for reasons reflected in their markup. For example, there were proper nouns (tagged with <name>, <persName>, and <placeName>), titles of other texts (tagged with <title>), and the document’s own headings (tagged with <head>). There were also some words that were simply appearing at the starts of sentences.

So, I asked Ashley and Syd to help me come up with a new list of the capitalized words in Proposal, excluding those in proper nouns, titles, headings, and at the start of sentences. That list is here (original spellings preserved). The top results are: “God” with 31 instances; “Vertue” with 31; “Ladies” with 24; and “Souls” with 21 (in case you’re wondering, the WWP does not encode “God” with <persName>; see here for more details). The rest of the top fifteen—Women, World, Good, Nature, Piety, Religious, Religion, Soul, Beauty, Education, Glory—are all the sorts of word I’d expect to see capitalized in a seventeenth-century text. Reading through the whole list, I was also struck by how much it does feel like an inventory of the text’s core concerns.
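The filtering described here (dropping proper nouns, titles, headings, and sentence-initial words) can be approximated with a short tree walk. The Python sketch below is an illustration only and is not the query Syd and Ashley wrote. The list of block-level elements and the sentence-boundary rule (a word counts as sentence-initial after ".", "!", or "?", or at the start of a block) are assumptions, and the rule is crude: a word capitalized after a colon, or the pronoun "I", would still be counted.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

SKIP = {"name", "persName", "placeName", "title", "head"}  # markup that explains the capital
BLOCKS = {"p", "head", "l"}  # assumption: each of these starts a fresh "sentence"
TOKEN = re.compile(r"[^\W\d_]+|[.!?]")  # runs of letters, or sentence-ending punctuation


def local(tag):
    """Element name without any XML namespace."""
    return tag.rsplit("}", 1)[-1]


def segments(element, skipped=False):
    """Yield (text, inside_skipped_element) in document order; text None marks a block edge."""
    name = local(element.tag)
    skipped = skipped or name in SKIP
    if name in BLOCKS:
        yield None, skipped
    if element.text:
        yield element.text, skipped
    for child in element:
        yield from segments(child, skipped)
        if child.tail:
            yield child.tail, skipped
    if name in BLOCKS:
        yield None, skipped


def capitalized_words(xml_text):
    """Count capitalized words that neither markup nor sentence position accounts for."""
    counts = Counter()
    sentence_start = True
    for text, skipped in segments(ET.fromstring(xml_text)):
        if text is None:
            sentence_start = True
            continue
        for token in TOKEN.findall(text):
            if token in {".", "!", "?"}:
                sentence_start = True
                continue
            if token[0].isupper() and not skipped and not sentence_start:
                counts[token] += 1
            sentence_start = False
    return counts
```

On a full text, `capitalized_words(...).most_common(15)` would produce a ranked list comparable in kind to the one discussed above, though the exact numbers would depend on how sentence boundaries are drawn.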

Beauty and Death

Having looked at the capitalized words in an individual file, I thought it would be worth investigating all of the occurrences of those words across our corpus. So, since “Beauty” was a commonly capitalized word for Astell (in addition to being relatively short and without too many potential spelling variations), I started with that.

I first wanted to determine if I should be concerned with weeding out the capitalized cases of “Beauty” in sentence-initial positions. A bit of exploration showed me that there weren’t many such cases, and most of these came from texts that also had instances of “beauty” capitalized in the middle of sentences. I found only a handful of clear cases where “beauty” was being capitalized just because it was at the start of the sentence, so I decided not to worry about sentence position. I did find several texts that capitalized “beauty” only some of the time. In a few cases, this seemed to indicate a distinction between personified beauty and a more general usage (e.g., contrast “Soft Beauty’s timid smile serene” with “youth and the bloom of beauty,” both from the 1824 Poetical Works of the Late Mrs. Mary Robinson); in other cases the pattern was less clear. These instances, presumably, would be one place I might start if I were investigating this phenomenon in earnest.

So, armed with the power of XPath, I set out to investigate the beauties of WWO. Here’s what I found. There are:
1577 total instances of Capital-B “Beauty” and
1863 cases of lowercase-b “beauty”
Looking across the whole corpus, that’s about 46% capitalized instances.

I repeated the search with “beautie” (to catch both “beauties” and the alternate spelling of “beautie”) and while there were fewer hits, the results were similar in terms of percent capitalized:
438 Beautie; 580 beautie (43% capitalized)

For “beautiful” I saw a different distribution:
71 Beautiful; 1619 beautiful (4% capitalized)

Since I suspected that this kind of capitalization would be more common in our earlier set of texts, I decided to narrow down the results. That just meant adding a bit of XPath before my search to look only in texts with publication dates before 1701 (198 out of 388 texts total).

Here’s what I found:
872 Beauty; 415 beauty (68% capitalized)
270 Beautie; 235 beautie (53% capitalized)
36 Beautiful; 212 beautiful (15% capitalized)

For this term at least (and with all appropriate acknowledgement of the highly rudimentary nature of this search), there does seem to be a bit more capitalization in the earlier half of the collection. Next, I wanted to see what else I could do with our markup. In my review of the tags we used for capitalized words in Astell’s Proposal, I had noticed that there were quite a few occurrences of <mcr>; this is a WWP-created element for a “meaningful change in rendition.” We use it where there are changes in rendition (such as between upright and italicized text) that are neither a printer’s error nor a merely decorative shift and that we can’t encode with more specific elements (such as <emph>, <name>, &c.). It’s essentially an element that says: “we think something semantically significant is happening with rendition here, but we’re not able to say exactly what.” Liberman alluded to this sort of thing when he wrote: “[And never mind, for now, Astell’s italicization choices…]”

Thinking that there might be interesting links between capitalization and these meaningful-but-unspecified changes in rendition, I tried my “beauty” search again, but restricted my results to text inside of <mcr>.
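
Restricting a search to the contents of one element might look something like the following Python sketch. The sample text is invented, any namespaces a real file might declare are omitted, and the function name is hypothetical; it also assumes <mcr> elements are not nested inside one another, since nested text would be counted twice.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample; namespaces that a real file might declare are omitted.
SAMPLE = """<text>
  <p>She spoke of <mcr>Beauty</mcr> and of beauty, and again of <mcr>Beauty and Wit</mcr>.</p>
  <p>All <mcr>earthly beauty</mcr> fades.</p>
</text>"""

def count_in_mcr(xml_string, word):
    """Return (capitalized, lowercase) counts of `word` inside <mcr> only."""
    root = ET.fromstring(xml_string)
    cap = low = 0
    for mcr in root.iter("mcr"):
        text = "".join(mcr.itertext())
        cap += len(re.findall(r"\b" + re.escape(word.capitalize()), text))
        low += len(re.findall(r"\b" + re.escape(word.lower()), text))
    return cap, low

print(count_in_mcr(SAMPLE, "beauty"))
```

The lowercase "beauty" in the first paragraph sits outside <mcr>, so it is left out of the tally.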

Here’s what I found, first looking across the corpus as a whole:
102 Beauty; 16 beauty (86% capitalized)

And then just the pre-1701 texts:
83 Beauty; 5 beauty (94% capitalized)

Admittedly, the corpus is small enough that narrowing down this far means you have fairly few results. (I also tried “beautie” and “beautiful,” but there really weren’t that many once I narrowed to the contents of <mcr>; for what it’s worth, 35 out of 37 instances of “beautie” in <mcr> are capitalized.) Still, there does seem to be something potentially interesting here. Most of the time, the rendition doesn’t change with capitalization (there are, after all, 1475 instances of “Beauty” in the collection that are not in <mcr>), but when the rendition does change, there is a higher percentage of capitalization.

I decided to try another keyword and see what came up. I went with “death” this time, using the same criteria that it’s short, fairly common in the corpus, and without many spelling variations (there is “deathe,” which had 5 capitalized and 138 lowercase instances overall, none in <mcr>, all from texts published before 1701). Here’s what I found:

Whole corpus:
2578 Death; 4759 death (35% capitalized)
239 Dead; 2381 dead (9% capitalized)

Pre-1701 texts:
1226 Death; 2115 death (37% capitalized)
110 Dead; 1313 dead (8% capitalized)

Contents of <mcr>, whole corpus and then pre-1701 texts:
251 Death; 54 death (82% capitalized)
218 Death; 34 death (87% capitalized)

These are just two specific keywords, of course; if I were pursuing this seriously, I’d want to refine the search itself and try quite a few more terms as well as other XPath variations: looking at headings and titles, checking for items in lists, perhaps comparing verse and prose, and so on.

“Friendship Cheese”

Finally, I decided to take a look at the contents of <mcr> itself, using an XQuery that Ashley Clark wrote for the WWP (affectionately nicknamed “The Counting Robot” and available here). I normalized punctuation, long s (ſ) characters, and whitespace, but preserved capitalization. I got 21,741 different strings inside of <mcr>; of those, 16,832 were unique. Many of the unique cases are not single words or short phrases, but entire sentences or clauses where the renditional shifts cannot be attributed to emphasis or quotation. The top term on the list was “God,” with 1237 results; rounding out the top-five for the corpus are: Lord, I, Love, and Author.
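
The Counting Robot itself is an XQuery, but the general idea of normalizing and tallying strings can be sketched in a few lines of Python. This is not the Robot's logic: the sample strings are made up, and reading "normalized punctuation" as "straighten curly quotation marks" is my assumption for this illustration.

```python
import re
from collections import Counter

# Assumed reading of "normalized punctuation": curly quotes become straight.
QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})

def normalize(s):
    """Fold long s, straighten quotes, collapse whitespace; keep capitalization."""
    s = s.replace("\u017f", "s")   # long s -> s
    s = s.translate(QUOTES)        # curly quotes -> straight quotes
    return " ".join(s.split())     # collapse runs of whitespace

# Hypothetical <mcr> contents, already extracted as plain strings.
mcr_strings = ["God", "God", "God", "Friend\u017fhip  Chee\u017fe",
               "Love", "love", "don\u2019t", "don't"]

counts = Counter(normalize(s) for s in mcr_strings)
distinct = len(counts)                                   # different strings
singletons = sum(1 for n in counts.values() if n == 1)   # strings seen once
print(counts.most_common(1), distinct, singletons)
```

Because capitalization is preserved, "Love" and "love" are tallied separately, which is what makes a capitalized-versus-lowercase comparison possible afterwards.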

Of the 127 cases with 30 or more hits, all but ten are capitalized—the exceptions are: “life,” “death,” “lying,” “they,” “she,” “love,” “one,” “her,” “he,” and “royal paper.” (This last item serves as a small caveat regarding the size of our corpus: all 204 instances of “royal paper” appear in a single text, Mary Jones’s 1750 Miscellanies in Prose and Verse.) Nevertheless, I do think that these exploratory results show that there is a great deal of potential for more serious research into these features using the WWO corpus—and if anyone is interested in a project along these lines, I’d be delighted to help set that up. In fact, this is my semi-serious proposal to anyone in the research community (linguists or otherwise) who might want to take this kind of work up.

One of my favorite things about this sort of exploration is that it brings me into contact with our texts in unpredictable ways, usually emphasizing how interesting and genuinely fun our corpus is. This was no exception and I’ll end here with my personal Top Ten results from the contents of <mcr>:

  • Wretched productions! inspired by hunger and dictated by stupidity and a disposition to lying! &c &c
  • As Irish ladies pass in jaunting cars
  • Confounded Harlot!
  • Effemenate Cat
  • For Gad Madam I don’t love being baulk’d thus
  • Friendship Cheese
  • Great Cuttle’s gland
  • Hedges of the Eyebrows
  • His lisping children hail their sire’s return!
  • Julius Cesar when he was beheaded by Oliver Cromwell


Registration is Now Open for Two WWP Workshops


Registration is open for two upcoming TEI seminars offered by the Women Writers Project and the Digital Scholarship Group at the Northeastern University Library. The first workshop, Introduction to TEI, will be held on February 17th–18th. The second workshop, TEI Customization, will be held on April 7th–8th. Northeastern University will host both of the seminars. The cost for each is $450 (students and TEI members, $300). Registration is free for members of the Northeastern University community. For more information and to register, please visit our workshops and seminars page.

Introduction to TEI offers an intensive exploration of scholarly text encoding, aimed at an audience of humanities scholars, archivists, and digital humanists. Through a combination of hands-on practice, presentation, and discussion, participants will work through the essentials of TEI markup and consider how markup languages make meaning and support scholarship in the digital age. No prior experience is necessary. Topics covered include:

  • Text markup languages as an instrument of humanities scholarship
  • Basics of TEI markup: essential text structures and genres
  • Advanced TEI markup: editorial markup and commentary, details of physical documents, complex structures
  • Contextual information and metadata

The schedule for this workshop is available here. Register here by February 10, 2017.

The TEI Customization seminar will introduce participants to the central concepts of TEI customization and to the language (a variant of the TEI itself) in which TEI customizations are written. When properly planned, the TEI customization process can make a huge difference to the efficiency of a TEI project and the quality and longevity of its data. Good customizations capture the project’s specific modeling decisions, and ensure consistency in the data, while retaining as much interoperability and mutual intelligibility with other TEI projects and tools as possible. Customization also contributes importantly to the process of data curation, both at the time of data creation and later in the project’s life cycle. Topics covered include:

  • Background on how the TEI schema is organized
  • Essentials of the TEI’s customization language
  • Using Roma to generate schemas and documentation
  • Designing a schema for your project: data constraint, work flow, and long-term maintenance
  • Conformance and interoperability

The schedule for this workshop is available here. Register here by April 1, 2017.

We hope to see you there!

Announcing: Women Writers in Review


We are delighted to announce the publication of Women Writers in Review, a collection of more than 600 eighteenth- and nineteenth-century reviews, publication notices, literary histories, and other texts responding to works by early women writers, transcribed and encoded in the Text Encoding Initiative (TEI) markup language. The Women Writers in Review interface offers sorting by the reviews’ sources, by the authors and works that they reference, by their genres and formats, and by tracked tags such as the topics they discuss and their evaluations of reviewed texts. We have also published an API, so that researchers can query and access the Women Writers in Review data and resources in JSON or HTML.

Women Writers in Review was created as part of the Cultures of Reception project, which was designed to investigate the discourse of reception in connection with the changing transatlantic literary landscape from 1770 to 1830. The Cultures of Reception project was generously funded by a Collaborative Research grant from the National Endowment for the Humanities.

We hope that Women Writers in Review will enable researchers to address a wide range of questions, which might include: how do periodical reviews in this period imagine the relationship between the local and transnational writing spaces? How do reviews work to constitute for women authors a sense of a reading public? What are the differences that mark reading and reviewing practices across various regions and localities? To what extent does geography affect patterns of reference to women’s writing during this period? How do reviews, anthologies, and other similar sources gender particular spaces or locations of reading? And, we hope, many others!

Over the next few months, we’ll be posting on some of our favorite reviews, as well as some of the research we’ve been doing with the collection. For now, we can share a tip about the site: you’ll find some of the liveliest and most humorous reviews among those that have been marked as offering very negative evaluations of their subject matter.

We are also looking for faculty and graduate students who are interested in using Women Writers in Review in their classrooms to develop sample assignments using the collection. If you would like to learn more about becoming a pedagogical development consultant for the Women Writers Project, please contact us at wwp[at]neu[dot]edu.

To begin exploring the collection, please visit the main page or read this explanation of the site’s features.

The Women Writers Project staff at the official launch of Women Writers in Review. Photo Credit: Jennie Robbiano.
Loanwords, Macrons, and Orientalism: Encoding an Eighteenth-Century Fictional Translation


By Elizabeth Polcha, WWP Encoder and Ph.D. Candidate in English

Since late last fall, I’ve been encoding a text that poses some interesting markup challenges because of its use of Orientalist language: Scottish author Eliza Hamilton’s 1796 epistolary novel, Translation of the Letters of a Hindoo Rajah. While I was excited to encode Translation because my own research considers eighteenth-century colonial literature, I focus on Caribbean and American literature. So, as an encoder, I approached Translation with an interest in how Hamilton is using distinct language to construct colonial notions of race and gender, but with only a limited familiarity with Orientalist print culture and history.

Before I lay out the details of how I’ve been encoding linguistically distinct language in Translation, it is necessary to explain just how Orientalist (and orientalist, to use Edward Said’s version of the term) this novel is. And no, Translation is not actually a collection of letters that Hamilton translated from Hindi.1 The “translated” letters of Hamilton’s text are fictional, mostly authored by the titular character and protagonist, Zāārmilla, the Rajah of Almora. Hamilton supplements the letters with a “preliminary dissertation,” lengthy footnotes, and a glossary of terms. She strategically includes these textual addendums as a way of demonstrating her expertise in the Orientalist scholarship of her time. Also, as you can see from the macrons included on “Zāārmilla” and on another major character’s name, “Māāndāāra,” Hamilton is a fan of using diacritical marks as a kind of typographic flourish. In writing Translation, Hamilton participated in a scholarly discourse rooted in a Western imperialist fascination with South Asia, citing British colonial scholarship like Nathaniel Halhed’s A Code of Gentoo Laws Or, Ordinations of the Pundits and Orientalist groups like The Asiatic Society.2

Part of our encoding process at the Women Writers Project is to begin with a preliminary document analysis. This means that once we’ve acquired a text to encode, we look through the text carefully to take note of its structure and textual features before opening up an XML file and marking up our text in TEI. During my preliminary document analysis of Translation, aside from noticing the epistolary structure and Hamilton’s unusual diacritical marks I’ve described above, I also noticed quite a few Hindi and Sanskrit terms and phrases that seemed to be roughly transliterated into English (such as “Poojah” or Pūjā, पूजा, a Sanskrit-derived word for Hindu ritual prayer). From my document analysis I knew that it would be important to look up the etymology and meaning of Hamilton’s transliterated terms in order to decide how to most accurately describe them using the TEI. My encoding practice for Translation so far has involved occasionally switching between my XML file, the Oxford English Dictionary (OED), and Google Books in determining the best way to tag specific terms and phrases.

The WWP follows the TEI Guidelines for capturing specialized language with the element <distinct>, which means that we use <distinct> to tag language that is “archaic, technical, dialectal, non-preferred.” In addition to <distinct>, <foreign> and <term> were also particularly important in my encoding of Translation. The WWP also uses the @xml:lang attribute with a value from the IANA language registry to provide standardized identifications for non-English words and phrases.3 This means that my encoding process involves paying attention to the etymology of distinct words and phrases in order to assign each <distinct> or <foreign> element an IANA language code.

For example, in the first letter in Translation, Zāārmilla refers to a character’s “Ayammi Shadee,” which Hamilton defines in a footnote as “the present made to a young woman by her relations during the period of her betrothment” (58). In determining how to encode this term, I first searched for it in the OED—which returned no results. I then searched in Google Books, which brought me to Halhed’s A Code of Gentoo Laws, Hamilton’s original source. Eventually, I determined that “Shadee” must be Hamilton’s (and Halhed’s) version of the Hindi word, shadi, or, marriage.

Example encoding of “Ayammi Shadee.”

This term stood out to me in the text not only because it was capitalized and footnoted, but also because I did not recognize it. If Hamilton had simply used the word “Marriage” there would be no need to tag it with a more descriptive element, but because the WWP is interested in tagging non-English and linguistically distinct language, I needed to figure out the best way to encode the term. I ended up encoding “Ayammi Shadee” using the element <foreign>, which is used to tag non-English words in cases where there is not another more appropriate element, such as <name>, <persName>, or <placeName>. I also used the @xml:lang attribute with a value of “hi” for Hindi.
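
To make the shape of that markup concrete, here is a small Python sketch that parses a <foreign> element carrying @xml:lang and reads the language code back out. The fragment is hypothetical: the surrounding sentence is invented, the TEI namespace is left off, and it should not be taken as the exact encoding in the WWP’s file.

```python
import xml.etree.ElementTree as ET

# Namespace that the predefined `xml:` prefix always maps to.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical fragment: the surrounding sentence is invented for illustration
# and the TEI namespace is omitted for brevity.
FRAGMENT = (
    '<p>She received her <foreign xml:lang="hi">Ayammi Shadee</foreign> '
    'from her relations.</p>'
)

def foreign_terms(xml_string):
    """Return (language code, text) pairs for every <foreign> element."""
    root = ET.fromstring(xml_string)
    return [(el.get(XML_NS + "lang"), "".join(el.itertext()))
            for el in root.iter("foreign")]

print(foreign_terms(FRAGMENT))
```

A listing like this is one way an encoder or reviewer could check that every tagged term carries a language code.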

As in the example above, one of the challenges of marking up non-English and linguistically distinct terminology in texts like Hamilton’s Translation is that it is sometimes difficult to know when a word is being referenced in the text as a foreign language term, or when the text is using a term that has been adapted into English as a loanword. For example, the English word “pundit” is a loanword from the Sanskrit term “pandit” meaning knowledge owner, or, according to the OED, “a person with knowledge of Sanskrit and Indian philosophy, religion, and law.” So, when Halhed includes “Ordinations of the Pundits” in the title of his text, he is referring to a “pundit” as an intermediary who could clarify Indian law for colonial authorities.

It is also difficult to distinguish when a term can accurately be tagged “foreign” or “distinct” (<distinct> is the element we use for linguistically or dialectally distinct terms that are not distinct enough to constitute a “foreign” language), since what is considered foreign or distinct to me may not have been foreign or distinct to an eighteenth-century reader. The WWP aims to best represent the documents we encode within the context in which they were written and published, which is part of the reason why the OED is so often a valuable resource for encoders: we wouldn’t want to mark an early modern spelling of a particular word as a typographical error using the elements <sic> and <corr>, for example. But it is also important to recognize that each encoder approaches the encoding process with her own understanding of the text. My choices in marking up the term “Ayammi Shadee” are based on my understanding of the WWP’s encoding practices and my analysis of the text, and these choices will be reviewed by other encoders and may change as Translation moves through our proofing process and into final publication on Women Writers Online.

What I love about working for the WWP is the endlessly evolving way we think about markup, and the collaborative nature of the encoding process. From the many discussions I’ve had in encoding meetings with my WWP colleagues about Hamilton’s Translation, we’ve shifted slightly in our thinking about elements like <distinct>. Ultimately, the complicated way Hamilton uses Hindi- and Sanskrit-derived terms has helped me to think more critically about the linguistic complexity of eighteenth-century colonial writing.