‘To the most distant Parts’: Reading and writing about the world in The Female Spectator

This post is part of a series authored by our collaborators on the Intertextual Networks project. For more information, see here. 

By Samuel Diener, Ph.D. Candidate in English, Harvard University

In the November 1744 issue of her periodical The Female Spectator, the novelist and essayist Eliza Haywood writes:

What Clods of Earth should we have been but for Reading? —How ignorant of every thing but the Spot we tread upon? —Books are the Channel through which all useful Arts and Sciences are conveyed: —By the Help of Books we sit at Ease, and travel to the most distant Parts; behold the Customs and Manners of all the different Nations in the habitable Globe, nay take a View of Heaven itself, and traverse all the Wonders of the Skies.1

Haywood’s exclamation is an admonition to her female readers to cultivate knowledge of history, ethnography, geography, cosmography, and the art of navigation. But it is also an injunction to employ the social technology of the book to travel all over the globe. For Haywood, books offer access to the frontiers of empire. They are a ticket to the contact zone, one that enables the reader to behold the “Customs and Manners” of the national other.

Haywood suggests that her readers owe it to the mariners who bring back the luxuries of empire to journey with them vicariously: “a Sense of Gratitude, methinks, should influence us to interest ourselves in the Safety and Welfare of the gallant Sailors, . . . commiserate their Sufferings, and rejoice in their Escapes.”2 In the midst of a moment of crisis for the British empire, when its future success was the subject of anxiety, Haywood here advises her readers to confirm the notion of empire and fill a specific gendered role in the imperial project: vicarious participation. But she also suggests that women owe it to themselves to cultivate their knowledge of the globe precisely in order to contest the constraints of that gendered role in the course of interactions with men, reading “to the End they may be enabled to make an agreeable Part in Conversation [and] be qualified to judge for themselves.”3

But did Haywood herself (and other British women of the early modern period) actually engage in this kind of readerly practice? And how did they view their role in the empire’s expansion? The Women Writers Online corpus presents a potentially valuable way to approach this question. It is coextensive with the rise of British imperialism, including many moments when the imperial project was in a precarious position, and contains texts that engage topically with the extra-European world. Since each place-name reference in the corpus is tagged as a TEI/XML element with <placeName>, it is possible to map these references. As part of the Intertextual Networks Project, I will be using the <placeName> tags to explore the extent to which the women writers in the corpus engage topically with the imperial margins. Then, by examining the context of individual references (or clusters of references), I will be able to make conjectures about the networks of information in which these women were embedded, the sources they employ—like news or narratives of travel—and the uses they make of their material. As a result, I envision my project as a two-staged, mixed-method study: first tracking references at the macro-level, and then following up with careful interpretation and analysis.

Computational Analysis

The first obstacle to working with the corpus at a macro level is simply accessing the data. Thankfully, there are multiple resources available for this kind of work. Having taken an excellent XSLT workshop with Northeastern University’s Syd Bauman and Julia Flanders this January, I’d recommend the language to other users of the WWO corpus: it is straightforward, intuitive, and designed specifically for processing XML data. The WWO team has also produced a set of useful resources, including Ashley Clark’s “Counting Robot”, which is available here.

However, since I was eager to begin work and lacked any experience with XSLT at the time I began the project, I conferred with some friends who have significant coding experience and they helped me design a simple counting robot in Python that performs the same function. It extracts the contents of the <placeName> tags to a large tab-delimited table, converts special characters (like the medial S), and eliminates alternate punctuations to obtain reference totals for each work (see Figure 1). Because I am specifically interested in mapping topical engagement in the texts, I chose to exclude frontmatter and backmatter, focusing only on the body of the text itself. (I don’t mean to imply that that material doesn’t contain valuable data, but only that its significance for the questions I wanted to ask seemed harder to predict. Future versions of the project may include this data.) We then created a second data table, which lists all the unique place names and their combined totals across the texts. In all, there were 6,091 unique place-names in the corpus as it stood at the time I began my project. Each place-name was also assigned a unique 4-digit ID based on its frequency-rank.

Figure 1. Example selection from the initial dataset, with columns for author, short version of title, publication date, most common punctuation of the place-name, and count. The sixth column lists all variant punctuations and spellings, so that individual references can be traced.

Together, these two datasets form a rudimentary relational database that will let me use functions in R (my language of choice for data-analysis) both to find patterns in place-name usage over time in the corpus at large and to map the topical engagement of individual texts. Figures 2-4 show the kind of broad-brush analysis that such data makes possible. They map the shape of the data for the entire corpus. A striking dynamic emerges: a handful of locations, mostly in or around the metropole (England, France, London), are referenced an enormous number of times, but the distribution curve falls off very quickly into a very long tail. Of the 6,091 unique names, only 487 are mentioned more than ten times.

Figure 2. Bar plot of place names in the WWO corpus, sorted by number of total references.
Figure 3. Histogram plot of frequencies. The x axis is the number of references; the height of each bar represents the number of place-names that are mentioned at that frequency. Thus the first bar shows the number of places mentioned just once.
Figure 4. Frequency histogram, omitting place names mentioned just once.
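For readers who want to see how the second table and the long-tail figures fit together, here is a small sketch. The per-work counts are invented for illustration, the function names are hypothetical, and breaking ties alphabetically when assigning rank-based IDs is my own assumption rather than a documented feature of the project.

```python
from collections import Counter

def build_place_table(per_work_counts):
    """Sum counts across works and assign 4-digit IDs by frequency rank."""
    totals = Counter()
    for counts in per_work_counts.values():
        totals.update(counts)
    # Descending total, then alphabetical, so that ties are deterministic.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f"{rank:04d}", name, total)
            for rank, (name, total) in enumerate(ranked, start=1)]

def frequency_of_frequencies(table):
    """Map each reference count to the number of place names with that count."""
    return Counter(total for _, _, total in table)

def names_above(table, threshold):
    """How many place names are mentioned more than `threshold` times."""
    return sum(1 for _, _, total in table if total > threshold)

# Invented counts for two imaginary works.
per_work = {
    "work_a": {"London": 12, "Paris": 2, "Sumatra": 1},
    "work_b": {"London": 3, "Paris": 1, "Canada": 1},
}

table = build_place_table(per_work)
histogram = frequency_of_frequencies(table)
for row in table:
    print(*row, sep="\t")
print("names mentioned once:", histogram[1])
print("names mentioned more than ten times:", names_above(table, 10))
```

The `histogram` mapping is the data behind a plot like Figure 3: its keys are reference counts and its values are the number of place names at each count.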

Unfortunately, as Figure 1 illustrates, there are significant problems with this data. A glance at the text will show, for example, that the different names in the sixth column of lines 732 and 741 refer to the same place. To correct such issues, I am going through the entire second data-table, editing the IDs so that alternate spellings of the same place-name are assigned the same unique ID. I will also have to look up archaic place-names to identify their geographical referent and to distinguish real-world places from “heaven,” “topsy-turvy,” “Abraham’s bosom,” and other fictional, mythical, or non-terrestrial locations. Finally, in order to map the geographical distribution of these places, I will have to retrieve (using the “ggmap” package available for R) and check by hand latitude/longitude coordinates for each place.

Carrying out this labor-intensive process for every one of the 6,091 names is simply beyond the realm of possibility for a busy PhD student like myself. (I can do about 15-20 place names in an hour.) However, 3,524 of the place-names appear only once in my dataset. Trimming off this “long tail” will still give me valuable, if somewhat simplified, data, as shown in Figure 4. And a diversity test of the data, like the one shown in Figure 5, shows that nonce place names are fairly evenly distributed across the corpus. Getting rid of them only excludes a few texts, most of which prove to have had just a small number of place-name references. (Examining these texts to see what generic or other conventions predict such less-spatially-localized writing might prove fascinating matter for another project.) So far, I have only worked my way through about 700 of the 2,567 place names that occur more than once in the database, so it will be quite a while before I can begin to do analysis at the aggregate level.
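To put rough numbers on that workload, the arithmetic can be sketched as below. The only inputs are the figures given above (15-20 names an hour, 6,091 names in total, 3,524 nonce names, about 700 already done); the function name is hypothetical.

```python
def hours_range(n_names, slow_per_hour=15, fast_per_hour=20):
    """Return (best case, worst case) hours needed to check n_names by hand."""
    return n_names / fast_per_hour, n_names / slow_per_hour

TOTAL_NAMES = 6091
NONCE_NAMES = 3524
REPEATED_NAMES = TOTAL_NAMES - NONCE_NAMES  # names mentioned more than once
REMAINING = REPEATED_NAMES - 700            # approximate, since "about 700" are done

for label, n in [("all names", TOTAL_NAMES),
                 ("repeated names", REPEATED_NAMES),
                 ("still to do", REMAINING)]:
    low, high = hours_range(n)
    print(f"{label}: {n} names, roughly {low:.0f} to {high:.0f} hours")
```

On these assumptions, dropping the nonce names cuts the job from somewhere over 300 hours to somewhere under 175.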

Figure 5. Shannon diversity plot of authors in the corpus, showing their place-name diversity (threshold > 0) and how it is affected by excluding place names that occur in the corpus just once (threshold > 1), twice (threshold > 2), three times (threshold > 3), etc. Authors with only the “> 0” bar use no place names that appear more than once in the corpus, and thus will no longer be represented in the dataset if nonce place names are eliminated.
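A diversity test of this kind could be sketched as follows. I am assuming the standard Shannon index, H = -Σ p ln p, computed over each author's place-name counts; the authors and counts are invented, and the function names are hypothetical.

```python
import math
from collections import Counter

def shannon(counts):
    """Shannon index H = -sum(p * ln p) over a list of positive counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def diversity_by_threshold(author_counts, thresholds=(0, 1, 2, 3)):
    """For each author and threshold t, keep only the place names whose
    corpus-wide total exceeds t, then compute the Shannon index.
    None marks an author who has no place names left at that threshold."""
    corpus_totals = Counter()
    for counts in author_counts.values():
        corpus_totals.update(counts)
    result = {}
    for author, counts in author_counts.items():
        result[author] = {}
        for t in thresholds:
            kept = [c for name, c in counts.items() if corpus_totals[name] > t]
            result[author][t] = shannon(kept) if kept else None
    return result

# Invented data: author C uses only a nonce place name.
authors = {
    "A": {"London": 4, "Paris": 4},
    "B": {"London": 1, "Bencoolen": 1},
    "C": {"Topsy-turvy": 1},
}

result = diversity_by_threshold(authors)
for author, scores in result.items():
    print(author, scores)
```

In this toy example author C disappears as soon as the threshold rises above zero, which is the situation described in the caption for authors who have only the “> 0” bar.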

Spectator as Case Study

Since my project was inspired in part by the section of The Female Spectator that I mention above, I’ll return to that work as a test case to see what these methods can tell us about a text using the data I have so far. I’ve checked and obtained coordinates for the 192 unique place names mentioned in the four volumes of the periodical available in WWO. The distinct character of their distribution is immediately apparent, and it reveals—surprisingly, in light of the passages I quote above—a tightly localized focus. The text’s most-used place name by far (at 46) is “London,” which (by contrast) takes a distant third place in the corpus’ overall place-name distribution (see Figure 6). As Figure 7 shows, many of the other place-names mentioned in the periodical (including, for example, the street-addresses of its ostensible contributors) also cluster densely around the metropolitan area of London. Meanwhile, most of the foreign high-scorers in the corpus data set (Rome and America, for example) drop well down in The Female Spectator’s data (see Figure 6).

Figure 6. Top 20 most-referenced places in the WWO corpus (left) vs. top 20 most-referenced places in The Female Spectator (right).
Figure 7. The Female Spectator: Place-names in the vicinity of London.

I’d suggest that an explanation for this geographical localization is easily found in the structure of the work. The first periodical aimed at women authored by a woman in English, The Female Spectator was produced by Haywood in London between 1744 and 1746. It engages with debates about politics and domestic life that were topical for bourgeois and upper-class women in and around London in the period and takes the same form as many other famous periodicals of the century like The Tatler and The Spectator. It consists of one essay each month engaging with a particular topic, often including and responding to a letter ostensibly written by a reader from the same geographical area.

The periodical thus attempts to mirror formally, while also providing a medium for, a public sphere for 18th-century women living in its primary area of distribution in the environs of London. Comparing this map to the England/France map (Figure 8) and the world map (Figure 9) shows us how dramatically place-name references drop off as we go farther from the metropolitan center; for example, one occurrence of “Canada,” two of “America,” and three of “West Indies” are the only references to the Western hemisphere (unless you count two references to the Pacific and one to the South Sea).

Figure 8. Place-name distribution in Britain and France in The Female Spectator.
Figure 9. Global place-name distribution.

As Figure 9 shows, Haywood’s primary sustained engagement with the non-European, non-Mediterranean world seems to have been with the island of Sumatra in Indonesia, then the site of a small British colonial trading post called British Bencoolen. Most of these references come from a single section in the October 1745 issue of The Female Spectator, which tells the tale of a British crew shipwrecked on Sumatra. The story opens with a breakdown in Western technical prowess: the ship leaking badly, the crew deliberately runs it ashore, where it lodges fast between two rocks. To this breakdown is quickly added a reversal of the documentary gaze. The shipwrecked sailors are surrounded by indigenous locals, and kneel in surrender: “This made them withdraw their Bows . . . and draw round us in a Circle, staring as the Rabble of England would do on one of them, had we had them here in the odd Habits they wear there” (186). The inversion of roles upsets colonial hierarchies, reminding us that on the soil of another Empire—as we soon find out, the Empire of Summatra—the British seem as bizarre, and their clothes as garish, as indigenous people might seem to the British. The entire anecdote seems to be fictional: despite extensive searching, I have been able to find no corroborating sources. Haywood’s point in the tale, she states explicitly, is to contest the othering rhetoric of travel writers, who imply “that God had endued only the Europeans with reasonable Souls.”

The variety of travel-books Haywood mentions and summarizes for her readers—mainly in the July 1745 issue—suggests that she was reading voyage narratives with comprehensive deliberateness. She describes (among others) works by Aubry de la Mottraye (1674?-1743), Bernard de Montfaucon (1655-1741), William Dampier (1651-1715), Jean-Baptiste Du Halde (1674-1743), François Maximilian Misson (1650?-1722), Cornelis de Bruyn (1652-1726?), Jean-Baptiste Tavernier (1605-1689), and Jean Chardin (1643-1713). Her list concludes, “There are yet some other Books I would fain take upon me to recommend; but . . . I have been already too ample in my Detail.” It is thus particularly striking that in The Female Spectator itself, so far from enacting vicarious participation with the British imperial project, Haywood employs her mastery of the genre and the discourse of travel narrative to fabricate a fictional voyage of her own that calls into question the ideological assumptions of what was, at the time, a genre dominated almost entirely by men.

R, Voyant, and the Search for Computational Delicacy in an Early Modern Corpus

This post is part of a series authored by our collaborators on the Intertextual Networks project. For more information, see here. 

By Amanda Henrichs, Institute for Digital Arts and Humanities, Department of English, Indiana University

My contribution to the Intertextual Networks project takes up the literary and historical relationships between Lady Mary Wroth (1587–1651) and her aunt Mary Sidney-Herbert (1561–1621). These two women are members of the Sidney family, one of the most influential families in English literature and politics for over 200 years (the 2015 Ashgate Research Companion is invaluable here). Both women were active in Queen Elizabeth’s court, and both provided literary and artistic patronage to writers, artists, and musicians. Further, both were known to their contemporaries as prolific and respected authors. Wroth in particular has enjoyed a resurgence in popularity (and scholarly praise for her literary skill) over the past few decades.

These women lived together and—scholars tell us—wrote together. Yet, the primary evidence for their relationship is historical. That is, when scholars assert that Sidney-Herbert was a formative literary influence for Wroth, they do not cite stylistic similarities. Rather, they mention the time the two spent together at Penshurst, the Sidney family’s home in Kent, and the loving relationship between the two women. But it seems nearly necessary that there would be stylistic evidence of Wroth’s literary homage to her aunt: Wroth is a highly allusive and intertextual writer, with clear allusions to, and borrowings or translations of, Petrarch, Philip Sidney, Fulke Greville, Edmund Spenser, and others. But Sidney-Herbert seems to be entirely absent from Wroth’s works.

There is thus an absence of intertextual connection where there should be a presence. And this is what my current project takes up. I am writing an R script to mine Wroth’s long prose romance Urania and Sidney-Herbert’s translations The Tragedie of Antonie and A Discourse of Life and Death for similarities in word choice, sentence structure, turns of phrase, and other stylistic similarities. Then, based on these results, I will use another coding language to visualize the results. In effect, I want to visualize literary absence.

I want to pause here, though, and mention some of the problems I’ve run into. The biggest one is R itself. For those who aren’t familiar, R was originally used to run statistical analyses on very large datasets, and is now quite popular with humanists who want to do things like text mining and topic modeling. R is a very powerful tool, but it is also idiosyncratic, complex, and difficult to master. Even working through Matthew Jockers’ incredible book Text Analysis with R for Students of Literature, I keep getting bogged down in cleaning and parsing the text files I’m examining. I also have to continually remind myself of R prompts and commands, since even a single wrong keystroke creates an error I need to go back and dig out: a debugging practice that is second nature to trained programmers, but less familiar to traditional researchers in the humanities. From what I can tell, this is a common experience for scholars who, for whatever reason, want to employ computational approaches in their research.

Other problems include asking the right questions; or rather, asking questions in a way that R can understand. I am at the point where I can tell R to pull a .txt file from the internet (or my computer), clean out the extraneous metadata from the beginning and end of the text, split the text according to its internal divisions (be they chapters or stanzas), find the relative frequency of a word or words across the text, and plot those frequencies in a graph of my choice. In Shakespeare’s Sonnets, for example, I found that there are 4,612 unique words in the collection. The word “I” accounts for 1.8% of the total words; “my” for 2.6%. But a patient and dedicated reader could do this work without a line of code. At this point, I’m saving enormous amounts of time, which is of course incredibly valuable in itself, but I am gaining old insights more quickly, rather than coming to new conclusions. And what does this data actually mean? It isn’t enough simply to spout statistics, as interesting as it may be to have these numbers handy.
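The workflow described above (clean, split, count, compare) is not tied to R. Purely as an illustration, here is a rough Python sketch of the relative-frequency step; the sample text, the blank-line divider, and the function names are assumptions of mine, and a real script would first strip the metadata from the beginning and end of the file.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)?")  # lowercase words, optional apostrophe

def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return TOKEN.findall(text.lower())

def relative_frequencies(tokens):
    """Map each word to its share of all tokens (the shares sum to 1)."""
    total = len(tokens)
    return {word: n / total for word, n in Counter(tokens).items()}

def frequency_by_segment(text, words, divider=r"\n\s*\n"):
    """Split the text on `divider` (blank lines by default) and return the
    combined relative frequency of `words` within each segment."""
    targets = set(words)
    shares = []
    for segment in re.split(divider, text.strip()):
        tokens = tokenize(segment)
        if tokens:
            hits = sum(1 for tok in tokens if tok in targets)
            shares.append(hits / len(tokens))
    return shares

# Invented two-segment sample with early modern spellings.
SAMPLE = "She loues him and she waits.\n\nMy loue is constant; I loue still."

overall = relative_frequencies(tokenize(SAMPLE))
she_by_segment = frequency_by_segment(SAMPLE, {"she"})
loue_by_segment = frequency_by_segment(SAMPLE, {"loue", "loues", "louing", "loued"})
print("unique words:", len(overall))
print("share of 'she' per segment:", she_by_segment)
print("share of 'loue' forms per segment:", loue_by_segment)
```

Passing a set of spellings rather than a single word is a simple way to track a word family across segments; the per-segment lists are what one would hand to a plotting function.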

In the case of Wroth’s Urania, for example, I know that the word “she” declines dramatically toward the end of the romance, precisely at the point when the words “lo”, “loue”, “louing”, “loued”, etc., spike dramatically. In the interest of quick results, I uploaded the romance to Voyant, an online visualization tool that remediates a text of your choice. Here, the blue line is the “loue” variations and the purple is “she.”1

Voyant visualization of “she” and “loue” variations in Wroth’s Urania.

Towards the end of the romance is where the heroine Pamphilia finds happiness in love; and “she” simultaneously disappears, both literally and figuratively. Does this chart also open up a feminist critique of the loss of selfhood of an otherwise proactive and literarily productive female protagonist? Or does it simply reflect that Wroth appended the sonnet collection Pamphilia to Amphilanthus to the romance? In this collection, she details her constancy in her “loue” for Amphilanthus, but writes in the first person instead of the third. Thus the decline of “she.” I’m inclined to the latter interpretation; but, given the immense difference in length between the prose romance and the sonnet collection, there is still an interesting shift that might need further investigation. If you’re reading this blog, I don’t need to convince you of the value of digital or computational approaches. What these kinds of results remind me is that approaching old texts in new ways might let us see things we simply haven’t noticed yet. Computational approaches, once we learn them, are not only incredibly fast; they can also help us make remarkably subtle observations.

Though the multi-text capabilities of Voyant are not as subtle as I would like, they still gesture towards the simultaneous reach and delicacy of computational tools that I hope to achieve with R. When I uploaded all three texts to Voyant, I started to find some interesting things. For example, Antonie has the highest vocabulary density, while Urania has the lowest. (Urania is also the longest text; however, Discourse is the shortest, which lends credence to the density result. That is, Antonie seems to have a proportionally higher vocabulary density than the other texts, regardless of length.) More suggestive still are the words which are distinctive to each text. In Antonie, “hir” is most prevalent (56 instances), followed by “cl,” the speaker tag for Cleopatra (43), and “Antony” (40). In Discourse, we have “wee” (51), “worlde” (20), and “porte” (6); in the Urania, “shee” (1,386), “Amphilanthus” (392), and “Pamphilia” (269).
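Because the density result above has to be weighed against text length, it may help to see how such a figure can be computed. The sketch below assumes that vocabulary density means unique words divided by total words, which is my reading and not a claim about Voyant's internals; the windowed variant, the function names, and the toy token lists are likewise my own.

```python
def vocabulary_density(tokens):
    """Unique words divided by total words (the type-token ratio)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def windowed_density(tokens, window=1000):
    """Mean density over consecutive full windows of `window` tokens, so that
    a long text is not penalized merely for repeating common words more often."""
    chunks = [tokens[i:i + window]
              for i in range(0, len(tokens) - window + 1, window)]
    if not chunks:
        return vocabulary_density(tokens)
    return sum(vocabulary_density(chunk) for chunk in chunks) / len(chunks)

# Toy token lists: the long text is the short one repeated five times.
short_text = ["loue", "is", "not", "time's"]
long_text = short_text * 5

print(vocabulary_density(short_text))     # raw density of the short text
print(vocabulary_density(long_text))      # raw density falls as the text repeats
print(windowed_density(long_text, window=4))
```

The raw ratio drops for the longer list even though the vocabulary is identical, while the windowed figure does not. Comparing densities over equal-sized windows is one way to test whether a text like Antonie is denser regardless of length.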

Again, the question is, what do we do with this data? I might conclude that Antonie is an extended blazon of Cleopatra’s qualities: her estates, her person, her speeches, her beauty. I might also say that it appears that the Urania doesn’t pass the Bechdel test; even though “shee” is four times more present than Amphilanthus, we still have more mentions of Amphilanthus’ name, suggesting that characters (or the author) talk about him more than they talk about Pamphilia.

Yet I am not tied to any of these interpretations; they could be completely wrong. Instead, I am more inspired by the possibilities that are suggested by these lists of numbers. While I will eventually need to come to conclusions about the specifics of my data, for now I am content with what tools like Voyant and R certainly provide me: a different view. In other words, numbers are not enough; but more satisfying are the subtle characteristics that computational tools let me visualize, even when the sheer amount of text seems anything but subtle.

One short postscript: I spent hours (three, I think) trying to create a comparative scatterplot in Voyant of the distinctive words I mentioned above. The closest I came was this:

Attempt at a comparative scatterplot in Voyant.

And this is clearly not very legible. In order even to get to this point, I had to use the raw frequency of each word, and manually strip out partial words like “lo-”, “-ed”, “-ing”, “ha-”, and “bra-”. I also had to use a proximity tool; I asked Voyant to show me the words closest to “she,” and limit the results to about 35 words. One thing we can see is that “he” is the most common word closest to “she”; we also see verbs like “doe” and “make.” This suggests that “he” and “she” are both very active in the texts, and because “she” is more common than “he,” that the female protagonists are most active. However, I’m still not committed to these results, partly because I didn’t tell Voyant how to determine proximity, and partly because I still have a very hard time understanding what this plot is telling me. I present this plot for two reasons: one, because the prevalence of verbs is suggestive; and two, because I want to emphasize how important it is for humanist researchers to know at least a little bit about the back-end of the tool they might use. Since I don’t know exactly how Voyant determines proximity, and I also can’t tell it to consider the “u” character as part of a full word (as in loue, haue, or braue), I’m not willing to draw interpretations from this data. In other words, with Voyant I’m left with interesting directions for future inquiry; with R, because I will have written the code myself, I will feel confident in my results.
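To make the point about the back-end concrete, here is a minimal sketch of a proximity count in which both of the unknowns above are explicit: the tokenizer treats any run of letters, including “u,” as a single word, and proximity is defined as a fixed number of words on either side of the target. This is not how Voyant works internally, which I do not know; the window size, function names, and sample sentence are assumptions.

```python
import re
from collections import Counter

WORD = re.compile(r"[^\W\d_]+")  # any run of letters, so "loue" stays whole

def tokenize(text):
    """Lowercase the text and split it into letter-only tokens."""
    return WORD.findall(text.lower())

def collocates(tokens, node, window=5, top=35):
    """Count the words appearing within `window` tokens on either side of
    each occurrence of `node`; return the `top` most frequent."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts.most_common(top)

# Invented sample sentence.
SAMPLE = "She doth loue him, and he doth loue her; she will haue him braue."

tokens = tokenize(SAMPLE)
near_she = collocates(tokens, "she", window=2)
print(near_she)
```

Changing `window` changes the results, which is exactly why an undocumented proximity setting makes a plot hard to interpret.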