‘The Text is Variety’: Contextualizing and Analyzing the Works of Margaret Cavendish with Text Encoding

Below are lecture notes from Sarah Connell’s presentation at the 2017 International Margaret Cavendish Society Conference. The slides are available as a separate file here.

Okay, so, since one of the themes of this conference is how Cavendish was received, I want to begin with a quote about her from a text in Women Writers Online.

So, here we have Elizabeth Benger on Cavendish, speaking of her fertile fancy, her uncommon genius, her wildness and inaccuracy, and her voluminous works. And, as much as this feels like a textbook example of damning with faint praise, I have to say I find myself sympathizing with Benger when she speaks of Cavendish’s wildness—you see, I didn’t come to this project expecting to work on Cavendish at all. I was trying to do research with Women Writers Online as a collection but I found that Cavendish just kept popping up. Her works started to feel wild precisely because they are so voluminous; they represent a very significant percentage of our corpus, so it’s not really surprising that they were so prominent in all of my searches through the collection. But, I’ve found that with Cavendish, it’s not just about sheer numbers; she also was showing up in my research because her texts have a high number of unusual phenomena. It seemed as if, whenever I found some textual feature that was unique to a particular author, that author would be Cavendish. Well, or Eleanor Davies. But, it was Cavendish a lot of the time. So, clearly, Cavendish called for a research project of her own, which is what I’m going to share with you today. But first, I’ll give you a bit of background.

So, as I said, I was working from the Women Writers Online collection, which has about four hundred texts by women. These are largely print texts, although we do have one manuscript collection with the Almanacks of Mary Moody Emerson. We have a relatively broad chronological framing, 1526 to 1850, and the texts themselves are quite generically diverse. These texts are published in a web interface called Women Writers Online, but they’re encoded in TEI, which is much more detailed and information-rich than we’re able to show on the web.

And here’s what I mean by information rich; in fact, I’ve simplified this and all other examples of our encoding to make it more readable. TEI markup is a very complicated and diverse topic, so I’ll focus on the basics here. We use elements, such as this <head> element, which marks that “Scene 8” is a heading. Here is a <div> element, for a textual division. The TEI is very, very good at labeling things—saying, for example: this is a stage direction, this is a division, this is a paragraph, this is a speaker label—and it’s very good at marking their boundaries; this stage direction starts here and ends here. The TEI is also good at showing hierarchical relationships, the nesting of textual features; so, here we have a <sp> element, used to mark a dramatic speech—and, inside of that, we have a speaker label and a paragraph. There’s no ambiguity that this speaker label and this paragraph belong together, because they’re both in the same <sp>. In addition to elements, the TEI also has attributes, which are kind of like adjectives. They give more information about their elements. For instance, we have three examples of the @type attribute, one on <div>, asserting that the type of division we have is a scene and two on <stage>, describing which types of stage directions we have. This @who attribute points to a cast list elsewhere in the document, where we’ve defined “ign” as referring to Lady Ignorant. That way, every time she speaks, we’ve marked those speeches as belonging to her in a way that’s easily readable by a computer. There’s no ambiguity, even if the speaker label is missing or incorrect. Okay, so, like I said, this is a big topic, but that covers the basics. Elements both name and mark the boundaries of features within a textual hierarchy and attributes provide more information about elements. 
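
To make the basics above concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The fragment is a simplified, invented stand-in for the slide, not the WWP's actual encoding: it assumes no namespaces, a `who` value written as `#ign`, an `xml:id` on the cast list's `<role>`, an illustrative `type` value on `<stage>`, and a made-up line of dialogue.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical TEI-style fragment (no namespaces, invented dialogue).
SAMPLE = """<text>
  <castList>
    <castItem><role xml:id="ign">Lady Ignorant</role></castItem>
  </castList>
  <div type="scene">
    <head>Scene 8</head>
    <stage type="entrance">Enter Lady Ignorant.</stage>
    <sp who="#ign">
      <speaker>Lady Ignorant.</speaker>
      <p>An invented line of dialogue.</p>
    </sp>
  </div>
</text>"""

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

root = ET.fromstring(SAMPLE)

# Build a lookup from cast-list identifiers to character names.
roles = {role.get(XML_ID): role.text for role in root.iter("role")}

# Resolve each speech's @who pointer, independent of the printed speaker label.
speeches = [
    (roles[sp.get("who").lstrip("#")], sp.findtext("p"))
    for sp in root.iter("sp")
]

for name, words in speeches:
    print(f"{name}: {words}")
```

Because the speaker is resolved through `who` rather than through the printed `<speaker>` label, the attribution would still hold if that label were missing or misprinted.
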
My work has been on how we can use this markup in literary research; I’ve been developing methodologies for asking questions about our collection, taking advantage of the really enormous amount of information that’s available in encoded texts. So, turning to Cavendish now.

Here’s what we have of hers. Depending on how you count things, we have at least nine and as many as twenty-seven works by Cavendish, if you count each play separately. When you’re only talking about 400 texts, that really is quite a high percentage. And, if you use the markup to get into the details of those texts, you can get an even better sense of just how much Cavendish there is.

We have over a million words, more than 15 thousand paragraphs, 13 thousand lines of verse, and 11 thousand dramatic speeches. There are almost 3,500 page breaks, which I had to double-check, because it didn’t seem believable to me. But, that’s correct. In addition to those basic structural elements, we also have markup for quotations and for phrase level features like names of persons and places, as well as the proper names of works, encoded with <title>. So, that’s one way that the markup can give you a sense of what’s in the collection of Cavendish works in WWO. And, here’s another.

As I mentioned, we use the @type attribute to categorize our textual divisions, so you can count those and see how our Cavendish materials fall into the WWP’s categorizations. Essentially, you can use text encoding to get a profile of a particular text or set of texts; there are this many poems, that many scenes, and so on. Even in these basic counts, we’re already starting to see potentially interesting patterns, particularly around paratexts. Cavendish’s works have quite a lot of general prefatory materials, for example, but much less general concluding material. Epilogues and prologues, on the other hand, are nearly evenly balanced. There’s just one advertisement and one table of contents. And so on.
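
The element counts and the division profile described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample is invented, it has no namespaces, the `type` values (`prologue`, `act`, `scene`, `epilogue`) are placeholders rather than the WWP's actual vocabulary, and the word count is a rough tokenization, not the method behind the figures reported here.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Invented, simplified sample; type values are hypothetical placeholders.
SAMPLE = """<text>
  <div type="prologue"><lg><l>An invented opening line</l></lg></div>
  <div type="act">
    <div type="scene">
      <sp><speaker>First.</speaker><p>One short speech.</p></sp>
    </div>
    <pb n="2"/>
    <div type="scene">
      <sp><speaker>Second.</speaker><p>Another speech.</p></sp>
    </div>
  </div>
  <div type="epilogue"><lg><l>An invented closing line</l></lg></div>
</text>"""

root = ET.fromstring(SAMPLE)

# How many of each element are there (paragraphs, verse lines, speeches, page breaks)?
element_counts = Counter(el.tag for el in root.iter())

# Rough word count: every run of word characters in the text content.
word_count = len(re.findall(r"\w+", " ".join(root.itertext())))

# Profile of textual divisions by their @type attribute.
div_types = Counter(div.get("type", "(untyped)") for div in root.iter("div"))

print(element_counts["p"], element_counts["l"], element_counts["sp"], element_counts["pb"])
print(word_count)
print(div_types.most_common())
```

Run over a whole set of files instead of one string, the same three counters would give the kind of profile discussed above: so many poems, so many scenes, and how prologues compare with epilogues.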

For basic element counts and types of textual divisions, there’s really just too much Cavendish to compare with anyone else in the collection. But, looking at language usage, we can compare different authors. I’ve given you an example of the markup that makes this kind of query possible; the @xml:lang attribute has values from a controlled vocabulary for describing languages. This attribute can go on any element to indicate its language and, if there’s no more appropriate element, you can use <foreign>, as I’ve shown here. So, for all of these authors, French and Latin dominate across the board, with Italian coming in third. But, the relative percentages are different in Cavendish; she has about twice as much Latin as she has French, which does set her apart within this group, but puts her in line with Women Writers Online as a whole. In total, we have about 2,000 instances of Latin, 1,600 of French, and 200 of Italian. Relative percentages of Latin and French are very much a distinction of period. If you look in the seventeenth century, there is about four times as much Latin as there is French; in the eighteenth century, there’s twice as much French as there is Latin. Which, I suppose, doesn’t really surprise anyone who’s worked in those periods, but it is, I think, reassuring to know that markup-based results can be verified by what we already know.

Okay so, getting a bit more complex than simple counts, we can also ask questions about where elements of interest are appearing. In my research, I’ve discovered that it’s useful to look at both general patterns, where particular elements most often appear, and at outliers: where there are unusual cases. So, here’s just one such unusual case:

I’ve been doing a fair amount of work on intertextuality, for a current project at the WWP, so I wanted to look at where <title> elements for proper names of works were appearing. For a bit of context, there are more than 5,000 <title> elements in Women Writers Online, and these generally show up in bibliographic citations, notes, advertisements, and, quite often, just in prose paragraphs. By contrast, only about sixty appear in drama as I’ve identified it here, using a fairly conservative definition. As you can see, Cavendish comes in just after Cowley for number of titles named in drama. Now, remember that encoding is really good at making layered textual hierarchies explicit, so once you’ve narrowed to this definition of drama, you can then go look at the elements inside of drama to get more specific about where titles appear. Most of them are in prose rather than verse. About forty of these titles are in the <sp> element, that is they’re named by the characters in the play, about fifteen are in stage directions. In the whole of Women Writers Online, there are just three titles in cast lists; all in the works of, you guessed it, Margaret Cavendish.
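
A query like the one above, asking where in the dramatic hierarchy each `<title>` sits, can be sketched as follows. This is a hedged illustration, not the query actually used: the sample text and titles are invented, there are no namespaces, `nearest_context` is a hypothetical helper, and the definition of drama is reduced to three containing elements. ElementTree keeps no parent pointers, so the sketch builds its own child-to-parent map.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented, simplified sample; titles and wording are placeholders.
SAMPLE = """<text>
  <div type="prose"><p>A paragraph naming <title>Some Treatise</title>.</p></div>
  <div type="play">
    <castList>
      <head>Names for a farce to follow <title>An Invented Play</title></head>
      <castItem><role>A Lady</role></castItem>
    </castList>
    <stage>Enter a Lady reading <title>A Small Book</title>.</stage>
    <sp><speaker>Lady.</speaker>
      <p>I have read <title>One Work</title> and <title>Another Work</title>.</p>
    </sp>
  </div>
</text>"""

root = ET.fromstring(SAMPLE)

# Map every element to its parent so ancestors can be walked.
parent_of = {child: parent for parent in root.iter() for child in parent}

def nearest_context(el, contexts=("sp", "stage", "castList")):
    """Return the tag of the closest enclosing element listed in `contexts`, or None."""
    node = parent_of.get(el)
    while node is not None:
        if node.tag in contexts:
            return node.tag
        node = parent_of.get(node)
    return None

# Tally titles by the dramatic feature that contains them.
title_contexts = Counter(nearest_context(t) for t in root.iter("title"))
print(title_contexts)
```

Titles outside any of the three contexts are tallied under `None`, which keeps the prose cases visible rather than silently dropping them.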

Here’s one of those. The paragraph above gives a bit of context from elsewhere in the text and the encoding below shows you the markup I found in my search: essentially, Plays Never Before Printed contains a fragmentary play that was meant to be published with the Blazing World; as Cavendish explains, she found her “genius did not tend that way” so she left the project behind, but did “suffer” the piece to be published in the 1688 Plays collection. Then as the heading in the encoded cast list explains, Cavendish also authored characters’ names for a farce that would have followed the play in the Blazing World. But, the first play being unfinished “the farse was not so much as begun.” Nevertheless, Cavendish did include the farce’s cast list in her collection and that’s what you’re seeing here. To my mind, this is a particularly clear example of how unusual instances in the encoding—title elements within cast lists—are effective at pinpointing noteworthy textual phenomena. You might also have noted that this <title> element references one of Cavendish’s own works, which is something else that can be examined with markup. So, here are Cavendish’s most-named titles.

Our current work on intertextuality will make this search much more precise, but for now we’re still relying on a degree of human intervention, and there’s a chance I’ve missed some titles if the spelling variations were significant enough. But, even with that in mind, you can still see some overall patterns. I think the immediately obvious aspect of these results is that the titles Cavendish is naming are, often, Cavendish titles. This isn’t really unusual, though I haven’t seen any other author in WWO reference her own work quite this extensively. In fact, if you look at all of the titles named in all of our seventeenth-century texts, Philosophical and Physical Opinions still comes out in the top three. So, what were other seventeenth-century writers naming? That’s something else that can be queried with the markup.

I ran the same search in the non-Cavendish texts that had publication dates in the seventeenth century and the results were…rather different. First of all, I should note that the search for <title>s is actually underreporting biblical references because the WWP uses a different element in cases where writers cite biblical texts by chapter and verse; these are just references to the titles of entire biblical books. With that in mind, I wanted to look at biblical citations as well and I found that, for the seventeenth century, there are another 1869 chapter-and-verse biblical citations. Two of those are in works by Cavendish. So, I think it’s fair to say that her citation practices are measurably different from other seventeenth-century women writers, in ways you can track with text encoding.

Finally, I’d like to close with an example of some research I’ve really just begun. I’m at the stage now of gathering results and I’m not yet sure precisely what all of this means, but that’s actually something I’d hoped that you all might be able to help with. So, I’ve been looking at a particular element, <mcr>, which is an element that was actually created by the WWP. <mcr> stands for “meaningful change in rendition.” “Rendition” means the appearance of the text, for example, is it italicized, underlined, in all caps. We consider text “renditionally distinct” when its appearance shifts to be different from the text around it, for example words that are italicized when surrounding text isn’t. Often, words will be renditionally distinct if they’re names, or if they’re foreign-language words, or if they’re being emphasized. But sometimes they’ll be renditionally distinct in ways that we can’t attribute to naming or linguistic features and that’s when we use <mcr>, to say: there is a change in rendition here, and we think it’s meaningful, not just decorative, but we’re not able to be more precise about why the rendition has changed.

So, I wanted to examine the words in Cavendish’s texts encoded with <mcr>. Here’s what I’ve found; this is a listing of the most frequent words in <mcr> by unique occurrences, so, for example, the word Atomes also shows up many other times with adjectives like sharp atomes, flat atomes, round atomes, fiery atoms and so on. Here, Cavendish follows a usual pattern for WWO, in which words in <mcr> are generally nouns and usually capitalized. Now, as satisfying as it is to survey entire corpora with a few keystrokes, one thing I’ve learned in my research is that it’s very important to be moving back and forth between collection-wide results and individual texts. And, in fact, one of the things I find really valuable about the methods I’ve been establishing is that they make it possible to move seamlessly between these bird’s-eye views and the texts on the ground, so to speak. <mcr> usage in Cavendish is a really good example of why it is important to keep individual texts in focus, because, in fact, most of the words in this slide are from a single text.
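
A frequency listing of this kind, counting the exact contents of each `<mcr>` so that “Atomes” and “sharp Atomes” are tallied separately, might look like the sketch below. The verse lines are invented for illustration and are not quotations from Cavendish; the sample assumes no namespaces, whereas `<mcr>` in the real files is a WWP-specific element.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented verse lines, not quotations; <mcr> marks renditionally distinct text.
SAMPLE = """<lg>
  <l>Small <mcr>Atomes</mcr> meet, and <mcr>Atomes</mcr> part,</l>
  <l>While <mcr>sharp Atomes</mcr> pierce the <mcr>Heart</mcr>.</l>
  <l>So <mcr>Death</mcr> and <mcr>Atomes</mcr> play their part.</l>
</lg>"""

root = ET.fromstring(SAMPLE)

# Count the exact (whitespace-normalized) content of every <mcr> element.
mcr_counts = Counter(
    " ".join("".join(m.itertext()).split()) for m in root.iter("mcr")
)

for phrase, n in mcr_counts.most_common():
    print(n, phrase)
```

Grouping the counts by source text as well as by phrase would be one way to surface the point made above, that a listing like this can be dominated by a single work.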

In fact, of those almost 15,000 <mcr> elements in Cavendish, 13,710 are in Poems and Fancies, marking italicization shifts. And when you see this line group, you can start to see how those numbers got so high. It’s worth noting that there is *nothing* in Women Writers Online that comes remotely close to this proliferation of meaningful changes in rendition. The next highest text is Jane Barker’s Poetical Recreations (1688), with about 5,000 <mcr> elements. Judith Murray’s The Gleaner (1798) has about 3,000 and Elizabeth Rowe’s Poems on Several Occasions (1696) has 1800. Only eight texts in the whole collection have more than a thousand <mcr> elements. And, certainly, there are quite a few verses in Poems and Fancies like this one where nearly every noun is italicized.

What I’ve actually discovered, though, is that there are still plenty of nouns that are not distinct; and, in fact, when you look word-by-word, you can see some interesting patterns in where words are or aren’t distinct. I’ve begun looking at individual words from Poems and Fancies, particularly those that are well represented in both the renditionally distinct and the non-distinct columns. So, by contrast, Atomes is almost always renditionally distinct, to the point where I’d wonder whether the two non-distinct instances are actually errors. With terms like “love” and “reason” that have a more even split, there are pretty clear patterns about which are distinct. When love is used as a verb (as in “love to play”) it tends not to be distinct. When it’s a noun (“love and hate”, for example) it’s likelier to be distinct. When reason is a verb, or used in constructions like “the reason why” it tends not to be distinct. Capital-R Reason as in “The Rule of Reason” tends to be distinct. These aren’t hard and fast differences, but they’re recognizable tendencies. You see the same thing with “feare” and with “care”; noun forms, particularly those referring to abstract concepts, tend to be distinct where verb forms aren’t.
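
The two columns described above, renditionally distinct versus non-distinct occurrences of a given word, can be approximated with a short tree walk. This is a sketch under assumptions, not the method used for these results: the lines are invented rather than quoted, there are no namespaces, `rendition_tally` is a hypothetical helper, and matching folds case, so it would not by itself separate “Death” from “death”. Part of speech is not in the markup, so telling noun from verb uses would still need reading or a separate tagger.

```python
import re
import xml.etree.ElementTree as ET

# Invented verse lines, not quotations from Poems and Fancies.
SAMPLE = """<lg>
  <l>For <mcr>Love</mcr> and <mcr>Hate</mcr> do often meet,</l>
  <l>Though some love to play, and think it sweet;</l>
  <l>And <mcr>Love</mcr> will reason not at all.</l>
</lg>"""

def tokens(text):
    """Lowercased alphabetic tokens of a string (empty list for None)."""
    return re.findall(r"[a-z]+", (text or "").lower())

def rendition_tally(root, word):
    """Count occurrences of `word` inside <mcr> ('distinct') and outside it ('plain')."""
    word = word.lower()
    counts = {"distinct": 0, "plain": 0}

    def walk(el, inside_mcr):
        inside = inside_mcr or el.tag == "mcr"
        key = "distinct" if inside else "plain"
        counts[key] += tokens(el.text).count(word)
        for child in el:
            walk(child, inside)
            # A child's tail text follows the child, so it belongs to `el`.
            counts[key] += tokens(child.tail).count(word)

    walk(root, False)
    return counts

root = ET.fromstring(SAMPLE)
print(rendition_tally(root, "love"))
print(rendition_tally(root, "reason"))
```

Printing the surrounding line alongside each hit, rather than only the totals, would make it easier to check by eye whether the distinct cases are the noun uses.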

Cases where words are usually distinct, with some exceptions, are also interesting. With Death, the non-distinct cases are all but one lowercase and all but one (a different one) clustered at the end of Poems and Fancies. You see the same sort of thing with “gods”; all but one of the non-distinct instances are lowercased and they’re fairly tightly clustered. I’ve only just started working with this material and I’m still figuring out how to make sense of it all, but I do think there’s something interesting here and, as I said, I’d be grateful for your thoughts.

Finally, I’ve also found that some words, like delight, tend to be non-distinct, so I’ve been looking at the cases where they are distinct to see whether that might have a particular significance. I’ve given you an example of one such usage here, partly because I think it highlights a pattern I want to investigate next—is there a correlation between verses that have very high instances of italicized terms and distinction in words that otherwise tend not to be distinct? This is a fairly large question, but it is one that the encoding makes it possible to answer. So, in the example I’ve included here, not only is delight italicized, but also horses, carts, cows, butter, and milk, among quite a few others. I’ve chosen to end with this verse not just because it does show that high rate of italicization but also because it is an example of the real pleasure I’ve had in making new discoveries in our collections through the research I’ve been doing, since it contains what is very possibly my favorite example of any term inside of <mcr>: I’m speaking, of course, about the unforgettably-named “friendship cheese.” Thank you!

