“Day of DH” Snapshots of Our Daily Lives

The Women Writers Project is proud to host our local Digital Scholarship Group “Day of DH” post this year. “Day of DH” gives members of the DH community an opportunity to share “day in the life” vignettes with one another. For more information about “Day of DH,” please visit the official site or follow the Twitter hashtag #DayofDH. I hope these snapshots offer a fun sampling of the people, activities, and work that make up the DH community at Northeastern.

Julia Flanders, Director of the Digital Scholarship Group and the Women Writers Project

This year for “Day of DH” I had an unusually substantive day. In the past I’ve sometimes found myself trying to create an inspiring narrative about the relevance of administrative work, but today I did some genuinely digital-humanities things. My first activity was a meeting of the research group for a seedling grant focused on using the Women Writers Project corpus with Word2Vec. In the coming year we’ll be expanding some tools Ashley Clark developed that produce a modified version of the WWP’s TEI/XML markup, from which we can then extract plain-text data to feed into the word vector analysis. The modifications handle things like hyphenated words broken across a line break (representing each as a single word for analysis purposes) and select the regularized spelling for words that the WWP has marked for regularization. The resulting output yields more meaningful results in the word vector analysis, since it doesn’t include word fragments and typographical variants. We sat down together as a group and installed the current version of Ashley’s XSLT and XQuery routines, so that as the grant work gets going we can all experiment together.
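The line-break case is easy to illustrate in plain text. The Python sketch below is not Ashley Clark’s XSLT; it assumes, hypothetically, that encoders have already marked line-end soft hyphens with the Unicode soft-hyphen character (U+00AD), so that they can be told apart from hard hyphens that belong to the word.

```python
import re

# Assumption (hypothetical): line-end soft hyphens are marked with U+00AD,
# while hard hyphens that belong to the word are ordinary "-" characters.
SOFT_HYPHEN = "\u00ad"

# A soft hyphen, optional trailing spaces, a newline, optional leading spaces.
LINE_BREAK_SPLIT = re.compile(SOFT_HYPHEN + r"[ \t]*\n[ \t]*")


def rejoin_soft_hyphens(text: str) -> str:
    """Rejoin words that a soft hyphen split across a line break."""
    return LINE_BREAK_SPLIT.sub("", text)


if __name__ == "__main__":
    sample = "the learned Wo\u00ad\nman and her well-\nknown friend"
    print(rejoin_soft_hyphens(sample))
```

Because only the soft hyphen triggers a join, “well-” followed by “known” stays split; that distinction is exactly what careful encoding preserves and what a naive plain-text cleanup would lose.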

After that, the Digital Scholarship Group had its weekly staff meeting at which we discussed the recently announced NHPRC/Mellon “Digital Edition Publishing Cooperatives” funding program, and the potential it might hold for DSG. Then in the afternoon, Syd Bauman and I taught the second session of a short and intensive workshop on schema-writing with RelaxNG, for graduate students in Northeastern’s Digital Humanities Certificate program.

A good and enjoyable day with wonderful colleagues. I feel really lucky to have these moments of routine productivity amid more uncertain and threatening circumstances.

Sarah Connell, Assistant Director of the Women Writers Project and the NULab for Texts, Maps, and Networks

You can get a reasonable picture of my day by looking at “before” and “after” versions of my to-do list, combined with my calendar. Today was a fairly standard Thursday in that it was mostly meetings, with other work happening in the gaps between. On my train ride in and for the first half-hour of the day, I prepared for a training session I have tomorrow and sent out a scheduling notice for an upcoming NULab faculty meeting to plan next year’s programming, which will focus on the theme of fake news and disinformation. I also checked one of our WWO texts to confirm my suspicion that a semicolon really needed to be a period (I was right). I replied to a few emails as well (there are always emails), and I made some incremental progress in reviewing the newest set of Women Writers in Context exhibits for publication.

Then, Ashley Clark and I met with the team who will be working on a new WWP project, funded by one of Northeastern’s TIER 1 grants, to set up a prototype vector space analysis web platform for Women Writers Online. This was a fun meeting because we were getting the whole team up and running with the XSLT and XQuery transformations necessary to take encoded texts and prepare them for analysis using Ben Schmidt’s word2vec package in R. It was a good chance for me to practice walking people through these processes and, as always, there were some new wrinkles that came up, which Ashley and I will now be able to anticipate the next time we teach this. That meeting ran late, so I ended up going right into the Digital Scholarship Group team meeting (which actually just meant moving to a different seat on the couch in our media lounge).

After the DSG meeting I grabbed a bit of lunch and sent a few more emails, including a scheduling message for a meeting on using the CERES Toolkit in a class on Literature and Digital Diversity that Elizabeth Dillon and I will be teaching in the fall. I was also able to take care of a few WWP admin tasks before the next meeting—in this case, actually a workshop on RELAX NG and schema planning, the second of two sessions led by Julia Flanders and Syd Bauman. After that workshop, Julia and I had our weekly meeting, which enabled me to check off a few items on my to-do list, particularly around our planning for the DH Certificate and for the work that the WWP and other DSG & NULab projects will be doing over the summer. As often happens, I added a few new items to my to-do list as well.

Finally, it was time for a Barrs Lecture, “Senecan Inwardness and the Staging of Race in Titus Andronicus and Othello” by Curtis Perry, followed by dinner with the speaker and then a train ride home (during which I’ll probably write more emails). I’m sending this for posting prior to the lecture and I’m really looking forward to it.

And now it’s time to check off one last item on my to-do list: “Write Day of DH post.”

Sarah’s “to-do” list at the beginning of the day.
Sarah’s “to-do” list at the end of the day. At the WWP we are all amazed at everything Sarah manages every single day.

Ashley Clark, XML Applications Programmer

This morning I assisted Sarah Connell in introducing the process we use to generate full-text versions of Women Writers Project TEI. The process consists of an XSL transformation I wrote to regularize things like <choice> elements and soft hyphens: phenomena that the WWP encoders have dutifully transcribed, but whose implications can be lost when one strips out the markup and retains only the text content. For example, a typo transcribed as:

will, when the encoding is stripped out, appear like this:

The XSLT creates a normalized version of the WWP TEI, moving non-useful text into an attribute I’ve called ‘read’ (as in, “for this element, read ‘This’”):

which translates into this plain text version:

But! Since the original text content is preserved in `@read`, you can reconstitute it and use XPath to find the matching phrase in its original context:

`//text//p[matches(normalize-space(.),'the Emrppre[sſ]s')]`

(Note that I haven’t yet made explicit the normalization of long-S to regular S. Ideally, the XSLT would use @read for the long-S as well, so you wouldn’t have to resort to regular expressions.)
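For readers who want to experiment outside XSLT, here is a rough Python analogue of the idea, using only the standard library. It is a sketch under stated assumptions, not the WWP’s routine: the TEI sample and its typo (“Tbis” for “This”) are invented, the TEI namespace is omitted, and the placement of the `read` attribute (here on the `<choice>` element, holding the reading that was set aside) is a guess at the output structure. The last function mirrors the XPath above by searching the reconstituted original text, with `[sſ]` tolerating long-S.

```python
import copy
import re
import xml.etree.ElementTree as ET

# Invented TEI-like sample; real WWP files use the TEI namespace
# (http://www.tei-c.org/ns/1.0), which is left out here for brevity.
SAMPLE = """<text><body>
<p><choice><sic>Tbis</sic><corr>This</corr></choice> is the Houſe of Lords.</p>
</body></text>"""

# (set-aside reading, preferred reading) pairs that can appear inside <choice>
PAIRS = [("sic", "corr"), ("orig", "reg"), ("abbr", "expan")]


def normalize_choices(root: ET.Element) -> None:
    """Keep the preferred reading as text; park the other in @read (hypothetical)."""
    for choice in list(root.iter("choice")):  # copy the list: we mutate while looping
        for drop, keep in PAIRS:
            dropped, kept = choice.find(drop), choice.find(keep)
            if dropped is not None and kept is not None:
                choice.set("read", "".join(dropped.itertext()))
                choice.text = "".join(kept.itertext())
                choice.remove(dropped)
                choice.remove(kept)
                break


def normalize_space(s: str) -> str:
    """Collapse runs of whitespace, like XPath normalize-space()."""
    return " ".join(s.split())


def reconstitute(elem: ET.Element) -> ET.Element:
    """Return a copy in which each @read value is restored as the element text."""
    original = copy.deepcopy(elem)
    for e in original.iter():
        if "read" in e.attrib:
            e.text = e.attrib.pop("read")
    return original


def find_original(root: ET.Element, pattern: str) -> list:
    """Paragraphs whose reconstituted original text matches the regex."""
    return [p for p in root.iter("p")
            if re.search(pattern, normalize_space("".join(reconstitute(p).itertext())))]


root = ET.fromstring(SAMPLE)
normalize_choices(root)
print(normalize_space("".join(root.find(".//p").itertext())))
print(len(find_original(root, r"Tbis is the Hou[sſ]e")))
```

Parking the set-aside reading in an attribute, rather than deleting it, is what makes both directions possible: `itertext()` yields clean text for analysis, while the attribute still lets you recover and search the text as printed.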

Lara Roberts, PhD Candidate in English

Lara’s Day of Digital *Human*ities

0930-1100 I was part of a group that transformed the WWP corpus with XSLT and XQuery to use later with the word2vec R package.
1130-1300 I went to our weekly meeting for the Early Caribbean Digital Archive. We were so excited working on prepping the website for launch that I forgot to take a picture. Instead, here’s a slide with pictures of the team members (past and present)!
1300-1600 I joined my cohort in our teeny office for our weekly work time, trying to understand data analysis through RStudio.
1600 Usually, at some point, we have to go get snacks to keep our brains fueled, before…
1630-1900 I ended the day in the always challenging and entertaining Humanities Data Analysis class.

Joanne DeCaro Afornalli, Outreach Coordinator for the Women Writers Project

After a brisk morning walk with my exceedingly energetic little puppy Brooke, I settled in with some tea and emails. I was very excited to see a congratulatory email from David Lazer, Co-Director of NULab, on a recent presentation I gave for the NULab faculty on my Digital Humanities Certificate project. Afterwards, I spent some time looking over a new contribution for our Intertextual Networks series. I’m really looking forward to sharing Cassie Childs’ upcoming post on Delarivier Manley’s Letters Written by Mrs Manley and food history. It includes some fascinating analysis of archival images from eighteenth-century recipe books and botanical guides, and the post’s images immediately struck me with their beauty and nostalgia.

My big event of the day was attending Northeastern’s Academic Honors Convocation to receive the Outstanding Graduate Student Award for Experiential Learning. The award recognizes a graduate student who has “shown an extraordinary capacity to integrate academics and professional work, and establish themselves as an emerging leader in their field.” I was highly honored to receive it, and very glad I could share the experience with my advisor Elizabeth Maddock Dillon, my Co-op coordinator Lisa Cantwell Doherty, and Marina Leslie (who so kindly nominated me for the award).

Now that I’m home for the night, I plan on making the final minor formatting touches to my master’s thesis and then submitting it to ProQuest! My thesis, “Angelenos Incarcerated: The LA County Jail Oral History Project,” is a DH project that features the oral histories of ex-inmates told through videography, mapping, exhibits, and encoded texts (with a customized TEI schema). You can view the project’s website here.

Overall, it was a pretty big day, though not necessarily my heaviest DH day. But I was so honored to have the multimedia and digital humanities work I do recognized in a big way today, and I was beyond grateful to have such an amazing group of women cheering me on.

Liz Polcha, PhD Candidate in English

Cara Messina, PhD Candidate in English

This morning I woke up feeling the familiar finals anxiety. Even so, I pushed myself to attend the RelaxNG workshop run by Julia Flanders. Thanks to learning the different approaches to schema building (and Julia’s excellent scaffolding and metaphors), I have begun creating a flexible XML schema that I plan to use as a pedagogical tool next semester. Learning new DH tools is the perfect form of productive procrastination!

After the workshop, I attended Ryan Cordell’s final Humanities Data Analysis class. Throughout the semester, we’ve used R to analyze our corpora; my corpus contains the metadata and the texts themselves of 3,000 Korra x Asami (Korrasami) fanfiction stories from Archive of Our Own. We went over topic modeling and classification again; Ryan encouraged us to embrace topic modeling’s lack of stability. Although most of the class revolved around discussing challenges and asking and answering questions about our struggles with R, we had a few laughs reading Day of DH tweets and reflecting on the semester.

Bill Quinn, PhD Candidate in English

Today for DH, I worked on writing my prospectus. I wrote about how computational text analysis will help me explore intertextuality in modernist magazines. It feels really weird writing about what computers do between inputting the data and rendering the visualizations, and I am trying to figure out how some people do it so well. Fortunately, Stanley the dog was there to help out.
Cavendish X Molière: Braiding The Politics of Inter-Gender Dialogue

This post is part of a series authored by our collaborators on the Intertextual Networks project. For more information, see here. 

By Arnaud Zimmern, Ph.D. Candidate in English, University of Notre Dame

Were it not for matters of language (Margaret Cavendish’s French was, like Molière’s English, non-existent), the titular resonance between her 1662 The Female Academy and his 1662 L’Ecole des femmes would defy coincidence. Similarly, the ambitions for all-female education and for celibate female autonomy at stake in her 1668 The Convent of Pleasure would find their satirical counterpoint in his 1672 Les Femmes Savantes. And thus the influence of Restoration England’s most under-appreciated female playwright on early modern France’s most admired male comedian would be unmistakably sealed.

Unfortunately, as Laura Carraro and Antonella Rigamonti intimated in 2000, the scanty evidence that the two playwrights ever referenced each other means that Cavendish’s plays and persona as a learned lady can only be considered a loose “subtext” of Molière’s, nothing more (138). Whether these contemporaries read each other’s works or drew on common sources of inspiration remains a question of intertextuality clamoring for further elucidation. But let me propose that we take intertextuality from a less verbal and more structural vantage point. In the absence of a common language and common sources, did Molière and Cavendish share a common dramaturgical approach? More specifically, did they stage dialogue in similar ways, especially dialogue between witty, learned women and the men who would oppose and/or espouse them? If both playwrights staged the figure of the learned lady, whether satirically or heroically, did they give her distinct, idiosyncratic modes of conversation or analogous ones? That, at least, is the question I’m setting out to answer in my project for Intertextual Networks. In this first post I want to present in some detail the historical background I’m addressing, and to introduce the particularities, strengths, and weaknesses of the visualization method I am currently developing to track and compare dialogue, a method we can provisionally call “braiding.” My hope is that in bringing a visualization method to bear on the work of a woman writer who disparaged the Royal Society for its microscopes and who rightly elevated baking and cosmetics to the status of “chymistry,” I’ve at least paid her the homage of drawing my guiding metaphor from the realm of brioches and hair fashion.

That Cavendish knew of Molière by the time she self-published her second volume of plays, Playes Never Before Printed (1668), is rather safely attested. In 1667, her husband, William, Duke of Newcastle, translated the Frenchman’s early play L’Etourdi (later revised and staged by Dryden). In September of the following year, William was also the dedicatee of Thomas Shadwell’s The Sullen Lovers, an overt adaptation of Molière’s Les Fâcheux. William’s contributions in verse to Margaret’s The Convent of Pleasure, which she carefully identified with individually pasted markers in the folio editions of the Plays (think early-modern Post-Its), suggest that the couple collaborated and would have discussed the latest trends in the comédie de moeurs (comedy of manners), as several scholars have noted.1

It would also be difficult to overstate the press surrounding Molière’s L’Ecole des femmes, in equal parts a box-office smash and a tabloid scandal. The controversy cut across both Molière’s professional and private lives. Hardly four years into his Paris career, Molière single-handedly relaunched the querelle des femmes, or debate on women’s education, as he satirized the efforts of the middle-aged Arnolphe, who tries to keep his ward and bride-to-be, Agnès, untainted by all forms of knowledge, be they scientific or carnal. Molière opened himself to identical satire as he set about marrying his young ward, the actress Armande Béjart, whom many alleged to be his own illegitimate daughter. In February of 1662, under a chorus of wedding bells and a storm of gossip, Molière effectively succeeded where Arnolphe comically fails.2 Unabashed, Molière rode the waves of popular attention to financial gain, producing in the following year a response play, La Critique de l’Ecole des femmes, which netted record profits.

Charles Robinet’s Panégyrique de l’Ecole des Femmes, the last of several published critiques of the original play, reports on the stir that Molière’s Ecole caused especially in England, where debonair British husbands allegedly found the play’s male protagonist too tyrannical in his efforts to preserve Agnès’ innocence and ignorance.3 Whether Robinet’s report can be taken at face value remains to be corroborated. For instance, his claim that the British have little appetite for Molière’s variety of “languishing comedies” but feed rather on a regular diet of the purest Tragedy is historically specious.4 The most recent study of Molière’s impact on England suggests rather that his brand of comédie de moeurs, combining social documentary and lampoon, “parallels… the great manners tradition in Restoration comedy.”5 But Robinet’s characterization of laxer, more complaisant British husbands does seem to match Margaret Cavendish’s portrait of an obliging William Cavendish in her defensive biography, the Life of the Duke (1667). So with that point of sympathy in mind, we can conclude that if Margaret did not hear about L’Ecole des femmes from the public sphere, she knew of its author from the private sphere of her husband’s theatrical work and collaboration, and of its themes from the kind of gossip that dogged her own marriage, as snide critics labelled her a pseudo-intellectual and her husband lackadaisical.

Whether Molière, in turn, knew of the Duchess of Newcastle or of her works is a question altogether harder to answer and perhaps less promising. If he knew of her or of other learned women’s intellectual ambitions, he seems to have made both much and little of them, as the whim suited him. Ian MacLean reminds us: “Why should an opportunistic playwright in search of controversial material limit himself to a single view or consistent line? Education for women is implicitly defended in L’École des femmes; its excesses are attacked in Les Femmes Savantes. Women’s literary creativity in the form of romances is satirized in Les Précieuses Ridicules; the restrictions of women to such domestic activities as needle-work and sewing and their exclusion from education are impugned in L’Ecole des femmes.”6 Scholars often point to Mademoiselle de Scudéry, the seventeenth-century female novelist, as a particular target of ridicule in Molière’s misogynistic plays. But there is no reason Cavendish should be exempt from her company, for Molière’s satires are capacious. Specific evidence, however, remains hard to come by in the plays themselves.

At stake, then, in finding intertextualities beyond the usual inter-citational or referential patterns are two portraits. The first is that of Cavendish as a playwright more acutely aware of, adaptable to, and critical of continental trends in dramaturgy than the recent scholarly focus on her Shakespearean, Jonsonian, and purely Anglo-English inheritance has suggested.7 If she responded vividly to the scientific discourses of René Descartes and Pierre Gassendi, whom she hosted at her table, there is little reason to doubt she responded with the same energy to developments in French theater. The second portrait is that of Molière, whose dramaturgical debts might extend across the Channel in ways that historians tracking his relationship to (and largely unilateral influence on) Restoration drama have thus far been unable to account for.8 Their oversight, if indeed it is one, would stem in large part from the fact that Cavendish’s plays have remained in the bibliographic shadows, a dilemma presently being resolved. But it would also stem from a lack of methods with which to track various kinds of non-verbal intertextuality, a dilemma I want to try to address with braiding.


Braiding sounds tricky but isn’t—in fact, it’s almost naïvely simple. In this second, more DH-heavy half of the post, I want to introduce it as a method for visualizing, tracking, and comparing structures of dramatic dialogue.

Let me begin by saying that, just like my first-year students (albeit for different reasons), I often get so lost in the language of the 17th century that I forget to read plays with an eye for who is talking to whom when. It takes considerable familiarity with a particular play, its characters, its plot, &c. to back up from the content of the speech-acts and look instead at their patterning, their sequencing, the rationales for the turn-taking within a given conversation, and identify the politics (whether gendered, racial, class-based, &c.) that are determining those turns, sequences, and patterns. It is a matter of the scale at which we read, whether close or distant, but also of the scale at which our methods make us comfortable reading. For instance, in Shakespeare studies, Anthony J. Gilbert tried early on, in 1997, to introduce literary scholars to the terminologies and questions of conversation analysis defined by anthropologists like Harvey Sacks: elements like indexicality, sequence, pre-sequence, announcements, pre-announcements, turn-taking, etc. But that was perhaps still too early in the digital era to envisage how Sacks and Gilbert’s terms could help scholars understand intertextuality. Rather than stimulating scholars to look for analogous structures and strategies of dramatic speech across plays and playwrights, Sacks’ abstruse terminology likely came across as another alien import from the social sciences that we would be better off not learning.
So rather than propose a distant-reading tool predicated on Sacks’ anthropology and its set of assumptions, one that would markedly distinguish itself from the more comfortable realm of close reading, I want to propose braiding as a method that enables what Martin Mueller calls “scalable reading,” or the transition from close, formal analysis to the more structural, big-picture concerns of conversation analysis, and back again to the text.9 I’ll start by presenting the specificity of braiding in contrast to the better-known techniques of network analysis, and conclude with a few remarks on how attention to author-specific strategies of staging dramatic conversation might help us see Molière and Cavendish’s plays informing each other.

If we wanted a snapshot of the dialogue between characters in a given play, we might opt for a character network analysis, like Franco Moretti’s social network of Hamlet below, where an edge or link between two character-nodes represents a spoken transaction between those characters.

The Hamlet Network.

But networks are notoriously poor at representing the passage of time. With a network, the diegetic time of a play or short story gets compressed down to a single plane: you see the whole plot summarized in one instant. If we want to see the network change over time, if we wish to see the plot unfold, we may resort to a kind of flipbook of consecutive networks, flipping through successive moments of the plot (this is often called a dynamic network). But the same problem ultimately persists: at each instant, we can consider only how the present network compares to the network from the previous instant or the upcoming one; we cannot visualize the overall change. What’s more, networks are poor at enabling multiple-graph comparisons. While we can handle comparing two simple networks side by side, the intuitive benefits of that visualization break down once we’re looking at six or seven networks: the visual patterns simply cease to stand out because the cognitive load is too great. Networks therefore don’t encourage studies of multiple similar texts or of a single text’s transformation across several editions in historical time.
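A tiny example makes the first point concrete. In the Python sketch below, two invented scenes contain the same exchanges in a different order; collapsing each into a weighted, undirected character network (the usual representation) produces identical graphs, so the difference in sequence is simply not recorded. Characters A, B, and C are placeholders.

```python
from collections import Counter


def to_network(turns):
    """Collapse (speaker, addressee) turns into undirected, weighted edges.

    Each edge is a frozenset of the two characters; its weight counts how many
    turns passed between them. Turn order is discarded.
    """
    return Counter(frozenset(turn) for turn in turns)


scene_1 = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "A")]
scene_2 = [("A", "C"), ("C", "A"), ("A", "B"), ("B", "A")]

print(to_network(scene_1) == to_network(scene_2))  # True: same network
print(scene_1 == scene_2)                          # False: different dialogue
```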

That’s where braiding intervenes as a supplement to social character network graphs. For clarity’s sake, let me use a text many of us will know well: Little Red Riding Hood (hereafter LRRH). In the scene excerpted below from Charles Perrault’s 1697 version of the tale, Wolf knocks at Grandmother’s door and pretends to be Riding Hood. Imagine Wolf’s voice as a strand in a braid, rather than an edge in a network, and let it cross over Grandmother’s strand to represent that Wolf speaks to Grandmother.

Braiding Demo – Little Red Riding Hood

That’s our first “crossing” within the braid, and we’ll encode it as a “braid-letter.” Assuming Wolf is character 1 and Grandmother is character 2, that braid-letter looks something like (102), where the 0 is a placeholder dividing addresser from addressee. Repeat the process when Grandmother responds and for each distinct speech-act that follows, and so on for the rest of the story.
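The encoding step is mechanical once the interpretive decisions are made. The Python sketch below turns a list of speech acts into braid-letters following the scheme just described (speaker number, the 0 placeholder, addressee number). Two details are assumptions of this sketch rather than part of the method as stated: characters are numbered 1 to 9 so that letters can be concatenated without ambiguity, and an addressee list allows a speech act to be directed at more than one character. The opening exchange is illustrative, not a full encoding of Perrault.

```python
def braid_letter(speaker, addressees, numbering):
    """Encode one speech act: speaker number, the 0 placeholder, addressee number(s)."""
    codes = [numbering[speaker]] + [numbering[a] for a in addressees]
    if any(not 1 <= c <= 9 for c in codes):
        raise ValueError("this sketch assumes characters are numbered 1-9")
    return f"{numbering[speaker]}0" + "".join(str(numbering[a]) for a in addressees)


def braid_word(speech_acts, numbering):
    """Encode a whole scene as a braid-word: the list of its braid-letters."""
    return [braid_letter(speaker, addressees, numbering)
            for speaker, addressees in speech_acts]


characters = {"Wolf": 1, "Grandmother": 2, "Riding Hood": 3}

# Illustrative opening: Wolf knocks and speaks, Grandmother answers.
opening = [("Wolf", ["Grandmother"]), ("Grandmother", ["Wolf"])]
print(braid_word(opening, characters))  # ['102', '201']
```

Keeping the interpretive choices (who counts as an addressee, whether a knock is a speech-act) in the input list rather than in the code keeps them visible to whoever reuses the encoding.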

The visualization and the string of braid-letters (or “braid-word”) that emerges by the end depend on how you, as reader and encoder, have interpreted what constitutes “conversation.” Does a non-verbal knock at Grandmother’s door count as a speech-act? Is Grandmother responding at once to Riding Hood (in her mind) and to Wolf (in reality) when she responds, in which case the braid-letter might not just be (102) but (1023), entwining Wolf and Riding Hood’s strands together before having them cross under Grandmother’s strand? Similar questions of confused identity famously arise in early modern drama, especially in Shakespeare and Cavendish with their cross-dressing characters. It is precisely to avoid losing this important part of subjective interpretation that I propose braiding as a method and not a tool. I hope thereby to leave braiding available to multiple research interests, including those that need to pay attention to character confusion, focalization through a specific character’s experience of the plot, direct vs. indirect discourse, etc. I hope to encourage the kind of scalable reading mentioned earlier, where the assumptions driving a visualization technique are legible first and foremost to the reader/user.

Method though it may be, braiding does take on an important tool-like aspect once we’ve encoded several conversations or several versions of a story into braid-words. In the following sample of 12 LRRH stories written between 1697 and 1899, we can certainly proceed visually and intuitively with the braid diagrams, with relatively little cognitive difficulty, looking for patterns our eyes are rather good at picking up on.

Twelve Versions of Little Red Riding Hood, Braided and with Braidwords. Note how the dialogue structure of the 1697 Perrault version gets reproduced almost exactly in 1729, 1879, and 1891, while the introduction of the hunter figure at the tail end of the 1812 Brothers Grimm version leads to subsequent adaptations both minor (1889, 1894) as well as major, for instance in 1888 and 1898 when grandmother loudly assumes the hunter’s role.

But we can also quantitatively sequence the braid-words to retrieve patterns or near-patterns using rudimentary sequence-parsing algorithms borrowed from genetic sequencing. Braiding thus becomes an instrument for pattern recognition and pattern discovery across relatively large and complex corpora, in ways networks do not readily allow. For a brief (and somewhat naïve) gloss of a few interesting patterns in this sample of LRRH stories, you’re welcome to check out an embarrassing TEDx talk I gave in my senior year of undergraduate studies at Southern Methodist University. The more important result, and the one more relevant to the interests of our group at WWO, emerges quite palpably from the picture above: braids offer the opportunity to consider at a glance the complex ebb and flow of conversations, and to some extent even of plot-line, within one story (synchronically) as well as across stories (diachronically). Moreover, they invite our visual intuition to collaborate with sequence-parsing algorithms, and they allow our comfort with close-reading specific passages to merge with more distant considerations of patterning across a text or multiple texts. Lastly, they invite us, upon discovering a pattern, to return to the details of the relevant passage or set of passages to consider what, at the level of power-play and politics, is conditioning that particular pattern. They enable and enact scalable reading in ways I find few DH tools currently encourage.
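Which sequence-parsing algorithms to use is left open here, so the Python sketch below plainly substitutes one: Levenshtein edit distance, a simple relative of the alignment methods used in genetic sequencing, applied to braid-words treated as lists of braid-letters. A second helper lists runs of braid-letters that two braid-words share. Both braid-words are invented for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two braid-words (lists of braid-letters)."""
    previous = list(range(len(b) + 1))
    for i, letter_a in enumerate(a, start=1):
        current = [i]
        for j, letter_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                          # delete letter_a
                current[j - 1] + 1,                       # insert letter_b
                previous[j - 1] + (letter_a != letter_b)  # substitute or match
            ))
        previous = current
    return previous[-1]


def shared_motifs(a, b, k=3):
    """Runs of k consecutive braid-letters that occur in both braid-words."""
    def kgrams(word):
        return {tuple(word[i:i + k]) for i in range(len(word) - k + 1)}
    return kgrams(a) & kgrams(b)


# Invented braid-words for two versions of the same scene.
version_1 = ["102", "201", "102", "201", "301", "103"]
version_2 = ["102", "201", "102", "201", "401", "104"]

print(edit_distance(version_1, version_2))  # 2
print(sorted(shared_motifs(version_1, version_2)))
```

A low distance flags versions whose dialogue structure nearly coincides, and the shared motifs point back to the specific passages worth rereading closely.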

All is not rosy, of course: I have yet to automate the transition from braid-words to braid-diagrams. The picture above was made entirely by hand. But my first step for the WWO project will be to automate that transition using either the Python-based NumPy library or the MathML braid-visualization library.10 Charming and vintage as Microsoft Paint and manual labor might be, no one has that kind of time to spend. I welcome any suggestions or questions on how best to approach that part of the project and look forward to any thoughts or concerns it might elicit.
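As a possible first step toward that automation, the Python sketch below renders a braid-word as a plain-text grid: one column per character strand, one row per braid-letter, with S marking the speaker and A the addressee(s). This is an assumption-laden stand-in, not a true braid diagram; it draws no over- or under-crossings and does not permute strand positions, which a graphical renderer would still need to handle. It assumes characters are numbered 1 to 9, with 0 reserved as the separator.

```python
def parse_letter(letter):
    """Split a braid-letter like '102' into speaker and addressee strand numbers."""
    speakers, sep, addressees = letter.partition("0")
    if not sep or not speakers or not addressees:
        raise ValueError(f"not a braid-letter: {letter!r}")
    return [int(c) for c in speakers], [int(c) for c in addressees]


def render(word, n_strands):
    """Draw a braid-word as a text grid: S = speaker, A = addressee, | = silent strand."""
    rows = []
    for letter in word:
        speakers, addressees = parse_letter(letter)
        cells = []
        for strand in range(1, n_strands + 1):
            if strand in speakers:
                cells.append("S")
            elif strand in addressees:
                cells.append("A")
            else:
                cells.append("|")
        rows.append(" ".join(cells) + "   " + letter)
    return "\n".join(rows)


print(render(["102", "201", "301"], n_strands=3))
```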

To bring things back finally to Molière and Cavendish and to conclude this long post, my project will begin by identifying a set of scenes, within the aforementioned plays and others from their corpora, in which male and female characters, learned ladies and their male antagonists, exchange contested words. As the Women Writers Lab pointed out early on with its helpful visualizations (reproduced below), Cavendish’s The Convent of Pleasure is not especially marked by male-female interactions, and we might add that her 1662 The Female Academy is even less so.

Margaret Cavendish, The Convent of Pleasure, 1668. This visualization illustrates the percentage of female & male speakers in each scene of Margaret Cavendish’s The Convent of Pleasure (1668). In ten out of a total of twenty scenes, female characters are sole speakers. Image reproduced from WWLab.

But the WWO Lab’s visualization depends on whether we encode the play’s central cross-dressing figure, the Prince who eventually marries the learned-lady figure, as male or female. By allowing for multiple possible encodings of the gender dynamics in these scenes, I hope to show that we can think of the cross-dressing Prince as someone who simultaneously parodies, venerably imitates, and obligingly enables the conversational patterns of the play’s learned lady. My literary-historical hunch (and I welcome any critiques or responses to it) is that in the gap between Cavendish’s first collection of plays (1662) and the second (1668), she had time to consider and digest how Molière and his various critics and imitators represent the learned lady’s conversational patterns in L’Ecole and the Critique de l’Ecole. She is more attuned to the learned lady’s strategies for intervening in natural-philosophical or proto-scientific discussions, where the politics of turn-taking are dominated by intellectual hierarchies and age hierarchies, and most importantly by gender norms. She is therefore better able to respond to Molière’s Ecole des Femmes in The Convent of Pleasure than she was when she composed The Female Academy. By allying a new scalable-reading method with elements of conversation analysis, I hope to capture a glimpse of that otherwise illegible intertextuality.


Intertextual connections in An Collins’s Divine Songs and Meditacions: poetry versus prose

This post is part of a series authored by our collaborators on the Intertextual Networks project. For more information, see here. 

By Jenna Townend, Ph.D. Candidate in English, Drama, and Publishing, Loughborough University

My collaborative work with the Intertextual Networks project takes the form of an investigation into how quantitative network analysis can help us map intertextual practices and influences in the poetry of the seventeenth-century writer, An Collins. Her collection of devotional poems, Divine Songs and Meditacions (1653), is the only source of information we have on Collins and her life. Though it is apparent from the poems that Collins suffered from a chronic illness which had affected her since childhood, discerning other influences on Collins’s writing – such as her particular religious beliefs, her reading habits, and how she made use of what she read – is not an easy task. Nevertheless, previous work by Helen Wilcox and Mary Morrissey has established that there are intertextual connections to be found, and it is from these studies that this project takes its departure.

The poems of Collins’s Divine Songs and Meditacions communicate her desire for union with God through her journey from melancholy to grace, and her experiences of spiritual and physical affliction. Divine Songs and Meditacions shows that her creative and devotional thinking was influenced by the poetical devices and structural elements of poets such as George Herbert, as well as by the prose texts of popular puritan theologians like William Perkins. My project examines and maps in close detail what Collins took from her textual sources, and considers how she used these sources in the context of her desire to achieve union with God. This blog post will consider how I have identified a good number of intertextual connections using a piece of text comparison software called WCopyfind, and will discuss the issue that is now of greatest significance as, in the second stage of the project, I begin to translate these data concerning intertextual connections into a format to which network analysis can be applied.

Inevitably, before the methods of network analysis can be used, much recovery work is required to uncover and categorize the intertextual elements of Collins’s text, and this means examining each of the works that may have influenced Collins. Taking cues from the work of Wilcox and Morrissey, I began by examining George Herbert’s The Temple (1633), Henry Vaughan’s Silex Scintillans (1650), and William Perkins’s The Foundation of Christian Religion (1590). This corpus has since been expanded to include other popular poetic works and theological texts, of which Faithful Teate’s Ter Tria (1650) and Richard Baxter’s The Saints Everlasting Rest (1650) are just two examples. Making close comparisons between multiple texts that span the genres of prose and poetry is an exceptionally time-consuming task, but it has been made significantly easier by WCopyfind, an open-source program that compares documents and highlights similarities between their words and phrases. The software was originally developed to detect plagiarism in student essays, but it is also an invaluable resource for anyone working on similarities or differences between texts.

The interface of WCopyfind is extremely user-friendly, and enables the user to choose to ignore features such as punctuation or letter case: something that is invaluable when it comes to analysing early-modern texts with non-standardized spelling and syntax. Using the EEBO full-text files of each of the texts in the project’s corpus (remembering to remove extraneous metadata and hyperlinks such as ‘View Document Image 9’), it is possible to run comparisons between the phrasing of texts. Users can select various parameters such as the shortest phrase to include (for example, telling WCopyfind that you want it to find shared phrases of no fewer than four words), whether or not to include punctuation, and, perhaps most significantly, a minimum percentage of matching words (setting this value to 80%, for instance, allows WCopyfind to find matches despite minor discrepancies in spelling). Once the comparison has been run, the two texts and their similarities can be viewed in parallel windows, with correspondences shown in red:

Figure 1. Side-by-side comparison in WCopyfind between Collins’s Divine Songs and Perkins’s The Foundation.
Figure 2. Side-by-side comparison in WCopyfind of Collins’s Divine Songs and Perkins’s The Foundation, showing similarities between their comments on faith.
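
WCopyfind’s own matching code is not described here, so the following is only a toy Python sketch of the kind of comparison its settings control, written to make the parameters above concrete. The names `shortest_phrase` and `min_match` are hypothetical stand-ins for the “shortest phrase” and “minimum percentage of matching words” settings, and the brute-force window comparison is an assumption for illustration, not WCopyfind’s algorithm.

```python
import re


def normalize(text, ignore_case=True, ignore_punctuation=True):
    """Split text into word tokens, optionally folding case and dropping punctuation."""
    if ignore_punctuation:
        text = re.sub(r"[^\w\s]", " ", text)
    if ignore_case:
        text = text.lower()
    return text.split()


def shared_phrases(text_a, text_b, shortest_phrase=4, min_match=0.8):
    """Return (i, j) word offsets where a window of `shortest_phrase` words
    starting at i in text_a and at j in text_b agree in at least
    `min_match` (a fraction) of their positions.

    Brute force over every pair of windows: fine for a stanza, but a
    practical tool would need something much faster for whole books.
    """
    a, b = normalize(text_a), normalize(text_b)
    n = shortest_phrase
    hits = []
    for i in range(len(a) - n + 1):
        for j in range(len(b) - n + 1):
            same = sum(1 for k in range(n) if a[i + k] == b[j + k])
            if same / n >= min_match:
                hits.append((i, j))
    return hits


collins = "That such a man hath Faith it doth appeare"
perkins = "Q. How doo you know that such a man hath faith?"
print(shared_phrases(collins, perkins))  # [(0, 5), (1, 6), (2, 7)]
```

The three hits are overlapping four-word windows covering “that such a man hath faith” once case and punctuation are ignored. Lowering `min_match` to 0.75 lets a four-word window tolerate one variant spelling, such as “faith” against “fayth”.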

It is worth noting, however, that if such tools are used only for the purposes of noting down statistics relating to the degree of similarity between Collins’s work and that of a probable source, then they become something of a blunt instrument. As another collaborator on the Intertextual Networks project, Amanda Henrichs, has noted in her own work, doing so often leads to ‘gaining old insights more quickly, rather than coming to new conclusions’. What I would like to do, therefore, is to examine some of the results I have obtained by using WCopyfind, and suggest the direction that this project will take as it begins to experiment with using network analysis to map intertextual influence.

Running comparisons in WCopyfind between Collins’s Divine Songs and Herbert’s The Temple, and then between Collins’s work and Perkins’s The Foundation of Christian Religion, produced some surprising results that have altered the trajectory of this project. When it comes to similarities in phrasing, there are roughly twice as many correspondences between Collins’s poems and Perkins’s work as with Herbert’s verses, despite the fact that The Temple is more than twice as long as The Foundation. Repeating this comparison with other poetic texts that Collins may have been influenced by, such as Vaughan’s Silex Scintillans or Teate’s Ter Tria, produces a similar result. This unexpected outcome has caused me to widen the net of my project. After all, it calls into question any assumption that poets are always most influenced by other poets when it comes to the content of their verse. The fundamental question raised by these results thus concerns the difference between a poet drawing on, or being influenced by, a prose text and a poetic work. What was it about Perkins’s text that Collins found so well aligned with her own devotional and creative thinking, and what, in turn, did she take from her poetic sources like Herbert? Whatever it was that Collins found appealing in her poetic sources, it does not appear to have been their doctrinal content or phrasing, and we must therefore pay close attention to Collins’s borrowing of verse forms, metaphors, and images from contemporary poets.

A brief example of the complexity of this issue can be found in the opening poem of Collins’s volume, ‘The Discourse’. The one hundred and three stanzas of Collins’s lyric are written in a similar style to the seventy-seven stanzas of Herbert’s own introductory poem, ‘The Church-porch’, and set out many of the devotional ideas and topics that are explored later in the volume. Collins uses an adapted version of the verse form of Herbert’s ‘The Church-porch’, rhyming her seven-line stanzas ABABBCC rather than following Herbert’s six-line ABABCC. We also learn personal details, such as the fact that Collins ‘spent my infantcy, | And part of freshest yeares, as hath been sayd | Partaking then of nothing cheerfully’ (ll. 85-87), and of her desire that ‘Next unto God, my selfe I sought to know’ (l. 246). However, judged by the number of shared phrases, ‘The Discourse’ owes a greater doctrinal debt to Perkins’s The Foundation than any other poem in Collins’s volume. Perkins’s text, which takes the form of a catechism, was extremely popular among English puritans, and it was organized around six devotional topics: God, man’s sinfulness, imputation, saving faith, obtaining faith, and death (Morrissey, p. 469). As an illustrative example of the parity between Collins’s lyric and Perkins’s catechism, it is worth comparing Collins’s comments on faith in stanza seventy-nine of ‘The Discourse’ with a passage from Perkins’s catechism (see also Figure 2 for a side-by-side comparison of these sections in WCopyfind):

That such a man hath Faith it doth appeare
For these desires doe plainly testifie,
He hath the Spirit of his Saviour dear,
For tis his speciall work or property,
To stir up longings after purity:
Now where his Spirit is there Christ resides,
And where Christ dwels is true Faith though weak abides. (ll. 550-56)

Q. How doo you know that such a man hath faith?

A. These desires and prayers are testimonie of the spirit, whose propertie it is to stirre up a longing and a lusting after heavenly things, with sighes and groanes for Gods favour and mercie in Christ. Nowe where the spirit of Christ is, there is Christ dwelling: and where Christ dwelleth, there is true fayth how weake soever it be. (sigs. B5r-B6v)

The parallels in phrasing here are obvious. Perkins writes that such a man’s ‘desires and prayers are testimonie of the spirit’, whose ‘propertie it is to stirre up a longing and a lusting after heavenly things’; Collins similarly states that these ‘desires doe plainly testifie, | He hath the Spirit of his Saviour’ (ll. 551-52), whose ‘speciall work or property’ is ‘To stir up longings after purity’ (ll. 553-54). However, given that Collins transposes much of the content of Perkins’s prose catechism into a verse form adapted from Herbert, considering the confluence of prose and poetic influences is evidently vital to understanding Collins’s lyrics and how she made use of her devotional reading. My current hypothesis is that Collins takes elements of the content and theology of her poems from writers like Perkins, while adapting features of form, style, and theme from her poetic sources in order to give shape and order to these doctrinal elements. This hypothesis will be tested as the project moves, in its second stage, to modelling these intertextual correspondences using network analysis.

Inevitably, adapting a methodology traditionally used to trace social relationships or connections between members of a network will require some sensitive reworking if it is to examine questions of literary influence productively. After all, the project is dealing with intertextual correspondences of several kinds: direct borrowings of phrasing, shared doctrinal or theological topics, poetic form, and particular metaphors or images. Moving forward, then, my next challenge will be to experiment with network software such as UCINET and Gephi to work out the most effective way of visually representing these various types of intertextual connection in the work of An Collins and, more broadly, to interrogate how early-modern women’s poetry was influenced by a full range of contemporary writers and their texts.
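
One way to keep different kinds of connection distinct before loading them into a tool such as Gephi might be to record each connection as a typed edge and then collapse the records into a weighted edge table. The Python sketch below is an assumption about how that could be organized, not the project’s data model. The records are illustrative, taken from the connections discussed above rather than from the project’s dataset; edges point from the borrowing text to its probable source; weighting by the number of distinct kinds of connection is only one possible choice; and the column headers follow our understanding of Gephi’s spreadsheet import, where Source, Target, Type, and Weight have special meaning and other columns become attributes (worth checking against Gephi’s documentation).

```python
import csv
import io
from collections import defaultdict

# Illustrative records: (borrowing text, probable source, kind of connection).
# Drawn from the discussion above, not from the project's actual dataset.
CONNECTIONS = [
    ("Collins, Divine Songs", "Perkins, The Foundation", "phrasing"),
    ("Collins, Divine Songs", "Perkins, The Foundation", "doctrine"),
    ("Collins, Divine Songs", "Herbert, The Temple", "verse form"),
]


def edge_table(connections):
    """Collapse typed connections into one CSV row per (borrower, source) pair.

    Weight counts the distinct kinds of connection between the two texts;
    the kinds themselves are kept in a separate attribute column.
    """
    kinds = defaultdict(list)  # preserves first-seen order of pairs
    for borrower, source, kind in connections:
        if kind not in kinds[(borrower, source)]:
            kinds[(borrower, source)].append(kind)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight", "Kinds"])
    for (borrower, source), ks in kinds.items():
        writer.writerow([borrower, source, "Directed", len(ks), ";".join(ks)])
    return buf.getvalue()


print(edge_table(CONNECTIONS))
```

Weighting instead by the number of shared phrases that WCopyfind reports would be an equally plausible design, and would emphasize verbal borrowing over formal influence.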


Making (and using!) WWO:SDI

Recently, we published an announcement about the release of the Women Writers Online: Scrabble Discovery Interface (WWO:SDI), which was (we hope) fairly obviously an April Fools’ Day joke. For all its silliness, however, WWO:SDI demonstrates some of the much more practical tools we have for interacting with WWO. More than that, the WWO:SDI interface itself has proved to be a remarkably effective proofing tool.

This second point may be less surprising when you note that WWO:SDI is similar to some of our existing proofing routines, which use XSLT to create HTML documents that enable us to review our data. For example, we have a proofing routine that creates a chart displaying the encoded page numbers and signature marks that appear in our texts, alongside our idealized numbering of pages and milestones. This chart makes it much easier to see mismatches between our idealized numbering and the actual contents of each page, and to catch errors such as a run of pages numbered 1, 2, 5.
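
Our routine is written in XSLT and produces an HTML chart for a person to review; the Python below only sketches the underlying comparison. It assumes the printed page numbers have already been extracted from the markup into a list (with None for unnumbered pages), that unnumbered pages still count in the sequence, and it treats the running sequence as the idealized numbering, which is a simplification.

```python
def numbering_problems(printed):
    """Check printed page numbers, in document order, against the sequence
    they imply. `printed` holds an int per page, or None where the page
    bears no number. Returns (position, expected, found) for each break."""
    problems = []
    expected = None  # unknown until the first numbered page
    for position, found in enumerate(printed):
        if found is not None:
            if expected is not None and found != expected:
                problems.append((position, expected, found))
            expected = found + 1  # resynchronize on whatever the page says
        elif expected is not None:
            expected += 1  # assumption: an unnumbered page still uses a number


    return problems


print(numbering_problems([1, 2, 5]))        # [(2, 3, 5)]
print(numbering_problems([1, None, 3, 4]))  # []
```

Resynchronizing after each break means a single misnumbered page is reported once rather than making every later page look wrong.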

Creating WWO:SDI was an interesting thought experiment for us, particularly as we considered how our markup could be used to exclude words that would not be allowed in a standard Scrabble® game. We thought of the various namelike elements right away, but hadn’t considered <speaker> until we remembered that most <speaker> labels contain proper nouns (we did have to reconcile ourselves to falsely excluding some words, such as “servant,” “duke,” or “attendants”). We also had to figure out a mechanism for excluding roman numerals, which proved trickier than we first expected, precisely because they aren’t always set apart in the encoding as names and such are. And we were able to draw on some of our existing routines for regularizing original orthographies, dealing with end-of-line (“soft”) hyphens, and preferring corrections over errors.
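
Part of what makes roman numerals tricky is that a pattern for well-formed numerals also matches ordinary words. The Python sketch below uses a common regular expression for numerals from 1 to 3999; it is an illustration, not our actual routine. It correctly accepts “XLVII” and rejects “civil”, but it also accepts the pronoun “I” and the word “mix” (M + IX). Early-modern forms such as “iiij”, with four i’s and a final j, are not handled either.

```python
import re

# Well-formed roman numerals 1-3999 (modern subtractive forms only).
ROMAN = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})",
                   re.IGNORECASE)


def looks_roman(token):
    """True if token is a well-formed roman numeral. Every group in the
    pattern is optional, so the empty string must be rejected explicitly."""
    return bool(token) and ROMAN.fullmatch(token) is not None


for word in ["XLVII", "civil", "I", "mix"]:
    print(word, looks_roman(word))
```

The false positives suggest why a purely pattern-based filter is not enough; context such as capitalization, a following period, or the surrounding markup would probably also have to be weighed.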

Because WWO:SDI makes it easy to sort by word length, it has also helped us to catch some encoding errors in the texts we are preparing for publication. For example, the interface joins up the halves of words that are split by end-of-line hyphens, which we encode with a “soft hyphen” character that looks identical in most programs to the standard keyboard hyphen we use for compound words (“hard hyphens,” as we often call them). Thus, WWO:SDI makes it very easy to spot incorrectly encoded soft hyphens, because these typically appear as extremely long words at the top of the lists when sorted by length.
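
The Python sketch below illustrates why misencoded soft hyphens float to the top of a length-sorted list. It assumes the soft hyphen is Unicode U+00AD (an assumption; the character we actually use is not specified here), joins the two halves of any word split by it, treats hard hyphens as word breaks, and sorts the distinct words longest first. The sample sentence is invented: “Lord-Lieutenant” has wrongly been given a soft hyphen, so it surfaces as “LordLieutenant”.

```python
import re

SOFT_HYPHEN = "\u00ad"  # assumption: the character used for end-of-line hyphens


def words_by_length(text):
    """Distinct words in text, longest first (ties broken alphabetically)."""
    # Join the halves of words split by a soft hyphen, dropping any
    # line break or spaces that follow it.
    joined = re.sub(SOFT_HYPHEN + r"\s*", "", text)
    # Runs of letters are words; hard hyphens, digits, and punctuation split them.
    words = set(re.findall(r"[^\W\d_]+", joined))
    return sorted(words, key=lambda w: (-len(w), w))


sample = "the argu\u00ad\nment was self-evident, though the Lord\u00adLieutenant demurred"
print(words_by_length(sample)[:3])  # ['LordLieutenant', 'argument', 'demurred']
```

The correctly encoded “argu-/ment” rejoins as an ordinary word, while the compound that should have kept a hard hyphen becomes the longest entry in the list.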

Soft and hard hyphens: spot the difference

Similarly, WWO:SDI is good at uncovering the kinds of missing spaces that are much less visible in the XML files themselves, usually where words are marked with phrase-level elements, such as in:

There’s a missing space between “best” and “History” but the (in this case, artificially constructed) layers of markup make that hard to see. On the other hand, “besthistory” is much easier to spot in WWO:SDI and we may just end up developing a version that we could use in our actual proofing processes.
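
A proofing check aimed specifically at this problem might look for places where text runs straight across an element boundary with a letter on each side. The Python sketch below assumes plain, non-namespaced markup (real TEI files use the TEI namespace) and an invented example in the spirit of the one above. It would also flag legitimate cases, such as a decorated initial encoded in its own element, and it misses joins after punctuation, so its output would need human review.

```python
import xml.etree.ElementTree as ET


def text_chunks(elem):
    """Yield the text inside elem in document order, split at every tag boundary."""
    if elem.text:
        yield elem.text
    for child in elem:
        yield from text_chunks(child)
        if child.tail:
            yield child.tail


def suspicious_joins(xml_string):
    """Words that run across an element boundary with no whitespace between."""
    chunks = list(text_chunks(ET.fromstring(xml_string)))
    joins = []
    for left, right in zip(chunks, chunks[1:]):
        if left[-1:].isalpha() and right[:1].isalpha():
            joins.append(left.split()[-1] + right.split()[0])
    return joins


bad = "<p>the <emph>best</emph><title>History</title> of England</p>"
good = "<p>the <emph>best</emph> <title>History</title> of England</p>"
print(suspicious_joins(bad))   # ['bestHistory']
print(suspicious_joins(good))  # []
```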

So, hopefully you enjoyed playing with WWO:SDI—and perhaps it even sparked your interest in using tools like XSLT to work with XML-encoded documents (possibly by joining the XSLT workshop at DHSI). We certainly have a lot of fun using XSLT to explore and proof our documents, even when it isn’t April 1st!

Announcing the Women Writers Online: Scrabble Discovery Tool!

The WWP is delighted to report that we have developed a new interface that will enhance the texts in Women Writers Online by allowing users to discover the Scrabble® scores for the words in each text. The Women Writers Online: Scrabble Discovery Interface (WWO:SDI) provides sortable lists for all WWO texts, making it possible for users to determine the highest- and lowest-scoring words in the collection. The chart also denotes words that cannot be played in a single turn because they are longer than seven letters and words that could not be played using the letters provided by a standard Scrabble® set.

For example, the highest-scoring words in Harriet Cheney’s 1824 novel, A Peep at the Pilgrims, are “characterized” and “philosophically,” both with 30 points, although neither could be played on a single turn. The highest-scoring word in Ann Yearsley’s 1787 Poems on Various Subjects is “whizzing” at 33 points, but this word would only be possible if a player smuggled in an extra “z” tile from another set. The highest-scoring word in the entire collection is “quizzically” at 43 points from Sarah Green’s 1810 Romance Readers and Romance Writers. The text with the highest average Scrabble® score is The Latter Examination of Anne Askew, 1547, which has words like “quyckeneth” and “excommunycate” at 31 points and “pertycypacyon” at 30 points. Archaic spelling seems to bring an advantage in this case! For sheer number of words that could be used in a Scrabble® game, the winner is Judith Murray’s 1798 The Gleaner, with 15,490 total playable words.
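
The scoring described above needs only the standard English-language letter values and tile distribution. The Python sketch below is our own illustration, not the WWO:SDI code; it assumes words have already been reduced to the letters a-z and, like the examples above, ignores the two blank tiles. It reproduces the scores quoted for “quizzically,” “whizzing,” “characterized,” “philosophically,” and “quyckeneth.”

```python
from collections import Counter

# Standard English-language Scrabble letter values.
LETTER_VALUES = {
    letter: value
    for letters, value in [("aeilnorstu", 1), ("dg", 2), ("bcmp", 3),
                           ("fhvwy", 4), ("k", 5), ("jx", 8), ("qz", 10)]
    for letter in letters
}

# Standard tile counts for a-z (the two blanks are ignored here).
TILE_COUNTS = dict(zip("abcdefghijklmnopqrstuvwxyz",
                       [9, 2, 2, 4, 12, 2, 3, 2, 9, 1, 1, 4, 2,
                        6, 8, 2, 1, 6, 4, 6, 4, 2, 2, 1, 2, 1]))


def score(word):
    """Face value of the word's tiles, with no premium squares."""
    return sum(LETTER_VALUES[c] for c in word.lower())


def fits_one_rack(word):
    """A rack holds seven tiles, so (per the post's rule) longer words
    cannot be played in a single turn."""
    return len(word) <= 7


def in_standard_set(word):
    """True if a single standard set has enough tiles for every letter."""
    return all(n <= TILE_COUNTS[c] for c, n in Counter(word.lower()).items())


for w in ["quizzically", "whizzing", "characterized"]:
    print(w, score(w), fits_one_rack(w), in_standard_set(w))
```

An average score for a whole text would then just be the mean of `score` over its playable words.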

This interface uses cutting-edge technology to exclude words that are not allowed in standard Scrabble® games, drawing on the detailed encoding in the Women Writers Online collection. For example, excluding the contents of <name>, <persName>, <orgName>, <placeName>, and <speaker> removes many proper nouns from the results. Similarly, the interface excludes dialect and non-English words. We have also regularized some archaic letterforms, such as the long s (ſ), and regularized some spelling, such as i/j and u/v substitutions. The interface displays expanded versions of abbreviations and corrections of errors, wherever these are available.
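
The exclusions and preferences just described can be sketched in a few lines of Python. This is an illustration under simplifying assumptions rather than our XSLT/XQuery routines: the markup is non-namespaced; non-English words are assumed to be marked with <foreign> (dialect handling is omitted); inside <choice>, the first of <expan>, <corr>, or <reg> wins over <abbr>, <sic>, or <orig>; and i/j and u/v regularization happens only where a <reg> is encoded. The sample is invented.

```python
import re
import xml.etree.ElementTree as ET

# Elements whose contents are dropped (mostly proper nouns, plus foreign words).
SKIP = {"name", "persName", "orgName", "placeName", "speaker", "foreign"}
# Inside <choice>, these readings are preferred over abbr/sic/orig.
PREFER = ("expan", "corr", "reg")


def content(elem):
    """Text of elem (without its tail), applying SKIP and PREFER."""
    if elem.tag in SKIP:
        return ""
    if elem.tag == "choice":
        kids = list(elem)
        preferred = [k for k in kids if k.tag in PREFER]
        chosen = preferred[0] if preferred else (kids[0] if kids else None)
        return content(chosen) if chosen is not None else ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(content(child))
        parts.append(child.tail or "")  # text after a skipped element is kept
    return "".join(parts)


def playable_words(xml_string):
    """Lowercased words, with long s regularized to s."""
    text = content(ET.fromstring(xml_string)).replace("\u017f", "s")
    return [w.lower() for w in re.findall(r"[^\W\d_]+", text)]


sample = ("<sp><speaker>Duke</speaker><p>Go, <persName>Ned</persName>, and fetch "
          "the <choice><sic>wrng</sic><corr>wrong</corr></choice> booke from "
          "<placeName>London</placeName> \u017ftraight.</p></sp>")
print(playable_words(sample))
```

Note that “Duke” disappears along with the genuine proper nouns, the same false exclusion described for <speaker> labels.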

We are confident that our readers will find WWO:SDI a valuable research tool, as well as a useful pedagogical resource. At long last, it is possible to compare texts by the important metric of their maximum and average scores in a Scrabble® game. We hope that this tool will revolutionize the study of early women writers and perhaps lead to new fields of word-game based literary scholarship.

We hope to add additional functionality to this useful resource very soon, including the option to have two authors or texts play off against each other in a simulated game. We expect to add scoring information on WWO texts’ performance in other word games, including Boggle®, Upwords®, and Bananagrams®. Finally, we are investigating the possibility of developing a WWO Edition Scrabble set, which would include extra “u” and “i” tiles (to be scored at 2 or 8 points when used in substitution for “v” and “j”). The set would also contain tiles for: ſ, æ, œ, ☉, and ☾ (these last two are essential in any serious gameplay for scholars of the seventeenth-century prophet Eleanor Davies).

We expect to have these new materials ready for release no later than one year from today, April 1, 2017.