
A New(ish) Approach to Markup in the Undergraduate Classroom


By Kevin G. Smith, Ph.D. Candidate in English, Northeastern University

Note: Kevin G. Smith is a pedagogical development consultant for the WWP. His dissertation research is partially supported by a grant from the NULab for Texts, Maps, and Networks.

A few summers ago, I spent my days working in Northeastern’s Digital Scholarship Commons. As is common in that space, there were nearly daily meetings of different teams of faculty, library personnel, and graduate students working on digital projects. One of these projects was The Early Caribbean Digital Archive (ECDA). During that summer the ECDA project team was working on customizing a TEI schema to encode their texts in ways that were more in line with their decolonial archival goals. As I procrastinated on my own work, I overheard the amazing conversations the ECDA team was having about the meanings and applications of certain aspects of their TEI customization. How should they tag an embedded or mediated slave narrative, for example? What to do about unnamed slaves? And how might they handle commodities? What are the ethical ramifications of encoding a slave as a commodity (or not)?

As I sat, listening to these conversations, I began to realize that it was precisely because they were encoding the texts in TEI that these conversations were happening. The act of encoding literally inscribes texts with interpretation, forcing the project team to discuss just what kinds of interpretive judgments they wanted to make. And they were important conversations: about how we represent our objects of inquiry in the humanities, about the ethics of data representation. (By no means am I the first to realize this. For a compelling example, see Julia Flanders: “The Productive Unease of 21st-century Digital Scholarship.”)

The point is that I was struck by these conversations. And I began to think about how the tension of formalization, this “productive unease,” as Flanders terms it, might be leveraged in writing classrooms. Could I somehow use the TEI to intervene in students’ writing processes, to foster these kinds of conversations about their own writing? What would that even look like?

Two years later, in the summer of 2016, I taught my first markup-based writing course at Northeastern. In the intervening years my approach shifted from using the TEI to designing a built-from-scratch XML schema for each course. Thus far, I’ve taught two courses using this method (Advanced Writing for the Technical Professions in the summer of 2016 and First-year Writing in the fall of 2016). In addition to writing their assignments in XML (using Oxygen), students in these courses engage in a semester-long, collaborative writing project: the design and implementation of an XML schema that structurally and rhetorically models a range of genres of writing.

This approach—using XML to produce texts—represents a shift from the mimetic roots of XML and its primary use in humanities research, the TEI. In the rest of this post, I want to briefly discuss this shift and its implications for the study of markup.

Teaching with Markup

There are many wonderful examples of using the TEI and XML in classrooms. Kate Singer’s use of TEI for developing poetic vocabularies in an undergraduate class comes to mind, as does Trey Conatser’s use of XML in a first-year writing course at Ohio State. Though, at first blush, these two markup classrooms may appear very different—one an upper-level literature course and the other a first-year writing course—the perceived pedagogical benefits of using markup are similar. Both pedagogues seek to foster close attention to the object of study—a poem or the student’s own writing—through what is essentially a process of annotation.

Where my approach to markup differs from these (and most traditional) classroom uses is in its thoroughly bottom-up, data-driven approach to schema design (Piez, 2001). Students begin with a (basically) bare schema and—iteratively and deliberately over the course of an entire semester—design and revise the schema for a range of writing tasks using document analysis and modeling, qualitative writing research methods, and their own experiences of authorship. The result is a shift from annotation to production, from product to process.

An example may be illuminative here. A group of students decide they would like to design a schema for movie reviews. They begin the process by researching the genre—gathering examples, examining related genres, tracing the circulation and uptake of the genre, interviewing experienced writers and readers of the genre, and so on. Based on this research, the group identifies the salient structural, rhetorical, and content-based components of the genre—a movie review includes a series of paragraphs, for example; the first of these paragraphs must, according to the students, include a component called “opinion,” which has a specific definition and different types. They name these components and write a prose pseudo-schema, including documentation, attributes, dependencies, and rules for the components. The pseudo-schema is translated into an XML schema using Relax NG (by me).

An element list from an in-class schema design session with students in the First-year Writing course of 2016.
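A schema of the kind described above might look something like the following sketch in Relax NG compact syntax. The element names, attribute values, and rules here are invented for illustration; each group’s actual schema grew out of its own genre research and documentation.

```rnc
# A hypothetical movie-review schema, sketched in Relax NG compact syntax
start = element movieReview { openingPara, para* }

# The students' rule: the first paragraph must include exactly one
# "opinion" component, mixed with ordinary prose
openingPara = element para { mixed { opinion } }

# Subsequent paragraphs may contain any number of opinions
para = element para { mixed { opinion* } }

# The @type values here are invented; groups define and document their own
opinion = element opinion {
  attribute type { "evaluative" | "comparative" | "summary" },
  text
}
```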

Once the schema is drafted, each student writes an individual XML document, their own example of a movie review that responds to a unique rhetorical situation. Based on this experience, the group reconvenes to revise their schema. They might, for example, decide that the <opinion> element should be optional in the first paragraph, or decide that an additional attribute value should be added to the @type attribute, or choose to adjust the definition of the element itself. Once schema revision is complete, students revise their XML documents. And on it goes.

An example of XML markup designed for the course.
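A student’s document responding to such a schema might look like the following minimal sketch (tag names and content invented for illustration, not drawn from actual student writing):

```xml
<movieReview>
  <para>From its opening shot, the film announces its ambitions, and
    <opinion type="evaluative">for the most part it fulfills
    them</opinion>.</para>
  <para>The plot follows two estranged siblings who reunite to settle
    their mother's estate.</para>
</movieReview>
```

When the group later makes the opinion component optional in the first paragraph, or adds a new attribute value, documents like this one may be invalidated and must be revised against the new schema.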

What I hope the above example illuminates is the thoroughly process-oriented approach to markup adopted in these classes. The schema is not static. It is a living document that affects and is affected by students’ experiences of composing, among other things. Neither are the student-authored XML documents static. They are repeatedly invalidated by revisions to the schema. They are subject to feedback from classmates and the instructor. They must be continually revised. From a digital humanities perspective, this application of markup may seem alien. In fact, in some ways, it doesn’t even matter what the schema ends up looking like (though it can be fascinating). The object of using markup in this way is not to produce the perfect model of a genre. Indeed, an understanding of genres as social actions, rather than as a set of ossified textual features, is central to the theoretical framework of the course. This understanding resists the idea that genres can be accurately modeled. The point of using markup is to foster productive conversations about writing, to interrupt the normal thinking and writing processes of students in productive ways. This brings us back to the conversations I overheard in the summer of 2014, eavesdropping on the ECDA when I was supposed to be writing.

An example of a markup output document for display. The XML is transformed to HTML with custom XSLT and highlighted according to XML tags.
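The display transformation works roughly as follows. This is not the stylesheet used in the course, just a minimal sketch of the general technique: each XML element is rendered as an HTML element carrying its tag name as a class, which a CSS rule can then highlight.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Wrap the whole document in a minimal HTML page -->
  <xsl:template match="/">
    <html>
      <body><xsl:apply-templates/></body>
    </html>
  </xsl:template>

  <!-- Turn every element into a span classed by its XML tag name,
       so a CSS rule like .opinion { background: gold; } can
       highlight each component of the schema differently -->
  <xsl:template match="*">
    <span class="{local-name()}"><xsl:apply-templates/></span>
  </xsl:template>

</xsl:stylesheet>
```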

But this approach raises new questions. How do I know if this approach is productive in the ways that I hope? What kinds of conversations are students having in these classes? How does markup function rhetorically for students when used for authorship? Does writing in XML and designing schemas for authoring contribute to students’ understanding of their writing and reading processes? Do reading and writing practices in the markup classroom transfer to other contexts? These questions just so happen to be the basis for my dissertation research, which takes as its objects of inquiry the two markup-based writing courses.

Studying (Authorial) Markup

The questions posed above present unique methodological concerns for the study of markup. A shift from product to process raises practical questions concerning how we access students’ experience using markup in this way. How can I make claims about the rhetorical and expressive capacities of authorial markup? How can I understand the role of the schema, the markup, and the platform(s) in students’ writing, reading, and thinking processes? In short, how do I study this?

Here, a slight shifting in thinking—from the digital humanities to writing studies—is helpful. While the pedagogical approach may be unconventional, my research questions are typical of writing studies research. Methods for studying student writing and experience in classroom settings are well established in the field. Although qualitative approaches to the study of markup are not typical in the digital humanities, the research questions for this project, based, as they are, on student experience, reflection, writing, and perception, necessitate the adaptation of innovative methods. To this end, I’ve employed a teacher research methodology—a systematic approach to data collection that honors the inside perspectives of teachers and students—that adapts qualitative research methods culled from ethnography, education, and writing studies research. Data for the study was gathered from direct participant observation, reflective journaling, semi-structured and directed qualitative interviews (three interviews each with nine case study students), and the collection of student writing (normal prose and XML, including version control logs for all XML files).

At this point, data collection has ended and the project is shifting to the data analysis phase. It is too soon to report results; however, early indications from student interviews point to some promising findings around student reflection and transfer, the multi-directional mediation of the schema, and students’ use of markup as a tool for generic invention and change. Here, it may be enough to assert that qualitative approaches to studying markup-based undergraduate courses may be fruitful. Indeed, digital humanities courses in general may benefit from adopting qualitative methodologies, like teacher research, to self-assess and to advocate for curricular change and institutional support.

The assignment discussed above is collected with the pilot set of teaching materials from the WWP’s pedagogical development consultants and is available here.

Loanwords, Macrons, and Orientalism: Encoding an Eighteenth-Century Fictional Translation


By Elizabeth Polcha, WWP Encoder and Ph.D. Candidate in English

Since late last fall, I’ve been encoding a text that poses some interesting markup challenges because of its use of Orientalist language: Scottish author Eliza Hamilton’s 1796 epistolary novel, Translation of the Letters of a Hindoo Rajah. I was excited to encode Translation because my own research considers eighteenth-century colonial literature, though my focus is on Caribbean and American literature. So, as an encoder, I approached Translation with an interest in how Hamilton uses distinct language to construct colonial notions of race and gender, but with only a limited familiarity with Orientalist print culture and history.

Before I lay out the details of how I’ve been encoding linguistically distinct language in Translation, it is necessary to explain just how Orientalist (and orientalist, to use Edward Said’s version of the term) this novel is. And no, Translation is not actually a collection of letters that Hamilton translated from Hindi.1 The “translated” letters of Hamilton’s text are fictional, mostly authored by the titular character and protagonist, Zāārmilla, the Rajah of Almora. Hamilton supplements the letters with a “preliminary dissertation,” lengthy footnotes, and a glossary of terms. She strategically includes these textual addendums as a way of demonstrating her expertise in the Orientalist scholarship of her time. Also, as you can see from the macrons included on “Zāārmilla” and on another major character’s name, “Māāndāāra,” Hamilton is a fan of using diacritical marks as a kind of typographic flourish. In writing Translation, Hamilton participated in a scholarly discourse rooted in a Western imperialist fascination with Eastern Asia, citing British colonial scholarship like Nathaniel Halhed’s A Code of Gentoo Laws Or, Ordinations of the Pundits and Orientalist groups like The Asiatic Society.2

Part of our encoding process at the Women Writers Project is to begin with a preliminary document analysis. This means that once we’ve acquired a text to encode, we look through the text carefully to take note of its structure and textual features before opening up an XML file and marking up our text in TEI. During my preliminary document analysis of Translation, aside from noticing the epistolary structure and Hamilton’s unusual diacritical marks I’ve described above, I also noticed quite a few Hindi and Sanskrit terms and phrases that seemed to be roughly transliterated into English (such as “Poojah” or Pūjā, पूजा, a Sanskrit-derived word for Hindu ritual prayer). From my document analysis I knew that it would be important to look up the etymology and meaning of Hamilton’s transliterated terms in order to decide how to most accurately describe them using the TEI. My encoding practice for Translation so far has involved occasionally switching between my XML file, the Oxford English Dictionary (OED), and Google Books in determining the best way to tag specific terms and phrases.

The WWP follows the TEI Guidelines for capturing specialized language with the element <distinct>, which means that we use <distinct> to tag language that is “archaic, technical, dialectal, non-preferred.” In addition to <distinct>, <foreign> and <term> were also particularly important in my encoding of Translation. The WWP also uses the @xml:lang attribute with a value from the IANA language registry to provide standardized identifications for non-English words and phrases.3 This means that my encoding process involves paying attention to the etymology of distinct words and phrases in order to assign each <distinct> or <foreign> element an IANA language code.
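In practice, that looks something like the following hypothetical fragments. The elements follow the TEI Guidelines, but the words and attribute values here are chosen for illustration rather than taken from the WWP’s files:

```xml
<!-- An archaic English word: linguistically distinct, but not foreign -->
<p>she will return <distinct type="archaic">anon</distinct></p>

<!-- A transliterated Sanskrit-derived term, identified with its
     IANA language code ("sa" for Sanskrit) -->
<p>the rite of <foreign xml:lang="sa">Poojah</foreign></p>
```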

For example, in the first letter in Translation, Zāārmilla refers to a character’s “Ayammi Shadee,” which Hamilton defines in a footnote as “the present made to a young woman by her relations during the period of her betrothment” (58). In determining how to encode this term, I first searched for it in the OED—which returned no results. I then searched in Google Books, which brought me to Halhed’s A Code of Gentoo Laws, Hamilton’s original source. Eventually, I determined that “Shadee” must be Hamilton’s (and Halhed’s) version of the Hindi word, shadi, or, marriage.

Example encoding of “Ayammi Shadee.”

This term stood out to me in the text not only because it was capitalized and footnoted, but also because I did not recognize it. If Hamilton had simply used the word “Marriage” there would be no need to tag it with a more descriptive element, but because the WWP is interested in tagging non-English and linguistically distinct language, I needed to figure out the best way to encode the term. I ended up encoding “Ayammi Shadee” using the element <foreign>, which is used to tag non-English words in cases where there is not another more appropriate element, such as <name>, <persName>, or <placeName>. I also used the @xml:lang attribute with a value of “hi” for Hindi.
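The markup for this case, as described above, would look roughly like the following sketch (the surrounding prose is paraphrased here, and the published encoding may differ):

```xml
<!-- A non-English term with no more specific element available,
     tagged <foreign> with the IANA code for Hindi -->
<p>the <foreign xml:lang="hi">Ayammi Shadee</foreign>, the present made
to a young woman by her relations during her betrothment</p>
```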

As in the example above, one of the challenges of marking up non-English and linguistically distinct terminology in texts like Hamilton’s Translation is that it is sometimes difficult to know when a word is being referenced in the text as a foreign language term, or when the text is using a term that has been adapted into English as a loanword. For example, the English word “pundit” is a loanword from the Sanskrit term “pandit” meaning knowledge owner, or, according to the OED, “a person with knowledge of Sanskrit and Indian philosophy, religion, and law.” So, when Halhed includes “Ordinations of the Pundits” in the title of his text, he is referring to a “pundit” as an intermediary who could clarify Indian law for colonial authorities.

It is also difficult to distinguish when a term can accurately be tagged “foreign” or “distinct” (<distinct> is the element we use for linguistically or dialectically distinct terms that are not distinct enough to constitute a “foreign” language), since what is considered foreign or distinct to me may not have been foreign or distinct to an eighteenth-century reader. The WWP aims to best represent the documents we encode within the context in which they were written and published, which is part of the reason why the OED is so often a valuable resource for encoders—we wouldn’t want to mark an early modern spelling of a particular word as a typographical error using the elements <sic> and <corr>, for example. But it is also important to recognize that each encoder approaches the encoding process with her own understanding of the text. My choices in marking up the term “Ayammi Shadee” are based on my understanding of the WWP’s encoding practices and my analysis of the text—and these choices will be reviewed by other encoders and may change as Translation moves through our proofing process and into final publication on Women Writers Online.
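For instance, a genuine typographical error might be corrected with <choice>, <sic>, and <corr>, while a period spelling would be transcribed as-is. A hypothetical example, not drawn from Translation:

```xml
<!-- A true printer's error: record both the original reading
     and the correction -->
<choice>
  <sic>teh</sic>
  <corr>the</corr>
</choice>

<!-- An early modern spelling such as "onely" is not an error,
     so it is transcribed as-is, with no <sic> or <corr> -->
<p>she had onely one desire</p>
```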

What I love about working for the WWP is the endlessly evolving way we think about markup, and the collaborative nature of the encoding process. From the many discussions I’ve had in encoding meetings with my WWP colleagues about Hamilton’s Translation, we’ve shifted slightly in our thinking about elements like <distinct>. Ultimately, the complicated way Hamilton uses Hindi- and Sanskrit-derived terms has helped me to think more critically about the linguistic complexity of eighteenth-century colonial writing.