What is distinctive about digital research materials?
What I want to do in this first session is situate text encoding within
a larger frame of reference: as a specific way of representing scholarly
information
- partly as a way of understanding text encoding (and other forms
of digital research tools)
- and partly as a way of understanding how we represent and use
scholarly information, research materials, more generally
- because one crucial question being asked in the digital
humanities domain is essentially "what changes?"; "does anything
change?"
- i.e. are we radically altering how humanities research is done?
or the kinds of arguments we make?
- put another way, is it all worth the effort?
In order to do this, I think it will be helpful if we can lay out in
front of ourselves the kinds of research materials we’re familiar with,
and talk about how we’re used to thinking with them: what do they do for
us, informationally? how do they present the source materials to us?
So: what kinds of research sources have you used in the past year?
- primary sources? in what media? originals, facsimiles, reprints,
editions, microfilm reproductions, digital transcriptions (what kind?)
- secondary sources?
- derived data? in what form?
- metadata (library catalogues, finding aids, etc.)
How would we characterize the types of information found in these
sources?
- sources that show us visual evidence
- sources that give us derived analysis (quantitative, qualitative)
- sources that give us a description, in language
- sources that give us a description, in formal terms
- sources that give us an argument, in language
- sources that give us an argument, in other forms?
How do we evaluate these various types of sources? how are they
successful or unsuccessful?
- Visual evidence: level of granularity, fineness of detail,
accuracy (of color, etc.)
- Derived analysis: the intellectual basis of the analysis, i.e. the
accuracy and relevance of its disciplinary assumptions, and the
usefulness of the insight it gives us
- Prose description: the richness of detail, the persuasiveness
(i.e. how it convinces us of the author’s trustworthiness and usefulness
as a witness), and also its comparability to other descriptions (i.e.
whether descriptive terms are used consistently)
- Formal description: consistency, appropriate granularity of the data
Is any of this inflected by discipline?
- what kinds of sources do historians use?
- what kinds of sources do literary scholars use?
- linguists?
- ontologists?
- other groups?
What is text encoding? Where does it fit in?
So we can try to situate the activity of text encoding in this
intellectual space:
- From the viewpoint of the humanities scholar, text encoding looks
as if it’s coming over from computer science: as an activity that takes
place on computers and requires some technical knowledge (of software,
of data standards, of encoding languages)
- in fact, there are some other lines of connection that make it
clearer why it should be of interest to us
- anthropological: the text encoder is an observer and
documenter of the textual world, and the encoding he/she produces has
(at least potentially) something of the quality of a thick
description: a contextualized, interpretive account of the
details of the textual landscape.
- editorial: the text encoder is also very much like a critical
editor, creating an analytical representation of the text which
provides systematic, expert knowledge about it
- interpretive, critical: the encoder can also act as an
interpretive commentator, using markup to add context and layers of
interpretive information
Perhaps most importantly, text encoding is a modelling activity: a
process of creating an analytical representation of an object (e.g. a
document) or an information system
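To make this concrete, here is a minimal sketch in Python (using the
standard library’s ElementTree) of what such an analytical
representation can look like. The element names (lg, l, choice, orig,
reg) follow TEI conventions, but this particular passage and its
encoding choices are invented for illustration:

    # A minimal sketch, assuming TEI-style element names; the encoding
    # choices themselves are invented for illustration.
    import xml.etree.ElementTree as ET

    encoded = """
    <lg type="stanza">
      <l n="1">Shall I compare thee to a summer's day?</l>
      <l n="2">Thou art more
        <choice><orig>louely</orig><reg>lovely</reg></choice>
        and more temperate:</l>
    </lg>
    """

    root = ET.fromstring(encoded)
    # The markup is the analytical layer: it lets us ask structural and
    # editorial questions that the raw character stream cannot answer.
    for line in root.findall("l"):
        print("line", line.get("n"))
    for choice in root.iter("choice"):
        print("original spelling:", choice.findtext("orig"),
              "| regularized:", choice.findtext("reg"))

The point is that the queries run against the markup, not against the
raw character stream: the encoding is the model we interrogate.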
Sampling and modelling
It may be useful to talk more about the concept of data modelling at this point...
- an increasingly common concept in digital humanities, with much
discussion about what it means
I’m using the term modelling here as distinct from a concept
like sampling:
- Sampling takes slices or samples of the world:
visually (like a digital camera), sonically (like a digital sound
recorder), or in some other way
- the classic example is the bitmap image: a matrix of colored dots
that represent an image at some resolution, high or low (see the sketch
after this list)
- Modelling creates an analytic representation of the
world: as a function, a formalization, a mathematical representation, a
conceptual model, some kind of surrogate
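As a toy illustration of sampling, here is a sketch assuming an
invented 4x4 grayscale "image": a bitmap is just a matrix of intensity
values, and lowering the resolution means averaging detail away:

    # A toy sketch of sampling; the 4x4 grayscale "image" is invented.
    image = [
        [0, 0, 255, 255],
        [0, 0, 255, 255],
        [255, 255, 0, 0],
        [255, 255, 0, 0],
    ]

    def downsample(pixels):
        """Average each 2x2 block into one pixel (half the resolution)."""
        out = []
        for r in range(0, len(pixels), 2):
            row = []
            for c in range(0, len(pixels[0]), 2):
                block = (pixels[r][c] + pixels[r][c + 1]
                         + pixels[r + 1][c] + pixels[r + 1][c + 1])
                row.append(block // 4)
            out.append(row)
        return out

    print(downsample(image))  # [[0, 255], [255, 0]]: same scene, less detail

Nothing analytic has been added here: the lower-resolution version is
simply a coarser slice of the same visual field.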
Sampling produces what I would tentatively call a
depiction: a version that aspires to be the
source:
- Measured in terms of fidelity
- Example: a high-resolution photograph (higher resolution = better
depiction)
Modelling produces a version that aspires to yield
information about the source for a specific purpose:
- Measured in terms of functionality against the purpose in question
- A topographical map: functional for understanding geographical
features
- A road map: functional for navigating in a car
- A satellite map: functional for viewing weather systems
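The same idea in code (the territory, place names, and figures below
are all invented): two models of one landscape, each functional only
for the purpose it was built for:

    # A sketch of modelling as purpose-driven representation.
    roads = {  # road map as a graph: town -> [(neighbour, distance in km)]
        "Aford": [("Bton", 12), ("Cby", 30)],
        "Bton": [("Aford", 12), ("Cby", 9)],
        "Cby": [("Aford", 30), ("Bton", 9)],
    }
    elevations = {"Aford": 40, "Bton": 210, "Cby": 95}  # topographic model (m)

    # Functional for navigation: the nearest town by road from Aford
    print(min(roads["Aford"], key=lambda edge: edge[1]))
    # Functional for terrain questions: the highest settlement
    print(max(elevations, key=elevations.get))

Neither model aspires to be the territory; each yields information
about it for a specific purpose, and is measured against that purpose.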
Text encoding vs. information modelling
Note that text encoding may not even be the best or
most evocative term for all of this:
- it describes the markup of text streams
- but for things like modelling contextual information (e.g. the
personography example, sketched below) the emphasis is on the
structures created, not on the text itself
- somewhere in between is the terrain of text which is itself
regularly structured
- when we think of what all of these kinds of markup are really
doing, we might better describe it as information
modelling or data modelling
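Here is a minimal sketch of the personography case. The element names
echo TEI’s personography conventions, but the person and dates are
invented:

    # A sketch assuming TEI-style personography elements; the person
    # and dates are invented for illustration.
    import xml.etree.ElementTree as ET

    person = ET.fromstring("""
    <person xml:id="p001">
      <persName>
        <forename>Jane</forename>
        <surname>Smith</surname>
      </persName>
      <birth when="1712-03-04"/>
      <occupation>printer</occupation>
    </person>
    """)

    # Nearly everything informative lives in the structure and the
    # attribute values, not in running prose: this is information
    # modelling rather than the markup of a text stream.
    print(person.findtext("persName/surname"),
          person.find("birth").get("when"))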
And note as well that text markup is not the only way to model data, or
text data:
- databases have been used for a long time for this purpose
- with a greater emphasis on structure and less on the nuances of
text
- not fundamentally different, just a different emphasis: on what is
consistent vs. on what is variable
- databases tend to cease modelling at the point where the text
becomes highly variable: within paragraphs, within lines of poetry: the
markup of individual words
- though now with XML databases, we’re starting to see approaches
that are really hybrids (see the sketch below)
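To make the contrast concrete, here is a sketch (the letter, names,
and table layout are all invented) of the same item modelled both
ways: a relational row capturing what is consistent across records,
and inline markup reaching down into the variable prose:

    # A sketch of the database/markup contrast, using an invented letter.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE letters (id TEXT, sender TEXT, year INTEGER)")
    db.execute("INSERT INTO letters VALUES ('L001', 'Jane Smith', 1740)")
    # The database stops modelling where the text becomes highly variable:
    print(db.execute("SELECT sender, year FROM letters").fetchone())

    # Markup can continue modelling down to the individual word:
    encoded = ("<letter n='L001'>Dear <persName>Tom</persName>, "
               "I write to you from <placeName>Bristol</placeName>.</letter>")
    print(encoded)

An XML database stores and queries documents like the marked-up letter
directly, which is what makes the hybrid of the two emphases possible.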