Word Vectors Institute: Introductions and Overview

Julia Flanders



To situate this event a bit:

We’re not expecting any prior knowledge of text analysis, and certainly none of word embedding models (that’s why you’re here!), but we hope everyone will come away feeling comfortable with several things:

What we will not be covering:

Finding the right level

This is also a sort of meta-workshop:

A quick look at the schedule...


Making notes

We’ve provided a fair amount of time for individual and small-group experimentation, and time for you to think about your own research projects.

However, this workshop will really just be a start: a chance to get comfortable with fundamental concepts.

I want to talk for a moment about some suggestions for how to take this work with you and continue it in your own time after you get home:


So, with those preliminaries out of the way, let’s get into our first explanation of word embedding models. For this first explanatory pass, we won’t dwell in detail on the terminology or the mathematics: we’ll keep to a metaphorical level of explanation to get a feel for things.

And the first term I want to talk about is the word model

Word-embedding models have properties of both, but in important respects are more like the latter type:

The practical applications of this kind of modeling are familiar: predictive text on your phone! But in digital humanities, models of this kind are also valuable because they let us understand language better and help us do research on specific topics and historical formations. Where machine-learning research in industry focuses on getting the most accurate prediction of the word I’m trying to type, through a somewhat abstract, de-historicized understanding of language, in digital humanities we need to pay close attention to language as it is represented in our specific corpora (a time period, a genre, a set of authors, and so on), and also to the assumptions we’re making about language when we train our models.

A first look at word vectors

At the simplest level, a word embedding model is a model of a text corpus that represents word usage in the corpus by locating each word in space.

Metaphorically, we can imagine that those spatial locations show us neighborhoods of words that tend to occur in the same contexts.

Another way to think about these neighborhoods is that they are answers to the question: which words are most likely to appear near word X? Or, conversely: which word is most likely to appear in this context?

So the clusters we see are groups of words that might be predicted by the same kinds of contexts. What can we imagine those contexts to be, based on the clusters we’re seeing here?
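To make the neighborhood metaphor slightly more concrete, here is a minimal sketch in Python. The words and their three-dimensional positions are invented for illustration (real models learn hundreds of dimensions from a corpus); closeness is measured with cosine similarity, the standard measure of whether two vectors point in the same direction.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented 3-dimensional "positions" for a few words.
vectors = {
    "danger": [0.9, 0.1, 0.2],
    "peril":  [0.8, 0.2, 0.1],
    "teacup": [0.1, 0.9, 0.7],
}

# Words in the same neighborhood have high cosine similarity.
print(cosine_similarity(vectors["danger"], vectors["peril"]))   # close to 1
print(cosine_similarity(vectors["danger"], vectors["teacup"]))  # much lower
```

Toolkits such as gensim compute this same measure behind the scenes when they report a word’s "most similar" neighbors.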

Thinking with vectors

So this is interesting in itself:

It’s also interesting because we can do further analysis:

If you had a chance to read Ryan Heuser’s analysis of riches and virtue, or Ben Schmidt’s analysis of the Rate My Professor data (where he considers breaking down the gender binary), you saw both authors taking advantage of this same idea:
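A minimal sketch of the idea behind those analyses, with invented two-dimensional vectors (the words and numbers are hypothetical; Heuser and Schmidt work with real trained models): subtracting one word’s vector from another’s defines an axis through the space, and other words can then be projected onto that axis to see which end they lean toward.

```python
# Vector subtraction and dot product on plain Python lists.
def sub(u, v): return [a - b for a, b in zip(u, v)]
def dot(u, v): return sum(a * b for a, b in zip(u, v))

# Invented 2-dimensional positions.
vectors = {
    "man":   [0.9, 0.1],
    "woman": [0.1, 0.9],
    "king":  [0.95, 0.3],
    "queen": [0.15, 0.95],
}

# The difference vector points from "man" toward "woman".
gender_axis = sub(vectors["woman"], vectors["man"])

# Projecting other words onto that axis: positive = "woman" end.
print(dot(vectors["queen"], gender_axis))  # positive
print(dot(vectors["king"], gender_axis))   # negative
```

Heuser’s riches/virtue axis works the same way: pick two anchor words, take the difference, and see where the rest of the vocabulary falls along it.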

Locating words in vector space

So how do those words get located in this space? What does the spatial metaphor really mean?

We will go into the details much more fully, very soon. But for this initial orientation:

This slide shows some actual quotations from WWO where the word danger occurs:
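As a rough sketch of where those spatial locations come from: the model attends to the words that occur near each target word. Counting context words within a small window, using made-up toy sentences (not the actual WWO quotations), might look like this:

```python
from collections import Counter

# Hypothetical toy sentences, invented for illustration.
sentences = [
    "she saw great danger in the voyage",
    "the danger of the storm was past",
    "no danger could deter her courage",
]

window = 2  # how many words on each side count as "context"
contexts = Counter()
for sentence in sentences:
    words = sentence.split()
    for i, w in enumerate(words):
        if w == "danger":
            lo = max(0, i - window)
            neighbors = words[lo:i] + words[i + 1:i + 1 + window]
            contexts.update(neighbors)

# The most frequent context words for "danger" in this tiny corpus.
print(contexts.most_common(3))
```

A real training process goes much further than counting, but the raw material is the same: which words show up in which contexts.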


You may be thinking, as I did: words have many different associations. If location in space represents the semantic affiliations of each word, how can a word be in multiple places at one time?

In this diagram, on the left, the word bank has two associations:

On the right, we have a more complicated situation: the word set has many more associations. We can’t draw an equivalent diagram, but we can still imagine:

If this feels baffling right now, don’t worry: in my experience this idea takes a little time to sink in. Let it sit in your mind as a metaphor for now: a big cloud of words, with neighborhoods of related words; the closer two words are, the more closely related they are.
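One way to make this concrete: a polysemous word ends up at a single point that is pulled toward all of its contexts. Training doesn’t literally average vectors, but a toy average (with invented two-dimensional numbers) conveys the effect for a word like bank:

```python
def mean(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# Invented 2-d positions for two clusters of contexts.
finance_contexts = [[1.0, 0.0], [0.75, 0.25]]  # money, loan, ...
river_contexts   = [[0.0, 1.0], [0.25, 0.75]]  # water, shore, ...

# "bank" appears in both kinds of context, so its single position
# lands between the two neighborhoods rather than inside either one.
bank = mean(finance_contexts + river_contexts)
print(bank)  # [0.5, 0.5]
```

So the word isn’t in two places at once: it occupies one compromise position, nearer to whichever set of contexts it appears in more often.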

Questions at this stage?

Factors that affect the behavior of the model

I mentioned earlier that we need to be attentive and critical about how this model is created; there are a number of things that affect how a word embedding model will perform for us.

The size of the corpus matters a lot (and you’ll remember that we specified that you had to have at least a million words):

The content of the corpus also matters a lot:

The data preparation also matters a lot (and we’re going to spend two whole sessions on this later on):

And finally, the training process matters:

Comparison with other forms of text analysis

As part of our orientation, it may also be helpful to situate word embedding in relation to some other kinds of digital analysis we may already be familiar with; all of these are ways to get an understanding of texts at scale.

Has anyone here already experimented with word frequency, for instance with Voyant tools?
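For comparison, raw word frequency (the kind of count a tool like Voyant reports) can be sketched in a few lines of Python; the sample text here is invented:

```python
from collections import Counter

text = "the danger and the virtue and the riches"
frequencies = Counter(text.split())
print(frequencies.most_common(2))  # [('the', 3), ('and', 2)]
```

Frequency tells us how often a word occurs but nothing about its company; word embeddings start from the contexts a word occurs in, which is what lets them place words in relation to one another.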

How about topic models: has anyone used those? For instance, tools like Mallet?

What’s distinctive about word embedding models:

The larger question of what word embedding models are distinctively good for is one that we will explore as a group in the rest of the institute!

Disclaimers! Questions?

I should note here: we have been working hard to understand word embedding models and develop this curriculum; however, the underlying math is undeniably challenging. At some points in the next few days, I anticipate that you’re going to have questions that we actually can’t answer, because we haven’t yet fully mastered that deeper layer. We’re going to treat these as learning and teaching moments! After all, these are also questions that our students and colleagues will be asking us. So part of what we’re exploring here is how to understand the boundaries of what we know, and how to respond effectively based on that knowledge, whatever level we may be at.

Questions at this stage?