Discovering with “Clusters”

By Cara Marta Messina

Description

This classroom-based activity will enable students to explore the “Clusters” function in the Word Vector Interface (WVI) and encourage them to begin considering larger thematic questions about the texts in Women Writers Online (WWO). The activity will primarily focus on using the clustering function as a catalyst for discovery; however, it can be paired with several other activities and act as a starting point for incorporating WWO, word embedding models, and text analysis in the classroom.

Learning Goals

This activity will enable students to:

  • Gain a basic understanding of the Word Vector Interface, particularly the difference between the “Clusters” function and “Basic” queries.
  • Learn what word embedding models are and why they are useful.
  • Prepare for further exploration and discovery in the Word Vector Interface and Women Writers Online.

Activity

Using the “Clusters” function, students will choose a model that interests them and explore the different clusters that appear. A key goal here is to compare the randomized “Clusters” results with the results of the “Basic” query function and discuss why the two differ. Below is an explanation of the difference between “Clusters” and “Basic” queries.

The “Clusters” results show the top 10 words in a cluster grouped around a particular point in the vector map. These results differ from “Basic” results because a “Basic” query makes a specific word the central point on the vector map, while a cluster is centered on a random point, not a word.

Another way to understand the difference between the “Clusters” and “Basic” results is to think visually about the “point of origin” for each. The point of origin for “Clusters” is a randomized point somewhere in the model of the corpus, while the point of origin for the “Basic” results is a specific word in the corpus.
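For instructors who want to make the “point of origin” idea concrete, the following is a minimal Python sketch of the distinction described above. It is a toy illustration, not the Word Vector Interface's actual implementation: the two-dimensional vectors are invented for the example, and the names TOY_VECTORS, basic_query, and cluster_query are hypothetical. It also assumes that closeness is measured with cosine similarity, a common choice for word embedding models.

```python
import math
import random

# Invented 2-D "word vectors" for illustration only; a real model
# has thousands of words and typically 100 or more dimensions.
TOY_VECTORS = {
    "queen": (0.9, 0.8),
    "king": (0.85, 0.75),
    "crown": (0.8, 0.9),
    "bread": (-0.7, 0.2),
    "flour": (-0.75, 0.25),
    "oven": (-0.6, 0.3),
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def nearest(point, vectors, n=10, exclude=()):
    """Return up to n words whose vectors are most similar to point."""
    candidates = [w for w in vectors if w not in exclude]
    candidates.sort(key=lambda w: cosine(point, vectors[w]), reverse=True)
    return candidates[:n]


def basic_query(word, vectors, n=10):
    """'Basic'-style query: the point of origin is a specific word."""
    return nearest(vectors[word], vectors, n, exclude=(word,))


def cluster_query(vectors, n=10, rng=None):
    """'Clusters'-style query: the point of origin is a random point."""
    rng = rng or random.Random()
    dims = len(next(iter(vectors.values())))
    point = tuple(rng.uniform(-1.0, 1.0) for _ in range(dims))
    return nearest(point, vectors, n)


print(basic_query("queen", TOY_VECTORS, n=2))
print(cluster_query(TOY_VECTORS, n=3))
```

Running the sketch several times returns the same words for the “Basic”-style query every time, while the “Clusters”-style query returns a different group depending on where the random point happens to land. Students can use this contrast to explain why refreshing the clusters surfaces new groups of words.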

As students explore the clusters, ask them to think about what the clusters might suggest about the texts that were used to train the model. Can they guess what kinds of texts and topics are in the collections based on the clusters that they see?

This quick activity simply asks students to explore a few clusters and try searching for related terms using the “Basic” query. You can also use it as a starting point for thinking about how word embedding models represent words and corpora, or move on to one of the other suggested assignments on this site.