Navigating the Word Vector Interface
As part of the Women Writers Vector Toolkit, the Word Vector Interface serves as a space for researchers, instructors, and students to explore text analysis with word vectors through a set of pre-trained models. These include models trained on the Women Writers Online collection, the Victorian Women Writers Project collection, Early English Books Online, and selected sub-collections based on genre or time period.
The interface is designed to be usable with no knowledge of code. It allows you to explore different facets of word vectors, from clustering to comparison to visualization, and aims to be a starting point for those who are interested in learning more about what kinds of analysis are possible through word vectors and word embedding models. The Women Writers Project also offers an introduction to word embedding models for those interested in more in-depth introductory material.
To get started, navigate to the Word Vector Interface. With the toolbar on the left side of the screen, you can select the model you are interested in and adjust the settings for the type of analysis you have selected. The results of your analysis will appear to the right of the toolbar, in the main window of the interface. This window offers a series of five tabs, each of which facilitates a particular form of analysis. You can begin by selecting a model of interest, navigating to the relevant tab, and then adjusting any relevant settings in the toolbar.
Below, you will find a description of each tab as well as information on how to use each form of analysis. You might also find this list of sample queries useful.
Home
The “Home” tab allows you to search for a specific word in the model that you have selected and display the words which are closest to your search word in vector space.
The window displays the model that is currently selected (the default is the WWO Full Corpus) and offers a brief description of the corpus that the model was trained on. You can use the dropdown on the left to choose from the available models.
To search for a word of interest (in other words, to “query” the model for a specific term), type the word into the box labeled “query term.” The words in the results table are listed in descending order of their similarity to the query word.
Clicking on any of the words will navigate you to results for that word in the Women Writers Online collection. (Note that the interface will always link to WWO, even when showing results from models trained on other corpora.) You can change the number of words that are displayed as the result of each query by moving the “Number of Words” slider on the toolbar. The default number of words displayed is ten.
Try searching for the word “rose,” “food,” or even a color like “red” to get a feel for how the search function works.
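If you are curious about what a query like this involves, the following is a minimal sketch in Python rather than the interface's own code. It assumes that "closest in vector space" means highest cosine similarity, the measure named elsewhere in this guide. The tiny three-dimensional vectors, the `TOY_MODEL` name, and the `closest_words` function are all invented for illustration; real models have many more dimensions and a much larger vocabulary. Leaving the query word out of its own results is a choice made for this sketch.

```python
import math

# Toy vectors invented for illustration only; they do not come from any WWVT model.
TOY_MODEL = {
    "rose":   [0.9, 0.1, 0.0],
    "lily":   [0.8, 0.2, 0.1],
    "flower": [0.7, 0.3, 0.0],
    "bread":  [0.1, 0.9, 0.2],
    "red":    [0.6, 0.0, 0.5],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_words(model, query, number_of_words=10):
    """Rank every other word in the model by similarity to the query word."""
    query_vector = model[query]
    scored = [
        (word, cosine_similarity(query_vector, vector))
        for word, vector in model.items()
        if word != query  # the query word itself is left out of this sketch
    ]
    # Most similar first, mirroring the descending order of the results table.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:number_of_words]


# "number_of_words" plays the role of the "Number of Words" slider.
results = closest_words(TOY_MODEL, "rose", number_of_words=3)
for word, similarity in results:
    print(f"{word}\t{similarity:.3f}")
```

Changing `number_of_words` only shortens or lengthens the list; the ranking itself stays the same.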
Compare
The “Compare” tab allows you to compare results for two separate models. As with the “Home” tab, the “Compare” tab displays corpus names and model descriptions for the models that you select. You can select the models to query using the two drop-down menus located in the toolbar. You can also adjust the number of words to be displayed using the slider in the toolbar.
The “Compare” tab has one search box where you can type a query term. The results of querying each model with that single term will be displayed in tables below the corresponding model and corpus description. The table includes a ranking of the words according to their level of similarity to the query term as well as a clickable link, which will direct you to the WWO search results for that term.
A good example query for the “Compare” tab is “england” in the “WWO 16th and 17th Centuries” and the “VWWP and WWO” models. Another query to try is “grace” in the “WWO 16th and 17th Centuries” and the “WWO 19th century” models.
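Conceptually, a comparison runs the same query against two models and shows the two ranked lists side by side. The sketch below illustrates that idea under stated assumptions: both "models" are invented two-dimensional toys, and the names `toy_model_a`, `toy_model_b`, and `compare_models` are hypothetical. The vectors were chosen so that the two lists differ; they say nothing about what the actual WWO or VWWP models return.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_words(model, query, number_of_words):
    """Rank the other words in one model by similarity to the query word."""
    scored = [
        (word, cosine_similarity(model[query], vector))
        for word, vector in model.items()
        if word != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:number_of_words]


# Two invented toy models in which "grace" sits near different neighbors.
MODELS = {
    "toy_model_a": {
        "grace": [1.0, 0.0], "mercy": [0.9, 0.1], "god": [0.8, 0.3],
        "beauty": [0.1, 1.0], "elegance": [0.0, 1.0],
    },
    "toy_model_b": {
        "grace": [0.2, 1.0], "mercy": [1.0, 0.1], "god": [0.9, 0.0],
        "beauty": [0.3, 0.9], "elegance": [0.1, 1.0],
    },
}


def compare_models(models, query, number_of_words):
    """Run one query term against every model and collect the ranked lists."""
    return {
        name: closest_words(model, query, number_of_words)
        for name, model in models.items()
    }


comparison = compare_models(MODELS, "grace", number_of_words=2)
for name, ranked in comparison.items():
    print(name, [word for word, _ in ranked])
```

Because each model is trained on a different corpus, the same term can end up with quite different neighbors, which is what makes the side-by-side view useful.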
Clusters
The “Clusters” tab displays clusters of words which share semantic similarity (this can include antonyms) in a particular model. The toolkit generates one hundred and fifty clusters, or groupings of words which are close to one another in vector space, for the selected corpus and model. With the “Clusters” tab, you can see the most relevant words for a randomly selected set of clusters. As with the other tabs, you can use the dropdown to select a model. From the toolbar on the left, you can also reset the clusters, adjust the number of words included in each cluster, and download the cluster results.
Each cluster is given a generic label: “cluster_1,” “cluster_2,” “cluster_3,” and so on. The default number of words per cluster is ten, and the highest number possible in the interface is one hundred and fifty.
When you click “Reset clusters,” the toolkit randomly selects a new set of clusters and their related terms. If you would like to see the full set of clusters and related words for a model, download that model’s data with the “Download” button.
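To give a sense of how words can be grouped by their position in vector space, here is a small sketch that uses k-means clustering. This guide does not say which clustering algorithm the toolkit uses, so k-means is an assumption, as is ranking each cluster's words by their distance to the cluster center. The toy vectors and the names `k_means` and `top_words` are invented for illustration, and the sketch builds two clusters rather than one hundred and fifty.

```python
import math
import random

# Invented two-dimensional toy vectors: three flower words and three food words.
TOY_VECTORS = {
    "rose": [0.9, 0.1], "lily": [0.8, 0.2], "tulip": [0.85, 0.15],
    "bread": [0.1, 0.9], "meat": [0.2, 0.8], "wine": [0.15, 0.85],
}


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise average of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def k_means(vectors, k, iterations=20, seed=0):
    """Group words into k clusters; returns (clusters, centroids)."""
    rng = random.Random(seed)
    # Start from k randomly chosen word vectors.
    centroids = rng.sample(list(vectors.values()), k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assign every word to its nearest centroid.
        for word, vector in vectors.items():
            nearest = min(range(k), key=lambda i: distance(vector, centroids[i]))
            clusters[nearest].append(word)
        # Move each centroid to the average of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            mean([vectors[word] for word in members]) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return clusters, centroids


def top_words(vectors, members, centroid, words_per_cluster=10):
    """Return the members closest to the cluster center, nearest first."""
    ranked = sorted(members, key=lambda word: distance(vectors[word], centroid))
    return ranked[:words_per_cluster]


clusters, centroids = k_means(TOY_VECTORS, k=2)
labelled = {
    f"cluster_{i + 1}": top_words(TOY_VECTORS, members, centroids[i], words_per_cluster=2)
    for i, members in enumerate(clusters)
}
for label, words in labelled.items():
    print(label, words)
```

In this sketch, `words_per_cluster` stands in for the slider that sets how many words are shown for each cluster.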
Operations
The “Operations” tab allows you to perform vector math using a model of your choice. The toolbar includes a dropdown menu for selecting which model to use as well as a dropdown menu for selecting the operation to perform: addition, subtraction, analogies, or advanced operations. For more information on word vector operations, check out this explainer.
In the main window, a corpus and model description are provided for the corresponding model, along with descriptions of how each operation works and what conclusions can be drawn from the results.
There are two text boxes, one labeled “Word 1” and the other “Word 2,” where you can type the two words you would like to use in the selected operation. After you enter two terms, a table displays the results of applying that operation to them. The table includes cosine similarities and clickable words that are linked back to WWO.
To test out the addition operation, try adding words like “grace” and “beauty.” You might use the same pair of words in a subtraction operation, or even subtract “grace” from a word like “god.” For a more complicated operation, you can put an analogy to the test, such as “rich - poor + dress.”
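The arithmetic behind these operations can be sketched in a few lines: the word vectors are added or subtracted component by component, and the words closest to the resulting vector are then looked up by cosine similarity. The example below is an illustration under stated assumptions, not the interface's implementation. The two-dimensional vectors are invented (the first axis loosely stands for wealth, the second for clothing), the function names are hypothetical, and leaving the input words out of the results is a choice made for this sketch.

```python
import math

# Invented toy vectors: first axis loosely "wealth", second axis loosely "clothing".
TOY_MODEL = {
    "rich":  [1.0, 0.0],
    "poor":  [-1.0, 0.0],
    "dress": [0.0, 1.0],
    "silk":  [1.0, 1.0],
    "rags":  [-1.0, 1.0],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def add(a, b):
    """Component-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def subtract(a, b):
    """Component-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]


def closest_to_vector(model, target, exclude=(), number_of_words=10):
    """Rank words by similarity to an arbitrary vector, skipping excluded words."""
    scored = [
        (word, cosine_similarity(target, vector))
        for word, vector in model.items()
        if word not in exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:number_of_words]


# Addition: "rich" + "dress"
added = add(TOY_MODEL["rich"], TOY_MODEL["dress"])
addition_results = closest_to_vector(TOY_MODEL, added, exclude={"rich", "dress"})

# Subtraction: "silk" - "dress"
subtracted = subtract(TOY_MODEL["silk"], TOY_MODEL["dress"])
subtraction_results = closest_to_vector(TOY_MODEL, subtracted, exclude={"silk", "dress"})

# Analogy: "rich" - "poor" + "dress"
analogy = add(subtract(TOY_MODEL["rich"], TOY_MODEL["poor"]), TOY_MODEL["dress"])
analogy_results = closest_to_vector(TOY_MODEL, analogy, exclude={"rich", "poor", "dress"})

print("addition:", addition_results[0][0])
print("subtraction:", subtraction_results[0][0])
print("analogy:", analogy_results[0][0])
```

In a real model the results are rarely this tidy, so it is worth reading further down the table than the first row.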
Visualization
The “Visualization” tab provides some experimental visualizations for exploring query terms and their contexts. You can change models with the “Model” dropdown menu in the sidebar. The “Select visualization” dropdown menu allows you to choose different visualization options.
Word Cloud
The Word Cloud visualization offers a spatial view of the closest words to a query term. The query term you enter appears in the center of the cloud, surrounded by terms whose proximity is based on cosine similarity to the input term. Terms with higher cosine similarities are closer to the center. The terms are also color-coded to show cosine similarity, as outlined in the key below.
In the sidebar, the top slider allows you to set a threshold for the cosine similarity of displayed terms. The middle slider allows you to control the maximum number of words displayed, and the bottom slider controls the size of the plot image.
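The effect of the first two sliders can be thought of as a filter followed by a cutoff. The sketch below assumes that the threshold is a minimum cosine similarity and that a term's distance from the center shrinks as its similarity grows. The `1 - similarity` mapping, the function names, and the scores are all invented for illustration and are not taken from the interface.

```python
# Invented (term, cosine similarity) pairs standing in for the results of a query.
SCORED_TERMS = [
    ("lily", 0.82), ("flower", 0.77), ("violet", 0.71),
    ("garden", 0.58), ("red", 0.49), ("bread", 0.12),
]


def select_cloud_terms(scored_terms, similarity_threshold, max_words):
    """Keep terms at or above the threshold, then cap the list at max_words."""
    kept = [pair for pair in scored_terms if pair[1] >= similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_words]


def distance_from_center(similarity):
    """Hypothetical layout rule: more similar terms sit closer to the center."""
    return 1.0 - similarity


cloud = select_cloud_terms(SCORED_TERMS, similarity_threshold=0.5, max_words=3)
layout = [(word, distance_from_center(similarity)) for word, similarity in cloud]
for word, radius in layout:
    print(f"{word}\t{radius:.2f}")
```

Raising the threshold removes weakly related terms, while lowering the maximum number of words trims the cloud without changing which terms count as close.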
Learn more
The Word Vector Interface is supported by a collection of further resources and readings:
- Our About page provides background information about the interface and the Women Writers Project more broadly. The page also offers a few resources for familiarizing yourself with word vectors and the toolkit before diving in.
- For more in-depth information about the methodologies at work under the hood of the toolkit, please see our Methodology page, which walks through the code we used to train the models, how we prepared the corpora, how we tested the models, and other details about the data and methods you will see at work in the interface.
- If you’re interested in recreating some of the analyses you see in the interface on your own computer, the Resources page offers a set of code walkthroughs which are meant to be used as tools for learning how to run your own word vector code. In addition to these walkthroughs, the page also includes a glossary of some of the terminology used in the interface as well as a few case studies that were created with the interface’s analytical tools.
- Finally, the Teaching page includes resources for incorporating the toolkit interface in the classroom in addition to sample assignments.
See the How to Navigate the Toolkit guide for more information about these resources. See the site map for an outline of the webpages within the WWVT.