Your goal is to examine the “experimental” models in the Sandbox or RStudio Server
and build a forensic case for the specific properties or limitations of their
preparation. Feel free to chat as a group and share notes and ideas. Try looking
at clusters, running a few queries (especially comparing results between these
models and the “WWO Full Corpus” model), and testing some of the operations, such
as addition, subtraction, and analogies. If you’re feeling adventurous, you can
even run some of our validation scenarios in RStudio Server.
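Addition, subtraction, and analogies all come down to vector arithmetic followed by ranking the vocabulary by cosine similarity to the result. The sketch below is a minimal Python illustration, not the code behind the Sandbox or RStudio Server tools. The five words and their three-dimensional vectors are invented; trained models have many more dimensions. Following a common convention in analogy tests, the ranking excludes the query words themselves.

```python
import math

# Hypothetical toy embeddings: invented 3-dimensional vectors for illustration
# only; they do not come from any WWP model.
vectors = {
    "king":   [0.8, 0.7, 0.1],
    "queen":  [0.8, 0.1, 0.7],
    "man":    [0.2, 0.9, 0.1],
    "woman":  [0.2, 0.2, 0.8],
    "throne": [0.9, 0.4, 0.4],
}

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def combine(positive, negative):
    """Add the vectors of words in `positive`; subtract those in `negative`."""
    dims = len(next(iter(vectors.values())))
    result = [0.0] * dims
    for word in positive:
        result = [r + v for r, v in zip(result, vectors[word])]
    for word in negative:
        result = [r - v for r, v in zip(result, vectors[word])]
    return result

def closest(target, exclude=(), n=3):
    """Rank vocabulary words (minus `exclude`) by cosine similarity to `target`."""
    scores = [(w, cosine(target, v)) for w, v in vectors.items() if w not in exclude]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]

# Analogy "man is to king as woman is to ?", computed as king - man + woman.
query = combine(positive=["king", "woman"], negative=["man"])
result = closest(query, exclude={"king", "man", "woman"})
print(result[0][0])  # prints "queen"
```

With these made-up vectors, king - man + woman lands closest to queen. With real models, running the same analogy in an experimental model and in the “WWO Full Corpus” model is one way to surface differences between them.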
You have two options for your exploration:
- Option one: Find traces of the choices that were made in corpus preparation.
Look at the “experimental corpus” models: what can you determine about how the
texts used to train them were selected and prepared? What specific clues can you
find by exploring the models? How are the choices made in text selection and
preparation affecting your results when you query the models?
- Option two: Find traces of the decisions made in training the models (i.e.,
the parameter settings) by looking at the “experimental preparation” models. How
does the number of dimensions seem to affect your results? What happens when a
model is trained on a very small corpus? What changes when the window size is
decreased?
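The window size controls which nearby words count as context for each word during training. This simplified Python sketch uses a fixed, symmetric window over one hypothetical tokenized sentence (word2vec itself also randomly shrinks the window for each word, among other details), and shows how decreasing the window narrows a word's contexts to its immediate neighbors.

```python
def context_pairs(tokens, window):
    """Return (target, context) pairs using a fixed symmetric window."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

# Hypothetical sentence, lowercased and split on whitespace for simplicity.
sentence = "she wrote her letters by candlelight every night".split()

for window in (1, 3):
    contexts = {ctx for target, ctx in context_pairs(sentence, window)
                if target == "letters"}
    print(window, sorted(contexts))
# 1 ['by', 'her']
# 3 ['by', 'candlelight', 'every', 'her', 'she', 'wrote']
```

Smaller windows are often observed to favor words that fill similar grammatical roles, while larger windows tend to group words by topic; this is one pattern worth checking for in the “experimental preparation” models.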
For either option, make a list of the evidence you can find about data preparation
or model training, and develop some notes about how these choices seem to be
impacting the models.
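One way to turn side-by-side queries into evidence for your notes is to measure how much two models' nearest-neighbor lists for the same query word overlap. This Python sketch computes the Jaccard overlap (shared words divided by all distinct words) and lists the words unique to each model. The function name and both neighbor lists are hypothetical, invented for illustration rather than taken from any WWP model.

```python
def compare_neighbors(neighbors_a, neighbors_b):
    """Compare two top-n neighbor lists for the same query word.

    Returns the Jaccard overlap (shared words / all distinct words)
    plus the sorted words that appear in only one of the two lists.
    """
    set_a, set_b = set(neighbors_a), set(neighbors_b)
    union = set_a | set_b
    jaccard = len(set_a & set_b) / len(union) if union else 1.0
    return jaccard, sorted(set_a - set_b), sorted(set_b - set_a)

# Invented neighbor lists for a query such as "grief"; in practice, copy these
# from the same query run against the full-corpus and an experimental model.
full_corpus = ["sorrow", "anguish", "despair", "woe", "misery"]
experimental = ["sorrow", "tears", "mourning", "woe", "misery"]

score, only_full, only_experimental = compare_neighbors(full_corpus, experimental)
print(f"Jaccard overlap: {score:.2f}")                 # Jaccard overlap: 0.43
print("Only in full corpus model:", only_full)         # ['anguish', 'despair']
print("Only in experimental model:", only_experimental)  # ['mourning', 'tears']
```

Low overlap across many query words, or neighbors that appear in only one model, can point to differences in text selection or training parameters that are worth recording.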
Word Vectors: Hands-on Practice and Group Work, Intensive
Pedagogy-focused, slide 9 of 11
© 2019 Syd Bauman, Julia Flanders, Sarah Connell, and the Women Writers Project.
This TEI-encoded XML file is available under the terms of the Creative Commons
Attribution-ShareAlike 3.0 (Unported) license.