Word Vectors for the Thoughtful Humanist
Northeastern University
May 16–20, 2022
Sarah Connell, Julia Flanders, Syd Bauman, Juniper Johnson, Ash Clark
Digital Scholarship Group, Northeastern University Libraries
Schedule
― Monday May 16 ―
Session 1 | 12:30–1:50 (Eastern) | Welcome, introduction and scoping (slides: HTML, TEI, notes) |
1:50–2:00 | break | |
Session 2 | 2:00–3:00 | Conceptual orientation and walkthrough of the basics of R (slides: HTML, TEI, notes) |
3:00–3:10 | break | |
Session 3 | 3:10–4:10 | A deeper look at core concepts and terms, part 1 (slides: HTML, TEI, notes) |
Session 4 | 4:10–4:50 | Pedagogical showcase (slides: HTML, TEI, notes) |
Wrap-up | 4:50–5:00 |
Homework: Hands-on experimentation with the Women Writers Vector Toolkit and/or our Sandbox
― Tuesday May 17 ―
Session 5 | 12:30–1:45 | A deeper look at core concepts and terms, part 2 (slides: HTML, TEI, notes) |
1:45–2:00 | break | |
Session 6 | 2:00–3:00 | Walkthrough of commented code for querying an existing model (slides: HTML, TEI, notes) |
Session 7 | 3:15–4:30 | Group hands-on practice (slides: HTML) |
Troubleshooting | 4:30–5:00 | Optional session |
Homework: Choose a term that you’re interested in, query that term in at least two different models, and make notes on your results in the Day 2 Homework document in the Group Activities folder in our shared Google Drive. Also, please read through the sample curricular materials and make notes on one or two learning goals for word embeddings in your classroom. On Wednesday, we’ll be training our own models, so aim to have a corpus of no more than ~4 million words to work with, either from your own project or subsetted from the test corpora.
― Wednesday May 18 ―
Pre-session | 11:00–12:00 | Downloading R and RStudio (optional but recommended!) |
Session 8 | 12:30–1:00 | Group discussion of sample assignments and learning goals (slide: HTML) |
Session 9 | 1:00–2:15 | Process, part 1: Corpus and data preparation (slides: HTML, TEI, notes) |
2:15–2:30 | break | |
Session 10 | 2:30–3:30 | Group walkthrough of model training in RStudio (slides: HTML, TEI, notes) |
3:30–3:45 | break | |
Session 11 | 3:45–5:00 | Hands-on practice: getting ready to train a model, loading your own data (slides: HTML, TEI, notes), exporting results (slide: HTML) |
Homework: Train a model on your own data, varying one parameter from the defaults
― Thursday May 19 ―
Pre-session | 11:00–12:00 | Downloading R and RStudio (optional but recommended!) |
Session 12 | 12:30–1:45 | Group annotation and discussion of pedagogical artifacts (slide: HTML) |
1:45–2:00 | break | |
Session 13 | 2:00–3:00 | Process, part 2: Parameters and validation (slides: HTML, TEI, notes) |
3:00–3:15 | break | |
Session 14 | 3:15–4:00 | Treasure hunt (slide: HTML) |
Session 15 | 4:00–5:00 | Group hands-on practice: setting up to train and validate another model (walkthrough slides: HTML, TEI, notes; hands-on slide: HTML) |
Homework: Train another model with different parameters, develop some word pairs, and run validation code
― Friday May 20 ―
Pre-session | 11:00–12:00 | Tools and tactics for full-text corpus exploration (optional but recommended!) |
Session 16 | 12:30–1:45 | Small group discussion of syllabi and course design (slide: HTML) |
1:45–2:00 | break | |
Session 17 | 2:00–3:15 | Walkthrough demo: Exploration and analysis (slides: HTML, TEI, notes) |
3:15–3:30 | break | |
Session 18 | 3:30–4:30 | Full-group discussion and wrap-up (slides: HTML, TEI) |
Resources
Model training walkthroughs, web-friendly versions
The resource page has links to WWP tutorials and slides, interesting web sites we may have shown, and useful TEI links