Research with WWP Data at the AI/Machine Learning Research Bootcamp

By Haripriya Mehta, Co-founder, MehtA+ 

In summer 2022, 15 high school students from around the world participated in MehtA+’s AI/Machine Learning Research Bootcamp, where they studied the theory and application of machine learning. The curriculum covered models including k-nearest neighbors (KNN), support vector machines, and artificial neural networks, as well as topics in computer vision and natural language processing.

The students put their newfound knowledge into practice through an exploratory week-long midterm project. The objective of the midterm was to predict the gender of a book's author using machine learning and, in the process, to understand which stylistic features may help determine whether the author of a book was male or female.
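To make the task concrete, here is a minimal sketch of one way such a classifier could be set up. It is not the students' method. It assumes function-word frequencies and mean word length as the stylistic features, and uses a hand-written k-nearest-neighbors vote (KNN being one of the models covered in the bootcamp). The word list, the function names, and the training sentences are all hypothetical placeholders: the sentences are invented for illustration, are not excerpts from the corpus, and carry arbitrary labels.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a few common English function words.
FUNCTION_WORDS = ["the", "and", "of", "to", "in", "that", "it", "is", "i", "my"]


def stylistic_features(text):
    """Return relative frequencies of FUNCTION_WORDS plus a scaled mean word length."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return [0.0] * (len(FUNCTION_WORDS) + 1)
    counts = Counter(words)
    total = len(words)
    features = [counts[w] / total for w in FUNCTION_WORDS]
    # Mean word length, divided by 10 so it sits on roughly the same scale
    # as the frequency features.
    features.append(sum(len(w) for w in words) / total / 10.0)
    return features


def knn_predict(train, query_text, k=3):
    """Majority vote among the k training texts closest to query_text."""
    query = stylistic_features(query_text)
    by_distance = sorted(
        train, key=lambda item: math.dist(stylistic_features(item[0]), query)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Invented placeholder sentences with arbitrary labels (NOT corpus excerpts).
train = [
    ("I observed that the air and the water are of one nature in my glass.", "female"),
    ("It is my opinion that matter is in motion, and that it is the cause of all.", "female"),
    ("The experiment shows the weight of the vessel to be less than the weight of the fluid.", "male"),
    ("Of the motion of the blood, and of the heart, the following may be noted.", "male"),
]

print(knn_predict(train, "It is to be noted that the heart is in motion.", k=3))
```

In a real experiment the training items would be long passages from the books rather than single sentences, and inspecting which features separate the two classes is what would speak to the "stylistic features" half of the objective.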

Students were provided a suggested corpus featuring 17th- and 18th-century science texts written by male and female authors.

The female-authored texts were graciously provided by the Women Writers Project (WWP) and included the following:

  • Cavendish, Margaret. Observations upon Experimental Philosophy, 1666 
  • Cavendish, Margaret. Natures Pictures, 1656 
  • Cavendish, Margaret. Philosophical Letters, 1664 
  • Conway, Anne. The Principles of the Most Ancient and Modern Philosophy, 1692 
  • Stone, Sarah. A Complete Practice of Midwifery, 1737 
  • Woolley, Hannah. The Cook’s Guide: or, Rare Receipts for Cookery, 1664 

The male-authored texts were obtained from Project Gutenberg and included the following:

  • Francis Bacon. The Advancement of Learning, 1605
  • Robert Boyle. The Sceptical Chymist, 1661
  • William Harvey. An Anatomical Disquisition on the Motion of Heart and Blood in Animals, 1628
  • Robert Hooke. Micrographia, 1665
  • René Descartes. A Discourse of a Method for the Well Guiding of Reason, 1649
  • John Maubray. The Female Physician, 1724

The challenges were plentiful. Some authors were overrepresented in the small corpus: Margaret Cavendish alone wrote three of the six female-authored texts. The distribution of topics also differed between the male-authored and female-authored books, owing to a paucity of freely available male-authored texts from the same period as the female-authored texts.

Despite these challenges, students persisted in reading prior research in the field, implementing various machine learning models, and analyzing the results they obtained.

At the end of the week, the students presented their work to Julia Flanders, WWP Director, and Sarah Connell, WWP Associate Director. 

Here are the students’ projects: 

