Current Status of My Project
(The following may later be integrated into my completed project.)
To complete the NULab Certificate, I have chosen to use word embedding models for a project on the Foreign Relations of the United States (FRUS) series. The project was initially based on a basic gender analysis of two regions, Europe and Latin America, during the early Cold War. Assuming that each corpus was a complete representation of all foreign relations documents concerning its region and timeframe, I began to investigate the corpora through basic term searches. When I directed my searches toward strongly gendered vectors, either male or female, the two corpora produced very different results. In the European corpus, words associated with the terms “woman+girl” resembled roles defined in relation to men, such as “daughter” and “wife”, but also included agricultural terms such as “mule” and “lamb”. In the corpus on Latin America from the same period, the same search terms returned associated words such as “jail”, “hanged”, “traitor”, and “killed”, among others. Both queries used models built with the same parameters (Threads: 3; Vectors: 100; Window: 10; Iterations: 10; Negative Sample: 5).
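The parameters above belong to the model-training step, while a “woman+girl” search is, in a typical word2vec workflow, a nearest-neighbor query over the trained vectors. The sketch below illustrates both under stated assumptions. The post does not name the software used, so the reported settings are mapped onto the keyword names of gensim 4.x's `Word2Vec` class purely for illustration (the dictionary is never passed to gensim here). The query is implemented as one common approach: summing the normalized vectors of the query terms and ranking every other word by cosine similarity. `TOY_EMBEDDINGS` contains invented three-dimensional vectors, not vectors from the FRUS corpora, and `closest_to` is a hypothetical helper.

```python
import math

# Assumption: the reported settings mapped onto gensim 4.x Word2Vec keyword names.
# Illustrative only; this dictionary is not used to train anything below.
REPORTED_PARAMS = {
    "workers": 3,        # Threads
    "vector_size": 100,  # Vectors
    "window": 10,        # Window
    "epochs": 10,        # Iterations
    "negative": 5,       # Negative Sample
}

# Hypothetical 3-dimensional vectors, invented for this example (not FRUS data).
TOY_EMBEDDINGS = {
    "woman": [0.9, 0.1, 0.0],
    "girl": [0.8, 0.2, 0.1],
    "daughter": [0.85, 0.15, 0.05],
    "wife": [0.7, 0.3, 0.0],
    "treaty": [0.0, 0.1, 0.95],
    "tariff": [0.1, 0.0, 0.9],
}


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(a, b):
    """Cosine similarity: dot product of the two unit-length vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def closest_to(embeddings, query_terms, n=10):
    """Hypothetical helper: rank words by similarity to the sum of the query vectors.

    The query terms themselves are excluded from the ranking.
    """
    dim = len(next(iter(embeddings.values())))
    target = [0.0] * dim
    for term in query_terms:
        # Add each query term's unit vector so every term gets equal weight.
        target = [t + x for t, x in zip(target, normalize(embeddings[term]))]
    scored = [
        (word, cosine(vec, target))
        for word, vec in embeddings.items()
        if word not in query_terms
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    # Equivalent of a "woman+girl" search on the toy vectors.
    for word, score in closest_to(TOY_EMBEDDINGS, ["woman", "girl"], n=3):
        print(f"{word}\t{score:.3f}")
```

With these invented vectors, the query ranks “daughter” and “wife” above “tariff” and “treaty”; the ranking follows only from the toy numbers and says nothing about the FRUS corpora. Summing normalized vectors, rather than raw ones, keeps a term with a longer raw vector from dominating the combined query.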
While it is very interesting to see the differing results that each corpus produced, which suggest that State Department officials held strongly different conceptions of the female gender in each region, it is important to reconsider how the corpus, and the project itself, were produced. On 14 September 2020, the Historical Advisory Committee (HAC) to the Department of State held a public forum on the production of the FRUS series. Dr. Elizabeth Charles of the Office of the Historian, who spoke on the process of producing a volume, explained that after about a year of research, the officials involved select roughly 3,000 to 4,000 documents to review. For the final edition, about 300 to 450 documents, totaling about 1,400 pages, are compiled based on the “important” events of the period. Dr. Charles estimated that for every document that makes it into the volume to show “what best conveys how these decision makers made these decisions”, twenty are left out. The Office of the Historian aims to create an accurate representation of the foreign policy for a given period and region, but doing so requires curating a sample of the correspondence and official policy rather than providing all of it. The selection is also shaped by the limitations and stylization of the declassification procedure. In addition, the “Editing Division” copyedits the volume, and to stay within the publication timeline and typical page limits (around 1,400 pages), the editors may also reduce the number of documents included. Overall, each volume goes through a precise editing process meant to keep it concise and to represent the topics chosen by the department.
Understanding that the corpus is the product of a rigorous process of selection and editing, I was reminded of Catherine D’Ignazio and Lauren Klein’s lessons on “data feminism”. In their work, they call for a feminist approach that recognizes the “context” in which data are produced. This recognition of context allows researchers “to better understand any functional limitations of the data and any associated ethical obligations, as well as how the power and privilege that contributed to their making may be obscuring the truth” [D’Ignazio and Klein, 152-153]. Through this lens, it becomes clear that the raw “data” used in analysis, whether textual or otherwise, is not bias-free. There are human selections at every level of this project: the State Department officials of the mid-twentieth century created documents with their own biases; the current producers of the FRUS series make selections based on what is deemed “important”; and the researcher (myself) has crafted an analysis with certain parameters and methodologies that produced the given results. This complexity will be discussed at greater length in the final project.
References
D’Ignazio, Catherine, and Lauren F. Klein. Data Feminism. Ideas Series. Cambridge, Massachusetts: The MIT Press, 2020. http://search.ebscohost.com.ezproxy.neu.edu/login.aspx?direct=true&AuthType=ip,shib&db=nlebk&AN=2378911&site=ehost-live&scope=site.