Developing a Discipline-Specific Corpus and High-Frequency Word List for Science and Engineering Students in Graduate School
Suwako Ueharaᵃ, Hibiya Harakiᵃ, and Stuart McLeanᵇ
ᵃThe University of Electro-Communications; ᵇMomoyama Gakuin University
doi: https://doi.org/10.7820/vli.v11.2.uehara
Abstract
Japanese graduate students in science and engineering need to read academic research in their second language (L2), a task that can be challenging. Research has shown a strong correlation (0.78) between vocabulary size and reading comprehension (McLean et al., 2020), and providing high-frequency word lists could enhance comprehension. In this work-in-progress, 1.35 million tokens of professor-recommended reading materials were used to investigate three questions: a method for creating a vocabulary list that would benefit science majors in graduate school; procedures for creating a corpus and a high-frequency word list efficiently; and the steps required to create a cleaner corpus. This paper outlines a systematic, literature-informed method that includes input from professors in the field. The combined use of tailored MATLAB script and AntConc (Anthony, 2022) generated the corpus and high-frequency word list efficiently. Repeatedly comparing the original PDFs with the matching text files, then adding MATLAB script to deal with specific issues, produced a cleaner text. The proposed method can be applied in other contexts to enhance the generation of high-frequency word lists.
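As a rough illustration of the kind of pipeline the abstract describes (clean the text extracted from PDFs, then count word frequencies), the following minimal Python sketch may help. It is not the authors' MATLAB script or AntConc workflow. The function names `clean_text` and `frequency_list`, the two cleaning rules, and the sample sentences are all assumptions made for this example.

```python
import re
from collections import Counter


def clean_text(raw: str) -> str:
    """Repair two common PDF-extraction artifacts (illustrative rules only)."""
    # Rejoin words hyphenated across a line break: "sig-\nnal" -> "signal".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse all remaining whitespace, including line breaks, to one space.
    return re.sub(r"\s+", " ", text).strip()


def frequency_list(texts, top_n=10):
    """Return the top_n most frequent lowercase word tokens across texts."""
    counts = Counter()
    for raw in texts:
        # Tokens are runs of letters, optionally with one internal apostrophe.
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", clean_text(raw).lower())
        counts.update(tokens)
    return counts.most_common(top_n)


# Hypothetical stand-ins for text extracted from two PDFs.
sample = [
    "The sig-\nnal is filtered. The signal is then sampled.",
    "A filtered signal has less noise.",
]
print(frequency_list(sample, top_n=3))
```

In this toy input, "signal" is the most frequent token only because the hyphenated "sig-\nnal" is rejoined before counting. The same rule would wrongly merge a genuine compound such as "high-\nfrequency", which is one reason a cleaning step like this needs to be checked against the original PDFs, as the abstract describes.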
Citation
Uehara, S., Haraki, H., & McLean, S. (2022). Developing a discipline-specific corpus and high-frequency word list for science and engineering students in graduate school. Vocabulary Learning and Instruction, 11(2), 57–68. https://doi.org/10.7820/vli.v11.2.uehara