VLI 3(1): Rogers et al. (2014)

A Methodology for Identification of the Formulaic Language Most Representative of High-frequency Collocations
James Rogers (a), Chris Brizzard (a), Frank Daulton (b), Cosmin Florescu (c), Ian MacLean (a), Kayo Mimura (a), John O’Donoghue (d), Masaya Okamoto (e), Gordon Reid (a) and Yoshiaki Shimada (f)
(a) Kansai Gaidai University; (b) Ryukoku University; (c) University of New England; (d) Osaka Board of Education; (e) University of Manchester; (f) State University of New York at Albany
doi: http://dx.doi.org/10.7820/vli.v03.1.rogers.et.al

Researchers have stated that learning formulaic language is key to
achieving fluency. It has also been stated that studying vocabulary in
this way is more efficient than isolated vocabulary learning. However,
there is a lack of research regarding which formulaic language should
be taught. There is a further lack of research about how such formulaic
language can be identified. This study aimed to evaluate a methodology
for identifying the most common formulaic language. It compared multi-word
unit identification results from both 500 and 1,000 example
sentences and quantified how often native speakers opt to extend multi-word
units beyond their core pivot and collocate. This study also
identified and quantified colligational issues affecting multi-word unit
identification. The results showed no difference in multi-word unit
identification between 500 and 1,000 example sentences, that native
speakers opted to extend multi-word units more than half of the time, and
that colligational issues affected only approximately 3% of the items
examined. This study concluded that 500 example sentences are just as
reliable as 1,000 when identifying multi-word units. It also found that
extending multi-word units beyond their core pivot and collocate is an
essential step researchers should take. This study also found that a
colligational treatment is necessary if the aim is to achieve the most
accurate data; however, the percentage of items affected was
small and the methodology was time-consuming. This finding indicates that
there is a need for improved software to better automate the steps taken.

Rogers, J., Brizzard, C., Daulton, F., Florescu, C., MacLean, I., Mimura, K., O’Donoghue, J., Okamoto, M., Reid, G., & Shimada, Y. (2014). A methodology for identification of the formulaic language most representative of high-frequency collocations. Vocabulary Learning and Instruction, 3(1), 51-65. doi: 10.7820/vli.v03.1.rogers.et.al