VLI 3(2): Kamimoto (2014)

Local Item Dependence on the Vocabulary Levels Test Revisited
Tadamitsu Kamimoto
Kumamoto Gakuen University
doi: http://dx.doi.org/10.7820/vli.v03.2.kamimoto
The purpose of this study was to address the question of local item
dependence (LID) on the Vocabulary Levels Test (VLT). The test
adopts a matching format in which each cluster consists of six words and three definitions.
A review of the literature suggested that items presented in such a format
cannot be considered independent. However, Schmitt, Schmitt, and
Clapham reported that according to Rasch analysis, most of the items
performed independently. The present study examined effects of LID
from a non-Rasch approach. A set of three clusters was collapsed into one
large cluster of 18 words and 9 definitions on the assumption that such a
treatment would make items practically independent. Both the collapsed
VLT and the intact VLT were given to 114 Japanese English as a Foreign
Language (EFL) students at an interval of one week. Results showed that
scores were 15% higher on the original VLT than on the collapsed form.
Furthermore, an LID index based on the correct/wrong response types
between the two tests indicated that scores on the original VLT were
about 19% inflated. Implications were drawn and discussed.
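
As a rough illustration of why cluster size matters, the sketch below computes the score expected from blind guessing alone in the two formats. It assumes, purely for illustration and not as part of the study, that a test taker assigns a distinct randomly chosen word to each definition. The function names are hypothetical, and blind guessing is only one possible source of LID.

```python
import random
from fractions import Fraction

def expected_chance_score(n_words, n_definitions):
    """Expected number of correct matches under blind guessing.

    Assumption (illustrative only): each definition receives a distinct
    word drawn uniformly at random, so each definition is matched
    correctly with probability 1 / n_words.
    """
    return Fraction(n_definitions, n_words)

def simulate_chance_score(n_words, n_definitions, trials=20000, seed=0):
    """Monte Carlo check of the expectation above."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Word d is taken to be the key for definition d.
        guesses = rng.sample(range(n_words), n_definitions)
        total += sum(1 for d, w in enumerate(guesses) if d == w)
    return total / trials

# Nine definitions in both formats.
original = 3 * expected_chance_score(6, 3)   # three intact 6-word clusters
collapsed = expected_chance_score(18, 9)     # one collapsed 18-word cluster
print(original, collapsed)
```

Under this assumption, blind guessing yields more expected correct answers out of nine on the intact format than on the collapsed one, which is consistent in direction with the higher scores reported for the original VLT.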

VLI 3(2): McLean, Hogg, & Kramer (2014)

Estimations of Japanese University Learners’ English Vocabulary Sizes Using the Vocabulary Size Test
Stuart McLean(a), Nicholas Hogg(b) and Brandon Kramer(c)
(a) Temple University, Japan; (b) Osaka Yuhigaoka Gakuen High School; (c) Momoyama Gakuin University
doi: http://dx.doi.org/10.7820/vli.v03.2.mclean.et.al
Measuring students’ lexica is time-consuming, as one sitting of the
Vocabulary Size Test (VST) usually takes 40-60 minutes. As a result,
teachers would benefit from being able to make reasonable estimates from
commonly available information. This paper aims to investigate: (1) What
are the mean vocabulary sizes of students at Japanese universities as a
whole, and by university department (hensachi)? and (2) Are a university’s
department standardized rank scores (hensachi) a useful proxy for English
vocabulary size? This study used a cross-sectional design where 3,449
Japanese university students were tested using Nation and Beglar’s VST.
The results showed an average score of 3,715.20 word families and that
VST scores were significantly higher for students in higher department
hensachi programs. This current department hensachi was also found to
have a stronger correlation with VST scores than with other covariates
when the entire sample was considered. Lastly, there appears to be a
lack of consistent knowledge of the most frequent words of English,
suggesting that curriculum designers at Japanese universities should focus
on teaching high-frequency English words. Although the findings support
the use of the VST for comparing receptive written vocabulary knowledge
between learners, they perhaps do not support its use in establishing a
vocabulary size to decide lexically appropriate materials.

VLI 3(2): Laufer (2014)

Vocabulary in a Second Language: Selection, Acquisition, and Testing: A Commentary on Four Studies for JALT Vocabulary SIG
Batia Laufer
University of Haifa
doi: http://dx.doi.org/10.7820/vli.v03.2.laufer
Four papers by Charles Browne, Rachael Ruegg & Cherie Brown, Makoto
Yoshii and Junko Yamashita were presented in the morning session of the
Third Annual JALT Vocabulary SIG Vocabulary Symposium in Fukuoka,
Japan on June 14, 2014. As discussant, it is my pleasure to comment upon
each manuscript. These lexical researchers originate from all over Japan:
Tokyo, Akita, Kumamoto and Nagoya. Their lexical topics are related to
three themes that are central to vocabulary research: selection, acquisition
and testing. The papers are concerned with the types of words that should
be selected for teaching, with the optimal conditions for vocabulary
acquisition and with instruments that measure lexical proficiency, or are
used in lexical research. After commenting on each paper in turn, I shall
present a few suggestions for their future research.

VLI 3(2): Yamashita (2014)

Effects of Instruction on Yes-No Responses to L2 Collocations
Junko Yamashita
Nagoya University
doi: http://dx.doi.org/10.7820/vli.v03.2.yamashita
The lexical decision task (LDT), in which a participant makes dichotomous
judgments on target letter strings, is an established method in
psycholinguistic research to investigate the mental lexicon. With the
expansion of research interests from single lexeme to collocations, second
language (L2) researchers have started to use a similar judgment task at a
phrasal level (referred to as a phrasal decision task or PDT in this paper).
However, unlike the LDT, the PDT has not yet established a standard
form of prompt, and variation has been observed in previous L2 studies.
Hence, the purpose of this study was to examine effects of varying
instructions on PDT performance. Three instructions (acceptable/
commonly used/natural) were tested with Japanese university students
and native speakers of English, who were asked to make judgments on
English word combinations. Examining responses to congruent (felicitous
both in Japanese and English), incongruent (felicitous only in Japanese
or English), and baseline items, the study identified some effects of
instruction differences. However, these effects were not so strong as to
obscure the expected cross-linguistic congruency effect. Therefore, this
result has led to the conclusion that researchers have more freedom of
instruction selection in the PDT, at least among the three examined in this
study and to the extent that the congruency effect was measured by
accuracy scores.

VLI 3(2): Yoshii (2014)

Effects of Glosses and Reviewing of Glossed Words on L2 Vocabulary Learning through Reading
Makoto Yoshii
Prefectural University of Kumamoto
doi: http://dx.doi.org/10.7820/vli.v03.2.yoshii
This study is an attempt to integrate incidental and intentional
vocabulary learning in a reading activity without sacrificing the enjoyment
of reading. The paper reports on a study which examined the
effectiveness of a reading program on the web. The program contained
glosses in a text and a reviewing component at the end of reading. The
learners read the text for comprehension purposes on computers and were
able to look up certain words by clicking on them. At the end of the
reading the learners were also able to review the words they had looked
up during the reading. This study examines how well learners can pick up
words through this reading program. This study also examines the
effectiveness of a reviewing activity by comparing the words reviewed and
the words not reviewed. The study investigates if there are any differences
in immediate and medium-term effects for vocabulary learning. Data
from a pretest one week prior to the experiment, an immediate test right
after the reading, and a delayed test were used for the analysis. Lookup
behaviors of glosses and reviewing behaviors were also taken into account
for analyzing the data.

VLI 3(2): Ruegg & Brown (2014)

Analyzing the Effectiveness of Textbooks for Vocabulary Retention
Rachael Ruegg and Cherie Brown
Akita International University
doi: http://dx.doi.org/10.7820/vli.v03.2.ruegg.brown
Although many language educators are aware of factors necessary for
vocabulary acquisition and retention, in many institutions around the
world instructors are required to use textbooks as a basis for instruction.
To date, however, little has been done to analyze the effectiveness of textbooks
in fostering vocabulary acquisition and retention. Therefore, the purpose of
the present study was to analyze the vocabulary content of a range of
textbooks. The number of target words, frequency level of those words
and length of reading texts were analyzed in reading texts and their
associated activities of 20 English as a Second Language (ESL)/English as
a Foreign Language (EFL) textbooks. It was found that texts had as
few as zero target words and 10 on average, while the average text length
was 639 words. In terms of the frequency level of the target vocabulary,
although the textbooks claimed to be at an intermediate level or above,
the frequency level of the target vocabulary was very inconsistent, with as
many as 60% of target words coming from the first 1000 words or as
many as 100% of target words being low frequency ‘offlist’ words.
Furthermore, it was found that integrated skills textbooks had significantly
fewer target words and significantly shorter reading texts than
reading textbooks, while they also had a significantly higher percentage of
words from the 1000 word list and a significantly lower percentage from
the Academic Word List (AWL).

VLI 3(2): Browne (2014)

A New General Service List: The Better Mousetrap We’ve Been Looking for?
Charles Browne
Meiji Gakuin University
doi: http://dx.doi.org/10.7820/vli.v03.2.browne
This brief paper introduces the New General Service List (NGSL), a
major update of Michael West’s 1953 General Service List (GSL) of core
vocabulary for second language learners. After describing the rationale
behind the NGSL, the specific steps taken to create it and a discussion of
the latest 1.01 version of the list, the paper moves on to comparing the
text coverage offered by the NGSL against both the GSL as well as
another recent GSL published by Brezina and Gablasova (referred to as
Other New General Service List [ONGSL] in this paper). Results indicate
that while the original GSL offers slightly better coverage for texts of
classic literature (about 0.8% better than the NGSL and 4.5% more than
the ONGSL), the NGSL offers 5-6% more coverage than either list for
more modern corpora such as Scientific American or The Economist.

VLI 2(1): In’nami (2013)

Second-Language Vocabulary Assessment Research: Issues and Challenges
Yo In’nami
Shibaura Institute of Technology
doi: http://dx.doi.org/10.7820/vli.v02.1.innami
The four papers on second-language vocabulary assessment reviewed
below are exemplary works that merit close scrutiny. Therefore, this paper
provides a brief summary of each study, followed by comments and
suggestions, particularly in regard to the experimental designs and
analyses used in the studies.

VLI 2(1): Tseng (2013)

Validating a Pictorial Vocabulary Size Test via the 3PL-IRT Model
Wen-Ta Tseng
National Taiwan Normal University
doi: http://dx.doi.org/10.7820/vli.v02.1.tseng
The paper presented a newly conceived vocabulary size test based on
pictorial cues: Pictorial Vocabulary Size Test (PVST). A model-based
(1-2-3 parameter logistic item response theory model comparisons)
approach was taken to check which model could absorb the most
information from the data. Junior high school and primary school
students participated in the study (N = 1,354). Subjects’ ability estimates
and item parameter estimates were computed using the expected a
posteriori (EAP) method, one type of Bayesian method. BILOG-MG 3
was used to carry out parameter estimation and model comparisons. The
results showed that the 3PL-IRT model best fit the empirical data. It was
then argued that test takers’ English vocabulary size could be best
captured under the 3PL-IRT model, as not only the discrimination
parameter, but also the guessing parameter has a fundamental role to
play in consideration of the test format adopted in the PVST. The article
concluded that the PVST could have positive washback effects on test
development and English vocabulary instruction.
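
For readers unfamiliar with the models being compared, the following is a minimal sketch of the standard three-parameter logistic item response function, with the 1PL and 2PL models as special cases. It is not the BILOG-MG implementation. The function name is hypothetical, the parameter values are arbitrary examples, and the scaling constant D = 1.7 that some formulations include is omitted here.

```python
import math

def irt_probability(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: test taker ability
    b:     item difficulty
    a:     item discrimination (a=1 and c=0 gives the 1PL form)
    c:     guessing parameter, the lower asymptote (c=0 gives the 2PL form)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Arbitrary example values, not estimates from the study.
print(irt_probability(theta=0.0, b=0.0))                 # 1PL
print(irt_probability(theta=0.0, b=0.0, a=1.5))          # 2PL
print(irt_probability(theta=0.0, b=0.0, a=1.5, c=0.25))  # 3PL
```

With a nonzero guessing parameter, the probability of success for very low-ability test takers approaches c rather than zero, which is why the parameter is relevant to selected-response formats.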

VLI 2(1): Coulson et al. (2013)

Difficulties in Reading English Words: How do Japanese Learners Perform on a Test of Phonological Deficit?
David Coulson, Mayumi Ariiso, Rina Kojima, and Masami Tanaka
University of Niigata Prefecture
doi: http://dx.doi.org/10.7820/vli.v02.1.coulson.et.al
The motivation for this research is the observation of frequent read-aloud
miscues among Japanese university students, and the slow rate of reading
on simplified graded readers by many post-secondary learners. We
investigate what components of the second-language reading complex
may remain undeveloped. Word recognition in different languages
employs different phonological processes; so inadequately developed skill
in the foreign language processes may lead to poor decoding. This
situation requires formal assessment. However, practical tests of
word-recognition skill for second-language learners are not well developed.
Therefore, we adapted a test from Wydell and Kondo, replicating their
methodology to test the phonological skill of a Japanese-English
bilingual diagnosed with dyslexia. We do not assume dyslexia among
Japanese English learners. Rather, the use of this test format aims to
elucidate the state of phonological skill of word-recognition ability in
ordinary learners. The subjects were university students at discrete
proficiency levels. The results show that this skill can be remarkably
underdeveloped. The average skill of subjects with lower proficiency was
similar to the objective standard of Wydell and Kondo’s English-reading
disabled subject. Higher-proficiency subjects performed much better. The
results do not imply dyslexia, although some lower-proficiency students
may, in fact, be English-dyslexic. Instead, they focus attention on the lack
of appropriate reading skills development in English education in Japan,
and its possible effect on overall proficiency. This situation principally
indicates a need for prolonged phonics training and more extensive L2

VLI 2(1): Stoeckel & Bennett (2013)

Sources of Differential Item Functioning between Korean and Japanese Examinees on a Second-Language Vocabulary Test
Tim Stoeckel and Phil Bennett
Miyazaki International College
doi: http://dx.doi.org/10.7820/vli.v02.1.stoeckel.bennett
The use of item response theory in equating or creating computer-adaptive
tests relies on the assumption of invariance of item parameters
across populations. This assumption can be assessed with an analysis of
differential item functioning (DIF). The purpose of this study was (a) to
ascertain whether DIF between two native language groups was present
on a 90-item multiple-choice English vocabulary test and (b) to explore
the causes of DIF, should it exist. Participants were 184 Korean and 146
Japanese undergraduate students learning English as a foreign language
in their home countries. A separate calibration t-test approach was used
to identify DIF, with the criteria set at p < 0.01 and effect size > 1 logit,
calculated as the difference in Rasch item-difficulty between the two
groups. Twenty-one items displayed DIF. The causes of DIF in nine of
those items were tentatively identified as relating to their status as
loanwords in the L1. When a tested word was a loanword in both Korean
and Japanese, differences in both the frequency and range of use of the
loanword in the two languages predicted the direction of DIF. Similarly,
phonological/orthographic overlap between two separate English loanwords
in the L1 was found to be a possible cause of DIF. Implications for
test development and further research in this area are discussed.
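
The flagging rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function name and input values are hypothetical, the effect-size criterion is read as an absolute difference, and the p-value uses a two-sided normal approximation to the t distribution, which is reasonable only for large groups.

```python
import math

def dif_flag(d_group1, se_group1, d_group2, se_group2,
             alpha=0.01, min_contrast=1.0):
    """Flag an item for DIF from two separately calibrated Rasch difficulties.

    d_group*:  item difficulty in logits, estimated within each group
    se_group*: standard error of each difficulty estimate
    Returns (contrast, t, p, flagged).
    """
    contrast = d_group1 - d_group2
    t = contrast / math.sqrt(se_group1 ** 2 + se_group2 ** 2)
    # Assumption: two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(t) / math.sqrt(2))
    flagged = p < alpha and abs(contrast) > min_contrast
    return contrast, t, p, flagged

# Hypothetical difficulties and standard errors for two items.
print(dif_flag(1.2, 0.2, -0.1, 0.2))  # large contrast, small errors
print(dif_flag(0.5, 0.2, 0.0, 0.2))   # contrast below 1 logit
```

Requiring both criteria keeps an item from being flagged on statistical significance alone when the difference in difficulty is small, or on a large difference alone when the estimates are imprecise.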

