
Language Magazine is a monthly print and online publication that provides cutting-edge information for language learners, educators, and professionals around the world.


Language Adapts, Students Adapt—Why Not Language Tests?

Victor Santos extols the benefits of computer-adaptive tests

Imagine you are trying to build strength. You go to the gym every day, lift weights, and then come home to rest and recover. The next day, you are back at the gym for another round of weightlifting. For the past two weeks, you have only been able to do ten reps with 60 pounds at the bench press, but your goal is to lift 100 pounds. What is the best way of getting to your goal of lifting 100 pounds?

Option A: Keep trying to lift 100 pounds until one day you are hopefully able to do so;
Option B: Try lifting 61–62 pounds next time you go to the gym and then increase the weight each week by two pounds;
Option C: Try lifting 80–81 pounds next time you go to the gym and then increase the weight each week by five pounds;
Option D: Keep lifting 60 pounds every time and hope you can lift 100 pounds someday.

Most of us would agree that the best way to achieve a goal—whether building strength, as in the example above, or improving language proficiency—is by pushing our limits little by little (option B above). If we do not push our limits at all (option D), no growth or development will take place. If we push too hard (options A and C), we will fail repeatedly, grow frustrated, and lose motivation. In the same manner, a juggler doesn’t learn to juggle five balls before first being able to juggle three and then four. Mistakes will happen, but with practice, patience, and the support of the foundational skills learned while juggling three balls, that juggler will eventually manage four, getting them closer to their goal of five.

Making Progress in Learning a Language

A notion that has had a significant impact on the fields of psychology, sociology, and applied linguistics is the zone of proximal development (ZPD), first presented by Vygotsky in 1978 as a component of his theory of learning and development. As originally proposed, the ZPD denotes “the distance between the actual development level (of the learner) as determined by independent problem solving and the level of potential development as determined through problem solving under adult guidance, or in collaboration with more capable peers” (Vygotsky, 1978, p. 86). Since its proposal, the core idea of the ZPD has been applied to many different areas. One area in which practitioners still need to understand and absorb the concept more fully is world language instruction, and especially assessment.

All of us have likely met language teachers who constantly give their students tasks that are either too easy or too hard. In other words, they do not really know how to adapt their lessons to the varied proficiency levels of their students or how to select exercises and tasks that push the boundaries of their knowledge just a little further. They struggle to select activities within each student’s zone of proximal development: tasks slightly above each student’s current ability level (e.g., an intermediate-high task for an intermediate-mid student).

However, the idea that progress takes place when students are exposed to tasks slightly more difficult than those they can easily complete has been extensively researched and promoted in the applied linguistics literature. We see it, for example, in the Lexile framework, which recommends that the Lexile measure of a reading text sit slightly above a student’s Lexile reading measure so that the student learns new words and structures in a scaffolded context. We see it in Krashen’s input hypothesis (1985), which holds that acquisition takes place when input is slightly beyond a language learner’s current level of competence (a.k.a. i+1).

Finally, we see it in the research into reading comprehension, which suggests that readers are still able to comprehend a text when about 2% to 5% of the words are unknown (Laufer and Ravenhorst-Kalovski, 2010; Nation, 2006; van Zeeland and Schmitt, 2012) and that this should be a target to aim for. The 95% to 98% of words that are known act as the scaffolding necessary for readers to infer the meaning of and gradually learn the unknown 2% to 5% of words. This is another example of i+1 in practice in our field.
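
As a rough illustration of how lexical coverage is computed, here is a short Python sketch. The known-word set and the sample sentence are invented for illustration; real coverage studies count word families against much larger frequency lists.

```python
def lexical_coverage(text, known_words):
    """Fraction of running words (tokens) in the text that the reader knows."""
    tokens = [w.strip(".,!?;:").lower() for w in text.split()]
    tokens = [t for t in tokens if t]
    known = sum(1 for t in tokens if t in known_words)
    return known / len(tokens)

# Toy example: 10 of the 11 running words are known, so coverage is about 91%,
# within the 95%-98% target only if "outside" were also known.
known = {"the", "cat", "sat", "on", "mat", "a", "and", "dog", "ran"}
text = "The cat sat on the mat and the dog ran outside"
print(f"{lexical_coverage(text, known):.0%}")  # → 91%
```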

Part of being a good language teacher is knowing how to adapt the content of our instruction to push students just the right amount so that they can improve their language proficiency by being continually exposed to language materials and language use scenarios that are just slightly above their current levels of ability. When we adapt, the students adapt, and so do their language skills.

Adaptive Language Tests?

Let’s imagine we have 100 language learners at various levels of reading proficiency in French, ranging from novice-low to advanced-high, but we do not know what each learner’s level is. Our task is to create an assessment that measures their reading proficiency in French. There are many ways to develop an assessment for that purpose, including

  • Option A: Create a paper-and-pencil or computerized test with several reading questions at each level (e.g., ACTFL levels from novice-low to advanced-high). Then, have all test takers, regardless of their actual proficiency, take the entire test, which is the same for everyone. At the end, see how many questions they got correct and assign a proficiency level to each test taker.
  • Option B: Create a computerized test that has several reading questions at each level from novice-low to advanced-high. Then, have each test taker take a few intermediate-level questions at first (the expected average ability of the group). For those test takers who fail to correctly answer most of the initial intermediate-level questions, show them novice-level questions and let them attempt to answer those. Accordingly, for those who do succeed in correctly answering most of the intermediate-level questions, show them advanced-level questions instead.
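
The branching logic of option B can be sketched in a few lines of Python. The 50% pass threshold and the level labels here are illustrative assumptions; an operational test would base routing on psychometric estimates rather than a raw cutoff.

```python
def route(intermediate_answers, threshold=0.5):
    """Decide the second-stage level from a test taker's first-stage
    intermediate-level items.

    intermediate_answers: list of booleans (True = correct answer).
    Returns the level of the next block of questions.
    """
    share_correct = sum(intermediate_answers) / len(intermediate_answers)
    return "advanced" if share_correct > threshold else "novice"

print(route([True, True, False, True]))    # mostly correct → advanced block
print(route([False, False, True, False]))  # mostly incorrect → novice block
```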

I hope most readers will agree that assessment method B will tend to be shorter and will lead to less frustration or boredom than assessment method A. After all, if test takers can successfully answer intermediate-level questions, why ask them to answer a series of novice-level questions? Not only would that make the test longer and more tedious for these test takers, but it could in fact decrease the precision of the assessment instrument. If I have seen a juggler successfully juggle six balls on a few occasions, do I need to see them juggle four to be highly confident they can? The answer is no. We know their juggling ability is at least six balls, and we can safely say so because juggling ability, like reading proficiency in a language, is a unitary construct that develops through well-defined stages.

A well-known fact of Rasch measurement, a measurement framework commonly employed in developing language assessments (Ockey, 2021), is that the measurement precision of a question (i.e., the amount of statistical information provided by a question given a specific examinee) increases the closer the level of the question is to the level of the test taker. This is, for example, the psychometric framework employed in developing computer-adaptive tests (CATs), which are tests capable of adapting in real time to the estimated proficiency level of each individual test taker.
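
Under the Rasch model this relationship can be made concrete: an item's Fisher information is p(1 − p), where p is the modeled probability of a correct response, and it peaks when the item's difficulty matches the test taker's ability. A minimal sketch (ability and difficulty values are illustrative, in logits):

```python
import math

def rasch_info(theta, b):
    """Fisher information of a Rasch item with difficulty b for ability theta.

    I(theta) = p * (1 - p), where p is the Rasch probability of a correct
    response. Information is largest when theta == b (p = 0.5).
    """
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

# An item matched to the test taker's ability is maximally informative:
print(rasch_info(0.0, 0.0))  # 0.25, the maximum possible
print(rasch_info(0.0, 2.0))  # ≈ 0.105: a poorly targeted item tells us less
```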

By not having all test takers take the same exact, linear, fixed-form test, a CAT has the potential to increase the precision and efficiency of the measurement while also providing for a much more pleasant test-taking experience for each test taker. As Schultz, Whitney, and Zickar (2014) note, CATs “can be both more effective and more efficient” when compared to fixed-form tests. This is because tests in which items are better targeted to the level of each test taker can afford to be shorter than their fixed-form counterparts without compromising the precision of the measurement (and, in most cases, improving it).

CATs can come in a variety of flavors, with the two most employed in language testing being adaptivity at the item level and adaptivity at the stage level. In the former, all test takers take the same initial item. After they respond, the computer chooses for each test taker a subsequent item close to their currently estimated proficiency, and the process repeats until the system is confident of the test taker’s proficiency level. In the latter, called a multistage computer-adaptive test (MsCAT), the test adapts at the stage level instead: all test takers take the same initial group of items, and their performance on that first stage determines whether the next stage they receive contains easier or harder items, and so on until the end of the test.
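
A toy version of the item-level loop might look as follows. The fixed-step ability update and fixed test length are simplifying assumptions standing in for the maximum-likelihood estimation and precision-based stopping rules a real CAT would use, and the item bank difficulties are invented.

```python
def next_item(theta, bank, used):
    """Pick the unused item whose difficulty is closest to the current ability
    estimate -- for Rasch items, that is also the maximum-information item."""
    return min((i for i in range(len(bank)) if i not in used),
               key=lambda i: abs(bank[i] - theta))

def run_cat(bank, answer, n_items=5, step=0.6):
    """Toy item-level CAT loop: nudge the ability estimate up after each
    correct answer and down after each incorrect one."""
    theta, used = 0.0, set()
    for _ in range(n_items):
        i = next_item(theta, bank, used)
        used.add(i)
        theta += step if answer(i) else -step
    return theta

# Deterministic stand-in respondent: answers correctly whenever the item's
# difficulty is at or below their "true" ability of 1.2 logits.
bank = [-2.0, -1.0, 0.0, 1.0, 2.0, 0.5, 1.5, -0.5]
estimate = run_cat(bank, lambda i: bank[i] <= 1.2)
```

Because the items administered quickly bracket the respondent's true ability, the estimate oscillates around it; a real CAT shrinks the step as evidence accumulates.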

Whether a language testing developer decides to implement an item-adaptive or stage-adaptive test is a matter of available human and computational resources, knowledge of the subject, and personal preference.

However, one thing is certain: in this day and age, there are few good reasons why a test developer may want to deliver a linear, fixed-form computer test of language proficiency that emulates a decades-old pen-and-paper test that does not adapt in real time to the level of the test takers.

Now, back to our strength-building analogy. Let’s say you are now a personal trainer (teacher) helping a group of 20 people (students) of varied abilities build their strength (improve their language proficiency). If you do not know where they currently stand, how would you go about assessing their current abilities (their current proficiency levels) in an efficient and effective manner in order to put together a comprehensive and targeted program for each one of them, making sure not to hurt them during the process?


References

Krashen, S. (1985). The Input Hypothesis: Issues and Implications. London: Longman.
Laufer, B., and Ravenhorst-Kalovski, G. C. (2010). “Lexical Threshold Revisited: Lexical text coverage, learners’ vocabulary size and reading comprehension.” Reading in a Foreign Language, 22, 15–30.
Lund, R. J. (1991). “A Comparison of Second Language Listening and Reading Comprehension.” Modern Language Journal, 75, 196–204.
Nation, I. (2006). “How Large a Vocabulary Is Needed for Reading and Listening?” Canadian Modern Language Review, 63(1), 59–82.
Ockey, G. J. (2021). “Item Response Theory and Many-Facet Rasch Measurement.” In The Routledge Handbook of Language Testing (pp. 462–476). Routledge.
Schultz, K. S., Whitney, D. J., and Zickar, M. J. (2014). Measurement Theory in Action (2nd ed.). Hove, Sussex: Taylor & Francis.
Van Zeeland, H., and Schmitt, N. (2012). “Lexical Coverage in L1 and L2 Listening Comprehension: The same or different from reading comprehension?” Applied Linguistics. doi:10.1093/applin/ams074.
Vygotsky, L. S. (1978). Mind in Society: The Development of Higher Psychological Processes. Cambridge, MA: Harvard University Press.

Victor D. O. Santos, PhD, is director of assessment and research for Avant Assessment. His PhD dissertation (Iowa State University, 2017) was on the topic of assessing students’ academic vocabulary breadth by means of computer-adaptive assessment. Dr. Santos has served as a reviewer for several academic journals in the areas of language learning and assessment.
