Automated Vocabulary Profiling of TOEIC Listening Materials: A CEFR-Aligned Approach for EFL Learners
This study examines the vocabulary characteristics of TOEIC Listening materials to support the development of more targeted English language teaching resources for EFL learners, particularly in Thai higher education. Using a corpus-based approach, we compiled a representative dataset of TOEIC preparation texts and analyzed it with a custom-built Python tool for vocabulary profiling. The tool performs frequency analysis, concordance generation, n-gram extraction, collocation detection, and CEFR-level classification. Vocabulary items were categorized against two established word lists, the General Service List (GSL) and the Academic Word List (AWL), and by CEFR level. The results show that basic (K1) words and function words dominate the materials, alongside a substantial proportion of off-list and domain-specific vocabulary. Most words fall within the B1 proficiency level, suggesting that the materials are accessible to intermediate learners. The study contributes a novel, automated vocabulary profiling framework that integrates linguistic metrics with CEFR-based classification, with practical implications for curriculum design, test preparation, and vocabulary instruction. By making material evaluation more precise and efficient, the approach helps bridge the gap between test content and learner needs. The findings highlight the potential of automated tools to improve vocabulary-focused teaching strategies and to inform language assessment practices in EFL contexts.
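As a rough illustration of the kind of profiling described in the abstract, the following is a minimal Python sketch, not the study's actual tool. It covers frequency counting, n-gram extraction, a simple keyword-in-context concordance, list-based coverage, and CEFR-level tallying; collocation detection is left out. The names `TOY_LISTS`, `TOY_CEFR`, `tokenize`, `ngrams`, `concordance`, `coverage`, and `cefr_distribution` are hypothetical, and the tiny word lists and level labels are invented placeholders rather than entries taken from the real GSL, AWL, or any CEFR-graded list.

```python
import re
from collections import Counter

# Hypothetical placeholder lists. A real profile would load the full GSL, AWL,
# and a CEFR-graded word list from files; these entries are illustrative only.
TOY_LISTS = {
    "K1": {"the", "a", "to", "is", "at", "please", "will", "be", "in", "your"},
    "AWL": {"schedule", "confirm"},
}
TOY_CEFR = {"meeting": "A2", "schedule": "B1", "confirm": "B1", "invoice": "B2"}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens (internal apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n):
    """Count contiguous n-token sequences (sentence boundaries are ignored)."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines with up to `width` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def coverage(tokens, lists):
    """Return the share of running tokens in each list; the rest is 'off-list'."""
    counts = Counter()
    for tok in tokens:
        # The first list containing the token wins, so list order matters.
        band = next((name for name, words in lists.items() if tok in words), "off-list")
        counts[band] += 1
    total = len(tokens)
    return {band: count / total for band, count in counts.items()}


def cefr_distribution(tokens, cefr):
    """Count distinct word types per CEFR level; unknown types are 'unlisted'."""
    return Counter(cefr.get(word, "unlisted") for word in set(tokens))


if __name__ == "__main__":
    sample = "Please confirm the meeting schedule. The meeting will be at noon."
    tokens = tokenize(sample)
    print(Counter(tokens).most_common(3))      # word frequencies
    print(ngrams(tokens, 2).most_common(3))    # bigram frequencies
    print(concordance(tokens, "schedule", 2))  # keyword in context
    print(coverage(tokens, TOY_LISTS))         # token coverage per list
    print(cefr_distribution(tokens, TOY_CEFR)) # word types per CEFR level
```

Coverage here is computed over running tokens, while the CEFR tally is computed over distinct word types; either choice could be swapped depending on which statistic a given analysis needs.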
- This work (including HTML and PDF Files) is licensed under a Creative Commons Attribution 4.0 International License.





















