Automated Vocabulary Profiling of TOEIC Listening Materials: A CEFR-Aligned Approach for EFL Learners

Keywords: TOEIC Listening, Vocabulary Profiling, Corpus-based Analysis, CEFR Levels, EFL Instruction

This study examines the vocabulary characteristics of TOEIC Listening materials to support the development of more targeted English language teaching resources for EFL learners, particularly in Thai higher education. Using a corpus-based approach, we collected a representative dataset of TOEIC preparation texts and analyzed it with a custom-built Python tool for vocabulary profiling. The tool performs frequency analysis, concordance generation, n-gram extraction, collocation detection, and CEFR-level classification. Vocabulary items were categorized against established word lists, including the General Service List (GSL) and the Academic Word List (AWL), and were assigned CEFR levels. Results show that basic (K1) and function words dominate the materials, although a substantial proportion of off-list and domain-specific vocabulary is also present. Most words fall within the B1 proficiency level, suggesting that the materials are accessible to intermediate learners. The study contributes a novel, automated vocabulary profiling framework that integrates linguistic metrics with CEFR-based classification, with practical implications for curriculum design, test preparation, and vocabulary instruction. The approach makes material evaluation more precise and efficient, helping to bridge the gap between test content and learner needs. The findings highlight the potential of automated tools to improve vocabulary-focused teaching strategies and to inform language assessment practices in EFL contexts.
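To make the profiling steps named in the abstract concrete, the following is a minimal Python sketch of frequency analysis, n-gram extraction, word-list banding, and CEFR lookup. It is not the authors' tool. The function names (`tokenize`, `ngrams`, `band`, `profile`), the sample sentence, and the tiny word sets are hypothetical placeholders; a real analysis would load the full GSL, AWL, and a CEFR-graded word list, and the CEFR labels shown here are illustrative only.

```python
import re
from collections import Counter

# Hypothetical, tiny stand-ins for the real word lists (assumptions for illustration).
K1_WORDS = {"the", "a", "to", "is", "please", "call", "will", "be",
            "at", "on", "meeting", "office"}
AWL_WORDS = {"schedule", "confirm"}
# Illustrative CEFR labels only; not taken from an official graded list.
CEFR_LEVELS = {"meeting": "A2", "office": "A2", "schedule": "B1",
               "confirm": "B1", "invoice": "B2"}


def tokenize(text):
    """Lowercase the text and extract alphabetic tokens (apostrophes allowed inside)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n):
    """Count contiguous n-token sequences."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def band(word):
    """Assign a word to the first list that contains it, else 'off-list'."""
    if word in K1_WORDS:
        return "K1"
    if word in AWL_WORDS:
        return "AWL"
    return "off-list"


def profile(text):
    """Return token frequencies, bigrams, band coverage (by token), and CEFR counts (by type)."""
    tokens = tokenize(text)
    freq = Counter(tokens)
    bands = Counter(band(t) for t in tokens)
    coverage = {b: c / len(tokens) for b, c in bands.items()} if tokens else {}
    cefr = Counter(CEFR_LEVELS.get(w, "unclassified") for w in freq)
    return {"freq": freq, "bigrams": ngrams(tokens, 2),
            "coverage": coverage, "cefr": cefr}


SAMPLE = "Please confirm the meeting schedule. The invoice will be at the office."
result = profile(SAMPLE)
for name, share in sorted(result["coverage"].items()):
    print(f"{name}: {share:.1%}")
```

Coverage is computed over tokens (running words), while the CEFR tally is computed over types (distinct words). Vocabulary profilers differ on this choice, so the unit of counting should be stated when reporting results.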