Analysis of Factors Influencing Online Learning Using the Decision Tree Method

Xiaojie Li, Huili Tang


Objective: With the continuing growth of online learning, analyzing students' online learning behavior has become increasingly important. Identifying the factors that influence students' engagement in online learning is key to improving their learning performance.

Methods: Students' online learning behavior data were collected from the Chinese University MOOC platform (icourse163) using web crawling. The synthetic minority oversampling technique (SMOTE) was applied to address class imbalance in the dataset. Course progress was used to represent each student's online learning status, which was categorized as either interruption or completion. To address the low computational efficiency of the C4.5 decision tree algorithm, its calculation formula was modified to produce an improved C4.5 algorithm.

Findings: Of the factors analyzed, the number of course chapters had the greatest impact on students' online learning, followed by the number of course evaluations and the overall course score. When classifying students' online learning status, the improved C4.5 algorithm achieved the highest accuracy (0.942) and the shortest classification time (0.165 s) among the compared methods, which included naive Bayes and random forest.

Novelty: This study designed an improved C4.5 algorithm to analyze the factors influencing online learning and demonstrated its reliability experimentally, providing a new and effective method for analyzing online learning data.
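
The abstract names two standard building blocks: SMOTE for rebalancing the interruption/completion classes, and the C4.5 split criterion. The authors' modified C4.5 formula is not given here, so the minimal sketch below shows only the textbook versions, in dependency-free Python: SMOTE interpolation between a minority sample and one of its k nearest minority neighbours, and the C4.5 gain ratio (information gain divided by split information) for categorical attributes. The function names (`smote`, `gain_ratio`, `best_attribute`), the default `k=5`, and the toy data are hypothetical illustrations, not the study's implementation or dataset. Real C4.5 also thresholds continuous attributes and only considers splits with at least average information gain; both refinements are omitted.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=5, seed=0):
    """Textbook SMOTE sketch (hypothetical helper): create n_new synthetic samples.

    Each synthetic sample lies on the segment between a randomly chosen
    minority sample x and one of its k nearest minority-class neighbours.
    Requires at least two minority samples (tuples of numbers).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples (Euclidean)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(p, x))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Standard (unmodified) C4.5 gain ratio of one categorical attribute:
    GainRatio = (H(Y) - H(Y | A)) / SplitInfo(A)."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    weights = [len(g) / total for g in groups.values()]
    conditional = sum(w * entropy(g) for w, g in zip(weights, groups.values()))
    gain = entropy(labels) - conditional
    split_info = -sum(w * math.log2(w) for w in weights)
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(columns, labels):
    """Attribute with the highest gain ratio, i.e. the simplified root split."""
    return max(columns, key=lambda name: gain_ratio(columns[name], labels))


if __name__ == "__main__":
    # Hypothetical toy data, not the study's dataset.
    labels = ["interruption", "interruption", "completion", "completion"]
    columns = {
        "chapters": ["many", "many", "few", "few"],
        "evaluations": ["high", "low", "high", "low"],
    }
    print(best_attribute(columns, labels))  # → chapters
    print(smote([(0.2, 0.1), (0.4, 0.3), (0.9, 0.8)], n_new=2, k=2))
```

On the invented table, `chapters` separates the two outcomes perfectly (gain ratio 1.0) while `evaluations` carries no information (gain ratio 0.0), so it is chosen as the root split; this illustrates the criterion only and says nothing about the study's real data.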


DOI: 10.28991/HIJ-2024-05-02-018


Keywords: Higher Education; Decision Tree; Online Learning; Influence Factors; e-Learning.


Copyright (c) 2024 Xiaojie Li, Huili Tang