Evaluating the Performance of Topic Modeling Techniques for Bibliometric Analysis Research: An LDA-based Approach
DOI: 10.28991/HIJ-2024-05-02-07
Copyright (c) 2024 Lan Thi Nguyen, Wirapong Chansanam, Nalatpa Hunsapun, Vispat Chaichuay, Suparp Kanyacome, Akkharawoot Takhom, Yuttana Jaroenruen