Multi-class SVM Classification Comparison for Health Service Satisfaction Survey Data in Bahasa

Gede Indrawan, Heri Setiawan, Aris Gunadi

Abstract


This study compared Multi-class Support Vector Machine (MSVM) classification using the One-versus-One (OvO) and One-versus-Rest (OvR) approaches with unigram and bigram features. The data came from the service satisfaction survey report on Denpasar public health centers prepared by the Center for Public Health Innovation (CPHI), Medical School, Udayana University. Because Bali is one of the world's leading tourism destinations, it is important to understand the public health services that support it, represented here by its capital city, Denpasar. This study also laid a foundation for adapting available classification methods to Indonesian health service satisfaction survey data, to support decisions that improve health services. Bali is one of Indonesia's provinces, and all provinces follow the same national regulations, so health service satisfaction survey data in the Indonesian language (Bahasa) should share the same aspects, such as categories, priorities, and word-level matters (including abbreviations, acronyms, and terminology); together, these aspects make the data unique and require specific processing. To the best of the authors' knowledge, no such study exists, so this work is considered a contribution, and the foundation it provides could become part of a future integrated system for Indonesian health big data. Because satisfaction survey data tend to be imbalanced in practice, this study also compared the developed models using unigram and bigram features with and without feature selection (FS). These features were then processed by the OvO MSVM and OvR MSVM models. The models were validated with k-fold cross-validation, which also divided the data into training and testing sets. In experiments with and without FS, the OvO MSVM and OvR MSVM models generally performed better with unigram features than with bigram features.
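As an illustration of the unigram and bigram features described above, the following minimal Python sketch turns tokenized responses into n-gram features. It assumes whitespace tokenization of already-normalized text and plain TF-IDF weighting (raw count times log(N/df)); the abstract does not state the tokenizer, weighting, or normalization the authors used, and the three Bahasa responses are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, each joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n):
    """TF-IDF weight of every n-gram in every document.

    docs: pre-tokenized documents (lists of lowercase words).
    n: 1 for unigram features, 2 for bigram features.
    Assumed weighting: raw count * log(N / df), no smoothing or normalization.
    """
    counts = [Counter(ngrams(doc, n)) for doc in docs]
    df = Counter(term for c in counts for term in c)  # documents containing each n-gram
    return [{t: tf * math.log(len(docs) / df[t]) for t, tf in c.items()} for c in counts]


# Hypothetical survey responses (not taken from the CPHI report):
# "service very good", "queue very long", "doctor friendly and service good".
responses = ["pelayanan sangat baik", "antrian sangat lama", "dokter ramah dan pelayanan baik"]
tokens = [r.split() for r in responses]

unigrams = tfidf(tokens, 1)
bigrams = tfidf(tokens, 2)
print(sorted(unigrams[0]))  # ['baik', 'pelayanan', 'sangat']
print(sorted(bigrams[0]))   # ['pelayanan sangat', 'sangat baik']
```

In this toy input, `pelayanan`, `sangat`, and `baik` each appear in two responses and receive weight log(3/2), while every bigram appears in only one response. Short responses therefore tend to produce sparser bigram vectors than unigram vectors.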
Without FS and with unigram features, the two models performed comparably: the OvO MSVM model was slightly better in accuracy and precision, while the OvR MSVM model was slightly better in recall and F1-score. Without FS and with bigram features, the results were also comparable, with the OvR MSVM model performing slightly better than the OvO MSVM model. With FS, for both unigram and bigram features, the OvR MSVM model generally performed better than the OvO MSVM model.
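The OvO and OvR strategies compared above differ in how binary SVMs are combined: OvR trains one classifier per class and picks the largest decision value, while OvO trains one classifier per pair of classes and takes a majority vote. The pure-Python sketch below illustrates both, together with k-fold evaluation by accuracy and macro-averaged F1. It is a sketch under stated assumptions, not the authors' setup: a linear kernel trained by subgradient descent (instead of a dedicated SVM solver), interleaved unshuffled folds, macro averaging, and hypothetical 2-D points labeled A, B, and C standing in for n-gram feature vectors.

```python
from collections import Counter
from itertools import combinations


def decision(model, x):
    """Signed decision value w.x + b of a linear binary classifier."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Binary linear SVM (labels +1/-1) fitted by full-batch subgradient descent
    on the L2-regularized hinge loss (an assumed solver). Returns (weights, bias)."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * decision((w, b), xi) < 1:  # inside the margin: hinge term active
                grad_w = [gj - yi * xj / n for gj, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def fit_ovr(X, y):
    """One-versus-Rest: K binary SVMs, each separating one class from all others."""
    return {c: train_linear_svm(X, [1 if yi == c else -1 for yi in y])
            for c in sorted(set(y))}


def predict_ovr(models, x):
    # The class whose SVM gives the largest decision value wins.
    return max(models, key=lambda c: decision(models[c], x))


def fit_ovo(X, y):
    """One-versus-One: K(K-1)/2 binary SVMs, each trained on two classes only."""
    models = {}
    for a, c in combinations(sorted(set(y)), 2):
        idx = [i for i, yi in enumerate(y) if yi in (a, c)]
        models[(a, c)] = train_linear_svm([X[i] for i in idx],
                                          [1 if y[i] == a else -1 for i in idx])
    return models


def predict_ovo(models, x):
    # Each pairwise SVM votes for one class; most votes wins (ties: first label).
    votes = Counter(a if decision(m, x) > 0 else c for (a, c), m in models.items())
    return max(sorted(votes), key=votes.get)


def macro_f1(true, pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in sorted(set(true) | set(pred)):
        tp = sum(t == c and p == c for t, p in zip(true, pred))
        fp = sum(t != c and p == c for t, p in zip(true, pred))
        fn = sum(t == c and p != c for t, p in zip(true, pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


def k_fold_scores(X, y, fit, predict, k=3):
    """Mean accuracy and macro-F1 over k interleaved folds (fold i = i, i+k, ...)."""
    accs, f1s = [], []
    for i in range(k):
        test = sorted(range(i, len(X), k))
        train = [j for j in range(len(X)) if j not in test]
        model = fit([X[j] for j in train], [y[j] for j in train])
        true = [y[j] for j in test]
        pred = [predict(model, X[j]) for j in test]
        accs.append(sum(t == p for t, p in zip(true, pred)) / len(true))
        f1s.append(macro_f1(true, pred))
    return sum(accs) / k, sum(f1s) / k


# Hypothetical toy data: 2-D points standing in for n-gram feature vectors,
# grouped by class so that each interleaved fold gets one sample per class.
X = [(0, 3), (0.5, 3), (0, 3.5), (3, 0), (3.5, 0), (3, 0.5),
     (-3, -3), (-3.5, -3), (-3, -3.5)]
y = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

for name, fit, predict in [("OvO", fit_ovo, predict_ovo), ("OvR", fit_ovr, predict_ovr)]:
    acc, f1 = k_fold_scores(X, y, fit, predict)
    print(f"{name}: accuracy={acc:.2f}, macro-F1={f1:.2f}")
```

With K classes, OvO trains K(K-1)/2 binary SVMs and OvR trains K, so the two coincide at 3 for this toy data but diverge as K grows; each OvO classifier, however, is trained only on the samples of its two classes.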

 

DOI: 10.28991/HIJ-2022-03-04-05


Keywords


Bahasa; Classification; Multi-Class; Satisfaction Survey; Support Vector Machine.


Copyright (c) 2023 Gede Indrawan, Heri Setiawan, Aris Gunadi