Self-Adaptive Weights for the K-Means Classification Algorithm
This paper presents an improved K-means clustering algorithm that addresses the traditional algorithm's sensitivity to outliers and susceptibility to local optima by introducing an adaptive weight adjustment mechanism. It employs an exponential decay function to dynamically reduce the feature weights of outlier data points, effectively suppressing outliers while preserving the structure of the normal data. The proposed method retains the computational efficiency of standard K-means. Key contributions include: (a) a novel distance-based weighting strategy that progressively reduces the influence of noisy points, mitigating their impact on clustering performance; and (b) an innovative form of "local dimensionality reduction" for outlier points via weight decay, which alters only the feature space of noisy regions while preserving the global topological structure of the clean data. Extensive experiments on three benchmark datasets: Iris (4-dimensional, balanced classes), Wine (13-dimensional, correlated features), and Wisconsin Breast Cancer Diagnosis (WBCD; 30-dimensional, imbalanced classes), demonstrate the effectiveness of the approach. Compared to standard K-means, the proposed algorithm achieves accuracy improvements of 7.47% on Iris, 13.89% on Wine, and 19% on WBCD. This adaptive strategy offers a practical and efficient solution for clustering in noisy, high-dimensional environments, without the added complexity of mixture models.
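The abstract does not state the exact decay formula or update schedule, so the following is only a minimal sketch of the general mechanism it describes: after each assignment step, every point receives a weight that decays exponentially with its distance to its nearest centroid, and centroids are then updated as weighted means. The specific rule w_i = exp(-lam * d_i / median(d)), along with the lam and n_iter parameters and the median scaling, are illustrative assumptions, not the authors' published formulation.

```python
# Sketch of adaptive-weighted K-means with exponential weight decay.
# The weight rule below is an assumed illustration of the mechanism
# described in the abstract, not the paper's exact formula.
import numpy as np

def adaptive_weighted_kmeans(X, k, lam=1.0, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids by sampling k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    weights = np.ones(len(X))  # start with uniform point weights

    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        d_min = dists[np.arange(len(X)), labels]

        # Exponential decay: points far from their centroid (likely
        # outliers) get small weights; typical points keep weight near 1.
        scale = np.median(d_min) + 1e-12  # assumed normalization choice
        weights = np.exp(-lam * d_min / scale)

        # Weighted centroid update; heavily down-weighted outliers
        # barely move the centroids.
        for j in range(k):
            mask = labels == j
            if mask.any():
                w = weights[mask]
                centroids[j] = (w[:, None] * X[mask]).sum(axis=0) / w.sum()
    return labels, centroids, weights

# Usage on synthetic data with injected outliers (hypothetical example):
rng = np.random.default_rng(1)
blobs = np.vstack([rng.normal(m, 0.3, (50, 2)) for m in ([0, 0], [4, 4], [0, 4])])
outliers = rng.uniform(-3.0, 7.0, (10, 2))
X = np.vstack([blobs, outliers])
labels, centroids, weights = adaptive_weighted_kmeans(X, k=3)
print(weights[-10:])  # outlier weights fall well below 1
```

Because the weight update costs the same O(nk) distance computations as the standard assignment step, this scheme keeps the per-iteration complexity of plain K-means, consistent with the abstract's efficiency claim.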