Outlier Detection in VPN Authentication Logs for Corporate Computer Networks Access using CRISP-DM

Nilo Legowo, Wilyu Mahendra Bad

Abstract


A Virtual Private Network (VPN) is a critical network access solution widely used by corporations, allowing users to connect to company computer networks over a global infrastructure. During the COVID-19 pandemic, heavier reliance on computer network access has increased exposure to data breaches by unauthorized parties. Companies therefore need a proactive approach to safeguarding data integrity, in particular by identifying abnormal access patterns and access times. This study develops a model for detecting anomalous activity in VPN authentication log data. The dataset comprises 36,807 log records from September to November 2022, selected by systematic sampling. Two attributes, user ID and access time, are analyzed to trace access patterns. The research follows the CRISP-DM method to keep the process structured and efficient. The model uses the K-Means algorithm to cluster the data and K-Nearest Neighbors (K-NN) to measure distances between data points, and it identifies outliers that warrant further investigation by the company. The choice of k in K-NN significantly affects which records are detected as outliers and can be tuned to organizational requirements. Integrating the proposed model into the company's big data platform facilitates real-time monitoring, allowing the security team to address potential threats preemptively and reduce misuse of network access. By raising awareness of and responsiveness to information security risks, the model helps strengthen the company's cybersecurity posture in an evolving digital landscape.
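The abstract does not spell out how K-Means and K-NN are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the access times of a single hypothetical user are expressed as hour of day, maps them onto a circle so that late-night and early-morning logins stay close, uses K-Means to summarize the usual access windows, and scores each login by the distance to its k-th nearest neighbor. The sample hours, the helper names, the two-cluster setting, and the threshold rule (three times the median score) are all illustrative choices.

```python
import numpy as np

def hour_features(hours):
    """Map hour of day (0-24) onto the unit circle so 23:00 and 01:00 are close."""
    angle = 2.0 * np.pi * np.asarray(hours, dtype=float) / 24.0
    return np.column_stack([np.cos(angle), np.sin(angle)])

def kmeans(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with random initial centers; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster ends up empty.
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def knn_score(X, k):
    """Distance from each point to its k-th nearest neighbor, excluding itself."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.sort(d, axis=1)[:, k]  # column 0 is the point itself (distance 0)

# Hypothetical login hours for one user: morning and afternoon sessions plus one 03:00 login.
hours = [8.5, 8.8, 9.0, 9.1, 9.2, 9.5, 13.5, 13.8, 14.0, 14.2, 14.5, 3.0]
X = hour_features(hours)

labels, centers = kmeans(X, n_clusters=2)   # usual access windows (assumed: two)
scores = knn_score(X, k=3)                  # k is a tunable assumption
threshold = 3.0 * np.median(scores)         # illustrative rule, not from the paper
flagged = np.flatnonzero(scores > threshold)

for i in flagged:
    print(f"login at {hours[i]:.1f}h, cluster {labels[i]}, score {scores[i]:.3f}")
```

With these made-up values, only the 03:00 login exceeds the threshold. Changing `k` or the threshold multiplier changes which logins are flagged, which is the tuning knob the abstract refers to when it says k can be adapted to organizational requirements.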

 

DOI: 10.28991/HIJ-2024-05-04-016


Keywords


Outlier Detection; VPN Log; K-Nearest Neighbors; K-Means; Data Mining; CRISP-DM.



Copyright (c) 2024 Nilo Legowo, Wilyu Mahendra Bad