Mathematical Approaches and Algorithms in Big Data Architecture and Hybrid System Efficiency
This article presents a formal demonstration of a hybrid big data processing architecture that combines the fault tolerance and storage robustness of Hadoop with the speed and in-memory processing of Apache Spark. The architecture is evaluated through test execution and performance benchmarking in real-world data centers across three regions of Kazakhstan. The model integrates distributed resource management components, a Directed Acyclic Graph (DAG)-based scheduling mechanism, and Resilient Distributed Datasets (RDDs) to enable dynamic workload distribution and rapid failure recovery. The results show that the hybrid system consistently outperforms standalone Spark and Hadoop architectures under variable workloads, with improvements in execution time, task recovery, and resource utilization. Quantitative performance metrics enable a structured comparison of the architectures and help optimize deployments for diverse scenarios. Compared with standalone Spark and Hadoop systems, the hybrid architecture reduces average execution time by up to 38% and increases resource efficiency by 25%.
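The abstract relies on two Spark mechanisms: DAG-based scheduling and RDD lineage for failure recovery. The sketch below illustrates both in plain Python under simplifying assumptions. It is not the authors' implementation and not Spark's API; `Stage`, `topological_order`, `compute`, and `run` are hypothetical names chosen for illustration. Stages are ordered with Kahn's algorithm, and when an in-memory result is lost, only the missing stages are recomputed from their recorded parents (their lineage).

```python
from collections import deque
from typing import Callable, Dict, List


class Stage:
    """Hypothetical DAG node: a transformation applied to its parents' outputs."""

    def __init__(self, name: str, parents: List[str], fn: Callable[..., list]):
        self.name = name
        self.parents = parents  # lineage: the stages this one is derived from
        self.fn = fn
        self.runs = 0  # how many times this stage has been (re)computed


def topological_order(stages: Dict[str, Stage]) -> List[str]:
    """Kahn's algorithm: emit a stage only after all of its parents."""
    indegree = {name: len(s.parents) for name, s in stages.items()}
    children: Dict[str, List[str]] = {name: [] for name in stages}
    for name, s in stages.items():
        for parent in s.parents:
            children[parent].append(name)
    ready = deque(name for name, d in indegree.items() if d == 0)
    order: List[str] = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for child in children[name]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(stages):
        raise ValueError("dependency cycle: stages do not form a DAG")
    return order


def compute(name: str, stages: Dict[str, Stage], cache: Dict[str, list]) -> list:
    """Return a stage's output, rebuilding any missing ancestors from lineage."""
    if name not in cache:
        stage = stages[name]
        inputs = [compute(p, stages, cache) for p in stage.parents]
        cache[name] = stage.fn(*inputs)
        stage.runs += 1
    return cache[name]


def run(stages: Dict[str, Stage]) -> Dict[str, list]:
    """Execute every stage once, in dependency (topological) order."""
    cache: Dict[str, list] = {}
    for name in topological_order(stages):
        compute(name, stages, cache)
    return cache


# Hypothetical four-stage job: load -> (square, evens) -> join.
stages = {
    "load": Stage("load", [], lambda: list(range(1, 6))),
    "square": Stage("square", ["load"], lambda xs: [x * x for x in xs]),
    "evens": Stage("evens", ["load"], lambda xs: [x for x in xs if x % 2 == 0]),
    "join": Stage("join", ["square", "evens"], lambda a, b: a + b),
}

cache = run(stages)
print(topological_order(stages))  # ['load', 'square', 'evens', 'join']

# Simulate losing two in-memory results (e.g. a failed worker), then recover.
del cache["square"], cache["join"]
print(compute("join", stages, cache))  # [1, 4, 9, 16, 25, 2, 4]
print({name: s.runs for name, s in stages.items()})
# {'load': 1, 'square': 2, 'evens': 1, 'join': 2}
```

After the simulated loss, only `square` and `join` are recomputed; `load` and `evens` are reused from the cache. Real Spark tracks lineage per partition and splits a job's DAG into stages at shuffle boundaries, both of which this sketch omits.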
This work (including HTML and PDF files) is licensed under a Creative Commons Attribution 4.0 International License.