Mathematical Approaches and Algorithms in Big Data Architecture and Hybrid System Efficiency

Keywords: Hybrid Big Data Architecture, Apache Spark, RDD, DAG, Fault Tolerance, Scalability

This article presents a formal demonstration of a hybrid big data processing architecture that combines the fault tolerance and storage robustness of Hadoop with the speed and in-memory processing of Apache Spark. The architecture is evaluated through test execution and performance benchmarking in real-world data centers across three regions of Kazakhstan. The model integrates distributed resource management components, a Directed Acyclic Graph (DAG)-based scheduling mechanism, and Resilient Distributed Datasets (RDDs) to enable dynamic workload distribution and rapid failure recovery. Under variable workloads, the hybrid system consistently outperforms standalone Spark and standalone Hadoop in execution time, task recovery, and resource utilization, reducing average execution time by up to 38% and increasing resource efficiency by 25%. These quantitative metrics support a structured comparison of the architectures and help optimize deployments for diverse scenarios.
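
To make the reported percentages concrete, the following minimal Python sketch shows one common way such relative changes are computed. The abstract does not give the article's exact metric definitions, so the formulas, the function names, and all input numbers below are assumptions chosen purely for illustration; they are not measurements from the study.

```python
def relative_reduction(baseline: float, candidate: float) -> float:
    """Fraction by which candidate is lower than baseline (positive = improvement)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline


def relative_increase(baseline: float, candidate: float) -> float:
    """Fraction by which candidate is higher than baseline (positive = improvement)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline


# Hypothetical inputs, NOT data from the article: average job runtime in
# seconds and average cluster utilization as a fraction of capacity.
baseline_runtime_s = 100.0
hybrid_runtime_s = 62.0
baseline_utilization = 0.60
hybrid_utilization = 0.75

runtime_cut = relative_reduction(baseline_runtime_s, hybrid_runtime_s)
utilization_gain = relative_increase(baseline_utilization, hybrid_utilization)

print(f"Execution time reduced by {runtime_cut:.0%}")
print(f"Resource utilization increased by {utilization_gain:.0%}")
```

Both quantities are measured relative to the baseline system, so the two percentages are not directly comparable unless the same baseline (standalone Spark or standalone Hadoop) is used for each.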