Smart Data Placement Strategy in Heterogeneous Hadoop
DOI: 10.28991/HIJ-2025-06-01-03
Copyright (c) 2025 Nour-Eddine BAKNI, Ismail ASSAYAD