Evaluating Household Consumption Patterns: Comparative Analysis Using Ordinary Least Squares and Random Forest Regression Models

En Lee, Thian Song Ong, Yvonne Lee


This research decomposes the contribution of socioeconomic factors to household consumption expenditure using a regression approach, with log per capita expenditure as the dependent variable. Our study is the first to use SHAP analysis and machine learning models to analyse household consumption expenditure. We select an OLS (linear) model and a Random Forest (nonlinear) model to compare how each estimates consumption expenditure. Both models explain about 85% of the variation in log per capita expenditure, and the SHAP analysis reveals the nonlinear relationships captured by the Random Forest model. The findings, which can inform current policy-making, are as follows: (1) Both models agree that income, household size, and educational level are major factors in the purchasing power of household heads. (2) The Random Forest model shows a nonlinear contribution of age and household size to log per capita expenditure, in contrast with previous studies that treated these effects as linear. (3) Household heads with higher income and educational level tend to spend more. (4) Current policy should consider focusing on larger, lower-income households, which tend to spend more despite earning less, primarily by assisting them with non-cash transfers and subsidies.
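As a rough illustration of the Shapley-value idea behind SHAP, the sketch below computes exact Shapley values for a single household under a small hand-written prediction function. Everything in it is an assumption made for illustration: the function `toy_log_expenditure`, its coefficients, the feature names, and the baseline household are hypothetical and are not the fitted OLS or Random Forest models from the study. Features left out of a coalition are simply set to their baseline value, which is one common simplification.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one observation x relative to a baseline.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features, so this
    is only practical for a handful of them.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                without = {f: (x[f] if f in coalition else baseline[f])
                           for f in names}
                with_feature = dict(without)
                with_feature[name] = x[name]
                # Marginal contribution of adding `name` to the coalition
                total += weight * (predict(with_feature) - predict(without))
        phi[name] = total
    return phi


def toy_log_expenditure(h):
    """Hypothetical nonlinear model of log per capita expenditure.

    The coefficients are invented for illustration, not estimated.
    """
    return (5.0
            + 0.6 * h["log_income"]
            - 0.08 * h["size"] + 0.005 * h["size"] ** 2
            - 0.01 * h["size"] * h["log_income"]
            + 0.03 * h["education_years"])


# Hypothetical household and reference (baseline) household
x = {"log_income": 8.5, "size": 6, "education_years": 11}
baseline = {"log_income": 8.0, "size": 4, "education_years": 9}

phi = shapley_values(toy_log_expenditure, x, baseline)
for feature, value in phi.items():
    print(f"{feature}: {value:+.4f}")
```

The per-feature contributions sum to the difference between the prediction for the household and the prediction for the baseline, which is the property that makes such a decomposition readable as each factor's share of the predicted expenditure. Libraries that implement SHAP for tree ensembles use faster algorithms than this brute-force enumeration.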


DOI: 10.28991/HIJ-2024-05-02-019



Keywords: Household Consumption; Machine Learning; Linear Regression; Random Forest; Shapley Value.




Copyright (c) 2024 En Lee, Thian Song Ong, Lean Ee Yvonne Lee