Interpretable and Uncertainty-Aware Multi-Modal Spatio-Temporal Deep Learning Framework for Regional Economic Forecasting

Keywords: Regional Economic Forecasting, Spatio-Temporal Deep Learning, Multi-Modal Fusion, Uncertainty Quantification

The objective of this study is to improve the accuracy, interpretability, and reliability of regional economic forecasting, a task essential for effective policy-making, infrastructure planning, and crisis management. Existing econometric and machine learning models often rely on restrictive linearity assumptions, make limited use of heterogeneous data sources, and lack transparent uncertainty quantification. To address these limitations, we propose a unified multi-modal spatio-temporal deep learning framework that integrates satellite imagery, structured economic indicators, and policy documents through an adaptive cross-modal attention mechanism. The methodology incorporates a spatio-temporal cross-attention module to capture dynamic inter-regional dependencies and temporal patterns, along with a Bayesian neural prediction head to quantify predictive uncertainty. Applied to a 13-year dataset covering 75 Chinese cities, the model delivers substantial improvements, reducing mean absolute error by 37% relative to XGBoost and achieving a 92% Prediction Interval Coverage Probability (PICP) at a nominal 90% confidence level. Case studies further validate its ability to trace pandemic-induced economic shocks and reveal latent propagation pathways. The novelty of this work lies in an integrative architecture that jointly advances multi-modal fusion, interpretability, and uncertainty quantification, offering both methodological innovation and practical utility. The framework provides policymakers with transparent, risk-aware predictions and establishes a scalable foundation for next-generation economic forecasting.
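
The abstract names its key components only at a high level. As a concrete illustration, the sketch below shows, in PyTorch, the two mechanisms referred to above (cross-modal attention fusion and an uncertainty-aware prediction head) together with the PICP metric used for evaluation. All class names, dimensions, and the use of Monte Carlo dropout as a stand-in for the Bayesian prediction head are illustrative assumptions; this is not the authors' implementation.

    # Minimal, illustrative PyTorch sketch (not the authors' code).
    # Assumes each modality has already been encoded into fixed-size embeddings:
    #   img_emb  : satellite-imagery features,          shape (batch, dim)
    #   econ_emb : structured economic indicators,      shape (batch, dim)
    #   text_emb : policy-document features,            shape (batch, dim)
    import torch
    import torch.nn as nn


    class CrossModalAttentionFusion(nn.Module):
        """Fuses modality embeddings with multi-head attention over the modality axis."""

        def __init__(self, dim: int = 128, heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, img_emb, econ_emb, text_emb):
            # Stack the three modalities as a length-3 "sequence": (batch, 3, dim).
            tokens = torch.stack([img_emb, econ_emb, text_emb], dim=1)
            fused, attn_weights = self.attn(tokens, tokens, tokens)
            fused = self.norm(fused + tokens)        # residual connection + layer norm
            # Mean-pool over modalities -> one fused vector per region/time step.
            # attn_weights can be inspected for per-modality attributions.
            return fused.mean(dim=1), attn_weights


    class MCDropoutHead(nn.Module):
        """Regression head with dropout kept active at inference (MC dropout),
        used here as a simple stand-in for a Bayesian prediction head."""

        def __init__(self, dim: int = 128, p: float = 0.2):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, dim), nn.ReLU(), nn.Dropout(p), nn.Linear(dim, 1)
            )

        def forward(self, x):
            return self.net(x)


    def predict_with_interval(head, x, n_samples: int = 100, alpha: float = 0.10):
        """Monte Carlo sampling to obtain a point forecast and a (1 - alpha) interval."""
        head.train()  # keep dropout active during sampling
        with torch.no_grad():
            samples = torch.stack([head(x) for _ in range(n_samples)], dim=0)
        lower = samples.quantile(alpha / 2, dim=0)
        upper = samples.quantile(1 - alpha / 2, dim=0)
        return samples.mean(dim=0), lower, upper


    def picp(y_true, lower, upper):
        """Prediction Interval Coverage Probability: fraction of targets inside the interval."""
        inside = (y_true >= lower) & (y_true <= upper)
        return inside.float().mean().item()

With alpha = 0.10 the intervals target a nominal 90% coverage, so an empirical PICP near or slightly above 0.90 (such as the 92% reported in the abstract) indicates well-calibrated, mildly conservative prediction intervals.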