Analyzing the Epidemiological Outbreak of COVID-19: Real-time, Visual Data Analysis, Short-term Forecasting, and Risk Factor Identification

Jiawei Long


The COVID-19 outbreak was first reported in Wuhan, China, and the WHO declared it a Public Health Emergency of International Concern (PHEIC) on 30 January 2020. It has since spread to over 180 countries and evolved into a worldwide pandemic that seriously threatens global public health. To combat and prevent the spread of the disease, individuals should be well informed about the rapidly changing state of COVID-19. To this end, I have built a website that analyzes and delivers the latest state of the disease along with relevant analytical insights. The website is designed for a general audience and communicates insights through straightforward, concise data visualizations supported by sound statistical methods, accurate data modeling, state-of-the-art natural language processing techniques, and reliable data sources. This paper discusses the main methodologies used to generate the insights displayed on the website: an automatic data ingestion pipeline, normalization techniques, moving average computation, ARIMA time-series forecasting, and logistic regression models. It also highlights key findings about COVID-19 derived using these methodologies.
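Two of the methodologies named in the abstract, normalization and moving average computation, can be illustrated with a minimal sketch. The abstract does not specify the window length or the normalization scale, so the 7-day trailing window and the per-100,000 scale below are assumptions, and the function names `moving_average` and `per_100k` are hypothetical rather than taken from the paper.

```python
from typing import List, Optional


def moving_average(values: List[float], window: int = 7) -> List[Optional[float]]:
    """Trailing simple moving average.

    Returns None for positions where a full window is not yet available.
    The default window of 7 is an assumption, not a value from the paper.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    smoothed: List[Optional[float]] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        smoothed.append(running / window if i >= window - 1 else None)
    return smoothed


def per_100k(count: float, population: int) -> float:
    """Normalize a raw count to a rate per 100,000 people (assumed scale)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population
```

Smoothing daily counts this way dampens day-of-week reporting effects, and scaling by population makes regions of different sizes comparable.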


DOI: 10.28991/HIJ-2021-02-03-09



Coronavirus Epidemiology; Data Analysis; Data Visualization; Hypothesis Testing; ARIMA Time-Series Forecast; Natural Language Processing; Logistic Regression.




Copyright (c) 2021 Jiawei Long