Analyzing the Epidemiological Outbreak of COVID-19: Real-time, Visual Data Analysis, Short-term Forecasting, and Risk Factor Identification

Jiawei Long


The COVID-19 outbreak was first reported in Wuhan, China, and was declared a Public Health Emergency of International Concern (PHEIC) by the WHO on January 30, 2020. It has since spread to over 180 countries and evolved into a worldwide pandemic that endangers global public health and poses a serious threat to the global community. To combat and prevent the spread of the disease, individuals should be well informed about the rapidly changing state of COVID-19. To this end, I have built a website that analyzes and delivers the latest state of the disease together with relevant analytical insights. The website is designed for a general audience and communicates insights through straightforward, concise data visualizations supported by sound statistical methods, accurate data modeling, state-of-the-art natural language processing techniques, and reliable data sources. This paper discusses the major methodologies used to generate the insights displayed on the website: an automatic data ingestion pipeline, normalization techniques, moving average computation, ARIMA time-series forecasting, and logistic regression models. It also highlights key discoveries about COVID-19 derived using these methodologies.
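As a rough illustration of two of the methods named in the abstract, normalization and moving average computation, the sketch below smooths a daily case series with a trailing window and expresses a count per 100,000 residents. The abstract does not specify the window length or the normalization used on the website, so the 7-day window, the per-100,000 scaling, the function names, and the sample numbers are all assumptions made for illustration.

```python
from typing import List, Optional


def moving_average(values: List[float], window: int = 7) -> List[Optional[float]]:
    """Trailing moving average over `window` points.

    Positions that do not yet have a full window are returned as None.
    The default of 7 (one week) is an assumption, not the paper's setting.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    result: List[Optional[float]] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        result.append(running / window if i >= window - 1 else None)
    return result


def per_100k(count: float, population: int) -> float:
    """Scale a raw count to a rate per 100,000 residents (assumed normalization)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Made-up daily case counts, for illustration only.
daily_cases = [10, 12, 9, 15, 20, 18, 14, 22]
smoothed = moving_average(daily_cases, window=7)
rate = per_100k(daily_cases[-1], population=1_000_000)
```

Normalizing by population makes regions of different sizes comparable, and the trailing average dampens day-of-week reporting noise before any forecasting model is fitted.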


DOI: 10.28991/HIJ-2021-02-03-09



Keywords: Coronavirus Epidemiology; Data Analysis; Data Visualization; Hypothesis Testing; ARIMA Time-Series Forecast; Natural Language Processing; Logistic Regression.




Copyright (c) 2021 Jiawei Long