Analyzing the Epidemiological Outbreak of COVID-19: Real-time, Visual Data Analysis, Short-term Forecasting, and Risk Factor Identification

Coronavirus; Epidemiology; Data Analysis; Data Visualization; Hypothesis Testing; ARIMA; Time-Series Forecast; Natural Language Processing; Logistic Regression.

Authors

  • Jiawei Long
    peterljw@g.ucla.edu
    Department of Biostatistics, UCLA Fielding School of Public Health, University of California, Los Angeles, United States


The COVID-19 outbreak was first reported in Wuhan, China, and was declared a Public Health Emergency of International Concern (PHEIC) by the WHO on January 30, 2020. It has now spread to over 180 countries and has gradually evolved into a worldwide pandemic that seriously threatens global public health. To combat and prevent the spread of the disease, everyone should be well informed of the rapidly changing state of COVID-19. To that end, I have built a website that analyzes and delivers the latest state of the disease along with relevant analytical insights. The website is designed for a general audience and communicates insights through straightforward, concise data visualizations supported by sound statistical methods, accurate data modeling, state-of-the-art natural language processing techniques, and reliable data sources. This paper discusses the major methodologies used to generate the insights displayed on the website, including an automatic data ingestion pipeline, normalization techniques, moving average computation, ARIMA time-series forecasting, and logistic regression models. In addition, the paper highlights key discoveries about COVID-19 derived using these methodologies.
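
As a rough illustration of two of the methods named above, the following sketch shows one common way to smooth a daily case series with a trailing moving average and to normalize a count by population. It is not taken from the paper: the function names, the 7-day default window, and the per-100,000 scale are assumptions chosen only for the example.

```python
def moving_average(values, window=7):
    """Trailing moving average over `window` days (7 is an assumed default).

    Early entries are averaged over however many days are available so far.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averaged = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running_sum -= values[i - window]
        averaged.append(running_sum / min(i + 1, window))
    return averaged


def per_100k(count, population):
    """Normalize a raw count to a rate per 100,000 people (assumed scale)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population
```

Smoothing in this way dampens day-to-day reporting noise, and population normalization makes counts from regions of different sizes comparable. The paper itself should be consulted for the exact window and normalization it uses.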

 

DOI: 10.28991/HIJ-2021-02-03-09
