The COVID-19 pandemic

This notebook uses the NYT GitHub repository to analyze the progression of COVID-19 across US states. The analyses here are not meant to be used as primary literature on their own, and they may contain mistakes. I make no claims as to the accuracy of my calculations.

The questions I am interested in asking regarding this pandemic are fairly straightforward:

* What is the case fatality rate through time?
* What do the case / death curves look like through time?
* Are the curves flattening?

I have used the 2019 population census projections to normalize data by population, and I also used the census bureau areas to compute population density.
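As a minimal sketch of this normalization, the snippet below merges cumulative counts with a census table and computes cases and deaths per million people, plus population density. The counts table mimics the layout of the NYT state-level file (cumulative `cases` and `deaths` per `date` and `state`). The census column names `population` and `area_sq_mi`, the state labels, and all numbers are hypothetical placeholders, not the actual files used in this notebook.

```python
import pandas as pd

# Toy cumulative counts (made-up numbers, for illustration only).
counts = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-04-02",
                            "2020-04-01", "2020-04-02"]),
    "state": ["State A", "State A", "State B", "State B"],
    "cases": [100, 150, 10, 20],
    "deaths": [2, 3, 0, 1],
})

# Hypothetical census table: column names and values are assumptions.
census = pd.DataFrame({
    "state": ["State A", "State B"],
    "population": [10_000_000, 500_000],
    "area_sq_mi": [50_000.0, 1_000.0],
})


def normalize_by_population(counts, census):
    """Attach per-million rates and population density to each row."""
    merged = counts.merge(census, on="state", how="left")
    merged["cases_per_million"] = merged["cases"] / merged["population"] * 1e6
    merged["deaths_per_million"] = merged["deaths"] / merged["population"] * 1e6
    # People per square mile, using the assumed area column.
    merged["density"] = merged["population"] / merged["area_sq_mi"]
    return merged


normalized = normalize_by_population(counts, census)
print(normalized[["date", "state", "cases_per_million",
                  "deaths_per_million", "density"]])
```

The left merge keeps every row of the counts table, so a state missing from the census table would show up as NaN rather than silently disappearing.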

To compute per-day differences, I originally used a savgol_filter as implemented in scipy. As of April 26, 2020, I use a Gaussian kernel smoother with a bandwidth of 2 standard deviations.
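A rough sketch of the smoothing step, under my own assumptions: I take the day-to-day difference of the cumulative counts and pass it through `scipy.ndimage.gaussian_filter1d`. Reading the bandwidth as `sigma = 2` days is an assumption, as are the function name `smoothed_daily_change` and the edge handling (`mode="nearest"`); the notebook's actual smoother may differ.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d


def smoothed_daily_change(cumulative, sigma_days=2.0):
    """Per-day differences of a cumulative series, Gaussian-smoothed.

    sigma_days is the kernel standard deviation in days (assumed value).
    Returns an array one element shorter than the input.
    """
    cumulative = np.asarray(cumulative, dtype=float)
    daily = np.diff(cumulative)  # new counts per day
    # mode="nearest" repeats the edge values instead of padding with zeros.
    return gaussian_filter1d(daily, sigma=sigma_days, mode="nearest")


# Toy cumulative series with noisy daily increments (made-up numbers).
toy_cumulative = np.cumsum([0, 4, 1, 9, 3, 12, 6, 15, 8, 20])
toy_raw = np.diff(toy_cumulative)
toy_smooth = smoothed_daily_change(toy_cumulative)
print(toy_raw)
print(np.round(toy_smooth, 2))
```

Because the kernel weights are non-negative and sum to one, the smoothed values always stay within the range of the raw daily differences.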

Loading the data

You can find the spreadsheets I downloaded here: https://github.com/dangeles/dangeles.github.io/blob/master/data/

COVID in the total US

Epidemiological curves of COVID-19

I have plotted the cases and deaths through time in the plots below in two different ways. The first column shows the absolute number of cases (first row) or deaths (second row). The second column shows the number of cases (or deaths) normalized to the population of each state. The second column can be read as a rough measure of your risk of getting (or dying from) COVID-19 through time in any given state, since it gives the number of cases (or deaths) per million people for each state.

Are the curves flattening?

Notice that the case curves are on a linear scale, while the death curves are on a log scale.

When are the peaks happening?

Case Fatality Rates

The plots below show the relationship between the number of cases and the number of deaths per state. The case fatality rate (CFR) is defined as the fraction of individuals with confirmed COVID-19 who die from the disease. In the graphs below, it's clear that as the number of cases has grown in New York, New Jersey and Massachusetts, so has the case fatality rate. The reason behind this relationship is unclear to me, but I suspect it has to do with decreasing quality of care as the health system is overloaded.

An important point in the future will be the presence (or lack thereof) of hysteresis in the system. That is, as cases fall, will the death rate retrace the trajectory it followed on the way up, falling back toward its initial value of about 1%? Unfortunately, I suspect that once the death rate is high, it will take a significant fall in the number of cases to bring it back down. I could very well be wrong about that, particularly if cheap and ample pharmacological supplies become available.

Total CFR by state

In the graph below, I plot the total CFR by state. I plot the minimum CFR per state, the maximum CFR, and the current CFR calculated on all deaths and cases to date.
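A minimal sketch of how these three summaries could be computed, assuming a table of cumulative `cases` and `deaths` per `date` and `state`. The function name `cfr_summary` and the `min_cases` cutoff (used only to avoid dividing by zero early in a state's outbreak) are my own assumptions, and the numbers are made up.

```python
import pandas as pd


def cfr_summary(df, min_cases=1):
    """Minimum, maximum and current CFR per state from cumulative counts."""
    # Drop rows with too few cases to form a meaningful ratio (assumed cutoff).
    df = df[df["cases"] >= min_cases].copy()
    df["cfr"] = df["deaths"] / df["cases"]
    # Sort by date so that "last" is the most recent value in each state.
    grouped = df.sort_values("date").groupby("state")["cfr"]
    return pd.DataFrame({
        "min_cfr": grouped.min(),
        "max_cfr": grouped.max(),
        "current_cfr": grouped.last(),
    })


# Toy cumulative data (made-up numbers, for illustration only).
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03"] * 2),
    "state": ["State A"] * 3 + ["State B"] * 3,
    "cases": [100, 200, 400, 0, 50, 100],
    "deaths": [1, 6, 8, 0, 1, 5],
})

summary = cfr_summary(toy)
print(summary)
```

Here the current CFR is simply cumulative deaths divided by cumulative cases on the latest date. It ignores the lag between diagnosis and death, so it tends to understate the eventual CFR while cases are still growing quickly.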

Which states have the highest and lowest number of deaths per capita?

The viral reproductive number, $R_t$, through time

Using these data, we can calculate the viral reproductive number through time. If $R_t < 1$, each infected person infects fewer than one other person on average and the outbreak is dying out, whereas if $R_t > 1$, the outbreak is growing.

I performed this analysis using the reported cases, but I also performed a second analysis using reported deaths. An $R_t$ based on deaths is not meaningful quantitatively. However, once $R_t < 1$ has held for a sufficient amount of time, we should see both curves dip below 1.
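To make the idea concrete, here is one simple way to get a rough $R_t$ curve from daily counts. It is not necessarily the method used in this notebook. The sketch smooths the daily counts, takes the daily growth rate $r_t$ as the slope of the log of the smoothed counts, and converts it with $R_t \approx e^{r_t T}$, which holds when every transmission happens exactly $T$ days after infection. The function name `rt_from_growth_rate`, the generation interval of 5 days, and the smoothing width are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d


def rt_from_growth_rate(daily_new, generation_days=5.0, sigma_days=2.0):
    """Crude R_t estimate from the growth rate of smoothed daily counts.

    Assumes a fixed generation interval (generation_days), so that
    R_t = exp(r_t * generation_days). Both defaults are assumed values.
    """
    daily_new = np.asarray(daily_new, dtype=float)
    smooth = gaussian_filter1d(daily_new, sigma=sigma_days, mode="nearest")
    # Clip at a tiny positive value so the log stays finite on zero counts.
    log_incidence = np.log(np.clip(smooth, 1e-9, None))
    growth_rate = np.gradient(log_incidence)  # per-day slope of log counts
    return np.exp(growth_rate * generation_days)


# Synthetic check: 10% daily exponential growth, then 10% daily decay.
days = np.arange(30)
rt_growing = rt_from_growth_rate(10.0 * np.exp(0.1 * days))
rt_shrinking = rt_from_growth_rate(1000.0 * np.exp(-0.1 * days))
print(np.round(rt_growing[10:20], 3))
print(np.round(rt_shrinking[10:20], 3))
```

The values near the two ends of the series are distorted by the edge handling of the smoother, so only the interior of the curve should be read. Applying the same function to daily deaths gives the death-based curve, shifted later in time by the delay between infection and death.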

Here are the curves for four states:

Which states have the best or worst $R_t$?