Wavelets, Delta and COVID-19: an exercise in time series analysis

In this Jupyter notebook, I will try to estimate how much more infectious Delta is than the 'original' COVID-19 strain. Here, I define 'original' as the ensemble of viruses that spread from 2020 through June 2021.

The approach will be as follows:

  1. Download the NYT database, and normalize the COVID cases to each state's population.
  2. For each state, find the new cases reported each day. These data oscillate wildly and need to be smoothed. Though most analyses these days take a rolling average, another approach is to use wavelets to identify the sources of variation and smooth them out. I used the pywt package to do this.
  3. Calculate a poor man's $R_{eff}$ by dividing the cases reported in week $t$ by the cases reported in week $t-1$.
  4. Cluster the states according to their $R_{eff}$ behavior by finding the k-nearest neighbors for each state, then doing Leiden clustering.
  5. For each cluster, identify the peak $R_{eff}$ for the original strain (before July 2021) and the peak $R_{eff}$ for Delta (after July 2021), then compare and contrast.

Smoothed case estimates
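
The wavelet smoothing in step 2 can be sketched roughly as follows. This is a minimal illustration rather than the exact code behind the plots: the function name `wavelet_smooth`, the `db4` wavelet, and the decomposition level of 3 are all assumptions of mine. The idea is to decompose the daily series, throw away the fine-scale detail coefficients (which is where the weekly reporting cycle lives), and reconstruct.

```python
import numpy as np
import pywt


def wavelet_smooth(daily_cases, wavelet="db4", level=3):
    """Smooth a daily case series by discarding fine-scale wavelet details.

    The wavelet family and level are illustrative choices, not tuned values.
    """
    x = np.asarray(daily_cases, dtype=float)
    # Multi-level decomposition: [approximation, coarsest detail, ..., finest detail]
    coeffs = pywt.wavedec(x, wavelet, level=level)
    # Keep only the coarse approximation; zero every detail band
    kept = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
    smoothed = pywt.waverec(kept, wavelet)
    # waverec can return one extra sample for odd-length input, so trim;
    # case counts cannot be negative, so clip at zero
    return np.clip(smoothed[: len(x)], 0, None)
```

With level 3, the zeroed bands cover variation on scales of roughly 2 to 8 days, so the 7-day reporting rhythm is suppressed while the slower epidemic waves survive.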

Smoothed Death Estimates

Rough $R_{eff}$ estimates
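
The poor man's $R_{eff}$ from step 3 is just a week-over-week ratio. A minimal sketch, assuming a plain list of (smoothed) daily counts for one state; the function name `weekly_ratio_reff` is hypothetical. Note that this is a growth ratio, not a proper reproduction number, since it ignores the serial interval of the virus.

```python
def weekly_ratio_reff(daily_cases, window=7):
    """Ratio of cases in the latest `window` days to cases in the window before it.

    Returns one value per day, starting once two full windows are available.
    """
    ratios = []
    for day in range(2 * window, len(daily_cases) + 1):
        this_week = sum(daily_cases[day - window:day])
        last_week = sum(daily_cases[day - 2 * window:day - window])
        # Avoid dividing by zero when no cases were reported the previous week
        ratios.append(this_week / last_week if last_week > 0 else float("nan"))
    return ratios


# Cases doubling from one week to the next give a ratio of 2
print(weekly_ratio_reff([1] * 7 + [2] * 7))
```

Values above 1 mean cases are growing week over week; values below 1 mean they are shrinking.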

All states, plotted jointly. Color indicates whether the virus is actively spreading or not.

$R_{eff}$ estimates, clustered

Maximal $R_{eff}$ for Delta and the 'original' COVID-19

How much worse is Delta than the 'original' COVID-19?

We can compare the raw $R_{eff}$ numbers:

However, this would be naive. Let's pretend that $R_{eff} = R_0 \cdot S \cdot X$, where $R_0$ is the viral reproductive number at the beginning of the pandemic, in the absence of all social distancing measures; $S$ is the fraction of the total population susceptible to the virus; and $X$ represents the dampening effect of governmental policies on viral transmission. Clearly, $S \sim 1$ when the first COVID strains swept the US. By July 2021, however, most of the US had been immunized through either vaccination or infection. It is still unclear just how protective natural immunity is, and it appears that with a 2-dose mRNA regimen (absent booster doses), vaccine efficacy against Delta may not be ideal. The US has measured about 40 million cases, and COVID cases are likely under-diagnosed, so 40 million is a lower bound on the true number of infections. At most, everyone may have had COVID once already. Taking the geometric mean of these two bounds suggests that ~30-40% of the US has already been infected. Around July 2021, something close to 40% of all Americans had also been vaccinated.
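
The geometric-mean estimate is quick to check. In this sketch, the US population of roughly 330 million is my own round-number assumption; the 40 million confirmed cases comes from the text above.

```python
import math

US_POPULATION = 330e6    # assumption: approximate US population
CONFIRMED_CASES = 40e6   # lower bound on infections, from the text

# Lower bound: only confirmed cases. Upper bound: everyone infected once.
estimated_infections = math.sqrt(CONFIRMED_CASES * US_POPULATION)
infected_fraction = estimated_infections / US_POPULATION

print(f"Estimated infections: {estimated_infections / 1e6:.0f} million")
print(f"Fraction of population: {infected_fraction:.0%}")
```

Under these inputs the estimate lands at roughly 115 million infections, or about 35% of the population, in the middle of the ~30-40% range.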

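If these two groups did not overlap at all, then up to 80% of the US population was immunized before Delta came around. Some overlap is certain, though: if infection and vaccination were statistically independent, the combined figure would be closer to 60%. Let's assume a number in that lower range, maybe 50-60%.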
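
How the two numbers combine depends on how much the infected and vaccinated groups overlap. The sketch below brackets the answer under three assumptions (no overlap, statistical independence, full overlap), using 35% infected and 40% vaccinated as illustrative inputs taken from the estimates above. The function names are hypothetical.

```python
def immunized_if_disjoint(infected, vaccinated):
    """Upper bound: nobody is both infected and vaccinated."""
    return min(1.0, infected + vaccinated)


def immunized_if_independent(infected, vaccinated):
    """Infection and vaccination status are statistically independent."""
    return 1.0 - (1.0 - infected) * (1.0 - vaccinated)


def immunized_if_nested(infected, vaccinated):
    """Lower bound: the smaller group lies entirely inside the larger one."""
    return max(infected, vaccinated)


infected, vaccinated = 0.35, 0.40
print(f"No overlap:   {immunized_if_disjoint(infected, vaccinated):.0%}")
print(f"Independent:  {immunized_if_independent(infected, vaccinated):.0%}")
print(f"Full overlap: {immunized_if_nested(infected, vaccinated):.0%}")
```

With these inputs the bounds are 75%, 61% and 40% respectively, so a working figure of 50-60% sits between the independent and fully overlapping cases.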

In this case, rather than compare $R_{eff}$, we really should compare $R_0 \cdot X = R_{eff} / S$, assuming that governmental policies have not changed between the two timepoints we are comparing. Assuming the susceptible population was 100% for the original COVID-19 strain and 50% for Delta, then:
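
The susceptibility correction is a one-line division. In the sketch below the two peak $R_{eff}$ values are hypothetical placeholders, not the peaks measured in this notebook; I set them equal only to show what the correction does when the raw peaks are similar. The function name is also my own.

```python
def intervention_adjusted_r(reff, susceptible_fraction):
    """Return R_0 * X = R_eff / S for a given susceptible fraction S."""
    if not 0 < susceptible_fraction <= 1:
        raise ValueError("susceptible_fraction must be in (0, 1]")
    return reff / susceptible_fraction


# Hypothetical placeholder peaks (NOT the notebook's measured values)
peak_reff_original = 1.5
peak_reff_delta = 1.5

adjusted_original = intervention_adjusted_r(peak_reff_original, 1.0)  # S = 100%
adjusted_delta = intervention_adjusted_r(peak_reff_delta, 0.5)        # S = 50%
ratio = adjusted_delta / adjusted_original

print(f"Delta / original, after correcting for susceptibility: {ratio:.1f}x")
```

Whatever the raw peaks are, halving the susceptible pool doubles the implied $R_0 \cdot X$ for Delta relative to the uncorrected comparison.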

Oh wow. If this is true, then the $R_0$ for Delta is about twice that of the original SARS-CoV-2 virus that first landed in the US. That seems like a lot.

Finally, we know that governmental and social interventions cut the transmission of the original SARS-CoV-2 by about 50-70% (the original virus is estimated to have a true $R_0$ of ~3, and interventions brought $R_{eff} \sim 1$). Therefore, $R_0$ for Delta in the absence of governmental interventions would be...:
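
Plugging in the round numbers from the text gives a back-of-the-envelope answer. This is a sketch under those stated assumptions only: an original $R_0$ of ~3, interventions bringing it down to ~1, and Delta being about twice as transmissible under the same interventions.

```python
ORIGINAL_R0 = 3.0                    # from the text: unmitigated original strain
ORIGINAL_R_WITH_INTERVENTIONS = 1.0  # from the text: after interventions
DELTA_TO_ORIGINAL_RATIO = 2.0        # from the text: "about twice"

# X: the fraction of transmission that survives the interventions
intervention_factor = ORIGINAL_R_WITH_INTERVENTIONS / ORIGINAL_R0

# Delta under the same interventions, then with the interventions removed
delta_r_with_interventions = DELTA_TO_ORIGINAL_RATIO * ORIGINAL_R_WITH_INTERVENTIONS
delta_r0 = delta_r_with_interventions / intervention_factor

print(f"Interventions remove about {1 - intervention_factor:.0%} of transmission")
print(f"Implied unmitigated R_0 for Delta: about {delta_r0:.1f}")
```

Under these assumptions the implied unmitigated $R_0$ for Delta comes out around 6, which is simply 2 times the original ~3.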