In the previous report linked to above, a very simple metric for determining the virulence of a virus was shown: simply compare age-specific death rates immediately before and after the event. But while that “spot-it-with-the-naked-eye” analysis is already convincing enough, critics and detractors demand “statistical proof.”

To really and truly know if a value cannot be explained by the common causes of variation — i.e., must be explained by a special event or a special exposure — you need to measure the variation in the background. A really good method was invented by Hampel and is called the Hampel Identifier. You first find a rolling median:

When values are put in order, from lowest to highest, the median is the middle value. Half of the values are below the median and the other half are above. As you can see, the average death rate (yearly deaths per 1,000) for those of age 45 in Sweden has come way down from what it was back in the 1700s. Notice the offset in years used.

While Hampel uses the median of a set number of years before a target year, along with a set number of years after it, validity can be improved by giving more weight to prior years. This establishes a better baseline and diminishes the weight of anomalous after-effects (e.g., pull-forward) of an event that disturbed the expected mortality rate.
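The offset window described above can be sketched in a few lines of Python. This is only an illustration: the function name `rolling_median`, the parameters `before` and `after`, the choice to include the target year itself in the window, and the sample rates are all assumptions, since the window sizes actually used are not stated here.

```python
from statistics import median

def rolling_median(values, before=7, after=3):
    """Median of an offset window around each point.

    The window holds up to `before` earlier points, the point itself,
    and up to `after` later points (assumed sizes, not the article's).
    Windows are simply truncated at the ends of the series.
    """
    result = []
    for i in range(len(values)):
        lo = max(0, i - before)
        hi = min(len(values), i + after + 1)
        result.append(median(values[lo:hi]))
    return result

# Made-up yearly death rates with one spike at index 4.
rates = [10, 11, 10, 12, 30, 11, 10, 9, 10]
baseline = rolling_median(rates, before=3, after=1)
print(baseline[4])  # 11: the spike barely moves the median
```

Taking more years from before the target than from after it is what keeps an event's aftermath from leaking into its own baseline.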

The next step is to evaluate each year's death rate against its own median rate. To standardize, the recent variation in the rate is measured with a Median Absolute Deviation (MAD), the raw difference between the rate and its median is divided by that measure of variation, and the result is compared against a critical reference value (orange):

Three years stand out in Sweden’s history of death (1773, 1789, and 1809) as being so different from neighboring years that they cannot be explained by the common causes of variation in death. For those 3 years, in order to explain the extra death, we have to invoke a special cause, such as a war or a famine. The year of 1853 was borderline.
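For readers who want to try the standardizing step themselves, here is a minimal sketch of it, not the exact procedure behind the chart. It assumes the MAD is taken over the same offset window as the median; the name `hampel_scores`, the window parameters, and the sample data are hypothetical.

```python
from math import inf
from statistics import median

def hampel_scores(values, before=7, after=3):
    """Distance of each point from its window median, in MAD units."""
    scores = []
    for i, x in enumerate(values):
        lo = max(0, i - before)
        hi = min(len(values), i + after + 1)
        window = values[lo:hi]
        med = median(window)
        # Median Absolute Deviation: the median distance from the median.
        mad = median([abs(v - med) for v in window])
        if mad == 0:
            # No variation in the window: any departure is infinitely unusual.
            scores.append(0.0 if x == med else inf)
        else:
            scores.append((x - med) / mad)
    return scores

# Made-up yearly death rates with one spike at index 4.
rates = [10, 11, 10, 12, 30, 11, 10, 9, 10]
scores = hampel_scores(rates, before=3, after=1)
print(scores[4])  # 19.0: the spike sits 19 MADs above its median
```

A year is flagged when the absolute value of its score exceeds the chosen critical reference value.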

As the subtitle shows, it was pre-determined that 5 standard deviations proves the existence of an outlier, using Chebychev’s theorem, which proves that the proportion of any data set (literally, any data set!) within a certain number of standard deviations of the mean is always (yes, always) going to be at least:

\(1-\frac{1}{k^2}\)

where k represents the number of standard deviations away from the mean. Notice how, when k = 5, the denominator becomes (5 * 5 =) 25, and the proportion within 5 standard deviations becomes at least 0.96 (96% of any data set), leaving at most 0.04 (at most 4%) of the data set lying beyond 5 standard deviations.
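The bound is easy to check on any list of numbers. The sketch below uses a deliberately lopsided, made-up data set (99 zeros and a single 100) and the population standard deviation; the function name `fraction_beyond` is an assumption for illustration.

```python
from statistics import mean, pstdev

def fraction_beyond(data, k):
    """Share of points lying more than k standard deviations from the mean."""
    m = mean(data)
    s = pstdev(data)  # population standard deviation of the data set itself
    return sum(abs(x - m) > k * s for x in data) / len(data)

# A very skewed, made-up data set: 99 zeros and one value of 100.
data = [0] * 99 + [100]
k = 5
print(fraction_beyond(data, k))  # 0.01: only the single extreme value
print(1 / k**2)                  # 0.04: the ceiling the theorem guarantees
```

Even with a shape this extreme, the share beyond 5 standard deviations stays under the 4% ceiling.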

From 268 years of death in Sweden, only 3 or 4 years were anomalies when using a 5-standard-deviation cut-off, i.e., only about 1% to 1.5% of the years observed. Because of Chebychev’s theorem, a share below 4% of the data was already to be expected (the theorem applies to any data set viewed).

With the more liberal cut-off value of 3 standard deviations to identify outliers/anomalies, the critical reference value for Hampel becomes 4.45 (roughly midway between the horizontal line for a standardized residual of 3 and the horizontal line for a standardized residual of 6). Even then, death during COVID was not unusual.
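One way to arrive at 4.45 is to convert standard deviations into MAD units with the usual normal-consistency constant of about 1.4826. That constant is an assumption here, since the derivation is not spelled out above, but it does reproduce the number. The names `MAD_TO_SD` and `mad_threshold` are hypothetical.

```python
from statistics import NormalDist

# For normally distributed data, one standard deviation is about
# 1.4826 MADs. The constant is 1 divided by the 75th-percentile
# z-score of the standard normal distribution.
MAD_TO_SD = 1 / NormalDist().inv_cdf(0.75)

def mad_threshold(k_sd):
    """Cut-off in MAD units that corresponds to k_sd standard deviations."""
    return k_sd * MAD_TO_SD

print(round(mad_threshold(3), 2))  # 4.45
print(round(mad_threshold(5), 2))  # 7.41
```

Under the same assumption, the stricter 5-standard-deviation cut-off corresponds to about 7.41 MADs.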

