Pausing 'COVID on Trial' Series
A Probability Primer, because the conclusions are getting "thick" now
In Part 2 of the COVID on Trial series, I showed that even when COVID was at its very worst (the Alpha variant), its IFR was still no more than twice that of severe flu.
Because dozens and dozens of articles in peer-reviewed medical journals claim or assume that COVID was, at least for a time, over twice as bad as severe flu, it is important for me to explain how I arrived at a conclusion that is so different from so much existing research.
Otherwise, you are left believing whomever you were inclined to believe in the first place:
—dozens upon dozens of highly-trained, professional researchers with Ph.D.s, or
—some random guy with a funny-looking avatar and a propensity for making Excel graphs
It’s actually pretty important to discover whether COVID was ever more than twice as bad as severe flu, because there have been more than a million excess deaths in the US, and a toll like that requires explaining. Had COVID stayed at exactly twice as bad as flu, about 100,000 people in total would have died.
The possible manners of death are:
1) natural causes (such as diseases)
2) accidents
3) suicide
4) homicide
Knowing what proportion of those “million excess deaths in the US” falls into each category is vital. I’ll first address how you can find the limits of plausibility — the boundary on what can possibly be true of the world — and then give detailed instructions on how to use a computer to determine those natural limits yourself.
How to find the limits of what can be true of the world
If a man in a trenchcoat told you that a coin was a fair coin, one that comes up heads half the time when flipped, and the coin then came up heads 60% of the time during a trial of 100 coin flips: could you know that the coin was a fake and that the man in the trenchcoat was lying?
The answer is: it depends.
What it depends on is the sample size or number of trials you put it through. Any fair coin very easily comes up heads 60% of the time when you only flip it 10 times. The information contained in the results of “10 total coin flips; 6 of them heads” is essentially useless to you.
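You do not even need a simulation to check this; base R can compute the exact binomial probability that a fair coin shows 60% or more heads in 10 flips (a quick side calculation, not part of the main code later in this post):

```r
# Exact probability that a fair coin (P(heads) = 0.5) shows
# 6 or more heads in 10 flips
sum(dbinom(6:10, size = 10, prob = 0.5))
# roughly 0.377 -- it happens more than a third of the time
```

More than one time in three, a fair coin hands you a “60% heads” result at this sample size, which is why those 10 flips tell you essentially nothing.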
Even 100 coin flips is not enough information to be certain that the man in the scary coat is lying. Here are the results of a computer simulation using the statistical software R:
You can see that the computer assigned random values to a probability inherent to the coin — the probability of heads, or P(heads).
About 99,000 unique coin probabilities were found to lead to the observed result of 60 heads out of 100 total coin flips, with a mean value of approximately 0.6, just as would be expected.
A 99% credible interval covering the middle 99% of all the probability reveals that the smallest P(heads) consistent with the observed 60 heads is 0.47 — a coin that is actually biased slightly away from heads and toward coming up tails most of the time.
There was some detectable probability below P(heads)=0.5, meaning that a fair coin cannot be ruled out here. But let’s increase the total coin flips to 100,000 (100k) and see what happens to the realm of possibility on what can be true of the coin …
Look at how fast the probability drops off once the P(heads) values fall below 0.60. This time, 1,048 of the random values on P(heads) led to the “60,000 heads out of 100,000 coin flips” outcome, but not one value with P(heads) = 0.55 or below EVER led to 60,000 heads.
99 million random values were run through the simulation this time, and it is reasonable to think that about 55% of them were 0.55 or below. That’s over 50 million values without even one single success (where 60,000 heads resulted from 100,000 coin tosses).
Things that do not happen, even when attempted 50 million times in a row, are things which may never happen — because they might be impossible. There was no detectable “positive (nonzero) probability” associated with a fair coin here — where P(heads) = 0.5, but still … somehow … 60,000 heads were seen.
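For the curious, base R can put a number on just how small that undetectable probability is. The exact chance that a truly fair coin produces 60,000 or more heads in 100,000 flips is too small to represent in ordinary floating point, so this side calculation (mine, not part of the simulation above) asks for it on the log scale:

```r
# Natural-log probability that a fair coin gives >= 60,000 heads
# in 100,000 flips (the upper tail of a binomial distribution)
log_p <- pbinom(59999, size = 100000, prob = 0.5,
                lower.tail = FALSE, log.p = TRUE)
log_p
# On the ordinary scale the probability underflows to zero:
exp(log_p)
# returns 0 -- the true value is smaller than 10^-800
```

No simulation could ever stumble onto an event that rare, which is exactly why 50 million attempts produced zero successes.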
But theoreticians and academics may call for more data before determining that it is physically impossible to obtain 60,000 heads in 100,000 coin flips from a fair coin.
But do we REALLY need more data to prove the man in the trenchcoat lied?
Spoiler:
When it is more than a million times more likely that someone is lying than telling the truth, then you have “enough” data. Most decisions humans make are made on much less certainty than that.
For all practical purposes, the man was lying, and more data than 100,000 coin flips is NOT required. Asking for more data at this point is to engage in semantics divorced from the logic of human decision-making, where everyday tradeoffs are routinely made on outcomes far more probable than these.
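That “more than a million times more likely” comparison can be sketched as a likelihood ratio. The snippet below compares just two point hypotheses of my own choosing — a fair coin (P(heads) = 0.5) versus a coin matching the observed rate (P(heads) = 0.6) — rather than a full Bayesian model comparison:

```r
# Log-likelihood of the observed 60,000 heads under each hypothesis
ll_biased <- dbinom(60000, size = 100000, prob = 0.6, log = TRUE)
ll_fair   <- dbinom(60000, size = 100000, prob = 0.5, log = TRUE)
# Likelihood ratio on the log10 scale: how many orders of magnitude
# more likely the data are under the biased coin than the fair one
(ll_biased - ll_fair) / log(10)
# about 874 -- hundreds of orders of magnitude beyond "a million times"
```

“A million times more likely” is only 6 orders of magnitude; the 100,000-flip data beat that threshold by a factor too large to write down comfortably.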
When not to believe claims
If a claim is less likely to be true than you are to get hit by lightning — not just once, but twice — in the next 12 months, then don’t believe the person making it. If someone claimed that COVID caused a million deaths, then you’d want to know whether it can possibly be true, or whether — for all practical purposes — it cannot even possibly be true.
How you can discover the limits of plausibility on your device
I mentioned before that, unlike the Imperial College London researchers, I’m willing to share all my code — unredacted. Here is the R code which gets you the first results: a coin is flipped 100 times, comes up heads 60 times, and you want to discover the total number of possible “realities” which could ever lead to the observed data:
# Bayesian Inference to Determine if a Coin is Fair, given 60% heads
# Define and draw 10 million times from a uniform prior distribution
set.seed(123)
n_draw <- 10000000
prior_dist_100_flips <- runif(n_draw, 0, 1)
# Define the generative model as a binomial experiment
gen_model <- function(propHeads) {
heads <- rbinom(1, size = 100, prob = propHeads)
heads
}
# Simulate the data
heads <- rep(NA, n_draw)
for(i in 1:n_draw) {
heads[i] <- gen_model(prior_dist_100_flips[i])
}
# Filter out those parameter values that didn't result in data actually observed
post_dist_100_flips <- prior_dist_100_flips[heads == 60]
# Count the samples that led to the data observed (60 heads from 100 coin flips)
length(post_dist_100_flips)
# Average of the posterior distribution
mean(post_dist_100_flips)
# CI99 (0.5th %-ile, 99.5th %-ile)
quantile(post_dist_100_flips, c(0.005, 0.995))
# create empirical cumulative distribution function and assign it to "Fn"
Fn <- ecdf(post_dist_100_flips)
# get eCDF up to P(heads) = 0.50
Fn(0.5)
# Plot and summarize the posterior distribution
hist(post_dist_100_flips, breaks=seq(0.355, 0.815, 0.01), xlim = c(0.355, 0.815))
# bold vertical dashed line at P(heads) = 0.50
abline(v=0.5, lty=5)
You can download R for free and run it on your system. If you copy and paste the code above, you will get the results I showed (the random number seed guarantees it).
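As a cross-check that needs no simulation at all: with a uniform prior, the posterior for 60 heads in 100 flips has a known closed form, the Beta(61, 41) distribution (heads + 1 and tails + 1 as the shape parameters; this conjugate-prior shortcut is standard Bayesian statistics, not something specific to my code). Its exact quantiles should land very close to the simulated ones:

```r
# Exact posterior under a uniform prior: Beta(heads + 1, tails + 1)
qbeta(c(0.005, 0.995), shape1 = 61, shape2 = 41)  # exact 99% credible interval
pbeta(0.5, shape1 = 61, shape2 = 41)              # exact P(heads <= 0.5)
```

The lower end of the interval comes out near 0.47, matching the simulated histogram, and only about 2% of the posterior probability sits at or below a perfectly fair coin.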