There are two main schools of thought in statistics: the Frequentists and the Bayesians. Frequentists are “bean-counters” who count up the relative frequencies of things over a long course of time. But since about 1933, they have also been the disciples of NHST: Null Hypothesis Significance Testing.
In NHST, you form a null hypothesis, such as that the mean reaction time of sober people is the same as that of people who are drunk. Then you do a reaction time test. If there were no true difference in reaction times and you kept repeating the experiment, you would still see some “allowable” differences between the groups, and those are expected to be small.
But if you get a difference so big that it would be very rare to see when the null hypothesis is true, you go back and reject the null hypothesis and boldly declare:
There is a difference in reaction times when people are drunk
In particular, it takes drunk people longer to respond — and this is one of the reasons against the practice of “drunk driving” because it can lead to otherwise-preventable accidents. But Bayesians ask a different question. They do not ask about the probability of witnessing the evidence in light of an assumed null hypothesis.
They ask about the probability of the hypothesis being the truth, given the evidence that you saw. Sometimes they come to opposite conclusions. Using an example from this website, I recreated the results of a 1-million-flip coin toss experiment:
At right is a beta distribution, which is a “probability of probabilities”: a distribution over the possible values of the coin's probability of heads. In the case of the coin, the expected outcome will be close to Pr(heads) = 0.5. That’s because the coin is assumed to be fair under NHST. When the coin is fair, nearly all results fall between the dotted lines. In this case, though, the result did not.
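To make the dotted lines concrete, here is a minimal sketch of where they would sit. It assumes the lines mark the usual two-sided 5% range for a fair coin (the significance level is my assumption, not stated with the plot) and uses the normal approximation to the binomial. The variable names are only for illustration.

```python
import math
from statistics import NormalDist

n_flips = 1_000_000
p_null = 0.5    # fair coin under the null hypothesis
alpha = 0.05    # assumed two-sided significance level

# Standard error of the proportion of heads if the coin is fair
se_null = math.sqrt(p_null * (1 - p_null) / n_flips)

# Critical z value for a two-sided test at level alpha
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Range of "allowable" proportions of heads for a fair coin
lower_edge = p_null - z_crit * se_null
upper_edge = p_null + z_crit * se_null

print(f"fair-coin range: {lower_edge:.5f} to {upper_edge:.5f}")
```

Under these assumptions the range is roughly 0.4990 to 0.5010, so an observed proportion of 0.4988 lands just outside it.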
At 498,800 heads from a million coin flips, NHST declares the coin biased. But what do the Bayesians say? The Bayesians look at the posterior distribution of coin probabilities (the blue curve), a distribution which has been narrowed because of witnessing so much evidence.
At bottom left there is a “CDF,” which gives back the “total probability of being below __,” and we find a 0.67 probability that the coin's probability of heads is below the lower edge of NHST’s significance threshold. But that means there is a 0.33 chance it is above that edge, inside the usual range of variation in outcomes that are expected from fair coins!
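The 0.67 and 0.33 figures can be roughly reproduced with a short sketch. Two things here are my assumptions rather than something shown on the plot: a uniform Beta(1, 1) prior, and a 5% two-sided threshold for the fair-coin range. To stay within the standard library, the Beta posterior is approximated by a normal distribution with the same mean and standard deviation, which should be very close with a million flips; an exact Beta CDF from a statistics package could be swapped in.

```python
import math
from statistics import NormalDist

n_flips = 1_000_000
heads = 498_800
tails = n_flips - heads

# Assumption: uniform Beta(1, 1) prior, so the posterior is Beta(a, b)
a = heads + 1
b = tails + 1

# Mean and standard deviation of the Beta(a, b) posterior
post_mean = a / (a + b)
post_sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Lower edge of the fair-coin range (assumed 5% two-sided threshold)
se_null = math.sqrt(0.25 / n_flips)
lower_edge = 0.5 - NormalDist().inv_cdf(0.975) * se_null

# Normal approximation to the posterior, used in place of the exact Beta CDF
posterior = NormalDist(mu=post_mean, sigma=post_sd)
prob_below = posterior.cdf(lower_edge)   # posterior mass below the threshold
prob_above = 1 - prob_below              # posterior mass inside the fair-coin range

print(f"P(below threshold) = {prob_below:.2f}")
print(f"P(above threshold) = {prob_above:.2f}")
```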
In other words, Bayesians DO NOT reject the null hypothesis, even though NHST advocates do. NHST rejects it because a fair coin would have had such a low probability of producing a result outside the dotted lines. Although the Z score for the result under NHST is -2.4, with a two-tailed p-value of 0.016, the Bayesian analysis says we cannot reject the null.
The posterior distribution of coin probabilities overlaps the range of expected values under the null hypothesis by a huge margin of 0.33! In this comparison, the Bayesians “win” because they made an inference that is more correct than that of the NHST followers.
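For anyone who wants to check the NHST side, the Z score and two-tailed p-value quoted above can be recomputed with a simple one-sample z-test on the number of heads. This is a sketch using the normal approximation to the binomial, not necessarily the exact procedure behind the original example.

```python
import math
from statistics import NormalDist

n_flips = 1_000_000
heads = 498_800
p_null = 0.5  # fair coin under the null hypothesis

# Expected heads and standard deviation of the count if the coin is fair
expected_heads = n_flips * p_null
sd_heads = math.sqrt(n_flips * p_null * (1 - p_null))

# Z score: how many standard deviations the observed count is from expected
z_score = (heads - expected_heads) / sd_heads

# Two-tailed p-value: chance of a result at least this extreme in either direction
p_value = 2 * NormalDist().cdf(-abs(z_score))

print(f"z = {z_score:.1f}, two-tailed p = {p_value:.3f}")
```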