Word Forensics
Post #1504
With the release of a large tranche of the Epstein Files, interest has revived in the “conspiracy theory” of PizzaGate, which alleges that elites such as John Podesta use the very same online code that self-proclaimed pedophiles use on 4chan and 8chan: a code that substitutes pizza references for sexual subjects.
But proper analytics can help. This post only sets the stage for such analysis. It is easy to see when the PizzaGate scandal first took off:
The big jump after Nov 2015 marks the PizzaGate scandal taking off on what was then called Twitter. In written English, the word “pizza” is expected to make up about 0.0006% of words (6 words per million):
While the mean rate at which “pizza” shows up is 6 per million words, the standard deviation of that rate might turn out to be somewhere around 50% of the mean:
This means that a rate 50% larger than the mean (9 words per million) sits 1 standard deviation above it. If analysis finds that “pizza” makes up more than 0.0015% of written words (15 words per million, or 3 standard deviations above the mean), then that would suggest possible nefarious use of the word.
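The threshold arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `critical_rate` is made up for this sketch, and the 50% figure for the standard deviation is the post's working assumption, not a measured value.

```python
def critical_rate(mean_per_million: float,
                  sd_fraction: float = 0.5,
                  k: float = 3.0) -> float:
    """Rate (per million words) that lies k standard deviations above the mean.

    sd_fraction is the assumed ratio of the standard deviation to the mean
    (0.5 here, following the post's assumption).
    """
    sd = mean_per_million * sd_fraction
    return mean_per_million + k * sd


# Baseline for written English: 6 occurrences of "pizza" per million words.
one_sd = critical_rate(6.0, k=1.0)    # 1 SD above the mean
three_sd = critical_rate(6.0, k=3.0)  # 3 SD above the mean

print(f"1 SD above the mean: {one_sd} per million")
print(f"3 SD above the mean: {three_sd} per million")
```

With a mean of 6 and an assumed SD of 3, this gives 9 and 15 words per million, matching the figures in the text.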
The benchmark for word frequency is the most-used word in the English language:
The word “the” comprises ~3.6% of all written words (about 1 in every 28). However, “the” appears to be used more often in spoken English (linguistics) than in written English (literature):
The word “the” was used about 1.7 times as often in speech as in writing (6.1% vs. 3.6%, and 6.1 / 3.6 ≈ 1.7), so a sensitivity analysis can be performed by treating the Epstein emails as if they were verbal dialogue and applying a correction factor of 1.7x to get the expected rate of “pizza” in verbal dialogue with someone.
That rate, after correction for use in verbal dialogue, is (1.7 × 6 ≈) 10 words per million. The new critical threshold at 3 standard deviations above the mean, assuming 1 SD = 50% of the mean, becomes (10 + 3 × 5 =) 25 words per million. If the Epstein emails contain “pizza” at a rate higher than 25 words per million, then that would be suspect.
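The sensitivity analysis can be reproduced as a short Python sketch. The constant names are invented for illustration, the inputs are the figures quoted in this post, and the 50% standard deviation is again an assumption. The sketch keeps the correction factor unrounded, so its results differ slightly from the rounded 10 and 25 used in the text.

```python
# Figures quoted in the post (all variable names here are illustrative).
SPOKEN_THE_PCT = 6.1             # share of "the" in spoken English, percent
WRITTEN_THE_PCT = 3.6            # share of "the" in written English, percent
WRITTEN_PIZZA_PER_MILLION = 6.0  # baseline rate of "pizza" in written English

# Correction factor: how much more often "the" appears in speech.
correction = SPOKEN_THE_PCT / WRITTEN_THE_PCT

# Expected "pizza" rate if the emails are treated as verbal dialogue.
spoken_mean = WRITTEN_PIZZA_PER_MILLION * correction

# Assumption from the post: 1 SD = 50% of the mean.
spoken_sd = 0.5 * spoken_mean
spoken_threshold = spoken_mean + 3 * spoken_sd

print(f"correction factor: {correction:.2f}")
print(f"corrected mean:    {spoken_mean:.1f} per million")
print(f"3 SD threshold:    {spoken_threshold:.1f} per million")
```

Unrounded, the factor is about 1.69, the corrected mean about 10.2 per million, and the threshold about 25.4 per million, so rounding to 10 and 25 changes little.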
To be clear, saying “pizza” more than 25 times for every million words does not automatically make someone a pedophile, though it is statistically rare and, in the context of other data, should raise suspicion.
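As a rough sketch of how such a rate could be measured, the Python below counts a word in a body of text and scales the count to occurrences per million words. The tokenizer (lowercased runs of letters and apostrophes), the exact-match rule (so “pizzas” is not counted), and the helper names `rate_per_million` and `exceeds_threshold` are all assumptions made for illustration, not an established method.

```python
import re


def rate_per_million(text: str, word: str = "pizza") -> float:
    """Occurrences of `word` per million tokens in `text` (case-insensitive)."""
    # Assumed tokenizer: runs of letters and apostrophes, lowercased.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token == word)
    return hits * 1_000_000 / len(tokens)


def exceeds_threshold(text: str,
                      threshold_per_million: float = 25.0,
                      word: str = "pizza") -> bool:
    """True if the measured rate is above the chosen critical threshold."""
    return rate_per_million(text, word) > threshold_per_million


# Toy input only: 1 hit in 5 tokens scales to 200,000 per million.
sample = "We ordered pizza for lunch"
print(rate_per_million(sample))
print(exceeds_threshold(sample))
```

The toy sentence shows why corpus size matters: a single mention in a short text produces an enormous per-million rate, so the comparison against 25 per million only means something for a corpus large enough to approach a million words.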






