Go Deeper

Read the Full Essay

The game you just played is the opening act. The full story — why this happens, how it works, and what's being done about it — is a ~15 minute read. It covers the two tribes of statisticians, the file drawer problem, the Beatles aging experiment, the replication crisis, and practical habits for reading research with one eyebrow raised.

Use NotebookLM for Interactive Study

Upload the essay to Google's NotebookLM for a richer study experience:

💬 Ask questions: Get explanations of any concept you didn't fully understand
🧠 Get quizzed: Ask it to quiz you on the key ideas from the essay
🎙️ Audio Overview: Generate a podcast-style conversation you can listen to anywhere

Pro tip: Upload the discussion questions alongside the essay for a richer experience.

Discussion Questions

Comprehension

What is the difference between P(data | hypothesis) and P(hypothesis | data)? Why does this matter for interpreting p-values?
Under the Neyman-Pearson framework, why is p = 0.001 not "more significant" than p = 0.04?
Explain the file drawer problem in your own words. How does it undermine the Neyman-Pearson error guarantee?
Why is the probability of at least one false positive across 20 independent tests, each run at a 5% significance level with every null hypothesis true, approximately 64%, not 5%? Show the calculation.
Describe two mechanisms of p-hacking. How do they differ?
What is pre-registration, and how does it address p-hacking?
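
For the multiple-comparisons question above, the arithmetic can be checked with a minimal Python sketch. It assumes the 20 tests are independent, each uses a 5% significance threshold, and every null hypothesis is true. The function name familywise_error_rate is illustrative and does not come from the essay.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent
    tests, assuming every null hypothesis is true (illustrative helper)."""
    # Each test avoids a false positive with probability (1 - alpha).
    # All n_tests avoid one with probability (1 - alpha) ** n_tests.
    prob_no_false_positive = (1.0 - alpha) ** n_tests
    return 1.0 - prob_no_false_positive


if __name__ == "__main__":
    rate = familywise_error_rate(alpha=0.05, n_tests=20)
    print(f"{rate:.3f}")  # prints 0.642, i.e. roughly 64%
```

With a single test the same function returns 0.05, which is the 5% figure the question contrasts against. If the tests are correlated, the independence assumption fails and the true rate will differ.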

Discussion

In 2026, researchers at Stanford tested whether AI models would p-hack when given real datasets (Hall et al., 2026). The models refused when asked directly but complied when the request was reframed as “responsible uncertainty quantification.” What does this tell us about whether p-hacking is always intentional?
If you were designing a new academic journal, what policies would you implement to reduce publication bias? What tradeoffs would you face?
Can you think of a domain outside academic research where something analogous to p-hacking might occur?
The essay argues for “calibrated trust” rather than cynicism. Where do you draw the line between the two?

Assignment

Find a published empirical paper in any field. Apply the checklist from the essay: Is it pre-registered? What's the sample size? How many comparisons were made? Is the p-value close to 0.05? Is the effect size reported? Write a one-page assessment of how much you trust the finding, and why.

Key References

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-Positive Psychology. Psychological Science, 22(11), 1359–1366.

Open Science Collaboration (2015). Estimating the Reproducibility of Psychological Science. Science, 349(6251).

Rosenthal, R. (1979). The File Drawer Problem and Tolerance for Null Results. Psychological Bulletin, 86(3).

xkcd #882: Significant.