Third-Party Research & Methodology Only

This section shares summaries of third-party academic research and descriptions of quantitative models. The content represents the findings of the original researchers, not the opinions or recommendations of Foxholm Financial. Foxholm Financial does not publish hypothetical or backtested performance metrics on its quantitative research pages. All content is restricted to methodology, signal construction, factor logic, and risk architecture. SEC rules require that investment advisers not present misleading performance data, and our methodology-only approach reflects that standard and the firm's fiduciary obligations.

Bonferroni Correction

The Bonferroni correction is a statistical technique that raises the bar for calling a result "statistically significant" when running multiple tests at once. It is one of the simplest and most widely used safeguards against false discoveries in quantitative research.

When researchers test dozens or hundreds of potential investment factors, some will appear significant purely by chance. The Bonferroni correction addresses this problem by dividing the standard significance threshold by the number of tests performed. The result is a stricter cutoff that makes it harder for random noise to masquerade as a genuine finding.

Definition

The Bonferroni correction adjusts the significance level (the probability threshold below which a result is considered statistically significant) to account for the number of simultaneous comparisons. In statistics, a "significant" result typically means that, if there were no real effect, a result at least this extreme would turn up less than 5% of the time. That 5% threshold is called the significance level, often written as α (alpha).

Formula

Adjusted Significance Level = α ÷ n

Where α is the original significance level (commonly 0.05, meaning 5%) and n is the number of tests being performed. Each individual test must meet this stricter threshold to be considered significant.

For example, if the standard threshold is 5% and a researcher runs 20 tests, the Bonferroni-adjusted threshold becomes 0.05 ÷ 20 = 0.0025, or 0.25%. A result must now be one that pure chance would produce less than 0.25% of the time before it can be called significant.
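The arithmetic above can be sketched in a few lines of Python. The function name `bonferroni_threshold` is an illustrative choice, not part of any library.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold, alpha divided by n."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# The example from the text: a 5% overall level spread across 20 tests.
threshold = bonferroni_threshold(alpha=0.05, n_tests=20)
print(f"Adjusted threshold: {threshold:.4f} ({threshold:.2%})")
# prints: Adjusted threshold: 0.0025 (0.25%)
```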

How It Works

The logic behind the Bonferroni correction is straightforward. Imagine flipping a coin 100 times and looking for unusual streaks. With enough flips, some streaks will appear simply because of randomness, not because the coin is rigged. The same principle applies to testing investment factors.

Consider a research team testing 100 potential stock-selection factors. Even if none of them has any real predictive power, a 5% significance level would flag roughly 5 of them as significant by chance alone (100 × 0.05 = 5). Those 5 "discoveries" are false positives: they look real but reflect nothing more than statistical noise.

The Bonferroni correction handles this by lowering the threshold. With 100 tests, the adjusted significance level becomes 0.05 ÷ 100 = 0.0005, or 0.05%. Now a factor must clear a much higher bar before the researcher can claim it works. This dramatically reduces the odds that a false positive slips through.
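To put a number on "dramatically reduces," the sketch below computes the probability of at least one false positive across all 100 tests. It assumes the tests are independent and that no factor has a real effect; both are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
def family_wise_error_rate(per_test_alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n independent tests
    when no test has a real effect (an assumption of this sketch)."""
    return 1.0 - (1.0 - per_test_alpha) ** n_tests


n_tests = 100
alpha = 0.05

# Every test judged at the raw 5% level.
uncorrected = family_wise_error_rate(alpha, n_tests)

# Every test judged at the Bonferroni level of 0.05 / 100 = 0.0005.
corrected = family_wise_error_rate(alpha / n_tests, n_tests)

print(f"Uncorrected: {uncorrected:.1%}")
print(f"Bonferroni:  {corrected:.1%}")
```

Under these assumptions the uncorrected probability comes out above 99%, while the Bonferroni-adjusted probability stays just under the 5% target.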

Step-by-Step Process

Step 1: Set the overall significance level. This is the maximum acceptable probability of making even one false discovery across all tests. A common choice is 5% (α = 0.05).
Step 2: Count the number of tests. Determine how many separate hypotheses are being tested. In factor research, this is the number of candidate factors.
Step 3: Divide. Calculate the adjusted threshold: α ÷ n. Each individual test must produce a p-value (the probability of seeing a result at least this extreme if there were no real effect) below this threshold.
Step 4: Apply. Only results with p-values below the adjusted threshold are considered statistically significant.
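The four steps can be sketched as a single function. The factor names and p-values below are hypothetical, invented only to show the mechanics, and the strict "below" comparison follows the wording of Step 4.

```python
from typing import Dict, List


def bonferroni_significant(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Return the names of tests whose p-value falls below alpha / n."""
    n_tests = len(p_values)            # Step 2: count the tests
    if n_tests == 0:
        return []
    threshold = alpha / n_tests        # Step 3: divide
    # Step 4: keep only results strictly below the adjusted threshold
    return [name for name, p in p_values.items() if p < threshold]


# Hypothetical p-values for five candidate factors (not real data).
candidate_p_values = {
    "factor_a": 0.0004,
    "factor_b": 0.0300,
    "factor_c": 0.0120,
    "factor_d": 0.2500,
    "factor_e": 0.0090,
}

# Step 1: overall significance level of 5%, so the threshold is 0.05 / 5 = 0.01.
survivors = bonferroni_significant(candidate_p_values, alpha=0.05)
print(survivors)  # prints: ['factor_a', 'factor_e']
```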

Practical Example

Suppose a quantitative analyst screens 50 trading signals against historical data. Without any correction, using a 5% significance level, roughly 2 to 3 signals would appear to "work" by pure chance.

Scenario	Number of Tests	Threshold per Test	Expected False Positives
No correction	50	5.00%	~2.5
Bonferroni correction	50	0.10%	~0.05

With the Bonferroni correction applied, the threshold drops from 5% to 0.1% per test. The expected number of false positives falls from about 2.5 to roughly 0.05. Any signal that passes this stricter bar is far more likely to reflect a genuine pattern rather than a statistical fluke.

In practice, this means the analyst's final shortlist will be smaller but more reliable. A signal with a p-value of 0.03 (3%) would pass the uncorrected threshold but fail the Bonferroni threshold. Only signals with very strong statistical evidence survive.
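The expected false-positive counts in the table can be checked with a small simulation. This is a sketch under a stated assumption: none of the 50 signals has any real effect, so each p-value is drawn uniformly between 0 and 1. The function name, trial count, and seed are arbitrary illustrative choices.

```python
import random
from typing import Tuple


def simulate_false_positives(
    n_signals: int, alpha: float, n_trials: int, seed: int = 0
) -> Tuple[float, float]:
    """Average number of false positives per screening run,
    without correction and with the Bonferroni threshold."""
    rng = random.Random(seed)
    bonferroni_cutoff = alpha / n_signals
    total_uncorrected = 0
    total_bonferroni = 0
    for _ in range(n_trials):
        # Assumption: with no real effect, p-values are uniform on [0, 1].
        p_values = [rng.random() for _ in range(n_signals)]
        total_uncorrected += sum(p < alpha for p in p_values)
        total_bonferroni += sum(p < bonferroni_cutoff for p in p_values)
    return total_uncorrected / n_trials, total_bonferroni / n_trials


avg_uncorrected, avg_bonferroni = simulate_false_positives(
    n_signals=50, alpha=0.05, n_trials=2000
)
print(f"Average false positives, no correction: {avg_uncorrected:.2f}")
print(f"Average false positives, Bonferroni:    {avg_bonferroni:.3f}")
```

With enough trials, the two averages should land close to the table's values of 2.5 and 0.05.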

Known Limitations

Overly conservative. The Bonferroni correction is the strictest commonly used multiple-testing adjustment. When the number of tests is large (hundreds or thousands), the adjusted threshold can become so low that even real effects fail to pass. This means the correction may cause researchers to miss genuine findings, a problem statisticians call low "statistical power."
Ignores correlation between tests. The correction remains valid whether or not the tests are independent, but it makes no use of any correlation between them. In finance, many factors are correlated (for example, value and profitability metrics often move together). When tests are positively correlated, the effective number of independent tests is smaller than n, so the Bonferroni correction is more conservative than necessary.
Does not rank results. The correction gives a binary answer: significant or not. It does not help researchers prioritize among the results that do pass the threshold.
Better alternatives exist for large-scale testing. The Holm-Bonferroni method offers a less conservative stepwise approach that still controls the overall error rate. The Benjamini-Hochberg procedure controls the "false discovery rate" (the expected proportion of false positives among all results called significant), which is often more practical for exploratory research with hundreds of tests.
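The two alternatives named above can be sketched in plain Python. These follow the standard textbook descriptions of each procedure; the function names and the five p-values are hypothetical, chosen only to show how the methods differ on the same inputs.

```python
from typing import List


def holm_bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Step-down procedure: the smallest p-value faces alpha / n, the next
    alpha / (n - 1), and so on, stopping at the first failure."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    reject = [False] * n
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (n - rank):
            reject[idx] = True
        else:
            break  # every larger p-value also fails
    return reject


def benjamini_hochberg(p_values: List[float], fdr: float = 0.05) -> List[bool]:
    """Step-up procedure: find the largest rank k whose p-value is at most
    (k / n) * fdr, then reject the k smallest p-values."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / n * fdr:
            cutoff_rank = rank
    reject = [False] * n
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Hypothetical p-values (not real data).
p_values = [0.001, 0.008, 0.012, 0.030, 0.200]

bonferroni_result = [p <= 0.05 / len(p_values) for p in p_values]
holm_result = holm_bonferroni(p_values)
bh_result = benjamini_hochberg(p_values)

print("Bonferroni:        ", bonferroni_result)
print("Holm-Bonferroni:   ", holm_result)
print("Benjamini-Hochberg:", bh_result)
```

On these inputs Bonferroni keeps the two smallest p-values, Holm-Bonferroni keeps three, and Benjamini-Hochberg keeps four, which illustrates the ordering from most to least conservative.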

Method	What It Controls	Key Difference from Bonferroni
Holm-Bonferroni	Family-wise error rate (same as Bonferroni)	Stepwise procedure; less conservative, more powerful
Benjamini-Hochberg	False discovery rate	Allows a controlled proportion of false positives; better for large-scale screening
Šidák Correction	Family-wise error rate	Slightly less conservative than Bonferroni; assumes independent tests
Permutation Testing	Empirical significance	Uses data resampling rather than a formula-based threshold
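To show how slight the Šidák difference is, the sketch below compares the two per-test thresholds for 50 tests. The Šidák threshold uses the standard formula 1 − (1 − α)^(1/n), which is exact for independent tests; the function name is illustrative.

```python
def sidak_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that gives a family-wise error rate of exactly
    alpha when the n tests are independent."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


alpha = 0.05
n_tests = 50

bonferroni = alpha / n_tests
sidak = sidak_threshold(alpha, n_tests)

print(f"Bonferroni threshold: {bonferroni:.6f}")
print(f"Sidak threshold:      {sidak:.6f}")
```

The Šidák threshold is marginally higher than the Bonferroni threshold, so the two methods rarely disagree in practice.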

For quantitative finance applications involving tens or hundreds of factor tests, the Benjamini-Hochberg procedure is often preferred because it balances the risk of false discoveries against the risk of missing real effects. The Bonferroni correction remains a standard, conservative choice when the cost of a single false positive is very high, such as in regulatory reporting or clinical trials.

Academic Origin

The correction is named after Italian mathematician Carlo Emilio Bonferroni, who developed the underlying probability inequality in 1936. However, the method as applied in statistics was popularized by Olive Jean Dunn in her 1961 paper, which showed how to use Bonferroni's inequality for simultaneous confidence intervals and hypothesis tests. Dunn's formulation is what researchers use today.

In quantitative finance, the Bonferroni correction gained renewed attention after Harvey, Liu, and Zhu's influential 2016 paper, which argued that most published factor discoveries in finance fail to survive proper multiple-testing adjustments. Their work highlighted how many of the hundreds of "anomalies" in the academic literature may be false positives, a finding with direct implications for factor investing strategies.

Bonferroni, C.E. (1936). "Teoria statistica delle classi e calcolo delle probabilità." Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 8, 3–62.
Dunn, O.J. (1961). "Multiple Comparisons Among Means." Journal of the American Statistical Association, 56(293), 52–64.
Harvey, C.R., Liu, Y., and Zhu, H. (2016). "...and the Cross-Section of Expected Returns." The Review of Financial Studies, 29(1), 5–68.
Benjamini, Y. and Hochberg, Y. (1995). "Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing." Journal of the Royal Statistical Society: Series B, 57(1), 289–300.

Disclaimer

This content is for educational and informational purposes only and does not constitute an offer to sell or a solicitation of an offer to buy any securities. Nothing herein constitutes investment advice or recommendations tailored to your individual situation. All investments involve risk, including the potential loss of principal. Past performance is no guarantee of future results. Information presented is believed to be factual and up-to-date, but Foxholm Financial does not guarantee its accuracy and it should not be regarded as a complete analysis of the subjects discussed. Before making investment decisions, consult with a qualified financial advisor who can evaluate your specific circumstances.