t-Statistic
The t-statistic measures how many standard errors a result is from zero. It is the primary tool for determining whether an observed pattern in financial data, such as a factor's excess return, is statistically significant or likely due to chance.
A standard error is a measure of the uncertainty around an estimate. If a factor appears to earn 2% per year on average but the standard error of that estimate is 3%, the result is well within the range of random noise. The t-statistic captures this relationship in a single number: the estimated effect divided by its standard error. Larger t-statistics mean the result is harder to explain as pure luck.
Definition
The t-statistic is calculated by dividing an estimated value (such as a portfolio's average excess return) by the standard error of that estimate. The standard error reflects how much the estimate would vary if the analysis were repeated on different samples of data.
Formula
t-Statistic = Estimated Value ÷ Standard Error of the Estimate
For example, if a factor earns an average excess return of 0.5% per month with a standard error of 0.2% per month, the t-statistic is 0.5 ÷ 0.2 = 2.5. This means the observed return is 2.5 standard errors away from zero.
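The arithmetic above can be sketched in a few lines of Python. The function name `t_statistic` is illustrative and does not come from any particular library.

```python
def t_statistic(estimate: float, standard_error: float) -> float:
    """Return the estimate divided by its standard error."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return estimate / standard_error


# Example from the text: 0.5% average monthly excess return, 0.2% standard error.
t = t_statistic(0.5, 0.2)
print(f"t-statistic: {t:.2f}")  # prints "t-statistic: 2.50"
```

Both inputs must be in the same units (here, percent per month), so the ratio itself is unitless.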
The t-statistic was introduced by William Sealy Gosset in 1908, writing under the pen name "Student" while working at the Guinness brewery in Dublin. Gosset needed a way to draw reliable conclusions from small samples of barley data. The resulting "Student's t-test" became one of the most widely used tools in all of statistics.
How to Interpret the t-Statistic
The absolute value of the t-statistic is compared against threshold values to determine whether a result is "statistically significant," meaning it is unlikely to have occurred by chance alone. Higher t-statistics provide stronger evidence against the possibility that the true effect is zero. The p-values in the table below are two-sided and assume a large sample.
| t-Statistic Threshold | Significance Level | Interpretation |
|---|---|---|
| Below 1.96 | p > 0.05 | Insufficient evidence; the result could easily be due to chance |
| 1.96 | p = 0.05 | Traditional threshold; a result this extreme would arise about 5% of the time if the true effect were zero |
| 2.0 | p ≈ 0.046 | Commonly used round-number approximation of the 5% threshold |
| 2.58 | p = 0.01 | Stronger evidence; a result this extreme would arise about 1% of the time if the true effect were zero |
| 3.0 and above | p < 0.003 | Strong evidence; recommended minimum for financial factor research |
These thresholds assume a single test. When many hypotheses are tested on the same data, the probability of finding at least one "significant" result by chance increases rapidly. This is the multiple testing problem, and it is why the traditional 1.96 threshold is often too lenient for modern financial research.
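As a rough check on the table and on the multiple testing problem, the sketch below converts a t-statistic into a two-sided p-value and computes the chance of at least one false positive across several tests. It rests on two simplifying assumptions that are not stated in the text above: a standard normal approximation (reasonable for large samples) and independent tests. The function names are illustrative.

```python
import math


def two_sided_p_value(t: float) -> float:
    """Two-sided p-value under a standard normal approximation."""
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)), where Phi is the normal CDF.
    return math.erfc(abs(t) / math.sqrt(2))


def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent tests of a true null passes at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Thresholds from the table above.
for t in (1.96, 2.0, 2.58, 3.0):
    print(f"t = {t:.2f} -> p = {two_sided_p_value(t):.4f}")

# How quickly the chance of a spurious "discovery" grows with the number of tests.
for n in (1, 10, 100):
    print(f"{n} tests -> {prob_at_least_one_false_positive(n):.1%}")
```

With small samples, the Student's t distribution has fatter tails than the normal, so the true p-values would be somewhat larger than this approximation suggests.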
Practical Example: Testing a Factor's Alpha
Suppose a researcher builds a portfolio that goes long on stocks with high gross profitability and short on stocks with low gross profitability. After running the strategy on 20 years of monthly data (240 months), the results are as follows.
| Metric | Value |
|---|---|
| Average monthly excess return | 0.40% |
| Standard deviation of monthly returns | 3.10% |
| Standard error (std dev ÷ √240) | 0.20% |
| t-Statistic (0.40 ÷ 0.20) | 2.0 |
A t-statistic of 2.0 would pass the traditional 5% significance test. However, if this factor was one of many tested, the traditional threshold may be insufficient. The question becomes: should the bar be higher?
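The calculation in the table can be reproduced with the sketch below. The function names are hypothetical, and the second function assumes a plain list of periodic excess returns with no adjustment for serial correlation.

```python
import math
import statistics
from typing import Sequence


def t_stat_from_summary(mean: float, std_dev: float, n_obs: int) -> float:
    """t-statistic of a mean, given its sample standard deviation and sample size."""
    standard_error = std_dev / math.sqrt(n_obs)
    return mean / standard_error


def t_stat_from_returns(returns: Sequence[float]) -> float:
    """t-statistic of the average return in a series of periodic returns."""
    mean = statistics.mean(returns)
    std_dev = statistics.stdev(returns)  # sample standard deviation (n - 1 divisor)
    return t_stat_from_summary(mean, std_dev, len(returns))


# Figures from the table: 0.40% mean, 3.10% standard deviation, 240 months.
t = t_stat_from_summary(0.40, 3.10, 240)
print(f"t-statistic: {t:.2f}")  # prints "t-statistic: 2.00"
```

Because the standard error shrinks with the square root of the number of observations, the same average return measured over a longer sample produces a higher t-statistic.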
Why Finance Needs a Higher Bar: Harvey, Liu & Zhu
In a landmark 2016 paper, Harvey, Liu, and Zhu documented a critical problem in financial research. Hundreds of "factors" had been published claiming to predict stock returns, most using the traditional t > 1.96 threshold. But when hundreds of factors are tested, many will appear significant purely by chance.
The researchers argued that the appropriate threshold for a new factor in 2012 was approximately t > 3.0, and that this threshold should increase over time as more factors are tested. Their reasoning is straightforward: if 300 factors with no true effect are each tested at the 5% significance level, roughly 15 will appear significant by luck alone. Raising the bar to t > 3.0 dramatically reduces the number of false discoveries.
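The 300-factor arithmetic can be checked with a short sketch. It assumes that none of the factors has a true effect and uses a normal approximation for the p-values; the function names are illustrative and do not come from the paper.

```python
import math


def two_sided_p_value(t: float) -> float:
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))


def expected_false_positives(n_factors: int, t_threshold: float) -> float:
    """Expected count of true-null factors that clear the threshold by luck."""
    return n_factors * two_sided_p_value(t_threshold)


for threshold in (1.96, 3.0):
    count = expected_false_positives(300, threshold)
    print(f"t > {threshold}: about {count:.1f} false positives expected out of 300")
```

The expected count does not depend on the tests being independent, although the spread around it does.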
The Harvey, Liu & Zhu Threshold
For new factor research, a t-statistic of at least 3.0 is recommended. This accounts for the hundreds of factors already tested in the academic literature. A factor with t = 2.0 may look significant in isolation but is unremarkable when placed in the context of all the other factors that have been tried.
This higher threshold is analogous to a Bonferroni correction (a method that adjusts significance thresholds when multiple tests are conducted simultaneously), adapted for the cumulative body of financial research.
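To make the analogy concrete, the sketch below computes the critical value implied by a plain Bonferroni correction under a normal approximation. This is a textbook adjustment, not the procedure used by Harvey, Liu, and Zhu, so its output need not match the 3.0 figure. The function name is hypothetical.

```python
from statistics import NormalDist


def bonferroni_t_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Two-sided critical value after splitting alpha evenly across n_tests."""
    per_test_alpha = alpha / n_tests
    # Two-sided test: put half of the per-test alpha in each tail.
    return NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)


for n in (1, 10, 100, 300):
    print(f"{n:>3} tests -> |t| > {bonferroni_t_threshold(n):.2f}")
```

The threshold rises slowly as the number of tests grows, which is consistent with the argument that the bar should keep increasing as more factors are tried.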
This insight has important practical implications. Many widely cited factors in the academic literature have t-statistics between 2.0 and 3.0. Under the Harvey, Liu, and Zhu framework, these factors should be treated with greater skepticism. Some may represent real effects, but some are likely the product of data mining across decades of research.
Known Limitations
- Statistical significance is not economic significance. A factor can have a t-statistic of 4.0 but earn only 0.1% per month. After transaction costs, that return may not be exploitable. The t-statistic says nothing about whether a result is large enough to matter in practice.
- Assumes well-behaved data. The t-test assumes returns are roughly normally distributed (following a bell curve) and independent over time. Financial returns often have fat tails (extreme events occur more often than a bell curve predicts) and serial correlation (one period's return influences the next). These violations can inflate or deflate the t-statistic.
- Sensitive to the sample period. A strategy tested from 2010 to 2020 may produce a very different t-statistic than the same strategy tested from 2000 to 2010. Short sample periods increase uncertainty, while long periods may span structural changes in markets.
- Does not account for survivorship bias. If the data used to compute the t-statistic contains survivorship bias, the inflated returns will produce an inflated t-statistic. The statistical test inherits whatever biases exist in the underlying data.
- A high t-statistic does not prove causation. A statistically significant relationship between two variables does not mean one causes the other. Spurious correlations (coincidental patterns with no real connection) are common in financial data, especially when many variables are available.
Further Reading
- Harvey, C.R., Liu, Y., and Zhu, H. (2016). "...and the Cross-Section of Expected Returns." The Review of Financial Studies, 29(1), 5–68.
- Student [Gosset, W.S.] (1908). "The Probable Error of a Mean." Biometrika, 6(1), 1–25.
- Chordia, T., Goyal, A., and Saretto, A. (2020). "Anomalies and False Rejections." The Review of Financial Studies, 33(5), 2134–2179.
- Hou, K., Xue, C., and Zhang, L. (2020). "Replicating Anomalies." The Review of Financial Studies, 33(5), 2019–2133.
This content is for educational and informational purposes only and does not constitute an offer to sell or a solicitation of an offer to buy any securities. Nothing herein constitutes investment advice or recommendations tailored to your individual situation. All investments involve risk, including the potential loss of principal. Past performance is no guarantee of future results. Information presented is believed to be factual and up-to-date, but Foxholm Financial does not guarantee its accuracy and it should not be regarded as a complete analysis of the subjects discussed. Before making investment decisions, consult with a qualified financial advisor who can evaluate your specific circumstances.