Third-Party Research & Methodology Only

This section shares summaries of third-party academic research and descriptions of quantitative models. The content represents the findings of the original researchers, not the opinions or recommendations of Foxholm Financial. Foxholm Financial does not publish hypothetical or backtested performance metrics on its quantitative research pages. All content is restricted to methodology, signal construction, factor logic, and risk architecture. SEC rules require that investment advisers not present misleading performance data, and our methodology-only approach reflects that standard and the firm's fiduciary obligations.

Backtesting

Backtesting is the process of testing an investment strategy against historical market data to evaluate how it would have performed. It is a core step in quantitative strategy development, used to assess whether a trading rule or portfolio construction method warrants further evaluation before committing real capital.

The basic idea is straightforward: take a set of rules that define when to buy, sell, or rebalance, apply those rules to past market prices, and record the simulated results. If the historical simulation aligns with the strategy's objectives, the approach may warrant further investigation. If the strategy would have performed poorly, it can be discarded or refined without any actual losses.

Definition

A backtest simulates how a strategy would have behaved during a specific historical period. The strategy's rules are applied to actual market data (prices, volumes, and other observable inputs) as if an investor had followed them in real time. The output is a simulated track record that can be analyzed for return, risk, and other characteristics.

Key Concept

Backtesting answers the question: "If I had followed these exact rules during this historical period, what would the results have looked like?"

A successful backtest does not guarantee future success. Historical markets reflect conditions, regulations, and participant behavior that may not repeat. A backtest is a necessary first filter, not proof that a strategy will work.

The term is used across quantitative finance, from individual stock selection strategies to multi-asset portfolio construction and risk management. Any rule-based investment process can be backtested, provided sufficient historical data exists.

How Backtesting Works

A typical backtesting process follows a structured sequence. Each step introduces decisions that affect the reliability of the final results.

Define the Strategy Rules

Specify the exact conditions for entering and exiting positions, position sizing, rebalancing frequency, and any constraints (such as sector limits or maximum position sizes). The rules must be fully mechanical, meaning no judgment calls are allowed during the simulation.
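As a minimal sketch of what "fully mechanical" can look like in code, the rules can be written as fixed parameters plus pure functions that map inputs to decisions. Every name below (StrategyRules, its fields, target_weight, is_rebalance_day) and every parameter value is a hypothetical chosen for illustration, not a description of any particular strategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyRules:
    """Hypothetical rule set; every field name here is an illustrative assumption."""
    entry_threshold: float       # enter when the signal is at or above this level
    exit_threshold: float        # exit when the signal falls to or below this level
    max_position_weight: float   # constraint: cap on any single position
    rebalance_every_n_days: int  # rebalancing frequency in trading days


def is_rebalance_day(day_index: int, rules: StrategyRules) -> bool:
    """Rebalance on a fixed schedule, never at the researcher's discretion."""
    return day_index % rules.rebalance_every_n_days == 0


def target_weight(signal: float, currently_held: bool, rules: StrategyRules) -> float:
    """Map a signal to a portfolio weight with no judgment calls."""
    if signal >= rules.entry_threshold:
        return rules.max_position_weight
    if currently_held and signal > rules.exit_threshold:
        return rules.max_position_weight  # keep holding until the exit rule triggers
    return 0.0


rules = StrategyRules(entry_threshold=1.0, exit_threshold=0.0,
                      max_position_weight=0.05, rebalance_every_n_days=21)
```

Because the rules are frozen and the functions have no hidden state, the same inputs always produce the same decision, which is what makes the simulation repeatable.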

Select the Historical Data

Choose the time period, asset universe, and data sources. The data must include all securities that existed during the period, not just those that survived to the present. This includes delisted stocks, merged companies, and bankrupt firms.
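One way to keep delisted securities in the test is to store listing and delisting dates and rebuild the tradable universe for each historical date. The sketch below assumes a hypothetical Listing record and made-up tickers and dates; real datasets differ in how they record these events.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Listing(NamedTuple):
    ticker: str
    listed: date
    delisted: Optional[date]  # None means the security is still trading


def universe_on(listings: List[Listing], as_of: date) -> List[str]:
    """Return the tickers that were actually tradable on `as_of`."""
    return [
        item.ticker for item in listings
        if item.listed <= as_of and (item.delisted is None or as_of < item.delisted)
    ]


# Hypothetical records for illustration only.
listings = [
    Listing("AAA", date(1995, 1, 3), None),
    Listing("BBB", date(1998, 6, 1), date(2008, 9, 15)),  # later delisted
    Listing("CCC", date(2010, 4, 1), None),               # did not exist yet in 2005
]

universe_2005 = universe_on(listings, date(2005, 1, 3))  # ["AAA", "BBB"]
universe_2015 = universe_on(listings, date(2015, 1, 2))  # ["AAA", "CCC"]
```

A simulation that starts in 2005 would therefore be allowed to hold BBB, even though BBB no longer exists today.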

Simulate Execution

Walk through the historical data chronologically, applying the strategy rules at each decision point. At each step, the simulation should only use information that would have been available at that moment, never future data.
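The chronological walk can be sketched with a simple long-or-flat moving-average rule. The rule, the function name simulate_ma_strategy, and the price series are assumptions made for illustration. The point is the indexing: the position for the period from t to t+1 is decided using only prices up to and including time t.

```python
from typing import List


def simulate_ma_strategy(prices: List[float], window: int = 3) -> List[float]:
    """Return per-period strategy returns for a long/flat moving-average rule."""
    strategy_returns = []
    for t in range(window - 1, len(prices) - 1):
        history = prices[t - window + 1 : t + 1]      # information available at time t
        moving_avg = sum(history) / window
        position = 1.0 if prices[t] > moving_avg else 0.0  # long or flat
        next_return = prices[t + 1] / prices[t] - 1.0      # realized after the decision
        strategy_returns.append(position * next_return)
    return strategy_returns


# Hypothetical prices for illustration only.
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 104.0]
returns = simulate_ma_strategy(prices, window=3)
```

At the first decision point the price equals its moving average, so the rule stays flat and earns nothing. It is long for the next two periods and takes both the gain and the subsequent loss.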

Account for Realistic Costs

Include transaction costs (commissions, bid-ask spreads), market impact (the price effect of placing large orders), borrowing costs for short positions, and any taxes or fees that would apply in practice.
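A simple way to start is to charge a proportional cost each time the position changes. The sketch below assumes a flat cost per unit of turnover, which is a deliberate simplification: it does not model market impact, borrowing costs, or taxes, and the function name and the 0.001 cost level are hypothetical.

```python
from typing import List


def net_returns(gross_returns: List[float], positions: List[float],
                cost_per_unit_turnover: float = 0.001) -> List[float]:
    """Subtract a proportional trading cost from gross strategy returns.

    positions[i] is the position held during period i. The cost is charged
    on the absolute change in position at the start of that period.
    """
    net = []
    previous_position = 0.0  # assume the simulation starts with no holdings
    for gross, position in zip(gross_returns, positions):
        turnover = abs(position - previous_position)
        net.append(gross - turnover * cost_per_unit_turnover)
        previous_position = position
    return net


# Hypothetical inputs: flat, then long for two periods.
gross = [0.0, 0.02, -0.01]
held = [0.0, 1.0, 1.0]
net = net_returns(gross, held)
```

Only the second period pays a cost here, because that is the only period in which the position changes.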

Analyze Results

Evaluate the simulated track record using risk and performance metrics such as the Sharpe ratio (excess return per unit of volatility), maximum drawdown (the largest peak-to-trough decline), and win rate (the share of trades or periods with a positive return). Compare results across different time periods and market conditions.
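The three metrics named above can be computed from a list of per-period returns. This is a minimal sketch using common textbook definitions. It assumes the inputs to sharpe_ratio are already excess returns (net of the risk-free rate) and that 252 periods per year is an appropriate annualization factor for daily data.

```python
import math
import statistics
from typing import List


def sharpe_ratio(excess_returns: List[float], periods_per_year: int = 252) -> float:
    """Annualized mean excess return divided by its sample standard deviation."""
    mean = statistics.mean(excess_returns)
    stdev = statistics.stdev(excess_returns)
    return mean / stdev * math.sqrt(periods_per_year)


def max_drawdown(returns: List[float]) -> float:
    """Largest peak-to-trough decline of the compounded equity curve (a value <= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def win_rate(returns: List[float]) -> float:
    """Fraction of periods with a strictly positive return."""
    return sum(1 for r in returns if r > 0) / len(returns)


# Hypothetical returns for illustration only.
sample = [0.10, -0.20, 0.05]
drawdown = max_drawdown(sample)  # about -0.20: the fall from 1.10 to 0.88
wins = win_rate(sample)          # 2 of 3 periods are positive
```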

Common Pitfalls

Backtesting is prone to several well-documented errors that can make a strategy appear far more attractive than it would be in practice. Understanding these pitfalls is essential for interpreting any backtested result.

Survivorship Bias

Survivorship bias occurs when the historical dataset only includes securities that still exist today, excluding companies that went bankrupt, were delisted, or were acquired. Because failed companies are removed from the dataset, the remaining universe looks artificially healthy. A strategy tested on survivors-only data will typically appear to perform better than it actually would have, because many of the worst-performing stocks have been quietly removed.
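The effect can be seen with a toy universe. The tickers and returns below are made up purely to illustrate the arithmetic and do not describe any real securities.

```python
import statistics

# Hypothetical universe: total return over the test period for each security.
universe = [
    {"ticker": "AAA", "total_return": 0.30, "survived": True},
    {"ticker": "BBB", "total_return": 0.10, "survived": True},
    {"ticker": "CCC", "total_return": -1.00, "survived": False},  # went bankrupt
    {"ticker": "DDD", "total_return": -0.60, "survived": False},  # delisted
]


def average_return(securities):
    """Equal-weighted average of the securities' total returns."""
    return statistics.mean(s["total_return"] for s in securities)


survivors_only = [s for s in universe if s["survived"]]
biased_average = average_return(survivors_only)  # roughly 0.20
full_average = average_return(universe)          # roughly -0.30
```

An equal-weighted holding of the survivors looks profitable, while the same holding across everything that was actually tradable at the start loses money.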

Look-Ahead Bias

Look-ahead bias happens when the backtest uses information that would not have been available at the time of the decision. For example, using annual earnings data on January 1 when those earnings are not actually reported until March introduces look-ahead bias. The strategy appears to "know the future" during the simulation, producing results that are impossible to replicate in real time.
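A common safeguard is to key each data point by the date it became public rather than the period it describes. The sketch below follows the earnings example above. The record layout, the function name latest_known_eps, and the figures are hypothetical.

```python
from datetime import date

# Hypothetical reports: each records when the figure became public.
earnings_reports = [
    {"fiscal_year": 2021, "eps": 4.10, "report_date": date(2022, 3, 1)},
    {"fiscal_year": 2022, "eps": 5.25, "report_date": date(2023, 3, 1)},
]


def latest_known_eps(reports, as_of):
    """Return the most recently reported EPS that was public on `as_of`, or None."""
    known = [r for r in reports if r["report_date"] <= as_of]
    if not known:
        return None
    return max(known, key=lambda r: r["report_date"])["eps"]


eps_on_jan_1 = latest_known_eps(earnings_reports, date(2023, 1, 1))  # 4.10
eps_on_mar_1 = latest_known_eps(earnings_reports, date(2023, 3, 1))  # 5.25
```

On January 1, 2023 the simulation still sees the fiscal 2021 figure, because the fiscal 2022 figure has not yet been reported.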

Overfitting

Overfitting (also called curve fitting) occurs when a strategy is tuned to match the specific patterns of the historical data rather than capturing a genuine, repeatable market phenomenon. An overfitted strategy performs exceptionally well on the data it was designed around but fails when applied to new, unseen data. The more parameters a strategy has, and the more times those parameters are adjusted during development, the greater the risk of overfitting.
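A standard check is to tune parameters on one slice of history and evaluate the chosen setting on a slice it has never seen. The sketch below does this on pure random noise, where no real pattern exists by construction. The rule, the function names, the seed, and the candidate range are all assumptions made for illustration.

```python
import random
import statistics


def rule_returns(returns, lookback):
    """Go long next period if the mean of the last `lookback` returns is positive."""
    out = []
    for t in range(lookback, len(returns)):
        signal = statistics.mean(returns[t - lookback : t])  # uses past returns only
        out.append(returns[t] if signal > 0 else 0.0)
    return out


def mean_return(returns, lookback):
    return statistics.mean(rule_returns(returns, lookback))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
noise = [rng.gauss(0.0, 0.01) for _ in range(500)]  # pure noise, no real signal
in_sample, out_of_sample = noise[:250], noise[250:]

# Tune on the first half only, then score the chosen setting on the second half.
candidates = range(2, 21)
best_lookback = max(candidates, key=lambda k: mean_return(in_sample, k))
in_sample_score = mean_return(in_sample, best_lookback)
out_of_sample_score = mean_return(out_of_sample, best_lookback)
```

The in-sample score is the best of 19 attempts by construction, so it overstates what the rule can do. The out-of-sample score has no such selection advantage, which is why it is the more honest number to examine.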

For a deeper exploration of these and other testing errors, see the Backtesting Pitfalls concept page.

Known Limitations

Past markets do not predict future markets. Backtests assume that historical patterns contain useful information about the future. Market structure, regulations, technology, and participant behavior change over time, and a strategy that worked in one era may not work in another.
Transaction costs are difficult to model accurately. Bid-ask spreads, market impact, and liquidity conditions vary by time period and market regime. Historical cost estimates are approximations, and small differences in assumed costs can significantly change backtested results.
Data quality varies widely. Historical price data may contain errors, missing values, or inconsistent adjustments for stock splits, dividends, and corporate actions. Backtests are only as reliable as the data they are built on.
Multiple testing inflates results. When many strategy variations are tested on the same data, some will appear successful purely by chance. Harvey, Liu, and Zhu (2016) argued that the conventional significance threshold in finance research (a t-statistic of about 2.0) is too lenient when hundreds of factors have been tested on overlapping datasets, and suggested that a newly proposed factor should clear a t-statistic above 3.0.
Capacity constraints are invisible. A backtest does not reveal whether a strategy can absorb real-world capital. A strategy that works with small hypothetical positions may fail when scaled to institutional size because its trades move the market.
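To make the multiple-testing limitation concrete, the sketch below computes how the required t-statistic rises with the number of strategies tried under a Bonferroni correction. Bonferroni is one simple and conservative adjustment, used here only as an illustration and not as a restatement of any cited paper's method. The function name is hypothetical, and the calculation assumes a two-sided test with a normal approximation to the t-distribution.

```python
from statistics import NormalDist


def bonferroni_t_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Two-sided critical value after dividing alpha across `num_tests` trials."""
    adjusted_alpha = alpha / num_tests
    return NormalDist().inv_cdf(1.0 - adjusted_alpha / 2.0)


single_test = bonferroni_t_threshold(1)     # about 1.96
hundred_tests = bonferroni_t_threshold(100)  # about 3.48
```

Under these assumptions, a result that would pass at roughly 2.0 for a single pre-specified test needs to clear roughly 3.5 if it was the best of 100 variations.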

Academic Origin

Backtesting has been a part of quantitative finance since the earliest systematic studies of stock market patterns. Alfred Cowles published one of the first rigorous evaluations of investment forecasting in 1933, testing whether stock market forecasters could actually predict future prices. His work established the principle that investment claims should be tested against historical evidence rather than accepted on faith.

The modern framework for backtesting emerged alongside the growth of factor investing in the 1990s and 2000s. As researchers identified potential return predictors (value, momentum, size, and others), the need for rigorous testing protocols became urgent. Bailey, Borwein, López de Prado, and Zhu (2014) formalized the problem of backtest overfitting, showing that the probability of selecting a false strategy increases sharply with the number of strategy variations tested. Their work provided a mathematical framework for quantifying how much a backtested result should be discounted based on the number of trials conducted.

Harvey, Liu, and Zhu (2016) extended this line of research by examining the broader problem of multiple testing in cross-sectional finance research. Their findings suggest that many published factor strategies may be the product of data mining rather than genuine economic phenomena, raising the statistical bar that any backtested strategy should clear.