Third-Party Research & Methodology Only

This section shares summaries of third-party academic research and descriptions of quantitative models. The content represents the findings of the original researchers, not the opinions or recommendations of Foxholm Financial. Foxholm Financial does not publish hypothetical or backtested performance metrics on its quantitative research pages. All content is restricted to methodology, signal construction, factor logic, and risk architecture. SEC rules require that investment advisers not present misleading performance data, and our methodology-only approach reflects that standard and the firm's fiduciary obligations.

ML Stock Prediction

Machine Learning Trading Strategy Signal Generation

Robert Stowe, AAMS® Investment Advisor

Machine learning (ML) stock prediction uses algorithms that learn patterns from historical data to forecast future stock price movements. Instead of coding specific rules by hand, ML models discover relationships in the data on their own, potentially finding patterns too complex for a human analyst to specify.

The appeal is straightforward: financial markets generate enormous amounts of data, and ML algorithms are designed to find structure in large, noisy datasets. The challenge is equally straightforward: financial data is extraordinarily noisy, the patterns change over time, and the risk of overfitting (finding patterns that are real in historical data but do not persist in the future) is severe. Most ML models that look impressive in a backtest fail to generate meaningful predictions on new, unseen data.

Conceptual Framework

ML stock prediction belongs to the broader field of supervised learning: the algorithm is given historical examples of inputs (features) and outputs (what happened next), and it learns a mapping between them. The goal is to build a model that generalizes beyond the training data and produces useful predictions on data it has never seen.

This differs from traditional quantitative finance, where the researcher specifies a model based on economic theory (e.g., "stocks with low price-to-earnings ratios tend to outperform"). In ML, the algorithm discovers the relationships itself. This flexibility is both the method's strength and its primary vulnerability: a flexible model can fit any dataset, including the noise.

Core Assumptions

ML stock prediction makes implicit assumptions about the nature of financial markets. These are rarely stated explicitly but profoundly affect whether the approach works:

Patterns exist in the data: The approach assumes that there are repeating, learnable relationships between input features and future returns. If markets were perfectly efficient (all available information immediately reflected in prices), there would be no patterns to learn. The academic consensus is that markets are mostly efficient, with small, fleeting inefficiencies that are difficult to capture consistently.
Past patterns predict the future: ML models learn from historical data and assume that the relationships they find will continue to hold. Financial markets are non-stationary, meaning the underlying data-generating process changes over time. A pattern that existed during low-interest-rate environments may vanish when rates rise.
Signal-to-noise ratio is sufficient: Stock returns are extremely noisy. Daily returns have a signal-to-noise ratio close to zero, meaning the random component overwhelms the predictable component. Even a genuinely predictive model may explain only 1-2% of the variation in returns, which makes it very difficult for the algorithm to distinguish signal from noise during training.
Enough data exists for the model's complexity: Complex models (like deep neural networks) can have millions of parameters and need correspondingly large amounts of data to estimate them reliably. Financial history is limited: there have been roughly 25,000 trading days in the U.S. stock market since 1926. A model with more parameters than data points can fit the training data perfectly, noise included, and then generalize poorly.
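To put the signal-to-noise assumption in numbers, the sketch below estimates how many observations are needed before a weak relationship becomes statistically distinguishable from noise. It assumes a single feature, a linear relationship, and independent observations, and it uses the standard t-statistic for a Pearson correlation. The function name and the hurdle of 3.0 are illustrative choices, not part of the cited research.

```python
import math


def observations_needed(r_squared: float, t_hurdle: float) -> int:
    """Smallest sample size at which a correlation with the given R^2 reaches t_hurdle.

    Solves t = r * sqrt(n - 2) / sqrt(1 - r^2) for n.
    Assumes independent observations (an illustrative simplification).
    """
    if not 0.0 < r_squared < 1.0:
        raise ValueError("r_squared must be strictly between 0 and 1")
    n = 2 + t_hurdle ** 2 * (1.0 - r_squared) / r_squared
    return math.ceil(n)


# A model explaining 1% or 2% of return variation, tested against a hurdle of 3.0.
for r2 in (0.01, 0.02):
    print(r2, observations_needed(r2, 3.0))
```

Under these assumptions, a feature that explains 1% of return variation needs roughly 900 independent observations to clear a t-statistic of 3.0. Real return data is not independent across time or across stocks, so the effective requirement is usually larger.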

ML Pipeline Architecture

An ML stock prediction system follows a pipeline from raw data to a trading signal. Each stage introduces design decisions that affect the model's ability to generalize.

Step 1: Feature Engineering
Step 2: Model Selection
Step 3: Training & Validation
Step 4: Out-of-Sample Test
Step 5: Signal Generation

Feature Engineering

Features are the input variables the model uses to make predictions. In stock prediction, common features include price-based indicators (moving averages, momentum, volatility), fundamental data (earnings, revenue, valuation ratios), and alternative data (news sentiment, options activity, insider transactions). Feature engineering is often the most important step: the quality and relevance of the inputs matters more than the sophistication of the algorithm.

A critical discipline is ensuring that features use only information that would have been available at the time of prediction. Using data that had not yet been published (for example, using an earnings announcement made on Friday to predict the preceding Wednesday's return) is called look-ahead bias. It produces results that look excellent in testing but are impossible to replicate in practice.
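One way to enforce this discipline is to store a publication date with every data point and filter on it before building features. The sketch below is a minimal illustration; the `Observation` class, the feature name, and the values are hypothetical and chosen only to mirror the Friday/Wednesday example above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    name: str        # hypothetical feature name
    value: float     # hypothetical value
    published: date  # date the value became publicly available


def point_in_time(observations, prediction_date):
    """Keep only values published strictly before the prediction date."""
    return [o for o in observations if o.published < prediction_date]


history = [
    Observation("trailing_eps", 1.10, date(2024, 1, 26)),
    Observation("trailing_eps", 1.25, date(2024, 4, 26)),  # announced on a Friday
]

# Predicting the return for Wednesday 2024-04-24: the Friday figure must be excluded.
usable = point_in_time(history, date(2024, 4, 24))
print([o.value for o in usable])
```

Filtering on the publication date rather than the fiscal period end date is the key design choice: a quarter may end in March while its results only become known weeks later.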

Common Model Types

ML models for stock prediction span a wide range of complexity. The most commonly used approaches include:

Linear models (Ridge, Lasso): These are extensions of standard regression that add a penalty for large parameter values, which discourages the model from fitting noise. They are transparent, fast to train, and surprisingly competitive with more complex methods on financial data. The penalty (called regularization) shrinks coefficients toward zero. Ridge shrinks all coefficients but rarely sets any exactly to zero, while Lasso can set the coefficients of unimportant features exactly to zero, effectively performing automatic feature selection.
Tree-based ensembles (Random Forest, Gradient Boosting): These models combine hundreds or thousands of decision trees. A Random Forest trains each tree on a random subset of the data and features and averages their predictions, which reduces variance because the trees are only weakly correlated. Gradient boosting (XGBoost, LightGBM) builds trees sequentially, where each new tree corrects the errors of the previous ones, and sums their contributions. These models handle non-linear relationships and feature interactions well, but they can overfit if not carefully regularized.
Neural networks: Deep learning models that can approximate arbitrarily complex functions. Recurrent neural networks (RNNs) and Long Short-Term Memory networks (LSTMs) are designed for sequential data like time series. Transformer architectures, originally developed for language processing, have been adapted for financial time series. These models are powerful but require large amounts of data, substantial computational resources, and careful regularization to avoid overfitting.
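The difference between the Ridge and Lasso penalties mentioned above can be seen in a special case with closed-form solutions. The sketch below assumes an orthonormal feature matrix and the objectives ½‖y − Xb‖² + (λ/2)‖b‖² for Ridge and ½‖y − Xb‖² + λ‖b‖₁ for Lasso; under those assumptions each penalized coefficient is a simple function of the unpenalized one. The coefficient values are hypothetical.

```python
def ridge_shrink(ols_coefs, lam):
    """Ridge with orthonormal features: scale every coefficient by 1 / (1 + lam)."""
    return [b / (1.0 + lam) for b in ols_coefs]


def soft_threshold(b, lam):
    """Move b toward zero by lam, stopping at exactly zero."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def lasso_shrink(ols_coefs, lam):
    """Lasso with orthonormal features: soft-threshold every coefficient at lam."""
    return [soft_threshold(b, lam) for b in ols_coefs]


ols = [2.0, -0.5, 0.3]  # hypothetical unpenalized coefficients
print("ridge:", ridge_shrink(ols, 1.0))
print("lasso:", lasso_shrink(ols, 1.0))
```

Ridge keeps every feature with a smaller weight, while Lasso removes the two weak features entirely. With correlated features the solutions no longer have this closed form, but the qualitative behavior is similar.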

Training, Validation, and Testing

The most important part of the ML pipeline is how the model is evaluated. The data must be split into three non-overlapping periods: a training set (where the model learns), a validation set (where hyperparameters are tuned), and a test set (where final performance is measured). The test set must never be used during development; looking at test results and then adjusting the model defeats the purpose of having a holdout.

For time-series data like stock returns, the standard machine learning practice of random cross-validation does not work because it allows the model to "peek" at future data. Instead, walk-forward validation is used: the model is trained on data up to time T, predicts at time T+1, then the training window advances forward. This mirrors how the model would be used in real time and gives a more honest estimate of out-of-sample performance.
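The walk-forward scheme described above can be sketched as a generator of index splits. This version assumes an expanding training window (all data up to time T); a fixed-length rolling window is an equally common variant. The function and parameter names are illustrative.

```python
def walk_forward_splits(n_obs, min_train, test_size=1):
    """Yield (train_indices, test_indices) pairs with an expanding training window.

    Every test index is strictly later than every training index,
    so the model never sees data from its own future.
    """
    start = min_train
    while start + test_size <= n_obs:
        train = list(range(0, start))
        test = list(range(start, start + test_size))
        yield train, test
        start += test_size


for train, test in walk_forward_splits(n_obs=6, min_train=3):
    print(train, test)
```

Each iteration would retrain the model on `train` and score it on `test`; the collected test predictions form the out-of-sample record.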

Risk Architecture

ML stock prediction introduces model risks that are distinct from those in traditional quantitative finance. The flexibility that makes ML powerful also makes it uniquely vulnerable to producing misleading results.

Model Risk

The dominant risk in ML stock prediction is overfitting: the model learns patterns in the training data that are noise rather than signal. An overfit model shows excellent performance in backtests but fails on new data. Financial data is particularly vulnerable to overfitting because the signal is weak (returns are mostly random), the features are numerous (hundreds of potential inputs), and the sample size is limited (decades of daily data, not millions of observations).
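Overfitting is easy to reproduce on data that contains no signal at all. In the sketch below, features and labels are independent random draws, and the "model" is a nearest-neighbor lookup that simply memorizes the training set. The setup is a deliberately extreme illustration, not a realistic trading model.

```python
import random


def nearest_label(x, train_x, train_y):
    """Predict with the label of the closest training point (pure memorization)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def accuracy(xs, ys, train_x, train_y):
    """Fraction of points whose predicted label matches the actual label."""
    hits = sum(nearest_label(x, train_x, train_y) == y for x, y in zip(xs, ys))
    return hits / len(xs)


rng = random.Random(0)
# Features and labels are unrelated: there is nothing to learn.
train_x = [rng.random() for _ in range(200)]
train_y = [rng.randint(0, 1) for _ in range(200)]
test_x = [rng.random() for _ in range(2000)]
test_y = [rng.randint(0, 1) for _ in range(2000)]

in_sample = accuracy(train_x, train_y, train_x, train_y)
out_of_sample = accuracy(test_x, test_y, train_x, train_y)
print("in-sample:", in_sample, "out-of-sample:", out_of_sample)
```

The memorizing model scores perfectly on the data it was trained on and close to a coin flip on new data, which is the in-sample versus out-of-sample gap in its purest form.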

A related risk is the multiple testing problem. Researchers often try many model specifications, feature sets, and parameter combinations before settling on one that works. Each additional trial raises the probability that at least one specification looks significant purely by chance. Harvey, Liu, and Zhu (2016) argued that a t-statistic above 3.0 (rather than the traditional 2.0) is needed to account for the number of trials typically conducted in quantitative finance research.
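The arithmetic behind the multiple testing problem can be sketched as follows. The code assumes the trials are independent and uses a Bonferroni correction with a normal approximation as a simple stand-in; it is not a reproduction of the analysis in Harvey, Liu, and Zhu (2016), and the function names are illustrative.

```python
from statistics import NormalDist


def prob_any_false_positive(n_trials, alpha=0.05):
    """Chance that at least one of n independent null tests looks significant."""
    return 1.0 - (1.0 - alpha) ** n_trials


def bonferroni_hurdle(n_trials, alpha=0.05):
    """Two-sided critical value (normal approximation) after a Bonferroni correction."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_trials))


for n in (1, 20, 100):
    print(n, round(prob_any_false_positive(n), 3), round(bonferroni_hurdle(n), 2))
```

With 100 independent trials at the 5% level, the chance of at least one false positive exceeds 99%, and the corrected hurdle rises from about 1.96 to roughly 3.5. Correlated trials make the true adjustment smaller than Bonferroni suggests, but the direction of the effect is the same.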

Known Limitations

Limitations to Consider

Overfitting risk is extreme: A sufficiently flexible ML model can fit historical data perfectly. The gap between in-sample and out-of-sample performance is typically large in financial applications. A model showing 90% accuracy in training may drop to 51% on new data.
Non-stationarity: Financial markets change. Regulatory shifts, technological disruption, and changes in market structure mean that patterns found in one period may not exist in the next. Models need frequent retraining, but there is no guarantee that the new patterns will persist either.
Black-box interpretability: Complex models (neural networks, large ensembles) are difficult to interpret. When the model makes a prediction, it is often unclear which features drove the decision and whether the reasoning is economically sensible. This makes it hard to distinguish genuine insight from a spurious pattern.
Data leakage: Subtle forms of information leakage (using data that would not have been available in real time) are common and hard to detect. Even small leaks can inflate apparent performance dramatically.
Adversarial environment: Financial markets are different from image recognition or language processing because other participants actively work against you. If a profitable pattern is discovered and widely adopted, the trading activity itself eliminates the pattern. This means ML predictions may degrade specifically because they are useful.

Practical Considerations

Defending Against Overfitting

The single most important practical concern in ML stock prediction is controlling overfitting. Effective defenses include:

Regularization: Adding penalties for model complexity (L1/Lasso for feature selection, L2/Ridge for weight shrinkage) forces the model to find simpler explanations.
Feature reduction: Using fewer, more carefully chosen features reduces the dimensionality of the problem and limits the model's ability to memorize noise. Principal Component Analysis (PCA) can compress many correlated features into a smaller set of uncorrelated components.
Ensemble methods: Combining predictions from multiple models reduces the variance of the forecast and makes the overall prediction more robust to the idiosyncrasies of any single model.
Walk-forward testing: Consistently evaluating on future data that was not available during training. If the model cannot predict data it has never seen, it has not learned anything useful.
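The variance reduction from ensembling listed above depends heavily on how correlated the individual models' errors are. The sketch below uses the standard formula for the variance of an equal-weighted average, assuming every model has the same error variance and every pair of models has the same error correlation; both are simplifying assumptions, and the function name is illustrative.

```python
def ensemble_error_variance(sigma2, n_models, rho):
    """Error variance of an equal-weighted average of n_models forecasts.

    sigma2: error variance of each individual model (assumed identical)
    rho:    correlation between any two models' errors (assumed identical)
    """
    if n_models < 1:
        raise ValueError("n_models must be at least 1")
    return sigma2 * (rho + (1.0 - rho) / n_models)


for rho in (0.0, 0.5, 0.9):
    print(rho, ensemble_error_variance(1.0, 10, rho))
```

Ten uncorrelated models cut the error variance to a tenth, but ten models with 0.9 error correlation barely improve on a single model. Averaging only helps to the extent that the models make different mistakes.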

Setting Realistic Expectations

The signal-to-noise ratio in stock returns is very low. Academic research suggests that even the best models explain only a small fraction of return variation. A model that correctly predicts the direction of the market 53% of the time would be considered excellent. Claims of 80% or 90% accuracy should be treated with extreme skepticism, as they almost always indicate overfitting, data leakage, or a misunderstanding of the evaluation methodology.
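A quick calculation shows why a 53% hit rate is both valuable and hard to verify. The sketch below compares an observed hit rate with a 50% coin flip using a normal approximation to the binomial distribution. It assumes the predictions are independent, which overlapping or clustered forecasts would violate; the function names are illustrative.

```python
import math


def hit_rate_z_score(hit_rate, n_predictions):
    """z-score of an observed hit rate against a 50% coin flip."""
    standard_error = math.sqrt(0.25 / n_predictions)
    return (hit_rate - 0.5) / standard_error


def predictions_needed(hit_rate, z_hurdle):
    """Number of independent predictions needed for the hit rate to reach z_hurdle."""
    if hit_rate <= 0.5:
        raise ValueError("hit_rate must exceed 0.5")
    return math.ceil((z_hurdle * 0.5 / (hit_rate - 0.5)) ** 2)


print(hit_rate_z_score(0.53, 2500))
print(predictions_needed(0.53, 2.0))
```

Under these assumptions, a genuine 53% hit rate needs on the order of a thousand independent predictions to clear a z-score of 2, and about 2,500 to reach 3. A short track record cannot distinguish such a model from luck.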

ML is most useful in finance when combined with domain knowledge. Using economic intuition to select features, imposing constraints that reflect market structure, and validating predictions against plausible mechanisms all reduce the risk of learning spurious patterns. A model that makes predictions for economically interpretable reasons is more likely to persist than one that relies on opaque statistical relationships.

The Role of Alternative Data

Alternative data refers to non-traditional data sources such as satellite imagery (e.g., counting cars in retail parking lots), credit card transaction records, social media sentiment, web traffic data, and natural language analysis of earnings call transcripts. These datasets can provide information that is not yet reflected in traditional financial data. The challenge is that alternative data is expensive to acquire, difficult to clean, and subject to the same overfitting risks as any other input.
