Introduction To Forecast Signals
Introduction
A forecast signal is a model that takes in features and predicts future returns. Some systematic trading strategies, at their core, are just pipelines for generating these signals and converting them into portfolio positions.
The pipeline is deceptively simple: gather data, transform it into features, use machine learning to generate forecasts, then size your positions. But finance has among the lowest signal-to-noise ratios of any prediction domain, roughly 1 basis point of predictability per day against 2% daily volatility.
That means most of the information in your model is noise, not signal. Understanding how to build forecast signals in this unforgiving environment is the foundation of systematic investing.
P.S.: This is a weekly free article where we try to cover a foundational technical concept, so that we can build towards more advanced concepts and ideas in later articles. If you liked what you read, I’d recommend subscribing to stay on top of all the other articles we will cover across the great breadth and depth of the investment process!
I’d also appreciate you sharing this with someone you think will benefit from it!
Pipeline
Every machine learning system for return prediction follows the same basic flow. Think of it as a production line where each stage transforms the output of the previous one.
Stage 1: Data is your raw material. Stock prices, trading volumes, company financial statements, and increasingly, alternative sources like satellite imagery or credit card transaction patterns. The quality of your raw data constrains everything downstream. Most failures trace back to data problems, not model problems.
Stage 2: Features transforms that raw data into numbers your model can actually use. A stock’s price history becomes “how much has this stock gone up over the past 12 months” (momentum). Financial statements become ratios like “how cheap is this stock relative to its earnings” (value). This transformation is where most of the intellectual work happens, and where academics have identified many, many potentially predictive characteristics. The features you choose often matter more than the algorithm you pick.
Stage 3: Forecasts is where machine learning actually happens. You feed your features into a model and it spits out predictions for each stock’s expected return. The goal is a “Goldilocks” model: large enough to detect complex predictive relationships, but not so flexible that it overfits noise. In most publications, the target variable is going to be expected returns, but we will show in later articles that you can predict other targets that give you substantially uncorrelated returns.
Stage 4: Positions converts those predictions into actual portfolio decisions. Which stocks do you buy? How much of each? A simple approach: rank all stocks by predicted return, buy the highest-ranked ones, sell short the lowest-ranked ones.
The pipeline is only as strong as its weakest link. A sophisticated neural network fed poor features will underperform a simple model fed quality features. Most practitioners spend 80% of their time on model architecture when they should flip that ratio toward data and features.
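To make the flow concrete, here is a toy end-to-end run of the four stages, a minimal sketch rather than a real strategy: the prices are synthetic random walks, the momentum feature and Ridge model are placeholders, and nothing here should be expected to make money.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
dates = pd.bdate_range("2015-01-01", periods=1500)
tickers = [f"STK{i}" for i in range(100)]

# Stage 1: data -- synthetic daily prices (random walks, no real signal).
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.02, (len(dates), len(tickers))), axis=0)),
    index=dates, columns=tickers,
)

# Stage 2: features -- ~12-month momentum, skipping the most recent month.
momentum = prices.pct_change(252).shift(21)
fwd_ret = prices.pct_change(21).shift(-21)   # next-month return (the target)

panel = momentum.stack().rename("momentum").to_frame().join(
    fwd_ret.stack().rename("fwd_ret"), how="inner").dropna()

# Stage 3: forecasts -- a regularized linear model.
model = Ridge(alpha=10.0).fit(panel[["momentum"]], panel["fwd_ret"])
latest = momentum.iloc[-1].dropna().rename("momentum").to_frame()
preds = pd.Series(model.predict(latest), index=latest.index)

# Stage 4: positions -- equal-weight long the top decile, short the bottom decile.
n = len(preds) // 10
ranked = preds.sort_values()
positions = pd.Series(0.0, index=preds.index)
positions[ranked.index[-n:]] = 1.0 / n
positions[ranked.index[:n]] = -1.0 / n
```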
Data Stage
Here are some popular categories of data:
Market data includes prices, volumes, and returns. Clean and standardized, it gives you momentum, volatility, and liquidity signals. The downside: everyone has it, so edges get competed away quickly.
Fundamental data comes from financial statements: balance sheets, income statements, cash flows. It captures business performance but arrives quarterly with delays. Contrary to popular belief, even crypto has fundamental data. It just requires more intelligent parsing to get it to a point where it resembles equity fundamental data. However, one of the more annoying parts of crypto is that most tokens do not have a clear relationship with the profits of the underlying business!
Alternative data is the newer frontier: sentiment from news, satellite imagery, credit card spending. Messy but potentially less crowded.
Features Stage
A feature represents some relationship that might predict future returns either as a standalone or when used with other features.
Feature engineering is mostly art based on financial intuition, educated priors, and understanding of market workings. The academic literature has documented many, many characteristics with predictive power. We will probably cover a very significant portion of them in our upcoming articles.
Here are some examples of very popular (mid frequency) features, with a short sketch after the list of how a few of them might be computed:
Value: Is the stock cheap relative to its fundamentals? Ratios like book-to-market or earnings yield capture this.
Momentum: Has the stock been going up or down? Past returns over various time windows.
Quality: Is the company well-managed? Profitability, return on equity, debt levels.
Size: Is it a large company or small? Market capitalization.
Volatility: How risky is it? Historical return variability.
Liquidity: How easily can you trade it? Turnover, trading volume, bid-ask spreads.
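Below is an illustrative sketch of how a few of these characteristics might be computed. It assumes daily prices and volume DataFrames (dates by tickers) and point-in-time book_value, net_income, market_cap, and shares_out DataFrames aligned the same way; all of the input names are hypothetical.

```python
import numpy as np
import pandas as pd

def value_book_to_market(book_value: pd.DataFrame, market_cap: pd.DataFrame) -> pd.DataFrame:
    return book_value / market_cap                  # higher = "cheaper"

def quality_roe(net_income: pd.DataFrame, book_value: pd.DataFrame) -> pd.DataFrame:
    return net_income / book_value                  # return on equity

def size_log_mcap(market_cap: pd.DataFrame) -> pd.DataFrame:
    return np.log(market_cap)                       # log tames the heavy right tail

def volatility_3m(prices: pd.DataFrame) -> pd.DataFrame:
    return prices.pct_change().rolling(63).std()    # ~3 months of daily returns

def liquidity_turnover(volume: pd.DataFrame, shares_out: pd.DataFrame) -> pd.DataFrame:
    return (volume / shares_out).rolling(21).mean() # ~1 month average turnover
```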
Building Features
Selection. Choose which variables to include. It is easier to start with these established characteristics before inventing new ones. Novel features often look great in historical tests but fail to work going forward.
Standardization. Raw features come in wildly different units: market cap in billions, return volatility as a decimal, book-to-market as a ratio. Feed these directly into a model and the large numbers dominate, regardless of predictive value. Standardization puts all features on a comparable scale. Why does this matter so much? Because many algorithms assume features are similarly scaled. Ridge regression penalizes large coefficients, but if market cap is measured in billions while volatility is measured in decimals, the “large” coefficients will always be on the smaller-scaled variables. Your regularization will be systematically wrong. Extreme values also distort estimates. A single stock with an unusually high book-to-market ratio can pull the entire regression line toward it. This is where winsorization helps. Winsorizing means capping extreme values at a threshold, say the 1st and 99th percentiles. Instead of letting outliers dominate, you contain their influence.
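A minimal sketch of this preprocessing, assuming the feature comes as a dates-by-tickers DataFrame and that we winsorize and z-score cross-sectionally (within each date):

```python
import pandas as pd

def winsorize_and_zscore(feature: pd.DataFrame,
                         lower: float = 0.01,
                         upper: float = 0.99) -> pd.DataFrame:
    def one_day(row: pd.Series) -> pd.Series:
        lo, hi = row.quantile(lower), row.quantile(upper)
        clipped = row.clip(lo, hi)                          # contain the outliers
        return (clipped - clipped.mean()) / clipped.std()   # put on a comparable scale
    return feature.apply(one_day, axis=1)
```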
Interactions. Consider smart interactions between features. Some features are not predictive of returns on their own, but become predictive when combined with another feature. For example, momentum may not be a strong predictor as a standalone feature, but when used together with short interest, it may be very powerful.
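A sketch of the momentum and short interest example, assuming both inputs are already standardized dates-by-tickers DataFrames; the specific pairing is purely illustrative:

```python
import pandas as pd

def interaction(feat_a: pd.DataFrame, feat_b: pd.DataFrame) -> pd.DataFrame:
    # Element-wise product; pandas aligns on index/columns and fills mismatches
    # with NaN, so both inputs should already be winsorized and standardized.
    return feat_a * feat_b

# e.g. mom_x_si = interaction(momentum_z, short_interest_z)  # hypothetical names
```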
Dimension Reduction. As you begin to work on features, you will build up a “feature pool”, and with it you will face the “curse of dimensionality.” This arises because each feature adds a parameter to estimate, each estimate carries error, and those errors compound. At this point, many people will ask you to use PCA and call it a day. I strongly advise you to think clearly about what you are doing. PCA may capture 90% of the variation in your features, and yet that 90% may have nothing to do with returns. You are often better off running some kind of feature selection methodology.
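One simple alternative to blind PCA, sketched below: fit a Lasso on the full feature pool and keep only the features with non-zero coefficients. The data is synthetic, with three informative columns planted among noise, purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n_obs, n_feats = 5_000, 50
X = pd.DataFrame(rng.normal(size=(n_obs, n_feats)),
                 columns=[f"f{i}" for i in range(n_feats)])
# Only the first three columns carry signal; the other 47 are pure noise.
y = 0.10 * X["f0"] - 0.08 * X["f1"] + 0.05 * X["f2"] + rng.normal(0, 1, n_obs)

lasso = LassoCV(cv=5).fit(X, y)
selected = X.columns[lasso.coef_ != 0]
print(f"kept {len(selected)} of {n_feats} features:", list(selected))
```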
Choosing Your Model Architecture
Once your features are prepared, you need to choose an algorithm. There is no universally best model. Each has strengths that make it suited for different situations.
Linear Models with Regularization
Ridge regression, Lasso, and Elastic Net are the workhorses of financial ML. They are fast to train, easy to interpret, and remarkably effective when combined with proper regularization.
Ridge shrinks all coefficients toward zero but keeps every feature. Use it when you believe many features contribute small amounts of predictive power.
Lasso can shrink coefficients exactly to zero, effectively selecting which features matter. Use it when you suspect only a handful of features are truly predictive.
Elastic Net combines both, useful when features are correlated and you want both shrinkage and selection.
These models are your baseline. If a fancier approach cannot beat a well-tuned Elastic Net, the complexity is not worth it. A common criticism of linear models is that they cannot capture non-linear relationships or interactions between features. Well then, just introduce the non-linearity yourself as a new feature, e.g. new_feature = feature_a x feature_b!
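A minimal sketch of that baseline on synthetic data, with the interaction injected by hand as an extra column and ElasticNetCV choosing the penalty by cross-validation; all numbers are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
n = 10_000
X = pd.DataFrame({
    "momentum": rng.normal(size=n),
    "short_interest": rng.normal(size=n),
    "value": rng.normal(size=n),
})
# Introduce the non-linearity yourself, as an extra column.
X["mom_x_si"] = X["momentum"] * X["short_interest"]

# Synthetic returns load on the interaction and on value, not on momentum alone.
y = 0.05 * X["mom_x_si"] + 0.03 * X["value"] + rng.normal(0, 1, n)

model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(X, y)
print(dict(zip(X.columns, model.coef_.round(4))))
```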
Tree-Based Methods
Random Forests and Gradient Boosted Trees (XGBoost, LightGBM) excel at capturing nonlinear relationships and interactions automatically.
Random Forests are harder to overfit; XGBoost often achieves better performance but demands more tuning. Use trees when you suspect interactions matter, for example, if momentum works differently for small versus large stocks.
The common textbook “tradeoff” is less interpretability, but honestly, that is no longer true. There are now many open source libraries (SHAP, for example) that make tree-based models quite interpretable.
The real tradeoff is that tree-based methods tend to be much more computationally expensive because you are training M (shallow) models instead of 1, and when forecasting, you need to aggregate forecasts from M models instead of 1.
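Here is a sketch of a tree-based model picking up exactly that kind of interaction without being told about it, using scikit-learn's HistGradientBoostingRegressor as a stand-in for XGBoost or LightGBM; the synthetic signal is deliberately inflated so the effect is visible.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 20_000
X = pd.DataFrame({
    "momentum": rng.normal(size=n),
    "size": rng.normal(size=n),
})
# Momentum only "works" for small stocks here -- an interaction nobody told
# the model about. The signal is exaggerated so the effect shows up clearly.
y = 0.3 * X["momentum"] * (X["size"] < 0) + rng.normal(0, 1, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
gbt = HistGradientBoostingRegressor(max_depth=3, max_iter=100,
                                    learning_rate=0.05).fit(X_tr, y_tr)
print("out-of-sample R^2:", round(gbt.score(X_te, y_te), 4))
```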
Neural Networks
Neural networks can learn complex patterns but require substantially more data and tuning. They are extremely powerful and will often be the best models in the hands of an experienced team. Yet they will almost certainly fail in the hands of beginners.
Neural networks are extremely “information” hungry. In finance, where signal-to-noise is extremely low, very deep networks with more parameters to estimate can simply end up fitting those extra parameters to noise.
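If you do experiment, a deliberately small and heavily regularized network is the sensible starting point. Here is a sketch using scikit-learn's MLPRegressor as a stand-in for a proper deep learning stack, on synthetic data with an exaggerated signal.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(20_000, 10))
# A weak non-linear signal buried in noise, for illustration only.
y = 0.2 * np.tanh(X[:, 0]) - 0.1 * X[:, 1] * X[:, 2] + rng.normal(0, 1, 20_000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
mlp = MLPRegressor(hidden_layer_sizes=(16, 8),   # deliberately small network
                   alpha=1.0,                    # L2 weight penalty
                   early_stopping=True,          # stop before memorizing noise
                   max_iter=500,
                   random_state=0).fit(X_tr, y_tr)
print("out-of-sample R^2:", round(mlp.score(X_te, y_te), 4))
```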
We will spend a substantial amount of time going forward exploring all the many different architectures of neural networks in finance, and the ways in which they can actually work in our upcoming articles.
Selecting A Model
Start with regularized linear models. They are fast, interpretable, and establish a solid baseline. Move to trees if you suspect you have a lot of useful interactions between features and have enough data. Consider neural networks only with very large datasets after exhausting simpler approaches, and if you are fairly certain you “know what you are doing”.
The difference between models often matters less than proper feature preprocessing and rigorous out-of-sample testing. A well-executed Lasso beats a poorly-tuned neural network every time.
Advanced Tricks
You will learn, with experience and time, that you can fold a lot of your important objectives directly into your models’ loss function. We will cover this in detail in later articles.
Targets For Forecasting
As mentioned above, most publications will only talk about predicting expected returns, or “variants” of expected returns. This is the most “obvious” way of monetizing your forecasts; however, it is also very limiting and sometimes very noisy.
The intuition I have for you guys is the following:
Every source of signals is folded into the returns (e.g. returns moving due to pairs trading, momentum, index rebal, calendar effects, etc).
When you predict returns, you are forced to implicitly predict the interaction of every source of signals in your returns. This is, for obvious reasons, extremely noisy.
There are times when you can determine that returns are LARGELY going to be driven by one very specific signal. For example, when an earnings revision drops, you know returns in the COMING session are going to be largely influenced by this revision, barring any other contradictory news. Hence, you can actually train a machine learning model to predict the earnings revision ahead of time, so that you can generate returns from THIS source of signal, ignoring all else.
There are many, many such spots where you DO NOT want to be predicting returns as the target variable, and coming up with these spots is an important part of a forecasting process.
It (hopefully) goes without saying that choosing your features should be based on how you think your features are going to interact with or predict your target!
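In practice, swapping the target can be as simple as changing which column you fit against. The sketch below assumes a hypothetical long-format panel with fwd_return and fwd_revision columns; both names are purely illustrative.

```python
import pandas as pd
from sklearn.linear_model import Ridge

def fit_signal(panel: pd.DataFrame, feature_cols: list, target_col: str,
               alpha: float = 10.0) -> Ridge:
    # `panel` is a long-format DataFrame with one row per (date, stock),
    # already preprocessed (winsorized, z-scored) and free of lookahead.
    data = panel.dropna(subset=feature_cols + [target_col])
    return Ridge(alpha=alpha).fit(data[feature_cols], data[target_col])

# Same features, different label:
# return_model   = fit_signal(panel, feats, "fwd_return")    # noisy catch-all target
# revision_model = fit_signal(panel, feats, "fwd_revision")  # isolated driver
```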
From Predictions to Trades
Once you have your forecasts, either produced directly in expected-return space or converted into it, you need to turn your predictions into positions. In many large firms or pods, this process or step is called “monetization”.
A simple approach: sort stocks by predicted return, buy the top 10%, sell short the bottom 10%. This “long-short” portfolio captures the spread between your best and worst predictions.
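Packaged as a reusable function, a sketch of that decile long-short might look like this, with longs and shorts each taking half the gross exposure so the book is dollar neutral.

```python
import pandas as pd

def long_short_weights(preds: pd.Series, frac: float = 0.10) -> pd.Series:
    # Equal-weight long the top `frac` of predictions and short the bottom
    # `frac`; longs sum to +0.5 and shorts to -0.5, i.e. dollar neutral.
    ranked = preds.dropna().sort_values()
    n = max(int(len(ranked) * frac), 1)
    weights = pd.Series(0.0, index=preds.index)
    weights[ranked.index[-n:]] = 0.5 / n
    weights[ranked.index[:n]] = -0.5 / n
    return weights
```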
Just be careful and take note that predictive accuracy and trading performance don’t always align. A model with better predictions might produce worse returns if it concentrates in expensive-to-trade stocks or generates excessive turnover. Evaluate on what matters for your actual process.
A Few Simple Rules
This framework works best when you have a large universe, proper regularization throughout, and rigorous out-of-sample testing with no lookahead biases.
I encourage you to:
Start with established features. Don’t invent new features until you’ve extracted what you can from proven ones.
Regularize everything. Never run unconstrained regression with many features.
Preprocess features carefully. Winsorize outliers, standardize scales, consider rank transformation.
Run meaningful dimensionality reduction. Have a good feature selection process!
Evaluate on what matters for trading. As mentioned above, accuracy and after-cost PnL are not necessarily always in agreement with each other!
Conclusion
Forecast signals are one of the atomic units of systematic investing. Building them requires understanding the full pipeline: data, features, forecasts, positions.
Remember that when you don’t know what you are doing, financial data rewards constraint over complexity. Regularization transforms models from “worse than useless” to genuinely predictive. Start simple, regularize aggressively, and add complexity only when simpler approaches are exhausted.
We will eventually cover more complex models and break down monetization in depth: how to convert ML forecasts into portfolio weights without falling into optimization traps.

