All The Forms A Signal Can Take
Introduction
People ask about the “right” way to build signals, as if there were one canonical answer that separates the professionals from the amateurs.
There isn’t. But there IS a framework that I think is genuinely useful, and it would have saved me months of confusion between “signals”, “alphas”, “forecasts” and “predictions”. These terms are all overloaded: they can stand in for one another, and the same name can mean different things in different places.
A Simple Taxonomy
The core idea is this: every signal is either a “feature” or a “forecast”. The thread tying features and forecasts together is that both are expected to have some predictive relationship with forward returns in order to be useful.
A feature is straightforward: you expect it to serve as an input to some transformation or machine learning model that will output a “forecast”.
A “forecast” can be implicit (e.g. a formulaic alpha model that outputs portfolio weights) or explicit (a model trained to predict some target, outputting its predictions).
This allows every signal you build to take one of five forms. Understanding which form you are using, and what it can and cannot do, is foundational and also makes communication that MUCH clearer.
The Five Forms
Form 1: Everything Is A Feature
Signals designed as inputs to machine learning models. No portfolio weight constraints. These are data transformed for model consumption: z-scores, percentile ranks, industry-relative metrics, binary flags, etc. Anything you need to create so that the data is amenable to YOUR transformation or machine learning pipeline.
Raw Data -> Feature Engineering -> Features (Form 1) -> ML Model -> Predictions
Statistical arbitrage practitioners will often just think of everything as a feature. This isn’t just compartmentalization: in essence, you could model everything as features to a giant machine learning model, have it learn to predict returns, and call it a day. It does work, but it requires significant infrastructure and modelling skill.
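A minimal sketch of Form 1 feature engineering for one date's cross-section, assuming pandas and a hypothetical table with `ticker`, `industry` and `raw` columns (the names and toy values are illustrative). It produces four of the feature types mentioned above: a cross-sectional z-score, a percentile rank, an industry-relative value and a binary flag. None of them is constrained to look like portfolio weights; they only need to suit the downstream model.

```python
import pandas as pd

def make_features(df: pd.DataFrame) -> pd.DataFrame:
    """Turn one date's raw metric into Form 1 features.

    Assumes hypothetical columns: 'ticker', 'industry', 'raw'.
    """
    out = df[["ticker"]].copy()
    raw = df["raw"]
    # Cross-sectional z-score: (x - mean) / std across all instruments on this date
    out["zscore"] = (raw - raw.mean()) / raw.std(ddof=0)
    # Percentile rank in (0, 1]
    out["pct_rank"] = raw.rank(pct=True)
    # Industry-relative: subtract the industry average of the raw metric
    out["ind_rel"] = raw - df.groupby("industry")["raw"].transform("mean")
    # Binary flag: above the cross-sectional median or not
    out["above_median"] = (raw > raw.median()).astype(int)
    return out

example = pd.DataFrame({
    "ticker": ["A", "B", "C", "D"],
    "industry": ["tech", "tech", "energy", "energy"],
    "raw": [1.0, 3.0, 2.0, 6.0],
})
print(make_features(example))
```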
Form 2: Implicit Forecast, Dollar-Neutral, Leverage-1
This is the workhorse. The most common form for backtesting. Every day, your long positions sum to +0.5 and your short positions sum to -0.5. Net exposure is zero. Gross exposure is one.
Most formulaic alphas implicitly take this structure.
If you get confused reading this, it helps to visualize the scores/predictions as a panel, i.e. a TxN matrix with one row per date (T) and one column per instrument (N), like the following:
| | instrument 0 | instrument 1 | instrument 2 |
|---|---|---|---|
| datetime 1 | 0.1 | 0.2 | -0.3 |
| datetime 2 | 0.2 | 0.2 | -0.4 |
| datetime 3 | 0.2 | -0.1 | -0.1 |
```python
import numpy as np

def construct_form2_signal(scores):
    # scores: 1-D array holding one date's cross-section of raw scores
    # Demean for dollar neutrality (weights sum to 0)
    weights = scores - scores.mean()
    # Normalize for leverage-1 (absolute weights sum to 1)
    weights = weights / np.abs(weights).sum()
    return weights
```

The quintessential example here is scores = rank((vwap - close) / (close + epsilon)). You bet on instruments in proportion to how far they are from their VWAP. It is a cross-sectional, market-neutral bet that instruments far away from their VWAP will have returns of larger magnitude than those nearer to their VWAP. My hypothesis for why this alpha works practically everywhere is that execution algorithms structurally target VWAP.
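A minimal sketch tying the panel view to the Form 2 construction: it computes the VWAP score on a whole TxN panel and normalizes each row (each date) across instruments. It assumes pandas DataFrames `vwap` and `close` indexed by datetime with one column per instrument; `epsilon` and the toy prices are illustrative. The thing to get right is the axis: demeaning and normalizing must happen across instruments within a date (`axis=1`), not down a column through time.

```python
import pandas as pd

def vwap_alpha_form2(vwap: pd.DataFrame, close: pd.DataFrame, epsilon: float = 1e-9) -> pd.DataFrame:
    # Raw score: distance of close from VWAP, scaled by price
    raw = (vwap - close) / (close + epsilon)
    # Cross-sectional rank within each date (row), across instruments
    scores = raw.rank(axis=1)
    # Form 2 per row: demean across instruments, then scale so |w| sums to 1
    weights = scores.sub(scores.mean(axis=1), axis=0)
    weights = weights.div(weights.abs().sum(axis=1), axis=0)
    return weights

idx = pd.Index(["datetime 1", "datetime 2"], name="datetime")
close = pd.DataFrame([[10.0, 20.0, 30.0], [11.0, 19.0, 31.0]], index=idx)
vwap = pd.DataFrame([[10.5, 19.5, 30.0], [10.0, 19.5, 30.5]], index=idx)
w = vwap_alpha_form2(vwap, close)
print(w)
print(w.sum(axis=1))        # net exposure per date
print(w.abs().sum(axis=1))  # gross exposure per date
```

Every row of the result has long weights summing to +0.5 and short weights summing to -0.5, which is exactly the Form 2 shape described above.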
Why is this form so popular? Because it’s convenient. You can calculate PnL directly by multiplying weights times forward returns and summing. No external normalization needed. Just signal times returns equals PnL.
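A minimal sketch of that calculation, assuming a weights panel and a close-price panel that share the same datetime index and instrument columns (hypothetical toy data). The detail worth getting right is alignment: weights formed at date t are multiplied by the return from t to t+1, which is why the forward return is computed with `shift(-1)`. The last date has no forward return, so its PnL is left as NaN.

```python
import pandas as pd

def form2_pnl(weights: pd.DataFrame, close: pd.DataFrame) -> pd.Series:
    # Forward return from t to t+1, stored on row t
    fwd_ret = close.shift(-1) / close - 1
    # Signal times returns, summed across instruments, is the daily PnL
    # (as a fraction of capital, since gross exposure is 1)
    return (weights * fwd_ret).sum(axis=1, min_count=1)

close = pd.DataFrame({"A": [100.0, 110.0, 99.0], "B": [50.0, 50.0, 55.0]})
weights = pd.DataFrame({"A": [0.5, -0.5, 0.5], "B": [-0.5, 0.5, -0.5]})
print(form2_pnl(weights, close))
```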
The constraint, and this is the part people miss, is that you cannot express any view about market direction. More on this later.
Form 3: Implicit Forecast, Dollar-Neutral, Variable Leverage
Like Form 2, but the absolute sum of weights can vary from 0 to 1. This lets you express “I have no opinion today” by setting leverage to zero, or “I have partial conviction” by using leverage less than one.
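```python
import numpy as np

def construct_form3_signal(scores, confidence):
    # Start with Form 2: demean, then normalize to leverage-1
    weights = scores - scores.mean()
    weights = weights / np.abs(weights).sum()
    # Scale by a scalar confidence in [0, 1]; gross leverage becomes that confidence
    weights = weights * np.clip(confidence, 0, 1)
    return weights
```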
You can reduce your cross-sectional bet size, but you still cannot take a directional market view. You can express uncertainty. You cannot express bullishness or bearishness.
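One way to see what Form 3 can and cannot do is to measure the net and gross exposure of its output. This sketch repeats the Form 3 construction so it runs on its own; the `exposures` helper and the scores are hypothetical. Whatever confidence you pass, net exposure stays at zero and only gross exposure moves.

```python
import numpy as np

def construct_form3_signal(scores, confidence):
    weights = scores - scores.mean()               # dollar neutral
    weights = weights / np.abs(weights).sum()      # leverage-1
    return weights * np.clip(confidence, 0, 1)     # shrink gross by confidence

def exposures(weights):
    # Net = sum of signed weights, gross = sum of absolute weights
    return weights.sum(), np.abs(weights).sum()

scores = np.array([0.3, 0.1, -0.2, 0.6])
for confidence in (0.0, 0.4, 1.0):
    net, gross = exposures(construct_form3_signal(scores, confidence))
    print(f"confidence={confidence:.1f}  net={net:+.3f}  gross={gross:.3f}")
```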
Form 4: Implicit Forecast, Non Dollar-Neutral, Variable Leverage
This is the form that can actually express a market view. Your net exposure can range from -1 to +1, and your gross exposure can vary as well. You can be net long, net short, or flat. You can size your conviction.
```python
import numpy as np

def construct_form4_signal(scores, market_view):
    # Normalize to bounded weights (no demeaning, so net exposure survives)
    weights = scores / np.abs(scores).sum()
    # Shift by market view (-1 to +1), spread equally across instruments
    weights = weights + market_view / len(scores)
    # Rescale if gross leverage exceeds 1; per-weight clipping alone would not bound it
    gross = np.abs(weights).sum()
    if gross > 1:
        weights = weights / gross
    return weights
```

Why does this form exist? Because sometimes you actually DO have a view on direction. You want to be net long going into a catalyst. You want to be flat before a binary event. You want to be net short when your model says the market is overextended.
Form 2 and Form 3 cannot express any of this. They are structurally incapable of it. The constraint that makes them dollar-neutral is the same constraint that hedges away your timing alpha.
This is the form I see people reach for when they shouldn’t, and avoid when they should. If your alpha comes from relative value (stock A vs stock B), use Form 2 or Form 3. If your alpha comes from directional timing (I want event/factor exposure now, not later), you need Form 4.
Note that you can set market_view to 0, in which case Form 4 signals are also amenable to trends, breakouts and the like. The main point here is not necessarily the market_view, but that we do not cross-sectionally demean the scores.
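To make the demeaning point concrete, here is a minimal sketch comparing the two constructions on a day when every instrument has a positive time-series trend score (made-up numbers). Form 2 demeans the common component away and ends up flat; Form 4 with market_view=0 keeps it and ends up net long. The Form 4 version here rescales when gross leverage exceeds 1, which is an assumption about how to enforce the leverage bound.

```python
import numpy as np

def construct_form2_signal(scores):
    weights = scores - scores.mean()           # cross-sectional demean
    return weights / np.abs(weights).sum()     # leverage-1

def construct_form4_signal(scores, market_view):
    weights = scores / np.abs(scores).sum()    # no demeaning
    weights = weights + market_view / len(scores)
    gross = np.abs(weights).sum()
    if gross > 1:                              # keep gross exposure <= 1
        weights = weights / gross
    return weights

# Hypothetical trend scores: everything is trending up today
trend_scores = np.array([0.8, 0.5, 0.2, 0.1])
w2 = construct_form2_signal(trend_scores)
w4 = construct_form4_signal(trend_scores, market_view=0.0)
print("Form 2 net exposure:", round(w2.sum(), 3))
print("Form 4 net exposure:", round(w4.sum(), 3))
```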
Form 5: Explicit Forecasts, ML Predictions
The outputs of machine learning models. These are not portfolio weights. They require post-processing to become actionable. Below is an example of a very simple post-processing that will get you a long-short dollar-neutral portfolio.
```python
import numpy as np

def predictions_to_weights(predictions):
    # predictions: 1-D array of one date's model outputs
    # Demean for dollar neutrality
    weights = predictions - predictions.mean()
    # Normalize for leverage-1
    weights = weights / np.abs(weights).sum()
    return weights
```

Here’s something that matters more than most practitioners realize: the question of what to predict. There are many variations, and they produce very different outcomes. A simple example of such a decision is whether to predict returns and then sort them into ranks, or to predict ranks directly with a ranker model.
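A minimal illustration of how much that choice can matter, using made-up predicted returns for one date. It is not a ranker model; it only contrasts using the predicted returns as they are with predicting returns and then sorting them into ranks, both post-processed into Form 2 weights. With the raw predictions, the single outlier (E) takes the entire long side of the book; with ranks, its weight depends only on its position in the ordering.

```python
import pandas as pd

def to_form2(x: pd.Series) -> pd.Series:
    w = x - x.mean()            # dollar neutral
    return w / w.abs().sum()    # leverage-1

# Hypothetical predicted next-day returns for one date, with one outlier (E)
preds = pd.Series([0.001, 0.002, -0.001, -0.002, 0.050], index=list("ABCDE"))

raw_weights = to_form2(preds)            # use predicted returns as they are
rank_weights = to_form2(preds.rank())    # sort into ranks first
print(pd.DataFrame({"raw": raw_weights, "rank": rank_weights}).round(3))
```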
On Overloading
As you can see, the word “signal” is hopelessly overloaded.
When someone says “signal,” they might mean:
Features going into a model (Form 1)
Explicit forecasts/predictions coming out of a model (Form 5)
Implicit forecasts of portfolio weight vectors (Forms 2, 3, or 4)
All of these get called “signals” depending on who you’re talking to and what shop they came from. The beauty of classifying them this way is realizing that:
Form 2, Form 3, AND Form 4 signals can ALSO be used directly as features because they are already normalized and bounded.
Features (Form 1) and targets/forecasts (Form 5) need to be transformed into Form 2, 3 or 4 if you want to backtest them or add them together in an optimizer.
This creates a natural connection between traditional signal construction and ML pipelines.
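A minimal sketch of that connection for one date: two hypothetical Form 1 features are each ranked and turned into Form 2 weights, added together, then re-normalized. The re-normalization matters because a sum of dollar-neutral vectors stays dollar-neutral, but its gross exposure can shrink when the signals disagree (here it falls from 1 to 0.25). The feature names, values and equal blend weights are illustrative assumptions.

```python
import pandas as pd

def to_form2(x: pd.Series) -> pd.Series:
    w = x - x.mean()              # dollar neutral
    return w / w.abs().sum()      # leverage-1

# Hypothetical Form 1 features for one date
value = pd.Series([0.9, 0.1, 0.4, 0.6], index=["A", "B", "C", "D"])
momentum = pd.Series([-0.2, 0.3, 0.5, 0.0], index=["A", "B", "C", "D"])

# Each feature becomes a Form 2 signal before they are combined
blend = 0.5 * to_form2(value.rank()) + 0.5 * to_form2(momentum.rank())
# The blend is still dollar-neutral, but its gross exposure shrank: re-normalize
blend = to_form2(blend)
print(blend.round(3))
```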
Over here, in SysLS land, when you see the word “signal” or “alpha”, you should assume it mostly refers to features and occasionally to explicit forecasts.
This depends on how far along the investment process we are: investment processes are hierarchical and allow for many “layers” of data -> features -> model -> forecasts. “Signal” works as an umbrella term for us because one model’s forecast/output can be another model’s feature, and the common thread of a signal is that it is broadly predictive of (correlated with) future returns.
Structural Differences
Dollar-Neutral vs. Non Dollar-Neutral Signals (Form #2, #3, #4)
The constraint that makes dollar-neutral signals blind to market direction is not a bug - it is a feature for the use case these signals are designed for. Cross-sectional signals bet on relative value. You believe stock A will outperform stock B. You do not care whether both go up or both go down. Your alpha comes from the spread.
This makes dollar-neutral signals naturally hedged against market moves. The correlation between a Form 2 signal’s PnL and the market is near zero: you are not taking market risk. You are taking relative value (basis) risk.
This is the correct form for factor AND traditional “formulaic alpha” portfolios. Momentum, value, quality - these are all cross-sectional concepts. You want the cheap stocks relative to the expensive ones. You want the high-momentum stocks relative to the low-momentum ones. The absolute level of the market is noise. It is also the correct form for statistical arbitrage. If you believe Coke will outperform Pepsi, you go long Coke and short Pepsi. You do not care whether the beverage sector goes up or down.
Form 2: Dollar-Neutral, Leverage-1
When you are testing a new idea, Form 2 is your friend. It is the signal workhorse in long-short statistical arbitrage. The direct PnL calculation makes iteration fast. The consistent leverage makes comparison across alphas straightforward.
Form 3: Dollar-Neutral, Variable Leverage
Some strategies have clear conviction variation. You know more on some days than others. Maybe your model confidence is lower during earnings season. Maybe your signal quality degrades during high-volatility regimes. Form 3 lets you express this. When confidence is low, reduce gross exposure. When confidence is high, run at full leverage. You are still dollar-neutral - you still cannot express market views, but you can modulate your bet size.
The danger is overfitting the confidence scaling. If your confidence metric is itself a signal, you are now running two signals stacked on top of each other, and the backtest will look better than reality.
Form 4: Non Dollar-Neutral, Variable Leverage
If your alpha is event-driven (you want to go long on positive earnings surprises, be flat before announcements, and go short on negative surprises), dollar-neutral forms cannot express this. You need a time-series approach with variable net exposure. You need the ability to say “I want to be net long today” or “I want to be flat today.”
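A minimal sketch of an event-driven Form 4 signal for a single stock, assuming a hypothetical daily `surprise` series (a standardized earnings surprise on announcement days, NaN otherwise) and a boolean `pre_announcement` series marking days just before a report. The holding window, the cap at +/-1 and the flat-before-announcement rule are illustrative choices. Nothing is demeaned across instruments, so the position is free to be long, short or flat; a multi-stock book would still need its own gross-leverage normalization across names.

```python
import numpy as np
import pandas as pd

def event_weights(surprise: pd.Series, pre_announcement: pd.Series, hold_days: int = 5) -> pd.Series:
    # Hold each surprise for up to hold_days days, counting the announcement day
    held = surprise.ffill(limit=hold_days - 1).fillna(0.0)
    # Long positive surprises, short negative ones, sized by magnitude, capped at +/-1
    weights = held.clip(lower=-1.0, upper=1.0)
    # Flat right before the binary event
    return weights.mask(pre_announcement, 0.0)

idx = pd.RangeIndex(10, name="day")
surprise = pd.Series([0.5, np.nan, np.nan, np.nan, 1.5, np.nan, np.nan, np.nan, -0.4, np.nan], index=idx)
pre = pd.Series([False, False, False, True, False, False, False, True, False, False], index=idx)
print(event_weights(surprise, pre))
```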
Some Properties
Small universes break cross-sectional signals
When you have fewer than 50 assets, cross-sectional ranking becomes noisy. The difference between the top decile and bottom decile is not statistically meaningful. If you are trading a concentrated sector or a small market, cross-sectional signals are unreliable. You are better off with time-series approaches.
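A small Monte Carlo sketch of this, under a deliberately simple assumption: each asset's return is a weak loading on the signal plus independent Gaussian noise (made-up parameters, not estimates from real data). It measures how often the top-decile minus bottom-decile return has the right sign. With only a couple of names per decile the noise dominates; in a large universe the same weak signal shows up far more reliably.

```python
import numpy as np

def decile_spread_hit_rate(n_assets, n_trials=2000, signal_strength=0.05, seed=0):
    """Fraction of simulated days on which top-decile minus bottom-decile return is positive."""
    rng = np.random.default_rng(seed)
    k = max(1, n_assets // 10)  # names per decile
    hits = 0
    for _ in range(n_trials):
        signal = rng.standard_normal(n_assets)
        # Assumed return model: weak loading on the signal plus independent noise
        returns = signal_strength * signal + rng.standard_normal(n_assets)
        order = np.argsort(signal)
        spread = returns[order[-k:]].mean() - returns[order[:k]].mean()
        hits += int(spread > 0)
    return hits / n_trials

for n in (20, 50, 500):
    print(f"{n:>4} assets: spread positive on {decile_spread_hit_rate(n):.0%} of days")
```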
ML predictions are not weights
A subtle but common pitfall: treating raw ML predictions as portfolio weights. They are not. Predictions must be post-processed into dollar-neutral, leverage-normalized form before use.
If your model outputs a vector of expected returns, you cannot simply go long the top five and short the bottom five at equal weight. The predictions need to be demeaned and normalized. Skipping this step introduces unintended biases and inconsistent leverage.
There is A LOT of alpha to be gained by thinking very hard about how to do this well, and it is something we will discuss in later articles.
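A minimal sketch of the biases that post-processing removes, using made-up predictions for two dates. Used directly as weights, the raw predictions carry an accidental net long tilt and a gross exposure that changes from day to day only because the model's output scale changed. After demeaning and normalizing (the same steps as the Form 5 `predictions_to_weights` function, repeated so the snippet runs on its own), each date is dollar-neutral at leverage 1.

```python
import numpy as np

def predictions_to_weights(predictions):
    weights = predictions - predictions.mean()   # dollar neutral
    return weights / np.abs(weights).sum()       # leverage-1

# Hypothetical model outputs (expected returns) for two dates
day1 = np.array([0.004, 0.002, 0.001, -0.001])
day2 = np.array([0.020, 0.015, -0.005, 0.010])

for name, preds in (("day1", day1), ("day2", day2)):
    w = predictions_to_weights(preds)
    print(f"{name}: raw net={preds.sum():+.3f}, raw gross={np.abs(preds).sum():.3f}"
          f" | processed net={w.sum():+.3f}, processed gross={np.abs(w).sum():.3f}")
```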
Conclusion
Signals come in five forms and are either features or forecasts (implicit or explicit). The structural properties of each form determine what views you can express. Dollar-neutral forms cannot express market timing. Non dollar-neutral forms take on market risk.
Most of the confusion I see in signal construction comes from communication around mismatched forms or not understanding the taxonomy of signals.
Every Saturday, I hope to send out an article that is free and foundational, in hopes of setting up the basis for more complex discussions in further articles. If you liked what you read, sharing this with someone you think will appreciate it will go a long way.
Also, in the spirit of Christmas and at the request of a subscriber, I am offering a Christmas/New Year special discount for the yearly packages so you don’t miss all the upcoming articles of deep dives across the investment process.

