The Volume Signal That Rescued Mean-Reversion Stat-Arb After 2003
Introduction
Mean-reversion in equity residuals - betting that stocks which drift away from their sector will snap back - generated Sharpe ratios above 1.4 through the late 1990s. Then something broke. After 2003, the same signals that printed money for years started bleeding out. But there's a fix: weight your signals inversely by trading volume, and the strategy claws back a Sharpe of roughly 1.5 through 2007, including surviving the August 2007 quant meltdown.
The insight is simple: a stock that drops on low volume probably overreacted. A stock that drops on high volume might know something. Weight accordingly.
Context: The Mean-Reversion Setup
The core idea is classic stat-arb: decompose stock returns into market-factor exposure and a residual. If the residual mean-reverts, trade it as a contrarian signal. When a stock is cheap relative to its sector peers, buy it. When expensive, short it.
The execution matters more than the theory:
Universe: U.S. equities above $1B market cap (~1,400 stocks)
Horizon: daily signals, holding periods around 6-10 days
Reported Sharpe: 1.44 using PCA factors (1997-2007), but only 0.9 after 2003
With the volume-weighted ("trading-time") adjustment: Sharpe 1.51 from 2003-2007
Transaction costs: 10 bps round-trip assumed
The strategy is market-neutral by construction - you’re always long some stocks and short others, hedged to zero beta against the factors you’re using to de-mean the residuals.
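To make the hedging concrete, here is a minimal sketch - helper names like hedged_trade are mine, not the paper's - of the offsetting logic: for every dollar of stock you buy on a signal, you short beta dollars of the corresponding sector ETF (and vice versa).
# Minimal sketch of beta-hedged trade construction (illustrative names)
def hedged_trade(signals, betas, etf_of, dollars=1.0):
    """signals: {ticker: +1/-1/0}; betas: {ticker: sector beta}; etf_of: {ticker: ETF}."""
    stock_legs, etf_legs = {}, {}
    for ticker, sgn in signals.items():
        if sgn == 0:
            continue
        stock_legs[ticker] = sgn * dollars
        etf = etf_of[ticker]
        # Offset the factor exposure: short beta dollars of ETF per long dollar of stock
        etf_legs[etf] = etf_legs.get(etf, 0.0) - sgn * betas[ticker] * dollars
    return stock_legs, etf_legs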
Model and Prediction Target
The model has two pieces: factor decomposition and residual dynamics.
Factor Decomposition:
For each stock, regress returns on factor returns to extract the idiosyncratic component:
# For stock i on day t
R_i = beta_i1 * F_1 + beta_i2 * F_2 + ... + beta_im * F_m + X_i
# Factors can be:
# Option A: Sector ETF returns (one beta per stock)
# Option B: PCA eigenportfolios (top 15 eigenvectors of correlation matrix)
The PCA approach is data-driven - you extract factors from the correlation matrix of returns over a trailing window. The first eigenvector is basically the market (all positive weights, larger on low-vol stocks). Higher eigenvectors are long-short sector bets - the second is typically Energy vs. Financials/Real Estate, the third Utilities vs. Semiconductors.
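A minimal numpy sketch of the eigenportfolio construction (function name and arguments are mine): take the correlation matrix over the trailing window, keep the top eigenvectors, and divide each stock's weight by its volatility - the step that makes eigenportfolios volatility-weighted rather than cap-weighted.
import numpy as np

def eigenportfolios(returns, n_factors=15):
    """returns: (T, N) array of daily stock returns over the trailing window."""
    vols = returns.std(axis=0)                      # per-stock volatility
    corr = np.corrcoef(returns, rowvar=False)       # (N, N) correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)         # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_factors]           # top n_factors eigenvectors
    weights = top / vols[:, None]                   # volatility-weighted portfolios
    factor_returns = returns @ weights              # (T, n_factors) eigenportfolio returns
    return weights, factor_returns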
Residual Dynamics:
Model the cumulative residual X_i(t) as an Ornstein-Uhlenbeck mean-reverting process:
# Residual follows OU process
dX = kappa * (m - X) * dt + sigma * dW
kappa = mean-reversion speed (higher = faster reversion)
m = equilibrium level
sigma = volatility
tau = 1/kappa = characteristic time to mean-revert (with annualized kappa, that's 252/kappa trading days)
The key filtering step: only trade stocks where tau < 30 days. Slow mean-reversion means your holding period becomes too long, costs pile up, and the model's stationarity assumption breaks down.
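As a sanity check on the estimation pipeline described below, you can simulate an OU path with a known kappa and confirm that an AR(1) fit recovers the right tau; a minimal sketch, assuming daily steps and 252 trading days per year:
import numpy as np

rng = np.random.default_rng(0)
kappa_true, m_true, sigma, dt = 252 / 10, 0.0, 0.1, 1 / 252   # true tau = 10 days
X = np.zeros(5000)
for t in range(1, len(X)):
    # Euler discretization of dX = kappa*(m - X)*dt + sigma*dW
    X[t] = X[t-1] + kappa_true * (m_true - X[t-1]) * dt + sigma * np.sqrt(dt) * rng.standard_normal()

b, a = np.polyfit(X[:-1], X[1:], 1)     # AR(1): X[n+1] = a + b*X[n] + noise
kappa_hat = -np.log(b) * 252
print(f"estimated tau: {252 / kappa_hat:.1f} days (true: 10)")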
Prediction Target:
The s-score measures how far the residual has drifted from equilibrium in volatility-adjusted units:
# S-score definition
sigma_eq = sigma / sqrt(2 * kappa) # equilibrium std dev
s = (X - m_adjusted) / sigma_eq
m_adjusted = m - cross_sectional_mean(m) # remove model bias
This is your signal. High positive s-score = expensive relative to factor model = short candidate. High negative s-score = cheap = long candidate.
Data and Features
Data Setup:
15 industry sectors with corresponding ETFs (XLF, XLE, XLK, etc.)
Estimation window: 60 days (one earnings cycle)
Correlation matrix for PCA: 252-day trailing window
Market cap filter: >$1B applied at the trade date, not retroactively (avoids survivorship bias)
Feature Construction:
The feature set is minimal - this is not a kitchen-sink ML model:
# Step 1: Get factor returns (ETF or eigenportfolio)
F_sector = daily_return(ETF_for_stock_sector)
# Step 2: Run 60-day rolling regression
R_stock ~ alpha + beta * F_sector + residual
# Step 3: Cumulate residuals to get X process
X = cumsum(residuals)
# Step 4: Fit AR(1) to X series
X[n+1] = a + b * X[n] + noise
# Step 5: Extract OU parameters
kappa = -log(b) * 252
m = a / (1 - b)
sigma_eq = sqrt(var(noise) / (1 - b**2))
# Step 6: Compute s-score (normalized to zero mean across stocks)
s = -m / sigma_eq + cross_sectional_adjustment
(With an intercept in the Step 2 regression, the residuals sum to zero over the window, so X ends at zero and the score reduces to -m / sigma_eq.) The cross-sectional adjustment is important: if your model systematically thinks all stocks are cheap (or expensive), that's model bias, not signal.
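Putting Steps 2-6 together, a minimal self-contained numpy version (function and variable names are mine, and it assumes the AR(1) slope b lies in (0, 1)):
import numpy as np

def s_scores(stock_returns, factor_returns):
    """stock_returns: (60, N) daily returns; factor_returns: (60,) sector-factor series."""
    F = np.column_stack([np.ones_like(factor_returns), factor_returns])
    coefs, *_ = np.linalg.lstsq(F, stock_returns, rcond=None)   # alpha, beta per stock
    resid = stock_returns - F @ coefs
    X = np.cumsum(resid, axis=0)                                # cumulative residual process
    scores = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        b, a = np.polyfit(X[:-1, i], X[1:, i], 1)               # AR(1): X[n+1] = a + b*X[n]
        noise = X[1:, i] - (a + b * X[:-1, i])
        m = a / (1 - b)
        sigma_eq = np.sqrt(np.var(noise) / (1 - b**2))          # equilibrium std dev
        scores[i] = (X[-1, i] - m) / sigma_eq
    return scores - scores.mean()                               # cross-sectional de-mean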
Volume-Weighted Modification:
Here’s where the post-2003 rescue comes in:
# Standard return
R_t = (price[t] - price[t-1]) / price[t-1]
# Volume-weighted return
avg_volume = rolling_mean(daily_volume, 60)
R_bar_t = R_t * (avg_volume / daily_volume[t])
# Use R_bar instead of R in the regression
Low-volume day? The return gets amplified in the signal. High-volume day? The return gets dampened. The intuition: moves on low volume are more likely to be noise/overreaction, and more likely to revert.
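In pandas, assuming aligned prices and volume DataFrames with one column per ticker, the adjustment really is close to a one-line change:
import pandas as pd

# prices, volume: DataFrames indexed by date, one column per ticker (assumed inputs)
returns = prices.pct_change()
avg_volume = volume.rolling(60).mean()
adj_returns = returns * (avg_volume / volume)   # damp high-volume moves, amplify low-volume ones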
Trading Rules
The signal-to-trade mapping is bang-bang, not continuous:
# Entry thresholds
s_open_long = -1.25 # buy when s < -1.25
s_open_short = +1.25 # sell when s > +1.25
# Exit thresholds (asymmetric)
s_close_long = -0.50 # close long when s > -0.50
s_close_short = +0.75 # close short when s < +0.75
# Position sizing
# Equal dollar amount per position
# Leverage: 2x long + 2x short = "2+2"
position_size = portfolio_equity * leverage_factor / num_positions
No scaling by conviction, no partial entries. Either the s-score is extreme enough to trade, or it's not.
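Because entries and exits use different thresholds, the rule is stateful - what you do depends on whether you already hold a position. A minimal sketch (names are illustrative):
def next_position(s, current):
    """Bang-bang rule with asymmetric exits; current in {-1, 0, +1}."""
    if current == 0:
        if s < -1.25: return +1          # open long
        if s > +1.25: return -1          # open short
        return 0
    if current == +1:
        return 0 if s > -0.50 else +1    # close long once s recovers past -0.50
    return 0 if s < +0.75 else -1        # close short once s falls below +0.75
The asymmetry means positions are cut before the residual fully reverts to equilibrium.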
Does It Actually Work?
Performance by Strategy:
The numbers across different factor choices (all at 2+2 leverage, 10 bps round-trip):
1997-2007 (full period):
PCA 15 factors: Sharpe 1.44
Synthetic sector ETFs: Sharpe 1.1
2003-2007 (degraded period):
PCA 15 factors: Sharpe 0.9
Actual ETFs: Sharpe 0.9
ETF + trading-time: Sharpe 1.51
PCA + trading-time: no significant improvement
The trading-time modification specifically helps the ETF strategy. The authors speculate this is because ETFs are biased toward large-cap holdings, and the volume signal adds information about small/mid-cap names that the ETF betas don’t capture well. PCA eigenportfolios, being volatility-weighted rather than cap-weighted, already incorporate some of this information.
Mean-Reversion Statistics:
Median reversion time: 7.5 days
36% of tradeable signals have tau < 5 days
Average sector beta: ~0.96 (stocks track their sectors closely)
Average absolute daily alpha: 0.15%
Factor Concentration Varies Over Time:
The number of PCA factors needed to explain 55% of return variance ranges from 10 to 35 depending on market conditions. During high-VIX periods (late 2002, August 2007), variance concentrates in fewer factors - the market becomes “one big trade.” Strategy performance is better when you can explain most variance with fewer factors, because your model matches reality better.
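A minimal sketch of that concentration metric (numpy; the 55% threshold is the paper's, the function name is mine):
import numpy as np

def n_factors_for_variance(returns, threshold=0.55):
    """Count eigenvalues of the correlation matrix needed to explain `threshold` of variance."""
    corr = np.corrcoef(returns, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]   # descending eigenvalues
    explained = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(explained, threshold) + 1)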
August 2007 Stress Test
The strategy did experience the August quant meltdown, but with different severity by approach:
ETF strategies: ~10% drawdown August 6-10
PCA strategies: ~5% drawdown (more resilient)
Full recovery within 10 trading days
The pattern matches Khandani & Lo’s contrarian strategy when adjusted for leverage (they used 4+4, this paper used 2+2). What’s interesting is the sector breakdown: Technology and Consumer Discretionary got hit hardest, not Financials and Real Estate. This is consistent with the “unwinding theory” - stat-arb funds were liquidating their entire books, not just subprime-exposed positions.
How Much of This Survives Contact With The Real World?
Costs:
The 10 bps round-trip assumption is probably fine for a strategy that holds ~200 positions and rebalances daily. Today, with different market structure, you’d want to re-test with more conservative cost assumptions. The volume-time version helps here by trading less frequently on high-volume days where competition is fiercer.
Execution:
The paper assumes closing-price execution. In practice, you're competing with the entire market-on-close (MOC) flow. The s-score signals are calculated at the end of day, so there's no look-ahead in the backtest, but the execution assumption flatters the results.
Parameter Stability:
60-day estimation window with fixed entry/exit thresholds across all stocks and time periods. This is admirably simple and avoids overfitting, but also means you’re not adapting to changing market microstructure. The performance decay after 2003 suggests the static parameters became stale.
What’s Actually Robust:
The volume-weighted signal modification is the most interesting contribution. It’s a clean, intuitive adjustment that meaningfully improves out-of-sample performance and has a sensible interpretation. This is worth testing even if you don’t buy the rest of the setup.
What Traders Should Actually Do With This
Test the volume weighting on your existing signals. If you’re running any mean-reversion or contrarian strategy at daily frequency, try weighting returns by inverse volume before computing your signals. It’s a one-line change and the intuition is sound.
Track the factor concentration metric. Computing how many eigenvalues you need to explain X% variance is a useful regime indicator. When the market concentrates into fewer factors, mean-reversion within those factors should work better; when variance disperses across many factors, you’re fighting noise.
The s-score framework is a reasonable starting point for residual stat-arb. The OU model is simple enough to implement cleanly and the filtering by mean-reversion speed makes economic sense. Just don’t expect the same Sharpe ratios in current markets.
Be very skeptical of 2+2 leverage at daily frequency. The August 2007 drawdown was 5-10% in this backtest, but the authors had no competitor flow, no funding stress, no broker risk limits. Real-world drawdowns in similar strategies were much worse.
Minimal Reproduction Plan
Core hypothesis to test:
Cross-sectional equity mean-reversion (s-score based on residuals after factor exposure) generates positive risk-adjusted returns, and weighting by inverse volume improves performance.
Minimal viable data:
# Assets: S&P 500 constituents (or top 500 by market cap)
# Period: 2 years recent data (e.g., 2023-2024)
# Frequency: Daily close prices, daily volume
# Source: Yahoo Finance (free), or Sharadar/CRSP for cleaner data
# Sector assignments: GICS sectors, or use sector ETF tickers
Cheaper alternative: start with sector ETF prices only (SPY, XLF, XLE, XLK, etc.)
Feature construction (simplified):
# Step 1: Assign each stock to a sector
sector_map = {ticker: gics_sector for ticker in universe}
# Step 2: Get sector ETF returns
etf_returns = daily_returns(sector_etf)
# Step 3: Rolling 60-day regression
for each stock:
beta, residuals = rolling_regression(
stock_returns[-60:],
sector_etf_returns[-60:]
)
# Step 4: Cumulative residual
X = cumsum(residuals)
# Step 5: AR(1) fit
a, b, noise = fit_ar1(X)   # X[n+1] = a + b*X[n] + noise
kappa = -np.log(b) * 252
m = a / (1 - b)
sigma_eq = np.sqrt(np.var(noise) / (1 - b**2))   # matches Step 5 in Feature Construction
# Step 6: S-score
s_raw = (X[-1] - m) / sigma_eq
s = s_raw - cross_sectional_mean(s_raw) # de-mean
Target construction:
# Signal: s-score < -1.25 -> long, s-score > +1.25 -> short
# Target: next 5-10 day return of residual (or raw return)
# Success metric: correlation between signal and forward return
Model specification:
# No ML model - this is a rule-based signal
# Trading rule:
if s < -1.25 and kappa > 8.4:   # 8.4 = 252/30 annualized, i.e., tau < 30 days
position = +1 # long
elif s > +1.25 and kappa > 8.4:
position = -1 # short
else:
position = 0
# Portfolio: equal-weight all active positions, rebalance daily
Training and evaluation:
# Metrics to compute:
# - Sharpe ratio (daily returns annualized)
# - Max drawdown
# - Average holding period
# - Win rate
# - Compare: with vs. without volume weighting
Sanity checks:
# Before declaring success:
# - Signal exists in train AND test periods separately
# - Volume-weighted version outperforms standard version
# - Results survive removing top 10% most volatile names
# - Results survive with 20 bps round-trip costs (2x paper’s assumption)
# - Drawdown during high-VIX periods is manageable
# Apply to backtest
gross_pnl = sum(position * forward_return)
trading_cost = num_trades * avg_position * total_round_trip
net_pnl = gross_pnl - trading_cost
Success criteria:
# SUCCESS if:
# - Net Sharpe > 0.5 out-of-sample after costs
# - Volume-weighted version beats standard by >0.2 Sharpe
# - Max drawdown < 15%
# PARTIAL SUCCESS if:
# - Gross metrics positive but costs kill it
# - Signal exists but Sharpe < 0.5
# FAILURE if:
# - Signal flat or negative out-of-sample
# - Volume weighting makes no difference
Conclusion
We can decompose U.S. equity returns into systematic factors (either PCA eigenportfolios or sector ETFs) and idiosyncratic residuals, model the residuals as Ornstein-Uhlenbeck mean-reverting processes, and trade based on standardized deviation from equilibrium (s-score). The key contribution is showing that weighting signals by inverse trading volume rescues the strategy’s performance after 2003, achieving Sharpe 1.51 from 2003-2007 compared to 0.9 without the volume adjustment.
The methodology is clean, the backtest is honest about transaction costs and avoids survivorship bias, and the August 2007 analysis is genuinely useful for understanding how these strategies behave under stress. The volume-time modification is worth stealing even if you’re not running this exact strategy. The headline Sharpe ratios won’t replicate today - too many people are trading this edge - but the framework for thinking about factor decomposition and mean-reversion signals is sound.

