Systematic Long Short

The Statistical Factor Model That ACTUALLY Predicts Returns

Systematic Long Short
Feb 05, 2026

The Problems With PCA

Firstly, the factors have no stable meaning. PCA extracts whatever linear combination explains the most variance. This month “factor 2” might be long tech, short energy. Next month it might flip to long energy, short tech. The factors are mathematically correct but economically meaningless.

This is the “identity crisis.” Mathematically, the factors are identified only up to an orthogonal transformation: rotate or sign-flip the factors, apply the inverse transformation to the loadings, and the fitted returns are exactly the same. There’s no anchor that pins down which rotation is “correct.”
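To make the rotation argument concrete, here is a minimal NumPy sketch. The loadings `B`, factor returns `F`, and the dimensions are made-up illustrative values, not anything from the article's data. It only shows that a rotated pair of loadings and factors reproduces the same fitted returns.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stocks, n_factors, n_days = 50, 3, 250  # illustrative sizes (assumption)

B = rng.normal(size=(n_stocks, n_factors))  # hypothetical loadings
F = rng.normal(size=(n_factors, n_days))    # hypothetical factor returns

# Build a random orthogonal matrix Q from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(n_factors, n_factors)))

# Rotate the loadings one way and the factors the opposite way.
B_rot = B @ Q
F_rot = Q.T @ F

# The fitted returns are identical, so the data cannot tell the two apart.
same_fit = np.allclose(B @ F, B_rot @ F_rot)
print("Fitted returns unchanged by rotation:", same_fit)
```

Both `(B, F)` and `(B_rot, F_rot)` fit the returns equally well, even though the individual factors look completely different.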

Secondly, the stocks themselves have no stable identity either. When a new instrument (e.g. Tesla) enters your universe, PCA has never seen it, so there is no loading estimate for it. You have to re-run the entire estimation. The model thinks in terms of ticker symbols, not in terms of what stocks actually are.
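A small sketch of the same point, assuming a toy universe of four placeholder tickers and random returns (both hypothetical): PCA loadings are indexed by ticker, so a name that was not in the estimation sample simply has no entry.

```python
import numpy as np

rng = np.random.default_rng(1)
tickers = ["AAA", "BBB", "CCC", "DDD"]  # hypothetical estimation universe
returns = rng.normal(size=(120, len(tickers)))
returns -= returns.mean(axis=0)         # demean each stock's returns

# PCA via SVD: each row of Vt is one factor's loadings across the stocks.
_, _, Vt = np.linalg.svd(returns, full_matrices=False)
n_factors = 2
loadings = {t: Vt[:n_factors, i] for i, t in enumerate(tickers)}

new_ticker = "TSLA"  # was not in the estimation sample
has_loading = new_ticker in loadings
print(f"Loading available for {new_ticker}:", has_loading)
```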

Thirdly, the model is heavily parameterized. Run PCA on 3,000 stock returns with 5 factors and you get 15,000 loading parameters, one per stock per factor. That much freedom rewards memorizing idiosyncratic patterns specific to each stock: there is enough flexibility to fit noise rather than signal. The model memorizes “AAPL had this specific return pattern” rather than learning “large-cap stocks tend to have this risk exposure.”
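The overfitting claim can be illustrated with a rough simulation. The sketch below makes its own assumptions: returns are pure i.i.d. noise with no factor structure at all, and the universe is shrunk to 300 stocks and 60 periods to keep it fast. PCA still "explains" a noticeable share of variance in-sample, and most of that disappears on fresh data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Parameter count from the example in the text: one loading per stock per factor.
source_example_loadings = 3_000 * 5

# Smaller illustrative universe (assumption) so the sketch runs quickly.
n_stocks, n_factors, n_periods = 300, 5, 60

# Pure noise: by construction there is no true factor structure to find.
train = rng.normal(size=(n_periods, n_stocks))
test = rng.normal(size=(n_periods, n_stocks))
train -= train.mean(axis=0)
test -= test.mean(axis=0)

# PCA via SVD on the training sample; columns of B are orthonormal loadings.
_, _, Vt = np.linalg.svd(train, full_matrices=False)
B = Vt[:n_factors].T  # shape (n_stocks, n_factors)

def explained_share(returns, loadings):
    """Share of total variance captured by projecting onto the loadings."""
    fitted = returns @ loadings @ loadings.T
    return float((fitted ** 2).sum() / (returns ** 2).sum())

r2_in = explained_share(train, B)
r2_out = explained_share(test, B)
print(f"Loadings in the text's example: {source_example_loadings}")
print(f"In-sample explained share:     {r2_in:.3f}")
print(f"Out-of-sample explained share: {r2_out:.3f}")
```

The gap between the two shares is variance the loadings picked up from noise alone. It says nothing about any particular real dataset, only that the fitting procedure has room to do this.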

Today’s article covers a technique that addresses precisely these issues.
