Systematic Long Short

How To Select From A Million Features

Systematic Long Short
Jan 22, 2026

Introduction

How do you do fast feature selection on high-dimensional data? It is a problem you will run into constantly in alpha research. You have a massive feature pool, maybe 5,000 trading signals or 10,000 alternative-data features, and you need to pick out the useful ones.

The standard advice is to compute pairwise correlations and remove redundant features. For 10,000 features that is 10,000 × 9,999 / 2 ≈ 50 million correlations, each costing O(T) for T observations.
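A minimal sketch of that standard all-pairs approach, assuming NumPy, a (T, N) matrix `X` of observations by features, columns already ordered by some priority score, and an illustrative cutoff `threshold=0.9`. The function names are hypothetical. Each feature is kept unless its absolute correlation with an already-kept feature exceeds the cutoff, and `n_pairs` reproduces the pair count above.

```python
import numpy as np


def n_pairs(n: int) -> int:
    """Number of distinct feature pairs an all-pairs scan must evaluate."""
    return n * (n - 1) // 2


def greedy_corr_filter(X: np.ndarray, threshold: float = 0.9) -> list:
    """Hypothetical all-pairs redundancy filter.

    X has shape (T, N): T observations of N non-constant features, with
    columns assumed to be pre-sorted by priority (e.g. a univariate score).
    A feature is kept unless its |correlation| with an already-kept
    feature exceeds `threshold`.
    """
    # Full N x N correlation matrix: this is the O(N^2 * T) step.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        # all() over an empty list is True, so the first feature is always kept.
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept
```

For example, `n_pairs(10_000)` returns 49,995,000. The correlation matrix alone holds N² floats (800 MB in float64 for 10,000 features, about 8 TB for a million), which is why this approach stops scaling long before the million-feature case.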

There is a neat O(N log N) algorithm, with N the number of features, that avoids most of these pairwise calculations while still removing redundancy.
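The post's own O(N log N) algorithm is not reproduced here. Purely to illustrate how a sort can replace most of the all-pairs work, here is a sketch of a different, generic technique swapped in for illustration: sorted-neighbourhood search over random projections. The function name, `n_proj`, `window`, `seed` and the 0.9 cutoff are hypothetical choices. Each feature is compared only with its `window` nearest neighbours in the sorted order of |projection|, and every candidate pair is confirmed with an exact correlation.

```python
import numpy as np


def sorted_neighbourhood_filter(X, threshold=0.9, n_proj=4, window=5, seed=0):
    """Illustrative heuristic, NOT the post's algorithm.

    X has shape (T, N) with non-constant columns assumed sorted by priority.
    Candidate pairs come from sorting random 1-D projections; each candidate
    is verified with an exact correlation, so there are no false positives,
    but truly redundant pairs can be missed.
    """
    T, N = X.shape
    # Centre each column and scale it to unit L2 norm: Z[:, i] @ Z[:, j] is
    # then exactly the Pearson correlation of features i and j.
    Z = X - X.mean(axis=0)
    Z = Z / np.linalg.norm(Z, axis=0)
    rng = np.random.default_rng(seed)
    flagged = {k: set() for k in range(N)}  # k -> features too correlated with k
    for _ in range(n_proj):
        r = rng.standard_normal(T)
        # Given the data, p_i - p_j ~ N(0, 2 - 2*rho_ij) and
        # p_i + p_j ~ N(0, 2 + 2*rho_ij), so strongly correlated or
        # anti-correlated features tend to land close together in |p|.
        p = np.abs(Z.T @ r)
        order = np.argsort(p)  # the O(N log N) step
        for a in range(N):
            i = int(order[a])
            # Compare only against the next `window` features in sorted order.
            for b in range(a + 1, min(a + 1 + window, N)):
                j = int(order[b])
                if abs(Z[:, i] @ Z[:, j]) > threshold:
                    flagged[i].add(j)
                    flagged[j].add(i)
    # Greedy pass in column order: drop a feature if it is flagged as
    # redundant with a feature that has already been kept.
    kept, kept_set = [], set()
    for j in range(N):
        if not (flagged[j] & kept_set):
            kept.append(j)
            kept_set.add(j)
    return kept
```

Per projection the cost is O(NT) for the projection, O(N log N) for the sort and O(N · window · T) for the checks, so for fixed T and `window` it grows as N log N rather than N². The trade-off is recall: a redundant pair that never falls within `window` positions of each other in any projection is missed, which is why several projections are used.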

© 2026 Systematic Long Short