How To Select From A Million Features

Jan 22, 2026

Introduction

How do you select features quickly from high-dimensional data? This problem comes up constantly in alpha research: you have a massive feature pool, maybe 5,000 trading signals or 10,000 alternative data features, and you need to pick out the useful ones.

The standard advice is to compute pairwise correlations and remove redundant features. For 10,000 features, that means roughly 50 million correlations (10,000 × 9,999 / 2 = 49,995,000).

There’s a nice O(N log N) algorithm that avoids most pairwise calculations while still removing redundancy.

Systematic Long Short
