Systematic Long Short

Institutional Datasets: Equities Fundamentals

Systematic Long Short
Feb 23, 2026
∙ Paid

Series Introduction

It seems many people are highly curious about the institutional datasets that the largest hedge funds use in their investment process. This series covers the 100+ institutional datasets that I’ve learnt about, researched, or built features or signals on. We’ll cover datasets spanning technical prices, fundamentals, proprietary models, consumer sentiment, consumer spending, news, sentiment, index data, macroeconomic data, social media, flows, analyst reports, and many more!

As a researcher or even a PM, knowing your datasets inside out (where to get them, what the alternatives are, how much they cost publicly versus what can be negotiated) is great leverage, because most funds want to know that you can replicate your investment process end to end, starting from the data.

I’ll stick to the “high level” discussion in these articles and will be happy to take questions in our Discord as usual!

Introduction

Today, we’re going to cover the workhorses of systematic equity strategies: fundamental datasets. You’ll find that most of the largest hedge funds rely on the same handful of fundamental datasets. Part of the reason is that standardized financials across thousands of companies and decades of history are genuinely hard to assemble.

Further, fundamentals data makes things harder by being really sparse (a company only reports a handful of times a year), so you need really long histories to build signals with any kind of statistical significance. Lastly, fundamental data suffers from a very heinous problem: reports get “updated” after the fact. E.g. Tesla publishes a quarterly report on 2nd Jan showing 100mn revenue, and on 10th Jan announces that they fat-fingered it and meant to report 1000mn revenue. Tesla stock would likely have crashed on 2nd Jan and recovered strongly on 10th Jan.

A lesser fundamentals provider may just give you a fundamentals report that’s dated 2nd Jan with 1000mn revenue (the correction), even though that’s not at all what happened. Your signals then look fucking prescient, because in the backtest you are buying when everyone else is selling and selling when everyone else is buying, but it’s not real. It’s a fugazi. Good fundamentals providers give you proper point-in-time (PIT) data: every value is tied to the date it actually became known.
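To make the difference concrete, here is a minimal Python sketch of the Tesla example above. It is not how any particular vendor structures their feed; the `Filing` record, the `known_on` field, the year, and the fiscal period end are all hypothetical and only there for illustration. The idea is that a PIT store keeps both the original print and the correction, each stamped with the date it became public, and a backtest only ever asks "what was known as of this date?".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class Filing:
    """One reported value. All field names here are hypothetical."""
    ticker: str
    period_end: date   # fiscal period the figure describes
    known_on: date     # date the figure actually became public
    revenue_mn: float


# Hypothetical records mirroring the example: original print, then the correction.
# The year and the period end are assumptions; the article only gives 2nd and 10th Jan.
FILINGS = [
    Filing("TSLA", date(2025, 12, 31), date(2026, 1, 2), 100.0),
    Filing("TSLA", date(2025, 12, 31), date(2026, 1, 10), 1000.0),
]


def revenue_as_of(filings: Sequence[Filing], ticker: str,
                  period_end: date, as_of: date) -> Optional[float]:
    """PIT lookup: the latest value that was public on or before `as_of`."""
    visible = [
        f for f in filings
        if f.ticker == ticker and f.period_end == period_end and f.known_on <= as_of
    ]
    if not visible:
        return None  # nothing had been reported yet
    return max(visible, key=lambda f: f.known_on).revenue_mn


def revenue_restated(filings: Sequence[Filing], ticker: str,
                     period_end: date) -> Optional[float]:
    """Non-PIT view: the final value, regardless of when it became known."""
    matching = [f for f in filings if f.ticker == ticker and f.period_end == period_end]
    if not matching:
        return None
    return max(matching, key=lambda f: f.known_on).revenue_mn


period = date(2025, 12, 31)
print(revenue_as_of(FILINGS, "TSLA", period, date(2026, 1, 5)))   # 100.0
print(revenue_as_of(FILINGS, "TSLA", period, date(2026, 1, 10)))  # 1000.0
print(revenue_restated(FILINGS, "TSLA", period))                  # 1000.0
```

A backtest running on 5th Jan that calls `revenue_restated` sees 1000mn, a number nobody had at the time. That is the lookahead that makes the signal look prescient. With `revenue_as_of`, the same backtest sees 100mn until the correction lands on 10th Jan.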

These are the datasets we are discussing today:
