Your Firm Wants You Blind So You Can't Compete!
Introduction
Let me start with a question for you: How much of your firm’s investment process do you actually understand? I don’t mean the softball stuff, I don’t mean the “big picture”. I mean, if you left today and were given unlimited manpower and resources to replicate the firm, how much of it COULD you replicate?
Not in theory. In practice. Do you know where your data comes from? Which vendor quirks got smoothed over before it reached you? Do you know what happens to your signal after you hand it off? How the optimizer weights it, what constraints clip it, what the execution algo does with the residual? If you are like most practitioners I have met, the honest answer is: not much.
This blindness is by design: large firms are deliberately structured so that nobody but senior management truly knows how the sausage is made, and so that no single person has the know-how to compete in a like-for-like manner. Whilst strategic abstraction is common across knowledge industries, I will only talk about the quant industry, since that is where I currently reside.
Today’s article has two implications:
If you are a researcher and/or a PM sitting inside a large quant firm, it often pays to think deeply about which details have been conveniently obfuscated from you.
If you are a hiring manager bringing in someone who had immense success at a previous firm, it is well worth the effort to discover whether that success was predicated on black boxes they neither truly understood nor properly attributed their success to.
---
This article is free and foundational, in hopes of setting up the basis for more complex discussions in further articles. If you liked what you read, sharing this with someone you think will appreciate it will go a long way.
Also, in the spirit of Christmas and at the request of a subscriber, I am offering a Christmas/New Year special discount on the yearly packages so you don’t miss any of the upcoming deep dives across the investment process.
Strategic Blindness
The systematic investment process is a pipeline of specialized black boxes - Data, Features, Signals, Portfolio, Risk, Execution - with Infrastructure gluing the above together.
These organizational silos mean very few practitioners understand the end-to-end flow. Researchers do not feel data pain, nor do they understand the black magic of combining and monetizing signals. PMs work with black-box optimizers and execution and, in many shops, do not themselves know exactly how the signals work.
This is true in the largest commingled research funds, the largest podshops, and even within the largest pods at those podshops. They will tell you that this is for “operational efficiency”, and that if you want to go to market quickly, you should just plug into their systems.
Only multi-hat practitioners at smaller pods or firms, or the most senior architects at large ones, see the complete picture. Everyone else knows their station. Nobody knows the car.
The Problem
The investment process looks simple when you draw it on a whiteboard: raw data goes in, orders come out. But the moment you try to implement it, you discover it is actually six to seven distinct engineering problems, each requiring different expertise, different infrastructure, and different mental models.
The standard breakdown of an investment process:
Data - Sourcing, cleaning, indexing, storage, point-in-time snapshots
Features - Transforming raw data into informative signals
Signals - Generating predictions of future returns
Portfolio - Constructing optimal portfolios subject to constraints
Risk - Continuous monitoring and guidance on how to improve your portfolios | Trading on top of projects / pods | Capital allocation
Execution - Getting fills in the market without destroying alpha
Infrastructure - Glue that holds everything together. It’s hard to capture the entire complexity of what counts as “infrastructure”, but it typically includes any piece of technology that connects disparate parts of the investment process. Two simple examples are simulation platforms that ingest features to backtest signals, or ingest signals to backtest portfolios; and MLOps platforms that manage the training and deployment of large-scale machine learning models.
Each stage has its own failure modes, its own tooling, its own specialists. The pipeline only works if stages trust each other’s outputs without understanding each other’s internals.
Data teams deliver “clean” data -> researchers consume it without knowing what was scrubbed. Signal researchers hand off alphas -> PMs optimize them without knowing the theoretical basis. PMs generate targets -> execution implements them without questioning urgency.
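To make that handoff chain concrete, here is a deliberately naive sketch in Python. Every function name and body here is hypothetical; the point is only the shape of the interfaces - each stage consumes the previous stage’s output and exposes nothing of its own internals.

```python
import pandas as pd

# Hypothetical stage interfaces; each stage sees only the previous stage's output.

def curate_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Data team: clean and align. Downstream never learns what was scrubbed."""
    return raw.dropna()

def build_features(clean: pd.DataFrame) -> pd.DataFrame:
    """Feature layer: normalize and transform. Downstream sees numeric columns only."""
    return (clean - clean.mean()) / clean.std()

def generate_signal(features: pd.DataFrame) -> pd.Series:
    """Researcher: forecast returns. The PM sees a number, not the model."""
    return features.mean(axis=1)

def construct_portfolio(signal: pd.Series) -> pd.Series:
    """PM: turn forecasts into target weights. Execution sees targets, not urgency."""
    return signal / signal.abs().sum()

def execute(targets: pd.Series) -> pd.Series:
    """Execution: work the orders. The researcher never sees the fills."""
    return targets  # pretending we fill at target is, of course, exactly the problem

# targets = execute(construct_portfolio(generate_signal(build_features(curate_data(raw)))))
```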
This handoff picture is a caricature, for obvious reasons, but it is not that far from the truth. In some firms there are “knobs” and parameters you can set on your outputs that let the next stage handle your products better; but for the most part you are relegated to the alpha mines, and you are not going to breathe the same air as the PMs in the refinery.
For a large firm, this abstraction is by design - it is one of the ways to scale effectively. But it creates two distinct problems:
It disadvantages everyone in the chain: the data scientist operating with no idea how to create signals from the data they have just cleaned and enriched, the researcher without the first clue how to monetize their signals, the portfolio manager with no idea how those signals were constructed, how the optimizer was written, or how their target portfolios are being acted upon. Your convenience is your ball and chain.
In large firms it is not surprising to see inefficiencies arising from communication friction between the teams responsible for different parts of the pipeline. PMs want portfolios with low turnover and high idiosyncratic variance; researchers want to clear correlation hurdles and submit as many signals as possible; data teams have little incentive to enrich datasets and think deeply about practical usability.
The Framework
Think of the systematic investment process as an assembly line. Each station does one thing well and passes the product downstream.
Station 1: Data Curation
Business development handles data procurement, negotiations and vendor management. Vendor management - following up with and chasing surprisingly unwilling salespeople - is thankless work.
Data scientists handle data quality issues (gaps, inconsistencies, ETL failures), point-in-time database maintenance, survivorship-free universe construction, and corporate actions. Mapping data to the correct instruments and tracking them through name changes, divestitures, mergers and acquisitions, and so on is absolutely harrowing. This is costly and painful work.
What downstream users see: clean, processed data. What downstream users do not see: vendor contracts, product factsheets, data restatements, the fight to get backfill when a vendor changes methodology, the pain of stitching disparate series into one clean, functional series, and all the micro-decisions that determine whether data is “clean” and “usable”.
The black box effect: researchers receive data through APIs and assume it is correct. Off-the-shelf tools and internal platforms make backtesting “increasingly easy” - you can subscribe to data feeds and backtest “every combination and permutation” without ever touching the underlying curation. This convenience is also blindness.
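To see why point-in-time curation is not a detail, here is a minimal sketch using pandas merge_asof. The table and column names are hypothetical; the point is that a fundamental value is only usable from the date it became known, and a restatement must not leak backwards into history.

```python
import pandas as pd

# Hypothetical fundamentals table: the value is "for" the fiscal period end,
# but only becomes known to the market on the announcement (or restatement) date.
fundamentals = pd.DataFrame({
    "ticker": ["XYZ", "XYZ"],
    "period_end": pd.to_datetime(["2023-12-31", "2023-12-31"]),
    "known_date": pd.to_datetime(["2024-02-15", "2024-05-10"]),  # original print + restatement
    "eps": [1.10, 0.85],                                         # restated downward
}).sort_values("known_date")

# Dates on which the backtest wants to trade.
trade_dates = pd.DataFrame({
    "ticker": ["XYZ", "XYZ", "XYZ"],
    "date": pd.to_datetime(["2024-01-31", "2024-03-31", "2024-06-30"]),
}).sort_values("date")

# Point-in-time join: for each trade date, take the latest value *known* by then.
pit = pd.merge_asof(
    trade_dates, fundamentals,
    left_on="date", right_on="known_date",
    by="ticker", direction="backward",
)
print(pit[["date", "eps"]])
# 2024-01-31 -> NaN   (nothing announced yet)
# 2024-03-31 -> 1.10  (original print)
# 2024-06-30 -> 0.85  (restated figure)
# Joining on period_end instead would silently leak the restatement back in time.
```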
Station 2: Feature Engineering
Researchers and data scientists transform raw data into predictive features. Different shops handle this differently: depending on the size of the team or firm, there may be an additional layer between clean data and “signals”, typically called “feature generation”, and it is owned sometimes by researchers and sometimes by data scientists.
They handle normalization, outlier control, decay analysis, and redundancy analysis. A feature is anything that “is normalized, and is ex-ante believed to either have some positive correlation to returns on its own or is able to have some positive correlation to returns.”
What downstream users see: a numeric time series, perhaps with some documentation. What downstream users do not see: the underlying data, sometimes unstructured, why certain transformations were chosen, what edge cases the feature handles poorly, the full set of features that were considered and rejected.
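As a toy illustration of the micro-decisions hiding inside “a numeric time series”, here is one common normalization recipe - a cross-sectional z-score with outlier clipping. The clip level, the choice of mean/std over median/MAD, and the handling of missing values are all assumptions a downstream consumer never sees; this is a sketch, not anyone’s production code.

```python
import numpy as np
import pandas as pd

def normalize_feature(feature: pd.DataFrame, clip: float = 3.0) -> pd.DataFrame:
    """Cross-sectional z-score (one row per date, one column per asset),
    then clip extreme values. Every choice here is an undocumented judgment call."""
    z = feature.sub(feature.mean(axis=1), axis=0).div(feature.std(axis=1), axis=0)
    return z.clip(lower=-clip, upper=clip)

# Toy example: 3 dates x 5 assets, with one column that occasionally explodes.
raw = pd.DataFrame(
    np.random.default_rng(0).normal(size=(3, 5)) * [1, 1, 1, 1, 50],
    index=pd.date_range("2024-01-01", periods=3),
    columns=list("ABCDE"),
)
print(normalize_feature(raw).round(2))
```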
Station 3: Signal Generation
Researchers produce signals based on feature analyses or algorithmic traversal of signal search spaces. They evaluate economic mechanisms, derive theories beyond statistical flukes, and generate forecasts of future returns.
At some large firms, researchers generate alpha signals for a “centralized signal pool” for subsequent combining and portfolio construction. I could write essays on what this entails and means - but I will leave it at this today: the job of researchers here is to traverse the infinite search space of features x transformations to find a finite number of useful signals: AKA, WORKING IN THE ALPHA MINES.
The information asymmetry: signals describe the desire for an action, but the strategy may choose not to act. The signal “happens only in its own context.” A signal researcher may not know whether their alpha is weighted 2% or 20% in the final portfolio, what constraints clip it, or how it correlates or interacts with other signals in the pool.
What downstream users see: a forecast, perhaps an expected return. What downstream users do not see: model internals, the theoretical basis, the conditions where the signal breaks.
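One concrete example of what gets lost in that handoff is the decay profile. The sketch below (synthetic data, hypothetical horizons) computes the mean cross-sectional rank IC of a signal against forward returns at several horizons - the kind of diagnostic the researcher runs but the consumer of the forecast rarely sees.

```python
import numpy as np
import pandas as pd

def rank_ic_by_horizon(signal: pd.DataFrame, returns: pd.DataFrame, horizons=(1, 5, 10)):
    """Mean cross-sectional rank correlation between today's signal and
    forward h-day returns: a crude decay profile."""
    out = {}
    for h in horizons:
        fwd = returns.rolling(h).sum().shift(-h)  # return over days t+1 ... t+h
        ics = signal.rank(axis=1).corrwith(fwd.rank(axis=1), axis=1)
        out[h] = ics.mean()
    return pd.Series(out, name="mean_rank_IC")

# Toy data: 250 days x 50 assets with a mild one-day reversal baked in.
rng = np.random.default_rng(1)
eps = pd.DataFrame(rng.normal(0, 0.01, size=(250, 50)))
rets = eps - 0.3 * eps.shift(1).fillna(0.0)
sig = -rets  # bet against today's move
print(rank_ic_by_horizon(sig, rets))  # IC is highest at h=1 and fades as h grows
```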
Station 4: Portfolio Construction
Portfolio managers handle mean-variance optimization, factor risk management, position sizing, and constraint handling (ADV limits, sector caps, turnover budgets, gross exposure). They use risk models that may come from vendors - and try to “concentrate on maximizing alpha” while treating risk as someone else’s solved problem.
In recent years there has been a much stronger push towards compensating PMs in proportion to idiosyncratic Sharpe, to incentivise them to think about constructing portfolios that are risk-optimal. Some firms have gone as far as firing PMs with >X% of factor exposure. However, exactly how the risk models were produced and extended may be a black box. While BARRA/Axioma remains the standard, large firms often extend these models with custom factors that are the risk flavor of the quarter.
In some firms, portfolio managers even work with optimizers they do not fully understand. The optimizer is a mathematical black box - inputs go in, portfolio weights come out. What happened inside? The PM trusts the math.
More critically, PMs may not know the signals feeding their optimizer. Especially in commingled research firms, the PM sees a pool of signals with some metadata and little information beyond that. The signals are black box. The risk model is black box. The only thing the PM owns is the target. Then PMs push out optimal weights to execution.
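To make the “mathematical black box” tangible, here is a bare-bones mean-variance optimizer sketched with cvxpy (an assumption on my part - your firm’s optimizer is certainly not this) and a toy risk model. Even this stripped-down version has opinions - the risk aversion, the neutrality constraint, the gross and single-name caps - that the signal researcher upstream never gets to see.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
n = 20
alpha = rng.normal(0, 0.02, n)              # "expected returns" handed over from the signal pool
B = rng.normal(0, 1, (n, 3))                # toy factor loadings
Sigma = B @ B.T * 1e-4 + np.eye(n) * 1e-4   # toy risk model: factor + idiosyncratic
Sigma = 0.5 * (Sigma + Sigma.T)             # enforce exact symmetry for the solver

w = cp.Variable(n)
risk_aversion = 5.0
objective = cp.Maximize(alpha @ w - risk_aversion * cp.quad_form(w, Sigma))
constraints = [
    cp.sum(w) == 0,        # dollar neutral
    cp.norm1(w) <= 2.0,    # gross exposure cap (200%)
    cp.abs(w) <= 0.10,     # single-name cap
]
cp.Problem(objective, constraints).solve()
print(np.round(w.value, 3))  # the weights come out; the "why" stays inside the box
```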
What execution sees: position targets. What execution does not see: why this trade is urgent, what signal generated it, how quickly the implicit/explicit forecast decays, what the theoretical edge is.
Station 5: Central Risk | Central Book
This is the layer that exists above the pods or research projects but below the firm. It is the eye in the sky that sees what no individual researcher or PM can see: the aggregate.
Central Risk handles firm-wide exposure monitoring, cross-pod/project correlation management, checking for value-add, capital allocation, drawdown triggers, etc.
The most sophisticated Central Risk operations run real-time aggregation across all pods, stress testing the combined book against historical scenarios and hypothetical shocks. They model liquidity across the aggregate position, not just your slice of it. They know that if three pods need to exit the same illiquid name simultaneously, the transaction costs will be 10x what any individual pod estimated. You, sitting in your pod, have no visibility into this. You sized your position based on your signals. Their analysis may conclude you are all going to destroy each other on the way out.
The coverage model is increasingly popular. Some podshops now have dedicated coverage teams: essentially internal consultants who sit between Central Risk and the pods. They translate the aggregate view into actionable guidance: “Your sector concentration is elevated.” “Consider reducing your momentum exposure.”
The Central Book is the firm trading (charitably) on top of its own pods. It is the logical endpoint of seeing the aggregate: if you know what everyone is doing, why not trade on that information yourself? The firm aggregates signals, positions, or intended trades from across pods and constructs its own portfolio at the firm level. This might be additive (capturing diversification benefits that no single pod can achieve), defensive (hedging out exposures the firm does not want), or extractive (front-running the pods’ own flow).
What Central Book sees: Everything. Every signal submission. Every target portfolio. Every intended trade. In the most aggressive implementations, Central Book sees your positions before execution does. They can act on information you generated before you can act on it yourself.
The charitable interpretation: Central Book exists to harvest any surplus diversification premium. Twenty pods with uncorrelated alphas, when combined optimally, produce a portfolio with a higher Sharpe than any individual pod. The pods get their payouts; the firm gets the diversification alpha. Everyone wins. Another charitable, efficient role of the central book is netting.
Netting is the case where Pod A wants to buy 100,000 shares of AAPL and Pod B wants to sell 80,000 shares of AAPL. Why send both orders to market? Central Book can cross them internally, saving transaction costs for both. The firm pockets the spread that would have gone to market makers. This is pure efficiency, where everyone benefits and no one is harmed.
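A back-of-the-envelope sketch of that crossing, with an illustrative price and a hypothetical 5 bps all-in cost for routing externally:

```python
# Illustrative numbers only: assume a 5 bps all-in cost per dollar routed to the market.
price = 230.0                      # hypothetical AAPL price
cost_rate = 5 / 10_000

pod_a, pod_b = +100_000, -80_000   # shares: Pod A buys, Pod B sells
gross_notional = (abs(pod_a) + abs(pod_b)) * price
net_notional = abs(pod_a + pod_b) * price   # only 20,000 shares actually hit the market

print(f"cost if both route externally: ${gross_notional * cost_rate:,.0f}")  # ~$20,700
print(f"cost after internal crossing:  ${net_notional * cost_rate:,.0f}")    # ~$2,300
# The 80,000-share overlap is crossed internally (say, at mid), so neither pod pays
# the spread on it and the firm keeps the savings that would have gone to market makers.
```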
The less charitable interpretation: Your alpha becomes their alpha. The information asymmetry that protects the pods from each other does not protect the pods from the firm. In some cases, your strategy has a capacity of $1bn, the firm has allocated $500mn to you, and the other $500mn is captured by the Central Book. In others, the Central Book can act on your signals before you have.
The Central Risk/Book team is the ultimate information asymmetry. They see everything; you see your pod. They know the firm’s true factor exposures, the crowding across pods, the names where the firm is dangerously concentrated. You know your portfolio.
Station 6: Execution
Execution handles order routing, venue selection, market impact minimization, and transaction cost analysis. This is where theory meets market microstructure.
The fundamental barrier: “Microstructure cannot be easily tested in a backtest.” Backtesting typically uses L1 data; real execution happens in L2 order books with jitter, latency, and adversarial counterparties.
The only way to understand execution reality is through post-trade analysis of real fills. Transaction costs can completely eliminate alpha.
The timing effect is even worse. For a one-day reversal factor: if you can trade on the same day’s close, Sharpe is 1.4. If you have to trade at next day’s open (which is typically the reality past a certain scale), Sharpe drops to 0.3. That is roughly an 80% loss from execution timing alone. Did the signal researcher account for this? In my experience: rarely.
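You will not reproduce those exact numbers without the underlying data, but the mechanics are easy to check yourself. A sketch on synthetic returns with a one-day reversal baked in: the same weights, evaluated once with same-close execution and once with a one-day lag.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n_days, n_assets = 500, 100
eps = pd.DataFrame(rng.normal(0, 0.01, size=(n_days, n_assets)))
close_rets = eps - 0.2 * eps.shift(1).fillna(0.0)        # mild one-day reversal

# Reversal signal: bet against today's cross-sectional move.
signal = -(close_rets.rank(axis=1) - (n_assets + 1) / 2)
weights = signal.div(signal.abs().sum(axis=1), axis=0)

def ann_sharpe(pnl: pd.Series) -> float:
    return float(pnl.mean() / pnl.std() * np.sqrt(252))

# Fantasy: trade at the same close the signal was computed on, earn tomorrow's return.
pnl_same_close = (weights * close_rets.shift(-1)).sum(axis=1, min_count=1)

# Reality at scale: positions only go on a day later, so you earn the day-after return.
pnl_lagged = (weights * close_rets.shift(-2)).sum(axis=1, min_count=1)

print(f"Sharpe, same-close execution: {ann_sharpe(pnl_same_close.dropna()):.2f}")
print(f"Sharpe, one-day-lagged fills: {ann_sharpe(pnl_lagged.dropna()):.2f}")
```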
The Glue: Infrastructure
Infrastructure is the invisible skeleton. It is the reason your backtest runs in seconds instead of minutes, the reason your signal deploys to production without manual intervention, the reason your portfolio optimizer can ingest 20000 signals and spit out weights before the market opens. It is also the reason you have no idea how any of this actually works.
Infrastructure teams handle simulation engines, data pipelines, job orchestration, compute provisioning, model registries, deployment frameworks, monitoring, alerting, and the thousand small services that make the machine hum. They are the plumbers of the quant world. Nobody thinks about plumbing until the toilet backs up.
What researchers/PMs see: A web UI where you upload a signal and get a backtest. A button that says “deploy to production.” A dashboard showing your PnL. Jupyter notebooks that magically have access to terabytes of data.
What users do not see: The distributed compute cluster running your backtest across 500 nodes. The dependency graph ensuring your features are computed before your signals. The versioning system tracking which model artifact is currently live. The databases, cloud storage, NAS, and complicated interleaving of storage solutions that store and deliver the petabytes of data consumed by the firm with 99.999% reliability.
Simulation Platforms
The black box effect here is perhaps the most pernicious, because infrastructure black boxes compound all the other black boxes. Consider the simulation platform. It ingests your signal, runs it against historical data, and returns a Sharpe ratio. But what assumptions did it make?
Did it use point-in-time data or did it accidentally peek forward? How did it handle the 2008 delisting wave? What transaction cost model did it apply - fixed basis points, or something more sophisticated? When it computed that Sharpe, did it use overlapping or non-overlapping windows? Did it check for look-ahead in your feature normalization? Did it account for the execution lag you will actually face in production?
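One of those questions is cheap to make concrete. Below is a sketch of the look-ahead hiding in a seemingly innocent choice: normalizing a feature with full-sample statistics versus statistics known at each point in time.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
x = pd.Series(rng.normal(0, 1, 500)).cumsum()      # a trending raw feature

# Leaky: full-sample mean/std quietly uses information from the end of the sample.
z_full = (x - x.mean()) / x.std()

# Point-in-time: at each date, only use data available up to that date.
z_pit = (x - x.expanding().mean()) / x.expanding().std()

# The two disagree most early in the sample, exactly where the full-sample version
# already "knows" where the series ends up - and where a backtest is most flattered.
print((z_full - z_pit).abs().describe().round(2))
```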
Most researchers I have met cannot answer these questions. They trust the platform. They LOVE the platform. It is the ultimate form of Stockholm syndrome.
MLOps Platforms
MLOps platforms are another layer of abstraction that researchers happily ignore. You train a model, you register it, you deploy it. The platform handles feature stores, model versioning, and inference pipelines. Convenient. Also: you have no idea what your model is actually doing in production.
Is it receiving the same features it was trained on, or has upstream drift occurred? Is the inference latency acceptable, or are you missing trading windows? When the model was retrained last month, did it automatically deploy, or is production still running the stale version? These are not hypothetical failure modes.
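A basic drift check is not hard to run yourself, if you can get at the raw feature values on both sides. Here is a sketch using the Population Stability Index; the 0.25 threshold below is a common rule of thumb, not a standard your platform necessarily uses.

```python
import numpy as np

def psi(train: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between training-time and live feature values."""
    edges = np.quantile(train, np.linspace(0, 1, bins + 1))
    p_train = np.histogram(train, bins=edges)[0] / len(train)
    p_live = np.histogram(np.clip(live, edges[0], edges[-1]), bins=edges)[0] / len(live)
    p_train = np.clip(p_train, 1e-6, None)   # avoid log(0)
    p_live = np.clip(p_live, 1e-6, None)
    return float(np.sum((p_live - p_train) * np.log(p_live / p_train)))

rng = np.random.default_rng(5)
train_feature = rng.normal(0.0, 1.0, 10_000)
live_feature = rng.normal(0.5, 1.5, 1_000)   # upstream drift: shifted mean, wider spread
score = psi(train_feature, live_feature)
print(f"PSI = {score:.3f}  ({'investigate' if score > 0.25 else 'looks stable'})")
```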
Infrastructure Incentives
The infrastructure team operates with a different incentive structure than the investment teams. Their job is uptime, reliability, and scalability. Their job is not alpha. This creates a subtle but persistent misalignment.
When infrastructure optimizes for “ease of use,” they often hide complexity that researchers need to understand. When they build guardrails to prevent system failures, those guardrails sometimes prevent legitimate edge cases. When they standardize on a single simulation framework to reduce maintenance burden, they eliminate the flexibility that might let a clever researcher discover something new.
When It Bites
The most dangerous infrastructure is the infrastructure that works perfectly. Because when it works, you forget it exists. You forget that your entire research process depends on assumptions baked into platforms you have never inspected. You forget that “production” is not a magical place where your signal becomes money. It is a fragile chain of services, any one of which can silently degrade your alpha if not implemented properly.
At larger firms, the infrastructure is robust, scalable, battle-tested, and also understood by no one in its entirety. The researchers using it daily could not tell you how it works if their bonus depended on it.
From the perspective of the firm, this is great. This is EXCELLENT. This complexity represents a well-oiled, smooth-running machine that is an OPERATIONAL moat. No single person can easily replicate it. But from the perspective of a lone researcher/PM, this complexity is your prison. This is why the more locked-in you are to your firm’s investment ecosystem, the less value you have when you leave it.
There are literally tier 1 firms where some researchers/PMs are almost unhireable because they are completely dependent on their firm’s data, tools, processes and infrastructure.
How To Be Free: A Survival Guide
If you have read this far and feel a creeping sense of unease, good. That discomfort is information. The question is what you do with it.
The uncomfortable truth is this: your firm has every incentive to make you productive within their walls and useless outside of them. The smoother your workflow, the more suspicious you should be. Every abstraction that saves you time is also an abstraction that atrophies a skill you might need later.
Here is what you can do about it.
Touch the Data
Do not accept clean data as a gift from the heavens. At least once, trace a dataset back to its source. Talk to the data team. Ask them what broke last quarter. Ask them what vendor quirks they papered over. Ask them what “clean” actually means.
If your firm uses a vendor like Bloomberg, Refinitiv, or FactSet, get your hands on the raw product documentation. Understand what corporate actions adjustments were applied. Understand what backfill policies exist. Understand what happens when a company gets delisted, acquired, or spun off.
This is not fun work. It is not glamorous. But if you leave tomorrow and join a smaller shop where you are the data team, you will thank yourself.
Build One Thing End-to-End
Find an excuse, any excuse, to build something that spans multiple stations. It does not have to be production quality. It does not have to be at work.
On your own time, take a public dataset, engineer features, build a signal, construct a portfolio, and paper trade it. Write your own tools. Feel the pain of every stage. Discover that point-in-time data is hard. Discover that your signal decays faster than you thought. Discover that transaction costs eat your lunch.
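If you want a starting point, here is the whole loop compressed into a toy, on synthetic data and with a hypothetical 10 bps cost on turnover. None of the choices below are recommendations; the exercise is to replace each block with your own decisions and feel where it hurts.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
n_days, n_assets = 750, 30

# "Data": synthetic daily returns (swap in a public dataset you curated yourself).
rets = pd.DataFrame(rng.normal(0.0003, 0.015, size=(n_days, n_assets)))
prices = 100 * (1 + rets).cumprod()

# "Feature": 60-day momentum, skipping the most recent 5 days.
momentum = prices.shift(5) / prices.shift(65) - 1

# "Signal": cross-sectional rank, demeaned.
signal = momentum.rank(axis=1).sub(momentum.rank(axis=1).mean(axis=1), axis=0)

# "Portfolio": naive proportional weights, unit gross exposure, rebalanced daily.
weights = signal.div(signal.abs().sum(axis=1), axis=0)

# "Execution": yesterday's weights earn today's return; pay 10 bps on turnover.
turnover = weights.diff().abs().sum(axis=1)
pnl = (weights.shift(1) * rets).sum(axis=1, min_count=1) - 0.0010 * turnover

sharpe = pnl.mean() / pnl.std() * np.sqrt(252)
print(f"Annualized Sharpe after costs: {sharpe:.2f}")   # on random data: roughly zero, minus costs
```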
The goal is not to build a profitable strategy. The goal is to know what you do not know. The researcher who has never touched an optimizer has no intuition for what constraints do to a signal. The PM who has never built a signal has no intuition for why certain alphas decay fast.
Interrogate Your Tools
Every platform you use makes assumptions. Your job is to discover them.
If you use a simulation platform, ask: What transaction cost model does it use? How does it handle delisted securities? Does it apply survivorship bias corrections? What is the default execution assumption: close, VWAP, next open? Where does it source its data, and is that the same data you will have in production?
If you use an optimizer, ask: What objective function is it maximizing? What risk model is it using? What is the underlying algorithm and does it matter? How does it handle corner cases: illiquid names, extreme volatility, missing data?
You will not get answers to all of these questions. Some are proprietary. Some are buried in documentation no one reads. But the act of asking forces you to confront what you do not know. Write it down. Maintain a personal document of “things I trust but do not understand.”
Learn the Stations Adjacent to You
You do not need to become an expert in every stage of the pipeline. But you should be conversationally fluent in the stages immediately upstream and downstream of your own.
If you are a signal researcher, learn enough about portfolio construction to understand how your signal gets weighted, constrained, and combined. Sit with a PM. Ask them what makes a signal easy or hard to trade. Ask them what they wish researchers understood.
If you are a PM, learn enough about signal generation to understand the theoretical basis of the alphas you are trading. Ask researchers what conditions break their models. Ask them what the decay structure looks like.
Conclusion
Here is the trade-off that your firm does not want you to discuss: the things that make you maximally productive today are often the things that make you minimally portable tomorrow.
Using the internal backtesting platform is faster than building your own. Trusting the data team is easier than auditing their work. Focusing on your narrow specialization is more efficient than learning adjacent domains.
But efficiency within a system is not the same as value outside of it. The most “productive” researchers at large firms are sometimes the least hireable, because their productivity was entirely dependent on scaffolding they did not build and cannot replicate.
You do not need to become a paranoid generalist who trusts nothing and rebuilds everything from scratch. That is its own pathology. But you should, with clear eyes, understand the bargain you are making. Every hour you spend learning a proprietary tool is an hour not spent learning something portable. Every abstraction you accept is an abstraction you cannot replicate when you wear the big pants.
The goal is not to reject your firm’s infrastructure. The goal is to see through it. To know where the black boxes are, what assumptions they contain, and what you would do if they disappeared tomorrow.
Because one day, they will.


