Introduction To Practical Portfolio Optimization
Introduction
Portfolio optimisation for systematic trading comes down to one question: given expected returns and a covariance matrix, what weights maximise risk-adjusted return?
Today we are going to show how to solve this using CVXPY, a Python library for convex optimisation. CVXPY works well for small problems but becomes slow at scale; it is not optimised for large-scale quadratic programs. In future articles, we will look at MOSEK, a commercial solver that runs 10-100x faster on portfolios with thousands of signals/positions. Later we will look to roll our own optimizers, which are themselves 100x faster than MOSEK.
Today’s article, like many others before it, is free and educational, and serves as a foundation for more complex and interesting articles after this. I would greatly appreciate it if you shared it with someone you think will benefit from it.
The Insight
Without burying the lede, the insight today is the following:
Most material on Mean-Variance Optimisation (MVO) will tell you to perform portfolio optimization on instruments directly. You will then try to use historical returns and covariance matrices of instruments, and find that MVO is absolutely horrendous at giving you an “optimal portfolio”.
You will correctly intuit that it is because historical returns for instruments are very noisy and not a good estimator for future returns of an instrument! And, your MVO is only as good as your estimated future returns!
So, how do you solve this? Easy. Rather than performing MVO on your instruments, perform MVO on your signals! You’ve already done the work of making sure your signals are valid and statistically significant, so their signal-to-noise ratio is going to be orders of magnitude above that of individual instruments, which means their historical returns are going to be a far less poor estimator than those of individual instruments!
So then, suppose you have expected returns for 500 signals, how do you translate these into portfolio weights? The naive answer is to go long the positives and short the negatives. But this ignores correlations. Two signals with similar expected returns might be highly correlated: when one moves, the other moves with it. If you hold both at equal weights, your portfolio is far less diversified than it appears. You are doubling down on the same bet.
MVO solves this problem. Given the returns vector (expected returns) and a covariance matrix of your signals, MVO finds the weights that maximise the risk-adjusted return of your portfolio, accounting for returns, risks and correlations.
A Short Plug
I’d like to take some time to answer a frequent question I get: “How does your content differ from the myriad quant content out there?“
Well, it’s simple and boils down to two simple things:
I actually eat my own cooking. I’ve managed money at scale and quite successfully. I just so happen to be in my leaking alpha for clout capture era. I don’t write about stuff I’ve never tried or thought deeply about. This means I do not flood you with BS.
My articles are going to be practical, to the point, and will hopefully each contain one insight that makes them useful. Most content out there is vacuous and does not contain any insights; therefore, it pads itself out with “proofs”, “formulas” and “equations”. There is no need for that here. You will get the meat without the bones.
There’s going to be so much more to be done in 2026. If you don’t want to miss any of this, make sure you subscribe!
What Is MVO?
MVO is the mathematical answer to “given my beliefs about returns and risk, what positions maximise risk-adjusted return?”
The objective function looks like this:
maximize: alpha^T w - gamma * w^T Sigma w

MVO requires 3 inputs:
alpha is a vector that represents the expected returns for each signal
Sigma is the covariance matrix that describes how signals move together
gamma is the risk aversion parameter that controls the return-vs-risk tradeoff
w is what you are solving for, the portfolio weights (for each signal)
Alpha is your n-vector of expected returns, one number per signal. Sigma is n x n. In the objective function, the first term captures expected return. The second term penalises variance. Higher gamma means more risk-averse; lower gamma means more aggressive.
The solution has an elegant closed form: w* = (1/(2 gamma)) Sigma^-1 alpha, i.e. w* is proportional to Sigma^-1 alpha. You can verify this by taking the derivative of the objective with respect to w and setting it to zero.
What the solution is saying isn’t groundbreaking: positions scale with expected return strength, shrink for higher-variance signals, and adjust for correlations. However, THIS is the important part: when two signals have similar expected returns but high correlation, the optimizer reduces both positions compared to what it would hold if they were independent. MVO quantitatively avoids overconcentrating in correlated bets. There’s a small fun fact I’d like to throw in here: even if your signals are individually VERY STRONG, as long as they are correlated beyond a certain threshold, it is actually mathematically optimal to go long one and short the other (I explain this in more detail in another article)!
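To make that fun fact concrete, here is a small numpy sketch of the closed form w* ~ Sigma^-1 alpha, using made-up numbers: two signals that are both strongly positive in expectation but 95% correlated, where the optimizer ends up shorting the weaker one.

```python
import numpy as np

# Made-up inputs: two signals, both with strong positive expected
# returns, but 95% correlated with each other.
alpha = np.array([0.10, 0.11])
Sigma = np.array([[1.00, 0.95],
                  [0.95, 1.00]])

# Unconstrained closed form: w* is proportional to Sigma^-1 alpha
# (the 1/(2*gamma) scaling only changes leverage, not direction).
w = np.linalg.solve(Sigma, alpha)
print(w)  # the first weight comes out negative: short the weaker twin
```

Lower the correlation from 0.95 toward zero and both weights turn positive, which is the "certain threshold" the fun fact refers to.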
Elegant as the closed form is, you will still need to understand solvers. Real MVO problems include constraints: position limits, exposure caps, turnover bounds. Once you add those, there is no closed-form solution. Understanding how to use solvers becomes essential.
Why Signals Work Better Than Raw Returns
Coming back to the signals vs instruments argument, signals are the preferred inputs to MVO rather than raw historical returns because the expected returns on signals are significantly less noisy.
Consider the difference: estimating the expected return on a single stock requires fighting through enormous noise. Daily stock returns have standard deviations around 2%, but daily expected returns are measured in basis points. The signal-to-noise ratio is brutal. A momentum signal, value signal, or ML forecast has already done the work of extracting a predictable pattern from this noise. The expected return on the signal is more stable than the expected return on any individual constituent.
MVO then combines multiple signals optimally, accounting for their correlations. Asking “what is the expected return of Stock A?” is a fool’s errand. Asking “what is the expected return of my momentum signal?” is answerable because the signal has aggregated many stocks into a systematic pattern.
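The arithmetic behind that brutal signal-to-noise ratio fits in a few lines (illustrative numbers only):

```python
import numpy as np

daily_vol = 0.02   # ~2% daily return standard deviation, typical for a stock
n_days = 252       # one year of daily observations

# Standard error of the estimated mean daily return:
se = daily_vol / np.sqrt(n_days)
print(f"standard error of the mean: {se * 1e4:.1f} bps/day")
```

This prints roughly 12.6 bps/day of estimation noise, while true expected daily returns are measured in single-digit basis points: a year of data gives an estimate several times noisier than the quantity being estimated.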
What Is CVXPY and How It Works
CVXPY is a Python library for convex optimisation. You describe the problem in mathematical syntax, and CVXPY calls an appropriate solver. The core pattern for MVO in CVXPY is five lines:
import cvxpy as cp
# declares your decision variable
w = cp.Variable(n)
# portfolio expected returns
ret = alpha @ w
# portfolio variance
risk = cp.quad_form(w, Sigma)
prob = cp.Problem(
cp.Maximize(ret - gamma * risk), # objective
[cp.sum(w) == 1, w >= 0] # constraints
)
prob.solve()

cp.Variable(n) declares your decision variable, the portfolio weights for your signals. cp.quad_form(w, Sigma) computes the quadratic form w^T Sigma w, your portfolio variance. CVXPY needs this special function because the solver must verify the matrix is positive semidefinite.
Constraints are just list items. Want long-only? Add w >= 0. Want factor neutrality? Add F.T @ w == 0. The declarative style means the solver figures out how to satisfy all of them together.
What Is DCP?
CVXPY doesn’t just solve any math problem you throw at it. It enforces Disciplined Convex Programming (DCP), a set of rules that guarantee your problem is solvable. Think of DCP as a grammar checker for optimisation: before any solver runs, CVXPY parses your objective and constraints to verify they follow the rules. If they don’t, it refuses to proceed.
The practical rule to remember: equality constraints must be linear. You can write cp.sum(w) == 1 because summing variables is linear. You cannot write cp.norm(w, 1) == 1 because the L1 norm involves absolute values, which are nonlinear. CVXPY rejects the second before any solver sees it.
Inequalities are more flexible. cp.norm(w, 1) <= 2 is fine: you can upper-bound nonlinear expressions. But the moment you write ==, CVXPY demands linearity.
Long Short Portfolios
The natural way to express a market-neutral portfolio is: weights sum to zero, gross exposure equals one. You might try:
cp.sum(w) == 0, # dollar neutral - works fine
cp.norm(w, 1) == 1.0, # gross exposure = 100% - DCP violation

The first constraint is affine (linear). The second uses the L1 norm, which is convex. DCP rejects convex equality constraints, so this fails.
Similarly, you might try cp.sum(cp.pos(w)) == 0.5 to force exactly 50% long exposure. But cp.pos(w) is max(w, 0), which is convex. Same problem.
The decomposition trick sidesteps this by introducing auxiliary variables:
w = cp.Variable(n)
w_long = cp.Variable(n)
w_short = cp.Variable(n)
constraints = [
w == w_long - w_short, # decomposition
w_long >= 0,
w_short >= 0,
cp.sum(w_long) == 0.5, # 50% long
cp.sum(w_short) == 0.5, # 50% short
w_long <= 0.05, # position limits
w_short <= 0.05,
]

Now every constraint is affine or a simple bound. cp.sum(w_long) == 0.5 works because w_long is just a variable, and the sum of variables is affine. We have moved the nonlinearity into the structure of the problem rather than the constraints.
The position limits (w_long <= 0.05, w_short <= 0.05) serve double duty: they enforce diversification and prevent a subtle loophole where the optimiser could set w_long[i] = w_short[i] for the same signal, satisfying the budget constraints while keeping actual exposure near zero. With position limits, there is not enough room to game the formulation this way.
Scaling To Large Universes
How do we calculate the covariance matrix? Sample covariance is the starting point: take 252 trading days of returns and compute the sample covariance directly. For small universes (under 100 signals), this often works. For large universes, it fails badly because you have more parameters to estimate than data points. For 3000 signals, that is 9 million entries to estimate, and inverting the matrix scales as O(n^3).
Factor models solve this by assuming returns are driven by a smaller set of common factors. The simplest approach is PCA on your return matrix: take the first k eigenvectors as factor returns.
The factor structure decomposes risk:
Sigma = F @ Sigma_tilde @ F.T + D
Where F is n x k, Sigma_tilde is k x k factor covariance, and D is diagonal idiosyncratic variance. The solve time drops from O(n^3) to O(nk^2). For n=3000 and k=30, that is roughly 10,000x faster.
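Here is a sketch of that decomposition in numpy, with random stand-in loadings (a real pipeline would get F from PCA and Sigma_tilde from factor return covariance). The point is that the n x n matrix never needs to be formed:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 3000, 30

# Random stand-ins for the factor model pieces.
F = rng.normal(scale=0.01, size=(n, k))          # loadings, n x k
Sigma_tilde = np.diag(rng.uniform(0.5, 2.0, k))  # factor covariance, k x k
d = rng.uniform(1e-4, 4e-4, n)                   # idiosyncratic variances

w = rng.normal(size=n)
w /= np.abs(w).sum()                             # normalise gross exposure

# Portfolio variance without ever forming the n x n Sigma:
# w'(F St F' + D)w = (F'w)' St (F'w) + sum(d * w^2)
fw = F.T @ w                                     # k-vector, O(nk) work
var = fw @ Sigma_tilde @ fw + np.sum(d * w**2)
print(var)
```

Everything downstream touches only n x k and k x k objects, which is where the O(nk^2) scaling comes from.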
We will explore this in more detail in later articles, but I wanted to get it out there that computational complexity of large MVO universes is going to be a pain in the butt once you become a serious user of it.
Failure Modes of MVO
MVO assumes your expected returns and covariance matrix are correct. They will never be. Correlations change. Volatilities spike. Regimes shift.
Ill-conditioned covariance matrices. When you have more signals than observations, the sample covariance matrix is singular or numerically unstable. You cannot invert it. Even when it is invertible, small eigenvalues create enormous sensitivity to estimation noise. The solution is shrinkage estimators or factor models that reduce the effective number of parameters. Again, we will cover this in deep detail in future articles.
Estimation errors in expected returns. Small errors in alpha can cause large swings in optimal weights. MVO is notoriously sensitive to return forecasts. A 10 basis point change in expected return can flip a position from long to short. This is why ranking-based signals are preferred: only the ordering matters, not the absolute magnitudes. We’ve already covered this in some detail in previous articles.
Estimation errors in covariance matrices. Historical sample covariance is backward-looking. When regime shifts occur, your estimated correlations are stale. If your signals are thematic, you can cluster them together and then do MVO on thematic clusters instead of individual signals. This is very powerful in practice.
MVO is single-period. Your alpha today is not your alpha next week. Multi-period extensions exist that address optimal trading trajectories, but they add complexity. We will discuss this in future articles as well.
Conclusion
MVO answers one question: given what you expect about returns and how they move together, what weights maximise risk-adjusted return?
Rather than using instruments as an input, we argue that signals produce more stable expected returns than raw instruments, making MVO more reliable.
Then, we show a practical way to implement MVO using CVXPY, which reduces the implementation to declaring your objective, listing your constraints, and calling solve().


Outstanding walkthrough of signal-level optimization. The key insight about MVO working better on signals than instruments is underappreciated - most implementations get stuck trying to forecast instrument returns directly and end up with garbage in, garbage out. Learned this the hard way at a previous shop where our MVO kept suggesting wild concentrated bets until we realized the estimation error on individual equities was drowning the covariance structure. Running MVO at the signal layer sidesteps that entirely.
If you do MVO in signal space, how would you add constraints on bounds for individual assets, or a constraint such as lambda * |w_optimal - w_current|? Is there a clean way, or do you again have to map the problem back into another optimization problem?
I have usually been combining alphas before the optimization routine to come up with a unified forecast for each asset, then running the optimization on the instruments with a t-cost and risk model.