How To Use PCA To Actually Uncover Real Factors
Introduction
You’ve read it somewhere: run PCA for “statistical factor analysis”. But the material on this is either so shallow that it’s meaningless (“run PCA and the eigenvectors are factors”) or so dense that you need a PhD in statistics to parse it. This is the most information-dense article on why PCA can actually extract factors, and how to reason about it.
Run PCA on 500+ stocks and your first 5-10 eigenvectors converge to the actual systematic factors (up to rotation and scaling), not arbitrary statistical directions. The conditions are simple: the top K eigenvalues of the return covariance matrix grow without bound as you add assets, while the rest stay bounded.
That’s it. That’s the rationale. The rest of the article gives form to this argument.
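To make the eigenvalue condition concrete before going further, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration and not taken from real market data: returns come from a toy K-factor model with standard normal factors, standard normal loadings, and unit-variance uncorrelated idiosyncratic noise. The function name `top_eigenvalues` and its parameters are hypothetical.

```python
import numpy as np

def top_eigenvalues(n_assets, n_factors=3, n_obs=2000, seed=0):
    """Simulate returns from a toy K-factor model and return the
    eigenvalues of the sample covariance matrix, largest first."""
    rng = np.random.default_rng(seed)
    factors = rng.standard_normal((n_obs, n_factors))      # T x K factor returns
    loadings = rng.standard_normal((n_assets, n_factors))  # N x K exposures (assumed)
    noise = rng.standard_normal((n_obs, n_assets))         # T x N idiosyncratic part
    returns = factors @ loadings.T + noise                 # T x N asset returns
    cov = np.cov(returns, rowvar=False)                    # N x N sample covariance
    return np.linalg.eigvalsh(cov)[::-1]                   # eigvalsh sorts ascending

K = 3
for n in (50, 100, 200, 400):
    ev = top_eigenvalues(n, n_factors=K)
    # First K eigenvalues (factor part) versus the (K+1)-th (noise part).
    print(n, np.round(ev[:K], 1), round(float(ev[K]), 2))
```

Under these assumptions the first K eigenvalues should scale roughly in proportion to the number of assets, while the (K+1)-th stays within a narrow band near the noise variance. With a fixed sample length, sampling error inflates that band somewhat as assets are added, so a real dataset will look less clean than this.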

