How To Use Loops In Agentic Engineering To Build The Hardest Products
Introduction
It is now old news in some circles that loops are pretty much the heart of meaningful agentic engineering.
After a trillion tokens spent on building a cutting-edge institutional investment process (@openforage) with nothing but agents, here are some thoughts about “agentic loops” and how to reason about and efficiently utilise them.
Reasoning About Loops
Loops are an intuitive mechanism built on something the broader public has already learnt: throwing more tokens at a problem improves the quality of the solution (for most problems).
This shouldn’t be hard to intuit, but it’s often easy to forget. Like a human’s, an agent’s first pass on most things will not be its best. I have quite literally never encountered a session where I felt an agent’s first pass could not be improved upon.
Throwing more tokens at the problem allows agents to explore a larger section of the solution space (breadth), and allows them to reason about more dimensions of various solutions (depth). Just as with humans, more reasoning greatly helps agents come up with novel and out-of-distribution solutions.
That being said, tokens are expensive, and frontier models are not trained only to solve extremely difficult problems - some people are only interested in using agents for confirmation bias and psychosis, in which case the only capability an agent requires is to say “yes, you are absolutely right”.
Therefore, these frontier models are trained and tuned to be “token efficient” and to appropriately provide a “fast response”. This, unfortunately, goes against the (critically important) paradigm that throwing more tokens at a problem drastically increases the quality of the solution.
So, we’ve invented looping as a reasonable middle ground. The psychosis / conversationalist crowd can have their rapid responses, and the builders trying to solve difficult problems can easily throw (billions of) tokens at a problem.
The Problem With Dumb Loops
So, we’ve established that throwing tokens at a hard problem is beneficial AND we’ve now reached a level of model intelligence where agents can set reasonably difficult goals and labor on them over long-running sessions, some lasting up to a week.
It sounds like utopia, but naive implementations of this will almost always result in a mountain of slop. You will get something that has either drifted far from your original idea or is riddled with so many bugs that it is virtually useless.
Why?
Fortunately, the explanation is intuitive and satisfying.
In a naive loop, three things are happening at once.
Errors Are Compounding
Agent errors made early on can compound and spiral out of control, much like trying to build an extremely tall tower on shaky foundations. Bad decisions, designs, or bugs introduced early in a long-running session continue to affect the code and processes written later in that same session.
There are many variants of this, but one of the more common (and egregious) examples is making a very shitty design choice early on, and then doubling down on it in later parts of the loop by enforcing the shitty design choice and basing the entire infrastructure on it.
It’s not that your agent is stupid, but that compaction has made it “forget” that the design choice was a “compromise” rather than an intended artifact.
The Absence Of Meaningful Iteration
A recent meme is @pmarca’s “retardmaxxin” tweet storm, where he takes a simplified stance that introspection is bad.
On the surface it seems like a strange stance to take, but the simple corollary is that introspection without iteration is bad. The point of life is progress. Introspection for the sake of introspection is often devoid of exactly that - progress.
There are many extremely intelligent people whose impact we will never feel because they are too busy stuck in their own heads, only ever pitting their ideas against a cheap and shallow simulation of reality.
Agents, too, do not benefit from introspection alone. An agent does not meaningfully progress towards a better solution when you get it to review its own work with the same context. Any improvements made this way are shallow.
Humans and agents alike are drawing solutions out of a solution distribution (set of all possible solutions) given some context. Without a change in context, it is unlikely that we will draw another solution that is far from the current one.
Have you ever written or drawn or created something, checked through it, beamed when you thought it was the best you could do, only to come back a few days later and think it was absolute horseshit?
That is only possible because you allowed yourself a change in context (a different day, a different mood, etc) to draw from a different solution distribution.
Therefore, an agent stuck in a naive loop trying to improve its work often does not do better than some marginal improvement.
There Is No North Star
Without some north star for implementation / verification to tend towards, agents can drift indefinitely, as each compaction brings them further and further away from the original intention.
Writing Better Loops
To address the problems of dumb loops, we need to resolve 3 things:
1) We need a mechanism that can arrest errors early on and prevent them from compounding.
2) We need a mechanism that can provide meaningful iteration to the agent, by introducing different contexts so it can draw out of a wider solution distribution, and by offering meaningful objectives so the agent can actually hill climb the objective.
3) We need a north star.
That is WHY we use verification as a key fixture in a loop: it happens to address all three issues of dumb loops.
After your agents have implemented something, you create a new agent with fresh context, unpolluted by the context of implementation, to verify the work of your implementation agent.
You need to verify often and early. It is also important that your verification agents have a fresh context each time, which lets them avoid problems of context exhaustion. And because the code/solution changes between rounds, each verification agent draws its feedback from a different distribution. That feedback is injected into your implementation agent, which in turn allows it to draw a solution out of a different distribution, and so on.
Doing it early in your agentic workflows allows you to catch misunderstandings and bugs before they spiral into permanent design choices that later agents no longer recognize as compromises and start to build around.
Doing it often provides the feedback your implementation agent needs to iterate towards a better solution. This is the mechanism that “throws more tokens” at the problem to achieve a better solution and simultaneously acts as a north star, preventing spec drift.
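To make the “fresh context” point concrete, here is a minimal sketch of what it means mechanically. Everything in it is hypothetical and for illustration only: `Agent`, `ModelCall`, `verify_with_fresh_context` and `fake_model` are names I made up, and the “model” is a stub that just reports how many messages it was shown. The only thing to take away is that the verifier is a brand-new instance that sees the spec and the work, and none of the implementer’s history.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for a model call: takes the context so far, returns text.
ModelCall = Callable[[List[str]], str]


@dataclass
class Agent:
    """A toy agent: a model call plus the context it has accumulated."""
    model: ModelCall
    context: List[str] = field(default_factory=list)

    def run(self, message: str) -> str:
        self.context.append(message)
        reply = self.model(self.context)
        self.context.append(reply)
        return reply


def verify_with_fresh_context(model: ModelCall, spec: str, artifact: str) -> str:
    """Create a new verifier that sees only the spec and the work to verify."""
    verifier = Agent(model)  # new instance, empty context on every call
    return verifier.run(f"SPEC:\n{spec}\n\nWORK TO VERIFY:\n{artifact}")


def fake_model(context: List[str]) -> str:
    # Stub model (assumption): reports how much context it was given.
    return f"reply after seeing {len(context)} message(s)"


implementer = Agent(fake_model)
implementer.run("Build a boat")
draft = implementer.run("Now add a sail")

# The verifier never inherits the implementer's accumulated context.
feedback = verify_with_fresh_context(fake_model, "Build a boat", draft)
print(len(implementer.context), "messages in the implementer's context")
print("verifier:", feedback)
```

In a real harness the stub would be replaced by whatever call starts a new session with your model of choice. The design choice that matters is that the verifier is constructed inside the function, so nothing can leak into it between rounds.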
Have the mindset that the solution you are looking for is at the end of a hundred iterations. You then want your verification agent to kick in as often as possible, providing feedback to your implementation agent. However, verification is very (token) expensive, so how frequently and how intensely you verify becomes a matter of (harness) optimisation.
The general understanding is that the more often you verify, the higher the quality of your end solution will be, because you will have thrown more tokens at the problem and your solution will have undergone more iterations.
This is, of course, only true if you have a good verification.
What Makes For A (Simple) Good Verification
A good verification must be guided by meaningful rubrics. You should almost always invest time in designing good rubrics for what you want to verify.
For example, if you want verification on clean code, you might care about: 1) Extensibility Of Code, 2) Naming Of Variables, 3) Modularity Of Code, 4) Meaningful Normalisation, etc.
Within each of the dimensions of clean code you care about, you might further identify granular fields that capture some aspect of that particular dimension. How those fields should be scored, and how those scores might be aggregated into a single dimension, should be well defined.
A verifier agent can then objectively score a solution by referring to the rubrics, and provide feedback to the implementation agent to iterate. The lack of a rubric makes verification extremely fuzzy, which makes your iteration noisy.
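As a rough sketch of what “well defined” could look like, here is one possible way to write a rubric down as data. The dimension names, field names, point values and weights are all made up for illustration, and a weighted average is just one aggregation choice among many. The verifier agent would be the one filling in the `awarded` scores.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Field:
    name: str
    max_points: int  # the verifier scores this field from 0 to max_points


@dataclass
class Dimension:
    name: str
    weight: float  # share of the total score; weights are assumed to sum to 1
    fields: List[Field]

    def score(self, awarded: Dict[str, int]) -> float:
        """Aggregate field scores into a 0-100 score for this dimension."""
        earned = sum(min(awarded.get(f.name, 0), f.max_points) for f in self.fields)
        possible = sum(f.max_points for f in self.fields)
        return 100.0 * earned / possible


def total_score(rubric: List[Dimension], awarded: Dict[str, Dict[str, int]]) -> float:
    """Weighted average of the dimension scores (one possible aggregation)."""
    return sum(d.weight * d.score(awarded.get(d.name, {})) for d in rubric)


# Hypothetical rubric with two of the clean-code dimensions.
rubric = [
    Dimension("naming", 0.4, [Field("descriptive_names", 5),
                              Field("consistent_casing", 5)]),
    Dimension("modularity", 0.6, [Field("single_responsibility", 10),
                                  Field("no_circular_imports", 10)]),
]

# Scores a verifier agent might hand back (made-up numbers).
awarded = {
    "naming": {"descriptive_names": 4, "consistent_casing": 5},
    "modularity": {"single_responsibility": 7, "no_circular_imports": 8},
}

print(round(total_score(rubric, awarded), 1))
```

With these numbers, naming scores 9 out of 10 points (90) and modularity 15 out of 20 (75), so the total is 0.4 × 90 + 0.6 × 75 = 81. Because every field has an explicit range and every dimension an explicit weight, two verifier runs on the same solution have much less room to disagree.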
There are a few ways to define “good verifications / good scores”.
You can use fixed thresholds: stop once a solution scores above some threshold, let’s say 90 points in total. You can use percentage thresholds: stop once a solution doesn’t improve on the previous solution by more than some threshold, let’s say 10%. You can use early stopping: stop once the solution hasn’t improved after some number of attempts, let’s say 3 iterations. You can and should explore a combination of stopping mechanisms.
By defining a “good score”, you also organically create a stopping point for your loops, which creates clear agentic workflows centered around verification.
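The three stopping mechanisms are simple enough to sketch as small predicates over the history of verification scores. This is an illustration under my own assumptions, not a prescription: the function names are hypothetical, the defaults (90 points, 10%, 3 iterations) are the example numbers from above, and I interpret early stopping as “the best score so far has not been beaten in the last few iterations”.

```python
from typing import List


def hit_fixed_threshold(scores: List[float], target: float = 90.0) -> bool:
    """Fixed threshold: the latest score is at or above the target."""
    return bool(scores) and scores[-1] >= target


def gain_below_threshold(scores: List[float], min_gain: float = 0.10) -> bool:
    """Percentage threshold: the latest score improved on the previous one
    by no more than min_gain (relative improvement)."""
    if len(scores) < 2 or scores[-2] <= 0:
        return False
    return (scores[-1] - scores[-2]) / scores[-2] <= min_gain


def no_improvement_for(scores: List[float], patience: int = 3) -> bool:
    """Early stopping: none of the last `patience` scores beat the best
    score seen before them."""
    if len(scores) <= patience:
        return False
    best_before = max(scores[:-patience])
    return max(scores[-patience:]) <= best_before


history = [60.0, 70.0, 70.0, 69.0, 70.0]  # made-up verification scores
print(hit_fixed_threshold(history))   # False: 70 is below 90
print(gain_below_threshold(history))  # True: 69 -> 70 is a gain of about 1.4%
print(no_improvement_for(history))    # True: nothing in the last 3 beat 70
```

How you combine them is a design choice. For example, stopping on the fixed threshold OR on early stopping gives you both a “good enough” exit and a “this is going nowhere” exit. Note that the percentage rule fires on any round with no improvement, so it is the most trigger-happy of the three.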
I gave a verification for “clean code” because it is easy to explain (pedagogically clean), but at least one major dimension of your verification should be tied directly to the project/spec. You need a “how good is this solution at achieving the spec or solving my problem” kind of verification.
That is what acts as a north star and prevents spec drift. You don’t want to be handed an impressionist painting of a boat when you’ve asked for an actual boat.
The Canonical Loop
Want to build something
-> Design good rubrics to verify the solution
-> Come up with a threshold for what good looks like
-> (1) Agent implements
-> (2) Implementation is sent off for verification early and often
-> (3) Verification passes?
-> (4) If yes, end loop. If no, go back to (1) with verification feedback
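For those who prefer code to arrows, the loop above can be sketched in a few lines. This is a skeleton under stated assumptions rather than a real harness: `canonical_loop`, `toy_implement` and `toy_verify` are hypothetical names, the two toy functions stand in for actual agents, verification is reduced to a single fixed threshold, and `max_rounds` is a safety cap I added so the loop always terminates.

```python
from typing import Callable, Optional, Tuple

# Hypothetical agent interfaces (assumptions, not a real API):
Implement = Callable[[str, Optional[str]], str]   # (spec, feedback) -> artifact
Verify = Callable[[str, str], Tuple[float, str]]  # (spec, artifact) -> (score, feedback)


def canonical_loop(spec: str, implement: Implement, verify: Verify,
                   threshold: float = 90.0,
                   max_rounds: int = 100) -> Tuple[str, float, int]:
    """Run implement -> verify until the score passes or rounds run out."""
    artifact, score, rounds = "", 0.0, 0
    feedback: Optional[str] = None
    for rounds in range(1, max_rounds + 1):
        artifact = implement(spec, feedback)      # (1) agent implements
        score, feedback = verify(spec, artifact)  # (2) sent off for verification
        if score >= threshold:                    # (3) verification passes?
            break                                 # (4) yes: end loop
        # (4) no: go back to (1), carrying the verification feedback
    return artifact, score, rounds


def toy_implement(spec: str, feedback: Optional[str]) -> str:
    # Toy implementer: each round of feedback produces the next version.
    version = 1 if feedback is None else int(feedback.split()[-1]) + 1
    return f"{spec} v{version}"


def toy_verify(spec: str, artifact: str) -> Tuple[float, str]:
    # Toy verifier: later versions score higher, capped at 100.
    version = int(artifact.rsplit("v", 1)[1])
    return min(100.0, 40.0 * version), f"improve on version {version}"


print(canonical_loop("boat", toy_implement, toy_verify))
```

With the toy functions, the scores go 40, 80, 100, so the loop ends on the third round with "boat v3". In practice `verify` would spin up a fresh-context agent that scores against your rubrics, and the threshold check would be whatever combination of stopping rules you settled on.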
Conclusion
There are levels to verification, and one can design truly powerful verification systems to avoid being blasted with the agentic slop cannon. However, even a basic, well-designed verification system will bring you very far.
That concludes my article on meaningful looping in agentic engineering. If this has been helpful, I would love some feedback so I may continue this series on building real, hard products with agentic engineering.


