Preamble: It’s time for my second newsletter. About a month and a half since the last one – a bit over my monthly target cadence. I’ve got some interesting stuff today.
Source Code Is the New Assembly
Line 4,892 was wrong and it was already 11pm. I rubbed my eyes; at what point does exhaustion outweigh my principles? I was too tired to even name the problem, let alone fix it.
I’d traded coding fatigue for reviewing fatigue. More productive, but not nearly enough. And more sinister.
The Trough
I had a reasonable plan: make the AI follow rules, such that I could orchestrate it into larger and larger tasks. My friend Cyril warned me against building multi-agent from the get-go, so of course I nodded and kept building. After all, is not experience the best teacher? For weeks it accreted endlessly… a hydra of features and edge cases begetting their own edge cases. The thing meant to save me time was costing me more of it, and I could feel it failing before I admitted it. I think most builders know. You just keep going because the sunk cost whispers that the next fix will be the one that makes it click.
But the deeper problem wasn’t the tool I was building. It was me. My reviews got sloppier as the day wore on. Frustration gave way to YOLOing the model against tasks and hoping for the best. The model was ignoring the style guides I’d written, and I wasn’t even noticing. It was draining me, and giving it more hours wasn’t working.
It would take everything I had. I had to give it nothing.
First Principles
I stepped back and stared at the shape of the problem. LLMs had moved me from writing code to reviewing code, but my time was still the bottleneck. I could generate ten times more code, but I had to review ten times more code. Better, but another linear function of my time. Not enough.
So what are agents actually good at? They’re persistent. They don’t get sloppy at 4pm. They can grind on a problem. But they need a quantitative target — not “follow this style guide,” but something they can pursue mechanically, over and over, without qualitative judgment.
What if I gave my code a score, and the agent just… made the score go up?
Not a feeling — a number. You run your code through analysis, and out comes X.
Now imagine you could break that score down. Not just “the codebase scores X” but “this file is dragging it down, and within that file, these three functions are the worst offenders.” You can point at the pain.
Now hand that breakdown to an AI: fix the worst thing. Re-score. Better. Fix the next worst thing. Keep going.
That’s the whole loop. Score, find the hotspots, fix, re-score, repeat.
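To make the loop concrete, here’s a toy sketch in Rust. Everything in it is made up for illustration — the per-file penalties, the halving “fix”, and the function names (`score`, `worst_hotspot`, `apply_fix`) are stand-ins, not a real tool’s API:

```rust
use std::collections::HashMap;

/// Overall loss: here, just the sum of per-file penalties.
fn score(penalties: &HashMap<String, f64>) -> f64 {
    penalties.values().sum()
}

/// The "gradient": which unit contributes most to the loss.
fn worst_hotspot(penalties: &HashMap<String, f64>) -> Option<String> {
    penalties
        .iter()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(name, _)| name.clone())
}

/// Stand-in for "hand it to the agent": pretend the fix
/// halves the target's penalty.
fn apply_fix(penalties: &mut HashMap<String, f64>, target: &str) {
    if let Some(p) = penalties.get_mut(target) {
        *p *= 0.5;
    }
}

/// Score, fix the worst thing, re-score, repeat.
fn optimize(penalties: &mut HashMap<String, f64>, epochs: usize) -> Vec<f64> {
    let mut history = vec![score(penalties)];
    for _ in 0..epochs {
        if let Some(target) = worst_hotspot(penalties) {
            apply_fix(penalties, &target);
        }
        history.push(score(penalties));
    }
    history
}

fn main() {
    // Invented numbers, not real measurements.
    let mut penalties = HashMap::from([
        ("parse.rs".to_string(), 0.30),
        ("net.rs".to_string(), 0.15),
        ("ui.rs".to_string(), 0.04),
    ]);
    let history = optimize(&mut penalties, 4);
    println!("{history:?}"); // loss per epoch, never increasing
}
```

The real version swaps `apply_fix` for an agent run and `score` for actual analysis, but the control flow is exactly this small.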
The Link
This pattern has a name. It’s how we train neural networks.
The score (inverted, so it goes down as quality goes up) is called a loss function. The breakdown of what’s responsible for the loss is called the gradient. The loop of scoring and improving is called training. Each pass through the loop is an epoch.
I’m training my codebase the way we train neural nets.
Goodhart’s Tension
There’s an obvious problem with optimizing for a score: gaming it. Tell an AI to minimize code, and the optimal solution is to delete everything.
But real problems can be subtler than the absurd version. I hit this early. The optimizer, trying to minimize function complexity, started shattering functions into tiny single-use helpers. Each function looked simpler on paper, but the logic was scattered across dozens of fragments. The codebase became harder to read, harder to follow, harder to reason about. It was trimming muscle, not fat.
The fix was a competing score. I added a measure of ‘code economy’, which penalized scattering. This incentivized balancing function simplicity and cohesive logic. Trim the fat, not the muscle.
The tension between scores isn’t a flaw. It IS the system. You don’t want any single metric maximized. You want the equilibrium where competing goals hold each other honest.
Btw, this trap has a name: Goodhart’s Law. When a measure becomes a target, it ceases to be a good measure. The antidote, it turns out, isn’t better measures; I want to keep those few and simple. It’s the tension between them.
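Here’s a toy sketch of two terms held in tension. The metric names, weights, and thresholds are all invented for illustration — the point is only that a combined loss can’t be gamed by driving one term to zero:

```rust
/// Complexity term: rewards splitting big functions.
/// (50 lines as the "bad" reference point is arbitrary.)
fn complexity_loss(avg_function_len: f64) -> f64 {
    (avg_function_len / 50.0).min(1.0)
}

/// "Code economy" term: penalizes shattering one piece of logic
/// into many tiny helpers. (10 per feature is equally arbitrary.)
fn fragmentation_loss(functions_per_feature: f64) -> f64 {
    (functions_per_feature / 10.0).min(1.0)
}

/// Combined loss. Minimizing either term alone games the metric;
/// the sum only goes down when both stay reasonable.
fn loss(avg_function_len: f64, functions_per_feature: f64) -> f64 {
    0.5 * complexity_loss(avg_function_len)
        + 0.5 * fragmentation_loss(functions_per_feature)
}

fn main() {
    // Shattered: tiny functions, but far too many of them.
    let shattered = loss(5.0, 12.0);
    // Balanced: moderate on both axes.
    let balanced = loss(25.0, 3.0);
    assert!(balanced < shattered);
    println!("shattered = {shattered:.2}, balanced = {balanced:.2}");
}
```

Shattering functions drives `complexity_loss` toward zero but saturates `fragmentation_loss`, so the combined score gets worse — the equilibrium wins.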
The Loop
Here’s some messy networking code I ran through the loop.
The loss breakdown pointed at two things: duplication across similar functions, and state complexity. I didn’t tell it to look for these; the loss function found them.
Four passes. Notice the loss drops:
| Epoch | Loss | Delta |
|---|---|---|
| baseline | 0.49 | – |
| epoch 1 | 0.32 | -0.17 |
| epoch 2 | 0.13 | -0.19 |
| epoch 3 | 0.11 | -0.02 |
| epoch 4 | 0.09 | -0.02 |
Big gains early, then diminishing returns (the same curve you see when training neural networks!). We can stop when the loss converges: that’s the equilibrium point, where the competing principles have settled against each other.
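The stopping rule can be as simple as “halt when the epoch-over-epoch improvement drops below a threshold.” A sketch, with an arbitrary cutoff of 0.05, run against the loss trajectory from the table:

```rust
/// True once the most recent epoch improved the loss
/// by less than `min_delta`.
fn converged(history: &[f64], min_delta: f64) -> bool {
    match history {
        [.., prev, last] => (prev - last) < min_delta,
        _ => false, // fewer than two scores: keep going
    }
}

fn main() {
    // The loss trajectory from the table above.
    let losses = [0.49, 0.32, 0.13, 0.11, 0.09];
    for epoch in 1..losses.len() {
        if converged(&losses[..=epoch], 0.05) {
            println!("stop after epoch {epoch}"); // prints "stop after epoch 3"
            break;
        }
    }
}
```

With these numbers the rule would have stopped one epoch earlier than I did; the cutoff is a knob, not a law.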
The state cleanup illustrates what it found. Before, the state model was a bag of flags:
```rust
// Before: each flag is independently true/false,
// each Option is independently Some/None.
// 8 booleans + 3 Options = 2^11 = 2,048 representable states.
// Most of those states are nonsensical -- authenticated
// but socket closed? handshake complete but not connected?
// Nothing in the code prevents it.
pub struct Connection {
    pub is_connected: bool,
    pub socket_open: bool,
    pub is_authenticated: bool,
    pub handshake_complete: bool,
    pub socket_id: Option<u64>,
    pub last_error: Option<String>,
    // ...
}
```
After: five states, enforced by the type system.
```rust
// After: exactly 5 states. The impossible combinations
// don't just go unchecked -- they become inexpressible.
// socket_id only exists when the connection is Ready.
// last_error only exists when something went wrong.
// The type system holds the truth, not the programmer's memory.
pub enum Connection {
    Idle,
    Draining,
    Ready { socket_id: u64 },
    Backoff { retry_after_ms: u64, last_error: String },
    Error { last_error: String },
}
```
From 2,048 representable states to 5. Any LLM could make this change if you pointed at the struct and asked. The point is the system identified this as the highest-value target without being told. The loss function pointed here. The gradient said this is where the pain lives. The agent made the fix. I never said “look at the state model.”
That’s what the loop buys you at scale. Not better one-shot refactors – better prioritization. It finds the things a human reviewer misses at 4pm, across a whole codebase, without getting tired.
Side Quest: LLM Vision
The loop gives the LLM two things: a loss to reduce and a location responsible for it. I wanted to see what it sees.
You used to read code to find problems. Now the loss finds problems and points at code. But maybe that isn’t surprising. Performance engineers don’t read every function looking for the slow one — they profile, see the hotspot, go straight there. Same idea. I built a flame graph for code quality: instead of “where is the CPU time going?” it shows “where is the complexity going?” You can explore it on the above example here:
Human Out the Loop
So what do you think about when you’re not staring at line 4,892?
This has happened before. Assembly programmers went up the stack. They stopped writing instructions by hand and started writing in higher-level languages. Assembly didn’t disappear – it became intermediate. We just stopped looking at it. I haven’t been writing code for a while now… and I don’t miss it.
Source code is starting to feel like the new assembly.
P.S.
I’d love to hear what you think.
What’s next for me: I want to push this further and start orchestrating larger work streams on top of the loop. More files, more services, more complex dependency graphs. I want to find the walls. I also have some interesting questions ahead of me that I don’t have answers to yet – how do I show this to people? What’s the right UX? Open source or closed? If you have opinions on any of that, I’m genuinely asking.
A few things I’ve been enjoying lately:
- Your LLM Doesn’t Write Correct Code. It Writes Plausible Code. – an analysis of an LLM reimplementation of SQLite. Honest, helpful, and a good reminder that LLMs target compilation, not correctness.
- Project Hail Mary – fun sci-fi. Almost put it down in the beginning, but glad I stuck with it. A bit “written for TV” feeling. Kept wishing it would turn into The Three Body Problem instead.
- Blizzard stealth-released the first Diablo 2 expansion in over twenty-five years. Took over my life for a week or two.
Till next time.