Reading the actual code beats vibes

I rewrote half this blog because the first drafts were embarrassing. Not wrong, exactly. Just thin. Generic. The kind of post you skim and forget. They read like every other “here’s how I built a WordPress addon” article, because that’s effectively what they were generated from: a paragraph of my notes, plus whatever the model could reconstruct from memory about how plugins usually work.

The second time around I did one thing differently. I pointed the agent at the actual repositories. The real addon code. And the writing changed completely.

Vibes are confident and wrong

When you hand an AI coding agent a short summary and ask it to write or build from there, you get vibes. It pattern-matches against the average of everything it has seen. The average WooCommerce extension. The average payment integration. The average folder layout. The output is fluent and plausible and almost never matches your code.

For prose this shows up as vagueness. The first draft of one post said something like “the addon hooks into the booking flow and adds a deposit step.” True in spirit. Useless in practice. There’s no file in there. No decision. No reason it’s built the way it is. It’s the description you’d write if you’d never opened the repo.

For code it’s worse, because vibes compile. The agent invents a helper that doesn’t exist, calls a method with the signature it expects the framework to have, assumes a config key that was renamed two versions ago. It all looks right until it runs. You spend more time correcting confident hallucinations than you’d have spent writing it yourself.

What changed when I gave it the repo

For my own commercial addons I keep each one in its own repository under github.com/74h1r. When I re-ran the blog work with agents that could actually read those repos, the difference was immediate and concrete.

The post about my deposit addon stopped saying “it adds a deposit step” and started naming the real shape of the thing: which entry file registers it, where the settings live, the specific decision to force first-time customers onto online payment with a configurable minimum rather than blanket-charging everyone. That nuance wasn’t in my notes. It was in the code. The agent only found it because it read the code.

Same story across the others. The subdomain-and-custom-domain routing addon. The per-staff duration override. The standalone bank-transfer payment method. Each post got specific because the source was specific. File structures that are real. Trade-offs I actually made, surfaced back to me in a way that occasionally reminded me why I’d done something.

That’s the tell, by the way. When the output teaches you something about your own work, the agent is reading. When it flatters you with smooth generalities, it’s guessing.

This is not a writing trick

The lesson generalizes straight into day-to-day coding, which is where I actually spend my time.

Don’t describe the bug. Show the failing test. “Sometimes the discount doesn’t apply” gives the agent nothing but room to invent. The actual assertion that fails, the actual stack trace, the actual input that triggers it — that’s a target it can hit.

Don’t summarize the API. Point at the real class. Every framework has its own conventions, and a guess at them is a guess. WordPress exposes hooks and a global $wpdb; Booknetic provides its own structure on top of WordPress; a whole category of plugins out there run raw queries straight against the database with no abstraction at all. An agent working from “it’s a WordPress plugin” will produce code for the average plugin, not yours. An agent reading your actual files writes for your conventions.

Don’t paraphrase the config. Open it. The difference between payment_min_deposit and min_deposit_amount is the difference between code that runs and code that throws, and no amount of reasoning recovers a key the model never saw.

The pattern underneath all three: replace your description of reality with reality. Your memory of the code is lossy. Your summary is lossy. The repo is not.

Keeping it grounded without drowning the agent

The obvious objection is context. You can’t dump an entire codebase into a prompt and expect coherence — you get a different failure mode, where the signal drowns. So the real skill isn’t “give it everything,” it’s “give it the right slice, scoped small.”

The rhythm that works for me, on both writing and code:

Brainstorm. Talk through the intent first, out loud, before any file is touched. What am I actually trying to do and why.
Spec. Write down what done looks like. Concrete enough that I could hand it to someone who isn’t me.
Plan. Break the spec into ordered steps, each one small enough to verify on its own.
Small chunks. Execute one step at a time, each pointed at the specific real files it needs and nothing else.

Every stage narrows the context. By the time the agent is doing the small chunk, it isn’t holding the whole repo in its head — it’s holding the three files this step touches, plus the failing test it has to make pass. That’s the sweet spot: real code, but scoped. Specific, not drowning.

This is also the part I won’t pretend is effortless. I run my own setup for orchestrating agents through this loop. It has a name I use in private and a shape I’m not going to lay out here. Call it the thing that keeps the agents pointed at reality while I sleep. That’s all you get.

The contrarian bit

The industry conversation right now is mostly about bigger models, longer context windows, cleverer prompting. All useful. None of it is the lever.

The lever is mundane: did the agent read the actual thing, or did it improvise from a vibe. A modest model reading your real code will outwrite and out-engineer a frontier model guessing from a summary, every single time. I have the rewritten blog posts to prove the first half, and a year of addon commits to prove the second.

So before you reach for a better model or a longer prompt, check the boring thing first. Is the source in the room. Is it the failing test, the real config, the actual class — or your description of them. Close that gap and most of the quality problem closes with it.

Vibes are for music. Code wants the code.