Vibe Coding Drupal: the Scent of Progress

May 11, 2026 · 11 minute read
AI wrote a Drupal module. Discover why "directed coding," using IDE tools, and human oversight are essential to avoid the "scent of progress" trap.

In SEO circles, people used to talk about the "scent of information", the subtle cues that tell users they're navigating toward what they want, encouraging them to keep clicking. I kept thinking about that phrase while spending a few days building a Drupal module almost entirely through AI tools, because the experience has its own version: a scent of progress. The model writes confidently. The diffs look clean. Each turn feels like forward motion. The trick is that sometimes the motion is real, and sometimes you're just spinning your wheels, changing code without getting closer to solving your problem.

In my previous post I shared how I got started on the module — Canvas Field Component, a small extension to Drupal Canvas, the next-generation layout tool — in roughly three days, for an effort that would have taken me weeks unassisted, since I'd never worked with Drupal Canvas at a code level before. It works, it passes static analysis and code quality checks, and parts of it are arguably better than what I'd have written on my own. But the most useful thing I came away with wasn't the code. It was a clearer sense of when the scent is real, when it's a mirage, and what tools and habits separate the two.

Here's where Claude and Codex impressed me, where they wandered, and how I'd approach the next AI-assisted contribution differently.

The scent of progress

The seductive part of vibe coding isn't the speed. It's that the work fits into the gaps in your day. When I'm manually writing code, it demands my full attention — a meeting or even making dinner means the work stops. With an AI agent running, I could nudge the model in a direction, walk away for ten minutes, come back to a wall of confident output, skim it, and nudge again. That fragmenting of attention turned out to be both the appeal and the trap.

The same pull operates at the level of scope, not just attention. When I started on Canvas Field Component, I would have been happy with basic formatter support. The model got me there so quickly that adding third-party settings — and with it the entire ecosystem of Date Formatters — felt like a small additional ask. It wasn't. The initial implementation was fast; getting it fully working took most of the back-and-forth in the project. A microcosm of the whole experience, really: surprisingly quick to something, much slower to done.
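For the curious, that settings surface is a real extension point: Field UI invites any module to bolt its own options onto every formatter through third-party settings. A minimal sketch of the shape of it, with a hypothetical module name and setting key rather than the module's actual code:

```php
<?php

use Drupal\Core\Field\FieldDefinitionInterface;
use Drupal\Core\Field\FormatterInterface;
use Drupal\Core\Form\FormStateInterface;

/**
 * Implements hook_field_formatter_third_party_settings_form().
 *
 * Field UI invokes this for every formatter, so a module can attach its
 * own settings without the formatter knowing anything about it.
 */
function my_module_field_formatter_third_party_settings_form(FormatterInterface $plugin, FieldDefinitionInterface $field_definition, $view_mode, array $form, FormStateInterface $form_state) {
  return [
    'exposed' => [
      '#type' => 'checkbox',
      '#title' => t('Expose this formatter as a component'),
      // Third-party settings are namespaced by module on the display config.
      '#default_value' => $plugin->getThirdPartySetting('my_module', 'exposed', FALSE),
    ],
  ];
}
```

The form element is the easy part. The long tail was making every formatter's settings, date formatters included, survive the round trip through that namespace.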

Because here's what I noticed: the scent is strongest when nothing is actually happening. When Claude was genuinely solving a problem, the output was often terse — a small diff, a quick note about what it changed, a request to confirm. When it was stuck, the output ballooned. Long explanations of the approach. Confident summaries of what "should now work." Whole files regenerated to address what was actually a one-line bug. The model wasn't lying, exactly. It was going in circles, and the sheer volume of activity obscured the thrashing.

I started to develop a few heuristics for telling the difference. The first is that if a model fails to fix the same problem twice in a row, it almost certainly won't fix it on the third try with the same input. Continuing to ask gets you a different-shaped answer, not a better one. The fix is to change the input: reset the context to a clean window, switch to a different model, or — usually most effective — give much sharper feedback. "This doesn't work" is a request for the model to guess again. "This doesn't work when I do Y, but it does work if I do Z" gives it a wedge it can actually use. That single change in how I described problems was probably the highest-leverage habit I picked up.

The harder discipline is noticing when the scent is pulling you forward in the first place. It's easy, mid-meeting, to glance at an AI agent's latest update and think "great, it figured it out" because the surface signals all suggest competence. The audit happens later, when you actually run the code and discover the loop you've been in for the past hour. The cost isn't just tokens — it's the false sense that you're moving with the current while you're actually treading water.

Tools matter less than environments

I started this project in the Claude desktop app and moved to Codex inside VS Code partway through, so what follows isn't a fair head-to-head — different models, different contexts, different stages of the work. But the experience did teach me something I didn't expect: the biggest variable isn't which model you're using, it's how much of your environment the model can actually see and touch.

Claude Sonnet 4.6 was strong at the parts of the work that happen mostly inside the model's head. It understood what I was trying to build, offered a reasonable initial approach, and produced the scaffolding for that approach quickly. It was also useful as a fresh set of eyes on code another tool had produced — handing it a file and asking "what's wrong with this?" was often the fastest path to spotting a subtle bug. Where it gave me friction was at the edges, where the work meets the environment. Claude had a habit of committing its changes to its own branch in a separate repo it created inside a .claude directory, which meant testing those changes required asking it to copy them over. Some of that is probably setup I haven't figured out yet — but the broader pattern was that I had to be the bridge between what the model wrote and what was actually running on my machine.

Claude had stylistic tics worth naming, too. It produced more inline documentation than I'd write by hand — not necessarily a bad thing, but the kind of verbosity that obscures rather than clarifies once it accumulates. And it had a habit of adding section dividers to long files: 80-character comment lines meant to break up the module's main class (which had grown past 1,400 lines) into navigable chunks. The instinct wasn't terrible — that class probably should have been three classes — but the implementation often miscounted by a character or two, which tripped phpcs in ways phpcbf couldn't fix automatically. A small thing, but a telling one: it's exactly the kind of error the model would have shipped confidently if a deterministic check hadn't caught it.
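To make the failure concrete: the Drupal coding standard's line-length sniff warns on anything past 80 characters, so a divider comment has to land on exactly the right width. A sketch of the pattern, not the module's actual code:

```php
<?php

// A section divider is "// " plus 77 dashes: exactly 80 columns, the widest
// line the Drupal standard allows before phpcs warns. The model's dividers
// were often 78 or 79 dashes wide, and phpcbf won't trim them, because it
// has no way of knowing the intended width.

// -----------------------------------------------------------------------------
// Formatter settings handling.
// -----------------------------------------------------------------------------
```

Nothing about the overlong version looks wrong to the eye, which is exactly why a deterministic check has to be the one doing the counting.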

Codex closed that gap in a way that mattered more than any difference in raw capability. Running inside the IDE, it could look at the rendered Drupal output and adjust the PHP when the markup was wrong, without me describing the symptom. It ran phpcs at the end of most tasks without being asked. It rebuilt the Drupal cache before telling me a change was ready to test, and flagged when a hard browser refresh would be needed because the module's JavaScript had changed. Those are unglamorous habits, but they're exactly the kind of grounded feedback that disrupts the scent of progress. Each one is a deterministic check the model is running on itself before it hands the work back. None of it is the model being smarter. It's the model being less alone.

There's a catch, though: cost. Codex burned through my OpenAI quota much faster than I expected — fast enough that I'd want a Pro tier before using it regularly. Whether the productivity gain is worth the spend probably depends on what you're building. For a finite contribution with a clear end state, I'd happily pay it. For ongoing exploratory work, I'd want to be more deliberate about scoping each session.

If I were starting over, I'd default to the IDE-resident tool from the beginning and reserve the chat-style model for the moments when I genuinely want a second opinion or a sanity check on a piece of work. The model in the chat window thinks. The model in the IDE thinks and verifies. Those are different jobs.

So was it actually vibe coding?

It's worth pinning down what we're talking about, because "vibe coding" gets used to mean at least three different things. In the strictest sense, it means asking an AI for code and accepting what comes back without much concern for how it's built — the prototype-and-pray school. Looser usage describes prompt-driven development where you're making architectural choices as you go, distinct from "agentic coding" (handing the model a detailed spec and letting it plan and execute) and "assisted coding" (the IDE-autocomplete model where you're still doing most of the writing).

By the strict definition, I wasn't vibe coding. By the looser one, I was — mostly. There were stretches that drifted toward agentic, like when I described the third-party settings work in enough detail to let the model plan its own implementation. And there were moments I pushed back on choices a true vibe coder wouldn't have cared about. When Claude implemented a hook in a .module file, I asked for it to be moved to a class. When dynamic form loading was being handled by hand-written JavaScript, I had the model rewrite it with HTMX. Neither of those changed what the module did. They changed the approach, and to me that mattered.
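That hook preference lines up with where core itself is heading: Drupal 11.1 introduced attribute-based hook classes. A minimal sketch of the class-based version, assuming a core recent enough to support them; the module and form names are hypothetical:

```php
<?php

declare(strict_types=1);

namespace Drupal\my_module\Hook;

use Drupal\Core\Form\FormStateInterface;
use Drupal\Core\Hook\Attribute\Hook;

/**
 * Hook implementations, kept out of the .module file.
 */
class MyModuleHooks {

  /**
   * Implements hook_form_alter().
   */
  #[Hook('form_alter')]
  public function formAlter(array &$form, FormStateInterface $form_state, string $form_id): void {
    // Illustrative body: whatever the procedural .module function was doing.
    if ($form_id === 'my_module_settings') {
      $form['#attributes']['class'][] = 'my-module-settings-form';
    }
  }

}
```

On older cores the same opinion still holds; the .module function just shrinks to a thin wrapper that delegates to a class.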

I think this is where the strict definition breaks down as a practice. If you genuinely don't care how the code is built, you can't tell when the model is stuck. The scent of progress works precisely because the surface signals — confident prose, clean diffs, plausible structure — look identical whether the model is solving the problem or pattern-matching its way around it. The only way to smell the difference is to have opinions about the underlying work: this hook belongs in a class, this JavaScript should be HTMX, this 1,400-line file wants section dividers because it should have been three files. Take those instincts away and you've lost sight of the horizon.
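The HTMX opinion cashes out the same way: the behavior moves out of hand-written JavaScript and into declarative attributes on the render array, where a reviewer can see it whole. A sketch under assumptions, with a hypothetical route, library name, and element IDs:

```php
<?php

use Drupal\Core\Url;

// When the formatter select changes, HTMX fetches the settings form
// fragment via GET and swaps it into this container. No bespoke
// fetch-and-inject script to write or maintain.
$build['settings'] = [
  '#type' => 'container',
  '#attributes' => [
    'id' => 'formatter-settings',
    'hx-get' => Url::fromRoute('my_module.formatter_settings_form')->toString(),
    'hx-trigger' => 'change from:#formatter-select',
    'hx-target' => '#formatter-settings',
  ],
  // The htmx library itself still needs attaching (hypothetical name).
  '#attached' => ['library' => ['my_module/htmx']],
];
```

Less custom behavior to hand-review is the point: the markup says what happens, and there's no script for the model to quietly regress.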

The same logic is why I wouldn't skip linters and static analysis on AI-assisted work, even when the model is running them too. PHPCS and PHPStan are deterministic. They don't have a scent. They tell you yes or no, and that kind of grounded feedback is exactly what's missing from the rest of the loop. The proof was in the running: PHPStan passed at levels 0, 1, and 2 on the first try, with a single level-3 issue I left for community feedback before going further. That's a striking thing to be able to say about code I largely didn't write line by line — and it only feels striking because the deterministic tools were checking the model's work the whole way through. Codex picked up on this too: it ran phpcs unprompted at the end of most tasks, and that habit was more valuable to me than any single piece of code it produced.

So if I had to name what I was doing, I'd call it something like directed coding: AI as the writer, me as the editor with opinions. The vibes were involved. They just weren't in charge.

What I'm taking into the next contribution

There's a version of this story where AI tools are the protagonist — where the headline is "Claude shipped a Drupal module" and the rest is decoration. That isn't what happened. The model wrote a lot of code, much of it good, some of it needing pushback. But the work that actually moved the project forward was the work of staying oriented: noticing when the progress had stopped, knowing which deterministic check to reach for, deciding which architectural choices were worth a fight. Take that work away and the module wouldn't exist in three days, or three weeks, or possibly at all.

That's the part I think gets lost in the current conversation about vibe coding. The interesting question isn't whether AI can write code — it can — but what posture the human needs to hold while it does. For me, the answer turned out to be opinionated, attentive at the seams, and willing to stop asking the same question when the same answer keeps coming back in different costumes. None of that is exotic. It's just the kind of thing you don't have to do consciously when you're typing every character yourself.

The other thing I'm taking forward is humility about where this is heading. The tooling I used a few weeks ago is already not the tooling I'd recommend today, and what looks like sound judgment now will probably look quaint by the time my next contribution ships. I'd rather develop habits than allegiances. Four I plan to keep: default to the IDE-resident model, treat repeated failures as a signal to change the input rather than simply repeat it, never trust a clean diff that hasn't been through phpcs, and ship before the model has convinced me it isn't done.

If you're contributing to Drupal — or any open source community — with AI tools in the loop, I'd love to compare notes. The most useful thing I can imagine right now isn't a benchmark or a tutorial. It's a shared vocabulary for the strange middle ground where these tools have landed us. "Scent of progress" is my best attempt at one. I'm curious about yours.
