Stop Babysitting Your Agent · Mohammad Abu Aboud

The sidebar wasn't closing. I described the bug to Claude, it wrote a fix, and said it was done. It wasn't. I described it again. Another fix. Still broken.

The problem wasn't Claude. The problem was me. I was the bottleneck.

The loop

Describe a problem, Claude writes code, you test it, it's wrong, you describe it again. Repeat.

That loop feels normal until you notice you're doing all the verification — every iteration, I was the test runner. That meant the loop ran at human speed. You're not delegating. The agent isn't running. You are.

The fix isn't better prompts. It's a different contract.

Old: {problem}

New: {problem} + {how to verify it's fixed}

Before handing off a task, define what done looks like.

A great engineer, handed a bug they can't reproduce, is just guessing. The feedback loop is what separates debugging from diagnosis. Without one, neither you nor the agent knows if the work is finished.

Story 1: The sidebar

The sidebar had a bug. Press X: instead of closing and shifting the layout, it left a white rectangle behind, hidden, not closed.

I described it to Claude. It wrote a fix. The sidebar closed. Now there was a white rectangle where it used to be. Differently broken.

I installed a Chrome extension that gives Claude browser access, then reprompted: "The sidebar is not closing properly. Go to [URL], click X, confirm the layout shifts correctly. Don't stop until it does."

One pass 30 mins later. It worked.

Not because Claude got smarter. Because it could now check whether it was actually done.

Story 2: The Docker image

We had a Docker image at four gigabytes. Goal: under six hundred megabytes.

The manual cycle was miserable. Build: ten minutes. Analyze. Tweak. Rebuild. Four or five cycles, two to three hours, one decision every thirty minutes. So it never happened.

I ran one prompt: "The Docker image is too large. Build it, measure the size, and do not stop until it's under six hundred megabytes."

Then I closed the tab and worked on something else.

A few hours later: done. Under six hundred megabytes.

Story 3: The race condition

We had refactored the worker. Tests started failing intermittently. Sometimes pass, sometimes not.

Instead of diagnosing it, I thought about what passing would look like.

"Write me a test that fires this flow 100 times and verifies it succeeds every time."

Claude wrote the test. It failed. Claude added logs, read the output, formed a hypothesis, iterated. For about an hour, no input from me.

Then it told me: a race condition.

I never told Claude to look for one. I just defined what passing looked like.

Be Great Team Lead

The best engineering team leads I've worked with don't tell people how to implement things. They define what shipped looks like: "users can log in with Google without losing their cart." Then they step away. The worst ones sit in on every PR review and approve every design decision. Their teams do work that's only as good as the team lead.

I was being that second team lead. When I stopped dictating steps and started defining outcomes, I got solutions I hadn't thought of.

The bottleneck was never Claude's capability. A loop doesn't need a human in the middle. It needs a condition at the end.