The Human in the Loop Is the Whole Point

Bill Rappleye


I race bikes. Not professionally, but seriously enough that I follow a structured training plan, track my power output, and care about getting faster. The plan is built by software that knows my power curve, my training load, my recovery metrics, and a hundred other variables I'd never think to track on my own. By any reasonable measure, it's a smart plan.

What the plan doesn't know is what I've learned about myself. I've identified weaknesses the algorithm hasn't picked up on — places where my numbers look fine on paper but my actual performance tells a different story. While the plan keeps prescribing more volume at threshold, what I actually need is repeatability — the ability to deliver hard efforts set after set without the quality collapsing on the back half of a workout.

So I've been adapting the plan. I’m not ignoring it — the structure is good and I'm not trying to outsmart the science. I'm substituting workouts that target the weakness directly. Holding back, even when I feel strong, because I've learned that the gains live in completing every interval cleanly. Cutting a set short when my form starts to break down, because finishing with garbage reps reinforces exactly what I'm trying to fix.

The plan is just a tool (and it's a genuinely good one, too). But the gains compound when I bring my own informed judgment to it. I'm getting stronger faster than blind adherence would have made me, because I know things about myself the plan can't see.

I've been thinking about this constantly as we build out our agentic development practices.

The Phrase That Sells Humans Short

"Human in the loop" has become industry shorthand for how humans participate in AI-driven workflows. It also subtly sells the human short. The framing implies the human is a checkpoint — a safety mechanism we tolerate until the AI gets good enough to remove the loop entirely.

I want to argue the opposite. The human isn't in the loop. The human is the loop. Take them out, and you don't have a faster system. You have a system that's confidently going somewhere nobody asked for.

Agentic tools have gotten remarkably good. They produce credible first drafts of code, scaffold features, write the tests you would have written, and surface bugs you might have missed. The productivity gains are real.

But the limits are real too. The ARC-AGI-3 benchmark, released in March 2026 by the ARC Prize Foundation, was built specifically to test whether AI agents can explore unfamiliar environments, figure out the rules, and adapt without being told how. Humans score 100%. Frontier AI systems score below 1%. The gap isn't about raw capability; it's about a kind of generalization that current architectures don't yet produce.

This tracks with what we keep noticing in our own work: the teams getting the most out of these tools are the ones with the deepest foundational expertise, not the best prompts.

The agents close knowledge gaps brilliantly, but somebody has to know enough to drive them in the right direction in the first place, and to recognize when the right direction is different from what the tool is suggesting.

The Two Kinds of Expertise You Can't Skip

There's a pitch making the rounds right now: just describe your application, and the AI will build it. The demos are impressive. The dream is seductive.

It's also, in any serious context, a fantasy, and the reason has nothing to do with AI's capability. Articulating what you want from a software system, with enough precision that it can be built correctly, requires a kind of expertise you almost certainly don't have if you're relying on AI to do the whole job.

In fact, it requires two kinds of expertise, and both are crucial.

The first is technical craft: the judgment of experts who know how systems hold together. They're the ones who can tell you why a particular data model will paint you into a corner three years from now, why an authentication choice that seems fine today will become a security incident later, why this innocent-looking feature will balloon the complexity of everything around it. They notice when an agent's output is locally correct but globally wrong, and when the code passes its tests but introduces a coupling that will haunt the system for years. This is hard-won pattern recognition, and it lives in people who have built and maintained real systems.

The second is domain comprehension: a lived understanding of what the client's business actually does, how the users actually behave, and what the requirements actually mean. Not what the requirements say, but what they mean. This is the expertise of someone who has sat across the table from the people who will use the software, watched them work, and developed an intuition for the gap between stated needs and real needs. They're the ones who catch the moment when an agent has produced a perfectly reasonable feature that solves the wrong problem.

When either is missing, the AI can't compensate. It will infer, pattern-match, and produce something plausible. But the gap between "plausible" and "right" is where projects quietly die.

The cycling parallel maps cleanly here. TrainerRoad is a tool that builds excellent adaptive training plans based on your goals. For a lot of riders, it's all the coach they need.

But there's a difference between TrainerRoad and a human coach who has worked with you for years. The coach has watched you ride. They know how you respond to volume versus intensity. They hear it in your voice when you've had a stressful week. They notice when you're starting to overreach before the data shows it. And critically, when something in the plan needs to change, they have the experience and the relationship to know what to change it to. The AI is TrainerRoad. The architect who has lived inside your industry for fifteen years is the coach who knows you. You need both.

You Can't Outsource the Felt Sense

Even with the right experts in the loop, there's a second layer of human contribution the AI can't replicate: real-time judgment.

Back to the training plan — the reason a generic program can't fully replace human input is that no external system has access to the variables that matter most. How rested you actually feel. The subtle ache that might be nothing or might be the start of something. The hundred small things that constitute "how you actually are right now."

Software has the same dimension. Agents are remarkably capable when given good direction. They are equally capable of producing confident, plausible-looking output that can be quietly wrong. The variable that determines which outcome you get is whether someone on the team has the foundational knowledge to recognize the difference, and the time and authority to act on it. That action often looks like the cycling equivalent of modifying the plan: not throwing out the agent's output, but redirecting it, reframing the prompt, substituting a different approach when the suggested one is technically valid but practically wrong.

Here's the risk that worries me most: if we use AI to accelerate output but compress the time humans spend thinking, we lose efficiency and, more significantly, we actively degrade the capability that makes the whole system work. Engineers who only review agent output lose their feel for code. Architects who let agents propose designs lose their intuition for tradeoffs. The human in the loop has to remain a skilled human, and skill atrophies without exercise.

The Strategic Bet

The most important decision facing engineering leaders right now isn't which AI tools to adopt. The tools will keep improving, and the teams that aren't using them will fall behind. That part is settled.

The harder decision is what to do with the humans. Do you treat them as a constraint, or as the central asset that makes the whole agentic system worthwhile? Do you compress the time they spend thinking, or protect it? Do you assume their skills will take care of themselves, or invest deliberately in keeping them sharp?

The teams that win the next decade won't be the ones with the most sophisticated AI tooling. AI tooling is becoming a prerequisite for success. The teams that win will be the ones who understood early that the value flowing through the agents was always coming from the humans driving them. They'll be the ones whose engineers can still tell, on any given Tuesday, whether to do the workout as planned or modify it to address what the plan can't see.

The human is the loop. Build accordingly.
