AI is corrupting the OODA Loop
Reading Barnes in the context of Boyd
As AI adoption becomes mainstream within the tech and tech-adjacent world, it’s forcing humans to grapple with some new questions.
One set of questions I’ve been particularly concerned with over the past few weeks is what it means for various types of work product to be human-generated, along with the necessary downstream question of when particular work products ought to be human-generated versus AI-generated.
Last week, I came across this piece by Phin Barnes, General Partner at TheGP. When Phin writes something, it’s nearly always worth reading. This piece was particularly well-timed from my point of view, since it spoke directly to the topics I had been thinking about.
The most salient paragraph comes about midway through the document, where he says:
A system of judgment does four things, in a loop:
It ingests domain context and makes a recommendation. It captures the human’s actual decision — did they accept, modify, or reject the recommendation, and why. It observes the outcome. And it uses that complete cycle to make the next recommendation better.
To put Phin’s steps in sequence, the four steps the AI performs are:
Orient (ingest domain context)
Recommend
Act (execute the human’s decision)
Observe (the outcome)
The point he’s making with this sequence is that the actual decision is made by a human, and that decision is the most important signal to the AI in the process.
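To make the shape of this loop concrete, here is a minimal sketch in Python. Every name in it is mine, not Phin’s: `SystemOfJudgment`, `JudgmentCycle`, and the placeholder `recommend` stand in for whatever a real product would implement. The one property the sketch is meant to capture is that the human’s decision, not the model’s output, is what the system records as the signal for the next cycle.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class JudgmentCycle:
    """One pass through the loop: context in, recommendation out,
    human decision captured, outcome observed."""
    context: str
    recommendation: str
    human_decision: str   # "accept", "modify", or "reject"
    rationale: str        # why the human decided the way they did
    outcome: Optional[str] = None

class SystemOfJudgment:
    def __init__(self) -> None:
        self.history: list[JudgmentCycle] = []

    def recommend(self, context: str) -> str:
        # Orient: ingest domain context and produce a recommendation.
        # A real system would call a model here, conditioned on prior
        # cycles so that past human decisions shape future output.
        return f"recommendation informed by {len(self.history)} prior cycles"

    def record_decision(self, context: str, recommendation: str,
                        decision: str, rationale: str) -> JudgmentCycle:
        # Capture the human's actual decision and rationale; this,
        # not the model's output, is the loop's most important signal.
        cycle = JudgmentCycle(context, recommendation, decision, rationale)
        self.history.append(cycle)
        return cycle

    def observe_outcome(self, cycle: JudgmentCycle, outcome: str) -> None:
        # Observe: the completed cycle informs the next recommendation.
        cycle.outcome = outcome
```

The design choice worth noticing is that `history` stores complete cycles rather than model outputs alone; whatever learning a real system does, it learns from the human’s accept/modify/reject decisions and their outcomes.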
Phin’s description of the process is probably correct; he’s almost certainly spent more time interacting with AI agents than I have. But I think framing the decision as something “only humans can do” on some level misses the point about what the current capabilities and limitations of agents actually are.

John Boyd, a Cold War-era fighter pilot, developed the OODA Loop, a model that describes how humans make good choices quickly in high-stress situations:
Observe
Orient
Decide
Act
Both Barnes and Boyd see their subjects as having, if not agency, at the very least the capacity to carry out actions. My impression is that both would agree that turning over the action itself to AI is broadly acceptable. As software, agents can simply do certain things more quickly than the human brain, and that’s incredibly powerful.
However, I think they disagree about pretty much everything before the action.
By grounding the decision in observation and orientation, Boyd argued that a human must develop specific context in order to make good decisions. Phin, on the other hand, seems to assert that humans can still make good decisions if AI agents take over the context-handling part of the work.
At this point, I am broadly skeptical of Phin’s claim in the specific context of white-collar jobs that work primarily with unstructured data, not primarily code or spreadsheets. This constraint is appropriate because the back half of Phin’s argument is essentially about vertical software taking the form of systems of judgment, and I think that implies white-collar work.
The reason is simple: I find Boyd’s argument that humans must have context in order to make consistently great decisions more attractive because the US Department of Defense trusts it, and making good decisions with unstructured data at scale is one of the Department’s core competencies. The Air Force and Navy, which operate most fighter planes, have had almost half a century to come up with a better model, and they haven’t done it yet.
Yes, it is certainly possible to use AI to ingest context and provide it to the human in addition to a recommendation, and this could happen within Phin’s system of judgment model. That’s not necessarily going to ensure the human considers the context — though that’s not a failure mode unique to conversations with artificial agents.
More critically, the state of the art in frontier models suggests that this won’t necessarily get the human better recommendations unless they’re already an expert. That’s wildly problematic for white-collar jobs that use unstructured data, which feel more at risk every day as Anthropic releases new integrations and skills for Claude. Perhaps the starkest case is medicine, a field known for not being especially excited about technical innovation.
The most stunning example of this I’ve come across is documented in “IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures”, a preprint posted to arXiv in April 2026. It found that frontier model families like Opus, Llama, and GPT gave better medical advice to prompts self-identifying as a clinician than to prompts self-identifying as a patient. In effect, these agents were poisoning the prompter’s ability to make a good decision.
If this is true, and if the claim generalizes to similar fields like law, humanity might actually have something of a problem in getting consistently good advice from AI agents in fields where context-specific expertise is important. Without being able to trust that an agent isn’t withholding information, it seems darn difficult to trust its recommendations.
In turn, this makes me wonder how such potentially subpar human decision-making will impact the training of AI agents.
I’m becoming increasingly concerned that agents might start outperforming humans at specific tasks not because the agents are getting better, but because humans are becoming increasingly dependent on agents that may be giving them worse advice.
As experts retire and people hand over judgment to agents, our societal knowledge of certain nuances in various fields might be lost. There’s a very good chance that this problem expands beyond knowledge work.
This is admittedly a very pessimistic view of the future, and I certainly don’t want it to come to pass.
At my core, I’m an optimist, so I don’t believe this vision of the future is inevitable. I think there are brighter days ahead if, and only if, we all think very carefully about what specific sorts of context and decisions we find it acceptable to hand off to agents.