The Decision Gap
Why deploying more AI into enterprises will make decisions worse, not better.
Everyone agrees there is a gap between AI capability and AI adoption.
Anthropic’s latest research makes this concrete. Their study, “Labour Market Impacts of AI,” introduces a metric called “observed exposure” that compares what AI can theoretically do against what it actually does in professional settings. The numbers are striking. For computer and mathematical occupations, large language models could handle 94% of tasks. In practice, only 33% are covered. For office and administrative roles, the theoretical capability is 90%. The real-world usage is a fraction of that.
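A back-of-the-envelope way to read these figures is sketched below. The "realisation ratio" label is mine, not Anthropic's, and only the computer-and-mathematical figures are fully spelled out above:

```python
# Theoretical capability vs. observed usage, using the figures cited above.
# "Realisation ratio" is my label, not Anthropic's metric name.
theoretical_pct = 94  # share of tasks LLMs could handle (computer/mathematical)
observed_pct = 33     # share of tasks actually covered in practice

realisation_ratio = observed_pct / theoretical_pct
print(f"Only {realisation_ratio:.0%} of theoretical capability is used")
# -> Only 35% of theoretical capability is used
```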
The consensus response to this data is predictable: organisations need to adopt AI faster. Leaders need to redesign how their teams operate. The bottleneck is organisational inertia, change management, skills gaps.
This diagnosis is wrong. Or rather, it is incomplete in a way that will cause serious damage if acted upon uncritically.
The gap between AI capability and AI adoption is real. But closing it by deploying AI into enterprise decision processes, as they exist today, will not improve outcomes. It will make them worse.
What happens when you combine humans and AI for decisions
In late 2024, researchers from MIT’s Center for Collective Intelligence published a systematic review and meta-analysis in Nature Human Behaviour. Michelle Vaccaro, Abdullah Almaatouq, and Thomas Malone analysed 106 experiments reporting 370 effect sizes, all published between January 2020 and June 2023. Every study in the analysis included three measurements: the performance of humans alone, the performance of AI alone, and the performance of humans and AI working together.
The headline finding: on average, human-AI combinations performed significantly worse than the best of humans or AI alone.
Not marginally worse. Measurably worse, with a negative pooled effect size (Hedges’ g = −0.23, 95% CI −0.39 to −0.07). In 58% of all measured cases, combining humans and AI produced results worse than just using whichever one was better on its own.
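For readers unfamiliar with the statistic: Hedges’ g is a standardised mean difference, essentially Cohen’s d with a small-sample correction. Here the contrast is the combined human-AI performance against the better of human alone or AI alone:

$$
g = J \cdot \frac{\bar{x}_{\text{human+AI}} - \bar{x}_{\text{best alone}}}{s_{\text{pooled}}},
\qquad
J = 1 - \frac{3}{4(n_1 + n_2) - 9}
$$

A g of −0.23 therefore means the combination scored roughly a quarter of a pooled standard deviation below the stronger solo baseline, and the confidence interval excludes zero.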
The entire premise of “AI-assisted decision-making” is that the combination should outperform either party working alone. A systematic review of over three years of research says the opposite is true.
The researchers did find that human-AI combinations outperformed humans working alone. This is called “human augmentation” and it makes intuitive sense: giving someone access to AI tools helps them do better than they would unaided. But this is the wrong benchmark. If the AI alone would have made a better decision, and adding the human degraded it, you have not improved your decision process. You have made it more expensive and less accurate.
Explanations do not help. Confidence scores do not help.
One of the most consequential findings in the MIT study is what did not matter.
Much of the enterprise AI industry is built on the premise that “explainable AI” solves the adoption problem. If we show people why the model made a recommendation, they will use it more appropriately. If we attach confidence scores, people will calibrate their trust correctly.
The meta-analysis found that neither AI explanations nor confidence indicators significantly affected the performance of human-AI systems.
This undermines the core value proposition of every enterprise AI vendor selling “transparent” or “explainable” recommendations. Showing people the reasoning behind an AI suggestion does not help them decide better. They still over-rely on the AI when they should trust their own judgement, and under-rely on it when the AI is more accurate.
The researchers suggest this is because most human-AI decision systems follow the same template: the AI provides a recommendation, the human makes the final call. This sounds reasonable. But it creates a structural problem. The human is asked to evaluate the AI’s suggestion without any framework for knowing when the AI is likely to be right and when it is likely to be wrong. Explanations and confidence scores do not solve this problem because they describe the AI’s internal state, not the relationship between the AI’s capabilities and the specific decision at hand.
What actually works: decomposing the decision
The MIT study found two factors that significantly moderated whether human-AI collaboration worked.
First, the type of task. For decision tasks, where participants chose between predefined options, the effect of combining humans and AI was negative: the pairing underperformed the better party alone. For creation tasks, where participants produced open-ended content, the effect was positive. The researchers hypothesise this is because creation tasks naturally decompose into subtasks: the human provides creative direction and insight, while the AI handles routine generation. Each party does what it is better at.
Second, the relative performance of human and AI. When humans outperformed AI alone, adding AI to the process produced significant gains. The combination beat both. But when AI outperformed humans alone, adding the human to the process produced significant losses. The combination was worse than just letting the AI decide.
The implication is clear. For human-AI collaboration to work, you need to know which parts of the decision the human is better at, and which parts the AI is better at, and route accordingly.
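To make this concrete, here is a minimal sketch of a priori routing. The subtask names and accuracy figures are invented for illustration; in a real organisation they would come from a measured track record:

```python
# Structured delegation, sketched: each subtask of a decision is assigned
# a priori to whichever party has the better historical accuracy.
# Subtasks and figures below are illustrative, not from the MIT study.

SUBTASK_TRACK_RECORD = {
    # subtask: (human accuracy, AI accuracy)
    "baseline_forecast":      (0.61, 0.78),  # AI stronger: pattern extrapolation
    "promotion_uplift":       (0.74, 0.58),  # human stronger: local market context
    "constraint_feasibility": (0.55, 0.81),  # AI stronger: combinatorial checks
    "final_risk_tradeoff":    (0.72, 0.60),  # human stronger: strategic judgement
}

def route(subtask: str) -> str:
    """Assign a subtask to the party with the better measured accuracy."""
    human_acc, ai_acc = SUBTASK_TRACK_RECORD[subtask]
    return "human" if human_acc >= ai_acc else "ai"

for subtask in SUBTASK_TRACK_RECORD:
    print(f"{subtask} -> {route(subtask)}")
```

The routing logic is trivial. The precondition is not: none of this works unless the organisation has actually measured who is better at what, subtask by subtask.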
Only three of the 106 experiments in the analysis explored this approach: structured delegation, where specific subtasks were assigned a priori to the more capable party. Those three showed a positive effect (g = 0.22), though the sample was too small for statistical significance. The direction matters more than the p-value here. Out of hundreds of experimental configurations, almost nobody tested the one design that the theory predicts should work.
Three out of a hundred and six.
This is not a technology problem. It is a process design problem. And it is a process design problem that almost nobody is working on.
The adoption gap is actually a decision architecture gap
Now bring the two studies together.
Anthropic tells us that AI capability vastly exceeds AI adoption. The consensus response is: deploy more AI into knowledge work.
MIT tells us that deploying AI into decision processes, as currently designed, makes decisions worse in the majority of cases.
These findings are not contradictory. They are complementary. And together, they point to a different gap entirely.
The real gap is not between AI capability and AI adoption. It is between the way organisations make decisions and the infrastructure required to make those decisions well, whether with AI or without it.
Consider what is missing in the typical enterprise decision process today. A demand planning team meets weekly to review forecasts. A financial analyst overrides a statistical model based on sales team input. A supply chain leader approves an expedited shipment based on a phone call from a customer. A CFO allocates capital across business units based on a deck prepared by three different teams with three different sets of assumptions.
In each of these cases, a decision is made. But ask any of the following questions and you will find no system, no tool, and no record that can answer them:
Who made the decision? On what basis? What alternatives were considered? What assumptions were embedded in the analysis? What information was available at the time? What information was missing? How did this decision relate to the organisation’s stated strategy? What happened as a result?
This is what I mean by the decision architecture gap. The decision itself is uninstrumented. Nobody captures the reasoning. Nobody traces the logic. Nobody measures decision quality over time. There is no learning loop.
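What would instrumenting a decision even look like? A minimal sketch, assuming nothing more than the questions above turned into a schema; the field names are mine:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DecisionRecord:
    """One entry in a decision log. Illustrative schema, not a product spec."""
    decision: str                       # what was decided
    decided_by: str                     # who made the call
    decided_at: datetime                # when
    basis: str                          # the stated reasoning
    alternatives_considered: list[str]  # what else was on the table
    assumptions: list[str]              # what the analysis took for granted
    information_available: list[str]    # inputs at decision time
    information_missing: list[str]      # known gaps at decision time
    strategy_link: str                  # which stated objective this serves
    outcome: str | None = None          # filled in later, closing the loop
```

Nothing here is technically hard. The point is that even this trivial record does not exist for most enterprise decisions.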
And this is exactly why deploying AI into these processes fails. You are adding a powerful analytical engine to a process that has no structure for deciding what to do with the analysis. The AI produces a recommendation. The human either accepts it or overrides it. Neither choice is captured, evaluated, or improved upon.
Faster adoption into broken processes accelerates the wrong things
Anthropic’s own Economic Index, published in January 2026, includes a finding about deskilling that deserves attention here. When AI-automated tasks are removed from jobs, the remaining work requires lower educational levels. The report uses travel agents as an example: AI covers the complex work of planning and arranging itinerary packages, while routine tasks like ticket printing remain with the human.
In enterprise decision-making, the analogue is this: if AI handles the analytical complexity (scenario modelling, forecasting, constraint optimisation), and the human is left only with the final yes-or-no, you have not augmented the human. You have reduced their role to a rubber stamp on a process they cannot meaningfully evaluate.
This is consistent with what the MIT meta-analysis measured. In the majority of studied cases, the human’s contribution to the AI-assisted decision was neutral at best and harmful at worst. Not because humans are incapable, but because the process gave them no meaningful way to contribute.
I should be honest about the limits of this argument. The MIT experiments were laboratory studies, not field research inside enterprises. The gap between a controlled decision task and a quarterly demand review is real. It is possible that experienced professionals in their own domain navigate AI recommendations better than study participants do. But the consistent direction of the finding, across 106 experiments, is hard to dismiss. And my own experience in enterprise supply chains suggests the lab results, if anything, understate the problem. Real organisations have more noise, more politics, and less structure than a lab setting.
Deskilling in this context is not an inevitable consequence of AI deployment. It is a design choice. Organisations that decompose their decisions, that instrument who is better at what, that capture reasoning and build learning loops, can use AI to amplify human judgement rather than hollow it out. But this requires decision infrastructure that does not exist in most enterprises today.
Bad decisions hide their own causes
When an enterprise misses its quarterly forecast, the post-mortem follows a predictable path. Demand was volatile. The statistical model was not accurate enough. The planning team did not adjust fast enough. Each of these explanations points to a tool or a process that can be improved.
But push further. Why did the planning team not adjust? Because someone overrode the statistical forecast based on input from the sales team that turned out to be wrong. Why was that override not caught? Because there was no structured way to evaluate the quality of the override, no record of the reasoning, no confidence weighting, no accountability mechanism. Why does this keep happening? Because the entire decision process is invisible infrastructure. Nobody owns it. Nobody instruments it. Nobody improves it.
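Demand planners already have a name for the missing measurement: forecast value added (FVA), which asks whether each touch of the forecast made it more or less accurate than the untouched baseline. A minimal sketch, with invented numbers:

```python
# Forecast value added (FVA): did the human override beat the statistical
# baseline? The numbers are invented; the discipline is standard practice
# in demand planning, just rarely applied with any rigour.

def ape(forecast: float, actual: float) -> float:
    """Absolute percentage error for a single period."""
    return abs(forecast - actual) / actual

statistical_forecast = 1000   # the model's number
override_forecast    = 1400   # after the sales-team input
actual_demand        = 1050   # what actually happened

baseline_error = ape(statistical_forecast, actual_demand)   # ~4.8%
override_error = ape(override_forecast, actual_demand)      # ~33.3%

print(f"override moved error from {baseline_error:.1%} to {override_error:.1%}")
# -> override moved error from 4.8% to 33.3%
```

Run this over every override, attach it to the person and the reasoning, and you have the accountability mechanism the post-mortem could not find.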
The pain from bad decisions is real and enormous. But it shows up as write-downs, missed quarters, restructurings, and quiet career damage. It never shows up as a category of problem with a name, a budget line, or an owner.
This is why the “adoption gap” framing is dangerous. It implies that the solution is to put more AI into the existing process. But the existing process is the problem. Decision-making is the last uninstrumented process in the enterprise.
What decision infrastructure actually requires
The MIT researchers close their paper with a roadmap. Three of their recommendations map directly to what I would call decision infrastructure.
Develop innovative processes. The researchers note that human-AI synergy requires humans to be better at some parts of a task, AI to be better at other parts, and the system as a whole to be good at appropriately allocating subtasks to whichever partner is best. This is a design problem, not a technology problem. It requires decomposing decisions into their constituent parts and routing each part to the most capable agent, whether human or machine.
Develop more comprehensive evaluation metrics. Most experiments evaluate performance on a single measure of accuracy. But real enterprise decisions involve multiple criteria: financial impact, risk, strategic alignment, regulatory compliance, acceptance from buyers and operators. Decision infrastructure needs to evaluate outcomes across these dimensions and track them over time; a sketch of what this could look like follows the third recommendation below.
Develop commensurability criteria. The researchers call for standardised task designs, quality constraints, incentive schemes, process types, and evaluation metrics for human-AI collaboration. In enterprise terms, this is a decision architecture: a structured framework for how decisions are made, who contributes what, how alternatives are evaluated, and how outcomes are traced back to inputs.
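Here is the sketch promised above: scoring a recorded decision’s outcome across multiple criteria rather than a single accuracy number. The criteria, weights, and scores are invented; they would plug into the decision record sketched earlier:

```python
# Multi-criteria outcome evaluation, sketched. Criteria, weights, and the
# example scores are illustrative, not a recommended standard.

CRITERIA_WEIGHTS = {
    "financial_impact":       0.35,
    "risk":                   0.25,
    "strategic_alignment":    0.20,
    "regulatory_compliance":  0.10,
    "stakeholder_acceptance": 0.10,
}

def score_outcome(scores: dict[str, float]) -> float:
    """Weighted score in [0, 1] across all criteria for one decision."""
    return sum(CRITERIA_WEIGHTS[c] * scores[c] for c in CRITERIA_WEIGHTS)

# One decision, scored after the fact on each criterion (0 = failed, 1 = met).
outcome_scores = {
    "financial_impact":       0.4,
    "risk":                   0.7,
    "strategic_alignment":    0.8,
    "regulatory_compliance":  1.0,
    "stakeholder_acceptance": 0.6,
}
print(f"overall outcome score: {score_outcome(outcome_scores):.2f}")
```

Tracked per decision and over time, this is what turns a decision log into a learning loop.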
None of this is about AI. All of it is about the decision process itself. AI is one input to that process. The process is what determines whether the input improves the output.
The uncomfortable conclusion
The Anthropic data says we are underusing AI. The MIT data says when we do use it for decisions, we are doing it wrong. Both findings point to the same root cause.
We have spent decades building infrastructure for data: pipelines, warehouses, lakes, lakehouses, quality frameworks, governance, lineage tracking. We have spent years building infrastructure for AI: model training, deployment, monitoring, explainability, safety. We have spent almost nothing building infrastructure for the thing that data and AI are supposed to serve: the decision itself.
The next phase of AI in the enterprise will not be defined by better models or faster adoption. It will be defined by whether organisations build the infrastructure to make decisions deliberately, to decompose them into parts where humans and machines each contribute their strengths, to capture the reasoning, and to learn from the outcomes.
Without that infrastructure, more AI will produce more of what MIT measured: worse decisions, delivered faster, with less accountability, and a growing population of human decision-makers who have been deskilled into spectators of processes they cannot meaningfully influence.
The gap that matters is not between AI capability and AI adoption. It is between the decisions organisations make and their ability to understand, evaluate, and improve those decisions over time.
That is the gap worth closing.
Pramod Prasanth is a technical architect and enterprise domain expert with 25 years of experience across major pharmaceutical and industrial enterprises. He writes about decision intelligence, enterprise AI, and the systems that connect data to judgement.
References:
- Massenkoff, M. & McCrory, P. (2026). “Labour Market Impacts of AI: A New Measure and Early Evidence.” Anthropic Research.
- Appel, R., Massenkoff, M. & McCrory, P. (2026). “Anthropic Economic Index: Economic Primitives.” Anthropic Research, January 2026.
- Vaccaro, M., Almaatouq, A. & Malone, T. (2024). “When combinations of humans and AI are useful: A systematic review and meta-analysis.” Nature Human Behaviour, 8, 2293–2303.