Why we believe the future of AI lies in augmenting human expertise, not replacing it
The rush to automate everything overlooks a fundamental truth: in high-stakes domains, human judgment isn't a bug to be fixed; it's the core feature. Full automation fails in three recurring ways:
- It removes human judgment from critical decisions, leading to errors in edge cases and loss of institutional knowledge.
- Its lack of explainability makes it impossible to audit decisions or understand failure modes in regulated industries.
- Generic AI tools fail to capture the domain-specific nuance and expert intuition that define competitive advantage.
Enterprise buyers in healthcare, legal, financial services, and other regulated industries aren't looking for AI to replace their experts. They're looking for tools that make their experts 10x more effective while maintaining accountability and control.
Understanding what AI can and cannot learn from data alone
Explicit knowledge: information that can be codified, documented, and transferred through written or verbal communication. AI excels at processing explicit knowledge at scale.
Tacit knowledge: deep expertise gained through experience that is difficult to articulate or transfer. This is where human expertise remains irreplaceable.
AI handles explicit knowledge processing at superhuman speed and scale.
Humans contribute tacit knowledge and final judgment.
How we structure AI systems for high-stakes decision-making
1. Machine learning models analyze data, identify patterns, and generate recommendations.
2. Domain experts review the AI outputs, apply contextual knowledge, and exercise judgment.
3. Final decisions are made with human accountability, creating an audit trail (sketched below).
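As an illustrative sketch only (the record structure and field names are assumptions for exposition, not a description of a production system), the three stages can be read as a single auditable decision record:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    """One high-stakes decision moving through the three-stage workflow."""
    case_id: str
    ai_recommendation: str           # stage 1: model analysis and recommendation
    ai_confidence: float
    expert_notes: str = ""           # stage 2: contextual review by a domain expert
    final_decision: str = ""         # stage 3: human-accountable outcome
    decided_by: str = ""
    audit_trail: list[str] = field(default_factory=list)

    def log(self, event: str) -> None:
        self.audit_trail.append(f"{datetime.now(timezone.utc).isoformat()} {event}")

# The expert, not the model, owns the final decision; the trail records both.
record = DecisionRecord(case_id="case-001", ai_recommendation="flag for review", ai_confidence=0.87)
record.log("AI recommendation generated")
record.expert_notes = "Customer history suggests a false positive"
record.final_decision = "approve"
record.decided_by = "analyst:jdoe"
record.log("Final decision recorded with human accountability")
```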
The dimensions on which these systems are measured: productivity improvement versus unassisted humans, accuracy in high-stakes decision-making, and the share of decisions made with clear human accountability.
How human-in-the-loop AI creates value across industries
| Domain | AI role | Expert role | Outcome |
|---|---|---|---|
| Healthcare (radiology) | Analyzes medical imaging, flags anomalies, suggests differential diagnoses | Radiologist reviews findings, considers patient history, makes the final diagnosis | Faster turnaround with the same or better accuracy, clear liability chain |
| Legal | Reviews contracts, identifies non-standard clauses, highlights risk areas | Attorney assesses business context, negotiates terms, provides counsel | 80% time savings on document review; lawyers focus on strategy |
| Financial services | Monitors transactions, detects anomalies, scores fraud probability | Analyst investigates flagged cases, considers customer context, approves or denies | 10x more cases reviewed with a lower false-positive rate |
| Manufacturing | Monitors sensor data, predicts equipment failures, recommends maintenance | Engineer validates predictions, schedules interventions, optimizes operations | 70% reduction in unplanned downtime, better resource allocation |
Why human-in-the-loop companies create stronger competitive advantages
- Data network effects: every human decision improves the model, so as more experts use the system it becomes more valuable to all users.
- Workflow lock-in: deep embedding in expert workflows means high switching costs; the system becomes part of how teams work.
- Domain depth: building these systems requires deep understanding of industry workflows and regulations, which generic AI companies cannot easily replicate.
Human feedback creates a flywheel: better AI attracts more users, more users generate better training data.
1. Initial model plus early-adopter feedback.
2. Improved accuracy attracts more customers.
3. Market-leading performance that is hard to displace.
Unlike pure automation plays that commoditize quickly, human-in-the-loop systems accumulate proprietary knowledge that compounds over time. The longer a company operates, the wider its moat becomes.
What we look for in human-in-the-loop AI companies
- The system explicitly defines where humans add value and maintains accountability.
- The company sells to businesses with budget and compliance requirements, not consumer markets.
- The founding team has deep experience in the vertical it is building for.
If you're building human-in-the-loop AI for a high-stakes domain, we want to hear from you.
Initial check size
Target stage
Response time
Epistemologists distinguish between two types of knowledge. This distinction is the intellectual foundation of Noodle's investment thesis — and the reason why the problem we are solving is structural, not incidental.
Explicit knowledge: AI handles this well. The written procedure, the published rule, the documented best practice. AI systems are exceptionally good at retrieving, synthesising, and applying this layer — faster and more consistently than any human practitioner.
Examples: MARPOL regulatory annexes, clinical evidence-based protocols, national curriculum frameworks, sports biomechanics research.
Tacit knowledge: the dominant factor in outcomes.
The clinician who knows this particular patient will not follow the standard protocol, regardless of what it recommends. The coach who sees that a player's backhand fails under psychological pressure, not physical fatigue. The compliance officer who knows that a specific regulator is currently applying a stricter interpretation than the published text requires.
This knowledge is embodied, relational, and contextual. It lives in people, not databases. And it is the dominant factor in outcome quality across virtually every high-value domain.
The businesses that learn to systematically bridge the Explicit–Tacit gap will outperform those that don't. Significantly. Durably.
In Bayesian terms, explicit knowledge constitutes the prior: well-documented, stable, transferable. Tacit knowledge is the likelihood function — the nuanced, context-sensitive update that transforms a generic prior into a useful posterior for this specific situation.
Current AI systems are extraordinarily good at approximating priors. They are structurally incapable of capturing likelihood functions that were never written down. The Explicit–Tacit gap is, formally, the difference between:
- P(outcome | documented context), which is what AI optimises for
- P(outcome | documented context + relational context + practitioner judgment), which is what decisions actually require

The delta between these two distributions is where value is destroyed, and where Noodle operates.
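A hedged formalisation of the same point (the symbols x_d, x_t, y and the choice of a KL divergence are illustrative, not established notation): treat the documented context as observed and the relational context plus practitioner judgment as an unobserved conditioning variable.

```latex
% x_d : documented context (what the model conditions on)
% x_t : relational context + practitioner judgment (what it never sees)
% y   : the decision or outcome
\begin{aligned}
\text{What AI optimises for:} \quad & P(y \mid x_d) = \sum_{x_t} P(y \mid x_d, x_t)\, P(x_t \mid x_d) \\
\text{What decisions require:} \quad & P(y \mid x_d, x_t) \\
\text{One measure of the gap:} \quad & \Delta(x_d, x_t) = D_{\mathrm{KL}}\big( P(y \mid x_d, x_t) \;\|\; P(y \mid x_d) \big)
\end{aligned}
```

The first line is just the law of total probability: the model's answer is an average over the tacit contexts it cannot observe, which is exactly why it can be statistically valid and situationally wrong.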
This is the most commonly misunderstood aspect of AI deployment at scale, and it is central to why Noodle's approach generates durable competitive advantage.
A language model or predictive system produces outputs that are statistically likely given its training distribution. It cannot be otherwise: the model has never encountered this patient, this regulator, this student's emotional state today. Every output is, fundamentally, an estimate — a probability distribution over possible correct responses, collapsed into a single answer.
This is not a flaw. It is an architectural property. The question is not "how do we make AI certain?" The question is "how do we anchor probabilistic outputs to deterministic real-world requirements?"
The Human-in-the-Loop layer does not eliminate probabilistic reasoning. It converts probabilistic AI outputs into decisions that meet the deterministic standards that outcomes actually require.
| Probabilistic AI Output | Deterministic Requirement |
|---|---|
| The most likely treatment for this presentation is X | This patient will not comply with X; prescribe Y |
| This filing most likely satisfies regulation Z | This regulator requires section 4.2 explicitly cited — include it |
| This student's performance suggests focus on algebra | This student shuts down under timed pressure — adjust assessment format |
The human expert does not override the AI. They contextualise it — converting a statistically valid answer into a situationally correct one. Over time, through our progressive encoding architecture, those contextual corrections are systematically fed back into the system, shifting the model's output distribution toward the deterministic requirements of the domain.
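As a minimal sketch of that contextualisation step (the types and function below are illustrative assumptions, not a shipped interface): the probabilistic ranking is preserved, expert-supplied deterministic requirements hold veto power, and every veto becomes data.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Recommendation:
    answer: str
    probability: float  # the model's confidence: a statistical estimate, not a guarantee

# A deterministic requirement is a named predicate supplied by the domain expert,
# e.g. "patient will actually comply" or "section 4.2 is explicitly cited".
Requirement = Callable[[str], bool]

def contextualise(
    ranked: list[Recommendation],
    requirements: dict[str, Requirement],
) -> tuple[Recommendation | None, list[str]]:
    """Return the most probable answer that meets every deterministic
    requirement, plus the names of requirements that rejected earlier answers."""
    rejections: list[str] = []
    for rec in sorted(ranked, key=lambda r: r.probability, reverse=True):
        failed = [name for name, check in requirements.items() if not check(rec.answer)]
        if not failed:
            return rec, rejections
        rejections.extend(failed)
    # Nothing survives the expert's constraints: the decision escalates to the practitioner.
    return None, rejections
```

The rejections list is the raw material for progressive encoding: each entry records where a statistically valid answer failed a situational requirement.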
There are two distinct sources of uncertainty in predictive systems: aleatoric uncertainty, the irreducible noise in the process being predicted, and epistemic uncertainty, which reflects what the model has not seen and shrinks as it is exposed to the right data.
Current AI failures in high-stakes domains are predominantly epistemic, not aleatoric. The tacit knowledge that drives outcome quality simply does not exist in any training corpus. HITL refinement functions as a structured epistemic uncertainty reduction mechanism — each human correction is a labelled sample from the true posterior distribution that the model cannot access through pre-training alone.
More precisely: each HITL intervention generates a (context, AI output, expert correction, outcome) tuple. Over time, these tuples populate the model's effective training distribution with examples from the exact distributional region where its epistemic uncertainty was highest. This is active learning at the domain level.
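A minimal sketch of the data structure this implies (the field names and classes below are illustrative, not Noodle's schema):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class HITLRecord:
    """One (context, AI output, expert correction, outcome) tuple."""
    context: dict            # documented + relational context available at decision time
    ai_output: str           # the model's first-pass recommendation
    expert_correction: str   # what the practitioner actually decided
    outcome: float           # validated real-world result, e.g. 1.0 = filing accepted
    timestamp: datetime

class DomainFeedbackBuffer:
    """Accumulates HITL records and surfaces the ones where the expert
    diverged from the model: the region of highest epistemic uncertainty."""

    def __init__(self) -> None:
        self.records: list[HITLRecord] = []

    def log(self, record: HITLRecord) -> None:
        self.records.append(record)

    def divergent_examples(self) -> list[HITLRecord]:
        # The labelled samples that no pre-training corpus contained.
        return [r for r in self.records if r.expert_correction != r.ai_output]
```

Fine-tuning, prompt updates, or guardrail rules built from divergent_examples() concentrate the new training signal exactly where the model's epistemic uncertainty was highest.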
What We Back
Noodle invests in ventures built around the Human-in-the-Loop (HITL) refinement architecture — a structured methodology for intercepting AI outputs, enriching them with contextual human judgment, and progressively encoding that judgment back into the system.
This is not a chatbot with a human reviewer. It is a compounding knowledge infrastructure.
Stage 1: Knowledge extraction. Before AI touches any workflow, we surface the implicit rules, heuristics, and exceptions of the best practitioners in the domain. We convert lived expertise into structured context that the AI can operate within — not as an afterthought, but at the foundation.
Stage 2: Expert refinement. AI generates the first pass. Domain experts reshape it — not to correct errors, but to translate a technically valid output into one that is contextually actionable. This is the difference between a plan that is correct and one that will actually work in this situation, with this person, under these conditions.
Stage 3: Progressive encoding. Every refinement feeds back into improved prompts, contextual guardrails, and domain-specific patterns. The human layer teaches the system continuously. Over time, the cost of intervention falls while the baseline quality of output rises. This is how the model scales without simply adding headcount.
Stage 4: Outcome validation. Real-world results feed back into the system. The loop closes not on user satisfaction but on actual outcomes: did the compliance filing withstand scrutiny? Did the athlete's serve percentage improve? Did the student's assessment score increase? Performance data sharpens every subsequent cycle.
The four-stage architecture maps directly onto a reinforcement learning framework: the captured context is the state, the AI's first-pass output is the action, the expert correction together with the validated real-world outcome supplies the reward signal, and progressive encoding is the policy update.
Unlike standard RLHF (Reinforcement Learning from Human Feedback), which uses generic preference data, the Noodle architecture generates reward signals that are domain-specific, outcome-validated, and cumulative. The reward model is not a proxy — it is actual domain performance data. This produces a fundamentally tighter and more reliable optimisation signal than preference-based fine-tuning alone.
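A hedged sketch of the difference in the reward signal (both functions are simplifications for exposition; production reward models are learned networks, not one-liners):

```python
def rlhf_style_reward(preference_score: float) -> float:
    """Generic RLHF: the reward is a proxy, a learned score predicting
    which of two outputs a human rater would prefer."""
    return preference_score

def outcome_validated_reward(
    real_world_outcome: float,    # e.g. 1.0 if the filing withstood scrutiny, 0.0 if not
    expert_had_to_correct: bool,  # True if the practitioner diverged from the AI's first pass
    correction_penalty: float = 0.25,
) -> float:
    """Domain-grounded reward: actual performance data, lightly discounted
    when the expert had to rewrite the model's output to get there."""
    return real_world_outcome - (correction_penalty if expert_had_to_correct else 0.0)
```

The second signal is slower to collect, but it is not a proxy: it is the quantity the venture is ultimately paid to improve.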
This principle is counterintuitive, and it is one of the most important lessons from AI deployment at scale.
AI systems optimise for correctness: the answer that is most consistent with documented evidence, most statistically probable given training data, most aligned with published best practice. This is exactly what they should do, and exactly why they fail in deployment.
Real-world outcomes are not determined by correct answers. They are determined by answers that work — given who is in the room, what constraints are active, what relationships are at stake, and what history precedes this moment.
A clinically correct treatment recommendation that a patient will not follow is inferior to a slightly less optimal protocol they will actually adopt.
A technically sound legal filing that uses language a particular regulator finds adversarial will underperform a pragmatically worded one.
A pedagogically optimal learning sequence that a particular student experiences as overwhelming will produce worse outcomes than a gentler approach that maintains engagement.
The practitioner who knows the difference between these options is not overriding the AI. They are doing the work the AI cannot do: mapping a correct answer onto the real-world conditions that determine whether it succeeds.
The progressive encoding mechanism captures this pragmatic intelligence systematically. When a compliance expert adjusts an AI filing to match a specific regulator's preferences — and that filing consistently performs — the system learns the preference. When a coach modifies an AI training plan because a particular athlete needs confidence more than load this week — and the match results improve — the system learns the contextual priority weighting.
Over time, the gap between "correct" and "effective" narrows. Not because the AI becomes more correct, but because it becomes more contextually calibrated. This is a qualitatively different kind of improvement — and one that is specific to the domain, the relationships, and the history that Noodle has built. It cannot be replicated by deploying a better foundation model.
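A sketch of how such a correction could graduate into an encoded rule once it has enough supporting evidence (the thresholds and record shape are illustrative assumptions, not actual encoding criteria):

```python
from collections import defaultdict

def promote_corrections_to_rules(
    records: list[dict],
    min_occurrences: int = 5,
    min_success_rate: float = 0.9,
) -> list[str]:
    """Turn repeated, consistently successful expert corrections into explicit
    guardrails for future AI output.

    Each record is expected to look like:
    {"context_key": "regulator_X", "correction": "cite section 4.2", "outcome": 1.0}
    """
    grouped: defaultdict[tuple[str, str], list[float]] = defaultdict(list)
    for r in records:
        grouped[(r["context_key"], r["correction"])].append(r["outcome"])

    rules = []
    for (context_key, correction), outcomes in grouped.items():
        if len(outcomes) >= min_occurrences and sum(outcomes) / len(outcomes) >= min_success_rate:
            rules.append(f"When context is '{context_key}', apply: {correction}")
    return rules
```

The thresholds are not the point; the mechanism is: a correction only graduates from one expert's judgment to encoded domain knowledge after it has repeatedly survived contact with real outcomes.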
The correct-vs-pragmatic failure is a loss function misalignment problem that is structurally resistant to standard ML solutions.
Training loss minimises error against the training distribution. But "pragmatic effectiveness" is not in the training distribution — it is an emergent property of a specific deployment context, specific human relationships, and specific historical sequences of interactions. It cannot be labelled in advance.
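One hedged way to state the misalignment formally (the notation is illustrative): the training objective averages a label-matching loss over the training distribution, while deployed effectiveness depends on a context variable c that never appears in that distribution.

```latex
% Training objective: match labels over the logged training distribution
\mathcal{L}_{\mathrm{train}}(\theta) = \mathbb{E}_{(x,\,y)\sim\mathcal{D}_{\mathrm{train}}}\big[\,\ell\big(f_\theta(x),\, y\big)\,\big]

% Deployed objective: real-world effectiveness, which also depends on the
% unlogged context c (relationships, constraints, history)
\mathcal{J}_{\mathrm{deploy}}(\theta) = \mathbb{E}_{(x,\,c)\sim\mathcal{D}_{\mathrm{deploy}}}\big[\,\mathrm{outcome}\big(f_\theta(x),\, c\big)\,\big]
```

Minimising the first quantity does not, by itself, improve the second in the region where c matters, because c is never observed during training.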
This is why scaling model size, adding more training data, or applying RLHF on generic preference data does not close the gap. The signal that would close it — "did this specific output work for this specific person in this specific context?" — only exists in deployed feedback loops.
The Noodle architecture is designed specifically to capture, structure, and re-inject that signal. It is a solution to a problem that foundation model providers are not positioned to solve.
Why the Breadth Is the Point
The explicit–tacit gap appears wherever AI is deployed. The methodology for bridging it is transferable.
Competitive Position
The HITL architecture is transferable across sectors. The tacit knowledge it captures in each sector is not. A maritime compliance knowledge base built through our layer cannot be replicated by a competitor entering the space — it is built from years of expert refinement and is specific to the relationships, jurisdictions, and edge cases we have encountered. The methodology is our engine. The accumulated knowledge is our moat.
As our human experts refine AI outputs, they systematically encode their judgment into the system. Over time, the AI requires less human intervention to reach the same quality threshold. Cost of delivery falls while output quality rises. Competitors who rely on raw AI output never achieve this. Competitors who rely on pure human delivery cannot scale it. We sit at the intersection.
The longer Noodle's architecture operates in a domain — with a practitioner, a client, an institution — the more tacit knowledge has been extracted and encoded. Replacing us means losing that institutional and individual memory. The AI stack built on top of our layer becomes dependent on the context layer we have built. That context does not transfer to a competitor.
The current AI deployment cycle is producing a large and growing class of disappointments: capable systems, poor outcomes, frustrated users. The failure mode is almost always the same — the explicit–tacit gap.
The ventures that solve this problem systematically, across domains, with a methodology that scales and compounds, will capture disproportionate value in the next phase of AI adoption. Noodle Investments is building and backing exactly those ventures.
We are not betting on a better model. We are betting on the layer that makes any model actually work — in the real world, for real people, in conditions that no training dataset fully anticipated.
This appendix formalises the investment thesis for technical audiences, mapping the core concepts onto established machine learning, epistemology, and systems design frameworks.
For further discussion of the technical architecture, data infrastructure, or domain-specific deployment models, contact the Noodle investment team.