If humans can’t tell what’s real anymore, how can we expect autonomous agents to?
The AI industry is flooded with data quality tools promising to reduce hallucinations and improve model performance. But we’re treating symptoms, not the root cause. People are rapidly realizing that security and trust around the Model Context Protocol (MCP) are not merely inadequate but a deep threat, especially for autonomous agentic systems.
Hans Granqvist recently drew a clever analogy: “Steel produced before 1945 is highly sought after because it is not contaminated by nuclear fallout. Likewise, any data produced before 2020 will be highly sought after because it is provably not contaminated by LLM regurgitation.”
This hits at something urgent. We’re rapidly approaching a world where “pre-AI” content becomes as valuable as low-background steel—and for the same reason: provable authenticity.
I’ve been working on content authenticity and digital integrity challenges for years, and the window for action is closing fast. When leaders like Elon Musk openly discuss having Grok “rewrite history,” we’re not just talking about data quality anymore. We’re talking about the preservation of truth itself.
The real problem isn’t data quality—it’s data authenticity.
Large language models often need information that was not available when they were trained. Retrieval-Augmented Generation (RAG) systems emerged to address this shortcoming: instead of relying solely on what a model learned during training, they pull fresh information from databases, documents, and web sources into a “context” that the LLM then uses as input to its reasoning. With the emergence of RAG, attention has shifted from purely training large language models to augmenting them with external context in real time. This makes them more current and potentially more accurate, but also more vulnerable to contamination by fabricated information.
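The retrieval step is simpler than it sounds. A minimal sketch, using a toy in-memory corpus and naive keyword-overlap scoring (real systems use embedding similarity over a vector store; all names here are illustrative):

```python
# Toy RAG retrieval: score documents against a query, take the top-k,
# and assemble them into a context block for the LLM.

def score(query: str, doc: str) -> int:
    # Count shared lowercase words between query and document.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Return the k highest-scoring documents for the query.
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

def build_context(query: str, corpus: list[str]) -> str:
    # Join retrieved passages into the context the model receives
    # alongside the user's question.
    return "\n\n".join(retrieve(query, corpus))

corpus = [
    "The 2024 report shows revenue grew 12 percent year over year.",
    "Our office dog is named Biscuit.",
    "Revenue growth in 2024 was driven by the enterprise segment.",
]
context = build_context("What drove revenue growth in 2024?", corpus)
```

Notice that nothing in this loop checks *where* the corpus came from or whether it has been altered, which is exactly the vulnerability at issue.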
We’re already deploying AI agents to make high-stakes decisions across industries. But what happens when the training data, the context, and even the historical records these systems reference become indistinguishable from AI-generated content?
We’re seeing promising efforts from the C2PA (Coalition for Content Provenance and Authenticity) and the Content Authenticity Initiative (CAI), which work well for combating deepfakes and authenticating binary content. Cloudflare’s recent announcement of C2PA support and bot paywalls hints at something bigger: authenticity and integrity have the potential to be built into the network itself, enforced by infrastructure. But these efforts don’t address the structured, interlinked knowledge graph applications where most AI systems actually operate.
I’ve been exploring what I call authentic context—content and metadata that’s cryptographically bound, policy-verifiable, and tamper-evident. Context that doesn’t just claim accuracy but proves its own integrity and provenance.
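To make “cryptographically bound and tamper-evident” concrete, here is a minimal sketch that binds content and metadata with a keyed digest so any modification is detectable. This is a toy using only the standard library; a production design would use asymmetric signatures (e.g. Ed25519) and a provenance standard such as C2PA, and the key, field names, and record shape here are all illustrative assumptions.

```python
import hashlib
import hmac
import json

SECRET = b"demo-key"  # placeholder; real systems use managed signing keys

def seal(content: str, metadata: dict) -> dict:
    # Canonicalize the record (sorted keys), then bind content and
    # metadata together with an HMAC-SHA256 tag.
    record = {"content": content, "metadata": metadata}
    payload = json.dumps(record, sort_keys=True).encode()
    record["tag"] = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return record

def verify(record: dict) -> bool:
    # Recompute the digest over content + metadata and compare in
    # constant time; changing either field changes the tag.
    body = {k: record[k] for k in ("content", "metadata")}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["tag"])

doc = seal("Q2 revenue grew 12%.", {"source": "10-Q filing", "date": "2020-08-01"})
assert verify(doc)                      # intact record verifies
doc["content"] = "Q2 revenue grew 50%."
assert not verify(doc)                  # tampering is detected
```

An agent consuming context sealed this way can refuse, or down-weight, any record whose tag fails to verify, which is the data-level trust decision the rest of this piece argues for.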
We need to implement digital integrity solutions now, before there’s nothing left but AI slop.
Ultimately, digital integrity might become table stakes—or the only ticket to entry onto the internet. The future of AI isn’t just about better algorithms—it’s about preserving the ability to distinguish authentic information from synthetic noise. This becomes critical as we build more sophisticated RAG systems, adopt MCP, and deploy complex agentic workflows that depend on trusted external data sources.
Authentic context offers a way to bake integrity into agentic systems so they can reason about trustworthiness at the data level—making autonomous decisions based on provable truth rather than plausible fiction.