Think back to the arrival of ChatGPT: there was a lot of worry about LLMs interacting with systems, from the user’s filesystem to the open Internet. Now we’re all pretty comfortable - sure, there are scams and the occasional company losing its files, but the blast radius from these incidents seems contained. This illusion of comfort will fall away as we move to an agentic Internet. With LLMs communicating and transacting among themselves, the potential blast radius becomes exponentially larger. Once they venture forth from our laptops and closed networks, agents will be the targets of continuous attack. Yet agents need to explore this larger world while remaining subject to standard security constraints.

This evolution will require more than incremental changes to our tools. extensible-mcp is an MCP proxy designed for that shift. It rests on four key properties, three of which it delivers today:

  • Dynamic discovery and loading of tools and services. An agent’s purview cannot be limited to a startup config file when it cannot know in advance what resources (services, tools, other agents) an open-ended task will require.
  • RAG-based tool discovery across these loaded resources. Since we cannot know in advance which of an MCP Server’s tools will be needed, only the tools the LLM actually requests should be loaded into its context.
  • Security concerns addressed at every point. The security infrastructure must sit outside the LLM’s control boundary; because LLMs are under attack, we cannot trust them to manage their own security. Every external communication is a potential attack vector, and then there are hallucinations.
  • Cryptographically secure, structured, verified agreements among principals that the LLM cannot tamper with, and whose meaning is not left to the whims of agent interpretation after an A2A negotiation.

I arrived here from a simple problem - how to reduce the cost MCP Server tools impose on the context window. The result was extensible-mcp, an MCP Server proxy for the agentic age. My first decision was to move server and tool descriptions into a vector database so the LLM could query for the tools it needed, and only those it required would show up in its context. Given this, it was a small step to allow dynamic server loading; certainly in the coming agentic age, an LLM will need to dynamically find and load resources. Placing tool descriptions in a vector database allows an LLM to load a huge number of tools, even if only a few end up being necessary. This addresses the first two points above.

However, that opened up another problem - and a possible resolution. Clearly, every interaction is a potential security violation. While significant effort is being invested to limit hallucinations, and more will undoubtedly be done to reduce prompt injection attacks, neither will be eliminated. Ultimately this is hope as a strategy. Just as with humans, agents will require an independent security infrastructure. The architecture demands slots where telemetry and security issues can be dealt with. The current implementation supports injecting security code at every point in the pipeline. This code runs outside the purview of the potentially compromised LLM.
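To make the discovery mechanism concrete, here is a minimal sketch of RAG-based tool lookup. It is illustrative only, not the extensible-mcp implementation: the bag-of-words embedding stands in for a real embedding model and vector database, and the tool names are hypothetical.

```python
# Minimal sketch of RAG-based tool discovery (illustrative, not the
# extensible-mcp implementation). Tool descriptions are embedded and
# indexed; the LLM's query surfaces only the closest matches.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: bag-of-words counts. A real deployment would
    # use a sentence-embedding model and a vector database.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToolIndex:
    def __init__(self):
        self._tools = []  # (name, description, vector)

    def add(self, name: str, description: str) -> None:
        self._tools.append((name, description, embed(description)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, str]]:
        qv = embed(query)
        ranked = sorted(self._tools, key=lambda t: cosine(qv, t[2]), reverse=True)
        return [(name, desc) for name, desc, _ in ranked[:k]]

index = ToolIndex()
index.add("github_create_issue", "Create a new issue in a GitHub repository")
index.add("github_close_issue", "Close an existing GitHub issue")
index.add("fs_read_file", "Read the contents of a file from disk")

# Only the matching tool descriptions reach the LLM's context.
print(index.search("open a bug report on GitHub", k=1))
```

The point is the shape of the interaction: the full catalog lives in the index, while the LLM’s context only ever receives the handful of matches relevant to its current need.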

Consider the general scenario I have outlined and all the potential points of attack. It starts with an agent or user requesting to load an MCP Server. Perhaps the agent, or some other agent it communicated with via A2A, suffered a prompt injection attack, leading to a malicious URL and an information leak. Intercepting and evaluating that URL (for example, checking it against a blacklist) can prevent significant damage. When the LLM decides it needs a tool, it can issue a semantic query against the vector database, so it only sees tools that respond to its actual needs. At that point we can again filter or (more interestingly, as we will see) update the tools as they are stored in the vector database. In the first release, this step removes any tool whose name contains the string “delete”; it provides an example of a filter written in Python. Next, when the LLM actually sends a tool call, that too can be evaluated. In the current example, attempts to close an issue are blocked; this shows how to use Rego compiled to WASM. Finally, we can inspect or manipulate the return value, perhaps to check for malicious content.
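Rendered as Python, the four interception points might look like this. The hook names, signatures, and tool names are hypothetical, not the extensible-mcp API, and the tool-call check the repo actually expresses in Rego compiled to WASM is shown here in Python for readability.

```python
# Illustrative sketches of the four interception points described above.
from urllib.parse import urlparse

BLOCKED_HOSTS = {"evil.example.com"}  # stand-in for a real blacklist service

def on_server_load(url: str) -> bool:
    # Point 1: vet the MCP Server URL before loading it.
    return urlparse(url).hostname not in BLOCKED_HOSTS

def on_tool_ingest(tool: dict) -> bool:
    # Point 2: filter tools as they enter the vector database.
    # Mirrors the first release's example: drop anything named "delete".
    return "delete" not in tool["name"].lower()

def on_tool_call(name: str, arguments: dict) -> bool:
    # Point 3: evaluate the call itself (the repo does this in Rego/WASM).
    # The current example blocks attempts to close an issue.
    return name != "github_close_issue"

def on_tool_return(result: str) -> str:
    # Point 4: inspect the result before it reaches the LLM's context,
    # e.g. scan for injected instructions.
    if "ignore previous instructions" in result.lower():
        raise ValueError("suspicious content in tool result")
    return result

assert on_server_load("https://tools.example.org/mcp")
assert not on_tool_ingest({"name": "fs_delete_file"})
assert not on_tool_call("github_close_issue", {"issue": 42})
```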

With simple adapters, you can call out to any security apparatus rather than writing the checks in Python or Rego yourself. These are just examples.

However, while this addresses our third point, and matches the current state of the extensible-mcp repo, it doesn’t address the fourth. Despite the important infrastructure we’ve described, LLMs still cannot be trusted to speak truthfully on a principal’s behalf. Suppose the LLM calls a tool to delete a file. Because this is a destructive operation, the tool requires a confirmation from the user, say a string property such as confirmation: 'CONFIRM_DELETE'. But can we trust this? Could the LLM be hallucinating, or a victim of a prompt injection attack? Perhaps. In essence, the LLM’s claim is second-hand - basically hearsay. It’s telling the tool that the user told it to delete the file. What we need is direct, non-repudiable evidence of the user’s agreement, delivered by a path the LLM can’t influence. This is what we get with the W3C’s Verifiable Credentials.

The W3C Verifiable Credentials standard was developed with serious involvement from major payments companies (Visa, Mastercard, Fiserv), hyperscalers and platform vendors (Google, Microsoft, IBM, Intel, Adobe), governments (DHS, NIST, HM Government, India), and the digital-identity ecosystem (Evernym, Sovrin, Block, etc.). It provides a standard for wrapping claims, such as “I want to delete this file”, inside a PKI infrastructure, making it clear who made the claim. Because the claim is signed, it can be passed from party to party without losing its proof of authenticity.
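Heavily simplified, the core property looks like this. Real Verifiable Credentials use JSON-LD, DIDs, and the W3C-specified proof formats; this sketch (using the third-party cryptography package, with made-up identifiers) only shows that the signature is produced by the user’s key, outside the LLM’s control, and can be verified by the tool.

```python
# Simplified illustration of a signed claim; not the actual VC format.
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The user's key lives in a wallet the LLM cannot reach.
user_key = Ed25519PrivateKey.generate()
user_pub = user_key.public_key()

claim = {
    "issuer": "did:example:alice",  # hypothetical identifier
    "credentialSubject": {"action": "delete", "file": "/tmp/report.txt"},
}
payload = json.dumps(claim, sort_keys=True).encode()
signature = user_key.sign(payload)

# The LLM merely carries (claim, signature); it cannot forge agreement.
try:
    user_pub.verify(signature, payload)
    print("claim verified: the user really authorized the delete")
except InvalidSignature:
    print("rejected: no first-person evidence of user intent")
```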

Much of the development predates our modern adoption of LLMs, but the architecture is being reused. For example, Google, with partners, developed the Agent Payments Protocol (AP2). The goals specifically include providing a non-repudiable way for a merchant to be sure an agent’s request “accurately reflects the user’s true intent”.

Assuming the claims were signed outside of the LLM’s control, this provides the guarantees we need, but only for the specific case of agent-mediated payments. We need these guarantees wherever an LLM claims to speak for another entity; they move us from hearsay to first-person testimony. The next version of extensible-mcp will add support for Verifiable Credentials, including the formats AP2 defines.

In a Verifiable Credentials based agentic architecture, valid credentials will be needed for calling tools, and extensible-mcp can validate credentials on a tool call. However, external tools and internal security policy do not always see eye to eye: a tool’s own schema may not demand the authentication we require. extensible-mcp can therefore update a tool’s parameters to require this additional level of authentication. Finally, we can provide an out-of-band means for an LLM to request credentials from third parties to be included in later tool calls.
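A sketch of that schema rewriting, with hypothetical field names (the actual extensible-mcp mechanism may differ): the proxy augments a tool’s input schema so every call must carry a credential, even though the upstream tool never asked for one.

```python
# Hypothetical sketch: add a required "credential" parameter to a
# tool's JSON Schema before exposing it to the LLM.
def require_credential(tool_schema: dict) -> dict:
    schema = dict(tool_schema)  # shallow copy; leave the original intact
    props = dict(schema.get("properties", {}))
    props["credential"] = {
        "type": "string",
        "description": "Base64-encoded Verifiable Credential authorizing this call",
    }
    schema["properties"] = props
    schema["required"] = sorted(set(schema.get("required", [])) | {"credential"})
    return schema

delete_file = {
    "type": "object",
    "properties": {"path": {"type": "string"}},
    "required": ["path"],
}
print(require_credential(delete_file))
# The proxy would strip and verify "credential" before forwarding the call.
```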

We also argue for the signed material to be structured content with clear semantics. Anything can be signed, but in an environment where statements have real consequences, what is signed is important.

Consider the AP2 case. Before we arrive at a signable proposal, all we have is the conversational trail left by agents negotiating over A2A: textual exchanges with potentially ambiguous meaning. One agent requests information, another proposes a price point, they bargain and change conditions, all in natural language. They may arrive at something resembling an agreement, but each concerned agent may interpret it differently. All we have is ambiguity, backed by a conversation of ambiguous statements from every party.

But when this ambiguous text is rephrased as an agreement in the context of AP2, the ambiguities are squeezed out. Not just the LLMs, but the user, bank, and other concerned parties can read and understand the document. It is clear what has been agreed to.
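For illustration only - this is not the AP2 wire format, and every field name here is made up - a structured agreement pins down exactly the points a natural-language negotiation leaves vague:

```python
# Illustrative structured agreement; not the AP2 wire format.
agreement = {
    "type": "PurchaseMandate",           # agreed vocabulary, not free text
    "buyer": "did:example:alice",        # hypothetical identifiers
    "merchant": "did:example:store",
    "item": {"sku": "SKU-1234", "description": "noise-cancelling headphones"},
    "price": {"amount": "129.99", "currency": "USD"},  # not "around $130"
    "expires": "2025-07-01T00:00:00Z",   # the offer's validity window
}
# This document, not the chat transcript, is what gets signed and
# presented to the user, the bank, and the merchant.
```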

Consider how this problem multiplies as the parties to an agreement increase. Not only are there several LLMs involved, each with potentially its own interpretation and not all necessarily honest, but there are also the human principals, as well as related entities such as banks, stores, and manufacturers.

In a previous age, humans created contracts as a way to contain the ambiguity of human language, and contract law as a way to clarify the semantics. Structured documents with agreed semantics will be needed to play this role in the agentic age. “I agree to delete this document” sounds simple - but only if I actually own the document, if its existence is not a legal requirement, and so on.

This fourth point raises the bar for the effort, but AP2 stands as an example in a fairly complex domain. Other domains will need to generate their own. (We’ve discussed our preferred approach to validating semantics in Policy as Code, Policy as Type.) Developing a usable syntax and semantics is not a trivial task, but the Internet has proven itself capable of generating both ad hoc and de jure standards when necessary. The alternative is an endless array of LLM lawyers litigating at Internet speed.