1. The Response Taking Shape
In March 2026, an autonomous agent breached McKinsey’s internal AI platform in two hours. No credentials, no insider knowledge, access to millions of internal messages.[15] The governance gap that made this possible is structural. AI agents operate across company systems with broad permissions and no runtime oversight. In The New Frontier in Identity Security: AI Agent Access, we mapped why this is inevitable. Agents inherit human credentials, receive standing permissions at setup and accumulate tool connections over time. The governance gap widens with every tool an agent can reach. The responses are here. The layer that governs them is not.
Five Responses, One Missing Layer
Model Context Protocol (MCP) gives agents a universal way to reach tools, but the protocol has no opinion on who may use them or under what conditions. Five responses are now trying to answer that question, spanning standards bodies, protocol authors and identity vendors.
A2A, the Agent-to-Agent protocol (Google → Linux Foundation). Google describes A2A as “designed to support enterprise-grade authentication and authorization, with parity to OpenAPI’s authentication schemes.”[2] In practice, A2A lets one agent delegate a task to another. The requesting agent reads the target’s Agent Card, a JSON file listing what the target can do. It then sends the work as a structured task object. The target processes the task and updates its state as it goes: submitted on arrival, working during processing, auth-required when an action needs permission. At this point execution pauses. Because A2A specifies no authorizer, the protocol marks the pause without defining who should answer it.
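The pause can be sketched as a tiny state machine. The state names come from the A2A lifecycle described above; the class shape and field names are illustrative, not the protocol's wire format.

```python
from enum import Enum

class TaskState(Enum):
    SUBMITTED = "submitted"          # task arrived at the target agent
    WORKING = "working"              # target is processing
    AUTH_REQUIRED = "auth-required"  # an action needs permission; execution pauses
    COMPLETED = "completed"

class Task:
    """Minimal A2A-style task object (shape is illustrative)."""
    def __init__(self, task_id, payload):
        self.id = task_id
        self.payload = payload
        self.state = TaskState.SUBMITTED

    def start(self):
        self.state = TaskState.WORKING

    def require_auth(self):
        # A2A marks the pause; it does not say who resolves it.
        self.state = TaskState.AUTH_REQUIRED

task = Task("t-1", {"skill": "summarize"})
task.start()
task.require_auth()  # the protocol stops here, with no designated authorizer
```

Note that nothing in the model names an authorizer: `auth-required` is a signal, and every system that consumes it must invent its own answer.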
IETF Agent Auth (AWS, Zscaler, Ping Identity). A resource server is the app or service an agent is trying to reach. This draft defines how an agent identifies itself to that server when acting on a user’s behalf. Standard OAuth issues a token that names only the user. This draft binds two identities into the same token: the agent goes into the client_id claim and the delegating user into the sub claim. A resource server reads both and authorizes the request against the pair, not the user alone. To do this, the draft composes existing standards rather than inventing new ones. Its authors acknowledge the limit: “additional specification or design work may be needed to define how out-of-band interactions with the User occur at different stages of execution.”[3] The framework binds the agent’s identity at token issuance but defers the runtime authorization question to future specifications.
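In decoded form, the dual-identity token the draft describes might look like the claims below: the agent in `client_id`, the delegating user in `sub`, and a resource server that authorizes the pair rather than the user alone. Claim values and the check itself are invented for illustration.

```python
# Decoded access-token claims in the dual-identity shape (values are invented).
claims = {
    "iss": "https://auth.example.com",
    "sub": "user:alice",              # the delegating user
    "client_id": "agent:report-bot",  # the agent acting on her behalf
    "scope": "crm.read",
}

def authorize(token_claims, required_scope, allowed_pairs):
    """Authorize against the (user, agent) pair, not the user alone."""
    pair = (token_claims["sub"], token_claims["client_id"])
    return pair in allowed_pairs and required_scope in token_claims["scope"].split()

allowed = {("user:alice", "agent:report-bot")}
authorize(claims, "crm.read", allowed)  # permitted for this (user, agent) pair
```

A different agent presenting a token for the same user would be refused, which is exactly the granularity plain OAuth lacks. The runtime question — what the agent does after the token is issued — stays out of scope, as the draft's authors note.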
AAuth (IETF Draft). AAuth extends OAuth 2.1 with a reason parameter. The specification defines it as a “concise, human-readable explanation provided by the agent.”[4] The authorization server must echo this reason verbatim and display it to the user. It does not verify that the reason is accurate, proportionate or consistent with policy. Any claim passes. Once the token is issued, AAuth has no further role. It grants access once, at the start of a session. What the agent does with that access is not governed by the framework. AAuth addresses transparency at first access. It does not address control during execution.
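A sketch of the mechanism, under stated assumptions: the request parameters follow standard OAuth plus the draft's `reason`, and the consent prompt is a hypothetical rendering. The point the code makes is that the reason is echoed verbatim, with no verification step anywhere.

```python
# Hypothetical AAuth-style authorization request: ordinary OAuth parameters
# plus the draft's `reason`. The authorization server never checks it.
request = {
    "response_type": "code",
    "client_id": "agent:report-bot",
    "scope": "mail.send",
    "reason": "Sending your weekly summary",  # any claim passes
}

def consent_prompt(req):
    # The server must display the reason exactly as the agent supplied it.
    return f"{req['client_id']} requests {req['scope']}: \"{req['reason']}\""

consent_prompt(request)  # the agent's own words, unverified, shown to the user
```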
SCIM Agent Extension (Okta). SCIM is an existing standard for provisioning and managing user accounts across enterprise systems. Okta’s IETF draft extends it with two new resource types, Agent and AgenticApplication, that register AI agents as directory objects alongside human users. An agent with a SCIM entry has an official identity in the system: a name, an ID, a lifecycle that can be managed and revoked. The extension is designed as complementary infrastructure, “intended to provide greater interoperability… while reducing the responsibilities assumed by… new protocols for agents.”[5] It manages whether an agent exists. The framework does not address what the agent does at runtime.
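What a directory entry under the new `Agent` resource type could look like, sketched as a plain dict. The schema URN and attribute names beyond SCIM core are assumptions, not the draft's actual schema; the point is the lifecycle, not the fields.

```python
# Illustrative SCIM entry for an agent. The URN and extra attributes are
# assumptions; only the lifecycle idea (provision, manage, revoke) is the point.
agent_entry = {
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Agent"],  # URN assumed
    "id": "agent-7",
    "displayName": "report-bot",
    "active": True,           # lifecycle flag: exists and is manageable
    "owner": "user:alice",    # hypothetical attribute linking agent to a human
}

def deprovision(entry):
    """SCIM governs whether the agent exists; nothing here governs runtime."""
    entry["active"] = False
    return entry
```

Deactivating the entry removes the agent from the directory's point of view, but as the text notes, it says nothing about what the agent was doing while active.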
OWASP ANS. OWASP describes the Agent Name Service as “a framework for secure discovery and registration of AI agents in multi-agent systems.”[6] ANS functions like DNS for agents: an agent registers by publishing an Agent Card that contains its name, capabilities and endpoint. Other systems query the registry to confirm the agent exists and is who it claims to be. The framework answers the question of identity. It does not answer the question of authorization. A verified agent is not a permitted agent.
All five efforts solve questions that come before an agent acts: is it real, can it prove its identity, is it registered. These are prerequisites for execution, not constraints on it. Once resolved, the agent holds a valid identity, proven credentials and a place in the directory. Nothing in that stack evaluates what happens next.
The missing question is runtime governance: should this agent perform this specific action right now, given the data it has already accessed and the tools it has already called? “Who” and “how” have answers. “Should” does not.
2. Four Architectures for Enforcement
The agent is about to act. It has a name, a token and a place in the registry. Someone needs to decide whether this action should go through. That decision has to happen somewhere between the agent and the tools it connects to. Where depends on what your organization controls. You might own the infrastructure the agent runs on, which gives you access to the code and the network. A cloud-hosted agent leaves you with neither. The downstream app might let you enforce rules inside it, or it might not.
These constraints produce four enforcement architectures. Each places a checkpoint at a different layer: in the network, in the agent’s code, at the operating system or inside the vendor’s app. Herman Errico’s Autonomous Action Runtime Management (AARM) specification (February 2026) maps these four approaches.[7] Here is what each one does, where it sits and what it trades away:
1. Protocol Gateway
A protocol gateway places the enforcement boundary in the network, between the agent and the tools it connects to. When an agent is about to send an email, query a database or delete a file, the request passes through the gateway first. The gateway intercepts every tool call, checks it against policy and decides: allow, deny, escalate to a human or wait for more context. Because the gateway speaks the same protocol the agent already uses (MCP, A2A or any standard protocol), nothing changes on either side. The agent does not know the gateway is there. The downstream app does not know either.
The gateway remembers everything the agent has done in a session: which tools it called, what data it accessed, what parameters it used and what the tools returned. Each new action is evaluated not in isolation but against the entire chain. An agent with permission to query a customer database and permission to send emails can do both individually. But if it reads customer records and then sends an email to an external address, a protocol gateway recognizes the composition attack: data exfiltration. The composition violates policy in a way that neither action reveals on its own.
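The composition check can be shown in miniature. This toy gateway (all names and the single rule are invented) judges each call against the session chain, so two individually permitted actions combine into a denial.

```python
class Gateway:
    """Toy protocol gateway: each call is judged against the whole session chain."""
    def __init__(self):
        self.chain = []  # (tool, params) history for this session

    def evaluate(self, tool, params):
        # Individually allowed actions can compose into exfiltration:
        # a customer-data read earlier plus an external send now.
        read_customers = any(t == "crm.query" for t, _ in self.chain)
        external_send = (tool == "email.send"
                         and not params["to"].endswith("@corp.example"))
        decision = "deny" if (read_customers and external_send) else "allow"
        self.chain.append((tool, params))
        return decision

gw = Gateway()
gw.evaluate("crm.query", {"table": "customers"})           # allow: a plain read
gw.evaluate("email.send", {"to": "out@attacker.example"})  # deny: composition
```

Run either action alone and it passes; run them in sequence and the second is refused. That chain-aware judgment is what per-call evaluation without history cannot give you.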
The tradeoff is visibility at the boundary. The gateway sees every action that passes through the protocol. It cannot see the agent’s internal reasoning, its memory or actions that bypass the protocol (local file reads, shell commands, in-memory operations). It controls the border between the agent and its tools, not what happens inside the agent.
2. SDK (Software Development Kit) Instrumentation
SDK Instrumentation places governance checkpoints inside the agent’s own runtime. Before the agent reads a customer record, a hook (an extension point the platform exposes) fires inside the runtime. The SDK checks it against policy and records what happened. Because it operates inside the agent, it has the deepest visibility of any architecture. It sees the original user request, the reasoning chain, prior actions in the session and what data the agent has already accessed.
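A hook-based check might look like the sketch below; the hook name, policy function and context fields are all assumptions. Because the check runs inside the agent, it sees session context no gateway can: what data the agent has already touched.

```python
# Sketch of SDK instrumentation: a before-tool-call hook fired inside the
# agent runtime. Hook and field names are illustrative, not any real SDK's API.
def policy_check(tool, args, context):
    # The hook sees full in-process context: prior actions, data touched.
    if context["data_accessed"] and tool == "email.send":
        return "escalate"
    return "allow"

class InstrumentedAgent:
    def __init__(self):
        self.context = {"data_accessed": False, "actions": []}

    def call_tool(self, tool, args):
        decision = policy_check(tool, args, self.context)  # the hook fires here
        self.context["actions"].append((tool, decision))
        if decision != "allow":
            return decision
        if tool.endswith(".read"):
            self.context["data_accessed"] = True
        return "allow"
```

The bypass risk is visible in the structure itself: nothing forces the agent through `call_tool`. Compromised code can invoke a tool directly and the hook never fires.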
The limitation: every agent framework needs its own integration. There is no universal SDK. If you run five different agent frameworks, you need five separate integrations. And because the SDK runs inside the same program as the agent, it can be bypassed. A compromised agent can call tools directly without passing through the hooks. A protocol gateway, by contrast, runs outside the agent as a separate service, and the agent has no path to its tools except through it. The SDK is only as trustworthy as the agent it governs.
3. Kernel-Level Monitoring
The third architecture operates below the application, at the operating system itself. Every action an agent takes on a machine, whether it opens a network connection, writes a file or starts a process, passes through the kernel. Monitoring at this level means intercepting those system calls before they execute. The kernel can allow the call, block it or log it for review. Tools like Falco and Sysdig already do this for general security. The same approach applies to agent governance.
The tradeoff is visibility without context. The kernel knows the agent opened a network connection. It does not know the agent was trying to delete a production database, or whether the actor who triggered it had permission to do so. Kernel-level monitoring catches forbidden actions like connections to known malicious endpoints, but it cannot replace a governance layer that understands what the agent is doing and why.[7]
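The "visibility without context" limit fits in a few lines. The sketch below stands in for a kernel-level rule (endpoints and the rule are invented): it can match syscall metadata, and nothing more.

```python
# Miniature of a kernel-level rule: it matches syscall metadata, never intent.
BLOCKLIST = {"203.0.113.9"}  # hypothetical known-bad endpoint

def on_connect(pid, dst_ip):
    # The kernel knows process `pid` opened a connection to `dst_ip`.
    # It does not know the agent was trying to delete a production database,
    # or whether the actor who triggered it was allowed to ask for it.
    return "block" if dst_ip in BLOCKLIST else "allow"

on_connect(101, "203.0.113.9")   # blocked: endpoint is on the list
on_connect(102, "198.51.100.7")  # allowed, whatever the agent meant by it
```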
4. Vendor App Integration
The fourth approach pushes enforcement into the downstream apps themselves. Each tool vendor (e.g., GitHub, Slack, Salesforce) implements its own governance hooks. Before the agent reaches any app, the identity provider (IdP) authenticates it and issues a scoped token. What happens after that depends entirely on what the tool vendor chose to enforce. Its appeal is independence from the agent side of the chain. The approach works whether the agent runs on a cloud platform you do not control (such as ChatGPT, Claude.ai, Microsoft Copilot), inside a self-hosted runtime or anywhere in between. It does not require the agent platform to expose any instrumentation hooks. The cost is that enforcement only exists where vendors have built it.
Once the agent holds that token, the IdP has no visibility into what happens next. Okta and CyberArk are building runtime controls for this layer, but both remain limited: Okta’s Agent Relay is in early access, not general availability,[9] and CyberArk’s AI Agent Gateway works only within its own identity stack.[10] Both depend on every downstream tool vendor cooperating. Getting hundreds of apps to implement governance hooks takes years, not months. And even when apps cooperate, neither approach gives you per-action visibility across tools. No tool-vendor approach today offers per-action governance that spans multiple applications.
3. Cakewalk’s Path from Gateway to Governance
Each architecture makes a tradeoff. Cakewalk chose the protocol gateway. Every tool call routes through it, whether the agent runs self-hosted or on a cloud platform. Governance must operate at machine speed, or it becomes the bottleneck that defeats the purpose of agent delegation. The gateway evaluates each action against policy deterministically, using rule matching rather than inference, so every decision is auditable and reproducible before the action executes.
But a gateway alone is not governance. Four things separate Cakewalk from every other approach:
- The gateway knows who your user is and what they are allowed to delegate.
- Your agent’s context grows dynamically as the task unfolds rather than being fixed at setup.
- Every tool call is evaluated against policy in real time.
- Every decision is captured in a structured audit trail.
User Context in the Gateway
Other gateways see tool calls. They evaluate the action but not who the agent is acting for. Cakewalk’s gateway evaluates three inputs on every call: the action (Read, Write, Destructive or External), the user behind the agent (department, seniority and role, pulled automatically from HR systems) and the target app (risk level, data classification and category).
That context changes the outcome. An agent reading internal documentation for an engineering lead is routine. The same action for an external contractor requires escalation. Identical tool call, different governance decision, because the user context is different. This extends beyond your own org. When a partner’s contractual restrictions apply to your data, the gateway needs to know who is asking, not just what is being asked.
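The three-input evaluation can be sketched as a single function. The rule, field names and user records below are illustrative, not Cakewalk's actual policy model; the point is that an identical call diverges on user context alone.

```python
# Sketch of the three-input evaluation: action, user, target app.
# The rule and all field names are illustrative.
def evaluate(action, user, app):
    if action == "Destructive":
        return "deny"
    if action == "External":
        return "escalate"
    if user["employment"] == "contractor" and app["classification"] == "internal":
        return "escalate"  # routine for an employee, escalated for a contractor
    return "approve"

lead = {"role": "engineering_lead", "employment": "employee"}
contractor = {"role": "engineer", "employment": "contractor"}
docs = {"classification": "internal", "risk": "low"}

evaluate("Read", lead, docs)        # approve
evaluate("Read", contractor, docs)  # escalate: same call, different user
```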
This is not a feature added on top of the gateway. It requires an identity governance platform underneath: users, roles, departments, app risk classifications, permission models, approval workflows and audit infrastructure. Cakewalk already operates that platform. A gateway without an identity platform underneath would need to build it.
Dynamic Agent Context
Most gateways give an agent a fixed set of tools at setup time. The agent’s knowledge boundary is decided before the task begins and does not move. This is static agent context. It is the default model in MCP gateways, agent SDKs and vendor app integrations. If the task needs a tool the agent was not configured for, the task either fails or completes with degraded output.
Cakewalk inverts this. Your agent starts each task with no context at all. As the task progresses, every approved access request expands the agent’s working context by one tool. Cakewalk calls this Dynamic Agent Context.
Two boundaries hold the model together. The outer boundary is the total information surface of your company: every app, every dataset, every tool your org runs. It is the ceiling for what any agent could possibly reach. The inner boundary is your agent’s Dynamic Agent Context: what it currently knows about and can act on. It starts at zero on every task, grows one tool at a time and collapses back to zero when the task ends. The outer boundary does not move.
Each expansion follows a different path depending on how far the tool is from the agent’s current reach. A tool the agent already holds credentials for requires only a policy check. If the user can access a tool but the agent cannot yet, the user authenticates. A tool the user does not have access to triggers a full request through the company’s approval chain. On approval, Cakewalk’s provisioning agent handles the rest: account created, permissions assigned, agent connected. Just-in-time provisioning at runtime, not a ticket in an IT queue.
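The three paths reduce to a dispatch on how far the tool is from the agent's current reach. Names below are illustrative placeholders, not real integrations.

```python
# The three expansion paths, sketched as a dispatch. Tool names are placeholders.
def expansion_path(tool, agent_creds, user_access):
    if tool in agent_creds:
        return "policy-check"        # agent already holds credentials
    if tool in user_access:
        return "user-authenticates"  # user has it, the agent does not yet
    return "approval-chain"          # neither does: full JIT provisioning

agent_creds = {"notion"}
user_access = {"notion", "hubspot"}

expansion_path("notion", agent_creds, user_access)      # "policy-check"
expansion_path("hubspot", agent_creds, user_access)     # "user-authenticates"
expansion_path("salesforce", agent_creds, user_access)  # "approval-chain"
```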
Inside Every Tool Call
What decides each expansion? A product manager asks their agent to analyze customer churn. The first time the agent reaches for a CRM tool, the call passes through the gateway. The gateway evaluates the call against the user, the action and the target app. The engine returns one of three outcomes. Escalate if the data is classified as sensitive. Deny if it violates policy. Approve if the user has CRM access and the action is a read.
Escalations trigger Suspend-and-Resume. The gateway holds the connection while a different human reviews the request: a manager, a security admin, an app owner. Not the user who initiated the task. To the agent, this looks like a slow tool call. Denials return a structured response so the agent can adapt or surface the reason to the user. Every decision is captured in the Decision Trace.
If approved, the gateway reads the credential from your vault and injects it into the outbound call. The agent does not see real credentials, only a temporary reference that expires with the task. The agent’s Dynamic Agent Context has grown by one tool. When the task ends, the inner boundary collapses to zero, while the outer boundary stays unchanged.
Decision Traces, Not Log Files
Every tool call produces a Decision Trace: a structured, immutable record of who delegated, which policy fired, what inputs matched, who approved, what executed and when access ended. One call, one trace.
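A trace of that shape might be sketched as an immutable record; the field names mirror the list above but are assumptions about the actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # immutable: one tool call, one trace, never edited
class DecisionTrace:
    delegating_user: str   # who delegated
    policy_id: str         # which policy fired
    matched_inputs: tuple  # what inputs matched
    approver: str          # who (or what) approved
    executed_action: str   # what executed
    access_ended_at: str   # when access ended

trace = DecisionTrace(
    delegating_user="user:alice",
    policy_id="pol-crm-read",
    matched_inputs=("Read", "engineering_lead", "crm"),
    approver="policy:auto",
    executed_action="crm.query",
    access_ended_at="2026-03-01T12:00:05Z",
)
# "Who approved this access?" is a field lookup, not a grep across log files.
```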
A marketing manager asks their agent to pull campaign performance from HubSpot. The gateway intercepts the call, evaluates it against the manager’s role and HubSpot’s risk classification, then approves. The gateway injects the credential from the vault and the agent pulls the data. The trace lands before the call leaves.
The point is queryability. Ask “who approved this agent’s access to customer data?” and the answer is a name, a policy and a timestamp. Ask “who owns this agent?” and the answer traces to the delegating user, their department and their current employment status. Log files would require parsing free text across systems. Because the trace is tied to identity, lifecycle changes propagate automatically. When an employee in your org is offboarded, their agents lose access the moment they do. Role changes adjust agent access without manual intervention.
Trust Through Architecture
Every tool call, every credential exchange and every policy decision passes through the gateway. A component with that much control over your security posture needs to earn trust through architecture, not promises.
The gateway is stateless. It reads credentials from your vault at the moment of each tool call and discards them when the call completes. Agents never see real tokens. Nothing is cached, nothing is stored and nothing persists beyond the action.
Governance and Autonomy
Our previous article, The New Frontier in Identity Security: AI Agent Access, laid out the bind every security team faces. Block agents outright or allow them without controls. Companies are not stuck because they distrust agents. They are stuck because no governance layer lets them deploy agents at full speed. The first option destroys the productivity gain agents promise. The second opens risk that compounds with every tool they reach.
The third path exists. Policy-driven governance that lets agents operate autonomously with real-time evaluation at every action. Low-risk actions execute at machine speed. Sensitive actions escalate to the right human. External actions wait for the human in the loop. Destructive actions are denied outright. Access is provisioned when needed and revoked when the task completes. Every decision traces back to a human.
Governance and autonomy. Not governance or autonomy.