AI Agents Aren’t Trustworthy (But We’re Deploying Them Anyway)
For the last couple of years, most enterprise AI has been advisory. Companies have been using language models to summarize, generate content, and help people move faster. A person still had to decide what to do with the output.
Agents are different. An AI agent can retrieve data, write code, trigger workflows, and operate across applications. Once AI starts doing things instead of just saying things, the security problem changes, because now you have to think about what the system can reach.
I think a lot of people are still focused on the wrong part of this: model accuracy, hallucinations, and output quality. Those things matter, but the more pressing question, one I do not think enough teams are asking, is what these systems can access and what happens when they get it wrong.
Key concepts
- AI agent security risk is driven by access, not model accuracy, as agents act inside enterprise systems with real permissions
- Over-permissioned AI agents increase risk by operating with broad, persistent access across multiple systems
- Effective AI agent security requires identity-based, task-scoped, and time-bound access controls
- Runtime access control and contextual identity enforcement are critical to prevent unauthorized actions and limit blast radius
AI agents vs generative AI: what changes when systems can act
Traditionally, machine learning systems classified, scored, or predicted. They could tell you whether a transaction looked suspicious, whether a customer might churn, or whether a lead should be prioritized. They were undoubtedly useful, but still contained.
Generative AI expanded that. Now the system could explain, summarize, translate, and write code. It could communicate in ways that felt much more flexible and much more “human.” But even then, at its core, the model was still predicting the next token and producing an output.
Asking a chatbot how to book a flight is passive. Asking an agent to actually book the flight means it has to review options, make choices, use personal information, interact with external systems, and potentially spend money. That crosses from generation into action, and once you give a system the ability to act, you are dealing with a participant in your enterprise systems, one that operates as an identity inside your environment.
AI agents create risk even when the model is accurate
Today's models can be very good at prediction. In many cases, they are shockingly accurate. What they are not very good at is understanding how confident they should be in that prediction.
A system can produce an answer that sounds plausible and still have no reliable sense of whether it is operating in a high or low-confidence situation. It does not consistently know when it is wrong. Most of the time, AI models work well, but sometimes they fail in unexpected ways, different from the ways that humans fail. With traditional question/answer LLMs, hallucinations are amusing; you think, "How could it make this kind of mistake?" When a hallucination happens in an agent-based system, your business can be at risk.
Agentic systems are powerful, and they are improving very quickly. But they are not dependable enough to be trusted on their own, especially when the task involves access, authority, and execution.
So when people ask whether AI agents are trustworthy, I think they are starting in the wrong place. Instead of asking “How do I make the agent trustworthy?”, the better question is “How do I constrain what the agent is allowed to do?”
Multi-agent systems expand the AI attack surface
This gets more complicated when you stop thinking about one agent and start thinking about systems of agents.
Agentic workflows are rarely linear. One agent plans, another executes, a third evaluates the results, etc. They may loop through that process several times until they get to an outcome that works. This can be incredibly powerful. It is also risky.
Why do multi-agent workflows create unintended access paths?
Once you have multiple agents working together, the system is no longer following a narrow, predetermined path. It is exploring options, iterating, and adjusting its behavior based on feedback.
The same pattern has already shown up very clearly in real-world red teaming scenarios.¹ A planning agent can map out how to approach a target, an attack agent can interact with the target system, and a judge agent can evaluate whether the attempt succeeded. Then the system can revise its approach and try again. It can keep doing that until it finds a path that works.
The important point is that the system is not reasoning about ethics or acceptable boundaries in any way. It is optimizing towards its objective. If the objective is to get information, it will keep exploring until it finds a successful path unless something outside the agent prevents it.
This is compounded by the fact that agents cannot self-govern in any meaningful way. If you tell an agent not to use certain data, that instruction exists only in the prompt. It does not become a system-level constraint. If the agent can use that data to complete the task, it may still do so, because it is optimizing for the outcome, not enforcing the absence of something.
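The difference between a prompt-level instruction and a system-level constraint can be made concrete. The sketch below is a hypothetical illustration (the `fetch_record` wrapper and `RESTRICTED_FIELDS` set are invented names, not a real product API): the restriction is enforced in the tool layer, outside the agent, so it holds even if the agent ignores or loses the instruction in its prompt.

```python
# Hypothetical sketch: a data restriction enforced outside the agent.
# The agent never sees restricted fields, regardless of what its prompt says.

RESTRICTED_FIELDS = {"ssn", "salary"}  # fields the agent must never receive

def fetch_record(record_id, store):
    """Tool wrapper: redacts restricted fields before returning data to the agent."""
    record = store[record_id]
    return {k: v for k, v in record.items() if k not in RESTRICTED_FIELDS}

store = {42: {"name": "Ada", "ssn": "000-00-0000", "salary": 120000}}
print(fetch_record(42, store))  # only {'name': 'Ada'} reaches the agent
```

The point of the design is that the constraint lives in code the agent cannot rewrite, rather than in text the agent is merely asked to respect.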
¹ https://arxiv.org/abs/2202.03286
Access control is the real security boundary for AI agents
For an agent to be useful, it needs access. I know that sounds obvious, but it’s where the security model changes. An agent cannot retrieve financial data, write an email, update a record, or take action in a workflow unless it can interact with the systems involved.
Often, this means operating with inherited permissions, and sometimes it means broad permissions across multiple systems.
In an enterprise environment, the action itself usually does not tell you whether something is wrong. Pulling financial data is normal. Sending an email is normal. Retrieving healthcare records is normal in certain situations. Neither the output nor the workflows are inherently suspicious. The difference between a valid action and a security problem is context:
- Who is the agent acting for?
- What was it supposed to do?
- What data should it have access to at that moment?
- What action was actually necessary to complete the task?
Without that information, a lot of bad behavior looks legitimate. The only way to know something has gone wrong is to connect the action to identity, permissions, and intent.
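Those four questions can be sketched as a runtime check that connects an action to identity, task, and scope. This is a minimal illustration under assumed names (`AgentAction`, `TASK_SCOPES`, and the task and resource labels are all hypothetical), not a real policy engine:

```python
from dataclasses import dataclass

@dataclass
class AgentAction:
    acting_for: str   # who is the agent acting for?
    task: str         # what was it supposed to do?
    resource: str     # what data is it touching?
    operation: str    # what action is it actually taking?

# Hypothetical policy: the resource/operation pairs each task legitimately needs.
TASK_SCOPES = {
    "summarize_financials": {("finance_db", "read"), ("email", "send")},
}

def is_in_scope(action: AgentAction) -> bool:
    """An action is valid only if it is necessary for the task it claims to serve."""
    allowed = TASK_SCOPES.get(action.task, set())
    return (action.resource, action.operation) in allowed

# Pulling financial data is normal for this task; pulling HR records is not.
print(is_in_scope(AgentAction("alice", "summarize_financials", "finance_db", "read")))
print(is_in_scope(AgentAction("alice", "summarize_financials", "hr_records", "read")))
```

Note that the same `read` operation is allowed or denied purely on context, which is the point: the action alone does not tell you whether something is wrong.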
Why is AI agent access control easier to describe than to implement?
Suppose I ask an agent to look at last quarter's financials and draft an email to my boss summarizing the results. That sounds simple, but even that task carries a number of identity questions. There is an easy version of this problem, and there is a hard version.
The easy version of AI access control
The easy version is intersection. What can the user access? What can the agent system access? Give the agent the overlap between those two things. That at least prevents some obvious failures. If I do not have access to my boss's email, the agent should not have access to my boss's email, and it should not be able to read them just because I asked.
That gets you part of the way.
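The intersection rule is simple enough to express in a few lines. This is a sketch with made-up permission labels, assuming permissions can be modeled as flat sets:

```python
def effective_permissions(user_perms: set, agent_perms: set) -> set:
    """Easy version: the agent may only do what both the user and the
    agent system are independently allowed to do."""
    return user_perms & agent_perms

user = {"read:finance", "send:own_email"}
agent = {"read:finance", "send:own_email", "read:boss_email"}

# 'read:boss_email' drops out: the user cannot read the boss's mail,
# so the agent acting for the user must not either.
print(sorted(effective_permissions(user, agent)))  # ['read:finance', 'send:own_email']
```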
The hard version of AI agent access control
This does not solve the harder problem, which is context. The agent needs enough access to retrieve the financials. It needs enough access to draft or send the email. It does not, however, need access to unrelated data on the same server or read access to my boss’s emails. It likely does not even need persistent access to the financials after that part of the task is complete.
What you actually want is correctly scoped, task-specific, temporary access.
Give the system only the permissions it needs for the task at hand, and only for as long as it needs them. Remove them when that step is done. Limit the blast radius if the system is compromised, manipulated, or simply wrong.
What this really requires is runtime access control for AI agents, which is very close to a zero trust way of thinking. You do not assume the system should be broadly trusted just because it is acting on behalf of a valid user. You constrain the action to the smallest useful scope.
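One way to picture task-scoped, time-bound access is a grant object that expires on its own and can be revoked the moment a task step completes. The `ScopedGrant` class below is a hypothetical sketch of the idea, not a real enforcement mechanism:

```python
import time

class ScopedGrant:
    """Hypothetical sketch: a task-scoped, time-bound permission grant.
    Access is issued per task step and expires on its own."""

    def __init__(self, permission: str, ttl_seconds: float):
        self.permission = permission
        self.expires_at = time.monotonic() + ttl_seconds

    def is_valid(self) -> bool:
        return time.monotonic() < self.expires_at

    def revoke(self) -> None:
        """Remove access as soon as the task step is done."""
        self.expires_at = 0.0

# Grant read access to the financials only for the duration of this step.
grant = ScopedGrant("read:q3_financials", ttl_seconds=300)
assert grant.is_valid()

grant.revoke()          # step complete: access is gone immediately
assert not grant.is_valid()
```

Even if the agent is later compromised or manipulated, an expired grant limits the blast radius to whatever was reachable inside that one window.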
AI agents: an unfinished control model, and what you can do about it
Most enterprise environments still rely on broad permissions and persistent access. They were designed around static roles, not dynamic execution. In many cases, the system enforcing identity does not even have visibility into the original request that kicked off the agent workflow.
So what happens? Over-permissioning. You give the agent access to everything it might need, because you do not have a reliable way to determine what it needs at a particular moment. This is understandable from an implementation standpoint, but it is also how risk expands.
Enterprises are not going to wait for perfect security models before deploying agents; the business pressure is real, and the capabilities are improving too quickly to ignore. That makes it imperative to put the right controls in place before these systems become even more deeply embedded in enterprise environments than they already are.
Constrain the identity. Constrain the permissions. Constrain the duration of access. Constrain the blast radius.
Identity connects action to actor. It tells you what the system is doing on whose behalf, helps distinguish valid requests from inappropriate access, and gives you the mechanism to set and enforce boundaries.
Without the right governance, agent behavior may look reasonable right up until the moment it becomes a breach.
Frequently asked questions about AI agent security
What is AI agent security?
AI agent security focuses on controlling what agents can access and do inside enterprise systems. It shifts the focus from model accuracy to identity, permissions, and limiting the impact of incorrect or unintended actions.
Why is access control critical for AI agents?
AI agents operate with real permissions across systems. Even when the model is correct, excessive access can lead to data exposure or unintended actions. Access control defines the boundary between useful automation and security risk.
What is the biggest risk with AI agents?
The biggest risk is over-permissioning. Agents are often given broad, persistent access because it simplifies implementation. That increases the blast radius if the agent behaves incorrectly or is manipulated.
Do AI agents need their own identities?
Yes. AI agents act inside systems, retrieve data, and take action on behalf of users. Treating them as distinct identities allows you to define what they can access, track what they do, and enforce boundaries based on context and intent.
Why is AI agent security different from traditional AI security?
Traditional AI security focuses on model behavior like accuracy, hallucinations, and output quality. AI agent security focuses on access and execution. Once systems can act, the risk shifts from what they say to what they can do.