TL;DR: Trustworthy Agents Need Governance, and That’s Where the Real Work Starts

06/05/2026

The short version:

AI agents don’t just answer questions anymore. They plan, act, use your tools, and run in a self-directed loop with a lot less human oversight than a chatbot. That autonomy is what makes them valuable. It’s also what makes them risky. Anthropic’s latest piece, Trustworthy agents in practice, does a good job explaining why trust has to be built into every layer of an agent, not just the model.

What the article actually says

Anthropic breaks an agent into four parts (the model, the harness, the tools, and the environment), and each one is a source of capability and a source of risk.

One line that stuck with me: “A well-trained model can still be exploited through a poorly configured harness, an overly permissive tool, or an exposed environment.” The two risks they flag most are agents misreading what you actually wanted and prompt injection attacks. Both get worse as agents get smarter and we hand them bigger decisions. Their fix is defense at every layer, built on five principles: human control, alignment with user values, security, transparency, and privacy.

Why this is a governance problem, not just a model problem

Here’s what I think a lot of people miss. The model is only one of those four layers, and the other three (harness, tools, environment) are owned by the enterprise, not the model vendor. Anthropic says it directly. Customers have to think carefully about which tools and data they give an agent, which permissions they grant, and which environments they let it run in.

That’s not a prompt-engineering task. That’s identity, least-privilege access, and visibility. In other words, AI agent governance.
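To make "identity, least-privilege access, and visibility" concrete, here is a minimal Python sketch of a deny-by-default check. It is an illustration only, not any vendor's API: the names AgentGrant, is_allowed, and audit_line, the agent "invoice-bot", and the tool names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentGrant:
    """Hypothetical record: one agent identity and the tools it may use."""
    agent_id: str
    allowed_tools: frozenset = field(default_factory=frozenset)


def is_allowed(grant: AgentGrant, agent_id: str, tool: str) -> bool:
    """Least privilege: deny unless this agent was explicitly granted this tool."""
    return grant.agent_id == agent_id and tool in grant.allowed_tools


def audit_line(agent_id: str, tool: str, allowed: bool) -> str:
    """Visibility: every decision yields a log line, including denials."""
    decision = "allow" if allowed else "deny"
    return f"agent={agent_id} tool={tool} decision={decision}"


# Assumed example: an invoice agent that may read invoices and nothing else.
grant = AgentGrant("invoice-bot", frozenset({"read_invoices"}))
for tool in ("read_invoices", "send_email"):
    print(audit_line("invoice-bot", tool, is_allowed(grant, "invoice-bot", tool)))
```

The point of the sketch is the default: anything not named in the grant is refused, and the refusal is still recorded.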

Where Refoundry comes in

Refoundry leads in AI agent governance, and the idea we keep coming back to is simple. Treat agents like employees. Every agent gets its own identity, least-privilege access, and a real lifecycle: onboarding, monitoring, and offboarding. You can’t govern what you can’t see, so the first move is always an agent inventory. Find the shadow agents already running in your environment before you grant a single new permission.
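As a rough sketch of what an agent inventory with a lifecycle could look like, the Python below tracks onboarding, activation, and offboarding, then compares the inventory against the agents actually observed running. Everything here is an assumption for illustration: the AgentInventory class, its method names, and the agent IDs are hypothetical, and a real deployment would pull observed agents from its own telemetry.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    ONBOARDING = "onboarding"
    ACTIVE = "active"
    OFFBOARDED = "offboarded"


@dataclass
class AgentRecord:
    """Hypothetical inventory entry: one identity and one accountable owner."""
    agent_id: str
    owner: str
    stage: Stage = Stage.ONBOARDING


class AgentInventory:
    def __init__(self):
        self._records = {}

    def onboard(self, agent_id, owner):
        self._records[agent_id] = AgentRecord(agent_id, owner)

    def activate(self, agent_id):
        self._records[agent_id].stage = Stage.ACTIVE

    def offboard(self, agent_id):
        self._records[agent_id].stage = Stage.OFFBOARDED

    def shadow_agents(self, observed_ids):
        """Agents seen running that have no inventory record at all."""
        return sorted(set(observed_ids) - set(self._records))

    def running_after_offboarding(self, observed_ids):
        """Agents that were offboarded but are still observed running."""
        return sorted(
            agent_id for agent_id in set(observed_ids)
            if agent_id in self._records
            and self._records[agent_id].stage is Stage.OFFBOARDED
        )


inventory = AgentInventory()
inventory.onboard("invoice-bot", owner="finance")
inventory.activate("invoice-bot")

# Assumed input: agent IDs observed in the environment.
observed = ["invoice-bot", "scraper-7"]
print(inventory.shadow_agents(observed))  # ['scraper-7']
```

Here "scraper-7" is the shadow agent: it is running, but nobody onboarded it, so it has no owner and no governed permissions.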

How Microsoft’s new tooling makes it real

The Anthropic principles tell you what good looks like. Microsoft’s agentic stack is how you actually build it if you live in M365:

Microsoft Agent 365. A single control plane and registry for the agents running across your org. This is your visibility layer.
Entra Agent ID. Gives every agent its own governed identity instead of a shared service account, so permissions, conditional access, and offboarding work the way they already do for your people. This is "treat agents like employees" in production.
Purview, Defender, and DSPM. Data protection and threat monitoring that wrap the tools and environment Anthropic warns about, including the prompt-injection entry points.

So Anthropic makes the case that trust is layered. Microsoft ships the identity and control plane for the layers the enterprise owns. Refoundry is how you stand it up: inventory, identity, least privilege, and a governance model that actually scales.

Bottom line, agents are going to reshape how your people work. Whether that happens on a secure foundation comes down to the layers you own. Start with visibility, govern with identity, and treat every agent like an employee.

Posted in Refoundry Blog