Agentic AI implementation has been mis-sold. People treat it like software enablement. That framing breaks the moment the agent can change a workflow.
The agent approves a refund, opens an incident, updates a customer record, begins onboarding for a new customer, or escalates a support ticket. At that point a training calendar and a Slack message are not enough for a rollout plan.
It needs a change record.
Enterprise AI adoption has a naming problem. ‘Adoption’ gets viewed through the same lens as software ‘usage’: seats, office hours, examples of how to format a prompt properly, and waiting for it to kick in. But then the work actually runs through an agent that changes a workflow.
The system has entered the process.
Microsoft’s 2026 Work Trend Index frames this shift as an operating-model problem. WorkLab analysis finds that employees may be ready for AI while the systems around work are not. Agent approvals, open incidents, and changed customer records are not usage metrics; they are changes to how work runs.
That changes the implementation roadmap.
The Rollout Surface Changed
An agent behaves differently from a chat tool. A chat tool is rolled out to people. An agent is released through a system.
ServiceNow announced Action Fabric at Knowledge 2026, explicitly opening its governed system of action to agents. The MCP Server gives agents access to workflows, playbooks, approvals, catalog requests, and business rules, all of which run through identity verification, granted permissions, and audit trails.
The enterprise agent problem shows up when an agent moves from the edge of a process, summarizing work that was done, to inside the process, making a move.
The first question for the enterprise is no longer “who should have access to this tool” but “what change will this tool drive for the business, and who owns that change”: the teams that run the production systems, regulatory compliance, promises to customers, incident response, and the economics of the workflows the agent is inserted into.
A preview for LangChain’s Interrupt 2026 captures the enterprise reality well: the initial excitement of having agents prove themselves in production quickly gives way to questions about the team, tooling, and infrastructure required to support agents that are no longer proof-of-concept work (LangChain Interrupt 2026 preview). My experience with clients has been the same: initial excitement with the first useful agent, overlapping work with the second, and ownership problems with the third.
Fine. That is the good version.
The bad version of this is quiet. A team enables an agent with a service account, an admin token, and a dashboard nobody looks at. It looks good during the demo. Then a source system changes (a field gets renamed), a policy document drifts, an approval queue gets renamed, a customer edge case surfaces, and the agent keeps moving. Nobody owns the change because nobody treated the agent as a change.
The rollout path gets safer when every promotion carries evidence, scope, and a rollback owner.
The Change Record Is the Agent Spec
Atlassian describes IT change management as planning, reviewing, approving, and deploying changes to services with as little disruption as possible. Boring. Also the right object.
Agentic AI needs the same boring object.
A change record should specify which human role loses or gains work, which systems the agent can interact with, which actions require approval, which actions are forbidden, which metrics define harm, which traces prove behavior, and which owner can roll back changes made by the agent when something goes wrong.
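As a minimal sketch, that record can be a small typed object. The field names below are illustrative, not a standard; adapt them to whatever your change tooling actually stores.

```python
from dataclasses import dataclass

@dataclass
class AgentChangeRecord:
    """Hypothetical change record for promoting agent behavior.

    Field names are illustrative, not a standard schema.
    """
    agent_id: str
    affected_roles: list[str]       # which human roles gain or lose work
    allowed_systems: list[str]      # systems the agent may interact with
    approval_required: list[str]    # actions that need a human sign-off
    forbidden_actions: list[str]    # actions the agent must never take
    harm_metrics: dict[str, float]  # metric name -> threshold that defines harm
    trace_retention_days: int       # how long behavioral evidence is kept
    rollback_owner: str             # who can reverse the agent's changes

record = AgentChangeRecord(
    agent_id="refund-agent-v3",
    affected_roles=["support-tier-1"],
    allowed_systems=["ticketing", "payments-sandbox"],
    approval_required=["refund.issue"],
    forbidden_actions=["customer_record.delete"],
    harm_metrics={"refund_error_rate": 0.02},
    trace_retention_days=90,
    rollback_owner="payments-oncall",
)
```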
Rather than going straight to the typical roadmap of discovery, pilot, platform choice, training, and rollout, I would run a change-control spine through each of those steps.
Discover workflows instead of listing all the cool things an AI can do; “summarize account notes” and “renew an enterprise contract” belong in different risk classes. Pilot in a sandbox that is production-like in its data and failure handling. Limit the initial rollout by constraining the agent’s authority before giving it to more people. And production needs a clear owner, traces retained for a defined period and evaluated for performance, and a clear path to resolution when an incident happens.
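A sketch of that triage, under the assumption that risk follows from authority and blast radius. The classes and inputs here are placeholders, not a framework.

```python
from enum import Enum

class RiskClass(Enum):
    READ_ONLY = "read_only"  # summarize, search, classify
    BOUNDED = "bounded"      # acts, but inside hard limits
    REGULATED = "regulated"  # touches money, contracts, or compliance

def classify(workflow: str, writes_to_production: bool,
             touches_money_or_contracts: bool) -> RiskClass:
    """Toy triage: authority and blast radius decide the class,
    not how impressive the demo is."""
    if touches_money_or_contracts:
        return RiskClass.REGULATED
    if writes_to_production:
        return RiskClass.BOUNDED
    return RiskClass.READ_ONLY

assert classify("summarize account notes", False, False) is RiskClass.READ_ONLY
assert classify("renew an enterprise contract", True, True) is RiskClass.REGULATED
```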
This keeps the agent’s actual permissions from being discovered during an incident review.
Embedding service ownership into the organization’s way of working mitigates these dangers: contracts between teams, a sandboxed deployment, and a deliberate rollout sequence. The AI team owns what it knows best, i.e. the evaluation harness, the evals, model routing, and deployment mechanics. The business process owner owns the workflow semantics. Security, operations, and the relevant parts of legal or compliance own the permission envelope, production response, and the consequences of non-compliance, respectively.
Shared ownership is annoying. So is production.
This is why I keep harping on service ownership for agent work. LangGraph for enterprise agent development made the runtime version of this point. Production agents have operational contracts. A clever graph is not enough. It can fall apart after the first model swap, policy change, or integration outage.
The change record is the handoff object between business process, agent runtime, security, and operations.
The Metrics Already Exist
No need for another exotic agent scorecard. The software delivery world already has the basic bones. DORA’s software delivery metrics track change lead time, deployment frequency, failed deployment recovery time, change fail rate, and deployment rework rate.
Change lead time: the time from proposing agent behavior to approving it for production.
Deployment frequency: how often an agent change is safely promoted to production, whether to its tool registry, policy pack, memory schema, retrieval index, or workflow.
Failed deployment recovery time: the time to reverse an agent’s change, such as reverting a prompt or policy, removing a granted permission, or switching back to the previous workflow.
Change fail rate: the percentage of agent changes that require intervention.
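A sketch of two of those numbers computed from agent change records, assuming you log proposal, approval, and intervention data. The record shape is made up for illustration.

```python
from datetime import datetime, timedelta
from statistics import median

def change_lead_time(changes: list[dict]) -> timedelta:
    """Median time from proposing agent behavior to approving it for production."""
    return median(c["approved_at"] - c["proposed_at"] for c in changes)

def change_fail_rate(changes: list[dict]) -> float:
    """Share of agent changes that later required human intervention or rollback."""
    failed = sum(1 for c in changes if c["required_intervention"])
    return failed / len(changes)

changes = [
    {"proposed_at": datetime(2026, 3, 1), "approved_at": datetime(2026, 3, 4),
     "required_intervention": False},
    {"proposed_at": datetime(2026, 3, 2), "approved_at": datetime(2026, 3, 9),
     "required_intervention": True},
]
print(change_lead_time(changes))  # 5 days (median of 3 and 7)
print(change_fail_rate(changes))  # 0.5
```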
This would all be nice and clean if an agent’s behavior failed in a binary way, like an exception being thrown. But it does not. It produces a technically correct answer that just happens to be wrong in the context of the workflow. Which is why the failure is behavioral, not binary, and is invisible to a deployment platform that only knows how to scream when a process fails to start.
So the metric needs evidence.
In the end, a production agent rollout should collect decision traces (tool calls, approval steps, and so on), rejected actions (e.g. for insufficient privileges), user-corrected mistakes, and any failures from the eval routine. Add business outcomes to that release story and the team has real evidence for the change board, rather than an approval of “stuff” with a slightly nicer UI.
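A sketch of the evidence bundle attached to a release. The event kinds are invented; the point is that the change board sees behavior, not vibes.

```python
import json
from collections import Counter

def release_evidence(events: list[dict]) -> dict:
    """Summarize agent run events into evidence a change board can actually read."""
    kinds = Counter(e["kind"] for e in events)
    return {
        "tool_calls": kinds["tool_call"],
        "approvals": kinds["approval"],
        "rejected_actions": kinds["rejected"],    # e.g. insufficient privileges
        "user_corrections": kinds["correction"],  # humans fixed the agent's output
        "eval_failures": kinds["eval_failure"],
    }

events = [
    {"kind": "tool_call", "tool": "refund.lookup"},
    {"kind": "rejected", "reason": "insufficient_privileges"},
    {"kind": "correction", "by": "support-tier-1"},
]
print(json.dumps(release_evidence(events), indent=2))
```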
This is where Everybody Tests comes in. Testing cannot be relegated to downstream QA when an agent can affect a live workflow. Product, engineering, operations, security, and enterprise systems teams should be able to run the test. Ideally, they should understand it, too. The eval suite tests behavioral regressions. Traces reveal runtime drift. Approval logs expose authority escalation. Business metrics surface harm the model never sees.
All of them are part of the change.
The Roadmap Is a Promotion Ladder
Start with read-only assistance. The agent assists with summarization, search, templates, classification, and process explanation. That finds workflow fit and failure modes without giving the system authority to act.
Next, the team gradually grants more permission inside well-defined boundaries. Completing low-dollar refunds, updating internal tickets, sending non-regulated customer messages, changing low-risk account fields, deploying to test environments. The goal is to prove bounded authority before scope expands.
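A sketch of that expanding envelope. The rung names, actions, and dollar thresholds are my own invention, not a reference model.

```python
LADDER = [
    # Each rung widens authority only after the previous rung produced clean evidence.
    {"stage": "read_only", "max_refund_usd": 0,
     "actions": {"summarize", "search", "classify"}},
    {"stage": "bounded", "max_refund_usd": 50,
     "actions": {"summarize", "search", "classify", "refund.issue", "ticket.update"}},
    {"stage": "production", "max_refund_usd": 500,
     "actions": {"summarize", "search", "classify", "refund.issue",
                 "ticket.update", "message.send"}},
]

def is_permitted(stage: str, action: str, amount_usd: float = 0.0) -> bool:
    """The envelope, not the model, decides what the agent may do at each rung."""
    rung = next(r for r in LADDER if r["stage"] == stage)
    return action in rung["actions"] and amount_usd <= rung["max_refund_usd"]

assert is_permitted("bounded", "refund.issue", 25)
assert not is_permitted("bounded", "refund.issue", 500)
assert not is_permitted("read_only", "ticket.update")
```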
This promotion path pays for itself by preventing a business process from being secretly screwed by an AI that nobody can explain.
Make each step on the promotion ladder concrete. Human-in-the-loop needs a named reviewer, a review surface, override power, correction capture, and a rule for when the agent stops asking. Same for guardrails, observability, and governance. Each word should collapse to an owner, system, threshold, and audit trail.
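A sketch of what “human-in-the-loop” collapses to in code. The names are placeholders, and the stop-asking rule here is a simple streak of accepted reviews; pick whatever rule your risk class justifies.

```python
from dataclasses import dataclass, field

@dataclass
class ReviewGate:
    """Human-in-the-loop as a concrete object: a named reviewer,
    correction capture, and a rule for when the agent stops asking."""
    reviewer: str                 # a named human, not a distribution list
    accepted_streak: int = 0
    autonomy_threshold: int = 50  # after this many consecutive accepts, stop asking
    corrections: list[str] = field(default_factory=list)

    def needs_review(self) -> bool:
        return self.accepted_streak < self.autonomy_threshold

    def record(self, accepted: bool, correction: str | None = None) -> None:
        if correction:
            self.corrections.append(correction)  # correction capture
        # Any rejection resets the streak; autonomy is earned, not granted once.
        self.accepted_streak = self.accepted_streak + 1 if accepted else 0

gate = ReviewGate(reviewer="jane.doe")
assert gate.needs_review()
for _ in range(50):
    gate.record(accepted=True)
assert not gate.needs_review()
```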
McKinsey’s 2026 AI trust survey is useful here because it separates adoption from maturity. Strategy, governance, and controls for agentic AI remain the weak spots. Security and risk concerns remain the main barrier to scaling. Which tracks.
Boring. Beautiful.
Own the Change
So long as an organization treats an enterprise AI agent like just another tool to spread to more people with the same enthusiasm, the implementation will fail shortly after its first collisions with the organization’s permission models, reporting structures, compliance requirements, process exceptions, and sheer customer volume.
I have no particular interest in recreating CAB theater for enterprise agents. Meetings where eight approvers (or more!) review a password-reset workflow they cannot even understand are a huge waste of time and effort. Yes, review is reasonable in regulated paths, but that should be the exception, not the rule, and it should be as lightweight and technical as possible, ideally close to where the work actually happens (in this case, a simple approval in the workflow UI).
Put the agent change record next to the PR, the eval report, the trace sample, the permission diff, and the rollback plan. Have the workflow owner sign the semantics; security sign the authority; engineering sign the runtime; and operations sign the incident path.
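A sketch of that gate as code, so shipping is a function call and not a meeting. The role names mirror the sign-offs above and are assumptions about your org chart.

```python
REQUIRED_SIGNOFFS = {
    "workflow_owner": "semantics",  # the behavior means what the business thinks it means
    "security": "authority",        # the permission envelope is the one that was reviewed
    "engineering": "runtime",       # graph, evals, and deployment mechanics
    "operations": "incident_path",  # someone can roll it back at 3 a.m.
}

def ready_to_ship(signoffs: dict[str, bool]) -> bool:
    """Ship only when every owner has signed their slice of the change."""
    missing = [role for role in REQUIRED_SIGNOFFS if not signoffs.get(role)]
    if missing:
        print(f"blocked: missing sign-off from {', '.join(missing)}")
        return False
    return True

ready_to_ship({"workflow_owner": True, "security": True,
               "engineering": True, "operations": False})
```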
Then ship.
That is what an AI implementation roadmap needs now: a promotion path for systems that can act.
Production always gets weird.

