When Should You Interrupt an AI Agent Running in Production?

An operator watched their coding agent edit 47 files over 25 minutes. Each action looked reasonable in isolation. The aggregate result broke the build, corrupted the database migration, and required a four-hour rollback.

The agent wasn't malfunctioning. The oversight was.

Research from Anthropic's analysis of millions of Claude Code sessions reveals a pattern that challenges conventional wisdom about AI agent safety. Experienced users auto-approve twice as often as novices. But they also interrupt twice as often. The shift isn't toward less oversight. It's toward different oversight: monitoring instead of approving, intervening when signals matter instead of rubber-stamping every step.

The difference between reliable agent deployments and production incidents often comes down to one question: when do you step in? This guide covers the signals that trigger intervention, the parameters that define pause conditions, and the failure modes that emerge when you get human oversight wrong.

Why step-by-step approval breaks at scale

The default mental model for agent safety is approval-based oversight. Before each action, the agent pauses. A human reviews. The human approves or rejects. This works for demos. It fails in production.

Three forces break the approval model:

Force 1: Action velocity outpaces review capacity

Anthropic's data shows the 99.9th percentile turn duration in Claude Code nearly doubled from 25 minutes to 45 minutes between October 2025 and January 2026. Agents are doing more. Reviewing every action at that scale means either spending hours on oversight or creating bottlenecks that defeat the purpose of automation.

Force 2: Context fragmentation

When an agent makes 47 file edits across 25 minutes, each individual edit looks reasonable. The problem isn't any single action. It's the aggregate. Approval-based oversight sees trees, not forests. By the time the pattern becomes visible, the damage is done.

Force 3: Human attention decay

Studies of approval workflows show decision quality degrades after 20-30 consecutive approvals. Humans stop reading carefully. They develop "approval muscle memory." The safeguard becomes theater.

The operators who scale agent deployments don't approve more actions. They approve fewer, but they monitor more. The shift is from pre-approval to active intervention.

Four intervention timing failure modes

Production failures in human oversight cluster into four patterns. All look like diligence. All break under real usage.

Failure mode 1: Premature interrupt

You interrupt the agent before it has enough context to succeed. The agent was gathering information, but the partial results looked wrong. You stopped it. Now you've wasted the work and the agent has to start over.

Anthropic's research documented this pattern: on complex tasks, Claude Code asks for clarification more than twice as often as humans interrupt it. The agent's self-imposed pauses often come at better moments than human interrupts. When humans jump in too early, they disrupt information-gathering that would have resolved the apparent problem.

Example: An agent begins debugging by reading error logs, checking recent deployments, and querying metrics. After three minutes, you see it reading logs from the wrong time window and interrupt. But the agent was about to cross-reference those logs with deployment timestamps. The interrupt wasted three minutes of work.

Mitigation:

  • Set minimum observation window before first intervention (e.g., 3-5 actions or 2 minutes)
  • Track agent progress toward stated goal before interrupting
  • Use "explain your plan" trigger instead of immediate interrupt

Parameters: min_actions_before_interrupt=3, min_duration_before_interrupt=120s, require_plan_explanation=true
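The observation-window gate above can be sketched in a few lines of Python. The class name `InterruptGate` and the returned signal strings are illustrative, not part of any agent framework; the thresholds map to `min_actions_before_interrupt` and `min_duration_before_interrupt`.

```python
from dataclasses import dataclass

@dataclass
class InterruptGate:
    """Gate a human interrupt behind a minimum observation window."""
    min_actions: int = 3          # min_actions_before_interrupt
    min_duration_s: float = 120.0 # min_duration_before_interrupt

    def may_interrupt(self, actions_taken: int, elapsed_s: float) -> str:
        # Inside the window, ask for a plan instead of halting the agent.
        if actions_taken < self.min_actions and elapsed_s < self.min_duration_s:
            return "request_plan"
        return "interrupt_ok"

gate = InterruptGate()
print(gate.may_interrupt(actions_taken=2, elapsed_s=45))   # still observing
print(gate.may_interrupt(actions_taken=6, elapsed_s=300))  # window passed
```

The "or" in the mitigation matters: once either threshold is crossed, the operator has seen enough to interrupt with context.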

Failure mode 2: Missed intervention

The agent drifts from its original goal. Each action is defensible. The trajectory is wrong. By the time you notice, the agent has spent 30 minutes on a tangential problem.

This pattern appears in Anthropic's data on scope expansion. As task complexity increases, agents naturally explore adjacent problems. The research found that software engineering accounts for nearly 50% of agentic activity, and within that domain, scope drift is the most common cause of wasted effort.

Example: An agent tasked with fixing a login bug starts by investigating authentication, then drifts into optimizing the database connection pool, then begins rewriting the ORM layer. Each step followed logically from the previous one. None addressed the original bug.

Mitigation:

  • Track goal drift score: semantic distance between current action and original goal
  • Set maximum scope expansion ratio (e.g., 2x original task boundaries)
  • Require re-confirmation after N actions in new domain

Parameters: max_goal_drift_score=0.7, max_scope_expansion_ratio=2.0, reconfirm_after_scope_change=5

Failure mode 3: Wrong-level intervention

You interrupt the agent at the wrong abstraction level. Instead of redirecting the goal, you micromanage the implementation. Or instead of correcting a specific action, you halt the entire task.

Example: An agent is deploying a feature with an incorrect configuration. You interrupt and explain the correct architecture. The agent restarts from scratch instead of fixing the config file.

Mitigation:

  • Classify intervention type before acting: goal redirect, step correction, or halt
  • Use intervention vocabulary that matches agent's reasoning level
  • Provide minimal sufficient correction, not comprehensive redesign

Parameters: intervention_type_required=true, correction_granularity=step|goal|halt, max_correction_length=200_chars
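One way to enforce the classification step is to make the intervention type a required argument and reject oversized corrections before they reach the agent. This is a minimal sketch; the enum values mirror the `step|goal|halt` granularity above, and the 200-character cap is the `max_correction_length` parameter.

```python
from enum import Enum

class Intervention(Enum):
    STEP_CORRECTION = "step"  # fix one action, let the task continue
    GOAL_REDIRECT = "goal"    # restate the objective, keep the context
    HALT = "halt"             # stop the task entirely

MAX_CORRECTION_CHARS = 200    # max_correction_length

def build_intervention(kind: Intervention, message: str) -> dict:
    """Force the operator to pick a level and keep the correction minimal."""
    if kind is not Intervention.HALT and len(message) > MAX_CORRECTION_CHARS:
        raise ValueError("correction too long; a redesign invites a restart")
    return {"type": kind.value, "message": message}
```

Capping the message length is a blunt but effective proxy for "minimal sufficient correction": a 500-character explanation of the right architecture is usually a goal redirect in disguise.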

Failure mode 4: Intervention fatigue

You've interrupted so often that you've trained yourself to ignore warning signs. The agent triggers pause conditions constantly. Most are false positives. When a real problem emerges, you've stopped paying attention.

Anthropic's analysis of human involvement patterns found that 73% of tool calls appear to have some form of human oversight. But as session length increases, the quality of that oversight degrades. Users who interrupt frequently early in a session interrupt less often later, even when signals warrant it.

Example: An agent with overly sensitive cost alerts pauses every time it makes an API call. After 20 pauses in an hour, you start auto-resuming without reading. The 21st pause was a $500 charge.

Mitigation:

  • Calibrate alert thresholds to actual risk, not theoretical risk
  • Batch low-priority interventions for periodic review
  • Track false positive rate and adjust thresholds weekly

Parameters: max_interventions_per_hour=5, false_positive_target=0.3, threshold_review_cadence=7_days
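The batching mitigation can be sketched as a rate limiter that pages the operator for at most `max_interventions_per_hour` urgent alerts and queues everything else for periodic review. `AlertBatcher` and the returned strings are hypothetical names for illustration.

```python
from collections import deque

class AlertBatcher:
    """Cap urgent interruptions per hour; queue the rest for batched review."""
    def __init__(self, max_per_hour: int = 5):  # max_interventions_per_hour
        self.max_per_hour = max_per_hour
        self.recent = deque()  # timestamps of urgent pages in the last hour
        self.batched = []      # low-priority alerts held for periodic review

    def submit(self, alert: str, urgent: bool, now: float) -> str:
        # Drop page timestamps older than one hour from the window.
        while self.recent and now - self.recent[0] > 3600:
            self.recent.popleft()
        if urgent and len(self.recent) < self.max_per_hour:
            self.recent.append(now)
            return "page_operator"
        self.batched.append(alert)
        return "batched"
```

Once the hourly budget is spent, even "urgent" alerts are batched; if that happens often, the thresholds feeding the alerts are miscalibrated, which is exactly what the weekly review is for.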

The intervention signal stack: what actually matters

Effective intervention isn't about watching everything. It's about watching the right signals. Production oversight needs at least five signals scored together.

The thresholds below derive from Anthropic's published session data and operational patterns observed in Claude Code deployments. They represent starting points, not universal truths. Adjust based on your false positive rate.

Signal 1: Duration

How long has this agent been running? Long-running agents either succeed big or fail big. Anthropic's data shows the 99.9th percentile turn duration is 45 minutes, while the median is under 1 minute. The gap between typical and extreme usage is where intervention matters most.

Parameters (derived from Claude Code session distributions):

  • Warning threshold: 10 minutes (above 90th percentile, check in)
  • Intervention threshold: 20 minutes (require explanation)
  • Hard stop threshold: 45 minutes (99.9th percentile, halt and review)
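The three-tier duration ladder reduces to a simple mapping from elapsed time to a signal. The function and signal names are illustrative; the thresholds are the ones listed above.

```python
def duration_signal(elapsed_min: float) -> str:
    """Map elapsed turn time onto the duration thresholds."""
    if elapsed_min >= 45:  # 99.9th percentile: halt and review
        return "hard_stop"
    if elapsed_min >= 20:  # require the agent to explain itself
        return "intervene"
    if elapsed_min >= 10:  # above the 90th percentile: check in
        return "warn"
    return "ok"
```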

Signal 2: Cost accumulation

What resources has this agent consumed? Cost isn't just money. It's API calls, compute time, network bandwidth. Anthropic's risk analysis found that only 0.8% of agent actions appear irreversible, but the financial impact of a single runaway session can be substantial.

Parameters (based on typical API cost structures):

  • Per-action cost limit: $1 per action (catches expensive single calls)
  • Session cost warning: $10 cumulative (typical task budget)
  • Session cost hard limit: $50 cumulative (maximum before human review)
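A per-session accumulator that checks both the per-action and cumulative limits might look like the sketch below. `CostTracker` is a hypothetical name; the dollar figures are the starting-point thresholds above.

```python
class CostTracker:
    """Accumulate per-session spend and flag the cost thresholds."""
    PER_ACTION_LIMIT = 1.0  # $1 catches expensive single calls
    WARN_AT = 10.0          # typical task budget
    HARD_LIMIT = 50.0       # maximum before human review

    def __init__(self):
        self.total = 0.0

    def record(self, action_cost: float) -> str:
        if action_cost > self.PER_ACTION_LIMIT:
            return "pause_expensive_action"  # reviewed before it runs
        self.total += action_cost
        if self.total >= self.HARD_LIMIT:
            return "hard_stop"
        if self.total >= self.WARN_AT:
            return "warn"
        return "ok"
```

Checking the per-action limit before accumulating means a single $500 call is caught on its own, not hidden inside a cumulative total that already passed the warning tier.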

Signal 3: Error rate

How often is the agent failing and retrying? High error rates indicate the agent is outside its competence boundary. The retry loop prevention patterns covered elsewhere address automated retries. This signal catches the pattern where an agent keeps trying different approaches without making progress.

Parameters (tuned for <30% false positive rate):

  • Warning error rate: 20% of actions (normal exploration)
  • Intervention error rate: 40% of actions (stuck pattern)
  • Hard stop error streak: 5 consecutive failures (fundamental problem)
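Error rate and error streak are distinct signals, so a monitor needs both counters. This sketch (illustrative names, thresholds from above) also waits for a minimum sample before acting on the rate, since one failure in two actions is a 50% "rate" that means nothing.

```python
class ErrorMonitor:
    """Track session error rate and consecutive-failure streaks."""
    def __init__(self, warn_rate=0.2, intervene_rate=0.4, max_streak=5):
        self.warn_rate = warn_rate
        self.intervene_rate = intervene_rate
        self.max_streak = max_streak
        self.failures = 0
        self.total = 0
        self.streak = 0

    def record(self, success: bool) -> str:
        self.total += 1
        if success:
            self.streak = 0
        else:
            self.failures += 1
            self.streak += 1
        if self.streak >= self.max_streak:
            return "hard_stop"              # fundamental problem
        if self.total >= 10:                # avoid noisy rates on tiny samples
            rate = self.failures / self.total
            if rate >= self.intervene_rate:
                return "intervene"          # stuck pattern
            if rate >= self.warn_rate:
                return "warn"               # edge of normal exploration
        return "ok"
```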

Signal 4: Scope drift

Is the agent still working on the original problem? Semantic drift is the silent killer of agent reliability. Anthropic's research on experienced users found that interrupt rates increase with task complexity, suggesting that scope expansion is a common pattern that requires active monitoring.

Parameters (based on semantic similarity thresholds):

  • Track embedding distance between current action and original goal
  • Warning at 0.5 drift score (related but tangential)
  • Intervention at 0.7 drift score (likely off-target)
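A drift score is just a distance between the stated goal and the current action in some semantic space. A production system would use sentence embeddings; the sketch below substitutes bag-of-words cosine distance so it runs standalone, and the function names are illustrative.

```python
import math
from collections import Counter

def drift_score(goal: str, action: str) -> float:
    """Cosine distance between word-count vectors (embedding stand-in)."""
    g = Counter(goal.lower().split())
    a = Counter(action.lower().split())
    dot = sum(g[w] * a[w] for w in g)
    norm = (math.sqrt(sum(v * v for v in g.values()))
            * math.sqrt(sum(v * v for v in a.values())))
    return 1.0 - (dot / norm if norm else 0.0)

def drift_signal(score: float) -> str:
    if score >= 0.7:
        return "intervene"  # likely off-target
    if score >= 0.5:
        return "warn"       # related but tangential
    return "ok"
```

With real embeddings the absolute scores shift, which is why the 0.5 and 0.7 thresholds are starting points to recalibrate against your own false positive rate.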

Signal 5: Resource consumption

What external resources is the agent touching? Files modified, APIs called, databases queried. Anthropic's analysis found that 80% of tool calls have some form of safeguard, but resource tracking is often the weakest layer.

Parameters (calibrated for software engineering workflows):

  • Track unique resources touched
  • Warning at 20 unique resources (broader than expected)
  • Intervention at 50 unique resources (scope expansion likely)
  • Hard stop at 100 unique resources (halt and review scope)
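Tracking unique resources is deliberately simple: a set of identifiers checked against the tiers above. `ResourceTracker` is a hypothetical name for illustration.

```python
class ResourceTracker:
    """Count unique files, APIs, and tables the agent has touched."""
    THRESHOLDS = [(100, "hard_stop"), (50, "intervene"), (20, "warn")]

    def __init__(self):
        self.seen = set()

    def touch(self, resource_id: str) -> str:
        self.seen.add(resource_id)  # re-touching a resource doesn't count twice
        for limit, signal in self.THRESHOLDS:
            if len(self.seen) >= limit:
                return signal
        return "ok"
```

Counting unique resources rather than raw touches is the point: an agent editing the same three files fifty times is iterating, while an agent touching fifty different files is expanding scope.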

Eight parameters that define pause conditions

Production oversight needs concrete numbers. Here are the parameters that define when an agent should pause:

| Parameter | Conservative | Standard | Permissive |
|-----------|-------------|----------|------------|
| Max turn duration | 15 minutes | 25 minutes | 45 minutes |
| Cost threshold (warning) | $5 | $10 | $25 |
| Cost threshold (hard stop) | $25 | $50 | $100 |
| Error streak limit | 3 failures | 5 failures | 10 failures |
| Scope drift score | 0.5 | 0.7 | 0.85 |
| Resources touched limit | 30 | 50 | 100 |
| Interventions per hour | 3 | 5 | 10 |
| Response time target | 2 minutes | 5 minutes | 15 minutes |

Defaults for production: 25-minute duration, $10 warning / $50 hard stop cost, 5-failure error streak, 0.7 drift score, 50 resources, 5 interventions per hour, 5-minute response target.
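The three columns of the table translate naturally into named, immutable presets. This is one possible encoding (the field and profile names are illustrative); starting from a named profile and overriding individual fields beats scattering eight magic numbers through the codebase.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OversightProfile:
    """One column of the pause-condition table."""
    max_turn_minutes: int
    cost_warn: float
    cost_hard_stop: float
    error_streak_limit: int
    drift_score_limit: float
    resources_limit: int
    interventions_per_hour: int
    response_minutes: int

CONSERVATIVE = OversightProfile(15, 5, 25, 3, 0.5, 30, 3, 2)
STANDARD     = OversightProfile(25, 10, 50, 5, 0.7, 50, 5, 5)
PERMISSIVE   = OversightProfile(45, 25, 100, 10, 0.85, 100, 10, 15)
```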

Five failure scenarios with mitigations

These scenarios reflect patterns documented in Anthropic's agent autonomy research and operational experience from production deployments.

Scenario 1: Agent spends 45 minutes on wrong approach

Agent attempts debugging strategy that doesn't match the actual problem. Each step is logical. The overall approach is wrong. Anthropic's data shows the 99.9th percentile turn duration reached 45 minutes in early 2026, meaning the longest-running sessions can waste significant time if headed in the wrong direction.

Mitigation:

  • Require agent to state hypothesis before executing
  • Track hypothesis vs evidence alignment
  • Pause after 10 minutes if hypothesis not updated

Parameters: require_hypothesis=true, hypothesis_check_interval=600s, max_hypothesis_age=900s
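The hypothesis check reduces to tracking when the stated hypothesis was last updated. In this sketch (illustrative names), a monitor would call `check()` on the `hypothesis_check_interval` cadence of 600 seconds, and any hypothesis older than `max_hypothesis_age` forces a pause.

```python
class HypothesisWatch:
    """Pause a session whose stated hypothesis has gone stale."""
    MAX_AGE_S = 900  # max_hypothesis_age

    def __init__(self):
        self.hypothesis = None
        self.updated_at = None

    def state_hypothesis(self, text: str, now: float):
        self.hypothesis = text
        self.updated_at = now

    def check(self, now: float) -> str:
        if self.hypothesis is None:
            return "pause_no_hypothesis"      # require_hypothesis=true
        if now - self.updated_at >= self.MAX_AGE_S:
            return "pause_stale_hypothesis"   # evidence hasn't moved the plan
        return "ok"
```

A stale hypothesis is the signature of the wrong-approach scenario: the agent keeps acting, but nothing it has found in 15 minutes has changed what it believes.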

Scenario 2: Agent triggers cascading API charges

Agent makes repeated calls to paid APIs. Each call is small. The aggregate is large. Anthropic's risk scoring found that financial transactions cluster in higher-risk categories, and while most actions are low-risk, a single runaway session can accumulate significant charges before detection.

Mitigation:

  • Implement per-session cost accumulator
  • Require explicit approval for any API not on free tier
  • Auto-pause at cost milestones

Parameters: track_api_costs=true, paid_api_require_approval=true, cost_milestone_intervals=10|25|50

Scenario 3: Agent modifies production without confirmation

Agent has write access to production systems. It makes changes that should require review. Anthropic found that only 0.8% of tool calls appear irreversible, but production modifications fall into this category. The research also noted that 80% of tool calls have some safeguard, but environment-aware safeguards are less common than they should be.

Mitigation:

  • Tag all resources with environment (dev/staging/prod)
  • Require explicit confirmation for any prod modification
  • Log all prod touches with timestamps

Parameters: environment_tagging=required, prod_require_confirmation=true, prod_audit_log=90_days
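Environment gating is a small function, which is exactly why it gets skipped. This sketch (illustrative names and return strings) rejects untagged resources outright rather than guessing, which is the `environment_tagging=required` behavior.

```python
PROD_ENVIRONMENTS = {"prod", "production"}

def gate_write(resource_env: str, confirmed: bool) -> str:
    """Block production writes that lack explicit human confirmation."""
    env = resource_env.strip().lower()
    if not env:
        return "reject_untagged"        # environment_tagging=required
    if env in PROD_ENVIRONMENTS and not confirmed:
        return "require_confirmation"   # prod_require_confirmation=true
    return "allow"
```

Treating a missing tag as a rejection, not a default to dev, is the important choice: the untagged resource is precisely the one most likely to turn out to be production.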

Scenario 4: Agent creates dependencies on its own work

Agent builds tools or configs that it then depends on. Creates hidden coupling that breaks later. This pattern emerged in Anthropic's analysis of complex tasks, where agents operating at higher autonomy levels (scores 6+) showed increased tendency to create intermediate artifacts that became implicit dependencies.

Mitigation:

  • Track agent-created resources separately
  • Require documentation for any agent-created dependency
  • Schedule review of agent-created artifacts

Parameters: track_agent_artifacts=true, require_artifact_docs=true, artifact_review_cadence=7_days

Scenario 5: Human misses intervention due to alert fatigue

Too many false positive pauses. Human stops responding carefully. Anthropic's research on experienced users found that interrupt rates increase from 5% to 9% as users gain experience. This suggests effective monitoring requires calibration. Too many interrupts lead to fatigue. Too few lead to missed interventions.

Mitigation:

  • Calibrate thresholds to maintain <30% false positive rate
  • Batch non-urgent interventions
  • Track intervention quality metrics

Parameters: false_positive_target=0.3, batch_non_urgent=true, weekly_threshold_review=true

Agent-initiated vs human-initiated stops

Anthropic's research shows Claude asks for clarification more than twice as often as humans interrupt it. Agent-initiated stops are an important form of oversight.

When the agent should pause itself:

  • Ambiguous instructions with multiple valid interpretations
  • Missing credentials or permissions
  • Detected scope drift beyond confidence boundary
  • Encountered error type not in training distribution
  • Resource consumption approaching limits

When humans should intervene:

  • Duration exceeds threshold without progress signal
  • Cost accumulation rate accelerates
  • Error rate exceeds tolerance
  • Scope drift detected by external monitoring
  • External event invalidates assumptions

The most reliable deployments combine both: agents trained to recognize their own uncertainty, plus humans monitoring for patterns the agent can't see.

Designing your oversight stack

Putting this together:

  1. Define your intervention vocabulary: goal redirect, step correction, clarification request, hard halt
  2. Set your signal thresholds: start conservative, adjust based on false positive rate
  3. Implement both agent-initiated and human-initiated pauses: they catch different things
  4. Track intervention quality: false positive rate, intervention outcome, time to resolution
  5. Review thresholds weekly: what worked at 10 agents won't work at 100
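Once each signal emits one of the same four values, combining the stack is a one-liner: the session takes the worst individual reading. The severity ordering below is an assumption of this sketch, matching the warn/intervene/hard-stop ladder used throughout.

```python
SEVERITY = {"ok": 0, "warn": 1, "intervene": 2, "hard_stop": 3}

def combine_signals(*signals: str) -> str:
    """The stack pauses on the worst signal: one hard stop outranks
    any number of healthy readings."""
    return max(signals, key=lambda s: SEVERITY[s])
```

This is deliberately not an average. A session with healthy duration, cost, and error signals but a 0.9 drift score is off the rails, and averaging would hide it.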

The operators who scale agent deployments aren't the ones who approve every action. They're the ones who know which signals matter, when to step in, and when to let the agent work. Effective oversight isn't about control. It's about knowing when control matters.

Sources

The research referenced in this guide comes from Anthropic's February 2026 study "Measuring AI agent autonomy in practice", which analyzed millions of Claude Code sessions and public API tool calls. Key findings used:

  • 99.9th percentile turn duration: 25→45 minutes (Oct 2025–Jan 2026)
  • Experienced users auto-approve 40% vs 20% for novices
  • Interrupt rate increases from 5% to 9% with experience
  • Claude asks for clarification 2x more often than humans interrupt
  • 80% of tool calls have safeguards, 73% have human oversight
  • 0.8% of actions appear irreversible
  • Software engineering accounts for ~50% of agentic activity

Want the boring part handled?

Keepmyclaw gives OpenClaw operators encrypted backups, restore drills, and a faster path from "oh no" to "we're back". If this article sounds like your problem, stop whiteboarding it forever.