Error monitoring patterns for Power Automate

How to monitor Power Automate flows in production — Run History, alerts, Application Insights integration, and operational patterns for catching failures fast.

Updated 2026-10-05

A Power Automate flow that fails silently is worse than no flow at all — users assume the work is done; data drifts; problems compound. Production-grade flows need monitoring that catches failures fast, alerts the right people, and provides actionable diagnostic context.

The Run History. The primary surface:

Per-flow page showing recent runs.
Status: Succeeded / Failed / Cancelled / Running.
Per-run drill-down to action details.
Inputs and outputs per action.
Error messages.

Run history is essential for troubleshooting. Its limitation is scope: it shows one flow at a time and does not aggregate across flows.

Failure alerts. In flow settings:

Email notifications on failure.
Specify recipient address.

Easy to set up; the bare minimum. But:

Easily ignored emails.
One alert per failure, no aggregation.
No distinction between transient and persistent failures.

For production, more sophisticated alerting is needed.

Power Platform admin centre. Cross-flow visibility:

All flows in environment.
Failure rates.
Recent failures.
Trend graphs.

Useful for admin visibility; less actionable than flow-specific views.

Application Insights integration. The robust path:

Flow emits to App Insights via custom connector or HTTP action.
Failure events captured.
Custom dashboards.
Sophisticated alerts.

Setup is more complex, but it is the right answer for serious operations.

Structured logging in flows. Emit consistent log records:

Flow start — log run ID, trigger context.
Key checkpoints — major state changes.
External calls — request and response (sanitised).
Exceptions — caught errors with context.

Log to a dedicated table in Dataverse or to App Insights.
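As an illustration of what a "consistent log record" might look like, here is a minimal Python sketch. The field names (`flowName`, `runId`, `stage`, `context`) and the list of sensitive keys are assumptions made for this example, not a Power Automate or App Insights schema; the same shape could be produced inside a flow with a Compose action before writing to Dataverse or App Insights.

```python
import json
from datetime import datetime, timezone

# Hypothetical list of keys to mask; extend it for your own payloads.
SENSITIVE_KEYS = {"authorization", "password", "token", "apikey"}

def sanitise(payload: dict) -> dict:
    """Mask top-level values whose key looks sensitive (shallow, not recursive)."""
    return {
        key: "***" if key.lower() in SENSITIVE_KEYS else value
        for key, value in payload.items()
    }

def build_log_record(flow_name, run_id, stage, message, context=None, now=None):
    """Build one log record with the same fields at every stage of a flow."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "timestamp": timestamp,
        "flowName": flow_name,
        "runId": run_id,
        "stage": stage,  # e.g. "start", "checkpoint", "external_call", "exception"
        "message": message,
        "context": sanitise(context or {}),
    }

record = build_log_record(
    "invoice-sync", "run-001", "external_call", "POST /invoices",
    {"Authorization": "Bearer abc", "status": 200},
)
print(json.dumps(record, indent=2))
```

Keeping the run ID on every record is what lets you join log rows back to a specific entry in Run History.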

Try-Catch patterns. Configure scope actions:

Try scope — main logic.
Catch scope — error handler; runs if Try fails.
Finally scope — runs always.

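Configure run-after settings to implement try-catch-finally semantics: set Catch to run after Try has failed or has timed out, and set Finally to run after Catch whatever its outcome (is successful, has failed, is skipped, has timed out). Catch logs the error and notifies; Finally cleans up.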
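The run-after wiring can be hard to picture from the designer alone. The following Python sketch only simulates which scopes would run for a given Try outcome; it is a model for reasoning, not flow code. The status strings follow the values used in the underlying workflow definition, and the `RUN_AFTER` table is one illustrative configuration.

```python
# Status values as they appear in a flow's underlying workflow definition.
SUCCEEDED, FAILED, SKIPPED, TIMED_OUT = "Succeeded", "Failed", "Skipped", "TimedOut"

# For each scope: (predecessor, predecessor statuses that allow it to run).
RUN_AFTER = {
    "Catch": ("Try", {FAILED, TIMED_OUT}),
    "Finally": ("Catch", {SUCCEEDED, FAILED, SKIPPED, TIMED_OUT}),
}

def simulate(try_status: str, catch_status: str = SUCCEEDED) -> dict:
    """Return each scope's status given how Try ended (and Catch, if it runs)."""
    statuses = {"Try": try_status}
    for scope in ("Catch", "Finally"):
        predecessor, allowed = RUN_AFTER[scope]
        if statuses[predecessor] in allowed:
            # A scope that runs is assumed to succeed, except Catch,
            # whose outcome the caller can override.
            statuses[scope] = catch_status if scope == "Catch" else SUCCEEDED
        else:
            statuses[scope] = SKIPPED
    return statuses

print(simulate(SUCCEEDED))  # Catch is skipped, Finally still runs
print(simulate(FAILED))     # Catch runs, then Finally
```

Note that Finally must list "is skipped" among its run-after conditions; otherwise it would not run on the happy path, where Catch is skipped.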

Idempotency for retry.

Flows may retry automatically or manually.
Each step should be idempotent — same input → same outcome whether run once or many.
Without idempotency, retries cause duplicates.
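One common way to make a step idempotent is to derive a key from its input and skip the side effect when that key has already been processed. The sketch below shows the idea in Python with an in-memory store; `IdempotentStep` and `create_invoice` are hypothetical names, and in a real flow the lookup would be against a durable store (for example a Dataverse table with an alternate key).

```python
import hashlib
import json

class IdempotentStep:
    """Run a side-effecting action at most once per distinct input."""

    def __init__(self, action):
        self._action = action
        self._results = {}  # In-memory for the sketch; use a durable store in practice.

    @staticmethod
    def key_for(payload: dict) -> str:
        # Sorting keys makes the hash independent of field order.
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def run(self, payload: dict):
        key = self.key_for(payload)
        if key not in self._results:
            # The result is stored only if the action completes without raising,
            # so a failed attempt can be retried.
            self._results[key] = self._action(payload)
        return self._results[key]

created = []

def create_invoice(payload):
    created.append(payload["invoiceId"])
    return f"created {payload['invoiceId']}"

step = IdempotentStep(create_invoice)
first = step.run({"invoiceId": "INV-1", "amount": 100})
retry = step.run({"amount": 100, "invoiceId": "INV-1"})  # same input, different order
print(first, retry, created)
```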

Failure categories.

Transient — network blip, throttling, temporary downstream.
Persistent — schema mismatch, permission issue, logic bug.
Data-specific — bad data triggers different path.

Different categories need different responses.

Triage workflow.

Failure alert fires.
Operator reviews flow run.
Categorise the failure.
Transient — retry; maybe nothing else needed.
Persistent — fix code; rerun.
Data-specific — quarantine bad data; fix; rerun.

Without triage discipline, all failures get treated equally.
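The categorisation step can be partly automated. The sketch below maps an HTTP status code from a failed action to one of the three categories above and to the response from the triage list. The status-code groupings are a heuristic assumed for this example; real connectors vary, so treat the mapping as a starting point to tune.

```python
# Heuristic groupings (assumptions for this sketch, not a connector specification).
TRANSIENT_STATUS = {408, 429, 502, 503, 504}   # timeouts, throttling, downstream outages
PERSISTENT_STATUS = {401, 403, 404}            # auth, permission, missing resource
DATA_STATUS = {400, 422}                       # request rejected because of its content

ACTIONS = {
    "transient": "retry",
    "persistent": "fix the flow, then rerun",
    "data-specific": "quarantine the record, fix the data, then rerun",
    "unknown": "review manually",
}

def categorise(status_code: int) -> str:
    """Map a failed action's HTTP status code to a failure category."""
    if status_code in TRANSIENT_STATUS:
        return "transient"
    if status_code in PERSISTENT_STATUS:
        return "persistent"
    if status_code in DATA_STATUS:
        return "data-specific"
    return "unknown"

def triage(status_code: int) -> tuple:
    """Return (category, recommended response) for a failed action."""
    category = categorise(status_code)
    return category, ACTIONS[category]

for code in (429, 403, 422, 418):
    print(code, triage(code))
```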

Alert fatigue. A common problem:

Too many alerts; team ignores them.
Each alert costs attention.
Important alerts buried.

Mitigate:

Group similar alerts (one alert for 100 same-error failures).
Severity-based routing (critical to phone, warning to inbox).
Auto-resolution for known transient failures.
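Grouping and severity routing can be sketched in a few lines. In this Python example, failures that share a flow and error message collapse into one alert with a count, and the alert is routed by its highest severity. The record fields and the `ROUTES` table are hypothetical.

```python
from collections import Counter

# Hypothetical routing table mirroring "critical to phone, warning to inbox".
ROUTES = {"critical": "phone", "warning": "inbox"}

def group_alerts(failures):
    """Collapse failures sharing (flow, error) into one alert with a count."""
    counts = Counter((f["flow"], f["error"]) for f in failures)
    severity = {}
    for f in failures:
        key = (f["flow"], f["error"])
        # The highest severity seen for a group wins.
        if f["severity"] == "critical" or key not in severity:
            severity[key] = f["severity"]
    return [
        {"flow": flow, "error": error, "count": count,
         "route": ROUTES[severity[(flow, error)]]}
        for (flow, error), count in counts.items()
    ]

failures = (
    [{"flow": "invoice-sync", "error": "429 Too Many Requests", "severity": "warning"}] * 100
    + [{"flow": "payments", "error": "403 Forbidden", "severity": "critical"}]
)
alerts = group_alerts(failures)
for alert in alerts:
    print(alert)
```

Here 101 failures produce two alerts, which is the difference between a readable channel and a muted one.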

Dashboards. Visualise flow health:

Failure rate over time.
Top failing flows.
Failures by error type.
Runtime trends.

Power BI or Application Insights dashboards work; refresh regularly.
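Whatever tool renders the dashboard, the underlying metrics are simple aggregations over run records. The following Python sketch computes two of them, failure rate per day and top failing flows, from a list of exported runs. The record shape (`flow`, `status`, `started` as an ISO 8601 string) is an assumption for the example.

```python
from collections import Counter

def failure_rate_by_day(runs):
    """Return {day: failed / total} using the date part of each run's start time."""
    totals, failed = Counter(), Counter()
    for run in runs:
        day = run["started"][:10]  # "YYYY-MM-DD" prefix of an ISO 8601 timestamp
        totals[day] += 1
        if run["status"] == "Failed":
            failed[day] += 1
    return {day: failed[day] / totals[day] for day in sorted(totals)}

def top_failing_flows(runs, limit=3):
    """Return the flows with the most failed runs as (flow, count) pairs."""
    counts = Counter(run["flow"] for run in runs if run["status"] == "Failed")
    return counts.most_common(limit)

runs = [
    {"flow": "a", "status": "Failed", "started": "2026-10-01T08:00:00Z"},
    {"flow": "a", "status": "Succeeded", "started": "2026-10-01T09:00:00Z"},
    {"flow": "b", "status": "Failed", "started": "2026-10-02T08:00:00Z"},
    {"flow": "a", "status": "Failed", "started": "2026-10-02T09:00:00Z"},
]
print(failure_rate_by_day(runs))
print(top_failing_flows(runs))
```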

On-call rotation. For business-critical flows:

Defined on-call schedule.
Pager-style alerts route to current on-call.
Escalation if there is no response within a defined number of minutes.
Incident retrospective post-resolution.

For most teams, a lighter touch suffices; for SLA-driven services, on-call is real ops.

Flow run retention.

Default 28-day retention.
Older runs purged.
For compliance, archive externally before purge.

If you need run history for a year, export periodically.
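A periodic export job needs to know which runs are close to being purged. The sketch below selects runs that are within a safety margin of the 28-day retention window and not yet archived. The margin, the `archived` flag, and the record shape are assumptions for this example; how the runs are listed and where they are written is left out.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 28       # default retention stated above
SAFETY_MARGIN_DAYS = 7    # hypothetical: archive a week before the purge

def runs_due_for_archive(runs, now):
    """Return unarchived runs old enough to fall inside the safety margin."""
    cutoff = now - timedelta(days=RETENTION_DAYS - SAFETY_MARGIN_DAYS)
    return [
        run for run in runs
        if run["started"] <= cutoff and not run.get("archived")
    ]

now = datetime(2026, 10, 29, tzinfo=timezone.utc)
runs = [
    {"id": "r1", "started": datetime(2026, 10, 1, tzinfo=timezone.utc)},
    {"id": "r2", "started": datetime(2026, 10, 20, tzinfo=timezone.utc)},
    {"id": "r3", "started": datetime(2026, 10, 5, tzinfo=timezone.utc), "archived": True},
]
due = runs_due_for_archive(runs, now)
print([run["id"] for run in due])
```

Running the export more often than the safety margin leaves room for a missed run of the job without losing history.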

Common flow failures.

Throttling from connectors — rate limit hit.
Connection expired — auth token invalid.
Schema change in source data — flow expects old format.
Concurrency / race conditions — two flows on same record.
Conditional logic gap — unexpected branch.

Each has a typical signature; learn to recognise it.

Common pitfalls.

No failure handler. Flow fails silently; data inconsistent.
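Catch eats all errors. Errors caught, logged, swallowed; flow appears successful. End the Catch path with a Terminate action set to Failed when the run should still report as failed.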
No alerting. Failures pile up unseen.
Per-flow monitoring only. Cross-flow patterns invisible.
Run history overwhelm. 1000 failures; no prioritisation.
Slow flows. Successful but slow; user complaints; no perf monitoring.

Best practices.

Production flow has failure handler. Always.
Failure handler captures context. Log enough to diagnose.
Alerts go somewhere actionable — Teams channel, ops queue.
Periodic flow audit. Which flows fail often? Why? Fix.
Document failure scenarios per flow.

Performance monitoring.

Flow run duration tracked over time.
Alert if duration exceeds baseline by 2x.
Optimise slow steps.

Slow flows degrade user experience and risk timeouts.
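The "2x baseline" rule can be sketched as a comparison against the median of recent run durations. Using the median as the baseline and requiring a minimum number of samples are choices made for this example; the text above does not prescribe how the baseline is computed.

```python
from statistics import median

def is_slow(duration_s, history_s, factor=2.0, min_samples=5):
    """Return True when a run took more than `factor` times the median of recent runs."""
    if len(history_s) < min_samples:
        return False  # Not enough history to form a trustworthy baseline.
    return duration_s > factor * median(history_s)

history = [30, 32, 28, 31, 29]  # recent run durations in seconds
print(is_slow(61, history))
print(is_slow(45, history))
```

The median is less sensitive than the mean to a single very slow run, so one outlier in the history does not silently raise the alert threshold.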

Strategic positioning. Power Automate flows in production need operational discipline equal to other production systems. The monitoring tooling has matured; the discipline to use it varies. For mission-critical flows (payments, customer-facing automations, compliance), invest in App Insights integration, structured logging, and on-call. For internal automations, a lighter touch suffices, but some monitoring is still better than none. The cost of ignoring production flow health: cascading failures, eroded trust, and data corruption that is discovered late. The cost of paying attention: ongoing operational investment, in exchange for predictable, reliable automation.
