Async jobs in Dataverse
How Dataverse runs background work — system jobs, async plug-ins, workflow runs, and how to monitor, troubleshoot, and prevent the async backlog from getting out of hand.
Many things in Dataverse happen asynchronously — background workflows, async plug-ins, scheduled jobs, system events. Most of them run through the async service under the hood. When this service slows down or backs up, the work that depends on it lags, and users notice the delayed side effects. Understanding async behaviour is essential for any production Dataverse deployment, whatever its scale.
What runs async.
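- Async plug-ins: steps registered on the PostOperation stage (40) in asynchronous execution mode.
- Background (asynchronous) classic workflows. Real-time workflows run synchronously.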
- Bulk operations — bulk delete, bulk import.
- System events — bulk duplicate detection jobs, rollup column recalculation. Calculated columns are computed when the row is retrieved, not by a system job.
- Power Automate flows (for many trigger types).
- Custom async actions.
The system job. Each async operation is a row in the System Job (asyncoperation) table:
- Owner — who initiated.
- Operation Type — what kind.
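- Status (statecode) — Ready, Suspended, Locked, Completed.
- Status Reason (statuscode) — finer detail: Waiting For Resources, Waiting, In Progress, Pausing, Canceling, Succeeded, Failed, Canceled.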
- Start Time, End Time, Duration.
- Error Code, Error Message if failed.
- Retry Count.
The table is queryable and reportable.
Monitoring.
- System Jobs page in admin centre — filterable view.
- Advanced Find — query AsyncOperations directly.
- Power BI / Fabric — pull async data for trend reporting.
Mature deployments have a dashboard showing job counts, failure rates, duration percentiles.
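A dashboard like that needs only a few aggregates. The sketch below assumes job rows have already been pulled from the System Job table (for example through the Web API or a Fabric export) into simple records, with duration precomputed from the startedon and completedon columns. The `Job` type and `summarize` function are hypothetical; the statuscode values 30, 31 and 32 are Succeeded, Failed and Canceled.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import quantiles
from typing import Optional

# Status Reason (statuscode) values for completed System Jobs.
SUCCEEDED, FAILED, CANCELED = 30, 31, 32


@dataclass
class Job:
    name: str
    statuscode: int
    duration_s: Optional[float]  # completedon - startedon; None while unfinished


def summarize(jobs: list[Job]) -> dict:
    """Job counts by status reason, failure rate, and p50/p95 duration."""
    counts = Counter(j.statuscode for j in jobs)
    completed = counts[SUCCEEDED] + counts[FAILED] + counts[CANCELED]
    failure_rate = counts[FAILED] / completed if completed else 0.0

    durations = sorted(j.duration_s for j in jobs if j.duration_s is not None)
    if len(durations) >= 2:
        # 99 cut points: index 49 is the 50th percentile, index 94 the 95th.
        cuts = quantiles(durations, n=100, method="inclusive")
        p50, p95 = cuts[49], cuts[94]
    elif durations:
        p50 = p95 = durations[0]
    else:
        p50 = p95 = None

    return {
        "total": len(jobs),
        "by_status": dict(counts),
        "failure_rate": failure_rate,
        "p50_s": p50,
        "p95_s": p95,
    }
```

Failure rate is computed over completed jobs only, so a deep queue of waiting jobs does not dilute it.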
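Job statuses (Status Reason, with its Status in brackets).
- Waiting For Resources (Ready) — queued; not yet picked up.
- Waiting (Suspended) — paused, postponed, or waiting on a condition or retry.
- In Progress (Locked) — running.
- Pausing / Canceling (Locked) — a manual pause or cancel is being applied.
- Succeeded (Completed).
- Failed (Completed).
- Canceled (Completed).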
Stuck "In Progress" jobs older than expected indicate stalled execution.
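What counts as "older than expected" depends on the job. A minimal check, assuming job rows are dicts carrying the table's name, statuscode and startedon columns (startedon already parsed to a datetime), flags In Progress jobs (statuscode 20) that have run longer than a per-job limit. The function and the limits are hypothetical; derive real limits from your own baseline durations.

```python
from datetime import datetime, timedelta
from typing import Optional

IN_PROGRESS = 20  # Status Reason "In Progress" (Status "Locked")


def find_stuck(jobs: list[dict], now: datetime, default_limit: timedelta,
               limits_by_name: Optional[dict[str, timedelta]] = None
               ) -> list[tuple[str, timedelta]]:
    """Return (job name, age) for In Progress jobs running past their limit.

    `now` and each job's startedon must be comparable (both UTC-aware or
    both naive UTC).
    """
    limits_by_name = limits_by_name or {}
    stuck = []
    for job in jobs:
        if job["statuscode"] != IN_PROGRESS:
            continue
        limit = limits_by_name.get(job["name"], default_limit)
        age = now - job["startedon"]
        if age > limit:
            stuck.append((job["name"], age))
    return stuck
```

Per-name overrides let a long-running nightly job coexist with a tight default for short plug-in steps.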
Job queue management.
- Capacity is shared across the environment.
- Async jobs prioritised by type and retry rules.
- Background work doesn't block user-facing operations directly, although a heavy async load still competes with them for the same environment resources.
Retry behaviour.
- Transient errors — retried automatically with backoff.
- Permanent errors — marked Failed; not retried.
- Max retries — configurable; defaults vary by job type.
Bulk operations have their own retry semantics.
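The transient/permanent split can be modelled in a few lines. This is a conceptual sketch of the behaviour described, not the async service's implementation: the exception classes, retry limit and delays are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, throttling)."""


class PermanentError(Exception):
    """Hypothetical marker for errors that fail the same way every time."""


def run_with_retry(job: Callable[[], T], max_retries: int = 3,
                   base_delay_s: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Run job, retrying transient failures with exponential backoff."""
    attempt = 0
    while True:
        try:
            return job()
        except PermanentError:
            raise  # no retry: the job ends as Failed
        except TransientError:
            if attempt >= max_retries:
                raise  # retries exhausted: the job ends as Failed
            sleep(base_delay_s * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
            attempt += 1
```

The model also shows why misclassifying a permanent error as transient hurts: every retry burns queue capacity and ends in the same failure.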
Common async backlog causes.
- Heavy bulk import — fills the queue with insert / update jobs.
- Misbehaving plug-in — failing repeatedly, retrying.
- Workflow with infinite loop — a workflow creates or updates a record, which triggers the same or another workflow, and so on.
- External system slow — async plug-in calls a slow external API; jobs back up waiting on it.
- Plug-in throwing on every record — visible bug; high failure rate.
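Most of these causes show up as a single job name dominating either the queue or the failure list. A minimal sketch, assuming job rows are dicts with the table's name and statuscode columns, counts names among queued jobs (Waiting For Resources = 0, Waiting = 10) and among failed jobs (31). The function name is hypothetical.

```python
from collections import Counter

# Status Reason (statuscode) values.
WAITING_FOR_RESOURCES, WAITING, FAILED = 0, 10, 31


def top_offenders(jobs: list[dict], n: int = 5):
    """Most common job names among queued jobs and among failed jobs."""
    queued = Counter(j["name"] for j in jobs
                     if j["statuscode"] in (WAITING_FOR_RESOURCES, WAITING))
    failed = Counter(j["name"] for j in jobs if j["statuscode"] == FAILED)
    return queued.most_common(n), failed.most_common(n)
```

One name at the top of the queued list suggests a bulk import or a slow external dependency; one name at the top of the failed list suggests a plug-in throwing on every record.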
Backlog impact. When the async queue is deep:
- New async work waits.
- Some operations appear delayed to users.
- Investigation gets harder (older jobs harder to find).
Cleaning up.
- Cancel specific jobs.
- Bulk delete completed jobs — older than retention period.
- Async cleanup job — schedule periodic cleanup; configurable retention.
Without cleanup, the async table balloons over time; performance degrades.
Async cleanup job. A system job that purges old async records:
- Configurable retention days.
- Runs periodically.
- Keeps async table size manageable.
In high-volume environments, set retention to 7 to 30 days so queries against the async table stay fast.
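Before changing retention, it helps to see which completed jobs would fall outside the window. A small hypothetical helper builds the OData filter for completed jobs (statecode 3) that finished more than N days ago, using the completedon column; the same condition can be dropped into a Web API $filter to review those rows.

```python
from datetime import datetime, timedelta


def completed_jobs_older_than(days: int, now_utc: datetime) -> str:
    """OData filter for completed System Jobs that finished before the cutoff.

    now_utc must be a UTC time; the cutoff is written as an OData
    DateTimeOffset literal with a trailing Z.
    """
    cutoff = now_utc - timedelta(days=days)
    return f"statecode eq 3 and completedon lt {cutoff:%Y-%m-%dT%H:%M:%SZ}"
```

For example, a 30-day window evaluated at midnight on 31 May 2024 selects jobs completed before 2024-05-01T00:00:00Z.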
Querying async.
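GET /api/data/v9.2/asyncoperations?$select=name,operationtype,errorcode,message,createdon&$filter=statecode eq 3 and statuscode eq 31&$orderby=createdon desc&$top=50
statecode 3 = Completed, statuscode 31 = Failed. Combined, and ordered newest first, they return the most recent failures.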
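The same query can be issued from a script. A minimal sketch using only the standard library builds an authenticated request for failed jobs created since a given UTC time; the environment URL is a hypothetical placeholder, and acquiring the bearer token (for example with MSAL) is assumed and out of scope.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def failed_jobs_request(org_url: str, token: str, since_utc: str,
                        top: int = 50) -> Request:
    """Build a Web API request for recent failed System Jobs."""
    params = {
        "$select": "name,operationtype,errorcode,message,createdon",
        # statecode 3 = Completed, statuscode 31 = Failed
        "$filter": f"statecode eq 3 and statuscode eq 31 and createdon ge {since_utc}",
        "$orderby": "createdon desc",
        "$top": str(top),
    }
    # Keep $, commas and colons literal; encode spaces as %20.
    query = urlencode(params, safe="$,:", quote_via=quote)
    url = f"{org_url.rstrip('/')}/api/data/v9.2/asyncoperations?{query}"
    return Request(url, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
        "OData-MaxVersion": "4.0",
        "OData-Version": "4.0",
    })
```

Sending it with `urllib.request.urlopen(req)` returns JSON whose `value` array holds the matching jobs.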
Failure analysis. Failed jobs:
- Error message — start here.
- Stack trace in plug-in failures.
- Input data — what was the job processing.
- Time pattern — failures clustered in time usually point to a system-level issue rather than a single bad record.
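Clusters are easy to surface by bucketing failure times. A minimal sketch, assuming the input is the completedon time of each failed job, counts failures per hour and returns the hours at or above a threshold; the hourly bucket and the threshold are hypothetical choices to tune against your normal failure rate.

```python
from collections import Counter
from datetime import datetime


def failure_spikes(failed_at: list[datetime],
                   threshold: int) -> list[tuple[datetime, int]]:
    """Hourly buckets whose failure count reaches threshold, oldest first."""
    per_hour = Counter(t.replace(minute=0, second=0, microsecond=0)
                       for t in failed_at)
    return sorted((hour, n) for hour, n in per_hour.items() if n >= threshold)
```

A spike across many different job names in the same hour points at the platform or a shared dependency; a spike in one job name points back at that job.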
Common failure causes:
- Plug-in exception (most common).
- External system timeout.
- Concurrency / lock conflict.
- Resource limits.
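A rough first-pass triage can sort failed jobs into these buckets by their error message. The keyword lists below are hypothetical heuristics, not documented Dataverse error texts; tune them against the messages your environment actually produces, and treat anything unclassified as a candidate plug-in exception to read in full.

```python
# Hypothetical keyword heuristics; tune them against real error messages.
RULES: list[tuple[str, tuple[str, ...]]] = [
    ("External system timeout", ("timeout", "timed out")),
    ("Concurrency / lock conflict", ("deadlock", "lock conflict")),
    ("Resource limits", ("limit exceeded", "throttl")),
]


def triage(message: str) -> str:
    """Map a failed job's error message to a rough failure category."""
    text = message.lower()
    for category, keywords in RULES:
        if any(k in text for k in keywords):
            return category
    return "Unclassified (check the plug-in trace / stack trace)"
```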
Bulk delete jobs. Separate but related:
- Bulk delete is an async operation.
- Can run for hours on large data sets.
- Configured with query, recurrence, time window.
For data archival or compliance, bulk delete is the right mechanism.
Plug-in profiling. When async plug-ins are slow:
- Trace log inside plug-in.
- Application Insights integration (where configured).
- Identify hot paths.
Performance issues compound — slow plug-in × many records = significant time.
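A worked example with hypothetical figures: a plug-in step that takes half a second per record, run against 200,000 records one after another, costs

```latex
$$
T = t_{\text{record}} \times N = 0.5\,\text{s} \times 200{,}000 = 100{,}000\,\text{s} \approx 27.8\,\text{h}
$$
```

Cutting the per-record time to 0.1 s brings the same run down to 20,000 s, roughly 5.6 h. Parallel processing shortens wall-clock time but not the total work the environment has to absorb.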
Common pitfalls.
- No backlog monitoring. First sign is user complaints; reactive.
- Cleanup disabled. Async table balloons; queries slow.
- No failure investigation. Failed jobs pile up; problems compound.
- Sync work where async is appropriate — user-facing slowness.
- Async work where sync is needed — race conditions when subsequent steps assume completion.
- Plug-ins that fail identically on every retry — the same failure repeats; backlog grows.
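When a later step genuinely must wait for async work and the System Job's id is known, polling its status is one option. A minimal sketch: `fetch_job` is a hypothetical callable, for example one wrapping GET /api/data/v9.2/asyncoperations(<id>)?$select=statecode,statuscode, and the timeout and poll interval are illustrative.

```python
import time
from typing import Callable

COMPLETED = 3   # Status (statecode) Completed
SUCCEEDED = 30  # Status Reason (statuscode) Succeeded


def wait_for_job(fetch_job: Callable[[str], dict], job_id: str,
                 timeout_s: float = 300.0, poll_s: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll a System Job until it completes; True only if it succeeded."""
    deadline = clock() + timeout_s
    while True:
        job = fetch_job(job_id)  # must return at least statecode and statuscode
        if job["statecode"] == COMPLETED:
            return job["statuscode"] == SUCCEEDED
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} not completed within {timeout_s}s")
        sleep(poll_s)
```

Polling adds load and latency of its own; where the design allows it, having the async work itself trigger the dependent step avoids the race without a wait loop.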
Operational rhythm.
- Daily — failure count check.
- Weekly — backlog depth review.
- Monthly — performance trend analysis.
- Per incident — root cause of significant failures.
Strategic positioning. Async jobs are the invisible backbone of Dataverse extensibility. They work reliably most of the time; when they don't, the symptoms can be subtle (operations seem slow, some side effects don't happen) and the diagnosis requires understanding the system. Investing in monitoring, cleanup, and failure analysis early prevents accumulated debt. Mature deployments treat async health as a first-class operational metric, not an afterthought.
Related guides
- Bulk delete jobs in Dataverse: how Dataverse's bulk delete handles mass record cleanup — scheduling, filters, retention policies, and the operational discipline around storage management.
- Business rules in Dataverse: how business rules let you add field-level logic to forms without code — set value, lock field, show error, recommendation, and the limits of the engine.
- Business units and teams in Dataverse — a deep dive: how business units, owner teams, access teams, and Microsoft 365 group teams compose the security model in Dataverse — what each is for, how they interact, and the common design mistakes.
- Calculated and rollup columns in Dataverse: how calculated columns and rollup columns work in Dataverse — what each does, the performance trade-offs, and when to use a formula column or a Power Automate flow instead.
- Cascading delete in Dataverse: how Dataverse relationships behave on delete — cascade, restrict, remove link, and the implications for data integrity and accidental data loss.