Observability for Dynamics 365 integrations

How to build observability into Dynamics 365 integrations — logs, metrics, traces, correlation IDs, and the patterns that make production integration health visible.

Updated 2026-11-13

A production integration that "works" by absence of complaints isn't truly working — it's just not failing visibly yet. Observability is the discipline of making system behaviour visible: what's happening, how well, with what latency, with what errors. For Dynamics 365 integrations spanning multiple systems, observability is essential for reliable operations.

The three pillars.

Logs — discrete events with context.
Metrics — aggregated quantitative data.
Traces — request flow across services.

Each gives different visibility; modern observability uses all three.

Logs in Dynamics 365 integrations.

Plug-in trace logs — for Dataverse plug-in execution.
Application Insights logs — from custom code.
Azure Function logs — Azure-side processing.
Service Bus / Event Grid logs — message broker activity.
Dataverse audit logs — business-level changes.

Each source covers a different part of the flow; centralising them in one store lets a single query follow a problem across sources.

Structured logging.

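// Assuming `log` is a Microsoft.Extensions.Logging ILogger: named placeholders
// become structured properties. The three values are assumed to be in scope.
log.LogInformation("Order processed: {OrderId} {CustomerId} {DurationMs}",
    orderId, customerId, durationMs);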

Structured logs enable filtering, aggregation, and correlation in tools such as Application Insights, Splunk, and ELK.

Metrics. Quantitative; useful patterns:

Throughput — messages per minute.
Latency — duration percentiles (P50, P95, P99).
Error rate — % failures.
Saturation — resource usage.

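The "USE" pattern (Utilization, Saturation, Errors) and the "RED" pattern (Rate, Errors, Duration) are common frameworks. USE is oriented towards resources such as queues and compute; RED is oriented towards request-driven services.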
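The throughput, latency and error-rate figures above can be derived from raw samples. The following is a minimal sketch, assuming durations are collected in milliseconds over a fixed window; the function names and the nearest-rank percentile method are illustrative choices, not something a particular monitoring tool prescribes.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarise(durations_ms, failures, window_minutes):
    """Summarise one window of processed messages (hypothetical helper)."""
    total = len(durations_ms)
    return {
        "throughput_per_min": total / window_minutes,
        "p50_ms": percentile(durations_ms, 50),
        "p95_ms": percentile(durations_ms, 95),
        "p99_ms": percentile(durations_ms, 99),
        "error_rate_pct": 100 * failures / total,
    }
```

Percentiles are reported rather than the mean because a small number of very slow messages can hide behind a healthy-looking average.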

Distributed tracing. Following a request across services:

Correlation ID generated at entry point.
Propagated in HTTP headers, messages, plug-ins.
Logged at each hop.
Visualised as trace graph.

Enables "follow this request" debugging across many services.
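The generate, propagate and log steps can be sketched in a few lines. This is an illustration only: the header name `x-correlation-id` and the helper names are assumptions, and a real deployment may prefer whatever its tracing library already propagates.

```python
import uuid

# Assumed header name for illustration; not a Dataverse or Azure standard.
CORRELATION_HEADER = "x-correlation-id"


def ensure_correlation_id(incoming_headers):
    """Reuse the caller's correlation ID if present, otherwise mint one at the entry point."""
    for name, value in incoming_headers.items():
        if name.lower() == CORRELATION_HEADER and value:
            return value
    return str(uuid.uuid4())


def outgoing_headers(correlation_id, extra=None):
    """Build headers for the next hop, carrying the same correlation ID forward."""
    headers = dict(extra or {})
    headers[CORRELATION_HEADER] = correlation_id
    return headers


def log_hop(correlation_id, service, message):
    """Shape of a per-hop log entry; every hop records the same ID."""
    return {"correlationId": correlation_id, "service": service, "message": message}
```

The important property is that only the entry point ever creates an ID; every later hop reuses the one it received.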

OpenTelemetry. Open standard:

Vendor-neutral instrumentation.
Supports logs, metrics, traces.
Multiple language libraries.
Export to Application Insights, Datadog, Jaeger, etc.

Current recommendation: instrument with OpenTelemetry, so that changing the backend later does not mean re-instrumenting the code.

Application Insights for Dynamics.

Dataverse integrates with App Insights (configurable).
Plug-in traces flow to App Insights.
Custom code emits its own telemetry.
Centralised dashboard.

For Microsoft-aligned organisations, App Insights is the natural starting point.

Per-integration observability. Each integration should provide:

Heartbeat — is it running?
Throughput — messages processed.
Latency distribution.
Error rate.
Dependent system health.

Each integration's health should be visible separately.
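A heartbeat check is the simplest of these signals to sketch. The example below assumes each integration records a timestamp whenever it completes a cycle; the integration names and the five-minute tolerance in the usage are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_integrations(last_heartbeats, now, max_silence):
    """Return the names of integrations whose last heartbeat is older than max_silence."""
    return sorted(
        name for name, seen in last_heartbeats.items() if now - seen > max_silence
    )


# Usage with made-up integration names.
now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
beats = {
    "order-sync": now - timedelta(seconds=30),
    "invoice-export": now - timedelta(minutes=10),
}
stale = stale_integrations(beats, now, timedelta(minutes=5))
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the check easy to test.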

End-to-end visibility. Cross-integration:

Business transaction tracking — order received → processed → fulfilled.
SLA monitoring — was order processed within commitment?
Funnel analysis — where in flow do issues occur?

Customer-facing SLAs depend on end-to-end visibility.
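Funnel analysis and SLA monitoring both reduce to simple questions over per-stage timestamps. The sketch below assumes each business transaction is stored as a mapping from stage name to the time that stage completed; the stage names follow the order example above, and the data shape is an assumption.

```python
from datetime import datetime, timedelta

# Stage names taken from the order example; adjust to the real flow.
STAGES = ["received", "processed", "fulfilled"]


def funnel(transactions):
    """Count how many transactions reached each stage, in flow order."""
    return [
        (stage, sum(1 for stages in transactions.values() if stage in stages))
        for stage in STAGES
    ]


def sla_breaches(transactions, commitment, now):
    """Return IDs fulfilled later than `commitment` after receipt, or still open past it."""
    breached = []
    for order_id, stages in transactions.items():
        received = stages.get("received")
        if received is None:
            continue  # cannot judge a transaction with no recorded start
        finished = stages.get("fulfilled", now)
        if finished - received > commitment:
            breached.append(order_id)
    return sorted(breached)
```

A drop between two adjacent funnel counts points at the stage where transactions are getting stuck.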

Alerting.

Error rate spike — alert.
Latency degradation — alert.
Throughput drop — alert (something broken upstream).
Heartbeat missing — alert.

Each alert routes to on-call or operations.
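The four alert conditions can be expressed as comparisons between a current snapshot and a baseline. This is a sketch under stated assumptions: the multipliers (2x error rate, 1.5x P95 latency, half the throughput) and the 120-second heartbeat limit are placeholders to be tuned, not recommended values.

```python
def evaluate(current, baseline, max_heartbeat_age_s=120):
    """Return the names of alert conditions that the current snapshot triggers."""
    alerts = []
    # Floor of 1% so a near-zero baseline does not alert on a single failure.
    if current["error_rate_pct"] > max(1.0, 2 * baseline["error_rate_pct"]):
        alerts.append("error_rate_spike")
    if current["p95_ms"] > 1.5 * baseline["p95_ms"]:
        alerts.append("latency_degradation")
    if current["throughput_per_min"] < 0.5 * baseline["throughput_per_min"]:
        alerts.append("throughput_drop")
    if current["heartbeat_age_s"] > max_heartbeat_age_s:
        alerts.append("heartbeat_missing")
    return alerts
```

Comparing against a baseline rather than fixed numbers lets the same rules serve integrations with very different normal volumes.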

Alert quality. Critical:

Actionable — recipient knows what to do.
Not too noisy — don't fatigue responders.
Severity-appropriate — critical via page; informational via email.
Correlated — one alert per incident, not 100.

Bad alerting is worse than no alerting: people learn to ignore it, including the alerts that matter.
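Correlation, meaning one notification per incident rather than one per failing message, can be approximated by grouping alerts on a key within a time window. The sketch below assumes alerts are dictionaries with `integration`, `kind` and a numeric `ts` in seconds; those field names and the grouping key are illustrative.

```python
def group_alerts(alerts, window_s):
    """Collapse alerts that share (integration, kind) and fall within window_s of the group's first alert."""
    incidents = []
    open_by_key = {}
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        key = (alert["integration"], alert["kind"])
        incident = open_by_key.get(key)
        if incident is not None and alert["ts"] - incident["first_ts"] <= window_s:
            incident["count"] += 1  # same incident: count it, do not notify again
        else:
            incident = {
                "integration": alert["integration"],
                "kind": alert["kind"],
                "first_ts": alert["ts"],
                "count": 1,
            }
            open_by_key[key] = incident
            incidents.append(incident)
    return incidents
```

Only the creation of a new incident should notify a responder; the running count is kept so the size of the problem is still visible.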

Dashboards.

Operations dashboard — current state at a glance.
Business dashboard — business-relevant metrics.
Per-integration drill-down.
Incident dashboards — for ongoing investigations.

Dashboards differ per audience.

Per-message observability. Trace a specific message:

Where did it start?
What systems handled it?
How long at each step?
Did it succeed?

For customer service ("what happened to my order?"), this matters.
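Given centralised logs that carry a correlation ID, the four questions above can be answered by one small query. The following sketch assumes each log event is a dictionary with `correlationId`, `service`, `status` and a numeric `ts` in seconds; time spent at a step is approximated as the gap until the next event.

```python
def message_timeline(events, correlation_id):
    """Reconstruct the path of one message from log events, or return None if it was never seen."""
    hops = sorted(
        (e for e in events if e["correlationId"] == correlation_id),
        key=lambda e: e["ts"],
    )
    if not hops:
        return None
    steps = [
        (current["service"], following["ts"] - current["ts"])
        for current, following in zip(hops, hops[1:])
    ]
    steps.append((hops[-1]["service"], None))  # no later event to measure against
    return {
        "started_at": hops[0]["service"],
        "steps": steps,
        "succeeded": all(e["status"] == "ok" for e in hops),
    }
```

The service names used to exercise such a function would come from whatever each hop writes into its own log entries.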

SLA / SLO tracking.

Service Level Objective — target performance.
Service Level Agreement — contractual.
Error budget — allowable failures.
Burn rate — how fast budget consumed.

Mature operations have SLOs informed by observability.
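Error budget and burn rate are simple arithmetic once an SLO target is fixed. A minimal sketch, assuming the SLO is expressed as a success ratio (for example 0.99) over a count of operations; the function names are illustrative.

```python
def error_budget(slo_target, total_operations):
    """Number of failures the SLO tolerates over the period."""
    return (1 - slo_target) * total_operations


def budget_remaining(slo_target, total_operations, failed_operations):
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    return 1 - failed_operations / error_budget(slo_target, total_operations)


def burn_rate(slo_target, window_total, window_failed):
    """Observed error rate divided by the error rate the SLO allows.

    1.0 means the budget is being spent exactly as fast as the SLO permits;
    higher values mean it will run out before the period ends.
    """
    return (window_failed / window_total) / (1 - slo_target)
```

For a 99% SLO over 10,000 operations the budget is about 100 failures; a window with 50 failures in 1,000 operations has a 5% error rate against an allowed 1%, a burn rate of about 5.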

Logging best practices.

Structured — JSON or similar.
Levels — Debug / Info / Warning / Error.
Sensitive data excluded — no PII, secrets.
Context-rich — IDs, timestamps, source.
Sampled for high-volume — full logging too expensive.
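Several of these practices can be combined in one small logging helper. The sketch below is illustrative only: the set of sensitive key names, the `[REDACTED]` marker and the rule of always keeping warnings and errors are assumptions, and redaction by key name will not catch personal data embedded in free-text messages.

```python
import json
import random

# Hypothetical list of field names to mask; extend to match the real payloads.
SENSITIVE_KEYS = {"email", "password", "token"}


def redact(fields):
    """Mask values whose key name is known to be sensitive."""
    return {
        key: ("[REDACTED]" if key.lower() in SENSITIVE_KEYS else value)
        for key, value in fields.items()
    }


def should_log(level, sample_rate, rng=random.random):
    """Always keep warnings and errors; keep only a sample of lower levels."""
    return level in ("Warning", "Error") or rng() < sample_rate


def format_entry(level, message, timestamp, source, **fields):
    """Render one structured, context-rich log entry as JSON."""
    entry = {"level": level, "message": message, "timestamp": timestamp, "source": source}
    entry.update(redact(fields))
    return json.dumps(entry, sort_keys=True)
```

Sampling only the lower levels keeps volume down without discarding the entries most likely to be needed during an incident.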

Common pitfalls.

No observability. Issues only known when users complain.
Verbose logging. Logs flood; signal lost.
No correlation. Can't trace requests across services.
Per-system silos. Each system has logs; no unified view.
Alerts ignored. Too noisy; alert fatigue.
Sensitive data in logs. Compliance violation.
Retention too short. Can't investigate older issues.

Cost considerations.

Log volume — App Insights / Datadog priced by ingestion.
High-cardinality metrics — expensive.
Long retention — expensive.

Balance: enough observability to operate, but not so much that the telemetry bill outweighs its value.

Privacy considerations.

Logs may contain personal data.
GDPR / privacy regulations apply.
Retention policies needed.
Subject access requests might include logs.

Design with privacy in mind.

Audit logs vs operational logs.

Audit — for compliance; long retention.
Operational — for diagnostics; shorter retention.

The needs differ, so the two are typically kept as separate streams.

Observability culture.

Engineers add observability as they write code.
Reviews include observability check.
Incidents analyse observability gaps.
Continuous improvement.

The culture determines whether observability is an investment or an afterthought.

Strategic positioning. Observability is the foundation of operational excellence. Without it, integration operations are reactive and chaotic; with it, problems are detected and resolved quickly.

For architects:

Design observability into integrations from the start.
Standardise on tooling.
Build dashboards and alerts.
Train operators.
Iterate based on incidents.

The investment is meaningful but compounds: every integration benefits from the foundation. The teams that take observability seriously have predictable operations; the teams that don't have surprise incidents. The difference is largely the observability discipline.
