Observability for Dynamics 365 integrations
How to build observability into Dynamics 365 integrations — logs, metrics, traces, correlation IDs, and the patterns that make production integration health visible.
A production integration that "works" by absence of complaints isn't truly working — it's just not failing visibly yet. Observability is the discipline of making system behaviour visible: what's happening, how well, with what latency, with what errors. For Dynamics 365 integrations spanning multiple systems, observability is essential for reliable operations.
The three pillars.
- Logs — discrete events with context.
- Metrics — aggregated quantitative data.
- Traces — request flow across services.
Each gives different visibility; modern observability uses all three.
Logs in Dynamics 365 integrations.
- Plug-in trace logs — for Dataverse plug-in execution.
- Application Insights logs — from custom code.
- Azure Function logs — Azure-side processing.
- Service Bus / Event Grid logs — message broker activity.
- Dataverse audit logs — business-level changes.
Each source covers part of the picture. Centralising them lets you answer questions that span systems.
Structured logging.
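// Each {Placeholder} in the message template becomes a named, queryable property.
log.LogInformation(
    "Order {OrderId} processed for customer {CustomerId} in {DurationMs} ms",
    orderId, customerId, durationMs);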
Structured logs can be filtered, aggregated, and correlated in tools such as Application Insights, Splunk, and ELK.
Metrics. Metrics are aggregated numbers over time. Useful ones include:
- Throughput — messages per minute.
- Latency — duration percentiles (P50, P95, P99).
- Error rate — % failures.
- Saturation — resource usage.
The USE method (Utilization, Saturation, Errors) suits resources such as compute and queues. The RED method (Rate, Errors, Duration) suits request-driven services such as APIs and message handlers.
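As a rough illustration of the RED method, the Python sketch below reduces one window of request samples to three figures: rate, error rate, and nearest-rank latency percentiles. The `Sample` shape and the five-minute window are hypothetical. In practice the metrics backend would usually compute percentiles from histograms rather than from raw samples.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One processed request or message (hypothetical shape)."""
    duration_ms: float
    ok: bool


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list, for integer p in 1..100."""
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def red_summary(samples, window_minutes):
    """Rate, Errors, Duration for one window of samples."""
    if not samples:
        raise ValueError("no samples in window")
    durations = sorted(s.duration_ms for s in samples)
    failures = sum(1 for s in samples if not s.ok)
    return {
        "rate_per_min": len(samples) / window_minutes,
        "error_rate": failures / len(samples),
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
    }


# Usage: 100 samples taking 100..199 ms, where every 20th one failed.
samples = [Sample(duration_ms=100 + i, ok=i % 20 != 0) for i in range(100)]
summary = red_summary(samples, window_minutes=5)
```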
Distributed tracing. Following a request across services:
- Correlation ID generated at entry point.
- Propagated in HTTP headers, messages, plug-ins.
- Logged at each hop.
- Visualised as trace graph.
Enables "follow this request" debugging across many services.
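The propagation steps above can be sketched with the W3C Trace Context `traceparent` format (`version-traceid-parentid-flags`), which Application Insights and OpenTelemetry both support. The sketch is simplified and skips some validity rules from the specification, such as rejecting all-zero IDs. The `handle` function and the service names are hypothetical stand-ins for an API, an Azure Function, and a plug-in. For a queued message, the same value would travel in a message property instead of an HTTP header.

```python
import re
import secrets

# traceparent: version "00", 32-hex trace ID, 16-hex parent (span) ID, 2-hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent():
    """Entry point: start a new trace with the 'sampled' flag set."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(incoming):
    """Next hop: keep the trace ID, mint a new parent ID for this hop."""
    match = TRACEPARENT_RE.match(incoming or "")
    if match is None:
        return new_traceparent()  # missing or malformed: start a fresh trace
    trace_id, _parent_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def trace_id_of(traceparent):
    return traceparent.split("-")[1]


def handle(service, headers, log):
    """Hypothetical hop: continue the trace, log with the trace ID, pass it on."""
    traceparent = child_traceparent(headers.get("traceparent"))
    log.append({"service": service, "trace_id": trace_id_of(traceparent)})
    return {"traceparent": traceparent}  # headers/properties for the next hop


# Usage: one request crossing three hops shares a single trace ID in every log entry.
log = []
headers = {"traceparent": new_traceparent()}
for service in ("web-api", "order-function", "dataverse-plugin"):
    headers = handle(service, headers, log)
```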
OpenTelemetry. Open standard:
- Vendor-neutral instrumentation.
- Supports logs, metrics, traces.
- Multiple language libraries.
- Export to Application Insights, Datadog, Jaeger, etc.
Recommendation for new work: instrument with OpenTelemetry. You can then change the telemetry backend without re-instrumenting code.
Application Insights for Dynamics.
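- Dataverse can export telemetry to App Insights once an administrator configures the export.
- Plug-in traces can then flow to App Insights.
- Custom code, such as Azure Functions and APIs, emits its own telemetry to the same resource.
- The result is one centralised place for dashboards and queries.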
For Microsoft-aligned organisations, App Insights is the natural starting point.
Per-integration observability. Each integration should provide:
- Heartbeat — is it running?
- Throughput — messages processed.
- Latency distribution.
- Error rate.
- Dependent system health.
Each integration's health should be visible separately, so that one failing integration isn't hidden inside an aggregate.
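A per-integration health check could combine a heartbeat with throughput and error rate. The sketch assumes each integration records its last heartbeat plus success and failure counts for the current window. The `IntegrationStats` shape, the five-minute heartbeat timeout, and the 5% error threshold are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class IntegrationStats:
    """Counters for one integration over the current window (hypothetical shape)."""
    name: str
    last_heartbeat: datetime
    succeeded: int
    failed: int


def health(stats, now, heartbeat_timeout=timedelta(minutes=5), max_error_rate=0.05):
    """Classify an integration as healthy, degraded, or down."""
    alive = now - stats.last_heartbeat <= heartbeat_timeout
    total = stats.succeeded + stats.failed
    error_rate = stats.failed / total if total else 0.0
    if not alive:
        status = "down"
    elif error_rate > max_error_rate:
        status = "degraded"
    else:
        status = "healthy"
    return {"integration": stats.name, "throughput": total,
            "error_rate": error_rate, "status": status}


# Usage: evaluate each integration on its own so one failure is not averaged away.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fleet = [
    IntegrationStats("erp-orders", now - timedelta(minutes=1), succeeded=990, failed=10),
    IntegrationStats("web-leads", now - timedelta(minutes=2), succeeded=90, failed=10),
    IntegrationStats("pim-products", now - timedelta(minutes=30), succeeded=0, failed=0),
]
report = {s.name: health(s, now)["status"] for s in fleet}
```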
End-to-end visibility. Cross-integration:
- Business transaction tracking — order received → processed → fulfilled.
- SLA monitoring — was order processed within commitment?
- Funnel analysis: where in the flow do issues occur?
Customer-facing SLAs can only be verified with end-to-end visibility.
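Business transaction tracking, SLA checks, and funnel analysis can all be derived from stage events keyed by a business ID. In this sketch the stage names come from the order example above. The event tuple shape and the 15-minute received-to-processed limit are assumptions for illustration.

```python
from datetime import datetime, timedelta

STAGES = ("received", "processed", "fulfilled")


def stage_times(events):
    """events: (order_id, stage, timestamp) tuples -> {order_id: {stage: timestamp}}."""
    times = {}
    for order_id, stage, ts in events:
        times.setdefault(order_id, {})[stage] = ts
    return times


def funnel(times):
    """How many orders reached each stage; a drop between stages shows where flow stalls."""
    return {stage: sum(1 for t in times.values() if stage in t) for stage in STAGES}


def sla_breaches(times, limit, now):
    """Orders whose received -> processed time exceeded the limit (open orders measured to now)."""
    breaches = []
    for order_id, t in times.items():
        if "received" not in t:
            continue
        processed_at = t.get("processed", now)
        if processed_at - t["received"] > limit:
            breaches.append(order_id)
    return sorted(breaches)


# Usage with three hypothetical orders.
start = datetime(2024, 1, 1, 9, 0)
events = [
    ("A", "received", start), ("A", "processed", start + timedelta(minutes=2)),
    ("A", "fulfilled", start + timedelta(hours=1)),
    ("B", "received", start), ("B", "processed", start + timedelta(minutes=20)),
    ("C", "received", start + timedelta(minutes=5)),
]
times = stage_times(events)
now = start + timedelta(minutes=30)
```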
Alerting.
- Error rate spike — alert.
- Latency degradation — alert.
- Throughput drop — alert (something broken upstream).
- Heartbeat missing — alert.
Each alert routes to on-call or operations.
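The four alert conditions above can be expressed as comparisons between the current window and a baseline. The thresholds below are placeholder assumptions that would need tuning per integration:

- twice the usual error rate;
- 1.5 times the usual P95 latency;
- half the usual throughput;
- a five-minute heartbeat limit.

The metric dictionaries follow a hypothetical shape.

```python
def evaluate_alerts(current, baseline, heartbeat_age_s, *,
                    error_factor=2.0, min_error_rate=0.01,
                    latency_factor=1.5, throughput_floor=0.5,
                    heartbeat_max_s=300):
    """Return the names of alert conditions that fire for one integration."""
    alerts = []
    # Error-rate spike: well above the usual rate, with a floor so tiny baselines don't page.
    if current["error_rate"] > max(error_factor * baseline["error_rate"], min_error_rate):
        alerts.append("error-rate spike")
    # Latency degradation: P95 noticeably worse than usual.
    if current["p95_ms"] > latency_factor * baseline["p95_ms"]:
        alerts.append("latency degradation")
    # Throughput drop: often means something upstream has stopped sending.
    if current["rate_per_min"] < throughput_floor * baseline["rate_per_min"]:
        alerts.append("throughput drop")
    # Heartbeat missing: the integration itself may not be running.
    if heartbeat_age_s > heartbeat_max_s:
        alerts.append("heartbeat missing")
    return alerts


# Usage: errors up fivefold and throughput down to a third of baseline.
baseline = {"error_rate": 0.01, "p95_ms": 800, "rate_per_min": 120}
current = {"error_rate": 0.05, "p95_ms": 900, "rate_per_min": 40}
fired = evaluate_alerts(current, baseline, heartbeat_age_s=30)
```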
Alert quality. Good alerts are:
- Actionable — recipient knows what to do.
- Not too noisy — don't fatigue responders.
- Severity-appropriate — critical via page; informational via email.
- Correlated — one alert per incident, not 100.
Bad alerting can be worse than no alerting: once responders learn to ignore alerts, they ignore the real ones too.
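One way to get "one alert per incident, not 100" is to fold repeated alerts into an open incident until a quiet period passes. Alerts are folded only when they share the same integration and condition. This is a simplified sketch: the ten-minute quiet period and the (timestamp, integration, kind) tuple shape are assumptions, and many alerting tools offer comparable grouping built in.

```python
from datetime import datetime, timedelta


def group_alerts(alerts, quiet_period=timedelta(minutes=10)):
    """Fold alerts into incidents.

    alerts: (timestamp, integration, kind) tuples. An alert joins the open incident
    for the same (integration, kind) if it arrives within quiet_period of that
    incident's latest alert; otherwise it opens a new incident.
    """
    incidents = []
    latest = {}  # (integration, kind) -> most recent incident for that key
    for ts, integration, kind in sorted(alerts):
        key = (integration, kind)
        incident = latest.get(key)
        if incident is not None and ts - incident["last"] <= quiet_period:
            incident["count"] += 1
            incident["last"] = ts
        else:
            incident = {"integration": integration, "kind": kind,
                        "first": ts, "last": ts, "count": 1}
            incidents.append(incident)
            latest[key] = incident
    return incidents


# Usage: an alert every 30 seconds for 50 minutes, then one more much later.
start = datetime(2024, 1, 1, 8, 0)
raw = [(start + timedelta(seconds=30 * i), "erp-orders", "error-rate spike") for i in range(100)]
raw.append((start + timedelta(hours=3), "erp-orders", "error-rate spike"))
incidents = group_alerts(raw)
```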
Dashboards.
- Operations dashboard — current state at a glance.
- Business dashboard — business-relevant metrics.
- Per-integration drill-down.
- Incident dashboards — for ongoing investigations.
Each audience needs its own dashboard.
Per-message observability. Trace a specific message:
- Where did it start?
- What systems handled it?
- How long at each step?
- Did it succeed?
For customer service ("what happened to my order?"), this matters.
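Answering those four questions for one message is mostly a query over logs that share a correlation ID. The sketch assumes each hop logs a record with `correlation_id`, `system`, `start`, `end`, and `ok` fields, which is a hypothetical schema. In Application Insights the equivalent would be a query filtered on the operation ID.

```python
from datetime import datetime, timedelta


def message_timeline(entries, correlation_id):
    """Rebuild one message's path: origin, systems in order, time per step, outcome."""
    hops = sorted((e for e in entries if e["correlation_id"] == correlation_id),
                  key=lambda e: e["start"])
    if not hops:
        return None  # nothing logged for this ID: itself an observability gap
    return {
        "origin": hops[0]["system"],
        "systems": [h["system"] for h in hops],
        "step_ms": [(h["system"], (h["end"] - h["start"]) / timedelta(milliseconds=1))
                    for h in hops],
        "succeeded": all(h["ok"] for h in hops),
    }


t0 = datetime(2024, 1, 1, 10, 0)


def at(ms):
    """Helper for the sample data: t0 plus a number of milliseconds."""
    return t0 + timedelta(milliseconds=ms)


entries = [
    {"correlation_id": "c-1", "system": "order-function", "start": at(200), "end": at(900), "ok": True},
    {"correlation_id": "c-1", "system": "web-api", "start": at(0), "end": at(120), "ok": True},
    {"correlation_id": "c-2", "system": "web-api", "start": at(50), "end": at(80), "ok": True},
    {"correlation_id": "c-1", "system": "dataverse-plugin", "start": at(950), "end": at(1000), "ok": False},
]
timeline = message_timeline(entries, "c-1")
```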
SLA / SLO tracking.
- Service Level Objective — target performance.
- Service Level Agreement — contractual.
- Error budget: the failures the SLO allows over its measurement window.
- Burn rate: how fast that budget is being consumed.
Mature operations have SLOs informed by observability.
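The error-budget arithmetic is small enough to show directly:

- With an SLO target such as 0.999, the allowed failure fraction is 1 - SLO.
- The burn rate is the observed error rate divided by that allowance.
- A burn rate of 1 spends the budget exactly over the SLO window.
- A burn rate of 5 would spend it in a fifth of the window.

The numbers in the usage lines are illustrative only.

```python
def error_budget(slo, expected_events):
    """Failures the SLO allows over its window, e.g. slo=0.999 allows 0.1% of events."""
    return (1 - slo) * expected_events


def burn_rate(failed, total, slo):
    """Observed error rate relative to the allowed rate (1.0 = on pace to use exactly the budget)."""
    return (failed / total) / (1 - slo)


def budget_remaining(failed_so_far, expected_events, slo):
    """Fraction of the window's error budget still unspent (negative once exhausted)."""
    return 1 - failed_so_far / error_budget(slo, expected_events)


# Usage: a 99.9% SLO over a window expected to carry 1,000,000 orders.
budget = error_budget(0.999, 1_000_000)                    # about 1,000 failures allowed
last_hour = burn_rate(failed=50, total=10_000, slo=0.999)  # 0.5% errors vs 0.1% allowed
remaining = budget_remaining(400, 1_000_000, 0.999)
```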
Logging best practices.
- Structured — JSON or similar.
- Levels — Debug / Info / Warning / Error.
- Sensitive data excluded: no PII or secrets.
- Context-rich: IDs, timestamps, source.
- Sampled at high volume, where logging every event is too expensive.
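This minimal sketch covers the first and last practices using Python's standard `logging` module. A formatter emits one JSON object per record with a few context fields. A filter keeps every WARNING and above but only a sample of lower-level records. The context field names and the sampling rate are hypothetical. Per-record random sampling can break up a single request's log trail, which is why trace-aware samplers usually decide per trace instead.

```python
import io
import json
import logging
import random


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line with level, logger, message, and context."""

    CONTEXT_KEYS = ("correlation_id", "integration", "order_id")  # hypothetical fields

    def format(self, record):
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        for key in self.CONTEXT_KEYS:
            if hasattr(record, key):  # fields passed via extra= become record attributes
                entry[key] = getattr(record, key)
        return json.dumps(entry)


class SamplingFilter(logging.Filter):
    """Keep all WARNING+ records; keep lower levels with probability `rate`."""

    def __init__(self, rate, rng=random.random):
        super().__init__()
        self.rate = rate
        self.rng = rng

    def filter(self, record):
        return record.levelno >= logging.WARNING or self.rng() < self.rate


def make_logger(name, stream, rate=0.1, rng=random.random):
    """Wire the formatter and sampling filter onto a stream handler."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(SamplingFilter(rate, rng))
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    return logger


# Usage: with rng pinned high, the INFO line is sampled out and the ERROR line is kept.
out = io.StringIO()
orders_log = make_logger("integration.orders", out, rate=0.1, rng=lambda: 0.99)
orders_log.info("Order processed", extra={"correlation_id": "c-123", "order_id": "SO-1001"})
orders_log.error("ERP call failed", extra={"correlation_id": "c-123"})
lines = out.getvalue().splitlines()
```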
Common pitfalls.
- No observability. Issues only known when users complain.
- Verbose logging. Logs flood; signal lost.
- No correlation. Can't trace requests across services.
- Per-system silos. Each system has logs; no unified view.
- Alerts ignored. Too noisy; alert fatigue.
- Sensitive data in logs. Compliance violation.
- Retention too short. Can't investigate older issues.
Cost considerations.
- Log volume — App Insights / Datadog priced by ingestion.
- High-cardinality metrics — expensive.
- Long retention — expensive.
Balance: enough observability to operate confidently, but not so much that ingestion and retention costs get out of hand.
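The cost of high cardinality is multiplicative. A metric can produce up to one time series per combination of dimension values. The dimension names and counts below are made up for illustration.

```python
from math import prod


def max_series(dimensions):
    """Upper bound on time series for one metric: product of distinct values per dimension."""
    return prod(dimensions.values())


base = {"integration": 10, "operation": 20, "status": 5}
with_customer = {**base, "customer_id": 50_000}

print(max_series(base))           # 1000
print(max_series(with_customer))  # 50000000
```

Putting per-entity IDs such as customer or order numbers in log or trace properties, rather than in metric dimensions, usually keeps metric cost bounded.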
Privacy considerations.
- Logs may contain personal data.
- GDPR / privacy regulations apply.
- Retention policies needed.
- Subject access requests might include logs.
Design with privacy in mind.
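Designing with privacy in mind usually starts with redacting personal data before it reaches the log pipeline. This sketch masks e-mail addresses in log messages and blanks the values of a list of sensitive field names. The regex and the key list are simplistic, hypothetical placeholders. Real PII classification needs much wider coverage, including names, phone numbers, and addresses.

```python
import logging
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SENSITIVE_KEYS = {"email", "phone", "password", "token"}  # hypothetical; align with data classification


def redact_text(text):
    """Mask anything that looks like an e-mail address."""
    return EMAIL_RE.sub("[redacted-email]", text)


def redact_fields(fields):
    """Blank the values of known-sensitive keys in a dict of structured log fields."""
    return {k: ("[redacted]" if k.lower() in SENSITIVE_KEYS else v) for k, v in fields.items()}


class RedactingFilter(logging.Filter):
    """Redact the fully formatted message; attach it where all relevant records pass through."""

    def filter(self, record):
        record.msg = redact_text(record.getMessage())
        record.args = None  # arguments are already merged into msg above
        return True
```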
Audit logs vs operational logs.
- Audit — for compliance; long retention.
- Operational — for diagnostics; shorter retention.
Because their needs differ, the two are typically kept in separate streams.
Observability culture.
- Engineers add observability as they write code.
- Reviews include observability check.
- Incidents analyse observability gaps.
- Continuous improvement.
The culture determines whether observability is an investment or an afterthought.
Strategic positioning. Observability is the foundation of operational excellence. Without it, integration operations are reactive and chaotic; with it, problems are detected and resolved quickly.
For architects:
- Design observability into integrations from start.
- Standardise on tooling.
- Build dashboards and alerts.
- Train operators.
- Iterate based on incidents.
The investment is meaningful but compounds: every integration benefits from the foundation. The teams that take observability seriously have predictable operations; the teams that don't have surprise incidents. The difference is largely the observability discipline.