Azure Synapse Link for Dataverse

How Synapse Link replicates Dataverse data to Azure Data Lake Storage continuously — architecture, configuration, query patterns, and the path forward as Microsoft Fabric Link emerges.

Updated 2026-08-09

For analytical workloads against Dataverse data, the historical answer was nightly ETL via ADF or pulling data through the Dataverse API into a warehouse. Azure Synapse Link for Dataverse offered a better answer: continuous, low-latency replication of Dataverse changes to Azure Data Lake Storage (ADLS) Gen2, with Synapse SQL pools or Spark for query. With Microsoft Fabric's emergence, the equivalent is Fabric Link for Dataverse, which is increasingly the recommended path. Understanding both clarifies the strategic direction.

The problem Synapse Link solves.

Dataverse is a transactional system; analytical queries against it can degrade performance for users.
API-based extraction is slow for large data; OData paging adds overhead.
Nightly batch ETL introduces 24-hour latency.
Analytical queries want columnar storage and parallel compute; Dataverse's storage isn't optimised for that.

Synapse Link extracts changes continuously, lands them in ADLS as CSV or Parquet, and exposes them to analytical tools.

Architecture.

Dataverse environment — source.
Synapse Link configuration — per-table choices for replication.
ADLS Gen2 storage account — destination.
Synapse workspace — provides SQL serverless and Spark compute over ADLS.
Initial sync — bulk copy of selected tables.
Continuous delta — change-tracking-based updates flowing every 15+ minutes.

The setup is mostly point-and-click in the Power Platform portal.

Storage format. Two options:

CSV — simpler, larger, no schema enforcement.
Delta Parquet — columnar, smaller, schema-on-write, queryable as a Delta Lake table.

Delta Parquet is the modern choice; better performance and capability.

Folder structure. ADLS layout typically:

container/dataverse/
  {table-name}/
    {year}/{month}/{day}/
      file1.parquet
      file2.parquet
    ...
  table-name.snapshot/
  table-name.cdf/

.snapshot holds the current state; .cdf (change data feed) holds incremental changes.

Query patterns.

Synapse SQL serverless — SELECT ... FROM OPENROWSET(...) over the Parquet files.
Synapse Spark — Spark notebooks reading the Delta tables.
Power BI — directly query the Synapse SQL endpoint.
External tools — read the Parquet directly.

Fabric Link for Dataverse. The Microsoft Fabric–native version:

Same conceptual model: continuous replication of Dataverse to lake-based storage.
Destination: OneLake (Fabric's storage layer).
Native integration with Fabric Lakehouse, Warehouse, semantic models.
Direct Lake mode — Power BI semantic models read from OneLake without import.

Fabric Link is the strategic direction. New deployments should default here; existing Synapse Link deployments may migrate over time.

What replicates.

All standard columns of selected tables.
Choice values as labels (not numeric codes).
Lookups as related record IDs.
Audit history (optional).
Activity records (optional).

Latency. From change in Dataverse to availability in ADLS / OneLake:

Minimum ~15 minutes for active changes.
Typical within an hour.
Bulk operations may lag longer.

Not real-time but fresh enough for most analytical use.

Cost.

Dataverse Synapse Link — minimal incremental Dataverse cost.
Storage — ADLS or OneLake bytes.
Compute — Synapse SQL serverless per query; Spark per session.
Fabric — Fabric capacity consumed.

For analytical workloads, the cost is generally favourable vs running queries directly against Dataverse APIs at scale.

Per-table selection. Not every table needs replication:

High-value tables for analytics — yes.
System tables (audit, async operations) — usually no.
Custom tables — yes if used in analytics.
Reference data — usually yes.

Each table replicated costs storage and compute. Curate the list.

Schema evolution. When a Dataverse table changes (column added, renamed):

Synapse Link / Fabric Link picks up new columns.
Renamed columns can break dependent queries.
Deleted columns lag in the lake.

Schema drift is a continuous concern; document column dependencies for queries.

Use cases beyond reporting.

Machine learning training — historical Dataverse data as ML training set.
Data warehouse loading — stage to a curated warehouse.
Cross-system analytics — combine Dataverse with non-Dataverse data.
Archival — long-term retention beyond Dataverse's storage tier.

Comparison with alternatives.

| Approach | Latency | Setup | Cost | Use case | |---|---|---|---|---| | Synapse Link | ~15 min | Low | Moderate | Analytics replication | | Fabric Link | ~15 min | Low | Moderate | Fabric-centric analytics | | ADF pipeline | Scheduled | Higher | Moderate | Custom transformations | | API extraction | On-demand | Custom | Per-call | Specific data needs | | Direct Dataverse query | Real-time | None | Per-query | Operational queries |

For analytical workloads, Synapse/Fabric Link wins for most scenarios.

Common pitfalls.

Selecting everything. All Dataverse tables replicated; storage and cost balloon; usability suffers.
Schema changes not tracked. Downstream queries break silently.
No retention policy. Lake accumulates indefinitely; cost grows.
Mixed modes. Some tables in CSV, some in Parquet; queries inconsistent.
Security gap. Lake permissions less granular than Dataverse; sensitive data exposed to broader audience.

Security considerations. Replicated data in the lake bypasses Dataverse's row-level security:

Lake security is file/folder/container based.
Sensitive data in replicated tables is accessible to anyone with lake access.
Mitigation — separate sensitive tables into restricted containers; column-level masking in views.

For regulated data, careful planning of who can query the lake is essential.

Strategic positioning. Synapse Link / Fabric Link is the canonical pattern for Dataverse analytics. For new analytics workloads, set up Fabric Link from the start; build Lakehouse and semantic models on top. For existing reporting on Dataverse, evaluate moving to Fabric Link to offload analytical load from production Dataverse. The operational simplicity (low setup, continuous replication, native integration) makes it default for any meaningful analytics workload against Dataverse.

Related guides

← All guides Glossary →