Power BI dataflows vs datamarts

Power BI dataflows, datamarts, semantic models, and Fabric items — when to use each, how they relate, and the path Microsoft's Fabric strategy is pushing data work toward.

Updated 2026-06-27

Power BI has grown from a desktop reporting tool into a layered data platform. The layering — dataflows, datamarts, semantic models, paginated reports — is confusing without a map. With Microsoft Fabric's emergence, the layers shift further, with some legacy concepts deprecated in favour of Fabric items. This article maps the terrain.

The traditional Power BI stack.

  • Dataset (now semantic model) — the analytical model: tables, relationships, measures (DAX). Backed by the VertiPaq in-memory columnstore. Consumers connect to it from reports and Excel.
  • Dataflow — ETL artefact built in Power Query Online. Gen1 dataflows write CDM folders to Power BI-managed (or bring-your-own) Azure Data Lake Storage Gen2; Dataflow Gen2 in Fabric writes to a chosen destination such as a Lakehouse or warehouse.
  • Datamart (legacy) — a self-contained dataflow + Azure SQL DB + semantic model. Designed for self-serve "I want a small star schema with SQL access" scenarios.
  • Paginated report — pixel-perfect operational reports; SSRS heritage.
  • Power BI report — interactive dashboards built in Power BI Desktop.

Fabric introduces.

  • Lakehouse — file-based store on OneLake; Delta tables; Spark and SQL access.
  • Data warehouse (Fabric) — full T-SQL warehouse on OneLake.
  • Eventstream — real-time ingestion.
  • Pipelines — orchestration.
  • KQL database — telemetry/logs.
  • Notebook — Spark code.
  • Direct Lake mode — semantic model reads Lakehouse Delta tables directly, no import.

Dataflows. The Power Query ETL workspace asset. Reads from sources (SQL, OneDrive, REST APIs), applies transformations, writes to CDM-folder output. Multiple semantic models can reference the same dataflow — central place to do "join customer master once" without each model duplicating logic.

Datamarts. Introduced in 2022, they provided a self-serve "SQL endpoint plus model in one click" experience. With Fabric, datamarts are effectively superseded by Lakehouses (for files) and Fabric warehouses (for SQL). Microsoft has signalled that datamarts won't see significant new investment; existing datamarts continue working.

Semantic models. The analytic engine. Three storage modes:

  • Import — data copied into the in-memory model; updates via scheduled or incremental refresh.
  • DirectQuery — queries pushed to the source at report time; no copy in the model.
  • Direct Lake (Fabric) — reads Delta tables in OneLake directly, loading columns into memory on demand; no imported copy to refresh and no queries pushed to a source at report time.

Direct Lake is the new default for Fabric-resident data: in-memory performance without import overhead.

When to use a dataflow vs other ETL.

  • Quick self-serve transformations for non-engineering users → dataflow.
  • Enterprise-grade ETL with version control and testing → pipelines + notebooks in Fabric, or Azure Data Factory.
  • Real-time ingestion → Eventstream into a Lakehouse.

Dataflows remain a strong middle ground for citizen data engineers.

Centralised vs decentralised modelling.

  • Centralised semantic models — IT/BI team owns; business consumers connect. Quality controlled; less duplication.
  • Decentralised personal models — every analyst builds their own from raw data. Fast for that analyst, chaotic across the organisation.

Modern Power BI practice favours certified semantic models at the centre with self-serve report building on top.

Connectivity from Excel and Power BI. Semantic models are consumable from:

  • Power BI reports in the same workspace or other workspaces.
  • Excel, through a connection to a Power BI dataset (semantic model).
  • External tools through XMLA endpoint.
  • Custom apps via the Power BI REST API.

This makes the certified semantic model the "single version of truth" surface.
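As an illustration of the REST API route, the sketch below builds (but does not send) a request to the Power BI `executeQueries` endpoint, which runs a DAX query against a semantic model. The dataset ID, token, and DAX text are placeholders, and acquiring the Azure AD access token is assumed to happen elsewhere. Treat it as a minimal sketch rather than a complete client.

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_execute_queries_request(
    dataset_id: str, dax: str, token: str
) -> urllib.request.Request:
    """Build a POST request that asks a semantic model to run one DAX query.

    dataset_id and token are placeholders supplied by the caller; this
    function only assembles the request and performs no network I/O.
    """
    url = f"{API_ROOT}/datasets/{dataset_id}/executeQueries"
    body = {
        "queries": [{"query": dax}],
        "serializerSettings": {"includeNulls": True},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical usage: the ID and token below are illustrative only.
request = build_execute_queries_request(
    "00000000-0000-0000-0000-000000000000",
    'EVALUATE ROW("Answer", 42)',
    "example-token",
)
# Sending it would be: urllib.request.urlopen(request)
```

Because every consumer goes through the same model, a query like this sees the same measures and relationships that reports and Excel see.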

Capacity considerations.

  • Premium per user (PPU) — licensed per user; unlocks Premium features, with model size capped by the PPU licence.
  • Premium capacity (P SKU) — dedicated capacity assigned to workspaces; compute is fixed by the SKU size.
  • Fabric capacity (F SKU) — Fabric's unified compute/storage capacity.

Fabric capacity is the future; new investments should land there.

Refresh patterns.

  • Scheduled refresh — N times per day; standard for batch reporting.
  • Incremental refresh — only updated data refreshes; saves time and storage.
  • Hybrid table — recent data DirectQuery, older data imported.
  • Direct Lake — no data-copy refresh; a lightweight metadata refresh (framing) points the model at the latest Delta table version in the Lakehouse.
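To make the incremental refresh idea concrete, here is a simplified sketch of the rolling-window logic: rows inside a recent "refresh" window are reloaded, older rows inside the "archive" window are kept as they are, and anything else falls outside the model. The function name, the day-level granularity, and the handling of future dates are assumptions for illustration; the real engine works on partitions at the granularity defined in the refresh policy.

```python
from datetime import date


def classify_day(day: date, today: date, archive_days: int, refresh_days: int) -> str:
    """Classify one day's rows under a rolling-window refresh policy.

    Returns "refresh" for rows reloaded on each run, "keep" for rows
    retained without reloading, and "outside" for rows the window excludes.
    """
    age = (today - day).days
    if age < 0:
        # Future-dated rows are treated as outside the window in this sketch.
        return "outside"
    if age < refresh_days:
        return "refresh"
    if age < archive_days:
        return "keep"
    return "outside"


# Hypothetical policy: keep one year of data, refresh the last seven days.
today = date(2026, 6, 27)
for d in (date(2026, 6, 25), date(2026, 1, 1), date(2025, 1, 1)):
    print(d.isoformat(), classify_day(d, today, archive_days=365, refresh_days=7))
```

The saving comes from the "keep" band: most of the history sits there and is never re-read from the source.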

Common pitfalls.

  • Dataflow proliferation. Every report builds its own dataflow; duplication explodes.
  • Mixed modes in one model. Some tables import, some DirectQuery; performance unpredictable.
  • Semantic model dependencies broken. Underlying data source schema changes; model fails. Build version checks into pipelines.
  • Datamart investment. Building new datamarts in 2026 — likely wasted effort given Fabric direction.
  • Capacity contention. Multiple heavy workspaces on shared capacity; reports slow.
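One way to guard against the broken-dependency pitfall is a schema check that runs before a refresh. The sketch below compares an expected column contract with what the source currently exposes. The column names and type labels are hypothetical, and how the actual schema is fetched (for example from the source's metadata) is left out.

```python
def schema_drift(expected: dict[str, str], actual: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {column_name: type_name} mappings and report differences."""
    missing = sorted(c for c in expected if c not in actual)
    added = sorted(c for c in actual if c not in expected)
    retyped = sorted(
        c for c in expected if c in actual and expected[c] != actual[c]
    )
    return {"missing": missing, "added": added, "retyped": retyped}


# Hypothetical contract for a customer table versus what the source exposes now.
expected = {"CustomerId": "int", "Name": "string", "Region": "string"}
actual = {"CustomerId": "string", "Name": "string", "Segment": "string"}

drift = schema_drift(expected, actual)
if any(drift.values()):
    # A pipeline step could fail here instead of letting the model refresh break.
    print("Schema drift detected:", drift)
```

Failing the pipeline step on missing or retyped columns surfaces the problem at the source change, not when a report user hits a broken model.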

Strategic direction. Microsoft Fabric is the unifying story: OneLake holds the data, Lakehouses and warehouses serve it, semantic models analyse it, Power BI surfaces it. New investments should think Fabric-first: Lakehouse for raw and curated data, Fabric pipelines for ETL, semantic models in Direct Lake mode for analytics, Power BI reports on top. Legacy Power BI artefacts continue working but the centre of gravity has shifted.
