A consolidated, scenario‑first reference organized around the four official exam domains. Every topic is paired with the decisions, trade‑offs, and distractors you will actually encounter on the exam.
Eight chapters covering the exam blueprint, deep topic notes for all four domains, comparison tables, the traps the exam likes to set, and a twenty‑item pre‑exam checklist.
The DP‑600 exam validates your ability to design, build, and deploy enterprise‑scale analytics solutions in Microsoft Fabric — spanning Lakehouse, Warehouse, semantic models, DAX, T‑SQL, and PySpark. It sits between data engineering and BI: you are the person who turns raw data into reusable, trusted analytics assets.
| Domain | Weight | Focus |
|---|---|---|
| 01 Plan, Implement & Manage a Solution | 10–15% | Workspaces, lifecycle, security |
| 02 Prepare & Serve Data | 40–45% | Lakehouse, Warehouse, ingest |
| 03 Implement & Manage Semantic Models | 20–25% | Star schema, DirectLake, DAX |
| 04 Explore & Analyze Data | 20–25% | SQL endpoints, profiling |
Prepare & Serve Data is by far the largest domain — roughly four in ten questions. Spend the most prep time on Lakehouse vs. Warehouse choice, Delta table maintenance, Dataflow Gen2, notebooks, and medallion architecture. Breadth is the challenge: the exam spans data‑engineer skills (Spark, Lakehouse) and BI skills (DAX, semantic models).
Core technologies: Fabric · OneLake · Lakehouse · Warehouse · Power BI
Question formats: Multiple choice · Drag‑and‑drop · Case studies
Languages: SQL · DAX · PySpark · Power Query (M)
Know the division of labor between Git integration (source control, diffing, PRs) and deployment pipelines (stage promotion with rules). They are complementary — Git tracks change, pipelines promote it.
| Requirement | Feature | Note |
|---|---|---|
| Isolate tenants by row | Row‑Level Security (RLS) | Security predicate; applies in Warehouse, Lakehouse SQL endpoint, and semantic models. |
| Hide sensitive columns | Column / Object‑Level Security | DENY SELECT on specific columns. |
| Mask PII at presentation | Dynamic Data Masking | Not encryption — DBAs still see real data. |
| Restrict folders in OneLake | OneLake file/folder ACLs | Enforced across every engine reading the path. |
| Protect downstream exports | Sensitivity labels (Purview) | Labels flow to PBIX, Excel, shared reports. |
| Signal trust | Endorsements | Promoted (author) or Certified (admin) — not a security control. |
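A hedged T‑SQL sketch of the first two rows of the table (row‑level and column‑level security) as they could be applied in a Warehouse. The security schema, UserTenant mapping table, FactSales, DimCustomer, TaxId column, and SalesAnalysts role are illustrative assumptions, not names from the exam.

-- Row-level security: an inline predicate function plus a security policy (illustrative names)
CREATE FUNCTION security.fn_TenantFilter (@TenantId INT)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS Allowed
    FROM dbo.UserTenant AS ut
    WHERE ut.TenantId = @TenantId
      AND ut.LoginName = USER_NAME();   -- match the signed-in user

CREATE SECURITY POLICY security.TenantPolicy
    ADD FILTER PREDICATE security.fn_TenantFilter(TenantId) ON dbo.FactSales;

-- Column-level security: hide a sensitive column from a role
DENY SELECT ON dbo.DimCustomer (TaxId) TO SalesAnalysts;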
| Item | Primary purpose | Owner‑persona |
|---|---|---|
| Lakehouse | Files + Delta tables over OneLake. | Data engineer + analytics eng. |
| Warehouse | T‑SQL DW with multi‑table transactions. | Analytics engineer |
| SQL endpoint | Read‑only T‑SQL over a Lakehouse. | Analyst / analytics eng. |
| Semantic model | Star‑schema model for Power BI. | Analytics engineer |
| Notebook | PySpark / SparkSQL transformations. | Analytics eng. / DE |
| Dataflow Gen2 | Low‑code Power Query ingest + shape. | Analyst |
| Data pipeline | Orchestration with copy, notebook, dataflow. | Analytics eng. / DE |
| Power BI report | Visual explore & distribute. | Analyst |
The analytics engineer's job is to turn Lakehouse/Warehouse tables into a reusable semantic model with clean names, clear relationships, and performant DAX. Everything on this page supports that goal.
| Aspect | Lakehouse | Warehouse |
|---|---|---|
| Storage | Delta Parquet (files + tables) | Delta (managed T‑SQL) |
| Primary language | Spark SQL / PySpark | T‑SQL |
| Multi‑table txn | No | Yes |
| Unstructured data | Yes (Files area) | No |
| DML via SQL endpoint | Read‑only | Full |
| Shortcuts in | Yes | Via OneLake |
| Best fit | Medallion ELT, ML, raw & curated data | Star‑schema DW, serve semantic model |
- Ingest into the Lakehouse with spark.read, Shortcuts, or Mirroring.
- Partition large fact tables on a low‑cardinality column (e.g., EventDate). Never partition by high‑cardinality keys — it creates tiny files.
- Load the Warehouse with INSERT … SELECT, MERGE, and stored procedures.
- Track a watermark column (LastModified) for incremental loads; use CDC / Change Tracking when available.
- SCD Type 1: overwrite in place with MERGE … WHEN MATCHED UPDATE.
- SCD Type 2: EffectiveFrom, EffectiveTo, IsCurrent. Close the prior row; insert the new one (a sketch follows the snippets below).
- Deduplicate with ROW_NUMBER() OVER (PARTITION BY natural_key ORDER BY ts DESC), keep rn = 1.
- Handle nulls with COALESCE defaults, or quarantine nulls.
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER
(PARTITION BY CustomerId
ORDER BY UpdatedAt DESC) rn
FROM stg.Customer
) t WHERE rn = 1;
-- pipeline expression
@{activity('LookupWM').output.firstRow.LastLoad}
SELECT * FROM src.Orders
WHERE ModifiedDate > @prevWatermark;
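A minimal T‑SQL sketch of the SCD Type 2 close‑and‑insert pattern from the list above. The dim.Customer and stg.Customer tables, the tracked City attribute, and the use of generic T‑SQL DML are illustrative assumptions; exact DML support varies by engine.

-- Step 1: close the current row when a tracked attribute changed (illustrative names)
UPDATE d
SET    d.EffectiveTo = GETDATE(),
       d.IsCurrent   = 0
FROM   dim.Customer AS d
JOIN   stg.Customer AS s
       ON d.CustomerId = s.CustomerId
WHERE  d.IsCurrent = 1
  AND  d.City <> s.City;

-- Step 2: insert the new version (covers changed and brand-new customers)
INSERT INTO dim.Customer (CustomerId, City, EffectiveFrom, EffectiveTo, IsCurrent)
SELECT s.CustomerId, s.City, GETDATE(), NULL, 1
FROM   stg.Customer AS s
LEFT JOIN dim.Customer AS d
       ON d.CustomerId = s.CustomerId
      AND d.IsCurrent = 1
WHERE  d.CustomerId IS NULL;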
When the exam asks "which tool for this transformation?" read for persona clues: analyst + UI → Dataflow Gen2 (M); large joins / ML → Notebook (PySpark); multi‑table transaction → Warehouse (T‑SQL).
- Partition Delta tables on a low‑cardinality date column (e.g., EventDate). High‑cardinality partitioning creates the small‑file problem.
- Avoid SELECT *; minimize data movement across distributions.

| Symptom | Remedy | Why |
|---|---|---|
| Many tiny files from streaming | OPTIMIZE | Compacts into larger Parquet files. |
| Slow filter on one column | Z‑ORDER BY col | Co‑locates matching rows. |
| Storage bill growing | VACUUM | Removes files past retention. |
| Slow read in Power BI DirectLake | Keep V‑Order on | Optimized layout for Vertipaq. |
| Query does full scan on a date filter | Partition on date_col | Partition pruning. |
OPTIMIZE ≠ VACUUM. OPTIMIZE compacts small files; VACUUM removes obsolete files after retention. The exam tests you on picking the right one for the stated symptom.
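The same remedies expressed as Spark SQL, as you might run them from a Fabric notebook; sales.fact_orders and CustomerId are placeholder names.

-- Compact small files and co-locate rows on a common filter column
OPTIMIZE sales.fact_orders ZORDER BY (CustomerId);

-- Remove unreferenced files older than the retention window (168 hours = 7 days)
VACUUM sales.fact_orders RETAIN 168 HOURS;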
- Activate an inactive relationship inside a measure with USERELATIONSHIP.
- Role‑playing dimensions: one Date dim referenced by OrderDate, ShipDate via multiple (inactive) relationships.

| Mode | How it works | Best for | Trade‑off |
|---|---|---|---|
| Import | Data loaded into VertiPaq (in‑memory). | Fast reports; complex DAX. | Needs refresh; memory‑bound. |
| DirectQuery | Every visual issues a query to source. | Live data; very large models. | Slower; pushes load to source. |
| DirectLake | Reads Delta directly from OneLake into VertiPaq on demand. | Fabric‑native; no refresh; near‑Import speed. | Fabric Lakehouse/Warehouse only; falls back to DirectQuery if rules violated. |
| Composite | Mix: Import fact + DQ dim, dual tables. | Hybrid scenarios. | Relationship limits; complexity. |
Framing refreshes the view of Delta for the model (DirectLake's counterpart to an Import refresh).

When the scenario mentions huge volumes + always‑current + Fabric source, the right answer is usually DirectLake. But if calculated columns or non‑Fabric sources are required, it must be Import or DirectQuery.
- Filter functions: FILTER, ALL, KEEPFILTERS, REMOVEFILTERS.
- Time intelligence: DATEADD, SAMEPERIODLASTYEAR, TOTALYTD, DATESBETWEEN. Require a marked Date table with contiguous dates.
- Iterators: SUMX, AVERAGEX — evaluate expression per row; avoid on huge tables when a plain SUM would do.
- Variables: VAR + RETURN — evaluate once, reuse, improve readability and performance.
- Context transition: CALCULATE converts row context to filter context; know this for iterator questions.
- Incremental refresh: RangeStart / RangeEnd parameters; set a policy for storing N years + refreshing M days. Power BI auto‑partitions.
- Dynamic RLS: filter a security table by the signed‑in user (USERPRINCIPALNAME()), as in the snippet below.
[Region] =
LOOKUPVALUE(
UserSecurity[Region],
UserSecurity[Email],
USERPRINCIPALNAME()
)
Sales PY :=
VAR _prev =
CALCULATE(
[Sales],
SAMEPERIODLASTYEAR('Date'[Date])
)
RETURN _prev
If a DAX measure is slow, first check whether you are iterating a large fact table. Replace SUMX(Fact, Fact[x]*Fact[y]) with a pre‑calculated column only if the model isn't DirectLake — otherwise push the math upstream into the Lakehouse/Warehouse.
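For example, the multiplication can be materialized once in the gold layer so the measure becomes a plain SUM. A sketch assuming hypothetical silver.Sales and gold.FactSales tables:

-- Pre-compute the line amount at load time instead of iterating with SUMX in DAX
CREATE TABLE gold.FactSales AS
SELECT
    OrderId,
    ProductKey,
    Quantity,
    UnitPrice,
    Quantity * UnitPrice AS LineAmount   -- SUM(LineAmount) replaces SUMX(Fact, x * y)
FROM silver.Sales;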
- Explore in notebooks with %%sql, %%pyspark, display() for rich previews, describe for profiling.

| Metric | T‑SQL / SparkSQL | Use |
|---|---|---|
| Row count | SELECT COUNT(*) | Sanity check after load. |
| Cardinality | COUNT(DISTINCT col) | Dedupe targets, key candidates. |
| Null ratio | AVG(CASE WHEN col IS NULL THEN 1.0 ELSE 0 END) | Missing‑data audit. |
| Min / max / avg | MIN / MAX / AVG | Range sanity + outlier hunt. |
| Top frequencies | GROUP BY col ORDER BY count DESC | Skew detection. |
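The checks above can be combined into a single pass; the table and column names (silver.Customer, CustomerId, Email, CreatedDate) are illustrative.

-- One-pass profile: row count, key cardinality, null ratio, and value range
SELECT
    COUNT(*)                                         AS TotalRows,
    COUNT(DISTINCT CustomerId)                       AS DistinctCustomers,
    AVG(CASE WHEN Email IS NULL THEN 1.0 ELSE 0 END) AS EmailNullRatio,
    MIN(CreatedDate)                                 AS MinCreated,
    MAX(CreatedDate)                                 AS MaxCreated
FROM silver.Customer;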
- queryinsights.exec_requests_history surfaces long‑running and frequent queries in the Warehouse.
- sys.dm_exec_* DMVs cover sessions, requests, waits.
- Profile interactively with display() in notebooks or the T‑SQL profiling queries above.
SUM(Amount) OVER (
PARTITION BY CustomerId
ORDER BY OrderDate
ROWS UNBOUNDED PRECEDING
) AS RunTotal
WITH r AS (
SELECT *, ROW_NUMBER() OVER
(PARTITION BY CategoryId
ORDER BY Sales DESC) rn
FROM Products
)
SELECT * FROM r WHERE rn <= 5;
MERGE dim.Customer AS t
USING stg.Customer AS s
ON t.CustomerKey = s.CustomerKey
WHEN MATCHED THEN UPDATE SET ...
WHEN NOT MATCHED THEN INSERT (...)
VALUES (...);
SELECT f.*, d.ProductName
FROM [SalesWH].dbo.FactSales f
JOIN [RetailLH].dbo.DimProduct d
ON f.ProductKey = d.ProductKey;
- Read: spark.read.format("delta").load(path) / spark.table("lh.schema.tbl").
- Transform: .filter(), .groupBy().agg(), .withColumn().
- Write: .write.format("delta").mode("append").saveAsTable(...).
- Broadcast small dimensions with F.broadcast(dim_df).
- Prefer built‑in F functions over UDFs; avoid collect() on large sets.

Windowed dedup, top‑N, running totals, and MERGE upsert are the four SQL patterns most likely to appear in drag‑and‑drop questions. Know what each one looks like at a glance.
| Feature | Lakehouse | Warehouse | KQL Database | SQL DB in Fabric |
|---|---|---|---|---|
| Primary workload | Files + Delta tables | T‑SQL DW | Real‑time / telemetry | OLTP |
| Language | Spark SQL / PySpark | T‑SQL | KQL | T‑SQL |
| SQL writes | read‑only endpoint | full | Append | full |
| Multi‑table txn | no | yes | no | yes |
| Unstructured files | yes | no | no | no |
| Best for | Medallion, ML, raw+curated | Star schema, gold serving | Logs, IoT, events | App backend |
| Mode | Data location | Refresh model | Choose when… |
|---|---|---|---|
| Import | In‑memory Vertipaq | Scheduled | Fastest reports, complex DAX, size fits memory. |
| DirectQuery | Stays in source | On query | Real‑time, very large, can't import. |
| DirectLake | Delta in OneLake, loaded on demand | Framing | Fabric‑native, always current, near‑Import speed. |
| Composite | Mixed (Dual tables) | Mixed | Hybrid: Import facts + DQ dims. |
| Language | Strength | Typical item |
|---|---|---|
| Power Query (M) | Low‑code UI shaping. | Dataflow Gen2 |
| PySpark / Spark SQL | Distributed compute, ML. | Notebook |
| T‑SQL | Set ops, transactions, views. | Warehouse, SQL endpoint (read) |
| DAX | Analytical measures in model. | Semantic model |
The exam uses plausible‑sounding options to test depth of understanding. Twelve of the most common traps, with corrections.
Review the night before. If any item feels unfamiliar, revisit that topic in the guide.
- Dynamic RLS with USERPRINCIPALNAME() and a security table.
- Incremental refresh with RangeStart/RangeEnd and detect‑changes.

Source: Microsoft Learn — Study Guide for Exam DP‑600: Implementing Analytics Solutions Using Microsoft Fabric.