Microsoft Certified · Fabric Analytics Engineer Associate

Exam DP‑600
Study Guide

Implementing Analytics Solutions Using Microsoft Fabric — Lakehouse, Warehouse, Semantic Models, DAX, T‑SQL and PySpark.

A consolidated, scenario‑first reference organized around the four official exam domains. Every topic is paired with the decisions, trade‑offs, and distractors you will actually encounter on the exam.

Domain weighting
Plan · 10–15% | Prepare & Serve · 40–45% | Semantic Models · 20–25% | Explore · 20–25%

Exam code: DP‑600
Duration: 100 minutes of exam time (120‑minute seat time)
Questions: 40–60
Passing score: 700 / 1000
00 · Contents

What's inside.

Eight chapters covering the exam blueprint, deep topic notes for all four domains, comparison tables, the traps the exam likes to set, and a twenty‑item pre‑exam checklist.

01 · Exam Domains & Weightage: Blueprint, audience profile, and what the exam tests.
02 · Domain 1 — Plan, Implement & Manage: Workspace architecture, lifecycle, governance, security.
03 · Domain 2 — Prepare & Serve Data: Lakehouse, Warehouse, ingestion, transformation, medallion.
04 · Domain 3 — Semantic Models: Star schema, DirectLake, DAX, incremental refresh, XMLA.
05 · Domain 4 — Explore & Analyze: SQL endpoints, notebooks, profiling, Copilot, perf queries.
06 · Service Comparison Tables: Stores, storage modes, languages, tools.
07 · Common Pitfalls & Distractors: Twelve plausible‑but‑wrong answers the exam loves.
08 · Final Checklist — 20 Must‑Knows: Night‑before review list.
01 · Official Exam Domains & Weightage

The blueprint.

The DP‑600 exam validates your ability to design, build, and deploy enterprise‑scale analytics solutions in Microsoft Fabric — spanning Lakehouse, Warehouse, semantic models, DAX, T‑SQL, and PySpark. It sits between data engineering and BI: you are the person who turns raw data into reusable, trusted analytics assets.


Weighting

Domain | Weight | Focus
01 Plan, Implement & Manage a Solution | 10–15% | Workspaces, lifecycle, security
02 Prepare & Serve Data | 40–45% | Lakehouse, Warehouse, ingest
03 Implement & Manage Semantic Models | 20–25% | Star schema, DirectLake, DAX
04 Explore & Analyze Data | 20–25% | SQL endpoints, profiling
Exam alert

Prepare & Serve Data is by far the largest domain — roughly four in ten questions. Spend the most prep time on Lakehouse vs. Warehouse choice, Delta table maintenance, Dataflow Gen2, notebooks, and medallion architecture. Breadth is the challenge: the exam spans data‑engineer skills (Spark, Lakehouse) and BI skills (DAX, semantic models).

Format at a glance

Platforms: Fabric · OneLake · Lakehouse · Warehouse · Power BI
Question types: Multiple choice · Drag‑and‑drop · Case studies
Languages: SQL · DAX · PySpark · Power Query (M)

02 · Domain 1 · 10–15%

Plan, implement & manage the solution.

2.1   Plan a data analytics environment

2.2   Implement & manage an analytics environment

Exam alert

Know the division of labor between Git integration (source control, diffing, PRs) and deployment pipelines (stage promotion with rules). They are complementary — Git tracks change, pipelines promote it.

2.3   Security & access model

Requirement | Feature | Note
Isolate tenants by row | Row‑Level Security (RLS) | Security predicate; applies in Warehouse, Lakehouse SQL endpoint, and semantic models.
Hide sensitive columns | Column / Object‑Level Security | DENY SELECT on specific columns.
Mask PII at presentation | Dynamic Data Masking | Not encryption — DBAs still see real data.
Restrict folders in OneLake | OneLake file/folder ACLs | Enforced across every engine reading the path.
Protect downstream exports | Sensitivity labels (Purview) | Labels flow to PBIX, Excel, shared reports.
Signal trust | Endorsements | Promoted (author) or Certified (admin) — not a security control.
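
Column‑level security from the table above, as a minimal T‑SQL sketch. The table (dbo.Customer), its Email/TaxId columns, and the AnalystReader role are illustrative names, not objects from the exam guide:

-- Readers keep access to the table as a whole...
GRANT SELECT ON OBJECT::dbo.Customer TO AnalystReader;
-- ...but the sensitive columns are denied; SELECT * now fails for this role
DENY SELECT ON OBJECT::dbo.Customer (Email, TaxId) TO AnalystReader;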
02 · Lifecycle card · Git + deployment pipelines

From notebook to production.

Step 01 · Develop: Author lakehouses, warehouses, and semantic models in a dev workspace.
Step 02 · Source control: Workspace bound to a Git branch; commit notebooks, TMDL, PBIP.
Step 03 · Promote: Deployment pipeline pushes items dev → test → prod, with rules.
Step 04 · Operate: Monitoring hub, Capacity Metrics app, refresh health, usage metrics.

Fabric items an analytics engineer touches

Item | Primary purpose | Owner persona
Lakehouse | Files + Delta tables over OneLake. | Data engineer + analytics eng.
Warehouse | T‑SQL DW with multi‑table transactions. | Analytics engineer
SQL endpoint | Read‑only T‑SQL over a Lakehouse. | Analyst / analytics eng.
Semantic model | Star‑schema model for Power BI. | Analytics engineer
Notebook | PySpark / Spark SQL transformations. | Analytics eng. / DE
Dataflow Gen2 | Low‑code Power Query ingest + shape. | Analyst
Data pipeline | Orchestration with copy, notebook, dataflow. | Analytics eng. / DE
Power BI report | Visual explore & distribute. | Analyst
Mental model

Analytics engineer's job = turn Lakehouse/Warehouse tables into a reusable semantic model with clean names, clear relationships, and performant DAX. Everything on this page supports that goal.

03 · Domain 2 · 40–45% (largest domain)

Prepare & serve data.

3.1   Lakehouse vs. Warehouse — the central decision

Aspect | Lakehouse | Warehouse
Storage | Delta Parquet (files + tables) | Delta (managed, T‑SQL)
Primary language | Spark SQL / PySpark | T‑SQL
Multi‑table txn | No | Yes
Unstructured data | Yes (Files area) | No
DML via SQL endpoint | Read‑only | Full
Shortcuts in | Yes | Via OneLake
Best fit | Medallion ELT, ML, raw & curated data | Star‑schema DW, serving the semantic model

3.2   Medallion architecture
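
One medallion hop, bronze to silver, as a minimal PySpark sketch. Table and column names (bronze_orders, OrderId, OrderDate, silver_orders) are illustrative; spark is the session a Fabric notebook provides:

# Promote raw bronze rows to a cleansed silver Delta table
bronze = spark.read.table("bronze_orders")

silver = (
    bronze
    .dropDuplicates(["OrderId"])         # one row per business key
    .filter("OrderDate IS NOT NULL")     # drop incomplete records
)

silver.write.format("delta").mode("overwrite").saveAsTable("silver_orders")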

3.3   Create objects in a Lakehouse or Warehouse

If you need multi‑table transactions for the gold layer → land gold in a Warehouse.
If you need to analyze ADLS files without copying → Shortcut into a Lakehouse.
If you need a live read of Azure SQL with minimal ETL → Mirroring.
If a team wants low‑code ETL with Power Query → Dataflow Gen2.
03 (cont.) · Ingest, transform, copy, merge

3.4   Copy & transform data

3.5   Handle SCDs, duplicates, late data

Dedup + incremental — quick forms

Deduplicate on natural key

-- Keep the newest row per natural key (latest UpdatedAt wins)
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER
    (PARTITION BY CustomerId
     ORDER BY UpdatedAt DESC) AS rn
  FROM stg.Customer
) AS t
WHERE rn = 1;

Incremental watermark load

-- Pipeline expression: read the stored high-water mark from a Lookup
-- activity's output and pass it into the source query as @prevWatermark
@{activity('LookupWM').output.firstRow.LastLoad}

-- Source query: pull only rows modified since the last load;
-- advance the stored watermark after a successful load
SELECT * FROM src.Orders
WHERE ModifiedDate > @prevWatermark;

Pro tip

When the exam asks "which tool for this transformation?" read for persona clues: analyst + UI → Dataflow Gen2 (M); large joins / ML → Notebook (PySpark); multi‑table transaction → Warehouse (T‑SQL).

03 (cont.) · Delta maintenance & performance

3.6   Optimize Lakehouse / Warehouse performance

Decide: which maintenance op?

Symptom | Remedy | Why
Many tiny files from streaming | OPTIMIZE | Compacts into larger Parquet files.
Slow filter on one column | Z‑ORDER BY col | Co‑locates matching rows.
Storage bill growing | VACUUM | Removes files past the retention window.
Slow read in Power BI DirectLake | Keep V‑Order on | Layout optimized for VertiPaq.
Query does full scan on a date filter | Partition on date_col | Partition pruning.
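
What the first three remedies look like in a notebook, as a hedged Spark SQL sketch (silver_orders and CustomerId are illustrative names; 168 hours is the default seven‑day retention):

-- Compact many small files; co-locate rows by the hot filter column
OPTIMIZE silver_orders ZORDER BY (CustomerId);

-- Drop unreferenced files older than the retention window
VACUUM silver_orders RETAIN 168 HOURS;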

3.7   Denormalize for gold + serve

Exam alert

OPTIMIZE ≠ VACUUM. OPTIMIZE compacts small files; VACUUM removes obsolete files after retention. The exam tests you on picking the right one for the stated symptom.

04 · Domain 3 · 20–25%

Design & manage semantic models.

4.1   Design & build semantic models

4.2   Storage modes — Import vs. DirectQuery vs. DirectLake

Mode | How it works | Best for | Trade‑off
Import | Data loaded into VertiPaq (in‑memory). | Fast reports; complex DAX. | Needs refresh; memory‑bound.
DirectQuery | Every visual issues a query to the source. | Live data; very large models. | Slower; pushes load to the source.
DirectLake | Reads Delta directly from OneLake into VertiPaq on demand. | Fabric‑native; no refresh; near‑Import speed. | Fabric Lakehouse/Warehouse only; falls back to DirectQuery if rules are violated.
Composite | Mix: Import fact + DQ dim, dual tables. | Hybrid scenarios. | Relationship limits; complexity.

4.3   DirectLake — know the rules

Exam alert

When the scenario mentions huge volumes + always‑current + Fabric source, the right answer is usually DirectLake. But if calculated columns or non‑Fabric sources are required, it must be Import or DirectQuery.

04 (cont.) · DAX optimization & model operations

4.4   DAX essentials the exam tests

4.5   Partitioning, incremental refresh, XMLA

4.6   Implement & manage security on models

Dynamic RLS pattern

// Filter on the secured table: a row passes when its [Region] matches
// the signed-in user's row in the UserSecurity mapping table
[Region] =
LOOKUPVALUE(
  UserSecurity[Region],
  UserSecurity[Email],
  USERPRINCIPALNAME()
)

Time‑intel measure with VAR

Sales PY :=
VAR _prev =
  CALCULATE(
   [Sales],
   SAMEPERIODLASTYEAR('Date'[Date])
  )
RETURN _prev

Pro tip

If a DAX measure is slow, first check whether you are iterating a large fact table. Replace SUMX(Fact, Fact[x]*Fact[y]) with a pre‑calculated column only if the model isn't DirectLake — otherwise push the math upstream into the Lakehouse/Warehouse.
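
A hedged DAX sketch of that trade‑off (FactSales and its Qty, UnitPrice, and Amount columns are illustrative names):

-- Iterator: evaluates the product row by row at query time
Sales Amount (iterator) :=
SUMX ( FactSales, FactSales[Qty] * FactSales[UnitPrice] )

-- Cheaper when Amount is materialized upstream in the Lakehouse/Warehouse
-- (or, for Import-only models, as a calculated column)
Sales Amount (materialized) :=
SUM ( FactSales[Amount] )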

05 · Domain 4 · 20–25%

Explore & analyze data.

5.1   Exploratory tools in Fabric

5.2   Descriptive / data profiling

Metric | T‑SQL / Spark SQL | Use
Row count | SELECT COUNT(*) | Sanity check after load.
Cardinality | COUNT(DISTINCT col) | Dedupe targets, key candidates.
Null ratio | AVG(CASE WHEN col IS NULL THEN 1.0 ELSE 0 END) | Missing‑data audit.
Min / max / avg | MIN / MAX / AVG | Range sanity + outlier hunt.
Top frequencies | GROUP BY col ORDER BY COUNT(*) DESC | Skew detection.
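
These metrics compose into a single profiling pass; a hedged T‑SQL sketch (dbo.Customer and its columns are illustrative):

SELECT
    COUNT(*)                                          AS row_cnt,
    COUNT(DISTINCT CustomerId)                        AS distinct_keys,
    AVG(CASE WHEN Email IS NULL THEN 1.0 ELSE 0 END)  AS email_null_ratio,
    MIN(CreatedAt)                                    AS min_created,
    MAX(CreatedAt)                                    AS max_created
FROM dbo.Customer;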

5.3   Query performance — warehouse & SQL endpoint

If an analyst wants ad‑hoc T‑SQL over Lakehouse Delta tables → SQL endpoint, visual query.
If profiling a new dataset for nulls, skew, cardinality → notebook with display() / T‑SQL profiling queries.
If a repeated query is slow → check query insights and the plan; enable result‑set caching, update statistics (a sketch follows below).
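
A hedged sketch of the query‑insights check, assuming the queryinsights system views documented for Fabric Warehouse (verify the view and column names in your workspace):

-- Ten most expensive recent statements in this warehouse
SELECT TOP (10) *
FROM queryinsights.exec_requests_history
ORDER BY total_elapsed_time_ms DESC;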
05 (cont.) · Practical query patterns

5.4   Query patterns to recognize on the exam

Running total (windowed)

-- Running total per customer (dbo.Orders is an illustrative table)
SELECT CustomerId, OrderDate, Amount,
  SUM(Amount) OVER (
    PARTITION BY CustomerId
    ORDER BY OrderDate
    ROWS UNBOUNDED PRECEDING
  ) AS RunTotal
FROM dbo.Orders;

Top N per group

WITH r AS (
 SELECT *, ROW_NUMBER() OVER
  (PARTITION BY CategoryId
   ORDER BY Sales DESC) rn
 FROM Products
)
SELECT * FROM r WHERE rn <= 5;

MERGE for upsert (Warehouse)

MERGE dim.Customer AS t
USING stg.Customer AS s
  ON t.CustomerKey = s.CustomerKey
WHEN MATCHED THEN UPDATE SET ...
WHEN NOT MATCHED THEN INSERT (...)
  VALUES (...);

Cross‑database query

SELECT f.*, d.ProductName
FROM [SalesWH].dbo.FactSales f
JOIN [RetailLH].dbo.DimProduct d
  ON f.ProductKey = d.ProductKey;

5.5   PySpark patterns (notebook)
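
The same windowed‑dedup idea from 3.5, as a minimal PySpark sketch. Table and column names are illustrative; spark is the built‑in notebook session:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Keep the newest row per natural key -- mirrors the T-SQL ROW_NUMBER pattern
w = Window.partitionBy("CustomerId").orderBy(F.col("UpdatedAt").desc())

deduped = (
    spark.read.table("stg_customer")
         .withColumn("rn", F.row_number().over(w))
         .filter("rn = 1")
         .drop("rn")
)

deduped.write.format("delta").mode("overwrite").saveAsTable("silver_customer")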

Pro tip

Windowed dedup, top‑N, running totals, and MERGE upsert are the four SQL patterns most likely to appear in drag‑and‑drop questions. Know what each one looks like at a glance.

06 · Service comparison tables

Choose the right tool.

Fabric data stores compared

Feature | Lakehouse | Warehouse | KQL Database | SQL DB in Fabric
Primary workload | Files + Delta tables | T‑SQL DW | Real‑time / telemetry | OLTP
Language | Spark SQL / PySpark | T‑SQL | KQL | T‑SQL
SQL writes | Read‑only endpoint | Full | Append | Full
Multi‑table txn | No | Yes | No | Yes
Unstructured files | Yes | No | No | No
Best for | Medallion, ML, raw + curated | Star schema, gold serving | Logs, IoT, events | App backend

Storage modes for semantic models

Mode | Data location | Refresh model | Choose when…
Import | In‑memory VertiPaq | Scheduled | Fastest reports, complex DAX, size fits memory.
DirectQuery | Stays in source | On query | Real‑time, very large, can't import.
DirectLake | Delta in OneLake, loaded on demand | Framing | Fabric‑native, always current, near‑Import speed.
Composite | Mixed (dual tables) | Mixed | Hybrid: Import facts + DQ dims.

Transformation language — at a glance

Language | Strength | Typical item
Power Query (M) | Low‑code UI shaping. | Dataflow Gen2
PySpark / Spark SQL | Distributed compute, ML. | Notebook
T‑SQL | Set ops, transactions, views. | Warehouse, SQL endpoint (read)
DAX | Analytical measures in the model. | Semantic model
07 · Common pitfalls & distractor answers

Plausible, but wrong.

The exam uses plausible‑sounding options to test depth of understanding. Twelve of the most common traps, with corrections.

01 · "Lakehouse SQL endpoint supports full DML."
Wrong: The Lakehouse SQL endpoint is read‑only. For full T‑SQL DML, use a Warehouse or write via Spark.

02 · "DirectLake needs a scheduled refresh."
Wrong: DirectLake reads Delta on demand. Only framing / schema refresh is needed — no data copy.

03 · "Calculated columns are fine in a DirectLake model."
Wrong: Calculated columns and tables are not supported in DirectLake. Push derivations upstream into the Lakehouse/Warehouse.

04 · "Shortcuts copy data into OneLake."
Wrong: Shortcuts virtualize data — no copy, no egress. Source changes are visible immediately.

05 · "Mirroring is bidirectional."
Wrong: Mirroring is read‑only in OneLake. Writes go to the source system (Azure SQL, Cosmos DB, Snowflake).

06 · "Dynamic Data Masking encrypts data."
Wrong: DDM masks at presentation only. Privileged users still see real data.

07 · "Git integration replaces deployment pipelines."
Wrong: Git tracks change; deployment pipelines promote items across stages. Use both.

08 · "Partitioning always speeds up a Lakehouse table."
Wrong: High‑cardinality partitioning creates tiny files and hurts performance. Partition only on low‑cardinality filter columns.

09 · "OPTIMIZE and VACUUM do the same thing."
Wrong: OPTIMIZE compacts small files; VACUUM removes obsolete files past retention. Different operations.

10 · "Bi‑directional relationships are the default best practice."
Wrong: Use single‑direction one‑to‑many by default. Bi‑directional filtering introduces ambiguity and performance cost — enable it only when justified.

11 · "Calculated columns are better than measures for performance."
Wrong: Measures compute at query time without inflating model size. Calculated columns bloat memory — prefer measures when possible.

12 · "Endorsement is the same as a sensitivity label."
Wrong: Endorsement (Promoted / Certified) signals trust. Sensitivity labels enforce protection. Different purposes.
08 · Night‑before review · 20 must‑know items

Twenty things you must know.

Review the night before. If any item feels unfamiliar, revisit that topic in the guide.

01 · You can choose between Lakehouse and Warehouse and justify the choice on workload, DML, and transactions.
02 · You understand the medallion (bronze / silver / gold) pattern and each layer's responsibilities.
03 · You can set up Git integration and a deployment pipeline, and know they are complementary.
04 · You can apply workspace, item, row, column, object, and file access controls.
05 · You can implement dynamic RLS with USERPRINCIPALNAME() and a security table.
06 · You understand Shortcuts (zero‑copy) vs. Mirroring (read‑only replica) vs. Copy.
07 · You can ingest via Copy activity, Dataflow Gen2, Notebook, Shortcut, or Mirroring.
08 · You can transform with Power Query (M), PySpark, and T‑SQL, and pick by persona.
09 · You can implement SCD Type 1 and Type 2, deduplication, and late‑arriving facts.
10 · You can maintain Delta tables with V‑Order, OPTIMIZE, Z‑ORDER, and VACUUM — and pick the right one.
11 · You understand Import vs. DirectQuery vs. DirectLake vs. Composite and when to choose each.
12 · You know DirectLake limits: no calculated columns/tables; fallback to DirectQuery.
13 · You can design a star schema with conformed dims, surrogate keys, and proper relationships.
14 · You can write core DAX: CALCULATE, FILTER, time intelligence, iterators, VAR.
15 · You can configure incremental refresh with RangeStart/RangeEnd and detect‑changes.
16 · You can connect via the XMLA endpoint using SSMS, Tabular Editor, or ALM Toolkit.
17 · You can profile data with T‑SQL or Spark (counts, nulls, cardinality, distributions).
18 · You can investigate slow queries via query insights, DMVs, and execution plans.
19 · You recognize common SQL patterns: window functions, top‑N per group, MERGE upsert.
20 · You can design end‑to‑end: ingest → medallion → semantic model → report, all on Fabric.
— Good luck
Focus on scenario‑based reasoning: understand not just what each Fabric item does, but when and why you would pick Lakehouse over Warehouse, DirectLake over Import, or a measure over a calculated column.
Source

Microsoft Learn — Study Guide for Exam DP‑600, Implementing Analytics Solutions Using Microsoft Fabric.

End of guide · DP‑600