DP‑700 · Study Guide
Microsoft Certified · Fabric Data Engineer Associate

Exam DP‑700
Study Guide

Implementing Data Engineering Solutions Using Microsoft Fabric — Lakehouse, Warehouse, Eventhouse, Pipelines, Notebooks, KQL and T‑SQL.

A consolidated, scenario‑first reference organized around the three official exam domains. Every topic is paired with the decisions, trade‑offs, and distractors you will actually encounter on the exam.

Domain weighting
Implement & manage · 30–35%
Ingest & transform · 30–35%
Monitor & optimize · 30–35%
Exam code: DP‑700
Duration: 100 min (120 min seat time)
Questions: 40–60
Passing score: 700 / 1000
00 · Contents

What's inside.

Eight chapters covering the exam blueprint, deep topic notes for all three domains, comparison tables, the traps the exam likes to set, and a twenty‑item pre‑exam checklist.

01 · Exam Domains & Weightage: Blueprint, audience profile, and what the exam tests.
02 · Domain 1 — Implement & Manage: Workspaces, lifecycle, deployments, security, governance.
03 · Domain 2 — Ingest & Transform: Batch & streaming ingest, pipelines, notebooks, dataflows.
04 · Domain 3 — Monitor & Optimize: Monitor Hub, capacity, Delta tuning, Spark & SQL tuning.
05 · Real‑time Intelligence: Eventstreams, eventhouse, KQL patterns, activators.
06 · Service Comparison Tables: Stores, ingest tools, languages, streaming options.
07 · Common Pitfalls & Distractors: Twelve plausible‑but‑wrong answers the exam loves.
08 · Final Checklist — 20 Must‑Knows: Night‑before review list.
01 · Official Exam Domains & Weightage

The blueprint.

The DP‑700 exam validates your ability to implement, manage, and monitor data engineering solutions in Microsoft Fabric — spanning Lakehouse, Warehouse, Eventhouse, pipelines, notebooks, and real‑time intelligence. It sits squarely in the data engineer's world: you are the person who moves, shapes, and keeps data flowing reliably.

Audience profile

Weighting

Domain | Weight | Focus
01 · Implement & Manage a Data Engineering Solution | 30–35% | Workspaces, lifecycle, security
02 · Ingest & Transform Data | 30–35% | Batch, streaming, pipelines, Spark
03 · Monitor & Optimize an Analytics Solution | 30–35% | Monitor, tune Delta, Spark, SQL
Exam alert

All three DP‑700 domains are weighted roughly equally — about one‑third each. You cannot skip any area. Expect Ingest & Transform and Monitor & Optimize to trade questions depending on the form you receive. Know pipelines, notebooks, Delta maintenance, and the Monitor Hub cold.

Format at a glance

Platforms

Fabric · OneLake · Lakehouse · Warehouse · Eventhouse

Question types

Multiple choice · Drag‑and‑drop · Case studies

Languages

SQL · PySpark · KQL · Power Query (M)

02 · Domain 1 · 30–35%

Implement & manage the solution.

2.1   Plan a data engineering environment

2.2   Configure workspaces & items

2.3   Security, governance & sensitivity

Requirement | Feature | Note
Isolate tenants by row | Row‑Level Security (RLS) | Security predicate; Warehouse, Lakehouse SQL endpoint, semantic models.
Hide sensitive columns | Column / Object‑Level Security | DENY SELECT on specific columns (Warehouse).
Mask PII at presentation | Dynamic Data Masking | Not encryption — privileged users still see real data.
Restrict folders in OneLake | OneLake file / folder ACLs | Enforced across every engine reading the path.
Protect downstream exports | Sensitivity labels (Purview) | Labels flow to PBIX, Excel, downstream reports.
Signal trust | Endorsement | Promoted (author) or Certified (admin) — not a security control.
Exam alert

DP‑700 leans on scenario‑based RLS and OneLake ACLs. Know the difference between DDM (masking at read) and OLS (hard deny on a column). DDM is not a security control on its own.

02 · Lifecycle card · Git + deployment pipelines

From notebook to production.

Step 01 · Develop: Author notebooks, pipelines, lakehouses in a dev workspace.
Step 02 · Source control: Workspace bound to a Git branch; commit notebooks, pipelines, shortcuts.
Step 03 · Promote: Deployment pipeline pushes items dev → test → prod with rules.
Step 04 · Operate: Monitor Hub, Capacity Metrics app, pipeline run history, alerts.

Git integration — what's supported

Deployment pipelines — promotion rules

Fabric items a data engineer touches

Item | Primary purpose | Owner persona
Lakehouse | Files + Delta tables over OneLake. | Data engineer
Warehouse | T‑SQL DW with multi‑table transactions. | Data / analytics engineer
Eventhouse | KQL databases for streaming + telemetry. | Data engineer
Notebook | PySpark / Spark SQL transformations. | Data engineer
Data pipeline | Orchestrate copy, notebook, dataflow, SP. | Data engineer
Dataflow Gen2 | Low‑code Power Query ingest + shape. | Analyst / DE
Eventstream | Ingest & route streaming events. | Data engineer
Mirrored DB | Near‑real‑time replica of external source. | Data engineer
02 (cont.) · CI/CD, scheduling & version control

2.4   Version control for pipelines & notebooks

2.5   Scheduling & orchestration

2.6   Environments for Spark

Pro tip

For production pipelines that must be reproducible, pin a custom environment and reference it from every notebook activity. The starter pool is convenient in dev but can shift runtime behavior silently.
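A quick notebook-side sanity check is to print the runtime versions the attached environment actually provides and compare them across dev, test, and prod. A minimal sketch using only standard Python (no Fabric‑specific API assumed):

import sys
import pyspark

# Record the runtime this notebook is really running on; a mismatch across
# stages usually means an activity is still using the starter pool.
print("Python:", sys.version.split()[0])
print("Spark :", pyspark.__version__)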

2.7   Governance: sensitivity labels & endorsements

03 · Domain 2 · 30–35%

Ingest & transform data.

3.1   Pick the right ingest tool

Need | Tool | Why
Scheduled batch copy, 100+ connectors | Data pipeline — Copy activity | Schema mapping, fault tolerance, staging.
Low‑code shape & enrich for analysts | Dataflow Gen2 | Power Query UI; lands results in Lakehouse/Warehouse.
Complex logic, reuse, ML | Notebook (PySpark / Spark SQL) | Distributed compute, code‑first.
Live read of operational DB | Mirroring | Continuous replication into OneLake as Delta.
Virtualize without copy | Shortcut | Point at ADLS, S3, GCS, Dataverse, OneLake.
Real‑time streams | Eventstream | Ingest events → Lakehouse / Eventhouse / custom endpoint.

3.2   Shortcut vs. Mirroring vs. Copy

Exam alert

Watch for "minimal source load" and "near real‑time" in a scenario — they point to Mirroring. "Without copying data" or "avoid duplication" → Shortcut. "Scheduled nightly load with transformations" → Copy activity or Dataflow.

3.3   Batch, incremental & SCD handling

Deduplicate on natural key

SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER
    (PARTITION BY CustomerId
     ORDER BY UpdatedAt DESC) rn
  FROM stg.Customer
) t WHERE rn = 1;

Incremental watermark load

-- pipeline expression: fetch the last stored watermark from a Lookup activity
@{activity('LookupWM').output.firstRow.LastLoad}

-- source query, parameterized with that watermark value
SELECT * FROM src.Orders
WHERE ModifiedDate > @prevWatermark;
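The MERGE upsert shown in section 3.5 covers SCD Type 1 (overwrite in place). For SCD Type 2, a common Delta pattern is two steps: expire the current row when tracked attributes change, then append the new versions. A minimal PySpark sketch, assuming a staging DataFrame stg_customer and hypothetical CustomerKey / Email / City / IsCurrent / ValidFrom / ValidTo columns:

from pyspark.sql import functions as F
from delta.tables import DeltaTable

dim = DeltaTable.forName(spark, "lh.dim.customer")  # placeholder table name

# Step 1: expire current rows whose tracked attributes changed.
(dim.alias("t")
 .merge(stg_customer.alias("s"),
        "t.CustomerKey = s.CustomerKey AND t.IsCurrent = true")
 .whenMatchedUpdate(
     condition="t.Email <> s.Email OR t.City <> s.City",
     set={"IsCurrent": "false", "ValidTo": "current_timestamp()"})
 .execute())

# Step 2: append new versions (brand-new keys plus the rows just expired).
current_keys = (spark.table("lh.dim.customer")
                .where("IsCurrent = true")
                .select("CustomerKey"))
(stg_customer.join(current_keys, "CustomerKey", "left_anti")
 .withColumn("IsCurrent", F.lit(True))
 .withColumn("ValidFrom", F.current_timestamp())
 .withColumn("ValidTo", F.lit(None).cast("timestamp"))
 .write.format("delta").mode("append")
 .saveAsTable("lh.dim.customer"))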

03 (cont.) · Transform with Spark · PySpark patterns

3.4   Notebook basics

3.5   PySpark patterns the exam tests

Read Delta & aggregate

from pyspark.sql import functions as F

df = spark.table("lh.silver.orders")
daily = (df
  .filter(F.col("status")=="paid")
  .groupBy("order_date")
  .agg(F.sum("amount").alias("rev")))
daily.write.format("delta")\
  .mode("overwrite")\
  .saveAsTable("lh.gold.daily_rev")

Upsert into Delta (MERGE)

from delta.tables import DeltaTable

# stg is the staging DataFrame holding the incoming customer rows
tgt = DeltaTable.forName(spark, "lh.dim.customer")
(tgt.alias("t")
 .merge(stg.alias("s"),
        "t.CustomerKey = s.CustomerKey")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())

3.6   T‑SQL transformations in Warehouse

Pro tip

When the exam describes large joins, complex logic, reuse via functions → pick notebook. Analyst with a UI → Dataflow Gen2. Transactional, multi‑table operation → T‑SQL in Warehouse.

03 (cont.) · Streaming ingest · Dataflow Gen2 · Mirroring

3.7   Eventstreams — streaming ingest

3.8   Eventhouse & KQL database

3.9   Dataflow Gen2 — low‑code

If events land every second from IoT devices into a queryable log store → Eventstream → Eventhouse / KQL database.
If analysts want to land CSVs from SharePoint into a Lakehouse with simple shaping → Dataflow Gen2.
If an Azure SQL OLTP DB must be analytically queryable with minimal latency → Mirroring.
If you need scheduled load of a dozen tables with Copy + a stored proc at the end → Data pipeline.
If the source is ADLS and you want OneLake consumers to read it as if it were local → Shortcut.
Exam alert

Dataflow Gen2 has a Fast Copy toggle that bypasses the M engine for large loads. If a scenario cares about throughput and a Power Query destination is given, Fast Copy is often the correct answer.

04 · Domain 3 · 30–35%

Monitor & optimize solutions.

4.1   Monitoring surfaces

4.2   Delta table maintenance

Operation | What it does | When to run
V‑Order | Write‑time Parquet ordering for Fabric engines. | On by default; keep on for Direct Lake / SQL endpoint reads.
OPTIMIZE | Compacts many small files into few larger ones. | After streaming or frequent small writes.
Z‑ORDER BY col | Co‑locates rows on chosen columns for data skipping. | Queries frequently filter on that column.
VACUUM | Removes obsolete files past retention (default 7d). | Reduce storage; after big rewrites.
Partitioning | Physical layout by a low‑cardinality column. | When date/region pruning gives big wins.
Exam alert

OPTIMIZE ≠ VACUUM. OPTIMIZE compacts small files; VACUUM removes tombstoned files after retention. Never partition on a high‑cardinality column — it creates the small‑file problem.
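These maintenance operations can be run straight from a notebook. A minimal Spark SQL sketch, assuming a placeholder table lh.silver.events:

# Compact the small files left by streaming or frequent small writes.
spark.sql("OPTIMIZE lh.silver.events")

# Co-locate rows on a hot filter column to improve data skipping.
spark.sql("OPTIMIZE lh.silver.events ZORDER BY (DeviceId)")

# Remove obsolete files older than the retention window (168 hours = 7 days).
spark.sql("VACUUM lh.silver.events RETAIN 168 HOURS")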

4.3   Capacity & throttling

04 (cont.) · Spark tuning · Warehouse tuning · Error handling

4.4   Tune Spark workloads

4.5   Tune Warehouse / SQL endpoint

4.6   Error handling & resilience

Decide: which remedy?

Symptom | Remedy | Why
Many tiny files on a Delta table | OPTIMIZE | Compacts into larger Parquet files.
Slow filter on a hot column | Z‑ORDER BY col | Co‑locates rows for skipping.
Storage bill growing | VACUUM (with retention) | Removes obsolete files.
One Spark stage far slower than siblings | Salt / repartition / broadcast | Classic skew fix.
Pipeline fails on flaky source | Activity retries + on‑failure branch | Resilience without manual rerun.
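For the skew row above, a minimal PySpark sketch of the broadcast and repartition remedies (salting is omitted; table and column names are placeholders):

from pyspark.sql import functions as F

orders    = spark.table("lh.silver.orders")     # large, skewed fact table
customers = spark.table("lh.silver.customer")   # small dimension

# Broadcast the small side so the join avoids a skewed shuffle entirely.
joined = orders.join(F.broadcast(customers), "CustomerId")

# Or redistribute the large table on a better-spread key before heavy work.
evened = orders.repartition(200, "OrderDate")

# Spark 3.x AQE can also split skewed partitions automatically.
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")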
05 · Real‑time intelligence · KQL patterns

Eventstreams, Eventhouse & KQL.

5.1   Architecture at a glance

5.2   KQL patterns the exam tests

Window aggregate per 5 min

Telemetry
| where Timestamp > ago(1h)
| summarize
  avgTemp = avg(Temp)
  by bin(Timestamp, 5m),
    DeviceId

Find spikes (vs. prior 1h)

Telemetry
| make-series cnt=count() default=0
  on Timestamp step 1m
  by DeviceId
| extend (anomalies, score, base) =
  series_decompose_anomalies(cnt)

Materialized view (auto‑refresh)

.create materialized-view
  DailyRevenue on table Orders
{
  Orders
  | summarize sum(Amount)
    by bin(CreatedAt, 1d)
}

Retention & cache

.alter-merge table Telemetry policy retention
'{"SoftDeletePeriod":"30.00:00:00"}'

.alter table Telemetry policy caching
hot = 7d

5.3   Activator (Reflex)

Pro tip

Turn on OneLake availability on a KQL table when downstream consumers (Power BI DirectLake, notebooks) need the same data without double‑ingesting.

06 · Service comparison tables

Choose the right tool.

Fabric data stores compared

Feature | Lakehouse | Warehouse | Eventhouse (KQL) | Mirrored DB
Primary workload | Files + Delta tables | T‑SQL DW | Streaming telemetry / logs | Replica of external DB
Language | Spark SQL / PySpark | T‑SQL | KQL | T‑SQL (via SQL endpoint)
SQL writes | Read‑only endpoint | Full | Append (ingest) | Read‑only
Multi‑table txn | No | Yes | No | No
Unstructured files | Yes | No | No | No
OneLake availability | Native | Native | Opt‑in per table | Native (Delta)
Best for | Medallion ELT, ML | Star schema, gold serving | Logs, IoT, telemetry | Live analytics on OLTP source

Ingest tools compared

Tool | Best for | Persona
Copy activity | Scheduled batch, 100+ connectors, heavy transforms downstream. | Data engineer
Dataflow Gen2 | Low‑code M shaping with destination. | Analyst / DE
Notebook | Code‑first Spark pipelines. | Data engineer
Shortcut | Zero‑copy virtualization of external data. | Data engineer
Mirroring | Near‑real‑time read‑only replica. | Data engineer
Eventstream | Streaming ingest from Event Hub / IoT / Kafka. | Data engineer

Transformation languages — at a glance

Language | Strength | Typical item
Power Query (M) | Low‑code UI shaping. | Dataflow Gen2
PySpark / Spark SQL | Distributed compute, ML, large joins. | Notebook
T‑SQL | Set ops, transactions, SPs. | Warehouse
KQL | Time‑series, logs, streaming analytics. | Eventhouse / KQL DB
07 · Common pitfalls & distractor answers

Plausible, but wrong.

The exam uses plausible‑sounding options to test depth of understanding. Twelve of the most common traps, with corrections.

01 · Lakehouse SQL endpoint supports full DML.
Wrong. The Lakehouse SQL endpoint is read‑only. For DML use a Warehouse, or write via Spark.
02 · OPTIMIZE and VACUUM do the same thing.
Wrong. OPTIMIZE compacts small files; VACUUM removes files past retention. Different operations.
03 · Shortcuts copy data into OneLake.
Wrong. Shortcuts virtualize — no copy, no egress. Source changes visible immediately.
04 · Mirroring is bidirectional.
Wrong. Mirroring is read‑only in OneLake. Writes go to the source (Azure SQL, Cosmos DB, Snowflake, etc.).
05 · Partitioning always speeds up a Lakehouse table.
Wrong. High‑cardinality partitioning creates tiny files and hurts performance. Partition on low‑cardinality filter columns only.
06 · Z‑ORDER replaces partitioning.
Wrong. Z‑ORDER co‑locates rows within files; partitioning physically segregates data. Complementary, not interchangeable.
07 · Dynamic Data Masking encrypts data.
Wrong. DDM masks at presentation only. Privileged users still see real data.
08 · Git integration replaces deployment pipelines.
Wrong. Git tracks change; deployment pipelines promote items across stages. Use both.
09 · You should schedule notebooks directly (outside pipelines).
Wrong. Wrap notebooks in pipelines for retries, dependency logic, monitoring, and parameters.
10 · Eventhouse is a drop‑in replacement for a Warehouse.
Wrong. Eventhouse (KQL) is optimized for time‑series / logs. Warehouse is a relational DW with multi‑table transactions. Different tools.
11 · Endorsement is the same as a sensitivity label.
Wrong. Endorsement signals trust. Sensitivity labels enforce protection. Different purposes.
12 · Starter pool is fine for production workloads.
Wrong. Starter pool is convenient for dev. Production should use a pinned custom environment + named pool for reproducibility.
08 · Night‑before review · 20 must‑know items

Twenty things you must know.

Review the night before. If any item feels unfamiliar, revisit that topic in the guide.

01 · You can pick between Lakehouse, Warehouse, Eventhouse, Mirrored DB based on workload.
02 · You understand Shortcut vs. Mirroring vs. Copy and can tell them apart in a scenario.
03 · You can set up Git integration + a deployment pipeline with deployment rules.
04 · You can apply workspace, item, row, column, object, OneLake file access controls.
05 · You can implement dynamic RLS with USERPRINCIPALNAME() and a security table.
06 · You can ingest via Copy, Dataflow Gen2, Notebook, Shortcut, Mirroring, Eventstream.
07 · You can transform with PySpark, Spark SQL, T‑SQL, Power Query, and pick by persona.
08 · You can implement SCD Type 1 and Type 2, dedup, watermark incremental load.
09 · You can maintain Delta with V‑Order, OPTIMIZE, Z‑ORDER, VACUUM — and pick the right one.
10 · You can design medallion layers and know what belongs in bronze / silver / gold.
11 · You can author a pipeline with retries, dependencies, If Condition, and notebook activity.
12 · You can build a notebook that reads/writes Delta, merges, and exits cleanly for a pipeline.
13 · You can route an Eventstream to Lakehouse + Eventhouse + Activator.
14 · You can write KQL for time windows, anomalies, materialized views, retention/cache.
15 · You can read Monitor Hub, Capacity Metrics, query insights, Spark UI.
16 · You can tune Spark: skew, broadcast, AQE, partition size, cache.
17 · You can tune Warehouse: stats, result cache, execution plan, star schema, filter early.
18 · You can diagnose throttling via Capacity Metrics and describe remedies.
19 · You can configure Fabric environments with pinned libraries, pool, runtime.
20 · You can design end‑to‑end: ingest → medallion → serve → monitor on Fabric.
Closing
— Good luck
Focus on scenario‑based reasoning: understand not just what each Fabric item does, but when and why you would pick Lakehouse over Warehouse, Mirroring over Shortcut, or a pipeline over a standalone notebook.
Source

Microsoft Learn — Study Guide for Exam DP‑700, Implementing Data Engineering Solutions Using Microsoft Fabric.

End of guide · DP‑700