A consolidated, scenario‑first reference organized around the three official exam domains. Every topic is paired with the decisions, trade‑offs, and distractors you will actually encounter on the exam.
Eight chapters covering the exam blueprint, deep topic notes for all three domains, comparison tables, the traps the exam likes to set, and a twenty‑item pre‑exam checklist.
The DP‑700 exam validates your ability to implement, manage, and monitor data engineering solutions in Microsoft Fabric — spanning Lakehouse, Warehouse, Eventhouse, pipelines, notebooks, and real‑time intelligence. It sits squarely in the data engineer's world: you are the person who moves, shapes, and keeps data flowing reliably.
| Domain | Weight | Focus |
|---|---|---|
| 01 Implement & Manage a Data Engineering Solution | 30–35% | Workspaces, lifecycle, security |
| 02 Ingest & Transform Data | 30–35% | Batch, streaming, pipelines, Spark |
| 03 Monitor & Optimize an Analytics Solution | 30–35% | Monitor, tune Delta, Spark, SQL |
All three DP‑700 domains are weighted roughly equally — about one‑third each. You cannot skip any area. Expect Ingest & Transform and Monitor & Optimize to trade questions depending on the form you receive. Know pipelines, notebooks, Delta maintenance, and the Monitor Hub cold.
Fabric · OneLake · Lakehouse · Warehouse · Eventhouse
Multiple choice · Drag‑and‑drop · Case studies
SQL · PySpark · KQL · Power Query (M)
A notebook's pinned default lakehouse drives lakehouse.default path resolution. Pin carefully to avoid accidental writes.

| Requirement | Feature | Note |
|---|---|---|
| Isolate tenants by row | Row‑Level Security (RLS) | Security predicate; Warehouse, Lakehouse SQL endpoint, semantic models. |
| Hide sensitive columns | Column / Object‑Level Security | DENY SELECT on specific columns (Warehouse). |
| Mask PII at presentation | Dynamic Data Masking | Not encryption — privileged users still see real data. |
| Restrict folders in OneLake | OneLake file / folder ACLs | Enforced across every engine reading the path. |
| Protect downstream exports | Sensitivity labels (Purview) | Labels flow to PBIX, Excel, downstream reports. |
| Signal trust | Endorsement | Promoted (author) or Certified (admin) — not a security control. |
DP‑700 leans on scenario‑based RLS and OneLake ACLs. Know the difference between DDM (masking at read) and OLS (hard deny on a column). DDM is not a security control on its own.
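A minimal T‑SQL sketch of how these controls differ, assuming a hypothetical dbo.Sales table in a Warehouse with OwnerUser and CustomerEmail columns (all object names illustrative):

```sql
-- Hypothetical Warehouse table dbo.Sales; all names illustrative.
CREATE SCHEMA sec;
GO
-- RLS: an inline predicate function, bound by a security policy,
-- silently filters rows on every read.
CREATE FUNCTION sec.fn_TenantPredicate(@OwnerUser AS varchar(128))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS allowed
       WHERE @OwnerUser = USER_NAME();
GO
CREATE SECURITY POLICY sec.TenantFilter
ADD FILTER PREDICATE sec.fn_TenantPredicate(OwnerUser) ON dbo.Sales
WITH (STATE = ON);
GO
-- DDM: masks at presentation only; principals granted UNMASK still see real values.
ALTER TABLE dbo.Sales
ALTER COLUMN CustomerEmail ADD MASKED WITH (FUNCTION = 'email()');

-- OLS by contrast is a hard deny on the column, not a mask.
DENY SELECT ON dbo.Sales (CustomerEmail) TO [AnalystRole];
```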
| Item | Primary purpose | Owner‑persona |
|---|---|---|
| Lakehouse | Files + Delta tables over OneLake. | Data engineer |
| Warehouse | T‑SQL DW with multi‑table transactions. | Data / analytics engineer |
| Eventhouse | KQL databases for streaming + telemetry. | Data engineer |
| Notebook | PySpark / SparkSQL transformations. | Data engineer |
| Data pipeline | Orchestrate copy, notebook, dataflow, SP. | Data engineer |
| Dataflow Gen2 | Low‑code Power Query ingest + shape. | Analyst / DE |
| Eventstream | Ingest & route streaming events. | Data engineer |
| Mirrored DB | Near‑real‑time replica of external source. | Data engineer |
- Notebooks sync to Git as .ipynb plus Fabric metadata. Use branches for feature work.
- Control branch logic in pipelines with If Condition, Until, and activity dependencies (success / failure / skipped / completed).
- Install session-scoped libraries with %pip install in a notebook session.
- For production pipelines that must be reproducible, pin a custom environment and reference it from every notebook activity. The starter pool is convenient in dev but can shift runtime behavior silently.
| Need | Tool | Why |
|---|---|---|
| Scheduled batch copy, 100+ connectors | Data pipeline — Copy activity | Schema mapping, fault tolerance, staging. |
| Low‑code shape & enrich for analysts | Dataflow Gen2 | Power Query UI; lands results in Lakehouse/Warehouse. |
| Complex logic, reuse, ML | Notebook (PySpark / Spark SQL) | Distributed compute, code‑first. |
| Live read of operational DB | Mirroring | Continuous replication into OneLake as Delta. |
| Virtualize without copy | Shortcut | Point at ADLS, S3, GCS, Dataverse, OneLake. |
| Real‑time streams | Eventstream | Ingest events → Lakehouse / Eventhouse / custom endpoint. |
Watch for "minimal source load" and "near real‑time" in a scenario — they point to Mirroring. "Without copying data" or "avoid duplication" → Shortcut. "Scheduled nightly load with transformations" → Copy activity or Dataflow.
- Watermark pattern: read the stored LastModified in a lookup, copy rows above the watermark, then update the watermark. Rebuild the parameter in pipeline expressions.
- SCD Type 1: MERGE ... WHEN MATCHED UPDATE — overwrite the row.
- SCD Type 2: expire the current row (IsCurrent=0, set EffectiveTo), then insert the new version (a hedged sketch follows the snippets below).
- Deduplicate with ROW_NUMBER() OVER (PARTITION BY nk ORDER BY ts DESC), keep rn = 1.
```sql
-- Deduplicate: keep only the latest row per business key
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER
    (PARTITION BY CustomerId
     ORDER BY UpdatedAt DESC) rn
  FROM stg.Customer
) t WHERE rn = 1;
```

```sql
-- pipeline expression: inject the stored watermark into the source query
@{activity('LookupWM').output.firstRow.LastLoad}

SELECT * FROM src.Orders
WHERE ModifiedDate > @prevWatermark;
```
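And the SCD Type 2 flow above as a hedged two-statement T‑SQL sketch — dim.Customer, stg.Customer, and the tracked Email column are hypothetical:

```sql
-- 1) Expire the current version of rows whose tracked attributes changed
UPDATE d
SET    d.IsCurrent = 0,
       d.EffectiveTo = SYSUTCDATETIME()
FROM   dim.Customer AS d
JOIN   stg.Customer AS s
       ON s.CustomerId = d.CustomerId
WHERE  d.IsCurrent = 1
  AND  s.Email <> d.Email;          -- any tracked attribute changed

-- 2) Insert a new current version for changed or brand-new keys
INSERT INTO dim.Customer
       (CustomerId, Email, IsCurrent, EffectiveFrom, EffectiveTo)
SELECT s.CustomerId, s.Email, 1, SYSUTCDATETIME(), NULL
FROM   stg.Customer AS s
LEFT JOIN dim.Customer AS d
       ON d.CustomerId = s.CustomerId AND d.IsCurrent = 1
WHERE  d.CustomerId IS NULL;        -- no current row left -> needs a new version
```

Step 1 flips IsCurrent on changed rows first, so step 2's anti-join picks up both brand-new keys and the just-expired ones in a single pass.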
- Switch cell language with magics (%%sql, %%pyspark).
- The pinned default lakehouse is mounted under /lakehouse/default/ (Files or Tables).
- Utilities: mssparkutils.fs, mssparkutils.notebook.run, mssparkutils.credentials.
- Read with spark.read.format("delta").load(path) or spark.table("lh.schema.tbl").
- Transform with df.filter(...).groupBy(...).agg(...); chain, don't materialize.
- Write with df.write.format("delta").mode("append").saveAsTable("lh.schema.tbl").
- Upsert with DeltaTable.forName(spark,"t").alias("t").merge(source.alias("s"), "t.k=s.k").whenMatchedUpdateAll().whenNotMatchedInsertAll().execute().
- Broadcast small dimensions with F.broadcast(dim).
- Prefer built-in F.* functions — Catalyst can optimize them; UDFs are black boxes.
```python
from pyspark.sql import functions as F

# Daily revenue: filter paid orders, aggregate, overwrite the gold table
df = spark.table("lh.silver.orders")
daily = (df
    .filter(F.col("status") == "paid")
    .groupBy("order_date")
    .agg(F.sum("amount").alias("rev")))
daily.write.format("delta") \
    .mode("overwrite") \
    .saveAsTable("lh.gold.daily_rev")
```

```python
from delta.tables import DeltaTable

# Upsert a staging DataFrame (stg) into the customer dimension
tgt = DeltaTable.forName(spark, "lh.dim.customer")
(tgt.alias("t")
    .merge(stg.alias("s"),
           "t.CustomerKey = s.CustomerKey")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```
Use OPENROWSET over Parquet in OneLake for ad‑hoc reads without ingestion.

When the exam describes large joins, complex logic, reuse via functions → pick notebook. Analyst with a UI → Dataflow Gen2. Transactional, multi‑table operation → T‑SQL in Warehouse.
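A hedged sketch of the OPENROWSET pattern above; the OneLake URL and folder are placeholders for your own workspace and lakehouse:

```sql
-- Ad-hoc read over Parquet in OneLake without ingesting it.
-- The path is a placeholder; substitute your workspace/lakehouse names.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://onelake.dfs.fabric.microsoft.com/<workspace>/<lakehouse>.Lakehouse/Files/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS r;
```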
Incremental refresh in Dataflow Gen2 relies on the RangeStart / RangeEnd parameters and a detect‑changes column. Dataflow Gen2 also has a Fast Copy toggle that bypasses the M engine for large loads. If a scenario cares about throughput and a Power Query destination is given, Fast Copy is often the correct answer.
Warehouse monitoring lives in the queryinsights schema: exec_requests_history, long_running_queries, frequently_run_queries.

| Operation | What it does | When to run |
|---|---|---|
| V‑Order | Write‑time Parquet ordering for Fabric engines. | On by default; keep on for DirectLake / SQL endpoint reads. |
| OPTIMIZE | Compacts many small files into few larger ones. | After streaming or frequent small writes. |
| Z‑ORDER BY col | Co‑locates rows on chosen columns for data skipping. | Queries frequently filter on that column. |
| VACUUM | Removes obsolete files past retention (default 7d). | Reduce storage; after big rewrites. |
| Partitioning | Physical layout by a low‑cardinality column. | When date/region pruning gives big wins. |
OPTIMIZE ≠ VACUUM. OPTIMIZE compacts small files; VACUUM removes tombstoned files after retention. Never partition on a high‑cardinality column — it creates the small‑file problem.
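The same maintenance operations as Spark SQL statements — a sketch against a hypothetical lh.silver.orders table; the column and retention window are illustrative:

```sql
-- Compact small files, co-locating rows on a hot filter column
OPTIMIZE lh.silver.orders ZORDER BY (CustomerId);

-- Remove tombstoned files older than the 7-day default retention (168 hours)
VACUUM lh.silver.orders RETAIN 168 HOURS;
```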
- Fix skew: salt the key, repartition, or broadcast the small side.
- Use F.broadcast(small_df) when the small side fits comfortably in executor memory.
- Use df.cache() for a DataFrame reused many times in one session. Drop with unpersist().
- Query queryinsights.exec_requests_history to find long / frequent queries (example after the table below).
- Run UPDATE STATISTICS after large loads if estimates look off.
- Avoid SELECT *, filter early, prefer set operations.
- Use a Fail activity to raise a meaningful error.
- Branch on failure to run cleanup or alert; on completion for must‑run steps.
- Call mssparkutils.notebook.exit("ok") to pass a signal back to the calling pipeline.

| Symptom | Remedy | Why |
|---|---|---|
| Many tiny files on a Delta table | OPTIMIZE | Compacts into larger Parquet files. |
| Slow filter on a hot column | Z‑ORDER BY col | Co‑locates rows for skipping. |
| Storage bill growing | VACUUM (with retention) | Removes obsolete files. |
| One Spark stage far slower than siblings | Salt / repartition / broadcast | Classic skew fix. |
| Pipeline fails on flaky source | Activity retries + on‑failure branch | Resilience without manual rerun. |
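The queryinsights lookup from the tuning list above, as a hedged T‑SQL sketch (column names follow the documented queryinsights schema; verify in your Warehouse):

```sql
-- Ten slowest recent statements from Warehouse query insights
SELECT TOP 10
       distributed_statement_id,
       command,
       start_time,
       total_elapsed_time_ms
FROM   queryinsights.exec_requests_history
ORDER  BY total_elapsed_time_ms DESC;
```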
```kql
// Average temperature per device in 5-minute bins over the last hour
Telemetry
| where Timestamp > ago(1h)
| summarize
    avgTemp = avg(Temp)
    by bin(Timestamp, 5m),
       DeviceId
```

```kql
// Flag anomalous per-device event counts on a 1-minute series
Telemetry
| make-series cnt=count() default=0
    on Timestamp step 1m
    by DeviceId
| extend (anomalies, score, base) =
    series_decompose_anomalies(cnt)
```

```kql
// Materialized view: pre-aggregated daily revenue
.create materialized-view
DailyRevenue on table Orders
{
    Orders
    | summarize sum(Amount)
        by bin(CreatedAt, 1d)
}
```

```kql
// Run management commands one at a time.
// Retention: soft-delete rows after 30 days.
.alter-merge table Telemetry policy retention
'{"SoftDeletePeriod":"30.00:00:00"}'

// Caching: keep the most recent 7 days on hot storage.
.alter table Telemetry policy caching
hot = 7d
```
Turn on OneLake availability on a KQL table when downstream consumers (Power BI DirectLake, notebooks) need the same data without double‑ingesting.
| Feature | Lakehouse | Warehouse | Eventhouse (KQL) | Mirrored DB |
|---|---|---|---|---|
| Primary workload | Files + Delta tables | T‑SQL DW | Streaming telemetry / logs | Replica of external DB |
| Language | Spark SQL / PySpark | T‑SQL | KQL | T‑SQL (via SQL endpoint) |
| SQL writes | Read‑only (SQL endpoint) | Full | Append (ingest) | Read‑only |
| Multi‑table txn | No | Yes | No | No |
| Unstructured files | Yes | No | No | No |
| OneLake availability | Native | Native | Opt‑in per table | Native (Delta) |
| Best for | Medallion ELT, ML | Star schema, gold serving | Logs, IoT, telemetry | Live analytics on OLTP source |
| Tool | Best for | Persona |
|---|---|---|
| Copy activity | Scheduled batch, 100+ connectors, heavy transforms downstream. | Data engineer |
| Dataflow Gen2 | Low‑code M shaping with destination. | Analyst / DE |
| Notebook | Code‑first Spark pipelines. | Data engineer |
| Shortcut | Zero‑copy virtualization of external data. | Data engineer |
| Mirroring | Near‑real‑time read‑only replica. | Data engineer |
| Eventstream | Streaming ingest from Event Hub / IoT / Kafka. | Data engineer |
| Language | Strength | Typical item |
|---|---|---|
| Power Query (M) | Low‑code UI shaping. | Dataflow Gen2 |
| PySpark / Spark SQL | Distributed compute, ML, large joins. | Notebook |
| T‑SQL | Set ops, transactions, SPs. | Warehouse |
| KQL | Time‑series, logs, streaming analytics. | Eventhouse / KQL DB |
The exam uses plausible‑sounding options to test depth of understanding. Twelve of the most common traps, with corrections.
Review the night before. If any item feels unfamiliar, revisit that topic in the guide.
- Dynamic RLS with USERPRINCIPALNAME() and a security table.
- Pipeline orchestration with If Condition and notebook activities.

Reference: Microsoft Learn — Study Guide for Exam DP‑700, Implementing Data Engineering Solutions Using Microsoft Fabric.