Back to DP-800 Parquet DP-800 · Study Guide
Microsoft Certified · SQL AI Developer Associate

Exam DP‑800
Study Guide

Developing AI‑Enabled Database Solutions across SQL Server, Azure SQL, and SQL DB in Fabric.

A consolidated, scenario‑first reference organized around the three official exam domains. Every topic is paired with the decisions and distractors you will actually encounter on the exam.

Domain weighting
Design & Develop · 35–40% Secure, Optimize, Deploy · 35–40% AI Capabilities · 25–30%
Exam codeDP‑800
Duration100 min / 120 seat
Questions40 – 60
Passing score700 / 1000
01 / 15
DP‑800 · Study GuideTable of Contents
00Contents

What's inside.

Seven chapters covering the exam blueprint, deep topic notes, comparison tables, the traps this exam likes to set, and a twenty‑item pre‑exam checklist.

01 Exam Domains & Weightage Blueprint, audience profile, and what the exam actually tests. p. 03
02 Domain 1 — Design & Develop Tables, programmability, advanced T‑SQL, AI‑assisted tooling. p. 04
03 Domain 2 — Secure, Optimize, Deploy Security, performance, CI/CD, Azure integration. p. 07
04 Domain 3 — AI Capabilities Embeddings, intelligent search, RAG in T‑SQL. p. 10
05 Service Comparison Tables SQL platforms, encryption methods, DAB vs. custom APIs. p. 12
06 Common Pitfalls & Distractors Twelve plausible‑but‑wrong answers the exam loves. p. 13
07 Final Checklist — 20 Must‑Knows Night‑before review list. p. 14
DP‑80002 / 15
DP‑800 · 01Exam Domains
01Official Exam Domains & Weightage

The blueprint.

The DP‑800 exam validates your ability to design, develop, secure, optimize, and deploy AI‑enabled database solutions across SQL Server, Azure SQL, and SQL databases in Microsoft Fabric.

Audience profile

Weighting

DomainWeightFocus
01   Design & Develop Database Solutions35 – 40%Schema, T‑SQL, AI tooling
02   Secure, Optimize & Deploy35 – 40%Security, perf, CI/CD, Azure
03   Implement AI Capabilities25 – 30%Embeddings, search, RAG
Exam alert

Most questions cover GA features, but commonly used Preview features — T‑SQL vector functions, AI_* functions — may appear. Expect heavy case‑study and scenario‑based questions.

Format at a glance

Platforms

SQL Server · Azure SQL · SQL DB in Fabric

Question types

Multiple choice · Drag‑and‑drop · Case studies

Duration & scoring

100 min · 40–60 items · 700 / 1000 to pass

01 · Domains03 / 15
DP‑800 · 02Domain 1 · Design & Develop
02Domain 1 · 35 – 40%

Design & develop database solutions.

2.1   Design & implement database objects

Core concepts

If →the business needs an immutable audit trail that regulators can verify → use a Ledger table. Temporal tables track history but are mutable by admins; ledger tables provide cryptographic tamper‑evidence.
If →the workload mixes heavy OLTP writes with occasional analytics → add a nonclustered columnstore index for real‑time operational analytics (HTAP).
If →JSON documents are queried by a known set of paths → create computed columns on those paths and index them, rather than scanning JSON at query time.

2.2   Implement programmability objects

Exam alert

Know the difference between inline TVFs (single SELECT, no BEGIN/END) and multi‑statement TVFs. The exam often presents performance scenarios where replacing a multi‑statement TVF with an inline TVF is the correct fix.

02 · Domain 104 / 15
DP‑800 · 02Domain 1 · Design & Develop
02 · cont.Advanced T‑SQL · AI‑assisted tools

2.3   Write advanced T‑SQL

A code‑heavy sub‑domain. Expect to read and write queries using modern T‑SQL features.

Language features

  • CTEs. Recursive CTEs for hierarchical data (org charts, BOMs). Anchor + recursive member pattern.
  • Window functions. ROW_NUMBER, RANK, DENSE_RANK, NTILE for ranking; LAG/LEAD for row comparison; SUM/AVG OVER for running totals.
  • JSON. JSON_OBJECT, JSON_ARRAY, JSON_ARRAYAGG, OPENJSON, JSON_VALUE, JSON_CONTAINS.
  • Correlated subqueries. EXISTS / NOT EXISTS. Know when they outperform JOINs.
  • Error handling. TRY…CATCH, THROW, RAISERROR, XACT_ABORT, ERROR_NUMBER(), ERROR_MESSAGE(), ERROR_LINE().

Text & graph

  • Regular expressions. REGEXP_LIKE, REGEXP_REPLACE, REGEXP_SUBSTR, REGEXP_INSTR, REGEXP_COUNT, REGEXP_MATCHES, REGEXP_SPLIT_TO_TABLE.
  • Fuzzy matching. EDIT_DISTANCE (Levenshtein), EDIT_DISTANCE_SIMILARITY (%), JARO_WINKLER_DISTANCE. Dedup, data cleansing, approximate matching.
  • Graph queries. NODE / EDGE tables with the MATCH operator for pattern matching across relationships.
Pro tip

The new regex and fuzzy‑matching functions are heavily tested. Practice combining REGEXP_LIKE in WHERE clauses for data validation, and EDIT_DISTANCE for approximate‑match scenarios in customer data.

2.4   Design SQL solutions with AI‑assisted tools

A new, distinctive sub‑domain covering Copilot integration and the Model Context Protocol.

Core concepts

If →the team wants Copilot to follow company naming conventions (e.g. tbl_ prefix, snake_case columns) → create a copilot‑instructions.md file with explicit rules.
If →a developer needs AI‑assisted queries over Fabric lakehouse data → configure an MCP server endpoint pointing to the lakehouse.
02 · Domain 105 / 15
DP‑800 · 02Domain 1 · Reference card
02 · cardTable & object decision guide

When to reach for which object.

ObjectChoose when…Watch out for
Temporal tableYou need point‑in‑time queries and a history of changes.History is mutable by admins — not tamper‑evident.
Ledger tableYou need cryptographic tamper‑evidence for compliance or regulatory audit.Heavier write path; schema changes more restricted.
In‑memory (OLTP)Write‑heavy OLTP where latching / locking is the bottleneck.Durability trade‑offs; memory footprint.
Graph (node/edge)Queries traverse relationships — social graph, fraud detection.Avoid for purely relational workloads.
External tableQuery data that lives outside SQL without importing it.Latency and per‑query data movement.
Clustered columnstorePure analytical / reporting workload on large tables.Not ideal for frequent point updates.
Nonclustered columnstoreMix of OLTP writes with occasional analytics (HTAP).Maintenance overhead on the rowstore.

T‑SQL function cheat‑lines

NeedFunctionNote
Scalar from JSONJSON_VALUE(doc,'$.path')Scalars only — not objects.
Object / array from JSONJSON_QUERY, OPENJSONOPENJSON shreds to rows.
Build JSONJSON_OBJECT, JSON_ARRAY, JSON_ARRAYAGGAggregates rows → array.
Regex testREGEXP_LIKE(col,'pattern')Use in WHERE for validation.
Approximate string matchEDIT_DISTANCE, JARO_WINKLER_DISTANCEDedup, fuzzy joins.
Call external RESTsp_invoke_external_rest_endpointCore of RAG pipelines.
Similarity over vectorsVECTOR_DISTANCE, VECTOR_SEARCHENN without index, ANN with.
Mental model

Exam questions rarely ask "what does function X do?" They ask "given this scenario, which feature do you reach for?" — read every question as a decision tree.

02 · Reference06 / 15
DP‑800 · 03Domain 2 · Secure, Optimize, Deploy
03Domain 2 · 35 – 40%

Secure, optimize, & deploy.

3.1   Data security & compliance

Security is foundational: encryption, masking, row‑ and object‑level access, auditing, endpoint security.

If →sensitive data (SSN, credit card) must be protected even from DBAs → use Always Encrypted. DDM is not sufficient — DBAs can still query the real data.
If →multi‑tenant apps need row isolation per tenant → implement Row‑Level Security with a filter predicate keyed on tenant ID.
If →applications need to connect to Azure SQL without storing credentials → use Managed Identity.

3.2   Optimize database performance

03 · Domain 207 / 15
DP‑800 · 03Isolation · CI/CD
03 · cont.Isolation · CI/CD · SQL Database Projects

Transaction isolation levels

LevelDirty readNon‑repeatablePhantomBlocking
READ UNCOMMITTEDyesyesyesMinimal
READ COMMITTED (default)noyesyesLow
REPEATABLE READnonoyesModerate
SERIALIZABLEnononoHigh
SNAPSHOTnononoNone · row vers.
Exam alert

SNAPSHOT isolation eliminates blocking by using row versioning in tempdb — but increases tempdb usage. The exam loves to test when SNAPSHOT is the right choice versus SERIALIZABLE.

3.3   CI/CD with SQL Database Projects

If →unauthorized schema changes are appearing in production → enable schema drift detection in the pipeline and enforce all changes through SQL Database Projects + pull requests.
If →deployments need multi‑environment approval (dev → staging → prod) → configure approval gates and branching policies.
03 · Domain 208 / 15
DP‑800 · 03Azure integration
03 · cont.Integrate SQL with Azure Services

3.4   Integrate SQL solutions with Azure services

Change event handling — which mechanism?

MechanismUse caseGranularity
Change Data Capture (CDC)Full change history for ETL / audit.Row‑level, all columns
Change TrackingLightweight sync — "what changed?"Row‑level, flag only
Change Event Streaming (CES)Real‑time streaming to downstream systems.Event‑based
Azure Functions + SQL triggerServerless reaction to row changes.Row‑level, code‑driven
Azure Logic AppsLow‑code workflow on data changes.Row‑level, connector
If →a frontend needs a quick REST/GraphQL API over existing tables with no backend code → use Data API Builder.
If →downstream systems need real‑time change events → use CES or Azure Functions with the SQL trigger binding.
If →you need full before/after images of every change for audit → CDC. If you only need to know which rows changed → Change Tracking.
03 · Domain 209 / 15
DP‑800 · 04Domain 3 · AI Capabilities
04Domain 3 · 25 – 30%

Implement AI capabilities.

4.1   Models & embeddings

Pro tip

The exam tests your ability to choose the right embedding maintenance strategy. Triggers are simplest but add latency to writes. CDC + Azure Functions is asynchronous and scalable. Choose based on latency tolerance and write throughput.

4.2   Intelligent search

Search typeMatches onBest forLimitation
Full‑TextKeywords / linguistic formsExact term / phrase lookupNo semantic understanding
VectorMeaning / similarityConcept search, synonym handlingNeeds embeddings; approximate
Hybrid (RRF)Keywords + meaningProduction search appsMore complex setup
04 · Domain 310 / 15
DP‑800 · 04Retrieval‑Augmented Generation
04 · cont.Retrieval‑Augmented Generation

4.3   RAG in T‑SQL.

A pattern that grounds language‑model responses in your data. Know the end‑to‑end flow — the exam tests each step individually and as a complete pipeline.

Step 01
User query
Natural‑language question arrives from the app.
Step 02
Retrieve
Embed the query, vector‑search for context rows.
Step 03
Augment
System prompt + context JSON + user question.
Step 04
Invoke
sp_invoke_external_rest_endpoint calls the model.
Step 05
Extract
OPENJSON + JSON_VALUE parse the response.

Core concepts

If →users search by product name and expect exact matches → Full‑Text Search.
If →users describe what they want in natural language ("comfortable running shoes for flat feet") → Vector Search over description embeddings.
If →you need both keyword precision and semantic recall → Hybrid Search with RRF.
If →the dataset has 1M+ rows and needs sub‑second search → ANN with a vector index. ENN would be too slow.
04 · RAG11 / 15
DP‑800 · 05Service comparison tables
05Service comparison tables

Choose the right platform.

SQL platform comparison

FeatureSQL Server · on‑premAzure SQL DatabaseSQL DB in Fabric
DeploymentSelf‑managed VM / bare metalFully managed PaaSFabric workspace
ScalingManual — add CPU/RAMDTU or vCore auto‑scaleFabric capacity units
AI integrationExternal REST callsNative + REST endpointsDeep Fabric AI/ML
CopilotGitHub Copilot in IDEGitHub Copilot + portalCopilot in Fabric
Graph tablesyesyesyes
Ledger tables2022+yesyes
Vector supportpreviewpreview / GApreview / GA
CI/CDSQL Database ProjectsSQL Database ProjectsFabric Git integration

Encryption methods

MethodKey locationServer can read?Use case
Always EncryptedClient onlynoProtect from DBAs & cloud admins
AE + EnclavesSecure enclaveIn enclave onlyAE + range queries, LIKE
Column‑level (T‑SQL)DatabaseyesFlexible server‑side processing
TDEServer / HSMYes (data at rest)Full‑database at‑rest encryption
Dynamic Data MaskingN/A — no encryptionyesPresentation‑layer obfuscation

Data API Builder vs. custom API

AspectData API BuilderCustom API · App Service / Functions
Code requiredZero — config‑drivenFull application code
ProtocolsREST + GraphQLAny
Setup timeMinutesHours to days
CustomizationLimited (config options)Unlimited
AuthEntra ID, API keysAny auth provider
Best forStandard CRUD over DB objectsComplex business logic
05 · Comparison12 / 15
DP‑800 · 06Common pitfalls
06Common pitfalls & distractor answers

Plausible, but wrong.

The exam uses plausible‑sounding options to test depth of understanding. Below are the twelve most common traps — read each distractor, then the correction.

01
Dynamic Data Masking protects data from admins.
WrongDDM is presentation‑layer only. Users with UNMASK or db_owner see real data. For true protection use Always Encrypted.
02
Temporal tables provide tamper‑proof audit.
WrongAdmins can modify history tables. For tamper‑evidence use Ledger tables with cryptographic hashes.
03
Multi‑statement TVFs perform like inline TVFs.
WrongMulti‑statement TVFs have fixed cardinality estimates and prevent optimizer push‑down. Inline TVFs are expanded like views.
04
IDENTITY is always better than SEQUENCE.
WrongSEQUENCE generates values before INSERT and works across multiple tables. IDENTITY is table‑bound.
05
ENN is preferred for production.
WrongENN scans all vectors — too slow for large datasets. Use ANN with a vector index for production.
06
CDC and Change Tracking are interchangeable.
WrongCDC captures full before/after images; Change Tracking only tells you which rows changed. CDC is heavier but richer.
07
SNAPSHOT isolation has no cost.
WrongSNAPSHOT uses row versioning in tempdb — I/O and space cost. Prevents blocking but not write‑write conflicts.
08
Always Encrypted supports all query operations.
WrongStandard AE supports equality only (deterministic) or no comparisons (randomized). Use Enclaves for range / LIKE.
09
Vector indexes are required for vector search.
WrongVECTOR_DISTANCE works without an index (ENN). Indexes enable ANN for performance, but are not mandatory.
10
Triggers are the best way to maintain embeddings.
WrongTriggers are synchronous and slow writes. For high‑throughput tables use async methods — CDC + Functions, CES.
11
RRF replaces the need for both search types.
WrongReciprocal Rank Fusion merges results from full‑text AND vector search. You still need both implementations; RRF combines them.
12
JSON_VALUE can return entire objects.
WrongJSON_VALUE returns scalars only. Use OPENJSON or JSON_QUERY to extract objects or arrays.
06 · Pitfalls13 / 15
DP‑800 · 07Final checklist
07Night‑before review · 20 must‑know items

Twenty things you must know.

Review the night before. If any item feels unfamiliar, revisit that topic in the guide before going to bed.

01You can explain when to use temporal vs. ledger vs. graph vs. in‑memory tables.
02You can write a recursive CTE for hierarchical data.
03You know all JSON functions: JSON_OBJECT, JSON_ARRAY, JSON_ARRAYAGG, OPENJSON, JSON_VALUE, JSON_CONTAINS.
04You can use window functionsROW_NUMBER, RANK, LAG/LEAD, running SUM.
05You understand the new regex functions (REGEXP_LIKE, REGEXP_REPLACE, …) and can write basic patterns.
06You know the difference between EDIT_DISTANCE and JARO_WINKLER_DISTANCE.
07You can write a graph MATCH query.
08You can configure GitHub Copilot instruction files and MCP server endpoints.
09You understand Always Encrypted (standard vs. enclaves) and when to use each.
10You can implement Dynamic Data Masking and explain its limitations.
11You can implement Row‑Level Security with filter and block predicates.
12You can read a query execution plan and identify key lookups, scans, implicit conversions.
13You can explain all five transaction isolation levels and their trade‑offs.
14You can diagnose blocking and deadlocks using DMVs.
15You understand SQL Database Projects, schema drift, and CI/CD pipeline controls.
16You can configure Data API Builder for REST and GraphQL endpoints.
17You know the differences between CDC, Change Tracking, CES, and SQL trigger bindings.
18You can design an embedding pipeline — columns, chunking, generation, maintenance.
19You understand ANN vs. ENN, vector indexes, and VECTOR_DISTANCE / VECTOR_SEARCH.
20You can implement end‑to‑end RAG — embed query → vector search → build prompt → call LLM → parse response.
07 · Checklist14 / 15
DP‑800 · ColophonClosing
— Good luck
Focus on scenario‑based reasoning: understand not just what each technology does, but when and why you would choose it over the alternatives.
Source

Microsoft Learn — Study Guide for Exam DP‑800.

End of guide · DP‑800
DP‑80015 / 15