Microsoft Azure Data Scientist Associate
225 practice questions
Last reviewed: April 2026
Personal notes and resource links for your study journey
DP-100 validates the day-to-day skills of a data scientist working on Azure: designing ML solutions, exploring and preparing data, training and deploying models in Azure Machine Learning, and, since the 2024 refresh, optimizing language models for AI applications. The audience is practicing data scientists and ML engineers who write Python against the Azure ML SDK / CLI v2 and use Azure ML studio. The exam is heavier on Azure-specific implementation than on classical statistics or algorithm theory: expect 40–60 questions in 100 minutes, including code-completion drag-and-drops, scenario items, and at least one case study.
About 22%. Choosing compute and storage for ML workloads, Azure ML workspaces, datastores and data assets, environments, and responsible-AI considerations at design time.
About 22%. Azure ML notebooks, AutoML for classification / regression / forecasting / NLP / CV, Azure ML designer, and basic MLflow integration for experiment tracking.
Largest classical-ML domain at 28%. Training jobs (script and command jobs), distributed training, hyperparameter sweep jobs, model registration, managed online endpoints, batch endpoints, and pipelines.
New domain added in 2024 at 28% weight. Prompt flow, fine-tuning foundation models in Azure ML / Azure AI Foundry, evaluating LLM applications, RAG patterns, and responsible-AI controls for generative scenarios.
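As a rough planning aid, the four domain weights listed above (22%, 22%, 28%, 28%) can be turned into an approximate question count or study-hour split. The sketch below is illustrative only: the short domain labels and the `allocate` helper are names invented for this example, and the exam does not guarantee a fixed number of questions per domain.

```python
# Domain weights as listed above. The short labels are this example's own
# shorthand, not official domain titles.
DOMAIN_WEIGHTS = {
    "Domain 1 (design)": 22,
    "Domain 2 (explore and experiment)": 22,
    "Domain 3 (train and deploy)": 28,
    "Domain 4 (language models)": 28,
}


def allocate(total, weights=DOMAIN_WEIGHTS):
    """Split `total` (questions or study hours) in proportion to the weights."""
    weight_sum = sum(weights.values())
    return {name: total * w / weight_sum for name, w in weights.items()}


if __name__ == "__main__":
    # Approximate share of a 50-question exam per domain.
    for name, share in allocate(50).items():
        print(f"{name}: about {share:.0f} of 50 questions")
```

The same helper works for study time: `allocate(80)` splits an 80-hour plan in the same proportions.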
Services you'll encounter on the exam and why each one matters.
End-to-end managed ML platform: workspaces, compute, datastores, environments, jobs, registries, and managed inference endpoints across the full lifecycle.
Why it's on the exam: Azure ML is the umbrella service spanning every DP-100 domain; expect questions on workspace setup, compute selection, asset versioning, and v2 CLI/SDK usage.
Web-based workspace for Azure ML: notebooks, experiment tracking, asset browsers, compute management, and one-click model deployment.
Why it's on the exam: Domain 2 (Explore data and run experiments) tests Studio as the surface for kicking off jobs, viewing run metrics, and comparing experiments side-by-side.
Drag-and-drop visual interface for building, training, and deploying ML pipelines without writing code, with built-in dataset and transform modules.
Why it's on the exam: Domain 1 surfaces Designer as the low-code path for preparing data and assembling training pipelines; distinguish it from SDK/CLI v2 workflows.
Automated training that sweeps algorithms, featurization, and hyperparameters across classification, regression, forecasting, NLP, and vision tasks.
Why it's on the exam: Domain 3 (Train and deploy models) tests AutoML for baseline models, model selection at scale, and surfacing the best run for registration.
Versioned multi-step orchestration for data prep, training, evaluation, and deployment, declared via YAML jobs or the Python SDK v2.
Why it's on the exam: Reproducibility and reusability scenarios in Domains 2 and 3 name Pipelines as the canonical answer over ad-hoc scripts or notebooks.
Workspace-scoped registry of versioned models with stages, tags, and signed lineage back to the training job and dataset version.
Why it's on the exam: Domain 3 promotion-to-production scenarios test the Model Registry as the auditable handoff between training and inference deployment.
Managed online (low-latency real-time) and batch endpoints for hosted inference, with traffic splitting, autoscaling, and managed identity.
Why it's on the exam: Domain 3 frequently asks online-vs-batch tradeoffs and blue/green traffic-split rollouts; managed endpoints are the named primitive.
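To make the blue/green traffic-split idea concrete, here is a minimal pure-Python simulation of percentage-based routing between two deployments. It is a conceptual sketch, not Azure ML's implementation: the deployment names `blue` and `green`, the helper functions, and the rule that percentages must sum to 100 are all assumptions made for this example.

```python
import random


def validate_traffic(traffic):
    """Check that a {deployment_name: percent} mapping is a usable split."""
    if any(percent < 0 for percent in traffic.values()):
        raise ValueError("traffic percentages must be non-negative")
    if sum(traffic.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")


def route(traffic, rng):
    """Pick one deployment for a single request, weighted by its percentage."""
    names = list(traffic)
    weights = [traffic[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def simulate(traffic, n_requests, seed=0):
    """Route `n_requests` simulated requests and count hits per deployment."""
    validate_traffic(traffic)
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    counts = dict.fromkeys(traffic, 0)
    for _ in range(n_requests):
        counts[route(traffic, rng)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical rollout: 90% to the existing deployment, 10% to the new one.
    print(simulate({"blue": 90, "green": 10}, n_requests=10_000))
```

In a rollout of this kind, the share sent to the new deployment is typically raised in steps (for example 10, then 50, then 100) while error rates and latency are watched at each step.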
Unified workspace for building generative-AI apps: model catalog, fine-tuning, evaluation flows, prompt orchestration, and content-safety integration.
Why it's on the exam: Domain 4 (Optimize language models for AI applications) is anchored on AI Foundry as the platform for foundation-model selection, fine-tuning, and evaluation.
Managed Apache Spark + Delta Lake platform with MLflow tracking, distributed training, and tight Azure ML interop for large-scale data prep and modeling.
Why it's on the exam: Domain 1 large-data scenarios prefer Databricks for distributed feature engineering and PySpark transforms outside Azure ML compute.
Unified analytics platform combining dedicated/serverless SQL pools, Spark pools, and pipelines for warehouse-scale data preparation feeding Azure ML.
Why it's on the exam: Domain 1 questions on enterprise data sources for training name Synapse as the warehouse-side source connector for Azure ML datastores.
Hierarchical-namespace blob storage tier optimized for analytics workloads, the default backing store for Azure ML datastores and training assets.
Why it's on the exam: Every DP-100 data-prep scenario assumes ADLS Gen2 as the data substrate: datastore registration, ACLs, and lifecycle policies all surface in Domain 1.
Managed access to OpenAI foundation models (GPT-4o, GPT-4.1, o-series, embedding models) with fine-tuning, content filters, and Entra-ID authentication.
Why it's on the exam: Domain 4 fine-tuning, embedding, and prompt-engineering scenarios for language-model apps are anchored on Azure OpenAI Service.
Native MLflow tracking server inside every Azure ML workspace: log params, metrics, artifacts, and models with the open MLflow SDK.
Why it's on the exam: Domain 2 experiment-tracking questions name MLflow as the canonical API; expect distractors comparing MLflow autolog to manual job-output capture.
Managed compute clusters (CPU/GPU), compute instances, and serverless training options with autoscaling, low-priority pricing, and spot tiers.
Why it's on the exam: Domain 3 training scenarios test CPU-vs-GPU selection, cluster sizing for distributed training, and quota planning for hyperparameter sweeps.
Visual + code-first authoring of LLM workflows β prompt templates, chained tool calls, evaluation flows, and batch-run grading against test sets.
Why it's on the exam: Domain 4 RAG, prompt-engineering, and evaluation scenarios are tested through Prompt Flow as the production-grade authoring surface.
Serverless event-driven compute for lightweight real-time inference, model-output post-processing, and stitching Azure ML calls into business workflows.
Why it's on the exam: Domain 3 deployment-pattern questions distinguish managed online endpoints from custom Functions-based inference for cold-start, payload-size, or cost reasons.
Identity platform providing user/service-principal authentication, managed identities, RBAC roles, and Conditional Access for every Azure ML resource.
Why it's on the exam: Domain 1 workspace setup and Domain 3 deployment access-control scenarios name Entra ID managed identities as the AAD-native way to authorize compute and endpoints.
Managed secrets, keys, and certificates store for connection strings, model API keys, and customer-managed keys protecting training data and artifacts.
Why it's on the exam: Customer-managed key encryption for Azure ML workspace data and secret retrieval from training jobs both name Key Vault as the answer in Domain 1.
Metrics, logs, alerts, and Application Insights coverage for Azure ML endpoints, training jobs, and data drift signals via Log Analytics workspaces.
Why it's on the exam: Domain 3 production-monitoring scenarios test Azure Monitor + Log Analytics for endpoint latency/error alerts and surfacing model-drift detections to ops teams.
Unified data governance for scanning datastores, classifying sensitive data, mapping lineage, and enforcing access policies across the analytics estate.
Why it's on the exam: Responsible-AI and data-governance questions in Domains 1 and 4 reference Purview for training-data lineage, PII classification, and lineage-aware model release gates.
$115k – $165k – $230k USD annual
Range covers US-based mid-to-senior data scientists where Azure ML proficiency is required. FAANG / unicorn applied scientists often clear $300k in total compensation. The cert is a screening signal; demonstrated modeling experience and publication / Kaggle / open-source presence drive the high end.
Source: levels.fyi 2025 data scientist / ML engineer roles, U.S. BLS OEWS May 2024 (15-2051 data scientists, 15-2099 ML scientists), Glassdoor 2025. Figures are approximate; actual compensation depends on role, region, and experience.
DP-100 has held steady demand as enterprises operationalize ML on Azure ML and increasingly on Azure AI Foundry. Recruiters treat it as the canonical Azure ML proof point, most useful for data scientists who need to demonstrate they can ship beyond a notebook into managed endpoints and pipelines. The 2024 LLM-optimization domain has made DP-100 more attractive to GenAI engineers as well. It pairs naturally with AI-102 for engineers building production GenAI apps and with DP-203 / DP-700 for data-engineer-leaning ML practitioners.
There are no formal prerequisites, but DP-100 assumes practitioner-level data-science skills going in. Microsoft's outline expects fluency in Python, the scikit-learn / pandas / NumPy stack, and the core ML workflow (split, train, evaluate, deploy). DP-900 is a useful conceptual on-ramp for candidates new to Azure data services, but is not required.
The official Microsoft Learn path covers all four domains in roughly 30–40 hours, focused on Azure ML SDK / CLI v2 and prompt flow. Hands-on time is essentially required: a personal Azure subscription with a small Azure ML workspace, plus 10+ hours running real training jobs, model deployments, and prompt-flow runs. The 2024 LLM-optimization domain is under-covered by older third-party material, so candidates should rely on Microsoft Learn modules for that area.
DP-100 sits in the Associate tier and is generally considered moderately difficult: easier than AZ-204 / AI-102 for experienced data scientists, harder for engineers new to ML. Plan on 60–100 hours of study over 6–10 weeks with prior data-science experience; substantially longer if Python ML is new to you. The exam runs about 100 minutes with 40–60 questions in multiple-choice, multiple-response, drag-and-drop (including code-completion), hot-area, and case-study formats.
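The figures above imply a simple pacing budget, sketched below. The helper names are invented for this example, and the arithmetic assumes time is spread evenly across questions and weeks, which real exam sittings and study schedules will not match exactly.

```python
def minutes_per_question(total_minutes, n_questions):
    """Average time available per question if time is spread evenly."""
    return total_minutes / n_questions


def weekly_hours(total_hours, weeks):
    """Average study hours per week for a given plan."""
    return total_hours / weeks


if __name__ == "__main__":
    # Exam pacing: about 100 minutes for 40 to 60 questions.
    print(f"{minutes_per_question(100, 40):.1f} min/question at 40 questions")  # 2.5
    print(f"{minutes_per_question(100, 60):.1f} min/question at 60 questions")  # 1.7

    # Study pacing: 60 to 100 hours over 6 to 10 weeks.
    print(f"lightest plan: {weekly_hours(60, 10):.1f} h/week")   # 6.0
    print(f"heaviest plan: {weekly_hours(100, 6):.1f} h/week")   # 16.7
```

At the upper end of the question range, that is well under two minutes per item, so it helps to practice code-completion items under time pressure.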
The most common stumbling block is Azure ML SDK / CLI v2 specifics: Microsoft's recent migration from SDK v1 to v2 broke many third-party study guides, so older material may show outdated YAML and command shapes. The new LLM-optimization domain (prompt flow, fine-tuning, evaluation) has a learning curve of its own and tends to surprise candidates who treated DP-100 as a classical-ML exam.
Major refresh adding the LLM-optimization domain (28% weight), modernizing training-job and deployment material to Azure ML SDK / CLI v2, and integrating Azure AI Foundry concepts. Microsoft refreshes DP-100 approximately every 12–18 months without changing the exam code.
Migrated from Azure ML SDK v1 to SDK / CLI v2 framing, retired Azure ML designer-heavy questions, and added MLflow integration coverage.
Initial GA, replacing the retired DP-100 (legacy code). Original outline focused on Azure ML designer, AutoML, and SDK v1.
DP-100 (Microsoft Azure Data Scientist Associate) is a moderately difficult Associate-level exam that expects practical hands-on experience plus a solid understanding of best practices. Candidates who score consistently above the passing threshold on practice exams generally pass on their first attempt.
Most candidates need 80–150 hours of study spread over 6–12 weeks for associate-level exams. Time-to-pass varies widely by prior experience. Engineers with hands-on production experience in the underlying technology typically need less; candidates new to the platform should plan toward the upper end of that range.
DP-100 is a recognized credential in the Azure ecosystem and signals validated knowledge to employers, recruiters, and clients. Whether it is worth the time and fee for you depends on your role and goals; it tends to pay off most for cloud engineers, architects, and consultants who work with Azure day-to-day or want to move into roles that do.
The passing score for DP-100 is 700 / 1000. The exam contains 50 questions and lasts 1 hr 40 min.
The DP-100 exam fee is $165 USD. Fees are set by Microsoft and may vary by region; always confirm the current price on the official Microsoft certification page before booking.
Microsoft role-based certifications expire after 1 year but can be renewed for free via an unproctored online assessment on Microsoft Learn, starting 6 months before expiration.
Yes. You can take the exam online (proctored via the provider's secure browser, available 24/7 in most regions) or at an in-person Pearson VUE test center during business hours. Both formats use the same questions, time limit, and passing score.
CertLabPro provides 15 study modes across the practice question bank for DP-100. The exam-simulation mode mirrors the real exam: 50 questions in 1 hr 40 min, with the same passing threshold of 700 / 1000. Browse mode lets you read every Q&A statically.