Azure DP-100: a 6-week study plan for the Data Scientist Associate
A realistic 6-week DP-100 plan covering Azure ML SDK v2, MLflow, designer, and deployment, plus the gotchas that fail otherwise-prepared candidates.
DP-100 is the Designing and Implementing a Data Science Solution on Azure exam. $165 USD, 40-60 questions over 100 minutes (the count varies because of case studies), one or two case-study sections, scaled passing score of 700/1000. It's the role-based associate cert for data scientists working on Azure Machine Learning.
Six weeks at 8-10 hours a week is enough if you already know Python, scikit-learn, and basic ML concepts. If you're learning ML from scratch, DP-100 isn't the right exam yet; go take a model-training course first. The exam tests Azure ML, not whether you understand a confusion matrix.
What DP-100 actually tests
The current exam guide (refreshed in 2024 to remove SDK v1 and lean fully into v2) breaks down roughly as:
- Manage Azure ML resources (workspaces, compute, datastores, environments): about 25%
- Run experiments and train models (jobs, MLflow tracking, AutoML, hyperdrive): about 25%
- Deploy and operationalize ML solutions (managed online endpoints, batch endpoints, monitoring): about 25%
- Implement responsible ML (fairness, interpretability, differential privacy): about 25%
What that means in practice: you need to be fluent with the Azure ML Python SDK v2, comfortable in Azure ML Studio (designer plus notebooks), and clear on the differences between MLflow tracking in Azure ML, AutoML for tabular / image / NLP, and HyperDrive / sweep jobs for hyperparameter tuning. The deployment half wants you to know managed online endpoints (real-time, with traffic splitting and blue-green) versus batch endpoints (scoring at scale).
Prerequisites you actually need
Before week 1, you should be at:
- Python, comfortably. Reading and writing functions, classes, decorators, virtual environments.
- pandas + numpy at a working level.
- scikit-learn, including `Pipeline`, `train_test_split`, basic regressors and classifiers, and `ColumnTransformer`.
- Conceptual ML: train/validation/test split, cross-validation, overfitting, regularization, the difference between regression and classification metrics.
- Some Azure exposure: at minimum, AZ-900 vocabulary. Resource groups, RBAC, storage accounts, and Key Vault won't be re-explained on the exam.
If those bullets feel shaky, spend two weeks shoring them up before starting the plan below.
Week 1: workspace and compute
Get hands on the platform first. Don't read the exam guide front-to-back yet.
- Spin up an Azure free account if you don't have one. Create an Azure ML workspace through the portal. Note what gets created alongside it: storage account, Key Vault, container registry, Application Insights. The exam asks about these.
- Provision a compute instance (a small one; D2s_v3 is fine) and a compute cluster with min nodes = 0. Note that compute instances are billed even when idle, but cluster nodes scale to zero. This is on the exam.
- Walk through the Azure ML Studio UI. Click into Datastores, Datasets / Data assets, Environments, Models, Endpoints. You're not building yet; you're getting the layout.
- Attach a notebook on the compute instance. Install `azure-ai-ml` (the SDK v2 package, not `azureml-core`, which is v1 and deprecated). Authenticate with `DefaultAzureCredential` and create an `MLClient`. Print the workspace name. That's your "hello world" (a sketch follows this list).
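A minimal sketch of that hello world, with placeholders where your own subscription, resource group, and workspace names go:

```python
# Minimal SDK v2 "hello world" -- the three identifiers below are
# placeholders for your own subscription, resource group, and workspace.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)
print(ml_client.workspace_name)
```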
End-of-week checkpoint: you can connect to your workspace from a notebook in under 60 seconds without looking anything up.
Week 2: data, environments, jobs
Now you build real things.
- Register a CSV as a data asset (URI file or MLTable). Read it from a notebook using `ml_client.data.get(...)`. The exam loves the distinction between `uri_file`, `uri_folder`, and `mltable` data asset types; memorize the use case for each.
- Build a custom environment. Either author a `conda.yaml` or use a curated environment plus an extra pip dependency. Submit a `command` job that runs a training script (a 30-line scikit-learn classifier on the dataset you just registered); a submission sketch follows the gotcha below.
- Use MLflow autologging in your script (`mlflow.sklearn.autolog()` then fit). Watch the metrics and artifacts show up in the job. Compare it to manually logging with `mlflow.log_metric()`. A script sketch follows this list.
- Submit the job to your compute cluster instead of a compute instance. Watch the cluster spin up from 0 and back down.
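Here's roughly what that 30-line script plus autologging can look like. The `label` column and the model choice are placeholders, not exam requirements:

```python
# train.py -- a minimal sketch; swap "label" for whatever target column
# your registered CSV actually has.
import argparse

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

parser = argparse.ArgumentParser()
parser.add_argument("--data", type=str, help="path to the input CSV")
args = parser.parse_args()

# Autologging captures params, fit metrics, and the model artifact.
mlflow.sklearn.autolog()

df = pd.read_csv(args.data)
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["label"]), df["label"], test_size=0.2, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Anything autolog misses you can still log by hand.
mlflow.log_metric("test_accuracy", clf.score(X_test, y_test))
```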
Gotcha to internalize: in SDK v2, jobs are submitted via the `command` function from `azure.ai.ml`, not via `ScriptRunConfig` (that was v1). The exam will give you v1-style code in the wrong-answer choices. Train your eyes to spot it.
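For contrast, a sketch of the v2 submission path. The data asset, environment, and cluster names stand in for whatever you registered earlier in the week:

```python
# Submitting the training script as a v2 command job -- note command(),
# not the v1 ScriptRunConfig. Asset and compute names are placeholders.
from azure.ai.ml import Input, MLClient, command
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

# from_config() picks up the config.json present on a compute instance.
ml_client = MLClient.from_config(credential=DefaultAzureCredential())

job = command(
    code="./src",  # folder containing train.py
    command="python train.py --data ${{inputs.training_data}}",
    inputs={
        "training_data": Input(
            type=AssetTypes.URI_FILE,
            path="azureml:my-csv-asset:1",  # the data asset you registered
        )
    },
    environment="azureml:my-sklearn-env:1",  # your custom environment
    compute="cpu-cluster",                   # the min-nodes-0 cluster
    experiment_name="week2-training",
)

returned_job = ml_client.jobs.create_or_update(job)
print(returned_job.studio_url)
```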
Week 3: AutoML, HyperDrive, pipelines
Heavier ML week.
- Run an AutoML classification job from the SDK against the same dataset. Limit it to 30 minutes and `max_trials=10` so you don't burn credits (a sketch follows this list). Look at the leaderboard.
- Run a sweep / HyperDrive job over a custom training script. Try `random` sampling first, then `bayesian` (which doesn't support early termination; that's an exam question). A sweep sketch follows the end-of-week checkpoint.
- Read up on the early termination policies: bandit, median stopping, truncation selection. Know the interface for each, in particular that bandit takes either `slack_factor` or `slack_amount`, never both.
- Build a pipeline job with at least two components, a data prep component and a training component, wired together. Pipelines aren't huge on the exam but they show up enough that you don't want to be guessing the YAML on test day.
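A sketch of those AutoML limits, assuming an MLTable data asset and a target column named `label` for illustration:

```python
# AutoML classification with tight limits so a practice run stays cheap.
# The data asset name and target column are placeholders.
from azure.ai.ml import Input, MLClient, automl
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

classification_job = automl.classification(
    compute="cpu-cluster",
    experiment_name="week3-automl",
    training_data=Input(type=AssetTypes.MLTABLE, path="azureml:my-mltable:1"),
    target_column_name="label",
    primary_metric="accuracy",
)
classification_job.set_limits(
    timeout_minutes=30,  # hard stop for the whole experiment
    max_trials=10,       # caps the leaderboard size
)
ml_client.jobs.create_or_update(classification_job)
```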
End-of-week checkpoint: you can describe out loud what Random, Grid, and Bayesian sampling do, when to use which, and why Bayesian doesn't combine with bandit.
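To make the checkpoint concrete, here's a minimal sweep sketch under the same placeholder names, pairing random sampling with a bandit policy; switch the sampling to `bayesian` and the policy has to come off:

```python
# v2 sweep job: random sampling plus a bandit early-termination policy.
# Script arguments, the metric name, and asset names are placeholders.
from azure.ai.ml import MLClient, command
from azure.ai.ml.sweep import BanditPolicy, Choice, Uniform
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

base_job = command(
    code="./src",
    command=(
        "python train.py "
        "--learning_rate ${{inputs.learning_rate}} "
        "--n_estimators ${{inputs.n_estimators}}"
    ),
    inputs={"learning_rate": 0.1, "n_estimators": 100},  # defaults to override
    environment="azureml:my-sklearn-env:1",
    compute="cpu-cluster",
)

# Calling the job replaces the fixed inputs with search-space distributions.
job_for_sweep = base_job(
    learning_rate=Uniform(min_value=0.01, max_value=0.3),
    n_estimators=Choice(values=[50, 100, 200]),
)

sweep_job = job_for_sweep.sweep(
    sampling_algorithm="random",  # "bayesian" would mean no early termination
    primary_metric="accuracy",    # must match a metric the script logs
    goal="Maximize",
)
sweep_job.set_limits(max_total_trials=10, max_concurrent_trials=2)
sweep_job.early_termination = BanditPolicy(
    slack_factor=0.1,  # or slack_amount -- one or the other, never both
    evaluation_interval=2,
)
ml_client.jobs.create_or_update(sweep_job)
```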
Week 4: deployment
This is where most candidates lose points.
- Register a model from a job's output. Practice both ways: from the SDK with `ml_client.models.create_or_update`, and from the studio UI.
- Deploy the model to a managed online endpoint. Stand up at least two deployments behind the same endpoint and split traffic 90/10 between them. This is the blue/green pattern Microsoft tests directly (see the sketch after this list).
- Deploy the same model to a batch endpoint. Score a folder of input files. Note that batch endpoints don't keep compute idle; they spin clusters up per invocation.
- Set up data drift monitoring on the deployment. Configure an Application Insights alert. The exam will ask about Model Monitor (the new name for what SDK v1 called the Data Drift Monitor) in at least one question.
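A sketch of the blue/green split, assuming an MLflow-format model (which is why no scoring script or environment appears); the endpoint and model names are invented:

```python
# Blue/green on a managed online endpoint. Assumes an MLflow-format model,
# which needs no scoring script or environment; all names are placeholders.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

endpoint = ManagedOnlineEndpoint(name="credit-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

for name, version in [("blue", "1"), ("green", "2")]:
    deployment = ManagedOnlineDeployment(
        name=name,
        endpoint_name="credit-endpoint",
        model=f"azureml:credit-model:{version}",  # hypothetical registered model
        instance_type="Standard_DS3_v2",
        instance_count=1,
    )
    ml_client.online_deployments.begin_create_or_update(deployment).result()

# The 90/10 split -- the blue/green pattern the exam tests directly.
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```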
Gotcha: managed online endpoints are billed by the underlying VM whether or not you're sending traffic. The exam will set up a scenario where the cheapest answer is a batch endpoint and the wrong answers all default to online endpoints. Read the question for "predictions don't need to be real-time" before you pick.
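On the batch side, scoring is an invocation: the cluster spins up, runs the job, and scales back to zero. A sketch with placeholder names, assuming the batch deployment already exists:

```python
# Invoking an existing batch endpoint over a folder of input files.
# The endpoint name and datastore path are placeholders.
from azure.ai.ml import Input, MLClient
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

job = ml_client.batch_endpoints.invoke(
    endpoint_name="credit-batch",
    input=Input(
        type=AssetTypes.URI_FOLDER,
        path="azureml://datastores/workspaceblobstore/paths/batch-inputs/",
    ),
)
ml_client.jobs.stream(job.name)  # the scoring run is just another job
```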
Week 5: responsible ML and case studies
Less code, more reading.
- Walk through Microsoft's Responsible AI dashboard for a trained model. Generate fairness metrics, error analysis, and model interpretability (SHAP) values. The exam tests vocabulary, not implementation depth.
- Read about differential privacy in Azure ML: `azureml-opendp-smartnoise` exists, but the exam keeps it conceptual.
- Take your first full-length practice exam under timed conditions. Two case studies in a row will eat 30+ minutes. Get used to the rhythm.
- Identify weak domains from the practice score. For most candidates that's either deployment internals or hyperparameter tuning policies; go back to week 3 or 4.
Week 6: drill and ship it
Practice exams every other day. After each one, write down the services or concepts you got wrong. Patterns will surface β usually around environments (curated vs. custom vs. registered), data asset types, and which monitoring tool is the right answer (Application Insights vs. Azure Monitor vs. Log Analytics workspace).
Schedule the exam for the end of the week. If you're scoring above 80% on two consecutive practice exams under timed conditions, you're ready. Below 70% means push another week; the $165 retake plus the 24-hour cooldown costs more than another seven days.
How DP-100 fits with AI-102 and DP-900
DP-100 is the data-scientist track; AI-102 is the AI-engineer track. The overlap is small. DP-100 wants you training and deploying custom models in Azure ML; AI-102 wants you wiring up Azure AI services (Vision, Language, OpenAI) into applications. If you're a data scientist, DP-100 alone is enough. If you're a software engineer building Copilot-style features, AI-102 is the better fit and DP-100 is overkill.
DP-900 is a friendly warmup: useful if you're new to Azure data services in general, redundant if you've already shipped on Azure ML.
When you're ready to drill questions, browse the DP-100 question bank on CertLabPro or start a timed simulation. The case-study questions are where time pressure bites β practice them under the clock, not in a coffee-shop afternoon read.