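Hands-On Lab: DP-100 Microsoft Azure Data Scientist Associate

Last reviewed: May 2026

Use native Terraform to build the Azure services covered by the DP-100 exam, one code block at a time, with each block tied to an exam domain. The same code runs on OpenTofu.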

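Overview

By the end of this lab you will have provisioned the Azure Machine Learning workspace control plane with plain Terraform: the workspace itself, its three required dependencies (a storage account, a Key Vault, and Application Insights), and an Azure ML compute cluster that scales to zero when idle to avoid unnecessary cost. This is the DP-100 reference workspace setup; training jobs and model deployments all attach to it.

Put these snippets in a single main.tf file, run terraform init, then run terraform apply after adding each step.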

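Prerequisites

Terraform >= 1.5 or OpenTofu >= 1.6.
An Azure subscription with permission to create Azure ML, storage, Key Vault, and Application Insights resources.
Azure CLI authenticated (az login). The azurerm 4.x provider also needs an explicit subscription ID, for example through the ARM_SUBSCRIPTION_ID environment variable.
The Cognitive Services Responsible AI terms accepted in the subscription (required once for Azure ML workspace creation).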

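Cost notes

The control plane costs close to $0 when idle:

Azure ML workspace, storage, Key Vault, App Insights: roughly $1/month in total when idle.
Compute cluster (Standard_DS3_v2, minimum 0 nodes, maximum 2): $0 when scaled to 0 (the lab default). While a job runs, each node costs roughly $0.30/hour.

The DP-100 cost trap is setting the compute cluster's min_node_count above 0: even a single idle node costs $200+ per month. We set min_node_count to 0 and scale_down_nodes_after_idle_duration to PT15M (scale down after 15 minutes of idle time). Verify before you run. Destroy when you are done.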
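
The $200+ figure follows from simple arithmetic, sketched here as Terraform locals. The hourly rate is the approximate figure quoted in this lab, and 730 hours per month is an assumed average; the local and output names are illustrative and are not part of the lab's main.tf.

```hcl
# Illustrative only: these names are hypothetical and not used elsewhere in the lab.
locals {
  node_hourly_usd = 0.30 # approximate Standard_DS3_v2 rate quoted in this lab
  hours_per_month = 730  # assumed average number of hours in a month
  idle_min_nodes  = 1    # nodes that min_node_count = 1 would keep running

  # Cost of nodes that never scale down because of min_node_count.
  idle_monthly_usd = local.node_hourly_usd * local.hours_per_month * local.idle_min_nodes
}

output "idle_monthly_usd" {
  description = "Estimated monthly cost of nodes kept alive by min_node_count"
  value       = local.idle_monthly_usd
}
```

Under these assumptions, one idle node comes to about $219 per month, which is where the $200+ warning comes from.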

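Steps

1. Provider, resource group, naming

The standard Azure opening. Azure ML workspaces are regional. If you plan to keep using the workspace after the lab, pick a region with broad GPU SKU availability (eastus, westus, and westeurope are safe choices).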

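```
terraform {
  required_version = ">= 1.5"

  required_providers {
    azurerm = { source = "hashicorp/azurerm", version = "~> 4.0" }
    random  = { source = "hashicorp/random",  version = "~> 3.6" }
  }
}

# azurerm 4.x requires a subscription ID: set ARM_SUBSCRIPTION_ID in the
# environment or add subscription_id to this block.
provider "azurerm" {
  features {
    key_vault {
      # Purge the soft-deleted vault on destroy so the name can be reused.
      purge_soft_delete_on_destroy = true
    }
  }
}

# Random hex suffix (3 bytes = 6 hex characters) for globally unique names.
resource "random_id" "suffix" {
  byte_length = 3
}

# Exposes the tenant ID of the authenticated session.
data "azurerm_client_config" "current" {}

locals {
  tags = {
    Project   = "certlabpro-dp-100"
    ManagedBy = "terraform"
  }
}

resource "azurerm_resource_group" "main" {
  name     = "certlabpro-dp-100-rg"
  location = "eastus"
  tags     = local.tags
}
```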
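
If you expect to move the lab between regions, the region can be made configurable. This is a minimal sketch: the variable name location and the allowed list are assumptions for illustration, and the list simply mirrors the three regions named above.

```hcl
# Hypothetical variable; the lab itself hard-codes "eastus".
variable "location" {
  description = "Azure region for the lab resources"
  type        = string
  default     = "eastus"

  validation {
    # Reject regions outside the short list suggested in this lab.
    condition     = contains(["eastus", "westus", "westeurope"], var.location)
    error_message = "Pick one of eastus, westus, or westeurope for this lab."
  }
}
```

To use it, set location = var.location on the resource group in place of the hard-coded "eastus"; every other resource already inherits the resource group's location.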

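2. Provision the three Azure ML workspace dependencies

Services configured:

Azure Storage
Azure Key Vault
Azure Application Insights

An Azure ML workspace must attach to three pre-existing resources: a storage account (for datasets, models, and logs), a Key Vault (for credentials), and an Application Insights instance (for run telemetry). DP-100 tests this triad repeatedly. When the question is "why can't I create the workspace?", the answer is almost always that one of the three is missing.

The storage account here gets standard secure defaults; the Key Vault uses RBAC authorization (the modern default).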

```
# Storage account names allow only 3-24 lowercase letters and digits.
resource "azurerm_storage_account" "ml" {
  name                            = "dp100ml${random_id.suffix.hex}"
  resource_group_name             = azurerm_resource_group.main.name
  location                        = azurerm_resource_group.main.location
  account_tier                    = "Standard"
  account_replication_type        = "LRS"
  account_kind                    = "StorageV2"
  https_traffic_only_enabled      = true
  min_tls_version                 = "TLS1_2"
  allow_nested_items_to_be_public = false

  tags = local.tags
}

resource "azurerm_key_vault" "ml" {
  name                       = "kv-dp100-${random_id.suffix.hex}"
  resource_group_name        = azurerm_resource_group.main.name
  location                   = azurerm_resource_group.main.location
  tenant_id                  = data.azurerm_client_config.current.tenant_id
  sku_name                   = "standard"
  # Use Azure RBAC for data-plane access instead of vault access policies.
  enable_rbac_authorization  = true
  # 7 days is the shortest retention the service allows.
  soft_delete_retention_days = 7

  tags = local.tags
}

resource "azurerm_application_insights" "ml" {
  name                = "appi-dp100-${random_id.suffix.hex}"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  application_type    = "web"

  tags = local.tags
}
```
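
Because the vault uses RBAC authorization, no principal can read or write secrets until it holds a data-plane role, and that includes the account that created the vault. A minimal, optional sketch follows that grants the signed-in principal the built-in Key Vault Secrets Officer role. The resource label kv_secrets_officer is hypothetical, and the apply only succeeds if your account is allowed to create role assignments (for example as Owner or User Access Administrator).

```hcl
# Optional and illustrative: lets the signed-in principal manage secrets in the lab vault.
resource "azurerm_role_assignment" "kv_secrets_officer" {
  scope                = azurerm_key_vault.ml.id
  role_definition_name = "Key Vault Secrets Officer"
  principal_id         = data.azurerm_client_config.current.object_id
}
```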

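3. Create the Azure ML workspace itself

Services configured:

Azure Machine Learning

The workspace ties the three dependencies together and gets a system-assigned managed identity, which downstream compute targets, datasets, and endpoints use to read from those dependencies. The DP-100 "Manage Azure resources for machine learning" domain tests exactly this shape: workspace + identity + role assignments.

We set public_network_access_enabled to true to keep the lab simple; production workspaces usually use private endpoints (the DP-100 "Design and prepare a machine learning solution" domain tests the private link variant).
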
```
resource "azurerm_machine_learning_workspace" "main" {
  name                          = "mlw-dp100-${random_id.suffix.hex}"
  resource_group_name           = azurerm_resource_group.main.name
  location                      = azurerm_resource_group.main.location
  application_insights_id       = azurerm_application_insights.ml.id
  key_vault_id                  = azurerm_key_vault.ml.id
  storage_account_id            = azurerm_storage_account.ml.id
  public_network_access_enabled = true

  identity {
    type = "SystemAssigned"
  }

  tags = local.tags
}
```
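
To make the "workspace + identity + role assignments" shape concrete, the sketch below exposes the workspace identity and gives it one explicit role on the storage account. It is an illustration under assumptions: the output name and the resource label are hypothetical, the role is chosen only as an example, and Azure ML may already grant a system-assigned identity the access it needs at creation time, which would make this assignment redundant.

```hcl
# Object ID of the workspace's system-assigned managed identity.
output "workspace_principal_id" {
  description = "Principal ID of the workspace's system-assigned identity"
  value       = azurerm_machine_learning_workspace.main.identity[0].principal_id
}

# Illustrative explicit role assignment for the workspace identity.
resource "azurerm_role_assignment" "workspace_blob_reader" {
  scope                = azurerm_storage_account.ml.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = azurerm_machine_learning_workspace.main.identity[0].principal_id
}
```

As with any role assignment, applying this requires permission to manage access on the storage account.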
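4. Add an Azure ML compute cluster that scales to zero when idle

Services configured:

Azure Machine Learning

Training jobs need compute. An Azure ML compute cluster is a managed pool of VMs that scales with job queue depth. min_node_count = 0 is the mandatory cost optimization for DP-100 lab and dev workspaces: when no jobs are queued, the cluster scales to zero nodes and costs $0 (only metadata remains).

Standard_DS3_v2 (4 vCPUs, 14 GB RAM, about $0.30 per hour) is the typical lab default: large enough to run sklearn or small PyTorch training jobs, and small enough to stay cheap. Production training clusters use GPU SKUs (the Standard_NC6s_v3 family).

scale_down_nodes_after_idle_duration = "PT15M" (15 minutes as an ISO 8601 duration) is a recurring DP-100 cost question: a value that is too long keeps expensive nodes running, and a value that is too short causes frequent spin-up and teardown. 15 minutes is a reasonable middle ground for a lab, and the value is set explicitly here rather than left to a default.
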
```
resource "azurerm_machine_learning_compute_cluster" "main" {
  name                          = "cpu-cluster"
  location                      = azurerm_resource_group.main.location
  vm_priority                   = "Dedicated"
  vm_size                       = "Standard_DS3_v2"
  machine_learning_workspace_id = azurerm_machine_learning_workspace.main.id

  scale_settings {
    min_node_count                       = 0
    max_node_count                       = 2
    scale_down_nodes_after_idle_duration = "PT15M"
  }

  identity {
    type = "SystemAssigned"
  }

  tags = local.tags
}
```
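
The "verify before you run" advice from the cost notes can be encoded in the configuration itself. This is a minimal sketch, assuming Terraform 1.5 or later, where check blocks are available. The check name cluster_scales_to_zero is hypothetical, and a failed check is reported as a warning during plan and apply; it does not block the run.

```hcl
# Warns if someone later raises min_node_count above zero.
check "cluster_scales_to_zero" {
  assert {
    condition     = azurerm_machine_learning_compute_cluster.main.scale_settings[0].min_node_count == 0
    error_message = "min_node_count is above 0, so idle nodes will keep billing."
  }
}
```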

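Cleanup

terraform destroy tears down every resource. Key points to keep in mind:

The compute cluster scales to zero on its own, so there is no need to stop it manually before destroying.
Both the Azure ML workspace and the Key Vault have soft-delete retention; purge_soft_delete_on_destroy = true in the provider features makes destroying the Key Vault actually purge it. Workspace soft delete is configurable in the portal, but terraform destroy works either way.
Deleted storage accounts remain recoverable for 14 days by default.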
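
If you want the workspace purged on destroy in the same way as the Key Vault, the azurerm provider exposes a machine_learning feature block alongside key_vault. The sketch below assumes your provider version supports purge_soft_deleted_workspace_on_destroy; check the provider's features documentation for your version before relying on it. It would replace the provider block from step 1 rather than sit beside it, because a configuration can hold only one default azurerm provider block.

```hcl
# Replacement for the provider block in step 1 (do not declare both).
provider "azurerm" {
  features {
    key_vault {
      purge_soft_delete_on_destroy = true
    }

    machine_learning {
      # Assumed to be available in your azurerm version: purges the
      # soft-deleted workspace when terraform destroy removes it.
      purge_soft_deleted_workspace_on_destroy = true
    }
  }
}
```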

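What this lab does not cover

DP-100 covers more Azure ML surface area than this lab can hold: compute instances (single-user IDE VMs, very expensive when left idle), online endpoints (managed real-time inference), batch endpoints (managed batch inference), AutoML jobs, the designer (visual pipeline editor), MLflow tracking integration, ParallelRunStep, model registry promotion workflows, and the whole data asset and datastore catalog.

We focus on the workspace control plane because it is the foundation every DP-100 pattern depends on. Endpoints attach to the workspace. Jobs run on the compute cluster. Models are registered in the workspace's MLflow tracking. Datasets live in the storage account.

For the areas above, see the Browse and Editorial sections on this certification page. The DP-100 "Train ML models" and "Deploy and operationalize ML solutions" domains are best learned by running jobs against this workspace: this lab gives you the foundation, and the Python SDK does the actual training and deployment.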
