An Azure OpenAI chatbot needs to provide consistent, focused, and non-creative responses for a customer service scenario.
→Set the `temperature` parameter to a low value, such as 0.1 or 0.2. Avoid setting it to exactly 0 for most models.
Why: Temperature controls the randomness of the output. Lowering it makes the model more deterministic and likely to choose the highest-probability tokens.
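A minimal sketch of request parameters tuned this way, assuming the `openai` Python SDK's Chat Completions shape; the deployment name is a placeholder:

```python
# Sketch: Chat Completions arguments for consistent, low-randomness replies.
# The deployment name and system message are illustrative assumptions.
def build_chat_request(user_message: str) -> dict:
    """Build request parameters tuned for deterministic customer service answers."""
    return {
        "model": "my-gpt4o-deployment",   # your Azure OpenAI deployment name
        "temperature": 0.1,               # low value -> focused, repeatable output
        "messages": [
            {"role": "system", "content": "You are a concise customer service assistant."},
            {"role": "user", "content": user_message},
        ],
    }

params = build_chat_request("Where is my order?")
# Pass to client.chat.completions.create(**params) with an AzureOpenAI client.
```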
In a RAG solution, ensure the generative model only synthesizes answers from documents the specific user is permitted to access.
→Implement security trimming at the retrieval stage. In Azure AI Search, apply security filters to the search query based on the user's Microsoft Entra ID (formerly Azure AD) identity and group memberships.
Why: Access control must be enforced before the LLM sees the data. Filtering at the search (retrieval) layer is the only secure way to implement this.
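One way to sketch the trimming filter, assuming each indexed document carries a filterable `group_ids` collection field (the field name is an assumption) and using Azure AI Search's OData `search.in` function:

```python
# Sketch: build an OData security filter for Azure AI Search.
# The index field name "group_ids" is an assumption; use whichever filterable
# collection field stores the group IDs allowed to see each document.
def build_security_filter(user_group_ids: list[str]) -> str:
    """Restrict results to documents tagged with at least one of the user's groups."""
    groups = ",".join(user_group_ids)
    return f"group_ids/any(g: search.in(g, '{groups}'))"

flt = build_security_filter(["grp-finance", "grp-all-staff"])
# Pass as: search_client.search(query_text, filter=flt)
```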
Consistently extract structured data from unstructured text into a valid JSON object using Azure OpenAI.
→Use a prompt that includes: 1) A clear role. 2) Explicit instruction to return ONLY JSON. 3) The desired JSON schema with field names and types. 4) Few-shot examples if possible.
Why: Highly structured and explicit prompts significantly increase the reliability of getting well-formed, structured output from LLMs.
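A sketch combining such a prompt with defensive parsing; the schema and field names are illustrative assumptions:

```python
import json

# Sketch: an explicit extraction prompt plus defensive parsing of the reply.
# The schema fields (name, email, issue) are illustrative assumptions.
SYSTEM_PROMPT = """You are a data extraction engine.
Return ONLY a JSON object matching this schema, with no extra text:
{"name": string, "email": string, "issue": string}"""

def parse_model_reply(reply: str) -> dict:
    """Parse the model's reply, stripping accidental markdown code fences."""
    cleaned = reply.strip().removeprefix("```json").removesuffix("```").strip()
    data = json.loads(cleaned)
    for field in ("name", "email", "issue"):
        if field not in data:
            raise ValueError(f"missing field: {field}")
    return data

# Example reply a well-prompted model might return:
record = parse_model_reply('{"name": "Ana", "email": "ana@example.com", "issue": "refund"}')
```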
A mission-critical application requires guaranteed, consistent throughput from Azure OpenAI, with no throttling during peak load.
→Purchase and deploy the model using Provisioned Throughput Units (PTUs).
Why: PTUs provide dedicated, reserved model processing capacity, unlike standard pay-as-you-go deployments which operate on a shared capacity model and are subject to throttling.
Maintain context in a long-running chatbot conversation without exceeding the model's token limit.
→Implement a conversation summarization strategy. Periodically use a separate LLM call to summarize older parts of the conversation, and include this summary plus the most recent turns in the prompt.
Why: This "summarize and slide" pattern preserves long-term context much more effectively and economically than simple truncation or sending the entire (and eventually too long) history.
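A minimal sketch of the pattern, where `summarize` stands in for the separate LLM call (an assumption; here it can be any callable):

```python
# Sketch of the "summarize and slide" pattern. `summarize` stands in for a
# separate LLM summarization call; the keep_last cutoff is an assumption.
def compact_history(messages: list[dict], summarize, keep_last: int = 4) -> list[dict]:
    """Fold older turns into one summary message, keep recent turns verbatim."""
    if len(messages) <= keep_last:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary = summarize(older)
    return [{"role": "system", "content": f"Conversation so far: {summary}"}] + recent

# Usage with a stub summarizer:
fake_summarize = lambda msgs: f"{len(msgs)} earlier turns about billing"
history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compacted = compact_history(history, fake_summarize)
```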
Enable an Azure OpenAI model to call an external API to get current weather information.
→Define the API as a tool for the model using a precise JSON Schema format. Include a clear function `description` and a detailed description for each property in the `parameters` schema so the model knows when and how to use it.
Why: The model relies entirely on the schema and descriptions to make an informed decision to call a function. A well-described function is critical for reliability.
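A sketch of such a definition in the Chat Completions `tools` format; the function name and parameters are illustrative assumptions:

```python
# Sketch: a weather tool definition in the Chat Completions `tools` format.
# The function name and parameter fields are illustrative assumptions.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city. Use whenever the "
                       "user asks about present weather conditions.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Oslo'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"],
                         "description": "Temperature unit to report"},
            },
            "required": ["city"],
        },
    },
}
# Pass as: client.chat.completions.create(..., tools=[weather_tool])
```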
Use Azure OpenAI to summarize a document that is much longer than the model's context window.
→Implement a "map-reduce" or "refine" strategy. Chunk the document, generate a summary for each chunk (map), and then generate a final summary from the collection of chunk summaries (reduce).
Why: This is the standard pattern for applying fixed-context models to arbitrarily long inputs, ensuring the entire document content is considered.
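The map-reduce variant can be sketched as follows; `summarize` stands in for an LLM call (an assumption), and character-based chunking is a simplification (real implementations usually chunk by tokens):

```python
# Sketch of map-reduce summarization. `summarize` stands in for an LLM call;
# the character-based chunk size is a simplifying assumption.
def map_reduce_summary(text: str, summarize, chunk_size: int = 2000) -> str:
    # Map: summarize each fixed-size chunk independently.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    partials = [summarize(c) for c in chunks]
    # Reduce: produce a final summary from the concatenated partial summaries.
    return summarize("\n".join(partials))

# Usage with a stub summarizer that keeps the first 20 characters:
stub = lambda t: t[:20]
final = map_reduce_summary("lorem " * 1000, stub)
```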
Improve the perceived responsiveness of a chat application by displaying the AI's response as it is being generated.
→When calling the Chat Completions API, set the `stream` parameter to `true`. Process the server-sent events as they arrive to build the response token by token.
Why: Streaming provides a much better user experience for real-time applications than waiting for the full response to be generated, which can take several seconds.
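A sketch of the consuming side: with the `openai` Python SDK, `stream=True` makes the call return an iterable of chunks whose deltas carry content fragments. Here plain dicts stand in for those chunk objects so the loop logic is self-contained:

```python
# Sketch: accumulate streamed Chat Completions deltas into the full reply.
# Plain dicts stand in for the SDK's chunk objects (an assumption).
def consume_stream(chunks, on_token=print) -> str:
    """Emit each content fragment as it arrives and return the joined reply."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            on_token(delta)   # e.g. append to the UI as it arrives
            parts.append(delta)
    return "".join(parts)

fake_stream = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo!"}}]},
    {"choices": [{"delta": {}}]},  # final chunks often carry an empty delta
]
reply = consume_stream(fake_stream, on_token=lambda t: None)
```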
An AI agent must dynamically decide which of several tools (e.g., database query, web search, email sender) to use to fulfill a user request.
→Use a framework like Semantic Kernel or Azure AI Agent Service. Define each capability as a distinct tool/plugin and let the agent's planner or ReAct loop orchestrate the tool calls.
Why: Agentic frameworks provide the orchestration layer (planner/reasoning loop) that enables an LLM to move beyond simple Q&A to become an autonomous actor that uses tools.
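The dispatch step every such loop performs can be sketched without any framework: map a model-requested tool call onto a registry of local functions. The tool names and stub implementations below are assumptions:

```python
import json

# Sketch of the dispatch step inside an agent loop: route one tool call
# (in the shape returned by Chat Completions) to a registered function.
# Tool names and stub bodies are illustrative assumptions.
TOOLS = {
    "query_database": lambda sql: f"rows for: {sql}",
    "web_search": lambda query: f"results for: {query}",
}

def dispatch_tool_call(tool_call: dict) -> str:
    """Execute one requested tool call and return its result for the model."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**args)

result = dispatch_tool_call(
    {"function": {"name": "web_search", "arguments": '{"query": "PTU pricing"}'}}
)
```

In a real loop, the result is appended to the conversation as a `tool` message and the model is called again.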
Prevent an autonomous AI agent from performing high-risk actions (e.g., deleting data, spending money) without oversight.
→Implement a human-in-the-loop pattern. When the agent plans a high-risk action, the system must pause and require explicit confirmation from a human operator before executing.
Why: This is a critical responsible AI pattern for agentic systems, balancing autonomy with safety by gating irreversible or high-impact actions.
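A minimal sketch of the gate, where `confirm` stands in for whatever channel collects operator approval and the risk list is an illustrative assumption:

```python
# Sketch of a human-in-the-loop gate. `confirm` stands in for the UI or
# workflow that collects operator approval; the risk list is an assumption.
HIGH_RISK = {"delete_records", "send_payment"}

def execute_action(name: str, action, confirm) -> str:
    """Run an agent action, pausing for explicit approval if it is high-risk."""
    if name in HIGH_RISK and not confirm(name):
        return "blocked: operator declined"
    return action()

# Usage with stubs: the operator declines the risky action.
outcome = execute_action("delete_records", lambda: "deleted", confirm=lambda n: False)
```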