Playbook — AI-901 Microsoft Azure AI Fundamentals

Last reviewed: June 2026

A scannable reference of architectural patterns the AI-901 exam tests. Read top-to-bottom, or jump to a section.

Identify AI concepts and capabilities

A loan-approval model must not disadvantage people based on gender or ethnicity.

Apply the Responsible AI principle of fairness: assess and mitigate bias across demographic groups.

Why: Fairness targets equitable treatment; don't confuse it with reliability (consistent results) or privacy (data protection).

Reference

An autonomous-driving model must behave consistently and fail safely in unexpected conditions.

Apply reliability and safety: rigorous testing, monitoring, and guardrails for edge cases.

Why: Reliability and safety is about consistent, harm-avoiding behavior under uncertainty, not about who the data is about.

A health app processes personal data and must protect it from exposure or misuse.

Apply privacy and security: protect personal data, control access, and respect consent.

Why: This principle covers data confidentiality and protection; fairness and inclusiveness are about who the system serves.

A solution should empower and engage people regardless of ability or background.

Apply inclusiveness: design so everyone benefits, including people with disabilities.

Why: Inclusiveness is about broad accessibility and reach; fairness is specifically about unbiased outcomes.

Users must understand how an AI system reaches its decisions and what its limits are.

Apply transparency: explain how the system works, its purpose, capabilities, and limitations.

Why: Transparency makes the system understandable; accountability is about who is responsible for it.

Clear ownership and governance are needed so someone answers for an AI system's behavior.

Apply accountability: people design, build, and operate within a governance framework and remain answerable.

Why: Accountability assigns human responsibility; transparency only explains the system, it doesn't assign ownership.

Explain how a generative AI model produces a coherent paragraph from a prompt.

A large language model predicts the next token repeatedly using patterns learned from massive training text.

Why: Generative models generate new content via probabilistic token prediction; they don't retrieve stored answers verbatim.

Reference

Pick a model: one needs cheap high-volume chat, another needs complex multi-step reasoning.

Match the model to the task — a smaller/faster model for routine chat, a larger reasoning model for hard tasks.

Why: Bigger isn't always better; capability, cost, and latency trade off, so choose by workload need.

Reference

Decide how to consume a model: managed standard endpoint vs. dedicated capacity.

Use a standard (pay-as-you-go) deployment for variable load, or provisioned throughput for steady high-volume, predictable performance.

Why: Provisioned reserves capacity for consistent latency; standard bills per token and is simplest to start.

Reference

Output is too repetitive; you want more creative, varied responses.

Raise the temperature parameter; lower it toward 0 for deterministic, focused output.

Why: Temperature controls randomness of token selection; high = creative, low = consistent.

Responses get cut off mid-sentence, or you need to cap output length and cost.

Adjust the max tokens (max completion length) parameter to set the response ceiling.

Why: Max tokens bounds output length; it doesn't change creativity — that's temperature or top-p.

You want to limit token choices to the most probable set without a hard temperature.

Use top-p (nucleus sampling) to restrict sampling to the smallest set of tokens whose probabilities sum to p.

Why: Top-p narrows the candidate pool by probability mass; tune it or temperature, generally not both aggressively.

A workload must draft marketing copy and summaries from prompts.

This is a generative AI workload — content creation from natural-language instructions.

Why: Generative workloads produce new content; text analysis only extracts insight from existing text.

A system must plan steps, call tools, and act toward a goal with minimal supervision.

This is an agentic AI workload — an LLM-driven agent that reasons, uses tools, and takes actions.

Why: Agentic adds autonomy and tool use on top of generation; plain generative AI just returns content.

A workload mines customer reviews for sentiment, key phrases, and named entities.

This is a text analysis (NLP) workload.

Why: Text analysis interprets existing text; it doesn't generate new prose like a generative workload.

Reference

A workload must transcribe calls and read responses aloud.

This is a speech workload — speech-to-text recognition and text-to-speech synthesis.

Why: Speech covers audio in/out; computer vision handles images, not audio.

A workload must detect objects and read text from photographs.

This is a computer vision workload — image classification, object detection, and OCR.

Why: Computer vision interprets images; information extraction from documents is a related but distinct task.

A workload must pull fields like dates, totals, and vendor from scanned invoices.

This is an information / data extraction workload (document understanding).

Why: Extraction pulls structured fields from documents; generic OCR only returns raw text, not labeled fields.

You need the main topics from a block of text without reading it all.

Use key phrase (keyword) extraction to surface the most important terms.

Why: Key phrase extraction lists salient terms; entity detection instead classifies named things like people or places.

You must identify people, organizations, locations, and dates mentioned in text.

Use named entity recognition (entity detection) to classify these entities.

Why: Entity detection labels named things; sentiment analysis instead scores the emotional tone.

You want to know whether reviews are positive, negative, or neutral.

Use sentiment analysis to score the opinion expressed in the text.

Why: Sentiment measures tone; summarization condenses content, it doesn't judge polarity.

You need a short abstract of a long report.

Use text summarization (extractive or abstractive) to condense the document.

Why: Summarization shortens while preserving meaning; key phrase extraction only lists terms, not a readable summary.

Distinguish converting audio to text from converting text to audio.

Speech recognition (speech-to-text) transcribes audio; speech synthesis (text-to-speech) generates spoken audio.

Why: Recognition is input audio→text; synthesis is the reverse — don't swap the two directions.

Reference

Implement AI solutions by using Microsoft Foundry

You need a single place to discover models, deploy them, and build AI apps on Azure.

Use the Microsoft Foundry portal — it hosts the model catalog, deployments, playground, and agent tooling.

Why: Foundry is the unified hub; individual Azure AI services exist but Foundry is where you compose and deploy solutions.

Reference

You want the model to always answer as a polite support agent regardless of the question asked.

Set behavior and persona in the system prompt; put the specific question in the user prompt.

Why: The system prompt frames overall behavior and rules; the user prompt is the per-turn request.

Reference

You picked a model in the catalog and need it callable from an app.

Create a deployment for the model in the Foundry portal, which gives an endpoint and key.

Why: A model in the catalog isn't usable until deployed; the deployment exposes the callable endpoint.

Reference

You want to test prompts and tune temperature before writing any code.

Use the chat playground in the Foundry portal to interact with the deployed model and adjust parameters.

Why: The playground lets you iterate on prompts and settings interactively; no SDK needed to experiment.

You need to call the deployed chat model from application code.

Use the Foundry (Azure AI) SDK to create a chat client that sends messages to the deployment endpoint.

Why: The SDK wraps the endpoint with a typed client; you pass system and user messages and read the completion.

Reference

Your app must authenticate to the deployed model.

Use the deployment's endpoint URL with an API key or Microsoft Entra ID (Azure AD) credential.

Why: Key-based auth is simplest; Entra ID is more secure and avoids embedding secrets in code.

You want an AI assistant that follows instructions and uses tools, built without much code.

Create a single agent in the Foundry portal — define its instructions, model, and tools (the Agent Service).

Why: The portal agent builder configures behavior and tools declaratively; you don't hand-write the orchestration loop.

Reference

Your agent should always cite sources and refuse off-topic requests.

Encode these rules in the agent's instructions (its system-level guidance).

Why: Agent instructions steer consistent behavior across turns, similar to a system prompt for a plain chat model.

Your agent must answer from your company documents, not just its training data.

Give the agent a knowledge/grounding tool (e.g., file search or Azure AI Search) so it retrieves your data.

Why: Grounding/RAG supplies current, private context; without it the model can hallucinate or use stale knowledge.

You need a custom app to drive a Foundry agent programmatically.

Build an agent client app with the Foundry SDK — create a thread, add messages, run the agent, read responses.

Why: The SDK exposes threads, runs, and messages so your app can integrate the agent into any workflow.

Reference

You must build an app that extracts sentiment and entities from incoming text.

Use Azure AI Language (text analysis) via the SDK or REST, accessed through Foundry, calling sentiment and NER features.

Why: For classic NLP tasks, the Language service is purpose-built and cheaper than prompting a general LLM.

Reference

A user wants to speak a question and have a deployed model answer it.

Send the audio to a multimodal model that accepts speech input, or transcribe first then prompt the model.

Why: Multimodal models can take audio directly; otherwise use speech-to-text to feed a text model.

Your app needs high-quality transcription and natural spoken output.

Use Azure AI Speech within Foundry Tools for speech-to-text and text-to-speech.

Why: The Speech service offers tuned recognition and lifelike neural voices, beyond what a chat model alone provides.

Reference

You need the app to read responses aloud in a natural-sounding voice.

Use Azure AI Speech text-to-speech with a neural voice; control prosody with SSML if needed.

Why: Neural voices sound natural; SSML lets you tune pace, pitch, and pronunciation.

An app must describe what is happening in a user-supplied photo and answer questions about it.

Send the image to a multimodal model in Foundry and prompt it with the question.

Why: Multimodal LLMs reason over image content; the classic Vision service only returns fixed tags and captions.

Reference

An app must produce images from text descriptions on demand.

Deploy a text-to-image model (e.g., a DALL-E / image generation model) in Foundry and call it from your app.

Why: Image generation models create visuals from prompts; a vision model only analyzes existing images.

Reference

You need an app that classifies images and reads printed text from them.

Build a vision app using Azure AI Vision (image analysis and OCR) accessed through Foundry.

Why: Azure AI Vision provides ready image analysis and OCR; you don't need to train a model for common tasks.

Reference

An app must extract printed and handwritten text from scanned pages.

Use the OCR (Read) capability of Azure AI Vision to return the recognized text and its location.

Why: OCR returns raw text with coordinates; structured-field extraction needs Content Understanding instead.

You must extract structured fields (totals, dates, line items) from invoices and forms.

Use Azure AI Content Understanding in Foundry Tools to extract structured data from documents and forms.

Why: Content Understanding pulls labeled fields; plain OCR only returns unstructured text.

Reference

You need structured descriptions and metadata extracted from a batch of images.

Use Azure AI Content Understanding to analyze images and return structured output.

Why: Content Understanding produces consistent structured results across content types, beyond a free-text caption.

You must turn call recordings into structured summaries with key data points.

Use Azure AI Content Understanding on the audio to transcribe and extract structured fields.

Why: Content Understanding combines transcription with extraction; Speech alone only gives the transcript.

You need scenes, topics, and key fields pulled from training videos.

Use Azure AI Content Understanding for video to extract structured insights across modalities.

Why: It analyzes audio and visual streams together to produce structured output, not just a transcript.

Reference

You must add your company's private FAQ knowledge to model answers, with minimal effort.

Ground the model with retrieval (RAG) over your documents rather than fine-tuning.

Why: RAG injects current data at query time and is simpler/cheaper; fine-tuning changes behavior, not knowledge freshness.

You must block harmful or unsafe text and image outputs from a deployed model.

Enable Azure AI Content Safety filters on the deployment to detect and block harmful content.

Why: Content Safety enforces responsible-AI guardrails at runtime; the base model alone isn't guaranteed safe.

Reference

After deploying, you need to measure response quality and watch for drift.

Use Foundry evaluation and monitoring tools to score outputs and track metrics over time.

Why: Evaluation quantifies quality (groundedness, relevance); monitoring catches regressions in production.

You need to organize models, agents, and connections for one application.

Create a Foundry project, which groups deployments, connected resources, and tools for that solution.

Why: A project is the workspace boundary; connections link external resources like Azure AI Search or storage.