A backend service must call models and agents defined in a Foundry project.
→Use the Azure AI Foundry SDK (AIProjectClient) with the project connection string and a DefaultAzureCredential to get model and agent clients.
Why: The project client resolves connections and deployments centrally; hardcoding per-model endpoints and keys bypasses project governance.
Reference↗
Build a Q&A app grounded in policy documents.
→Embed and index the docs, retrieve top-k chunks per query, and pass them as context into the chat completion with a cite-your-sources instruction.
Why: RAG keeps knowledge current and citable without retraining; passing the full corpus into the prompt blows the context window and cost.
The model must look up live order status during a conversation.
→Define a tool with a JSON schema, let the model emit a tool call, execute it server-side, and return the result for the model to summarize.
Why: Function/tool calling lets the model invoke real systems deterministically; asking it to "guess" the status produces fabrications.
Reference↗
A task needs several dependent tool calls before a final answer.
→Run a tool-use loop: feed each tool result back to the model and iterate until it returns a final message, with a max-iteration cap.
Why: Iterative tool loops support multistep reasoning; a single round trip can't chain dependent lookups, and an uncapped loop can run away.
Before shipping, quantify how often a RAG app hallucinates or drifts off-topic.
→Run Foundry evaluators for groundedness, relevance, and coherence over a labeled test set and gate release on threshold scores.
Why: Built-in evaluators give measurable quality and safety signals; eyeballing a few samples doesn't catch systematic fabrication.
Reference↗
Define a support agent with a clear persona, goals, and boundaries.
→Set the agent's system instructions (role, goals, refusal rules) and attach only the tools it needs for its scope.
Why: Tight instructions plus least-tool access keep the agent on-task; broad instructions and every tool invite scope creep and unsafe actions.
An agent must remember context across turns within a session.
→Use Foundry Agent Service threads, which persist the message history per conversation so each run sees prior turns.
Why: Threads provide managed conversation memory; re-sending the whole transcript manually each call is brittle and easy to truncate wrong.
Reference↗
An agent needs web grounding and code execution without custom plumbing.
→Attach built-in Foundry agent tools such as Grounding with Bing Search and the Code Interpreter rather than hand-rolling integrations.
Why: Managed tools are governed and supported out of the box; custom reimplementations add maintenance and skip platform safety controls.
A primary agent should delegate billing questions to a specialized billing agent.
→Use connected agents: expose the billing agent as a tool the main agent can call, so it routes sub-tasks to specialists.
Why: Connected agents enable hierarchical delegation; cramming every domain into one mega-agent bloats instructions and degrades accuracy.
Reference↗
A workflow needs a planner, a researcher, and a writer collaborating with shared state.
→Orchestrate them with a multi-agent framework (Semantic Kernel / AutoGen on Foundry) using a defined orchestration pattern and shared context.
Why: Frameworks manage turn-taking, state, and termination; ad-hoc string passing between agents has no coordination or stop condition.
An agent runs unattended overnight and must not take risky actions alone.
→Bound it with allow-listed tools, per-action budgets, content filters, and a checkpoint that escalates high-impact steps for approval.
Why: Layered safeguards keep autonomy safe; an autonomous loop with full tool access and no approval gate can cause irreversible damage.
An agent intermittently fails mid-task and you must find the failing step.
→Inspect the run's traced steps and tool-call inputs/outputs in Foundry to locate the failing tool or malformed argument.
Why: Step-level traces pinpoint where a run broke; a single final error message hides which tool call or reasoning step actually failed.
Outputs are inconsistent and ignore formatting instructions.
→Use a clear system message, few-shot examples, and explicit output constraints; for strict shape, enable structured outputs / JSON schema.
Why: Structured prompting and schema-enforced outputs make results reliable; raising temperature or retrying blindly doesn't fix instruction-following.
Reference↗
A creative copy task feels too repetitive; a data-extraction task is too random.
→Raise temperature/top-p for the creative task and lower them toward 0 for extraction to make it deterministic.
Why: Sampling params trade diversity against determinism; switching models is overkill when the parameter setting is the real cause.
A reasoning agent makes avoidable logic errors on hard tasks.
→Add a reflection / self-critique step where the agent reviews and revises its draft, or use a reasoning model for the step.
Why: Chain-of-thought and self-critique improve hard-task accuracy; a single forward pass has no chance to catch its own mistake.
Operations needs token spend, latency, and safety signals per request in production.
→Emit OpenTelemetry traces and metrics from the app to Azure Monitor / Application Insights, capturing tokens, latency, and content-safety flags.
Why: Unified observability ties cost, performance, and safety together; scraping logs by hand can't correlate a slow turn with its token usage.
Reference↗
One app mixes cheap classification with occasional complex reasoning.
→Orchestrate multiple deployments: route simple turns to an SLM and escalate hard turns to a frontier LLM behind one app layer.
Why: Model routing optimizes cost and quality per turn; using one premium model for everything overpays for the easy majority.