Choosing between one agent and a multi-agent swarm for a complex workflow.
→Start with a single agent + tools. Split into multiple agents only when task boundaries are clear, context windows overflow, or different model tiers are needed per sub-task.
Why: Multi-agent adds latency, error surface, and orchestration cost. Most production workloads succeed with one well-tooled agent.
Agent must reason about observations before acting again.
→Implement a ReAct (Reason + Act) loop: the model generates a thought, selects a tool, receives the result, and repeats until a stop condition is met.
Why: ReAct makes intermediate reasoning visible, improving debuggability and letting you audit the chain of thought.
Agent needs to interact with external systems (APIs, databases, file systems).
→Define tools via the tool_use API. The model emits a tool_use block; your code executes it and returns a tool_result. The model then continues.
Reference↗
Orchestrator must dispatch heterogeneous sub-tasks (code review, web search, data analysis).
→Use a supervisor agent that decomposes the goal, delegates to specialist sub-agents, and aggregates results. Each sub-agent has its own system prompt and tool set.
Multiple sub-agents must coordinate without direct peer-to-peer communication.
→Route all inter-agent messages through a supervisor. The supervisor decides which sub-agent runs next, passes context, and enforces ordering constraints.
Why: Direct peer messaging creates cycles and makes state hard to track. A central supervisor keeps the execution DAG explicit.
Agent must remember context across a multi-turn session.
→Pass the full conversation history (system + prior user/assistant turns) in the messages array. For long sessions, summarize older turns to stay within the context window.
Agent needs persistence across sessions or across users.
→Store facts in an external memory layer (vector DB, key-value store, file). Retrieve relevant memories via RAG and inject into the system prompt each turn.
Team defaults to agentic architecture for every LLM feature.
→Do not use agents when a single prompt + structured output suffices. Agents add latency, cost, and failure modes. Reserve agentic loops for tasks requiring iteration or tool use.
Complex reasoning task needs more internal deliberation before the answer.
→Enable extended thinking with a budget_tokens parameter. The model uses a thinking block before responding, improving accuracy on multi-step problems.
Why: Extended thinking trades latency for quality. Set budget_tokens proportional to task complexity; cap it to control cost.
Reference↗
Tool call returns an error; agent must recover gracefully.
→Return the error as a tool_result with is_error: true. The model sees the failure and can retry with corrected parameters, try an alternative tool, or explain the failure to the user.
Reference↗
Transient API failures (429, 529) during an agentic loop.
→Implement exponential backoff with jitter. On 429 (rate limit), respect the retry-after header. On 529 (overloaded), back off longer. Never retry 400-class errors blindly.
Measuring whether an agentic system actually improves over time.
→Build an eval suite: define input-output pairs, run the agent, score outputs (exact match, LLM-as-judge, human review). Track pass rate per release.
Why: Without evals, prompt tweaks are guesswork. Regression detection requires automated, repeatable scoring.
Agent produces low-quality output on the first pass.
→Add a reflection step: after generating an answer, prompt the model to critique its own output and revise. Use a separate message turn or extended thinking.
Agentic workflow performs irreversible actions (deleting resources, sending emails).
→Insert a checkpoint before destructive operations. Present the planned action to the user, wait for approval, then execute. Log the decision for audit.