Playbook — CCA-F Claude Certified Architect — Foundations

Last reviewed: May 2026

A scannable reference of architectural patterns the CCA-F exam tests. Read top-to-bottom, or jump to a section.

Agentic Architecture & Orchestration

Choosing between one agent and a multi-agent swarm for a complex workflow.

Start with a single agent + tools. Split into multiple agents only when task boundaries are clear, context windows overflow, or different model tiers are needed per sub-task.

Why: Multi-agent adds latency, error surface, and orchestration cost. Most production workloads succeed with one well-tooled agent.

Agent must reason about observations before acting again.

Implement a ReAct (Reason + Act) loop: the model generates a thought, selects a tool, receives the result, and repeats until a stop condition is met.

Why: ReAct makes intermediate reasoning visible, improving debuggability and letting you audit the chain of thought.

Agent needs to interact with external systems (APIs, databases, file systems).

Define tools via the tool_use API. The model emits a tool_use block; your code executes it and returns a tool_result. The model then continues.

Reference

Orchestrator must dispatch heterogeneous sub-tasks (code review, web search, data analysis).

Use a supervisor agent that decomposes the goal, delegates to specialist sub-agents, and aggregates results. Each sub-agent has its own system prompt and tool set.

Multiple sub-agents must coordinate without direct peer-to-peer communication.

Route all inter-agent messages through a supervisor. The supervisor decides which sub-agent runs next, passes context, and enforces ordering constraints.

Why: Direct peer messaging creates cycles and makes state hard to track. A central supervisor keeps the execution DAG explicit.

Agent must remember context across a multi-turn session.

Pass the full conversation history (system + prior user/assistant turns) in the messages array. For long sessions, summarize older turns to stay within the context window.

Agent needs persistence across sessions or across users.

Store facts in an external memory layer (vector DB, key-value store, file). Retrieve relevant memories via RAG and inject into the system prompt each turn.

Team defaults to agentic architecture for every LLM feature.

Do not use agents when a single prompt + structured output suffices. Agents add latency, cost, and failure modes. Reserve agentic loops for tasks requiring iteration or tool use.

Complex reasoning task needs more internal deliberation before the answer.

Enable extended thinking with a budget_tokens parameter. The model uses a thinking block before responding, improving accuracy on multi-step problems.

Why: Extended thinking trades latency for quality. Set budget_tokens proportional to task complexity; cap it to control cost.

Reference

Tool call returns an error; agent must recover gracefully.

Return the error as a tool_result with is_error: true. The model sees the failure and can retry with corrected parameters, try an alternative tool, or explain the failure to the user.

Reference

Transient API failures (429, 529) during an agentic loop.

Implement exponential backoff with jitter. On 429 (rate limit), respect the retry-after header. On 529 (overloaded), back off longer. Never retry 400-class errors blindly.

Measuring whether an agentic system actually improves over time.

Build an eval suite: define input-output pairs, run the agent, score outputs (exact match, LLM-as-judge, human review). Track pass rate per release.

Why: Without evals, prompt tweaks are guesswork. Regression detection requires automated, repeatable scoring.

Agent produces low-quality output on the first pass.

Add a reflection step: after generating an answer, prompt the model to critique its own output and revise. Use a separate message turn or extended thinking.

Agentic workflow performs irreversible actions (deleting resources, sending emails).

Insert a checkpoint before destructive operations. Present the planned action to the user, wait for approval, then execute. Log the decision for audit.

Claude Code Configuration & Workflows

Multiple CLAUDE.md files in a monorepo; unclear which takes precedence.

Three tiers: ~/.claude/CLAUDE.md (user), project-root CLAUDE.md (project), workspace-child CLAUDE.md (workspace). All are merged; workspace overrides project overrides user.

Reference

Team wants reusable prompts invoked as /my-command.

Create a .claude/commands/<name>.md file containing the prompt template. Invoke with /<name>. Use $ARGUMENTS for user input. Project-scoped commands live in the repo.

Reference

Run a linter automatically after Claude edits a file.

Configure a PostToolUse hook in settings.json that matches the Write/Edit tool. The hook script runs after the tool completes; non-zero exit blocks the change.

Reference

Claude Code prompts for permission on every shell command; slowing iteration.

Use allowlist patterns in settings.json under permissions.allow. Three modes: default (prompt for each), allowlist (auto-approve matched patterns), and yolo (auto-approve all — not recommended for production).

Reference

Developer wants personal overrides not committed to the repo.

settings.json is committed (team defaults). settings.local.json is gitignored (personal overrides). Local merges on top of project settings.

Running Claude Code in a CI pipeline with no interactive terminal.

Use `claude -p "prompt" --output-format json` in headless mode. Pipe input via stdin, parse structured output. Set ANTHROPIC_API_KEY as a CI secret.

Reference

Claude Code needs access to a custom MCP server (database, internal API).

Add the server to mcpServers in settings.json with command and args. Claude Code launches the MCP server as a child process and discovers tools at startup.

Reference

Claude Code working on a feature branch while you develop on main.

Use git worktrees. Claude Code operates in the worktree directory; your main checkout is untouched. Avoids index conflicts and stash juggling.

Claude Code generates changes but you want clean atomic commits.

Claude Code tracks file changes and can create commits with messages. Review the diff before committing. Prefer staging specific files over git add -A to avoid leaking secrets.

Using Claude Code from VS Code or JetBrains.

Install the Claude Code extension. It embeds the CLI as a panel inside the IDE, sharing the same CLAUDE.md, hooks, and settings. Terminal-based and IDE-based sessions are interchangeable.

Reference

Prompt Engineering & Structured Output

Long prompt with multiple sections; model confuses instructions with data.

Wrap sections in XML tags: <instructions>, <context>, <examples>. Claude is trained to respect XML boundaries as structural delimiters.

Defining persistent behavior across all turns (tone, constraints, persona).

Place invariant instructions in the system prompt. Keep it concise: role, constraints, output format. User messages carry per-turn context; system carries session-wide rules.

Force the model to start its response with a specific prefix (e.g., JSON opening brace).

Add a partial assistant message at the end of the messages array. Claude continues from where you left off. Useful for enforcing output format.

Model output format is inconsistent despite detailed instructions.

Add 2-3 few-shot examples as user/assistant turn pairs before the real query. Examples anchor format, tone, and reasoning style more reliably than prose instructions.

Model skips reasoning steps on multi-step logic problems.

Prompt with "Think step by step" or use extended thinking. For production, use extended thinking (budget_tokens) rather than prompting for visible chain-of-thought to keep output clean.

Reference

Choosing between deterministic and creative outputs.

temperature=0 for deterministic tasks (classification, extraction). temperature=0.5-0.7 for creative writing. temperature=1.0 for maximum diversity. Note: extended thinking requires temperature=1.

Need guaranteed valid JSON output from the model.

Define a tool with the desired JSON schema as input_schema. Set tool_choice to force that tool. The model returns structured JSON in the tool_use block, validated against the schema.

Reference

User-facing app needs low time-to-first-token.

Use stream=true on the Messages API. Process server-sent events incrementally: content_block_start, content_block_delta, message_stop. Display tokens as they arrive.

Processing thousands of prompts where latency is not critical.

Use the Message Batches API. Submit up to 100k requests per batch. Results arrive within 24 hours at 50% cost reduction. Poll or use a webhook for completion.

Reference

Extracting data from scanned documents or images.

Pass images as base64 content blocks (type: image) or PDF pages (type: document) in the user message. Claude processes up to 20 MB per request. Prefer native PDFs over screenshots for text-heavy docs.

Reference

Choosing between Opus, Sonnet, and Haiku for a workload.

Opus: highest capability, complex reasoning, agentic tasks. Sonnet: balanced performance/cost, general production use. Haiku: fastest and cheapest, classification, routing, simple extraction.

Reference

Repeated calls share the same long system prompt; want to reduce cost.

Mark cacheable content with cache_control: { type: "ephemeral" }. Cached prefixes are reused across calls for up to 5 minutes (auto-extended on hit). Write cost is 25% more; read cost is 90% less.

Reference

Tool Design & MCP Integration

Defining a tool for the Claude Messages API.

Each tool has name, description, and input_schema (JSON Schema). The description tells Claude when to use it; the schema validates parameters. Keep descriptions action-oriented and concise.

Reference

Tool executed successfully; need to return the result to Claude.

Send a user message with role: "user" and a tool_result content block. Include the tool_use_id to correlate. Return data as text or structured content; keep payloads under 100k tokens.

Reference

Agent needs to fetch data from three independent sources simultaneously.

Claude can emit multiple tool_use blocks in a single response. Execute them in parallel, then return all tool_result blocks in one user message. Reduces round trips.

Reference

Understanding the Model Context Protocol component model.

Three roles: Host (application like Claude Code), Client (protocol handler per server), Server (exposes tools/resources/prompts). Clients maintain 1:1 connections with servers.

Reference

Choosing how an MCP client connects to a server.

stdio: local process, simplest setup. SSE: HTTP-based, legacy. Streamable HTTP: current standard for remote servers, supports resumability and server-initiated messages.

Reference

Deciding which MCP primitive to expose.

Resources: read-only data (files, DB rows) the client pulls. Tools: actions the model invokes (write, compute, query). Prompts: reusable prompt templates the user selects. Tools are model-controlled; resources are application-controlled.

Reference

Creating a custom MCP server to expose internal APIs.

Use the MCP SDK (TypeScript or Python). Implement tool handlers with input schemas. Register via server.tool(). Transport: stdio for local, streamable HTTP for remote.

Reference

Agent must interact with a GUI application (clicking, typing, screenshots).

Enable computer use tools: computer_20250124 (screenshot + mouse + keyboard), text_editor_20250124, bash_20250124. The model receives screenshots and emits coordinate-based actions.

Reference

Model must always call a specific tool rather than responding with text.

Set tool_choice to { type: "tool", name: "my_tool" }. The model is forced to call that tool. Use type: "any" to require some tool call, or type: "auto" (default) to let the model decide.

Reference

Context Management & Reliability

Application hits context limit mid-conversation.

Claude models support 200k tokens. Monitor usage via response.usage. When approaching the limit, summarize older turns or truncate. Never silently drop messages.

Reference

Processing a 150-page document that fills most of the context window.

Place the document early in the prompt (after system). Put questions last. Use prompt caching to avoid re-sending on follow-ups. For multi-doc tasks, use RAG to select relevant chunks.

Reference

Knowledge base is too large to fit in context; model needs access at query time.

Embed and index documents in a vector store. At query time, retrieve top-k chunks, inject into the user message. Cite source documents in the output for traceability.

Model confidently states incorrect facts.

Ground responses in provided context (RAG). Instruct the model to say "I don't know" when evidence is insufficient. Use citations. Validate factual claims against source documents programmatically.

Application receives 429 (rate limited) or 529 (overloaded) responses.

429: you hit your tier rate limit. Back off and retry; respect retry-after. 529: Anthropic API is overloaded. Back off longer. Both are transient. Never retry 400 or 401.

Monthly API spend is higher than expected.

Use prompt caching for repeated prefixes (90% read discount). Route simple tasks to Haiku. Use the Batch API for async workloads (50% discount). Monitor token usage per endpoint. Trim unnecessary context.

Reference

Need visibility into per-request token consumption.

Every Messages API response includes usage.input_tokens, usage.output_tokens, and (if cached) usage.cache_read_input_tokens. Log these per call, aggregate by endpoint, set budget alerts.