Augment a foundation model with private company data (PDFs, docs, S3 content) without fine-tuning.
→Create an Amazon Bedrock Knowledge Base. Bedrock handles ingestion, chunking, and embedding at sync time, then retrieval (RAG) at inference time.
Why: Cheaper and faster to update than fine-tuning. Source data changes → re-sync the KB; no retraining.
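A minimal boto3 sketch of querying a Knowledge Base at inference time; the KB ID, model ARN, and question are placeholders:
```python
import boto3

# Assumed placeholders: knowledge base ID, model ARN, and the question text.
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.retrieve_and_generate(
    input={"text": "What is our parental-leave policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB1234567890",  # placeholder KB ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
print(response["output"]["text"])  # answer grounded in the retrieved chunks
```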
Data changes frequently (inventory, pricing, news) and the model must reflect current state.
→RAG with a knowledge base. Avoid fine-tuning — retraining cycles can't keep up.
Why: RAG separates the model from the data; the KB updates independently of the model.
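Re-syncing is just an ingestion job against the existing KB; a sketch with placeholder IDs:
```python
import boto3

# Assumed placeholders: knowledge base ID and data source ID.
# Re-syncing re-ingests changed source documents; the model is untouched.
agent = boto3.client("bedrock-agent", region_name="us-east-1")

job = agent.start_ingestion_job(
    knowledgeBaseId="KB1234567890",  # placeholder
    dataSourceId="DS1234567890",     # placeholder
)
print(job["ingestionJob"]["status"])  # e.g. STARTING
```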
Fine-tune a foundation model with labeled examples for a specific task.
→Provide prompt-completion (instruction-response) pairs. JSONL format is standard.
Why: Instruction fine-tuning teaches the model to map user inputs to desired outputs in the target task.
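A sketch of the prompt-completion JSONL shape; the example records are made up:
```python
import json

# Made-up labeled examples in the prompt/completion shape Bedrock fine-tuning expects.
examples = [
    {"prompt": "Classify the sentiment: 'The checkout flow is painless.'",
     "completion": "positive"},
    {"prompt": "Classify the sentiment: 'Support never answered my ticket.'",
     "completion": "negative"},
]

with open("train.jsonl", "w") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")  # one JSON object per line
```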
Teach a foundation model specialized vocabulary (medical, legal, scientific) using lots of unlabeled domain text.
→Continued pre-training on the unlabeled domain corpus.
Why: Continued pre-training updates the model's understanding of vocabulary and concepts; instruction fine-tuning teaches task behavior. Different goal, different data shape.
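A sketch of kicking off a continued pre-training job; the names, ARNs, S3 URIs, and hyperparameter values are placeholders, and hyperparameter names vary by base model:
```python
import boto3

# Assumed placeholders: job/model names, IAM role ARN, and S3 URIs.
# Training data for continued pre-training is unlabeled text, one {"input": "<text>"} per JSONL line.
bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.create_model_customization_job(
    jobName="domain-cpt-job",
    customModelName="titan-medical-cpt",
    roleArn="arn:aws:iam::111122223333:role/BedrockCustomizationRole",
    baseModelIdentifier="amazon.titan-text-express-v1",
    customizationType="CONTINUED_PRE_TRAINING",  # not prompt/completion pairs
    trainingDataConfig={"s3Uri": "s3://my-bucket/domain-corpus/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/cpt-output/"},
    hyperParameters={"epochCount": "1", "batchSize": "1", "learningRate": "0.00001"},  # illustrative values
)
```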
Multi-step workflow that combines LLM reasoning with calls to external APIs, databases, or AWS services.
→Amazon Bedrock Agents — orchestrates LLM reasoning, tool/API invocation, and result synthesis in a single managed runtime.
Why: Agents plan steps, call tools, and stitch results back into a final response without you writing the orchestration loop.
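A sketch of calling an existing agent; the agent ID, alias ID, and session ID are placeholders:
```python
import boto3

# Assumed placeholders: agent ID, alias ID, session ID, and the request text.
runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = runtime.invoke_agent(
    agentId="AGENT12345",
    agentAliasId="ALIAS12345",
    sessionId="demo-session-1",
    inputText="Check stock for SKU 4711 and draft a reorder request.",
)

# The completion streams back in chunks; concatenate the text.
answer = ""
for event in response["completion"]:
    chunk = event.get("chunk")
    if chunk:
        answer += chunk["bytes"].decode("utf-8")
print(answer)
```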
Pick a vector database for embeddings.
→Managed RAG → Bedrock Knowledge Bases (handles vector store automatically). Custom vector DB → OpenSearch Service (k-NN), Aurora PostgreSQL with pgvector, Neptune Analytics, or RDS for PostgreSQL with pgvector.
Why: OpenSearch is the default for high-scale k-NN; pgvector reuses an existing relational DB.
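A toy pgvector similarity query via psycopg2; the table, column, and connection details are placeholders, and a real embedding has hundreds of dimensions:
```python
import psycopg2

# Assumes the pgvector extension is installed and `embedding` is a vector column.
conn = psycopg2.connect("dbname=app user=app host=aurora-cluster.example.com")

query_embedding = [0.12, -0.03, 0.44]  # toy 3-dim vector; would come from an embedding model
vector_literal = "[" + ",".join(map(str, query_embedding)) + "]"

with conn.cursor() as cur:
    cur.execute(
        """
        SELECT doc_id, content
        FROM documents
        ORDER BY embedding <=> %s::vector  -- cosine distance operator
        LIMIT 5
        """,
        (vector_literal,),
    )
    for doc_id, content in cur.fetchall():
        print(doc_id, content[:80])
```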
Deploy a fine-tuned model from Bedrock for production serving.
→Buy Provisioned Throughput for the custom Bedrock model. Custom models cannot be invoked via on-demand pricing.
Why: Custom-model capacity is dedicated, billed in model units, and required for invocation.
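A sketch of purchasing capacity and invoking through it; the model ARN, name, and unit count are placeholders:
```python
import boto3

# Assumed placeholders: custom model ARN, provisioned model name, model unit count.
bedrock = boto3.client("bedrock", region_name="us-east-1")

pt = bedrock.create_provisioned_model_throughput(
    provisionedModelName="my-custom-model-pt",
    modelId="arn:aws:bedrock:us-east-1:111122223333:custom-model/my-custom-model",
    modelUnits=1,
)

# Invoke the custom model through its provisioned ARN, not on-demand.
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
response = runtime.invoke_model(
    modelId=pt["provisionedModelArn"],
    body='{"inputText": "Summarize our Q3 results."}',  # body shape depends on the base model
)
```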
Estimate or reduce Bedrock inference cost.
→Cost ≈ tokens processed × per-token rate. Reduce by shortening prompts, trimming few-shot examples, picking smaller models, or using prompt caching where supported.
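A back-of-the-envelope estimator; the per-token rates are illustrative placeholders, not current Bedrock pricing:
```python
# Illustrative rates only; substitute the actual rates for your model and region.
INPUT_RATE_PER_1K = 0.003    # USD per 1,000 input tokens
OUTPUT_RATE_PER_1K = 0.015   # USD per 1,000 output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost ≈ tokens processed × per-token rate, split by input and output."""
    return (input_tokens / 1000) * INPUT_RATE_PER_1K + \
           (output_tokens / 1000) * OUTPUT_RATE_PER_1K

# 10,000 requests with ~800 prompt tokens and ~200 completion tokens each
print(f"${estimate_cost(10_000 * 800, 10_000 * 200):,.2f}")  # ≈ $54.00 at these rates
```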
Generate high-accuracy labeled data with human-in-the-loop review (e.g. specialized images, medical records).
→Amazon SageMaker Ground Truth Plus — managed HITL labeling workforce.
Why: Ground Truth Plus supplies a managed expert workforce and workflow for high-accuracy labels. For ongoing review of low-confidence model predictions, pair with Amazon A2I (Augmented AI).
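A sketch of the A2I side, routing a low-confidence prediction to human review; the flow definition ARN, loop name, and confidence threshold are placeholders and assume a review workflow already exists:
```python
import json
import boto3

# Assumed placeholders: flow definition ARN, human loop name, 0.80 threshold.
a2i = boto3.client("sagemaker-a2i-runtime", region_name="us-east-1")

prediction = {"label": "malignant", "confidence": 0.62}

if prediction["confidence"] < 0.80:  # route low-confidence results to reviewers
    a2i.start_human_loop(
        HumanLoopName="review-case-0042",
        FlowDefinitionArn="arn:aws:sagemaker:us-east-1:111122223333:flow-definition/medical-review",
        HumanLoopInput={"InputContent": json.dumps(prediction)},
    )
```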
Speech recognition mishears domain-specific terms (medical, legal, brand names).
→Amazon Transcribe with a custom language model or custom vocabulary trained on domain text.
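A sketch of a transcription job that applies a custom vocabulary; the job name, bucket, and vocabulary name are placeholders, and the vocabulary is assumed to exist already:
```python
import boto3

# Assumed placeholders: job name, S3 URI, and custom vocabulary name.
transcribe = boto3.client("transcribe", region_name="us-east-1")

transcribe.start_transcription_job(
    TranscriptionJobName="clinic-notes-001",
    LanguageCode="en-US",
    Media={"MediaFileUri": "s3://my-bucket/audio/clinic-notes-001.wav"},
    Settings={"VocabularyName": "medical-terms"},  # domain terms the model tends to mishear
)
```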
Model performs well on training but poorly in production (overfit) — increase generalization without changing architecture.
→Increase the volume and diversity of the training data. Don't shrink the dataset or rely on hyperparameter tuning alone.
Why: More representative data is the highest-leverage fix; regularization and early stopping help but data dominates.
Evaluate generative output quality.
→Translation quality → BLEU. Summarization quality → ROUGE. Semantic similarity to reference → BERTScore. Stylistic preference → human evaluation with custom prompt sets.
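A toy sketch of BLEU and ROUGE with nltk and rouge-score; the reference/candidate strings are made up, and real evaluation uses a held-out reference set:
```python
from nltk.translate.bleu_score import sentence_bleu
from rouge_score import rouge_scorer

# Made-up strings for illustration only.
reference = "the cat sat on the mat"
candidate = "the cat is on the mat"

# BLEU: n-gram precision against the reference (bigram weights for this toy example).
bleu = sentence_bleu([reference.split()], candidate.split(), weights=(0.5, 0.5))

# ROUGE: recall-oriented overlap, typical for summarization.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)

print(f"BLEU: {bleu:.3f}")
print(f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")
```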
Pick a Bedrock foundation model for a use case where output style matters.
→Run human evaluation on a custom prompt dataset across candidate models. Don't rely on public leaderboards or latency metrics alone.
Why: Style/tone fit is subjective; benchmarks miss it.
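A sketch of collecting candidate outputs on a custom prompt set for human raters; the model IDs and prompts are placeholders:
```python
import boto3

# Assumed placeholders: candidate model IDs and the custom prompt set.
# Outputs go to human raters; no automated score is computed here.
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

candidates = ["anthropic.claude-3-haiku-20240307-v1:0", "amazon.titan-text-express-v1"]
prompts = ["Write a friendly 2-sentence reply to a late-delivery complaint."]

for model_id in candidates:
    for prompt in prompts:
        resp = runtime.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
        )
        print(model_id, "→", resp["output"]["message"]["content"][0]["text"])
```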
Generate charts and dashboards from natural-language questions over business data.
→Amazon Q in QuickSight — natural-language BI over QuickSight datasets.