Discovering, testing, and deploying a variety of foundation models on Google Cloud.
→Use Vertex AI Model Garden as the central catalog for Google's proprietary models (Gemini), open-weight models (Llama, Mistral), and partner models.
Why: Model Garden is the unified entry point for accessing a curated set of foundation models, simplifying discovery and deployment within an enterprise-grade environment.
An AI assistant needs to answer questions about frequently changing information, like product inventory or recent news.
→Implement a Retrieval-Augmented Generation (RAG) pattern. Connect the LLM to an external, up-to-date knowledge base (e.g., a database, document store).
Why: RAG allows the model to access real-time information at inference time, overcoming its knowledge cutoff and providing accurate, current answers.
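The RAG flow above — retrieve fresh facts, then ground the prompt in them — can be sketched in plain Python. The keyword-overlap retriever and the `build_rag_prompt` helper below are illustrative stand-ins; a production system would use embedding-based retrieval (e.g. Vertex AI Vector Search) and send the prompt to a Gemini model.

```python
def retrieve(query, documents, k=2):
    """Naive keyword-overlap retriever; real systems rank by embedding
    similarity over a vector index instead."""
    q_terms = set(query.lower().split())
    return sorted(
        documents,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )[:k]

def build_rag_prompt(query, documents):
    # Ground the model: answer only from retrieved context.
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (
        "Answer using ONLY the context below. If the answer is not in the "
        f"context, say you don't know.\n\nContext:\n{context}\n\nQuestion: {query}"
    )

docs = [
    "SKU-12 widget: 40 units in stock as of today.",
    "SKU-7 gadget: discontinued last quarter.",
    "Company picnic is scheduled for June.",
]
prompt = build_rag_prompt("How many SKU-12 widget units are in stock?", docs)
# The prompt, not the model's frozen weights, now carries the fresh facts.
```

Because the current inventory figure travels inside the prompt at inference time, the model's training cutoff no longer limits the answer.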
Building an enterprise search engine or a conversational AI agent grounded in company data.
→Use Vertex AI Search and Conversation (part of Agent Builder). Point it to your data sources (websites, documents) to create a search app or chatbot.
Why: This is a managed, low-code solution for building grounded, enterprise-grade search and chat applications, significantly reducing development complexity.
A model needs to learn a highly specialized skill, terminology, or consistent behavior that prompting alone cannot achieve.
→Perform supervised fine-tuning on a foundation model using a curated dataset of high-quality examples.
Why: Fine-tuning adapts the model’s internal weights, making it an expert in a specific domain. It is more powerful than prompting for deep specialization.
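Supervised fine-tuning starts with a curated dataset, typically one JSON record per line (JSONL). A minimal sketch of preparing such a file is below; the `input_text`/`output_text` field names are assumptions for illustration — check the current Vertex AI tuning documentation for the exact schema your tuning job expects.

```python
import json

# Illustrative prompt/response pairs teaching domain-specific terminology.
# Field names here are placeholders, not a guaranteed Vertex AI schema.
examples = [
    {"input_text": "Define 'basis risk' for a reinsurance analyst.",
     "output_text": "Basis risk is the mismatch between the index trigger "
                    "and the cedent's actual losses."},
    {"input_text": "Summarize treaty XS-2024-01 in one sentence.",
     "output_text": "A per-risk excess-of-loss treaty covering property "
                    "losses above the retention."},
]

# One self-contained JSON object per line — the usual tuning-data layout.
with open("tuning_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

with open("tuning_data.jsonl") as f:
    lines = f.read().splitlines()
```

Quality matters more than quantity here: a few hundred consistent, high-quality examples usually beat a large noisy set.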
Need to customize a foundation model for a specific domain but lack the resources for full fine-tuning.
→Use a Parameter-Efficient Fine-Tuning (PEFT) method like LoRA or adapter tuning available in Vertex AI.
Why: PEFT tunes only a small fraction of the model’s parameters, achieving significant customization with drastically lower computational cost and time.
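The arithmetic behind LoRA makes the cost savings concrete. A sketch with NumPy (not the Vertex AI implementation): the frozen weight matrix W is augmented with a low-rank product B·A, and only A and B are trained.

```python
import numpy as np

d, r = 1024, 8                          # hidden size, LoRA rank (r << d)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))         # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection (init 0)

# Effective weight during tuning; gradients flow only into A and B,
# and the zero-init of B means training starts from the base model exactly.
W_eff = W + B @ A

full_params = W.size                    # 1,048,576 weights in W
lora_params = A.size + B.size           # 16,384 trainable weights
ratio = lora_params / full_params
```

Here only about 1.6% of the layer's parameters are trained, which is why PEFT jobs fit in a fraction of the compute and time of full fine-tuning.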
A model is failing at tasks that require complex, multi-step reasoning (e.g., math problems, logic puzzles).
→Use chain-of-thought (CoT) prompting. Instruct the model to "think step by step" before giving the final answer.
Why: CoT encourages the model to break down a problem, which has been shown to significantly improve its reasoning ability and final answer accuracy on complex tasks.
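A minimal CoT prompt wrapper might look like this; `with_cot` is an illustrative helper, and asking for the final answer on a marked line makes the model's output easy to parse afterwards.

```python
def with_cot(question):
    # Append an explicit step-by-step instruction; requesting the final
    # answer on its own marked line simplifies downstream parsing.
    return (
        f"{question}\n\n"
        "Think step by step. Show your reasoning, then give the final "
        "answer on a new line starting with 'Answer:'."
    )

prompt = with_cot("A train travels 120 km in 1.5 hours. What is its average speed?")
```

The same question asked bare often yields a quick, wrong guess; forcing intermediate steps gives the model room to compute 120 / 1.5 before committing to an answer.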
The model needs to consistently generate output in a specific format (e.g., JSON, a certain writing style).
→Use few-shot prompting. Provide 2-5 examples of the desired input-output pattern directly in the prompt.
Why: Providing examples is more effective than just describing the format. The model learns the pattern and applies it to the new request.
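A few-shot prompt for structured output can be assembled like this; the review/sentiment task and the `few_shot_prompt` helper are illustrative.

```python
import json

# Two worked examples showing the exact input-output pattern we want.
examples = [
    ("Great battery life, camera is superb.", {"sentiment": "positive"}),
    ("Broke after two days, avoid.", {"sentiment": "negative"}),
]

def few_shot_prompt(new_review):
    shots = "\n\n".join(
        f"Review: {text}\nJSON: {json.dumps(label)}" for text, label in examples
    )
    # End on a bare "JSON:" so the model's natural continuation is the label.
    return f"{shots}\n\nReview: {new_review}\nJSON:"

prompt = few_shot_prompt("Decent sound but the strap feels cheap.")
```

Ending the prompt right at `JSON:` is the key trick: the model completes the established pattern rather than improvising a format.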
Choosing the right Gemini model variant for a specific use case.
→Use Gemini Pro for complex, high-quality reasoning. Use Gemini Flash for high-volume, low-latency, and cost-sensitive tasks. Use Gemini Nano for on-device applications.
Why: Selecting the right model size is a critical trade-off between capability, speed, and cost. Using the smallest model that meets the requirement is a best practice.
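That rule of thumb — the smallest model that meets the requirement — can be encoded as a simple routing helper. `pick_gemini_variant` and its flags are hypothetical; the returned names stand in for whatever specific model versions are current.

```python
def pick_gemini_variant(needs_complex_reasoning, on_device=False,
                        latency_sensitive=False):
    """Hypothetical router encoding the trade-off above: capability vs.
    speed and cost, preferring the smallest sufficient model."""
    if on_device:
        return "gemini-nano"       # runs locally, no network round-trip
    if needs_complex_reasoning:
        return "gemini-pro"        # highest quality, highest cost
    if latency_sensitive:
        return "gemini-flash"      # fast and cheap for high volume
    return "gemini-flash"          # default to the economical tier
```

Centralizing the choice in one function also makes it trivial to re-route traffic when pricing or model versions change.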
Applying generative AI capabilities (e.g., summarization, sentiment analysis) to data stored in a BigQuery data warehouse.
→Use BigQuery ML to call Vertex AI foundation models directly with SQL commands. Process the data in place without moving it.
Why: This simplifies the architecture, improves security by keeping data within BigQuery, and allows data analysts to leverage AI using familiar SQL syntax.
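A sketch of what the in-place SQL looks like, held in a Python string for clarity. The project, dataset, table, and option names below are placeholders — the remote model must first be created with `CREATE MODEL ... REMOTE WITH CONNECTION`, and the exact `ML.GENERATE_TEXT` options should be verified against the current BigQuery ML documentation.

```python
# Placeholder identifiers throughout; adapt to your project and dataset.
query = """
SELECT
  review,
  ml_generate_text_llm_result AS summary
FROM ML.GENERATE_TEXT(
  MODEL `my_project.my_dataset.gemini_remote_model`,
  (SELECT CONCAT('Summarize in one sentence: ', review_text) AS prompt,
          review_text AS review
   FROM `my_project.my_dataset.product_reviews`),
  STRUCT(0.2 AS temperature, TRUE AS flatten_json_output))
"""

# Executed with the BigQuery client, e.g.:
#   from google.cloud import bigquery
#   rows = bigquery.Client().query(query).result()
```

Note that the reviews never leave BigQuery: the warehouse calls out to the model per row and the results land back in the result set, queryable like any other column.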
Boosting productivity for business users within their existing tools like Gmail, Docs, and Sheets.
→Integrate Gemini for Google Workspace. This provides AI assistance directly inside the Workspace applications for tasks like drafting emails, summarizing documents, and analyzing data.
Why: This brings AI capabilities to users in their familiar workflow, accelerating adoption and providing immediate productivity benefits without context switching.
Improving developer velocity and code quality.
→Provide developers with Gemini Code Assist, which integrates into IDEs to offer code completion, generation, explanation, and test creation.
Why: AI code assistants reduce time spent on boilerplate code, help understand complex codebases, and improve overall developer productivity.
Choosing the right tool for generative AI experimentation and development.
→Use Google AI Studio for quick, no-cost web-based prototyping with Gemini models via an API key. Use Vertex AI Studio for enterprise-grade development with GCP integration, security controls, and MLOps capabilities.
Why: Google AI Studio is for rapid prototyping; Vertex AI Studio is the path to production, offering enterprise security, data governance, and scalability.
An AI agent needs to adopt a specific persona, follow rules, and maintain a consistent tone across conversations.
→Define the agent's behavior using a system prompt. This instruction is provided to the model separately from the user query to guide its overall conduct.
Why: A system prompt is the most effective way to establish durable, consistent behavioral guidelines without having to repeat them in every user-facing prompt.
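The separation looks like this on the wire. The dict below mirrors the Gemini API's REST request shape (`systemInstruction` vs. `contents`), but treat the field names as illustrative and confirm them against the current API reference.

```python
# Durable behavior lives in the system instruction; the user turn is
# separate, so the rules never need repeating per message.
request = {
    "systemInstruction": {
        "parts": [{"text": (
            "You are 'Aria', a concise support agent for Acme Corp. "
            "Never discuss competitors. Always answer in under 100 words."
        )}]
    },
    "contents": [
        {"role": "user", "parts": [{"text": "How do I reset my password?"}]}
    ],
}
```

Each new user turn is appended to `contents`; the persona and rules in `systemInstruction` ride along unchanged for the whole conversation.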
A solution requires a common, specific AI capability like translation, speech-to-text, or text-to-speech.
→Use the pre-trained, purpose-built APIs: Cloud Translation API, Speech-to-Text API, or Text-to-Speech API.
Why: These managed APIs are highly optimized for their specific task and are more cost-effective and simpler to implement than using a general-purpose LLM for the same function.
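As an example of how lightweight these purpose-built APIs are, here is the approximate shape of a Cloud Translation (v3) request body, built as a plain dict. Field names follow the public REST schema as I understand it — verify against the current API reference before relying on them.

```python
# Body for translateText; one POST replaces what would otherwise be
# prompt engineering against a general-purpose LLM.
translate_request = {
    "contents": ["Your order has shipped."],
    "sourceLanguageCode": "en",
    "targetLanguageCode": "de",
    "mimeType": "text/plain",
}

# POSTed (with auth) to:
#   https://translation.googleapis.com/v3/projects/PROJECT_ID:translateText
```

No prompt design, no output parsing, and per-character pricing tuned for the task — which is the cost and simplicity argument above in miniature.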