Playbook — PCDE Google Cloud Professional Cloud Database Engineer

Last reviewed: May 2026

A scannable reference of architectural patterns the PCDE exam tests. Read top-to-bottom, or jump to a section.

Domain 1: Design scalable and highly available cloud database solutions

Global e-commerce platform requiring ACID transactions, strong consistency, and 99.999% availability across multiple continents.

Cloud Spanner with a multi-region configuration (e.g., `nam-eur-asia1`).

Why: Spanner is the only GCP managed service providing globally distributed, strongly consistent ACID transactions at scale with a 99.999% SLA.

Migrating a large, high-performance Oracle OLTP database with complex stored procedures and analytical query needs.

AlloyDB for PostgreSQL.

Why: AlloyDB offers high-performance, PostgreSQL-compatible OLTP, which gives converted PL/SQL procedures a PL/pgSQL target, plus a columnar engine that accelerates analytical queries (HTAP) without impacting transactional workloads.

High-throughput (millions of OPS) ingestion of time-series data (e.g., IoT, logs) requiring low-latency reads and automatic data expiration.

Cloud Bigtable with a row key design of `(entity_id)#(reverse_timestamp)` and a garbage collection policy.

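Why: Bigtable is designed for massive-scale, low-latency key/value workloads. Leading the row key with the entity ID spreads writes across entities and keeps each entity's rows contiguous. The reverse timestamp sorts the newest rows first, so reading the latest values for an entity is a short prefix scan. An age-based garbage collection policy on the column family expires old cells, which serves as the TTL.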
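
The row key construction can be sketched as follows. This is a minimal illustration, assuming millisecond timestamps and a fixed upper bound (`MAX_TS_MS`, a hypothetical constant) from which the event time is subtracted. The function name and padding width are illustrative choices, not part of any Bigtable API.

```python
# Hypothetical upper bound for millisecond timestamps (13 decimal digits).
MAX_TS_MS = 9_999_999_999_999


def make_row_key(entity_id: str, event_ts_ms: int) -> bytes:
    """Build an `entity_id#reverse_timestamp` row key.

    Subtracting from a fixed maximum makes newer events sort first,
    because rows are ordered lexicographically by key bytes.
    """
    if not 0 <= event_ts_ms <= MAX_TS_MS:
        raise ValueError("timestamp out of range")
    reverse_ts = MAX_TS_MS - event_ts_ms
    # Zero-pad to a fixed width so string order matches numeric order.
    return f"{entity_id}#{reverse_ts:013d}".encode("utf-8")


older = make_row_key("sensor-42", 1_700_000_000_000)
newer = make_row_key("sensor-42", 1_700_000_060_000)
print(sorted([older, newer])[0] == newer)  # the newer reading sorts first
```

Without the zero-padding, keys of different lengths would sort incorrectly, so the fixed width matters as much as the subtraction.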

Mobile or web application requiring a flexible schema, real-time data synchronization to clients, and offline support.

Firestore in Native Mode.

Why: Firestore is purpose-built for this serverless app backend pattern, providing real-time listeners and offline persistence via its client SDKs out-of-the-box.

Large-scale (10M+ vectors) similarity search for AI/ML applications (e.g., RAG, recommendations) needing sub-100ms latency.

AlloyDB for PostgreSQL with pgvector extension and a ScaNN index.

Why: AlloyDB integrates Google's high-performance ScaNN algorithm for approximate nearest neighbor (ANN) search, outperforming standard vector search implementations at scale.

Designing a Cloud Spanner schema for a write-heavy workload to prevent hotspots on a single server.

Design primary keys that do not use monotonically increasing values (e.g., sequential IDs, timestamps) as the first key part. Use UUIDs, hashed values, or bit-reversed sequences instead.

Why: Spanner distributes data lexicographically by primary key. Sequential keys direct all writes to a single split, creating a hotspot. Randomly distributed keys spread writes across all splits.
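
To see why bit reversal helps, here is a small client-side sketch. It assumes unsigned 64-bit sequence values and does not map the result onto Spanner's signed INT64 range. The helper name `bit_reverse_64` is hypothetical and is not a Spanner client function.

```python
def bit_reverse_64(value: int) -> int:
    """Reverse the bit order of an unsigned 64-bit integer."""
    if not 0 <= value < 2**64:
        raise ValueError("value must fit in 64 unsigned bits")
    # Render as 64 binary digits, reverse the string, parse it back.
    return int(f"{value:064b}"[::-1], 2)


# Consecutive sequence values land far apart in key space.
keys = [bit_reverse_64(n) for n in (1, 2, 3)]
print(keys == [2**63, 2**62, 2**63 + 2**62])
```

The low-order bits that change on every increment become the high-order bits of the key, so neighbouring sequence values no longer sort next to each other.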

A Spanner schema has a strong parent-child relationship (e.g., Customers and Orders) and queries frequently fetch a parent with all its children.

Use interleaved tables, defining the child table with `INTERLEAVE IN PARENT`.

Why: Interleaving physically co-locates child rows with their parent row in storage. This makes parent-child joins extremely efficient, as it becomes a highly optimized range scan on a single split.
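
The co-location effect can be illustrated with a toy model. This is not Spanner's storage engine: it assumes rows are simply kept in primary-key order, with a parent keyed as `(customer_id,)` and a child as `(customer_id, order_id)`. All names are hypothetical.

```python
from bisect import bisect_left

# Rows sorted by primary key; each parent is followed by its own children.
rows = sorted([
    ("c1",), ("c1", "o1"), ("c1", "o2"),
    ("c2",), ("c2", "o1"),
    ("c3",),
])


def parent_with_children(sorted_rows, customer_id):
    """Return the parent row and its children as one contiguous slice."""
    start = bisect_left(sorted_rows, (customer_id,))
    end = start
    while end < len(sorted_rows) and sorted_rows[end][0] == customer_id:
        end += 1
    return sorted_rows[start:end]


print(parent_with_children(rows, "c1"))
```

Because the child key begins with the parent key, fetching a customer with all its orders is one contiguous range read rather than a lookup in a separate table.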

Tracking real-time locations for a massive fleet of vehicles (50k+ writes/sec) with queries to find vehicles within a geographic area.

Cloud Bigtable with a row key prefixed by a GeoHash of the vehicle's location.

Why: Bigtable handles the extreme write throughput. GeoHash encoding converts 2D coordinates into a 1D string where prefixes represent geographic proximity, enabling efficient geospatial range scans.
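
A minimal sketch of the standard GeoHash encoding and a row key built from it. The `geohash#vehicle_id` layout, the default precision, and the function names are assumptions for illustration; the source only specifies that the key is prefixed by a GeoHash.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a coordinate by alternately bisecting longitude and latitude."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # GeoHash starts with a longitude bit
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def vehicle_row_key(lat: float, lon: float, vehicle_id: str,
                    precision: int = 7) -> bytes:
    """Prefix the row key with the GeoHash so nearby vehicles sort together."""
    return f"{geohash(lat, lon, precision)}#{vehicle_id}".encode("utf-8")
```

A shorter hash is always a prefix of a longer one for the same point, so an area query becomes a scan over one or a few key prefixes.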

Storing and analyzing petabyte-scale data (e.g., genomic data, logs) with complex analytical SQL queries.

Store raw data in Cloud Storage and query it directly from BigQuery using external tables, or load into native BigQuery storage.

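Why: BigQuery is a serverless data warehouse built for petabyte-scale analytics. Separating storage from compute lets each scale independently, which keeps large scans fast and cost-effective for OLAP workloads. External tables avoid a load step, while native storage generally gives better query performance.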

A high-availability in-memory cache for complex data structures (hashes, sets) with pub/sub capabilities for cache invalidation.

Memorystore for Redis Standard Tier with read replicas.

Why: Standard Tier provides a 99.9% SLA with automatic failover. Redis supports complex data types and pub/sub, unlike Memcached. Read replicas can scale read throughput.

Designing a multi-tenant SaaS application on Spanner requiring strong data isolation and performance guarantees per tenant.

Use tenant_id as the first component of the primary key for all tables. For stronger isolation, use a database-per-tenant model within a single Spanner instance.

Why: A tenant_id prefix naturally co-locates all of a single tenant's data, optimizing queries and allowing Spanner to split data by tenant. Database-per-tenant provides the strongest logical isolation.

Domain 2: Manage a solution that can span multiple database solutions

A Cloud SQL database is experiencing slow query performance and high CPU usage.

Use Query Insights to identify the most resource-intensive queries, analyze their execution plans, and identify missing indexes or inefficient patterns.

Why: Query Insights is the primary, built-in tool for diagnosing query performance in Cloud SQL. It visualizes query load, identifies wait events, and helps pinpoint the root cause without third-party tools.

An organization needs a single dashboard and alerting policy set for dozens of database instances spread across multiple GCP projects.

In Cloud Monitoring, designate a central scoping project and add every project that contains database instances to its metrics scope.

Why: A metrics scope lets one scoping project chart and alert on metrics from multiple monitored projects, providing a unified view without data duplication or complex configuration.

Need to provision and manage Cloud SQL instances across dev, staging, and prod environments consistently and with version control.

Use Terraform with the Google Cloud provider. Define a Cloud SQL module and use separate `.tfvars` files for each environment.

Why: Terraform provides Infrastructure as Code (IaC), enabling repeatable, auditable, and version-controlled deployments. This avoids manual configuration errors and ensures consistency across environments.

A contractor needs temporary elevated database access that must be revoked automatically after 4 hours.

Grant the necessary IAM role with an IAM Condition that uses a time-based expression (`request.time < timestamp(...)`).

Why: IAM Conditions provide a native, secure way to grant time-limited access without manual cleanup, which is error-prone. Access is automatically denied after the timestamp expires.
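
A minimal sketch of building the condition for a 4-hour grant. The helper name and the title string are hypothetical; the `title`, `description`, and `expression` fields follow the condition object used in IAM policy bindings.

```python
from datetime import datetime, timedelta, timezone


def time_bound_condition(start: datetime, hours: int) -> dict:
    """Build an IAM Condition that stops matching `hours` after `start`."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    expiry = (start + timedelta(hours=hours)).astimezone(timezone.utc)
    stamp = expiry.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "title": "temporary-db-access",  # hypothetical title
        "description": f"Expires {stamp}",
        "expression": f'request.time < timestamp("{stamp}")',
    }


granted_at = datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)
print(time_bound_condition(granted_at, 4)["expression"])
```

The resulting dictionary would be attached as the `condition` of the role binding. The binding itself stays in the policy after expiry, but it no longer grants access.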

A security policy requires that all database disk encryption uses customer-managed keys (CMEK) with controlled rotation.

Configure the Cloud SQL or AlloyDB instance to use a key from Cloud KMS. Configure automatic rotation on the KMS key.

Why: CMEK provides control and auditability over the keys used for at-rest encryption. Cloud KMS handles key lifecycle management, including automated rotation, seamlessly.

Compliance requires capturing all SQL queries executed on a Cloud SQL for PostgreSQL instance, with logs retained for 7 years.

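Enable the `pgaudit` extension on the instance (set the `cloudsql.enable_pgaudit` flag, create the extension, and set `pgaudit.log` to the statement classes to capture). Enable Data Access audit logs for Cloud SQL. Create a log sink from Cloud Logging to BigQuery for long-term retention and analysis.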

Why: pgaudit provides detailed SQL-level auditing. Sinking logs to BigQuery is the standard, cost-effective pattern for long-term, searchable log retention beyond Cloud Logging's default.

Data analysts need to run heavy analytical queries on production Cloud SQL data without impacting the transactional workload.

Create a read replica and direct all analytical queries to it. For more complex analytics, use BigQuery federated queries against the read replica.

Why: A read replica completely isolates analytical read traffic from the primary instance, protecting OLTP performance. Federation allows using BigQuery's powerful engine without a separate ETL pipeline.

A Bigtable cluster shows uneven CPU load, with some nodes heavily utilized while others are idle, indicating a performance bottleneck.

Use the Key Visualizer tool in the Cloud Console to analyze the access patterns and identify the specific row key ranges that are being accessed too frequently (hotspotting).

Why: Key Visualizer is the purpose-built diagnostic tool for Bigtable performance issues. It provides a heat map of key access, making it easy to identify hotspots that need to be addressed via schema redesign.

Need to replicate changes from a Cloud SQL OLTP database to a BigQuery data warehouse in near real-time.

Use Datastream to configure a Change Data Capture (CDC) stream from the source Cloud SQL instance directly to BigQuery.

Why: Datastream is a managed, low-latency CDC service that reads database logs, minimizing impact on the source. It handles schema drift and delivers changes reliably to BigQuery.

A Cloud Run application is exhausting database connections due to rapid scaling during traffic spikes.

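Use a bounded connection pool inside each container and cap the service's maximum instance count so that instances × pool size stays below the database's connection limit. If that is still not enough, place a dedicated pooler such as PgBouncer in front of the database. The Cloud SQL Auth Proxy secures and authorizes connections but does not pool them.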

Why: Serverless platforms can scale to thousands of instances, overwhelming database connection limits. A connection pooler multiplexes these numerous, ephemeral application connections onto a small, stable set of database connections.
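
The sizing arithmetic behind this pattern can be sketched as follows. The worst-case connection count is the number of application instances multiplied by each instance's pool size, and that product has to stay under the database's connection limit. The function name and all numbers are illustrative assumptions.

```python
def max_pool_size_per_instance(db_max_connections: int,
                               reserved_connections: int,
                               max_app_instances: int) -> int:
    """Largest per-instance pool that cannot exceed the database limit.

    reserved_connections covers admin sessions, replication, and other
    clients that share the same database.
    """
    usable = db_max_connections - reserved_connections
    if usable <= 0 or max_app_instances <= 0:
        raise ValueError("no connection budget available")
    size = usable // max_app_instances
    if size < 1:
        # More instances than connections: an external pooler is needed.
        raise ValueError("instance count exceeds the connection budget")
    return size


print(max_pool_size_per_instance(500, 20, 40))  # 480 usable // 40 instances
```

When the function raises for the expected instance count, either the maximum instance setting has to come down or a shared pooler has to sit between the application and the database.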

Domain 3: Migrate data solutions

Migrating a large (5TB) on-premises MySQL database to Cloud SQL for MySQL with a maximum downtime of 30 minutes.

Use the Database Migration Service (DMS) to configure a continuous replication job. DMS performs an initial load and then streams changes until cutover.

Why: DMS is the managed solution for minimal-downtime migrations. Continuous replication means the only downtime is the time it takes to stop writes, wait for the final sync, and point the application to the new database.

Migrating an Oracle database to AlloyDB for PostgreSQL, including complex PL/SQL stored procedures.

Use DMS for data migration. Use schema conversion tools (like Ora2Pg or DMS Schema Conversion) to convert schemas and PL/SQL to PL/pgSQL, followed by manual review and testing.

Why: Heterogeneous migrations require both data migration (handled by DMS) and schema/code conversion. Automated tools convert the bulk of the schema and code (roughly 80% is a common rule of thumb), but Oracle-specific features still need manual rewriting and testing.

Need to verify data integrity and completeness after migrating a database from an on-premises datacenter to Google Cloud.

Use the open-source Data Validation Tool (DVT). Configure it to compare row counts, column-level aggregations (min, max, sum), and row-level hashes between source and target.

Why: DVT provides a comprehensive, scalable, and customizable framework for data validation that goes beyond simple row counts, catching subtle data corruption or transformation issues.
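
The three levels of comparison can be sketched in plain Python. This is not DVT's API or implementation, only an illustration of the checks it is described as performing. Rows are assumed to be tuples already fetched from source and target, and all function names are hypothetical.

```python
import hashlib


def row_hash(row: tuple) -> str:
    """Hash a row's values; a unit separator avoids ambiguous joins."""
    joined = "\x1f".join(repr(v) for v in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def _aggregates(values):
    return (min(values), max(values), sum(values)) if values else None


def validate(source_rows, target_rows, numeric_col: int) -> dict:
    """Compare row counts, min/max/sum of one column, and row hashes."""
    src_vals = [r[numeric_col] for r in source_rows]
    tgt_vals = [r[numeric_col] for r in target_rows]
    return {
        "row_count_match": len(source_rows) == len(target_rows),
        "aggregates_match": _aggregates(src_vals) == _aggregates(tgt_vals),
        # Sort the hashes so row order does not affect the comparison.
        "row_hashes_match": sorted(map(row_hash, source_rows))
        == sorted(map(row_hash, target_rows)),
    }


source = [(1, "a", 10), (2, "b", 20)]
corrupted = [(1, "a", 10), (2, "B", 20)]  # one character changed
print(validate(source, corrupted, numeric_col=2))
```

In the example, the counts and aggregates still match while the row hashes do not, which is the kind of subtle difference a count-only check would miss.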

Migrating a sharded MySQL application to a single, globally consistent database.

Use multiple parallel Dataflow jobs to migrate each shard concurrently into a single Cloud Spanner database. Redesign the schema to eliminate the need for application-level sharding.

Why: Spanner is designed to replace complex sharded architectures. A parallel migration approach with Dataflow is the most time-efficient way to consolidate large, sharded datasets into Spanner.

Migrating a SQL Server database using Windows Authentication (Active Directory) to Cloud SQL for PostgreSQL.

Integrate Cloud SQL with Cloud Identity using IAM database authentication. Sync AD groups to Google Groups via GCDS, and map database roles to these groups.

Why: This approach replicates the centralized, group-based access control model of AD in a cloud-native way, avoiding manual user/password management and leveraging existing identity structures.

Migrating an application from Amazon DynamoDB to Cloud Bigtable.

Map the DynamoDB composite primary key (partition key + sort key) to a concatenated Bigtable row key, separated by a delimiter (e.g., `partitionKey#sortKey`).

Why: This row key design preserves the query capabilities of the DynamoDB composite key, allowing for efficient lookups by partition key prefix and range scans on the sort key portion.
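
A minimal sketch of the key mapping, assuming string keys and a delimiter that never appears in the partition key. The function names are hypothetical.

```python
def to_row_key(partition_key: str, sort_key: str, delimiter: str = "#") -> bytes:
    """Concatenate a DynamoDB partition key and sort key into one row key."""
    if delimiter in partition_key:
        # Otherwise one partition's prefix could match another partition.
        raise ValueError("partition key must not contain the delimiter")
    return f"{partition_key}{delimiter}{sort_key}".encode("utf-8")


def partition_prefix(partition_key: str, delimiter: str = "#") -> bytes:
    """Prefix that selects every row of one partition in a prefix scan."""
    return f"{partition_key}{delimiter}".encode("utf-8")


key = to_row_key("user1", "2024-01-01")
print(key.startswith(partition_prefix("user1")))
```

Including the delimiter in the scan prefix keeps `user1` from matching `user10`. Numeric sort keys would need zero-padding so that byte order matches numeric order.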

Domain 4: Deploy and maintain database solutions for continuous operation

An application connecting to a high-availability Cloud SQL instance must survive a zonal failover without manual intervention.

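Connect through the Cloud SQL Auth Proxy (or a Cloud SQL language connector) using the instance connection name (`project:region:instance`), and give the application connection retry logic with exponential backoff.

Why: In an HA configuration the instance keeps the same name and IP address after a failover, but existing connections are dropped while the standby takes over. An application that re-establishes connections automatically resumes without manual intervention. Addressing the instance by connection name also avoids hard-coding an IP address in application configuration.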
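
Whichever endpoint the application uses, connections that are open at the moment of failover are dropped, so the client needs to retry. A minimal sketch of retry with exponential backoff and jitter follows. `ConnectionError` stands in for whatever exception the real database driver raises, and the function name and default delays are illustrative assumptions.

```python
import random
import time


def call_with_retry(operation, max_attempts: int = 5,
                    base_delay: float = 0.5, max_delay: float = 8.0,
                    sleep=time.sleep):
    """Run `operation`, retrying on ConnectionError with capped backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Double the ceiling each attempt, then pick a random delay
            # below it so many clients do not reconnect in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter is injectable so the helper can be tested without waiting. In production the default `time.sleep` applies.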

A global Spanner application has users in North America and Asia. Writes originate mostly in NA, but Asian users need low-latency reads.

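Use a multi-region configuration that spans both continents (e.g., `nam-eur-asia1`) with the leader region in North America. Reads in Asia are served by local read-only replicas, ideally as stale (bounded-staleness) reads where the application tolerates them.

Why: Writes in Spanner are coordinated by the leader region, so placing it near the write source minimizes write latency. Read-only replicas in other regions serve nearby users. Strong reads may still need a round trip to the leader to confirm freshness, whereas stale reads can be answered entirely by the local replica.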

An AlloyDB-backed application has a 10:1 read-to-write ratio and needs to scale to handle high read traffic while maintaining 99.99% availability.

Configure the primary instance with high availability and add multiple read pool instances. Direct read traffic to the read pool.

Why: AlloyDB high availability provides the 99.99% SLA. Read pool instances are designed for horizontal read scaling, offloading traffic from the primary instance to dedicated read-optimized nodes.

A latency-sensitive Cloud SQL instance with SSD storage has insufficient I/O performance.

Increase the provisioned storage size of the instance.

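Why: In Cloud SQL, read and write IOPS scale with the amount of provisioned persistent disk storage, up to a ceiling set by the machine type. Increasing the disk size is the direct way to raise available IOPS. If the instance has already reached the machine-type ceiling, it also needs more vCPUs. Storage cannot be reduced again after it is increased.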

Need to deploy a risky schema change to a critical Cloud SQL database with a rapid rollback capability.

Create a read replica of the production (blue) instance. Promote the replica to a standalone instance (green), apply and validate the schema changes. Then, redirect application traffic to the green instance. Keep blue running for rollback.

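Why: This pattern allows full testing of changes on a production-scale copy of the data without impacting the live system. Traffic can be switched quickly, and rollback means pointing traffic back to the blue instance. Promotion stops replication, so writes made to blue between promotion and cutover must be frozen or replayed onto green, and writes made to green after cutover are not present on blue if you roll back.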

Need to test a database disaster recovery plan quarterly without affecting the production environment.

Create a temporary test instance by restoring from a recent production backup. Execute the documented DR procedures against this test instance, including simulated failover and application reconnection tests.

Why: Testing on a restored backup provides a realistic environment to validate RTO/RPO and recovery procedures without the risk of causing a production outage.

A Cloud Run service needs to connect to a Cloud SQL instance securely without traffic traversing the public internet.

Configure Cloud SQL with a private IP. Create a Serverless VPC Access connector in the same VPC and configure the Cloud Run service to route traffic through it.

Why: This is the standard, secure pattern for connecting serverless compute to VPC-native resources. The connector bridges the serverless environment and your VPC, keeping all traffic on Google's private network.

Adding a new, non-nullable column to a massive, actively-written Cloud Spanner table without downtime.

1. Add the column as nullable. 2. Update application code to write to the new column. 3. Backfill existing rows in batches using Dataflow. 4. After backfill, alter the column to be NOT NULL.

Why: This multi-step process is the standard online schema change pattern for large tables. It avoids locking the table for a long duration or causing a massive, performance-impacting backfill operation in a single transaction.
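
The four steps above can be sketched as an ordered plan plus a batching helper for the backfill. The DDL strings follow GoogleSQL-dialect syntax as an assumption and should be verified before use. The table, column, and function names are hypothetical, and the actual backfill (Dataflow in the source) is not shown.

```python
def not_null_column_plan(table: str, column: str, column_type: str) -> list:
    """Return the four phases as (description, ddl_or_None) pairs."""
    return [
        ("add nullable column",
         f"ALTER TABLE {table} ADD COLUMN {column} {column_type}"),
        ("deploy application code that writes the column", None),
        ("backfill existing rows in batches", None),
        ("enforce NOT NULL",
         f"ALTER TABLE {table} ALTER COLUMN {column} {column_type} NOT NULL"),
    ]


def batches(keys: list, size: int):
    """Yield fixed-size chunks of primary keys, one per backfill transaction."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


for description, ddl in not_null_column_plan("Orders", "Region", "STRING(32)"):
    print(description, "->", ddl)
```

Keeping each backfill batch small bounds the size of every transaction, which is the point of step 3.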