Google Cloud Professional Data Engineer
225 practice questions
Last reviewed: April 2026
The Google Cloud Professional Data Engineer (PDE) validates the ability to design, build, secure, and operationalize data-processing systems on Google Cloud. The exam is one of the more popular GCP Professional credentials and consistently ranks among the highest-paying single data certifications in the market. Expect deep coverage of BigQuery (partitioning, clustering, materialized views, BI Engine, BigLake, Omni), Dataflow (Apache Beam batch and streaming, windowing, watermarks), Pub/Sub, Dataproc, Cloud Composer (managed Airflow), Dataform, Dataplex, Datastream, and Vertex AI integration for ML pipelines. Question style is scenario-heavy and rewards candidates who think in terms of cost, latency, freshness, and schema-evolution tradeoffs simultaneously.
Source-system analysis, data-warehouse vs. data-lake vs. lakehouse design, schema modeling for BigQuery (denormalized, nested, ARRAY/STRUCT), choosing the right storage (BigQuery vs. Bigtable vs. Spanner vs. Firestore vs. Cloud SQL). 22%.
Largest domain at 25%. Pub/Sub patterns, Dataflow batch and streaming with Apache Beam (windowing, triggers, watermarks, exactly-once semantics), Dataproc Spark jobs, Datastream CDC, Storage Transfer Service.
BigQuery partitioning and clustering, materialized views, BI Engine, BigLake external tables, table-level snapshots and time travel, Bigtable schema design, Cloud Storage class transitions. 20%.
BigQuery SQL (window functions, ARRAY/STRUCT manipulation, search indexes), BigQuery ML, Looker semantic model basics, federated queries to Cloud SQL / Spanner / Cloud Storage, Vertex AI integration. 15%.
Cloud Composer DAGs, Dataform workflows, BigQuery scheduled queries, slot reservations and on-demand pricing, monitoring with Cloud Monitoring, IAM at dataset / table / column / row level. 18%.
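As a rough planning aid, the five weights above can be turned into approximate question counts. This is a minimal sketch: the domain names are an assumed mapping taken from how the domains are referred to elsewhere on this page, and the 50-question total is an assumption, since Google does not publish a per-domain question breakdown.

```python
# Weights as listed above. The domain names are an assumed mapping,
# not an official table.
DOMAIN_WEIGHTS = {
    "Designing data processing systems": 22,
    "Ingesting and processing the data": 25,
    "Storing the data": 20,
    "Preparing and using data for analysis": 15,
    "Maintaining and automating data workloads": 18,
}


def expected_questions(total_questions: int) -> dict[str, float]:
    """Scale each percentage weight to an approximate question count."""
    return {
        name: total_questions * pct / 100
        for name, pct in DOMAIN_WEIGHTS.items()
    }


if __name__ == "__main__":
    # Sanity check: the listed weights should cover the whole exam.
    assert sum(DOMAIN_WEIGHTS.values()) == 100
    for name, count in expected_questions(50).items():
        print(f"{name}: ~{count:.1f} questions")
```

Treat the output as a study-time allocation hint only; actual exam forms will not follow the weights exactly.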
Services you'll encounter on the exam and why each one matters.
Serverless columnar data warehouse with separated storage/compute, on-demand and reservation slots, BigQuery ML for in-warehouse modelling, and materialized views for incremental aggregates.
Why it's on the exam: BigQuery is the headline analytics surface across all five PDE domains: partitioning, clustering, slot reservations, and query optimization dominate the exam.
Object storage that anchors the GCP data lake: landing/curated/consumption zones, multi-region and dual-region buckets, lifecycle policies, and the source for every downstream analytics service.
Why it's on the exam: Every PDE storage and ingestion scenario assumes Cloud Storage as the substrate; storage classes, retention policies, and signed-URL access patterns drive Storing the Data questions.
Fully managed Apache Beam runner for unified streaming and batch pipelines, with autoscaling workers, Streaming Engine, and Flex Templates for repeatable deployments.
Why it's on the exam: Dataflow is the canonical answer in Ingesting and Processing: questions on windowing, triggers, exactly-once semantics, and streaming vs. batch tradeoffs all land here.
Managed Spark, Hadoop, Hive, Presto, and Flink clusters with ephemeral autoscaling, Dataproc Serverless for batch Spark, and Spark-on-GKE for shared infra.
Why it's on the exam: PDE expects Dataproc as the migration target for existing Spark/Hadoop workloads; ephemeral vs. long-running clusters, autoscaling policies, and Dataproc-vs-Dataflow choices appear in Designing data processing.
Globally distributed messaging service for asynchronous ingest, with at-least-once delivery, ordering keys, dead-letter topics, and Pub/Sub Lite for cost-optimized regional streams.
Why it's on the exam: Pub/Sub is the default streaming ingestion surface in Ingesting and Processing; delivery semantics, subscription types, and backlog behavior are recurring exam topics.
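At-least-once delivery means a subscriber may see the same message more than once, so exam scenarios often hinge on making the consumer idempotent. The sketch below simulates that idea in plain Python without the Pub/Sub client library; `IdempotentSink`, the message IDs, and the amounts are all hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentSink:
    """Toy sink that applies each message ID at most once."""

    seen_ids: set = field(default_factory=set)
    total: int = 0

    def handle(self, message_id: str, amount: int) -> bool:
        """Apply the message unless its ID was already processed.

        Returns True if the message was applied, False if it was
        recognised as a duplicate and skipped.
        """
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.total += amount
        return True


# Simulated at-least-once delivery: message "m2" arrives twice.
deliveries = [("m1", 10), ("m2", 5), ("m2", 5), ("m3", 1)]
sink = IdempotentSink()
applied = [sink.handle(mid, amt) for mid, amt in deliveries]
```

In a real pipeline the set of seen IDs would need to be durable and bounded (for example, keyed state with an expiry); an in-memory set is used here only to keep the sketch self-contained.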
Managed Apache Airflow service for orchestrating cross-service DAGs spanning BigQuery, Dataflow, Dataproc, and external systems, with Composer 2 running on GKE Autopilot.
Why it's on the exam: Maintaining and Automating workloads tests DAG patterns, retries, and SLA monitoring; Composer is the named orchestrator on PDE, versus Workflows for simpler chains.
Globally distributed relational database with strong consistency, horizontal scale, and SQL, used as the operational system of record feeding analytics pipelines.
Why it's on the exam: PDE storage questions distinguish OLTP (Spanner) from OLAP (BigQuery) and ask when Spanner federated queries from BigQuery beat a CDC pipeline.
Wide-column NoSQL service with single-digit-millisecond reads at petabyte scale, optimised for time-series and IoT workloads with HBase API compatibility.
Why it's on the exam: Designing data processing tests row-key design, hotspotting, and SSD-vs-HDD tradeoffs; Bigtable is the GCP answer whenever low-latency analytical reads are required.
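Row-key hotspotting is easier to see with a small example. Bigtable stores rows sorted by key, so a key that leads with a timestamp sends every new write to the same end of the key range. The following sketch contrasts that with a device-first key using a reversed timestamp; the key layout, the `MAX_TS` bound, and the sensor names are illustrative assumptions, not a prescribed schema.

```python
MAX_TS = 10**13  # hypothetical upper bound for millisecond timestamps


def hot_key(device_id: str, ts_ms: int) -> str:
    """Timestamp-first key: new writes always sort at the tail of the table."""
    return f"{ts_ms:013d}#{device_id}"


def spread_key(device_id: str, ts_ms: int) -> str:
    """Device-first key with a reversed timestamp.

    Writes are spread across device prefixes, and the newest reading
    for a device sorts first within that device's prefix.
    """
    return f"{device_id}#{MAX_TS - ts_ms:013d}"


# Three readings in arrival order (hypothetical data).
readings = [
    ("sensor-a", 1_700_000_000_000),
    ("sensor-b", 1_700_000_000_001),
    ("sensor-a", 1_700_000_000_002),
]

# Sorting mimics Bigtable's lexicographic row ordering.
hot = sorted(hot_key(d, t) for d, t in readings)
spread = sorted(spread_key(d, t) for d, t in readings)
```

With `hot_key`, sorted order equals arrival order, so all writes pile onto the last tablet. With `spread_key`, rows group by device and a prefix scan on `sensor-a#` returns the most recent reading first.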
Managed PostgreSQL, MySQL, and SQL Server with automated backups, read replicas, and high availability; the relational source for many ingestion pipelines.
Why it's on the exam: Cloud SQL surfaces in Ingesting and Storing as the upstream OLTP database whose changes feed BigQuery via Datastream or scheduled batch exports.
Serverless document database with real-time listeners, ACID transactions, and global replication in Enterprise mode; it backs application-tier event capture.
Why it's on the exam: PDE storage scenarios pick Firestore for low-latency app-tier writes that subsequently flow into BigQuery through Eventarc or Pub/Sub.
Unified storage engine that exposes Cloud Storage and external (S3, ADLS) data as governed BigQuery tables with fine-grained access control and Apache Iceberg support.
Why it's on the exam: BigLake is the lakehouse answer in Storing the Data: it distinguishes external table federation from native BigQuery storage and enables multi-cloud analytics.
Serverless change-data-capture service that replicates MySQL, PostgreSQL, Oracle, and SQL Server into BigQuery, Cloud Storage, or Cloud SQL with low latency.
Why it's on the exam: Ingesting and Processing tests CDC patterns; Datastream is the GCP-native answer for log-based replication into the warehouse without custom Debezium plumbing.
Managed CDAP-based visual ETL platform with 150+ connectors and a code-free pipeline designer that compiles to Dataproc under the hood.
Why it's on the exam: PDE expects Data Fusion when a question favors low-code visual ETL with broad connector coverage over hand-written Beam in Dataflow.
BigQuery-native SQL workflow service with version control, dependency graphs, assertions, and incremental table materializations, analogous to dbt inside GCP.
Why it's on the exam: Maintaining and Automating tests in-warehouse transformation patterns; Dataform is the canonical SQL-orchestration answer for BigQuery-centric ELT.
Visual data wrangling service for exploring, cleaning, and transforming structured/semi-structured data with intelligent suggestions and recipe export.
Why it's on the exam: Preparing and Using Data for Analysis names Dataprep as the no-code path for analyst-driven data shaping before BigQuery consumption.
Managed service for discovering, classifying, and de-identifying PII across BigQuery, Cloud Storage, and Datastore using inspection templates and transformation jobs.
Why it's on the exam: PDE governance scenarios cite Sensitive Data Protection for masking, tokenizing, or redacting PII before data lands in shared analytics layers.
Project- and resource-scoped permissions for every data service, including BigQuery row-level, column-level, and policy-tag-based fine-grained access.
Why it's on the exam: PDE governance questions on least-privilege access to BigQuery datasets, Cloud Storage buckets, and Pub/Sub topics all return to IAM bindings and conditions.
Managed cryptographic keys with customer-managed encryption keys (CMEK) for BigQuery, Cloud Storage, Pub/Sub, Dataflow, and Spanner, plus Cloud HSM and external-key options.
Why it's on the exam: Encryption-at-rest with CMEK is the canonical PDE answer for protecting regulated analytics data, key rotation, and tenant-isolated multi-team warehouses.
Unified data fabric for cataloging, classifying, securing, and monitoring data across BigQuery, Cloud Storage, and external sources, with built-in data lineage and quality.
Why it's on the exam: PDE governance and data-quality scenarios name Dataplex as the GCP-native catalog/lineage layer for lake + warehouse, replacing standalone Data Catalog.
Unified observability for pipeline runs, BigQuery job metrics, Dataflow worker autoscaling, Pub/Sub backlog, and SLO-based alerting via Cloud Monitoring policies.
Why it's on the exam: Maintaining and Automating workloads expects Cloud Logging + Cloud Monitoring for job-failure alerts, slot utilisation dashboards, and audit-log retention.
$140k-$195k-$290k USD annual
Range reflects US-based senior data engineers where GCP is the primary platform. FAANG L5 data engineer TC clears $300k. PDE is consistently cited as one of the highest-paying single data certifications by job-posting salary band; combined with strong Apache Beam / Dataflow experience it commands a premium at GCP shops. Pure analyst-engineer roles trend lower.
Source: levels.fyi 2025-2026 (Google L4-L5 data engineers, FAANG and unicorn senior data engineers), U.S. BLS OEWS May 2024 (15-2051 data scientists, 15-1252 software developers). Figures are approximate; actual compensation depends on role, region, and experience.
PDE is the most-requested GCP data credential and one of the strongest signals for senior data-engineer roles at GCP-heavy companies. Heavy demand at digital-native GCP shops (Spotify, Snap, PayPal, Wayfair, several major retailers and ad-tech companies), BigQuery-centric analytics organizations, and Google Cloud partners with data practices. The cert is also valued at Google itself for customer-engineering data specialists. PDE pairs naturally with the Professional ML Engineer (PMLE) for an end-to-end "data + ML" profile, and with Cloud Architect (PCA) for a broader senior-engineering profile. Holders consistently report strong recruiter response.
There are no formal prerequisites. Google recommends three or more years of industry experience, including one or more years designing and managing solutions on Google Cloud. In practice, PDE is not a credible first GCP cert for someone new to data; successful candidates have shipped non-trivial pipelines and have working knowledge of SQL and Python plus at least conceptual familiarity with Apache Beam.
The Associate Cloud Engineer (ACE) is a common stepping stone, but the Associate Data Practitioner (ADP) is a more direct on-ramp for the data-specific content. Strong SQL fluency (window functions, CTEs, ARRAY/STRUCT manipulation), comfort with at least one programming language for Beam pipelines (Python or Java), and familiarity with streaming concepts (windowing, watermarks, exactly-once delivery) are effectively required. The official Data Engineer Learning Path on Google Cloud Skills Boost (around 50-80 hours of labs) is a good baseline.
PDE is rated professional and is consistently hard; many candidates rate it the second-hardest GCP cert after PCA / PCNE, primarily because of the streaming and Dataflow / Apache Beam content. Plan on 100-150 hours of study over 10-14 weeks if PDE is your first GCP professional cert, or 50-80 hours over 5-8 weeks if you already hold ACE / ADP plus production data-engineering experience. The exam is 50-60 multiple-choice / multiple-select questions in 120 minutes, delivered through Pearson VUE (Google migrated from Kryterion / Webassessor in early 2026; no exams Feb 23 through Mar 1 2026; first Pearson delivery March 2 2026).
The most common stumbling block is Dataflow streaming: windowing strategies (fixed, sliding, session), watermarks, late data, and exactly-once semantics account for a disproportionate share of failed attempts. The second stumbling block is choosing between BigQuery, Bigtable, Spanner, and Cloud SQL for storage scenarios where multiple options are technically viable. Google does not publish numeric scores, only pass/fail. The credential is valid for two years and recertification requires re-passing the current exam.
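Because the three windowing strategies cause so much trouble, it can help to work through how each one assigns event timestamps to windows. The functions below are a plain-Python sketch of the assignment logic only, using integer event-time seconds; they are not the Apache Beam API (Beam provides its own `FixedWindows`, `SlidingWindows`, and `Sessions` window functions), and the function names and sample values here are hypothetical.

```python
def fixed_window(ts: int, size: int) -> tuple[int, int]:
    """Return the [start, end) fixed window containing ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts: int, size: int, period: int) -> list[tuple[int, int]]:
    """Return every [start, end) sliding window containing ts.

    Windows start at multiples of `period` and each spans `size` seconds,
    so one element can belong to several overlapping windows.
    """
    last_start = ts - ts % period
    # A window contains ts when start <= ts < start + size,
    # i.e. start > ts - size.
    starts = range(last_start, ts - size, -period)
    return sorted((s, s + size) for s in starts)


def session_windows(timestamps: list[int], gap: int) -> list[tuple[int, int]]:
    """Group timestamps into sessions separated by at least `gap` idle seconds.

    Each element opens a proto-window [ts, ts + gap); overlapping
    proto-windows are merged into one session.
    """
    sessions: list[tuple[int, int]] = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            # Element falls inside the current session: extend its end.
            sessions[-1] = (sessions[-1][0], ts + gap)
        else:
            sessions.append((ts, ts + gap))
    return sessions
```

For example, an event at second 65 falls into the single fixed window (60, 120) with a 60-second size, but into two sliding windows, (30, 90) and (60, 120), when the size is 60 and the period is 30. Watermarks, triggers, and late-data handling are separate concerns that this sketch does not model.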
Current exam guide refreshed in early 2023 to add BigLake, BigQuery Omni, Dataform, Dataplex, and Datastream coverage. Expanded ML-pipeline integration with Vertex AI.
Major refresh that re-balanced the storage and processing domains and added Pub/Sub Lite and Dataflow Prime coverage.
Original general availability; one of the first three Google Cloud Professional credentials.
PDE (Google Cloud Professional Data Engineer) is a challenging, scenario-heavy Professional-level exam that requires deep hands-on experience and the ability to make architectural trade-off decisions. Most candidates need 150-300 hours of study spread over 3-6 months for professional and expert-level exams. These exams typically expect prior associate-level proficiency. Most candidates who score consistently above the passing threshold on practice exams pass on their first attempt.
Most candidates need 150-300 hours of study spread over 3-6 months for professional and expert-level exams. These exams typically expect prior associate-level proficiency. Time-to-pass varies widely by prior experience. Engineers with hands-on production experience in the underlying technology typically need less; candidates new to the platform should plan toward the upper end of that range.
PDE is a recognized credential in the GCP ecosystem and signals validated knowledge to employers, recruiters, and clients. Whether it is worth the time and fee for you depends on your role and goals; it tends to pay off most for cloud engineers, architects, and consultants who work with GCP day-to-day or want to move into roles that do.
Google does not publish a passing score for PDE; results are reported as pass/fail only. The exam contains 50 to 60 questions and lasts 2 hours.
The PDE exam fee is $200 USD. Fees are set by GCP and may vary by region; always confirm the current price on the official GCP certification page before booking.
Google Cloud Professional certifications are valid for 2 years. Recertify by re-passing the current version of the exam.
Yes. You can take the exam online (proctored via the provider's secure browser, available 24/7 in most regions) or at an in-person Pearson VUE test center during business hours. Both formats use the same questions, time limit, and passing score.
CertLabPro provides 15 study modes across the practice question bank for PDE. The exam-simulation mode mirrors the real exam: 50 questions in 2 hours. The real exam's passing threshold is not published. Browse mode lets you read every Q&A statically.