Last reviewed: May 2026
Build the GCP services on the ADP exam with plain Terraform, one block at a time, each tied back to an exam domain. The same code works on OpenTofu.
By the end of this lab you'll have provisioned, with plain Terraform, the smallest realistic ADP data substrate — a Cloud Storage landing bucket, a BigQuery dataset with one table partitioned by ingestion date, and a BigQuery scheduled query that runs every hour, reading from a public dataset and writing into the table. Four blocks; the GCP analytics on-ramp.
Drop the snippets into a single main.tf, run terraform init, then terraform apply step-by-step.
Terraform >= 1.5 or OpenTofu >= 1.6.
A GCP project; replace your-project-id in the provider block.
All free at lab scope:
~$0/month at lab volume. Real BigQuery workloads bill on bytes scanned — partition + cluster aggressively and SELECT only what you need.
Enable Cloud Storage, BigQuery, and BigQuery Data Transfer Service (which powers scheduled queries).
terraform {
  required_version = ">= 1.5"
  required_providers {
    google = { source = "hashicorp/google", version = "~> 6.0" }
    # Declared explicitly because the bucket step uses random_id.
    random = { source = "hashicorp/random", version = "~> 3.0" }
  }
}

provider "google" {
  project = "your-project-id" # REPLACE
  region  = "us-central1"
}

# Labels shared by every labelled resource in this lab.
locals {
  labels = {
    project    = "certlabpro-adp"
    managed_by = "terraform"
  }
}

# Enable the three APIs; leave them enabled on destroy.
resource "google_project_service" "storage" {
  service            = "storage.googleapis.com"
  disable_on_destroy = false
}

resource "google_project_service" "bigquery" {
  service            = "bigquery.googleapis.com"
  disable_on_destroy = false
}

resource "google_project_service" "bqdts" {
  service            = "bigquerydatatransfer.googleapis.com"
  disable_on_destroy = false
}
Every ADP-pattern data pipeline starts with a landing bucket: raw files (CSV / JSON / Parquet / Avro) drop here, and downstream jobs read from it. The bucket is the boundary between outside the lake and inside the lake. The ADP exam tests the storage class pick here: Standard for the landing tier (frequent reads in the first 30 days), with a lifecycle rule transitioning to Coldline after 90 days.
Uniform bucket-level access is on (the ADP-recommended security default).
# Random suffix keeps the bucket name globally unique.
resource "random_id" "suffix" {
  byte_length = 4
}

resource "google_storage_bucket" "landing" {
  name                        = "certlabpro-adp-landing-${random_id.suffix.hex}"
  location                    = "US"
  uniform_bucket_level_access = true
  force_destroy               = true # lab-only

  # Move objects to Coldline once they are 90 days old.
  lifecycle_rule {
    condition {
      age = 90
    }
    action {
      type          = "SetStorageClass"
      storage_class = "COLDLINE"
    }
  }

  labels     = local.labels
  depends_on = [google_project_service.storage]
}
BigQuery is GCP's serverless data warehouse: pay-per-byte-scanned on query, pay-per-byte-stored on data. The ADP exam tests partitioning and clustering as the cost-control levers: partitioned tables let queries skip irrelevant data; clustered tables put related rows together on storage.
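To make the two levers concrete, here is a minimal sketch of a table that is partitioned on a column and clustered. It is not part of the lab: the resource name and table_id events_by_time are illustrative assumptions, and it borrows the analytics dataset and the event schema that this step defines.

```hcl
# Sketch only: a hypothetical sibling of the lab's events table.
# The name "events_by_time" is an illustrative assumption.
resource "google_bigquery_table" "events_by_time" {
  dataset_id          = google_bigquery_dataset.analytics.dataset_id
  table_id            = "events_by_time"
  deletion_protection = false # lab-only

  # Partition on the event_time column rather than ingestion time, so a
  # filter on event_time lets BigQuery skip whole daily partitions.
  time_partitioning {
    type  = "DAY"
    field = "event_time"
  }

  # Cluster rows within each partition. Column order matters, so list the
  # columns queries filter on most often first (up to four).
  clustering = ["event_type", "user_id"]

  schema = jsonencode([
    { name = "event_id", type = "STRING", mode = "REQUIRED" },
    { name = "event_type", type = "STRING", mode = "REQUIRED" },
    { name = "event_time", type = "TIMESTAMP", mode = "REQUIRED" },
    { name = "user_id", type = "STRING", mode = "NULLABLE" },
  ])
}
```

A query that filters on both event_time and event_type would, under these assumptions, read only the matching partitions and the relevant clustered blocks inside them, which is where the bytes-scanned saving comes from.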
We create:
A dataset, analytics: the BigQuery container (the GCP equivalent of a schema / database). Set delete_contents_on_destroy = true for lab cleanup convenience.
A table, events, with ingestion-time partitioning (_PARTITIONTIME pseudo-column) and 30-day partition expiration. Production tables typically partition by a column (time_partitioning.field) for query selectivity.
resource "google_bigquery_dataset" "analytics" {
  dataset_id                 = "analytics"
  location                   = "US"
  delete_contents_on_destroy = true # lab-only
  labels                     = local.labels
  depends_on                 = [google_project_service.bigquery]
}

resource "google_bigquery_table" "events" {
  dataset_id          = google_bigquery_dataset.analytics.dataset_id
  table_id            = "events"
  deletion_protection = false # lab-only

  # Top-level argument; the copy nested inside time_partitioning is deprecated.
  require_partition_filter = true

  # No field set, so the table is partitioned by ingestion time.
  time_partitioning {
    type          = "DAY"
    expiration_ms = 30 * 24 * 60 * 60 * 1000 # 30 days
  }

  schema = jsonencode([
    { name = "event_id", type = "STRING", mode = "REQUIRED" },
    { name = "event_type", type = "STRING", mode = "REQUIRED" },
    { name = "event_time", type = "TIMESTAMP", mode = "REQUIRED" },
    { name = "user_id", type = "STRING", mode = "NULLABLE" },
    { name = "payload", type = "JSON", mode = "NULLABLE" },
  ])

  labels = local.labels
}
Scheduled queries are the ADP-pattern primitive for ingesting from one BigQuery dataset into another on a fixed cadence. They run on the BigQuery Data Transfer Service infrastructure (separate from bq ad-hoc queries) and bill at the same per-byte-scanned rate.
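We schedule a query that runs every hour, reads from the public bigquery-public-data.samples.shakespeare table, and writes into the events table from Step 3. To keep the lab short, the query is a plain INSERT that generates a fresh event_id with GENERATE_UUID(), so every run appends up to 100 new rows and re-running the same hour does double-insert. The MERGE shape (upsert), keyed on a deterministic ID, is the ADP-canonical answer for idempotent ingestion.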
No service_account_name is set here, so the scheduled query runs with the credentials of the identity that created it (the one that ran terraform apply); production deployments use a dedicated transfer service account with roles/bigquery.dataEditor on the destination.
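The production variant mentioned above could look roughly like the sketch below. Everything in it is an assumption layered on the lab: the account_id adp-transfer and the resource names are hypothetical, the IAM API is assumed to be enabled, and your project may need further grants beyond these.

```hcl
# Sketch only: a dedicated identity for the scheduled query.
# The account_id and all resource names here are illustrative assumptions.
data "google_project" "current" {}

resource "google_service_account" "transfer" {
  account_id   = "adp-transfer"
  display_name = "Scheduled query runner (lab sketch)"
}

# Write access scoped to the destination dataset only.
resource "google_bigquery_dataset_iam_member" "transfer_editor" {
  dataset_id = google_bigquery_dataset.analytics.dataset_id
  role       = "roles/bigquery.dataEditor"
  member     = "serviceAccount:${google_service_account.transfer.email}"
}

# Permission to run query jobs in the project.
resource "google_project_iam_member" "transfer_job_user" {
  project = data.google_project.current.project_id
  role    = "roles/bigquery.jobUser"
  member  = "serviceAccount:${google_service_account.transfer.email}"
}

# Lets the Data Transfer Service agent mint tokens for the account.
resource "google_project_iam_member" "dts_token_creator" {
  project = data.google_project.current.project_id
  role    = "roles/iam.serviceAccountTokenCreator"
  member  = "serviceAccount:service-${data.google_project.current.number}@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com"
}

# Then, inside google_bigquery_data_transfer_config.hourly_load, add:
#   service_account_name = google_service_account.transfer.email
```

Granting dataEditor on the dataset rather than on the project keeps the account's write access limited to the one destination it needs.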
resource "google_bigquery_data_transfer_config" "hourly_load" {
  display_name           = "certlabpro-adp-hourly-load"
  data_source_id         = "scheduled_query"
  location               = "US"
  schedule               = "every 1 hours"
  destination_dataset_id = google_bigquery_dataset.analytics.dataset_id
  params = {
    query = <<-SQL
      INSERT INTO `${google_bigquery_dataset.analytics.dataset_id}.${google_bigquery_table.events.table_id}` (event_id, event_type, event_time, user_id, payload)
      SELECT
        GENERATE_UUID() AS event_id,
        "shakespeare-line" AS event_type,
        CURRENT_TIMESTAMP() AS event_time,
        NULL AS user_id,
        TO_JSON(STRUCT(word, word_count, corpus)) AS payload
      FROM `bigquery-public-data.samples.shakespeare`
      WHERE word_count > 100
      LIMIT 100
    SQL
  }
  depends_on = [
    google_project_service.bqdts,
    google_bigquery_table.events,
  ]
}
terraform destroy tears down everything. The bucket is destroyed even if it still holds objects (lab-only force_destroy). The dataset is destroyed (lab-only delete_contents_on_destroy) and its tables go with it. The scheduled query is deleted and stops running. Project services stay enabled (free).
ADP covers many GCP data surfaces this lab can't fit — Dataflow (covered in [[gcp-pde]] at the Pro tier), Dataproc, Pub/Sub, Cloud Composer (managed Airflow), Dataform (the SQL-transform IDE built on BigQuery), Looker Studio, Vertex AI, Cloud Data Fusion, Database Migration Service, Datastream, the entire BigLake / BigQuery Omni multi-cloud surface, and BigQuery ML (in-database ML training).
We stick to the GCS + BigQuery + scheduled query primitives because they're the foundation every ADP-pattern pipeline starts from. Dataflow / Dataproc streams or batches into GCS or BigQuery. Composer / Workflows orchestrate the scheduled queries above. Looker reads from BigQuery. Get the base right; layer specialty engines on it.
For the service-by-service conceptual coverage, see the Browse, Playbook, and Editorial sections of this cert page.