Hands-on Lab — ADP Google Cloud Associate Data Practitioner

Last reviewed: May 2026

Build the Google Cloud services on the ADP exam with plain Terraform, one block at a time, each tied back to an exam domain. The same code works on OpenTofu.

Overview

By the end of this lab you'll have provisioned, with plain Terraform, the smallest realistic ADP data substrate — a Cloud Storage landing bucket, a BigQuery dataset with one table partitioned by ingestion date, and a BigQuery scheduled query that runs every hour, reading from a public dataset and writing into the table. Four blocks; the GCP analytics on-ramp.

Drop the snippets into a single main.tf, run terraform init, then terraform apply step-by-step.

Prerequisites

Terraform >= 1.5 or OpenTofu >= 1.6.
A GCP project you own (with billing attached).
gcloud CLI installed, with Application Default Credentials (ADC) set up via gcloud auth application-default login.
Replace your-project-id in the provider block.

Cost note

All free at lab scope:

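Cloud Storage (Standard): 5 GB-months free in the us-east1, us-west1, and us-central1 regions. The lab bucket stays empty, so it incurs no storage charge either way.
BigQuery storage: 10 GB/month free.
BigQuery queries: 1 TiB/month free. The lab's scheduled query scans only a few megabytes per run.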
Scheduled queries: no per-query infra fee.

~$0/month at lab volume. Real BigQuery workloads bill on bytes scanned — partition + cluster aggressively and SELECT only what you need.

Steps

1. Provider, project services, naming

Enable Cloud Storage, BigQuery, and BigQuery Data Transfer Service (which powers scheduled queries).

terraform {
  required_version = ">= 1.5"

  required_providers {
    google = { source = "hashicorp/google", version = "~> 6.0" }
    random = { source = "hashicorp/random", version = "~> 3.0" } # used by random_id in Step 2
  }
}

provider "google" {
  project = "your-project-id" # REPLACE
  region  = "us-central1"
}

locals {
  labels = {
    project    = "certlabpro-adp"
    managed_by = "terraform"
  }
}

resource "google_project_service" "storage" {
  service            = "storage.googleapis.com"
  disable_on_destroy = false
}

resource "google_project_service" "bigquery" {
  service            = "bigquery.googleapis.com"
  disable_on_destroy = false
}

resource "google_project_service" "bqdts" {
  service            = "bigquerydatatransfer.googleapis.com"
  disable_on_destroy = false
}

2. Provision a Cloud Storage landing bucket for raw data

Provisions:

Cloud Storage

Every ADP-pattern data pipeline starts with a landing bucket: raw files (CSV / JSON / Parquet / Avro) drop here, and downstream jobs read from it. The bucket is the boundary between outside the lake and inside the lake. The ADP exam tests the storage class pick here: Standard for the landing tier (frequent reads in the first 30 days), with a lifecycle rule transitioning objects to Coldline after 90 days.

Uniform bucket-level access is on (the ADP-recommended security default).
```
resource "random_id" "suffix" {
  byte_length = 4
}

resource "google_storage_bucket" "landing" {
  name                        = "certlabpro-adp-landing-${random_id.suffix.hex}"
  location                    = "US"
  uniform_bucket_level_access = true
  force_destroy               = true # lab-only

  lifecycle_rule {
    condition {
      age = 90
    }
    action {
      type          = "SetStorageClass"
      storage_class = "COLDLINE"
    }
  }

  labels = local.labels

  depends_on = [google_project_service.storage]
}
```
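
To see the landing pattern end to end, a small test object can be written into the bucket from Terraform. This is only a sketch: the resource name sample_events, the raw/events/ prefix, the CSV contents, and the output name are all made up for illustration, and a real pipeline would normally land files from an upstream system rather than from Terraform.

```hcl
# Hypothetical sample file; name, path, and contents are illustrative only.
resource "google_storage_bucket_object" "sample_events" {
  name         = "raw/events/sample.csv"
  bucket       = google_storage_bucket.landing.name
  content_type = "text/csv"

  # Inline content keeps the sketch self-contained; use `source` for a local file.
  content = <<-CSV
    event_id,event_type,event_time
    e-001,signup,2026-05-01T00:00:00Z
  CSV
}

# Prints the gs:// URL of the landing bucket after apply.
output "landing_bucket_url" {
  value = google_storage_bucket.landing.url
}
```

Because the bucket sets force_destroy = true, terraform destroy still removes it even if other objects have been uploaded by hand.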

3. Create a BigQuery dataset + ingestion-time-partitioned table

Provisions:

BigQuery

BigQuery is GCP's serverless data warehouse — pay-per-byte-scanned on query, pay-per-byte-stored on data. ADP exam tests partitioning + clustering as the cost-control levers: partitioned tables let queries skip irrelevant data; clustered tables put related rows together on storage.

We create:

Dataset analytics — the BigQuery container (the GCP equivalent of a schema / database). Set delete_contents_on_destroy = true for lab cleanup convenience.
Table events with ingestion-time partitioning (_PARTITIONTIME pseudo-column) and 30-day partition expiration. Production tables typically partition by a column (time_partitioning.field) for query selectivity.

resource "google_bigquery_dataset" "analytics" {
  dataset_id                  = "analytics"
  location                    = "US"
  delete_contents_on_destroy  = true # lab-only

  labels = local.labels

  depends_on = [google_project_service.bigquery]
}

resource "google_bigquery_table" "events" {
  dataset_id          = google_bigquery_dataset.analytics.dataset_id
  table_id            = "events"
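  deletion_protection = false # lab-only

  # Reject queries that do not filter on the partition pseudo-column.
  # Set at the top level; the copy inside time_partitioning is deprecated.
  require_partition_filter = true

  time_partitioning {
    type          = "DAY"
    expiration_ms = 30 * 24 * 60 * 60 * 1000 # 30 days
  }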

  schema = jsonencode([
    { name = "event_id",    type = "STRING",    mode = "REQUIRED" },
    { name = "event_type",  type = "STRING",    mode = "REQUIRED" },
    { name = "event_time",  type = "TIMESTAMP", mode = "REQUIRED" },
    { name = "user_id",     type = "STRING",    mode = "NULLABLE" },
    { name = "payload",     type = "JSON",      mode = "NULLABLE" },
  ])

  labels = local.labels
}
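
The production shape mentioned above (partitioning by a column, plus clustering) could look roughly like the following. It is a sketch under assumptions: the table name events_by_time and the choice of clustering columns are hypothetical, and it sits alongside the lab's events table rather than replacing it.

```hcl
# Hypothetical second table: partitioned by the event_time column and clustered.
resource "google_bigquery_table" "events_by_time" {
  dataset_id          = google_bigquery_dataset.analytics.dataset_id
  table_id            = "events_by_time"
  deletion_protection = false # lab-only

  # Queries that omit a filter on event_time are rejected.
  require_partition_filter = true

  time_partitioning {
    type  = "DAY"
    field = "event_time" # partition on the column instead of ingestion time
  }

  # Rows sharing these values are stored together within each partition.
  clustering = ["event_type", "user_id"]

  schema = jsonencode([
    { name = "event_id",   type = "STRING",    mode = "REQUIRED" },
    { name = "event_type", type = "STRING",    mode = "REQUIRED" },
    { name = "event_time", type = "TIMESTAMP", mode = "REQUIRED" },
    { name = "user_id",    type = "STRING",    mode = "NULLABLE" },
  ])

  labels = local.labels
}
```

With this layout, a filter on event_time prunes partitions, and a filter on event_type narrows the blocks read inside each remaining partition.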

4. Schedule a BigQuery query that runs hourly

Provisions:

BigQuery

Scheduled queries are the ADP-pattern primitive for ingest from one BigQuery dataset into another on a fixed cadence. They run on the BigQuery Data Transfer Service infrastructure (separate from bq ad-hoc queries) and bill the same per-byte-scanned rate.

We schedule a query that runs every hour, reads from the public bigquery-public-data.samples.shakespeare table, and writes into the events table from Step 3. The MERGE shape (upsert) is the ADP-canonical answer for idempotent ingestion: re-running the same hour doesn't double-insert. For brevity, the query in this lab is a plain INSERT that generates a fresh UUID per row, so each run appends new rows rather than upserting.
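
An idempotent MERGE variant might look like the sketch below, kept in a local so it could be passed as the query parameter of the scheduled query resource in this step. It is untested and rests on assumptions: the local name merge_query is hypothetical, the event_id is derived from word, corpus, and run hour instead of GENERATE_UUID(), the LIMIT is dropped because it is nondeterministic without ORDER BY, and @run_time is the run-time parameter that scheduled queries provide.

```hcl
locals {
  # Hypothetical name; untested sketch of an idempotent variant of the lab query.
  merge_query = <<-SQL
    MERGE `${google_bigquery_dataset.analytics.dataset_id}.${google_bigquery_table.events.table_id}` AS t
    USING (
      SELECT
        -- Deterministic key: the same word, corpus, and run hour always hash to the same id.
        TO_HEX(MD5(CONCAT(word, '|', corpus, '|', FORMAT_TIMESTAMP('%Y%m%d%H', @run_time)))) AS event_id,
        'shakespeare-line' AS event_type,
        TIMESTAMP_TRUNC(@run_time, HOUR) AS event_time,
        TO_JSON(STRUCT(word, word_count, corpus)) AS payload
      FROM `bigquery-public-data.samples.shakespeare`
      WHERE word_count > 100
    ) AS s
    ON t.event_id = s.event_id
      -- Partition filter on the target, since the table requires one.
      AND t._PARTITIONTIME = TIMESTAMP_TRUNC(@run_time, DAY)
    WHEN NOT MATCHED THEN
      INSERT (_PARTITIONTIME, event_id, event_type, event_time, payload)
      VALUES (TIMESTAMP_TRUNC(@run_time, DAY), s.event_id, s.event_type, s.event_time, s.payload)
  SQL
}
```

Setting query = local.merge_query in the transfer config's params would swap it in. A re-run for the same hour then finds every key already present and inserts nothing.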

When service_account_name is not set, the scheduled query runs with the credentials of the identity that created the transfer config (here, your ADC user); production deployments use a dedicated transfer service account with roles/bigquery.dataEditor on the destination.

resource "google_bigquery_data_transfer_config" "hourly_load" {
  display_name           = "certlabpro-adp-hourly-load"
  data_source_id         = "scheduled_query"
  location               = "US"
  schedule               = "every 1 hours"
  # destination_dataset_id is omitted on purpose: the query is a DML statement
  # that names its own target table, and DML scheduled queries take no destination.

  params = {
    query = "INSERT INTO `${google_bigquery_dataset.analytics.dataset_id}.${google_bigquery_table.events.table_id}` (event_id, event_type, event_time, user_id, payload) SELECT GENERATE_UUID() AS event_id, \"shakespeare-line\" AS event_type, CURRENT_TIMESTAMP() AS event_time, NULL AS user_id, TO_JSON(STRUCT(word, word_count, corpus)) AS payload FROM `bigquery-public-data.samples.shakespeare` WHERE word_count > 100 LIMIT 100"
  }

  depends_on = [
    google_project_service.bqdts,
    google_bigquery_table.events,
  ]
}
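
The dedicated-service-account setup described above could be sketched as follows. The resource names and the account_id are hypothetical, and the sketch assumes the IAM API is enabled in the project and that the identity running Terraform is allowed to act as the new service account.

```hcl
# Hypothetical service account that the scheduled query would run as.
resource "google_service_account" "transfer" {
  account_id   = "adp-scheduled-query"
  display_name = "ADP lab scheduled query runner"
}

# Write access to the destination dataset only.
resource "google_bigquery_dataset_iam_member" "transfer_editor" {
  dataset_id = google_bigquery_dataset.analytics.dataset_id
  role       = "roles/bigquery.dataEditor"
  member     = "serviceAccount:${google_service_account.transfer.email}"
}

# Permission to run query jobs in the project.
resource "google_project_iam_member" "transfer_job_user" {
  project = google_bigquery_dataset.analytics.project
  role    = "roles/bigquery.jobUser"
  member  = "serviceAccount:${google_service_account.transfer.email}"
}
```

To use it, add service_account_name = google_service_account.transfer.email to the google_bigquery_data_transfer_config resource, so the runs no longer depend on a personal user credential.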

Cleanup

terraform destroy tears down everything. The bucket is destroyed along with any objects in it (lab-only force_destroy). The dataset is destroyed (lab-only delete_contents_on_destroy), and its tables go with it. The scheduled query's transfer config is deleted, so no further runs fire. Project services stay enabled (free).

What this lab doesn't cover

ADP covers many GCP data surfaces this lab can't fit: Dataflow (covered in the gcp-pde lab at the Pro tier), Dataproc, Pub/Sub, Cloud Composer (managed Airflow), Dataform (the SQL-transform IDE built on BigQuery), Looker Studio, Vertex AI, Cloud Data Fusion, Database Migration Service, Datastream, the entire BigLake / BigQuery Omni multi-cloud surface, and BigQuery ML (in-database ML training).

We stick to the GCS + BigQuery + scheduled query primitives because they're the foundation every ADP-pattern pipeline starts from. Dataflow / Dataproc streams or batches into GCS or BigQuery. Composer / Workflows orchestrate the scheduled queries above. Looker reads from BigQuery. Get the base right; layer specialty engines on it.

For the service-by-service conceptual coverage, see the Browse, Playbook, and Editorial sections of this cert page.
