Optimize a large BigQuery table for query cost and performance.
→Partition the table by a frequently filtered time-unit column (e.g., transaction date). Cluster the table by other high-cardinality, frequently filtered columns (e.g., `customer_id`).
Why: Partitioning is the most effective way to reduce cost and latency: queries that filter on the partitioning column scan only the matching partitions. Clustering further improves performance by sorting data within each partition, so filters on the clustered columns read fewer blocks.
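The pattern above can be sketched in DDL (table and column names are illustrative):

```sql
-- Partition on the date of a frequently filtered timestamp column,
-- and cluster on a high-cardinality filter column.
CREATE TABLE `my-project.sales.transactions`
PARTITION BY DATE(transaction_ts)
CLUSTER BY customer_id AS
SELECT * FROM `my-project.sales.transactions_raw`;

-- This query prunes to a single partition, so only one day of data
-- is scanned; clustering narrows the scan further within it.
SELECT SUM(amount)
FROM `my-project.sales.transactions`
WHERE DATE(transaction_ts) = '2024-01-15'
  AND customer_id = 'C-1001';
```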
Prevent data from a sensitive BigQuery dataset from being copied to an unauthorized destination (e.g., a public GCS bucket), even by a user with valid credentials.
→Use VPC Service Controls to create a service perimeter around the project containing the BigQuery dataset.
Why: VPC Service Controls act as a "virtual firewall" for GCP services, preventing data from leaving the perimeter. This is a critical defense-in-depth control against data exfiltration.
Restrict access to sensitive columns (e.g., PII) in a BigQuery table to authorized groups, while allowing others to query the remaining columns.
→Use Data Catalog to create a taxonomy and policy tags. Apply policy tags to sensitive columns and grant the "Fine-Grained Reader" role to authorized groups.
Why: This is the native, scalable method for column-level security in BigQuery. It provides centralized governance without needing to create and manage separate views.
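A hedged DDL sketch of attaching a policy tag at table creation. The taxonomy resource name is a placeholder: the taxonomy and policy tag must already exist in Data Catalog, and the Fine-Grained Reader role is granted via IAM on the tag itself.

```sql
CREATE TABLE `my-project.hr.employees` (
  employee_id STRING,
  department  STRING,
  -- Only principals with Fine-Grained Reader on this policy tag
  -- can SELECT the ssn column; others can still query the rest.
  ssn STRING OPTIONS (
    policy_tags = ('projects/my-project/locations/us/taxonomies/123/policyTags/456')
  )
);
```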
Filter a table so that users can only see rows that pertain to them (e.g., sales managers see only their own region's data).
→Create a row access policy (`CREATE ROW ACCESS POLICY`) on the table that filters rows based on `SESSION_USER()`.
Why: Provides dynamic, predicate-based filtering at query time. This is more secure and manageable than creating an authorized view for each user or role.
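A minimal sketch, assuming the table carries a `sales_manager_email` column identifying the row's owner (names are illustrative):

```sql
-- Members of the group see only rows where the filter evaluates TRUE;
-- users matched by no policy see no rows at all.
CREATE ROW ACCESS POLICY region_access
ON `my-project.sales.orders`
GRANT TO ('group:sales-managers@example.com')
FILTER USING (sales_manager_email = SESSION_USER());
```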
Automatically delete data from a BigQuery table after a specified retention period to comply with regulations (e.g., delete data older than 7 years).
→For time-series data, set a partition expiration on the time-partitioned table so partitions are deleted as they age out. For short-lived whole tables, set a table expiration (or a dataset-level default); note that table expiration deletes the entire table, not individual rows.
Why: This is a built-in, "set-and-forget" feature that ensures compliance without manual cleanup scripts or external orchestration.
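Both settings can be applied with DDL (names and the roughly-7-years figure are illustrative):

```sql
-- Drop each partition about 7 years after its partition date.
ALTER TABLE `my-project.sales.transactions`
SET OPTIONS (partition_expiration_days = 2557);

-- Default expiration for tables subsequently created in the dataset.
ALTER SCHEMA `my-project.sales`
SET OPTIONS (default_table_expiration_days = 2557);
```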
A BigQuery table was accidentally modified or deleted.
→Use BigQuery Time Travel to query the table as it existed at a point in time before the incident, using `FOR SYSTEM_TIME AS OF`.
Why: BigQuery retains table history for up to seven days (seven by default; configurable per dataset down to two). This allows instant recovery within the time travel window without needing to restore from backups.
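For a modified (still-existing) table, the recovery pattern looks like this (table names are illustrative; a fully deleted table is instead recovered with a snapshot-decorator copy, e.g. via the `bq cp` command):

```sql
-- Inspect the table as it was one hour ago.
SELECT *
FROM `my-project.sales.transactions`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);

-- Materialize that snapshot to restore the pre-incident state.
CREATE TABLE `my-project.sales.transactions_restored` AS
SELECT *
FROM `my-project.sales.transactions`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
```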
Discover, manage, secure, and monitor data assets (BigQuery, GCS) across an entire organization.
→Use Dataplex.
Why: Dataplex acts as an intelligent data fabric, providing a unified pane for data governance, quality, lineage, discovery, and lifecycle management across disparate data silos.
Understand and visualize how data flows from source systems, through transformation jobs, to final reporting tables.
→Use Dataplex Data Lineage.
Why: Automatically captures lineage information from BigQuery, Data Fusion, and Composer logs to provide an interactive, graph-based view of data dependencies for impact analysis and auditing.
Ensure predictable query performance and cost for critical workloads, avoiding "slot contention" from other users.
→Purchase BigQuery Editions (capacity-based pricing). Create reservations to dedicate a pool of slots to specific projects or folders.
Why: Switches from a shared, on-demand pool to a dedicated compute capacity, guaranteeing resources for critical jobs and providing predictable billing.
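Reservations can be managed with DDL from an administration project; the names, region, and capacity below are placeholders:

```sql
-- Carve out a dedicated pool of slots.
CREATE RESERVATION `admin-project.region-us.prod-reservation`
OPTIONS (edition = 'ENTERPRISE', slot_capacity = 100);

-- Route a critical project's query jobs to that pool.
CREATE ASSIGNMENT `admin-project.region-us.prod-reservation.prod-assignment`
OPTIONS (assignee = 'projects/critical-project', job_type = 'QUERY');
```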
Scan all data assets in BigQuery and Cloud Storage to automatically identify and classify PII and other sensitive data.
→Configure a discovery scan in Sensitive Data Protection (formerly Cloud Data Loss Prevention, DLP).
Why: Cloud DLP uses hundreds of predefined detectors to find sensitive data at scale. It can integrate with Data Catalog to automatically apply policy tags for governance.
A containerized application (on GKE or Cloud Run) needs to securely authenticate to BigQuery without managing service account keys.
→Use Workload Identity Federation for GKE; on Cloud Run, attach a dedicated service account to the service.
Why: The recommended best practice for keyless service-to-service authentication. On GKE it maps a Kubernetes service account to an IAM service account; in both cases the workload obtains short-lived, automatically rotated tokens instead of long-lived downloadable keys.
For compliance, generate a report of all users who have queried a sensitive BigQuery table in the last 90 days.
→Enable and query the BigQuery Data Access audit logs, which can be routed to a BigQuery dataset for analysis.
Why: Data Access logs provide an immutable record of who accessed what data and when. They are essential for security and compliance audits but must be explicitly enabled.
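A sketch of the analysis query, assuming the logs are routed to a dataset via a log sink. The table name follows the sink's default naming and the field paths follow the BigQuery audit-log export schema; the `LIKE` filter on the raw metadata is a crude but serviceable way to match the sensitive table.

```sql
SELECT DISTINCT
  protopayload_auditlog.authenticationInfo.principalEmail AS user_email
FROM `my-project.audit_logs.cloudaudit_googleapis_com_data_access`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
  -- Crude match on the sensitive table's name in the job metadata.
  AND protopayload_auditlog.metadataJson LIKE '%sensitive_table%';
```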
Identify which users or queries are responsible for high BigQuery costs.
→Query the `INFORMATION_SCHEMA.JOBS` view.
Why: This metadata view contains detailed information for every query run, including the user, bytes billed, and slots consumed, enabling precise cost attribution and analysis.
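For example, a per-user cost rollup over the last 30 days (the region qualifier must match where the jobs ran):

```sql
SELECT
  user_email,
  COUNT(*) AS query_count,
  SUM(total_bytes_billed) / POW(10, 12) AS tb_billed,
  SUM(total_slot_ms) / (1000 * 3600)    AS slot_hours
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE job_type = 'QUERY'
  -- Filtering on creation_time also prunes this partitioned view.
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY user_email
ORDER BY tb_billed DESC;
```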