AWS Certified Data Engineer Associate
275 practice questions
Last reviewed: April 2026
The AWS Certified Data Engineer Associate (DEA-C01) launched in March 2024 as the practitioner-focused successor to the retired Data Analytics Specialty. It validates the ability to design, build, operate, and secure data pipelines and analytics workloads on AWS, including ingestion, transformation, storage, orchestration, and governance. The exam targets working data engineers, analytics engineers, and ETL developers on AWS-centric stacks. It places heavy emphasis on Glue, Lambda, Kinesis Data Streams / Firehose, Managed Kafka (MSK), S3 data lakes, Lake Formation, Athena, Redshift, and EMR. Expect scenario-driven questions about cost-aware ingestion choices, file format and partitioning strategy, and pipeline reliability. DEA-C01 is conceptual (no labs) but assumes hands-on pipeline experience.
Data Ingestion and Transformation is the largest domain at 34%. It covers Kinesis Data Streams vs. Firehose vs. MSK selection, Glue ETL jobs and DataBrew, Lambda for lightweight ETL, and AppFlow for SaaS sources. Common stumbling block: choosing the right ingestion service under latency and ordering constraints.
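The ingestion-service choice described above can be sketched as a small decision function. This is a simplified study heuristic built from the distinctions named in this guide, not AWS guidance; the function name and its parameters are hypothetical.

```python
def pick_ingestion_service(needs_kafka_api: bool,
                           needs_custom_consumers: bool,
                           needs_per_key_ordering: bool,
                           saas_source: bool = False) -> str:
    """Return a likely exam answer for an ingestion scenario (study heuristic only)."""
    if saas_source:
        # SaaS applications as the source point toward AppFlow.
        return "AppFlow"
    if needs_kafka_api:
        # Kafka-compatible clients or tooling point toward MSK.
        return "MSK"
    if needs_custom_consumers or needs_per_key_ordering:
        # Custom consumer code, ordering, or replay point toward Kinesis Data Streams.
        return "Kinesis Data Streams"
    # Managed delivery to a sink with no consumer code points toward Firehose.
    return "Firehose"


if __name__ == "__main__":
    print(pick_ingestion_service(False, False, False))
    print(pick_ingestion_service(False, True, True))
```

Real exam scenarios add cost, retention, and latency details that a four-flag function cannot capture, so treat the ordering of the checks as a memory aid only.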
S3 data lake design, file formats (Parquet, ORC, Avro), partitioning, Lake Formation governance, Redshift architecture (RA3, Serverless), and DynamoDB for operational workloads. Tests practical storage tradeoffs.
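As an illustration of the partitioning point above, here is a minimal sketch of building Hive-style partitioned S3 keys (`year=/month=/day=`), the layout that lets engines such as Athena skip partitions a query does not need. The function name, prefix, and table name are hypothetical, and date-based partitioning is only one possible scheme.

```python
from datetime import datetime, timezone


def partitioned_key(prefix: str, table: str, event_time: datetime, filename: str) -> str:
    """Build a Hive-style S3 object key partitioned by UTC year/month/day."""
    t = event_time.astimezone(timezone.utc)  # normalize so partitions are consistent
    return (
        f"{prefix}/{table}/"
        f"year={t.year:04d}/month={t.month:02d}/day={t.day:02d}/"
        f"{filename}"
    )


if __name__ == "__main__":
    ts = datetime(2026, 4, 3, 12, 30, tzinfo=timezone.utc)
    print(partitioned_key("curated", "clicks", ts, "part-0000.parquet"))
    # curated/clicks/year=2026/month=04/day=03/part-0000.parquet
```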
Workflow orchestration with Step Functions, Glue Workflows, MWAA (Managed Airflow), and EventBridge. CloudWatch monitoring of data jobs, retries, and alerting. Often missed: when MWAA is justified vs. simpler Step Functions.
Lake Formation permissions, fine-grained access via row/column-level security, KMS for at-rest encryption, IAM patterns for cross-account data sharing, and PII detection (Macie). Smaller weight (18%) but high-density questions.
Services you'll encounter on the exam and why each one matters.
Serverless ETL platform with a managed Spark/Python runtime, Crawlers for schema discovery, the Glue Data Catalog, and Glue DataBrew for low-code transformation.
Why it's on the exam: Glue is the headline service in Data Ingestion and Transformation. Expect questions on job bookmarks, dynamic frames, partitioning strategy, and DataBrew vs. Glue Studio tradeoffs.
Object storage that serves as the foundation for the AWS data lake: landing zone, raw / curated / consumption layers, and source for every downstream analytics service.
Why it's on the exam: Every DEA-C01 storage and ingestion scenario assumes S3 as the substrate; storage classes, lifecycle, Intelligent-Tiering, and partition layout drive Data Store Management questions.
Managed cloud data warehouse with columnar MPP storage, RA3 separated compute/storage, Redshift Spectrum over S3, and zero-ETL ingestion from Aurora.
Why it's on the exam: Data Store Management questions repeatedly contrast Redshift (warehouse) against Athena/Glue/Lake Formation (lakehouse); distribution keys, sort keys, and workload management land here.
Managed Hadoop / Spark / Hive / Presto / Flink runtime supporting EMR on EC2, EMR Serverless, and EMR on EKS for large-scale batch and streaming jobs.
Why it's on the exam: Data Ingestion and Transformation scenarios beyond Glue's scale or requiring Spark/Hudi/Iceberg integration name EMR as the answer.
Real-time streaming service for ingesting clickstream, IoT, application, and log events at scale, with shard- or on-demand capacity and replay within the retention window.
Why it's on the exam: Data Ingestion and Transformation tests streaming ingest design; Kinesis Data Streams is the AWS-native source for low-latency pipelines feeding Firehose, Lambda, or Flink.
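Shard-based capacity planning comes down to simple arithmetic. The sketch below assumes the commonly documented provisioned-mode limits per shard (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads shared across consumers) and ignores enhanced fan-out and on-demand mode; the function name is hypothetical, and current quotas should be checked before relying on the constants.

```python
import math

# Assumed per-shard limits for provisioned mode (verify against current quotas).
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(ingest_mb_s: float, records_s: float, consumers: int = 1) -> int:
    """Estimate shard count from write throughput, record rate, and shared reads."""
    by_bytes = math.ceil(ingest_mb_s / WRITE_MB_PER_SHARD)
    by_records = math.ceil(records_s / WRITE_RECORDS_PER_SHARD)
    # Every shared-throughput consumer re-reads the full stream.
    by_read = math.ceil(ingest_mb_s * consumers / READ_MB_PER_SHARD)
    return max(1, by_bytes, by_records, by_read)


if __name__ == "__main__":
    print(shards_needed(5, 3000))               # 5: write bytes are the bottleneck
    print(shards_needed(4, 1000, consumers=3))  # 6: shared reads are the bottleneck
```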
Managed streaming-delivery service that batches, compresses, and lands records into S3, Redshift, OpenSearch, Splunk, or HTTP endpoints with optional Lambda transformation.
Why it's on the exam: Firehose is the canonical Data Ingestion answer when a question asks for managed, near-real-time delivery to a sink without writing consumer code.
Serverless interactive SQL engine over S3 (and federated sources) using the Glue Data Catalog, with workgroups for cost/access control and pay-per-query pricing.
Why it's on the exam: Data Operations and Support scenarios use Athena for ad-hoc exploration of lake data and as the query layer behind Lake Formation governance.
Managed Apache Airflow service for authoring, scheduling, and monitoring data pipelines as Python DAGs with full operator/sensor support.
Why it's on the exam: Data Operations and Support questions on pipeline orchestration distinguish MWAA (Airflow-native, code-first) from Step Functions (state-machine); pick MWAA for complex cross-service DAGs.
Fine-grained access-control layer over the Glue Data Catalog providing row-, column-, and tag-based permissions across Athena, Redshift Spectrum, EMR, and Glue.
Why it's on the exam: Data Security and Governance tests Lake Formation as the AWS-native answer for row/column-level security on lake data, replacing direct IAM-on-S3 patterns.
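To make column-level security concrete, the sketch below builds the arguments for a column-scoped `SELECT` grant. The dictionary follows the request shape of the Lake Formation `GrantPermissions` API as exposed by boto3's `grant_permissions`, to the best of my understanding; verify field names against the current documentation. The helper name, role ARN, database, table, and columns are all hypothetical.

```python
def column_grant_request(principal_arn: str, database: str, table: str, columns: list) -> dict:
    """Build keyword arguments for a column-level SELECT grant (illustrative only)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),  # only these columns become readable
            }
        },
        "Permissions": ["SELECT"],
    }


if __name__ == "__main__":
    req = column_grant_request(
        "arn:aws:iam::111122223333:role/analyst",  # hypothetical role
        "sales_db", "orders", ["order_id", "order_total"],
    )
    print(req)
    # With credentials configured, this could be passed as:
    # boto3.client("lakeformation").grant_permissions(**req)
```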
Serverless key-value / document NoSQL database with single-digit-ms latency, on-demand or provisioned capacity, Streams for CDC, and zero-ETL export to S3.
Why it's on the exam: Data Store Management compares DynamoDB (operational NoSQL) against relational and warehouse options; DynamoDB Streams power CDC into the lake.
Managed relational databases (PostgreSQL, MySQL, Oracle, SQL Server, MariaDB) plus Aurora, including zero-ETL replication into Redshift for analytics.
Why it's on the exam: Data Store Management and Data Ingestion both reference RDS/Aurora as the operational source feeding the warehouse via zero-ETL, DMS, or logical replication.
Managed service for one-time and continuous (CDC) replication between heterogeneous databases (for example, Oracle/SQL Server to Aurora/Redshift, or on-prem to AWS).
Why it's on the exam: Data Ingestion and Transformation tests DMS as the canonical migration / CDC answer when the source is an operational RDBMS rather than a stream or file.
Serverless workflow orchestrator with native integrations for Glue, EMR, Lambda, Athena, SageMaker, and DynamoDB, modeling pipelines as Standard or Express state machines.
Why it's on the exam: Data Operations and Support questions distinguish Step Functions (state-machine, sub-second / long-running) from MWAA (Airflow DAGs); Step Functions wins for event-driven, AWS-native flows.
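A state machine that runs one Glue job with retries might be defined roughly as follows. This is a minimal sketch in Amazon States Language, expressed as a Python dictionary; the state name, job name, and retry settings are hypothetical, and the `glue:startJobRun.sync` integration is used on the assumption that the workflow should wait for the job to finish.

```python
import json


def glue_job_state_machine(job_name: str) -> dict:
    """Return an ASL definition that runs a Glue job and retries on failure."""
    return {
        "Comment": "Run one Glue job with exponential-backoff retries (sketch)",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # .sync makes the task wait for the Glue job run to complete.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "Retry": [
                    {
                        "ErrorEquals": ["States.ALL"],
                        "IntervalSeconds": 30,   # first retry after 30 s
                        "MaxAttempts": 3,
                        "BackoffRate": 2.0,      # then 60 s, then 120 s
                    }
                ],
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(glue_job_state_machine("nightly-orders-etl"), indent=2))
```

Catching `States.ALL` is the bluntest option; a production definition would usually retry only specific, transient error names.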
Serverless event bus that routes AWS-service events, partner events, and custom events to targets (Lambda, Step Functions, Firehose, SQS) with content-based filtering and schedules.
Why it's on the exam: Data Operations and Support uses EventBridge to trigger pipelines on schedule or on data-arrival events and to fan out signals across teams.
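A data-arrival trigger is typically expressed as an event pattern. The sketch below builds a pattern for S3 "Object Created" events under a key prefix (this assumes EventBridge notifications are enabled on the bucket) and includes a deliberately simplified matcher that handles only exact values and `prefix` filters, so the pattern can be tried locally. The bucket name, prefix, and both function names are hypothetical, and the matcher is not a full reimplementation of EventBridge matching.

```python
def s3_object_created_pattern(bucket: str, key_prefix: str) -> dict:
    """Event pattern for new objects under a prefix in one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket]},
            "object": {"key": [{"prefix": key_prefix}]},
        },
    }


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: exact values and prefix filters only."""
    for field, expected in pattern.items():
        actual = event.get(field)
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
            continue
        ok = False
        for candidate in expected:
            if isinstance(candidate, dict) and "prefix" in candidate:
                ok = ok or (isinstance(actual, str) and actual.startswith(candidate["prefix"]))
            else:
                ok = ok or actual == candidate
        if not ok:
            return False
    return True


if __name__ == "__main__":
    pattern = s3_object_created_pattern("example-lake", "raw/")
    event = {
        "source": "aws.s3",
        "detail-type": "Object Created",
        "detail": {"bucket": {"name": "example-lake"}, "object": {"key": "raw/clicks/a.json"}},
    }
    print(matches(pattern, event))
```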
Serverless compute used for in-flight record transformation (Firehose / Kinesis), lightweight ETL glue, S3-event-driven preprocessing, and pipeline custom logic.
Why it's on the exam: Data Ingestion and Transformation expects Lambda for Firehose-data-transformation use cases and for stitching event-driven steps that don't justify Glue or EMR.
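The Firehose data-transformation use case follows a fixed contract: the function receives base64-encoded records and must return each `recordId` with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. Below is a minimal sketch of such a handler, assuming the incoming records are JSON objects; the added `processed` field is purely illustrative.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transformation: decode each record, enrich it, re-encode it."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["processed"] = True  # illustrative enrichment
            # Newline-delimit records so downstream readers can split them.
            encoded = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": encoded.decode("utf-8"),
            })
        except (ValueError, KeyError, TypeError):
            # Hand unparseable records back unchanged and flag them as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}


if __name__ == "__main__":
    sample = {"records": [
        {"recordId": "1", "data": base64.b64encode(b'{"user": "a"}').decode()},
        {"recordId": "2", "data": base64.b64encode(b"not json").decode()},
    ]}
    for r in lambda_handler(sample, None)["records"]:
        print(r["recordId"], r["result"])
```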
Managed OpenSearch (and legacy Elasticsearch) for search, log analytics, and observability, including OpenSearch Serverless for variable-capacity workloads.
Why it's on the exam: Data Store Management and Data Operations cite OpenSearch as the target for log analytics and as a Firehose / Kinesis destination for searchable telemetry.
Serverless BI service with SPICE in-memory engine, ML insights, embedded analytics, and Q (natural-language) for querying Redshift, Athena, RDS, and S3 sources.
Why it's on the exam: Data Operations and Support questions on serving analytics back to business users name QuickSight as the AWS-native consumption layer over the lake/warehouse.
Account-wide access control: users, roles, policies, federation, and least-privilege permissions for every Glue job, S3 object, Redshift query, and pipeline step.
Why it's on the exam: Data Security and Governance is anchored on IAM: execution roles for Glue/EMR, cross-account data sharing, and resource-based bucket policies are recurring questions.
Managed creation and control of cryptographic keys used to encrypt S3 objects, Redshift clusters, RDS volumes, Kinesis records, and Glue Data Catalog metadata at rest.
Why it's on the exam: Data Security and Governance expects KMS customer-managed keys (CMKs) for encryption-at-rest with auditable key rotation across every storage and pipeline service.
Account-wide audit log of every API call: who launched a Glue job, who queried Redshift, who altered Lake Formation permissions, who exported data from S3.
Why it's on the exam: Data Security and Governance compliance scenarios cite CloudTrail as the immutable record needed for audit, forensic investigation, and regulatory evidence.
$105k - $150k - $215k USD annual
Range covers US-based mid-to-senior data engineering roles where AWS proficiency is required. FAANG and large data-intensive companies frequently exceed $260k TC at senior levels. Entry roles and non-coastal markets trend lower. DEA-C01 is a credible signal but rarely a sole hiring factor.
Source: levels.fyi 2025-2026 data engineering roles, U.S. BLS OEWS May 2024 (15-1252 software developers, 15-2051 data scientists). Figures are approximate; actual compensation depends on role, region, and experience.
Data engineering hiring stayed strong through 2024-2026 as enterprises continued building cloud data lakes, lakehouse architectures, and analytics platforms. DEA-C01 functions as a credible AWS-specific signal alongside Snowflake, Databricks, or dbt experience. Recruiters at AWS-centric data shops use it as a fast filter together with SQL, Python, and Spark fluency. It pairs naturally with the Solutions Architect Associate (SAA-C03), the Machine Learning Engineer Associate (MLA-C01), and provider-neutral tools like Airflow and dbt. The cert does NOT by itself qualify candidates for staff data-engineer or principal data-platform roles; those expect proven large-scale pipeline ownership and broader system-design experience.
There are no formal prerequisites. AWS recommends at least 2-3 years of general data-engineering experience and at least one year of hands-on AWS data-services experience.
Most candidates approach DEA-C01 after SAA-C03 (architectural foundation) or directly from a strong Spark/SQL/Python background. CLF-C02 is a useful warm-up for career changers without AWS exposure. The most efficient personal-project preparation is an end-to-end pipeline: Kinesis Firehose -> S3 (Parquet, partitioned) -> Glue catalog -> Athena and Redshift Serverless, with Step Functions or Glue Workflows for orchestration and Lake Formation for governance. Candidates from non-AWS data backgrounds (e.g., on-prem Hadoop or pure Snowflake) should plan extra time on Glue, Lake Formation, and the Kinesis family.
DEA-C01 is rated Associate and is comparable in difficulty to SAA-C03, with a more focused service surface. Plan 70-110 hours over 8-12 weeks for candidates with prior data-engineering experience; 120-160 hours for those without. The exam is 65 questions (50 scored plus 15 unscored) in 130 minutes: multiple-choice and multiple-response, no labs.
Common stumbling blocks include differentiating Kinesis Data Streams (custom consumers, ordering, retention) from Firehose (managed delivery, transformations) and MSK (Kafka-compatible); knowing which orchestrator (Step Functions, Glue Workflows, MWAA, EventBridge Scheduler) suits a given pipeline; and Lake Formation permission inheritance edge cases. File-format and partitioning math (compression ratios, Parquet column pruning) also shows up regularly.
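The partitioning and column-pruning math can be worked through with a rough estimate. The sketch below assumes data is spread evenly across partitions and columns (real Parquet files rarely are) and uses an assumed price of $5 per TB scanned, with a TB taken as 10^12 bytes; both function names are hypothetical, and the price should be checked against current Athena pricing for your region.

```python
def scanned_bytes(table_bytes: float,
                  partitions_total: int, partitions_read: int,
                  columns_total: int, columns_read: int) -> float:
    """Estimate bytes scanned after partition pruning and column pruning."""
    partition_fraction = partitions_read / partitions_total
    column_fraction = columns_read / columns_total
    return table_bytes * partition_fraction * column_fraction


def query_cost_usd(bytes_scanned: float, usd_per_tb: float = 5.0) -> float:
    """Convert scanned bytes to cost; usd_per_tb is an assumed price."""
    return bytes_scanned / 1e12 * usd_per_tb


if __name__ == "__main__":
    # 1 TB table, 100 daily partitions, 20 columns.
    # The query touches 10 partitions and 4 columns.
    scanned = scanned_bytes(1e12, 100, 10, 20, 4)
    print(f"{scanned / 1e9:.0f} GB scanned, about ${query_cost_usd(scanned):.2f}")
    print(f"full scan: about ${query_cost_usd(1e12):.2f}")
```

Under these assumptions the pruned query reads about 2% of the table, which is the kind of ratio exam scenarios expect you to reason about when comparing CSV against partitioned Parquet.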
Initial general availability. Beta exam ran late 2023. Replaces the retired Data Analytics Specialty (DAS-C01) for engineering-focused candidates. Current version as of April 2026.
DEA-C01 (AWS Certified Data Engineer Associate) is a moderately difficult Associate-level exam that expects practical hands-on experience plus a solid understanding of best practices. Most candidates who score consistently above the passing threshold on practice exams pass on their first attempt.
Most candidates need 80-150 hours of study spread over 6-12 weeks for associate-level exams. Time-to-pass varies widely by prior experience. Engineers with hands-on production experience in the underlying technology typically need less; candidates new to the platform should plan toward the upper end of that range.
DEA-C01 is a recognized credential in the AWS ecosystem and signals validated knowledge to employers, recruiters, and clients. Whether it is worth the time and fee for you depends on your role and goals. It tends to pay off most for cloud engineers, architects, and consultants who work with AWS day-to-day or want to move into roles that do.
The passing score for DEA-C01 is 720 / 1000. The exam contains 65 questions and lasts 2 hr 10 min.
The DEA-C01 exam fee is $150 USD. Fees are set by AWS and may vary by region; always confirm the current price on the official AWS certification page before booking.
AWS certifications are valid for 3 years. Recertify by passing the current version of the same exam, or by passing a higher-level exam in the same path before expiration.
Yes. You can take the exam online (proctored via the provider's secure browser, available 24/7 in most regions) or at an in-person Pearson VUE test center during business hours. Both formats use the same questions, time limit, and passing score.
CertLabPro provides 15 study modes across the practice question bank for DEA-C01. The exam-simulation mode mirrors the real exam: 65 questions in 2 hr 10 min, with the same passing threshold of 720 / 1000. Browse mode lets you read every Q&A statically.