Practice Free Amazon DEA-C01 Exam Online Questions
A company is building a data stream processing application. The application runs in an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. The application stores processed data in an Amazon DynamoDB table.
The company needs the application containers in the EKS cluster to have secure access to the DynamoDB table. The company does not want to embed AWS credentials in the containers.
Which solution will meet these requirements?
- A . Store the AWS credentials in an Amazon S3 bucket. Grant the EKS containers access to the S3 bucket to retrieve the credentials.
- B . Attach an IAM role to the EKS worker nodes. Grant the IAM role access to DynamoDB. Use the IAM role to set up IAM roles for service accounts (IRSA) functionality.
- C . Create an IAM user that has an access key to access the DynamoDB table. Use environment variables in the EKS containers to store the IAM user access key data.
- D . Create an IAM user that has an access key to access the DynamoDB table. Use Kubernetes secrets that are mounted in a volume of the EKS cluster nodes to store the user access key data.
B
Explanation:
In this scenario, the company is using Amazon Elastic Kubernetes Service (EKS) and wants secure access to DynamoDB without embedding credentials inside the application containers. The best practice is to use IAM roles for service accounts (IRSA), which allows assigning IAM roles to Kubernetes service accounts. This lets the EKS pods assume specific IAM roles securely, without the need to store credentials in containers.
IAM Roles for Service Accounts (IRSA):
With IRSA, each pod in the EKS cluster can assume an IAM role that grants access to DynamoDB without needing to manage long-term credentials. The IAM role can be attached to the service account associated with the pod.
This ensures least privilege access, improving security by preventing credentials from being embedded in the containers.
Reference: IAM Roles for Service Accounts (IRSA)
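As an illustration, here is a minimal sketch of application code running in a pod once IRSA is in place. It assumes the pod's Kubernetes service account is annotated with an eks.amazonaws.com/role-arn that grants DynamoDB access; the table name processed-events is a placeholder.

```python
import boto3

# With IRSA, EKS injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE into
# the container. boto3's default credential chain exchanges the web identity
# token for temporary credentials for the service-account role, so no access
# keys are embedded in the image or environment.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("processed-events")  # placeholder table name

def store_record(record_id: str, payload: dict) -> None:
    """Write one processed stream record to DynamoDB."""
    table.put_item(Item={"record_id": record_id, **payload})
```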
Alternatives Considered:
A (Storing AWS credentials in S3): Storing AWS credentials in S3 and retrieving them introduces security risks and violates the principle of not embedding credentials.
C (IAM user access keys in environment variables): This also embeds credentials, which is not recommended.
D (Kubernetes secrets): Storing user access keys as secrets is an option, but it still involves handling long-term credentials manually, which is less secure than using IRSA.
Reference: IAM Best Practices for Amazon EKS
Secure Access to DynamoDB from EKS
A company manages an Amazon Redshift data warehouse. The data warehouse is in a public subnet inside a custom VPC. A security group allows only traffic from within itself. A network ACL is open to all traffic.
The company wants to generate several visualizations in Amazon QuickSight for an upcoming sales event. The company will run QuickSight Enterprise edition in a second AWS account inside a public subnet within a second custom VPC. The new public subnet has a security group that allows outbound traffic to the existing Redshift cluster.
A data engineer needs to establish connections between Amazon Redshift and QuickSight.
QuickSight must refresh dashboards by querying the Redshift cluster.
Which solution will meet these requirements?
- A . Configure the Redshift security group to allow inbound traffic on the Redshift port from the QuickSight security group.
- B . Assign Elastic IP addresses to the QuickSight visualizations. Configure the QuickSight security group to allow inbound traffic on the Redshift port from the Elastic IP addresses.
- C . Confirm that the CIDR ranges of the Redshift VPC and the QuickSight VPC are the same. If CIDR ranges are different, reconfigure one CIDR range to match the other. Establish network peering between the VPCs.
- D . Create a QuickSight gateway endpoint in the Redshift VPC. Attach an endpoint policy to the gateway endpoint to ensure only specific QuickSight accounts can use the endpoint.
A data engineer needs to make tabular data available in an Amazon S3-based data lake. Users must be able to query the data by using SQL queries in Amazon Redshift, Amazon Athena, and Amazon EMR. The data is updated daily. The data engineer must ensure that updates and deletions are reflected in the data lake.
Which solution will meet these requirements with the LEAST operational overhead?
- A . Store the data in S3 Standard. Configure Apache Hudi with merge-on-read in Amazon EMR. Use Apache Spark SQL in Amazon EMR to perform the daily updates and deletions. Use Amazon EMR to schedule compaction jobs. Use AWS Glue to create a data catalog of Hudi tables that are stored in Amazon S3.
- B . Create S3 tables for the tabular data. Use AWS Glue and an S3 tables catalog for Apache Iceberg JAR to perform the daily updates and deletions. Configure a compaction size target. Set up snapshot management and unreferenced file removal for the S3 tables bucket.
- C . Load the data into an Amazon Redshift cluster. Use SQL to perform the daily updates and deletions. Upload the data to an Amazon S3 bucket in Apache Parquet format to create the data lake.
- D . Load the data into an Amazon EMR cluster. Use Apache Spark to perform the daily updates and deletions. Upload the data into an Amazon S3 bucket in Apache Parquet format to create the data lake.
B
Explanation:
Apache Iceberg is a table format designed for large-scale data lakes that supports ACID transactions, schema evolution, time travel, and row-level updates and deletes. Using S3 Tables with Apache Iceberg provides a fully managed experience that integrates natively with Amazon Athena, Amazon Redshift, and Amazon EMR.
By using AWS Glue with the Iceberg catalog, the data engineer can perform daily updates and deletions without managing Spark clusters, compaction scheduling, or metadata cleanup manually. Iceberg handles snapshots, file pruning, and unreferenced file removal automatically, significantly reducing operational overhead.
Apache Hudi requires Amazon EMR clusters, Spark jobs, and manual compaction orchestration, increasing complexity. The Parquet-only approaches in options C and D do not support updates or deletes efficiently and would require full rewrites of datasets, which is not scalable.
Therefore, using S3 Tables with Apache Iceberg provides the most efficient, scalable, and low-maintenance solution that satisfies all query and update requirements.
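As a hedged sketch of what the daily maintenance could look like, the following AWS Glue Spark job applies updates and deletions to an Iceberg table with a single MERGE statement. It assumes the job is configured with the Iceberg data lake format enabled; the catalog name glue_catalog, the database sales_db, the table orders, and the S3 path for the change files are placeholders.

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

spark = GlueContext(SparkContext.getOrCreate()).spark_session

# Load today's change set (inserts, updates, and delete markers) from S3.
changes = spark.read.parquet("s3://example-bucket/daily-changes/")
changes.createOrReplaceTempView("changes")

# Iceberg supports row-level MERGE, so updates and deletions are applied in
# place instead of rewriting the whole dataset.
spark.sql("""
    MERGE INTO glue_catalog.sales_db.orders AS t
    USING changes AS c
      ON t.order_id = c.order_id
    WHEN MATCHED AND c.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```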
A media company wants to improve a system that recommends media content to customers based on user behavior and preferences. To improve the recommendation system, the company needs to incorporate insights from third-party datasets into the company’s existing analytics platform.
The company wants to minimize the effort and time required to incorporate third-party datasets.
Which solution will meet these requirements with the LEAST operational overhead?
- A . Use API calls to access and integrate third-party datasets from AWS Data Exchange.
- B . Use API calls to access and integrate third-party datasets from AWS
- C . Use Amazon Kinesis Data Streams to access and integrate third-party datasets from AWS CodeCommit repositories.
- D . Use Amazon Kinesis Data Streams to access and integrate third-party datasets from Amazon Elastic Container Registry (Amazon ECR).
A
Explanation:
AWS Data Exchange is a service that makes it easy to find, subscribe to, and use third-party data in the cloud. It provides a secure and reliable way to access and integrate data from various sources, such as data providers, public datasets, or AWS services. Using AWS Data Exchange, you can browse and subscribe to data products that suit your needs, and then use API calls or the AWS Management Console to export the data to Amazon S3, where you can use it with your existing analytics platform. This solution minimizes the effort and time required to incorporate third-party datasets, as you do not need to set up and manage data pipelines, storage, or access controls. You also benefit from the data quality and freshness provided by the data providers, who can update their data products as frequently as needed12.
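As a hedged sketch, exporting a subscribed revision to Amazon S3 can be driven by a couple of Data Exchange API calls; the data set ID, revision ID, and bucket name below are placeholders.

```python
import boto3

dx = boto3.client("dataexchange")

# Create and start an export job that copies one revision of the subscribed
# data product into the analytics bucket on S3.
job = dx.create_job(
    Type="EXPORT_REVISIONS_TO_S3",
    Details={
        "ExportRevisionsToS3": {
            "DataSetId": "example-data-set-id",
            "RevisionDestinations": [
                {
                    "RevisionId": "example-revision-id",
                    "Bucket": "example-analytics-bucket",
                }
            ],
        }
    },
)
dx.start_job(JobId=job["Id"])
```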
The other options are not optimal for the following reasons:
B. Use API calls to access and integrate third-party datasets from AWS. This option is vague and does not specify which AWS service or feature is used to access and integrate third-party datasets. AWS offers a variety of services and features that can help with data ingestion, processing, and analysis, but not all of them are suitable for the given scenario. For example, AWS Glue is a serverless data integration service that can help you discover, prepare, and combine data from various sources, but it requires you to create and run data extraction, transformation, and loading (ETL) jobs, which can add operational overhead3.
C. Use Amazon Kinesis Data Streams to access and integrate third-party datasets from AWS CodeCommit repositories. This option is not feasible, as AWS CodeCommit is a source control service that hosts secure Git-based repositories, not a data source that can be accessed by Amazon Kinesis Data Streams. Amazon Kinesis Data Streams is a service that enables you to capture, process, and analyze data streams in real time, such as clickstream data, application logs, or IoT telemetry. It does not support accessing and integrating data from AWS CodeCommit repositories, which are meant for storing and managing code, not data.
D. Use Amazon Kinesis Data Streams to access and integrate third-party datasets from Amazon Elastic Container Registry (Amazon ECR). This option is also not feasible, as Amazon ECR is a fully managed container registry service that stores, manages, and deploys container images, not a data source that can be accessed by Amazon Kinesis Data Streams. Amazon Kinesis Data Streams does not support accessing and integrating data from Amazon ECR, which is meant for storing and managing container images, not data.
1: AWS Data Exchange User Guide
2: AWS Data Exchange FAQs
3: AWS Glue Developer Guide
: AWS CodeCommit User Guide
: Amazon Kinesis Data Streams Developer Guide
: Amazon Elastic Container Registry User Guide
: Build a Continuous Delivery Pipeline for Your Container Images with Amazon ECR as Source
A company has an Amazon Redshift data warehouse that users access by using a variety of IAM roles.
More than 100 users access the data warehouse every day.
The company wants to control user access to the objects based on each user’s job role, permissions, and how sensitive the data is.
Which solution will meet these requirements?
- A . Use the role-based access control (RBAC) feature of Amazon Redshift.
- B . Use the row-level security (RLS) feature of Amazon Redshift.
- C . Use the column-level security (CLS) feature of Amazon Redshift.
- D . Use dynamic data masking policies in Amazon Redshift.
A
Explanation:
Amazon Redshift supports Role-Based Access Control (RBAC) to manage access to database objects. RBAC allows administrators to create roles for job functions and assign privileges at the schema, table, or column level based on data sensitivity and user roles.
“RBAC in Amazon Redshift helps manage permissions more efficiently at scale by assigning users to roles that reflect their job function. It simplifies user management and secures access based on job role and data sensitivity.”
– Ace the AWS Certified Data Engineer – Associate Certification – version 2 – apple.pdf
RBAC is preferred over RLS or CLS alone because it offers a more comprehensive and scalable solution across multiple users and permissions.
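As a hedged sketch, RBAC is defined with plain SQL, which can be run through the Redshift Data API; the cluster identifier, database, schema, role, and user names are placeholders.

```python
import boto3

rsd = boto3.client("redshift-data")

# Create a role for a job function, grant it object privileges that match the
# data sensitivity, and assign the role to a user.
statements = [
    "CREATE ROLE sales_analyst;",
    "GRANT USAGE ON SCHEMA sales TO ROLE sales_analyst;",
    "GRANT SELECT ON ALL TABLES IN SCHEMA sales TO ROLE sales_analyst;",
    "GRANT ROLE sales_analyst TO alice;",
]

for sql in statements:
    rsd.execute_statement(
        ClusterIdentifier="example-cluster",  # placeholder cluster name
        Database="dev",
        DbUser="admin",
        Sql=sql,
    )
```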
A data engineer is using the Apache Iceberg framework to build a data lake that contains 100 TB of data. The data engineer wants to run AWS Glue Apache Spark jobs that use the Iceberg framework.
Which combination of steps will meet these requirements? (Select TWO.)
- A . Create a key named --conf for an AWS Glue job. Set iceberg as a value for the --datalake-formats job parameter.
- B . Specify the path to a specific version of Iceberg by using the --extra-jars job parameter. Set iceberg as a value for the --datalake-formats job parameter.
- C . Set iceberg as a value for the --datalake-formats job parameter.
- D . Set the --enable-auto-scaling parameter to true.
- E . Add the --job-bookmark-option job-bookmark-enable parameter to an AWS Glue job.
A data engineer needs to run a data transformation job whenever a user adds a file to an Amazon S3 bucket. The job will run for less than 1 minute. The job must send the output through an email message to the data engineer. The data engineer expects users to add one file every hour of the day.
Which solution will meet these requirements in the MOST operationally efficient way?
- A . Create a small Amazon EC2 instance that polls the S3 bucket for new files. Run transformation code on a schedule to generate the output. Use operating system commands to send email messages.
- B . Run an Amazon Elastic Container Service (Amazon ECS) task to poll the S3 bucket for new files. Run transformation code on a schedule to generate the output. Use operating system commands to send email messages.
- C . Create an AWS Lambda function to transform the data. Use Amazon S3 Event Notifications to invoke the Lambda function when a new object is created. Publish the output to an Amazon Simple Notification Service (Amazon SNS) topic. Subscribe the data engineer’s email account to the topic.
- D . Deploy an Amazon EMR cluster. Use EMR File System (EMRFS) to access the files in the S3 bucket. Run transformation code on a schedule to generate the output to a second S3 bucket. Create an Amazon Simple Notification Service (Amazon SNS) topic. Configure Amazon S3 Event Notifications to notify the topic when a new object is created.
A data engineer is designing a log table for an application that requires continuous ingestion. The application must provide dependable API-based access to specific records from other applications. The application must handle more than 4,000 concurrent write operations and 6,500 read operations every second.
Which solution will meet these requirements?
- A . Create an Amazon Redshift table with the KEY distribution style. Use the Amazon Redshift Data API to perform all read and write operations.
- B . Store the log files in an Amazon S3 Standard bucket. Register the schema in AWS Glue Data Catalog. Create an external Redshift table that points to the AWS Glue schema. Use the table to perform Amazon Redshift Spectrum read operations.
- C . Create an Amazon Redshift table with the EVEN distribution style. Use the Amazon Redshift JDBC connector to establish a database connection. Use the database connection to perform all read and write operations.
- D . Create an Amazon DynamoDB table that has provisioned capacity to meet the application’s capacity needs. Use the DynamoDB table to perform all read and write operations by using DynamoDB APIs.
D
Explanation:
For low-latency, high-throughput workloads with API-based access and predictable reads/writes, Amazon DynamoDB is the optimal choice. It scales automatically, supports thousands of read/write operations per second, and offers fully managed API-driven access.
“Amazon DynamoDB provides consistent, single-digit millisecond latency for high-traffic applications and scales seamlessly to handle thousands of concurrent reads and writes.”
– Ace the AWS Certified Data Engineer – Associate Certification – version 2 – apple.pdf
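As a hedged sketch, the table below is provisioned for the stated workload and accessed through the DynamoDB API; the table name and key schema are placeholders, and the capacity figures assume writes of at most 1 KB and strongly consistent reads of at most 4 KB, so real sizing may differ.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Provision capacity for roughly 4,000 writes and 6,500 reads per second.
dynamodb.create_table(
    TableName="app-log",
    KeySchema=[{"AttributeName": "log_id", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "log_id", "AttributeType": "S"}],
    ProvisionedThroughput={"ReadCapacityUnits": 6500, "WriteCapacityUnits": 4000},
)
dynamodb.get_waiter("table_exists").wait(TableName="app-log")

# Other applications fetch specific records through API-based access.
item = dynamodb.get_item(
    TableName="app-log",
    Key={"log_id": {"S": "example-log-id"}},
)
```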
A company uses a variety of AWS and third-party data stores. The company wants to consolidate all the data into a central data warehouse to perform analytics. Users need fast response times for analytics queries.
The company uses Amazon QuickSight in direct query mode to visualize the data. Users normally run queries during a few hours each day with unpredictable spikes.
Which solution will meet these requirements with the LEAST operational overhead?
- A . Use Amazon Redshift Serverless to load all the data into Amazon Redshift managed storage (RMS).
- B . Use Amazon Athena to load all the data into Amazon S3 in Apache Parquet format.
- C . Use Amazon Redshift provisioned clusters to load all the data into Amazon Redshift managed storage (RMS).
- D . Use Amazon Aurora PostgreSQL to load all the data into Aurora.
A
Explanation:
Problem Analysis:
The company requires a centralized data warehouse for consolidating data from various sources.
They use Amazon QuickSight in direct query mode, necessitating fast response times for analytical queries.
Users query the data intermittently, with unpredictable spikes during the day.
Operational overhead should be minimal.
Key Considerations:
The solution must support fast, SQL-based analytics.
It must handle unpredictable spikes efficiently.
Must integrate seamlessly with QuickSight for direct querying.
Minimize operational complexity and scaling concerns.
Solution Analysis:
Option A: Amazon Redshift Serverless
Redshift Serverless eliminates the need for provisioning and managing clusters.
Automatically scales compute capacity up or down based on query demand.
Reduces operational overhead by handling performance optimization.
Fully integrates with Amazon QuickSight, ensuring low-latency analytics.
Reduces costs as it charges only for usage, making it ideal for workloads with intermittent spikes.
Option B: Amazon Athena with S3 (Apache Parquet)
Athena supports querying data directly from S3 in Parquet format.
While it’s cost-effective, performance depends on the size and complexity of the data. It is not optimized for high-speed analytics needed by QuickSight in direct query mode.
Option C: Amazon Redshift Provisioned Clusters
Requires manual cluster provisioning, scaling, and maintenance. Higher operational overhead compared to Redshift Serverless.
Option D: Amazon Aurora PostgreSQL
Aurora is optimized for transactional databases, not data warehousing or analytics. It does not meet the requirement for fast analytics queries.
Final Recommendation:
Amazon Redshift Serverless is the best choice for this use case because it provides fast analytics, integrates natively with QuickSight, and minimizes operational complexity while efficiently handling unpredictable spikes.
Amazon Redshift Serverless Overview
Amazon QuickSight and Redshift Integration
Athena vs. Redshift
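As a hedged sketch, a Redshift Serverless namespace and workgroup can be created with a few API calls and then queried through the Redshift Data API; the names, base capacity, and SQL below are placeholders.

```python
import boto3

rss = boto3.client("redshift-serverless")

# The namespace holds database objects; the workgroup provides compute that
# scales automatically with query demand, so no clusters are managed.
rss.create_namespace(namespaceName="analytics-ns", dbName="analytics")
rss.create_workgroup(
    workgroupName="analytics-wg",
    namespaceName="analytics-ns",
    baseCapacity=32,  # base Redshift Processing Units (RPUs)
)

# Analytics queries address the workgroup directly, with no cluster endpoint.
rsd = boto3.client("redshift-data")
rsd.execute_statement(
    WorkgroupName="analytics-wg",
    Database="analytics",
    Sql="SELECT region, SUM(sales) FROM daily_sales GROUP BY region;",
)
```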
A data engineer is launching an Amazon EMR cluster. The data that the data engineer needs to load into the new cluster is currently in an Amazon S3 bucket. The data engineer needs to ensure that data is encrypted both at rest and in transit.
The data that is in the S3 bucket is encrypted by an AWS Key Management Service (AWS KMS) key.
The data engineer has an Amazon S3 path that has a Privacy Enhanced Mail (PEM) file.
Which solution will meet these requirements?
- A . Create an Amazon EMR security configuration. Specify the appropriate AWS KMS key for at-rest encryption for the S3 bucket. Create a second security configuration. Specify the Amazon S3 path of the PEM file for in-transit encryption. Create the EMR cluster, and attach both security configurations to the cluster.
- B . Create an Amazon EMR security configuration. Specify the appropriate AWS KMS key for local disk encryption for the S3 bucket. Specify the Amazon S3 path of the PEM file for in-transit encryption. Use the security configuration during EMR cluster creation.
- C . Create an Amazon EMR security configuration. Specify the appropriate AWS KMS key for at-rest encryption for the S3 bucket. Specify the Amazon S3 path of the PEM file for in-transit encryption. Use the security configuration during EMR cluster creation.
- D . Create an Amazon EMR security configuration. Specify the appropriate AWS KMS key for at-rest encryption for the S3 bucket. Specify the Amazon S3 path of the PEM file for in-transit encryption. Create the EMR cluster, and attach the security configuration to the cluster.
C
Explanation:
The data engineer needs to ensure that the data in an Amazon EMR cluster is encrypted both at rest and in transit. The data in Amazon S3 is already encrypted using an AWS KMS key. To meet the requirements, the most suitable solution is to create an EMR security configuration that specifies the correct KMS key for at-rest encryption and use the PEM file for in-transit encryption.
Option C: Create an Amazon EMR security configuration. Specify the appropriate AWS KMS key for at-rest encryption for the S3 bucket. Specify the Amazon S3 path of the PEM file for in-transit encryption. Use the security configuration during EMR cluster creation. This option configures encryption for both data at rest (using KMS keys) and data in transit (using the PEM file for SSL/TLS encryption). This approach ensures that data is fully protected during storage and transfer.
Options A, B, and D either involve creating unnecessary additional security configurations or make inaccurate assumptions about the way encryption configurations are attached.
Reference: Amazon EMR Security Configuration
Amazon S3 Encryption
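As a hedged sketch, a single EMR security configuration can declare both at-rest encryption (SSE-KMS for EMRFS data in Amazon S3) and in-transit encryption (TLS certificates supplied as a zip of PEM files in S3); the KMS key ARN and S3 path below are placeholders.

```python
import json

import boto3

emr = boto3.client("emr")

security_configuration = {
    "EncryptionConfiguration": {
        "EnableAtRestEncryption": True,
        "EnableInTransitEncryption": True,
        "AtRestEncryptionConfiguration": {
            "S3EncryptionConfiguration": {
                "EncryptionMode": "SSE-KMS",
                "AwsKmsKey": "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
            }
        },
        "InTransitEncryptionConfiguration": {
            "TLSCertificateConfiguration": {
                "CertificateProviderType": "PEM",
                "S3Object": "s3://example-bucket/certs/my-certs.zip",
            }
        },
    }
}

emr.create_security_configuration(
    Name="emr-at-rest-and-in-transit",
    SecurityConfiguration=json.dumps(security_configuration),
)
# Reference the configuration when creating the cluster, for example with
# run_job_flow(..., SecurityConfiguration="emr-at-rest-and-in-transit").
```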
