Practice Free Amazon DEA-C01 Exam Online Questions – Page 4

Question #31

A company uses Amazon Redshift for its data warehouse. A data engineer must query a table named orders.complete_orders_history, which contains 100 columns. The query must return all columns except columns named company_id and unique_system_id.

Which Amazon Redshift SQL statement will meet this requirement?

A . SELECT * EXCLUDE company_id, unique_system_id FROM orders.complete_orders_history;
B . SELECT * NOT IN company_id, unique_system_id FROM orders.complete_orders_history;
C . SELECT * EXCEPT company_id, unique_system_id FROM orders.complete_orders_history;
D . SELECT * TRUNCATE company_id, unique_system_id FROM orders.complete_orders_history;

Correct Answer: A

Explanation:

Option A is correct because Amazon Redshift supports the EXCLUDE clause in the SELECT list for exactly this use case: returning all columns from a wide table except a small set of unwanted columns. AWS documentation states that “EXCLUDE column_list names the columns that are excluded from the query results” and notes that the option is helpful when only a subset of columns must be excluded from a wide table. That matches this question precisely because the table contains 100 columns and only company_id and unique_system_id need to be omitted.

Option B is invalid SQL syntax in Amazon Redshift because NOT IN is used in predicates, not in the select list to remove columns.

Option C is also incorrect because EXCEPT in Redshift is a set operator used between query result sets, not a select-list column exclusion feature. AWS documents EXCEPT alongside UNION and INTERSECT as result-set comparison operators, not as a way to omit columns from SELECT *.

Option D is not valid Redshift SQL syntax for column selection. Therefore, the correct and documented Redshift statement is SELECT * EXCLUDE company_id, unique_system_id FROM orders.complete_orders_history;
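As a rough illustration, the statement from option A can be assembled programmatically. The helper name build_exclude_query is hypothetical, and the sketch only builds the SQL text; it does not connect to Redshift or validate identifiers.

```python
def build_exclude_query(table: str, excluded: list[str]) -> str:
    """Return a Redshift SELECT that omits the given columns via EXCLUDE."""
    if not excluded:
        # With nothing to exclude, a plain SELECT * is equivalent.
        return f"SELECT * FROM {table};"
    return f"SELECT * EXCLUDE {', '.join(excluded)} FROM {table};"


query = build_exclude_query(
    "orders.complete_orders_history",
    ["company_id", "unique_system_id"],
)
print(query)
```

The printed statement is the same text as option A, so the remaining 98 columns never have to be listed by hand.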

Question #32

A company is building a data stream processing application. The application runs in an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. The application stores processed data in an Amazon DynamoDB table.

The company needs the application containers in the EKS cluster to have secure access to the DynamoDB table. The company does not want to embed AWS credentials in the containers.

Which solution will meet these requirements?

A . Store the AWS credentials in an Amazon S3 bucket. Grant the EKS containers access to the S3 bucket to retrieve the credentials.
B . Attach an IAM role to the EKS worker nodes. Grant the IAM role access to DynamoDB. Use the IAM role to set up IAM roles for service accounts (IRSA) functionality.
C . Create an IAM user that has an access key to access the DynamoDB table. Use environment variables in the EKS containers to store the IAM user access key data.
D . Create an IAM user that has an access key to access the DynamoDB table. Use Kubernetes secrets that are mounted in a volume of the EKS cluster nodes to store the user access key data.

Correct Answer: B

Explanation:

In this scenario, the company is using Amazon Elastic Kubernetes Service (EKS) and wants secure access to DynamoDB without embedding credentials inside the application containers. The best practice is to use IAM roles for service accounts (IRSA), which allows assigning IAM roles to Kubernetes service accounts. This lets the EKS pods assume specific IAM roles securely, without the need to store credentials in containers.

IAM Roles for Service Accounts (IRSA):

With IRSA, each pod in the EKS cluster can assume an IAM role that grants access to DynamoDB without needing to manage long-term credentials. The IAM role can be attached to the service account associated with the pod.

This ensures least privilege access, improving security by preventing credentials from being embedded in the containers.

Reference: IAM Roles for Service Accounts (IRSA)
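To make the IRSA mechanism more concrete, the sketch below builds the kind of trust policy an IAM role typically carries so that one Kubernetes service account can assume it through the cluster's OIDC provider. The account ID, OIDC provider path, namespace, and service account name are placeholders, not values from the question, and the function name is hypothetical.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build a trust policy that lets one service account assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                        # Restricts the role to a single namespace and service account.
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    }
                },
            }
        ],
    }


# Placeholder values, not taken from the question.
policy = irsa_trust_policy(
    account_id="111122223333",
    oidc_provider="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
    namespace="stream-processing",
    service_account="dynamodb-writer",
)
print(json.dumps(policy, indent=2))
```

The service account is then annotated with the role ARN (the eks.amazonaws.com/role-arn annotation), and a separate permissions policy on the role grants the DynamoDB actions the pods need.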

Alternatives Considered:

A (Storing AWS credentials in S3): Storing AWS credentials in S3 and retrieving them introduces security risks and violates the principle of not embedding credentials.

C (IAM user access keys in environment variables): This also embeds credentials, which is not recommended.

D (Kubernetes secrets): Storing user access keys as secrets is an option, but it still involves handling long-term credentials manually, which is less secure than using IRSA.

Reference: IAM Best Practices for Amazon EKS

Secure Access to DynamoDB from EKS

Question #33

A company aggregates high-frequency sensor telemetry into an Amazon S3 data lake. Each sensor stream emits structured records every hour. The records include metadata such as sensor category, unit ID, operational state, event timestamp, and site location. The data scales up to millions of records each day. The company runs complex queries each day to uncover performance insights specific to sensor categories.

Which solution will meet these requirements with the FASTEST query execution time?

A . Persist the data in Apache ORC format. Partition the data by date. Sort the data by sensor category.
B . Persist the data in CSV format. Partition the data by date. Sort the data by operational status.
C . Persist the data in Parquet format. Partition the data by sensor category. Sort the data by date.
D . Persist the data in CSV format. Partition the data by date. Sort the data by sensor category.

Correct Answer: C

Explanation:

Option C is correct because the fastest design combines a columnar storage format with partitioning on the most common query predicate. AWS Athena guidance says that Parquet and ORC are optimized columnar formats and that columns frequently used as filters are good candidates for partitioning. AWS also states that when a query filters on a partition key, the engine reads only matching partitions, which significantly reduces data scanned. Since the company’s main analytical need is insights specific to sensor categories, partitioning by sensor category provides the strongest pruning. Sorting by date then helps organize time-based data within each category partition.

Options B and D use CSV, which is row-based and much slower for analytical scans than Parquet or ORC.

Option A uses a good format, but partitioning by date is less optimal than partitioning by sensor category when category is the main filter in the company’s queries. The study guide also identifies Parquet as the preferred analytic storage format for efficient columnar queries.
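The pruning effect can be sketched without any AWS calls. The example assumes a Hive-style key layout (telemetry/sensor_category=<value>/...), which is a common convention and not something the question specifies; the prefix, file naming, and function names are all hypothetical.

```python
from datetime import date


def object_key(sensor_category: str, event_date: date, part: int) -> str:
    """Build a Hive-style key partitioned by sensor category."""
    return (
        f"telemetry/sensor_category={sensor_category}/"
        f"{event_date.isoformat()}-part-{part:04d}.parquet"
    )


def prune(keys: list[str], sensor_category: str) -> list[str]:
    """Keep only keys under the requested category partition."""
    prefix = f"telemetry/sensor_category={sensor_category}/"
    return [k for k in keys if k.startswith(prefix)]


keys = [
    object_key("temperature", date(2024, 5, 1), 0),
    object_key("temperature", date(2024, 5, 2), 0),
    object_key("pressure", date(2024, 5, 1), 0),
    object_key("vibration", date(2024, 5, 1), 0),
]
matching = prune(keys, "temperature")
print(matching)
```

A query that filters on one sensor category only needs the objects under that category's prefix; the other partitions are never read, which is the behavior the explanation attributes to partition pruning.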

Question #34

A company builds a new data pipeline to process data for business intelligence reports. Users have noticed that data is missing from the reports.

A data engineer needs to add a data quality check for columns that contain null values and for referential integrity at a stage before the data is added to storage.

Which solution will meet these requirements with the LEAST operational overhead?

A . Use Amazon SageMaker Data Wrangler to create a Data Quality and Insights report.
B . Use AWS Glue ETL jobs to perform a data quality evaluation transform on the data. Use an IsComplete rule on the requested columns. Use a Referential Integrity rule for each join.
C . Use AWS Glue ETL jobs to perform a SQL transform on the data to determine whether requested columns contain null values. Use a second SQL transform to check referential integrity.
D . Use Amazon SageMaker Data Wrangler and a custom Python transform to create custom rules to check for null values and referential integrity.

Correct Answer: B

Explanation:

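AWS Glue Data Quality transforms allow you to define built-in rules such as IsComplete for null validation and ReferentialIntegrity for relationship validation, all with minimal code and operational overhead. Options C and D require custom SQL or Python that must be written and maintained, and option A produces a report rather than a check that runs inside the pipeline.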

“Use AWS Glue Data Quality rules such as Is Complete and Referential Integrity within ETL jobs to automatically validate incoming data.”
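A ruleset for the evaluation transform is written in Data Quality Definition Language (DQDL). The sketch below only assembles the ruleset text. The column names and the reference dataset alias (customers) are hypothetical, and the exact ReferentialIntegrity argument form should be checked against the DQDL reference before use.

```python
def build_ruleset(required_columns, references):
    """Return a DQDL ruleset string.

    references maps a primary-dataset column to a (reference_alias, column) pair.
    """
    # One completeness rule per column that must not contain nulls.
    rules = [f'IsComplete "{col}"' for col in required_columns]
    # One referential integrity rule per join; 1.0 means every value must match.
    for primary_col, (alias, ref_col) in references.items():
        rules.append(
            f'ReferentialIntegrity "{primary_col}" "{alias}.{{{ref_col}}}" = 1.0'
        )
    return "Rules = [\n    " + ",\n    ".join(rules) + "\n]"


ruleset = build_ruleset(
    required_columns=["order_id", "customer_id"],
    references={"customer_id": ("customers", "customer_id")},
)
print(ruleset)
```

The resulting string would be supplied as the ruleset of the Evaluate Data Quality transform in the Glue job, with the reference dataset registered under the same alias.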

Question #35

A data engineer needs to use Amazon Neptune to develop graph applications.

Which programming languages should the engineer use to develop the graph applications? (Select TWO.)

A . Gremlin
B . SQL
C . ANSI SQL
D . SPARQL
E . Spark SQL

Correct Answer: A,D

Explanation:

Amazon Neptune supports graph applications using Gremlin and SPARQL as query languages. Neptune is a fully managed graph database service that supports both property graph and RDF graph models.

Option A: Gremlin. Gremlin is a query language for property graph databases, which is supported by Amazon Neptune. It allows the traversal and manipulation of graph data in the property graph model.

Option D: SPARQL. SPARQL is a query language for querying RDF graph data in Neptune. It is used to query, manipulate, and retrieve information stored in RDF format.

Other options:

SQL (Option B) and ANSI SQL (Option C) are traditional relational database query languages and are not used for graph databases.

Spark SQL (Option E) is related to Apache Spark for big data processing, not for querying graph databases.

Reference: Amazon Neptune Documentation

Gremlin Documentation

SPARQL Documentation

Question #36

A data engineer maintains custom Python scripts that perform a data formatting process that many AWS Lambda functions use. When the data engineer needs to modify the Python scripts, the data engineer must manually update all the Lambda functions.

The data engineer requires a less manual way to update the Lambda functions.

Which solution will meet this requirement?

A . Store a pointer to the custom Python scripts in the execution context object in a shared Amazon S3 bucket.
B . Package the custom Python scripts into Lambda layers. Apply the Lambda layers to the Lambda functions.
C . Store a pointer to the custom Python scripts in environment variables in a shared Amazon S3 bucket.
D . Assign the same alias to each Lambda function. Call each Lambda function by specifying the function’s alias.

Correct Answer: B

Explanation:

Lambda layers are a way to share code and dependencies across multiple Lambda functions. By packaging the custom Python scripts into a Lambda layer, the data engineer maintains the scripts in one place instead of editing the code of every function. Layer versions are immutable, so each change is published as a new layer version and the functions are then pointed at that version, a step that is easy to script. This reduces the manual effort and ensures consistency across the Lambda functions.

The other options are either not feasible or not efficient. Storing a pointer to the custom Python scripts in the execution context object or in environment variables would require the Lambda functions to download the scripts from Amazon S3 every time they are invoked, which would increase latency and cost. Assigning the same alias to each Lambda function would not help with updating the Python scripts, as an alias only points to a specific version of a single Lambda function.

Reference: AWS Lambda layers

AWS Certified Data Engineer – Associate DEA-C01 Complete Study Guide, Chapter 3: Data Ingestion and Transformation, Section 3.4: AWS Lambda
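For Python runtimes, a layer archive places importable modules under a top-level python/ directory. The following sketch builds such an archive in memory; the module name formatting.py and its normalize function are hypothetical stand-ins for the custom scripts, and publishing the archive is left out.

```python
import io
import zipfile

# Hypothetical shared module; stands in for the custom formatting scripts.
FORMATTING_SOURCE = '''\
def normalize(record):
    """Trim whitespace and lowercase keys."""
    return {k.strip().lower(): v for k, v in record.items()}
'''


def build_layer_zip(modules: dict[str, str]) -> bytes:
    """Zip modules under python/ so the Lambda Python runtime can import them."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for filename, source in modules.items():
            archive.writestr(f"python/{filename}", source)
    return buffer.getvalue()


layer_bytes = build_layer_zip({"formatting.py": FORMATTING_SOURCE})
with zipfile.ZipFile(io.BytesIO(layer_bytes)) as archive:
    print(archive.namelist())
```

The archive would then be published as a layer version and referenced from each function's configuration, after which the functions can import formatting like any other module.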

Question #37

A company uses an Amazon Redshift cluster that runs on RA3 nodes. The company wants to scale read and write capacity to meet demand. A data engineer needs to identify a solution that will turn on concurrency scaling.

Which solution will meet this requirement?

A . Turn on concurrency scaling in workload management (WLM) for Redshift Serverless workgroups.
B . Turn on concurrency scaling at the workload management (WLM) queue level in the Redshift cluster.
C . Turn on concurrency scaling in the settings during the creation of any new Redshift cluster.
D . Turn on concurrency scaling for the daily usage quota for the Redshift cluster.


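Correct Answer: B

Explanation:

On a provisioned Redshift cluster, concurrency scaling is turned on per workload management (WLM) queue by setting the queue's concurrency scaling mode to auto. Option A does not apply because the company runs a provisioned cluster on RA3 nodes, not Redshift Serverless.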

Question #38

A data engineer needs to create an AWS Lambda function that converts the format of data from .csv to Apache Parquet. The Lambda function must run only if a user uploads a .csv file to an Amazon S3 bucket.

Which solution will meet these requirements with the LEAST operational overhead?

A . Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
B . Create an S3 event notification that has an event type of s3:ObjectTagging:* for objects that have a tag set to .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
C . Create an S3 event notification that has an event type of s3:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
D . Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set an Amazon Simple Notification Service (Amazon SNS) topic as the destination for the event notification. Subscribe the Lambda function to the SNS topic.

Correct Answer: A

Explanation:

Option A is the correct answer because it meets the requirements with the least operational overhead. Creating an S3 event notification that has an event type of s3:ObjectCreated:* will trigger the Lambda function whenever a new object is created in the S3 bucket. Using a filter rule to generate notifications only when the suffix includes .csv will ensure that the Lambda function only runs for .csv files. Setting the ARN of the Lambda function as the destination for the event notification will directly invoke the Lambda function without any additional steps.

Option B is incorrect because it requires the user to tag the objects with .csv, which adds an extra step and increases the operational overhead.

Option C is incorrect because it uses an event type of s3:*, which will trigger the Lambda function for any S3 event, not just object creation. This could result in unnecessary invocations and increased costs.

Option D is incorrect because it involves creating and subscribing to an SNS topic, which adds an extra layer of complexity and operational overhead.
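The configuration described in option A can be sketched as the notification document that S3 accepts for a bucket. The Lambda ARN below is a placeholder, and would_notify is only a local approximation of the suffix filter for illustration; S3 performs the real matching.

```python
import json


def csv_upload_notification(lambda_arn: str) -> dict:
    """Notification configuration that invokes a function for new .csv objects."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}
                },
            }
        ]
    }


def would_notify(config: dict, object_key: str) -> bool:
    """Local approximation of the suffix filter, for illustration only."""
    rules = config["LambdaFunctionConfigurations"][0]["Filter"]["Key"]["FilterRules"]
    return all(
        object_key.endswith(r["Value"]) for r in rules if r["Name"] == "suffix"
    )


# Placeholder ARN, not taken from the question.
config = csv_upload_notification(
    "arn:aws:lambda:us-east-1:111122223333:function:csv-to-parquet"
)
print(json.dumps(config, indent=2))
print(would_notify(config, "uploads/report.csv"))
print(would_notify(config, "uploads/report.json"))
```

In addition to this configuration, the function needs a resource-based permission that allows Amazon S3 to invoke it.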

AWS Certified Data Engineer – Associate DEA-C01 Complete Study Guide, Chapter 3: Data Ingestion and Transformation, Section 3.2: S3 Event Notifications and Lambda Functions, Pages 67-69

Building Batch Data Analytics Solutions on AWS, Module 4: Data Transformation, Lesson 4.2: AWS Lambda, Pages 4-8

AWS Documentation Overview, AWS Lambda Developer Guide, Working with AWS Lambda Functions, Configuring Function Triggers, Using AWS Lambda with Amazon S3, Pages 1-5

Question #39

A company wants to combine data from multiple software as a service (SaaS) applications for analysis.

A data engineering team needs to use Amazon QuickSight to perform the analysis and build dashboards. A data engineer needs to extract the data from the SaaS applications and make the data available for QuickSight queries.

Which solution will meet these requirements in the MOST operationally efficient way?

A . Create AWS Lambda functions that call the required APIs to extract the data from the applications. Store the data in an Amazon S3 bucket. Use AWS Glue to catalog the data in the S3 bucket. Create a data source and a dataset in QuickSight.
B . Use AWS Lambda functions as Amazon Athena data source connectors to run federated queries against the SaaS applications. Create an Athena data source and a dataset in QuickSight.
C . Use Amazon AppFlow to create a flow for each SaaS application. Set an Amazon S3 bucket as the destination. Schedule the flows to extract the data to the bucket. Use AWS Glue to catalog the data in the S3 bucket. Create a data source and a dataset in QuickSight.
D . Export the data from the SaaS applications as Microsoft Excel files. Create a data source and a dataset in QuickSight by uploading the Excel files.


Correct Answer: C

Question #40

A company uses an on-premises Microsoft SQL Server database to store financial transaction data. The company migrates the transaction data from the on-premises database to AWS at the end of each month. The company has noticed that the cost to migrate data from the on-premises database to an Amazon RDS for SQL Server database has increased recently.

The company requires a cost-effective solution to migrate the data to AWS. The solution must cause minimal downtime for the applications that access the database.

Which AWS service should the company use to meet these requirements?

A . AWS Lambda
B . AWS Database Migration Service (AWS DMS)
C . AWS Direct Connect
D . AWS DataSync

Correct Answer: B

Explanation:

AWS Database Migration Service (AWS DMS) is a cloud service that makes it possible to migrate relational databases, data warehouses, NoSQL databases, and other types of data stores to AWS quickly, securely, and with minimal downtime and zero data loss. AWS DMS supports migration between 20-plus database and analytics engines, such as Microsoft SQL Server to Amazon RDS for SQL Server. AWS DMS takes over many of the difficult or tedious tasks involved in a migration project, such as capacity analysis, hardware and software procurement, installation and administration, testing and debugging, and ongoing replication and monitoring. AWS DMS is a cost-effective solution, as you only pay for the compute resources and additional log storage used during the migration process. AWS DMS is the best solution for the company to migrate the financial transaction data from the on-premises Microsoft SQL Server database to AWS, as it meets the requirements of minimal downtime, zero data loss, and low cost.

Option A is not the best solution, as AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers, but it does not provide any built-in features for database migration. You would have to write your own code to extract, transform, and load the data from the source to the target, which would increase the operational overhead and complexity.

Option C is not the best solution, as AWS Direct Connect is a service that establishes a dedicated network connection from your premises to AWS, but it does not provide any built-in features for database migration. You would still need to use another service or tool to perform the actual data transfer, which would increase the cost and complexity.

Option D is not the best solution, as AWS DataSync is a service that makes it easy to transfer data between on-premises storage systems and AWS storage services, such as Amazon S3, Amazon EFS, and Amazon FSx for Windows File Server, but it does not support Amazon RDS for SQL Server as a target. You would have to use another service or tool to migrate the data from Amazon S3 to Amazon RDS for SQL Server, which would increase the latency and complexity.
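A DMS replication task is driven by a table-mapping document and a migration type. The sketch below builds a minimal mapping with one selection rule; the schema name dbo and the wildcard table pattern are assumptions for illustration, and the helper name is hypothetical. The full-load-and-cdc migration type is the one usually associated with minimal downtime, because ongoing changes are replicated after the initial load.

```python
import json


def selection_rule(rule_id: int, schema: str, table: str) -> dict:
    """One DMS table-mapping selection rule that includes matching tables."""
    return {
        "rule-type": "selection",
        "rule-id": str(rule_id),
        "rule-name": f"include-{schema}-{table}".replace("%", "all"),
        "object-locator": {"schema-name": schema, "table-name": table},
        "rule-action": "include",
    }


# Assumed schema and table pattern; "%" matches every table in the schema.
table_mappings = {"rules": [selection_rule(1, "dbo", "%")]}
migration_type = "full-load-and-cdc"
print(json.dumps(table_mappings, indent=2))
print(migration_type)
```

The mapping and migration type would be passed to the replication task together with the source and target endpoints, which are not shown here.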

Reference: Database Migration – AWS Database Migration Service – AWS

What is AWS Database Migration Service?

AWS Database Migration Service Documentation

AWS Certified Data Engineer – Associate DEA-C01 Complete Study Guide
