Practice Free DAS-C01 Exam Online Questions
A bank is using Amazon Managed Streaming for Apache Kafka (Amazon MSK) to populate real-time data into a data lake. The data lake is built on Amazon S3, and data must be accessible from the data lake within 24 hours. Different microservices produce messages to different topics in the cluster. The cluster was created with 8 TB of Amazon Elastic Block Store (Amazon EBS) storage and a retention period of 7 days.
The customer transaction volume has tripled recently, and disk monitoring has raised an alert that the cluster is almost out of storage capacity.
What should a data analytics specialist do to prevent the cluster from running out of disk space?
- A . Use the Amazon MSK console to triple the broker storage and restart the cluster.
- B . Create an Amazon CloudWatch alarm that monitors the KafkaDataLogsDiskUsed metric. Automatically flush the oldest messages when the value of this metric exceeds 85%.
- C . Create a custom Amazon MSK configuration. Set the log.retention.hours parameter to 48. Update the cluster with the new configuration file.
- D . Triple the number of consumers to ensure that data is consumed as soon as it is added to a topic.
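As a rough illustration, here is a minimal sketch (boto3, with placeholder ARNs and names) of how the approach in option C maps to the Amazon MSK API: create a custom configuration that lowers log.retention.hours to 48 and apply it to the existing cluster.

```python
import boto3

kafka = boto3.client("kafka")

# Custom configuration: lower topic retention from 7 days to 48 hours.
config = kafka.create_configuration(
    Name="reduced-retention",                     # hypothetical configuration name
    Description="Retain topic data for 48 hours",
    ServerProperties=b"log.retention.hours=48\n",
)

cluster_arn = "arn:aws:kafka:us-east-1:111122223333:cluster/example/uuid"  # placeholder
current_version = kafka.describe_cluster(ClusterArn=cluster_arn)["ClusterInfo"]["CurrentVersion"]

# Apply the new configuration revision to the running cluster.
kafka.update_cluster_configuration(
    ClusterArn=cluster_arn,
    ConfigurationInfo={
        "Arn": config["Arn"],
        "Revision": config["LatestRevision"]["Revision"],
    },
    CurrentVersion=current_version,
)
```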
A social media company is using business intelligence tools to analyze data for forecasting. The company is using Apache Kafka to ingest data. The company wants to build dynamic dashboards that include machine learning (ML) insights to forecast key business trends.
The dashboards must show recent batched data that is not more than 75 minutes old. Various teams at the company want to view the dashboards by using Amazon QuickSight with ML insights.
Which solution will meet these requirements?
- A . Replace Kafka with Amazon Managed Streaming for Apache Kafka (Amazon MSK). Use AWS Data Exchange to store the data in Amazon S3. Use SPICE in QuickSight Enterprise edition to refresh the data from Amazon S3 each hour. Use QuickSight to create a dynamic dashboard that includes forecasting and ML insights.
- B . Replace Kafka with an Amazon Kinesis data stream. Use AWS Data Exchange to store the data in Amazon S3. Use SPICE in QuickSight Standard edition to refresh the data from Amazon S3 each hour. Use QuickSight to create a dynamic dashboard that includes forecasting and ML insights.
- C . Configure the Kafka-Kinesis-Connector to publish the data to an Amazon Kinesis Data Firehose delivery stream. Configure the delivery stream to store the data in Amazon S3 with a max buffer size of 60 seconds. Use SPICE in QuickSight Enterprise edition to refresh the data from Amazon S3 each hour. Use QuickSight to create a dynamic dashboard that includes forecasting and ML insights.
- D . Configure the Kafka-Kinesis-Connector to publish the data to an Amazon Kinesis Data Firehose delivery stream. Configure the delivery stream to store the data in Amazon S3 with a max buffer size of 60 seconds. Refresh the data in QuickSight Standard edition SPICE from Amazon S3 by using a scheduled AWS Lambda function. Configure the Lambda function to run every 75 minutes and to invoke the QuickSight API to create a dynamic dashboard that includes forecasting and ML insights.
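For reference, a minimal sketch (boto3, placeholder ARNs and names) of the Kinesis Data Firehose delivery stream that options C and D describe: records are buffered for at most 60 seconds before being written to Amazon S3, which keeps the data well inside the 75-minute freshness requirement.

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="kafka-to-s3",            # hypothetical stream name
    DeliveryStreamType="DirectPut",              # the Kafka-Kinesis-Connector publishes records directly
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-s3-role",  # placeholder
        "BucketARN": "arn:aws:s3:::analytics-landing-bucket",          # placeholder
        "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 5},   # flush at least every 60 seconds
        "Prefix": "kafka/",
    },
)
```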
A company’s data science team is designing a shared dataset repository on a Windows server. The data repository will store a large amount of training data that the data science team commonly uses in its machine learning models. The data scientists create a random number of new datasets each day.
The company needs a solution that provides persistent, scalable file storage and high levels of throughput and IOPS. The solution also must be highly available and must integrate with Active Directory for access control.
Which solution will meet these requirements with the LEAST development effort?
- A . Store datasets as files in an Amazon EMR cluster. Set the Active Directory domain for authentication.
- B . Store datasets as files in Amazon FSx for Windows File Server. Set the Active Directory domain for authentication.
- C . Store datasets as tables in a multi-node Amazon Redshift cluster. Set the Active Directory domain for authentication.
- D . Store datasets as global tables in Amazon DynamoDB. Build an application to integrate authentication with the Active Directory domain.
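To illustrate option B, a minimal sketch (boto3, placeholder subnet and directory IDs; storage and throughput values are assumptions) of a Multi-AZ Amazon FSx for Windows File Server file system joined to an AWS Managed Microsoft AD directory for access control:

```python
import boto3

fsx = boto3.client("fsx")

fsx.create_file_system(
    FileSystemType="WINDOWS",
    StorageCapacity=2048,                                # GiB; sized for the training datasets (assumption)
    StorageType="SSD",
    SubnetIds=["subnet-aaaa1111", "subnet-bbbb2222"],    # placeholders, two AZs
    WindowsConfiguration={
        "ActiveDirectoryId": "d-1234567890",             # placeholder AWS Managed Microsoft AD
        "DeploymentType": "MULTI_AZ_1",                  # highly available across two AZs
        "PreferredSubnetId": "subnet-aaaa1111",
        "ThroughputCapacity": 512,                       # MB/s; high-throughput tier (assumption)
    },
)
```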
A company wants to provide its data analysts with uninterrupted access to the data in its Amazon Redshift cluster. All data is streamed to an Amazon S3 bucket with Amazon Kinesis Data Firehose. An AWS Glue job that is scheduled to run every 5 minutes issues a COPY command to move the data into Amazon Redshift.
The amount of data delivered is uneven throughout the day, and cluster utilization is high during certain periods. The COPY command usually completes within a couple of seconds. However, when a load spike occurs, locks can exist and data can be missed. Currently, the AWS Glue job is configured to run without retries, with a timeout of 5 minutes, and with a concurrency of 1.
How should a data analytics specialist configure the AWS Glue job to optimize fault tolerance and improve data availability in the Amazon Redshift cluster?
- A . Increase the number of retries. Decrease the timeout value. Increase the job concurrency.
- B . Keep the number of retries at 0. Decrease the timeout value. Increase the job concurrency.
- C . Keep the number of retries at 0. Decrease the timeout value. Keep the job concurrency at 1.
- D . Keep the number of retries at 0. Increase the timeout value. Keep the job concurrency at 1.
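The three properties the options adjust map directly to the Glue UpdateJob API. A minimal sketch (boto3, placeholder job name, role, and script; the values shown are illustrative, not the answer) follows; note that UpdateJob replaces the job definition, so the existing role, command, and any other settings must be re-specified.

```python
import boto3

glue = boto3.client("glue")

glue.update_job(
    JobName="redshift-copy-job",                                        # hypothetical job name
    JobUpdate={
        "Role": "arn:aws:iam::111122223333:role/glue-redshift-role",    # placeholder existing role
        "Command": {"Name": "glueetl", "ScriptLocation": "s3://scripts/copy_to_redshift.py"},  # placeholder
        "MaxRetries": 3,                                 # retry a COPY blocked by a transient lock
        "Timeout": 3,                                    # minutes
        "ExecutionProperty": {"MaxConcurrentRuns": 3},   # allow runs to overlap during load spikes
    },
)
```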
A company has a fitness tracker application that generates data from subscribers. The company needs real-time reporting on this data. The data is sent immediately, and the processing latency must be less than 1 second. The company wants to perform anomaly detection on the data as the data is collected. The company also requires a solution that minimizes operational overhead.
Which solution meets these requirements?
- A . Amazon EMR cluster with Apache Spark streaming, Spark SQL, and Spark’s machine learning library (MLlib)
- B . Amazon Kinesis Data Firehose with Amazon S3 and Amazon Athena
- C . Amazon Kinesis Data Firehose with Amazon QuickSight
- D . Amazon Kinesis Data Streams with Amazon Kinesis Data Analytics
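For reference, a minimal sketch (boto3, hypothetical stream name and payload) of the ingestion side of option D: events are written to a Kinesis data stream with sub-second latency, and a Kinesis Data Analytics application (for example, one using the built-in RANDOM_CUT_FOREST SQL function) can consume the stream to detect anomalies as records arrive.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Hypothetical stream for fitness tracker events; wait until it is active before writing.
kinesis.create_stream(StreamName="fitness-events", ShardCount=2)
kinesis.get_waiter("stream_exists").wait(StreamName="fitness-events")

event = {"subscriber_id": "u-1001", "heart_rate": 182, "steps": 47}  # sample payload
kinesis.put_record(
    StreamName="fitness-events",
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["subscriber_id"],   # spread subscribers across shards
)
```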
A company analyzes historical data and needs to query data that is stored in Amazon S3. New data is generated daily as .csv files that are stored in Amazon S3. The company’s data analysts are using Amazon Athena to perform SQL queries against a recent subset of the overall data.
The amount of data that is ingested into Amazon S3 has increased to 5 PB over time. The query latency also has increased. The company needs to segment the data to reduce the amount of data that is scanned.
Which solutions will improve query performance? (Select TWO.)
- A . Use MySQL Workbench on an Amazon EC2 instance. Connect to Athena by using a JDBC connector. Run the query from MySQL Workbench instead of Athena directly.
- B . Configure Athena to use S3 Select to load only the files of the data subset.
- C . Create the data subset in Apache Parquet format each day by using the Athena CREATE TABLE AS SELECT (CTAS) statement. Query the Parquet data.
- D . Run a daily AWS Glue ETL job to convert the data files to Apache Parquet format and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data each day.
- E . Create an S3 gateway endpoint. Configure VPC routing to access Amazon S3 through the gateway endpoint.
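To illustrate the daily CTAS approach in option C, a minimal sketch (boto3, placeholder database, table, and bucket names) that rewrites the recent subset as partitioned Parquet so analyst queries scan far less data than the raw .csv files:

```python
import boto3

athena = boto3.client("athena")

# Hypothetical CTAS statement; the partition column must be the last column selected.
ctas = """
CREATE TABLE analytics.daily_subset
WITH (
    format = 'PARQUET',
    external_location = 's3://analytics-curated/daily_subset/',   -- placeholder bucket
    partitioned_by = ARRAY['event_date']
) AS
SELECT *, date(ingest_time) AS event_date
FROM analytics.raw_csv_events
WHERE ingest_time >= current_date - interval '1' day
"""

athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "analytics"},                      # placeholder database
    ResultConfiguration={"OutputLocation": "s3://athena-query-results/"}, # placeholder bucket
)
```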
A data analyst runs a large number of data manipulation language (DML) queries by using Amazon Athena with the JDBC driver. Recently, a query failed after it ran for 30 minutes. The query returned the following message:
java.sql.SQLException: Query timeout
The data analyst does not immediately need the query results. However, the data analyst needs a long-term solution for this problem.
Which solution will meet these requirements?
- A . Split the query into smaller queries to search smaller subsets of data.
- B . In the settings for Athena, adjust the DML query timeout limit.
- C . In the Service Quotas console, request an increase for the DML query timeout.
- D . Save the tables as compressed .csv files.
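For reference, a minimal sketch (boto3) of the Service Quotas approach in option C. Matching the quota by the name "DML query timeout" is an assumption here; verify the exact quota name and code for Athena in your account before requesting the increase.

```python
import boto3

quotas = boto3.client("service-quotas")

# Find the Athena quota whose name mentions the DML query timeout (assumed name match).
paginator = quotas.get_paginator("list_service_quotas")
dml_quota = next(
    (
        quota
        for page in paginator.paginate(ServiceCode="athena")
        for quota in page["Quotas"]
        if "DML query timeout" in quota["QuotaName"]
    ),
    None,
)

if dml_quota is not None:
    quotas.request_service_quota_increase(
        ServiceCode="athena",
        QuotaCode=dml_quota["QuotaCode"],
        DesiredValue=60.0,   # minutes; longer than the current 30-minute limit (illustrative)
    )
```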
A media company wants to perform machine learning and analytics on the data residing in its Amazon S3 data lake.
There are two data transformation requirements that will enable the consumers within the company to create reports:
Daily transformations of 300 GB of data with different file formats landing in Amazon S3 at a scheduled time.
One-time transformations of terabytes of archived data residing in the S3 data lake.
Which combination of solutions cost-effectively meets the company’s requirements for transforming the data? (Choose three.)
- A . For daily incoming data, use AWS Glue crawlers to scan and identify the schema.
- B . For daily incoming data, use Amazon Athena to scan and identify the schema.
- C . For daily incoming data, use Amazon Redshift to perform transformations.
- D . For daily incoming data, use AWS Glue workflows with AWS Glue jobs to perform transformations.
- E . For archived data, use Amazon EMR to perform data transformations.
- F . For archived data, use Amazon SageMaker to perform data transformations.
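As a rough illustration of the crawler piece in options A and D, a minimal sketch (boto3, placeholder names, role, and path) of a scheduled AWS Glue crawler that discovers the schema of each day's landing files so that Glue workflow jobs can transform them:

```python
import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="daily-landing-crawler",                                        # hypothetical name
    Role="arn:aws:iam::111122223333:role/glue-crawler-role",             # placeholder role
    DatabaseName="media_landing",                                        # placeholder Glue database
    Targets={"S3Targets": [{"Path": "s3://media-data-lake/landing/"}]},  # placeholder path
    Schedule="cron(0 2 * * ? *)",   # run daily, after the scheduled data arrival
)
```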
A company has a mobile app that has millions of users. The company wants to enhance the mobile app by including interactive data visualizations that show user trends.
The data for visualization is stored in a large data lake with 50 million rows. Data that is used in the visualization should be no more than two hours old.
Which solution will meet these requirements with the LEAST operational overhead?
- A . Run an hourly batch process that renders user-specific data visualizations as static images that are stored in Amazon S3.
- B . Precompute aggregated data hourly. Store the data in Amazon DynamoDB. Render the data by using the D3.js JavaScript library.
- C . Embed an Amazon QuickSight Enterprise edition dashboard into the mobile app by using the QuickSight Embedding SDK. Refresh data in SPICE hourly.
- D . Run Amazon Athena queries behind an Amazon API Gateway API. Render the data by using the D3.js JavaScript library.
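For reference, a minimal sketch (boto3, placeholder account, user, and dashboard IDs) of the embedding piece of option C: generate an embed URL for a registered user so the mobile app can render the dashboard with the QuickSight Embedding SDK. The hourly SPICE refresh is scheduled separately in QuickSight.

```python
import boto3

quicksight = boto3.client("quicksight")

response = quicksight.generate_embed_url_for_registered_user(
    AwsAccountId="111122223333",                                                # placeholder
    UserArn="arn:aws:quicksight:us-east-1:111122223333:user/default/app-user",  # placeholder
    ExperienceConfiguration={
        "Dashboard": {"InitialDashboardId": "a1b2c3d4-example-dashboard-id"}    # placeholder
    },
    SessionLifetimeInMinutes=60,
)

embed_url = response["EmbedUrl"]   # passed to the QuickSight Embedding SDK in the app
```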
A company using Amazon QuickSight Enterprise edition has thousands of dashboards, analyses, and datasets. The company struggles to manage and assign permissions for granting users access to various items within QuickSight. The company wants to make it easier to implement sharing and permissions management.
Which solution should the company implement to simplify permissions management?
- A . Use QuickSight folders to organize dashboards, analyses, and datasets. Assign individual users permissions to these folders.
- B . Use QuickSight folders to organize dashboards, analyses, and datasets. Assign group permissions by using these folders.
- C . Use AWS IAM resource-based policies to assign group permissions to QuickSight items.
- D . Use QuickSight user management APIs to provision group permissions based on dashboard naming conventions
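To illustrate the folder-based sharing in option B, a minimal sketch (boto3, placeholder account ID, folder ID, and group ARN) of a shared folder whose permissions are granted to a QuickSight group; the single DescribeFolder action shown is an assumed read-only grant, and assets placed in the folder become visible to everyone in the group.

```python
import boto3

quicksight = boto3.client("quicksight")

group_arn = "arn:aws:quicksight:us-east-1:111122223333:group/default/data-analysts"  # placeholder

quicksight.create_folder(
    AwsAccountId="111122223333",        # placeholder
    FolderId="marketing-dashboards",    # hypothetical folder ID
    Name="Marketing dashboards",
    FolderType="SHARED",
    Permissions=[
        # Grant the whole group read access to the folder and its contents (assumed action set).
        {"Principal": group_arn, "Actions": ["quicksight:DescribeFolder"]},
    ],
)
```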