Practice Free MLS-C01 Exam Online Questions
A growing company has a business-critical key performance indicator (KPI) for the uptime of a machine learning (ML) recommendation system. The company is using Amazon SageMaker hosting services to develop a recommendation model in a single Availability Zone within an AWS Region. A machine learning (ML) specialist must develop a solution to achieve high availability. The solution must have a recovery time objective (RTO) of 5 minutes.
Which solution will meet these requirements with the LEAST effort?
- A . Deploy multiple instances for each endpoint in a VPC that spans at least two Regions.
- B . Use the SageMaker auto scaling feature for the hosted recommendation models.
- C . Deploy multiple instances for each production endpoint in a VPC that spans at least two subnets that are in a second Availability Zone.
- D . Frequently generate backups of the production recommendation model. Deploy the backups in a second Region.
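As a minimal sketch of what choice C's multi-AZ deployment looks like in practice (all names are hypothetical, and a model named `recommender-model` is assumed to already exist): SageMaker automatically spreads a variant's instances across Availability Zones when the instance count is at least two.

```python
import boto3

sm = boto3.client("sagemaker")

# With InitialInstanceCount >= 2, SageMaker places the instances in
# different Availability Zones within the Region, so the endpoint
# survives a single-AZ failure without manual intervention.
sm.create_endpoint_config(
    EndpointConfigName="recommender-ha-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "recommender-model",
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 2,  # at least 2 for multi-AZ placement
            "InitialVariantWeight": 1.0,
        }
    ],
)
sm.create_endpoint(
    EndpointName="recommender-endpoint",
    EndpointConfigName="recommender-ha-config",
)
```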
A credit card company wants to build a credit scoring model to help predict whether a new credit card applicant will default on a credit card payment. The company has collected data from a large number of sources with thousands of raw attributes. Early experiments to train a classification model revealed that many attributes are highly correlated, that the large number of features slows down training significantly, and that there are some overfitting issues.
The Data Scientist on this project would like to speed up the model training time without losing a lot of information from the original dataset.
Which feature engineering technique should the Data Scientist use to meet the objectives?
- A . Run self-correlation on all features and remove highly correlated features
- B . Normalize all numerical values to be between 0 and 1
- C . Use an autoencoder or principal component analysis (PCA) to replace original features with new features
- D . Cluster raw data using k-means and use sample data from each cluster to build a new dataset
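To illustrate the dimensionality-reduction approach in choice C, here is a minimal scikit-learn sketch with synthetic stand-in data (the matrix `X` is illustrative, not the company's dataset):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 1,000 applicants with 500 raw attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 500))

# Standardize first so high-variance attributes do not dominate the components.
X_scaled = StandardScaler().fit_transform(X)

# Keep enough principal components to retain 95% of the variance;
# correlated raw attributes collapse into a much smaller, decorrelated set,
# which speeds up training while preserving most of the information.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape)  # far fewer columns than the original 500
```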
A Machine Learning Specialist previously trained a logistic regression model using scikit-learn on a local machine, and the Specialist now wants to deploy it to production for inference only.
What steps should be taken to ensure Amazon SageMaker can host a model that was trained locally?
- A . Build the Docker image with the inference code. Tag the Docker image with the registry hostname and upload it to Amazon ECR.
- B . Serialize the trained model so the format is compressed for deployment. Tag the Docker image with the registry hostname and upload it to Amazon S3.
- C . Serialize the trained model so the format is compressed for deployment. Build the image and upload it to Docker Hub.
- D . Build the Docker image with the inference code. Configure Docker Hub and upload the image to Amazon ECR.
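The registration step that follows choice A can be sketched in boto3 as below. The account ID, image URI, S3 path, and role are hypothetical, and the sketch assumes the inference image has already been built, tagged with the ECR registry hostname, and pushed, and that the serialized scikit-learn model artifact is in S3:

```python
import boto3

sm = boto3.client("sagemaker")

# Hypothetical names. SageMaker pulls the inference container from ECR
# and mounts the model artifact from S3 at deployment time.
image_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/sklearn-inference:latest"
model_data = "s3://my-bucket/models/logreg/model.tar.gz"

sm.create_model(
    ModelName="local-logreg",
    PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data},
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
)
```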
A company wants to create an artificial intelligence (AI) yoga instructor that can lead large classes of students. The company needs to create a feature that can accurately count the number of students who are in a class. The company also needs a feature that can differentiate students who are performing a yoga stretch correctly from students who are performing a stretch incorrectly.
To determine whether students are performing a stretch correctly, the solution needs to measure the location and angle of each student's arms and legs. A data scientist must use Amazon SageMaker to process video footage of a yoga class by extracting image frames and applying computer vision models.
Which combination of models will meet these requirements with the LEAST effort? (Select TWO.)
- A . Image Classification
- B . Optical Character Recognition (OCR)
- C . Object Detection
- D . Pose estimation
- E . Image Generative Adversarial Networks (GANs)
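To make the two requirements concrete, here is a sketch of the post-processing that choices C and D would feed: counting person detections and computing a joint angle from pose keypoints. The detections and keypoints are made-up stand-ins for model output:

```python
import numpy as np

# Stand-in for object-detection output: (label, confidence) pairs.
detections = [("person", 0.94), ("person", 0.88), ("mat", 0.91), ("person", 0.41)]
num_students = sum(1 for label, score in detections
                   if label == "person" and score >= 0.5)

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by points a-b-c, e.g. shoulder-elbow-wrist."""
    ba, bc = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Made-up (x, y) keypoints for one student's arm.
shoulder, elbow, wrist = (120, 80), (150, 130), (200, 125)
print(num_students, joint_angle(shoulder, elbow, wrist))
```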
A company’s data scientist has trained a new machine learning model that performs better on test data than the company’s existing model performs in the production environment. The data scientist wants to replace the existing model that runs on an Amazon SageMaker endpoint in the production environment. However, the company is concerned that the new model might not work well on the production environment data.
The data scientist needs to perform A/B testing in the production environment to evaluate whether the new model performs well on production environment data.
Which combination of steps must the data scientist take to perform the A/B testing? (Choose two.)
- A . Create a new endpoint configuration that includes a production variant for each of the two models.
- B . Create a new endpoint configuration that includes two target variants that point to different endpoints.
- C . Deploy the new model to the existing endpoint.
- D . Update the existing endpoint to activate the new model.
- E . Update the existing endpoint to use the new endpoint configuration.
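The combination described in choices A and E can be sketched in boto3 as follows (all endpoint, config, and model names are hypothetical). Two production variants on one endpoint split live traffic by weight, so both models are evaluated on real production data:

```python
import boto3

sm = boto3.client("sagemaker")

# One endpoint config with two production variants; 10% of traffic
# goes to the challenger model for the A/B test.
sm.create_endpoint_config(
    EndpointConfigName="ab-test-config",
    ProductionVariants=[
        {"VariantName": "existing-model", "ModelName": "model-v1",
         "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
         "InitialVariantWeight": 0.9},
        {"VariantName": "new-model", "ModelName": "model-v2",
         "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
         "InitialVariantWeight": 0.1},
    ],
)

# Point the existing production endpoint at the new configuration.
sm.update_endpoint(EndpointName="prod-endpoint",
                   EndpointConfigName="ab-test-config")
```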
A company is converting a large number of unstructured paper receipts into images. The company wants to create a model based on natural language processing (NLP) to find relevant entities such as date, location, and notes, as well as some custom entities such as receipt numbers.
The company is using optical character recognition (OCR) to extract text for data labeling. However, documents are in different structures and formats, and the company is facing challenges with setting up the manual workflows for each document type. Additionally, the company trained a named entity recognition (NER) model for custom entity detection using a small sample size. This model has a very low confidence score and will require retraining with a large dataset.
Which solution for text extraction and entity detection will require the LEAST amount of effort?
- A . Extract text from receipt images by using Amazon Textract. Use the Amazon SageMaker BlazingText algorithm to train on the text for entities and custom entities.
- B . Extract text from receipt images by using a deep learning OCR model from the AWS Marketplace. Use the NER deep learning model to extract entities.
- C . Extract text from receipt images by using Amazon Textract. Use Amazon Comprehend for entity detection, and use Amazon Comprehend custom entity recognition for custom entity detection.
- D . Extract text from receipt images by using a deep learning OCR model from the AWS Marketplace. Use Amazon Comprehend for entity detection, and use Amazon Comprehend custom entity recognition for custom entity detection.
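A minimal boto3 sketch of the Textract-plus-Comprehend pipeline in choice C (the bucket, object key, and custom-recognizer endpoint ARN are hypothetical, and custom entity detection assumes a recognizer has already been trained and deployed):

```python
import boto3

textract = boto3.client("textract")
comprehend = boto3.client("comprehend")

# Textract pulls the raw text off a receipt image stored in S3.
resp = textract.detect_document_text(
    Document={"S3Object": {"Bucket": "my-receipts", "Name": "receipt-001.png"}}
)
text = " ".join(b["Text"] for b in resp["Blocks"] if b["BlockType"] == "LINE")

# Built-in entities (DATE, LOCATION, ...) need no training.
entities = comprehend.detect_entities(Text=text, LanguageCode="en")["Entities"]

# Custom entities (e.g., receipt numbers) use a trained custom recognizer,
# referenced here by a hypothetical real-time endpoint ARN.
custom = comprehend.detect_entities(
    Text=text,
    EndpointArn="arn:aws:comprehend:us-east-1:123456789012:entity-recognizer-endpoint/receipts",
)["Entities"]
```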
A company wants to use machine learning (ML) to improve its customer churn prediction model. The company stores data in an Amazon Redshift data warehouse.
A data science team wants to use Amazon Redshift machine learning (Amazon Redshift ML) to build a model and run predictions for new data directly within the data warehouse.
Which combination of steps should the company take to use Amazon Redshift ML to meet these requirements? (Select THREE.)
- A . Define the feature variables and target variable for the churn prediction model.
- B . Use the SQL EXPLAIN_MODEL function to run predictions.
- C . Write a CREATE MODEL SQL statement to create a model.
- D . Use Amazon Redshift Spectrum to train the model.
- E . Manually export the training data to Amazon S3.
- F . Use the SQL prediction function to run predictions.
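Choices A, C, and F take SQL form inside Redshift. As a sketch, the statements can be run from Python through the Redshift Data API (cluster, table, column, and role names are all hypothetical):

```python
import boto3

rsd = boto3.client("redshift-data")

# CREATE MODEL picks the feature columns and target in its SELECT,
# trains via SageMaker behind the scenes, and registers a SQL
# prediction function for scoring new rows in place.
create_model_sql = """
CREATE MODEL churn_model
FROM (SELECT age, monthly_charges, tenure, churned  -- features + target
      FROM customer_activity)
TARGET churned
FUNCTION predict_churn
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMLRole'
SETTINGS (S3_BUCKET 'my-redshift-ml-bucket');
"""

predict_sql = """
SELECT customer_id, predict_churn(age, monthly_charges, tenure)
FROM new_customers;
"""

for sql in (create_model_sql, predict_sql):
    rsd.execute_statement(ClusterIdentifier="analytics-cluster",
                          Database="dev", DbUser="awsuser", Sql=sql)
```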
A company is using a machine learning (ML) model to recommend products to customers. An ML specialist wants to analyze the data for the most popular recommendations in four dimensions. The ML specialist will visualize the first two dimensions as coordinates. The third dimension will be visualized as color. The ML specialist will use size to represent the fourth dimension in the visualization.
Which solution will meet these requirements?
- A . Use the Amazon SageMaker Data Wrangler bar chart feature. Use Group By to represent the third and fourth dimensions.
- B . Use the Amazon SageMaker Canvas box plot visualization. Use color and fill pattern to represent the third and fourth dimensions.
- C . Use the Amazon SageMaker Data Wrangler histogram feature. Use color and fill pattern to represent the third and fourth dimensions.
- D . Use the Amazon SageMaker Canvas scatter plot visualization. Use scatter point size and color to represent the third and fourth dimensions.
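The four-dimensional encoding in choice D maps directly onto a scatter plot, as this matplotlib sketch with synthetic stand-in data shows:

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data for the four dimensions of each recommendation.
rng = np.random.default_rng(1)
x, y = rng.random(50), rng.random(50)   # dimensions 1 and 2: coordinates
dim3 = rng.random(50)                   # dimension 3: mapped to color
dim4 = rng.integers(20, 300, 50)        # dimension 4: mapped to point size

# A single scatter plot encodes all four dimensions at once.
sc = plt.scatter(x, y, c=dim3, s=dim4, cmap="viridis", alpha=0.7)
plt.colorbar(sc, label="dimension 3")
plt.xlabel("dimension 1")
plt.ylabel("dimension 2")
plt.show()
```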
A company builds computer-vision models that use deep learning for the autonomous vehicle industry. A machine learning (ML) specialist uses an Amazon EC2 instance that has a CPU:GPU ratio of 12:1 to train the models.
The ML specialist examines the instance metric logs and notices that the GPU is idle half of the time.
The ML specialist must reduce training costs without increasing the duration of the training jobs.
Which solution will meet these requirements?
- A . Switch to an instance type that has only CPUs.
- B . Use a heterogeneous cluster that has two different instance groups.
- C . Use memory-optimized EC2 Spot Instances for the training jobs.
- D . Switch to an instance type that has a CPU:GPU ratio of 6:1.
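Choice B's heterogeneous cluster can be sketched with the SageMaker Python SDK's instance groups (image URI, role, instance types, and S3 path are hypothetical): a CPU group handles data preprocessing so a smaller GPU group stays saturated.

```python
from sagemaker.estimator import Estimator
from sagemaker.instance_group import InstanceGroup

# Two instance groups in one training job: CPU workers feed batches,
# the GPU group runs the deep learning training loop.
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/cv-training:latest",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_groups=[
        InstanceGroup("data_group", "ml.c5.18xlarge", 2),
        InstanceGroup("dnn_group", "ml.p4d.24xlarge", 1),
    ],
)
estimator.fit("s3://my-bucket/training-data/")
```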
A machine learning specialist is developing a regression model to predict rental rates from rental listings. A variable named Wall_Color represents the most prominent exterior wall color of the property.
The following is the sample data, excluding all other variables:
* Building ID 1000 has a Wall_Color value of Red.
* Building ID 1001 has a Wall_Color value of White.
* Building ID 1002 has a Wall_Color value of Green.
The specialist chose a model that needs numerical input data.
Which feature engineering approaches should the specialist use to allow the regression model to learn from the Wall_Color data? (Choose two.)
- A . Apply integer transformation and set Red = 1, White = 5, and Green = 10.
- B . Add new columns that store one-hot representation of colors.
- C . Replace the color name string by its length.
- D . Create three columns to encode the color in RGB format.
- E . Replace each color name by its training set frequency.
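Choice B's one-hot representation can be sketched in pandas using the sample rows from the question; unlike the arbitrary integers in choice A, it imposes no false ordering on the colors:

```python
import pandas as pd

# The sample rows from the question.
df = pd.DataFrame({"Building_ID": [1000, 1001, 1002],
                   "Wall_Color": ["Red", "White", "Green"]})

# One new 0/1 column per color gives the regression model numerical input.
encoded = pd.get_dummies(df, columns=["Wall_Color"], dtype=int)
print(encoded)
#    Building_ID  Wall_Color_Green  Wall_Color_Red  Wall_Color_White
# 0         1000                 0               1                 0
# 1         1001                 0               0                 1
# 2         1002                 1               0                 0
```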