Practice Free AIP-C01 Exam Online Questions
A company is using Amazon Bedrock and Anthropic Claude 3 Haiku to develop an AI assistant. The AI assistant normally processes 10,000 requests each hour but experiences surges of up to 30,000 requests each hour during peak usage periods. The AI assistant must respond within 2 seconds while operating across multiple AWS Regions.
The company observes that during peak usage periods, the AI assistant experiences throughput bottlenecks that cause increased latency and occasional request timeouts. The company must resolve the performance issues.
Which solution will meet this requirement?
- A . Purchase provisioned throughput and sufficient model units (MUs) in a single Region. Configure the application to retry failed requests with exponential backoff.
- B . Implement token batching to reduce API overhead. Use cross-Region inference profiles to automatically distribute traffic across available Regions.
- C . Set up auto scaling AWS Lambda functions in each Region. Implement client-side round-robin request distribution. Purchase one model unit (MU) of provisioned throughput as a backup.
- D . Implement batch inference for all requests by using Amazon S3 buckets across multiple Regions. Use Amazon SQS to set up an asynchronous retrieval process.
B
Explanation:
Option B is the correct solution because it directly addresses both throughput bottlenecks and latency requirements using native Amazon Bedrock performance optimization features that are designed for real-time, high-volume generative AI workloads.
Amazon Bedrock supports cross-Region inference profiles, which allow applications to transparently route inference requests across multiple AWS Regions. During peak usage periods, traffic is automatically distributed to Regions with available capacity, reducing throttling, request queuing, and timeout risks. This approach aligns with AWS guidance for building highly available, low-latency GenAI applications that must scale elastically across geographic boundaries.
Token batching further improves efficiency by combining multiple inference requests into a single model invocation where applicable. AWS Generative AI documentation highlights batching as a key optimization technique to reduce per-request overhead, improve throughput, and better utilize model capacity. This is especially effective for lightweight, low-latency models such as Claude 3 Haiku, which are designed for fast responses and high request volumes.
Option A does not meet the requirement because purchasing provisioned throughput in a single Region creates a regional bottleneck and does not address multi-Region availability or traffic spikes beyond reserved capacity. Retries increase load and latency rather than resolving the root cause.
Option C improves application-layer scaling but does not solve model-side throughput limits. Client-side round-robin routing lacks awareness of real-time model capacity and can still send traffic to saturated Regions.
Option D is unsuitable because batch inference with asynchronous retrieval is designed for offline or non-interactive workloads. It cannot meet a strict 2-second response time requirement for an interactive AI assistant.
Therefore, Option B provides the most effective and AWS-aligned solution to achieve low latency, global scalability, and high throughput during peak usage periods.
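As a rough sketch of how an application uses a cross-Region inference profile, the call below targets a geography-prefixed profile ID instead of a plain model ID, so Bedrock can route the request to a Region with available capacity. The profile ID format matches AWS's documented `us.`-prefixed IDs, but verify the exact ID available in your account; the helper is illustrative only.

```python
# Sketch: invoke Claude 3 Haiku through a cross-Region inference profile.
# Bedrock routes the request across Regions in the profile's geography.

BASE_MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def profile_id(geo_prefix: str) -> str:
    """Build a geography-prefixed inference profile ID, e.g. 'us.' + model ID."""
    return f"{geo_prefix}.{BASE_MODEL_ID}"

def ask_assistant(prompt: str, geo_prefix: str = "us") -> str:
    """Illustrative call; requires boto3 and AWS credentials at runtime."""
    import boto3  # imported lazily so the sketch loads without boto3 installed
    client = boto3.client("bedrock-runtime")
    resp = client.converse(
        modelId=profile_id(geo_prefix),  # profile ID, not a single-Region model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return resp["output"]["message"]["content"][0]["text"]
```

Because the profile ID is passed where a model ID normally goes, adopting cross-Region inference typically requires no other application changes.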
A financial services company is deploying a generative AI (GenAI) application that uses Amazon Bedrock to help customer service representatives provide personalized investment advice to customers. The company must implement a comprehensive governance solution that follows responsible AI practices and meets regulatory requirements.
The solution must detect and prevent hallucinations in recommendations. The solution must have safety controls for customer interactions. The solution must also monitor model behavior drift in real time and maintain audit trails of all prompt-response pairs for regulatory review. The company must deploy the solution within 60 days. The solution must integrate with the company’s existing compliance dashboard and respond to customers within 200 ms.
Which solution will meet these requirements with the LEAST operational overhead?
- A . Configure Amazon Bedrock guardrails to apply custom content filters and toxicity detection. Use Amazon Bedrock Model Evaluation to detect hallucinations. Store prompt-response pairs in Amazon DynamoDB to capture audit trails and set a TTL. Integrate Amazon CloudWatch custom metrics with the existing compliance dashboard.
- B . Deploy Amazon Bedrock and use AWS PrivateLink to access the application securely. Use AWS Lambda functions to implement custom prompt validation. Store prompt-response pairs in an Amazon S3 bucket and configure S3 Lifecycle policies. Create custom Amazon CloudWatch dashboards to monitor model performance metrics.
- C . Use Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases to ground responses. Use Amazon Bedrock Guardrails to enforce content safety. Use Amazon OpenSearch Service to store and index prompt-response pairs. Integrate OpenSearch Service with Amazon QuickSight to create compliance reports and to detect model behavior drift.
- D . Use Amazon SageMaker Model Monitor to detect model behavior drift. Use AWS WAF to filter content. Store customer interactions in an encrypted Amazon RDS database. Use Amazon API Gateway to create custom HTTP APIs to integrate with the compliance dashboard.
A
Explanation:
Option A is the correct solution because it uses native Amazon Bedrock governance and evaluation capabilities to meet regulatory, performance, and deployment timeline requirements with the least operational overhead.
Amazon Bedrock guardrails provide built-in safety controls that enforce responsible AI policies directly during inference. Custom content filters and toxicity detection protect customer interactions and prevent disallowed investment guidance patterns without requiring custom application logic. Guardrails operate inline and are optimized for low latency, which helps meet the strict 200 ms response-time requirement.
Hallucination detection is addressed through Amazon Bedrock Model Evaluation, which supports automated evaluation at scale using LLM-as-a-judge techniques. This enables the company to detect factual inaccuracies and policy violations systematically, without building custom evaluation pipelines or requiring extensive human review. Evaluation outputs can be surfaced as metrics.
Storing all prompt-response pairs in Amazon DynamoDB provides a low-latency, highly scalable audit store that aligns with financial regulatory requirements. Using TTL enforces data retention policies automatically, reducing compliance risk and storage overhead.
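A minimal sketch of such an audit record follows. DynamoDB TTL expects an epoch-seconds number in a designated attribute; the attribute name (`expires_at`), table name, and retention period here are hypothetical and must match your table's TTL configuration.

```python
# Shape of an encrypted-at-rest audit record with a TTL attribute.
RETENTION_DAYS = 365  # example retention period for regulatory review

def audit_item(session_id: str, prompt: str, response: str, now_epoch: int) -> dict:
    return {
        "session_id": session_id,   # partition key
        "created_at": now_epoch,    # sort key (epoch seconds)
        "prompt": prompt,
        "response": response,
        # TTL attribute: DynamoDB deletes the item after this epoch time.
        "expires_at": now_epoch + RETENTION_DAYS * 86400,
    }

def put_audit_record(item: dict) -> None:
    """Illustrative write; requires boto3 and AWS credentials at runtime."""
    import boto3
    boto3.resource("dynamodb").Table("prompt-audit").put_item(Item=item)
```

Note that server-side encryption and TTL are table-level settings; the item itself only needs to carry the expiry attribute.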
Amazon CloudWatch custom metrics integrate seamlessly with existing compliance dashboards, allowing near-real-time monitoring of safety interventions, hallucination rates, and drift indicators. CloudWatch anomaly detection can be applied to these metrics to surface behavior changes quickly.
Option B relies on custom Lambda logic and S3-based auditing, increasing latency and operational complexity.
Option C introduces additional services that increase setup time and may exceed the 60-day deployment window.
Option D uses non-Bedrock-native monitoring and adds unnecessary infrastructure layers.
Therefore, Option A provides the most complete, compliant, and low-overhead governance solution for a regulated GenAI financial services application.
A hotel company wants to enhance a legacy Java-based property management system (PMS) by adding AI capabilities. The company wants to use Amazon Bedrock Knowledge Bases to provide staff with room availability information and hotel-specific details. The solution must maintain separate access controls for each hotel that the company manages. The solution must provide room availability information in near real time and must maintain consistent performance during peak usage periods.
Which solution will meet these requirements?
- A . Deploy a single Amazon Bedrock knowledge base that contains combined data for all hotels. Configure AWS Lambda functions to synchronize data from each hotel’s PMS database through direct API connections. Implement AWS CloudTrail logging with hotel-specific filters to audit access logs for each hotel’s data.
- B . Create an Amazon EventBridge rule for each hotel that is invoked by changes to the PMS database. Configure the rule to send updates to a centralized Amazon Bedrock knowledge base in a management AWS account. Configure resource-based policies to enforce hotel-specific access controls.
- C . Implement one Amazon Bedrock knowledge base for each hotel in a multi-account structure. Use direct data ingestion to provide near real-time room availability information. Schedule regular synchronization for less critical information.
- D . Build a centralized Amazon Bedrock Agents solution that uses multiple knowledge bases. Implement AWS IAM Identity Center with hotel-specific permission sets to control staff access.
C
Explanation:
Option C best meets the requirements by aligning with AWS best practices for data isolation, access control, and scalable GenAI retrieval. Implementing a separate Amazon Bedrock knowledge base for each hotel ensures strict separation of data and permissions. This approach naturally enforces hotel-level access control without requiring complex policy logic or post-query filtering.
A multi-account structure further strengthens security and governance by isolating each hotel’s data plane. AWS recommends account-level isolation for workloads with strong tenancy or compliance boundaries. Hotel staff can be granted access only to their hotel’s account and corresponding knowledge base, eliminating the risk of cross-hotel data exposure.
Direct data ingestion into each knowledge base enables near real-time updates for critical data such as room availability. For information that does not change frequently, scheduled synchronization reduces ingestion cost while maintaining accuracy. This hybrid ingestion model balances freshness and operational efficiency.
Because Amazon Bedrock Knowledge Bases are fully managed, performance remains consistent during peak usage periods without the company managing indexing, scaling, or retrieval infrastructure. Each knowledge base scales independently, preventing noisy-neighbor issues that could arise in a centralized design.
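The per-hotel isolation can be sketched as a simple resolution step before retrieval: each hotel maps to its own knowledge base ID, and an unknown hotel is refused outright rather than filtered after the query. The hotel names and knowledge base IDs below are hypothetical.

```python
# Hypothetical mapping from hotel to its dedicated knowledge base ID.
# In the multi-account design, each entry would live in that hotel's account.
HOTEL_KB = {
    "hotel-paris": "KBPARIS0001",
    "hotel-tokyo": "KBTOKYO0001",
}

def kb_for(hotel: str) -> str:
    """Resolve a hotel to its own knowledge base; unknown hotels are refused."""
    if hotel not in HOTEL_KB:
        raise PermissionError(f"no knowledge base for {hotel}")
    return HOTEL_KB[hotel]

def retrieve_availability(hotel: str, query: str) -> list:
    """Illustrative Knowledge Bases retrieval; needs boto3 and credentials."""
    import boto3
    client = boto3.client("bedrock-agent-runtime")
    resp = client.retrieve(
        knowledgeBaseId=kb_for(hotel),
        retrievalQuery={"text": query},
    )
    return resp["retrievalResults"]
```

Because isolation happens at knowledge base selection, no post-query filtering logic is needed, which is exactly the simplification Option C provides.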
Options A and B rely on a centralized knowledge base, which increases policy complexity and introduces the risk of misconfigured access controls.
Option D adds unnecessary orchestration complexity and does not inherently solve real-time data freshness requirements.
Therefore, Option C provides the most secure, scalable, and operationally efficient solution for enhancing the PMS with Amazon Bedrock Knowledge Bases.
A healthcare company is using Amazon Bedrock to develop a real-time patient care AI assistant to respond to queries for separate departments that handle clinical inquiries, insurance verification, appointment scheduling, and insurance claims. The company wants to use a multi-agent architecture.
The company must ensure that the AI assistant is scalable and can onboard new features for patients. The AI assistant must be able to handle thousands of parallel patient interactions. The company must ensure that patients receive appropriate domain-specific responses to queries.
Which solution will meet these requirements?
- A . Isolate data for each agent by using separate knowledge bases. Use IAM filtering to control access to each knowledge base. Deploy a supervisor agent to perform natural language intent classification on patient inquiries. Configure the supervisor agent to route queries to specialized collaborator agents to respond to department-specific queries. Configure each specialized collaborator agent to use Retrieval Augmented Generation (RAG) with the agent’s department-specific knowledge base.
- B . Create a separate supervisor agent for each department. Configure individual collaborator agents to perform natural language intent classification for each specialty domain within each department. Integrate each collaborator agent with department-specific knowledge bases only. Implement manual handoff processes between the supervisor agents.
- C . Isolate data for each department in separate knowledge bases. Use IAM filtering to control access to each knowledge base. Deploy a single general-purpose agent. Configure multiple action groups within the general-purpose agent to perform specific department functions. Implement rule-based routing logic within the general-purpose agent instructions.
- D . Implement multiple independent supervisor agents that run in parallel to respond to patient inquiries for each department. Configure multiple collaborator agents for each supervisor agent. Integrate all agents with the same knowledge base. Use external routing logic to merge responses from multiple supervisor agents.
A
Explanation:
Option A is the most appropriate design because it provides scalable multi-agent orchestration, clear domain separation, and strong governance with minimal operational complexity. A supervisor-agent pattern is a standard AWS-recommended approach for multi-agent systems: one agent performs intent classification and routing, while specialized agents handle domain-specific tasks.
Isolating data with separate knowledge bases ensures that each specialized collaborator agent retrieves only the information relevant to its department. This improves response accuracy, reduces hallucinations, and supports privacy controls because clinical content, claims content, and scheduling content can have different access policies. IAM-based filtering ensures that each agent has permission only to the knowledge base it is authorized to use.
Routing patient inquiries through a supervisor agent supports high concurrency and extensibility. New departments or features can be added by introducing new collaborator agents and knowledge bases without redesigning the entire system. Because routing is handled centrally, changes in classification logic do not require updates across many independent supervisors.
Using RAG within each collaborator agent ensures that responses are grounded in department-approved information sources, which is critical in healthcare settings to reduce unsafe or incorrect guidance. This approach also improves performance because each retrieval scope is smaller and more relevant, supporting thousands of parallel interactions.
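The supervisor's routing contract can be illustrated with a keyword stand-in. In the real design a foundation model performs the intent classification; the rules below only show the shape of the decision that dispatches a query to one of the four collaborator agents.

```python
# Keyword stand-in for the supervisor agent's natural language intent
# classification. An FM performs this step in production; these rules
# only illustrate the routing contract to collaborator agents.
COLLABORATORS = ("clinical", "insurance_verification", "scheduling", "claims")

def route(query: str) -> str:
    q = query.lower()
    if "claim" in q:
        return "claims"
    if "insurance" in q or "coverage" in q:
        return "insurance_verification"
    if "appointment" in q or "schedule" in q:
        return "scheduling"
    return "clinical"  # default to the clinical collaborator
```

Adding a new department then means adding one collaborator and one routing branch (or, with an FM classifier, one intent description), without touching the other agents.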
Option B introduces manual handoffs that do not scale.
Option C relies on rule-based routing inside one general agent, which becomes brittle and difficult to govern as complexity grows.
Option D mixes all departments into a single knowledge base and merges responses externally, increasing risk of incorrect domain answers and operational overhead.
Therefore, Option A best meets the scalability, correctness, and multi-agent onboarding requirements.
A wildlife conservation agency operates zoos globally. The agency uses various sensors, trackers, and audiovisual recorders to monitor animal behavior. The agency wants to launch a generative AI (GenAI) assistant that can ingest multimodal data to study animal behavior.
The GenAI assistant must support natural language queries, avoid speculative behavioral interpretations, and maintain audit logs for ethical research audits.
Which solution will meet these requirements?
- A . Ingest raw videos into Amazon Rekognition to detect animal postures and expressions. Use Amazon Data Firehose to stream sensor and GPS data into Amazon S3. Prompt an Amazon Bedrock FM using basic templates stored in AWS Systems Manager Parameter Store. Use IAM for access control. Use AWS CloudTrail for audit logging.
- B . Use Amazon SageMaker Processing and Amazon Transcribe to pre-process multimodal data. Ingest curated summaries into Amazon Bedrock Knowledge Bases. Apply Amazon Bedrock guardrails to restrict speculative outputs. Use AWS AppConfig to manage prompt templates. Use AWS CloudTrail to log research activity for audits.
- C . Use Amazon OpenSearch Serverless to index behavioral logs and telemetry. Use Amazon Comprehend to extract entities. Use Amazon Bedrock to answer questions over indexed data. Use IAM for access control and CloudTrail for audit logging.
- D . Configure Amazon Q Business to federate data across Amazon S3, Amazon Kinesis, and Amazon SageMaker Feature Store. Use EventBridge for ingestion orchestration. Use custom AWS Lambda functions to filter LLM outputs for ethical compliance.
B
Explanation:
Option B best meets the multimodal, ethical, and auditability requirements using managed AWS services designed for research-grade GenAI systems. Multimodal data such as audio, video, sensor telemetry, and tracking data must be curated and summarized before being consumed by a foundation model. Amazon SageMaker Processing and Amazon Transcribe provide scalable, managed preprocessing for audiovisual and textual data.
By ingesting summarized, validated observations into Amazon Bedrock Knowledge Bases, the GenAI assistant can answer natural language queries using grounded, evidence-based context instead of raw sensor signals. This significantly reduces the risk of speculative or anthropomorphic interpretations.
Amazon Bedrock guardrails are critical for preventing speculative behavioral claims, enforcing scientific and ethical constraints at inference time. Guardrails provide a validated, auditable safety layer that custom Lambda-based filters cannot reliably replicate.
AWS AppConfig enables controlled prompt management and change governance, ensuring that research prompts remain consistent and reviewable. AWS CloudTrail captures all access, query, and configuration changes, supporting ethical research audits and regulatory reviews.
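A guardrail can also be applied as a standalone check on draft answers before they reach researchers, via the ApplyGuardrail API. The sketch below assumes that API's documented request shape; the guardrail ID and version are placeholders, and the response handling should be verified against current Bedrock documentation.

```python
# Sketch: check a draft answer against a Bedrock guardrail before release.
def guardrail_payload(text: str) -> list:
    """Content shape expected by the ApplyGuardrail request (assumption)."""
    return [{"text": {"text": text}}]

def output_is_safe(text: str) -> bool:
    """Illustrative standalone guardrail check; needs boto3 and credentials.
    Returns True when the guardrail does not intervene."""
    import boto3
    client = boto3.client("bedrock-runtime")
    resp = client.apply_guardrail(
        guardrailIdentifier="gr-EXAMPLE",  # placeholder guardrail ID
        guardrailVersion="1",
        source="OUTPUT",                   # evaluate model output, not user input
        content=guardrail_payload(text),
    )
    return resp["action"] != "GUARDRAIL_INTERVENED"
```

Running the same managed guardrail at inference time and in offline review keeps the speculative-output policy in one auditable place.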
Option A lacks grounding and speculative safeguards.
Option C focuses on text analytics and does not properly handle multimodal reasoning or safety enforcement.
Option D relies heavily on custom logic and introduces unnecessary operational risk.
Therefore, Option B provides the most robust, ethical, and auditable GenAI architecture for wildlife behavior research.
A company is building a video analysis platform on AWS. The platform will analyze a large video archive by using Amazon Rekognition and Amazon Bedrock. The platform must comply with predefined privacy standards. The platform must also use secure model I/O, control foundation model (FM) access patterns, and provide an audit of who accessed what and when.
Which solution will meet these requirements?
- A . Configure VPC endpoints for Amazon Bedrock model API calls. Implement Amazon Bedrock guardrails to filter harmful or unauthorized content in prompts and responses. Use Amazon Bedrock trace events to track all agent and model invocations for auditing purposes. Export the traces to Amazon CloudWatch Logs as an audit record of model usage. Store all prompts and outputs in Amazon S3 with server-side encryption with AWS KMS keys (SSE-KMS).
- B . Define access control by using IAM with attribute-based access control (ABAC) to map departments to specific permissions. Configure VPC endpoints for Amazon Bedrock model API calls. Use IAM condition keys to enforce specific GuardrailIdentifier and ModelId values. Configure AWS CloudTrail to capture management and data events for S3 objects and KMS key usage activities. Enable S3 server access logging to record detailed file-level interactions with the video archives. Send all CloudTrail logs to AWS CloudTrail Lake. Set up Amazon CloudWatch alarms to detect and alert on unexpected activity from Amazon Bedrock, Amazon Rekognition, and AWS KMS.
- C . Restrict access to services by using VPC endpoint policies. Use AWS Config to track resource changes and compliance with security rules. Use server-side encryption with AWS KMS keys (SSE-KMS) to encrypt data at rest. Store the model’s I/O in separate Amazon S3 buckets. Enable S3 server access logging to track file-level interactions.
- D . Configure AWS CloudTrail Insights to analyze API call patterns across accounts and detect anomalous activity in Amazon Bedrock, Amazon Rekognition, Amazon S3, and AWS KMS. Deploy Amazon Macie to scan and classify the video archive. Use server-side encryption with AWS KMS keys (SSE-KMS) to encrypt all stored data. Configure CloudTrail to capture KMS API usage events for audit purposes. Configure Amazon EventBridge rules to process CloudTrail Insights anomalies and Macie findings. Use CloudWatch alarms to trigger automated notifications and security responses when potential security issues are detected.
B
Explanation:
Option B is the correct solution because it delivers end-to-end governance, security, and auditability across Amazon Bedrock, Amazon Rekognition, and the underlying data layer while meeting strict privacy and compliance requirements.
Using IAM attribute-based access control (ABAC) allows the company to control access to foundation models and data based on department, role, or workload attributes rather than static permissions. This is critical for controlling FM access patterns at scale. Enforcing specific ModelId and GuardrailIdentifier values with IAM condition keys ensures that only approved models and guardrails are used, which directly supports secure model I/O and governance requirements.
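A hedged sketch of such a policy is shown below as a Python dict. The `bedrock:GuardrailIdentifier` condition key and the model ARN pattern follow AWS's documented conventions, but the exact ARNs, tag names, and condition-key semantics should be checked against current IAM documentation before use.

```python
# ABAC-style policy sketch: allow invocation only of an approved Haiku model,
# only with the approved guardrail, and only for a matching department tag.
# All ARNs and tag values below are placeholders.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "bedrock:InvokeModel",
            "bedrock:InvokeModelWithResponseStream",
        ],
        # Restrict which foundation models may be invoked.
        "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.claude-3-haiku-*",
        "Condition": {
            # Require that requests carry the approved guardrail.
            "StringEquals": {
                "bedrock:GuardrailIdentifier":
                    "arn:aws:bedrock:us-east-1:111122223333:guardrail/gr-EXAMPLE"
            },
            # ABAC: the caller's department tag must match the workload.
            "StringEqualsIfExists": {
                "aws:PrincipalTag/department": "video-analysis"
            },
        },
    }],
}
```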
Configuring VPC endpoints for Amazon Bedrock ensures that all model invocations remain on private AWS network paths, reducing data exfiltration risk and supporting privacy standards. AWS CloudTrail captures both management and data events, providing a definitive audit trail of who accessed which resources and when. Sending logs to CloudTrail Lake enables centralized, long-term, queryable auditing across services.
Amazon S3 server access logging adds file-level visibility into video archive access, which is essential for compliance and forensic analysis. Amazon CloudWatch alarms provide near real-time detection of anomalous or unauthorized activity across Amazon Bedrock, Amazon Rekognition, and AWS KMS.
Option A focuses primarily on model-level tracing but lacks comprehensive IAM governance and S3 access auditing.
Option C provides partial controls but lacks identity-aware auditing and model governance.
Option D focuses on anomaly detection and classification but does not explicitly control FM access patterns.
Therefore, Option B best satisfies all stated requirements in a unified, auditable, and security-first architecture.
A publishing company is developing a chat assistant that uses a containerized large language model (LLM) that runs on Amazon SageMaker AI. The architecture consists of an Amazon API Gateway REST API that routes user requests to an AWS Lambda function. The Lambda function invokes a SageMaker AI real-time endpoint that hosts the LLM.
Users report uneven response times. Analytics show that a high number of chats are abandoned after 2 seconds of waiting for the first token. The company wants a solution to ensure that p95 latency is under 800 ms for interactive requests to the chat assistant.
Which combination of solutions will meet this requirement? (Select TWO.)
- A . Enable model preload upon container startup. Implement dynamic batching to process multiple user requests together in a single inference pass.
- B . Select a larger GPU instance type for the SageMaker AI endpoint. Set the minimum number of instances to 0. Continue to perform per-request processing. Lazily load model weights on the first request.
- C . Switch to a multi-model endpoint. Use lazy loading without request batching.
- D . Set the minimum number of instances to greater than 0. Enable response streaming.
- E . Switch to Amazon SageMaker Asynchronous Inference for all requests. Store requests in an Amazon S3 bucket. Set the minimum number of instances to 0.
A, D
Explanation:
The correct answers are A and D because they directly reduce time-to-first-token and stabilize p95 latency for interactive, real-time chat workloads hosted on Amazon SageMaker AI real-time endpoints.
Option D addresses the biggest driver of uneven latency: cold starts and scale-to-zero behavior. By setting the minimum number of instances to greater than 0, the endpoint always has warm capacity and loaded runtime resources, eliminating the first-request penalty that causes users to wait multiple seconds. Enabling response streaming improves perceived latency by returning the first tokens as soon as they are generated rather than waiting for the complete response. This directly targets the abandonment problem described (users leaving after waiting for the first token).
Option A further improves p95 latency and throughput by removing model loading overhead during inference and improving GPU utilization. Preloading model weights during container startup ensures the model is ready before traffic arrives and avoids unpredictable on-demand weight loading.
Dynamic batching increases efficiency by grouping compatible requests into a single inference pass, reducing per-request overhead and improving GPU saturation. When tuned properly for interactive workloads, batching can reduce tail latency while preserving responsiveness by enforcing small batch windows.
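The size-capping half of that logic can be sketched in a few lines. Real dynamic batchers in serving containers also enforce a short max-wait window so interactive requests are never held long; that timing dimension is omitted here for brevity.

```python
# Minimal size-capped batching of the kind a serving container applies
# internally before a single inference pass. The max-wait window that real
# dynamic batchers pair with this size cap is deliberately omitted.
def make_batches(requests: list, max_batch_size: int) -> list:
    batches = []
    for i in range(0, len(requests), max_batch_size):
        batches.append(requests[i:i + max_batch_size])
    return batches
```

Keeping `max_batch_size` small and the wait window in the low tens of milliseconds is the usual tuning for interactive chat, since a large batch improves throughput at the cost of first-token latency.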
Option B makes latency worse because setting minimum instances to 0 and lazily loading weights guarantees cold-start delays and unpredictable first-token performance.
Option C similarly increases cold-start behavior through lazy loading and offers no batching benefits.
Option E is designed for non-interactive workloads and introduces queueing and storage latency, which conflicts with the 800 ms p95 requirement for interactive chat.
Therefore, A and D are the best combination to achieve consistently low p95 latency and fast first-token streaming for a SageMaker-hosted chat assistant.
A company is developing a generative AI (GenAI)-powered customer support application that uses Amazon Bedrock foundation models (FMs). The application must maintain conversational context across multiple interactions with the same user. The application must run clarification workflows to handle ambiguous user queries. The company must store encrypted records of each user conversation to use for personalization. The application must be able to handle thousands of concurrent users while responding to each user quickly.
Which solution will meet these requirements?
- A . Use an AWS Step Functions Express workflow to orchestrate conversation flow. Invoke AWS Lambda functions to run clarification logic. Store conversation history in Amazon RDS and use session IDs as the primary key.
- B . Use an AWS Step Functions Standard workflow to orchestrate clarification workflows. Include Wait for a Callback patterns to manage the workflows. Store conversation history in Amazon DynamoDB. Purchase on-demand capacity and configure server-side encryption.
- C . Deploy the application by using an Amazon API Gateway REST API to route user requests to an AWS Lambda function to update and retrieve conversation context. Store conversation history in Amazon S3 and configure server-side encryption. Save each interaction as a separate JSON file.
- D . Use AWS Lambda functions to call Amazon Bedrock inference APIs. Use Amazon SQS queues to orchestrate clarification steps. Store conversation history in an Amazon ElastiCache (Redis OSS) cluster. Configure encryption at rest.
B
Explanation:
Option B is the correct solution because it provides a scalable, durable, and secure architecture for conversational GenAI workloads that require multi-step clarification workflows and persistent memory.
AWS Step Functions Standard workflows are designed for long-running, stateful workflows with high reliability, which is ideal for clarification loops that may require multiple back-and-forth interactions. The Wait for a Callback pattern allows the workflow to pause while awaiting additional user input, making it well-suited for handling ambiguous queries without losing execution state.
Storing conversation history in Amazon DynamoDB enables millisecond-latency reads and writes at massive scale, supporting thousands of concurrent users. DynamoDB’s on-demand capacity mode automatically scales with traffic, eliminating capacity planning. Server-side encryption ensures that stored conversation data is encrypted at rest, meeting security and compliance requirements for personalized data.
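A sketch of the item layout makes the access pattern concrete: one partition per session, ordered by turn number, so a single query reloads a user's context. Table and attribute names are hypothetical; encryption and on-demand capacity are table-level settings, not per-item options.

```python
# Conversation-history item keyed for fast per-session reads.
def turn_item(session_id: str, turn: int, role: str, text: str) -> dict:
    return {
        "session_id": session_id,  # partition key: one user session
        "turn": turn,              # sort key: preserves message order
        "role": role,              # "user" or "assistant"
        "text": text,
    }

def load_history(session_id: str) -> list:
    """Illustrative query for a session's turns; needs boto3 and credentials."""
    import boto3
    from boto3.dynamodb.conditions import Key
    table = boto3.resource("dynamodb").Table("conversations")
    resp = table.query(KeyConditionExpression=Key("session_id").eq(session_id))
    return resp["Items"]
```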
Option A uses Step Functions Express and Amazon RDS, which is not ideal for long-lived conversational workflows and introduces scaling and connection management challenges.
Option C stores conversations as individual S3 objects, which increases latency and complicates context retrieval.
Option D relies on Amazon ElastiCache, which is optimized for ephemeral caching rather than durable, auditable conversation history.
Therefore, Option B best balances scalability, performance, durability, and security for a conversational Amazon Bedrock-based customer support application.
A media company must use Amazon Bedrock to implement a robust governance process for AI-generated content. The company needs to manage hundreds of prompt templates. Multiple teams use the templates across multiple AWS Regions to generate content. The solution must provide version control with approval workflows that include notifications for pending reviews. The solution must also provide detailed audit trails that document prompt activities and consistent prompt parameterization to enforce quality standards.
Which solution will meet these requirements?
- A . Configure Amazon Bedrock Studio prompt templates. Use Amazon CloudWatch dashboards to display prompt usage metrics. Store approval status in Amazon DynamoDB. Use AWS Lambda functions to enforce approvals.
- B . Use Amazon Bedrock Prompt Management to implement version control. Configure AWS CloudTrail for audit logging. Use AWS Identity and Access Management policies to control approval permissions. Create parameterized prompt templates by specifying variables.
- C . Use AWS Step Functions to create an approval workflow. Store prompts in Amazon S3. Use tags to implement version control. Use Amazon EventBridge to send notifications.
- D . Deploy Amazon SageMaker Canvas with prompt templates stored in Amazon S3. Use AWS CloudFormation for version control. Use AWS Config to enforce approval policies.
B
Explanation:
Option B is the correct solution because Amazon Bedrock Prompt Management is purpose-built to manage, govern, and standardize prompt usage at scale across teams and Regions. It provides native version control, allowing teams to track prompt changes over time and ensure that only approved versions are used in production workflows.
Prompt Management supports approval workflows that align with enterprise governance requirements. Approval permissions can be enforced through IAM policies, ensuring that only authorized reviewers can approve or publish prompt versions. This removes the need for custom workflow engines or external storage systems, significantly reducing operational overhead.
Parameterized prompt templates enable consistent prompt structure while allowing controlled variation through defined variables. This ensures consistent quality standards and reduces prompt drift, which is critical when hundreds of prompts are reused across multiple applications and teams.
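The mechanics can be illustrated with a local renderer. Bedrock Prompt Management templates use double-brace placeholders such as `{{topic}}`; the function below mimics that convention for illustration only and is not the service's own rendering code.

```python
import re

# Local stand-in for rendering a parameterized prompt template with
# double-brace placeholders, mimicking the {{variable}} convention.
def render(template: str, variables: dict) -> str:
    def sub(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing template variable: {name}")
        return str(variables[name])
    return re.sub(r"\{\{(\w+)\}\}", sub, template)
```

Failing loudly on a missing variable, as above, is what keeps hundreds of shared templates from silently producing malformed prompts.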
AWS CloudTrail integrates natively with Amazon Bedrock to provide immutable audit logs for prompt creation, updates, approvals, and usage. These detailed audit trails satisfy compliance requirements and allow security and governance teams to trace prompt activity across Regions and users.
Option A requires significant custom development to coordinate approvals and maintain state.
Option C relies on general-purpose workflow services and manual versioning mechanisms that are error-prone and difficult to scale.
Option D uses services not designed for large-scale GenAI prompt governance and introduces unnecessary complexity.
Therefore, Option B best meets the requirements for scalable, auditable, and low-overhead governance of AI-generated content using Amazon Bedrock.
A specialty coffee company has a mobile app that generates personalized coffee roast profiles by using Amazon Bedrock with a three-stage prompt chain. The prompt chain converts user inputs into structured metadata, retrieves relevant logs for coffee roasts, and generates a personalized roast recommendation for each customer.
Users in multiple AWS Regions report inconsistent roast recommendations for identical inputs, slow inference during the retrieval step, and unsafe recommendations such as brewing at excessively high
temperatures. The company must improve the stability of outputs for repeated inputs. The company must also improve app performance and the safety of the app’s outputs. The updated solution must ensure 99.5% output consistency for identical inputs and achieve inference latency of less than 1 second. The solution must also block unsafe or hallucinated recommendations by using validated safety controls.
Which solution will meet these requirements?
- A . Deploy Amazon Bedrock with provisioned throughput to stabilize inference latency. Apply Amazon Bedrock guardrails with semantic denial rules to block unsafe outputs. Use Amazon Bedrock Prompt Management to manage prompts by using approval workflows.
- B . Use Amazon Bedrock Agents to manage chaining. Log model inputs and outputs to Amazon CloudWatch Logs. Use logs from CloudWatch to perform A/B testing for prompt versions.
- C . Cache prompt results in Amazon ElastiCache. Use AWS Lambda functions to pre-process metadata and to trace end-to-end latency. Use AWS X-Ray to identify and remediate performance bottlenecks.
- D . Use Amazon Kendra to improve roast log retrieval accuracy. Store normalized prompt metadata within Amazon DynamoDB. Use AWS Step Functions to orchestrate multi-step prompts.
A
Explanation:
Option A is the only choice that simultaneously addresses all three requirements: (1) higher output consistency for identical inputs, (2) sub-second inference latency, and (3) validated safety controls that block unsafe or hallucinated recommendations.
Provisioned throughput in Amazon Bedrock reserves capacity for the chosen model, which stabilizes latency and reduces the chance of throttling or variable response times across Regions. This is important for a mobile app with strict latency goals and users distributed across multiple Regions, because reserved capacity insulates the workload from contention during peak demand.
Amazon Bedrock guardrails provide validated safety controls to filter or block unsafe content. Semantic denial rules are appropriate for preventing dangerous brewing guidance (for example, excessively high temperatures) and for reducing hallucinated instructions that violate safety policies. Guardrails can be enforced consistently regardless of prompt-chain complexity, providing a uniform safety layer around the model outputs.
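For illustration, the sketch below builds the request parameters for the Bedrock `Converse` API with a guardrail attached. The guardrail identifier and version are placeholders; in a real deployment these kwargs would be passed to `boto3.client("bedrock-runtime").converse(**request)`:

```python
# Sketch of a Converse request with a guardrail enforced on the output.
# "gr-EXAMPLE123" is a placeholder guardrail ID, not a real resource.
request = {
    "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
    "messages": [
        {
            "role": "user",
            "content": [
                {"text": "Recommend a roast profile for an Ethiopian light roast."}
            ],
        }
    ],
    "guardrailConfig": {
        "guardrailIdentifier": "gr-EXAMPLE123",  # placeholder guardrail ID
        "guardrailVersion": "1",
        "trace": "enabled",  # include which policy blocked an output, for debugging
    },
}
```

Because the guardrail is attached at the API-call level, the same denial rules apply to every stage of the prompt chain without per-prompt safety logic.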
Amazon Bedrock Prompt Management supports controlled prompt versioning and approval workflows. By standardizing prompts, controlling changes, and ensuring the same prompt version is used for identical inputs, the company improves output stability and reduces drift caused by unmanaged prompt edits. Combined with strict configuration control (including fixed inference
parameters such as temperature where appropriate), this improves repeatability and increases the likelihood of achieving the 99.5% consistency target.
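A minimal sketch of such a pinned configuration, again using `Converse`-style request parameters: setting `temperature` to 0 makes decoding near-deterministic, which raises repeatability for identical inputs, though it does not guarantee bit-identical outputs across model hosts.

```python
# Sketch: fix inference parameters so repeated identical inputs produce
# far more stable outputs. Values here are illustrative defaults.
request = {
    "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
    "inferenceConfig": {
        "temperature": 0.0,  # minimize sampling randomness
        "topP": 1.0,
        "maxTokens": 512,
    },
}
```

Version-pinning the prompt and pinning these parameters together is what pushes repeatability toward the 99.5% consistency target.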
Option B improves observability and experimentation but does not provide strong safety enforcement or latency stabilization.
Option C improves performance through caching and tracing but does not provide validated safety controls and does not directly address cross-Region output consistency.
Option D may improve retrieval but does not enforce safety controls or ensure repeatable outputs.
Therefore, Option A best meets the stability, performance, and safety requirements using AWS-native controls.
