Practice Free AIP-C01 Exam Online Questions
An ecommerce company is using Amazon Bedrock to build a generative AI (GenAI) application. The application uses AWS Step Functions to orchestrate a multi-agent workflow to produce detailed product descriptions. The workflow consists of three sequential states: a description generator, a technical specifications validator, and a brand voice consistency checker. Each state produces intermediate reasoning traces and outputs that are passed to the next state. The application uses an Amazon S3 bucket for process storage and for output storage.
During testing, the company discovers that outputs between Step Functions states frequently exceed the 256 KB quota and cause workflow failures. A GenAI Developer needs to revise the application architecture to efficiently handle the Step Functions 256 KB quota and maintain workflow observability. The revised architecture must preserve the existing multi-agent reasoning and acting (ReAct) pattern.
Which solution will meet these requirements with the LEAST operational overhead?
- A . Store intermediate outputs in Amazon DynamoDB. Pass only references between states. Create a Map state that retrieves the complete data from DynamoDB when required for each agent’s processing step.
- B . Configure an Amazon Bedrock integration to use the S3 bucket URI in the input parameters for large outputs. Use the ResultPath and ResultSelector fields to route S3 references between the agent steps while maintaining the sequential validation workflow.
- C . Use AWS Lambda functions to compress outputs to less than 256 KB before each agent state. Configure each agent task to decompress outputs before processing and to compress results before passing them to the next state.
- D . Configure a separate Step Functions state machine to handle each agent’s processing. Use Amazon EventBridge to coordinate the execution flow between state machines. Use S3 references for the outputs as event data.
B
Explanation:
Option B is the best solution because it directly addresses the Step Functions 256 KB state payload quota by externalizing large intermediate artifacts to Amazon S3 and passing only lightweight references (URIs/keys) between states. This is a standard AWS pattern for workflows that produce large intermediate results, and it avoids introducing additional databases, compression logic, or cross-state-machine coordination that increases operational overhead.
In a multi-agent ReAct workflow, intermediate reasoning traces can be verbose and grow quickly as each agent produces chain-of-thought style artifacts, structured outputs, and supporting evidence. Step Functions is designed to orchestrate state transitions and pass JSON payloads, but large payloads should be stored outside the state machine and referenced by pointer values. Using Amazon S3 for intermediate outputs is operationally efficient because the application already uses S3 for storage, and S3 provides durable, low-cost storage with simple access patterns.
ResultPath and ResultSelector allow each state to store or reshape results so that only the required reference fields (such as s3Uri, object key, metadata, trace IDs) are forwarded to subsequent states. This preserves observability because the workflow can still log trace references, correlate steps with S3 objects, and store structured metadata for debugging. It also preserves the sequential validation design, keeping the existing ReAct pattern intact while preventing failures due to oversized payloads.
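The reference-passing pattern can be sketched as a small helper that each agent task runs before returning its result. This is a hypothetical sketch: the threshold mirrors the Step Functions 256 KB payload quota, and `put_object` stands in for a boto3 `s3.put_object` call so the logic stays testable.

```python
import json

# Mirrors the Step Functions state payload quota.
MAX_PAYLOAD_BYTES = 256 * 1024

def to_state_output(agent_output: dict, put_object, bucket: str, key: str) -> dict:
    """Return a Step Functions-safe payload: inline if small, else an S3 reference.

    `put_object` is any callable(bucket, key, body); in a real Lambda task it
    would wrap boto3's s3.put_object. It is injected here for testability.
    """
    body = json.dumps(agent_output)
    if len(body.encode("utf-8")) <= MAX_PAYLOAD_BYTES:
        return {"inline": agent_output}
    put_object(bucket, key, body)
    # Only the lightweight reference crosses the state boundary.
    return {"s3Uri": f"s3://{bucket}/{key}", "sizeBytes": len(body)}
```

The next state then resolves `s3Uri` back to the full object before its own agent call, so the ReAct chain is unchanged.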
Option A adds additional services and read/write patterns that increase operational complexity.
Option C introduces custom compression/decompression logic that is fragile, adds latency, and complicates troubleshooting.
Option D increases orchestration overhead by splitting workflows and coordinating with events, which makes debugging harder and increases failure modes.
Therefore, Option B meets the payload limit requirement while keeping the architecture simple and observable.
A company is designing an API for a generative AI (GenAI) application that uses a foundation model (FM) that is hosted on a managed model service. The API must stream responses to reduce latency, enforce token limits to manage compute resource usage, and implement retry logic to handle model timeouts and partial responses.
Which solution will meet these requirements with the LEAST operational overhead?
- A . Integrate an Amazon API Gateway HTTP API with an AWS Lambda function to invoke Amazon Bedrock. Use Lambda response streaming to stream responses. Enforce token limits within the Lambda function. Implement retry logic for model timeouts by using Lambda and API Gateway timeout configurations.
- B . Connect an Amazon API Gateway HTTP API directly to Amazon Bedrock. Simulate streaming by using client-side polling. Enforce token limits on the frontend. Configure retry behavior by using API Gateway integration settings.
- C . Connect an Amazon API Gateway WebSocket API to an Amazon ECS service that hosts a containerized inference server. Stream responses by using the WebSocket protocol. Enforce token limits within Amazon ECS. Handle model timeouts by using ECS task lifecycle hooks and restart policies.
- D . Integrate an Amazon API Gateway REST API with an AWS Lambda function that invokes Amazon Bedrock. Use Lambda response streaming to stream responses. Enforce token limits within the Lambda function. Implement retry logic by using Lambda and API Gateway timeout configurations.
A
Explanation:
Option A is the best solution because it satisfies streaming, token control, and retry requirements while keeping operational overhead low by using fully managed, serverless AWS services. Amazon API Gateway HTTP APIs provide a lightweight, cost-effective front door for APIs and integrate cleanly with AWS Lambda for request processing and security controls.
AWS Lambda response streaming allows the API to begin returning content to the client as soon as partial model output is available, reducing perceived latency and improving user experience for long responses. Using Lambda as the integration layer also provides a centralized place to enforce token-aware request handling, such as rejecting oversized requests, truncating optional context, or applying consistent limits across users and tenants to manage compute usage.
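Token-aware handling of a streamed response can be sketched as a generator that cuts the stream off once a budget is exhausted. This is a minimal illustration; the whitespace token counter is a stand-in for the model's real tokenizer.

```python
def stream_with_token_limit(chunks, max_tokens, count_tokens=lambda s: len(s.split())):
    """Yield streamed model chunks until a token budget is exhausted.

    `count_tokens` is a naive whitespace tokenizer stand-in; a real service
    would count tokens with the model's own tokenizer.
    """
    used = 0
    for chunk in chunks:
        n = count_tokens(chunk)
        if used + n > max_tokens:
            # Stop streaming once the budget would be exceeded.
            return
        used += n
        yield chunk
```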
Retry logic is best handled in the client or integration layer for transient failures such as timeouts and throttling. Lambda can implement controlled retries with exponential backoff and jitter, while API Gateway timeouts help bound request lifetimes and prevent hung connections from consuming resources indefinitely. Because the model service is managed, the company avoids infrastructure management and focuses only on request shaping, safety, and resiliency behavior.
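A minimal sketch of backoff-with-jitter retries, assuming the model call is wrapped in a zero-argument callable that raises `TimeoutError` on a timeout:

```python
import random
import time

def invoke_with_retries(call, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry a model invocation with exponential backoff and full jitter.

    `call` is any zero-argument callable (e.g., a wrapped Bedrock invoke);
    `sleep` is injectable so the backoff can be tested without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == max_attempts:
                raise
            # Full jitter: sleep a random amount up to the exponential cap.
            sleep(random.uniform(0, base_delay * (2 ** (attempt - 1))))
```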
Option B is not suitable because client-side polling is not true streaming, front-end token enforcement is insecure and inconsistent, and API Gateway does not provide model-aware retry behavior on its own.
Option C introduces container hosting and scaling complexity, which increases operational overhead compared to serverless.
Option D can work, but REST APIs are generally heavier than HTTP APIs for this pattern and do not reduce overhead compared to Option A.
Therefore, Option A provides the required streaming and resiliency capabilities with the least infrastructure management effort.
An enterprise application uses an Amazon Bedrock foundation model (FM) to process and analyze 50 to 200 pages of technical documents. Users are experiencing inconsistent responses and receiving truncated outputs when processing documents that exceed the FM’s context window limits.
Which solution will resolve this problem?
- A . Configure fixed-size chunking at 4,000 tokens for each chunk with 20% overlap. Use application-level logic to link multiple chunks sequentially until the FM’s maximum context window of 200,000 tokens is reached before making inference calls.
- B . Use hierarchical chunking with parent chunks of 8,000 tokens and child chunks of 2,000 tokens. Use Amazon Bedrock Knowledge Bases built-in retrieval to automatically select relevant parent chunks based on query context. Configure overlap tokens to maintain semantic continuity.
- C . Use semantic chunking with a breakpoint percentile threshold of 95% and a buffer size of 3 sentences. Use the RetrieveAndGenerate API to dynamically select the most relevant chunks based on embedding similarity scores.
- D . Create a pre-processing AWS Lambda function that analyzes document token count by using the FM’s tokenizer. Configure the Lambda function to split documents into equal segments that fit within 80% of the context window. Configure the Lambda function to process each segment independently before aggregating the results.
C
Explanation:
Option C directly addresses the root cause of truncated and inconsistent responses by using AWS-recommended semantic chunking and dynamic retrieval rather than static or sequential chunk processing. Amazon Bedrock documentation emphasizes that foundation models have fixed context windows and that sending oversized or poorly structured input can lead to truncation, loss of context, and degraded output quality.
Semantic chunking breaks documents based on meaning instead of fixed token counts. By using a breakpoint percentile threshold and sentence buffers, the content remains coherent and semantically complete. This approach reduces the likelihood that important concepts are split across chunks, which is a common cause of inconsistent summarization results.
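As an illustration of the percentile idea (not the exact Bedrock implementation), breakpoints can be placed wherever the cosine distance between adjacent sentence embeddings reaches the chosen percentile of all such distances:

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

def semantic_breakpoints(sentence_vecs, percentile=95):
    """Return sentence indices where a new chunk should start: the distance
    between adjacent sentence embeddings is at or above the given percentile
    of all adjacent distances. A simplified sketch of percentile breakpoints."""
    dists = [cosine_distance(a, b) for a, b in zip(sentence_vecs, sentence_vecs[1:])]
    if not dists:
        return []
    idx = min(len(dists) - 1, int(len(dists) * percentile / 100))
    threshold = sorted(dists)[idx]
    return [i + 1 for i, d in enumerate(dists) if d >= threshold]
```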
The RetrieveAndGenerate API is designed specifically to handle large documents that exceed a model’s context window. Instead of forcing all content into a single inference call, the API generates embeddings for chunks and dynamically selects only the most relevant chunks based on similarity to the user query. This ensures that the FM receives only high-value context while staying within its context window limits.
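The retrieval side can be illustrated with a toy ranking-and-budgeting routine; the chunk fields (`text`, `vec`, `tokens`) and the cosine scoring are assumptions standing in for the managed embedding store:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def select_chunks(query_vec, chunks, budget_tokens):
    """Pick the most query-relevant chunks that fit in the context budget.

    `chunks` is a list of dicts with hypothetical keys: 'text', 'vec'
    (the chunk embedding), and 'tokens' (the chunk token count).
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    selected, used = [], 0
    for c in ranked:
        if used + c["tokens"] <= budget_tokens:
            selected.append(c["text"])
            used += c["tokens"]
    return selected
```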
Option A is ineffective because chaining chunks sequentially does not align with how FMs process context and risks exceeding context limits or introducing irrelevant information.
Option B improves structure but still relies on larger parent chunks, which can lead to inefficiencies when processing very large documents.
Option D processes segments independently, which often causes loss of global context and inconsistent summaries.
Therefore, Option C is the most robust, AWS-aligned solution for resolving truncation and consistency issues when processing large technical documents with Amazon Bedrock.
A medical device company wants to feed reports of medical procedures that used the company’s devices into an AI assistant. To protect patient privacy, the AI assistant must expose patient personally identifiable information (PII) only to surgeons. The AI assistant must redact PII for engineers. The AI assistant must reference only medical reports that are less than 3 years old.
The company stores reports in an Amazon S3 bucket as soon as each report is published. The company has already set up a knowledge base by using Amazon Bedrock Knowledge Bases. The AI assistant uses Amazon Cognito to authenticate users.
Which solution will meet these requirements?
- A . Enable Amazon Macie PII detection on the S3 bucket. Use an S3 trigger to invoke an AWS Lambda function that redacts PII from the reports. Configure the Lambda function to delete outdated documents and invoke knowledge base syncing.
- B . Invoke an AWS Lambda function to sync the S3 bucket and the knowledge base when a new report is uploaded. Use a second Lambda function with Amazon Comprehend to redact PII for engineers. Use S3 Lifecycle rules to remove reports older than 3 years.
- C . Set up an S3 Lifecycle configuration to remove reports that are older than 3 years. Schedule an AWS Lambda function to run daily syncs between the bucket and the knowledge base. When users interact with the AI assistant, apply a guardrail configuration selected based on the user’s Cognito user group to redact PII from responses when required.
- D . Create a second knowledge base. Use Lambda and Amazon Comprehend to redact PII before syncing to the second knowledge base. Route users to the appropriate knowledge base based on Cognito group membership.
C
Explanation:
Option C is the correct solution because it enforces privacy controls at inference time, not at ingestion time, which is necessary when different user roles need different visibility into the same underlying data.
Using an S3 Lifecycle configuration ensures that documents older than 3 years are automatically removed, guaranteeing that the knowledge base references only compliant, recent medical reports.
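A lifecycle rule that expires reports after 3 years (roughly 1,095 days) might look like the following; the rule ID and the `reports/` prefix are hypothetical:

```json
{
  "Rules": [
    {
      "ID": "expire-reports-after-3-years",
      "Filter": { "Prefix": "reports/" },
      "Status": "Enabled",
      "Expiration": { "Days": 1095 }
    }
  ]
}
```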
Scheduling Lambda-based syncs keeps the knowledge base aligned with the bucket contents without introducing complex per-upload orchestration.
The most important requirement is role-based PII exposure. Amazon Bedrock guardrails can be applied dynamically at inference time, allowing the system to select a guardrail configuration based on the authenticated user’s Amazon Cognito group. Surgeons receive full responses, while engineers receive responses with PII masked, all without duplicating data or maintaining multiple knowledge bases.
This approach preserves a single source of truth for medical reports while enforcing privacy through response-level controls. It also maintains full auditability of access and redaction behavior.
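The group-to-guardrail selection can be sketched as a simple lookup; the group names and guardrail identifiers are hypothetical:

```python
# Hypothetical mapping from Cognito user groups to Bedrock guardrail configs.
GUARDRAILS = {
    "surgeons": {"guardrailIdentifier": "gr-full-access", "guardrailVersion": "1"},
    "engineers": {"guardrailIdentifier": "gr-pii-redaction", "guardrailVersion": "1"},
}

def guardrail_for(cognito_groups):
    """Pick the guardrail for a user's groups, defaulting to the strictest."""
    if "surgeons" in cognito_groups:
        return GUARDRAILS["surgeons"]
    # Anyone who is not a surgeon gets the PII-masking guardrail.
    return GUARDRAILS["engineers"]
```

The selected configuration would then be passed with each model invocation, keeping redaction a response-level concern rather than a data-duplication concern.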
Option A permanently removes PII and violates surgeon access requirements.
Option B redacts data inconsistently and couples privacy logic to ingestion.
Option D doubles storage, increases cost, and introduces data drift risk.
Therefore, Option C best meets privacy, compliance, scalability, and operational efficiency requirements.
An elevator service company has developed an AI assistant application by using Amazon Bedrock. The application generates elevator maintenance recommendations to support the company’s elevator technicians. The company uses Amazon Kinesis Data Streams to collect the elevator sensor data.
New regulatory rules require that a human technician must review all AI-generated recommendations. The company needs to establish human oversight workflows to review and approve AI recommendations. The company must store all human technician review decisions for audit purposes.
Which solution will meet these requirements?
- A . Create a custom approval workflow by using AWS Lambda functions and Amazon SQS queues for human review of AI recommendations. Store all review decisions in Amazon DynamoDB for audit purposes.
- B . Create an AWS Step Functions workflow that has a human approval step that uses the waitForTaskToken API to pause execution. After a human technician completes a review, use an AWS Lambda function to call the SendTaskSuccess API with the approval decision. Store all review decisions in Amazon DynamoDB.
- C . Create an AWS Glue workflow that has a human approval step. After the human technician review, integrate the application with an AWS Lambda function that calls the SendTaskSuccess API. Store all human technician review decisions in Amazon DynamoDB.
- D . Configure Amazon EventBridge rules with custom event patterns to route AI recommendations to human technicians for review. Create AWS Glue jobs to process human technician approval queues. Use Amazon ElastiCache to cache all human technician review decisions.
B
Explanation:
AWS Step Functions provides native support for human-in-the-loop workflows, making it the best fit for regulatory oversight requirements. The waitForTaskToken integration pattern is explicitly designed to pause a workflow until an external actor, such as a human reviewer, completes a task.
In this architecture, AI-generated recommendations are sent to a human technician for review. The workflow pauses execution using a task token. Once the technician approves or rejects the recommendation, an AWS Lambda function calls SendTaskSuccess or SendTaskFailure, allowing the workflow to continue deterministically.
This approach ensures full auditability, as Step Functions records every state transition, timestamp, and execution path. Storing review outcomes in Amazon DynamoDB provides durable, queryable audit records required for regulatory compliance.
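The review-completion step can be sketched as pure logic that chooses between the SendTaskSuccess and SendTaskFailure callbacks and builds the DynamoDB audit item; in a real Lambda function the returned API name would map to the corresponding boto3 Step Functions call, and the field names here are hypothetical:

```python
import json
import time

def complete_review(task_token, approved, reviewer_id, notes="", now=time.time):
    """Build the Step Functions callback and the DynamoDB audit item for a review.

    Returns (api_name, payload, audit_item); `api_name` selects between
    sfn.send_task_success and sfn.send_task_failure in a real handler.
    """
    audit_item = {
        "reviewerId": reviewer_id,
        "decision": "APPROVED" if approved else "REJECTED",
        "notes": notes,
        "reviewedAt": int(now()),
    }
    if approved:
        return ("SendTaskSuccess",
                {"taskToken": task_token, "output": json.dumps(audit_item)},
                audit_item)
    return ("SendTaskFailure",
            {"taskToken": task_token, "error": "ReviewRejected", "cause": notes},
            audit_item)
```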
Option A requires custom orchestration and lacks native workflow state management.
Option C incorrectly uses AWS Glue, which is not designed for approval workflows.
Option D uses caching instead of durable audit storage and introduces unnecessary complexity.
Therefore, Option B is the AWS-recommended, lowest-risk, and most auditable solution for mandatory human review of AI outputs.
A company uses AWS Lambda functions to build an AI agent solution. A GenAI developer must set up a Model Context Protocol (MCP) server that accesses user information. The GenAI developer must also configure the AI agent to use the new MCP server. The GenAI developer must ensure that only authorized users can access the MCP server.
Which solution will meet these requirements?
- A . Use a Lambda function to host the MCP server. Grant the AI agent Lambda functions permission to invoke the Lambda function that hosts the MCP server. Configure the AI agent’s MCP client to invoke the MCP server asynchronously.
- B . Use a Lambda function to host the MCP server. Grant the AI agent Lambda functions permission to invoke the Lambda function that hosts the MCP server. Configure the AI agent to use the STDIO transport with the MCP server.
- C . Use a Lambda function to host the MCP server. Create an Amazon API Gateway HTTP API that proxies requests to the Lambda function. Configure the AI agent solution to use the Streamable HTTP transport to make requests through the HTTP API. Use Amazon Cognito to enforce OAuth 2.1.
- D . Use a Lambda layer to host the MCP server. Add the Lambda layer to the AI agent Lambda functions. Configure the agentic AI solution to use the STDIO transport to send requests to the MCP server. In the AI agent’s MCP configuration, specify the Lambda layer ARN as the command. Specify the user credentials as environment variables.
C
Explanation:
Option C is the correct solution because it provides a secure, scalable, and standards-compliant way to expose an MCP server to an AI agent while enforcing strong user authorization. The Model Context Protocol supports HTTP-based transports for remote MCP servers, making Streamable HTTP the appropriate choice when the server is hosted as a managed service rather than a local process.
Hosting the MCP server in AWS Lambda enables automatic scaling and cost-efficient execution. By placing Amazon API Gateway in front of the Lambda function, the company creates a secure, managed HTTP endpoint that the AI agent can invoke reliably. This architecture cleanly separates transport, authentication, and business logic, which aligns with AWS serverless best practices.
Using Amazon Cognito to enforce OAuth 2.1 ensures that only authenticated and authorized users can access the MCP server. This satisfies security and compliance requirements when the MCP server handles sensitive user information. Cognito integrates natively with API Gateway, removing the need for custom authentication logic and reducing operational overhead.
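The authorization check behind the HTTP API can be sketched as a claims test on an already-verified token. Signature validation is assumed to happen upstream (for example, in an API Gateway JWT authorizer backed by Cognito), and the scope name is hypothetical:

```python
import time

def authorize_mcp_request(claims, required_scope="mcp/invoke", now=time.time):
    """Check decoded OAuth token claims before allowing an MCP call.

    `claims` is the already-verified JWT payload; this sketch only checks
    expiry and the presence of a required scope.
    """
    if claims.get("exp", 0) <= now():
        return False
    scopes = claims.get("scope", "").split()
    return required_scope in scopes
```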
Option A lacks user-level authorization controls.
Option B and Option D rely on STDIO transport, which is intended for local or tightly coupled processes and is not suitable for distributed, serverless architectures.
Option D also introduces security risks by handling credentials through environment variables.
Therefore, Option C best meets the requirements for secure access control, scalability, and correct MCP integration in an AWS-based AI agent architecture.
