Latest Amazon Web Services AIP-C01 Dumps PDF Questions Answers 2025

AWS Certified Generative AI Developer - Professional Questions and Answers

Question 1

A bank is developing a generative AI (GenAI)-powered AI assistant that uses Amazon Bedrock to assist the bank’s website users with account inquiries and financial guidance. The bank must ensure that the AI assistant does not reveal any personally identifiable information (PII) in customer interactions.

The AI assistant must not send PII in prompts to the GenAI model. The AI assistant must not respond to customer requests to provide investment advice. The bank must collect audit logs of all customer interactions, including any images or documents that are transmitted during customer interactions.

Which solution will meet these requirements with the LEAST operational effort?

Options:

Use Amazon Macie to detect and redact PII in user inputs and in the model responses. Apply prompt engineering techniques to force the model to avoid investment advice topics. Use AWS CloudTrail to capture conversation logs.

Use an AWS Lambda function and Amazon Comprehend to detect and redact PII. Use Amazon Comprehend topic modeling to prevent the AI assistant from discussing investment advice topics. Set up custom metrics in Amazon CloudWatch to capture customer conversations.

Configure Amazon Bedrock guardrails to apply a sensitive information policy to detect and filter PII. Set up a topic policy to ensure that the AI assistant avoids investment advice topics. Use the Converse API to log model invocations. Enable delivery and image logging to Amazon S3.

Use regex controls to match patterns for PII. Apply prompt engineering techniques to avoid returning PII or investment advice topics to customers. Enable model invocation logging, delivery logging, and image logging to Amazon S3.

Buy Now

Answer:

Explanation:

Option C is the correct solution because Amazon Bedrock guardrails are purpose-built to enforce defense-in-depth safety controls for GenAI applications with minimal operational overhead. Guardrails provide managed, policy-based enforcement that operates before prompts are sent to the foundation model and after responses are generated, which directly satisfies the requirement that PII must not be sent to the model and must not appear in outputs.

By configuring a sensitive information policy, the application can automatically detect and redact PII in user inputs and model responses without building custom preprocessing pipelines. This approach is more reliable and scalable than regex or prompt engineering techniques, which are brittle and error-prone for sensitive data handling.

The topic policy capability in Amazon Bedrock guardrails allows the bank to explicitly block investment advice topics, ensuring regulatory compliance. This policy-based approach is safer and more auditable than attempting to steer the model only through prompt instructions.

Using the Converse API enables structured, standardized interactions with the model and supports consistent logging of requests and responses. Enabling delivery logging and image logging to Amazon S3 ensures that all customer interactions, including documents and images, are captured in a durable, auditable storage layer. This directly supports compliance, regulatory audits, and forensic analysis.

Option A incorrectly relies on Amazon Macie, which is designed for data-at-rest discovery rather than real-time conversational filtering. Option B introduces custom Lambda pipelines and topic modeling, increasing operational complexity. Option D relies on regex and prompt engineering, which do not meet financial-grade compliance standards.

Therefore, Option C delivers the strongest security, governance, and auditability with the least operational effort.

Question 2

A company has a generative AI (GenAI) application that uses Amazon Bedrock to provide real-time responses to customer queries. The company has noticed intermittent failures with API calls to foundation models (FMs) during peak traffic periods.

The company needs a solution to handle transient errors and provide detailed observability into FM performance. The solution must prevent cascading failures during throttling events and provide distributed tracing across service boundaries to identify latency contributors. The solution must also enable correlation of performance issues with specific FM characteristics.

Which solution will meet these requirements?

Options:

Implement a custom retry mechanism with a fixed delay of 1 second between retries. Configure Amazon CloudWatch alarms to monitor the application’s error rates and latency metrics.

Configure the AWS SDK with standard retry mode and exponential backoff with jitter. Use AWS X-Ray tracing with annotations to identify and filter service components.

Implement client-side caching of all FM responses. Add custom logging statements in the application code to record API call durations.

Configure the AWS SDK with adaptive retry mode. Use AWS CloudTrail distributed tracing to monitor throttling events.

Answer:

Explanation:

Option B best meets the combined resiliency and observability requirements because it applies AWS-recommended retry behavior for transient throttling and enables true distributed tracing across service boundaries. During peak traffic, intermittent failures are commonly caused by throttling and other transient conditions. The AWS SDK standard retry mode provides exponential backoff with jitter, which reduces synchronized retry storms, prevents cascading failures, and improves overall system stability. Jitter is important because it spreads retry attempts over time, reducing load amplification during throttling events.

For observability, AWS X-Ray provides distributed tracing that follows a request across components such as API Gateway or load balancers, application services, and downstream calls to Amazon Bedrock. X-Ray can identify where latency is being introduced and which downstream call is contributing most to end-to-end response time. This is required to “identify latency contributors” and isolate performance issues under load.

The requirement also states that the company must correlate performance issues with specific FM characteristics. X-Ray annotations are designed for this purpose: the application can annotate traces with the model ID, inference parameters, region, or inference profile used. This enables filtering and analysis (for example, comparing latency or error patterns by model, parameter set, or endpoint configuration) without building a separate telemetry system.

Option A’s fixed-delay retries increase synchronized retry behavior and do not provide distributed tracing. Option C does not prevent cascading failures and cannot provide cross-service tracing. Option D is incorrect because CloudTrail is an audit logging service and does not provide distributed tracing for request latency analysis.

Therefore, Option B provides the correct combination of resilient retries and deep, model-correlated distributed observability for Amazon Bedrock workloads.

Question 3

A company uses an organization in AWS Organizations with all features enabled to manage multiple AWS accounts. Employees use Amazon Bedrock across multiple accounts. The company must prevent specific topics and proprietary information from being included in prompts to Amazon Bedrock models. The company must ensure that employees can use only approved Amazon Bedrock models. The company wants to manage these controls centrally.

Which combination of solutions will meet these requirements? (Select TWO.)

Options:

Create an IAM permissions boundary for each employee's IAM role. Configure the permissions boundary to require an approved Amazon Bedrock guardrail identifier to invoke Amazon Bedrock models. Create an SCP that allows employees to use only approved models.

Create an SCP that allows employees to use only approved models. Configure the SCP to require employees to specify a guardrail identifier in calls to invoke an approved model.

Create an SCP that prevents an employee from invoking a model if a centrally deployed guardrail identifier is not specified in a call to the model. Create a permissions boundary on each employee's IAM role that allows each employee to invoke only approved models.

Use AWS CloudFormation to create a custom Amazon Bedrock guardrail that has a block filtering policy. Use stack sets to deploy the guardrail to each account in the organization.

Use AWS CloudFormation to create a custom Amazon Bedrock guardrail that has a mask filtering policy. Use stack sets to deploy the guardrail to each account in the organization.

Answer:

C, D

Explanation:

The correct combination is C and D because together they enforce centralized governance over both model access and prompt content controls, which are the two core requirements of the scenario.

To ensure employees can use only approved Amazon Bedrock models, governance must be enforced at the organization level and not rely on individual application logic. Service Control Policies (SCPs) are the strongest control mechanism available in AWS Organizations because they define the maximum permissions an account or principal can have. In option C, the SCP prevents any Amazon Bedrock model invocation unless a centrally approved guardrail identifier is specified. This ensures that guardrails are always enforced, regardless of how or where the invocation originates. The additional use of IAM permissions boundaries ensures that even within allowed accounts, employees are restricted to invoking only explicitly approved foundation models.

To prevent specific topics and proprietary information from being included in prompts, Amazon Bedrock Guardrails must be used. Guardrails operate inline during model invocation and can block disallowed content before it is processed by the model. Option D correctly specifies a block filtering policy, which is appropriate when content must be prevented entirely rather than partially redacted. Deploying the guardrail using AWS CloudFormation StackSets allows the company to centrally manage and consistently deploy the same guardrail configuration across all accounts in the organization, ensuring uniform enforcement.

Option E uses mask filtering, which is better suited for redacting sensitive output rather than preventing prohibited content from being submitted in prompts. Option B attempts to use SCPs alone but does not enforce guardrail deployment or content filtering. Option A incorrectly places guardrail enforcement in permissions boundaries, which are not designed to validate request parameters such as guardrail identifiers.

By combining SCP-based enforcement with centrally deployed Bedrock guardrails, options C and D together provide strong, scalable, and centrally managed controls for both content safety and model governance across the organization.

Question 4

A pharmaceutical company is developing a Retrieval Augmented Generation (RAG) application that uses an Amazon Bedrock knowledge base. The knowledge base uses Amazon OpenSearch Service as a data source for more than 25 million scientific papers. Users report that the application produces inconsistent answers that cite irrelevant sections of papers when queries span methodology, results, and discussion sections of the papers.

The company needs to improve the knowledge base to preserve semantic context across related paragraphs on the scale of the entire corpus of data.

Which solution will meet these requirements?

Options:

Configure the knowledge base to use fixed-size chunking. Set a 300-token maximum chunk size and a 10% overlap between chunks. Use an appropriate Amazon Bedrock embedding model.

Configure the knowledge base to use hierarchical chunking. Use parent chunks that contain 1,000 tokens and child chunks that contain 200 tokens. Set a 50-token overlap between chunks.

Configure the knowledge base to use semantic chunking. Use a buffer size of 1 and a breakpoint percentile threshold of 85% to determine chunk boundaries based on content meaning.

Configure the knowledge base not to use chunking. Manually split each document into separate files before ingestion. Apply post-processing reranking during retrieval.

Answer:

Explanation:

Option B is the best solution because hierarchical chunking is specifically designed to preserve broader semantic context while still enabling precise retrieval at paragraph or sub-paragraph granularity. The problem described—answers citing irrelevant sections when a query spans multiple paper sections—often occurs when chunks are either too small (losing cross-paragraph context) or too “flat” (retrieving isolated snippets without their surrounding rationale).

In a scientific paper, related information is frequently distributed across methodology, results, and discussion. Flat, fixed-size chunking (Option A) can split these logically connected ideas into separate chunks, causing retrieval to surface fragments that match a term but not the full intent. Semantic chunking (Option C) improves boundary placement, but it does not inherently provide a multi-resolution structure that helps preserve section-level continuity at massive scale.

Hierarchical chunking solves this by creating parent chunks (larger context windows) that capture broader section context and child chunks (smaller units) that retain retrieval precision. When the retriever identifies relevant child chunks, it can also bring in the associated parent context so the foundation model sees the surrounding methodological or discussion framing. The defined overlaps further reduce the risk that key transitions or references are split across chunks.

This approach is well suited for a corpus of 25 million papers because it improves relevance without requiring a custom reranking model or a manual preprocessing pipeline. It remains operationally efficient because it is configured at the knowledge base level rather than implemented through custom code per document.

Option D introduces high operational complexity and inconsistent document handling at scale. Therefore, Option B best meets the requirement to preserve semantic context across related paragraphs and improve citation relevance across scientific paper sections.

Question 5

A financial services company is deploying a generative AI (GenAI) application that uses Amazon Bedrock to assist customer service representatives to provide personalized investment advice to customers. The company must implement a comprehensive governance solution that follows responsible AI practices and meets regulatory requirements.

The solution must detect and prevent hallucinations in recommendations. The solution must have safety controls for customer interactions. The solution must also monitor model behavior drift in real time and maintain audit trails of all prompt-response pairs for regulatory review. The company must deploy the solution within 60 days. The solution must integrate with the company's existing compliance dashboard and respond to customers within 200 ms.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Configure Amazon Bedrock guardrails to apply custom content filters and toxicity detection. Use Amazon Bedrock Model Evaluation to detect hallucinations. Store prompt-response pairs in Amazon DynamoDB to capture audit trails and set a TTL. Integrate Amazon CloudWatch custom metrics with the existing compliance dashboard.

Deploy Amazon Bedrock and use AWS PrivateLink to access the application securely. Use AWS Lambda functions to implement custom prompt validation. Store prompt-response pairs in an Amazon S3 bucket and configure S3 Lifecycle policies. Create custom Amazon CloudWatch dashboards to monitor model performance metrics.

Use Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases to ground responses. Use Amazon Bedrock Guardrails to enforce content safety. Use Amazon OpenSearch Service to store and index prompt-response pairs. Integrate OpenSearch Service with Amazon QuickSight to create compliance reports and to detect model behavior drift.

Use Amazon SageMaker Model Monitor to detect model behavior drift. Use AWS WAF to filter content. Store customer interactions in an encrypted Amazon RDS database. Use Amazon API Gateway to create custom HTTP APIs to integrate with the compliance dashboard.

Answer:

Explanation:

Option A is the correct solution because it uses native Amazon Bedrock governance and evaluation capabilities to meet regulatory, performance, and deployment timeline requirements with the least operational overhead.

Amazon Bedrock guardrails provide built-in safety controls that enforce responsible AI policies directly during inference. Custom content filters and toxicity detection protect customer interactions and prevent disallowed investment guidance patterns without requiring custom application logic. Guardrails operate inline and are optimized for low latency, which helps meet the strict 200 ms response-time requirement.

Hallucination detection is addressed through Amazon Bedrock Model Evaluation, which supports automated evaluation at scale using LLM-as-a-judge techniques. This enables the company to detect factual inaccuracies and policy violations systematically, without building custom evaluation pipelines or requiring extensive human review. Evaluation outputs can be surfaced as metrics.

Storing all prompt-response pairs in Amazon DynamoDB provides a low-latency, highly scalable audit store that aligns with financial regulatory requirements. Using TTL enforces data retention policies automatically, reducing compliance risk and storage overhead.

Amazon CloudWatch custom metrics integrate seamlessly with existing compliance dashboards, allowing near–real-time monitoring of safety interventions, hallucination rates, and drift indicators. CloudWatch anomaly detection can be applied to these metrics to surface behavior changes quickly.

Option B relies on custom Lambda logic and S3-based auditing, increasing latency and operational complexity. Option C introduces additional services that increase setup time and may exceed the 60-day deployment window. Option D uses non–Bedrock-native monitoring and adds unnecessary infrastructure layers.

Therefore, Option A provides the most complete, compliant, and low-overhead governance solution for a regulated GenAI financial services application.

Question 6

A healthcare company is developing an application to process medical queries. The application must answer complex queries with high accuracy by reducing semantic dilution. The application must refer to domain-specific terminology in medical documents to reduce ambiguity in medical terminology. The application must be able to respond to 1,000 queries each minute with response times less than 2 seconds.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Use Amazon API Gateway to route incoming queries to an Amazon Bedrock agent. Configure the agent to use an Anthropic Claude model to decompose queries and an Amazon Titan model to expand queries. Create an Amazon Bedrock knowledge base to store the reference medical documents.

Configure an Amazon Bedrock knowledge base to store the reference medical documents. Enable query decomposition in the knowledge base. Configure an Amazon Bedrock flow that uses a foundation model and the knowledge base to support the application.

Use Amazon SageMaker AI to host custom ML models for both query decomposition and query expansion. Configure Amazon Bedrock knowledge bases to store the reference medical documents. Encrypt the documents in the knowledge base.

Create an Amazon Bedrock agent to orchestrate multiple AWS Lambda functions to decompose queries. Create an Amazon Bedrock knowledge base to store the reference medical documents. Use the agent’s built-in knowledge base capabilities. Add deep research and reasoning capabilities to the agent to reduce ambiguity in the medical terminology.

Answer:

Explanation:

Option B provides the least operational overhead because it keeps the solution primarily inside managed Amazon Bedrock capabilities, minimizing custom orchestration code and infrastructure to operate. The core requirements are domain grounding, reduced semantic dilution for complex questions, and consistent low-latency responses at high request volume. A Bedrock knowledge base is purpose-built for Retrieval Augmented Generation by ingesting domain documents, chunking content, generating embeddings, and retrieving the most relevant passages at runtime. This directly addresses the need to reference domain-specific medical terminology from authoritative documents to reduce ambiguity and improve factual accuracy.

Reducing semantic dilution typically requires improving the retrieval query so that the retriever focuses on the most relevant concepts, especially for long or multi-intent questions. Enabling query decomposition allows the system to break a complex medical query into smaller, more targeted sub-queries. This increases retrieval precision and recall for each sub-question, which helps the model generate a more accurate synthesized response grounded in the retrieved medical context.

Amazon Bedrock Flows provide a managed way to orchestrate multi-step generative AI workflows, such as preprocessing the input, performing retrieval against the knowledge base, invoking a foundation model, and formatting the final response. Because flows are managed, the company avoids maintaining custom state machines, multiple Lambda functions, or bespoke routing logic. This reduces operational overhead while still supporting repeatable, observable execution.

Compared with the alternatives, option A introduces an agent plus API Gateway routing and multiple model choices, increasing configuration and runtime complexity. Option C requires hosting and scaling custom models on SageMaker AI, which adds significant operational burden and latency risk. Option D relies on multiple Lambda functions orchestrated by an agent, which adds more moving parts and increases cold-start and integration overhead. Option B most directly meets the requirements with the smallest operational footprint.

Question 7

A company is using Amazon Bedrock to develop a customer support AI assistant. The AI assistant must respond to customer questions about their accounts. The AI assistant must not expose personal information in responses. The company must comply with data residency policies by ensuring that all processing occurs within the same AWS Region where each customer is located.

The company wants to evaluate how effective the AI assistant is at preventing the exposure of personal information before the company makes the AI assistant available to customers.

Which solution will meet these requirements?

Options:

Configure a cross-Region Amazon Bedrock guardrail to apply sensitive information filters. Set the guardrail to detect mode during development and testing. Switch to block mode for production deployment.

Configure an Amazon Bedrock guardrail to apply sensitive information filters. Set the guardrail to mask mode during development and testing. Switch to block mode for production deployment. Deploy a copy of the guardrail to each Region where the company operates.

Configure an Amazon Bedrock guardrail to apply content and topic filters. Set the guardrail to detect mode during development, testing, and production. Disable invocation logging for the Amazon Bedrock model.

Configure a cross-Region Amazon Bedrock guardrail to apply a set of content and word filters. Set the guardrail to detect mode during development and testing. Switch to mask mode for production deployment.

Answer:

Explanation:

Option B best meets all stated requirements by correctly combining PII protection, evaluation before launch, and data residency compliance using Amazon Bedrock Guardrails. Amazon Bedrock guardrails provide native sensitive information filtering that operates inline during model invocation, making them well suited for preventing personal data exposure in customer-facing AI assistants.

The requirement to evaluate how effective the AI assistant is at preventing exposure before release is best addressed by using mask mode during development and testing. Mask mode allows responses to be generated while automatically redacting detected personal information, making it easy for developers and reviewers to see where and how PII would have appeared. This provides concrete validation that the guardrail rules are correctly configured without fully blocking responses, which is ideal for quality assurance and pre-production evaluation.

For production, switching the guardrail to block mode ensures that responses containing personal information are fully prevented from being returned to users. This offers the strongest protection and aligns with compliance expectations for customer account data. Block mode is appropriate once confidence in the guardrail configuration has been established during testing.

The data residency requirement is addressed by deploying a copy of the guardrail in each AWS Region where the application operates. Amazon Bedrock guardrails are Region-specific resources, and using Region-local guardrails ensures that inference, filtering, and enforcement all occur within the same Region as the customer data. This avoids cross-Region processing and helps the company comply with regulatory and contractual data residency policies.

Option A and D incorrectly rely on cross-Region guardrails, which can violate data residency constraints. Option C focuses on topic filtering rather than sensitive information filtering and keeps detect mode enabled in production, which does not actively prevent PII exposure. Therefore, B is the only option that fully satisfies safety, compliance, and evaluation requirements.

Question 8

A company is using Amazon Bedrock and Anthropic Claude 3 Haiku to develop an AI assistant. The AI assistant normally processes 10,000 requests each hour but experiences surges of up to 30,000 requests each hour during peak usage periods. The AI assistant must respond within 2 seconds while operating across multiple AWS Regions.

The company observes that during peak usage periods, the AI assistant experiences throughput bottlenecks that cause increased latency and occasional request timeouts. The company must resolve the performance issues.

Which solution will meet this requirement?

Options:

Purchase provisioned throughput and sufficient model units (MUs) in a single Region. Configure the application to retry failed requests with exponential backoff.

Implement token batching to reduce API overhead. Use cross-Region inference profiles to automatically distribute traffic across available Regions.

Set up auto scaling AWS Lambda functions in each Region. Implement client-side round-robin request distribution. Purchase one model unit (MU) of provisioned throughput as a backup.

Implement batch inference for all requests by using Amazon S3 buckets across multiple Regions. Use Amazon SQS to set up an asynchronous retrieval process.

Answer:

Explanation:

Option B is the correct solution because it directly addresses both throughput bottlenecks and latency requirements using native Amazon Bedrock performance optimization features that are designed for real-time, high-volume generative AI workloads.

Amazon Bedrock supports cross-Region inference profiles, which allow applications to transparently route inference requests across multiple AWS Regions. During peak usage periods, traffic is automatically distributed to Regions with available capacity, reducing throttling, request queuing, and timeout risks. This approach aligns with AWS guidance for building highly available, low-latency GenAI applications that must scale elastically across geographic boundaries.

Token batching further improves efficiency by combining multiple inference requests into a single model invocation where applicable. AWS Generative AI documentation highlights batching as a key optimization technique to reduce per-request overhead, improve throughput, and better utilize model capacity. This is especially effective for lightweight, low-latency models such as Claude 3 Haiku, which are designed for fast responses and high request volumes.

Option A does not meet the requirement because purchasing provisioned throughput in a single Region creates a regional bottleneck and does not address multi-Region availability or traffic spikes beyond reserved capacity. Retries increase load and latency rather than resolving the root cause.

Option C improves application-layer scaling but does not solve model-side throughput limits. Client-side round-robin routing lacks awareness of real-time model capacity and can still send traffic to saturated Regions.

Option D is unsuitable because batch inference with asynchronous retrieval is designed for offline or non-interactive workloads. It cannot meet a strict 2-second response time requirement for an interactive AI assistant.

Therefore, Option B provides the most effective and AWS-aligned solution to achieve low latency, global scalability, and high throughput during peak usage periods.

Question 9

A university recently digitized a collection of archival documents, academic journals, and manuscripts. The university stores the digital files in an AWS Lake Formation data lake.

The university hires a GenAI developer to build a solution to allow users to search the digital files by using text queries. The solution must return journal abstracts that are semantically similar to a user's query. Users must be able to search the digitized collection based on text and metadata that is associated with the journal abstracts. The metadata of the digitized files does not contain keywords. The solution must match similar abstracts to one another based on the similarity of their text. The data lake contains fewer than 1 million files.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Use Amazon Titan Embeddings in Amazon Bedrock to create vector representations of the digitized files. Store embeddings in the OpenSearch Neural plugin for Amazon OpenSearch Service.

Use Amazon Comprehend to extract topics from the digitized files. Store the topics and file metadata in an Amazon Aurora PostgreSQL database. Query the abstract metadata against the data in the Aurora database.

Use Amazon SageMaker AI to deploy a sentence-transformer model. Use the model to create vector representations of the digitized files. Store embeddings in an Amazon Aurora PostgreSQL database that has the pgvector extension.

Use Amazon Titan Embeddings in Amazon Bedrock to create vector representations of the digitized files. Store embeddings in an Amazon Aurora PostgreSQL Serverless database that has the pgvector extension.

Answer:

Explanation:

Option D is the best choice because it delivers true semantic search with the smallest operational footprint by combining a fully managed embedding service with an automatically scaling vector-capable database. The university’s requirement is explicitly semantic: the metadata has no keywords, and the system must match abstracts based on similarity of meaning. This is a direct fit for an embeddings-based approach, where each abstract is converted into a vector representation and searched using vector similarity. Amazon Titan Embeddings in Amazon Bedrock provides a managed way to generate these vectors without hosting or maintaining an ML model, eliminating the operational work of model provisioning, patching, scaling, and lifecycle management.

For storage and retrieval, Amazon Aurora PostgreSQL Serverless with the pgvector extension supports vector storage and similarity search while minimizing infrastructure operations. Aurora Serverless reduces capacity planning and scaling tasks because it can automatically adjust to changes in workload, which is valuable for a university search application with variable usage patterns. With fewer than 1 million files, a PostgreSQL-based vector store is commonly operationally simpler than running a dedicated search cluster, while still meeting the requirement to query using both text-derived similarity and associated metadata filters stored alongside the vectors.

Option A can also enable vector search, but operating an OpenSearch domain typically introduces additional concerns such as domain sizing, shard strategy, cluster scaling, and performance tuning for k-NN workloads. Option C increases operational overhead the most because it requires deploying and operating a sentence-transformer model endpoint in SageMaker AI, including scaling, monitoring, and model management. Option B does not meet the semantic similarity requirement reliably because topic extraction is not equivalent to embedding-based semantic matching, especially when the metadata lacks keywords and the system must compare abstracts by meaning.

Therefore, D best satisfies semantic search needs with the least operational overhead.

Question 10

A company provides a service that helps users from around the world discover new restaurants. The service has 50 million monthly active users. The company wants to implement a semantic search solution across a database that contains 20 million restaurants and 200 million reviews. The company currently stores the data in PostgreSQL.

The solution must support complex natural language queries and return results for at least 95% of queries within 500 ms. The solution must maintain data freshness for restaurant details that update hourly. The solution must also scale cost-effectively during peak usage periods.

Which solution will meet these requirements with the LEAST development effort?

Options:

Migrate the restaurant data to Amazon OpenSearch Service. Implement keyword-based search rules that use custom analyzers and relevance tuning to find restaurants based on attributes such as cuisine type, features, and location. Create Amazon API Gateway HTTP API endpoints to transform user queries into structured search parameters.

Migrate the restaurant data to Amazon OpenSearch Service. Use a foundation model (FM) in Amazon Bedrock to generate vector embeddings from restaurant descriptions, reviews, and menu items. When users submit natural language queries, convert the queries to embeddings by using the same FM. Perform k-nearest neighbors (k-NN) searches to find semantically similar results.

Keep the restaurant data in PostgreSQL and implement a pgvector extension. Use a foundation model (FM) in Amazon Bedrock to generate vector embeddings from restaurant data. Store the vector embeddings directly in PostgreSQL. Create an AWS Lambda function to convert natural language queries to vector representations by using the same FM. Configure the Lambda function to perform similarity searches within the database.

Migrate restaurant data to an Amazon Bedrock knowledge base by using a custom ingestion pipeline. Configure the knowledge base to automatically generate embeddings from restaurant information. Use the Amazon Bedrock Retrieve API with built-in vector search capabilities to query the knowledge base directly by using natural language input.

Answer:

Explanation:

Option B best satisfies the requirements while minimizing development effort by combining managed semantic search capabilities with fully managed foundation models. AWS Generative AI guidance describes semantic search as a vector-based retrieval pattern where both documents and user queries are embedded into a shared vector space. Similarity search (such as k-nearest neighbors) then retrieves results based on meaning rather than exact keywords.

Amazon OpenSearch Service natively supports vector indexing and k-NN search at scale. This makes it well suited for large datasets such as 20 million restaurants and 200 million reviews while still achieving sub-second latency for the majority of queries. Because OpenSearch is a distributed, managed service, it automatically scales during peak traffic periods and provides cost-effective performance compared with building and tuning custom vector search pipelines on relational databases.

Using Amazon Bedrock to generate embeddings significantly reduces development complexity. AWS manages the foundation models, eliminates the need for custom model hosting, and ensures consistency by using the same FM for both document embeddings and query embeddings. This aligns directly with AWS-recommended semantic search architectures and removes the need for model lifecycle management.

Hourly updates to restaurant data can be handled efficiently through incremental re-indexing in OpenSearch without disrupting query performance. This approach cleanly separates transactional data storage from search workloads, which is a best practice in AWS architectures.

Option A does not meet the semantic search requirement because keyword-based search cannot reliably interpret complex natural language intent. Option C introduces scalability and performance risks by running large-scale vector similarity searches inside PostgreSQL, which increases operational complexity. Option D adds unnecessary ingestion and abstraction layers intended for retrieval-augmented generation, not high-throughput semantic search.

Therefore, Option B provides the optimal balance of performance, scalability, data freshness, and minimal development effort using AWS Generative AI services.

Question 11

Which solution will meet these requirements with the LEAST development effort?

Options:

Migrate the restaurant data to Amazon OpenSearch Service. Implement keyword-based search rules that use custom analyzers and relevance tuning to find restaurants based on attributes such as cuisine type, feature, and location. Create Amazon API Gateway HTTP API endpoints to transform user queries into structured search parameters.

Migrate the restaurant data to an Amazon Bedrock knowledge base by using a custom ingestion pipeline. Configure the knowledge base to automatically generate embeddings from restaurant information. Use the Amazon Bedrock Retrieve API with built-in vector search capabilities to query the knowledge base directly by using natural language input.

Answer:

Explanation:

Option D requires the least development effort because it uses a managed retrieval workflow that bundles the most time-consuming parts of semantic search: embedding generation, vector indexing, and natural language retrieval. With an Amazon Bedrock knowledge base, the application does not need to implement and operate separate services to (1) generate embeddings for hundreds of millions of records, (2) store and manage vectors, (3) build query-time embedding conversion logic, and (4) implement k-NN search orchestration. Instead, the knowledge base is configured to automatically create embeddings during ingestion, and the application queries it using the Amazon Bedrock Retrieve API, which accepts natural language input and performs the vector search as a managed capability.

The performance requirement (95% of queries within 500 ms) is best served by a purpose-built vector search backend rather than running similarity search directly inside a transactional PostgreSQL system at this scale. A knowledge base is designed for retrieval patterns and can be backed by scalable vector stores, which helps meet latency goals under heavy concurrency. The hourly freshness requirement maps naturally to ingestion updates: the pipeline can re-ingest updated restaurant details on a schedule so the knowledge base remains current without building custom re-embedding workflows in application code.

Cost-effective scaling during peak periods is also easier with a managed retrieval layer because scaling the retrieval workload is separated from the operational database. This avoids overprovisioning PostgreSQL for peak semantic-search traffic and reduces the engineering effort to tune performance, sharding, indexing, and retry logic.

Options B and C can work, but they require the team to build and maintain embedding pipelines, query embedding generation, vector index management, and operational scaling strategies. Option A does not provide semantic search because it relies on keyword-based matching rather than embeddings.

Question 12

A company is building a serverless application that uses AWS Lambda functions to help students around the world summarize notes. The application uses Anthropic Claude through Amazon Bedrock. The company observes that most of the traffic occurs during evenings in each time zone. Users report experiencing throttling errors during peak usage times in their time zones.

The company needs to resolve the throttling issues by ensuring continuous operation of the application. The solution must maintain application performance quality and must not require a fixed hourly cost during low traffic periods.

Which solution will meet these requirements?

Options:

Create custom Amazon CloudWatch metrics to monitor model errors. Set provisioned throughput to a value that is safely higher than the peak traffic observed.

Create custom Amazon CloudWatch metrics to monitor model errors. Set up a failover mechanism to redirect invocations to a backup AWS Region when the errors exceed a specified threshold.

Enable invocation logging in Amazon Bedrock. Monitor key metrics such as Invocations, InputTokenCount, OutputTokenCount, and InvocationThrottles. Distribute traffic across cross-Region inference endpoints.

Enable invocation logging in Amazon Bedrock. Monitor InvocationLatency, InvocationClientErrors, and InvocationServerErrors metrics. Distribute traffic across multiple versions of the same model.

Question 13

An ecommerce company is developing a generative AI application that uses Amazon Bedrock with Anthropic Claude to recommend products to customers. Customers report that some recommended products are not available for sale on the website or are not relevant to the customer. Customers also report that the solution takes a long time to generate some recommendations.

The company investigates the issues and finds that most interactions between customers and the product recommendation solution are unique. The company confirms that the solution recommends products that are not in the company’s product catalog. The company must resolve these issues.

Which solution will meet this requirement?

Options:

Increase grounding within Amazon Bedrock Guardrails. Enable Automated Reasoning checks. Set up provisioned throughput.

Use prompt engineering to restrict the model responses to relevant products. Use streaming techniques such as the InvokeModelWithResponseStream action to reduce perceived latency for the customers.

Create an Amazon Bedrock knowledge base. Implement Retrieval Augmented Generation RAG. Set the PerformanceConfigLatency parameter to optimized.

Store product catalog data in Amazon OpenSearch Service. Validate the model’s product recommendations against the product catalog. Use Amazon DynamoDB to implement response caching.

Answer:

Explanation:

Option C best addresses both core problems: hallucinated recommendations that do not exist in the catalog and slow response times, while keeping operational overhead low. The most direct way to prevent the model from recommending unavailable products is to ground generation on authoritative product catalog data at inference time. An Amazon Bedrock knowledge base is designed for this pattern by ingesting domain data, chunking content, creating embeddings, and retrieving the most relevant catalog entries when a user asks for recommendations. Implementing Retrieval Augmented Generation ensures the foundation model receives only approved, catalog-backed context and can cite or base its output on those retrieved items. This sharply reduces the likelihood of inventing products, because the response is conditioned on retrieved catalog records rather than relying on the model’s parametric memory.

The requirement also notes that most interactions are unique. That makes response caching far less effective, because there are fewer repeated prompts to benefit from cached outputs. Instead, improving the retrieval and model invocation path is the better optimization. Using the PerformanceConfigLatency parameter set to optimized prioritizes lower latency behavior for model inference, helping meet faster recommendation generation without requiring the company to build and operate additional infrastructure.

The other options do not solve the root cause as reliably. Prompt engineering and streaming can improve perceived latency, but they do not guarantee catalog-only recommendations because the model can still hallucinate items. Guardrails can help detect or block certain undesired outputs, but without consistent catalog grounding they do not ensure every recommendation is derived from the company’s product data. Building a custom OpenSearch validation and caching layer increases operational complexity, and caching is misaligned with predominantly unique interactions.

Question 14

A financial services company is developing a real-time generative AI (GenAI) assistant to support human call center agents. The GenAI assistant must transcribe live customer speech, analyze context, and provide incremental suggestions to call center agents while a customer is still speaking. To preserve responsiveness, the GenAI assistant must maintain end-to-end latency under 1 second from speech to initial response display. The architecture must use only managed AWS services and must support bidirectional streaming to ensure that call center agents receive updates in real time.

Which solution will meet these requirements?

Options:

Use Amazon Transcribe streaming to transcribe calls. Pass the text to Amazon Comprehend for sentiment analysis. Feed the results to Anthropic Claude on Amazon Bedrock by using the InvokeModel API. Store results in Amazon DynamoDB. Use a WebSocket API to display the results.

Use Amazon Transcribe streaming with partial results enabled to deliver fragments of transcribed text before customers finish speaking. Forward text fragments to Amazon Bedrock by using the InvokeModelWithResponseStream API. Stream responses to call center agents through an Amazon API Gateway WebSocket API.

Use Amazon Transcribe batch processing to convert calls to text. Pass complete transcripts to Anthropic Claude on Amazon Bedrock by using the ConverseStream API. Return responses through an Amazon Lex chatbot interface.

Use the Amazon Transcribe streaming API with an AWS Lambda function to transcribe each audio segment. Call the Amazon Titan Embeddings model on Amazon Bedrock by using the InvokeModel API. Publish results to Amazon SNS.

Question 15

A company uses Amazon Bedrock to generate technical content for customers. The company has recently experienced a surge in hallucinated outputs when the company’s model generates summaries of long technical documents. The model outputs include inaccurate or fabricated details. The company’s current solution uses a large foundation model (FM) with a basic one-shot prompt that includes the full document in a single input.

The company needs a solution that will reduce hallucinations and meet factual accuracy goals. The solution must process more than 1,000 documents each hour and deliver summaries within 3 seconds for each document.

Which combination of solutions will meet these requirements? (Select TWO.)

Options:

Implement zero-shot chain-of-thought (CoT) instructions that require step-by-step reasoning with explicit fact verification before the model generates each summary.

Use Retrieval Augmented Generation (RAG) with an Amazon Bedrock knowledge base. Apply semantic chunking and tuned embeddings to ground summaries in source content.

Configure Amazon Bedrock guardrails to block any generated output that matches patterns that are associated with hallucinated content.

Increase the temperature parameter in Amazon Bedrock.

Prompt the Amazon Bedrock model to summarize each full document in one pass.

Question 16

A financial services company uses multiple foundation models (FMs) through Amazon Bedrock for its generative AI (GenAI) applications. To comply with a new regulation for GenAI use with sensitive financial data, the company needs a token management solution.

The token management solution must proactively alert when applications approach model-specific token limits. The solution must also process more than 5,000 requests each minute and maintain token usage metrics to allocate costs across business units.

Which solution will meet these requirements?

Options:

Develop model-specific tokenizers in an AWS Lambda function. Configure the Lambda function to estimate token usage before sending requests to Amazon Bedrock. Configure the Lambda function to publish metrics to Amazon CloudWatch and trigger alarms when requests approach thresholds. Store detailed token usage in Amazon DynamoDB to report costs.

Implement Amazon Bedrock Guardrails with token quota policies. Capture metrics on rejected requests. Configure Amazon EventBridge rules to trigger notifications based on Amazon Bedrock Guardrails metrics. Use Amazon CloudWatch dashboards to visualize token usage trends across models.

Deploy an Amazon SQS dead-letter queue for failed requests. Configure an AWS Lambda function to analyze token-related failures. Use Amazon CloudWatch Logs Insights to generate reports on token usage patterns based on error logs from Amazon Bedrock API responses.

Use Amazon API Gateway to create a proxy for all Amazon Bedrock API calls. Configure request throttling based on custom usage plans with predefined token quotas. Configure API Gateway to reject requests that will exceed token limits.

Question 17

A company has a recommendation system running on Amazon EC2 instances. The applications make API calls to Amazon Bedrock foundation models (FMs) to analyze customer behavior and generate personalized product recommendations.

The system experiences intermittent issues where some recommendations do not match customer preferences. The company needs an observability solution to monitor operational metrics and detect patterns of performance degradation compared to established baselines. The solution must generate alerts with correlation data within 10 minutes when FM behavior deviates from expected patterns.

Which solution will meet these requirements?

Options:

Configure Amazon CloudWatch Container Insights. Set up alarms for latency thresholds. Add custom token metrics using the CloudWatch embedded metric format.

Implement AWS X-Ray. Enable CloudWatch Logs Insights. Set up AWS CloudTrail and create dashboards in Amazon QuickSight.

Enable Amazon CloudWatch Application Insights. Create custom metrics for recommendation quality, token usage, and response latency using the CloudWatch embedded metric format with dimensions for request types and user segments. Configure CloudWatch anomaly detection on model metrics. Use CloudWatch Logs Insights for pattern analysis.

Use Amazon OpenSearch Service with the Observability plugin. Ingest metrics and logs through Amazon Kinesis and analyze behavior with custom queries.

Question 18

A company is designing a solution that uses foundation models (FMs) to support multiple AI workloads. Some FMs must be invoked on demand and in real time. Other FMs require consistent high-throughput access for batch processing.

The solution must support hybrid deployment patterns and run workloads across cloud infrastructure and on-premises infrastructure to comply with data residency and compliance requirements.

Which combination of steps will meet these requirements? (Select TWO.)

Options:

Use AWS Lambda to orchestrate low-latency FM inference by invoking FMs hosted on Amazon SageMaker AI asynchronous endpoints.

Configure provisioned throughput in Amazon Bedrock to ensure consistent performance for high-volume workloads.

Deploy FMs to Amazon SageMaker AI endpoints with support for edge deployment by using Amazon SageMaker Neo. Orchestrate the FMs by using AWS Lambda to support hybrid deployment.

Use Amazon Bedrock with auto-scaling to handle unpredictable traffic surges.

Use Amazon SageMaker JumpStart to host and invoke the FMs.

Question 19

An ecommerce company is developing a generative AI (GenAI) solution that uses Amazon Bedrock with Anthropic Claude to recommend products to customers. Customers report that some recommended products are not available for sale or are not relevant. Customers also report long response times for some recommendations.

The company confirms that most customer interactions are unique and that the solution recommends products not present in the product catalog.

Which solution will meet this requirement?

Options:

Increase grounding within Amazon Bedrock Guardrails. Enable automated reasoning checks. Set up provisioned throughput.

Use prompt engineering to restrict model responses to relevant products. Use streaming inference to reduce perceived latency.

Create an Amazon Bedrock Knowledge Bases and implement Retrieval Augmented Generation (RAG). Set the PerformanceConfigLatency parameter to optimized.

Store product catalog data in Amazon OpenSearch Service. Validate model recommendations against the catalog. Use Amazon DynamoDB for response caching.

Question 20

A GenAI developer is evaluating Amazon Bedrock foundation models (FMs) to enhance a Europe-based company's internal business application. The company has a multi-account landing zone in AWS Control Tower. The company uses Service Control Policies (SCPs) to allow its accounts to use only the eu-north-1 and eu-west-1 Regions. All customer data must remain in private networks within the approved AWS Regions.

The GenAI developer selects an FM based on analysis and testing and hosts the model in the eu-central-1 Region and the eu-west-3 Region. The GenAI developer must enable access to the FM for the company's employees. The GenAI developer must ensure that requests to the FM are private and remain within the same Regions as the FM.

Which solution will meet these requirements?

Options:

Deploy an AWS Lambda function that is exposed by a private Amazon API Gateway REST API to a VPC in eu-north-1. Create a VPC endpoint for the selected FM in eu-central-1 and eu-west-3. Extend existing SCPs to allow employees to use the FM. Integrate the REST API with the business application.

Deploy the FM on Amazon EC2 instances in eu-north-1. Deploy a private Amazon API Gateway REST API in front of the EC2 instances. Configure an Amazon Bedrock VPC endpoint. Integrate the REST API with the business application.

Configure the FM to use cross-Region inference through a Europe-scoped endpoint. Configure an Amazon Bedrock VPC endpoint. Extend existing SCPs to allow employees to use the FM through inference profiles in Europe-based Regions where the FM is available. Use an inference profile to integrate Amazon Bedrock with the business application.

Deploy the FM in Amazon SageMaker in eu-north-1. Configure a SageMaker VPC endpoint. Extend existing SCPs to allow employees to use the SageMaker endpoint. Integrate the FM in SageMaker with the business application.

Question 21

A company has a customer service application that uses Amazon Bedrock to generate personalized responses to customer inquiries. The company needs to establish a quality assurance process to evaluate prompt effectiveness and model configurations across updates. The process must automatically compare outputs from multiple prompt templates, detect response quality issues, provide quantitative metrics, and allow human reviewers to give feedback on responses. The process must prevent configurations that do not meet a predefined quality threshold from being deployed.

Which solution will meet these requirements?

Options:

Create an AWS Lambda function that sends sample customer inquiries to multiple Amazon Bedrock model configurations and stores responses in Amazon S3. Use Amazon QuickSight to visualize response patterns. Manually review outputs daily. Use AWS CodePipeline to deploy configurations that meet the quality threshold.

Use Amazon Bedrock evaluation jobs to compare model outputs by using custom prompt datasets. Configure AWS CodePipeline to run the evaluation jobs when prompt templates change. Configure CodePipeline to deploy only configurations that exceed the predefined quality threshold.

Set up Amazon CloudWatch alarms to monitor response latency and error rates from Amazon Bedrock. Use Amazon EventBridge rules to notify teams when thresholds are exceeded. Configure a manual approval workflow in AWS Systems Manager.

Use AWS Lambda functions to create an automated testing framework that samples production traffic and routes duplicate requests to the updated model version. Use Amazon Comprehend sentiment analysis to compare results. Block deployment if sentiment scores decrease.

Question 22

A publishing company is developing a chat assistant that uses a containerized large language model (LLM) that runs on Amazon SageMaker AI. The architecture consists of an Amazon API Gateway REST API that routes user requests to an AWS Lambda function. The Lambda function invokes a SageMaker AI real-time endpoint that hosts the LLM.

Users report uneven response times. Analytics show that a high number of chats are abandoned after 2 seconds of waiting for the first token. The company wants a solution to ensure that p95 latency is under 800 ms for interactive requests to the chat assistant.

Which combination of solutions will meet this requirement? (Select TWO.)

Options:

Enable model preload upon container startup. Implement dynamic batching to process multiple user requests together in a single inference pass.

Select a larger GPU instance type for the SageMaker AI endpoint. Set the minimum number of instances to 0. Continue to perform per-request processing. Lazily load model weights on the first request.

Switch to a multi-model endpoint. Use lazy loading without request batching.

Set the minimum number of instances to greater than 0. Enable response streaming.

Switch to Amazon SageMaker Asynchronous Inference for all requests. Store requests in an Amazon S3 bucket. Set the minimum number of instances to 0.

Answer:

A, D

Explanation:

The correct answers are A and D because they directly reduce time-to-first-token and stabilize p95 latency for interactive, real-time chat workloads hosted on Amazon SageMaker AI real-time endpoints.

Option D addresses the biggest driver of uneven latency: cold starts and scale-to-zero behavior. By setting the minimum number of instances to greater than 0, the endpoint always has warm capacity and loaded runtime resources, eliminating the first-request penalty that causes users to wait multiple seconds. Enabling response streaming improves perceived latency by returning the first tokens as soon as they are generated rather than waiting for the complete response. This directly targets the abandonment problem described (users leaving after waiting for the first token).

Option A further improves p95 latency and throughput by removing model loading overhead during inference and improving GPU utilization. Preloading model weights during container startup ensures the model is ready before traffic arrives and avoids unpredictable on-demand weight loading. Dynamic batching increases efficiency by grouping compatible requests into a single inference pass, reducing per-request overhead and improving GPU saturation. When tuned properly for interactive workloads, batching can reduce tail latency while preserving responsiveness by enforcing small batch windows.

Option B makes latency worse because setting minimum instances to 0 and lazily loading weights guarantees cold-start delays and unpredictable first-token performance. Option C similarly increases cold-start behavior through lazy loading and offers no batching benefits. Option E is designed for non-interactive workloads and introduces queueing and storage latency, which conflicts with the 800 ms p95 requirement for interactive chat.

Therefore, A and D are the best combination to achieve consistently low p95 latency and fast first-token streaming for a SageMaker-hosted chat assistant.

Question 23

A company is developing a customer communication platform that uses an AI assistant powered by an Amazon Bedrock foundation model (FM). The AI assistant summarizes customer messages and generates initial response drafts.

The company wants to use Amazon Comprehend to implement layered content filtering. The layered content filtering must prevent sharing of offensive content, protect customer privacy, and detect potential inappropriate advice solicitation. Inappropriate advice solicitation includes requests for unethical practices, harmful activities, or manipulative behaviors.

The solution must maintain acceptable overall response times, so all pre-processing filters must finish before the content reaches the FM.

Which solution will meet these requirements?

Options:

Use parallel processing with asynchronous API calls. Use toxicity detection for offensive content. Use prompt safety classification for inappropriate advice solicitation. Use personally identifiable information (PII) detection without redaction.

Use custom classification to build an FM that detects offensive content and inappropriate advice solicitation. Apply personally identifiable information (PII) detection as a secondary filter only when messages pass the custom classifier.

Deploy a multi-stage process. Configure the process to use prompt safety classification first, then toxicity detection on safe prompts only, and finally personally identifiable information (PII) detection in streaming mode. Route flagged messages through Amazon EventBridge for human review.

Use toxicity detection with thresholds configured to 0.5 for all categories. Use parallel processing for both prompt safety classification and personally identifiable information (PII) detection with entity redaction. Apply Amazon CloudWatch alarms to filter metrics.

Question 24

A company uses Amazon Bedrock to build a Retrieval Augmented Generation (RAG) system. The RAG system uses an Amazon Bedrock Knowledge Bases that is based on an Amazon S3 bucket as the data source for emergency news video content. The system retrieves transcripts, archived reports, and related documents from the S3 bucket.

The RAG system uses state-of-the-art embedding models and a high-performing retrieval setup. However, users report slow responses and irrelevant results, which cause decreased user satisfaction. The company notices that vector searches are evaluating too many documents across too many content types and over long periods of time.

The company determines that the underlying models will not benefit from additional fine-tuning. The company must improve retrieval accuracy by applying smarter constraints and wants a solution that requires minimal changes to the existing architecture.

Which solution will meet these requirements?

Options:

Enhance embeddings by using a domain-adapted model that is specifically trained on emergency news content for improved vector similarity.

Migrate to Amazon OpenSearch Service. Use vector fields and metadata filters to define the scope of results retrieval.

Enable metadata-aware filtering within the Amazon Bedrock knowledge base by indexing S3 object metadata.

Migrate to an Amazon Q Business index to perform structured metadata filtering and document categorization during retrieval.

Question 25

A financial services company needs to build a document analysis system that uses Amazon Bedrock to process quarterly reports. The system must analyze financial data, perform sentiment analysis, and validate compliance across batches of reports. Each batch contains 5 reports. Each report requires multiple foundation model (FM) calls. The solution must finish the analysis within 10 seconds for each batch. Current sequential processing takes 45 seconds for each batch.

Which solution will meet these requirements?

Options:

Use AWS Lambda functions with provisioned concurrency to process each analysis type sequentially. Configure the Lambda function timeouts to 10 seconds. Configure automatic retries with exponential backoff.

Use AWS Step Functions with a Parallel state to invoke separate AWS Lambda functions for each analysis type simultaneously. Configure Amazon Bedrock client timeouts. Use Amazon CloudWatch metrics to track execution time and model inference latency.

Create an Amazon SQS queue to buffer analysis requests. Deploy multiple AWS Lambda functions with reserved concurrency. Configure each Lambda function to process different aspects of each report sequentially and then combine the results.

Deploy an Amazon ECS cluster that runs containers that process each report sequentially. Use a load balancer to distribute batch workloads. Configure an auto-scaling policy based on CPU utilization.

Answer:

Explanation:

Option B is the correct solution because it parallelizes independent foundation model inference tasks while maintaining orchestration, observability, and time-bound execution. AWS Generative AI best practices emphasize reducing end-to-end latency by parallelizing independent inference calls rather than scaling individual calls vertically.

In this scenario, each report requires multiple independent analyses such as financial extraction, sentiment analysis, and compliance validation. These tasks do not depend on each other’s output, making them ideal candidates for parallel execution. AWS Step Functions provides a Parallel state that can invoke multiple AWS Lambda functions simultaneously, drastically reducing total processing time compared to sequential execution.

By invoking Amazon Bedrock from separate Lambda functions in parallel, the system can reduce batch execution time from 45 seconds to well under the 10-second requirement, assuming each inference call remains within acceptable latency bounds. Step Functions also provide built-in error handling, retries, and state tracking, which improves reliability without increasing complexity.

CloudWatch metrics allow teams to monitor both workflow execution time and individual model inference latency, enabling performance tuning and operational visibility. Configuring client-side timeouts ensures that slow or failed model invocations do not block the entire batch.

Option A still processes tasks sequentially and therefore cannot meet the strict latency requirement. Option C introduces queuing delays and sequential processing within each report, which increases total execution time. Option D relies on container-based sequential processing and adds unnecessary operational overhead for a workload that is event-driven and latency-sensitive.

Therefore, Option B best meets the performance, scalability, and operational efficiency requirements for high-speed batch document analysis using Amazon Bedrock.

Question 26

A company needs a system to automatically generate study materials from multiple content sources. The content sources include document files (PDF files, PowerPoint presentations, and Word documents) and multimedia files (recorded videos). The system must process more than 10,000 content sources daily with peak loads of 500 concurrent uploads. The system must also extract key concepts from document files and multimedia files and create contextually accurate summaries. The generated study materials must support real-time collaboration with version control.

Which solution will meet these requirements?

Options:

Use Amazon Bedrock Data Automation (BDA) with AWS Lambda functions to orchestrate document file processing. Use Amazon Bedrock Knowledge Bases to process all multimedia. Store the content in Amazon DocumentDB with replication. Collaborate by using Amazon SNS topic subscriptions. Track changes by using Amazon Bedrock Agents.

Use Amazon Bedrock Data Automation (BDA) with foundation models (FMs) to process document files. Integrate BDA with Amazon Textract for PDF extraction and with Amazon Transcribe for multimedia files. Store the processed content in Amazon S3 with versioning enabled. Store the metadata in Amazon DynamoDB. Collaborate in real time by using AWS AppSync GraphQL subscriptions and DynamoDB.

Use Amazon Bedrock Data Automation (BDA) with Amazon SageMaker AI endpoints to host content extraction and summarization models. Use Amazon Bedrock Guardrails to extract content from all file types. Store document files in Amazon Neptune for time series analysis. Collaborate by using Amazon Bedrock Chat for real-time messaging.

Use Amazon Bedrock Data Automation (BDA) with AWS Lambda functions to process batches of content files. Fine-tune foundation models (FMs) in Amazon Bedrock to classify documents across all content types. Store the processed data in Amazon ElastiCache (Redis OSS) by using Cluster Mode with sharding. Use Prompt management in Amazon Bedrock for version control.

Answer:

Explanation:

Option B best fulfills all functional, scalability, and collaboration requirements by combining purpose-built AWS services with Amazon Bedrock capabilities. Amazon Bedrock Data Automation is designed to orchestrate large-scale, multimodal data processing pipelines and integrates naturally with foundation models for summarization and concept extraction. Using BDA to process document files ensures consistent preprocessing and model invocation at scale, which is essential for handling more than 10,000 sources per day with high concurrency.

Integrating Amazon Textract for PDFs enables accurate extraction of structured and unstructured text from scanned and digital documents, while Amazon Transcribe is the appropriate service for converting recorded videos into text for downstream semantic analysis. These services are optimized for their respective media types and feed clean, normalized inputs into Bedrock foundation models, improving the quality of contextual summaries.

Storing processed content in Amazon S3 with versioning enabled directly addresses the requirement for version control. S3 versioning provides immutable object history and rollback capabilities without additional complexity. Metadata storage in Amazon DynamoDB supports high-throughput, low-latency access patterns and scales automatically to handle peak upload concurrency.

Real-time collaboration is achieved through AWS AppSync GraphQL subscriptions combined with DynamoDB. AppSync enables real-time updates to connected clients whenever study materials are created or modified, making it well suited for collaborative editing and live synchronization. DynamoDB streams integrate seamlessly with AppSync to propagate changes efficiently.

The other options misuse services or fail to meet key requirements. Amazon SNS does not support collaborative state synchronization, Amazon DocumentDB is not optimized for versioned document storage, Amazon Neptune is unsuitable for document-centric workloads, and Amazon ElastiCache is not designed for durable storage or version control. Option B aligns with AWS best practices for scalable, multimodal generative AI systems built on Amazon Bedrock.

Question 27

A GenAI developer is building a Retrieval Augmented Generation (RAG)-based customer support application that uses Amazon Bedrock foundation models (FMs). The application needs to process 50 GB of historical customer conversations that are stored in an Amazon S3 bucket as JSON files. The application must use the processed data as its retrieval corpus. The application’s data processing workflow must extract relevant data from customer support documents, remove customer personally identifiable information (PII), and generate embeddings for vector storage. The processing workflow must be cost-effective and must finish within 4 hours.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Use AWS Lambda and Amazon Comprehend to process files in parallel, remove PII, and call Amazon Bedrock APIs to generate vectors. Configure Lambda concurrency limits and memory settings to optimize throughput.

Create an AWS Glue ETL job to run PII detection scripts on the data. Use Amazon SageMaker Processing to run the HuggingFaceProcessor to generate embeddings by using a pre-trained model. Store the embeddings in Amazon OpenSearch Service.

Deploy an Amazon EMR cluster that runs Apache Spark with user-defined functions (UDFs) that call Amazon Comprehend to detect PII. Use Amazon Bedrock APIs to generate vectors. Store outputs in Amazon Aurora PostgreSQL with the pgvector extension.

Implement a data processing pipeline that uses AWS Step Functions to orchestrate a workload that uses Amazon Comprehend to detect PII and Amazon Bedrock to generate embeddings. Directly integrate the workflow with Amazon OpenSearch Serverless to store vectors and provide similarity search capabilities.

Answer:

Explanation:

Comprehensive and Detailed 250 to 350 words of Explanation From AWS Generative AI concepts and services documents:

Option D is the best solution because it delivers a fully managed, scalable pipeline with minimal infrastructure management while meeting the 50 GB and 4-hour constraint. AWS Step Functions provides a serverless orchestration layer that can coordinate parallel processing steps, retries, and error handling without managing clusters or tuning long-running compute.

Using Amazon Comprehend for PII detection fulfills the requirement to remove customer PII in a managed and consistent way. Step Functions can coordinate Comprehend calls at scale and route sanitized outputs into the embedding step. Generating embeddings with Amazon Bedrock keeps the entire workflow within AWS managed services, eliminates the need to maintain custom embedding models, and supports consistent vector representations for downstream retrieval.

Direct integration with Amazon OpenSearch Serverless provides a low-operations vector store that can handle large-scale indexing and similarity search without cluster sizing, node maintenance, or shard management. This aligns strongly with the requirement for least operational overhead and supports growth beyond the initial 50 GB corpus. Step Functions can batch and parallelize ingestion into OpenSearch Serverless to meet the 4-hour completion goal in a cost-effective manner by controlling concurrency, chunk sizes, and failure handling.

Option A can be difficult and costly at this scale because Lambda concurrency and per-invocation overhead can become complex to tune for 50 GB within 4 hours. Option B introduces SageMaker Processing and embedding model management, increasing operational complexity. Option C requires EMR cluster management and tuning, which is the opposite of minimal overhead.

Therefore, Option D is the most operationally efficient, scalable, and managed approach to build the required PII-sanitized embedding pipeline for a RAG corpus.

Question 28

A pharmaceutical company is developing a Retrieval Augmented Generation application that uses an Amazon Bedrock knowledge base. The knowledge base uses Amazon OpenSearch Service as a data source for more than 25 million scientific papers. Users report that the application produces inconsistent answers that cite irrelevant sections of papers when queries span methodology, results, and discussion sections of the papers.

The company needs to improve the knowledge base to preserve semantic context across related paragraphs on the scale of the entire corpus of data.

Which solution will meet these requirements?

Options:

Configure the knowledge base to use fixed-size chunking. Set a 300-token maximum chunk size and a 10% overlap between chunks. Use an appropriate Amazon Bedrock embedding model.

Configure the knowledge base to use hierarchical chunking. Use parent chunks that contain 1,000 tokens and child chunks that contain 200 tokens. Set a 50-token overlap between chunks.

Configure the knowledge base to use semantic chunking. Use a buffer size of 1 and a breakpoint percentile threshold of 85% to determine chunk boundaries based on content meaning.

Configure the knowledge base not to use chunking. Manually split each document into separate files before ingestion. Apply post-processing reranking during retrieval.

Answer:

Explanation:

Option B is the best fit because hierarchical chunking is designed to preserve local detail while keeping broader document context available during retrieval, which directly addresses the problem of questions spanning methodology, results, and discussion. In large scientific papers, a single answer often depends on linked paragraphs across adjacent sections. If the knowledge base retrieves only small, isolated chunks, the RAG system can cite text that is semantically close to a query term but not contextually correct, producing inconsistent answers and irrelevant citations.

With hierarchical chunking, the knowledge base creates child chunks that are small enough for high-precision vector similarity matching, such as 200 tokens, which improves the likelihood that the retrieved text is tightly related to the user’s query. At the same time, each child chunk is associated with a larger parent chunk, such as 1,000 tokens, which retains the surrounding narrative and section-level context. This structure helps the retrieval pipeline return passages that include the relevant subsection plus the explanatory framing that prevents misinterpretation, which is especially important in scientific writing where methods, results, and discussion are interdependent.

The configured overlap further reduces boundary effects where key statements split across chunks. This improves continuity for paragraphs that bridge sections, such as a results paragraph that references the methodological setup or a discussion paragraph interpreting a specific metric.

Option A can improve consistency slightly, but fixed-size chunking still risks separating related paragraphs and does not provide a built-in mechanism to retrieve broader context linked to precise matches. Option C can create more meaningful boundaries, but it does not guarantee the parent-level context that hierarchical chunking provides at retrieval time. Option D increases operational burden and is not practical at the scale of 25 million

Question 29

A company wants to select a new FM for its AI assistant. A GenAI developer needs to generate evaluation reports to help a data scientist assess the quality and safety of various foundation models FMs. The data scientist provides the GenAI developer with sample prompts for evaluation. The GenAI developer wants to use Amazon Bedrock to automate report generation and evaluation.

Which solution will meet this requirement?

Options:

Combine the sample prompts into a single JSON document. Create an Amazon Bedrock knowledge base with the document. Write a prompt that asks the FM to generate a response to each sample prompt. Use the RetrieveAndGenerate API to generate a report for each model.

Combine the sample prompts into a single JSONL document. Store the document in an Amazon S3 bucket. Create an Amazon Bedrock evaluation job that uses a judge model. Specify the S3 location as input and a different S3 location as output. Run an evaluation job for each FM and select the FM as the generator.

Combine the sample prompts into a single JSONL document. Store the document in an Amazon S3 bucket. Create an Amazon Bedrock evaluation job that uses a judge model. Specify the S3 location as input and Amazon QuickSight as output. Run an evaluation job for each FM and select the FM as the evaluator.

Combine the sample prompts into a single JSON document. Create an Amazon Bedrock knowledge base from the document. Create an Amazon Bedrock evaluation job that uses the retrieval and response generation evaluation type. Specify an Amazon S3 bucket as the output. Run an evaluation job for each FM.

Question 30

A media company must use Amazon Bedrock to implement a robust governance process for AI-generated content. The company needs to manage hundreds of prompt templates. Multiple teams use the templates across multiple AWS Regions to generate content. The solution must provide version control with approval workflows that include notifications for pending reviews. The solution must also provide detailed audit trails that document prompt activities and consistent prompt parameterization to enforce quality standards.

Which solution will meet these requirements?

Options:

Configure Amazon Bedrock Studio prompt templates. Use Amazon CloudWatch dashboards to display prompt usage metrics. Store approval status in Amazon DynamoDB. Use AWS Lambda functions to enforce approvals.

Use Amazon Bedrock Prompt Management to implement version control. Configure AWS CloudTrail for audit logging. Use AWS Identity and Access Management policies to control approval permissions. Create parameterized prompt templates by specifying variables.

Use AWS Step Functions to create an approval workflow. Store prompts in Amazon S3. Use tags to implement version control. Use Amazon EventBridge to send notifications.

Deploy Amazon SageMaker Canvas with prompt templates stored in Amazon S3. Use AWS CloudFormation for version control. Use AWS Config to enforce approval policies.

Exam Detail

Vendor: Amazon Web Services

Certification: AWS Certified Professional

Exam Code: AIP-C01

Exam Name: AWS Certified Generative AI Developer - Professional

Last Update: Feb 22, 2026

AIP-C01 Question Answers

Spring Sale 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: save70

Free and Premium Amazon Web Services AIP-C01 Dumps Questions Answers

AWS Certified Generative AI Developer - Professional Questions and Answers

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer: