AWS Certified Data Engineer - Associate (DEA-C01) Questions and Answers

Question 41

A company uses Amazon S3 buckets, AWS Glue tables, and Amazon Athena as components of a data lake. Recently, the company expanded its sales range to multiple new states. The company wants to introduce state names as a new partition to the existing S3 bucket, which is currently partitioned by date.

The company needs to ensure that additional partitions will not disrupt daily synchronization between the AWS Glue Data Catalog and the S3 buckets.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A. Use the AWS Glue API to manually update the Data Catalog.

B. Run an MSCK REPAIR TABLE command in Athena.

C. Schedule an AWS Glue crawler to periodically update the Data Catalog.

D. Run a REFRESH TABLE command in Athena.
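
To make the partitioning change in Question 41 concrete, the sketch below shows what adding a state partition under an existing date partition could look like in S3 object keys. It assumes Hive-style `key=value` prefixes, which the question does not state; the helper names `partition_prefix` and `parse_partitions` and the file name `sales.parquet` are hypothetical.

```python
from datetime import date


def partition_prefix(day: date, state: str) -> str:
    """Build a Hive-style prefix with date as the outer partition
    and state as the new inner partition (assumed layout)."""
    return f"date={day.isoformat()}/state={state}/"


def parse_partitions(key: str) -> dict:
    """Extract every key=value path segment from an S3 object key."""
    parts = {}
    for segment in key.split("/"):
        if "=" in segment:
            name, value = segment.split("=", 1)
            parts[name] = value
    return parts


# Hypothetical object written after the state partition is introduced.
key = partition_prefix(date(2024, 3, 1), "Texas") + "sales.parquet"
print(key)                    # date=2024-03-01/state=Texas/sales.parquet
print(parse_partitions(key))  # {'date': '2024-03-01', 'state': 'Texas'}
```

Every new date and state combination creates a new prefix, and each one must be registered in the Data Catalog before Athena can query it.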

Question 42

A company uses an Amazon S3 bucket to integrate multiple data sources into a central data lake. The company needs to perform multiple transformations and data cleaning processes on the data to make the data accessible to business partners.

The company needs a solution that will give multiple business partners the ability to run SQL queries on the central data lake during normal business hours.

Which solution will meet these requirements MOST cost-effectively?

Options:

A. Use a provisioned Amazon EMR cluster after normal business hours to process the previous day’s data, apply all necessary transformations, and load the prepared data into Amazon Redshift Serverless.

B. Use an AWS Glue Flex job after normal business hours to process the previous day’s data, apply all necessary transformations, and load the prepared data into Amazon Redshift Serverless.

C. Use an AWS Lambda function after normal business hours to process the previous day’s data, apply all necessary transformations, and load the prepared data into an Amazon Redshift provisioned cluster.

D. Use an AWS Glue Flex job after normal business hours to process the previous day’s data, apply all necessary transformations, and load the prepared data into an Amazon Redshift provisioned cluster.

Question 43

A company has a frontend ReactJS website that uses Amazon API Gateway to invoke REST APIs. The APIs implement the website's functionality. A data engineer needs to write a Python script that can be invoked occasionally through API Gateway. The code must return results to API Gateway.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A. Deploy a custom Python script on an Amazon Elastic Container Service (Amazon ECS) cluster.

B. Create an AWS Lambda Python function with provisioned concurrency.

C. Deploy a custom Python script that can integrate with API Gateway on Amazon Elastic Kubernetes Service (Amazon EKS).

D. Create an AWS Lambda function. Ensure that the function is warm by scheduling an Amazon EventBridge rule to invoke the Lambda function every 5 minutes by using mock events.
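
For reference on the "return results to API Gateway" requirement in Question 43, here is a minimal sketch of a Python Lambda handler. It assumes the REST API uses Lambda proxy integration, which the question does not specify; the `name` query parameter and the greeting payload are hypothetical.

```python
import json


def lambda_handler(event, context):
    """Return a response in the shape that API Gateway's Lambda proxy
    integration expects: statusCode, headers, and a string body."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # hypothetical parameter
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}"}),
    }


# Local invocation with a hand-built event, for illustration only.
response = lambda_handler({"queryStringParameters": {"name": "DEA-C01"}}, None)
print(response["statusCode"], response["body"])
```

The body must be a string, which is why the payload is serialized with `json.dumps` instead of being returned as a dictionary.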

Question 44

A company needs to build an extract, transform, and load (ETL) pipeline that has separate stages for batch data ingestion, transformation, and storage. The pipeline must store the transformed data in an Amazon S3 bucket. Each stage must automatically retry failures. The pipeline must provide visibility into the success or failure of individual stages.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A. Chain AWS Glue jobs that perform each stage together by using job triggers. Set the MaxRetries field to 0.

B. Deploy AWS Step Functions workflows to orchestrate AWS Lambda functions that ingest data. Use AWS Glue jobs to transform the data and store the data in the S3 bucket.

C. Build an Amazon EventBridge-based pipeline that invokes AWS Lambda functions to perform each stage.

D. Schedule Apache Airflow directed acyclic graphs (DAGs) on Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate pipeline steps. Use Amazon Simple Queue Service (Amazon SQS) to ingest data. Use AWS Glue jobs to transform data and store the data in the S3 bucket.
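
To illustrate the per-stage retry requirement in Question 44, the sketch below builds an Amazon States Language definition (the format used by AWS Step Functions) as a Python dictionary. It is one possible shape, not a reference design: the function name `ingest-fn`, the job names `transform-job` and `store-job`, and the retry settings are all hypothetical.

```python
import json


def task(resource, parameters, next_state=None):
    """Build one Task state that retries on any error with backoff."""
    state = {
        "Type": "Task",
        "Resource": resource,
        "Parameters": parameters,
        # Retry values below are illustrative assumptions.
        "Retry": [{
            "ErrorEquals": ["States.ALL"],
            "IntervalSeconds": 5,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }],
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


LAMBDA_INVOKE = "arn:aws:states:::lambda:invoke"
GLUE_RUN_SYNC = "arn:aws:states:::glue:startJobRun.sync"

definition = {
    "Comment": "Hypothetical three-stage ETL pipeline",
    "StartAt": "Ingest",
    "States": {
        "Ingest": task(LAMBDA_INVOKE, {"FunctionName": "ingest-fn"}, "Transform"),
        "Transform": task(GLUE_RUN_SYNC, {"JobName": "transform-job"}, "Store"),
        "Store": task(GLUE_RUN_SYNC, {"JobName": "store-job"}),
    },
}

print(json.dumps(definition, indent=2))
```

Because each stage is its own state, the execution history records success or failure per stage, and the `Retry` block applies to each stage independently.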