Spring Sale 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: save70

Vce Data-Engineer-Associate Questions Latest

AWS Certified Data Engineer - Associate (DEA-C01) Questions and Answers

Question 69

A company ingests data from multiple data sources and stores the data in an Amazon S3 bucket. An AWS Glue extract, transform, and load (ETL) job transforms the data and writes the transformed data to an Amazon S3 based data lake. The company uses Amazon Athena to query the data that is in the data lake.

The company needs to identify matching records even when the records do not have a common unique identifier.

Which solution will meet this requirement?

Options:

A.

Use Amazon Made pattern matching as part of the ETL job.

B.

Train and use the AWS Glue PySpark Filter class in the ETL job.

C.

Partition tables and use the ETL job to partition the data on a unique identifier.

D.

Train and use the AWS Lake Formation FindMatches transform in the ETL job.

Question 70

A company’s data processing pipeline uses AWS Glue jobs and AWS Glue Data Catalog. All AWS Glue jobs must run in a custom VPC inside a private subnet. The company uses a NAT gateway to support outbound connections.

A data engineer needs to use AWS Glue to migrate data from an on-premises PostgreSQL database to Amazon S3. There is no current network connection between AWS and the on-premises environment. However, the data engineer has updated the on-premises database to allow traffic from the custom VPC.

Which solution will meet these requirements?

Options:

A.

Create a JDBC connection in AWS Glue with the database JDBC URL, username, and password.

B.

Create a Simple Authentication and Security Layer (SASL) connection in AWS Glue to the on-premises database.

C.

Create a JDBC connection in AWS Glue with a security group that allows TCP traffic to and from itself.

D.

Create a JDBC connection in AWS Glue that uses a JDBC driver stored in Amazon S3. Retrieve the database URL, username, and password from AWS Secrets Manager.

Question 71

A data engineer is designing a new data lake architecture for a company. The data engineer plans to use Apache Iceberg tables and AWS Glue Data Catalog to achieve fast query performance and enhanced metadata handling. The data engineer needs to query historical data for trend analysis and optimize storage costs for a large volume of event data.

Which solution will meet these requirements with the LEAST development effort?

Options:

A.

Store Iceberg table data files in Amazon S3 Intelligent-Tiering.

B.

Define partitioning schemes based on event type and event date.

C.

Use AWS Glue Data Catalog to automatically optimize Iceberg storage.

D.

Run a custom AWS Glue job to compact Iceberg table data files.

Question 72

A company is migrating a legacy application to an Amazon S3 based data lake. A data engineer reviewed data that is associated with the legacy application. The data engineer found that the legacy data contained some duplicate information.

The data engineer must identify and remove duplicate information from the legacy application data.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Write a custom extract, transform, and load (ETL) job in Python. Use the DataFramedrop duplicatesf) function by importing the Pandas library to perform data deduplication.

B.

Write an AWS Glue extract, transform, and load (ETL) job. Use the FindMatches machine learning (ML) transform to transform the data to perform data deduplication.

C.

Write a custom extract, transform, and load (ETL) job in Python. Import the Python dedupe library. Use the dedupe library to perform data deduplication.

D.

Write an AWS Glue extract, transform, and load (ETL) job. Import the Python dedupe library. Use the dedupe library to perform data deduplication.