Spring Sale 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: save70

Amazon Web Services Data-Engineer-Associate Online Access

AWS Certified Data Engineer - Associate (DEA-C01) Questions and Answers

Question 65

A data engineer is optimizing query performance in Amazon Athena notebooks that use Apache Spark to analyze large datasets that are stored in Amazon S3. The data is partitioned. An AWS Glue crawler updates the partitions.

The data engineer wants to minimize the amount of data that is scanned to improve efficiency of Athena queries.

Which solution will meet these requirements?

Options:

A.

Apply partition filters in the queries.

B.

Increase the frequency of AWS Glue crawler invocations to update the data catalog more often.

C.

Organize the data that is in Amazon S3 by using a nested directory structure.

D.

Configure Spark to use in-memory caching for frequently accessed data.

Question 66

A company uses an Amazon Redshift provisioned cluster as its database. The Redshift cluster has five reserved ra3.4xlarge nodes and uses key distribution.

A data engineer notices that one of the nodes frequently has a CPU load over 90%. SQL Queries that run on the node are queued. The other four nodes usually have a CPU load under 15% during daily operations.

The data engineer wants to maintain the current number of compute nodes. The data engineer also wants to balance the load more evenly across all five compute nodes.

Which solution will meet these requirements?

Options:

A.

Change the sort key to be the data column that is most often used in a WHERE clause of the SQL SELECT statement.

B.

Change the distribution key to the table column that has the largest dimension.

C.

Upgrade the reserved node from ra3.4xlarqe to ra3.16xlarqe.

D.

Change the primary key to be the data column that is most often used in a WHERE clause of the SQL SELECT statement.

Question 67

A financial services company stores financial data in Amazon Redshift. A data engineer wants to run real-time queries on the financial data to support a web-based trading application. The data engineer wants to run the queries from within the trading application.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Establish WebSocket connections to Amazon Redshift.

B.

Use the Amazon Redshift Data API.

C.

Set up Java Database Connectivity (JDBC) connections to Amazon Redshift.

D.

Store frequently accessed data in Amazon S3. Use Amazon S3 Select to run the queries.

Question 68

Files from multiple data sources arrive in an Amazon S3 bucket on a regular basis. A data engineer wants to ingest new files into Amazon Redshift in near real time when the new files arrive in the S3 bucket.

Which solution will meet these requirements?

Options:

A.

Use the query editor v2 to schedule a COPY command to load new files into Amazon Redshift.

B.

Use the zero-ETL integration between Amazon Aurora and Amazon Redshift to load new files into Amazon Redshift.

C.

Use AWS Glue job bookmarks to extract, transform, and load (ETL) load new files into Amazon Redshift.

D.

Use S3 Event Notifications to invoke an AWS Lambda function that loads new files into Amazon Redshift.