Pre-Summer Sale 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: save70

Passed Exam Today Databricks-Certified-Professional-Data-Engineer

Databricks Certified Data Engineer Professional Exam Questions and Answers

Question 13

Which distribution does Databricks support for installing custom Python code packages?

Options:

A.

sbt

B.

CRAN

C.

CRAM

D.

nom

E.

Wheels

F.

jars

Question 14

The data engineer team is configuring environment for development testing, and production before beginning migration on a new data pipeline. The team requires extensive testing on both the code and data resulting from code execution, and the team want to develop and test against similar production data as possible.

A junior data engineer suggests that production data can be mounted to the development testing environments, allowing pre production code to execute against production data. Because all users have

Admin privileges in the development environment, the junior data engineer has offered to configure permissions and mount this data for the team.

Which statement captures best practices for this situation?

Options:

A.

Because access to production data will always be verified using passthrough credentials it is safe to mount data to any Databricks development environment.

B.

All developer, testing and production code and data should exist in a single unified workspace; creating separate environments for testing and development further reduces risks.

C.

In environments where interactive code will be executed, production data should only be accessible with read permissions; creating isolated databases for each environment further reduces risks.

D.

Because delta Lake versions all data and supports time travel, it is not possible for user error or malicious actors to permanently delete production data, as such it is generally safe to mount production data anywhere.

Question 15

When evaluating the Ganglia Metrics for a given cluster with 3 executor nodes, which indicator would signal proper utilization of the VM ' s resources?

Options:

A.

The five Minute Load Average remains consistent/flat

B.

Bytes Received never exceeds 80 million bytes per second

C.

Network I/O never spikes

D.

Total Disk Space remains constant

E.

CPU Utilization is around 75%

Question 16

A data engineer wants to join a stream of advertisement impressions (when an ad was shown) with another stream of user clicks on advertisements to correlate when impression led to monitizable clicks.

Which solution would improve the performance?

A)

B)

C)

D)

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D