Databricks Certified Data Engineer Professional Exam Questions and Answers

Question 45

Given the following PySpark code snippet in a Databricks notebook:

filtered_df = spark.read.format("delta").load("/mnt/data/large_table") \
    .filter("event_date > '2024-01-01'")

filtered_df.count()

The data engineer notices from the Query Profiler that the scan operator for filtered_df is reading almost all files, despite the filter being applied.

What is the probable reason for poor data skipping?

Options:

A.

The Delta table lacks optimization that enables dynamic file pruning.

B.

The filter is executed only after the full data scan, preventing data skipping.

C.

The event_date column is outside the table’s partitioning and Z-ordering scheme.

D.

The filter condition involves a data type excluded from data skipping support.
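
As background on the mechanism this question turns on: Delta Lake keeps per-file minimum and maximum values for columns and can skip any file whose range cannot satisfy a filter. The sketch below is a plain-Python simulation of that idea, not Delta Lake's actual implementation. `FileStats`, `files_to_scan`, the file names, and the date ranges are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FileStats:
    """Per-file min/max statistics for event_date (hypothetical structure)."""
    path: str
    min_date: date
    max_date: date


def files_to_scan(files, cutoff):
    """Return the files that may hold rows with event_date > cutoff.

    A file is skipped only when its max value rules out any match.
    """
    return [f for f in files if f.max_date > cutoff]


# Layout 1: rows of every date land in every file, so each range is wide.
unclustered = [
    FileStats("part-0", date(2023, 1, 1), date(2024, 6, 30)),
    FileStats("part-1", date(2023, 1, 2), date(2024, 6, 29)),
    FileStats("part-2", date(2023, 1, 1), date(2024, 6, 28)),
]

# Layout 2: rows are colocated by event_date, so ranges barely overlap.
clustered = [
    FileStats("part-0", date(2023, 1, 1), date(2023, 6, 30)),
    FileStats("part-1", date(2023, 7, 1), date(2023, 12, 31)),
    FileStats("part-2", date(2024, 1, 1), date(2024, 6, 30)),
]

cutoff = date(2024, 1, 1)
unclustered_scan = [f.path for f in files_to_scan(unclustered, cutoff)]
clustered_scan = [f.path for f in files_to_scan(clustered, cutoff)]

print(unclustered_scan)
print(clustered_scan)
```

With the wide, overlapping ranges every file has to be read even though the filter is applied. Once rows are colocated by event_date, two of the three files are skipped.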

Question 46

The view updates represents an incremental batch of all newly ingested data to be inserted or updated in the customers table.

The following logic is used to process these records.

Which statement describes this implementation?

Options:

A.

The customers table is implemented as a Type 3 table; old values are maintained as a new column alongside the current value.

B.

The customers table is implemented as a Type 2 table; old values are maintained but marked as no longer current and new values are inserted.

C.

The customers table is implemented as a Type 0 table; all writes are append only with no changes to existing values.

D.

The customers table is implemented as a Type 1 table; old values are overwritten by new values and no history is maintained.

E.

The customers table is implemented as a Type 2 table; old values are overwritten and new customers are appended.
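
The options above contrast slowly changing dimension types. The sketch below simulates the difference between a Type 1 and a Type 2 upsert in plain Python. It is not the logic the question refers to, which is not reproduced here. The function names, the row fields (`current`, `effective_date`, `end_date`), and the sample addresses are hypothetical.

```python
from datetime import date


def apply_type1(table, updates):
    """Type 1: overwrite the matching row in place; no history is kept."""
    for customer_id, address in updates.items():
        table[customer_id] = address
    return table


def apply_type2(rows, updates, effective):
    """Type 2: mark the old row as no longer current and append a new row."""
    for customer_id, address in updates.items():
        current = next(
            (r for r in rows if r["customer_id"] == customer_id and r["current"]),
            None,
        )
        if current is not None and current["address"] == address:
            continue  # value unchanged, nothing to record
        if current is not None:
            current["current"] = False
            current["end_date"] = effective
        rows.append({
            "customer_id": customer_id,
            "address": address,
            "current": True,
            "effective_date": effective,
            "end_date": None,
        })
    return rows


updates = {1: "9 Elm Ave", 2: "4 Pine Rd"}

type1_table = apply_type1({1: "12 Oak St"}, dict(updates))

type2_rows = apply_type2(
    [{"customer_id": 1, "address": "12 Oak St", "current": True,
      "effective_date": date(2023, 1, 1), "end_date": None}],
    dict(updates),
    effective=date(2024, 1, 1),
)

print(type1_table)
for row in type2_rows:
    print(row)
```

After the Type 1 update the old address for customer 1 is gone. After the Type 2 update it is still present, flagged as not current and closed with an end date, next to the new current row.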

Question 47

The data science team has created and logged a production model using MLflow. The model accepts a list of column names and returns a new column of type DOUBLE.

The following code correctly imports the production model, loads the customer table containing the customer_id key column into a DataFrame, and defines the feature columns needed for the model.

Which code block will output a DataFrame with the schema "customer_id LONG, predictions DOUBLE"?

Options:

A.

model.predict(df, columns)

B.

df.map(lambda x: model(x[columns])).select("customer_id, predictions")

C.

df.select("customer_id", model(*columns).alias("predictions"))

D.

df.apply(model, columns).select("customer_id, prediction")

Question 48

A data architect is designing a Databricks solution to efficiently process data for different business requirements.

In which scenario should a data engineer use a materialized view instead of a streaming table?

Options:

A.

Implementing a CDC (Change Data Capture) pipeline that needs to detect and respond to database changes within seconds.

B.

Ingesting data from Apache Kafka topics with sub-second processing requirements for immediate alerting.

C.

Precomputing complex aggregations and joins from multiple large tables to accelerate BI dashboard performance.

D.

Processing high-volume, continuous clickstream data from a website to monitor user behavior in real-time.