Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Questions Bank

Databricks Certified Associate Developer for Apache Spark 3.5 – Python Questions and Answers

Question 5

An MLOps engineer is building a Pandas UDF that applies a language model that translates English strings into Spanish. The initial code is loading the model on every call to the UDF, which is hurting the performance of the data pipeline.

The initial code is:

def in_spanish_inner(df: pd.Series) -> pd.Series:
    model = get_translation_model(target_lang='es')
    return df.apply(model)

in_spanish = sf.pandas_udf(in_spanish_inner, StringType())

How can the MLOps engineer change this code to reduce how many times the language model is loaded?

Options:

Convert the Pandas UDF to a PySpark UDF

Convert the Pandas UDF from a Series → Series UDF to a Series → Scalar UDF

Run the in_spanish_inner() function in a mapInPandas() function call

Convert the Pandas UDF from a Series → Series UDF to an Iterator[Series] → Iterator[Series] UDF

Question 6

A data engineer is working on the DataFrame:

(Referring to the table image: it has columns Id, Name, count, and timestamp.)

Which code fragment should the engineer use to extract the unique values in the Name column into an alphabetically ordered list?

Options:

df.select("Name").orderBy(df["Name"].asc())

df.select("Name").distinct().orderBy(df["Name"])

df.select("Name").distinct()

df.select("Name").distinct().orderBy(df["Name"].desc())

Question 7

In the code block below, aggDF contains aggregations on a streaming DataFrame:

aggDF.writeStream \
    .format("console") \
    .outputMode("???") \
    .start()

Which output mode at line 3 ensures that the entire result table is written to the console during each trigger execution?

Options:

AGGREGATE

COMPLETE

REPLACE

APPEND

Question 8

A data engineer is working on the DataFrame df1 and wants the Name with the highest count to appear first (descending order by count), followed by the next highest, and so on.

The DataFrame has columns:

id | Name    | count | timestamp
---|---------|-------|----------
1  | USA     | 10    |
2  | India   | 20    |
3  | England | 50    |
4  | India   | 50    |
5  | France  | 20    |
6  | India   | 10    |
7  | USA     | 30    |
8  | USA     | 40    |

Which code fragment should the engineer use to sort the data in the Name and count columns?

Options:

df1.orderBy(col("count").desc(), col("Name").asc())

df1.sort("Name", "count")

df1.orderBy("Name", "count")

df1.orderBy(col("Name").desc(), col("count").asc())
