Latest Databricks Databricks-Machine-Learning-Professional Dumps PDF Questions Answers 2025

Databricks Certified Machine Learning Professional Questions and Answers

Question 1

A data scientist is using MLflow to track their machine learning experiment. As a part of each MLflow run, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values.

They are using the following code block:

The code block is not nesting the runs in MLflow as they expected.

Which of the following changes does the data scientist need to make to the above code block so that it successfully nests the child runs under the parent run in MLflow?

Options:

Indent the child run blocks within the parent run block

Add the nested=True argument to the parent run

Remove the nested=True argument from the child runs

Provide the same name to the run name parameter for all three run blocks

Add the nested=True argument to the parent run and remove the nested=True arguments from the child runs


Question 2

Which of the following MLflow operations can be used to automatically calculate and log a Shapley feature importance plot?

Options:

mlflow.shap.log_explanation

None of these operations can accomplish the task.

mlflow.shap

mlflow.log_figure

client.log_artifact

Question 3

Which of the following is a benefit of logging a model signature with an MLflow model?

Options:

The model will have a unique identifier in the MLflow experiment

The schema of input data can be validated when serving models

The model can be deployed using real-time serving tools

The model will be secured by the user that developed it

The schema of input data will be converted to match the signature

Question 4

A data scientist has computed updated feature values for all primary key values stored in the Feature Store table features. In addition, feature values for some new primary key values have also been computed. The updated feature values are stored in the DataFrame features_df. They want to replace all data in features with the newly computed data.

Which of the following code blocks can they use to perform this task using the Feature Store Client fs?

Options:

Option A

Option B

Option C

Option D

Option E

Question 5

Which of the following MLflow Model Registry use cases requires the use of an HTTP Webhook?

Options:

Starting a testing job when a new model is registered

Updating data in a source table for a Databricks SQL dashboard when a model version transitions to the Production stage

Sending an email alert when an automated testing Job fails

None of these use cases require the use of an HTTP Webhook

Sending a message to a Slack channel when a model version transitions stages

Answer:

Explanation:

An HTTP Webhook is a mechanism that allows you to register a callback that is triggered by an event, such as a model registry event. The callback is an HTTP request that is sent to a specified URL, which can invoke an action or a notification on another platform or service. An HTTP Webhook is required for use cases that involve integrating the model registry with external tools or workflows that are not supported by Databricks [1].

Sending a message to a Slack channel when a model version transitions stages is a use case that requires the use of an HTTP Webhook. This is because Slack is an external platform that is not natively integrated with Databricks, and the model registry events are not directly accessible by Slack. Therefore, to send a message to a Slack channel, you need to register an HTTP Webhook that is triggered by the model registry event of interest, such as MODEL_VERSION_TRANSITIONED_STAGE. The HTTP Webhook then sends a request to the Slack API endpoint that corresponds to the channel and the message content [2].

The other options are incorrect because:

Option A: Starting a testing job when a new model is registered does not require the use of an HTTP Webhook, but rather a job registry webhook. A job registry webhook is a type of webhook that triggers a job in a Databricks workspace when a model registry event occurs. A job registry webhook can be created using the Databricks REST API or the Python client databricks-registry-webhooks on PyPI [3].
Option B: Updating data in a source table for a Databricks SQL dashboard when a model version transitions to the Production stage does not require the use of an HTTP Webhook, but rather a Databricks SQL trigger. A Databricks SQL trigger is a mechanism that allows you to execute a SQL query or a notebook when a specified condition is met, such as a time interval or a file arrival. A Databricks SQL trigger can be created using the Databricks SQL UI or the Databricks REST API [4].
Option C: Sending an email alert when an automated testing job fails does not require the use of an HTTP Webhook, but rather a job alert. A job alert is a feature that allows you to send an email notification when a job run meets a specified condition, such as a failure, a timeout, or a success. A job alert can be created using the Jobs UI or the Databricks REST API [5].
Option D: "None of these use cases require the use of an HTTP Webhook" is incorrect, as option E does require the use of an HTTP Webhook.

References: MLflow Model Registry Webhooks on Databricks, Streamline MLOps With MLflow Model Registry Webhooks, Databricks SQL Triggers, Job Alerts, Slack API
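
The Slack notification flow described in the explanation can be sketched in Python. This is a minimal illustration using only the standard library, not the Databricks webhook implementation: the URL is a hypothetical placeholder, the helper build_slack_request is invented for this example, and the event field names (model_name, version, from_stage, to_stage) are assumed. The request is built but never sent.

```python
import json
import urllib.request

# Hypothetical placeholder; a real incoming-webhook URL is issued by Slack.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/EXAMPLE/PLACEHOLDER"


def build_slack_request(event: dict, url: str = SLACK_WEBHOOK_URL) -> urllib.request.Request:
    """Turn a registry event payload into a POST request carrying a Slack message."""
    # The event field names used here are assumptions made for illustration.
    message = (
        f"Model {event['model_name']} version {event['version']} "
        f"moved from {event['from_stage']} to {event['to_stage']}"
    )
    # Slack incoming webhooks accept a JSON body with a "text" field.
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


example_event = {
    "event": "MODEL_VERSION_TRANSITIONED_STAGE",
    "model_name": "churn_model",
    "version": "3",
    "from_stage": "Staging",
    "to_stage": "Production",
}
request = build_slack_request(example_event)
slack_payload = json.loads(request.data)
```

Passing the request to urllib.request.urlopen would deliver the message. It is left out here so the sketch runs without network access.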

Question 6

A machine learning engineer wants to log and deploy a model as an MLflow pyfunc model. They have custom preprocessing that needs to be completed on feature variables prior to fitting the model or computing predictions using that model. They decide to wrap this preprocessing in a custom model class ModelWithPreprocess, where the preprocessing is performed when calling fit and when calling predict. They then log the fitted model of the ModelWithPreprocess class as a pyfunc model.

Which of the following is a benefit of this approach when loading the logged pyfunc model for downstream deployment?

Options:

The pyfunc model can be used to deploy models in a parallelizable fashion

The same preprocessing logic will automatically be applied when calling fit

The same preprocessing logic will automatically be applied when calling predict

This approach has no impact when loading the logged pyfunc model for downstream deployment

There is no longer a need for pipeline-like machine learning objects
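
The wrapper pattern described in the question can be sketched in plain Python. This is an assumption-laden illustration rather than the engineer's actual class: the centering step and the one-feature linear model are invented for the example. In MLflow, such a class would typically subclass mlflow.pyfunc.PythonModel, whose predict method receives a context argument before the model input. That dependency is left out so the sketch runs with the standard library alone.

```python
from statistics import mean


class ModelWithPreprocess:
    """Toy one-feature linear model with preprocessing built into fit and predict."""

    def __init__(self):
        self.center = None
        self.slope = None
        self.intercept = None

    def _preprocess(self, xs):
        # Single preprocessing routine shared by fit and predict:
        # center the feature on the mean seen during training.
        return [x - self.center for x in xs]

    def fit(self, xs, ys):
        self.center = mean(xs)
        zs = self._preprocess(xs)
        # Least-squares fit on the centered feature.
        self.intercept = mean(ys)
        self.slope = sum(z * y for z, y in zip(zs, ys)) / sum(z * z for z in zs)
        return self

    def predict(self, xs):
        # Callers pass raw features; preprocessing is applied here automatically.
        zs = self._preprocess(xs)
        return [self.intercept + self.slope * z for z in zs]


model = ModelWithPreprocess().fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
predictions = model.predict([4.0])
```

Because predict calls the same _preprocess routine as fit, code that loads the model only needs to supply raw feature values.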

Question 7

Which of the following lists all of the model stages available in the MLflow Model Registry?

Options:

Development, Staging, Production

None, Staging, Production

Staging, Production, Archived

None, Staging, Production, Archived

Development, Staging, Production, Archived

Question 8

Which of the following describes concept drift?

Options:

Concept drift is when there is a change in the distribution of an input variable

Concept drift is when there is a change in the distribution of a target variable

Concept drift is when there is a change in the relationship between input variables and target variables

Concept drift is when there is a change in the distribution of the predicted target given by the model

None of these describe concept drift

Question 9

Which of the following tools can assist in real-time deployments by packaging software with its own application, tools, and libraries?

Options:

Cloud-based compute

None of these tools

REST APIs

Containers

Autoscaling clusters

Question 10

A data scientist has written a function to track the runs of their random forest model. The data scientist is changing the number of trees in the forest across each run.

Which of the following MLflow operations is designed to log single values like the number of trees in a random forest?

Options:

mlflow.log_artifact

mlflow.log_model

mlflow.log_metric

mlflow.log_param

There is no way to store values like this.

Question 11

A machine learning engineer and data scientist are working together to convert a batch deployment to an always-on streaming deployment. The machine learning engineer has expressed that rigorous data tests must be put in place as a part of their conversion to account for potential changes in data formats.

Which of the following describes why these types of data type tests and checks are particularly important for streaming deployments?

Options:

Because the streaming deployment is always on, all types of data must be handled without producing an error

All of these statements

Because the streaming deployment is always on, there is no practitioner to debug poor model performance

Because the streaming deployment is always on, there is a need to confirm that the deployment can autoscale

None of these statements

Question 12

A machine learning engineer has developed a model and registered it using the FeatureStoreClient fs. The model has model URI model_uri. The engineer now needs to perform batch inference on customer-level Spark DataFrame spark_df, but it is missing a few of the static features that were used when training the model. The customer_id column is the primary key of spark_df and the training set used when training and logging the model.

Which of the following code blocks can be used to compute predictions for spark_df when the missing feature values can be found in the Feature Store by searching for features by customer_id?

Options:

df = fs.get_missing_features(spark_df, model_uri)
fs.score_model(model_uri, df)

fs.score_model(model_uri, spark_df)

df = fs.get_missing_features(spark_df, model_uri)
fs.score_batch(model_uri, df)

df = fs.get_missing_features(spark_df)
fs.score_batch(model_uri, df)

fs.score_batch(model_uri, spark_df)

Answer:

Explanation:

To compute predictions for spark_df when the missing feature values can be found in the Feature Store by searching for features by customer_id, a single call is sufficient:

fs.score_batch(model_uri, spark_df)

The fs.score_batch method takes a model URI and a Spark DataFrame as arguments. Because the model was logged with the Feature Store Client, the feature lookup metadata is packaged with the model. fs.score_batch uses that metadata to retrieve the features missing from the DataFrame by joining it with the feature tables on the lookup key, here customer_id, which must match the primary key of the feature tables. It then applies the model and returns a new Spark DataFrame that contains the original columns plus a prediction column.

The other options are incorrect because:

Options that call fs.get_missing_features: the Feature Store Client has no such method, and no separate lookup step is needed because fs.score_batch performs the lookup itself.
Options that call fs.score_model: the Feature Store Client has no such method. The batch scoring method is fs.score_batch.

References: Score batch
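
The lookup step described above, where feature values missing from the input rows are fetched from a feature table by joining on customer_id, can be sketched in plain Python. This does not call the Feature Store client: the helper add_missing_features, the table contents, and every column name other than customer_id are hypothetical and only illustrate the join.

```python
def add_missing_features(rows, feature_table, key="customer_id"):
    """Fill in features absent from each row by looking them up by primary key."""
    enriched = []
    for row in rows:
        looked_up = feature_table.get(row[key], {})
        # Design choice for this sketch: values already present in the row
        # take precedence over looked-up values.
        enriched.append({**looked_up, **row})
    return enriched


# Hypothetical feature table keyed by customer_id.
feature_table = {
    1: {"tenure_months": 12, "region": "EU"},
    2: {"tenure_months": 30, "region": "US"},
}

# Input batch that lacks the static features stored in the feature table.
batch = [
    {"customer_id": 1, "basket_value": 40.0},
    {"customer_id": 2, "basket_value": 15.5},
]

enriched_batch = add_missing_features(batch, feature_table)
```

Each enriched row now carries both the columns supplied by the caller and the static features found under its customer_id, which is the shape a model trained on the full feature set expects.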

Question 13

A data scientist wants to remove the star_rating column from the Delta table at the location path. To do this, they need to load in data and drop the star_rating column.

Which of the following code blocks accomplishes this task?

Options:

spark.read.format("delta").load(path).drop("star_rating")

spark.read.format("delta").table(path).drop("star_rating")

Delta tables cannot be modified

spark.read.table(path).drop("star_rating")

spark.sql("SELECT * EXCEPT star_rating FROM path")

Question 14

Which of the following machine learning model deployment paradigms is the most common for machine learning projects?

Options:

On-device

Streaming

Real-time

Batch

None of these deployments

Question 15

A machine learning engineering team wants to build a continuous pipeline for data preparation of a machine learning application. The team would like the data to be fully processed and made ready for inference in a series of equal-sized batches.

Which of the following tools can be used to provide this type of continuous processing?

Options:

Spark UDFs

Structured Streaming

MLflow

Delta Lake

AutoML

Question 16

A data scientist has developed and logged a scikit-learn random forest model model, and then they ended their Spark session and terminated their cluster. After starting a new cluster, they want to review the feature_importances_ of the original model object.

Which of the following lines of code can be used to restore the model object so that feature_importances_ is available?

Options:

mlflow.load_model(model_uri)

client.list_artifacts(run_id)["feature-importances.csv"]

mlflow.sklearn.load_model(model_uri)

This can only be viewed in the MLflow Experiments UI

client.pyfunc.load_model(model_uri)

Question 17

A machine learning engineer wants to log feature importance data from a CSV file at path importance_path with an MLflow run for model model.

Which of the following code blocks will accomplish this task inside of an existing MLflow run block?

Options:

mlflow.log_data(importance_path, "feature-importance.csv")

mlflow.log_artifact(importance_path, "feature-importance.csv")

None of these code blocks can accomplish the task.

Question 18

A machine learning engineer is attempting to create a webhook that will trigger a Databricks Job job_id when a model version for model model transitions into any MLflow Model Registry stage.

They have the following incomplete code block:

Which of the following lines of code can be used to fill in the blank so that the code block accomplishes the task?

Options:

"MODEL_VERSION_CREATED"

"MODEL_VERSION_TRANSITIONED_TO_PRODUCTION"

"MODEL_VERSION_TRANSITIONED_TO_STAGING"

"MODEL_VERSION_TRANSITIONED_STAGE"

"MODEL_VERSION_TRANSITIONED_TO_STAGING", "MODEL_VERSION_TRANSITIONED_TO_PRODUCTION"

Exam Detail

Vendor: Databricks

Certification: ML Data Scientist

Exam Code: Databricks-Machine-Learning-Professional

Exam Name: Databricks Certified Machine Learning Professional

Last Update: Jul 30, 2025
