Databricks Real Dumps Practice Exam Questions by Dumpswrap

Databricks Certified Machine Learning Professional Questions and Answers

Question 1

A data scientist would like to enable MLflow Autologging for all machine learning libraries used in a notebook. They want to ensure that MLflow Autologging is used no matter what version of the Databricks Runtime for Machine Learning is used to run the notebook and no matter what workspace-wide configurations are selected in the Admin Console.

Which of the following lines of code can they use to accomplish this task?

Options:

mlflow.sklearn.autolog()

mlflow.spark.autolog()

spark.conf.set("autologging", True)

It is not possible to automatically log MLflow runs.

mlflow.autolog()

Question 2

A machine learning engineer wants to view all of the active MLflow Model Registry Webhooks for a specific model.

They are using the following code block:

Which of the following changes does the machine learning engineer need to make to this code block so it will successfully accomplish the task?

Options:

There are no necessary changes

Replace list with view in the endpoint URL

Replace POST with GET in the call to http request

Replace list with webhooks in the endpoint URL

Replace POST with PUT in the call to http request

Question 3

A machine learning engineer wants to log and deploy a model as an MLflow pyfunc model. They have custom preprocessing that needs to be completed on feature variables prior to fitting the model or computing predictions using that model. They decide to wrap this preprocessing in a custom model class ModelWithPreprocess, where the preprocessing is performed when calling fit and when calling predict. They then log the fitted model of the ModelWithPreprocess class as a pyfunc model.

Which of the following is a benefit of this approach when loading the logged pyfunc model for downstream deployment?

Options:

The pyfunc model can be used to deploy models in a parallelizable fashion

The same preprocessing logic will automatically be applied when calling fit

The same preprocessing logic will automatically be applied when calling predict

This approach has no impact when loading the logged pyfunc model for downstream deployment

There is no longer a need for pipeline-like machine learning objects

Question 4

Which of the following is a probable response to identifying drift in a machine learning application?

Options:

None of these responses

Retraining and deploying a model on more recent data

All of these responses

Rebuilding the machine learning application with a new label variable

Sunsetting the machine learning application

Answer: All of these responses

Explanation:

Drift is a change over time in the statistical properties of the data a model receives, relative to the data that was used to train it. This can cause the model to become less accurate or to behave differently than it was designed to [1]. Drift can be detected by monitoring the statistics of the input and output data over time and comparing them with the baseline statistics from the training data [2]. Depending on the type and severity of the drift, different responses may be appropriate. Some possible responses are:

Retraining and deploying a model on more recent data: This can help the model adapt to the changes in the data and improve its performance. However, this may require frequent retraining and deployment cycles, which can be costly and time-consuming. Also, retraining may not be sufficient if the drift is caused by a change in the underlying concept or relationship between the input and output variables [3].
Rebuilding the machine learning application with a new label variable: This can help the model capture the new concept or relationship that has emerged in the data. However, this may require a significant redesign of the application and the data pipeline, as well as collecting and labeling new data. Also, rebuilding may not be feasible if the concept or relationship is constantly changing or unknown [3].
Sunsetting the machine learning application: This can help avoid the risks and costs of maintaining a model that is no longer reliable or useful. However, this may mean losing the benefits and value of the application and the data. Also, sunsetting may not be an option if the application is critical or mandatory for the business or the users [3].

Therefore, all of these responses are probable, depending on the situation and the trade-offs involved.

References:

[1] Databricks Machine Learning Professional Exam Guide, Section 4: Solution and Data Monitoring, p. 5
[2] Databricks Machine Learning Documentation, Monitoring ML Models, Data Drift Detection, p. 2-3
[3] A Gentle Introduction to Concept Drift in Machine Learning, Types of Concept Drift, p. 3-4
[4] Understanding Data Drift and Model Drift: Drift Detection in Python, Types of Drift, p. 2-3
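
The explanation above says drift can be detected by comparing statistics of recent data with baseline statistics from the training data. The following is a minimal sketch of that idea in plain Python, not a method prescribed by the exam guide or by Databricks. The function names, the z-score check on the mean, the threshold of 3.0, and the sample values are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def summarize(values):
    """Return simple summary statistics for one numeric feature."""
    return {
        "mean": mean(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def drift_report(baseline, recent, z_threshold=3.0):
    """Compare a recent window of values against the training baseline.

    Hypothetical helper: flags a shift in the mean with a z-score and
    counts recent values that fall outside the baseline's observed range.
    """
    base = summarize(baseline)
    new = summarize(recent)
    if base["stdev"] == 0:
        raise ValueError("baseline has no spread; z-score is undefined")

    # Standard error of the recent mean, assuming the baseline's spread.
    standard_error = base["stdev"] / len(recent) ** 0.5
    z_score = (new["mean"] - base["mean"]) / standard_error

    out_of_range = sum(
        1 for v in recent if v < base["min"] or v > base["max"]
    )
    return {
        "z_score": z_score,
        "mean_shift_flagged": abs(z_score) > z_threshold,
        "fraction_out_of_range": out_of_range / len(recent),
    }


# Illustrative values only: a baseline window and a clearly shifted recent window.
baseline_values = [24.0, 26.5, 25.0, 27.5, 23.5, 26.0, 25.5, 24.5]
recent_values = [15.0, 14.0, 16.5, 13.5]

report = drift_report(baseline_values, recent_values)
print(report)
```

In this sketch every recent value lies below the baseline minimum, so both the mean-shift flag and the out-of-range fraction point to drift. A production monitor would more likely use a formal two-sample test and track several features and the predictions, but the compare-against-baseline structure is the same.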

Question 5

A data scientist has developed a model to predict ice cream sales using the expected temperature and expected number of hours of sun in the day. However, the expected temperature is dropping beneath the range of the input variable on which the model was trained.

Which of the following types of drift is present in the above scenario?

Options:

Label drift

None of these

Concept drift

Prediction drift

Feature drift

Question 6

Which of the following MLflow operations can be used to delete a model from the MLflow Model Registry?

Options:

client.transition_model_version_stage

client.delete_model_version

client.update_registered_model

client.delete_model

client.delete_registered_model

Question 7

Which of the following MLflow operations can be used to automatically calculate and log a Shapley feature importance plot?

Options:

mlflow.shap.log_explanation

None of these operations can accomplish the task.

mlflow.shap

mlflow.log_figure

client.log_artifact

Question 8

A data scientist has computed updated feature values for all primary key values stored in the Feature Store table features. In addition, feature values for some new primary key values have also been computed. The updated feature values are stored in the DataFrame features_df. They want to replace all data in features with the newly computed data.

Which of the following code blocks can they use to perform this task using the Feature Store Client fs?

Options:

Option A

Option B

Option C

Option D

Option E

Question 9

Which of the following is an advantage of using the python_function (pyfunc) model flavor over the built-in library-specific model flavors?

Options:

python_function provides no benefits over the built-in library-specific model flavors

python_function can be used to deploy models in a parallelizable fashion

python_function can be used to deploy models without worrying about which library was used to create the model

python_function can be used to store models in an MLmodel file

python_function can be used to deploy models without worrying about whether they are deployed in batch, streaming, or real-time environments

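Answer: python_function can be used to deploy models without worrying about which library was used to create the model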

Explanation:

The python_function model flavor is a generic model interface for MLflow Python models. It ensures that any MLflow Python model can be loaded and interacted with using a consistent API, regardless of which library was used to create the model. The python_function model flavor defines a standard format for model data and a common interface for model inference. The model exposes a predict function that can accept various types of input data and return various types of output data. The inference interface consists of a loader function, mlflow.pyfunc.load_model, that loads the model from a given path or URI and returns an object with a predict method. The python_function model flavor enables model portability and interoperability across different platforms and environments, as it allows deployment tools to understand and use the model without having to integrate with each library-specific model flavor [1][2].

The other options are incorrect because:

A. python_function provides several benefits over the built-in library-specific model flavors, such as portability, interoperability, and simplicity.
B. python_function does not directly enable parallelizable deployment of models, as it depends on the underlying implementation of the predict function and the deployment tool. However, python_function can be used with other tools and frameworks that support parallelizable deployment, such as Spark, Databricks, or Ray.
D. python_function is not used to store models in an MLmodel file, but rather to load models from an MLmodel file. The MLmodel file is a configuration file that contains metadata about the model, such as the model flavor, the data path, the dependencies, etc. The MLmodel file is created when the model is logged or saved using MLflow [3].
E. python_function does not directly enable deployment of models in batch, streaming, or real-time environments, as it depends on the underlying implementation of the predict function and the deployment tool. However, python_function can be used with other tools and frameworks that support different deployment scenarios, such as MLflow Serving, MLflow Projects, MLflow Models, or MLflow Model Registry [4].

References:

[1] mlflow.pyfunc - MLflow 2.9.1 documentation
[2] Models, Flavors, and PyFuncs in MLflow
[3] MLflow Models - MLflow 2.9.1 documentation
[4] Built-In Deployment Tools - MLflow 2.9.1 documentation
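
The explanation above rests on one idea: deployment code calls a single predict method and does not need to know which library produced the model. The sketch below illustrates that idea in plain Python. It does not use MLflow and is not MLflow's implementation. The classes ThresholdModel, LinearModel, and GenericWrapper and the function deploy are hypothetical stand-ins invented for this example. In MLflow itself, the object returned by mlflow.pyfunc.load_model plays the role of the wrapper.

```python
class ThresholdModel:
    """Stand-in for a model from one library; its native method is `classify`."""

    def __init__(self, threshold):
        self.threshold = threshold

    def classify(self, rows):
        return [1 if sum(row) > self.threshold else 0 for row in rows]


class LinearModel:
    """Stand-in for a model from another library; its native method is `score`."""

    def __init__(self, weights):
        self.weights = weights

    def score(self, rows):
        return [sum(w * x for w, x in zip(self.weights, row)) for row in rows]


class GenericWrapper:
    """Expose any wrapped model through one `predict(model_input)` method."""

    def __init__(self, native_model, native_method_name):
        # Look up the library-specific inference method once, at wrap time.
        self._call = getattr(native_model, native_method_name)

    def predict(self, model_input):
        return self._call(model_input)


def deploy(model, rows):
    """Deployment code that only knows about `predict`."""
    return model.predict(rows)


rows = [[1.0, 2.0], [3.0, 4.0]]
wrapped_models = [
    GenericWrapper(ThresholdModel(4.0), "classify"),
    GenericWrapper(LinearModel([0.5, 0.25]), "score"),
]

# The same deployment function serves both models without library-specific code.
outputs = [deploy(model, rows) for model in wrapped_models]
print(outputs)  # [[0, 1], [1.0, 2.5]]
```

The two toy models have different native method names, yet deploy treats them identically. That is the portability benefit the answer describes.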

Question 10

A machine learning engineer needs to select a deployment strategy for a new machine learning application. The feature values are not available until the time of delivery, and results are needed exceedingly fast for one record at a time.

Which of the following deployment strategies can be used to meet these requirements?

Options:

Edge/on-device

Streaming

None of these strategies will meet the requirements.

Batch

Real-time

Question 11

Which of the following is a simple statistic to monitor for categorical feature drift?

Options:

Mode

None of these

Mode, number of unique values, and percentage of missing values

Percentage of missing values

Number of unique values

Question 12

A machine learning engineer is manually refreshing a model in an existing machine learning pipeline. The pipeline uses the MLflow Model Registry model "project". The machine learning engineer would like to add a new version of the model to "project".

Which of the following MLflow operations can the machine learning engineer use to accomplish this task?

Options:

mlflow.register_model

MlflowClient.update_registered_model

mlflow.add_model_version

MlflowClient.get_model_version

The machine learning engineer needs to create an entirely new MLflow Model Registry model

Question 13

A data scientist is utilizing MLflow to track their machine learning experiments. After completing a series of runs for the experiment with experiment ID exp_id, the data scientist wants to programmatically work with the experiment run data in a Spark DataFrame. They have an active MLflow Client client and an active Spark session spark.

Which of the following lines of code can be used to obtain run-level results for exp_id in a Spark DataFrame?

Options:

client.list_run_infos(exp_id)

spark.read.format("delta").load(exp_id)

There is no way to programmatically return row-level results from an MLflow Experiment.

mlflow.search_runs(exp_id)

spark.read.format("mlflow-experiment").load(exp_id)

Question 14

A machine learning engineer is migrating a machine learning pipeline to use Databricks Machine Learning. They have programmatically identified the best run from an MLflow Experiment and stored its URI in the model_uri variable and its Run ID in the run_id variable. They have also determined that the model was logged with the name "model". Now, the machine learning engineer wants to register that model in the MLflow Model Registry with the name "best_model".

Which of the following lines of code can they use to register the model to the MLflow Model Registry?

Options:

mlflow.register_model(model_uri, "best_model")

mlflow.register_model(run_id, "best_model")

mlflow.register_model(f"runs:/{run_id}/best_model", "model")

mlflow.register_model(model_uri, "model")

mlflow.register_model(f"runs:/{run_id}/model")

Question 15

A machine learning engineer is in the process of implementing a concept drift monitoring solution. They are planning to use the following steps:

1. Deploy a model to production and compute predicted values

2. Obtain the observed (actual) label values

3. _____

4. Run a statistical test to determine if there are changes over time

Which of the following should be completed as Step #3?

Options:

Obtain the observed (actual) feature values

Measure the latency of the prediction time

Retrain the model

None of these should be completed as Step #3

Compute the evaluation metric using the observed and predicted values

Question 16

A data scientist wants to remove the star_rating column from the Delta table at the location path. To do this, they need to load in data and drop the star_rating column.

Which of the following code blocks accomplishes this task?

Options:

spark.read.format("delta").load(path).drop("star_rating")

spark.read.format("delta").table(path).drop("star_rating")

Delta tables cannot be modified

spark.read.table(path).drop("star_rating")

spark.sql("SELECT * EXCEPT star_rating FROM path")

Question 17

Which of the following tools can assist in real-time deployments by packaging software with its own application, tools, and libraries?

Options:

Cloud-based compute

None of these tools

REST APIs

Containers

Autoscaling clusters

Question 18

Which of the following describes concept drift?

Options:

Concept drift is when there is a change in the distribution of an input variable

Concept drift is when there is a change in the distribution of a target variable

Concept drift is when there is a change in the relationship between input variables and target variables

Concept drift is when there is a change in the distribution of the predicted target given by the model

None of these describe concept drift
