You work for a manufacturing company. You need to train a custom image classification model to detect product defects at the end of an assembly line. Although your model is performing well, some images in your holdout set are consistently mislabeled with high confidence. You want to use Vertex AI to understand your model's results. What should you do?
Vertex Explainable AI is a set of tools and frameworks to help you understand and interpret predictions made by your machine learning models, natively integrated with a number of Google’s products and services1. With Vertex Explainable AI, you can generate feature-based explanations that show how much each input feature contributed to the model’s prediction2. This can help you debug and improve your model performance, and build confidence in your model’s behavior. Feature-based explanations are supported for custom image classification models deployed on Vertex AI Prediction3. References:
Explainable AI | Google Cloud
Introduction to Vertex Explainable AI | Vertex AI | Google Cloud
Supported model types for feature-based explanations | Vertex AI | Google Cloud
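As a rough illustration, assuming the model has already been deployed to a Vertex AI endpoint with an explanation spec (the project, region, endpoint ID, and instance payload below are placeholders), a feature-based explanation can be requested with the Vertex AI SDK and the per-feature attributions inspected for the mislabeled images:

# Minimal sketch: request an explanation from a deployed model and print attributions.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # hypothetical IDs

endpoint = aiplatform.Endpoint("projects/my-project/locations/us-central1/endpoints/123")
# The payload shape depends on how your custom model expects image input.
response = endpoint.explain(instances=[{"image_bytes": {"b64": "<base64-encoded image>"}}])

for explanation in response.explanations:
    for attribution in explanation.attributions:
        # Attributions show how much each input region contributed to the predicted
        # class, which helps debug images that are confidently mislabeled.
        print(attribution.output_display_name, attribution.feature_attributions)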
Your data science team has requested a system that supports scheduled model retraining, Docker containers, and a service that supports autoscaling and monitoring for online prediction requests. Which platform components should you choose for this system?
Vertex AI Pipelines and App Engine
Vertex AI Pipelines, Vertex AI Prediction, and Vertex AI Model Monitoring
Cloud Composer, BigQuery ML, and Vertex AI Prediction
Cloud Composer, Vertex AI Training with custom containers, and App Engine
Option A is incorrect because Vertex AI Pipelines and App Engine do not meet all the requirements of the system. Vertex AI Pipelines is a service that allows you to create, run, and manage ML workflows using TensorFlow Extended (TFX) components or custom components1. App Engine is a service that allows you to build and deploy scalable web applications using standard or flexible environments2. However, App Engine does not support Docker containers in the standard environment, and does not provide a dedicated service for online prediction and monitoring of ML models3.
Option B is correct because Vertex AI Pipelines, Vertex AI Prediction, and Vertex AI Model Monitoring meet all the requirements of the system. Vertex AI Prediction is a service that allows you to deploy and serve ML models for online or batch prediction, with support for autoscaling and custom containers4. Vertex AI Model Monitoring is a service that allows you to monitor the performance and fairness of your deployed models, and get alerts for any issues or anomalies5.
Option C is incorrect because Cloud Composer, BigQuery ML, and Vertex AI Prediction do not meet all the requirements of the system. Cloud Composer is a service that allows you to create, schedule, and manage workflows using Apache Airflow. BigQuery ML is a service that allows you to create and use ML models within BigQuery using SQL queries. However, BigQuery ML does not support custom containers, and Vertex AI Prediction does not support scheduled model retraining or model monitoring.
Option D is incorrect because Cloud Composer, Vertex AI Training with custom containers, and App Engine do not meet all the requirements of the system. Vertex AI Training is a service that allows you to train ML models using built-in algorithms or custom containers. However, Vertex AI Training does not support online prediction or model monitoring, and App Engine does not support Docker containers in the standard environment or online prediction and monitoring of ML models3.
References:
Vertex AI Pipelines overview
App Engine overview
Choosing an App Engine environment
Vertex AI Prediction overview
Vertex AI Model Monitoring overview
[Cloud Composer overview]
[BigQuery ML overview]
[BigQuery ML limitations]
[Vertex AI Training overview]
You created a model that uses BigQuery ML to perform linear regression. You need to retrain the model on the cumulative data collected every week. You want to minimize the development effort and the scheduling cost. What should you do?
Use BigQuery's scheduling service to run the model retraining query periodically.
Create a pipeline in Vertex AI Pipelines that executes the retraining query, and use the Cloud Scheduler API to run the query weekly.
Use Cloud Scheduler to trigger a Cloud Function every week that runs the query for retraining the model.
Use the BigQuery API connector and Cloud Scheduler to trigger Workflows every week to retrain the model.
BigQuery is a serverless data warehouse that allows you to perform SQL queries on large-scale data. BigQuery ML is a feature of BigQuery that enables you to create and execute machine learning models using standard SQL queries. You can use BigQuery ML to perform linear regression on your data and create a model. BigQuery also provides a scheduling service that allows you to create and manage recurring SQL queries. You can use BigQuery’s scheduling service to run the model retraining query periodically, such as every week. You can specify the destination table for the query results, and the schedule options, such as start date, end date, frequency, and time zone. You can also monitor the status and history of your scheduled queries. This solution can help you retrain the model on the cumulative data collected every week, while minimizing the development effort and the scheduling cost. References:
BigQuery ML | Google Cloud
Scheduling queries | BigQuery
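A minimal sketch of scheduling the retraining query with the BigQuery Data Transfer Service client (which backs BigQuery's scheduled queries feature); the project, dataset, and table names are hypothetical:

# Schedule the BigQuery ML retraining query to run weekly.
from google.cloud import bigquery_datatransfer

retrain_sql = """
CREATE OR REPLACE MODEL `my_dataset.sales_lr_model`
OPTIONS (model_type = 'linear_reg', input_label_cols = ['label']) AS
SELECT * FROM `my_dataset.training_data`  -- cumulative data collected to date
"""

client = bigquery_datatransfer.DataTransferServiceClient()
transfer_config = bigquery_datatransfer.TransferConfig(
    display_name="weekly-model-retrain",
    data_source_id="scheduled_query",
    params={"query": retrain_sql},  # CREATE MODEL defines its own destination
    schedule="every 168 hours",     # weekly
)
client.create_transfer_config(
    parent=client.common_project_path("my-project"),
    transfer_config=transfer_config,
)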
You are an ML engineer at a large grocery retailer with stores in multiple regions. You have been asked to create an inventory prediction model. Your model's features include region, location, historical demand, and seasonal popularity. You want the algorithm to learn from new inventory data on a daily basis. Which algorithm should you use to build the model?
Classification
Reinforcement Learning
Recurrent Neural Networks (RNN)
Convolutional Neural Networks (CNN)
Reinforcement learning is a machine learning technique that enables an agent to learn from its own actions and feedback in an environment. Reinforcement learning does not require labeled data or explicit rules, but rather relies on trial and error and reward and punishment mechanisms to optimize the agent’s behavior and achieve a goal. Reinforcement learning can be used to solve complex and dynamic problems that involve sequential decision making and adaptation to changing situations1.
For the use case of creating an inventory prediction model for a large grocery retailer with stores in multiple regions, reinforcement learning is a suitable algorithm to use. This is because the problem involves multiple factors that affect the inventory demand, such as region, location, historical demand, and seasonal popularity, and the inventory manager needs to make optimal decisions on how much and when to order, store, and distribute the products. Reinforcement learning can help the inventory manager to learn from the new inventory data on a daily basis, and adjust the inventory policy accordingly. Reinforcement learning can also handle the uncertainty and variability of the inventory demand, and balance the trade-off between overstocking and understocking2.
The other options are not as suitable as option B, because they are not designed to handle sequential decision making and adaptation to changing situations. Option A, classification, is a machine learning technique that assigns a label to an input based on predefined categories. Classification can be used to predict the inventory demand for a single product or a single period, but it cannot optimize the inventory policy over multiple products and periods. Option C, recurrent neural networks (RNN), are a type of neural network that can process sequential data, such as text, speech, or time series. RNN can be used to model the temporal patterns and dependencies of the inventory demand, but they cannot learn from feedback and rewards. Option D, convolutional neural networks (CNN), are a type of neural network that can process spatial data, such as images, videos, or graphs. CNN can be used to extract features and patterns from the inventory data, but they cannot optimize the inventory policy over multiple actions and states. Therefore, option B, reinforcement learning, is the best answer for this question.
References:
Reinforcement learning - Wikipedia
Reinforcement Learning for Inventory Optimization
You have trained a DNN regressor with TensorFlow to predict housing prices using a set of predictive features. Your default precision is tf.float64, and you use a standard TensorFlow estimator:
estimator = tf.estimator.DNNRegressor(
feature_columns=[YOUR_LIST_OF_FEATURES],
hidden_units=[1024, 512, 256],
dropout=None)
Your model performs well, but just before deploying it to production, you discover that your current serving latency is 10ms @ 90 percentile and you currently serve on CPUs. Your production requirements expect a model latency of 8ms @ 90 percentile. You are willing to accept a small decrease in performance in order to reach the latency requirement. Therefore, your plan is to improve latency while evaluating how much the model's prediction performance decreases. What should you first try to quickly lower the serving latency?
Increase the dropout rate to 0.8 in PREDICT mode by adjusting the TensorFlow Serving parameters.
Increase the dropout rate to 0.8 and retrain your model.
Switch from CPU to GPU serving
Apply quantization to your SavedModel by reducing the floating point precision to tf.float16.
Quantization is a technique that reduces the numerical precision of the weights and activations of a neural network, which can improve the inference speed and reduce the memory footprint of the model1.
Reducing the floating point precision from tf.float64 to tf.float16 can potentially halve the latency and memory usage of the model, while having minimal impact on the accuracy2.
Increasing the dropout rate to 0.8 in either mode would not affect the latency, but would likely degrade the performance of the model significantly, as dropout is a regularization technique that randomly drops out units during training to prevent overfitting3.
Switching from CPU to GPU serving may or may not improve the latency, depending on the hardware specifications and the model complexity, but it would also incur additional costs and complexity for deployment4.
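As a hedged sketch, one common way to apply post-training float16 quantization is through the TFLite converter on the exported SavedModel; the export path below is hypothetical, and serving the result requires a TFLite-capable runtime, so treat this as an illustration of the precision-reduction step rather than a drop-in TensorFlow Serving change:

# Post-training float16 quantization of an exported SavedModel.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("./housing_model/export")  # hypothetical path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # store weights as float16

tflite_model = converter.convert()
with open("housing_model_fp16.tflite", "wb") as f:
    f.write(tflite_model)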
You have been asked to develop an input pipeline for an ML training model that processes images from disparate sources at a low latency. You discover that your input data does not fit in memory. How should you create a dataset following Google-recommended best practices?
Create a tf.data.Dataset.prefetch transformation.
Convert the images to tf.Tensor objects, and then run Dataset.from_tensor_slices().
Convert the images to tf.Tensor objects, and then run tf.data.Dataset.from_tensors().
Convert the images into TFRecords, store the images in Cloud Storage, and then use the tf.data API to read the images for training.
An input pipeline is a way to prepare and feed data to a machine learning model for training or inference. An input pipeline typically consists of several steps, such as reading, parsing, transforming, batching, and prefetching the data. An input pipeline can improve the performance and efficiency of the model, as it can handle large and complex datasets, optimize the data processing, and reduce the latency and memory usage1.
For the use case of developing an input pipeline for an ML training model that processes images from disparate sources at a low latency, the best option is to convert the images into TFRecords, store the images in Cloud Storage, and then use the tf.data API to read the images for training. This option involves using the following components and techniques:
TFRecords: TFRecords is a binary file format that can store a sequence of data records, such as images, text, or audio. TFRecords can help to compress, serialize, and store the data efficiently, and reduce the data loading and parsing time. TFRecords can also support data sharding and interleaving, which can improve the data throughput and parallelism2.
Cloud Storage: Cloud Storage is a service that allows you to store and access data on Google Cloud. Cloud Storage can help to store and manage large and distributed datasets, such as images from different sources, and provide high availability, durability, and scalability. Cloud Storage can also integrate with other Google Cloud services, such as Compute Engine, AI Platform, and Dataflow3.
tf.data API: tf.data API is a set of tools and methods that allow you to create and manipulate data pipelines in TensorFlow. tf.data API can help to read, transform, batch, and prefetch the data efficiently, and optimize the data processing for performance and memory. tf.data API can also support various data sources and formats, such as TFRecords, CSV, JSON, and images.
By using these components and techniques, the input pipeline can process large datasets of images from disparate sources that do not fit in memory, and provide low latency and high performance for the ML training model. Therefore, converting the images into TFRecords, storing the images in Cloud Storage, and using the tf.data API to read the images for training is the best option for this use case.
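A minimal sketch of this pattern with the tf.data API is shown below; the bucket path, feature names, and image size are assumptions:

# Read sharded TFRecords from Cloud Storage, decode and batch the images,
# and prefetch so the input pipeline overlaps with training.
import tensorflow as tf

feature_spec = {
    "image_raw": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized):
    example = tf.io.parse_single_example(serialized, feature_spec)
    image = tf.io.decode_jpeg(example["image_raw"], channels=3)
    image = tf.image.resize(image, [224, 224]) / 255.0
    return image, example["label"]

files = tf.data.Dataset.list_files("gs://my-bucket/train/*.tfrecord")
dataset = (
    files.interleave(tf.data.TFRecordDataset, num_parallel_calls=tf.data.AUTOTUNE)
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(10_000)
    .batch(64)
    .prefetch(tf.data.AUTOTUNE)
)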
References:
Build TensorFlow input pipelines | TensorFlow Core
TFRecord and tf.Example | TensorFlow Core
Cloud Storage documentation | Google Cloud
[tf.data: Build TensorFlow input pipelines | TensorFlow Core]
You work on a growing team of more than 50 data scientists who all use AI Platform. You are designing a strategy to organize your jobs, models, and versions in a clean and scalable way. Which strategy should you choose?
Set up restrictive IAM permissions on the AI Platform notebooks so that only a single user or group can access a given instance.
Separate each data scientist's work into a different project to ensure that the jobs, models, and versions created by each data scientist are accessible only to that user.
Use labels to organize resources into descriptive categories. Apply a label to each created resource so that users can filter the results by label when viewing or monitoring the resources.
Set up a BigQuery sink for Cloud Logging logs that is appropriately filtered to capture information about AI Platform resource usage. In BigQuery, create a SQL view that maps users to the resources they are using.
Labels are key-value pairs that can be attached to any AI Platform resource, such as jobs, models, versions, or endpoints1. Labels can help you organize your resources into descriptive categories, such as project, team, environment, or purpose. You can use labels to filter the results when you list or monitor your resources, or to group them for billing or quota purposes2. Using labels is a simple and scalable way to manage your AI Platform resources without creating unnecessary complexity or overhead. Therefore, using labels to organize resources is the best strategy for this use case.
References:
Using labels
Filtering and grouping by labels
You recently designed and built a custom neural network that uses critical dependencies specific to your organization's framework. You need to train the model using a managed training service on Google Cloud. However, the ML framework and related dependencies are not supported by AI Platform Training. Also, both your model and your data are too large to fit in memory on a single machine. Your ML framework of choice uses the scheduler, workers, and servers distribution structure. What should you do?
Use a built-in model available on AI Platform Training.
Build your custom container to run jobs on AI Platform Training.
Build your custom containers to run distributed training jobs on AI Platform Training.
Reconfigure your code to an ML framework with dependencies that are supported by AI Platform Training.
AI Platform Training is a service that allows you to run your machine learning training jobs on Google Cloud using various features, model architectures, and hyperparameters. You can use AI Platform Training to scale up your training jobs, leverage distributed training, and access specialized hardware such as GPUs and TPUs1. AI Platform Training supports several pre-built containers that provide different ML frameworks and dependencies, such as TensorFlow, PyTorch, scikit-learn, and XGBoost2. However, if the ML framework and related dependencies that you need are not supported by the pre-built containers, you can build your own custom containers and use them to run your training jobs on AI Platform Training3.
Custom containers are Docker images that you create to run your training application. By using custom containers, you can specify and pre-install all the dependencies needed for your application, and have full control over the code, serving, and deployment of your model4. Custom containers also enable you to run distributed training jobs on AI Platform Training, which can help you train large-scale and complex models faster and more efficiently5. Distributed training is a technique that splits the training data and computation across multiple machines, and coordinates them to update the model parameters. AI Platform Training supports two types of distributed training: parameter server and collective all-reduce. The parameter server architecture consists of a set of workers that perform the computation, and a set of servers that store and update the model parameters. The collective all-reduce architecture consists of a set of workers that perform the computation and synchronize the model parameters among themselves. Both architectures also have a scheduler that coordinates the workers and servers.
For the use case of training a custom neural network that uses critical dependencies specific to your organization’s framework, the best option is to build your custom containers to run distributed training jobs on AI Platform Training. This option allows you to use the ML framework and dependencies of your choice, and train your model on multiple machines without having to manage the infrastructure. Since your ML framework of choice uses the scheduler, workers, and servers distribution structure, you can use the parameter server architecture to run your distributed training job on AI Platform Training. You can specify the number and type of machines, the custom container image, and the training application arguments when you submit your training job. Therefore, building your custom containers to run distributed training jobs on AI Platform Training is the best option for this use case.
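For illustration, a comparable job on Vertex AI (the successor to AI Platform Training) could be submitted with the Vertex AI SDK as sketched below; the container image URI, machine types, replica counts, and the --role flags consumed by the custom framework are assumptions, and worker pools 0, 1, and 2 map to the chief (scheduler), workers, and parameter servers respectively:

# Submit a distributed training job that uses a custom container.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

IMAGE = "us-central1-docker.pkg.dev/my-project/training/custom-framework:latest"

worker_pool_specs = [
    {"machine_spec": {"machine_type": "n1-standard-8"}, "replica_count": 1,
     "container_spec": {"image_uri": IMAGE, "args": ["--role=scheduler"]}},
    {"machine_spec": {"machine_type": "n1-standard-8"}, "replica_count": 4,
     "container_spec": {"image_uri": IMAGE, "args": ["--role=worker"]}},
    {"machine_spec": {"machine_type": "n1-highmem-8"}, "replica_count": 2,
     "container_spec": {"image_uri": IMAGE, "args": ["--role=server"]}},
]

job = aiplatform.CustomJob(display_name="custom-framework-distributed",
                           worker_pool_specs=worker_pool_specs)
job.run()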
References:
AI Platform Training documentation
Pre-built containers for training
Custom containers for training
Custom containers overview | Vertex AI | Google Cloud
Distributed training overview
[Types of distributed training]
[Distributed training architectures]
[Using custom containers for training with the parameter server architecture]
You were asked to investigate failures of a production line component based on sensor readings. After receiving the dataset, you discover that less than 1% of the readings are positive examples representing failure incidents. You have tried to train several classification models, but none of them converge. How should you resolve the class imbalance problem?
Use the class distribution to generate 10% positive examples
Use a convolutional neural network with max pooling and softmax activation
Downsample the data with upweighting to create a sample with 10% positive examples
Remove negative examples until the numbers of positive and negative examples are equal
The class imbalance problem is a common challenge in machine learning, especially in classification tasks. It occurs when the distribution of the target classes is highly skewed, such that one class (the majority class) has much more examples than the other class (the minority class). The minority class is often the more interesting or important class, such as failure incidents, fraud cases, or rare diseases. However, most machine learning algorithms are designed to optimize the overall accuracy, which can be biased towards the majority class and ignore the minority class. This can result in poor predictive performance, especially for the minority class.
There are different techniques to deal with the class imbalance problem, such as data-level methods, algorithm-level methods, and evaluation-level methods1. Data-level methods involve resampling the original dataset to create a more balanced class distribution. There are two main types of data-level methods: oversampling and undersampling. Oversampling methods increase the number of examples in the minority class, either by duplicating existing examples or by generating synthetic examples. Undersampling methods reduce the number of examples in the majority class, either by randomly removing examples or by using clustering or other criteria to select representative examples. Both oversampling and undersampling methods can be combined with upweighting or downweighting, which assign different weights to the examples according to their class frequency, to further balance the dataset.
For the use case of investigating failures of a production line component based on sensor readings, the best option is to downsample the data with upweighting to create a sample with 10% positive examples. This option involves randomly removing some of the negative examples (the majority class) until the ratio of positive to negative examples is 1:9, and then assigning higher weights to the positive examples to compensate for their low frequency. This option can create a more balanced dataset that can improve the performance of the classification models, while preserving the diversity and representativeness of the original data. This option can also reduce the computation time and memory usage, as the size of the dataset is reduced. Therefore, downsampling the data with upweighting to create a sample with 10% positive examples is the best option for this use case.
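A small sketch of downsampling with upweighting in pandas; the column names and the 10% target are assumptions:

# Downsample the negative class so positives make up ~10% of the sample, then
# upweight the remaining negatives by the downsampling factor so the model
# still sees calibrated class priors.
import pandas as pd

def downsample_with_upweighting(df, label_col="failure", positive_frac=0.10, seed=42):
    positives = df[df[label_col] == 1]
    negatives = df[df[label_col] == 0]

    # Keep just enough negatives so positives are ~10% of the new sample.
    n_neg_keep = int(len(positives) * (1 - positive_frac) / positive_frac)
    downsample_factor = len(negatives) / n_neg_keep
    negatives_kept = negatives.sample(n=n_neg_keep, random_state=seed)

    sample = pd.concat([positives, negatives_kept]).sample(frac=1, random_state=seed)
    # Each kept negative now represents downsample_factor original negatives.
    sample["example_weight"] = sample[label_col].map({1: 1.0, 0: downsample_factor})
    return sample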
References:
A Systematic Study of the Class Imbalance Problem in Convolutional Neural Networks
You are developing a process for training and running your custom model in production. You need to be able to show lineage for your model and predictions. What should you do?
1. Create a Vertex AI managed dataset.
2. Use a Vertex AI training pipeline to train your model.
3. Generate batch predictions in Vertex AI.
1. Use a Vertex AI Pipelines custom training job component to train your model.
2. Generate predictions by using a Vertex AI Pipelines model batch predict component.
1. Upload your dataset to BigQuery.
2. Use a Vertex AI custom training job to train your model.
3. Generate predictions by using Vertex AI SDK custom prediction routines.
1. Use Vertex AI Experiments to train your model.
2. Register your model in Vertex AI Model Registry.
3. Generate batch predictions in Vertex AI.
According to the official exam guide1, one of the skills assessed in the exam is to “track the lineage of pipeline artifacts”. Vertex AI Experiments2 is a service that allows you to track and compare the results of your model training runs. Vertex AI Experiments automatically logs metadata such as hyperparameters, metrics, and artifacts for each training run. You can use Vertex AI Experiments to train your custom model using TensorFlow, PyTorch, XGBoost, or scikit-learn. Vertex AI Model Registry3 is a service that allows you to manage your trained models in a central location. You can use Vertex AI Model Registry to register your model, add labels and descriptions, and view the model’s lineage graph. The lineage graph shows the artifacts and executions that are part of the model’s creation, such as the dataset, the training pipeline, and the evaluation metrics. The other options are not relevant or optimal for this scenario. References:
Professional ML Engineer Exam Guide
Vertex AI Experiments
Vertex AI Model Registry
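A minimal sketch of this workflow with the Vertex AI SDK; the project, run names, artifact URIs, and the serving container image are placeholders:

# Track the training run in Vertex AI Experiments, register the model in
# Vertex AI Model Registry, then generate batch predictions.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1",
                experiment="custom-model-experiment")

aiplatform.start_run("run-001")
aiplatform.log_params({"learning_rate": 0.001, "epochs": 20})
# ... train the model and write it to Cloud Storage ...
aiplatform.log_metrics({"val_accuracy": 0.93})
aiplatform.end_run()

model = aiplatform.Model.upload(
    display_name="custom-classifier",
    artifact_uri="gs://my-bucket/models/custom-classifier/",
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest",
)

batch_job = model.batch_predict(
    job_display_name="custom-classifier-batch",
    gcs_source="gs://my-bucket/batch_input.jsonl",
    gcs_destination_prefix="gs://my-bucket/batch_output/",
)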
You work for a bank. You have been asked to develop an ML model that will support loan application decisions. You need to determine which Vertex AI services to include in the workflow. You want to track the model's training parameters and the metrics per training epoch. You plan to compare the performance of each version of the model to determine the best model based on your chosen metrics. Which Vertex AI services should you use?
Vertex ML Metadata, Vertex AI Feature Store, and Vertex AI Vizier
Vertex AI Pipelines, Vertex AI Experiments, and Vertex AI Vizier
Vertex ML Metadata, Vertex AI Experiments, and Vertex AI TensorBoard
Vertex AI Pipelines, Vertex AI Feature Store, and Vertex AI TensorBoard
According to the official exam guide1, one of the skills assessed in the exam is to “track the lineage of pipeline artifacts”. Vertex ML Metadata2 is a service that allows you to store, query, and visualize metadata associated with your ML workflows, such as datasets, models, metrics, and executions. Vertex ML Metadata helps you track the provenance and lineage of your ML artifacts and understand the relationships between them. Vertex AI Experiments3 is a service that allows you to track and compare the results of your model training runs. Vertex AI Experiments automatically logs metadata such as hyperparameters, metrics, and artifacts for each training run. You can use Vertex AI Experiments to train your custom model using TensorFlow, PyTorch, XGBoost, or scikit-learn. Vertex AI TensorBoard4 is a service that allows you to visualize and monitor your ML experiments using TensorBoard, an open source tool for ML visualization. Vertex AI TensorBoard helps you track the model’s training parameters and the metrics per training epoch, and compare the performance of each version of the model. Therefore, option C is the best way to determine which Vertex AI services to include in the workflow for the given use case. The other options are not relevant or optimal for this scenario. References:
Professional ML Engineer Exam Guide
Vertex ML Metadata
Vertex AI Experiments
Vertex AI TensorBoard
You recently used XGBoost to train a model in Python that will be used for online serving. Your model prediction service will be called by a backend service implemented in Golang running on a Google Kubernetes Engine (GKE) cluster. Your model requires pre- and postprocessing steps. You need to implement the processing steps so that they run at serving time. You want to minimize code changes and infrastructure maintenance, and deploy your model into production as quickly as possible. What should you do?
Use FastAPI to implement an HTTP server. Create a Docker image that runs your HTTP server, and deploy it on your organization's GKE cluster.
Use FastAPI to implement an HTTP server. Create a Docker image that runs your HTTP server. Upload the image to Vertex AI Model Registry and deploy it to a Vertex AI endpoint.
Use the Predictor interface to implement a custom prediction routine. Build the custom container, upload the container to Vertex AI Model Registry, and deploy it to a Vertex AI endpoint.
Use the XGBoost prebuilt serving container when importing the trained model into Vertex AI. Deploy the model to a Vertex AI endpoint. Work with the backend engineers to implement the pre- and postprocessing steps in the Golang backend service.
The best option for implementing the processing steps so that they run at serving time, minimizing code changes and infrastructure maintenance, and deploying the model into production as quickly as possible, is to use the Predictor interface to implement a custom prediction routine. Build the custom container, upload the container to Vertex AI Model Registry, and deploy it to a Vertex AI endpoint. This option allows you to leverage the power and simplicity of Vertex AI to serve your XGBoost model with minimal effort and customization. Vertex AI is a unified platform for building and deploying machine learning solutions on Google Cloud. Vertex AI can deploy a trained XGBoost model to an online prediction endpoint, which can provide low-latency predictions for individual instances. A custom prediction routine (CPR) is a Python script that defines the logic for preprocessing the input data, running the prediction, and postprocessing the output data. A CPR can help you customize the prediction behavior of your model, and handle complex or non-standard data formats. A CPR can also help you minimize the code changes, as you only need to write a few functions to implement the prediction logic. A Predictor interface is a class that inherits from the base class aiplatform.Predictor, and implements the abstract methods predict() and preprocess(). A Predictor interface can help you create a CPR by defining the preprocessing and prediction logic for your model. A container image is a package that contains the model, the CPR, and the dependencies. A container image can help you standardize and simplify the deployment process, as you only need to upload the container image to Vertex AI Model Registry, and deploy it to Vertex AI Endpoints. By using the Predictor interface to implement a CPR, building the custom container, uploading the container to Vertex AI Model Registry, and deploying it to a Vertex AI endpoint, you can implement the processing steps so that they run at serving time, minimize code changes and infrastructure maintenance, and deploy the model into production as quickly as possible1.
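As a hedged sketch, a custom prediction routine for the XGBoost model might look like the following; the method names follow the Vertex AI SDK's Predictor base class, while the artifact filename and the specific pre- and postprocessing steps are illustrative assumptions:

# Custom prediction routine: load the booster, preprocess inputs, predict,
# and postprocess outputs at serving time.
import numpy as np
import xgboost as xgb
from google.cloud.aiplatform.prediction.predictor import Predictor
from google.cloud.aiplatform.utils import prediction_utils

class XgbCprPredictor(Predictor):
    def load(self, artifacts_uri: str) -> None:
        # Download model artifacts from Cloud Storage into the container.
        prediction_utils.download_model_artifacts(artifacts_uri)
        self._booster = xgb.Booster()
        self._booster.load_model("model.bst")

    def preprocess(self, prediction_input: dict) -> np.ndarray:
        instances = prediction_input["instances"]
        # Example preprocessing step (illustrative): scale the raw features.
        return np.asarray(instances, dtype=np.float32) / 100.0

    def predict(self, instances: np.ndarray) -> np.ndarray:
        return self._booster.predict(xgb.DMatrix(instances))

    def postprocess(self, prediction_results: np.ndarray) -> dict:
        # Example postprocessing step (illustrative): threshold scores into labels.
        return {"predictions": (prediction_results > 0.5).astype(int).tolist()}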
The other options are not as good as option C, for the following reasons:
Option A: Using FastAPI to implement an HTTP server, creating a Docker image that runs your HTTP server, and deploying it on your organization’s GKE cluster would require more skills and steps than using the Predictor interface to implement a CPR, building the custom container, uploading the container to Vertex AI Model Registry, and deploying it to a Vertex AI endpoint. FastAPI is a framework for building web applications and APIs in Python. FastAPI can help you implement an HTTP server that can handle prediction requests and responses, and perform data preprocessing and postprocessing. A Docker image is a package that contains the model, the HTTP server, and the dependencies. A Docker image can help you standardize and simplify the deployment process, as you only need to build and run the Docker image. GKE is a service that can create and manage Kubernetes clusters on Google Cloud. GKE can help you deploy and scale your Docker image on Google Cloud, and provide high availability and performance. However, using FastAPI to implement an HTTP server, creating a Docker image that runs your HTTP server, and deploying it on your organization’s GKE cluster would require more skills and steps than using the Predictor interface to implement a CPR, building the custom container, uploading the container to Vertex AI Model Registry, and deploying it to a Vertex AI endpoint. You would need to write code, create and configure the HTTP server, build and test the Docker image, create and manage the GKE cluster, and deploy and monitor the Docker image. Moreover, this option would not leverage the power and simplicity of Vertex AI, which can provide online prediction natively integrated with Google Cloud services2.
Option B: Using FastAPI to implement an HTTP server, creating a Docker image that runs your HTTP server, uploading the image to Vertex AI Model Registry, and deploying it to a Vertex AI endpoint would require more skills and steps than using the Predictor interface to implement a CPR, building the custom container, uploading the container to Vertex AI Model Registry, and deploying it to a Vertex AI endpoint. FastAPI is a framework for building web applications and APIs in Python. FastAPI can help you implement an HTTP server that can handle prediction requests and responses, and perform data preprocessing and postprocessing. A Docker image is a package that contains the model, the HTTP server, and the dependencies. A Docker image can help you standardize and simplify the deployment process, as you only need to build and run the Docker image. Vertex AI Model Registry is a service that can store and manage your machine learning models on Google Cloud. Vertex AI Model Registry can help you upload and organize your Docker image, and track the model versions and metadata. Vertex AI Endpoints is a service that can provide online prediction for your machine learning models on Google Cloud. Vertex AI Endpoints can help you deploy your Docker image to an online prediction endpoint, which can provide low-latency predictions for individual instances. However, using FastAPI to implement an HTTP server, creating a Docker image that runs your HTTP server, uploading the image to Vertex AI Model Registry, and deploying it to a Vertex AI endpoint would require more skills and steps than using the Predictor interface to implement a CPR, building the custom container, uploading the container to Vertex AI Model Registry, and deploying it to a Vertex AI endpoint. You would need to write code, create and configure the HTTP server, build and test the Docker image, upload the Docker image to Vertex AI Model Registry, and deploy the Docker image to Vertex AI Endpoints. Moreover, this option would not leverage the power and simplicity of Vertex AI, which can provide online prediction natively integrated with Google Cloud services2.
Option D: Using the XGBoost prebuilt serving container when importing the trained model into Vertex AI, deploying the model to a Vertex AI endpoint, working with the backend engineers to implement the pre- and postprocessing steps in the Golang backend service would not allow you to implement the processing steps so that they run at serving time, and could increase the code changes and infrastructure maintenance. A XGBoost prebuilt serving container is a container image that is provided by Google Cloud, and contains the XGBoost framework and the dependencies. A XGBoost prebuilt serving container can help you deploy a XGBoost model without writing any code, but it also limits your customization options. A XGBoost prebuilt serving container can only handle standard data formats, such as JSON or CSV, and cannot perform any preprocessing or postprocessing on the input or output data. If your input data requires any transformation or normalization before running the prediction, you cannot use a XGBoost prebuilt serving container. A Golang backend service is a service that is implemented in Golang, a programming language that can be used for web development and system programming. A Golang backend service can help you handle the prediction requests and responses from the frontend, and communicate with the Vertex AI endpoint. However, using the XGBoost prebuilt serving container when importing the trained model into Vertex AI, deploying the model to a Vertex AI endpoint, working with the backend engineers to implement the pre- and postprocessing steps in the Golang backend service would not allow you to implement the processing steps so that they run at serving time, and could increase the code changes and infrastructure maintenance. You would need to write code, import the trained model into Vertex AI, deploy the model to a Vertex AI endpoint, implement the pre- and postprocessing steps in the Golang backend service, and test and monitor the Golang backend service. Moreover, this option would not leverage the power and simplicity of Vertex AI, which can provide online prediction natively integrated with Google Cloud services2.
References:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML Systems, Week 2: Serving ML Predictions
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in production, 3.1 Deploying ML models to production
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 6: Production ML Systems, Section 6.2: Serving ML Predictions
Custom prediction routines
Using pre-built containers for prediction
Using custom containers for prediction
You need to develop an image classification model by using a large dataset that contains labeled images in a Cloud Storage Bucket. What should you do?
Use Vertex AI Pipelines with the Kubeflow Pipelines SDK to create a pipeline that reads the images from Cloud Storage and trains the model.
Use Vertex AI Pipelines with TensorFlow Extended (TFX) to create a pipeline that reads the images from Cloud Storage and trains the model.
Import the labeled images as a managed dataset in Vertex AI, and use AutoML to train the model.
Convert the image dataset to a tabular format using Dataflow. Load the data into BigQuery, and use BigQuery ML to train the model.
The best option for developing an image classification model by using a large dataset that contains labeled images in a Cloud Storage bucket is to import the labeled images as a managed dataset in Vertex AI and use AutoML to train the model. This option allows you to leverage the power and simplicity of Google Cloud to create and deploy a high-quality image classification model with minimal code and configuration. Vertex AI is a unified platform for building and deploying machine learning solutions on Google Cloud. Vertex AI can create a managed dataset from a Cloud Storage bucket that contains labeled images, which can be used to train an AutoML model. AutoML is a service that can automatically build and optimize machine learning models for various tasks, such as image classification, object detection, natural language processing, and tabular data analysis. AutoML can handle the complex aspects of machine learning, such as feature engineering, model architecture, hyperparameter tuning, and model evaluation. AutoML can also evaluate, deploy, and monitor the image classification model, and provide online or batch predictions. By using Vertex AI and AutoML, users can develop an image classification model by using a large dataset with ease and efficiency.
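A minimal sketch of this approach with the Vertex AI SDK; the bucket, import file, display names, and training budget are assumptions:

# Create a managed image dataset from labeled images in Cloud Storage and
# train an AutoML image classification model on it.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

dataset = aiplatform.ImageDataset.create(
    display_name="product-images",
    gcs_source="gs://my-bucket/labels/import_file.csv",  # image URIs + labels
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification,
)

job = aiplatform.AutoMLImageTrainingJob(
    display_name="product-image-classifier",
    prediction_type="classification",
)
model = job.run(dataset=dataset, budget_milli_node_hours=8000)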
The other options are not as good as option C, for the following reasons:
Option A: Using Vertex AI Pipelines with the Kubeflow Pipelines SDK to create a pipeline that reads the images from Cloud Storage and trains the model would require more skills and steps than using Vertex AI and AutoML. Vertex AI Pipelines is a service that can orchestrate machine learning workflows using Vertex AI. Vertex AI Pipelines can run preprocessing and training steps on custom Docker images, and evaluate, deploy, and monitor the machine learning model. Kubeflow Pipelines SDK is a Python library that can create and run pipelines on Vertex AI Pipelines or on Kubeflow, an open-source platform for machine learning on Kubernetes. However, using Vertex AI Pipelines and Kubeflow Pipelines SDK would require writing code, building Docker images, defining pipeline components and steps, and managing the pipeline execution and artifacts. Moreover, Vertex AI Pipelines and Kubeflow Pipelines SDK are not specialized for image classification, and users would need to use other libraries or frameworks, such as TensorFlow or PyTorch, to build and train the image classification model.
Option B: Using Vertex AI Pipelines with TensorFlow Extended (TFX) to create a pipeline that reads the images from Cloud Storage and trains the model would require more skills and steps than using Vertex AI and AutoML. TensorFlow Extended (TFX) is a framework that can create and run end-to-end machine learning pipelines on TensorFlow, a popular library for building and training deep learning models. TFX can preprocess the data, train and evaluate the model, validate and push the model, and serve the model for online or batch predictions. However, using Vertex AI Pipelines and TFX would require writing code, building Docker images, defining pipeline components and steps, and managing the pipeline execution and artifacts. Moreover, TFX is not optimized for image classification, and users would need to use other libraries or tools, such as TensorFlow Data Validation, TensorFlow Transform, and TensorFlow Hub, to handle the image data and the model architecture.
Option D: Converting the image dataset to a tabular format using Dataflow, loading the data into BigQuery, and using BigQuery ML to train the model would not handle the image data properly and could result in a poor model performance. Dataflow is a service that can create scalable and reliable pipelines to process large volumes of data from various sources. Dataflow can preprocess the data by using Apache Beam, a programming model for defining and executing data processing workflows. BigQuery is a serverless, scalable, and cost-effective data warehouse that can perform fast and interactive queries on large datasets. BigQuery ML is a service that can create and train machine learning models by using SQL queries on BigQuery. However, converting the image data to a tabular format would lose the spatial and semantic information of the images, which are essential for image classification. Moreover, BigQuery ML is not specialized for image classification, and users would need to use other tools or techniques, such as feature hashing, embedding, or one-hot encoding, to handle the categorical features.
You need to build an ML model for a social media application to predict whether a user’s submitted profile photo meets the requirements. The application will inform the user if the picture meets the requirements. How should you build a model to ensure that the application does not falsely accept a non-compliant picture?
Use AutoML to optimize the model’s recall in order to minimize false negatives.
Use AutoML to optimize the model’s F1 score in order to balance the accuracy of false positives and false negatives.
Use Vertex AI Workbench user-managed notebooks to build a custom model that has three times as many examples of pictures that meet the profile photo requirements.
Use Vertex AI Workbench user-managed notebooks to build a custom model that has three times as many examples of pictures that do not meet the profile photo requirements.
Recall is the ratio of true positives to the sum of true positives and false negatives. It measures how well the model can identify all the relevant cases. In this scenario, the relevant cases are the pictures that do not meet the profile photo requirements. Therefore, minimizing false negatives means minimizing the cases where the model incorrectly predicts that a non-compliant picture meets the requirements. By using AutoML to optimize the model’s recall, the model will be more likely to reject a non-compliant picture and inform the user accordingly. References:
[AutoML Vision] is a service that allows you to train custom ML models for image classification and object detection tasks. You can use AutoML to optimize your model for different metrics, such as recall, precision, or F1 score.
[Recall] is one of the evaluation metrics for ML models. It is defined as TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives. Recall measures how well the model can identify all the relevant cases. A high recall means that the model has a low rate of false negatives.
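A tiny worked example of the recall definition, treating "photo does not meet the requirements" as the positive class:

from sklearn.metrics import recall_score

y_true = [1, 1, 1, 0, 0, 0, 1]  # 1 = non-compliant photo
y_pred = [1, 1, 0, 0, 0, 0, 1]  # one non-compliant photo was missed (a false negative)

print(recall_score(y_true, y_pred))  # 3 TP / (3 TP + 1 FN) = 0.75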
You are an ML engineer at a regulated insurance company. You are asked to develop an insurance approval model that accepts or rejects insurance applications from potential customers. What factors should you consider before building the model?
Redaction, reproducibility, and explainability
Traceability, reproducibility, and explainability
Federated learning, reproducibility, and explainability
Differential privacy, federated learning, and explainability
Before building an insurance approval model, an ML engineer should consider the factors of traceability, reproducibility, and explainability, as these are important aspects of responsible AI and fairness in a regulated domain. Traceability is the ability to track the provenance and lineage of the data, models, and decisions throughout the ML lifecycle. It helps to ensure the quality, reliability, and accountability of the ML system, and to comply with the regulatory and ethical standards. Reproducibility is the ability to recreate the same results and outcomes using the same data, models, and parameters. It helps to verify the validity, consistency, and robustness of the ML system, and to debug and improve the performance. Explainability is the ability to understand and interpret the logic, behavior, and outcomes of the ML system. It helps to increase the transparency, trust, and confidence of the ML system, and to identify and mitigate any potential biases, errors, or risks. The other options are not as relevant or comprehensive as this option. Redaction is the process of removing sensitive or confidential information from the data or documents, but it is not a factor that the ML engineer should consider before building the model, as it is more related to the data preparation and protection. Federated learning is a technique that allows training ML models on decentralized data without transferring the data to a central server, but it is not a factor that the ML engineer should consider before building the model, as it is more related to the model architecture and privacy preservation. Differential privacy is a method that adds noise to the data or the model outputs to protect the individual privacy of the data subjects, but it is not a factor that the ML engineer should consider before building the model, as it is more related to the model evaluation and deployment. References:
Responsible AI documentation
Traceability documentation
Reproducibility documentation
Explainability documentation
You work for a pet food company that manages an online forum. Customers upload photos of their pets on the forum to share with others. About 20 photos are uploaded daily. You want to automatically, and in near real time, detect whether each uploaded photo has an animal. You want to prioritize time and minimize the cost of your application development and deployment. What should you do?
Send user-submitted images to the Cloud Vision API. Use object localization to identify all objects in the image, and compare the results against a list of animals.
Download an object detection model from TensorFlow Hub. Deploy the model to a Vertex AI endpoint. Send new user-submitted images to the model endpoint to classify whether each photo has an animal.
Manually label previously submitted images with bounding boxes around any animals. Build an AutoML object detection model by using Vertex AI. Deploy the model to a Vertex AI endpoint. Send new user-submitted images to your model endpoint to detect whether each photo has an animal.
Manually label previously submitted images as having animals or not. Create an image dataset on Vertex AI. Train a classification model by using Vertex AutoML to distinguish the two classes. Deploy the model to a Vertex AI endpoint. Send new user-submitted images to your model endpoint to classify whether each photo has an animal.
Cloud Vision API is a service that allows you to analyze images using pre-trained machine learning models1. You can use Cloud Vision API to perform various tasks, such as face detection, text extraction, logo recognition, and object localization1. Object localization is a feature that allows you to detect multiple objects in an image and draw bounding boxes around them2. You can also get the labels and confidence scores for each detected object2.
By sending user-submitted images to the Cloud Vision API, you can use object localization to identify all objects in the image and compare the results against a list of animals. You can use the OBJECT_LOCALIZATION feature type in the AnnotateImageRequest to request object localization3. You can then use the localizedObjectAnnotations field in the AnnotateImageResponse to get the list of detected objects, their labels, and their confidence scores. You can compare the labels with a predefined list of animals, such as dogs, cats, birds, etc., and determine whether the image has an animal or not.
This option is the best for your scenario, because it allows you to automatically and in near real time detect whether each uploaded photo has an animal, without requiring any manual labeling, model training, or model deployment. You can also prioritize time and minimize cost of your application development and deployment, as you can use the Cloud Vision API as a ready-to-use service, without needing any machine learning expertise or infrastructure.
The other options are not suitable for your scenario, because they either require manual labeling, model training, or model deployment, which would increase the time and cost of your application development and deployment, or they use object detection models, which are more complex and computationally expensive than object localization models, and are not necessary for your simple task of detecting whether an image has an animal or not.
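A minimal sketch of this approach with the Cloud Vision client library; the list of animal labels and the score threshold are assumptions:

# Run object localization on an uploaded photo and check the detected object
# names against a list of known animals.
from google.cloud import vision

ANIMAL_LABELS = {"cat", "dog", "bird", "rabbit", "hamster", "fish"}  # illustrative list

def photo_has_animal(image_bytes: bytes, min_score: float = 0.5) -> bool:
    client = vision.ImageAnnotatorClient()
    image = vision.Image(content=image_bytes)
    response = client.object_localization(image=image)
    for obj in response.localized_object_annotations:
        if obj.score >= min_score and obj.name.lower() in ANIMAL_LABELS:
            return True
    return False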
References:
Cloud Vision API | Google Cloud
Object localization | Cloud Vision API | Google Cloud
AnnotateImageRequest | Cloud Vision API | Google Cloud
[AnnotateImageResponse | Cloud Vision API | Google Cloud]
You have recently developed a custom model for image classification by using a neural network. You need to automatically identify the values for learning rate, number of layers, and kernel size. To do this, you plan to run multiple jobs in parallel to identify the parameters that optimize performance. You want to minimize custom code development and infrastructure management. What should you do?
Create a Vertex AI pipeline that runs different model training jobs in parallel.
Train an AutoML image classification model.
Create a custom training job that uses the Vertex AI Vizier SDK for parameter optimization.
Create a Vertex AI hyperparameter tuning job.
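For reference, a hyperparameter tuning job along the lines of option D could be defined with the Vertex AI SDK roughly as follows; the training container URI, machine type, metric name, and search ranges are assumptions, and the training container is expected to report the metric (for example via the cloudml-hypertune library):

# Run parallel trials over learning rate, number of layers, and kernel size
# without writing custom search code.
from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

aiplatform.init(project="my-project", location="us-central1")

custom_job = aiplatform.CustomJob(
    display_name="image-classifier-trial",
    worker_pool_specs=[{
        "machine_spec": {"machine_type": "n1-standard-8"},
        "replica_count": 1,
        "container_spec": {"image_uri": "us-central1-docker.pkg.dev/my-project/train/img:latest"},
    }],
)

hp_job = aiplatform.HyperparameterTuningJob(
    display_name="image-classifier-tuning",
    custom_job=custom_job,
    metric_spec={"accuracy": "maximize"},
    parameter_spec={
        "learning_rate": hpt.DoubleParameterSpec(min=1e-4, max=1e-1, scale="log"),
        "num_layers": hpt.IntegerParameterSpec(min=2, max=10, scale="linear"),
        "kernel_size": hpt.DiscreteParameterSpec(values=[3, 5, 7], scale="linear"),
    },
    max_trial_count=20,
    parallel_trial_count=5,
)
hp_job.run()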
Your team is building an application for a global bank that will be used by millions of customers. You built a forecasting model that predicts customers' account balances 3 days in the future. Your team will use the results in a new feature that will notify users when their account balance is likely to drop below $25. How should you serve your predictions?
1. Create a Pub/Sub topic for each user.
2. Deploy a Cloud Function that sends a notification when your model predicts that a user's account balance will drop below the $25 threshold.
1. Create a Pub/Sub topic for each user.
2. Deploy an application on the App Engine standard environment that sends a notification when your model predicts that a user's account balance will drop below the $25 threshold.
1. Build a notification system on Firebase.
2. Register each user with a user ID on the Firebase Cloud Messaging server, which sends a notification when the average of all account balance predictions drops below the $25 threshold.
1. Build a notification system on Firebase.
2. Register each user with a user ID on the Firebase Cloud Messaging server, which sends a notification when your model predicts that a user's account balance will drop below the $25 threshold.
This answer is correct because it uses Firebase, a platform that provides a scalable and reliable notification system for mobile and web applications. Firebase Cloud Messaging (FCM) allows you to send messages and notifications to users across different devices and platforms. By registering each user with a user ID on the FCM server, you can target specific users based on their account balance predictions and send them personalized notifications when their balance is likely to drop below the $25 threshold. This way, you can provide a useful and timely feature for your customers and increase their engagement and retention. References:
[Firebase Cloud Messaging]
[Firebase Cloud Messaging: Send messages to specific devices]
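A minimal sketch of sending such a notification with the Firebase Admin SDK; the device token and message wording are placeholders:

# Send an FCM notification to a user's registered device when the 3-day
# balance forecast falls below $25.
import firebase_admin
from firebase_admin import messaging

firebase_admin.initialize_app()  # uses default credentials

def notify_low_balance(device_token: str, predicted_balance: float) -> None:
    if predicted_balance >= 25:
        return
    message = messaging.Message(
        notification=messaging.Notification(
            title="Low balance warning",
            body=f"Your balance is predicted to drop to ${predicted_balance:.2f} within 3 days.",
        ),
        token=device_token,  # the user's FCM registration token
    )
    messaging.send(message)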
You received a training-serving skew alert from a Vertex AI Model Monitoring job running in production. You retrained the model with more recent training data and deployed it back to the Vertex AI endpoint, but you are still receiving the same alert. What should you do?
Update the model monitoring job to use a lower sampling rate.
Update the model monitoring job to use the more recent training data that was used to retrain the model.
Temporarily disable the alert. Enable the alert again after a sufficient amount of new production traffic has passed through the Vertex AI endpoint.
Temporarily disable the alert until the model can be retrained again on newer training data. Retrain the model again after a sufficient amount of new production traffic has passed through the Vertex AI endpoint.
The best option for resolving the training-serving skew alert is to update the model monitoring job to use the more recent training data that was used to retrain the model. This option can help align the baseline distribution of the model monitoring job with the current distribution of the production data, and eliminate the false positive alerts. Model Monitoring is a service that can track and compare the results of multiple machine learning runs. Model Monitoring can monitor the model’s prediction input data for feature skew and drift. Training-serving skew occurs when the feature data distribution in production deviates from the feature data distribution used to train the model. If the original training data is available, you can enable skew detection to monitor your models for training-serving skew. Model Monitoring uses TensorFlow Data Validation (TFDV) to calculate the distributions and distance scores for each feature, and compares them with a baseline distribution. The baseline distribution is the statistical distribution of the feature’s values in the training data. If the distance score for a feature exceeds an alerting threshold that you set, Model Monitoring sends you an email alert. However, if you retrain the model with more recent training data, and deploy it back to the Vertex AI endpoint, the baseline distribution of the model monitoring job may become outdated and inconsistent with the current distribution of the production data. This can cause the model monitoring job to generate false positive alerts, even if the model performance is not deteriorated. To avoid this problem, you need to update the model monitoring job to use the more recent training data that was used to retrain the model. This can help the model monitoring job to recalculate the baseline distribution and the distance scores, and compare them with the current distribution of the production data. This can also help the model monitoring job to detect any true positive alerts, such as a sudden change in the production data that causes the model performance to degrade1.
The other options are not as good as option B, for the following reasons:
Option A: Updating the model monitoring job to use a lower sampling rate would not resolve the training-serving skew alert, and could reduce the accuracy and reliability of the model monitoring job. The sampling rate is a parameter that determines the percentage of prediction requests that are logged and analyzed by the model monitoring job. Using a lower sampling rate can reduce the storage and computation costs of the model monitoring job, but also the quality and validity of the data. Using a lower sampling rate can introduce sampling bias and noise into the data, and make the model monitoring job miss some important features or patterns of the data. Moreover, using a lower sampling rate would not address the root cause of the training-serving skew alert, which is the mismatch between the baseline distribution and the current distribution of the production data2.
Option C: Temporarily disabling the alert, and enabling the alert again after a sufficient amount of new production traffic has passed through the Vertex AI endpoint, would not resolve the training-serving skew alert, and could expose the model to potential risks and errors. Disabling the alert would stop the model monitoring job from sending email notifications when the distance score for a feature exceeds the alerting threshold, but it would not stop the model monitoring job from calculating and comparing the distributions and distance scores. Therefore, disabling the alert would not address the root cause of the training-serving skew alert, which is the mismatch between the baseline distribution and the current distribution of the production data. Moreover, disabling the alert would prevent the model monitoring job from detecting any true positive alerts, such as a sudden change in the production data that causes the model performance to degrade. This can expose the model to potential risks and errors, and affect the user satisfaction and trust1.
Option D: Temporarily disabling the alert until the model can be retrained again on newer training data has the same drawbacks as option C: it silences the email notifications without fixing the mismatch between the baseline distribution and the current production distribution, and while the alert is disabled the monitoring job cannot surface true positives, such as a sudden change in the production data that degrades model performance. In addition, retraining the model on newer data creates a new model version but does not update the monitoring job's baseline, so the false positive alerts would return after redeployment. This option therefore adds unnecessary cost and effort without resolving the alert1.
References:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML Systems, Week 4: Evaluation
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in production, 3.3 Monitoring ML models in production
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 6: Production ML Systems, Section 6.3: Monitoring ML Models
Using Model Monitoring
Understanding the score threshold slider
Sampling rate
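As a rough illustration of option B, the sketch below recreates (or you could similarly update) the monitoring job so that its skew-detection baseline points at the newer training table. It assumes the google-cloud-aiplatform Python SDK's model_monitoring helpers; the project, BigQuery table, endpoint ID, feature names, and thresholds are placeholders, not values from the question.
    from google.cloud import aiplatform
    from google.cloud.aiplatform import model_monitoring

    aiplatform.init(project="my-project", location="us-central1")  # placeholder project/region

    # Point the skew-detection baseline at the data used for the latest retraining run.
    skew_config = model_monitoring.SkewDetectionConfig(
        data_source="bq://my-project.monitoring.training_data_v2",  # newer training table (placeholder)
        target_field="label",
        skew_thresholds={"feature_a": 0.3},  # example feature and threshold
    )

    job = aiplatform.ModelDeploymentMonitoringJob.create(
        display_name="model-monitoring-v2",
        endpoint="projects/my-project/locations/us-central1/endpoints/1234567890",  # placeholder
        objective_configs=model_monitoring.ObjectiveConfig(skew_detection_config=skew_config),
        logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=0.8),
        schedule_config=model_monitoring.ScheduleConfig(monitor_interval=1),  # hours
        alert_config=model_monitoring.EmailAlertConfig(user_emails=["ml-team@example.com"]),
    )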
Your team is training a large number of ML models that use different algorithms, parameters, and datasets. Some models are trained in Vertex AI Pipelines, and some are trained on Vertex AI Workbench notebook instances. Your team wants to compare the performance of the models across both services. You want to minimize the effort required to store the parameters and metrics. What should you do?
Implement an additional step for all the models running in pipelines and notebooks to export parameters and metrics to BigQuery.
Create a Vertex AI experiment. Submit all the pipelines as experiment runs. For models trained on notebooks, log parameters and metrics by using the Vertex AI SDK.
Implement all models in Vertex AI Pipelines. Create a Vertex AI experiment, and associate all pipeline runs with that experiment.
Store all model parameters and metrics as model metadata by using the Vertex AI Metadata API.
Vertex AI Experiments is a service that allows you to track, compare, and manage experiments with Vertex AI. You can use Vertex AI Experiments to record the parameters, metrics, and artifacts of each model training run, and compare them in a graphical interface. Vertex AI Experiments supports models trained in Vertex AI Pipelines, Vertex AI Custom Training, and Vertex AI Workbench notebooks. To use Vertex AI Experiments, you need to create an experiment and submit your pipeline runs or custom training jobs as experiment runs. For models trained on notebooks, you need to use the Vertex AI SDK to log the parameters and metrics to the experiment. This way, you can minimize the effort required to store and compare the model performance across different services. References: Track, compare, manage experiments with Vertex AI Experiments, Vertex AI Pipelines: Metrics visualization and run comparison using the KFP SDK, [Vertex AI SDK for Python]
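For the notebook-trained models, logging to the same experiment is a few SDK calls. A minimal sketch with the google-cloud-aiplatform SDK (the experiment name, run name, and the parameter and metric values are made-up examples):
    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1",
                    experiment="model-comparison")       # placeholder experiment name

    aiplatform.start_run("notebook-xgboost-01")          # one run per training attempt
    aiplatform.log_params({"learning_rate": 0.1, "max_depth": 6})
    # ... train the model in the notebook as usual ...
    aiplatform.log_metrics({"auc": 0.91, "accuracy": 0.87})   # example values
    aiplatform.end_run()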
You have recently developed a new ML model in a Jupyter notebook. You want to establish a reliable and repeatable model training process that tracks the versions and lineage of your model artifacts. You plan to retrain your model weekly. How should you operationalize your training process?
1. Create an instance of the CustomTrainingJob class with the Vertex AI SDK to train your model.
2. Using the Notebooks API, create a scheduled execution to run the training code weekly.
1. Create an instance of the CustomJob class with the Vertex AI SDK to train your model.
2. Use the Metadata API to register your model as a model artifact.
3. Using the Notebooks API, create a scheduled execution to run the training code weekly.
1. Create a managed pipeline in Vertex AI Pipelines to train your model by using a Vertex AI CustomTrainingJobOp component.
2. Use the ModelUploadOp component to upload your model to Vertex AI Model Registry.
3. Use Cloud Scheduler and Cloud Functions to run the Vertex AI pipeline weekly.
1. Create a managed pipeline in Vertex AI Pipelines to train your model using a Vertex AI HyperparameterTuningJobRunOp component.
2. Use the ModelUploadOp component to upload your model to Vertex AI Model Registry.
3. Use Cloud Scheduler and Cloud Functions to run the Vertex AI pipeline weekly.
The best way to operationalize your training process is to use Vertex AI Pipelines, which allows you to create and run scalable, portable, and reproducible workflows for your ML models. Vertex AI Pipelines also integrates with Vertex AI Metadata, which tracks the provenance, lineage, and artifacts of your ML models. By using a Vertex AI CustomTrainingJobOp component, you can train your model using the same code as in your Jupyter notebook. By using a ModelUploadOp component, you can upload your trained model to Vertex AI Model Registry, which manages the versions and endpoints of your models. By using Cloud Scheduler and Cloud Functions, you can trigger your Vertex AI pipeline to run weekly, according to your plan. References:
Vertex AI Pipelines documentation
Vertex AI Metadata documentation
Vertex AI CustomTrainingJobOp documentation
ModelUploadOp documentation
Cloud Scheduler documentation
[Cloud Functions documentation]
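To make the weekly trigger concrete, a Cloud Scheduler job can call an HTTP Cloud Function like the hedged sketch below, which submits the compiled pipeline with the Vertex AI SDK. The bucket paths, project, and display names are placeholders, and the pipeline spec is assumed to have been compiled to JSON beforehand.
    # main.py for an HTTP-triggered Cloud Function invoked weekly by Cloud Scheduler.
    from google.cloud import aiplatform

    def trigger_pipeline(request):
        """Submits the compiled training pipeline (paths and names are placeholders)."""
        aiplatform.init(project="my-project", location="us-central1")
        job = aiplatform.PipelineJob(
            display_name="weekly-model-training",
            template_path="gs://my-bucket/pipelines/training_pipeline.json",  # compiled KFP spec
            pipeline_root="gs://my-bucket/pipeline-root",
        )
        job.submit()  # asynchronous; the pipeline trains and uploads the model to Model Registry
        return "Pipeline submitted", 200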
You work at an ecommerce startup. You need to create a customer churn prediction model. Your company's recent sales records are stored in a BigQuery table. You want to understand how your initial model is making predictions. You also want to iterate on the model as quickly as possible while minimizing cost. How should you build your first model?
Export the data to a Cloud Storage bucket. Load the data into a pandas DataFrame on Vertex AI Workbench and train a logistic regression model with scikit-learn.
Create a tf.data.Dataset by using the TensorFlow BigQueryClient. Implement a deep neural network in TensorFlow.
Prepare the data in BigQuery and associate the data with a Vertex AI dataset. Create an AutoMLTabularTrainingJob to train a classification model.
Export the data to a Cloud Storage bucket. Create a tf.data.Dataset to read the data from Cloud Storage. Implement a deep neural network in TensorFlow.
BigQuery is a service that allows you to store and query large amounts of data in a scalable and cost-effective way. You can use BigQuery to prepare the data for your customer churn prediction model, such as filtering, aggregating, and transforming the data. You can then associate the data with a Vertex AI dataset, which is a service that allows you to store and manage your ML data on Google Cloud. By using a Vertex AI dataset, you can easily access the data from other Vertex AI services, such as AutoML. AutoML is a service that allows you to create and train ML models without writing code. You can use AutoML to create an AutoMLTabularTrainingJob, which is a type of job that trains a classification model for tabular data, such as customer churn. By using an AutoMLTabularTrainingJob, you can benefit from the automated feature engineering, model selection, and hyperparameter tuning that AutoML provides. You can also use Vertex Explainable AI to understand how your model is making predictions, such as which features are most important and how they affect the prediction outcome. By using BigQuery, Vertex AI dataset, and AutoMLTabularTrainingJob, you can build your first model as quickly as possible while minimizing cost and complexity. References:
BigQuery documentation
Vertex AI dataset documentation
AutoMLTabularTrainingJob documentation
Vertex Explainable AI documentation
Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
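A minimal sketch of this flow with the Vertex AI SDK; the BigQuery table, column names, and training budget are illustrative placeholders:
    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")  # placeholders

    # Associate the prepared BigQuery table with a Vertex AI tabular dataset.
    dataset = aiplatform.TabularDataset.create(
        display_name="churn-training-data",
        bq_source="bq://my-project.sales.churn_features",   # hypothetical prepared table
    )

    job = aiplatform.AutoMLTabularTrainingJob(
        display_name="churn-automl",
        optimization_prediction_type="classification",
    )

    model = job.run(
        dataset=dataset,
        target_column="churned",            # hypothetical label column
        budget_milli_node_hours=1000,       # 1 node hour to keep cost low
    )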
You are building a custom image classification model and plan to use Vertex AI Pipelines to implement the end-to-end training. Your dataset consists of images that need to be preprocessed before they can be used to train the model. The preprocessing steps include resizing the images, converting them to grayscale, and extracting features. You have already implemented some Python functions for the preprocessing tasks. Which components should you use in your pipeline?
Your team has been tasked with creating an ML solution in Google Cloud to classify support requests for one of your platforms. You analyzed the requirements and decided to use TensorFlow to build the classifier so that you have full control of the model's code, serving, and deployment. You will use Kubeflow pipelines for the ML platform. To save time, you want to build on existing resources and use managed services instead of building a completely new model. How should you build the classifier?
Use the Natural Language API to classify support requests
Use AutoML Natural Language to build the support requests classifier
Use an established text classification model on AI Platform to perform transfer learning.
Use an established text classification model on AI Platform as-is to classify support requests.
Transfer learning is a technique that leverages the knowledge and weights of a pre-trained model and adapts them to a new task or domain1. Transfer learning can save time and resources by avoiding training a model from scratch, and can also improve the performance and generalization of the model by using a larger and more diverse dataset2. AI Platform provides several established text classification models that can be used for transfer learning, such as BERT, ALBERT, or XLNet3. These models are based on state-of-the-art natural language processing techniques and can handle various text classification tasks, such as sentiment analysis, topic classification, or spam detection4. By using one of these models on AI Platform, you can customize the model’s code, serving, and deployment, and use Kubeflow pipelines for the ML platform. Therefore, using an established text classification model on AI Platform to perform transfer learning is the best option for this use case.
References:
Transfer Learning - Machine Learning’s Next Frontier
A Comprehensive Hands-on Guide to Transfer Learning with Real-World Applications in Deep Learning
Text classification models
Text Classification with Pre-trained Models in TensorFlow
You work for a gaming company that has millions of customers around the world. All games offer a chat feature that allows players to communicate with each other in real time. Messages can be typed in more than 20 languages and are translated in real time using the Cloud Translation API. You have been asked to build an ML system to moderate the chat in real time while assuring that the performance is uniform across the various languages and without changing the serving infrastructure.
You trained your first model using an in-house word2vec model for embedding the chat messages translated by the Cloud Translation API. However, the model has significant differences in performance across the different languages. How should you improve it?
Add a regularization term such as the Min-Diff algorithm to the loss function.
Train a classifier using the chat messages in their original language.
Replace the in-house word2vec with GPT-3 or T5.
Remove moderation for languages for which the false positive rate is too high.
The problem with the current approach is that it relies on the Cloud Translation API to translate the chat messages into a common language before embedding them with the in-house word2vec model. This introduces two sources of error: the translation quality and the word2vec quality. The translation quality may vary across different languages, depending on the availability of data and the complexity of the grammar and vocabulary. The word2vec quality may also vary depending on the size and diversity of the corpus used to train it. These errors may affect the performance of the classifier that moderates the chat messages, resulting in significant differences across the languages.
A better approach would be to train a classifier using the chat messages in their original language, without relying on the Cloud Translation API or the in-house word2vec model. This way, the classifier can learn the nuances and subtleties of each language, and avoid the errors introduced by the translation and embedding processes. This would also reduce the latency and cost of the moderation system, as it would not need to invoke the Cloud Translation API for every message. To train a classifier using the chat messages in their original language, one could use a multilingual pre-trained model such as mBERT or XLM-R, which can handle multiple languages and domains. Alternatively, one could train a separate classifier for each language, using a monolingual pre-trained model such as BERT or a custom model tailored to the specific language and task.
References:
Professional ML Engineer Exam Guide
Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
Google Cloud launches machine learning engineer certification
[mBERT: Bidirectional Encoder Representations from Transformers]
[XLM-R: Unsupervised Cross-lingual Representation Learning at Scale]
[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding]
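As a rough sketch of the multilingual route (not part of the exam material), the snippet below fine-tunes the public mBERT checkpoint with the Hugging Face transformers library on raw, untranslated messages; the example texts, labels, and hyperparameters are placeholders:
    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = TFAutoModelForSequenceClassification.from_pretrained(
        "bert-base-multilingual-cased", num_labels=2)   # 2 classes: allowed / needs moderation

    texts = ["bonjour à tous", "you are terrible"]      # raw chat messages in any language
    labels = [0, 1]                                     # toy labels for illustration

    encodings = tokenizer(texts, padding=True, truncation=True, max_length=64)
    train_ds = tf.data.Dataset.from_tensor_slices((dict(encodings), labels)).batch(2)

    model.compile(
        optimizer=tf.keras.optimizers.Adam(2e-5),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    model.fit(train_ds, epochs=1)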
You are pre-training a large language model on Google Cloud. This model includes custom TensorFlow operations in the training loop. Model training will use a large batch size, and you expect training to take several weeks. You need to configure a training architecture that minimizes both training time and compute costs. What should you do?
According to the official exam guide1, one of the skills assessed in the exam is to “design, build, and productionalize ML models to solve business challenges using Google Cloud technologies”. TPUs2 are Google’s custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads. TPUs are designed to handle large batch sizes, high dimensional data, and complex computations. TPUs can significantly reduce the training time and compute costs of large language models, especially when used with distributed training strategies, such as MultiWorkerMirroredStrategy3. Therefore, option D is the best way to configure a training architecture that minimizes both training time and compute costs for the given use case. The other options are not relevant or optimal for this scenario. References:
Professional ML Engineer Exam Guide
TPUs
MultiWorkerMirroredStrategy
Google Professional Machine Learning Certification Exam 2023
Latest Google Professional Machine Learning Engineer Actual Free Exam Questions
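For reference, the distribution-strategy pattern mentioned above looks like the hedged sketch below; the model is a stand-in, and the cluster layout (workers, accelerators) is supplied by Vertex AI Training through the TF_CONFIG environment variable rather than by this code:
    import tensorflow as tf

    # Synchronous data-parallel training across all workers provisioned by Vertex AI Training.
    strategy = tf.distribute.MultiWorkerMirroredStrategy()

    GLOBAL_BATCH_SIZE = 1024  # example large batch, sharded across the workers

    with strategy.scope():
        # Stand-in model; a real language model (with its custom ops) would be built here.
        model = tf.keras.Sequential([
            tf.keras.layers.Embedding(32000, 512),
            tf.keras.layers.LSTM(512),
            tf.keras.layers.Dense(32000),
        ])
        model.compile(
            optimizer="adam",
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    # model.fit(train_dataset.batch(GLOBAL_BATCH_SIZE), ...) then runs on every worker.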
You are working with a dataset that contains customer transactions. You need to build an ML model to predict customer purchase behavior. You plan to develop the model in BigQuery ML, and export it to Cloud Storage for online prediction. You notice that the input data contains a few categorical features, including product category and payment method. You want to deploy the model as quickly as possible. What should you do?
Use the transform clause with the ML.ONE_HOT_ENCODER function on the categorical features at model creation, and select the categorical and non-categorical features.
Use the ML.ONE_HOT_ENCODER function on the categorical features, and select the encoded categorical features and non-categorical features as inputs to create your model.
Use the create model statement and select the categorical and non-categorical features.
Use the ML.ONE_HOT_ENCODER function on the categorical features, and select the encoded categorical features and non-categorical features as inputs to create your model.
The best option for building an ML model to predict customer purchase behavior in BigQuery ML is to use the transform clause with the ML.ONE_HOT_ENCODER function on the categorical features at model creation and select the categorical and non-categorical features. This option allows you to encode the categorical features as one-hot vectors, which are binary vectors that have only one non-zero element. One-hot encoding is a common technique for handling categorical features in ML models, as it can reduce the dimensionality and sparsity of the data, and avoid the ordinality problem that arises when using numerical labels for categorical values1. The transform clause is a feature of BigQuery ML that lets you apply SQL expressions to transform the input data at model creation time. The transform clause can perform feature engineering, such as one-hot encoding, on the fly, without requiring you to create and store a new table with the transformed data2. By using the transform clause with the ML.ONE_HOT_ENCODER function, you can create and train an ML model in BigQuery ML with a single SQL statement, and export it to Cloud Storage for online prediction.
The other options are not as good as option A, for the following reasons:
Option B: Using the ML.ONE_HOT_ENCODER function on the categorical features, and selecting the encoded categorical features and non-categorical features as inputs to create your model, would require more steps and storage than using the transform clause. The ML.ONE_HOT_ENCODER function is a BigQuery ML function that returns a one-hot encoded vector for a given categorical value. However, using this function alone would not apply the one-hot encoding to the input data at model creation time. You would need to create a new table with the encoded features, and use that table as the input to create your model. This would incur additional storage costs and reduce the performance of the queries.
Option C: Using the create model statement and selecting the categorical and non-categorical features, would not handle the categorical features properly and could result in a poor model performance. The create model statement is a BigQuery ML statement that creates and trains an ML model from a SQL query. However, if the input data contains categorical features, you need to encode them as one-hot vectors or use the category_count option to specify the number of categories for each feature. Otherwise, BigQuery ML would treat the categorical features as numerical values, which can introduce bias and noise into the model3.
Option D: Using the ML.ONE_HOT_ENCODER function on the categorical features, and selecting the encoded categorical features and non-categorical features as inputs to create your model, is the same as option B, and has the same drawbacks.
References:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 2: Data Engineering for ML on Google Cloud, Week 2: Feature Engineering
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 1: Architecting low-code ML solutions, 1.1 Developing ML models by using BigQuery ML
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 3: Data Engineering for ML, Section 3.2: BigQuery for ML
One-hot encoding
Using the TRANSFORM clause for feature engineering
Creating a model
ML.ONE_HOT_ENCODER function
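A hedged sketch of option A's single-statement approach, run here through the BigQuery Python client (the same SQL can be pasted into the BigQuery editor). The dataset, table, column, and model names are placeholders, and it assumes the ML.ONE_HOT_ENCODER preprocessing function is available in your BigQuery ML version:
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # placeholder project

    sql = """
    CREATE OR REPLACE MODEL `my_dataset.purchase_model`
    TRANSFORM(
      ML.ONE_HOT_ENCODER(product_category) OVER() AS product_category_enc,
      ML.ONE_HOT_ENCODER(payment_method) OVER() AS payment_method_enc,
      amount,                                  -- non-categorical feature passed through
      purchased                                -- label column
    )
    OPTIONS(model_type = 'logistic_reg', input_label_cols = ['purchased']) AS
    SELECT product_category, payment_method, amount, purchased
    FROM `my_dataset.transactions`
    """
    client.query(sql).result()  # blocks until model training finishes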
You work for a company that sells corporate electronic products to thousands of businesses worldwide. Your company stores historical customer data in BigQuery. You need to build a model that predicts customer lifetime value over the next three years. You want to use the simplest approach to build the model. What should you do?
Access BigQuery Studio in the Google Cloud console. Run the create model statement in the SQL editor to create an ARIMA model.
Create a Vertex AI Workbench notebook. Use IPython magic to run the create model statement to create an ARIMA model.
Access BigQuery Studio in the Google Cloud console. Run the create model statement in the SQL editor to create an AutoML regression model.
Create a Vertex AI Workbench notebook. Use IPython magic to run the create model statement to create an AutoML regression model.
BigQuery ML allows you to build and run machine learning models using SQL queries directly within BigQuery, which is one of the simplest approaches because it doesn't require setting up an external environment like Vertex AI or managing infrastructure.
AutoML regression is more appropriate for predicting customer lifetime value (CLV) compared to ARIMA, which is typically used for time series forecasting (e.g., sales over time, stock prices, etc.). CLV prediction involves understanding complex relationships between customer behavior and value, which is best captured by a regression model.
Using BigQuery Studio and running a CREATE MODEL statement to build an AutoML regression model offers the simplicity you're looking for because it automates much of the feature engineering, model selection, and hyperparameter tuning.
The other options involving ARIMA models (A and B) are not appropriate for CLV, and setting up a Vertex AI Workbench notebook (D) introduces unnecessary complexity for this task.
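For illustration, the CREATE MODEL statement could look like the sketch below, submitted here through the BigQuery Python client (it can equally be run in the BigQuery Studio SQL editor). The dataset, table, label column, and training budget are placeholders:
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # placeholder project

    sql = """
    CREATE OR REPLACE MODEL `crm.clv_model`
    OPTIONS(model_type = 'AUTOML_REGRESSOR',
            input_label_cols = ['ltv_next_3y'],   -- label: value over the next three years
            budget_hours = 1.0) AS
    SELECT * FROM `crm.customer_features`
    """
    client.query(sql).result()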
You are implementing a batch inference ML pipeline in Google Cloud. The model was developed by using TensorFlow and is stored in SavedModel format in Cloud Storage. You need to apply the model to a historical dataset that is stored in a BigQuery table. You want to perform inference with minimal effort. What should you do?
A. Import the TensorFlow model by using the create model statement in BigQuery ML. Apply the historical data to the TensorFlow model.
B. Export the historical data to Cloud Storage in Avro format. Configure a Vertex AI batch prediction job to generate predictions for the exported data.
C. Export the historical data to Cloud Storage in CSV format. Configure a Vertex AI batch prediction job to generate predictions for the exported data.
D. Configure and deploy a Vertex AI endpoint. Use the endpoint to get predictions from the historical data in BigQuery.
Answer: B
Vertex AI batch prediction is the most appropriate and efficient way to apply a pre-trained model like TensorFlow’s SavedModel to a large dataset, especially for batch processing.
The Vertex AI batch prediction job works by exporting your dataset (in this case, historical data from BigQuery) to a suitable format (like Avro or CSV) and then processing it in Cloud Storage where the model is stored.
Avro format is recommended for large datasets as it is highly efficient for data storage and is optimized for read/write operations in Google Cloud, which is why option B is correct.
Option A suggests using BigQuery ML for inference. BigQuery ML can import TensorFlow SavedModels, but imported models are subject to size and operation restrictions, so it is not a reliable fit for an arbitrary TensorFlow model. Hence, BigQuery ML is not the best option for this particular task.
Option C (exporting to CSV) is a valid alternative but is less efficient compared to Avro in terms of performance.
Option D suggests deploying a Vertex AI endpoint, which is better suited for real-time inference rather than batch inference. Since the question asks for batch inference, B is the best answer.
You need to train a ControlNet model with Stable Diffusion XL for an image editing use case. You want to train this model as quickly as possible. Which hardware configuration should you choose to train your model?
Configure one a2-highgpu-1g instance with an NVIDIA A100 GPU with 80 GB of RAM. Use float32 precision during model training.
Configure one a2-highgpu-1g instance with an NVIDIA A100 GPU with 80 GB of RAM. Use bfloat16 quantization during model training.
Configure four n1-standard-16 instances, each with one NVIDIA Tesla T4 GPU with 16 GB of RAM. Use float32 precision during model training.
Configure four n1-standard-16 instances, each with one NVIDIA Tesla T4 GPU with 16 GB of RAM. Use float16 quantization during model training.
NVIDIA A100 GPUs are optimized for training complex models like Stable Diffusion XL. Using float32 precision ensures high model accuracy during training, whereas float16 or bfloat16 may cause lower precision in gradients, especially important for image editing. Distributing across multiple instances with T4 GPUs (Options C and D) would not speed up the process effectively due to lower power and more complex setup requirements.
You are investigating the root cause of a misclassification error made by one of your models. You used Vertex AI Pipelines to train and deploy the model. The pipeline reads data from BigQuery, creates a copy of the data in Cloud Storage in TFRecord format, trains the model in Vertex AI Training on that copy, and deploys the model to a Vertex AI endpoint. You have identified the specific version of the model that misclassified, and you need to recover the data this model was trained on. How should you find that copy of the data?
Use Vertex AI Feature Store. Modify the pipeline to use the feature store, and ensure that all training data is stored in it. Search the feature store for the data used for the training.
Use the lineage feature of Vertex AI Metadata to find the model artifact. Determine the version of the model, identify the step that creates the data copy, and search in the metadata for its location.
Use the logging features in the Vertex AI endpoint to determine the timestamp of the model's deployment. Find the pipeline run at that timestamp. Identify the step that creates the data copy, and search in the logs for its location.
Find the job ID in Vertex AI Training corresponding to the training for the model. Search in the logs of that job for the data used for the training.
Option A is not the best answer because it requires modifying the pipeline to use the Vertex AI Feature Store, which may not be feasible or necessary for recovering the data that the model was trained on. The Vertex AI Feature Store is a service that helps you manage, store, and serve feature values for your machine learning models1, but it is not designed for storing the raw data or the TFRecord files.
Option B is the best answer because it leverages the lineage feature of Vertex AI Metadata, which is a service that helps you track and manage the metadata of your machine learning workflows, such as datasets, models, metrics, and parameters2. The lineage feature allows you to view the relationships and dependencies among the artifacts and executions in your pipeline, and trace back the origin and history of any artifact3. By using the lineage feature, you can find the model artifact, determine the version of the model, identify the step that creates the data copy, and search in the metadata for its location.
Option C is not the best answer because it relies on the logging features in the Vertex AI endpoint, which may not be accurate or reliable for finding the data copy. The logging features in the Vertex AI endpoint help you monitor and troubleshoot the online predictions made by your deployed models, but they do not provide information about the training data or the pipeline steps4. Moreover, the timestamp of the model deployment may not match the timestamp of the pipeline run, as there may be delays or errors in the deployment process.
Option D is not the best answer because it requires finding the job ID in Vertex AI Training, which may not be easy or straightforward. Vertex AI Training is a service that helps you train your custom models on Google Cloud, but it does not provide a direct way to link the training job to the model version or the pipeline run. Moreover, searching in the logs of the job may not reveal the location of the data copy, as the logs may only contain information about the training process and the metrics.
References:
1: Introduction to Vertex AI Feature Store | Vertex AI | Google Cloud
2: Introduction to Vertex AI Metadata | Vertex AI | Google Cloud
3: View lineage for ML workflows | Vertex AI | Google Cloud
4: Monitor online predictions | Vertex AI | Google Cloud
[5]: Train custom models | Vertex AI | Google Cloud
You built a custom ML model using scikit-learn. Training time is taking longer than expected. You decide to migrate your model to Vertex AI Training, and you want to improve the model’s training time. What should you try out first?
Migrate your model to TensorFlow, and train it using Vertex AI Training.
Train your model in a distributed mode using multiple Compute Engine VMs.
Train your model with DLVM images on Vertex AI, and ensure that your code utilizes NumPy and SciPy internal methods whenever possible.
Train your model using Vertex AI Training with GPUs.
Option A is incorrect because migrating your model to TensorFlow, and training it using Vertex AI Training, is not the easiest way to improve the model’s training time. TensorFlow is a framework that allows you to create and train ML models using Python or other languages. Vertex AI Training is a service that allows you to train and optimize ML models using built-in algorithms or custom containers. However, this option requires significant code changes, as TensorFlow and scikit-learn have different APIs and functionalities. Moreover, this option does not leverage the parallelism or the scalability of the cloud, as it only uses a single instance.
Option B is incorrect because training your model in a distributed mode using multiple Compute Engine VMs, is not the most convenient way to improve the model’s training time. Compute Engine is a service that allows you to create and manage virtual machines that run on Google Cloud. You can use Compute Engine to run your scikit-learn model in a distributed mode, by using libraries such as Dask or Joblib. However, this option requires more effort and resources than option D, as it involves creating and configuring the VMs, installing and maintaining the libraries, and writing and running the distributed code.
Option C is incorrect because training your model with DLVM images on Vertex AI, and ensuring that your code utilizes NumPy and SciPy internal methods whenever possible, is not the most effective way to improve the model’s training time. DLVM (Deep Learning Virtual Machine) images are preconfigured VM images that include popular ML frameworks and tools, such as TensorFlow, PyTorch, or scikit-learn1. You can use DLVM images on Vertex AI to train your scikit-learn model, by using a custom container. NumPy and SciPy are libraries that provide numerical and scientific computing functionalities for Python. You can use NumPy and SciPy internal methods to optimize your scikit-learn code, as they are faster and more efficient than pure Python code2. However, this option does not leverage the parallelism or the scalability of the cloud, as it only uses a single instance. Moreover, this option may not have a significant impact on the training time, as scikit-learn already relies on NumPy and SciPy for most of its operations3.
Option D is correct because training your model using Vertex AI Training with GPUs, is the best way to improve the model’s training time. A GPU (Graphics Processing Unit) is a hardware accelerator that can perform parallel computations faster than a CPU (Central Processing Unit)4. Vertex AI Training is a service that allows you to train and optimize ML models using built-in algorithms or custom containers. You can use Vertex AI Training with GPUs to train your scikit-learn model, by using a custom container and specifying the accelerator type and count5. By using Vertex AI Training with GPUs, you can leverage the parallelism and the scalability of the cloud, and speed up the training process significantly, without changing your code.
References:
DLVM images
NumPy and SciPy
scikit-learn dependencies
GPU overview
Vertex AI Training with GPUs
[scikit-learn overview]
[TensorFlow overview]
[Compute Engine overview]
[Dask overview]
[Joblib overview]
[Vertex AI Training overview]
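A minimal sketch of option D with the Vertex AI SDK; the container image, machine type, and accelerator values are placeholders, and note that plain scikit-learn only benefits from the GPU if the container uses a GPU-enabled backend (for example, a cuML drop-in) for the heavy computations:
    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")  # placeholders

    job = aiplatform.CustomContainerTrainingJob(
        display_name="sklearn-gpu-training",
        container_uri="us-docker.pkg.dev/my-project/trainers/sklearn-train:latest",  # placeholder
    )

    job.run(
        replica_count=1,
        machine_type="n1-standard-8",
        accelerator_type="NVIDIA_TESLA_T4",   # request a GPU for the training VM
        accelerator_count=1,
    )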
You are the Director of Data Science at a large company, and your Data Science team has recently begun using the Kubeflow Pipelines SDK to orchestrate their training pipelines. Your team is struggling to integrate their custom Python code into the Kubeflow Pipelines SDK. How should you instruct them to proceed in order to quickly integrate their code with the Kubeflow Pipelines SDK?
Use the func_to_container_op function to create custom components from the Python code.
Use the predefined components available in the Kubeflow Pipelines SDK to access Dataproc, and run the custom code there.
Package the custom Python code into Docker containers, and use the load_component_from_file function to import the containers into the pipeline.
Deploy the custom Python code to Cloud Functions, and use Kubeflow Pipelines to trigger the Cloud Function.
The easiest way to integrate custom Python code into the Kubeflow Pipelines SDK is to use the func_to_container_op function, which converts a Python function into a pipeline component. This function automatically builds a Docker image that executes the Python function, and returns a factory function that can be used to create kfp.dsl.ContainerOp instances for the pipeline. This option has the following benefits:
It allows the data science team to reuse their existing Python code without rewriting it or packaging it into containers manually.
It simplifies the component specification and implementation, as the function signature defines the component interface and the function body defines the component logic.
It supports various types of inputs and outputs, such as primitive types, files, directories, and dictionaries.
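A short sketch of this pattern with the Kubeflow Pipelines SDK v1 (where func_to_container_op lives); the function body and base image are placeholders:
    import kfp
    from kfp.components import func_to_container_op

    def preprocess(rows: int) -> int:
        """Plain Python function; it is packaged into a container image automatically."""
        return rows * 2

    # Wrap the existing function as a reusable pipeline component.
    preprocess_op = func_to_container_op(preprocess, base_image="python:3.9")

    @kfp.dsl.pipeline(name="example-pipeline")
    def pipeline(rows: int = 10):
        preprocess_op(rows)

    kfp.compiler.Compiler().compile(pipeline, "pipeline.yaml")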
The other options are less optimal for the following reasons:
Option B: Using the predefined components available in the Kubeflow Pipelines SDK to access Dataproc, and run the custom code there, introduces additional complexity and cost. This option requires creating and managing Dataproc clusters, which are ephemeral and scalable clusters of Compute Engine instances that run Apache Spark and Apache Hadoop. Moreover, this option requires writing the custom code in PySpark or Hadoop MapReduce, which may not be compatible with the existing Python code.
Option C: Packaging the custom Python code into Docker containers, and using the load_component_from_file function to import the containers into the pipeline, introduces additional steps and overhead. This option requires creating and maintaining Dockerfiles, building and pushing Docker images, and writing component specifications in YAML files. Moreover, this option requires managing the dependencies and versions of the Python code and the Docker images.
Option D: Deploying the custom Python code to Cloud Functions, and using Kubeflow Pipelines to trigger the Cloud Function, introduces additional latency and limitations. This option requires creating and deploying Cloud Functions, which are serverless functions that execute in response to events. Moreover, this option requires invoking the Cloud Functions from the Kubeflow Pipelines using HTTP requests, which can incur network overhead and latency. Additionally, this option is subject to the quotas and limits of Cloud Functions, such as the maximum execution time and memory usage.
References:
Building Python function-based components | Kubeflow
You are developing an ML model to identify your company's products in images. You have access to over one million images in a Cloud Storage bucket. You plan to experiment with different TensorFlow models by using Vertex AI Training. You need to read images at scale during training while minimizing data I/O bottlenecks. What should you do?
Load the images directly into the Vertex AI compute nodes by using Cloud Storage FUSE. Read the images by using the tf.data.Dataset.from_tensor_slices function.
Create a Vertex AI managed dataset from your image data. Access the AIP_TRAINING_DATA_URI environment variable to read the images by using the tf.data.Dataset.list_files function.
Convert the images to TFRecords and store them in a Cloud Storage bucket. Read the TFRecords by using the tf.data.TFRecordDataset function.
Store the URLs of the images in a CSV file. Read the file by using the tf.data.experimental.CsvDataset function.
TFRecords are a binary file format that can store large amounts of data efficiently. By converting the images to TFRecords and storing them in a Cloud Storage bucket, you can reduce the data size and improve the data transfer speed. You can then read the TFRecords by using the tf.data.TFRecordDataset function, which creates a dataset of tensors from the TFRecord files. This way, you can read images at scale during training while minimizing data I/O bottlenecks. References:
TFRecord documentation
tf.data.TFRecordDataset documentation
Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
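A hedged sketch of the TFRecord round trip; the bucket paths, file names, labels, and image size are placeholders:
    import tensorflow as tf

    # Write: serialize each (image bytes, label) pair into a TFRecord file in Cloud Storage.
    def to_example(image_bytes, label):
        return tf.train.Example(features=tf.train.Features(feature={
            "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        })).SerializeToString()

    with tf.io.TFRecordWriter("gs://my-bucket/train-00000.tfrecord") as writer:
        for path, label in [("product_001.jpg", 0), ("product_002.jpg", 1)]:  # example files
            writer.write(to_example(tf.io.read_file(path).numpy(), label))

    # Read at training time: parse and decode in parallel to keep the input pipeline fast.
    feature_spec = {"image": tf.io.FixedLenFeature([], tf.string),
                    "label": tf.io.FixedLenFeature([], tf.int64)}

    def parse(record):
        parsed = tf.io.parse_single_example(record, feature_spec)
        image = tf.image.resize(tf.io.decode_jpeg(parsed["image"], channels=3), [224, 224])
        return image, parsed["label"]

    dataset = (tf.data.TFRecordDataset(tf.io.gfile.glob("gs://my-bucket/train-*.tfrecord"))
               .map(parse, num_parallel_calls=tf.data.AUTOTUNE)
               .batch(64)
               .prefetch(tf.data.AUTOTUNE))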
You are designing an ML recommendation model for shoppers on your company's ecommerce website. You will use Recommendations AI to build, test, and deploy your system. How should you develop recommendations that increase revenue while following best practices?
Use the "Other Products You May Like" recommendation type to increase the click-through rate.
Use the "Frequently Bought Together" recommendation type to increase the shopping cart size for each order.
Import your user events and then your product catalog to make sure you have the highest quality event stream.
Because it will take time to collect and record product data, use placeholder values for the product catalog to test the viability of the model.
Recommendations AI is a service that allows users to build, test, and deploy personalized product recommendations for their ecommerce websites. It uses Google’s deep learning models to learn from user behavior and product data, and generate high-quality recommendations that can increase revenue, click-through rate, and customer satisfaction. One of the best practices for using Recommendations AI is to choose the right recommendation type for the business objective. The “Frequently Bought Together” recommendation type shows products that are often purchased together with the current product, and encourages users to add more items to their shopping cart. This can increase the average order value and the revenue for each transaction. The other options are not as effective or feasible for this objective. The “Other Products You May Like” recommendation type shows products that are similar to the current product, and may increase the click-through rate, but not necessarily the shopping cart size. Importing the user events and then the product catalog is not a recommended order, as it may cause data inconsistency and missing recommendations. The product catalog should be imported first, and then the user events. Using placeholder values for the product catalog is not a viable option, as it will not produce meaningful recommendations or reflect the real performance of the model. References:
Recommendations AI documentation
Choosing a recommendation type
Importing data to Recommendations AI
You are implementing a batch inference ML pipeline in Google Cloud. The model was developed using TensorFlow and is stored in SavedModel format in Cloud Storage. You need to apply the model to a historical dataset containing 10 TB of data that is stored in a BigQuery table. How should you perform the inference?
Export the historical data to Cloud Storage in Avro format. Configure a Vertex AI batch prediction job to generate predictions for the exported data.
Import the TensorFlow model by using the create model statement in BigQuery ML. Apply the historical data to the TensorFlow model.
Export the historical data to Cloud Storage in CSV format. Configure a Vertex AI batch prediction job to generate predictions for the exported data.
Configure a Vertex AI batch prediction job to apply the model to the historical data in BigQuery.
The best option for implementing this batch inference pipeline is to configure a Vertex AI batch prediction job that applies the model to the historical data directly in BigQuery (option D). Vertex AI batch prediction generates predictions for large numbers of instances in batches and can use BigQuery as both the input source and the output destination, in addition to Cloud Storage formats such as JSONL, CSV, or TFRecord. Because the model is already stored in SavedModel format in Cloud Storage, you can register it with Vertex AI and reference it from the batch prediction job without writing serving code. You specify the model, the BigQuery input table, the input and output formats, and the output destination; Vertex AI then runs the job, applies the model to the 10 TB table, and writes the predictions to the destination you choose, such as BigQuery or Cloud Storage. This approach requires minimal code and configuration and avoids exporting the data out of BigQuery1.
The other options are not as good as option D, for the following reasons:
Option A: Exporting the historical data to Cloud Storage in Avro format and running a Vertex AI batch prediction job on the exported files would work, but it adds steps and cost. You would have to export 10 TB of data with the BigQuery API or the bq command-line tool, store it in Cloud Storage, and then configure the job, instead of simply pointing the batch prediction job at the BigQuery table and benefiting from BigQuery as a direct input source with its fast query performance, serverless scaling, and cost optimization2.
Option B: Importing the TensorFlow model with the create model statement in BigQuery ML and applying the historical data to it keeps the work inside BigQuery, but it bypasses Vertex AI batch prediction and its tooling for model deployment, monitoring, and governance, and imported TensorFlow models in BigQuery ML are subject to size and operation restrictions. It is therefore not the most straightforward way to run this batch inference3.
Option C: Exporting the historical data to Cloud Storage in CSV format and running a Vertex AI batch prediction job on the exported files has the same drawbacks as option A: an extra export step, additional storage, and the loss of BigQuery as a direct input source, with CSV also being a less compact format for 10 TB of data2.
References:
Batch prediction | Vertex AI | Google Cloud
Exporting table data | BigQuery | Google Cloud
Creating and using models | BigQuery ML | Google Cloud
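A minimal sketch of option D with the Vertex AI SDK; the project, tables, bucket, and prebuilt serving image tag are placeholders:
    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")  # placeholders

    # Register the SavedModel (skip if it is already in the Vertex AI Model Registry).
    model = aiplatform.Model.upload(
        display_name="tf-batch-model",
        artifact_uri="gs://my-bucket/saved_model/",   # directory containing the SavedModel
        serving_container_image_uri=(
            "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest"),  # example prebuilt image
    )

    # Batch prediction reads straight from BigQuery and writes the results back to BigQuery.
    job = model.batch_predict(
        job_display_name="historical-scoring",
        bigquery_source="bq://my-project.analytics.historical_features",
        bigquery_destination_prefix="bq://my-project.analytics",
        instances_format="bigquery",
        predictions_format="bigquery",
        machine_type="n1-standard-4",
    )
    job.wait()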
You are building an MLOps platform to automate your company's ML experiments and model retraining. You need to organize the artifacts for dozens of pipelines. How should you store the pipelines' artifacts?
Store parameters in Cloud SQL and store the models' source code and binaries in GitHub
Store parameters in Cloud SQL, store the models' source code in GitHub, and store the models' binaries in Cloud Storage.
Store parameters in Vertex ML Metadata, store the models' source code in GitHub, and store the models' binaries in Cloud Storage.
Store parameters in Vertex ML Metadata, and store the models' source code and binaries in GitHub.
To organize the artifacts for dozens of pipelines, you should store the parameters in Vertex ML Metadata, store the models’ source code in GitHub, and store the models’ binaries in Cloud Storage. This option has the following advantages:
Vertex ML Metadata is a service that helps you track and manage the metadata of your ML workflows, such as datasets, models, metrics, and parameters1. It can also help you with data lineage, model versioning, and model performance monitoring2.
GitHub is a popular platform for hosting and collaborating on code repositories. It can help you manage the source code of your models, as well as the configuration files, scripts, and notebooks that are part of your ML pipelines3.
Cloud Storage is a scalable and durable object storage service that can store any type of data, including model binaries4. It can also integrate with other services, such as Vertex AI, Cloud Functions, and Cloud Run, to enable easy deployment and serving of your models5.
References:
1: Introduction to Vertex ML Metadata | Vertex AI | Google Cloud
2: Manage metadata for ML workflows | Vertex AI | Google Cloud
3: GitHub - Where the world builds software
4: Cloud Storage | Google Cloud
5: Deploying models | Vertex AI | Google Cloud
You need to design an architecture that serves asynchronous predictions to determine whether a particular mission-critical machine part will fail. Your system collects data from multiple sensors from the machine. You want to build a model that will predict a failure in the next N minutes, given the average of each sensor’s data from the past 12 hours. How should you design the architecture?
1. HTTP requests are sent by the sensors to your ML model, which is deployed as a microservice and exposes a REST API for prediction
2. Your application queries a Vertex AI endpoint where you deployed your model.
3. Responses are received by the caller application as soon as the model produces the prediction.
1. Events are sent by the sensors to Pub/Sub, consumed in real time, and processed by a Dataflow stream processing pipeline.
2. The pipeline invokes the model for prediction and sends the predictions to another Pub/Sub topic.
3. Pub/Sub messages containing predictions are then consumed by a downstream system for monitoring.
1. Export your data to Cloud Storage using Dataflow.
2. Submit a Vertex AI batch prediction job that uses your trained model in Cloud Storage to perform scoring on the preprocessed data.
3. Export the batch prediction job outputs from Cloud Storage and import them into Cloud SQL.
1. Export the data to Cloud Storage using the BigQuery command-line tool
2. Submit a Vertex AI batch prediction job that uses your trained model in Cloud Storage to perform scoring on the preprocessed data.
3. Export the batch prediction job outputs from Cloud Storage and import them into BigQuery.
Reasoning: The question asks for a design that serves asynchronous predictions to determine whether a machine part will fail. This means that the predictions do not need to be returned immediately to the sensors, but can be processed in batches and sent to a downstream system for monitoring. Option B is the only one that uses a streaming data pipeline with Pub/Sub and Dataflow, which can handle real-time data ingestion, processing, and prediction. Option B also invokes the model for prediction, which is required by the question. The other options either use synchronous predictions (option A), batch predictions (options C and D), or do not invoke the model for prediction (option D).
References: You can learn more about the differences between synchronous, asynchronous, and batch predictions in Vertex AI from this document. You can also find examples of how to use Pub/Sub and Dataflow for streaming data pipelines from this tutorial and this codelab.
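As a rough sketch of option B (not part of the official answer material), an Apache Beam streaming pipeline on Dataflow could look like the following; the project, topics, and endpoint ID are placeholders, and the 12-hour averaging step is omitted for brevity:
    import json
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    class Predict(beam.DoFn):
        """Calls the deployed model for each aggregated sensor record (endpoint is a placeholder)."""
        def setup(self):
            from google.cloud import aiplatform
            self.endpoint = aiplatform.Endpoint(
                "projects/my-project/locations/us-central1/endpoints/1234567890")

        def process(self, message):
            instance = json.loads(message.decode("utf-8"))   # averaged sensor readings
            prediction = self.endpoint.predict(instances=[instance]).predictions[0]
            yield json.dumps({"machine_id": instance.get("machine_id"),
                              "failure_prediction": prediction}).encode("utf-8")

    options = PipelineOptions(streaming=True, project="my-project", region="us-central1",
                              runner="DataflowRunner", temp_location="gs://my-bucket/tmp")

    with beam.Pipeline(options=options) as p:
        (p
         | "ReadSensorEvents" >> beam.io.ReadFromPubSub(
               topic="projects/my-project/topics/sensor-events")
         | "Predict" >> beam.ParDo(Predict())
         | "PublishPredictions" >> beam.io.WriteToPubSub(
               topic="projects/my-project/topics/failure-predictions"))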
You work on a data science team at a bank and are creating an ML model to predict loan default risk. You have collected and cleaned hundreds of millions of records worth of training data in a BigQuery table, and you now want to develop and compare multiple models on this data using TensorFlow and Vertex AI. You want to minimize any bottlenecks during the data ingestion state while considering scalability. What should you do?
Use the BigQuery client library to load data into a dataframe, and use tf.data.Dataset.from_tensor_slices() to read it.
Export data to CSV files in Cloud Storage, and use tf.data.TextLineDataset() to read them.
Convert the data into TFRecords, and use tf.data.TFRecordDataset() to read them.
Use TensorFlow I/O’s BigQuery Reader to directly read the data.
The best option for developing and comparing multiple models on a large-scale BigQuery table using TensorFlow and Vertex AI is to use TensorFlow I/O’s BigQuery Reader to directly read the data. This option has the following advantages:
It minimizes any bottlenecks during the data ingestion stage, as the BigQuery Reader can stream data from BigQuery to TensorFlow in parallel and in batches, without loading the entire table into memory or disk. The BigQuery Reader can also perform data transformations and filtering using SQL queries, reducing the need for additional preprocessing steps in TensorFlow.
It leverages the scalability and performance of BigQuery, as the BigQuery Reader can handle hundreds of millions of records worth of training data efficiently and reliably. BigQuery is a serverless, fully managed, and highly scalable data warehouse that can run complex queries over petabytes of data in seconds.
It simplifies the integration with Vertex AI, as the BigQuery Reader can be used with both custom and pre-built TensorFlow models on Vertex AI. Vertex AI is a unified platform for machine learning that provides various tools and features for data ingestion, data labeling, data preprocessing, model training, model tuning, model deployment, model monitoring, and model explainability.
The other options are less optimal for the following reasons:
Option A: Using the BigQuery client library to load data into a dataframe, and using tf.data.Dataset.from_tensor_slices() to read it, introduces memory and performance issues. This option requires loading the entire BigQuery table into a Pandas dataframe, which can consume a lot of memory and cause out-of-memory errors. Moreover, using tf.data.Dataset.from_tensor_slices() to read the dataframe can be slow and inefficient, as it creates one slice per row of the dataframe, resulting in a large number of small tensors.
Option B: Exporting data to CSV files in Cloud Storage, and using tf.data.TextLineDataset() to read them, introduces additional steps and complexity. This option requires exporting the BigQuery table to one or more CSV files in Cloud Storage, which can take a long time and consume a lot of storage space. Moreover, using tf.data.TextLineDataset() to read the CSV files can be slow and error-prone, as it requires parsing and decoding each line of text, handling missing values and invalid data, and applying data transformations and validations.
Option C: Converting the data into TFRecords, and using tf.data.TFRecordDataset() to read them, introduces additional steps and complexity. This option requires converting the BigQuery table into one or more TFRecord files, which are binary files that store serialized TensorFlow examples. This can take a long time and consume a lot of storage space. Moreover, using tf.data.TFRecordDataset() to read the TFRecord files requires defining and parsing the schema of the TensorFlow examples, which can be tedious and error-prone.
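For illustration, here is a minimal sketch of option D that streams rows from BigQuery into a tf.data pipeline with the tensorflow-io BigQuery connector. The project, dataset, table, column names, and type list are placeholders, and the exact read_session argument order may vary slightly between tensorflow-io versions.

# Sketch: streaming training data from BigQuery into tf.data with tensorflow-io.
# Project, dataset, table, and column names below are placeholders.
import tensorflow as tf
from tensorflow_io.bigquery import BigQueryClient

PROJECT_ID = "my-project"                              # placeholder
DATASET_ID = "loans"                                   # placeholder
TABLE_ID = "training_data"                             # placeholder
FEATURES = ["loan_amount", "income", "credit_score"]   # placeholder columns
LABEL = "defaulted"

client = BigQueryClient()
read_session = client.read_session(
    "projects/" + PROJECT_ID,
    PROJECT_ID,
    TABLE_ID,
    DATASET_ID,
    FEATURES + [LABEL],
    [tf.float64, tf.float64, tf.int64, tf.int64],       # one dtype per selected column
    requested_streams=4,                                 # read from BigQuery in parallel
)

def to_features_and_label(row):
    # Each row arrives as a dict of column-name -> tensor.
    label = row.pop(LABEL)
    return row, label

dataset = (
    read_session.parallel_read_rows()
    .map(to_features_and_label, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(1024)
    .prefetch(tf.data.AUTOTUNE)
)
# `dataset` can now be passed directly to model.fit() in a Vertex AI training job.

Because the data is streamed from the BigQuery Storage API, no intermediate CSV or TFRecord export step is needed.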
References:
[TensorFlow I/O documentation]
[BigQuery documentation]
[Vertex AI documentation]
You developed a custom model by using Vertex AI to forecast the sales of your company's products based on historical transactional data. You anticipate changes in the feature distributions and the correlations between the features in the near future. You also expect to receive a large volume of prediction requests. You plan to use Vertex AI Model Monitoring for drift detection, and you want to minimize the cost. What should you do?
Use the features for monitoring. Set a monitoring-frequency value that is higher than the default.
Use the features for monitoring. Set a prediction-sampling-rate value that is closer to 1 than 0.
Use the features and the feature attributions for monitoring. Set a monitoring-frequency value that is lower than the default.
Use the features and the feature attributions for monitoring. Set a prediction-sampling-rate value that is closer to 0 than 1.
The best option for using Vertex AI Model Monitoring for drift detection while minimizing cost is to monitor both the features and the feature attributions, and to set a prediction-sampling-rate value that is closer to 0 than 1.
Vertex AI Model Monitoring can monitor a deployed model's prediction input data for feature skew and drift. Feature drift occurs when the distribution of feature values in production changes over time, and it is the objective to enable when the original training data is not part of the comparison. Vertex AI Model Monitoring uses TensorFlow Data Validation (TFDV) to compute the distribution of each monitored feature in each monitoring window and a distance score against a baseline distribution; if the distance score for a feature exceeds the alerting threshold that you set, Vertex AI Model Monitoring sends you an email alert. For custom models you can additionally enable feature attribution monitoring, which tracks drift in the attributions, that is, the contribution of each feature to the prediction output. Because you anticipate changes in both the feature distributions and the correlations between features, attribution drift is a useful complementary signal: it highlights which features have the most impact on the model's output and how that influence shifts over time1.
The prediction-sampling-rate determines the fraction of prediction requests that are logged and analyzed by the model monitoring job. Because you expect a large volume of prediction requests, a sampling rate close to 1 would log and analyze almost every request, which drives up storage and computation costs. A sampling rate closer to 0 logs only a small sample, which is usually sufficient to estimate the feature distributions while keeping costs low; the trade-off is some sampling noise, so the rate should be chosen based on the traffic volume and how quickly drift must be detected2. Therefore, monitoring the features and the feature attributions with a prediction-sampling-rate closer to 0 than 1 provides drift detection while minimizing the cost.
The other options are not as good as option D, for the following reasons:
Option A: Using the features for monitoring and setting a monitoring-frequency value that is higher than the default would not enable feature attribution monitoring, and could increase the cost of the model monitoring job. The monitoring-frequency is a parameter that determines how often the model monitoring job analyzes the logged prediction requests and calculates the distributions and distance scores for each feature. Using a higher monitoring-frequency can increase the frequency and timeliness of the model monitoring job, but also the computation costs of the model monitoring job. Moreover, using the features for monitoring would not enable feature attribution monitoring, which can provide more insights into the feature drift and the model performance1.
Option B: Using the features for monitoring and setting a prediction-sampling-rate value that is closer to 1 than 0 would not enable feature attribution monitoring, and could increase the cost of the model monitoring job. The prediction-sampling-rate is a parameter that determines the percentage of prediction requests that are logged and analyzed by the model monitoring job. Using a higher prediction-sampling-rate can increase the quality and validity of the data, but also the storage and computation costs of the model monitoring job. Moreover, using the features for monitoring would not enable feature attribution monitoring, which can provide more insights into the feature drift and the model performance12.
Option C: Using the features and the feature attributions for monitoring and setting a monitoring-frequency value that is lower than the default would enable feature attribution monitoring, but could reduce the frequency and timeliness of the model monitoring job. The monitoring-frequency is a parameter that determines how often the model monitoring job analyzes the logged prediction requests and calculates the distributions and distance scores for each feature. Using a lower monitoring-frequency can reduce the computation costs of the model monitoring job, but also the frequency and timeliness of the model monitoring job. This can make the model monitoring job less responsive and effective in detecting and alerting the feature drift1.
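As a rough illustration of the recommended configuration, the sketch below uses the Vertex AI Python SDK to create a monitoring job with feature-drift and attribution-drift thresholds and a low sampling rate. The endpoint ID, thresholds, schedule, and email address are placeholders, and the class and parameter names are assumptions based on the google-cloud-aiplatform model_monitoring module, so they may differ between SDK versions.

# Sketch (assumed API): drift detection on features and feature attributions with a
# low sampling rate to limit cost. Endpoint, project, region, thresholds, and emails
# are placeholders; names may differ slightly between google-cloud-aiplatform versions.
from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

aiplatform.init(project="my-project", location="us-central1")  # placeholders

endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"  # placeholder
)

drift_config = model_monitoring.DriftDetectionConfig(
    drift_thresholds={"price": 0.3, "units_sold": 0.3},             # feature drift
    attribute_drift_thresholds={"price": 0.3, "units_sold": 0.3},   # attribution drift
)

job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="sales-forecast-monitoring",
    endpoint=endpoint,
    logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=0.1),  # closer to 0 than 1
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=24),            # assumed: hours between analyses
    alert_config=model_monitoring.EmailAlertConfig(user_emails=["ml-team@example.com"]),
    objective_configs=model_monitoring.ObjectiveConfig(drift_detection_config=drift_config),
)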
References:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML Systems, Week 4: Evaluation
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in production, 3.3 Monitoring ML models in production
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 6: Production ML Systems, Section 6.3: Monitoring ML Models
Using Model Monitoring
Understanding the score threshold slider
You need to train a computer vision model that predicts the type of government ID present in a given image using a GPU-powered virtual machine on Compute Engine. You use the following parameters:
• Optimizer: SGD
• Image shape = 224x224
• Batch size = 64
• Epochs = 10
• Verbose = 2
During training you encounter the following error: ResourceExhaustedError: Out of memory (OOM) when allocating tensor. What should you do?
Change the optimizer
Reduce the batch size
Change the learning rate
Reduce the image shape
A ResourceExhaustedError: out of memory (OOM) when allocating tensor is an error that occurs when the GPU runs out of memory while trying to allocate memory for a tensor. A tensor is a multi-dimensional array of numbers that represents the data or the parameters of a machine learning model. The size and shape of a tensor depend on various factors, such as the input data, the model architecture, the batch size, and the optimization algorithm1.
For the use case of training a computer vision model that predicts the type of government ID present in a given image using a GPU-powered virtual machine on Compute Engine, the best option to resolve the error is to reduce the batch size. The batch size is a parameter that determines how many input examples are processed at a time by the model. A larger batch size can improve the model’s accuracy and stability, but it also requires more memory and computation. A smaller batch size can reduce the memory and computation requirements, but it may also affect the model’s performance and convergence2.
By reducing the batch size, the GPU needs to hold fewer activations and gradients in memory for each training step, which avoids running out of memory. Reducing the batch size does mean more steps per epoch, and making it too small can increase the noise and variance of the gradient updates and slow down the convergence of the model. Therefore, the optimal batch size should be chosen based on the trade-off between memory, computation, and performance3.
The other options are not as effective as option B, because they are not directly related to the memory allocation of the GPU. Option A, changing the optimizer, may affect the speed and quality of the optimization process, but it may not reduce the memory usage of the model. Option C, changing the learning rate, may affect the convergence and stability of the model, but it may not reduce the memory usage of the model. Option D, reducing the image shape, may reduce the size of the input tensor, but it may also reduce the quality and resolution of the image, and affect the model’s accuracy. Therefore, option B, reducing the batch size, is the best answer for this question.
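For illustration, a minimal Keras sketch of the fix: keep the optimizer and the 224x224 image shape, and lower only the batch size passed to model.fit. The model architecture and the randomly generated stand-in data are placeholders.

# Sketch: resolving the OOM by lowering only the batch size.
# The model below and the random stand-in data are placeholders for your own code.
import numpy as np
import tensorflow as tf

def build_model():
    # Placeholder architecture; the real model stays unchanged.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(224, 224, 3)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(4, activation="softmax"),  # e.g. 4 ID types (placeholder)
    ])

model = build_model()
model.compile(
    optimizer=tf.keras.optimizers.SGD(),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Stand-in data so the snippet runs end to end.
train_images = np.random.rand(256, 224, 224, 3).astype("float32")
train_labels = np.random.randint(0, 4, size=(256,))

# The original run used batch_size=64 and hit ResourceExhaustedError on the GPU.
# Halving the batch size (and halving again if needed) reduces the memory held by
# activations per step without changing the image shape or the optimizer.
model.fit(train_images, train_labels, batch_size=32, epochs=10, verbose=2)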
References:
ResourceExhaustedError: OOM when allocating tensor with shape - Stack Overflow
How does batch size affect model performance and training time? - Stack Overflow
How to choose an optimal batch size for training a neural network? - Stack Overflow
You have a functioning end-to-end ML pipeline that involves tuning the hyperparameters of your ML model using AI Platform, and then using the best-tuned parameters for training. Hyperparameter tuning is taking longer than expected and is delaying the downstream processes. You want to speed up the tuning job without significantly compromising its effectiveness. Which actions should you take?
Choose 2 answers
Decrease the number of parallel trials
Decrease the range of floating-point values
Set the early stopping parameter to TRUE
Change the search algorithm from Bayesian search to random search.
Decrease the maximum number of trials during subsequent training phases.
Hyperparameter tuning is the process of finding the optimal values for the parameters of a machine learning model that affect its performance. AI Platform provides a service for hyperparameter tuning that can run multiple trials in parallel and use different search algorithms to find the best combination of hyperparameters. However, hyperparameter tuning can be time-consuming and costly, especially if the search space is large and the model training is complex. Therefore, it is important to optimize the tuning job to reduce the time and resources required.
One way to speed up the tuning job is to set the early stopping parameter to TRUE. This means that the tuning service will automatically stop trials that are unlikely to perform well based on their intermediate results, which saves time and resources that would otherwise be spent on unpromising trials. Early stopping is enabled with the trainingInput.hyperparameters.enableTrialEarlyStopping field of the training job request1
Another way to speed up the tuning job is to decrease the maximum number of trials during subsequent training phases. This means that the tuning service will use fewer trials to refine the search space after the initial phase. This can reduce the time required for the tuning job to converge to the optimal solution. The maximum number of trials can be set in the trainingInput.hyperparameters.maxTrials field of the training job request1
The other options are not effective ways to speed up the tuning job. Decreasing the number of parallel trials will reduce the concurrency of the tuning job and increase the overall time required. Decreasing the range of floating-point values will reduce the diversity of the search space and may miss some optimal solutions. Changing the search algorithm from Bayesian search to random search will reduce the efficiency of the tuning job and may require more trials to find the best solution1
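For illustration, the sketch below shows where these two settings live in an AI Platform training job request submitted with the Google API Python client; the job ID, package, module, metric tag, and parameter ranges are placeholders.

# Sketch: an AI Platform training job body that enables trial early stopping and caps
# the number of trials. Job ID, package/module names, metric tag, and parameter ranges
# are placeholders; field names follow the AI Platform Training REST API.
from googleapiclient import discovery

job_body = {
    "jobId": "hp_tuning_20240101_001",                            # placeholder
    "trainingInput": {
        "scaleTier": "STANDARD_1",
        "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],     # placeholder
        "pythonModule": "trainer.task",                           # placeholder
        "region": "us-central1",
        "hyperparameters": {
            "goal": "MAXIMIZE",
            "hyperparameterMetricTag": "accuracy",
            "maxTrials": 20,                       # keep this modest in later phases
            "maxParallelTrials": 5,
            "enableTrialEarlyStopping": True,      # stop unpromising trials early
            "params": [
                {
                    "parameterName": "learning_rate",
                    "type": "DOUBLE",
                    "minValue": 0.0001,
                    "maxValue": 0.1,
                    "scaleType": "UNIT_LOG_SCALE",
                }
            ],
        },
    },
}

ml = discovery.build("ml", "v1")
ml.projects().jobs().create(parent="projects/my-project", body=job_body).execute()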
References: 1: Hyperparameter tuning overview
You have recently trained a scikit-learn model that you plan to deploy on Vertex AI. This model will support both online and batch prediction. You need to preprocess input data for model inference. You want to package the model for deployment while minimizing additional code. What should you do?
1. Upload your model to the Vertex AI Model Registry by using a prebuilt scikit-learn prediction container.
2. Deploy your model to Vertex AI Endpoints, and create a Vertex AI batch prediction job that uses the instanceConfig.instanceType setting to transform your input data.
1. Wrap your model in a custom prediction routine (CPR), and build a container image from the CPR local model.
2. Upload your scikit-learn model container to Vertex AI Model Registry.
3. Deploy your model to Vertex AI Endpoints, and create a Vertex AI batch prediction job.
1. Create a custom container for your scikit-learn model.
2. Define a custom serving function for your model.
3. Upload your model and custom container to Vertex AI Model Registry.
4. Deploy your model to Vertex AI Endpoints, and create a Vertex AI batch prediction job.
1. Create a custom container for your scikit-learn model.
2. Upload your model and custom container to Vertex AI Model Registry.
3. Deploy your model to Vertex AI Endpoints, and create a Vertex AI batch prediction job that uses the instanceConfig.instanceType setting to transform your input data.
The best option for deploying a scikit-learn model on Vertex AI with minimal additional code is to wrap the model in a custom prediction routine (CPR), build a container image from the CPR local model, upload the container to Vertex AI Model Registry, deploy the model to Vertex AI Endpoints, and create a Vertex AI batch prediction job.
Vertex AI can deploy a trained scikit-learn model to an online prediction endpoint, which provides low-latency predictions for individual instances, and can also run batch prediction jobs that provide high-throughput predictions for large sets of instances. A custom prediction routine (CPR) lets you supply a small amount of Python code that defines how to preprocess the input data, run the prediction, and postprocess the output, without writing a serving web server or assembling a container image by hand: the Vertex AI SDK builds a container image from your predictor code and its dependencies. That image is uploaded to Vertex AI Model Registry and deployed to an endpoint, and the same registered model can be used for batch prediction. This keeps the additional code to a few functions that implement the preprocessing and prediction logic, which satisfies the requirement to minimize additional code1.
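As a rough sketch of what the CPR code looks like, the predictor below loads a scikit-learn model and a preprocessing artifact, scales the input, and returns predictions, and LocalModel.build_cpr_model packages it into a container image. The artifact names, preprocessing step, source directory, and image URI are placeholders, and the module paths are assumptions based on the Vertex AI SDK's prediction package, so they may differ between SDK versions.

# Sketch (assumed API): a custom prediction routine that preprocesses inputs before
# calling a scikit-learn model. Paths, artifact names, and the image URI are placeholders.
import joblib
import numpy as np
from google.cloud.aiplatform.prediction import LocalModel
from google.cloud.aiplatform.prediction.predictor import Predictor
from google.cloud.aiplatform.utils import prediction_utils


class ScalingSklearnPredictor(Predictor):
    def load(self, artifacts_uri: str) -> None:
        # Download model.joblib and scaler.joblib from the model artifact directory.
        prediction_utils.download_model_artifacts(artifacts_uri)
        self._model = joblib.load("model.joblib")
        self._scaler = joblib.load("scaler.joblib")

    def preprocess(self, prediction_input: dict) -> np.ndarray:
        # The "additional code" for inference-time preprocessing lives here.
        instances = np.asarray(prediction_input["instances"])
        return self._scaler.transform(instances)

    def predict(self, instances: np.ndarray) -> np.ndarray:
        return self._model.predict(instances)

    def postprocess(self, prediction_results: np.ndarray) -> dict:
        return {"predictions": prediction_results.tolist()}


# Build a serving container image from the predictor; directory and URI are placeholders.
local_model = LocalModel.build_cpr_model(
    "src_dir/",
    "us-central1-docker.pkg.dev/my-project/my-repo/cpr-sklearn:latest",
    predictor=ScalingSklearnPredictor,
    requirements_path="src_dir/requirements.txt",
)

The resulting container image is then uploaded to Vertex AI Model Registry and deployed to an endpoint or used for batch prediction in the usual way.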
The other options are not as good as option B, for the following reasons:
Option A: Uploading your model with a prebuilt scikit-learn prediction container would not let you preprocess the input data for model inference, and could cause errors or poor performance. A prebuilt prediction container serves the model without any custom code, but it only handles the standard request format and cannot apply custom preprocessing or postprocessing to the input or output data. The instanceConfig.instanceType setting only controls how instances are formatted when they are passed to the model in a batch prediction job; it does not let you run arbitrary transformation logic on your input data2.
Option C: Creating a custom container and defining a custom serving function could achieve the goal, but it requires more skills and steps than a CPR. You would need to implement the serving logic, build and test the container image, and configure the model server yourself, whereas a CPR generates the container image for you from a small predictor class3.
Option D: Creating a custom container and relying on the instanceConfig.instanceType setting combines the extra effort of building a custom container with the same limitation as option A: instanceConfig.instanceType only controls the instance format for batch prediction and cannot perform the preprocessing that your model requires23.
References:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML Systems, Week 2: Serving ML Predictions
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in production, 3.1 Deploying ML models to production
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 6: Production ML Systems, Section 6.2: Serving ML Predictions
Custom prediction routines
Using pre-built containers for prediction
Using custom containers for prediction
You work at a bank. You have a custom tabular ML model that was provided by the bank's vendor. The training data is not available due to its sensitivity. The model is packaged as a Vertex AI Model serving container which accepts a string as input for each prediction instance. In each string the feature values are separated by commas. You want to deploy this model to production for online predictions, and monitor the feature distribution over time with minimal effort. What should you do?
1. Upload the model to Vertex AI Model Registry and deploy the model to a Vertex AI endpoint.
2. Create a Vertex AI Model Monitoring job with feature drift detection as the monitoring objective, and provide an instance schema.
1. Upload the model to Vertex AI Model Registry and deploy the model to a Vertex AI endpoint.
2. Create a Vertex AI Model Monitoring job with feature skew detection as the monitoring objective, and provide an instance schema.
1. Refactor the serving container to accept key-value pairs as input format.
2. Upload the model to Vertex AI Model Registry and deploy the model to a Vertex AI endpoint.
3. Create a Vertex AI Model Monitoring job with feature drift detection as the monitoring objective.
1. Refactor the serving container to accept key-value pairs as input format.
2. Upload the model to Vertex AI Model Registry and deploy the model to a Vertex AI endpoint.
3. Create a Vertex AI Model Monitoring job with feature skew detection as the monitoring objective.
The best option is to upload the model to Vertex AI Model Registry, deploy it to a Vertex AI endpoint, and create a Vertex AI Model Monitoring job with feature drift detection as the monitoring objective, providing an instance schema. Uploading the vendor-provided serving container to the Model Registry and deploying it to an endpoint requires no changes to the container and can be done with the Vertex AI API or the gcloud command-line tool, which keeps the effort minimal.
For monitoring, the choice of objective matters. Feature skew detection compares the serving feature distribution against the training data distribution, which is impossible here because the training data is not available. Feature drift detection instead compares the serving feature distributions over time, so it can be enabled without access to the training data and directly answers the requirement to monitor the feature distribution over time.
Because the serving container accepts a single comma-separated string rather than key-value pairs, Vertex AI Model Monitoring cannot infer the feature names and types automatically. Providing an instance schema, a schema file that describes each feature, its type, and its order in the prediction input, lets the monitoring job parse the string input and compute the per-feature distributions and distance scores1.
The other options are not as good as option A, for the following reasons:
Option B: Feature skew detection compares the serving feature distribution against the training data distribution. Because the training data is not available, a skew-detection baseline cannot be computed, and skew detection would not tell you how the online data changes over time, which is what you need to monitor1.
Option C: Refactoring the serving container to accept key-value pairs would remove the need for an instance schema, but it requires modifying, rebuilding, and re-testing the vendor-provided container, which is considerably more effort than providing an instance schema alongside an unmodified deployment1.
Option D: This option combines the extra effort of refactoring the serving container with the wrong monitoring objective: feature skew detection requires the training data as a baseline and does not measure how the feature distribution changes over time1.
References:
Using Model Monitoring | Vertex AI | Google Cloud
You are creating a social media app where pet owners can post images of their pets. You have one million user uploaded images with hashtags. You want to build a comprehensive system that recommends images to users that are similar in appearance to their own uploaded images.
What should you do?
Download a pretrained convolutional neural network, and fine-tune the model to predict hashtags based on the input images. Use the predicted hashtags to make recommendations.
Retrieve image labels and dominant colors from the input images using the Vision API. Use these properties and the hashtags to make recommendations.
Use the provided hashtags to create a collaborative filtering algorithm to make recommendations.
Download a pretrained convolutional neural network, and use the model to generate embeddings of the input images. Measure similarity between embeddings to make recommendations.
The best option to build a comprehensive system that recommends images to users that are similar in appearance to their own uploaded images is to download a pretrained convolutional neural network (CNN), and use the model to generate embeddings of the input images. Embeddings are low-dimensional representations of high-dimensional data that capture the essential features and semantics of the data. By using a pretrained CNN, you can leverage the knowledge learned from large-scale image datasets, such as ImageNet, and apply it to your own domain. A pretrained CNN can be used as a feature extractor, where the output of the last hidden layer (or any intermediate layer) is taken as the embedding vector for the input image. You can then measure the similarity between embeddings using a distance metric, such as cosine similarity or Euclidean distance, and recommend images that have the highest similarity scores to the user's uploaded image.
Option A is incorrect because downloading a pretrained CNN and fine-tuning the model to predict hashtags based on the input images may not capture the visual similarity of the images, as hashtags may not reflect the appearance of the images accurately. For example, two images of different breeds of dogs may have the same hashtag #dog, but they may not look similar to each other. Moreover, fine-tuning the model may require additional data and computational resources, and it may not generalize well to new images that have different or missing hashtags.
Option B is incorrect because retrieving image labels and dominant colors from the input images using the Vision API may not capture the visual similarity of the images, as labels and colors may not reflect the fine-grained details of the images. For example, two images of the same breed of dog may have different labels and colors depending on the background, lighting, and angle of the image. Moreover, using the Vision API may incur additional costs and latency, and it may not be able to handle custom or domain-specific labels.
Option C is incorrect because using the provided hashtags to create a collaborative filtering algorithm may not capture the visual similarity of the images, as collaborative filtering relies on the ratings or preferences of users, not the features of the images. For example, two images of different animals may have similar ratings or preferences from users, but they may not look similar to each other. Moreover, collaborative filtering may suffer from the cold start problem, where new images or users that have no ratings or preferences cannot be recommended.
References:
Image similarity search with TensorFlow
Image embeddings documentation
Pretrained models documentation
Similarity metrics documentation
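For illustration, a minimal sketch of option D: use a pretrained Keras CNN as a feature extractor and rank stored images by cosine similarity to the embedding of the user's upload. The choice of ResNet50, the image paths, and the batch size are placeholders.

# Sketch: embedding images with a pretrained CNN and ranking by cosine similarity.
# The choice of ResNet50 and the image paths are placeholders.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

# Pretrained CNN without the classification head; global average pooling turns the
# final feature map into a 2048-dimensional embedding per image.
embedder = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def embed(image_paths, batch_size=32):
    def load(path):
        image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
        image = tf.image.resize(image, (224, 224))
        return preprocess_input(image)
    ds = tf.data.Dataset.from_tensor_slices(image_paths).map(load).batch(batch_size)
    embeddings = embedder.predict(ds)
    # L2-normalize so that a dot product equals cosine similarity.
    return embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

corpus_paths = ["gs://pets/img_001.jpg", "gs://pets/img_002.jpg"]  # placeholder corpus
corpus = embed(corpus_paths)

query = embed(["uploaded_pet.jpg"])        # the user's new upload (placeholder path)
scores = corpus @ query[0]                 # cosine similarity to every stored image
top_k = np.argsort(scores)[::-1][:10]      # indices of the most similar images to recommend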
You are an ML engineer at a bank that has a mobile application. Management has asked you to build an ML-based biometric authentication for the app that verifies a customer's identity based on their fingerprint. Fingerprints are considered highly sensitive personal information and cannot be downloaded and stored into the bank databases. Which learning strategy should you recommend to train and deploy this ML model?
Differential privacy
Federated learning
MD5 to encrypt data
Data Loss Prevention API
Federated learning is a machine learning technique that enables organizations to train AI models on decentralized data without centralizing or sharing it1. It allows data privacy, continual learning, and better performance on end-user devices2. Federated learning works by sending the model parameters to the devices, where they are updated locally on the device’s data, and then aggregating the updated parameters on a central server to form a global model3. This way, the data never leaves the device and the model can learn from a large and diverse dataset.
Federated learning is suitable for the use case of building an ML-based biometric authentication for the bank’s mobile app that verifies a customer’s identity based on their fingerprint. Fingerprints are considered highly sensitive personal information and cannot be downloaded and stored into the bank databases. By using federated learning, the bank can train and deploy an ML model that can recognize fingerprints without compromising the data privacy of the customers. The model can also adapt to the variations and changes in the fingerprints over time and improve its accuracy and reliability. Therefore, federated learning is the best learning strategy for this use case.
You work with a team of researchers to develop state-of-the-art algorithms for financial analysis. Your team develops and debugs complex models in TensorFlow. You want to maintain the ease of debugging while also reducing the model training time. How should you set up your training environment?
Configure a v3-8 TPU VM. SSH into the VM to train and debug the model.
Configure a v3-8 TPU node. Use Cloud Shell to SSH into the host VM to train and debug the model.
Configure an n1-standard-4 VM with 4 NVIDIA P100 GPUs. SSH into the VM and use ParameterServerStrategy to train the model.
Configure an n1-standard-4 VM with 4 NVIDIA P100 GPUs. SSH into the VM and use MultiWorkerMirroredStrategy to train the model.
A TPU VM is a virtual machine that has direct access to a Cloud TPU device. TPU VMs provide a simpler and more flexible way to use Cloud TPUs, as they eliminate the need for a separate host VM and network setup. TPU VMs also support interactive debugging tools such as TensorFlow Debugger (tfdbg) and Python Debugger (pdb), which can help researchers develop and troubleshoot complex models. A v3-8 TPU VM has 8 TPU cores, which can provide high performance and scalability for training large models. SSHing into the TPU VM allows the user to run and debug the TensorFlow code directly on the TPU device, without any network overhead or data transfer issues. References:
1: TPU VMs Overview
2: TPU VMs Quickstart
3: Debugging TensorFlow Models on Cloud TPUs
You work on a growing team of more than 50 data scientists who all use AI Platform. You are designing a strategy to organize your jobs, models, and versions in a clean and scalable way. Which strategy should you choose?
Set up restrictive IAM permissions on the AI Platform notebooks so that only a single user or group can access a given instance.
Separate each data scientist’s work into a different project to ensure that the jobs, models, and versions created by each data scientist are accessible only to that user.
Use labels to organize resources into descriptive categories. Apply a label to each created resource so that users can filter the results by label when viewing or monitoring the resources.
Set up a BigQuery sink for Cloud Logging logs that is appropriately filtered to capture information about AI Platform resource usage. In BigQuery, create a SQL view that maps users to the resources they are using
Labels are key-value pairs that you can attach to AI Platform resources such as jobs, models, and versions. Labels can help you organize your resources into descriptive categories that reflect your business needs. For example, you can use labels to indicate the owner, purpose, environment, or status of a resource. You can also use labels to filter the results when you list or monitor your resources on the Google Cloud Console or the Cloud SDK. Using labels can help you manage your resources in a clean and scalable way, without requiring separate projects or restrictive permissions.
References:
Using labels to organize AI Platform resources
Creating and managing labels
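For illustration, a hedged sketch of attaching labels when creating an AI Platform model with the Google API Python client; the project, model name, and label values are placeholders.

# Sketch: attaching labels when creating an AI Platform model so that jobs, models,
# and versions can be filtered by team, owner, and phase. Names and label values are
# placeholders.
from googleapiclient import discovery

ml = discovery.build("ml", "v1")

model_body = {
    "name": "loan_default_classifier",                                 # placeholder
    "regions": ["us-central1"],
    "labels": {"team": "credit-risk", "owner": "alice", "phase": "experiment"},
}
ml.projects().models().create(parent="projects/my-project", body=model_body).execute()

# The same labels dictionary can be set on training jobs and model versions, and the
# resources can later be filtered by label in the Cloud Console or with, for example,
# `gcloud ai-platform models list --filter="labels.team=credit-risk"`.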
You are training a deep learning model for semantic image segmentation with reduced training time. While using a Deep Learning VM Image, you receive the following error: The resource 'projects/deeplearning-platforn/zones/europe-west4-c/acceleratorTypes/nvidia-tesla-k80' was not found. What should you do?
A. Ensure that you have GPU quota in the selected region.
B. Ensure that the required GPU is available in the selected region.
C. Ensure that you have preemptible GPU quota in the selected region.
D. Ensure that the selected GPU has enough GPU memory for the workload.
Answer: B
The error message indicates that the selected GPU type (nvidia-tesla-k80) is not available in the selected region (europe-west4-c). This can happen when the GPU type is not supported in the region, or when the GPU quota is exhausted in the region. To avoid this error, you should ensure that the required GPU is available in the selected region before creating a Deep Learning VM Image. You can use the following steps to check the GPU availability and quota:
To check the GPU availability, you can use the gcloud compute accelerator-types list command with the --filter flag to specify the GPU type and the region. For example, to check the availability of nvidia-tesla-k80 in europe-west4-c, you can run:
gcloud compute accelerator-types list --filter="name=nvidia-tesla-k80 AND zone:europe-west4-c"
If the command returns an empty result, it means that the GPU type is not available in that zone. You can either choose a different GPU type or a different zone that supports it. Dropping the name condition from the filter lists every accelerator type that is available in the zone. For example, to list all the available GPU types in europe-west4-c, you can run:
gcloud compute accelerator-types list --filter="zone:europe-west4-c"
To check the GPU quota, you can use the gcloud compute regions describe command for the region that contains the zone (europe-west4-c is in the europe-west4 region) and inspect the quotas list for the GPU metric. For example, to check the quota for NVIDIA K80 GPUs in europe-west4, you can run:
gcloud compute regions describe europe-west4 --format="yaml(quotas)"
In the output, find the entry whose metric is NVIDIA_K80_GPUS. If its limit is 0, or its usage has already reached the limit, you cannot create a VM with that GPU in the region; you can either request more quota from Google Cloud or choose a different region that has quota for the GPU type.
References:
Troubleshooting | Deep Learning VM Images | Google Cloud
Checking GPU availability
Checking GPU quota
You need to deploy a scikit-learn classification model to production. The model must be able to serve requests 24/7, and you expect millions of requests per second to the production application from 8 am to 7 pm. You need to minimize the cost of deployment. What should you do?
Deploy an online Vertex AI prediction endpoint. Set the max replica count to 1.
Deploy an online Vertex AI prediction endpoint. Set the max replica count to 100.
Deploy an online Vertex AI prediction endpoint with one GPU per replica. Set the max replica count to 1.
Deploy an online Vertex AI prediction endpoint with one GPU per replica. Set the max replica count to 100.
The best option for deploying a scikit-learn classification model to production is to deploy an online Vertex AI prediction endpoint and set the max replica count to 100. This option lets you leverage the scalability of Google Cloud to serve requests 24/7 and handle millions of requests per second. Vertex AI can deploy a trained scikit-learn model to an online prediction endpoint, which provides low-latency predictions for individual instances. An online prediction endpoint consists of one or more replicas, which are copies of the model that run on virtual machines, and the max replica count determines the maximum number of replicas the endpoint can scale out to. By setting the max replica count to 100, you allow the endpoint to scale out during the 8 am to 7 pm peak and scale back down to the minimum replica count when traffic subsides. This helps minimize the cost of deployment, because you only pay for the replicas that are actually running, and the autoscaling behavior can be tuned further based on latency and utilization metrics1.
The other options are not as good as option B, for the following reasons:
Option A: Deploying an online Vertex AI prediction endpoint and setting the max replica count to 1 would not be able to handle millions of requests per second. With a maximum of one replica the endpoint cannot scale out when the traffic increases, which would cause high latency, dropped requests, and service disruptions during the daily peak1.
Option C: Deploying an online Vertex AI prediction endpoint with one GPU per replica and setting the max replica count to 1 combines the scaling problem of option A with a higher cost per replica, as GPUs are more expensive than CPUs. Furthermore, scikit-learn models do not benefit from GPUs, as scikit-learn is not optimized for GPU acceleration2.
Option D: Deploying an online Vertex AI prediction endpoint with one GPU per replica and setting the max replica count to 100 would be able to serve the traffic, but it would increase the cost of deployment unnecessarily. Attaching a GPU raises the price of every replica, while scikit-learn models do not benefit from GPUs, as scikit-learn is not optimized for GPU acceleration2. CPU-only replicas achieve the same scalability at a lower cost.
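For illustration, a minimal sketch of option B with the Vertex AI Python SDK; the project, region, model ID, machine type, and sample instance are placeholders.

# Sketch: deploying the scikit-learn model to an online endpoint that autoscales
# between 1 and 100 CPU-only replicas. Model ID and machine type are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")   # placeholders

model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/1234567890"   # placeholder model ID
)

endpoint = model.deploy(
    machine_type="n1-standard-4",   # CPU-only; scikit-learn gains nothing from GPUs
    min_replica_count=1,            # online endpoints keep at least one replica running
    max_replica_count=100,          # scale out for the 8 am - 7 pm peak
)

prediction = endpoint.predict(instances=[[0.2, 1.4, 3.1]])   # placeholder instance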
References:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML Systems, Week 2: Serving ML Predictions
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in production, 3.1 Deploying ML models to production
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 6: Production ML Systems, Section 6.2: Serving ML Predictions
Online prediction
Scaling online prediction
scikit-learn FAQ
You are developing an ML model using a dataset with categorical input variables. You have randomly split half of the data into training and test sets. After applying one-hot encoding on the categorical variables in the training set, you discover that one categorical variable is missing from the test set. What should you do?
Randomly redistribute the data, with 70% for the training set and 30% for the test set
Use sparse representation in the test set
Apply one-hot encoding on the categorical variables in the test data.
Collect more data representing all categories
The best option for dealing with the missing categorical variable in the test set is to apply one-hot encoding on the categorical variables in the test data. This option has the following advantages:
It ensures the consistency and compatibility of the data format for the ML model, as the one-hot encoding transforms the categorical variables into binary vectors that can be easily processed by the model. By applying one-hot encoding on the categorical variables in the test data, you can match the number and order of the features in the test data with the training data, and avoid any errors or discrepancies in the model prediction.
It preserves the information and relevance of the data for the ML model, as the one-hot encoding creates a separate feature for each possible value of the categorical variable, and assigns a value of 1 to the feature corresponding to the actual value of the variable, and 0 to the rest. By applying one-hot encoding on the categorical variables in the test data, you can retain the original meaning and importance of the categorical variable, and avoid any loss or distortion of the data.
The other options are less optimal for the following reasons:
Option A: Randomly redistributing the data, with 70% for the training set and 30% for the test set, introduces additional complexity and risk. This option requires reshuffling and splitting the data again, which can be tedious and time-consuming. Moreover, this option may not guarantee that the missing categorical variable will be present in the test set, as it depends on the randomness of the data distribution. Furthermore, this option may affect the quality and validity of the ML model, as it may change the data characteristics and patterns that the model has learned from the original training set.
Option B: Using sparse representation in the test set introduces additional overhead and inefficiency. This option requires converting the categorical variables in the test set into sparse vectors, which are vectors that have mostly zero values and only store the indices and values of the non-zero elements. However, using sparse representation in the test set may not be compatible with the ML model, as the model expects the input data to have the same format and dimensionality as the training data, which uses one-hot encoding. Moreover, using sparse representation in the test set may not be efficient or scalable, as it requires additional computation and memory to store and process the sparse vectors.
Option D: Collecting more data representing all categories introduces additional cost and delay. This option requires obtaining and labeling more data that contains the missing categorical variable, which can be expensive and time-consuming. Moreover, this option may not be feasible or necessary, as the missing categorical variable may not be available or relevant for the test data, depending on the data source or the business problem.
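For illustration, a minimal scikit-learn sketch of option C: fit the encoder on the training data only and reuse it on the test data, so both sets get the same columns even though one category never appears in the test split. The column name and values are placeholders.

# Sketch: applying the same one-hot encoding to the test data that was fitted on the
# training data, so both sets share an identical feature layout even when a category
# is missing from the test split. Data values are placeholders.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"loan_purpose": ["car", "home", "education", "car"]})
test = pd.DataFrame({"loan_purpose": ["car", "home"]})   # "education" never appears

# Note: scikit-learn >= 1.2 uses sparse_output; older versions use sparse=False instead.
encoder = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
X_train = encoder.fit_transform(train[["loan_purpose"]])   # fit on training data only
X_test = encoder.transform(test[["loan_purpose"]])         # reuse the fitted encoder

print(encoder.get_feature_names_out())   # ['loan_purpose_car' 'loan_purpose_education' 'loan_purpose_home']
print(X_train.shape, X_test.shape)       # (4, 3) (2, 3) -> same number of columns in both sets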
You are working on a system log anomaly detection model for a cybersecurity organization. You have developed the model using TensorFlow, and you plan to use it for real-time prediction. You need to create a Dataflow pipeline to ingest data via Pub/Sub and write the results to BigQuery. You want to minimize the serving latency as much as possible. What should you do?
Containerize the model prediction logic in Cloud Run, which is invoked by Dataflow.
Load the model directly into the Dataflow job as a dependency, and use it for prediction.
Deploy the model to a Vertex AI endpoint, and invoke this endpoint in the Dataflow job.
Deploy the model in a TFServing container on Google Kubernetes Engine, and invoke it in the Dataflow job.
The best option for creating a Dataflow pipeline for real-time anomaly detection is to load the model directly into the Dataflow job as a dependency, and use it for prediction. This option has the following advantages:
It minimizes the serving latency, as the model prediction logic is executed within the same Dataflow pipeline that ingests and processes the data. There is no need to invoke external services or containers, which can introduce network overhead and latency.
It simplifies the deployment and management of the model, as the model is packaged with the Dataflow job and does not require a separate service or container. The model can be updated by redeploying the Dataflow job with a new model version.
It leverages the scalability and reliability of Dataflow, as the model prediction logic can scale up or down with the data volume and handle failures and retries automatically.
The other options are less optimal for the following reasons:
Option A: Containerizing the model prediction logic in Cloud Run, which is invoked by Dataflow, introduces additional latency and complexity. Cloud Run is a serverless platform that runs stateless containers, which means that the model prediction logic needs to be initialized and loaded every time a request is made. This can increase the cold start latency and reduce the throughput. Moreover, Cloud Run has a limit on the number of concurrent requests per container, which can affect the scalability of the model prediction logic. Additionally, this option requires managing two separate services: the Dataflow pipeline and the Cloud Run container.
Option C: Deploying the model to a Vertex AI endpoint, and invoking this endpoint in the Dataflow job, also introduces additional latency and complexity. Vertex AI is a managed service that provides various tools and features for machine learning, such as training, tuning, serving, and monitoring. However, invoking a Vertex AI endpoint from a Dataflow job requires making an HTTP request, which can incur network overhead and latency. Moreover, this option requires managing two separate services: the Dataflow pipeline and the Vertex AI endpoint.
Option D: Deploying the model in a TFServing container on Google Kubernetes Engine, and invoking it in the Dataflow job, also introduces additional latency and complexity. TFServing is a high-performance serving system for TensorFlow models, which can handle multiple versions and variants of a model. However, invoking a TFServing container from a Dataflow job requires making a gRPC or REST request, which can incur network overhead and latency. Moreover, this option requires managing two separate services: the Dataflow pipeline and the Google Kubernetes Engine cluster.
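For illustration, a minimal sketch of option B: a streaming Beam pipeline that loads the TensorFlow SavedModel once per worker in DoFn.setup() and scores messages inline between Pub/Sub and BigQuery. The subscription, model path, feature names, and table schema are placeholders (newer Apache Beam releases also provide a built-in RunInference transform for this pattern).

# Sketch: a streaming Dataflow pipeline that loads the TensorFlow model as a job
# dependency and predicts inline. Subscription, table, schema, and model path are
# placeholders.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class PredictAnomaly(beam.DoFn):
    def __init__(self, model_path):
        self._model_path = model_path
        self._model = None

    def setup(self):
        # Called once per worker: load the SavedModel into memory.
        import tensorflow as tf
        self._model = tf.keras.models.load_model(self._model_path)

    def process(self, message):
        record = json.loads(message.decode("utf-8"))
        features = [[record["bytes_sent"], record["latency_ms"]]]   # placeholder features
        score = float(self._model.predict(features, verbose=0)[0][0])
        yield {"log_id": record["log_id"], "anomaly_score": score}


options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadLogs" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/logs")   # placeholder
        | "Predict" >> beam.ParDo(PredictAnomaly("gs://my-bucket/models/anomaly/1"))
        | "WriteResults" >> beam.io.WriteToBigQuery(
            "my-project:security.anomaly_scores",                    # placeholder table
            schema="log_id:STRING,anomaly_score:FLOAT",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )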
References:
[Dataflow documentation]
[TensorFlow documentation]
[Cloud Run documentation]
[Vertex AI documentation]
[TFServing documentation]
You work at an organization that maintains a cloud-based communication platform that integrates conventional chat, voice, and video conferencing into one platform. The audio recordings are stored in Cloud Storage. All recordings have an 8 kHz sample rate and are more than one minute long. You need to implement a new feature in the platform that will automatically transcribe voice call recordings into a text for future applications, such as call summarization and sentiment analysis. How should you implement the voice call transcription feature following Google-recommended best practices?
Use the original audio sampling rate, and transcribe the audio by using the Speech-to-Text API with synchronous recognition.
Use the original audio sampling rate, and transcribe the audio by using the Speech-to-Text API with asynchronous recognition.
Upsample the audio recordings to 16 kHz, and transcribe the audio by using the Speech-to-Text API with synchronous recognition.
Upsample the audio recordings to 16 kHz, and transcribe the audio by using the Speech-to-Text API with asynchronous recognition.
You work with a team of researchers to develop state-of-the-art algorithms for financial analysis. Your team develops and debugs complex models in TensorFlow. You want to maintain the ease of debugging while also reducing the model training time. How should you set up your training environment?
Configure a v3-8 TPU VM.
Configure a v3-8 TPU node.
Configure a c2-standard-60 VM without GPUs.
Configure an n1-standard-4 VM with 1 NVIDIA P100 GPU.
You are training an ML model on a large dataset. You are using a TPU to accelerate the training process You notice that the training process is taking longer than expected. You discover that the TPU is not reaching its full capacity. What should you do?
Increase the learning rate
Increase the number of epochs
Decrease the learning rate
Increase the batch size
The best option when you are training an ML model on a large dataset and the TPU is not reaching its full capacity is to increase the batch size. A TPU is a custom-developed application-specific integrated circuit (ASIC) that accelerates machine learning workloads and supports frameworks such as TensorFlow, PyTorch, and JAX. The batch size is the number of training examples processed in one forward/backward pass, and it directly affects how much parallel work the TPU receives per step. A larger batch size keeps the TPU's matrix units busy, reduces the relative communication overhead between the TPU and the host CPU, and lets you process the large dataset in fewer, more efficient steps. By increasing the batch size, you can train your model faster and make full use of the TPU capacity1.
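As a rough sketch of how the global batch size is scaled with the number of TPU cores in TensorFlow (the TPU name, per-core batch size, and toy model and data below are placeholders, not values from the question):

```python
import tensorflow as tf

# Connect to the TPU and create a distribution strategy (TPU name is a placeholder).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

PER_REPLICA_BATCH_SIZE = 128
# A v3-8 TPU exposes 8 replicas, so the global batch is 8x the per-core batch.
global_batch_size = PER_REPLICA_BATCH_SIZE * strategy.num_replicas_in_sync

features = tf.random.normal([100_000, 20])                          # toy data
labels = tf.random.uniform([100_000, 1], maxval=2, dtype=tf.int32)  # toy labels
train_ds = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(10_000)
    .batch(global_batch_size, drop_remainder=True)  # static batch shapes suit the XLA compiler
    .prefetch(tf.data.AUTOTUNE)
)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    # steps_per_execution batches several steps per host-to-TPU call, which also helps utilization.
    model.compile(optimizer="adam", loss="binary_crossentropy", steps_per_execution=32)

model.fit(train_ds, epochs=5)
```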
The other options are not as good as option D, for the following reasons:
Option A: Increasing the learning rate would not help you utilize the parallel processing power of the TPU, and could cause errors or poor performance. A learning rate is a parameter that controls how much the model is updated in each iteration. A learning rate can affect the speed and accuracy of the training process. A larger learning rate can help you converge faster, but it can also cause instability, divergence, or oscillation. By increasing the learning rate, you may not be able to find the optimal solution, and your model may perform poorly on the validation or test data2.
Option B: Increasing the number of epochs would not help you utilize the parallel processing power of the TPU, and could increase the complexity and cost of the training process. An epoch is a measure of the number of times all of the training examples are used once in the training process. An epoch can affect the speed and accuracy of the training process. A larger number of epochs can help you learn more from the data, but it can also cause overfitting, underfitting, or diminishing returns. By increasing the number of epochs, you may not be able to improve the model performance significantly, and your training process may take longer and consume more resources3.
Option C: Decreasing the learning rate would not help you utilize the parallel processing power of the TPU, and could slow down the training process. A learning rate is a parameter that controls how much the model is updated in each iteration. A learning rate can affect the speed and accuracy of the training process. A smaller learning rate can help you find a more precise solution, but it can also cause slow convergence or local minima. By decreasing the learning rate, you may not be able to reach the optimal solution in a reasonable time, and your training process may take longer2.
References:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 2: ML Models and Architectures, Week 1: Introduction to ML Models and Architectures
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 2: Architecting ML solutions, 2.1 Designing ML models
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 4: ML Models and Architectures, Section 4.1: Designing ML Models
Use TPUs
Cloud TPU performance guide
Google TPU: Architecture and Performance Best Practices - Run
You are deploying a new version of a model to a production Vertex AI endpoint that is serving traffic. You plan to direct all user traffic to the new model. You need to deploy the model with minimal disruption to your application. What should you do?
1. Create a new endpoint.
2. Create a new model. Set it as the default version. Upload the model to Vertex AI Model Registry.
3. Deploy the new model to the new endpoint.
4. Update Cloud DNS to point to the new endpoint.
1. Create a new endpoint.
2. Create a new model. Set the parentModel parameter to the model ID of the currently deployed model and set it as the default version. Upload the model to Vertex AI Model Registry.
3. Deploy the new model to the new endpoint, and set the new model to 100% of the traffic.
1. Create a new model. Set the parentModel parameter to the model ID of the currently deployed model. Upload the model to Vertex AI Model Registry.
2. Deploy the new model to the existing endpoint, and set the new model to 100% of the traffic.
1. Create a new model. Set it as the default version. Upload the model to Vertex AI Model Registry.
2. Deploy the new model to the existing endpoint.
The best option for deploying a new version of a model to a production Vertex AI endpoint that is serving traffic, directing all user traffic to the new model, and deploying the model with minimal disruption to your application, is to create a new model, set the parentModel parameter to the model ID of the currently deployed model, upload the model to Vertex AI Model Registry, deploy the new model to the existing endpoint, and set the new model to 100% of the traffic. This option allows you to leverage the power and simplicity of Vertex AI to update your model version and serve online predictions with low latency. Vertex AI is a unified platform for building and deploying machine learning solutions on Google Cloud. Vertex AI can deploy a trained model to an online prediction endpoint, which can provide low-latency predictions for individual instances. A model is a resource that represents a machine learning model that you can use for prediction. A model can have one or more versions, which are different implementations of the same model. A model version can have different parameters, code, or data than another version of the same model. A model version can help you experiment and iterate on your model, and improve the model performance and accuracy. A parentModel parameter is a parameter that specifies the model ID of the model that the new model version is based on. A parentModel parameter can help you inherit the settings and metadata of the existing model, and avoid duplicating the model configuration. Vertex AI Model Registry is a service that can store and manage your machine learning models on Google Cloud. Vertex AI Model Registry can help you upload and organize your models, and track the model versions and metadata. An endpoint is a resource that provides the service endpoint (URL) you use to request the prediction. An endpoint can have one or more deployed models, which are instances of model versions that are associated with physical resources. A deployed model can help you serve online predictions with low latency, and scale up or down based on the traffic. By creating a new model, setting the parentModel parameter to the model ID of the currently deployed model, uploading the model to Vertex AI Model Registry, deploying the new model to the existing endpoint, and setting the new model to 100% of the traffic, you can deploy a new version of a model to a production Vertex AI endpoint that is serving traffic, direct all user traffic to the new model, and deploy the model with minimal disruption to your application1.
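A minimal sketch of this flow with the Vertex AI Python SDK is shown below; the project, model and endpoint resource names, artifact URI, and serving container image are placeholders rather than values from the question.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Upload the new version under the existing model resource via parent_model.
new_model = aiplatform.Model.upload(
    display_name="churn-model",
    parent_model="projects/my-project/locations/us-central1/models/1234567890",
    artifact_uri="gs://my-bucket/model-v2/",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"  # placeholder image
    ),
)

# Deploy to the existing endpoint and send all traffic to the new version.
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/987654321"
)
endpoint.deploy(
    model=new_model,
    machine_type="n1-standard-4",
    traffic_percentage=100,  # route 100% of requests to the newly deployed model
)
```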
The other options are not as good as option C, for the following reasons:
Option A: Creating a new endpoint, creating a new model and setting it as the default version, uploading the model to Vertex AI Model Registry, deploying the new model to the new endpoint, and updating Cloud DNS to point to the new endpoint requires more skills and steps than option C. Cloud DNS is a service that provides reliable and scalable Domain Name System (DNS) services on Google Cloud, and it can help you manage DNS records and resolve domain names to IP addresses. Updating Cloud DNS to point to the new endpoint would redirect user traffic without breaking the existing application, but you would still need to write code, create and configure both a new endpoint and a new model, upload the model to Vertex AI Model Registry, deploy the model, and update Cloud DNS. Moreover, this option creates a new endpoint, which increases maintenance and management costs2.
Option B: Creating a new endpoint, creating a new model with the parentModel parameter set to the model ID of the currently deployed model and setting it as the default version, uploading the model to Vertex AI Model Registry, and deploying the new model to the new endpoint with 100% of the traffic also requires more skills and steps than option C. The parentModel parameter lets the new model version inherit the settings and metadata of the existing model, and a default version simplifies prediction requests by removing the need to specify a version each time. However, you would still need to write code, create and configure both a new endpoint and a new model, upload the model to Vertex AI Model Registry, and deploy the model to the new endpoint. Moreover, this option creates a new endpoint, which increases maintenance and management costs2.
Option D: Creating a new model, setting it as the default version, uploading the model to Vertex AI Model Registry, and deploying the new model to the existing endpoint does not set the parentModel parameter, so the new model cannot inherit the settings and metadata of the currently deployed model. Setting the new model as the default version lets you use it for prediction without specifying a version, but without the parentModel link the upload is registered as a separate model rather than as a new version of the existing one, which can cause errors, inconsistencies, or conflicts between model versions2.
References:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML Systems, Week 2: Serving ML Predictions
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in production, 3.1 Deploying ML models to production
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 6: Production ML Systems, Section 6.2: Serving ML Predictions
Vertex AI
Cloud DNS
While running a model training pipeline on Vertex Al, you discover that the evaluation step is failing because of an out-of-memory error. You are currently using TensorFlow Model Analysis (TFMA) with a standard Evaluator TensorFlow Extended (TFX) pipeline component for the evaluation step. You want to stabilize the pipeline without downgrading the evaluation quality while minimizing infrastructure overhead. What should you do?
Add tfma.MetricsSpec() to limit the number of metrics in the evaluation step.
Migrate your pipeline to Kubeflow hosted on Google Kubernetes Engine, and specify the appropriate node parameters for the evaluation step.
Include the flag -runner=DataflowRunner in beam_pipeline_args to run the evaluation step on Dataflow.
Move the evaluation step out of your pipeline and run it on custom Compute Engine VMs with sufficient memory.
The best option to stabilize the pipeline without downgrading the evaluation quality while minimizing infrastructure overhead is to use Dataflow as the runner for the evaluation step. Dataflow is a fully managed service for executing Apache Beam pipelines that can scale up and down according to the workload. Dataflow can handle large-scale, distributed data processing tasks such as model evaluation, and it can also integrate with Vertex AI Pipelines and TensorFlow Extended (TFX). By using the flag -runner=DataflowRunner in beam_pipeline_args, you can instruct the Evaluator component to run the evaluation step on Dataflow, instead of using the default DirectRunner, which runs locally and may cause out-of-memory errors. Option A is incorrect because adding tfma.MetricsSpec() to limit the number of metrics in the evaluation step may downgrade the evaluation quality, as some important metrics may be omitted. Moreover, reducing the number of metrics may not solve the out-of-memory error, as the evaluation step may still consume a lot of memory depending on the size and complexity of the data and the model. Option B is incorrect because migrating the pipeline to Kubeflow hosted on Google Kubernetes Engine (GKE) may increase the infrastructure overhead, as you need to provision, manage, and monitor the GKE cluster yourself. Moreover, you need to specify the appropriate node parameters for the evaluation step, which may require trial and error to find the optimal configuration. Option D is incorrect because moving the evaluation step out of the pipeline and running it on custom Compute Engine VMs may also increase the infrastructure overhead, as you need to create, configure, and delete the VMs yourself. Moreover, you need to ensure that the VMs have sufficient memory for the evaluation step, which may require trial and error to find the optimal machine type. References:
Dataflow documentation
Using DataflowRunner
Evaluator component documentation
Configuring the Evaluator component
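For reference, a hedged sketch of how the Dataflow runner is typically passed to a TFX pipeline through beam_pipeline_args is shown below (the Beam flag is conventionally written with a double dash; the project, region, bucket names, and component list are placeholders):

```python
from tfx.orchestration import pipeline

beam_pipeline_args = [
    "--runner=DataflowRunner",          # run Beam-based steps such as Evaluator on Dataflow
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/dataflow-temp",
]

components = []  # replace with the real component list (ExampleGen, Trainer, Evaluator, ...)

tfx_pipeline = pipeline.Pipeline(
    pipeline_name="fraud-eval-pipeline",
    pipeline_root="gs://my-bucket/pipeline-root",
    components=components,
    beam_pipeline_args=beam_pipeline_args,
)
```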
You are tasked with building an MLOps pipeline to retrain tree-based models in production. The pipeline will include components related to data ingestion, data processing, model training, model evaluation, and model deployment. Your organization primarily uses PySpark-based workloads for data preprocessing. You want to minimize infrastructure management effort. How should you set up the pipeline?
Set up a TensorFlow Extended (TFX) pipeline on Vertex AI Pipelines to orchestrate the MLOps pipeline. Write a custom component for the PySpark-based workloads on Dataproc.
Set up Vertex AI Pipelines to orchestrate the MLOps pipeline. Use the predefined Dataproc component for the PySpark-based workloads.
Set up Cloud Composer to orchestrate the MLOps pipeline. Use Dataproc workflow templates for the PySpark-based workloads in Cloud Composer.
Set up Kubeflow Pipelines on Google Kubernetes Engine to orchestrate the MLOps pipeline. Write a custom component for the PySpark-based workloads on Dataproc.
You work at a leading healthcare firm developing state-of-the-art algorithms for various use cases. You have unstructured textual data with custom labels. You need to extract and classify various medical phrases with these labels. What should you do?
Use the Healthcare Natural Language API to extract medical entities.
Use a BERT-based model to fine-tune a medical entity extraction model.
Use AutoML Entity Extraction to train a medical entity extraction model.
Use TensorFlow to build a custom medical entity extraction model.
Medical entity extraction is a task that involves identifying and classifying medical terms or concepts from unstructured textual data, such as electronic health records, clinical notes, or research papers. Medical entity extraction can help with various use cases, such as information retrieval, knowledge discovery, decision support, and data analysis1.
One possible approach to perform medical entity extraction is to use a BERT-based model to fine-tune a medical entity extraction model. BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model that can capture the contextual information from both left and right sides of a given token2. BERT can be fine-tuned on a specific downstream task, such as medical entity extraction, by adding a task-specific layer on top of the pre-trained model and updating the model parameters with a small amount of labeled data3.
A BERT-based model can achieve high performance on medical entity extraction by leveraging the large-scale pre-training on general-domain corpora and the fine-tuning on domain-specific data. For example, Nesterov and Umerenkov4 proposed a novel method of doing medical entity extraction from electronic health records as a single-step multi-label classification task by fine-tuning a transformer model pre-trained on a large EHR dataset. They showed that their model can achieve human-level quality for most frequent entities.
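A minimal fine-tuning sketch using the Hugging Face transformers library is shown below; the checkpoint name, label set, hyperparameters, and datasets are placeholders, and a domain-specific pre-trained checkpoint could be substituted for the base model.

```python
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

labels = ["O", "B-DRUG", "I-DRUG", "B-SYMPTOM", "I-SYMPTOM"]  # illustrative custom labels
model_name = "bert-base-cased"  # a medical-domain checkpoint could be used instead

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
)

training_args = TrainingArguments(
    output_dir="medical-ner",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=3e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,   # tokenized dataset with labels aligned to word pieces
    eval_dataset=eval_dataset,     # (both datasets are placeholders)
)
trainer.train()
```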
References:
1: Medical Named Entity Recognition from Un-labelled Medical Records based on Pre-trained Language Models and Domain Dictionary | Data Intelligence | MIT Press
2: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
3: Fine-tuning BERT for Medical Entity Extraction
4: Distantly supervised end-to-end medical entity extraction from electronic health records with human-level quality
Your team is building a convolutional neural network (CNN)-based architecture from scratch. The preliminary experiments running on your on-premises CPU-only infrastructure were encouraging, but have slow convergence. You have been asked to speed up model training to reduce time-to-market. You want to experiment with virtual machines (VMs) on Google Cloud to leverage more powerful hardware. Your code does not include any manual device placement and has not been wrapped in Estimator model-level abstraction. Which environment should you train your model on?
A VM on Compute Engine and 1 TPU with all dependencies installed manually.
A VM on Compute Engine and 8 GPUs with all dependencies installed manually.
A Deep Learning VM with an n1-standard-2 machine and 1 GPU with all libraries pre-installed.
A Deep Learning VM with more powerful CPU e2-highcpu-16 machines with all libraries pre-installed.
In this scenario, the goal is to speed up model training for a CNN-based architecture on Google Cloud. The code does not include any manual device placement and has not been wrapped in Estimator model-level abstraction. Given these constraints, the best environment to train the model on would be a Deep Learning VM with an n1-standard-2 machine and 1 GPU with all libraries pre-installed. Option C is the correct answer.
Option C: A Deep Learning VM with an n1-standard-2 machine and 1 GPU with all libraries pre-installed. This option is the most suitable for the scenario because it provides a ready-to-use environment for deep learning on Google Cloud. A Deep Learning VM is a specialized VM image that is pre-installed with popular deep learning frameworks such as TensorFlow, PyTorch, Keras, and more. A Deep Learning VM also comes with NVIDIA GPU drivers and CUDA libraries that enable GPU acceleration for model training. A Deep Learning VM can be easily configured and launched from the Google Cloud Console or the Cloud SDK. An n1-standard-2 machine is a general-purpose machine type that provides 2 vCPUs and 7.5 GB of memory. This machine type can be sufficient for running a CNN-based architecture. A GPU is a specialized hardware accelerator that can speed up the computation of matrix operations and convolutions, which are common in CNN-based architectures. By using a Deep Learning VM with an n1-standard-2 machine and 1 GPU, the model training can be significantly faster than on an on-premises CPU-only infrastructure.
Option A: A VM on Compute Engine and 1 TPU with all dependencies installed manually. This option is not suitable for the scenario because it requires manual installation of dependencies and device placement. A TPU is a custom-designed ASIC that can provide high performance and efficiency for TensorFlow models. However, to use a TPU, the code needs to include manual device placement and be wrapped in Estimator model-level abstraction. Moreover, to use a TPU, the dependencies such as TensorFlow, Cloud TPU Client, and Cloud Storage need to be installed manually on the VM. This option can be complex and time-consuming to set up and may not be compatible with the existing code.
Option B: A VM on Compute Engine and 8 GPUs with all dependencies installed manually. This option is not suitable for the scenario because it requires manual installation of dependencies and may not be cost-effective. While using 8 GPUs can provide high parallelism and speed for model training, it also increases the cost and complexity of the environment. Moreover, to use GPUs, the dependencies such as NVIDIA GPU drivers, CUDA libraries, and deep learning frameworks need to be installed manually on the VM. This option can be tedious and error-prone to set up and may not be necessary for the scenario.
Option D: A Deep Learning VM with more powerful CPU e2-highcpu-16 machines with all libraries pre-installed. This option is not suitable for the scenario because it does not leverage GPU acceleration for model training. While using more powerful CPU machines can provide more compute resources and memory for model training, it may not be as fast and efficient as using GPU machines. CPU machines are not optimized for matrix operations and convolutions, which are common in CNN-based architectures. Moreover, using more powerful CPU machines can also increase the cost of the environment. This option can be suboptimal and wasteful for the scenario.
References:
Deep Learning VM Image documentation
Compute Engine documentation
Cloud TPU documentation
Machine types documentation
GPUs on Compute Engine documentation
You are developing a model to help your company create more targeted online advertising campaigns. You need to create a dataset that you will use to train the model. You want to avoid creating or reinforcing unfair bias in the model. What should you do?
Choose 2 answers
Include a comprehensive set of demographic features.
Include only the demographic groups that most frequently interact with advertisements.
Collect a random sample of production traffic to build the training dataset.
Collect a stratified sample of production traffic to build the training dataset.
Conduct fairness tests across sensitive categories and demographics on the trained model.
To avoid creating or reinforcing unfair bias in the model, you should collect a representative sample of production traffic to build the training dataset, and conduct fairness tests across sensitive categories and demographics on the trained model. A representative sample is one that reflects the true distribution of the population, and does not over- or under-represent any group. A random sample is a simple way to obtain a representative sample, as it ensures that every data point has an equal chance of being selected. A stratified sample is another way to obtain a representative sample, as it ensures that every subgroup has a proportional representation in the sample. However, a stratified sample requires prior knowledge of the subgroups and their sizes, which may not be available or easy to obtain. Therefore, a random sample is a more feasible option in this case. A fairness test is a way to measure and evaluate the potential bias and discrimination of the model, based on different categories and demographics, such as age, gender, race, etc. A fairness test can help you identify and mitigate any unfair outcomes or impacts of the model, and ensure that the model treats all groups fairly and equitably. A fairness test can be conducted using various methods and tools, such as confusion matrices, ROC curves, fairness indicators, etc. References: The answer can be verified from official Google Cloud documentation and resources related to data sampling and fairness testing.
Sampling data | BigQuery
Fairness Indicators | TensorFlow
What-if Tool | TensorFlow
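As an illustration of a simple fairness test, the sketch below computes per-group true positive and false positive rates with pandas; the column names and toy data are hypothetical, and tools such as Fairness Indicators or the What-If Tool provide richer analyses.

```python
import pandas as pd

df = pd.DataFrame({
    "age_group":  ["18-25", "18-25", "26-40", "26-40", "41+", "41+"],
    "label":      [1, 0, 1, 0, 1, 0],
    "prediction": [1, 1, 1, 0, 0, 0],
})

def slice_metrics(group):
    tp = ((group.prediction == 1) & (group.label == 1)).sum()
    fp = ((group.prediction == 1) & (group.label == 0)).sum()
    fn = ((group.prediction == 0) & (group.label == 1)).sum()
    tn = ((group.prediction == 0) & (group.label == 0)).sum()
    return pd.Series({
        "true_positive_rate": tp / max(tp + fn, 1),
        "false_positive_rate": fp / max(fp + tn, 1),
    })

# Large gaps between groups on these rates are a signal of potential unfairness.
print(df.groupby("age_group").apply(slice_metrics))
```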
You are developing an image recognition model using PyTorch based on the ResNet50 architecture. Your code is working fine on your local laptop on a small subsample. Your full dataset has 200k labeled images. You want to quickly scale your training workload while minimizing cost. You plan to use 4 V100 GPUs. What should you do?
Create a Google Kubernetes Engine cluster with a node pool that has 4 V100 GPUs. Prepare and submit a TFJob operator to this node pool.
Configure a Compute Engine VM with all the dependencies that launches the training. Train your model with Vertex AI using a custom tier that contains the required GPUs.
Create a Vertex AI Workbench user-managed notebooks instance with 4 V100 GPUs, and use it to train your model.
Package your code with Setuptools, and use a pre-built container. Train your model with Vertex AI using a custom tier that contains the required GPUs.
Vertex AI is a unified platform for building and managing machine learning solutions on Google Cloud. It provides a managed service for training custom models with various frameworks, such as TensorFlow, PyTorch, scikit-learn, and XGBoost. To train your PyTorch model with Vertex AI, you need to package your code with Setuptools, which is a Python tool for creating and distributing packages. You also need to use a pre-built container, which is a Docker image that contains the dependencies and libraries for your framework. You can choose from a list of pre-built containers provided by Google, or create your own custom container. By using a pre-built container, you can avoid the hassle of installing and configuring the environment for your model. You can also specify a custom tier for your training job, which allows you to select the number and type of GPUs you want to use. You can choose from various GPU options, such as V100, P100, K80, and T4. By using 4 V100 GPUs, you can leverage the high performance and memory capacity of these accelerators to train your model faster and cheaper than using CPUs. This solution requires minimal changes to your code and can scale your training workload efficiently. References:
Vertex AI | Google Cloud
Custom training with pre-built containers | Vertex AI
[Using GPUs | Vertex AI]
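A minimal sketch of submitting the packaged trainer with the Vertex AI Python SDK is shown below; the package URI, module name, container image, machine shape, and arguments are placeholders.

```python
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-bucket",
)

job = aiplatform.CustomPythonPackageTrainingJob(
    display_name="resnet50-image-training",
    python_package_gcs_uri="gs://my-bucket/trainer-0.1.tar.gz",   # built with setuptools (sdist)
    python_module_name="trainer.task",
    container_uri="us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-13:latest",  # placeholder
)

job.run(
    replica_count=1,
    machine_type="n1-standard-16",
    accelerator_type="NVIDIA_TESLA_V100",
    accelerator_count=4,
    args=["--epochs=10", "--batch-size=256"],
)
```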
You work as an ML researcher at an investment bank and are experimenting with the Gemini large language model (LLM). You plan to deploy the model for an internal use case and need full control of the model’s underlying infrastructure while minimizing inference time. Which serving configuration should you use for this task?
Deploy the model on a Vertex AI endpoint using one-click deployment in Model Garden.
Deploy the model on a Google Kubernetes Engine (GKE) cluster manually by creating a custom YAML manifest.
Deploy the model on a Vertex AI endpoint manually by creating a custom inference container.
Deploy the model on a Google Kubernetes Engine (GKE) cluster using the deployment options in Model Garden.
Deploying the model on GKE with a custom YAML manifest allows maximum control over infrastructure and latency, aligning with the need for low inference time and internal model use. Vertex AI's one-click deployment (Option A) limits control, and deploying on Vertex AI (Option C) doesn’t allow for as much customization as a GKE setup.
=================
You are the lead ML engineer on a mission-critical project that involves analyzing massive datasets using Apache Spark. You need to establish a robust environment that allows your team to rapidly prototype Spark models using Jupyter notebooks. What is the fastest way to achieve this?
Configure a Compute Engine instance with Spark and use Jupyter notebooks.
Set up a Dataproc cluster with Spark and use Jupyter notebooks.
Set up a Vertex AI Workbench instance with a Spark kernel.
Use Colab Enterprise with a Spark kernel.
Dataproc provides a managed Spark environment and integrates with Jupyter notebooks, ideal for large datasets and rapid prototyping. It reduces setup time compared to manual Spark configurations on Compute Engine or Vertex AI. Colab Enterprise is more suitable for small-scale prototyping rather than extensive Spark-based analysis.
=================
You have developed a fraud detection model for a large financial institution using Vertex AI. The model achieves high accuracy, but stakeholders are concerned about potential bias based on customer demographics. You have been asked to provide insights into the model's decision-making process and identify any fairness issues. What should you do?
Enable Vertex AI Model Monitoring to detect training-serving skew. Configure an alert to send an email when the skew or drift for a model’s feature exceeds a predefined threshold. Retrain the model by appending new data to existing training data.
Compile a dataset of unfair predictions. Use Vertex AI Vector Search to identify similar data points in the model's predictions. Report these data points to the stakeholders.
Use feature attribution in Vertex AI to analyze model predictions and the impact of each feature on the model's predictions.
Create feature groups using Vertex AI Feature Store to segregate customer demographic features and non-demographic features. Retrain the model using only non-demographic features.
Feature attribution helps to determine how each feature influences predictions, essential for identifying bias. Vertex AI’s built-in explainability tools provide insights without altering the model’s feature space. Model monitoring (Option A) detects distributional drift rather than feature influence. Options B and D do not directly address the request to explain model decisions or provide fairness insights.
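A minimal sketch of requesting feature attributions with the Vertex AI Python SDK is shown below, assuming the model was deployed with explanation metadata; the endpoint resource name and feature names are placeholders.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

instance = {"age": 42, "balance": 1500.0, "num_transactions": 87}  # illustrative features
response = endpoint.explain(instances=[instance])

for explanation in response.explanations:
    for attribution in explanation.attributions:
        # feature_attributions maps each input feature to its contribution to the
        # prediction, which is what surfaces demographic influence on fraud scores.
        print(dict(attribution.feature_attributions))
```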
=================
You are building a predictive maintenance model to preemptively detect part defects in bridges. You plan to use high definition images of the bridges as model inputs. You need to explain the output of the model to the relevant stakeholders so they can take appropriate action. How should you build the model?
Use scikit-learn to build a tree-based model, and use SHAP values to explain the model output.
Use scikit-learn to build a tree-based model, and use partial dependence plots (PDP) to explain the model output.
Use TensorFlow to create a deep learning-based model, and use Integrated Gradients to explain the model output.
Use TensorFlow to create a deep learning-based model and use the sampled Shapley method to explain the model output.
According to the official exam guide1, one of the skills assessed in the exam is to “explain the predictions of a trained model”. TensorFlow2 is an open source framework for developing and deploying machine learning and deep learning models. TensorFlow supports various model explainability methods, such as Integrated Gradients3, which is a technique that assigns an importance score to each input feature by approximating the integral of the gradients along the path from a baseline input to the actual input. Integrated Gradients can help explain the output of a deep learning-based model by highlighting the most influential features in the input images. Therefore, option C is the best way to build the model for the given use case. The other options are not relevant or optimal for this scenario. References:
Professional ML Engineer Exam Guide
TensorFlow
Integrated Gradients
Google Professional Machine Learning Certification Exam 2023
Latest Google Professional Machine Learning Engineer Actual Free Exam Questions
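A minimal TensorFlow sketch of Integrated Gradients is shown below; the classifier and input image are stand-ins, and a real bridge-inspection model would supply its own preprocessing and choice of baseline.

```python
import tensorflow as tf

def integrated_gradients(model, image, target_class, steps=50):
    baseline = tf.zeros_like(image)                      # black-image baseline
    alphas = tf.linspace(0.0, 1.0, steps + 1)            # interpolation coefficients
    # Straight-line path from the baseline to the actual image.
    interpolated = baseline[None] + alphas[:, None, None, None] * (image - baseline)[None]

    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        predictions = model(interpolated)
        target_scores = predictions[:, target_class]
    grads = tape.gradient(target_scores, interpolated)

    # Average gradients along the path (trapezoidal approximation of the integral),
    # then scale by the input difference to get per-pixel attributions.
    avg_grads = tf.reduce_mean((grads[:-1] + grads[1:]) / 2.0, axis=0)
    return (image - baseline) * avg_grads

model = tf.keras.applications.MobileNetV2(weights=None)  # stand-in classifier
image = tf.random.uniform([224, 224, 3])                 # stand-in bridge image
attributions = integrated_gradients(model, image, target_class=0)
print(attributions.shape)  # same shape as the input image
```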
You are analyzing customer data for a healthcare organization that is stored in Cloud Storage. The data contains personally identifiable information (PII). You need to perform data exploration and preprocessing while ensuring the security and privacy of sensitive fields. What should you do?
Use the Cloud Data Loss Prevention (DLP) API to de-identify the PII before performing data exploration and preprocessing.
Use customer-managed encryption keys (CMEK) to encrypt the PII data at rest, and decrypt the PII data during data exploration and preprocessing.
Use a VM inside a VPC Service Controls security perimeter to perform data exploration and preprocessing.
Use Google-managed encryption keys to encrypt the PII data at rest, and decrypt the PII data during data exploration and preprocessing.
According to the official exam guide1, one of the skills assessed in the exam is to “design, build, and productionalize ML models to solve business challenges using Google Cloud technologies”. Cloud Data Loss Prevention (DLP) API2 is a service that provides programmatic access to a powerful detection engine for personally identifiable information and other privacy-sensitive data in unstructured data streams, such as text blocks and images. Cloud DLP API helps you discover, classify, and protect your sensitive data by using techniques such as de-identification, masking, tokenization, and bucketing. You can use Cloud DLP API to de-identify the PII data before performing data exploration and preprocessing, and retain the data utility for ML purposes. Therefore, option A is the best way to perform data exploration and preprocessing while ensuring the security and privacy of sensitive fields. The other options are not relevant or optimal for this scenario. References:
Professional ML Engineer Exam Guide
Cloud Data Loss Prevention (DLP) API
Google Professional Machine Learning Certification Exam 2023
Latest Google Professional Machine Learning Engineer Actual Free Exam Questions
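A minimal sketch of de-identifying free text with the Cloud DLP API Python client is shown below; the project ID, info types, and sample record are illustrative.

```python
from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()
parent = "projects/my-project/locations/global"

item = {"value": "Patient Jane Doe, phone 555-0100, reported chest pain."}
inspect_config = {
    "info_types": [{"name": "PERSON_NAME"}, {"name": "PHONE_NUMBER"}],
}
deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            # Replace each finding with its info type, e.g. [PERSON_NAME].
            {"primitive_transformation": {"replace_with_info_type_config": {}}}
        ]
    }
}

response = client.deidentify_content(
    request={
        "parent": parent,
        "inspect_config": inspect_config,
        "deidentify_config": deidentify_config,
        "item": item,
    }
)
print(response.item.value)  # de-identified text that is safe for exploration
```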
You work on the data science team for a multinational beverage company. You need to develop an ML model to predict the company’s profitability for a new line of naturally flavored bottled waters in different locations. You are provided with historical data that includes product types, product sales volumes, expenses, and profits for all regions. What should you use as the input and output for your model?
Use latitude, longitude, and product type as features. Use profit as model output.
Use latitude, longitude, and product type as features. Use revenue and expenses as model outputs.
Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use profit as model output.
Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use revenue and expenses as model outputs.
Option A is incorrect because using latitude, longitude, and product type as features, and using profit as model output is not the best way to develop an ML model to predict the company’s profitability for a new line of naturally flavored bottled waters in different locations. This option does not capture the interaction between latitude and longitude, which may affect the profitability of the product. For example, the same product may have different profitability in different regions, depending on the climate, culture, or preferences of the customers. Moreover, this option does not account for the granularity of the location data, which may be too fine or too coarse for the model. For example, using the exact coordinates of a city may not be meaningful, as the profitability may vary within the city, or using the country name may not be informative, as the profitability may vary across the country.
Option B is incorrect because using latitude, longitude, and product type as features, and using revenue and expenses as model outputs is not a suitable way to develop an ML model to predict the company’s profitability for a new line of naturally flavored bottled waters in different locations. This option has the same drawbacks as option A, as it does not capture the interaction between latitude and longitude, or account for the granularity of the location data. Moreover, this option does not directly predict the profitability of the product, which is the target variable of interest. Instead, it predicts the revenue and expenses of the product, which are intermediate variables that depend on other factors, such as the price, the cost, or the demand of the product. To obtain the profitability, we would need to subtract the expenses from the revenue, which may introduce errors or uncertainties in the prediction.
Option C is correct because using product type and the feature cross of latitude with longitude, followed by binning, as features, and using profit as model output is a good way to develop an ML model to predict the company’s profitability for a new line of naturally flavored bottled waters in different locations. This option captures the interaction between latitude and longitude, which may affect the profitability of the product, by creating a feature cross of these two features. A feature cross is a synthetic feature that combines the values of two or more features into a single feature1. This option also accounts for the granularity of the location data, by binning the feature cross into discrete buckets. Binning is a technique that groups continuous values into intervals, which can reduce the noise and complexity of the data2. Moreover, this option directly predicts the profitability of the product, which is the target variable of interest, by using it as the model output.
Option D is incorrect because using product type and the feature cross of latitude with longitude, followed by binning, as features, and using revenue and expenses as model outputs is not a valid way to develop an ML model to predict the company’s profitability for a new line of naturally flavored bottled waters in different locations. This option has the same advantages as option C, as it captures the interaction between latitude and longitude, and accounts for the granularity of the location data, by creating a feature cross and binning it. However, this option does not directly predict the profitability of the product, which is the target variable of interest, but rather predicts the revenue and expenses of the product, which are intermediate variables that depend on other factors, as explained in option B.
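A minimal sketch of binning latitude and longitude and crossing the binned values with Keras preprocessing layers (available in recent TensorFlow releases) is shown below; the bin boundaries, bucket count, and embedding size are illustrative choices, not values from the question.

```python
import tensorflow as tf

latitude = tf.keras.Input(shape=(1,), name="latitude")
longitude = tf.keras.Input(shape=(1,), name="longitude")

# Bin the raw coordinates into coarse geographic buckets.
lat_bins = tf.keras.layers.Discretization(bin_boundaries=list(range(-90, 91, 5)))(latitude)
lon_bins = tf.keras.layers.Discretization(bin_boundaries=list(range(-180, 181, 5)))(longitude)

# Cross the binned features so the model can learn region-specific effects on profit.
location_cross = tf.keras.layers.HashedCrossing(num_bins=5000)((lat_bins, lon_bins))
location_embedding = tf.keras.layers.Embedding(input_dim=5000, output_dim=8)(location_cross)

profit = tf.keras.layers.Dense(1, name="profit")(tf.keras.layers.Flatten()(location_embedding))
model = tf.keras.Model(inputs=[latitude, longitude], outputs=profit)
model.summary()
```

The product type feature would be handled similarly with a categorical encoding layer and concatenated with the location embedding before the output layer.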
References:
Feature cross
Binning
[Profitability]
[Revenue and expenses]
[Latitude and longitude]
[Product type]
You work for an advertising company and want to understand the effectiveness of your company's latest advertising campaign. You have streamed 500 MB of campaign data into BigQuery. You want to query the table, and then manipulate the results of that query with a pandas dataframe in an AI Platform notebook. What should you do?
Use AI Platform Notebooks' BigQuery cell magic to query the data, and ingest the results as a pandas dataframe.
Export your table as a CSV file from BigQuery to Google Drive, and use the Google Drive API to ingest the file into your notebook instance.
Download your table from BigQuery as a local CSV file, and upload it to your AI Platform notebook instance. Use pandas.read_csv to ingest the file as a pandas dataframe.
From a bash cell in your AI Platform notebook, use the bq extract command to export the table as a CSV file to Cloud Storage, and then use gsutil cp to copy the data into the notebook. Use pandas.read_csv to ingest the file as a pandas dataframe.
AI Platform Notebooks is a service that provides managed Jupyter notebooks for data science and machine learning. You can use AI Platform Notebooks to create, run, and share your code and analysis in a collaborative and interactive environment1. BigQuery is a service that allows you to analyze large-scale and complex data using SQL queries. You can use BigQuery to stream, store, and query your data in a fast and cost-effective way2. Pandas is a popular Python library that provides data structures and tools for data analysis and manipulation. You can use pandas to create, manipulate, and visualize dataframes, which are tabular data structures with rows and columns3.
AI Platform Notebooks provides a cell magic, %%bigquery, that allows you to run SQL queries on BigQuery data and ingest the results as a pandas dataframe. A cell magic is a special command that applies to the whole cell in a Jupyter notebook. The %%bigquery cell magic can take various arguments, such as the name of the destination dataframe, the name of the destination table in BigQuery, the project ID, and the query parameters4. By using the %%bigquery cell magic, you can query the data in BigQuery with minimal code and manipulate the results with pandas in AI Platform Notebooks. This is the most convenient and efficient way to achieve your goal.
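A minimal sketch of such a notebook cell is shown below; the project, dataset, and column names are placeholders (if the extension is not already loaded, %load_ext google.cloud.bigquery enables the magic). The query result is materialized directly as the pandas dataframe campaign_df.

```python
%%bigquery campaign_df
SELECT
  campaign_id,
  SUM(clicks) AS clicks,
  SUM(impressions) AS impressions
FROM `my_project.marketing.campaign_events`
GROUP BY campaign_id
```

In a subsequent cell, campaign_df behaves like any other pandas dataframe, for example campaign_df["ctr"] = campaign_df["clicks"] / campaign_df["impressions"].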
The other options are not as good as option A, because they involve more steps, more code, and more manual effort. Option B requires you to export your table as a CSV file from BigQuery to Google Drive, and then use the Google Drive API to ingest the file into your notebook instance. This option is cumbersome and time-consuming, as it involves moving the data across different services and formats. Option C requires you to download your table from BigQuery as a local CSV file, and then upload it to your AI Platform notebook instance. This option is also inefficient and impractical, as it involves downloading and uploading large files, which can take a long time and consume a lot of bandwidth. Option D requires you to use a bash cell in your AI Platform notebook to export the table as a CSV file to Cloud Storage, and then copy the data into the notebook. This option is also complex and unnecessary, as it involves using different commands and tools to move the data around. Therefore, option A is the best option for this use case.
References:
AI Platform Notebooks documentation
BigQuery documentation
pandas documentation
Using Jupyter magics to query BigQuery data
You work for a biotech startup that is experimenting with deep learning ML models based on properties of biological organisms. Your team frequently works on early-stage experiments with new architectures of ML models, and writes custom TensorFlow ops in C++. You train your models on large datasets and large batch sizes. Your typical batch size has 1024 examples, and each example is about 1 MB in size. The average size of a network with all weights and embeddings is 20 GB. What hardware should you choose for your models?
A cluster with 2 n1-highcpu-64 machines, each with 8 NVIDIA Tesla V100 GPUs (128 GB GPU memory in total), and a n1-highcpu-64 machine with 64 vCPUs and 58 GB RAM
A cluster with 2 a2-megagpu-16g machines, each with 16 NVIDIA Tesla A100 GPUs (640 GB GPU memory in total), 96 vCPUs, and 1.4 TB RAM
A cluster with an n1-highcpu-64 machine with a v2-8 TPU and 64 GB RAM
A cluster with 4 n1-highcpu-96 machines, each with 96 vCPUs and 86 GB RAM
The best hardware to choose for your models is a cluster with 2 a2-megagpu-16g machines, each with 16 NVIDIA Tesla A100 GPUs (640 GB GPU memory in total), 96 vCPUs, and 1.4 TB RAM. This hardware configuration can provide you with enough compute power, memory, and bandwidth to handle your large and complex deep learning models, as well as your custom TensorFlow ops in C++. The NVIDIA Tesla A100 GPUs are the latest and most advanced GPUs from NVIDIA, which offer high performance, scalability, and efficiency for various ML workloads. They also support multi-instance GPU (MIG) technology, which allows you to partition each GPU into up to seven smaller instances, each with its own memory, cache, and compute cores. This can enable you to run multiple experiments in parallel, or to optimize the resource utilization and cost efficiency of your models. The a2-megagpu-16g machines are part of the Google Cloud Accelerator-Optimized VM (A2) family, which are designed to provide the best performance and flexibility for GPU-intensive applications. They also offer high-speed NVLink interconnects between the GPUs, which can improve the data transfer and communication between the GPUs. Moreover, the a2-megagpu-16g machines have 96 vCPUs and 1.4 TB RAM, which can support the CPU and memory requirements of your models, as well as the data preprocessing and postprocessing tasks.
The other options are not optimal for the following reasons:
A. A cluster with 2 n1-highcpu-64 machines, each with 8 NVIDIA Tesla V100 GPUs (128 GB GPU memory in total), and a n1-highcpu-64 machine with 64 vCPUs and 58 GB RAM is not a good option, as it has less GPU memory, compute power, and bandwidth than the a2-megagpu-16g machines. The NVIDIA Tesla V100 GPUs are the previous generation of GPUs from NVIDIA, which have lower performance, scalability, and efficiency than the NVIDIA Tesla A100 GPUs. They also do not support the MIG technology, which can limit the flexibility and optimization of your models. Moreover, the n1-highcpu-64 machines are part of the Google Cloud N1 VM family, which are general-purpose VMs that do not offer the best performance and features for GPU-intensive applications. They also have lower vCPUs and RAM than the a2-megagpu-16g machines, which can affect the CPU and memory requirements of your models, as well as the data preprocessing and postprocessing tasks.
C. A cluster with an n1-highcpu-64 machine with a v2-8 TPU and 64 GB RAM is not a good option, as it has less GPU memory, compute power, and bandwidth than the a2-megagpu-16g machines. The v2-8 TPU is a cloud tensor processing unit (TPU) device, which is a custom ASIC chip designed by Google to accelerate ML workloads. However, the v2-8 TPU is the second generation of TPUs, which have lower performance, scalability, and efficiency than the latest v3-8 TPUs. They also have less memory and bandwidth than the NVIDIA Tesla A100 GPUs, which can limit the size and complexity of your models, as well as the data transfer and communication between the devices. Moreover, the n1-highcpu-64 machine has lower vCPUs and RAM than the a2-megagpu-16g machines, which can affect the CPU and memory requirements of your models, as well as the data preprocessing and postprocessing tasks.
D. A cluster with 4 n1-highcpu-96 machines, each with 96 vCPUs and 86 GB RAM is not a good option, as it does not have any GPUs, which are essential for accelerating deep learning models. The n1-highcpu-96 machines are part of the Google Cloud N1 VM family, which are general-purpose VMs that do not offer the best performance and features for GPU-intensive applications. They also have lower RAM than the a2-megagpu-16g machines, which can affect the memory requirements of your models, as well as the data preprocessing and postprocessing tasks.
References:
Professional ML Engineer Exam Guide
Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
Google Cloud launches machine learning engineer certification
NVIDIA Tesla A100 GPU
Google Cloud Accelerator-Optimized VM (A2) family
Google Cloud N1 VM family
Cloud TPU
Your company stores a large number of audio files of phone calls made to your customer call center in an on-premises database. Each audio file is in wav format and is approximately 5 minutes long. You need to analyze these audio files for customer sentiment. You plan to use the Speech-to-Text API. You want to use the most efficient approach. What should you do?
1. Upload the audio files to Cloud Storage.
2. Call the speech:longrunningrecognize API endpoint to generate transcriptions.
3. Call the predict method of an AutoML sentiment analysis model to analyze the transcriptions.
1. Upload the audio files to Cloud Storage.
2. Call the speech:longrunningrecognize API endpoint to generate transcriptions.
3. Create a Cloud Function that calls the Natural Language API by using the analyzeSentiment method.
1. Iterate over your local files in Python.
2. Use the Speech-to-Text Python library to create a speech.RecognitionAudio object, and set the content to the audio file data.
3. Call the speech:recognize API endpoint to generate transcriptions.
4. Call the predict method of an AutoML sentiment analysis model to analyze the transcriptions.
1. Iterate over your local files in Python.
2. Use the Speech-to-Text Python library to create a speech.RecognitionAudio object, and set the content to the audio file data.
3. Call the speech:longrunningrecognize API endpoint to generate transcriptions.
4. Call the Natural Language API by using the analyzeSentiment method.
According to the official exam guide1, one of the skills assessed in the exam is to “design, build, and productionalize ML models to solve business challenges using Google Cloud technologies”. The Speech-to-Text API2 allows you to convert audio to text by applying powerful neural network models. The Natural Language API3 enables you to analyze text and extract information about the sentiment, entities, and syntax. The Cloud Functions4 service lets you write and deploy code that runs in response to events, such as a Pub/Sub message or an HTTP request. Therefore, option B is the most efficient approach to analyze the audio files for customer sentiment, as it leverages the existing Google Cloud services and avoids unnecessary data processing and model training. The other options are not relevant or optimal for this scenario. References:
Professional ML Engineer Exam Guide
Speech-to-Text API
Natural Language API
Cloud Functions
Google Professional Machine Learning Certification Exam 2023
Latest Google Professional Machine Learning Engineer Actual Free Exam Questions
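A minimal sketch of the transcription and sentiment steps with the Python client libraries is shown below; the bucket path, language code, and timeout are placeholders, and the recordings' native 8 kHz sample rate is passed through rather than upsampled.

```python
from google.cloud import language_v1, speech

speech_client = speech.SpeechClient()
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=8000,          # keep the recording's native sample rate
    language_code="en-US",
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/calls/call-001.wav")

# Asynchronous (long-running) recognition is required for audio longer than one minute.
operation = speech_client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=600)
transcript = " ".join(result.alternatives[0].transcript for result in response.results)

language_client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content=transcript, type_=language_v1.Document.Type.PLAIN_TEXT
)
sentiment = language_client.analyze_sentiment(
    request={"document": document}
).document_sentiment
print(sentiment.score, sentiment.magnitude)
```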
While performing exploratory data analysis on a dataset, you find that an important categorical feature has 5% null values. You want to minimize the bias that could result from the missing values. How should you handle the missing values?
Remove the rows with missing values, and upsample your dataset by 5%.
Replace the missing values with the feature’s mean.
Replace the missing values with a placeholder category indicating a missing value.
Move the rows with missing values to your validation dataset.
The best option for handling missing values in a categorical feature is to replace them with a placeholder category indicating a missing value. This is a type of imputation, which is a method of estimating the missing values based on the observed data. Imputing the missing values with a placeholder category preserves the information that the data is missing, and avoids introducing bias or distortion in the feature distribution. It also allows the machine learning model to learn from the missingness pattern, and potentially use it as a predictor for the target variable. The other options are not suitable for handling missing values in a categorical feature, because:
Removing the rows with missing values and upsampling the dataset by 5% would reduce the size of the dataset and potentially lose important information. It would also introduce sampling bias and overfitting, as the upsampling process would create duplicate or synthetic observations that do not reflect the true population.
Replacing the missing values with the feature’s mean would not make sense for a categorical feature, as the mean is a numerical measure that does not capture the mode or frequency of the categories. It would also create a new category that does not exist in the original data, and might confuse the machine learning model.
Moving the rows with missing values to the validation dataset would compromise the validity and reliability of the model evaluation, as the validation dataset would not be representative of the test or production data. It would also reduce the amount of data available for training the model, and might introduce leakage or inconsistency between the training and validation datasets. References:
Imputation of missing values
Effective Strategies to Handle Missing Values in Data Analysis
How to Handle Missing Values of Categorical Variables?
Google Cloud launches machine learning engineer certification
Google Professional Machine Learning Engineer Certification
Professional ML Engineer Exam Guide
Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
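A minimal pandas sketch of the placeholder-category imputation described above is shown below; the column name and values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"payment_method": ["card", None, "cash", "card", None]})

# Replace missing values with an explicit placeholder category so the model
# can learn from the missingness pattern instead of discarding or distorting it.
df["payment_method"] = df["payment_method"].fillna("missing")
print(df["payment_method"].value_counts())
```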
You recently built the first version of an image segmentation model for a self-driving car. After deploying the model, you observe a decrease in the area under the curve (AUC) metric. When analyzing the video recordings, you also discover that the model fails in highly congested traffic but works as expected when there is less traffic. What is the most likely reason for this result?
The model is overfitting in areas with less traffic and underfitting in areas with more traffic.
AUC is not the correct metric to evaluate this classification model.
Too much data representing congested areas was used for model training.
Gradients become small and vanish while backpropagating from the output to input nodes.
The most likely reason for the observed result is that the model is overfitting in areas with less traffic and underfitting in areas with more traffic. Overfitting means that the model learns the specific patterns and noise in the training data, but fails to generalize well to new and unseen data. Underfitting means that the model is not able to capture the complexity and variability of the data, and performs poorly on both training and test data. In this case, the model might have learned to segment the images well when there is less traffic, but it might not have enough data or features to handle the more challenging scenarios when there is more traffic. This could lead to a decrease in the AUC metric, which measures the ability of the model to distinguish between different classes. AUC is a suitable metric for this classification model, as it is not affected by class imbalance or threshold selection. The other options are not likely to be the reason for the result, as they are not related to the traffic density. Too much data representing congested areas would not cause the model to fail in those areas, but rather help the model learn better. Gradients vanishing or exploding is a problem that occurs during the training process, not after the deployment, and it affects the whole model, not specific scenarios. References:
Image Segmentation: U-Net For Self Driving Cars
Intelligent Semantic Segmentation for Self-Driving Vehicles Using Deep Learning
Sharing Pixelopolis, a self-driving car demo from Google I/O built with TensorFlow Lite
Google Cloud launches machine learning engineer certification
Google Professional Machine Learning Engineer Certification
Professional ML Engineer Exam Guide
Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
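One practical way to confirm this diagnosis is to evaluate the metric separately on slices of the holdout data, for example congested versus uncongested scenes. A minimal sketch with scikit-learn, using placeholder labels, scores, and a congestion flag:
```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Placeholder arrays: ground-truth labels, model scores, and a flag marking
# frames captured in congested traffic.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
y_score = rng.random(1000)
is_congested = rng.random(1000) > 0.5

# If AUC on congested frames is much lower than on clear frames, the model is
# underfitting the harder scenario rather than the metric being unsuitable.
auc_congested = roc_auc_score(y_true[is_congested], y_score[is_congested])
auc_clear = roc_auc_score(y_true[~is_congested], y_score[~is_congested])
print(f"AUC congested: {auc_congested:.3f}, AUC clear: {auc_clear:.3f}")
```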
You are going to train a DNN regression model with Keras APIs using this code:
How many trainable weights does your model have? (The arithmetic below is correct.)
501*256+257*128+2 = 161154
500*256+256*128+128*2 = 161024
501*256 + 257*128 + 128*2 = 161408
500*256*0.25 + 256*128*0.25 + 128*2 = 40448
The answer options count trainable weights as the kernel (connection) weights between layers, excluding bias terms1. Although the code listing is not reproduced here, the options imply a network that takes a 500-feature input and stacks a dense layer with 256 units and ReLU activation, a dropout layer with rate 0.25, a dense layer with 128 units, and a final dense layer with 2 output units. A dropout layer has no trainable weights; it only randomly zeroes a fraction of its inputs during training to reduce overfitting2. The kernel weight counts are therefore:
For the first dense layer: 500 inputs * 256 units = 128000
For the second dense layer: 256 inputs * 128 units = 32768
For the output layer: 128 inputs * 2 units = 256
The total number of trainable weights is 128000 + 32768 + 256 = 161024. Therefore, the correct answer is B. Options A and C fold bias terms into the counts in different ways, and option D incorrectly scales the counts by the dropout rate, even though dropout does not remove any weights from the model.
References:
How to calculate the number of parameters for a Convolutional Neural Network?
Dropout (keras.io)
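To check the count empirically, the model implied by the answer options can be rebuilt and the kernels of its Dense layers summed. This is a sketch under the assumption that the (unshown) code defines Dense(256) → Dropout(0.25) → Dense(128) → Dropout(0.25) → Dense(2) on a 500-feature input:
```python
import tensorflow as tf

# Assumed architecture, reconstructed from the answer options; the original
# code listing is not reproduced in this document.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(500,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# Count kernel (connection) weights only, which is how the answer options are
# computed; model.count_params() would also include the bias vectors.
total = 0
for layer in model.layers:
    if isinstance(layer, tf.keras.layers.Dense):
        kernel = layer.get_weights()[0]  # index 0 is the kernel, index 1 the bias
        total += kernel.size
print(total)  # 500*256 + 256*128 + 128*2 = 161024
```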
You have developed a BigQuery ML model that predicts customer churn and deployed the model to Vertex AI Endpoints. You want to automate the retraining of your model by using minimal additional code when model feature values change. You also want to minimize the number of times that your model is retrained to reduce training costs. What should you do?
1. Enable request-response logging on Vertex AI Endpoints.
2. Schedule a TensorFlow Data Validation job to monitor prediction drift.
3. Execute model retraining if there is significant distance between the distributions.
1. Enable request-response logging on Vertex AI Endpoints.
2. Schedule a TensorFlow Data Validation job to monitor training/serving skew.
3. Execute model retraining if there is significant distance between the distributions.
1. Create a Vertex AI Model Monitoring job configured to monitor prediction drift.
2. Configure alert monitoring to publish a message to a Pub/Sub queue when a monitoring alert is detected.
3. Use a Cloud Function to monitor the Pub/Sub queue, and trigger retraining in BigQuery.
1. Create a Vertex AI Model Monitoring job configured to monitor training/serving skew.
2. Configure alert monitoring to publish a message to a Pub/Sub queue when a monitoring alert is detected.
3. Use a Cloud Function to monitor the Pub/Sub queue, and trigger retraining in BigQuery.
The best option is C: create a Vertex AI Model Monitoring job configured to monitor prediction drift, configure alert monitoring to publish a message to a Pub/Sub topic when a monitoring alert is detected, and use a Cloud Function that subscribes to that topic to trigger retraining in BigQuery. Vertex AI Model Monitoring can watch a deployed endpoint for prediction drift, which tracks how the distribution of feature values in incoming prediction requests changes over time; significant drift is a sign that the model may be growing stale. Because monitoring is configured declaratively on the endpoint, this approach needs minimal additional code, and because retraining runs only when a drift alert actually fires, the model is retrained only when feature values have changed enough to matter, which keeps training costs down. Alert monitoring can publish alerts to Pub/Sub, a reliable and scalable messaging service, and a Cloud Function subscribed to the topic runs a small piece of code in response to each alert without any servers to provision or manage.
BigQuery ML lets you create and train models directly in BigQuery with SQL, so the Cloud Function only needs to execute a CREATE OR REPLACE MODEL statement to retrain the churn model on the latest data; a minimal sketch of such a function follows1.
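The sketch below shows the Cloud Function from option C; the dataset, table, and model names are hypothetical, and the function is assumed to be subscribed to the Pub/Sub topic that receives the monitoring alerts.
```python
import base64
from google.cloud import bigquery

def retrain_on_drift_alert(event, context):
    """Background Cloud Function triggered by a Pub/Sub monitoring alert."""
    if "data" in event:
        payload = base64.b64decode(event["data"]).decode("utf-8")
        print(f"Received monitoring alert: {payload}")

    # Re-run the BigQuery ML training query; CREATE OR REPLACE MODEL retrains
    # the model on the latest feature values. Names here are hypothetical.
    client = bigquery.Client()
    client.query(
        """
        CREATE OR REPLACE MODEL `my_dataset.churn_model`
        OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
        SELECT * FROM `my_dataset.churn_training_data`
        """
    ).result()  # block until the retraining job completes
```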
The other options are not as good as option C, for the following reasons:
Option A: Enabling request-response logging on Vertex AI Endpoints and scheduling a TensorFlow Data Validation job to monitor prediction drift can surface the same signal, but it requires far more custom work: you must enable and configure the logging, build and schedule the TensorFlow Data Validation job, define and compute the distance between the distributions, and wire up the retraining yourself. Nothing in this setup triggers retraining automatically, so it does not satisfy the minimal-additional-code requirement2.
Option B: This has the same operational overhead as option A, and training/serving skew compares the serving data against the original training data rather than tracking how feature values change over time in production, so it is a less direct signal for deciding when the deployed model needs retraining2.
Option D: A Vertex AI Model Monitoring job is the right tool, but configuring it for training/serving skew monitors the wrong objective for this requirement: skew detects a mismatch between the training data and the serving data, whereas prediction drift detects changes in the serving feature values over time, which is exactly the condition under which the model should be retrained1.
References:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML Systems, Week 4: ML Governance
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in production
You are working on a classification problem with time series data and achieved an area under the receiver operating characteristic curve (AUC ROC) value of 99% for training data after just a few experiments. You haven’t explored using any sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the problem?
Address the model overfitting by using a less complex algorithm.
Address data leakage by applying nested cross-validation during model training.
Address data leakage by removing features highly correlated with the target value.
Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.
Data leakage is a problem where information from outside the training dataset is used to create the model, resulting in an overly optimistic or invalid estimate of the model performance. Data leakage can occur in time series data when the temporal order of the data is not preserved during data preparation or model evaluation. For example, if the data is shuffled before splitting into train and test sets, or if future data is used to impute missing values in past data, then data leakage can occur.
One way to address data leakage in time series data is to apply nested cross-validation during model training. Nested cross-validation is a technique that allows you to perform both model selection and model evaluation in a robust way, while preserving the temporal order of the data. Nested cross-validation involves two levels of cross-validation: an inner loop for model selection and an outer loop for model evaluation. The inner loop splits the training data into k folds, trains and tunes the model on k-1 folds, and validates the model on the remaining fold. The inner loop repeats this process for each fold and selects the best model based on the validation performance. The outer loop splits the data into n folds, trains the best model from the inner loop on n-1 folds, and tests the model on the remaining fold. The outer loop repeats this process for each fold and evaluates the model performance based on the test results.
Nested cross-validation can help to avoid data leakage in time series data by ensuring that the model is trained and tested on non-overlapping data, and that the data used for validation is never seen by the model during training. Nested cross-validation can also provide a more reliable estimate of the model performance than a single train-test split or a simple cross-validation, as it reduces the variance and bias of the estimate.
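A minimal sketch of nested cross-validation for time series data with scikit-learn, using a placeholder dataset and a hypothetical hyperparameter grid; TimeSeriesSplit preserves the temporal order, so no future data leaks into the folds used for fitting:
```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit, cross_val_score

# Placeholder time-ordered data: 500 observations, 10 features, binary target.
rng = np.random.default_rng(0)
X, y = rng.random((500, 10)), rng.integers(0, 2, 500)

inner_cv = TimeSeriesSplit(n_splits=3)  # inner loop: hyperparameter selection
outer_cv = TimeSeriesSplit(n_splits=5)  # outer loop: unbiased performance estimate

search = GridSearchCV(
    estimator=GradientBoostingClassifier(),
    param_grid={"max_depth": [2, 3], "learning_rate": [0.05, 0.1]},
    scoring="roc_auc",
    cv=inner_cv,
)

# The outer loop scores the tuned model on folds it never saw during tuning,
# so the reported AUC is not inflated by leakage from the selection step.
scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")
print(scores.mean())
```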
References:
Data Leakage in Machine Learning
How to Avoid Data Leakage When Performing Data Preparation
Classification on a single time series - prevent leakage between train and test
You developed a Vertex AI ML pipeline that consists of preprocessing and training steps, and each set of steps runs on a separate custom Docker image. Your organization uses GitHub and GitHub Actions as CI/CD to run unit and integration tests. You need to automate the model retraining workflow so that it can be initiated both manually and when a new version of the code is merged into the main branch. You want to minimize the steps required to build the workflow while also allowing for maximum flexibility. How should you configure the CI/CD workflow?
Trigger a Cloud Build workflow to run tests, build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines.
Trigger GitHub Actions to run the tests, launch a job on Cloud Run to build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines.
Trigger GitHub Actions to run the tests, build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines.
Trigger GitHub Actions to run the tests, launch a Cloud Build workflow to build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines.
The best option for automating the model retraining workflow is to use GitHub Actions and Cloud Build. GitHub Actions is a service that can create and run workflows for continuous integration and continuous delivery (CI/CD) on GitHub. GitHub Actions can run tests, build and deploy code, and trigger other actions based on events such as code changes, pull requests, or manual triggers. Cloud Build is a service that can create and run scalable and reliable pipelines to build, test, and deploy software on Google Cloud. Cloud Build can build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines. Vertex AI Pipelines is a service that can orchestrate machine learning (ML) workflows using Vertex AI. Vertex AI Pipelines can run preprocessing and training steps on custom Docker images, and evaluate, deploy, and monitor the ML model. By using GitHub Actions and Cloud Build, users can leverage the power and flexibility of Google Cloud to automate the model retraining workflow, while minimizing the steps required to build the workflow.
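Whichever trigger starts the workflow, a manual dispatch or a merge to the main branch, the final step is the same: launch the compiled pipeline in Vertex AI Pipelines. A minimal sketch with the Vertex AI SDK follows; the project, bucket, and image names are hypothetical.
```python
from google.cloud import aiplatform

# Hypothetical project, region, compiled pipeline spec, and training image.
aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.PipelineJob(
    display_name="model-retraining",
    template_path="gs://my-bucket/pipelines/retraining_pipeline.json",
    parameter_values={"training_image_uri": "us-docker.pkg.dev/my-project/ml/train:latest"},
)
job.submit()  # returns once the run is created; the pipeline executes in Vertex AI Pipelines
```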
The other options are not as good as option D, for the following reasons:
Option A: Triggering a Cloud Build workflow to run the tests, build the images, and launch the pipeline would bypass GitHub Actions, which the organization already uses for unit and integration tests. The team would need to set up and maintain a separate GitHub integration for Cloud Build (for example, through the Cloud Build GitHub app or webhook triggers)1, and keep both the manual trigger and the merge-to-main trigger in sync across two CI/CD systems, which adds steps and reduces flexibility compared with letting GitHub Actions orchestrate the workflow and delegating only the image builds to Cloud Build2.
Option B: Triggering GitHub Actions to run the tests and then launching a job on Cloud Run to build the Docker images would require more steps and resources than using Cloud Build. Cloud Run is designed to serve stateless, request-driven containers, not to build container images; you would effectively have to run a build tool inside a Cloud Run service, supply its configuration yourself, and handle pushing the images to Artifact Registry3, all of which Cloud Build provides out of the box.
Option C: Building the Docker images directly on the GitHub Actions runners is possible, but the hosted runners have limited disk space, memory, and CPU, which can make large image builds slow or unreliable, and the workflow must manage authentication to Artifact Registry and build caching itself. Delegating the image builds and the pipeline launch to Cloud Build keeps the GitHub Actions workflow small while still allowing both manual and merge-triggered runs.
References:
Building CI/CD for Vertex AI pipelines: The first solution
Cloud Build
GitHub Actions
Vertex AI Pipelines
Triggering builds from GitHub
Triggering builds manually
Building containers
Cloud Run
[Building and testing Docker images with GitHub Actions]
[Usage limits, billing, and administration]
You recently trained an XGBoost model that you plan to deploy to production for online inference. Before sending a predict request to your model's binary, you need to perform a simple data preprocessing step. This step exposes a REST API that accepts requests in your internal VPC Service Controls and returns predictions. You want to configure this preprocessing step while minimizing cost and effort. What should you do?
Store a pickled model in Cloud Storage. Build a Flask-based app, package the app in a custom container image, and deploy the model to Vertex AI Endpoints.
Build a Flask-based app. Package the app and a pickled model in a custom container image, and deploy the model to Vertex AI Endpoints.
Build a custom predictor class based on XGBoost Predictor from the Vertex AI SDK. Package it and a pickled model in a custom container image based on a Vertex built-in image, and deploy the model to Vertex AI Endpoints.
Build a custom predictor class based on XGBoost Predictor from the Vertex AI SDK, and package the handler in a custom container image based on a Vertex built-in container image. Store a pickled model in Cloud Storage and deploy the model to Vertex AI Endpoints.
Option A is not the best answer because it requires storing the pickled model in Cloud Storage, which may incur additional cost and latency for loading the model. It also requires building a Flask-based app, which may not be necessary for a simple data preprocessing step.
Option B is not the best answer because it requires building a Flask-based app, which may not be necessary for a simple data preprocessing step. It also requires packaging the app and the pickled model in a custom container image, which may increase the size and complexity of the image.
Option C is not the best answer because it packages the pickled model inside the custom container image. This increases the size of the image and couples model updates to image rebuilds: every time the model is retrained, the image must be rebuilt and redeployed, rather than simply updating the model artifact in Cloud Storage.
Option D is the best answer because it leverages the Vertex built-in container image, which may provide some optimizations and integrations for XGBoost models. It also allows storing the pickled model in Cloud Storage, which may reduce the size and complexity of the image. It also allows building a custom predictor class based on XGBoost Predictor from the Vertex AI SDK, which may simplify the data preprocessing step and the prediction logic.
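As a rough illustration of option D, the custom predictor can subclass the XGBoost predictor shipped with the Vertex AI SDK's custom prediction routines and override only the preprocessing hook. The import path, request structure, and preprocessing logic below are assumptions and may differ across SDK versions.
```python
# Sketch only: the preprocessing logic and request structure are hypothetical.
from google.cloud.aiplatform.prediction.xgboost.predictor import XgboostPredictor

class PreprocessingXgboostPredictor(XgboostPredictor):
    def preprocess(self, prediction_input: dict):
        # Apply the simple preprocessing step before delegating to the built-in
        # XGBoost logic, which loads the pickled model from the Cloud Storage
        # artifacts directory supplied at deployment time.
        instances = prediction_input["instances"]
        cleaned = [[0.0 if value is None else float(value) for value in row] for row in instances]
        return super().preprocess({"instances": cleaned})
```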
You work for a gaming company that manages a popular online multiplayer game where teams with 6 players play against each other in 5-minute battles. There are many new players every day. You need to build a model that automatically assigns available players to teams in real time. User research indicates that the game is more enjoyable when battles have players with similar skill levels. Which business metrics should you track to measure your model’s performance? (Choose One Correct Answer)
Average time players wait before being assigned to a team
Precision and recall of assigning players to teams based on their predicted versus actual ability
User engagement as measured by the number of battles played daily per user
Rate of return as measured by additional revenue generated minus the cost of developing a new model
The best business metric to track to measure the model’s performance is user engagement as measured by the number of battles played daily per user. This metric reflects the main goal of the model, which is to enhance the user experience and satisfaction by creating balanced and fair battles. If the model is successful, it should increase the user retention and loyalty, as well as the word-of-mouth and referrals. This metric is also easy to measure and interpret, as it can be directly obtained from the user activity data.
The other options are not optimal for the following reasons:
A. Average time players wait before being assigned to a team is not a good metric, as it does not capture the quality or outcome of the battles. It only measures the efficiency of the model, which is not the primary objective. Moreover, this metric can be influenced by external factors, such as the availability and demand of players, the network latency, and the server capacity.
B. Precision and recall of assigning players to teams based on their predicted versus actual ability is not a good metric, as it is difficult to measure and interpret. It requires having a reliable and consistent way of estimating the player’s ability, which can be subjective and dynamic. It also requires having a ground truth label for each assignment, which can be costly and impractical to obtain. Moreover, this metric does not reflect the user feedback or satisfaction, which is the ultimate goal of the model.
D. Rate of return as measured by additional revenue generated minus the cost of developing a new model is not a good metric, as it is not directly related to the model’s performance. It measures the profitability of the model, which is a secondary objective. Moreover, this metric can be affected by many other factors, such as the market conditions, the pricing strategy, the marketing campaigns, and the competition.
References:
Professional ML Engineer Exam Guide
Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
Google Cloud launches machine learning engineer certification
How to measure user engagement
How to choose the right metrics for your machine learning model
Your task is to classify whether a company logo is present in an image. You found out that 96% of the data does not include a logo, so you are dealing with a data imbalance problem. Which metric should you use to evaluate the model?
F1 Score
RMSE
F Score with higher precision weighting than recall
F Score with higher recall weighting than precision
The F1 score is a metric that combines both precision and recall, and is suitable for evaluating imbalanced classification problems. Precision measures the fraction of true positives among the predicted positives, and recall measures the fraction of true positives among the actual positives. The F1 score is the harmonic mean of precision and recall, and it ranges from 0 to 1, with higher values indicating better performance. The F1 score is a good metric for imbalanced data because it balances both the false positives and the false negatives, and does not favor the majority class over the minority class.
The other options are not good metrics for imbalanced data. RMSE (root mean squared error) is a metric for regression problems, not classification problems. It measures the average squared difference between the predicted and the actual values, and is not suitable for binary outcomes. F score with higher precision weighting than recall, or F0.5 score, is a metric that gives more importance to precision than recall. This means that it penalizes false positives more than false negatives, which is not desirable for imbalanced data where the minority class is more important. F score with higher recall weighting than precision, or F2 score, is a metric that gives more importance to recall than precision. This means that it penalizes false negatives more than false positives, which might be suitable for some imbalanced data problems, but not for the logo detection problem. In this problem, both false positives and false negatives are equally important, as we want to accurately identify the presence or absence of a logo in an image. Therefore, the F1 score is a better metric than the F2 score. References:
Tour of Evaluation Metrics for Imbalanced Classification
Metrics for imbalanced data (simply explained)
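The effect is easy to see on a small hypothetical sample that mirrors the 96/4 class split described in the question:
```python
from sklearn.metrics import accuracy_score, f1_score, fbeta_score

# Hypothetical predictions for 100 images: 96 without a logo, 4 with a logo.
# The model raises 3 false positives and finds 3 of the 4 logos.
y_true = [0] * 96 + [1] * 4
y_pred = [0] * 93 + [1] * 3 + [1] * 3 + [0] * 1

print(accuracy_score(y_true, y_pred))         # 0.96 -- no better than always predicting "no logo"
print(f1_score(y_true, y_pred))               # 0.60 -- balances precision (0.5) and recall (0.75)
print(fbeta_score(y_true, y_pred, beta=0.5))  # weights precision more heavily than recall
print(fbeta_score(y_true, y_pred, beta=2.0))  # weights recall more heavily than precision
```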
You work on an operations team at an international company that manages a large fleet of on-premises servers located in few data centers around the world. Your team collects monitoring data from the servers, including CPU/memory consumption. When an incident occurs on a server, your team is responsible for fixing it. Incident data has not been properly labeled yet. Your management team wants you to build a predictive maintenance solution that uses monitoring data from the VMs to detect potential failures and then alerts the service desk team. What should you do first?
Train a time-series model to predict the machines’ performance values. Configure an alert if a machine’s actual performance values significantly differ from the predicted performance values.
Implement a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Train a model to predict anomalies based on this labeled dataset.
Develop a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Test this heuristic in a production environment.
Hire a team of qualified analysts to review and label the machines’ historical performance data. Train a model based on this manually labeled dataset.
Option A is incorrect because training a time-series model to predict the machines’ performance values, and configuring an alert if a machine’s actual performance values significantly differ from the predicted performance values, is not the best way to build a predictive maintenance solution that uses monitoring data from the VMs to detect potential failures and then alerts the service desk team. This option assumes that the performance values follow a predictable pattern, which may not be the case for complex systems. Moreover, this option does not use any historical incident data, which may contain useful information for identifying failures. Furthermore, this option does not involve any model evaluation or validation, which are essential steps for ensuring the quality and reliability of the model.
Option B is correct because implementing a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data, and training a model to predict anomalies based on this labeled dataset, is a reasonable way to build a predictive maintenance solution that uses monitoring data from the VMs to detect potential failures and then alerts the service desk team. This option uses a simple and fast method to label the historical performance data, which is necessary for supervised learning. A z-score is a measure of how many standard deviations a value is away from the mean of a distribution1. By using a z-score, we can label the performance values that are unusually high or low as anomalies, which may indicate failures. Then, we can train a model to learn the patterns of normal and anomalous performance values, and use it to predict anomalies on new data. We can also evaluate and validate the model using metrics such as precision, recall, or F1-score, and compare it with other models or methods.
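A minimal sketch of the z-score heuristic in option B, assuming a hypothetical DataFrame of per-server CPU utilization readings:
```python
import pandas as pd

# Hypothetical monitoring data collected from the on-premises servers.
metrics = pd.DataFrame({
    "server_id": ["a", "a", "a", "b", "b", "b"],
    "cpu_util": [0.31, 0.29, 0.97, 0.42, 0.40, 0.44],
})

# z-score of each reading relative to that server's historical mean and stddev.
grouped = metrics.groupby("server_id")["cpu_util"]
metrics["z_score"] = (metrics["cpu_util"] - grouped.transform("mean")) / grouped.transform("std")

# Label readings whose |z| exceeds a chosen threshold as anomalies; this column
# becomes the target for the supervised anomaly-prediction model.
metrics["is_anomaly"] = (metrics["z_score"].abs() > 2).astype(int)
print(metrics)
```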
Option C is incorrect because developing a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data, and testing this heuristic in a production environment, is not a safe way to build a predictive maintenance solution that uses monitoring data from the VMs to detect potential failures and then alerts the service desk team. This option does not involve any model training or evaluation, which are essential steps for ensuring the quality and reliability of the solution. Moreover, this option does not test the heuristic on a separate dataset, such as a validation or test set, before deploying it to production, which may lead to errors or failures in the production environment.
Option D is incorrect because hiring a team of qualified analysts to review and label the machines’ historical performance data, and training a model based on this manually labeled dataset, is not a feasible way to build a predictive maintenance solution that uses monitoring data from the VMs to detect potential failures and then alerts the service desk team. This option may produce high-quality labels, but it is also costly, time-consuming, and prone to human errors or biases. Moreover, this option may not scale well with large or complex datasets, which may require more analysts or more time to label.
References:
Z-score
[Predictive maintenance]
[Anomaly detection]
[Time-series analysis]
[Model evaluation]
Your data science team is training a PyTorch model for image classification based on a pre-trained RestNet model. You need to perform hyperparameter tuning to optimize for several parameters. What should you do?
Convert the model to a Keras model, and run a Keras Tuner job.
Run a hyperparameter tuning job on AI Platform using custom containers.
Create a Kubeflow Pipelines instance, and run a hyperparameter tuning job on Katib.
Convert the model to a TensorFlow model, and run a hyperparameter tuning job on AI Platform.
AI Platform supports hyperparameter tuning for PyTorch models using custom containers. This allows you to use any Python dependencies and libraries that are not included in the pre-built AI Platform Training runtime versions. You can also use a pre-trained model such as ResNet as a base for your custom model. To run a hyperparameter tuning job on AI Platform using custom containers, you need to do the following steps:
Create a Dockerfile that defines the container image for your training application. The Dockerfile should install PyTorch and any other dependencies, copy your training code and configuration files, and set the entrypoint for the container.
Build the container image and push it to Container Registry or another accessible registry.
Create a YAML file that defines the configuration for your hyperparameter tuning job. The YAML file should specify the container image URI, the training input and output paths, the hyperparameters to tune, the metric to optimize, and the tuning algorithm and budget.
Submit the hyperparameter tuning job to AI Platform using the gcloud command-line tool or the AI Platform Training API.
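Inside the custom training container, the tuning metric has to be reported back to the AI Platform hyperparameter tuning service. With PyTorch this is typically done with the cloudml-hypertune helper package; the metric name and values below are illustrative.
```python
import hypertune  # provided by the cloudml-hypertune package installed in the container

hpt = hypertune.HyperTune()
hpt.report_hyperparameter_tuning_metric(
    hyperparameter_metric_tag="val_accuracy",  # must match the metric named in the job config
    metric_value=0.91,                         # illustrative value computed after an epoch
    global_step=10,
)
```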
References:
Hyperparameter tuning overview
Using custom containers
PyTorch on AI Platform Training
You work for an auto insurance company. You are preparing a proof-of-concept ML application that uses images of damaged vehicles to infer damaged parts. Your team has assembled a set of annotated images from damage claim documents in the company's database. The annotations associated with each image consist of a bounding box for each identified damaged part and the part name. You have been given a sufficient budget to train models on Google Cloud. You need to quickly create an initial model. What should you do?
Download a pre-trained object detection model from TensorFlow Hub. Fine-tune the model in Vertex AI Workbench by using the annotated image data.
Train an object detection model in AutoML by using the annotated image data.
Create a pipeline in Vertex AI Pipelines and configure the AutoMLTrainingJobRunOp component to train a custom object detection model by using the annotated image data.
Train an object detection model in Vertex AI custom training by using the annotated image data.
According to the official exam guide1, one of the skills assessed in the exam is to “design, build, and productionalize ML models to solve business challenges using Google Cloud technologies”. AutoML Vision2 is a service that allows you to train and deploy custom vision models for image classification and object detection. AutoML Vision simplifies the model development process by providing a graphical user interface and a no-code approach. You can use AutoML Vision to train an object detection model by using the annotated image data, and evaluate the model performance using metrics such as mean average precision (mAP) and intersection over union (IoU)3. Therefore, option B is the best way to quickly create an initial model for the given use case. The other options are not relevant or optimal for this scenario. References:
Professional ML Engineer Exam Guide
AutoML Vision
Object detection evaluation
Google Professional Machine Learning Certification Exam 2023
Latest Google Professional Machine Learning Engineer Actual Free Exam Questions
You have successfully deployed to production a large and complex TensorFlow model trained on tabular data. You want to predict the lifetime value (LTV) field for each subscription stored in the BigQuery table named subscription.subscriptionPurchase in the project named my-fortune500-company-project.
You have organized all your training code, from preprocessing data from the BigQuery table up to deploying the validated model to the Vertex AI endpoint, into a TensorFlow Extended (TFX) pipeline. You want to prevent prediction drift, i.e., a situation when a feature data distribution in production changes significantly over time. What should you do?
Implement continuous retraining of the model daily using Vertex AI Pipelines.
Add a model monitoring job where 10% of incoming predictions are sampled every 24 hours.
Add a model monitoring job where 90% of incoming predictions are sampled every 24 hours.
Add a model monitoring job where 10% of incoming predictions are sampled every hour.
Option A is incorrect because implementing continuous retraining of the model daily using Vertex AI Pipelines is not the most efficient way to prevent prediction drift. Vertex AI Pipelines is a service that allows you to create and run scalable and portable ML pipelines on Google Cloud1. You can use Vertex AI Pipelines to retrain your model daily using the latest data from the BigQuery table. However, this option may be unnecessary or wasteful, as the data distribution may not change significantly every day, and retraining the model may consume a lot of resources and time. Moreover, this option does not monitor the model performance or detect the prediction drift, which are essential steps for ensuring the quality and reliability of the model.
Option B is correct because adding a model monitoring job that samples 10% of incoming predictions every 24 hours is the best way to prevent prediction drift. Model monitoring is a service that tracks the performance and health of your deployed models over time2. It can sample a fraction of the incoming prediction requests, compare the feature and prediction distributions against the training baseline, compute drift metrics, and raise alerts when configured thresholds are exceeded. This lets you detect and diagnose prediction drift and decide when to retrain or update the model. Sampling 10% of the incoming predictions every 24 hours is a reasonable choice because it balances the accuracy of the drift estimate against the cost of the monitoring job.
Option C is incorrect because sampling 90% of incoming predictions every 24 hours is not an optimal way to prevent prediction drift. It uses model monitoring in the same way as option B, but sampling such a large fraction of the incoming predictions incurs much higher storage and processing costs without meaningfully improving the monitoring job, since a 10% sample is usually already representative of the serving data distribution.
Option D is incorrect because adding a model monitoring job where 10% of incoming predictions are sampled every hour is not a necessary way to prevent prediction drift. This option also has the same advantages as option B, as it uses model monitoring to track the performance and health of the deployed model. However, this option may be excessive, as it samples the incoming predictions too frequently, which may not reflect the actual changes in the data distribution. Moreover, this option may incur more storage and processing costs than option B, as it generates more samples and metrics.
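A sketch of creating such a monitoring job with the Vertex AI SDK, assuming an already deployed endpoint; the endpoint ID, e-mail address, feature name, and thresholds are hypothetical, and parameter names may vary slightly across SDK versions.
```python
from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

aiplatform.init(project="my-fortune500-company-project", location="us-central1")
endpoint = aiplatform.Endpoint("1234567890")  # hypothetical endpoint ID

job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="ltv-drift-monitoring",
    endpoint=endpoint,
    # Sample 10% of incoming prediction requests.
    logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=0.1),
    # Analyze the sampled requests every 24 hours.
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=24),
    alert_config=model_monitoring.EmailAlertConfig(user_emails=["ml-team@example.com"]),
    objective_configs=model_monitoring.ObjectiveConfig(
        drift_detection_config=model_monitoring.DriftDetectionConfig(
            drift_thresholds={"subscription_tenure": 0.05}  # hypothetical feature and threshold
        )
    ),
)
```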
References:
Vertex AI Pipelines documentation
Model monitoring documentation
[Prediction drift]
[TensorFlow Extended documentation]
[BigQuery documentation]
[Vertex AI documentation]
You work for an online grocery store. You recently developed a custom ML model that recommends a recipe when a user arrives at the website. You chose the machine type on the Vertex AI endpoint to optimize costs by using the queries per second (QPS) that the model can serve, and you deployed it on a single machine with 8 vCPUs and no accelerators.
A holiday season is approaching and you anticipate four times more traffic during this time than the typical daily traffic. You need to ensure that the model can scale efficiently to the increased demand. What should you do?
1. Maintain the same machine type on the endpoint.
2. Set up a monitoring job and an alert for CPU usage.
3. If you receive an alert, add a compute node to the endpoint.
1. Change the machine type on the endpoint to have 32 vCPUs.
2. Set up a monitoring job and an alert for CPU usage.
3. If you receive an alert, scale the vCPUs further as needed.
1. Maintain the same machine type on the endpoint. Configure the endpoint to enable autoscaling based on vCPU usage.
2. Set up a monitoring job and an alert for CPU usage.
3. If you receive an alert, investigate the cause.
1. Change the machine type on the endpoint to have a GPU. Configure the endpoint to enable autoscaling based on GPU usage.
2. Set up a monitoring job and an alert for GPU usage.
3. If you receive an alert, investigate the cause.
Vertex AI Endpoint is a service that allows you to serve your ML models online and scale them automatically. You can use Vertex AI Endpoint to deploy the custom ML model that you developed for recommending recipes to the users. You can maintain the same machine type on the endpoint, which is a single machine with 8 vCPUs and no accelerators. This machine type can optimize the costs by using the queries per second (QPS) that the model can serve. You can also configure the endpoint to enable autoscaling based on vCPU usage. Autoscaling is a feature that allows the endpoint to adjust the number of compute nodes based on the traffic demand. By enabling autoscaling based on vCPU usage, you can ensure that the endpoint can scale efficiently to the increased demand during the holiday season, without overprovisioning or underprovisioning the resources. You can also set up a monitoring job and an alert for CPU usage. Monitoring is a service that allows you to collect and analyze the metrics and logs from your Google Cloud resources. You can use Monitoring to monitor the CPU usage of your endpoint, which is an indicator of the load and performance of your model. You can also set up an alert for CPU usage, which is a feature that allows you to receive notifications when the CPU usage exceeds a certain threshold. By setting up a monitoring job and an alert for CPU usage, you can keep track of the health and status of your endpoint, and detect any issues or anomalies. If you receive an alert, you can investigate the cause by using the Monitoring dashboard, which provides a graphical interface for viewing and analyzing the metrics and logs from your endpoint. You can also use the Monitoring dashboard to troubleshoot and resolve the issues, such as adjusting the autoscaling parameters, optimizing the model, or updating the machine type. By using Vertex AI Endpoint, autoscaling, and Monitoring, you can ensure that the model can scale efficiently to the increased demand during the holiday season, and handle any issues or alerts that might arise. References:
[Vertex AI Endpoint documentation]
[Autoscaling documentation]
[Monitoring documentation]
[Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate]
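To make option C concrete: autoscaling bounds and a CPU utilization target are set when the model is deployed to the endpoint. A minimal sketch with the Vertex AI SDK, using hypothetical model and replica values:
```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")
model = aiplatform.Model("1234567890")  # hypothetical ID of the uploaded recommendation model

endpoint = model.deploy(
    machine_type="n1-standard-8",           # keep the existing 8-vCPU machine type
    min_replica_count=1,                     # baseline for typical daily traffic
    max_replica_count=4,                     # headroom for roughly 4x holiday traffic
    autoscaling_target_cpu_utilization=60,   # add replicas when average vCPU usage exceeds 60%
)
```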
You want to train an AutoML model to predict house prices by using a small public dataset stored in BigQuery. You need to prepare the data and want to use the simplest, most efficient approach. What should you do?
Write a query that preprocesses the data by using BigQuery and creates a new table. Create a Vertex AI managed dataset with the new table as the data source.
Use Dataflow to preprocess the data. Write the output in TFRecord format to a Cloud Storage bucket.
Write a query that preprocesses the data by using BigQuery. Export the query results as CSV files, and use those files to create a Vertex AI managed dataset.
Use a Vertex AI Workbench notebook instance to preprocess the data by using the pandas library. Export the data as CSV files, and use those files to create a Vertex AI managed dataset.
The simplest and most efficient approach for preparing the data for AutoML is to use BigQuery and Vertex AI. BigQuery is a serverless, scalable, and cost-effective data warehouse that can perform fast and interactive queries on large datasets. BigQuery can preprocess the data by using SQL functions such as filtering, aggregating, joining, transforming, and creating new features. The preprocessed data can be stored in a new table in BigQuery, which can be used as the data source for Vertex AI. Vertex AI is a unified platform for building and deploying machine learning solutions on Google Cloud. Vertex AI can create a managed dataset from a BigQuery table, which can be used to train an AutoML model. Vertex AI can also evaluate, deploy, and monitor the AutoML model, and provide online or batch predictions. By using BigQuery and Vertex AI, users can leverage the power and simplicity of Google Cloud to train an AutoML model to predict house prices.
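A minimal sketch of option A with the BigQuery and Vertex AI SDKs; the project, dataset, table, and column names are hypothetical.
```python
from google.cloud import aiplatform, bigquery

# 1. Preprocess with SQL and materialize the result as a new BigQuery table.
bq = bigquery.Client(project="my-project")
bq.query(
    """
    CREATE OR REPLACE TABLE `my-project.housing.prepared_sales` AS
    SELECT * EXCEPT(listing_id), price / area AS price_per_sqm
    FROM `my-project.housing.raw_sales`
    WHERE price IS NOT NULL
    """
).result()

# 2. Create a Vertex AI managed tabular dataset directly from the new table.
aiplatform.init(project="my-project", location="us-central1")
dataset = aiplatform.TabularDataset.create(
    display_name="house-prices",
    bq_source="bq://my-project.housing.prepared_sales",
)
```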
The other options are not as simple or efficient as option A, for the following reasons:
Option B: Using Dataflow to preprocess the data and write the output in TFRecord format to a Cloud Storage bucket would require more steps and resources than using BigQuery and Vertex AI. Dataflow is a service that can create scalable and reliable pipelines to process large volumes of data from various sources. Dataflow can preprocess the data by using Apache Beam, a programming model for defining and executing data processing workflows. TFRecord is a binary file format that can store sequential data efficiently. However, using Dataflow and TFRecord would require writing code, setting up a pipeline, choosing a runner, and managing the output files. Moreover, TFRecord is not a supported format for Vertex AI managed datasets, so the data would need to be converted to CSV or JSONL files before creating a Vertex AI managed dataset.
Option C: Writing a query that preprocesses the data by using BigQuery and exporting the query results as CSV files would require more steps and storage than using BigQuery and Vertex AI. CSV is a text file format that can store tabular data in a comma-separated format. Exporting the query results as CSV files would require choosing a destination Cloud Storage bucket, specifying a file name or a wildcard, and setting the export options. Moreover, CSV files can have limitations such as size, schema, and encoding, which can affect the quality and validity of the data. Exporting the data as CSV files would also incur additional storage costs and reduce the performance of the queries.
Option D: Using a Vertex AI Workbench notebook instance to preprocess the data by using the pandas library and exporting the data as CSV files would require more steps and skills than using BigQuery and Vertex AI. Vertex AI Workbench is a service that provides an integrated development environment for data science and machine learning. Vertex AI Workbench allows users to create and run Jupyter notebooks on Google Cloud, and access various tools and libraries for data analysis and machine learning. Pandas is a popular Python library that can manipulate and analyze data in a tabular format. However, using Vertex AI Workbench and pandas would require creating a notebook instance, writing Python code, installing and importing pandas, connecting to BigQuery, loading and preprocessing the data, and exporting the data as CSV files. Moreover, pandas can have limitations such as memory usage, scalability, and compatibility, which can affect the efficiency and reliability of the data processing.
References:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 2: Data Engineering for ML on Google Cloud, Week 1: Introduction to Data Engineering for ML
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 1: Architecting low-code ML solutions, 1.3 Training models by using AutoML
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 4: Low-code ML Solutions, Section 4.3: AutoML
BigQuery
Vertex AI
Dataflow
TFRecord
CSV
Vertex AI Workbench
Pandas
You are developing an ML model that predicts the cost of used automobiles based on data such as location, condition, model type, color, and engine/battery efficiency. The data is updated every night. Car dealerships will use the model to determine appropriate car prices. You created a Vertex AI pipeline that reads the data, splits the data into training/evaluation/test sets, performs feature engineering, trains the model by using the training dataset, and validates the model by using the evaluation dataset. You need to configure a retraining workflow that minimizes cost. What should you do?
Compare the training and evaluation losses of the current run. If the losses are similar, deploy the model to a Vertex AI endpoint. Configure a cron job to redeploy the pipeline every night.
Compare the training and evaluation losses of the current run. If the losses are similar, deploy the model to a Vertex AI endpoint with training/serving skew threshold model monitoring. When the model monitoring threshold is triggered, redeploy the pipeline.
Compare the results to the evaluation results from a previous run. If the performance improved, deploy the model to a Vertex AI endpoint. Configure a cron job to redeploy the pipeline every night.
Compare the results to the evaluation results from a previous run. If the performance improved, deploy the model to a Vertex AI endpoint with training/serving skew threshold model monitoring. When the model monitoring threshold is triggered, redeploy the pipeline.
Comparing the training and evaluation losses of the current run is a good way to check if the model is overfitting or underfitting. If the losses are similar, it means that the model is generalizing well and can be deployed to a Vertex AI endpoint. Vertex AI endpoint is a service that allows you to serve your ML models online and scale them automatically. By using a training/serving skew threshold model monitoring, you can detect if there is a significant difference between the data used for training and the data used for serving. This can indicate that the model is becoming stale or inaccurate over time. When the model monitoring threshold is triggered, it means that the model needs to be retrained with the latest data. By redeploying the pipeline, you can automate the retraining process and update the model with the new data. This way, you can minimize the cost of retraining and ensure that your model is always up-to-date and accurate. References:
Vertex AI documentation
Vertex AI endpoint documentation
Model monitoring documentation
Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate