Latest Google Professional-Cloud-DevOps-Engineer Dumps PDF Questions Answers 2025

Google Cloud Certified - Professional Cloud DevOps Engineer Exam Questions and Answers

Question 1

You are performing a semiannual capacity planning exercise for your flagship service. You expect a service user growth rate of 10% month-over-month over the next six months. Your service is fully containerized and runs on Google Cloud Platform (GCP), using a Google Kubernetes Engine (GKE) Standard regional cluster on three zones with cluster autoscaler enabled. You currently consume about 30% of your total deployed CPU capacity, and you require resilience against the failure of a zone. You want to ensure that your users experience minimal negative impact as a result of this growth or as a result of zone failure, while avoiding unnecessary costs. How should you prepare to handle the predicted growth?

Options:

Verify the maximum node pool size, enable a horizontal pod autoscaler, and then perform a load test to verify your expected resource needs.

Because you are deployed on GKE and are using a cluster autoscaler, your GKE cluster will scale automatically, regardless of growth rate.

Because you are at only 30% utilization, you have significant headroom and you won't need to add any additional capacity for this rate of growth.

Proactively add 60% more node capacity to account for six months of 10% growth rate, and then perform a load test to make sure you have enough capacity.
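The growth and zone-failure figures in this question can be worked through numerically. The sketch below is only an illustration: it assumes that CPU demand scales linearly with user count and that capacity is spread evenly across the three zones, neither of which the question states, and the function name is hypothetical.

```python
def projected_utilization(current, monthly_growth, months, zones, failed_zones=0):
    """Return projected CPU utilization as a fraction of surviving capacity.

    Assumes (hypothetically) that CPU demand grows in proportion to users
    and that currently deployed capacity is split evenly across zones.
    """
    demand = current * (1 + monthly_growth) ** months  # compound growth
    surviving_capacity = (zones - failed_zones) / zones
    return demand / surviving_capacity


# 30% utilization today, 10% month-over-month growth for six months.
normal = projected_utilization(0.30, 0.10, 6, zones=3)
one_zone_down = projected_utilization(0.30, 0.10, 6, zones=3, failed_zones=1)
print(f"all zones up: {normal:.0%}, one zone down: {one_zone_down:.0%}")
# prints "all zones up: 53%, one zone down: 80%"
```

Under these assumptions, six months of 10% growth compounds to roughly 77% rather than 60%, which is one reason a load test is a safer basis for planning than a back-of-the-envelope figure.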

Question 2

You are reviewing your deployment pipeline in Google Cloud Deploy. You must reduce toil in the pipeline, and you want to minimize the amount of time it takes to complete an end-to-end deployment. What should you do?

Choose 2 answers

Options:

Create a trigger to notify the required team to complete the next step when manual intervention is required

Divide the automation steps into smaller tasks

Use a script to automate the creation of the deployment pipeline in Google Cloud Deploy

Add more engineers to finish the manual steps.

Automate promotion approvals from the development environment to the test environment

Question 3

You are creating a CI/CD pipeline in Cloud Build to build an application container image. The application code is stored in GitHub. Your company requires that production image builds are only run against the main branch and that the change control team approves all pushes to the main branch. You want the image build to be as automated as possible. What should you do?

Choose 2 answers

Options:

Create a trigger on the Cloud Build job. Set the repository event setting to Pull request.

Add the owners file to the Included files filter on the trigger

Create a trigger on the Cloud Build job. Set the repository event setting to Push to a branch.

Configure a branch protection rule for the main branch on the repository

Enable the Approval option on the trigger

Question 4

You support a high-traffic web application with a microservice architecture. The home page of the application displays multiple widgets containing content such as the current weather, stock prices, and news headlines. The main serving thread makes a call to a dedicated microservice for each widget and then lays out the homepage for the user. The microservices occasionally fail; when that happens, the serving thread serves the homepage with some missing content. Users of the application are unhappy if this degraded mode occurs too frequently, but they would rather have some content served instead of no content at all. You want to set a Service Level Objective (SLO) to ensure that the user experience does not degrade too much. What Service Level Indicator (SLI) should you use to measure this?

Options:

A quality SLI: the ratio of non-degraded responses to total responses

An availability SLI: the ratio of healthy microservices to the total number of microservices

A freshness SLI: the proportion of widgets that have been updated within the last 10 minutes

A latency SLI: the ratio of microservice calls that complete in under 100 ms to the total number of microservice calls
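As an illustration of how a ratio-style SLI such as the quality SLI in the first option would be computed, here is a minimal sketch. The input format (one boolean per homepage response, true when a widget was missing) is a hypothetical stand-in for whatever your serving logs actually record.

```python
def quality_sli(responses):
    """Return the ratio of non-degraded responses to total responses.

    `responses` is an iterable of booleans, True when the homepage was
    served with at least one widget missing (a hypothetical log field).
    """
    responses = list(responses)
    if not responses:
        return None  # no traffic, so the SLI is undefined
    non_degraded = sum(1 for degraded in responses if not degraded)
    return non_degraded / len(responses)


# Four homepage responses, one of which was missing a widget.
sample = [False, False, True, False]
print(quality_sli(sample))  # prints 0.75
```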

Question 5

You are monitoring a service that uses n2-standard-2 Compute Engine instances that serve large files. Users have reported that downloads are slow. Your Cloud Monitoring dashboard shows that your VMs are running at peak network throughput. You want to improve the network throughput performance. What should you do?

Options:

Deploy a Cloud NAT gateway and attach the gateway to the subnet of the VMs.

Add additional network interface controllers (NICs) to your VMs.

Change the machine type for your VMs to n2-standard-8.

Deploy the Ops Agent to export additional monitoring metrics.

Question 6

You need to introduce postmortems into your organization during the holiday shopping season. You are expecting your web application to receive a large volume of traffic in a short period. You need to prepare your application for potential failures during the event. What should you do?

Choose 2 answers

Options:

Monitor latency of your services for average percentile latency.

Review your increased capacity requirements and plan for the required quota management.

Create alerts in Cloud Monitoring for all common failures that your application experiences.

Ensure that relevant system metrics are being captured with Cloud Monitoring and create alerts at levels of interest.

Configure Anthos Service Mesh on the application to identify issues on the topology map.

Question 7

You support a multi-region web service running on Google Kubernetes Engine (GKE) behind a Global HTTP(S) Cloud Load Balancer (CLB). For legacy reasons, user requests first go through a third-party Content Delivery Network (CDN), which then routes traffic to the CLB. You have already implemented an availability Service Level Indicator (SLI) at the CLB level. However, you want to increase coverage in case of a potential load balancer misconfiguration, CDN failure, or other global networking catastrophe. Where should you measure this new SLI?

Choose 2 answers

Options:

Your application servers' logs

Instrumentation coded directly in the client

Metrics exported from the application servers

GKE health checks for your application servers

A synthetic client that periodically sends simulated user requests
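For readers unfamiliar with the term, a synthetic client is a prober that sends requests along the same path real users take and records whether each one succeeded. The sketch below is a minimal, hypothetical illustration using only the Python standard library; the function names, the timeout, and the choice to count 4xx responses as "reachable" are assumptions, not part of the question.

```python
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if a GET to `url` answers with a non-5xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        return err.code < 500  # assumption: 4xx still shows the path is reachable
    except (urllib.error.URLError, OSError):
        return False  # DNS, TLS, timeout, or connection failure


def availability(results):
    """Fraction of successful probes; None if no probes ran."""
    results = list(results)
    return sum(results) / len(results) if results else None


# Simulated probe outcomes, so the demonstration needs no network access.
outcomes = [True, True, False, True]
print(availability(outcomes))  # prints 0.75
```

In practice the prober would call `http_probe` on a schedule from outside the serving infrastructure, so that failures in the CDN or load balancer show up in the measurement.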

Question 8

You manage your company's primary revenue-generating application. You have an error budget policy in place that freezes production deployments when the application is close to breaching its SLO. A number of issues have recently occurred, and the application has exhausted its error budget. You need to deploy a new release to the application that includes a feature urgently required by your largest customer. You have been told that the release has passed all unit tests. What should you do?

Options:

Start the deployment of the feature immediately.

Delay the deployment of the feature until the error budget is replenished.

Re-run the unit tests, and start the deployment of the feature if the tests pass.

Deploy the feature to a subset of users, and gradually roll out to all users if there are no errors reported.

Answer:

Explanation:

Comprehensive and Detailed Explanation From SRE Principles:

This scenario presents a classic SRE conflict: maintaining reliability (as dictated by the exhausted error budget and deployment freeze) versus delivering an urgent business requirement. The error budget policy is there for a reason – to protect users from further instability.

A. Start the deployment of the feature immediately: This directly violates the established error budget policy and the deployment freeze. While the feature is urgent, deploying without caution when the system is already unstable (as indicated by the exhausted error budget) is highly risky and could exacerbate existing problems or introduce new ones, further impacting revenue and customer trust.

B. Delay the deployment of the feature until the error budget is replenished: This strictly adheres to the policy but might not be acceptable given the "urgently required by your largest customer" clause. SRE principles allow for reasoned exceptions and risk management, not just blind adherence if the business context is compelling enough and risks are managed.

C. Re-run the unit tests, and start the deployment of the feature if the tests pass: Unit tests are foundational but insufficient to guarantee a complex application will perform reliably in production, especially when the system is already indicating instability (exhausted error budget). Passing unit tests doesn't negate the risk signaled by the depleted error budget.

D. Deploy the feature to a subset of users, and gradually roll out to all users if there are no errors reported: This is the most balanced SRE approach in this situation. It acknowledges the urgency while attempting to mitigate risk:

Risk Mitigation: A canary release (deploying to a small subset of users) limits the potential negative impact if the new feature introduces new errors or worsens existing instability.

Observation: It allows for careful monitoring of the new release in the production environment with real users.

Data-Driven Decision: The decision to proceed with a wider rollout is based on observed behavior ("if there are no errors reported"), not just assumptions.

Controlled Rollout: A gradual rollout allows for quick rollback if issues arise.

While an exhausted error budget signals a deployment freeze, critical business needs can sometimes necessitate a carefully managed exception. A canary release is a standard SRE technique for deploying changes with reduced risk, making it the most appropriate course of action when faced with such conflicting priorities. The team would also need to communicate clearly about the risks and the rationale for this exception. It's implied that this urgent feature might also fix existing issues or is critical enough to warrant the carefully managed risk.

Reference (Based on SRE principles from Google's SRE books and general practices):

Error Budgets: "The SRE Book" (Site Reliability Engineering: How Google Runs Production Systems) discusses error budgets and deployment freezes. An exhausted error budget typically means no more risky changes until reliability improves.

Canary Releases: This is a fundamental practice for safely deploying new versions. It's about testing in production with a small percentage of traffic.

Managing Risk: SRE is about managing risk, not eliminating it entirely. In situations like this, a calculated risk with strong mitigation (canary, monitoring, rollback plan) can be justified for critical business needs. The decision involves weighing the risk of deploying against the risk of not deploying the urgent feature.

Option D represents a pragmatic SRE approach to navigate this difficult situation by minimizing the blast radius of the change.
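To make the idea of an "exhausted" error budget concrete, the following sketch computes how much budget remains for a request-based SLO. The SLO target and request counts are hypothetical figures chosen for illustration; the question does not give any.

```python
def error_budget_remaining(slo, total_requests, failed_requests):
    """Return the fraction of the error budget still unspent.

    `slo` is the target success ratio, for example 0.999. A result of
    zero or less means the budget is exhausted.
    """
    allowed_failures = (1 - slo) * total_requests
    if allowed_failures == 0:
        return 0.0
    return 1 - failed_requests / allowed_failures


# Hypothetical month: 99.9% SLO, 1,000,000 requests, 1,200 failures.
remaining = error_budget_remaining(0.999, 1_000_000, 1_200)
print(f"{remaining:.0%}")  # prints "-20%"
```

A negative result, as here, means the service has already failed more requests than the SLO allows for the period.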

Question 9

Your application images are built and pushed to Google Container Registry (GCR). You want to build an automated pipeline that deploys the application when the image is updated while minimizing the development effort. What should you do?

Options:

Use Cloud Build to trigger a Spinnaker pipeline.

Use Cloud Pub/Sub to trigger a Spinnaker pipeline.

Use a custom builder in Cloud Build to trigger a Jenkins pipeline.

Use Cloud Pub/Sub to trigger a custom deployment service running in Google Kubernetes Engine (GKE).

Question 10

Your organization recently adopted a container-based workflow for application development. Your team develops numerous applications that are deployed continuously through an automated build pipeline to a Kubernetes cluster in the production environment. The security auditor is concerned that developers or operators could circumvent automated testing and push code changes to production without approval. What should you do to enforce approvals?

Options:

Configure the build system with protected branches that require pull request approval.

Use an Admission Controller to verify that incoming requests originate from approved sources.

Leverage Kubernetes Role-Based Access Control (RBAC) to restrict access to only approved users.

Enable binary authorization inside the Kubernetes cluster and configure the build pipeline as an attestor.

Question 11

You are developing reusable infrastructure as code modules. Each module contains integration tests that launch the module in a test project. You are using GitHub for source control. You need to continuously test your feature branch and ensure that all code is tested before changes are accepted. You need to implement a solution to automate the integration tests. What should you do?

Options:

Use a Jenkins server for CI/CD pipelines. Periodically run all tests in the feature branch.

Use Cloud Build to run the tests. Trigger all tests to run after a pull request is merged.

Ask the pull request reviewers to run the integration tests before approving the code.

Use Cloud Build to run tests in a specific folder. Trigger Cloud Build for every GitHub pull request.

Answer:

Explanation:

Cloud Build is a service that executes your builds on Google Cloud Platform infrastructure. Cloud Build can import source code from Google Cloud Storage, Cloud Source Repositories, GitHub, or Bitbucket, execute a build to your specifications, and produce artifacts such as Docker containers or Java archives [1]. Cloud Build can also run integration tests as part of your build steps [2].

You can use Cloud Build to run tests in a specific folder by specifying the path to the folder in the dir field of your build step [3]. For example, if you have a folder named tests that contains your integration tests, you can use the following build step to run them:

    steps:
    - name: 'gcr.io/cloud-builders/go'
      args: ['test', '-v']
      dir: 'tests'

You can use Cloud Build to trigger builds for every GitHub pull request by using the Cloud Build GitHub app. The app allows you to automatically build on Git pushes and pull requests and view your build results on GitHub and Google Cloud console [4]. You can configure the app to run builds on specific branches, tags, or paths [5]. For example, if you want to run builds on pull requests that target the master branch, you can use the following trigger configuration:

    includedFiles:
    - '**'
    name: 'pull-request-trigger'
    github:
      name: 'my-repo'
      owner: 'my-org'
      pullRequest:
        branch: '^master$'

Using Cloud Build to run tests in a specific folder and trigger builds for every GitHub pull request is a good way to continuously test your feature branch and ensure that all code is tested before changes are accepted. This way, you can catch any errors or bugs early and prevent them from affecting the main branch.

Using a Jenkins server for CI/CD pipelines is not a bad option, but it would require more setup and maintenance than using Cloud Build, which is fully managed by Google Cloud. Periodically running all tests in the feature branch is not as efficient as running tests for every pull request, as it may delay the feedback loop and increase the risk of conflicts or failures.

Using Cloud Build to run the tests after a pull request is merged is not a good practice, as it may introduce errors or bugs into the main branch that could have been prevented by testing before merging.

Asking the pull request reviewers to run the integration tests before approving the code is not a reliable way of ensuring code quality, as it depends on human intervention and may be prone to errors or oversights.

Question 12

Your application services run in Google Kubernetes Engine (GKE). You want to make sure that only images from your centrally-managed Google Container Registry (GCR) image registry in the altostrat-images project can be deployed to the cluster while minimizing development time. What should you do?

Options:

Create a custom builder for Cloud Build that will only push images to gcr.io/altostrat-images.

Use a Binary Authorization policy that includes the whitelist name pattern gcr.io/altostrat-images/.

Add logic to the deployment pipeline to check that all manifests contain only images from gcr.io/altostrat-images.

Add a tag to each image in gcr.io/altostrat-images and check that this tag is present when the image is deployed.

Question 13

Your company runs services by using multiple globally distributed Google Kubernetes Engine (GKE) clusters. Your operations team has set up workload monitoring that uses Prometheus-based tooling for metrics, alerts, and generating dashboards. This setup does not provide a method to view metrics globally across all clusters. You need to implement a scalable solution to support global Prometheus querying and minimize management overhead. What should you do?

Options:

Configure Prometheus cross-service federation for centralized data access

Configure workload metrics within Cloud Operations for GKE

Configure Prometheus hierarchical federation for centralized data access

Configure Google Cloud Managed Service for Prometheus

Question 14

Your development team has created a new version of their service’s API. You need to deploy the new versions of the API with the least disruption to third-party developers and end users of third-party installed applications. What should you do?

Options:

Introduce the new version of the API. Announce deprecation of the old version of the API. Deprecate the old version of the API. Contact remaining users of the old API. Provide best effort support to users of the old API. Turn down the old version of the API.

Announce deprecation of the old version of the API. Introduce the new version of the API. Contact remaining users on the old API. Deprecate the old version of the API. Turn down the old version of the API. Provide best effort support to users of the old API.

Announce deprecation of the old version of the API. Contact remaining users on the old API. Introduce the new version of the API. Deprecate the old version of the API. Provide best effort support to users of the old API. Turn down the old version of the API.

Introduce the new version of the API. Contact remaining users of the old API. Announce deprecation of the old version of the API. Deprecate the old version of the API. Turn down the old version of the API. Provide best effort support to users of the old API.

Question 15

You are designing a new Google Cloud organization for a client. Your client is concerned with the risks associated with long-lived credentials created in Google Cloud. You need to design a solution to completely eliminate the risks associated with the use of JSON service account keys while minimizing operational overhead. What should you do?

Options:

Use custom versions of predefined roles to exclude all iam.serviceAccountKeys.* service account role permissions.

Apply the constraints/iam.disableServiceAccountKeyCreation constraint to the organization.

Apply the constraints/iam.disableServiceAccountKeyUpload constraint to the organization.

Grant the roles/iam.serviceAccountKeyAdmin IAM role to organization administrators only.

Answer:

Explanation:

The correct answer is B. Apply the constraints/iam.disableServiceAccountKeyCreation constraint to the organization.

According to the Google Cloud documentation, the constraints/iam.disableServiceAccountKeyCreation constraint is an organization policy constraint that prevents the creation of user-managed service account keys [1]. User-managed service account keys are long-lived credentials that can be downloaded as JSON or P12 files and used to authenticate as a service account [2]. These keys pose severe security risks if they are leaked, stolen, or misused by unauthorized entities [3][4]. By applying this constraint to the organization, you can completely eliminate the risks associated with the use of JSON service account keys and enforce a more secure alternative for authentication, such as Workload Identity or short-lived access tokens [1][2]. This also minimizes operational overhead by avoiding the need to manage, rotate, or revoke user-managed service account keys.

The other options are incorrect because they do not completely eliminate the risks associated with the use of JSON service account keys. Option A is incorrect because it only restricts the IAM permissions to create, list, get, delete, or sign service account keys, but it does not prevent existing keys from being used or leaked. Option C is incorrect because it only disables the upload of user-managed service account keys, but it does not prevent the creation or download of such keys. Option D is incorrect because it only limits the IAM role that can create and manage service account keys, but it does not prevent the keys from being distributed or exposed to unauthorized entities.

[Reference: Disable user-managed service account key creation. Service accounts, User-managed service accounts. Help keep your Google Cloud service account keys safe. Stop Downloading Google Cloud Service Account Keys! Service Account Keys. Disable user-managed service account key upload. Granting roles to service accounts.]

Question 16

You are designing a system with three different environments: development, quality assurance (QA), and production.

Each environment will be deployed with Terraform and has a Google Kubernetes Engine (GKE) cluster created so that application teams can deploy their applications. Anthos Config Management will be used and templated to deploy infrastructure level resources in each GKE cluster. All users (for example, infrastructure operators and application owners) will use GitOps. How should you structure your source control repositories for both Infrastructure as Code (IaC) and application code?

Options:

Cloud Infrastructure (Terraform) repository is shared: different directories are different environments. GKE Infrastructure (Anthos Config Management Kustomize manifests) repository is shared: different overlay directories are different environments. Application (app source code) repositories are separated: different branches are different features.

Cloud Infrastructure (Terraform) repository is shared: different directories are different environments. GKE Infrastructure (Anthos Config Management Kustomize manifests) repositories are separated: different branches are different environments. Application (app source code) repositories are separated: different branches are different features.

Cloud Infrastructure (Terraform) repository is shared: different branches are different environments. GKE Infrastructure (Anthos Config Management Kustomize manifests) repository is shared: different overlay directories are different environments. Application (app source code) repository is shared: different directories are different features.

Cloud Infrastructure (Terraform) repositories are separated: different branches are different environments. GKE Infrastructure (Anthos Config Management Kustomize manifests) repositories are separated: different overlay directories are different environments. Application (app source code) repositories are separated: different branches are different features.

Answer:

Explanation:

The correct answer is B. Cloud Infrastructure (Terraform) repository is shared: different directories are different environments. GKE Infrastructure (Anthos Config Management Kustomize manifests) repositories are separated: different branches are different environments. Application (app source code) repositories are separated: different branches are different features.

This answer follows the best practices for using Terraform and Anthos Config Management with GitOps, as described in the following sources:

For Terraform, it is recommended to use a single repository for all environments, and use directories to separate them. This way, you can reuse the same Terraform modules and configurations across environments, and avoid code duplication and drift. You can also use Terraform workspaces to isolate the state files for each environment [1][2].

For Anthos Config Management, it is recommended to use separate repositories for each environment, and use branches to separate the clusters within each environment. This way, you can enforce different policies and configurations for each environment, and use pull requests to promote changes across environments. You can also use Kustomize to create overlays for each cluster that apply specific patches or customizations [3][4].

For application code, it is recommended to use separate repositories for each application, and use branches to separate the features or bug fixes for each application. This way, you can isolate the development and testing of each application, and use pull requests to merge changes into the main branch. You can also use tags or labels to trigger deployments to different environments [5].

[References: 1: Best practices for using Terraform | Google Cloud, 2: Terraform Recommended Practices - Part 1 | Terraform - HashiCorp Learn, 3: Deploy Anthos on GKE with Terraform part 1: GitOps with Config Sync | Google Cloud Blog, 4: Using Kustomize with Anthos Config Management | Anthos Config Management Documentation | Google Cloud, 5: Deploy Anthos on GKE with Terraform part 3: Continuous Delivery with Cloud Build | Google Cloud Blog, GitOps-style continuous delivery with Cloud Build | Cloud Build Documentation | Google Cloud]

Question 17

You support a service that recently had an outage. The outage was caused by a new release that exhausted the service memory resources. You rolled back the release successfully to mitigate the impact on users. You are now in charge of the post-mortem for the outage. You want to follow Site Reliability Engineering practices when developing the post-mortem. What should you do?

Options:

Focus on developing new features rather than avoiding the outages from recurring.

Focus on identifying the contributing causes of the incident rather than the individual responsible for the cause.

Plan individual meetings with all the engineers involved. Determine who approved and pushed the new release to production.

Use the Git history to find the related code commit. Prevent the engineer who made that commit from working on production services.

Question 18

A third-party application needs to have a service account key to work properly. When you try to export the key from your cloud project, you receive the error "The organization policy constraint iam.disableServiceAccountKeyCreation is enforced." You need to make the third-party application work while following Google-recommended security practices. What should you do?

Options:

Enable the default service account key, and download the key.

Remove the iam.disableServiceAccountKeyCreation policy at the organization level, and create a key.

Disable the service account key creation policy at the project's folder, and download the default key

Add a rule to set the iam.disableServiceAccountKeyCreation policy to off in your project and create a key.

Question 19

You currently store the virtual machine (VM) utilization logs in Stackdriver. You need to provide an easy-to-share interactive VM utilization dashboard that is updated in real time and contains information aggregated on a quarterly basis. You want to use Google Cloud Platform solutions. What should you do?

Options:

1. Export VM utilization logs from Stackdriver to BigQuery. 2. Create a dashboard in Data Studio. 3. Share the dashboard with your stakeholders.

1. Export VM utilization logs from Stackdriver to Cloud Pub/Sub. 2. From Cloud Pub/Sub, send the logs to a Security Information and Event Management (SIEM) system. 3. Build the dashboards in the SIEM system and share with your stakeholders.

1. Export VM utilization logs from Stackdriver to BigQuery. 2. From BigQuery, export the logs to a CSV file. 3. Import the CSV file into Google Sheets. 4. Build a dashboard in Google Sheets and share it with your stakeholders.

1. Export VM utilization logs from Stackdriver to a Cloud Storage bucket. 2. Enable the Cloud Storage API to pull the logs programmatically. 3. Build a custom data visualization application. 4. Display the pulled logs in a custom dashboard.

Question 20

Your organization wants to collect system logs that will be used to generate dashboards in Cloud Operations for their Google Cloud project. You need to configure all current and future Compute Engine instances to collect the system logs and you must ensure that the Ops Agent remains up to date. What should you do?

Options:

Use the gcloud CLI to install the Ops Agent on each VM listed in the Cloud Asset Inventory

Select all VMs with an Agent status of Not detected on the Cloud Operations VMs dashboard. Then select Install agents.

Use the gcloud CLI to create an Agent Policy.

Install the Ops Agent on the Compute Engine image by using a startup script

Question 21

Your company processes IoT data at scale by using Pub/Sub, App Engine standard environment, and an application written in Go. You noticed that the performance inconsistently degrades at peak load. You could not reproduce this issue on your workstation. You need to continuously monitor the application in production to identify slow paths in the code. You want to minimize performance impact and management overhead. What should you do?

Options:

Install a continuous profiling tool into Compute Engine. Configure the application to send profiling data to the tool.

Periodically run the go tool pprof command against the application instance. Analyze the results by using flame graphs.

Configure Cloud Profiler, and initialize the cloud.google.com/go/profiler library in the application.

Use Cloud Monitoring to assess the App Engine CPU utilization metric.

Answer:

Explanation:

The correct answer is C. Configure Cloud Profiler, and initialize the cloud.google.com/go/profiler library in the application.

According to the Google Cloud documentation, Cloud Profiler is a statistical, low-overhead profiler that continuously gathers CPU usage and memory-allocation information from your production applications [1]. Cloud Profiler can help you identify slow paths in your code and optimize the performance of your applications. Cloud Profiler supports applications written in Go that run on App Engine standard environment [2]. To use Cloud Profiler, you need to configure it in your Google Cloud project and initialize the cloud.google.com/go/profiler library in your application code [3]. You can then use the Cloud Profiler interface to analyze the profiling data and visualize the results by using flame graphs [4]. Cloud Profiler has minimal performance impact and management overhead, as it only samples a small fraction of the application activity and does not require any additional infrastructure or agents.

The other options are incorrect because they do not meet the requirements of minimizing performance impact and management overhead. Option A is incorrect because it requires installing a continuous profiling tool into Compute Engine, which is an additional infrastructure that needs to be managed and maintained. Option B is incorrect because it requires periodically running the go tool pprof command against the application instance, which is a manual and disruptive process that can affect the application performance. Option D is incorrect because it only uses Cloud Monitoring to assess the App Engine CPU utilization metric, which is not enough to identify slow paths in the code or optimize the application performance.

[Reference: Cloud Profiler documentation, Overview. Profiling Go applications, Supported environments. Profiling Go applications, Using Cloud Profiler. Analyzing data.]

Question 22

Your company recently migrated to Google Cloud. You need to design a fast, reliable, and repeatable solution for your company to provision new projects and basic resources in Google Cloud. What should you do?

Options:

Use the Google Cloud console to create projects.

Write a script by using the gcloud CLI that passes the appropriate parameters from the request. Save the script in a Git repository.

Write a Terraform module and save it in your source control repository. Copy and run the apply command to create the new project.

Use the Terraform repositories from the Cloud Foundation Toolkit. Apply the code with appropriate parameters to create the Google Cloud project and related resources.

Question 23

You need to deploy a new service to production. The service needs to automatically scale using a Managed Instance Group (MIG) and should be deployed over multiple regions. The service needs a large number of resources for each instance and you need to plan for capacity. What should you do?

Options:

Use the n2-highcpu-96 machine type in the configuration of the MIG.

Monitor results of Stackdriver Trace to determine the required amount of resources.

Validate that the resource requirements are within the available quota limits of each region.

Deploy the service in one region and use a global load balancer to route traffic to this region.

Question 24

You are the Operations Lead for an ongoing incident with one of your services. The service usually runs at around 70% capacity. You notice that one node is returning 5xx errors for all requests. There has also been a noticeable increase in support cases from customers. You need to remove the offending node from the load balancer pool so that you can isolate and investigate the node. You want to follow Google-recommended practices to manage the incident and reduce the impact on users. What should you do?

Options:

1. Communicate your intent to the incident team. 2. Perform a load analysis to determine if the remaining nodes can handle the increase in traffic offloaded from the removed node, and scale appropriately. 3. When any new nodes report healthy, drain traffic from the unhealthy node, and remove the unhealthy node from service.

1. Communicate your intent to the incident team. 2. Add a new node to the pool, and wait for the new node to report as healthy. 3. When traffic is being served on the new node, drain traffic from the unhealthy node, and remove the old node from service.

1. Drain traffic from the unhealthy node and remove the node from service. 2. Monitor traffic to ensure that the error is resolved and that the other nodes in the pool are handling the traffic appropriately. 3. Scale the pool as necessary to handle the new load. 4. Communicate your actions to the incident team.

1. Drain traffic from the unhealthy node and remove the old node from service. 2. Add a new node to the pool, wait for the new node to report as healthy, and then serve traffic to the new node. 3. Monitor traffic to ensure that the pool is healthy and is handling traffic appropriately. 4. Communicate your actions to the incident team.

Answer: A

Explanation:

The correct answer is A. Communicate your intent to the incident team. Perform a load analysis to determine if the remaining nodes can handle the increase in traffic offloaded from the removed node, and scale appropriately. When any new nodes report healthy, drain traffic from the unhealthy node, and remove the unhealthy node from service.

This answer follows the Google-recommended practices for incident management, as described in Chapter 9 - Incident Response, Google SRE Book [1]. According to this source, some of the best practices are:

Maintain a clear line of command. Designate clearly defined roles. Keep a working record of debugging and mitigation as you go. Declare incidents early and often.

Communicate your intent before taking any action that might affect the service or the incident response. This helps to avoid confusion, duplication of work, or unintended consequences.

Perform a load analysis before removing a node from the load balancer pool, as this might affect the capacity and performance of the service. Scale the pool as necessary to handle the expected load.

Drain traffic from the unhealthy node before removing it from service, as this helps to avoid dropping requests or causing errors for users.

Answer A follows these best practices by communicating the intent to the incident team, performing a load analysis and scaling the pool, and draining traffic from the unhealthy node before removing it.

Answer B does not follow the best practice of performing a load analysis before adding or removing nodes, as this might cause overloading or underutilization of resources.

Answer C does not follow the best practice of communicating the intent before taking any action, as this might cause confusion or conflict with other responders.

Answer D does not follow the best practice of draining traffic from the unhealthy node before removing it, as this might cause errors for users.
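The load analysis step in answer A can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the question gives the 70% utilization figure but not the pool size, so the node count of 4 and the 80% target ceiling used below are hypothetical, and the function names are invented for this example.

```python
import math


def utilization_after_removal(nodes: int, utilization: float, removed: int = 1) -> float:
    """Per-node utilization once `removed` nodes stop serving traffic."""
    remaining = nodes - removed
    if remaining <= 0:
        raise ValueError("no nodes would remain in the pool")
    # Total load stays the same; it is spread over fewer nodes.
    return nodes * utilization / remaining


def nodes_to_add(nodes: int, utilization: float, target: float, removed: int = 1) -> int:
    """Extra healthy nodes needed to stay at or below `target` utilization."""
    load = nodes * utilization  # total load, in units of one node's capacity
    # Small epsilon guards against floating-point noise pushing ceil() up by one.
    needed = math.ceil(load / target - 1e-9)
    return max(0, needed - (nodes - removed))


if __name__ == "__main__":
    # Hypothetical pool: 4 nodes at 70% utilization, 80% target ceiling.
    print(f"utilization after removal: {utilization_after_removal(4, 0.70):.0%}")
    print(f"nodes to add first: {nodes_to_add(4, 0.70, 0.80)}")
```

With a small pool, removing one node pushes the remaining nodes close to saturation, which is why the analysis and any scaling come before the drain.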

[References: 1: Chapter 9 - Incident Response, Google SRE Book.]

Question 25

You are creating and assigning action items in a postmortem for an outage. The outage is over, but you need to address the root causes. You want to ensure that your team handles the action items quickly and efficiently. How should you assign owners and collaborators to action items?

Options:

Assign one owner for each action item and any necessary collaborators.

Assign multiple owners for each item to guarantee that the team addresses items quickly.

Assign collaborators but no individual owners to the items to keep the postmortem blameless.

Assign the team lead as the owner for all action items because they are in charge of the SRE team.

Question 26

Your team is designing a new application for deployment into Google Kubernetes Engine (GKE). You need to set up monitoring to collect and aggregate various application-level metrics in a centralized location. You want to use Google Cloud Platform services while minimizing the amount of work required to set up monitoring. What should you do?

Options:

Publish various metrics from the application directly to the Stackdriver Monitoring API, and then observe these custom metrics in Stackdriver.

Install the Cloud Pub/Sub client libraries, push various metrics from the application to various topics, and then observe the aggregated metrics in Stackdriver.

Install the OpenTelemetry client libraries in the application, configure Stackdriver as the export destination for the metrics, and then observe the application's metrics in Stackdriver.

Emit all metrics in the form of application-specific log messages, pass these messages from the containers to the Stackdriver logging collector, and then observe metrics in Stackdriver.

Question 27

You are responsible for the reliability of a high-volume enterprise application. A large number of users report that an important subset of the application’s functionality – a data intensive reporting feature – is consistently failing with an HTTP 500 error. When you investigate your application’s dashboards, you notice a strong correlation between the failures and a metric that represents the size of an internal queue used for generating reports. You trace the failures to a reporting backend that is experiencing high I/O wait times. You quickly fix the issue by resizing the backend’s persistent disk (PD). Now you need to create an availability Service Level Indicator (SLI) for the report generation feature. How would you define it?

Options:

As the I/O wait times aggregated across all report generation backends

As the proportion of report generation requests that result in a successful response

As the application’s report generation queue size compared to a known-good threshold

As the reporting backend PD throughput capacity compared to a known-good threshold
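For reference, a request-based availability SLI is a simple ratio of good events to valid events. The Python sketch below is a minimal illustration, assuming that any response without a 5xx status counts as successful; the function name and the sample status codes are hypothetical.

```python
def availability_sli(status_codes):
    """Return the fraction of requests that did not fail with a 5xx status."""
    codes = list(status_codes)
    if not codes:
        return None  # no traffic in the window, so the SLI is undefined
    good = sum(1 for code in codes if code < 500)
    return good / len(codes)


if __name__ == "__main__":
    # Hypothetical window of four report-generation requests, one HTTP 500.
    print(availability_sli([200, 200, 500, 200]))
```

A ratio like this measures what users experience directly, whereas queue size, I/O wait, and disk throughput are causes that may or may not translate into failed requests.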

Question 28

You are configuring your CI/CD pipeline natively on Google Cloud. You want builds in a pre-production Google Kubernetes Engine (GKE) environment to be automatically load-tested before being promoted to the production GKE environment. You need to ensure that only builds that have passed this test are deployed to production. You want to follow Google-recommended practices. How should you configure this pipeline with Binary Authorization?

Options:

Create an attestation for the builds that pass the load test by requiring the lead quality assurance engineer to sign the attestation by using a key stored in Cloud Key Management Service (Cloud KMS).

Create an attestation for the builds that pass the load test by using a private key stored in Cloud Key Management Service (Cloud KMS) authenticated through Workload Identity.

Create an attestation for the builds that pass the load test by using a private key stored in Cloud Key Management Service (Cloud KMS) with a service account JSON key stored as a Kubernetes Secret.

Create an attestation for the builds that pass the load test by requiring the lead quality assurance engineer to sign the attestation by using their personal private key.

Question 29

Your product is currently deployed in three Google Cloud Platform (GCP) zones with your users divided between the zones. You can fail over from one zone to another, but it causes a 10-minute service disruption for the affected users. You typically experience a database failure once per quarter and can detect it within five minutes. You are cataloging the reliability risks of a new real-time chat feature for your product. You catalog the following information for each risk:

• Mean Time to Detect (MTTD) in minutes

• Mean Time to Repair (MTTR) in minutes

• Mean Time Between Failure (MTBF) in days

• User Impact Percentage

The chat feature requires a new database system that takes twice as long to successfully fail over between zones. You want to account for the risk of the new database failing in one zone. What would be the values for the risk of database failover with the new system?

Options:

MTTD: 5, MTTR: 10, MTBF: 90, Impact: 33%

MTTD: 5, MTTR: 20, MTBF: 90, Impact: 33%

MTTD: 5, MTTR: 10, MTBF: 90, Impact: 50%

MTTD: 5, MTTR: 20, MTBF: 90, Impact: 50%
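The four cataloged values are typically combined into an expected number of "bad minutes" per year, so that risks can be ranked against an error budget. The Python sketch below assumes the commonly used formula (MTTD + MTTR) × incidents per year × impact; the formula and the function name are not stated in the question and are assumptions here.

```python
def bad_minutes_per_year(mttd_min, mttr_min, mtbf_days, impact_pct):
    """Expected user-impacting minutes per year for one cataloged risk."""
    incidents_per_year = 365 / mtbf_days
    # Each incident is "bad" from the moment it starts until it is repaired.
    minutes_per_incident = mttd_min + mttr_min
    return minutes_per_incident * incidents_per_year * impact_pct / 100


if __name__ == "__main__":
    # The four value sets listed in the options, in order.
    candidates = [(5, 10, 90, 33), (5, 20, 90, 33), (5, 10, 90, 50), (5, 20, 90, 50)]
    for mttd, mttr, mtbf, impact in candidates:
        cost = bad_minutes_per_year(mttd, mttr, mtbf, impact)
        print(f"MTTD={mttd} MTTR={mttr} MTBF={mtbf} impact={impact}%: {cost:.1f} min/year")
```

Doubling the failover time changes only the MTTR term; detection time, failure frequency, and the share of users in one of three zones are unaffected by the new database.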

Question 30

You are the on-call Site Reliability Engineer for a microservice that is deployed to a Google Kubernetes Engine (GKE) Autopilot cluster. Your company runs an online store that publishes order messages to Pub/Sub, and a microservice receives these messages and updates stock information in the warehousing system. A sales event caused an increase in orders, and the stock information is not being updated quickly enough. This is causing a large number of orders to be accepted for products that are out of stock. You check the metrics for the microservice and compare them to typical levels.

You need to ensure that the warehouse system accurately reflects product inventory at the time orders are placed and minimize the impact on customers. What should you do?

Options:

Decrease the acknowledgment deadline on the subscription

Add a virtual queue to the online store that allows typical traffic levels

Increase the number of Pod replicas

Increase the Pod CPU and memory limits

Question 31

You encountered a major service outage that affected all users of the service for multiple hours. After several hours of incident management, the service returned to normal, and user access was restored. You need to provide an incident summary to relevant stakeholders following the Site Reliability Engineering recommended practices. What should you do first?

Options:

Call individual stakeholders to explain what happened.

Develop a post-mortem to be distributed to stakeholders.

Send the Incident State Document to all the stakeholders.

Require the engineer responsible to write an apology email to all stakeholders.

Question 32

You have deployed a fleet of Compute Engine instances in Google Cloud. You need to ensure that monitoring metrics and logs for the instances are visible in Cloud Logging and Cloud Monitoring by your company's operations and cyber security teams. You need to grant the required roles for the Compute Engine service account by using Identity and Access Management (IAM) while following the principle of least privilege. What should you do?

Options:

Grant the logging.editor and monitoring.metricWriter roles to the Compute Engine service accounts.

Grant the logging.admin and monitoring.editor roles to the Compute Engine service accounts.

Grant the logging.logWriter and monitoring.editor roles to the Compute Engine service accounts.

Grant the logging.logWriter and monitoring.metricWriter roles to the Compute Engine service accounts.

Answer: D

Explanation:

The correct answer is D. Grant the logging.logWriter and monitoring.metricWriter roles to the Compute Engine service accounts.

According to the Google Cloud documentation, the Compute Engine default service account is automatically created when you enable the Compute Engine API [1]. This service account is used by default to run your Compute Engine instances and access other Google Cloud services on your behalf [1]. To ensure that monitoring metrics and logs for the instances are visible in Cloud Logging and Cloud Monitoring, you need to grant the following IAM roles to the Compute Engine service account [2][3]:

The logging.logWriter role allows the service account to write log entries to Cloud Logging [4].

The monitoring.metricWriter role allows the service account to write metric data to Cloud Monitoring [5].

These roles grant the minimum permissions that are needed for logging and monitoring, following the principle of least privilege. The other roles are either unnecessary or too broad for this purpose. For example, the logging.editor role grants permissions to create and update logs, log sinks, and log exclusions, which are not required for writing log entries [6]. The logging.admin role grants permissions to delete logs, log sinks, and log exclusions, which are not required for writing log entries and may pose a security risk if misused. The monitoring.editor role grants permissions to create and update alerting policies, uptime checks, notification channels, dashboards, and groups, which are not required for writing metrics.

[References: 1: Service accounts. 2: Setting up Stackdriver Logging for Compute Engine. 3: Setting up Stackdriver Monitoring for Compute Engine. 4, 5, 6: Predefined roles.]

Question 33

The new version of your containerized application has been tested and is ready to be deployed to production on Google Kubernetes Engine (GKE). You could not fully load-test the new version in your pre-production environment, and you need to ensure that the application does not have performance problems after deployment. Your deployment must be automated. What should you do?

Options:

Deploy the application through a continuous delivery pipeline by using canary deployments. Use Cloud Monitoring to look for performance issues, and ramp up traffic as supported by the metrics.

Deploy the application through a continuous delivery pipeline by using blue/green deployments. Migrate traffic to the new version of the application and use Cloud Monitoring to look for performance issues.

Deploy the application by using kubectl and use Config Connector to slowly ramp up traffic between versions. Use Cloud Monitoring to look for performance issues.

Deploy the application by using kubectl and set the spec.updateStrategy.type field to RollingUpdate. Use Cloud Monitoring to look for performance issues, and run the kubectl rollback command if there are any issues.

Question 34

You are creating Cloud Logging sinks to export log entries from Cloud Logging to BigQuery for future analysis. Your organization has a Google Cloud folder named Dev that contains development projects and a folder named Prod that contains production projects. Log entries for development projects must be exported to dev_dataset, and log entries for production projects must be exported to prod_dataset. You need to minimize the number of log sinks created, and you want to ensure that the log sinks apply to future projects. What should you do?

Options:

Create a single aggregated log sink at the organization level.

Create a log sink in each project

Create two aggregated log sinks at the organization level, and filter by project ID

Create an aggregated log sink in the Dev and Prod folders

Question 35

You need to run a business-critical workload on a fixed set of Compute Engine instances for several months. The workload is stable with the exact amount of resources allocated to it. You want to lower the costs for this workload without any performance implications. What should you do?

Options:

Purchase Committed Use Discounts.

Migrate the instances to a Managed Instance Group.

Convert the instances to preemptible virtual machines.

Create an Unmanaged Instance Group for the instances used to run the workload.

Question 36

Your team is designing a new application for deployment both inside and outside Google Cloud Platform (GCP). You need to collect detailed metrics such as system resource utilization. You want to use centralized GCP services while minimizing the amount of work required to set up this collection system. What should you do?

Options:

Import the Stackdriver Profiler package, and configure it to relay function timing data to Stackdriver for further analysis.

Import the Stackdriver Debugger package, and configure the application to emit debug messages with timing information.

Instrument the code using a timing library, and publish the metrics via a health check endpoint that is scraped by Stackdriver.

Install an Application Performance Monitoring (APM) tool in both locations, and configure an export to a central data storage location for analysis.

Question 37

Your company has a Google Cloud resource hierarchy with folders for production, test, and development. Your cyber security team needs to review your company's Google Cloud security posture to accelerate security issue identification and resolution. You need to centralize the logs generated by Google Cloud services from all projects only inside your production folder to allow for alerting and near-real-time analysis. What should you do?

Options:

Enable the Workflows API and route all the logs to Cloud Logging

Create a central Cloud Monitoring workspace and attach all related projects

Create an aggregated log sink associated with the production folder that uses a Pub/Sub topic as the destination

Create an aggregated log sink associated with the production folder that uses a Cloud Logging bucket as the destination

Question 38

Your company is developing applications that are deployed on Google Kubernetes Engine (GKE). Each team manages a different application. You need to create the development and production environments for each team while you minimize costs. Different teams should not be able to access other teams' environments. You want to follow Google-recommended practices. What should you do?

Options:

Create one Google Cloud project per team. In each project, create a cluster for development and one for production. Grant the teams Identity and Access Management (IAM) access to their respective clusters.

Create one Google Cloud project per team. In each project, create a cluster with a Kubernetes namespace for development and one for production. Grant the teams Identity and Access Management (IAM) access to their respective clusters.

Create a development and a production GKE cluster in separate projects. In each cluster, create a Kubernetes namespace per team, and then configure Identity-Aware Proxy so that each team can only access its own namespace.

Create a development and a production GKE cluster in separate projects. In each cluster, create a Kubernetes namespace per team, and then configure Kubernetes role-based access control (RBAC) so that each team can only access its own namespace.

Question 39

You are designing a deployment technique for your applications on Google Cloud. As part of your deployment planning, you want to use live traffic to gather performance metrics for new versions of your applications. You need to test against the full production load before your applications are launched. What should you do?

Options:

Use A/B testing with blue/green deployment.

Use shadow testing with continuous deployment.

Use canary testing with continuous deployment.

Use canary testing with rolling updates deployment.

Question 40

You recently created a Cloud Build pipeline for deploying Terraform code stored in a GitHub repository. You make Terraform code changes in short-lived branches and sometimes use tags during development. You tag releases with a semantic version when they are ready for deployment. You require your pipeline to apply the Terraform code whenever there is a new release, and you need to minimize operational overhead. What should you do?

Options:

Create a build trigger with the * branch pattern.

Create a build trigger with the \d+\.\d+\.\d* tag pattern.

Create a build trigger with the .* tag pattern.

Create a build trigger with the \d*\.\d+\.\d* branch pattern.
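The difference between the tag patterns above is easier to see when they are run against sample names. The Python sketch below uses the standard `re` module as a stand-in for the RE2 syntax that Cloud Build accepts, and it assumes the pattern must match the whole tag name; the sample tag names are hypothetical.

```python
import re

SEMVER_TAG_PATTERN = r"\d+\.\d+\.\d*"
ANY_TAG_PATTERN = r".*"


def matches(pattern: str, ref_name: str) -> bool:
    """Return True if the whole ref name matches the pattern (assumed behavior)."""
    return re.fullmatch(pattern, ref_name) is not None


if __name__ == "__main__":
    # Hypothetical tags: one release tag and one ad hoc development tag.
    for tag in ["1.4.2", "wip-test"]:
        print(tag,
              "semver:", matches(SEMVER_TAG_PATTERN, tag),
              "any:", matches(ANY_TAG_PATTERN, tag))
```

Under these assumptions, the `.*` pattern also fires on development tags, while the semantic-version pattern fires only on release-style tags.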

Question 41

You are building and running client applications in Cloud Run and Cloud Functions. Your client requires that all logs must be available for one year so that the client can import the logs into their logging service. You must minimize required code changes. What should you do?

Options:

Update all images in Cloud Run and all functions in Cloud Functions to send logs to both Cloud Logging and the client's logging service. Ensure that all the ports required to send logs are open in the VPC firewall.

Create a Pub/Sub topic, subscription, and logging sink. Configure the logging sink to send all logs into the topic. Give your client access to the topic to retrieve the logs.

Create a storage bucket and appropriate VPC firewall rules. Update all images in Cloud Run and all functions in Cloud Functions to send logs to a file within the storage bucket.

Create a logs bucket and logging sink. Set the retention on the logs bucket to 365 days. Configure the logging sink to send logs to the bucket. Give your client access to the bucket to retrieve the logs.

Question 42

You support a stateless web-based API that is deployed on a single Compute Engine instance in the europe-west2-a zone. The Service Level Indicator (SLI) for service availability is below the specified Service Level Objective (SLO). A postmortem has revealed that requests to the API regularly time out. The timeouts are due to the API having a high number of requests and running out of memory. You want to improve service availability. What should you do?

Options:

Change the specified SLO to match the measured SLI.

Move the service to higher-specification compute instances with more memory.

Set up additional service instances in other zones and load balance the traffic between all instances.

Set up additional service instances in other zones and use them as a failover in case the primary instance is unavailable.

Question 43

You work for a global organization and are running a monolithic application on Compute Engine. You need to select the machine type for the application to use that optimizes CPU utilization by using the fewest number of steps. You want to use historical system metrics to identify the machine type for the application to use. You want to follow Google-recommended practices. What should you do?

Options:

Use the Recommender API and apply the suggested recommendations

Create an Agent Policy to automatically install Ops Agent in all VMs

Install the Ops Agent in a fleet of VMs by using the gcloud CLI

Review the Cloud Monitoring dashboard for the VM and choose the machine type with the lowest CPU utilization

Answer: A

Explanation:

The best option for selecting the machine type for the application to use that optimizes CPU utilization by using the fewest number of steps is to use the Recommender API and apply the suggested recommendations. The Recommender API is a service that provides recommendations for optimizing your Google Cloud resources, such as Compute Engine instances, disks, and firewalls. You can use the Recommender API to get recommendations for changing the machine type of your Compute Engine instances based on historical system metrics, such as CPU utilization. You can also apply the suggested recommendations by using the Recommender API or Cloud Console. This way, you can optimize CPU utilization by using the most suitable machine type for your application with minimal effort.

Your CTO has asked you to implement a postmortem policy on every incident for internal use. You want to define what a good postmortem is to ensure that the policy is successful at your company. What should you do?

Choose 2 answers

Ensure that all postmortems include what caused the incident, identify the person or team responsible for causing the incident, and how to prevent a future occurrence of the incident.

Ensure that all postmortems include what caused the incident, how the incident could have been worse, and how to prevent a future occurrence of the incident.

Ensure that all postmortems include the severity of the incident, how to prevent a future occurrence of the incident, and what caused the incident without naming internal system components.

Ensure that all postmortems include how the incident was resolved and what caused the incident without naming customer information.

Ensure that all postmortems include all incident participants in postmortem authoring and share postmortems as widely as possible.

Answer: BE

The correct answers are B and E.

A good postmortem should include what caused the incident, how the incident could have been worse, and how to prevent a future occurrence of the incident [1]. This helps to identify the root cause of the problem, the impact of the incident, and the actions to take to mitigate or eliminate the risk of recurrence.

A good postmortem should also include all incident participants in postmortem authoring and share postmortems as widely as possible [2]. This helps to foster a culture of learning and collaboration, as well as to increase the visibility and accountability of the incident response process.

Answer A is incorrect because it assigns blame to a person or team, which goes against the principle of blameless postmortems [2]. Blameless postmortems focus on finding solutions rather than pointing fingers, and encourage honest and constructive feedback without fear of punishment.

Answer C is incorrect because it omits how the incident could have been worse, which is an important factor to consider when evaluating the severity and impact of the incident [1]. It also avoids naming internal system components, which makes it harder to understand the technical details and root cause of the problem.

Answer D is incorrect because it omits how to prevent a future occurrence of the incident, which is the main goal of a postmortem [1]. It also avoids naming customer information, which may be relevant for understanding the impact and scope of the incident.

Your company uses Jenkins running on Google Cloud VM instances for CI/CD. You need to extend the functionality to use infrastructure as code automation by using Terraform. You must ensure that the Terraform Jenkins instance is authorized to create Google Cloud resources. You want to follow Google-recommended practices. What should you do?

Add the auth application-default command as a step in Jenkins before running the Terraform commands.

Create a dedicated service account for the Terraform instance. Download and copy the secret key value to the GOOGLE environment variable on the Jenkins server.

Confirm that the Jenkins VM instance has an attached service account with the appropriate Identity and Access Management (IAM) permissions.

Use the Terraform module so that Secret Manager can retrieve credentials.

Answer: C

The correct answer is C.

Confirming that the Jenkins VM instance has an attached service account with the appropriate Identity and Access Management (IAM) permissions is the best way to ensure that the Terraform Jenkins instance is authorized to create Google Cloud resources. This follows the Google-recommended practice of using service accounts to authenticate and authorize applications running on Google Cloud [1]. Service accounts are associated with private keys that can be used to generate access tokens for Google Cloud APIs [2]. By attaching a service account to the Jenkins VM instance, Terraform can use the Application Default Credentials (ADC) strategy to automatically find and use the service account credentials [3].

Answer A is incorrect because the auth application-default command is used to obtain user credentials, not service account credentials. User credentials are not recommended for applications running on Google Cloud, as they are less secure and less scalable than service account credentials [1].

Answer B is incorrect because it involves downloading and copying the secret key value of the service account, which is not a secure or reliable way of managing credentials. The secret key value should be kept private and not exposed to any other system or user [2]. Moreover, setting the GOOGLE environment variable on the Jenkins server is not a valid way of providing credentials to Terraform. Terraform expects the credentials to be either in a file pointed to by the GOOGLE_APPLICATION_CREDENTIALS environment variable, or in a provider block with the credentials argument [3].

Answer D is incorrect because it involves using the Terraform module for Secret Manager, which is a service that stores and manages sensitive data such as API keys, passwords, and certificates. While Secret Manager can be used to store and retrieve credentials, it is not necessary or sufficient for authorizing the Terraform Jenkins instance. The Terraform Jenkins instance still needs a service account with the appropriate IAM permissions to access Secret Manager and other Google Cloud resources.

You are analyzing Java applications in production. All applications have Cloud Profiler and Cloud Trace installed and configured by default. You want to determine which applications need performance tuning. What should you do?

Choose 2 answers

A. Examine the wall-clock time and the CPU time of the application. If the difference is substantial, increase the CPU resource allocation.

B. Examine the wall-clock time and the CPU time of the application. If the difference is substantial, increase the memory resource allocation.

C. Examine the wall-clock time and the CPU time of the application. If the difference is substantial, increase the local disk storage allocation.

D. Examine the latency time, the wall-clock time, and the CPU time of the application. If the latency time is slowly burning down the error budget, and the difference between wall-clock time and CPU time is minimal, mark the application for optimization.

E. Examine the heap usage of the application. If the usage is low, mark the application for optimization.

Answer: AD

The correct answers are A and D.

Examine the wall-clock time and the CPU time of the application. If the difference is substantial, increase the CPU resource allocation. A substantial gap means that the application spends much of its elapsed time waiting rather than executing code, and one possible cause is that it is waiting for CPU to become available. Increasing the CPU resource allocation can improve the performance of such applications [1].

Examine the latency time, the wall-clock time, and the CPU time of the application. If the latency time is slowly burning down the error budget, and the difference between wall-clock time and CPU time is minimal, mark the application for optimization. When wall-clock time and CPU time are nearly equal, the application spends almost all of its elapsed time executing code, so it is CPU-intensive rather than waiting on other resources. If its latency is still consuming the error budget, the code itself needs optimization to reduce the work it performs [2].

Answer B is incorrect because increasing the memory resource allocation will not help if the application is CPU-bound or I/O-bound. Memory allocation affects how much data the application can store and access in memory, but it does not affect how fast the application can process that data.

Answer C is incorrect because increasing the local disk storage allocation will not help if the application is CPU-bound or I/O-bound. Disk storage affects how much data the application can store and access on disk, but it does not affect how fast the application can process that data.

Answer E is incorrect because examining the heap usage of the application will not help to determine if the application needs performance tuning. Heap usage affects how much memory the application allocates for dynamic objects, but it does not affect how fast the application can process those objects. Moreover, low heap usage does not necessarily mean that the application is inefficient or unoptimized.
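
The comparison of wall-clock time and CPU time that these explanations rely on can be illustrated with a minimal sketch using only the Python standard library. This is not how Cloud Profiler collects its data; the helper names `measure` and `waiting_fraction` and both workloads are hypothetical, chosen only to show how a gap between the two timings arises.

```python
import time


def measure(fn):
    """Run fn once and return (wall_seconds, cpu_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def waiting_fraction(wall, cpu):
    """Share of elapsed time not spent on the CPU (0.0 means fully CPU-busy)."""
    if wall <= 0:
        return 0.0
    return max(0.0, (wall - cpu) / wall)


def mostly_waiting():
    # Stands in for blocking on I/O, locks, or a busy scheduler.
    time.sleep(0.2)


def mostly_computing():
    # Pure computation: wall-clock and CPU time should be close.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


if __name__ == "__main__":
    for workload in (mostly_waiting, mostly_computing):
        wall, cpu = measure(workload)
        print(f"{workload.__name__}: wall={wall:.3f}s cpu={cpu:.3f}s "
              f"waiting={waiting_fraction(wall, cpu):.0%}")
```

A high waiting fraction corresponds to the "substantial difference" case in option A, while a fraction near zero corresponds to the "minimal difference" case in option D.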

You deployed an application into a large Standard Google Kubernetes Engine (GKE) cluster. The application is stateless and multiple pods run at the same time. Your application receives inconsistent traffic. You need to ensure that the user experience remains consistent regardless of changes in traffic and that the resource usage of the cluster is optimized.

What should you do?

Configure a cron job to scale the deployment on a schedule.

Configure a Horizontal Pod Autoscaler.

Configure a Vertical Pod Autoscaler.

Configure cluster autoscaling on the node pool.

Answer: B
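
For context on the Horizontal Pod Autoscaler, the replica count it aims for follows the ratio formula documented by Kubernetes: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies only that formula. The function name and the min/max bounds are illustrative assumptions, and the real controller also applies a tolerance, stabilization windows, and pod readiness rules that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Apply the documented HPA ratio formula, clamped to hypothetical bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target: scale out.
    print(desired_replicas(4, 90, 60))
    # 4 pods averaging 15% CPU against a 60% target: scale in.
    print(desired_replicas(4, 15, 60))
```

Because the replica count tracks the observed load, the deployment grows during traffic spikes and shrinks when traffic drops, which is the behavior the question asks for.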

Question 44

You encounter a large number of outages in the production systems you support. You receive alerts for all the outages that wake you up at night. The alerts are due to unhealthy systems that are automatically restarted within a minute. You want to set up a process that would prevent staff burnout while following Site Reliability Engineering practices. What should you do?

Options:

Eliminate unactionable alerts.

Create an incident report for each of the alerts.

Distribute the alerts to engineers in different time zones.

Redefine the related Service Level Objective so that the error budget is not exhausted.

Question 45

You support a high-traffic web application and want to ensure that the home page loads in a timely manner. As a first step, you decide to implement a Service Level Indicator (SLI) to represent home page request latency with an acceptable page load time set to 100 ms. What is the Google-recommended way of calculating this SLI?

Options:

Bucketize the request latencies into ranges, and then compute the percentile at 100 ms.

Bucketize the request latencies into ranges, and then compute the median and 90th percentiles.

Count the number of home page requests that load in under 100 ms, and then divide by the total number of home page requests.

Count the number of home page requests that load in under 100 ms, and then divide by the total number of all web application requests.
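
The count-and-divide style of calculation that appears in the options, good events over valid events, can be sketched as follows. The function name and the sample latencies are hypothetical, and in this sketch only home page requests are counted in the denominator.

```python
def latency_sli(latencies_ms, threshold_ms=100.0):
    """Fraction of requests that completed faster than threshold_ms."""
    if not latencies_ms:
        raise ValueError("no requests to measure")
    good = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return good / len(latencies_ms)


if __name__ == "__main__":
    # Hypothetical home page request latencies in milliseconds.
    home_page_latencies = [42.0, 87.5, 99.9, 100.0, 130.2, 61.3, 250.0, 95.0]
    print(f"SLI: {latency_sli(home_page_latencies):.1%}")
```

Restricting the denominator to home page requests keeps unrelated traffic from diluting the ratio.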

Question 46

You support an application deployed on Compute Engine. The application connects to a Cloud SQL instance to store and retrieve data. After an update to the application, users report errors showing database timeout messages. The number of concurrent active users remained stable. You need to find the most probable cause of the database timeout. What should you do?

Options:

Check the serial port logs of the Compute Engine instance.

Use Stackdriver Profiler to visualize the resources utilization throughout the application.

Determine whether there is an increased number of connections to the Cloud SQL instance.

Use Cloud Security Scanner to see whether your Cloud SQL is under a Distributed Denial of Service (DDoS) attack.

Question 47

You are managing an application that runs in Compute Engine. The application uses a custom HTTP server to expose an API that is accessed by other applications through an internal TCP/UDP load balancer. A firewall rule allows access to the API port from 0.0.0.0/0. You need to configure Cloud Logging to log each IP address that accesses the API by using the fewest number of steps. What should you do first?

Options:

Enable Packet Mirroring on the VPC.

Install the Ops Agent on the Compute Engine instances.

Enable logging on the firewall rule.

Enable VPC Flow Logs on the subnet.

Question 48

Your company is using HTTPS requests to trigger a public Cloud Run-hosted service accessible at the .a.run.app URL. You need to give developers the ability to test the latest revisions of the service before the service is exposed to customers. What should you do?

Options:

Run the gcloud run deploy booking-engine --no-traffic --tag dev command. Use the https://dev---booking-engine-abcdef.a.run.app URL for testing.

Run the gcloud run services update-traffic booking-engine --to-revisions LATEST=1 command. Use the https://booking-engine-abcdef.a.run.app URL for testing.

Pass the curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" auth token. Use the https://booking-engine-abcdef.a.run.app URL to test privately.

Grant the roles/run.invoker role to the developers testing the booking-engine service. Use the https://booking-engine-abcdef.private.run.app URL for testing.

Question 49

You use a multi-step Cloud Build pipeline to build and deploy your application to Google Kubernetes Engine (GKE). You want to integrate with a third-party monitoring platform by performing an HTTP POST of the build information to a webhook. You want to minimize the development effort. What should you do?

Options:

Add logic to each Cloud Build step to HTTP POST the build information to a webhook.

Add a new step at the end of the pipeline in Cloud Build to HTTP POST the build information to a webhook.

Use Stackdriver Logging to create a logs-based metric from the Cloud Build logs. Create an Alert with a Webhook notification type.

Create a Cloud Pub/Sub push subscription to the Cloud Build cloud-builds PubSub topic to HTTP POST the build information to a webhook.

Question 50

Your organization wants to implement Site Reliability Engineering (SRE) culture and principles. Recently, a service that you support had a limited outage. A manager on another team asks you to provide a formal explanation of what happened so they can action remediations. What should you do?

Options:

Develop a postmortem that includes the root causes, resolution, lessons learned, and a prioritized list of action items. Share it with the manager only.

Develop a postmortem that includes the root causes, resolution, lessons learned, and a prioritized list of action items. Share it on the engineering organization's document portal.

Develop a postmortem that includes the root causes, resolution, lessons learned, the list of people responsible, and a list of action items for each person. Share it with the manager only.

Develop a postmortem that includes the root causes, resolution, lessons learned, the list of people responsible, and a list of action items for each person. Share it on the engineering organization's document portal.

Question 51

You support a user-facing web application. When analyzing the application’s error budget over the previous six months, you notice that the application has never consumed more than 5% of its error budget in any given time window. You hold a Service Level Objective (SLO) review with business stakeholders and confirm that the SLO is set appropriately. You want your application’s SLO to more closely reflect its observed reliability. What steps can you take to further that goal while balancing velocity, reliability, and business needs? (Choose two.)

Options:

Add more serving capacity to all of your application’s zones.

Have more frequent or potentially risky application releases.

Tighten the SLO to match the application’s observed reliability.

Implement and measure additional Service Level Indicators (SLIs) for the application.

Announce planned downtime to consume more error budget, and ensure that users are not depending on a tighter SLO.
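
The error budget arithmetic behind this scenario can be made concrete with a small sketch. The request counts, the 99.9% target, and the function name are hypothetical and are not taken from the question; the sketch assumes a request-based SLO where the budget is the number of failures the target permits.

```python
def error_budget_report(slo_target, total_requests, failed_requests):
    """Return (allowed_failures, fraction_of_budget_consumed)."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1")
    # The error budget is the share of requests the SLO allows to fail.
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures == 0:
        return 0.0, float("inf")
    return allowed_failures, failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical window: 99.9% SLO, one million requests, 50 failures.
    allowed, consumed = error_budget_report(0.999, 1_000_000, 50)
    print(f"allowed failures: {allowed:.0f}, budget consumed: {consumed:.1%}")
```

With these assumed numbers, only about 5% of the budget is used, which mirrors the situation described in the question: most of the budget goes unspent in every window.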

Question 52

You support a popular mobile game application deployed on Google Kubernetes Engine (GKE) across several Google Cloud regions. Each region has multiple Kubernetes clusters. You receive a report that none of the users in a specific region can connect to the application. You want to resolve the incident while following Site Reliability Engineering practices. What should you do first?

Options:

Reroute the user traffic from the affected region to other regions that don’t report issues.

Use Stackdriver Monitoring to check for a spike in CPU or memory usage for the affected region.

Add an extra node pool that consists of high memory and high CPU machine type instances to the cluster.

Use Stackdriver Logging to filter on the clusters in the affected region, and inspect error messages in the logs.

Question 53

You are working with a government agency that requires you to archive application logs for seven years. You need to configure Stackdriver to export and store the logs while minimizing costs of storage. What should you do?

Options:

Create a Cloud Storage bucket and develop your application to send logs directly to the bucket.

Develop an App Engine application that pulls the logs from Stackdriver and saves them in BigQuery.

Create an export in Stackdriver and configure Cloud Pub/Sub to store logs in permanent storage for seven years.

Create a sink in Stackdriver, name it, create a bucket on Cloud Storage for storing archived logs, and then select the bucket as the log export destination.

Question 54

You support an application running on GCP and want to configure SMS notifications to your team for the most critical alerts in Stackdriver Monitoring. You have already identified the alerting policies you want to configure this for. What should you do?

Options:

Download and configure a third-party integration between Stackdriver Monitoring and an SMS gateway. Ensure that your team members add their SMS/phone numbers to the external tool.

Select the Webhook notifications option for each alerting policy, and configure it to use a third-party integration tool. Ensure that your team members add their SMS/phone numbers to the external tool.

Ensure that your team members set their SMS/phone numbers in their Stackdriver Profile. Select the SMS notification option for each alerting policy and then select the appropriate SMS/phone numbers from the list.

Configure a Slack notification for each alerting policy. Set up a Slack-to-SMS integration to send SMS messages when Slack messages are received. Ensure that your team members add their SMS/phone numbers to the external integration.

Question 55

You use Cloud Build to build and deploy your application. You want to securely incorporate database credentials and other application secrets into the build pipeline. You also want to minimize the development effort. What should you do?

Options:

Create a Cloud Storage bucket and use the built-in encryption at rest. Store the secrets in the bucket and grant Cloud Build access to the bucket.

Encrypt the secrets and store them in the application repository. Store a decryption key in a separate repository and grant Cloud Build access to the repository.

Use client-side encryption to encrypt the secrets and store them in a Cloud Storage bucket. Store a decryption key in the bucket and grant Cloud Build access to the bucket.

Use Cloud Key Management Service (Cloud KMS) to encrypt the secrets and include them in your Cloud Build deployment configuration. Grant Cloud Build access to the KeyRing.

Question 56

You are configuring a CI pipeline. The build step for your CI pipeline integration testing requires access to APIs inside your private VPC network. Your security team requires that you do not expose API traffic publicly. You need to implement a solution that minimizes management overhead. What should you do?

Options:

Use Cloud Build private pools to connect to the private VPC.

Use Cloud Build to create a Compute Engine instance in the private VPC. Run the integration tests on the VM by using a startup script.

Use Cloud Build as a pipeline runner. Configure a cross-region internal Application Load Balancer for API access.

Use Cloud Build as a pipeline runner. Configure a global external Application Load Balancer with a Google Cloud Armor policy for API access.

Question 57

You support a large service with a well-defined Service Level Objective (SLO). The development team deploys new releases of the service multiple times a week. If a major incident causes the service to miss its SLO, you want the development team to shift its focus from working on features to improving service reliability. What should you do before a major incident occurs?

Options:

Develop an appropriate error budget policy in cooperation with all service stakeholders.

Negotiate with the product team to always prioritize service reliability over releasing new features.

Negotiate with the development team to reduce the release frequency to no more than once a week.

Add a plugin to your Jenkins pipeline that prevents new releases whenever your service is out of SLO.

Exam Detail

Vendor: Google

Certification: Cloud DevOps Engineer

Exam Code: Professional-Cloud-DevOps-Engineer

Exam Name: Google Cloud Certified - Professional Cloud DevOps Engineer Exam

Last Update: Aug 15, 2025
