

Google Professional Data Engineer Exam Questions and Answers

Question 57

You are designing a messaging system by using Pub/Sub to process clickstream data with an event-driven consumer app that relies on a push subscription. You need to configure the messaging system to be reliable enough to handle temporary downtime of the consumer app. You also need the messaging system to store the input messages that cannot be consumed by the subscriber. The system needs to retry failed messages gradually to avoid overloading the consumer app, and to store the failed messages in a topic after a maximum of 10 retries. How should you configure the Pub/Sub subscription?

Options:

A.

Increase the acknowledgement deadline to 10 minutes.

B.

Use immediate redelivery as the subscription retry policy, and configure dead lettering to a different topic with maximum delivery attempts set to 10.

C.

Use exponential backoff as the subscription retry policy, and configure dead lettering to the same source topic with maximum delivery attempts set to 10.

D.

Use exponential backoff as the subscription retry policy, and configure dead lettering to a different topic with maximum delivery attempts set to 10.

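For context, a minimal sketch of the kind of subscription configuration the scenario describes, using the Pub/Sub Python client. The project, topic, subscription, and push endpoint names are placeholders, and the exact backoff bounds are assumptions:

```python
from google.cloud import pubsub_v1
from google.protobuf import duration_pb2

project_id = "my-project"  # placeholder project
subscriber = pubsub_v1.SubscriberClient()

subscription_path = subscriber.subscription_path(project_id, "clickstream-push-sub")
topic_path = subscriber.topic_path(project_id, "clickstream")
dead_letter_topic_path = subscriber.topic_path(project_id, "clickstream-dead-letter")

subscription = subscriber.create_subscription(
    request={
        "name": subscription_path,
        "topic": topic_path,
        # Push delivery to the consumer app (placeholder endpoint).
        "push_config": pubsub_v1.types.PushConfig(
            push_endpoint="https://consumer.example.com/push"
        ),
        # Exponential backoff between redelivery attempts (10 s up to 10 min).
        "retry_policy": pubsub_v1.types.RetryPolicy(
            minimum_backoff=duration_pb2.Duration(seconds=10),
            maximum_backoff=duration_pb2.Duration(seconds=600),
        ),
        # After 10 failed delivery attempts, forward the message to a separate topic.
        "dead_letter_policy": pubsub_v1.types.DeadLetterPolicy(
            dead_letter_topic=dead_letter_topic_path,
            max_delivery_attempts=10,
        ),
    }
)
print(f"Created subscription: {subscription.name}")
```

Note that the dead-letter topic must already exist and the Pub/Sub service account needs publisher and subscriber permissions on it for dead lettering to work.
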
Question 58

You are architecting a data transformation solution for BigQuery. Your developers are proficient with SQL and want to use the ELT development technique. In addition, your developers need an intuitive coding environment and the ability to manage SQL as code. You need to identify a solution for your developers to build these pipelines. What should you do?

Options:

A.

Use Cloud Composer to load data and run SQL pipelines by using the BigQuery job operators.

B.

Use Dataflow jobs to read data from Pub/Sub, transform the data, and load the data to BigQuery.

C.

Use Dataform to build, manage, and schedule SQL pipelines.

D.

Use Data Fusion to build and execute ETL pipelines.

Question 59

You want to rebuild your batch pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over twelve hours to run. To expedite development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting speed and processing requirements?

Options:

A.

Convert your PySpark commands into Spark SQL queries to transform the data, and then run your pipeline on Dataproc to write the data into BigQuery.

B.

Ingest your data into Cloud SQL, convert your PySpark commands into Spark SQL queries to transform the data, and then use federated queries from BigQuery for machine learning.

C.

Ingest your data into BigQuery from Cloud Storage, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.

D.

Use the Apache Beam Python SDK to build the transformation pipelines, and write the data into BigQuery.

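For reference, a rough sketch of how the stated requirements (serverless, SQL syntax, raw data already in Cloud Storage) could be expressed with the BigQuery Python client: load the raw files into a staging table, then run a SQL transformation that writes to a new table. The bucket, dataset, and table names are placeholders, and the transformation query is illustrative only:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Load raw CSV files from Cloud Storage into a staging table (schema autodetect).
load_job = client.load_table_from_uri(
    "gs://my-bucket/raw/*.csv",          # placeholder bucket/path
    "my-project.analytics.raw_events",   # placeholder staging table
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        autodetect=True,
        skip_leading_rows=1,
    ),
)
load_job.result()  # wait for the load to finish

# Transform with standard SQL and write the results to a new table.
transform_job = client.query(
    """
    SELECT user_id, DATE(event_ts) AS event_date, COUNT(*) AS events
    FROM `my-project.analytics.raw_events`
    GROUP BY user_id, event_date
    """,
    job_config=bigquery.QueryJobConfig(
        destination="my-project.analytics.daily_events",  # placeholder output table
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    ),
)
transform_job.result()  # wait for the transformation to finish
```
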
Question 60

You are a BigQuery admin supporting a team of data consumers who run ad hoc queries and downstream reporting in tools such as Looker. All data and users are combined under a single organizational project. You recently noticed some slowness in query results and want to troubleshoot where the slowdowns are occurring. You think that there might be some job queuing or slot contention occurring as users run jobs, which slows down access to results. You need to investigate the query job information and determine where performance is being affected. What should you do?

Options:

A.

Use Cloud Monitoring to view BigQuery metrics and set up alerts that let you know when a certain percentage of slots is used.

B.

Use slot reservations for your project to ensure that you have enough query processing capacity and are able to allocate available slots to the slower queries.

C.

Use Cloud Logging to determine if any users or downstream consumers are changing or deleting access grants on tagged resources.

D.

Use available administrative resource charts to determine how slots are being used and how jobs are performing over time. Run a query on the INFORMATION_SCHEMA to review query performance.
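As a rough illustration of reviewing query job information, a query against a BigQuery INFORMATION_SCHEMA jobs view can surface queuing delays and slot consumption per job. The region qualifier, lookback window, and ordering below are assumptions:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Inspect recent query jobs for queue time and slot usage
# (region qualifier and 1-day window are placeholders).
query = """
SELECT
  job_id,
  user_email,
  creation_time,
  start_time,
  TIMESTAMP_DIFF(start_time, creation_time, MILLISECOND) AS queue_ms,
  total_slot_ms,
  total_bytes_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = 'QUERY'
ORDER BY total_slot_ms DESC
LIMIT 20
"""

for row in client.query(query).result():
    print(row.job_id, row.user_email, row.queue_ms, row.total_slot_ms)
```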