You need to modernize your existing on-premises data strategy. Your organization currently uses:
• Apache Hadoop clusters for processing multiple large data sets, including on-premises Hadoop Distributed File System (HDFS) for data replication.
• Apache Airflow to orchestrate hundreds of ETL pipelines with thousands of job steps.
You need to set up a new architecture in Google Cloud that can handle your Hadoop workloads and requires minimal changes to your existing orchestration processes. What should you do?
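For context on what "minimal changes to orchestration" can look like, below is a minimal sketch of an Airflow DAG that submits an existing Hadoop job to Dataproc through the Google provider operators, assuming Cloud Composer (or any Airflow 2.4+ install with apache-airflow-providers-google). The project, region, cluster, bucket, and jar names are hypothetical placeholders.

```python
# Sketch: existing Airflow orchestration submitting a Hadoop job to Dataproc.
# Project, region, cluster, and gs:// paths below are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocSubmitJobOperator,
)

PROJECT_ID = "my-project"      # hypothetical
REGION = "us-central1"         # hypothetical
CLUSTER_NAME = "etl-cluster"   # hypothetical

# A Hadoop MapReduce job definition; existing job jars can be staged in
# Cloud Storage and referenced here instead of HDFS paths.
HADOOP_JOB = {
    "reference": {"project_id": PROJECT_ID},
    "placement": {"cluster_name": CLUSTER_NAME},
    "hadoop_job": {
        "main_jar_file_uri": "gs://my-bucket/jars/etl-job.jar",  # hypothetical
        "args": ["gs://my-bucket/input/", "gs://my-bucket/output/"],
    },
}

with DAG(
    dag_id="hadoop_etl_on_dataproc",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ keyword; earlier versions use schedule_interval
    catchup=False,
) as dag:
    submit_etl = DataprocSubmitJobOperator(
        task_id="submit_etl",
        project_id=PROJECT_ID,
        region=REGION,
        job=HADOOP_JOB,
    )
```

Because the DAG structure is unchanged apart from the operator, existing pipelines with thousands of job steps can keep their scheduling and dependency logic as-is.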
You are building a real-time prediction engine that streams files, which may contain PII (personally identifiable information), into Cloud Storage and eventually into BigQuery. You want to ensure that the sensitive data is masked but still maintains referential integrity, because names and emails are often used as join keys. How should you use the Cloud Data Loss Prevention API (DLP API) to ensure that the PII data is not accessible by unauthorized individuals?
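For reference, a minimal sketch of calling the DLP API with deterministic encryption, which maps equal inputs to equal surrogate tokens and therefore preserves referential integrity for join keys. The project, key ring, and wrapped-key values are hypothetical placeholders, assuming the google-cloud-dlp Python client.

```python
# Sketch: de-identify PII via the DLP API using deterministic encryption.
# Equal inputs produce equal tokens, so joins on names/emails still work.
# Project and KMS key names are hypothetical placeholders.
import google.cloud.dlp_v2

dlp = google.cloud.dlp_v2.DlpServiceClient()
parent = "projects/my-project"  # hypothetical project

INFO_TYPES = [{"name": "EMAIL_ADDRESS"}, {"name": "PERSON_NAME"}]

deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {
                "info_types": INFO_TYPES,
                "primitive_transformation": {
                    "crypto_deterministic_config": {
                        "crypto_key": {
                            "kms_wrapped": {
                                "wrapped_key": b"...",  # hypothetical KMS-wrapped key bytes
                                "crypto_key_name": "projects/my-project/locations/global/keyRings/kr/cryptoKeys/dlp-key",  # hypothetical
                            }
                        },
                        "surrogate_info_type": {"name": "PII_TOKEN"},
                    }
                },
            }
        ]
    }
}

response = dlp.deidentify_content(
    request={
        "parent": parent,
        "deidentify_config": deidentify_config,
        "inspect_config": {"info_types": INFO_TYPES},
        "item": {"value": "Contact Jane Doe at jane@example.com"},
    }
)
# The same email always yields the same surrogate token.
print(response.item.value)
```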
You need to deploy additional dependencies to all nodes of a Cloud Dataproc cluster at startup using an existing initialization action. Company security policies require that Cloud Dataproc nodes do not have access to the Internet, so public initialization actions cannot fetch resources. What should you do?
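For context, a minimal sketch of creating a cluster whose initialization action is read from a Cloud Storage bucket inside your own project, so nodes never need Internet access. It assumes the google-cloud-dataproc Python client, and that the script and the dependencies it installs have already been copied into the private bucket; all names are hypothetical placeholders.

```python
# Sketch: Dataproc cluster with an initialization action staged in a
# private Cloud Storage bucket (no Internet access required on nodes).
# Project, region, bucket, and cluster names are hypothetical.
from google.cloud import dataproc_v1

REGION = "us-central1"  # hypothetical
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": "my-project",  # hypothetical
    "cluster_name": "no-internet-cluster",
    "config": {
        "initialization_actions": [
            {
                # Mirrored copy of the initialization action plus every
                # resource it fetches, staged in the private bucket.
                "executable_file": "gs://my-private-bucket/init/install-deps.sh"
            }
        ],
        "gce_cluster_config": {
            # Nodes get internal IPs only, per the security policy.
            "internal_ip_only": True,
        },
    },
}

operation = client.create_cluster(
    request={"project_id": "my-project", "region": REGION, "cluster": cluster}
)
operation.result()  # block until the cluster is ready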
Your company maintains a hybrid deployment with GCP, where analytics are performed on your anonymized customer data. The data is imported to Cloud Storage from your data center through parallel uploads to a data transfer server running on GCP. Management informs you that the daily transfers take too long and has asked you to fix the problem. You want to maximize transfer speeds. Which action should you take?
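For reference, a minimal sketch of issuing concurrent uploads to Cloud Storage from Python with the transfer manager helper (available in google-cloud-storage 2.7+); the same effect is available from the CLI with `gsutil -m cp`. The bucket, file names, and local directory are hypothetical placeholders.

```python
# Sketch: parallel uploads to Cloud Storage using the transfer manager.
# Bucket, filenames, and source directory are hypothetical.
from google.cloud import storage
from google.cloud.storage import transfer_manager

client = storage.Client()
bucket = client.bucket("my-analytics-bucket")  # hypothetical bucket

filenames = ["day1.csv", "day2.csv", "day3.csv"]  # hypothetical files

results = transfer_manager.upload_many_from_filenames(
    bucket,
    filenames,
    source_directory="/exports",  # hypothetical local directory
    max_workers=8,                # number of concurrent uploads
)

for name, result in zip(filenames, results):
    # Each result is None on success or the exception that was raised.
    if isinstance(result, Exception):
        print(f"upload failed for {name}: {result}")
    else:
        print(f"uploaded {name}")
```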