Google Professional Data Engineer Exam Questions and Answers

Question 17

The marketing team at your organization provides regular updates of a segment of your customer dataset. The marketing team has given you a CSV with 1 million records that must be updated in BigQuery. When you use the UPDATE statement in BigQuery, you receive a quotaExceeded error. What should you do?

Options:

A.

Reduce the number of records updated each day to stay within the BigQuery UPDATE DML statement limit.

B.

Increase the BigQuery UPDATE DML statement limit in the Quota management section of the Google Cloud Platform Console.

C.

Split the source CSV file into smaller CSV files in Cloud Storage to reduce the number of BigQuery UPDATE DML statements per BigQuery job.

D.

Import the new records from the CSV file into a new BigQuery table. Create a BigQuery job that merges the new records with the existing records and writes the results to a new BigQuery table.
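
To make the approach in option D concrete, here is a minimal sketch in BigQuery SQL, run as a query job whose destination is set to a new table; a query job involves no DML at all, so it is not subject to the UPDATE statement quota. The table and column names (customers, customer_updates, customer_id, segment) are hypothetical, with customer_updates assumed to hold the 1 million rows loaded from the CSV:

    -- Prefer the updated value when a customer appears in both tables;
    -- the job writes this result set to a new destination table.
    SELECT
      COALESCE(u.customer_id, c.customer_id) AS customer_id,
      COALESCE(u.segment, c.segment) AS segment
    FROM `mydataset.customers` AS c
    FULL OUTER JOIN `mydataset.customer_updates` AS u
      ON c.customer_id = u.customer_id;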

Question 18

A data scientist has created a BigQuery ML model and asks you to create an ML pipeline to serve predictions. You have a REST API application with the requirement to serve predictions for an individual user ID with latency under 100 milliseconds. You use the following query to generate predictions: SELECT predicted_label, user_id FROM ML.PREDICT(MODEL `dataset.model`, TABLE user_features). How should you create the ML pipeline?

Options:

A.

Add a WHERE clause to the query, and grant the BigQuery Data Viewer role to the application service account.

B.

Create an Authorized View with the provided query. Share the dataset that contains the view with the application service account.

C.

Create a Cloud Dataflow pipeline using BigQueryIO to read results from the query. Grant the Dataflow Worker role to the application service account.

D.

Create a Cloud Dataflow pipeline using BigQueryIO to read predictions for all users from the query. Write the results to Cloud Bigtable using BigtableIO. Grant the Bigtable Reader role to the application service account so that the application can read predictions for individual users from Cloud Bigtable.
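
To make the precompute-and-serve pattern in option D concrete: the Dataflow pipeline would periodically run a batch prediction query such as the sketch below, then write one Bigtable row per user_id so the REST API can serve each request with a single-row read well under 100 milliseconds. Here, `dataset.user_features` is assumed to be the user_features table referenced in the question:

    -- Batch-score every user; Dataflow reads this result set via BigQueryIO
    -- and writes it to Bigtable keyed by user_id.
    SELECT user_id, predicted_label
    FROM ML.PREDICT(MODEL `dataset.model`, TABLE `dataset.user_features`);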

Question 19

You have a network of 1000 sensors. The sensors generate time series data: one metric per sensor per second, along with a timestamp. You already have 1 TB of data, and expect the data to grow by 1 GB every day. You need to access this data in two ways. The first access pattern requires retrieving the metric from one specific sensor at a specific timestamp, with a median single-digit millisecond latency. The second access pattern requires running complex analytic queries on the data, including joins, once a day. How should you store this data?

Options:

A.

Store your data in Bigtable. Concatenate the sensor ID and timestamp, and use it as the row key. Perform an export to BigQuery every day.

B.

Store your data in BigQuery. Concatenate the sensor ID and timestamp, and use it as the primary key.

C.

Store your data in Bigtable. Concatenate the sensor ID and metric, and use it as the row key. Perform an export to BigQuery every day.

D.

Store your data in BigQuery. Use the metric as a primary key.
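
For background on the two access patterns: a Bigtable row key built as sensor ID plus timestamp (for example, sensor4217#1719835200) makes the first pattern a single-row lookup, while the daily export lets the second pattern run in BigQuery. A sketch of such a daily analytic query, with hypothetical table and column names (sensor_readings, sensors, sensor_id, metric, region, reading_time):

    -- Daily analytics over the exported data, including a join.
    SELECT r.sensor_id, s.region, AVG(r.metric) AS avg_metric
    FROM `mydataset.sensor_readings` AS r
    JOIN `mydataset.sensors` AS s
      ON r.sensor_id = s.sensor_id
    WHERE r.reading_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
    GROUP BY r.sensor_id, s.region;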

Question 20

Which of these statements about BigQuery caching is true?

Options:

A.

By default, a query's results are not cached.

B.

BigQuery caches query results for 48 hours.

C.

Query results are cached even if you specify a destination table.

D.

There is no charge for a query that retrieves its results from cache.
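
For background: by default, BigQuery caches query results for approximately 24 hours, a query answered from the cache incurs no charge, and specifying a destination table (or using non-deterministic functions such as CURRENT_TIMESTAMP()) makes a query ineligible for caching. A quick illustration against a public dataset:

    -- The first run is billed; an identical re-run within ~24 hours is
    -- normally answered from the result cache at no charge.
    SELECT corpus, COUNT(*) AS row_count
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY corpus;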