Professional-Data-Engineer Premium Exam Questions

Google Professional Data Engineer Exam Questions and Answers

Question 29

You receive data files in CSV format monthly from a third party. You need to cleanse this data, but every third month the schema of the files changes. Your requirements for implementing these transformations include:

Executing the transformations on a schedule

Enabling non-developer analysts to modify transformations

Providing a graphical tool for designing transformations

What should you do?

Options:

A. Use Cloud Dataprep to build and maintain the transformation recipes, and execute them on a scheduled basis.

B. Load each month’s CSV data into BigQuery, and write a SQL query to transform the data to a standard schema. Merge the transformed tables together with a SQL query.

C. Help the analysts write a Cloud Dataflow pipeline in Python to perform the transformation. The Python code should be stored in a revision control system and modified as the incoming data’s schema changes.

D. Use Apache Spark on Cloud Dataproc to infer the schema of the CSV file before creating a DataFrame. Then implement the transformations in Spark SQL before writing the data out to Cloud Storage and loading it into BigQuery.

Question 30

You need to migrate a 2 TB relational database to Google Cloud Platform. You do not have the resources to significantly refactor the application that uses this database, and operating cost is the primary concern.

Which service do you select for storing and serving your data?

Options:

A. Cloud Spanner

B. Cloud Bigtable

C. Cloud Firestore

D. Cloud SQL

Question 31

You are using Cloud Bigtable to persist and serve stock market data for each of the major indices. To serve the trading application, you need to access only the most recent stock prices that are streaming in. How should you design your row key and tables to ensure that you can access the data with the simplest query?

Options:

A. Create one unique table for all of the indices, and then use the index and timestamp as the row key design.

B. Create one unique table for all of the indices, and then use a reverse timestamp as the row key design.

C. For each index, have a separate table and use a timestamp as the row key design.

D. For each index, have a separate table and use a reverse timestamp as the row key design.
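
The "reverse timestamp" row keys mentioned in the options rely on Cloud Bigtable storing rows in lexicographic order of row key. The following is a minimal sketch of that idea only, assuming millisecond epoch timestamps that fit in 13 digits. The helper name `reverse_timestamp_key`, the constant `MAX_TS_MS`, and the sample timestamps are hypothetical and do not come from the question.

```python
# Assumed upper bound: the largest 13-digit millisecond timestamp.
MAX_TS_MS = 10**13 - 1


def reverse_timestamp_key(ts_ms: int) -> str:
    """Return a zero-padded key that sorts newest-first as a string."""
    if not 0 <= ts_ms <= MAX_TS_MS:
        raise ValueError("timestamp outside the assumed 13-digit range")
    # Subtracting from the maximum makes later timestamps yield smaller keys;
    # zero-padding keeps string order identical to numeric order.
    return f"{MAX_TS_MS - ts_ms:013d}"


# Hypothetical sample timestamps, one minute apart, oldest first.
timestamps = [1_700_000_000_000, 1_700_000_060_000, 1_700_000_120_000]

# Sorting the keys as strings mimics the lexicographic row order.
keys = sorted(reverse_timestamp_key(t) for t in timestamps)
newest_key = keys[0]
```

With keys built this way, the smallest key belongs to the newest timestamp, so a scan from the start of the key range reaches the most recent row first.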

Question 32

You are managing a Cloud Dataproc cluster. You need to make a job run faster while minimizing costs, without losing work in progress on your cluster. What should you do?

Options:

A. Increase the cluster size with more non-preemptible workers.

B. Increase the cluster size with preemptible worker nodes, and configure them to forcefully decommission.

C. Increase the cluster size with preemptible worker nodes, and use Cloud Stackdriver to trigger a script to preserve work.

D. Increase the cluster size with preemptible worker nodes, and configure them to use graceful decommissioning.