
Google Professional Data Engineer Exam Questions and Answers

Question 13

Your company receives both batch- and stream-based event data. You want to process the data using Google Cloud Dataflow over a predictable time period. However, you realize that in some instances data can arrive late or out of order. How should you design your Cloud Dataflow pipeline to handle data that is late or out of order?

Options:

A.

Set a single global window to capture all the data.

B.

Set sliding windows to capture all the lagged data.

C.

Use watermarks and timestamps to capture the lagged data.

D.

Ensure every datasource type (stream or batch) has a timestamp, and use the timestamps to define the logic for lagged data.
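For background on the windowing concepts this question tests, the sketch below shows how event timestamps, watermarks, and allowed lateness work together in the Apache Beam Python SDK (the programming model behind Cloud Dataflow) so that late or out-of-order elements are still assigned to the correct event-time window. The Pub/Sub topic, the JSON event layout with an event_time field, and the window and lateness durations are hypothetical placeholders.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import trigger, window


def attach_event_timestamp(msg_bytes):
    """Hypothetical parser: events are JSON with an epoch-seconds 'event_time' field."""
    event = json.loads(msg_bytes.decode("utf-8"))
    # Attach the event timestamp so windowing uses event time, not arrival time.
    return window.TimestampedValue(event, event["event_time"])


options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadEvents" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/events")         # hypothetical topic
        | "AttachTimestamps" >> beam.Map(attach_event_timestamp)
        | "WindowWithLateness" >> beam.WindowInto(
            window.FixedWindows(60),                            # 1-minute event-time windows
            trigger=trigger.AfterWatermark(                     # fire when the watermark passes
                late=trigger.AfterCount(1)),                    # then re-fire for late elements
            allowed_lateness=600,                               # accept data up to 10 minutes late
            accumulation_mode=trigger.AccumulationMode.ACCUMULATING)
        | "KeyByType" >> beam.Map(lambda event: (event.get("type", "unknown"), 1))
        | "CountPerWindow" >> beam.CombinePerKey(sum)
        | "Log" >> beam.Map(print)
    )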

Question 14

Your company's data platform ingests CSV file dumps of booking and user profile data from upstream sources into Cloud Storage. The data analyst team wants to join these datasets on the email field, which is available in both datasets, to perform analysis. However, personally identifiable information (PII) should not be accessible to the analysts. You need to de-identify the email field in both datasets before loading them into BigQuery for the analysts. What should you do?

Options:

A.

1. Create a pipeline to de-identify the email field by using recordTransformations in Cloud Data Loss Prevention (Cloud DLP) with masking as the de-identification transformation type.

2. Load the booking and user profile data into a BigQuery table.

B.

1. Create a pipeline to de-identify the email field by using recordTransformations in Cloud DLP with format-preserving encryption with FFX as the de-identification transformation type.

2. Load the booking and user profile data into a BigQuery table.

C.

1. Load the CSV files from Cloud Storage into a BigQuery table, and enable dynamic data masking.

2. Create a policy tag with the email mask as the data masking rule.

3. Assign the policy to the email field in both tables.

4. Assign the Identity and Access Management bigquerydatapolicy.maskedReader role for the BigQuery tables to the analysts.

D.

1. Load the CSV files from Cloud Storage into a BigQuery table, and enable dynamic data masking.

2. Create a policy tag with the default masking value as the data masking rule.

3. Assign the policy to the email field in both tables.

4. Assign the Identity and Access Management bigquerydatapolicy.maskedReader role for the BigQuery tables to the analysts.
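For background on the de-identification options above, the sketch below sends a table through Cloud DLP's deidentify_content API with a recordTransformations configuration that applies format-preserving encryption (FFX) to only the email column, using the google-cloud-dlp Python client. The project path, the demo key, the custom alphabet, and the sample row are all hypothetical; a production pipeline would use a KMS-wrapped key and stream the real CSV contents through the same configuration.

from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project/locations/global"            # hypothetical project

# A toy two-column table standing in for a parsed CSV dump.
item = {
    "table": {
        "headers": [{"name": "booking_id"}, {"name": "email"}],
        "rows": [
            {"values": [{"string_value": "b-1001"},
                        {"string_value": "alice@example.com"}]},
        ],
    }
}

deidentify_config = {
    "record_transformations": {
        "field_transformations": [
            {
                # Transform only the email column; other columns pass through.
                "fields": [{"name": "email"}],
                "primitive_transformation": {
                    "crypto_replace_ffx_fpe_config": {
                        # Demo-only unwrapped 32-byte key; reuse the same key for
                        # both datasets so encrypted emails still join.
                        "crypto_key": {"unwrapped": {"key": b"0" * 32}},
                        # FFX requires every input character to be in the alphabet,
                        # so emails need a custom alphabet (or a surrogate setup).
                        "custom_alphabet": "abcdefghijklmnopqrstuvwxyz0123456789@._-",
                    }
                },
            }
        ]
    }
}

response = dlp.deidentify_content(
    request={"parent": parent,
             "deidentify_config": deidentify_config,
             "item": item})
print(response.item.table)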

Question 15

Your company is currently setting up data pipelines for their campaign. For all the Google Cloud Pub/Sub streaming data, one of the important business requirements is to be able to periodically identify the inputs and their timings during their campaign. Engineers have decided to use windowing and transformation in Google Cloud Dataflow for this purpose. However, when testing this feature, they find that the Cloud Dataflow job fails for all streaming inserts. What is the most likely cause of this problem?

Options:

A.

They have not assigned the timestamp, which causes the job to fail

B.

They have not set the triggers to accommodate the data coming in late, which causes the job to fail

C.

They have not applied a global windowing function, which causes the job to fail when the pipeline is created

D.

They have not applied a non-global windowing function, which causes the job to fail when the pipeline is created
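To illustrate the windowing behavior these options refer to, the minimal Apache Beam Python sketch below contrasts grouping an unbounded Pub/Sub stream in the default global window, which is rejected, with applying a non-global (fixed) window before the aggregation. The topic name, key choice, and window size are hypothetical.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    keyed = (
        pipeline
        | "ReadCampaignEvents" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/campaign")    # hypothetical topic
        | "KeyByInput" >> beam.Map(lambda msg: (msg, 1))
    )

    # Grouping the unbounded stream while it is still in the default global
    # window (with the default trigger) is rejected by Beam/Dataflow, which is
    # the failure mode this question describes:
    # broken = keyed | "GroupInGlobalWindow" >> beam.GroupByKey()

    # Applying a non-global window first lets each window be emitted
    # periodically as the watermark advances, so the inputs and their timings
    # can be inspected per window.
    counts = (
        keyed
        | "FixedWindows" >> beam.WindowInto(window.FixedWindows(300))  # 5-minute windows
        | "CountPerInput" >> beam.CombinePerKey(sum)
    )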

Question 16

You have uploaded 5 years of log data to Cloud Storage. A user reported that some data points in the log data are outside of their expected ranges, which indicates errors. You need to address this issue and be able to run the process again in the future while keeping the original data for compliance reasons. What should you do?

Options:

A.

Import the data from Cloud Storage into BigQuery. Create a new BigQuery table, and skip the rows with errors.

B.

Create a Compute Engine instance and create a new copy of the data in Cloud Storage. Skip the rows with errors.

C.

Create a Cloud Dataflow workflow that reads the data from Cloud Storage, checks for values outside the expected range, sets the value to an appropriate default, and writes the updated records to a new dataset in Cloud Storage.

D.

Create a Cloud Dataflow workflow that reads the data from Cloud Storage, checks for values outside the expected range, sets the value to an appropriate default, and writes the updated records to the same dataset in Cloud Storage.
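As a concrete illustration of the "clean into a new location while keeping the original" pattern described in option C, here is a hedged Apache Beam Python sketch. The bucket paths, the "timestamp,metric" line layout, the expected range, and the default value are all assumptions for the example.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

EXPECTED_MIN, EXPECTED_MAX = 0.0, 1000.0   # hypothetical valid range for the metric
DEFAULT_VALUE = 0.0                        # default used for out-of-range or unparseable values


def clean_record(line):
    """Parse hypothetical 'timestamp,metric' lines and reset bad metrics to the default."""
    timestamp, raw_value = line.split(",", 1)
    try:
        value = float(raw_value)
    except ValueError:
        value = DEFAULT_VALUE
    if not EXPECTED_MIN <= value <= EXPECTED_MAX:
        value = DEFAULT_VALUE
    return f"{timestamp},{value}"


with beam.Pipeline(options=PipelineOptions()) as pipeline:
    (
        pipeline
        # The original files in gs://logs-raw stay untouched for compliance.
        | "ReadRawLogs" >> beam.io.ReadFromText("gs://logs-raw/*.csv")
        | "CleanRecords" >> beam.Map(clean_record)
        # Cleaned copies land in a separate location, so the job can be rerun safely.
        | "WriteCleaned" >> beam.io.WriteToText("gs://logs-cleaned/logs")
    )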