Spring Sale 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: save70

Google Cloud Certified Professional-Data-Engineer Release Date

Google Professional Data Engineer Exam Questions and Answers

Question 57

You need to modernize your existing on-premises data strategy. Your organization currently uses.

• Apache Hadoop clusters for processing multiple large data sets, including on-premises Hadoop Distributed File System (HDFS) for data replication.

• Apache Airflow to orchestrate hundreds of ETL pipelines with thousands of job steps.

You need to set up a new architecture in Google Cloud that can handle your Hadoop workloads and requires minimal changes to your existing orchestration processes. What should you do?

Options:

A.

Use Dataproc to migrate Hadoop clusters to Google Cloud, and Cloud Storage to handle any HDFS use cases Convert your ETL pipelines to Dataflow.

B.

Use Bigtable for your large workloads, with connections to Cloud Storage to handle any HDFS use cases Orchestrate your pipelines with Cloud Composer.

C.

Use Dataproc to migrate your Hadoop clusters to Google Cloud, and Cloud Storage to handle any HDFS use cases. Use Cloud Data Fusion to visually design and deploy your ETL pipelines.

D.

Use Dataproc to migrate Hadoop clusters to Google Cloud, and Cloud Storage to handle any HDFS use cases.Orchestrate your pipelines with Cloud Composer..

Question 58

You are building a teal-lime prediction engine that streams files, which may contain Pll (personal identifiable information) data, into Cloud Storage and eventually into BigQuery You want to ensure that the sensitive data is masked but still maintains referential Integrity, because names and emails are often used as join keys How should you use the Cloud Data Loss Prevention API (DLP API) to ensure that the Pll data is not accessible by unauthorized individuals?

Options:

A.

Create a pseudonym by replacing the Pll data with cryptogenic tokens, and store the non-tokenized data in a locked-down button.

B.

Redact all Pll data, and store a version of the unredacted data in a locked-down bucket

C.

Scan every table in BigQuery, and mask the data it finds that has Pll

D.

Create a pseudonym by replacing Pll data with a cryptographic format-preserving token

Question 59

You need to deploy additional dependencies to all of a Cloud Dataproc cluster at startup using an existing initialization action. Company security policies require that Cloud Dataproc nodes do not have access to the Internet so public initialization actions cannot fetch resources. What should you do?

Options:

A.

Deploy the Cloud SQL Proxy on the Cloud Dataproc master

B.

Use an SSH tunnel to give the Cloud Dataproc cluster access to the Internet

C.

Copy all dependencies to a Cloud Storage bucket within your VPC security perimeter

D.

Use Resource Manager to add the service account used by the Cloud Dataproc cluster to the Network User role

Question 60

Your company maintains a hybrid deployment with GCP, where analytics are performed on your anonymized customer data. The data are imported to Cloud Storage from your data center through parallel uploads to a data transfer server running on GCP. Management informs you that the daily transfers take too long and have

asked you to fix the problem. You want to maximize transfer speeds. Which action should you take?

Options:

A.

Increase the CPU size on your server.

B.

Increase the size of the Google Persistent Disk on your server.

C.

Increase your network bandwidth from your datacenter to GCP.

D.

Increase your network bandwidth from Compute Engine to Cloud Storage.