
Cloudera CCA175 Dumps

Page: 1 / 4
Total 96 questions

CCA Spark and Hadoop Developer Exam Questions and Answers

Question 1

Problem Scenario 45: You have been given 2 files, with the content as given below.

(spark12/technology.txt)

(spark12/salary.txt)

(spark12/technology.txt)

first,last,technology

Amit,Jain,java

Lokesh,kumar,unix

Mithun,kale,spark

Rajni,vekat,hadoop

Rahul,Yadav,scala

(spark12/salary.txt)

first,last,salary

Amit,Jain,100000

Lokesh,kumar,95000

Mithun,kale,150000

Rajni,vekat,154000

Rahul,Yadav,120000

Write a Spark program which will join the data based on first and last name and save the joined results in the following format: first,last,technology,salary
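One possible solution (a minimal spark-shell sketch in Scala; dropping the header rows and the output directory spark12/p45_joined are assumptions):

val tech = sc.textFile("spark12/technology.txt")
  .filter(line => !line.startsWith("first,"))                      // drop the header row
  .map { line => val f = line.split(","); ((f(0), f(1)), f(2)) }   // ((first, last), technology)
val sal = sc.textFile("spark12/salary.txt")
  .filter(line => !line.startsWith("first,"))
  .map { line => val f = line.split(","); ((f(0), f(1)), f(2)) }   // ((first, last), salary)
val joined = tech.join(sal)
  .map { case ((first, last), (technology, salary)) => s"$first,$last,$technology,$salary" }
joined.saveAsTextFile("spark12/p45_joined")                        // output path is an assumption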


Question 2

Problem Scenario 73: You have been given data in JSON format, as below.

{"first_name":"Ankit", "last_name":"Jain"}

{"first_name":"Amir", "last_name":"Khan"}

{"first_name":"Rajesh", "last_name":"Khanna"}

{"first_name":"Priynka", "last_name":"Chopra"}

{"first_name":"Kareena", "last_name":"Kapoor"}

{"first_name":"Lokesh", "last_name":"Yadav"}

Do the following activities:

1. Create an employee.json file locally.

2. Load this file into HDFS.

3. Register this data as a temp table in Spark using Python.

4. Write a select query and print this data.

5. Now save this selected data back in JSON format.
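One possible solution (a pyspark sketch for a Spark 1.x sqlContext; the HDFS paths under /user/cloudera are assumptions):

# in a shell first: hdfs dfs -put employee.json /user/cloudera/employee.json
emp = sqlContext.read.json("/user/cloudera/employee.json")
emp.registerTempTable("employee")                 # register as a temp table for SQL access
result = sqlContext.sql("select first_name, last_name from employee")
for row in result.collect():
    print(row)
result.write.json("/user/cloudera/employee_out")  # output path is an assumption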


Question 3

Problem Scenario 27: You need to implement a near-real-time solution for collecting information as it is submitted in files, with the below information.

Data

echo "IBM,100,20160104" >> /tmp/spooldir/bb/.bb.txt

echo "IBM,103,20160105" >> /tmp/spooldir/bb/.bb.txt

mv /tmp/spooldir/bb/.bb.txt /tmp/spooldir/bb/bb.txt

After a few minutes:

echo "IBM,100.2,20160104" >> /tmp/spooldir/dr/.dr.txt

echo "IBM,103.1,20160105" >> /tmp/spooldir/dr/.dr.txt

mv /tmp/spooldir/dr/.dr.txt /tmp/spooldir/dr/dr.txt

Requirements:

You have been given the below directory location (if not available then create it): /tmp/spooldir. You have a financial subscription for getting stock prices from Bloomberg as well as Reuters, and using FTP you download new files every hour from their respective FTP sites into the directories /tmp/spooldir/bb and /tmp/spooldir/dr respectively.

As soon as a file is committed in these directories, it needs to be available in HDFS in the /tmp/flume/finance location, in a single directory.

Write a Flume configuration file named flume7.conf and use it to load the data into HDFS with the following additional properties:

1. Spool /tmp/spooldir/bb and /tmp/spooldir/dr

2. File prefix in HDFS should be events

3. File suffix should be .log

4. If a file is not committed and still in use, it should have _ as a prefix.

5. Data should be written as text to HDFS
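One possible flume7.conf (a sketch; the agent name agent1, the component names, and the choice of a file channel are assumptions):

agent1.sources = bb dr
agent1.channels = ch1
agent1.sinks = hdfssink

agent1.sources.bb.type = spooldir
agent1.sources.bb.spoolDir = /tmp/spooldir/bb
agent1.sources.bb.channels = ch1
agent1.sources.dr.type = spooldir
agent1.sources.dr.spoolDir = /tmp/spooldir/dr
agent1.sources.dr.channels = ch1

agent1.sinks.hdfssink.type = hdfs
agent1.sinks.hdfssink.hdfs.path = /tmp/flume/finance
agent1.sinks.hdfssink.hdfs.filePrefix = events
agent1.sinks.hdfssink.hdfs.fileSuffix = .log
agent1.sinks.hdfssink.hdfs.inUsePrefix = _
agent1.sinks.hdfssink.hdfs.fileType = DataStream
agent1.sinks.hdfssink.channel = ch1

agent1.channels.ch1.type = file

The agent could then be started with: flume-ng agent --conf-file flume7.conf --name agent1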


Question 4

Problem Scenario 46: You have been given the below list in Scala (name, sex, cost), one entry for each piece of work done.

List( ("Deeapak" , "male", 4000), ("Deepak" , "male", 2000), ("Deepika" , "female", 2000),("Deepak" , "female", 2000), ("Deepak" , "male", 1000) , ("Neeta" , "female", 2000))

Now write a Spark program to load this list as an RDD and compute the sum of cost for each combination of name and sex (as the key).
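One possible answer (a spark-shell sketch; the variable names are illustrative):

val work = sc.parallelize(List( ("Deeapak", "male", 4000), ("Deepak", "male", 2000), ("Deepika", "female", 2000), ("Deepak", "female", 2000), ("Deepak", "male", 1000), ("Neeta", "female", 2000)))
// key each record by (name, sex), then sum the costs per key
val totals = work.map { case (name, sex, cost) => ((name, sex), cost) }.reduceByKey(_ + _)
totals.collect.foreach(println)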


Question 5

Problem Scenario 74: You have been given a MySQL DB with the following details.

user=retail_dba

password=cloudera

database=retail_db

table=retail_db.orders

table=retail_db.order_items

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Columns of the orders table: (order_id, order_date, order_customer_id, order_status)

Columns of the order_items table: (order_item_id, order_item_order_id, order_item_product_id, order_item_quantity, order_item_subtotal, order_item_product_price)

Please accomplish the following activities.

1. Copy the "retail_db.orders" and "retail_db.order_items" tables to HDFS in the respective directories p89_orders and p89_order_items.

2. Join these data using order_id in Spark and Python.

3. Now fetch selected columns from the joined data: order_id, order_date and the amount collected on this order.

4. Calculate the total orders placed for each date, and produce the output sorted by date.
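One possible solution (a sketch; comma-delimited parsing of the Sqoop output, which is Sqoop's default, is an assumption):

# shell: import both tables
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username retail_dba --password cloudera --table orders --target-dir p89_orders
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username retail_dba --password cloudera --table order_items --target-dir p89_order_items

# pyspark: join and aggregate
orders = sc.textFile("p89_orders").map(lambda l: l.split(","))
items = sc.textFile("p89_order_items").map(lambda l: l.split(","))
ordersKV = orders.map(lambda o: (int(o[0]), o[1]))         # (order_id, order_date)
itemsKV = items.map(lambda i: (int(i[1]), float(i[4])))    # (order_item_order_id, subtotal)
joined = ordersKV.join(itemsKV)                            # (order_id, (order_date, subtotal))
# step 3: (order_id, order_date) -> amount collected
perOrder = joined.map(lambda t: ((t[0], t[1][0]), t[1][1])).reduceByKey(lambda a, b: a + b)
# step 4: orders placed per date, sorted by date
perDate = orders.map(lambda o: (o[1], 1)).reduceByKey(lambda a, b: a + b).sortByKey()
for r in perDate.collect():
    print(r)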


Question 6

Problem Scenario 16: You have been given the following MySQL database details as well as other info.

user=retail_dba

password=cloudera

database=retail_db

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish the below assignment.

1. Create a table in Hive as below.

create table departments_hive(department_id int, department_name string);

2. Now import data from the mysql table departments into this hive table. Please make sure that the data is visible using the below hive command: select * from departments_hive
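One possible Sqoop command (a sketch; the single mapper and the \001 field delimiter, which matches Hive's default, are assumptions):

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --hive-import --hive-table departments_hive \
  --fields-terminated-by '\001' -m 1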


Question 7

Problem Scenario 12: You have been given the following MySQL database details as well as other info.

user=retail_dba

password=cloudera

database=retail_db

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish the following.

1. Create a table in retail_db with the following definition.

CREATE table departments_new (department_id int(11), department_name varchar(45), created_date TIMESTAMP DEFAULT NOW());

2. Now insert records from the departments table into departments_new.

3. Now import data from the departments_new table to HDFS.

4. Insert the following 5 records into the departments_new table.

Insert into departments_new values(110, "Civil", null);
Insert into departments_new values(111, "Mechanical", null);
Insert into departments_new values(112, "Automobile", null);
Insert into departments_new values(113, "Pharma", null);
Insert into departments_new values(114, "Social Engineering", null);

5. Now do the incremental import based on the created_date column.
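One possible sequence (a sketch; the HDFS target directory and the --last-value placeholder are assumptions):

-- in mysql: steps 1 and 2
CREATE table departments_new (department_id int(11), department_name varchar(45), created_date TIMESTAMP DEFAULT NOW());
insert into departments_new select department_id, department_name, null from departments;

# shell: step 3, full import to HDFS
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username retail_dba --password cloudera --table departments_new --target-dir /user/cloudera/departments_new -m 1

# step 5, after the inserts of step 4: incremental import on created_date
# (replace <max created_date from step 3> with the actual timestamp recorded earlier)
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username retail_dba --password cloudera --table departments_new --target-dir /user/cloudera/departments_new --incremental append --check-column created_date --last-value "<max created_date from step 3>" -m 1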


Question 8

Problem Scenario 55: You have been given the below code snippet.

val pairRDD1 = sc.parallelize(List( ("cat",2), ("cat", 5), ("book", 4), ("cat", 12)))

val pairRDD2 = sc.parallelize(List( ("cat",2), ("cup", 5), ("mouse", 4), ("cat", 12)))

operation1

Write a correct code snippet for operation1 which will produce the desired output, shown below.

Array[(String, (Option[Int], Option[Int]))] = Array((book,(Some(4),None)), (mouse,(None,Some(4))), (cup,(None,Some(5))), (cat,(Some(2),Some(2))), (cat,(Some(2),Some(12))), (cat,(Some(5),Some(2))), (cat,(Some(5),Some(12))), (cat,(Some(12),Some(2))), (cat,(Some(12),Some(12))))
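Since both sides of each result are wrapped in Option, the output matches a full outer join:

pairRDD1.fullOuterJoin(pairRDD2).collect  // keeps keys present in either RDD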


Question 9

Problem Scenario 86: In continuation of the previous question, please accomplish the following activities.

1. Select the maximum, minimum, average, standard deviation, and total quantity.

2. Select the minimum and maximum price for each product code.

3. Select the maximum, minimum, average, standard deviation, and total quantity for each product code; however, make sure average and standard deviation have at most two decimal places.

4. Select all the product codes and the average price, only where the product count is more than or equal to 3.

5. Select the maximum, minimum, average and total of all the products for each code. Also produce the same across all the products.
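The base table comes from the previous scenario, which is not shown here. Assuming it is registered as product with columns code, name, quantity and price (the table and column names are assumptions), the queries could look like:

select max(price), min(price), avg(price), stddev(price), sum(quantity) from product;                                          -- 1
select code, min(price), max(price) from product group by code;                                                                 -- 2
select code, max(price), min(price), round(avg(price), 2), round(stddev(price), 2), sum(quantity) from product group by code;   -- 3
select code, avg(price) from product group by code having count(*) >= 3;                                                        -- 4
select code, max(price), min(price), avg(price), sum(price) from product group by code;                                         -- 5, per code
select max(price), min(price), avg(price), sum(price) from product;                                                             -- 5, across all products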


Question 10

Problem Scenario 54: You have been given the below code snippet.

val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"))

val b = a.map(x => (x.length, x))

operation1

Write a correct code snippet for operation1 which will produce the desired output, shown below.

Array[(Int, String)] = Array((4,lion), (7,panther), (3,dogcat), (5,tigereagle))
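The keys are word lengths and same-length words are concatenated, so operation1 is a reduceByKey with string concatenation:

b.reduceByKey(_ + _).collect  // e.g. (3, "dog" + "cat") -> (3, "dogcat")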


Question 11

Problem Scenario 10: You have been given the following MySQL database details as well as other info.

user=retail_dba

password=cloudera

database=retail_db

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish the following.

1. Create a database named hadoopexam and then create a table named departments in it, with the following fields: department_id int, department_name string

e.g. the location should be hdfs://quickstart.cloudera:8020/user/hive/warehouse/hadoopexam.db/departments

2. Please import data from retail_db.departments into the existing Hive table hadoopexam.departments created above.

3. Please import data into a non-existing table, i.e. while importing, create a Hive table named hadoopexam.departments_new
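One possible sequence (a sketch; the single mapper and the \001 delimiter are assumptions):

hive> create database hadoopexam;
hive> create table hadoopexam.departments(department_id int, department_name string);

# step 2: import into the existing Hive table
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username retail_dba --password cloudera --table departments --hive-import --hive-table hadoopexam.departments --fields-terminated-by '\001' -m 1

# step 3: import while creating a new Hive table
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username retail_dba --password cloudera --table departments --hive-import --hive-table hadoopexam.departments_new --create-hive-table --fields-terminated-by '\001' -m 1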


Question 12

Problem Scenario 80: You have been given a MySQL DB with the following details.

user=retail_dba

password=cloudera

database=retail_db

table=retail_db.products

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Columns of products table: (product_id | product_category_id | product_name | product_description | product_price | product_image)

Please accomplish the following activities.

1. Copy the "retail_db.products" table to HDFS in a directory p93_products.

2. Now sort the products data by product price per category; use the product_category_id column to group by category.
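One possible solution (a Scala sketch; it assumes every row has a numeric product_price and that names and descriptions contain no embedded commas):

// in a shell first:
//   sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username retail_dba --password cloudera --table products --target-dir p93_products -m 1
// then in spark-shell:
val products = sc.textFile("p93_products").map(_.split(","))
// composite key (category, price): sortByKey orders by category first, then by price within each category
val sorted = products.map(p => ((p(1).toInt, p(4).toFloat), p.mkString(","))).sortByKey()
sorted.values.collect.foreach(println)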


Question 13

Problem Scenario 77: You have been given a MySQL DB with the following details.

user=retail_dba

password=cloudera

database=retail_db

table=retail_db.orders

table=retail_db.order_items

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Columns of the orders table: (order_id, order_date, order_customer_id, order_status)

Columns of the order_items table: (order_item_id, order_item_order_id, order_item_product_id, order_item_quantity, order_item_subtotal, order_item_product_price)

Please accomplish the following activities.

1. Copy the "retail_db.orders" and "retail_db.order_items" tables to HDFS in the respective directories p92_orders and p92_order_items.

2. Join these data using order_id in Spark and Python.

3. Calculate the total revenue per day and per order.

4. Calculate the total and average revenue for each date, using combineByKey or aggregateByKey.
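One possible solution (a pyspark sketch; the Sqoop imports mirror those in Problem Scenario 74 but with --target-dir p92_orders and p92_order_items, and aggregateByKey is used for step 4):

orders = sc.textFile("p92_orders").map(lambda l: l.split(","))
items = sc.textFile("p92_order_items").map(lambda l: l.split(","))
ordersKV = orders.map(lambda o: (int(o[0]), o[1]))        # (order_id, order_date)
itemsKV = items.map(lambda i: (int(i[1]), float(i[4])))   # (order_item_order_id, subtotal)
joined = ordersKV.join(itemsKV)                           # (order_id, (order_date, subtotal))
# step 3: revenue per order, keyed by (order_date, order_id)
perOrder = joined.map(lambda t: ((t[1][0], t[0]), t[1][1])).reduceByKey(lambda a, b: a + b)
# step 4: total and average revenue per date via aggregateByKey over a (sum, count) accumulator
perDate = perOrder.map(lambda t: (t[0][0], t[1])).aggregateByKey(
    (0.0, 0),
    lambda acc, v: (acc[0] + v, acc[1] + 1),
    lambda a, b: (a[0] + b[0], a[1] + b[1]))
stats = perDate.map(lambda t: (t[0], (t[1][0], t[1][0] / t[1][1]))).sortByKey()  # (date, (total, average))
for r in stats.collect():
    print(r)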


Question 14

Problem Scenario 59: You have been given the below code snippet.

val x = sc.parallelize(1 to 20)

val y = sc.parallelize(10 to 30)

operation1

z.collect

Write a correct code snippet for operation1 which will produce the desired output, shown below.

Array[Int] = Array(16, 12, 20, 13, 17, 14, 18, 10, 19, 15, 11)
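The result contains exactly the values common to both ranges (10 through 20), in no particular order, so operation1 is an intersection:

val z = x.intersection(y)  // distinct elements present in both RDDs
z.collect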

