Databricks Databricks-Certified-Professional-Data-Scientist Dumps

Databricks Certified Professional Data Scientist Exam Questions and Answers

Question 1

Refer to the exhibit.

You are building a decision tree. In this exhibit, four variables are listed with their respective values of info-gain.

Based on this information, on which attribute would you expect the next split to be in the decision tree?

Options:

A.

Credit Score

B.

Age

C.

Income

D.

Gender
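
Information gain for a candidate attribute is usually computed as the entropy of the target minus the weighted entropy of the partitions produced by the split, and the next split is placed on the attribute with the largest gain. The sketch below illustrates that calculation under stated assumptions: the exhibit's values are not reproduced here, so the rows, attribute names, and function names are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def info_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting."""
    parent = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    children = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - children


# Hypothetical toy data, not taken from the exhibit.
rows = [
    {"credit": "high", "gender": "F", "approved": "yes"},
    {"credit": "high", "gender": "M", "approved": "yes"},
    {"credit": "low", "gender": "F", "approved": "no"},
    {"credit": "low", "gender": "M", "approved": "no"},
]

gains = {a: info_gain(rows, a, "approved") for a in ("credit", "gender")}
best = max(gains, key=gains.get)  # attribute with the largest gain
```

In this toy data the "credit" attribute separates the classes perfectly, so it has the larger gain and would be chosen for the split.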

Question 2

Select the problems that can be solved using SVMs

Options:

A.

SVMs are helpful in text and hypertext categorization

B.

Classification of images can also be performed using SVMs

C.

SVMs are also useful in medical science to classify proteins with up to 90% of the compounds classified correctly

D.

Hand-written characters can be recognized using SVM

Question 3

Regularization is an important technique in machine learning for preventing overfitting. Mathematically, it adds a penalty term that keeps the coefficients from fitting the training data so closely that the model overfits. The difference between L1 and L2 is...

Options:

A.

L2 is the sum of the square of the weights, while L1 is just the sum of the weights

B.

L1 is the sum of the square of the weights, while L2 is just the sum of the weights

C.

L1 gives Non-sparse output while L2 gives sparse outputs

D.

None of the above
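
For reference, the two penalty terms can be computed directly. This is a minimal sketch assuming the usual textbook definitions (L1 as the sum of the absolute values of the weights, L2 as the sum of the squared weights); the weight vector and function names are made up for illustration.

```python
def l1_penalty(weights):
    """L1 (lasso-style) penalty: sum of absolute values of the weights."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """L2 (ridge-style) penalty: sum of squared weights."""
    return sum(w * w for w in weights)


weights = [0.5, -2.0, 0.0, 3.0]  # hypothetical coefficients
l1 = l1_penalty(weights)  # 0.5 + 2.0 + 0.0 + 3.0 = 5.5
l2 = l2_penalty(weights)  # 0.25 + 4.0 + 0.0 + 9.0 = 13.25
```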

Question 4

You have modeled a dataset with 5 independent variables, A, B, C, D and E, none of which depends on the others. Variables A, B and C are continuous, and variables D and E are discrete (mixed mode).

Now you have to compute the expected value of one variable, say A. Which of the following computations would you prefer?

Options:

A.

Integration

B.

Differentiation

C.

Transformation

D.

Generalization
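
For a continuous variable, the expected value is the integral of x times the density over the variable's range. The sketch below approximates that integral with the trapezoidal rule. The density is an assumption made purely for illustration (A uniform on [0, 2]); the question itself does not specify one.

```python
def expected_value(pdf, lower, upper, steps=10_000):
    """Approximate E[X] = integral of x * pdf(x) dx with the trapezoidal rule."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lower + i * width
        weight = 0.5 if i in (0, steps) else 1.0  # endpoints count half
        total += weight * x * pdf(x)
    return total * width


def uniform_pdf(x):
    """Hypothetical density: uniform on [0, 2], so the density is 0.5 there."""
    return 0.5


mean_a = expected_value(uniform_pdf, 0.0, 2.0)  # close to 1.0
```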

Question 5

A researcher is interested in how variables such as GRE (Graduate Record Exam) scores, GPA (grade point average) and prestige of the undergraduate institution affect admission into graduate school. The response variable, admit/don't admit, is a binary variable.

Above is an example of

Options:

A.

Linear Regression

B.

Logistic Regression

C.

Recommendation system

D.

Maximum likelihood estimation

E.

Hierarchical linear models

Question 6

Select the correct option from those below

Options:

A.

If you're trying to predict or forecast a target value, then you need to look into supervised learning.

B.

If you've chosen supervised learning, with a discrete target value like Yes/No, 1/2/3, A/B/C, or Red/Yellow/Black, then look into classification.

C.

If the target value can take on a number of values, say any value from 0.00 to 100.00, or -999 to 999, or +∞ to -∞, then you need to look into unsupervised learning

D.

If you're not trying to predict a target value, then you need to look into unsupervised learning

E.

Are you trying to fit your data into some discrete groups? If so and that's all you need, you should look into clustering.

Question 7

Suppose that we are interested in the factors that influence whether a political candidate wins an election. The outcome (response) variable is binary (0/1); win or lose. The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively and whether or not the candidate is an incumbent.

Above is an example of

Options:

A.

Linear Regression

B.

Logistic Regression

C.

Recommendation system

D.

Maximum likelihood estimation

E.

Hierarchical linear models

Question 8

You are working on a problem where you have to predict whether a claim is valid. You find that the suspect claims contain more spelling errors and corrections in the manually filled claim forms than the honest claims do. Which of the following techniques is suitable for finding out whether a claim is valid?

Options:

A.

Naive Bayes

B.

Logistic Regression

C.

Random Decision Forests

D.

Any one of the above

Question 9

A user opens a website 3 times. The probability that the user clicks the advertisement exactly 2 times is best calculated using which distribution?

Options:

A.

Binomial

B.

Poisson

C.

Normal

D.

Any of the above
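
A fixed number of independent trials with a yes/no outcome is the setting of the binomial distribution. The sketch below evaluates its probability mass function for 2 clicks in 3 visits. The per-visit click probability is not given in the question, so the value 0.1 is purely an assumption for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_click = 0.1  # hypothetical per-visit click probability
prob_two_clicks = binomial_pmf(2, 3, p_click)  # 3 * 0.1**2 * 0.9 = 0.027
```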

Question 10

Scenario: Suppose that Bob can decide to go to work by one of three modes of transportation: car, bus, or commuter train. Because of high traffic, if he decides to go by car, there is a 50% chance he will be late. If he goes by bus, which has special reserved lanes but is sometimes overcrowded, the probability of being late is only 20%. The commuter train is almost never late, with a probability of only 1%, but is more expensive than the bus.

Suppose that Bob is late one day, and his boss wishes to estimate the probability that he drove to work that day by car. Since he does not know which mode of transportation Bob usually uses, he gives a prior probability of 1/3 to each of the three possibilities. Which of the following methods will the boss use to estimate the probability that Bob drove to work?

Options:

A.

Naive Bayes

B.

Linear regression

C.

Random decision forests

D.

None of the above
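
The numbers in the scenario can be combined with Bayes' theorem: the posterior for each mode is its prior times the probability of being late, divided by the total probability of being late. A minimal sketch using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

# Priors and lateness probabilities as stated in the scenario.
prior = {m: Fraction(1, 3) for m in ("car", "bus", "train")}
p_late = {
    "car": Fraction(50, 100),
    "bus": Fraction(20, 100),
    "train": Fraction(1, 100),
}

# Total probability of being late, summed over all modes.
evidence = sum(prior[m] * p_late[m] for m in prior)

# Posterior probability of each mode given that Bob was late.
posterior = {m: prior[m] * p_late[m] / evidence for m in prior}
p_car_given_late = posterior["car"]  # 50/71, roughly 0.70
```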

Question 11

What is the probability that the total of two dice will be greater than 8, given that the first die is a 6?

Options:

A.

1/3

B.

2/3

C.

1/6

D.

2/6
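
A conditional probability like this one can be checked by enumerating the sample space and restricting it to the outcomes where the condition holds. A minimal sketch:

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))

# Restrict to the condition: the first die shows 6.
given = [o for o in outcomes if o[0] == 6]

# Among those, keep the outcomes whose total exceeds 8.
favorable = [o for o in given if sum(o) > 8]

prob = Fraction(len(favorable), len(given))  # 4 of 6 outcomes
```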

Question 12

Let A denote the event 'student is female' and let B denote the event 'student is French'. In a class of 100 students, suppose 60 are French, and suppose that 10 of the French students are female. Find the probability that if I pick a French student, it will be a girl, that is, find P(A|B).

Options:

A.

1/3

B.

2/3

C.

1/6

D.

2/6
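
The definition P(A|B) = P(A and B) / P(B) can be applied directly to the counts given in the question. A minimal sketch using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

total_students = 100
french = 60             # students for whom event B holds
french_and_female = 10  # students for whom both A and B hold

p_b = Fraction(french, total_students)
p_a_and_b = Fraction(french_and_female, total_students)

p_a_given_b = p_a_and_b / p_b  # (10/100) / (60/100)
```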

Question 13

Support vector machines (SVMs) are a set of supervised learning methods used for

Options:

A.

Linear classification

B.

Non-linear classification

C.

Regression

Question 14

A bio-scientist is analyzing cancer cells. To identify whether a cell is cancerous, hundreds of tests have been run, each with small variations. Given the test results for a sample of healthy and cancerous cells, which of the following techniques would you use to determine whether a cell is healthy?

Options:

A.

Linear regression

B.

Collaborative filtering

C.

Naive Bayes

D.

Identification Test

Question 15

Suppose you have been given two random variables X and Y whose joint distribution is already known. The marginal distribution of X is simply the probability distribution of X averaging over information about Y; it is the probability distribution of X when the value of Y is not known. How do you calculate the marginal distribution of X?

Options:

A.

This is typically calculated by summing the joint probability distribution over Y.

B.

This is typically calculated by integrating the joint probability distribution over Y

C.

This is typically calculated by summing (in the case of discrete variables) the joint probability distribution over Y.

D.

This is typically calculated by integrating (in the case of continuous variables) the joint probability distribution over Y.
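
For discrete variables, marginalizing means adding up the joint probabilities over every value of the other variable. The sketch below does this for a small joint table. The table is hypothetical and chosen only so that the entries sum to 1.

```python
# Hypothetical joint distribution P(X = x, Y = y).
joint = {
    ("x0", "y0"): 0.10,
    ("x0", "y1"): 0.30,
    ("x1", "y0"): 0.20,
    ("x1", "y1"): 0.40,
}

# Marginal of X: sum the joint probabilities over all values of Y.
marginal_x = {}
for (x, _y), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0.0) + p
```

With this table, the marginal probabilities of X come out to about 0.4 for x0 and 0.6 for x1. The continuous case replaces the sum with an integral over Y.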

Question 16

Select the correct option which applies to L2 regularization

Options:

A.

Computationally efficient due to having analytical solutions

B.

Non-sparse outputs

C.

No feature selection
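
The analytical solution mentioned above can be seen in the simplest possible setting. The sketch below assumes a single feature and no intercept, where minimizing the squared error plus lam times the squared weight gives w = sum(x*y) / (sum(x*x) + lam). The data and names are made up for illustration.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form L2-regularized weight for one feature, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0]  # hypothetical feature values
ys = [2.0, 4.0, 6.0]  # hypothetical targets (exactly y = 2x)

w_unregularized = ridge_weight(xs, ys, 0.0)  # 28 / 14 = 2.0
w_ridge = ridge_weight(xs, ys, 14.0)         # 28 / 28 = 1.0
```

The penalty shrinks the weight toward zero without setting it exactly to zero, which is consistent with the non-sparse behavior listed in the options.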

Question 17

In which lifecycle stage are appropriate analytical techniques determined?

Options:

A.

Model planning

B.

Model building

C.

Data preparation

D.

Discovery

Question 18

Logistic regression is a model used for prediction of the probability of occurrence of an event. It makes use of several variables that may be......

Options:

A.

Numerical

B.

Categorical

C.

Both A and B are correct

D.

Neither A nor B is correct

Question 19

If E1 and E2 are two events, how do you represent the conditional probability that E2 occurs given that E1 has occurred?

Options:

A.

P(E1)/P(E2)

B.

P(E1+E2)/P(E1)

C.

P(E2)/P(E1)

D.

P(E2)/P(E1+E2)

Question 20

Which of the following metrics are useful in measuring the accuracy and quality of a recommender system?

Options:

A.

Cluster Density

B.

Support Vector Count

C.

Mean Absolute Error

D.

Sum of Absolute Errors
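
Mean absolute error compares predicted ratings with the ratings users actually gave and averages the size of the differences. A minimal sketch with made-up ratings (the function name and data are illustrative, not taken from any particular library):

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted ratings."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [4.0, 3.0, 5.0, 1.0]     # hypothetical user ratings
predicted = [3.5, 3.0, 4.0, 2.0]  # hypothetical recommender output

mae = mean_absolute_error(actual, predicted)  # (0.5 + 0 + 1 + 1) / 4 = 0.625
```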