SAS Institute A00-240 Dumps

Page: 1 / 4
Total 99 questions

SAS Statistical Business Analysis SAS9: Regression and Model Questions and Answers

Question 1

This question will ask you to provide a missing option.

Complete the following syntax to test the homogeneity of variance assumption in the GLM procedure:

means Region / ______=levene;

Options:

A.

test

B.

adjust

C.

var

D.

hovtest

Question 2

An analyst knows that the categorical predictor, zip_code, is an important predictor of a binary target. However, zip_code has too many levels to be a feasible predictor in a model. The analyst uses PROC CLUSTER to implement Greenacre's method to reduce the number of categorical levels.

What is the correct application of Greenacre's method in this situation?

Options:

A.

Clustering the levels using the target proportion for each zip_code as input.

B.

Clustering the levels using the zip_code values as input.

C.

Clustering the levels using the number of cases in each zip_code as input.

D.

Clustering the levels using dummy coded zip_code levels as inputs.

Question 3

When mean imputation is performed on data after the data is partitioned for honest assessment, what is the most appropriate method for handling the mean imputation?

Options:

A.

The sample means from the validation data set are applied to the training and test data sets.

B.

The sample means from the training data set are applied to the validation and test data sets.

C.

The sample means from the test data set are applied to the training and validation data sets.

D.

The sample means from each partition of the data are applied to their own partition.
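As a language-neutral illustration of the imputation scenario in Question 3, the following Python sketch computes a mean on one partition and reuses it on the others. It assumes the common honest-assessment convention that imputation statistics come from the training partition only; the function name `impute_with_training_means` and the data values are hypothetical.

```python
from statistics import mean


def impute_with_training_means(train, others):
    """Fill None values using the mean computed from `train` only."""
    observed = [x for x in train if x is not None]
    train_mean = mean(observed)

    def fill(values):
        # Replace each missing value with the training mean.
        return [train_mean if x is None else x for x in values]

    return train_mean, fill(train), [fill(part) for part in others]


# Hypothetical values for a single numeric input.
train = [10.0, None, 14.0, 12.0]
valid = [None, 20.0]
test = [30.0, None]

train_mean, train_filled, (valid_filled, test_filled) = impute_with_training_means(
    train, [valid, test]
)
print(train_mean, valid_filled, test_filled)
```

Because the validation and test values never enter the mean, the assessment on those partitions is not influenced by their own data.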

Question 4

Refer to the lift chart:

What does the reference line at lift = 1 correspond to?

Options:

A.

The predicted lift for the best 50% of validation data cases

B.

The predicted lift if the entire population is scored as event cases

C.

The predicted lift if none of the population are scored as event cases

D.

The predicted lift if 50% of the population are randomly scored as event cases

Question 5

The PROC LOGISTIC options SELECTION=SCORE and BEST=2 are used in a MODEL statement to generate a series of predictive models. The models are assigned numbers in order from 1 to 99 reflecting the fact that there are 50 candidate input variables. Results from the collection of derived models are used to generate the following plot of overall average profit by model number. Results are restricted to models with at least 9 inputs and at most 40 inputs.

The maximum value for the training data occurs for model number 46, and the maximum value for the validation data occurs for model number 43.

If you base model selection solely on overall average profit, what is the correct choice?

Options:

A.

Select model 46

B.

Select model 43

C.

Select model 45

D.

Select model 21
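The count of 99 models in Question 5 follows from simple arithmetic: with 50 candidate inputs and BEST=2, there are two models for each size from 1 to 49 and a single model containing all 50 inputs. The Python sketch below reproduces that count; the function name `score_selection_model_count` is hypothetical and the code only mirrors the counting argument, not PROC LOGISTIC itself.

```python
from math import comb


def score_selection_model_count(n_inputs, best):
    """Count models kept when up to `best` models are retained per model size."""
    # For each size, no more than comb(n_inputs, size) distinct models exist,
    # so the full-size model contributes exactly one.
    return sum(min(best, comb(n_inputs, size)) for size in range(1, n_inputs + 1))


total = score_selection_model_count(50, 2)
print(total)  # 2 * 49 + 1 = 99
```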

Question 6

The SAS data set RESULT contains the following variables:

  • Region (GrpA or GrpB)
  • Sales (dollars per year)

Which SAS programs can be used to find the p-value for comparing GrpA sales with GrpB sales? (Choose two.)

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

Question 7

An analyst has a sufficient volume of data to perform a 3-way partition of the data into training, validation, and test sets to perform honest assessment during the model building process.

What is the purpose of the training data set?

Options:

A.

To provide an unbiased measure of assessment for the final model.

B.

To compare models and select and fine-tune the final model.

C.

To reduce total sample size to make computations more efficient.

D.

To build the predictive models.

Question 8

A predictive model uses a data set that has several variables with missing values.

What two problems can arise with this model? (Choose two.)

Options:

A.

The model will likely be overfit.

B.

There will be a high rate of collinearity among input variables.

C.

Complete case analysis means that fewer observations will be used in the model building process.

D.

New cases with missing values on input variables cannot be scored without extra data processing.
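To make the effect of complete case analysis in Question 8 concrete, the Python sketch below drops every row that has at least one missing input. The variable names and values are hypothetical and serve only to show how a modest share of missing values can remove a much larger share of observations.

```python
# Hypothetical data: 5 observations, 3 inputs, 3 missing values out of 15.
rows = [
    {"age": 34, "income": 52000, "tenure": 5},
    {"age": None, "income": 61000, "tenure": 2},
    {"age": 45, "income": None, "tenure": 9},
    {"age": 29, "income": 48000, "tenure": None},
    {"age": 51, "income": 75000, "tenure": 12},
]

# Complete case analysis keeps only rows with no missing inputs.
complete = [r for r in rows if all(v is not None for v in r.values())]

missing_values = sum(v is None for r in rows for v in r.values())
print(len(complete), "of", len(rows), "rows kept;", missing_values, "values missing")
```

Here 20% of the values are missing, yet 60% of the rows are discarded because the gaps fall in different observations.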

Question 9

The selection criterion used in the forward selection method in the REG procedure is:

Options:

A.

Adjusted R-Square

B.

SLE

C.

Mallows' Cp

D.

AIC

Question 10

Which characteristic of Studentized residuals indicates potential outliers?

Options:

A.

Only studentized residuals greater than negative two

B.

Only studentized residuals less than negative two and greater than two

C.

Only studentized residuals greater than two

D.

Only studentized residuals less than two and greater than negative two

Question 11

Refer to the confusion matrix:

Calculate the sensitivity. (0 - negative outcome, 1 - positive outcome)

Options:

A.

25/48

B.

58/102

C.

25/89

D.

58/81
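For reference, sensitivity is the number of true positives divided by the number of actual positive cases (true positives plus false negatives). The Python sketch below applies that definition to hypothetical counts; they are not taken from the confusion matrix in the exhibit, which is not reproduced here.

```python
from fractions import Fraction


def sensitivity(true_positives, false_negatives):
    """Share of actual positive cases that the model classifies as positive."""
    return Fraction(true_positives, true_positives + false_negatives)


# Hypothetical counts, not the values from the exhibit.
tp, fn, fp, tn = 30, 10, 5, 55
result = sensitivity(tp, fn)
print(result)  # 3/4
```

Note that false positives and true negatives do not enter the calculation; they determine specificity instead.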

Question 12

Refer to the exhibit.

Given alpha=0.02, which conclusion is justified regarding percentage of body fat, comparing small (S), medium (M), and large (L) wrist sizes?

Options:

A.

Medium wrist size is significantly different than small wrist size.

B.

Large wrist size is significantly different than medium wrist size.

C.

Large wrist size is significantly different than small wrist size.

D.

There is no significant difference due to wrist size.

Question 13

Select the equivalent LOGISTIC procedure model statements. (Choose two.)

Options:

A.

Model Purchase = Gender Age Region;

B.

Model Purchase = Gender | Age | Region;

C.

Model Purchase = Gender|Age|Region @1;

D.

Model Purchase = Gender|Age|Region @2;
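In SAS model statements, the bar operator expands to all main effects and interactions among the listed variables, and an @n suffix caps the interaction order at n. The Python sketch below mimics that expansion so the effect lists can be compared; `expand_bar` is a hypothetical helper, and it lists terms grouped by order rather than in the order SAS prints them.

```python
from itertools import combinations


def expand_bar(effects, max_order=None):
    """Expand A|B|C into main effects and interactions up to `max_order`."""
    limit = len(effects) if max_order is None else min(max_order, len(effects))
    terms = []
    for order in range(1, limit + 1):
        # Every combination of `order` distinct variables forms one term.
        for combo in combinations(effects, order):
            terms.append("*".join(combo))
    return terms


variables = ["Gender", "Age", "Region"]
print(expand_bar(variables, 1))
print(expand_bar(variables, 2))
print(expand_bar(variables))
```

Two model statements are equivalent when their expanded effect lists contain the same terms.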

Question 14

What is the default method in the LOGISTIC procedure to handle observations with missing data?

Options:

A.

Missing values are imputed.

B.

Parameters are estimated accounting for the missing values.

C.

Parameter estimates are made on all available data.

D.

Only cases with variables that are fully populated are used.
