15.062 Data Mining
Problem 1 (25 points)
For the following questions, please give a True or False answer with one or two sentences of justification.

1.1 A linear regression model will be developed using a training data set. Adding variables to the model will always reduce the sum of squared residuals measured on the validation set.
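The contrast behind question 1.1 can be seen numerically. The sketch below (hypothetical data, not from the exam) fits ordinary least squares with and without 20 pure-noise predictors: training SSE can only decrease as variables are added, while validation SSE is free to move either way.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one informative predictor plus 20 pure-noise columns.
n_train, n_valid = 30, 30
X = rng.normal(size=(n_train + n_valid, 21))
y = 2.0 * X[:, 0] + rng.normal(size=n_train + n_valid)
Xtr, Xva = X[:n_train], X[n_train:]
ytr, yva = y[:n_train], y[n_train:]

def fit_sse(p):
    """OLS with intercept on the first p columns; returns (training SSE, validation SSE)."""
    A = np.column_stack([np.ones(n_train), Xtr[:, :p]])
    beta, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    Av = np.column_stack([np.ones(n_valid), Xva[:, :p]])
    tr = ytr - A @ beta
    va = yva - Av @ beta
    return float(tr @ tr), float(va @ va)

tr1, va1 = fit_sse(1)     # informative predictor only
tr21, va21 = fit_sse(21)  # plus the 20 noise variables
# Training SSE never increases when variables are added;
# validation SSE makes no such guarantee when the extras are noise.
print(tr1, va1)
print(tr21, va21)
```

The guaranteed monotonicity holds only for the data the model was fit on; that is the distinction the question probes.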
1.2 Although forward selection and backward elimination are fast methods for subset selection in linear regression, only step-wise selection is guaranteed to find the best subset.
1.3 An analyst computes classification functions using discriminant analysis for a data set with three classes C1, C2 and C3. She assumes that all three classes are equally likely to arise in the application. She later learns that the probability of C1 is twice that of C2 and C3. The probabilities for C2 and C3 are equal. If she re-computes the classification functions using this information, the value of the classification function for C1 will increase for every data point.
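In linear discriminant analysis, the classification function for class k carries an additive ln(prior) term, c_k(x) = x'Σ⁻¹μ_k − ½μ_k'Σ⁻¹μ_k + ln p_k, so revising the priors shifts each function by the same constant at every data point. A minimal sketch of that shift for the priors in 1.3, assuming this standard ln-prior form:

```python
import math

# Original assumption: all three classes equally likely.
old_priors = {"C1": 1/3, "C2": 1/3, "C3": 1/3}
# Revised: P(C1) is twice P(C2) = P(C3), and the priors sum to 1.
new_priors = {"C1": 1/2, "C2": 1/4, "C3": 1/4}

# Each classification function changes by ln(new prior) - ln(old prior),
# a constant that does not depend on the data point.
shift = {c: math.log(new_priors[c]) - math.log(old_priors[c]) for c in old_priors}
print(shift)  # C1 shifts by ln(3/2) > 0; C2 and C3 shift by ln(3/4) < 0
```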
1.4 A classification model's misclassification rate on the validation set is a better measure of the model's predictive ability on new data than its misclassification rate on the training set.
1.5 A neural net classifier for two classes constructs a separating boundary between the classes that is linear in weighted sums of the input values.
Problem 2 (10 points)
A dataset of 1000 cases was partitioned into a training set of 600 cases and a validation set of 400 cases. A k-Nearest Neighbors model with k=1 had a misclassification error rate of 8% on the validation data. It was subsequently found that the partitioning had been done incorrectly and that
100 cases from the training data set had been accidentally duplicated and had overwritten 100 cases in the validation dataset. What is the misclassification error rate for the 300 cases that were truly part of the validation data?
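The fact Problem 2 turns on is that a k=1 nearest-neighbor classifier makes no errors on exact copies of its own training cases: each duplicate's nearest neighbor is itself, at distance zero. A sketch on hypothetical continuous data (the exam's actual dataset is not given, and distinct points are assumed not to coincide):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training set: 600 two-dimensional points with binary labels.
Xtr = rng.normal(size=(600, 2))
ytr = (Xtr[:, 0] + Xtr[:, 1] > 0).astype(int)

def knn1_predict(x):
    """Label of the single nearest training point (k = 1, Euclidean distance)."""
    d = np.sum((Xtr - x) ** 2, axis=1)
    return ytr[np.argmin(d)]

# Score exact duplicates of 100 training cases, as in the flawed partition.
dups = Xtr[:100]
errors = sum(knn1_predict(x) != ytr[i] for i, x in enumerate(dups))
print(errors)  # 0: each duplicate's nearest neighbor is itself
```

So every one of the 400 reported validation errors must have come from cases that were genuinely held out, which is the observation the question asks you to exploit.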
Problem 3 (10 points)
A Naïve Bayes classifier has been constructed with