Evgueni N. Smirnov smirnov@cs.unimaas.nl August 21, 2010
1. Introduction
Given a data-mining problem, you need data that represent the problem, models that are suitable for the data, and of course a data-mining environment that contains algorithms capable of learning these models. In this lab you will study two well-known classification problems. You will try to find classification models for these problems using decision trees and decision rules. The algorithms for learning these models are provided in Weka, the data-mining environment that accompanies our course. You will study the Explorer part of Weka to learn how to call decision-tree and decision-rule algorithms, how to evaluate the accuracy of the learned models, and how to use reduced error pruning.
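As a rough illustration of what "accuracy" means in this lab, the sketch below computes the fraction of instances whose predicted class equals the actual class. It is plain Java and does not use Weka; the class name AccuracySketch, the method accuracy, and the example labels are assumptions made only for this illustration. Weka computes this figure for you in the lab.

```java
// Minimal sketch: accuracy as the fraction of correctly classified instances.
// The class name, method name, and labels are illustrative; none of this is Weka code.
class AccuracySketch {

    // Returns the fraction of positions i where predicted[i] equals actual[i].
    static double accuracy(String[] actual, String[] predicted) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i].equals(predicted[i])) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }

    public static void main(String[] args) {
        // Hypothetical labels for four instances.
        String[] actual    = {"good", "bad", "good", "good"};
        String[] predicted = {"good", "good", "good", "bad"};
        System.out.println(accuracy(actual, predicted)); // prints 0.5
    }
}
```

Two of the four hypothetical predictions match the actual labels, so the accuracy is 0.5. A model evaluated on the same data it was trained on will usually look better than it really is, which is why the way accuracy is estimated matters.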
2. Concept-Learning Problems
In this lab you are expected to build classification models for two classification problems:
• Labor-negotiation problem;
• Soybean classification problem.
The data files for both problems are provided at:
http://www.unimaas.nl/datamining/UCI/datasets-UCI.zip
3. Environment
As stated above, you will use Weka to build the desired classification models. Weka is a data-mining environment that contains a collection of machine-learning algorithms for solving real-world data-mining problems. The algorithms can either be applied directly or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine-learning schemes. Weka is open-source software issued under the GNU General Public License.
4. Algorithms
To build the classifiers you will use four learning algorithms provided in Weka:
1. zeroR is a majority/average predictor. It assigns to each instance the classification of the