Theoretical and Empirical Analysis of ReliefF and RReliefF
Marko Robnik-Šikonja (marko.robnik@fri.uni-lj.si) and Igor Kononenko (igor.kononenko@fri.uni-lj.si)

University of Ljubljana, Faculty of Computer and Information Science, Tržaška 25, 1001 Ljubljana, Slovenia
tel.: +386 1 4768386, fax: +386 1 4264647

Abstract. Relief algorithms are general and successful attribute estimators. They are able to detect conditional dependencies between attributes and provide a unified view on attribute estimation in regression and classification. In addition, their quality estimates have a natural interpretation. While they have commonly been viewed as feature subset selection methods applied in a preprocessing step before a model is learned, they have actually been used successfully in a variety of settings, e.g., to select splits or to guide constructive induction in the building phase of decision or regression tree learning, as an attribute weighting method, and also in inductive logic programming. This broad spectrum of successful uses calls for an especially careful investigation of the various properties of Relief algorithms. In this paper we theoretically and empirically investigate and discuss how and why they work, their theoretical and practical properties, their parameters, what kind of dependencies they detect, how they scale up to large numbers of examples and features, how to sample data for them, how robust they are with respect to noise, how irrelevant and redundant attributes influence their output, and how different metrics influence them.

Keywords: attribute estimation, feature selection, Relief algorithm, classification, regression
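As background for the analysis that follows, the core ReliefF weight update for classification can be sketched in Python. This is a minimal illustrative sketch, not the paper's reference implementation; it assumes numeric attributes, differences normalized by each attribute's value range, Manhattan distance for neighbor search, k nearest hits and misses per class, and misses weighted by class prior probabilities:

```python
import numpy as np

def relieff(X, y, n_iter=100, k=5, seed=None):
    """Sketch of the classification ReliefF attribute estimator.

    For each of n_iter randomly sampled instances, the weight of an
    attribute decreases by its (normalized) difference to the k nearest
    hits (same class) and increases by its difference to the k nearest
    misses of each other class, weighted by that class's prior.
    """
    rng = np.random.default_rng(seed)
    n, n_attr = X.shape
    span = X.max(axis=0) - X.min(axis=0)   # value range per attribute
    span[span == 0] = 1.0                  # avoid division by zero
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    w = np.zeros(n_attr)
    for _ in range(n_iter):
        i = rng.integers(n)
        r = X[i]
        d = np.abs(X - r).sum(axis=1)      # Manhattan distance to r
        d[i] = np.inf                      # exclude the instance itself
        for c in classes:
            idx = np.where(y == c)[0]
            idx = idx[np.argsort(d[idx])][:k]          # k nearest in class c
            diff = (np.abs(X[idx] - r) / span).mean(axis=0)
            if c == y[i]:                  # nearest hits pull weights down
                w -= diff / n_iter
            else:                          # nearest misses push weights up
                w += prior[c] / (1 - prior[y[i]]) * diff / n_iter
    return w
```

On a toy data set where one attribute determines the class and another is pure noise, the relevant attribute receives a clearly higher weight, which is the behavior the estimator is designed to exhibit.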
1. Introduction

Estimating the quality of attributes (features) is an important issue in machine learning. There are several important tasks in the process of machine learning, e.g., feature subset selection, constructive