95-791 Spring 2013
Lecture #8 Predictive analytics: Regression
Artur Dubrawski awd@cs.cmu.edu
This unit
• Good-old correlation scores revisited
• Locally weighted regression
  – As an approximator of non-linear functions
  – As a framework for active/purposive acquisition of data
95-791 Data Mining
Lecture #8 Slide 2
Copyright © 2000-2013 Artur Dubrawski
Correlational scores of association between attributes of data
• Linear
• Rank
• Quadratic
• …
Wouldn't it be great to have a universal formula for computing correlations of all types, no matter how complex the underlying models are (linear, quadratic, …, any kind)? Hmmm… life would be so much more fulfilling then…
Correlation coefficient generalized
• Idea: take your data and apply some function approximator to it (e.g. fit some regression model to it), and compute the following:
R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}

y_i, \bar{y}: from data (\bar{y} is the mean of the observed y); \hat{y}_i: predicted by the model
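The formula above can be sketched directly in code. A minimal implementation, assuming `y` holds the observed targets and `y_hat` holds the predictions of whatever approximator was fitted (the function name `r_squared` is ours, not from the slides):

```python
import numpy as np

def r_squared(y, y_hat):
    """Generalized R^2 = 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2).

    Works with predictions from any function approximator, not just
    linear regression.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_unexplained = np.sum((y - y_hat) ** 2)        # model's residual error
    ss_total = np.sum((y - np.mean(y)) ** 2)         # total variation in data
    return 1.0 - ss_unexplained / ss_total
```

A perfect model (`y_hat == y`) gives R² = 1, while a model that only ever predicts the mean of `y` gives R² = 0.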
Using linear regression to predict
Basically, if we use linear regression to predict y_i, this R^2 recovers the (squared) linear correlation. Using quadratic regression? We get the quadratic correlation. The same goes for multiple regression, any kind of non-linear regression, or any other function approximator we like — we can still compute the corresponding correlation coefficient. Life is perfect!
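To illustrate the point, here is a sketch (with made-up synthetic data) comparing the generalized R² obtained from a linear fit versus a quadratic fit when the underlying relationship is essentially quadratic:

```python
import numpy as np

# Synthetic data: a quadratic relationship with a small wiggle added.
x = np.linspace(-3, 3, 50)
y = x**2 + 0.1 * np.sin(7 * x)

def fit_r2(degree):
    """Fit a least-squares polynomial of the given degree, return R^2."""
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    ss_unexplained = np.sum((y - y_hat) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_unexplained / ss_total

r2_linear = fit_r2(1)     # corresponds to the (squared) linear correlation
r2_quadratic = fit_r2(2)  # corresponds to the quadratic correlation
```

On this data the linear fit explains almost nothing (the relationship is symmetric around zero), while the quadratic fit yields an R² close to 1 — the same data can show near-zero linear correlation and near-perfect quadratic correlation.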
Generalized correlation
total variation = explained variation + unexplained variation

\sum_{i=1}^{N} (y_i - \bar{y})^2 = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
total variation: ~the variance observed in the training data
explained variation: the part of the total variation accounted for ("explained") by the trained model
unexplained variation: the mismatch between the data and the model-based predictions (the part of the total variance that is left "unexplained" by the model)
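The decomposition can be checked numerically. Note one caveat worth stating: it holds exactly for least-squares fits that include an intercept (where the residuals are orthogonal to the predictions), which is the setting sketched below with assumed synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, 100)   # linear signal plus noise

# Least-squares linear fit with intercept.
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

total = np.sum((y - y.mean()) ** 2)          # total variation
explained = np.sum((y_hat - y.mean()) ** 2)  # explained variation
unexplained = np.sum((y - y_hat) ** 2)       # unexplained variation
# total equals explained + unexplained (up to floating-point error)
```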
It follows that R^2 ≤ 1.0, with R^2 = 1.0 only when the model fits the data perfectly.