Data Mining 95-791, Spring 2013, Lecture #8. Predictive analytics: Regression. Artur Dubrawski, awd@cs.cmu.edu. Copyright © 2000-2013 Artur Dubrawski.
This unit:
• Good old correlation scores revisited
• Locally weighted regression
  – as an approximator of non-linear functions
  – as a framework for active/purposive acquisition of data
Correlational scores of association between attributes of data:
• Linear
• Rank
• Quadratic
• …
Wouldn't it be
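Locally weighted regression can be sketched in a few lines. This is a minimal illustration, not the lecture's own formulation: it assumes a Gaussian kernel with a bandwidth `tau` and a local straight-line fit, and the function name `lwr_predict` is hypothetical. For each query point, nearby training points get large weights, and a weighted least-squares line is evaluated at the query.

```python
import math

def lwr_predict(xs, ys, x0, tau=1.0):
    """Locally weighted linear regression estimate at the query point x0.

    Assumption: Gaussian kernel weights with bandwidth tau; a straight line
    is fitted by weighted least squares and evaluated at x0.
    """
    # Kernel weights decay with distance from the query point
    w = [math.exp(-((x - x0) ** 2) / (2 * tau ** 2)) for x in xs]
    sw = sum(w)
    # Weighted means of x and y around the query point
    mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    # Weighted cross-product and sum of squares of the centred data
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
    if sxx == 0.0:
        # Degenerate neighbourhood: fall back to the weighted mean
        return my
    slope = sxy / sxx
    # Evaluate the local line at x0
    return my + slope * (x0 - mx)


# Approximate a non-linear function (sin) from noiseless samples
xs = [i * 0.1 for i in range(63)]
ys = [math.sin(x) for x in xs]
estimate = lwr_predict(xs, ys, 1.0, tau=0.3)
```

The bandwidth controls the bias/variance trade-off: a small `tau` follows the curve closely but uses few points, while a large `tau` approaches a single global straight line.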
The simple regression model (SRM) is a model for the association in the population between an explanatory variable X and a response Y. The SRM states that the conditional averages of Y, one for each value of X, align on a line with intercept $\beta_0$ and slope $\beta_1$:
$\mu_{y|x} = E(Y \mid X = x) = \beta_0 + \beta_1 x$
Deviation from the Mean
The deviations of observed responses around the conditional means $\mu_{y|x}$ are called errors ($\varepsilon$). The error's equation: $\varepsilon = y - \mu_{y|x}$. Errors can be positive or negative, depending on whether data lie above (positive) or below the conditional
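To make the SRM concrete, the sketch below simulates data from $y = \beta_0 + \beta_1 x + \varepsilon$ and estimates the line by ordinary least squares. The parameter values (β0 = 2, β1 = 0.5, σ = 1) and the helper name `ols_fit` are illustrative assumptions. Because β0 and β1 are unknown in practice, the errors ε cannot be observed; the residuals $e = y - (b_0 + b_1 x)$ serve as their estimates.

```python
import random

def ols_fit(xs, ys):
    """Least-squares estimates (b0, b1) of the SRM intercept and slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

# Simulate from the SRM: y = beta0 + beta1*x + eps, eps ~ N(0, sigma^2)
random.seed(0)
beta0, beta1, sigma = 2.0, 0.5, 1.0   # assumed illustrative parameters
xs = [random.uniform(0, 10) for _ in range(5000)]
ys = [beta0 + beta1 * x + random.gauss(0, sigma) for x in xs]

b0, b1 = ols_fit(xs, ys)
# Residuals: observable stand-ins for the unobservable errors
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
positive = sum(1 for e in residuals if e > 0)   # points above the fitted line
```

With an intercept in the model, the residuals always sum to zero, so positive and negative deviations balance out.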
you cannot consult the regression R² because
(a) ln(Y) may be negative for 0 < Y < 1.
(b) the TSS are not measured in the same units between the two models.
(c) the slope no longer indicates the effect of a unit change of X on Y in the log-linear model.
(d) the regression R² can be greater than one in the second model.
(v) The exponential function
(a) is the inverse of the natural logarithm function.
(b) does not play an important role in modeling nonlinear regression functions in econometrics
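A small sketch of why the two R² values cannot be compared directly, illustrating the point in option (b): each R² is 1 − RSS/TSS relative to its own TSS, and the TSS of Y and of ln(Y) are in different units. The data below are hypothetical (roughly exponential growth with alternating ±10% noise), and `ols_fit` and `r2_and_tss` are illustrative helper names.

```python
import math

def ols_fit(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def r2_and_tss(y, fitted):
    """Return (R^2, TSS) for observed values y and fitted values."""
    my = sum(y) / len(y)
    tss = sum((v - my) ** 2 for v in y)
    rss = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1 - rss / tss, tss

# Hypothetical data with roughly exponential growth (illustration only)
xs = [float(i) for i in range(1, 11)]
ys = [math.exp(0.3 * x) * (1.1 if i % 2 else 0.9) for i, x in enumerate(xs)]
log_ys = [math.log(y) for y in ys]

b0, b1 = ols_fit(xs, ys)        # linear model:     Y     = b0 + b1*X
c0, c1 = ols_fit(xs, log_ys)    # log-linear model: ln(Y) = c0 + c1*X
r2_lin, tss_lin = r2_and_tss(ys, [b0 + b1 * x for x in xs])
r2_log, tss_log = r2_and_tss(log_ys, [c0 + c1 * x for x in xs])
# tss_lin is in squared units of Y, tss_log in squared units of ln(Y)
```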
Testing. Follow the steps shown in the process diagram. You will try out four different models, as described below:
• Regression: the default regression model with the original data.
• Regression – No Model Selection: the default regression model after transforming the variables as described below.
• Regression – Stepwise: the regression model using stepwise regression and the transformed data.
• Decision Tree: the default decision tree model using the transformed data.
Transform
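The "Stepwise" model adds variables one at a time. The sketch below shows forward selection only, which is a simplification: it is not the tool's own procedure, and the stopping rule (stop when the best candidate cuts the residual sum of squares by less than 1% of the total sum of squares) is an assumption. Many stepwise implementations instead use significance-level thresholds and can also remove variables. All function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def rss(cols, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus cols."""
    rows = [[1.0] + [col[i] for col in cols] for i in range(len(y))]
    p = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def forward_stepwise(columns, y, min_gain_frac=0.01):
    """Greedy forward selection; min_gain_frac is an assumed stopping rule."""
    my = sum(y) / len(y)
    tss = sum((v - my) ** 2 for v in y)
    selected, current = [], tss
    while len(selected) < len(columns):
        remaining = [name for name in columns if name not in selected]
        scores = {name: rss([columns[s] for s in selected + [name]], y)
                  for name in remaining}
        best = min(scores, key=scores.get)
        if current - scores[best] < min_gain_frac * tss:
            break  # best candidate no longer improves the fit enough
        selected.append(best)
        current = scores[best]
    return selected


# Hypothetical data: y depends on a and b only; c is irrelevant
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
c = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0]
y = [2 * ai + 3 * bi for ai, bi in zip(a, b)]
chosen = forward_stepwise({"a": a, "b": b, "c": c}, y)
```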
Due in class Feb 6. UCI ID: _______________
Multiple-Choice Questions (Choose the best answer, and briefly explain your reasoning.)
1. Assume we have a simple linear regression model: … Given a random sample from the population, which of the following statements is true?
a. OLS estimators are biased when BMI does not vary much in the sample.
b. OLS estimators are biased when the sample size is small (say 20 observations)
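A Monte Carlo sketch can help with the reasoning for options (a) and (b). Under the standard assumptions (errors with zero conditional mean and some variation in X), the OLS slope is unbiased regardless of sample size; little variation in X makes the estimate noisier, not biased. The true parameters, the narrow X range [0, 1] and the helper `ols_slope` are assumptions chosen for illustration, since the excerpt's model equation is missing.

```python
import random

def ols_slope(xs, ys):
    """OLS slope estimate of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(1)
beta0, beta1 = 0.5, 1.5      # assumed true parameters (hypothetical)
n, reps = 20, 10000          # small samples, many replications
slopes = []
for _ in range(reps):
    # X drawn from a narrow range, so it does not vary much in each sample
    xs = [random.uniform(0.0, 1.0) for _ in range(n)]
    ys = [beta0 + beta1 * x + random.gauss(0.0, 1.0) for x in xs]
    slopes.append(ols_slope(xs, ys))

# Averaging over replications approximates the expected value of the estimator
mean_slope = sum(slopes) / reps
```

The average estimate lands close to the true slope even with n = 20, while the individual estimates scatter widely.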
Table 1 shows the empirical results for microfinance and poverty reduction obtained with Tobit regression analysis. To evaluate the influence of microfinance on poverty reduction, poverty reduction was regressed, using a Tobit model, on the key variables in this study: micro-credit, age, household size, qualification, nature of business, duration of membership and village type. In this model, poverty reduction is a dummy and is considered as the dependent
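For readers unfamiliar with the method, the sketch below writes out the log-likelihood of a standard Tobit model, assuming a continuous outcome left-censored at zero: the latent $y^* = x'\beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2)$, and the observed $y = \max(y^*, 0)$. This is a generic illustration, not the study's estimation code; the function names are hypothetical. Estimates would come from maximizing this function over β and σ with a numerical optimizer, with the study's variables (micro-credit, age, household size, …) as the columns of X after a leading 1 for the intercept.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(beta, sigma, X, y, lower=0.0):
    """Log-likelihood of a Tobit model left-censored at `lower` (assumed 0).

    Latent model: y* = x'beta + e, e ~ N(0, sigma^2); observed y = max(y*, lower).
    """
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, xi))
        if yi <= lower:
            # Censored observation: probability that y* is at or below the limit
            ll += math.log(norm_cdf((lower - xb) / sigma))
        else:
            # Uncensored observation: normal density of y around x'beta
            ll += math.log(norm_pdf((yi - xb) / sigma) / sigma)
    return ll
```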
CWRU Regression Project Report, OPRE 433, Tianao Zhang, 12/5/2011
Introduction
According to the data I've received, the data set has 506 rows (observations) and 13 columns, i.e. 6,578 data values in total. All the explanatory variables, as well as the dependent variable, are continuous; there are no categorical variables. My goal is to build a regression model to predict the mean of Y, or an individual value of Y, for a given X.
1. Do the regression assumptions such as Constant Variance, Normality and Independence
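Two of the listed assumptions can be screened numerically from the residuals. The sketch below is an assumption-laden illustration, not the report's method: the Durbin-Watson statistic checks independence (values near 2 suggest no first-order autocorrelation, and the residuals must be in collection order), and a crude spread ratio compares residual variance in the upper and lower halves of the fitted values, loosely in the spirit of the Goldfeld-Quandt test but not that test itself. Both function names are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic; residuals must be in collection order.

    Near 2: no first-order autocorrelation; near 0 or 4: positive or
    negative autocorrelation.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def spread_ratio(fitted, residuals):
    """Mean squared residual in the upper half of fitted values divided by
    that in the lower half; values far from 1 hint at non-constant variance.
    (Crude screen only, not a formal test.)"""
    pairs = sorted(zip(fitted, residuals))
    half = len(pairs) // 2
    low = [e ** 2 for _, e in pairs[:half]]
    high = [e ** 2 for _, e in pairs[-half:]]
    return (sum(high) / len(high)) / (sum(low) / len(low))
```

Normality is usually judged separately, for example from a normal quantile plot of the residuals.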
Regression
As the human race 'evolves' and progresses, it has created an environment unsuitable for the generations to come. This Darwinist environment promotes the ideals of a 'dog-eat-dog' world, in which one person's ambitions are more important than another human being's. People strive for an ideal life in which money is not an issue, so living comfortably is not a problem. To live comfortably is to live without worrying about matters such as starving, fiscal responsibility
MATH 231: Basic Statistics, Homework #5 – Correlation and Regression
1. Bi-lo Appliance Super-Store has outlets in several large metropolitan areas in New England. The general sales manager aired a commercial for a digital camera on selected local TV stations prior to a sale starting on Saturday and ending on Sunday. She obtained the Saturday-Sunday digital camera sales at the various outlets and paired them with the number of times the advertisement was shown on local TV stations
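A problem like this typically asks for the correlation coefficient and the least-squares line relating sales to the number of airings. The sketch below computes both from paired lists; the homework's actual figures are not in this excerpt, so the names `pearson_r` and `least_squares_line` are hypothetical helpers, and you would substitute the (times shown, cameras sold) pairs for `xs` and `ys`.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b
```

Here the slope estimates the average change in sales per additional airing of the commercial.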
Time Series Regression
3.1 A small regional trucking company has experienced steady growth. Use time series regression to forecast capital needs for the next 2 years. The company's recent capital needs have been:
Year | Capital Needs (Thousands of Dollars) | Year | Capital Needs (Thousands of Dollars)
-------------------------------------------
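Time series regression in its simplest form fits a linear trend $y_t = b_0 + b_1 t$ against the period index and extrapolates it. The sketch below assumes the years are coded t = 1, …, n; the company's actual capital figures are not in this excerpt, so the values in the test are hypothetical and `trend_forecast` is an illustrative name.

```python
def trend_forecast(values, horizon):
    """Fit y_t = b0 + b1*t (t = 1..n) by least squares and forecast ahead."""
    n = len(values)
    ts = list(range(1, n + 1))
    mt, my = sum(ts) / n, sum(values) / n
    b1 = (sum((t - mt) * (y - my) for t, y in zip(ts, values))
          / sum((t - mt) ** 2 for t in ts))
    b0 = my - b1 * mt
    # Extrapolate the fitted trend to periods n+1, ..., n+horizon
    return [b0 + b1 * t for t in range(n + 1, n + horizon + 1)]
```

With `horizon=2`, the function returns the trend forecasts for the next 2 years, as the exercise asks.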