The problem our group had to solve was to use the data we were given on renewable energy sources in the United States to examine the relationship between hydroelectric power and wind power. Our null hypothesis was that there is no relationship between hydroelectric power and wind power. Our alternative hypothesis was that there is a relationship between them.
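Because the analysis below tests these hypotheses through the slope of a simple linear regression, they can be written as shown. The symbols β₀, β₁, ε and b₁ are notation introduced here, not taken from the report's output. b₁ is the fitted slope, for example 4.74 in the first model. The p-values quoted below come from a t-test on that slope with n − 2 degrees of freedom. With 23 complete cases, that is 21.

```latex
\text{Hydro}_i = \beta_0 + \beta_1\,\text{Wind}_i + \varepsilon_i,
\qquad
H_0:\ \beta_1 = 0
\quad\text{vs.}\quad
H_A:\ \beta_1 \neq 0,
\qquad
t = \frac{b_1}{\operatorname{SE}(b_1)} \sim t_{n-2}\ \text{under } H_0
```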
We started by running a regression of conventional hydroelectric power on wind power. The regression equation was hydroelectric conventional = 5631695 + 4.74 wind. Twenty-three cases were used, 28 cases contained missing values, and the p-value was .183. The R-squared was only 8.3%, which is far too low, so we decided to remove outliers in the hope of raising it. Washington was a large outlier because it has a large number of hydroelectric plants, so we removed Washington from our data. This gave the regression equation hydroelectric conventional = 2996890 + 4.41 wind. Only 22 cases were used this time because Washington is no longer in the data, and the p-value was .036, which is below the common .05 significance level. The R-squared was 20.2% and S = 9544987. Removing Washington raised the R-squared substantially, which was a good sign, but the R-squared was still relatively low.
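The steps above are: drop cases with missing values, fit the line, then refit without one named case. A minimal sketch of them follows. It makes some assumptions. The data are stored as a mapping from state to a (wind, hydro) pair, with `None` marking a missing value. `complete_cases`, `fit_line` and the `TOY` numbers are hypothetical and do not reproduce the report's data or results. Computing a p-value needs a t-distribution CDF, which the Python standard library lacks, so the sketch reports only the slope's t-statistic.

```python
import math


def complete_cases(data, exclude=()):
    """Keep rows where both values are present; optionally drop named rows."""
    xs, ys = [], []
    for name, (x, y) in data.items():
        if name in exclude or x is None or y is None:
            continue  # missing value or deliberately excluded case
        xs.append(x)
        ys.append(y)
    return xs, ys


def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1*x with R^2, S, and the slope's t-statistic."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))                # residual standard error ("S")
    se_b1 = s / math.sqrt(sxx)                  # standard error of the slope
    t = b1 / se_b1 if se_b1 > 0 else math.inf   # compare to t with n - 2 df
    return {"n": n, "b0": b0, "b1": b1, "r2": 1 - sse / syy, "s": s, "t": t}


# Hypothetical toy numbers for illustration only (NOT the report's data):
# name -> (wind, hydro). "E" has a missing wind value; "X" is an outlier.
TOY = {
    "A": (1.0, 5.2),
    "B": (2.0, 7.9),
    "C": (3.0, 11.1),
    "D": (4.0, 13.8),
    "E": (None, 9.0),
    "X": (2.5, 40.0),
}

with_x = fit_line(*complete_cases(TOY))
without_x = fit_line(*complete_cases(TOY, exclude={"X"}))
print(with_x["n"], round(with_x["r2"], 3))        # 5 0.053
print(without_x["n"], round(without_x["r2"], 3))  # 4 0.999
```

In this toy example the outlier sits at the mean wind value. Removing it therefore leaves the slope unchanged at 2.9. It only shifts the intercept and shrinks the residual spread, which is why R-squared jumps.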
We knew that to achieve a higher R-squared we would have to remove more outliers, but removing too many outliers would make the model useless. So instead of removing a number of outliers, we decided to re-express the variables. We hoped this would make each variable's distribution more symmetric and spread the residuals more evenly in the residual plot. We did this by testing hydroelectricity vs. LogWind, LogHydroelectricity vs. LogWind, and LogHydroelectricity vs. Wind. The first one we tested was hydroelectricity vs. LogWind, which gave us a regression equation of hydroelectric conventional = -9732703 + 1285448