Preliminary Analysis
2a)
Figure 1: X as a Data Object
As the R output in Figure 1 shows, X is a data frame with 274 observations of 11 variables. The number of observations is the number of rows, and the number of variables is the number of columns.
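The checks described above can be sketched in R as follows. This is a minimal illustration on a toy data frame, not the commands from Figure 1; the column names YEAR and MARKET and all values are assumptions made for the example.

```r
# Toy stand-in for X (hypothetical values; the real X has 274 rows and 11 columns).
X <- data.frame(
  YEAR   = c(1984, 1985, 1986),
  MARKET = c("TRI-COUNTY", "SURROUND", "SURROUND"),
  WWBID  = c(0.125, NA, 0.131)
)

is_df  <- is.data.frame(X)  # TRUE when X is a data frame
n_obs  <- nrow(X)           # number of observations = number of rows
n_vars <- ncol(X)           # number of variables = number of columns
dim(X)                      # rows and columns in one call
str(X)                      # compact summary of the object and its columns
```

Applied to the real X, nrow(X) and ncol(X) would be expected to return the 274 and 11 reported above.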
2b)
Figure 2: Creating a sub-data frame from X
Figure 3: Sub-data frame from X
Figure 2 shows a screenshot of the commands entered into R to create a sub-data frame of X containing the observations of the 7 selected variables. Figure 3 shows the resulting sub-data frame.
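One common way to build such a sub-data frame is to select columns by name. The sketch below uses a toy data frame; the names YEAR, MARKET and EXTRA are hypothetical, and only 3 columns are kept here for brevity where the report keeps 7. It is not necessarily the approach shown in Figure 2.

```r
# Hypothetical stand-in for X; column names other than WWBID are assumptions.
X <- data.frame(
  YEAR   = c(1984, 1985),
  MARKET = c("TRI-COUNTY", "SURROUND"),
  WWBID  = c(0.125, 0.131),
  EXTRA  = c(1, 2)
)

keep  <- c("YEAR", "MARKET", "WWBID")  # names of the variables to retain
X_sub <- X[, keep]                     # keep all rows, only the chosen columns
head(X_sub)                            # inspect the first rows of the result
```

Selecting by name rather than by column position keeps the command readable and robust if the column order of X changes.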
2c)
Figure 4: Missing values of WWBID
The variable WWBID has 36 missing values, and the fraction of missing values out of the total number of cases is 12/91 (approximately 0.1319).
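A count and fraction of this kind can be obtained with is.na(). The sketch below runs on five made-up values, so its numbers are illustrative only and do not reproduce the figures reported above.

```r
# Toy WWBID column with two missing values (hypothetical data).
X <- data.frame(WWBID = c(0.125, NA, 0.131, NA, 0.128))

n_missing    <- sum(is.na(X$WWBID))   # number of missing values
frac_missing <- n_missing / nrow(X)   # fraction of cases with WWBID missing
frac_check   <- mean(is.na(X$WWBID))  # same fraction computed in one step
```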
2d)
The percentage of cases in the dataset that contain one or more missing values, as calculated using R, is 13.19%, which can also be seen in Figure 4.
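The percentage of cases with at least one missing value can be computed with complete.cases(). This is a sketch on toy data; the second column name WCBID is a hypothetical placeholder, and the report's own commands may differ.

```r
# Toy data: rows 2 and 3 each contain one missing value (hypothetical values).
X <- data.frame(
  WWBID = c(0.125, NA, 0.131, 0.128),
  WCBID = c(0.110, 0.112, NA, 0.115)  # hypothetical second variable
)

has_missing <- !complete.cases(X)       # TRUE for rows with one or more NA
pct_missing <- 100 * mean(has_missing)  # percentage of such rows
round(pct_missing, 2)
```

Note that this counts rows, not individual missing cells, so a row with several missing values is counted once.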
2e)
Figure 5: Bid price variable for markets Tri-County & Surround in 1984
Figure 6: Bid price variable for markets Tri-County & Surround in 1985
Figure 7: Bid price variable for markets Tri-County & Surround in 1986
Figure 8: Bid price variable for markets Tri-County & Surround in 1987
Figure 9: Bid price variable for markets Tri-County & Surround in 1988
Figures 5-9 above show the box plots obtained using R for Tri-County and Surround for the years 1984 to 1988. Examining these plots of the combined data for Meyer and Trauth shows potential outliers in every year.
In Surround, potential outliers are present in all 5 years. In Tri-County, they are present only in 1985 and 1988.
The potential outliers in Surround are all maximum values of the bid price variable. In Tri-County, however, the potential outlier in 1988 is the minimum value of the bid price variable.
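Side-by-side box plots of this kind, and the points R flags as potential outliers, can be produced as sketched below. The data are made up, the column names YEAR and MARKET are assumptions, and the sketch assumes R's default rule (points more than 1.5 times the box length beyond the hinges), which may not match the settings used for Figures 5-9.

```r
# Toy bids for one year in two markets (hypothetical values).
X <- data.frame(
  YEAR   = rep(1984, 10),
  MARKET = rep(c("SURROUND", "TRI-COUNTY"), each = 5),
  WWBID  = c(0.120, 0.125, 0.128, 0.130, 0.200,   # Surround: one high bid
             0.130, 0.131, 0.132, 0.133, 0.134)   # Tri-County: tightly grouped
)

d84 <- subset(X, YEAR == 1984)  # restrict to a single year
bp  <- boxplot(WWBID ~ MARKET, data = d84,
               main = "1984", ylab = "WWBID")  # draws the plot, returns its statistics

outlier_values  <- bp$out                # values plotted as potential outliers
outlier_markets <- bp$names[bp$group]    # market each flagged value belongs to
```

Repeating the call for each year from 1984 to 1988 would give one plot per year, and comparing each flagged value with the box shows whether it lies at the high or the low end of the bids.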