Chapter 3 Summaries:
Chapter 3 covers descriptive statistics using numerical measures, which consist of measures of location and measures of variability. The measures of location are the mean, median, mode, weighted mean, geometric mean, percentiles, and quartiles. When these measures are computed for data from a sample, they are called sample statistics; when they are computed for data from a population, they are called population parameters. A sample statistic is referred to as the point estimator of the corresponding population parameter. The most important measure of location is the mean, which provides a measure of central location. The mean of a data set is the average of all the data values: their sum divided by the number of values. Another measure of location is the median, which is the middle value when the data items are arranged in ascending order (with an even number of items, it is the average of the two middle values). The median is often reported for property values and annual income data. The mode of a data set is the value that occurs with the greatest frequency. There are also several other kinds of means: the trimmed mean, the weighted mean, and the geometric mean. A trimmed mean, which is useful when extreme values are present, is obtained by deleting a percentage of the largest and smallest values from the data set and then computing the mean of the remaining values. The geometric mean is calculated by finding the nth root of the product of n values. You would use the geometric mean whenever you want to find the mean rate of change over several successive periods. Finally, the weighted mean is used in situations where each observation is given a weight that reflects its relative importance. Next are percentiles and quartiles. A percentile provides information about how the data are spread over the interval from the smallest value to the largest value. For example, test scores for college and university admissions are often reported as percentiles.
Quartiles are simply particular percentiles: the first quartile is the 25th percentile, the second quartile is the 50th percentile (the median), and the third quartile is the 75th percentile.
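As a rough illustration, the measures of location described above can be computed with Python's standard library. This is a minimal sketch on small hypothetical data sets; the helper names trimmed_mean, weighted_mean, geometric_mean, and percentile are illustrative choices, not names from the chapter. The percentile helper assumes the location formula L_p = (p/100)(n + 1) with interpolation between neighboring sorted values. Textbooks and software use several different percentile conventions, so other methods can give slightly different answers.

```python
import math
import statistics

def trimmed_mean(values, proportion):
    """Drop `proportion` (assumed < 0.5) of the values from EACH end, then average the rest."""
    data = sorted(values)
    k = int(len(data) * proportion)      # number of values removed from each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

def geometric_mean(factors):
    """nth root of the product of n values (here, growth factors per period)."""
    return math.prod(factors) ** (1 / len(factors))

def percentile(values, p):
    """pth percentile using the location L_p = (p / 100) * (n + 1), with interpolation."""
    data = sorted(values)
    n = len(data)
    loc = (p / 100) * (n + 1)            # 1-based position in the sorted data
    if loc <= 1:
        return data[0]
    if loc >= n:
        return data[-1]
    lower = int(loc)                     # whole part: position of the lower neighbor
    frac = loc - lower                   # fractional part used to interpolate
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

# Hypothetical data with one extreme value (40)
data = [2, 4, 4, 5, 7, 9, 40]
print(statistics.mean(data))             # about 10.14, pulled up by the 40
print(statistics.median(data))           # 5
print(statistics.mode(data))             # 4
print(trimmed_mean(data, 0.15))          # 5.8 (drops the 2 and the 40)

# Hypothetical grade points weighted by credit hours
grades, credits = [4.0, 3.0, 2.0], [3, 4, 2]
print(weighted_mean(grades, credits))    # about 3.11 (= 28/9)

# A value that doubles and then halves has a mean growth factor of 1.0 (no growth)
factors = [2.0, 0.5]
print(geometric_mean(factors))           # 1.0

# Hypothetical test scores: quartiles as the 25th, 50th and 75th percentiles
scores = [15, 20, 25, 25, 27, 28, 30, 34]
print([percentile(scores, p) for p in (25, 50, 75)])  # [21.25, 26.0, 29.5]
print(statistics.quantiles(scores, n=4))              # [21.25, 26.0, 29.5]
```

For these quartiles, the hand-written percentile helper agrees with statistics.quantiles using its default "exclusive" method, which also places the pth percentile at position (p/100)(n + 1). The trimmed mean and median both stay near the bulk of the data, while the ordinary mean is pulled toward the extreme value.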
The next group of measures is the measures of variability. Variability matters in practice: for example, when choosing between supplier 1 and supplier 2, a buyer should consider not only each supplier's average delivery time but also how much each supplier's delivery times vary. The measures of variability are the range, variance, coefficient of variation, interquartile range, and standard deviation. The range of a data set is the difference between the largest and smallest data values; in other words, range = largest value - smallest value. The variance is based on the squared differences between each data value and the mean: for a population it is the average of these squared differences, and for a sample the sum of the squared differences is divided by n - 1 instead of n. The interquartile range (IQR) of a data set is the difference between the third quartile and the first quartile, IQR = Q3 - Q1. The standard deviation of a data set is the positive square root of the variance. Because it is measured in the same units as the data, it is easier to interpret than the variance. The coefficient of variation, which is the standard deviation divided by the mean and multiplied by 100, expresses the standard deviation as a percentage of the mean.
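To make the supplier example concrete, the sketch below computes each measure of variability for two hypothetical suppliers whose delivery times (in days) have the same mean of 10 but different spreads. The delivery data and the helper name variability_summary are made up for illustration. It uses the sample variance (dividing by n - 1) and takes quartiles from statistics.quantiles with its default method; other quartile conventions could give a slightly different IQR.

```python
import statistics

def variability_summary(values):
    """Range, IQR, sample variance, standard deviation and coefficient of variation."""
    data = sorted(values)
    n = len(data)
    mean = sum(data) / n
    # Sample variance: sum of squared deviations from the mean divided by n - 1
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    std_dev = variance ** 0.5                     # positive square root, same units as the data
    q1, _, q3 = statistics.quantiles(data, n=4)   # default 'exclusive' method
    return {
        "range": data[-1] - data[0],              # largest value - smallest value
        "iqr": q3 - q1,                           # third quartile - first quartile
        "variance": variance,
        "std_dev": std_dev,
        "cv_percent": std_dev / mean * 100,       # standard deviation as a % of the mean
    }

# Hypothetical delivery times in days; both suppliers average 10 days
supplier_1_days = [9, 10, 10, 10, 11]
supplier_2_days = [7, 8, 10, 12, 13]

for name, days in [("supplier 1", supplier_1_days), ("supplier 2", supplier_2_days)]:
    print(name, variability_summary(days))
```

Every measure is larger for supplier 2 (range 6 vs. 2, sample variance 6.5 vs. 0.5), so supplier 1 is the more dependable choice even though both have the same mean delivery time.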
Descriptive statistics also include measures of distribution shape, relative location, and the detection of outliers. Skewness is an important measure of the shape of a distribution, and it can be easily calculated using statistical software. The next measure, usually called the standardized value, is known as the z-score. A z-score indicates the number of standard deviations a data value x is from the mean: z = (x - x̄) / s, where x̄ is the sample mean and s is the sample standard deviation. A data value less than the sample mean will have a z-score less than zero, and a data value greater than the sample mean will have a z-score greater than zero. The z-score equals zero only when the value equals the sample mean.
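A short sketch of z-scores and skewness follows, assuming sample statistics throughout. The helper names z_scores and sample_skewness are hypothetical, and the skewness formula shown, n / ((n - 1)(n - 2)) multiplied by the sum of the cubed z-scores, is one common sample formula; statistical packages do not all use the same one.

```python
import statistics

def z_scores(values):
    """z_i = (x_i - mean) / s, using the sample standard deviation s."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]

def sample_skewness(values):
    """One common sample skewness formula: n / ((n - 1)(n - 2)) * sum(z_i ** 3)."""
    n = len(values)
    return n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in z_scores(values))

data = [2, 4, 6, 8, 10]          # hypothetical, symmetric around its mean of 6
z = z_scores(data)
print(z)                          # negative below 6, exactly 0.0 at 6, positive above 6
print(sample_skewness(data))      # essentially 0 for symmetric data

right_skewed = [1, 1, 2, 2, 3, 10]
print(sample_skewness(right_skewed) > 0)  # True: a long tail to the right
```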
Building on the information above, I will now explain the empirical rule and the detection of outliers. The empirical rule is used to determine the approximate percentage of data values that lie within a specified number of standard deviations of the mean. It is based on the normal distribution, so it applies to data with a bell-shaped distribution: approximately 68% of the values lie within one standard deviation of the mean, approximately 95% within two, and almost all (about 99.7%) within three. An outlier is an unusually small or unusually large value in a data set. Z-scores can be used to identify outliers: a data value with a z-score greater than +3 or less than -3 could be considered an outlier. An outlier might be a data value that was recorded incorrectly or that was incorrectly included in the data set, or it might be a correctly recorded value that belongs in the data set.
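The sketch below checks the empirical-rule percentages against the standard normal distribution using statistics.NormalDist, and then applies the |z| > 3 rule to a hypothetical set of readings. The helper names share_within and z_score_outliers, and the readings themselves, are assumptions made for illustration.

```python
from statistics import NormalDist, mean, stdev

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of its mean."""
    std_normal = NormalDist()            # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)

def z_score_outliers(values, limit=3.0):
    """Values whose z-score is greater than +limit or less than -limit."""
    m = mean(values)
    s = stdev(values)                    # sample standard deviation
    return [x for x in values if abs((x - m) / s) > limit]

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.1%}")  # 68.3%, 95.4%, 99.7%

# Hypothetical readings with one suspicious value
readings = [48, 50, 51, 49, 52, 50, 47, 53, 50, 49, 51, 50, 120]
print(z_score_outliers(readings))        # [120]
```

One caution about the design: with the sample standard deviation, no z-score can exceed (n - 1)/√n in absolute value, so with ten or fewer values the |z| > 3 rule cannot flag anything. That is why the hypothetical data set has 13 values.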
A box plot is a graphical summary of data based on a five-number summary: the smallest value, the first quartile, the median, the third quartile, and the largest value. The key to developing a box plot is computing the quartiles and the median. A box plot also provides another way to identify outliers. A box is drawn with its ends located at the first and third quartiles, and a line is drawn inside the box at the location of the median.
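The numbers behind a box plot can be sketched as follows. This assumes the common convention of flagging values more than 1.5 times the IQR below the first quartile or above the third quartile as outliers, and uses statistics.quantiles (default method) for the quartiles. The helper names and the data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """(smallest, Q1, median, Q3, largest) for a list of numbers."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    return data[0], q1, median, q3, data[-1]

def box_plot_outliers(values):
    """Values beyond the limits Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return [x for x in values if x < lower_limit or x > upper_limit]

data = [3, 5, 6, 7, 7, 8, 9, 10, 30]    # hypothetical
print(five_number_summary(data))         # (3, 5.5, 7.0, 9.5, 30)
print(box_plot_outliers(data))           # [30]
```

Here the box runs from 5.5 to 9.5 with the median line at 7.0, and the limits are -0.5 and 15.5, so 30 would be plotted individually as an outlier.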
So far we have looked closely at numerical methods used to summarize the data for one variable at a time. Often, however, a manager or decision maker is interested in the relationship between two variables. Two descriptive measures of this relationship are the covariance and the correlation coefficient. The covariance is a measure of the linear association between two variables: positive values indicate a positive relationship, and negative values indicate a negative relationship. The correlation coefficient is also a measure of linear association, not necessarily of causation. If two variables are highly correlated, it does not necessarily mean that one variable causes the other.
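A minimal sketch of the two measures, assuming sample formulas: the sample covariance s_xy = Σ(x_i - x̄)(y_i - ȳ) / (n - 1), and the correlation coefficient r = s_xy / (s_x s_y), which always falls between -1 and +1. The paired data are hypothetical. Python 3.10 and later also provide statistics.covariance and statistics.correlation, which could serve as a cross-check.

```python
def sample_covariance(x, y):
    """Sum of (x_i - x_mean) * (y_i - y_mean), divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance scaled by both standard deviations, so the result lies in [-1, 1]."""
    s_x = sample_covariance(x, x) ** 0.5   # covariance of x with itself is its variance
    s_y = sample_covariance(y, y) ** 0.5
    return sample_covariance(x, y) / (s_x * s_y)

x = [1, 2, 3, 4, 5]                        # hypothetical paired observations
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y))             # 1.5 (positive relationship)
print(correlation(x, y))                   # about 0.77

y_neg = [-2 * a for a in x]                # an exact negative linear relationship
print(correlation(x, y_neg))               # about -1.0
```

The covariance depends on the units of x and y, while the correlation coefficient does not, which is why r is usually easier to interpret. Neither value says anything about which variable, if either, causes the other.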
The last topic to review is how adding numerical measures to a data dashboard improves its effectiveness. Data dashboards are not limited to graphical displays. Adding numerical measures, such as the mean and standard deviation of key performance indicators (KPIs), to a dashboard is often critical, and dashboards are often interactive. Dashboards also support drilling down, which refers to functionality in interactive dashboards that allows the user to access information and analyses at an increasingly detailed level.