1. Measures of central tendency and dispersion
The mean of a group of numbers is their average value. The median is the number in the middle when the numbers are sorted in ascending order. It is probably immediate to you that the order doesn’t have to be ascending: sorting in descending order gives the same median. A trickier question is what the median is when the count of numbers is even. In that case, you simply take the average of the two numbers in the middle. For example, suppose the numbers given are a, b, c and d, where $a < b < c < d$; then the median is $(b + c)/2$. […]
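A minimal Python sketch of the two definitions above. The function names `mean` and `median` are our own, chosen for illustration; the even-count branch averages the two middle values exactly as described. (Python's standard `statistics.median` follows the same rule.)

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Middle value after sorting; for an even count, the average of the two middle values."""
    ordered = sorted(values)  # ascending; sorting descending would give the same median
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd = [9, 1, 2]       # sorted: 1, 2, 9 -> the middle value is 2
even = [10, 1, 3, 2]  # sorted: 1, 2, 3, 10 -> the middle pair is 2 and 3
print(mean(odd), median(odd))    # 4.0 2
print(mean(even), median(even))  # 4.0 2.5
```

Note how the single large value 10 pulls the mean up to 4.0 while the median stays at 2.5.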
Using Chebyshev’s inequality, we note that θ is 50, the standard deviation s is 10, and hence 2 is what we called c in the formula. So, $\Pr\{|X - 50| > 2 \cdot 10\} \le 1/2^2 = 0.25$.
Notice that Chebyshev’s inequality provides a bound for every distribution that has a finite mean and variance. It says that if X is drawn from a population with mean θ and standard deviation s, the probability that any one realization of X lies at least $ks$ away from θ is at most $1/k^2$, that is, $\Pr\{|X - \theta| \ge ks\} \le 1/k^2$. I am deliberately using k here, because many of you seemed to struggle with the idea of a variable c in the original formula during our class discussion.
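To see that the bound holds even for a lopsided distribution, here is a small Python sketch. The helper `chebyshev_bound` is a hypothetical name of our own, and the population X = 40 + Exponential(mean 10) is an assumption made only for illustration: it happens to have mean 50 and standard deviation 10, matching the example above, but it is far from normal.

```python
import random


def chebyshev_bound(k):
    """Upper bound on Pr{|X - theta| >= k*s}; valid for any distribution with finite variance."""
    return min(1.0, 1 / k**2)  # for k <= 1 the bound is trivially 1


# The worked example: theta = 50, s = 10, k (called c earlier) = 2.
theta, s, k = 50, 10, 2
print(chebyshev_bound(k))  # 0.25

# Hypothetical skewed population: 40 + Exponential(mean 10) has mean 50 and std 10.
random.seed(1)
draws = [40 + random.expovariate(1 / 10) for _ in range(100_000)]

# Fraction of draws farther than k*s from theta (True counts as 1 in the sum).
freq = sum(abs(x - theta) > k * s for x in draws) / len(draws)
print(freq)  # roughly 0.05 for this population, well under the bound of 0.25
```

The bound is loose for any particular population, which is the price of holding for all of them.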
The central limit theorem takes us a little further. Let $\bar{X}$ be the mean of a sample of n observations drawn from this population. According to the central limit theorem, for large n,
$$\Pr\{|\bar{X} - \theta| > c\,s/\sqrt{n}\} \approx \Pr\{|Z| > c\} = 2\,(1 - \Phi(c)),$$
where Z is the standard normal variable and Φ is the standard normal distribution function. Using c = 2 as before, we get Φ(2) ≈ 0.977 (you can check this value in Uma Sekaran’s book, page 433 of the fourth edition; such a normal distribution table is generally given at the end of most intermediate-level statistics books). This implies that the probability that the sample mean lies more than two of its standard deviations ($2s/\sqrt{n}$) away from the true mean θ is about $2(1 - 0.977) = 0.046$.
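Instead of a printed table, Φ can be computed from the error function, since $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$. The Python sketch below does that and then checks the 0.046 figure by simulation. The sample size n = 50, the number of trials, and the skewed population 40 + Exponential(mean 10) (mean 50, standard deviation 10) are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def phi(z):
    """Standard normal distribution function Phi(z), computed via math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


c = 2
print(round(phi(c), 3))            # 0.977
print(round(2 * (1 - phi(c)), 3))  # 0.046

# Simulation with a hypothetical skewed population: 40 + Exponential(mean 10).
random.seed(2)
theta, s, n, trials = 50, 10, 50, 10_000
hits = 0
for _ in range(trials):
    x_bar = statistics.fmean(40 + random.expovariate(1 / 10) for _ in range(n))
    # Is the sample mean more than c standard deviations of x_bar (s / sqrt(n)) from theta?
    if abs(x_bar - theta) > c * s / math.sqrt(n):
        hits += 1
print(hits / trials)  # should land near 0.046 even though the population is not normal
```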
2. Correlation between two datasets.
Let $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$ be two datasets, each having n elements. Pearson’s correlation coefficient, r, is given by the following formula:
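$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}},$$
where $\bar{x}$ and $\bar{y}$ are the means of the two datasets.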
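A direct Python translation of Pearson’s formula, as a minimal sketch (`pearson_r` is our own name). It takes the square root of the product of the two sums of squares, which equals the product of their square roots in the formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two datasets of equal length n >= 2."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two datasets of the same length n >= 2")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations from the two means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sums of squared deviations for each dataset
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either dataset is constant")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive linear relation)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative linear relation)
print(pearson_r(x, [2, 1, 4, 3, 5]))   # 0.8
```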
Students must not confuse correlation with causation. Correlation implies