Statistics for Programmers
Version 1.6.0
Think Stats
Probability and Statistics for Programmers
Version 1.6.0
Allen B. Downey
Green Tea Press
Needham, Massachusetts
Copyright © 2011 Allen B. Downey.
Green Tea Press
9 Washburn Ave
Needham MA 02492
Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-NonCommercial 3.0 Unported License, which is available at http://creativecommons.org/licenses/by-nc/3.0/.
The original form of this book is LaTeX source code. Compiling this code has the effect of generating a device-independent representation of a textbook, which can be converted to other formats and printed.
The LaTeX source for this book is available from http://thinkstats.com.
The cover for this book is based on a photo by Paul Friel (http://flickr.com/people/frielp/), who made it available under the Creative Commons Attribution license. The original photo is at http://flickr.com/photos/frielp/11999738/.
Preface
Why I wrote this book
Think Stats: Probability and Statistics for Programmers is a textbook for a new kind of introductory prob-stat class. It emphasizes the use of statistics to explore large datasets. It takes a computational approach, which has several advantages:
• Students write programs as a way of developing and testing their understanding. For example, they write functions to compute a least squares fit, residuals, and the coefficient of determination. Writing and testing this code requires them to understand the concepts and implicitly corrects misunderstandings.
• Students run experiments to test statistical behavior. For example, they explore the Central Limit Theorem (CLT) by generating samples from several distributions. When they see that, for some parameters, the sum of values from a Pareto distribution doesn’t converge to normal, they remember the assumptions the CLT is based on.
• Some ideas that are hard to