Bayesian Methods for Hackers
Using Python and PyMC
The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference involves two to three chapters on probability theory, then turns to what Bayesian inference is. Unfortunately, due to the mathematical intractability of most Bayesian models, the reader is only shown simple, artificial examples. This can leave the user with a so-what feeling about Bayesian inference. In fact, this was the author's own prior opinion.
After some recent success of Bayesian methods in machine-learning competitions, I decided to investigate the subject again. Even with my mathematical background, it took me three straight days of reading examples and trying to put the pieces together to understand the methods. There was simply not enough literature bridging theory to practice. The source of my misunderstanding was the disconnect between Bayesian mathematics and probabilistic programming. That being said, I suffered then so the reader would not have to now. This book attempts to bridge the gap.
If Bayesian inference is the destination, then mathematical analysis is a particular path towards it. On the other hand, computing power is cheap enough that we can afford to take an alternate route via probabilistic programming. The latter path is much more useful, as it denies the necessity of mathematical intervention at each step; that is, we remove often-intractable mathematical analysis as a prerequisite to Bayesian inference. Simply put, this latter computational path proceeds via small intermediate jumps from beginning to end, whereas the first path proceeds by enormous leaps, often landing far away from our target. Furthermore, without a strong mathematical background, the analysis required by the first path cannot even take place.
Bayesian Methods for Hackers is designed as an introduction to Bayesian inference from a computational/understanding-first, and mathematics-second, point of view. Of course, as an introductory book, we can only leave it at that: an introductory book. The mathematically trained may cure the curiosity this text generates with other texts designed with mathematical analysis in mind. For the enthusiast with less mathematical background, or one who is not interested in the mathematics but simply the practice of Bayesian methods, this text should be sufficient and entertaining.
The reason for choosing PyMC as the probabilistic programming language is twofold. First, as of this writing, there is no central resource for examples and explanations in the PyMC universe. The official documentation assumes prior knowledge of Bayesian inference and probabilistic programming. We hope this book encourages users at every level to look at PyMC. Secondly, with recent core developments and the popularity of the scientific stack in Python, PyMC is likely to become a core component soon enough.
PyMC does have dependencies to run, namely NumPy and (optionally) SciPy. To not limit the user, the examples in this book rely only on PyMC, NumPy, SciPy and Matplotlib.
Printed Version by Addison-Wesley
Bayesian Methods for Hackers is now available as a printed book! You can pick up a copy on Amazon. What are the differences between the online version and the printed version?
 Additional Chapter on Bayesian A/B testing
 Updated examples
 Answers to the end of chapter questions
 Additional explanation, and rewritten sections to aid the reader.
Contents
See the project homepage here for examples, too.
The chapters below are rendered via the nbviewer at nbviewer.ipython.org/, and are read-only and rendered in real time. Interactive notebooks and examples can be downloaded by cloning the repository!
PyMC2

Prologue: Why we do it.

Chapter 1: Introduction to Bayesian Methods Introduction to the philosophy and practice of Bayesian methods and answering the question, "What is probabilistic programming?" Examples include:
 Inferring human behaviour changes from text message rates

Chapter 2: A little more on PyMC We explore modeling Bayesian problems using Python's PyMC library through examples. How do we create Bayesian models? Examples include:
 Detecting the frequency of cheating students, while avoiding liars
 Calculating probabilities of the Challenger space-shuttle disaster

Chapter 3: Opening the Black Box of MCMC We discuss how MCMC operates and diagnostic tools. Examples include:
 Bayesian clustering with mixture models

Chapter 4: The Greatest Theorem Never Told We explore an incredibly useful, and dangerous, theorem: The Law of Large Numbers. Examples include:
 Exploring a Kaggle dataset and the pitfalls of naive analysis
 How to sort Reddit comments from best to worst (not as easy as you think)

Chapter 5: Would you rather lose an arm or a leg? The introduction of loss functions and their (awesome) use in Bayesian methods. Examples include:
 Solving the Price is Right's Showdown
 Optimizing financial predictions
 Winning solution to the Kaggle Dark World's competition

Chapter 6: Getting our priorities straight Probably the most important chapter. We draw on expert opinions to answer questions. Examples include:
 Multi-Armed Bandits and the Bayesian Bandit solution.
 What is the relationship between data sample size and prior?
 Estimating financial unknowns using expert priors
We explore useful tips to be objective in analysis as well as common pitfalls of priors.
PyMC3

Prologue: Why we do it.

Chapter 1: Introduction to Bayesian Methods Introduction to the philosophy and practice of Bayesian methods and answering the question, "What is probabilistic programming?" Examples include:
 Inferring human behaviour changes from text message rates

Chapter 2: A little more on PyMC We explore modeling Bayesian problems using Python's PyMC library through examples. How do we create Bayesian models? Examples include:
 Detecting the frequency of cheating students, while avoiding liars
 Calculating probabilities of the Challenger space-shuttle disaster

Chapter 3: Opening the Black Box of MCMC We discuss how MCMC operates and diagnostic tools. Examples include:
 Bayesian clustering with mixture models

Chapter 4: The Greatest Theorem Never Told We explore an incredibly useful, and dangerous, theorem: The Law of Large Numbers. Examples include:
 Exploring a Kaggle dataset and the pitfalls of naive analysis
 How to sort Reddit comments from best to worst (not as easy as you think)

Chapter 5: Would you rather lose an arm or a leg? The introduction of loss functions and their (awesome) use in Bayesian methods. Examples include:
 Solving the Price is Right's Showdown
 Optimizing financial predictions
 Winning solution to the Kaggle Dark World's competition

Chapter 6: Getting our priorities straight Probably the most important chapter. We draw on expert opinions to answer questions. Examples include:
 Multi-Armed Bandits and the Bayesian Bandit solution.
 What is the relationship between data sample size and prior?
 Estimating financial unknowns using expert priors
We explore useful tips to be objective in analysis as well as common pitfalls of priors.
More questions about PyMC? Please post your modeling, convergence, or any other PyMC question on Cross Validated, the statistics Stack Exchange.
Using the book
The book can be read in three different ways, starting from most recommended to least recommended:
 The most recommended option is to clone the repository to download the .ipynb files to your local machine. If you have IPython installed, you can view the chapters in your browser plus edit and run the code provided (and try some practice questions). This is the preferred option to read this book, though it comes with some dependencies.
 IPython v2.0 (or greater) is a requirement to view the ipynb files. It can be downloaded here. IPython notebooks can be run by
(yourvirtualenv) ~/path/to/the/book/Chapter1_Introduction $ ipython notebook
 For Linux users, you should not have a problem installing NumPy, SciPy, Matplotlib and PyMC. For Windows users, check out precompiled versions if you have difficulty.
 In the styles/ directory are a number of files (.matplotlibrc) that are used to make things pretty. These are not only designed for the book, but they offer many improvements over the default settings of matplotlib.

 The second option is to use the nbviewer.ipython.org site, which displays IPython notebooks in the browser (example). The contents are updated synchronously as commits are made to the book. You can use the Contents section above to link to the chapters.
 PDFs are the least-preferred method to read the book, as PDFs are static and non-interactive. If PDFs are desired, they can be created dynamically using the nbconvert utility.
Installation and configuration
If you would like to run the IPython notebooks locally (option 1 above), you'll need to install the following:
 IPython 2.0+ is a requirement to view the ipynb files. It can be downloaded here
 Necessary packages are PyMC, NumPy, SciPy and Matplotlib.
 For Linux/OSX users, you should not have a problem installing the above, except for Matplotlib on OSX.
 For Windows users, check out precompiled versions if you have difficulty.
 Also recommended, for data-mining exercises, are PRAW and requests.

New to Python or IPython, and need help with the namespaces? Check out this answer.
 In the styles/ directory are a number of files that are customized for the notebook. These are not only designed for the book, but they offer many improvements over the default settings of matplotlib and the IPython notebook. The in-notebook style has not been finalized yet.
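Before launching the notebooks locally, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the book's repository: the function names `is_installed` and `check_environment` are hypothetical, and the import names (`pymc` for PyMC2, `pymc3` for PyMC3, `praw`, `requests`) are assumptions about how the distributions are usually installed.

```python
import importlib.util

# Import names are assumptions: PyMC2 is usually importable as "pymc"
# and PyMC3 as "pymc3"; adjust them to match your installation.
REQUIRED = ["numpy", "scipy", "matplotlib"]
PYMC_CANDIDATES = ["pymc", "pymc3"]
OPTIONAL = ["praw", "requests"]  # only needed for the data-mining exercises


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def check_environment(required=REQUIRED, pymc_candidates=PYMC_CANDIDATES,
                      optional=OPTIONAL, probe=is_installed):
    """Report which required and optional packages appear to be missing.

    `probe` is injectable so the logic can be exercised without
    depending on what is installed on the current machine.
    """
    missing = [name for name in required if not probe(name)]
    # Either PyMC version is enough to satisfy the PyMC requirement.
    if not any(probe(name) for name in pymc_candidates):
        missing.append(" or ".join(pymc_candidates))
    missing_optional = [name for name in optional if not probe(name)]
    return {"missing": missing, "missing_optional": missing_optional}


if __name__ == "__main__":
    report = check_environment()
    if report["missing"]:
        print("Missing required packages:", ", ".join(report["missing"]))
    else:
        print("All required packages were found.")
    if report["missing_optional"]:
        print("Missing optional packages:", ", ".join(report["missing_optional"]))
```

The check uses `importlib.util.find_spec` rather than importing each package, so it only tells you whether a package can be located, not whether it imports cleanly or meets a version requirement.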
Development
This book has an unusual development design. The content is open-sourced, meaning anyone can be an author. Authors submit content or revisions using the GitHub interface.
How to contribute
What to contribute?
 The current chapter list is not finalized. If you see something that is missing (MCMC, MAP, Bayesian networks, good prior choices, Potential classes etc.), feel free to start there.
 Cleaning up Python code and making code more PyMC-esque
 Giving better explanations
 Spelling/grammar mistakes
 Suggestions
 Contributing to the IPython notebook styles
Committing
 All commits are welcome, even if they are minor ;)
 If you are unfamiliar with GitHub, you can email me contributions to the email below.
Reviews
These are satirical, but real.
"No, but it looks good" - John D. Cook
"I ... read this book ... I like it!" - Andrew Gelman
"This book is a godsend, and a direct refutation to that 'hmph! you don't know maths, piss off!' school of thought... The publishing model is so unusual. Not only is it open source but it relies on pull requests from anyone in order to progress the book. This is ingenious and heartening" - excited Reddit user
Contributions and Thanks
Thanks to all our contributing authors, including (in chronological order):
We would like to thank the Python community for building an amazing architecture. We would like to thank the statistics community for building an amazing architecture.
Similarly, the book is only possible because of the PyMC library. A big thanks to the core devs of PyMC: Chris Fonnesbeck, Anand Patil, David Huard and John Salvatier.
One final thanks. This book was generated by IPython Notebook, a wonderful tool for developing in Python. We thank the IPython community for developing the Notebook interface. All IPython notebook files are available for download on the GitHub repository.
Contact
Contact the main author, Cam Davidson-Pilon, at [email protected] or @cmrndp