Simulating the NLDS: Can the Giants Win?

In his new book, Think Bayes, Allen Downey presents the “Boston Bruins” problem: estimating the Bruins’ probability of winning the 2010-2011 NHL championship after two wins and two losses in the final series. I will briefly describe Downey’s approach and then apply it to the current situation of the San Francisco Giants.

One (naive) approach would be to model the series as a gambler’s ruin problem. That model has two shortcomings here: the championship is a best of n, which ends as soon as one team reaches a fixed number of wins, rather than a contest played until one side is totally defeated; and it throws away important information about the scores of the games.

Instead, we make two assumptions. First, we model scoring in baseball as a Poisson process, in which a run is equally likely to be scored at any time during the game. This is still somewhat of an oversimplification (the odds are better when you have runners on base, for example), but we are getting closer to the “true” model. Second, we assume that games between the Reds and Giants in this year’s National League Division Series are similar enough that each team’s runs in a game can be treated as draws from a Poisson distribution whose parameter λ, the team’s average runs per game, stays the same from game to game. Different pitchers could throw this assumption off (no pun intended), but we will again use it as a not-entirely-implausible simplification.
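To make the Poisson assumption concrete, here is a minimal sketch of the probability that a team scores exactly k runs in a game when its scoring rate is λ runs per game. The rate of 4.4 is a made-up value for illustration, not a figure taken from the ESPN stats.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k runs in a game, given a scoring rate of lam runs per game."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 4.4  # hypothetical runs per game, for illustration only
for k in range(9):
    # Print the probability of scoring 0, 1, ..., 8 runs under this rate
    print(k, round(poisson_pmf(k, lam), 3))
```

Under this model, a single number per team is enough to describe its whole scoring distribution, which is what makes the Bayesian update below tractable.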

Having made our assumptions, we now use a four-step process proposed by Downey:

1. Use statistics from previous games to choose a prior distribution for λ.
2. Use the scores from the first four games to estimate λ for each team.
3. Use the posterior distributions of λ to compute the distribution of runs for each team, the distribution of the run differential, and the probability that each team wins.
4. Simulate the rest of the series to estimate the probability of each possible outcome.
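Step 4 is easy to sketch if we assume that the per-game win probability p is the same in every remaining game. The function below computes the series probability exactly by recursion rather than by simulation, which is a swap made here for simplicity. The name series_win_prob and the value p = 0.4 are illustrative assumptions, not part of Downey’s code.

```python
from functools import lru_cache

def series_win_prob(p, need_a, need_b):
    """Probability that team A wins the series.

    p      -- probability that A wins any single game (assumed constant)
    need_a -- number of wins A still needs
    need_b -- number of wins B still needs
    """
    @lru_cache(maxsize=None)
    def win(a, b):
        if a == 0:   # A has already clinched
            return 1.0
        if b == 0:   # B has already clinched
            return 0.0
        # A either wins the next game or loses it
        return p * win(a - 1, b) + (1 - p) * win(a, b - 1)

    return win(need_a, need_b)

p = 0.4  # hypothetical per-game win probability
print(series_win_prob(p, 1, 1))  # tied 2-2 in a best of five
print(series_win_prob(p, 3, 3))  # start of a best of five
```

When a best-of-five series is tied 2-2, each team needs one more win, so the series probability reduces to the probability of winning a single game.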

To estimate λ, we will use team batting stats from ESPN and the thinkbayes Python package from Downey’s site.
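Steps 1 and 2 can be sketched in plain Python on a grid of candidate values for λ. This is not the thinkbayes API. The Gaussian-shaped prior, its mean and spread, and the four scores are all made-up placeholders, not the regular season figures or the actual series results.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of exactly k runs at rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def normal_prior(grid, center, sd):
    """Discretized Gaussian-shaped prior over candidate values of lambda (an assumption)."""
    weights = [math.exp(-0.5 * ((lam - center) / sd) ** 2) for lam in grid]
    total = sum(weights)
    return {lam: w / total for lam, w in zip(grid, weights)}

def update(dist, runs):
    """Multiply each hypothesis by the Poisson likelihood of one game's runs, then renormalize."""
    post = {lam: p * poisson_pmf(runs, lam) for lam, p in dist.items()}
    total = sum(post.values())
    return {lam: p / total for lam, p in post.items()}

def dist_mean(dist):
    """Mean of a discrete distribution stored as {value: probability}."""
    return sum(lam * p for lam, p in dist.items())

grid = [0.1 * i for i in range(1, 121)]         # lambda from 0.1 to 12.0
prior = normal_prior(grid, center=4.4, sd=0.5)  # hypothetical prior

posterior = prior
for runs in [3, 1, 4, 6]:                       # made-up scores, one per game
    posterior = update(posterior, runs)

print(round(dist_mean(prior), 2), round(dist_mean(posterior), 2))
```

Each game’s score pulls the estimate of λ toward the observed scoring rate, and a tighter prior makes the estimate harder to move with only four games of data.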

Here is the distribution of λ, using regular season scoring as the prior and updating with the results of the first four games of the division series:

And here are the predicted runs-per-game by team, using simulations:
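One way to produce a predictive distribution of this kind is to mix Poisson distributions over the posterior for λ. The sketch below does this exactly rather than by simulation, and then compares the two teams. The two tiny posteriors are made-up stand-ins, and splitting ties evenly is a simplifying assumption standing in for extra innings.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of exactly k runs at rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def predictive_runs(posterior, max_runs=30):
    """Distribution of runs per game, averaging the Poisson pmf over the posterior for lambda.

    Returns a list whose index is the number of runs. The tail above max_runs
    is truncated and the result renormalized.
    """
    pmf = [sum(p * poisson_pmf(k, lam) for lam, p in posterior.items())
           for k in range(max_runs + 1)]
    total = sum(pmf)
    return [x / total for x in pmf]

def prob_a_outscores_b(pmf_a, pmf_b):
    """Return (P(A scores more than B), P(both score the same)), assuming independence."""
    win = sum(pa * pb
              for i, pa in enumerate(pmf_a)
              for j, pb in enumerate(pmf_b) if i > j)
    tie = sum(pa * pb for pa, pb in zip(pmf_a, pmf_b))
    return win, tie

# Hypothetical posteriors {lambda: probability}; not the fitted values from this post
post_a = {3.5: 0.25, 4.0: 0.5, 4.5: 0.25}
post_b = {4.0: 0.25, 4.5: 0.5, 5.0: 0.25}

runs_a = predictive_runs(post_a)
runs_b = predictive_runs(post_b)
win, tie = prob_a_outscores_b(runs_a, runs_b)

# Baseball games cannot end tied; splitting ties evenly is an assumption
p_a = win + tie / 2
print(round(p_a, 3))
```

The predictive distribution is wider than any single Poisson distribution because it also carries the remaining uncertainty about λ.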

According to the model’s predictions, the probability that the Giants win today’s game (and, because it is the deciding fifth game, the division series) is 0.387. I would have preferred to use a Gamma prior for λ and run some more simulations in R, but I wanted to use Downey’s example and get this up before the game started… which was a few minutes ago (although as I post, the score is still 0-0). Either way, enjoy the game!
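As a postscript on the Gamma prior: it is conjugate to the Poisson likelihood, so the update has a closed form and needs no grid. A minimal sketch follows, using the shape and rate parameterization. The prior parameters and the scores are made-up placeholders.

```python
def gamma_poisson_update(alpha, beta, runs_per_game):
    """Gamma(alpha, beta) prior (shape, rate) plus Poisson counts gives a Gamma posterior.

    The shape grows by the total runs observed and the rate by the number of games.
    """
    return alpha + sum(runs_per_game), beta + len(runs_per_game)

alpha0, beta0 = 44.0, 10.0  # hypothetical prior with mean alpha / beta = 4.4 runs per game
alpha1, beta1 = gamma_poisson_update(alpha0, beta0, [3, 1, 4, 6])  # made-up scores

print(alpha1 / beta1)  # posterior mean of lambda
```

The ratio beta0 acts like a count of prior games, so it controls how much weight four new scores get relative to the prior.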

About You Study Politics, Right?

Graduate student in political science at Duke University.