Two Unusual Papers on Monte Carlo Simulation

For Bayesian inference, Markov Chain Monte Carlo (MCMC) methods were a huge breakthrough. These methods provide a principled way of simulating from a posterior probability distribution, and are useful for integrating distributions that are computationally intractable. Usually MCMC methods are performed with computers, but I recently read two papers that apply Monte Carlo simulation in interesting ways.

The first is Markov Chain Monte Carlo with People. MCMC with people is somewhat similar to playing the game of telephone–there is input “data” (think of the starting word in the telephone game) that is transmitted across stages where it can be modified and then output at the end. In the paper the authors construct a task so that human learners approximately follow an MCMC acceptance rule. I have summarized the paper in slightly more detail here.

The second paper is even less conventional: the authors approximate the value of π using a “Mossberg 500 pump-action shotgun as the proposal distribution.” Their simulated value is 3.131, within 0.33% of the true value. As the authors state, “this represents the first attempt at estimating π using such method, thus opening up new perspectives towards computing mathematical constants using everyday tools.” Who said statistics has to be boring?
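For readers who have not seen a Monte Carlo estimate of π before, here is a minimal sketch of the textbook version. It is not the authors' shotgun procedure: it assumes points drawn uniformly from the unit square by a pseudo-random number generator, and the function name `estimate_pi` and its parameters are my own illustrative choices.

```python
import random


def estimate_pi(n_samples: int, seed: int = 0) -> float:
    """Estimate pi from the share of uniform points that land in a quarter circle."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        # The quarter circle of radius 1 covers pi/4 of the unit square.
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


print(estimate_pi(100_000))
```

The error of this estimator shrinks roughly with the square root of the number of samples, so each extra digit of accuracy costs about a hundred times more points.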


What Really Happened to Nigeria’s Economy?

You may have heard the news that the size of Nigeria’s economy now stands at nearly $500 billion. Taken at face value (as many commenters have seemed all too happy to do) this means that the West African state “overtook” South Africa’s economy, which was roughly $384 billion in 2012. Nigeria’s reported GDP for that year was $262 billion, meaning it nearly doubled in a year.

How did this “growth” happen? As Bloomberg reported:

On paper, the size of the economy expanded by more than three-quarters to an estimated 80 trillion naira ($488 billion) for 2013, Yemi Kale, head of the National Bureau of Statistics, said at a news conference yesterday to release the data in the capital, Abuja….

The NBS recalculated the value of GDP based on production patterns in 2010, increasing the number of industries it measures to 46 from 33 and giving greater weighting to sectors such as telecommunications and financial services.

The actual change appears to be due almost entirely to Nigeria including figures in its GDP calculation that had been excluded previously. There is nothing wrong with this, per se, but it makes comparisons with earlier years completely unrealistic. This would be like measuring your height in bare feet for years, then doing it while wearing platform shoes. Your reported height would look quite different, without any real growth taking place. Similar complications arise when comparing Nigeria’s new figures to other countries’, when the others have not changed their methodology.

Nigeria’s recalculation adds another layer of complexity to the problems plaguing African development statistics. Lack of transparency (not to mention accuracy) in reporting economic activity makes decisions about foreign aid and favorable loans more difficult. For more information on these problems, see this post discussing Morten Jerven’s book Poor Numbers. If you would like to know more about GDP and other economic summaries, and how they shape our world, I would recommend Macroeconomic Patterns and Stories (somewhat technical), The Leading Indicators, and GDP: A Brief but Affectionate History.

Schneier on Data and Power

Data and Power is the tentative title of a new book, forthcoming from Bruce Schneier. Here’s more from the post describing the topic of the book:

Corporations are collecting vast dossiers on our activities on- and off-line — initially to personalize marketing efforts, but increasingly to control their customer relationships. Governments are using surveillance, censorship, and propaganda — both to protect us from harm and to protect their own power. Distributed groups — socially motivated hackers, political dissidents, criminals, communities of interest — are using the Internet to both organize and effect change. And we as individuals are becoming both more powerful and less powerful. We can’t evade surveillance, but we can post videos of police atrocities online, bypassing censors and informing the world. How long we’ll still have those capabilities is unclear….

There’s a fundamental trade-off we need to make as society. Our data is enormously valuable in aggregate, yet it’s incredibly personal. The powerful will continue to demand aggregate data, yet we have to protect its intimate details. Balancing those two conflicting values is difficult, whether it’s medical data, location data, Internet search data, or telephone metadata. But balancing them is what society needs to do, and is almost certainly the fundamental issue of the Information Age.

There’s more at the link, including several other potential titles. The topic will likely interest many readers of this blog. The book will likely build on his ideas of inequality and online feudalism, discussed here.

“The Impact of Leadership Removal on Mexican Drug Trafficking Organizations”

That’s the title of a new article, now online at the Journal of Quantitative Criminology. Thanks to fellow grad students Cassy Dorff and Shahryar Minhas for their feedback. Thanks also to mentors at the University of Houston (Jim Granato, Ryan Kennedy) and Duke University (Michael D. Ward, Scott de Marchi, Guillermo Trejo) for thoughtful comments. The anonymous reviewers at JQC and elsewhere were also a big help.

Here is the abstract:


Has the Mexican government’s policy of removing drug-trafficking organization (DTO) leaders reduced or increased violence? In the first 4 years of the Calderón administration, over 34,000 drug-related murders were committed. In response, the Mexican government captured or killed 25 DTO leaders. This study analyzes changes in violence (drug-related murders) that followed those leadership removals.


The analysis consists of cross-sectional time-series negative binomial modeling of 49 months of murder counts in 32 Mexican states (including the federal district).


Leadership removals are generally followed by increases in drug-related murders. A DTO’s home state experiences more subsequent violence than the state where the leader was removed. Killing leaders is associated with more violence than capturing them. However, removing leaders for whom a $30m peso bounty was offered is associated with a smaller increase than other removals.


DTO leadership removals in Mexico were associated with an estimated 415 additional deaths during the first 4 years of the Calderón administration. Reforming Mexican law enforcement and improving career prospects for young men are more promising counter-narcotics strategies. Further research is needed to analyze how the rank of leaders mediates the effect of their removal.

I didn’t shell out $3,000 for open access, so the article is behind a paywall. If you’d like a draft of the manuscript just email me.

Mexico Update Following Joaquin Guzmán’s Capture

As you probably know by now, the Sinaloa cartel’s leader Joaquin Guzmán was captured in Mexico last Saturday. How will violence in Mexico shift following Guzmán’s removal?

(Alfredo Estrella/AFP/Getty Images)

I take up this question in an article forthcoming in the Journal of Quantitative Criminology. According to that research (which used negative binomial modeling on a cross-sectional time series of Mexican states from 2006 to 2010), DTO leadership removals in Mexico are generally followed by increased violence. However, capturing leaders is associated with less violence than killing them. The removal of leaders for whom a 30 million peso bounty (the highest in my dataset, which generally identified high-level leaders) had been offered is also associated with less violence. The reward for Guzmán’s capture was higher than that for any other contemporary DTO leader: 87 million pesos. Given that Guzmán was a top-level leader and was arrested rather than killed, I would not expect a significant uptick in violence (in the next 6 months) due to his removal. This is in line with President Peña Nieto’s goal of reducing DTO violence.

My paper was in progress for a while, so the data is a few years old. Fortunately Brian Phillips has also taken up this question using additional data and similar methods, and his results largely corroborate mine:

Many governments kill or capture leaders of violent groups, but research on consequences of this strategy shows mixed results. Additionally, most studies have focused on political groups such as terrorists, ignoring criminal organizations – even though they can represent serious threats to security. This paper presents an argument for how criminal groups differ from political groups, and uses the framework to explain how decapitation should affect criminal groups in particular. Decapitation should weaken organizations, producing a short-term decrease in violence in the target’s territory. However, as groups fragment and newer groups emerge to address market demands, violence is likely to increase in the longer term. Hypotheses are tested with original data on Mexican drug-trafficking organizations (DTOs), 2006-2012, and results generally support the argument. The kingpin strategy is associated with a reduction of violence in the short term, but an increase in violence in the longer term. The reduction in violence is only associated with leaders arrested, not those killed.

A draft of the full paper is here.

Who says North is “up”?

There are several childhood lessons that I trace back to dinners at Outback Steakhouse: the deliciousness of cheese fries, the inconvenience of being in the middle of a wraparound booth, and the historical contingency of North as “up” on maps.

Who started using the NESW arrangement that is virtually omnipresent on maps today? Was it due to the fact that civilization as we now know it developed in the Northern hemisphere? (Incidentally, that’s why clocks run clockwise–a sundial in the Southern hemisphere goes the other way around.)

That doesn’t appear to be the case according to Nick Danforth, who recently took on this question at al-Jazeera America (via Flowing Data):

There is nothing inevitable or intrinsically correct — not in geographic, cartographic or even philosophical terms — about the north being represented as up, because up on a map is a human construction, not a natural one. Some of the very earliest Egyptian maps show the south as up, presumably equating the Nile’s northward flow with the force of gravity. And there was a long stretch in the medieval era when most European maps were drawn with the east on the top. If there was any doubt about this move’s religious significance, they eliminated it with their maps’ pious illustrations, whether of Adam and Eve or Christ enthroned. In the same period, Arab map makers often drew maps with the south facing up, possibly because this was how the Chinese did it.

So who started putting North up top? According to Danforth, that was Ptolemy:

[He] was a Hellenic cartographer from Egypt whose work in the second century A.D. laid out a systematic approach to mapping the world, complete with intersecting lines of longitude and latitude on a half-eaten-doughnut-shaped projection that reflected the curvature of the earth. The cartographers who made the first big, beautiful maps of the entire world, Old and New — men like Gerardus Mercator, Henricus Martellus Germanus and Martin Waldseemuller — were obsessed with Ptolemy. They turned out copies of Ptolemy’s Geography on the newly invented printing press, put his portrait in the corners of their maps and used his writings to fill in places they had never been, even as their own discoveries were revealing the limitations of his work.

Ptolemy probably had his reasons, but they are lost to history. As Danforth concludes, “The orientation of our maps, like so many other features of the modern world, arose from the interplay of chance, technology and politics in a way that defies our desire to impose easy or satisfying narratives.” Yet another example of a micro-institution that rules our world.

Visualizing the Indian Buffet Process with Shiny

(This is a somewhat more technical post than usual. If you just want the gist, skip to the visualization.)

N customers enter an Indian buffet restaurant, one after another. It has a seemingly endless array of dishes. The first customer fills her plate with a Poisson(α) number of dishes. Each successive customer i tastes each previously sampled dish with probability equal to its popularity (the number of previous customers who have sampled the kth dish, m_k, divided by i). The ith customer then samples a Poisson(α/i) number of new dishes.

That’s the basic idea behind the Indian Buffet Process (IBP). On Monday Eli Bingham and I gave a presentation on the IBP in our machine learning seminar at Duke, taught by Katherine Heller. The IBP is used in Bayesian non-parametrics to put a prior on (exchangeability classes of) binary matrices. The matrices usually represent the presence of features (“dishes” above, or the columns of the matrix) in objects (“customers,” or the rows of the matrix). The culinary metaphor is used by analogy to the Chinese Restaurant Process.
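The buffet story translates fairly directly into code. Below is a minimal sketch, not the code behind the Shiny app: it follows the standard formulation in which customer i takes existing dish k with probability m_k / i and then draws Poisson(α / i) new dishes. The names `sample_poisson` and `sample_ibp` are my own, and the Poisson sampler is a simple hand-rolled one so that the example needs only the standard library.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) by Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(n_customers, alpha, seed=None):
    """Return one binary matrix (rows = customers, columns = dishes) drawn from the IBP."""
    rng = random.Random(seed)
    dish_counts = []  # dish_counts[k] is m_k, the number of customers who took dish k
    rows = []         # rows[i] is the set of dish indices taken by customer i + 1
    for i in range(1, n_customers + 1):
        # Existing dishes: take dish k with probability m_k / i.
        chosen = {k for k, m_k in enumerate(dish_counts) if rng.random() < m_k / i}
        # New dishes: a Poisson(alpha / i) number, appended as new columns.
        n_new = sample_poisson(alpha / i, rng)
        first_new = len(dish_counts)
        chosen.update(range(first_new, first_new + n_new))
        dish_counts.extend([0] * n_new)
        for k in chosen:
            dish_counts[k] += 1
        rows.append(chosen)
    n_dishes = len(dish_counts)
    return [[1 if k in row else 0 for k in range(n_dishes)] for row in rows]


for row in sample_ibp(n_customers=10, alpha=10.0, seed=0):
    print("".join("#" if cell else "." for cell in row))
```

Each printed line is one customer, with `#` marking a sampled dish. As with the matrix in the app, the columns appear in order of first sampling rather than in left-ordered form.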

Although the visualizations in the main paper summarizing the IBP are good, I thought it would be helpful to have an interactive visualization where you could change α and N to see what a random matrix with those parameters looks like. For this I used Shiny, although it would also be fun to do in d3.

One realization of the IBP, with α=10.

In the example above, the first customer (top row) sampled seven dishes. The second customer sampled four of those seven dishes, and then four more dishes that the first customer did not try. The process continues for all 10 customers. (Note that this matrix is not sorted into its left-ordered form. The app also sometimes gives an error if α << N, but I wanted users to be able to choose arbitrary values of N so I have not changed this yet.) You can play with the visualization yourself here.

Interactive online visualizations like this can be a helpful teaching tool, and the process of making them can also improve your own understanding of the process. If you would like to make another visualization of the IBP (or another machine learning tool that lends itself to graphical representation) I would be happy to share it here. I plan to add the Chinese restaurant process and a Dirichlet process mixture of Gaussians soon. You can find more about creating Shiny apps here.

Constitutional Forks Revisited

Around this time last year, we discussed the idea of a constitutional “fork” that occurred with the founding of the Confederate States of America. That post briefly explains how forks work in open source software and how the Confederates used the US Constitution as the basis for their own, with deliberate and meaningful differences. Putting the two documents on Github allowed us to compare their differences visually and confirm our suspicions that many of them were related to issues of states’ rights and slavery.

Caleb McDaniel, a historian at Rice who undoubtedly has a much deeper and more thorough knowledge of the period, conducted a similar exercise and also posted his results on Github. He was faced with similar decisions of where to obtain the source text and which differences to retain as meaningful (for example, he left in section numbers where I did not). My method identifies 130 additions and 119 deletions when transitioning between the USA and CSA constitutions, whereas the stats for Caleb’s repo show 382 additions and 370 deletions.
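To make the idea of counting additions and deletions concrete, here is a rough sketch of a line-level comparison using Python's standard `difflib` module. It is not the method either of us used (the numbers above come from GitHub's own diff statistics), the function name `count_changes` is hypothetical, and the two short texts are invented placeholders rather than excerpts from either constitution.

```python
import difflib


def count_changes(old_lines, new_lines):
    """Count lines added and deleted when going from old_lines to new_lines."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    additions = deletions = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deletions += i2 - i1  # lines present only in the old text
        if tag in ("replace", "insert"):
            additions += j2 - j1  # lines present only in the new text
    return additions, deletions


# Invented placeholder texts, one list entry per line.
old_text = ["Article I", "Section 1", "All powers herein granted", "Section 2"]
new_text = ["Article I", "All powers herein delegated", "Section 2", "Section 3"]

added, deleted = count_changes(old_text, new_text)
print(f"{added} additions, {deleted} deletions")
```

Because the comparison is line by line, choices such as keeping or stripping section numbers change which lines count as different, which is one reason two careful comparisons of the same documents can report different totals.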

What should we draw from these projects? In Caleb’s words:

My decisions make this project an interpretive act. You are welcome to inspect the changes more closely by looking at the commit histories for the individual Constitution files, which show the initial text as I got it from Avalon as well as the changes that I made.

You can take a look at both projects and conduct a difference-in-differences exploration of your own. More generally, these projects show the need for tools to visualize textual analyses, as well as the power of technology to enhance understanding of historical and political acts. Caleb’s readme file has great resources for learning more about this topic including the conversation that led him to this project, a New York Times interactive feature on the topic, and more.

Playing Chicken with Your Calendar

The ever-interesting Brendan Nelson on meeting chicken:

You have a regular meeting in your calendar. It’s with just one other person. Sometimes you have things to talk to them about and sometimes you don’t. But as long as your calendar says you both have to go, you will both go.

The day of the meeting comes round. There are lots of things that need to be done that day. You look at that meeting sitting obstinately in your calendar and think how useful it would be to get that time back.

Inspiration strikes: why not cancel the meeting? A couple of mouse clicks, an automatic notification sent out, a joyously blank calendar. It seems so easy.

But you can’t bring yourself to do it, to cancel a meeting at such short notice. It would make you look disorganised, unprepared. And what about the other person?

More at the link.

Don’t Forget Your Forever Stamps

The price of a first-class US stamp is set to increase from 46 to 49 cents on January 26. In the spirit of Cosmo Kramer’s Michigan bottle redemption plan (see below), Allison Schrager and Ritchie King ran the numbers on whether it would be possible to profit from Forever Stamp arbitrage.

Could the scheme make money? Maybe–if you get the timing right and pay low interest on capital:

Assuming we sell all 10 million stamps for the bulk discount price of $0.475 each, our profit will be $150,000. Subtract out the $399 for the distributor database. Let’s also assume we spent the $3,500 for Check Stand Program plus, say, $300 to make the 100 displays for advertising in stores. That gives us $145,801.

If we do manage to shift the stamps in a month, the interest on our debt will be $29,000. That brings our profits to $116,801. Then we’ll return the equity to our shareholders, along with 50% of the profits.

That leaves us with the other 50%: $58,400.50. If you look at that as a profit on the $4.6 million initial outlay, it’s not very much: less than 1.3%. But remember, all that outlay was leveraged. So if you look at it as a return on our investment—$33.25 for shipping—it’s 175,541%.
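The arithmetic in the quoted passage can be retraced step by step. The sketch below uses only the figures quoted above, with variable names of my own choosing and `Decimal` to avoid floating-point rounding. One assumption worth flagging: the final percentage matches the quoted figure when it is computed as a net return, that is, profit minus the $33.25 invested, divided by $33.25.

```python
from decimal import Decimal

stamps = 10_000_000
buy_price = Decimal("0.46")    # current first-class price in dollars
sell_price = Decimal("0.475")  # bulk discount resale price in dollars

gross_profit = stamps * (sell_price - buy_price)

# Distributor database, Check Stand Program, and store displays.
costs = Decimal("399") + Decimal("3500") + Decimal("300")
after_costs = gross_profit - costs

interest = Decimal("29000")    # one month of interest on the debt
after_interest = after_costs - interest

our_share = after_interest / 2  # the other half goes to shareholders

outlay = stamps * buy_price
return_on_outlay_pct = our_share / outlay * 100

own_investment = Decimal("33.25")  # shipping, the only money put in directly
net_return_pct = (our_share - own_investment) / own_investment * 100

print(f"Gross profit:        ${gross_profit:,.2f}")
print(f"After fixed costs:   ${after_costs:,.2f}")
print(f"After interest:      ${after_interest:,.2f}")
print(f"Our 50% share:       ${our_share:,.2f}")
print(f"Return on outlay:    {return_on_outlay_pct:.2f}%")
print(f"Return on own money: {net_return_pct:,.0f}%")
```

Everything hinges on the assumption that all 10 million stamps sell within a month; each extra month adds interest and eats into a fairly thin margin.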