The “Manual for Civilization” Project: A Library for the End of the World

With the subtitle, “How to Rebuild Our World from Scratch,” you can probably guess the genre of The Knowledge. I read this ambitious book over the holidays, hoping to learn some of the basics of fields I’m less familiar with, such as organic chemistry and medicine. On that front the book delivers, but does it live up to its title?

Some parts of the book were very practical while others seemed superfluous. Purifying water with bleach (p. 37) could be useful even in a small-scale disruption. But in the wake of a larger disaster I find it hard to believe that knowing how to build an internal combustion engine (p. 199) or mix gunpowder (p. 232) would be near-term priorities. (As an aside, the book contains a one-decimeter line segment from which you can reconstruct the entire metric system, but I happen to think that less formal systems of measurement, such as the acre (the amount of land a yoke of oxen could plow in a day), would become popular in apocalyptic scenarios.)

The Knowledge is a fun read and contains some useful tips, but I would not want it to be my go-to book for emergencies. That is why I was interested to learn of the “Manual for Civilization” initiative, started by The Long Now Foundation. This is a library of books recommended by domain experts and Long Now staff and donors in answer to the question “If you were stranded on an island (or small hostile planetoid), what books would YOU want to have with you?”

After reading through the answers I have compiled a short list of my own, with the additional qualification that the book offers knowledge that is beneficial even if disaster doesn’t strike. The name after the title is the first recommender on whose list I noticed the book, with a link to their full list of recommendations. (Kevin Kelly’s compilation seemed especially good; his book Cool Tools would likely fit in the list below.)

Design Patterns for Cooking

Last week Alexey introduced the idea of cooking patterns:

A recipe is basically a fixed set of actions and ingredients, while cooking techniques are just the possible actions. If we invent cooking patterns – an abstraction on top of each ingredient / action pair – we could have more understanding of the dish we are preparing while keeping the flexibility in ingredient and technique choice.

Let’s take fritters as an example. Wikipedia says the following:

Fritter is a name applied to a wide variety of fried foods, usually consisting of a portion of batter or breading which has been filled with bits of meat, seafood, fruit, or other ingredients.

A pattern in its most obvious form. Notice the “wide variety,” a fixed ingredient (batter), and a list of possible variables (meat, seafood, fruit, and so on) that could influence the fritters you end up making.

I find this idea very exciting, because I enjoy cooking and am also in the process of learning more about software design patterns.

Cooking patterns seem like an accessible way to introduce beginners to more abstract ideas about software, too. Algorithms are often described as “recipes,” and this is a nice way to build on that concept.
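To make the analogy concrete, here is a minimal sketch of a fritter “pattern” in Python (the class and method names are my own invention, not taken from Alexey’s repo): the fixed parts of the pattern (batter, frying) are encoded once, and the variable part (the filling) is a parameter.

```python
from dataclasses import dataclass

@dataclass
class Fritter:
    """A sketch of a cooking pattern: fixed actions, variable ingredients."""
    filling: str                  # meat, seafood, fruit, ...
    batter: str = "plain batter"  # the fixed ingredient of the pattern

    def prepare(self) -> str:
        # The "algorithm" is always the same; only the inputs vary.
        return f"mix {self.filling} into {self.batter}, then deep-fry"

print(Fritter("apple").prepare())
print(Fritter("crab", batter="beer batter").prepare())
```

The same structure generalizes: a pattern is the invariant sequence of actions, and a dish is one binding of its free variables.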

For leveling up your cooking skills, ChefSteps looks promising. Their resources include classes, projects, and an ingredients wiki. I have signed up for one class and plan to follow up on this recommendation after completing it.

If you are interested in cooking patterns, check out the GitHub repo or read the full article.

A New Wiki for Computer Science Symbols

Computer science is increasingly relevant to a wide range of professional fields, yet many working programmers today do not have a formal CS education. This makes it difficult for the uninitiated to read academic research in computer science and related fields. Keeping up with the latest research is not a job requirement for most programmers, but understanding fundamental papers (such as the ones listed on Papers We Love) is important for building on established knowledge.

However, jargon and unfamiliar symbols present a non-trivial barrier to entry. This came up in the discussion on a recent episode of the Turing Incomplete podcast. A few existing resources were mentioned such as Wikipedia’s math symbols page and Volume I of The Art of Computer Programming. None of these is ideal for new programmers who may not know the names of the symbols, though.

That’s why I started a CS notation wiki. There are currently four pages, one each for computational symbols, linguistic symbols, logical symbols, and mathematical operators. Each page has only a few entries so far, but requests for additional ones can be filed as GitHub issues. New contributions are certainly welcome and should be submitted as pull requests; contribution guidelines can be found on the wiki’s home page. Other suggestions can be submitted as comments here, via email, or on Twitter. Let me know how this could be more useful to you!

Falsehoods Programmers Believe

The first principle is that you must not fool yourself – and you are the easiest person to fool. – Richard Feynman

Programmers love to fool themselves. “This line has to work! I didn’t write that bug! It works on my machine!” But if ever there was a field where you can’t afford to fool yourself, it’s programming. (Unless of course you want to do something like lose $172,222 a second for 45 minutes).

Over the years I’ve enjoyed many articles about false assumptions that programmers accept without question. I thought it would be helpful to collect them in one place for reference. If you know of articles that would be a good fit for this list, let me know and I will add them.

Falsehoods programmers believe…

Academia to Industry

Last week, Brian Keegan had a great post on moving from doctoral studies to industrial data science. If you have not yet read it, go read the whole thing. In this post I will share a couple of my favorite parts, as well as one area where I strongly disagreed with Brian.

The first key point of the post is to obtain relevant, marketable skills while you are in grad school. There’s just no excuse not to, regardless of your field of study: taking classes and working with scholars in other departments is almost always allowed and frequently encouraged. As Brian puts it:

[I]f you spend 4+ years in graduate school without ever taking classes that demand general programming and/or data analysis skills, I unapologetically believe that your very real illiteracy has held you back from your potential as a scholar and citizen.

Another great nugget in the post is in the context of recruiters, but it is also very descriptive of a prevailing attitude in academia:

This [realizing recruiters’ self-interested motivations] is often hard for academics who have come up through a system that demands deference to others’ agendas under the assumption they have your interests at heart as future advocates.

The final point from the post that I want to discuss may be very attractive and comforting to graduate students doing industry interviews for the first time:

After 4+ years in a PhD program, you’ve earned the privilege to be treated better than the humiliation exercises 20-year old computer science majors are subjected to for software engineering internships.

My response to this is, “no, you haven’t.” This is for exactly the reasons mentioned above: many graduate students can go through an entire curriculum without being able to code up FizzBuzz. A coding interview is standard for junior and midlevel engineers, even those with a PhD. Frankly, there are a lot of people trying to pass themselves off as data scientists who can’t code their way out of a paper bag, and a coding interview is a necessary screen. Think of it as a relatively low threshold that greatly enhances the signal-to-noise ratio for the interviewer. If you’re uncomfortable coding in front of another person, spend a few hours pairing with a friend and getting their feedback on your code. Interviewers know that coding on a whiteboard or in a Google Doc is not the most natural environment, and they should be able to calibrate for this.
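For anyone who hasn’t met it, FizzBuzz really is a minimal screen, which is exactly the point. One conventional solution:

```python
def fizzbuzz(n: int) -> str:
    # Classic screening exercise: multiples of 3 -> "Fizz",
    # multiples of 5 -> "Buzz", multiples of both -> "FizzBuzz",
    # anything else -> the number itself.
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

print([fizzbuzz(i) for i in range(1, 16)])
```

If writing this under mild pressure feels hard, that is precisely the signal the interviewer is looking for.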

With this one caveat, I heartily recommend the remainder of the original post. This is an interesting topic, and you can expect to hear more about it here in the future.

What Really Happened to Nigeria’s Economy?

You may have heard the news that the size of Nigeria’s economy now stands at nearly $500 billion. Taken at face value (as many commenters seemed all too happy to do), this means that the West African state “overtook” South Africa’s economy, which was roughly $384 billion in 2012. Nigeria’s reported GDP for that year was $262 billion, meaning it roughly doubled in a year.

How did this “growth” happen? As Bloomberg reported:

On paper, the size of the economy expanded by more than three-quarters to an estimated 80 trillion naira ($488 billion) for 2013, Yemi Kale, head of the National Bureau of Statistics, said at a news conference yesterday to release the data in the capital, Abuja….

The NBS recalculated the value of GDP based on production patterns in 2010, increasing the number of industries it measures to 46 from 33 and giving greater weighting to sectors such as telecommunications and financial services.

The change appears to be due almost entirely to Nigeria including in its GDP calculation figures that had previously been excluded. There is nothing wrong with this per se, but it makes year-over-year comparisons misleading. It would be like measuring your height in bare feet for years, then doing it while wearing platform shoes: your reported height would jump without any real growth taking place. Similar complications arise when comparing Nigeria’s new figures to those of countries that have not changed their methodology.
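A back-of-the-envelope check using the figures quoted above makes the scale of the rebasing clear:

```python
# Figures from the article: $262 billion reported for 2012 under the old
# methodology, roughly $488 billion for 2013 under the rebased methodology.
old_basis_2012 = 262   # $ billions, old methodology
rebased_2013 = 488     # $ billions, rebased methodology

apparent_growth = (rebased_2013 - old_basis_2012) / old_basis_2012
print(f"Apparent one-year growth: {apparent_growth:.0%}")  # ~86%
```

No economy grows 86 percent in a year; nearly all of that jump is the change in what gets counted, not the platform shoes coming off.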

Nigeria’s recalculation adds another layer of complexity to the problems plaguing African development statistics. Lack of transparency (not to mention accuracy) in reporting economic activity makes decisions about foreign aid and favorable loans more difficult. For more information on these problems, see this post discussing Morten Jerven’s book Poor Numbers. If you would like to know more about GDP and other economic summaries, and how they shape our world, I would recommend Macroeconomic Patterns and Stories (somewhat technical), The Leading Indicators, and GDP: A Brief but Affectionate History.

Mexico Update Following Joaquin Guzmán’s Capture

As you probably know by now, the Sinaloa cartel’s leader Joaquin Guzmán was captured in Mexico last Saturday. How will violence in Mexico shift following Guzman’s removal?

(Alfredo Estrella/AFP/Getty Images)

I take up this question in an article forthcoming in the Journal of Quantitative Criminology. According to that research (which used negative binomial modeling on a cross-sectional time series of Mexican states from 2006 to 2010), DTO leadership removals in Mexico are generally followed by increased violence. However, capturing leaders is associated with less violence than killing them. The removal of leaders for whom a 30 million peso bounty (the highest in my dataset, which generally identified high-level leaders) had been offered is also associated with less violence. The reward for Guzmán’s capture was higher than that for any other contemporary DTO leader: 87 million pesos. Given that Guzmán was a top-level leader and was arrested rather than killed, I would not expect a significant uptick in violence (in the next six months) due to his removal. This is consistent with President Peña Nieto’s goal of reducing DTO violence.
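Why a negative binomial model rather than a Poisson? Violence counts are typically overdispersed (variance well above the mean). The toy simulation below is purely illustrative, using synthetic numbers rather than the paper’s data or estimates; it just shows the kind of overdispersed count data such models are built for, with a hypothetical mean shift after a removal:

```python
import numpy as np

# Illustrative only: synthetic counts, not the paper's data or fitted model.
rng = np.random.default_rng(42)

def nb_counts(mean, dispersion, size, rng):
    # NumPy parameterizes the negative binomial by (n, p); for a target mean
    # and dispersion parameter n, set p = n / (n + mean).
    p = dispersion / (dispersion + mean)
    return rng.negative_binomial(dispersion, p, size)

before = nb_counts(mean=10, dispersion=5, size=1000, rng=rng)
after = nb_counts(mean=15, dispersion=5, size=1000, rng=rng)  # hypothetical post-removal increase
print(round(before.mean(), 1), round(after.mean(), 1))
```

In the actual analysis the mean is modeled as a function of covariates (leader killed vs. captured, bounty size, state, and year), not a single before/after shift.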

My paper was in progress for a while, so the data is a few years old. Fortunately Brian Phillips has also taken up this question using additional data and similar methods, and his results largely corroborate mine:

Many governments kill or capture leaders of violent groups, but research on consequences of this strategy shows mixed results. Additionally, most studies have focused on political groups such as terrorists, ignoring criminal organizations – even though they can represent serious threats to security. This paper presents an argument for how criminal groups differ from political groups, and uses the framework to explain how decapitation should affect criminal groups in particular. Decapitation should weaken organizations, producing a short-term decrease in violence in the target’s territory. However, as groups fragment and newer groups emerge to address market demands, violence is likely to increase in the longer term. Hypotheses are tested with original data on Mexican drug-trafficking organizations (DTOs), 2006-2012, and results generally support the argument. The kingpin strategy is associated with a reduction of violence in the short term, but an increase in violence in the longer term. The reduction in violence is only associated with leaders arrested, not those killed.

A draft of the full paper is here.

Visualizing the Indian Buffet Process with Shiny

(This is a somewhat more technical post than usual. If you just want the gist, skip to the visualization.)

N customers enter an Indian buffet restaurant, one after another. It has a seemingly endless array of dishes. The first customer fills her plate with a Poisson(α) number of dishes. Each successive customer i tastes the previously sampled dishes in proportion to their popularity (the probability of tasting the kth dish is m_k/i, where m_k is the number of previous customers who have sampled it). The ith customer then samples a Poisson(α/i) number of new dishes.

That’s the basic idea behind the Indian Buffet Process (IBP). On Monday Eli Bingham and I gave a presentation on the IBP in our machine learning seminar at Duke, taught by Katherine Heller. The IBP is used in Bayesian non-parametrics to put a prior on (exchangeability classes of) binary matrices. The matrices usually represent the presence of features (“dishes” above, or the columns of the matrix) in objects (“customers,” or the rows of the matrix). The culinary metaphor is used by analogy to the Chinese Restaurant Process.
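The generative process is short enough to sketch directly. The Python below is my own sketch (not the code behind the Shiny app): customer i tastes each existing dish k with probability m_k/i, then draws a Poisson(α/i) number of new dishes, which is the standard IBP formulation.

```python
import numpy as np

def sample_ibp(alpha, n_customers, rng=None):
    """Draw one binary feature matrix Z from the Indian Buffet Process prior."""
    rng = np.random.default_rng(rng)
    dish_counts = []  # dish_counts[k] = m_k, customers who have tried dish k
    rows = []
    for i in range(1, n_customers + 1):
        row = []
        # taste each previously sampled dish k with probability m_k / i
        for k in range(len(dish_counts)):
            take = rng.random() < dish_counts[k] / i
            row.append(int(take))
            dish_counts[k] += int(take)
        # then try a Poisson(alpha / i) number of brand-new dishes
        n_new = rng.poisson(alpha / i)
        row.extend([1] * n_new)
        dish_counts.extend([1] * n_new)
        rows.append(row)
    # pad every row with zeros out to the final number of dishes
    Z = np.zeros((n_customers, len(dish_counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = sample_ibp(alpha=10, n_customers=10, rng=0)
print(Z.shape)  # (10, total number of dishes sampled)
```

Rows are customers (objects) and columns are dishes (features), so each draw is one random binary matrix from the prior.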

Although the visualizations in the main paper summarizing the IBP are good, I thought it would be helpful to have an interactive visualization where you could change α and N to see what a random matrix with those parameters looks like. For this I used Shiny, although it would also be fun to do in d3.

One realization of the IBP, with α=10.

In the example above, the first customer (top row) sampled seven dishes. The second customer sampled four of those seven dishes, and then four more dishes that the first customer did not try. The process continues for all 10 customers. (Note that this matrix is not sorted into its left-ordered form. The app also sometimes gives an error if α << N, but I wanted users to be able to choose arbitrary values of N, so I have not changed this yet.) You can play with the visualization yourself here.

Interactive online visualizations like this can be a helpful teaching tool, and the process of making them can also improve your own understanding of the process. If you would like to make another visualization of the IBP (or another machine learning tool that lends itself to graphical representation) I would be happy to share it here. I plan to add the Chinese restaurant process and a Dirichlet process mixture of Gaussians soon. You can find more about creating Shiny apps here.

What Can We Learn from Games?

This holiday season I enjoyed giving, receiving, and playing several new card and board games with friends and family. These included classics such as cribbage, strategy games like Dominion and Power Grid, and the whimsical Munchkin.

Can video and board games teach us more than just strategy? What if games could teach us not to be better thinkers, but just to be… better? A while ago we discussed how Monopoly was originally designed as a learning experience to promote cooperation. Lately I have learned of two other such games in a growing genre and wanted to share them here.

The first is Depression Quest by Zoe Quinn (via Jeff Atwood):

Depression Quest is an interactive fiction game where you play as someone living with depression. You are given a series of everyday life events and have to attempt to manage your illness, relationships, job, and possible treatment. This game aims to show other sufferers of depression that they are not alone in their feelings, and to illustrate to people who may not understand the illness the depths of what it can do to people.

The second is Train by Brenda Romero (via Marcus Montano) described here with spoilers:

In the game, the players read typewritten instructions. The game board is a set of train tracks with box cars, sitting on top of a window pane with broken glass. There are little yellow pegs that represent people, and the player’s job is to efficiently load those people onto the trains. A typewriter sits on one side of the board.

The game takes anywhere from a minute to two hours to play, depending on when the players make a very important discovery. At some point, they turn over a card that has a destination for the train. It says Auschwitz. At that point, for anyone who knows their history, it dawns on the player that they have been loading Jews onto box cars so they can be shipped to a World War II concentration camp and be killed in the gas showers or burned in the ovens.

The key emotion that Romero said she wanted the player to feel was “complicity.”

“People blindly follow rules,” she said. “Will they blindly follow rules that come out of a Nazi typewriter?”

I have tried creating my own board games in the past, and this gives me renewed interest and a higher standard. What is the most thought-provoking moment you have experienced playing games?

Two Great Talks on Government and Technology

If you are getting ready to travel next week, you might want to have a couple of good talks/podcasts handy for the trip. Here are two that I enjoyed, on the topic of government and technology.

The first is about how technology can help governments. Ben Orenstein of “Giant Robots Smashing Into Other Giant Robots” discusses Code for America with Catherine Bracy. Catherine recounts some ups and downs of CfA’s partnerships with cities throughout America and internationally. CfA fellows commit a year to help local governments with challenges amenable to technology. One great example that the podcast discusses is a tool for parents in Boston to see which schools they could send their kids to when the city switched from location-based school assignment to allowing students to attend schools throughout the city. (Incidentally, the school matching algorithm that Boston used was designed by some professors in economics at Duke, who drew on work for which Roth and Shapley won the Nobel Prize.)

The second talk offers another point of view on techno-politics: when government abuses technology. Steve Klabnik’s “No Secrets Allowed” talk from the Golden Gate Ruby Conference discusses recent revelations regarding the NSA and privacy. In particular he explains why “I have nothing to hide” is not an appropriate response. The talk is not entirely hopeless, and includes recommendations such as using Tor. The Ruby Rogues also had a roundtable discussing Klabnik’s presentation, which you can find here.

Other recommendations are welcome.