The “Manual for Civilization” Project: A Library for the End of the World

With the subtitle, “How to Rebuild Our World from Scratch,” you can probably guess the genre of The Knowledge. I read this ambitious book over the holidays, hoping to learn some of the basics of fields I’m less familiar with, such as organic chemistry and medicine. On that front the book delivers, but does it live up to its title?

Some parts of the book were very practical while others seemed superfluous. Purifying water with bleach (p. 37) could be useful even in a small-scale disruption. But in the wake of a larger disaster I find it hard to believe that knowing how to build an internal combustion engine (p. 199) or mix gunpowder (p. 232) would be near-term priorities. (As an aside, the book contains a one-decimeter line segment from which you can reconstruct the entire metric system, but I happen to think that less formal systems of measurement such as the acre–the amount of land a yoke of oxen could plow in a day–would become popular in apocalyptic scenarios.)

The Knowledge is a fun read and contains some useful tips, but I would not want it to be my go-to book for emergencies. That is why I was interested to learn of the “Manual for Civilization” initiative, started by The Long Now Foundation. This is a library of books recommended by domain experts, Long Now staff, and donors in answer to the question, “If you were stranded on an island (or small hostile planetoid), what books would YOU want to have with you?”

After reading through the answers I have compiled a short list of my own with the additional qualification that the book offers knowledge that is beneficial even if disaster doesn’t strike. The name after the title is the first recommender on whose list I noticed the book, with a link to their full list of recommendations. (Kevin Kelly’s compilation seemed especially good; his book Cool Tools would likely fit in the list below).

The Future of Imagination

Venkatesh Rao had a great piece last week on imagination as a survival skill. Here is the gist:

I suspect failure-to-self-actualize will become the leading cause of death (or madness) in the developed world.

Rao defines “self-actualization” as

the imaginative embodiment of internal realities (what the daemon feels) in the form of a dent in the universe: a surprising and free external reality that actualizes a new possibility for all

and “imagination” as

the ability to create unpredictable new meaning while generating more freedom than you consume.

The post is very good, worth reading twice. However, there is one key shift that Rao overlooks. He focuses on imagination as an essential ability for the wealthy (the “one percent”), but it has even bigger implications for a future of 100 percent unemployment.

This change is coming, slowly but surely. It’s hard to imagine a future where “the robots take over” entirely, but we are already seeing a society where the poorest have more leisure time than the wealthiest. As Arnold Kling writes:

The prediction I would make is that we would see a lot more leisure. For those whose skill adaptation is adequate, that leisure will take the form of earlier retirement, later entry into the work force, or shorter hours. For those whose skill adaptation is inadequate, that leisure will show up as unemployment or reluctant withdrawal from the labor force.

I think that if you look only at males in isolation, you will see this in the data. That is, men are working much less than they used to. For some men, this leisure is very welcome, but for others it is not. In that sense, I think that we should look at the fears of the early 1960s not as quaint errors but instead as fairly well borne out.

The availability of inexpensive leisure (think cable TV and YouTube) has increased the reservation wage of low-wage workers. This has made unskilled individuals less willing to take near-minimum-wage jobs, as detailed in this New York Times article.

Self-actualization by highly skilled individuals, as described by Rao, has created so much freedom for those at the bottom of the income distribution that they now choose not to work. In their own words, though, the willingly unemployed do not seem to live fulfilling lives. Self-actualization is as important for them as it is for the wealthy, but they suffer from a failure of imagination.

Design Patterns for Cooking

Last week Alexey introduced the idea of cooking patterns:

A recipe is basically a fixed set of actions and ingredients, while cooking techniques are just the possible actions. If we invent cooking patterns – an abstraction on top of each ingredient / action pair – we could have more understanding of the dish we are preparing while keeping the flexibility in ingredient and technique choice.

Let’s take fritters as an example. Wikipedia says the following:

Fritter is a name applied to a wide variety of fried foods, usually consisting of a portion of batter or breading which has been filled with bits of meat, seafood, fruit, or other ingredients.

A pattern in its most obvious form. Notice the “wide variety”, a fixed ingredient (batter) and a list of possible variables (meat, seafood, vegetables, fruit) that could influence the fritters you end up making.

I find this idea very exciting, because I enjoy cooking and am also in the process of learning more about software design patterns.

Cooking patterns seem like an accessible way to introduce beginners to more abstract ideas about software, too. Algorithms are often described as “recipes,” and this is a nice way to build on that concept.
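
To make the analogy concrete, here is a minimal sketch of what the fritter pattern might look like as code. The class and field names are mine, chosen for illustration rather than taken from Alexey’s repo: the batter and the frying steps are fixed, while the filling is the free variable.

    # A minimal sketch of the fritter pattern: fixed actions and a fixed
    # ingredient (batter), with the filling left as a variable. Names here
    # are illustrative, not taken from the cooking-patterns repo.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Fritter:
        filling: str                        # variable ingredient: meat, seafood, fruit, ...
        batter: str = "plain flour batter"  # fixed ingredient

        def steps(self) -> List[str]:
            return [
                f"chop the {self.filling} into small pieces",
                f"fold the {self.filling} into the {self.batter}",
                "fry spoonfuls in hot oil until golden",
            ]

    for dish in (Fritter("apple"), Fritter("shrimp")):
        print(f"{dish.filling} fritters:", *dish.steps(), sep="\n  ")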

For leveling up your cooking skills, ChefSteps looks promising. Their resources include classes, projects, and an ingredients wiki. I have signed up for one class and plan to follow up on this recommendation after completing it.

If you are interested in cooking patterns, check out the Github repo or read the full article.

A Checklist for Using Open Source Software in Production

A great majority of the web is built on open source software. Approximately two-thirds of public servers on the internet run a *nix operating system, and over half of those are Linux. The most popular server-side programming languages also tend to be open source (including my favorite, Ruby). This post is about adding a new open source library to an existing code base. What questions should you ask before adding such a dependency to a production application?

The first set of questions is the most basic. A “no” to any of these should prompt you to look elsewhere.
  • Is the project written in a language you support? If not, is it compatible (e.g. through stdin/stdout or by compiling to your language of choice)? A minimal sketch of the stdin/stdout approach follows this list.
  • Is the project in a version of the language you support? If it’s written in Python 3 and you only support Python 2, for example, using this library could lead to headaches.
  • Can you use the project in your framework of choice (e.g. Rails or Django)?
  • Are there conflicts with other libraries or packages you’re currently using? (This is probably the hardest question to answer, and you might not know until you try it.)
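On the compatibility point, a common fallback is to treat a tool written in another language as a black box and talk to it over stdin/stdout. Here is a rough sketch; the command name and flag are hypothetical placeholders, not a real tool.

    # Sketch: wrapping a tool written in another language via stdin/stdout.
    # "some-formatter" and its "--stdin" flag are hypothetical placeholders;
    # substitute the actual executable you are evaluating.
    import subprocess

    def run_external_tool(text: str) -> str:
        """Send text to the tool on stdin and return what it writes to stdout."""
        result = subprocess.run(
            ["some-formatter", "--stdin"],  # hypothetical command and flag
            input=text,
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if the tool exits non-zero
        )
        return result.stdout

    # Example (will only work if "some-formatter" exists on your PATH):
    # print(run_external_tool("hello world"))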
Assuming there are no immediate technical barriers, the next questions to ask are of the legal variety. Open source licenses come in many flavors. In the absence of a license, traditional copyright rules apply. Be especially careful if the project you are investigating uses the GPL license–even basing the code you write off of a GPL open source project can have serious legal ramifications. There’s a great guide to OSS licenses on Github. If you’re the author or maintainer of an open source project, check out choosealicense.com.
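Before going license-by-license through new candidates, it can also help to know what your existing dependencies already declare. Here is a rough sketch using Python’s standard importlib.metadata (available in Python 3.8+); the License field is self-reported and sometimes empty, so treat the output as a starting point rather than legal advice.

    # Sketch: survey the declared licenses of installed Python packages.
    # Requires Python 3.8+ (importlib.metadata in the standard library).
    # The "License" metadata field is self-reported and often missing,
    # so this is a starting point, not a substitute for reading the license.
    from importlib.metadata import distributions

    for dist in sorted(distributions(), key=lambda d: (d.metadata["Name"] or "").lower()):
        name = dist.metadata["Name"] or "unknown"
        declared = dist.metadata["License"] or "not declared"
        print(f"{name}: {declared}")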
The next thing to consider is whether and how the project is tested. If there is no automated test suite, consider starting one as your first contribution to the project, and be very reluctant to add the project to your application (a minimal starting point is sketched after the list below). Other related questions include:
  • Are there unit tests?
  • Are there integration tests?
  • What is the test coverage like?
  • Do the tests run quickly?
  • Are the tests clearly written?
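If you do end up writing that first test, it does not need to be elaborate. A minimal pytest sketch is below; mylib and slugify are stand-ins for whatever library and function you actually depend on.

    # Sketch: a first unit test for a project that has no test suite, using pytest.
    # "mylib" and "slugify" are hypothetical stand-ins for the real library;
    # the goal is simply to pin down the behavior you rely on.
    import pytest
    from mylib import slugify

    def test_basic_slug():
        assert slugify("Hello, World!") == "hello-world"

    def test_empty_string_is_rejected():
        with pytest.raises(ValueError):
            slugify("")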
Finally, by using an open source project you are also joining a community of developers. None of these questions is necessarily a show-stopper, but knowing the size of the community and the tone of its discourse can save you pain down the road (a quick way to pull a few of these signals is sketched after the list).
  • Is the project actively maintained? When was the last commit?
  • Does the community have a civil, professional style of debate and discussion?
  • Is there only one developer/maintainer who knows everything? This doesn’t have to be a deal breaker. However, if there is a single gatekeeper you should make sure you understand the basics of the code and could fork the project if necessary.
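A few of these community signals are easy to check programmatically. Here is a rough sketch against the public GitHub REST API; the repository in the example is arbitrary, and unauthenticated requests are rate-limited, so add a token for anything beyond an occasional spot check.

    # Sketch: pull a few community-health signals from the GitHub REST API.
    # Unauthenticated requests are rate-limited; pass a token for heavier use.
    import json
    import urllib.request

    def repo_signals(owner: str, repo: str) -> dict:
        url = f"https://api.github.com/repos/{owner}/{repo}"
        with urllib.request.urlopen(url) as response:
            data = json.load(response)
        return {
            "last_push": data["pushed_at"],          # how recently code landed
            "open_issues": data["open_issues_count"],
            "forks": data["forks_count"],
            "watchers": data["subscribers_count"],
        }

    # Example usage:
    # print(repo_signals("rails", "rails"))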

This is by no means an exhaustive list, but these questions can serve as a useful checklist before adding an open source library as a dependency to your project.

A New Wiki for Computer Science Symbols

Computer science is increasingly relevant to a wide range of professional fields, yet many working programmers today do not have a formal CS education. This makes it difficult for the uninitiated to read academic research in computer science and related fields. Keeping up with the latest research is not a job requirement for most programmers, but understanding fundamental papers (such as the ones listed on Papers We Love) is important for building on established knowledge.

However, jargon and unfamiliar symbols present a non-trivial barrier to entry. This came up in the discussion on a recent episode of the Turing Incomplete podcast. A few existing resources were mentioned such as Wikipedia’s math symbols page and Volume I of The Art of Computer Programming. None of these is ideal for new programmers who may not know the names of the symbols, though.

That’s why I started a CS notation wiki. There are currently four pages, one each for computational symbols, linguistic symbols, logical symbols, and mathematical operators. Each page currently only has a few entries, but requests for additional ones can be filed as Github issues. New contributions are certainly welcome, and should be submitted as pull requests. Contribution guidelines can be found on the wiki’s home page. Other suggestions can be submitted as comments here, via email, or on Twitter. Let me know how this could be more useful to you!

Falsehoods Programmers Believe

The first principle is that you must not fool yourself – and you are the easiest person to fool. – Richard Feynman

Programmers love to fool themselves. “This line has to work! I didn’t write that bug! It works on my machine!” But if ever there was a field where you can’t afford to fool yourself, it’s programming. (Unless of course you want to do something like lose $172,222 a second for 45 minutes).
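
To pick one tiny, well-worn example of a line that “has to work” but doesn’t: decimal fractions do not behave in floating point the way they do on paper.

    # A classic false assumption in action: 0.1 and 0.2 have no exact
    # binary floating-point representation, so their sum is not exactly 0.3.
    print(0.1 + 0.2 == 0.3)      # False
    print(f"{0.1 + 0.2:.20f}")   # 0.30000000000000004441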

Over the years I’ve enjoyed lots of articles that talk about false assumptions that programmers accept without really questioning them. I thought it would be helpful to have these collected in one place for reference purposes. If you know of articles that would be a good fit on this list, let me know and I will add them.

Falsehoods programmers believe…

Now in Print: “The Impact of Leadership Removal on Mexican Drug Trafficking Organizations”

My Journal of Quantitative Criminology article “The Impact of Leadership Removal on Mexican Drug Trafficking Organizations” is now in print. For the abstract and other discussions of the research see here, as well as the posts tagged “Mexico,” “drug trafficking,” and “leadership removal.”

Here is a timeline of the research and publication process:

  • Read an article in the Economist about DTO leadership removal, December 2010
  • Preliminary research for a graduate seminar in time series analysis at the University of Houston, Spring 2011
  • Draft of paper incorporating other research on organized crime and political violence for a seminar at Duke University, Fall 2011
  • Revised manuscript rejected from a security studies journal after R&R, Spring 2012
  • Revised manuscript rejected from a political violence journal after R&R, Late summer 2012
  • R&R from JQC, Summer 2013
  • Accepted for publication in JQC, December 2013
  • Published online, March 2014
  • Published in print, December 2014

All in all, a four-year project, with no significant changes to the manuscript in the roughly 18 months before print publication. The paper absolutely improved thanks to quality feedback from reviewers, but I think you will agree that this is a very long feedback cycle.

Academia to Industry

Last week, Brian Keegan had a great post on moving from doctoral studies to industrial data science. If you have not yet read it, go read the whole thing. In this post I will share a couple of my favorite parts, as well as one area where I strongly disagreed with Brian.

The first key point of the post is to obtain relevant, marketable skills while you are in grad school. There’s just no excuse not to, regardless of your field of study–taking classes and working with scholars in other departments is almost always allowed and frequently encouraged. As Brian puts it:

[I]f you spend 4+ years in graduate school without ever taking classes that demand general programming and/or data analysis skills, I unapologetically believe that your very real illiteracy has held you back from your potential as a scholar and citizen.

Another great nugget in the post is in the context of recruiters, but it is also very descriptive of a prevailing attitude in academia:

This [realizing recruiters’ self-interested motivations] is often hard for academics who have come up through a system that demands deference to others’ agendas under the assumption they have your interests at heart as future advocates.

The final point from the post that I want to discuss may be very attractive and comforting to graduate students doing industry interviews for the first time:

After 4+ years in a PhD program, you’ve earned the privilege to be treated better than the humiliation exercises 20-year old computer science majors are subjected to for software engineering internships.

My response to this is, “no, you haven’t.” This is for exactly the reasons mentioned above–that many graduate students can go through an entire curriculum without being able to code up FizzBuzz. A coding interview is standard for junior and midlevel engineers, even if they have a PhD. Frankly, there are a lot of people trying to pass themselves off as data scientists who can’t code their way out of a paper bag, and a coding interview is a necessary screen. Think of it as a relatively low threshold that greatly enhances the signal-to-noise ratio for the interviewer. If you’re uncomfortable coding in front of another person, spend a few hours pairing with a friend and getting their feedback on your code. Interviewers know that coding on a whiteboard or in a Google Doc is not the most natural environment, and should be able to calibrate for this.
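
For anyone who hasn’t seen it, FizzBuzz really is that small; a typical solution looks something like this.

    # FizzBuzz: print 1 through 100, replacing multiples of 3 with "Fizz",
    # multiples of 5 with "Buzz", and multiples of both with "FizzBuzz".
    for i in range(1, 101):
        if i % 15 == 0:
            print("FizzBuzz")
        elif i % 3 == 0:
            print("Fizz")
        elif i % 5 == 0:
            print("Buzz")
        else:
            print(i)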

With this one caveat, I heartily recommend the remainder of the original post. This is an interesting topic, and you can expect to hear more about it here in the future.

Tirole on Open Source

Jean Tirole is the latest recipient of the Nobel prize in economics, as was announced Monday. For more background on his work, see NPR and the New Yorker. My favorite portion of Tirole’s work (and, admittedly, pretty much the only part I’ve read) is his research on open source software communities. Much of this is joint work with Josh Lerner. Below I share a few selections that indicate the general theme.

There are two main economic puzzles in open source software. First, why would highly skilled workers who earn a substantial hourly wage contribute their time to developing a product they won’t directly sell (and how do they convince their employers, in some cases, to support this)? Second, given the scale of these projects, how do they self-govern to set priorities and direct effort?

The answer to the first question is a combination of personal reputation and the ability to develop complementary software (Lerner and Tirole, 2002, p. 215-217). Most software work is “closed source,” meaning others can see the finished product but not the underlying code. For software developers, having your code out in the open gives others (especially potential collaborators or employers) the chance to assess your abilities. This is important to ensure career mobility. Open source software is also a complement to personal or professional projects. When there are components that are common across many projects, such as an operating system (Linux) or web framework (Rails), it makes sense for many programmers to contribute their effort to build a better mousetrap. This shared component can then improve everyone’s future projects by saving them time or effort. The collaboration of many developers also helps to identify bugs that may not have been caught by any single individual. Some of Tirole’s earlier work on collective reputations is closely related, as there appears to be an “alumni effect” for developers who participated in successful projects.

Tirole and Lerner’s answer to the second question revolves around leadership. Leaders are often the founders of, or early participants in, the open source project. Their skills and early membership status instill trust. As the authors put it, other programmers “must believe that the leader’s objectives are sufficiently congruent with theirs and not polluted by ego-driven, commercial, or political biases. In the end, the leader’s recommendations are only meant to convey her information to the community of participants” (Lerner and Tirole, 2002, p. 222). This relates to some of Tirole’s other work, with Roland Benabou, on informal laws and social norms.

Again, this is only a small portion of Tirole’s work, but I find it fascinating. There’s more on open source governance in the archives. This post on reputation in hacker culture or this one on the Ruby community are good places to start.

Epstein on Athletes

As a follow-up to the most recent series of posts, you may enjoy this TED talk by David Epstein. Epstein is the author of The Sports Gene and offered the claim that kicked off those earlier posts–that he could accurately guess an Olympian’s sport knowing only her height and weight.

The talk offers some additional context for Epstein’s claim. Specifically, Epstein describes how the average heights and weights of athletes in a set of 24 sports have become more different from one another over time:

In the early half of the 20th century, physical education instructors and coaches had the idea that the average body type was the best for all athletic endeavors: medium height, medium weight, no matter the sport. And this showed in athletes’ bodies. In the 1920s, the average elite high-jumper and average elite shot-putter were the same exact size. But as that idea started to fade away, as sports scientists and coaches realized that rather than the average body type, you want highly specialized bodies that fit into certain athletic niches, a form of artificial selection took place, a self-sorting for bodies that fit certain sports, and athletes’ bodies became more different from one another. Today, rather than the same size as the average elite high jumper, the average elite shot-putter is two and a half inches taller and 130 pounds heavier. And this happened throughout the sports world.

Here’s the chart used to support that point, with data points from the early twentieth century in yellow and more recent data points in blue:

Average height and mass for athletes in 24 sports in the early twentieth century (yellow) and today (blue)

This suggests that it has become easier over time to guess individuals’ sports based on physical characteristics, but as we saw, it is still difficult to do with a high degree of accuracy.

Another interesting change highlighted in the talk is the role of technology:

In 1936, Jesse Owens held the world record in the 100 meters. Had Jesse Owens been racing last year in the world championships of the 100 meters, when Jamaican sprinter Usain Bolt finished, Owens would have still had 14 feet to go…. [C]onsider that Usain Bolt started by propelling himself out of blocks down a specially fabricated carpet designed to allow him to travel as fast as humanly possible. Jesse Owens, on the other hand, ran on cinders, the ash from burnt wood, and that soft surface stole far more energy from his legs as he ran. Rather than blocks, Jesse Owens had a gardening trowel that he had to use to dig holes in the cinders to start from. Biomechanical analysis of the speed of Owens’ joints shows that had he been running on the same surface as Bolt, he wouldn’t have been 14 feet behind, he would have been within one stride.

The third change Epstein discusses is more dubious: a “changing mindset” among athletes giving them a “can do” attitude. In particular he mentions Roger Bannister’s four-minute mile as a major psychological breakthrough in sporting. As this interview makes clear, Bannister attributes the fact that no progress was made in the fastest mile time between 1945 and 1954 to the destruction, rationing, and overall quite distracting events of WWII. It’s possible that a four-minute mile was run as early as 1770. I wonder what Epstein’s claims would look like on that time scale?