A Checklist for Using Open Source Software in Production

A great majority of the web is built on open source software. Approximately two-thirds of public servers on the internet run a *nix operating system, and over half of those are Linux. The most popular server-side programming languages also tend to be open source (including my favorite, Ruby). This post is about adding a new open source library to an existing code base. What questions should you ask before adding such a dependency to a production application?

The first set of questions is the most basic. A “no” to any of these should prompt you to look elsewhere.
  • Is the project written in a language you support? If not, is it compatible (e.g. through stdin/stdout or by compiling to your language of choice)?
  • Is the project in a version of the language you support? If it’s written in Python 3 and you only support Python 2, for example, using this library could lead to headaches.
  • Can you use the project in your framework of choice (e.g. Rails or Django)?
  • Are there conflicts with other libraries or packages you’re currently using? (This is probably the hardest question to answer, and you might not know until you try it.)
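
The language-version question above can be checked mechanically. Here is a minimal sketch in Python, assuming the library documents a minimum supported version; the function name `is_supported` and the example version numbers are hypothetical, not taken from any particular project:

```python
import sys


def is_supported(minimum, running=None):
    """Return True if `running` (major, minor) satisfies `minimum`.

    Assumption for this sketch: a different major version (e.g. Python 2
    vs. Python 3) is treated as incompatible, even if it is "newer".
    """
    if running is None:
        # Default to the interpreter executing this script.
        running = tuple(sys.version_info[:2])
    same_major = running[0] == minimum[0]
    return same_major and running >= minimum


if __name__ == "__main__":
    # Hypothetical library that requires Python 3.4 or later.
    print(is_supported((3, 4)))
```

A check like this only covers the language itself; conflicts with other packages still have to be discovered by trying the library in your own application.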
Assuming there are no immediate technical barriers, the next questions to ask are of the legal variety. Open source licenses come in many flavors. In the absence of a license, traditional copyright rules apply. Be especially careful if the project you are investigating uses the GPL: even basing the code you write on a GPL-licensed open source project can have serious legal ramifications. There’s a great guide to OSS licenses on Github. If you’re the author or maintainer of an open source project, check out choosealicense.com.
The next thing to consider is whether and how the project is tested. If there is not an automated test suite, consider starting one as your first contribution to the project and be very reluctant to add the project to your application. Other related questions include:
  • Are there unit tests?
  • Are there integration tests?
  • What is the test coverage like?
  • Do the tests run quickly?
  • Are the tests clearly written?
Finally, by using an open source project you are also joining a community of developers. None of these questions is necessarily a show-stopper, but knowing the size of the community and the tone of its discourse can save you pain down the road.
  • Is the project actively maintained? When was the last commit?
  • Does the community have a civil, professional style of debate and discussion?
  • Is there only one developer/maintainer who knows everything? This doesn’t have to be a deal breaker. However, if there is a single gatekeeper you should make sure you understand the basics of the code and could fork the project if necessary.

This is by no means an exhaustive list, but these questions can serve as a useful checklist before adding an open source library as a dependency for your project.

Tirole on Open Source

Jean Tirole is the latest recipient of the Nobel prize in economics, as was announced Monday. For more background on his work, see NPR and the New Yorker. My favorite portion of Tirole’s work (and, admittedly, pretty much the only part I’ve read) is his work on open source software communities. Much of this is joint work with Josh Lerner. Below I share a few selections from his work that indicate the general theme.

There are two main economic puzzles to open source software. First, why would highly skilled workers who earn a substantial hourly wage contribute their time to developing a product they won’t directly sell (and how do they convince their employers, in some cases, to support this)? Second, given the scale of these projects, how do they self-govern to set priorities and direct effort?

The answer to the first question is a combination of personal reputation and the ability to develop complementary software (Lerner and Tirole, 2002, p. 215-217). Most software work is “closed source,” meaning others can see the finished product but not the underlying code. For software developers, having your code out in the open gives others (especially potential collaborators or employers) the chance to assess your abilities. This is important to ensure career mobility. Open source software is also a complement to personal or professional projects. When there are components that are common across many projects, such as an operating system (Linux) or web framework (Rails), it makes sense for many programmers to contribute their effort to build a better mousetrap. This shared component can then improve everyone’s future projects by saving them time or effort. The collaboration of many developers also helps to identify bugs that may not have been caught by any single individual. Some of Tirole’s earlier work on collective reputations is closely related, as there appears to be an “alumni effect” for developers who participated in successful projects.

Tirole and Lerner’s answer to the second question revolves around leadership. Leaders are often the founders of or early participants in the open software project. Their skills and early membership status instill trust. As the authors put it, other programmers “must believe that the leader’s objectives are sufficiently congruent with theirs and not polluted by ego-driven, commercial, or political biases. In the end, the leader’s recommendations are only meant to convey her information to the community of participants.” (Lerner and Tirole, 2002, p. 222) This relates to some of Tirole’s other work, with Roland Benabou, on informal laws and social norms.

Again, this is only a small portion of Tirole’s work, but I find it fascinating. There’s more on open source governance in the archives. This post on reputation in hacker culture or this one on the Ruby community are good places to start.

Schneier on Data and Power

Data and Power is the tentative title of a new book, forthcoming from Bruce Schneier. Here’s more from the post describing the topic of the book:

Corporations are collecting vast dossiers on our activities on- and off-line — initially to personalize marketing efforts, but increasingly to control their customer relationships. Governments are using surveillance, censorship, and propaganda — both to protect us from harm and to protect their own power. Distributed groups — socially motivated hackers, political dissidents, criminals, communities of interest — are using the Internet to both organize and effect change. And we as individuals are becoming both more powerful and less powerful. We can’t evade surveillance, but we can post videos of police atrocities online, bypassing censors and informing the world. How long we’ll still have those capabilities is unclear….

There’s a fundamental trade-off we need to make as society. Our data is enormously valuable in aggregate, yet it’s incredibly personal. The powerful will continue to demand aggregate data, yet we have to protect its intimate details. Balancing those two conflicting values is difficult, whether it’s medical data, location data, Internet search data, or telephone metadata. But balancing them is what society needs to do, and is almost certainly the fundamental issue of the Information Age.

There’s more at the link, including several other potential titles. The topic will likely interest many readers of this blog. It will likely build on his ideas of inequality and online feudalism, discussed here.

Constitutional Forks Revisited

Around this time last year, we discussed the idea of a constitutional “fork” that occurred with the founding of the Confederate States of America. That post briefly explains how forks work in open source software and how the Confederates used the US Constitution as the basis for their own, with deliberate and meaningful differences. Putting the two documents on Github allowed us to compare their differences visually and confirm our suspicions that many of them were related to issues of states’ rights and slavery.

Caleb McDaniel, a historian at Rice who undoubtedly has a much deeper and more thorough knowledge of the period, conducted a similar exercise and also posted his results on Github. He was faced with similar decisions of where to obtain the source text and which differences to retain as meaningful (for example, he left in section numbers where I did not). My method identifies 130 additions and 119 deletions when transitioning between the USA and CSA constitutions, whereas the stats for Caleb’s repo show 382 additions and 370 deletions.
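
For readers curious how such addition and deletion counts arise, here is a minimal sketch using Python’s standard `difflib` module. It is not the method either of us used (both counts above come from Github’s own diff statistics), and the helper name `count_changes` is hypothetical. It compares two texts line by line, so the totals depend heavily on how the source text is broken into lines, which is one reason two repositories can report such different numbers for the same pair of documents.

```python
import difflib


def count_changes(old_text, new_text):
    """Count added and deleted lines when going from old_text to new_text.

    A changed line counts as one deletion plus one addition, which is
    also how line-based diff tools usually report it.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    additions = 0
    deletions = 0
    for line in difflib.ndiff(old_lines, new_lines):
        if line.startswith("+ "):
            additions += 1
        elif line.startswith("- "):
            deletions += 1
        # Lines starting with "  " are unchanged; "? " lines are hints only.
    return additions, deletions


if __name__ == "__main__":
    # Placeholder text, not the actual constitutions.
    old = "first clause\nsecond clause\nthird clause"
    new = "first clause\nrevised clause\nthird clause\nfourth clause"
    print(count_changes(old, new))
```

Decisions such as keeping or dropping section numbers change which lines match, and therefore the totals.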

What should we draw from these projects? In Caleb’s words:

My decisions make this project an interpretive act. You are welcome to inspect the changes more closely by looking at the commit histories for the individual Constitution files, which show the initial text as I got it from Avalon as well as the changes that I made.

You can take a look at both projects and conduct a difference-in-differences exploration of your own. More generally, these projects show the need for tools to visualize textual analyses, as well as the power of technology to enhance understanding of historical and political acts. Caleb’s readme file has great resources for learning more about this topic including the conversation that led him to this project, a New York Times interactive feature on the topic, and more.

Github for Government

What happens when you combine open source software, open data, and open government? For the city of Munich, the switch to open source software has been a big success:

In one of the premier open source software deployments in Europe, the city migrated from Windows NT to LiMux, its own Linux distribution. LiMux incorporates a fully open source desktop infrastructure. The city also decided to use the Open Document Format (ODF) as a standard, instead of proprietary options.

As of November last year, the city saved more than €11.7 million because of the switch. More recent figures were not immediately available, but cost savings were not the only goal of the operation. It was also done to be less dependent on manufacturers, product cycles and proprietary OSes, the council said.

We’ve talked before about how more city governments could follow the open data, open government initiatives of NYC, using tech to benefit citizens rather than (only) creating initiatives to attract tech companies to the area. This shift in emphasis, toward harnessing the power of technology for widespread gains in happiness, is likely to become even more important following recent protests against tech employees in the Bay Area.

Open data and open government will take the principles of open source and use them to make an even bigger social and political impact. One tool from open source that can be adapted for use by these newer movements is Github. We will continue to follow these trends here, and if you are interested in this trend you can also check out Github and Government for more success stories.

Uncle Bob on Public Policy and Software Professionalism

Software developers need to develop their own professional standard, or politicians will do it for them. That’s what “Uncle” Bob Martin argues in this interview starting about 28:00:

Healthcare.gov was awful. That’s a case where a software failure interfered with a public policy. Whether you agree with that policy or not that should scare the hell out of you, because the next public policy may be one much more important and if our software can’t cope with it we could be in a really deep, deep hole.

At some point or another, some software team is going to screw up so badly that there is a disaster of tremendous loss of life. At that point the politicians of the world will decide they have to do something about it. If we are not there with a set of minimum standards that we follow, practices that we follow, if we can’t convince those politicians that we have been behaving professionally and that this was an accident–if we can’t convince them that we weren’t being negligent–then they’ll have no choice but to regulate us. They’ll pass laws about which languages we use, what platforms we can program on, what books we have to read, and so on. It will not be a good outcome. I don’t want to be a civil servant.

The Economy That Is Stanford

Five of the six most-visited websites in the world are here, in ranked order: Facebook, Google, YouTube (which Google owns), Yahoo! and Wikipedia. (Number five is a Chinese-language site.) If corporations founded by Stanford alumni were to form an independent nation, it would be the tenth largest economy in the world, with an annual revenue of $2.7 trillion, as some professors at that university recently calculated. Another new report says: ‘If the internet was a country, its gross domestic product would eclipse all others but four within four years.’

That’s from this London Review of Books piece by Rebecca Solnit. The October, 2012, research report on which the claim is based is here, based on survey data. Solnit’s piece is interesting throughout, including a discussion of parallels and differences between the tech boom and the Gold Rush.

Two Great Talks on Government and Technology

If you are getting ready to travel next week, you might want to have a couple of good talks/podcasts handy for the trip. Here are two that I enjoyed, on the topic of government and technology.

The first is about how technology can help governments. Ben Orenstein of “Giant Robots Smashing Into Other Giant Robots” discusses Code for America with Catherine Bracy. Catherine recounts some ups and downs of CfA’s partnerships with cities throughout America and internationally. CfA fellows commit a year to help local governments with challenges amenable to technology. One great example that the podcast discusses is a tool for parents in Boston to see which schools they could send their kids to when the city switched from location-based school assignment to allowing students to attend schools throughout the city. (Incidentally, the school matching algorithm that Boston used was designed by some professors in economics at Duke, who drew on work for which Roth and Shapley won the Nobel Prize.)

The second talk offers another point of view on techno-politics: when government abuses technology. Steve Klabnik’s “No Secrets Allowed” talk from Golden Gate Ruby Conference discusses recent revelations regarding the NSA and privacy. In particular he explains why “I have nothing to hide” is not an appropriate response. The talk is not entirely hopeless, and includes recommendations such as using Tor. The Ruby Rogues also had a roundtable discussing Klabnik’s presentation, which you can find here.

Other recommendations are welcome.

Visualizing the BART Labor Dispute

Labor disputes are complicated, and the BART situation is no different. Negotiations resumed this week after the cooling off period called for by the governor of California as a result of the July strikes.

To help get up to speed, check out the data visualizations made by the Bay Area d3 User Group in conjunction with the UC Berkeley VUDLab. They have a round up of news articles, open data, and open source code, as well as links to all the authors’ Twitter profiles.

The infographics address several key questions relevant to the debate, including how much BART employees earn, who rides BART and where, and the cost of living for BART employees.



More here.

Technology and Government: San Francisco vs. New York

In a recent PandoMonthly interview, John Borthwick made a very interesting point. Many cities are trying to copy the success of Silicon Valley/Bay Area startups by being like San Francisco: hip, fun urban areas designed to attract young entrepreneurs and developers (Austin comes to mind). However, the relationship between tech and other residents is a strained one: witness graffiti to the effect of “trendy Google professionals raise housing prices” and the “startup douchebag” caricature.

New York, on the other hand, has a smaller startup culture (“Silicon Alley”) but much closer and more fruitful ties between tech entrepreneurs and city government. Mayor Bloomberg has been at the heart of this, with his Advisory Council on Technology and his 2012 resolution to learn to code. Bloomberg’s understanding of technology and relationship with movers and shakers in the industry will make him a tough act to follow.

Does this mean that the mayors of Chicago, Houston, or Miami need to be writing Javascript in their spare time? Of course not. But making an effort to understand and relate to technology professionals could yield great benefits.

Rather than trying to become the next Silicon Valley (a very tall order) it would be more effective for cities to follow New York’s model: ask not what your city can do for technology, but what technology can do for your city. Turn bus schedule PDFs into a user-friendly app or, better yet for many low-income riders, a service that lets you text to find out when the next bus will arrive. Instead of making residents call the city to set up services like water and garbage collection, add a form to the city’s website. The opportunities to make city life better for all citizens, not just developers and entrepreneurs, are practically boundless.

I was happy to see San Francisco take a small step in the right direction recently with the Open Law Initiative, but there is more to be done, and not just in the Bay Area. Major cities across the US and around the world could benefit from the New York model. See more of the Borthwick interview below: