“Developers are often willing to add complexity to a system if it means they get to avoid difficult conversations.” This (slightly paraphrased) quote from Matt Ranney, senior engineer at Uber, sums up one of the main themes of QCon NY 2016: much of what we initially view as a technical problem turns out to be a communication or culture issue. When I attended QCon last week, I saw this idea come up in several talks. Below I point out a few of the most striking examples.
In a talk on incident response, John Allspaw of Etsy spoke about how responding to anomalies is an area where teams rely on a great deal of tacit knowledge. When an incident occurs, there is a strong temptation to pattern-match, looking for details that resemble previous incidents, rather than considering the current data on its own terms. (Engineers see this regularly when debugging code: read the error message you have now, not the one you had five minutes ago.) Allspaw recently completed a graduate program in systems safety, and he used a real incident at Etsy as material for his thesis, which is available to read online.
Katharina Probst’s talk on the server-side scripting system at Netflix also touched on the role of team communication. Because Netflix runs on thousands of different devices, the client side of its engineering organization is structured into teams by device. To allow for rapid development and reduce dependencies on the backend team, Netflix lets device teams write server-side scripts. These scripts originally ran alongside the main app, until the upload of a memory-intensive script caused cascading failures. Katharina pointed out that, in retrospect, one sign that the scripts were growing more complex and consuming more resources was that device teams had begun referring to them as “apps.” As engineering teams we are often comfortable using a different vocabulary from other teams or from our users, but this example shows how substantially the terms we use shape our thinking. The idea is not new, but it is important to keep in mind as we work with other teams. Even the same word can be used in different senses (Simon Chan of PredictionIO later highlighted the importance of determining what “real-time” means in a given use case).
The third talk that touched on the interplay of technical and cultural challenges in engineering was Matt Ranney’s, which I quoted above. He spoke about Uber’s migration from a monolithic API to microservices. One idea that came up repeatedly was how much the move to a polyglot environment had fragmented Uber’s engineering culture. On a superficial level, it led engineers to become tribal, referring to others as “Java people” or “Go people” rather than seeing their shared challenges and goals. It also made code reuse and sharing more difficult, which is costly in any engineering organization, whether service-oriented or monolithic. A polyglot environment also lets developers keep their biases about what counts as “the best tool for the job” without critically examining how similar problems have already been solved elsewhere. The final challenge he mentioned was that when an organization’s code runs in four languages, conversations about performance become harder: every team assumes its code is performant, but only because there are no shared benchmarks to measure it against.
It is easy to think of technical solutions to some of these problems, such as monitoring software or performance benchmarks that are language-agnostic. The real lesson, though, is how team culture shapes both how you view the problem space and how you address the problems within it. As Sunil Sadasivan (CTO of Buffer) put it, culture is the collection of “default settings” for your organization. Changing tools is much easier than shaping culture, but the latter is the only way to build a great team that wins in the long run.