Epstein on Athletes

As a follow-up to the most recent series of posts, you may enjoy this TED talk by David Epstein. Epstein is the author of The Sports Gene and offered the claim that kicked off those earlier posts–that he could accurately guess an Olympian’s sport knowing only her height and weight.

The talk offers some additional context for Epstein’s claim. Specifically, Epstein describes how the average heights and weights of athletes in a set of 24 sports have diverged over time:

In the early half of the 20th century, physical education instructors and coaches had the idea that the average body type was the best for all athletic endeavors: medium height, medium weight, no matter the sport. And this showed in athletes’ bodies. In the 1920s, the average elite high-jumper and average elite shot-putter were the same exact size. But as that idea started to fade away, as sports scientists and coaches realized that rather than the average body type, you want highly specialized bodies that fit into certain athletic niches, a form of artificial selection took place, a self-sorting for bodies that fit certain sports, and athletes’ bodies became more different from one another. Today, rather than the same size as the average elite high jumper, the average elite shot-putter is two and a half inches taller and 130 pounds heavier. And this happened throughout the sports world.

Here’s the chart used to support that point, with data points from the early twentieth century in yellow and more recent data points in blue:

Average height and mass for athletes in 24 sports in the early twentieth century (yellow) and today (blue)

This suggests that it has become easier over time to guess individuals’ sports based on physical characteristics, but as we saw it is still difficult to do with a high degree of accuracy.

Another interesting change highlighted in the talk is the role of technology:

In 1936, Jesse Owens held the world record in the 100 meters. Had Jesse Owens been racing last year in the world championships of the 100 meters, when Jamaican sprinter Usain Bolt finished, Owens would have still had 14 feet to go…. [C]onsider that Usain Bolt started by propelling himself out of blocks down a specially fabricated carpet designed to allow him to travel as fast as humanly possible. Jesse Owens, on the other hand, ran on cinders, the ash from burnt wood, and that soft surface stole far more energy from his legs as he ran. Rather than blocks, Jesse Owens had a gardening trowel that he had to use to dig holes in the cinders to start from. Biomechanical analysis of the speed of Owens’ joints shows that had he been running on the same surface as Bolt, he wouldn’t have been 14 feet behind, he would have been within one stride.

The third change Epstein discusses is more dubious: a “changing mindset” among athletes giving them a “can do” attitude. In particular he mentions Roger Bannister’s four-minute mile as a major psychological breakthrough in sport. As this interview makes clear, Bannister himself attributes the lack of progress in the fastest mile time between 1945 and 1954 to the destruction, rationing, and other distractions of WWII. It’s possible that a four-minute mile was run as early as 1770. I wonder what Epstein’s claims would look like on that time scale.

Classifying Olympic Athletes by Sport and Event (Part 3)

This is the last post in a three-part series. Part one, describing the data, is here. Part two gives an overview of the machine learning methods and can be found here. This post presents the results.

To present the results I will use classification matrices (confusion matrices), rendered as heatmaps. The rows indicate Olympians’ actual sports, and the columns are their predicted sports. A dark value on the diagonal indicates accurate predictions (athletes are predicted to be in their actual sport), while light values on the diagonal mean that Olympians in that sport are frequently misclassified. In each case results for the training set are in the left column and results for the test set are on the right. For a higher resolution version, see this pdf.
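For readers who want to build a similar figure, here is a minimal sketch of the general idea in Python using scikit-learn and matplotlib. This is an illustration of the technique, not the code used for the project, and the label arrays are toy stand-ins.

```python
# Minimal sketch: render actual vs. predicted sport labels as a heatmap.
# The label arrays are toy stand-ins, not the project's data.
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

y_actual = ["Swimming", "Rowing", "Swimming", "Basketball", "Rowing"]
y_predicted = ["Swimming", "Swimming", "Swimming", "Basketball", "Rowing"]
labels = ["Basketball", "Rowing", "Swimming"]

cm = confusion_matrix(y_actual, y_predicted, labels=labels)  # rows: actual, columns: predicted
fig, ax = plt.subplots()
ax.imshow(cm, cmap="Blues")  # darker cells = more athletes
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=90)
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
ax.set_xlabel("Predicted sport")
ax.set_ylabel("Actual sport")
plt.show()
```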

Classifying Athletes by Sport

[Figure: sport-matrices: heatmaps of actual vs. predicted sport for the training (left) and test (right) sets]

For most rows, swimming is the most common predicted sport. That’s partially because there are so many swimmers in the data and partially because swimmers have a fairly generic body type as measured by height and weight (see the first post). With more features, such as arm length and torso length, we could better distinguish between swimmers and non-swimmers.

Three out of the four methods perform similarly. The real oddball here is random forest: it classifies the training data very well, but does about as well on the test data as the other methods. This suggests that random forest is overfitting the data, and won’t give us great predictions on new data.
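To see what that gap looks like, here is a small illustrative sketch with scikit-learn. The toy data is pure noise standing in for the real features and labels (so it is not the project’s setup), but it shows how a random forest can score nearly perfectly on training data while doing no better than chance on test data.

```python
# Sketch: a large gap between training and test accuracy signals overfitting.
# The toy data below is random noise standing in for the real features
# (age, sex, height, weight) and the 27 sport labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))       # four features, as in the project
y = rng.integers(0, 27, size=1000)   # 27 "sports", assigned at random

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)
print("train accuracy:", rf.score(X_train, y_train))  # close to 1.0: the forest memorizes noise
print("test accuracy:", rf.score(X_test, y_test))     # close to chance (about 1/27)
```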

Classifying Athletes by Event

[Figure: event-matrices: heatmaps of actual vs. predicted event for the training (left) and test (right) sets]

The results here are similar to the ones above: all four methods do about equally well on the test data, while random forest overfits the training data. The two blocks in each figure correspond to men’s and women’s events. This is a good sanity check–at least our methods aren’t misclassifying men into women’s events or vice versa (recall that sex is one of the four features used for classification).

Accuracy

Visualizations are more helpful than looking at a large table of predicted probabilities, but what are the actual numbers? How accurate are the predictions from these methods? The table below presents accuracy for both tasks, for training and test sets.

[Table: classification accuracy for the sport and event tasks, training and test sets]

The various methods classify Olympians into sports and events with about 25-30 percent accuracy. This isn’t great performance. Keep in mind that we only had four features to go on, though–with additional data about the participants we could probably do better.

After seeing these results I am deeply skeptical that David Epstein could classify Olympians by event using only their height and weight. Giving him the benefit of the doubt, he probably had in mind the kind of sports and events that we saw were easy to classify: basketball, weightlifting, and high jump, for example. These are the types of competitions that The Sports Gene focuses on. As we have seen, though, there is a wide range of sporting events and a corresponding diversity of body types. Being naturally tall or strong doesn’t hurt, but it also doesn’t automatically qualify you for the Olympics. Training and hard work play an important role, and Olympic athletes exhibit a wide range of physical characteristics.

Classifying Olympic Athletes by Sport and Event (Part 1)

Note: This post is the first in a three-part series. It describes the motivation for this project and the data used. When parts two and three are posted I will link to them here.

Can you predict which sport or event an Olympian competes in based solely on her height, weight, age and sex? If so, that would suggest that physical features strongly drive athletes’ relative abilities across sports, and that they pick sports that best leverage their physical predisposition. If not, we might infer that athleticism is a latent trait (like “grit”) that can be applied to the sport of one’s choice.

David Epstein argues that sporting success is largely based on heredity in his book, The Sports Gene. To support his argument, he describes how elite athletes’ physical features have become more specialized to their sport over time (think Michael Phelps). At a basic level Epstein is correct: generally speaking, males and females differ both at the genetic level and in their physical features.

However, Epstein advanced a stronger claim in an interview (at 29:46) with Russ Roberts:

Roberts: [You argue that] if you simply had the height and weight of an Olympic roster, you could do a pretty good job of guessing what their events are. Is that correct?

Epstein: That’s definitely correct. I don’t think you would get every person accurately, but… I think you would get the vast majority of them correctly. And frankly, you could definitely do it easily if you had them charted on a height-and-weight graph, and I think you could do it for most positions in something like football as well.

I chose to assess Epstein’s claim in a project for a machine learning course at Duke this semester. The data was collected by The Guardian, and includes all participants for the 2012 London Summer Olympics. There was complete data on age, sex, height, and weight for 8,856 participants, excluding dressage (an oddity of the data is that every horse-rider pair was treated as the sole participant in a unique event described by the horse’s name). Olympians participate in one or more events (fairly specific competitions, like a 100m race), which are nested in sports (broader categories such as “Swimming” or “Athletics”).

Athletics is by far the largest sport category (around 20 percent of athletes), so when it was included it dominated the predictions. To get more accurate classifications, I excluded Athletics participants from the sport classification task. This left 6,956 participants in 27 sports, split into a training set of size 3,520 and a test set of size 3,436. The 1,900 Athletics participants were classified into 48 different events, and also split into training (907 observations) and test sets (993 observations). For athletes participating in more than one event, only their first event was used.
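For anyone who wants to replicate this setup, the filtering and splitting might look roughly like the sketch below. The file name and column names are my assumptions for illustration, not the actual format of the Guardian data.

```python
# Sketch of the data preparation described above. The CSV file name and the
# column names ("sport", "event", "age", "sex", "height", "weight") are
# assumptions about the Guardian data, used only for illustration.
import pandas as pd
from sklearn.model_selection import train_test_split

athletes = pd.read_csv("london2012_athletes.csv")
athletes = athletes.dropna(subset=["age", "sex", "height", "weight"])

# Sport task: drop Athletics, which would otherwise dominate the predictions.
sport_df = athletes[athletes["sport"] != "Athletics"]
sport_train, sport_test = train_test_split(sport_df, test_size=0.5, random_state=1)

# Event task: classify Athletics participants into their (first) event.
event_df = athletes[athletes["sport"] == "Athletics"]
event_train, event_test = train_test_split(event_df, test_size=0.5, random_state=1)
```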

What does an initial look at the data tell us? The features of athletes in some sports (Basketball, Rowing, Weightlifting, and Wrestling) and events (100m hurdles, Hammer throw, High jump, and Javelin) exhibit strong clustering patterns. This makes it relatively easy to guess a participant’s sport or event based on her features. In other sports (Archery, Swimming, Handball, Triathlon) and events (100m race, 400m hurdles, 400m race, and Marathon) there are many overlapping clusters, making classification more difficult.


Well-defined (left) and poorly-defined clusters of height and weight by sport.

Well-defined (left) and poorly-defined clusters of height and weight by event.

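A quick way to see this clustering for yourself is to scatter height against weight, colored by sport. The sketch below builds on the hypothetical sport_df data frame from the preparation sketch above; the column names and units are again assumptions.

```python
# Sketch: plot height against weight for a few well-separated sports.
# Uses the hypothetical sport_df data frame from the preparation sketch above.
import matplotlib.pyplot as plt

clustered_sports = ["Basketball", "Rowing", "Weightlifting", "Wrestling"]
fig, ax = plt.subplots()
for sport in clustered_sports:
    subset = sport_df[sport_df["sport"] == sport]
    ax.scatter(subset["height"], subset["weight"], label=sport, alpha=0.5)
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Weight (kg)")
ax.legend()
plt.show()
```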

The next post, scheduled for Wednesday, will describe the machine learning methods I applied to this problem. The results will be presented on Friday.

What Really Happened to Nigeria’s Economy?

You may have heard the news that the size of Nigeria’s economy now stands at nearly $500 billion. Taken at face value (as many commenters have seemed all too happy to do), this means that the West African state “overtook” South Africa’s economy, which was roughly $384 billion in 2012. Nigeria’s reported GDP for that year was $262 billion, meaning it roughly doubled in a year.

How did this “growth” happen? As Bloomberg reported:

On paper, the size of the economy expanded by more than three-quarters to an estimated 80 trillion naira ($488 billion) for 2013, Yemi Kale, head of the National Bureau of Statistics, said at a news conference yesterday to release the data in the capital, Abuja….

The NBS recalculated the value of GDP based on production patterns in 2010, increasing the number of industries it measures to 46 from 33 and giving greater weighting to sectors such as telecommunications and financial services.

The actual change appears to be due almost entirely to Nigeria including in its GDP calculation figures that had previously been excluded. There is nothing wrong with this, per se, but it makes comparisons over time misleading. It would be like measuring your height in bare feet for years, then doing it while wearing platform shoes: your reported height would look quite different, without any real growth taking place. Similar complications arise when comparing Nigeria’s new figures to those of other countries that have not changed their methodology.

Nigeria’s recalculation adds another layer of complexity to the problems plaguing African development statistics. Lack of transparency (not to mention accuracy) in reporting economic activity makes decisions about foreign aid and favorable loans more difficult. For more information on these problems, see this post discussing Morten Jerven’s book Poor Numbers. If you would like to know more about GDP and other economic summaries, and how they shape our world, I would recommend Macroeconomic Patterns and Stories (somewhat technical), The Leading Indicators, and GDP: A Brief but Affectionate History.

Constitutional Forks Revisited

Around this time last year, we discussed the idea of a constitutional “fork” that occurred with the founding of the Confederate States of America. That post briefly explains how forks work in open source software and how the Confederates used the US Constitution as the basis for their own, with deliberate and meaningful differences. Putting the two documents on Github allowed us to compare their differences visually and confirm our suspicions that many of them were related to issues of states’ rights and slavery.

Caleb McDaniel, a historian at Rice who undoubtedly has a much deeper and more thorough knowledge of the period, conducted a similar exercise and also posted his results on Github. He faced similar decisions about where to obtain the source text and which differences to retain as meaningful (for example, he left in section numbers where I did not). My method identifies 130 additions and 119 deletions when transitioning between the USA and CSA constitutions, whereas the stats for Caleb’s repo show 382 additions and 370 deletions.
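If you want to run a comparison like this yourself, a rough sketch using Python’s difflib is below. The file names are placeholders, and the counts you get depend heavily on how the source text is cleaned and chunked, which is exactly why the two projects report different numbers.

```python
# Sketch: count added and removed lines between two plain-text constitutions.
# The file names are placeholders; counts will vary with how the text is prepared.
import difflib

with open("usa_constitution.txt") as f:
    usa = f.readlines()
with open("csa_constitution.txt") as f:
    csa = f.readlines()

additions = sum(
    1 for line in difflib.unified_diff(usa, csa, lineterm="")
    if line.startswith("+") and not line.startswith("+++")
)
deletions = sum(
    1 for line in difflib.unified_diff(usa, csa, lineterm="")
    if line.startswith("-") and not line.startswith("---")
)
print(f"{additions} additions, {deletions} deletions")
```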

What should we draw from these projects? In Caleb’s words:

My decisions make this project an interpretive act. You are welcome to inspect the changes more closely by looking at the commit histories for the individual Constitution files, which show the initial text as I got it from Avalon as well as the changes that I made.

You can take a look at both projects and conduct a difference-in-differences exploration of your own. More generally, these projects show the need for tools to visualize textual analyses, as well as the power of technology to enhance understanding of historical and political acts. Caleb’s readme file has great resources for learning more about this topic, including the conversation that led him to this project, a New York Times interactive feature, and more.

Don’t Forget Your Forever Stamps

The price of a first-class US stamp is set to increase from 46 to 49 cents on January 26. Like Cosmo Kramer’s Michigan bottle redemption plan (see below), Allison Schrager and Ritchie King ran the numbers on whether it would be possible to profit from Forever Stamp arbitrage.

Could the scheme make money? Maybe–if you get the timing right and pay low interest on capital:

Assuming we sell all 10 million stamps for the bulk discount price of $0.475 each, our profit will be $150,000. Subtract out the $399 for the distributor database. Let’s also assume we spent the $3,500 for Check Stand Program plus, say, $300 to make the 100 displays for advertising in stores. That gives us $145,801.

If we do manage to shift the stamps in a month, the interest on our debt will be $29,000. That brings our profits to $116,801. Then we’ll return the equity to our shareholders, along with 50% of the profits.

That leaves us with the other 50%: $58,400.50. If you look at that as a profit on the $4.6 million initial outlay, it’s not very much: less than 1.3%. But remember, all that outlay was leveraged. So if you look at it as a return on our investment—$33.25 for shipping—it’s 175,541%.
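The arithmetic in the quote is easy to verify. The sketch below reproduces the quoted figures; the one interpretive step is treating the final return as net of the $33.25 shipping cost, which is what recovers the quoted 175,541%.

```python
# Quick check of the arithmetic in the quoted passage. All inputs come from
# the quote itself; only the calculation is reproduced here.
stamps = 10_000_000
margin = 0.475 - 0.46                 # bulk sale price minus pre-increase face value
gross = stamps * margin               # 150,000

profit = gross - 399 - 3_500 - 300    # database, Check Stand Program, displays -> 145,801
profit -= 29_000                      # one month of interest -> 116,801
our_share = profit / 2                # 58,400.50

print(f"return on total outlay:  {our_share / 4_600_000:.2%}")         # under 1.3%
print(f"return on cash invested: {(our_share - 33.25) / 33.25:.0%}")   # about 175,541%
```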

Github for Government

What happens when you combine open source software, open data, and open government? For the city of Munich, the switch to open source software has been a big success:

In one of the premier open source software deployments in Europe, the city migrated from Windows NT to LiMux, its own Linux distribution. LiMux incorporates a fully open source desktop infrastructure. The city also decided to use the Open Document Format (ODF) as a standard, instead of proprietary options.

As of November last year, the city saved more than €11.7 million because of the switch. More recent figures were not immediately available, but cost savings were not the only goal of the operation. It was also done to be less dependent on manufacturers, product cycles and proprietary OSes, the council said.

We’ve talked before about how more city governments could follow the open data, open government initiatives of NYC, using tech to benefit citizens rather than (only) creating initiatives to attract tech companies to the area. This shift in emphasis, toward harnessing the power of technology for widespread gains in happiness, is likely to become even more important following recent protests against tech employees in the Bay Area.

Open data and open government will take the principles of open source and use them to make an even bigger social and political impact. One tool from open source that can be adapted for use by these newer movements is Github. We will continue to follow these trends here, and if you are interested you can also check out Github and Government for more success stories.

What Can We Learn from Games?

This holiday season I enjoyed giving, receiving, and playing several new card and board games with friends and family. These included classics such as cribbage, strategy games like Dominion and Power Grid, and the whimsical Munchkin.

Can video and board games teach us more than just strategy? What if games could teach us not to be better thinkers, but just to be… better? A while ago we discussed how Monopoly was originally designed as a learning experience to promote cooperation. Lately I have learned of two other such games in a growing genre and wanted to share them here.

The first is Depression Quest by Zoe Quinn (via Jeff Atwood):

Depression Quest is an interactive fiction game where you play as someone living with depression. You are given a series of everyday life events and have to attempt to manage your illness, relationships, job, and possible treatment. This game aims to show other sufferers of depression that they are not alone in their feelings, and to illustrate to people who may not understand the illness the depths of what it can do to people.

The second is Train by Brenda Romero (via Marcus Montano), described here with spoilers:

In the game, the players read typewritten instructions. The game board is a set of train tracks with box cars, sitting on top of a window pane with broken glass. There are little yellow pegs that represent people, and the player’s job is to efficiently load those people onto the trains. A typewriter sits on one side of the board.

The game takes anywhere from a minute to two hours to play, depending on when the players make a very important discovery. At some point, they turn over a card that has a destination for the train. It says Auschwitz. At that point, for anyone who knows their history, it dawns on the player that they have been loading Jews onto box cars so they can be shipped to a World War II concentration camp and be killed in the gas showers or burned in the ovens.

The key emotion that Romero said she wanted the player to feel was “complicity.”

“People blindly follow rules,” she said. “Will they blindly follow rules that come out of a Nazi typewriter?”

I have tried creating my own board games in the past, and this gives me renewed interest and a higher standard. What is the most thought-provoking moment you have experienced playing games?

The Economy That Is Stanford

Five of the six most-visited websites in the world are here, in ranked order: Facebook, Google, YouTube (which Google owns), Yahoo! and Wikipedia. (Number five is a Chinese-language site.) If corporations founded by Stanford alumni were to form an independent nation, it would be the tenth largest economy in the world, with an annual revenue of $2.7 trillion, as some professors at that university recently calculated. Another new report says: ‘If the internet was a country, its gross domestic product would eclipse all others but four within four years.’

That’s from this London Review of Books piece by Rebecca Solnit. The October 2012 research report on which the claim is based is here; it relies on survey data. Solnit’s piece is interesting throughout, including a discussion of parallels and differences between the tech boom and the Gold Rush.

Political Forecasting and the Use of Baseline Rates

As Joe Blitzstein likes to say, “Thinking conditionally is a condition for thinking.” Humans are not naturally good at this skill. Consider the following example: Kelly is interested in books and keeping things organized. She loves telling stories and attending book clubs. Is it more likely that Kelly is a bestselling novelist or an accountant?

Many of the “facts” about Kelly in that story might lead you to answer that she is a novelist. Only one–her sense of organization–might have pointed you toward an accountant. But think about the overall probability of each career. Very few bookworms become successful novelists, and there are many more accountants than (successful) authors in the modern workforce. Conditioning on the baseline rate helps make a more accurate decision.
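To make the base-rate logic concrete, here is a toy calculation in the spirit of Bayes’ rule. All of the numbers are invented for illustration.

```python
# Toy base-rate calculation with invented numbers. Suppose the description
# fits 90% of bestselling novelists but only 5% of accountants, and there
# are 200 accountants for every bestselling novelist in the workforce.
p_desc_given_novelist = 0.90
p_desc_given_accountant = 0.05
novelists_per_accountant = 1 / 200

# Relative odds of novelist vs. accountant given the description (Bayes' rule):
odds = (p_desc_given_novelist * novelists_per_accountant) / p_desc_given_accountant
print(f"odds novelist : accountant = {odds:.2f} : 1")  # 0.09 : 1, so accountant is far more likely
```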

I make a similar point–this time applied to political forecasting–in a recent post for the blog of Mike Ward’s lab (of which I am a member):

One piece of advice that Good Judgment forecasters are often reminded of is to use the baseline rate of an event as a starting point for their forecast. For example, insurgencies are a very rare event on the whole. For the period January, 2001 to August, 2013, insurgencies occurred in less than 10 percent of country-months in the ICEWS data set.

From this baseline, we can then incorporate information about the specific countries at hand and their recent history… Mozambique has not experienced an insurgency for the entire period of the ICEWS dataset. On the other hand, Chad had an insurgency that ended in December, 2003, and another that extended from November, 2005, to April, 2010. For the duration of the ICEWS data set, Chad has experienced an insurgency 59 percent of the time. This suggests that our predicted probability of insurgency in Chad should be higher than for Mozambique.

I started writing that post before rebels in Mozambique broke their treaty with the government. Maybe I spoke too soon, but the larger point is that baselines are the starting point–not the final product–of any successful forecast.
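In practice, those country-level baselines can be computed directly from the event data. The sketch below uses a hypothetical data frame of country-months with a binary insurgency indicator; it is a stand-in for the ICEWS data, not its actual format.

```python
# Sketch: country-specific insurgency base rates from country-month data.
# The data frame and its values are hypothetical stand-ins for ICEWS.
import pandas as pd

country_months = pd.DataFrame({
    "country": ["Chad", "Chad", "Chad", "Mozambique", "Mozambique", "Mozambique"],
    "insurgency": [1, 1, 0, 0, 0, 0],   # toy values, not real observations
})

base_rates = country_months.groupby("country")["insurgency"].mean()
print(base_rates)  # starting points for a forecast, to be adjusted with country-specific information
```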

Having more data is useful, as long as it contributes more signal than noise. That’s what ICEWS aims to do, and I consider it a useful addition to the toolbox of forecasters participating in the Good Judgment Project. For more on this collaboration, as well as a map of insurgency rates around the globe as measured by ICEWS, see the aforementioned post here.