[caption id="attachment_2147" align="alignright" width="300"]Restored Panther tank, recovered from a Polish swamp. Private collection of the late Jacques Littlefield, Portola Valley CA.[/caption]

A few weeks ago I was talking with Kieran Healy about the impact the Second World War had on social science research. Specifically, we discussed Machine Dreams and Keep from All Thoughtful Men. The conversation became less esoteric and more interesting when he brought up the German tank problem, which I had not heard before in this particular form.

Richard Ruggles of Harvard and Henry Brodie of the State Department wrote the original statement of the problem in a 1947 JASA article:

In early 1943 the Economic Warfare Division of the American Embassy in London started to analyze markings and serial numbers obtained from captured German equipment in order to obtain estimates of German war production and strength....

Various kinds of captured enemy equipment were studied by this technique. The first product to be so analyzed was tires, and after this tanks, trucks, guns, flying bombs, and rockets were studied. Aircraft markings were not studied by the Economic Warfare Division, since, by previous agreement, the British Air Ministry bore the responsibility for all estimates on aircraft production. The uses of the intelligence derived from the markings were varied. At times it helped decide the target systems of the air forces; on other occasions it gave indications of German strength in weapons such as tanks and rockets.

The Allies needed to estimate German manufacturing capacity for a number of reasons. That information would give them a sense of how quickly they needed to produce to keep up. It would also tell them roughly how many factories would need to be targeted for air raids. And it could allow US and British forces to estimate whether the raids were effective at reducing German production.

So how did they do it? Well, in typical German fashion the Axis powers were quite organized, and many vehicle components bore markings that revealed information about their provenance. The Allied researchers used a bit of intuition (month codes should have more variance than year codes, for example) coupled with solid statistical know-how. Back to Ruggles and Brodie:

[A]ll Mark I tanks fell in the series 0 to 20,000, all Mark IIs in the series 20,000 to 30,000, and so on. When the cases in any particular series were arranged in an array, it became evident that some central authority had allotted the various producers one or more bands of numbers within the series.

Oversimplifying just a bit, the serial numbers were quasi-random draws from a uniform distribution. If each number revealed information about the date and place of manufacture, the Allies could estimate the rate at which Germany was producing tanks and the number it currently had. In statistical terms the problem is to estimate the maximum of a uniform distribution. (Perhaps I found Kieran's example so interesting because I have been asked to solve this problem in a number of ways in statistics homework assignments but never with such a real-world motivation.)
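For a rough sense of the idea, here is a minimal sketch of the textbook version of the problem, not the analysts' actual procedure. It assumes serial numbers run from 1 to N with no gaps and that the captured tanks are a random sample drawn without replacement. Under those assumptions the standard unbiased estimate is the largest observed serial plus the average gap between observations: m + m/k - 1, where m is the sample maximum and k the sample size. The function name and the simulated data are my own, for illustration only.

```python
import random


def estimate_max(serials):
    """Estimate N from serial numbers sampled without replacement from 1..N.

    Uses the textbook unbiased estimator m + m/k - 1, where m is the
    largest observed serial and k is the number of observations.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    return m + m / k - 1


if __name__ == "__main__":
    # Hypothetical data: pretend 1,000 tanks exist and 10 are captured.
    random.seed(0)
    true_n = 1000
    captured = random.sample(range(1, true_n + 1), 10)
    print("captured serials:", sorted(captured))
    print("estimated total: ", estimate_max(captured))
```

As a quick check by hand: if the captured serials are 19, 40, 42 and 60, the estimate is 60 + 60/4 - 1 = 74.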

After the war, actual production data became available, allowing us to see how good the estimates actually were. I'll omit the technical details of estimation for now, but they are available in the paper and at the Wikipedia article linked above. In short, the statistical analysis was pretty darn good, and much better than any of the guesses by field agents at the time:

[caption id="attachment_2133" align="aligncenter" width="555"]Allied estimates of German war production tended to be fairly accurate. Statistical methods were far more accurate than the estimates of human intelligence agents.[/caption]

For tanks specifically, estimation accuracy increased as the war went on. Analysts were even able to approximate the proportion of tanks produced by each manufacturer.



Not bad! I hope you have enjoyed this bit of history as much as I did. It's a nice motivation for estimating the maximum of a discrete uniform distribution. More than that, though, it's a testament to the applicability of statistical know-how to potentially life-saving problems.

(NB: This is my 300th post on YSPR since we got started nearly two years ago. Thanks for being part of the conversation!)