Data Analytics Throughout History

Colossus is named out of respect for the men and women who employed their skills and expertise in engineering, mathematics and linguistics during the Second World War at the home of the top-secret Codebreakers — Bletchley Park.

Shortly after Alan Turing, a leading figure among the Codebreakers, and his colleagues broke the Germans’ Enigma cipher machine, they faced a more advanced machine, which the British codenamed Tunny. To defeat this new threat, Thomas H. Flowers and his team of engineers built Colossus, widely regarded as the world’s first programmable electronic digital computer. First built in 1943, Colossus was used to help decipher Tunny messages. Its success and its construction were kept secret for decades.

After the Second World War, Winston Churchill ordered that Colossus and its technology be safeguarded under the British Official Secrets Act. As a result, it was widely believed that the first computer was an American invention called ENIAC (Electronic Numerical Integrator and Computer). Colossus slipped into obscurity, and Flowers and his team never received the recognition they deserved for inventing the world’s first computer. Colossus might have remained secret were it not for its mention in US government wartime documents declassified in 1996 and, finally, a detailed report released by the British government in 2000.

Abraham Wald was a Hungarian mathematician who worked for the United States during the Second World War. His contribution to data analytics concerns what we now call “survivorship bias”: the logical error of focusing solely on the people or things that made it past some selection process and overlooking those that did not, typically because they are less visible. In other words, what we see is not all there is.

Wald was tasked with calculating the optimum amount of armour to use on warplanes, based on analysis of data from battles fought all over Europe. No armour would leave pilots unprotected, but too much armour would make planes heavier, slower and less fuel-efficient. The engineers noted that returning planes had far more bullet holes in the fuselage and wings and concluded that these were the areas in need of extra protection.

However, Wald knew that sometimes “the most important data is the data you don’t have”.

The question he asked himself was “Where do planes that don’t come back get shot?” By asking a better question, he found a better answer: the planes that returned safely had more bullet holes in the areas that could withstand damage, while planes hit elsewhere tended not to return at all. He concluded that the areas with fewer recorded hits needed the most armour.
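Wald’s reasoning can be illustrated with a small simulation. This is only a sketch under invented assumptions, not a reconstruction of his actual analysis: the section names, the per-hit loss probabilities and the number of hits per plane are all hypothetical, and hits are assumed to land uniformly at random across sections.

```python
import random

# Hypothetical plane sections and the ASSUMED chance that a single hit
# there brings the plane down. These numbers are illustrative only.
LOSS_PROBABILITY = {
    "fuselage": 0.05,
    "wings": 0.05,
    "engine": 0.60,
    "cockpit": 0.50,
}


def simulate(num_planes=10_000, hits_per_plane=3, seed=0):
    """Return (all_hits, surviving_hits): hit counts per section."""
    rng = random.Random(seed)
    sections = list(LOSS_PROBABILITY)
    all_hits = dict.fromkeys(sections, 0)
    surviving_hits = dict.fromkeys(sections, 0)

    for _ in range(num_planes):
        # Assumption: every hit is equally likely to land on any section.
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        for section in hits:
            all_hits[section] += 1
        # The plane returns only if it survives every one of its hits.
        survived = all(rng.random() >= LOSS_PROBABILITY[s] for s in hits)
        if survived:
            for section in hits:
                surviving_hits[section] += 1
    return all_hits, surviving_hits


if __name__ == "__main__":
    all_hits, surviving_hits = simulate()
    for section in LOSS_PROBABILITY:
        print(f"{section:9s} all planes: {all_hits[section]:6d}  "
              f"returned planes: {surviving_hits[section]:6d}")
```

Across all planes the hits are spread roughly evenly, but among the planes that return, the engine and cockpit show far fewer hits than the fuselage and wings. Someone who sees only the returned planes would be tempted to armour the wrong areas, which is the trap Wald avoided.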

John Snow was a British doctor who is considered one of the founders of modern epidemiology. He used data collection and data analysis to trace the source of a cholera outbreak in central London, and came to the conclusion that cholera was transmitted by “an agent in the water” as opposed to the accepted theory that it was transmitted by “bad air”.

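Snow also used data collection to compare cholera deaths among households supplied by two water companies that drew from the River Thames: one took its water, virtually unfiltered, from a sewage-polluted stretch of the river, while the other drew from a cleaner stretch upstream. He noted that a huge natural experiment had fallen into his lap: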

“No fewer than three hundred thousand people of both sexes, of every age and occupation, and of every rank and station, from gentlefolks down to the very poor, were divided into two groups without their choice, and, in most cases, without their knowledge; one group being supplied with water containing the sewage of London, and amongst it, whatever might have come from the cholera patients, the other group having water quite free from such impurity.”

Snow’s analysis of the resulting data, together with his other work, led to fundamental changes in water and waste management in London and other cities, saving many lives and contributing significantly to global public health. His investigation is now regarded as a founding event of epidemiology.