So March 14th was Pi Day, and I don't celebrate Pi Day and wasn't going to click on any videos about it, but I ended up clicking this one anyway. I don't celebrate Pi Day because it seems arbitrary -- it's an arbitrary day determined by our arbitrary month numbering system and arbitrary number of days in each month and so on, but this video turned out to be weirdly interesting for completely different reasons.
First, it reminded me that before the invention of computers, "computer" was actually a job title. Calculations were done by humans. And when you hear about a computer doing x billion calculations per second, it's easy to let that go in one ear and out the other without thinking about what it means, but seeing a roomful of over a hundred people doing calculations by hand on paper gives you a very easy-to-visualize comparison. Oh, probably should mention, the idea behind this "Pi Day" video was to beat the record for the number of digits of pi calculated by hand.
Second, it's weirdly illuminating as to the nature and role of management. As the number of people goes up, you can't just take whatever process you had developed for a small number of people and scale it up -- you have to change the process in fundamental ways. Here not everyone does calculations -- some people have special roles combining the calculations done by others, so there's now a variety of "management" positions, and there are various other organizational practices like color-coded paper.
You might think that on computers, you can just scale things up. But in the real world we see that isn't true. For example, if you have a website that serves hundreds of people, to scale it up to hundreds of billions, you have to change it in fundamental ways -- generally you have to change it from a "monolithic" service to a distributed microservices architecture. So even once robots have taken over the world and replaced all human work with electronics and software, it will still be true that you can't take a simple computational system and just "scale it up" -- to scale up you will have to re-architect the process.
Third and finally, mistakes. This one really gets to me, as a person punished by this society (the United States, an anti-mistake society) for mistakes. But maybe all human societies are anti-mistake societies? Maybe punishing mistakes is just human nature? Maybe that is what all humans do, everywhere?* But what we see in this video is that, originally, they were going to have 3 people do each calculation, and if 2 out of 3 got the same answer, accept it as the right answer. "Each calculation" here means a 20-digit long division. It turns out that the algorithm they chose for calculating pi did a variety of arithmetic calculations, but the long divisions greatly outweighed the additions and everything else, so the vast majority of "computers" (job title) were doing long division. Anyway, 2 out of 3 turned out to be inadequate because it was very possible for 2 people to make identical mistakes. So the system they ended up going with is 5 people doing every calculation, with an answer accepted only if all 5 get the same one. Error correction algorithms on computers are more sophisticated, but they assume errors occur at a certain rate (bits can flip from cosmic rays and so forth, and computers have error-correcting codes that catch and correct them).
[from the comments]
"Long division, not wrong division" -- spoken like a true line manager.
So they were shocked that the rate at which humans make errors was vastly higher than they thought. It's not just me, other people make mistakes, too.
At least they started this process with some tests to make sure their reliability assumptions were correct, rather than discovering all the digits they had calculated for a week were wrong at the end of the process. They discovered they had assumed humans were much more reliable "computers" than they actually were, and made the necessary changes to the process.
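To make the difference between the two acceptance rules concrete, here's a rough Python sketch. To be clear, this is my own illustration, not anything from the video: the function names and the digit strings are made up, and it assumes each person hands in their answer as a string of digits. It just shows that when two people make the identical slip, the 2-out-of-3 rule accepts the wrong answer, while the all-5-must-agree rule refuses to accept anything and the division has to be redone.

```python
from collections import Counter
from typing import Optional, Sequence


def accept_majority(answers: Sequence[str], needed: int = 2) -> Optional[str]:
    """Return the most common answer if at least `needed` people gave it, else None.

    Assumes `answers` is non-empty.
    """
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count >= needed else None


def accept_unanimous(answers: Sequence[str]) -> Optional[str]:
    """Return the answer only if every person gave the same one, else None.

    Assumes `answers` is non-empty.
    """
    return answers[0] if len(set(answers)) == 1 else None


# Made-up example values: the first 20 digits of 1/7, and a copy with one wrong digit.
correct = "0.14285714285714285714"
shared_slip = "0.14285714285714385714"

# Two of three people make the identical mistake.
three_people = [shared_slip, shared_slip, correct]
# The same two mistaken people, plus three who got it right.
five_people = [shared_slip, shared_slip, correct, correct, correct]

majority_result = accept_majority(three_people)    # the wrong answer is accepted
unanimous_result = accept_unanimous(five_people)   # None: nothing is accepted

print("2 out of 3 accepts:", majority_result)
print("5 out of 5 accepts:", unanimous_result)
```

The tradeoff is that the unanimous rule never picks a winner on its own. Any disagreement at all means the work gets sent back, which costs more labor but keeps a shared mistake from slipping through unless all 5 people make it.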
The key to engineering reliable systems isn't assuming humans don't make mistakes -- or can be made to never make mistakes by punishing them when they make mistakes -- it's changing the engineering process itself so the process prevents and catches and corrects mistakes.
People have been talking a lot about the unreliability of AI systems lately, but this is a reminder that humans, too, are very unreliable. In fact, AI systems seem more like humans in the way they make mistakes than traditional computer software systems. AI systems are now spewing out massive amounts of code. That code has to be just as scrupulously reviewed and tested as if it were made by humans -- maybe more so.
The biggest hand calculation in a century! [Pi Day 2024] - Stand-up Maths