Was your workday interrupted by the Google Docs outage yesterday? Mine, too. Well, today, Google Docs Engineering Director Alan Warren apologized to us, and he did so in a nice, thorough blog post that explains exactly what happened.
The Docs team pushed a change that was “designed to improve real time collaboration within the document list.” That sounds like a good idea. Unfortunately, it revealed a big old memory management bug that they couldn’t detect until it was exposed to the full force of Google Docs users. Basically, the machines that check for updates didn’t clear their memory properly, so they filled up and crashed, shifting the load onto other machines, causing them to crash, and away we go. The team caught the problem within half an hour. It’s worth reading the blog post to see exactly how.
Article source: RRW http://feedproxy.google.com/~r/readwriteweb/~3/RwLeiFIgFAs/doing_it_right_google_docs_apologizes_for_yesterda.php