Revisiting *that* Google outage: Fire, flooding (then running out of water) and a “regional Spanner” failure
Fire-fighting was not helped by Global Switch’s fire suppression system “running out of water”. The incident also introduced water and soot contamination. Google Cloud’s affected racks had to be taken apart, thoroughly cleaned and reassembled before they could be restarted.
A Google Cloud outage in Paris in late April had all the ingredients of a drama, in a city known for its love of it: fire, flooding, rubble, soot contamination, a fire suppression system running out of water in France’s premier co-location facility, and a rapidly cascading global services failure…
Needless to say, however, a localised incident should not trigger the global failure of mission-critical services from a cloud hyperscaler. What exactly happened, and what lessons have been learned?
The Stack revisited the incident…