An IT outage grounded Lufthansa’s Global Fleet: Here’s what was to blame...
“Even at a depth of five metres, our fibre optic cables are not safe from concrete drills,” lamented Deutsche Telekom AG on February 14, sharing pictures of damage caused by construction workers in Frankfurt.
Needless to say, the telco was not the only company wringing its hands as downstream customer impact mounted. A major airline was among those hit: “Since this morning, the airlines of the Lufthansa Group have been affected by an IT failure caused by construction work in the Frankfurt region. Unfortunately, this has led to flight delays and cancellations,” the airline said after the damage grounded its fleet, leaving travellers stranded.
Builders working for state-owned railway company Deutsche Bahn were to blame after taking out four cables more than 16 feet deep – then reportedly spilling concrete on them, further complicating repair efforts.
Quite why Lufthansa’s systems did not fail over to alternative routes is unclear.
(Quite how Deutsche Bahn's planners were unaware that they were drilling on top of and right through critical data centre infrastructure is also an important question policy makers will be asking...)
Deutsche Telekom was among those having to field questions about this, saying on Twitter on Wednesday in response to a customer question about path redundancy: “In many cases, networks are doubled.
“But the fact is that our cable was severed late Tuesday afternoon, but the disruption at Lufthansa only became noticeable this morning. The error pattern is therefore more complex…” (translated from German).
The incident is the latest aviation-related outage caused by a failure to fail over to backup systems.
A serious IT outage at the Federal Aviation Administration (FAA), which forced it to halt all US departing flights on Wednesday January 11, was attributed by the transportation agency to a “damaged database file”. This propagated to backups as well, ultimately forcing a hard reboot of both systems whilst planes remained grounded across the US.
How do you ensure your failovers are watertight and your connectivity is truly resilient?
The Stack is keen to hear war stories, hard-won knowledge on best practice and more. Get in touch.