Google has revealed more details about what happened when it had to close one of its London data centers on the UK’s hottest day of the year yet.
According to Google, last month’s “europe-west2-a” zone outage was caused by the simultaneous failure of multiple, redundant cooling systems, which, combined with the “extremely high” outdoor temperatures, left the facility unable to maintain a safe operating temperature.
The outage impacted numerous Google services, including Google Compute Engine, Persistent Disk (PD), and Google Cloud Storage, causing instance termination, service degradation, and network issues.
What really happened?
Google engineers shut down the data center that housed part of the affected “europe-west2-a” zone while the cooling system was being repaired.
The total duration of the impact on cloud services was put at 18 hours and 23 minutes.
This is pretty disturbing news, especially when you consider how Google claims that these regional services are “designed to survive the failure of a single zone.”
Google attributed the region-wide disruption to an accidental change to the traffic routing for internal services, which steered traffic away from all three zones in the “europe-west2” region rather than just the affected “europe-west2-a” zone.
The routing incident prevented customers from accessing data from regional storage services, including GCS and BigQuery, across multiple zones.
Will this happen again?
This kind of news is understandably alarming if you’re concerned about global warming, as the UK may well see even hotter days in the future.
Thankfully, Google has made some commitments to prevent these kinds of errors from ever impacting its cloud hosting again.
These include fixing and retesting its failover automation to make the failover protocols more resilient during large-scale events like this one.
The cloud giant is also committed to researching and developing “more advanced methods” to gradually reduce thermal load within a single data center space, reducing the likelihood of a complete shutdown being necessary.
In addition, Google will examine its procedures, tooling, and automated recovery systems for gaps, and will audit the cooling system equipment and standards in the data centers that house Google Cloud worldwide.