We had a short outage on some systems and some of our switches today.
We had to do some maintenance on one of the smaller electrical cabinets, and we stress-tested one of our UPS units and our recovery procedures.
Unfortunately, not everything went as planned, and some switches and a few servers were shut down in the process.
Lessons were learned, and there will be some design changes to parts of our datacenter to increase power distribution fault tolerance and enhance recovery. The longest downtime was roughly 20 minutes on one of the switches; this was due to its power supplies failing during reboot. This is a known issue with this switch model as its power supplies age.
We apologize to all of our customers who were affected by this.
Thursday, November 5, 2020