More than 33 servers were down for several hours.

Initially we assumed a power distribution issue on one of the racks, which was partially correct. On closer inspection, the top-of-rack (TOR) switch for that rack had a failed PSU.

The failed PSU has now been replaced and the switch is back online. Monitoring reports that all nodes are back online as well.

Monday, May 25, 2020
