A significant number of nodes are down, and remote management is not responding.

Staff are en route to the DC to investigate.

UPDATE 1: An AC unit failed, which caused some servers to run their fans at full power. This in turn tripped single-phase circuits on a couple of racks, taking switches down and causing a larger number of servers to fail. An additional AC unit has been powered on, and a repair for the failed unit has been ordered. Two more similarly sized AC units were already on order to increase the redundancy margin. All servers are now back online.

Tuesday, April 26, 2022
