Safer failover testing procedures for the data center
Some data center pros have unorthodox ways of checking fault tolerance. For example, I asked a friend who is a network administrator for a medium-sized organization about his failover testing procedures. He said that at random times, he walks through the data center and yanks a power cord out of the back of a random server or switch. That way, he tests not only the resiliency of the failover infrastructure but also his staff’s ability to notice that a failover has occurred and to fix the problem.

Even though this method seems to work for my friend, disconnecting random power cords is probably not the best failover testing method in a production environment. Although that technique might show you how well your fault-tolerant solutions work, it also poses considerable risk: unless every fault-tolerant mechanism is working perfectly, it will cause an outage. Another problem with yanking a power cord out of a cluster node is that it simulates only one type of failure. A better approach is to design a series of tests that check ...