In previous tips in this series on high availability (HA) in the data center, you've read how to
set up a Linux HA infrastructure. You have also learned how the "shoot the other node in the head"
(STONITH) approach is needed to ensure the integrity of the shared storage in your Linux
cluster. In this tip, you'll read how to implement SBD (STONITH Block Device) STONITH, a STONITH
program that uses a shared disk device and is easy to implement in most environments.
There are many different STONITH programs available. The advantage of SBD STONITH is that
it is easy to implement and reliable. The only requirement is that the environment has
shared storage, typically a storage area network (SAN). If you don't have a SAN, you
can set one up using the Linux iSCSI target, which you'll read about in a later tip in this
series.
In SBD STONITH, the nodes in the Linux cluster keep each other updated by using the Heartbeat mechanism. If
something goes wrong with a node in the cluster, a poison pill is written for that node to the
shared storage device. The node has to “eat” (accept) the poison pill and terminate itself, after
which a file system resource can be safely failed over to another node in the Linux cluster.
SBD STONITH is a simple but effective way to ensure the integrity of data and other nodes in a
Linux cluster, but access to the SAN is required for it to function. The procedure below describes
how to set up SBD STONITH.
- To start, you must create a small logical unit number (LUN) volume. 1 MB is
enough, but to stay on the safe side, it's a good idea to create your SBD LUN with a size of at
least one cylinder (8 MB in most cases). Next, you need to find the unique device name
of this LUN as it is seen from the nodes in the cluster. Typically, you would use the
multipath -l command on one of the nodes in the Linux cluster to find the LUN's unique
device name.
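As a sketch, the lookup could go like this. The WWID below is the example device used throughout this tip; substitute the one multipath -l reports in your own environment:

```shell
# Example only: substitute the WWID that "multipath -l" reports on your nodes.
SBD_DISK=/dev/disk/by-id/scsi-149455400000000000000000003000000250600000f000000

# On a real node, run these (as root) to confirm every cluster node
# resolves the same stable name to the same LUN:
#   multipath -l
#   readlink -f "$SBD_DISK"
echo "using SBD device: $SBD_DISK"
```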
- Now as user root, from the command line on one of the nodes, mark
the LUN that you've created as the SBD device using the sbd -d <devicename> create
command. This command writes the SBD information to the device, so it doesn't really matter which
device name you use, as long as you can see the device from that node. Make sure that when working
on devices, you work on device names that don't change. That means you should use device
names that start with /dev/disk/by-id. These names are long and
ugly to work with, but at least they don't change. You can always see the "easy" device name by
using the ls -l command. So to assign the device
/dev/disk/by-id/scsi-149455400000000000000000003000000250600000f000000 as the SBD STONITH device,
use sbd -d /dev/disk/by-id/scsi-149455400000000000000000003000000250600000f000000 create.
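Put together, the step looks like the sketch below. The create call is shown commented out because it writes the SBD header to the LUN, which you only want to do once, on the real device, as root; the CREATE_CMD variable is just a convenience introduced for this example:

```shell
# Example WWID from this tip; substitute your own stable by-id name.
SBD_DISK=/dev/disk/by-id/scsi-149455400000000000000000003000000250600000f000000

# Build the initialization command (run it as root on ONE node only;
# it overwrites the SBD header on that LUN):
CREATE_CMD="sbd -d $SBD_DISK create"
# eval "$CREATE_CMD"

# To see the "easy" kernel device behind the stable by-id symlink:
#   ls -l "$SBD_DISK"
echo "$CREATE_CMD"
```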
- At this point, you can use the sbd -d
/dev/disk/by-id/scsi-149455400000000000000000003000000250600000f000000 dump command to see what is
written to the device. This gives you an output similar to the listing below.
Listing: Requesting current SBD information using sbd -d <device> dump
xen1:/dev/disk/by-id # sbd -d
/dev/disk/by-id/scsi-149455400000000000000000003000000250600000f000000 dump
Header version : 2
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 2
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 4
- Additionally, it is necessary to set up the Linux system to use kernel watchdogs,
which help the system detect if a node in the cluster has hung. The preferred way to do this is by
using a hardware-assisted watchdog. If, for some reason, this is not feasible for the hardware you
are using, you can use a software watchdog as an alternative. To do this, add the line modprobe
softdog to the /etc/init.d/boot.local file on all nodes in the cluster.
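The software-watchdog fallback can be sketched as follows. To keep the example safe to run anywhere, BOOT_LOCAL defaults to a scratch file (an assumption made for this sketch); on a real cluster node you would point it at /etc/init.d/boot.local on every node:

```shell
# Enable the software watchdog at boot (fallback when no hardware
# watchdog is available). BOOT_LOCAL defaults to a scratch file so this
# sketch is harmless; on real nodes use /etc/init.d/boot.local.
BOOT_LOCAL=${BOOT_LOCAL:-/tmp/boot.local.demo}

# Append the modprobe line only if it is not already present:
grep -qx 'modprobe softdog' "$BOOT_LOCAL" 2>/dev/null || \
    echo 'modprobe softdog' >> "$BOOT_LOCAL"

cat "$BOOT_LOCAL"
```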
- At this point, you can start the Linux HA Management Client and log in as the
user hacluster. Then select Configuration > Resources and click Add.
- In the Add window, select the Primitive type and click OK. Next,
enter the ID sbd-stonith and make sure the following parameters are set:
- ID: sbd-stonith
- Class: stonith
- Type: external/sbd
- On the Instance Attributes tab, you'll now see the parameter sbd_device that
currently doesn't have a value. Click Edit, and enter the block device name of the SBD
device. You must make sure that the block device name is the same on all nodes in the Linux
cluster, so be sure to use one of the /dev/disk/by-id names to accomplish this.
- Now click OK, followed by Apply twice to add the resource to
your cluster.
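For reference, on clusters that ship the crm shell, the same primitive could presumably be defined from the command line instead of the GUI. This CLI alternative is not part of the original procedure, so treat it as an assumption and verify that the external/sbd agent name exists on your distribution:

```shell
# Assumed crm-shell equivalent of the GUI steps above (verify on your
# distribution before use). Run as root on one node:
CRM_CMD='crm configure primitive sbd-stonith stonith:external/sbd params sbd_device="/dev/disk/by-id/scsi-149455400000000000000000003000000250600000f000000"'
# eval "$CRM_CMD"
echo "$CRM_CMD"
```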
- To complete this procedure, you also have to create a file with the name
/etc/sysconfig/sbd on all nodes. In this file, you must define two parameters: the
SBD_DEVICE parameter tells the cluster software which device to use as the SBD device
when it loads, and the SBD_OPTS parameter tells it which startup parameters to use. The
following listing shows an example of what this file should look like. Don't forget to put the
name of the SBD device in the /etc/sysconfig/sbd file, or it won't work.
Listing:
xen1:/dev/disk/by-id # cat /etc/sysconfig/sbd
SBD_DEVICE="/dev/disk/by-id/scsi-149455400000000000000000003000000250600000f000000"
SBD_OPTS="-W"
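Generating the file on each node can be sketched like this. SYSCONF_DIR defaults to a scratch directory (an assumption for this sketch) so the example runs anywhere; on real nodes the file belongs in /etc/sysconfig:

```shell
# Sketch: write the sbd sysconfig file. SYSCONF_DIR defaults to /tmp so
# the example is safe to run; on real nodes use SYSCONF_DIR=/etc/sysconfig
# and repeat on every node in the cluster.
SYSCONF_DIR=${SYSCONF_DIR:-/tmp}
SBD_DISK=/dev/disk/by-id/scsi-149455400000000000000000003000000250600000f000000

cat > "$SYSCONF_DIR/sbd" <<EOF
SBD_DEVICE="$SBD_DISK"
SBD_OPTS="-W"
EOF

cat "$SYSCONF_DIR/sbd"
```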
At this point, STONITH is configured, and you can reboot the nodes in the cluster to verify that
it works. Once the nodes have rebooted, you'll see in the Heartbeat graphical management
interface that the STONITH agent has started. Your Linux cluster is now in a safe state, so you
can start creating the resources you want to protect with HA. In the next tip in this series,
you'll learn how to set up Apache for Linux HA.
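As an optional extra check after the reboot, the sbd tool can inspect the message slots on the shared device and deliver a harmless test message to a peer. This is a hedged sketch; the peer name xen2 is hypothetical, so substitute one of your own node names:

```shell
# Optional post-reboot verification; run as root on one cluster node.
SBD_DISK=/dev/disk/by-id/scsi-149455400000000000000000003000000250600000f000000

# Show the per-node message slots on the SBD device:
#   sbd -d "$SBD_DISK" list
# Send a harmless test message to a peer ("xen2" is a hypothetical name):
#   sbd -d "$SBD_DISK" message xen2 test
echo "verify with: sbd -d $SBD_DISK list"
```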
About the expert: Sander van Vugt is an independent trainer and consultant
living in the Netherlands. Van Vugt is an expert in Linux high availability, virtualization and
performance, and has completed several projects that implement all three. He is a regular
speaker at many Linux conferences all over the world and the author of various
Linux-related books, such as Beginning the Linux Command Line, Beginning Ubuntu Server
Administration and Pro Ubuntu Server Administration.