Implementing SBD STONITH in Linux HA clusters

An SBD STONITH approach is simple to set up and is a reliable way to ensure data integrity in a Linux HA cluster.

In previous tips in this series on high availability (HA) in the data center, you've read how to set up a Linux HA infrastructure. You have also learned how the “shoot the other node in the head” (STONITH) approach is needed to ensure the integrity of the shared storage that is in your Linux cluster. In this tip, you'll read how to implement split brain detection (SBD) STONITH, a STONITH program that uses a shared disk device and is easy to implement in most environments.

There are many different programs available for STONITH. The advantage of SBD STONITH is that it is easy to implement and reliable. The only requirement is that the environment has shared storage. Typically, that would be a storage area network (SAN). If you don't have a SAN, you can provide the shared storage with the Linux iSCSI target, which you'll read about in a later tip in this series.

In SBD STONITH, the nodes in the Linux cluster keep each other updated by using the Heartbeat mechanism. If something goes wrong with a node in the cluster, a poison pill is written for that node to the shared storage device. The node has to “eat” (accept) the poison pill and terminate itself, after which a file system resource can be safely failed over to another node in the Linux cluster.

SBD STONITH is a simple but effective way to ensure the integrity of data and other nodes in a Linux cluster, but access to the SAN is required for it to function. The procedure below describes how to set up SBD STONITH.

  1. To start, create a small logical unit number (LUN) volume. 1 MB is enough, but to stay on the safe side, it's a good idea to create your SBD LUN with a size of at least one cylinder (8 MB in most cases). Next, find the unique device name of this LUN as it is seen from the nodes in the cluster. Typically, you would run the multipath -l command on one of the nodes in the Linux cluster to find the LUN's unique device name.
  2. Now, as user root, from the command line on one of the nodes, mark the LUN that you've created as the SBD device using the sbd -d <devicename> create command. This command writes the SBD information to the device itself, so it doesn't really matter which device name you use, as long as you can see the device from that node. When working on devices, always use device names that don't change, which means names that begin with /dev/disk/by-id. These names are long and ugly to work with, but at least they don't change. You can always see the “easy” device name by running the ls -l command on the by-id name. So to assign the device /dev/disk/by-id/scsi-149455400000000000000000003000000250600000f000000 as the SBD STONITH device, use sbd -d /dev/disk/by-id/scsi-149455400000000000000000003000000250600000f000000 create.
  3. At this point, you can use the sbd -d /dev/disk/by-id/scsi-149455400000000000000000003000000250600000f000000 dump command to see what is written to the device. This gives you an output similar to the listing below.

Listing: Requesting current SBD information using sbd -d <device> dump

xen1:/dev/disk/by-id # sbd -d /dev/disk/by-id/scsi-149455400000000000000000003000000250600000f000000 dump
Header version     : 2
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 2
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 4
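
Steps 1 through 3 can be collected into a small shell helper. The following is only a sketch under a few assumptions: the function names (run, is_persistent_name, init_sbd_device) and the DRY_RUN switch are made up for this example and are not part of sbd. By default the helper only prints the commands it would run, so you can review them before touching the LUN.

```shell
#!/bin/sh
# Sketch: initialize a shared LUN as an SBD device and show its header.
# DRY_RUN and all function names are illustrative, not part of sbd.
# With DRY_RUN=1 (the default) commands are printed but not executed.
DRY_RUN="${DRY_RUN:-1}"

# Print a command; execute it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Accept only persistent names below /dev/disk/by-id.
is_persistent_name() {
    case "$1" in
        /dev/disk/by-id/?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Resolve, initialize and dump the SBD device given as $1.
init_sbd_device() {
    dev="$1"
    if ! is_persistent_name "$dev"; then
        echo "refusing $dev: use a /dev/disk/by-id name" >&2
        return 1
    fi
    run readlink -f "$dev"     # show the "easy" name behind the by-id symlink
    run sbd -d "$dev" create   # write the SBD header and slots to the LUN
    run sbd -d "$dev" dump     # print the header that was just written
}
```

To review the commands, source the file and call init_sbd_device with your by-id device name. Once the printed commands look right, set DRY_RUN=0 and run it again as root on one node.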

  4. Additionally, it is necessary to set up the Linux system to use kernel watchdogs, which help the system detect if a node in the cluster has hung. The preferred way to do this is by using a hardware-assisted watchdog. If, for some reason, this is not feasible for the hardware you are using, you can use a software watchdog as an alternative. To do this, add the line modprobe softdog to the /etc/init.d/boot.local file on all nodes in the cluster.
  5. At this point, you can start the Linux HA Management Client and log in as the user hacluster. Then select Configuration > Resources and click Add.
  6. In the Add window, select the Primitive type and click OK. Next enter the ID sbd-stonith. Also, make sure the following parameters are set:
    • ID: sbd
    • Class: stonith
    • Type: external/sbd
  7. On the Instance Attributes tab, you'll now see the parameter sbd_device that currently doesn't have a value. Click Edit, and enter the block device name of the SBD device. You must make sure that the block device name is the same on all nodes in the Linux cluster, so be sure to use one of the /dev/disk/by-id names to accomplish this.
  8. Now click OK, followed by Apply twice to add the resource to your cluster.
  9. To complete this procedure, you also have to create a file with the name /etc/sysconfig/sbd on all nodes. In this file, you must define two parameters. The SBD_DEVICE parameter tells the cluster software which device it has to use as the SBD device when it loads. The SBD_OPTS parameter tells it which startup parameters to use. In the following listing, there is an example of what this file should look like. Don't forget to put the name of the sbd device in the /etc/sysconfig/sbd file, or it won't work.

Listing: Example /etc/sysconfig/sbd file

xen1:/dev/disk/by-id # cat /etc/sysconfig/sbd
SBD_DEVICE="/dev/disk/by-id/scsi-149455400000000000000000003000000260600000f000000"
SBD_OPTS="-W"
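
If you prefer to script the per-node part of this procedure, the sketch below may help. It assumes a POSIX shell, and the function names are invented for this example. The functions only print text or test for a device, so nothing on the node changes until you redirect or run the output yourself. The crm line is a command-line alternative to the graphical steps, assuming the crm shell is installed on your cluster; the resource ID sbd-stonith is taken from the procedure above.

```shell
#!/bin/sh
# Sketch: helpers for the per-node SBD setup. All function names are
# illustrative. Nothing here modifies the system by itself.

# Print the contents of /etc/sysconfig/sbd for the device given as $1.
render_sbd_sysconfig() {
    printf 'SBD_DEVICE="%s"\n' "$1"
    printf 'SBD_OPTS="-W"\n'     # -W tells sbd to use the watchdog
}

# Print a crm shell command that defines the external/sbd primitive.
# Assumption: the crm shell is available as an alternative to the GUI.
crm_primitive_cmd() {
    printf 'crm configure primitive sbd-stonith stonith:external/sbd params sbd_device="%s"\n' "$1"
}

# Succeed if a watchdog character device exists (default /dev/watchdog).
watchdog_present() {
    [ -c "${1:-/dev/watchdog}" ]
}
```

On each node you could then run render_sbd_sysconfig with your by-id device name and redirect the output to /etc/sysconfig/sbd. After modprobe softdog (or loading a hardware watchdog driver), watchdog_present should succeed.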

At this point, STONITH is configured and you can reboot the nodes in the cluster to verify that it works. Once the nodes have rebooted, the Heartbeat graphical management interface shows that the STONITH agent has started. Your Linux cluster is now in a safe state, so you can start creating the resources you want to protect with HA. In the next tip in this series, you'll learn how to set up Apache for Linux HA.
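
You can also check the result from the command line after the reboot. The sketch below is a suggestion rather than part of the procedure: the verify_sbd function and the DRY_RUN switch are made up for this example, and it assumes that the sbd and crm_mon commands are installed on the node. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: command-line checks after the nodes have rebooted.
# DRY_RUN and the function names are illustrative.
DRY_RUN="${DRY_RUN:-1}"

# Print a command; execute it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# $1 = SBD device (by-id name), $2 = name of another cluster node.
verify_sbd() {
    dev="$1"
    peer="$2"
    run sbd -d "$dev" list                  # list the node slots on the device
    run sbd -d "$dev" message "$peer" test  # send a harmless test message to the peer
    run crm_mon -1                          # one-shot cluster status, including STONITH resources
}
```

For example, verify_sbd with your by-id device name and the name of the second node prints the three commands. With DRY_RUN=0 they are executed, and the test message should show up in the system log of the receiving node.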

About the expert: Sander van Vugt is an independent trainer and consultant living in the Netherlands. Van Vugt is an expert in Linux high availability, virtualization and performance, and has completed several projects that implement all three. He is also a regular speaker at Linux conferences all over the world and the author of various Linux-related books, such as Beginning the Linux Command Line, Beginning Ubuntu Server Administration and Pro Ubuntu Server Administration.

This was first published in September 2011
