Having a storage area network (SAN) can take a load of stress out of managing the large quantities of data that feed the life of an enterprise. But a SAN can also sap time and money away from that life when it fails to perform at your desired level.
Heartbeat is high-availability cluster software that helps you make the most of your SAN by catching problems before they interfere with your productivity. Part three of this four-part tip shows you how to install a Heartbeat cluster in an open source SAN.
Setting up the Heartbeat cluster
Let's be clear about this: Setting up a Heartbeat cluster is an ambitious venture. Doing it properly involves many steps. One of these is setting up STONITH, which ensures that a failing node shuts down automatically. I'm assuming that your Heartbeat network is already configured, so I won't cover that at length here. If it isn't, use the following basic guidelines to create a two-node Heartbeat cluster in SUSE's YaST.
Be warned, however. The procedure described below is minimal and for use in a test environment only. Use this procedure for setting up a production environment at your own risk.
1. Ensure that host name resolution is set up properly. If you are using a domain name server (DNS), everything should be in place already. If not, make sure that the '/etc/hosts' file on each node includes the names and IP addresses of all other nodes.
2. On san1, start the YaST administration utility, using yast2.
3. From YaST, select Miscellaneous > High availability. The window that opens shows the name of the node from which you are running YaST. In the Add Nodes bar, enter the name of the node you want to add, and then click 'Add.' The result should resemble Figure 1. Now click Next to proceed.
Figure 1: After adding all nodes, the Node Configuration window should look like this.
4. Next, select an authentication method. Strictly speaking, you don't need any authentication. If you want to deploy more than one Heartbeat network in the same broadcast domain, though, it is a good idea to use either SHA1 or MD5 as the authentication method. By providing an authentication key with either of these algorithms, you prevent networks from getting accidentally mixed up.
Figure 2: Provide an authentication key to prevent different Heartbeat clusters from mixing up by accident.
5. In the Media Configuration window, you specify which connection is to be used for Heartbeat traffic. By default, Heartbeat traffic is sent as broadcast over the default LAN connection. It is a good idea to use the connection that carries the storage synchronization between the two parts of the DRBD device instead, so make sure that device is selected and then click 'Next' to continue. You can safely ignore any error messages that are displayed.
Figure 3: Make sure to select the storage network interface for the Heartbeat traffic.
6. In the final step of the procedure, you can specify whether or not you want to start the Heartbeat service automatically when your server boots. Set it to start automatically and click Finish to complete this part of the procedure.
7. To finalize the Heartbeat installation, run the /usr/lib/heartbeat/ha_propagate command. It uses scp to copy the Heartbeat configuration to the other nodes in the cluster, but it does not start Heartbeat on them. On the other node, start the Heartbeat service yourself and enable it in the runlevels so that it also comes up after a reboot.
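A minimal sketch of that last step, together with a helper for the check in step 1, could look like this. It is an illustration only and rests on assumptions that are not taken from this tip: that the node has a SUSE-style init script at /etc/init.d/heartbeat and the chkconfig tool, and that the node names are san1 and san2. The function names run, enable_heartbeat and hosts_file_has_nodes are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumptions: /etc/init.d/heartbeat and chkconfig exist on
# the node (SUSE-style service handling). Verify both before relying on this.

# Run a command, or just print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 7: start Heartbeat now and enable it for the next boot.
enable_heartbeat() {
    run /etc/init.d/heartbeat start && run chkconfig heartbeat on
}

# Step 1: check that every given node name appears in a hosts-format file.
# Usage: hosts_file_has_nodes FILE NAME...
hosts_file_has_nodes() {
    hosts_file=$1
    shift
    missing=0
    for node in "$@"; do
        # Drop comments, then look for the name in any field after the address.
        if ! awk -v n="$node" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$hosts_file"; then
            echo "missing from $hosts_file: $node" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

On each node you might first run hosts_file_has_nodes /etc/hosts san1 san2, and then, on the second node only, enable_heartbeat. Setting DRY_RUN=1 beforehand prints the two service commands instead of executing them, which is a cheap way to review them on a test system.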
Your cluster is now up and working for you. There are two ways to do a quick check. The first is the crm_mon -i 1 command, which shows the number of nodes found in the cluster and the associated resources (see below).
Calling up node information
The crm_mon -i 1 command gives an easy way to see whether the nodes in your cluster are available.
san1:/etc # crm_mon -i 1
Refresh in 1s...
============
Last updated: Mon Jun 23 20:56:37 2008
Current DC: san1 (eecba864-30cf-4e47-9c1e-395381c3c460)
2 Nodes configured.
0 Resources configured.
============
Node: san1 (eecba864-30cf-4e47-9c1e-395381c3c460): online
Node: san2 (76d72cfb-efd3-4fde-975f-b8cafd5885bd): online
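For scripted checks, the same information can be pulled out of the crm_mon output. The following is a minimal sketch, not part of the original procedure. The function names count_online_nodes and cluster_has_nodes are hypothetical, and the sketch assumes that node lines look exactly like the "Node: ... : online" lines in the listing above.

```shell
#!/bin/sh
# Sketch only: both functions read crm_mon text from standard input.

# Print how many nodes are reported as online.
count_online_nodes() {
    grep -c '^Node: .*: online[[:space:]]*$' || true
}

# Succeed if at least EXPECTED nodes are reported as online.
# Usage: some_command | cluster_has_nodes EXPECTED
cluster_has_nodes() {
    expected=$1
    online=$(count_online_nodes)
    [ "$online" -ge "$expected" ]
}
```

If your version of crm_mon offers a one-shot mode (an assumption; check its help output for an option such as -1), you could feed it in directly, for example crm_mon -1 | cluster_has_nodes 2. Otherwise, save the output to a file and redirect that file into the function.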
The other way to see whether the cluster is up and running is to use the hb_gui graphical user interface. To work from this interface, you need to give the user hacluster a password. So first run passwd hacluster, and then type 'hb_gui' on either of the nodes in the cluster. After starting it, use Connection > Login to authenticate. You'll now see the current status of the cluster, which at this moment contains only two nodes and no resources.
Figure 4: From the hb_gui interface you can check the current status of the cluster.
Configuring the DRBD resource in Heartbeat
You have now configured your servers to start the DRBD service when booting. In a Heartbeat cluster, that is normally not a good idea. In this particular scenario, it is what you want. All you need to do in Heartbeat is configure the drbddisk resource. There are currently two resource agents available for DRBD. The old drbddisk agent works; the Heartbeat version 2 style OCF agent does not. So, in this document I'll describe how to use drbddisk. The only prerequisite is that DRBD must be started from the runlevels of all servers involved. Next, from the hb_gui interface, create the resource as described below.
- From one of the nodes, start hb_gui and authenticate as user 'hacluster.'
- Select Resources > Add New Item and select the Item Type native. Click OK next.
The ultimate goal is to create a resource group that manages DRBD and the iSCSI target service. To make this easier later on, you'll start working on the group from the beginning.
- In the Add Native Resource window shown in Figure 5, enter the resource ID drbd0 and, at the option 'Belong to group,' type "iSCSItargetGroup". Don't click Add yet.
- Now, from the drop-down list of resource types, select the type with the name 'drbddisk.' Next, click Add Parameter and enter the value drbd0. This value should match the resource name that you used when defining the DRBD device in the /etc/drbd.conf configuration file. Then click OK and Add to add the resource to your cluster configuration.
- In the hb_gui interface, you'll now see that the drbd0 resource has been added, but has the status, 'not running.' Right-click the resource and click Start to start it. You should now see that the resource is running on one of the two nodes; that is fine.
Figure 6: The hb_gui now indicates that the DRBD resource has been started on one of the nodes in the cluster.
The DRBD resource is now running on one of the cluster nodes. In Figure 6 you can see it is currently being served by san1.
Time for a stress test: make sure that when you pull the plug from san1, the DRBD resource fails over automatically. Before doing this, use watch cat /proc/drbd, as demonstrated below, to monitor the status of the DRBD resource on node san2. You should see that it currently has the status secondary. Once the cluster has failed the resource over, the status should change to primary.
Monitoring the DRBD resource status
From /proc/drbd you can monitor the current status of the DRBD resource.
san1:/etc # watch cat /proc/drbd
Every 2.0s: cat /proc/drbd Mon Jun 23 21:43:48 2008
version: 0.7.22 (api:79/proto:74)
SVN Revision: 2572 build by lmb@dale, 2006-10-25 18:17:21
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:104726528 nr:0 dw:0 dr:104726528 al:0 bm:19176 lo:0 pe:0 ua:0 ap:0
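To read the status from a script rather than by eye, the fields on the device line can be split out. This is a minimal sketch that assumes the DRBD 0.7 layout shown in the listing above (fields such as cs:, st: and ld: on the line that starts with "0:"). The function names drbd_field and drbd_local_role are hypothetical.

```shell
#!/bin/sh
# Sketch only: parses the "0:" device line of a /proc/drbd style file.

# Print the value of one field (for example cs, st or ld) for device 0.
# Usage: drbd_field FIELD [FILE]
drbd_field() {
    awk -v key="$1" '
        $1 == "0:" {
            for (i = 2; i <= NF; i++) {
                split($i, kv, ":")
                if (kv[1] == key) { print kv[2]; exit }
            }
        }
    ' "${2:-/proc/drbd}"
}

# Print the role of the local node: the part of st: before the slash.
# Usage: drbd_local_role [FILE]
drbd_local_role() {
    drbd_field st "${1:-/proc/drbd}" | cut -d/ -f1
}
```

Run against the listing above, drbd_field cs would report the connection state and drbd_local_role the role of the node on which the command runs. The optional file argument exists so that the functions can be tried on a saved copy of /proc/drbd.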
Everything in place for the stress test? Time to start: on san1, stop the cluster software with the pkill heartbeat command. This terminates the Heartbeat service on san1, and the cluster should then make san2 the primary node in the DRBD setup.
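If you would rather have the surviving node tell you when the failover has happened, a small polling loop can stand in for watch. This is a sketch under the same assumption about the st: field in /proc/drbd as in the listing above. The function name wait_for_primary and the 60-second default timeout are hypothetical choices.

```shell
#!/bin/sh
# Sketch only: poll a /proc/drbd style file until the local role is Primary.
# Usage: wait_for_primary [FILE] [TIMEOUT_SECONDS]
wait_for_primary() {
    status_file=${1:-/proc/drbd}
    timeout=${2:-60}
    elapsed=0
    while :; do
        # The local role is the part of st: before the slash.
        if grep -q 'st:Primary/' "$status_file"; then
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

On san2 you might start it just before killing Heartbeat on san1, for example wait_for_primary /proc/drbd 120 && echo "failover complete". A non-zero exit status means the role did not change within the timeout.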
Everything working so far? Very good. In that case, DRBD is up and running. If you are seeing unstable behavior, do a restart of both nodes. This will bring up DRBD right from the start as a resource that is managed by Heartbeat, and in so doing will increase the chances of success. If this doesn't work, don't proceed to the next step. If it does, it is time to configure iSCSI.
Note: After restarting the servers in your cluster, it will take some time before crm_mon -i 1 shows both nodes as up and available again. That is not a problem - just wait for the nodes to come up before proceeding.
We are almost there now. In the fourth and final part of this tip, you'll learn how to configure an iSCSI service to provide access to the DRBD resource. With this final step, you will have your open source SAN up and functioning.
About the author: Sander van Vugt is an author and independent technical trainer, specializing in Linux since 1994. Van Vugt is also a technical consultant for high-availability (HA) clustering and performance optimization, as well as an expert on SUSE Linux Enterprise Desktop 10 (SLED 10) administration.