Installing Heartbeat on Amazon’s EC2

I am currently working on setting up a small high availability server cluster on Amazon’s EC2 cloud. Such a setup requires several underlying technologies to work together. Common among these are a distributed file system, a load balancer, and some form of monitoring and resource control. This article looks at the one aspect of ‘monitoring’ – a messaging layer – and its basic setup.

Package          Description
Cluster Glue     Common dependency
Heartbeat        Messaging layer (older; no new features being added)
Corosync         Messaging layer (newer; preferred and under active development)
OpenAIS          Protocol, formerly part of Corosync; implements the AIS layer (not always required)
Pacemaker        Resource manager (works with both messaging layers; formerly part of Heartbeat)
Resource Agents  Scripts for controlling some services/resources

Messaging Layers

The underlying basis of a monitoring setup entails requesting (or sending) data (a file, a packet, etc.) from each node in the cluster on a periodic basis. Ideally, the nodes all communicate with each other (either directly, or through a master node), so that each knows the status of the others. Typically, this inter-node communication is performed by something akin to a ‘bus’: a general protocol that accepts messages to be transmitted to other nodes and provides a ‘pulse’ to signify that the node is up. (If the pulse is not received when expected, the other nodes conclude that the node is down.)

Two of the commonly used messaging layers are Heartbeat and Corosync.

Corosync is the more recent and arguably more flexible of the two, and is likely to eventually obsolete Heartbeat. The project has some notable backers (e.g. Red Hat) and is under active development. Corosync is available in the amzn repository; however, that version (1.2.3 at this time) does not support unicast. Only Corosync versions 1.3.0+ (Nov 2010) support UDPU (UDP unicast) [although a patch exists for some previous versions]. Unicast merits specific mention here because Amazon’s network does not permit broadcast or multicast transmissions.

Despite having finally opted to use Corosync as my messaging layer, I initially experimented with Heartbeat. The following briefly outlines how to set up and test Heartbeat on EC2. (This doesn’t include the monitoring of specific resources or the setup of the haresources file – just the basic setup and testing.)

The RPMs generated here satisfy some of the dependencies of Pacemaker (in addition to Heartbeat and Cluster-Glue, Pacemaker also requires Corosync).

Pre-requisites

The setup below is specific to Amazon’s Linux distribution (RHEL/CentOS derived) – but should be applicable to other distributions with little modification.

Heartbeat is not available from the amzn repository, so I decided to build RPMs from source (since you will need to install it on more than one server, the RPM approach saves a bit of time over re-compiling on each node).

To build the RPMs, you will want the ‘Development Tools’ group installed (not recommended on a production machine):

yum groupinstall "Development Tools"

Heartbeat requires Cluster-Glue for compilation/installation. Resource Agents are also commonly installed.

You might want to create a user, hacluster, and group, haclient, as they are referenced by the install (the install will succeed even without them).
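
If you do want to create them, something like the following should work (a sketch – the system-account flags are my preference, not something the install requires):

groupadd -r haclient
useradd -r -g haclient -s /sbin/nologin hacluster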

There are a number of dependencies you should install first:

(flex and bison are installed with ‘Development Tools’, and will be skipped in the list below if you already have them.) (Note: mailx (and the mail command) is not included in the most recent version of Amazon’s Linux, and has therefore been included as a dependency below.)

yum install -y flex bison net-snmp OpenIPMI glib2-devel libxml2-devel bzip2-devel libuuid-devel docbook-utils docbook-dtds libtool-ltdl libtool-ltdl-devel libxslt perl-TimeDate python-devel OpenIPMI-devel openssl-devel docbook-style-xsl help2man e2fsprogs-devel mailx

You also need Python.

Visit the Linux High Availability download page for the latest releases of the files below: http://www.linux-ha.org/wiki/Download

Cluster-Glue RPM

wget http://hg.linux-ha.org/glue/archive/glue-1.0.7.tar.bz2
tar -xjvf glue-*.tar.bz2
mv Reusable-Cluster-Components-glue--glue-* cluster-glue
tar -cjvf cluster-glue.tar.bz2 cluster-glue
mv cluster-glue.tar.bz2 /usr/src/rpm/SOURCES/
cd cluster-glue
rpmbuild --bb cluster-glue-fedora.spec

The above commands download, extract, rename, and repackage the files; copy the repackaged archive to the SOURCES directory; and build the RPM using the Fedora spec file provided.

When done, the following RPMS can be found in: /usr/src/rpm/RPMS/x86_64/

  • cluster-glue-1.0.7-1.amzn1.x86_64.rpm
  • cluster-glue-debuginfo-1.0.7-1.amzn1.x86_64.rpm
  • cluster-glue-libs-1.0.7-1.amzn1.x86_64.rpm
  • cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64.rpm


Heartbeat RPM

Install cluster-glue-libs, cluster-glue, and cluster-glue-libs-devel (built above) [we need the --nogpgcheck parameter, since we have not signed the RPMs we created]:

yum install --nogpgcheck cluster-glue-1.0.7-1.amzn1.x86_64.rpm cluster-glue-libs-1.0.7-1.amzn1.x86_64.rpm cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64.rpm

Following a procedure essentially the same as above, run the following to build the Heartbeat RPMs:

wget http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/STABLE-3.0.4.tar.bz2
tar -xjvf STABLE-*.tar.bz2
mv Heartbeat-3-0-STABLE-* heartbeat
tar -cjvf heartbeat.tar.bz2 heartbeat
mv heartbeat.tar.bz2 /usr/src/rpm/SOURCES/
cd heartbeat
rpmbuild --bb heartbeat-fedora.spec

When done, the following RPMS can be found in: /usr/src/rpm/RPMS/x86_64/

  • heartbeat-3.0.4-1.amzn1.x86_64.rpm
  • heartbeat-debuginfo-3.0.4-1.amzn1.x86_64.rpm
  • heartbeat-devel-3.0.4-1.amzn1.x86_64.rpm
  • heartbeat-libs-3.0.4-1.amzn1.x86_64.rpm

Resource Agents RPMs

This last step is optional, but commonly used with Heartbeat – these are scripts for managing resources. The preparation is quite similar to the steps above.

wget -O agents-1.0.4.tgz https://github.com/ClusterLabs/resource-agents/tarball/agents-1.0.4
tar -xzvf agents-*.tgz
mv ClusterLabs-resource-agents-* resource-agents
tar -cjvf resource-agents.tar.bz2 resource-agents
mv resource-agents.tar.bz2 /usr/src/rpm/SOURCES/
cd resource-agents
rpmbuild --bb resource-agents.spec

When done, the following RPMS can be found in: /usr/src/rpm/RPMS/x86_64/

  • ldirectord-1.0.4-1.amzn1.x86_64.rpm
  • resource-agents-1.0.4-1.amzn1.x86_64.rpm
  • resource-agents-debuginfo-1.0.4-1.amzn1.x86_64.rpm


Install the RPMs

(Again, we need --nogpgcheck since we haven’t signed the RPMs)

yum --nogpgcheck install heartbeat-3.0.4-1.amzn1.x86_64.rpm heartbeat-libs-3.0.4-1.amzn1.x86_64.rpm resource-agents-1.0.4-1.amzn1.x86_64.rpm
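
To confirm everything installed correctly, you can query the RPM database:

rpm -qa | grep -E 'heartbeat|cluster-glue|resource-agents'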

Basic Setup

The following outlines the minimum necessary to get Heartbeat sending a pulse between two nodes.

cd /usr/share/doc/heartbeat-3.0.4/
cp authkeys ha.cf haresources /etc/ha.d/
cd /etc/ha.d

Essentially, we are copying the sample files to the configuration directory (ha.d).

We will now generate authkeys (the sed command removes the extra ‘(stdin)= ‘ prefix that openssl adds):

( echo -ne "auth 1\n1 sha1 "; dd if=/dev/urandom bs=512 count=1 | openssl sha1 | sed 's/.*= //' ) > /etc/ha.d/authkeys
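
The resulting /etc/ha.d/authkeys should look something like the following (the hash here is purely illustrative – yours will differ):

auth 1
1 sha1 2d3e8a91b4c5f6a7d8e9f0a1b2c3d4e5f6a7b8c9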

Set the permissions:

chmod 600 authkeys


Basic configuration

/etc/ha.d/ha.cf: (change the IPs)

logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
initdead 120
udpport 694
ucast eth0 10.xxx.xxx.xxa
ucast eth0 10.xxx.xxx.xxb
auto_failback off
node server1.domain.com
node server2.domain.com

The above presumes that you have your node names set up in your hosts file (or some DNS equivalent); if not, use IP addresses for the node values. The IPs on the ucast lines are the private IPs of the instances (one is the local instance, the second is the other node).
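
For example, /etc/hosts on each node might contain entries like the following (keeping the same placeholder IPs as above):

10.xxx.xxx.xxa server1.domain.com server1
10.xxx.xxx.xxb server2.domain.com server2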

For full details on the configuration file, see: http://www.linux-ha.org/wiki/Ha.cf

Remember to open UDP port 694 in the security group.
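
If you prefer the command line to the AWS console, a rule like this can be added with the AWS CLI (a sketch – the group name is hypothetical, and the modern unified CLI is assumed, so adjust for your tooling):

aws ec2 authorize-security-group-ingress --group-name my-cluster-sg --protocol udp --port 694 --source-group my-cluster-sg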

Copy authkeys, ha.cf, and haresources to the second machine and set the file permissions.
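
For example (assuming root SSH access between the nodes):

scp /etc/ha.d/authkeys /etc/ha.d/ha.cf /etc/ha.d/haresources root@server2.domain.com:/etc/ha.d/
ssh root@server2.domain.com 'chmod 600 /etc/ha.d/authkeys'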

Start heartbeat on both machines.

service heartbeat start
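
If you also want Heartbeat to start on boot, enable its init script:

chkconfig heartbeat on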


Test and Diagnose

Verify ports are open:

(if nmap is not installed: yum install -y nmap)

nmap -p 694 -sU -P0 10.xxx.xxx.xxx
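
If the port is reachable, the output should include a line along these lines (UDP scans rarely report a definitive ‘open’, and the service name may vary):

PORT    STATE         SERVICE
694/udp open|filtered ha-cluster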

Watch the ‘heartbeat’ communications:

tcpdump port 694
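
With Heartbeat running on both nodes, you should see packets every couple of seconds in each direction, looking roughly like this (timestamps and lengths will differ):

12:34:56.789012 IP 10.xxx.xxx.xxa.ha-cluster > 10.xxx.xxx.xxb.ha-cluster: UDP, length 203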

Watch the log to see when a node comes up or goes down (i.e. when you start/stop heartbeat on the other server) [use Ctrl+C to exit]:

tail -f /var/log/ha-log
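
The sorts of lines to watch for resemble the following (the exact wording varies between versions):

heartbeat: info: Link server2.domain.com:eth0 up.
heartbeat: info: Status update for node server2.domain.com: status active
heartbeat: WARN: node server2.domain.com: is dead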

By cyberx86

Just a random guy who dabbles with assorted technologies yet works in a completely unrelated field.

18 comments

  1. you also need these deps before issuing rpmbuild for the first time:
    yum install -y libtool autoconf automake

    1. All of these are part of the “Development Tools” package in the amzn repository.

      Running yum groupinfo “Development Tools” gives the following:

      Group: Development Tools, Description:
      These tools include core development tools such as automake, gcc, perl, python, and debuggers.,
      Mandatory Packages: autoconf, automake, binutils, bison, flex, gcc, gcc-c++, gdb, gettext, libtool, make, pkgconfig, rpm-build, strace, system-rpm-config
      Default Packages: automake14, automake15, automake16, automake17, byacc, cscope, ctags, cvs, dev86, diffstat, doxygen, elfutils, gcc-gfortran, indent, ltrace, oprofile, patchutils, pfmon, pstack, python-ldap, rcs, splint, subversion, swig, systemtap, texinfo, valgrind
      Optional Packages: ElectricFence, dejagnu, expect, gcc-gnat, gcc-objc, gcc44, gcc44-c++, gcc44-gfortran, imake, java-1.6.0-openjdk, java-1.6.0-openjdk-devel, libgfortran44, memtest86+, nasm, pexpect, python24-docs, python26-docs, unifdef

  2. also please add this dep, and note this error:
    sudo yum install -y gettext

    rpmbuild --bb heartbeat-fedora.spec
    error: Failed build dependencies:
    cluster-glue-libs-devel is needed by heartbeat-3.0.4-1.amzn1.x86_64

  3. Again, gettext is part of “Development Tools” – I used it to easily address the build dependencies.

    Also, cluster-glue-libs-devel is part of the first line of the ‘Heartbeat RPM’ section:

    yum install --nogpgcheck cluster-glue-1.0.7-1.amzn1.x86_64.rpm cluster-glue-libs-1.0.7-1.amzn1.x86_64.rpm cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64.rpm

  4. rpm -i ~/rpmbuild/RPMS/x86_64/cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64.rpm
    error: Failed dependencies:
    cluster-glue-libs = 1.0.7-1.amzn1 is needed by cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64
    liblrm.so.2()(64bit) is needed by cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64
    libpils.so.2()(64bit) is needed by cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64
    libplumb.so.2()(64bit) is needed by cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64
    libplumbgpl.so.2()(64bit) is needed by cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64
    libstonith.so.1()(64bit) is needed by cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64

  5. You need to install them all together – cluster-glue-libs-devel depends on cluster-glue-libs.

    After you have built them, either install all simultaneously with rpm -i, or use the method above, to install them via yum (which will resolve any additional dependencies for you)
    Again: yum install --nogpgcheck cluster-glue-1.0.7-1.amzn1.x86_64.rpm cluster-glue-libs-1.0.7-1.amzn1.x86_64.rpm cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64.rpm

    (there are two dashes in front of ‘nogpgcheck’)

  6. in Resource Agents RPMs please replace the line with:
    s/tar -xzvf ClusterLabs-resource-agents-agents-*.tar.gz/tar -xzvf agents-1.0.4/

    thanks

  7. the current AMI doesn’t have mailx by default. One should add it before this procedure to be able to build Heartbeat (the second rpmbuild).

  8. what is 10.xxx.xxx.xxx? in

    ucast eth0 10.xxx.xxx.xxx
    ucast eth0 10.xxx.xxx.xxx

    Is it the local IP of server1.domain.com and server2.domain.com, or a virtual IP? I want to reach whichever instance is active from outside, using a single IP (or hostname) – how can I do that on EC2?
    Don’t we need to specify

    node01 172.16.4.82 httpd

    in haresources?

    1. The 10.xxx.xxx.xxx lines are the private (internal) IP addresses of the instances. They were used in order to keep transmissions secure between the two instances. If you want to use a public IP address, you can do so; however, you may need to make some changes to the security group settings. Also, you should probably use an elastic IP address on each instance – the advantage being that the elastic IP will map to an internal IP when two instances are in the same zone, and otherwise to the public IP, usually providing the optimal (fastest and least expensive) route. The node directives define all the nodes (including the local node) in the cluster – each should match uname -n for the node (i.e. not an IP address). You can specify more than one node per line though. The ucast directives describe where to send packets (and which interface to use) – IP addresses are good here.
      The haresources entries usually have the node name of the node (i.e. uname -n), the configured IP, and the resource. In other words, they need to match what you put into the ha.cf file.

  9. thank you for your reply.
    I want to make an active/passive failover setup.
    I have two instances, for example:

    instance1: hostname1 (reachable from public)
    localip1
    instance2: hostname2 (reachable from public)
    localip2

    so I need to do:

    ucast eth0 localip1
    ucast eth0 localip2

    Now I want to know if I can use a third IP (maybe an elastic IP), so that if I open it in my browser I hit whichever of hostname1 or hostname2 is available.
    How can I do that? A reply would be appreciated.
    Thanks

    1. You can only map a single IP address – whether it is an elastic IP or the default dynamic IP address – to a given instance. The traditional AWS solution to your problem would be to use an Elastic Load Balancer, which will distribute the requests between the two instances. If that is not the avenue you wish to pursue, you can set up HAProxy (or even use nginx or Varnish) as your load balancer. Install it on an instance (it can be on the same instances you are load balancing – although that adds to the complexity). In the end though, it is fairly typical to have a single public-facing IP (unless it is a really large site), which corresponds to the load balancer, with all traffic then directed to private (i.e. non-web-accessible) backends.

      I might suggest you look into Corosync instead of Heartbeat – I found it to be a bit more friendly, and it is actively maintained (unlike Heartbeat, which it has mostly superseded). If you are interested, I can post my notes on setting up Corosync on EC2. (It has been a few months, but the way I approached this problem was to create two identical instances running HAProxy, OpenVPN, Pacemaker, Corosync, and Gluster. I seem to recall OpenVPN being required for some of the network communication between some of the components, as they used protocols not supported by the EC2 network – although it is possible that VPC may work instead; I couldn’t test it at the time.)

      I might suggest the site ServerFault if you are not familiar with it – it is a Q&A site for systems admins. If you have specific problems, you should be able to get some help there (and I answer questions there as well, under the same alias).

  10. problem compiling:

    rpmbuild --bb heartbeat-fedora.spec
    __________________________________________________________

    automake-1.5: configure.in: installing `./compile'
    replace/Makefile.am:29: required file `replace/[lt__dirent].c' not found
    replace/Makefile.am:29: required file `replace/[lt__strl].c' not found
    replace/Makefile.am:29: required file `replace/[argz].c' not found

    please.

    thanks.

    1. Those files are provided by the package libtool-ltdl-devel. Although, I do not recall needing it to compile the packages. If you install the package, the files (lt__dirent.c, lt__strl.c, argz.c) are in /usr/share/libtool/libltdl/. I might suggest ensuring that you have the latest build tools first, and also the latest version of Heartbeat (which is currently 3.0.5 – the article references 3.0.4). Also, keep in mind that the article was written for Amazon’s Linux AMI – while it may work on other RHEL derived systems, it wasn’t tested on them.

  11. I am testing the installation on CentOS 6.2; the first compilation worked well, but this one has problems:

    replace/Makefile.am:29: required file `replace/[lt__dirent].c' not found
    replace/Makefile.am:29: required file `replace/[lt__strl].c' not found
    replace/Makefile.am:29: required file `replace/[argz].c' not found

    ________________
    The path you specified is fine. Is it looking in another directory?

    /usr/share/libtool/libltdl/

    aclocal.m4   config.log    libltdl       ltdl.c         lt__strl.c    README
    argz.c       configure     loaders       ltdl.h         Makefile.am   slist.c
    argz_.h      configure.ac  lt__alloc.c   lt_dlloader.c  Makefile.in
    config-h.in  COPYING.LIB   lt__dirent.c  lt_error.c     Makefile.inc

    1. If you watch the compile, you should see the following:

      libtoolize: copying file `libltdl/argz.c'
      libtoolize: copying file `libltdl/libltdl/lt__dirent.h'
      libtoolize: copying file `libltdl/libltdl/lt__strl.h'

      The path it is looking in is the replace/ directory of the source tree. Presumably, libtool should copy the needed files into that directory.

      I spun up a virtual machine with a fresh install of CentOS 6.2 (minimal, i386) and gave it a try. Heartbeat built without any issues. Here is my command log (you’ll note a few variations from the article, but nothing major):

      Update and install dependencies:

      yum update -y
      yum groupinstall -y "Development Tools"
      yum install -y wget net-snmp-devel openhpi-devel net-snmp OpenIPMI glib2-devel libxml2-devel bzip2-devel libuuid-devel docbook-utils docbook-dtds libtool-ltdl libtool-ltdl-devel libxslt perl-TimeDate python-devel OpenIPMI-devel openssl-devel docbook-style-xsl help2man e2fsprogs-devel mailx

      Set up the build environment:

      mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
      echo '%_topdir %(echo $HOME)/rpmbuild' > ~/.rpmmacros

      Get the packages, extract, rename, retar:

      cd /usr/local/src
      wget -O cluster-glue.tar.bz2 http://hg.linux-ha.org/glue/archive/glue-1.0.9.tar.bz2
      wget -O heartbeat.tar.bz2 http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/7e3a82377fa8.tar.bz2
      tar -xjvf cluster-glue.tar.bz2
      tar -xjvf heartbeat.tar.bz2
      rm -rf cluster-glue.tar.bz2 heartbeat.tar.bz2
      mv Reusable-Cluster-Components-* cluster-glue
      mv Heartbeat-* heartbeat
      tar -cjvf ~/rpmbuild/SOURCES/cluster-glue.tar.bz2 cluster-glue
      tar -cjvf ~/rpmbuild/SOURCES/heartbeat.tar.bz2 heartbeat

      Build (change the architecture if needed):

      rpmbuild -bb cluster-glue/cluster-glue-fedora.spec
      yum install --nogpgcheck ~/rpmbuild/RPMS/i686/cluster-glue-1.0.9-1.el6.i686.rpm \
      ~/rpmbuild/RPMS/i686/cluster-glue-libs-1.0.9-1.el6.i686.rpm \
      ~/rpmbuild/RPMS/i686/cluster-glue-libs-devel-1.0.9-1.el6.i686.rpm
      rpmbuild -bb heartbeat/heartbeat-fedora.spec

      I didn’t build Resource-agents, but the procedure should amount to:

      ./autogen.sh
      ./configure
      make rpm
