Thursday, January 1, 2015

HA Web Server via Pacemaker on three VMs using an iSCSI VM

A number of years ago I configured a four-node RHEL6 cluster using cman,
rgmanager, clvm, gfs2, a SAN, and Dell's CMC fencing to support live
migration and automatic failover of Xen/KVM virtual machines.
I tried to prototype that setup using VMware Fusion on a Mac but
couldn't, because there were no fencing options for that VM platform at
the time.

Today I was glad to set up a three-node RHEL7 HA web cluster entirely
within my laptop, using just four KVM VMs: three cluster nodes running
pacemaker and corosync, plus one VM acting as an iSCSI server. I worked
from Red Hat's HA and storage administration documentation, which I
reference at the relevant steps below.

Below are some of my notes, mainly for my own purposes.


Get the software

subscription-manager register
subscription-manager attach --pool=$POOL
subscription-manager repos --disable="*"
subscription-manager repos --enable=rhel-7-server-rpms
subscription-manager repos --enable=rhel-ha-for-rhel-7-server-rpms
yum -y install pacemaker corosync fence-agents-all fence-agents-virsh fence-virt pcs  dlm lvm2-cluster gfs2-utils  
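
One thing not captured above: the pcsd daemon has to be enabled and running on every node before the `pcs cluster auth` step later on will work.

systemctl start pcsd
systemctl enable pcsd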

Use consistent /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.122.87  pcs-a pcs-a.example.com
192.168.122.26  pcs-b pcs-b.example.com
192.168.122.129 pcs-c pcs-c.example.com
192.168.122.140 iscsi iscsi.example.com
192.168.122.141 web web.example.com
Note that at this point I cloned my VM (pcs-a) into pcs-{b,c}.


Start the cluster

Note that the hacluster password was set on all of the nodes.
passwd hacluster 
pcs cluster auth pcs-a.example.com pcs-b.example.com pcs-c.example.com   
Notice that I didn't need to set up fencing just to get the nodes talking to each other.
[root@pcs-a ~]# pcs cluster setup --name rh7nodes pcs-a.example.com pcs-b.example.com pcs-c.example.com   
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
pcs-a.example.com: Succeeded
pcs-b.example.com: Succeeded
pcs-c.example.com: Succeeded
[root@pcs-a ~]# pcs cluster start --all  
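
At this point a quick sanity check is worthwhile; `pcs status` should list all three nodes as online, and `corosync-cfgtool -s` shows the ring status.

pcs status
corosync-cfgtool -s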

Get Fencing Working

First you need to configure your hypervisor (my laptop) to fence the VMs using fence_virtd,
as described in the Red Hat knowledge base article "How can I set up virtual fencing on my KVM Virtual machine cluster?"


Once that's working (using the multicast option), create the fence device:
pcs stonith describe fence_xvm  
pcs stonith create virtfence_xvm fence_xvm  
pcs stonith update virtfence_xvm pcmk_host_map="pcs-a.example.com:pcs-a;pcs-b.example.com:pcs-b;pcs-c.example.com:pcs-c" key_file=/etc/corosync/fence_xvm.key  
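Before relying on the cluster to fence anything, it is worth checking the device definition and whether fence_virtd answers at all; the list operation queries it over multicast.

pcs stonith show --full
fence_xvm -o list
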
After following the above I was unable at first to fence a node.
[root@pcs-a ~]# stonith_admin --reboot pcs-c.example.com
Command failed: No route to host
[root@pcs-a ~]# 
I found that my host hadn't joined the multicast group, which I
addressed by restarting fence_virtd and my VMs.

ss -l | grep 225.0.0.12
tcpdump was very helpful in pointing out that I needed to open the
host's firewall so that the fence_virtd service could be reached:

tcpdump -i virbr0 -n port 1229
iptables -I INPUT -p udp --dport 1229 -j ACCEPT
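That iptables rule does not survive a reboot; on a host managed by firewalld the permanent equivalent would be something like the following.

firewall-cmd --permanent --add-port=1229/udp
firewall-cmd --reload
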
# lsof -i UDP:1229
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
fence_vir 11781 root    8u  IPv4 506550      0t0  UDP *:zented 
# ps axu | grep 11781
root     10197  0.0  0.0 112640   960 pts/5    S+   16:02   0:00 grep --color=auto 11781
root     11781  0.0  0.0 158832  4868 ?        Ss   11:23   0:00 /usr/sbin/fence_virtd -w
# 
The other issue I ran into is that you must pass fence_xvm not the
hostname but the name that virsh uses; i.e. this will not
work:

[root@pcs-a ~]# fence_xvm -H pcs-c.example.com -dddd | tail -5
Issuing TCP challenge
Responding to TCP challenge
TCP Exchange + Authentication done... 
Waiting for return value from XVM host
Operation failed
[root@pcs-a ~]# 
This fails because `virsh list | grep pcs-c.example.com` returns
nothing. Instead, the following worked, because I had named my VM
rhel7-pcs-c in virsh.
fence_xvm -H rhel7-pcs-c -dddddd
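The same naming applies to the pcmk_host_map I set earlier: the right-hand side of each mapping must be the virsh domain name, not the hostname. Assuming the other two domains follow the same rhel7-pcs-* naming (my notes don't record them), that would be:

pcs stonith update virtfence_xvm pcmk_host_map="pcs-a.example.com:rhel7-pcs-a;pcs-b.example.com:rhel7-pcs-b;pcs-c.example.com:rhel7-pcs-c"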
Finally, when debugging on the host, stop the fence_virtd daemon via
the service controller and watch the error messages directly:

service fence_virtd stop
 /usr/sbin/fence_virtd -d999 -F

Configure the iSCSI server

I made a new VM called iscsi.example.com and configured it as an iSCSI server by following the RHEL 7 Storage Administration Guide. I then created a file-backed block device to serve as a LUN.
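
One prerequisite that does not appear in the transcript below: the directory backing the fileio object has to exist before the create command, i.e. something like the following on the iSCSI VM.

mkdir -p /var/fileio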

[root@iscsi ~]# systemctl enable target
ln -s '/usr/lib/systemd/system/target.service' '/etc/systemd/system/multi-user.target.wants/target.service'
[root@iscsi ~]# targetcli
Warning: Could not load preferences file /root/.targetcli/prefs.bin.
targetcli shell version 2.1.fb34
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.

/> ls
o- / ......................................................................................................................... [...]
  o- backstores .............................................................................................................. [...]
  | o- block .................................................................................................. [Storage Objects: 0]
  | o- fileio ................................................................................................. [Storage Objects: 0]
  | o- pscsi .................................................................................................. [Storage Objects: 0]
  | o- ramdisk ................................................................................................ [Storage Objects: 0]
  o- iscsi ............................................................................................................ [Targets: 0]
  o- loopback ......................................................................................................... [Targets: 0]
/> iscsi/
/iscsi> create
Created target iqn.2003-01.org.linux-iscsi.iscsi.x8664:sn.9e8fba51d7f4.
Created TPG 1.
/iscsi> ls
o- iscsi .............................................................................................................. [Targets: 1]
  o- iqn.2003-01.org.linux-iscsi.iscsi.x8664:sn.9e8fba51d7f4 ............................................................. [TPGs: 1]
    o- tpg1 ................................................................................................. [no-gen-acls, no-auth]
      o- acls ............................................................................................................ [ACLs: 0]
      o- luns ............................................................................................................ [LUNs: 0]
      o- portals ...................................................................................................... [Portals: 0]
/iscsi>

/backstores> /backstores/fileio create www /var/fileio/www 5G write_back=false
Created fileio www with size 5368709120
/backstores> ls
o- backstores ..................................................................
  o- block .....................................................................
  o- fileio ....................................................................
  | o- www .....................................................................
  o- pscsi .....................................................................
  o- ramdisk ...................................................................
/backstores>
/backstores> /iscsi
/iscsi> ls
o- iscsi .......................................................................
  o- iqn.2003-01.org.linux-iscsi.iscsi.x8664:sn.9e8fba51d7f4 ...................
    o- tpg1 ....................................................................
      o- acls ..................................................................
      o- luns ..................................................................
      o- portals ...............................................................
/iscsi> 
/iscsi> iqn.2003-01.org.linux-iscsi.iscsi.x8664:sn.9e8fba51d7f4/tpg1/
/iscsi/iqn.20...ba51d7f4/tpg1> portals/ create 192.168.122.140 
Using default IP port 3260
Created network portal 192.168.122.140:3260.
/iscsi/iqn.20...ba51d7f4/tpg1> ls
o- tpg1 ................................................. [no-gen-acls, no-auth]
  o- acls ............................................................ [ACLs: 0]
  o- luns ............................................................ [LUNs: 0]
  o- portals ...................................................... [Portals: 1]
    o- 192.168.122.140:3260 ............................................... [OK]
/iscsi/iqn.20...ba51d7f4/tpg1> luns/ create /backstores/fileio/www 

Created LUN 0.
/iscsi/iqn.20...ba51d7f4/tpg1> ls
o- tpg1 ................................................. [no-gen-acls, no-auth]
  o- acls ............................................................ [ACLs: 0]
  o- luns ............................................................ [LUNs: 1]
  | o- lun0 ..................................... [fileio/www (/var/fileio/www)]
  o- portals ...................................................... [Portals: 1]
    o- 192.168.122.140:3260 ............................................... [OK]
/iscsi/iqn.20...ba51d7f4/tpg1>
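
Before leaving the iSCSI VM I also saved the configuration (`saveconfig` inside targetcli, which some versions also offer on exit) and opened TCP port 3260; a sketch, assuming firewalld is running there:

firewall-cmd --permanent --add-port=3260/tcp
firewall-cmd --reload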

On pcs-{a,b,c} I ran `yum install iscsi-initiator-utils` as per a Red Hat knowledge base article and identified the generated InitiatorName:


[root@pcs-c ~]# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1994-05.com.redhat:44e8828a7868
[root@pcs-c ~]# 
From there I assigned the ACLs for my three hosts:

/iscsi/iqn.20...7f4/tpg1/acls> create iqn.1994-05.com.redhat:553420881b94
Created Node ACL for iqn.1994-05.com.redhat:553420881b94
Created mapped LUN 0.
/iscsi/iqn.20...7f4/tpg1/acls> create iqn.1994-05.com.redhat:44e8828a7868
Created Node ACL for iqn.1994-05.com.redhat:44e8828a7868
Created mapped LUN 0.
/iscsi/iqn.20...7f4/tpg1/acls> create iqn.1994-05.com.redhat:d7163d296480
Created Node ACL for iqn.1994-05.com.redhat:d7163d296480
Created mapped LUN 0.
/iscsi/iqn.20...7f4/tpg1/acls> ls
o- acls ..................................................................................... [ACLs: 3]
  o- iqn.1994-05.com.redhat:44e8828a7868 ............................................. [Mapped LUNs: 1]
  | o- mapped_lun0 ............................................................. [lun0 fileio/www (rw)]
  o- iqn.1994-05.com.redhat:553420881b94 ............................................. [Mapped LUNs: 1]
  | o- mapped_lun0 ............................................................. [lun0 fileio/www (rw)]
  o- iqn.1994-05.com.redhat:d7163d296480 ............................................. [Mapped LUNs: 1]
    o- mapped_lun0 ............................................................. [lun0 fileio/www (rw)]
/iscsi/iqn.20...7f4/tpg1/acls> 
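One step my notes skip: on each node the target has to be discovered before the login below will find it in the node database.

iscsiadm -m discovery -t sendtargets -p iscsi.example.com
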
I then had them all connect to the same block device:

[root@pcs-c ~]# fdisk -l /dev/sda 
fdisk: cannot open /dev/sda: No such file or directory
[root@pcs-c ~]# iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.iscsi.x8664:sn.9e8fba51d7f4 -p iscsi.example.com -l
Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.iscsi.x8664:sn.9e8fba51d7f4, portal: 192.168.122.140,3260] (multiple)
Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.iscsi.x8664:sn.9e8fba51d7f4, portal: 192.168.122.140,3260] successful.
[root@pcs-c ~]# fdisk -l /dev/sda 

Disk /dev/sda: 5368 MB, 5368709120 bytes, 10485760 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 4194304 bytes

[root@pcs-c ~]# 


Create an Exclusive LV


I then followed the second half of the documentation to create a file system and LV for the web tree.
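
One detail the commands below gloss over: they use /dev/sda1, so the LUN first needs a partition table and partition, created on one node only. A sketch using parted (any partitioning tool would do):

parted -s /dev/sda mklabel msdos
parted -s /dev/sda mkpart primary 1MiB 100%
partprobe /dev/sda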

pvcreate /dev/sda1
vgcreate my_vg /dev/sda1
lvcreate -L1000 -n my_lv my_vg 
mkfs.ext4 /dev/my_vg/my_lv
mount /dev/my_vg/my_lv /var/www/
mkdir /var/www/html
mkdir /var/www/cgi-bin
mkdir /var/www/error
restorecon -R /var/www
echo "hello" > /var/www/html/index.html
umount /var/www
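
Also not shown: the Website resource created later expects httpd installed on every node, along with a server-status URL the resource agent can poll. Paraphrasing the HA documentation from memory, that meant something like the following on each node (without enabling httpd at boot, since the cluster starts it):

yum -y install httpd
cat >> /etc/httpd/conf/httpd.conf <<'EOF'
<Location /server-status>
    SetHandler server-status
    Require local
</Location>
EOF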

One thing I like about the example is that I didn't need to use CLVM,
since I don't have a true shared file system. I only needed to tell
the cluster that access to a certain LV was to be managed by it and
that the LV was to be accessed by only one node at a time. One safeguard
against filesystem corruption (in addition to fencing) was to tell LVM
not to treat my iSCSI-backed LV as an auto-activation volume, by editing
volume_list in /etc/lvm/lvm.conf to include only the local volumes.
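
Concretely, and reconstructing from the documentation rather than my shell history, that meant restricting volume_list on each node to the local VG (rhel in my case), then rebuilding the initramfs so the setting also holds during early boot, and rebooting:

# in /etc/lvm/lvm.conf
volume_list = [ "rhel" ]

dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)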


[root@pcs-a ~]# lvs
  LV    VG    Attr       LSize    Pool Origin Data%  Move Log Cpy%Sync Convert
  my_lv my_vg -wi-a----- 1000.00m                                             
  root  rhel  -wi-ao----   14.81g                                             
  swap  rhel  -wi-ao----    1.70g                                             
[root@pcs-a ~]# 

[root@pcs-b ~]# lvs
  LV   VG   Attr       LSize  Pool Origin Data%  Move Log Cpy%Sync Convert
  root rhel -wi-ao---- 14.81g                                             
  swap rhel -wi-ao----  1.70g                                             
[root@pcs-b ~]# 

[root@pcs-c ~]# lvs
  LV   VG   Attr       LSize  Pool Origin Data%  Move Log Cpy%Sync Convert
  root rhel -wi-ao---- 14.81g                                             
  swap rhel -wi-ao----  1.70g                                             
[root@pcs-c ~]# 

Configure the Cluster Resources

I continued with the documentation and created my cluster resources, starting with the storage resources.

[root@pcs-a ~]# pcs resource create my_lvm LVM volgrpname=my_vg exclusive=true --group apachegroup 

[root@pcs-a ~]# pcs resource show
 Resource Group: apachegroup
     my_lvm     (ocf::heartbeat:LVM):   Started 
[root@pcs-a ~]# pcs resource create my_fs Filesystem device="/dev/my_vg/my_lv" directory="/var/www" fstype="ext4" --group apachegroup 
[root@pcs-a ~]# 
Earlier I had decided that 192.168.122.141 would be a shared IP
address for the host web.example.com. I also added a new virtual
NIC to each node (which you can do without rebooting a KVM guest) and
configured that NIC not to be started on boot, since the cluster will
manage it; a file like the one below exists on each node, differing
only by MAC.

[root@pcs-c ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens8 
HWADDR="52:54:00:c8:71:77"
TYPE="Ethernet"
BOOTPROTO="none"
DEFROUTE="yes"
NAME="ens8"
ONBOOT="no"
[root@pcs-c ~]# 
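For the record, hot-adding such a NIC from the hypervisor can be done with virsh; a sketch, assuming the rhel7-pcs-* domain names and the default libvirt network:

virsh attach-interface rhel7-pcs-c network default --model virtio --config --live
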
I then created the resource that holds my floating IP address on this
NIC and the apache resource itself, adding both to the resource group.

[root@pcs-a ~]# pcs resource create VirtualIP IPaddr2 ip=192.168.122.141 nic=ens8 cidr_netmask=24 --group apachegroup 
[root@pcs-a ~]# pcs resource create Website apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status" --group apachegroup 
[root@pcs-a ~]# 

I ran into some issues getting the VirtualIP and Website resources to
start (as reported by `pcs status`) and spent time chasing an ARP issue
with `ip n` on the nodes and `tcpdump -i virbr0` on the hypervisor,
before realizing I had accidentally assigned an address from a
different subnet. This was a good experience, however, because it
acquainted me with the following.

[root@pcs-a ~]# pcs resource show
 Resource Group: apachegroup
     my_lvm     (ocf::heartbeat:LVM):   Started pcs-b.example.com 
     my_fs      (ocf::heartbeat:Filesystem):    Started pcs-b.example.com 
     VirtualIP  (ocf::heartbeat:IPaddr2):       Stopped 
     Website    (ocf::heartbeat:apache):        Stopped 

[root@pcs-a ~]# pcs resource debug-start VirtualIP
Operation start for VirtualIP (ocf:heartbeat:IPaddr2) returned 1
 >  stderr: ERROR: Unable to find nic or netmask.
 >  stderr: ERROR: [findif] failed
[root@pcs-a ~]# 

[root@pcs-a pcsd]# pcs resource show VirtualIP
 Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=198.168.122.141 cidr_netmask=24 
  Operations: start interval=0s timeout=20s (VirtualIP-start-timeout-20s)
              stop interval=0s timeout=20s (VirtualIP-stop-timeout-20s)
              monitor interval=10s timeout=20s (VirtualIP-monitor-interval-10s)
[root@pcs-a pcsd]# 

[root@pcs-a pcsd]# /usr/libexec/heartbeat/send_arp -A -i 200 -r 5 -p /var/run/resource-agents/send_arp-198.168.122.141 ens8 198.168.122.141  auto not_used not_used
ARPING 198.168.122.141 from 198.168.122.141 ens8
Sent 5 probes (5 broadcast(s))
Received 0 response(s)
[root@pcs-a pcsd]# 
Can you imagine copying and pasting the above without realizing that it
said 198.168.... rather than 192.168....? It's often something
simple. Eventually I noticed it while watching the ARPs on my virtual
bridge:

# tcpdump -i virbr0 | grep -i arp
14:04:17.183244 ARP, Request who-has web tell pcs-a, length 28
14:04:17.363589 ARP, Request who-has pcs-b tell pcs-c, length 28
14:04:17.363684 ARP, Reply pcs-b is-at 52:54:00:1c:c4:1c (oui Unknown), length 28
14:04:17.779637 ARP, Request who-has laptop.example.com tell pcs-b, length 28
14:04:17.779658 ARP, Reply laptop.example.com is-at fe:54:00:1c:c4:1c (oui Unknown), length 28
14:04:18.184881 ARP, Request who-has web tell pcs-a, length 28
14:04:19.186844 ARP, Request who-has web tell pcs-a, length 28
14:04:22.611597 ARP, Request who-has pcs-c tell pcs-b, length 28
14:04:22.611680 ARP, Reply pcs-c is-at 52:54:00:b2:79:04 (oui Unknown), length 28
...

Test the cluster

Once I had updated the IP address,
pcs resource update VirtualIP ip=192.168.122.141 nic=ens8 cidr_netmask=24
I ran the following while also reloading web.example.com in my browser,
watch -n 1 "ip a s ens8; df -h"
and then ran commands like the following,
pcs cluster standby pcs-b.example.com
pcs cluster unstandby pcs-b.example.com
so that I could watch the cluster resources move away from hosts that
suddenly no longer had them:
[root@pcs-b ~]# ip a s ens8; df -h 
2: ens8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:8f:19:c6 brd ff:ff:ff:ff:ff:ff
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/rhel-root   15G  1.2G   14G   8% /
devtmpfs               489M     0  489M   0% /dev
tmpfs                  498M   39M  459M   8% /dev/shm
tmpfs                  498M  6.6M  491M   2% /run
tmpfs                  498M     0  498M   0% /sys/fs/cgroup
/dev/vda1              497M  136M  362M  28% /boot
[root@pcs-b ~]# 
To hosts that did have them:
[root@pcs-a ~]# ip a s ens8; df -h
2: ens8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:7d:40:6b brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.141/24 scope global ens8
       valid_lft forever preferred_lft forever
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/rhel-root     15G  1.2G   14G   8% /
devtmpfs                 489M     0  489M   0% /dev
tmpfs                    498M   54M  444M  11% /dev/shm
tmpfs                    498M  6.6M  491M   2% /run
tmpfs                    498M     0  498M   0% /sys/fs/cgroup
/dev/vda1                497M  136M  362M  28% /boot
/dev/mapper/my_vg-my_lv  969M  2.5M  900M   1% /var/www
[root@pcs-a ~]#