*nix hacks: March 2008

Friday, March 28, 2008

rhel5 mod_ldap

I just got mod_ldap working with my OpenLDAP server on a RHEL5 system. It was harder than it had to be. The following did the trick using the vanilla Apache from RHEL5:

AuthName "your name here"
AuthType Basic
AuthBasicProvider ldap
AuthzLDAPAuthoritative off
AuthLDAPUrl ldap://ldap.domain.tld:389/o=options
AuthLDAPBindDN cn=servicedn,o=tld
AuthLDAPBindPassword secret
Require user user0 user1
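
Before pointing Apache at the directory, it can help to confirm that the bind DN and password work on their own. The following is a minimal sketch, assuming the OpenLDAP client tool ldapsearch is installed and that users are matched on the uid attribute; the helper name ldap_check_cmd is hypothetical. It only prints the command so you can review it before running it:

```shell
# Build an ldapsearch command from the same values used in the Apache config.
# Usage: ldap_check_cmd URL BINDDN USER
ldap_check_cmd() {
    url=$1 binddn=$2 user=$3
    server=${url%/*}      # everything before the last slash: scheme, host, port
    base=${url##*/}       # everything after the last slash: the search base
    # -x simple bind, -H server URI, -D bind DN, -W prompt for the password,
    # -b search base; the uid filter is an assumption about the directory.
    printf 'ldapsearch -x -H %s -D "%s" -W -b "%s" "(uid=%s)"\n' \
        "$server" "$binddn" "$base" "$user"
}

ldap_check_cmd ldap://ldap.domain.tld:389/o=options cn=servicedn,o=tld user0
```

If the printed command returns an entry for the user when run by hand, the bind DN, password and search base are probably fine and any remaining trouble is on the Apache side.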

There are three mod_auth_ldap modules:

Note how the first two are official from Apache 2.0 or 2.2 respectively and conditionally. The third does not come directly from Apache. RHEL5's vanilla Apache 2.0 comes with the first two.

$ grep ldap /etc/httpd/conf/httpd.conf 
LoadModule ldap_module modules/mod_ldap.so
LoadModule authnz_ldap_module modules/mod_authnz_ldap.so

You can yum install mod_authz_ldap to drop the new module in /etc/httpd/modules/ but you'll still need to edit httpd.conf to include it. However, you don't need it for basic authentication.

Friday, March 21, 2008

I've mentioned earlier that I have a pair of DNS load balanced mail gateways. These boxen run sendmail and I've recently lowered the MX preference number in DNS so that the more powerful system will get the mail first. However, I've run into a strange problem where some systems are not respecting the DNS change. It seems that the more powerful system is not accepting mail from these hosts and the less powerful system is then picking up the message instead. These systems are also running sendmail. I'm still trying to figure out why this is, but I'd like to share how I figured this out (with the help of a network-oriented friend).
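
As an aside, the delivery order that the MX change is meant to produce can be checked by sorting the MX records by preference, lowest first. This is a rough sketch: the preference values 10 and 20 are made up for illustration, and the live lookup assumes dig is installed.

```shell
# Sort MX records ("preference host" per line) so the preferred host is first.
mx_order() {
    sort -n -k1,1
}

# Live lookup (needs dig and network access):
#   dig +short MX domain.tld | mx_order
# Offline demonstration with hypothetical preference values:
printf '%s\n' '20 mta0.domain.tld.' '10 mta1.domain.tld.' | mx_order
```

A sending host that respects the change should try the first host listed and fall back to the second only when the first fails.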

In two terminals I ran the following commands:

echo -e "sent on `date`" | mail -s "test: `hostname`" me@domain.tld
tcpdump -vv -s 1500 -w sendmail_25w.txt port 25

This produced a log which consistently showed that my non-gateway MTA sent a retransmit and the more powerful gateway (mta1) tore down the connection:

16:44:28.856481 IP server.domain.tld.41432 > mta1.domain.tld.smtp: 
P 135:699(564) ack 367 win 46


16:44:29.057802 IP server.domain.tld.41432 > mta1.domain.tld.smtp: 
P 135:699(564) ack 367 win 46


16:44:29.058142 IP mta1.domain.tld.smtp > server.domain.tld.41432: 
R 2420026399:2420026399(0) win 0

However it had no problem building the connection with mta0 and sending the mail. Why did the non-gateway MTA resend the extra packet and why did the gateway MTA reject it?
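
Spotting the retransmit by eye gets tedious on a longer capture. Here is a small sketch that flags repeated data segments. It assumes the older one-line-per-packet tcpdump text format shown above (newer tcpdump releases print flags and sequence numbers differently), and the function name find_retransmits is my own.

```shell
# Flag TCP segments that carry data and repeat an earlier sequence range.
# Expects tcpdump's one-line-per-packet text output on stdin.
find_retransmits() {
    awk '$2 == "IP" && $7 ~ /^[0-9]+:[0-9]+\([1-9]/ {
        key = $3 " " $5 " " $7          # source, destination, seq range
        if (seen[key]++) print "retransmit: " $0
    }'
}

# Against the capture file written earlier:
#   tcpdump -r sendmail_25w.txt | find_retransmits
find_retransmits <<'EOF'
16:44:28.856481 IP server.domain.tld.41432 > mta1.domain.tld.smtp: P 135:699(564) ack 367 win 46
16:44:29.057802 IP server.domain.tld.41432 > mta1.domain.tld.smtp: P 135:699(564) ack 367 win 46
16:44:29.058142 IP mta1.domain.tld.smtp > server.domain.tld.41432: R 2420026399:2420026399(0) win 0
EOF
```

The reset from mta1 is not reported because it carries no data; only the second copy of the 135:699 segment is.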

My friend points out that Jon Postel would say the gateway MTA was wrong as per RFC793 section 2.10, Robustness Principle. Also, why would mta0 accept it and mta1 reject it?

Here's that same data but in a screen capture from wireshark. I couldn't resist overstriking the image. Note the three red dots showing the retransmit.

Note that my non-gateway MTAs are using vanilla sendmail from RHEL4/5:

$ /usr/sbin/sendmail -v -d0.1 < /dev/null | head -1
Version 8.13.1
...
$ /usr/sbin/sendmail -v -d0.1 < /dev/null | head -1
Version 8.13.8

When looking at mta1 I see it rejecting the connection with an I/O error:

$ fgrep me@server.domain.tld /var/log/maillog
Mar 19 15:52:44 mta1 sendmail[6255]: m2JJqidH006255: from=<me@server.domain.tld>, size=558, 
class=0, nrcpts=1, msgid=<200803191952.m2JJqh0r018282@server.domain.tld>, proto=SMTP, daemon=MTA,
relay=server.domain.tld [123.456.78.9]
Mar 19 15:57:47 mta1 sendmail[16727]: m2JJvlnq016727: SYSERR(root): collect: I/O error on 
connection from server.domain.tld, from=<me@server.domain.tld>
Mar 19 15:57:47 mta1 sendmail[16727]: m2JJvlnq016727: from=<me@server.domain.tld>, size=566, 
class=0, nrcpts=1, proto=SMTP, daemon=MTA, relay=server.domain.tld [123.456.78.9]

We'll see if I figure this one out or live with mail going the wrong way for a few servers.

How to make a growable LUN

We have a SAN in our department. There have been times when some of us have needed to make certain LUNs larger and the process has been non-optimal: arrange for downtime, make a new LUN and finally move the data there and remount it.

If you have a feeling that other organizations with SANs don't do this you're probably right. The SAN offers the ability to expand LUNs without any downtime so why should we not be able to do this with the filesystem? The problem is that after you magically grow the disk behind the scenes you still need to expand the filesystem to recognize it without any downtime. The solution is to configure the raw disks not with ext3 but with ext3 on top of LVM:

Create the LUN with the SAN's interface
When the LUN is presented via iSCSI or HBA, set up LVM on it
Make ext3 on top of the LVM (not the raw device the SAN presents)

Then if you need to grow the mount point in the future:

Have SAN do a block-level migration to grow the LUN
Use lvextend and ext2online to have the filesystem grow into the new blocks
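
The two grow steps can be wrapped in a small helper. This is only a sketch: the function name grow_lv, the DRY_RUN variable and the device path /dev/vg0/data are hypothetical, and it prints the commands by default rather than running them.

```shell
# Print (DRY_RUN=1, the default) or execute the steps that grow a mounted
# ext3 filesystem sitting on an LVM logical volume.
# Usage: grow_lv LV_DEVICE SIZE_INCREMENT   e.g. grow_lv /dev/vg0/data 1G
grow_lv() {
    lv=$1 add=$2
    run() {
        if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
    }
    run lvextend -L "+$add" "$lv" &&   # grow the logical volume
    run ext2online "$lv"               # grow the mounted filesystem into it
}

grow_lv /dev/vg0/data 1G
```

Setting DRY_RUN=0 runs the commands for real, which needs root and should be rehearsed on scratch data first.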

One thing about LVM that confuses some people: if I have a disk, why wouldn't I just make the filesystem as large as the disk from the start? The answer is that you might add more disk later using SAN tricks, and then you'll wish you had used LVM so that you could grow into it.

In order to practice this (since you don't just want to do this on production data) you can use a USB thumb drive.

Make two partitions on the USB thumb drive
Make an LVM on the first partition
Make a filesystem on the LVM and store some data
Add the second partition to the LVM
Grow the filesystem to use the second partition
The data should be unchanged but we'll have more capacity

I'll say that again but with details:

Make two partitions on the USB thumb drive

$ fdisk /dev/sda

If there are no partitions yet, 'p' prints an empty table:

Command (m for help): p

Disk /dev/sda: 65 MB, 65536000 bytes
3 heads, 42 sectors/track, 1015 cylinders
Units = cylinders of 126 * 512 = 64512 bytes

   Device Boot      Start         End      Blocks   Id  System

Create two partitions with 'n':

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1015, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-1015, default 1015): 507
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (508-1015, default 508): 
Using default value 508
Last cylinder or +size or +sizeM or +sizeK (508-1015, default 1015): 
Using default value 1015

Command (m for help): p

Disk /dev/sda: 65 MB, 65536000 bytes
3 heads, 42 sectors/track, 1015 cylinders
Units = cylinders of 126 * 512 = 64512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         507       31920   83  Linux
/dev/sda2             508        1015       32004   83  Linux

Command (m for help):

Note that both got the System ID "83 Linux". You'll want to change it to type "8e Linux LVM" (you'll see this listed via 'l'). Use 't' to change a partition's system id (you'll see this listed via 'm').

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): 8e
Changed system type of partition 1 to 8e (Linux LVM)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): 8e
Changed system type of partition 2 to 8e (Linux LVM)

Command (m for help): p

Disk /dev/sda: 65 MB, 65536000 bytes
3 heads, 42 sectors/track, 1015 cylinders
Units = cylinders of 126 * 512 = 64512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         507       31920   8e  Linux LVM
/dev/sda2             508        1015       32004   8e  Linux LVM

Command (m for help):

Type 'w' to write the partition table to disk. For the rest of this exercise we'll work on /dev/sda1, but near the end we'll include /dev/sda2 as if it had been added after a long time when more disk became available.

Make an LVM on the first partition

To use LVM, partitions or whole disks must first be converted into physical volumes (PVs) using the pvcreate command.

$ pvcreate /dev/sda1
  Physical volume "/dev/sda1" successfully created
$

Once you have one or more physical volumes created, you can create a volume group from these PVs using the vgcreate command.

$ vgcreate lvm_test /dev/sda1
  Volume group "lvm_test" successfully created
$

Note that you could have passed /dev/sda2 as a third argument, along with other partitions or physical devices as needed, to create the volume group from more physical volumes. However, we want to simulate growing a volume that already has data on it, so we'll reserve the following for later:

pvcreate /dev/sda2
vgextend lvm_test /dev/sda2

Remember, don't do the above yet. You can see what you have so far in the volume group with vgdisplay:

$ vgdisplay lvm_test
  --- Volume group ---
  VG Name               lvm_test
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               28.00 MB
  PE Size               4.00 MB
  Total PE              7
  Alloc PE / Size       0 / 0   
  Free  PE / Size       7 / 28.00 MB
  VG UUID               komlLr-rmIb-VUKH-z6Xs-Upl3-eRzY-J1tgSm
   
$

We'll use the Total PE above to create a new logical volume using all available space in lvm_test:

$ lvcreate -n lvm_test_vol0  -l 7 lvm_test
  Logical volume "lvm_test_vol0" created
$
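
Rather than copying the extent count by hand, it can be pulled out of the vgdisplay output. The helper below is a sketch (free_pe is a hypothetical name) that relies on the column layout shown in the vgdisplay listing above:

```shell
# Read `vgdisplay` output on stdin and print the number of free extents.
free_pe() {
    awk '$1 == "Free" && $2 == "PE" { print $5 }'
}

# Intended use (needs root and a real volume group):
#   lvcreate -n lvm_test_vol0 -l "$(vgdisplay lvm_test | free_pe)" lvm_test
free_pe <<'EOF'
  Total PE              7
  Alloc PE / Size       0 / 0
  Free  PE / Size       7 / 28.00 MB
EOF
```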

If we had more space we could create a second logical volume, but we've used it all. Verify this with vgdisplay and note that the Alloc PE and Free PE values have swapped. lvcreate creates a linear allocation by default. If we had multiple disks (not partitions) and wanted to stripe across them for better read performance, we would use "lvcreate -i2 -I4" to create two stripes with a stripe size of 4 KB. You should see a new device in /dev as a result of the lvcreate:

$ ls -l /dev/lvm_test/
total 0
lrwxrwxrwx  1 root root 34 Mar 20 09:19 lvm_test_vol0 
                           -> /dev/mapper/lvm_test-lvm_test_vol0
$

We will use this device (/dev/lvm_test/lvm_test_vol0) as if it were a raw disk. LVM is inserting a layer that will allow us to grow this device later without changing the data on the filesystem.

Make a filesystem on the LVM and store some data

Make an ext3 filesystem on the new logical device and then mount it:

$ /sbin/mkfs.ext3 /dev/lvm_test/lvm_test_vol0
$ mount -t ext3 /dev/lvm_test/lvm_test_vol0 /mnt/test

Verify your new mount point and put some data on it.

$ df -h | grep test
/dev/mapper/lvm_test-lvm_test_vol0
                       28M  1.4M   25M   6% /mnt/test
$ cd /mnt/test/
$ echo "please don't break me" > file.txt

Let's put some extra data on there until it's full (this is fair use; I own the CD):

$ cp ~tunes/malevolent_creation/Retribution/* .
cp: writing `./06 The Coldest Survive.mp3': No space left on device
cp: writing `./07 Monster.mp3': No space left on device
cp: writing `./08 Mindlock.mp3': No space left on device
cp: writing `./09 Iced.mp3': No space left on device
$ ll
total 24883
-rwx---r-x  1 root root 5230857 Mar 20 09:53 01 Eve of the Apocalypse.mp3
-rwx---r-x  1 root root 4172897 Mar 20 09:53 02 Systematic Execution.mp3
-rwx---r-x  1 root root 4493160 Mar 20 09:53 03 Slaughter of Innocence.mp3
-rwx---r-x  1 root root 6128950 Mar 20 09:53 04 Coronation of Our Domain.mp3
-rw-r--r--  1 root root 5324912 Mar 20 09:53 05 No Flesh Shall Be Spared.mp3
-rw-r--r--  1 root root      22 Mar 20 09:50 file.txt
drwx------  2 root root   12288 Mar 20 09:46 lost+found

So we couldn't fit the rest of the album. Let's try to fix that.

Add the second partition to the LVM

We're now ready to add /dev/sda2 to lvm_test:

$ pvcreate /dev/sda2
    Physical volume "/dev/sda2" successfully created
$ vgextend lvm_test /dev/sda2
    Volume group "lvm_test" successfully extended
$

This simulates adding more disk, provided that it presents as a physical device. If we had grown a LUN on our SAN we wouldn't need to do the above steps, since /dev/sda2 wouldn't be getting added. Instead it would be more like /dev/sda1 had grown behind the scenes; after running pvresize on it so that LVM notices the new size, we'd go on to the next step.

We'll verify that we have the extra space (you should do this for this exercise or if you've grown the LUN with your SAN):

$ vgdisplay lvm_test
  --- Volume group ---
  VG Name               lvm_test
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  3
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               56.00 MB
  PE Size               4.00 MB
  Total PE              14
  Alloc PE / Size       7 / 28.00 MB
  Free  PE / Size       7 / 28.00 MB
  VG UUID               komlLr-rmIb-VUKH-z6Xs-Upl3-eRzY-J1tgSm

$

Note that the VG Size and Total PE have doubled (the PE Size is still 4 MB) and that Free PE now shows our extra available space. To use this space, extend the logical volume:

$ lvextend -L+28M /dev/lvm_test/lvm_test_vol0
  Extending logical volume lvm_test_vol0 to 56.00 MB
  Logical volume lvm_test_vol0 successfully resized
$
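
The 28M passed to lvextend is just the free extent count times the extent size, and the extent count itself follows from the partition size in the fdisk listing. A quick sketch of that arithmetic (the function names are hypothetical, and the small LVM metadata area is ignored, so the first figure is an upper bound):

```shell
# Whole extents that fit in a partition of BLOCKS 1 KB blocks, PE_MB per extent.
extents_in() {
    echo $(( $1 / ($2 * 1024) ))
}

# Size in MB of a given number of extents.
extents_mb() {
    echo $(( $1 * $2 ))
}

extents_in 32004 4      # /dev/sda2 from the fdisk listing
extents_mb 7 4          # free extents x PE size, the figure passed to lvextend
```

This is also why a partition of roughly 31 MB shows up as only 28 MB in the volume group: space is handed out in whole 4 MB extents.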

You can see the change from the above with vgdisplay:

$ vgdisplay lvm_test | grep Alloc
  Alloc PE / Size       14 / 56.00 MB
$

Note that the mounted filesystem still doesn't see the extra space:

$ df -h | grep test
/dev/mapper/lvm_test-lvm_test_vol0
                       28M   26M   85K 100% /mnt/test
$

Grow the filesystem to use the second partition

So far we've added the extra disk to the LVM without unmounting the filesystem. Now we'll extend the filesystem to use the rest of the logical volume. We could first umount it and be on the safe side, according to the LVM howto. However, Red Hat claims it's possible to expand both the ext3 and GFS file systems online without bringing the system down.

For this exercise we'll extend the filesystem online. However if downtime is available it's probably better to take advantage of it and use ext2resize instead of ext2online.

Running this command produced interesting results:

$ ext2online /dev/lvm_test/lvm_test_vol0 
ext2online v1.1.18 - 2001/03/18 for EXT2FS 0.5b
ext2online: ext2_ioctl: No space left on device

ext2online: unable to resize /dev/mapper/lvm_test-lvm_test_vol0
$

However I have some extra space:

$ df -h | grep test
/dev/mapper/lvm_test-lvm_test_vol0
                       39M   26M   12M  70% /mnt/test
$

It's probably related to the filesystem being full. How full?

Unmounting and remounting produces the same 39M volume and ext2resize isn't available. ext2online complains if the drive isn't mounted. Let's try freeing up some space (metal names make howtos funny):

$ rm 05\ No\ Flesh\ Shall\ Be\ Spared.mp3
$ df -h | grep test
/dev/mapper/lvm_test-lvm_test_vol0
                       39M   21M   17M  56% /mnt/test
$

Same result. Let's see what it looks like:

$ vgdisplay lvm_test
  --- Volume group ---
  VG Name               lvm_test
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               56.00 MB
  PE Size               4.00 MB
  Total PE              14
  Alloc PE / Size       14 / 56.00 MB
  Free  PE / Size       0 / 0   
  VG UUID               komlLr-rmIb-VUKH-z6Xs-Upl3-eRzY-J1tgSm
   
$

So all available space is used. We've gone from:

  Alloc PE / Size       7 / 28.00 MB

to:

  Alloc PE / Size       14 / 56.00 MB

So we have extra capacity, but it's not as much as I would have expected.
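
To put a number on the shortfall, compare the size df reports with the size allocated to the logical volume. A tiny sketch (fs_coverage is a hypothetical helper; both sizes must be in the same unit):

```shell
# Percentage of the logical volume that the filesystem currently covers.
# Integer arithmetic, so the result is rounded down.
fs_coverage() {
    echo $(( $1 * 100 / $2 ))
}

fs_coverage 39 56    # df size vs. Alloc size, both in MB, from the output above
```

Some gap is expected, since ext3 keeps space for its journal and inode tables, but a filesystem that covers only about 70% of its volume suggests the resize stopped early.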

The data should be unchanged but we'll have more capacity

Yes:

$ diff 01\ Eve\ of\ the\ Apocalypse.mp3 ~tunes/malevolent_creation/Retribution/01\ Eve\ of\ the\ Apocalypse.mp3
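
Checking one file at a time gets old quickly. Here is a sketch that compares every file on the grown filesystem against its original, assuming the originals are still around; verify_copies is a hypothetical name, and files with no counterpart in the source directory (such as file.txt) are skipped.

```shell
# Compare every regular file in DST with the same-named file in SRC.
# Prints the names that differ; returns non-zero if any do.
verify_copies() {
    src=$1 dst=$2 status=0
    for f in "$dst"/*; do
        [ -f "$f" ] || continue             # skip lost+found and the like
        name=${f##*/}
        [ -f "$src/$name" ] || continue     # nothing to compare against
        if ! cmp -s "$f" "$src/$name"; then
            echo "MISMATCH: $name"
            status=1
        fi
    done
    return $status
}

# Example, using the paths from this exercise:
#   verify_copies ~tunes/malevolent_creation/Retribution /mnt/test
```

No output and a zero exit status mean every file that could be compared is byte-for-byte identical.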

The only issue is that ext2online is a little unpredictable. So far I've seen it grow the filesystem without corrupting data, but it hasn't grown the filesystem as much as I'd expect. If I find out more I'll share it.

Update: I was able to follow a similar process online as per randombugs.com and it worked well.

Monday, March 17, 2008

FSF Weekend

I attended the FSF Meeting. I had a good time as I often do. Ben Klemens was my favorite speaker. I was sorry to not see Eben Moglen or Gerald Sussman.

It was interesting that some people seemingly new to the organization had misread the philosophy and were driving themselves mad with it. E.g. one person asked if it was unethical for him to use packets that went through routers running non-free software. The consensus was that he'd drive himself crazy worrying about that sort of thing and boycotting businesses that choose to use non-free software is silly. It's the service provider's choice and the non-freedom of their equipment is their problem. They are the victims of proprietary software, not the users. Furthermore if you're at your friend's house and want to borrow his computer to look something up and you use proprietary software you can still be a member of the Church of Emacs. Lighten up. There are better things you can do to help the FSF than seek purity like some sort of ascetic while annoying your friends.

I ran into a friend who shared some cool things that he's into: the Kinesis Advantage Pro and javascript. Everyone knows about javascript but after talking with him about what he's up to with it I realize that I was too dismissive of it in the past. I've known how handy it is when combined with web services (buzzword intentionally omitted) but I didn't know it was such a nice language in itself. It was written by a Lisper and supports higher order functions, lexical closures and other lisp similarities. He showed me some code which was dispatching event handlers within closures and I'm inspired. I'm going to learn more javascript. He said there weren't any great javascript books, especially for a lispy style, but that O'Reilly's book is decent.

I also ran into another friend who mentioned that ruby is not context-free and that its developers were unable to add a new syntax because they didn't understand the consequences of this. He said they even reported a bug to bison.

Tuesday, March 11, 2008

spf and mail loops

It's a great day for slightly unusual mail questions:

1. SPF
My employer has an SPF record in DNS. A third-party marketer working with a department has been setting the from address to my users' addresses. Some mail servers are actually using our SPF record and bouncing the mail back to the falsified from address. The third-party vendor is now asking me to add their IP to our SPF record. If their goal is for recipients to reply to my users, then they should use the reply-to field. I say "no".
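
For illustration, here is roughly what a receiving server is checking. The record and addresses below are made up (they come from the documentation ranges), spf_has_ip4 is a hypothetical helper, and it only looks for a literal ip4: entry; real SPF evaluation also expands include:, a, mx and CIDR ranges.

```shell
# Succeed if the SPF record text lists IP as a literal ip4: mechanism.
spf_has_ip4() {
    record=$1 ip=$2
    case " $record " in
        *" ip4:$ip "*) return 0 ;;
    esac
    return 1
}

# Live lookup of a real record (needs dig): dig +short TXT domain.tld
record='v=spf1 ip4:192.0.2.10 ip4:192.0.2.11 -all'
spf_has_ip4 "$record" 192.0.2.10 && echo "listed"
spf_has_ip4 "$record" 198.51.100.7 || echo "not listed: receivers honoring -all may reject"
```

The vendor's servers are in the position of the second address, which is why their mail bounces.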

2. Mail Loops
My mailer-daemon sent a user a mail loop: too many hops (too many 'Received:' header fields) failure notice. I sent explanations ranging from friendly to medium to rigorous.
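
To see how close a message came to the limit, count its Received: header fields. This is a small sketch (count_hops is a hypothetical name, and the sample message is invented); it stops at the first blank line so that quoted headers in the body are not counted, and it assumes the usual capitalization of the field name.

```shell
# Count Received: header fields in a mail message read from stdin.
count_hops() {
    awk '/^$/ { exit } /^Received:/ { n++ } END { print n + 0 }'
}

count_hops <<'EOF'
Received: from a.domain.tld by b.domain.tld
Received: from user.domain.tld by a.domain.tld
Subject: test

Received: this line is in the body and is ignored
EOF
```

sendmail compares this count against its MaxHopCount option, which I believe defaults to 25.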

Monday, March 3, 2008

NetScaler

A friend of mine was saying good things about NetScaler (acquired by Citrix). It's a load balancing layer-7 content switch. Their software is mainly a FreeBSD kernel module called the Core Packet Processing Engine which has been tuned to minimize unresponsiveness due to I/O interrupts. He suggested that I might want to buy three (@~$50k each) -- two in my primary data center and one in my backup -- and use them to load balance all my critical services. Apparently it can determine which servers are underutilized or not available, look at user packets and then intelligently reroute them to the optimal server. This made me uncomfortable at first since in the past I've wanted to keep the intelligence in my application, especially for webapps that write to a database. How could I trust a switch to change the server in between the steps of a transaction? There are white papers on how this works and the Optimizing Web Applications paper has a six page explanation. My friend mentioned that I can customize it to load balance for my application. E.g. I could create specific rules by using a Perl API to make regexes it will follow when processing packets. People are really doing this kind of thing, including Google.
