Friday, March 21, 2008

How to make a growable LUN

We have a SAN in our department. There have been times when some of us have needed to make certain LUNs larger and the process has been non-optimal: arrange for downtime, make a new LUN and finally move the data there and remount it.

If you have a feeling that other organizations with SANs don't do this, you're probably right. The SAN can expand a LUN without any downtime, so why shouldn't the filesystem be able to follow? The catch is that after the disk has grown behind the scenes, the filesystem still has to be expanded to use the new space, again without downtime. The solution is to put ext3 on top of LVM rather than directly on the raw disk:

  • Create the LUN with the SAN's interface
  • When the LUN is presented via iSCSI or an HBA, set up LVM on it
  • Make ext3 on top of the LVM (not the raw device the SAN presents)
Then if you need to grow the mount point in the future:
  • Have the SAN do a block-level migration to grow the LUN
  • Use lvextend and ext2online to have the filesystem grow into the new blocks

One thing about LVM that confuses some people: if I have a disk, why wouldn't I just make the filesystem the maximum size from the start? The answer is that you might add more disk later using SAN tricks, and then you'll wish you had used LVM so that you could grow into it.

In order to practice this (since you don't just want to do this on production data) you can use a USB thumb drive.

  1. Make two partitions on the USB thumb drive
  2. Make an LVM on the first partition
  3. Make a filesystem on the LVM and store some data
  4. Add the second partition to the LVM
  5. Grow the filesystem to use the second partition
  6. The data should be unchanged but we'll have more capacity
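If no spare thumb drive is handy, the same rehearsal can probably be done with two image files attached as loop devices, which also makes step 1 unnecessary. This is my own variation, not part of the procedure described here: the names pv1.img and pv2.img are made up, the files are created without root, and the root-only losetup commands are printed rather than executed.

```shell
#!/bin/sh
# Create two 32 MB sparse files to stand in for the two USB partitions.
# pv1.img and pv2.img are hypothetical names; any path will do.
workdir=$(mktemp -d)
for n in 1 2; do
    truncate -s 32M "$workdir/pv$n.img"
done
ls -l "$workdir"

# Attaching the files needs root, so these lines are printed, not run.
# losetup reports the loop device it picked (for example /dev/loop0);
# use that device wherever the walkthrough says /dev/sda1 or /dev/sda2.
for n in 1 2; do
    echo "losetup --find --show $workdir/pv$n.img"
done
```

When you are done, `losetup -d` on each loop device detaches it and the image files can simply be deleted.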
I'll say that again but with details:

Make two partitions on the USB thumb drive

$ fdisk /dev/sda
If there are no partitions:
Command (m for help): p

Disk /dev/sda: 65 MB, 65536000 bytes
3 heads, 42 sectors/track, 1015 cylinders
Units = cylinders of 126 * 512 = 64512 bytes

   Device Boot      Start         End      Blocks   Id  System
Create two partitions with 'n':
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1015, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-1015, default 1015): 507
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (508-1015, default 508): 
Using default value 508
Last cylinder or +size or +sizeM or +sizeK (508-1015, default 1015): 
Using default value 1015

Command (m for help): p

Disk /dev/sda: 65 MB, 65536000 bytes
3 heads, 42 sectors/track, 1015 cylinders
Units = cylinders of 126 * 512 = 64512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         507       31920   83  Linux
/dev/sda2             508        1015       32004   83  Linux

Command (m for help): 
Note that both got the system ID "83 Linux". You'll want to change them to type "8e Linux LVM" (you'll see this listed via 'l'). Use 't' to change a partition's system ID (you'll see this command listed via 'm').
Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): 8e
Changed system type of partition 1 to 8e (Linux LVM)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): 8e
Changed system type of partition 2 to 8e (Linux LVM)

Command (m for help): p

Disk /dev/sda: 65 MB, 65536000 bytes
3 heads, 42 sectors/track, 1015 cylinders
Units = cylinders of 126 * 512 = 64512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         507       31920   8e  Linux LVM
/dev/sda2             508        1015       32004   8e  Linux LVM

Command (m for help):
Type 'w' to write the partition table to disk. For the rest of this exercise we'll work on /dev/sda1, but near the end we'll include /dev/sda2 as if it had been added after a long time, when more disk became available.
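
The Blocks column in the fdisk output can be checked by hand. fdisk reports 1 KB blocks, and each cylinder on this drive is 126 sectors of 512 bytes. The sketch below assumes that fdisk left the first track (42 sectors) in front of the first partition for the partition table; that is my reading of the numbers, not something the output states.

```shell
#!/bin/sh
# blocks START_CYL END_CYL SKIPPED_SECTORS
# Prints the partition size in 1 KB blocks for this drive's geometry
# (3 heads * 42 sectors/track = 126 sectors per cylinder).
blocks() {
    sectors=$(( ($2 - $1 + 1) * 126 - $3 ))
    echo $(( sectors / 2 ))   # two 512-byte sectors per 1 KB block
}

blocks 1 507 42     # /dev/sda1, first track skipped (assumption)
blocks 508 1015 0   # /dev/sda2
```

Both results agree with the Blocks values fdisk printed for the two partitions.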

Make an LVM on the first partition

To use LVM, partitions or whole disks must first be converted into physical volumes (PVs) using the pvcreate command.

$ pvcreate /dev/sda1
  Physical volume "/dev/sda1" successfully created
$
Once you have one or more physical volumes created, you can create a volume group from these PVs using the vgcreate command.
$ vgcreate lvm_test /dev/sda1
  Volume group "lvm_test" successfully created
$ 
Note that you could have added /dev/sda2 as a third argument, and other partitions or physical devices as needed, to create the volume group with more physical volumes. However, we want to simulate growing a volume that already has data on it, so we'll reserve the following for later:
pvcreate /dev/sda2
vgextend lvm_test /dev/sda2
Remember, don't do the above yet. You can see what you have so far in the volume group with vgdisplay:
$ vgdisplay lvm_test
  --- Volume group ---
  VG Name               lvm_test
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               28.00 MB
  PE Size               4.00 MB
  Total PE              7
  Alloc PE / Size       0 / 0   
  Free  PE / Size       7 / 28.00 MB
  VG UUID               komlLr-rmIb-VUKH-z6Xs-Upl3-eRzY-J1tgSm
   
$
We'll use the Total PE above to create a new logical volume using all available space in lvm_test:
$ lvcreate -n lvm_test_vol0  -l 7 lvm_test
  Logical volume "lvm_test_vol0" created
$
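The 28 MB volume group is smaller than the 31920 KB partition because LVM hands out space only in whole physical extents. A rough check, assuming the LVM metadata at the start of the physical volume fits inside the leftover partial extent (the exact overhead is not shown in the vgdisplay output):

```shell
#!/bin/sh
# whole_extents SIZE_KB PE_KB
# Number of complete physical extents that fit in a device of SIZE_KB.
# Integer division drops the partial extent at the end; the assumption
# is that the LVM metadata fits inside that leftover space.
whole_extents() {
    echo $(( $1 / $2 ))
}

pe=$(whole_extents 31920 4096)   # /dev/sda1 with 4 MB extents
echo "Total PE: $pe"
echo "VG Size:  $(( pe * 4 )) MB"
```

The same arithmetic for the 32004 KB second partition also gives 7 extents, which is why adding it later doubles the volume group.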
If we had more space we could create a second logical volume, but we've used it all. Verify this with vgdisplay and note that the Alloc PE and Free PE values have swapped. lvcreate creates a linear allocation by default. If we had multiple disks (not partitions) and wanted to stripe across them for better performance, we would use "lvcreate -i2 -I4" to create two stripes with a stripe size of 4 KB. You should see a new device in /dev as a result of the lvcreate:
$ ls -l /dev/lvm_test/
total 0
lrwxrwxrwx  1 root root 34 Mar 20 09:19 lvm_test_vol0 
                           -> /dev/mapper/lvm_test-lvm_test_vol0
$
We will use this device (/dev/lvm_test/lvm_test_vol0) as if it were a raw disk. LVM is inserting a layer that will allow us to grow this device later without changing the data on the filesystem.

Make a filesystem on the LVM and store some data

Make an ext3 filesystem on the new logical device and then mount it:

$ /sbin/mkfs.ext3 /dev/lvm_test/lvm_test_vol0
$ mount -t ext3 /dev/lvm_test/lvm_test_vol0 /mnt/test
Verify your new mount point and put some data on it.
$ df -h | grep test
/dev/mapper/lvm_test-lvm_test_vol0
                       28M  1.4M   25M   6% /mnt/test
$ cd /mnt/test/
$ echo "please don't break me" > file.txt
Let's put some extra data on there until it's full (this is fair use; I own the CD):
$ cp ~tunes/malevolent_creation/Retribution/* .
cp: writing `./06 The Coldest Survive.mp3': No space left on device
cp: writing `./07 Monster.mp3': No space left on device
cp: writing `./08 Mindlock.mp3': No space left on device
cp: writing `./09 Iced.mp3': No space left on device
$ ll
total 24883
-rwx---r-x  1 root root 5230857 Mar 20 09:53 01 Eve of the Apocalypse.mp3
-rwx---r-x  1 root root 4172897 Mar 20 09:53 02 Systematic Execution.mp3
-rwx---r-x  1 root root 4493160 Mar 20 09:53 03 Slaughter of Innocence.mp3
-rwx---r-x  1 root root 6128950 Mar 20 09:53 04 Coronation of Our Domain.mp3
-rw-r--r--  1 root root 5324912 Mar 20 09:53 05 No Flesh Shall Be Spared.mp3
-rw-r--r--  1 root root      22 Mar 20 09:50 file.txt
drwx------  2 root root   12288 Mar 20 09:46 lost+found
So we couldn't fit the rest of the album. Let's try to fix that.

Add the second partition to the LVM

We're now ready to add /dev/sda2 to lvm_test:

$ pvcreate /dev/sda2
    Physical volume "/dev/sda2" successfully created
$ vgextend lvm_test /dev/sda2
    Volume group "lvm_test" successfully extended
$
This simulates adding more disk, provided that it presents a physical device. If we had grown a LUN on our SAN we wouldn't need to do the above steps, since /dev/sda2 wouldn't be getting added. Instead it would be more like /dev/sda1 had grown behind the scenes; once LVM has been told about the new size (pvresize does this) we'd go on to the next step.
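
For the SAN case, LVM does not pick up a larger device on its own: the kernel has to re-read the device size, and the physical volume has to be resized before the volume group shows free extents. The sketch below only prints the commands (a dry run). It assumes the PV sits directly on the LUN with no partition table and that the kernel exposes the SCSI rescan file in sysfs; /dev/sdX is a placeholder, not a device from this exercise, and an iSCSI setup may need its own rescan step as well.

```shell
#!/bin/sh
# san_grow_plan DEVICE
# Prints the commands to run as root after the SAN has grown the LUN.
san_grow_plan() {
    dev=$1
    name=${dev##*/}    # /dev/sdX -> sdX
    # Ask the SCSI layer to re-read the size of the device.
    echo "echo 1 > /sys/block/$name/device/rescan"
    # Grow the physical volume to match the new device size.
    echo "pvresize $dev"
    # The new extents should now show up as Free PE.
    echo "vgdisplay"
}

san_grow_plan /dev/sdX
```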

We'll verify that we have the extra space (you should do this for this exercise or if you've grown the LUN with your SAN):

$ vgdisplay lvm_test
  --- Volume group ---
  VG Name               lvm_test
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  3
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               56.00 MB
  PE Size               4.00 MB
  Total PE              14
  Alloc PE / Size       7 / 28.00 MB
  Free  PE / Size       7 / 28.00 MB
  VG UUID               komlLr-rmIb-VUKH-z6Xs-Upl3-eRzY-J1tgSm

$
Note that the VG Size and Total PE have doubled (PE Size is still 4 MB) and that Free PE now shows our extra available space. To use this space:
$ lvextend -L+28M /dev/lvm_test/lvm_test_vol0
  Extending logical volume lvm_test_vol0 to 56.00 MB
  Logical volume lvm_test_vol0 successfully resized
$ 
You can see the change from the above with vgdisplay:
$ vgdisplay lvm_test | grep Alloc
  Alloc PE / Size       14 / 56.00 MB
$ 
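Instead of converting extents to megabytes by hand, the free extent count can be read out of vgdisplay and passed to lvextend with -l, which takes extents rather than a size. A small sketch; free_pe is a helper name I made up, and it is fed sample lines copied from the vgdisplay output so it can be tried without a volume group:

```shell
#!/bin/sh
# free_pe: read vgdisplay output on stdin, print the number of free extents.
# The line looks like "  Free  PE / Size       7 / 28.00 MB", so the count
# is the fifth whitespace-separated field.
free_pe() {
    awk '$1 == "Free" && $2 == "PE" { print $5 }'
}

sample='  Alloc PE / Size       7 / 28.00 MB
  Free  PE / Size       7 / 28.00 MB'

n=$(printf '%s\n' "$sample" | free_pe)
# Print the command rather than run it; on a real system the input
# would come from: vgdisplay lvm_test
echo "lvextend -l +$n /dev/lvm_test/lvm_test_vol0"
```

LVM2 also accepts `-l +100%FREE`, which asks for all remaining extents without any parsing.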
Note that the mounted filesystem still doesn't see the extra space:
$ df -h | grep test
/dev/mapper/lvm_test-lvm_test_vol0
                       28M   26M   85K 100% /mnt/test
$ 

Grow the filesystem to use the second partition

So far we've added the extra disk to the LVM volume group without unmounting the filesystem. Now we'll extend the filesystem to use the rest of the logical volume. We could first umount it to be on the safe side, as the LVM howto suggests. However, Red Hat claims it's possible to expand both ext3 and GFS filesystems online without bringing the system down.

For this exercise we'll extend the filesystem online. However if downtime is available it's probably better to take advantage of it and use ext2resize instead of ext2online.

Running ext2online produced interesting results:

$ ext2online /dev/lvm_test/lvm_test_vol0 
ext2online v1.1.18 - 2001/03/18 for EXT2FS 0.5b
ext2online: ext2_ioctl: No space left on device

ext2online: unable to resize /dev/mapper/lvm_test-lvm_test_vol0
$ 
However, I do have some extra space:
$ df -h | grep test
/dev/mapper/lvm_test-lvm_test_vol0
                       39M   26M   12M  70% /mnt/test
$ 
It's probably related to the filesystem being full. How full?

Unmounting and remounting produces the same 39M volume, and ext2resize isn't available. ext2online complains if the filesystem isn't mounted. Let's try freeing up some space (metal names make howtos funny):

$ rm 05\ No\ Flesh\ Shall\ Be\ Spared.mp3
$ df -h | grep test
/dev/mapper/lvm_test-lvm_test_vol0
                       39M   21M   17M  56% /mnt/test
$
Same result. Let's see what it looks like:
$ vgdisplay lvm_test
  --- Volume group ---
  VG Name               lvm_test
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               56.00 MB
  PE Size               4.00 MB
  Total PE              14
  Alloc PE / Size       14 / 56.00 MB
  Free  PE / Size       0 / 0   
  VG UUID               komlLr-rmIb-VUKH-z6Xs-Upl3-eRzY-J1tgSm
   
$ 
So all available space is used. We've gone from:
  Alloc PE / Size       7 / 28.00 MB
to:
  Alloc PE / Size       14 / 56.00 MB
So we have extra capacity, but it's not as much as I would have expected.

The data should be unchanged but we'll have more capacity

Yes:
$ diff 01\ Eve\ of\ the\ Apocalypse.mp3 ~tunes/malevolent_creation/Retribution/01\ Eve\ of\ the\ Apocalypse.mp3 
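diff prints nothing when two files match, so no output is the good result here. To check every file on the volume rather than one, a loop with cmp does the job. A sketch, with same_files as a made-up helper name:

```shell
#!/bin/sh
# same_files SRC_DIR DST_DIR
# For every regular file in DST_DIR, compare it byte for byte with the
# file of the same name in SRC_DIR. A file that is missing from SRC_DIR
# is reported too. Returns non-zero if anything differs.
same_files() {
    status=0
    for f in "$2"/*; do
        [ -f "$f" ] || continue          # skip lost+found and other dirs
        if ! cmp -s "$1/${f##*/}" "$f"; then
            echo "DIFFERS: ${f##*/}"
            status=1
        fi
    done
    return $status
}

# Example: same_files ~tunes/malevolent_creation/Retribution /mnt/test
```

In this exercise file.txt would be flagged, since it exists only on the test volume and not in the album directory.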
The only issue is that ext2online is a little unpredictable. So far I've seen it grow the filesystem without corrupting the data, but it hasn't grown the filesystem as much as I'd expect. If I find out more I'll share it.

Update: I was able to follow a similar process online as per randombugs.com and it worked well.
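
If downtime is available, the offline route avoids the online resize entirely. The sketch below prints what I understand to be the usual sequence for an ext3 filesystem on a logical volume. It is a dry run, and resize2fs stands in for ext2resize, which was not installed on this machine. With no size argument, resize2fs grows the filesystem to fill the device.

```shell
#!/bin/sh
# offline_grow_plan DEVICE MOUNTPOINT
# Prints the commands for growing an unmounted ext3 filesystem.
offline_grow_plan() {
    echo "umount $2"
    echo "e2fsck -f $1"     # resize2fs asks for a forced check first
    echo "resize2fs $1"     # no size given: grow to the size of the device
    echo "mount -t ext3 $1 $2"
}

offline_grow_plan /dev/lvm_test/lvm_test_vol0 /mnt/test
```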

1 comment:

Unknown said...

Hey, I know why ext2online did not work correctly. It's due to a bug in Linux (I forgot the bug number). If you unmount and try resize2fs it will work.

The online resize fails because the journal is too small.

# dmesg | tail -n 1
JBD: resize2fs wants too many credits (292 > 256)

Linux has a patch for this; they say it will be fixed in upcoming releases.

Thanks
Bipul