Monday, February 11, 2008

ext3++

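The default limit on subdirectories in a single directory on an ext2/ext3 filesystem is 32,000. The cap comes from the maximum link count (EXT2_LINK_MAX / EXT3_LINK_MAX), since every subdirectory's ".." entry adds a link to its parent. It can be raised to a maximum of 65,500 by changing the ext2/ext3 source code and building a new kernel.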

Preparing to build a RHEL4 Kernel

There are plenty of howtos [0] on building a RHEL4 kernel. My notes on this are:

Get the packages you'll need to build the kernel:

up2date -i kernel-devel redhat-rpm-config ncurses-devel rpm-build
Get the correct source for your current kernel. This only downloads a source RPM, which you then install manually:
up2date --get-source kernel 
rpm -ivh /var/spool/up2date/kernel-2.6.9-67.0.4.EL.src.rpm
Use rpmbuild to extract and prepare the kernel sources. The prepare phase of rpmbuild extracts the source from the archive and applies the Red Hat patches:
cd /usr/src/redhat/SPECS/
rpmbuild -bp --target `uname -m` kernel-2.6.spec
Go to the kernel source:
cd /usr/src/redhat/BUILD/kernel-2.6.9/linux-2.6.9/
Make custom configuration changes:
 make menuconfig
Note that I added support for ReiserFS, JFS, XFS and NTFS (for further experimentation).

Modifying the Limit in the Kernel Source

From /usr/src/redhat/BUILD/kernel-2.6.9/linux-2.6.9/ go to the ext2 and ext3 header files where the 32,000 hard limit is defined:
cd include/linux/
Observe what has to be changed:
$ grep 32000 ext2_fs.h ext3_fs.h 
ext2_fs.h:#define EXT2_LINK_MAX         32000
ext3_fs.h:#define EXT3_LINK_MAX         32000
Patch your kernel by modifying the files above such that 32000 is replaced by 65500. I'm told that greater values will not work.
sed -i s/32000/65500/g ext2_fs.h ext3_fs.h
Check that sed did the trick:
$ grep 65500 ext2_fs.h ext3_fs.h 
ext2_fs.h:#define EXT2_LINK_MAX         65500
ext3_fs.h:#define EXT3_LINK_MAX         65500
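The global substitution above rewrites every occurrence of 32000 in the two headers. The grep output shows only the two define lines here, so it is safe in this tree, but a more targeted variant might look like the sketch below. It only touches the EXT2_LINK_MAX / EXT3_LINK_MAX define lines. The helper name bump_link_max is made up for illustration, and the sketch assumes GNU sed for the -i option.

```shell
#!/bin/sh
# Sketch: rewrite only the *_LINK_MAX define line, leaving any other
# occurrence of the old number in the header untouched.
# bump_link_max is a hypothetical helper, not part of any kernel tooling.
bump_link_max() {
    # $1 = header file, $2 = new limit
    sed -i "s/^\(#define[[:space:]][[:space:]]*EXT[23]_LINK_MAX[[:space:]][[:space:]]*\)[0-9][0-9]*/\1$2/" "$1"
}

# Usage, from include/linux/ in the prepared kernel tree:
#   bump_link_max ext2_fs.h 65500
#   bump_link_max ext3_fs.h 65500
```

The grep check that follows in the post works unchanged after either form of the edit.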

Build Your New Kernel

I originally tried building an RPM for my new kernel. I ran:
rpmbuild --target=i686 -ba /usr/src/redhat/SPECS/kernel-2.6.spec
and after waiting 5 hours for five types of kernel RPMs (including i686, smp, hugemem and xen) to build on a 1 GHz system, I booted the new i686 kernel and found that my patch above had been undone: grep returned 32000, not 65500, and the new kernel still could not create more than 32,000 subdirectories. The likely cause is that -ba reruns the prepare phase, which re-extracts the pristine source and reapplies the Red Hat patches over the edited tree. So I built my kernel with make instead.

Go to the original source directory:

cd /usr/src/redhat/BUILD/kernel-2.6.9/linux-2.6.9/
Start building:
make
make modules_install
make install
The new kernel, initrd, and System.map are all copied into /boot/, and the new kernel should appear in GRUB as "2.6.9-prep" in the list of kernels to boot. The end of the build output looks like this:
  CHK     include/linux/version.h
make[1]: `arch/i386/kernel/asm-offsets.s' is up to date.
  CHK     include/linux/compile.h
Kernel: arch/i386/boot/bzImage is ready
sh /usr/src/redhat/BUILD/kernel-2.6.9/linux-2.6.9/arch/i386/boot/install.sh 2.6.9-prep arch/i386/boot/bzImage System.map ""
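In /etc/grub.conf, the new kernel was added as entry 0, which moved the existing /vmlinuz-2.6.9-67.0.1.EL entry from 0 to 1. However, the default was set to 1, so the old kernel remained the default; I changed the default to 0 manually. After confirming that the header files still contained 65500, I rebooted.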
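Because the RPM build silently reverted the header edit, it may be worth scripting the check that the tree still carries the new value before building or rebooting. The following is a minimal sketch. The helper name link_max_is is hypothetical, and it assumes the define line has no trailing comment, as in the grep output shown earlier.

```shell
#!/bin/sh
# link_max_is is a hypothetical helper: succeed only if the header defines
# an EXT2/EXT3 LINK_MAX macro with exactly the expected value.
link_max_is() {
    # $1 = header file, $2 = expected value
    grep -q "^#define[[:space:]][[:space:]]*EXT[23]_LINK_MAX[[:space:]][[:space:]]*$2[[:space:]]*\$" "$1"
}

# Usage, from the top of the kernel tree:
#   link_max_is include/linux/ext2_fs.h 65500 || echo "ext2 patch missing" >&2
#   link_max_is include/linux/ext3_fs.h 65500 || echo "ext3 patch missing" >&2
```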

Test the Extended Capabilities

When the system comes back online check that you have your kernel:
 
$ uname -r
2.6.9-prep
Make an ext3 file system on some device. In this case I'm testing on a 64 MB USB thumb drive. Since the number of inodes you can have is a function of the size of your disk, we need to pay attention to how many bytes of disk there are per inode. So I'm passing mkfs options that minimize the bytes:inode ratio and make the blocks as small as possible.
$ /sbin/mkfs.ext3 -b 1024 -i 1024 /dev/sda1
mke2fs 1.35 (28-Feb-2004)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
63744 inodes, 63724 blocks
3186 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=65273856
8 block groups
8192 blocks per group, 8192 fragments per group
7968 inodes per group
Superblock backups stored on blocks: 
        8193, 24577, 40961, 57345
I'm using the smallest possible block size for ext3 (must be 1024, 2048 or 4096). Note that with XFS, the block size can theoretically be any power-of-two multiple of 512 bytes up to 64KB.

I'm also passing "-i bytes-per-inode". As per the man page:

Specify the bytes:inode ratio. mke2fs creates an inode for
every bytes-per-inode bytes of space on the disk. The larger
the bytes-per-inode ratio, the fewer inodes will be created.
This value generally shouldn't be smaller than the blocksize of
the filesystem, since then too many inodes will be made. Be
warned that is not possible to expand the number of inodes on a
filesystem after it is created, so be careful deciding the
correct value for this parameter.
Since the ratio shouldn't be smaller than the blocksize, I'm setting it to the lowest recommended value: equal to the blocksize. Note that there is also a -N number-of-inodes option which, as per the man page:
       
overrides the default calculation of the number of inodes that
should be reserved for the filesystem (which is based on the
number of blocks and the bytes-per-inode ratio). This allows
the user to specify the number of desired inodes directly.
However, I'm going to let mkfs compute the inode number based on the best blocksize and bytes:inode ratio for what I want to do.
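As a rough sanity check, the inode count mke2fs printed can be reproduced with a little arithmetic. The sketch below is an approximation under two assumptions that are mine rather than anything stated in the man page excerpt: inodes are 128 bytes, and each block group's share is rounded up to fill a whole inode-table block. I picked them because they reproduce the 7968 inodes per group and 63744 total shown in the output above. The helper name estimate_inodes is hypothetical.

```shell
#!/bin/sh
# Sketch of the inode estimate. estimate_inodes is a hypothetical helper.
# Assumptions: 128-byte inodes, and per-group rounding up to a whole
# inode-table block; both chosen to match the mke2fs output in the post.
estimate_inodes() {
    blocks=$1; block_size=$2; bytes_per_inode=$3; groups=$4
    inode_size=128
    per_block=$((block_size / inode_size))
    # One inode per bytes_per_inode bytes of disk.
    total=$((blocks * block_size / bytes_per_inode))
    # Split across block groups, rounding up.
    per_group=$(( (total + groups - 1) / groups ))
    # Round each group's share up to a multiple of inodes per block.
    per_group=$(( (per_group + per_block - 1) / per_block * per_block ))
    echo $((per_group * groups))
}

# 63724 blocks of 1024 bytes, -i 1024, 8 block groups:
estimate_inodes 63724 1024 1024 8   # prints 63744
```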

Mount your new file system:

mount -t ext3 /dev/sda1 /mnt/usb/
and see how many inodes you have available:
$ df -i /mnt/usb/
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1              63744      11   63733    1% /mnt/usb
Note the important difference in the number of available inodes on this small device given the mkfs options. If I had used a 4096 block size and the standard bytes:inode ratio I would have had the following, and I couldn't even have carried out a meaningful test:
$ df -i /mnt/usb/
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1              15936    8804    7132   56% /mnt/usb
Finally, use a script to see how many subdirectories you can fit:
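#!/usr/bin/perl
use strict;
use warnings;

# Try to create one subdirectory per available inode under test/.
my $num_dirs = 63743;
mkdir "test" or die "mkdir test: $!";
for (my $i = 0; $i < $num_dirs; $i++) {
  # Stop at the first failure (link limit, inodes, or disk space).
  mkdir "test/$i" or die "mkdir test/$i: $!";
  print "$i\n";
}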
I was able to break the 32,000 limit:
$ df -i /mnt/usb/
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1              63744   49584   14160   78% /mnt/usb
and create 49,570 subdirectories on an ext3 USB thumb drive. Note that my script failed before it finished because I used up all my disk space:
$ df -h /mnt/usb/
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              55M   55M     0 100% /mnt/usb
But I wasn't limited by inodes.
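To confirm the result directly rather than inferring it from df, the subdirectories can be counted and compared with the parent's link count. This is a minimal sketch: count_subdirs is a hypothetical helper, the /mnt/usb/test path is the one used in this post, and stat -c assumes GNU coreutils. On ext2/ext3 the link count of a directory should be its number of subdirectories plus 2, which is why the LINK_MAX constants cap the subdirectory count.

```shell
#!/bin/sh
# count_subdirs is a hypothetical helper: count the immediate
# subdirectories of the directory given as $1.
count_subdirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Usage on the mounted test filesystem:
#   count_subdirs /mnt/usb/test
#   stat -c %h /mnt/usb/test    # link count; on ext2/ext3 expect subdirs + 2
```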

Footnotes:

[0]

Howtos on Building a RHEL4 kernel:
http://kbase.redhat.com/faq/FAQ_85_8254.shtm
http://voidmain.is-a-geek.net/redhat/fedora_3_kernel_build.html
http://www.jukie.net/~bart/blog/20060410102824
http://lists.us.dell.com/pipermail/linux-poweredge/2005-April/020134.html
http://www-theorie.physik.unizh.ch/%7Edpotter/howto/modules
