*nix hacks: January 2008

Wednesday, January 30, 2008

standard tightening

I often run the following commands when I set up a new server:

su -
/usr/sbin/visudo
/usr/sbin/usermod -a -G wheel $USER
sed -i 's/PasswordAuthentication yes/PasswordAuthentication no/g' \
    /etc/ssh/sshd_config
/etc/init.d/sshd restart
exit
sudo ls
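
The sed step above can be tried on a scratch copy before touching the real config. This is only a minimal sketch, assuming a POSIX shell; the function name disable_password_auth is made up for illustration, and it only rewrites uncommented "PasswordAuthentication yes" lines:

```shell
# Hypothetical helper: switch "PasswordAuthentication yes" to "no" in the
# sshd config file given as $1. It goes through a temp file instead of
# sed -i, so it does not depend on GNU sed.
disable_password_auth() {
  cfg="$1"
  tmp=$(mktemp) || return 1
  if sed 's/^[[:space:]]*PasswordAuthentication[[:space:]][[:space:]]*yes[[:space:]]*$/PasswordAuthentication no/' "$cfg" > "$tmp"; then
    # cat into the original so its owner and permissions are kept
    cat "$tmp" > "$cfg"
    rc=$?
  else
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}
```

As root you would run it as disable_password_auth /etc/ssh/sshd_config and then restart sshd as before.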

Thursday, January 24, 2008

xfs rhel4

Problem

Make a RHEL4 system mount a partition which can support more directories than ext3's inode max will allow.

Solution

Use a kernel module to use xfs. Note that we don't need a new kernel, just a new kernel module. There are RPMs for this. If you can install them correctly this won't even require any downtime.

Details

To test, I'm going to format a USB thumb drive as XFS. First we install the XFS kernel module. There is a howto for doing this with kernel modules via RPMs. For details see faqs.org.

I need 3 RPMS: xfsprogs, xfsprogs-devel, and the kernel-module-xfs:

rpm -Uvh xfsprogs-[kernel-version][rpm-version].rpm
rpm -Uvh xfsprogs-devel-[kernel-version][rpm-version].rpm
rpm -ivh kernel-module-xfs-[kernel-version][rpm-version].rpm

Given what I'm running:

# uname -r
2.6.9-67.0.1.EL
#

and a bit of searching I found a mirror which had the 2.6.9-67.0.1.EL kernel-module-xfs. Note that xfsprogs and xfsprogs-devel don't necessarily have to be the exact same version, just the specific kernel module. After following the order above I'm able to load the kernel module and verify that I have the XFS mkfs:

# modprobe xfs
# lsmod | grep xfs
xfs                   526832  0 
# which mkfs.xfs
/sbin/mkfs.xfs
#

Next I'll look at the partition on the thumb drive (/dev/sda1 as per dmesg) and determine that I can mount it:


# parted
(parted) select /dev/sda1                                                 
Using /dev/fd0
(parted) mklabel msdos                                                    
(parted) print                                                            
Disk geometry for /dev/fd0: 0.000-1.406 megabytes
Disk label type: msdos
Minor    Start       End     Type      Filesystem  Flags
(parted) quit
# mount -t vfat /dev/sda1 /mnt/usb/
# umount /mnt/usb/

Then we format the partition for XFS:

# /sbin/mkfs.xfs -f -i size=512,maxpct=0 /dev/sda1 
meta-data=/dev/sda1              isize=512    agcount=3, agsize=4096 blks
         =                       sectsz=512  
data     =                       bsize=4096   blocks=12288, imaxpct=0
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096  
log      =internal log           bsize=4096   blocks=1200, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
#

Finally we verify that we can mount it:

# mount -t xfs /dev/sda1 /mnt/usb/
# mount | grep xfs
/dev/sda1 on /mnt/usb type xfs (rw)
#

After doing this you can see how many inodes it can handle and test it empirically. The following perl script will attempt to make an arbitrary number of directories:

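#!/usr/bin/perl
use strict;
use warnings;

# Make $num_dirs empty directories under ./test, printing each index
# as it is created; stop at the first failure (e.g. out of inodes).
my $num_dirs = 38000;
-d "test" or mkdir "test" or die "mkdir test: $!\n";
for my $i (0 .. $num_dirs - 1) {
  mkdir "test/$i" or die "mkdir test/$i: $!\n";
  print "$i\n";
}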

You can then run it in one window while you watch it eat inodes in the other:

# df -i /mnt/usb/
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1              86712   24138   62574   28% /mnt/usb
#
...
# df -i /mnt/usb/
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1              86176   38007   48169   45% /mnt/usb
#

So you can fill up half a drive with nothing but empty dirs:

# df -h /mnt/usb/
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              44M   20M   24M  46% /mnt/usb
#
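
The same empirical test can be done from the shell if you want it to stop at the first failed mkdir and report a count. This is a rough sketch rather than what I ran above; make_dirs is a made-up name:

```shell
# Hypothetical helper: create up to $2 numbered directories under $1 and
# stop at the first mkdir that fails (for example, when the filesystem
# runs out of inodes or the directory already exists).
# Prints how many directories were created.
make_dirs() {
  base="$1"
  want="$2"
  made=0
  mkdir -p "$base" || return 1
  while [ "$made" -lt "$want" ]; do
    mkdir "$base/$made" 2>/dev/null || break
    made=$((made + 1))
  done
  echo "$made"
}
```

For example, make_dirs /mnt/usb/test 38000 in one window with df -i /mnt/usb/ in the other.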

update

When I try to use XFS on an iSCSI LUN I get a kernel panic. All I have to do is mount the LUN, mkdir and then rmdir:

# rmdir 2
Message from syslogd@localhost at Tue Jan 29 10:38:34 2008 ...
kernel: Bad page state at free_hot_cold_page (in process 'iscsi-rx', page c1682a20)
 
Message from syslogd@localhost at Tue Jan 29 10:38:34 2008 ...
kernel: flags:0x20000084 mapping:00000000 mapcount:0 count:0
 
Message from syslogd@localhost at Tue Jan 29 10:38:34 2008 ...
kernel: Backtrace:
 
Message from syslogd@localhost at Tue Jan 29 10:38:35 2008 ...
kernel: Trying to fix it up, but a reboot is needed
 
#

Someone else has this problem too.

Tuesday, January 22, 2008

Fibre Channel I/O Calls

A colleague of mine came across the following fact regarding I/O and fabric switches:

2 Gbps FC can queue 254 I/O commands
4 Gbps FC can queue 2048 I/O commands

We wonder if moving a certain server from a 2G switch (McData DS-24M2) to a 4G switch (Cisco MDS-9124) will improve performance. In order to determine this I'd need to see how many I/O commands we have for different points in time.

I've blogged about seeing I/O with /proc/diskstats before. Let's look at it closer with awk. Note that I don't have anything conclusive below, just some observations. I do think I can use this over time and recognize trends on my system however.

Here's what proc says about a particular LUN:

# cat /proc/diskstats |  grep " sdc "
   8   32 sdc 391348542 3811329 642958694 1765819166 212637694 
1424571277 438970288 1314722135 1 366113251 3445284834
#

As per comp.os.linux.development these fields (starting after the device name) are:

Field 1 -- # of reads issued
Field 2 -- # of reads merged
Field 3 -- # of sectors read
Field 4 -- # of milliseconds spent reading
Field 5 -- # of writes completed
Field 6 -- # of writes merged
Field 7 -- # of sectors written
Field 8 -- # of milliseconds spent writing
Field 9 -- # of I/Os currently in progress
Field 10 -- # of milliseconds spent doing I/Os
Field 11 -- weighted # of milliseconds spent doing I/Os

Or to put it another way:

 391348542 reads issued (4)
   3811329 reads merged (5)
 642958694 sectors read (6)
1765819166 milliseconds spent reading (7)

 212637694 writes completed (8)
1424571277 writes merged (9)
 438970288 sectors written (10)
1314722135 milliseconds spent writing (11)

         1 I/Os currently in progress (12)
 366113251 milliseconds spent doing I/Os (13)
3445284834 weighted milliseconds spent doing I/Os (14)
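
To avoid counting columns by hand, the labelling can be done with awk. This is a sketch under the field order listed above; label_diskstats and the short counter names are my own invention, and it only prints the first 11 counters even if a kernel reports more:

```shell
# Hypothetical helper: read /proc/diskstats-style lines on stdin and print
# the 11 counters for device $1 as name=value pairs. awk columns 1-3 are
# major, minor and device name, so the counters start at column 4.
label_diskstats() {
  names="reads reads_merged sectors_read ms_reading writes writes_merged sectors_written ms_writing in_progress ms_io weighted_ms_io"
  awk -v dev="$1" -v names="$names" '
    $3 == dev {
      n = split(names, name, " ")
      for (i = 1; i <= n; i++) print name[i] "=" $(i + 3)
    }'
}
```

On a live box that would be label_diskstats sdc < /proc/diskstats.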

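Note that I've put the awk column number in parentheses above. We can then take more readings and focus on essential columns. E.g. the cumulative counters show more time spent reading than writing, although over these ten samples the write counter (second column) grows faster: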

#  while [ 1 ]; do grep " sdc " /proc/diskstats |
     awk {'print $7 " " $11'}; sleep 1; done
1767053699 1323835167
1767053722 1323835217
1767054231 1323858000
1767054400 1323858477
1767054401 1323859097
1767054420 1323859106
1767055201 1323863662
1767055543 1323863671
1767055666 1323864799
1767056048 1323865700
#

If we look at them every quarter second we can see spikes in the number of I/Os in progress along with the number of reads and writes issued during that time (looking at a larger interval hides the spikes):

#  while [ 1 ]; do grep " sdc " /proc/diskstats | 
     awk {'print  $12 "\t" $4 "\t" $8'}; sleep 0.25; done
1       391689249       213184077
1       391689253       213184467
4       391689253       213184912
4       391689253       213185311
4       391689253       213185780
1       391689257       213186170
1       391689257       213186558
2       391689258       213187017
68      391689271       213187319
1       391689271       213187801
2       391689313       213188219
1       391689338       213188481
2       391689379       213188863
44      391689379       213189282
32      391689384       213190180
3       391689400       213190569
3       391689400       213190971
1       391689405       213191429
3       391689407       213192172
#

We can check the math on the last few lines. Because our sampling interval misses events that occur between samples, our numbers won't add up exactly, but we can see a general trend in some of these numbers:

1       391689338       213188481
2       391689379       213188863
44      391689379       213189282
32      391689384       213190180

There were far more writes than reads across these four samples (46 reads versus 1,699 writes).

3       391689400       213190569
3       391689400       213190971

No reads completed between these two samples, but 402 writes did, and 3 I/Os were in progress at each sample.
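
Checking the math by eye gets old quickly, so the subtraction can be left to awk. This is a minimal sketch that assumes the three-column output of the quarter-second loop above (in progress, reads, writes); diskstats_deltas is a made-up name:

```shell
# Hypothetical helper: read "in_progress reads writes" samples on stdin
# and print, for each sample after the first, the in-progress count plus
# how much the read and write counters grew since the previous sample.
diskstats_deltas() {
  awk '
    NR > 1 { print $1 "\t" ($2 - prev_r) "\t" ($3 - prev_w) }
    { prev_r = $2; prev_w = $3 }'
}
```

Piping the sampling loop into diskstats_deltas shows the per-interval reads and writes next to the number of I/Os in progress, which is the number I'd compare against the switch's queue limit.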

I don't have anything conclusive from the above but I do think I can use this over time and recognize trends on my system.

Monday, January 7, 2008

RHEL5 Install via Ubuntu NFS

After booting a RHEL4 or 5 disk with "linux askmethod" and getting the system online (the Celerra could ping it) and making sure the appropriate IP was in the ACL, I still couldn't mount the ISO images to install:

RPC timeout that directory could not be mounted from the server

I ended up mounting the Celerra from an already installed system on the same network, copying the ISOs over to it and then umounting the Celerra. I then made that system an NFS server and now the five RHEL5 hosts have no problem mounting it for an NFS based install.

When setting up an NFS server make sure that portmap is running so that it listens on port 111 for the RPC calls that NFS needs.

# netstat -npl | egrep "111|2049"
tcp        0      0 0.0.0.0:2049            0.0.0.0:*               LISTEN     -                   
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN     29969/portmap       
udp        0      0 0.0.0.0:2049            0.0.0.0:*                          -                   
udp        0      0 0.0.0.0:111             0.0.0.0:*                          29969/portmap
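
Grepping for "111|2049" will also match those digits anywhere else on a line, such as in a PID. A stricter check only looks at the port in the local address column. This is a sketch that assumes the netstat -nl column layout shown above; tcp_port_listening is a made-up name:

```shell
# Hypothetical helper: read `netstat -nl` output on stdin and succeed only
# if something is listening on TCP port $1. Only the port at the end of
# the local-address column (column 4) is compared.
tcp_port_listening() {
  awk -v port="$1" '
    $1 ~ /^tcp/ && $6 == "LISTEN" {
      n = split($4, a, ":")
      if (a[n] == port) found = 1
    }
    END { exit !found }'
}
```

For example: netstat -nl | tcp_port_listening 111 && echo "portmap is listening".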

It's annoying that I'm not sure why the Celerra won't serve this purpose and that I don't have any log data to figure out why. If time allows I'll try again with tcpdump but I need to get these hosts installed.

Update

The firewall admin noticed that the host that was booted from the RHEL5 CD was trying to make a UDP connection to port 1234 of the Celerra. Opening this seems to have fixed the problem.

nfs-common

Out of the box Ubuntu will support NFS mounting but will take about 90 seconds to do it and not work well. If you check /var/log/messages you'll see errors [1]. To fix this install the nfs-common package:

http://packages.ubuntu.com/feisty/net/nfs-common

Footnote:

[1]

[4661210.004709] portmap: server localhost not responding, timed out
[4661210.004745] RPC: failed to contact portmap (errno -5).
[4661244.949461] portmap: server localhost not responding, timed out
[4661244.949496] RPC: failed to contact portmap (errno -5).
[4661244.949513] lockd_up: makesock failed, error=-5
[4661279.894214] portmap: server localhost not responding, timed out
[4661279.894248] RPC: failed to contact portmap (errno -5).
[4661279.894255] nfs: Starting lockd failed (do you have nfs-common installed?).
[4661279.894284] nfs: Continuing anyway, but this workaround will go away soon.

NFS: Stevens' TCP/IP Ch 29

Stevens' TCP/IP Illustrated, Volume 1: The Protocols, Chapter 29 explains NFS in terms of UDP and RPC. If you're making firewall rules for NFS you need to allow port 2049 for NFS itself and port 111 for the RPC portmapper. It's possible for NFS to use both TCP and UDP for both ports.

Friday, January 4, 2008

Celerra Command Line Non-Troubleshooting

My colleague offers NFS service via an EMC Celerra NS 502G. I want to be able to troubleshoot it by grepping its logs for errors. Either it doesn't keep log files for the errors I've been encountering or I couldn't find them.

The problem

Originally I couldn't mount the host because port 2049 was not open to the client (so remember to check the network layer first with telnet). I then became curious and tried to mount from a host that worked in the past which I then specifically removed from the Celerra ACL:

mount -t nfs nas0.prd.domain.tld:/isos /mnt/isos/
mount: nas0.prd.domain.tld:/isos failed, reason given by server: Permission denied

My goal is to know where the Celerra logs these types of issues. I don't think it does, but I'm trying to prove a negative by searching so I could have missed something.

Getting to the command line:

You can SSH to a Celerra as nasadmin. It's really just a GNU/Linux box:

[root@nas_cs0 root]# uname -a
Linux nas_cs0 2.4.20-28.5506.EMC #1 Tue Aug 8 22:16:20 EDT 2006 i686 unknown
[root@nas_cs0 root]#

It's got a 2 GHz Celeron and 512MB of RAM:

[root@nas_cs0 etc]# dmesg | grep -i cpu
Initializing CPU#0
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 128K
CPU:     After generic, caps: bfebfbff 00000000 00000000 00000000
CPU:             Common caps: bfebfbff 00000000 00000000 00000000
CPU: Intel(R) Celeron(R) CPU 2.00GHz stepping 09
[root@nas_cs0 etc]# free -m
             total       used       free     shared    buffers     cached
Mem:           503        469         33          0         77        185
-/+ buffers/cache:        207        295
Swap:          509        247        262
[root@nas_cs0 etc]#

Seems to be RPM based, probably RedHat:

[root@nas_cs0 var]# rpm -qa | wc -l
    262
[root@nas_cs0 var]#

What files are useful from here?

You can look in /celerra/backendmonitor to see some of the configuration files. But where are the log files? One way to find log files on any box is to touch a marker file, reproduce the problem you're trying to debug, and then find every file modified after the marker:

# touch /tmp/x
# find / -type f -newer /tmp/x  2> /dev/null | grep -v proc

In the above I'm ignoring proc and standard error while finding files modified since I touched /tmp/x. This returns:

/var/log/pacct
/tmp/ch_globals.tmp
/nas/log/eventstore/slot_1/sys_log
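
The marker-file trick is handy enough to keep as a function. This is a sketch, assuming a find that supports -path and -prune; files_newer_than is a made-up name. Pruning /proc inside find also avoids dropping unrelated paths that merely contain "proc", which grep -v proc would do:

```shell
# Hypothetical helper: list regular files under directory $2 that were
# modified after the marker file $1, skipping /proc and hiding
# permission errors.
files_newer_than() {
  find "$2" -path /proc -prune -o -type f -newer "$1" -print 2>/dev/null
}
```

Usage would be: touch /tmp/x, reproduce the failure, then files_newer_than /tmp/x /.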

I then hopped into /nas/log/ and tried to find files containing the IP of the host that couldn't NFS mount the system:

[root@nas_cs0 log]# find . -exec fgrep -q "123.456.7.89" '{}' \; -print 2> /dev/null 
./nas_log.al
./cmd_log
./cel_api.log
[root@nas_cs0 log]#

All of the above just contained logs from when the host was added to the ACL.
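
The same search can be done with grep alone. This is a sketch that assumes a grep with recursive search (-r), which GNU grep has; files_mentioning is a made-up name:

```shell
# Hypothetical helper: print the names of files under directory $2 that
# contain the fixed string $1. -F turns off regex matching, so the dots
# in an IP address only match literal dots.
files_mentioning() {
  grep -rlF -- "$1" "$2" 2>/dev/null
}
```

For example, files_mentioning "123.456.7.89" /nas/log should list the same three files as the find above.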

Non-results

I wish I could end with "and then I found the log in ..." but I never found useful logs. Since I searched for files modified after the time of the error and found nothing, my position is that it's not logging these errors. It might have been easier to just buy a server with fibre cards and let it work as an NFS wrapper. Then I'd have a more standard NFS server. At least it does iSCSI, but I haven't yet troubleshot it at this level of detail. I also found some comments on EMC's NFS implementation.

OpenMoko update

There will soon be a new developer OpenMoko phone. However, we still seem to be in phase 1. I'm waiting for phase 2 since I can't do without a reliable means of making phone calls. It would be fun to just get one to develop on, but I've got too much going on. I read yesterday that they'll make it "available to the mass market later this year". The year just started so it might just be a whole year. It's been delayed before.

rhel5 nfs install

I have an NFS share with ISOs [1]. When installing a RHEL4 system the boot disk offers to use an NFS source. I am then able to bring up eth0, mount the NFS server and complete the install with one CD. RHEL5 is different and doesn't have a KickStart CD. However, you can achieve the same effect by booting from the first CD with "linux askmethod". The installer will then prompt for the NFS server details.

Footnote:

[1]

# mount -t nfs nas0.prd.domain.tld:/isos /mnt/isos/
# ls /mnt/isos/datastore1/rhel* 
/mnt/isos/datastore1/rhel4u4_i386:
RHEL4-U4-i386-ES-disc1.iso  RHEL4-U4-i386-ES-disc4.iso
RHEL4-U4-i386-ES-disc2.iso  RHEL4-U4-i386-ES-disc5.iso
RHEL4-U4-i386-ES-disc3.iso

/mnt/isos/datastore1/rhel4u4_x86_64:
RHEL4-U4-x86_64-ES-disc1.iso  RHEL4-U4-x86_64-ES-disc4.iso
RHEL4-U4-x86_64-ES-disc2.iso  RHEL4-U4-x86_64-ES-disc5.iso
RHEL4-U4-x86_64-ES-disc3.iso

/mnt/isos/datastore1/rhel5u1_i386:
rhel-5-server-i386-disc1.iso  rhel-5-server-i386-disc4.iso
rhel-5-server-i386-disc2.iso  rhel-5-server-i386-disc5.iso
rhel-5-server-i386-disc3.iso

/mnt/isos/datastore1/rhel5u1_x86_64:
rhel-5-client-x86_64-disc1.iso  rhel-5-client-x86_64-disc5.iso
rhel-5-client-x86_64-disc2.iso  rhel-5-client-x86_64-disc6.iso
rhel-5-client-x86_64-disc3.iso  rhel-5-client-x86_64-disc7.iso
rhel-5-client-x86_64-disc4.iso

/mnt/isos/datastore1/rhel-5-x86-64:
rhel-5-server-x86_64-disc1.iso  rhel-5-server-x86_64-disc4.iso
rhel-5-server-x86_64-disc2.iso  rhel-5-server-x86_64-disc5.iso
rhel-5-server-x86_64-disc3.iso
#

Wednesday, January 2, 2008

dreams of being cracked

I read about a new theory on dreaming:

Dreams are a sort of nighttime theater in which our brains screen realistic scenarios. This virtual reality simulates emergency situations and provides an arena for safe training: "The primary function of negative dreams is rehearsal for similar real events, so that threat recognition and avoidance happens faster and more automatically in comparable real situations." Dreaming helps us recognize dangers more quickly and respond more efficiently. ... The difference between the typical and optimal response could save your life. But making such a reaction swift and automatic takes practice. It's the reason martial arts students drill their movements over and over. Frequent rehearsal prepares them for that one decisive moment, ensuring that their response in an actual life-or-death situation is the one they practiced. Dreams may do the same thing.

It even offers a method for the brain to select what to dream about:

The dreaming brain scans emotional memories. When it detects a memory trace with a strong negative emotion, it constructs a nightmare around that theme. The more traumatic the event, the more intense the nightmare. The brain's system for detecting threats is sensitive and flexible: Anything the brain tags with a strong negative charge gets thrown into the threat bin and dredged up at night. ... Even a single exposure to a life-threatening situation can plunge a person into an inferno of post-traumatic nightmares, dreams in which the threatening event—the attack, the rape, the war—is repeated over and over in every possible variation.

I had a dream I had been socially engineered. That I had given a root password to someone I suddenly realized was not to be trusted. If this theory of dreaming is correct it's good that it's in tune with modern fears, not just being chased by tigers.

don't chmod 777

I came across a document which said "In order to use all these tools, you have to change the chmod of wp-content folder to 777".

I very rarely do this and it's usually unnecessary. It's analogous to unlocking all the doors so that your friend can use one of them. I normally prefer to unlock only the necessary door ("That's just perfectly normal paranoia" -- Slartibartfast). If you're being lazy and writing a doc you might just ask the user to 777 a directory since you probably don't want to write long notes like this one explaining the different types of doors and the conditions in which you should unlock different ones. 777 will always work, especially if the user installing the software doesn't have root. If you do have root, then you should just chown or chgrp to the user or group which needs to write to the directory. This is normally apache and this is what I recommend.
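
Unlocking only the one door might look something like this. It's a sketch; grant_group_write is a made-up name, and the group name and path in the usage line are assumptions about a typical setup, not something from the document I was reading:

```shell
# Hypothetical helper: give group $2 write access to directory $1 without
# opening it to everyone else (mode 775 instead of 777).
# chgrp needs root unless you are a member of the target group.
grant_group_write() {
  chgrp "$2" "$1" && chmod 775 "$1"
}
```

With root that would be something like grant_group_write /path/to/wp-content apache, where /path/to/wp-content stands in for wherever WordPress lives on your box.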

However, this doesn't mean problem solved and everything is secure. It means the back door is open and you could be at risk if the person you've asked to watch it for you is incompetent. Since WordPress is a popular and active project with a lot of developers I'm going to err on the side of trusting them and endorse letting apache write to said part of the file system. You should then put a noexec .htaccess in that directory so that if something bad is uploaded it can't be run.

Let me explain more about what I mean by not asking some incompetent person (or script) to watch the back door. Some web applications need to be able to write to the file system for them to be of use. E.g. WordPress probably wouldn't be too handy without this feature. By the bug:feature ratio law we will also open ourselves to plenty of exploits. It all depends on an arms race between developers and crackers. If you are installing some dead project like "Jim's PHP-weekend photo gallery" and he hasn't put too much work into thinking about possible exploits or updating his code you are putting the server at risk. Someone could drop in some code to use your box as a spam relay or they might even upload PHP shell and try to root the box. I've seen it happen.

For web servers where many users have shell accounts, allowing them to make the call on whether a project is secure enough to be hosted with the apache write option is not a good idea. You basically need to support apache reads only. Your best bet if they want to do writes is to force it through a database like MySQL. This works well for text (the majority of cases) but is not a good idea for attachments. If a good open source project had a generalized attachment solution which focused on security and provided an API and then other projects adapted it, I think admins and users would be happier.

NFS intr

I've been using the NFS intr option:

If an NFS file operation has a major timeout and it is hard mounted, then allow signals to interrupt the file operation and cause it to return EINTR to the calling program. The default is to not allow file operations to be interrupted.

The Linux Network Administrator's Guide or wlug has more information.

But like others I've had problems when the network share went away. Normally RedHat's fix has worked for me in the past but I recently had it not work. I ended up doing a lazy umount (-l option) to get off of it without hurting the server.