Friday, January 4, 2008

Celerra Command Line Non-Troubleshooting

My colleague offers NFS service via an EMC Celerra NS 502G. I want to be able to troubleshoot it by grepping its logs for errors. Either it doesn't keep log files for the errors I've been encountering or I couldn't find them.

The problem

Originally I couldn't mount the export because port 2049 was not open to the client (so remember to check the network layer first, e.g. with telnet). I then became curious and tried to mount from a host that had worked in the past, after deliberately removing it from the Celerra ACL:
mount -t nfs nas0.prd.domain.tld:/isos /mnt/isos/
mount: nas0.prd.domain.tld:/isos failed, reason given by server: Permission denied
My goal is to find out where the Celerra logs this kind of failure. I don't think it does, but since I'm trying to prove a negative by searching, I could have missed something.
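
The network-layer check mentioned above doesn't strictly need telnet. Here is a minimal sketch, assuming bash (built with /dev/tcp support) and the coreutils timeout command are available on the client; port_open is a name I made up for illustration:

```shell
# Hypothetical helper: succeed only if a TCP connection to host:port can be
# opened within 5 seconds. Relies on bash's /dev/tcp pseudo-device.
port_open() {
    # The connection is opened inside a child bash, so the socket is closed
    # as soon as that process exits. timeout bounds the wait when a firewall
    # silently drops packets instead of refusing the connection.
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example, using the server name from this post:
# port_open nas0.prd.domain.tld 2049 && echo "2049 reachable" || echo "2049 blocked"
```

This only shows that something accepts connections on the port. It says nothing about whether the export or the ACL is right, which is the case that follows.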

Getting to the command line:

You can SSH to a Celerra as nasadmin. It's really just a GNU/Linux box:
[root@nas_cs0 root]# uname -a
Linux nas_cs0 2.4.20-28.5506.EMC #1 Tue Aug 8 22:16:20 EDT 2006 i686 unknown
[root@nas_cs0 root]# 
It's got a 2 GHz Celeron and 512MB of RAM:
[root@nas_cs0 etc]# dmesg | grep -i cpu
Initializing CPU#0
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 128K
CPU:     After generic, caps: bfebfbff 00000000 00000000 00000000
CPU:             Common caps: bfebfbff 00000000 00000000 00000000
CPU: Intel(R) Celeron(R) CPU 2.00GHz stepping 09
[root@nas_cs0 etc]# free -m
             total       used       free     shared    buffers     cached
Mem:           503        469         33          0         77        185
-/+ buffers/cache:        207        295
Swap:          509        247        262
[root@nas_cs0 etc]# 
It seems to be RPM-based, probably Red Hat:
[root@nas_cs0 var]# rpm -qa | wc -l
    262
[root@nas_cs0 var]# 

What files are useful from here?

You can look in /celerra/backendmonitor to see some of the configuration files. But where are the log files? One way to find log files on any box is to list every file that was modified after you reproduced the problem you're trying to debug:
# touch /tmp/x
# find / -type f -newer /tmp/x 2> /dev/null | grep -v '^/proc'
In the above I'm discarding error messages and filtering out /proc while looking for files modified since I touched /tmp/x. For this to catch anything, the failing mount has to be retried between the two commands. This returns:
/var/log/pacct
/tmp/ch_globals.tmp
/nas/log/eventstore/slot_1/sys_log
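
The marker-file trick above can be wrapped into a small function. This is only a sketch: changed_since is a hypothetical name, and it assumes a find that supports -path and -prune (GNU find does).

```shell
# Hypothetical helper: run a command, then list regular files under a
# directory that were modified after the command started.
# Usage: changed_since DIR COMMAND [ARGS...]
changed_since() (
    dir="$1"; shift
    marker=$(mktemp) || exit 1
    # Wait a second so later writes get a strictly newer timestamp than the
    # marker, even on filesystems with one-second mtime granularity.
    sleep 1
    # The action under test (e.g. the failing mount); its output and exit
    # status are ignored because a failure is exactly what we want to trace.
    "$@" >/dev/null 2>&1
    # Skip /proc entirely, then print files newer than the marker.
    find "$dir" -path /proc -prune -o -type f -newer "$marker" -print 2>/dev/null
    rm -f "$marker"
)
```

Here the command has to be something that can be triggered from the box doing the search. In my case the mount was attempted from another host, so I retried it by hand between the touch and the find.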
I then hopped into /nas/log/ and searched for files containing the IP address of the host that couldn't mount the NFS export:
[root@nas_cs0 log]# find . -exec fgrep -q "123.456.7.89" '{}' \; -print 2> /dev/null 
./nas_log.al
./cmd_log
./cel_api.log
[root@nas_cs0 log]# 
All of the above just contained logs from when the host was added to the ACL.
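
The same search can be done without find. A sketch, assuming a grep that supports recursive search with -r (GNU grep does); files_mentioning is a made-up name:

```shell
# Hypothetical helper: print the names of files under a directory (default:
# the current one) that contain a fixed string.
#   -r  recurse into subdirectories
#   -l  print only the names of matching files
#   -F  treat the pattern as a literal string, so dots in an IP stay dots
#   -s  suppress errors about unreadable or missing files
files_mentioning() {
    grep -rlFs -- "$1" "${2:-.}"
}

# Example: files_mentioning "123.456.7.89" /nas/log
```

Dropping -l and adding -n would show the matching lines with their line numbers, which makes it quicker to see that the hits are just ACL changes.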

Non-results

I wish I could end with "and then I found the log in ..." but I never found useful logs. Since I searched for files modified after the time of the error and found nothing relevant, my position is that the Celerra does not log these errors. It might have been easier to just buy a server with fibre cards and let it work as an NFS wrapper. Then I'd have a more standard NFS server. At least it does iSCSI, but I haven't yet troubleshot that at this level of detail. I also found some comments on EMC's NFS implementation.
