Sunday, December 30, 2007

Ubuntu DVD

Playing DVDs in Ubuntu isn't too difficult (see the Ubuntu doc or howto). I got good results from VLC. From there k9 can make an ISO of a movie if you need to back it up. Gnome can then burn the ISO to a DVD+R.
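The burn can also be done from the command line; a minimal sketch, assuming growisofs (from dvd+rw-tools) is installed and the burner is /dev/dvd (both are assumptions):
growisofs -dvd-compat -Z /dev/dvd=movie.iso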

Friday, December 21, 2007

alpine

I can use pine again! Sort of. With alpine I can use a pine-like mail client and not have to worry about the undesirable license. Thanks slashdot.

vi query replace

I believe that emacs is ultimately more powerful than vi because it is written in Lisp and allows you to modify how it works on the fly. However, vi is smaller and faster -- sort of for the same reason -- and I prefer to use it for quick edits. In emacs I often M-% to query replace. So it's about time I found out how to do the same thing in vi:
:%s/foo/bar/gc
It's a lot like sed, except for the c option, which sed doesn't have since it's non-interactive by definition:
$ echo "I like unix" | sed s/i/u/
I luke unix
$ echo "I like unix" | sed s/i/u/g
I luke unux
$ echo "I like unix" | sed s/i/u/gi
u luke unux
$ echo "I like unix" | sed s/i/u/gc
sed: -e expression #1, char 8: unknown option to `s'
$
Think of the c as confirm.

Tuesday, December 18, 2007

https overview

I. If you're making your own numbers:

You just need a certificate and a private key. These can be encoded in one PEM file. It's easy to make your own PEM file with your own self-signed cert on a vanilla RedHat Apache:
cd /etc/httpd/conf/
make host.domain.tld.pem
mv host.domain.tld.pem ssl.crt/
Just be sure to edit ssl.conf to reference the new cert. You can replace:
 SSLCertificateFile /etc/httpd/conf/ssl.crt/server.crt
 SSLCertificateKeyFile /etc/httpd/conf/ssl.key/server.key
With just:
 SSLCertificateFile /etc/httpd/conf/ssl.crt/host.domain.tld.pem
and then restart apache.
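If your Apache doesn't ship that Makefile, a roughly equivalent pair of openssl commands (the key size and lifetime here are my choices, not gospel) builds the same kind of one-file PEM:
openssl req -new -x509 -nodes -days 365 -newkey rsa:2048 \
    -keyout host.domain.tld.key -out host.domain.tld.crt
cat host.domain.tld.key host.domain.tld.crt > host.domain.tld.pem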

II. If you're buying numbers:

You'll probably be asked to:
  1. Generate a .key
  2. Generate a .csr
  3. Give the CSR to the company and get back a .crt
Then have your httpd.conf reference the above respectively with the mod_ssl directives:
  1. SSLCertificateKeyFile /usr/local/ssl/private/verisign.key
  2. N/A
  3. SSLCertificateFile /usr/local/ssl/certs/cert.crt
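For steps 1 and 2 the usual openssl invocations look something like this (file names and key size are examples):
openssl genrsa -out verisign.key 2048
openssl req -new -key verisign.key -out host.domain.tld.csr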
Once you have your cert you can look it over and see relevant things like how long it will be valid for with:
openssl x509 -noout -text -in cert.crt
Note that you might also run into intermediate certificates, which Apache can reference with either of the following directives:
 SSLCertificateChainFile /usr/local/ssl/certs/intermediate.crt
 SSLCACertificateFile /usr/local/ssl/certs/intermediate.crt
depending on your version of Apache.

Duke NFS Slides

I found some nice slides on NFS from Duke.

Friday, December 14, 2007

name your server

Someone where I work wanted a bad name for a host. Part of it ended in ge since it had gigabit ethernet. We came up with a better name. I found a post on this topic. I think Sprint has the right idea.

Thursday, December 13, 2007

sync; sync; halt

Several links on the controversial sync; sync; halt idiom.

One even claimed that sync on recent Linux kernels appears to be synchronous, not returning until the data is out the door. I thought that only fsync did that.

MPIO

I'm considering native Multi Path I/O (MPIO) as an alternative to EMC PowerPath 5 for stability reasons. MPIO has been native since the 2.6.13 kernel. Backports have been in RHEL since version 4U2. Dell has an overview and the multipath-tools project page's ReferenceBook has more details. The 2005 Linux Symposium (pages 155-175) also covers it and calivia.com has test results.
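As a quick sanity check once dm-multipath is set up, the multipath-tools CLI will list each mapped LUN with its underlying sdX paths and their states (assuming the multipath-tools package is installed):
multipath -ll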

sync

File systems don't write your data to the disk directly. They buffer it until later and then write it. Thus, block level snapshots of your disk can be incomplete.

Sync can be used to ensure file system integrity, but it only schedules the writes: it returns before everything is actually on disk. The fsync system call takes a file descriptor as an argument and won't return until that file's data is written to disk. Sections 3.14 and 4.24 of APUE cover this in more detail. EMC seems to advise (page 10) that you sync and umount a disk before using their block-level SANcopy tool.
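So a pre-snapshot sequence would look roughly like this (the device and mount point are placeholders):
sync                # schedule any remaining dirty buffers for writing
umount /mnt/lun     # flushes outstanding data and detaches the file system
# take the block-level copy here, then remount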

doodle

This scheduling tool works well to identify the best date(s) for meetings: http://www.doodle.ch/main.html

Monday, December 10, 2007

python primes

#!/usr/bin/env python
# Filename:                primes.py
# Description:             Returns all primes less than N
# Supported Language(s):   Python 2.5.x
# Time-stamp:              <2007-12-10 00:38:14> 
# -------------------------------------------------------
# I. Find all primes <100 using the Sieve of Eratosthenes:
# 
# By the fundamental theorem of arithmetic: 
#  * every integer greater than 1 is a prime xor a composite
#  * every composite can be written as the product of primes 
#  * if n is a composite, then n has a prime divisor <= sqrt(n)
#
# So to sieve out every composite below 100 we only need the primes
# less than or equal to sqrt(100) = 10, namely 2,3,5,7.
# 
# To use the sieve we do the following:  
# 
# * integers divisible by 2 (besides 2) are deleted
# 
# * since 3 is the first integer greater than 2 that is left,  
#   all integers divisible by 3 (besides 3) are deleted
# 
# * since 5 is the next integer after 3 that is left, all integers
#   divisible by 5 (besides 5) are deleted
# 
# * since 7 is the next integer after 5 that is left, all integers
#   divisible by 7 (besides 7) are deleted
# 
# * since all composite integers not exceeding 100 are divisible
#   by 2,3,5,7 all remaining integers except 1 are prime.  
# 
# Runtime:  n * sqrt(n)
# -------------------------------------------------------

from math import sqrt
from math import ceil

def divides(a, b):
    return a % b == 0

def sprimes(n):
    integers = []  
    for i in range(2, n): 
        integers.append(i)

    max_factor = int(ceil(sqrt(n))) 
    factors = integers[:max_factor] 

    for factor in factors:
        if factor not in integers:   # this factor was already sieved out as composite
            continue
        for integer in integers[:]:  # iterate over a copy: removing items from a
                                     # list while looping over it skips elements
            if divides(integer, factor) and integer != factor:
                integers.remove(integer)
    return integers

print sprimes(1000)

# -------------------------------------------------------
# II. find all primes <100 using Wilson's theorem:
# 
#  p>1 is a prime iff fact(p-1) == -1 (mod p)
# 
# As per:  http://en.wikipedia.org/wiki/Wilson's_theorem
# 
# Translating modular arithmetic into Python:
#  http://www.muppetlabs.com/~breadbox/txt/rsa.html#3
# 
# As the above states "27 == 3 (mod 12)" in Math would be 
# "27 % 12 == 3" in Python.  Taking it to the next step:
# "fact(p-1) == -1 (mod p)" would be "fact(p-1) % p == -1".
# However, since Python's modulo with a positive modulus returns
# an integer in the range [0, p), I won't get a -1, so I'll use
# "fact(n-1) % n == n-1".

def fact(n):
    f = 1
    while n > 0:
        f = f * n
        n = n - 1
    return f

def prime(n):
    return (fact(n-1) % n) == (n-1)

def wprimes(n):
    integers = []  
    for i in range(2, n):
        if prime(i):
            integers.append(i)
    return integers

print wprimes(1000)
        
# -------------------------------------------------------
# The Sieve of Eratosthenes ran faster than Wilson's theorem,
# probably because of the factorial (try it on 1000)

Saturday, December 8, 2007

topcoder

I registered with topcoder this weekend. It might be fun to keep my wits sharp with the occasional algorithm competition. I tried one of the practice problems, and of the offered languages I preferred Java for solving it. The problem didn't seem to demand object orientation so I ended up using static methods. This was a problem ("An illegal access exception occurred. Make sure your method is declared to be public") when I tried to submit my work back into their system, since they seem to instantiate your class and then call your method to test it. I think I found a bug, or at least an inconsistency, with the question. I reported it so we'll see. Also, I looked over the problem and then didn't come back to it until the next day. This probably made it take too long, and I think that's why I only got ~90 points. My solution seemed to work for all of their cases. Better luck next time.

Update: Topcoder acknowledged the bug in the 300pt 1-SRM 144 Div 1 problem. The statement considers example "123210122", so the second line you quote should look like

Because Q[0] = P[0] + P[1] = 1 + P[1] = 1 (!!), we know that P[1] = 0.
instead of
Because Q[0] = P[0] + P[1] = 1 + P[1] = 0, we know that P[1] = 0.

Friday, December 7, 2007

Java Stack Trace

I'm messing around with Java. Ubuntu has a write-up. I chose IcedTea:
apt-get install icedtea-java7-jdk
I also installed mmake. I'm revisiting the Knock Knock server. Sending it a SIGQUIT (kill -3) causes it to dump a thread stack trace and keep running:
$ java KKMultiServer > log 
...
$ ps axu | grep java | grep -v grep
user  24050  0.5  0.3 1308620 12740 pts/22  Sl+  15:24   0:00 java KKMultiServer
user  24071  2.0  0.3 1377796 15024 pts/20  Sl+  15:25   0:00 java KnockKnockClient
user  24093  2.4  0.3 1378512 15456 pts/21  Sl+  15:25   0:00 java KnockKnockClient
user  24113  4.3  0.3 1377708 15472 pts/19  Sl+  15:25   0:00 java KnockKnockClient
$ kill -3 24050
$ ps axu | grep 24050 | grep -v grep 
user  24050  0.0  0.3 1308620 12788 pts/22  Sl+  15:24   0:00 java KKMultiServer
$ wc -l log 
117 log
The stack trace output is interesting. It seemed to list each thread that the KKMultiServer had:
"KKMultiServerThread" prio=10 tid=0x00000000006e8400 nid=0x5e43 runnable [0x0000000041412000..0x0000000041412d90]
...
"KKMultiServerThread" prio=10 tid=0x00000000006e6400 nid=0x5e2f runnable [0x0000000041311000..0x0000000041311c10]
...
"KKMultiServerThread" prio=10 tid=0x00000000006d2400 nid=0x5e1b runnable [0x0000000041210000..0x0000000041210c90]
Then I see other runnable threads like the Low Memory Detector, Compiler Thread, Signal Dispatcher, Finalizer, Reference Handler, main, VM Thread and GC tasks. Each was waiting or runnable. It also prints Heap information. I came across PSYoungGen.java when trying to make sense of it. I found two stack trace articles from Sun and [0xCAFEFEED]. It's nice to see that people find this useful.
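As an aside, if you'd rather not send a signal, the jstack tool that ships with Sun JDKs prints the same sort of thread dump (assuming your JDK provides it):
jstack 24050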

Thursday, December 6, 2007

amd-v xen

I'm looking to virtualize some services with xen and since I'm buying new hardware I'm curious about the amd-v chip.

I glanced over an AMD paper on xen on the amd-v chip. I also found a blog post which discusses these types of chips and the "my hypervisor uses cpu hardware extensions to do what you do in software so it's faster than yours" debate. The first benefit you'll hear about from AMD-V seems to be allowing unmodified guests to run on xen. I.e. the OS that you want to host doesn't need to be ported to xen. Since I'm mainly looking at running Linux I'm not too excited. The paper also claims it "reduces overhead by selectively intercepting instructions destined for guest environments". This seems plausible, though I'm curious if it's just newspeak.

AMD also claims that the memory controller "provides efficient isolation of virtual machine memory". This seems to be the most relevant benefit for me. I don't know enough about memory controllers to fully appreciate this. So far I'm hearing about memory virtualization (i.e. AMD Nested Page Tables and Intel Extended Page Tables) but virtual memory is nothing new. The difference seems to be a focus on isolating memory for a virtual machine as noted by the inquirer and project-xen.web.

Wednesday, December 5, 2007

Unbuffered I/O System Calls (apue ch3)

apue ch3 covers Unbuffered File I/O System Calls. It starts with:
  • open takes a path and returns a file descriptor (or -1 on error)
  • The kernel hands a file descriptor int to a process which reads or writes to it
  • creat is equivalent to open(path, O_WRONLY | O_CREAT | O_TRUNC, mode)
  • close(fd) releases process locks on a file and returns 0 (or -1)
  • Every open file has a current offset --stored in kernel at no I/O cost-- initialized to 0 unless O_APPEND is used
  • lseek (fd, off_t, whence) returns the offset (or -1)
  • whence can be: SEEK_SET (off_t from beginning) SEEK_CUR (off_t from current) SEEK_END (off_t + file size)
  • currpos = lseek(fd, 0, SEEK_CUR); i.e. the new file offset zero bytes from current is a good way to get the current offset
  • write(fd, *buff, n_bytes) writes n_bytes of *buff to fd, increments offset, returns n_bytes or -1
  • read(fd, *buff, n_bytes) reads up to n_bytes from fd into *buff, increments the offset, and returns the number of bytes read (or -1)
  • unlink(path) deletes a file
  • chmod(path, mode) changes file permissions
  • stat(path, struct stat *sb) fills sb with the file's inode info and returns 0 (or -1)
Note that the last three sys calls above are not in ch3 of apue and that I borrowed them from File-Related System Calls in FreeBSD.

With this in mind we can "create a hole in a file". I.e. the filesystem just pretends that at a particular place in the file there are zero bytes, but no actual disk sectors are used. When this happens the offset is still incremented so that future writes don't fill the hole. E.g. hole.c:

if ( (fd = creat("file.hole", FILE_MODE)) < 0)
   err_sys("creat error");
// create file.hole in current directory

if (write(fd, buf1, 10) != 10)
 err_sys("buf1 write error");
// write 10 bytes of buf1 to fd (offset = 10)

if (lseek(fd, 40, SEEK_SET) == -1)
 err_sys("lseek error");
// set the file offset to 40 bytes from the beginning (offset = 40)

if (write(fd, buf2, 10) != 10)
 err_sys("buf2 write error");
// write 10 bytes of buf2 to fd (offset = 50)
Note that ls reports that file.hole is 50 bytes. We can then see the hole with od:
:~/code/c/apue/ch3> ll
total 20K
-rw-r--r-- 1 anonymous anonymous  542 2007-12-05 23:42 hole.c
-rwxr-xr-x 1 anonymous anonymous 8.5K 2007-12-05 23:42 a.out*
-rw-r--r-- 1 anonymous anonymous   50 2007-12-05 23:56 file.hole
f:~/code/c/apue/ch3> od -c file.hole 
0000000   a   b   c   d   e   f   g   h   i   j  \0  \0  \0  \0  \0  \0
0000020  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040  \0  \0  \0  \0  \0  \0  \0  \0   A   B   C   D   E   F   G   H
0000060   I   J
0000062
:~/code/c/apue/ch3> 
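A related check: ls reports the byte length, while ls -s and du report allocated blocks. This 50-byte file still fits inside a single disk block, so the hole saves nothing here, but if you lseek a few megabytes instead of 40 bytes you can watch the allocated block count stay far below the byte length:
ls -ls file.hole    # first column is blocks actually allocated
du -k file.hole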
Ch3 then provides an unbuffered cat which differs from K&R's Ch1.5.1 cat:
#include <stdio.h>
/* copy input to output; 2nd version  */
main()
{
    int c;
    while ((c = getchar()) != EOF)
        putchar(c);
}
Which uses Standard Buffered I/O and is covered later in apue ch5.

The rest of the chapter covers File Sharing Between Processes, Atomic Operations and ioctl.

apue init

I'm finally trying some example programs from apue. When I try to compile hole.c with ourhdr.h in the same directory I see an "undefined reference to `err_sys'" error. This makes sense since I haven't defined the error functions. One solution is to replace them with printf. Since the apue site has error.c it's easy enough to just include that file. I'm working in a subdirectory I have for apue and with some slight modifications to hole.c:
:~/code/c/apue/ch3> diff hole.c hole.c.orig 
4,5c4
< #include      "../ourhdr.h"
< #include      "../error.c"
---
> #include      "ourhdr.h"
:~/code/c/apue/ch3>
I've got the example working on my Ubuntu system.
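For reference, with the includes adjusted as above the build itself needs nothing special; something like:
gcc hole.c && ./a.out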

CUPS Export

You can export your CUPS settings as an XML file and then import them to another system. You can't just scp the /etc/cups/printers.conf file since it won't contain all of your printing configuration.

up2date has no dist-upgrade

With Debian/Ubuntu you can "apt-get dist-upgrade" but I know of no equivalent in RedHat. According to the RHEL5 release notes, you may "perform an upgrade from the latest updated version of Red Hat Enterprise Linux 4 to Red Hat Enterprise Linux 5". But they don't say how. MIT has some instructions which propose either having the installer upgrade or doing a fresh install. There also seem to be some yum issues after the upgrade.

Tuesday, December 4, 2007

LUNz

LUNz is created when a host has connectivity to an array, but no LUNs have been assigned to it. This allows the host to see the array and make use of a very limited set of SCSI commands. There are more or less technical explanations.

You can see if you have a LUNz device by looking for "Z" items in /proc/scsi/scsi. The last two entries below are LUNz's:

$ cat /proc/scsi/scsi
Attached devices:
...
Host: scsi1 Channel: 00 Id: 01 Lun: 00
Vendor: DGC Model: RAID 5 Rev: 0324
Type: Direct-Access ANSI SCSI revision: 04

Host: scsi1 Channel: 00 Id: 02 Lun: 00
Vendor: DGC Model: LUNZ Rev: 0324
Type: Direct-Access ANSI SCSI revision: 04

Host: scsi1 Channel: 00 Id: 03 Lun: 00
Vendor: DGC Model: LUNZ Rev: 0324
Type: Direct-Access ANSI SCSI revision: 04
...
$ 

If you've connected a host to a second SAN but haven't assigned any real LUNs to it, then an SP Collect will show you the port number of the second SAN's Service Processor, which is most likely a LUNz.

It's been reported that there can be blocked I/O to EMC CLARiiON LUNz paths.

You can get rid of a LUNz by removing the zone that connects your host to the particular SAN with the LUNz OR you could add a "real" LUN to that SAN. You'll then want to reload the HBA module since it allows the kernel to see the SCSI bus in its final state. Rebooting is recommended to be on the safe side.
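The module reload itself is just a remove and re-insert. A sketch, assuming a QLogic HBA using the qla2xxx driver (check lsmod for yours) and that nothing on the SAN is in use:
modprobe -r qla2xxx   # unload the HBA driver
modprobe qla2xxx      # reload it; the bus is rescanned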

Saturday, December 1, 2007

xubuntu latex

LaTeX on xubuntu is easy:
apt-get install tetex-base tetex-bin tetex-extra
Then I can use the Ultimate Latex Makefile to easily "make pdf" and "make clean". When making updates I tend to "make foo.pdf && xpdf foo.pdf".

Friday, November 30, 2007

cpan urllist

CPAN keeps trying to get to archive.progeny.com which is down. You can get around this by using o conf urllist to see the URLs and then shift to move the bad URL out:
cpan[5]> o conf urllist                                                       
    urllist           
        [ftp://archive.progeny.com/CPAN/]
        [ftp://carroll.cac.psu.edu/pub/CPAN/]
        [ftp://cpan.calvin.edu/pub/CPAN]
        [ftp://cpan.cs.utah.edu/pub/CPAN/]
        [ftp://cpan.mirrors.redwire.net/pub/CPAN/]
Type 'o conf' to view all configuration items


cpan[6]> o conf urllist shift                                                 

cpan[7]> o conf urllist                                                       
    urllist           
        [ftp://carroll.cac.psu.edu/pub/CPAN/]
        [ftp://cpan.calvin.edu/pub/CPAN]
        [ftp://cpan.cs.utah.edu/pub/CPAN/]
        [ftp://cpan.mirrors.redwire.net/pub/CPAN/]
Type 'o conf' to view all configuration items


cpan[8]> o conf commit                                                        
commit: wrote '/usr/lib/perl5/5.8.5/CPAN/Config.pm'

cpan[9]>   
You can also o conf urllist push ftp://... to add URLs.

qmail

djb's qmail is now in the public domain. His hardware page with standard workstations is pretty cool too.

script(1)

script(1) is an easy way to keep a log of what you typed:
$ script foo
Script started, file is foo
$ ls
foo  test.sh*
$ exit
exit
Script done, file is foo
$ wc -l foo
7 foo
$

Thursday, November 29, 2007

Disk I/O

I've been looking at I/O on my disks with iostat. iostat can be thought of as a wrapper to /proc/diskstats. You can see how the diskstats change by putting a test load on them. The following does lots of reads from a particular device that I'm trying to troubleshoot:
dd if=/dev/emcpowerb of=/dev/null bs=512 count=100000000000 &
In a multipath set up iostat might look like:
avg-cpu:  %user   %nice    %sys %iowait   %idle
           0.01    0.00    0.02    0.01   99.96

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
...
sdb               0.00         0.00         0.00       2496          0
sdc               0.01         2.52         0.00    2583090        496
sdd               0.00         0.00         0.00       2496          0
sde               0.01         2.52         0.00    2581664        952
...
emcpowerb         0.02         5.02         0.00    5150338       1448
Note that emcpowerb has the total of sdc and sde. You can use several dd commands like the above to drive the I/O load up. Look at all the time spent in wa (iowait - the amount of time the CPU has been waiting for I/O to complete):
top - 22:27:44 up 12 days,  3:26,  3 users,  load average: 7.05, 2.78, 1.04
Tasks: 124 total,   1 running, 123 sleeping,   0 stopped,   0 zombie
Cpu0  :  1.4% us,  4.8% sy,  0.0% ni,  0.0% id, 93.9% wa,  0.0% hi,  0.0% si
Cpu1  :  1.7% us,  7.1% sy,  0.0% ni,  0.0% id, 91.2% wa,  0.0% hi,  0.0% si
Cpu2  :  0.0% us,  0.3% sy,  0.0% ni, 81.8% id, 17.9% wa,  0.0% hi,  0.0% si
Cpu3  :  0.3% us,  3.4% sy,  0.0% ni,  6.1% id, 90.1% wa,  0.0% hi,  0.0% si
Mem:   8310532k total,  1028576k used,  7281956k free,   767020k buffers
Swap:  2031608k total,        0k used,  2031608k free,   128680k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
 2400 root      18   0  4836  420  356 D    2  0.0   0:00.83 dd                 
 2352 root      18   0  5364  420  356 D    2  0.0   0:06.15 dd                 
 2358 root      18   0  5036  420  356 D    2  0.0   0:04.05 dd                 
 2359 root      18   0  5204  420  356 D    2  0.0   0:04.06 dd                 
 2402 root      18   0  4492  420  356 D    2  0.0   0:00.86 dd                 
 2345 root      18   0  4112  420  356 D    2  0.0   0:09.21 dd                 
 2348 root      18   0  3884  416  356 D    2  0.0   0:06.40 dd                 
 2401 root      18   0  5276  420  356 D    2  0.0   0:00.81 dd                 
 2353 root      18   0  4348  420  356 D    1  0.0   0:06.09 dd   
and see how your multipath device handles it (note the 5 and 3 queued I/Os):
# powermt display dev=emcpowerb | egrep "sdc|sde"
   1 qla2xxx                   sdc       SP A1     active  alive      5      0
   2 qla2xxx                   sde       SP A0     active  alive      3      0

# iostat
...
avg-cpu:  %user   %nice    %sys %iowait   %idle
           0.01    0.00    0.02    0.04   99.93

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
...
sdb               0.00         0.00         0.00       2496          0
sdc               0.08        17.64         0.00   18117818        496
sdd               0.00         0.00         0.00       2496          0
sde               0.08        17.26         0.00   17725688        952
...
emcpowerb         0.15        34.89         0.00   35829090       1448
Note that there are obviously more reads. Note also how you can get similar stats directly from /proc/diskstats:
# cat /proc/diskstats | egrep "emc|sdc|sde"
   8   32 sdc 121192 15 26827106 2477633 51 0 496 467 7 641017 2478212
   8   64 sde 120903 15 26285720 2529294 45 0 952 472 2 653448 2529891
 120   16 emcpowerb 240323 6396919 53098682 5097396 96 85 1448 1013 10 .. ..
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    1      2       3        4       5  6  7    8   *9* 
Note the 9th column (# of I/Os currently in progress) which is the same queued I/O which "powermt display dev=emcpowerb" displayed. If I stop the load test (killall dd) you can see the queued I/O drop:
# cat /proc/diskstats | egrep "emc|sdc|sde"
   8   32 sdc 160459 15 35087122 3570191 51 0 496 467 0 879727 3570647
   8   64 sde 160766 15 34445608 3638404 45 0 952 472 0 892158 3638873
 120   16 emcpowerb 319453 8370224 69518314 7328191 96 85 1448 1013 0 .. ..
Here's an easy way to focus on the 9th column:
grep sdc /proc/diskstats | awk '{print $12}'
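To watch that column over time you can wrap it in a throwaway loop (the device names are examples):
while true; do
    awk '/sdc|sde|emcpowerb/ {print $3, $12}' /proc/diskstats
    sleep 1
done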

Wednesday, November 28, 2007

seq host

I looked up all the hosts in a subnet with:
for x in `seq 1 100`; do host 123.456.78.$x; done;
No big deal, but I'll log it here.

Tuesday, November 27, 2007

dd-wrt

My friend upgraded his router to dd-wrt. It's pretty cool. I might try the same for my WRT-54G v2.2.

Monday, November 26, 2007

dotlockfile

Today I learned about dotlockfile. I've put together an example to test my understanding. To use it start three terminals:
  • In terminal1 run the shell script below
  • In terminal2 run the shell script below (within 5 seconds)
  • In terminal3 view the PIDs with "cat /tmp/lock_test"
If you do the above correctly then terminal2's instance won't start until terminal1's instance is finished. Also, you should see different PIDs for each instance in terminal3.

Example Shell Script:

#!/bin/sh
# -------------------------------------------------------- 
# This program uses dotlockfile(1) to assure that no other 
# instances of itself will run.  Only useful as an example. 
# It works because other instances will also try to create 
# a lockfile of the same name and will find that the file 
# already exists.  It only locks a resource used by the same 
# program.  I.e. another program could choose to ignore the 
# lock file.  
# -------------------------------------------------------- 
dotlockfile -p -r2 /tmp/lock_test;  # lock this instance

TIME=5; # do something with resource (just sleeps)
echo "Sleeping for $TIME";  
echo "I.e. no other instances of me will run for $TIME";
sleep $TIME;
echo "Done, about to unlock for other instances";

dotlockfile -u /tmp/lock_test; # unlock this instance

Note that it's just a convention for locking between cooperating processes, useful since "the resource to be controlled is not a regular file at all, so using methods for locking files does not apply".

Moodle advises having cron do this while mirroring. I'm using it because I've got a cron job that's still running when another instance of it starts.

To install dotlockfile on RedHat you can get an RPM:

$ rpm -qlp dotlockfile-1.06.1-1mdv2007.0.i586.rpm 2> /dev/null
/usr/bin/dotlockfile
/usr/share/man/man1/dotlockfile.1.bz2
$ 
I.e. I couldn't easily find it on RHN. It seems to be installed by default on Ubuntu, but if you don't have it, it's available in Ubuntu's liblockfile1 package.

find old files

Delete all tar.gz files in $BDIR older than $DAYS:
find $BDIR -name \*.gz -ctime +$DAYS -exec rm '{}' \;
My backup script wasn't cleaning things correctly. The above did the trick. The find command is interesting.
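When testing an expression like that it's safer to preview the matches before deleting:
find $BDIR -name \*.gz -ctime +$DAYS -print
Note also that -ctime is inode change time; if "older than" should mean modification time, -mtime is the option to use.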

Wednesday, November 21, 2007

Cisco IOS CLI

I'm not the primary person for the SAN where I work but I'm trying to learn more about it. One component of it is a Fabric Switch which I can SSH into and use CLI for IOS.

Monday, November 19, 2007

nfslock vs imaps

I had a race condition on a server as it made it back into production: nfslock had grabbed port 993 before imaps could. This wasn't obvious at first from the imaps client error message. The first thing to do is always check your logs; I grepped based on when I restarted the service:
# egrep "^10:" 2007-11-19.log | grep -i imap | head -1
10:06:32.88 1 IMAP failed to start listener on [123.456.78.9:993]. Error
Code=network address (port) is already in use
#
I hadn't expected another service to grab that port but there it was:
# netstat -tulpn | grep 993
tcp        0      0 0.0.0.0:993                 0.0.0.0:*
    LISTEN      5899/rpc.statd
#
man rpc.statd: "The rpc.statd server implements the NSM (Network Status Monitor) RPC protocol... used by the NFS file locking service, rpc.lockd, to implement lock recovery when the NFS server machine crashes and reboots." This server used to use NFS but doesn't anymore. I stopped the service:
service nfslock stop
chkconfig nfslock off
and made sure the chkconfig change would prevent it from coming back up:
# chkconfig --list | grep nfslock
nfslock         0:off   1:off   2:off   3:off   4:off   5:off   6:off
#
While investigating this I saw that others had seen rpc.statd running on various ports. The man page said that "rpc.statd will ask portmap(8) to assign it a port number. As of this writing, there is not a standard port number that portmap always or usually assigns. Specifying a port may be useful when implementing a firewall" (thus the -p option). I find it odd that it just happened to grab a port that my server needed.
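If you ever need nfslock but want it off your service ports, the -p option the man page mentions pins the port; a sketch (the port number here is arbitrary):
rpc.statd -p 4000
Some RedHat releases can reportedly also set this via STATD_PORT in /etc/sysconfig/nfs.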

Saturday, November 17, 2007

zimbra rhel5 install

RedHat has a guide for installing Zimbra on RHEL5. They say that it should be installed on a "clean" RHEL5 system. One thing I don't like about RHEL5 is that its default install includes X. I used to like just selecting "minimal" when installing RHEL4.

Monday, November 12, 2007

Zimbra 1

I'm considering moving to Zimbra. I'm evaluating it on an Ubuntu Server test system and it was really easy to install. It's not like I just apt-got it, but it was very easy to download their tgz file (full of debs) and run their script. There's a guide on howtoforge. I had to trick it into thinking I was running Ubuntu 6, not 7. I got it authenticating off of my OpenLDAP server in 10 minutes:
ldap_filter:            (&(uid=%u))
ldap_search_base:       o=domain
I'm now trying to import a list of users. Since Zimbra uses OpenLDAP to store account data I think I'll have to use that as my interface. I'm able to export them:
openldap/sbin/slapcat -f /opt/zimbra/conf/slapd.conf -l /tmp/ldap.ldif
But even if I used the last 14 lines of the ldif file I don't think I could just re-import the file. I might be able to feed the file to a script which would re-create the account in the mail store, but I'm speculating. Time to read more documentation. I want to pilot on multiple servers, and the Multi-Server Install guide says I only need one license (for my beloved MAPI and mobile users) - for the master LDAP server - and that the mailbox and MTA servers should be installed after the LDAP server, in that order. I now want to treat my pilot server as nothing but an LDAP server and delegate the mailbox and MTA duties to other servers. The Multi-Server Install guide describes how to set this up from scratch as opposed to upgrading from single to multiple servers, so I'm looking to re-install.

Sunday, November 4, 2007

xli

I browse my directories with a shell. I occasionally come across images and I don't want to go back to my browser or use some other GUI program to view them. I just want to type a command - the same way I type xpdf - to view the image. xv is non-free. Ubuntu has an xli package which works nicely. Remember r and l rotate the image. Just use q to quit.

down with regex

I wrote a quick python script to parse CSV files and add them to zone files. In general I think I'm going to try to write more robust tools that I can reuse instead of quick little programs. A friend of mine looked at the code and offered some advice. One thing I realize is that I turn to regular expressions too much.

Regular expressions tend to be overkill especially for simple things. User input should almost never be turned into a regex. A lot of string operations can be effectively resolved more simply. Look at the string and try to make some rule based on index math and substrings.

I wanted to know if a string ended with a substring. I tried this:

host_re = re.compile('\.domain\.tld$')
if (host_re.search(host)): 
  # do something
Instead we ended up with:
def ends_with(x, y):
    return len(x) == x.rfind(y) + len(y)
If y is found within x we get the index, or location, where it was found. We add the index to the length of y and this value must equal the length of x. (If y isn't found, rfind returns -1, which the variation below guards against; Python's built-in x.endswith(y) also handles this.) This is better because a regex tends to introduce complications. Here's a variation of the above which covers the case where the other string is longer or not found at all:
pos = host.rfind(zone_line)
if (pos > -1 and len(host) == pos + len(zone_line)):
   # do something

Tuesday, October 30, 2007

redhat hang

If your RedHat system gets really overloaded and hangs here are some things that might help:
  • SysRq: SysRq is a key combo you can hit which the kernel will respond to regardless of whatever else it is doing, unless it is completely locked up (see the sketch after this list).
  • Hangwatch: Hangwatch periodically polls /proc/loadavg, and echos a user-defined set of characters into /proc/sysrq-trigger if a user-defined load threshold is exceeded.
  • NMI: Non-Maskable Interrupt (NMI) is a mechanism to detect system lockups. It enables the built-in kernel deadlock detector. By executing periodic NMI interrupts, the kernel can monitor whether any CPU has locked up and print out debugging messages as needed.
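For example, SysRq can also be poked from software, which is what Hangwatch automates; a sketch, assuming a kernel built with magic SysRq support:
echo 1 > /proc/sys/kernel/sysrq    # enable SysRq
echo t > /proc/sysrq-trigger       # dump all tasks to the kernel log
echo m > /proc/sysrq-trigger       # dump memory usage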

Thursday, October 25, 2007

five-minute monitoring

A system independent of my mail server will text my phone if any of the ports on the mail server are not accessible. It checks every 5 minutes, which is about how long it took to whip this thing up.
me@workstation:~$ crontab -l
*/5 * * * * /bin/sh /home/me/code/shell/monitor.sh > /dev/null 2>&1
me@workstation:~$ cat /home/me/code/shell/monitor.sh
#!/bin/sh

EMAIL="my_number@cell_phone_company.com";
HOST="mailserver.domain.tld";
PORTS="25,80,110,143,443,587,993,995";
CMD=`/usr/bin/nmap -p $PORTS $HOST | grep tcp | grep -v open`;

if [ -n "$CMD" ] # if output of command has non-zero length
then
    echo $CMD | /bin/mail $EMAIL;
else
    echo "$PORTS are open on $HOST";
fi
me@workstation:~$

Wednesday, October 24, 2007

DNS MX hacks

In a previous post I talked about mail gateway load balancing by having two MX records in BIND. I mentioned that I'd want the more powerful server chosen a larger percentage of the time.

You would think I could just add another instance of the same host:

MX 10 mta0
MX 10 mta1
MX 10 mta1
but this doesn't work. BIND just ignores the second entry as redundant. I could make mta2 a CNAME for mta1 and then add mta2 as a third MX record. I've done some tests in a test environment and this works. However, this is a hack, and pointing MX records at CNAMEs is theoretically not permitted (RFC 1034 section 3.6.2).
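For the record, the hack looks roughly like this in the same shorthand as above:

MX 10 mta0
MX 10 mta1
MX 10 mta2
mta2    CNAME   mta1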

mail gateway load balancing

I have a dedicated mail gateway (mta0) which filters spam. It's been overworked so I set up a second (mta1). mta0 is the master and stores spam definitions and user preferences in a MySQL DB. mta1 is a slave and receives these content updates from mta0.

In order to get both systems sharing the load I simply add a second MX record for mta1. Here are the relevant portions of my zone file before:

$ grep -n MX domain.tld.zone
16:                     MX      10 mta0.domain.tld.
359:mail                   MX      10 mta0.domain.tld.
$                                                                               
and after:
$ grep -n MX domain.tld.zone
16:                     MX      10 mta0.domain.tld.
17:                     MX      10 mta1.domain.tld.
360:mail                   MX      10 mta0.domain.tld.
361:mail                   MX      10 mta1.domain.tld.
$  
BIND will automatically swap the order of either MX record for a given lookup. E.g. note how 0 or 1 end up on top for alternating queries:
$ dig @dns.domain.tld domain.tld +short MX 
10 mta0.domain.tld.
10 mta1.domain.tld. 
$ dig @dns.domain.tld domain.tld +short MX
10 mta1.domain.tld.
10 mta0.domain.tld.  
$ 
It then just takes a little time for your DNS updates to propagate. You can test your changes by using mxtoolbox.com or sending mail from hosts like gmail and yahoo and seeing which mta relayed by viewing full headers. Before you drop a second email hub into service be sure that it sends mail where it should. It would be a shame if half of your mail was lost. Use the following test and make sure you get the email where you'd expect. You might need to adjust your spam filter to let the test message below through:
telnet mta1 25
HELO workstation.domain.tld
MAIL FROM: me@domain.tld
RCPT TO:me@domain.tld
DATA
test
.
One nice thing about this is that you can add the second system without any downtime. mta0 does not need to be brought offline; it's just a matter of waiting for DNS to propagate. Since mta1 has twice as much CPU and RAM as mta0 I'm going to look into weighting the records so that mta1 gets more of the load.

Saturday, October 20, 2007

/lib/modules space hack

I made / too small for xubuntu and had trouble upgrading my kernel from 2.6.20-15 to 2.6.20-16.
dpkg: error processing 
/var/cache/apt/archives/linux-restricted-modules-2.6.20-16-generic_2.6.20.5-16.29_i386.deb 
(--unpack):
 failed in buffer_write(fd) (9, ret=-1): backend dpkg-deb during 
`./lib/linux-restricted-modules/2.6.20-16-generic/nvidia_legacy/nv-kernel.o': 
No space left on device
The issue was that /lib/modules/2.6.20-15-generic was taking up too much space (~100M):
$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             236M  214M  9.4M  96% /
varrun                375M  100K  375M   1% /var/run
varlock               375M     0  375M   0% /var/lock
procbususb            375M  100K  375M   1% /proc/bus/usb
udev                  375M  100K  375M   1% /dev
devshm                375M     0  375M   0% /dev/shm
lrm                   375M   33M  342M   9% /lib/modules/2.6.20-15-generic/volatile
/dev/sda7              20G  1.4G   18G   8% /home
/dev/sda5             4.6G  2.3G  2.1G  53% /usr
/dev/sda6             1.9G  800M  982M  45% /var
My klugey fix (this system wasn't too important) was to "rm -r /lib/modules/2.6.20-15-generic" (after I tarred it up on another partition). There were complaints about not being able to remove volatile, but I was counting on that and left it; everything else made way. I did this while running the 2.6.20-15 kernel; I figured that what I would need for this operation was in RAM. The upgrade worked, so I was then able to boot off of the 2.6.20-16 kernel. I then removed the 2.6.20-15 kernel. I guess I only have room for one at a time.
$ sudo dpkg -r linux-image-2.6.20-15-generic
Password:
(Reading database ... 102721 files and directories currently installed.)
Removing linux-image-2.6.20-15-generic ...
Running postrm hook script /sbin/update-grub.
You shouldn't call /sbin/update-grub. Please call /usr/sbin/update-grub instead!

Searching for GRUB installation directory ... found: /boot/grub
Testing for an existing GRUB menu.lst file ... found: /boot/grub/menu.lst
Searching for splash image ... none found, skipping ...
Found kernel: /boot/vmlinuz-2.6.20-16-generic
Found kernel: /boot/memtest86+.bin
Updating /boot/grub/menu.lst ... done

The link /vmlinuz.old is a damaged link
Removing symbolic link vmlinuz.old 
Unless you used the optional flag in lilo, 
 you may need to re-run your boot loader[lilo]
The link /initrd.img.old is a damaged link
Removing symbolic link initrd.img.old 
Unless you used the optional flag in lilo, 
 you may need to re-run your boot loader[lilo]

Thursday, October 18, 2007

load watch

If you want to keep your eye on a system and you don't even have cron you can keep this script running. It wakes up every 10 minutes and sends me an email if the load is more than six:
#!/usr/bin/env python
# Filename:                load_watch.py
# Description:             emails me if load too high
# Supported Language(s):   Python 2.5.1
# -------------------------------------------------------- 
import commands
import os
import time
import smtplib

def main():
    while(1):
        frequency = 60 * 10 # ten minutes
        warn_load = 6.0
        loadcmd = 'cut -d " " -f1 /proc/loadavg'
        loadavg = float(commands.getoutput(loadcmd))
        if loadavg > warn_load:
            send_warning(loadavg)
        time.sleep(frequency)
        print loadavg
    
def send_warning(loadavg):
    """emails high load as subject"""
    print "warning"
    domain = 'domain.tld'
    smtpServer = 'mail.' + domain
    fromAddr = 'load_watch@mail.' + domain
    toAddr = 'me@' + domain

    msg = ""
    msg += "To: " + toAddr + "\n"
    msg += "From: " + fromAddr + "\n"
    msg += "Subject: " + 'Mail Load: %s' % loadavg
    msg += "\n\n"

    server = smtplib.SMTP(smtpServer)
    server.set_debuglevel(0)
    server.sendmail(fromAddr, toAddr, msg)

if __name__=="__main__":
   main()

Wednesday, October 17, 2007

ping keep shell

If you don't want an SSH session to timeout you can leave the following command running.
ping -i 30 $host
This command sends a single ping to $host every 30 seconds by using ping's interval option.

Tuesday, October 16, 2007

blackberry spoofing?

I have a colleague who does user support with blackberry devices. We ended up looking at the headers of a message from one of the new devices that he was testing. He was told that the new device uses a different protocol. The first header looked something like this:
Received: from mail.domain.tld (HELO domain.tld) ([123.456.78.9])
  by as16.bis.na.blackberry.com with ESMTP; 11 Oct 2007 20:29:46 +0000
Note that there's no message ID and I have nothing in my logs from this transaction. A normal message sent to google looks like this:
Received: from domain.tld (mail.domain.tld [123.456.78.9])
        by mx.google.com with ESMTP id  i35si14940528wxd.2007.10.16.11.05.19;
        Tue, 16 Oct 2007 11:05:19 -0700 (PDT)
Note the message ID and that I can confirm an SMTP handshake in my logs:
14:05:20.14 2 SMTP-25607(gmail.com) [12046375] sent to [66.249.83.27:25], 
got:250 2.0.0 OK 1192557920 i35si14940528wxd
In the case of the blackberry the first header really came from them. I suspect that the device connected to their server over the cellular network to send the mail. Their server then wrote that header to say it was from us, not them. So this first header misrepresents what actually happened, as far as I can tell.

Friday, October 12, 2007

DNS Math

; I'm posting this one in Elisp.  
; 
; Someone I work with entered 200710110333 instead of 
; 2007101103 for a serial number field of the SOA RR on 
; our root DNS server.  Once this high number propagated 
; DNS updates with the correct date broke.  
; 
; The problem is that the following is true:

(< 2007101201 200710110333)

; so today's updates didn't propagate to our other servers.  
; 
; According to:  
;  http://www.zytrax.com/books/dns/ch9/serial.html
; 
; "perhaps ritual suicide is the best option" it also says:  
; 
;; The SOA serial number is an unsigned 32-bit field with
;; a maximum value of 2**31, which gives a range of 0 to 
;; 4294967295, but the maximum increment to such a number 
;; is 2**(31 - 1) or 2147483647, incrementing the number 
;; by the maximum would give the same number.  
; 
; Did I read that math right?  I think the key word here 
; is "usigned".  Checking up on this:  

(insert-string (expt 2 32))
(insert-string (expt 2 31))
(insert-string (expt 2 30))

; Inserts 4294967296, 2147483648 and 1073741824 into the 
; buffer respectively.  Also, the notation is bad, I think they 
; mean  (2**31) - 1 not 2**(31 - 1).  More precisely one of 
; the two:

(- (expt 2 31) 1)
(- (expt 2 32) 1)

; I guess I'll compute both and use this to solve my problem.  
; Let's assume that the following is true:
;
;; An unsigned 32-bit field with a maximum value of 2**31
;
; Actually let's check the RFC:
; 
;; http://www.faqs.org/rfcs/rfc1982.html
; 
; which defines SERIAL as:
;
;; The unsigned 32 bit version number of the original copy of
;; the zone.  Zone transfers preserve this value.  This value
;; wraps and should be compared using sequence space arithmetic.
;            
; It is "the maximum is always one less than a power of two."

; The DNS-Pro book then says:
; 
;; Using the maximum increment, the serial number fix is a two-step
;; process. First, add 2147483647 to the erroneous value, for example,
;; 2008022800 + 2147483647 = 4155506447, restart BIND or reload the zone,
;; and make absolutely sure the zone has transferred to all the slave
;; servers. Second, set the SOA serial number for the zone to the correct
;; value and restart BIND or reload the zone again. The zone will
;; transfer to the slave because the serial number has wrapped through
;; zero and is greater that the previous value of 4155506447! RFC 1982
;; contains all the gruesome details of serial number comparison
;; algorithms if you are curious about such things.
; 
; OK so what's the real_error value?  It wrapped by 2^32:

(mod 200710110333 (expt 2 32))

; Which makes sense since
; 
; me@workstation:~> dig @nameserver mta1.domain.tld 
; ...
; domain.tld.          3600    IN      SOA     
; nameserver.domain.tld. hostmaster.domain.tld. 
; 3141614717 1800 900 86400 3600
; me@workstation:~>

; So, I'm going to set my root server's SOA SN to 3141614717 

(mod 200710110333 (expt 2 32))

; To get everyone back in sync.  dig verified that they're 
; back in sync in less than an hour:
;
;; for x in dns1 ... dnsN; 
;;   do dig @$x domain.tld SOA +short; 
;; done
;
; I'm then free to set it to:   

(let ((today 2007101201)) 
  (- (expt 2 32) 
     1 
     (abs (- today (- (expt 2 31) 1)))))

; or greater than 0 but less than:  

(let ((error_value 3141614717)) 
  (mod 
   (+ (- (expt 2 31) 1) error_value)
   (expt 2 32)))

; For fun my colleague and I went with 666 and then did 
; an "rndc reload" and then set it to today and reload 
; again. 

mail arriving out of order?

If your users complain about mail arriving out of order you can tell them the story below. But if they're complaining that this happens a lot, it's probably a symptom of your MTA being overloaded.

The mail protocol makes no guarantee that your messages will arrive in order.

If the mail server for foo.com tries to relay message-0 sent at 2:00 to mta0.domain.tld and mta0 doesn't have the resources, it can and will refuse the SMTP connection. It's also possible that foo.com will then put message-0 back in its queue to not be resent for as long as root@foo.com sees fit to configure it. Let's assume this value is 1 hour. foo.com could then try to send message-1 (that is, not message-0, which is still in the queue) at 2:05. When it tries to get its SMTP connection this time mta0 has resources at that moment and accepts the message for delivery. Then at 3:00 foo.com tries to make another connection for message-0 to mta0 which again has resources and accepts the message. So, in the end message-0 sent at 2:00 arrives at 3:00 and message-1 sent at 2:05 arrives at 2:05. All of this is perfectly legal.

host2dig

I'm trying to get used to using dig instead of host. Here are two simple idioms to start the conversion:

Get an IP from a particular DNS server:

 dig @dns-server $host_name +short

Get a hostname from a particular DNS server:

 dig @dns-server -x $ip_address +short

Then read the DiG HOWTO.

Wednesday, October 10, 2007

mail

This is a handy command for testing:
echo -e "sent on `date`" | mail -s "test: `hostname`" foo@domain.tld
It will send mail quickly from the command line and there's no need to muck about with any clients. After running this you can go see how your message is doing in /var/spool/mqueue etc.

Tuesday, October 9, 2007

tar -C

Would you believe that I got this far without knowing the tar -C option?

If you do the following:

tar xzf foo.tar.gz -C /
then /foo will contain the contents of foo.tar.gz.

Note that the -C means change to the specified directory so that the contents of your extract end up there. As the man page says:

       -C, --directory DIR
              change to directory DIR
I normally just put the tarball where I want it extracted and then extract it. My problem tonight was that the partition where I wanted to extract it was not big enough for both the tarball and its files. I had to extract it to that directory from a different partition. Of course I had to wait until I saw a disk full error to realize this. If I had known about the option I would probably be home now and not waiting for a 5G tarball to uncompress into 15G. I'll probably remember it now.
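-C is just as handy in the other direction: when creating an archive you can keep the source directory out of the stored paths (the paths here are made up):
tar czf /backup/foo.tar.gz -C /data/foo .
This stores the contents of /data/foo at the top level of the archive instead of under data/foo/.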

Monday, October 8, 2007

tcpdump

If you're debugging network services between two hosts don't forget to let tcpdump help you.

E.g. if host1 can't SSH to host2 and you think an external firewall is blocking you, try this:

1. Have host2 display TCP info on port 22 and display only results containing host1:

host2$ tcpdump -i eth0 tcp port 22 | grep host1
2. Try to SSH from host1 to host2.

If an external firewall is blocking them you should see nothing from the command above. However, if you see something like the following, then the external firewall isn't blocking you and it's an issue between the two hosts:

host2$ tcpdump -i eth0 tcp port 22 | grep host1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
14:08:50.412007 IP host1.domain.tld.54403 >
host2.domain.tld.ssh: S 2707949880:2707949880(0) win 5840 
14:08:53.411702 IP host1.domain.tld.54403 >
host2.domain.tld.ssh: S 2707949880:2707949880(0) win 5840 
14:08:59.411422 IP host1.domain.tld.54403 >
host2.domain.tld.ssh: S 2707949880:2707949880(0) win 5840 

In the case above host2 is unable to get back to host1 to complete the TCP handshake. You could then try reaching host1 from host2 and debug the resulting issue (e.g. a broken netmask on host2).
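As an aside, tcpdump's own filter language can replace the grep and cuts the noise at capture time:
tcpdump -i eth0 'tcp port 22 and host host1'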

A healthy looking tcpdump from host2 would look like this:

host2$ tcpdump -i eth0 tcp port 22 | grep host1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
14:11:12.101381 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: S 1120004832:1120004832(0) win 5840 
14:11:12.101391 IP host2.domain.tld.ssh >
host1.domain.tld.54432: S 320695995:320695995(0) ack 1120004833 win
5792 
14:11:12.101498 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: . ack 1 win 183 
14:11:12.107969 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1:24(23) ack 1 win 1448 
14:11:12.108063 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: . ack 24 win 183 
14:11:12.108165 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 1:23(22) ack 24 win 183 
14:11:12.108173 IP host2.domain.tld.ssh >
host1.domain.tld.54432: . ack 23 win 1448 
14:11:12.108341 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 23:663(640) ack 24 win 183

14:11:12.108347 IP host2.domain.tld.ssh >
host1.domain.tld.54432: . ack 663 win 1768 
14:11:12.109143 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 24:664(640) ack 663 win 1768

14:11:12.109313 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 663:687(24) ack 664 win 223

14:11:12.111342 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 664:816(152) ack 687 win 1768

14:11:12.113385 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 687:831(144) ack 816 win 223

14:11:12.119580 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 816:1280(464) ack 831 win 1768

14:11:12.121828 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 831:847(16) ack 1280 win 263

14:11:12.162117 IP host2.domain.tld.ssh >
host1.domain.tld.54432: . ack 847 win 1768 
14:11:12.162204 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 847:895(48) ack 1280 win 263

14:11:12.162219 IP host2.domain.tld.ssh >
host1.domain.tld.54432: . ack 895 win 1768 
14:11:12.162262 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1280:1328(48) ack 895 win 1768

14:11:12.163845 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 895:959(64) ack 1328 win 263

14:11:12.164001 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1328:1392(64) ack 959 win 1768

14:11:12.166947 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 959:1055(96) ack 1392 win 263

14:11:12.168083 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1392:1456(64) ack 1055 win 1768

14:11:12.170456 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 1055:1151(96) ack 1456 win 263

14:11:12.170493 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1456:1520(64) ack 1151 win 1768

14:11:12.170635 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 1151:1519(368) ack 1520 win 263

14:11:12.171024 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1520:1840(320) ack 1519 win 2088

14:11:12.180209 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 1519:2159(640) ack 1840 win 263

14:11:12.183403 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1840:1872(32) ack 2159 win 2408

14:11:12.183659 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 2159:2223(64) ack 1872 win 263

14:11:12.187702 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1872:1920(48) ack 2223 win 2408

14:11:12.187923 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 2223:2671(448) ack 1920 win 263

14:11:12.201571 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1920:1968(48) ack 2671 win 2728

14:11:12.201596 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1968:2080(112) ack 2671 win 2728

14:11:12.201691 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: . ack 2080 win 263 
14:11:12.229982 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 2080:2144(64) ack 2671 win 2728

14:11:12.239626 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 2144:2208(64) ack 2671 win 2728

14:11:12.239719 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: . ack 2208 win 263 

If you read the above you can see host1 ack'ing 1120004833 from host2 to establish a connection. There are plenty of tcpdump examples on the Interblag.

Wednesday, October 3, 2007

nice

I experimented with nice values and top and I'm pasting my results here.

When running top note the value of PR vs NI:

PR:

The priority number is calculated by the kernel and is used to determine the order in which processes are scheduled. The kernel takes many factors into consideration when calculating this number, and it is not unusual to see large fluctuations in this number over the lifetime of a process.

NI:

This column reflects the "nice" setting of each process. A process's nice is inherited from its parent. Most user processes run at a nice of 0, indicating normal priority. Users have the option of starting a process with a positive nice value to allow the system to reduce the priority given to that process. This is normally done for long-running cpu-bound jobs to keep them from interfering with interactive processes. The Unix command "nice" controls setting this value. Only root can set a nice value lower than the current value. Nice values can be negative. On most systems they range from -20 to 20. The nice value influences the priority value calculated by the Unix scheduler.

To see these values in action I'll start two CPU intensive processes:

someguy@machine:~$ dd if=/dev/urandom of=/dev/null & 
[1] 31406
someguy@machine:~$ dd if=/dev/urandom of=/dev/null & 
[2] 31415
someguy@machine:~$ 

I see them both at the top of the stack:

  
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
31406 someguy   25   0  9200  772  620 R  100  0.0   3:48.62 dd                 
31415 someguy   25   0  9204  772  620 R  100  0.0   3:28.74 dd                 

Note their priority of 25 and their nice value, let's see how they change as we renice them.

Remember, the nicer a program the lower its priority. The less nice a program the higher its priority. For example:

  • Nice value of 19 is letting people walk on you
  • Nice value of 5 is holding the door
  • Nice value of -20 is pushing people out of the way

You can nice a program when you start it:

   nice -n -10 xmms 
If it's already running:
   renice 19 -p $pid
In this case I'll move one way up and one way down:

lower 406: renice 19 -p 31406
raise 415: renice -19 -p 31415

The results of the first:

someguy@machine:~$ renice 19 -p 31406
31406: old priority 0, new priority 19
someguy@machine:~$ 

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
31415 someguy   25   0  9204  772  620 R  100  0.0   6:50.54 dd                 
31406 someguy   39  19  9200  772  620 R  100  0.0   7:10.48 dd              

So its PR number went up along with the nice value; since a bigger number means less claim on the CPU, scheduling priority is inversely related to both numbers. Now, I raise the other:

someguy@machine:~$ sudo renice -19 -p 31415
Password:
31415: old priority 0, new priority -19
someguy@machine:~$ 

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
31406 someguy   39  19  9200  772  620 R  100  0.0   8:46.89 dd                 
31415 someguy    6 -19  9204  772  620 R  100  0.0   8:26.92 dd               
So, I've raised the priority of 31415. I could maximize it:
someguy@machine:~$ sudo renice -20 -p 31415
31415: old priority -19, new priority -20
someguy@machine:~$ 

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
31406 someguy   39  19  9200  772  620 R  100  0.0  17:00.51 dd                 
31415 someguy    5 -20  9204  772  620 R  100  0.0  16:41.13 dd              

Note that I still can't get it past 5 with the lowest possible nice value. Note also that these are the two most CPU intensive processes so they both take the top of the stack, even with the most extreme nice values. If I had a third it should be sandwiched between the two.

someguy@machine:~$ dd if=/dev/urandom of=/dev/null & 
[1] 32089
someguy@machine:~$ 

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
32089 someguy   25   0  9200  768  620 R  100  0.0   0:21.44 dd                 
31415 someguy    5 -20  9204  772  620 R  100  0.0  18:29.59 dd                 
31406 someguy   39  19  9200  772  620 R  100  0.0  18:48.88 dd                

Now top shows it scheduling them round robin. Over successive refreshes the order (by the last digits of each PID) was:

15, 6, 89 :: 15, 89, 6 :: 6, 89, 15 :: 89, 6, 15 :: 6, 15, 89

But 15 is on top 2/3 of the time. Interesting. It's still letting the other jobs at the resources.

Now, I'll start a fourth CPU hog with a nice value from the start:

someguy@machine:~$ sudo nice -n -20 dd if=/dev/urandom of=/dev/null &
[2] 32288
someguy@machine:~$ 

I predict that 88 and 15 will stay on top most of the time. Note that the new one was started with a negative nice value, which required sudo, so it's running as root:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
31415 someguy    5 -20  9204  772  620 R  100  0.0  30:48.50 dd                 
32288 root       5 -20  9200  772  620 R  100  0.0   4:59.89 dd                 
31406 someguy   39  19  9200  772  620 R   99  0.0  31:05.53 dd                 
32089 someguy   25   0  9200  768  620 R   99  0.0  12:39.09 dd                

Now I'll kill 88 and 15. Then let's try raising the importance of 6 and introducing a new process with a priority just below it:

renice -20 -p 31406 && nice -n -19 dd if=/dev/urandom of=/dev/null &


someguy@machine:~$ sudo su -
root@machine:~# renice -20 -p 31406 && nice -n -19 dd if=/dev/urandom of=/dev/null &
[1] 32656
root@machine:~# 31406: old priority 19, new priority -20

root@machine:~# 

top shows:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
31406 someguy    5 -20  9200  772  620 R  100  0.0  39:28.19 dd                 
32089 someguy   25   0  9200  768  620 R  100  0.0  21:02.10 dd                 
32658 root       6 -19  9200  768  620 R  100  0.0   0:22.50 dd              

Wasn't that fun?

Remember when running top that you can see how all of the CPUs are working by pressing 1.

printing

No, I don't mean having your program print something to the screen:
printf("Hello, world...\n");
I mean sending a PS file (if you're lucky) to a mechanical device which puts its content on paper. What good is that? You can't grep paper. You can't pipe programs to or from it, or even copy and paste with it. I guess it works when the power is out, but you can't read it in the dark. A wise man once told me "the only time I've had to print something is when I had to hand it to an idiot". Since this post is about doing something that's not cool, I'll describe a non-cool way to approach it; it relates to a non-cool part of my job and I'm just logging it here. Sorry it's lame. I hope the anecdote made this post worth reading on some level.

Anyway, if you're configuring a RedHat box that wasn't minimally installed, you can add printers remotely by doing:

ssh -X server
Once you're in you can do:
sudo /usr/sbin/system-config-printer-gui

<sarcasm>love it!</sarcasm> Note that the GUI requires you to have something present for the printer path. This seems like a bug, since there are times when I want to print to a network printer like a LaserJet 4000 and the field simply doesn't apply. If this happens you can create the printer with the GUI, then edit the file it generates (/etc/cups/printers.conf) and delete that path.
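If you'd rather skip the GUI entirely, something like the following should create a raw queue pointing at a JetDirect-style network printer (queue name and hostname hypothetical; attaching a driver/PPD is a separate step):
 /usr/sbin/lpadmin -p lj4000 -E -v socket://printer.domain.tld:9100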

Monday, September 24, 2007

PowerPath Kernel Updates

I do the following when updating the kernel of a RHEL system that uses PowerPath to HBA mount a Clariion.
  1. Prepare system to be rebooted without the SAN
    When the system is rebooted with the new kernel the SAN won't be there so make sure you don't try to use it:
    • Stop any processes that use the HBA-mounted LUNs and remove their startup scripts from /etc/rc*.d for after the reboot.
    • Umount any HBA LUNs and then comment out their entries in /etc/fstab (a sketch of this step follows the list).
  2. Install your new kernel
    Explicitly tell up2date to get your new kernel. E.g.:
     up2date -i kernel-hugemem
    
    You might need to do an up2date --configure and set pkgSkipList (option 20 from the list) to an empty string. You might wish to undo this later so that you don't get a new kernel with each up2date (otherwise you might have to follow these steps again when you're not ready).
  3. Update PowerPath for the new Kernel
     rpm -Uvh --replacepkgs EMCpower.LINUX-5.0.0-157.rhel.i386.rpm
    
    After doing the above PowerPath should start without an emcpmpx module error. Note that you might need the x86_64 RPM. The RPM comes from EMCpower.LINUX.*.tar.gz.
  4. Undo what you did in step 1 (in reverse order) and reboot.
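A minimal sketch of step 1, assuming the application has an init script and the LUN is mounted under /san (all names hypothetical):
 service myapp stop
 chkconfig myapp off                        # don't start it on reboot
 umount /san/data                           # unmount the HBA-mounted LUN
 sed -i 's|^/dev/emcpower|#&|' /etc/fstab   # comment out its fstab entries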

Of course if I was really cool I would just use the Free software MPIO instead of PowerPath.

Thursday, September 20, 2007

ssh agent

My users asked about how to use ssh-agent so I sent them this:

1. Start the SSH agent

workstation:~$ eval `ssh-agent`
Agent pid 26147
workstation:~$
Note that the above is a back-tick, not a single quote. It should be on the upper-left of a standard PC keyboard. If you try this and get:
Could not open a connection to your authentication agent. 
then your session is not running under the ssh-agent. You can get around this by starting a new shell under the agent by running:
exec ssh-agent bash
2. Make the agent aware of your key (and type passphrase):
workstation:~$ ssh-add
Enter passphrase for /home/me/.ssh/id_rsa:
Identity added: /home/me/.ssh/id_rsa (/home/me/.ssh/id_rsa)
workstation:~$
3. Confirm it has your key:
workstation:~$ ssh-add -l
2048 9b:fe:23:ed:9a:ff:be:ed:1d:b7:26:28:c9:68:b5:62
/home/me/.ssh/id_rsa (RSA)
workstation:~$
4. SSH to server1 and forward your key:
workstation:~$ ssh -AX server1
Last login: Thu May 31 11:58:34 2007 from workstation.domain.tld
[server1 ~]$
(note: it didn't prompt for a password since the agent cached the key)

5. SSH from server1 to server3

[server1 ~]$ ssh -AX server3
The authenticity of host 'server3 (123.456.789.45)' can't be established.
RSA key fingerprint is 6b:9d:98:60:36:8e:ef:d3:ea:90:0e:a8:cb:25:b2:90.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'server3,123.456.789.45' (RSA) to the list of
known hosts.
Last login: Wed May 30 17:42:33 2007 from workstation.domain.tld
[server3 ~]$
6. Create a file on server3
[server3 ~]$ echo "foo" > foo.txt
[server3 ~]$
7. Logout back to server1:
[server3 ~]$ exit
Connection to server3 closed.
[server1 ~]$
8. scp the file you left on server3 back to server1:
[server1 ~]$ scp server3:/home/me/foo.txt .
foo.txt                                     100%    4     0.0KB/s   00:00
[server1 ~]$
9. Log out of server1 and see that the agent is still running on your PC:
workstation:~$ ssh-add -l
2048 9b:fe:32:ed:9a:ee:fb:ea:1f:3b:22:83:9c:86:b5:62
/home/me/.ssh/id_rsa (RSA)
workstation:~$
10. Remove the key when you're done working:
workstation:~$ ssh-add -d ~/.ssh/id_rsa
Identity removed: /home/me/.ssh/id_rsa (/home/me/.ssh/id_rsa.pub)
workstation:~$
11. Verify it's no longer cached:
workstation:~$ ssh-add -l
The agent has no identities.
workstation:~$
12. Figure out the agent's PID and stop it. You were told the PID in step 1, but if you don't remember it you can find it:
workstation:~$ ps ax | grep 26147
26147 ?        Ss     0:00 ssh-agent
workstation:~$
and then kill that PID:
workstation:~$ kill 26147
workstation:~$
You can then make sure that the agent has died:
workstation:~$ ps ax | grep 26147
workstation:~$
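As a shortcut for these last steps, ssh-agent can kill itself using the SSH_AGENT_PID it set in step 1:
workstation:~$ eval `ssh-agent -k`
Agent pid 26147 killed
workstation:~$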
These last few steps are important, especially if you're done working and going to be away from your desk.

Thursday, September 13, 2007

GPG for the Masses

FireGPG tweaks Firefox so that you can highlight text and then right click to choose to de/en-crypt that text. I'm curious about the same thing but integrated with an operating system's file browser, so that you could right click on the icon for a file and then choose to produce a .gpg version of that file, or produce a decrypted version without the .gpg. I think this would help the average user get more comfortable with GPG.
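For what it's worth, the command-line equivalent of those two right-click actions is short (recipient and filename hypothetical):
 gpg --encrypt --recipient you@domain.tld file.txt    # produces file.txt.gpg
 gpg --output file.txt --decrypt file.txt.gpg         # recovers the plaintext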

Friday, September 7, 2007

TWM

I'm considering going back to twm. Small means fast and dependable. I also love green on black, and this lugu.org screenshot inspires me given all of the focus on bloated eye candy.

Tuesday, September 4, 2007

Huge Memory Kernel

Remember kids!

If 'uname -i' returns x86_64 and you have 16G or more of RAM (supposedly up to a tebibyte), then you don't need a special kernel.

If 'uname -i' returns i386 and you have 16G or more of RAM, then you need to install a huge memory kernel (on RedHat "up2date -i kernel-hugemem").
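A quick way to check both on a stock RedHat box:
 uname -i                       # architecture
 grep MemTotal /proc/meminfo    # total RAM in kB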

Tuesday, August 21, 2007

Perl array references... yuck

#!/usr/bin/perl
# -------------------------------------------------------- 
# I hate Perl's syntax for scalars vs arrays vs hashes and 
# all of the nasty tricks in between for referencing them. 
# 
# I want to iterate over a hash table of arrays and then 
# pass each array to a function.  
# -------------------------------------------------------- 
# From the Camel Book:

%HoA = (
    flintstones    => [ "fred", "barney" ],
    jetsons        => [ "george", "jane", "elroy" ],
    simpsons       => [ "homer", "marge", "bart" ],
);

# Here's the book's way to loop through this:  

for $family ( keys %HoA ) {
    print $family . ": ";
    for $i ( 0 .. $#{ $HoA{$family} } ) {
        print $HoA{$family}[$i] . " "; 
    }
    print "\n";
}

# If I want to grab each array that comes out I need to 
# dereference it with {} and declare it as an array with @
# I can't just say:  @fam_arr = @HoA{$family};
# So the value that comes out, which I referred to as
# a scalar, I'm now going to refer to as an array:

foreach $family ( keys %HoA ) {
    @fam_arr = @{ $HoA{$family} }; 
    print $family . ": ";
    foreach $name (@fam_arr) {
        print $name . " "; 
    }
    print "\n";
}

# Since Perl's parameter passing to functions expands arrays 
# into all of their values, I need to be explicit when I pass
# the array that I'm passing a reference to the array with \@
# The function then takes the scalar reference and I then tell 
# it to dereference it with {} and treat it as an array with @

foreach $family ( keys %HoA ) {
    @fam_arr = @{ $HoA{$family} }; 
    print $family . ": ";
    print_list(\@fam_arr);
    print "\n";
}

sub print_list {
    my ($fam_arr) = @_;
    foreach $name (@{$fam_arr}) {
        print $name . " "; 
    }
}

# I find this to be excessive compared to similar languages.
# Perhaps I don't know a better way with Perl or this is just
# not a paradigm Perl promotes.  I'm trying to abstract some 
# things to shift focus away from what I want to do to each
# array in a real program and I've done it, but I feel I had
# to introduce what seems like some ugly reference tricks.  

Perl talking to Oracle

There is some data in an Oracle database that I need to retrieve regularly with a Perl script. So I need some sort of Oracle client to connect to the database so I can develop, and I need a library for my script to use to talk to it regularly. Oracle offers such a library. After you register you can get some non-Free software which is a pain to install. I'll now rant about the software.

Oracle says:

  1. Download the appropriate Instant Client packages for your platform. All installations REQUIRE the Basic package.
  2. Unzip the packages into a single directory such as "instantclient".
  3. Set the library loading path in your environment to the directory in Step 2 ("instantclient"). On many UNIX platforms, LD_LIBRARY_PATH is the appropriate environment variable.
  4. Start your application and enjoy.

The above implies that someone new to Oracle will have an easy time configuring and using this software. Not so. It assumes that you have some other files in your configuration and some other environment variables which it doesn't set.

Missing Secret Steps:

I installed three RPMs (using RHEL4U5):
 instantclient-basic-linux32
 instantclient-sdk-linux32
 instantclient-sqlplus-linux32
which unpack files in /usr/lib/oracle/10.2.0.3/client/lib and set up symlinks for sqlplus. The first thing I see when I try to run it is:
sqlplus: error while loading shared libraries: libsqlplus.so: cannot
open shared object file: No such file or directory
Since I've got them installed in /usr/lib/oracle/10.2.0.3/client/lib I simply set my environment variable as per Oracle's instructions:
 LD_LIBRARY_PATH=/usr/lib/oracle/10.2.0.3/client/lib
I googled a little bit and found this so I also needed to set some other variables:
ORACLE_HOME=/home/oracle
ORACLE_BASE=/home/oracle
LD_LIBRARY_PATH=/usr/lib/oracle/10.2.0.3/client/lib
PATH=$PATH:$ORACLE_HOME/bin
export ORACLE_HOME
export ORACLE_BASE
export LD_LIBRARY_PATH
export PATH
As you can see from the above, I actually created a user oracle (with no password, i.e. !! in /etc/shadow) and made an empty home directory. The RPMs won't create the oracle home directory, which is a somewhat arbitrary place where Oracle stores some config files.

Next time I tried to run it I saw:

Message file sp1.msb not found
SP2-0750: You may need to set ORACLE_HOME to your Oracle software
directory
Turns out that you need this file in your ORACLE_HOME. Luckily I had a server where Oracle was installed and I found an sqlplus directory in the ORACLE_HOME which I scp'd over to my ORACLE_HOME:
# ls -R $ORACLE_HOME/sqlplus/
sqlplus/:
admin  doc  lib  mesg

sqlplus/admin:
glogin.sql  help  plustrce.sql  pupbld.sql

sqlplus/admin/help:
helpbld.sql  helpdrop.sql  helpus.sql  hlpbld.sql

sqlplus/doc:
cpyr.htm  oracle.gif  README.htm

sqlplus/lib:
env_sqlplus.mk  ins_sqlplus.mk  s0afimai.o

sqlplus/mesg:
cpyus.msb  sp1us.msb  sp2us.msb  sp3us.msb
cpyus.msg  sp1us.msg  sp2us.msg  sp3us.msg
I then tried to start sqlplus and saw: ORA-12162: TNS:net service name is incorrectly specified. Turns out that the names of the systems that you want to connect to are stored in a tnsnames.ora file. This file's contents can look like this:
FOO =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = 
        (PROTOCOL = TCP)
        (HOST = db-server.tld)
        (PORT = 2321)
      )
    )
    (CONNECT_DATA = (SID = FOO)
    )
  )
and then you can pass it arguments like this:
 sqlplus user@foo
it will then look up foo as above and prompt you for a password. Of course you'd change FOO to your DB name and HOST to your DB server.
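If I understand the 10g instant client correctly, EZConnect syntax lets you skip tnsnames.ora entirely, assuming the listener exposes a service name matching the SID:
 sqlplus user@//db-server.tld:2321/FOO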

I ended up again borrowing tnsnames.ora from my working server (don't know what I would have done without it). Now, I wasn't certain of where to put it, but I used an evolutionary algorithm, i.e. I was able to waste some time experimenting until I got it right, which made me feel like slime trying to evolve. Anyway, I borrowed the entire network directory from my working Oracle server and put it in ORACLE_HOME.

Inside of this directory was the ADMIN directory, at least that's how it was on the server that I borrowed from. I ended up doing a symlink with the lowercase version to finally get it working:

[root@host network]# ll | grep admin
lrwxrwxrwx  1 root   root    6 Aug 21 12:08 admin -> ADMIN/
[root@host network]#
So AFAICT, tnsnames.ora goes in $ORACLE_HOME/network/admin/.
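I believe the client also honors a TNS_ADMIN environment variable if you'd rather keep tnsnames.ora somewhere else:
 TNS_ADMIN=$ORACLE_HOME/network/admin
 export TNS_ADMIN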

Coming from MySQL all of this is a pain. I like installing my MySQL client with one command. I like being able to specify which host to connect to by passing -h, not by editing a tnsnames.ora file (though I did have fun with the S-expressions). I like being able to use my package manager to have instant access to the libraries that I need. Instead, the Perl module that talks to the database will complain if it can't find the Oracle client above as it tries to compile. Let's talk more about that.

CPAN

Don't think you can skip straight here and use CPAN without the Enterprise Grade Genuine Advantage Enterprise Oracle Software.
cpan[3]> install DBD::Oracle
Running install for module 'DBD::Oracle'
...
Trying to find an ORACLE_HOME
Your LD_LIBRARY_PATH env var is set to ''

      The ORACLE_HOME environment variable is not set and I couldn't
      guess it.
      It must be set to hold the path to an Oracle installation
      directory
      on this machine (or a machine with a compatible architecture).
      See the README.clients.txt file for more information.
      ABORTED!
  
Warning: No success on command[/usr/bin/perl Makefile.PL]
  PYTHIAN/DBD-Oracle-1.19.tar.gz
  /usr/bin/perl Makefile.PL -- NOT OK
Even after installing the client as described above and getting sqlplus working from the command line, I still had some trouble doing the above. I ended up downloading DBD-Oracle-1.19.tar.gz from CPAN into /usr/local/src/ and doing a:
 perl Makefile.PL 
 make 
 make install
I was then finally able to have some code do:
use DBI;
my $dbh = DBI->connect('DBI:Oracle:dbname', 'user', 'pass')
    or die "Couldn't connect to database: " . DBI->errstr;
The above seems to work.

Ruby Scripting Part II

Ruby's Net::SSH is pretty cool. Here's a script to SSH into a set of hosts and do some work. It assumes that you're using sudo with local passwords as well as public keys.
#!/usr/bin/ruby
# -------------------------------------------------------- 
# SSH's into a set of hosts as user (assumes he has sudo)
# Uses local password (gathered earlier) to execute a set 
# of commands as root
# -------------------------------------------------------- 
require 'rubygems' 
require 'net/ssh'
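# NOTE: this targets the old (pre-2.0) Net::SSH shell API
# (session.shell.open); Net::SSH 2.x changed this interface.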
# -------------------------------------------------------- 
# Store systems & commands to be run as whom
user = "user"
hosts = Array.new
hosts[0] = "system1"
hosts[1] = "system2"
cmds = Array.new
cmds[0] = "touch x"
cmds[1] = "touch y"
cmds[2] = "touch z"
# -------------------------------------------------------- 
# Cache tokens
print "Enter local password for hosts (assuming same): "
system("stty -echo") # hide input (password)
password = gets.chomp.to_s
system("stty echo") # show input again
print "\n"
# use ssh agent to cache key (assumes an agent is already running;
# system("ssh-agent") can't export a new agent's env vars to this process)
system("ssh-agent")
system("ssh-add")
# -------------------------------------------------------- 
# SSH into each host and sudo execute commands
hosts.each do |host|
  Net::SSH.start(host, user) do |session|
    shell = session.shell.open
    shell.cd "/home/#{user}"
    cmds.each do |cmd|
      shell.echo "#{password} | sudo -S #{cmd}"
    end
    shell.exit
    # give the above commands sufficient time to terminate
    sleep 0.5
    # display the output
    $stdout.print shell.stdout while shell.stdout?
    $stderr.puts "-- stderr: --"
    $stderr.print shell.stderr while shell.stderr?
  end
end
# -------------------------------------------------------- 
# Clean out cached tokens
password = ""
system("ssh-add -d ~/.ssh/id_rsa")

Sunday, August 12, 2007

Ruby Scripting

I've got some scripting to do. I'm going to try using Ruby. Aside from a quick rails tutorial, I have never really used Ruby so let's see how easy it is to learn.

I've got to change some IP addresses on a set of Linux boxen. I'm going to encode my data as nested hash tables. First I'll get comfortable with the syntax by comparing it to some all too familiar PHP. Here's an example:

#!/usr/bin/php -q
<?php
$data = array(
    "server0" => array(
        "old_ip"  => "123.456.234.51",
        "new_ip"  => "123.456.7.5",
        "gateway" => "123.456.7.1",
        "subnet"  => "255.255.255.224",
    ),
    "server1" => array(
        "old_ip"  => "123.456.234.52",
        "new_ip"  => "123.456.7.6",
        "gateway" => "123.456.7.1",
        "subnet"  => "255.255.255.224",
    ),
);

foreach ($data as $host => $fields) {
  print $host . ":\n";
  foreach ($fields as $name => $value) {
    print "\t $name \t $value \n";
  }
}
?>
Translating this to Ruby:
#!/usr/bin/ruby

data = {
    "server0" => {
        "old_ip"  => "123.456.234.51",
        "new_ip"  => "123.456.7.5",
        "gateway" => "123.456.7.1",
        "subnet"  => "255.255.255.224",
    },
    "server1" => {
        "old_ip"  => "123.456.234.52",
        "new_ip"  => "123.456.7.6",
        "gateway" => "123.456.7.1",
        "subnet"  => "255.255.255.224",
    },
}

data.each do |host, fields| 
    puts host + ":\n"
    fields.each do |name, value|
       puts "\t #{name} \t #{value} \n"
    end
end

Friday, July 27, 2007

SysAdmin Day

The Last Friday Of July is System Administrator Appreciation Day. Yay for me.

Wednesday, July 25, 2007

tired random thoughts

Who needs sleep when you're upgrading your Generator, Network, SAN and mail server in the same two-week period? I ended up slightly restless and looking for things to play with.

I tried out this rails thing. RHN was down so I couldn't throw together a test system in an "enterprise" environment. Luckily I can count on Debian. It was all too easy. I found a cute little tutorial and I think it's actually pretty neat. It takes a lot of the manual labor out of web development. I'd like to really try it to see if the framework goes too far and locks me in. At the moment I feel like I could be locked into something, but I also kind of like it. It's like apt for web development. Programming is all about creating abstractions. Will using the rails abstraction help or hurt me? I can see it going either way. Only one way to find out; write something real in it. Either way it's better for me than using PHP again.

SVN can use WebDAV, and authentication for it should be controllable by Apache's mechanisms, including mod_shib. So Shibboleth federated members should be able to collaborate on our SVN server provided that their federation groups them (since we don't want everyone from their institution writing to our repository).

Thursday, July 12, 2007

June

I was traveling in June. I've had things to add but haven't had a chance. For now I will just link to:

Friday, June 8, 2007

Privacy

I've set up Tor on two of my Ubuntu systems. JFM's guide made it too easy. I'll see about getting it working with NetBSD next. I'm also now using FireGPG. It basically allows you to highlight any text on an HTML page and GPG encrypt/decrypt it. It also has Gmail integration which simply adds two buttons to do this. You wouldn't be able to add this to other webmails unless you modified their source or tried tweaking their DOM, but you can still right-click and use it in any HTML page, including any webmail.

Monday, May 28, 2007

wmii

I'm evaluating wmii, an even more minimalistic window manager, and suckless.org does a fine job of explaining why. I like that it only has 10,000 lines of code. Let me show you my favorite feature so far:
bash-3.2# dmesg | grep cpu | grep Intel
cpu0: Intel Pentium 4 (686-class), 1794.24 MHz, id 0xf12
cpu0: "Intel(R) Pentium(R) 4 CPU 1.80GHz"
bash-3.2# pwd
/usr/pkgsrc/wm/wmii
bash-3.2# date && make && date                       
Mon May 28 06:25:53 EDT 2007
=> Required installed package digest>=20010302: digest-20060826 found
===> Checking for vulnerabilities in wmii-3.1
=> Checksum SHA1 OK for wmii-3.1.tar.gz
=> Checksum RMD160 OK for wmii-3.1.tar.gz
===> Installing dependencies for wmii-3.1
==========================================================================
The following variables will affect the build process of this package,
wmii-3.1.  Their current value is shown below:

        * PKG_SYSCONFBASE = /usr/pkg/etc

You may want to abort the process now with CTRL-C and change their value
before continuing.  Be sure to run `/usr/bin/make clean' after
the changes.
==========================================================================
=> Required installed package digest>=20010302: digest-20060826 found
=> Required installed package x11-links>=0.25: x11-links-0.30 found
===> Overriding tools for wmii-3.1
===> Extracting for wmii-3.1
===> Patching for wmii-3.1
=> Applying pkgsrc patches for wmii-3.1
===> Creating toolchain wrappers for wmii-3.1
===> Building for wmii-3.1
wmii build options:
LIBS     = -lc -lX11
CFLAGS   = -O2 -I/usr/pkg/include -DVERSION="3.1"
LDFLAGS  = -L/usr/pkg/lib -Wl,-R/usr/pkg/lib -L/usr/X11R6/lib -Wl,-R/usr/X11R6/l
ib -lc -lX11
CC       = cc
CC emallocz.c
CC strlcat.c
CC strlcpy.c
CC strtonum.c
CC tokenize.c
CC trim.c
CC vector.c
AR libcext.a
built libcext
CC color.c
CC font.c
CC draw.c
CC geometry.c
AR liblitz.a
built liblitz
CC client.c
CC convert.c
CC message.c
CC server.c
CC socket.c
CC transport.c
AR libixp.a
built libixp
CC wmiimenu.c
LD wmiimenu
CC wmiipsel.c
LD wmiipsel
CC wmiir.c
LD wmiir
CC wmiisetsid.c
LD wmiisetsid
CC wmiiwarp.c
LD wmiiwarp
built wmii commands
CC area.c
CC bar.c
CC client.c
CC column.c
CC event.c
CC frame.c
CC fs.c
CC key.c
CC mouse.c
CC rule.c
CC view.c
CC wm.c
LD wmiiwm
built core window manager
=> Unwrapping files-to-be-installed.
Mon May 28 06:26:17 EDT 2007
bash-3.2# 
I.e. it builds in less than one minute.

Using wmii (don't panic)

Menu:  Alt-p, xterm, Esc
Move:  Alt-shift-l
Tag:   Mod-shift-2 send current window to workspace 2
Quit:  Alt-a
I typically do the following:
startx
Alt-p, xterm
Alt-p, xterm
Mod-shift-l split them vertically
Alt-p, firefox
Mod-shift-2 send firefox to workspace 2
See the documentation for more information.

Sunday, May 20, 2007

Binary Packages

I've been a Debian user for a while and I'm trying to get used to pkgsrc. My goals for pkgsrc are:
  • Only install what I need
  • Stay current with updates
Note that the first goal will make the second one easier. Eventually I'd like to keep a CVS synchronized pkgsrc tree but for now I'm going to start by using binary packages. What I like about apt, and what I want to talk about in this post, are its dependency management and security updates. When I was using packages not managed by my operating system I didn't want to search for dependent packages or drop whatever I was doing just to manually build the latest version of a package because a new vulnerability came out. This security update feature got me addicted to apt-get. I would just "apt-get upgrade" every few days and not have to worry about those exploits. Note that an operating system could also provide a managed system for updates that you automatically build (as opposed to manually) rather than install binary packages. I'm looking to explore this further when I get to know the src side of pkgsrc, but today I'm focusing on the pkg side.

Dependencies

Pkgsrc will manage your dependencies provided those packages "are present where you install" your package from. I.e. you can specify an FTP location as an argument to pkg_add and it will then download and install the package along with its dependencies. The trick is not to download the binary package itself first, since when you try to pkg_add it the dependent packages may not be present and pkg_add will complain that it can't find them. This is like downloading a .deb file without its dependencies and then trying to dpkg it. Instead, give pkg_add a URL to a package rather than a package file itself and let pkg_add handle the dependencies. So if you search around and find a package that you want, you should be able to copy the link for your platform and then pass it to the pkg_add command. E.g. I wanted madplay to play my MP3s for i386 so I used
bash-3.2# pkg_add <URL>
pkg_add: Warning: package 
...
was built for a different version of the OS:
pkg_add: NetBSD/i386 3.0 (pkg) vs. 
NetBSD/i386 3.1 (this host)
bash-3.2# which madplay
/usr/pkg/bin/madplay
bash-3.2#
Note that I try to use ftp8 and ftp7 since they are the closest mirrors (for madplay it linked back to the master ftp.netbsd.org).
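To save pasting full URLs, I believe you can also export PKG_PATH and then pkg_add packages by name, letting pkg_add resolve dependencies from the same location (mirror path assumed; adjust for your platform and release):
bash-3.2# export PKG_PATH=ftp://ftp.netbsd.org/pub/pkgsrc/packages/NetBSD/i386/3.1/All
bash-3.2# pkg_add madplay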

Finding Newer Versions and Vulnerabilities

Pkgsrc will provide you a list of vulnerabilities and a list of newer versions. After installing audit-packages I ran "download-vulnerability-list" and then "audit-packages" to get a list of security threats. After installing pkglint I could also see that there were newer packages:
bash-3.2# lintpkgsrc -i
Scan Makefiles: 6607 packages                              
Version mismatch: 'MesaLib' 6.4.2nb2 vs 6.4.2nb3
Version mismatch: 'Xft2' 2.1.7nb2 vs 2.1.7nb3
Version mismatch: 'Xrender' 0.9.0nb1 vs 0.9.0nb2
Version mismatch: 'atk' 1.12.3 vs 1.18.0
Version mismatch: 'bash' 3.2 vs 3.2.9,2.05.2.7nb8
Version mismatch: 'cairo' 1.2.6 vs 1.4.2nb1
Version mismatch: 'emacs' 20.7nb7 vs 21.4anb10,20.7nb8
Version mismatch: 'esound' 0.2.36nb1 vs 0.2.37
Version mismatch: 'firefox' 2.0.0.1 vs 2.0.0.3nb1,1.5.0.11
Version mismatch: 'fluxbox' 0.9.15.1nb1 vs 1.0rc3nb1
Version mismatch: 'fontconfig' 2.4.2 vs 2.4.2nb2
Version mismatch: 'freetype2' 2.2.1nb2 vs 2.3.4
Version mismatch: 'glib2' 2.12.4nb1 vs 2.12.11
Version mismatch: 'gtk2+' 2.10.6 vs 2.10.11
Version mismatch: 'libIDL' 0.8.7 vs 0.8.8
Version mismatch: 'pango' 1.14.9 vs 1.16.2
Version mismatch: 'perl' 5.8.8nb3 vs 5.8.8nb4
Version mismatch: 'pkglint' 4.74 vs 4.76
Version mismatch: 'png' 1.2.14nb1 vs 1.2.16
bash-3.2# 
The documentation claims that I can then use "make update" to update the package and rebuild any dependencies. AFAICT this means that you have to cd into /usr/pkgsrc and run the "make update" command to build an update from source; i.e. it doesn't mean that it will download a newer binary package. For example, after running the above I cd'd into /usr/pkgsrc/editors/emacs20 and ran a "make update"; then, after running the above command again, it no longer included emacs20 (note that I should really get emacs21, but that's not the point). Here is the new output after the "make update":
bash-3.2# lintpkgsrc -i
Scan Makefiles: 6607 packages                              
Version mismatch: 'MesaLib' 6.4.2nb2 vs 6.4.2nb3
Version mismatch: 'Xft2' 2.1.7nb2 vs 2.1.7nb3
Version mismatch: 'Xrender' 0.9.0nb1 vs 0.9.0nb2
Version mismatch: 'atk' 1.12.3 vs 1.18.0
Version mismatch: 'bash' 3.2 vs 3.2.9,2.05.2.7nb8
Version mismatch: 'cairo' 1.2.6 vs 1.4.2nb1
Version mismatch: 'esound' 0.2.36nb1 vs 0.2.37
Version mismatch: 'firefox' 2.0.0.1 vs 2.0.0.3nb1,1.5.0.11
Version mismatch: 'fluxbox' 0.9.15.1nb1 vs 1.0rc3nb1
Version mismatch: 'fontconfig' 2.4.2 vs 2.4.2nb2
Version mismatch: 'freetype2' 2.2.1nb2 vs 2.3.4
Version mismatch: 'glib2' 2.12.4nb1 vs 2.12.11
Version mismatch: 'gtk2+' 2.10.6 vs 2.10.11
Version mismatch: 'libIDL' 0.8.7 vs 0.8.8
Version mismatch: 'pango' 1.14.9 vs 1.16.2
Version mismatch: 'perl' 5.8.8nb3 vs 5.8.8nb4
Version mismatch: 'pkglint' 4.74 vs 4.76
Version mismatch: 'png' 1.2.14nb1 vs 1.2.16
bash-3.2# 
It seems that the binary packages can get you a package quickly but that you should probably build from source to begin with in order to have the newer versions. I'll post more about this later.