Friday, March 21, 2008

Strange Sendmail Bugs

I've mentioned earlier that I have a pair of DNS load-balanced mail gateways. These boxen run sendmail, and I recently lowered the MX preference number in DNS for the more powerful system so that it gets the mail first. However, I've run into a strange problem where some systems appear not to respect the DNS change. It seems that the more powerful system is not accepting mail from these hosts, so the less powerful system picks up the message instead. The sending systems are also running sendmail. I'm still trying to figure out why this happens, but I'd like to share how I got this far (with the help of a network-oriented friend).
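As a reminder of how MX preference works: a sending MTA tries the exchanger with the lowest preference number first and falls back to the next one if delivery fails, which fits mta0 ending up with the mail. Here is a minimal sketch of that selection. The function name preferred_mx and the preference values 10 and 20 are assumptions for illustration, not my real DNS records.

```shell
#!/bin/sh
# Print the mail exchanger with the lowest preference number.
# Input: lines of "<preference> <hostname>", the format printed by
# `dig +short MX <domain>`.
preferred_mx() {
    sort -n | head -n 1 | awk '{ print $2 }'
}

# Hypothetical records; the values 10 and 20 are assumptions.
printf '%s\n' '20 mta0.domain.tld.' '10 mta1.domain.tld.' | preferred_mx
# prints mta1.domain.tld.
```

Against live DNS this would be used as `dig +short MX domain.tld | preferred_mx`, run from one of the misbehaving hosts to confirm it sees the new preference.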

In two terminals I ran the following commands:

echo -e "sent on `date`" | mail -s "test: `hostname`" me@domain.tld
tcpdump -vv -s 1500 -w sendmail_25w.txt port 25
This produced a log which consistently showed that my non-gateway MTA sent a retransmit and the more powerful gateway (mta1) tore down the connection:
16:44:28.856481 IP server.domain.tld.41432 > mta1.domain.tld.smtp: P 135:699(564) ack 367 win 46
16:44:29.057802 IP server.domain.tld.41432 > mta1.domain.tld.smtp: P 135:699(564) ack 367 win 46
16:44:29.058142 IP mta1.domain.tld.smtp > server.domain.tld.41432: R 2420026399:2420026399(0) win 0
However, it had no problem establishing the connection with mta0 and sending the mail. Why did the non-gateway MTA resend the same segment, and why did the gateway MTA answer it with a reset?

My friend points out that Jon Postel would say the gateway MTA was wrong, as per RFC 793 section 2.10, the Robustness Principle. Also, why would mta0 accept it and mta1 reject it?
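The retransmit-then-reset pattern in the trace above can also be picked out mechanically instead of by eye. This is a rough sketch, not a real TCP analyzer: it assumes the one-line-per-packet tcpdump text format shown above (newer tcpdump versions print "Flags [P.], seq ..." and would need different field numbers), and the function name flag_retransmits is my own.

```shell
#!/bin/sh
# Flag repeated payload segments and resets in tcpdump text output.
# Assumes the older one-line-per-packet format shown in the trace:
#   <time> IP <src> > <dst>: <flags> <first>:<last>(<len>) ...
flag_retransmits() {
    awk '
    $2 == "IP" && $4 == ">" {
        dst = $5
        sub(/:$/, "", dst)                 # drop the trailing colon
        flow = $3 " > " dst
        if ($6 ~ /R/) {                    # RST flag set
            print "RESET", $1, flow
            next
        }
        if ($7 ~ /\([1-9][0-9]*\)$/) {     # segment carries payload
            key = flow " " $7
            if (key in seen) { print "RETRANSMIT", $1, flow, $7 }
            seen[key] = 1
        }
    }'
}

# The three packets from the trace, one per line.
flag_retransmits <<'EOF'
16:44:28.856481 IP server.domain.tld.41432 > mta1.domain.tld.smtp: P 135:699(564) ack 367 win 46
16:44:29.057802 IP server.domain.tld.41432 > mta1.domain.tld.smtp: P 135:699(564) ack 367 win 46
16:44:29.058142 IP mta1.domain.tld.smtp > server.domain.tld.41432: R 2420026399:2420026399(0) win 0
EOF
```

A segment counts as a retransmit here only if the same flow repeats the exact same sequence range, which is enough for this case but would miss partial overlaps.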

Here's that same data in a screen capture from Wireshark. I couldn't resist marking up the image. Note the three red dots showing the retransmit.

Note that my non-gateway MTAs are using vanilla sendmail from RHEL4/5:

$ /usr/sbin/sendmail -v -d0.1 < /dev/null | head -1
Version 8.13.1
...
$ /usr/sbin/sendmail -v -d0.1 < /dev/null | head -1
Version 8.13.8
Looking at mta1, I see it rejecting the connection with an I/O error:
$ fgrep me@server.domain.tld /var/log/maillog
Mar 19 15:52:44 mta1 sendmail[6255]: m2JJqidH006255: from=, size=558, class=0, nrcpts=1, msgid=<200803191952.m2JJqh0r018282@server.domain.tld>, proto=SMTP, daemon=MTA, relay=server.domain.tld [123.456.78.9]
Mar 19 15:57:47 mta1 sendmail[16727]: m2JJvlnq016727: SYSERR(root): collect: I/O error on connection from server.domain.tld, from=
Mar 19 15:57:47 mta1 sendmail[16727]: m2JJvlnq016727: from=, size=566, class=0, nrcpts=1, proto=SMTP, daemon=MTA, relay=server.domain.tld [123.456.78.9]
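To see which senders are affected, the same log can be summarized by host. This is only a sketch: it assumes each syslog entry sits on a single line in the real maillog, that the error text matches the "collect: I/O error on connection from" message above, and the function name io_error_hosts is made up for the example.

```shell
#!/bin/sh
# Count "collect: I/O error" entries per sending host.
# Reads sendmail log lines on stdin, prints "<host> <count>".
io_error_hosts() {
    sed -n 's|.*collect: I/O error on connection from \([^, ]*\).*|\1|p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Two of the entries from the log excerpt, one per line.
io_error_hosts <<'EOF'
Mar 19 15:52:44 mta1 sendmail[6255]: m2JJqidH006255: from=, size=558, class=0, nrcpts=1, proto=SMTP, daemon=MTA, relay=server.domain.tld [123.456.78.9]
Mar 19 15:57:47 mta1 sendmail[16727]: m2JJvlnq016727: SYSERR(root): collect: I/O error on connection from server.domain.tld, from=
EOF
# prints server.domain.tld 1
```

On the gateway itself this would be run as `io_error_hosts < /var/log/maillog`.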
We'll see if I figure this one out or live with mail going the wrong way for a few servers.
