Wednesday, October 24, 2007

mail gateway load balancing

I have a dedicated mail gateway (mta0) which filters spam. It's been overworked so I set up a second (mta1). mta0 is the master and stores spam definitions and user preferences in a MySQL DB. mta1 is a slave and receives these content updates from mta0.

To get both systems sharing the load, I simply added a second MX record for mta1 with the same preference (10) as mta0. Here are the relevant portions of my zone file before:

$ grep -n MX domain.tld.zone
16:                     MX      10 mta0.domain.tld.
359:mail                   MX      10 mta0.domain.tld.
$                                                                               
and after:
$ grep -n MX domain.tld.zone
16:                     MX      10 mta0.domain.tld.
17:                     MX      10 mta1.domain.tld.
360:mail                   MX      10 mta0.domain.tld.
361:mail                   MX      10 mta1.domain.tld.
$  
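With more names in the zone it would be easy to add the new record at the top and forget it under a name like mail. Here is a rough sketch of a check for that. The function names mx_by_owner and owners_missing are my own inventions, and the parsing assumes a simple zone file like the one above (one record per line, a blank owner inheriting the previous name), so treat it as a sanity check rather than a real zone parser:

```shell
# mx_by_owner: read a zone file on stdin, print "owner target" for each MX record.
# Assumes one record per line; a blank owner field inherits the previous owner.
mx_by_owner() {
    awk '
        /^[$;]/ || /^[ \t]*$/ { next }   # skip directives, comments, blank lines
        /^[^ \t]/ { owner = $1 }         # a name in column one sets the current owner
        {
            for (i = 1; i < NF; i++)
                if ($i == "MX") { print owner, $(i + 2); break }
        }
    '
}

# owners_missing TARGET: list owners that have MX records but none pointing at TARGET.
owners_missing() {
    mx_by_owner | awk -v want="$1" '
        { seen[$1] = 1; if ($2 == want) has[$1] = 1 }
        END { for (o in seen) if (!(o in has)) print o }
    ' | sort
}
```

Running owners_missing mta1.domain.tld. < domain.tld.zone should print nothing once every name that has an MX record also points at the new gateway.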
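BIND will automatically rotate the order of the two MX records from one lookup to the next. Because both records carry the same preference (10), sending servers treat the two gateways as equals. E.g. note how mta0 and mta1 take turns on top in successive queries: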
$ dig @dns.domain.tld domain.tld +short MX 
10 mta0.domain.tld.
10 mta1.domain.tld. 
$ dig @dns.domain.tld domain.tld +short MX
10 mta1.domain.tld.
10 mta0.domain.tld.  
$ 
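Two queries don't say much about how evenly the answers rotate, so here is a small sketch that repeats the lookup and counts which gateway comes back first. The helper names (first_mx, tally, sample_mx_order) are made up for this example, and it assumes dig is installed and that the name server is reachable from where you run it:

```shell
# first_mx: print the host on the first line of `dig +short MX` output read from stdin.
first_mx() {
    awk 'NR == 1 { print $2 }'
}

# tally: count how often each host appears on stdin (one host per line).
tally() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# sample_mx_order SERVER DOMAIN [COUNT]: query SERVER for the MX records of
# DOMAIN COUNT times (default 20) and report how often each host was listed first.
sample_mx_order() {
    server=$1 domain=$2 count=${3:-20}
    i=0
    while [ "$i" -lt "$count" ]; do
        dig "@$server" "$domain" +short MX | first_mx
        i=$((i + 1))
    done | tally
}
```

For the setup above the call would be sample_mx_order dns.domain.tld domain.tld 20. Whether the split comes out exactly even depends on how the server orders its answers, so I would only read the counts as a rough indication.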
It then just takes a little time for your DNS updates to propagate. You can test your changes with mxtoolbox.com, or by sending mail from outside hosts like Gmail and Yahoo and viewing the full headers to see which MTA relayed each message.

Before you drop a second email hub into service, be sure that it delivers mail where it should. It would be a shame if half of your mail were lost. Run the following test against the new gateway and make sure the message arrives where you expect. You might need to adjust your spam filter to let the test message through:
telnet mta1 25
HELO workstation.domain.tld
MAIL FROM:<me@domain.tld>
RCPT TO:<me@domain.tld>
DATA
test
.
QUIT
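Once a test message arrives, its Received headers show which gateway handled it. As a rough sketch, assuming you have saved the raw message to a file, the following pulls the gateway names out of those headers only and ignores the body. The function names relay_hops and which_mta are hypothetical, the pattern is hard-coded to my two hosts, and grep -o is not strictly POSIX (though GNU and BSD grep both have it):

```shell
# relay_hops: read a raw message on stdin and print each Received: header on one
# line, joining folded continuation lines; stops at the blank line ending the headers.
relay_hops() {
    awk '
        /^$/ { exit }
        /^[ \t]/ { if (hdr != "") hdr = hdr " " $0; next }
        { if (hdr ~ /^Received:/) print hdr; hdr = $0 }
        END { if (hdr ~ /^Received:/) print hdr }
    '
}

# which_mta: list the gateways (mta0 or mta1) named in the Received: headers.
which_mta() {
    relay_hops | grep -o 'mta[01]\.domain\.tld' | sort -u
}
```

With the message saved as, say, test.eml, which_mta < test.eml lists the gateway or gateways that appear in the relay path.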
One nice thing about this is that you can add the second system without any downtime. mta0 does not need to be brought offline; it's just a matter of waiting for DNS to propagate. Since mta1 has twice as much CPU and RAM as mta0, I'm going to look into weighting the records so that mta1 gets more of the load.
