Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Perl Journal magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.


Perl Journal Column 07 (Dec 2003)

[Suggested title: ``Blocking spam with Postfix and Amavis'']

Spam: what a mess. The wonderful Mail::SpamAssassin can catch most of it, looking at various spammy things in the headers and bodies of messages, and even checking external realtime block lists (RBLs) to help determine whether a particular piece of email is legitimate or simply someone getting a free ride to promote their own commercial activity. This month, I'd like to look at a recent change I made to the stonehenge.com mail server to help deal with the rapidly increasing amount of spam on the net.

Mail for the stonehenge.com domain is handled by blue.stonehenge.com, a server at a rack location provided by Sprocket Data (sprocketdata.com). We're running OpenBSD and Postfix, because I like to sleep at night and not worry about the ``hack of the week''.

Prior to the recent change, I had most of the mail for the stonehenge.com domain (with a few notable exceptions) delivered to my personal merlyn account, rewriting the destination to use the ``plus'' extended address format provided by Postfix. I accomplished this with an /etc/postfix/virtual_regexp entry that looked something like:

    /^stonehenge\.com$/
      whatever
    # other stonehenge.com rewrites are here
    # final catch-all:
    /^(.*)@stonehenge\.com$/
      merlyn+for-stonehenge+$1

I also included this line in /etc/postfix/main.cf:

    virtual_maps = regexp:/etc/postfix/virtual_regexp

so that the virtual map was properly specified. As each email came in, procmail would launch, and consult my .procmailrc file, which looked something like:

    LOGFILE=$HOME/.procmail.log
    LOGABSTRACT=yes
    ## :0c
    ## $HOME/JustInCase/
    :0w:
    | Sortmail >>SORTMAIL.LOG 2>&1
    LOG="... Sortmail failed, bouncing ...
    "
    EXITCODE=75
    :0
    /dev/null

The key here is that each mail would be piped to my Sortmail program, but if anything went wrong, Postfix would retain the message in its own queue for a subsequent delivery. This saved my bacon more than once when I was editing Sortmail and forgot to check syntax before writing it out.

Within my Sortmail program, I extracted the very first ``Delivered-To'' header, and then undid the transformation applied by the virtual rewrite. This got me back to the original stonehenge.com address that had been requested.
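
As a rough sketch of that step (not the actual Sortmail code), undoing the rewrite might look something like the following. The helper name `original_address` is hypothetical, and the sketch assumes the Delivered-To value has the form `merlyn+for-stonehenge+LOCALPART@host`, as produced by the catch-all rule shown earlier:

```perl
use strict;
use warnings;

# Hypothetical helper: recover the originally requested address from the
# first Delivered-To header, assuming the catch-all rewrite produced
# "merlyn+for-stonehenge+LOCALPART@some.host".
sub original_address {
    my ($message_text) = @_;

    # Only look at the header block (everything before the first blank line).
    my ($headers) = split /\n\n/, $message_text, 2;
    return unless defined $headers;

    # Grab the very first Delivered-To header.
    my ($delivered_to) = $headers =~ /^Delivered-To:\s*(\S+)/mi
        or return;

    # Undo "merlyn+for-stonehenge+$1", ignoring whatever host follows.
    my ($local) = $delivered_to =~ /^merlyn\+for-stonehenge\+([^@]+)(?:@|$)/
        or return;

    return "$local\@stonehenge.com";
}
```

Given a header of `Delivered-To: merlyn+for-stonehenge+fred@blue.stonehenge.com`, this returns `fred@stonehenge.com`; for mail that did not pass through the catch-all rule, it returns nothing and the caller can fall back to other handling.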

I then constructed both a Mail::Internet object and a Mail::Audit object from the incoming message:

    my $mi = Mail::Internet->new(\*STDIN);
    my $ma = Mail::Audit->new(
      data => [$mi->as_string =~ /(.*\n?)/g],
      noexit => 1, log => '-', loglevel => 2);

I did this because although I started with Mail::Audit, I later found out it lacked some of the header access functions that I needed, so I had to punt and use Mail::Internet instead. Eventually, I hope to eliminate Mail::Audit entirely, as I've found it to be too funky and ugly for my needs.

After sorting through the delivery address to determine how the message would be delivered or autoresponded, I started adding checks using Mail::SpamAssassin to try not to autorespond to spam or deliver spam into my significant inboxes. Anything addressed to me personally that was spammish got dropped into my ``ube'' folder (unsolicited bulk email). Anything that was not addressed to me got dropped into my ``ubetrap'' file (as in a ``spamtrap'' address, but I always pronounced this rhyming with ``boobytrap'').
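
The folder choice described above can be sketched in a few lines. This is only an illustration under stated assumptions: `spam_folder_for` is a hypothetical name, and the spam test is passed in as a code reference standing in for the real Mail::SpamAssassin check, so the decision logic can be read on its own:

```perl
use strict;
use warnings;

# Hypothetical routing helper. $is_spam is a code reference standing in for
# the real Mail::SpamAssassin check; it is passed in so the decision logic
# can be read (and tested) separately from the scoring.
sub spam_folder_for {
    my ($message_text, $addressed_to_me, $is_spam) = @_;

    # Not spam: let normal sorting and autoresponding continue.
    return unless $is_spam->($message_text);

    # Spam sent to me personally goes to "ube"; everything else is
    # treated as a spamtrap hit and goes to "ubetrap".
    return $addressed_to_me ? 'ube' : 'ubetrap';
}

# Stand-in checker for illustration only; real scoring is far richer.
my $toy_check = sub { $_[0] =~ /^Subject:.*\bfree money\b/mi ? 1 : 0 };

print spam_folder_for("Subject: FREE MONEY now\n\nhi\n", 0, $toy_check), "\n";
```

Keeping the expensive check behind a code reference also makes it easy to swap in a different scorer later without touching the routing.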

For any message that required an autoresponse, I eventually hooked in a Template Toolkit-based response template, passing the Mail::Internet object, the Mail::Audit object, and the constructed reply headers. A typical template looks like:

    [%
      head.Subject = "Your recent message to $to\n";
      INCLUDE normal_header;
    %]
    Why did you send a message to [% to %]?

    (It's very possible given the current sorry state of Microsoft
    so-called operating systems security that your address has been
    forged.  If so, please ignore me.  Sorry.)

    Randal L. Schwartz
    postmaster@stonehenge.com
    [% INCLUDE signature %]

This template handles ``bounces'' (addresses that aren't otherwise assigned within stonehenge.com). I send out a human message instead of a normal sendmail-like message because I need to know if they really intended the message for some other domain that was similar to stonehenge.com: it's amazing how many of those are out there. Most people ignore sendmail-like messages, but they'll respond in plain English to this letter.

There are other little parts to the mail system, but I hope you get the sense that it's a bunch of baling wire and duct tape, because it is. And it evolved slowly over time, starting out initially as procmailrc targets, then evolving to use the Perl-based MailAgent, and then to this bizarre hodgepodge.

Initially, this system worked rather well. I was dealing with about 300 to 500 pieces of email a day, sorting them into mailing list folders, personal mail, autoresponse mail for our Perl Training services and my legal case, and handling comp.lang.perl.announce postings, and a few lightweight ``mailing list'' rebroadcasters. Each incoming message triggered just two forks (procmail, then my Perl Sortmail), and then got delivered.

But around the summer of 2003, the various Microsoft virus mailers started hitting. They started sending mail from randomly selected addresses to many targets, carrying their DNA along to infect the next system on the list.

Right away, I noticed that I was getting a lot of ``your mail contains a virus'' mail, from well-intentioned anti-virus programs. Let me make this perfectly clear. If you write an anti-virus program, and your anti-virus program can recognize that the virus fakes the ``From'' line, do not send a response to that clearly faked ``from''. These ``your mail contains a virus'' mails are worse than the virus mails themselves, at least for me.

But I also noticed a steady increase in the MIRVs. I'm using MIRV here in the ``multiple independently targetable reentry vehicles'' sense. An incoming spam letter would be addressed to a number of stonehenge.com addresses, and delivered in one SMTP connection.

The trouble with MIRVs is that they get burst by the local Postfix, and delivered as separate messages (although sharing one Message ID) to separate invocations of procmail, and then to separate invocations of my Sortmail. And each Sortmail would eventually get around to wondering if this was spam, and would make all the regex matches against the mail and all the RBL checks out on the internet, and come to identical conclusions (usually ``yes, it's spam'').

So, each MIRV addressed to ten recipients caused 20 new processes (a procmail and a Sortmail for each recipient) to fire up on my box in the space of about two seconds, beating up on a lot of memory and CPU as those complex regexen were dragged through the mail, and generating a lot of DNS net traffic to see about RBLs. Ugh. It was a nice design before MIRV spam, but clearly failing now.

But how to fix it? It wouldn't be enough to move to spamc/spamd, which at least would remove the need to fork and reload all of that SpamAssassin code on each mail, because we still are asking the same question ten times because of the MIRVs.
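
To make the arithmetic concrete, here is a toy model of the two designs. Everything in it is illustrative: the function names are hypothetical, the recipient list is made up, and the code reference merely stands in for the expensive spam check:

```perl
use strict;
use warnings;

# Old design: the message is split first, so the check runs per recipient.
sub checks_when_split_first {
    my ($recipients, $check) = @_;
    my $runs = 0;
    for my $rcpt (@$recipients) {
        $check->();          # same message, same verdict, every time
        $runs++;
    }
    return $runs;
}

# MTA-level design: check once, then split for delivery.
sub checks_when_filtered_first {
    my ($recipients, $check) = @_;
    return 0 unless @$recipients;
    $check->();
    return 1;
}

# Ten hypothetical recipients and a stand-in for the expensive spam check.
my @rcpts = map { "user$_\@stonehenge.com" } 1 .. 10;
my $check = sub { return 1 };

printf "split first:    %d checks\n", checks_when_split_first(\@rcpts, $check);
printf "filtered first: %d checks\n", checks_when_filtered_first(\@rcpts, $check);
```

A persistent daemon such as spamd makes each check cheaper, but only moving the check ahead of the split reduces how many checks are run.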

But luckily, I recently stumbled across a Slashdot posting that mentioned Amavis (``A Mail Virus Scanner''), found at <http://www.amavis.org/>. Unlike delivery-time tools such as spamc or simple Mail::SpamAssassin-based custom filters, which run after the mail-transfer agent (MTA) has already split the message per recipient, I could hook Amavis in at the MTA level. Ahh! Before the MIRV has burst! This looked very promising.

Even more promising is that Amavis is written in Perl, and uses Mail::SpamAssassin and Net::Server, both technologies with which I was familiar. I figured that if I had any trouble with Amavis, my Perl skills were probably sufficient to either reverse-engineer for understanding or customize the needed features.

Although Amavis can be used as the normal port-25 listener on a server, I didn't want to remove the known reliability and flexibility of having Postfix be my port-25 listener. Luckily, in the installation instructions, I saw how to make Amavis work alongside Postfix, and followed the instructions rather directly.

First, I unpacked Amavis into /opt/amavisd/, and created an etc and sbin directory alongside the now unpacked source directory. I also created an amavis user, allowing the home directory to default to /home/amavis.

Next, I copied amavisd to the sbin directory, and amavisd.conf to the etc directory. I edited the amavisd.conf file (putting it under RCS first) to reflect local preferences. Many of the settings were as recommended by the README.postfix file included with the distribution.

First, I fixed $MYHOME to /home/amavis and $mydomain to stonehenge.com. (I like that the config file is in Perl and not some obscure config language.) Then I set $daemon_user and $daemon_group both to amavis, as I had chosen, and pushed $TEMPBASE into the tmp subdirectory.

Skimming down, I found the POSTFIX section, uncommenting the lines for $forward_method and $notify_method there. Finally, I uncommented the line that set @bypass_virus_checks_acl to a single period. Since I didn't care about virus checks, and only about spam, I wanted to keep Amavis single-minded.

I changed the $QUARANTINEDIR to be below /home/amavis for simplicity. And finally, I commented out the $sa_local_tests_only line, causing SpamAssassin to also consider the RBL tests.
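
Pulled together, the settings described above would look roughly like this excerpt. Treat it as a sketch rather than a copy of the real file: the `our` declaration is there only so the fragment compiles by itself, the $forward_method value is an assumption chosen to match the 127.0.0.1:10025 listener in master.cf, and the quarantine subdirectory name is hypothetical:

```perl
use strict;
use warnings;

# In the real amavisd.conf these variables are already declared by amavisd;
# they are declared here only so the fragment compiles on its own.
our ($MYHOME, $mydomain, $daemon_user, $daemon_group, $TEMPBASE,
     $forward_method, $notify_method, @bypass_virus_checks_acl,
     $QUARANTINEDIR, $sa_local_tests_only);

$MYHOME       = '/home/amavis';
$mydomain     = 'stonehenge.com';
$daemon_user  = 'amavis';
$daemon_group = 'amavis';
$TEMPBASE     = "$MYHOME/tmp";

# POSTFIX section: hand scanned mail back to the extra smtpd listener.
# Assumption: this value matches the 127.0.0.1:10025 service in master.cf.
$forward_method = 'smtp:127.0.0.1:10025';
$notify_method  = $forward_method;

@bypass_virus_checks_acl = qw( . );   # skip virus checks for everyone

# Assumption: the text says only "below /home/amavis"; the name is made up.
$QUARANTINEDIR = "$MYHOME/quarantine";

# Left commented out so SpamAssassin also runs the network (RBL) tests.
# $sa_local_tests_only = 1;

1;  # a Perl config file should end with a true value
```

Because the file is plain Perl, `perl -c` on the edited copy is a cheap way to catch a syntax slip before restarting the daemon.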

After making all these changes, I then proceeded with the testing as indicated in the README.postfix file, double checking every step, because my machine was handling live email. Ultimately, my /etc/postfix/master.cf was altered to comment out three lines:

    #AMAVIS# smtp      inet  n       -       -       -       -       smtpd
    #AMAVIS# pickup    fifo  n       -       -       60      1       pickup
    #AMAVIS# cleanup   unix  n       -       -       -       0       cleanup

replacing those with:

    smtp      inet  n       -       -       -       -       smtpd
        -o cleanup_service_name=pre-cleanup
    pickup    fifo  n       -       -       60      1       pickup
        -o cleanup_service_name=pre-cleanup
    cleanup unix    n       -       -       -       0       cleanup
        -o mime_header_checks=
        -o nested_header_checks=
        -o body_checks=
        -o header_checks=

And adding these new services as well:

    pre-cleanup  unix n     -       -       -       0       cleanup
            -o virtual_alias_maps=
            -o canonical_maps=
            -o sender_canonical_maps=
            -o recipient_canonical_maps=
            -o masquerade_domains=

    smtp-amavis unix -      -       y       -       2  lmtp
        -o smtp_data_done_timeout=1200
        -o disable_dns_lookups=yes

    127.0.0.1:10025 inet n  -       y       -       -  smtpd
        -o content_filter=
        -o local_recipient_maps=
        -o relay_recipient_maps=
        -o smtpd_restriction_classes=
        -o smtpd_client_restrictions=
        -o smtpd_helo_restrictions=
        -o smtpd_sender_restrictions=
        -o smtpd_recipient_restrictions=permit_mynetworks,reject
        -o mynetworks=127.0.0.0/8
        -o strict_rfc821_envelopes=yes
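
With live mail at stake, it can be reassuring to confirm that the new listeners answer before trusting them. The following is a minimal probe, assuming only core Perl; the function names are hypothetical, and the ports to test are taken from the command line (10025 is the re-injection smtpd above, while the port amavisd itself listens on depends on your amavisd.conf):

```perl
use strict;
use warnings;
use IO::Socket::INET;

# True if a line looks like an SMTP/LMTP greeting ("220 ..." or "220-...").
sub is_greeting {
    my ($line) = @_;
    return defined $line && $line =~ /^220[ -]/ ? 1 : 0;
}

# Connect, read one line, and report whether the service greeted us.
sub service_greets {
    my ($host, $port) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 5,
    ) or return 0;
    my $banner = <$sock>;
    close $sock;
    return is_greeting($banner);
}

# Probe each port named on the command line, for example: 10025
for my $port (@ARGV) {
    printf "%s:%d %s\n", '127.0.0.1', $port,
        service_greets('127.0.0.1', $port) ? 'ok' : 'NOT answering';
}
```

This only checks that something greets with a 220; it is no substitute for the step-by-step tests in README.postfix.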

I also added the startup for amavisd to /etc/rc.local:

    if [ -x /opt/amavisd/sbin/amavisd ]; then
            echo -n ' amavis ';
            sudo -u amavis /opt/amavisd/sbin/amavisd -c /opt/amavisd/etc/amavisd.conf
    fi;

But that's it. It's been working quite well for me, and my load average has been significantly lower. Now the MIRVs get processed once instead of ten times, and all is well. Almost immediately after making this change, net connections on the box were more stable, and my system didn't suddenly spike and freeze up my Emacs session nearly as much as it formerly did. And I'm not dealing with anywhere near the volume of identical spams, so my personal mail volume is also lower.

So, consider adding Amavis to your MTA, and help fight your neverending spam battle in an efficient way. Until next time, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.
