amavisd-new documentation bits and pieces

The most recent version of this document is available at http://www.ijs.si/software/amavisd/amavisd-new-docs.html

performing mail checks
acting on mail checks results
tag, tag2 and kill levels
quarantine
hard black- and whitelisting senders regarding spam
soft black- and whitelisting senders regarding spam -- @score_sender_maps
configuration variables
policy banks
$max_requests and max_use

Performing mail checks

The following checks on mail are available:

mail header validity checks
banned names and types checks
virus checks
spam checks
is sender white- or blacklisted (regarding spam)

Although checks are presently not performed in parallel, it is best to consider the order of their evaluation unspecified (unknown). Besides possible future parallel implementation, another reason is the caching of results, where subsequent mail with the same contents may benefit from earlier checks if validity of these check results has not yet expired -- so a check result may be instantly available, regardless of whether it has been asked for or not.

Using the configuration variables @bypass_virus_checks_maps, @bypass_banned_checks_maps, @bypass_header_checks_maps and @bypass_spam_checks_maps, each recipient (or an administrator on their behalf) may suggest that certain tests are not needed, primarily for performance reasons. Although the @bypass_*_checks_maps pertain to individual recipients, a mail check is an operation done on the whole message, regardless of the number of recipients and their individual preferences. A suggestion by some of the recipients that a certain check is not needed (is to be bypassed) does not guarantee that the check will not be performed.

Similarly, the (hard) blacklisting or whitelisting of a sender address may make running a spam check unnecessary, but it does not guarantee that the spam check result will not be available for subsequent decisions.

There are two primary reasons why a check result may still be available despite the bypass hint or a sender being black- or whitelisted:

a check result from some previous mail with the same contents has been cached and is still valid;
when mail has multiple recipients and not all of them agree that a check should be bypassed.

The amavisd-new program is allowed to skip some check for performance reasons if all recipients agree that a check is not necessary (that it may be bypassed), or if the outcome of a check to be skipped could not influence further mail processing and delivery/non-delivery of the message (as is the case of a sender being black- or whitelisted regarding spam check).

For example, spam checks may be skipped if it is already known that a mail is infected. This is an implementation and optimization issue, and no guarantee is given about interdependency of checks. Future versions may use a different strategy of performing checks (e.g. some checks may be performed in parallel), as long as a change does not affect the final outcome.
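
The skip rule described in this section can be modelled in a few lines. This is only a Python sketch of the described behaviour, not amavisd-new's own Perl code. The names may_skip_check and check_result, and the plain dict standing in for the result cache, are assumptions made for illustration. Cache expiry is left out.

```python
def may_skip_check(bypass_votes):
    """Return True only if every recipient asked for the check to be bypassed.

    bypass_votes: one boolean per recipient (True means "bypass is fine for me").
    """
    votes = list(bypass_votes)
    # With no recipients there is nobody to agree, so do not skip.
    return bool(votes) and all(votes)


def check_result(cache, digest, bypass_votes, run_check):
    """Return a check result, or None when the check was skipped (unknown)."""
    if digest in cache:
        # A result cached from earlier mail with the same contents is
        # available even though every recipient may have asked for a bypass.
        return cache[digest]
    if may_skip_check(bypass_votes):
        return None
    cache[digest] = run_check()
    return cache[digest]
```

A single dissenting recipient is enough to force the check, and once a result is cached it stays visible to later mail with the same contents regardless of bypass hints.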

Acting on mail checks results

Based on the outcome of mail checks performed during mail analysis or cached from previous mail with the same contents, and based on global settings and individual recipient preferences, the program now decides what action to perform next. As described in the previous section, not all results of checks are necessarily known (e.g. if all recipients voted for some check to be bypassed). For the purpose of deciding further actions, unknown results of a check are considered equivalent to negative (false) results, i.e. a skipped virus check is treated the same as non-infected mail, and a bypassed spam check is equivalent to a low spam score (ham).

The following decisions are made at this stage:

whether a mail should be quarantined and how;
whether an administrator (and which administrator) should receive a notification (and which notification);
whether recipients should receive a notification;

and regarding mail delivery and/or sender (non)delivery notifications:

whether a mail should be delivered to each recipient or not;
whether delivered mail should be modified (header edits, defanging);
whether a sender should receive a (non)delivery notification (bounce);
what should be the final status code returned to the mailer (reject/pass).

For the purpose of deciding on these actions, a mail is classified based on all available check results. It is quite possible that more than one check result is positive (e.g. virus and banned and bad header, or spam and bad header, or virus and spam), yet a mail is considered to be in only one category. The logic is currently hard-wired into the program and cannot be influenced by configuration variables. The following order is used; the first condition met decides the outcome:

a virus is detected: mail is considered infected;
contains banned name or type: mail is considered banned;
spam level is above kill level for at least one recipient or a sender is blacklisted: mail is considered spam;
bad (invalid) headers: mail is considered as having a bad header.

This decision order explains why amavisd-new is not free to skip (to optimize away) virus checks if the presence of a banned name or a bad header is already known or can easily be determined. The order was chosen with the intention that a more informative or a stronger assertion is the one to base further mail delivery on, and to be quoted in notifications and in the log. Even at the expense of possibly longer processing time, it is more important to declare a mail infected than to complain about a bad header, a banned executable or spammy contents.
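
As an illustration, the first-match ordering can be written as follows. This is a Python sketch under assumptions: the function name, the keyword arguments and the category labels (including "clean" for the case where nothing matches) are hypothetical. An unknown result (None) is treated as negative, as described earlier in this section.

```python
def classify(virus=None, banned=None, spam_over_kill=None,
             sender_blacklisted=None, bad_header=None):
    """Return the single category of a message.

    Each argument is True, False or None; None stands for an unknown
    (skipped or bypassed) check and counts as negative.
    spam_over_kill means "spam level is above kill level for at least
    one recipient".
    """
    ordered_checks = [
        ("infected", virus),
        ("banned", banned),
        ("spam", spam_over_kill or sender_blacklisted),
        ("bad-header", bad_header),
    ]
    # The first condition met decides the outcome.
    for category, positive in ordered_checks:
        if positive:
            return category
    return "clean"
```

Because the loop returns on the first positive entry, a message that is both infected and has a bad header is reported as infected, which is why the virus result cannot be optimized away.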

The determined mail category now governs further action. Administrators are notified if notification is enabled for the category, mail is quarantined if quarantining is enabled for the category, and recipients are notified if notification is enabled for the category.

Next a mail delivery is attempted. A decision to deliver depends on mail category and on global and individual recipient preferences. The global setting $final_*_destiny=D_PASS or a per-recipient setting @*_lovers_maps ensures mail delivery for the corresponding mail category even if mail would otherwise be blocked for being infected or banned or spam or having a bad header.

A mail that is decided to be passed to an individual recipient undergoes some simple header editing which happens on-the-fly during mail forwarding. Certain mail header fields may be inserted or removed, or an existing header field (e.g. Subject) may be modified. This header editing may be different for each recipient even in multi-recipient messages. If necessary, a multi-recipient mail is split into more than one forwarding transaction, grouping (clustering) recipients with same settings into one SMTP transaction.
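
The grouping of recipients into forwarding transactions can be pictured with a small Python sketch. It is an assumption-laden model, not the program's implementation: cluster_recipients is a hypothetical name, and the per-recipient settings are reduced to a dict of header edits.

```python
from collections import defaultdict


def cluster_recipients(header_edits_by_recipient):
    """Group recipients whose header edits are identical.

    header_edits_by_recipient: dict mapping a recipient address to a dict
    of header edits (header field name -> value).
    Returns a list of (edits, recipients) pairs, one per forwarding
    transaction.
    """
    groups = defaultdict(list)
    for recipient, edits in header_edits_by_recipient.items():
        # A sorted tuple of items is a hashable fingerprint of the settings.
        key = tuple(sorted(edits.items()))
        groups[key].append(recipient)
    return [(dict(key), recipients) for key, recipients in groups.items()]
```

Recipients with identical edits end up in one transaction, so a message to many recipients with the same preferences is still forwarded only once.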

Based on the decisions to forward or to block mail to each recipient, and on the global setting for the mail category ($final_*_destiny=D_BOUNCE or D_REJECT), one of two things happens. With D_BOUNCE, the sender (non)delivery notification is prepared by amavisd-new and the MTA receives a 2xx status (success). With D_REJECT, the MTA receives a 5xx (reject) status, so preparing sender notifications is delegated to the MTA (not recommended).

Even in cases of mail non-delivery when a (non-)delivery status notification (DSN) for the sender should have been prepared and sent, there are certain exceptions where the DSN is suppressed, which makes mail effectively lost as far as the sender and the recipient are concerned (but quarantining is not affected):

when $final_*_destiny=D_DISCARD;
when mail is infected and the detected virus name matches the @viruses_that_fake_sender_maps;
when spam score exceeds level determined by @spam_dsn_cutoff_level_maps for all recipients;
when mail is coming from a mailing list, as determined by examining a mail header Precedence: for containing string 'bulk' or 'list' or 'junk'.
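
The rules for who informs the sender about a blocked message can be summarized in a short Python sketch. It only restates the text above. The function and parameter names are hypothetical, destiny values are passed as plain strings, and matching the Precedence value case-insensitively is an assumption.

```python
SUPPRESSING_PRECEDENCE = ("bulk", "list", "junk")


def sender_notifier(destiny, virus_fakes_sender=False,
                    over_dsn_cutoff_for_all=False, precedence=""):
    """Return who informs the sender of a blocked mail.

    Returns "amavisd" (a DSN is prepared, MTA sees 2xx), "mta" (MTA sees
    5xx and handles the sender itself) or None (no notification at all).
    """
    if destiny == "D_REJECT":
        return "mta"
    if destiny == "D_DISCARD":
        return None
    if destiny != "D_BOUNCE":
        raise ValueError("not a blocking destiny: %r" % (destiny,))
    # Exceptions under which the DSN is suppressed despite D_BOUNCE.
    if virus_fakes_sender or over_dsn_cutoff_for_all:
        return None
    value = precedence.lower()
    if any(word in value for word in SUPPRESSING_PRECEDENCE):
        return None
    return "amavisd"
```

Quarantining is deliberately absent from this model, since the text states it is not affected by DSN suppression.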

tag, tag2 and kill levels

When SpamAssassin is called upon to analyze a mail message, it returns a spam score (spam level, hits), which is a numeric representation of spamminess. The higher the number, the more spammy the message is considered. Small numbers near zero or negative indicate a clean message, colloquially called ham. The spam score is a characteristic of the whole message, and does not depend on recipient preferences. SpamAssassin is called only once for each message regardless of the number of recipients.

To determine further course of action, amavisd-new compares the spam score to three numeric values: tag level, tag2 level and kill level. These values may be different for each recipient, and the further actions may be different for each recipient. If necessary, the mail forwarding is split into more than one transaction to cater for different recipient preferences.

tag level: if spam score is at or above tag level, spam-related header fields (X-Spam-Status, X-Spam-Level) are inserted for local recipients; undef is interpreted as lower than any spam score;
tag2 level: if spam score is at or above tag2 level, spam-related header fields (X-Spam-Status, X-Spam-Level, X-Spam-Flag and X-Spam-Report) are inserted for local recipients, and X-Spam-Flag and X-Spam-Status bear a YES; also recipient address extension (if enabled) is tacked onto recipient address for local recipients; for these actions to have any effect, mail must be allowed to be delivered to a recipient;
kill level: if spam score is at or above kill level, mail is blocked; and sender receives a nondelivery notification unless spam score exceeds dsn cutoff level.
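
The three comparisons can be sketched per recipient as follows. This is a Python illustration, not the program's code: spam_actions and the action labels are hypothetical names, and any level values used with it are made up. None stands in for Perl's undef, and the restriction of header insertion to local recipients is not modelled.

```python
def spam_actions(score, tag_level, tag2_level, kill_level):
    """Return the set of level-triggered actions for one recipient.

    tag_level may be None (undef), which is interpreted as lower than
    any spam score. tag2_level and kill_level are expected to be numbers.
    """
    actions = set()
    if tag_level is None or score >= tag_level:
        actions.add("tag")     # X-Spam-Status, X-Spam-Level inserted
    if score >= tag2_level:
        actions.add("tag2")    # X-Spam-Flag: YES, address extension, ...
    if score >= kill_level:
        actions.add("kill")    # mail is blocked for this recipient
    return actions
```

Since the score is a property of the whole message while the levels are per-recipient, the same score may be merely tagged for one recipient and blocked for another.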

The general idea is that kill level is what controls the main actions as far as MTA and amavisd-new are concerned (regardless of what recipients' MUA later does with the mail).

Reaching kill level for at least one recipient controls the following:

mail gets quarantined (unless disabled)
spam administrator gets a notification (unless disabled)
ContentSpamMsgs counter is incremented
spam defanging is done (unless disabled)
sender gets a notification if $warnspamsender is true and $final_spam_destiny is D_PASS
if message is not delivered, sender gets a nondelivery notification (suppressed under certain conditions).

On the other hand the tag2 level just adds some mark to the passed mail (only for local recipients), which recipient or his MUA may decide to act on or not. Specifically:

Subject header field is modified (unless disabled)
X-Spam-Flag and X-Spam-Status header fields get a YES
address extension for spam gets tacked on the recipient address
and (perhaps inconsistently with the rest) the mail log entry says 'Passed SPAM' instead of 'Passed CLEAN'.

If a recipient (or its MUA) decides to discard the mail based on tag2 marking, there is no way to retrieve it later from a quarantine, the sender is never notified, spam administrator is never notified. As far as the MTA and amavisd-new are concerned, the message was successfully delivered. Whatever MUA does with the mail is entirely the responsibility and jurisdiction of the recipient.

Quarantine

Mail quarantining is attempted when mail is infected or banned or spam score for at least one of its recipients is at or above his kill level. The *quarantine_to for each recipient (when nonempty), along with a global corresponding *_quarantine_method, determines where the quarantine location should be.

The *_quarantine_method can be considered a static and a site-wide setting, generally controlling the format and location of the quarantine on the system. The *quarantine_to can be considered the dynamic part of the quarantine location, possibly affected by per-recipient settings and the class of malware. It serves to fully specify the final location, e.g. a file or a mailbox.

Depending on the mail category (type of malware), the following variables specify the quarantine method: $virus_quarantine_method, $spam_quarantine_method, $banned_files_quarantine_method, and $bad_header_quarantine_method. One way to globally disable quarantine is to specify undef or an empty string as a value of these variables. A nonempty string should follow a syntax:

local:filename-template
bsmtp:filename-template
smtp:hostname:port
smtp:[ip-address-or-hostname]:port

The local: and bsmtp: methods are useful for quarantining. The smtp: is currently not useful for quarantining (is used in forwarding and notifications), and is only listed here for completeness and possible future use.
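
To make the syntax concrete, here is a Python sketch that splits a method string into its parts. It follows only the four forms listed above. The function name and the returned dict layout are assumptions, the filename template is returned unparsed, and the port is kept as a string because the text does not say what forms it may take.

```python
def parse_quarantine_method(method):
    """Split a *_quarantine_method value into its components.

    Returns None when quarantining is disabled (None or empty string).
    """
    if not method:
        return None
    kind, sep, rest = method.partition(":")
    if not sep or kind not in ("local", "bsmtp", "smtp"):
        raise ValueError("unrecognized quarantine method: %r" % (method,))
    if kind in ("local", "bsmtp"):
        return {"method": kind, "template": rest}
    # smtp:hostname:port or smtp:[ip-address-or-hostname]:port
    host, sep, port = rest.rpartition(":")
    if not sep or not host or not port:
        raise ValueError("expected smtp:host:port, got %r" % (method,))
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]   # brackets only delimit the host part
    return {"method": "smtp", "host": host, "port": port}
```

Splitting the smtp: form on the last colon keeps a bracketed address containing colons intact.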

Depending on the method specified (local/bsmtp/smtp) a per-recipient setting *quarantine_to adopts different semantics and syntax, possibly modified by the configuration variable $QUARANTINEDIR.

method	quarantine_to	`$QUARANTINEDIR`	effect
`local:`	e-mail address containing '@'-sign	anything	sent via SMTP to the mailer for storage, uses $notify_method to specify how to deliver to MTA
`local:`	pseudo-alias mapped through %local_delivery_aliases	directory	stored as an individual file below the directory `$QUARANTINEDIR`, file name comes from the template specified in the *_quarantine_method
`local:`	pseudo-alias mapped through %local_delivery_aliases	filename of a mailbox	appended to a file `$QUARANTINEDIR` in mbox format
`local:`	pseudo-alias mapped through %local_delivery_aliases	empty or undef	not quarantined
`bsmtp:`	anything (nonempty)	anything	stored in the file specified in the *_quarantine_method in BSMTP format (if file name is absolute, i.e. starts with a "/")
`bsmtp:`	anything (nonempty)	directory	stored in the file specified in the *_quarantine_method in BSMTP format (file name relative to `$QUARANTINEDIR`)
empty or undef	anything	anything	not quarantined
anything	empty or undef	anything	not quarantined

The *quarantine_to is currently quite limited in functionality; it is often used only to turn off the quarantining for some user or local subdomain. The reason for this limited functionality is the more vulnerable nature of this value: it may come from SQL or LDAP lookups, where careless access controls on these databases might permit users to enter any value in the *quarantine_to field. This is why we do not let it control the directory or the exact file name of the quarantine file. This may be somewhat relaxed in the future.

In common setups the quarantine location (e.g. a directory or a dedicated mailbox) is the same for all recipients. If at least one recipient specifies a nonempty *quarantine_to specifying this location, the message is quarantined (stored) there once, regardless of the number of recipients.

The general algorithm is: the *quarantine_to value associated with each recipient is looked up. Empty or undef values are ignored and duplicates are discarded. A mail to be quarantined is then stored/sent to each unique location remaining on the list.
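
The general algorithm fits in a few lines of Python. This is a sketch only: quarantine_locations is a hypothetical name, None stands in for undef, and returning the locations in first-seen order is an assumption, since the text does not specify an order.

```python
def quarantine_locations(quarantine_to_by_recipient):
    """Return the unique, non-empty *quarantine_to values.

    quarantine_to_by_recipient: dict mapping a recipient address to its
    looked-up *quarantine_to value (a string, "" or None).
    """
    # Empty or undef values are ignored; dict.fromkeys drops duplicates
    # while keeping the order in which locations were first seen.
    wanted = (loc for loc in quarantine_to_by_recipient.values() if loc)
    return list(dict.fromkeys(wanted))
```

A message is then stored or sent once per returned location, so a shared quarantine receives a single copy no matter how many recipients point at it.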

The "bsmtp:" quarantine method is somewhat special in that the quarantine file location is entirely determined by the *_quarantine_method setting, and the value of the per-recipient *quarantine_to setting does not influence the quarantine location, as long as this value is nonempty.

When using the "bsmtp:" quarantine method and versions of amavisd-new earlier than 2.2.0, the *_quarantine_to was completely ignored, which made it impossible to turn off quarantining selectively for certain users by specifying an empty or undef value. Since 2.2.0, an empty *_quarantine_to turns off quarantine for a recipient regardless of the quarantine method. A nonempty string in *_quarantine_to (the exact value is ignored) must now be used even with "bsmtp:" to enable quarantining.

Hard black- and whitelisting senders regarding spam

The blacklisting and the whitelisting are ways of telling that we already know that a message is spam or is ham (non-spam) just by examining the envelope sender address and comparing it to lists of known spammers or to lists of known legitimate senders of ham. It is a quick check, potentially saving us the trouble of examining the mail contents. It has a big drawback however in that the sender mail address can be (and often is) faked and there is no guarantee that the claimed sender address represents the actual sender.

The sender address is usually faked for spam messages, so whitelisting some sender address is of questionable value, and often lets in far more spam than it does good by approving legitimate mail. For a reliable way of permitting certain sending clients to send spammy mail, see policy banks.

Blacklisting however is still useful: a spammer has no desire to pretend to be some blacklisted sending address when he can choose any other address. A genuine sender that is intentionally blacklisted can only avoid being blocked by falsifying his address (joining spammers in his methods) and sending non-spammy mail, the latter being our objective anyway. Although amavisd-new does provide blacklisting, it is functionally equivalent but more effective to blacklist senders at the MTA, preventing such mail from even entering the mail system.

It should be emphasized that whitelisting (and blacklisting) only affects spam checks. It has no influence on other checks such as virus, banned or header checks. Infected mail from whitelisted sender would still be blocked if our policy is to block viruses.

Another point to bear in mind is that the sender address examined is the one from the SMTP protocol, exactly as provided by the MTA to amavisd-new. It is known as the envelope sender address or return path. This address does not necessarily match the mail author's address from the mail header (From:) or the sender's address from the header (Sender:). This is most obvious with mail from mailing lists, where the envelope sender address is usually the address of a mailing list management service, while the author's address (From:) is the address of a person sending the message. Using the envelope sender address in most cases makes it easier to black- or whitelist mail from mailing lists, compared to guessing a sender address by parsing mail header.

To avoid surprises, whitelisted sender suppresses inserting/editing the tag2-level header fields (X-Spam-*, Subject), appending spam address extension, and quarantining, even if we know the message is spam (e.g. because the spam check result on the same mail contents has been cached from some earlier mail or known from check on behalf of another recipient).

For mail from blacklisted senders, the effect is as if the spam level were artificially pushed high, resulting in 'X-Spam-Flag: YES', a high 'X-Spam-Level' bar and other usual reactions to spam, including possible rejection. If the message nevertheless still passes (e.g. for spam loving recipients), it is tagged as BLACKLISTED in the 'X-Spam-Status' header field, but the reported spam value and set of tests in this report header field is not adjusted (if available from SpamAssassin, which may or may not have been called).

If all recipients of a message either white- or blacklist the sender, amavisd is free to skip spam scanning (calling SpamAssassin), saving on time. There is no guarantee however that spam scanning will actually and always be skipped.

The following variables (lists of lookup tables) are available, with the semantics and syntax as specified in README.lookups: @whitelist_sender_maps, @blacklist_sender_maps, which implement global policy applicable to all recipients. Similarly there are $per_recip_blacklist_sender_lookup_tables and $per_recip_whitelist_sender_lookup_tables, which make possible for each recipient or subdomain to specify its own set of black- or whitelisted senders. The per-recipient tables take precedence over global tables.

For SQL lookups, amavisd-new will first look up the recipient in table users in order of descending priority, e.g. user@sub.domain.org, user, @.sub.domain.org, @.domain.org, @.org, and @. (which can be considered a catchall). Each matching recipient record may have a list of senders associated (through join on field users.id and wblist.rid). The sender address is then looked up in the associated list of senders (wblist) in order of descending priority, e.g. sender@sub.example.com, @.sub.example.com, @.example.com, @.com, and @. . This search stops at the first matching sender record with a non-NULL field wblist.wb. The value of a field wblist.wb from the matched record determines if the sender is considered whitelisted ('W'), blacklisted ('B') or neutral (' ') for this recipient.

The neutral value is there just as a way to explicitly stop the search, which may be used by a recipient to overrule site-wide or static white- or blacklisting defaults for some specific sender, and to explicitly neither whitelist nor blacklist the sender, letting the normal spam check determine the spaminess of a mail.

For recipient user@sub.domain.org and sender sender@sub.example.com the following search is performed:

user@sub.domain.org
  sender@sub.example.com @.sub.example.com @.example.com @.com @.
  
user
  sender@sub.example.com @.sub.example.com @.example.com @.com @.

@.sub.domain.org
  sender@sub.example.com @.sub.example.com @.example.com @.com @.

@.domain.org
  sender@sub.example.com @.sub.example.com @.example.com @.com @.

@.org
  sender@sub.example.com @.sub.example.com @.example.com @.com @.

@.
  sender@sub.example.com @.sub.example.com @.example.com @.com @.
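
The search order listed above can be reproduced with a small Python model. It is a sketch, not the SQL that amavisd-new issues. The nested dict standing in for the users and wblist tables, the function names, and the use of None for SQL NULL are all assumptions made for illustration.

```python
def domain_keys(domain):
    """Keys for a domain and its parents, most specific first, then the catchall."""
    labels = domain.split(".")
    keys = ["@." + ".".join(labels[i:]) for i in range(len(labels))]
    return keys + ["@."]


def recipient_keys(address):
    """Recipient lookup keys in order of descending priority."""
    local, _, domain = address.rpartition("@")
    return [address, local] + domain_keys(domain)


def sender_keys(address):
    """Sender lookup keys in order of descending priority."""
    _, _, domain = address.rpartition("@")
    return [address] + domain_keys(domain)


def hard_wb_lookup(wblist, recipient, sender):
    """Return 'W', 'B', ' ' (neutral) or None when nothing matched.

    wblist: dict mapping a recipient key to a dict of sender key -> wb value.
    """
    for rkey in recipient_keys(recipient):
        senders = wblist.get(rkey)
        if senders is None:
            continue
        for skey in sender_keys(sender):
            wb = senders.get(skey)
            if wb is not None:
                # The first sender record with a non-NULL wb ends the search.
                return wb
    return None
```

In this model a neutral ' ' entry under a specific recipient shadows a 'B' entry under the catchall recipient, which is the overruling behaviour described for the neutral value.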

Soft black- and whitelisting senders regarding spam -- @score_sender_maps

Instead of hard black- or whitelisting a sender address (unconditionally considering mail spam or ham solely based on sender address regardless of mail contents), a more gentle approach is to add score points (penalties) to the spam score for mail from certain senders or sending domains. Positive points lean towards blacklisting, negative towards whitelisting. This is much like adding SpamAssassin rules or using its white/blacklisting, except that here only envelope sender addresses are considered (not addresses in a mail header), and that score points can be assigned per-recipient (or per-domain or globally), and that the assigned penalties are customarily much lower than the default SpamAssassin white/blacklisting score.

The table structure of @score_sender_maps is similar to $per_recip_blacklist_sender_lookup_tables i.e. the first level key is recipient address, pointing to by-sender lookup tables. The essential difference is that scores from all matching by-recipient lookups (not just the first that matches) are summed to give the final score boost. That means that both the site and domain administrators, as well as the recipient can have a say on the final score.

For SQL lookups, the mechanism is much like the one described for hard black- or whitelisting, with the following differences:

the field wblist.wb is numeric, representing score points, instead of containing a character W or B or space;
the search through matching recipients does not stop at the first match, but traverses all matching recipients, summing up the corresponding wblist.wb field values.

Namely, amavisd will look up the recipient, e.g. user@sub.domain.org, user, @.sub.domain.org, @.domain.org, @.org, and @. . Since the search will not stop at the first recipient match, the search order in this case is unimportant, although it is actually the same descending-priority order as with hard b/w listing. Each matching recipient record may have a list of senders associated (through join on field users.id and wblist.rid). The sender address is then looked up in the associated list of senders (wblist) in order of descending priority, e.g. sender@sub.example.com, @.sub.example.com, @.example.com, @.com, and @. . This search stops at the first matching sender record with a non-NULL field wblist.wb, but this does not terminate the outer recipients search. Numeric values of a field wblist.wb from matched records are summed up across all matching recipients tables, and the result is added to the spam score as produced by SpamAssassin.
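
The summation just described can be modelled in Python as follows. This sketch covers only purely numeric wblist.wb values. The nested dict standing in for the SQL tables, the function names and the sample scores are assumptions, and None stands in for SQL NULL.

```python
def domain_keys(domain):
    """Keys for a domain and its parents, most specific first, then the catchall."""
    labels = domain.split(".")
    return ["@." + ".".join(labels[i:]) for i in range(len(labels))] + ["@."]


def recipient_keys(address):
    local, _, domain = address.rpartition("@")
    return [address, local] + domain_keys(domain)


def sender_keys(address):
    _, _, domain = address.rpartition("@")
    return [address] + domain_keys(domain)


def soft_score_boost(wblist, recipient, sender):
    """Sum score points over all matching recipient records.

    wblist: dict mapping a recipient key to a dict of sender key -> points.
    """
    total = 0.0
    for rkey in recipient_keys(recipient):
        senders = wblist.get(rkey, {})
        for skey in sender_keys(sender):
            points = senders.get(skey)
            if points is not None:
                total += points
                break   # first sender match for this recipient record only
        # The outer loop continues: every matching recipient record counts.
    return total
```

With a user-level entry of -2.0, a domain-level entry of 1.5 and a site-wide catchall of 0.5 for the same sender, the three contributions are added together, so the user, the domain administrator and the site administrator each influence the final score.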

Unlike static tables, where hard and soft w/b-listing use separate tables, the SQL-based hard and soft w/b-listing uses the same SQL tables and the same field wblist.wb. Mixing the 'W', 'B' with numeric values is somewhat frowned upon, but is supported to facilitate transition. The search proceeds as described above as long as only numeric field values are encountered, summing up the values and adding the accumulated sum to the final score. If a non-numeric value of field wblist.wb is encountered during this search, its value (W or B or space) is interpreted as described for hard w/b listing, and the search stops at this point.

Configuration variables

The behaviour of the amavisd-new is controlled by a set of configuration variables, which are just normal module-global Perl variables (in package Amavis::Conf). At daemon startup time these variables are first assigned an initial value (often just an undefined value, the undef). The default values of configuration variables are documented in file amavisd.conf-defaults, which lists all configuration variables.

Next a configuration file amavisd.conf (or other file as specified by option -c) is read and interpreted by the Perl interpreter itself. The amavisd.conf is just a normal Perl program, and can in principle do whatever and however it pleases, but its main purpose is to assign values to configuration variables.

After execution of amavisd.conf is done, the daemon may correct some configuration variable values (mainly to maintain backwards compatibility with earlier versions of the configuration file), and may assign a default value to certain variables which are still undefined -- these variables and their default values are marked "after-defaults" in the documentation file amavisd.conf-defaults. The main reason for the existence of the "after-defaults" concept is that some default values depend on other configuration variables and cannot be computed before the amavisd.conf is finished. To force such variables to an off/false/disabled state, one needs to assign some false but defined value to them, such as '' (an empty string) or a 0 for booleans.

Perl variables always start with a character $, @ or % to indicate a type of variable. This leading character is part of the variable name for all practical purposes.

$ (dollar character): indicates a scalar variable (a string, a number, a reference)
@ (at sign): indicates an array variable (a list)
% (percent character): indicates an associative array (also known as hash), which maps keys to values

A couple of Perl syntactical elements deserve mention at this point, as they are often used in the amavisd.conf configuration file.

"...", a double-quoted string: is a string; variables within are evaluated, e.g. "$MYHOME/tmp"
'...', a single-quoted string: is a string; variables within are not evaluated, the $ and @ lose their special meaning, e.g. 'user@example.com'
(...): is a list of comma-separated expressions, e.g. (1,2,"test"); a list is normally assigned to an array variable
qw(string): is an operator that interprets its argument as a single string, splits it on whitespace to words, and returns a list of words (strings); it is a convenience to avoid some typing, e.g. qw(user@example.com .example.net .org) is exactly equivalent to ('user@example.com', '.example.net', '.org');
[...]: is a reference to an anonymous list of comma-separated expressions, e.g. [1,2,"test"]; (note: a reference is a scalar)
{...}: is a reference to an anonymous associative array, e.g. {'alfa'=>1, 'beta'=>99, 'other'=>'test'}; (note: a reference is a scalar)
\variable: is a reference to a variable, e.g. \$virus_admin, \@mynetworks, \%whitelist_sender; (note: a reference is a scalar)

Historically amavisd-new accessed all configuration variables directly by name, e.g. %spam_lovers, @spam_lovers_acl, $spam_lovers_re. Later it became apparent that certain groups of variables (lookups) are always used together in the same way, so new array variables like @spam_lovers_maps were introduced. The program now never accesses old lookup table variables directly, but always through higher level lists. The solution is fully backwards compatible, as the default value for the new lists references the old variables, e.g.:

@spam_lovers_maps = (\%spam_lovers, \@spam_lovers_acl, \$spam_lovers_re);

An administrator is free to modify or replace the lists in variables like @spam_lovers_maps, perhaps rearranging the order or losing all references to legacy variables, and replacing them with other variables, often anonymous arrays/lists or anonymous associative maps (hashes), or constants which can serve as a convenient catchall default value when used last in the list.

Since amavisd-new version 2.0, there is one further generalization step in the way a program accesses configuration variables. More than a hundred configuration variables which control amavisd-new operation on a by-message level (as opposed to by-recipient and truly global settings) are now grouped in an associative array called a policy bank. These configuration variables are no longer accessed directly by their variable name by the program, but always through a currently installed policy bank. An administrator is free to modify the policy bank, normally by providing replacement policy banks and specifying under what conditions the replacement policy bank is to be automatically installed.

Policy banks

Policy banks hold sets of configuration variables controlling most of per-message settings, including: static lookup tables, IP interface access rules, forwarding address, log level, templates, administrator addresses, spam trigger levels, quarantine rules, lists of anti-virus scanner entries (or just a subset), banned names rules, defang settings, etc. The whole set of these settings may be replaced with another predefined set based on incoming port number, making it possible for one amavisd daemon to cope with more diverse needs of served user communities which could so far only be implemented by running more than one instance of the amavisd daemon, each with its own configuration file.

This mechanism brings new potential for the future: in principle policy banks could be swapped not only based on port number or SMTP client IP address, but on any characteristics pertaining to a mail message as a whole (not specific to each of its recipients), or to characteristics of a connection from a mailer (e.g. the interface address or protocol).

Until a better mechanism is available, a policy bank named 'MYNETS' has special semantics: this policy bank gets loaded whenever the MTA supplies an SMTP client IP address (through Postfix XFORWARD extension or a new AM.PDP protocol) and that address matches the @mynetworks list.

An associative array %interface_policy is a current mechanism of assigning a policy bank to an incoming TCP port number (port must be in the list @$inet_socket_port). Whenever the connection from MTA is received, first a built-in policy bank with an empty name -- the $policy_bank{''} gets loaded, which brings in all the global/legacy settings. Then it is overlaid by whatever configuration settings are in the bank named in the $interface_policy{$port} if any, and finally the policy bank named 'MYNETS' (i.e. settings from $policy_bank{'MYNETS'}) is overlaid if such policy bank exists and the SMTP client IP address is known (by XFORWARD SMTP extension command from MTA) and it matches @mynetworks. See amavisd.conf-sample for examples.

When a new policy bank is overlaid over an existing set of configuration variables, the variables not present in the new policy bank retain their value.
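
The overlay sequence can be illustrated with Python dicts standing in for the Perl hashes. This is a model under assumptions, not the daemon's code: effective_policy is a hypothetical name, the bank name 'EXT' and all sample values are made up, networks are given in CIDR notation, and matching against @mynetworks is reduced to a membership test using the standard ipaddress module.

```python
import ipaddress


def effective_policy(policy_bank, interface_policy, port,
                     mynetworks, client_ip=None):
    """Return the settings in force for one connection.

    policy_bank: dict of bank name -> dict of settings ('' is the built-in bank).
    interface_policy: dict of TCP port -> bank name.
    mynetworks: list of networks in CIDR notation.
    client_ip: SMTP client address if the MTA supplied one, else None.
    """
    settings = dict(policy_bank[""])          # global/legacy settings first
    bank_name = interface_policy.get(port)
    if bank_name is not None:
        # Keys missing from the overlay keep their previous value.
        settings.update(policy_bank.get(bank_name, {}))
    if client_ip is not None and "MYNETS" in policy_bank:
        ip = ipaddress.ip_address(client_ip)
        if any(ip in ipaddress.ip_network(net) for net in mynetworks):
            settings.update(policy_bank["MYNETS"])
    return settings
```

For example, with banks `{"": {"log_level": 0, "forward_method": "smtp:[127.0.0.1]:10025"}, "EXT": {"log_level": 2}, "MYNETS": {"log_level": 3}}`, a connection on a port mapped to 'EXT' from a client inside mynetworks ends up with log_level 3 while forward_method is inherited unchanged from the built-in bank.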

The built-in policy bank (with empty name) is predefined, and includes references to most other variables (the dynamic config variables), which are accessed only indirectly through the currently installed policy bank. Overlaying a policy bank with another policy bank may bring in references to entirely different variables, possibly unnamed, and may remove references to legacy variables if it so chooses.

Configuration variables are referenced from a policy bank (which is implemented as a Perl associative array, i.e. a hash) by keys of the same name, e.g. { log_level => \$log_level, inet_acl => \@inet_acl, ...}. For scalars one level of indirection is allowed, e.g. a policy bank { log_level => \$log_level }; $log_level=2; is equivalent to { log_level => $log_level } or to { log_level => 2 }, but in the first example with an indirect reference, the $log_level may be assigned to even _after_ the policy bank has already been formed.

One word of caution: the syntax of entries within a policy bank hash is slightly different from assignments to configuration variables. This is because entries within a policy bank are not assignments, but key=>value pairs as in any Perl hash. These pairs are delimited by commas, unlike statements, which are delimited by semicolons. A value is separated from its key by '=>' (or by a comma), whereas the assignment operator is '='. Keys of a policy bank are written without a leading $ or @ or %, unlike variable names. Values of a hash can only be scalars (e.g. strings or numbers or references to arrays or references to hashes).

Compare:

value of a policy bank is a reference to a Perl hash, e.g.:

    { log_level => 3,
      forward_method => 'smtp:[127.0.0.1]:10025',
      spam_admin_maps => ["spamalert\@$mydomain"],
    }

normal assignments look like:

      $log_level = 3;
      $forward_method = 'smtp:[127.0.0.1]:10025';
      @spam_admin_maps = ("spamalert\@$mydomain");

And a final note: Perl can detect and report typing mistakes in variable names, but a mistyped key is just an unused hash entry lurking in a hash, never used and never reported as mistyped or useless.
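A short fragment makes the contrast concrete. It assumes that 'use strict' is in effect for the configuration file, which is what allows Perl to catch the misspelled variable; the misspelling log_levle is deliberate.

```perl
use strict;   # assumed to be in effect for the configuration file

# A mistyped variable name is caught: with the next line uncommented,
# Perl refuses to compile the file and reports that the global symbol
# "$log_levle" requires an explicit package name.
# $log_levle = 3;

# A mistyped key is accepted silently: the entry just sits in the hash,
# and log_level keeps whatever value it had before the bank was loaded.
$policy_bank{'INTERNAL'} = {
  log_levle => 3,
};
```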

A case study

The sender address can be faked, so basing important decisions on a comparison of the envelope sender address to @local_domains_maps would not be useful. The only reliable way to detect whether mail originates from inside or from outside is to test the IP address of the sending SMTP client, or to set up separate MTA paths for internally originating and for externally originating mail.

Since in this particular example external mail comes in via fetchmail, it is probably easiest to configure fetchmail to submit mail to Postfix on a different port than any other internally originating mail uses.

In /etc/postfix/master.cf add another smtpd service, listening for example on port 10088 :

10088 inet n - n - - smtpd -o content_filter=smtp-amavis:127.0.0.1:10026

This also tells Postfix that for mail coming in on port 10088 (from fetchmail) the content filter to be used is at port 10026 (not the default one at 10024, as configured by the global content_filter in main.cf).

In .fetchmailrc add

  smtphost localhost/10088

to the poll section.

In amavisd.conf tell it to listen on port 10026, besides the more usual 10024:

  $inet_socket_port = [10024,10026];

Now one may make up a name for a policy bank which will cover only internally originating mail, let's pick a name INTERNAL. Tell amavisd to load policy INTERNAL when a request comes in on port 10024:

  $interface_policy{'10024'} = 'INTERNAL';

(alternatively, or in addition, one might make up another policy and attach it to port 10026, but we'll just use the global settings for the other port)

Now one may prepare the policy INTERNAL and specify there the options which should be different from normal options for externally originating mail. For example:

$policy_bank{'INTERNAL'} = {
  log_level => 2,
  spam_admin_maps => ["virusalert\@$mydomain"],
  virus_admin_maps => ["virusalert\@$mydomain"],
  spam_kill_level_maps => [7.0],
  spam_dsn_cutoff_level_maps => [15],
  final_virus_destiny => D_BOUNCE,
# notify_spam_sender_templ => read_text("$MYHOME/notify_spam_sender.txt"),
};
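The alternative mentioned earlier, attaching a second policy bank to port 10026 instead of relying on the global settings there, might look like the sketch below. The bank name EXTERNAL and the values in it are hypothetical and only show the shape of such a setup.

```perl
# requests from the fetchmail path arrive on port 10026
$interface_policy{'10026'} = 'EXTERNAL';   # bank name is hypothetical

$policy_bank{'EXTERNAL'} = {
  log_level => 1,                  # example value
  spam_kill_level_maps => [6.0],   # example value
};
```

Settings not listed in EXTERNAL keep their global values for mail on that port.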

$max_requests and max_use

Amavisd-new runs under process control of Net::Server. This is a pre-forked environment where $max_servers child processes are constantly kept alive and ready to accept new tasks (mail messages to be checked). Each amavisd child process is able to handle several tasks in a row, which helps to reduce startup (fork) costs. In case of the SMTP or LMTP protocol, each session may consist of several SMTP/LMTP transactions. Each SMTP/LMTP transaction is counted as one task, regardless of whether it came in from the same SMTP/LMTP client in a multi-transaction session, or in separate sessions, possibly from different SMTP/LMTP clients.

A configuration variable $max_requests (default value 10) controls the approximate number of tasks each child process is willing to handle. After that the child process terminates and Net::Server provides a new child process to take its place.

The exact value of $max_requests is not critical. There are two opposing needs, and some in-between value should be chosen.

On the low side, the number should not be too small in order for the startup cost to be averaged out / sufficiently diluted over an entire child lifetime. A value above 5 or 10 meets this goal in most amavisd-new configurations.

On the high side, the value depends on the amavisd-new configuration. The amavisd daemon itself is conservative in its use of dynamically allocated memory and does not load a mail into memory, but keeps the mail being processed and its components in files. Similarly, most of the called external virus scanners and decoders are rational in their use of memory (a notable exception is Archive::Tar, which is used if the cpio command is not available). Unfortunately this is not true for the Perl module Mail::SpamAssassin, which expects to have an entire mail in memory in order to be able to run its large set of rules on it in reasonable time. This is a design decision made by the SpamAssassin creators, and we have to live with it.

When amavisd-new is not configured to use SpamAssassin, the value of $max_requests can be quite high without any known or expected problems. For general sanity reasons, an upper limit could be 100 for example, although anything above 20 or so would not bring a measurable benefit to the maximum sustained mail throughput.

When amavisd-new is configured to use SpamAssassin however, the slurping of entire mail in memory may have implications, depending on the maximum mail size allowed at the MTA (e.g. Postfix setting for message_size_limit). Even though the allocated memory is reclaimed by Perl after mail processing, and is reused for subsequent processing, the process virtual memory footprint never shrinks, it can only expand as needed.

With the default value of message_size_limit near 10 MB this is not a serious problem, and $max_requests can be fairly large. Since the additional performance gain is negligible for values beyond 20 or so, however, there is no good reason to choose a much larger value than that.

Some sites however choose not to limit mail size, or increase the maximum mail size limit substantially. If a large mail arrives at such a site, the virtual memory of the amavisd child process is extended to accommodate the message. For the rest of its lifetime the child process that processed the mail stays at its high virtual memory size. If this happens frequently, host resources may become scarce. Limiting the number of tasks each child is supposed to process is very much desirable on such systems.

The default value of 10 for $max_requests was chosen as a reasonable compromise between averaging out the startup costs and not wasting too many resources on hosts with a high message size limit and SpamAssassin enabled.
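In amavisd.conf the setting is a plain scalar assignment. The fragment below only restates the guidance above as alternatives; the $max_servers value is an arbitrary example, and only one $max_requests line would be active in a real configuration.

```perl
$max_servers  = 2;    # number of pre-forked child processes (example value)

$max_requests = 10;   # the default: a compromise for SpamAssassin-enabled hosts

# Without SpamAssassin, or with a modest message_size_limit, a larger value
# is harmless, though little is gained beyond 20 or so:
# $max_requests = 20;

# With SpamAssassin and very large or unlimited message sizes, a smaller
# value lets bloated child processes be replaced sooner, at the price of
# more frequent startup costs:
# $max_requests = 5;
```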

In a setup with Postfix where its lmtp client is chosen to feed amavisd-new, this client tries to keep the LMTP session open and to submit several mail messages in multiple transactions. With recent Postfix versions its SMTP client is capable of and willing to use multi-transaction sessions as well, although it seems to be less persistent than the LMTP client.

According to SMTP and LMTP protocol specifications, dropping the session on the server side is considered rude and should be used only as a last resort. In order to respect the $max_requests setting (which is not strictly enforced by amavisd, and is considered an advisory value), the client side should be configured with a comparable limit. In case of Postfix, its smtp client service already limits cached multiple transactions to 10 or so, so no special options are needed on the Postfix side.

The current Postfix lmtp client is more persistent. In the future it is expected to behave more like the smtp service, but until then one may choose to apply the max_use Postfix limit to this service (or globally if tolerable). A recommended value of max_use (if feeding amavisd by lmtp) is the same or similar in value as $max_requests.
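A sketch of keeping the two limits in step, assuming mail is fed to amavisd through the Postfix lmtp client. The Postfix side is shown only as a comment, since it belongs in the Postfix configuration and not in amavisd.conf; the exact service entry depends on the local master.cf.

```perl
# amavisd.conf: advisory number of tasks per child process
$max_requests = 10;

# Postfix side (not Perl, shown for comparison only): give the lmtp
# service that feeds amavisd a comparable limit by adding the option
#     -o max_use=10
# to its master.cf entry, or set max_use in main.cf if a global limit
# is tolerable.
```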

mm
Last updated: 2004-10-27