<?xml version="1.0" encoding="iso-8859-1"?>
<?xml-stylesheet href="#internalStyle" type="text/css"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>amavisd-new documentation bits and pieces</title>
  <meta name="AUTHOR" content="Mark Martinec" />
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
  <link rev="made" href="mailto:mark.martinec@ijs.si" />
<style type="text/css" id="internalStyle">
  body { background: white; color: black }
  kbd { font-family: monospace }
  img.noboarder { color: white; border: none }
</style>
<!-- [link rel="STYLESHEET" href="./a.css" type="text/css"] -->
</head>

<body>
<h1><em><a href="http://www.ijs.si/software/amavisd/">amavisd-new</a></em>
documentation bits and pieces</h1>

<p>The most recent version of this document is available at
<a href="http://www.ijs.si/software/amavisd/amavisd-new-docs.html">
http://www.ijs.si/software/amavisd/amavisd-new-docs.html</a></p>

<ul>
<li><a href="#checks">performing mail checks</a></li>
<li><a href="#actions">acting on mail checks results</a></li>
<li><a href="#tagkill">tag, tag2 and kill levels</a></li>
<li><a href="#quarantine">quarantine</a></li>
<li><a href="#wblist">hard black- and whitelisting senders regarding spam</a></li>
<li><a href="#score_sender">soft black- and whitelisting
senders regarding spam -- @score_sender_maps</a></li>
<li><a href="#confvars">configuration variables</a></li>
<li><a href="#pbanks">policy banks</a></li>
<li><a href="#max_requests">$max_requests and max_use</a></li>
</ul>


<h2><a name="checks">Performing mail checks</a></h2>

<p>The following checks on mail are available:</p>

<ul>
<li>mail header validity checks</li>
<li>banned names and types checks</li>
<li>virus checks</li>
<li>spam checks</li>
<li>is sender white- or blacklisted (regarding spam)</li>
</ul>

<p>Although checks are presently not performed in parallel, it is
best to consider the order of their evaluation unspecified (unknown).
Besides possible future parallel implementation, another reason is
the caching of results, where subsequent mail with the same contents
may benefit from earlier checks if validity of these check results
has not yet expired -- so a check result may be instantly available,
regardless of whether it has been asked for or not.</p>

<p>Using configuration variables @bypass_virus_checks_maps,
@bypass_banned_checks_maps, @bypass_header_checks_maps and
@bypass_spam_checks_maps each recipient (or administrator on their
behalf) may suggest that certain tests are not needed, primarily
for performance reasons. Although the @bypass_*_checks_maps pertain
to individual recipients, a mail check is an operation done on the
whole message, regardless of the number of recipients and their individual
preferences. A suggestion by some of the recipients that a certain check is
not needed (is to be bypassed) does not guarantee the test will not be
performed.</p>
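
<p>As an illustration only, a bypass hint might be expressed in amavisd.conf
with an anonymous ACL-type lookup table, following the lookup conventions of
README.lookups. The addresses are hypothetical:</p>

<pre>
# recipients suggesting that spam checks are not needed (hypothetical addresses)
@bypass_spam_checks_maps = (
  [ 'postmaster@example.com', '.lists.example.com' ],  # ACL: one address, one domain
);
</pre>

<p>Such a setting remains a hint: if another recipient of the same message
does not match the table, the spam check is still performed on the whole
message.</p>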

<p>Similarly the (hard) blacklisting or whitelisting of sender address
may make running spam check unnecessary, but it does not guarantee
the spam check result will not be available for subsequent decisions.</p>

<p>There are two primary reasons why a check result may still be available
despite the bypass hint or a sender being black- or whitelisted:</p>

<ul>
<li>a check result from some previous mail with the same contents
  has been cached and is still valid;</li>
<li>when mail has multiple recipients and not all of them agree
  that a check should be bypassed.</li>
</ul>

<p>The amavisd-new program is allowed to skip some check for performance
reasons if all recipients agree that a check is not necessary (that it may
be bypassed), or if the outcome of a check to be skipped could not influence
further mail processing and delivery/non-delivery of the message (as is
the case of a sender being black- or whitelisted regarding spam check).</p>

<p>For example spam checks may be skipped if it is already known that
a mail is infected. This is an implementation and optimization issue,
and no guarantee is given about interdependency of checks.
Future versions may use a different strategy of performing checks
(e.g. some checks may be performed in parallel), as long as a change
does not affect the final outcome.</p>


<h2><a name="actions">Acting on mail checks results</a></h2>

<p>Based on the outcome of mail checks performed during mail analysis
or cached from previous mail with the same contents, and based on
global settings and individual recipient preferences, the program now
decides what action to perform next. As described in the previous section,
not all results of checks are necessarily known (e.g. if all recipients
voted for some check to be bypassed). For the purpose of deciding further
actions, unknown results of a check are considered equivalent to
negative (false) results, i.e. skipped virus check is treated the same
as non-infected mail, bypassed spam check is equivalent to low spam score
(ham).</p>

<p>The following decisions are made at this stage:</p>
<ul>
<li>whether a mail should be quarantined and how;</li>
<li>whether an administrator (and which administrator)
  should receive a notification (and which notification);</li>
<li>whether recipients should receive a notification;</li>
</ul>

<p>and regarding mail delivery and/or sender (non)delivery notifications:</p>
<ul>
<li>whether a mail should be delivered to each recipient or not;</li>
<li>whether delivered mail should be modified (header edits, defanging);</li>
<li>whether a sender should receive a (non)delivery notification (bounce);</li>
<li>what should be the final status code returned to the mailer
  (reject/pass).</li>
</ul>

<p>For the purpose of deciding on these actions, a mail is classified
based on all available check results. It is quite possible that more
than one check result would be positive (e.g. virus and banned and
bad header, or spam and bad header, or virus and spam), yet a mail
is considered to be only in one category. The logic is currently
hard-wired into the program and can not be influenced by configuration
variables. The following order is used, the first condition met
decides the outcome:</p>

<ol>
<li>a virus is detected: mail is considered infected;</li>
<li>contains banned name or type: mail is considered banned;</li>
<li>spam level is above kill level for at least one recipient
  or a sender is blacklisted: mail is considered spam;</li>
<li>bad (invalid) headers: mail is considered as having a bad header.</li>
</ol>

<p>This decision order explains why amavisd-new is not free to skip
(to optimize away) virus checks if a presence of a banned name or a bad
header is already known or can easily be determined. The order was chosen
with the intention that a more informative or a stronger assertion is the
one to base further mail delivery on, and to be quoted in notifications
and in the log. Even at the expense of possibly longer processing time,
it is more important to declare a mail infected than complain about
a bad header, a banned executable or spammy contents.</p>

<p>The determined mail category now governs further action.
Administrators are notified if enabled for the category,
mail is quarantined if quarantining is enabled for the category,
recipients are notified if enabled for the category.</p>

<p>Next a mail delivery is attempted. A decision to deliver depends
on mail category and on global and individual recipient preferences.
The global setting $final_*_destiny=D_PASS or a per-recipient setting
@*_lovers_maps ensures mail delivery for the corresponding mail category
even if mail would otherwise be blocked for being infected or banned
or spam or having a bad header.</p>

<p>A mail that is decided to be passed to an individual recipient
undergoes some simple header editing which happens on-the-fly during
mail forwarding. Certain mail header fields may be inserted or removed,
or an existing header field (e.g. Subject) may be modified. This header
editing may be different for each recipient even in multi-recipient
messages. If necessary, a multi-recipient mail is split into more than
one forwarding transaction, grouping (clustering) recipients with same
settings into one SMTP transaction.</p>

<p>Based on decisions to forward or to block mail to each recipient,
and on the global setting for the mail category ($final_*_destiny=D_BOUNCE
or D_REJECT), the sender (non)delivery notification is now prepared
in case of D_BOUNCE, and MTA receives a 2xx status (success); or in
case of D_REJECT the MTA receives a 5xx (reject) status and preparing
sender notifications is thus delegated to MTA (not recommended).</p>
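
<p>A minimal sketch of how the destiny and lovers settings described above
might look in amavisd.conf. The chosen values and the recipient address are
assumptions for illustration, not recommended defaults:</p>

<pre>
$final_virus_destiny      = D_DISCARD;  # block, no sender notification
$final_banned_destiny     = D_BOUNCE;   # block, amavisd-new prepares a DSN
$final_spam_destiny       = D_BOUNCE;   # block, amavisd-new prepares a DSN
$final_bad_header_destiny = D_PASS;     # deliver despite a bad header

# this (hypothetical) recipient receives spam regardless of $final_spam_destiny
@spam_lovers_maps = ( [ 'abuse@example.com' ] );
</pre>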

<p>Even in cases of mail non-delivery when a (non-)delivery status
notification (DSN) for the sender should have been prepared and sent,
there are certain exceptions where the DSN is suppressed, which makes mail
effectively lost as far as the sender and the recipient are concerned
(but quarantining is not affected):</p>

<ul>
<li>when $final_*_destiny=D_DISCARD;</li>
<li>when mail is infected and the detected virus name matches
  the @viruses_that_fake_sender_maps;</li>
<li>when spam score exceeds level determined by @spam_dsn_cutoff_level_maps
  for all recipients;</li>
<li>when mail is coming from a mailing list, as determined by
  examining a mail header <i>Precedence:</i> for containing
  string 'bulk' or 'list' or 'junk';</li>
</ul>
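
<p>Two of these exceptions are controlled by lookup tables. The sketch below
assumes that @spam_dsn_cutoff_level_maps keeps its default reference to
$sa_dsn_cutoff_level, and uses hypothetical virus name patterns which would
need adjusting to the names reported by the virus scanners in use:</p>

<pre>
# no sender notification for spam above this score
$sa_dsn_cutoff_level = 10;

@viruses_that_fake_sender_maps = (new_RE(
  [qr'^(WM97|OF97|Joke\.)'i => 0],  # hypothetical names: sender assumed genuine
  [qr/^/ => 1],                     # anything else: sender assumed faked, no DSN
));
</pre>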


<h2><a name="tagkill">tag, tag2 and kill levels</a></h2>

<p>When SpamAssassin is called upon to analyze a mail message, it returns
a spam score (spam level, hits), which is a numeric representation of
spamminess. The higher the number, the more spammy the message is considered.
Small numbers near zero or negative indicate a clean message, colloquially
called ham. The spam score is a characteristic of the whole message,
and does not depend on recipient preferences. SpamAssassin is called
only once for each message regardless of the number of recipients.</p>

<p>To determine further course of action, amavisd-new compares the spam score
to three numeric values: tag level, tag2 level and kill level. These values
may be different for each recipient, and the further actions may be different
for each recipient. If necessary, the mail forwarding is split into more
than one transaction to cater for different recipient preferences.</p>

<dl>
<dt>tag level</dt>
<dd>if spam score is at or above tag level, spam-related header fields
  (X-Spam-Status, X-Spam-Level) are inserted for local recipients;
  undef is interpreted as lower than any spam score;</dd>
<dt>tag2 level</dt>
<dd>if spam score is at or above tag2 level, spam-related header fields
  (X-Spam-Status, X-Spam-Level, X-Spam-Flag and X-Spam-Report)
  are inserted for local recipients, and X-Spam-Flag and X-Spam-Status
  bear a YES; also recipient address extension (if enabled) is tacked onto
  recipient address for local recipients; for these actions to have any
  effect, mail must be allowed to be delivered to a recipient;</dd>
<dt>kill level</dt>
<dd>if spam score is at or above kill level, mail is blocked; and
  sender receives a nondelivery notification unless spam score exceeds
  dsn cutoff level.</dd>
</dl>
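
<p>As a sketch, the three levels could be set site-wide and the kill level
overridden for one recipient. The numeric values and the address are
illustrative assumptions only:</p>

<pre>
$sa_tag_level_deflt  = 2.0;   # add X-Spam-Status, X-Spam-Level at or above this score
$sa_tag2_level_deflt = 6.3;   # mark as spam (X-Spam-Flag: YES) at or above this score
$sa_kill_level_deflt = 10.0;  # block mail at or above this score

# per-recipient override (hypothetical address), then the site-wide default
@spam_kill_level_maps = (
  { 'cautious@example.com' => 6.3 },
  \$sa_kill_level_deflt,
);
</pre>

<p>With these values a message scoring 7.5 would be delivered to most local
recipients with spam markings, but blocked for cautious@example.com.</p>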

<p>The general idea is that kill level is what controls the main actions
as far as MTA and amavisd-new are concerned (regardless of what recipients'
MUA later does with the mail).</p>

<p>Reaching kill level for at least one recipient controls the following:</p>
  
<ul>
<li>mail gets quarantined (unless disabled)</li>
<li>spam administrator gets a notification (unless disabled)</li>
<li>ContentSpamMsgs counter is incremented</li>
<li>spam defanging is done (unless disabled)</li>
<li>sender gets a notification if $warnspamsender
  is true and $final_spam_destiny is D_PASS</li>
<li>if message is not delivered, sender gets a nondelivery
  notification (suppressed under certain conditions).</li>
</ul>

<p>On the other hand the tag2 level just adds some mark to the passed
mail (only for local recipients), which recipient or his MUA may decide
to act on or not. Specifically:</p>

<ul>
<li>Subject header field is modified (unless disabled)</li>
<li>X-Spam-Flag and X-Spam-Status header fields get a Yes</li>
<li>address extension for spam gets tacked on the recipient address</li>
<li>and (perhaps inconsistently with the rest) the mail log entry says
  'Passed SPAM' instead of 'Passed CLEAN'.</li>
</ul>
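
<p>The tag2-level markings listed above are governed by settings along these
lines. This is a hedged sketch: the tag text, delimiter and extension are
example values, and variable availability may differ between versions:</p>

<pre>
$sa_spam_modifies_subj = 1;              # allow Subject editing
$sa_spam_subject_tag   = '***SPAM*** ';  # text prepended to Subject
$recipient_delimiter   = '+';            # enables address extensions
@addr_extension_spam_maps = ('spam');    # user@example.com becomes user+spam@example.com
</pre>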

<p>If a recipient (or its MUA) decides to discard the mail based on
tag2 marking, there is no way to retrieve it later from a quarantine,
and neither the sender nor the spam administrator is notified.
As far as the MTA and amavisd-new are concerned, the message was
successfully delivered. Whatever MUA does with the mail is entirely
the responsibility and jurisdiction of the recipient.</p>


<h2><a name="quarantine">Quarantine</a></h2>

<p>Mail quarantining is attempted when mail is infected or banned or
spam score for at least one of its recipients is at or above his kill level.
The <i>*quarantine_to</i> for each recipient (when nonempty), along with a
global corresponding <i>*_quarantine_method</i>, determines where the
quarantine location should be.</p>

<p>The <i>*_quarantine_method</i> can be considered a static and a site-wide
setting, generally controlling the format and location of the quarantine
on the system. The <i>*quarantine_to</i> can be considered the dynamic
part of the quarantine location, possibly affected by per-recipient settings
and the class of malware. It serves to fully specify the final location,
e.g. a file or a mailbox.</p>

<p>Depending on the mail category (type of malware), the following
variables specify the quarantine method: <tt>$virus_quarantine_method</tt>,
<tt>$spam_quarantine_method</tt>, <tt>$banned_files_quarantine_method</tt>,
and <tt>$bad_header_quarantine_method</tt>. One way to globally disable
quarantine is to specify undef or an empty string as a value of these
variables. A nonempty string should follow a syntax:</p>

<ul>
<li><tt>local:</tt><i>filename-template</i></li>
<li><tt>bsmtp:</tt><i>filename-template</i></li>
<li><tt>smtp:</tt><i>hostname</i><tt>:</tt><i>port</i></li>
<li><tt>smtp:[</tt><i>ip-address-or-hostname</i><tt>]:</tt><i>port</i></li>
</ul>

<p>The <tt>local:</tt> and <tt>bsmtp:</tt> methods are useful for
quarantining. The <tt>smtp:</tt> is currently not useful for quarantining
(is used in forwarding and notifications), and is only listed here for
completeness and possible future use.</p>
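
<p>For illustration, the quarantine methods might be configured as follows.
The directory is an assumed location, and the %i and %n template placeholders
are taken from commonly distributed sample configurations, so they should be
checked against amavisd.conf-default for the version in use:</p>

<pre>
$QUARANTINEDIR = '/var/virusmails';   # assumed location

$virus_quarantine_method        = 'local:virus-%i-%n';
$banned_files_quarantine_method = 'local:banned-%i-%n';
$spam_quarantine_method         = 'bsmtp:spam-%i-%n.bsmtp';
$bad_header_quarantine_method   = undef;   # no quarantine for bad-header mail
</pre>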

<p>Depending on the method specified (local/bsmtp/smtp) a per-recipient
setting <i>*quarantine_to</i> adopts different semantics and syntax,
possibly modified by the configuration variable <tt>$QUARANTINEDIR</tt>.</p>

<table border="1">
<tr>
  <th>method</th>
  <th>quarantine_to</th>
  <th><tt>$QUARANTINEDIR</tt></th>
  <th>effect</th></tr>
<tr>
  <td><tt>local:</tt></td>
  <td>e-mail address containing '@'-sign</td>
  <td>anything</td>
  <td>sent via SMTP to the mailer for storage,
      uses $notify_method to specify how to deliver to MTA</td></tr>
<tr>
  <td><tt>local:</tt></td>
  <td>pseudo-alias mapped through %local_delivery_aliases</td>
  <td>directory</td>
  <td>stored as an individual file
      below the directory <tt>$QUARANTINEDIR</tt>, file name comes
      from the template specified in the <i>*_quarantine_method</i>
  </td></tr>
<tr>
  <td><tt>local:</tt></td>
  <td>pseudo-alias mapped through %local_delivery_aliases</td>
  <td>filename of a mailbox</td>
  <td>appended to a file <tt>$QUARANTINEDIR</tt> in mbox format</td></tr>
<tr>
  <td><tt>local:</tt></td>
  <td>pseudo-alias mapped through %local_delivery_aliases</td>
  <td>empty or undef</td>
  <td>not quarantined</td></tr>
<tr>
  <td><tt>bsmtp:</tt></td>
  <td>anything (nonempty)</td>
  <td>anything</td>
  <td>stored in the file specified in the <i>*_quarantine_method</i>
   in BSMTP format
   (if file name is absolute, i.e. starts with a "/")</td></tr>
<tr>
  <td><tt>bsmtp:</tt></td>
  <td>anything (nonempty)</td>
  <td>directory</td>
  <td>stored in the file specified in the <i>*_quarantine_method</i>
   in BSMTP format
   (file name relative to <tt>$QUARANTINEDIR</tt>)</td></tr>
<tr>
  <td>empty or undef</td>
  <td>anything</td>
  <td>anything</td>
  <td>not quarantined</td></tr>
<tr>
  <td>anything</td>
  <td>empty or undef</td>
  <td>anything</td>
  <td>not quarantined</td></tr>
</table>
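
<p>A sketch of how the rows of this table translate into settings. The
pseudo-alias names are the ones customarily mapped through
%local_delivery_aliases, the recipient address is hypothetical, and the
per-recipient table assumes that an empty string is returned as a definite
lookup result:</p>

<pre>
# pseudo-alias: stored according to $QUARANTINEDIR (directory or mailbox file)
$virus_quarantine_to = 'virus-quarantine';

# an address containing '@' is sent via SMTP to the mailer instead
$spam_quarantine_to = "spam-quarantine\@$mydomain";

# empty value turns off spam quarantine for one (hypothetical) recipient
@spam_quarantine_to_maps = (
  { 'noquarantine@example.com' => '' },
  \$spam_quarantine_to,
);
</pre>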

<p>The <i>*quarantine_to</i> is currently quite limited in functionality,
it is often used only to turn off the quarantining for some user or local
subdomain. The reason for this limited functionality is a more vulnerable
nature of this value, as it may come from SQL or LDAP lookups where
non-careful access controls to these databases might permit users to enter
any value in the <i>*quarantine_to</i> field, which is why we do not let
it control the directory or the exact file name of the quarantine file.
This may be somewhat relaxed in the future.</p>

<p>In common setups the quarantine location (e.g. a directory or a dedicated
mailbox) is the same for all recipients. If at least one recipient specifies
a nonempty <i>*quarantine_to</i> specifying this location, the message is
quarantined (stored) there once, regardless of the number of recipients.</p>

<p>The general algorithm is: the <i>*quarantine_to</i> value associated with
each recipient is looked up. Empty or undef values are ignored and duplicates
are discarded. A mail to be quarantined is then stored/sent to each
unique location remaining on the list.</p>

<p>The "bsmtp:" quarantine method is somewhat special in that the quarantine
file location is entirely determined by the <i>*_quarantine_method</i> setting,
and the value of per-recipient <i>*quarantine_to</i> settings do not influence
the quarantine location, as long as this value is nonempty.</p>

<p>When using the "bsmtp:" quarantine method and versions of amavisd-new
earlier than 2.2.0, the <i>*_quarantine_to</i> was completely ignored,
which made it impossible to turn off quarantining selectively for certain
users by specifying an empty or undef value. Since 2.2.0, an empty
<i>*_quarantine_to</i> turns off quarantine for a recipient regardless
of the quarantine method. A nonempty string in <i>*_quarantine_to</i>
(the exact value is ignored) must now be used even with "bsmtp:" to
enable quarantining.</p>


<h2><a name="wblist">Hard black- and whitelisting senders regarding spam</a></h2>

<p>The blacklisting and the whitelisting are ways of telling that we already
know that a message is spam or is ham (non-spam) just by examining the envelope
sender address and comparing it to lists of known spammers or to lists of
known legitimate senders of ham. It is a quick check, potentially saving us
the trouble of examining the mail contents. It has a big drawback however
in that the sender mail address can be (and often is) faked and there is no
guarantee that the claimed sender address represents the actual sender.</p>

<p>The sender address is usually faked for spam messages, so whitelisting
some sender address is of questionable value, and often lets in far more
spam than it does good by approving legitimate mail. For a reliable way
of permitting certain sending clients to send spammy mail see <i>policy
banks</i>.</p>

<p>Blacklisting however is still useful: a spammer has no desire to pretend
to be some blacklisted sending address, when he can choose any other address.
A genuine sender that is intentionally blacklisted can only avoid being
blocked by falsifying his address (joining spammers in his methods)
<em>and</em> sending non-spammy mail, the latter being our objective anyway.
Although amavisd-new does provide blacklisting, it is functionally equivalent
but more effective to blacklist senders at the MTA, preventing such mail
from even entering the mail system.</p>

<p>It should be emphasized that whitelisting (and blacklisting) only affects
spam checks. It has no influence on other checks such as virus, banned or
header checks. Infected mail from whitelisted sender would still be blocked
if our policy is to block viruses.</p>

<p>Another point to bear in mind is that the sender address examined
is the one from the SMTP protocol, exactly as provided by the MTA to amavisd-new.
It is known as the envelope sender address or return path. This address
does not necessarily match the mail author's address from the mail header
(From:) or the sender's address from the header (Sender:). This is most
obvious with mail from mailing lists, where the envelope sender address
is usually the address of a mailing list management service, while the
author's address (From:) is the address of a person sending the message.
Using the envelope sender address in most cases makes it easier to black-
or whitelist mail from mailing lists, compared to guessing a sender address
by parsing mail header.</p>

<p>To avoid surprises, a whitelisted sender suppresses inserting/editing
the tag2-level header fields (X-Spam-*, Subject), appending spam address
extension, and quarantining, even if we know the message is spam (e.g.
because the spam check result on the same mail contents has been cached from
some earlier mail or known from check on behalf of another recipient).</p>

<p>For mail from blacklisted senders, the effect is as if the spam level
were artificially pushed high, resulting in 'X-Spam-Flag: YES', high
'X-Spam-Level' bar and other usual reactions to spam, including possible
rejection. If the message nevertheless still passes (e.g. for spam loving
recipients), it is tagged as BLACKLISTED in the 'X-Spam-Status' header field,
but the reported spam value and set of tests in this report header field
is not adjusted (if available from SpamAssassin, which may or may not have
been called).</p>

<p>If <em>all</em> recipients of a message either white- or blacklist the
sender, amavisd is free to skip spam scanning (calling the SpamAssassin),
saving on time. There is no guarantee however that spam scanning will
actually and always be skipped.</p>

<p>The following variables (lists of lookup tables) are available,
with the semantics and syntax as specified in README.lookups:
@whitelist_sender_maps, @blacklist_sender_maps, which implement
global policy applicable to all recipients. Similarly there are
$per_recip_blacklist_sender_lookup_tables and
$per_recip_whitelist_sender_lookup_tables, which make possible
for each recipient or subdomain to specify its own set of black-
or whitelisted senders. The per-recipient tables take precedence
over global tables.</p>
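
<p>A minimal sketch of both the global and the per-recipient tables, using
lookup table types described in README.lookups. All addresses and patterns
are hypothetical:</p>

<pre>
# global policy, applies to all recipients
@whitelist_sender_maps = ( [ 'newsletter@partner.example.org' ] );
@blacklist_sender_maps = ( new_RE( qr'^(bulkmail|offers)@'i ) );

# per-recipient tables: the first-level key is a recipient address or domain
$per_recip_whitelist_sender_lookup_tables = {
  'user@example.com' => [ qw( friend@example.net .lists.example.org ) ],
};
$per_recip_blacklist_sender_lookup_tables = {
  '.example.com'     => [ qw( pest@example.net ) ],
};
</pre>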

<p>For SQL lookups, amavisd-new will first look up the recipient in table
<i>users</i> in order of descending priority, e.g. user@sub.domain.org,
user, @.sub.domain.org, @.domain.org, @.org, and @. (which can be considered
a catchall). Each matching recipient record may have a list of senders
associated (through join on field <i>users.id</i> and <i>wblist.rid</i>).
The sender address is then looked up in the associated list of senders
(<i>wblist</i>) in order of descending priority, e.g. sender@sub.example.com,
@.sub.example.com, @.example.com, @.com, and @. . This search stops at the
first matching sender record with a non-NULL field <i>wblist.wb</i>. The value
of a field <i>wblist.wb</i> from the matched record determines if the sender
is considered whitelisted ('W'), blacklisted ('B') or neutral ('&nbsp;')
for this recipient.</p>

<p>The neutral value is there just as a way to explicitly stop the search,
which may be used by a recipient to overrule site-wide or static
white- or blacklisting defaults for some specific sender, and to
explicitly neither whitelist nor blacklist the sender, letting the
normal spam check determine the spamminess of a mail.</p>

<p>For recipient user@sub.domain.org and sender sender@sub.example.com
the following search is performed:</p>

<pre>
user@sub.domain.org
  sender@sub.example.com @.sub.example.com @.example.com @.com @.
  
user
  sender@sub.example.com @.sub.example.com @.example.com @.com @.

@.sub.domain.org
  sender@sub.example.com @.sub.example.com @.example.com @.com @.

@.domain.org
  sender@sub.example.com @.sub.example.com @.example.com @.com @.

@.org
  sender@sub.example.com @.sub.example.com @.example.com @.com @.

@.
  sender@sub.example.com @.sub.example.com @.example.com @.com @.
</pre>


<h2><a name="score_sender">Soft black- and whitelisting
senders regarding spam -- @score_sender_maps</a></h2>

<p>Instead of hard black- or whitelisting a sender address (unconditionally
considering mail spam or ham solely based on sender address regardless
of mail contents), a more gentle approach is to add score points (penalties)
to the spam score for mail from certain senders or sending domains.
Positive points lean towards blacklisting, negative towards whitelisting.
This is much like adding SpamAssassin rules or using its white/blacklisting,
except that here only envelope sender addresses are considered (not addresses
in a mail header), and that score points can be assigned per-recipient
(or per-domain or globally), and that the assigned penalties are customarily
much lower than the default SpamAssassin white/blacklisting score.</p>

<p>The table structure of @score_sender_maps is similar to
$per_recip_blacklist_sender_lookup_tables i.e. the first level key is
recipient address, pointing to by-sender lookup tables. The essential
difference is that scores from <em>all</em> matching by-recipient lookups
(not just the first that matches) are summed to give the final score boost.
That means that both the site and domain administrators, as well as the
recipient can have a say on the final score.</p>
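
<p>The two-level structure can be sketched as follows, modelled on the layout
used in sample configuration files. The addresses and score points are
hypothetical:</p>

<pre>
@score_sender_maps = ({      # by-recipient hash lookup table
  # personal table of one (hypothetical) recipient
  'user@example.com' => [ { '.example.net' => -3.0 } ],

  # site-wide table, the '.' matches any recipient
  '.' => [ { 'news@example.net' => 2.0,
             '.example.org'     => 5.0 } ],
});
</pre>

<p>Under these assumptions, mail from news@example.net to user@example.com
matches both recipient tables, so the contributions of -3.0 and 2.0 would be
summed and the spam score lowered by one point.</p>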

<p>For SQL lookups, the mechanism is much like the one described for
hard black- or whitelisting, with the following differences:</p>
<ul>
<li>the field <i>wblist.wb</i> is numeric, representing score points,
  instead of containing a character W or B or space;</li>
<li>the search through matching recipients does not stop at the first
  match, but traverses all matching recipients, summing up the
  corresponding <i>wblist.wb</i> field values.</li>
</ul>

<p>Namely, amavisd will look up the recipient, e.g. user@sub.domain.org,
user, @.sub.domain.org, @.domain.org, @.org, and @. . Since the search will
not stop at the first recipient match, the search order in this case is
unimportant, although it is actually the same descending-priority order as with
hard b/w listing. Each matching recipient record may have a list of senders
associated (through join on field <i>users.id</i> and <i>wblist.rid</i>).
The sender address is then looked up in the associated list of senders
(<i>wblist</i>) in order of descending priority, e.g. sender@sub.example.com,
@.sub.example.com, @.example.com, @.com, and @. . This search stops at the
first matching sender record with a non-NULL field <i>wblist.wb</i>, but this
does not terminate the outer recipients search. Numeric values of a field
<i>wblist.wb</i> from matched records are summed up across all matching
recipients tables, and the result is added to the spam score as produced
by SpamAssassin.</p>

<p>Unlike static tables, where hard and soft w/b-listing use separate
tables, the SQL-based hard and soft w/b-listing uses the same SQL tables
and the same field <i>wblist.wb</i>. Mixing the 'W', 'B' with numeric values
is somewhat frowned upon, but is supported to facilitate transition.
The search proceeds as described above as long as only numeric field values
are encountered, summing up the values and adding the accumulated sum
to the final score. If a non-numeric value of field <i>wblist.wb</i>
is encountered during this search, its value (W or B or space) is
interpreted as described for hard w/b listing, and the search stops at
this point.</p>


<h2><a name="confvars">Configuration variables</a></h2>

<p>The behaviour of the amavisd-new is controlled by a set of configuration
variables, which are just normal module-global Perl variables (in package
Amavis::Conf). At daemon startup time these variables are first assigned
an initial value (often just an undefined value, the undef). The default
values of configuration variables are documented in file amavisd.conf-defaults,
which lists all configuration variables.</p>

<p>Next a configuration file amavisd.conf (or other file as specified
by option -c) is read and interpreted by the Perl interpreter itself.
The amavisd.conf is just a normal Perl program, and can in principle
do whatever and however it pleases, but its main purpose is to assign
values to configuration variables.</p>
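
<p>A very small configuration file might therefore look like the sketch
below. The directory and domain are assumed values:</p>

<pre>
use strict;

$MYHOME   = '/var/amavis';   # assumed home directory of the daemon
$mydomain = 'example.com';   # hypothetical domain
$TEMPBASE = "$MYHOME/tmp";   # ordinary Perl expression, evaluated when the file is read

1;  # the file should end by returning a true value
</pre>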

<p>After execution of amavisd.conf is done, the daemon may correct some
configuration variable values (mainly to maintain backwards compatibility
with earlier versions of the configuration file), and may assign a default value
to certain variables which are still undefined -- these variables and their
default values are marked "after-defaults" in the documentation file
amavisd.conf-defaults. The main reason for existence of the "after-defaults"
concept is that some default values depend on other configuration variables
and can not be computed before the amavisd.conf is finished. To force such
variables to an off/false/disabled state, one needs to assign some false but
defined value to them, such as '' (an empty string) or a 0 for booleans.</p>

<p>Perl variables always start with a character $, @ or % to indicate a type
of variable. This leading character is part of the variable name for all
practical purposes.</p>

<dl>
<dt>$ (dollar character)</dt>
<dd>indicates a scalar variable (a string, a number, a reference)</dd>
<dt>@ (at sign)</dt>
<dd>indicates an array variable (a list)</dd>
<dt>% (percent character)</dt>
<dd>indicates an associative array (also known as hash),
  which maps keys to values</dd>
</dl>

<p>A couple of Perl syntactical elements deserve mention at this point,
as they are often used in the amavisd.conf configuration file.</p>

<dl>
<dt>"...", a double-quoted string</dt>
<dd>is a string; variables within are evaluated, e.g. "$MYHOME/tmp"</dd>
<dt>'...', a single-quoted string</dt>
<dd>is a string; variables within are not evaluated,
  the $ and @ lose their special meaning, e.g. 'user@example.com'</dd>
<dt>(...)</dt>
<dd>is a list of comma-separated expressions, e.g. (1,2,"test");
  a list is normally assigned to an array variable</dd>
<dt>qw(string)</dt>
<dd>is an operator that interprets its argument as a single string,
  splits it on whitespace into words, and returns a list of words (strings);
  it is a convenience to avoid some typing,
  e.g. qw(user@example.com .example.net .org) is exactly equivalent
  to ('user@example.com', '.example.net', '.org');
</dd>
<dt>[...]</dt>  
<dd>is a reference to an anonymous list of comma-separated expressions,
  e.g. [1,2,"test"]; (note: a reference is a scalar)</dd>
<dt>{...}</dt>
<dd>is a reference to an anonymous associative array,
  e.g. {'alfa'=>1, 'beta'=>99, 'other'=>'test'};
  (note: a reference is a scalar)</dd>
<dt>\variable</dt>
<dd>is a reference to a variable, e.g. \$virus_admin, \@mynetworks, \%whitelist_sender;
  (note: a reference is a scalar)</dd>
</dl>
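
<p>The following fragment puts these elements together as they might appear
in amavisd.conf. All addresses and networks are hypothetical:</p>

<pre>
$mydomain    = 'example.com';                   # single quotes: literal text
$virus_admin = "virusalert\@$mydomain";         # double quotes: $mydomain is evaluated
@mynetworks  = qw( 127.0.0.0/8 192.0.2.0/24 );  # qw: a list of words
%whitelist_sender = ( 'friend@example.net' => 1 );  # associative array

@whitelist_sender_maps = (
  \%whitelist_sender,            # reference to a named associative array
  [ '.example.org' ],            # reference to an anonymous list
  { 'ceo@example.net' => 1 },    # reference to an anonymous associative array
);
</pre>

<p>Note the backslash before @ in the double-quoted string, which keeps Perl
from treating @ as the start of an array variable name.</p>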

<p>Historically amavisd-new accessed all configuration variables directly by name,
e.g. %spam_lovers, @spam_lovers_acl, $spam_lovers_re. Later it became apparent
that certain groups of variables (lookups) are always used together in the same
way, so new array variables like @spam_lovers_maps were introduced. The program
now never accesses old lookup table variables directly, but always through higher
level lists. The solution is fully backwards compatible, as the default value
for the new lists references the old variables, e.g.:</p>
<pre>
@spam_lovers_maps = (\%spam_lovers, \@spam_lovers_acl, \$spam_lovers_re);
</pre>

<p>Administrator is free to modify or replace the lists in variables like
@spam_lovers_maps, perhaps rearranging the order or losing all references to
legacy variables, and replacing them with other variables, often anonymous
arrays/lists or anonymous associative maps (hashes), or constants which can
serve as a convenient catchall default value when used last in the list.</p>
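
<p>As an illustration, a replacement list that no longer refers to any legacy
variable might look like the following sketch. The addresses and domains are
made up for the example, and the trailing constant acts as the catchall
default.</p>
<pre>
@spam_lovers_maps = (
  { 'postmaster@example.com' => 1 },  # anonymous hash, looked up by address
  [ 'abuse@example.com', '.example.net' ],  # anonymous list of addresses or domains
  0,                                  # constant, the default when nothing above matches
);
</pre>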

<p>Since amavisd-new version 2.0, there is one further generalization step
in the way the program accesses configuration variables. More than a hundred
configuration variables which control amavisd-new operation on a per-message
level (as opposed to per-recipient and truly global settings) are now grouped
in an associative array called a <i>policy bank</i>. These configuration variables
are no longer accessed by the program directly by name, but
always through the currently installed policy bank. The administrator is free to
modify the policy bank, normally by providing replacement policy banks and
specifying under what conditions a replacement policy bank is to be
automatically installed.</p>


<h2><a name="pbanks">Policy banks</a></h2>

<p>Policy banks hold sets of configuration variables controlling most
per-message settings, including: static lookup tables, IP interface
access rules, forwarding address, log level, templates, administrator
addresses, spam trigger levels, quarantine rules, lists of anti-virus
scanner entries (or just a subset), banned names rules, defang settings,
etc. The whole set of these settings may be replaced with another
predefined set based on the incoming port number. This lets one
amavisd daemon serve user communities with diverse needs, which
could previously be done only by running more than one instance
of the amavisd daemon, each with its own configuration file.</p>

<p>This mechanism opens new possibilities for the future: in principle policy
banks could be swapped not only based on port number or SMTP client
IP address, but on any characteristic pertaining to a mail message as
a whole (not specific to each of its recipients), or on characteristics
of a connection from a mailer (e.g. the interface address or protocol).</p>

<p>Until a better mechanism is available, a policy bank named 'MYNETS' has
special semantics: this policy bank gets loaded whenever the MTA supplies an
SMTP client IP address (through the Postfix XFORWARD extension or the new AM.PDP
protocol) and that address matches the @mynetworks list.</p>
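
<p>A minimal sketch of such a setup in amavisd.conf follows. The network
ranges and the overridden values are illustrative only. The bank takes effect
only when the MTA actually passes the client IP address on to amavisd.</p>
<pre>
@mynetworks = qw( 127.0.0.0/8 192.168.0.0/16 );  # illustrative internal networks

$policy_bank{'MYNETS'} = {      # overlaid when the client address matches @mynetworks
  log_level => 2,               # illustrative override
  spam_kill_level_maps => [7.0],
};
</pre>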

<p>The associative array %interface_policy is the current mechanism for
assigning a policy bank to an incoming TCP port number (the port must be in
the list @$inet_socket_port). Whenever a connection from the MTA is received,
first the built-in policy bank with an empty name, $policy_bank{''},
gets loaded, which brings in all the global/legacy settings.
Then it is overlaid by whatever configuration settings are in the bank
named in $interface_policy{$port}, if any, and finally the policy bank
named 'MYNETS' (i.e. settings from $policy_bank{'MYNETS'}) is overlaid
if such a policy bank exists and the SMTP client IP address is known
(supplied by the MTA through the XFORWARD SMTP extension command) and it
matches @mynetworks.
See amavisd.conf-sample for examples.</p>

<p>When a new policy bank is overlaid over an existing set of configuration
variables, the variables not present in the new policy bank retain their
value.</p>
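
<p>The overlay order and the "missing keys retain their value" rule can be
modelled with plain Perl hashes. The following stand-alone script is a
conceptual sketch only, not the code amavisd itself uses: the function
<i>effective_settings</i> and all the values are hypothetical.</p>
<pre>
use strict;
use warnings;

# Hypothetical banks; the keys are borrowed from examples in this document.
my %policy_bank = (
  ''         => { log_level => 0, spam_kill_level_maps => [6.3],
                  forward_method => 'smtp:[127.0.0.1]:10025' },
  'INTERNAL' => { log_level => 2, spam_kill_level_maps => [7.0] },
  'MYNETS'   => { log_level => 3 },
);
my %interface_policy = ( 10024 => 'INTERNAL' );

# Overlay banks in order: built-in, per-port, then MYNETS for local clients.
sub effective_settings {
  my ($port, $client_in_mynetworks) = @_;
  my @names = ('');
  push @names, $interface_policy{$port} if defined $interface_policy{$port};
  push @names, 'MYNETS' if $client_in_mynetworks;
  my %current;
  for my $name (@names) {
    my $bank = $policy_bank{$name} or next;   # skip banks which do not exist
    %current = (%current, %$bank);  # new keys replace old ones, the rest persist
  }
  return \%current;
}

my $s = effective_settings(10024, 1);
print "$s->{log_level} $s->{spam_kill_level_maps}[0] $s->{forward_method}\n";
</pre>
<p>For a local client on port 10024 the resulting log_level comes from
MYNETS, the kill level from INTERNAL, and forward_method is still the one
from the built-in bank, as neither overlay mentions it.</p>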

<p>The built-in policy bank (with empty name) is predefined, and includes
references to most other variables (the dynamic config variables),
which are accessed only indirectly through the currently installed
policy bank. Overlaying a policy bank with another policy bank may
bring in references to entirely different variables, possibly unnamed,
and may remove references to legacy variables if it so chooses.</p>

<p>Configuration variables are referenced from a policy bank (which
is implemented as a Perl associative array, i.e. a hash) by keys of the
same name, e.g. { log_level => \$log_level, inet_acl => \@inet_acl, ...}.
For scalars one level of indirection is allowed: with $log_level set to 2,
the policy bank { log_level => \$log_level }
is equivalent to { log_level => $log_level } or to { log_level => 2 },
but in the first form, with an indirect reference, $log_level
may be assigned to even <em>after</em> the policy bank has already been formed.</p>
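
<p>The difference between storing a reference and storing a value can be seen
in the following stand-alone sketch. The accessor <i>setting</i> is
hypothetical and only mimics the one level of indirection described above.</p>
<pre>
use strict;
use warnings;

our $log_level = 0;
my %bank_by_ref   = ( log_level => \$log_level );  # refers to the variable itself
my %bank_by_value = ( log_level => $log_level );   # copies the value 0 right now

$log_level = 2;    # assigned after both banks have been formed

# Hypothetical accessor: follow one level of scalar reference, if present.
sub setting {
  my ($bank, $key) = @_;
  my $v = $bank->{$key};
  return ref($v) eq 'SCALAR' ? $$v : $v;
}

print setting(\%bank_by_ref,   'log_level'), "\n";  # sees the later assignment
print setting(\%bank_by_value, 'log_level'), "\n";  # still the value copied earlier
</pre>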

<p>One word of caution: the syntax of entries within a policy bank hash
is slightly different from assignments to configuration variables.
This is because entries within a policy bank are not assignments, but
key=>value pairs as in any Perl hash. These pairs are delimited by
commas, unlike statements, which are delimited by semicolons.
A value is separated from its key by '=>' (or by a comma), whereas the
assignment operator is '='. Keys of a policy bank have no leading $,
@ or %, unlike variable names. Values of a hash can only be scalars
(e.g. strings or numbers or references to arrays or references to hashes).</p>

<p>Compare:</p>
<ul>
<li>value of a policy bank is a reference to a Perl hash, e.g.:
<pre>
    { log_level => 3,
      forward_method => 'smtp:[127.0.0.1]:10025',
      spam_admin_maps => ["spamalert\@$mydomain"],
    }
</pre>
</li>
<li>normal assignments look like:
<pre>
      $log_level = 3;
      $forward_method = 'smtp:[127.0.0.1]:10025';
      @spam_admin_maps = ("spamalert\@$mydomain");
</pre>
</li>
</ul>

<p>And a final note: Perl can detect and report typing mistakes in variable
names, but a mistyped key is just another hash entry lurking in the hash,
never used and never reported as mistyped or useless.</p>
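
<p>If mistyped keys are a concern, one could check a policy bank by hand
against a list of intended keys before putting it to use. The stand-alone
sketch below is hypothetical, not an amavisd feature, and the list of known
keys is made up for the example.</p>
<pre>
use strict;
use warnings;

# Hypothetical list of keys the administrator intends to use.
my %known = map { $_ => 1 } qw(log_level forward_method spam_admin_maps);

# Note the typo in the second key; Perl accepts it without complaint.
my $bank = { log_level => 3, forward_methd => 'smtp:[127.0.0.1]:10025' };

for my $key (sort keys %$bank) {
  warn "policy bank key '$key' is not in the known list\n" unless $known{$key};
}
</pre>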

<h3>A case study</h3>

<p>The sender address can be faked, so comparing envelope sender address
to @local_domains_maps to base some important decisions on would not be
useful. The only reliable way to detect if mail is originating from inside
or from outside is to test the IP address of the sending SMTP client,
or to set up a separate MTA path for internally originating and for
externally originating mail.</p>

<p>Since in this particular example external mail comes in via fetchmail,
it is probably easiest to configure fetchmail to submit mail to Postfix
on a different port than any other internally originating mail uses.</p>

<p>In /etc/postfix/master.cf add another smtpd service,
listening for example on port 10088 :</p>

<pre>
10088 inet n - n - - smtpd -o content_filter=smtp-amavis:127.0.0.1:10026
</pre>

<p>This also tells Postfix that for mail coming in on port 10088
(from fetchmail) the content filter to be used is at port 10026
(not the default one at 10024, as configured by the global content_filter
in main.cf).</p>

<p>In .fetchmailrc add</p>
<pre>
  smtphost localhost/10088
</pre>
<p>to the poll section.</p>

<p>In amavisd.conf tell it to listen on port 10026,
besides the more usual 10024:</p>
<pre>
  $inet_socket_port = [10024,10026];
</pre>

<p>Now one may make up a name for a policy bank which will cover
only internally originating mail, let's pick a name INTERNAL.
Tell amavisd to load policy INTERNAL when a request comes in
on port 10024:</p>
<pre>
  $interface_policy{'10024'} = 'INTERNAL';
</pre>

<p>(alternatively, or in addition, one might make up another policy
and attach it to port 10026, but we'll just use the global settings
for the other port)</p>

<p>Now one may prepare the policy INTERNAL and specify there the options
which should be different from normal options for externally originating 
mail. For example:</p>

<pre>
$policy_bank{'INTERNAL'} = {
  log_level => 2,
  spam_admin_maps => ["virusalert\@$mydomain"],
  virus_admin_maps => ["virusalert\@$mydomain"],
  spam_kill_level_maps => [7.0],
  spam_dsn_cutoff_level_maps => [15],
  final_virus_destiny => D_BOUNCE,
# notify_spam_sender_templ => read_text("$MYHOME/notify_spam_sender.txt"),
};
</pre>
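
<p>The alternative mentioned above, a separate policy bank attached to port
10026 for externally originating mail, might be sketched like this. The bank
name EXTERNAL and the values are made up for the example.</p>
<pre>
$interface_policy{'10026'} = 'EXTERNAL';

$policy_bank{'EXTERNAL'} = {
  log_level => 1,                    # illustrative values only
  spam_kill_level_maps => [6.0],
  final_virus_destiny => D_DISCARD,
};
</pre>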


<h2><a name="max_requests">$max_requests and max_use</a></h2>

<p>Amavisd-new runs under process control of Net::Server. This is a pre-forked
environment where $max_servers child processes are constantly kept alive and
ready to accept new tasks (mail messages to be checked). Each amavisd child
process is able to handle several tasks in a row, which helps to reduce
startup (fork) costs. In case of SMTP or LMTP protocol, each session may
consist of several SMTP/LMTP transactions. Each SMTP/LMTP transaction is
counted as one task, regardless of whether it came in from the same SMTP/LMTP
client in a multi-transaction session, or as separate sessions, possibly
from different SMTP/LMTP clients.</p>

<p>A configuration variable $max_requests (default value 10) controls the
approximate number of tasks each child process is willing to handle. After
that the child process terminates and Net::Server provides a new child process
to take its place.</p>
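
<p>In amavisd.conf the two settings might appear as follows. The value of
$max_servers is illustrative only.</p>
<pre>
  $max_servers  = 2;   # number of pre-forked child processes kept alive
  $max_requests = 10;  # approximate number of tasks per child before it is replaced
</pre>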

<p>The exact value of $max_requests is not critical. There are two
opposing needs, and some in-between value should be chosen.</p>

<p>On the low side, the number should not be too small in order for the
startup cost to be averaged out / sufficiently diluted over an entire
child lifetime. A value above 5 or 10 meets this goal in most amavisd-new
configurations.</p>

<p>On the high side, the value depends on the amavisd-new configuration.
The amavisd daemon itself is conservative in its use of dynamically
allocated memory and does not load a mail into memory, but keeps the mail
being processed and its components in files. Similarly, most of the
called external virus scanners and decoders are economical in their use
of memory (a notable exception is Archive::Tar, which is used if the cpio
command is not available). Unfortunately this is not true for the Perl
module Mail::SpamAssassin, which expects to have an entire mail in
memory in order to be able to run its large set of rules on it in
reasonable time. This is a design decision made by the SpamAssassin creators,
and we have to live with it.</p>

<p>When amavisd-new is not configured to use SpamAssassin, the value of
$max_requests can be quite high without any known or expected problems.
For general sanity reasons, an upper limit could be 100, for example,
although anything above 20 or so would not bring a measurable benefit to
the maximum sustained mail throughput.</p>

<p>When amavisd-new <em>is</em> configured to use SpamAssassin however,
the slurping of an entire mail into memory may have implications, depending
on the maximum mail size allowed at the MTA (e.g. the Postfix setting
<i>message_size_limit</i>). Even though the allocated memory is reclaimed
by Perl after mail processing, and is reused for subsequent processing,
the process virtual memory footprint never shrinks; it can only expand
as needed.</p>

<p>With a default value of <i>message_size_limit</i> near 10 MB this is
not a serious problem, and $max_requests can be fairly large, although
since the additional performance gain is negligible for values beyond
20 or so, there is no good reason to choose a much larger value than that.</p>

<p>Some sites however choose not to limit mail size, or increase the maximum
mail size limit substantially. If a large mail arrives at such a site, the
virtual memory of the amavisd child process is extended to accommodate the
message. For the rest of its lifetime the child process that processed
the mail stays at its high virtual memory size. If this happens frequently,
host resources may become scarce. Limiting the number of tasks each child
is supposed to process is very much desirable on such systems.</p>

<p>The default value of 10 for $max_requests was chosen as a reasonable
compromise between averaging out the startup costs and not wasting too
many resources on hosts with a high message size limit and SpamAssassin
enabled.</p>

<p>In a setup with Postfix where its lmtp client is chosen to
feed amavisd-new, this client tries to keep the LMTP session open and
submit several mail messages in multiple transactions. With recent
Postfix versions its SMTP client is also capable of, and willing to use,
multi-transaction sessions, although it seems to be
less persistent than the LMTP client.</p>

<p>According to SMTP and LMTP protocol specifications, dropping the
session on the server side is considered rude and should be used
only as a last resort. In order to respect the $max_requests setting
(which is not strictly enforced by amavisd, and is considered an
advisory value), the client side should be configured with a
comparable limit. In case of Postfix, its smtp client service already
limits cached multiple transactions to 10 or so, so no special
options are needed on the Postfix side.</p>

<p>The current Postfix lmtp client is more persistent. In the future
it is expected to behave more like the smtp service, but until then
one may choose to apply the <i>max_use</i> Postfix limit to this service
(or globally if tolerable). A recommended value of <i>max_use</i>
(if feeding amavisd by lmtp) is the same as, or close to, the value of
$max_requests.</p>

<hr />
<p>
<i><a href="http://www.ijs.si/people/mark/">mm</a></i>
<br />Last updated: 2004-10-27
</p>

<p>
<a href="http://validator.w3.org/check/referer"
><img class="noboarder" src="./valid-xhtml10.png" height="31" width="88"
      alt="Valid XHTML 1.0!" /></a>
</p>

</body>
</html>