<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

<html>

<head>

<title>Postfix Stress-Dependent Configuration</title>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

</head>

<body>

<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix
Stress-Dependent Configuration</h1>

<hr>

<h2>Overview </h2>

<p> This document describes the symptoms of Postfix SMTP server
overload, and how to avoid overload under normal conditions.
When the overload is caused by botnets or other malware, the
document suggests configuration settings that help to minimize the
impact on legitimate mail.  Finally, the document describes
stress-adaptive behavior, introduced with Postfix 2.5, and how it
can be used to automatically switch configuration settings under
overload.  </p>

<p> Topics covered in this document: </p>

<ul>

<li><a href="#overload"> Symptoms of Postfix SMTP server overload </a> 

<li><a href="#concurrency"> Service more SMTP clients at the same time </a> 

<li><a href="#time"> Spend less time per SMTP client </a>

<li><a href="#hangup"> Disconnect suspicious SMTP clients </a>

<li><a href="#desperate"> Take desperate measures </a>

<li><a href="#adapt"> Make Postfix behavior stress-adaptive </a>

<li><a href="#feature"> Detecting support for stress-adaptive behavior </a>

<li><a href="#forcing"> Forcing stress-adaptive behavior on or off </a>

<li><a href="#credits"> Credits </a>

</ul>

<h2><a name="overload"> Symptoms of Postfix SMTP server overload </a></h2>

<p> Under normal conditions, Postfix responds immediately when a
remote SMTP client connects. The time needed to deliver mail should
be noticeable only with very large messages.  Performance degrades
dramatically when the number of remote SMTP clients exceeds
the number of Postfix SMTP server processes.  When a client connects
while all server processes are busy, the client must wait until a
server process becomes available. </p>

<p> Overload may be caused by legitimate mail (example: a DNS
registrar opens a new zone for registrations), by mistake (mail
explosion caused by a forwarding loop) or by illegitimate mail (worm
outbreak, botnet, or other malware activity).  Symptoms of Postfix
SMTP server overload are: </p>

<ul>

<li> <p> Remote SMTP clients experience a long delay before Postfix
sends the "220 hostname.example.com ESMTP Postfix" greeting.  If
this affects end-user mail clients, enable the "submission" service
entry in <a href="master.5.html">master.cf</a> (present since Postfix 2.1), and tell users to
connect to this service instead of the public SMTP service. </p>

<li> <p> The Postfix SMTP server logs an increased number of "lost
connection after CONNECT" events. This happens because remote SMTP
clients disconnect before Postfix answers the connection. </p>

<li> <p> Postfix 2.3 and later logs a warning that all server ports
are busy: </p>

<pre>
Oct  3 20:39:27 spike postfix/master[28905]: warning: service "smtp"
 (25) has reached its process limit "30": new clients may experience
 noticeable delays
Oct  3 20:39:27 spike postfix/master[28905]: warning: to avoid this
 condition, increase the process count in <a href="master.5.html">master.cf</a> or reduce the
 service time per client
</pre>

</ul>

<p> NOTE: The first two symptoms may also happen without overload,
for example: </p>

<ul>

<li> <p> Broken DNS also causes lengthy delays before "220
hostname.example.com
..." while the Postfix SMTP server tries to look up the client's
hostname.  </p>

<li> <p> A portscan for open SMTP ports also results in "lost
connection ..." logfile messages. </p>

</ul>

<p> Legitimate mail that doesn't get through during an episode of
overload is not necessarily lost. It should still arrive once the
situation returns to normal, as long as the overload condition is
temporary.  </p>
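
<p> The "submission" service mentioned in the first symptom above
is enabled in <a href="master.5.html">master.cf</a>.  The following is a minimal sketch, not a
recommended configuration.  The two "-o" overrides are assumptions
for a site that requires TLS and SASL authentication from its own
users (smtpd_tls_security_level requires Postfix 2.3 or later);
adjust or omit them as needed.  Issue a "postfix reload" command
to make the change effective. </p>

<blockquote>
<pre>
/etc/postfix/<a href="master.5.html">master.cf</a>:
    # =============================================================
    # service type  private unpriv  chroot  wakeup  maxproc command
    # =============================================================
    # Separate entry for end-user mail clients (port 587). The "-o"
    # overrides are an assumed site policy, not a requirement.
    submission inet n       -       n       -       -       smtpd
        -o smtpd_tls_security_level=encrypt
        -o smtpd_sasl_auth_enable=yes
</pre>
</blockquote>

<p> Because this service has its own process limit, end-user mail
clients do not have to compete with remote SMTP clients for a free
SMTP server process. </p>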

<h2><a name="concurrency"> Service more SMTP clients at the same time </a> </h2>

<p> To service more SMTP clients simultaneously, you need to increase
the number of SMTP server processes. This will improve the
responsiveness for remote SMTP clients, as long as the server machine
has enough hardware and software resources to run the additional
processes, and as long as the file system can keep up with the
additional load. </p>

<ul>

<li> <p> You increase the number of SMTP server processes either
by increasing the <a href="postconf.5.html#default_process_limit">default_process_limit</a> in <a href="postconf.5.html">main.cf</a> (line 3 below),
or by increasing the SMTP server's "maxproc" field in <a href="master.5.html">master.cf</a>
(line 10 below).  Either way, you need to issue a "postfix reload"
command to make the change effective.  </p>

<li> <p> Process limits above 1000 require Postfix version 2.4 or
later, and an operating system that supports kernel-based event
filters (BSD kqueue(2), Linux epoll(4), or Solaris /dev/poll).
</p>

<li> <p> You can reduce the Postfix memory footprint by using <a href="CDB_README.html">cdb</a>:
lookup tables instead of Berkeley DB. </p>

<pre>
 1 /etc/postfix/<a href="postconf.5.html">main.cf</a>:
 2     # Raise the global process limit, 100 since Postfix 2.0.
 3     <a href="postconf.5.html#default_process_limit">default_process_limit</a> = 200
 4
 5 /etc/postfix/<a href="master.5.html">master.cf</a>:
 6     # =============================================================
 7     # service type  private unpriv  chroot  wakeup  maxproc command
 8     # =============================================================
 9     # Raise the SMTP service process limit only.
10     smtp      inet  n       -       n       -       200     smtpd
</pre>

<li> <p> NOTE: older versions of the <a href="SMTPD_POLICY_README.html">SMTPD_POLICY_README</a> document
contain a mistake: they configure a fixed number of policy daemon
processes.  When you raise the SMTP server's "maxproc" field in
<a href="master.5.html">master.cf</a>, SMTP server processes will report problems when connecting
to policy server processes, because there aren't enough of them.
Examples of errors are "connection refused" or "operation timed
out".  To fix, edit <a href="master.5.html">master.cf</a> and specify a zero "maxproc" field
in all policy server entries; see line 6 in the example below.
Issue a "postfix reload" command to make the change effective.  </p>

<pre>
1 /etc/postfix/<a href="master.5.html">master.cf</a>:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     # Disable the policy service process limit.
6     policy    unix  -       n       n       -       0       spawn
7         user=nobody argv=/some/where/policy-server
</pre>

</ul>
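
<p> As an illustration of the <a href="CDB_README.html">cdb</a>: suggestion above, the sketch
below switches one lookup table from Berkeley DB to cdb.  It assumes
that Postfix was built with CDB support (the "postconf -m" command
lists "cdb" when it is available), and it uses a transport table
purely as an example; the same change applies to other tables that
are currently configured with hash: or btree:. </p>

<blockquote>
<pre>
/etc/postfix/<a href="postconf.5.html">main.cf</a>:
    # Before: transport_maps = hash:/etc/postfix/transport
    transport_maps = cdb:/etc/postfix/transport

# Rebuild the indexed file after the change, then reload Postfix:
#     postmap cdb:/etc/postfix/transport
#     postfix reload
</pre>
</blockquote>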

<h2><a name="time"> Spend less time per SMTP client </a></h2>

<p> When increasing the number of SMTP server processes is not
practical, you can improve Postfix server responsiveness by eliminating
unnecessary work. When Postfix spends less time per SMTP session, the
same number of SMTP server processes can service more clients in
the same amount of time. </p>

<ul>

<li> <p> Eliminate non-functional RBL lookups (blocklists that are
no longer in operation). These lookups can degrade performance.
Postfix logs a warning when an RBL server does not respond. </p>

<li> <p> Eliminate redundant RBL lookups (people often use multiple
Spamhaus RBLs that include each other).  To find out whether RBLs
include other RBLs, look up the websites that document the RBL's
policies. </p>

<li> <p> Eliminate <a href="postconf.5.html#header_checks">header_checks</a> and <a href="postconf.5.html#body_checks">body_checks</a>, and keep just a few
emergency patterns to block the latest worm explosion or backscatter
mail.  See <a href="BACKSCATTER_README.html">BACKSCATTER_README</a> for examples of the latter. </p>

<li> <p> Group your <a href="postconf.5.html#header_checks">header_checks</a> and <a href="postconf.5.html#body_checks">body_checks</a> patterns to avoid
unnecessary pattern matching operations. </p>

<pre>
 1  /etc/postfix/header_checks:
 2      if /^Subject:/
 3      /^Subject: virus found in mail from you/ reject
 4      /^Subject: ..../ ....
 5      endif
 6  
 7      if /^Received:/
 8      /^Received: from (postfix\.org) / reject forged client name in received header: $1
 9      /^Received: from .../ ....
10      endif
</pre>

</ul>
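
<p> The redundant RBL lookups mentioned above can be illustrated
as follows.  This sketch assumes that zen.spamhaus.org combines the
SBL, XBL and PBL lists, as the Spamhaus documentation stated at the
time of writing; verify this against the RBL operator's current
documentation before removing any lookups. </p>

<blockquote>
<pre>
/etc/postfix/<a href="postconf.5.html">main.cf</a>:
    # Redundant: zen.spamhaus.org already includes the first two lists.
    #<a href="postconf.5.html#smtpd_client_restrictions">smtpd_client_restrictions</a> =
    #    <a href="postconf.5.html#permit_mynetworks">permit_mynetworks</a>
    #    <a href="postconf.5.html#reject_rbl_client">reject_rbl_client</a> sbl.spamhaus.org
    #    <a href="postconf.5.html#reject_rbl_client">reject_rbl_client</a> xbl.spamhaus.org
    #    <a href="postconf.5.html#reject_rbl_client">reject_rbl_client</a> zen.spamhaus.org

    # Same coverage with a single RBL lookup per client.
    <a href="postconf.5.html#smtpd_client_restrictions">smtpd_client_restrictions</a> =
        <a href="postconf.5.html#permit_mynetworks">permit_mynetworks</a>
        <a href="postconf.5.html#reject_rbl_client">reject_rbl_client</a> zen.spamhaus.org
</pre>
</blockquote>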

<h2><a name="hangup"> Disconnect suspicious SMTP clients </a></h2>

<p> Under conditions of overload you can improve Postfix SMTP server
responsiveness by hanging up on suspicious clients, so that other
clients get a chance to talk to Postfix.  </p>

<ul>

<li> <p> Use "421" reply codes for botnet-related RBLs or for
selected non-RBL restrictions. This causes Postfix 2.3 and later
to disconnect immediately without waiting for the remote SMTP
client to send a QUIT command. </p>

<p> You can set individual reject codes for RBLs, and for individual
responses from a specific RBL. We'll use zen.spamhaus.org as an
example; by the time you read this document, details may have
changed.  Right now, their documents say that a response of 127.0.0.10
or 127.0.0.11 indicates a dynamic client IP address, which means
that the machine is probably running a bot of some kind.  To give
a 421 response instead of the default 554 response, use something
like: </p>

<pre>
 1  /etc/postfix/<a href="postconf.5.html">main.cf</a>:
 2      <a href="postconf.5.html#smtpd_client_restrictions">smtpd_client_restrictions</a> =
 3         <a href="postconf.5.html#permit_mynetworks">permit_mynetworks</a>
 4         <a href="postconf.5.html#reject_rbl_client">reject_rbl_client</a> zen.spamhaus.org=127.0.0.10
 5         <a href="postconf.5.html#reject_rbl_client">reject_rbl_client</a> zen.spamhaus.org=127.0.0.11
 6         <a href="postconf.5.html#reject_rbl_client">reject_rbl_client</a> zen.spamhaus.org
 7  
 8      <a href="postconf.5.html#rbl_reply_maps">rbl_reply_maps</a> = hash:/etc/postfix/rbl_reply_maps
 9  
10  /etc/postfix/rbl_reply_maps:
11      zen.spamhaus.org=127.0.0.10 421 4.7.1 Service unavailable;
12       $rbl_class [$rbl_what] blocked using
13       $rbl_domain${rbl_reason?; $rbl_reason}
14  
15      zen.spamhaus.org=127.0.0.11 421 4.7.1 Service unavailable;
16       $rbl_class [$rbl_what] blocked using
17       $rbl_domain${rbl_reason?; $rbl_reason}
</pre>

<p> Although the above shows three RBL lookups (lines 4-6), Postfix
will still only do a single DNS query, so the performance difference
is negligible. </p>

<p> The down-side of sending 421 instead of the default 554 is that
it works only for zombies and other malware. If the client is running
a real MTA, then it may connect again several times until the mail
expires in its queue. When this is a problem, stick with the default
554 reply, and use "<a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> = 1" as described below.
</p>

<p> With Postfix 2.5, or with earlier releases that contain the
stress-adaptive behavior patch, you can turn on the above under
overload by replacing line 8 with: </p>

<pre>
 8      <a href="postconf.5.html#rbl_reply_maps">rbl_reply_maps</a> = ${stress?hash:/etc/postfix/rbl_reply_maps}
</pre>

<p> More information about automatic stress-adaptive behavior is
at the end of this document. </p>

</ul>
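
<p> The "selected non-RBL restrictions" mentioned at the top of this
section can return a 421 reply through an access table.  The
following is a minimal sketch under stated assumptions: the table
name client_hangup.cidr is hypothetical, and the address range is
the documentation-only block 192.0.2.0/24, standing in for a network
that you have identified as a source of bot traffic. </p>

<blockquote>
<pre>
/etc/postfix/<a href="postconf.5.html">main.cf</a>:
    <a href="postconf.5.html#smtpd_client_restrictions">smtpd_client_restrictions</a> =
        <a href="postconf.5.html#permit_mynetworks">permit_mynetworks</a>
        check_client_access cidr:/etc/postfix/client_hangup.cidr

/etc/postfix/client_hangup.cidr:
    # Hypothetical example range; replace with your own findings.
    192.0.2.0/24    421 4.7.1 Service unavailable, try again later
</pre>
</blockquote>

<p> The same caveat applies as with 421 replies for RBLs: a real
MTA in the listed range will retry until the mail expires in its
queue. </p>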

<h2><a name="desperate"> Take desperate measures </a></h2>

<p> The following measures will still allow <b>most</b> legitimate
clients to connect and send mail, but may affect some legitimate
clients. </p>

<ul>

<li> <p> Reduce <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> (default: 300s). Experience on the
postfix-users list from a variety of sysadmins shows that reducing
the "normal" <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> to 60s is unlikely to affect legitimate
clients. However, it is unlikely to become the Postfix default
because it's not RFC compliant. Setting <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> to 10s (line
2 below) or even 5s under stress will still allow <b>most</b>
legitimate clients to connect and send mail, but may delay mail
from some clients.  No mail should be lost, as long as this measure
is used only temporarily.  </p>

<li> <p> Reduce <a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> (default: 20). Setting this
to 1 under stress (line 3 below) helps by disconnecting clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as a mailing list that contains a few no-longer-active users
who didn't bother to unsubscribe. No mail should be lost,
as long as this measure is used only temporarily. </p>

<li> <p> Disable remote SMTP client hostname lookups, so that all
SMTP client hostnames become "unknown" (line 5 below). This feature
was introduced with Postfix 2.3. Unfortunately, this measure is
more problematic than the other ones proposed so far.  First, this
will result in loss of mail when you use hostname-based access rules
that reject mail from "unknown" SMTP clients (examples:
<a href="postconf.5.html#reject_unknown_client_hostname">reject_unknown_client_hostname</a>, <a href="postconf.5.html#reject_unknown_reverse_client_hostname">reject_unknown_reverse_client_hostname</a>).
Second, this may result in loss of mail when you subject "unknown"
SMTP clients to additional restrictions such as <a href="postconf.5.html#reject_unverified_sender">reject_unverified_sender</a>.
</p>

</ul>

<blockquote>
<pre>
1  /etc/postfix/<a href="postconf.5.html">main.cf</a>:
2      <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> = 10
3      <a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> = 1
4      # Caution: line 5 may trigger REJECTs by hostname-based access rules 
5      <a href="postconf.5.html#smtpd_peername_lookup">smtpd_peername_lookup</a> = no
</pre>
</blockquote>

<p> Except with the last measure, no mail should be lost, as long
as these measures are used only temporarily. The next section of
this document introduces a way to automate this process. </p>

<h2><a name="adapt"> Make Postfix behavior stress-adaptive </a></h2>

<p> Postfix version 2.5 introduces automatic stress-adaptive behavior.
This is also available as an add-on patch for Postfix versions 2.4
and 2.3 from the mirrors listed at <a href="http://www.postfix.org/download.html">http://www.postfix.org/download.html</a>.
</p>

<p> It works as follows. When a "public" network service runs into
an "all server ports are busy" condition, the <a href="master.8.html">master(8)</a> daemon logs
a warning, restarts the service (without interrupting existing
network sessions), and runs the service with "-o stress=yes" on the
command line. Normally, it runs a stress-adaptive service with "-o
stress=" on the command line (i.e. with an empty parameter value).
Other services never have "-o stress" parameters on the command
line, including services that listen on a loopback interface only.
</p>

<p> The stress pseudo-parameter value is the key to making <a href="postconf.5.html">main.cf</a>
parameter settings stress adaptive: </p>

<blockquote>
<pre>
1  /etc/postfix/<a href="postconf.5.html">main.cf</a>:
2      <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> = ${stress?10}${stress:300}
3      <a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> = ${stress?1}${stress:20}
</pre>
</blockquote>

<p> Translation: </p>

<ul>

<li> <p> Line 2: under conditions of stress, use an <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a>
value of 10 seconds instead of the default 300 seconds. </p>

<li> <p> Line 3: under conditions of stress, use an <a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a>
of 1 instead of the default 20. </p>

</ul>

<p> The syntax of ${name?value} and ${name:value} is explained at
the beginning of the <a href="postconf.5.html">postconf(5)</a> manual page. </p>
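
<p> As a worked illustration of that syntax: ${stress?value} expands
to "value" when the stress parameter is non-empty, and ${stress:value}
expands to "value" when it is empty, so exactly one of the two
expressions produces output.  The sketch below applies the same
pattern to <a href="postconf.5.html#smtpd_peername_lookup">smtpd_peername_lookup</a>, which is an assumption for
illustration only; the caution from the previous section about
hostname-based access rules still applies. </p>

<blockquote>
<pre>
/etc/postfix/<a href="postconf.5.html">main.cf</a>:
    # With "-o stress=yes" (overload):
    #     ${stress?10}${stress:300}    expands to "10"
    #     ${stress?no}${stress:yes}    expands to "no"
    # With "-o stress=" (normal operation):
    #     ${stress?10}${stress:300}    expands to "300"
    #     ${stress?no}${stress:yes}    expands to "yes"
    #
    # Caution: "no" may trigger REJECTs by hostname-based access rules.
    <a href="postconf.5.html#smtpd_peername_lookup">smtpd_peername_lookup</a> = ${stress?no}${stress:yes}
</pre>
</blockquote>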

<p> NOTE: Please keep in mind that the stress-adaptive feature is
a fairly desperate measure to keep <b>some</b> legitimate mail
flowing under overload conditions.  If a site is reaching the SMTP
server process limit when there isn't an attack or bot flood
occurring, then either the process limit needs to be raised or more
hardware needs to be added.  </p>

<h2><a name="feature"> Detecting support for stress-adaptive behavior </a></h2>

<p> To find out if your Postfix installation supports stress-adaptive
behavior, use the "ps" command, and look for the smtpd processes.
Postfix has stress-adaptive support when you see "-o stress=" or
"-o stress=yes" command-line options. Remember that Postfix never
enables stress-adaptive behavior on servers that listen on local
addresses only. </p>

<p> The following example is for FreeBSD or Linux. On Solaris, HP-UX
and other System-V flavors, use "ps -ef" instead of "ps ax". </p>

<blockquote>
<pre>
$ ps ax|grep smtpd
83326  ??  S      0:00.28 smtpd -n smtp -t inet -u -c -o stress=
84345  ??  Ss     0:00.11 /usr/bin/perl /usr/libexec/postfix/smtpd-policy.pl
</pre>
</blockquote>

<p> You can't use <a href="postconf.1.html">postconf(1)</a> to detect stress-adaptive support.
The <a href="postconf.1.html">postconf(1)</a> command ignores the existence of the stress parameter
in <a href="postconf.5.html">main.cf</a>, because the parameter has no effect there.  Command-line
"-o parameter" settings always take precedence over <a href="postconf.5.html">main.cf</a> parameter
settings.  </p>

<p> If you configure stress-adaptive behavior in <a href="postconf.5.html">main.cf</a> when it
isn't supported, nothing bad will happen.  The processes will run
as if the stress parameter always has an empty value. </p>

<h2><a name="forcing"> Forcing stress-adaptive behavior on or off </a></h2>

<p> You can manually force stress-adaptive behavior on, by adding
a "-o stress=yes" command-line option in <a href="master.5.html">master.cf</a>. This can be
useful for testing overrides on the SMTP service. Issue "postfix
reload" to make the change effective.  </p>

<p> Note: setting the stress parameter in <a href="postconf.5.html">main.cf</a> has no effect for
services that accept remote connections. </p>

<blockquote>
<pre>
1 /etc/postfix/<a href="master.5.html">master.cf</a>:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     # 
6     smtp      inet  n       -       n       -       -       smtpd
7         -o stress=yes
8         -o . . .
</pre>
</blockquote>

<p> To permanently force stress-adaptive behavior off with a specific
service, specify "-o stress=" on its <a href="master.5.html">master.cf</a> command line.  This
may be desirable for the "submission" service. Issue "postfix reload"
to make the change effective.  </p>

<p> Note: setting the stress parameter in <a href="postconf.5.html">main.cf</a> has no effect for
services that accept remote connections. </p>

<blockquote>
<pre>
1 /etc/postfix/<a href="master.5.html">master.cf</a>:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     # 
6     submission inet n       -       n       -       -       smtpd
7         -o stress=
8         -o . . .
</pre>
</blockquote>

<h2><a name="credits"> Credits </a></h2>

<ul>

<li>  Thanks to the postfix-users mailing list members for sharing
early experiences with the stress-adaptive feature.

<li>  The RBL example and several other paragraphs of text were
adapted from postfix-users postings by Noel Jones.

<li>  Wietse implemented stress-adaptive behavior as the smallest
possible patch while he should have been working on other things.

</ul>

</body> </html>