
<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN">



<title> Postfix Debugging Howto </title>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">



<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Debugging Howto</h1>


<h2>Purpose of this document</h2>

<p> This document describes how to debug parts of the Postfix mail
system when things do not work according to expectation. The methods
vary from making Postfix log a lot of detail, to running some daemon
processes under control of a call tracer or debugger. </p>

<p> The text assumes that the Postfix configuration files are
stored in directory /etc/postfix. You can use the command
"<b>postconf config_directory</b>" to find out the actual location
of this directory on your machine. </p>

<p> Listed in order of increasing invasiveness, the debugging
techniques are as follows: </p>


<ul>

<li><a href="#logging">Look for obvious signs of trouble</a>

<li><a href="#trace_mail">Debugging Postfix from inside</a>

<li><a href="#no_chroot">Try turning off chroot operation in</a>

<li><a href="#debug_peer">Verbose logging for specific SMTP
connections</a>

<li><a href="#sniffer">Record the SMTP session with a network
sniffer</a>

<li><a href="#verbose">Making Postfix daemon programs more verbose</a>

<li><a href="#man_trace">Manually tracing a Postfix daemon process</a>

<li><a href="#auto_trace">Automatically tracing a Postfix daemon
process</a>

<li><a href="#xxgdb">Running daemon programs with the interactive
xxgdb debugger</a>

<li><a href="#gdb">Running daemon programs under a non-interactive
debugger</a>

<li><a href="#unreasonable">Unreasonable behavior</a>

<li><a href="#mail">Reporting problems to</a>

</ul>

<h2><a name="logging">Look for obvious signs of trouble</a></h2>

<p> Postfix logs all failed and successful deliveries to a logfile.
The file is usually called /var/log/maillog or /var/log/mail; the
exact pathname is defined in the /etc/syslog.conf file. </p>

<p> When Postfix does not receive or deliver mail, the first order
of business is to look for errors that prevent Postfix from working
properly:  </p>

<pre>
% egrep '(warning|error|fatal|panic):' /some/log/file | more
</pre>

<p> Note: the most important message is near the BEGINNING of the
output.  Error messages that come later are less useful. </p>
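<p> Because the earliest message matters most, it can help to show
only the first few matches. The following is a minimal sketch, not
part of Postfix: the function name <b>first_trouble</b> and the
default of five lines are assumptions made for illustration, and
"grep -E" is used as the equivalent of egrep. </p>

```shell
# Print the first few trouble reports from a mail logfile.
# $1 = path to the logfile, $2 = number of lines to show (default 5).
# first_trouble is a hypothetical helper name, not a Postfix command.
first_trouble() {
    grep -E '(warning|error|fatal|panic):' "$1" | head -n "${2:-5}"
}
```

<p> For example, "first_trouble /some/log/file 3" prints at most the
three earliest matching lines. </p>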

<p> The nature of each problem is indicated as follows: </p>


<ul>

<li> <p> "<b>panic</b>" indicates a problem in the software itself
that only a programmer can fix. Postfix cannot proceed until this
is fixed. </p>

<li> <p> "<b>fatal</b>" is the result of missing files, incorrect
permissions, or incorrect configuration file settings that you can
fix.  Postfix cannot proceed until this is fixed. </p>

<li> <p> "<b>error</b>" reports a fatal or non-fatal error condition.
Postfix cannot proceed until this is fixed.  </p>

<li> <p> "<b>warning</b>" indicates a non-fatal error. These are
problems that you may not be able to fix (such as a broken DNS
server elsewhere on the network) but may also indicate local
configuration errors that could become a problem later. </p>

</ul>

<h2><a name="trace_mail">Debugging Postfix from inside</a> </h2>

<p> With Postfix version 2.1 and later you can ask Postfix to
produce mail delivery reports for debugging purposes. These reports
not only show sender/recipient addresses after address rewriting
and alias expansion or forwarding, they also show information about
delivery to mailbox, delivery to non-Postfix command, responses
from remote SMTP servers, and so on. </p>

<p> Postfix can produce two types of mail delivery reports for
debugging: </p>


<ul>

<li> <p> What-if: report what would happen, but do not actually
deliver mail. This mode of operation is requested with: </p>

<pre>
$ <b>/usr/sbin/sendmail -bv address...</b>
Mail Delivery Status Report will be mailed to &lt;your login name&gt;.
</pre>

<li> <p> What happened: deliver mail and report successes and/or
failures, including replies from remote SMTP servers.  This mode
of operation is requested with: </p>

<pre>
$ <b>/usr/sbin/sendmail -v address...</b>
Mail Delivery Status Report will be mailed to &lt;your login name&gt;.
</pre>

</ul>

<p> These reports contain information that is generated by Postfix
delivery agents. Since these run as daemon processes and do not
interact with users directly, the result is sent as mail to the
sender of the test message. The format of these reports is practically
identical to that of ordinary non-delivery notifications. </p>

<p> For a detailed example of a mail delivery status report, see
the <a href="ADDRESS_REWRITING_README.html#debugging"> debugging</a>
section at the end of the ADDRESS_REWRITING_README document.  </p>

<h2><a name="no_chroot">Try turning off chroot operation in</a></h2>

<p> A common mistake is to turn on chroot operation in the
file without going through all the necessary steps to set up a
chroot environment. This causes Postfix daemon processes to fail
due to all kinds of missing files. </p>

<p> The example below shows an SMTP server that is configured with
chroot turned off: </p>

<pre>
    # =============================================================
    # service type  private unpriv  <b>chroot</b>  wakeup  maxproc command
    #               (yes)   (yes)   <b>(yes)</b>   (never) (100)
    # =============================================================
    smtp      inet  n       -       <b>n</b>       -       -       smtpd
</pre>

<p> Inspect the file for any processes that do not have chroot
operation turned off. If you find any, save a copy of the file,
and edit the entries in question.  After executing the command
"<b>postfix reload</b>", see if the problem has gone away. </p>
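<p> The inspection can be sketched as a small filter over the service
table. This is an illustration under stated assumptions, not a Postfix
tool: the function name <b>chrooted_services</b> is hypothetical, the
file is assumed to be the one conventionally named master.cf, and the
chroot setting is assumed to be the fifth field, as in the example
above. A "-" means the built-in default, which the header comment in
the example lists as yes. </p>

```shell
# List services whose chroot field (column 5) is not "n".
# $1 = path to the service table (assumed to be master.cf).
# chrooted_services is a hypothetical helper name.
chrooted_services() {
    awk '
        /^#/      { next }   # comment line
        /^[ \t]/  { next }   # continuation of the previous entry
        /^$/      { next }   # empty line
        $5 != "n" { print $1, $2, "chroot=" $5 }
    ' "$1"
}
```

<p> Each line of output names a service and transport type whose
entry deserves a closer look. </p>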

<p> If turning off chrooted operation made the problem go away,
then congratulations.  Leaving Postfix running in this way is
adequate for most sites.  If you prefer chrooted operation, see
the Postfix <a href="BASIC_CONFIGURATION_README.html#chroot_setup">
BASIC_CONFIGURATION_README</a> file for information about how to
prepare Postfix for chrooted operation. </p>

<h2><a name="debug_peer">Verbose logging for specific SMTP connections</a></h2>

<p> In /etc/postfix/, list the remote site name or address
in the debug_peer_list parameter. For example, in order to make
the software log a lot of information to the syslog daemon for
connections from or to the loopback interface: </p>

<pre>
    debug_peer_list =
</pre>

<p> You can specify one or more hosts, domains, addresses or
net/masks.  To make the change effective immediately, execute the
command "<b>postfix reload</b>". </p>

<h2><a name="sniffer">Record the SMTP session with a network sniffer</a></h2>

<p> This example uses <b>tcpdump</b>. In order to record a conversation
you need to specify a large enough buffer with the "-s" option or
else you will miss some or all of the packet payload. </p>

<pre>
# tcpdump -w /file/name -s 2000 host and port 25
</pre>

<p> Run this for a while, stop with Ctrl-C when done. To view the
data use a binary viewer, or <b>ethereal</b>, or use my <b>tcpdumpx</b>
utility that is available from

<h2><a name="verbose">Making Postfix daemon programs more verbose</a></h2>

<p> Append one or more "<b>-v</b>" options to selected daemon
definitions in /etc/postfix/ and type "<b>postfix reload</b>".
This will cause a lot of activity to be logged to the syslog daemon.
Example: </p>

<pre>
    smtp      inet  n       -       n       -       -       smtpd -v
</pre>

<p> This makes the Postfix SMTP server more verbose. To diagnose
problems with address rewriting one would specify a "<b>-v</b>"
option for the cleanup(8) and/or trivial-rewrite(8) daemon, and to
diagnose problems with mail delivery one would specify a "<b>-v</b>"
option for the qmgr(8) or oqmgr(8) queue manager, or for the lmtp(8),
local(8), pipe(8), smtp(8), or virtual(8) delivery agent.  </p>

<h2><a name="man_trace">Manually tracing a Postfix daemon process</a></h2>

<p> Many systems allow you to inspect a running process with a
system call tracer. For example: </p>

<pre>
# trace -p process-id (SunOS 4)
# strace -p process-id (Linux and many others)
# truss -p process-id (Solaris, FreeBSD)
# ktrace -p process-id (generic 4.4BSD)
</pre>

<p> Even more informative are traces of system library calls.
Examples: </p>

<pre>
# ltrace -p process-id (Linux, also ported to FreeBSD and BSD/OS)
# sotruss -p process-id (Solaris)
</pre>

<p> See your system documentation for details. </p>

<p> Tracing a running process can give valuable information about
what a process is attempting to do. This is as much information as
you can get without running an interactive debugger program, as
described in a later section. </p>
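<p> Which system call tracer applies depends on the operating system.
As a rough sketch of the table above, the hypothetical helper
<b>tracer_for_os</b> below maps the output of "uname -s" and
"uname -r" to a tracer name. The mapping of other BSD systems to
<b>ktrace</b> and the treatment of SunOS 5.x as Solaris are
assumptions; your system documentation has the final word. </p>

```shell
# Suggest a system call tracer for an operating system.
# $1 = kernel name as printed by "uname -s"
# $2 = release as printed by "uname -r"
# tracer_for_os is a hypothetical helper name.
tracer_for_os() {
    case "$1" in
        Linux)   echo strace ;;
        FreeBSD) echo truss ;;
        SunOS)
            case "$2" in
                4.*) echo trace ;;   # SunOS 4
                *)   echo truss ;;   # Solaris reports itself as SunOS 5.x
            esac ;;
        *BSD)    echo ktrace ;;      # assumed: other 4.4BSD descendants
        *)       return 1 ;;         # unknown system: no suggestion
    esac
}
```

<p> On the machine at hand this would be called as
tracer_for_os "$(uname -s)" "$(uname -r)". </p>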

<h2><a name="auto_trace">Automatically tracing a Postfix daemon process</a></h2>

<p> Postfix can attach a call tracer whenever a daemon process
starts.  Call tracers come in several kinds. </p>


<ul>

<li> <p> System call tracers such as <b>trace</b>, <b>truss</b>,
<b>strace</b>, or <b>ktrace</b>.  These show the communication
between the process and the kernel. </p>

<li> <p> Library call tracers such as <b>sotruss</b> and <b>ltrace</b>.
These show calls of library routines, and give a better idea of
what is going on within the process. </p>

</ul>

<p> Append a <b>-D</b> option to the suspect command in
/etc/postfix/, for example: </p>

<pre>
    smtp      inet  n       -       n       -       -       smtpd -D
</pre>

<p> Edit the debugger_command definition in /etc/postfix/
so that it invokes the call tracer of your choice, for example:

<pre>
    debugger_command =
         (truss -p $process_id 2&gt;&amp;1 | logger -p &amp; sleep 5
</pre>

<p> Type "<b>postfix reload</b>" and watch the logfile. </p>

<h2><a name="xxgdb">Running daemon programs with the interactive
xxgdb debugger</a></h2>

<p> If you have X Windows installed on the Postfix machine, then
an interactive debugger such as <b>xxgdb</b> can be convenient.

<p> Edit the debugger_command definition in /etc/postfix/
so that it invokes <b>xxgdb</b>: </p>

<pre>
    debugger_command =
         xxgdb $daemon_directory/$process_name $process_id &amp; sleep 5
</pre>

<p> Be sure that <b>gdb</b> is in the command search path, and
export <b>XAUTHORITY</b> so that X access control works, for example:

<pre>
% setenv XAUTHORITY ~/.Xauthority (csh syntax)
$ export XAUTHORITY=$HOME/.Xauthority (sh syntax)
</pre>

<p> Append a <b>-D</b> option to the suspect daemon definition in
/etc/postfix/, for example: </p>

<pre>
    smtp      inet  n       -       n       -       -       smtpd -D
</pre>

<p> Stop and start the Postfix system.  This is necessary so that
Postfix runs with the proper <b>XAUTHORITY</b> and <b>DISPLAY</b>
settings. </p>

<p> Whenever the suspect daemon process is started, a debugger
window pops up and you can watch in detail what happens. </p>

<h2><a name="gdb">Running daemon programs under a non-interactive debugger</a></h2>

<p> If you do not have X Windows installed on the Postfix machine,
or if you are not familiar with interactive debuggers, then you
can try to run <b>gdb</b> in non-interactive mode, and have it
print a stack trace when the process crashes.  </p>

<p> Edit the debugger_command definition in /etc/postfix/
so that it invokes the <b>gdb</b> debugger: </p>

<pre>
    debugger_command =
        PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont;
        echo where) | gdb $daemon_directory/$process_name $process_id 2&gt;&amp;1
        &gt;$config_directory/$process_name.$process_id.log &amp; sleep 5
</pre>

<p> Append a <b>-D</b> option to the suspect daemon in
/etc/postfix/, for example: </p>

<pre>
    smtp      inet  n       -       n       -       -       smtpd -D
</pre>

<p> Type "<b>postfix reload</b>" to make the configuration changes
effective.  </p>

<p> Whenever a suspect daemon process is started, an output file
is created, named after the daemon and process ID (for example,
smtpd.12345.log). When the process crashes, a stack trace (with
output from the "<b>where</b>" command) is written to its logfile.
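<p> Since every traced process leaves a logfile behind, most of them
will be uninteresting. The sketch below picks out the ones that
contain a stack trace. It relies on the assumption that <b>gdb</b>
numbers stack frames starting with "#0" at the beginning of a line;
the function name <b>crash_logs</b> is hypothetical. </p>

```shell
# List debugger logfiles that contain a stack trace.
# $1 = directory that holds the <daemon>.<pid>.log files.
# crash_logs is a hypothetical helper name.
crash_logs() {
    for f in "$1"/*.log; do
        [ -f "$f" ] || continue        # the pattern matched no files
        if grep -q '^#0' "$f"; then    # frame #0 marks a stack trace
            echo "$f"
        fi
    done
}
```

<p> With the debugger_command shown above, the directory to pass
would be the one that "postconf config_directory" reports. </p>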

<h2><a name="unreasonable">Unreasonable behavior</a></h2>

<p> Sometimes the behavior exhibited by Postfix just does not match the
source code. Why can a program deviate from the instructions given
by its author? There are two possibilities. </p>


<ul>

<li> <p> The compiler has erred. This rarely happens. </p>

<li> <p> The hardware has erred. Does the machine have ECC memory? </p>

</ul>

<p> In both cases, the program being executed is not the program
that was supposed to be executed, so anything could happen. </p>

<p> There is a third possibility: </p>


<ul>

<li> <p> Bugs in system software (kernel or libraries). </p>

</ul>

<p> Hardware-related failures usually do not reproduce in exactly
the same way after power cycling and rebooting the system.  There's
little Postfix can do about bad hardware.  Be sure to use hardware
that at the very least can detect memory errors.  Otherwise, Postfix
will just be waiting to be hit by a bit error.  Critical systems
deserve real hardware. </p>

<p> When a compiler makes an error, the problem can be reproduced
whenever the resulting program is run. Compiler errors are most
likely to happen in the code optimizer. If a problem is reproducible
across power cycles and system reboots, it can be worthwhile to
rebuild Postfix with optimization disabled, and to see if optimization
makes a difference. </p>

<p> In order to compile Postfix with optimizations turned off: </p>

<pre>
% make tidy
% make makefiles OPT=
</pre>

<p> This produces a set of Makefiles that do not request compiler
optimization.  </p>

<p> Once the makefiles are set up, build the software: </p>

<pre>
% make
% su
# make install
</pre>

<p> If the problem goes away, then it is time to ask your vendor
for help. </p>

<h2><a name="mail">Reporting problems to</a></h2>

<p> The people who participate on the mailing list
are very helpful, especially if YOU provide them with sufficient
information.  Remember, these volunteers are willing to help, but
their time is limited. </p>

<p> When reporting a problem, be sure to include the following
information. </p>
<ul>


<li> <p> A summary of the problem. Please do not just send some
logging without explanation of what YOU believe is wrong. </p>

<li> <p> Consider using a test email address so that you don't have
to reveal email addresses of innocent people. </p>

<li> <p> If you can't use a test email address, please anonymize
information consistently. Replace each letter by "A", each digit
by "D" so that the helpers can still recognize syntactical errors.

<li> <p> Complete error messages. Please use cut-and-paste, or use
attachments, instead of reciting information from memory.

<li> <p> Postfix logging. See the text at the top of the DEBUG_README
document to find out where logging is stored. Please do not frustrate
the helpers by word wrapping the logging. </p>

<li> <p> Output from "postconf -n". Please do not send your
configuration file.  Or better, provide output from the "postfinger" tool.  </p>

<li> <p> If the problem is about too much mail in the queue, consider
including output from the qshape tool, as described in the
QSHAPE_README file. </p>

<li> <p> If the problem is protocol related (connections time out
or an SMTP server complains about syntax errors etc.) consider
recording a session with tcpdump, as described in the DEBUG_README
document.  </ul>
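<p> The anonymizing rule from the list above (each letter becomes
"A", each digit becomes "D") can be sketched as a filter. The function
name <b>anonymize</b> is hypothetical, and the filter is meant for
individual addresses or host names; applied to a whole logfile line
it would also obscure the error text that the helpers need. </p>

```shell
# Replace each ASCII letter by "A" and each digit by "D".
# Reads from standard input, writes to standard output.
# anonymize is a hypothetical helper name.
anonymize() {
    sed -e 's/[A-Za-z]/A/g' -e 's/[0-9]/D/g'
}

# Usage: echo 'joe42@example.com' | anonymize
# prints AAADD@AAAAAAA.AAA
```

<p> Punctuation such as "@" and "." is left alone, so that the
syntax of the original address remains recognizable. </p>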