DEBUG_README.html   [plain text]


<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

<html>

<head>

<title> Postfix Debugging Howto </title>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

</head>

<body>

<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Debugging Howto</h1>

<hr>

<h2>Purpose of this document</h2>

<p> This document describes how to debug parts of the Postfix mail
system when things do not work according to expectation. The methods
vary from making Postfix log a lot of detail, to running some daemon
processes under control of a call tracer or debugger. </p>

<p> The text assumes that the Postfix main.cf and master.cf
configuration files are stored in directory /etc/postfix. You can
use the command "<b>postconf config_directory</b>" to find out the
actual location of this directory on your machine. </p>

<p> Listed in order of increasing invasiveness, the debugging
techniques are as follows: </p>

<ul>

<li><a href="#logging">Look for obvious signs of trouble</a>

<li><a href="#trace_mail">Debugging Postfix from inside</a>

<li><a href="#no_chroot">Try turning off chroot operation in
master.cf</a>

<li><a href="#debug_peer">Verbose logging for specific SMTP
connections</a>

<li><a href="#sniffer">Record the SMTP session with a network
sniffer</a>

<li><a href="#verbose">Making Postfix daemon programs more verbose</a>

<li><a href="#man_trace">Manually tracing a Postfix daemon process</a>

<li><a href="#auto_trace">Automatically tracing a Postfix daemon
process</a>

<li><a href="#ddd">Running daemon programs with the interactive
ddd debugger</a>

<li><a href="#screen">Running daemon programs with the interactive
gdb debugger</a>

<li><a href="#gdb">Running daemon programs under a non-interactive
debugger</a>

<li><a href="#unreasonable">Unreasonable behavior</a>

<li><a href="#mail">Reporting problems to postfix-users@postfix.org</a>

</ul>

<h2><a name="logging">Look for obvious signs of trouble</a></h2>

<p> Postfix logs all failed and successful deliveries to a logfile.
The file is usually called /var/log/maillog or /var/log/mail; the
exact pathname is defined in the /etc/syslog.conf file. </p>

<p> When Postfix does not receive or deliver mail, the first order
of business is to look for errors that prevent Postfix from working
properly:  </p>

<blockquote>
<pre>
% <b>egrep '(warning|error|fatal|panic):' /some/log/file | more</b>
</pre>
</blockquote>

<p> Note: the most important message is near the BEGINNING of the
output.  Error messages that come later are less useful. </p>

<p> The nature of each problem is indicated as follows: </p>

<ul>

<li> <p> "<b>panic</b>" indicates a problem in the software itself
that only a programmer can fix. Postfix cannot proceed until this
is fixed. </p>

<li> <p> "<b>fatal</b>" is the result of missing files, incorrect
permissions, incorrect configuration file settings that you can
fix.  Postfix cannot proceed until this is fixed. </p>

<li> <p> "<b>error</b>" reports an error condition. For safety
reasons, a Postfix process will terminate when more than 13 of these
happen. </p>

<li> <p> "<b>warning</b>" indicates a non-fatal error. These are
problems that you may not be able to fix (such as a broken DNS
server elsewhere on the network) but may also indicate local
configuration errors that could become a problem later. </p>

</ul>

<h2><a name="trace_mail">Debugging Postfix from inside</a> </h2>

<p> Postfix version 2.1 and later can
produce mail delivery reports for debugging purposes. These reports
not only show sender/recipient addresses after address rewriting
and alias expansion or forwarding, they also show information about
delivery to mailbox, delivery to non-Postfix command, responses
from remote SMTP servers, and so on.
</p>

<p> Postfix can produce two types of mail delivery reports for
debugging: </p>

<ul>

<li> <p> What-if: report what would happen, but do not actually
deliver mail. This mode of operation is requested with: </p>

<pre>
% <b>/usr/sbin/sendmail -bv address...</b>
Mail Delivery Status Report will be mailed to &lt;your login name&gt;.
</pre>

<li> <p> What happened: deliver mail and report successes and/or
failures, including replies from remote SMTP servers.  This mode
of operation is requested with: </p>

<pre>
% <b>/usr/sbin/sendmail -v address...</b>
Mail Delivery Status Report will be mailed to &lt;your login name&gt;.
</pre>

</ul>

<p> These reports contain information that is generated by Postfix
delivery agents. Since these run as daemon processes that cannot
interact with users directly, the result is sent as mail to the
sender of the test message. The format of these reports is practically
identical to that of ordinary non-delivery notifications. </p>

<p> For a detailed example of a mail delivery status report, see
the <a href="ADDRESS_REWRITING_README.html#debugging"> debugging</a>
section at the end of the ADDRESS_REWRITING_README document.  </p>

<h2><a name="no_chroot">Try turning off chroot operation in master.cf</a></h2>

<p> A common mistake is to turn on chroot operation in the master.cf
file without going through all the necessary steps to set up a
chroot environment. This causes Postfix daemon processes to fail
due to all kinds of missing files. </p>

<p> The example below shows an SMTP server that is configured with
chroot turned off: </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  <b>chroot</b>  wakeup  maxproc command
    #               (yes)   (yes)   <b>(yes)</b>   (never) (100)
    # =============================================================
    smtp      inet  n       -       <b>n</b>       -       -       smtpd
</pre>
</blockquote>

<p> Inspect master.cf for any processes that have chroot operation
not turned off. If you find any, save a copy of the master.cf file,
and edit the entries in question.  After executing the command
"<b>postfix reload</b>", see if the problem has gone away. </p>

<p> If turning off chrooted operation made the problem go away,
then congratulations.  Leaving Postfix running in this way is
adequate for most sites.  If you prefer chrooted operation, see
the Postfix <a href="BASIC_CONFIGURATION_README.html#chroot_setup">
BASIC_CONFIGURATION_README</a> file for information about how to
prepare Postfix for chrooted operation. </p>

<h2><a name="debug_peer">Verbose logging for specific SMTP
connections</a></h2>

<p> In /etc/postfix/main.cf, list the remote site name or address
in the debug_peer_list parameter. For example, in order to make
the software log a lot of information to the syslog daemon for
connections from or to the loopback interface: </p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    debug_peer_list = 127.0.0.1
</pre>
</blockquote>

<p> You can specify one or more hosts, domains, addresses or
net/masks.  To make the change effective immediately, execute the
command "<b>postfix reload</b>". </p>

<h2><a name="sniffer">Record the SMTP session with a network sniffer</a></h2>

<p> This example uses <b>tcpdump</b>. In order to record a conversation
you need to specify a large enough buffer with the "<b>-s</b>"
option or else you will miss some or all of the packet payload.
</p>

<blockquote>
<pre>
# <b>tcpdump -w /file/name -s 0 host example.com and port 25</b>
</pre>
</blockquote>

<p> Older tcpdump versions don't support "<b>-s 0</b>"; in that case,
use "<b>-s 2000</b>" instead.  </p>

<p> Run this for a while, stop with Ctrl-C when done. To view the
data use a binary viewer, <b>ethereal</b>, or good old <b>less</b>.
</p>

<h2><a name="verbose">Making Postfix daemon programs more verbose</a></h2>

<p> Append one or more "<b>-v</b>" options to selected daemon
definitions in /etc/postfix/master.cf and type "<b>postfix reload</b>".
This will cause a lot of activity to be logged to the syslog daemon.
For example, to make the Postfix SMTP server process more verbose: </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    smtp      inet  n       -       n       -       -       smtpd -v
</pre>
</blockquote>

<p> To diagnose problems with address rewriting specify a "<b>-v</b>"
option for the cleanup(8) and/or trivial-rewrite(8) daemon, and to
diagnose problems with mail delivery specify a "<b>-v</b>"
option for the qmgr(8) or oqmgr(8) queue manager, or for the lmtp(8),
local(8), pipe(8), smtp(8), or virtual(8) delivery agent.  </p>

<h2><a name="man_trace">Manually tracing a Postfix daemon process</a></h2>

<p> Many systems allow you to inspect a running process with a
system call tracer. For example: </p>

<blockquote>
<pre>
# <b>trace -p process-id</b> (SunOS 4)
# <b>strace -p process-id</b> (Linux and many others)
# <b>truss -p process-id</b> (Solaris, FreeBSD)
# <b>ktrace -p process-id</b> (generic 4.4BSD)
</pre>
</blockquote>

<p> Even more informative are traces of system library calls.
Examples: </p>

<blockquote>
<pre>
# <b>ltrace -p process-id</b> (Linux, also ported to FreeBSD and BSD/OS)
# <b>sotruss -p process-id</b> (Solaris)
</pre>
</blockquote>

<p> See your system documentation for details. </p>

<p> Tracing a running process can give valuable information about
what a process is attempting to do. This is as much information as
you can get without running an interactive debugger program, as
described in a later section. </p>

<h2><a name="auto_trace">Automatically tracing a Postfix daemon
process</a></h2>

<p> Postfix can attach a call tracer whenever a daemon process
starts.  Call tracers come in several kinds. </p>

<ol>

<li> <p> System call tracers such as <b>trace</b>, <b>truss</b>,
<b>strace</b>, or <b>ktrace</b>.  These show the communication
between the process and the kernel. </p>

<li> <p> Library call tracers such as <b>sotruss</b> and <b>ltrace</b>.
These show calls of library routines, and give a better idea of
what is going on within the process. </p>

</ol>

<p> Append a <b>-D</b> option to the suspect command in
/etc/postfix/master.cf, for example: </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    smtp      inet  n       -       n       -       -       smtpd -D
</pre>
</blockquote>

<p> Edit the debugger_command definition in /etc/postfix/main.cf
so that it invokes the call tracer of your choice, for example:
</p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    debugger_command =
         PATH=/bin:/usr/bin:/usr/local/bin;
         (truss -p $process_id 2&gt;&amp;1 | logger -p mail.info) &amp; sleep 5
</pre>
</blockquote>

<p> Type "<b>postfix reload</b>" and watch the logfile. </p>

<h2><a name="ddd">Running daemon programs with the interactive
ddd debugger</a></h2>

<p> If you have X Windows installed on the Postfix machine, then
an interactive debugger such as <b>ddd</b> can be convenient.
</p>

<p> Edit the debugger_command definition in /etc/postfix/main.cf
so that it invokes <b>ddd</b>: </p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    debugger_command =
         PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin
         ddd $daemon_directory/$process_name $process_id &amp; sleep 5
</pre>
</blockquote>

<p> Be sure that <b>gdb</b> is in the command search path, and
export <b>XAUTHORITY</b> so that X access control works, for example:
</p>

<blockquote>
<pre>
% <b>setenv XAUTHORITY ~/.Xauthority</b> (csh syntax)
$ <b>export XAUTHORITY=$HOME/.Xauthority</b> (sh syntax)
</pre>
</blockquote>

<p> Append a <b>-D</b> option to the suspect daemon definition in
/etc/postfix/master.cf, for example: </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    smtp      inet  n       -       n       -       -       smtpd -D
</pre>
</blockquote>

<p> Stop and start the Postfix system.  This is necessary so that
Postfix runs with the proper <b>XAUTHORITY</b> and <b>DISPLAY</b>
settings. </p>

<p> Whenever the suspect daemon process is started, a debugger
window pops up and you can watch in detail what happens. </p>

<h2><a name="screen">Running daemon programs with the interactive
gdb debugger</a></h2>

<p> If you have the screen command installed on the Postfix machine, then
you can run an interactive debugger such as <b>gdb</b> as follows. </p>

<p> Edit the debugger_command definition in /etc/postfix/main.cf
so that it runs <b>gdb</b> inside a detached <b>screen</b> session:
</p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    debugger_command =
        PATH=/bin:/usr/bin:/sbin:/usr/sbin; export PATH; HOME=/root;
        export HOME; screen -e^tt -dmS $process_name gdb
        $daemon_directory/$process_name $process_id &amp; sleep 2
</pre>
</blockquote>

<p> Be sure that <b>gdb</b> is in the command search path. </p>

<p> Append a <b>-D</b> option to the suspect daemon definition in
/etc/postfix/master.cf, for example: </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    smtp      inet  n       -       n       -       -       smtpd -D
</pre>
</blockquote>

<p> Execute the command "<b>postfix reload</b>" and wait until a
daemon process is started (you can see this in the maillog file).
</p>

<p> Then attach to the screen, and debug away: </p>

<blockquote>
<pre>
# HOME=/root screen -r
gdb) continue
gdb) where
</pre>
</blockquote>

<h2><a name="gdb">Running daemon programs under a non-interactive
debugger</a></h2>

<p> If you do not have X Windows installed on the Postfix machine,
or if you are not familiar with interactive debuggers, then you
can try to run <b>gdb</b> in non-interactive mode, and have it
print a stack trace when the process crashes.  </p>

<p> Edit the debugger_command definition in /etc/postfix/main.cf
so that it invokes the <b>gdb</b> debugger: </p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    debugger_command =
        PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont; echo
        where; sleep 8640000) | gdb $daemon_directory/$process_name 
        $process_id 2&gt&amp;1
        &gt;$config_directory/$process_name.$process_id.log &amp; sleep 5
</pre>
</blockquote>

<p> Append a <b>-D</b> option to the suspect daemon in
/etc/postfix/master.cf, for example: </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    smtp      inet  n       -       n       -       -       smtpd -D
</pre>
</blockquote>

<p> Type "<b>postfix reload</b>" to make the configuration changes
effective.  </p>

<p> Whenever a suspect daemon process is started, an output file
is created, named after the daemon and process ID (for example,
smtpd.12345.log). When the process crashes, a stack trace (with
output from the "<b>where</b>" command) is written to its logfile.
</p>

<h2><a name="unreasonable">Unreasonable behavior</a></h2>

<p> Sometimes the behavior exhibited by Postfix just does not match the
source code. Why can a program deviate from the instructions given
by its author? There are two possibilities. </p>

<ul>

<li> <p> The compiler has erred. This rarely happens. </p>

<li> <p> The hardware has erred. Does the machine have ECC memory? </p>

</ul>

<p> In both cases, the program being executed is not the program
that was supposed to be executed, so anything could happen. </p>

<p> There is a third possibility: </p>

<ul>

<li> <p> Bugs in system software (kernel or libraries). </p>

</ul>

<p> Hardware-related failures usually do not reproduce in exactly
the same way after power cycling and rebooting the system.  There's
little Postfix can do about bad hardware.  Be sure to use hardware
that at the very least can detect memory errors.  Otherwise, Postfix
will just be waiting to be hit by a bit error.  Critical systems
deserve real hardware. </p>

<p> When a compiler makes an error, the problem can be reproduced
whenever the resulting program is run. Compiler errors are most
likely to happen in the code optimizer. If a problem is reproducible
across power cycles and system reboots, it can be worthwhile to
rebuild Postfix with optimization disabled, and to see if optimization
makes a difference. </p>

<p> In order to compile Postfix with optimizations turned off: </p>

<blockquote>
<pre>
% <b>make tidy</b>
% <b>make makefiles OPT=</b>
</pre>
</blockquote>

<p> This produces a set of Makefiles that do not request compiler
optimization.  </p>

<p> Once the makefiles are set up, build the software: </p>

<blockquote>
<pre>
% <b>make</b>
% <b>su</b>
Password:
# <b>make install</b>
</pre>
</blockquote>

<p> If the problem goes away, then it is time to ask your vendor
for help. </p>

<h2><a name="mail">Reporting problems to postfix-users@postfix.org</a></h2>

<p> The people who participate on postfix-users@postfix.org
are very helpful, especially if YOU provide them with sufficient
information.  Remember, these volunteers are willing to help, but
their time is limited. </p>

<p> When reporting a problem, be sure to include the following
information. </p>

<ul>

<li> <p> A summary of the problem. Please do not just send some
logging without explanation of what YOU believe is wrong. </p>

<li> <p> Complete error messages. Please use cut-and-paste, or use
attachments, instead of reciting information from memory.
</p>

<li> <p> Postfix logging. See the text at the top of the DEBUG_README
document to find out where logging is stored. Please do not frustrate
the helpers by word wrapping the logging. If the logging is more
than a few kbytes of text, consider posting an URL on a web or ftp
site. </p>

<li> <p> Consider using a test email address so that you don't have
to reveal email addresses or passwords of innocent people. </p>

<li> <p> If you can't use a test email address, please anonymize
email addresses and host names consistently. Replace each letter
by "A", each digit
by "D" so that the helpers can still recognize syntactical errors.
</p>

<li> <p> Output from "<b>postconf -n</b>". Please do not send your
main.cf file, or 500+ lines of <b>postconf</b> output. </p>

<li> <p> Better, provide output from the <b>postfinger</b> tool.
This can be found at http://ftp.wl0.org/SOURCES/postfinger.  </p>

<li> <p> If the problem is SASL related, consider including the
output from the <b>saslfinger</b> tool. This can be found at
http://postfix.state-of-mind.de/patrick.koetter/saslfinger/. </p>

<li> <p> If the problem is about too much mail in the queue, consider
including output from the <b>qshape</b> tool, as described in the
QSHAPE_README file. </p>

<li> <p> If the problem is protocol related (connections time out,
or an SMTP server complains about syntax errors etc.) consider
recording a session with <b>tcpdump</b>, as described in the <a
href="#sniffer">DEBUG_README</a> document.  </ul>

</body>

</html>