NOTES   [plain text]


   (This document was generated from design-notes.html)

     _________________________________________________________________

                           Design Notes On Fetchmail

   These notes are for the benefit of future hackers and maintainers. The
   following  sections  are  both  functional  and  narrative,  read from
   beginning to end.

                                    History

   A  direct  ancestor  of  the fetchmail program was originally authored
   (under  the  name popclient) by Carl Harris <ceharris@mal.com>. I took
   over  development  in  June  1996 and subsequently renamed the program
   `fetchmail' to reflect the addition of IMAP support and SMTP delivery.
   In  early  November  1996  Carl  officially ended support for the last
   popclient versions.

   Before accepting responsibility for the popclient sources from Carl, I
   had   investigated  and  used  and  tinkered  with  every  other  UNIX
   remote-mail   forwarder   I   could   find,   including   fetchpop1.9,
   PopTart-0.9.3,   get-mail,   gwpop,   pimp-1.0,  pop-perl5-1.2,  popc,
   popmail-1.6  and  upop.  My  major  goal  was  to get a header-rewrite
   feature  like  fetchmail's  working  so I wouldn't have reply problems
   anymore.

   Despite  having  done  a good bit of work on fetchpop1.9, when I found
   popclient  I  quickly  concluded that it offered the solidest base for
   future  development. I was convinced of this primarily by the presence
   of    multiple-protocol    support.    The   competition   didn't   do
   POP2/RPOP/APOP,  and  I  was  already  having  vague thoughts of maybe
   adding  IMAP.  (This would advance two other goals: learn IMAP and get
   comfortable writing TCP/IP client software.)

   Until  popclient  3.05  I was simply following out the implications of
   Carl's  basic design. He already had daemon.c in the distribution, and
   I  wanted  daemon  mode almost as badly as I wanted the header rewrite
   feature. The other things I added were bug fixes or minor extensions.

   After  3.1,  when  I  put  in SMTP-forwarding support (more about this
   below)   the   nature   of   the   project  changed  --  it  became  a
   carefully-thought-out  attempt  to render obsolete every other program
   in its class. The name change quickly followed.

                              The rewrite option

   MTAs ought to canonicalize the addresses of outgoing non-local mail so
   that  From:,  To:,  Cc:,  Bcc:  and other address headers contain only
   fully  qualified  domain  names.  Failure to do so can break the reply
   function on many mailers. (Sendmail has an option to do this.)

   This  problem  only  becomes  obvious  when  a reply is generated on a
   machine  different  from  where  the  message  was  delivered. The two
   machines  will  have  different  local  username  spaces,  potentially
   leading to misrouted mail.

   Most  MTAs  (and  sendmail  in particular) do not canonicalize address
   headers  in  this way (violating RFC 1123). Fetchmail therefore has to
   do it. This is the first feature I added to the ancestral popclient.

                                Reorganization

   The  second  thing I did reorganize and simplify popclient a lot. Carl
   Harris's  implementation  was  very  sound,  but  exhibited  a kind of
   unnecessary  complexity  common  to many C programmers. He treated the
   code  as central and the data structures as support for the code. As a
   result,  the  code  was beautiful but the data structure design ad-hoc
   and rather ugly (at least to this old LISP hacker).

   I  was  able  to improve matters significantly by reorganizing most of
   the  program around the `query' data structure and eliminating a bunch
   of  global  context.  This  especially simplified the main sequence in
   fetchmail.c and was critical in enabling the daemon mode changes.

                       IMAP support and the method table

   The  next  step was IMAP support. I initially wrote the IMAP code as a
   generic  query driver and a method table. The idea was to have all the
   protocol-independent  setup  logic  and flow of control in the driver,
   and the protocol-specific stuff in the method table.

   Once   this   worked,  I  rewrote  the  POP3  code  to  use  the  same
   organization.  The  POP2  code  kept  its own driver for a couple more
   releases,  until I found sources of a POP2 server to test against (the
   breed seems to be nearly extinct).

   The  purpose  of  this reorganization, of course, is to trivialize the
   development  of  support for future protocols as much as possible. All
   mail-retrieval protocols have to have pretty similar logical design by
   the  nature  of the task. By abstracting out that common logic and its
   interface   to   the   rest  of  the  program,  both  the  common  and
   protocol-specific parts become easier to understand.

   Furthermore,  many  kinds  of  new features can instantly be supported
   across all protocols by modifying the one driver module.

                        Implications of smtp forwarding

   The  direction  of the project changed radically when Harry Hochheiser
   sent me his scratch code for forwarding fetched mail to the SMTP port.
   I  realized  almost immediately that a reliable implementation of this
   feature would make all the other delivery modes obsolete.

   Why  mess  with all the complexity of configuring an MDA or setting up
   lock-and-append on a mailbox when port 25 is guaranteed to be there on
   any  platform  with TCP/IP support in the first place? Especially when
   this  means  retrieved  mail is guaranteed to look like normal sender-
   initiated SMTP mail, which is really what we want anyway.

   Clearly,  the  right  thing to do was (1) hack SMTP forwarding support
   into  the  generic  driver,  (2)  make  it  the  default mode, and (3)
   eventually throw out all the other delivery modes.

   I  hesitated  over  step  3  for some time, fearing to upset long-time
   popclient  users  dependent  on  the alternate delivery mechanisms. In
   theory,  they  could  immediately  switch  to  .forward files or their
   non-sendmail  equivalents  to  get  the  same effects. In practice the
   transition might have been messy.

   But  when  I  did it (see the NEWS note on the great options massacre)
   the  benefits  proved  huge.  The  cruftiest  parts of the driver code
   vanished.  Configuration  got  radically simpler -- no more grovelling
   around  for  the  system MDA and user's mailbox, no more worries about
   whether the underlying OS supports file locking.

   Also, the only way to lose mail vanished. If you specified localfolder
   and the disk got full, your mail got lost. This can't happen with SMTP
   forwarding  because  your  SMTP  listener  won't  return OK unless the
   message can be spooled or processed.

   Also,  performance improved (though not so you'd notice it in a single
   run).  Another  not  insignificant benefit of this change was that the
   manual page got a lot simpler.

   Later,  I  had  to bring --mda back in order to allow handling of some
   obscure  situations involving dynamic SLIP. But I found a much simpler
   way to do it.

   The  moral?  Don't  hesitate to throw away superannuated features when
   you  can  do  it  without loss of effectiveness. I tanked a couple I'd
   added  myself  and  have  no  regrets  at  all. As Saint-Exupery said,
   "Perfection  [in design] is achieved not when there is nothing more to
   add, but rather when there is nothing more to take away." This program
   isn't perfect, but it's trying.

        The most-requested features that I will never add, and why not:

Password encryption in .fetchmailrc

   The  reason  there's  no  facility to store passwords encrypted in the
   .fetchmailrc file is because this doesn't actually add protection.

   Anyone  who's  acquired  the  0600  permissions  needed  to  read your
   .fetchmailrc  file  will be able to run fetchmail as you anyway -- and
   if  it's  your  password  they're  after,  they'd  be  able to rip the
   necessary decoder out of the fetchmail code itself to get it.

   All .fetchmailrc encryption would do is give a false sense of security
   to people who don't think very hard.

Truly concurrent queries to multiple hosts

   Occasionally  I  get a request for this on "efficiency" grounds. These
   people  aren't  thinking  either. True concurrency would do nothing to
   lessen  fetchmail's  total IP volume. The best it could possibly do is
   change the usage profile to shorten the duration of the active part of
   a  poll  cycle  at  the cost of increasing its demand on IP volume per
   unit time.

   If  one  could thread the protocol code so that fetchmail didn't block
   on  waiting  for a protocol response, but rather switched to trying to
   process another host query, one might get an efficiency gain (close to
   constant loading at the single-host level).

   Fortunately,  I've only seldom seen a server that incurred significant
   wait  time  on  an individual response. I judge the gain from this not
   worth the hideous complexity increase it would require in the code.

Multiple concurrent instances of fetchmail

   Fetchmail  locking  is  on  a  per-invoking-user because finer-grained
   locks would be really hard to implement in a portable way. The problem
   is  that  you don't want two fetchmails querying the same site for the
   same remote user at the same time.

   To  handle this optimally, multiple fetchmails would have to associate
   a  system-wide  semaphore  with  each active pair of a remote user and
   host  canonical address. A fetchmail would have to block until getting
   this semaphore at the start of a query, and release it at the end of a
   query.

   This would be way too complicated to do just for an "it might be nice"
   feature.  Instead,  you  can  run  a single root fetchmail polling for
   multiple users in either single-drop or multidrop mode.

   The  fundamental  problem here is how an instance of fetchmail polling
   host  foo  can assert that it's doing so in a way visible to all other
   fetchmails.  System  V semaphores would be ideal for this purpose, but
   they're not portable.

   I've  thought about this a lot and roughed up several designs. All are
   complicated  and  fragile, with a bunch of the standard problems (what
   happens  if  a fetchmail aborts before clearing its semaphore, and how
   do we recover reliably?).

   I'm just not satisfied that there's enough functional gain here to pay
   for  the  large  increase  in  complexity that adding these semaphores
   would entail.

                         Multidrop and alias handling

   I  decided to add the multidrop support partly because some users were
   clamoring for it, but mostly because I thought it would shake bugs out
   of  the single-drop code by forcing me to deal with addressing in full
   generality. And so it proved.

   There   are  two  important  aspects  of  the  features  for  handling
   multiple-drop aliases and mailing lists which future hackers should be
   careful to preserve.
    1. The  logic  path  for  single-recipient  mailboxes doesn't involve
       header  parsing  or  DNS  lookups  at all. This is important -- it
       means  the  code  for the most common case can be much simpler and
       more robust.
    2. The  multidrop  handing  does  not rely on doing the equivalent of
       passing  the  message  to sendmail -oem -t. Instead, it explicitly
       mines  members  of  a  specified set of local usernames out of the
       header.
    3. We  do not attempt delivery to multidrop mailboxes in the presence
       of  DNS  errors. Before each multidrop poll we probe DNS to see if
       we  have  a  nameserver handy. If not, the poll is skipped. If DNS
       crashes  during  a poll, the error return from the next nameserver
       lookup  aborts message delivery and ends the poll. The daemon mode
       will then quietly spin until DNS comes up again, at which point it
       will resume delivering mail.

   When  I  designed this support, I was terrified of doing anything that
   could  conceivably  cause  a mail loop (you should be too). That's why
   the code as written can only append local names (never @-addresses) to
   the recipients list.

   The  code  in mxget.c is nasty, no two ways about it. But it's utterly
   necessary,  there  are a lot of MX pointers out there. It really ought
   to be a (documented!) entry point in the bind library.

                              DNS error handling

   Fetchmail's  behavior  on  DNS  errors  is  to suppress forwarding and
   deletion  of  the  individual  message that each occurs in, leaving it
   queued  on  the  server  for  retrieval  on  a  subsequent  poll.  The
   assumption  is  that DNS errors are transient, due to temporary server
   outages.

   Unfortunately  this  means  that if a DNS error is permanent a message
   can be perpetually stuck in the server mailbox. We've had a couple bug
   reports  of  this  kind  due  to  subtle  RFC822 parsing errors in the
   fetchmail  code  that  resulted in impossible things getting passed to
   the DNS lookup routines.

   Alternative  ways  to  handle the problem: ignore DNS errors (treating
   them  as  a  non-match  on the mailserver domain), or forward messages
   with  errors  to  fetchmail's  invoking  user in addition to any other
   recipients.  These  would fit an assumption that DNS lookup errors are
   likely to be permanent problems associated with an address.

                                IPv6 and IPSEC

   The  IPv6 support patches are really more protocol-family independence
   patches.  Because of this, in most places, "ports" (numbers) have been
   replaced with "services" (strings, that may be digits). This allows us
   to  run  with  certain  protocols  that use strings as "service names"
   where  we  in  the IP world think of port numbers. Someday we'll plumb
   strings   all   over   and  then,  if  inet6  is  not  enabled,  do  a
   getservbyname()  down  in  SocketOpen.  The  IPv6  support patches use
   getaddrinfo(), which is a POSIX p1003.1g mandated function. So, in the
   not  too  distant  future,  we'll zap the ifdefs and just let autoconf
   check  for  getaddrinfo.  IPv6 support comes pretty much automatically
   once you have protocol family independence.

                             Internationalization

   Internationalization  is  handled  using  GNU  gettext  (see  the file
   ABOUT_NLS   in  the  source  distribution).  This  places  some  minor
   constraints on the code.

   Strings  that  must  be  subject to translation should be wrapped with
   GT_()  or  N_()  --  the  former  in function arguments, the latter in
   static initializers and other non-function-argument contexts.

                         Checklist for Adding Options

   Adding a control option is not complicated in principle, but there are
   a  lot  of  fiddly  details  in  the  process.  You'll  need to do the
   following minimum steps.
     * Add  a field to represent the control in struct run, struct query,
       or struct hostdata.
     * Go  to  rcfile_y.y. Add the token to the grammar. Don't forget the
       %token declaration.
     * Pick  an  actual  string to declare the option in the .fetchmailrc
       file. Add the token to rcfile_l.
     * Pick a long-form option name, and a one-letter short option if any
       are  left.  Go  to  options.c.  Pick  a  new  LA_  value. Hack the
       longoptions  table  to set up the association. Hack the big switch
       statement to set the option. Hack the `?' message to describe it.
     * If  the  default  is  nonzero,  set it in def_opts near the top of
       load_params in fetchmail.c.
     * Add code to dump the option value in fetchmail.c:dump_params.
     * For  a  per-site or per-user option, add proper FLAG_MERGE actions
       in  fetchmail.c's optmerge() function. For a global option, add an
       override  at  the  end of load_params; this will involve copying a
       "cmd_run." field to a corresponding "run." field, see the existing
       code for models.
     * Document  the  option in fetchmail.man. This will require at least
       two  changes;  one to the collected table of options, and one full
       text description of the option.
     * Hack   fetchmailconf  to  configure  it.  Bump  the  fetchmailconf
       version.
     * Hack  conf.c  to  dump  the option so we won't have a version-skew
       problem.
     * Add an entry to NEWS.
     * If  the option implements a new feature, add a note to the feature
       list.

   There  may  be  other  things  you  have to do in the way of logic, of
   course.

   Before  you  implement an option, though, think hard. Is there any way
   to  make  fetchmail automatically detect the circumstances under which
   it  should  change its behavior? If so, don't write an option. Just do
   the check!

                                Lessons learned

  1. Server-side state is essential

   The  person(s)  responsible  for  removing  LAST  from POP3 deserve to
   suffer.  Without  it,  a client has no way to know which messages in a
   box  have  been  read  by  other  means, such as an MUA running on the
   server.

   The  POP3  UID  feature  described  in  RFC1725  to  replace  LAST  is
   insufficient.  The  only  problem it solves is tracking which messages
   have  been  read  by  this  client  --  and even that requires tricky,
   fragile implementation.

   The  underlying  lesson  is  that  maintaining  accessible server-side
   `seen' state bits associated with Status headers is indispensible in a
   Unix/RFC822 mail server protocol. IMAP gets this right.

  2. Readable text protocol transactions are a Good Thing

   A  nice  thing  about  the  general class of text-based protocols that
   SMTP,   POP2,   POP3,  and  IMAP  belongs  to  is  that  client/server
   transactions  are  easy  to watch and transaction code correspondingly
   easy to debug. Given a decent layer of socket utility functions (which
   Carl  provided)  it's  easy  to write protocol engines and not hard to
   show that they're working correctly.

   This  is  an advantage not to be despised! Because of it, this project
   has been interesting and fun -- no serious or persistent bugs, no long
   hours spent looking for subtle pathologies.

  3. IMAP is a Good Thing.

   Now  that  there  is  a  standard  IMAP  equivalent  of  the POP3 APOP
   validation in CRAM-MD5, POP3 is completely obsolete.

  4. SMTP is the Right Thing

   In  retrospect  it  seems clear that this program (and others like it)
   should have been designed to forward via SMTP from the beginning. This
   lesson  may  be  applicable  to  other Unix programs that now call the
   local MDA/MTA as a program.

  5. Syntactic noise can be your friend

   The  optional  `noise' keywords in the rc file syntax started out as a
   late-night   experiment.   The   English-like  syntax  they  allow  is
   considerably  more  readable  than the traditional terse keyword-value
   pairs  you  get  when  you  strip them all out. I think there may be a
   wider lesson here.

                           Motivation and validation

   It is truly written: the best hacks start out as personal solutions to
   the  author's  everyday problems, and spread because the problem turns
   out  to  be  typical  for  a large class of users. So it was with Carl
   Harris and the ancestral popclient, and so with me and fetchmail.

   It's  gratifying  that  fetchmail  has  become  so popular. Until just
   before  1.9  I  was designing strictly to my own taste. The multi-drop
   mailbox  support and the new --limit option were the first features to
   go in that I didn't need myself.

   By  1.9,  four months after I started hacking on popclient and a month
   after  the  first  fetchmail  release,  there were literally a hundred
   people  on  the fetchmail-friends contact list. That's pretty powerful
   motivation.  And  they  were  a  good  crowd,  too,  sending fixes and
   intelligent  bug  reports  in volume. A user population like that is a
   gift from the gods, and this is my expression of gratitude.

   The  beta  testers  didn't know it at the time, but they were also the
   subjects of a sociological experiment. The results are described in my
   paper, The Cathedral And The Bazaar.

                                    Credits

   Special thanks go to Carl Harris, who built a good solid code base and
   then  tolerated  me  hacking  it  out  of  recognition.  And  to Harry
   Hochheiser, who gave me the idea of the SMTP-forwarding delivery mode.

   Other   significant  contributors  to  the  code  have  included  Dave
   Bodenstab  (error.c  code  and  --syslog),  George Sipe (--monitor and
   --interface), Gordon Matzigkeit (netrc.c), Al Longyear (UIDL support),
   Chris  Hanson  (Kerberos  V4  support),  and  Craig  Metz (OPIE, IPv6,
   IPSEC).

                                  Conclusion

   At this point, the fetchmail code appears to be pretty stable. It will
   probably undergo substantial change only if and when support for a new
   retrieval protocol or authentication method is added.

                                 Relevant RFCS

   Not  all of these describe standards explicitly used in fetchmail, but
   they all shaped the design in one way or another.

   RFC821
          SMTP protocol

   RFC822
          Mail header format

   RFC937
          Post Office Protocol - Version 2

   RFC974
          MX routing

   RFC976
          UUCP mail format

   RFC1081
          Post Office Protocol - Version 3

   RFC1123
          Host requirements (modifies 821, 822, and 974)

   RFC1176
          Interactive Mail Access Protocol - Version 2

   RFC1203
          Interactive Mail Access Protocol - Version 3

   RFC1225
          Post Office Protocol - Version 3

   RFC1344
          Implications of MIME for Internet Mail Gateways

   RFC1413
          Identification server

   RFC1428
          Transition of Internet Mail from Just-Send-8 to 8-bit SMTP/MIME

   RFC1460
          Post Office Protocol - Version 3

   RFC1508
          Generic Security Service Application Program Interface

   RFC1521
          MIME: Multipurpose Internet Mail Extensions

   RFC1869
          SMTP Service Extensions (ESMTP spec)

   RFC1652
          SMTP Service Extension for 8bit-MIMEtransport

   RFC1725
          Post Office Protocol - Version 3

   RFC1730
          Interactive Mail Access Protocol - Version 4

   RFC1731
          IMAP4 Authentication Mechanisms

   RFC1732
          IMAP4 Compatibility With IMAP2 And IMAP2bis

   RFC1734
          POP3 AUTHentication command

   RFC1870
          SMTP Service Extension for Message Size Declaration

   RFC1891
          SMTP Service Extension for Delivery Status Notifications

   RFC1892
          The  Multipart/Report  Content  Type  for the Reporting of Mail
          System Administrative Messages

   RFC1894
          An Extensible Message Format for Delivery Status Notifications

   RFC1893
          Enhanced Mail System Status Codes

   RFC1894
          An Extensible Message Format for Delivery Status Notifications

   RFC1938
          A One-Time Password System

   RFC1939
          Post Office Protocol - Version 3

   RFC1957
          Some   Observations  on  Implementations  of  the  Post  Office
          Protocol (POP3)

   RFC1985
          SMTP Service Extension for Remote Message Queue Starting

   RFC2033
          Local Mail Transfer Protocol

   RFC2060
          Internet Message Access Protocol - Version 4rev1

   RFC2061
          IMAP4 Compatibility With IMAP2bis

   RFC2062
          Internet Message Access Protocol - Obsolete Syntax

   RFC2195
          IMAP/POP AUTHorize Extension for Simple Challenge/Response

   RFC2177
          IMAP IDLE command

   RFC2449
          POP3 Extension Mechanism

   RFC2554
          SMTP Service Extension for Authentication

   RFC2595
          Using TLS with IMAP, POP3 and ACAP

   RFC2645
          On-Demand Mail Relay: SMTP with Dynamic IP Addresses

   RFC2683
          IMAP4 Implementation Recommendations

   RFC2821
          Simple Mail Transfer Protocol

   RFC2822
          Internet Message Format
     _________________________________________________________________



    Eric S. Raymond <esr@snark.thyrsus.com>