<!doctype linuxdoc system><!-- fill-column: 78; -->
<linuxdoc>
  <book>
    <titlepag>
      <title>distcc User Manual</title>
      <author>Martin Pool</author>
      <date>$Date: 2003/04/05 00:46:27 $, for distcc 2.1cvs</date>
    </titlepag>
    
    <toc>
    
    <chapt>
      <heading>Introduction</heading>

      <p>
	<em>"Speed, it seems to me, provides the one genuinely modern pleasure."</em>
	--- Aldous Huxley
      </p>

      <sect>
	<heading>Overview</heading>

	<p>
	  <htmlurl url="http://distcc.samba.org/" name="distcc"> is a program
	  to distribute compilation of C or C++ code across several machines on a
	  network.  distcc should always generate the same results as a
	  local compile, is simple to install and use, and is often
	  significantly faster than a local compile.
	</p>

	<p>
	  Unlike other distributed build systems, distcc does not
	  require all machines to share a filesystem, to have
	  synchronized clocks, or to have the same libraries or header
	  files installed.
	</p>


	<p>
	  Compilation is centrally controlled by a client machine, which
	  is typically the developer's workstation or laptop.  The
	  distcc client runs on this machine, as do <em>make</em>, the
	  preprocessor, the linker, and other stages of the build
	  process.  Any number of "volunteer" machines help the client
	  to build the program, by running the compiler and assembler
	  as required.  The volunteer machines run the <tt>distccd</tt>
	  daemon which listens on a network socket for requests.
	</p>

	<p>
	  distcc sends the complete preprocessed source code across the
	  network for each job, so all it requires of the volunteer
	  machines is that they be running the <tt>distccd</tt> daemon,
	  and that they have an appropriate compiler installed.
	</p>

	<p>
	  distcc is designed to be used with GNU make's parallel-build
	  feature (<tt>-j</tt>).  Shipping files across the network takes
	  time, but few cycles on the client machine.  Any files that can
	  be built remotely are essentially "for free" in terms of client
	  CPU.
	</p>

	<p>
	  distcc was written by Martin Pool.  
	</p>

	<p>
	  distcc was inspired by Andrew Tridgell's <htmlurl
	  url="http://ccache.samba.org/" name="ccache"> program.
	</p>

	<p>
	  If you find distcc useful, please try to complete the survey
	  form in the distribution, or just send an email.
	</p>
      </sect>

      
      <sect>
	<heading>Licence</heading>
	<p>
	  distcc and the <em>distcc User Manual</em> are copyright (C)
	  2002, 2003 by Martin Pool.
	</p>
	<p>
	  distcc is free software; you can redistribute it and/or modify
	  it under the terms of the GNU General Public License as published by
	  the Free Software Foundation; either version 2 of the License, or
	  (at your option) any later version.
	</p>
	<p>
	  Permission is granted to copy, distribute and/or modify the
	  <em>distcc User Manual</em> under the terms of the GNU Free
	  Documentation License, Version 1.1 or any later version
	  published by the Free Software Foundation; with no Invariant
	  Sections, no Front-Cover Texts, and no Back-Cover Texts.
	</p>
	<p>
	  distcc is distributed in the hope that it will be useful,
	  but WITHOUT ANY WARRANTY; without even the implied warranty
	  of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
	  the GNU General Public License for more details.
	</p>
	<p>
	  You should have received a copy of the GNU General Public
	  License and GNU Free Documentation License along with
	  distcc.  If not, write to the Free Software Foundation,
	  Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA,
	  or see <url url="http://www.gnu.org/licenses/">.
	</p>
	<p>
	  The author understands the GNU GPL to apply to distcc in the
	  following way: you are allowed to use distcc to compile a
	  non-free program, or to call it from a non-free Make, or to
	  call a non-free compiler.  However, you may not distribute a
	  modified version of distcc unless you comply with the terms
	  of the GPL: in particular, giving your users access to the
	  source code and the right to redistribute it, and clearly
	  identifying your changes.
	</p>
      </sect>
      
      <sect>
	<heading>Security Considerations</heading>
	
	<p>
	  <bf>
	    distcc should only be used on networks where all machines
	    and all users are trusted.
	  </bf>
	</p>
	
	<p>
	  The distcc daemon, <tt>distccd</tt>, allows other machines
	  on the network to run arbitrary commands on the volunteer
	  machine.  Anyone who can make a connection to the volunteer
	  machine can run essentially any command as the user running
	  <tt>distccd</tt>. 
	</p>

	<p>
	  distcc is suitable for use on a small to medium network of
	  friendly developers.  It's certainly not suitable for use on
	  a machine connected to the Internet or a large
	  (e.g. university campus) network without firewalling in
	  place.
	</p>

	<p>
	  <tt>inetd</tt> or <tt>tcpwrappers</tt> can be used to impose
	  access control rules, but this should be done with an eye to
	  the possibility of address spoofing.
	</p>

	<p>
	  In summary, the security level is similar to that of
	  old-style network protocols like X11-over-TCP, NFS or RSH.
	</p>
      </sect>

      <sect>
	<heading>Getting Started</heading>
	
	<p>
	  Four straightforward steps are required to install and use
	  distcc:
	
	  <enum>
	    <item>
	      Compile and install the <tt>distcc</tt> package on the
	      client and volunteer machines.
	      
	    <item>
	      Start the <tt>distccd</tt> daemon on all volunteer
	      machines.  
	      
	    <item>
	      On the client, set the <tt>DISTCC_HOSTS</tt> environment
	      variable to indicate which volunteer machines to use.
	      For example:
	      <tscreen><verb>DISTCC_HOSTS='angry toey:4202 localhost'</verb></tscreen>
	      
	    <item>
	      Set the <tt>CC</tt> variable or edit Makefiles to prefix
	      distcc to calls to the C/C++ compiler.  For example:
	      <tscreen><verb>distcc gcc -o hello.o -c hello.c</verb></tscreen>
	  </enum>
	</p>
      </sect>


      <sect>
	<heading>Reporting Bugs</heading>

	<p>
	  If you think you have found a bug, please check the manual
	  and the <tt>TODO</tt> file to see if it is a known
	  restriction.  If not, please send a clear and detailed
	  report to the mailing list <tt>distcc@lists.samba.org</tt>.
          (For a clear
	  and detailed description of "clear and detailed", see Simon
	  Tatham's advice on reporting bugs, <url
	  url="http://www.chiark.greenend.org.uk/~sgtatham/bugs.html">.)
	</p>

	<p>
	  A good bug report for distcc should include:

	
	  <enum>
	    <item>
	      What you're trying to do.  For example: "compile KDE",
	      "use gcc's <tt>-MD</tt> option".
	    </item>
	    
	    <item>
	      What actually happens.  For example: "distcc fails with
	      error 104", "the compilation never completes", "I get
	      error message XXX".
	    </item>
	    
	    <item>
	      The version of distcc you're using (the output of
	      <tt>-</tt><tt>-version</tt>) on both client and server.
	      If you got it from a distribution rather than building
	      it yourself, then mention that.
	    </item>

	    <item>
	      What other software you're using, in particular the
	      operating system and compiler.  For the operating system
	      it's normally enough to give the overall version
	      (FreeBSD CURRENT, Red Hat Linux 7.2, ...).  For the
	      compiler, use "<tt>gcc -</tt><tt>-version</tt>".
	    </item>
	    
	    <item>
	      The exact command you're using to run the compilation.
	      If you're using make, then include the line from its
	      output that runs the compiler.
	    </item>

	    <item>
	      The debug logs from the client and server.  On the
	      client, you should set <tt>DISTCC_VERBOSE</tt> and
	      <tt>DISTCC_LOG</tt>.  On the server, use
	      <tt>-</tt><tt>-verbose</tt> and
	      <tt>-</tt><tt>-log-file</tt>.  If you can, trim the log
	      files to just the invocation that causes trouble.
	      Grepping for a process id can help with this.  If the
	      problem is intermittent, then please leave logging
	      running until it recurs and then pull out a smaller
	      section of logs to send.
	    </item>
	  </enum>
	</p>

	<p>
	  Please do not obfuscate your logs.  The name of a single
	  source file or machine is probably not confidential
	  information, but the confusion introduced by editing logs
	  can be significant.
	</p>

	<p>
	  Please send a problem description to the <tt>distcc</tt>
	  mailing list, on <tt>lists.samba.org</tt>.  Please don't
	  send mail direct to the author: if you use the list, other
	  people may be able to help you, and the answers are publicly
	  archived.
	</p>
      </sect>

        

      <sect>
	<heading>
	  Test Suite
	</heading>

	<p>
	  distcc has a test suite written in Python using the
	  <em>ComfyChair</em> framework.  It does not yet exercise all
	  functionality, but is improving.
	</p>
	
	<p>
	  To run the test suite, run <tt>make check</tt> from the distcc
	  source directory.
	</p>
      </sect>
    </chapt>  <!-- End introduction chapter -->

    

    <chapt>
      <heading>Using distcc</heading>
      <sect>
	<heading>Invoking distcc</heading>

	<p>
	  To set up distcc to be compatible with the widest range of
	  existing software, create a "masquerade dir" of compiler
	  links that will invoke distcc.  When the distcc-in-disguise
	  gets invoked, it invokes the real compiler of the same name
	  either on the local client machine, or on a remote volunteer
	  host.
	</p>
	<p>
	  For instance, you could create a directory named
	  /usr/lib/distcc/bin and populate it with links (symlinks are
	  slightly easier to maintain in the long run, but use hard
	  links if you prefer them):
	  <tscreen><verb>$ mkdir -p /usr/lib/distcc/bin
$ cd /usr/lib/distcc/bin
$ ln -s ../../../bin/distcc gcc
$ ln -s ../../../bin/distcc cc
$ ln -s ../../../bin/distcc g++
$ ln -s ../../../bin/distcc c++</verb></tscreen>
	</p>
	<p>
	  Then, to use distcc, a user just needs to put the directory
	  /usr/lib/distcc/bin early in the PATH (and have set the
	  DISTCC_HOSTS environment variable) and distcc will handle the
	  rest.  Note that this masquerade dir must occur on the PATH
	  earlier than the directory that contains the actual compilers
	  of the same names, and that any auxiliary programs that these
	  compilers call (such as "as" or "ld") must also be found on the
	  PATH in a dir after the masquerade dir (since distcc calls out
	  to the real compiler with a PATH value that has all dirs up to
	  and including the masquerade dir trimmed off).
	</p>
	<p>
	  An alternative setup is to prefix the distcc command to compiler
	  command lines so that it is called explicitly.  This allows you
	  to more easily control which things use distcc and which things
	  don't, but can be more problematic when trying to use distcc
	  with existing projects.
	</p>
	<p>
	  For example, to compile the standard application program:
	  <tscreen><verb>distcc gcc -o hello.o -c hello.c</verb></tscreen>
	</p>
	<p>
	  Standard Makefiles, including those using the GNU
	  autoconf/automake system, use the <bf>$CC</bf> variable as
	  the name of the C compiler to run and the <bf>$CXX</bf> variable
	  as the name of the C++ compiler to run.  In many cases, it is
	  sufficient to just override one or both of these variables,
	  either from the command line, or perhaps from your login script
	  (if you wish to use distcc for all compilations).  The following
	  example sets both variables and takes advantage of the fact that
	  distcc defaults to calling "gcc" if no other compiler name is
	  provided:
	  <tscreen><verb>make CC=distcc CXX='distcc g++'</verb></tscreen>
	</p>
	<p>
	  Unfortunately, this setup sometimes leads to incompatibilities
	  with packages that don't expect the compiler name to contain
	  spaces, or with projects that don't honor the above variables when
	  deciding what compiler to use.  For instance, the KDE package
	  will fail to compile using the above CXX-munging idiom, but will
	  compile just fine if you use a masquerade dir (which causes all
	  executions of "g++" to really run distcc).
	</p>
      </sect>

      
      <sect>
	<heading>Options</heading>

	<p>
	  Options to distcc must precede the compiler name.  Any
	  arguments or options following the name of the compiler are
	  passed through to the compiler.
	</p>

	<p>
	  <descrip>
	    <tag><tt>-</tt><tt>-help</tt></tag>
	      
	    <p>
	      Print a detailed usage message and exit.
	    </p>
	    
	    <tag><tt>-</tt><tt>-version</tt></tag>
	    
	    <p>
	      Show distcc version and exit.
	    </p>
	  </descrip>
	</p>
      </sect>


      <sect>
	<heading>Environment Variables</heading>
	<p>
	  The way in which distcc runs the compiler is controlled by a
	  few environment variables.
	</p>
	
	<p>
	  <bf>NOTE:</bf> Some versions of make do not export Make
	  variables as environment variables by default.  Also,
	  assignments to variables within the Makefile may override
	  their definitions in the environment that calls make.  The
	  most reliable method seems to be to set <tt>DISTCC_*</tt>
	  variables in the environment of Make, and to set <tt>CC</tt>
	  on the right-hand-side of the Make command line.  For
	  example:

	  <tscreen><verb>$ DISTCC_HOSTS='localhost wistful toey'
$ PATH="/usr/lib/distcc/bin:$PATH"
$ export DISTCC_HOSTS PATH
$ ./configure
$ make all</verb></tscreen>
	</p>

	<p>
	  or:

	  <tscreen><verb>$ DISTCC_HOSTS='localhost wistful toey'
$ export DISTCC_HOSTS
$ CC='distcc' ./configure
$ make CC='distcc' all</verb></tscreen>
	</p>

	<p>
	  Some Makefiles may, contrary to convention, explicitly call
	  <tt>gcc</tt> or some other compiler, in which case
	  overriding <tt>$CC</tt> will not be enough to call distcc.
	  While this is harmless (but suboptimal), using a masquerade
	  dir of distcc links will avoid this.
	</p>
	<p>
	  Remember that you should not use both methods for calling
	  distcc at the same time.  If you are using a masquerade dir,
	  don't munge CC and/or CXX (just put the dir early on your
	  PATH).  If you're not using a masquerade dir, you'll need to
	  either change CC and/or CXX, or modify the Makefile(s) to
	  call distcc explicitly.
	</p>
	<p>
	  <descrip>
	    <tag><tt>DISTCC_HOSTS</tt></tag>
	    <p>
	      Space-separated list of volunteer host specifications.
	    </p>

	    <tag>
	      <tt>DISTCC_VERBOSE</tt>
	    </tag>
	    <p>
	      If set to <tt>1</tt>, distcc produces explanatory messages on the
	      standard error stream.  This can be helpful in debugging
	      problems.  Bug reports should include verbose output.
	    </p>

	    <tag>
	      <tt>DISTCC_LOG</tt>
	    </tag>
	    <p>
	      Log file to receive messages from distcc itself, rather
	      than stderr.
	    </p>

	    <tag>
	      <tt>DISTCC_SAVE_TEMPS</tt>
	    </tag>
	    <p>
	      If set to <tt>1</tt>, temporary files are not deleted
	      after use.  Good for debugging, or if your disks are too
	      empty.
	    </p>

	    <tag>
	      <tt>DISTCC_TCP_CORK</tt>
	    </tag>
	    <p>
	      If set to <tt>0</tt>, disable use of "TCP corks", even if
	      they're present on this system.  Using corks normally
	      helps pack requests into fewer packets and aids
	      performance.
	    </p>
	  </descrip>
	</p>
      </sect>


      <sect>
	<heading>Which Jobs are Distributed?</heading>

	<p>
	  Building a C or C++ program on Unix involves several phases:

	  <itemize>
	    <item>
	      Preprocessing source (<tt/.c/) and headers (<tt/.h/) to
	      a preprocessed file (<tt/.i/)
	    </item>
	    <item>
	      Compiling preprocessed source (<tt/.i/) to assembly
	      instructions (<tt/.s/)
	    </item>
	    <item>
	      Assembling to an object file (<tt/.o/)
	    </item>
	    <item>
	      Linking object files and libraries to form an
	      executable, library, or shared library.
	    </item>
	  </itemize>
	</p>

	<p>
	  distcc only ever runs the compiler and assembler remotely.
	  The preprocessor must always run locally because it needs to
	  access various header files on the local machine which may
	  not be present, or may not be the same, on the volunteer.
	  The linker similarly needs to examine libraries and object
	  files, and so must run locally.
	</p>

	<p>
	  The compiler and assembler take only a single input file,
	  the preprocessed source, and produce a single output, the
	  object file.  distcc ships these two files across the network
	  and can therefore run the compiler/assembler remotely.
	</p>

	<p>
	  Fortunately, for most programs running the preprocessor is
	  relatively cheap, and the linker is called relatively
	  infrequently, so most of the work can be distributed.
	</p>

	<p>
	  distcc examines its command line to determine which of these
	  phases are being invoked, and whether the job can be
	  distributed.  Here is an example of a typical command that
	  can be preprocessed locally and compiled remotely:
<tscreen><verb>distcc gcc -o hello.o -DGREETING="hello" -c hello.c</verb></tscreen>
	</p>

	<p>
	  The command-line scanner is intended to behave in the same
	  way as gcc.  In case of doubt, distcc runs the job locally.
	</p>

	<p>
	  In particular, this means that commands that compile and
	  link in one go cannot be distributed.  These are quite rare
	  in realistic projects.  Here is one example of a command
	  that could not be distributed, because it calls both the
	  compiler and the linker:

	  <tscreen><verb>distcc gcc -o hello hello.c</verb></tscreen>
	</p>
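
	<p>
	  The distinction can be sketched as a much-simplified
	  classifier.  It is not distcc's real command-line scanner,
	  which handles many more options; the function name
	  <tt>can_distribute</tt> and the small set of options checked
	  are assumptions made for illustration.
	</p>

```shell
# Hypothetical, much-simplified classifier.  It only looks for -c
# (compile without linking) and a few preprocess-only options, and
# answers "local" whenever it is unsure.
can_distribute() {
    compile_only=no
    for arg in "$@"; do
        case $arg in
            -c)        compile_only=yes ;;
            -E|-M|-MM) echo local; return 0 ;;   # preprocessing only
        esac
    done
    if [ "$compile_only" = yes ]; then
        echo remote
    else
        echo local      # e.g. compile and link in one go
    fi
}

can_distribute gcc -o hello.o -DGREETING="hello" -c hello.c   # prints: remote
can_distribute gcc -o hello hello.c                           # prints: local
```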
      </sect>



      <sect>
	<heading>Running Jobs in Parallel</heading>

	<p>
	  Moving source across the network is less efficient than
	  compiling it locally.  If you have access to a machine much
	  faster than your workstation, however, the performance gain
	  may outweigh the cost of transferring the source code, and
	  it may be quicker to ship all your source across the network
	  to compile it there.
	</p>

	<p>
	  In general, it is even better to compile on two or more
	  machines in parallel.  Any number of invocations of distcc
	  can run at the same time, and they will distribute their
	  work across the available hosts.
	</p>

	<p>
	  distcc does not manage parallelization, but relies on Make
	  or some other build system to invoke compiles in parallel.
	</p>

	<p>
	  With GNU Make, you should use the <tt/-j/ option to specify
	  a number of parallel tasks slightly higher than the number
	  of available hosts.  For example:

	  <tscreen><verb>$ export DISTCC_HOSTS='angry toey wistful localhost'
$ make -j5</verb></tscreen>
	</p>
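
	<p>
	  As a rough sketch, the <tt>-j</tt> value can be derived from
	  the host list rather than typed by hand.  The helper name
	  <tt>count_hosts</tt> is hypothetical, and reading "slightly
	  higher" as "one more than the number of hosts" is an
	  assumption, chosen because it reproduces the example above.
	</p>

```shell
# Count the whitespace-separated entries in a host list.
count_hosts() {
    set -- $1          # unquoted on purpose: split the list into words
    echo $#
}

hosts='angry toey wistful localhost'       # sample list from the text above
njobs=$(( $(count_hosts "$hosts") + 1 ))   # "+ 1" is an assumed margin
echo "make -j$njobs"                       # prints: make -j5
```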
	
      

      </sect>


      <sect>
	<heading>Choosing a Host</heading>

	<p>
	  The <tt>$DISTCC_HOSTS</tt> variable tells distcc which
	  volunteer machines are available to run jobs.  This is a
	  space-separated list of host specifications, each of which
	  has the syntax:
<tscreen><verb>HOSTNAME[/MAX_JOBS][:PORT]</verb></tscreen>

	<p>
	  You can specify the maximum number of jobs that the host
	  should receive by affixing a number after a slash (e.g.
	  "localhost/2").

	<p>
	  A numeric TCP port may optionally be specified after a
	  colon.  If no port is specified, distcc uses the default,
	  which is currently 3632.

	<p>
	  If only one invocation of distcc runs at a time, it will
	  always execute on the first host in the list.  (This
	  behaviour is not absolutely guaranteed, however, and may
	  change in future versions.)

	<p>
	  The name <tt>localhost</tt> is handled specially by running
	  the compiler in place.

	<p>
	  The daemon may be tested on localhost by setting 

<tscreen><verb>DISTCC_HOSTS=127.0.0.1</verb></tscreen>

	  Although <tt>localhost</tt> causes distcc to execute the job
	  directly, using an IP address will cause it to make a TCP
	  connection to a daemon on localhost.  This is slower, but
	  useful for testing.
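
	<p>
	  To make the host specification syntax concrete, here is a
	  small sketch that splits one specification into its parts
	  using shell parameter expansion.  The function name
	  <tt>parse_hostspec</tt> and its output format are
	  hypothetical, and it performs no validation, unlike distcc
	  itself.
	</p>

```shell
# Hypothetical helper: split HOSTNAME[/MAX_JOBS][:PORT] into fields.
parse_hostspec() {
    spec=$1
    port=3632          # default port given in the text above
    max_jobs=          # empty: no limit was given
    case $spec in
        *:*) port=${spec##*:}; spec=${spec%:*} ;;
    esac
    case $spec in
        */*) max_jobs=${spec#*/}; spec=${spec%%/*} ;;
    esac
    echo "host=$spec max_jobs=${max_jobs:-unset} port=$port"
}

parse_hostspec toey:4202      # prints: host=toey max_jobs=unset port=4202
parse_hostspec localhost/2    # prints: host=localhost max_jobs=2 port=3632
```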
      </sect>


      <sect>
	<heading>Load Distribution Algorithm</heading>

	<p>
	  When distcc is invoked, it needs to decide which of the
	  volunteers in <tt>DISTCC_HOSTS</tt> should be used to
	  compile a job.  It uses a simple heuristic to try to spread
	  load across machines appropriately.
	</p>

	<p>
	  You can imagine all of the compile machines as being leaky
	  buckets, some with larger holes (faster CPUs) than others.
	  The distcc client tries to keep water at the same level on
	  each one (the same number of jobs running), preferring hosts
	  occurring earlier in DISTCC_HOSTS.  Over the course of a
	  build, the faster machines will complete jobs more quickly,
	  and therefore be topped up more quickly and do more work
	  overall, but without the client ever actually needing to
	  know which one is fastest.
	</p>

	<p>
	  This design has the advantage of not requiring the client to
	  know in advance the speeds of the volunteers, and being
	  quite simple to implement.  It copes quite well with
	  machines that are temporarily slowed down: they are just
	  topped-up more slowly in the future.
	</p>
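
	<p>
	  The heuristic can be modelled in a few lines of shell.  This
	  is a sketch of the idea only: the real client coordinates
	  through lockfiles rather than being handed job counts, and
	  the function name <tt>pick_host</tt> and the
	  <tt>NAME=RUNNING_JOBS</tt> argument format are invented for
	  the example.
	</p>

```shell
# Hypothetical model: each argument is NAME=RUNNING_JOBS, given in
# DISTCC_HOSTS order.  Print the host with the fewest running jobs.
pick_host() {
    best=
    best_jobs=
    for pair in "$@"; do
        name=${pair%%=*}
        running=${pair#*=}
        # A strictly-less comparison keeps the earlier host on a tie.
        if [ -z "$best" ] || [ "$running" -lt "$best_jobs" ]; then
            best=$name
            best_jobs=$running
        fi
    done
    echo "$best"
}

pick_host angry=2 toey=1 wistful=1    # prints: toey
pick_host angry=1 toey=1              # prints: angry
```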

	<p>
	  Scheduling is coordinated between different invocations of
	  the <tt>distcc</tt> client by lockfiles in the temporary
	  directory.  There is no coordination between clients running
	  as different users, on different hosts, or with different
	  <tt>TMPDIR</tt> paths.
	</p>

	<p>
	  On Linux, scheduling slightly too many jobs on any machine
	  is quite harmless, as long as the number is not so high that
	  the machine begins thrashing.  So it's OK to provide a
	  <tt>-j</tt> number substantially higher than the number of
	  available processors.
	</p>

	<p>
	  The biggest problem with this design is that it handles
	  multiprocessor machines poorly: they probably ought to have
	  jobs scheduled proportional to the number of processors.  At
	  the moment, the best thing is to run with a <tt>-j</tt>
	  factor equal to the product of the maximum number of CPUs in
	  any machine (<tt>MAX_CPUS</tt>) and the number of machines.
	  This should make sure that roughly <tt>MAX_CPUS</tt> tasks
	  run on every machine at all times, and will therefore keep
	  all CPUs loaded, but will cause excessive task-switching on
	  machines with fewer CPUs.  Task switching is not very
	  expensive on Linux so it is not a big problem, but it does
	  lose a few percentage points of speed.  This should be fixed
	  in a future release.
	</p>
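
	<p>
	  The arithmetic for that workaround is simple enough to
	  script.  In this sketch the function name
	  <tt>suggest_jobs</tt> is hypothetical and the CPU counts are
	  made-up sample values.
	</p>

```shell
# Hypothetical helper: each argument is the CPU count of one machine.
# Print MAX_CPUS multiplied by the number of machines.
suggest_jobs() {
    max_cpus=0
    machines=0
    for cpus in "$@"; do
        machines=$((machines + 1))
        if [ "$cpus" -gt "$max_cpus" ]; then
            max_cpus=$cpus
        fi
    done
    echo $((max_cpus * machines))
}

suggest_jobs 1 2 1    # three machines, at most 2 CPUs: prints 6
```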
      </sect>



      <sect>
	<heading>Diagnostic Messages</heading>
	<p>
	  Error messages or warnings from local or remote compilers
	  are passed through to diagnostic output on the client.  The
	  compiler takes all file names and line numbers from the line
	  markers in the preprocessed output, so error messages will
	  always have the correct pathnames for files on the client.
	</p>
	
	<p>
	  distcc can supply extensive debugging information when the
	  verbose option is used.  This is controlled by the 
	  <tt>$DISTCC_VERBOSE</tt> environment variable on the client,
	  and the <tt>-</tt><tt>-verbose</tt> option on the server.
	</p>

	<p>
	  By default, distcc prints diagnostic messages to stderr.
	  Sometimes these are too intrusive into the output of the
	  regular compiler, and so they may be selectively redirected
	  by setting the <tt>$DISTCC_LOG</tt> environment variable to a
	  filename.
	</p>
	  
	<p>
	  Error messages from the daemon are sent both to the log file
	  on the volunteer and back to the client's diagnostic
	  output.  (By default, the daemon logs to the syslog
	  <tt>daemon</tt> channel.)  If compilation is failing, please
	  examine the log file on the relevant volunteer machine.
	</p>
      </sect>


      
      <sect>
	<heading>distcc Exit Codes<label id="exit"></heading>
	<p>
	  The exit code of distcc is normally that of the compiler:
	  zero for successful compilation and non-zero otherwise.
	</p>

	<p>
	  If distcc fails to distribute a job to a selected volunteer machine,
	  it will try to run the compiler locally on the client.  distcc only
	  tries a single remote machine for each job.
	</p>

	<p>
	  distcc tries to distinguish between a failure to distribute
	  the job, and a "genuine" failure of the compiler on the
	  remote machine, for example because of a syntax error in the
	  program.  In the second case, distcc does not re-run the
	  compiler locally, and returns the same exit code as the
	  remote compiler.
	</p>

	<p>
	  If the compiler exits with a signal, distcc returns an exit
	  code of 128 plus the signal number, following Unix
	  convention.
	</p>

	<p>
	  If distcc fails to run the compiler, it may return one
	  of the following error codes.  These are also used by
	  distccd.
	</p>

	<p>
	  <descrip>
	    <tag>100 <tt/EXIT_DISTCC_FAILED/</tag>
	    <p>
	      Generic or unspecified failure in distcc.
	    </p>
	      
	    <tag>102 <tt>EXIT_BIND_FAILED</tt></tag>
	    <p>
	      Failed to bind and listen on network socket.  Port may
	      already be in use.
	    </p>

	    <tag>103 <tt/EXIT_CONNECT_FAILED/</tag>
	    <p>
	      Failed to establish network connection or listen on
	      socket.  The host may be invalid or unreachable, or
	      there may be no daemon listening.
	    </p>

	    <tag>104 <tt>EXIT_COMPILER_CRASHED</tt></tag>

	    <p>
	      The underlying compiler exited because of a signal.
	      This probably indicates a compiler bug, or a problem
	      with the hardware or OS on the server.  (Obsolete in 0.13.)
	    </p>

	    <tag>105 <tt>EXIT_OUT_OF_MEMORY</tt></tag>
	    <p>
	      distcc could not allocate enough memory.
	    </p>

	    <tag>106 <tt>EXIT_BAD_HOSTSPEC</tt></tag>
	    <p>
	      <tt>$DISTCC_HOSTS</tt> was undefined, empty, or
	      syntactically invalid.  (At the moment, you should never
	      see this code because distcc will fall back to building
	      locally.  Let me know if you would prefer a hard error.)
	    </p>

	    <tag>107 <tt>EXIT_IO_ERROR</tt></tag>
	    <p>
	      There was a disk or network IO error while distributing
	      the job.  For example, the network may be failing, or a
	      disk on either end may be out of space.
	    </p>

	    <tag>108 <tt>EXIT_TRUNCATED</tt></tag>
	    <p>
	      The network socket was closed unexpectedly.  This
	      probably indicates a network problem or distcc bug.
	    </p>

	    <tag>109 <tt>EXIT_PROTOCOL_ERROR</tt></tag>
	    <p>
	      The distcc internal network protocol was not followed by
	      the remote program.  This probably indicates a network
	      problem or distcc bug.
	    </p>

	    <tag>110 <tt>EXIT_COMPILER_MISSING</tt></tag>
	    <p>
	      The specified compiler was not found in the path of the
	      server.
	    </p>

	    <tag>111 <tt>EXIT_RECURSION</tt></tag>
	    <p>
	      distcc ended up invoking itself recursively.  This is
	      caught as an error to prevent unbounded recursion or
	      inefficiency.  For example, this error may occur if
	      distcc is called by the <tt>cc</tt> command in the
	      daemon's path.
	    </p>
	  </descrip>
	</p>
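
	<p>
	  A build script might translate these codes into readable
	  messages along the following lines.  The function name
	  <tt>explain_exit</tt> is hypothetical.  Note the inherent
	  ambiguity: a compiler could itself exit with one of these
	  numbers, so the result is a best guess.
	</p>

```shell
# Hypothetical helper: describe an exit code using the table above.
explain_exit() {
    code=$1
    case $code in
        0)   echo "success" ;;
        100) echo "EXIT_DISTCC_FAILED" ;;
        102) echo "EXIT_BIND_FAILED" ;;
        103) echo "EXIT_CONNECT_FAILED" ;;
        104) echo "EXIT_COMPILER_CRASHED" ;;
        105) echo "EXIT_OUT_OF_MEMORY" ;;
        106) echo "EXIT_BAD_HOSTSPEC" ;;
        107) echo "EXIT_IO_ERROR" ;;
        108) echo "EXIT_TRUNCATED" ;;
        109) echo "EXIT_PROTOCOL_ERROR" ;;
        110) echo "EXIT_COMPILER_MISSING" ;;
        111) echo "EXIT_RECURSION" ;;
        *)
            if [ "$code" -gt 128 ]; then
                # 128 plus the signal number, per Unix convention
                echo "compiler killed by signal $((code - 128))"
            else
                echo "compiler exit status $code"
            fi ;;
    esac
}

explain_exit 103    # prints: EXIT_CONNECT_FAILED
explain_exit 139    # prints: compiler killed by signal 11
```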
      </sect> <!-- end of "distcc Exit Codes" -->


      
      <sect>
	<heading>File Metadata</heading>

	<p>
	  distcc transfers only the binary contents of source, error,
	  and object files, without any concern for metadata,
	  attributes, character sets or end-of-line conventions.
	</p>
	
	<p>
	  distcc never transmits file times across the network or
	  modifies them, and so should not care whether the clocks on
	  the client and volunteer machines are synchronized or not.
	  When an object file is received onto the client, its
	  modification time will be the current time on the client
	  machine.
	</p>
      </sect>
    </chapt>  <!-- End of "using distcc" -->

    

    <chapt>
      <heading>distcc Compatibility</heading>
      
      <sect>
	<heading>distcc with ccache</heading>
	<p>
	  distcc works well with the <htmlurl
	    url="http://ccache.samba.org/" name="ccache"> tool for
	  caching compilation results.  The best way to use the two of
	  them together is to create a masquerade dir for ccache and a
	  masquerade dir for distcc, and update your path to have the
	  ccache-in-disguise links be found first, followed by the
	  distcc-in-disguise links, and finally the real compiler.
	  For instance:
	  <tscreen><verb>PATH="/usr/lib/ccache/bin:/usr/lib/distcc/bin:$PATH"
export PATH</verb></tscreen>
	</p>
	<p>
	  Another alternative is to munge CC and/or CXX to mention both
	  commands:
	  <tscreen><verb>CC='ccache distcc'
CXX='ccache distcc g++'</verb></tscreen>
	</p>
      </sect>


      <sect>
	<heading>distcc with autoconf</heading>
	<p>
	  distcc works quite well with autoconf.
	</p>
	<p>
	  <tt>DISTCC_VERBOSE</tt> can give autoconf trouble because
	  autoconf tries to parse error messages from the compiler.
	  If you redirect distcc's diagnostics using
	  <tt>DISTCC_LOG</tt> then it seems to be fine.
	</p>
	<p>
	  Some autoconf-based systems "freeze" the compiler name
	  used for configure into their Makefiles.  To make them use
	  distcc, you should either set the PATH to include the distcc
	  masquerade dir or set <tt>$CC</tt> prior to running
	  <tt>./configure</tt>, and/or override <tt>$CC</tt> on the
	  right-hand-side of the Make command line.
	</p>
	<p>
	  Some poorly-written shell scripts may assume that
	  <tt>$CC</tt> is a single word.  Using a masquerade dir
	  avoids this problem.
	</p>
      </sect>

      
      <sect>
	<heading>distcc with libtool</heading>

	<p>
	  Some versions of libtool seem not to cope well when CC is
	  set to more than one word, such as <tt>"distcc gcc"</tt>.
	  Setting <tt>CC=distcc</tt>, which is supported in 0.10 and
	  later, seems to work well.  Using a masquerade dir, which
	  was first supported in 2.0, is even better (since it avoids
	  problems with C++ programs too).
	</p>
      </sect>


      
      <sect>
	<heading>distcc with Gentoo Linux</heading>

	<p>
	  Gentoo is a "ports"-based free software distribution, in
	  which packages are always built from source on
	  installation.  distcc works well with Gentoo to speed
	  installation.
	</p>

	<p>
	  You can install distcc either using the upstream tarball
	  from <tt>distcc.samba.org</tt> (which may be newer), or
	  using <tt>emerge distcc</tt> to get the Gentoo port, which
	  may be better integrated.  You can also get ccache with
	  <tt>emerge ccache</tt>.
	</p>

        <p>
	  Using distcc (and ccache) to speed your "emerge" commands is
	  simplicity itself since full support for these packages is
	  already built into portage.  The two items you need to customize
	  for your system are the MAKEOPTS value and the DISTCC_HOSTS
	  value.  You can either set these in the /etc/make.conf file, or
	  you can export them from your local environment.  Then simply
	  edit /etc/make.conf to add the "distcc" (and "ccache") features
	  to the "FEATURES" setting (uncommenting the line, if needed)
	  and start up any missing remote distccd servers.  Portage will
	  take care of munging the PATH that the ebuild uses in an
	  appropriate manner.  (The Gentoo version of distcc comes with a
	  masquerade dir as a part of the standard installation, and has
	  had this since distcc-1.1-r8, when it was first patched into
	  the ebuild.)  For example, if you wanted 9 parallel compiles
	  and you had 4 remote systems, you might run something like this:
	  <tscreen><verb>export MAKEOPTS=-j9
export DISTCC_HOSTS='localhost larry moe curly shemp'
emerge </verb><em>packagename</em></tscreen>
	</p>

	<p>
	  To use distcc for compilations outside of an emerge/ebuild
	  command, include the /usr/lib/distcc/bin dir early on your
	  PATH before configuring/compiling the package.
	</p>
      </sect>


      <sect>
	<heading>distcc with gcc dependency computation</heading>

	<p>
	  gcc has the ability to produce information about header
	  dependencies as a side-effect of preprocessing.  These can
	  be included in Makefiles in various ways to make sure that
	  files are up-to-date.
	</p>
	
	<p>
	  This feature is enabled using <tt>-MD</tt>, <tt>-M</tt>
	  and related options.
	</p>

	<p>
	  Unfortunately, gcc changed the behaviour of this feature
	  between gcc 2.95 and 3.x in such a way that it seems
	  difficult for distcc to support it properly in general.  The
	  difficulty is that the filename to which dependencies are
	  written depends in a very complicated way on the gcc
	  command line.  distcc needs to change the command line to
	  run the preprocessor locally and the compiler remotely,
	  and this can sometimes cause problems.  (This also causes
	  problems for Makefiles that are supposed to work with both
	  versions of the compiler.)
	</p>

	<p>
	  <tt>-M</tt> causes gcc to produce dependency information
	  instead of compiling.  distcc understands this and passes
	  the option straight through to gcc.  It should work correctly.
	</p>

	<p>
	  With gcc 2.95, <tt>-MD</tt> always writes dependencies
	  into the preprocessor's working directory.  distcc should
	  work fine.
	</p>

	<p>
	  With gcc 3.2, <tt>-MD</tt> writes the output into either
	  the source directory or output directory, depending on the
	  presence of the <tt>-o</tt> option.   However, gcc 3.2
	  also has a <tt>-MF</tt> option  that can be used to
	  explicitly set the dependency output file, and this works
	  well with distcc.
	</p>

	<p>
	  In summary: for gcc 2.95, no changes are required.  For
	  gcc 3.2, <tt>-MF</tt> should be used to specify the file
	  to write dependencies to.
	</p>
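
	<p>
	  As an illustration for gcc 3.2, a compile command that names
	  the dependency file explicitly might look like this.  The file
	  names are hypothetical.
	  <tscreen><verb>distcc gcc -MD -MF hello.d -c hello.c -o hello.o</verb></tscreen>
	  The dependencies are then written to <tt>hello.d</tt>
	  regardless of how distcc rearranges the rest of the command
	  line.
	</p>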
      </sect>
    </chapt>	


    <chapt>
      <heading>The distccd Server</heading>
      
      <p><label id="server">
	The distccd server may be started either from a super-server
	such as <tt>inetd</tt>, or as a stand-alone daemon.
      </p>
      
      <p>
	distccd does not need to run as root and should not.
      </p>

      <p>
	distccd does not have a configuration file; its behaviour is
	controlled only by command-line options and requests from
	clients.
      </p>

      <sect>
	<heading>Invoking distccd</heading>

	<p>
	  These options may be used for either inetd or standalone
	  mode.
	</p>

	<p>
	  If you want to see if the daemon started properly, look in the
	  log file.  By default this is something like
	  <tt>/var/log/daemon</tt> or <tt>/var/log/messages</tt>,
	  depending on your system.
	</p>
	
	<p>
	  <descrip>
	    <tag><tt>-</tt><tt>-help</tt></tag>
	    <p>
	      Explains usage of the daemon and exits.
	    </p>
	      
	    <tag><tt>-</tt><tt>-version</tt></tag>
	    <p>
	      Shows the daemon version and exits.
	    </p>
	      
	    <tag><tt>-N</tt>, <tt>-</tt><tt>-nice NICENESS</tt></tag>
	    <p>
	      Makes the daemon more nice about giving up the CPU to
	      other tasks on the machine.  <em>NICENESS</em> is a
	      value from 0 (regular priority) to 20 (lowest
	      priority).  This option is good if you want to run
	      distccd in the background on a machine used for other
	      purposes.
	    </p>

	    <tag><tt>-p, -</tt><tt>-port PORT</tt></tag>
	    <p>
	      Set the TCP port to listen on.  (Standalone mode only.)
	    </p>

	    <tag>
	      <tt>-P, -</tt><tt>-pid-file FILE</tt>
	    </tag>
	    <p>
	      Save daemon process id to file.
	    </p>
	    
	    <tag>
	      <tt>-</tt><tt>-verbose</tt>
	    </tag>
	    <p>
	      Include debug messages in log.
	    </p>

	    <tag>
	      <tt>-</tt><tt>-no-detach</tt>
	    </tag>
	    <p>
	      Do not detach from the shell that started the daemon.
	      This may be useful when running distccd from a system
	      such as <em>daemontools</em> that manages daemons after
	      they start.
	    </p>
	    
	    <tag>
	      <tt>-</tt><tt>-no-fork</tt>
	    </tag>
	    <p>
	      Don't fork children for each connection, to allow
	      attaching <tt>gdb</tt>.  <bf>Don't use this if you don't
	      understand it!</bf>
	    </p>

	    <tag>
	      <tt>--no-fifo</tt>
	    </tag>
	    <p>
	      Send input to the compiler by writing to a temporary
	      file, rather than using a pipe.  This is required when
	      the server's temporary directory is on NFS, on at least
	      some machines.  It may be faster in some circumstances,
	      but probably is not.
	    </p>
	      
	    <tag>
	      <tt>-</tt><tt>-log-file=FILE</tt>
	    </tag>
	    <p>Send log messages to <em>FILE</em> instead of syslog.</p>

	    <tag><tt>-</tt><tt>-log-stderr</tt></tag>
	    <p>
	      Send log messages to stderr, rather than to a file or
	      syslog.  This is mainly intended for use in debugging.
	    </p>

	    <tag><tt>-</tt><tt>-inetd</tt></tag>
	    <p>
	      Serve a client connected to stdin/stdout.  As the name
	      suggests, this option should be used when distccd is run
	      from within a super-server like <tt>inetd</tt>.  distccd
	      assumes inetd mode when stdin is a socket.
	    </p>

	    <tag><tt>-</tt><tt>-daemon</tt></tag>
	    <p>
	      Bind and listen on a socket, rather than running from
	      inetd.  This is used for standalone mode.  distccd
	      assumes daemon mode at startup if stdin is a tty, so
	      <tt>--daemon</tt> should be explicitly specified when
	      starting distccd from a script or in a non-interactive
	      ssh connection.
	    </p>
	  </descrip>
	</p>
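
	<p>
	  As a rough illustration, a standalone daemon might be started
	  as follows.  The niceness value and the file locations are
	  arbitrary choices for the example, not defaults.
	  <tscreen><verb>distccd --daemon --nice 10 --pid-file /tmp/distccd.pid --log-file=/tmp/distccd.log</verb></tscreen>
	  For inetd mode, an <tt>/etc/inetd.conf</tt> entry along these
	  lines could be used.  It assumes that a service named
	  <tt>distcc</tt> has been added to <tt>/etc/services</tt>, that
	  distccd is installed in <tt>/usr/bin</tt>, and that an
	  unprivileged <tt>nobody</tt> user exists; adjust all three to
	  suit your system.
	  <tscreen><verb>distcc stream tcp nowait nobody /usr/bin/distccd distccd --inetd</verb></tscreen>
	</p>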
      </sect>

      <sect>
	<heading>distccd Exit Codes</heading>

	<p>
	  distccd uses the same exit codes as distcc; see <ref id="exit">.
	</p>
      </sect>



      <sect>
	<heading>distccd Environment Variables</heading>
	<p>
	  <descrip>
	    <tag><tt>DISTCC_SAVE_TEMPS</tt></tag>
	    <p>
	      If set to <tt>1</tt>, temporary files are not deleted
	      after use.  Good for debugging or if your disks are too
	      empty.
	    </p>
	  </descrip>
	</p>
      </sect>
    </chapt>  <!-- End of "The distccd Server" -->



    <chapt>
      <heading>Cross compiling</heading>
      
      <p>
	Cross compilation means building programs to run on a
	machine with a different processor, architecture, or
	operating system from the one on which they were compiled.  distcc
	supports cross compilation, including teams of
	mixed-architecture machines, although some changes to the
	compilation commands may be required.
      </p>

      <p>
	The compilation command passed to distcc must be one that
	will execute properly on every volunteer machine to produce
	an object file of the appropriate type.  If the machines
	have different processors, then simply using <tt>distcc
	  cc</tt> will probably <bf>not</bf> work, because that will
	normally invoke the volunteer's native compiler.
      </p>

      <p>
	Machines with the same instruction set but different
	operating systems may not necessarily generate compatible .o
	files.  Empirically it seems that the native FreeBSD
	compiler generates object files compatible with Linux for C
	programs, but not for C++.  It may be a good idea to install
	a Linux cross compiler on BSD volunteers.
      </p>
      
      <p>
	Different versions of the compiler may generate incompatible
	object files.  This seems to be much more of a problem with
	C++ than with C, because the C++ ABI (application binary
	interface) has changed in recent years.  If you will be
	building C++ programs, it may be a good idea to install the
	same version of <tt>g++</tt> on all machines.
      </p>
      
      <p>
	Several different gcc configurations can be installed
	side-by-side on any machine.  If you build gcc from source,
	you should use the <tt>--program-suffix</tt> configuration
	option to cause it to be installed with a name that encodes
	the gcc version and the target platform.
      </p>
      
      <p>
	The recommended convention for the gcc name is
	<em>target</em><tt>-gcc-</tt><em>version</em>, such as
	<tt>i686-linux-gcc-3.2</tt>.  GCC 3.3 will install itself
	under this name, in addition to <em>target</em><tt>-gcc</tt>
	and, if it's native, <tt>gcc-</tt><em>version</em> and
	<tt>gcc</tt>.
      </p>

      <p>
	The compiler must be installed under the same name on the
	client and on every volunteer machine.
      </p>
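
      <p>
	For example, with a compiler installed under the name used
	above, a single job and a whole build might be run as
	follows.  This is a sketch: the source file is hypothetical,
	the <tt>-j</tt> value is arbitrary, and the second command
	assumes a Makefile that honours the <tt>CC</tt> variable.
	<tscreen><verb>distcc i686-linux-gcc-3.2 -c hello.c -o hello.o
make -j8 CC="distcc i686-linux-gcc-3.2"</verb></tscreen>
      </p>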

      <p>
	gcc also has <tt>-b</tt> and <tt>-V</tt> options to specify
	a target and version, but at the moment the gcc team
	recommend using a qualified compiler name instead.
      </p>

      <p>
	For more information on cross-compiling, see <em>Specifying
	  Target Machine and Compiler Version</em> in the gcc manual,
	and the gcc installation guide.
      </p>	
    </chapt> <!-- end of "Cross compiling" -->

    
    <chapt>
      <heading>distcc Internals</heading>
      
      <sect>
	<heading>
	  Protocol
	</heading>
	
	<p>
	  distcc uses a simple, application-specific protocol running
	  directly over a TCP socket.  A new request socket is opened
	  for each job.
	</p>

	<p>
	  The request and response begin with a magic number and
	  version number, allowing incompatible versions or
	  misconfigurations to be identified.  At the moment there is
	  only one deployed protocol version, and no attempt to
	  support backward or forward compatibility, though this could
	  be added in the future.
	</p>

	<p>
	  The request and response consist of tagged, length-preceded
	  elements.  Each element of the request contains a
	  four-character ASCII token, an eight-digit ASCII hexadecimal
	  length or value, and, depending on the tag, a byte stream
	  whose length is determined by the hexadecimal field.
	</p>
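
	<p>
	  The framing of a single element that carries a byte stream can
	  be sketched with the shell's <tt>printf</tt>.  The token
	  <tt>TOKN</tt> is made up for the illustration; the real token
	  names are defined in the distcc source.
	  <tscreen><verb>payload='gcc'
printf 'TOKN%08x%s' "${#payload}" "$payload"</verb></tscreen>
	  This writes <tt>TOKN00000003gcc</tt>: the token, the payload
	  length as eight hexadecimal digits, and then the payload itself.
	</p>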

	<p>
	  The complete request is sent to the server before the reply
	  begins.  Opening the TCP socket is performed concurrently
	  with execution of the preprocessor on the client.
	</p>

	<p>
	  The request from the client contains
	  
	  <enum>
	    <item>Magic number and version</item>
	    <item>Compiler command line</item>
	    <item>Preprocessed source code</item>
	  </enum>
	</p>

	
	<p>
	  The response from the server contains
	
	  <enum>
	    <item>Magic number and version</item>
	    <item>Compiler exit code &amp; status</item>
	    <item>Compiler error messages</item>
	    <item>Compiler stdout</item>
	    <item>Object file (if any)</item>
	  </enum>
	</p>

	<p>
	  Consult the source for more information.
	</p>
      </sect>


      <sect>
	<heading>
	  Working files
	</heading>

	<p>
	  distcc stores working files in a subdirectory of
	  <tt>/tmp</tt>.  These include synchronization files, and
	  compiler input/output temporary files.
	</p>
	<p>
	  Temporary files should normally be cleaned up when the
	  program exits.  If distcc misbehaves, these files may be
	  useful in tracking down the cause.  Any that remain can be
	  removed by the system's temporary file reaper, or by hand.
	</p>
      </sect>


      <sect>
	<heading>
	  Lock files
	</heading>

	<p>
	  distcc uses lock files to allow each client to balance its
	  jobs across available volunteer machines.  For each
	  volunteer host, a zero-length file is created.  Clients
	  using that volunteer hold a <tt>flock</tt> lock on the file
	  while running.
	</p>
      </sect>
    </chapt>
  </book>
</linuxdoc>