To set up distcc to be compatible with the widest range of existing software, create a "masquerade dir" of compiler links that will invoke distcc. When the distcc-in-disguise gets invoked, it invokes the real compiler of the same name either on the local client machine, or on a remote volunteer host.
For instance, you could create the directory named /usr/lib/distcc/bin and populate it with links (symlinks are slightly easier to maintain in the long run, but use hard links if you prefer them):
$ mkdir /usr/lib/distcc/bin
$ cd /usr/lib/distcc/bin
$ ln -s ../../../bin/distcc gcc
$ ln -s ../../../bin/distcc cc
$ ln -s ../../../bin/distcc g++
$ ln -s ../../../bin/distcc c++
Then, to use distcc, a user just needs to put the directory /usr/lib/distcc/bin early in the PATH (and have set the DISTCC_HOSTS environment variable) and distcc will handle the rest. Note that this masquerade dir must occur on the PATH earlier than the directory that contains the actual compilers of the same names, and that any auxiliary programs that these compilers call (such as "as" or "ld") must also be found on the PATH in a dir after the masquerade dir (since distcc calls out to the real compiler with a PATH value that has all dirs up to and including the masquerade dir trimmed off).
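The PATH trimming described above can be sketched in a few lines of shell. This is only an illustration of the rule, not distcc's actual code; the function name trim_path and the sample directories are hypothetical.

```shell
# Hypothetical helper: print the PATH that remains after dropping every
# entry up to and including the masquerade dir (first argument).
trim_path() (
    masq=$1
    rest=$2
    while [ -n "$rest" ]; do
        entry=${rest%%:*}            # first remaining PATH entry
        case $rest in
            *:*) rest=${rest#*:} ;;  # drop that entry and its colon
            *)   rest= ;;            # it was the last entry
        esac
        if [ "$entry" = "$masq" ]; then
            printf '%s\n' "$rest"
            exit 0
        fi
    done
    # Masquerade dir is not on the PATH: nothing to trim.
    printf '%s\n' "$2"
)

trim_path /usr/lib/distcc/bin "/home/me/bin:/usr/lib/distcc/bin:/usr/bin:/bin"
# prints: /usr/bin:/bin
```

The real compiler, and helpers such as "as" and "ld", must be reachable through the printed remainder.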
An alternate setup is to prefix the distcc command to compiler command lines so that it is called explicitly. This allows you to more easily control which things use distcc and which things don't, but can be more problematic when trying to use distcc with existing projects.
For example, to compile the standard application program:
distcc gcc -o hello.o -c hello.c
Standard Makefiles, including those using the GNU autoconf/automake system, use the $CC variable as the name of the C compiler to run and the $CXX variable as the name of the C++ compiler to run. In many cases, it is sufficient to just override one or both of these variables, either from the command line, or perhaps from your login script (if you wish to use distcc for all compilations). The following example sets both variables and takes advantage of the fact that distcc defaults to calling "gcc" if no other compiler name is provided:
make CC=distcc CXX='distcc g++'
Unfortunately, this setup sometimes leads to incompatibilities with packages that don't expect the compiler name to contain spaces or to projects that don't honor the above variables when deciding what compiler to use. For instance, the KDE package will fail to compile using the above CXX-munging idiom, but will compile just fine if you use a masquerade dir (which causes all executions of "g++" to really run distcc).
Options to distcc must precede the compiler name. Any arguments or options following the name of the compiler are passed through to the compiler.
--help
Print a detailed usage message and exit.
--version
Show distcc version and exit.
The way in which distcc runs the compiler is controlled by a few environment variables.
NOTE: Some versions of make do not export Make
variables as environment variables by default. Also,
assignments to variables within the Makefile may override
their definitions in the environment that calls make. The
most reliable method seems to be to set DISTCC_*
variables in the environment of Make, and to set CC
on the right-hand-side of the Make command line. For
example:
$ DISTCC_HOSTS='localhost wistful toey'
$ PATH="/usr/lib/distcc/bin:$PATH"
$ export DISTCC_HOSTS PATH
$ ./configure
$ make all
or:
$ DISTCC_HOSTS='localhost wistful toey'
$ export DISTCC_HOSTS
$ CC='distcc' ./configure
$ make CC='distcc' all
Some Makefiles may, contrary to convention, explicitly call gcc or some other compiler, in which case overriding $CC will not be enough to call distcc. While this is harmless (but suboptimal), using a masquerade dir of distcc links will avoid this.
Remember that you should not use both methods for calling distcc at the same time. If you are using a masquerade dir, don't munge CC and/or CXX (just put the dir early on your PATH). If you're not using a masquerade dir, you'll need to either change CC and/or CXX, or modify the Makefile(s) to call distcc explicitly.
DISTCC_HOSTS
Space-separated list of volunteer host specifications.
DISTCC_VERBOSE
If set to 1, distcc produces explanatory messages on the standard error stream. This can be helpful in debugging problems. Bug reports should include verbose output.
DISTCC_LOG
Log file to receive messages from distcc itself, rather than stderr.
DISTCC_SAVE_TEMPS
If set to 1, temporary files are not deleted after use. Good for debugging, or if your disks are too empty.
DISTCC_TCP_CORK
If set to 0, disable use of "TCP corks", even if they're present on this system. Using corks normally helps pack requests into fewer packets and aids performance.
Building a C or C++ program on Unix involves several phases:
Preprocessing source (.c) and headers (.h) to a preprocessed file (.i)
Compiling preprocessed source (.i) to assembly instructions (.s)
Assembling to an object file (.o)
Linking object files and libraries to produce an executable or library
distcc only ever runs the compiler and assembler remotely. The preprocessor must always run locally because it needs to access various header files on the local machine which may not be present, or may not be the same, on the volunteer. The linker similarly needs to examine libraries and object files, and so must run locally.
The compiler and assembler take only a single input file, the preprocessed source, and produce a single output, the object file. distcc ships these two files across the network and can therefore run the compiler/assembler remotely.
Fortunately, for most programs running the preprocessor is relatively cheap, and the linker is called relatively infrequently, so most of the work can be distributed.
distcc examines its command line to determine which of these phases are being invoked, and whether the job can be distributed. Here is an example of a typical command that can be preprocessed locally and compiled remotely:
distcc gcc -o hello.o -DGREETING="hello" -c hello.c
The command-line scanner is intended to behave in the same way as gcc. In case of doubt, distcc runs the job locally.
In particular, this means that commands that compile and link in one go cannot be distributed. These are quite rare in realistic projects. Here is one example of a command that could not be distributed, because it calls both the compiler and the linker:
distcc gcc -o hello hello.c
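The distinction between the two example commands can be illustrated with a deliberately simplified check. This sketch looks only at -c and -E and assumes nothing else matters; distcc's real command-line scanner handles many more cases. The function name can_distribute is hypothetical.

```shell
# Simplified illustration: treat a job as a candidate for remote
# compilation only if it stops before linking (-c) and is not a
# preprocess-only run (-E). Anything else is run locally.
can_distribute() (
    compile_only=no
    for arg in "$@"; do
        case $arg in
            -E) echo local; exit 0 ;;   # preprocessing always runs locally
            -c) compile_only=yes ;;
        esac
    done
    if [ "$compile_only" = yes ]; then
        echo remote
    else
        echo local                      # would also invoke the linker
    fi
)

can_distribute gcc -o hello.o -DGREETING="hello" -c hello.c   # prints: remote
can_distribute gcc -o hello hello.c                           # prints: local
```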
Moving source across the network is less efficient than compiling it locally. If you have access to a machine much faster than your workstation, the performance gain may outweigh the cost of transferring the source code and it may be quicker to ship all your source across the network to compile it there.
In general, it is even better to compile on two or more machines in parallel. Any number of invocations of distcc can run at the same time, and they will distribute their work across the available hosts.
distcc does not manage parallelization, but relies on Make or some other build system to invoke compiles in parallel.
With GNU Make, you should use the -j option to specify a number of parallel tasks slightly higher than the number of available hosts. For example:
$ export DISTCC_HOSTS='angry toey wistful localhost'
$ make -j5
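If the host list changes often, the -j value can be derived from it rather than typed by hand. The sketch below assumes "slightly higher" means one more than the number of hosts, which matches the example above; the function name suggest_jobs is hypothetical.

```shell
# Hypothetical helper: count the whitespace-separated entries in a
# host list and suggest one more parallel task than there are hosts.
suggest_jobs() (
    set -f              # do not glob-expand host specifications
    set -- $1           # split the list on whitespace
    echo $(( $# + 1 ))
)

DISTCC_HOSTS='angry toey wistful localhost'
echo "make -j$(suggest_jobs "$DISTCC_HOSTS")"   # prints: make -j5
```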
The $DISTCC_HOSTS variable tells distcc which volunteer machines are available to run jobs. This is a space-separated list of host specifications, each of which has the syntax:
HOSTNAME[/MAX_JOBS][:PORT]
You can specify the maximum number of jobs that the host should receive by affixing a number after a slash (e.g. "localhost/2").
A numeric TCP port may optionally be specified after a colon. If no port is specified, it uses the default, which is currently 3632.
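To make the syntax concrete, here is a small sketch that splits one host specification into its parts using shell parameter expansion. It is not distcc's parser: it does no validation and assumes the hostname itself contains neither a slash nor a colon. The function name parse_hostspec is hypothetical.

```shell
# Hypothetical helper: split HOSTNAME[/MAX_JOBS][:PORT] into its parts.
parse_hostspec() (
    spec=$1
    port=3632           # default port, as stated in the text
    jobs=               # empty: no explicit job limit was given
    case $spec in
        *:*) port=${spec##*:}; spec=${spec%:*} ;;
    esac
    case $spec in
        */*) jobs=${spec##*/}; spec=${spec%/*} ;;
    esac
    printf 'host=%s jobs=%s port=%s\n' "$spec" "${jobs:-unset}" "$port"
)

parse_hostspec localhost/2    # prints: host=localhost jobs=2 port=3632
parse_hostspec toey/4:4000    # prints: host=toey jobs=4 port=4000
parse_hostspec wistful        # prints: host=wistful jobs=unset port=3632
```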
If only one invocation of distcc runs at a time, it will always execute on the first host in the list. (This behaviour is not absolutely guaranteed, however, and may change in future versions.)
The name localhost is handled specially by running the compiler in place.
The daemon may be tested on localhost by setting DISTCC_HOSTS=127.0.0.1. Although localhost causes distcc to execute the job directly, using an IP address will cause it to make a TCP connection to a daemon on localhost. This is slower, but useful for testing.
When distcc is invoked, it needs to decide which of the volunteers in DISTCC_HOSTS should be used to compile a job. It uses a simple heuristic to try to spread load across machines appropriately.
You can imagine all of the compile machines as being leaky buckets, some with larger holes (faster CPUs) than others. The distcc client tries to keep water at the same level on each one (the same number of jobs running), preferring hosts occurring earlier in DISTCC_HOSTS. Over the course of a build, the faster machines will complete jobs more quickly, and therefore be topped up more quickly and do more work overall, but without the client ever actually needing to know which one is fastest.
This design has the advantage of not requiring the client to know in advance the speeds of the volunteers, and being quite simple to implement. It copes quite well with machines that are temporarily slowed down: they are just topped-up more slowly in the future.
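The selection rule in the leaky-bucket picture can be sketched as follows. This illustrates only the "lowest level, earliest host wins ties" idea; how the client learns the number of running jobs is left out, and the function name pick_host and the host=count input format are assumptions made for the example.

```shell
# Hypothetical helper: given "host=running_jobs" pairs in DISTCC_HOSTS
# order, print the host with the fewest running jobs. The strict
# less-than comparison means earlier hosts win ties.
pick_host() (
    best=
    best_load=
    for pair in "$@"; do
        host=${pair%%=*}
        load=${pair#*=}
        if [ -z "$best" ] || [ "$load" -lt "$best_load" ]; then
            best=$host
            best_load=$load
        fi
    done
    printf '%s\n' "$best"
)

pick_host angry=2 toey=1 wistful=1   # prints: toey
```

A faster machine finishes jobs sooner, so its count drops sooner and it is picked more often, without any speed ever being measured.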
Scheduling is coordinated between different invocations of the distcc client by lockfiles in the temporary directory. There is no coordination between clients running as different users, on different hosts, or with different TMPDIR paths.
On Linux, scheduling slightly too many jobs on any machine is quite harmless, as long as the number is not so high that the machine begins thrashing. So it's OK to provide a -j number substantially higher than the number of available processors.
The biggest problem with this design is that it handles multiprocessor machines poorly: they probably ought to have jobs scheduled proportional to the number of processors. At the moment, the best thing is to run with a -j factor equal to the product of the maximum number of CPUs in any machine (MAX_CPUS) and the number of machines. This should make sure that roughly MAX_CPUS tasks run on every machine at all times, and will therefore keep all CPUs loaded, but will cause excessive task-switching on machines with fewer CPUs. Task switching is not very expensive on Linux so it is not a big problem, but it does lose a few percentage points of speed. This should be fixed in a future release.
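The suggested -j factor for mixed machines is simple arithmetic, shown here as a sketch. The function name jobs_for_cpus is hypothetical, and the CPU counts are supplied by hand because nothing in this section describes a way to discover them.

```shell
# Hypothetical helper: given the CPU count of each machine, print
# MAX_CPUS multiplied by the number of machines.
jobs_for_cpus() (
    max=0
    for n in "$@"; do
        if [ "$n" -gt "$max" ]; then
            max=$n
        fi
    done
    echo $(( max * $# ))
)

jobs_for_cpus 1 2 4   # three machines, MAX_CPUS is 4; prints: 12
```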
Error messages or warnings from local or remote compilers are passed through to diagnostic output on the client. The compiler takes all file names and line numbers from pragmas in the preprocessed output, so error messages will always have the correct pathnames for files on the client.
distcc can supply extensive debugging information when the verbose option is used. This is controlled by the $DISTCC_VERBOSE environment variable on the client, and the --verbose option on the server.
By default, distcc prints diagnostic messages to stderr. Sometimes these are too intrusive into the output of the regular compiler, and so they may be selectively redirected by setting the $DISTCC_LOG environment variable to a filename.
Error messages from the daemon are sent both to the log file on the volunteer and back to the client's diagnostic output. (By default, the daemon logs to the syslog daemon channel.) If compilation is failing, please examine the log file on the relevant volunteer machine.
The exit code of distcc is normally that of the compiler: zero for successful compilation and non-zero otherwise.
If distcc fails to distribute a job to a selected volunteer machine, it will try to run the compiler locally on the client. distcc only tries a single remote machine for each job.
distcc tries to distinguish between a failure to distribute the job, and a "genuine" failure of the compiler on the remote machine, for example because of a syntax error in the program. In the second case, distcc does not re-run the compiler locally, and returns the same exit code as the remote compiler.
If the compiler exits with a signal, distcc returns an exit code of 128 plus the signal number, following Unix convention.
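A build script can use this convention to tell a crash from an ordinary failure. The sketch below assumes that any status above 128 denotes a signal; the function name describe_status is hypothetical.

```shell
# Hypothetical helper: interpret an exit status using the
# "128 plus signal number" convention.
describe_status() (
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $(( status - 128 ))"
    else
        echo "failed with exit code $status"
    fi
)

describe_status 0     # prints: success
describe_status 1     # prints: failed with exit code 1
describe_status 139   # prints: killed by signal 11
```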
If distcc fails to run the compiler, it may return one of the following error codes. These are also used by distccd.
EXIT_DISTCC_FAILED
Generic or unspecified failure in distcc.
EXIT_BIND_FAILED
Failed to bind and listen on network socket. Port may already be in use.
EXIT_CONNECT_FAILED
Failed to establish network connection or listen on socket. The host may be invalid or unreachable, or there may be no daemon listening.
EXIT_COMPILER_CRASHED
The underlying compiler exited because of a signal. This probably indicates a compiler bug, or a problem with the hardware or OS on the server. (Obsolete in 0.13.)
EXIT_OUT_OF_MEMORY
distcc ran out of memory.
EXIT_BAD_HOSTSPEC
$DISTCC_HOSTS was undefined, empty, or syntactically invalid. (At the moment, you should never see this code because distcc will fall back to building locally. Let me know if you would prefer a hard error.)
EXIT_IO_ERROR
There was a disk or network IO error while distributing the job. For example, the network may be failing, or a disk on either end may be out of space.
EXIT_TRUNCATED
The network socket was closed unexpectedly. This probably indicates a network problem or distcc bug.
EXIT_PROTOCOL_ERROR
The distcc internal network protocol was not followed by the remote program. This probably indicates a network problem or distcc bug.
EXIT_COMPILER_MISSING
The specified compiler was not found in the path of the server.
EXIT_RECURSION
distcc ended up invoking itself recursively. This is caught as an error to prevent unbounded recursion or inefficiency. For example, this error may occur if distcc is called by the cc command in the daemon's path.
distcc transfers only the binary contents of source, error, and object files, without any concern for metadata, attributes, character sets or end-of-line conventions.
distcc never transmits file times across the network or modifies them, and so should not care whether the clocks on the client and volunteer machines are synchronized or not. When an object file is received onto the client, its modification time will be the current time on the client machine.