description of distcc protocol version 1 Copyright (C) 2003 by Martin Pool disclaimer ---------- This document is provided as explanation for people developing or debugging distcc. Discrepancies between this document and the distcc code are an error in the document. This document is intended to describe distcc 2.0. Because this program and document is licensed free of charge, there is no warranty of any kind, to the extent permitted by applicable law. If anything is unclear, please ask on the mailing list. purpose ------- The distcc protocol allows a compiler command line plus C preprocessed source code to be transmitted from a client computer to a server, where it is compiled. The results (object code, exit status, and/or error messages) are returned to the client. protocol versioning ------------------- This document describes version 1 of the protocol, which is used by distcc releases up to and including 2.1. Future versions of the protocol may vary in any way. The client begins its request by identifying the protocol version that it will use. The server must either respond using the requested protocol version, or respond with an error in the appropriate protocol version, or drop the connection. There is no mechanisms for negotiation of versions. It is envisaged that when software using a new protocol is released, users will upgrade both clients and servers at the same time. Since distcc is only used within organizations rather than on the internet this should be realistic. foundations ----------- The distcc protocol runs over a bidirectional non-delimited byte channel. This is normally a TCP connection, although operation over the OpenSSH secure shell is also supported. Each compilation of a file ("job") corresponds to one invocation of the client program and connection. control flow ------------ The overall structure of the protocol is very simple: 1: The client opens a connection, which is accepted by the server. 2: The client sends a request. 3: The server sends a response. 4: The client and server mutually close the connection. connections ----------- Connections are initiated by the client. TCP connections are opened by connecting to the server port, normally 3632. SSH connections are initiated by running the command "distccd --inetd" over SSH. Either party (client or server) may drop the connection at any point, in which case the job is discarded by both parties. In particular, the server drops the connection if the client's IP address is not allowed to connect, or if the client makes a protocol error. packets ------- The request and response both consist of a series of packets. Each packet consists of two or three parts: token -- four ascii bytes, defining the type of the packet parameter -- eight ascii hexadecimal digits, corresponding to a 32-bit unsigned quantity body -- optionally present, depending on the token The sequence in which tokens are sent by both client and server is always fixed and is specified by this document. If the type of packet (determined by the token) requires a body, then the length of the body is given by the parameter. Otherwise, the meaning of the parameter depends on the token. Closing the channel has no meaning and does not indicate the end of a packet, although it should correspond with the end of the last packet. An early close indicates an error. [note: Because the sequence of tokens never varies they are strictly redundant: each party could just read the parameters and bodies in order and know what they mean from context. Sending the tokens provides a check that both parties understand the data to mean the same thing and guards against code or network errors. It also aids any person examining the network stream.] [note: Sending the parameter as hexadecimal is redundant, because four bytes would be sufficient to encode the 32-bit integer. Sending as hexadecimal serves as a futher check that the client and server are synchronized. Incorrect values for the parameter are detected as a program or network error.] [note: The fixed length of the token and length-preceded body allows IO with a small number of system calls: the header can be read in exactly twelve bytes, and the body can be read in a single chunk without needing to parse its contents.] [note: The length-preceded format means that data cannot be streamed out until the process that produced it is complete and the length of the data is known.] request ------- The following packets comprise the request: DIST Greeting from client, followed by the protocol version, which is always 1. ARGC Specifies the number of arguments in the command, including the compiler name. ARGV Repeated times. Specifies the string value of one element of the command. DOTI The contents of the preprocessed source (.i) file. response -------- The following packets comprise the response: DONE Introduces the response. The version must be 1. STAT Gives the Unix wait status of the compiler. This 0 for success, or otherwise (EXITCODE << 8) | (TERMSIGNAL), depending on whether the compiler returned an error or was terminated by a signal. (See or a POSIX standard for a full definition.) SERR Contains messages sent to stderr by the compiler. SOUT Contains messages sent to stdout by the compiler. Normally empty -- errors and warnings for most compilers go to stderr instead. DOTO Contains the object file (.o) produced by the compiler. In very early versions, this is missing if the status code is not zero. In versions up to and including 2.7.1, zero-length output files are not supported. If compilation fails, then the server sends the file with a length of zero, but the client does not read it. From 2.8, things work as follows: regular success: status zero length nonzero succeeded, but zero-byte output: status zero length zero output file is created/touched failed: status nonzero length sent as zero This is a little complex but is compatible with earlier releases. This does mean we cannot represent a compiler that produces output and then fails, but I don't think that is very important.