implementor.texinfo [plain text]

\input texinfo @c -*-texinfo-*-
@c
@c Note: the above texinfo file must include the "doubleleftarrow"
@c definitions added by jcb.
@c %**start of header
@c guide
@setfilename krb5-implement.info
@settitle Kerberos V5 Installation Guide
@setchapternewpage odd                  @c chapter begins on next odd page
@c @setchapternewpage on                   @c chapter begins on next page
@c @smallbook                              @c Format for 7" X 9.25" paper
@c %**end of header

@paragraphindent 0
@iftex
@parskip 6pt plus 6pt
@end iftex

@include definitions.texinfo
@set EDITION b7-1

@finalout                               @c don't print black warning boxes

@titlepage
@title @value{PRODUCT} Implementor's Guide
@subtitle Release:  @value{RELEASE}
@subtitle Document Edition:  @value{EDITION}
@subtitle Last updated:  @value{UPDATED}
@author @value{COMPANY}

@page
@vskip 0pt plus 1filll

@iftex
@include copyright.texinfo
@end iftex
@end titlepage

@node Top, Introduction, (dir), (dir)
@comment  node-name,  next,  previous,  up

@ifinfo
This file contains internal implementor's information for the
@value{RELEASE} release of @value{PRODUCT}.  

@include copyright.texinfo

@end ifinfo

@c The master menu is updated using emacs19's M-x texinfo-all-menus-update
@c function.  Don't forget to run M-x texinfo-every-node-update after
@c you add a new section or subsection, or after you've rearranged the
@c order of sections or subsections.  Also, don't forget to add an @node
@c comand before each @section or @subsection!  All you need to enter
@c is:
@c
@c @node New Section Name

@c @section New Section Name
@c
@c M-x texinfo-every-node-update will take care of calculating the
@c node's forward and back pointers.
@c
@c ---------------------------------------------------------------------

@menu
* Introduction::                
* Socket API::                  
* IPv6 Support::                
* Local Addresses::             
* Host Address Lookup::         
* Thread Safety::               
* Shared Libraries::            
@end menu

@node Introduction, Socket API, Top, Top
@chapter Introduction

This file contains internal implementor's information for
@value{PRODUCT}.  It is currently contains information that was removed
from install.texi; eventually it will have more detailed information on
the internals of the @value{PRODUCT}.

@node Socket API, IPv6 Support, Introduction, Top
@chapter Socket API

Someone should describe the API subset we're allowed to use with
sockets, how and when to use @code{SOCKET_ERRNO}, @i{etc}.

Note that all new code doing hostname and address translation should
use @code{getaddrinfo} and friends.  (@xref{Host Address Lookup}.)

@node IPv6 Support, Local Addresses, Socket API, Top
@chapter IPv6 Support

Most of the IPv6 support is keyed on the macro @code{KRB5_USE_INET6}.
If this macro is not defined, there should be no references to
@code{AF_INET6}, @code{struct sockaddr_in6}, @i{etc}.

The @code{configure} scripts will check for the existence of various
functions, macros and structure types to decide whether to enable the
IPv6 support.  You can also use the @samp{--enable-ipv6} or
@samp{--disable-ipv6} options to override this decision.

Regardless of the setting of @code{KRB5_USE_INET6}, some aspects of
the new APIs devised for IPv6 are used throughout the code, because it
would be too difficult maintain code for the IPv6 APIs and for the old
APIs at the same time.  But for backwards compatibility, we try to
fake them if the system libraries don't provide them, at least for
now.  This means we sometimes use slightly modified versions of the
APIs, but we try to keep the modifications as non-intrusive as
possible.  Macros are used to rename struct tags and function names,
so don't @code{#undef} any of these names.

@table @code

@item getaddrinfo
@itemx getnameinfo
@itemx freeaddrinfo
@itemx gai_strerror
@itemx struct addrinfo
Always include the header file @code{fake-addrinfo.h} before using
these.  If the native system doesn't provide them, the header file
will, using static functions that will call @code{gethostbyname} and
the like in the native libraries.  (This also happens to be the way
the Winsock 2 headers work, depending on some of the predefined macros
indicating the target OS version.)

We also provide ``wrapper'' versions on some systems where a native
implementation exists but the data it returns is broken in some way.

So these may not always be thread-safe, and they may not always
provide IPv6 support, but the API will be consistent.

@item struct sockaddr_storage
@itemx socklen_t
These are provided by @code{socket-utils.h}, if the native headers
don't provide them.  @code{sockaddr_storage} contains a
@code{sockaddr_in}, so by definition it's big enough to hold one; it
also has some extra padding which will probably make it big enough to
hold a @code{sockaddr_in6} if the resulting binary should get run on a
kernel with IPv6 support.

Question: Should these simply be moved into @code{port-sockets.h}?

@end table

IRIX 6.5.7 has no IPv6 support.  Of the systems most actively in the
MIT's Athena environment (used by MIT's Kerberos UNIX developers),
this is the only one without built-in IPv6 support.  In another year
or so we probably won't be using those systems any more, and we may
consider dropping support for systems without IPv6 support.

Somewhere between IRIX 6.5.14 and 6.5.16, partial IPv6 support was
introduced to the extent that the configuration system detects the
IPv6 support and attempts to use it. Code compiles, but then upon
linking, one discovers that ``in6addr_any'' is not defined in any
system library. A work around the header file @code{fake-addrinfo.h}
is provided by providing a static copy. This run time IPv6 code has
still not been tested.

Some utility functions or macros are also provided to give a
convenient shorthand for some operations, and to retain compile-time
type checking when possible (generally using inline functions but only
when compiling with GCC).

@table @code

@item socklen(struct sockaddr *)
Returns the length of the @code{sockaddr} structure, by looking at the
@code{sa_len} field if it exists, or by returning the known sizes of
@code{AF_INET} and @code{AF_INET6} address structures.

@item sa2sin(struct sockaddr *)
@itemx sa2sin6(struct sockaddr *)
@itemx ss2sa(struct sockaddr_storage *)
@itemx ss2sin(struct sockaddr_storage *)
@itemx ss2sin6(struct sockaddr_storage *)
Pointer type conversions.  Use these instead of plain casts, to get
type checking under GCC.

@end table

@node Local Addresses, Host Address Lookup, IPv6 Support, Top
@chapter Local Addresses

(Last update: 2002-03-13.)

Different systems have different ways of finding the local network
addresses.

On Windows, @code{gethostbyname} is called on the local host name to get a
set of addresses.  If that fails, a UDP socket is ``connected'' to a
particular IPv4 address, and the local socket name is retrieved, its
address being treated as the one local network address.  Future
versions of the Windows code should be able to actually examine local
interfaces.

On Mac OS 9 and earlier, a Mac-specific interface is used to look up
local addresses.  Presumably, on Mac OS X we'll use that or the
general UNIX code.

On (most?) UNIX systems, there is an @code{ioctl} called
@code{SIOCGIFCONF} which gets interface configuration information.
The behavior of this @code{ioctl} varies across UNIX systems though.
It takes as input a buffer to fill with data structures, but if the
buffer isn't big enough, the behavior isn't well defined.  Sometimes
you get an error, sometimes you get incomplete data.  Sometimes you
get a clear indication that more space was needed, sometimes not.  A
couple of systems have additional @code{ioctl}s that can be used to
determine or at least estimate the correct size for the buffer.
Solaris has introduced @code{SIOCGLIFCONF} for querying IPv6
addresses, and restricts @code{SIOCGIFCONF} to IPv4 only.  (** We
should actually check if that's true.)

We (Ken Raeburn in particular) ran some tests on various systems to
see what would happen with buffers of various sizes from much smaller
to much larger than needed for the actual data.  The buffers were
filled with specific byte values, and then checked to see how much of
the buffer was actually written to.  The "largest gap" values listed
below are the largest number of bytes we've seen left unused at the
end of the supplied buffer when there were more entries to return.
These values may of coures be dependent on the configurations of the
particular systems we wre testing with.  (See
@code{lib/krb5/os/t_gifconf.c} for the test program.)

NetBSD 1.5-alpha: The returned @code{ifc_len} is the desired amount of
space, always.  The returned list may be truncated if there isn't
enough room; no overrun.  Largest gap: 43.  However, NetBSD has
@code{getifaddrs}, which hides all the ugliness within the C library.

BSD/OS 4.0.1 (courtesy djm): The returned @code{ifc_len} is equal to
or less than the supplied @code{ifc_len}.  Sometimes the entire buffer
is used; sometimes N-1 bytes; occasionally, the buffer must have quite
a bit of extra room before the next structure will be added.  Largest
gap: 39.

Solaris 7,8: Return @code{EINVAL} if the buffer space is too small for
all the data to be returned, including when @code{ifc_len} is 0.
Solaris is the only system I've found so far that actually returns an
error.  No gap.  However, @code{SIOCGIFNUM} may be used to query the
number of interfaces.

Linux 2.2.12 (Red Hat 6.1 distribution, x86), 2.4.9 (RH 7.1, x86): The
buffer is filled in with as many entries as will fit, and the size
used is returned in @code{ifc_len}.  The list is truncated if needed,
with no indication.  Largest gap: 31.  @emph{However}, this interface
does not return any IPv6 addresses.  They must be read from a file
under @code{/proc}.  (This appears to be what the @samp{ifconfig}
program does.)

IRIX 6.5.7: The buffer is filled in with as many entries as will fit
in N-1 bytes, and the size used is returned in @code{ifc_len}.
Providing exactly the desired number of bytes is inadequate; the
buffer must be @emph{bigger} than needed.  (E.g., 32->0, 33->32.)  The
returned @code{ifc_len} is always less than the supplied one.  Largest
gap: 32.

AIX 4.3.3: Sometimes the returned @code{ifc_len} is bigger than the
supplied one, but it may not be big enough for @emph{all} the
interfaces.  Sometimes it's smaller than the supplied value, even if
the returned list is truncated.  The list is filled in with as many
entries as will fit; no overrun.  Largest gap: 143.

Older AIX: We're told by W. David Shambroom (DShambroom@@gte.com) in
PR krb5-kdc/919 that older versions of AIX have a bug in the
@code{SIOCGIFCONF} @code{ioctl} which can cause them to overrun the
supplied buffer.  However, we don't yet have details as to which
version, whether the overrun amount was bounded (e.g., one
@code{ifreq}'s worth) or not, whether it's a real buffer overrun or
someone assuming it was because @code{ifc_len} was increased, etc.
Once we've got details, we can try to work around the problem.

Digital UNIX 4.0F: If input @code{ifc_len} is zero, return an
@code{ifc_len} that's big enough to include all entries.  (Actually,
on our system, it appears to be larger than that by 32.)  If input
@code{ifc_len} is nonzero, fill in as many entries as will fit, and
set @code{ifc_len} accordingly.  (Tested only with buffer previously
filled with zeros.)

Tru64 UNIX 5.1A: Like Digital UNIX 4.0F, except the ``extra'' space
indicated when the input @code{ifc_len} is zero is larger.  (We got
400 out when 320 appeared to be needed.)

So... if the returned @code{ifc_len} is bigger than the supplied one,
we'll need at least that much space -- but possibly more -- to hold
all the results.  If the returned value is smaller or the same, we may
still need more space.

The heuristic we're using on most systems now is to keep growing the
buffer until the unused space is larger than an @code{ifreq} structure
by some safe margin.

@node Host Address Lookup, Thread Safety, Local Addresses, Top
@chapter Host Address Lookup

The traditional @code{gethostbyname} function is not thread-safe, and
does not support looking up IPv6 addresses, both of which are becoming
more important.  New standards have been in development that should
address both of these problems.  The most promising is
@code{getaddrinfo} and friends, which is part of the Austin Group and
UNIX 98(?) specifications.  Code in the MIT tree is gradually being
converted to use this interface.

@quotation
(Question: What about @code{inet_ntop} and @code{inet_pton}?  We're
not using them at the moment, but some bits of code would be
simplified if we were to do so, when plain addresses and not socket
addresses are already presented to us.)
@end quotation

The @code{getaddrinfo} function takes a host name and service name and
returns a linked list of structures indicating the address family,
length, and actual data in ``sockaddr'' form.  (That is, it includes a
pointer to a @code{sockaddr_in} or @code{sockaddr_in6} structure.)
Depending on options set via the @code{hints} input argument, the results
can be limited to a single address family (@i{e.g.}, for IPv4
applications), and the canonical name of the indicated host can be
returned.  Either the host or service can be a null pointer, in which
case only the other is looked up; they can also be expressed in
numeric form.  This interface is extensible to additional address
families in the future.  The returned linked list can be freed with
the @code{freeaddrinfo} function.

The @code{getnameinfo} function does the reverse -- given an address
in ``sockaddr'' form, it converts the address and port values into
printable forms.

Errors returned by either of these functions -- as return values, not
global variables -- can be translated into printable form with the
@code{gai_strerror} function.

Some vendors are starting to implement @code{getaddrinfo} and friends,
however, some of the implementations are deficient in one way or
another.

@table @asis

@item AIX
As of AIX 4.3.3, @code{getaddrinfo} returns sockaddr structures
without the family and length fields filled in.

@item GNU libc
The GNU C library, used on GNU/Linux systems, has had a few problems
in this area.  One version would drop some IPv4 addresses for some
hosts that had multiple IPv4 and IPv6 addresses.

In GNU libc 2.2.4, when the DNS is used, the name referred to by PTR
records for each of the addresses is looked up and stored in the
@code{ai_canonname} field, or the printed numeric form of the address
is, both of which are wrong.

@item IRIX
No known bugs here, but as of IRIX 6.5.7, the version we're using at
MIT, these functions had not been implemented.

@item NetBSD
As of NetBSD 1.5, this function is not thread-safe.  In 1.5X
(intermediate code snapshot between 1.5 and 1.6 releases), the
@code{ai_canonname} field can be empty, even if the
@code{AI_CANONNAME} flag was passed.  In particular, this can happen
if a numeric host address string is provided.  Also, numeric service
names appear not to work unless the stream type is given; specifying
the TCP protocol is not enough.

@item Tru64 UNIX
In Tru64 UNIX 5.0, @code{getaddrinfo} is available, but requires that
@code{<netdb.h>} be included before its use; that header file defines
@code{getaddrinfo} as a macro expanding to either @code{ogetaddrinfo}
or @code{ngetaddrinfo}, and apparently the symbol @code{getaddrinfo}
is not present in the system library, causing the @code{configure}
test for it to fail.  Technically speaking, I [Ken] think Compaq has
it wrong here, I think the symbol is supposed to be available even if
the application uses @code{#undef}, but I have not confirmed it in the
spec.

@item Windows
According to Windows documentation, the returned @code{ai_canonname}
field can be null even if the @code{AI_CANONNAME} flag is given.

@end table

For most systems where @code{getaddrinfo} returns incorrect data,
we've provided wrapper versions that call the system version and then
try to fix up the returned data.

For systems that don't provide these functions at all, we've provided
replacement versions that neither are thread-safe nor support IPv6,
but will allow us to convert the rest of our code to assume the
availability of @code{getaddrinfo}, rather than having to use two
branches everywhere, one for @code{getaddrinfo} and one for
@code{gethostbyname}.  These replacement functions do use
@code{gethostbyname} and the like; for some systems it would be
possible to use @code{gethostbyname2} or @code{gethostbyname_r} or
other such functions, to provide thread safety or IPv6 support, but
this has not been a priority for us, since most modern systems have
these functions anyways.  And if they don't, they probably don't have
real IPv6 support either.

Including @code{fake-addrinfo.h} will enable the wrapper or
replacement versions when needed.  Depending on the system
configuration, this header file may define several static functions
(and declare them @code{inline} under GNU C), and leave it to the
compiler to discard any unused code.  This may produce warnings on
some systems, and if the compiler isn't being too clever, may cause
several kilobytes of excess storage to be consumed on these backwards
systems.

Do not assume that @code{ai_canonname} will be set when the
@code{AI_CANONNAME} flag is set.  Check for a null pointer before
using it.

@node Thread Safety, Shared Libraries, Host Address Lookup, Top
@chapter Thread Safety

Hahahahahaha...  We're not even close.

We have started talking about it, though.  Some stuff is ``kind of''
thread safe because it operates on a @code{krb5_context} and we simply
assert that a context can be used only in one thread at a time.  But
there are places where we use unsafe C library functions, and a few
places where we have modifiable static data in the libraries.

Even if the Kerberos or C library functions aren't using static data
themselves, there are other instances of per-process data that have to
be dealt with before our library can become thread-safe.  For example,
file locking with UNIX @code{flock()} is on a per-process basis;
for a single thread to be able to lock a file against accesses from
other threads, we'll have to implement per-thread locks for files on
top of the operating system per-process locks, and that means a global
(per-process) table listing all the locks.  So it seems unlikely that
we will find an approach that eliminates all static modifiable data
from the library.

A rough proposal for hooks for implementing locking was put forth, and
an IBM Linux group is experimenting with a trial implementation of it,
with a few changes.  A few issues with the proposal have been
discussed on the @samp{krbdev} mailing list, and you can find the
discussion in the list archives.

@node Shared Libraries,  , Thread Safety, Top
@chapter Shared Libraries

(These sections are old -- they should get updated.)

@menu
* Shared Library Theory::       
* Operating System Notes for Shared Libraries::  
@end menu

@node Shared Library Theory, Operating System Notes for Shared Libraries, Shared Libraries, Shared Libraries
@section Theory of How Shared Libraries are Used

An explanation of how shared libraries are implemented on a given
platform is too broad a topic for this manual. Instead this will touch
on some of the issues that the Kerberos V5 tree uses to support version
numbering and alternate install locations.

Normally when one builds a shared library and then links with it, the
name of the shared library is stored in the object
(i.e. libfoo.so). Most operating systems allows one to change name that
is referenced and we have done so, placing the version number into the
shared library (i.e. libfoo.so.0.1). At link time, one would reference
libfoo.so, but when one executes the program, the shared library loader
would then look for the shared library with the alternate name.  Hence
multiple versions of shared libraries may be supported relatively
easily. @footnote{Under AIX for the RISC/6000, multiple versions of
shared libraries are supported by combining two or more versions of the
shared library into one file.  The Kerberos build procedure produces
shared libraries with version numbers in the internal module names, so
that the shared libraries are compatible with this scheme.
Unfortunately, combining two shared libraries requires internal
knowledge of the AIX shared library system beyond the scope of this
document.  Practicallyspeaking, only one version of AIX shared libraries
can be supported on a system, unless the multi-version library is
constructed by a programmer familiar with the AIX internals.}

All operating systems (that we have seen) provide a means for programs
to specify the location of shared libraries. On different operating
systems, this is either specified when creating the shared library, and
link time, or both.@footnote{Both are necessary sometimes as the shared
libraries are dependent on other shared libraries} The build process
will hardwire a path to the installed destination.

@node Operating System Notes for Shared Libraries,  , Shared Library Theory, Shared Libraries
@section Operating System Notes for Shared Libraries

From time to time users or developers suggest using GNU @code{Libtool}
or some other mechanism to  generate shared libraries.  Experience
with other packages suggests that Libtool tends to be difficult to
debug and when it works incorrectly, patches are required to generated
scripts to work around problems.  So far, the Kerberos shared library
build mechanism, which sets a variety of makefile variables based on
operating system type and then uses those variables in the build
process has proven to be easier to debug and adequate to the task of
building shared libraries for Kerberos.


@menu
* NetBSD Shared Library Support::  
* AIX Shared Library Support::  
* Solaris  Shared Library Support::  
* Alpha OSF/1 Shared Library Support::  
@end menu

@node NetBSD Shared Library Support, AIX Shared Library Support, Operating System Notes for Shared Libraries, Operating System Notes for Shared Libraries
@subsection NetBSD Shared Library Support

XXX I think this is horribly out of date and reflects pre-elf NetBSD.

Shared library support has been tested under NetBSD 1.0A using 
GCC 2.4.5. Due to the vagaries of the loader in the operating system,
the library load path needs to be specified in building libraries and in
linking with them. Unless the library is placed in a standard location
to search for libraries, this may make it difficult for developers to
work with the shared libraries.

@node AIX Shared Library Support, Solaris  Shared Library Support, NetBSD Shared Library Support, Operating System Notes for Shared Libraries
@subsection AIX Shared Library Support

        AIX specifies shared library versions by combining multiple
versions into a single file.  Because of the complexity of this process,
no automatic procedure for building multi-versioned shared libraries is
provided. Therefore, supporting multiple versions of the Kerberos shared
libraries under AIX will require significant work on the part of a
programmer famiiliar with AIX internals.  

        AIX allows a single library to be used both as a static library
and as a shared library.  For this reason, the @samp{--enable-shared}
switch to configure builds only shared libraries.  On other operating
systems, both shared and static libraries are built when this switch is
specified.  As with all other operating systems, only non-shared static
libraries are built when @samp{--enable-shared} is not specified.

        The AIX 3.2.5 linker dumps core trying to build a shared
@samp{libkrb5.a} produced with the GNU C compiler.  The native AIX
compiler works fine.  In addition, the AIX 4.1 linker is able to build a
shared @samp{libkrb5.a} when GNU C is used.


@node Solaris  Shared Library Support, Alpha OSF/1 Shared Library Support, AIX Shared Library Support, Operating System Notes for Shared Libraries
@subsection Solaris  Shared Library Support

Shared library support only works when using the Sunsoft C
compiler. We are currently using version 3.0.1.  Modern versions of
Solaris do not have this problem.


The path to the shared library must be specified at link time as well as
when creating libraries. 


@node Alpha OSF/1 Shared Library Support,  , Solaris  Shared Library Support, Operating System Notes for Shared Libraries
@subsection Alpha OSF/1 Shared Library Support

Shared library support has been tested with V2.1 and higher of the
operating system. Shared libraries may be compiled both with GCC and the
native compiler.

One of the nice features on this platform is that the paths to the
shared libraries is specified in the library itself without requiring
that one specify the same at link time. 

We are using the @samp{-rpath} option to @samp{ld} to place the library
load path into the executables. The one disadvantage of this is during
testing where we want to make sure that we are using the build tree
instead of a possibly installed library. The loader uses the contents of
@samp{-rpath} before LD_LIBRARY_PATH so we must specify a dummy _RLD_ROOT
and complete LD_LIBRARY_PATH in our tests.

The one disadvantage with the method we are using....

@contents
@bye