	Configurable Module Fail Over
	-----------------------------

Before configurable module failover, we had this kind of entry in
"radiusd.conf":

#---
authorize {
  preprocess
  files
}
#---

  This entry instructed the "authorize" section to first process the
request through the "preprocess" module, and if that returned success,
to process it through "files" module.  If that sequence returned
success, then the "authorize" stage itself would then return success.
Processing was strictly linear and if one module failed, the whole
section would fail immediately.

  Configurable failover provides more flexibility. It takes advantage
of the tree structure of radiusd.conf to support a configuration
language that allows you to "group" modules that should work together
in ways other than simple lists.  You can control the flow of any
stage (e.g. "authorize") to fit your needs, without touching C code,
just by altering radiusd.conf.

  This configurable fail-over has a convenient short-hand, too.
Administrators commonly want to say things like "try SQL1, if it's
down, try SQL2, otherwise drop the request."

  For example:

#---
  modules {
    sql sql1 {
      # configuration to connect to SQL database one
    }
    sql sql2 {
      # configuration to connect to SQL database two
    }
    always handled {
      rcode = handled
    }
  }

  #  Handle accounting packets
  accounting {
      detail			# always log to detail, stopping if it fails
      redundant {
        sql1			# try module sql1
        sql2			# if that's down, try module sql2
	handled			# otherwise drop the request as
				# it's been "handled" by the "always"
				# module (see doc/rlm_always)
      }
  }
#---

  The "redundant" section is a configuration directive which tells the
server to process the second module if the first one fails.  Any
number of modules can be listed in a "redundant" section.  The server
will process each in turn, until one of the modules succeeds.  It will
then stop processing the "redundant" list.
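
  Because any number of modules can be listed, the same pattern extends
to three databases.  The following is a sketch only: "sql3" is a
hypothetical third instance, which would need its own "sql sql3 { ... }"
entry in the "modules" section alongside "sql1" and "sql2".

#---
  accounting {
      detail
      redundant {
        sql1			# try module sql1
        sql2			# if that's down, try module sql2
        sql3			# if that's down too, try the
				# (hypothetical) module sql3
      }
  }
#---

  Without a final "handled" entry, the "redundant" section fails if all
three modules fail.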

1. Rewriting results for single modules
   ------------------------------------

  Normally, when a module fails, the entire section ("authorize",
"accounting", etc.) stops being processed.  In some cases, we may want
to permit "soft failures".  That is, we may want to tell the server
that it is "ok" for a module to fail, and that the failure should not
be treated as a fatal error.

  In this case, the module is written as a "section", rather than just
as a single line in "radiusd.conf".  The entries in that section are
interpreted by the "configurable fail-over" code; they are not
configuration settings for the module itself.

  For example, the "detail" module normally returns "fail" if it is
unable to write its information to the "detail" file.  As a test, we
can configure the server so that it continues processing the request,
even if the "detail" module fails.  The following example shows how:

#--
  #  Handle accounting packets
  accounting {
      detail {
        fail = 1
      }
      redundant {
        sql1
        sql2
	handled
      }
  }
#--

  The "fail = 1" entry tells the server to remember the "fail" code,
with priority "1", and to continue processing the section.  The normal
configuration is "fail = return", which means "if the detail module
fails, stop processing the accounting section".
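
  For comparison, the default behaviour can be spelled out in the same
form.  As a sketch, assuming only the "fail = return" default described
above, the following should behave the same as listing "detail" on a
line by itself:

#---
  #  Handle accounting packets
  accounting {
      detail {
        fail = return		# the default: stop processing the
				# "accounting" section if detail fails
      }
      redundant {
        sql1
        sql2
        handled
      }
  }
#---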

2. Fail-over configuration entries
   -------------------------------

  Modules normally return one of the following codes as their result:

	Code		Meaning
	----		-------
	notfound	the user was not found
	noop		the module did nothing
	ok		the module succeeded
	updated		the module updated information in the request
	fail		the module failed
	reject		the module rejected the user
	userlock	the user was locked out
	invalid		the user's configuration entry was invalid
	handled		the module has done everything to handle the request
	
  In a configurable fail-over section, each of these codes may be
listed, with a value.  If the code is not listed, or a configurable
fail-over section is not defined, then values that make sense for the
requested "group" (group, redundant, load-balance, etc) are used.

  The special code "default" can be used to set all return codes to
the specified value.  This value will be used with a lower priority
than ones that are explicitly set.

  The values for each code may be one of three things:

	Value		Meaning
	-----		-------
	<number>	Priority for this return code.
	return		Stop processing this configurable fail-over list.
	reject		Stop processing this configurable fail-over list,
			and immediately return a reject.
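
  As a sketch of how these values look in a configuration, the "files"
entry below uses all three kinds.  The particular choices are
illustrative only, not recommended settings:

#---
  authorize {
    preprocess
    files {
      notfound = reject		# stop, and return "reject" at once
      ok       = 3		# remember "ok" with priority 3
      updated  = 4		# remember "updated" with priority 4
      default  = return		# every code not listed above stops
				# processing and is returned as-is
    }
  }
#---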

  The <number> used for a value may be any decimal number between 1
and 99999.  The number is used when processing a list of modules, to
determine which code is returned from the list.  For example, if
"module1" returns "fail" with priority "1", and a later "module2"
returns "ok" with priority "3", the return code from the list of
modules will be "ok", because it has higher priority than "fail".

  This configurability allows the administrator to permit some modules
to fail, so long as a later module succeeds.
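
  As an illustration of the priority example above, the two modules
could be written as follows.  The names "module1" and "module2" are
placeholders, not real modules, and the enclosing "group" section is
described in the next section.

#---
  group {
    module1 {
      fail = 1			# remember "fail" with priority 1,
				# and keep going
    }
    module2 {
      ok = 3			# remember "ok" with priority 3
    }
  }				# "fail" then "ok" yields "ok", as 3 > 1
#---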


3. More Complex Configurations
   ---------------------------

  The "authorize" section is normally a list of module names.  We can
create sub-lists by using the section name "group".  The "redundant"
section above is just a short-hand for "group", with a set of default
return codes which differ from the normal "stop processing the list on
failure".

  For example, we can configure two detail modules, and allow either
to fail, so long as one of them succeeds.

#--
  #  Handle accounting packets
  accounting {
      group {
        detail1 {
          fail = 1		# remember "fail" with priority 1
	  ok = return		# if we succeed, don't do "detail2"
        }
	detail2 {
	  fail = 1		# remember "fail" with priority 1
	  ok = return		# if we succeed, return "ok"
				# if "detail1" returned "fail"
	}
      }			# returns "fail" only if BOTH modules returned "fail"
      redundant {
        sql1
        sql2
	handled
      }
  }

#--

  This configuration says:

	log to "detail1", and stop processing the "group" list if
	"detail1" returned OK.

	If "detail1" returned "fail", then continue, but remember the
	"fail" code, with priority 1.

	If "detail2" fails, then remember "fail" with priority 1.

	If "detail2" returned "ok", return "ok" from the "group".

  The return code from the "group" is the return code which was either
forced to return (e.g. "ok" for "detail1"), or the highest priority
return code found by processing the list.

  This process can be extended to any number of modules listed in a
"group" section.
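
  For instance, a third detail module could be added to the list.  This
is a sketch only: "detail3" is a hypothetical instance that would have
to be defined in the "modules" section like "detail1" and "detail2".

#---
  #  Handle accounting packets
  accounting {
      group {
        detail1 {
          fail = 1		# remember "fail", try "detail2"
          ok = return		# if we succeed, skip the rest
        }
        detail2 {
          fail = 1		# remember "fail", try "detail3"
          ok = return		# if we succeed, skip "detail3"
        }
        detail3 {
          fail = 1		# remember "fail" with priority 1
          ok = return		# if we succeed, return "ok"
        }
      }			# returns "fail" only if ALL THREE returned "fail"
  }
#---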


4. Virtual Modules
   ---------------

  Some configurations may require using the same list of modules, in
the same order, in multiple sections.  For those systems, the
configuration can be simplified through the use of "virtual" modules.
These modules are configured as named sub-sections of the
"instantiate" section, as follows:

	instantiate {
		...

		redundant sql1_or_2 {
			sql1
			sql2
		}
	}

  The name "sql1_or_2" can then be used in any other section, such as
"authorize" or "accounting".  The result will be *exactly* as if that
section was placed at the location of the "sql1_or_2" reference.

  These virtual modules are full-fledged objects in and of themselves.
One virtual module can refer to another virtual module, and they can
contain "if" conditions, or any other configuration permitted in a
section.
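
  As a sketch of such a reference, the "sql1_or_2" virtual module
defined above could be listed in "accounting" like any other module
name:

#---
  accounting {
      detail
      sql1_or_2			# processed as if the section
				# redundant { sql1 sql2 } were here
  }
#---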


5. Redundancy and Load-Balancing
   -----------------------------

  See "man unlang" or doc/load-balance.txt for information on simple
redundancy (fail-over) and load balancing.


6. The Gory Details
   ----------------

The fundamental object is called a MODCALLABLE, because it is something that
can be passed a specific radius request and returns one of the RLM_MODULE_*
results. It is a function - if you can accept the fact that pieces of
radiusd.conf are functions. There are two kinds of MODCALLABLEs: GROUPs and
SINGLEs.

A SINGLE is a reference to a module instance that was set up in the modules{}
section of radiusd.conf, like "preprocess" or "sql1". When a SINGLE is
called, the corresponding function in the rlm is invoked, and whichever
RLM_MODULE_* it returns becomes the RESULT of the SINGLE.

A GROUP is a section of radiusd.conf that includes some MODCALLABLEs.
Examples of GROUPs above include "authorize{...}", which implements the C
function module_authorize, and "redundant{...}", which contains two SINGLEs
that refer to a couple of redundant databases. Note that a GROUP can contain
other GROUPs - "Auth-Type SQL{...}" is also a GROUP, which implements the C
function module_authenticate when Auth-Type is set to SQL.

Now here's the fun part - what happens when a GROUP is called? It simply runs
through all of its children in order, and calls each one, whether it is
another GROUP or a SINGLE. It then looks at the RESULT of that child, and
takes some ACTION, which is basically either "return that RESULT immediately"
or "keep going". In the first example, any "bad" RESULT from the preprocess
module causes an immediate return, and any "good" RESULT causes the
authorize{...} GROUP to proceed to the files module.

We can see the exact rules by writing them out the long way:

authorize {
  preprocess {
    notfound = 1
    noop     = 2
    ok       = 3
    updated  = 4
    fail     = return
    reject   = return
    userlock = return
    invalid  = return
    handled  = return
  }
  files {
    notfound = 1
    noop     = 2
    ok       = 3
    updated  = 4
    fail     = return
    reject   = return
    userlock = return
    invalid  = return
    handled  = return
  }
}

This is the same as the first example, with the behavior explicitly
spelled out. Each SINGLE becomes its own section, containing a list of
RESULTs that it may return and what ACTION should follow from them. So
preprocess is called, and if it returns for example RLM_MODULE_REJECT,
then the reject=return rule is applied, and the authorize{...} GROUP
itself immediately returns RLM_MODULE_REJECT.

If preprocess returns RLM_MODULE_NOOP, the corresponding ACTION is "2". An
integer ACTION serves two purposes - first, it tells the parent GROUP to go
on to the next module. Second, it is a hint as to how desirable this RESULT
is as a candidate for the GROUP's own RESULT. So files is called... suppose
it returns RLM_MODULE_NOTFOUND. The ACTION for notfound inside the files{...}
block is "1". We have now reached the end of the authorize{...} GROUP and we
look at the RESULTs we accumulated along the way - there is a noop with
preference level 2, and a notfound with preference level 1, so the
authorize{...} GROUP as a whole returns RLM_MODULE_NOOP, which makes sense
because to say the user was not found at all would be a lie, since preprocess
apparently found him, or else it would have returned RLM_MODULE_NOTFOUND too.

We could use the "default" code to simplify the above example a
little.  The following two configurations are identical to each other,
and to the longer "files" block above:

...
  files {
    notfound = 1
    noop     = 2
    ok       = 3
    updated  = 4
    default  = return
  }
...

When "default" is placed first, later definitions override its
return code:

...
  files {
    default  = return
    notfound = 1
    noop     = 2
    ok       = 3
    updated  = 4
  }
...


[Take a deep breath - the worst is over]

That RESULT preference/desirability stuff is pretty complex, but my hope is
that it will be complex enough to handle the needs of everyone's real-world
imperfect systems, while staying out of sight most of the time since the
defaults will be right for the most common configurations.

So where does redundant{...} fit in with all that? Well, redundant{...} is
simply a group that changes the default ACTIONs to something like

  fail = 1
  everythingelse = return

so that when one module fails, we keep trying until we find one that doesn't
fail, then return whatever it returned. And at the end, if they all failed,
the redundant GROUP as a whole returns RLM_MODULE_FAIL, just as you'd want it
to (I hope).
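
Those two rules can be approximated with the "default" code described in
section 2. This is a sketch under that assumption, not the server's exact
defaults (note 2 below writes them out in full, including "notfound"):

#---
  group {
    sql1 {
      fail    = 1		# remember "fail", try the next module
      default = return		# any other code ends the group
    }
    sql2 {
      fail    = 1		# remember "fail" with priority 1
      default = return		# any other code ends the group
    }
  }
#---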

There are two other kinds of grouping: group{...} which does not have any
specialized default ACTIONs, and append{...}, which should be used when you
have separate but similarly structured databases that are guaranteed not to
overlap.

That's all that really needs to be said. But now a few random notes:

1. GROUPs may have RESULT=ACTION specifiers too! It would look like this:

  authorize {
    preprocess
    redundant {
      sql1
      sql2
      notfound = return
    }
    files
  }

which would prevent rlm_files from being called if neither of the SQL
instances could find the user.

2. redundant{...} and append{...} are just shortcuts. You could write
    group {
      sql1 {
        fail     = 1
        notfound = 2
        noop     = return
        ok       = return
        updated  = return
        reject   = return
        userlock = return
        invalid  = return
        handled  = return
      }
      sql2 {
        fail     = 1
        notfound = 2
        noop     = return
        ok       = return
        updated  = return
        reject   = return
        userlock = return
        invalid  = return
        handled  = return
      }
    }
  instead of
    redundant {
      sql1
      sql2
    }
  but the latter is just a whole lot easier to read.

3. "authenticate{...}" itself is not a GROUP, even though it contains a list
of Auth-Type GROUPs, because its semantics are totally different - it uses
Auth-Type to decide which of its members to call, and their order is
irrelevant.

4. The default rules are context-sensitive - for authorize, the defaults are
what you saw above - notfound, noop, ok, and updated are considered
success, and anything else has an ACTION of "return". For authenticate, the
default is to return on success *or* reject, and only try the second and
following items if the first one fails. You can read all the default ACTIONs
in modcall.c (int defaultactions[][][]), or just trust me. They do the right
thing.

5. There are some rules that can't be implemented in this language - things
like "notfound = 1-reject", "noop = 2-ok", "ok = 3-ok", etc. But I don't feel
justified adding that complexity in the first draft. There are already enough
things here that may never see real-world usage. Like append{...}.

-- Pac. 9/18/2000