g-spipat.ads   [plain text]


------------------------------------------------------------------------------
--                                                                          --
--                         GNAT LIBRARY COMPONENTS                          --
--                                                                          --
--                G N A T . S P I T B O L . P A T T E R N S                 --
--                                                                          --
--                                 S p e c                                  --
--                                                                          --
--                                                                          --
--           Copyright (C) 1997-1999 Ada Core Technologies, Inc.            --
--                                                                          --
-- GNAT is free software;  you can  redistribute it  and/or modify it under --
-- terms of the  GNU General Public License as published  by the Free Soft- --
-- ware  Foundation;  either version 2,  or (at your option) any later ver- --
-- sion.  GNAT is distributed in the hope that it will be useful, but WITH- --
-- OUT ANY WARRANTY;  without even the  implied warranty of MERCHANTABILITY --
-- or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License --
-- for  more details.  You should have  received  a copy of the GNU General --
-- Public License  distributed with GNAT;  see file COPYING.  If not, write --
-- to  the Free Software Foundation,  59 Temple Place - Suite 330,  Boston, --
-- MA 02111-1307, USA.                                                      --
--                                                                          --
-- As a special exception,  if other files  instantiate  generics from this --
-- unit, or you link  this unit with other files  to produce an executable, --
-- this  unit  does not  by itself cause  the resulting  executable  to  be --
-- covered  by the  GNU  General  Public  License.  This exception does not --
-- however invalidate  any other reasons why  the executable file  might be --
-- covered by the  GNU Public License.                                      --
--                                                                          --
-- GNAT is maintained by Ada Core Technologies Inc (http://www.gnat.com).   --
--                                                                          --
------------------------------------------------------------------------------

--  SPITBOL-like pattern construction and matching

--  This child package of GNAT.SPITBOL provides a complete implementation
--  of the SPITBOL-like pattern construction and matching operations. This
--  package is based on Macro-SPITBOL created by Robert Dewar.

------------------------------------------------------------
-- Summary of Pattern Matching Packages in GNAT Hierarchy --
------------------------------------------------------------

--  There are three related packages that perform pattern maching functions.
--  the following is an outline of these packages, to help you determine
--  which is best for your needs.

--     GNAT.Regexp (files g-regexp.ads/g-regexp.adb)
--       This is a simple package providing Unix-style regular expression
--       matching with the restriction that it matches entire strings. It
--       is particularly useful for file name matching, and in particular
--       it provides "globbing patterns" that are useful in implementing
--       unix or DOS style wild card matching for file names.

--     GNAT.Regpat (files g-regpat.ads/g-regpat.adb)
--       This is a more complete implementation of Unix-style regular
--       expressions, copied from the original V7 style regular expression
--       library written in C by Henry Spencer. It is functionally the
--       same as this library, and uses the same internal data structures
--       stored in a binary compatible manner.

--     GNAT.Spitbol.Patterns (files g-spipat.ads/g-spipat.adb)
--       This is a completely general patterm matching package based on the
--       pattern language of SNOBOL4, as implemented in SPITBOL. The pattern
--       language is modeled on context free grammars, with context sensitive
--       extensions that provide full (type 0) computational capabilities.

with Ada.Finalization; use Ada.Finalization;
with Ada.Strings.Maps; use Ada.Strings.Maps;
with Ada.Text_IO;      use Ada.Text_IO;

package GNAT.Spitbol.Patterns is
pragma Elaborate_Body (Patterns);

   -------------------------------
   -- Pattern Matching Tutorial --
   -------------------------------

   --  A pattern matching operation (a call to one of the Match subprograms)
   --  takes a subject string and a pattern, and optionally a replacement
   --  string. The replacement string option is only allowed if the subject
   --  is a variable.

   --  The pattern is matched against the subject string, and either the
   --  match fails, or it succeeds matching a contiguous substring. If a
   --  replacement string is specified, then the subject string is modified
   --  by replacing the matched substring with the given replacement.


   --  Concatenation and Alternation
   --  =============================

   --    A pattern consists of a series of pattern elements. The pattern is
   --    built up using either the concatenation operator:

   --       A & B

   --    which means match A followed immediately by matching B, or the
   --    alternation operator:

   --       A or B

   --    which means first attempt to match A, and then if that does not
   --    succeed, match B.

   --    There is full backtracking, which means that if a given pattern
   --    element fails to match, then previous alternatives are matched.
   --    For example if we have the pattern:

   --      (A or B) & (C or D) & (E or F)

   --    First we attempt to match A, if that succeeds, then we go on to try
   --    to match C, and if that succeeds, we go on to try to match E. If E
   --    fails, then we try F. If F fails, then we go back and try matching
   --    D instead of C. Let's make this explicit using a specific example,
   --    and introducing the simplest kind of pattern element, which is a
   --    literal string. The meaning of this pattern element is simply to
   --    match the characters that correspond to the string characters. Now
   --    let's rewrite the above pattern form with specific string literals
   --    as the pattern elements:

   --      ("ABC" or "AB") & ("DEF" or "CDE") & ("GH" or "IJ")

   --    The following strings will be attempted in sequence:

   --       ABC . DEF . GH
   --       ABC . DEF . IJ
   --       ABC . CDE . GH
   --       ABC . CDE . IJ
   --       AB . DEF . GH
   --       AB . DEF . IJ
   --       AB . CDE . GH
   --       AB . CDE . IJ

   --    Here we use the dot simply to separate the pieces of the string
   --    matched by the three separate elements.


   --  Moving the Start Point
   --  ======================

   --    A pattern is not required to match starting at the first character
   --    of the string, and is not required to match to the end of the string.
   --    The first attempt does indeed attempt to match starting at the first
   --    character of the string, trying all the possible alternatives. But
   --    if all alternatives fail, then the starting point of the match is
   --    moved one character, and all possible alternatives are attempted at
   --    the new anchor point.

   --    The entire match fails only when every possible starting point has
   --    been attempted. As an example, suppose that we had the subject
   --    string

   --      "ABABCDEIJKL"

   --    matched using the pattern in the previous example:

   --      ("ABC" or "AB") & ("DEF" or "CDE") & ("GH" or "IJ")

   --    would succeed, afer two anchor point moves:

   --      "ABABCDEIJKL"
   --         ^^^^^^^
   --         matched
   --         section

   --    This mode of pattern matching is called the unanchored mode. It is
   --    also possible to put the pattern matcher into anchored mode by
   --    setting the global variable Anchored_Mode to True. This will cause
   --    all subsequent matches to be performed in anchored mode, where the
   --    match is required to start at the first character.

   --    We will also see later how the effect of an anchored match can be
   --    obtained for a single specified anchor point if this is desired.


   --  Other Pattern Elements
   --  ======================

   --    In addition to strings (or single characters), there are many special
   --    pattern elements that correspond to special predefined alternations:

   --      Arb       Matches any string. First it matches the null string, and
   --                then on a subsequent failure, matches one character, and
   --                then two characters, and so on. It only fails if the
   --                entire remaining string is matched.

   --      Bal       Matches a non-empty string that is parentheses balanced
   --                with respect to ordinary () characters. Examples of
   --                balanced strings are "ABC", "A((B)C)", and "A(B)C(D)E".
   --                Bal matches the shortest possible balanced string on the
   --                first attempt, and if there is a subsequent failure,
   --                attempts to extend the string.

   --      Cancel    Immediately aborts the entire pattern match, signalling
   --                failure. This is a specialized pattern element, which is
   --                useful in conjunction with some of the special pattern
   --                elements that have side effects.

   --      Fail      The null alternation. Matches no possible strings, so it
   --                always signals failure. This is a specialized pattern
   --                element, which is useful in conjunction with some of the
   --                special pattern elements that have side effects.

   --      Fence     Matches the null string at first, and then if a failure
   --                causes alternatives to be sought, aborts the match (like
   --                a Cancel). Note that using Fence at the start of a pattern
   --                has the same effect as matching in anchored mode.

   --      Rest      Matches from the current point to the last character in
   --                the string. This is a specialized pattern element, which
   --                is useful in conjunction with some of the special pattern
   --                elements that have side effects.

   --      Succeed   Repeatedly matches the null string (it is equivalent to
   --                the alternation ("" or "" or "" ....). This is a special
   --                pattern element, which is useful in conjunction with some
   --                of the special pattern elements that have side effects.


   --  Pattern Construction Functions
   --  ==============================

   --    The following functions construct additional pattern elements

   --      Any(S)    Where S is a string, matches a single character that is
   --                any one of the characters in S. Fails if the current
   --                character is not one of the given set of characters.

   --      Arbno(P)  Where P is any pattern, matches any number of instances
   --                of the pattern, starting with zero occurrences. It is
   --                thus equivalent to ("" or (P & ("" or (P & ("" ....)))).
   --                The pattern P may contain any number of pattern elements
   --                including the use of alternatiion and concatenation.

   --      Break(S)  Where S is a string, matches a string of zero or more
   --                characters up to but not including a break character
   --                that is one of the characters given in the string S.
   --                Can match the null string, but cannot match the last
   --                character in the string, since a break character is
   --                required to be present.

   --      BreakX(S) Where S is a string, behaves exactly like Break(S) when
   --                it first matches, but if a string is successfully matched,
   --                then a susequent failure causes an attempt to extend the
   --                matched string.

   --      Fence(P)  Where P is a pattern, attempts to match the pattern P
   --                including trying all possible alternatives of P. If none
   --                of these alternatives succeeds, then the Fence pattern
   --                fails. If one alternative succeeds, then the pattern
   --                match proceeds, but on a subsequent failure, no attempt
   --                is made to search for alternative matches of P. The
   --                pattern P may contain any number of pattern elements
   --                including the use of alternatiion and concatenation.

   --      Len(N)    Where N is a natural number, matches the given number of
   --                characters. For example, Len(10) matches any string that
   --                is exactly ten characters long.

   --      NotAny(S) Where S is a string, matches a single character that is
   --                not one of the characters of S. Fails if the current
   --                characer is one of the given set of characters.

   --      NSpan(S)  Where S is a string, matches a string of zero or more
   --                characters that is among the characters given in the
   --                string. Always matches the longest possible such string.
   --                Always succeeds, since it can match the null string.

   --      Pos(N)    Where N is a natural number, matches the null string
   --                if exactly N characters have been matched so far, and
   --                otherwise fails.

   --      Rpos(N)   Where N is a natural number, matches the null string
   --                if exactly N characters remain to be matched, and
   --                otherwise fails.

   --      Rtab(N)   Where N is a natural number, matches characters from
   --                the current position until exactly N characters remain
   --                to be matched in the string. Fails if fewer than N
   --                unmatched characters remain in the string.

   --      Tab(N)    Where N is a natural number, matches characters from
   --                the current position until exactly N characters have
   --                been matched in all. Fails if more than N characters
   --                have already been matched.

   --      Span(S)   Where S is a string, matches a string of one or more
   --                characters that is among the characters given in the
   --                string. Always matches the longest possible such string.
   --                Fails if the current character is not one of the given
   --                set of characters.

   --  Recursive Pattern Matching
   --  ==========================

   --    The plus operator (+P) where P is a pattern variable, creates
   --    a recursive pattern that will, at pattern matching time, follow
   --    the pointer to obtain the referenced pattern, and then match this
   --    pattern. This may be used to construct recursive patterns. Consider
   --    for example:

   --       P := ("A" or ("B" & (+P)))

   --    On the first attempt, this pattern attempts to match the string "A".
   --    If this fails, then the alternative matches a "B", followed by an
   --    attempt to match P again. This second attempt first attempts to
   --    match "A", and so on. The result is a pattern that will match a
   --    string of B's followed by a single A.

   --    This particular example could simply be written as NSpan('B') & 'A',
   --    but the use of recursive patterns in the general case can construct
   --    complex patterns which could not otherwise be built.


   --  Pattern Assignment Operations
   --  =============================

   --    In addition to the overall result of a pattern match, which indicates
   --    success or failure, it is often useful to be able to keep track of
   --    the pieces of the subject string that are matched by individual
   --    pattern elements, or subsections of the pattern.

   --    The pattern assignment operators allow this capability. The first
   --    form is the immediate assignment:

   --       P * S

   --    Here P is an arbitrary pattern, and S is a variable of type VString
   --    that will be set to the substring matched by P. This assignment
   --    happens during pattern matching, so if P matches more than once,
   --    then the assignment happens more than once.

   --    The deferred assignment operation:

   --      P ** S

   --    avoids these multiple assignments by deferring the assignment to the
   --    end of the match. If the entire match is successful, and if the
   --    pattern P was part of the successful match, then at the end of the
   --    matching operation the assignment to S of the string matching P is
   --    performed.

   --    The cursor assignment operation:

   --      Setcur(N'Access)

   --    assigns the current cursor position to the natural variable N. The
   --    cursor position is defined as the count of characters that have been
   --    matched so far (including any start point moves).

   --    Finally the operations * and ** may be used with values of type
   --    Text_IO.File_Access. The effect is to do a Put_Line operation of
   --    the matched substring. These are particularly useful in debugging
   --    pattern matches.


   --  Deferred Matching
   --  =================

   --    The pattern construction functions (such as Len and Any) all permit
   --    the use of pointers to natural or string values, or functions that
   --    return natural or string values. These forms cause the actual value
   --    to be obtained at pattern matching time. This allows interesting
   --    possibilities for constructing dynamic patterns as illustrated in
   --    the examples section.

   --    In addition the (+S) operator may be used where S is a pointer to
   --    string or function returning string, with a similar deferred effect.

   --    A special use of deferred matching is the construction of predicate
   --    functions. The element (+P) where P is an access to a function that
   --    returns a Boolean value, causes the function to be called at the
   --    time the element is matched. If the function returns True, then the
   --    null string is matched, if the function returns False, then failure
   --    is signalled and previous alternatives are sought.

   --  Deferred Replacement
   --  ====================

   --    The simple model given for pattern replacement (where the matched
   --    substring is replaced by the string given as the third argument to
   --    Match) works fine in simple cases, but this approach does not work
   --    in the case where the expression used as the replacement string is
   --    dependent on values set by the match.

   --    For example, suppose we want to find an instance of a parenthesized
   --    character, and replace the parentheses with square brackets. At first
   --    glance it would seem that:

   --      Match (Subject, '(' & Len (1) * Char & ')', '[' & Char & ']');

   --    would do the trick, but that does not work, because the third
   --    argument to Match gets evaluated too early, before the call to
   --    Match, and before the pattern match has had a chance to set Char.

   --    To solve this problem we provide the deferred replacement capability.
   --    With this approach, which of course is only needed if the pattern
   --    involved has side effects, is to do the match in two stages. The
   --    call to Match sets a pattern result in a variable of the private
   --    type Match_Result, and then a subsequent Replace operation uses
   --    this Match_Result object to perform the required replacement.

   --    Using this approach, we can now write the above operation properly
   --    in a manner that will work:

   --      M : Match_Result;
   --      ...
   --      Match (Subject, '(' & Len (1) * Char & ')', M);
   --      Replace (M, '[' & Char & ']');

   --    As with other Match cases, there is a function and procedure form
   --    of this match call. A call to Replace after a failed match has no
   --    effect. Note that Subject should not be modified between the calls.

   --  Examples of Pattern Matching
   --  ============================

   --    First a simple example of the use of pattern replacement to remove
   --    a line number from the start of a string. We assume that the line
   --    number has the form of a string of decimal digits followed by a
   --    period, followed by one or more spaces.

   --       Digs : constant Pattern := Span("0123456789");

   --       Lnum : constant Pattern := Pos(0) & Digs & '.' & Span(' ');

   --    Now to use this pattern we simply do a match with a replacement:

   --       Match (Line, Lnum, "");

   --    which replaces the line number by the null string. Note that it is
   --    also possible to use an Ada.Strings.Maps.Character_Set value as an
   --    argument to Span and similar functions, and in particular all the
   --    useful constants 'in Ada.Strings.Maps.Constants are available. This
   --    means that we could define Digs as:

   --       Digs : constant Pattern := Span(Decimal_Digit_Set);

   --    The style we use here, of defining constant patterns and then using
   --    them is typical. It is possible to build up patterns dynamically,
   --    but it is usually more efficient to build them in pieces in advance
   --    using constant declarations. Note in particular that although it is
   --    possible to construct a pattern directly as an argument for the
   --    Match routine, it is much more efficient to preconstruct the pattern
   --    as we did in this example.

   --    Now let's look at the use of pattern assignment to break a
   --    string into sections. Suppose that the input string has two
   --    unsigned decimal integers, separated by spaces or a comma,
   --    with spaces allowed anywhere. Then we can isolate the two
   --    numbers with the following pattern:

   --       Num1, Num2 : aliased VString;

   --       B : constant Pattern := NSpan(' ');

   --       N : constant Pattern := Span("0123456789");

   --       T : constant Pattern :=
   --             NSpan(' ') & N * Num1 & Span(" ,") & N * Num2;

   --    The match operation Match (" 124, 257  ", T) would assign the
   --    string 124 to Num1 and the string 257 to Num2.

   --    Now let's see how more complex elements can be built from the
   --    set of primitive elements. The following pattern matches strings
   --    that have the syntax of Ada 95 based literals:

   --       Digs  : constant Pattern := Span(Decimal_Digit_Set);
   --       UDigs : constant Pattern := Digs & Arbno('_' & Digs);

   --       Edig  : constant Pattern := Span(Hexadecimal_Digit_Set);
   --       UEdig : constant Pattern := Edig & Arbno('_' & Edig);

   --       Bnum  : constant Pattern := Udigs & '#' & UEdig & '#';

   --    A match against Bnum will now match the desired strings, e.g.
   --    it will match 16#123_abc#, but not a#b#. However, this pattern
   --    is not quite complete, since it does not allow colons to replace
   --    the pound signs. The following is more complete:

   --       Bchar : constant Pattern := Any("#:");
   --       Bnum  : constant Pattern := Udigs & Bchar & UEdig & Bchar;

   --    but that is still not quite right, since it allows # and : to be
   --    mixed, and they are supposed to be used consistently. We solve
   --    this by using a deferred match.

   --       Temp  : aliased VString;

   --       Bnum  : constant Pattern :=
   --                 Udigs & Bchar * Temp & UEdig & (+Temp)

   --    Here the first instance of the base character is stored in Temp, and
   --    then later in the pattern we rematch the value that was assigned.

   --    For an example of a recursive pattern, let's define a pattern
   --    that is like the built in Bal, but the string matched is balanced
   --    with respect to square brackets or curly brackets.

   --    The language for such strings might be defined in extended BNF as

   --      ELEMENT ::= <any character other than [] or {}>
   --                  | '[' BALANCED_STRING ']'
   --                  | '{' BALANCED_STRING '}'

   --      BALANCED_STRING ::= ELEMENT {ELEMENT}

   --    Here we use {} to indicate zero or more occurrences of a term, as
   --    is common practice in extended BNF. Now we can translate the above
   --    BNF into recursive patterns as follows:

   --      Element, Balanced_String : aliased Pattern;
   --      .
   --      .
   --      .
   --      Element := NotAny ("[]{}")
   --                   or
   --                 ('[' & (+Balanced_String) & ']')
   --                   or
   --                 ('{' & (+Balanced_String) & '}');

   --      Balanced_String := Element & Arbno (Element);

   --    Note the important use of + here to refer to a pattern not yet
   --    defined. Note also that we use assignments precisely because we
   --    cannot refer to as yet undeclared variables in initializations.

   --    Now that this pattern is constructed, we can use it as though it
   --    were a new primitive pattern element, and for example, the match:

   --      Match ("xy[ab{cd}]", Balanced_String * Current_Output & Fail);

   --    will generate the output:

   --       x
   --       xy
   --       xy[ab{cd}]
   --       y
   --       y[ab{cd}]
   --       [ab{cd}]
   --       a
   --       ab
   --       ab{cd}
   --       b
   --       b{cd}
   --       {cd}
   --       c
   --       cd
   --       d

   --    Note that the function of the fail here is simply to force the
   --    pattern Balanced_String to match all possible alternatives. Studying
   --    the operation of this pattern in detail is highly instructive.

   --    Finally we give a rather elaborate example of the use of deferred
   --    matching. The following declarations build up a pattern which will
   --    find the longest string of decimal digits in the subject string.

   --       Max, Cur : VString;
   --       Loc      : Natural;

   --       function GtS return Boolean is
   --       begin
   --          return Length (Cur) > Length (Max);
   --       end GtS;

   --       Digit : constant Character_Set := Decimal_Digit_Set;

   --       Digs  : constant Pattern := Span(Digit);

   --       Find : constant Pattern :=
   --         "" * Max & Fence            & -- initialize Max to null
   --         BreakX (Digit)              & -- scan looking for digits
   --         ((Span(Digit) * Cur         & -- assign next string to Cur
   --          (+GtS'Unrestricted_Access) & -- check size(Cur) > Size(Max)
   --          Setcur(Loc'Access))          -- if so, save location
   --                   * Max)            & -- and assign to Max
   --         Fail;                         -- seek all alternatives

   --    As we see from the comments here, complex patterns like this take
   --    on aspects of sequential programs. In fact they are sequential
   --    programs with general backtracking. In this pattern, we first use
   --    a pattern assignment that matches null and assigns it to Max, so
   --    that it is initialized for the new match. Now BreakX scans to the
   --    next digit. Arb would do here, but BreakX will be more efficient.
   --    Once we have found a digit, we scan out the longest string of
   --    digits with Span, and assign it to Cur. The deferred call to GtS
   --    tests if the string we assigned to Cur is the longest so far. If
   --    not, then failure is signalled, and we seek alternatives (this
   --    means that BreakX will extend and look for the next digit string).
   --    If the call to GtS succeeds then the matched string is assigned
   --    as the largest string so far into Max and its location is saved
   --    in Loc. Finally Fail forces the match to fail and seek alternatives,
   --    so that the entire string is searched.

   --    If the pattern Find is matched against a string, the variable Max
   --    at the end of the pattern will have the longest string of digits,
   --    and Loc will be the starting character location of the string. For
   --    example, Match("ab123cd4657ef23", Find) will assign "4657" to Max
   --    and 11 to Loc (indicating that the string ends with the eleventh
   --    character of the string).

   --    Note: the use of Unrestricted_Access to reference GtS will not
   --    be needed if GtS is defined at the outer level, but definitely
   --    will be necessary if GtS is a nested function (in which case of
   --    course the scope of the pattern Find will be restricted to this
   --    nested scope, and this cannot be checked, i.e. use of the pattern
   --    outside this scope is erroneous). Generally it is a good idea to
   --    define patterns and the functions they call at the outer level
   --    where possible, to avoid such problems.


   --  Correspondence with Pattern Matching in SPITBOL
   --  ===============================================

   --    Generally the Ada syntax and names correspond closely to SPITBOL
   --    syntax for pattern matching construction.

   --      The basic pattern construction operators are renamed as follows:

   --          Spitbol     Ada

   --          (space)      &
   --             |         or
   --             $         *
   --             .         **

   --      The Ada operators were chosen so that the relative precedences of
   --      these operators corresponds to that of the Spitbol operators, but
   --      as always, the use of parentheses is advisable to clarify.

   --    The pattern construction operators all have similar names except for

   --          Spitbol      Ada

   --          Abort        Cancel
   --          Rem          Rest

   --    where we have clashes with Ada reserved names.

   --    Ada requires the use of 'Access to refer to functions used in the
   --    pattern match, and often the use of 'Unrestricted_Access may be
   --    necessary to get around the scope restrictions if the functions
   --    are not declared at the outer level.

   --    The actual pattern matching syntax is modified in Ada as follows:

   --          Spitbol      Ada

   --          X Y          Match (X, Y);
   --          X Y = Z      Match (X, Y, Z);

   --    and pattern failure is indicated by returning a Boolean result from
   --    the Match function (True for success, False for failure).

   -----------------------
   -- Type Declarations --
   -----------------------

   type Pattern is private;
   --  Type representing a pattern. This package provides a complete set of
   --  operations for constructing patterns that can be used in the pattern
   --  matching operations provided.

   type Boolean_Func is access function return Boolean;
   --  General Boolean function type. When this type is used as a formal
   --  parameter type in this package, it indicates a deferred predicate
   --  pattern. The function will be called when the pattern element is
   --  matched and failure signalled if False is returned.

   type Natural_Func is access function return Natural;
   --  General Natural function type. When this type is used as a formal
   --  parameter type in this package, it indicates a deferred pattern.
   --  The function will be called when the pattern element is matched
   --  to obtain the currently referenced Natural value.

   type VString_Func is access function return VString;
   --  General VString function type. When this type is used as a formal
   --  parameter type in this package, it indicates a deferred pattern.
   --  The function will be called when the pattern element is matched
   --  to obtain the currently referenced string value.

   subtype PString is String;
   --  This subtype is used in the remainder of the package to indicate a
   --  formal parameter that is converted to its corresponding pattern,
   --  i.e. a pattern that matches the characters of the string.

   subtype PChar is Character;
   --  Similarly, this subtype is used in the remainder of the package to
   --  indicate a formal parameter that is converted to its corresponding
   --  pattern, i.e. a pattern that matches this one character.

   subtype VString_Var is VString;
   subtype Pattern_Var is Pattern;
   --  These synonyms are used as formal parameter types to a function where,
   --  if the language allowed, we would use in out parameters, but we are
   --  not allowed to have in out parameters for functions. Instead we pass
   --  actuals which must be variables, and with a bit of trickery in the
   --  body, manage to interprete them properly as though they were indeed
   --  in out parameters.

   --------------------------------
   -- Basic Pattern Construction --
   --------------------------------

   function "&"  (L : Pattern; R : Pattern) return Pattern;
   function "&"  (L : PString; R : Pattern) return Pattern;
   function "&"  (L : Pattern; R : PString) return Pattern;
   function "&"  (L : PChar;   R : Pattern) return Pattern;
   function "&"  (L : Pattern; R : PChar)   return Pattern;

   --  Pattern concatenation. Matches L followed by R.

   function "or" (L : Pattern; R : Pattern) return Pattern;
   function "or" (L : PString; R : Pattern) return Pattern;
   function "or" (L : Pattern; R : PString) return Pattern;
   function "or" (L : PString; R : PString) return Pattern;
   function "or" (L : PChar;   R : Pattern) return Pattern;
   function "or" (L : Pattern; R : PChar)   return Pattern;
   function "or" (L : PChar;   R : PChar)   return Pattern;
   function "or" (L : PString; R : PChar)   return Pattern;
   function "or" (L : PChar;   R : PString) return Pattern;
   --  Pattern alternation. Creates a pattern that will first try to match
   --  L and then on a subsequent failure, attempts to match R instead.

   ----------------------------------
   -- Pattern Assignment Functions --
   ----------------------------------

   function "*" (P : Pattern; Var : VString_Var)  return Pattern;
   function "*" (P : PString; Var : VString_Var)  return Pattern;
   function "*" (P : PChar;   Var : VString_Var)  return Pattern;
   --  Matches P, and if the match succeeds, assigns the matched substring
   --  to the given VString variable S. This assignment happens as soon as
   --  the substring is matched, and if the pattern P1 is matched more than
   --  once during the course of the match, then the assignment will occur
   --  more than once.

   function "**" (P : Pattern; Var : VString_Var) return Pattern;
   function "**" (P : PString; Var : VString_Var) return Pattern;
   function "**" (P : PChar;   Var : VString_Var) return Pattern;
   --  Like "*" above, except that the assignment happens at most once
   --  after the entire match is completed successfully. If the match
   --  fails, then no assignment takes place.

   ----------------------------------
   -- Deferred Matching Operations --
   ----------------------------------

   function "+" (Str : VString_Var)  return Pattern;
   --  Here Str must be a VString variable. This function constructs a
   --  pattern which at pattern matching time will access the current
   --  value of this variable, and match against these characters.

   function "+" (Str : VString_Func) return Pattern;
   --  Constructs a pattern which at pattern matching time calls the given
   --  function, and then matches against the string or character value
   --  that is returned by the call.

   function "+" (P : Pattern_Var)    return Pattern;
   --  Here P must be a Pattern variable. This function constructs a
   --  pattern which at pattern matching time will access the current
   --  value of this variable, and match against the pattern value.

   function "+" (P : Boolean_Func)   return Pattern;
   --  Constructs a predicate pattern function that at pattern matching time
   --  calls the given function. If True is returned, then the pattern matches.
   --  If False is returned, then failure is signalled.

   --------------------------------
   -- Pattern Building Functions --
   --------------------------------

   function Arb                                             return Pattern;
   --  Constructs a pattern that will match any string. On the first attempt,
   --  the pattern matches a null string, then on each successive failure, it
   --  matches one more character, and only fails if matching the entire rest
   --  of the string.

   function Arbno  (P : Pattern)                            return Pattern;
   function Arbno  (P : PString)                            return Pattern;
   function Arbno  (P : PChar)                              return Pattern;
   --  Pattern repetition. First matches null, then on a subsequent failure
   --  attempts to match an additional instance of the given pattern.
   --  Equivalent to (but more efficient than) P & ("" or (P & ("" or ...

   function Any    (Str : String)                           return Pattern;
   function Any    (Str : VString)                          return Pattern;
   function Any    (Str : Character)                        return Pattern;
   function Any    (Str : Character_Set)                    return Pattern;
   function Any    (Str : access VString)                   return Pattern;
   function Any    (Str : VString_Func)                     return Pattern;
   --  Constructs a pattern that matches a single character that is one of
   --  the characters in the given argument. The pattern fails if the current
   --  character is not in Str.

   function Bal                                             return Pattern;
   --  Constructs a pattern that will match any non-empty string that is
   --  parentheses balanced with respect to the normal parentheses characters.
   --  Attempts to extend the string if a subsequent failure occurs.

   function Break  (Str : String)                           return Pattern;
   function Break  (Str : VString)                          return Pattern;
   function Break  (Str : Character)                        return Pattern;
   function Break  (Str : Character_Set)                    return Pattern;
   function Break  (Str : access VString)                   return Pattern;
   function Break  (Str : VString_Func)                     return Pattern;
   --  Constructs a pattern that matches a (possibly null) string which
   --  is immediately followed by a character in the given argument. This
   --  character is not part of the matched string. The pattern fails if
   --  the remaining characters to be matched do not include any of the
   --  characters in Str.

   function BreakX (Str : String)                           return Pattern;
   function BreakX (Str : VString)                          return Pattern;
   function BreakX (Str : Character)                        return Pattern;
   function BreakX (Str : Character_Set)                    return Pattern;
   function BreakX (Str : access VString)                   return Pattern;
   function BreakX (Str : VString_Func)                     return Pattern;
   --  Like Break, but the pattern attempts to extend on a failure to find
   --  the next occurrence of a character in Str, and only fails when the
   --  last such instance causes a failure.

   function Cancel                                          return Pattern;
   --  Constructs a pattern that immediately aborts the entire match

   function Fail                                            return Pattern;
   --  Constructs a pattern that always fails.

   function Fence                                           return Pattern;
   --  Constructs a pattern that matches null on the first attempt, and then
   --  causes the entire match to be aborted if a subsequent failure occurs.

   function Fence  (P : Pattern)                            return Pattern;
   --  Constructs a pattern that first matches P. if P fails, then the
   --  constructed pattern fails. If P succeeds, then the match proceeds,
   --  but if subsequent failure occurs, alternatives in P are not sought.
   --  The idea of Fence is that each time the pattern is matched, just
   --  one attempt is made to match P, without trying alternatives.

   function Len    (Count : Natural)                        return Pattern;
   function Len    (Count : access Natural)                 return Pattern;
   function Len    (Count : Natural_Func)                   return Pattern;
   --  Constructs a pattern that matches exactly the given number of
   --  characters. The pattern fails if fewer than this number of characters
   --  remain to be matched in the string.

   function NotAny (Str : String)                           return Pattern;
   function NotAny (Str : VString)                          return Pattern;
   function NotAny (Str : Character)                        return Pattern;
   function NotAny (Str : Character_Set)                    return Pattern;
   function NotAny (Str : access VString)                   return Pattern;
   function NotAny (Str : VString_Func)                     return Pattern;
   --  Constructs a pattern that matches a single character that is not
   --  one of the characters in the given argument. The pattern Fails if
   --  the current character is in Str.

   function NSpan  (Str : String)                           return Pattern;
   function NSpan  (Str : VString)                          return Pattern;
   function NSpan  (Str : Character)                        return Pattern;
   function NSpan  (Str : Character_Set)                    return Pattern;
   function NSpan  (Str : access VString)                   return Pattern;
   function NSpan  (Str : VString_Func)                     return Pattern;
   --  Constructs a pattern that matches the longest possible string
   --  consisting entirely of characters from the given argument. The
   --  string may be empty, so this pattern always succeeds.

   function Pos    (Count : Natural)                        return Pattern;
   function Pos    (Count : access Natural)                 return Pattern;
   function Pos    (Count : Natural_Func)                   return Pattern;
   --  Constructs a pattern that matches the null string if exactly Count
   --  characters have already been matched, and otherwise fails.

   function Rest                                            return Pattern;
   --  Constructs a pattern that always succeeds, matching the remaining
   --  unmatched characters in the pattern.

   function Rpos   (Count : Natural)                        return Pattern;
   function Rpos   (Count : access Natural)                 return Pattern;
   function Rpos   (Count : Natural_Func)                   return Pattern;
   --  Constructs a pattern that matches the null string if exactly Count
   --  characters remain to be matched in the string, and otherwise fails.

   function Rtab   (Count : Natural)                        return Pattern;
   function Rtab   (Count : access Natural)                 return Pattern;
   function Rtab   (Count : Natural_Func)                   return Pattern;
   --  Constructs a pattern that matches from the current location until
   --  exactly Count characters remain to be matched in the string. The
   --  pattern fails if fewer than Count characters remain to be matched.

   function Setcur (Var : access Natural)                   return Pattern;
   --  Constructs a pattern that matches the null string, and assigns the
   --  current cursor position in the string. This value is the number of
   --  characters matched so far. So it is zero at the start of the match.

   function Span   (Str : String)                           return Pattern;
   function Span   (Str : VString)                          return Pattern;
   function Span   (Str : Character)                        return Pattern;
   function Span   (Str : Character_Set)                    return Pattern;
   function Span   (Str : access VString)                   return Pattern;
   function Span   (Str : VString_Func)                     return Pattern;
   --  Constructs a pattern that matches the longest possible string
   --  consisting entirely of characters from the given argument. The
   --  string cannot be empty , so the pattern fails if the current
   --  character is not one of the characters in Str.

   function Succeed                                         return Pattern;
   --  Constructs a pattern that succeeds matching null, both on the first
   --  attempt, and on any rematch attempt, i.e. it is equivalent to an
   --  infinite alternation of null strings.

   function Tab    (Count : Natural)                        return Pattern;
   function Tab    (Count : access Natural)                 return Pattern;
   function Tab    (Count : Natural_Func)                   return Pattern;
   --  Constructs a pattern that from the current location until Count
   --  characters have been matched. The pattern fails if more than Count
   --  characters have already been matched.

   ---------------------------------
   -- Pattern Matching Operations --
   ---------------------------------

   --  The Match function performs an actual pattern matching operation.
   --  The versions with three parameters perform a match without modifying
   --  the subject string and return a Boolean result indicating if the
   --  match is successful or not. The Anchor parameter is set to True to
   --  obtain an anchored match in which the pattern is required to match
   --  the first character of the string. In an unanchored match, which is

   --  the default, successive attempts are made to match the given pattern
   --  at each character of the subject string until a match succeeds, or
   --  until all possibilities have failed.

   --  Note that pattern assignment functions in the pattern may generate
   --  side effects, so these functions are not necessarily pure.

   Anchored_Mode : Boolean := False;
   --  This global variable can be set True to cause all subsequent pattern
   --  matches to operate in anchored mode. In anchored mode, no attempt is
   --  made to move the anchor point, so that if the match succeeds it must
   --  succeed starting at the first character. Note that the effect of
   --  anchored mode may be achieved in individual pattern matches by using
   --  Fence or Pos(0) at the start of the pattern.

   Pattern_Stack_Overflow : exception;
   --  Exception raised if internal pattern matching stack overflows. This
   --  is typically the result of runaway pattern recursion. If there is a
   --  genuine case of stack overflow, then either the match must be broken
   --  down into simpler steps, or the stack limit must be reset.

   Stack_Size : constant Positive := 2000;
   --  Size used for internal pattern matching stack. Increase this size if
   --  complex patterns cause Pattern_Stack_Overflow to be raised.

   --  Simple match functions. The subject is matched against the pattern.
   --  Any immediate or deferred assignments or writes are executed, and
   --  the returned value indicates whether or not the match succeeded.

   function Match
     (Subject : VString;
      Pat     : Pattern)
      return    Boolean;

   function Match
     (Subject : VString;
      Pat     : PString)
      return    Boolean;

   function Match
     (Subject : String;
      Pat     : Pattern)
      return    Boolean;

   function Match
     (Subject : String;
      Pat     : PString)
      return    Boolean;

   --  Replacement functions. The subject is matched against the pattern.
   --  Any immediate or deferred assignments or writes are executed, and
   --  the returned value indicates whether or not the match succeeded.
   --  If the match succeeds, then the matched part of the subject string
   --  is replaced by the given Replace string.

   function Match
     (Subject : VString_Var;
      Pat     : Pattern;
      Replace : VString)
      return    Boolean;

   function Match
     (Subject : VString_Var;
      Pat     : PString;
      Replace : VString)
      return    Boolean;

   function Match
     (Subject : VString_Var;
      Pat     : Pattern;
      Replace : String)
      return    Boolean;

   function Match
     (Subject : VString_Var;
      Pat     : PString;
      Replace : String)
      return    Boolean;

   --  Simple match procedures. The subject is matched against the pattern.
   --  Any immediate or deferred assignments or writes are executed. No
   --  indication of success or failure is returned.

   procedure Match
     (Subject : VString;
      Pat     : Pattern);

   procedure Match
     (Subject : VString;
      Pat     : PString);

   procedure Match
     (Subject : String;
      Pat     : Pattern);

   procedure Match
     (Subject : String;
      Pat     : PString);

   --  Replacement procedures. The subject is matched against the pattern.
   --  Any immediate or deferred assignments or writes are executed. No
   --  indication of success or failure is returned. If the match succeeds,
   --  then the matched part of the subject string is replaced by the given
   --  Replace string.

   procedure Match
     (Subject : in out VString;
      Pat     : Pattern;
      Replace : VString);

   procedure Match
     (Subject : in out VString;
      Pat     : PString;
      Replace : VString);

   procedure Match
     (Subject : in out VString;
      Pat     : Pattern;
      Replace : String);

   procedure Match
     (Subject : in out VString;
      Pat     : PString;
      Replace : String);

   --  Deferred Replacement

   type Match_Result is private;
   --  Type used to record result of pattern match

   subtype Match_Result_Var is Match_Result;
   --  This synonyms is used as a formal parameter type to a function where,
   --  if the language allowed, we would use an in out parameter, but we are
   --  not allowed to have in out parameters for functions. Instead we pass
   --  actuals which must be variables, and with a bit of trickery in the
   --  body, manage to interprete them properly as though they were indeed
   --  in out parameters.

   function Match
     (Subject : VString_Var;
      Pat     : Pattern;
      Result  : Match_Result_Var)
      return    Boolean;

   procedure Match
     (Subject : in out VString;
      Pat     : Pattern;
      Result  : out Match_Result);

   procedure Replace
     (Result  : in out Match_Result;
      Replace : VString);
   --  Given a previous call to Match which set Result, performs a pattern
   --  replacement if the match was successful. Has no effect if the match
   --  failed. This call should immediately follow the Match call.

   ------------------------
   -- Debugging Routines --
   ------------------------

   --  Debugging pattern matching operations can often be quite complex,
   --  since there is no obvious way to trace the progress of the match.
   --  The declarations in this section provide some debugging assistance.

   Debug_Mode : Boolean := False;
   --  This global variable can be set True to generate debugging on all
   --  subsequent calls to Match. The debugging output is a full trace of
   --  the actions of the pattern matcher, written to Standard_Output. The
   --  level of this information is intended to be comprehensible at the
   --  abstract level of this package declaration. However, note that the
   --  use of this switch often generates large amounts of output.

   function "*"  (P : Pattern; Fil : File_Access)           return Pattern;
   function "*"  (P : PString; Fil : File_Access)           return Pattern;
   function "*"  (P : PChar;   Fil : File_Access)           return Pattern;
   function "**" (P : Pattern; Fil : File_Access)           return Pattern;
   function "**" (P : PString; Fil : File_Access)           return Pattern;
   function "**" (P : PChar;   Fil : File_Access)           return Pattern;
   --  These are similar to the corresponding pattern assignment operations
   --  except that instead of setting the value of a variable, the matched
   --  substring is written to the appropriate file. This can be useful in
   --  following the progress of a match without generating the full amount

   --  of information obtained by setting Debug_Mode to True.

   Terminal : constant File_Access := Standard_Error;
   Output   : constant File_Access := Standard_Output;
   --  Two handy synonyms for use with the above pattern write operations.

   --  Finally we have some routines that are useful for determining what
   --  patterns are in use, particularly if they are constructed dynamically.

   function Image (P : Pattern) return String;
   function Image (P : Pattern) return VString;
   --  This procedures yield strings that corresponds to the syntax needed
   --  to create the given pattern using the functions in this package. The
   --  form of this string is such that it could actually be compiled and
   --  evaluated to yield the required pattern except for references to
   --  variables and functions, which are output using one of the following
   --  forms:
   --
   --     access Natural     NP(16#...#)
   --     access Pattern     PP(16#...#)
   --     access VString     VP(16#...#)
   --
   --     Natural_Func       NF(16#...#)
   --     VString_Func       VF(16#...#)
   --
   --  where 16#...# is the hex representation of the integer address that
   --  corresponds to the given access value

   procedure Dump (P : Pattern);
   --  This procedure writes information about the pattern to Standard_Out.
   --  The format of this information is keyed to the internal data structures
   --  used to implement patterns. The information provided by Dump is thus
   --  more precise than that yielded by Image, but is also a bit more obscure
   --  (i.e. it cannot be interpreted solely in terms of this spec, you have
   --  to know something about the data structures).

   ------------------
   -- Private Part --
   ------------------

private
   type PE;
   --  Pattern element, a pattern is a plex structure of PE's. This type
   --  is defined and sdescribed in the body of this package.

   type PE_Ptr is access all PE;
   --  Pattern reference. PE's use PE_Ptr values to reference other PE's

   type Pattern is new Controlled with record

      Stk : Natural;
      --  Maximum number of stack entries required for matching this
      --  pattern. See description of pattern history stack in body.

      P   : PE_Ptr;
      --  Pointer to initial pattern element for pattern

   end record;

   pragma Finalize_Storage_Only (Pattern);

   procedure Adjust (Object : in out Pattern);
   --  Adjust routine used to copy pattern objects

   procedure Finalize (Object : in out Pattern);
   --  Finalization routine used to release storage allocated for a pattern.

   type VString_Ptr is access all VString;

   type Match_Result is record
      Var   : VString_Ptr;
      --  Pointer to subject string. Set to null if match failed.

      Start : Natural;
      --  Starting index position (1's origin) of matched section of
      --  subject string. Only valid if Var is non-null.

      Stop  : Natural;
      --  Ending index position (1's origin) of matched section of
      --  subject string. Only valid if Var is non-null.

   end record;

   pragma Volatile (Match_Result);
   --  This ensures that the Result parameter is passed by reference, so
   --  that we can play our games with the bogus Match_Result_Var parameter
   --  in the function case to treat it as though it were an in out parameter.

end GNAT.Spitbol.Patterns;