filter.txt   [plain text]


Input Filter Extension
~~~~~~~~~~~~~~~~~~~~~~

Introduction
============
We all know that you should always check input variables, but PHP does not
offer really good functionality for doing this in a safe way. The Input Filter
extension is meant to address this issue by implementing a set of filters and
mechanisms that users can use to safely access their input data. 


Change Log
==========
2005-10-27
    * Updated filter_data prototype
    * Added filter constants
    * Fixed minor problems
    * Changes by David Tulloh

2005-10-05
    * Changed "input_filter.paranoid_admin_default_filter" to
      "filter.default".
    * Updated API prototypes to reflect implementation.
    * Added 'on' and 'off' to the boolean filter.
    * Removed min_range and max_range flags from the float filter.
    * Added validate_url, validate_email and validate_ip filters.
    * Updated allows flags for all filters.

2005-08-15
    * Unmade *source* a bitmask, it doesn't make sense to do.
    * Changed return value of filters which got invalid data from 'false' to
      'null.
    * Failed filters do not throw an E_NOTICE any longer.
    * Added a magic_quotes sanitizing filter.


General Considerations
======================
* If the filter's expected input data mask does not match the provided data
  for logical filters the filter function returns "false". If the data was
  not found, "null" is returned.
* Character filters always return a string.
* With the input filter extension enabled, and the
  input_filter.paranoid_admin_default_filter is set to something != 'raw',
  then all entries in the affected super globals will be passed through the
  configured filter. The 'callback' filter can not be used here, as that
  requieres a PHP script to be running already.
* As the input filter acts on input data before the magic quotes function
  mangles data, all access through the filter() function will not have any
  quotes or slashes added - it will be the pure data as send by the browser.
* All flags mentioned here should be prepended with `FILTER_FLAG_` when used
  with PHP.

  
API
===
mixed *input_get* (int *source*, string *name*, [, int *filter* [, mixed *filter_options*, [ string *characterset* ] ]);
    Returns the filtered variable *$name* from source *$source*. It uses the
    filter as specified in *$filter* with a constant, and additional options
    to the filter through *$filter_options*.

mixed *input_get_args* (array *definitions*, int *source*, [, array *data*]);
    Returns an array with all filtered variables defined in 'definition'.
    The keys are used as the name of the argument. The value can be either
    an integer (flags) or an array of options. This array can contain 
    the 'filter' type, the 'flags', the 'otptions' or the 'charset'

bool *input_has_variable (int *source*, string *name*);
    Returns *true* if the variable with the name *name* exists in *source*, or
    *false* otherwise.

array *input_filters_list* ();
    Returns a list with all supported filter names.

mixed *filter_data* (mixed *variable*, int *filter* [, mixed *filter_options*, [ string *characterset* ] ]);
    Filters the user supplied variable *$variable* in the same manner as
    *input_get*.

*$source*:

* INPUT_POST     0
* INPUT_GET      1
* INPUT_COOKIE   2
* INPUT_ENV      4
* INPUT_SERVER   5 (not implemented yet)
* INPUT_SESSION  6 (not implemented yet)


General flags
=============

* FILTER_FLAG_SCALAR
* FILTER_FLAG_ARRAY

These two constants define whether to allow arrays in the source values. The
default value is SCALAR for input_get_args and ARRAY for the other functions
(< 0.9.5). These constants also insure that the function returns the correct
type, if you ask for an array, you will get an array even if the source is
only one value. However, if you ask for a scalar and the source is an array,
the result will be FALSE (invalid).


Logical Filters
===============

These filters check whether passed data was valid, and do never mangle input
variables, but ofcourse they can deny the whole input variable getting to the
application by returning false.

The constants should be prepended by `FILTER_VALIDATE_` when used with php.

================ ========== =========== ==================================================
Name             Constant   Return Type Description      
================ ========== =========== ==================================================
int              INT        integer     Returns the input variable as an integer

                                        $filter_options - an array with the optional
                                        elements:

                                        * min_range: Minimal number that is allowed
                                          (inclusive)
                                        * max_range: Maximum number that is allowed
                                          (inclusive)
                                        * flags: A bitmask supporting the following flags:
                          
                                          - ALLOW_OCTAL: allow octal numbers with the format
                                            0nn as input too.
                                          - ALLOW_HEX: allow hexadecimal numbers with the
                                            format 0xnn or 0Xnn too.

boolean          BOOLEAN    boolean     Returns *true* for '1', 'on' and 'true' and *false*
                                        for '0', 'off' and 'false'

float            FLOAT      float       Returns the input variable as a floating point value

validate_regexp  REGEXP     string      Matches the input value as a string against the
                                        regular expression. If there is a match then the
                                        string is returned, otherwise the filter returns
                                        *null*.
                                        Remarks: Only available if pcre has been compiled
                                        into PHP.

validate_url     URL        string      Validates an URL's format.

                                        $filter_options - an bitmask that supports the
                                        following flags:
           
                                        * SCHEME_REQUIRED: The 'schema' part of the URL
                                          needs to in the passed URL.
                                        * HOST_REQUIRED: The 'host' part of the URL
                                          needs to in the passed URL.
                                        * PATH_REQUIRED: The 'path' part of the URL
                                          needs to in the passed URL.
                                        * QUERY_REQUIRED: The 'query' part of the URL
                                          needs to in the passed URL.

validate_email   EMAIL      string      Validates the passed string against a reasonably
                                        good regular expression for validating an email
                                        address.

validate_ip      IP         string      Validates a string representing an IP address.

                                        $filter_options - an bitmask that supports the
                                        following flags:

                                        * IPV4: Allows IPv4 addresses.
                                        * IPV6: Allows IPv6 addresses.
                                        * NO_RES_RANGE: Disallows addresses in reversed
                                          ranges (IPv4 only)
                                        * NO_PRIV_RANGE: Disallows addresses in private
                                          ranges (IPv4 only)
================ ========== =========== ==================================================


Sanitizing Filters
==================

These filters remove data, or change data depending on the filter, and the
set rules for this specific filter. Instead of taking an *options* array, they
use this parameter for flags for the specific filter.

The constants should be prepended by `FILTER_SANITIZE_` when used with php.
                         
============= ================ =========== =====================================================
Name          Constant         Return Type Description      
============= ================ =========== =====================================================
string        STRING           string      Returns the input variable as a string after it has
                                           been stripped of XML/HTML tags and other evil things
                                           that can cause XSS problems.

                                           $filter_options - an bitmask that supports the
                                           following flags:
                 
                                           * NO_ENCODE_QUOTES: Prevents single and double
                                             quotes from being encoded as numerical HTML
                                             entities.
                                           * STRIP_LOW: excludes all characters < 0x20 from the
                                             allowed character list
                                           * STRIP_HIGH: excludes all characters >= 0x80 from
                                             the allowed character list
                                           * ENCODE_LOW: allows characters < 0x20 but encodes
                                             them as numerical HTML entities
                                           * ENCODE_HIGH: allows characters >= 0x80 but encodes
                                             them as numerical HTML entities
                                           * ENCODE_AMP: encodes & as &amp;

                                           The flags STRIP_LOW and ENCODE_LOW are mutual
                                           exclusive, and so are STRIP_HIGH and ENCODE_HIGH. In
                                           the case they clash, the characters will be
                                           stripped.

stripped      STRIPPED         string      Alias for 'string'.

encoded       ENCODED          string      Encodes all characters outside the range
                                           "a-zA-Z0-9-._" as URL encoded values.

                                           $filter_options - an bitmask that supports the
                                           following flags:

                                           * STRIP_LOW: excludes all characters < 0x20 from the
                                             allowed character list
                                           * STRIP_HIGH: excludes all characters >= 0x80 from
                                             the allowed character list
                                           * ENCODE_LOW: allows characters < 0x20 but encodes
                                             them as numerical HTML entities
                                           * ENCODE_HIGH: allows characters >= 0x80 but encodes
                                             them as numerical HTML entities

special_chars SPECIAL_CHARS    string      Encodes the 'special' characters ' " < > &, \0 and
                                           everything below 0x20 as numerical HTML entities.
                 
                                           $filter_options - an bitmask that supports the
                                           following flags:
                 
                                           * STRIP_LOW: excludes all characters < 0x20 from the
                                             allowed character list. If this is not set, then
                                             those characters are encoded as numerical HTML
                                             entities
                                           * STRIP_HIGH: excludes all characters >= 0x80 from
                                             the allowed character list
                                           * ENCODE_HIGH: allows characters >= 0x80 but encodes
                                             them as numerical HTML entities

unsafe_raw    UNSAFE_RAW       string      Returns the input variable as a string without
                                           XML/HTML being stripped from the input value.
                 
                                           $filter_options - an bitmask that supports the
                                           following flags:
                 
                                           * STRIP_LOW: excludes all characters < 0x20 from the
                                             allowed character list
                                           * STRIP_HIGH: excludes all characters >= 0x80 from
                                             the allowed character list
                                           * ENCODE_LOW: allows characters < 0x20 but encodes
                                             them as numerical HTML entities
                                           * ENCODE_HIGH: allows characters >= 0x80 but encodes
                                             them as numerical HTML entities
                                           * ENCODE_AMP: encodes & as &amp;
                 
                                           The flags STRIP_LOW and ENCODE_LOW are mutual
                                           exclusive, and so are STRIP_HIGH and ENCODE_HIGH. In
                                           the case they clash, the characters will be
                                           stripped.

email         EMAIL            string      Removes all characters that can not be part of a
                                           correctly formed e-mail address (exception are
                                           comments in the email address) (a-z A-Z 0-9 " ! # $
                                           % & ' * + - / = ? ^ _ ` { | } ~ @ . [ ]). This
                                           filter does `not` validate if the e-mail address has
                                           the correct format, use the validate_email filter
                                           for that.
                 
url           URL              string      Removes all characters that can not be part of a
                                           correctly formed URI. (a-z A-Z 0-9 $ - _ . + ! * ' (
                                           ) , { } | \ ^ ~ [ ] ` < > # % " ; / ? : @ & =) This
                                           filter does `not` validate if a URI has the correct
                                           format, use the validate_url filter for that.
                 
number_int    NUMBER_INT       int         Removes all characters that are [^0-9+-].

number_float  NUMBER_FLOAT     float       Removes all characters that are [^0-9+-].

                                           $filter_options - an bitmask that supports the
                                           following flags:
                 
                                           * ALLOW_FRACTION: adds "." to the characters that
                                             are not stripped.
                                           * ALLOW_THOUSAND: adds "," to the characters that
                                             are not stripped.
                                           * ALLOW_SCIENTIFIC: adds "eE" to the characters that
                                             are not stripped.

magic_quotes  MAGIC_QUOTES     string      BC filter for people who like magic quotes.
============= ================ =========== =====================================================


Callback Filter
===============

This filter will callback to the specified callback function as specified with
the *filter_options* parameter. All variants of callback functions are
supported:

* function with *'functionname'*
* static method with *array('classname', 'methodname')*
* dynamic method with *array(&$this, 'methodname')*

The constants should be prepended by `FILTER_` when used with php.

============= =========== =========== =====================================================
Name          Constant    Return Type Description      
============= =========== =========== =====================================================
callback      CALLBACK    mixed       Calls the callback function/method with the input
                                      variable's value by reference which can do filtering
                                      and modifying of the input value. If the callback
                                      function returns "false" then the input value is
                                      supposed to be incorrect and the returned value will
                                      be 'false' (and an E_NOTICE will be raised).
============= =========== =========== =====================================================

The callback function's prototype is:

boolean callback(&$value, $characterset);
    With *$value* being a reference to the input variable and *$characterset*
    containing the same value as this parameter's value in the call to
    *input_get()* or *input_get_array()*. If the *$characterset* parameter was
    not passed, it defaults to *'null'*.

Version: $Id: filter.txt,v 1.5 2006/05/24 11:51:55 pajoye Exp $
.. vim: et syn=rst tw=78