XML::LibXML

Authors: Matt Sergeant, Christian Glahn, Petr Pajas
Version: 1.64
Copyright: 2001-2007 AxKit.com Ltd; 2002-2006 Christian Glahn; 2006-2007 Petr Pajas

Introduction

README

This module implements a Perl interface to the Gnome libxml2 library, which provides interfaces for parsing and manipulating XML files. It lets Perl programmers use libxml2's validating XML parser and its high-performance DOM implementation.

Important Notes

XML::LibXML was almost entirely reimplemented between versions 1.40 and 1.49. This may cause problems on some production machines. Version 1.50 applied many compatibility fixes, so programs written for XML::LibXML 1.40 or earlier should run again with version 1.50.

In 1.59, a new callback API was introduced. This new API is not compatible with the previous one. See the XML::LibXML::InputCallback manual page for details.

In 1.61, the XML::LibXML::XPathContext module, previously distributed separately, was merged in.

Dependencies

Prior to installation you MUST have installed the libxml2 library. You can get the latest libxml2 version from http://xmlsoft.org/

Without libxml2 installed this module will neither build nor run.

XML::LibXML also requires the following packages:

    XML::LibXML::Common - general functions used by various XML::LibXML modules
    XML::SAX - DOM building support from SAX
    XML::NamespaceSupport - DOM building support from SAX

These packages are required. If one is missing, some tests will fail.

Again, libxml2 is required to make XML::LibXML work. The library is needed not only to build XML::LibXML; it has to be accessible at run time as well. Because of this you need to make sure libxml2 is installed properly. To test this, run the xmllint program on your system. xmllint is shipped with libxml2 and should therefore be available. To build the module you will also need the libxml2 header files, which in binary distributions (.rpm, .deb, etc.) usually live in a package named libxml2-devel or similar.
Installation

To install XML::LibXML, follow the standard installation routine for Perl modules:

    perl Makefile.PL
    make
    make test
    make install # as superuser

Note that XML::LibXML is an XS-based Perl extension, so you need a C compiler to build it.

Note also that you should rebuild XML::LibXML if you upgrade libxml2, in order to avoid problems with possible binary incompatibilities between releases of the library.

Notes on libxml2 versions

XML::LibXML requires at least libxml2 2.6.16 to compile and pass all tests, and at least 2.6.21 is required for XML::LibXML::Reader. For some older OS versions this means that the pre-built packages must be updated.

Although libxml2 claims binary compatibility between its patch levels, it is a good idea to recompile XML::LibXML and XML::LibXML::Common and run their tests after an upgrade of libxml2.

If your libxml2 installation is not within your $PATH, you can pass the XMLPREFIX=$YOURLIBXMLPREFIX parameter to Makefile.PL so that it determines the correct libxml2 version in use. For example,

    perl Makefile.PL XMLPREFIX=/usr/brand-new

will ask '/usr/brand-new/bin/xml2-config' about your real libxml2 configuration.

Try to avoid setting INC and LIBS directly on the command line: if they are used, Makefile.PL does not check the libxml2 version for compatibility with XML::LibXML.

Which version of libxml2 should be used?

XML::LibXML is tested against several versions of libxml2 before it is released. As a result, some versions of libxml2 are known not to work properly with XML::LibXML. Makefile.PL keeps a blacklist of the incompatible libxml2 versions and notifies the user if it detects one of them. XML::LibXML may still build and pass its tests with such a version, but that does not mean everything is OK. There will be no support at all for blacklisted versions!

As of XML::LibXML 1.61, only versions 2.6.16 and higher are supported.
XML::LibXML will probably not compile with libxml2 versions earlier than 2.5.6. Versions prior to 2.6.8 are known to be broken for various reasons; versions prior to 2.6.16 exhibit problems with namespaced attributes and therefore do not pass the XML::LibXML regression tests.

It may happen that an unsupported version of libxml2 passes all tests under certain conditions. This is no reason to assume that it will work without problems. If Makefile.PL marks a version of libxml2 as incompatible or broken, it does so for a good reason.

Notes for Microsoft Windows

Thanks to Randy Kobes there is a pre-compiled PPM package available on http://theoryx5.uwinnipeg.ca/ppmpackages/

Usually it takes a little time to build the package for the latest release.

Notes for Mac OS X

Due to refactoring of the module, XML::LibXML will not run with some earlier versions of Mac OS X. It appears that this is related to special linker options for that OS prior to version 10.2.2. Since the developers do not have full access to this OS, help and patches from OS X gurus are highly appreciated.

It is confirmed that XML::LibXML builds and runs without problems on Mac OS X 10.2.6 and later.

Notes for HPUX

XML::LibXML requires libxml2 2.6.16 or later. A usable binary libxml2 package for HPUX and XML::LibXML may not exist.

If HPUX cc does not compile libxml2 correctly, you will be forced to recompile perl with gcc (unless you have already done that).
Additionally, I received the following note from Rozi Kovesdi:

    Here is my report if someone else runs into the same problem:

    Finally I am done with installing all the libraries and XML Perl modules.

    The combination that worked best for me was: gcc, GNU make

    Most importantly - before trying to install Perl modules that depend on libxml2:

    must set SHLIB_PATH to include the path to the libxml2 shared library

    assuming that you used the default:

        export SHLIB_PATH=/usr/local/lib

    also, make sure that the config files have execute permission:

        /usr/local/bin/xml2-config
        /usr/local/bin/xslt-config

    they did not have +x after they were installed by 'make install' and it took me a while to realize that this was my problem

    or one can use:

        perl Makefile.PL LIBS='-L/path/to/lib' INC='-I/path/to/include'

Contact

For bug reports, please use the CPAN request tracker on http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-LibXML

For suggestions etc. you may contact the maintainer directly at "pajas at ufal dot mff dot cuni dot cz", but in general it is recommended to use the mailing list given below.

For suggestions and other issues related to XML::LibXML you may use the perl XML mailing list (perl-xml@listserv.ActiveState.com), where most XML-related Perl modules are discussed. In case of problems you should check the archives of that list first; many problems have already been discussed there. You can find the list's archives and subscription options at http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/perl-xml

Package History

    Versions < 0.98 were maintained by Matt Sergeant
    0.98 <= Versions < 1.49 were maintained by Matt Sergeant and Christian Glahn
    Versions >= 1.49 are maintained by Christian Glahn
    Versions > 1.56 are co-maintained by Petr Pajas
    Versions >= 1.59 are provisionally maintained by Petr Pajas

Patches and Developer Version

As XML::LibXML is open source software, help and patches are appreciated.
If you find a bug in the current release, make sure this bug still exists in the developer version of XML::LibXML. This version can be downloaded from its Subversion repository, e.g. via

    svn co svn://axkit.org/XML-LibXML/trunk

Note that this account does not allow direct commits.

Please consider all regression tests as correct. If any test fails, it is most certainly related to a bug.

If you find documentation bugs, please fix them in the libxml.dbk file, stored in the docs directory.

Known Issues

The push-parser implementation causes memory leaks.

License

LICENSE

This is free software; you may use it and distribute it under the same terms as Perl itself.

Copyright 2001-2003 AxKit.com Ltd, All rights reserved.

Disclaimer

THIS PROGRAM IS DISTRIBUTED IN THE HOPE THAT IT WILL BE USEFUL, BUT WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Perl Binding for libxml2

XML::LibXML

Synopsis

    use XML::LibXML;
    my $parser = XML::LibXML->new();

    my $doc = $parser->parse_string(<<'EOT');
    <some-xml/>
    EOT

Description

This module is an interface to the gnome libxml2 DOM and SAX parser and the DOM tree. It also provides an XML::XPath-like findnodes() interface, providing access to the XPath API in libxml2. The module is split into several packages which are not described in this section.
For further information, please check the following documentation:

    XML::LibXML::Parser - Parsing XML Files with XML::LibXML
    XML::LibXML::DOM - XML::LibXML DOM Implementation
    XML::LibXML::SAX - XML::LibXML direct SAX parser
    XML::LibXML::Reader - Reading XML with a pull-parser
    XML::LibXML::Document - XML::LibXML DOM Document Class
    XML::LibXML::Node - Abstract Base Class of XML::LibXML Nodes
    XML::LibXML::Element - XML::LibXML Class for Element Nodes
    XML::LibXML::Text - XML::LibXML Class for Text Nodes
    XML::LibXML::Comment - XML::LibXML Comment Nodes
    XML::LibXML::CDATASection - XML::LibXML Class for CDATA Sections
    XML::LibXML::Attr - XML::LibXML Attribute Class
    XML::LibXML::DocumentFragment - XML::LibXML's DOM L2 Document Fragment Implementation
    XML::LibXML::Namespace - XML::LibXML Namespace Implementation
    XML::LibXML::PI - XML::LibXML Processing Instructions
    XML::LibXML::Dtd - XML::LibXML DTD Support
    XML::LibXML::RelaxNG - XML::LibXML frontend for RelaxNG schema validation
    XML::LibXML::Schema - XML::LibXML frontend for W3C Schema schema validation
    XML::LibXML::XPathContext - API for evaluating XPath expressions
    XML::LibXMLguts - Internals of the Perl Layer for libxml2 (not done yet)

Version Information

Sometimes it is useful to find out which version of libxml2 XML::LibXML was compiled for. In most cases this is needed for debugging, or to check whether a given installation provides all the functionality the package needs. The functions XML::LibXML::LIBXML_DOTTED_VERSION and XML::LibXML::LIBXML_VERSION provide this version information. Both functions simply pass through the values of the similarly named macros of libxml2. Similarly, XML::LibXML::LIBXML_RUNTIME_VERSION returns the version of the (usually dynamically) linked libxml2.

XML::LibXML::LIBXML_DOTTED_VERSION

    $Version_String = XML::LibXML::LIBXML_DOTTED_VERSION;

Returns the version string of the libxml2 version XML::LibXML was compiled for. This will be "2.6.2" for "libxml2 2.6.2".
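As a rough illustration of the three version functions introduced in this section, the following sketch prints the compile-time and run-time libxml2 versions and compares the run-time id against a minimum. The threshold 20616 (libxml2 2.6.16, written as a version id) and the variable names are assumptions chosen for this example only, not something the module prescribes.

```perl
use strict;
use warnings;
use XML::LibXML;

# libxml2 version XML::LibXML was compiled for
my $compiled_dotted = XML::LibXML::LIBXML_DOTTED_VERSION();   # e.g. "2.6.2"
my $compiled_id     = XML::LibXML::LIBXML_VERSION();          # e.g. 20602

# libxml2 version linked at run time; a CVS build may carry a suffix
# such as "-CVS2032", so only the leading digits are compared
my $runtime = XML::LibXML::LIBXML_RUNTIME_VERSION();
my ($runtime_id) = $runtime =~ /^(\d+)/;

print "XML::LibXML $XML::LibXML::VERSION\n";
print "compiled for libxml2 $compiled_dotted (id $compiled_id)\n";
print "running with libxml2 id $runtime\n";

# Assumed minimum for this sketch: libxml2 2.6.16
my $required_id = 20616;
warn "libxml2 at run time is older than 2.6.16\n"
    if $runtime_id < $required_id;
```

A check like this is mostly useful in bug reports or at program start-up, where a mismatch between the compile-time and run-time library is easier to diagnose than a later failure.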
XML::LibXML::LIBXML_VERSION

    $Version_ID = XML::LibXML::LIBXML_VERSION;

Returns the version id of the libxml2 version XML::LibXML was compiled for. This will be "20602" for "libxml2 2.6.2". Don't confuse this version id with $XML::LibXML::VERSION. The latter contains the version of XML::LibXML itself, while the former contains the version of libxml2 XML::LibXML was compiled for.

XML::LibXML::LIBXML_RUNTIME_VERSION

    $DLL_Version = XML::LibXML::LIBXML_RUNTIME_VERSION;

Returns a version string of the libxml2 which is (usually dynamically) linked by XML::LibXML. This will be "20602" for libxml2 released as "2.6.2" and something like "20602-CVS2032" for a CVS build of libxml2.

XML::LibXML issues a warning if the version of libxml2 dynamically linked to it is less than the version of libxml2 which it was compiled against.

Related Modules

The modules described in this section are not part of the XML::LibXML package itself. As they support some additional features, they are mentioned here.

    XML::LibXSLT - XSLT Processor using libxslt and XML::LibXML
    XML::LibXML::Common - Common functions for XML::LibXML related Classes
    XML::LibXML::Iterator - XML::LibXML Implementation of the DOM Traversal Specification

XML::LibXML and XML::GDOME

Note: THE FUNCTIONS DESCRIBED HERE ARE STILL EXPERIMENTAL

Although both modules make use of libxml2's XML capabilities, their DOM implementations are not compatible. It is still possible, however, to exchange nodes from one DOM to the other. The concept of this exchange is similar to the function cloneNode(): the particular node is copied at the low level to the opposite DOM implementation.

Since the DOM implementations cannot coexist within one document, one is forced to copy each node that should be used. Because you are always keeping two nodes, this may have a considerable impact on a machine's memory usage.

XML::LibXML provides two functions to export or import GDOME nodes: import_GDOME() and export_GDOME().
Both functions take two parameters: the node and a flag for recursive import. The flag works as in cloneNode().

The two functions allow XML::GDOME nodes to be exported and imported explicitly; however, XML::LibXML also allows the transparent import of XML::GDOME nodes in functions such as appendChild(), insertAfter() and so on. While native nodes are automatically adopted in most functions, XML::GDOME nodes are always cloned in advance. Thus, if the original node is modified after the operation, the node in the XML::LibXML document will not reflect the change.

import_GDOME

    $libxmlnode = XML::LibXML->import_GDOME( $node, $deep );

This explicitly clones an XML::GDOME node into an XML::LibXML node.

export_GDOME

    $gdomenode = XML::LibXML->export_GDOME( $node, $deep );

This clones an XML::LibXML node into an XML::GDOME node.

CONTACTS

For bug reports, please use the CPAN request tracker on http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-LibXML

For suggestions and other issues related to XML::LibXML you may use the perl XML mailing list (perl-xml@listserv.ActiveState.com), where most XML-related Perl modules are discussed. In case of problems you should check the archives of that list first; many problems have already been discussed there. You can find the list's archives and subscription options at http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/perl-xml.

Parsing XML Data with XML::LibXML

XML::LibXML::Parser

Synopsis

    use XML::LibXML;
    my $parser = XML::LibXML->new();

    my $doc = $parser->parse_string(<<'EOT');
    <some-xml/>
    EOT
    my $fdoc = $parser->parse_file( $xmlfile );

    my $fhdoc = $parser->parse_fh( $xmlstream );

    my $fragment = $parser->parse_xml_chunk( $xml_wb_chunk );

Parsing

An XML document is read into a data structure such as a DOM tree by a piece of software called a parser. XML::LibXML currently provides four different parser interfaces:

    A DOM Pull-Parser
    A DOM Push-Parser
    A SAX Parser
    A DOM based SAX Parser.
Creating a Parser Instance

XML::LibXML provides an OO interface to the libxml2 parser functions. Thus you have to create a parser instance before you can parse any XML data.

new

    $parser = XML::LibXML->new();

There is not much to say about the constructor. It simply creates a new parser instance.

Although libxml2 uses mainly global flags to alter the behaviour of the parser, each XML::LibXML parser instance has its own flags or callbacks and does not interfere with other instances.

DOM Parser

One of the common parser interfaces of XML::LibXML is the DOM parser. This parser reads XML data into a DOM-like data structure, so each tag can be accessed and transformed.

XML::LibXML's DOM parser is capable of parsing not only XML data but also (strict) HTML files. There are three ways to parse documents: as a string, as a Perl filehandle, or as a filename/URL. The return value from each is an XML::LibXML::Document object, which is a DOM object.

All of the functions listed below will throw an exception if the document is invalid. To prevent this from terminating your program, wrap the call in an eval{} block.

parse_file

    $doc = $parser->parse_file( $xmlfilename );

This function parses an XML document from a file or network; $xmlfilename can be either a filename or a URL. Note that for parsing files, this function is the fastest choice, about 6-8 times faster than parse_fh().

parse_fh

    $doc = $parser->parse_fh( $io_fh );

parse_fh() parses an IOREF or a subclass of IO::Handle.

Because the data comes from an open handle, libxml2's parser does not know about the base URI of the document. To set the base URI one should use parse_fh() as follows:

    my $doc = $parser->parse_fh( $io_fh, $baseuri );

parse_string

    $doc = $parser->parse_string( $xmlstring );

This function is similar to parse_fh(), but it parses an XML document that is available as a single string in memory. Again, you can pass an optional base URI to the function.
    my $doc = $parser->parse_string( $xmlstring, $baseuri );

parse_html_file

    $doc = $parser->parse_html_file( $htmlfile, \%opts );

Similar to parse_file() but parses HTML (strict) documents; $htmlfile can be a filename or URL.

An optional second argument can be used to pass some options to the HTML parser as a HASH reference. Possible options are: encoding and URI for libxml2 < 2.6.27, and for later versions of libxml2 additionally: recover, suppress_errors, suppress_warnings, pedantic_parser, no_blanks, and no_network.

parse_html_fh

    $doc = $parser->parse_html_fh( $io_fh, \%opts );

Similar to parse_fh() but parses HTML (strict) streams.

An optional second argument can be used to pass some options to the HTML parser as a HASH reference. Possible options are: encoding and URI for libxml2 < 2.6.27, and for later versions of libxml2 additionally: recover, suppress_errors, suppress_warnings, pedantic_parser, no_blanks, and no_network.

Note: the encoding option may not work correctly with this function in libxml2 < 2.6.27 if the HTML file declares its charset using a META tag.

parse_html_string

    $doc = $parser->parse_html_string( $htmlstring, \%opts );

Similar to parse_string() but parses HTML (strict) strings.

An optional second argument can be used to pass some options to the HTML parser as a HASH reference. Possible options are: encoding and URI for libxml2 < 2.6.27, and for later versions of libxml2 additionally: recover, suppress_errors, suppress_warnings, pedantic_parser, no_blanks, and no_network.

Parsing HTML may cause problems, especially if the ampersand ('&') is used. This is a common problem when the parsed HTML code contains links to CGI scripts. Such links cause the parser to throw errors. In such cases libxml2 still parses the entire document as if there were no error, but the error causes XML::LibXML to stop the parsing process. However, the document is not lost. Such HTML documents should be parsed using the recover flag.
By default recovering is deactivated.

The functions described above are implemented to parse well-formed documents. In some cases a program gets well-balanced XML instead of well-formed documents (e.g. an XML fragment from a database). With XML::LibXML it is not necessary to wrap such fragments in the code, because XML::LibXML is capable of parsing well-balanced XML fragments as well.

parse_balanced_chunk

    $fragment = $parser->parse_balanced_chunk( $wbxmlstring );

This function parses a well-balanced XML string into an XML::LibXML::DocumentFragment.

parse_xml_chunk

    $fragment = $parser->parse_xml_chunk( $wbxmlstring );

This is the old name of parse_balanced_chunk(). Because it may cause confusion with the push parser interface, this function should not be used anymore.

By default XML::LibXML does not process XInclude tags within an XML document (see the options section below). XML::LibXML allows a document to be post-processed to expand XInclude tags.

process_xincludes

    $parser->process_xincludes( $doc );

After a document is parsed into a DOM structure, you may want to expand the document's XInclude tags. This function processes the given document structure and expands all XInclude tags (or throws an error) by using the flags and callbacks of the given parser instance.

Note that the resulting tree contains some extra nodes (of type XML_XINCLUDE_START and XML_XINCLUDE_END) after successfully processing the document. These nodes indicate where data was included into the original tree. If the document is serialized, these extra nodes will not show up.

Remember: a document with processed XIncludes differs from the original document after serialization, because the original XInclude tags will not be restored!

If the parser flag "expand_xinclude" is set to 1, you do not need to post-process the parsed document.

processXIncludes

    $parser->processXIncludes( $doc );

This is an alias for process_xincludes with a Java-like function name.
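The DOM parser functions described in this section can be combined as in the following minimal sketch. It wraps parse_string() in an eval block, as recommended above, and parses a fragment with parse_balanced_chunk(). The sample XML strings and variable names are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new();

# Well-formed input: parse_string() returns an XML::LibXML::Document
my $doc = eval { $parser->parse_string('<root><item>one</item></root>') };
die "unexpected parse failure: $@" unless $doc;
print "document element: ", $doc->documentElement->nodeName, "\n";

# Broken input (unclosed <item>): the parser throws an exception,
# and the eval block keeps the program running
my $broken = eval { $parser->parse_string('<root><item>one</root>') };
print "second document could not be parsed\n" unless defined $broken;

# A well-balanced fragment has no single root element, so it is parsed
# into an XML::LibXML::DocumentFragment instead of a document
my $fragment = $parser->parse_balanced_chunk('<a>1</a><b>2</b>');
my @children = $fragment->childNodes;
print scalar(@children), " top-level nodes in the fragment\n";
```

Checking the return value of eval, rather than only inspecting $@ afterwards, is a common Perl idiom and keeps the success path and the error path clearly separated.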
Push Parser

XML::LibXML provides a push parser interface. Rather than pulling the data from a given source, the push parser waits for the data to be pushed into it.

This allows one to parse large documents without waiting for the parser to finish. The interface is especially useful if a program needs to pre-process the incoming pieces of XML (e.g. to detect document boundaries).

While the XML::LibXML parse_*() functions require the data to be well-formed XML, the push parser will take any arbitrary string that contains some XML data. The only requirement is that all the pushed strings together form a well-formed document. With the push parser interface a program can interrupt the parsing process as required, where the parse_*() functions do not give enough flexibility.

Unlike the pull parser implemented in parse_fh() or parse_file(), the push parser is not able to detect the end of the document itself. Thus the calling program needs to indicate explicitly when the parsing is done.

In XML::LibXML this is done by a single function:

parse_chunk

    $parser->parse_chunk($string, $terminate);

parse_chunk() tries to parse a given chunk of data, which isn't necessarily well-balanced data. The function takes two parameters: the chunk of data as a string and, optionally, a termination flag. If the termination flag is set to a true value (e.g. 1), the parsing will be stopped and the resulting document will be returned, as the following example shows:

    my $parser = XML::LibXML->new;
    for my $string ( "<", "foo", ' bar="hello world"', "/>") {
        $parser->parse_chunk( $string );
    }
    my $doc = $parser->parse_chunk("", 1); # terminate the parsing

Internally XML::LibXML provides three functions that control the push parser process:

start_push

    $parser->start_push();

Initializes the push parser.

push

    $parser->push(@data);

This function pushes the data stored inside the array to libxml2's parser. Each entry in @data must be a normal scalar!
finish_push

    $doc = $parser->finish_push( $recover );

This function returns the result of the parsing process. If this function is called without a parameter, it will complain about non-well-formed documents. If $recover is 1, the push parser can be used to restore broken or non-well-formed (XML) documents. Without it, broken input is reported as an error:

    eval {
        $parser->push( "<foo>", "bar" );
        $doc = $parser->finish_push();    # will report broken XML
    };
    if ( $@ ) {
        # ...
    }

This can be annoying if the closing tag is missed by accident. The following code will restore the document:

    eval {
        $parser->push( "<foo>", "bar" );
        $doc = $parser->finish_push(1);   # will return the data parsed
                                          # unless an error happened
    };

    print $doc->toString(); # returns "<foo>bar</foo>"

Of course finish_push() will return nothing if no data was pushed to the parser before.

DOM based SAX Parser

XML::LibXML provides a DOM based SAX parser. The SAX parser is defined in XML::LibXML::SAX::Parser. As it is not a stream based parser, it parses documents into a DOM and traverses the DOM tree instead.

The API of this parser is exactly the same as any other Perl SAX2 parser. See XML::SAX::Intro for details.

Aside from the regular parsing methods, you can access the DOM tree traverser directly, using the generate() method:

    my $doc = build_yourself_a_document();
    my $saxparser = XML::LibXML::SAX::Parser->new( ... );
    $saxparser->generate( $doc );

This is useful for serializing DOM trees, for example trees that you have already processed, or that you have as a result of XSLT processing.

WARNING

This is NOT a streaming SAX parser. As I said above, this parser reads the entire document into a DOM and serialises it. Some people couldn't read that in the paragraph above so I've added this warning.

If you want a streaming SAX parser, look at the XML::LibXML::SAX man page.

Serialization

XML::LibXML provides some functions to serialize nodes and documents.
The serialization functions are described on the XML::LibXML::Node manpage or the XML::LibXML::Document manpage. XML::LibXML checks three global flags that alter the serialization process:

    skipXMLDeclaration
    skipDTD
    setTagCompression

Of these three flags, only setTagCompression is available for all serialization functions.

Because XML::LibXML does not define these flags itself, one has to define them locally, as the following example shows:

    local $XML::LibXML::skipXMLDeclaration = 1;
    local $XML::LibXML::skipDTD = 1;
    local $XML::LibXML::setTagCompression = 1;

If skipXMLDeclaration is defined and not '0', the XML declaration is omitted during serialization.

If skipDTD is defined and not '0', an existing DTD will not be serialized with the document.

If setTagCompression is defined and not '0', empty tags are displayed as open and closing tags rather than the shortcut. For example, the empty tag foo will be rendered as <foo></foo> rather than <foo/>.

Parser Options

LibXML options are global (unfortunately this is a limitation of the underlying implementation, not this interface). They can be set using either $parser->option(...) or XML::LibXML->option(...); both are treated in the same manner. Note that even two parser processes will share some of the same options, so be careful out there!

Every option returns the previous value, and can be called without parameters to get the current value.

validation

    $parser->validation(1);

Turn validation on (or off). Defaults to off.

recover

    $parser->recover(1);

Turn the parser's recover mode on (or off). Defaults to off.

This allows one to parse broken XML data into memory. This switch will only work with XML data rather than HTML data. Validation will also be switched off automatically.

Recover mode is very efficient at recovering documents that are almost well-formed, for example a document that forgets to close the document tag (or any other tag inside the document).
The recover mode of XML::LibXML has problems restoring documents that are more like well-balanced chunks.

XML::LibXML will only parse until the first fatal error occurs, reporting recoverable parsing errors as warnings. To suppress these warnings use $parser->recover_silently(1); or, equivalently, $parser->recover(2).

recover_silently

    $parser->recover_silently(1);

Turns the parser warnings off (or on). Defaults to on.

This allows warnings printed to STDERR when parsing documents with recover(1) to be switched off.

Please note that calling recover_silently(0) also turns the parser recover mode off, and calling recover_silently(1) automatically activates the parser recover mode.

expand_entities

    $parser->expand_entities(0);

Turn entity expansion on or off; it is enabled by default. If entity expansion is off, any external parsed entities in the document are left as entities. Probably not very useful for most purposes.

keep_blanks

    $parser->keep_blanks(0);

Allows you to turn off XML::LibXML's default behaviour of maintaining white-space in the document.

pedantic_parser

    $parser->pedantic_parser(1);

You can make XML::LibXML more pedantic if you want to.

line_numbers

    $parser->line_numbers(1);

If this option is activated, XML::LibXML will store the line number of a node. This gives more information about where a validation error occurred. It can also be used to find the position of a node after parsing (see also XML::LibXML::Node::line_number()).

By default line numbering is switched off (0).

load_ext_dtd

    $parser->load_ext_dtd(1);

Load external DTD subsets while parsing.

This flag is also required for DTD validation, for completing attributes, and for expanding entities, regardless of whether the document has an internal subset. Thus switching off external DTD loading will also disable entity expansion, validation, and attribute completion on internal subsets.
If you leave this parser flag untouched, everything will work, because the default is 1 (activated).

complete_attributes

    $parser->complete_attributes(1);

Complete the elements' attribute lists with the ones defaulted from the DTDs. By default, this option is enabled.

expand_xinclude

    $parser->expand_xinclude(1);

Expands XInclude tags immediately while parsing the document. This flag ensures that the parser callbacks are used while parsing the included document.

load_catalog

    $parser->load_catalog( $catalog_file );

Will use $catalog_file as a catalog during all parsing processes. Using a catalog will significantly speed up parsing processes if many external resources are loaded into the parsed documents (such as DTDs or XIncludes).

Note that catalogs will not be available if an external entity handler was specified. At present it is not possible to make use of both types of resolving systems at the same time.

base_uri

    $parser->base_uri( $your_base_uri );

When parsing strings or file handles, XML::LibXML doesn't know the base URI of the document. To make relative references such as XIncludes work, one has to set a separate base URI, which is then used for the parsed documents.

gdome_dom

    $parser->gdome_dom(1);

THIS FLAG IS EXPERIMENTAL!

Although quite powerful, XML::LibXML's DOM implementation is limited if one needs or wants full DOM Level 2 or Level 3 support. XML::GDOME is based on libxml2 as well but provides a rather complete DOM implementation by wrapping libgdome. This flag allows you to make use of XML::LibXML's full parser options and XML::GDOME's DOM implementation at the same time.

To make use of this function, one has to install libgdome and configure XML::LibXML to use this library. For this you need to rebuild XML::LibXML!

clean_namespaces

    $parser->clean_namespaces( 1 );

libxml2 2.6.0 and later can strip redundant namespace declarations from the DOM tree. To do this, one has to set clean_namespaces() to 1 (TRUE).
By default no namespace cleanup is done.

no_network

    $parser->no_network(1);

Turn networking support on or off; networking is enabled by default. If networking is off, all attempts to fetch non-local resources (such as DTDs or external entities) will fail (unless custom callbacks are defined). It may be necessary to use $parser->recover(1) for processing documents requiring such resources while networking is off.

Error Reporting

XML::LibXML throws exceptions during parsing, validation or XPath processing (and on some other occasions). These errors can be caught by using eval blocks. The error will then be stored in $@.

XML::LibXML throws errors as they occur and does not wait for the user to test for them. This is a very common misunderstanding in the use of XML::LibXML. If the eval is omitted, XML::LibXML will always halt your script by "croaking" (see the Carp man page for details).

Also note that an increasing number of functions throw errors if bad data is passed. If you cannot ensure that valid data is passed to XML::LibXML, you should eval these functions.

Note: since version 1.59, get_last_error() is no longer available in XML::LibXML for thread-safety reasons.

XML::LibXML direct SAX parser

XML::LibXML::SAX

Description

XML::LibXML provides an interface to libxml2's direct SAX interface. Through this interface it is possible to generate SAX events directly while parsing a document. While using the SAX parser, XML::LibXML will not create a DOM Document tree.

Such an interface is useful if very large XML documents have to be processed and no DOM functions are required. By using this interface it is possible to read data stored within an XML document directly into the application's data structures without loading the document into memory.

The SAX interface of XML::LibXML is based on the famous XML::SAX interface. It uses the generic interface as provided by XML::SAX::Base.

In addition to the generic functions, which are only able to process entire documents, XML::LibXML::SAX provides parse_chunk().
This method generates SAX events from well-balanced data such as is often provided by databases.

NOTE: At the moment XML::LibXML provides only an incomplete interface to libxml2's native SAX implementation. The current implementation has not been tested in a production environment. It may cause significant memory problems or show wrong behaviour. If you run into specific problems using this part of XML::LibXML, let me know.

Building DOM trees from SAX events.

XML::LibXML::SAX::Builder

Synopsis

    use XML::LibXML::SAX::Builder;
    my $builder = XML::LibXML::SAX::Builder->new();

    my $gen = XML::Generator::DBI->new(Handler => $builder, dbh => $dbh);
    $gen->execute("SELECT * FROM Users");

    my $doc = $builder->result();

Description

This is a SAX handler that generates a DOM tree from SAX events. Usage is as above. Input is accepted from any SAX1 or SAX2 event generator.

Building DOM trees from SAX events is quite easy with XML::LibXML::SAX::Builder. The class is designed as a SAX2 final handler, not as a filter!

Since SAX is strictly stream oriented, you should not expect anything to be returned from a generator. Instead you have to ask the builder instance directly to get the document built. XML::LibXML::SAX::Builder's result() function holds the document generated from the last SAX stream.

XML::LibXML DOM Implementation

XML::LibXML::DOM

Description

XML::LibXML provides a lightweight interface to modify a node of the document tree generated by the XML::LibXML parser. This interface follows the DOM Level 3 specification as far as possible. In addition to the specified functions, XML::LibXML supports some functions that are handier to use in the Perl environment.

One also has to remember that XML::LibXML is an interface to libxml2 nodes which actually reside on the C level of XML::LibXML. This means each node is a reference to a structure that is not a Perl hash or array. The only way to access these structures' values is through the DOM interface provided by XML::LibXML.
This also means that one can't simply inherit from an XML::LibXML node and add new member variables as if they were hash keys. The DOM interface of XML::LibXML does not intend to implement a full DOM interface as is done by XML::GDOME and used for full-featured applications. Rather, it offers a simple way to build or modify documents that are created by XML::LibXML's parser. Another goal of the XML::LibXML interface is to make the interfaces of libxml2 available to the perl community. This also includes some workarounds for features where libxml2 assumes more control over the C level than most perl users have. One of the most important aspects of the XML::LibXML DOM interface is that the interfaces try to follow the DOM Level 3 specification rather strictly. This means the interface functions are named as the DOM specification says and not as widespread Java interfaces claim to be standard. For several functions that have only a single interface conforming to the DOM spec, XML::LibXML provides an additional Java-style alias. There are also some function interfaces left over from early stages of XML::LibXML. These interfaces are kept for compatibility reasons only. They might disappear in a future version of XML::LibXML, so users are asked to switch over to the official functions. More recent versions of perl (e.g. 5.6.1 or higher) support special flags to distinguish between UTF-8 and so-called binary data. For these versions XML::LibXML provides functionality to make efficient use of these flags: if a document has set an encoding other than UTF-8, all strings that are not already in UTF-8 are implicitly converted from the document encoding to UTF-8. On output these strings are normally returned as UTF-8 unless the user explicitly requests the original (i.e. document) encoding. Older versions of perl (such as 5.00503 or less) do not support these flags. 
If XML::LibXML is built for these versions, all strings have to be encoded to UTF-8 manually before they are passed to any DOM functions. NOTE: XML::LibXML's magic encoding may not work on all platforms. Some platforms are known to have a broken iconv(), which is partly used by libxml2. To test whether your platform works correctly with your language encoding, build a simple document in the particular encoding and try to parse it with XML::LibXML. If your document gets parsed without causing any segmentation faults, bus errors or whatever your OS throws, the encoding support most likely works on your platform. An example of such a test can be found in test 19encoding.t of the distribution. Namespaces and XML::LibXML's DOM implementation XML::LibXML's DOM implementation is limited by the DOM implementation of libxml2, which treats namespaces slightly differently than required by the DOM Level 2 specification. According to the DOM Level 2 specification, namespaces of elements and attributes should be persistent, and nodes should be permanently bound to namespace URIs as they get created; it should be possible to manipulate the special attributes used for declaring XML namespaces just like other attributes without affecting the namespaces of other nodes. In DOM Level 2, the application is responsible for creating the special attributes consistently and/or for correct serialization of the document. This is inconvenient, causes problems in serialization of DOM to XML, and, most importantly, seems almost impossible to implement over libxml2. In libxml2, the namespace URI and prefix of a node are provided by a pointer to a namespace declaration (appearing as a special xmlns attribute in the XML document). If the prefix or namespace URI of the declaration changes, the prefix and namespace URI of all nodes that point to it change as well. Moreover, in contrast to DOM, a node (element or attribute) can only be bound to a namespace URI if there is some namespace declaration in the document to point to. 
Therefore the current DOM implementation in XML::LibXML tries to treat namespace declarations in a compromise between reason, common sense, limitations of libxml2, and the DOM Level 2 specification. In XML::LibXML, special attributes declaring XML namespaces are often created automatically, usually when a namespaced node is attached to a document and no existing declaration of the namespace and prefix is in scope to be reused. In this respect, the XML::LibXML DOM implementation differs from the DOM Level 2 specification, according to which special attributes for declaring the appropriate XML namespaces should not be added when a node with a namespace prefix and namespace URI is created. Namespace declarations are also created when XML::LibXML::Document's createElementNS() or createAttributeNS() functions are used. If a namespace is not declared on the documentElement, the namespace will be declared locally for the newly created node. In the case of attributes this may look a bit confusing, since these nodes cannot carry namespace declarations themselves. In this case the namespace is internally applied to the attribute and later declared on the node the attribute is appended to (if required). The following example may explain this a bit: my $doc = XML::LibXML->createDocument; my $root = $doc->createElementNS( "", "foo" ); $doc->setDocumentElement( $root ); my $attr = $doc->createAttributeNS( "bar", "bar:foo", "test" ); $root->setAttributeNodeNS( $attr ); This piece of code will result in the following document: <?xml version="1.0"?> <foo xmlns:bar="bar" bar:foo="test"/> The namespace is declared on the document element during the setAttributeNodeNS() call. Namespaces can also be declared explicitly by using XML::LibXML::Element's setNamespace() function. Since 1.61, they can also be manipulated with the functions setNamespaceDeclPrefix() and setNamespaceDeclURI() (not available in DOM). 
Changing an URI or prefix of an existing namespace declaration affects the namespace URI and prefix of all nodes which point to it (that is the nodes in its scope). It is also important to repeat the specification: While working with namespaces you should use the namespace aware functions instead of the simplified versions. For example you should never use setAttribute() but setAttributeNS(). XML::LibXML DOM Document Class XML::LibXML::Document Synopsis use XML::LibXML; # Only methods specific to Document nodes listed here, # see XML::LibXML::Node manpage for other methods The Document Class is in most cases the result of a parsing process. But sometimes it is necessary to create a Document from scratch. The DOM Document Class provides functions that conform to the DOM Core naming style. It inherits all functions from XML::LibXML::Node as specified in the DOM specification. This enables access to the nodes besides the root element on document level - a DTD for example. The support for these nodes is limited at the moment. While generally nodes are bound to a document in the DOM concept it is suggested that one should always create a node not bound to any document. There is no need of really including the node to the document, but once the node is bound to a document, it is quite safe that all strings have the correct encoding. If an unbound text node with an ISO encoded string is created (e.g. with $CLASS->new()), the toString function may not return the expected result. All this seems like a limitation as long as UTF-8 encoding is assured. If ISO encoded strings come into play it is much safer to use the node creation functions of XML::LibXML::Document. new $dom = XML::LibXML::Document->new( $version, $encoding ); alias for createDocument() createDocument $dom = XML::LibXML::Document->createDocument( $version, $encoding ); The constructor for the document class. As Parameter it takes the version string and (optionally) the encoding string. 
Simply calling createDocument() will create the document: <?xml version="your version" encoding="your encoding"?> Both parameters are optional. The default value for $version is 1.0, of course. If the $encoding parameter is not set, the encoding will be left unset, which means UTF-8 is implied. Calling createDocument() without any parameters will result in the following code: <?xml version="1.0"?> Alternatively one can call this constructor directly from the XML::LibXML class level, to avoid some typing. This will not have any effect on the class of the instance, which is always XML::LibXML::Document. my $document = XML::LibXML->createDocument( "1.0", "UTF-8" ); is therefore a shortcut for my $document = XML::LibXML::Document->createDocument( "1.0", "UTF-8" ); encoding $strEncoding = $doc->encoding(); returns the encoding string of the document. my $doc = XML::LibXML->createDocument( "1.0", "ISO-8859-15" ); print $doc->encoding; # prints ISO-8859-15 actualEncoding $strEncoding = $doc->actualEncoding(); returns the encoding in which the XML will be returned by $doc->toString(). This is usually the original encoding of the document as declared in the XML declaration and returned by $doc->encoding. If the original encoding is not known (e.g. if the document was created in memory or parsed from XML without a declared encoding), 'UTF-8' is returned. my $doc = XML::LibXML->createDocument( "1.0", "ISO-8859-15" ); print $doc->actualEncoding; # prints ISO-8859-15 setEncoding $doc->setEncoding($new_encoding); This method allows changing the encoding declared in the XML declaration of the document. The value also affects the encoding in which the document is serialized to XML by $doc->toString(). version $strVersion = $doc->version(); returns the version string of the document. getVersion() is an alternative form of this function. standalone $doc->standalone This function returns the numerical value of the standalone attribute of a document's XML declaration. 
It returns 1 if standalone="yes" was found, 0 if standalone="no" was found and -1 if standalone was not specified (default on creation). setStandalone $doc->setStandalone($numvalue); Through this method it is possible to alter the value of a document's standalone attribute. Set it to 1 to set standalone="yes", to 0 to set standalone="no", or to -1 to remove the standalone attribute from the XML declaration. compression my $compression = $doc->compression; libxml2 allows reading of documents directly from gzipped files. In this case the compression variable is set to the compression level of that file (0-8). If XML::LibXML parsed a different source or the file wasn't compressed, the returned value will be -1. setCompression $doc->setCompression($ziplevel); If one intends to write the document directly to a file, it is possible to set the compression level for a given document. This level can be in the range from 0 to 8. If XML::LibXML should not try to compress, use -1 (default). Note that this feature will only work if libxml2 is compiled with zlib support and toFile() is used for output. toString $docstring = $dom->toString($format); toString is a DOM serializing function, so the DOM tree can be serialized into an XML string, ready for output. IMPORTANT: unlike toString for other nodes, on document nodes this function returns the XML as a byte string in the original encoding of the document (see the actualEncoding() method)! The optional $format parameter sets the indenting of the output. This parameter is expected to be an integer value that specifies how indentation should be applied. The format parameter can have three different values if it is used: If $format is 0, then the document is dumped as it was originally parsed. If $format is 1, libxml2 will add ignorable white space, so the nodes' content is easier to read. 
Existing text nodes will not be altered. If $format is 2 (or higher), libxml2 will act as with $format == 1, but it adds a leading and a trailing line break to each text node. libxml2 uses a hard-coded indentation of 2 space characters per indentation level. This value cannot be altered at run-time. toStringC14N $c14nstr = $doc->toStringC14N($comment_flag,$xpath); See the documentation in XML::LibXML::Node. toStringEC14N $ec14nstr = $doc->toStringEC14N($comment_flag,$xpath,$inclusive_prefix_list); See the documentation in XML::LibXML::Node. serialize $str = $doc->serialize($format); An alias for toString(). This function name was added to be more consistent with libxml2. serialize_c14n $c14nstr = $doc->serialize_c14n($comment_flag,$xpath); An alias for toStringC14N(). serialize_exc_c14n $ec14nstr = $doc->serialize_exc_c14n($comment_flag,$xpath,$inclusive_prefix_list); An alias for toStringEC14N(). toFile $state = $doc->toFile($filename, $format); This function is similar to toString(), but it writes the document directly to a file. This function is very useful if one needs to store large documents. The format parameter has the same behaviour as in toString(). toFH $state = $doc->toFH($fh, $format); This function is similar to toString(), but it writes the document directly to a filehandle or a stream. The format parameter has the same behaviour as in toString(). toStringHTML $str = $document->toStringHTML(); toStringHTML serializes the tree to a string as HTML. With this method indenting is automatic and managed by libxml2 internally. serialize_html $str = $document->serialize_html(); An alias for toStringHTML(). is_valid $bool = $dom->is_valid(); Returns either TRUE or FALSE depending on whether the DOM tree is a valid document or not. You may also pass in an XML::LibXML::Dtd object to validate against an external DTD: if (!$dom->is_valid($dtd)) { warn("document is not valid!"); } validate $dom->validate(); This is an exception-throwing equivalent of is_valid. 
If the document is not valid it will throw an exception containing the error. This allows much better error reporting than the simple true/false answer of is_valid. Again, you may pass in a DTD object. documentElement $root = $dom->documentElement(); Returns the root element of the Document. A document can have just one root element to contain the document's data. Optionally one can use getDocumentElement. setDocumentElement $dom->setDocumentElement( $root ); This function enables you to set the root element for a document. The function supports the import of a node from a different document tree. createElement $element = $dom->createElement( $nodename ); This function creates a new Element Node bound to the DOM with the name $nodename. createElementNS $element = $dom->createElementNS( $namespaceURI, $qname ); This function creates a new Element Node bound to the DOM with the name $qname and placed in the given namespace. createTextNode $text = $dom->createTextNode( $content_text ); Equivalent to createElement, but it creates a Text Node bound to the DOM. createComment $comment = $dom->createComment( $comment_text ); Equivalent to createElement, but it creates a Comment Node bound to the DOM. createAttribute $attrnode = $doc->createAttribute($name [,$value]); Creates a new Attribute node. createAttributeNS $attrnode = $doc->createAttributeNS( $namespaceURI, $name [,$value] ); Creates an Attribute bound to a namespace. createDocumentFragment $fragment = $doc->createDocumentFragment(); This function creates a DocumentFragment. createCDATASection $cdata = $dom->createCDATASection( $cdata_content ); Similar to createTextNode and createComment, this function creates a CDataSection bound to the current DOM. createProcessingInstruction my $pi = $doc->createProcessingInstruction( $target, $data ); Creates a processing instruction node. Since this method name is quite long one may use its short form createPI(). 
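The node creation functions described above can be combined to build a small document from scratch. The following is only a minimal sketch: the element name "catalog", the item text and the comment are made up for illustration, and the exact whitespace of the serialized output may differ between libxml2 versions.

```perl
use strict;
use warnings;
use XML::LibXML;

# Create an empty document and give it a root element.
# "catalog" and "item" are hypothetical names used only for this sketch.
my $doc  = XML::LibXML::Document->new( "1.0", "UTF-8" );
my $root = $doc->createElement("catalog");
$doc->setDocumentElement($root);

# Nodes created through the document are bound to it from the start.
my $item = $doc->createElement("item");
$item->appendChild( $doc->createTextNode("Widgets & gadgets") );

$root->appendChild( $doc->createComment(" generated example ") );
$root->appendChild($item);
$root->appendChild( $doc->createCDATASection("<raw>") );

# Serialize the document; the "&" in the text node is escaped on output.
my $xml = $doc->toString(1);
print $xml;
```

Creating all nodes through the document object, rather than through the node classes' own constructors, follows the advice given earlier about keeping string encodings consistent.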
createEntityReference my $entref = $doc->createEntityReference($refname); If a document has a DTD specified, one can create entity references by using this function. If one wants to add a entity reference to the document, this reference has to be created by this function. An entity reference is unique to a document and cannot be passed to other documents as other nodes can be passed. NOTE: A text content containing something that looks like an entity reference, will not be expanded to a real entity reference unless it is a predefined entity my $string = "&foo;"; $some_element->appendText( $string ); print $some_element->textContent; # prints "&foo;" createInternalSubset $dtd = $document->createInternalSubset( $rootnode, $public, $system); This function creates and adds an internal subset to the given document. Because the function automatically adds the DTD to the document there is no need to add the created node explicitly to the document. my $document = XML::LibXML::Document->new(); my $dtd = $document->createInternalSubset( "foo", undef, "foo.dtd" ); will result in the following XML document: <?xml version="1.0"?> <!DOCTYPE foo SYSTEM "foo.dtd"> By setting the public parameter it is possible to set PUBLIC DTDs to a given document. So my $document = XML::LibXML::Document->new(); my $dtd = $document->createInternalSubset( "foo", "-//FOO//DTD FOO 0.1//EN", undef ); will cause the following declaration to be created on the document: <?xml version="1.0"?> <!DOCTYPE foo PUBLIC "-//FOO//DTD FOO 0.1//EN"> createExternalSubset $dtd = $document->createExternalSubset( $rootnode, $public, $system); This function is similar to createInternalSubset() but this DTD is considered to be external and is therefore not added to the document itself. Nevertheless it can be used for validation purposes. importNode $document->importNode( $node ); If a node is not part of a document, it can be imported to another document. 
As specified in the DOM Level 2 Specification, the node will not be altered or removed from its original document ($node->cloneNode(1) will get called implicitly). NOTE: Don't try to use importNode() to import sub-trees that contain an entity reference - even if the entity reference is the root node of the sub-tree. This will cause serious problems in your program. This is a limitation of libxml2 and not of XML::LibXML itself. adoptNode $document->adoptNode( $node ); If a node is not part of a document, it can be imported to another document. As specified in the DOM Level 3 Specification, the node will not be altered but it will be removed from its original document. After a document has adopted a node, the node, its attributes and all its descendants belong to the new document. Because the node no longer belongs to the old document, it will be unlinked from its old location first. NOTE: Don't try to use adoptNode() to import sub-trees that contain entity references - even if the entity reference is the root node of the sub-tree. This will cause serious problems in your program. This is a limitation of libxml2 and not of XML::LibXML itself. externalSubset my $dtd = $doc->externalSubset; If a document has an external subset defined it will be returned by this function. NOTE: Dtd nodes are not ordinary nodes in libxml2. The support for these nodes in XML::LibXML is still limited. In particular, one should avoid using common node functions on doctype declaration nodes! internalSubset my $dtd = $doc->internalSubset; If a document has an internal subset defined it will be returned by this function. NOTE: Dtd nodes are not ordinary nodes in libxml2. The support for these nodes in XML::LibXML is still limited. In particular, one should avoid using common node functions on doctype declaration nodes! setExternalSubset $doc->setExternalSubset($dtd); EXPERIMENTAL! This method sets a DTD node as an external subset of the given document. setInternalSubset $doc->setInternalSubset($dtd); EXPERIMENTAL! 
This method sets a DTD node as an internal subset of the given document. removeExternalSubset my $dtd = $doc->removeExternalSubset(); EXPERIMENTAL! If a document has an external subset defined it can be removed from the document by using this function. The removed dtd node will be returned. removeInternalSubset my $dtd = $doc->removeInternalSubset(); EXPERIMENTAL! If a document has an internal subset defined it can be removed from the document by using this function. The removed dtd node will be returned. getElementsByTagName my @nodelist = $doc->getElementsByTagName($tagname); Implements the DOM Level 2 function. In SCALAR context this function returns an XML::LibXML::NodeList object. getElementsByTagNameNS my @nodelist = $doc->getElementsByTagNameNS($nsURI,$tagname); Implements the DOM Level 2 function. In SCALAR context this function returns an XML::LibXML::NodeList object. getElementsByLocalName my @nodelist = $doc->getElementsByLocalName($localname); This allows the fetching of all nodes from a given document with the given local name. In SCALAR context this function returns an XML::LibXML::NodeList object. getElementById my $node = $doc->getElementById($id); Returns the element that has an ID attribute with the given value. If no such element exists, this returns undef. Note: the ID of an element may change while manipulating the document. For documents with a DTD, the information about ID attributes is only available if DTD loading/validation has been requested. For HTML documents parsed with the HTML parser, ID detection is done automatically. In XML documents, all "xml:id" attributes are considered to be of type ID. You can test the ID-ness of an attribute node with $attr->isId(). In versions 1.59 and earlier this method was called getElementsById() (plural) by mistake. Starting from 1.60 this name is maintained as an alias only for backward compatibility. 
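The difference between list and scalar context for the getElementsBy...() functions, and the lookup by "xml:id" described above, can be sketched as follows. The sample document and its ID values are invented for illustration, and the getElementById() call assumes a libxml2 build that recognizes "xml:id" attributes while parsing.

```perl
use strict;
use warnings;
use XML::LibXML;

# A made-up sample document; the xml:id values are arbitrary.
my $source = <<'XML';
<list>
  <entry xml:id="a1">first</entry>
  <entry xml:id="a2">second</entry>
</list>
XML

my $doc = XML::LibXML->new->parse_string($source);

# List context: a plain perl list of element nodes.
my @entries = $doc->getElementsByTagName("entry");

# Scalar context: an XML::LibXML::NodeList object.
my $nodelist = $doc->getElementsByTagName("entry");

# Lookup by ID; returns undef if no element carries that ID.
my $second = $doc->getElementById("a2");

print scalar(@entries), " entries, NodeList size ", $nodelist->size, "\n";
print $second->textContent, "\n" if defined $second;
```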
indexElements $dom->indexElements(); This function causes libxml2 to stamp all elements in a document with their document position index, which considerably speeds up XPath queries for large documents. It should only be used with static documents that won't be further changed by any DOM methods, because once a document is indexed, XPath will always prefer the index to other methods of determining the document order of nodes. XPath could therefore return improperly ordered node-lists when applied to a document that has been changed after being indexed. It is of course possible to use this method to re-index a modified document before using it with XPath again. This function is not a part of the DOM specification. This function returns the number of elements indexed, -1 if an error occurred, or -2 if this feature is not available in the running libxml2. Abstract Base Class of XML::LibXML Nodes XML::LibXML::Node Synopsis use XML::LibXML; XML::LibXML::Node defines functions that are common to all Node Types. An XML::LibXML::Node should never be created standalone, but as an instance of a high level class such as XML::LibXML::Element or XML::LibXML::Text. The class itself should provide only common functionality. In XML::LibXML each node is part of either a document or a document-fragment. Because of this there is no node without a parent. This may cause confusion with "unbound" nodes. nodeName $name = $node->nodeName; Returns the node's name. This function is aware of namespaces and returns the full name of the current node (prefix:localname). Since 1.62 this function also returns the correct DOM names for node types with constant names, namely: #text, #cdata-section, #comment, #document, #document-fragment. setNodeName $node->setNodeName( $newName ); In very limited situations, it is useful to change a node's name. In the DOM specification this should throw an error. This function is aware of namespaces. 
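As a small illustration of nodeName() and setNodeName() as described above, consider the following sketch. The namespace URI "urn:example" and the element names are placeholders chosen for this example only.

```perl
use strict;
use warnings;
use XML::LibXML;

# "urn:example" and the element names are placeholders for this sketch.
my $doc = XML::LibXML->new->parse_string(
    '<p:a xmlns:p="urn:example"><b>text</b></p:a>'
);
my $root  = $doc->documentElement;
my $child = $root->firstChild;      # the <b> element

# nodeName is namespace aware and includes the prefix.
my $root_name = $root->nodeName;

# Rename the child element in place.
$child->setNodeName("c");

print "$root_name contains ", $child->nodeName, "\n";
```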
isSameNode $bool = $node->isSameNode( $other_node ); returns TRUE (1) if the given nodes refer to the same node structure, otherwise FALSE (0) is returned. isEqual $bool = $node->isEqual( $other_node ); deprecated version of isSameNode(). NOTE: isEqual will change behaviour to follow the DOM specification. nodeValue $content = $node->nodeValue; If the node has any content (such as that stored in a text node) it can be requested through this function. NOTE: Element Nodes have no content by definition. To get the text value of an Element use textContent() instead! textContent $content = $node->textContent; this function returns the content of all text nodes in the descendants of the given node as specified in DOM. nodeType $type = $node->nodeType; Returns the node's type. The possible types are described in the libxml2 tree.h documentation. The return value of this function is a numeric value. Therefore it differs from the result of perl's ref function. unbindNode $node->unbindNode(); Unbinds the Node from its siblings and Parent, but not from the Document it belongs to. If the node is not inserted into the DOM afterwards it will be lost after the program terminates. From a low level view, the unbound node is stripped from the context it is in and inserted into a (hidden) document-fragment. removeChild $childnode = $node->removeChild( $childnode ); This will unbind the Child Node from its parent $node. The function returns the unbound node. If $childnode is not a child of the given Node the function will fail. replaceChild $oldnode = $node->replaceChild( $newNode, $oldNode ); Replaces the $oldNode with the $newNode. The $oldNode will be unbound from the Node. This function differs from the DOM L2 specification in that, if the new node is not part of the document, the node will be imported first. replaceNode $node->replaceNode($newNode); This function is very similar to replaceChild(), but it replaces the node itself rather than a childnode. 
This is useful if a node found by any XPath function should be replaced. appendChild $childnode = $node->appendChild( $childnode ); The function will add the $childnode to the end of $node's children. The function should fail if the new childnode is already a child of $node. This function differs from the DOM L2 specification in that, if the new node is not part of the document, the node will be imported first. addChild $childnode = $node->addChild( $childnode ); As an alternative to appendChild() one can use the addChild() function. This function is a bit faster, because it avoids all DOM conformity checks. Therefore this function is quite useful if one builds XML documents in memory where the order and ownership (ownerDocument) is assured. addChild() uses libxml2's own xmlAddChild() function. Thus it has to be used with extra care: If a text node is added to a node and the node itself or its last childnode is also a text node, the node to add will be merged with the one already available. The added node will be removed from memory after this action. Because perl is not aware of this action, the perl instance is still available. XML::LibXML will catch the loss of a node and refuse to run any function called on that node. my $t1 = $doc->createTextNode( "foo" ); my $t2 = $doc->createTextNode( "bar" ); $t1->addChild( $t2 ); # is OK my $val = $t2->nodeValue(); # will fail, script dies Also, addChild() will not check whether the added node belongs to the same document as the node it will be added to. This could lead to inconsistent documents and in worse cases even to memory violations, if one does not keep track of this issue. Although this sounds like a lot of trouble, addChild() is useful if a document is built from a stream, as sometimes happens in SAX handlers or filters. If you are not sure about the source of your nodes, you had better stay with appendChild(), because this function is more user friendly in the sense of being more error tolerant. 
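The checked tree manipulation functions described above (appendChild(), replaceChild() and removeChild()) and their return values can be sketched as follows. The element names "root", "old" and "new" are placeholders used only for this example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Build a tiny document; all element names are placeholders.
my $doc  = XML::LibXML->createDocument( "1.0", "UTF-8" );
my $root = $doc->createElement("root");
$doc->setDocumentElement($root);

# appendChild returns the node that was appended.
my $old = $root->appendChild( $doc->createElement("old") );
my $new = $doc->createElement("new");

# replaceChild takes the new node first and returns the replaced one.
my $replaced      = $root->replaceChild( $new, $old );
my $after_replace = $root->toString;

# removeChild unbinds the child and returns it.
my $removed = $root->removeChild($new);

print "$after_replace\n";
print "root is empty again\n" unless $root->hasChildNodes;
```

Since these functions perform the DOM conformity checks that addChild() skips, they are the safer choice when the origin of the nodes is not fully under control.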
addNewChild $node = $parent->addNewChild( $nsURI, $name ); Similar to addChild(), this function uses low level libxml2 functionality to provide a faster interface for DOM building. addNewChild() uses xmlNewChild() to create a new node on a given parent element. addNewChild() has two parameters, $nsURI and $name, where $nsURI is an (optional) namespace URI. $name is the fully qualified element name; addNewChild() will determine the correct prefix if necessary. The function returns the newly created node. This function is very useful for DOM building, where a created node can be directly associated with its parent. NOTE: this function is not part of the DOM specification and its use will limit your code to XML::LibXML. addSibling $node->addSibling($newNode); addSibling() allows adding an additional node to the end of a nodelist, defined by the given node. cloneNode $newnode = $node->cloneNode( $deep ); cloneNode creates a copy of $node. When $deep is set to 1 (true) the function will copy all childnodes as well. If $deep is 0 only the current node will be copied. Note that in the case of an element, attributes are copied even if $deep is 0. Note that the behavior of this function for $deep=0 changed in 1.62 in order to be consistent with the DOM spec (in older versions attributes and namespace information were not copied for elements). parentNode $parentnode = $node->parentNode; Returns the Parent Node of the current node. nextSibling $nextnode = $node->nextSibling(); Returns the next sibling, if any. previousSibling $prevnode = $node->previousSibling(); Analogous to nextSibling(), this function returns the previous sibling, if any. hasChildNodes $boolean = $node->hasChildNodes(); If the current node has childnodes this function returns TRUE (1), otherwise it returns FALSE (0, not undef). firstChild $childnode = $node->firstChild; If a node has childnodes this function will return the first node in the childlist. 
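The difference between shallow and deep copies made by cloneNode(), together with the navigation functions above, might be illustrated like this. The sample document is invented, and the shallow-copy behaviour shown assumes version 1.62 or later, where attributes are copied even for $deep = 0.

```perl
use strict;
use warnings;
use XML::LibXML;

# An invented sample document without whitespace between elements.
my $doc = XML::LibXML->new->parse_string(
    '<root><a id="1"><b/></a><c/></root>'
);
my $first = $doc->documentElement->firstChild;    # the <a> element

# Shallow copy: the element and its attributes, but no children.
my $shallow = $first->cloneNode(0);

# Deep copy: the element, its attributes and all descendants.
my $deep = $first->cloneNode(1);

# Navigation between neighbours and back to the parent.
my $next   = $first->nextSibling;    # the <c> element
my $parent = $first->parentNode;     # the <root> element

print $shallow->toString, "\n";
print $deep->toString,    "\n";
```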
lastChild $childnode = $node->lastChild; If the $node has childnodes this function returns the last child node. ownerDocument $documentnode = $node->ownerDocument; Through this function it is always possible to access the document the current node is bound to. getOwner $node = $node->getOwner; This function returns the node the current node is associated with. In most cases this will be a document node or a document fragment node. setOwnerDocument $node->setOwnerDocument( $doc ); This function binds a node to another DOM. This method unbinds the node first, if it is already bound to another document. This function is the inverse call of XML::LibXML::Document's adoptNode() function. Because of this it has the same limitations with Entity References as adoptNode(). insertBefore $node->insertBefore( $newNode, $refNode ); The method inserts $newNode before $refNode. If $refNode is undefined, $newNode will be set as the new last child of the parent node. This function differs from the DOM L2 specification in that, if the new node is not part of the document, the node will be imported first, automatically. $refNode has to be passed to the function even if it is undefined: $node->insertBefore( $newNode, undef ); # the same as $node->appendChild( $newNode ); $node->insertBefore( $newNode ); # wrong Note that the reference node has to be a direct child of the node the function is called on. Also, $newNode is not allowed to be an ancestor of the new parent node. insertAfter $node->insertAfter( $newNode, $refNode ); The method inserts $newNode after $refNode. If $refNode is undefined, $newNode will be set as the new last child of the parent node. Note that $refNode has to be passed explicitly even if it is undef. findnodes @nodes = $node->findnodes( $xpath_expression ); findnodes evaluates the XPath expression (XPath 1.0) on the current node and returns the resulting node set as an array. In scalar context it returns an XML::LibXML::NodeList object. 
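The two calling contexts of findnodes() described above can be sketched as follows. The document and the XPath expressions are made up for illustration; none of the elements are in a namespace, so unprefixed names match as expected.

```perl
use strict;
use warnings;
use XML::LibXML;

# An invented document with no namespaces.
my $doc = XML::LibXML->new->parse_string(
    '<shop><item price="3">pen</item><item price="12">book</item></shop>'
);
my $shop = $doc->documentElement;

# List context: the matching nodes as a perl list.
my @cheap = $shop->findnodes('item[@price < 10]');

# Scalar context: an XML::LibXML::NodeList object.
my $all = $shop->findnodes('item');

print "cheap: ", join( ", ", map { $_->textContent } @cheap ), "\n";
print "total: ", $all->size, "\n";
```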
NOTE ON NAMESPACES AND XPATH: A common mistake about XPath is to assume that node tests consisting of an element name with no prefix match elements in the default namespace. This assumption is wrong - by XPath specification, such node tests can only match elements that are in no (i.e. null) namespace. So, for example, one cannot match the root element of an XHTML document with $node->find('/html') since '/html' would only match if the root element <html> had no namespace, but all XHTML elements belong to the namespace http://www.w3.org/1999/xhtml. (Note that xmlns="..." namespace declarations can also be specified in a DTD, which makes the situation even worse, since the XML document looks as if there was no default namespace). There are several possible ways to deal with namespaces in XPath: The recommended way is to use the XML::LibXML::XPathContext module to define an explicit context for XPath evaluation, in which a document independent prefix-to-namespace mapping can be defined. For example: my $xpc = XML::LibXML::XPathContext->new; $xpc->registerNs('x', 'http://www.w3.org/1999/xhtml'); $xpc->find('/x:html',$node); Another possibility is to use prefixes declared in the queried document (if known). If the document declares a prefix for the namespace in question (and the context node is in the scope of the declaration), XML::LibXML allows you to use the prefix in the XPath expression, e.g.: $node->find('/x:html'); See also XML::LibXML::XPathContext->findnodes. find $result = $node->find( $xpath ); find evaluates the XPath 1.0 expression using the current node as the context of the expression, and returns the result depending on what type of result the XPath expression had. For example, the XPath "1 * 3 + 52" results in a XML::LibXML::Number object being returned. Other expressions might return a XML::LibXML::Boolean object, or a XML::LibXML::Literal object (a string). 
Each of those objects uses Perl's overload feature to "do the right thing" in different contexts.

See also XML::LibXML::XPathContext->find.

findvalue

    print $node->findvalue( $xpath );

findvalue is exactly equivalent to:

    $node->find( $xpath )->to_literal;

That is, it returns the literal value of the results. This ensures that you get a string back from your search, allowing certain shortcuts. This could be used as the equivalent of XSLT's <xsl:value-of select="some_xpath"/>.

See also XML::LibXML::XPathContext->findvalue.

childNodes

    @childnodes = $node->childNodes;

childNodes implements a more intuitive interface to the child nodes of the current node. It enables you to pass all children directly to a map or grep. If this function is called in scalar context, an XML::LibXML::NodeList object will be returned.

toString

    $xmlstring = $node->toString($format,$docencoding);

This is the equivalent of XML::LibXML::Document::toString for a single node. This means a node and all its child nodes will be dumped into the result string.

In addition to the $format flag of XML::LibXML::Document, this version accepts the optional $docencoding flag. If this flag is set, this function returns the string in its original encoding (the encoding of the document) rather than UTF-8.

toStringC14N

    $c14nstring = $node->toStringC14N($with_comments, $xpath_expression);

The function is similar to toString(). Instead of simply serializing the document tree, it transforms it as specified in the XML-C14N Specification (see http://www.w3.org/TR/xml-c14n). Such a transformation is known as canonicalization.

If $with_comments is 0 or not defined, the result document will not contain any comments that exist in the original document. To include comments in the canonicalized document, $with_comments has to be set to 1.

The parameter $xpath_expression defines the node set of nodes that should be visible in the resulting document. This can be used to filter out some nodes.
Note that only the nodes that are part of the node set will be included in the result document. Their child nodes will not appear in the resulting document unless they are themselves part of the node set defined by the XPath expression.

If $xpath_expression is omitted or empty, toStringC14N() will include all nodes in the given sub-tree.

toStringEC14N

    $ec14nstring = $node->toStringEC14N($with_comments, $xpath_expression, $inclusive_prefix_list);

The function is similar to toStringC14N() but follows the XML-EXC-C14N Specification (see http://www.w3.org/TR/xml-exc-c14n) for exclusive canonicalization of XML.

The first two arguments are as above. If $inclusive_prefix_list is used, it should be an ARRAY reference listing namespace prefixes that are to be handled in the manner described by the Canonical XML Recommendation (i.e. preserved in the output even if the namespace is not used). Cf. the spec for details.

serialize

    $str = $doc->serialize($format);

An alias for toString(). This function name was added to be more consistent with libxml2.

serialize_c14n

    $c14nstr = $doc->serialize_c14n($comment_flag,$xpath);

An alias for toStringC14N().

serialize_exc_c14n

    $ec14nstr = $doc->serialize_exc_c14n($comment_flag,$xpath,$inclusive_prefix_list);

An alias for toStringEC14N().

localname

    $localname = $node->localname;

Returns the local name of a tag. This is the part after the colon.

prefix

    $nameprefix = $node->prefix;

Returns the prefix of a tag. This is the part before the colon.

namespaceURI

    $uri = $node->namespaceURI();

Returns the URI of the current namespace.

hasAttributes

    $boolean = $node->hasAttributes();

Returns 1 (TRUE) if the current node has any attributes set, otherwise 0 (FALSE) is returned.

attributes

    @attributelist = $node->attributes();

This function returns all attributes and namespace declarations assigned to the given node.
Because XML::LibXML does not implement namespace declarations and attributes the same way, the caller has to test what kind of node is being handled when processing the function's result.

If this function is called in array context, the attribute nodes are returned as an array. In scalar context the function will return an XML::LibXML::NamedNodeMap object.

lookupNamespaceURI

    $URI = $node->lookupNamespaceURI( $prefix );

Find a namespace URI by its prefix, starting at the current node.

lookupNamespacePrefix

    $prefix = $node->lookupNamespacePrefix( $URI );

Find a namespace prefix by its URI, starting at the current node.

NOTE: Only the namespace URIs are meant to be unique. The prefix is only document related. Also, the document might have more than a single prefix defined for a namespace.

normalize

    $node->normalize;

This function normalizes adjacent text nodes. This function is not as strict as libxml2's xmlTextMerge() function, since it will not free a node that is still referenced by the Perl layer.

getNamespaces

    @nslist = $node->getNamespaces;

If a node has any namespaces defined, this function will return these namespaces. Note that this will not return all namespaces that are in scope, but only the ones declared explicitly for that node.

Although getNamespaces is available for all nodes, it only makes sense if used with element nodes.

removeChildNodes

    $node->removeChildNodes();

This function is not specified for any DOM level: it removes all child nodes from a node in a single step. Unlike the libxml2 function itself (xmlFreeNodeList), this function will not immediately remove the nodes from memory. This prevents memory violations if there are nodes still referred to from the Perl level.

nodePath

    $node->nodePath();

This function is not specified for any DOM level: it returns a canonical structure-based XPath for a given node.

line_number

    $lineno = $node->line_number();

This function returns the line number where the tag was found during parsing.
If a node is added to the document, the line number is 0. Problems may occur if a node from one document is passed to another one.

Note: line_number() is special to XML::LibXML and not part of the DOM specification.

If the line_numbers flag of the parser was not activated before parsing, line_number() will always return 0.

XML::LibXML Class for Element Nodes

XML::LibXML::Element

Synopsis

    use XML::LibXML;
    # Only methods specific to Element nodes listed here,
    # see XML::LibXML::Node manpage for other methods

new

    $node = XML::LibXML::Element->new( $name );

This function creates a new node unbound to any DOM.

setAttribute

    $node->setAttribute( $aname, $avalue );

This method sets or replaces the node's attribute $aname to the value $avalue.

setAttributeNS

    $node->setAttributeNS( $nsURI, $aname, $avalue );

Namespace-aware version of setAttribute, where $nsURI is a namespace URI, $aname is a qualified name, and $avalue is the value. The namespace URI may be null (empty or undefined) in order to create an attribute which has no namespace.

The current implementation differs from DOM in the following aspects:

If an attribute with the same local name and namespace URI already exists on the element, but its prefix differs from the prefix of $aname, then this function is supposed to change the prefix (regardless of namespace declarations and possible collisions). However, the current implementation does rather the opposite. If a prefix is declared for the namespace URI in the scope of the attribute, then the already declared prefix is used, disregarding the prefix specified in $aname. If no prefix is declared for the namespace, the function tries to declare the prefix specified in $aname and dies if the prefix is already taken by some other namespace.
According to the DOM Level 2 specification, this method can also be used to create or modify special attributes used for declaring XML namespaces (which belong to the namespace "http://www.w3.org/2000/xmlns/" and have prefix or name "xmlns"). This should work since version 1.61, but again the implementation differs from the DOM specification in the following: if a declaration of the same namespace prefix already exists on the element, then changing its value via this method automatically changes the namespace of all elements and attributes in its scope. This is because in libxml2 the namespace URI of an element is not static but is computed from a pointer to a namespace declaration attribute.

getAttribute

    $avalue = $node->getAttribute( $aname );

If $node has an attribute with the name $aname, the value of this attribute will be returned.

getAttributeNS

    $avalue = $node->getAttributeNS( $nsURI, $aname );

Retrieves an attribute value by local name and namespace URI.

getAttributeNode

    $attrnode = $node->getAttributeNode( $aname );

Retrieves an attribute node by name. If no attribute with the given name exists, undef is returned.

getAttributeNodeNS

    $attrnode = $node->getAttributeNodeNS( $namespaceURI, $aname );

Retrieves an attribute node by local name and namespace URI. If no attribute with the given local name and namespace exists, undef is returned.

removeAttribute

    $node->removeAttribute( $aname );

The method removes the attribute $aname from the node's attribute list, if the attribute can be found.

removeAttributeNS

    $node->removeAttributeNS( $nsURI, $aname );

Namespace version of removeAttribute.

hasAttribute

    $boolean = $node->hasAttribute( $aname );

This function tests if the named attribute is set for the node. If the attribute is specified, TRUE (1) will be returned, otherwise the return value is FALSE (0).
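The attribute accessors described above can be tried out with a short sketch like the following. The element name, the attribute names and the namespace URI http://example.org/meta are invented for this illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $elem = $doc->createElement('item');
$doc->setDocumentElement($elem);

# Hypothetical namespace URI, used only for this example.
my $ns = 'http://example.org/meta';

# Plain and namespace-aware setters.
$elem->setAttribute( 'id', '42' );
$elem->setAttributeNS( $ns, 'm:lang', 'en' );

# Getters: by name, and by namespace URI plus local name.
my $id   = $elem->getAttribute('id');
my $lang = $elem->getAttributeNS( $ns, 'lang' );

# Attribute nodes: undef is returned if no such attribute exists.
my $id_node = $elem->getAttributeNode('id');
my $missing = $elem->getAttributeNode('no-such-attribute');

# Remove the attribute again and check that it is gone.
$elem->removeAttribute('id');
my $has_id = $elem->hasAttribute('id');

print $elem->toString, "\n";
```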
hasAttributeNS

    $boolean = $node->hasAttributeNS( $nsURI, $aname );

Namespace version of hasAttribute.

getChildrenByTagName

    @nodes = $node->getChildrenByTagName($tagname);

The function gives direct access to all child elements of the current node with a given tagname, where tagname is a qualified name, that is, in case of namespace usage it may consist of a prefix and local name. This function makes things a lot easier if one needs to handle big data sets. A special tagname '*' can be used to match any name.

If this function is called in SCALAR context, it returns the number of elements found.

getChildrenByTagNameNS

    @nodes = $node->getChildrenByTagNameNS($nsURI,$tagname);

Namespace version of getChildrenByTagName. A special nsURI '*' matches any namespace URI, in which case the function behaves just like getChildrenByLocalName.

If this function is called in SCALAR context, it returns the number of elements found.

getChildrenByLocalName

    @nodes = $node->getChildrenByLocalName($localname);

The function gives direct access to all child elements of the current node with a given local name. It makes things a lot easier if one needs to handle big data sets. A special localname '*' can be used to match any local name.

If this function is called in SCALAR context, it returns the number of elements found.

getElementsByTagName

    @nodes = $node->getElementsByTagName($tagname);

This function is part of the spec. It fetches all descendants of a node with a given tagname, where tagname is a qualified name, that is, in case of namespace usage it may consist of a prefix and local name. A special tagname '*' can be used to match any tag name.

In SCALAR context this function returns an XML::LibXML::NodeList object.

getElementsByTagNameNS

    @nodes = $node->getElementsByTagNameNS($nsURI,$localname);

Namespace version of getElementsByTagName as found in the DOM spec. A special localname '*' can be used to match any local name and nsURI '*' can be used to match any namespace URI.
In SCALAR context this function returns an XML::LibXML::NodeList object.

getElementsByLocalName

    @nodes = $node->getElementsByLocalName($localname);

This function is not found in the DOM specification. It is a mix of getElementsByTagName and getElementsByTagNameNS. It will fetch all tags matching the given local name. This allows one to select tags with the same local name across namespace borders.

In SCALAR context this function returns an XML::LibXML::NodeList object.

appendWellBalancedChunk

    $node->appendWellBalancedChunk( $chunk );

Sometimes it is necessary to append an XML tree given as a string to a node. appendWellBalancedChunk will do the trick for you, but only if the string is well-balanced.

Note that appendWellBalancedChunk() is only left for compatibility reasons. Implicitly it uses

    my $fragment = $parser->parse_xml_chunk( $chunk );
    $node->appendChild( $fragment );

This form is more explicit and makes it easier to control the flow of a script.

appendText

    $node->appendText( $PCDATA );

Alias for appendTextNode().

appendTextNode

    $node->appendTextNode( $PCDATA );

This wrapper function lets you add a string directly to an element node.

appendTextChild

    $node->appendTextChild( $childname , $PCDATA );

Somewhat similar to appendTextNode: it lets you add a child element that contains only a text node, directly by specifying the element name and the text content.

setNamespace

    $node->setNamespace( $nsURI , $nsPrefix, $activate );

setNamespace() allows one to apply a namespace to an element. The function takes three parameters: the namespace URI, which is required, and two optional values: prefix, which is the namespace prefix as it should be used in child elements or attributes, and the additional activate parameter. If prefix is not given, undefined or empty, this function tries to create a declaration of the default namespace.
The activate parameter is most useful: if this parameter is set to FALSE (0), a new namespace declaration is simply added to the element while the element's namespace itself is not altered. Nevertheless, activate is set to TRUE (1) by default. In this case the namespace is used as the node's effective namespace. This means the namespace prefix is added to the node name and, if there was a namespace already active for the node, it will be replaced (but its declaration is not removed from the document). A new namespace declaration is only created if necessary (that is, if the element is already in the scope of a namespace declaration associating the prefix with the namespace URI, then this declaration is reused).

The following example may clarify this:

    my $e1 = $doc->createElement("bar");
    $e1->setNamespace("http://foobar.org", "foo");

results in

    <foo:bar xmlns:foo="http://foobar.org"/>

while

    my $e2 = $doc->createElement("bar");
    $e2->setNamespace("http://foobar.org", "foo",0);

results only in

    <bar xmlns:foo="http://foobar.org"/>

By using $activate == 0 it is possible to create multiple namespace declarations on a single element.

The function fails if it is required to create a declaration associating the prefix with the namespace URI but the element already carries a declaration with the same prefix but a different namespace URI.

setNamespaceDeclURI

    $node->setNamespaceDeclURI( $nsPrefix, $newURI );

EXPERIMENTAL IN 1.61!

This function directly manipulates an existing namespace declaration on an element. It takes two parameters: the prefix by which it looks up the namespace declaration and a new namespace URI which replaces its previous value.

It returns 1 if the namespace declaration was found and changed, 0 otherwise.

All elements and attributes (even those previously unbound from the document) for which the namespace declaration determines their namespace belong to the new namespace after the change.
If the new URI is undef or empty, the nodes have no namespace and no prefix after the change. Namespace declarations once nulled in this way no longer appear in the serialized output (but do remain in the document for internal integrity of libxml2 data structures).

This function is NOT part of any DOM API.

setNamespaceDeclPrefix

    $node->setNamespaceDeclPrefix( $oldPrefix, $newPrefix );

EXPERIMENTAL IN 1.61!

This function directly manipulates an existing namespace declaration on an element. It takes two parameters: the old prefix by which it looks up the namespace declaration and a new prefix which is to replace the old one.

The function dies with an error if the element is in the scope of another declaration whose prefix equals the new prefix, or if the change would result in a declaration with a non-empty prefix but an empty namespace URI. Otherwise, it returns 1 if the namespace declaration was found and changed and 0 if it was not found.

All elements and attributes (even those previously unbound from the document) for which the namespace declaration determines their namespace change their prefix to the new value.

If the new prefix is undef or empty, the namespace declaration becomes a declaration of a default namespace. The corresponding nodes drop their namespace prefix (but remain in the, now default, namespace). In this case the function fails if the containing element is in the scope of another default namespace declaration.

This function is NOT part of any DOM API.

XML::LibXML Class for Text Nodes

XML::LibXML::Text

Synopsis

    use XML::LibXML;
    # Only methods specific to Text nodes listed here,
    # see XML::LibXML::Node manpage for other methods

Unlike the DOM specification, XML::LibXML implements the text node as the base class of all character data nodes. Therefore there is no CharacterData class. This allows one to use all methods that are available for text nodes for comments or CDATA sections as well.
new

    $text = XML::LibXML::Text->new( $content );

The constructor of the class. It creates an unbound text node.

data

    $nodedata = $text->data;

Although the Node class already provides the nodeValue attribute, the DOM specification defines data as a separate attribute. XML::LibXML implements these two not as different attributes but as aliases, as libxml2 does. Therefore

    $text->data;

and

    $text->nodeValue;

will have the same result and are not different entities.

setData($string)

    $text->setData( $text_content );

This function sets or replaces the text content of a node. The node has to be of the type "text", "cdata" or "comment".

substringData($offset,$length)

    $text->substringData($offset, $length);

Extracts a range of data from the node. (DOM Spec) This function takes the two parameters $offset and $length and returns the sub-string, if available.

If the node contains no data or $offset refers to a non-existent string index, this function will return undef. If $length is out of range, substringData will return the data starting at $offset instead of causing an error.

appendData($string)

    $text->appendData( $somedata );

Appends a string to the end of the existing data. If the current text node contains no data, this function has the same effect as setData.

insertData($offset,$string)

    $text->insertData($offset, $string);

Inserts the parameter $string at the given $offset of the existing data of the node. This operation will not remove existing data, but it will shift the data that follows the offset. The $offset has to be a positive value. If $offset is out of range, insertData will have the same behaviour as appendData.

deleteData($offset, $length)

    $text->deleteData($offset, $length);

This method removes a chunk from the existing node data at the given offset. The $length parameter tells how many characters should be removed from the string.
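The offset-based methods above can be chained as in the following minimal sketch. The string 'Hello World' and the offsets are arbitrary choices for this illustration; the comments show the data one would expect after each step according to the descriptions above.

```perl
use strict;
use warnings;
use XML::LibXML;

# An unbound text node with some arbitrary sample content.
my $text = XML::LibXML::Text->new('Hello World');

# data and nodeValue are aliases, so both return the same string.
my $same = ( $text->data eq $text->nodeValue ) ? 1 : 0;

# Extract five characters starting at offset 6.
my $word = $text->substringData( 6, 5 );

$text->appendData('!');         # Hello World!
$text->insertData( 5, ',' );    # Hello, World!
$text->deleteData( 0, 7 );      # World!

my $result = $text->data;
print "$word / $result\n";
```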
deleteDataString($string, [$all])

    $text->deleteDataString($remstring, $all);

This method removes a chunk from the existing node data. Since the DOM spec is quite unwieldy if you already know which string to remove from a text node, this method allows more perlish code :)

The function takes two parameters: $string and, optionally, the $all flag. If $all is not set, undef or 0, deleteDataString will remove only the first occurrence of $string. If $all is TRUE, deleteDataString will remove all occurrences of $string from the node data.

replaceData($offset, $length, $string)

    $text->replaceData($offset, $length, $string);

The DOM style version to replace node data.

replaceDataString($oldstring, $newstring, [$all])

    $text->replaceDataString($old, $new, $flag);

The more programmer friendly version of replaceData() :)

Instead of giving offsets and length, one can specify the exact string ($oldstring) to be replaced. Additionally, the $all flag allows one to replace all occurrences of $oldstring.

replaceDataRegEx( $search_cond, $replace_cond, $reflags )

    $text->replaceDataRegEx( $search_cond, $replace_cond, $reflags );

This method replaces the node's data by a simple regular expression. Optionally, this function allows one to pass some flags that will be added as flags to the replace statement.

NOTE: This is a shortcut for

    my $datastr = $node->getData();
    $datastr =~ s/somecond/replacement/g; # 'g' is just an example for any flag
    $node->setData( $datastr );

This function can make things easier to read for simple replacements. For more complex variants it is recommended to use the code snippet above.

XML::LibXML Comment Class

XML::LibXML::Comment

Synopsis

    use XML::LibXML;
    # Only methods specific to Comment nodes listed here,
    # see XML::LibXML::Node manpage for other methods

This class provides all functions of XML::LibXML::Text, but for comment nodes. This can be done, since only the output of the node types is different, but not the data structure.
:-)

new

    $node = XML::LibXML::Comment->new( $content );

The constructor is the only provided function for this package. It is required because libxml2 treats text nodes and comment nodes slightly differently.

XML::LibXML Class for CDATA Sections

XML::LibXML::CDATASection

Synopsis

    use XML::LibXML;
    # Only methods specific to CDATA nodes listed here,
    # see XML::LibXML::Node manpage for other methods

This class provides all functions of XML::LibXML::Text, but for CDATA nodes.

new

    $node = XML::LibXML::CDATASection->new( $content );

The constructor is the only provided function for this package. It is required because libxml2 treats the different text node types slightly differently.

XML::LibXML Attribute Class

XML::LibXML::Attr

Synopsis

    use XML::LibXML;
    # Only methods specific to Attribute nodes listed here,
    # see XML::LibXML::Node manpage for other methods

This is the interface to handle attributes like ordinary nodes. The naming of the class relies on the W3C DOM documentation.

new

    $attr = XML::LibXML::Attr->new($name [,$value]);

Class constructor. If you need to work with ISO encoded strings, you should always use the createAttribute method of XML::LibXML::Document.

getValue

    $string = $attr->getValue();

Returns the value stored for the attribute. If undef is returned, the attribute has no value, which is different from not being specified.

value

    $string = $attr->value;

Alias for getValue().

setValue

    $attr->setValue( $string );

This is needed to set a new attribute value. If ISO encoded strings are passed as parameter, the node has to be bound to a document, otherwise the encoding might be done incorrectly.

getOwnerElement

    $node = $attr->getOwnerElement();

Returns the node the attribute belongs to. If the attribute is not bound to a node, undef will be returned. Overriding the underlying implementation, the parentNode function will return undef instead of the owner element.
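To show how an attribute can be handled as a node, here is a small sketch. The element name link and the href values are made up for this example; the attribute node is obtained through the element's getAttributeNode method.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $elem = $doc->createElement('link');
$doc->setDocumentElement($elem);
$elem->setAttribute( 'href', 'old.html' );

# Fetch the attribute as a node and read its value.
my $attr   = $elem->getAttributeNode('href');
my $before = $attr->getValue;

# Changing the value through the attribute node is visible on the element.
$attr->setValue('new.html');
my $after = $elem->getAttribute('href');

# The owner element is the element the attribute belongs to.
my $owner_name = $attr->getOwnerElement->nodeName;

print "$owner_name: $before -> $after\n";
```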
setNamespace

    $attr->setNamespace($nsURI, $prefix);

This function tries to bind the attribute to a given namespace. If $nsURI is undefined or empty, the function discards any previous association of the attribute with a namespace. If the namespace was not previously declared in the context of the attribute, this function will fail. In this case you may wish to call setNamespace() on the ownerElement. If the namespace URI is non-empty and declared in the context of the attribute, but only with a different (non-empty) prefix, then the attribute is still bound to the namespace but gets a different prefix than $prefix. The function also fails if the prefix is empty but the namespace URI is not (because unprefixed attributes should by definition belong to no namespace). This function returns 1 on success, 0 otherwise.

isId

    $bool = $attr->isId;

Determines whether an attribute is of type ID. For documents with a DTD, this information is only available if DTD loading/validation has been requested. For HTML documents parsed with the HTML parser, ID detection is done automatically. In XML documents, all "xml:id" attributes are considered to be of type ID.

serializeContent($docencoding)

    $string = $attr->serializeContent;

This function is not part of the DOM API. It returns the attribute content in the form in which it serializes into XML, that is, with all meta-characters properly quoted and with raw entity references (except for entities expanded during parse time). Setting the optional $docencoding flag to 1 enforces document encoding for the output string (which is then passed to Perl as a byte string). Otherwise the string is passed to Perl as (UTF-8 encoded) characters.

XML::LibXML's DOM L2 Document Fragment Implementation

XML::LibXML::DocumentFragment

Synopsis

    use XML::LibXML;

This class is a helper class as described in the DOM Level 2 Specification. It is implemented as a node without a name. All adding, inserting or replacing functions are aware of document fragments now.
Likewise, all unbound nodes (all nodes that do not belong to any document sub-tree) are implicit members of document fragments.

XML::LibXML Namespace Implementation

XML::LibXML::Namespace

Synopsis

    use XML::LibXML;
    # Only methods specific to Namespace nodes listed here,
    # see XML::LibXML::Node manpage for other methods

Namespace nodes are returned both by $element->findnodes('namespace::foo') and by $node->getNamespaces().

The namespace node API is not part of any current DOM API, and so it is quite minimal. It should be noted that namespace nodes are not a subclass of XML::LibXML::Node; however, namespace nodes act a lot like attribute nodes, and similarly named methods will return what you would expect if you treated the namespace node as an attribute. Note that in order to fix several inconsistencies between the API and the documentation, the behavior of some functions has been changed in 1.64.

new

    my $ns = XML::LibXML::Namespace->new($nsURI);

Creates a new Namespace node. Note that this is not a 'node' in the way an attribute or an element node is. Therefore you can't call all XML::LibXML::Node functions on it. All functions available for this node are listed below.

Optionally you can pass the prefix to the namespace constructor. If this second parameter is omitted, you will create a so-called default namespace. Note that the newly created namespace is not bound to any document or node, therefore you should not expect it to be available in an existing document.

declaredURI

Returns the URI for this namespace.

declaredPrefix

Returns the prefix for this namespace.

nodeName

    print $ns->nodeName();

Returns "xmlns:prefix", where prefix is the prefix for this namespace.

name

    print $ns->name();

Alias for nodeName().

getLocalName

    $localname = $ns->getLocalName();

Returns the local name of this node as if it were an attribute, that is, the prefix associated with the namespace.

getData

    print $ns->getData();

Returns the URI of the namespace, i.e. the value of this node as if it were an attribute.
getValue

    print $ns->getValue();

Alias for getData().

value

    print $ns->value();

Alias for getData().

getNamespaceURI

    $known_uri = $ns->getNamespaceURI();

Returns the string "http://www.w3.org/2000/xmlns/".

getPrefix

    $known_prefix = $ns->getPrefix();

Returns the string "xmlns".

XML::LibXML Processing Instructions

XML::LibXML::PI

Synopsis

    use XML::LibXML;
    # Only methods specific to Processing Instruction nodes listed here,
    # see XML::LibXML::Node manpage for other methods

XML::LibXML implements processing instructions with read and write access. The PI data is the PI without the PI target (as specified in XML 1.0 [17]) as a string. This string can be accessed with getData as implemented in XML::LibXML::Node.

The write access takes into account that many processing instructions have attribute-like data. Therefore setData() provides, besides the interface that conforms to the DOM spec, a way to pass a set of named parameters. So the code segment

    my $pi = $dom->createProcessingInstruction("abc");
    $pi->setData(foo=>'bar', foobar=>'foobar');
    $dom->appendChild( $pi );

will result in the following PI in the DOM:

    <?abc foo="bar" foobar="foobar"?>

Which is how it is specified in the DOM specification. This three-step interface temporarily creates a node in Perl space. This can be avoided by using the insertProcessingInstruction() method. Instead of the three calls described above, the call

    $dom->insertProcessingInstruction("abc",'foo="bar" foobar="foobar"');

will have the same result as above.

XML::LibXML::PI's implementation of setData() differs a bit from the standard version as available in XML::LibXML::Node:

setData

    $pinode->setData( $data_string );
    $pinode->setData( name=>string_value [...] );

This method allows one to change the content data of a PI. In addition to the interface specified for DOM Level 2, the method provides a named parameter interface to set the data. This parameter list is converted into a string before it is appended to the PI.
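The two calling conventions of setData() can be compared with a sketch along these lines. The PI target render and the parameter names mode and cache are invented for this example; a single named parameter is used so that the resulting string does not depend on the order in which several pairs would be joined.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML::Document->new( '1.0', 'UTF-8' );
$doc->setDocumentElement( $doc->createElement('root') );

# Hypothetical PI target, created without data first.
my $pi = $doc->createProcessingInstruction('render');

# Named parameter form: the pair is converted into attribute-like data.
$pi->setData( mode => 'fast' );
$doc->appendChild($pi);
my $named = $pi->getData;

# Plain string form, as in the DOM interface.
$pi->setData('mode="slow" cache="no"');
my $plain      = $pi->getData;
my $serialized = $pi->toString;

print "$named\n$plain\n$serialized\n";
```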
XML::LibXML DTD Handling

XML::LibXML::Dtd

Synopsis

    use XML::LibXML;

This class holds a DTD. You may parse a DTD from either a string, or from an external SYSTEM identifier.

No support is available as yet for parsing from a filehandle.

XML::LibXML::Dtd is a sub-class of Node, so all the methods available to nodes (particularly toString()) are available to Dtd objects.

new

    $dtd = XML::LibXML::Dtd->new($public_id, $system_id);

Parse a DTD from the system identifier, and return a DTD object that you can pass to $doc->is_valid() or $doc->validate().

    my $dtd = XML::LibXML::Dtd->new(
        "SOME // Public / ID / 1.0",
        "test.dtd"
    );
    my $doc = XML::LibXML->new->parse_file("test.xml");
    $doc->validate($dtd);

parse_string

    $dtd = XML::LibXML::Dtd->parse_string($dtd_str);

The same as new() above, except you can parse a DTD from a string. Note that parsing from a string may fail if the DTD contains external parameter-entity references with relative URLs.

getName

    $name = $dtd->getName();

Returns the name of the DTD, i.e. the name immediately following the DOCTYPE keyword.

publicId

    $publicId = $dtd->publicId();

Returns the public identifier of the external subset.

systemId

    $systemId = $dtd->systemId();

Returns the system identifier of the external subset.

XML::LibXML Class for Input Callbacks

XML::LibXML::InputCallback

Synopsis

    use XML::LibXML;

    my $input_callbacks = XML::LibXML::InputCallback->new();
    $input_callbacks->register_callbacks([ $match_cb1, $open_cb1,
                                           $read_cb1, $close_cb1 ] );
    $input_callbacks->register_callbacks([ $match_cb2, $open_cb2,
                                           $read_cb2, $close_cb2 ] );
    $input_callbacks->register_callbacks( [ $match_cb3, $open_cb3,
                                            $read_cb3, $close_cb3 ] );

    $parser->input_callbacks( $input_callbacks );
    $parser->parse_file( $some_xml_file );

Description

You may get unexpected results if you are trying to load external documents during libxml2 parsing and the location of the resource is not an HTTP, FTP or relative location but, for example, an absolute path.
To get around this limitation, you may add your own input handler to open, read and close particular types of locations or URI classes. Using these input callback handlers, you can, for example, handle your own custom URI schemes.

The input callbacks are used whenever LibXML has to get something other than externally parsed entities from somewhere. They are implemented using a callback stack on the Perl layer in analogy to libxml2's native callback stack.

The XML::LibXML::InputCallback class transparently registers the input callbacks for libxml2's parser processes.

How does XML::LibXML::InputCallback work?

The libxml2 library offers a callback implementation as global functions only. To work around the problems that result from having only global callbacks (for example, if the same global callback stack is manipulated by different applications running together in a single Apache Web-server environment), XML::LibXML::InputCallback comes with an object-oriented and a function-oriented part.

Using the function-oriented part, the global callback stack of libxml2 can be manipulated. Those functions can be used as an interface to the callbacks on the C and XS layer. In the object-oriented part, operations for working with the "pseudo-localized" callback stack are implemented. Currently, you can register and de-register callbacks on the Perl layer and initialize them on a per parser basis.

Callback Groups

The libxml2 input callbacks come in groups. One group contains a URI matcher (match), a data stream constructor (open), a data stream reader (read), and a data stream destructor (close). The callbacks can be manipulated on a per group basis only.

The Parser Process

The parser process works on an XML data stream, in which links to other resources can be embedded. These can be, for example, links to external DTDs or XIncludes. Those resources are identified by URIs.
The callback implementation of libxml2 assumes that one callback group can handle a certain number of URIs and a certain URI scheme. By default, callback handlers for file://*, file://*.gz, http://* and ftp://* are registered.

Callback groups in the callback stack are processed from top to bottom, meaning that callback groups registered later will be processed before the earlier registered ones.

While parsing the data stream, the libxml2 parser checks whether a registered callback group will handle a URI; if none will, the URI is interpreted as file://URI. To handle a URI, the match callback has to return '1'. If that happens, the handling of the URI is passed to that callback group. Next, the URI is passed to the open callback, which should return a reference to the data stream if it successfully opened the file, '0' otherwise. If opening the stream was successful, the read callback is called repeatedly until it returns an empty string. After the read callback, the close callback is called to close the stream.

Organisation of callback groups in XML::LibXML::InputCallback

Callback groups are implemented as a stack (Array); each entry holds a reference to an array of the callbacks. For the libxml2 library, the XML::LibXML::InputCallback callback implementation appears as one single callback group. The Perl implementation, however, allows managing different callback stacks on a per libxml2-parser basis.

Using XML::LibXML::InputCallback

After object instantiation using the parameter-less constructor, you can register callback groups.
  my $input_callbacks = XML::LibXML::InputCallback->new();
  $input_callbacks->register_callbacks( [ $match_cb1, $open_cb1, $read_cb1, $close_cb1 ] );
  $input_callbacks->register_callbacks( [ $match_cb2, $open_cb2, $read_cb2, $close_cb2 ] );
  $input_callbacks->register_callbacks( [ $match_cb3, $open_cb3, $read_cb3, $close_cb3 ] );

  $parser->input_callbacks( $input_callbacks );
  $parser->parse_file( $some_xml_file );

What about the old callback system prior to XML::LibXML::InputCallback?

In XML::LibXML versions prior to 1.59 (i.e. without the XML::LibXML::InputCallback module) you could define your callbacks either globally or locally. You can still do that using XML::LibXML::InputCallback, and in addition you can define the callbacks on a per-parser basis.

If you use the old callback interface through global callbacks, XML::LibXML::InputCallback will treat them with a lower priority than the ones registered using the new interface. The global callbacks will not override the callback groups registered using the new interface. Local callbacks are attached to a specific parser instance, therefore they are treated with the highest priority. If the match callback of the callback group registered as a local variable is identical to one of the callback groups registered using the new interface, that callback group will be replaced.

Users of the old callback implementation whose open callback returned a plain string will have to adapt their code to return a reference to that string after upgrading to version >= 1.59. The new callback system can only deal with an open callback that returns a reference.

Interface Description

Global Variables

$_CUR_CB
Stores the current callback and can be used as a shortcut to access the callback stack.

@_GLOBAL_CALLBACKS
Stores all callback groups for the current parser process.

@_CB_STACK
Stores the currently used callback group. Used to prevent parser errors when dealing with nested XML data.
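The callback contract described above (match returns 1, open returns a reference, read returns chunks until an empty string, close releases the stream) can be sketched with an in-memory resource. This is only an illustration under stated assumptions: the "mem:" scheme, the %resources table and the four sub names are made up for the example and are not part of XML::LibXML.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical in-memory "files" served under a made-up "mem:" scheme.
my %resources = (
    'mem:greeting.xml' => '<greeting>hello</greeting>',
);

# match: claim only URIs that start with the illustrative scheme.
sub match_mem {
    my $uri = shift;
    return $uri =~ /^mem:/ ? 1 : 0;
}

# open: return a REFERENCE to the data (a private copy of the string),
# or 0 if the resource is unknown.
sub open_mem {
    my $uri = shift;
    return 0 unless exists $resources{$uri};
    my $data = $resources{$uri};
    return \$data;
}

# read: hand back at most $length characters and remove them from the
# buffer; an empty string signals the end of the stream.
sub read_mem {
    my ( $ref, $length ) = @_;
    return substr( $$ref, 0, $length, '' );
}

# close: nothing to release for a string buffer.
sub close_mem { return 1; }

my $input_callbacks = XML::LibXML::InputCallback->new();
$input_callbacks->register_callbacks(
    [ \&match_mem, \&open_mem, \&read_mem, \&close_mem ] );

# Attach the callback group to one parser instance only.
my $parser = XML::LibXML->new();
$parser->input_callbacks($input_callbacks);

my $doc  = $parser->parse_file('mem:greeting.xml');
my $text = $doc->documentElement->textContent;
print "$text\n";
```

Because the group is registered on a single parser object, other parser instances in the same process should not see the "mem:" handler.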
Global Callbacks

_callback_match
Implements the interface for the match callback at the C level and for the selection of the callback group from the callbacks defined at the Perl level.

_callback_open
Forwards the open callback from libxml2 to the corresponding callback function at the Perl level.

_callback_read
Forwards the read request to the corresponding callback function at the Perl level and returns the result to libxml2.

_callback_close
Forwards the close callback from libxml2 to the corresponding callback function at the Perl level.

Class methods

new()
A simple constructor.

register_callbacks( [ $match_cb, $open_cb, $read_cb, $close_cb ] )
The four callbacks have to be given as an array reference in the order shown above: match, open, read, close.

unregister_callbacks( [ $match_cb, $open_cb, $read_cb, $close_cb ] )
With no arguments given, unregister_callbacks() will delete the last registered callback group from the stack. If four callbacks are passed as an array reference, the callback group to unregister will be identified by the match callback and deleted from the callback stack. Note that if several identical match callbacks are defined in different callback groups, ALL of them will be deleted from the stack.

init_callbacks()
Initializes the callback system before a parsing process.

cleanup_callbacks()
Resets global variables and the libxml2 callback stack.

lib_init_callbacks()
Used internally for callback registration at the C level.

lib_cleanup_callbacks()
Used internally for callback resetting at the C level.

Example callbacks

The following is a purely fictitious example that uses a MyScheme::Handler object that responds to methods similar to those of an IO::Handle.
  # Define the four callback functions
  sub match_uri {
      my $uri = shift;
      return $uri =~ /^myscheme:/;   # trigger our callback group for 'myscheme' URIs
  }

  sub open_uri {
      my $uri = shift;
      my $handler = MyScheme::Handler->new($uri);
      return $handler;
  }

  # The returned $buffer will be parsed by the libxml2 parser
  sub read_uri {
      my $handler = shift;
      my $length  = shift;
      my $buffer;
      # MyScheme::Handler is fictitious; IO::Handle-like methods are assumed
      $handler->read($buffer, $length);
      return $buffer;   # $buffer will be an empty string '' once reading is done
  }

  # Close the handle associated with the resource.
  sub close_uri {
      my $handler = shift;
      $handler->close();
  }

  # Register them with an instance of XML::LibXML::InputCallback
  my $input_callbacks = XML::LibXML::InputCallback->new();
  $input_callbacks->register_callbacks( [ \&match_uri, \&open_uri, \&read_uri, \&close_uri ] );

  # Register the callback group at a parser instance
  my $parser = XML::LibXML->new();
  $parser->input_callbacks( $input_callbacks );

  # $some_xml_file will be parsed using our callbacks
  $parser->parse_file( $some_xml_file );

RelaxNG Schema Validation

XML::LibXML::RelaxNG

Synopsis

  use XML::LibXML;
  $doc = XML::LibXML->new->parse_file($url);

The XML::LibXML::RelaxNG class is a tiny frontend to libxml2's RelaxNG implementation. Currently it supports only schema parsing and document validation.

new

  $rngschema = XML::LibXML::RelaxNG->new( location => $filename_or_url );
  $rngschema = XML::LibXML::RelaxNG->new( string => $xmlschemastring );
  $rngschema = XML::LibXML::RelaxNG->new( DOM => $doc );

The constructor of XML::LibXML::RelaxNG may be called with one of three parameters. The parameter tells the class from which source it should generate a validation schema. It is important that each schema has only a single source.

The location parameter parses a schema from the filesystem or a URL.

The string parameter parses the schema from the given XML string.

The DOM parameter parses the schema from a pre-parsed XML::LibXML::Document.
Note that the constructor will die() if the schema does not meet the constraints of the RelaxNG specification.

validate

  eval { $rngschema->validate( $doc ); };

This function validates a (parsed) document against the given RelaxNG schema. The argument of this function should be an XML::LibXML::Document object. If this function succeeds, it will return 0, otherwise it will die() and report the errors found. Because of this, validate() should always be called inside an eval block.

XML Schema Validation

XML::LibXML::Schema

Synopsis

  use XML::LibXML;
  $doc = XML::LibXML->new->parse_file($url);

The XML::LibXML::Schema class is a tiny frontend to libxml2's XML Schema implementation. Currently it supports only schema parsing and document validation.

new

  $xmlschema = XML::LibXML::Schema->new( location => $filename_or_url );
  $xmlschema = XML::LibXML::Schema->new( string => $xmlschemastring );

The constructor of XML::LibXML::Schema may be called with one of two parameters. The parameter tells the class from which source it should generate a validation schema. It is important that each schema has only a single source.

The location parameter parses a schema from the filesystem or a URL.

The string parameter parses the schema from the given XML string.

Note that the constructor will die() if the schema does not meet the constraints of the XML Schema specification.

validate

  eval { $xmlschema->validate( $doc ); };

This function validates a (parsed) document against the given XML Schema. The argument of this function should be an XML::LibXML::Document object. If this function succeeds, it will return 0, otherwise it will die() and report the errors found. Because of this, validate() should always be called inside an eval block.

XPath Evaluation

XML::LibXML::XPathContext

The XML::LibXML::XPathContext class provides an almost complete interface to libxml2's XPath implementation.
With XML::LibXML::XPathContext it is possible to evaluate XPath expressions in the context of an arbitrary node, context size, and context position, with a user-defined namespace-prefix mapping, custom XPath functions written in Perl, and even a custom XPath variable resolver.

Examples

Namespaces

This example demonstrates the registerNs() method. It finds all paragraph nodes in an XHTML document.

  my $xc = XML::LibXML::XPathContext->new($xhtml_doc);
  $xc->registerNs('xhtml', 'http://www.w3.org/1999/xhtml');
  my @nodes = $xc->findnodes('//xhtml:p');

Custom XPath functions

This example demonstrates the registerFunction() method by defining a function that filters nodes based on a Perl regular expression:

  sub grep_nodes {
      my ($nodelist, $regexp) = @_;
      my $result = XML::LibXML::NodeList->new;
      for my $node ($nodelist->get_nodelist()) {
          $result->push($node) if $node->textContent =~ $regexp;
      }
      return $result;
  }

  my $xc = XML::LibXML::XPathContext->new($node);
  $xc->registerFunction('grep_nodes', \&grep_nodes);
  my @nodes = $xc->findnodes('//section[grep_nodes(para,"\bsearch(ing|es)?\b")]');

Variables

This example demonstrates the registerVarLookupFunc() method. We use XPath variables to recycle results of previous evaluations:

  # Arguments arrive in the order documented under registerVarLookupFunc:
  # lookup data, variable name, variable namespace URI
  sub var_lookup {
      my ($data, $varname, $ns) = @_;
      return $data->{$varname};
  }

  my $areas = XML::LibXML->new->parse_file('areas.xml');
  my $empl  = XML::LibXML->new->parse_file('employees.xml');

  my $xc = XML::LibXML::XPathContext->new($empl);

  my %variables = (
      A => $xc->find('/employees/employee[@salary>10000]'),
      B => $areas->find('/areas/area[district="Brooklyn"]/street'),
  );

  # get names of employees from $A working in an area listed in $B
  $xc->registerVarLookupFunc(\&var_lookup, \%variables);
  my @nodes = $xc->findnodes('$A[work_area/street = $B]/name');

Methods

new

  my $xpc = XML::LibXML::XPathContext->new();

Creates a new XML::LibXML::XPathContext object without a context node.
  my $xpc = XML::LibXML::XPathContext->new($node);

Creates a new XML::LibXML::XPathContext object with the context node set to $node.

registerNs

  $xpc->registerNs($prefix, $namespace_uri)

Registers namespace $prefix to $namespace_uri.

unregisterNs

  $xpc->unregisterNs($prefix)

Unregisters namespace $prefix.

lookupNs

  $uri = $xpc->lookupNs($prefix)

Returns the namespace URI registered with $prefix. If $prefix is not registered to any namespace URI, returns undef.

registerVarLookupFunc

  $xpc->registerVarLookupFunc($callback, $data)

Registers the variable lookup function $callback. The registered function is executed by the XPath engine each time an XPath variable is evaluated. It takes three arguments: $data, variable name, and variable ns-URI, and must return one value: a number, a string, or any XML::LibXML:: object that can be a result of findnodes: Boolean, Literal, Number, Node (e.g. Document, Element, etc.), or NodeList. For convenience, simple (non-blessed) array references containing only XML::LibXML::Node objects can be used instead of an XML::LibXML::NodeList.

getVarLookupData

  $data = $xpc->getVarLookupData();

Returns the data that were associated with a variable lookup function during a previous call to registerVarLookupFunc.

getVarLookupFunc

  $callback = $xpc->getVarLookupFunc();

Returns the variable lookup function previously registered with registerVarLookupFunc.

unregisterVarLookupFunc

  $xpc->unregisterVarLookupFunc($name);

Unregisters the variable lookup function and the associated lookup data.

registerFunctionNS

  $xpc->registerFunctionNS($name, $uri, $callback)

Registers an extension function $name in the $uri namespace. $callback must be a CODE reference. The arguments of the callback function are either simple scalars or XML::LibXML::* objects, depending on the XPath argument types. The function is responsible for checking the number and types of its arguments.
The result of the callback must be a single value of one of the following types: a simple scalar (number, string) or an arbitrary XML::LibXML::* object that can be a result of findnodes: Boolean, Literal, Number, Node (e.g. Document, Element, etc.), or NodeList. For convenience, simple (non-blessed) array references containing only XML::LibXML::Node objects can be used instead of an XML::LibXML::NodeList.

unregisterFunctionNS

  $xpc->unregisterFunctionNS($name, $uri)

Unregisters extension function $name in the $uri namespace. Has the same effect as passing undef as $callback to registerFunctionNS.

registerFunction

  $xpc->registerFunction($name, $callback)

Same as registerFunctionNS but without a namespace.

unregisterFunction

  $xpc->unregisterFunction($name)

Same as unregisterFunctionNS but without a namespace.

findnodes

  @nodes = $xpc->findnodes($xpath)
  @nodes = $xpc->findnodes($xpath, $context_node )
  $nodelist = $xpc->findnodes($xpath, $context_node )

Evaluates the XPath expression on the current context node and returns the result as an array. In scalar context, returns an XML::LibXML::NodeList object. Optionally, a node may be passed as a second argument to set the context node for the query.

find

  $object = $xpc->find($xpath )
  $object = $xpc->find($xpath, $context_node )

Evaluates the XPath expression using the current node as the context of the expression, and returns a result whose type depends on the type of result the XPath expression had. For example, the XPath 1 * 3 + 52 results in an XML::LibXML::Number object being returned. Other expressions might return an XML::LibXML::Boolean object, or an XML::LibXML::Literal object (a string). Each of those objects uses Perl's overload feature to "do the right thing" in different contexts. Optionally, a node may be passed as a second argument to set the context node for the query.
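The difference between the list and scalar forms of findnodes, the optional context-node argument, and the typed result of find can be illustrated with a small sketch. The sample document and variable names below are invented for the illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Illustrative sample document (not from the original manual).
my $doc = XML::LibXML->new->parse_string(<<'XML');
<library>
  <shelf id="a"><book>Alpha</book><book>Beta</book></shelf>
  <shelf id="b"><book>Gamma</book></shelf>
</library>
XML

my $xpc = XML::LibXML::XPathContext->new($doc);

# List context: an array of nodes, evaluated against the context node.
my @shelves = $xpc->findnodes('/library/shelf');

# Second argument: evaluate a relative path against each shelf in turn.
my @per_shelf;
for my $shelf (@shelves) {
    my @books = $xpc->findnodes( 'book', $shelf );
    push @per_shelf, scalar @books;
}

# Scalar context: an XML::LibXML::NodeList object.
my $nodelist = $xpc->findnodes('//book');
my $total    = $nodelist->size;

# find() returns a typed object; count() yields an XML::LibXML::Number,
# which stringifies to its numeric value.
my $count = $xpc->find('count(//book)');

print "per shelf: @per_shelf; total: $total; count: $count\n";
```

Passing the context node per call keeps a single XPathContext (and its registered namespaces and functions) reusable across many nodes.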
findvalue

  $value = $xpc->findvalue($xpath )
  $value = $xpc->findvalue($xpath, $context_node )

Is exactly equivalent to:

  $xpc->find( $xpath )->to_literal;

That is, it returns the literal value of the results. This enables you to ensure that you get a string back from your search, allowing certain shortcuts. This could be used as the equivalent of <xsl:value-of select="some_xpath"/>. Optionally, a node may be passed as the second argument to set the context node for the query.

setContextNode

  $xpc->setContextNode($node)

Set the current context node.

getContextNode

  my $node = $xpc->getContextNode;

Get the current context node.

setContextPosition

  $xpc->setContextPosition($position)

Set the current context position. By default, this value is -1 (and evaluating the XPath function position() in the initial context raises an XPath error), but it can be set to any value up to the context size. This usually only serves to trick the XPath engine into returning the given position when the position() XPath function is called. Setting this value to -1 restores the default behavior.

getContextPosition

  my $position = $xpc->getContextPosition;

Get the current context position.

setContextSize

  $xpc->setContextSize($size)

Set the current context size. By default, this value is -1 (and evaluating the XPath function last() in the initial context raises an XPath error), but it can be set to any non-negative value. This usually only serves to trick the XPath engine into returning the given value when the last() XPath function is called. If the context size is set to 0, the position is automatically also set to 0. If the context size is positive, the position is automatically set to 1. Setting the context size to -1 restores the default behavior.

getContextSize

  my $size = $xpc->getContextSize;

Get the current context size.
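The interplay between setContextSize, setContextPosition and the XPath functions last() and position() described above can be sketched as follows. The one-element document is only a placeholder, and the variable names are illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;

# Placeholder document; only a context node is needed for this sketch.
my $doc = XML::LibXML->new->parse_string('<root><item/></root>');
my $xpc = XML::LibXML::XPathContext->new( $doc->documentElement );

# Before anything is set, the size is reported as -1.
my $default_size = $xpc->getContextSize;

# A positive size also sets the position to 1.
$xpc->setContextSize(4);
my $auto_position = $xpc->getContextPosition;

# Pretend the context node is the third of four nodes.
$xpc->setContextPosition(3);
my $position = $xpc->findvalue('position()');
my $last     = $xpc->findvalue('last()');

# Restore the default behaviour.
$xpc->setContextSize(-1);
my $restored_size = $xpc->getContextSize;

print "default=$default_size auto=$auto_position ",
      "position=$position last=$last restored=$restored_size\n";
```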
Bugs And Caveats

XML::LibXML::XPathContext objects are reentrant, meaning that you can call methods of an XML::LibXML::XPathContext even from XPath extension functions registered with the same object or from a variable lookup function. On the other hand, you should avoid registering new extension functions, namespaces or a variable lookup function from within extension functions or a variable lookup function, unless you want to experience untested behavior.

Authors

Ilya Martynov and Petr Pajas, based on XML::LibXML and XML::LibXSLT code by Matt Sergeant and Christian Glahn.

Historical remark

Prior to XML::LibXML 1.61 this module was distributed separately for maintenance reasons.

XML::LibXML::Reader - interface to libxml2 pull parser

XML::LibXML::Reader

Synopsis

  use XML::LibXML::Reader;

  my $reader = new XML::LibXML::Reader(location => "file.xml")
      or die "cannot read file.xml\n";
  while ($reader->read) {
      processNode($reader);
  }

  sub processNode {
      my $reader = shift;
      printf "%d %d %s %d\n", ($reader->depth,
                               $reader->nodeType,
                               $reader->name,
                               $reader->isEmptyElement);
  }

or

  my $reader = new XML::LibXML::Reader(location => "file.xml")
      or die "cannot read file.xml\n";
  $reader->preservePattern('//table/tr');
  $reader->finish;
  print $reader->document->toString(1);

DESCRIPTION

This is a Perl interface to libxml2's pull-parser implementation xmlTextReader, http://xmlsoft.org/html/libxml-xmlreader.html. This feature requires at least libxml2-2.6.21. Pull parsers (StAX in Java, XmlReader in C#) use an iterator approach to parse an XML file. They are easier to program than event-based parsers (SAX) and much more lightweight than tree-based parsers (DOM), which load the complete tree into memory.

The Reader acts as a cursor going forward on the document stream and stopping at each node on the way. At every point, DOM-like methods of the Reader object allow the current node to be examined (name, namespace, attributes, etc.).
The user's code keeps control of the progress and simply calls the read() function repeatedly to progress to the next node in document order. Other functions provide means for skipping complete sub-trees, or nodes until a specific element, etc.

At any time, only a very limited portion of the document is kept in memory, which makes the API more memory-efficient than using DOM. However, it is also possible to mix Reader with DOM. At every point the user may copy the current node (optionally expanded into a complete sub-tree) from the processed document to another DOM tree, or instruct the Reader to collect a sub-document in the form of a DOM tree consisting of selected nodes.

The Reader API also supports namespaces, xml:base, entity handling, and DTD validation. Schema and RelaxNG validation support will probably be added in some later revision of the Perl interface.

The naming of methods compared to libxml2 and C# XmlTextReader has been changed slightly to match the conventions of XML::LibXML. Some functions have been changed or added with respect to the C interface.

CONSTRUCTOR

Depending on the XML source, the Reader object can be created with one of:

  my $reader = XML::LibXML::Reader->new( location => "file.xml", ... );
  my $reader = XML::LibXML::Reader->new( string => $xml_string, ... );
  my $reader = XML::LibXML::Reader->new( IO => $file_handle, ... );
  my $reader = XML::LibXML::Reader->new( DOM => $dom, ... );

where ... are (optional) reader options described below in Parsing options. The constructor recognizes the following XML sources:

Source specification

location
Read XML from a local file or URL.

string
Read XML from a string.

IO
Read XML from a Perl IO filehandle.

FD
Read XML from a file descriptor (bypasses the Perl I/O layer, only applicable to filehandles for regular files or pipes). Possibly faster than IO.

DOM
Use the reader API to walk through a pre-parsed XML::LibXML::Document.
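As a minimal sketch of two of the sources listed above, the following reads the same markup once from a string and once from a pre-parsed document, collecting element names with read(). The sample XML and the helper element_names() are invented for the illustration.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

# Illustrative input; any well-formed XML would do.
my $xml = '<doc><para>one</para><para>two</para></doc>';

# Hypothetical helper: walk the stream and collect element names.
sub element_names {
    my ($reader) = @_;
    my @names;
    while ( $reader->read == 1 ) {
        push @names, $reader->name
            if $reader->nodeType == XML_READER_TYPE_ELEMENT;
    }
    return @names;
}

# Source given as a string.
my @from_string =
    element_names( XML::LibXML::Reader->new( string => $xml ) );

# Source given as a pre-parsed XML::LibXML::Document.
my $dom      = XML::LibXML->new->parse_string($xml);
my @from_dom = element_names( XML::LibXML::Reader->new( DOM => $dom ) );

print "string: @from_string\n";
print "DOM:    @from_dom\n";
```

The node-type constant is used unqualified here because the module exports the type constants by default.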
Parsing options

URI
can be used to provide a baseURI when parsing strings or filehandles.

encoding
overrides the document encoding.

RelaxNG
can be used to pass either an XML::LibXML::RelaxNG object or a filename or URL of a RelaxNG schema to the constructor. The schema is then used to validate the document as it is processed.

Schema
can be used to pass either an XML::LibXML::Schema object or a filename or URL of a W3C XSD schema to the constructor. The schema is then used to validate the document as it is processed.

recover
recover on errors (0 or 1)

expand_entities
substitute entities (0 or 1)

load_ext_dtd
load the external subset (0 or 1)

complete_attributes
default DTD attributes (0 or 1)

validation
validate with the DTD (0 or 1)

suppress_errors
suppress error reports (0 or 1)

suppress_warnings
suppress warning reports (0 or 1)

pedantic_parser
pedantic error reporting (0 or 1)

no_blanks
remove blank nodes (0 or 1)

expand_xinclude
implement XInclude substitution (0 or 1)

no_network
forbid network access (0 or 1)

clean_namespaces
remove redundant namespace declarations (0 or 1)

no_cdata
merge CDATA as text nodes (0 or 1)

no_xinclude_nodes
do not generate XINCLUDE START/END nodes (0 or 1)

METHODS CONTROLLING PARSING PROGRESS

read ()
Moves the position to the next node in the stream, exposing its properties. Returns 1 if the node was read successfully, 0 if there are no more nodes to read, or -1 in case of error.

readAttributeValue ()
Parses an attribute value into one or more Text and EntityReference nodes. Returns 1 in case of success, 0 if the reader was not positioned on an attribute node or all the attribute values have been read, or -1 in case of error.

readState ()
Gets the read state of the reader. Returns the state value, or -1 in case of error. The module exports constants for the Reader states, see STATES below.

depth ()
The depth of the node in the tree, starting at 0 for the root node.
next ()
Skip to the node following the current one in document order while avoiding the sub-tree, if any. Returns 1 if the node was read successfully, 0 if there are no more nodes to read, or -1 in case of error.

nextElement (localname?,nsURI?)
Skip nodes following the current one in document order until a specific element is reached. The element's name must be equal to the given localname if defined, and its namespace must be equal to the given nsURI if defined. Either of the arguments can be undefined (or omitted, in case of the latter or both). Returns 1 if the element was found, 0 if there are no more nodes to read, or -1 in case of error.

skipSiblings ()
Skip all nodes on the same or lower level until the first node on a higher level is reached. In particular, if the current node occurs in an element, the reader stops at the end tag of the parent element, otherwise it stops at a node immediately following the parent node. Returns 1 if successful, 0 if the end of the document is reached, or -1 in case of error.

nextSibling ()
Skips to the node following the current one in document order while avoiding the sub-tree, if any. Returns 1 if the node was read successfully, 0 if there are no more nodes to read, or -1 in case of error.

nextSiblingElement (name?,nsURI?)
Like nextElement, but only processes sibling elements of the current node (internally moving forward using nextSibling () rather than read ()). Returns 1 if the element was found, 0 if there are no more sibling nodes, or -1 in case of error.

finish ()
Skip all remaining nodes in the document, reaching the end of the document. Returns 1 if successful, 0 in case of error.

close ()
This method releases any resources allocated by the current instance and closes any underlying input. It returns 0 on failure and 1 on success. This method is automatically called by the destructor when the reader is forgotten, therefore you do not have to call it directly.
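The difference between stepping with read(), jumping with nextElement() and skipping sub-trees with next() can be sketched on a small catalog. The sample document and variable names are made up for the illustration.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Illustrative input document.
my $xml = <<'XML';
<catalog>
  <item sku="1"><name>Widget</name><notes><n>skip me</n></notes></item>
  <item sku="2"><name>Gadget</name></item>
</catalog>
XML

# Pass 1: jump from <name> to <name>, then step once with read()
# to reach the text node inside it.
my @names;
my $by_name = XML::LibXML::Reader->new( string => $xml );
while ( $by_name->nextElement('name') == 1 ) {
    $by_name->read;                  # now positioned on the text node
    push @names, $by_name->value;
}

# Pass 2: find the first <item>, then hop over each sub-tree with next(),
# so the contents of <notes> are never visited one node at a time.
my @skus;
my $by_item = XML::LibXML::Reader->new( string => $xml );
my $status  = $by_item->nextElement('item');
while ( $status == 1 ) {
    push @skus, $by_item->getAttribute('sku')
        if $by_item->nodeType == XML_READER_TYPE_ELEMENT
        && $by_item->name eq 'item';
    $status = $by_item->next;        # skip the current node's sub-tree
}

print "names: @names\n";
print "skus:  @skus\n";
```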
METHODS EXTRACTING INFORMATION

name ()
Returns the qualified name of the current node, equal to (Prefix:)LocalName.

nodeType ()
Returns the type of the current node. See NODE TYPES below.

localName ()
Returns the local name of the node.

prefix ()
Returns the prefix of the namespace associated with the node.

namespaceURI ()
Returns the URI defining the namespace associated with the node.

isEmptyElement ()
Checks whether the current node is empty. This is a bit bizarre in the sense that <a/> will be considered empty while <a></a> will not.

hasValue ()
Returns true if the node can have a text value.

value ()
Provides the text value of the node if present, or undef if not available.

readInnerXml ()
Reads the contents of the current node, including child nodes and markup. Returns a string containing the XML of the node's content, or undef if the current node is neither an element nor an attribute, or has no child nodes.

readOuterXml ()
Reads the contents of the current node, including child nodes and markup. Returns a string containing the XML of the node including its content, or undef if the current node is neither an element nor an attribute.

METHODS EXTRACTING DOM NODES

document ()
Provides access to the document tree built by the reader. This function can be used to collect the preserved nodes (see preserveNode() and preservePattern). CAUTION: Never use this function to modify the tree unless reading of the whole document is completed!

copyCurrentNode (deep)
This function is similar to the DOM function copyNode(). It returns a copy of the currently processed node as a corresponding DOM object. Use deep = 1 to obtain the full sub-tree.

preserveNode ()
This tells the XML Reader to preserve the current node in the document tree. A document tree consisting of the preserved nodes and their content can be obtained using the method document() once parsing is finished. Returns the node or NULL in case of error.
preservePattern (pattern,\%ns_map)
This tells the XML Reader to preserve all nodes matched by the pattern (which is a streaming XPath subset). A document tree consisting of the preserved nodes and their content can be obtained using the method document() once parsing is finished. An optional second argument can be used to provide a HASH reference mapping prefixes used by the XPath to namespace URIs. The XPath subset available with this function is described at http://www.w3.org/TR/xmlschema-1/#Selector and matches the production

  Path ::= ('.//')? ( Step '/' )* ( Step | '@' NameTest )

Returns a positive number in case of success and -1 in case of error.

METHODS PROCESSING ATTRIBUTES

attributeCount ()
Provides the number of attributes of the current node.

hasAttributes ()
Whether the node has attributes.

getAttribute (name)
Provides the value of the attribute with the specified qualified name. Returns a string containing the value of the specified attribute, or undef in case of error.

getAttributeNs (localName, namespaceURI)
Provides the value of the specified attribute. Returns a string containing the value of the specified attribute, or undef in case of error.

getAttributeNo (no)
Provides the value of the attribute with the specified index relative to the containing element. Returns a string containing the value of the specified attribute, or undef in case of error.

isDefault ()
Returns true if the current attribute node was generated from the default value defined in the DTD.

moveToAttribute (name)
Moves the position to the attribute with the specified qualified name. Returns 1 in case of success, -1 in case of error, or 0 if not found.

moveToAttributeNo (no)
Moves the position to the attribute with the specified index relative to the containing element. Returns 1 in case of success, -1 in case of error, or 0 if not found.

moveToAttributeNs (localName,namespaceURI)
Moves the position to the attribute with the specified local name and namespace URI.
Returns 1 in case of success, -1 in case of error, or 0 if not found.

moveToFirstAttribute ()
Moves the position to the first attribute associated with the current node. Returns 1 in case of success, -1 in case of error, or 0 if not found.

moveToNextAttribute ()
Moves the position to the next attribute associated with the current node. Returns 1 in case of success, -1 in case of error, or 0 if not found.

moveToElement ()
Moves the position to the node that contains the current attribute node. Returns 1 in case of success, -1 in case of error, or 0 if not moved.

isNamespaceDecl ()
Determines whether the current node is a namespace declaration rather than a regular attribute. Returns 1 if the current node is a namespace declaration, 0 if it is a regular attribute or another type of node, or -1 in case of error.

OTHER METHODS

lookupNamespace (prefix)
Resolves a namespace prefix in the scope of the current element. Returns a string containing the namespace URI to which the prefix maps, or undef in case of error.

encoding ()
Returns a string containing the encoding of the document, or undef in case of error.

standalone ()
Determines the standalone status of the document being read. Returns 1 if the document was declared to be standalone, 0 if it was declared to be not standalone, or -1 if the document did not specify its standalone status or in case of error.

xmlVersion ()
Determines the XML version of the document being read. Returns a string containing the XML version of the document, or undef in case of error.

baseURI ()
The base URI of the node. See the XML Base W3C specification.

isValid ()
Retrieves the validity status from the parser. Returns 1 if valid, 0 if not, and -1 in case of error.

xmlLang ()
The xml:lang scope within which the node resides.

lineNumber ()
Provides the line number of the current parsing point.

columnNumber ()
Provides the column number of the current parsing point.
byteConsumed ()
This function provides the current index of the parser relative to the start of the current entity. The index is counted in bytes, starting at zero at the beginning and finishing at the size in bytes of the file if parsing a file. The function has constant cost if the input is UTF-8 but can be costly if run on non-UTF-8 input.

setParserProp (prop => value, ...)
Changes the parser processing behaviour by changing some of its internal properties. The following properties are available with this function: "load_ext_dtd", "complete_attributes", "validation", "expand_entities". Since some of the properties can only be changed before any read has been done, it is best to set the parsing properties in the constructor. Returns 0 if the call was successful, or -1 in case of error.

getParserProp (prop)
Gets the value of a parser internal property. The following property names can be used: "load_ext_dtd", "complete_attributes", "validation", "expand_entities". Returns the value, usually 0 or 1, or -1 in case of error.

DESTRUCTION

XML::LibXML takes care of the reader object destruction when the last reference to the reader object goes out of scope. The document tree is preserved, though, if either of $reader->document or $reader->preserveNode was used and references to the document tree exist.

NODE TYPES

The reader interface provides the following constants for node types (the constant symbols are exported by default or if the tag :types is used).
  XML_READER_TYPE_NONE                    => 0
  XML_READER_TYPE_ELEMENT                 => 1
  XML_READER_TYPE_ATTRIBUTE               => 2
  XML_READER_TYPE_TEXT                    => 3
  XML_READER_TYPE_CDATA                   => 4
  XML_READER_TYPE_ENTITY_REFERENCE        => 5
  XML_READER_TYPE_ENTITY                  => 6
  XML_READER_TYPE_PROCESSING_INSTRUCTION  => 7
  XML_READER_TYPE_COMMENT                 => 8
  XML_READER_TYPE_DOCUMENT                => 9
  XML_READER_TYPE_DOCUMENT_TYPE           => 10
  XML_READER_TYPE_DOCUMENT_FRAGMENT       => 11
  XML_READER_TYPE_NOTATION                => 12
  XML_READER_TYPE_WHITESPACE              => 13
  XML_READER_TYPE_SIGNIFICANT_WHITESPACE  => 14
  XML_READER_TYPE_END_ELEMENT             => 15
  XML_READER_TYPE_END_ENTITY              => 16
  XML_READER_TYPE_XML_DECLARATION         => 17

STATES

The following constants represent the values returned by readState(). They are exported by default, or if the tag :states is used:

  XML_READER_NONE      => -1
  XML_READER_START     =>  0
  XML_READER_ELEMENT   =>  1
  XML_READER_END       =>  2
  XML_READER_EMPTY     =>  3
  XML_READER_BACKTRACK =>  4
  XML_READER_DONE      =>  5
  XML_READER_ERROR     =>  6

VERSION

0.02

AUTHORS

Heiko Klein, <H.Klein@gmx.net> and Petr Pajas, <pajas@matfyz.cz>

SEE ALSO

http://xmlsoft.org/html/libxml-xmlreader.html

http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html