Document.pod [plain text]

=head1 NAME

XML::LibXML::Document - XML::LibXML DOM Document Class

=head1 SYNOPSIS

  use XML::LibXML;
  # Only methods specific to Document nodes listed here,
  # see XML::LibXML::Node manpage for other methods

  $dom = XML::LibXML::Document->new( $version, $encoding );
  $dom = XML::LibXML::Document->createDocument( $version, $encoding );
  $strEncoding = $doc->encoding();
  $strEncoding = $doc->actualEncoding();
  $doc->setEncoding($new_encoding);
  $strVersion = $doc->version();
  $doc->standalone
  $doc->setStandalone($numvalue);
  my $compression = $doc->compression;
  $doc->setCompression($ziplevel);
  $docstring = $dom->toString($format);
  $c14nstr = $doc->toStringC14N($comment_flag,$xpath); 
  $ec14nstr = $doc->toStringEC14N($inclusive_prefix_list, $comment_flag,$xpath); 
  $str = $doc->serialize($format); 
  $c14nstr = $doc->serialize_c14n($comment_flag,$xpath); 
  $ec14nstr = $doc->serialize_exc_c14n($comment_flag,$xpath,$inclusive_prefix_list); 
  $state = $doc->toFile($filename, $format);
  $state = $doc->toFH($fh, $format);
  $str = $document->toStringHTML();
  $str = $document->serialize_html();
  $bool = $dom->is_valid();
  $dom->validate();
  $root = $dom->documentElement();
  $dom->setDocumentElement( $root );
  $element = $dom->createElement( $nodename );
  $element = $dom->createElementNS( $namespaceURI, $qname );
  $text = $dom->createTextNode( $content_text );
  $comment = $dom->createComment( $comment_text );
  $attrnode = $doc->createAttribute($name [,$value]);
  $attrnode = $doc->createAttributeNS( namespaceURI, $name [,$value] );
  $fragment = $doc->createDocumentFragment();
  $cdata = $dom->create( $cdata_content );
  my $pi = $doc->createProcessingInstruction( $target, $data );
  my $entref = $doc->createEntityReference($refname);
  $dtd = $document->createInternalSubset( $rootnode, $public, $system);
  $dtd = $document->createExternalSubset( $rootnode, $public, $system);
  $document->importNode( $node );
  $document->adoptNode( $node );
  my $dtd = $doc->externalSubset;
  my $dtd = $doc->internalSubset;
  $doc->setExternalSubset($dtd);
  $doc->setInternalSubset($dtd);
  my $dtd = $doc->removeExternalSubset();
  my $dtd = $doc->removeInternalSubset();
  my @nodelist = $doc->getElementsByTagName($tagname);
  my @nodelist = $doc->getElementsByTagNameNS($nsURI,$tagname);
  my @nodelist = $doc->getElementsByLocalName($localname);
  my $node = $doc->getElementById($id);
  $dom->indexElements();


=head1 DESCRIPTION

The Document Class is in most cases the result of a parsing process. But
sometimes it is necessary to create a Document from scratch. The DOM Document
Class provides functions that conform to the DOM Core naming style.

It inherits all functions from XML::LibXML::Node as specified in the DOM
specification. This enables access to the nodes besides the root element on
document level - a DTD for example. The support for these nodes is limited at
the moment.

While generally nodes are bound to a document in the DOM concept it is
suggested that one should always create a node not bound to any document. There
is no need of really including the node to the document, but once the node is
bound to a document, it is quite safe that all strings have the correct
encoding. If an unbound text node with an ISO encoded string is created (e.g.
with $CLASS->new()), the toString function may not return the expected result.

All this seems like a limitation as long as UTF-8 encoding is assured. If ISO
encoded strings come into play it is much safer to use the node creation
functions of XML::LibXML::Document.

=over 4

=item B<new>

  $dom = XML::LibXML::Document->new( $version, $encoding );

alias for createDocument()


=item B<createDocument>

  $dom = XML::LibXML::Document->createDocument( $version, $encoding );

The constructor for the document class. As Parameter it takes the version
string and (optionally) the encoding string. Simply calling createDocument()
will create the document:

  <?xml version="your version" encoding="your encoding"?>

Both parameter are optional. The default value for $version is 1.0, of course.
If the $encoding parameter is not set, the encoding will be left unset, which
means UTF-8 is implied.

The call of createDocument() without any parameter will result the following
code:

  <?xml version="1.0"?>              

Alternatively one can call this constructor directly from the XML::LibXML class
level, to avoid some typing. This will not have any effect on the class
instance, which is always XML::LibXML::Document.

   my $document = XML::LibXML->createDocument( "1.0", "UTF-8" );

is therefore a shortcut for

  my $document = XML::LibXML::Document->createDocument( "1.0", "UTF-8" );


=item B<encoding>

  $strEncoding = $doc->encoding();

returns the encoding string of the document.

  my $doc = XML::LibXML->createDocument( "1.0", "ISO-8859-15" );
  print $doc->encoding; # prints ISO-8859-15


=item B<actualEncoding>

  $strEncoding = $doc->actualEncoding();

returns the encoding in which the XML will be returned by $doc->toString().
This is usually the original encoding of the document as declared in the XML
declaration and returned by $doc->encoding. If the original encoding is not
known (e.g. if created in memory or parsed from a XML without a declared
encoding), 'UTF-8' is returned.

  my $doc = XML::LibXML->createDocument( "1.0", "ISO-8859-15" );
  print $doc->encoding; # prints ISO-8859-15


=item B<setEncoding>

  $doc->setEncoding($new_encoding);

This method allows to change the declaration of encoding in the XML declaration
of the document. The value also affects the encoding in which the document is
serialized to XML by $doc->toString().


=item B<version>

  $strVersion = $doc->version();

returns the version string of the document

getVersion() is an alternative form of this function.


=item B<standalone>

  $doc->standalone

This function returns the Numerical value of a documents XML declarations
standalone attribute. It returns 1 if standalone="yes" was found, 0 if
standalone="no" was found and -1 if standalone was not specified (default on
creation).


=item B<setStandalone>

  $doc->setStandalone($numvalue);

Through this method it is possible to alter the value of a documents standalone
attribute. Set it to 1 to set standalone="yes", to 0 to set standalone="no" or
set it to -1 to remove the standalone attribute from the XML declaration.


=item B<compression>

  my $compression = $doc->compression;

libxml2 allows reading of documents directly from gzipped files. In this case
the compression variable is set to the compression level of that file (0-8). If
XML::LibXML parsed a different source or the file wasn't compressed, the
returned value will be -1.


=item B<setCompression>

  $doc->setCompression($ziplevel);

If one intends to write the document directly to a file, it is possible to set
the compression level for a given document. This level can be in the range from
0 to 8. If XML::LibXML should not try to compress use -1 (default).

Note that this feature will only work if libxml2 is compiled with zlib support
and toFile() is used for output.


=item B<toString>

  $docstring = $dom->toString($format);

toString is a DOM serializing function, so the DOM Tree can be serialized into
a XML string, ready for output.

IMPORTANT: unlike toString for other nodes, on document nodes this function
returns the XML as a byte string in the original encoding of the document (see
the actualEncoding() method)!

The optional $format parameter sets the indenting of the output. This parameter
is expected to be an integer value, that specifies that indentation should be
used. The format parameter can have three different values if it is used:

If $format is 0, than the document is dumped as it was originally parsed

If $format is 1, libxml2 will add ignorable white spaces, so the nodes content
is easier to read. Existing text nodes will not be altered

If $format is 2 (or higher), libxml2 will act as $format == 1 but it add a
leading and a trailing line break to each text node.

libxml2 uses a hard-coded indentation of 2 space characters per indentation
level. This value can not be altered on run-time.


=item B<toStringC14N>

$c14nstr = $doc->toStringC14N($comment_flag,$xpath); See the documentation in
XML::LibXML::Node.


=item B<toStringEC14N>

$ec14nstr = $doc->toStringEC14N($inclusive_prefix_list, $comment_flag,$xpath);
See the documentation in XML::LibXML::Node.


=item B<serialize>

  $str = $doc->serialize($format); 

An alias for toString(). This function was name added to be more consistent
with libxml2.


=item B<serialize_c14n>

  $c14nstr = $doc->serialize_c14n($comment_flag,$xpath); 

An alias for toStringC14N().


=item B<serialize_exc_c14n>

  $ec14nstr = $doc->serialize_exc_c14n($comment_flag,$xpath,$inclusive_prefix_list); 

An alias for toStringEC14N().


=item B<toFile>

  $state = $doc->toFile($filename, $format);

This function is similar to toString(), but it writes the document directly
into a filesystem. This function is very useful, if one needs to store large
documents.

The format parameter has the same behaviour as in toString().


=item B<toFH>

  $state = $doc->toFH($fh, $format);

This function is similar to toString(), but it writes the document directly to
a filehandle or a stream.

The format parameter has the same behaviour as in toString().


=item B<toStringHTML>

  $str = $document->toStringHTML();

toStringHTML serialize the tree to a string as HTML. With this method indenting
is automatic and managed by libxml2 internally.


=item B<serialize_html>

  $str = $document->serialize_html();

An alias for toStringHTML().


=item B<is_valid>

  $bool = $dom->is_valid();

Returns either TRUE or FALSE depending on whether the DOM Tree is a valid
Document or not.

You may also pass in a XML::LibXML::Dtd object, to validate against an external
DTD:

   if (!$dom->is_valid($dtd)) {
       warn("document is not valid!");
   }


=item B<validate>

  $dom->validate();

This is an exception throwing equivalent of is_valid. If the document is not
valid it will throw an exception containing the error. This allows you much
better error reporting than simply is_valid or not.

Again, you may pass in a DTD object


=item B<documentElement>

  $root = $dom->documentElement();

Returns the root element of the Document. A document can have just one root
element to contain the documents data.

Optionally one can use getDocumentElement.


=item B<setDocumentElement>

  $dom->setDocumentElement( $root );

This function enables you to set the root element for a document. The function
supports the import of a node from a different document tree.


=item B<createElement>

  $element = $dom->createElement( $nodename );

This function creates a new Element Node bound to the DOM with the name
$nodename.


=item B<createElementNS>

  $element = $dom->createElementNS( $namespaceURI, $qname );

This function creates a new Element Node bound to the DOM with the name
$nodename and placed in the given namespace.


=item B<createTextNode>

  $text = $dom->createTextNode( $content_text );

As an equivalent of createElement, but it creates a Text Node bound to the DOM.


=item B<createComment>

  $comment = $dom->createComment( $comment_text );

As an equivalent of createElement, but it creates a Comment Node bound to the
DOM.


=item B<createAttribute>

  $attrnode = $doc->createAttribute($name [,$value]);

Creates a new Attribute node.


=item B<createAttributeNS>

  $attrnode = $doc->createAttributeNS( namespaceURI, $name [,$value] );

Creates an Attribute bound to a namespace.


=item B<createDocumentFragment>

  $fragment = $doc->createDocumentFragment();

This function creates a DocumentFragment.


=item B<createCDATASection>

  $cdata = $dom->create( $cdata_content );

Similar to createTextNode and createComment, this function creates a
CDataSection bound to the current DOM.


=item B<createProcessingInstruction>

  my $pi = $doc->createProcessingInstruction( $target, $data );

create a processing instruction node.

Since this method is quite long one may use its short form createPI().


=item B<createEntityReference>

  my $entref = $doc->createEntityReference($refname);

If a document has a DTD specified, one can create entity references by using
this function. If one wants to add a entity reference to the document, this
reference has to be created by this function.

An entity reference is unique to a document and cannot be passed to other
documents as other nodes can be passed.

NOTE: A text content containing something that looks like an entity reference,
will not be expanded to a real entity reference unless it is a predefined
entity

   my $string = "&foo;";
   $some_element->appendText( $string );
   print $some_element->textContent; # prints "&amp;foo;"


=item B<createInternalSubset>

  $dtd = $document->createInternalSubset( $rootnode, $public, $system);

This function creates and adds an internal subset to the given document.
Because the function automatically adds the DTD to the document there is no
need to add the created node explicitly to the document.

   my $document = XML::LibXML::Document->new();
   my $dtd      = $document->createInternalSubset( "foo", undef, "foo.dtd" );

will result in the following XML document:

  <?xml version="1.0"?>
   <!DOCTYPE foo SYSTEM "foo.dtd">                    

By setting the public parameter it is possible to set PUBLIC DTDs to a given
document. So

  my $document = XML::LibXML::Document->new();
  my $dtd      = $document->createInternalSubset( "foo", "-//FOO//DTD FOO 0.1//EN", undef );
  

will cause the following declaration to be created on the document:

  <?xml version="1.0"?>
  <!DOCTYPE foo PUBLIC "-//FOO//DTD FOO 0.1//EN">


=item B<createExternalSubset>

  $dtd = $document->createExternalSubset( $rootnode, $public, $system);

This function is similar to createInternalSubset() but this DTD is considered
to be external and is therefore not added to the document itself. Nevertheless
it can be used for validation purposes.


=item B<importNode>

  $document->importNode( $node );

If a node is not part of a document, it can be imported to another document. As
specified in DOM Level 2 Specification the Node will not be altered or removed
from its original document ($node->cloneNode(1) will get called implicitly).

NOTE: Don't try to use importNode() to import sub-trees that contain an entity
reference - even if the entity reference is the root node of the sub-tree. This
will cause serious problems to your program. This is a limitation of libxml2
and not of XML::LibXML itself.


=item B<adoptNode>

  $document->adoptNode( $node );

If a node is not part of a document, it can be imported to another document. As
specified in DOM Level 3 Specification the Node will not be altered but it will
removed from its original document.

After a document adopted a node, the node, its attributes and all its
descendants belong to the new document. Because the node does not belong to the
old document, it will be unlinked from its old location first.

NOTE: Don't try to adoptNode() to import sub-trees that contain entity
references - even if the entity reference is the root node of the sub-tree.
This will cause serious problems to your program. This is a limitation of
libxml2 and not of XML::LibXML itself.


=item B<externalSubset>

  my $dtd = $doc->externalSubset;

If a document has an external subset defined it will be returned by this
function.

NOTE Dtd nodes are no ordinary nodes in libxml2. The support for these nodes in
XML::LibXML is still limited. In particular one may not want use common node
function on doctype declaration nodes!


=item B<internalSubset>

  my $dtd = $doc->internalSubset;

If a document has an internal subset defined it will be returned by this
function.

NOTE Dtd nodes are no ordinary nodes in libxml2. The support for these nodes in
XML::LibXML is still limited. In particular one may not want use common node
function on doctype declaration nodes!


=item B<setExternalSubset>

  $doc->setExternalSubset($dtd);

EXPERIMENTAL!

This method sets a DTD node as an external subset of the given document.


=item B<setInternalSubset>

  $doc->setInternalSubset($dtd);

EXPERIMENTAL!

This method sets a DTD node as an internal subset of the given document.


=item B<removeExternalSubset>

  my $dtd = $doc->removeExternalSubset();

EXPERIMENTAL!

If a document has an external subset defined it can be removed from the
document by using this function. The removed dtd node will be returned.


=item B<removeInternalSubset>

  my $dtd = $doc->removeInternalSubset();

EXPERIMENTAL!

If a document has an internal subset defined it can be removed from the
document by using this function. The removed dtd node will be returned.


=item B<getElementsByTagName>

  my @nodelist = $doc->getElementsByTagName($tagname);

Implements the DOM Level 2 function

In SCALAR context this function returns a XML::LibXML::NodeList object.


=item B<getElementsByTagNameNS>

  my @nodelist = $doc->getElementsByTagNameNS($nsURI,$tagname);

Implements the DOM Level 2 function

In SCALAR context this function returns a XML::LibXML::NodeList object.


=item B<getElementsByLocalName>

  my @nodelist = $doc->getElementsByLocalName($localname);

This allows the fetching of all nodes from a given document with the given
Localname.

In SCALAR context this function returns a XML::LibXML::NodeList object.


=item B<getElementById>

  my $node = $doc->getElementById($id);

Returns the element that has an ID attribute with the given value. If no such
element exists, this returns undef.

Note: the ID of an element may change while manipulating the document. For
documents with a DTD, the information about ID attributes is only available if
DTD loading/validation has been requested. For HTML documents parsed with the
HTML parser ID detection is done automatically. In XML documents, all "xml:id"
attributes are considered to be of type ID. You can test ID-ness of an
attribute node with $attr->isId().

In versions 1.59 and earlier this method was called getElementsById() (plural)
by mistake. Starting from 1.60 this name is maintained as an alias only for
backward compatibility.


=item B<indexElements>

  $dom->indexElements();

This function causes libxml2 to stamp all elements in a document with their
document position index which considerably speeds up XPath queries for large
documents. It should only be used with static documents that won't be further
changed by any DOM methods, because once a document is indexed, XPath will
always prefer the index to other methods of determining the document order of
nodes. XPath could therefore return improperly ordered node-lists when applied
on a document that has been changed after being indexed. It is of course
possible to use this method to re-index a modified document before using it
with XPath again. This function is not a part of the DOM specification.

This function returns number of elements indexed, -1 if error occurred, or -2
if this feature is not available in the running libxml2.



=back

=head1 AUTHORS

Matt Sergeant, 
Christian Glahn, 
Petr Pajas, 

=head1 VERSION

1.64

=head1 COPYRIGHT

2001-2007, AxKit.com Ltd; 2002-2006 Christian Glahn; 2006-2007 Petr Pajas, All rights reserved.

=cut