The DOMNode class

(PHP 5, PHP 7)

类摘要

DOMNode {

/* 属性 */

public readonly string $nodeName ;

public string $nodeValue ;

public readonly int $nodeType ;

public readonly DOMNode $parentNode ;

public readonly DOMNodeList $childNodes ;

public readonly DOMNode $firstChild ;

public readonly DOMNode $lastChild ;

public readonly DOMNode $previousSibling ;

public readonly DOMNode $nextSibling ;

public readonly DOMNamedNodeMap $attributes ;

public readonly DOMDocument $ownerDocument ;

public readonly string $namespaceURI ;

public string $prefix ;

public readonly string $localName ;

public readonly string $baseURI ;

public string $textContent ;

/* 方法 */

public DOMNode appendChild ( DOMNode $newnode )

public string C14N ([ bool $exclusive [, bool $with_comments [, array $xpath [, array $ns_prefixes ]]]] )

public int C14NFile ( string $uri [, bool $exclusive [, bool $with_comments [, array $xpath [, array $ns_prefixes ]]]] )

public DOMNode cloneNode ([ bool $deep ] )

public int getLineNo ( void )

public string getNodePath ( void )

public bool hasAttributes ( void )

public bool hasChildNodes ( void )

public DOMNode insertBefore ( DOMNode $newnode [, DOMNode $refnode ] )

public bool isDefaultNamespace ( string $namespaceURI )

public bool isSameNode ( DOMNode $node )

public bool isSupported ( string $feature , string $version )

public string lookupNamespaceURI ( string $prefix )

public string lookupPrefix ( string $namespaceURI )

public void normalize ( void )

public DOMNode removeChild ( DOMNode $oldnode )

public DOMNode replaceChild ( DOMNode $newnode , DOMNode $oldnode )

}

属性

nodeName: Returns the most accurate name for the current node type
nodeValue: The value of this node, depending on its type. Contrary to the W3C specification, the node value of DOMElement nodes is equal to DOMNode::textContent instead of NULL.
nodeType: Gets the type of the node. One of the predefined XML_xxx_NODE constants
parentNode: The parent of this node. If there is no such node, this returns NULL.
childNodes: A DOMNodeList that contains all children of this node. If there are no children, this is an empty DOMNodeList.
firstChild: The first child of this node. If there is no such node, this returns NULL.
lastChild: The last child of this node. If there is no such node, this returns NULL.
previousSibling: The node immediately preceding this node. If there is no such node, this returns NULL.
nextSibling: The node immediately following this node. If there is no such node, this returns NULL.
attributes: A DOMNamedNodeMap containing the attributes of this node (if it is a DOMElement) or NULL otherwise.
ownerDocument: The DOMDocument object associated with this node.
namespaceURI: The namespace URI of this node, or NULL if it is unspecified.
prefix: The namespace prefix of this node, or NULL if it is unspecified.
localName: Returns the local part of the qualified name of this node.
baseURI: The absolute base URI of this node or NULL if the implementation wasn't able to obtain an absolute URI.
textContent: The text content of this node and its descendants.

注释

Note:
The DOM extension uses UTF-8 encoding. Use utf8_encode() and utf8_decode() to work with texts in ISO-8859-1 encoding or Iconv for other encodings.

更新日志

版本	说明
5.6.1	The textContent property has been made writable (formerly it has been readonly).

参见

» W3C specification of Node

DOMNode::appendChild — Adds new child at the end of the children
DOMNode::C14N — Canonicalize nodes to a string
DOMNode::C14NFile — Canonicalize nodes to a file
DOMNode::cloneNode — Clones a node
DOMNode::getLineNo — Get line number for a node
DOMNode::getNodePath — Get an XPath for a node
DOMNode::hasAttributes — Checks if node has attributes
DOMNode::hasChildNodes — Checks if node has children
DOMNode::insertBefore — Adds a new child before a reference node
DOMNode::isDefaultNamespace — Checks if the specified namespaceURI is the default namespace or not
DOMNode::isSameNode — Indicates if two nodes are the same node
DOMNode::isSupported — Checks if feature is supported for specified version
DOMNode::lookupNamespaceURI — Gets the namespace URI of the node based on the prefix
DOMNode::lookupPrefix — Gets the namespace prefix of the node based on the namespace URI
DOMNode::normalize — Normalizes the node
DOMNode::removeChild — Removes child from list of children
DOMNode::replaceChild — Replaces a child

User Contributed Notes

zlk1214 at gmail dot com 15-Jan-2016 10:07


A function that can set the inner HTML without encoding error. $html can be broken content such as "<a ID=id20>ssss"

function setInnerHTML($node, $html) {

    removeChildren($node);

    if (empty($html)) {

        return;

    }

   

    $doc = $node->ownerDocument;

    $htmlclip = new DOMDocument();

    $htmlclip->loadHTML('<meta http-equiv="Content-Type" content="text/html;CHARSET=gb2312"><div>' . $html . '</div>');

    $clipNode = $doc->importNode($htmlclip->documentElement->lastChild->firstChild, true);

    while ($item = $clipNode->firstChild) {

        $node->appendChild($item);

    }

}

metanull 25-Jul-2014 12:11


Yet another DOMNode to php array conversion function. 

Other ones on this page are generating too "complex" arrays; this one should keep the array as tidy as possible.

Note: make sure to set LIBXML_NOBLANKS when calling DOMDocument::load, loadXML or loadHTML

See: http://be2.php.net/manual/en/libxml.constants.php

See: http://be2.php.net/manual/en/domdocument.loadxml.php



<?php

         /**

         * Returns an array representation of a DOMNode

         * Note, make sure to use the LIBXML_NOBLANKS flag when loading XML into the DOMDocument

         * @param DOMDocument $dom

         * @param DOMNode $node

         * @return array

         */

        function nodeToArray( $dom, $node) {

            if(!is_a( $dom, 'DOMDocument' ) || !is_a( $node, 'DOMNode' )) {

                return false;

            }

            $array = false; 

            if( empty( trim( $node->localName ))) {// Discard empty nodes

                return false;

            }

            if( XML_TEXT_NODE == $node->nodeType ) {

                return $node->nodeValue;

            }

            foreach ($node->attributes as $attr) { 

                $array['@'.$attr->localName] = $attr->nodeValue; 

            } 

            foreach ($node->childNodes as $childNode) { 

                if ( 1 == $childNode->childNodes->length && XML_TEXT_NODE == $childNode->firstChild->nodeType ) { 

                    $array[$childNode->localName] = $childNode->nodeValue; 

                }  else {

                    if( false !== ($a = self::nodeToArray( $dom, $childNode))) {

                        $array[$childNode->localName] =     $a;

                    }

                }

            }

            return $array; 

        }

?>

pizarropablo at gmail dot com 16-Apr-2014 11:36


In response to: alastair dot dallas at gmail dot com about "#text" nodes.

"#text" nodes appear when there are spaces or new lines between end tag and next initial tag.



Eg "<data><age>10</age>[SPACES]<other>20</other>[SPACES]</data>"



"data" childNodes has 4 childs:

- age = 10

- #text = spaces

- other = 20

- #text =  spaces

matej dot golian at gmail dot com 29-Aug-2013 10:15


Here is a little function that truncates a DomNode to a specified number of text characters. I use it to generate HTML excerpts for my blog entries.



<?php



function makehtmlexcerpt(DomNode $html, $excerptlength)

{

$remove = 0;

$htmllength = strlen(html_entity_decode($html->textContent, ENT_QUOTES, 'UTF-8'));

$truncate = $htmllength - $excerptlength;

if($htmllength > $excerptlength)

{

if($html->hasChildNodes())

{

$children = $html->childNodes;

for($counter = 0; $counter < $children->length; $counter ++)

{

$child = $children->item($children->length - ($counter + 1));

$childlength = strlen(html_entity_decode($child->textContent, ENT_QUOTES, 'UTF-8'));

if($childlength <= $truncate)

{

$remove ++;

$truncate = $truncate - $childlength;

}

else

{

$child = makehtmlexcerpt($child, $childlength - $truncate);

break;

}

}

if($remove != 0)

{

for($counter = 0; $counter < $remove; $counter ++)

{

$html->removeChild($html->lastChild);

}

}

}

else

{

if($html->nodeName == '#text')

{

$html->nodeValue = substr(html_entity_decode($html->nodeValue, ENT_QUOTES, 'UTF-8'), 0, $htmllength - $truncate);

}

}

}

return $html;

}



?>

alastair dot dallas at gmail dot com 25-Sep-2011 05:44


The issues around mixed content took me some experimentation to remember, so I thought I'd add this note to save others time.



When your markup is something like: <div><p>First text.</p><ul><li><p>First bullet</p></li></ul></div>, you'll get XML_ELEMENT_NODEs that are quite regular. The <div> has children <p> and <ul> and the nodeValue for both <p>s yields the text you expect.



But when your markup is more like <p>This is <b>bold</b> and this is <i>italic</i>.</p>, you realize that the nodeValue for XML_ELEMENT_NODEs is not reliable. In this case, you need to look at the <p>'s child nodes. For this example, the <p> has children: #text, <b>, #text, <i>, #text. 



In this example, the nodeValue of <b> and <i> is the same as their #text children. But you could have markup like: <p>This <b>is bold and <i>bold italic</i></b>, you see?</p>. In this case, you need to look at the children of <b>, which will be #text, <i>, because the nodeValue of <b> will not be sufficient.



XML_TEXT_NODEs have no children and are always named '#text'. Depending on how whitespace is handled, your tree may have "empty" #text nodes as children of <body> and elsewhere.



Attributes are nodes, but I had forgotten that they are not in the tree expressed by childNodes. Walking the full tree using childNodes will not visit any attribute nodes.

imranomar at gmail dot com 20-Mar-2011 03:10


Just discovered that node->nodeValue strips out all the tags

I. Cook 19-Apr-2010 11:43


For a reference with more information about the XML DOM node types, see http://www.w3schools.com/dom/dom_nodetype.asp



(When using PHP DOMNode, these constants need to be prefaced with "XML_")

R. Studer 13-Jan-2010 06:03


For clarification:

The assumingly 'discoverd' by previous posters and seemingly undocumented methods (.getElementsByTagName and .getAttribute) on this class (DOMNode) are in fact methods of the class DOMElement, which inherits from DOMNode.



See: http://www.php.net/manual/en/class.domelement.php

David Rekowski 08-Jan-2010 10:54


You cannot simply overwrite $textContent, to replace the text content of a DOMNode, as the missing readonly flag suggests. Instead you have to do something like this:



<?php



$node->removeChild($node->firstChild);

$node->appendChild(new DOMText('new text content'));



?>



This example shows what happens:



<?php



$doc = DOMDocument::loadXML('<node>old content</node>');

$node = $doc->getElementsByTagName('node')->item(0);

echo "Content 1: ".$node->textContent."\n";



$node->textContent = 'new content';

echo "Content 2: ".$node->textContent."\n";



$newText = new DOMText('new content');



$node->appendChild($newText);

echo "Content 3: ".$node->textContent."\n";



$node->removeChild($node->firstChild);

$node->appendChild($newText);

echo "Content 4: ".$node->textContent."\n";



?>



The output is:



Content 1: old content // starting content

Content 2: old content // trying to replace overwriting $node->textContent

Content 3: old contentnew content // simply appending the new text node

Content 4: new content // removing firstchild before appending the new text node



If you want to have a CDATA section, use this:



<?php

$doc = DOMDocument::loadXML('<node>old content</node>');

$node = $doc->getElementsByTagName('node')->item(0);

$node->removeChild($node->firstChild);

$newText = $doc->createCDATASection('new cdata content');

$node->appendChild($newText);

echo "Content withCDATA: ".$doc->saveXML($node)."\n";

?>

Steve K 03-Nov-2009 08:47


This class apparently also has a getElementsByTagName method.



I was able to confirm this by evaluating the output from DOMNodeList->item() against various tests with the is_a() function.

marc at ermshaus dot org 05-May-2009 05:36


It took me forever to find a mapping for the XML_*_NODE constants. So I thought, it'd be handy to paste it here:



 1 XML_ELEMENT_NODE

 2 XML_ATTRIBUTE_NODE

 3 XML_TEXT_NODE

 4 XML_CDATA_SECTION_NODE

 5 XML_ENTITY_REFERENCE_NODE

 6 XML_ENTITY_NODE

 7 XML_PROCESSING_INSTRUCTION_NODE

 8 XML_COMMENT_NODE

 9 XML_DOCUMENT_NODE

10 XML_DOCUMENT_TYPE_NODE

11 XML_DOCUMENT_FRAGMENT_NODE

12 XML_NOTATION_NODE

matt at lamplightdb dot co dot uk 06-Apr-2009 02:39


And apparently also a setAttribute method too:



$node->setAttribute( 'attrName' , 'value' );

brian wildwoodassociates.info 08-Dec-2008 08:27


This class has a getAttribute method.



Assume that a DOMNode object $ref contained an anchor taken out of a DOMNode List.  Then 



    $url = $ref->getAttribute('href'); 



would isolate the url associated with the href part of the anchor.

The DOMNode class

类摘要

属性

注释

更新日志

参见

Table of Contents

User Contributed Notes