A while ago I had an interview with Kaweb (I didn’t get the role, by the way). They asked me whether I had done XML processing before, and I said I had done XML and HTML processing with DOMDocument. They also asked whether I had used XPath. I said no, but that I had heard of it, and I remember saying it’s like Unix directory paths.
Anyhow, I went ahead and wrote the script in both PHP and Python. I only used XPath with PHP, not with Python.
PHP (with XPath)
<?php
// Load the RSS feed straight from the URL.
$dom = new DOMDocument();
$dom->load('http://cj-jackson.com/feed/');
$xpath = new DOMXPath($dom);
// Select only the first five items in the channel.
$nodes = $xpath->query("//channel/item[position() <= 5]");
foreach ($nodes as $node) {
    echo $node->getElementsByTagName('title')->item(0)->nodeValue . '<br />';
}
Python (no XPath)
#!/usr/bin/python2.7
from urllib2 import build_opener
from xml.etree.cElementTree import parse as xmlparse
opener = build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0'), ('Accept', '*/*')] # To get round anti-spam system.
source = opener.open('http://cj-jackson.com/feed/')
feed = xmlparse(source).getroot()
# Slice the list to keep only the first five items.
for element in feed.findall('channel/item')[:5]:
    print element.findtext('title')
Output
Happy New Year!
RockForums.Co Revisited
Screw dual-boot, Synergy is Awesome!
My Motorbike Got Stolen
No more post for awhile
Conclusion
Python and ElementTree are so elegant that I pretty much didn’t need XPath at all. As for PHP’s DOMDocument, at least it supports HTML processing as well. With Python I had to use html5lib for HTML processing, and the only problem I have with html5lib is that it doesn’t come with Python by default, unlike ElementTree and cElementTree.
The difference between ElementTree and cElementTree is that the former is written in Python and the latter is written in C, as the name implies, which is why it’s also the faster of the two. Nothing is faster than C except the speed of light.
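If you want a script that prefers cElementTree but still runs where it is missing, a common pattern is an import fallback. This is only a minimal sketch: the tiny inline XML string is made up for illustration, and the fallback is needed because cElementTree was removed in Python 3.9 (since Python 3.3 the plain ElementTree module picks up the C accelerator by itself when it is available).

```python
try:
    # Python 2 and older Python 3 releases ship the C-accelerated module.
    import xml.etree.cElementTree as ET
except ImportError:
    # Python 3.9+ removed cElementTree; ElementTree uses the C code itself.
    import xml.etree.ElementTree as ET

# Made-up document, just to show that the API is the same either way.
root = ET.fromstring('<channel><item><title>Hello</title></item></channel>')
print(root.findtext('item/title'))
```

Either import gives the same API, so the rest of the script does not need to care which one it got.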
Update:
ElementTree only supports a small subset of XPath (the path passed to findall in the Python script is part of it). If you want full XPath in Python, use lxml instead, which does not come with Python by default.
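To show what ElementTree’s path syntax can and can’t do, here is a minimal sketch, assuming Python 3 and a made-up inline feed rather than the real one. The names sample_feed and first_titles are hypothetical. ElementTree understands paths like ./channel/item and index predicates like item[2], but not XPath functions such as position() <= 5, so a list slice does the limiting instead.

```python
import xml.etree.ElementTree as ET

# Made-up feed for illustration; the real feed is not reproduced here.
sample_feed = """<rss version="2.0">
  <channel>
    <title>Example</title>
    <item><title>First post</title></item>
    <item><title>Second post</title></item>
    <item><title>Third post</title></item>
  </channel>
</rss>"""


def first_titles(xml_text, limit):
    """Return the titles of the first `limit` items in an RSS document."""
    root = ET.fromstring(xml_text)
    # './channel/item' is within ElementTree's XPath subset; the slice
    # stands in for the position() <= N predicate used in the PHP version.
    items = root.findall('./channel/item')[:limit]
    return [item.findtext('title') for item in items]


print(first_titles(sample_feed, 2))
# Index predicates are supported too: item[2] selects the second item.
print(ET.fromstring(sample_feed).findtext('./channel/item[2]/title'))
```

Asking for more items than the feed contains is harmless, because the slice simply returns what is there.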