XML Parsing with PHP & Python

A while ago I had an interview with Kaweb (I didn’t get the role, by the way). They asked me whether I had done XML processing before, and I said I had done XML and HTML processing with DOMDocument. They also asked whether I had used XPath. I said no, but that I had heard of it, and I remember saying it’s like Unix directory paths.

Anyhow, I went ahead and wrote the script in both PHP and Python. I only used XPath in the PHP version, not the Python one.

PHP (with XPath)

<?php

$dom = new DOMDocument();
$dom->load('http://cj-jackson.com/feed/');

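$xpath = new DOMXPath($dom);
// Select the first five <item> elements under <channel>.
$nodes = $xpath->query('//channel/item[position() <= 5]');

foreach ($nodes as $node) {
	// Each item has a single <title>; print its text followed by a line break.
	echo $node->getElementsByTagName('title')->item(0)->nodeValue . '<br />';
}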

Python (no XPath)

#!/usr/bin/python2.7
from urllib2 import build_opener
from xml.etree.cElementTree import parse as xmlparse

opener = build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0'), ('Accept', '*/*')] # To get round anti-spam system.
source = opener.open('http://cj-jackson.com/feed/')

feed = xmlparse(source).getroot()

for element in feed.findall('channel/item')[:5]:
	print element.findtext('title')

Output

Happy New Year!
RockForums.Co Revisited
Screw dual-boot, Synergy is Awesome!
My Motorbike Got Stolen
No more post for awhile

Conclusion

Python and ElementTree are so elegant; the API is pretty much written in a way that I don’t need XPath. As for PHP’s DOMDocument, at least it supports HTML processing as well. With Python I had to use html5lib for HTML processing, and the only problem I have with html5lib is that it doesn’t come with Python by default, unlike ElementTree and cElementTree.

The difference between ElementTree and cElementTree is that the former is written in Python and the latter, as the name implies, is written in C. For that reason it’s also the faster of the two; nothing is faster than C except the speed of light.
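Since the two modules share the same API, a common way to prefer the C version is an import fallback. This is only a minimal sketch: the alias ET and the tiny inline document are my own choices for illustration. Newer Python 3 releases no longer ship cElementTree, because the plain ElementTree module picks up the C accelerator by itself, so the fallback branch is what runs there.

```python
try:
    # Python 2 (and early Python 3): the C implementation is a separate module.
    import xml.etree.cElementTree as ET
except ImportError:
    # Newer Python 3: cElementTree is gone; ElementTree uses the C
    # accelerator automatically when it is available.
    import xml.etree.ElementTree as ET

# Either way, the rest of the code only talks to the ET alias.
doc = ET.fromstring("<feed><title>Hello</title></feed>")
print(doc.findtext("title"))
```

The rest of the script does not need to know which implementation it got.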

Update:

ElementTree only supports a limited subset of XPath (simple paths and predicates, but not functions like position()). If you want full XPath in Python, use lxml instead; it does not come with Python by default.
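For what it’s worth, ElementTree’s findall() does accept a small XPath-like path syntax, which is enough for a feed like this one. Below is a rough sketch in Python 3, run against an inline sample feed instead of the live one. SAMPLE_FEED and first_titles are names I made up for the example, and the "Post N" titles are placeholder data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for the live RSS feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Sample blog</title>
    <item><title>Post 1</title></item>
    <item><title>Post 2</title></item>
    <item><title>Post 3</title></item>
    <item><title>Post 4</title></item>
    <item><title>Post 5</title></item>
    <item><title>Post 6</title></item>
  </channel>
</rss>"""


def first_titles(xml_text, limit=5):
    """Return the titles of the first `limit` items in an RSS document."""
    root = ET.fromstring(xml_text)
    # "./channel/item" is within ElementTree's limited path syntax.
    # A predicate like position() <= 5 is not, so slice the list instead.
    items = root.findall("./channel/item")[:limit]
    return [item.findtext("title") for item in items]


if __name__ == "__main__":
    for title in first_titles(SAMPLE_FEED):
        print(title)
```

The slice does the job that position() <= 5 does in the PHP version, which is why I never missed XPath here.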

RockForums.Co Revisited

I have spent most of my web development years using PHP. One of its limitations is having to conform to the shared server’s php.ini file, for example the file upload limit, which is 64 MB and cannot be changed by me (http://support.hostgator.com/articles/cpanel/php-settings-that-cannot-be-changed). Only the server admin can do that, and they are very unlikely to change the setting even if I asked nicely. One way of avoiding php.ini is to use a language other than PHP, so I chose to go with what the Scotsman (in one of the job interviews I had) suggested to me: Python. Because of its general-purpose nature it needs virtually zero configuration at the core, and a framework like Django has its own set of configurations, which I have full control of by modifying settings.py in the project folder.

The design methodology is based on Object Role Modelling (ORM), not to be confused with Object-Relational Mapping, which happens to share the same initials. I could have used an Entity Relationship Diagram (ERD) in Visio, but then I would find myself fighting with dialogue boxes, which I find very tedious. With ORM I never get into that situation; I simply drag and drop the symbols and type in the labels.

Here is the ORM design. The dashed boxes are not part of the ORM standard, but at least they follow the Django structure quite well.

The prototype is up and running at http://rockforums.co. I still have to fully implement the Message app and the password reset feature.

I find Python to be faster than PHP. PHP tends to load all of its enabled extensions up front, whereas with Python I have to explicitly import the modules I need. Python also caches its compiled bytecode in .pyc files, while PHP recompiles each script on every request unless an opcode cache is installed. On top of that, someone in London pointed out to me that most shared servers have PHP poorly configured: PHP can do HTML caching, but many hosts choose not to set it up. I believe Django also has HTML caching (https://docs.djangoproject.com/en/dev/topics/cache/). I also find Python a lot tidier than PHP, mainly because it’s not so dependent on the ‘$’ prefix, or dollar as it’s known in currency.

I had fun with Python and Django, and I plan on getting the rest done next January (2012).