XML Parsing with PHP & Python

A while ago I attended an interview with Kaweb (I didn’t get the role, by the way). They asked me if I had done XML processing before, and I said I had done XML and HTML processing with DOMDocument. They also asked me if I had used XPath, which I said no to, although I had heard of it; I remember saying it’s like Unix directory structures.

Anyhow, I just went ahead and wrote the script in PHP & Python.  I didn’t use XPath with Python, only PHP.

PHP (with XPath)

<?php

$dom = new DOMDocument();
$dom->load('http://cj-jackson.com/feed/');

$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//channel/item[position() <=5]");

foreach($nodes as $node) {
	echo $node->getElementsByTagName('title')->item(0)->nodeValue . '<br />';
}

Python (no XPath)

#!/usr/bin/python2.7
from urllib2 import build_opener
from xml.etree.cElementTree import parse as xmlparse

opener = build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0'), ('Accept', '*/*')] # To get round anti-spam system.
source = opener.open('http://cj-jackson.com/feed/')

feed = xmlparse(source).getroot()

for element in feed.findall('channel/item')[:5]:
	print element.findtext('title')

Output

Happy New Year!
RockForums.Co Revisited
Screw dual-boot, Synergy is Awesome!
My Motorbike Got Stolen
No more post for awhile

Conclusion

Python and ElementTree are so elegant; it’s pretty much written in a way that I don’t need to use XPath. As for PHP’s DOMDocument, at least it supports HTML processing as well; with Python I had to use html5lib for HTML processing. The only problem I have with html5lib is that it doesn’t come with Python by default, unlike ElementTree and cElementTree.

The difference between ElementTree and cElementTree: the former is written in Python, the latter is written in C as the name implies, and for that reason it’s also the faster of the two. Nothing is faster than C except the speed of light.

Update:

ElementTree only supports a limited subset of XPath (simple paths and predicates, but not expressions such as position() <= 5). If you want full XPath in Python, use lxml instead; it does not come with Python by default.
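
For anyone reading this on Python 3, here is a minimal sketch of the same "first five titles" idea using only the standard library. The inline SAMPLE feed and the first_titles helper are made up for illustration (the real script fetches the feed over HTTP); findall takes a simple path expression, and the five-item limit is applied with a slice rather than an XPath function.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the downloaded RSS feed.
SAMPLE = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>First post</title></item>
    <item><title>Second post</title></item>
    <item><title>Third post</title></item>
  </channel>
</rss>"""


def first_titles(xml_text, limit=5):
    """Return the titles of the first `limit` items in an RSS document."""
    root = ET.fromstring(xml_text)
    # './channel/item' is a simple path, which ElementTree understands.
    items = root.findall("./channel/item")[:limit]
    return [item.findtext("title") for item in items]


if __name__ == "__main__":
    for title in first_titles(SAMPLE):
        print(title)
```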

RockForums.Co Revisited

I have spent most of my web development years using PHP. One of the limitations is having to conform to the shared server’s php.ini file, such as the file upload limit, which is 64 MB and cannot be changed by myself (http://support.hostgator.com/articles/cpanel/php-settings-that-cannot-be-changed); only the server admin can do that (and is very unlikely to change the setting even if I asked nicely). One way of avoiding php.ini is to use a language other than PHP. I chose to go with what the Scotsman (in one of the job interviews I had) suggested to me: Python, which needs virtually zero configuration at the core because of its general-purpose nature. A framework like Django has its own set of configurations, which I have full control of by modifying settings.py in the project folder.

The design methodology is based on Object Role Modelling (ORM), not to be confused with Object-Relational Mapping, which happens to share the same initials.  I could have used an Entity Relationship Diagram (ERD) in Visio, but then I would find myself fighting with dialogue boxes, which I find very tedious. With ORM I never get into that situation; I simply drag and drop the symbols and type in the labels.

Here is the ORM design. Those dashed boxes are not part of the ORM standard, but at least they follow the Django structure quite well.

The prototype is up and running at http://rockforums.co; I still have to fully implement the Message app and the password reset feature.

I find Python to be faster than PHP. PHP tends to load all of its modules, whereas with Python I have to explicitly import the modules I need. Python also caches its compiled bytecode (.pyc files), while PHP without an opcode cache recompiles every script on each request. Plus, someone in London pointed out to me that most shared servers have PHP poorly configured: PHP can do HTML caching, but many of them choose not to set it up.  I believe Django also has HTML caching (https://docs.djangoproject.com/en/dev/topics/cache/).  I also find Python a lot tidier than PHP, mainly because it’s not so dependent on the ‘$’ sigil, or dollar as it’s known in currency.

I had fun with Python and Django; I plan on getting the rest done in January 2012.

Screw dual-boot, Synergy is Awesome!

One day I was sitting in front of both of my computers and noticed that both had Windows 7 installed. I thought having two computers with the same OS was a bit pointless, so I decided to install Ubuntu (GNU/Linux) on my Dell laptop. Before I did that I swapped the hard drive, so I can always switch back to Windows with no problem.

I also have the Synergy server installed on Windows and the client on Ubuntu. The server shares its keyboard and mouse with the client; in other words, I can control both computers with the same keyboard and mouse.  It also shares the clipboard (copy & paste).

Plus I can use the web browser and Secure Shell (SSH) (for viewing the Apache web log) without having to minimize the IDE.

Without doubt, this solution is more expensive than dual-boot, yet it’s more productive, as I don’t have to keep rebooting just to use a different OS or minimize a virtual machine.

My Motorbike Got Stolen

Sinnis Vista 125cc (GX07 KKT)

DELETED!

It’s 2012 now and I’d rather put this behind me; it’s so 2011. Comments are closed, as I don’t want to talk about it.  It was my fault: I left the bike on the front drive. I don’t keep my new bike on the front drive.  I have also learned that complacency can have a nasty impact on finances.

RockForums.Co 0.95 – Bye Bye Symfony

In the last post I said I had the confidence to rewrite RockForums.Co from scratch, without the use of Symfony or any other PHP framework. In that post I also showed you the source code of the URL Routing System. Well, I used that source code in RockForums.Co and it works like a charm, although I did make a few modifications to get it to work on the production server.

My development server, which happens to run on Windows 7 and IIS, actually let me off with something like “include '/../view/example.php';”, but that didn’t work on the production server, which runs on Linux and Apache. I had to prefix the path with “dirname(__FILE__)” to get it to work on the production server while still working on the development server.

Nevertheless, the code on the development server is exactly the same as on the production server, because I wrote a simple if statement which determines the difference between development and production and therefore loads the correct database configuration with no problem.  I couldn’t do that with the Symfony framework; I had to correct the configuration manually for production and, like any other human, I am prone to making mistakes: I could accidentally overwrite the configuration with the wrong one. Because I have rewritten the script, that won’t happen.
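
The actual if statement is PHP and is not shown in this post, so the following is only a rough Python sketch of the idea. Everything in it is an assumption for illustration: the host names, the CONFIGS values, and the choice of the machine's host name as the thing to test.

```python
import socket

# Hypothetical settings; the real values are not part of the post.
CONFIGS = {
    "development": {"db_host": "localhost", "db_name": "forum_dev"},
    "production": {"db_host": "db.example.com", "db_name": "forum"},
}

# Assumed names of the machines used for development.
DEV_HOSTS = ("localhost", "devbox")


def detect_environment(hostname):
    """Classify a host name as 'development' or 'production'."""
    return "development" if hostname.lower() in DEV_HOSTS else "production"


def load_db_config(hostname=None):
    """Pick the database settings for the machine the code is running on."""
    if hostname is None:
        hostname = socket.gethostname()
    return CONFIGS[detect_environment(hostname)]
```

Because the decision is made at runtime, the same files can be copied to both servers without editing the configuration by hand.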

Speaking of frameworks, the best PHP framework there is, is PHP itself: not Symfony, not CakePHP and certainly not Zend Framework, but PHP itself from PHP.net, because it is simple and does not add too much complexity. Plus PHP follows the reflection pattern quite well; the URL Routing System is an example of the reflection pattern.   I find the Symfony function link_to() ridiculous, because all it does is generate a hyperlink, which is very easy to write in HTML. (<a class="link" href="http://example.com">Example</a>)

I have written an auto-upgrade script. What that script does is update the tables automatically, so I don’t need to modify the tables manually while deploying to production.  HTML5 AV Manager for WordPress also has that script.
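
The auto-upgrade script itself is PHP and is not reproduced here. As a rough sketch of how such a script can work, here is a Python version using SQLite. The MIGRATIONS list, the posts table, and the use of SQLite's user_version counter are all assumptions for illustration, not what RockForums.Co actually does.

```python
import sqlite3

# Hypothetical schema changes, applied in order; entry N brings the
# database from version N to version N + 1.
MIGRATIONS = [
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL)",
    "ALTER TABLE posts ADD COLUMN body TEXT NOT NULL DEFAULT ''",
]


def upgrade(conn):
    """Apply any migrations the database has not seen yet.

    The current schema version is kept in SQLite's built-in
    user_version counter, so running this twice is harmless.
    """
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(statement)
        # PRAGMA values cannot be bound as parameters; version is an int.
        conn.execute("PRAGMA user_version = %d" % version)
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print("schema version:", upgrade(connection))
```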

I have also written an auto-embed library called oEmbedder. As the name implies, it uses oEmbed, and it includes support for Embedly. I open-sourced it and released it on Google Code under the MIT License; it is available from http://code.google.com/p/oembedder/ .  I am aware of PHP-oEmbed and oEmbed-PHP on Google Code: one was a bit bloated, and the other was simpler but not flexible enough for my taste. Both of them had error checking, which I find kind of pointless, because json_decode returns null and the SimpleXML loaders return false on failure, and that is all the information I need to know. So basically it either works or it doesn’t, just like HDMI cables, so please don’t buy the expensive ones; it’s just a waste of money.
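
To show what "it either works or it doesn’t" looks like in practice, here is a small Python sketch. It is not oEmbedder's code: parse_oembed is a hypothetical helper that takes the body of an oEmbed JSON response and returns either the decoded data or None, with no finer-grained error reporting.

```python
import json


def parse_oembed(body):
    """Decode an oEmbed JSON response body, or return None on any failure."""
    try:
        data = json.loads(body)
    except (ValueError, TypeError):
        # Malformed JSON, or no body at all.
        return None
    # The oEmbed spec requires a "type" field (photo, video, link or rich).
    if not isinstance(data, dict) or "type" not in data:
        return None
    return data


if __name__ == "__main__":
    good = '{"type": "video", "version": "1.0", "html": "<iframe></iframe>"}'
    for body in (good, "not json"):
        result = parse_oembed(body)
        print(result["html"] if result else "nothing to embed")
```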

The forum is at http://rockforums.co , enjoy.

Simple & Effective URL Routing System.

I have learned how to build a Model-View-Controller and an Object-Relational Mapping; now I have learned how to build a simple and effective URL Routing System.  I should now have the confidence to rewrite RockForums.Co without the use of Symfony or any other PHP framework.  Besides, PHP is an excellent framework in itself.

Here is the PHP code for the Simple & Effective URL Routing System.

class route {

    static private $ROUTES;

    static public function init() {
        $path = false;
        if (isset($_SERVER['PATH_INFO'])) {
            $path = $_SERVER['PATH_INFO'];
        } else {
            $path = $_SERVER['REQUEST_URI'];
            $self = $_SERVER['PHP_SELF'];
            $self = dirname($self);
            $self = str_replace('\\', '/', $self); // for Windows Compatibility.
            $self = strlen($self);
            $path = substr($path, $self);
            if ($path != '') {
                $thehash = strpos($path, '#');
                if ($thehash !== false) { // strpos may return 0, which is a valid match.
                    $path = substr($path, 0, $thehash);
                }
                $question = strpos($path, '?');
                if ($question !== false) {
                    $path = substr($path, 0, $question);
                }
            } else {
                $path = false;
            }
        }

        if (!$path) {
            $routes = array('index', 'index');
        } else {
            $path = trim($path);
            $path = trim($path, "/");
            $routes = strtolower($path);
            $routes = explode('/', $routes);
        }

        // Useful for paginating an index. ;)
        if (is_numeric($routes[0])) {
            $tempRoutes = array('index', 'index');
            foreach ($routes as $route) {
                $tempRoutes[] = $route;
            }
            unset($route);
            $routes = $tempRoutes;
            unset($tempRoutes);
        }

        if (class_exists($routes[0] . '_action')) {
            if (!isset($routes[1])) {
                $routes[1] = 'index';
            }
            if (method_exists($routes[0] . '_action', $routes[1])) {
                self::$ROUTES = $routes;
            } else {
                $altRoutes = array($routes[0], 'index');
                $count = 1;
                while (isset($routes[$count])) {
                    $altRoutes[] = $routes[$count];
                    $count++;
                }
                self::$ROUTES = $altRoutes;
            }
            call_user_func(array(self::$ROUTES[0] . '_action',
                self::$ROUTES[1]));
            return;
        } elseif (class_exists('index_action')) {
            if (method_exists('index_action', $routes[0])) {
                $altRoutes = array('index', $routes[0]);
                $count = 1;
            } else {
                $altRoutes = array('index', 'index');
                $count = 0;
            }
            while (isset($routes[$count])) {
                $altRoutes[] = $routes[$count];
                $count++;
            }
            self::$ROUTES = $altRoutes;
            call_user_func(array(self::$ROUTES[0] . '_action',
                self::$ROUTES[1]));
            return;
        }

        page::show404();
    }

    static public function getRoutes() {
        return self::$ROUTES;
    }

}

class index_action {

    static public function index() {
        echo 'Index, Index';
        $route = route::getRoutes();
        if (isset($route[2])) {
            echo ', ' . $route[2];
        }
    }

    static public function testing() {
        echo 'Index, Testing';
        $route = route::getRoutes();
        if (isset($route[2])) {
            echo ', ' . $route[2];
        }
    }

}

class test_action {

    static public function index() {
        echo 'Test, Index';
        $route = route::getRoutes();
        if (isset($route[2])) {
            echo ', ' . $route[2];
        }
    }

    static public function test() {
        echo 'Test, Test';
        $route = route::getRoutes();
        if (isset($route[2])) {
            echo ', ' . $route[2];
        }
    }

}

route::init();

The class name is in index 0, the method name is in index 1, and 2 and above are the parameters. The benefit of writing a URL Routing System in PHP rather than purely in .htaccess (Apache) or web.config (IIS7) is cross-compatibility with Apache and IIS, and probably with some other web servers.
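
For readers who prefer Python, here is a rough port of the lookup rules used by route::init(). It is a sketch, not a replacement: the resolve and has_action names and the CONTROLLERS dictionary are made up for this example, the leading-number check uses isdigit() where the PHP uses is_numeric(), and it returns the decision instead of calling the method.

```python
class IndexAction:
    @staticmethod
    def index(*params):
        return "Index, Index"

    @staticmethod
    def testing(*params):
        return "Index, Testing"


class TestAction:
    @staticmethod
    def index(*params):
        return "Test, Index"

    @staticmethod
    def test(*params):
        return "Test, Test"


# Stands in for PHP's class_exists($name . '_action').
CONTROLLERS = {"index": IndexAction, "test": TestAction}


def has_action(controller, name):
    """True if `name` is a public, callable attribute of the controller."""
    return not name.startswith("_") and callable(getattr(controller, name, None))


def resolve(path, controllers=CONTROLLERS):
    """Map a URL path to (class name, method name, parameters), or None for a 404."""
    segments = [s for s in path.strip().strip("/").lower().split("/") if s]
    if not segments or segments[0].isdigit():
        # Empty path, or a leading page number: route to index/index.
        segments = ["index", "index"] + segments
    head, rest = segments[0], segments[1:]
    if head in controllers:
        method = rest[0] if rest else "index"
        if has_action(controllers[head], method):
            return head, method, rest[1:]
        # Unknown method: fall back to index and keep it as a parameter.
        return head, "index", rest
    if "index" in controllers:
        if has_action(controllers["index"], head):
            return "index", head, rest
        return "index", "index", segments
    return None


if __name__ == "__main__":
    name, method, params = resolve("/test/test/42")
    print(getattr(CONTROLLERS[name], method)(*params), params)
```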

Code for Apache .htaccess

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} -f [NC,OR]
RewriteCond %{REQUEST_FILENAME} -d [NC]
RewriteRule .* - [L]
RewriteRule ^(.*)$ index.php/$1 [QSA,L]
</IfModule>

Code for IIS7 web.config

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <system.webServer>
        <rewrite>
            <rules>
                <rule name="Imported Rule 3" stopProcessing="true">
                    <match url=".*" ignoreCase="false" />
                    <conditions logicalGrouping="MatchAny">
                        <add input="{REQUEST_FILENAME}" matchType="IsFile" />
                        <add input="{REQUEST_FILENAME}" matchType="IsDirectory" />
                    </conditions>
                    <action type="None" />
                </rule>
                <rule name="Imported Rule 4" stopProcessing="true">
                    <match url="^(.*)$" ignoreCase="false" />
                    <action type="Rewrite" url="index.php/{R:1}" appendQueryString="true" />
                </rule>
            </rules>
        </rewrite>
    </system.webServer>
</configuration>

The .htaccess and web.config files only act as pointers to index.php: requests for existing files and directories are served as they are, and everything else is rewritten to index.php.

One more thing: if you’re planning on using URL slugs, always prefix them with a dash (-) to avoid problems.

Let me know what you think.

Update: I have added rtrim and other improvements to the script; e.g. if a class is not detected, it will try to detect it as a method in the index class, and if that fails it will use the index method.

Update 2: It now uses trim rather than rtrim, which works better.

Update 3: Found out that $_SERVER['PATH_INFO'] does not seem to work with mod_rewrite on some servers, but can be emulated in combination with $_SERVER['REQUEST_URI'] and $_SERVER['PHP_SELF'].
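
As a rough illustration of that emulation, here is the same idea sketched in Python. The emulate_path_info function and its arguments are hypothetical stand-ins: request_uri plays the role of $_SERVER['REQUEST_URI'] and script_name plays the role of $_SERVER['PHP_SELF'], and the sketch assumes PHP_SELF is the plain script path, e.g. /app/index.php.

```python
import posixpath


def emulate_path_info(request_uri, script_name):
    """Derive the routed path from the request URI and the script location."""
    # Directory the front controller lives in; backslashes are normalised
    # for Windows, as in the PHP version.
    base = posixpath.dirname(script_name.replace("\\", "/"))
    # Drop the base directory from the front of the URI.
    path = request_uri[len(base):]
    # Drop the fragment and the query string, if any.
    for marker in ("#", "?"):
        path = path.split(marker, 1)[0]
    return path.strip("/")


if __name__ == "__main__":
    print(emulate_path_info("/app/forum/42?page=2", "/app/index.php"))
```

An empty result means there is nothing to route, which is the index/index case in the router.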