XML Parsing in PHP, XPATH way – The best I know so far

If you are a PHP developer, you have almost certainly had to parse XML at some stage. Over the years I have implemented XML parsing in at least 3-4 different ways. I have finally settled on this approach, which I personally find far better than the rest, not only because it’s quite simple but also because it’s extensible. By extensible I mean that you don’t have to touch your parsing code if the XML structure changes or if you need to parse a new node later in the project. In this blog post we will parse my Twitter timeline the XPath way. To start with, let’s see what my Twitter timeline looks like:

XML Source:
http://twitter.com/statuses/user_timeline/imoracle.xml
You may want to open this XML structure in a separate window of your browser for reference, as we walk through various XML parsing techniques.

Data Requirement:
Before we parse this Twitter timeline, let’s decide what data we want to extract from the XML. Each <status></status> node consists of two parts: information about the tweet and information about the user.

Let’s settle on the following tweet nodes and their corresponding XPaths:

  1. id: ../statuses/status/id
  2. text: ../statuses/status/text
  3. source: ../statuses/status/source

Next, let’s settle on the nodes we want from the user details:

  1. id: ../statuses/status/user/id
  2. name: ../statuses/status/user/name
  3. screen_name: ../statuses/status/user/screen_name
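
A note on why these paths start with "..": SimpleXML evaluates an XPath expression relative to the element it is called on, which here is the root <statuses> element, so ".." first steps up to the document node. A minimal sketch of that behaviour, assuming a trimmed, hand-written sample of the timeline (the $sample string is an illustration, not the real feed):

```php
<?php
// Hand-written sample for illustration only; the real feed has many more nodes.
$sample = <<<XML
<statuses type="array">
  <status>
    <id>2499838341</id>
    <text>13 Beautiful WordPress Showcase Sites</text>
    <user>
      <id>14574588</id>
      <screen_name>imoracle</screen_name>
    </user>
  </status>
</statuses>
XML;

$xml = simplexml_load_string($sample);

// Relative form used in this post: ".." climbs from <statuses> to the document node.
$relative = $xml->xpath('../statuses/status/id');

// Equivalent absolute form, anchored at the document root.
$absolute = $xml->xpath('/statuses/status/id');

$relativeId = (string) $relative[0];
$absoluteId = (string) $absolute[0];

echo $relativeId, "\n";
echo $absoluteId, "\n";
```

Both forms should select the same node, so either style can be used in the XPath list as long as it is used consistently.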

XML Parsing:
Let us create a file called xpath.php, which will contain the XPaths of the nodes we finalized above. The xpath.php file will look like this:

xpath.php

<?php

  $user_status = array(
                      'status_id' => '../statuses/status/id',
                      'status_text' => '../statuses/status/text',
                      'status_source' => '../statuses/status/source',
                      'user_id' => '../statuses/status/user/id',
                      'user_name' => '../statuses/status/user/name',
                      'user_screen_name' => '../statuses/status/user/screen_name'
                      );

?>

parser.php

<?php

  // include the xpath file
  require_once("xpath.php");

  // read the xml source as string
  $str = file_get_contents("imoracle.xml");

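  // load the string as xml object; stop early if the source is missing or malformed
  $xml = simplexml_load_string($str);
  if($xml === false) {
    die("Could not parse imoracle.xml");
  }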

  // initialize the return array
  $result = array();

  // parse the xml nodes
  foreach($user_status as $key => $xpath) {
    $values = $xml->xpath($xpath);
    foreach($values as $value) {
      $result[$key][] = (string)$value;
    }
  }

  // print the return array
  print_r($result);

?>

Results:
If we print $result in a browser, the output looks like this:

Array
(
    [status_id] => Array
        (
            [0] => 2499838341
            [1] => 2499780899
            [2] => 2499724163
            [3] => 2499607183
        )

    [status_text] => Array
        (
            [0] => 13 Beautiful WordPress Showcase Sites
            [1] => 55 Really Creative And Unique Blog Design Showcase
            [2] => Need PHP symfony developer to complete tvguide.com clone
            [3] => 22 Open Source PHP Frameworks To Shorten Your Development Time
        )

    [status_source] => Array
        (
            [0] => <a href="http://apiwiki.twitter.com/">API</a>
            [1] => <a href="http://apiwiki.twitter.com/">API</a>
            [2] => <a href="http://apiwiki.twitter.com/">API</a>
            [3] => <a href="http://apiwiki.twitter.com/">API</a>
        )

    [user_id] => Array
        (
            [0] => 14574588
            [1] => 14574588
            [2] => 14574588
            [3] => 14574588
        )

    [user_name] => Array
        (
            [0] => Abhinav Singh
            [1] => Abhinav Singh
            [2] => Abhinav Singh
            [3] => Abhinav Singh
        )

    [user_screen_name] => Array
        (
            [0] => imoracle
            [1] => imoracle
            [2] => imoracle
            [3] => imoracle
        )

)
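
Note that the result is grouped by field rather than by tweet. If one record per tweet is more convenient, the arrays can be regrouped. A minimal sketch, assuming every field array has the same length (that is, no node is missing from any status); rows_from_columns is a hypothetical helper name and is not part of the downloadable source:

```php
<?php
// Hypothetical helper: turns array('field' => array(v0, v1, ...))
// into array(0 => array('field' => v0), 1 => array('field' => v1), ...).
function rows_from_columns(array $columns)
{
    $rows = array();
    foreach ($columns as $key => $values) {
        foreach ($values as $index => $value) {
            $rows[$index][$key] = $value;
        }
    }
    return $rows;
}

// Two entries per field, copied from the output above.
$result = array(
    'status_id' => array('2499838341', '2499780899'),
    'status_text' => array(
        '13 Beautiful WordPress Showcase Sites',
        '55 Really Creative And Unique Blog Design Showcase',
    ),
    'user_screen_name' => array('imoracle', 'imoracle'),
);

$tweets = rows_from_columns($result);
print_r($tweets[0]);
```

If some statuses can lack a node, the indexes no longer line up across fields, so this regrouping only holds while the assumption above does.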

The Best Part:
The best part of this approach shows up when, later on, the project demands extraction of the following nodes too:

  1. truncated: ../statuses/status/truncated
  2. favorited: ../statuses/status/favorited
  3. location: ../statuses/status/user/location
  4. description: ../statuses/status/user/description

All we need to do is add these XPaths to the xpath.php file; parser.php does not have to change. In a real project you may want to turn parser.php into a function or a class so that the rest of your code can request data from it.
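
As a rough sketch of what such a function might look like: parse_xml_with_xpaths is a hypothetical name, the $sample XML is hand-written for illustration (its truncated and favorited values are placeholders, not real feed data), and the loop is the same one used in parser.php with two failure checks added.

```php
<?php
// Hypothetical wrapper around the loop in parser.php.
// Returns array('key' => array of string values), or false if the XML cannot be parsed.
function parse_xml_with_xpaths($xmlString, array $xpaths)
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        return false;
    }

    $result = array();
    foreach ($xpaths as $key => $xpath) {
        $result[$key] = array();
        $nodes = $xml->xpath($xpath);
        if (!is_array($nodes)) {
            continue; // the expression could not be evaluated
        }
        foreach ($nodes as $node) {
            $result[$key][] = (string) $node;
        }
    }
    return $result;
}

// Hand-written sample; element values are illustrative placeholders.
$sample = '<statuses><status>'
        . '<id>2499838341</id>'
        . '<truncated>false</truncated>'
        . '<favorited>false</favorited>'
        . '</status></statuses>';

// Adding a node means adding one line here; the function stays untouched.
$xpaths = array(
    'status_id' => '../statuses/status/id',
    'status_truncated' => '../statuses/status/truncated',
    'status_favorited' => '../statuses/status/favorited',
);

$parsed = parse_xml_with_xpaths($sample, $xpaths);
print_r($parsed);
```

Passing the XPath list in as an argument keeps the function reusable for other feeds, at the cost of the caller having to load xpath.php (or an equivalent array) itself.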

Download the source code from here:
http://abhinavsingh.googlecode.com/files/xml-parser-xpath-way.rar

If you liked the post, do not forget to leave a comment and follow me on Twitter.
Do let me know of better methods if you know any. Happy XML Parsing!