XML Parsing in PHP, XPATH way – The best I know so far

If you are a PHP developer, you have almost certainly parsed XML at some stage. Over the years I have implemented XML parsing in at least 3-4 different ways. I have finally settled on this approach, which I personally find much better than the rest, not only because it’s quite simple but also because it’s extensible. By extensible I mean that you don’t have to touch your parsing code if the XML structure changes later, or if you need to parse a new node at a later stage of the project. In this blog post we will parse my Twitter timeline the XPATH way. To start with, let’s see what my Twitter timeline looks like:

XML Source:
http://twitter.com/statuses/user_timeline/imoracle.xml
You may want to open this XML structure in a separate window of your browser for reference, as we walk through various XML parsing techniques.

Data Requirement:
Before we proceed to parse this Twitter timeline, let’s decide what data we want to extract from the XML. Each <status></status> node consists of two parts: information about the tweet and information about the user.

Let’s finalize the following list of nodes that we want from the tweets, along with their corresponding XPATHs:

  1. id: ../statuses/status/id
  2. text: ../statuses/status/text
  3. source: ../statuses/status/source

Next, let’s settle on the list of nodes we want from the user details:

  1. id: ../statuses/status/user/id
  2. name: ../statuses/status/user/name
  3. screen_name: ../statuses/status/user/screen_name

XML Parsing:
Let us create a file called xpath.php, which will contain the XPATHs of the nodes we finalized above. The xpath.php file will look like this:

xpath.php

<?php

  $user_status = array(
                      'status_id' => '../statuses/status/id',
                      'status_text' => '../statuses/status/text',
                      'status_source' => '../statuses/status/source',
                      'user_id' => '../statuses/status/user/id',
                      'user_name' => '../statuses/status/user/name',
                      'user_screen_name' => '../statuses/status/user/screen_name'
                      );

?>

parser.php

<?php

  // include the xpath file
  require_once("xpath.php");

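  // read the xml source as string
  $str = file_get_contents("imoracle.xml");

  // load the string as xml object (returns false if the XML is malformed)
  $xml = simplexml_load_string($str);
  if($xml === false) {
    exit("Could not parse imoracle.xml");
  }

  // initialize the return array
  $result = array();

  // parse the xml nodes
  foreach($user_status as $key => $xpath) {
    $values = $xml->xpath($xpath);
    // xpath() does not return an array when the expression cannot be evaluated
    if(!is_array($values)) continue;
    foreach($values as $value) {
      $result[$key][] = (string)$value;
    }
  }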

  // print the return array
  print_r($result);

?>

Results:
If we print this $result in a browser, here is what the output will look like:

Array
(
    [status_id] => Array
        (
            [0] => 2499838341
            [1] => 2499780899
            [2] => 2499724163
            [3] => 2499607183
        )

    [status_text] => Array
        (
            [0] => 13 Beautiful WordPress Showcase Sites
            [1] => 55 Really Creative And Unique Blog Design Showcase
            [2] => Need PHP symfony developer to complete tvguide.com clone
            [3] => 22 Open Source PHP Frameworks To Shorten Your Development Time
        )

    [status_source] => Array
        (
            [0] => <a href="http://apiwiki.twitter.com/">API</a>
            [1] => <a href="http://apiwiki.twitter.com/">API</a>
            [2] => <a href="http://apiwiki.twitter.com/">API</a>
            [3] => <a href="http://apiwiki.twitter.com/">API</a>
        )

    [user_id] => Array
        (
            [0] => 14574588
            [1] => 14574588
            [2] => 14574588
            [3] => 14574588
        )

    [user_name] => Array
        (
            [0] => Abhinav Singh
            [1] => Abhinav Singh
            [2] => Abhinav Singh
            [3] => Abhinav Singh
        )

    [user_screen_name] => Array
        (
            [0] => imoracle
            [1] => imoracle
            [2] => imoracle
            [3] => imoracle
        )

)

The Best Part:
The best part of this approach is this: suppose that in future our project demands extraction of the following nodes too:

  1. truncated: ../statuses/status/truncated
  2. favorited: ../statuses/status/favorited
  3. location: ../statuses/status/user/location
  4. description: ../statuses/status/user/description

All we need to do is add these XPATHs to the xpath.php file, without having to change the parser.php file. In a real project you may want to turn parser.php into a function or a class, so that the rest of the code can request data from it.
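As a rough sketch of what such a function could look like: the name parse_xml_with_xpaths() and the sample XML below are made up for illustration and are not part of the downloadable source. The loop is the same one used in parser.php, only wrapped so that the XML string and the XPATH list are passed in as arguments.

```php
<?php

// Hypothetical helper: the name and signature are illustrative only.
function parse_xml_with_xpaths(string $xmlString, array $xpaths): array
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        throw new InvalidArgumentException('Could not parse the XML string');
    }

    $result = array();
    foreach ($xpaths as $key => $xpath) {
        $result[$key] = array();
        $nodes = $xml->xpath($xpath);
        // xpath() may return false or null when the expression cannot be evaluated
        foreach ($nodes ?: array() as $node) {
            $result[$key][] = (string) $node;
        }
    }
    return $result;
}

// Invented sample that mirrors the shape of the timeline XML
$sample = '<statuses>'
        . '<status><id>1</id><truncated>false</truncated><user><location>Somewhere</location></user></status>'
        . '<status><id>2</id><truncated>true</truncated><user><location>Somewhere</location></user></status>'
        . '</statuses>';

$data = parse_xml_with_xpaths($sample, array(
    'status_id'     => '../statuses/status/id',
    'truncated'     => '../statuses/status/truncated',
    'user_location' => '../statuses/status/user/location',
));

print_r($data);
```

Adding a new field is then a one-line change to the array passed in; the function body stays untouched.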

Download the source code from here:
http://abhinavsingh.googlecode.com/files/xml-parser-xpath-way.rar

If you liked the post, do not forget to leave a comment and follow me on twitter.
Do let me know of better methods if you know any. Happy XML Parsing!

  • Anand

    Good One …… imoracle …..

  • http://artemis.com.vn/blogvui Hung Nguyen

    Well, I think this is a nice way to shorten the code, but performance will be a big problem. An XPath query is meant to be used when you gather data from many nodes in your XML document. Once you know the node, using node methods and properties to get the data is better.
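To make the two styles in this comment concrete, here is a minimal sketch that reads the same value once through an XPath query and once through SimpleXML property access. The sample XML is invented, and the sketch only shows the syntax; it makes no claim about which of the two is faster.

```php
<?php

// Invented sample with the same shape as one <status> entry
$xml = simplexml_load_string(
    '<statuses><status><id>1</id><text>Hello</text>'
    . '<user><screen_name>someone</screen_name></user></status></statuses>'
);

// XPath query: the expression is evaluated against the whole document
$matches  = $xml->xpath('/statuses/status/user/screen_name');
$viaXpath = (string) $matches[0];

// Property access: walk straight down to a node whose position is already known
$viaProps = (string) $xml->status[0]->user->screen_name;

echo $viaXpath, "\n", $viaProps, "\n";
```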

  • Unomi

    Yeah, for sure XPath in PHP rocks. Shame is, that it only implements XPath 1.0 for now. But anyhow, with the DOMXPath implementation at least, you can use most of the functionality like XPath functions and syntax. This is way powerful for accessing parts of XML like:
    descendant::name[position()=1]/ancestor::person/../city[normalize-space(text())]
    It’s just an example, but it makes it possible to hop from one known place in the structure to a relative place in the structure.
    I use DOMDocument rather than SimpleXML. SimpleXML is nice for quick and dirty jobs. But DOMDocument lets you alter the XML on a whim too.
    – Unomi -
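For readers who have not used DOMXPath, here is a small sketch of that kind of axis hopping. The document structure (directory, region, person, name, city) is hypothetical and chosen only so that an expression of this shape has something to match.

```php
<?php

// Hypothetical document: all element names and values are invented
$doc = new DOMDocument();
$doc->loadXML(
    '<directory>'
    . '<region><city>Amsterdam</city><person><name>Alice</name></person></region>'
    . '<region><city>Utrecht</city><person><name>Bob</name></person></region>'
    . '</directory>'
);

$xpath = new DOMXPath($doc);

// First <name> in the document, up to its <person>, over to the parent,
// then down to any <city> sibling that has non-blank text
$nodes = $xpath->query(
    'descendant::name[position()=1]/ancestor::person/../city[normalize-space(text())]'
);

$found = array();
foreach ($nodes as $city) {
    $found[] = trim($city->nodeValue);
}
print_r($found);
```

Only the city next to the first name is returned, because the position() predicate narrows the starting point to a single node before the hop.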

  • http://abhinavsingh.com admin

    @Hung Nguyen: I am not sure about the performance issues. I really don’t have any comparison results. But over the past year I have used this technique in projects with traffic estimates from 1 million per month to 1 million per day, and I never found any performance issue with this approach.

    @Unomi: Yes, I too started with DOMDocument and SimpleXML, but I love this approach as it solves the problem in just a few lines. Further, I don’t have to alter my code much if I need to parse more data out of the XML.

  • http://www.danieldelrio.com Daniel Del Rio

    Awesome article man. I have a few projects I can implement this into!

  • http://www.machete.ca Allain Lalonde

    Pretty sure #2 from “The Best Part” is wrong. Nice article though.

  • http://abhinavsingh.com admin

    Oh yeah, that’s a typo, will fix that.

    Thanks for pointing that out :)

  • kshitiz

    Please tell me what to do if the XML file also has attributes? Your code is great… can you provide me the modified code to use XML files with attributes as well?

    • ahmed

      nice tut thank you
      for getting to attributes
      xml:

      php:
      'page_index' => '../book/page/@index',
      'page_name' => '../book/page/@name',

      /*use the ‘@’ for accessing attributes*/
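A minimal, self-contained sketch of the attribute case follows. The element and attribute names come from the XPATHs in this comment, but the sample document and its values are invented, since the XML from the comment is not shown.

```php
<?php

// Hypothetical sample: structure follows the XPATHs above, values are made up
$str = '<book>'
     . '<page index="1" name="Introduction"/>'
     . '<page index="2" name="Getting started"/>'
     . '</book>';

$xpaths = array(
    'page_index' => '../book/page/@index',
    'page_name'  => '../book/page/@name',
);

$xml = simplexml_load_string($str);

$result = array();
foreach ($xpaths as $key => $xpath) {
    // Attribute matches also come back as SimpleXMLElement objects,
    // so the same string cast used for element nodes works here
    foreach ($xml->xpath($xpath) as $value) {
        $result[$key][] = (string) $value;
    }
}

print_r($result);
```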

  • http://vtd-xml.sourceforge.net Harry Xu

    Another option for parsing XML with XPath is vtd-xml

    http://vtd-xml.sf.net

  • chad

    Xpath blog is great… But, Abhi can you teach us to sort xml using php and xpath

    • http://abhinavsingh.com Abhinav Singh

      Hi Chad,

      I think sorting can be done quite easily with similar technique. However will try to put up some sample code here.

    • chad

      Hi Abhi, I have been reading your blog since a long time. It is great work. Please send me the code for sorting xml using php. I would be your biggest fan if you could help me in this. Here is the xml example:

      raddison
      5

      Niswazi Hotel
      2

      Other new Hotel
      1

      I want to sort this by hotel code.

      Waiting for your reply…

    • http://abhinavsingh.com Abhinav Singh

      after sorting this xml by hotel code in what form are you expecting an output? List of hotel names as an array? or Whole XML chunk as string with sorted xml structure?

    • chad

      It is nice to see your prompt reply.. Yes, I need the whole XML chunk as string with sorted xml structure…

      Thank you in advance… Please give this solution. And by any chance have you worked on GDS integration like Galileo?

    • chad

      Hi Abhi, please show me code to sort xml using php and xpath? Waiting bro..

    • http://abhinavsingh.com Abhinav Singh

      Well i would say expect any kind of code assistance on this only over the weekend :) Also i cannot promise you about the code samples..

      However, if you are in a hurry here is what you can try. Create a function sortXML($xml, $xpath) where $xml is xml data as string which you want to sort. $xpath is the xpath of node/attribute by which you want to sort.

      Using $xpath extract all nodes at that level and sort them. Finally rearrange $xml depending upon sorted array you just received. Use the tip below.

      Some tricks you may want to use, in case you are unaware of them:
      1. ../statuses/status/id/parent::* is equivalent to ../statuses/status
      2. ../statuses/status/child::* selects every child element of ../statuses/status (id, text, source, user and so on)

      Using these two tips you can traverse to any parent or children which you like to rearrange or exchange depending upon sorted array. Hope this helps and get you started.
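One possible reading of that outline, sketched with DOMDocument rather than SimpleXML because DOM can move nodes around. This is an assumption-laden sketch and not tested code from the blog: the tag names hotels, hotel, name and code are hypothetical (the sample in the question lost its tags), and the function assumes $xpath selects one element node inside each element to be reordered.

```php
<?php

// Hypothetical helper following the sortXML($xml, $xpath) idea above.
// $xpath must select one "sort key" element inside each element to reorder.
function sortXML(string $xml, string $xpath): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new InvalidArgumentException('Could not parse the XML string');
    }

    // Pair each key value with the parent element it belongs to
    $items = array();
    foreach ((new DOMXPath($doc))->query($xpath) as $keyNode) {
        $items[] = array('key' => $keyNode->nodeValue, 'node' => $keyNode->parentNode);
    }

    // Numeric strings compare as numbers, anything else as plain strings
    usort($items, function ($a, $b) {
        return $a['key'] <=> $b['key'];
    });

    // appendChild() moves an existing node, so re-appending in sorted
    // order rearranges the siblings in place
    foreach ($items as $item) {
        $item['node']->parentNode->appendChild($item['node']);
    }

    return $doc->saveXML();
}

// Tag names are assumptions; the values are the ones from the question
$hotels = '<hotels>'
        . '<hotel><name>raddison</name><code>5</code></hotel>'
        . '<hotel><name>Niswazi Hotel</name><code>2</code></hotel>'
        . '<hotel><name>Other new Hotel</name><code>1</code></hotel>'
        . '</hotels>';

$sortedXml = sortXML($hotels, '/hotels/hotel/code');
echo $sortedXml;
```

The parent::* tip corresponds to the parentNode hop here: the query finds the key nodes, and the parent of each key is the element that actually gets moved.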

    • Unomi

      This is exactly why I thought SimpleXML is not sufficient. XPath (and therefore XSL) is way powerful over SimpleXML parsing.

      To sort an XML string, it is doable the XSL way (using XPath as a modifier), since XSL has a tag called sort (<xsl:sort>) that works like a charm.

      I know, that XSL requires more coding, but it also makes sure that the returned XML is valid XML etc. In XSL you can also code dependencies which otherwise should be done in PHP the hard way (read: as a work-around).

      I won’t provide code examples since Google is your friend and the beefy stuff is out there already.

      – Unomi -
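For anyone who wants a starting point, here is a hedged sketch of the XSL route using PHP's XSLTProcessor. It assumes the xsl extension is enabled, and the tag names hotels, hotel, name and code are again hypothetical stand-ins for the sample in the question.

```php
<?php

// Requires PHP's xsl extension. All tag names are assumptions.
$xml = new DOMDocument();
$xml->loadXML(
    '<hotels>'
    . '<hotel><name>raddison</name><code>5</code></hotel>'
    . '<hotel><name>Niswazi Hotel</name><code>2</code></hotel>'
    . '<hotel><name>Other new Hotel</name><code>1</code></hotel>'
    . '</hotels>'
);

// Stylesheet that copies every <hotel>, ordered numerically by its <code>
$stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/hotels">
    <hotels>
      <xsl:for-each select="hotel">
        <xsl:sort select="code" data-type="number" order="ascending"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </hotels>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($stylesheet);

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);

// transformToXml() returns the sorted document as a string
$sorted = $proc->transformToXml($xml);
echo $sorted;
```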

  • hariram

    mr.abinav singh i need some help regarding my project can u please add me in gtalk plsssssssssss

  • sunita

    hello i am sunita pls can u tell me what is the need of xml parsing?

  • sunita

    how can we add videos in our php web pages?

  • swagat

    Thanks for the post dude!

  • http://twitter.com/#!/webseficientes Gerardo

    Great article, simple and specific, It was good for me.

  • http://www.amitpatil.me Amit

    What if we don’t know the structure of the XML file paths, or maybe the files are generated randomly with a random structure? In that case, is there any other way we can get the XML file into an array structure?
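One way to approach this, offered as a sketch and not as part of the XPATH method in the post, is to walk the tree recursively without knowing any paths up front. The function name xml_to_array() is made up, and the sketch ignores attributes and namespaces.

```php
<?php

// Hypothetical helper: converts element content to nested arrays.
// Attributes and namespaces are ignored in this sketch.
function xml_to_array(SimpleXMLElement $node)
{
    $children = $node->children();
    if (count($children) === 0) {
        // Leaf element: keep its text content
        return (string) $node;
    }

    $out = array();
    foreach ($children as $name => $child) {
        // Always collect into a list so repeated tags are handled uniformly
        $out[$name][] = xml_to_array($child);
    }
    return $out;
}

// Invented sample with a repeated tag
$xml   = simplexml_load_string('<root><a>1</a><b><c>x</c><c>y</c></b></root>');
$array = xml_to_array($xml);

print_r($array);
```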
