XML Parsing in PHP, XPATH way – The best I know so far

If you are a PHP developer, you have almost certainly parsed XML at some stage. Over the years I have implemented XML parsing in at least 3-4 different ways. I have finally settled on this approach, which I personally find far better than the rest, not only because it’s quite simple but also because it’s extensible. By extensible I mean you don’t have to touch your parsing code if the XML structure changes later, or if you need to parse a new node at a later stage in the project. In this blog post we will parse my Twitter timeline the XPATH way. To start with, let’s see what my Twitter timeline looks like:

XML Source:
You may want to open this XML structure in a separate window of your browser for reference, as we walk through various XML parsing techniques.

Data Requirement:
Before we parse this Twitter timeline, let’s decide what data we want to extract from the XML. Each <status></status> node consists of two parts: information about the tweet and information about the user.

Let’s settle on the following list of tweet nodes we want, along with their corresponding XPATHs:

  1. id: ../statuses/status/id
  2. text: ../statuses/status/text
  3. source: ../statuses/status/source

Next, let’s zero in on the nodes we want for the user details:

  1. id: ../statuses/status/user/id
  2. name: ../statuses/status/user/name
  3. screen_name: ../statuses/status/user/screen_name

XML Parsing:
Let us create a file called xpath.php, which will contain the XPATHs of the nodes we finalized above. The xpath.php file will look like this:



  <?php
  // map of result keys to the XPATH of the node each key should collect
  $user_status = array(
                      'status_id' => '../statuses/status/id',
                      'status_text' => '../statuses/status/text',
                      'status_source' => '../statuses/status/source',
                      'user_id' => '../statuses/status/user/id',
                      'user_name' => '../statuses/status/user/name',
                      'user_screen_name' => '../statuses/status/user/screen_name'
                      );



The parser itself lives in a second file, parser.php:

  <?php
  // include the xpath file
  include_once("xpath.php");

  // read the xml source as string
  $str = file_get_contents("imoracle.xml");

  // load the string as xml object
  $xml = simplexml_load_string($str);

  // initialize the return array
  $result = array();

  // parse the xml nodes
  foreach($user_status as $key => $xpath) {
    $values = $xml->xpath($xpath);
    foreach($values as $value) {
      $result[$key][] = (string)$value;
    }
  }

  // print the return array
  print_r($result);


If we print $result in a browser, here is what the result will look like:

Array
(
    [status_id] => Array
        (
            [0] => 2499838341
            [1] => 2499780899
            [2] => 2499724163
            [3] => 2499607183
        )

    [status_text] => Array
        (
            [0] => 13 Beautiful WordPress Showcase Sites
            [1] => 55 Really Creative And Unique Blog Design Showcase
            [2] => Need PHP symfony developer to complete tvguide.com clone
            [3] => 22 Open Source PHP Frameworks To Shorten Your Development Time
        )

    [status_source] => Array
        (
            [0] => <a href="http://apiwiki.twitter.com/">API</a>
            [1] => <a href="http://apiwiki.twitter.com/">API</a>
            [2] => <a href="http://apiwiki.twitter.com/">API</a>
            [3] => <a href="http://apiwiki.twitter.com/">API</a>
        )

    [user_id] => Array
        (
            [0] => 14574588
            [1] => 14574588
            [2] => 14574588
            [3] => 14574588
        )

    [user_name] => Array
        (
            [0] => Abhinav Singh
            [1] => Abhinav Singh
            [2] => Abhinav Singh
            [3] => Abhinav Singh
        )

    [user_screen_name] => Array
        (
            [0] => imoracle
            [1] => imoracle
            [2] => imoracle
            [3] => imoracle
        )

)

The Best Part:
The best part of this approach shows up when, in future, our project demands extraction of the following nodes too:

  1. truncated: ../statuses/status/truncated
  2. favorited: ../statuses/status/favorited
  3. location: ../statuses/status/user/location
  4. description: ../statuses/status/user/description

All we need to do is add these XPATHs to the xpath.php file, without having to change the parser.php file. In a real project you may want to turn parser.php into a function or a class, so that you can request data from it.
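
As a rough sketch of that last idea, the parser could be wrapped in a function like the one below. The function name extract_by_xpath(), the inline sample XML and its values are all made up for illustration; only the XPATH map follows the structure used in this post.

```php
<?php
// Hypothetical helper (not part of the original parser.php):
// runs every XPATH in $map against $xmlString and collects the string values.
function extract_by_xpath(string $xmlString, array $map): array
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        throw new InvalidArgumentException('Could not parse the XML string');
    }

    $result = array();
    foreach ($map as $key => $xpath) {
        $result[$key] = array();
        $nodes = $xml->xpath($xpath);
        if (!is_array($nodes)) {
            continue; // xpath() did not return a node list (invalid expression)
        }
        foreach ($nodes as $node) {
            $result[$key][] = (string) $node;
        }
    }
    return $result;
}

// Made-up sample data, shaped like the timeline described in this post.
$sample = <<<XML
<statuses>
  <status>
    <id>1</id>
    <text>first tweet</text>
    <truncated>false</truncated>
    <user><id>42</id><screen_name>example</screen_name></user>
  </status>
  <status>
    <id>2</id>
    <text>second tweet</text>
    <truncated>true</truncated>
    <user><id>42</id><screen_name>example</screen_name></user>
  </status>
</statuses>
XML;

$map = array(
    'status_id'        => '../statuses/status/id',
    'status_truncated' => '../statuses/status/truncated', // a newly requested node
    'user_screen_name' => '../statuses/status/user/screen_name',
);

$data = extract_by_xpath($sample, $map);
print_r($data);
```

Requesting one more node then means adding one more line to $map; the function body stays untouched.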

If you liked the post, do not forget to leave a comment and follow me on twitter.
Do let me know of better methods if you know any. Happy XML Parsing!


  3. Well, I think this is a nice way to shorten the code, but performance will be a big problem. An XPath query is meant to be used when you gather data from many nodes in your XML document. Once you know the node, using Node methods and properties to get the data is better.

  4. Unomi

    Yeah, for sure XPath in PHP rocks. Shame is, that it only implements XPath 1.0 for now. But anyhow, with the DOMXPath implementation at least, you can use most of the functionality like XPath functions and syntax. This is way powerful for accessing parts of XML like:
    It’s just an example, but it makes it possible to hop from one known place in the structure to a relative place in the structure.
    I use DOMDocument rather than SimpleXML. SimpleXML is nice for quick and dirty jobs. But DOMDocument lets you alter the XML in a wimp too.
    – Unomi –
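
A small sketch of the kind of relative hop described in this comment, using DOMXPath. The sample XML and its values are invented for illustration, and the query is only one possible reading of "hop from one known place to a relative place".

```php
<?php
// Made-up sample document in the shape of the timeline from the post.
$doc = new DOMDocument();
$doc->loadXML(
    '<statuses>'
    . '<status><id>1</id><text>first</text><user><screen_name>alice</screen_name></user></status>'
    . '<status><id>2</id><text>second</text><user><screen_name>bob</screen_name></user></status>'
    . '</statuses>'
);
$xp = new DOMXPath($doc);

// Start at a known node (the <id> holding "2"), hop up to its <status>
// with the parent:: axis, then down to a location relative to that status.
$nodes = $xp->query('/statuses/status/id[. = "2"]/parent::status/user/screen_name');
$screen_name = $nodes->item(0)->nodeValue;

// XPath 1.0 functions such as count() can be evaluated directly.
$count = $xp->evaluate('count(/statuses/status)');

echo $screen_name, ' ', $count, "\n";
```

Note that DOMXPath evaluates from the document node by default, so the sketch uses absolute paths (/statuses/...) rather than the ../statuses/... form used with SimpleXML in the post.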

  5. @Hung Nguyen: I am not sure about the performance issues. I really don’t have any comparison results. But I have used this technique over the past year for projects with traffic estimates of 1 million per month to 1 million per day, and I never found any performance issue with this approach.

    @Unomi: Yes, I too started with the DOMDocument and SimpleXML standards, but I love this approach as it solves the problem in just a few lines. Further, I don’t have to alter my code much if I need to parse more data out of the XML.

  6. Pingback: 網站製作學習誌 » [Web] 連結分享

    1. ahmed

      nice tut thank you
      for getting to attributes

      'page_index' => '../book/page/@index',
      'page_name' => '../book/page/@name',

      /* use the '@' for accessing attributes */
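
A minimal sketch of how those attribute XPATHs behave with the same loop as in the post. The <book> document and its attribute values are invented for illustration.

```php
<?php
// Made-up sample: a <book> root whose <page> elements carry attributes.
$xml = simplexml_load_string(
    '<book><page index="1" name="Cover"/><page index="2" name="Preface"/></book>'
);

$page_xpaths = array(
    'page_index' => '../book/page/@index',
    'page_name'  => '../book/page/@name',
);

$pages = array();
foreach ($page_xpaths as $key => $xpath) {
    // each match is an attribute node; casting to string yields its value
    foreach ($xml->xpath($xpath) as $attr) {
        $pages[$key][] = (string) $attr;
    }
}
print_r($pages);
```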

  7. Pingback: 25 New & Useful PHP Techniques & Tutorials

    1. chad

      Hi Abhi, I have been reading your blog for a long time. It is great work. Please send me the code for sorting XML using PHP. I would be your biggest fan if you could help me with this. Here is the XML example:


      Niswazi Hotel

      Other new Hotel

      I want to sort this by hotel code.

      Waiting for your reply…

    2. chad

      It is nice to see your prompt reply.. Yes, I need the whole XML chunk as string with sorted xml structure…

      Thank you in advance… Please give this solution. And by any chance, have you worked on GDS integration like Galileo?

    3. Well, I would say expect any kind of code assistance on this only over the weekend 🙂 Also, I cannot promise you code samples.

      However, if you are in a hurry, here is what you can try. Create a function sortXML($xml, $xpath), where $xml is the XML data (as a string) that you want to sort and $xpath is the XPATH of the node/attribute by which you want to sort.

      Using $xpath, extract all nodes at that level and sort them. Finally, rearrange $xml according to the sorted array you just received. Use the tips below.

      Some tricks you may want to use, in case you are unaware of them:
      1. ../statuses/status/id/parent::* is equivalent to ../statuses/status
      2. ../statuses/status/child::* selects every child of status, which includes ../statuses/status/id

      Using these two tips you can traverse to any parent or children that you would like to rearrange or exchange depending on the sorted array. Hope this helps and gets you started.
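
One way the sortXML($xml, $xpath) idea above might be sketched is shown below. It swaps SimpleXML for DOMDocument, because DOM can move nodes around. The hotel document structure and the codes A10/B20 are invented (the original XML sample did not survive in the comment), and the path is absolute because DOMXPath starts at the document node.

```php
<?php
// Hypothetical sortXML(): $xpath points at the node/attribute to sort by;
// the element that owns each key is the unit that gets reordered.
function sortXML(string $xml, string $xpath): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;
    $doc->loadXML($xml);

    $keys = (new DOMXPath($doc))->query($xpath);
    if ($keys === false) {
        throw new InvalidArgumentException('Invalid XPath expression');
    }

    $items = array();
    foreach ($keys as $keyNode) {
        // same idea as the parent::* tip: step from the key up to its owner
        $owner = $keyNode instanceof DOMAttr ? $keyNode->ownerElement : $keyNode->parentNode;
        $items[] = array('key' => $keyNode->nodeValue, 'node' => $owner);
    }

    // plain string comparison of the key values
    usort($items, function ($a, $b) {
        return strcmp($a['key'], $b['key']);
    });

    // appendChild() moves an existing node, so re-appending in sorted
    // order rearranges the siblings under their parent
    foreach ($items as $item) {
        $item['node']->parentNode->appendChild($item['node']);
    }
    return $doc->saveXML();
}

// Made-up sample; only the hotel names come from the comment above.
$hotels = '<hotels>'
    . '<hotel><code>B20</code><name>Other new Hotel</name></hotel>'
    . '<hotel><code>A10</code><name>Niswazi Hotel</name></hotel>'
    . '</hotels>';

$sorted = sortXML($hotels, '/hotels/hotel/code');
echo $sorted;
```

Siblings that the XPath does not match are left in place, so they end up ahead of the sorted items; a fuller version would need to decide what to do with them.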

    4. Unomi

      This is exactly why I thought SimpleXML is not sufficient. XPath (and therefore XSL) is way powerful over SimpleXML parsing.

      To sort an XML string, it is doable the XSL way (using XPath as a modifier), since XSL has a tag called sort (<xsl:sort>) that works like a charm.

      I know, that XSL requires more coding, but it also makes sure that the returned XML is valid XML etc. In XSL you can also code dependencies which otherwise should be done in PHP the hard way (read: as a work-around).

      I won’t provide code examples since Google is your friend and the beefy stuff is out there already.

      – Unomi –

  14. Mukund

    My file size is 560 MB and it gives the issue below. How do I resolve it? Please help.

    PHP Warning: SimpleXMLElement::xpath(): Memory allocation failed : growing nodeset hit limit

  15. Once you know the node, using Node methods and properties to get the data is better. I think this is a nice way to shorten the code, but performance will be a big problem. An XPath query is meant to be used when you gather data from many nodes in your XML document.
