Writing a custom unix style tail in PHP using Libevent API on Mac OS X 10.5.x and other platforms

Libevent is a library which provides a mechanism to execute a callback function when a specific event occurs on a file descriptor or after a timeout has been reached. Many famous applications/frameworks/libraries like memcached are using libevent. In this blog post, I will demonstrate how to write a custom unix style tail script using Libevent API in PHP.

Setting up the environment:
Setting up libevent with PHP is a little tricky. Below are the steps, I followed to make it work on Mac OSX 10.5. However the steps should be same for any other OS you choose to code on. Here we go:

  1. Check the version of libevent installed on your system. If you don’t have libevent or the installed version is < 1.4, you will need to compile libevent-1.4.x
    ~ sabhinav$ port list | grep libevent
    libevent                       @1.4.12         devel/libevent
  2. Uninstall existing libevent
    ~ sabhinav$ port uninstall libevent
  3. Add the following into your .bash_profile file:
    export MACOSX_DEPLOYMENT_TARGET=10.5
    export CFLAGS="-arch x86_64 -g -Os -pipe -no-cpp-precomp"
    export CCFLAGS="-arch x86_64 -g -Os -pipe"
    export CXXFLAGS="-arch x86_64 -g -Os -pipe"
    export LDFLAGS="-arch x86_64 -bind_at_load"
  4. Open a new terminal window. Download and extract libevent-1.4.12-stable
    ~ sabhinav$ wget http://www.monkey.org/~provos/libevent-1.4.12-stable.tar.gz
    ~ sabhinav$ tar -xvzf libevent-1.4.12-stable.tar.gz
  5. Compile libevent-1.4.12-stable
    ~ sabhinav$ cd libevent-1.4.12-stable
    ~ sabhinav$ ./configure
    ~ sabhinav$ make
    ~ sabhinav$ sudo make install
  6. Assuming you have a successful installation, lets install PECL package libevent-0.0.2.
    ~ sabhinav$ pecl download libevent-0.0.2
    ~ sabhinav$ tar -xvzf libevent-0.0.2.tgz libevent-0.0.2
    ~ sabhinav$ cd libevent-0.0.2
    ~ sabhinav$ phpize
    ~ sabhinav$ ./configure
    ~ sabhinav$ make
    ~ sabhinav$ sudo make install
  7. Enable libevent extension in your php.ini
    extension=libevent.so
  8. Reload apache server
    ~ sabhinav$ sudo apachectl restart
  9. Confirm we have libevent extension enabled using phpinfo(); or
    ~ sabhinav$ php -i | grep libevent

Writing a custom unix style tail script in PHP (tail.php)
Below is a sample script which can be used as a base for writing custom unix style tail script. Comments in the code will help you understanding the flow of the code. Also do view official documentation for PHP Libevent extension usage.

<?php

	// callback function called whenever the registered event is triggered
	function eventFd($fd, $events, $arg) {
		echo fread($fd, 4096);
	}

	// create event base
	$base_fd = event_base_new();

	// create a new event
	$event_fd = event_new();

	// resource to be monitored
	$fd = fopen($argv[1], 'r');

	// set event on passed file name
	event_set($event_fd, $fd, EV_WRITE | EV_PERSIST, 'eventFd', array($event_fd, $base_fd));

	// associate base with this event
	event_base_set($event_fd, $base_fd);

	// register event
	event_add($event_fd);

	// start event loop
	event_base_loop($base_fd);

?>

Trying out tail.php
Save the above code file and issue the following on the terminal:

~ sabhinav$ php tail.php /var/log/apache2/access_log

Try accessing a page on your webserver and you should see the access log being tailed by the php script. 😀

Enjoy!

Web Security : Using crumbs to protect your PHP API (Ajax) call from Cross-site request forgery (CSRF/XSRF) and other vulnerabilities

Have your API calls ever being used directly by someone without your permission? If yes, read on to find out how can we protect our API’s from such spammers and hackers. Before we go ahead and see a possible solution for this, lets try to list out a few cases, when our API’s can be accessed without our permissions.

Common cases of vulnerable API/Ajax calls

  • Ajax calls having no user authentication: This is the first place where a spammer will try to find out a loop hole. Take this example, suppose I created a group chat plugin for my blog. Since it’s a group chat plugin, I don’t really want the blog viewers to register before they can write a messages. Blog viewer only need to provide their name, email and url (just like wordpress comments). Thereafter, they can write messages which are submitted on the server side using ajax calls. And here is the “problem”. Anyone can pick up the ajax url, write a curl script, post the required parameters and fill up my database with millions of messages.
  • Ajax calls having user authentication: One day I realize my group chat plugin has received more than 1 million messages last night (all spams). Hence I decide to make my blog viewers to register before they can post a message on the group chat plugin, simply because someone is filling up my database by simulating ajax calls through a curl script. Anyone can write a script, since these ajax call do not authenticate the user making the call. But are my ajax calls safe after forcing users to register? NO, a registered user too can simulate these ajax calls and passing authentication by sending the right cookies.

Possible solutions and their flaws
If you look around on web, you will find a bunch of solution to such problems. But then every solution have it’s own problem which forces you not to use them. Listed below are 2 possible solutions to our problem:

  • Using X-Requested-With to protect ajax calls: All famous javascript frameworks like JQuery, YUI, Mootools etc sends an additional header parameter while making an XHR request. These libraries set “X-Requested-With=XMLHttpRequest” header, which can then be used on the server side to detect if the call was made through an ajax call. But a programmer can easily pass these headers using a curl script, making the server believe that the call was made through an XHR request.
  • Using HTTP Referrer: This solution comes in handy for cases when a spammer/hacker try to POST data into your site’s. We can check for the referrer page, before we go ahead and accept the POST data. If the POST data is coming from a page within your site, you go ahead and accept the data, otherwise reject it. But this solution again have it’s shortcomings. HTTP Referrer can be tampered in certain browsers using javascript and they can also be stripped away by some proxies and firewalls.

Using crumbs
Finally the idea is to have crumbs. A unique electronic key which is shared between server and client, and which have a short life time. But how are these useful? Suppose, in my group chat module, upon page load i generate a crumb whose life time is 30 minutes (tunable). Why 30 minutes? Because, I assume my blog viewers to either engage into the group chat module or leave that specific blog post within 30 minutes.

Now whenever a user writes a message, this crumb is passed back to the server side. If user writes a message before 30 minutes, this crumb will be validated and user shout submitted. (30 minutes should take care of 99.99% of the cases). In response, server api sends back the new crumb which should be sent back with the next ajax call.

Now when a spammer try to simulate the ajax request using curl calls, he will not be able to succeed because of the absence of the crumb. But he can capture the crumb from the site and simulate the effect, right? YES he can, but we can take care of this by reducing the life time of the generated crumb.

Generating crumbs using PHP
Here are the two functions, I use to generate and verify crumbs in PHP:

        // user for whom crumb is to be generated
        $uid = "[email protected]";

        // usually $salt = DB_PASSWORD . DB_USER . DB_NAME . DB_HOST . ABSPATH;
        $salt = "abcdefghijklmnopqrstuvwxyz";

        function challenge($data) {
                global $salt;
                return hash_hmac('md5', $data, $salt);
        }

        function issue_crumb($ttl, $action = -1) {
                global $uid;

                // ttl
                $i = ceil(time() / $ttl);

                // log
                echo "Generating crumb at time:".time().", i:".$i.", action:".$action.", uid:".$uid.PHP_EOL;

                // return crumb
                return substr(challenge($i . $action . $uid), -12, 10);
        }

        function verify_crumb($ttl, $crumb, $action = -1) {
                global $uid;

                // ttl
                $i = ceil(time() / $ttl);

                // log
                echo "Verifying crumb:".$crumb." at time:".time().", i:".$i.", action:".$action.", uid:".$uid.PHP_EOL;

                // verify crumb
                if(substr(challenge($i . $action . $uid), -12, 10) == $crumb || substr(challenge(($i - 1) . $action . $uid), -12, 10) == $crumb)
                        return true;
                return false;
        }

I can generate crumbs with a simple call:

$crumb = issue_crumb(300, "group_chat_module");

where $ttl = 300 (required), $action = “group_chat_module” (optional, defaults to -1)

Later on I can verify the crumb using another call:

var_dump(verify_crumb(300, $crumb, "group_chat_module"));

I hope this helps you protecting your API’s. Let me know of better methods to stop such attacks.
Enjoy!

Building a Custom PHP Framework with a custom template caching engine using Output Control functions

In past 1 year or so, I had opportunities of using a lot of php frameworks including zend, symfony, cakephp, codeigniter. All frameworks have their pros and cons, however that is out of scope of this blog post. You may want to checkout this comparison list of php frameworks here.

In this blog post I will build a custom PHP framework (MVC Architecture). Then go on to discuss in brief about the output control functions and finally show how to build a custom template caching engine using these functions for our framework.

Source Code
You may want to download the complete source code for this blog post from here.

Building a custom PHP Framework
We will choose a MVC architecture for our framework. Here is a basic directory structure for our custom framework:

/index.php
/.htaccess
/config.ini.php
/404.php
/view
    /view/test1.php
    /view/test2.php
/model
    /model/model.class.php
/controller
    /controller/controller.class.php
/log
    /log/logger.class.php
    /log/log.log
/cache
    /cache/cache.class.php
    /cache/template.cache.class.php
    /cache/template
        /cache/template/test1.php.tmp
        /cache/template/test2.php.tmp

The view, model, controller, log and cache directories contains the following framework modules respectively:

  1. view directory contains our view level files. i.e. files containing our HTML, js, css code.
  2. model directory contains the model class responsible for interacting with database and other storages
  3. controller directory contains our controller class. Each incoming request is first received by the controller class constructor, which thereafter controls the flow of request in the framework
  4. log directory contains our logger class. This class is auto loaded for every request providing a basic logger::log($log_message) logger method throughout the framework. This class logs all data in a file called log.log.
  5. cache directory contains our cache class. For this blog tutorial, we will only write the template caching engine class. In production systems, we might have individual classes for other types of cache systems e.g. memcached (Read Memcached and “N” things you can do with it – Part 1 to know more about memcached and MySQL Query Cache, WP-Cache, APC, Memcache – What to choose for a complete comparison lists of various other caching techniques.

Lets see in details, what all file each and every directory contain contains.

Root directory files
We have 4 files in our root directory, namely .htaccess, index.php, config.ini.php and 404.php in order of relevance. Lets look through the content of these files:

.htaccess

RewriteEngine on
RewriteBase /

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*) index.php
  1. 1st two lines essentially means that Switch on the apache rewrite module and set RewriteBase as / i.e. the root directory
  2. Last 3 lines mean that, if incoming request is for a file or directory which physically exists under the root directory serve them otherwise route all other requests to index.php in the root directory

Hence now for an incoming request like http://localhost/test1.php, apache will route the request to index.php in the root directory because there is no test1.php under the root directory. Cool, lets see what index.php has to offer.

index.php

<?php

  // include configuration file
  include_once("config.ini.php");

  // include controller files
  include_once("controller/controller.class.php");

?>

index.php doesn’t do much except for including our core configuration file and controller class file. Controller class constructor is initiated as soon as the class file is included.

config.ini.php is our core configuration file. It provides the framework with an array called $config containing various information like: mysql database credentials, requested url details and various other global parameters. Lets see what all parameter does it provide us with.

config.ini.php

<?php

  $config = array(
                  'host'=>array(
                                'name' => "http://".$_SERVER['HTTP_HOST'].'/',
                                'uri' => $_SERVER['REQUEST_URI'],
                                'url' => parse_url($_SERVER['REQUEST_URI']), // note it contains the parsed url contents
                               ),
                  'mysql'=>array(
                                 'host' => 'localhost',
                                 'name'=> 'testdb',
                                 'user' => 'root',
                                 'pass' => 'password',
                                ),
                  'cache'=>array(
                                 'template' => 'On', // template caching switched on by default
                                 'memcached' => 'Off', // switch off memcached caching
                                ),
                 );

?>

$config[‘host’] array saves various parameter about the host itself, e.g. hostname, hosturi (the requested uri, hosturl (it contains the parse_url(hosturi)).

$config[‘mysql’] array contains mysql database parameters. However in this blog post we will not interact with databases.

$config[‘cache’] tells the framework what all caching modules are switched on.

404.php

<html>
  <head>

  </head>
  <body>
    <h1>404 Page</h1>
  </body>
</html>

Controller directory files
For this blog post, controller directory consists of a single class file. i.e. controller.class.php. We saw this being included by index.php in the root folder above. As soon as controller class is included, it’s constructor is invoked. Before we dig in more, lets see the controller class file:

controller.class.php

<?php

  // include logger class
  include_once("log/logger.class.php");

  // include cache class (contains template caching)
  include_once("cache/cache.class.php");

  // include model class
  include_once("model/model.class.php");

  class controller {

    function __construct($config) {
        global $config;

        // generate requested template name and path
        $config['template']['name'] = $config['host']['uri'] == '/' ? 'index.php' : substr($config['host']['uri'], 1, strlen($config['host']['uri']));
        $config['template']['path'] = "view/".$config['template']['name'];

        // check 404
        if(!file_exists($config['template']['path'])) {
            $config['template']['name'] = "404.php";
            $config['template']['path'] = "404.php";
        }
        logger::log("Requested template name ".$config['template']['name'].", path ".$config['template']['path']);

        // invoke template caching engine
        $template_cache = new template_cache();

        // include the template
        include_once($config['template']['path']);

        // cache template
        $template_cache->setTemplate();
    }

  }

  $controller = new controller($config);

?>

At the top, controller class includes the logger.class.php, cache.class.php and model.class.php files. At the bottom, the controller object is instantiated.

The constructor performs the following 5 tasks:

  1. At first it generates a template name and a template path for the incoming request i.e. for http://localhost/, $config['template']['name']='index.php' and for http://localhost/test1.php, $config['template']['name']='test1.php'.
  2. Second it checks for 404. For the above generated template path, e.g. $config['template']['path']='view/test1.php', it checks whether this file exists inside root directory. If it doesn’t template path and names are set to 404.php
  3. Thirdly, It invokes the template caching engine. i.e. $template_cache = new template_cache();
  4. Forth, it includes the generated template path above i.e. include_once($config['template']['path']);
  5. Fifth and finally, it caches the generated HTML, js, css code by the template file includes above. This is achieved by the following code, $template_cache->setTemplate();

Before we move our attention to, lets see in short the content of log and view directories.

Log directory files
Log directory contains our logger class. This class is auto loaded for each incoming request that is being routed to index.php in the root directory (as we saw above). The logger.class.php provides a static logger::log($log_message) method, which can be used throughout the framework for logging messages. We will be using it everywhere.

logger.class.php

<?php

  class logger {

    static $log_file = "log/log.log";

    static function log($log) {
      if($log != '') {
        $fh = fopen(self::$log_file, "a");
        fwrite($fh,date('Y-m-d H:i:s')."n".$log."nn");
        fclose($fh);
      }
    }

  }

?>

The logger class by default logs all data to a file called log.log.

View directory files
For this blog post, we have two simple test pages in view directory namely test1.php and test2.php, which can be access by typing http://localhost/test1.php and http://localhost/test2.php respectively in the browser.

test1.php

<html>
  <head>

  </head>
  <body>
    <p>
      <?php echo model::test1data(); ?>
    </p>
  </body>
</html>

test1.php simply calls the model class method called model::test1data() (static method). This method extracts some dummy text from the database and returns it back.

Model directory files
Model directory contains the model class file. In production systems, model class file will provide various methods to select and insert data in the databases. However for this blog post we will simply return some static dummy test.

model.class.php

<?php

  class model {

    // This method will return data generally from a database table
    // To keep it simple for the post we return some dummy lipsum text
    static function test1data() {
      logger::log("Returning test1data() from database");
      return "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin ut nulla ac risus viverra ornare. Nulla consectetur, metus eleifend pharetra posuere, lacus nibh elementum leo, in fermentum lectus lorem in ipsum. Nullam pulvinar purus at erat pharetra volutpat. Pellentesque egestas rutrum lectus, ut rutrum tellus tristique sed. Integer diam est, ornare ac ultricies vel, aliquam non mi. Etiam tempor leo eu lacus tempus sagittis sagittis turpis dictum. Sed leo sapien, pharetra sit amet faucibus et, mollis id nulla. Praesent feugiat mi nec dui scelerisque mollis vehicula magna feugiat. Aliquam erat volutpat. Curabitur quis velit ut nibh rhoncus convallis. Proin mauris nunc, rhoncus vel laoreet vel, aliquet quis nunc. Aenean interdum risus non neque blandit sed adipiscing ipsum mollis. Vivamus enim orci, ultrices at scelerisque vel, laoreet a turpis. Nullam posuere ante sed nisl porta porta aliquam metus suscipit. Fusce enim odio, iaculis at suscipit eget, vestibulum volutpat enim. Nam dictum turpis quis velit posuere in malesuada mi convallis. Donec faucibus, felis id dictum imperdiet, orci tortor tristique neque, vitae lobortis libero tellus sed lorem. Duis tellus magna, commodo eget blandit ut, auctor nec nibh. Maecenas ornare ornare risus nec ultrices. Pellentesque lectus eros, imperdiet ut rhoncus vel, tempus ut nisi.";
    }

    static function test2data() {
      logger::log("Returning test2data() from database");
      return "Vestibulum laoreet nibh sed nulla mollis cursus. Maecenas sodales mauris sit amet ligula euismod a lacinia turpis adipiscing. Nulla gravida porta augue, id adipiscing libero tincidunt ac. Morbi non velit id odio porta tempus id eget massa. Cras nibh purus, gravida sed suscipit ut, tincidunt eu neque. In id est eros, ac sodales orci. Ut lectus augue, feugiat sit amet consectetur id, pharetra quis tellus. Maecenas eget lobortis urna. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce tincidunt eleifend neque. Aenean accumsan orci vitae erat blandit porttitor. Aliquam tristique dolor ac nibh elementum id lacinia diam cursus.";
    }

  }

?>

Cache directory files
For this blog post, cache directory contains the main cache.class.php file which in turn includes various other cache classes e.g. template.cache.class.php

cache.class.php

<?php

  // check for switched on cache modules
  foreach($config['cache'] as $key => $value) {
    // include all cache classes, which are swicted on
    if($value == 'On') {
      // naming convention is <modulename.cache.class.php>
      include_once('cache/'.$key.'.cache.class.php');
    }
  }

?>

Output Control Functions
PHP is a very simple language. You can write a Hello World! code or calculate similarity between two strings (see similar_text()), both with a single line of code. And hence there are a lot of fundamental concepts of PHP, which not only beginners but even some advanced coders can ignore. One such concept is Output Control in PHP.

The Output Control functions allow you to control when the output of your PHP script will be thrown to the browsers (console). i.e. You can pre-process the final html output (append, prepend, chip-chop, inserting ad-codes, url linking, keyword highlighting, template caching), which will otherwise be thrown on the browser. Interesting, isn’t it? Can you feel the power of Output Control functions?

  1. ob_start: This turns on output buffering. i.e. no output is sent from the script, instead the output is saved in an internal buffer. However output buffering doesn’t buffers your headers. ob_start() also takes an optional callback function name. The function is called when output buffer is flushed (see ob_flush()) or cleaned (see ob_clean()). We can access the this internal buffer using functions like ob_get_contents()
  2. ob_end_flush: This function send the content of the buffer (if any) and turns off output buffering. We should always call functions like ob_get_contents() before ob_end_flush(), since any changes after this functions will not reflect on the browser

Building a custom Template Caching Engine
We saw above some of the output control functions PHP has to offer. ob_start(), ob_get_contents() and ob_end_flush(); are the 3 functions we will use to create our custom template caching engine.

template.cache.class.php

<?php

  class template_cache {

    var $template_cache_file = FALSE;
    var $template_cache_file_ext = ".tmp";
    var $template_cache_dir = "cache/template/";
    var $template_cache_ttl = 300; // secs

    function __construct() {

      // initiate template caching
      $this->init();

    }

    function init() {
      // get template path
      $this->template_cache_file = $this->generateTemplatePath();

      // get template from cache if exists
      $this->getTemplate();

      // start output buffering
      ob_start();
    }

    function generateTemplatePath() {
      global $config;

      // generate template file name
      return $this->template_cache_dir.$config['template']['name'].$this->template_cache_file_ext;
    }

    function getTemplate() {
      global $config;

      // check if a cached template exists
      if(file_exists($this->template_cache_file)) {
        if(time() - filemtime($this->template_cache_file) < $this->template_cache_ttl) {
          logger::log("Cache hit for template ".$config['template']['name']);
          $content = file_get_contents($this->template_cache_file);
          echo $content;
          exit;
        }
        else {
          logger::log("Cache stale for template ".$config['template']['name']);
          return FALSE;
        }
      }
      else {
        logger::log("Cache miss for template ".$config['template']['name']);
        return FALSE;
      }
    }

    function setTemplate() {
      global $config;

      // get buffer
      $content = ob_get_contents();

      // save template
      logger::log("Caching template ".$config['template']['name']);
      $fh = fopen($this->template_cache_file, 'w');
      fwrite($fh, $content);
      fclose($fh);

      // Flush the output buffer and turn off output buffering
      ob_end_flush();
    }

  }

?>

As we saw above in controller class, the template engine class was instantiated before including the actual template file. Template engine constructor do the following 3 tasks:

  1. Generate a cached file name for the requested uri by calling the $this->generateTemplatePath(); method. e.g. if http://localhost/test1.php is the requested uri, test1.php.tmp is it’s static cached template
  2. Secondly, it tries to fetch the cached template file by calling the method $this->getTemplate(); (read on for details of this method)
  3. Finally it turns on output buffering by calling ob_start();

List of methods provided by template.cache.class.php are:

  1. generateTemplatePath() method generates a cache file name for incoming request. By default extension of all cached files in “.tmp” and are stored under the /cache/template directory.
  2. getTemplate() method do a number of tasks. First, it checks if a cached template exists for the requested uri. If it does not exists or if it is not a fresh cache (see $template_cache_ttl), this method simply returns control back to controller which go ahead and include the actual template file. However if the file exists and is fresh it reads the content of the file and throw back to browser. At this point control is no longer transferred back to the controller, hence saving various un-necessary processing and database calls.
  3. setTemplate() method is called by controller after including the actual template file from under the view directory. Point to note is that, before getTemplate() returns control back to controller (in case of missed or stale cache), the template cache class constructor does switch on output buffering. And when setTemplate() method is called, we can access this buffer using output functions like ob_get_contents() and then save the template for next incoming request. Bingo!. Finally this method throw away the buffer to the browser using ob_end_flush();

Is it working?
To verify the flow of framework, I hit the url http://localhost/test1.php 3 times, with $template_cache_ttl = 10; (seconds).

  1. Once after clearing the template cache folder
  2. Once within next 10 seconds
  3. And finally after 10 seconds

Here is how the log file looks like:

2009-08-16 19:50:49
Requested template name test1.php, path view/test1.php (1st REQUEST)

2009-08-16 19:50:49
Cache miss for template test1.php

2009-08-16 19:50:49
Returning test1data() from database

2009-08-16 19:50:49
Caching template test1.php

2009-08-16 19:50:54
Requested template name test1.php, path view/test1.php (2nd REQUEST)

2009-08-16 19:50:54
Cache hit for template test1.php

2009-08-16 19:51:03
Requested template name test1.php, path view/test1.php (3rd REQUEST)

2009-08-16 19:51:03
Cache stale for template test1.php

2009-08-16 19:51:03
Returning test1data() from database

2009-08-16 19:51:03
Caching template test1.php

Moving forward, What’s Next? Extending template.cache.class.php
Template cache class can be extended to do a lot more, other than caching the template files. For instance we might want to perform (chip-chop, append, prepend etc) a few tasks, before we cache the final template and throw back to the browser. Few tasks which look quite obvious to me are:

  1. Short Codes: We can insert short codes in our HTML templates, which later on can be expanded into full fledged codes. e.g. For embedding a YouTube video, we can simply put something like [[YouTube yjPBkvYh-ss]] into test1.php. And in setTemplate() method we can call helper/plugin methods to process such short codes. More professionally, we can add hooks for various tasks we might want to perform before caching the template. Read How to add wordpress like add_filter hooks in your PHP framework for a more professional approach.
  2. Inserting page header and footer: Instead of including page header and footer inside test1.php, we can simply put our main <body> code inside test1.php. Then before caching the template file, we can append and prepend header and footer modules to the buffer of each page. Thus avoiding including the same header and footer files across various pages.
  3. HTML module caching: There are several instances where we can have a common module across all pages. For instance, I can have a events module across all my pages, which basically displays a calendar with various events for the week or month marked on it. The event details are extracted from the database. Since this module of mine is a static HTML chunk for atleast a week, I would like to have a difference cache for this module. Intelligently hooking up these modules with template caching engine, can allow us to do module level caching

I can probably write down 10-15 more such applications and probably there might be many more such applications of the above coded template caching engine. (Note: The power actually lies in Output Control Functions provided by PHP).

Let me know if you liked the post or any bug in it.

XML Parsing in PHP, XPATH way – The best I know so far

If you are a PHP developer, you surely must have done XML parsing at some stage or the other. Over the years I myself have implemented XML Parsing in atleast 3-4 different ways. Finally I have stuck to this approach which I personally find far more better than the rest, Not only because it’s quite simple but also because it’s extendable. By extendable I mean, you don’t have to touch your code if the XML structure changes at a later stage or if you need to parse a new node at a later stage in the project. In this blog post we will try to parse my twitter timeline, using the XPATH way. To start with let’s see how a my twitter timeline look like:

XML Source:
http://twitter.com/statuses/user_timeline/imoracle.xml
You may want to open this XML structure in a separate window of your browser for reference, as we walk through various XML parsing techniques.

Data Requirement:
Before we proceed to parse this twitter timeline, lets decide what all data do we want to extract out of the XML. Each <status></status> node consists of two parts. Information about the tweet and information about the user.

Lets finalize the following list of nodes which we want about the tweets and also their corresponding XPATH’s:

  1. id: ../statuses/status/id
  2. text: ../statuses/status/text
  3. source: ../statuses/status/source

Further lets zero out on list of nodes we want about the user details:

  1. id: ../statuses/status/user/id
  2. name: ../statuses/status/user/name
  3. screen_name: ../statuses/status/user/screen_name

XML Parsing:
Let us create a file called xpath.php, which will contain xpath of various nodes which we have finalized above. The xpath.php file will look like:

xpath.php

<?php

  $user_status = array(
                      'status_id' => '../statuses/status/id',
                      'status_text' => '../statuses/status/text',
                      'status_source' => '../statuses/status/source',
                      'user_id' => '../statuses/status/user/id',
                      'user_name' => '../statuses/status/user/name',
                      'user_screen_name' => '../statuses/status/user/screen_name'
                      );

?>

parser.php

<?php

  // include the xpath file
  require_once("xpath.php");

  // read the xml source as string
  $str = file_get_contents("imoracle.xml");

  // load the string as xml object
  $xml = simplexml_load_string($str);

  // initialize the return array
  $result = array();

  // parse the xml nodes
  foreach($user_status as $key => $xpath) {
    $values = $xml->xpath("{$xpath}");
    foreach($values as $value) {
      $result[$key][] = (string)$value;
    }
  }

  // print the return array
  print_r($result);

?>

Results:
If we try to print out this $result on a browser screen, here is how the result will look like:

Array
(
    [status_id] => Array
        (
            [0] => 2499838341
            [1] => 2499780899
            [2] => 2499724163
            [3] => 2499607183
        )

    [status_text] => Array
        (
            [0] => 13 Beautiful WordPress Showcase Sites
            [1] => 55 Really Creative And Unique Blog Design Showcase
            [2] => Need PHP symfony developer to complete tvguide.com clone
            [3] => 22 Open Source PHP Frameworks To Shorten Your Development Time
        )

    [status_source] => Array
        (
            [0] => <a href="http://apiwiki.twitter.com/">API</a>
            [1] => <a href="http://apiwiki.twitter.com/">API</a>
            [2] => <a href="http://apiwiki.twitter.com/">API</a>
            [3] => <a href="http://apiwiki.twitter.com/">API</a>
        )

    [user_id] => Array
        (
            [0] => 14574588
            [1] => 14574588
            [2] => 14574588
            [3] => 14574588
        )

    [user_name] => Array
        (
            [0] => Abhinav Singh
            [1] => Abhinav Singh
            [2] => Abhinav Singh
            [3] => Abhinav Singh
        )

    [user_screen_name] => Array
        (
            [0] => imoracle
            [1] => imoracle
            [2] => imoracle
            [3] => imoracle
        )

)

The Best Part:
The best part of this approach is that, suppose in future our project demands extraction of the following nodes too:

  1. truncated: ../statuses/status/truncated
  2. favorited: ../statuses/status/truncated ../statuses/status/favorited
  3. location: ../statuses/status/user/location
  4. description: ../statuses/status/user/description

All we need to do is, simply add these xpaths in xpath.php file, without having to change the parser.php file. In case of a project you may want to create a function or a class out of the parser.php file so that you can request data from that.

Download the source code from here:
http://abhinavsingh.googlecode.com/files/xml-parser-xpath-way.rar

If you liked the post, do not forget to leave a comment and follow me on twitter.
Do let me know of better methods if you know any. Happy XML Parsing!

Facebook type image rotation and more using PHP and Javascript

If you are a facebook geek like me, you must have noticed till now the image rotate functionality in the photo albums. Facebook allows you to rotate images 90 degree clockwise and anti-clockwise after image upload. If you haven’t tried that till now, below is a screenshot for your convenience.

facebook-image-rotate

Question:
But the question is how does facebook team succeed doing this in one click. Today I tried looking around for a solution over internet and I came across the inbuilt imagerotate functionality in PHP.

Problem:
Unfortunately the problem is that even if you have GD Library extension enabled in PHP, imagerotate function just doesn’t work. After some research I found that to enable imagerotate function inside PHP, you need to compile and build PHP manually and enable imagerotate while installing PHP. You can check your PHP installation support for imagerotate using the following command line:

php -r "var_dump(function_exists('imagerotate'));"

Solution:
Since I didn’t want to touch my current installation of PHP, ImageMagick came to my rescue. Below I will show you how did I achieve cloning facebook’s 90 degree rotation functionality and also added a custom degree rotation (functionality usually seen in collage tools).

Demo:
You can try a live demo of the application here:
http://abhinavsingh.com/webdemos/imagerotate/

facebook-image-rotate-using-php-javascript

Source Code:
Before you go ahead and try this demo, you will need to install imagemagick on your system. On debian and ubuntu this can be achieved by the following command:

apt-get install imagemagick

To make sure installation is alright, run the following command from the shell:

convert -version

You should see something like this:

Version: ImageMagick 6.2.4 02/10/07 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2005 ImageMagick Studio LLC

Convert is a command line utility provided by imagemagick. Read more about convert here.

The source code consists of the following files and folders:

  • index.php : Generates the Frontend UI part for the application
  • action.php : Responsible for handling requests and rotating image using imagemagick library
  • style.css : Used for styling the UI
  • script.js : Used for handling the horizontal slider
  • images : This folder contains all the required images for the application. The image which we will be rotating left and right is “me.jpg”

action.php

  // Image Path
  $image = "images/me.jpg";
  $original = "images/me-original.jpg";

  if($_POST['restore']) {
    exec("cp ".$original." ".$image);
  }
  else if($_GET['left'] == 1 || $_GET['right'] == 1 || $_POST['degree']) {
    // Rotation Degree
    if(isset($_GET['left'])) $degree = -90;
    if(isset($_GET['right'])) $degree = 90;
    if(isset($_POST['degree'])) $degree = $_POST['degree'];

    // rotate image
    if(is_numeric($degree)) exec("convert ".$image." -rotate ".$degree." ".$image);
  }

From the UI, a user can pass 4 kind of requests:

  • Rotate Image 90 degree clockwise by clicking the button on bottom right corner
  • Rotate Image 90 degree anti-clockwise by clicking the button on bottom right corner
  • Rotate Image x degree, by choosing the value of x using horizontal slider on bottom left
  • Restore the image back to the original self

action.php handles the incoming 4 cases, calculates $degree to rotate and passes as a parameters to the command line utility provided by imagemagick.

Download Code:
Download the complete source code from here:
http://abhinavsingh.googlecode.com/files/image-rotate-application.rar
Unzip into a folder, and make the images directory writable.
PS: Application not tested on IE browser

Don’t forget to share and leave a comment if you liked the post.
Cheers!

MySQL Query Cache, WP-Cache, APC, Memcache – What to choose

Hello Cache Freaks,

Ever since I changed my job (from Business Intelligence to Web development) and started working with my present employer, I have had a chance to work on a lot of scalable projects. From making my project to scale from 20 Million PV’s to 100 Million PV’s to development of an internal tool, the answer to all scalable applications have been caching.

There are a lot of caching techniques which are being employed by sites worldwide.

  1. WP-Cache used in wordpress – a file system based caching mechanism
  2. APC Cache – an opcode based caching system
  3. Memcache – an in memory caching system
  4. Query Cache – caching mechanism employed in MySQL

Here in this post I would like to pen down my experiences while working with all the caching mechanism. Their pros and cons. What things you need to take care while working with them and every little tit bit which comes to my mind while writing this post.

Query Cache – inbuilt cache mechanism in MySQL
Query cache is an inbuilt cache mechanism for MySQL. Basically when you fire a query against a MySQL database, it goes through a lot of core modules. e.g. Connection Manager, Thread Manager, Connection Thread, User Authentication Module, Parse Module, Command Dispatcher, Optimizer Module, Table Manager, Query Cache Module and blah blah. Discussing these modules is out of scope of this blog post. The only module we are interested here is Query Cache Module.

Suppose I fire a query:
$query = “Select * from users”;

MySQL fetches it for the first time from the database and caches the result for further similar query. Next time when you fire a similar query, it picks the result from the cache and deliver it back to you.

However, there are a few drawbacks with Query Cache:

  • If there is a new entry in the users table, the cache is cleared.
  • Secondly, even if the result of your query is cached, MySQL has to go through a number of core modules before it give back the cached result to you.
  • Thirdly, even if your results are caches, you need to connect to your MySQL database, which generally have a bottleneck with number of connections allowed.

One thing which you should take care while replying on Query Cache is that, your query must not have any parameter which is random or changes quite often. For e.g. If you wish to fetch url’s of 100 photo from the database, and then you want to present them in a random fashion every time, you might be tempted to use rand() somewhere in your MySQL queries. However by using rand() you ensure that the query is never returned from cache, since rand() parameter always makes the query look different.

Similarly, If you have a case where you need to show data not older than 15 days, and by mistake you also include the seconds parameter in your SQL query, then the query will never return from cache.

Hence for a scalable site, with 100 Million PV’s you can’t really survive with a simple query cache provided by MySQL database.

WP-Cache – Caching mechanism for WordPress blogs
WP-Cache is a file system based caching mechanism i.e. it caches your blog posts in form of simple text files which are saved on your file system. You can have a look at these cached files by visiting wp-content/cache folder inside your blog directory. Generally you will find two set of files for a single blog post. One .html and another .meta file.

.html file generally contains the static html content for your blog post. Once published, the blog post content is static, hence instead of fetching it’s data from the database, WP-Cache serves it from the cache directory.

.meta file contains serialized information such as Last Modified, Content-Type, URI which WP-Cache uses to maintain cache expiry on your blog.

WP-Cache works really well, however if the traffic starts increasing on your blog, then the bottleneck will be maximum number of child processes which apache can create (For starters you can think, each user connecting to your blog as one apache process, hence there is a restriction on number of users who can connect to your blog at a particular time). In such case the solution can be either to have multiple front end servers with a load balancer to distribute the traffic among front end servers, or to have a better cache solution such as a memory based caching mechanism (e.g. Memcache). Also since memory read is always faster than file read, you must go for a memory based cache system.

APC Cache – An opcode based cache for PHP
APC stands for Alternative PHP Cache. In one of my previous post How does PHP echo’s a “Hello World”? – Behind the scene, I talk about how PHP churns out “Hello World” for you.

PHP takes your code through various steps:

  1. Scanning – The human readable source code is turned into tokens.
  2. Parsing – Groups of tokens are collected into simple, meaningful expressions.
  3. Compilation – Expressions are translated into instruction (opcodes)
  4. Execution – Opcode stacks are processed (one opcode at a time) to perform the scripted tasks.

Opcode caches let the ZEND engine perform the first three of these steps, then store that compiled form (opcodes) so that the next time a given script is used, it can use the stored version without having to redo those steps only to come to the same result.

However the problem with APC cache is that it is not a distributed cache system. By distributed cache I mean, if you have 3 frontend server then you need to have a copy of this opcode on all the three fronend server. Also like WP-Cache APC is again a file system driven cache system, which is not the optimal solution. Also with APC cache, PHP still has to go through the last step as described above.

Memcache – In memory based cache mechanism
Memcache is the solution when you talk about million PV’s on your site. It is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

For starters, Memcache is not PHP, nor Python or any other thing as you may think. It’s a deamon which runs in the background on your server. Your code connects to it and cache query results, JS, CSS and other cachable data in the server’s memory (RAM). Since its an in-memory caching system, it is faster than any of the discussed caching system above. However it is unreliable and unsecure but then there are ways to tackle this unreliable and unsecure nature of memcache.

Memcache is unreliable because it resides in your system memory. So an event like system reboot or power failure will result in loss of all your cache. Also memcache provides no mechanism to take backup of your caches. Hence once lost, you need to programmatically warmup your caches.

Memcache is unsecure because it doesn’t require any authentication mechanism. There is no username or password with which your code connects to it. (Hence it is super fast, unlike Query cache which has to go through auth module even if the query result is cached). It usually runs at port 11211 on your server and if not taken care, anyone can telnet to port 11211 on your server and steal your caches.

Below are the steps which are being followed on a memcache enabled website:

  1. User enter your site’s url in his browser, say http://localhost
  2. There are about 6 queries which drives your opening page
  3. Lets assume one of the query for this page is $query = “SELECT photo_url from photos LIMIT 0,10”
  4. When the user visit http://localhost, your code will first connect to memcache deamon on port 11211
  5. If memcache deamon is running, it checks if the result of this query are already cached. Generally data is cached in memcache as (key,value) pair
  6. Since this is the first visit on your site, ofcourse there is no query being cached till now. Hence your code now connect to your MySQL database and fetched the resultset. Something like this probably. $resultset = mysql_query($query);
  7. After successfully fetching the resultset, your code will cache this resultset in memcache. The code connects to memcache deamon and saves this (key,value) pair in memory, where $key = md5($query) and $value = serialize($resultset)
  8. After caching the (key,value) pair, your code returns back the fetched resultset to the frontpage where it is being displayed to the user
  9. Similarly all the 6 queries which drives your front page are being cached one by one
  10. Now the next time when another user visit this page i.e. http://localhost, your code will first see if $key = md5($query) is already present in cache. It will find the $key being cached, fetches the serialized resultset from memory, unserialize it and throws back to the front page where it is displayed as intended
  11. While caching (key,value) pair you also have a provision to specify a TTL (time to live) after which memcache will automatically expire your cached result
  12. Suppose you specifies a TTL = 15 Minutes for all the above queries and now a visitor visit http://localhost after 30 minutes
  13. Your code will again first connect to memcache deamon and check if $key = md5($query) is present in cache. Memcache deamon will see that this $key is present but it has expired. It will return a result saying $key is not cached and internally flushes out $key. Your code then again connect to MySQL database, fetches the resultset and cache back the results in memcache for further use

I will leave you with a presentation on memcache which I gave sometime back at office. I hope it will help you gain more understanding of memcache.

Memcache

View SlideShare presentation or Upload your own. (tags: caching memcache)

In my next posts, I will be covering a few code samples and use cases of memcache which you wouldn’t even heard of. If you liked the post don’t forget to promote it on social networking sites and do subscribe to my blog. 🙂

PHP Extensions – How and Why?

In this short post we will quickly see:

  1. How to write PHP extensions?
  2. Why to write PHP extensions?

However before you could understand what we are going to disucss, I will recommend you to read one of  my previous post How does PHP echo’s a “Hello World”? – Behind the scene . In this post I discussed in brief the backend architecture of PHP.

Assuming you have read the previous post, lets discuss on how to build our first PHP extension:

  1. Every PHP extension is built out of minimum of 2 files.
  2. Configuration file (config.m4) which tells us what files to build and what external libraries are needed.
  3. Source File(s) which will contain the actual functionalities provided by the extension.

Building a sample extension skeleton
Lets start with building and understanding a sample extension skeleton. Then we will move ahead with building our first PHP extension:

config.m4

PHP_ARG_ENABLE(sample,
        [Whether to enable the "sample" extension],
        [-enable-sample  Enable "sample" extension support])
if test $PHP_SAMPLE != "no"; then
        PHP_SUBST(SAMPLE_SHARED_LIBADD)
        PHP_NEW_EXTENSION(sample,sample.c,$ext_shared)  // 1st argument declares the module
                                                        // 2nd tells what all files to compile
                                                        // $ext_shared is counterpart of PHP_SUBST()
fi

I found a number of articles on internet which gives you code for your first PHP extension but none of them go ahead and explain each and every word in those codes. Lets give an attempt in understanding every bit of this strange config file.

  1. This is a minimalistic config file which is required for an extension
  2. The first parameter to PHP_ARG_ENABLE(), sets up a ./configure option called -enable-sample
  3. The second parameter to PHP_ARG_ENABLE() will be displayed during the ./configure process as it reaches this configuration file
  4. Third parameter will be displayed as an option if end user issues ./configure -help

For Newbies: Wondering what is this ./configure option? Kindly read PHP: Installation on Unix System for details.

Lets understand the remaining part of the config.m4 file:

  1. To compile an extension we follow 3 steps: (i) phpize (ii) ./configure -enable-sample (iii) make
  2. When we call ./configure -enable-sample in step (ii), a local environmental variable $PHP_SAMPLE is set to yes. (PS: If our extension name was Hello, then $PHP_HELLO would have been set to yes)
  3. PHP_SUBST() is a MACRO similar to AC_SUBST() in C and is necessary to build the extension as a shared module
  4. PHP_NEW_EXTENSION() declares the module and tell source files that must be compiled as part of the extension. $ext_shared is a counterpart of PHP_SUBST() and is necessary for buildin an extension as a shared module

(PS: We only have a single source file i.e. sample.c for this extension. If in case we had more than a single source file then last line of config.m4 would have been something like this: PHP_NEW_EXTENSION(sample,sample1.c sample2.c sample3.c,$ext_shared) and so on)

Now lets build our source file skeleton. Let’s segregate certain type of data in a header file, which we will finally include in sample.c file. This is generally a good practice rather than maintaining a single source file.

php_sample.h

#ifndef PHP_SAMPLE_H
  #define PHP_SAMPLE_H
  #define PHP_SAMPLE_EXTNAME "sample"
  #define PHP_SAMPLE_EXTVER "1.0"

  #ifdef HAVE_CONFIG_H
    #include "config.h"
  #endif

  #include "php.h"
  extern zend_module_entry sample_module_entry;
  #define phpext_sample_ptr &sample_module_entry
#endif

Do not leave this page on seeing this code. It’s all very simple if you have ever written some code in C.
All that this file wants to do is:

  1. config.h file is included when compiled using phpize tool
  2. It also includes php.h from the PHP source tree. With inclusion of php.h, many other .h files also gets included and hence making available a lot of PHP API’s, which can be used by this extension
  3. The zend_module_entry struct is defined as extern so that it can be picked up by ZEND engine using dlopen() and dlsym() functions, when the module loads.

sample.c

#include "php_sample.h"

zend_module_entry sample_module_entry = {
  #if ZEND_MODULE_API_NO >= 20010901
    STANDARD_MODULE_HEADER,        // Roughly means if PHP Version > 4.2.0
  #endif
    PHP_SAMPLE_EXTNAME,        // Define PHP extension name
    NULL,        /* Functions */
    NULL,        /* MINIT */
    NULL,        /* MSHUTDOWN */
    NULL,        /* RINIT */
    NULL,        /* RSHUTDOWN */
    NULL,        /* MINFO */
  #if ZEND_MODULE_API_NO >= 20010901
    PHP_SAMPLE_EXTVER,        // Roughly means if PHP Version > 4.2.0
  #endif
    STANDARD_MODULE_PROPERTIES
};
#ifdef COMPILE_DL_SAMPLE
  ZEND_GET_MODULE(sample)      // Common for all PHP extensions which are build as shared modules
#endif


Thats it! We have our first PHP extension ready. Compile this module as discussed above i.e.
(i) phpize
(ii) ./configure -enable-sample
(iii) make
check your phpinfo() and see if you have an extension called “sample” loaded successfully or not.

Though this extension is capable of doing nothing, but the skeleton here is the base for every PHP extension. Lets recap in short what has happened till now:

RECAP

  1. config.m4 file is the configuration file for extension
  2. It declared the extension, tells what all files are required for the extension to build, add a few ./configure -help options too
  3. On the other hand sample.c and php_sample.h are the main source files.
  4. The header file includes the config.h and php.h header files from PHP source tree, which additionally provides a number of PHP API’s which can be used
  5. As discussed in last blog post, every extension have the following modules: MINIT, RINIT, RSHUTDOWN, MSHUTDOWN. sample.c helps in telling PHP, which part of the code corresponds to the above module
  6. For our extension “sample” we have defined NULL as MINIT, RINIT, RSHUTDOWN and MSHUTDOWN and hence this module isn’t capable of doing anything

Building a Hello World Extension
To build an extension which actually do something, we will need to just tweak the abiove skeleton. Here we are trying to build an extension which will provide us with a function called sample_hello_world(), which we can use directly in our php codes to output Hello World!

Quickest link between userspace and extension code is the PHP_FUNCTION(). Start by adding the following code block near the top of sample.c file just after
#include “php_sample.h”

PHP_FUNCTION(sample_hello_world) {
  php_printf("Hello World!n");
}

PHP_FUNCTION() is basically a MACRO which expands internally. (I will skip this expansion as of now to keep this post as simple as possible)

But simply declaring the function isn’t enough. The ZE needs to know the address of the function as well as how the function name should be exported to the userspace. Place the following block of code immediately after PHP_FUNCTION() block:

static function_entry php_sample_functions[] = {
    PHP_FE(sample_hello_world,NULL)
    {NULL,NULL,NULL}
};

The php_sample_functions vector is a NULL terminated vector that will grow as we continue to add more functionality to sample extension. Every function we export will appear as an item in this vector.

PHP_FE(sample_hello_world,NULL) expands to {sample_hello_world, zif_sample_hello_world, NULL} and hence providing a name and an address to implement it.

Finally simply go to the sample_module_entry struct and replace:
NULL /* functions */ with
php_sample_functions /* functions */

Now simply rebuild the extension and then try this on command line:
$ php -r ‘sample_hello_world();’

If everything was done perfectly, you would see “Hello World!” output on the shell.

PS: In this tutorial I have tried to explain each and every line which is involved in making a hello world php extension. However I am sure that many questions are still un-answered. Feel free to ask any doubt or correct me in case I have made a blunder while penning this down.

How does PHP echo’s a “Hello World”? – Behind the scene

Have you ever wondered how PHP echo’s a “Hello World” for you on the browser? Even I didn’t until I read about the PHP internals and extensions. I thought may be a few out there will be interested in exploring the other side of PHP, so here we go.

In my last post I discussed in brief “How your browser reaches to my server when you type http://abhinavsingh.com in address bar?”. Read through if you have missed out on that. Here I will discuss in brief “How does PHP churns out the content requested on the webpage?”

An Overview
Here is what happens step-wise:

  1. We never start any PHP daemon or anything by ourself. When we start Apache, it starts the PHP interpreter along itself
  2. PHP is linked to Apache (In general term SAPI i.e. a Server API) using mod_php5.so module
  3. PHP as a whole consists of 3 modules (Core PHP, Zend Engine and Extension Layer)
  4. Core PHP is the module which handles the requests, file streams, error handling and other such operations
  5. Zend Engine(ZE) is the one which converts human readable code into machine understandable tokens/op-codes. Then it executes this generate code into a Virtual Machine.
  6. Extensions are a bunch of functions, classes, streams made available to the PHP scripts, which can be used to perform certain tasks. For example, we need mysql extension to connect to MySQL database using PHP.
  7. While Zend Engine executes the generated code, the script might require access to a few extensions. Then ZE passes the control to the extension module/layer which transfer back the control to ZE after completion of tasks.
  8. Finally Zend Engine returns back the result to PHP Core, which gives that to SAPI layer, and finally which displays it on your browser.

A Step Deeper
But wait! This is still not over yet. Above was just a high level flow diagram. Lets dig a step deeper and see what more is happening behind the scenes:

  1. When we start Apache, it also starts PHP interpreter along itself
  2. PHP startup happens in 2 steps
  3. 1st step is to perform initial setup of structures and values that persists for the life of SAPI
  4. 2nd step is for transient settings that only last for a single page request

Step 1 of PHP Startup
Confused over what’s step 1 and 2 ? No worries, next we will discuss the same in a bit more detail. Lets first see step 1, which is basically the main stuff.
Remember step 1 happens even before any page request is being made.

  1. As we start Apache, it starts PHP interpreter
  2. PHP calls MINIT method of each extension, which is being enabled. View your php.ini file to see the modules which are being enabled by default
  3. MINIT refers to Module Initialization. Each Module Initialization method initializes and define a set of functions, classes which will be used by future page requests

A typical MINIT method looks like:

      PHP_MINIT_FUNCTION(extension_name) {

          /* Initialize functions, classes etc */

      }

Step 2 of PHP Startup

  1. When a page request is being made, SAPI layer gives control to PHP layer. PHP then set up an environment to execute the PHP page requested. In turn it also create a symbol table which will store various variables being used while executing this page.
  2. PHP then calls the RINIT method of each module. RINIT refers to Request Initialization Module. Classic example of RINIT module implementation is the Session’s module. If enabled in php.ini, the RINIT method of Sessions module will pre-populate the $_SESSION variable and save in the symbol table.
  3. RINIT method can be thought as an auto_prepend_file directive, which is pre-appended to every PHP script before execution.

A typical RINIT method looks like:

      PHP_RINIT_FUNCTION(extension_name) {

          /* Initialize session variables, pre-populate variables, redefine global variables etc */

      }

Step 1 of PHP Shutdown
Just like PHP startup, shutdown also happens in 2 steps.

  1. After the page execution is complete either by reaching the end of the script or by call of any exit() or die() function, PHP starts the cleanup process. In turn it calls RSHUTDOWN method of every extension. RSHUTDOWN can be thought as auto_append_file directive to every PHP script, which no matter what happens, is always executed.
  2. RSHUTDOWN method, destroys the symbols table (memory management) by calling unset() on all variables in the symbols table

A typical RSHUTDOWN method looks like:

      PHP_RSHUTDOWN_FUNCTION(extension_name) {

          /* Do memory management, unset all variables used in the last PHP call etc */

      }

Step 2 of PHP Shutdown
Finally when all requests has been made and SAPI is ready to shutdown, PHP call its 2nd step of shutdown process.

  1. PHP calls the MSHUTDOWN method of every extension, which is basically the last chance for every extension to unregister handlers and free any persistent memory allocated during the MINIT cycle.

A typical RSHUTDOWN method looks like:

      PHP_MSHUTDOWN_FUNCTION(extension_name) {

          /* Free handlers and persistent memory etc */

      }

And that brings us to the end of what we can call as PHP Lifecycle. Important point to note is that Step 1 of Startup and Step 2 of Shutdown happens when no request is being made to the web servers.

I hope this post will clear many doubts and un-answered questions which you might have.

Do leave a comment and feedbacks.

What happens before you finally viewed this page?

Since past 1 week or so I am trying to make a small tiny light weight web server of my own, and for the same I have been referring to dozens of papers, websites, people and what not. I still haven’t finished making one, I am still toggling between Java and Python, since both seems to satisfy my needs and unfortunately I am master in none. While I was digging deep into the theory of how can I make a web server, I cleared many other concepts of mine, which finally leads to this post.

“What happens before you finally viewed this page/post of mine ?” – Many of you might know behind the scene stories and many like me might have a vague idea. But for the good of all those who don’t know and for people like me who needs a reference, I thought of better penning it down.

Lets see what possibly is going in the background:

  1. Suppose you type in http://www.yahoo.com in your web browser or clicked this link on some other page.
  2. Your browser see’s the above URL, and identifies it as a HTTP protocol.
  3. Then it breaks the URL into Protocol, Domain Name, File Name (in above case no file name is specified)
  4. Browser contacts its default DNS (Domain Name Server), which helps it with an IP Address. DNS is a huge distributed database which contains mapping of URL’s to IP Address. Your browser’s default DNS might or might not have the required mapping of URL to IPAddress.
  5. If it don’t have the IPAddress corresponding to http://www.yahoo.com , it will try to contact other root name servers for the required IPAddress.
  6. If it already have the IPAddress corresponding to http://www.yahoo.com (which is possible if a similar request has already been made recently) it will provide the required IPAddress to your browser.

And finally your browser then connects to the IPAddress it received and servers you the page returned back by Yahoo!. If you are still not clear about the process, following practical example might help you.

  1. I thought of starting a website and decided to name it as http://gtalkbots.com
  2. Firstly, I needed to register domain name with a registrant. For e.g. In my case I blocked the domain name gtalkbots.com with godaddy.com.
  3. Secondly, I needed to have a machine i.e. a place where I can have all my HTML, PHP files. So I bought a VPS (Virtual Private Server)
  4. Thirdly, I had to link the above two i.e. when someone types in http://gtalkbots.com in his browser, it should come to my VPS where I have my HTML files.
  5. For linking the domain name and my VPS, I simply make an ARecord entry at godaddy, telling it the IPAddress of my VPS. In turn when you type in http://gtalkbots.com in your browser, GoDaddy redirects you to my VPS. (Google for ARecord and CNAME for more details)
  6. Last 4 entries in the image above are for linking my gtalkbots.com domain with my google apps account, so that I can send and receive my gtalkbots.com email on google apps.

There is a lot more to it, however the info above is more than sufficient to understand the basics of
“What happens before you finally viewed this page/post of mine ?”

Symfony model layer tips and tricks – A PHP Framework – Part 3

I had a tough time searching for examples over the net for all kind of symfony queries. Though symfony is well documented here, I didn’t find enough examples there.

If you have landed up on this post straight and are new to symfony, you may want to first read previous posts:
Getting started with Symfony – A PHP Framework – Part 1
How to build a login-registration system using Symfony – A PHP Framework – Part 2

If you are well equipped with the basics, read on

Select Query:
Lets start with the few basic example queries.

  1. Extract list of all users from the users table

    $c = new Criteria();
    $this->resultSet = UsersPeer::doSelect($c);

    Now lets see how can we use this $resultset within the model layer and in the template files.
    To use the resultset within model layer, we can do something like this:

    foreach($this->resultset as $rs) {
        $uid = $ps->getId();
        $fname = $ps->getFirstName();
        $lname = $ps->getLastName();
    }

    To use the resultset in template files (viewer layer), we can do something like this:

    foreach($resultset as $rs) {
        $uid = $ps->getId();
        $fname = $ps->getFirstName();
        $lname = $ps->getLastName();
    }

  2. Extract list of first 10 users only

    $c = new Criteria();
    $c->setLimit(10);
    $this->resultSet = UsersPeer::doSelect($c);

  3. Extract list of 10 users starting from 100th user

    $c = new Criteria();
    $c->setOffset(100);
    $c->setLimit(10);
    $this->resultSet = UsersPeer::doSelect($c);

  4. Extract user data where username = ‘imoracle’

    $c = new Criteria();
    $c->add(UsersPeer::USERNAME,”imoracle”);           // Note USERNAME is all in caps, even if your column name is in small case
    $this->resultSet = UsersPeer::doSelect($c);

  5. Another approach to do a select query

    $c = new Criteria();
    $c->add(UsersPeer::USERNAME,”imoracle”);
    $resultSet = UsersPeer::doSelectRS($c);

    To access this resultset in the model layer itself, we can do something like this:

    while($resultSet->next()) {
        $uid = $resultSet->get(1);
        $fname = $resultSet->get(2);
        $lname = $resultSet->get(3);
    }

    Where get(1) fetches the 1st column for you of the resultset. Hence $resultSet->get(n) will fetch you the nth column of the resultSet.

Insert Query:
Lets see a few quick examples for inserting a new row in a table.

  1. Insert a user entry in user table

    $new_user = new Users();
    $new_user->setUserName($uname);
    $new_user->setFirstName($fname);
    $new_user->setLastName($lname);
    $new_user->setPassword(md5($pass));

    $new_user->save();
    $new_user_id = $new_user->getId();

    Finally we get the last inserted user row id.

Update Query:
Symfony provides nothing special for the update queries. The same above format works for updating a row as well. Symfony is smart enough to analyze whether its a select or update query.

  1. First Method for Update Query (Recommended)

    $c = new Criteria();
    $c->add(UsersPeer::USERNAME,”imoracle”);
    $c->add(UsersPeer::PASSWORD,$newpass);
    $res = UsersPeer::doUpdate($c);

    In the above example we chose to update the password for user with username “imoracle”.

  2. Another Method for Update Query

    $user = new User();
    $user->setNew(false);
    $user->setId($uid);
    $user->setPassword($newpass);
    $user->save();

    Important code in above 5 lines is the 2nd line. Without this line it will add a new row and not update. Also another drawback which I found in this approach is that in third line you need to give the PK (primary key) of the table. Like our 1st method you cannot give username in the third line.

    Finally I haven’t tried and tested this method enough. So I will not recommend this.

Delete Query:
Its basically the same above methods, with change of final call.

  1. Delete a user’s account where username = ‘imoracle’

    $c = new Criteria();
    $c->add(UsersPeer::USERNAME,’imoracle’);
    $res = UsersPeer::doDelete($c);

How to alter a table in symfony?

  1. Method 1 (Non-standard)
    A drawback which I have seen in symfony till now is that, when you try to alter a table by adding new columns in your schema.yml, It flushes the data from the table and gives you an altered table with no data. This is really bad, if you don’t know this behavior and move ahead altering the table. Though Symfony provider propel-dump-data to handle alter tables, but it is still quite buggy and slow for big databases. The clean and smooth way to achieve this is:

    1. Take a mysql dump with following options:
      mysqldump -h hostname -u username -p password -c -t dbname > dbname.sql
      Do not forget to use option -c and -t, without which it can land you in a problem
    2. Update your schema.yml file, with new columns which you want to add and run
      sudo symfony propel-build-all
    3. Restore the database content using following command:
      mysql -h hostname -u username -p password dbname < dbname.sql

    And you have your new altered table without any loss of data.

  2. Method 2 (Standard)
    If you are afraid of your data being flushed by the stupid symfony, then you may choose not to touch your database using propel at all. Here is what you can do in such a case.

    1. Update your schema.yml file and run
      sudo symfony propel-build-model
    2. Then manually go and alter the table in the database.

    Here propel-build-model will only rebuild the ORM layer to incorporate your schema.yml changes. However propel-build-all not only builds the ORM layer, but also actually creates the table in the database (flushing the data)

In future I shall keep adding more SQL queries on this post, as and when I write them using symfony. So keep glued to this one 😉