Setting up ejabberd 2.1.x development environment on Ubuntu

apt-get provides a convenient way of installing ejabberd on Ubuntu distributions. However, if you are an Erlang developer looking to write custom ejabberd modules, you might want to install ejabberd from the source code.

Checkout ejabberd source
To start with, let's grab the source code for the ejabberd 2.1.x branch:

  • sudo apt-get install git-core
  • git clone git://git.process-one.net/ejabberd/mainline.git ejabberd
  • cd ejabberd
  • git checkout -b 2.1.x origin/2.1.x
  • cd src

Installing pre-requisites
Let's set up the necessary prerequisites before compiling the ejabberd source code:

  • sudo apt-get install build-essential
  • sudo apt-get install automake autoconf
  • sudo apt-get install erlang erlang-manpages
  • sudo apt-get install libexpat1-dev zlib1g-dev libssl-dev

Compiling ejabberd source code
Compiling and installing is dead simple:

  • ./configure
  • make
  • sudo make install

Setting up administration account
By now we have the ejabberd server set up on our Ubuntu box. Let's set up the admin account:

  • sudo ejabberdctl start
  • sudo ejabberdctl register admin localhost password
  • sudo vim /etc/ejabberd/ejabberd.cfg
  • Add the following configuration lines if they are not already present:
    {acl, admin, {user, "admin", "localhost"}}.
    {loglevel, 5}. %% Log level during development
  • sudo ejabberdctl restart
  • Visit ejabberd admin panel: http://localhost:5280/admin
  • When prompted, enter admin@localhost as the username and the password you chose above when registering the admin account

Resolving startup errors
You might see a few error reports in the logs or elsewhere while starting the ejabberd server:

  • /sbin/ejabberdctl: 340: cannot create //var/lock/ejabberdctl/ejabberdctl-1: Directory nonexistent
  • =ERROR REPORT==== 2010-03-29 21:53:03 ===
    C(<0.995.0>:ejabberd_captcha:331) : The option captcha_cmd is not configured, but some module wants to use the CAPTCHA feature.
  • =INFO REPORT==== 2010-03-29 21:53:03 ===
    I(<0.780.0>:ejabberd_rdbms:37) : ejabberd has not been compiled with relational database support. Skipping database startup.

Here is how you can resolve the above errors:

  • sudo mkdir -p /var/lock/ejabberdctl
  • Add the following line to ejabberd.cfg:
    {captcha_cmd, "/lib/ejabberd/priv/bin/captcha.sh"}.

    Replace /lib/ejabberd/priv/bin/captcha.sh with the path to /ejabberd/tools/captcha.sh

Getting started with ejabberd development
Below are links to a few useful resources to get you started with erlang and ejabberd module development:

WordPress Toolbar v 2.2 : Custom toolbar url, Support for WPMU and bug fixes

The WordPress Toolbar plugin provides a Facebook/Digg-style toolbar for all outgoing links from your blog posts. The toolbar URL defaults to http://yourblog/wp-content/plugins/wordpress-toolbar/toolbar.php. With version 2.2, however, the blog admin can change the toolbar URL to http://yourblog/wordpress-toolbar/ through the admin panel. Other enhancements include cross-plugin compatibility and support for WPMU-hosted blogs. The full feature list is below.

What’s New?
Listed below are the new features and bug fixes released with v 2.2:

  1. Support for customizing toolbar url through admin panel
  2. Support for WPMU hosted blogs
  3. Support for removing “Get this Plugin” widget from the toolbar through admin panel
  4. Security fix for a possible XSS attack: parameters are now passed as an encoded hash string instead of plain text, and the toolbar page performs additional security checks
  5. Bug fix for cases where the plugin did not work as expected because of cross-plugin compatibility issues: the server-side toolbar logic has been replaced with client-side (jQuery) logic
  6. Bug fix to show sociable share icons and the tinyurl share link only for single posts and pages
  7. Bug fix for unrecognizable characters in the toolbar when the encoding of the hosted blog is different from UTF-8: the plugin now uses the hosted blog settings instead of hardcoded UTF-8

Also, the core plugin code has been restructured (it is object-oriented now) so that maintenance and support become easier and quicker.

Steps to customize the default toolbar URL
Enable the WordPress Toolbar v 2.2 plugin. Assuming you want to change the default toolbar URL from /wp-content/plugins/wordpress-toolbar/toolbar.php to /wordpress-toolbar, follow these steps:

  1. Enable Apache mod_rewrite
  2. Add AllowOverride All to your blog's virtual host config file and restart Apache
  3. Add the following rewrite rules to your blog's .htaccess file
    RewriteRule ^wordpress-toolbar$ wp-content/plugins/wordpress-toolbar/toolbar.php
    RewriteRule ^wordpress-toolbar/$ wp-content/plugins/wordpress-toolbar/toolbar.php
  4. If you have blogs hosted using WPMU, add the following rewrite rules to the .htaccess file
    RewriteRule ^wordpress-toolbar$ wp-content/plugins/wordpress-toolbar/toolbar.php
    RewriteRule ^wordpress-toolbar/$ wp-content/plugins/wordpress-toolbar/toolbar.php
    RewriteRule ^([0-9a-zA-Z-]+)/wordpress-toolbar$ $1/wp-content/plugins/wordpress-toolbar/toolbar.php
    RewriteRule ^([0-9a-zA-Z-]+)/wordpress-toolbar/$ $1/wp-content/plugins/wordpress-toolbar/toolbar.php
  5. Manually check that the rewrite rules are working. Open your custom toolbar URL and you should see a result similar to http://abhinavsingh.com/blog/wordpress-toolbar
  6. If for some reason you DO NOT see “Working! Though required parameters are missing.” on the toolbar page, the rewrite rules did not work as expected. You SHOULD fix the rewrite rules before you proceed with the setup
  7. Go to the WordPress admin and click "Wordpress Toolbar" under the Settings tab
  8. Enter your new custom toolbar URL [Screenshot: wordpress-toolbar-v-2.2-custom-toolbar-url-demo]
  9. Clear the cache and verify your toolbar

Enjoy, and let me know if you have issues installing the plugin on your host.

Writing your first facebook chat bot in PHP using Jaxl library

Today Facebook officially announced the availability of its chat through Jabber/XMPP clients. This is a big win for XMPP, with almost 400 million potential new users joining the XMPP club. In this post, I will demonstrate how to connect to the Facebook chat servers using the Jaxl client library in PHP. The same approach can be used to build custom chat bots for Facebook.

Creating your first facebook chat bot:
Follow these steps to run a Facebook chat bot:

  1. Download Jaxl or check out the latest code from trunk
    svn checkout http://jaxl.googlecode.com/svn/trunk/ jaxl-read-only
  2. Edit the configuration file config.ini.php as follows:
      // Set an environment
      $env = "prod";
    
      $key = array("prod"=>array("user"=>"facebook_username",
                                 "pass"=>"facebook_password",
                                 "host"=>"chat.facebook.com",
                                 "port"=>5222,
                                 "domain"=>"chat.facebook.com"
                                ),
  3. Run from command line:
    abhinavsingh@abhinavsingh-desktop:/jaxl$ sudo php index.php
    OSType: Linux, Registering shutdown for SIGINT and SIGTERM
    OpenSSL: Enabled for CLI
    Attempting DIGEST-MD5 Authentication...
    Starting Session...
    Requesting Feature List...
    Requesting Roster List...
    Setting Status...
    Done
    

Send a message to your running chat bot and you should receive a default reply from the bot saying “Hi, Thanks for your message”.

See further sample code and explanations on how to build full-fledged gaming chat bots under the xmpp category.

MEMQ : Fast queue implementation using Memcached and PHP only

Memcached is a scalable caching solution developed by Danga Interactive. One can do a lot of cool things with memcached, including spam control, online/offline detection of users, and building scalable web services. In this post, I will demonstrate and explain how to implement fast, scalable queues in PHP using memcached.

MEMQ: Overview
Every queue is uniquely identified by its name. Let’s consider a queue named “foo” and see how MEMQ will implement it inside memcached:

  • Two keys, foo_head and foo_tail, hold meta information about the queue
  • While queuing, the item is saved in key foo_1234, where 1234 is the current value of key foo_tail
  • While de-queuing, the item saved in key foo_123 is returned, where 123 is the current value of key foo_head
  • The values of foo_head and foo_tail start at 1 and are incremented on every pop and push operation respectively
  • The value of foo_head NEVER exceeds the value of foo_tail. When the two meta keys hold the same value, the queue is considered empty.
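
To make the bookkeeping concrete, here is a minimal in-memory sketch of the head/tail scheme described in the bullets above. It assumes a plain PHP array standing in for memcached; the class name ArrayQueueSketch and its methods are hypothetical and are not part of MEMQ.

```php
<?php
// Hypothetical stand-in: a PHP array plays the role of memcached.
class ArrayQueueSketch {
    private $store = array();

    public function enqueue($queue, $item) {
        // Both meta keys start at 1 the first time a queue is used.
        if (!isset($this->store[$queue . '_tail'])) {
            $this->store[$queue . '_head'] = 1;
            $this->store[$queue . '_tail'] = 1;
        }
        // Save the item under the current tail value, then advance the tail.
        $id = $this->store[$queue . '_tail'];
        $this->store[$queue . '_' . $id] = $item;
        $this->store[$queue . '_tail'] = $id + 1;
        return $id;
    }

    public function dequeue($queue) {
        if ($this->isEmpty($queue)) {
            return false;
        }
        // Return the item under the current head value, then advance the head.
        $id = $this->store[$queue . '_head'];
        $this->store[$queue . '_head'] = $id + 1;
        return $this->store[$queue . '_' . $id];
    }

    public function isEmpty($queue) {
        // Missing keys or equal head and tail mean there is nothing to pop.
        return !isset($this->store[$queue . '_tail'])
            || $this->store[$queue . '_head'] === $this->store[$queue . '_tail'];
    }
}

$q = new ArrayQueueSketch();
$q->enqueue('foo', 'first');
$q->enqueue('foo', 'second');
var_dump($q->dequeue('foo')); // string(5) "first"
```

Unlike memcached, this array version is not safe across processes; it only illustrates how the two counters chase each other.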

MEMQ: Code
Get the source code from GitHub:
http://github.com/abhinavsingh/memq

<?php

	define('MEMQ_POOL', 'localhost:11211');
	define('MEMQ_TTL', 0);

	class MEMQ {

		private static $mem = NULL;

		private function __construct() {}

		private function __clone() {}

		private static function getInstance() {
			if(!self::$mem) self::init();
			return self::$mem;
		}

		private static function init() {
			$mem = new Memcached;
			$servers = explode(",", MEMQ_POOL);
			foreach($servers as $server) {
				list($host, $port) = explode(":", $server);
				$mem->addServer($host, $port);
			}
			self::$mem = $mem;
		}

		public static function is_empty($queue) {
			$mem = self::getInstance();
			$head = $mem->get($queue."_head");
			$tail = $mem->get($queue."_tail");

			if($head >= $tail || $head === FALSE || $tail === FALSE)
				return TRUE;
			else
				return FALSE;
		}

		public static function dequeue($queue, $after_id=FALSE, $till_id=FALSE) {
			$mem = self::getInstance();

			if($after_id === FALSE && $till_id === FALSE) {
				$tail = $mem->get($queue."_tail");
				if(($id = $mem->increment($queue."_head")) === FALSE)
					return FALSE;

				if($id <= $tail) {
					return $mem->get($queue."_".($id-1));
				}
				else {
					$mem->decrement($queue."_head");
					return FALSE;
				}
			}
			else if($after_id !== FALSE && $till_id === FALSE) {
				$till_id = $mem->get($queue."_tail");
			}

			$item_keys = array();
			for($i=$after_id+1; $i<=$till_id; $i++)
				$item_keys[] = $queue."_".$i;
			$null = NULL;

			return $mem->getMulti($item_keys, $null, Memcached::GET_PRESERVE_ORDER);
		}

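		public static function enqueue($queue, $item) {
			$mem = self::getInstance();

			// foo_tail holds the next free slot; increment() returns FALSE
			// when the key does not exist yet, i.e. on the very first push
			$id = $mem->increment($queue."_tail");
			if($id === FALSE) {
				if($mem->add($queue."_tail", 2, MEMQ_TTL) === FALSE) {
					// another process created the queue first, so retry
					$id = $mem->increment($queue."_tail");
					if($id === FALSE)
						return FALSE;
				}
				else {
					$id = 2;
					$mem->add($queue."_head", 1, MEMQ_TTL);
				}
			}

			// the item goes into the slot foo_tail pointed at before the increment,
			// so that dequeue() can still reach the last item pushed
			$id = $id - 1;
			if($mem->add($queue."_".$id, $item, MEMQ_TTL) === FALSE)
				return FALSE;

			return $id;
		}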

	}

?>

MEMQ: Usage
The class file provides 3 methods which can be used to implement queues:

  1. MEMQ::is_empty – Returns TRUE if a queue is empty, otherwise FALSE
  2. MEMQ::enqueue – Queue up the passed item
  3. MEMQ::dequeue – De-queue an item from the queue

Specifically MEMQ::dequeue can run in two modes depending upon the parameters passed, as defined below:

  1. $queue: This is a MUST for dequeue to work. If the optional parameters are not passed, the item at the front of the queue is returned
  2. $after_id: If this parameter is also passed, all items after $after_id up to the end of the queue are returned
  3. $till_id: If this parameter is passed along with $after_id, dequeue acts like a popRange function

Whenever the optional parameters are passed, MEMQ does not remove the returned items from the queue.

MEMQ: Is it working?
Add the following line of code at the end of the above class file and open the class file in your browser, passing the queue name as the q parameter (e.g. ?q=foo). You will get back the id of the inserted item as the response:

var_dump(MEMQ::enqueue($_GET['q'], time()));

Let's see what the cache keys look like in memcached:

abhinavsingh@abhinavsingh-desktop:~$ telnet localhost 11211
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

get foo_head
VALUE foo_head 1 1
1
END

get foo_tail
VALUE foo_tail 1 1
2
END

get foo_1
VALUE foo_1 1 10
1265540583
END

get foo_2
VALUE foo_2 1 10
1265540585
END

MEMQ: Benchmark
Below are the benchmarking results for varying load:

  1. Queuing performance: 697.18 req/sec (n=1000, c=100) and 258.64 req/sec (n=5000, c=500)
  2. Dequeue performance: 641.27 req/sec (n=1000, c=100) and 242.87 req/sec (n=5000, c=500)

MEMQ: Why and other alternatives
There are several open source alternatives which provide a lot more scalability. However, MEMQ was written because my application doesn’t expect a load on the order of 10,000 hits/sec. Listed below are a few open source alternatives for applications expecting high load:

  1. ActiveMQ: A reliable and fast solution under the Apache Foundation
  2. RabbitMQ: Another reliable solution, based on the AMQP protocol
  3. Memcacheq: A mash-up of two very stable stacks, namely memcached and Berkeley DB. However, its installation is a bit tricky.

MEMQ: Mantra and Customization
At its core, the MEMQ implementation can be visualized as follows:

There is a race between two keys in memcached (foo_head and foo_tail). They are incremented on every dequeue and enqueue operation respectively. However, foo_tail is strong enough that it never allows foo_head to overtake it. When the values of foo_tail and foo_head are equal, the queue is considered empty.

The above code still doesn’t include utility methods like MEMQ::total_items. However, writing such methods should be pretty easy, depending upon your application needs. Depending upon your application requirements, you should also take care of overflowing integer values.
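
As an illustration of such a utility, here is a rough sketch of a total_items style helper. It assumes the convention described above, where equal head and tail values mean an empty queue. The function name memq_total_items and the $get callable are hypothetical; $get is expected to return the value stored under a key, or FALSE when the key is missing, which is how a thin wrapper around Memcached::get would behave.

```php
<?php
// Hypothetical helper, not part of MEMQ: counts the items waiting in a queue.
// $get is any callable returning the value for a key, or FALSE if it is missing.
function memq_total_items($get, $queue) {
    $head = call_user_func($get, $queue . '_head');
    $tail = call_user_func($get, $queue . '_tail');
    if ($head === false || $tail === false) {
        return 0; // the queue was never used
    }
    // Waiting items sit between head and tail; never report a negative count.
    return max(0, (int) $tail - (int) $head);
}

// Usage with an array standing in for memcached.
$fake = array('foo_head' => 1, 'foo_tail' => 3);
$get = function ($key) use ($fake) {
    return isset($fake[$key]) ? $fake[$key] : false;
};
echo memq_total_items($get, 'foo'), "\n"; // 2
```

Because the two counters are read separately, the result is only a snapshot and may be slightly stale under concurrent pushes and pops.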

JAXL BOSH Demo: IM chat client for all WordPress blogs

Have you ever wished for a WordPress plugin capable of providing a Facebook-style chat bar on your blog? In this blog post, I will explain how Jaxl’s BOSH support comes in handy for building such browser-based real time applications. Specifically, I will explain how I built such a plugin for my WordPress blog. If everything goes well over the next few weeks, this plugin might be submitted to the WordPress plugin directory.

Jaxl BOSH Support Framework
Jaxl BOSH support comprises three main parts:

  • jaxl.jquery.js: jQuery extension written for Jaxl BOSH support
  • jaxl4bosh.class.php: Connection manager in PHP
  • jaxl UI: Integrated UI framework for changing your application skin on the fly. Your application skin can be a simple facebook style chat bar (as on this page) or chesspark style whole html page

jaxl.jquery.js is responsible for initiating and maintaining a connection between the browser and the PHP connection manager, while jaxl4bosh.class.php implements the BOSH protocol and maintains a persistent connection with the jabber server.

jaxl.jquery.js provides a few basic methods like:

  • jaxl.connect: Call for initiating the connection
  • jaxl.sendMessage: Call for sending a message to other jid’s
  • jaxl.ping: Call to maintain the connection and gather any incoming data
  • jaxl.disconnect: Call for disconnecting

jaxl4bosh.class.php provides WordPress-style filters/hooks which can be used to modify every incoming and outgoing message:

  • jaxl_pre_connect: Call to perform initialization before jaxl connects to the jabber server
  • jaxl_post_connect: Call to perform actions after jaxl is connected to the jabber server
  • jaxl_send_message: Call to perform actions on outgoing messages from jaxl
  • jaxl_recv_message: Call to perform actions on incoming messages to jaxl
  • jaxl_send_presence: Call to perform actions on outgoing presence from jaxl
  • jaxl_recv_presence: Call to perform actions on incoming presence to jaxl
  • jaxl_pre_disconnect: Call to perform cleanup before jaxl disconnects from the jabber server
  • jaxl_post_disconnect: Call to perform shutdown after jaxl is disconnected from the jabber server

Jaxl and WordPress
Using Jaxl BOSH support requires you to edit only the configuration file. Here are the config variables:

        // JAXL config
        define('JAXL_BOSH_HOST', 'localhost');
        define('JAXL_BOSH_SERVER', 'localhost');
        define('JAXL_BOSH_URL', 'http://localhost:7070/http-bind/');
        define('JAXL_ADMIN_JID', 'admin@localhost');

You can configure Jaxl to use any of the available public XMPP services. However, I chose to host my own jabber server for my blog.

JAXL_ADMIN_JID is the admin jid to which Jaxl should route all incoming messages. It was added specifically for the WordPress use case. The PHP connection manager can be extended to route different chat sessions to different admins.

The jaxl_recv_message handler is used to embed smileys and YouTube videos by parsing the incoming chat messages.
[Screenshot: jaxl-bosh-support-hook-demo-youtube-smiley]

A few other hooks like jaxl_post_connect are used to notify JAXL_ADMIN_JID about the newly connected user.

Admin Screen
Below is a screenshot of how an admin's desktop looks while chatting with site visitors:
[Screenshot: Admin-screenshot-jaxl-bosh-support]

Let me know if you are having any issues chatting using the chat bar at the bottom of the page. The code and installation might be buggy at times, and I would appreciate any help from you on it.

Get real time system & server load notification on any IM using PHP and XMPP

There is various system and server information that server administrators need as soon as possible, in fact in real time. Several open and closed source software packages can generate near real time notifications for you, the best known being Nagios. In this blog post I will discuss how to generate real time system notifications using PHP and XMPP. Specifically, I will present a sample script using Jaxl (Jabber XMPP Client Library) that generates real time system load notifications, which can be received on any instant messenger.

/proc/loadavg
We will be using the system's /proc/loadavg file to get real time system load information. If you are unfamiliar with this file, here is a brief look at how it helps us:

sabhinav:~# cat /proc/loadavg
0.22 0.12 0.09 1/68 12621

where the first three columns are the load averages (a measure of CPU and IO utilization) over the last one, five and 15 minutes. The fourth column shows the number of currently running processes and the total number of processes. The last column displays the last process ID used.
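
For readers who want the individual fields rather than the raw line, here is a small sketch that splits a /proc/loadavg line into named values. The function name parse_loadavg and the array keys are hypothetical; the field order follows the description above, with the third figure being the 15-minute average.

```php
<?php
// Hypothetical helper: split one line of /proc/loadavg into named fields.
function parse_loadavg($line) {
    $parts = preg_split('/\s+/', trim($line));
    if (count($parts) < 5) {
        return false; // unexpected format
    }
    // The fourth column looks like "running/total".
    $procs = explode('/', $parts[3]);
    if (count($procs) !== 2) {
        return false;
    }
    return array(
        'load1'    => (float) $parts[0],
        'load5'    => (float) $parts[1],
        'load15'   => (float) $parts[2],
        'running'  => (int) $procs[0],
        'total'    => (int) $procs[1],
        'last_pid' => (int) $parts[4],
    );
}

print_r(parse_loadavg("0.22 0.12 0.09 1/68 12621"));
```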

jaxl4serveradmins.class.php
We will be using the Jaxl PHP client library to handle the XMPP part. jaxl4serveradmins.class.php is an extension to Jaxl that provides various server administration helper functions. Below is the code for the server administration extension:

  include_once("xmpp.class.php");

  define('JAXL_SERVER_ADMIN', 'mailsforabhinav@gmail.com');
  define('JAXL_SERVER_LOAD_POLL_INTERVAL', 10);

  class JAXL extends XMPP {

    function eventMessage($fromJid, $content, $offline = FALSE) {
    }

    function eventPresence($fromJid, $status, $photo) {
    }

    function eventNewEMail($total,$thread,$url,$participation,$messages,$date,$senders,$labels,$subject,$snippet) {
      // Not used here. See jaxl4gmail.class.php for its use case
    }

    function setStatus() {
      // Set a custom status or use $this->status
      $this->sendStatus($this->status);
      print "Setting Status...\n";
      print "Done\n";

      $this->addJob(JAXL_SERVER_LOAD_POLL_INTERVAL, array($this, 'parseServerLoad'));
    }

    function parseServerLoad() {
      $loadavg = file_get_contents('/proc/loadavg');
      $this->sendMessage(JAXL_SERVER_ADMIN, $loadavg);
    }

  }

I have used the addJob() method provided by the Jaxl library, which lets you specify a callback to be called every N seconds (in short, a periodic cron). Here we add a periodic job to run every JAXL_SERVER_LOAD_POLL_INTERVAL seconds, with the parseServerLoad() method as the callback function.

$this->addJob(JAXL_SERVER_LOAD_POLL_INTERVAL, array($this, 'parseServerLoad'));

To keep the demo simple, I am simply sending the content of /proc/loadavg file as a message to server admins.

    function parseServerLoad() {
      $loadavg = file_get_contents('/proc/loadavg');
      $this->sendMessage(JAXL_SERVER_ADMIN, $loadavg);
    }

Running it for your servers:
Follow these steps to get this running on your server (Unix only, no Windows):

  • Check out Jaxl from trunk
    sabhinav:~# sudo svn checkout http://jaxl.googlecode.com/svn/trunk/ jaxl-read-only
  • Enter the checked-out directory
    sabhinav:~# cd jaxl-read-only
  • Enter your server admin IM contact details
    sabhinav:~# sudo vim config.ini.php
    define('JAXL_SERVER_ADMIN', 'webmaster@foobar.com');
  • Enable server administration extension
    sabhinav:~# sudo vim index.php
    include_once("jaxl4serveradmins.class.php"); // include_once("jaxl.class.php");
  • Wroom Wroom, start Jaxl
    sabhinav:~# sudo php index.php
    Starting TLS Encryption...
    Attempting PLAIN Authentication...
    Starting Session...
    Requesting Feature List...
    Requesting Roster List...
    Setting Status...
    Done
    

Tail the jaxl log file in case you are facing any difficulties in the setup.

sabhinav:~# tail -f log/logger.log

You should also consider adding /proc/ directory under open_basedir in php.ini file.

Is it working?
If all is well configured, server admins will start getting notifications every 10 seconds, which is the default value of JAXL_SERVER_LOAD_POLL_INTERVAL.
[Screenshot: jaxl4serveradmins.class.php example showing system load notifications]

Writing custom notifications
Above, I demonstrated how we can use XMPP and PHP to generate real time system notifications. However, you may want to modify the parseServerLoad() method to send notifications only when the server load exceeds a certain value. You may also want to add other methods which notify you of various system and server level parameters in a similar fashion. Below are a few useful system administration commands:

sabhinav:~# free -m
sabhinav:~# vmstat 1 20
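
As a rough sketch of the threshold idea, the check below decides whether a /proc/loadavg line is worth a notification. The function name load_exceeds and the threshold value 2.0 are assumptions for illustration; inside parseServerLoad() the sendMessage() call could be wrapped in such a check.

```php
<?php
// Hypothetical helper: TRUE when the one-minute load average in a
// /proc/loadavg line is above an assumed threshold.
function load_exceeds($loadavgLine, $threshold) {
    $fields = explode(' ', trim($loadavgLine));
    $oneMinute = (float) $fields[0];
    return $oneMinute > $threshold;
}

var_dump(load_exceeds("0.22 0.12 0.09 1/68 12621", 2.0)); // bool(false)
var_dump(load_exceeds("3.10 1.50 0.90 2/70 12700", 2.0)); // bool(true)
```

A sensible threshold depends on the number of CPU cores on the box, so treat 2.0 only as a placeholder.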

Is it really real time?
Since the parseServerLoad() method polls the /proc/loadavg file every 10 seconds, this is not exactly real time. However, you can lower JAXL_SERVER_LOAD_POLL_INTERVAL to make it poll more often. You can also use the libevent extension in PHP to make it real time in the true sense.

Do let me know if you write any interesting functionality. I will be more than happy to include it as part of the current extension.

Get lyrics for any song using XMPP and PHP right into your IM – Add lyricsfly@gtalkbots.com

XMPP is quickly finding its way into real time applications other than just chat. I have combined JAXL (a Jabber XMPP client library written in PHP) and the API from lyricsfly.com to build a real time chat bot which can fetch the lyrics for any song. You can start using it by simply adding lyricsfly@gtalkbots.com to your IM account (e.g. Gtalk, Jabber etc). In this blog post, I will briefly explain how the lyricsfly bot works and how you can integrate XMPP into your own application.

Try out lyricsfly@gtalkbots.com
Follow these steps to get the bot working for you:

  • Log in to your Gtalk account using any available IM client
  • Press Add Contact
  • Add lyricsfly@gtalkbots.com as your chat buddy
  • Send a chat message in the format “Song Title – Song Artist” e.g. “one – metallica”
  • You should see something like this: [Screenshot: lyricsfly@gtalkbots.com demo for one-metallica]

Working of lyricsfly@gtalkbots.com with Jaxl
Here, in brief, is how the lyricsfly bot works using the Jaxl client library:

  • When someone sends a message like “one – metallica” to the bot, the eventMessage() method is called inside jaxl.class.php
  • eventMessage then extracts the song title and artist name from the message using PHP explode, and filters the title and artist names for allowed characters
  • eventMessage also calls the lyricsfly API and fetches the lyrics. Finally, it sends the lyrics as a message to the requester
  • eventMessage also uses memcached to cache the lyrics, which reduces both the response time and the load on the lyricsfly servers
  • The bot also keeps a count of the number of queries from a particular user. Since it is still under development, there is currently a limit on the number of lyrics you can fetch in a single day
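
The parsing step can be sketched as follows. This is an illustration rather than the bot's actual code: the function name parse_lyrics_request is hypothetical, and the set of allowed characters (letters, digits and spaces) is an assumption.

```php
<?php
// Hypothetical helper: split "title - artist" with explode() and keep only
// an assumed whitelist of characters (letters, digits, spaces).
function parse_lyrics_request($message) {
    // Treat a typographic en dash like a plain hyphen.
    $message = str_replace("\xE2\x80\x93", '-', $message);
    $parts = explode('-', $message, 2);
    if (count($parts) !== 2) {
        return false; // not in "title - artist" form
    }
    $clean = array();
    foreach ($parts as $part) {
        $clean[] = trim(preg_replace('/[^A-Za-z0-9 ]/', '', $part));
    }
    if ($clean[0] === '' || $clean[1] === '') {
        return false;
    }
    return array('title' => $clean[0], 'artist' => $clean[1]);
}

print_r(parse_lyrics_request('one - metallica'));
```

Splitting on the first hyphen means a title that itself contains a hyphen would be cut short, so a real bot may want a less ambiguous separator.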

Making your own custom bot

  • Check out the latest code from trunk
    sabhinav$ svn checkout http://jaxl.googlecode.com/svn/trunk/ jaxl-read-only
  • Edit the config file with your bot username, password and jabber servers
  • Run from the command line
    php index.php
  • To customize the bot, modify the eventMessage and eventPresence methods of the Jaxl class inside jaxl.class.php

For a full-fledged running bot example, edit index.php to include jaxl4dzone.class.php instead of jaxl.class.php and re-run the bot.

Have fun and enjoy singing songs along with the lyrics.

WordPress style “Duplicate comment detected” using Memcached and PHP

If you are in the habit of leaving comments on blogs, chances are you have seen a WordPress error page saying “Duplicate comment detected; it looks as though you’ve already said that!”, probably because you were not sure that your comment was saved the last time and you tried to re-post it. In this blog post, I will put up some sample PHP code for duplicate comment detection using Memcached, without touching the database. Towards the end, I will also discuss how the script can be modified for use in any environment, including forums and social networking websites.

Duplicate comment detection using Memcached
Here is a PHP function called is_repetitive_comment which returns some useful value if the comment is repetitive, and FALSE otherwise.

<?php

        define('COMPRESSION', 0);
        define('SIGNATURE_TTL', 60);

        $mem = new Memcache;
        $mem->addServer("localhost", 11211);

        function is_repetitive_comment($comment, $username) { // username can be ip address for anonymous env
                                                              // for per blog/forum checks pass forum id too
                                                              // for multi-host using same memcached instance, pass hostname too
                                                              // for restricting post of same comment, don't pass username
                $comment = trim($comment);
                $signature = md5(implode('',func_get_args()));

                global $mem;
                if(($value = $mem->get($signature)) !== FALSE) {
                        error_log($signature." found at ".time());
                        return $value;
                }
                else {
                        $value = array('comment' => $comment,
                                       'by' => $username,
                                       /* Other information if you may want to save */
                                      );
                        $mem->set($signature, $value, COMPRESSION, SIGNATURE_TTL);
                        error_log($signature." set at ".time());
                        return FALSE;
                }
        }

?>

Is it working?
Let's verify that the code works, and then we will dig into it:

  • Save the sample code in a file, name it index.php
  • Towards the end of the script, add the following 3 lines of code:
            var_dump(is_repetitive_comment("User Comment", "username"));
            sleep(5); // Simulating the case when a user might try to post the same comment again knowingly or unknowingly
                      // Similar kind of check is done in wordpress comment submission (though without memcached)
            var_dump(is_repetitive_comment("User Comment", "username"));
  • Run from command line:
    sabhinav$ php index.php
    6105b67d969642fe9e27bc052f29e259 set at 1262393877
    bool(false)
    6105b67d969642fe9e27bc052f29e259 found at 1262393882
    array(2) {
      ["comment"]=>
      string(12) "User Comment"
      ["by"]=>
      string(8) "username"
    }
  • As seen, is_repetitive_comment returns bool(false) the first time. However, when the same comment is submitted again after 5 seconds, it returns some useful information from the previous submission.

Working of is_repetitive_comment
Here, in brief, is how the script uses memcached for duplicate comment detection:

  • SIGNATURE_TTL defines the time window within which two identical comment submissions count as duplicates. It defaults to 60 seconds
  • is_repetitive_comment takes two parameters, namely the comment itself and the username of the user trying to post the comment
  • The function creates a signature by combining the passed parameters and checks whether a key=$signature exists in memcache
  • If the key is found, the same user has posted the same comment in the past SIGNATURE_TTL (i.e. 60) seconds. The function simply returns the value stored for the key in memcache
  • If the key is NOT found, the user is allowed to post the comment and the function returns FALSE. Before returning, the function also sets key=$signature in memcache

The value of key=$signature depends upon your application and use case. You might want to save some useful parameters so that you can show an appropriate error message without hitting the database at all.

Extracting more from the sample script
Here is how you can modify the above sample script for various environments:

  • If you are performing the repetitive comment check in an anonymous environment, i.e. commenters may not be registered users, you can pass the commenter’s IP address instead of the username
  • If you serve multiple sites out of the same box and all share the same memcached instance, you SHOULD also pass site’s root url to the function. Otherwise you might end up showing error message to wrong users
  • If you want to restrict submission of same comment per blog or forum, also pass the blog id to the function
  • If you want to simply restrict submission of same comment through out your site, pass only the comment to the function
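
The variations above differ only in which values go into the signature. Here is a small sketch of that idea, with a hypothetical comment_signature helper. Unlike the function shown earlier, it joins the parts with a separator, so it produces different hashes than is_repetitive_comment; the sample values such as alice and blog-42 are made up.

```php
<?php
// Hypothetical helper: build the cache key from whichever parts define a
// "duplicate" in your environment (comment, user or IP, blog id, host, ...).
function comment_signature(array $parts) {
    // Join with a separator so ("ab", "c") and ("a", "bc") get different keys.
    return md5(implode("\n", array_map('trim', $parts)));
}

$perUser = comment_signature(array('Nice post', 'alice'));            // same comment, same user
$perSite = comment_signature(array('Nice post'));                     // same comment, site wide
$perBlog = comment_signature(array('Nice post', 'alice', 'blog-42')); // scoped to one blog
```

Each extra part narrows the scope of the check, because two submissions only collide when every part matches.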

Let me know if you do similar tiny little hacks using memcached 😀

How to use locks in PHP cron jobs to avoid cron overlaps

Cron jobs are the hidden building blocks of most websites. They are generally used to process/aggregate data in the background. However, as a website grows and there are gigabytes of data to be processed by every cron job, chances are that our cron jobs might overlap and possibly corrupt our data. In this blog post, I will demonstrate how we can avoid such overlaps by using simple locking techniques. I will also discuss a few edge cases we need to consider while using locks to avoid overlaps.

Cron job helper class
Here is a helper class (cron.helper.php) which will help us avoid cron job overlaps. (See the usage example below.)

<?php

	define('LOCK_DIR', '/Users/sabhinav/Workspace/cronHelper/');
	define('LOCK_SUFFIX', '.lock');

	class cronHelper {

		private static $pid;

		function __construct() {}

		function __clone() {}

		private static function isrunning() {
			$pids = explode(PHP_EOL, `ps -e | awk '{print $1}'`);
			if(in_array(self::$pid, $pids))
				return TRUE;
			return FALSE;
		}

		public static function lock() {
			global $argv;

			// basename() keeps the lock file inside LOCK_DIR even when the job
			// is started with a path, e.g. php /path/to/job.php
			$lock_file = LOCK_DIR.basename($argv[0]).LOCK_SUFFIX;

			if(file_exists($lock_file)) {
				// A lock file exists: check whether the process that wrote it is still alive
				self::$pid = trim(file_get_contents($lock_file));
				if(self::isrunning()) {
					error_log("==".self::$pid."== Already in progress...");
					return FALSE;
				}
				else {
					error_log("==".self::$pid."== Previous job died abruptly...");
				}
			}

			// Record our own process id in the lock file
			self::$pid = getmypid();
			file_put_contents($lock_file, self::$pid);
			error_log("==".self::$pid."== Lock acquired, processing the job...");
			return self::$pid;
		}

		public static function unlock() {
			global $argv;

			$lock_file = LOCK_DIR.basename($argv[0]).LOCK_SUFFIX;

			if(file_exists($lock_file))
				unlink($lock_file);

			error_log("==".self::$pid."== Releasing lock...");
			return TRUE;
		}

	}

?>

Using cron.helper.php
Here is how the helper class can be integrated in your current cron job code:

  • Save cron.helper.php in a folder called cronHelper
  • Update LOCK_DIR as per your need
  • You might have to set proper permissions on the cronHelper folder, so that the user running the cron job has write permission
  • Wrap your cron job code as shown below:
    <?php
    
    	require 'cronHelper/cron.helper.php';
    
    	if(($pid = cronHelper::lock()) !== FALSE) {
    
    		/*
    		 * Cron job code goes here
    		*/
    		sleep(10); // Cron job code for demonstration
    
    		cronHelper::unlock();
    	}
    
    ?>

Is it working? Verify
Let's verify that the helper class really takes care of all the edge cases.

  • sleep(10) is our cron job code for this test
  • Run from command line:
    sabhinav$ php job.php
    ==40818== Lock acquired, processing the job...
    ==40818== Releasing lock...
    

    where 40818 is the process id of the currently running cron job

  • Run from command line and terminate the cron job midway by pressing Ctrl+C:
    sabhinav$ php job.php
    ==40830== Lock acquired, processing the job...

    By pressing Ctrl+C, we simulate the cases where a cron job dies midway due to a fatal error or a system shutdown. In such cases, the helper class never gets to release the lock held by this cron job.

  • With the lock in place (ls -l cronHelper | grep lock), run from command line:
    sabhinav$ php job.php
    ==40830== Previous job died abruptly...
    ==40835== Lock acquired, processing the job...
    ==40835== Releasing lock...
    

    As seen, the helper class detects that the previous cron job died abruptly and then allows the current job to run successfully.

  • Run the cron job from two command line windows; one of them will not proceed, as shown below:
    centurydaily-lm:cronHelper sabhinav$ php job.php
    ==40856== Already in progress...

    The second job exits without doing any work, since a cron job with $pid=40856 is already in progress.

Working of cron.helper.php
The helper class creates a lock file inside LOCK_DIR. For our test cron job above, the lock file name will be job.php.lock. The lock file name suffix can be configured using LOCK_SUFFIX.

cronHelper::lock() writes the process id of the currently running cron job into the lock file. Upon job completion, cronHelper::unlock() deletes the lock file.

If cronHelper::lock() finds that the lock file already exists, it reads the process id of the previous cron job from the lock file and checks whether that job is still running. If the previous job is still in progress, we abort the current job. If the previous job is no longer running, i.e. it died abruptly, the current cron job acquires the lock.
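The decision logic described above can also be sketched outside PHP. Below is a minimal C++ version using POSIX calls, given only as an illustration: the names pid_is_running, acquire_lock and release_lock are hypothetical, and the liveness check uses kill() with signal 0 instead of parsing the output of ps.

```cpp
#include <cerrno>
#include <fstream>
#include <string>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Returns true if a process with this pid currently exists.
// kill() with signal 0 only performs the existence check, no signal is sent;
// EPERM means the process exists but belongs to another user.
bool pid_is_running(pid_t pid) {
	if (pid <= 0) return false;
	if (kill(pid, 0) == 0) return true;
	return errno == EPERM;
}

// Hypothetical helper: tries to take the lock stored in lock_file.
// Returns true if the caller now owns the lock, false if a live job holds it.
bool acquire_lock(const std::string &lock_file) {
	std::ifstream in(lock_file.c_str());
	long previous_pid = 0;
	if (in >> previous_pid && pid_is_running((pid_t) previous_pid)) {
		return false; // previous job is still in progress
	}
	in.close();

	// No lock file, an unreadable one, or a stale pid: record our own pid.
	std::ofstream out(lock_file.c_str(), std::ios::trunc);
	out << getpid();
	out.flush();
	return out.good();
}

// Deletes the lock file on job completion.
void release_lock(const std::string &lock_file) {
	unlink(lock_file.c_str());
}
```

As with the PHP helper, there is a small window between checking the lock file and writing to it, and a recycled process id can make a dead job look alive, so this is a sketch of the idea and not a hardened lock.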

This is a classic method for avoiding cron overlaps. However, there are various other ways of achieving the same thing. If you know of any, do let me know through your comments.
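One such alternative is an advisory lock taken with flock(), which PHP also exposes as a function of the same name. The kernel drops the lock when the process exits, however it dies, so no stale lock detection is needed. Here is a minimal sketch in C++ with POSIX calls; try_flock and release_flock are hypothetical names, and this is not what cron.helper.php does.

```cpp
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical helper: opens lock_path and tries to take an exclusive,
// non-blocking advisory lock on it. Returns the open descriptor on success,
// or -1 if the file cannot be opened or the lock is already held.
int try_flock(const char *lock_path) {
	int fd = open(lock_path, O_RDWR | O_CREAT, 0644);
	if (fd == -1) return -1;
	if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
		close(fd); // lock is held through another open of the same file
		return -1;
	}
	return fd; // keep this descriptor open for the lifetime of the job
}

// Releases the lock explicitly; exiting the process has the same effect.
void release_flock(int fd) {
	flock(fd, LOCK_UN);
	close(fd);
}
```

A job would call try_flock() at startup, exit immediately on -1, and call release_flock() when done. The lock file itself can stay on disk between runs.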

How to build a custom static file serving HTTP server using Libevent in C

Libevent is an event notification library which lays the foundation for immensely successful open source projects like Memcached. As the web moves towards real time, more and more websites are using a mix of techniques like HTTP Pub-Sub, HTTP long-polling and Comet, with custom lightweight HTTP servers in the backend, to create a real-time user experience. In this blog post, I will start with the prerequisites for setting up the development environment. Then I will demonstrate how to build an HTTP server capable of serving static pages. Finally, I will list a few use cases of custom HTTP servers in today’s world.

Setting up Environment
Follow these steps to install the latest version of libevent (version 2.0.3-alpha):

  • $ wget http://www.monkey.org/~provos/libevent-2.0.3-alpha.tar.gz
  • $ tar -xvzf libevent-2.0.3-alpha.tar.gz
  • $ cd libevent-2.0.3-alpha
  • $ ./configure
  • $ make
  • $ sudo make install

Check the environment by running the following piece of C code (event2.cpp):

#include <stdio.h>
#include <event2/event.h>

int main(int argc, char **argv) {
	const char *version;
	version = event_get_version();
	printf("%s\n", version);
	return 0;
}

Compile and run as follows:

$ g++ -arch x86_64 -Wall event2.cpp -o event2 -levent
$ ./event2
2.0.3-alpha

I had to pass the -arch x86_64 flag on Mac OS X 10.5.8. This can vary depending upon your operating system.

Libsrvr: Static file serving HTTP Server
Below is the code for a static file serving HTTP server built on libevent, called “Libsrvr”. It is written in C style but compiled as C++:

libsrvr.h

// General purpose header files
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <getopt.h>
#include <sys/stat.h>

// Libevent header files
#include <event2/event.h>
#include <event2/http.h>
#include <event2/buffer.h>

// Libsrvr configuration settings
#define LIBSRVR_SIGNATURE "libsrvr v 0.0.1"
#define LIBSRVR_HTDOCS "/Users/sabhinav/libsrvr/www"
#define LIBSRVR_INDEX "/index.html"

// Libsrvr http server and base struct
struct evhttp *libsrvr;
struct event_base *libbase;

// Libsrvr options
struct _options {
	int port;
	const char *address;
	int verbose;
} options;

  • LIBSRVR_SIGNATURE is the server signature sent as the Server response header with every file served
  • LIBSRVR_HTDOCS is the path to the DocumentRoot for libsrvr
  • LIBSRVR_INDEX is similar to the DirectoryIndex directive of Apache

libsrvr.cpp

#include "libsrvr.h"

void router(struct evhttp_request *r, void *arg) {
	const char *uri = evhttp_request_get_uri(r);

	char *static_file = (char *) malloc(strlen(LIBSRVR_HTDOCS) + strlen(uri) + strlen(LIBSRVR_INDEX) + 1);
	stpcpy(stpcpy(static_file, LIBSRVR_HTDOCS), uri);

	bool file_exists = true;
	struct stat st;
	if(stat(static_file, &st) == -1) {
		file_exists = false;
		evhttp_send_error(r, HTTP_NOTFOUND, "NOTFOUND");
	}
	else {
		if(S_ISDIR(st.st_mode)) {
			strcat(static_file, LIBSRVR_INDEX);

			if(stat(static_file, &st) == -1) {
				file_exists = false;
				evhttp_send_error(r, HTTP_NOTFOUND, "NOTFOUND");
			}
		}
	}

	if(file_exists) {
		size_t file_size = (size_t) st.st_size;

		struct evbuffer *buffer;
		buffer = evbuffer_new();

		if(file_size != 0) {
			// The file contents are not NUL-terminated, so they are added
			// to the reply by length and not through a "%s" format
			char *html = (char *) malloc(file_size);
			FILE *fp = fopen(static_file, "rb");
			if(html != NULL && fp != NULL) {
				size_t bytes_read = fread(html, 1, file_size, fp);
				evbuffer_add(buffer, html, bytes_read);
			}
			if(fp != NULL) fclose(fp);
			free(html);
		}

		struct evkeyvalq *headers = evhttp_request_get_output_headers(r);
		evhttp_add_header(headers, "Content-Type", "text/html; charset=UTF-8");
		evhttp_add_header(headers, "Server", LIBSRVR_SIGNATURE);

		evhttp_send_reply(r, HTTP_OK, "OK", buffer);
		evbuffer_free(buffer);

		if(options.verbose) fprintf(stderr, "%s\t%lu\n", static_file, (unsigned long) file_size);
	}
	else {
		if(options.verbose) fprintf(stderr, "%s\t%s\n", static_file, "404 Not Found");
	}

	free(static_file);
}

int main(int argc, char **argv) {
	int opt;

	options.port = 4080;
	options.address = "0.0.0.0";
	options.verbose = 0;

	while((opt = getopt(argc,argv,"p:vh")) != -1) {
		switch(opt) {
			case 'p':
				options.port = atoi(optarg);
				break;
			case 'v':
				options.verbose = 1;
				break;
			case 'h':
				printf("Usage: ./libsrvr -p port -v[erbose] -h[elp]\n");
				exit(1);
		}
	}

	libbase = event_base_new();
	libsrvr = evhttp_new(libbase);
	evhttp_bind_socket(libsrvr, options.address, options.port);
	evhttp_set_gencb(libsrvr, router, NULL);
	event_base_dispatch(libbase);

	return 0;
}

Here is some explanation for the above code:

  • Command line options are parsed using getopt
  • libbase is the event base for the HTTP server libsrvr
  • The HTTP server is bound to port 4080 by default (use -p to change it)
  • A generic callback is registered on libsrvr: the router function is invoked every time an HTTP request is received
  • Finally, the event loop of libbase is dispatched. It runs for as long as the server is up, so the code normally never reaches return 0
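The -p option above is converted with atoi, which returns 0 for non-numeric input and does no range checking. A stricter conversion could look like the sketch below; parse_port is a hypothetical helper and not part of libsrvr or libevent.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parses a TCP port from text.
// Returns the port (1-65535), or -1 if the text is not a valid port number.
int parse_port(const char *text) {
	if (text == NULL || *text == '\0') return -1;
	char *end = NULL;
	errno = 0;
	long value = strtol(text, &end, 10);
	// Reject overflow and any trailing characters after the digits.
	if (errno != 0 || *end != '\0') return -1;
	if (value < 1 || value > 65535) return -1;
	return (int) value;
}
```

In main, the 'p' case could reject a -1 result before the server is bound. It is also worth checking the return value of evhttp_bind_socket, which is -1 when the address and port cannot be bound.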

The working of the router function is as follows:

  • The incoming request URI is converted to an absolute file path on the system
  • The path is checked for existence as a file or directory, and a 404 is sent if it does not exist
  • If the absolute path is a directory, LIBSRVR_INDEX is served out of that directory
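The first step above appends the raw request URI to LIBSRVR_HTDOCS, so a query string ends up in the file name and a ".." segment can point outside the document root. A minimal sketch of a stricter mapping follows, assuming a hypothetical helper named resolve_path that works on std::string instead of the malloc'd buffer used in router.

```cpp
#include <string>

// Hypothetical helper: maps a request URI onto a path under doc_root.
// Returns an empty string when the URI is not acceptable.
std::string resolve_path(const std::string &doc_root, const std::string &uri) {
	// Drop the query string and fragment; only the path part names a file.
	std::string path = uri.substr(0, uri.find_first_of("?#"));

	// Only absolute request paths are accepted.
	if (path.empty() || path[0] != '/') return "";

	// Reject any ".." segment so the result cannot leave doc_root.
	std::string::size_type start = 0;
	while (start <= path.size()) {
		std::string::size_type end = path.find('/', start);
		if (end == std::string::npos) end = path.size();
		if (path.compare(start, end - start, "..") == 0) return "";
		start = end + 1;
	}

	return doc_root + path;
}
```

router could send HTTP_NOTFOUND whenever the result is empty. The sketch does not decode percent-escapes, which matches the original code; if decoding is ever added, the ".." check has to run after it.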

Launching Libsrvr:
Compile and run libsrvr as follows:

$ g++ -arch x86_64 -Wall libsrvr.cpp -o libsrvr -levent
$ ./libsrvr -v
/Users/sabhinav/libsrvr/www//index.html	538
/Users/sabhinav/libsrvr/www/assets/style.css	35
/Users/sabhinav/libsrvr/www/assets/script.js	27
/Users/sabhinav/libsrvr/www/dummy	404 Not Found
/Users/sabhinav/libsrvr/www/index.html	538
/Users/sabhinav/libsrvr/www/assets/style.css	35

If started in verbose mode (-v), libsrvr prints each requested file path on the console, followed by the file size in bytes or 404 Not Found, as shown above.
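The log above includes .css and .js files, yet router sends every file with a Content-Type of text/html. One way to pick the header from the file extension is sketched below; content_type_for is a hypothetical helper and its table is illustrative, far from complete.

```cpp
#include <string>

// Hypothetical helper: picks a Content-Type from the file extension.
// Unknown or missing extensions fall back to application/octet-stream.
const char *content_type_for(const std::string &path) {
	std::string::size_type slash = path.find_last_of('/');
	std::string::size_type dot = path.find_last_of('.');
	// No extension, or the last dot belongs to a directory name.
	if (dot == std::string::npos || (slash != std::string::npos && dot < slash))
		return "application/octet-stream";

	std::string ext = path.substr(dot + 1);
	if (ext == "html" || ext == "htm") return "text/html; charset=UTF-8";
	if (ext == "css") return "text/css";
	if (ext == "js") return "application/javascript";
	if (ext == "png") return "image/png";
	if (ext == "jpg" || ext == "jpeg") return "image/jpeg";
	return "application/octet-stream";
}
```

router could pass content_type_for(static_file) to evhttp_add_header in place of the fixed text/html value.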

Use cases
Below are a few use cases of custom HTTP servers as seen on the web today:

  • Facebook Chat: Uses a custom HTTP server based on the mochiweb framework
  • Yahoo finance: Uses a custom HTTP streaming server based on libevent

Generally, an iframe technique is combined with JavaScript hacks for streaming data from these custom HTTP servers. Read “How to make cross-sub-domain ajax (XHR) requests using mod_proxy and iframes” for details.

Conclusion
Though a static file server finds little place in today’s world, the idea was to show the ease with which you can create your own HTTP server which is lightweight, fast and scalable (all thanks to Niels for his libevent). Couple libsrvr with memcached for caching static files, and a benchmark will show libsrvr handling over 10,000 req/sec.

Share if you like it and also let me know your thoughts through comments.