Introducing MemChat: Open source group chat framework in PHP supporting Memcached, APC, SQLite, Flat Files and MySQL

MemChat is an open source group chat framework for personal and enterprise level websites. Written in PHP, MemChat can be configured to run with Memcached, APC, SQLite, Flat files and MySQL as it’s storage engine. With memcached, APC and Flat files serving as temporary storages and MySQL, SQLites being permanent storage engines.

MemChat uses MemBurger API for storing all the incoming messages in the storage engine. MemBurger is an open source PHP wrapper for all the storage engines mentioned above providing Collapsed Forwarding and Stale-While-Revalidate functionality.

MemChat can also be configured to notify the site owners at various event handlers provided. e.g. In case of a wordpress blog, site owner might want to get notified when someone post a new message on one of the blog post. Infact developers can write plugins for custom event handlers. MemChat provides two kind of notification methods. By default, MemChat uses XMPP protocol to notify blog owners. For these kind of notifications, JAXL an open source Jabber XMPP Client Library is used to send instant messages as notifications. MemChat can also be configured to send notifications using SMTP protocol a.k.a EMail.

MemChat also comes with a profanity word filter using PHProfane library i.e. one can configure MemChat to block all spam messages posted in the chat rooms. Developers can write a plugin to add custom spam words in the profanity filter.

MemChat Flow Diagram:
Below is a flow diagram showing how all the above components are clubbed in as MemChat:
MemChat Workflow Diagram

MemChat use cases:
MemChat framework requires a unique alphanumeric id for each group you want to create. For example, WordPress blog makes a good environment to setup MemChat since every blog post have a unique id. Hence to setup MemChat on a wordpress blog all we need to do is, call

$memchat_ui_html = memchat_init($memchat_group_id);

method, where $memchat_group_id = Blog post id.

Similarly MemChat can be setup on a number of places as listed below:

  1. Forums: Since each forum have a unique id per discussion thread, MemChat suits well here. Forum owners can setup MemChat for allowing current viewing users of the thread to chat in real time. If using temporary storage engines (since they scale up well), forum owners can easily write event handlers to save chat messages asynchronously so that interesting discussions can be made a part of actual discussion thread.
  2. Social Networking Websites: A social network will generally have group pages, fan pages, event pages and application pages. Since all pages will have a unique id associated with them, MemChat is a good fit here. Simply pass $blog_id = “group-1234”, for setting up a chat room on group page having id 1234. Similarly for fan, event and application pages.
  3. Blogs: A blog as discussed above is a perfect place for setting up MemChat. WordPress blog owners will be able to setup MemChat on their blog using WP-Chat plugin.

Setting up MemChat in 3 steps:
MemChat can be setup in 3 simple steps:

  1. Download: MemChat is hosted on Google Code.
  2. Update Configuration: Update MemChat config file to setup various default behavior and actions.
  3. Initialize: Include MemChat class file in your PHP template.
    include_once("/path/to/memchat.class.php")

    Next simply initialize MemChat

    $memchat_ui_html = memchat_init($memchat_group_id);

    which will return the user interface HTML code. Finally append it to your template and MemChat group is ready to serve.

The ease of setup can be imagine from the fact that, WP-Chat wordpress plugin development required only the above 3 steps. I was able to develop the plugin within 5 minutes.

MemChat Performance Benchmark
Initial benchmarking results show, On a single apache and single storage instance MemChat is capable of handling over 250 (MySQL) incoming chat messages per second. Capacity reaches in thousands when configured with Memcached or APC cache. Again Memcached configuration and APC no stat feature makes a difference in performance results.

Why still in alpha?
Yes, MemChat is still in alpha release. I want to test all MemChat features on a live traffic and hence I chose my blog for its testing. I have installed MemChat on my blog which is currently using APC cache as storage medium. Over two weeks or so I will configure MemChat with various other storage engines to have a final round of test. Also the event handler and PHProfane plugin feature is still under development. MemBurger API also requires a more modular design pattern so that extending it for any other storage types become trivial.

Try out MemChat on my blog, can be seen as a facebook type chat bar below. This will be soon available as WP-Chat plugin for wordpress blogs. WP-Chat user interface is customizable just like wordpress themes.

Let me know your views and suggestions for improvement.

MySQL Query Cache, WP-Cache, APC, Memcache – What to choose

Hello Cache Freaks,

Ever since I changed my job (from Business Intelligence to Web development) and started working with my present employer, I have had a chance to work on a lot of scalable projects. From making my project to scale from 20 Million PV’s to 100 Million PV’s to development of an internal tool, the answer to all scalable applications have been caching.

There are a lot of caching techniques which are being employed by sites worldwide.

  1. WP-Cache used in wordpress – a file system based caching mechanism
  2. APC Cache – an opcode based caching system
  3. Memcache – an in memory caching system
  4. Query Cache – caching mechanism employed in MySQL

Here in this post I would like to pen down my experiences while working with all the caching mechanism. Their pros and cons. What things you need to take care while working with them and every little tit bit which comes to my mind while writing this post.

Query Cache – inbuilt cache mechanism in MySQL
Query cache is an inbuilt cache mechanism for MySQL. Basically when you fire a query against a MySQL database, it goes through a lot of core modules. e.g. Connection Manager, Thread Manager, Connection Thread, User Authentication Module, Parse Module, Command Dispatcher, Optimizer Module, Table Manager, Query Cache Module and blah blah. Discussing these modules is out of scope of this blog post. The only module we are interested here is Query Cache Module.

Suppose I fire a query:
$query = “Select * from users”;

MySQL fetches it for the first time from the database and caches the result for further similar query. Next time when you fire a similar query, it picks the result from the cache and deliver it back to you.

However, there are a few drawbacks with Query Cache:

  • If there is a new entry in the users table, the cache is cleared.
  • Secondly, even if the result of your query is cached, MySQL has to go through a number of core modules before it give back the cached result to you.
  • Thirdly, even if your results are caches, you need to connect to your MySQL database, which generally have a bottleneck with number of connections allowed.

One thing which you should take care while replying on Query Cache is that, your query must not have any parameter which is random or changes quite often. For e.g. If you wish to fetch url’s of 100 photo from the database, and then you want to present them in a random fashion every time, you might be tempted to use rand() somewhere in your MySQL queries. However by using rand() you ensure that the query is never returned from cache, since rand() parameter always makes the query look different.

Similarly, If you have a case where you need to show data not older than 15 days, and by mistake you also include the seconds parameter in your SQL query, then the query will never return from cache.

Hence for a scalable site, with 100 Million PV’s you can’t really survive with a simple query cache provided by MySQL database.

WP-Cache – Caching mechanism for WordPress blogs
WP-Cache is a file system based caching mechanism i.e. it caches your blog posts in form of simple text files which are saved on your file system. You can have a look at these cached files by visiting wp-content/cache folder inside your blog directory. Generally you will find two set of files for a single blog post. One .html and another .meta file.

.html file generally contains the static html content for your blog post. Once published, the blog post content is static, hence instead of fetching it’s data from the database, WP-Cache serves it from the cache directory.

.meta file contains serialized information such as Last Modified, Content-Type, URI which WP-Cache uses to maintain cache expiry on your blog.

WP-Cache works really well, however if the traffic starts increasing on your blog, then the bottleneck will be maximum number of child processes which apache can create (For starters you can think, each user connecting to your blog as one apache process, hence there is a restriction on number of users who can connect to your blog at a particular time). In such case the solution can be either to have multiple front end servers with a load balancer to distribute the traffic among front end servers, or to have a better cache solution such as a memory based caching mechanism (e.g. Memcache). Also since memory read is always faster than file read, you must go for a memory based cache system.

APC Cache – An opcode based cache for PHP
APC stands for Alternative PHP Cache. In one of my previous post How does PHP echo’s a “Hello World”? – Behind the scene, I talk about how PHP churns out “Hello World” for you.

PHP takes your code through various steps:

  1. Scanning – The human readable source code is turned into tokens.
  2. Parsing – Groups of tokens are collected into simple, meaningful expressions.
  3. Compilation – Expressions are translated into instruction (opcodes)
  4. Execution – Opcode stacks are processed (one opcode at a time) to perform the scripted tasks.

Opcode caches let the ZEND engine perform the first three of these steps, then store that compiled form (opcodes) so that the next time a given script is used, it can use the stored version without having to redo those steps only to come to the same result.

However the problem with APC cache is that it is not a distributed cache system. By distributed cache I mean, if you have 3 frontend server then you need to have a copy of this opcode on all the three fronend server. Also like WP-Cache APC is again a file system driven cache system, which is not the optimal solution. Also with APC cache, PHP still has to go through the last step as described above.

Memcache – In memory based cache mechanism
Memcache is the solution when you talk about million PV’s on your site. It is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

For starters, Memcache is not PHP, nor Python or any other thing as you may think. It’s a deamon which runs in the background on your server. Your code connects to it and cache query results, JS, CSS and other cachable data in the server’s memory (RAM). Since its an in-memory caching system, it is faster than any of the discussed caching system above. However it is unreliable and unsecure but then there are ways to tackle this unreliable and unsecure nature of memcache.

Memcache is unreliable because it resides in your system memory. So an event like system reboot or power failure will result in loss of all your cache. Also memcache provides no mechanism to take backup of your caches. Hence once lost, you need to programmatically warmup your caches.

Memcache is unsecure because it doesn’t require any authentication mechanism. There is no username or password with which your code connects to it. (Hence it is super fast, unlike Query cache which has to go through auth module even if the query result is cached). It usually runs at port 11211 on your server and if not taken care, anyone can telnet to port 11211 on your server and steal your caches.

Below are the steps which are being followed on a memcache enabled website:

  1. User enter your site’s url in his browser, say http://localhost
  2. There are about 6 queries which drives your opening page
  3. Lets assume one of the query for this page is $query = “SELECT photo_url from photos LIMIT 0,10”
  4. When the user visit http://localhost, your code will first connect to memcache deamon on port 11211
  5. If memcache deamon is running, it checks if the result of this query are already cached. Generally data is cached in memcache as (key,value) pair
  6. Since this is the first visit on your site, ofcourse there is no query being cached till now. Hence your code now connect to your MySQL database and fetched the resultset. Something like this probably. $resultset = mysql_query($query);
  7. After successfully fetching the resultset, your code will cache this resultset in memcache. The code connects to memcache deamon and saves this (key,value) pair in memory, where $key = md5($query) and $value = serialize($resultset)
  8. After caching the (key,value) pair, your code returns back the fetched resultset to the frontpage where it is being displayed to the user
  9. Similarly all the 6 queries which drives your front page are being cached one by one
  10. Now the next time when another user visit this page i.e. http://localhost, your code will first see if $key = md5($query) is already present in cache. It will find the $key being cached, fetches the serialized resultset from memory, unserialize it and throws back to the front page where it is displayed as intended
  11. While caching (key,value) pair you also have a provision to specify a TTL (time to live) after which memcache will automatically expire your cached result
  12. Suppose you specifies a TTL = 15 Minutes for all the above queries and now a visitor visit http://localhost after 30 minutes
  13. Your code will again first connect to memcache deamon and check if $key = md5($query) is present in cache. Memcache deamon will see that this $key is present but it has expired. It will return a result saying $key is not cached and internally flushes out $key. Your code then again connect to MySQL database, fetches the resultset and cache back the results in memcache for further use

I will leave you with a presentation on memcache which I gave sometime back at office. I hope it will help you gain more understanding of memcache.

Memcache

View SlideShare presentation or Upload your own. (tags: caching memcache)

In my next posts, I will be covering a few code samples and use cases of memcache which you wouldn’t even heard of. If you liked the post don’t forget to promote it on social networking sites and do subscribe to my blog. 🙂