MySQL Query Cache, WP-Cache, APC, Memcache – What to choose

Hello Cache Freaks,

Ever since I changed my job (from Business Intelligence to Web development) and started working with my present employer, I have had a chance to work on a lot of scalable projects. From making my project to scale from 20 Million PV’s to 100 Million PV’s to development of an internal tool, the answer to all scalable applications have been caching.

There are a lot of caching techniques which are being employed by sites worldwide.

  1. WP-Cache used in wordpress – a file system based caching mechanism
  2. APC Cache – an opcode based caching system
  3. Memcache – an in memory caching system
  4. Query Cache – caching mechanism employed in MySQL

Here in this post I would like to pen down my experiences while working with all the caching mechanism. Their pros and cons. What things you need to take care while working with them and every little tit bit which comes to my mind while writing this post.

Query Cache – inbuilt cache mechanism in MySQL
Query cache is an inbuilt cache mechanism for MySQL. Basically when you fire a query against a MySQL database, it goes through a lot of core modules. e.g. Connection Manager, Thread Manager, Connection Thread, User Authentication Module, Parse Module, Command Dispatcher, Optimizer Module, Table Manager, Query Cache Module and blah blah. Discussing these modules is out of scope of this blog post. The only module we are interested here is Query Cache Module.

Suppose I fire a query:
$query = “Select * from users”;

MySQL fetches it for the first time from the database and caches the result for further similar query. Next time when you fire a similar query, it picks the result from the cache and deliver it back to you.

However, there are a few drawbacks with Query Cache:

  • If there is a new entry in the users table, the cache is cleared.
  • Secondly, even if the result of your query is cached, MySQL has to go through a number of core modules before it give back the cached result to you.
  • Thirdly, even if your results are caches, you need to connect to your MySQL database, which generally have a bottleneck with number of connections allowed.

One thing which you should take care while replying on Query Cache is that, your query must not have any parameter which is random or changes quite often. For e.g. If you wish to fetch url’s of 100 photo from the database, and then you want to present them in a random fashion every time, you might be tempted to use rand() somewhere in your MySQL queries. However by using rand() you ensure that the query is never returned from cache, since rand() parameter always makes the query look different.

Similarly, If you have a case where you need to show data not older than 15 days, and by mistake you also include the seconds parameter in your SQL query, then the query will never return from cache.

Hence for a scalable site, with 100 Million PV’s you can’t really survive with a simple query cache provided by MySQL database.

WP-Cache – Caching mechanism for WordPress blogs
WP-Cache is a file system based caching mechanism i.e. it caches your blog posts in form of simple text files which are saved on your file system. You can have a look at these cached files by visiting wp-content/cache folder inside your blog directory. Generally you will find two set of files for a single blog post. One .html and another .meta file.

.html file generally contains the static html content for your blog post. Once published, the blog post content is static, hence instead of fetching it’s data from the database, WP-Cache serves it from the cache directory.

.meta file contains serialized information such as Last Modified, Content-Type, URI which WP-Cache uses to maintain cache expiry on your blog.

WP-Cache works really well, however if the traffic starts increasing on your blog, then the bottleneck will be maximum number of child processes which apache can create (For starters you can think, each user connecting to your blog as one apache process, hence there is a restriction on number of users who can connect to your blog at a particular time). In such case the solution can be either to have multiple front end servers with a load balancer to distribute the traffic among front end servers, or to have a better cache solution such as a memory based caching mechanism (e.g. Memcache). Also since memory read is always faster than file read, you must go for a memory based cache system.

APC Cache – An opcode based cache for PHP
APC stands for Alternative PHP Cache. In one of my previous post How does PHP echo’s a “Hello World”? – Behind the scene, I talk about how PHP churns out “Hello World” for you.

PHP takes your code through various steps:

  1. Scanning – The human readable source code is turned into tokens.
  2. Parsing – Groups of tokens are collected into simple, meaningful expressions.
  3. Compilation – Expressions are translated into instruction (opcodes)
  4. Execution – Opcode stacks are processed (one opcode at a time) to perform the scripted tasks.

Opcode caches let the ZEND engine perform the first three of these steps, then store that compiled form (opcodes) so that the next time a given script is used, it can use the stored version without having to redo those steps only to come to the same result.

However the problem with APC cache is that it is not a distributed cache system. By distributed cache I mean, if you have 3 frontend server then you need to have a copy of this opcode on all the three fronend server. Also like WP-Cache APC is again a file system driven cache system, which is not the optimal solution. Also with APC cache, PHP still has to go through the last step as described above.

Memcache – In memory based cache mechanism
Memcache is the solution when you talk about million PV’s on your site. It is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

For starters, Memcache is not PHP, nor Python or any other thing as you may think. It’s a deamon which runs in the background on your server. Your code connects to it and cache query results, JS, CSS and other cachable data in the server’s memory (RAM). Since its an in-memory caching system, it is faster than any of the discussed caching system above. However it is unreliable and unsecure but then there are ways to tackle this unreliable and unsecure nature of memcache.

Memcache is unreliable because it resides in your system memory. So an event like system reboot or power failure will result in loss of all your cache. Also memcache provides no mechanism to take backup of your caches. Hence once lost, you need to programmatically warmup your caches.

Memcache is unsecure because it doesn’t require any authentication mechanism. There is no username or password with which your code connects to it. (Hence it is super fast, unlike Query cache which has to go through auth module even if the query result is cached). It usually runs at port 11211 on your server and if not taken care, anyone can telnet to port 11211 on your server and steal your caches.

Below are the steps which are being followed on a memcache enabled website:

  1. User enter your site’s url in his browser, say http://localhost
  2. There are about 6 queries which drives your opening page
  3. Lets assume one of the query for this page is $query = “SELECT photo_url from photos LIMIT 0,10”
  4. When the user visit http://localhost, your code will first connect to memcache deamon on port 11211
  5. If memcache deamon is running, it checks if the result of this query are already cached. Generally data is cached in memcache as (key,value) pair
  6. Since this is the first visit on your site, ofcourse there is no query being cached till now. Hence your code now connect to your MySQL database and fetched the resultset. Something like this probably. $resultset = mysql_query($query);
  7. After successfully fetching the resultset, your code will cache this resultset in memcache. The code connects to memcache deamon and saves this (key,value) pair in memory, where $key = md5($query) and $value = serialize($resultset)
  8. After caching the (key,value) pair, your code returns back the fetched resultset to the frontpage where it is being displayed to the user
  9. Similarly all the 6 queries which drives your front page are being cached one by one
  10. Now the next time when another user visit this page i.e. http://localhost, your code will first see if $key = md5($query) is already present in cache. It will find the $key being cached, fetches the serialized resultset from memory, unserialize it and throws back to the front page where it is displayed as intended
  11. While caching (key,value) pair you also have a provision to specify a TTL (time to live) after which memcache will automatically expire your cached result
  12. Suppose you specifies a TTL = 15 Minutes for all the above queries and now a visitor visit http://localhost after 30 minutes
  13. Your code will again first connect to memcache deamon and check if $key = md5($query) is present in cache. Memcache deamon will see that this $key is present but it has expired. It will return a result saying $key is not cached and internally flushes out $key. Your code then again connect to MySQL database, fetches the resultset and cache back the results in memcache for further use

I will leave you with a presentation on memcache which I gave sometime back at office. I hope it will help you gain more understanding of memcache.

Memcache

View SlideShare presentation or Upload your own. (tags: caching memcache)

In my next posts, I will be covering a few code samples and use cases of memcache which you wouldn’t even heard of. If you liked the post don’t forget to promote it on social networking sites and do subscribe to my blog. 🙂

Caching problem in IE with Ajax : A Solution

Hello All,

Problem Definition:
I made my first browser based chat messenger way back in January this year, and I tried making a improved version of it yesterday again. But unfortunately Microsoft continues to give 1000 reasons for a web developer to suicide. They continue to serve the same old buggy browser way back in 2000 and even now.

The problem I was facing with IE7 (I haven’t tested on IE6), while trying to make a browser based chat messenger was that the chat application loads the chat messages from the server successfully only for the first time i.e. when you load the site for the first time. However, after that it continues showing the same old loaded data, even as new messages keep coming in the server. My chat application always worked fine in Firefox, Netscape, Safari and Flock. So what the hell with Internet Explorer.

The Real Problem:
I faced the same problem 6 months back and was facing it again now. Frustrating enough! As a web developer its a reason enough to die. Anyways, I finally tried analyzing the things. I went around installing Wireshark , which lets you see each and every http request made from your computer. Unfortunately, It was just not showing any ajax request being made by my site (opened on IE). What the hell? However, it showed successful ajax requests being made by my site opened on Firefox.

Enough! The problem was clear that, for some reason IE was not allowing the Ajax request being made. Then I went around installing IE Debug Bar, there in the HTTP(S) section you can see each and every single request being made by IE. I restarted IE and opened my site again. There in the HTTP(S) section, I could see ajax request being made successfully every 15 seconds as intended (a live chat application will set that to about 2 seconds). Then why the hell was wireshark not showing the same.On further investigation with DebugBar I found this was what was not allowing my site to make a successful ajax request the second time:

  • In the Headers section of HTTP(S) I could see:
    GET /ajax.php?getchatmessage=true&id=36&username=imoracle HTTP/1.1
    Accept: */*
    Accept-Language: en-in
    Referer: http://altertunes.com/home.php
    UA-CPU: x86
    Accept-Encoding: gzip, deflate
    HTTP/1.1 200 OK
    X-Powered-By: PHP/5.2.5
    Keep-Alive: timeout=5, max=100
    Transfer-Encoding: chunked
    Content-Type: text/html

  • In the Timing section of HTTP(S) I could see:
    Request start time: Thu, 05 Jun 2008 22:29:55
    All request retrieved from cache (no server request)

  • Further, in the Info & Cache section of HTTP(S) I could see:
    URL: http://altertunes.com/ajax.php?getchatmessage=true&id=36&username=imoracle
    Content Type: text/html
    Headers size (bytes): 336
    Data size (bytes): 3
    Total size (bytes): 339
    Transferred data size (bytes): 0
    Cached data: Yes
    Cache File: C:UsersabhiAppDataLocalMicrosoftWindowsTemporary Internet FilesLowContent.IE5O184DNELajax[1].htm
    Cache expires: Thu, 01 Jan 1970 05:30:00
    Cache last modified: Thu, 01 Jan 1970 05:30:00

Hence the conclusion was, IE do make an ajax request every 15 seconds as intended. But as it always reads the content from the the browser cache, it never gets the new messages from the server. And hence, Wireshark was unable to show any request being made from IE browser, as the request never goes out of the computer.

Solution to this disaster:
I tried looking for many solutions available over internet. A few of them asks to set some settings in your browser, which will force browser to read all the data from server directly. But for two reasons you surely won’t like to do this:

  1. You surely would like to cache your various images, css and other files for faster loading of your website.
  2. Next, You simply can’t expect your users to do the settings, just to make sure that your ajax runs on IE.

A few other sites asked to set a header information at the top of the php page. Probably something like this:

header(“Last-Modified: ” . gmdate(“D, d M Y H:i:s”) . ” GMT”);
header(“Cache-Control: no-store, no-cache, must-revalidate”);
header(“Cache-Control: post-check=0, pre-check=0”,false);
header(“Pragma: no-cache”);

But even after setting the following header IE was not allowing the ajax requests to fetch the data from the server.

Working Solution:
Then I thought of why not make my ajax requests look unique every time it is fired. i.e. if my ajax request seems like this in the firebug:
http://altertunes.com/ajax.php?getchatmessage=true&id=36&name=imoracle

why not make it look like this the next time:
http://altertunes.com/ajax.php?getchatmessage=true&id=36&name=imoracle&timeid=1234.4323

and like this the very next time:
http://altertunes.com/ajax.php?getchatmessage=true&id=36&name=imoracle&timeid=4324.3589

I tried this up and Bingo! My ajax was working just as great, as it was in any other browser.

I simply changed the url in my ajax request from:
http://altertunes.com/ajax.php?getchatmessage=true&id=36&name=imoracle to
http://altertunes.com/ajax.php?getchatmessage=true&id=36&name=imoracle&timeid=(Math.random()*100000)

and yes, I have my chat messenger working great now.

Try out this solution if you are facing a similar problem with your Ajax requests. I will try to test the same in IE6 if I can find one, but I am pretty sure it will work great there too.

Feel free to post your comments and modified solutions if any.

Happy Ajaxing.