MySQL Query Cache, WP-Cache, APC, Memcache – What to choose

Hello Cache Freaks,

Ever since I changed my job (from Business Intelligence to Web development) and started working with my present employer, I have had a chance to work on a lot of scalable projects. From making my project to scale from 20 Million PV’s to 100 Million PV’s to development of an internal tool, the answer to all scalable applications have been caching.

There are a lot of caching techniques which are being employed by sites worldwide.

  1. WP-Cache used in wordpress – a file system based caching mechanism
  2. APC Cache – an opcode based caching system
  3. Memcache – an in memory caching system
  4. Query Cache – caching mechanism employed in MySQL

Here in this post I would like to pen down my experiences while working with all the caching mechanism. Their pros and cons. What things you need to take care while working with them and every little tit bit which comes to my mind while writing this post.

Query Cache – inbuilt cache mechanism in MySQL
Query cache is an inbuilt cache mechanism for MySQL. Basically when you fire a query against a MySQL database, it goes through a lot of core modules. e.g. Connection Manager, Thread Manager, Connection Thread, User Authentication Module, Parse Module, Command Dispatcher, Optimizer Module, Table Manager, Query Cache Module and blah blah. Discussing these modules is out of scope of this blog post. The only module we are interested here is Query Cache Module.

Suppose I fire a query:
$query = “Select * from users”;

MySQL fetches it for the first time from the database and caches the result for further similar query. Next time when you fire a similar query, it picks the result from the cache and deliver it back to you.

However, there are a few drawbacks with Query Cache:

  • If there is a new entry in the users table, the cache is cleared.
  • Secondly, even if the result of your query is cached, MySQL has to go through a number of core modules before it give back the cached result to you.
  • Thirdly, even if your results are caches, you need to connect to your MySQL database, which generally have a bottleneck with number of connections allowed.

One thing which you should take care while replying on Query Cache is that, your query must not have any parameter which is random or changes quite often. For e.g. If you wish to fetch url’s of 100 photo from the database, and then you want to present them in a random fashion every time, you might be tempted to use rand() somewhere in your MySQL queries. However by using rand() you ensure that the query is never returned from cache, since rand() parameter always makes the query look different.

Similarly, If you have a case where you need to show data not older than 15 days, and by mistake you also include the seconds parameter in your SQL query, then the query will never return from cache.

Hence for a scalable site, with 100 Million PV’s you can’t really survive with a simple query cache provided by MySQL database.

WP-Cache – Caching mechanism for WordPress blogs
WP-Cache is a file system based caching mechanism i.e. it caches your blog posts in form of simple text files which are saved on your file system. You can have a look at these cached files by visiting wp-content/cache folder inside your blog directory. Generally you will find two set of files for a single blog post. One .html and another .meta file.

.html file generally contains the static html content for your blog post. Once published, the blog post content is static, hence instead of fetching it’s data from the database, WP-Cache serves it from the cache directory.

.meta file contains serialized information such as Last Modified, Content-Type, URI which WP-Cache uses to maintain cache expiry on your blog.

WP-Cache works really well, however if the traffic starts increasing on your blog, then the bottleneck will be maximum number of child processes which apache can create (For starters you can think, each user connecting to your blog as one apache process, hence there is a restriction on number of users who can connect to your blog at a particular time). In such case the solution can be either to have multiple front end servers with a load balancer to distribute the traffic among front end servers, or to have a better cache solution such as a memory based caching mechanism (e.g. Memcache). Also since memory read is always faster than file read, you must go for a memory based cache system.

APC Cache – An opcode based cache for PHP
APC stands for Alternative PHP Cache. In one of my previous post How does PHP echo’s a “Hello World”? – Behind the scene, I talk about how PHP churns out “Hello World” for you.

PHP takes your code through various steps:

  1. Scanning – The human readable source code is turned into tokens.
  2. Parsing – Groups of tokens are collected into simple, meaningful expressions.
  3. Compilation – Expressions are translated into instruction (opcodes)
  4. Execution – Opcode stacks are processed (one opcode at a time) to perform the scripted tasks.

Opcode caches let the ZEND engine perform the first three of these steps, then store that compiled form (opcodes) so that the next time a given script is used, it can use the stored version without having to redo those steps only to come to the same result.

However the problem with APC cache is that it is not a distributed cache system. By distributed cache I mean, if you have 3 frontend server then you need to have a copy of this opcode on all the three fronend server. Also like WP-Cache APC is again a file system driven cache system, which is not the optimal solution. Also with APC cache, PHP still has to go through the last step as described above.

Memcache – In memory based cache mechanism
Memcache is the solution when you talk about million PV’s on your site. It is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

For starters, Memcache is not PHP, nor Python or any other thing as you may think. It’s a deamon which runs in the background on your server. Your code connects to it and cache query results, JS, CSS and other cachable data in the server’s memory (RAM). Since its an in-memory caching system, it is faster than any of the discussed caching system above. However it is unreliable and unsecure but then there are ways to tackle this unreliable and unsecure nature of memcache.

Memcache is unreliable because it resides in your system memory. So an event like system reboot or power failure will result in loss of all your cache. Also memcache provides no mechanism to take backup of your caches. Hence once lost, you need to programmatically warmup your caches.

Memcache is unsecure because it doesn’t require any authentication mechanism. There is no username or password with which your code connects to it. (Hence it is super fast, unlike Query cache which has to go through auth module even if the query result is cached). It usually runs at port 11211 on your server and if not taken care, anyone can telnet to port 11211 on your server and steal your caches.

Below are the steps which are being followed on a memcache enabled website:

  1. User enter your site’s url in his browser, say http://localhost
  2. There are about 6 queries which drives your opening page
  3. Lets assume one of the query for this page is $query = “SELECT photo_url from photos LIMIT 0,10″
  4. When the user visit http://localhost, your code will first connect to memcache deamon on port 11211
  5. If memcache deamon is running, it checks if the result of this query are already cached. Generally data is cached in memcache as (key,value) pair
  6. Since this is the first visit on your site, ofcourse there is no query being cached till now. Hence your code now connect to your MySQL database and fetched the resultset. Something like this probably. $resultset = mysql_query($query);
  7. After successfully fetching the resultset, your code will cache this resultset in memcache. The code connects to memcache deamon and saves this (key,value) pair in memory, where $key = md5($query) and $value = serialize($resultset)
  8. After caching the (key,value) pair, your code returns back the fetched resultset to the frontpage where it is being displayed to the user
  9. Similarly all the 6 queries which drives your front page are being cached one by one
  10. Now the next time when another user visit this page i.e. http://localhost, your code will first see if $key = md5($query) is already present in cache. It will find the $key being cached, fetches the serialized resultset from memory, unserialize it and throws back to the front page where it is displayed as intended
  11. While caching (key,value) pair you also have a provision to specify a TTL (time to live) after which memcache will automatically expire your cached result
  12. Suppose you specifies a TTL = 15 Minutes for all the above queries and now a visitor visit http://localhost after 30 minutes
  13. Your code will again first connect to memcache deamon and check if $key = md5($query) is present in cache. Memcache deamon will see that this $key is present but it has expired. It will return a result saying $key is not cached and internally flushes out $key. Your code then again connect to MySQL database, fetches the resultset and cache back the results in memcache for further use

I will leave you with a presentation on memcache which I gave sometime back at office. I hope it will help you gain more understanding of memcache.

Memcache
View SlideShare presentation or Upload your own. (tags: caching memcache)

In my next posts, I will be covering a few code samples and use cases of memcache which you wouldn’t even heard of. If you liked the post don’t forget to promote it on social networking sites and do subscribe to my blog. :)

  • Sowmya

    great post…followed by a nice presentation ! :)

  • http://abhinavsingh.com admin

    Thanks, though i think this presentation won’t make much sense without the presenter. It mostly have the points I wanted to discuss, but not what I discussed.

  • mustafa

    Very nice post. Specially for beginners like me who have either only heard about these caching technologies or only used them like WP-Cache but never cared to understand what exactly do they do.

    Looking forward to your next memcache post.

  • foobar

    ” Also like WP-Cache APC is again a file system driven cache system” – guess again. APC stores cached data in server’s memory.

  • http://abhinavsingh.com admin

    Oh is it….I didn’t knew that, infact I have never played with APC much, apart from using it for making file upload meter. Will checkout and edit the buggy part. Thanks for pointing this out. :)

  • K

    You don’t have to choose between them  , each has it’s own purpose.  The best solution is always a combination between them.  An excellent post  about caching can be found here :
    http://techblog.tilllate.com/2007/11/30/caching-of-dynamic-data-sets/.

    As a side note , file system caching should not be associated with wordpress (to much credit for wordpress) and you should really do your homework before posting something ( the APC thing).

  • http://abhinavsingh.com admin

    Yeah mate the title doesn’t convey what I wanted to, by chosing I meant, chosing between file system cache and in memory cache. :) Yes I didn’t do any homework for APC, infact zero homework for APC.

  • Jammy

    First of all congratulations on your new theme. I am in love with it, though its copied from google code i guess. Secondly, very nice post and how did you missed that APC stuff :P Nothing serious just pulling your leg ;)

    Lemme know when you open source your theme, will look out for that.

  • Federico

    You can secure memcached by specifying the IP address it listens to when you run the daemon:

    $ ./memcached -d -m 2048 -l 127.0.0.1 -p 11211

    Memcahced listen on 127.0.0.1. The -l is an important option to consider. Binding to an internal or firewalled network interface is suggested.

  • Johannes

    Another note about APC: You can also store PHP variables inside shared memory using APC, not only the Opcode array, which is a nice, fast, cache for stuff you need on,, mostly, every request. See apc_stoe()/apc_fetch()

  • Johannes

    apc_store() I meant

  • Sandyy

    Hey thanx for this post Abhi. Here is a request, if you can possibly write a post on extended usage of memcache. How to use memcache at enterprise level.

    Thanxs Sandyy

  • http://sudhirpatil.livejournal.com/ sudhir

    Nice & very informative posts .. i guess bit of goof up at APC … but thts fine. ur going into my feed reader. Keep on posting articles like this.

  • Rhone

    I specially liked the way you drafted out this blog post. Very much clear and straight forward, with nothing to confuse the readers. Burning your feed, will chk back soon :)

  • Pingback: Abhi Navsingh’s Blog: MySQL Query Cache, WP-Cache, APC, Memcache - What to choose | How2Pc

  • Pingback: Memcached and “N” things you can do with it « Abhi’s Weblog

  • Pingback: Utilisation du cache « Mysql

  • http://jamescoletti.com James Coletti

    APC and similar packages can also store variables in memory, aside from being an opcode cache. While you’re talking about WordPress, there are object caching plugins available, which can leverage APC, XCache, etc. and store WP objects in memory to reduce DB hits. I recently switched to XCache and am using a slightly modified XCache Object Caching plugin for WordPress by Dougal Campbell. Shameless plug of my blog post with my recent experiences here: http://jamescoletti.com/multiple-wordpress-installs-and-object-cache-collision

  • Pingback: 網站製作學習誌 » [Web] 連結分享

  • http://www.alachisoft.com Kevin Clark

    Alachisoft’s NCache is a great tool for both in-proc and out-proc caching with loads of other cool features which you won’t find in a lot of Caching Solutions available in the market. It also has a free version called Ncache Express.

  • http://electchep.blogspot.com niyata

    thank !!

  • http://electcheap.blogspot.com niyata

    Very much clear and straight forward

  • http://www.vigrxdeals.net buy_vigrxplus

    The best information i have found exactly here. Keep going Thank you

  • Pingback: Building a Custom PHP Framework with a custom template caching engine using Output Control functions | Abhi's Weblog

  • http://www.zoombits.fr/ram ddr

    Hi,
    Thanks for a good day we left with loads of ideas and thing to think about. Once again thanks for sharing.

  • http://af-design.com/blog Erik Giberti

    Abhinav,
    I ran a comparison looking at the raw performance differences between APC and Memcached that shows APC cache is roughly 30% faster than Memcached – but is not without limitations. Be sure to look at the results on my blog:
    http://af-design.com/blog/2009/01/28/php-memory-caching-performance/
    The real key is that you need to find the levels of caching that benefit your applications usage. It’s rarely sufficient to provide a single layer of caching. It should also be noted that another excellent way to cache content that isn’t covered here (or in my article) is to generate static files (think .html) that require no processing at all for content that rarely changes. That helped Slashdot to survive large scale traffic during the Sept. 11th terrorist attacks in the US.
    http://www.ojr.org/ojr/workplace/1015016724.php
    Erik

  • Pingback: MySQL Query Cache, WP-Cache, APC, Memcache – What to choose | Abhi’s Weblog

  • jak

    Nice overview. Though i think now the web is moving more towards NoSQL solutions. Redis is one hot solution in the market.

  • Mayank

    Very Nice Article.

    I’ve a question though :

    How does the memcache work in distributed environment?

    How the data cahced by one appllication by using memcache can be used by other application?

    Thanks.
    Mayank.

    • http://abhinavsingh.com Abhinav Singh

      Hi Mayank,

      Well consider memcached as a distributed hash table allowing you to set ($key, $value) pairs and later on retrieve them by $key name.

      Hence a ($key, $value) pair set by some application is always accessible to any other application who can query for $key from memcached.

      If you have more than one memcached box in your pool, try using consistent hashing otherwise you might run into unexpected results.

    • Mayank

      Hey Abhinav, thanks for the quick reply.

      I’wd like to know, what will be the best solution in multi-server(Front End servers) web aplication.

      1. Just have only single memcahce server.

      2. Have multiple memcache daemon( memcache pool).

      In 2 case, is it possible to have multiple daemons in sych?

    • http://abhinavsingh.com Abhinav Singh

      Hi Mayank,

      Well choosing a single or multiple clustered instances of memcached depends completely upon your application peak load. Start with one daemon, and see if it is enough for your site. If not add another memcached daemon. Infact you can keep adding as many as you may want to.

      Best will be to read memcached wiki here: http://code.google.com/p/memcached/w/list . You will get a lot of idea how memcached should be used.

      All the best!

    • Mayank

      Thanks Abhinav.

      Still have a doubt :)

      Is it possible to have multiple memcahce daemons in sych?

    • http://abhinavsingh.com Abhinav Singh

      I have a doubt, what do you mean by sync :D

    • Mayank

      Synch* :)

    • Mayank

      Anyway, thanks for the information!!!!!!

    • http://abhinavsingh.com Abhinav Singh

      Well if you mean something similar to replication by “sync” then i will suggest you to kindly go ahead and read the wiki. Memcached is not like MySQL and have no inbuilt support for key replication. All has to be managed by you.

  • Pingback: Memcached y cosas que se pueden hacer con él | Sentido Web

  • Pingback: Enlaces rápidos (16-02-2010) | Sentido Web

  • http://chatchennai.com Pradeep

    I just wanted to clarify on the caching levels, since I see readers here asking “Which of these do I use”

    As you had mentioned, APC works at the source code level, removing the need for Scanning, Parsing, Compilation of your PHP files for each request.

    Memcached is a self-managed cache where you can store stuff like database resultsets.

    An ideal architecture would use both in tandem.. APC or Zend Performance Cache for the opcode caching and memcached for the application-specific database resultset caching.

    Memcahced cannot do what APC does for you, and APC cannot do what Memcached does for you!

  • Deeps

    Hi,

    Any updates to this wrt. PHP 6? If at all anything is changing..

    Thanks!

  • sanjiv

    Very Nice and informative topics.

  • http://www.maltblue.com Matthew Setter

    A nicely comparison of a good set of caching options and where they do and do not work suitably.

    Matt

  • romeo antony

    Great post. Clearly explained