Getting started with Autotools – GNU Build System on Debian

If you eat and drink open source, chances are high that you have downloaded the source code of an open source project, only to see files like aclocal.m4, configure.ac, Makefile.am, Makefile.in and what not. You have probably also run commands like ./configure and make. But what are these files? Do they really belong to the project you downloaded? Do you need to understand them? In this blog post I hope to answer all these questions, and to introduce you to the not so popular Autotools, the GNU Build System.

Setting up Autotools on Debian
Before we go ahead and understand what Autotools is, we will try building a HelloWorld package. Let's get started by setting up Autotools on a Debian machine.

  • apt-get install build-essential
  • gcc --version (verifying install)
  • g++ --version (verifying install)
  • apt-get install automake autoconf

You have your environment ready. Let's start packaging the HelloWorld package.

Hello World Source Code
Download full source code from here

We will need to create 5 files for our basic HelloWorld package. Start by creating a directory structure like this:
    helloworld
    |-- configure.ac
    |-- Makefile.am
    |-- README
    `-- src
        |-- Makefile.am
        `-- helloworld.c


src/helloworld.c:

#include <config.h>
#include <stdio.h>

int main (void) {
    puts ("Hello World!");
    puts ("This is " PACKAGE_STRING ".");
    return 0;
}

Note that we include config.h even though we have not written one. The actual config.h will be generated by Autotools when we build the package. Similarly, PACKAGE_STRING is a macro that will be defined inside config.h.


src/Makefile.am:

bin_PROGRAMS = helloworld
helloworld_SOURCES = helloworld.c

Here we tell the build system to build a binary named helloworld from the sources listed on the second line, i.e. helloworld.c. Binaries listed in bin_PROGRAMS are installed into the bin directory by make install.


Makefile.am:

SUBDIRS = src
dist_doc_DATA = README

Here we list the sub-directories that the build should descend into. A bigger project might also have a man directory, a data directory and so on. We also tell the build system to distribute and install the README file with the package.


README:

This is a demonstration HelloWorld package for GNU Automake.
Type `info Automake' to read the Automake manual.

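configure.ac:

AC_INIT([helloworld], [1.0], [[email protected]])
AM_INIT_AUTOMAKE([-Wall -Werror foreign])
AC_PROG_CC
AC_CONFIG_HEADERS([config.h])
AC_CONFIG_FILES([Makefile src/Makefile])
AC_OUTPUT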

Don't leave the post on seeing the above file. We will go through each line of it. configure.ac contains a series of M4 macros that expand to shell code and finally produce the configure script. Autotools has utilities like automake and autoconf (details below) which read this file to generate intermediate and final build files. The macros starting with AC_ are Autoconf macros and those starting with AM_ are Automake macros.

  1. AC_INIT: Initializes autoconf. It takes 3 input parameters: Name of the package, Version of the package and Contact address for bug reports
  2. AM_INIT_AUTOMAKE: Initializes automake. It takes a list of options. -Wall -Werror tells automake to turn on all warnings and report them as errors. During development we will keep error reporting turned on. foreign tells automake that this package doesn't follow the GNU standards. As per the GNU standards we should also distribute files like ChangeLog and AUTHORS, and at this stage we don't want automake to complain about them.
  3. AC_PROG_CC: This line tells the configure script to search for an available C compiler and define the variable CC with its name. Later on, the generated Makefiles use this variable CC for building binaries.
  4. AC_CONFIG_HEADERS: It tells the configure script to generate a config.h file, which is included by helloworld.c. The generated config.h will have content like this:
    /* config.h.  Generated from config.h.in by configure.  */
    /* config.h.in.  Generated from configure.ac by autoheader.  */
    /* Name of package */
    #define PACKAGE "helloworld"
    /* Define to the address where bug reports for this package should be sent. */
    #define PACKAGE_BUGREPORT "[email protected]"
    /* Define to the full name of this package. */
    #define PACKAGE_NAME "helloworld"
  5. AC_CONFIG_FILES: This tells the configure script the list of files it should generate from their *.in templates. This list is also used by the automake utility to know which Makefile.am files it should process. (Note: Each directory should have a Makefile.am file, and as you add new directories, keep adding their Makefiles to AC_CONFIG_FILES, else the build will not consider your new directories while building packages.)
  6. AC_OUTPUT: It is a closing command that actually produces the part of the script in charge of creating the files registered with AC_CONFIG_HEADERS and AC_CONFIG_FILES.

Building a Hello World package for distribution
Let's create our first package for distribution.

  1. cd path/to/helloworld/directory: Change to the project directory
  2. autoreconf --install: This command initiates the build system. You should see something like this as output:
    installing `./missing'
    installing `./install-sh'
    src/ installing `./depcomp'

    Also, if you scan through the HelloWorld directory, you will find a lot of new files generated by the build system. In particular you will see a Makefile.in generated for each Makefile.am. Apart from these, the files of interest are configure and config.h.in.

  3. ./configure: It utilizes *.in files generated by the previous step to build the Makefile, src/Makefile and config.h. You should see something like this on your console:
    checking for a BSD-compatible install... /usr/bin/install -c
    checking whether build environment is sane... yes
    checking for a thread-safe mkdir -p... /bin/mkdir -p
    checking for gawk... no
    checking for mawk... mawk
    checking whether make sets $(MAKE)... yes
    checking for gcc... gcc
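  4. make: Compiles the sources using the generated Makefiles and produces the src/helloworld binary.
  5. src/helloworld: Running the binary prints this on the console: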
    Hello World!
    This is helloworld 1.0.
  6. make distcheck: This target finally creates the helloworld-1.0.tar.gz package for distribution, and checks that the package builds and installs cleanly from it. You should see this on your console at the end of the run:
    helloworld-1.0 archives ready for distribution:

Installing distributed HelloWorld package

  1. Copy the generated package into your temp directory and then issue the following commands
  2. tar -xzvf helloworld-1.0.tar.gz
  3. cd helloworld-1.0
  4. ./configure
  5. make
  6. make install

make install will copy the helloworld binary into the /usr/local/bin directory (you may need root privileges for this step). Try running helloworld from the command line and you should see output similar to what we saw above while building the package. It also copies the README file to the /usr/local/share/doc/helloworld directory. If your package includes man pages, they get installed under /usr/local/share/man automatically.

What is Autotools?
Autotools is a build system developed by GNU which helps you distribute your source code across various Unix systems. The files you are wondering about are auto generated by the Autotools.

Autotools is a combination of several utilities made available by GNU, including:

  1. Autoconf
  2. Automake

There are many others which can be listed above, but for this blog post we will restrict ourselves to Automake and Autoconf only.

Autoconf: autoconf processes files like configure.ac to generate a configure script. When we run the configure script, it reads template files like Makefile.in to generate the final output files, in this case Makefile.

Automake: automake reads all the Makefile.am files and generates the corresponding Makefile.in files, which are used by the configure script as described above.

Happy Packaging!

Memcached and “N” things you can do with it – Part 1

In my last post, MySQL Query Cache, WP-Cache, APC, Memcache – What to choose, I briefly discussed 4 caching technologies which you might have used knowingly or unknowingly.

Towards the end we came to the conclusion that memcached is the best caching solution when you are looking for speed and number of hits per second. In my experience, memcached is capable of handling more than 100 million page views (PVs) per month without any problem. However, towards the end I also discussed why memcached is unreliable and insecure.

In this post I will dig a level deeper into memcached. For ease, here is the so-called table of contents:

  1. Basics: Memcached – Revisited
  2. Code Sample: A memcached class and how to use it
  3. N things: What else can I do with memcached
  4. Case Study: How Facebook uses memcached
  5. DON'Ts: A few things to take care of

Basics: Memcached – Revisited
Memcached was developed by Brad Fitzpatrick when LiveJournal was hitting more than 20 million PVs per day. Handling 20 million PVs was no joke and he needed a better solution to handle such high traffic. Since most blog posts don't change once published, he thought of a model where he could skip the database read for the same posts again and again, or at least reduce the number of database reads. And hence came memcached. Memcached is a daemon, i.e. a process which runs in the background doing its job.

If you are using Ubuntu or Debian like me, here are the steps for installing memcached:

  1. sudo apt-get install php5-memcache
  2. Go to /etc/php5/conf.d/memcache.ini and uncomment the line ;extension=memcache.so to enable this module
  3. sudo pecl install memcache
  4. Go to php.ini file and add this line: extension=memcache.so
  5. sudo apt-get install memcached
  6. sudo /etc/init.d/memcached start
  7. Restart Apache

If you are on Windows, here are the steps which will help you run memcached on a Windows machine:

  1. Download the memcached win32 binary from my vault.
  2. Unzip the downloaded file under C:\memcached
  3. As we need to install memcached as a service, run this from a command line: C:\memcached\memcached.exe -d install
  4. Memcached by default loads with 64MB of memory, which is just not sufficient. Navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\memcached Server in your registry and find the ImagePath entry. Change that to "C:\memcached\memcached.exe" -d runservice -m 512
  5. Start the server by running this command: C:\memcached\memcached.exe -d start
  6. Open folder C:\PHP\ext and check for php_memcache.dll. If you are unlucky enough not to have it, download it from here for PHP-4.x and from here for PHP-5.x
  7. Add extension=php_memcache.dll in your php.ini file
  8. Restart Apache
  9. Download the full package consisting of exe’s, classes and dll’s from here.

A few other options which you can use for starting memcached are:
memcached -d -m 2048 -l <IP> -p 11211: this will start memcached as a daemon, using 2GB of memory, and listening on the given IP address, port 11211. Port 11211 is the default port for memcached.

By now I assume you have a 🙂 on your face, because you have the memcached daemon running on your machine. Windows users can check that by opening the Task Manager and looking for a memcached.exe process. I don't see any reason for you not to be smiling, but in case you are that unlucky Windows user, please leave that system and move to a Unix machine. At least try running Ubuntu on Windows by reading this post of mine, How to configure Ubuntu and LAMP on Windows.

Code Sample: Memcached class
So we have memcached set up on our system. Now we will hook up some simple code with it, which will do all the necessary talking with the daemon. Below I demonstrate a very basic application which will help you get started. This application contains a total of 5 files, namely: database.class.php, memcache.class.php, log.class.php, index.php, memcacheExtendedStats.php.

log.class.php: This is a very basic logger class which will log everything for you to inspect later. Here is the code:

  class logAPI {
    var $LogFile = "log.txt";

    // Append the given message to the log file
    function Logger($Log) {
      $fh = fopen($this->LogFile,"a");
      fwrite($fh,$Log);
      fclose($fh);
    }
  }

Whenever the code connects to the memcached daemon or the database, or fails to connect to either of them, this class will log the appropriate message in log.txt.

database.class.php: This is another basic database class which has methods like getData() and setData(). getData() is used to retrieve rows from the database, while setData() is used to update or insert rows in the database. For this demo application we will only be using the getData() method. Here is the code:


  class databaseAPI {

    /** Database information **/
    var $dbhost = "localhost";
    var $dbuser = "root";
    var $dbpass = "";
    var $dbname = "gtalkbots";
    var $db = NULL;

    // Note: the mysql_* functions used here need PHP 5; they were removed in PHP 7
    function __construct() {
      $this->connect();
    }

    // Function establishes a connection to database //
    function connect() {
      // Connect to the dbhost
      $connection = mysql_connect($this->dbhost,$this->dbuser,$this->dbpass) or die(mysql_error());

      // If connection fails, report it
      if(!$connection) {
        echo "Failed to establish a connection to host";
      }
      else {
        // Select dbname
        $database = @mysql_select_db($this->dbname);

        // If selecting the database fails, report it
        if(!$database) {
          echo "Failed to establish a connection to database";
        }
        else {
          $this->db = $connection;
        }
      }
    }

    // Function closes the database connection //
    function close() {
      mysql_close($this->db);
    }

    // Function executes the query against database and returns the result set   //
    // Result returned is in associative array format, and then frees the result //
    function getData($query,$options=array("type"=>"array","cache"=>"on"),$resultset="") {
      $obj = NULL;

      // Lookup on memcache servers if cache is on
      if($options['cache'] == "on") {
        $obj = new memcacheAPI();
        if($obj->connect()) {
          // Try to fetch from memcache server if present
          $resultset = $obj->getCache($query);
        }
        else {
          // Memcache server is down: fetch query from the database directly
          $obj = NULL;
        }
      }

      // If $resultset == "" i.e. either caching is off or memcache server is down
      // OR $resultset == null i.e. when $query is not cached
      if($resultset == "" || $resultset == null) {
        $result = mysql_query($query,$this->db);
        if($result) {
          if($options['type'] == "original") {
            // Return the original result set, if passed options request for the same
            $resultset = $result;
          }
          else if($options['type'] == "array") {
            // Return the associative array and number of rows
            $mysql_num_rows = mysql_num_rows($result);
            $result_arr = array();
            while($info = mysql_fetch_assoc($result)) {
              $result_arr[] = $info;
            }
            $resultset = array("mysql_num_rows" => $mysql_num_rows,"result" => $result_arr,"false_query" => "no");

            // Free the memory
            mysql_free_result($result);

            // Cache the $query and $resultset
            if($obj) {
              $obj->setCache($query,$resultset);
            }
          }
        }
        else {
          $resultset = array("false_query" => "yes");
        }
      }
      // Otherwise $query was found in the cache, simply return it

      if($obj) {
        $obj->close();
      }
      return $resultset;
    }

    // Function executes the query against database (INSERT, UPDATE) types   //
    function setData($query) {
      // Run the query
      $result = mysql_query($query,$this->db);
      // Return the result
      return array('result'=>$result,'mysql_affected_rows'=>mysql_affected_rows());
    }
  }


memcache.class.php: The memcache class consists of two main methods: getCache() and setCache(). getCache() looks up the (key,value) pair in memory. If it exists, the method unserializes the value and returns it. setCache() is used to set a (key,value) pair in memory. It accepts the key and value, and serializes the value before storing it in the cache.

  class memcacheAPI {

    /* Define the class constructor */
    function __construct() {
      $this->connection = new Memcache;
      $this->log = new logAPI();
      $this->date = date('Y-m-d H:i:s');
      $this->log->Logger("[[".$this->date."]] "."New Instance Created\n");
    }

    /* connect() connects to the Memcache Server */
    /* returns TRUE if connection established */
    /* returns FALSE if connection failed */
    function connect() {
      $memHost = "localhost";
      $memPort = 11211;
      if($this->connection->connect($memHost,$memPort)) {
        $this->log->Logger("[[".$this->date."]] "."Connection established with memcache server\n");
        return TRUE;
      }
      else {
        $this->log->Logger("[[".$this->date."]] "."Connection failed to establish with memcache server\n");
        return FALSE;
      }
    }

    /* close() will disconnect from Memcache Server */
    function close() {
      if($this->connection->close()) {
        $this->log->Logger("[[".$this->date."]] "."Connection closed with memcache server\n");
        $this->log->Logger("=================================================================================================\n\n");
        return TRUE;
      }
      else {
        $this->log->Logger("[[".$this->date."]] "."Connection didn't close with memcache server\n");
        $this->log->Logger("=======================================================================================================\n\n");
        return FALSE;
      }
    }

    /* getCache() function will fetch the passed $query resultset from cache */
    /* returned resultset is null if $query not found in cache */
    function getCache($query) {
      /* Generate the key corresponding to query */
      $key = base64_encode($query);

      /* Get the resultset from cache */
      $resultset = $this->connection->get($key);

      /* Unserialize the result if found in cache */
      if($resultset != null) {
        $this->log->Logger("[[".$this->date."]] "."Query ".$query." was found already cached\n");
        $resultset = unserialize($resultset);
      }
      else {
        $this->log->Logger("[[".$this->date."]] "."Query ".$query." was not found cached in memcache server\n");
      }
      return $resultset;
    }

    /* setCache() function will set the serialized resultset on Memcache Server */
    function setCache($query,$resultset,$useCompression=0,$ttl=600) {
      /* Generate the key corresponding to query */
      $key = base64_encode($query);

      /* Set the value on Memcache Server */
      $resultset = serialize($resultset);
      if($this->connection->set($key,$resultset,$useCompression,$ttl)) {
        $this->log->Logger("[[".$this->date."]] "."Query ".$query." was cached\n");
        return TRUE;
      }
      else {
        $this->log->Logger("[[".$this->date."]] "."Query ".$query." was not able to cache\n");
        return FALSE;
      }
    }
  }

index.php: With everything in place, it's time to test memcached. We will check that memcached is working by running this file twice in a row. Open a command line in the directory containing this code and run: php index.php. Then run php index.php again.

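  include_once("log.class.php");
  include_once("memcache.class.php");
  include_once("database.class.php");

  $mdb = new databaseAPI();

  $query = "SELECT * from status LIMIT 0,1000";
  $resultset = $mdb->getData($query);

  echo "<pre>";
  print_r($resultset);
  echo "</pre>";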


If everything is working fine, you will see a log.txt file being generated which will look as follows.


[[2009-01-18 09:52:57]] New Instance Created
[[2009-01-18 09:52:57]] Connection established with memcache server
[[2009-01-18 09:52:57]] Query SELECT * from status LIMIT 0,1000 was not found cached in memcache server
[[2009-01-18 09:52:57]] Query SELECT * from status LIMIT 0,1000 was cached
[[2009-01-18 09:52:57]] Connection closed with memcache server
[[2009-01-18 09:53:08]] New Instance Created
[[2009-01-18 09:53:08]] Connection established with memcache server
[[2009-01-18 09:53:08]] Query SELECT * from status LIMIT 0,1000 was found already cached
[[2009-01-18 09:53:08]] Connection closed with memcache server

From the log file we can see that the first time the results were fetched from the database, and the second time from memcached 🙂

Before we proceed further, let's walk through the flow of the above scripts. In index.php we create a new instance, $mdb, of the databaseAPI class from database.class.php. Then we $query for up to 1000 rows from the database. $mdb->getData($query) initiates this fetch. As program control goes to the getData() method of database.class.php, it passes control to the getCache() method of memcache.class.php. There the code creates $key = base64_encode($query) and checks whether the result set is cached in memcached. If it doesn't exist, control passes back to getData() of database.class.php, which fetches the rows from the database. After the fetch, it passes the $resultset to the setCache() method of memcache.class.php. There setCache() serializes the $resultset and caches it as ($key,$value) = (base64_encode($query), serialize($resultset)) in memcached.

The next time the same query is fired and control goes to the getCache() method of memcache.class.php, it fetches the result from the cache, unserializes it and returns it to getData() of database.class.php. And that's why you see a log similar to the one above.
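
The flow described above is often called the cache-aside pattern. Here is a minimal sketch of it that runs without a memcached daemon or a database: it assumes a plain PHP array in place of memcached and a callback in place of the MySQL query. The names fetchWithCache, $cache and $loader are hypothetical and not part of the classes above.

```php
<?php
// Cache-aside sketch. $cache is a plain array standing in for memcached,
// and $loader is any callable standing in for the database query.
// fetchWithCache() is a hypothetical helper, not part of the classes above.
function fetchWithCache(array &$cache, string $query, callable $loader): array
{
    // Same key scheme as memcacheAPI: base64 of the query text
    $key = base64_encode($query);

    if (isset($cache[$key])) {
        // Cache hit: unserialize and return without touching the database
        return ['source' => 'cache', 'rows' => unserialize($cache[$key])];
    }

    // Cache miss: load from the database, then store the serialized result
    $rows = $loader($query);
    $cache[$key] = serialize($rows);
    return ['source' => 'database', 'rows' => $rows];
}

$cache = [];
$dbCalls = 0;
$loader = function (string $query) use (&$dbCalls): array {
    $dbCalls++;   // count how often the "database" is hit
    return [['id' => 1, 'status' => 'hello']];
};

$first  = fetchWithCache($cache, "SELECT * from status LIMIT 0,1000", $loader);
$second = fetchWithCache($cache, "SELECT * from status LIMIT 0,1000", $loader);

echo $first['source'], "\n";   // database
echo $second['source'], "\n";  // cache
echo $dbCalls, "\n";           // 1
```

The second call never reaches the loader, which mirrors the two runs of index.php in the log above.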

Finally it’s time to see some statistics. Here is a simple file which will show memcache status:

  $memcache_obj = new Memcache;
  $memcache_obj->addServer('localhost', 11211);

  $stats = $memcache_obj->getExtendedStats();

  echo "<pre>";
  print_r($stats);
  echo "</pre>";


Running it from the command line using php memcache.extendedstats.php will give you a statistics array like this:

    [localhost:11211] => Array
            [pid] => 5472
            [uptime] => 17
            [time] => 1232303504
            [version] => 1.2.5
            [pointer_size] => 32
            [curr_items] => 1
            [total_items] => 1
            [bytes] => 271631
            [curr_connections] => 2
            [total_connections] => 5
            [connection_structures] => 3
            [cmd_get] => 2
            [cmd_set] => 1
            [get_hits] => 1
            [get_misses] => 1
            [evictions] => 0
            [bytes_read] => 271705
            [bytes_written] => 271614
            [limit_maxbytes] => 536870912
            [threads] => 1


This array tells you a number of things about how your memcached daemon and caching architecture are performing. In short, here is what each variable means:

  1. pid: Process id of this server process
  2. uptime: Number of seconds this server has been running
  3. time: Current UNIX time according to the server
  4. version: Version string of this server
  5. rusage_user: Accumulated user time for this process
  6. rusage_system: Accumulated system time for this process
  7. curr_items: Current number of items stored by the server
  8. total_items: Total number of items stored by this server ever since it started
  9. bytes: Current number of bytes used by this server to store items
  10. curr_connections: Number of open connections
  11. total_connections: Total number of connections opened since the server started running
  12. connection_structures: Number of connection structures allocated by the server
  13. cmd_get: Cumulative number of retrieval requests
  14. cmd_set: Cumulative number of storage requests
  15. get_hits: Number of keys that have been requested and found present
  16. get_misses: Number of items that have been requested and not found
  17. bytes_read: Total number of bytes read by this server from network
  18. bytes_written: Total number of bytes sent by this server to network
  19. limit_maxbytes: Number of bytes this server is allowed to use for storage.

However, at this stage the figures you might be most interested in are get_hits and get_misses. get_misses is the number of times you requested a key from memcached and it was not found, while get_hits is the number of times your requested key was successfully retrieved from memcached. Hence, as expected, we currently have get_misses = 1 and get_hits = 1. Try running php index.php once more and get_hits will be incremented by one.
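
A common way to read these two counters together is as a hit ratio, get_hits / (get_hits + get_misses). The sketch below assumes an array shaped like the getExtendedStats() output shown above. hitRatio() is a hypothetical helper, not something the Memcache extension provides.

```php
<?php
// Hit ratio from the counters reported by getExtendedStats().
// hitRatio() is a hypothetical helper; it returns null when no get has run yet.
function hitRatio(array $serverStats): ?float
{
    $hits   = (int) $serverStats['get_hits'];
    $misses = (int) $serverStats['get_misses'];
    $total  = $hits + $misses;
    return $total === 0 ? null : $hits / $total;
}

// Figures taken from the sample output above
$stats = ['localhost:11211' => ['get_hits' => 1, 'get_misses' => 1]];
foreach ($stats as $server => $serverStats) {
    $ratio = hitRatio($serverStats);
    echo $server, ": ", ($ratio === null ? "n/a" : round($ratio * 100, 1) . "%"), "\n";
}
// prints "localhost:11211: 50%"
```

A ratio that stays low after the cache has warmed up usually means the keys expire too quickly or the same data is being requested under different keys.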

N things: What else can I do
Till now you have the memcached daemon running on your system and you know how to communicate with it. You also know a very basic usage of memcached, i.e. caching queries to reduce the load on your database. However, there is a lot more you can do with memcached.

Here I would like to present a few applications of memcached which I learnt from my experience. I hope they will be enough to make you think beyond query caching.

  1. Restricting spammers on your site : Often you will find on social networking sites like Digg, Facebook and Orkut that if you try to add several users as friends within a short span, they (Facebook) will show you a warning or they (Digg) will restrict you from doing it. Similarly, there are cases when you want to send a shout to more than 200 users on Digg and you are restricted from doing so. How will you implement this on your site?

    Ordinary user's solution: One solution is that when a user 'X' adds another user 'Y' as a friend, you check how many friends 'X' has added in the past 10 minutes. If that exceeds 20, you don't allow him to add more friends, or you show him a warning message. Simple enough, and it will work fine. But what if your site has more than a million users, with many hackers around the world trying to keep your servers busy? As a memcached user you should have the following solution in mind:

    Your solution: As 'X' adds 'Y' to his friend list, you set a (key,value) pair in memcached, where $key = "user_x" with $TTL = 10 minutes. For the first friend added, $value = 1. As 'X' adds more friends, you simply increment $value for $key = "user_x". Before each addition, your code checks whether $key = "user_x" is present in memcached and, if it is, checks its value. If the value equals 20, you show a warning message to the user, hence restricting him from adding more than 20 friends within a time span of 10 minutes. After 10 minutes, $key = "user_x" automatically expires and your code will allow 'X' to add friends again. A similar solution works if you want to stop spammers from sending messages or commenting beyond a limit on your site. Now I see confidence building in you as a memcached programmer 😀

  2. Detecting online active/inactive users : Often you want a method which can tell your registered users about their online friends. Or in a forum you want to tell the current reader who else is reading this post. I won't tell how an ordinary user would implement this, but as a memcached user your solution should be:

    Ordinary user's solution: You don't want to think of this.

    Your solution: As user 'X' visits post#12345 in the forum, you not only fetch the post data from the database, you also set two (key,value) pairs.

    $key1 = “post_12345”
    $value1 = [[comma separated list of user names]]
    $TTL1 = [[some arbitrary large value is fine]]

    $key2 = “user_x”
    $value2 = “post_12345”
    $TTL2 = 10 minutes. We assume a user to be inactive if he stays on a particular post for more than 10 minutes (tunable), and we will mark him as inactive.

    (key1,value1) is post-specific data while (key2,value2) is user-specific data. Now every time a user visits post#12345, you do two things. Read all the comma separated user names from $value1, and then check the corresponding $key2 value for each of them. If the corresponding $key2 exists and equals "post_12345", i.e. the user is on the same post and not idle, we keep that user name in the value of $key1. However, if $key2 is not present (i.e. the user has gone away) or the value of $key2 refers to some other post, we remove his user name from $value1. Confused 😛 ? Read it twice and the picture will be clear.

    Can you think of a better solution? Please let me and others know about it. (Remember we are trying to detect only active/inactive users, which is not the same as online/offline users.)

  3. Building scalable web services : Another application of memcached lies in building scalable web services and web widgets. Gtalkbots offers a cool widget which you can put on your blog and sites to show off your aggregated status messages. While building this widget, one thing I kept in mind was: what if someone with a million hits per day puts my widget on his site? Gtalkbots itself gets only a few thousand hits per day, so it would crash purely because of this widget being placed on a high traffic site. So as a memcached user I have done the following.

    Ordinary user's solution: Deprecated

    Your solution: I simply cache the widget data in memcached with $TTL = 1 hour. So every time this million-hits-per-day site is accessed, which loads my widget a million times a day, the data is returned from the cache, saving my server from crashing. Fetch the Gtalkbots widget from here and try putting it on your site.
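
The friend-limit idea from item 1 can be sketched as follows. To keep the example runnable without a daemon, it assumes a small in-memory class, FakeCache, in place of memcached, and passes the current time in explicitly. FakeCache and mayAddFriend() are hypothetical names. The limit of 20 and the 10 minute TTL come from the text above.

```php
<?php
// In-memory stand-in for memcached with per-key expiry (hypothetical class,
// used so the sketch runs without a daemon). Time is passed in explicitly.
class FakeCache
{
    private $items = [];   // key => ['value' => int, 'expires' => int]

    // Return the stored value, or false if the key is missing or expired
    public function get(string $key, int $now)
    {
        if (!isset($this->items[$key]) || $this->items[$key]['expires'] <= $now) {
            unset($this->items[$key]);
            return false;
        }
        return $this->items[$key]['value'];
    }

    // Store a value only if the key is absent (like memcached's "add")
    public function add(string $key, int $value, int $ttl, int $now): bool
    {
        if ($this->get($key, $now) !== false) {
            return false;
        }
        $this->items[$key] = ['value' => $value, 'expires' => $now + $ttl];
        return true;
    }

    // Increment an existing counter without touching its expiry
    public function increment(string $key, int $now)
    {
        if ($this->get($key, $now) === false) {
            return false;
        }
        return ++$this->items[$key]['value'];
    }
}

// Allow at most $limit friend additions per user within $ttl seconds
function mayAddFriend(FakeCache $cache, string $user, int $now, int $limit = 20, int $ttl = 600): bool
{
    $key = "user_" . $user;
    if ($cache->add($key, 1, $ttl, $now)) {
        return true;                 // first addition in this window
    }
    $count = $cache->get($key, $now);
    if ($count !== false && $count >= $limit) {
        return false;                // limit reached: show the warning
    }
    $cache->increment($key, $now);
    return true;
}

$cache = new FakeCache();
$allowed = 0;
for ($i = 0; $i < 25; $i++) {
    if (mayAddFriend($cache, "x", 1000)) {
        $allowed++;
    }
}
echo $allowed, "\n";                                                  // 20
echo var_export(mayAddFriend($cache, "x", 1000 + 601), true), "\n";   // true
```

With the real Memcache extension the same steps map onto Memcache::add() and Memcache::increment(), and the daemon takes care of expiring the key.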

Alright, by now you can impress your bosses with your cool memcached implementations. But wait, there is a lot more you need to know. I could go on listing hundreds of memcached applications here, but the main point is to set your mind as a memcached user. I personally have this habit of linking everything to memcached while designing a system, and if it suits my need, bingo!

  4. Versioning your cache keys : One disadvantage of using a cache is that if something goes wrong with your data and that buggy data is cached, your users will keep seeing the buggy data until your cache expires or you manually clear your cache. Suppose you clear your cache and then one of your engineers comes running, saying the data was fine.

    Ordinary user's solution: Stop it, no more please

    Your solution: As a memcached user, I love to keep a versioning system for my caches. Nothing complex, simply prepend "v1" (the version number) to all your keys, i.e. $key = "user_x" now becomes $key = "v1:user_x", and at one place in your code you have $current_cache_version = "v1". Now suppose you are told your data is buggy. While your engineers are investigating, change $current_cache_version = "v2". This keeps your old cache entries, which you may want to recover after your investigation, and at the same time shows fresh data to your users.

  5. Not so frequent update of DB for trivial data : This is a site-dependent application. Suppose you run a site where you are not so serious about database columns like "last_update_date", "last_logged_in" and so on. However, you still keep track of them for analysis purposes and don't mind if they are not perfectly accurate.

    Your solution: One simple solution to this problem is to keep such trivial data in memcached and set up a cron job which runs every 10 minutes and writes that data to the database. 🙂
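
The key versioning idea from the list above can be sketched in a few lines. As before, a plain array stands in for memcached so the example runs on its own. versionedKey() is a hypothetical helper, while $current_cache_version and the "v1:user_x" key format come from the text.

```php
<?php
// Versioned cache keys. A plain array stands in for memcached, and
// versionedKey() is a hypothetical helper.
function versionedKey(string $version, string $key): string
{
    return $version . ":" . $key;
}

$cache = [];
$current_cache_version = "v1";

// Data cached while v1 is current
$cache[versionedKey($current_cache_version, "user_x")] = "possibly buggy data";

// Suspect the cached data: bump the version instead of flushing the cache
$current_cache_version = "v2";

// Readers now miss on the v2 key and rebuild from the database...
$missOnV2 = !isset($cache[versionedKey($current_cache_version, "user_x")]);

// ...while the v1 entry is still there for the investigation
$stillHasV1 = isset($cache[versionedKey("v1", "user_x")]);

var_dump($missOnV2, $stillHasV1);   // bool(true) bool(true)
```

On a real server the old v1 entries stay around only until they expire or are evicted, so any recovery has to happen within that window.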

I will leave you with a presentation on memcached which I gave some time back at the office. I hope it will help you gain more understanding of memcached.



I hope after reading this you are well equipped to design scalable systems. All the best! Do leave a comment or suggestion if you have any. If you liked this post, do subscribe for the next post in the memcached series.