Setting Nginx, PHP Fastcgi and XCache on a new Ubuntu

Recently, because of a mandatory VPS move I had an opportunity to migrate all my sites from apache to nginx. My old box was in a messy state and setting up a new box from scratch was always going to be fun. Here in this post, I will walk you through all the steps that helped me migrate seamlessly. Specially, how did I setup the new box ensuring zero downtime on the sites.

Ensuring zero downtime while migration:
By the time I will pin up various pieces on my new vps box, I didn’t want my site visitors to see an under-maintenance page. To ensure zero downtime, I usually follow these steps:

  • Setup the new vps with nginx, php-fastcgi, etc as described later in the post
  • To verify the setup on new vps, edit the local host file depending upon your operating system
    /private/etc/hosts (mac)
    /etc/hosts (ubuntu)
    C:WindowsSystem32driversetchosts (windows)
    

    Go ahead and add the following host entry:

    99.198.122.216 abhinavsingh.com

    where, 99.198.122.216 is my new vps ip address and abhinavsingh.com is a vhost configured to handle by nginx on my new vps

  • Now, whenever I visit http://abhinavsingh.com in my browser, my local machine will point it to my new vps box. However, all other visitors will still be served by apache on old vps.
  • After verifying the setup, I simply remove previously added host setting on my local box and update the DNS settings for my site at godaddy.

To start with, just add up the required host setting on your local system.

Installing Nginx and configuring vhosts:
Follow these steps to install nginx webserver:

  • Update and upgrade apt-get and install nginx
    sudo apt-get update
    sudo apt-get upgrade
    sudo apt-get install nginx
  • Configure vhost for nginx by creating a file /etc/nginx/sites-available/mysite.com as follows:
    server {
            listen   80;
            server_name  mysite.com;
            access_log  /var/log/nginx/mysite.access.log;
    
            root   /var/www/mysite;
            index  index.php index.html index.htm;
    
            location / {
            }
    }
  • Enable vhost by creating a symlink as follows:
    cd /etc/nginx/sites-enabled
    ln -s /etc/nginx/sites-available/mysite.com mysite.com
    sudo /etc/init.d/nginx restart
  • Assuming you have configured your local host file correctly, try visiting http://mysite.com and your browser will take it to the new vps

Setting up php-fastcgi and xcache:
Here are the steps to configure php-fastcgi and how to ensure php-fastcgi is up and running even after system reboot. We will also configure xcache for better performance.

  • sudo apt-get install php5-cgi php5-cli php5-xcache
  • Download php-fastcgi default config and place it at /etc/default/php-fastcgi
  • Download php-fastcgi init.d script and place it at /etc/init.d/php-fastcgi
  • Add php-fastcgi init.d as startup script
    update-rc.d -f php-fastcgi defaults
  • Update following fields inside /etc/php5/conf.d/xcache.ini:
    xcache.admin.user = "admin"
    xcache.admin.pass = "pass"
    xcache.size  =  128M
    xcache.count = 4

    xcache.count should ideally be equal to cat /proc/cpuinfo |grep -c processor

  • Setup xcache admin interface:
    cd /var/www
    ln -s /usr/share/xcache/admin xcache
  • Update /etc/php5/cgi/php.ini as per your requirements and start php-fastcgi process
    sudo /etc/init.d/php-fastcgi start
  • Visit xcache admin panel http://vps_ip_address/xcache

Stitching php-fastcgi and nginx vhosts:
Now lets enable php for vhosts configured with nginx:

  • Download nginx fastcgi param config file and place it at /etc/nginx/fastcgi.conf
  • Update /etc/nginx/sites-available/mysite.com with following config:
    location ~ .php$ {
                    fastcgi_pass    127.0.0.1:9000;
                    fastcgi_index   index.php;
                    fastcgi_param   SCRIPT_FILENAME /var/www/mysite$fastcgi_script_name;
                    include         /etc/nginx/fastcgi.conf;
            }
    
  • Restart nginx and we are done

Here we complete the vps setup and vhost configurations. Verify the new vps setup and once satisfied update site’s DNS settings. Another challenge involving migration from apache to nginx includes rewriting apache .htaccess rewrite rules for nginx. However, I will keep that for another post.

Setting up ejabberd 2.1.x development environment on Ubuntu

apt-get provide a convenient way of installing ejabberd on Ubuntu distributions. However, if you are an erlang developer and looking to write custom ejabberd modules, you might want to install ejabberd from the source code.

Checkout ejabberd source
To start with lets grab the ejabberd 2.1.x branch source code:

  • sudo apt-get install git-core
  • git clone git://git.process-one.net/ejabberd/mainline.git ejabberd
  • cd ejabberd
  • git checkout -b 2.1.x origin/2.1.x
  • cd src

Installing pre-requisites
Lets setup necessary pre-requisites before compiling ejabberd source code:

  • sudo apt-get install build-essential
  • sudo apt-get install automake autoconf
  • sudo apt-get install erlang erlang-manpages
  • sudo apt-get install libexpat1-dev zlib1g-dev libssl-dev

Compiling ejabberd source code
Compiling and installing is dead simple:

  • ./configure
  • make
  • sudo make install

Setting up administration account
By now we have ejabberd server setup on our ubuntu box. Lets setup the admin account:

  • sudo ejabberdctl start
  • sudo ejabberdctl register admin localhost password
  • sudo vim /etc/ejabberd/ejabberd.cfg
  • Add following configurations if not already present:
    {acl, admin, {user, "admin", "localhost"}}.
    {loglevel, 5}. %% Log level while development
  • sudo ejabberdctl restart
  • Visit ejabberd admin panel: http://localhost:5280/admin
  • When prompted enter username as [email protected] and the password you chose above while registering admin account

Resolving startup errors
You might see a few error reports in logs or elsewhere while starting ejabberd server:

  • /sbin/ejabberdctl: 340: cannot create //var/lock/ejabberdctl/ejabberdctl-1: Directory nonexistent
  • =ERROR REPORT==== 2010-03-29 21:53:03 ===
    C(<0.995.0>:ejabberd_captcha:331) : The option captcha_cmd is not configured, but some module wants to use the CAPTCHA feature.
  • =INFO REPORT==== 2010-03-29 21:53:03 ===
    I(<0.780.0>:ejabberd_rdbms:37) : ejabberd has not been compiled with relational database support. Skipping database startup.

Here is how you can resolve the above errors:

  • sudo mkdir /var/lock/ejabberdctl/ejabberdctl-1
  • Add following line in ejabberd.cfg
    {captcha_cmd, "/lib/ejabberd/priv/bin/captcha.sh"}.

    Replace /lib/ejabberd/priv/bin/captcha.sh with path to /ejabberd/tools/captcha.sh

Getting started with ejabberd development
Below are links to a few useful resources to get you started with erlang and ejabberd module development:

Memcached and “N” things you can do with it – Part 1

In my last post MySQL Query Cache, WP-Cache, APC, Memcache – What to choose, I discussed in brief about 4 caching technologies which you might have used knowingly or unknowingly.

Towards the end we came to a conclusion that memcached is the best caching solution when you are looking for speed and number of hits per second. By my experience, memcached is capable of handling more than a 100 Million PV’s per month without any problem. However, towards the end I did also discussed why memcached is unreliable and unsecure.

In this post I will dig a level deeper into memcached. For ease here is the so called table of content:

  1. Basics: Memcached – Revisited
  2. Code Sample: A memcached class and how to use it
  3. N things: What else can I do with memcached
  4. Case Study: How Facebook uses memcached
  5. DONT’s: A few things to be taken care

Basics: Memcached – Revisited
Memcached was developed by Brad when live journal was hitting more than 20 Million PV’s per day. Handling 20 Million PV’s was no joke and he needed a better solution to handle such a high traffic. Since most of the blogs doesn’t change once published, he thought of having a model where he can skip the database read for the same posts again and again or atleast reduce the number of database reads. And hence came Memcached. Memcached is a deamon which runs in the background. By deamon you may think of a process running in the background doing its job.

If you are using ubuntu or debian like me, here are the steps for installing memcached:

  1. sudo apt-get install php5-memcache
  2. Go to /etc/php5/conf.d/memcache.ini and uncomment the line ;extension=memcache.so to enable this module
  3. sudo pecl install memcache
  4. Go to php.ini file and add this line: extension=memcache.so
  5. sudo apt-get install memcached
  6. sudo /etc/init.d/memcached start
  7. Restart Apache

If you are on windows, here are the steps which will help you running memcached on windows machine:

  1. Download the memcached win32 binary from my code.google vault.
  2. Unzip the downloaded file under C:memcached
  3. As we need to install memcached as a service, run this from a command line: C:memcachedmemcached.exe -d install from the command line
  4. Memcache by default loads with 64Mb of memory which is just not sufficient enough. Navigate to HKEY_LOCAL_MACHINESYSTEMCurrentControlSetServicesmemcached Server in your registry and find the ImagePath entry. Change that to “C:memcachedmemcached.exe” -d runservice -m 512
  5. Start the server by running this command: C:memcachedmemcached.exe -d start
  6. Open folder C:PHPext and check for php_memcache.dll. If you are unlucky not to have that download from here for PHP-4.x and from here for PHP-5.x
  7. Add extension=php_memcache.dll in your php.ini file
  8. Restart Apache
  9. Download the full package consisting of exe’s, classes and dll’s from here.

A few other options which you can use for starting memcached are:
memcached -d -m 2048 -l 10.0.0.40 -p 11211 , This will start memcached as a daemon, using 2GB of memory, and listening on IP 10.0.0.40, port 11211. Port 11211 is the default port for memcached.

By now I assume you have a 🙂 on your face, because you have memcached deamon running on your machine. Windows users can check that by opening up the task manager and looking for a memcached.exe process. I don’t see any reason for you not smiling, but if in case you are that unlucky windows user, please leave that system and move to a unix machine. Atleast try running Ubuntu on windows by reading this post of mine How to configure Ubuntu and LAMP on Windows.

Code Sample: Memcached class
So we have memcached setup on our system. Now we will try to hook up a simple code with it, which will do all the necessary talking with the deamon. Below I will demonstrate a very basic and simple application which will help you getting started. This application contains a total of 5 files namely: database.class.php, memcache.class.php, log.class.php, index.php, memcacheExtendedStats.php.

log.class.php
This is a very basic logger class which will log every thing for you to inspect later. Here is the code:

  class logAPI {
    var $LogFile = "log.txt";

    function Logger($Log) {
      $fh = fopen($this->LogFile,"a");
      fwrite($fh,$Log);
      fclose($fh);
    }
  }

When ever the code connects with memcached deamon, or database or fail to connect to either of them , this class will log the appropriate message in log.txt

database.class.php
This is another basic database class which have methods like getData() and setData(). getData() is used while trying to retrieve rows from the database while setData() is used while updating or inserting new rows in the database. For this demo application we will only be using the getData() method. Here is the code:

  include_once("memcache.class.php");
  include_once("log.class.php");

  class databaseAPI {

    /************************/
    /** Database information **/
    /************************/
    var $dbhost = "localhost";
    var $dbuser = "root";
    var $dbpass = "";
    var $dbname = "gtalkbots";
    var $db = NULL;

    /******************************/
    // CONSTRUCTOR DEFINITION //
    /******************************/
    function __construct() {
      $this->connect();
    }

    /*************************************************/
    // Function establishes a connection to database //
    /*************************************************/
    function connect() {
      // Connect to the dbhost
      $connection = mysql_connect($this->dbhost,$this->dbuser,$this->dbpass) or die(mysql_error());

	  // If connection fails send a mail to $dbmail about the same
      if(!$connection) {
        echo "Failed to establish a connection to host";
        exit;
      }
      else {
        // Connect to dbname
        $database = @mysql_select_db($this->dbname);

	// If fails to connect to the database send a mail to $dbmail
        if(!$database) {
          echo "Failed to establish a connection to database";
          exit;
        }
        else {
          $this->db = $connection;
        }
      }
    }

    /*******************************************/
    // Function closes the database connection //
    /*******************************************/
    function close() {
      mysql_close($this->db);
    }

    /***********************************************************/
    // Function executes the query against database and returns the result set   //
    // Result returned is in associative array format, and then frees the result //
    /***********************************************************/
    function getData($query,$options=array("type"=>"array","cache"=>"on"),$resultset="") {
        // Lookup on memcache servers if cache is on
	if($options['cache'] == "on") {
	    $obj = new memcacheAPI();
	    if($obj->connect()) {
	        // Try to fetch from memcache server if present
		$resultset = $obj->getCache($query);
	    }
    	    else {
	        // Fetch query from the database directly
	    }
	}
	// If $resultset == "" i.e. either caching is off or memcache server is down
        // OR $resultset == null i.e. when $query is not cached
	if($resultset == "" || $resultset == null) {
	    $result = mysql_query($query,$this->db);
	    if($result) {
		if($options['type'] == "original") {
		    // Return the original result set, if passed options request for the same
		    $resultset = $result;
		}
                else if($options['type'] == "array") {
	            // Return the associative array and number of rows
	    	    $mysql_num_rows = mysql_num_rows($result);
		    $result_arr = array();
		    while($info = mysql_fetch_assoc($result)) {
		        array_push($result_arr,$info);
		    }
		    $resultset = array("mysql_num_rows" => $mysql_num_rows,"result" => $result_arr,"false_query" => "no");
		}
  		// Cache the $query and $resultset
		$obj->setCache($query,$resultset);
		$obj->close();
		return $resultset;

                // Free the memory
		mysql_free_result($result);
            }
            else {
	        $resultset = array("false_query" => "yes");
	        return $resultset;
            }
	}
	else {
	    // If $query was found in the cache, simple return it
	    $obj->close();
	    return $resultset;
	}
    }

    /**********************************************************/
    // Function executes the query against database (INSERT, UPDATE) types   //
    /************************************************************/
    function setData($query) {
      // Run the query
      $result = mysql_query($query,$this->db);
      // Return the result
      return array('result'=>$result,'mysql_affected_rows'=>mysql_affected_rows());
    }

  }

memcache.class.php
Memcache class consists of two methods: getCache() and setCache(). getCache() will look up for the (key,value) pair in memory. If it exists, the method unserialize it and returns back. setCache() is used to set (key,value) pair in memory. It accepts the key and value, serialize the value before storing in cache.

  class memcacheAPI {

	/* Define the class constructor */
	function __construct() {
	  $this->connection = new Memcache;
	  $this->log = new logAPI();
	  $this->date = date('Y-m-d H:i:s');
	  $this->log->Logger("[[".$this->date."]] "."New Instance Created
n"); } /* connect() connects to the Memcache Server */ /* returns TRUE if connection established */ /* returns FALSE if connection failed */ function connect() { $memHost = "localhost"; $memPort = 11211; if($this->connection->connect($memHost,$memPort)) { $this->log->Logger("[[".$this->date."]] "."Connection established with memcache server
n"); return TRUE; } else { $this->log->Logger("[[".$this->date."]] "."Connection failed to establish with memcache server
n"); return FALSE; } } /* close() will disconnet from Memcache Server */ function close() { if($this->connection->close()) { $this->log->Logger("[[".$this->date."]] "."Connection closed with memcache server
n"); $this->log->Logger("=================================================================================================
nn"); return TRUE; } else { $this->log->Logger("[[".$this->date."]] "."Connection didn't close with memcache server
n"); $this->log->Logger("=======================================================================================================
nn"); return FALSE; } } /* getCache() function will fetch the passed $query resultset from cache */ /* returned resultset is null if $query not found in cache */ function getCache($query) { /* Generate the key corresponding to query */ $key = base64_encode($query); /* Get the resultset from cache */ $resultset = $this->connection->get($key); /* Unserialize the result if found in cache */ if($resultset != null) { $this->log->Logger("[[".$this->date."]] "."Query ".$query." was found already cached
n"); $resultset = unserialize($resultset); } else { $this->log->Logger("[[".$this->date."]] "."Query ".$query." was not found cached in memcache server
n"); } return $resultset; } /* setCache() function will set the serialized resultset on Memcache Server */ function setCache($query,$resultset,$useCompression=0,$ttl=600) { /* Generate the key corresponding to query */ $key = base64_encode($query); /* Set the value on Memcache Server */ $resultset = serialize($resultset); if($this->connection->set($key,$resultset,$useCompression,$ttl)) { $this->log->Logger("[[".$this->date."]] "."Query ".$query." was cached
n"); return TRUE; } else { $this->log->Logger("[[".$this->date."]] "."Query ".$query." was not able to cache
n"); return FALSE; } } }

index.php
With everything in place, its time to test memcached. We will check if memcached is working fine by running this code file twice one by one. Open command line and point to this code. Run from command line: php index.php . Then again run from command line php index.php.

  include_once("database.class.php");
  $mdb = new databaseAPI();

  $query = "SELECT * from status LIMIT 0,1000";
  $resultset = $mdb->getData($query);
  $mdb->close();

  echo "
";
  print_r($resultset);
  echo "

";

If everything is working fine, you will see a log.txt file being generated which will look as follows.

log.txt

[[2009-01-18 09:52:57]] New Instance Created
[[2009-01-18 09:52:57]] Connection established with memcache server
[[2009-01-18 09:52:57]] Query SELECT * from status LIMIT 0,1000 was not found cached in memcache server
[[2009-01-18 09:52:57]] Query SELECT * from status LIMIT 0,1000 was cached
[[2009-01-18 09:52:57]] Connection closed with memcache server
=================================================================================================
[[2009-01-18 09:53:08]] New Instance Created
[[2009-01-18 09:53:08]] Connection established with memcache server
[[2009-01-18 09:53:08]] Query SELECT * from status LIMIT 0,1000 was found already cached
[[2009-01-18 09:53:08]] Connection closed with memcache server
=================================================================================================

From the log file we can see that the 1st time results were fetched from the database and for the second time from memcached 🙂

Before we proceed further lets explain the flow of the above scripts. In index.php we create a new instance of database.class.php $mdb. Then we try to $query for 100 rows from the database. $mdb->getData($query) initiates this database fetch. As the program control goes to getData() method of database.class.php, it passed the control to getCache() method of memcache.class.php. There the code create a $key = base64_encode($query) and checks if we have the result set cached in memcached. If it doesn’t exists, it passed control back to getData() of database.class.php which fetches it from the database. After the fetch, it passes the $resultset back to setCache() method of memcache.class.php. There the setCache() method serialize the $resultset and cache it as ($key,$value) = (base64_encode($query), serialize($resultset)) in memcache.

Next time when the same query is fired and control goes to getCache() method of memcache.class.php, it fetches the result from cache, unserialize it and returns back the result to getData() of database.class.php. And thats why you see a log similar to above.

memcache.extendedstats.php
Finally it’s time to see some statistics. Here is a simple file which will show memcache status:

  $memcache_obj = new Memcache;
  $memcache_obj->addServer('localhost', 11211);

  $stats = $memcache_obj->getExtendedStats();

  echo "
";
  print_r($stats);
  echo "

";

Running it from command line using: php memcache.extendedstats.php will give you a statistic array like this.

Array
(
    [localhost:11211] => Array
        (
            [pid] => 5472
            [uptime] => 17
            [time] => 1232303504
            [version] => 1.2.5
            [pointer_size] => 32
            [curr_items] => 1
            [total_items] => 1
            [bytes] => 271631
            [curr_connections] => 2
            [total_connections] => 5
            [connection_structures] => 3
            [cmd_get] => 2
            [cmd_set] => 1
            [get_hits] => 1
            [get_misses] => 1
            [evictions] => 0
            [bytes_read] => 271705
            [bytes_written] => 271614
            [limit_maxbytes] => 536870912
            [threads] => 1
        )

)

This array tells you of a number of things about how your memcached deamon and caching architecture is performing. In short here is what each of the variable would mean:

  1. pid: Process id of this server process
  2. uptime: Number of seconds this server has been running
  3. time: Current UNIX time according to the server
  4. version: Version string of this server
  5. rusage_user: Accumulated user time for this process
  6. rusage_system: Accumulated system time for this process
  7. curr_items: Current number of items stored by the server
  8. total_items: Total number of items stored by this server ever since it started
  9. bytes: Current number of bytes used by this server to store items
  10. curr_connections: Number of open connections
  11. total_connections: Total number of connections opened since the server started running
  12. connection_structures: Number of connection structures allocated by the server
  13. cmd_get: Cumulative number of retrieval requests
  14. cmd_set: Cumulative number of storage requests
  15. get_hits: Number of keys that have been requested and found present
  16. get_misses: Number of items that have been requested and not found
  17. bytes_read: Total number of bytes read by this server from network
  18. bytes_written: Total number of bytes sent by this server to network
  19. limit_maxbytes: Number of bytes this server is allowed to use for storage.

However at this stage the figures which you might be interested in knowing are get_hits and get_misses. get_misses means number of times you requested for a key in memcache and it was not found while get_hits means number of times your requested key was successfully retrieved from memcached. Hence as expected we currently have get_misses = 1 and get_hits = 1. Try running php index.php once more and get_hits will get incremented by one.

N things: What else can I do
Till now you have memcached deamon running on your system and you know how to communicate with the deamon. You also know a very basic usage of memcached now, i.e. cache the queries to reduce load from your database. However there is a lot more you can do with memcached.

Here I would like to present before you a few applications of memcached which I learnt from my experiences. I hope they will be enough to make you think otherwise.

  1. Restricting spammers on your site : Often you will find on social networking sites like Digg, Facebook and Orkut that if you try to add several users as your friend within a short span, they (facebook) will show you a warning or they (digg) will restrict you from doing it. Similarly there are cases when you want to send a shout to more than 200 users on digg and you are restricted from doing so. How will you implement this on your site?

    Ordinary user’s solution: One solution is if a user ‘X’ add another user ‘Y’ as a friend, you will check how many friends has ‘X’ added in the past 10 minutes. If that exceeds 20, you won’t allow him to add more friends or show him a warning message. Simple enough and will work coolly. But what if your site have more than a million users, with many hackers around the world trying to keep your servers busy. As a memcached user you must have following solution in your mind:

    Your solution: As ‘X’ adds ‘Y’ in his friend list, you will set a (key,value) pair in memcached, where $key = “user_x” with $TTL = 10 minutes. For the first friend added, $value = 1. Now as ‘X’ tries to add another friend, your simply increment $value for $key = “user_x”. As this $value equals 20, and ‘X’ tried to add another friend, your code will check the value of $key = “user_x” and see if it’s present in memcached. If it’s present check for it’s value. If it’s value is equal to 20, you show a warning message to the user. Hence restricting him from adding more than 20 friends within a time span of 10 minutes. After 10 minutes, $key = “user_x” will automatically expires and your code will allow ‘X’ to add more friends. Similar solution exists if you want to stop spammers from sending message or commenting over a limit on your site. Now I see confidence building in you as a memcached programmer 😀

  2. Detecting online active/inactive users : Often you want a method which can tell your registered users, about their online friends. Or in a forum you want to tell the current reading user, about who all are reading this post. I won’t tell how an ordinary user will implement this, but as a memcached user your solution should be:

    Ordinary user’s solution: You don’t want to think of this.

    Your solution: As user ‘X’ visit post#12345 in the forum, not only you will fetch post data from the database, you also set two (key,value) pairs.

    $key1 = “post_12345”
    $value1 = [[comma separated list of user names]]
    $TTL1 = [[some arbitrary large value is fine]]

    $key2 = “user_x”
    $value2 = “post_12345”
    $TTL2 = 10 minutes, We assume a user to be inactive if he is on a particular post for more than 10 minutes (tunable), and we will mark him as inactive.

    (key1,value1) is a post specific data while (key2,value2) is a user specific data. Now every time a user visits post#12345, you will do two things. Read all comma separated user names from $value1, and then check for their corresponding $key2 value. If corresponding $key2 exists and equals to $value2 = “post_12345”, i.e. on the same post and not idle, we will keep that user name in value of $key1. However if $key2 is not present (i.e. user gone away) or value of $key2 equals to some other post, we will remove his user name from $value1. Confused 😛 , read twice and the picture will be clear.

    Can you think of a better solution? Please let me and others know about it. (Remember we are trying to detect only active/inactive users, which is not same as online/offline users)

  3. Building scalable web services : Another application of memcached lies in building scalable web services and web widgets. Gtalkbots offer a cool widget which you can put on your blog and sites to show off your aggregated status messages. See this widget live in the right hand sidebar. While building this widget, one thing which I kept in mind was that, what if someone with million hits per day put my widget on his site. Even though, Gtalkbots gets a few thousand of hits per day, it will crash, mainly because of this particular widget being placed on a high traffic site. So as a memcached user I have done the following thing.

    Ordinary user’s solution: Deprecated

    Your solution: I simply cache the widget data in memcache with $TTL = 1 hour. So every time this million hits per day site is accessed, which loads my widget million times a day, the query will be returned from cache. And hence saving my server from crashing. Fetch Gtalkbots widget from here and try putting on your site.

Alright by now you can impress your bosses with your cool memcache implementations. But wait there is a lot more you need to know. I can go on putting hundred’s of memcached applications here, but main point is, setting your mind as a memcached user. I personally have this habit of linking everything to memcached while designing a system, and if it suits my need, Bingo!.

  1. Versioning your cache keys : One disadvantage of using cache at times is that, if unfortunately something goes wrong with your data and that buggy data is cached, your users will keep seeing that buggy data until your cache expires or you manually clear off your cache. Suppose you clear your cache and then one of your fucking engineer comes running saying the data is fine.

    Ordinary user’s solution: Stop it, No more plz

    Your solution: As a memcached user, i would love to keep a Versioning system for my caches. Nothing complex, simply append “v1” (version number) to all your keys. i.e. $key = “user_x” will now become $key = “v1:user_x”. and at a place in your code you have $current_cache_version = “v1”. Now suppose you are told your data is buggy, so by the time your engineers are investigation change $current_cache_version = “v2”. This will keep your old caches, which you may want to recover after your investigation and at the same time show new data to your users.

  2. Not so frequent update of Db for trivial data : This is a site dependent application. Suppose you run a site where you are not so serious about database columns like “last_update_date”, “last_logged_in” and so on. However you still keep a track of such for analysis purpose and don’t mind if it’s not so accurate.

    Your solution: One simple solution to this problem is keep such trivial data in memcached and set up a cron job which will run every 10 minutes and populate such data in the database. 🙂

I will leave you with a presentation on memcache which I gave sometime back at office. I hope it will help you gain more understanding of memcache.

Memcache

View SlideShare presentation or Upload your own. (tags: caching memcache)

I hope after reading this you are well equipped on how to design scalable systems. All the best! , do leave a comment or suggestions if any. If you liked this post, do subscribe for the next post in memcache series.

How to get started with web development?

Many of my friends have asked me on how to start with web development. I am not expert but here in this post I would like to describe a few steps that might help one climb the ladder of web development fast. Kindly feel free to add to this blog post if you think some other methodology might be better off.

  1. Where do I start web development? Well the first and best place to start web development is your own system. Yes your very own personal computer. You don’t need a server on internet to start with. Infact I worked locally on my computer for about a month before I went for free hostings, shared hosting and finally VPS. Once you feel that you are comfortable enough with the tid-bits of web development, then you can move to one of them.
  2. How to configure my local computer for that? You have a number of options to start with. I personally started on Tomcat Server after my initial training at Oracle. Tomcat is suited if you are starting with JSP (Java Server/Servlet Pages) . I started by making a few basic pages like creating a login/registration system. I made a few and then hosted on my office computer. Distributed the links to a few around to get the much needed boost. (Yes if you are into web development you need a lot of encouragement and appreciation from others, else your enthusiasm can soon die out). However I soon realized that in todays world of web 2.0 Tomcat is not the thing. So I shifted to Apache.

    Next option you have is to go for Apache (which I use currently for all my developments). I have written 2 posts on how to configure your local system for the same. Read it here: Web Development Part 1 Apache MySQL PHP . If you are interested in starting on a unix machine (which is always the best thing to do) here is the post where I described: How to configure Ubuntu and LAMP on Windows.

  3. Things to concentrate upon initially? Before you go for any big big applications its better to get your basics right. I sucked for long while working with out dated styles and methodology. Here is a bunch of things which you should initially look to:
    • Choose the <div> style to create your HTML pages. I didn’t knew about this methodology and even went ahead to release Altertunes with <table> structure HTML pages.
    • While looking around on internet you will find loads and lot of framework offerings. They really make your work sooo simple. Takes all the burden away from you. Solves all cross browser compatibility for your application and what not. Find a list of frameworks I have tried before here: Essential Frameworks for web development
    • However one thing which you lose by using such frameworks is the insight. You never get to know what cross browser compatibility issues comes and how are they solved. They provide you easy ajax requests but you never get to know the methodology behind it. The biggest drawback which I find is that even for a simplest of application you always need to include whole bunch of files from framework hence making your application heavy for no reason.
    • Hence I finally went ahead coding everything from scratch and never ever used any framework for any of my works.
    • You need to learn how to prevent XSS (Cross browser scripting) and SQL Injection in your applications. Something very important for all your applications.

  4. Getting to a free hosting server: After initial play at your PC, its time to get to a free hosting server which brings in more knowledge and experience. I am lucky in the fact that I moved to X10 Hosting for the very first time. In my opinion they are one of the best in providing free hosts. The amount of features and largely the amount of support at their forums make them the best free hosting services around. I started with http://imoracle.x10hosting.com as my first ever site. Few applications which I build on that were:
    • User login and registration system, where user was authenticated by an link sent to him through email.
    • Inside I had a few basic applications including contact form, multiple picture uploads etc etc.
    • One of the finest feature I had in that was Gmail Contact Import script, which I made using PHP Curl application.
    • Last but not the least I built a chat application similar to browser based chat applications in google and yahoo mails. (Though not that efficient). That was first and the last time I used a framework for any of my application. It was built using Prototype Windows Class framework.

    I told you about these features as these were the applications which got me started initially. Helped me learn the tid bits of web development. May be you too can give a try to a few of them as your first few applications.

  5. Altertunes: Must be wondering why is this name up here again. Well I agree Altertunes fails as a full time product on web but then actually this self initiated project helped me learn all and everything about web development. It was hugely an ajax dominated website with ajax requests flowing in and out of everything you do inside. Basically the social networking aspect inside altertunes helped me learn most of the things. Key things inside altertunes which helped me develop my web dev skills are:
    • Social networking aspects like adding friends, sending personal messages, dedicating a song to your friends etc etc.
    • Importing Gmail, Yahoo and other email contacts. I initially used self made gmail address book import script. But then probably Gmail changed the ways on their mail and my script started failing. However I was lucky to find various services which helps you import mail contacts.
    • Creating of search bots for mp3. This was one of the things done in PERL instead of PHP. Read my blog on how to create an html parser using PERL here. This search bot was similar to any other search bot but it only used to index mp3 files in the database.
    • Use of various web services API’s like Artist Recommendation API by last.fm, Video API by YouTube, Pictures API by Google.
    • Features like creating your custom playlists, bookmarking artists, creation of analytic backend engine to provide users with the best recommended songs based on users listen history.
    • Finally, I learnt to make custom web players. Alterplayer which I made using OpenLaszlo, was something which I did out of PHP, MySQL, Apache.
    • Being at Oracle at that point in time, helped me in maintaining such a relational database with ease.
    • Auto suggest feature which is similar to google auto suggest. Altertunes used to offer you artist names as you typed in the name of the artist.
    • News from Google and Yahoo news. As there were no API’s available for this, I scrapped the google and yahoo news using PERL scripts and showed them to the users. (Illegal it was though) 😉

    However I must say that your website is not complete till you have all the look and feel and User Interface to lure the users. Initially I thought that the UI is coming up good but then as the features which I wanted to offer increased and with my sucking <table> structured HTML pages everything got screwed. However Altertunes helped me develop my web dev skills a hell lot and helped me understanding the right way of doing things. Probably picking up a small project for yourself is always the best idea, as then only you will always try to make your application look more professional.

  6. Time for a shared host : As the traffic increased on Altertunes I had to quickly shift to a shared hosting. I again went for a shared hosting at X10, where I bought a hosting server for around 80$ a year with 99% uptime till date. All my websites including Altertunes, My personal website and a few other under ground works of mine are hosted on the same shared host. Basically shared host assures you of more % uptime, more bandwidth and more forum support from X10. Those guys just go out of their ways at times to help you out.

    However you always have few drawbacks of a shared host:

    • First of all its a shared host i.e. a single CPU with many document roots (one root for each client) in it. At times hosting service providers over shoot the maximum allowed roots on a single CPU, which makes your CPU share low. And hence sucking performances.
    • Your site can go down even without a single error from your side. Yes this is the case which happened with me, when one of the other user on the same CPU set up a cron job (an automated job which runs every X hours) to take a backup of his server every 1 hour. Well in that case every single hour he used to consume maximum CPU cycles and hence all other users their started experiencing major downtimes. However X10 forum moderators are kind enough. They take immediate actions in such cases and ensure everything is fine.
    • You can’t have all PHP PEAR and Perl modules installed. I wanted to have a PHP PERL module installed for me by which I can write perl codes inside my PHP script. But as its a shared host and a package installed effects everyone on the CPU, I was denied by the X10 hosting admins.
    • Similarly for creating a upload meter application you need to have APC (Alternate PHP Cache) module enabled on your server. And I was again denied this package by the X10 hosting as they will cache all other users file which they might not want to.
    • Finally the biggest drawback is that you don’t have access to the httpd.conf i.e you apache server configuration file and your php.ini your php configuration file. Which at times you may want to change to meet your applications demand.

  7. Finally time for a VPS (Virtual Private Server) : When you start feeling you want a total control on your server and applications, its time to switch to a VPS. However they will cost you more than expected. I started with VPS at VPSFarm which I must say totally sucks. Doesn’t go well for me atleast with Zero support (‘Zero’ means zero). I took the Xen 1024 plan which costs around 40$ per month. However by the time my 1st month got over X10 started offering VPS too and with my experience and relations with them I immediately shifted to a VPS at X10. It too costs me same, around 35$ per month with more RAM and more Disk Space.

    However do not move to a VPS because you want to. It takes a lot of hard work to set it up and maintain it further. You basically have no support what so ever. Your mails not going, file not loading, server down, database not working. No one is accountable for that except you. But then you get all the controls on various configuration files which means you can fine tune the system to meet your needs.

    However one thing which I miss is the comfort of CPanel which used to come in by default with free host and shared hostings. I got so used to it in last 1 year.  Plus being from ECE background doesn’t helps me a great deal either. Anyways I suppose I am here to stay on VPS for long. I am slowly migrating all my websites to VPS. 

I guess this is all I had to share here in this post. If I have missed out on anything which many of you have asked me on chat and mails kindly feel free to ask it here. If I feel I need to add anything, will do so.

For experienced developers who accidentally read till here, feel free to add anything you want to. If you feel anything was wrong in my methodology and approach do let us all know. I am still learning.

All the best for all those starting out there.

Patience is the Key I will say finally.

——————————————————————————-
Other Posts you may want to read:
——————————————————————————-

How to configure Ubuntu and LAMP on Windows

Hello all Linux freaks,

Having already looked upon how to configure Apache-PHP-MySQL on Windows, now here I will try to explain in short how to do the same on Linux OS. I personally don’t have a seperate machine for linux. I run Ubuntu on my Windows machine using VM Ware. So before we go on to see how to configure LAMP on Ubuntu, lets see how can we have Ubuntu running on Windows.

  • For this tutorial I have used VMware-player-2.0.2-59824.exe for VM Ware installation and Ubuntu-7.04-desktop-i386.zip for Ubuntu. You will need to download the same from http://www.vmware.com/download/player/ and http://www.ubuntu.com/GetUbuntu/download
  • Install VM Ware, which is the most simple installer as you can get. At the end it will ask you to reboot the machine and kindly do not skip this step.
  • Now, open VM Ware which you have just installed and you should see something like this:

  • Now click the open button and browse to the folder where you have unzipped the Ubuntu zip file.

  • Click open Ubuntu-7.04-desktop-i386.vmx and thats it. You have just installed and configured Ubuntu on Windows. Simple, Isn’t it ? You should be seeing something like this by now:

For my system, Ubuntu automatically picked up various internet settings. However when I tried running the same from my office, I had to make appropriate changes for proxy setting. Kindly do the same for running internet on your Ubuntu.

Now an important thing before we proceed:

  1. The default administrator password for Ubuntu is ubuntu
  2. By default you are not the admin or root user. Hence you will need to prefix sudo or su to run a command as administrator in the Ubuntu terminal.

Also, before we proceed further kindly check if your Ubuntu is configured correctly for internet connection. Just check by opening this blog through mozilla in ubuntu. If it works, you are all set to configure LAMP on ubuntu.

Follow the following steps to configure LAMP on ubuntu (you need to run a few commands on your terminal window)

  • Open file at /etc/apt/source.list and uncheck the box for install from CD. This will let ubuntu install all modules directly from the repository.
  • $ sudo apt-get upgrade
  • $ sudo apt-get update
  • $ sudo apt-get install mysql-server mysql-admin apache2 php5 libapache2-mod-php5 libapache2-mod-auth-mysql php5-mysql phpmyadmin
  • $ sudo mysqladmin -u root password [YOUR_NEW_PASSWORD]
  • $ sudo /etc/init.d/mysql restart
  • $ sudo /etc/init.d/apache2 restart

Thats pretty much we need. Now let us test our configuration and LAMP setup.

  1. $ sudo vim /var/www/phpinfo.php
  2. Type in the following few lines of code in the file:
    <?php
      phpinfo();
    ?>
  3. Open up your browser and type in http://localhost/phpinfo.php
  4. If you are able to see the php config file information on your browser. Thats it.
  5. Next type http://localhost/phpmyadmin
  6. Login as root i.e. Username : root and Password : [YOUR_NEW_PASSOWRD]
  7. If you are lucky enough you will see the phpadmin console.

Congratulations ! Thats pretty much what exactly we need. Now go on to do all your web development on Linux. Hail Windows 😉

I configure all this stuff long back, so if I have missed out on some issues kindly lemme know and comment the same.