WordPress style “Duplicate comment detected” using Memcached and PHP

If you have a knack of leaving comments on blogs, chances are you might have experienced a wordpress error page saying “Duplicate comment detected; it looks as though you’ve already said that!“, probably because you were not sure that your comment was saved last time and you tried to re-post your comment. In this blog post, I will put up some sample PHP code for Duplicate comment detection using Memcached without touching the databases. Towards the end, I will also discuss how the script can be modified for usage in any environment including forums and social networking websites.

Duplicate comment detection using Memcached
Here is a php function called is_repetitive_comment which return some useful value if the comment is repetitive, otherwise FALSE.

<?php

        define('COMPRESSION', 0);
        define('SIGNATURE_TTL', 60);

        $mem = new Memcache;
        $mem->addServer("localhost", 11211);

        function is_repetitive_comment($comment, $username) { // username can be ip address for anonymous env
                                                              // for per blog/forum checks pass forum id too
                                                              // for multi-host using same memcached instance, pass hostname too
                                                              // for restricting post of same comment, don't pass username
                $comment = trim($comment);
                $signature = md5(implode('',func_get_args()));

                global $mem;
                if(($value = $mem->get($signature)) !== FALSE) {
                        error_log($signature." found at ".time());
                        return $value;
                }
                else {
                        $value = array('comment' => $comment,
                                       'by' => $username,
                                       /* Other information if you may want to save */
                                      );
                        $mem->set($signature, $value, COMPRESSION, SIGNATURE_TTL);
                        error_log($signature." set at ".time());
                        return FALSE;
                }
        }

?>

Is it working?
Lets verify the working of the code and then we will dig into the code:

  • Save the sample code in a file, name it index.php
  • Towards the end of the script add following 3 line of code:
            var_dump(is_repetitive_comment("User Comment", "username"));
            sleep(5); // Simulating the case when a user might try to post the same comment again knowingly or unknowingly
                      // Similar kind of check is done in wordpress comment submission (though without memcached)
            var_dump(is_repetitive_comment("User Comment", "username"));
  • Run from command line:
    sabhinav$ php index.php
    6105b67d969642fe9e27bc052f29e259 set at 1262393877
    bool(false)
    6105b67d969642fe9e27bc052f29e259 found at 1262393882
    array(2) {
      ["comment"]=>
      string(12) "User Comment"
      ["by"]=>
      string(8) "username"
    }
  • As seen, function is_repetitive_comment returns bool(false) for the first time. However, after 5 seconds when same comment is being submitted it throws back some useful information from previous submission.

Working of is_repetitive_comment
Here is in brief, how memcached is used for duplicate comment detection by the script:

  • SIGNATURE_TTL defines the time limit between two similar comment submissions. Default set to 60 seconds
  • is_repetitive_comment takes two parameter namely the comment itself and the username of the user trying to post the comment.
  • The function create a signature by combining the passed parameters and checks whether a key=$signature exists in memcache
  • If key is found, it means same user has posted the same comment in past SIGNATURE_TTL i.e. 60 seconds. Function simply return back the value set for the key from memcache
  • However, if key is NOT found, user is allowed to post the comment by returning FALSE. However function also sets a key=$signature into memcache

The value of key=$signature depends upon your application and use case. You might want to save some useful parameters so that you can show appropriate error message without hitting the databases for anything.

Extracting more from the sample script
Here is how you can modify the above sample script for various environments:

  • If you are performing repetitive comment check in an anonymous environment i.e. commenter may not be registered users, you can pass commenter’s ip address instead of username
  • If you serve multiple sites out of the same box and all share the same memcached instance, you SHOULD also pass site’s root url to the function. Otherwise you might end up showing error message to wrong users
  • If you want to restrict submission of same comment per blog or forum, also pass the blog id to the function
  • If you want to simply restrict submission of same comment through out your site, pass only the comment to the function

Let me know if you do similar tiny little hacks using memcached 😀

Web Security : Using crumbs to protect your PHP API (Ajax) call from Cross-site request forgery (CSRF/XSRF) and other vulnerabilities

Have your API calls ever being used directly by someone without your permission? If yes, read on to find out how can we protect our API’s from such spammers and hackers. Before we go ahead and see a possible solution for this, lets try to list out a few cases, when our API’s can be accessed without our permissions.

Common cases of vulnerable API/Ajax calls

  • Ajax calls having no user authentication: This is the first place where a spammer will try to find out a loop hole. Take this example, suppose I created a group chat plugin for my blog. Since it’s a group chat plugin, I don’t really want the blog viewers to register before they can write a messages. Blog viewer only need to provide their name, email and url (just like wordpress comments). Thereafter, they can write messages which are submitted on the server side using ajax calls. And here is the “problem”. Anyone can pick up the ajax url, write a curl script, post the required parameters and fill up my database with millions of messages.
  • Ajax calls having user authentication: One day I realize my group chat plugin has received more than 1 million messages last night (all spams). Hence I decide to make my blog viewers to register before they can post a message on the group chat plugin, simply because someone is filling up my database by simulating ajax calls through a curl script. Anyone can write a script, since these ajax call do not authenticate the user making the call. But are my ajax calls safe after forcing users to register? NO, a registered user too can simulate these ajax calls and passing authentication by sending the right cookies.

Possible solutions and their flaws
If you look around on web, you will find a bunch of solution to such problems. But then every solution have it’s own problem which forces you not to use them. Listed below are 2 possible solutions to our problem:

  • Using X-Requested-With to protect ajax calls: All famous javascript frameworks like JQuery, YUI, Mootools etc sends an additional header parameter while making an XHR request. These libraries set “X-Requested-With=XMLHttpRequest” header, which can then be used on the server side to detect if the call was made through an ajax call. But a programmer can easily pass these headers using a curl script, making the server believe that the call was made through an XHR request.
  • Using HTTP Referrer: This solution comes in handy for cases when a spammer/hacker try to POST data into your site’s. We can check for the referrer page, before we go ahead and accept the POST data. If the POST data is coming from a page within your site, you go ahead and accept the data, otherwise reject it. But this solution again have it’s shortcomings. HTTP Referrer can be tampered in certain browsers using javascript and they can also be stripped away by some proxies and firewalls.

Using crumbs
Finally the idea is to have crumbs. A unique electronic key which is shared between server and client, and which have a short life time. But how are these useful? Suppose, in my group chat module, upon page load i generate a crumb whose life time is 30 minutes (tunable). Why 30 minutes? Because, I assume my blog viewers to either engage into the group chat module or leave that specific blog post within 30 minutes.

Now whenever a user writes a message, this crumb is passed back to the server side. If user writes a message before 30 minutes, this crumb will be validated and user shout submitted. (30 minutes should take care of 99.99% of the cases). In response, server api sends back the new crumb which should be sent back with the next ajax call.

Now when a spammer try to simulate the ajax request using curl calls, he will not be able to succeed because of the absence of the crumb. But he can capture the crumb from the site and simulate the effect, right? YES he can, but we can take care of this by reducing the life time of the generated crumb.

Generating crumbs using PHP
Here are the two functions, I use to generate and verify crumbs in PHP:

        // user for whom crumb is to be generated
        $uid = "[email protected]";

        // usually $salt = DB_PASSWORD . DB_USER . DB_NAME . DB_HOST . ABSPATH;
        $salt = "abcdefghijklmnopqrstuvwxyz";

        function challenge($data) {
                global $salt;
                return hash_hmac('md5', $data, $salt);
        }

        function issue_crumb($ttl, $action = -1) {
                global $uid;

                // ttl
                $i = ceil(time() / $ttl);

                // log
                echo "Generating crumb at time:".time().", i:".$i.", action:".$action.", uid:".$uid.PHP_EOL;

                // return crumb
                return substr(challenge($i . $action . $uid), -12, 10);
        }

        function verify_crumb($ttl, $crumb, $action = -1) {
                global $uid;

                // ttl
                $i = ceil(time() / $ttl);

                // log
                echo "Verifying crumb:".$crumb." at time:".time().", i:".$i.", action:".$action.", uid:".$uid.PHP_EOL;

                // verify crumb
                if(substr(challenge($i . $action . $uid), -12, 10) == $crumb || substr(challenge(($i - 1) . $action . $uid), -12, 10) == $crumb)
                        return true;
                return false;
        }

I can generate crumbs with a simple call:

$crumb = issue_crumb(300, "group_chat_module");

where $ttl = 300 (required), $action = “group_chat_module” (optional, defaults to -1)

Later on I can verify the crumb using another call:

var_dump(verify_crumb(300, $crumb, "group_chat_module"));

I hope this helps you protecting your API’s. Let me know of better methods to stop such attacks.
Enjoy!