How to use locks in PHP cron jobs to avoid cron overlaps

Cron jobs are hidden building blocks of most websites. They are generally used to process or aggregate data in the background. However, as a website grows and each cron job has gigabytes of data to process, chances are that our cron jobs will start to overlap and possibly corrupt our data. In this blog post, I will demonstrate how we can avoid such overlaps using simple locking techniques. I will also discuss a few edge cases we need to consider when using locks to avoid overlaps.

Cron job helper class
Here is a helper class (cron.helper.php) that helps us avoid cron job overlaps. (See the usage example below.)

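<?php

	// Directory that holds the lock files (must exist, be writable and end with '/').
	define('LOCK_DIR', '/Users/sabhinav/Workspace/cronHelper/');
	define('LOCK_SUFFIX', '.lock');

	class cronHelper {

		private static $pid;

		// Static-only helper: no instances, no clones.
		private function __construct() {}

		private function __clone() {}

		// Lock file for the current script, e.g. LOCK_DIR/job.php.lock.
		// basename() stops "php /full/path/job.php" from producing a bogus path.
		private static function lockfile() {
			global $argv;
			return LOCK_DIR.basename($argv[0]).LOCK_SUFFIX;
		}

		// TRUE if self::$pid appears in the system process list.
		private static function isrunning() {
			$pids = array_map('trim', explode(PHP_EOL, `ps -e | awk '{print $1}'`));
			return self::$pid !== '' && in_array(self::$pid, $pids, TRUE);
		}

		public static function lock() {
			$lock_file = self::lockfile();

			if(file_exists($lock_file)) {
				// A lock exists: is the process that wrote it still alive?
				self::$pid = trim(file_get_contents($lock_file));
				if(self::isrunning()) {
					error_log("==".self::$pid."== Already in progress...");
					return FALSE;
				}
				else {
					error_log("==".self::$pid."== Previous job died abruptly...");
				}
			}

			// Record our own PID so later runs can check on us.
			self::$pid = getmypid();
			if(file_put_contents($lock_file, self::$pid) === FALSE) {
				error_log("==".self::$pid."== Could not write lock file ".$lock_file);
				return FALSE;
			}
			error_log("==".self::$pid."== Lock acquired, processing the job...");
			return self::$pid;
		}

		public static function unlock() {
			$lock_file = self::lockfile();

			if(file_exists($lock_file))
				unlink($lock_file);

			error_log("==".self::$pid."== Releasing lock...");
			return TRUE;
		}

	}

?>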

Using cron.helper.php
Here is how the helper class can be integrated into your existing cron job code:

  • Save cron.helper.php in a folder called cronHelper
  • Update LOCK_DIR as per your needs (it must end with a trailing slash)
  • Make sure the cronHelper folder (or whichever LOCK_DIR you chose) is writable by the user the cron job runs as
  • Wrap your cron job code as shown below:
    <?php
    
    	// __DIR__ keeps the include working no matter which directory cron starts in.
    	require __DIR__.'/cronHelper/cron.helper.php';
    
    	if(($pid = cronHelper::lock()) !== FALSE) {
    
    		/*
    		 * Cron job code goes here
    		*/
    		sleep(10); // Cron job code for demonstration
    
    		cronHelper::unlock();
    	}
    
    ?>

Is it working? Verify
Let's verify that the helper class really takes care of all the edge cases.

  • sleep(10) is our cron job code for this test
  • Run from the command line:
    sabhinav$ php job.php
    ==40818== Lock acquired, processing the job...
    ==40818== Releasing lock...
    

    where 40818 is the process ID of the currently running cron job

  • Run from the command line and terminate the cron job midway by pressing Ctrl+C:
    sabhinav$ php job.php
    ==40830== Lock acquired, processing the job...
    

    By pressing Ctrl+C, we simulate a cron job dying midway, for example because of a fatal error or a system shutdown. In such cases the helper class never gets the chance to release the lock, so the lock file is left behind.

  • With the stale lock still in place (check with ls -l cronHelper | grep lock), run from the command line again:
    sabhinav$ php job.php
    ==40830== Previous job died abruptly...
    ==40835== Lock acquired, processing the job...
    ==40835== Releasing lock...
    

    As seen, the helper class detects that the previous cron job died abruptly and then allows the current job to run successfully.

  • Run the cron job from two command-line windows at the same time. One of them will not proceed, as shown below:
    centurydaily-lm:cronHelper sabhinav$ php job.php
    ==40856== Already in progress...
    

    The second job exits because a cron job with PID 40856 is already in progress.

Working of cron.helper.php
The helper class creates a lock file inside LOCK_DIR. For our test cron job above, the lock file name will be job.php.lock. The lock file name suffix can be configured using LOCK_SUFFIX.

cronHelper::lock() writes the process ID of the currently running cron job into the lock file. When the job completes, cronHelper::unlock() deletes the lock file.

If cronHelper::lock() finds that the lock file already exists, it reads the previous cron job's process ID from it and checks whether that process is still running. If the previous job is still in progress, the current job aborts. If it is not (i.e. it died abruptly), the current cron job acquires the lock.
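The ps check in cronHelper::isrunning() runs an external command every time. As an alternative, here is a minimal sketch that assumes the posix extension is loaded (common in Linux and macOS CLI builds). posix_kill() with signal 0 asks the kernel whether a PID exists without actually delivering a signal. isProcessRunning() is a hypothetical name and is not part of cron.helper.php.

```php
<?php

// Hypothetical drop-in for cronHelper::isrunning().
// Signal 0 is never delivered: the kernel only checks that the PID exists
// and that we are allowed to signal it.
function isProcessRunning(int $pid): bool
{
    if ($pid <= 0) {
        // 0 and negative values address process groups, not one process.
        return false;
    }

    if (function_exists('posix_kill')) {
        if (posix_kill($pid, 0)) {
            return true;
        }
        // EPERM (1): the process exists but belongs to another user.
        return posix_get_last_error() === 1;
    }

    // Fallback for Linux builds without the posix extension.
    return file_exists("/proc/$pid");
}
```

Like the ps version, this only proves that some process currently has that PID. If the OS has recycled the dead job's PID, the lock will look held until that unrelated process exits.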

This is the classic method for avoiding cron overlaps, but there are other ways of achieving the same thing. If you know of any, do let me know in the comments.

  • ndlinh

    Nice!

    However, I prefer to use a cache to set a lock variable with an expiry time, rather than using a file.

  • Nate

    I use a database to hold cron locks since my cron jobs are balanced across a number of web nodes. I have a problem where a script or system crash can occur and the lock will never be removed. Does anyone know of a good solution to automatically handle locks across multiple web nodes, in cases where the script crashes and the lock is still present? The processes are run through http through a load balancer on a local network.

    • ndlinh

      Hi Nate,

      If you use a cache to store the locking flag and set an expiry time on it, you can solve your problem.

    • http://abhinavsingh.com Abhinav Singh

      Hi Nate,

      As ndlinh suggested you can use cache like memcached to have your locks in place.

      Since memcached is a distributed solution all your nodes will be able to detect the lock. In case the process dies abruptly, the lock will expire automatically after $ttl. (Though $ttl will act as a tuning parameter here)
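Below is a minimal sketch of the TTL-based lock suggested here. It assumes a store with an atomic add-if-absent operation. Memcached::add($key, $value, $expiration) fails when the key already exists, and Memcached::delete($key) removes it, so a Memcached-backed adapter could forward to those two calls. The TtlStore interface, the in-memory ArrayTtlStore (for local testing only) and the "cronlock:" key prefix are all hypothetical.

```php
<?php

// Hypothetical store contract: atomic "add if absent" with expiry.
// Memcached::add() / Memcached::delete() have this shape.
interface TtlStore
{
    public function add(string $key, string $value, int $ttl): bool;
    public function delete(string $key): bool;
}

// In-memory stand-in for tests; NOT shared between processes or nodes.
class ArrayTtlStore implements TtlStore
{
    /** @var array<string, array{0: string, 1: int}> value and expiry time */
    private array $items = [];

    public function add(string $key, string $value, int $ttl): bool
    {
        $now = time();
        if (isset($this->items[$key]) && $this->items[$key][1] > $now) {
            return false; // key present and not yet expired
        }
        $this->items[$key] = [$value, $now + $ttl];
        return true;
    }

    public function delete(string $key): bool
    {
        $existed = isset($this->items[$key]);
        unset($this->items[$key]);
        return $existed;
    }
}

// Take the lock; it expires on its own after $ttl seconds if the holder
// dies without calling releaseTtlLock().
function acquireTtlLock(TtlStore $store, string $job, int $ttl): bool
{
    $owner = gethostname() . ':' . getmypid(); // which node/process holds it
    return $store->add('cronlock:' . $job, $owner, $ttl);
}

function releaseTtlLock(TtlStore $store, string $job): void
{
    $store->delete('cronlock:' . $job);
}
```

$ttl must be comfortably longer than the job's worst-case runtime. Otherwise a slow but healthy job can lose its lock, which is exactly Nate's 30-versus-90-minute concern further down. Also note that in this stand-in a $ttl of 0 expires immediately, whereas Memcached treats 0 as "never expire".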

    • Kishore Kumar

      Dear Abhinav,

      Will this work when two or more cron jobs (scheduled jobs) are running on the same server? I am using PHP on an IIS server on Windows 2008. Please help.

    • Nate

      The cache solution sounds promising, or at least better than my current solution. I still have one problem: what if the script is actually still running? In some cases running the script twice could bring the database down or corrupt data. I have seen situations where a script takes an average of 30 minutes to run, but sometimes it takes 90+ minutes due to heavy system load. It can be very unpredictable. In a situation where you can never risk the script running twice, would the only solution be to check whether the script is running through Apache? I think I can do that by parsing server-status.

    • http://abhinavsingh.com Abhinav Singh

      Yeah, a cache-based solution can serve you better, but remember it's only a cache: if the cache is flushed or the key is evicted, you might end up running the script twice. Check this link http://tinyurl.com/yz48ga9 in case you are using memcached.

      Still, a 90-minute or even a 30-minute cron job seems like a bad design to me. In such cases it's better to break the job down into several components. If I knew exactly what you are trying to achieve with these cron jobs, I could probably suggest a better solution.

    • Nate

      The scripts in question create “preferred lists” of user information based on fairly intensive database aggregations. The lists are then stored in memcache and accessed by the application from there. I have each “preferred list” job in a separate script. There is probably some redesign that could be done to optimize things, but I am hoping to find a solid PHP cron solution that is as robust as a simple bash lock file implementation. This always worked so well on a single node.

      dir=/var/www/html/hfs/includes/cron
      lock=/var/run/cache_primer-bfs.lock

      if [ ! -e "$lock" ]; then
        trap "rm -f $lock; exit" INT TERM EXIT ERR
        touch "$lock"
        php "$dir/cache_primer-bfs.php"
        rm "$lock"
        trap - INT TERM EXIT ERR
      else
        echo "cache_primer-bfs is already running (check lock file)"
      fi

    • http://abhinavsingh.com Abhinav Singh

      Solution 1: Based on the job description, I guess a memcache-based solution can serve you, though don’t rely on caches alone for such jobs.

      Solution 2: Since your cron jobs are interacting with databases, you can very well use the db itself for cron synchronization. Have a pid column per row, which is populated by the process id and probably hostname of the cron job processing it.

      Solution 3: Divide the rows in the database among different cron jobs based on the primary key (just like the consistent hashing algorithms memcached clients use to decide which key goes to which server), so that the cron jobs on each machine know exactly which rows they should process. Then use a local locking mechanism on each box, just like the code you posted, and everything should work out well.

      Hope it helps and let me know how it goes :)
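Solution 3 can be sketched with simple modulo partitioning on the primary key. This is a deliberately simpler scheme than the consistent hashing mentioned above, and idsForNode() is a hypothetical helper. In MySQL, for example, the equivalent row filter would be WHERE MOD(id, N) = k.

```php
<?php

// Hypothetical helper: which row IDs node $nodeIndex (0-based) should process
// when $nodeCount nodes share the work. Each ID maps to exactly one node.
function idsForNode(array $ids, int $nodeIndex, int $nodeCount): array
{
    if ($nodeCount < 1 || $nodeIndex < 0 || $nodeIndex >= $nodeCount) {
        throw new InvalidArgumentException('nodeIndex must be in [0, nodeCount)');
    }
    // Keep IDs whose remainder matches this node, then reindex from 0.
    return array_values(array_filter(
        $ids,
        fn(int $id): bool => $id % $nodeCount === $nodeIndex
    ));
}
```

For example, with three nodes idsForNode([1, 2, 3, 4, 5, 6], 0, 3) returns [3, 6]. Because every ID belongs to exactly one node, a per-box lock such as cronHelper is enough. The trade-off compared with consistent hashing is that changing the node count reassigns most rows.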

  • http://kevin.vanzonneveld.net kvz

    You can also open a listening socket on an unused port. Unix will never allow another process to open one on the same port.
    Solo by Tim Kay uses this method: http://timkay.com/solo/ so you don’t have to put it in your code.

    Another way is to avoid crons altogether and turn your script into a daemon http://kevin.vanzonneveld.net/techblog/article/create_daemons_in_php/

    • http://abhinavsingh.com Abhinav Singh

      Hi Kevin,

      Need to look into solo, seems promising.
      And yes, if one can have an event-based daemon, you don’t need any crons :D Also, thanks for your php.js project.
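Here is Kevin's port-binding idea sketched in PHP. It assumes nothing else on the machine uses the chosen port; 47123 is an arbitrary, hypothetical choice, and acquirePortLock() is a hypothetical helper. The OS refuses a second listener on the same address and port, and it frees the port when the holding process exits for any reason. As a result, no stale lock is ever left behind.

```php
<?php

// Hypothetical helper: use a listening TCP socket on 127.0.0.1:$port as a lock.
// @return resource|false  the server socket, or false if the port is taken
function acquirePortLock(int $port)
{
    $errno = 0;
    $errstr = '';
    // @ hides the "Address already in use" warning when another job holds it.
    return @stream_socket_server("tcp://127.0.0.1:$port", $errno, $errstr);
}
```

Keep the returned resource in a variable for the whole run, for example $lock = acquirePortLock(47123); if ($lock === false) exit;. Once the resource is closed or garbage-collected, the port, and with it the lock, is released.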

  • http://herdian.ferdianto.com/ ferdhie

    You could use register_shutdown_function() to unlock cleanly.
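Below is a sketch of this suggestion. It assumes the lock file path is known where the lock is taken, and the helper names are hypothetical. register_shutdown_function() covers normal completion, exit() and fatal errors. Ctrl+C and SIGTERM are different: an unhandled signal ends the process without running shutdown functions, so they need the optional pcntl extension.

```php
<?php

// Hypothetical helper: delete the lock file if it is still there.
function releaseLockFile(string $lockFile): void
{
    if (is_file($lockFile)) {
        unlink($lockFile);
    }
}

// Hypothetical helper: arrange for releaseLockFile() to run when the script ends.
function installLockRelease(string $lockFile): void
{
    // Runs on normal completion, exit() and fatal errors.
    register_shutdown_function('releaseLockFile', $lockFile);

    // With pcntl (async signals need PHP 7.1+), turn Ctrl+C / SIGTERM into
    // exit(), which in turn runs the shutdown function registered above.
    if (function_exists('pcntl_async_signals')) {
        pcntl_async_signals(true);
        $handler = function (int $signo): void {
            exit(128 + $signo); // conventional "killed by signal" exit status
        };
        pcntl_signal(SIGINT, $handler);
        pcntl_signal(SIGTERM, $handler);
    }
}
```

Only install the release after the lock has actually been acquired. Otherwise a job that lost the race would delete the winner's lock file on its way out. kill -9 still cannot be caught, so a stale-lock check such as cronHelper's PID test remains necessary.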

  • http://www.ajohnstone.com Andrew Johnstone

    Hi,

    I just wrote a post to elaborate on the above. In addition Nate, you might want to take a look at Gearman.

    Cheers,
    Andy

  • http://blackshell.usebox.net/ Juanjo

    The socket idea seems better, because if the cron job dies, the socket is freed without executing an external command (ps in your code).

  • http://www.thedeveloperday.com Zilvinas

    Nice helper class. Thanks for your work. I would suggest defining a setLogger method to be able to inject a custom logger implementing a common interface.

  • http://abhinavsingh.com Abhinav Singh

    Hi Zilvinas,

    Thanks. Yeah this is still a very generic example and class file, surely can be customized for all good reasons :)

  • http://blog.kitamura.jp/ bgv

    I’ve used the ps command to find the PHP script's file name (with its path) in the process list, like:

    exec('COLUMNS=255 ps xa', $Tasks);

    This way I can check not only whether the script is running, but also how many instances are running, and limit them to 1 or any other number…
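Here is a sketch of this instance-counting approach. countInstances() is a hypothetical helper that takes the ps output as an array of lines, so the matching logic can be tested on its own. It only counts PHP interpreters whose arguments include the script's file name.

```php
<?php

// Hypothetical helper: count PHP processes running $script, given one
// command line per entry (e.g. the output of `ps -e -o args=`).
function countInstances(array $psLines, string $script): int
{
    $count = 0;
    foreach ($psLines as $line) {
        $words = preg_split('/\s+/', trim($line));
        // Only consider PHP interpreters (php, php8.1, php-cli, ...).
        if (strpos(basename($words[0]), 'php') !== 0) {
            continue;
        }
        // Match "job.php" or "/full/path/job.php" among the arguments.
        foreach (array_slice($words, 1) as $arg) {
            if (basename($arg) === $script) {
                $count++;
                break;
            }
        }
    }
    return $count;
}
```

On a live system the lines could come from exec('ps -e -o args=', $lines). The running job sees itself in that list, so a limit of one instance means exiting when countInstances($lines, basename(__FILE__)) > 1. As bgv's COLUMNS=255 hints, ps may truncate long command lines, which can hide a match.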

  • http://johnbokma.com/ John Bokma

    All solutions that do the following:

    if ( lock file doesn’t exist ) {
    create the lock file
    store data in lock file
    }

    suffer from a race condition since the checking and creating are not atomic:

    job 1 doesn’t see the lock file
    job 2 doesn’t see the lock file
    job 1 creates the lock file
    job 2 creates the lock file
    job 1 writes data to the lock file
    job 2 overwrites data in the lock file

    Ooops…
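In PHP, the check-and-create step can be made atomic with fopen() mode 'x'. That mode creates the file only if it does not already exist and fails otherwise, which corresponds to O_CREAT|O_EXCL on the underlying open(2) call. tryCreateLock() is a hypothetical helper, not part of cronHelper.

```php
<?php

// Hypothetical helper: atomically create the lock file and store our PID.
// When two jobs race, exactly one fopen('x') succeeds.
function tryCreateLock(string $lockFile): bool
{
    $fp = @fopen($lockFile, 'x'); // @ hides the "File exists" warning
    if ($fp === false) {
        return false; // already locked (or the directory is not writable)
    }
    fwrite($fp, (string) getmypid());
    fclose($fp);
    return true;
}
```

This closes the race John describes for creating the lock. Recovering a stale lock is a different matter: deciding the holder is dead and deleting the file is a separate check-then-act step that can still race. Stefan's flock() approach further down avoids stale locks altogether.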

  • http://abhinavsingh.com Abhinav Singh

    Hi John,

    Absolutely right. However this script is intended to prevent cron job overlapping when a previous cron job is still in action.

    I rambled about atomic locking a while back here: http://tinyurl.com/yz48ga9

  • http://studyhat.blogspot.com rajat

    hi abhi,

    i need u r help badly man

  • http://abhinavsingh.com Abhinav Singh

    Hi Rajat,

    What kind of help are you looking for? Can you elaborate here or contact me via email/chat?

  • Stefan

    Hello,

    Thank you for your post. Interesting topic. Might an alternative solution be to use a file locked with flock()?

    1. Attempt to obtain an exclusive lock on blank file.
    2. If lock fails task is running so exit.
    3. If lock succeeds run the tasks and finally unlock the file.

    In the event of a fatal script error or script completion PHP will automatically unlock the file.

    Pros:
    – Fatal script errors handled automatically.
    – Simple.

    Cons:
    – Differences in lock handling between platforms (My windows box will only recognise non-blocking locks when run as CLI).
    – No way of knowing if the previous task crashed; you only know it finished (though in my case I’m writing script start and end times to a DB, so runs with no end time could be considered crashed).

    Example Code:

    <?php
    $fp = fopen(__DIR__.'/task.lock', 'a');
    if($fp === FALSE) {
        exit(1); // cannot open the lock file at all
    }

    // Try to get an exclusive, non-blocking lock
    if(flock($fp, LOCK_EX | LOCK_NB)) {
        // Success! Look busy.
        echo 'Obtained lock';
        ftruncate($fp, 0);
        fputs($fp, 'Locked by PID '.getmypid().' @ '.date('U'));
        sleep(10);
        // Unlock the file on completion
        flock($fp, LOCK_UN);
    } else {
        // File is already locked by a previous cron task, so exit.
        echo 'Could not lock file';
        exit(1);
    }

    fclose($fp);

  • http://abhinavsingh.com Abhinav Singh

    Heylo friends,

    I have created a repo for this utility script at https://github.com/abhinavsingh/cronHelper

    Kindly feel free to fork it and contribute your ideas to make this a more generic utility class

  • Paul

    Thanks for sharing this! I had been trying for days to understand why I was getting “fork: service temporary unavailable”! My computer was almost out of memory when I stumbled upon your post and your code. I implemented it and it did the job! Now my process list is clean, my crons work fine, and I have also got back some memory that was previously consumed by overlapping crons.

  • http://www.colab-aktiv.com ChrisG

    Works as advertised. I especially like the check to see whether the job is actually running rather than just relying on the existence of the file. This means the lock gets released if the job abends (ends abnormally).

    thanks

  • http://www.microcerdos.com.ar Babblo

    A little contribution:

    replace

    $lock_file = LOCK_DIR.$argv[0].LOCK_SUFFIX;

    with

    $lock_file = LOCK_DIR.basename($argv[0]).LOCK_SUFFIX;

    to avoid problems calling the script with full paths.

  • Sunil

    Hi Abhinav,

    This is good, but when I test it from the command line a lock file is created, whereas when I run it through cron no lock file is created in the cronHelper directory. All the code still continues to execute, since it gets the PID from lock().

    Please suggest, what should I do??

  • Pawan

    Hi Abhinav,
    I am really interested and excited to see your solution to the problem I am facing. I am running an ISPConfig server on Ubuntu 12.04. I have many PHP cron job files created in Joomla articles, which are run through ISPConfig cron jobs and overlap.

    I am very new to Linux and have no idea where I should put your helper file or what path I should give in the PHP file to include it.
    Thanks.