Cron jobs are the hidden building blocks of most websites. They are generally used to process or aggregate data in the background. However, as a website grows and every cron job has gigabytes of data to process, a job may still be running when its next scheduled run starts, and the overlapping runs can corrupt our data. In this blog post, I will demonstrate how we can avoid such overlaps using simple locking techniques. I will also discuss a few edge cases we need to consider while using locks to avoid overlap.
Cron job helper class
Here is a helper class (cron.helper.php) which helps us avoid cron job overlaps (see the usage example below).
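<?php
define('LOCK_DIR', '/Users/sabhinav/Workspace/cronHelper/');
define('LOCK_SUFFIX', '.lock');

class cronHelper {

    // Process id read from the lock file, or that of the current job
    private static $pid;

    // Static-only helper: prevent instantiation and cloning
    private function __construct() {}
    private function __clone() {}

    // Lock file path; basename() keeps the name valid when the job
    // is started with a path, e.g. "php /path/to/job.php"
    private static function lockfile() {
        global $argv;
        return LOCK_DIR.basename($argv[0]).LOCK_SUFFIX;
    }

    // Does a process with the stored pid appear in the process list?
    private static function isrunning() {
        $pids = explode(PHP_EOL, `ps -e | awk '{print $1}'`);
        // An empty pid (empty lock file) must not match the trailing empty line
        return self::$pid !== '' && in_array(self::$pid, $pids, true);
    }

    // Returns the current pid on success, FALSE if a previous job is still running
    public static function lock() {
        $lock_file = self::lockfile();

        if(file_exists($lock_file)) {
            // Is the job that created the lock still running?
            self::$pid = trim(file_get_contents($lock_file));
            if(self::isrunning()) {
                error_log("==".self::$pid."== Already in progress...");
                return FALSE;
            }
            else {
                error_log("==".self::$pid."== Previous job died abruptly...");
            }
        }

        self::$pid = getmypid();
        file_put_contents($lock_file, self::$pid);
        error_log("==".self::$pid."== Lock acquired, processing the job...");
        return self::$pid;
    }

    // Deletes the lock file once the job is done
    public static function unlock() {
        $lock_file = self::lockfile();

        if(file_exists($lock_file))
            unlink($lock_file);

        error_log("==".self::$pid."== Releasing lock...");
        return TRUE;
    }
}
?>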
Using cron.helper.php
Here is how the helper class can be integrated in your current cron job code:
- Save cron.helper.php in a folder called cronHelper
- Update LOCK_DIR as per your need
- You might have to set proper permissions on the cronHelper folder, so that the running cron job has write permission
- Wrap your cron job code as shown below:
<?php
require 'cronHelper/cron.helper.php';

if(($pid = cronHelper::lock()) !== FALSE) {

    /*
     * Cron job code goes here
     */
    sleep(10); // Cron job code for demonstration

    cronHelper::unlock();
}
?>
Is it working? Verify
Let's verify that the helper class really takes care of all the edge cases.
- sleep(10) is our cron job code for this test
- Run from command line:
sabhinav$ php job.php
==40818== Lock acquired, processing the job...
==40818== Releasing lock...
where 40818 is the process id of the currently running cron job
- Run from command line and terminate the cron job midway by pressing Ctrl+C:
sabhinav$ php job.php
==40830== Lock acquired, processing the job...
By pressing Ctrl+C, we simulate cases where a cron job dies midway due to a fatal error or system shutdown. In such cases, the helper class fails to release the lock on this cron job.
- With the lock in place (ls -l cronHelper | grep lock), run from command line:
sabhinav$ php job.php
==40830== Previous job died abruptly...
==40835== Lock acquired, processing the job...
==40835== Releasing lock...
As seen, the helper class detects that the previous cron job died abruptly and then allows the current job to run successfully.
- Run the cron job from two command line windows; one of them will not proceed, as shown below:
centurydaily-lm:cronHelper sabhinav$ php job.php
==40856== Already in progress...
One of the cron jobs will exit since a cron job with $pid=40856 is already in progress.
Working of cron.helper.php
The helper class creates a lock file inside LOCK_DIR. For our test cron job above, the lock file name will be job.php.lock. The lock file name suffix can be configured using LOCK_SUFFIX.
cronHelper::lock() writes the process id of the currently running cron job into the lock file. Upon job completion, cronHelper::unlock() deletes the lock file.
If cronHelper::lock() finds that the lock file already exists, it reads the previous cron job's process id from the lock file and checks whether that job is still running. If the previous job is still in progress, we abort the current job. If the previous job is not in progress, i.e. it died abruptly, the current cron job acquires the lock.
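The "is the previous job still running" check described above can also be done without shelling out to ps. Here is a minimal sketch, assuming the POSIX extension is loaded; the function name isProcessAlive is hypothetical and not part of cron.helper.php, and the /proc fallback is an assumption that only holds on Linux-style systems.

```php
<?php
/**
 * Hypothetical helper (not part of cron.helper.php): reports whether a
 * process with the given pid currently exists on this machine.
 */
function isProcessAlive($pid) {
    $pid = (int) trim((string) $pid);
    if ($pid <= 0) {
        return false; // empty or corrupt lock file content
    }
    if (function_exists('posix_kill')) {
        // Signal 0 only performs the existence check, nothing is delivered.
        if (posix_kill($pid, 0)) {
            return true;
        }
        // Error 1 is EPERM on Linux and macOS: the process exists
        // but belongs to another user.
        return posix_get_last_error() === 1;
    }
    // Assumption: a Linux-style /proc filesystem is mounted.
    return file_exists('/proc/' . $pid);
}

var_dump(isProcessAlive(getmypid()));
```

Like the ps based check, this only proves that some process has that pid, not that it is the same cron job, because the OS can reuse pids.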
This is the classic method for avoiding cron overlaps. However, there are various other methods of achieving the same thing. If you know any, do let me know through your comments.
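One such alternative is PHP's built-in flock(). The sketch below is not the method used by cron.helper.php; the function name acquireFlock and the lock path are assumptions made for illustration.

```php
<?php
/**
 * Sketch of an flock() based lock (an alternative to the pid file).
 * Returns the open file handle on success, or false if another
 * process already holds the lock.
 */
function acquireFlock($lockFile) {
    // 'c' opens for writing, creates the file if missing, never truncates.
    $handle = fopen($lockFile, 'c');
    if ($handle === false) {
        return false;
    }
    // LOCK_NB makes flock() return immediately instead of waiting.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false;
    }
    return $handle; // keep this handle open for the whole job
}

// Illustrative path, adjust as needed.
$lockFile = sys_get_temp_dir() . '/job.php.flock';
$handle = acquireFlock($lockFile);
if ($handle !== false) {
    // Cron job code goes here
    flock($handle, LOCK_UN);
    fclose($handle);
}
```

The operating system drops the lock when the process exits for any reason, so this variant needs no check for a job that died abruptly. It only works when all jobs run on the same machine.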
Abhi's Weblog is a collection of blog articles written by
[...] this new post from Abhinav Singh on how to use file locking to keep your cron jobs from trying to use the same [...]
Nice!
However, I prefer to use a cache to hold the lock variable with an expiry time, rather than a file.
I use a database to hold cron locks since my cron jobs are balanced across a number of web nodes. I have a problem where a script or system crash can occur and the lock will never be removed. Does anyone know of a good solution to automatically handle locks across multiple web nodes, in cases where the script crashes and the lock is still present? The processes are run through http through a load balancer on a local network.
Hi Nate,
If you store the locking flag in a cache and set an expiry time on it, you can solve your problem.
You can also open a socket on an unused port. Unix will not allow a second process to open the same port while the first one holds it.
Solo by Tim Kay uses this method: http://timkay.com/solo/ so you don’t have to put it in your code.
Another way is to avoid crons altogether and turn your script into a daemon http://kevin.vanzonneveld.net/techblog/article/create_daemons_in_php/
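The socket idea mentioned above could look roughly like this in PHP. This is only a sketch: the function name acquirePortLock and the port number 51234 are arbitrary assumptions, and any port that nothing else uses would do.

```php
<?php
/**
 * Sketch of the "bind a port as a lock" idea.
 * Returns the listening socket on success, or false if the port is
 * already taken (for example by a previous run of the same job).
 */
function acquirePortLock($port) {
    // @ suppresses the warning PHP raises when the address is in use.
    $server = @stream_socket_server('tcp://127.0.0.1:' . $port, $errno, $errstr);
    return $server === false ? false : $server;
}

$lock = acquirePortLock(51234); // arbitrary port chosen for this example
if ($lock !== false) {
    // Cron job code goes here
    fclose($lock);
}
```

The OS closes the socket when the process dies, so no stale lock is left behind. The lock is local to one machine.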
Hi Nate,
As ndlinh suggested, you can use a cache like memcached to keep your locks in place.
Since memcached is a distributed solution, all your nodes will be able to detect the lock. In case the process dies abruptly, the lock will expire automatically after $ttl (though $ttl will act as a tuning parameter here).
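A rough sketch of such a cache lock with a $ttl follows. The function names and the key are assumptions. $cache is assumed to offer add() with "store only if the key is absent" semantics, which is how Memcached::add() behaves. The ArrayCache class is only an in-memory stand-in so the example runs without a memcached server.

```php
<?php
/**
 * Sketch of a cache-backed lock with an expiry time.
 * Returns true if this process obtained the lock.
 */
function acquireCacheLock($cache, $key, $ttl) {
    // add() fails if the key already exists, so only one caller can win.
    return $cache->add($key, gethostname() . ':' . getmypid(), $ttl);
}

function releaseCacheLock($cache, $key) {
    return $cache->delete($key);
}

/**
 * In-memory stand-in used only to make this example runnable.
 * It is not shared between processes or machines.
 */
class ArrayCache {
    private $items = array();

    public function add($key, $value, $ttl) {
        $now = time();
        if (isset($this->items[$key]) && $this->items[$key]['expires'] > $now) {
            return false; // an unexpired lock already exists
        }
        $this->items[$key] = array('value' => $value, 'expires' => $now + $ttl);
        return true;
    }

    public function delete($key) {
        unset($this->items[$key]);
        return true;
    }
}

$cache = new ArrayCache(); // on real nodes: a connected Memcached instance
if (acquireCacheLock($cache, 'cron_lock_job', 1800)) {
    // Cron job code goes here
    releaseCacheLock($cache, 'cron_lock_job');
}
```

The $ttl of 1800 seconds is a placeholder. If it is shorter than the longest run, a second job can start while the first is still working.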
Hi Kevin,
Need to look into solo, seems promising.
Also, thanks for your php.js project
And yes, if one can have an event-based daemon, you don't need any crons
You could use register_shutdown_function to unlock cleanly.
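A minimal sketch of that suggestion, using a plain lock file so it runs on its own (the path is an illustrative assumption):

```php
<?php
// Sketch: remove a lock file automatically when the script ends.
$lockFile = sys_get_temp_dir() . '/job.php.lock'; // illustrative path
file_put_contents($lockFile, getmypid());

register_shutdown_function(function () use ($lockFile) {
    // Runs on normal completion, exit() and fatal errors,
    // but not when the process is killed outright (e.g. SIGKILL).
    if (file_exists($lockFile)) {
        unlink($lockFile);
    }
});

// Cron job code goes here
```

With the helper class from this post, the equivalent would be calling register_shutdown_function(array('cronHelper', 'unlock')) right after a successful cronHelper::lock().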
The cache solution sounds promising, at least better than my current solution. I still have one problem: what if the script is actually still running when the lock expires? In some cases running the script twice could bring the database down or corrupt data. I have seen some situations where a script takes an average of 30 minutes to run, but sometimes it takes 90+ minutes due to heavy system load. It can be very unpredictable. In a situation where you could never risk the script running twice, would the only solution be to check whether the script is running through Apache? I think I can do that by parsing server-status.
Yeah, the cache solution can serve you better, but remember it's only a cache. If the cache is flushed you might end up running the script twice. Check this link http://tinyurl.com/yz48ga9 in case you are using memcached.
But still, a 90 min or even a 30 min cron job seems like a bad solution to me. In such cases it's better to break the job down into several components. If I knew exactly what you are trying to achieve through these cron jobs, I could probably think of a better solution.
The scripts in question are creating “preferred lists” of user information based on a fairly intensive database aggregations. The lists are then stored in memcache and accessed by the application from there. I have each “preferred list” job in a separate script. There probably is some redesign that could be issued to optimize things but I am hoping to find a solid php cron solution that is as robust as a simple bash lock file implementation. This always worked so well on a single node.
dir=/var/www/html/hfs/includes/cron
lock=/var/run/cache_primer-bfs.lock
if [ ! -e $lock ]; then
    trap "rm -f $lock; exit" INT TERM EXIT ERR
    touch $lock
    php $dir/cache_primer-bfs.php
    rm $lock
    trap - INT TERM EXIT ERR
else
    echo "cache_primer-bfs is already running (check lock file)"
fi
Solution 1: Alright, based on the job description I guess a memcache based solution can serve you, though don't rely on caches for such jobs.
Solution 2: Since your cron jobs are interacting with databases, you can very well use the db itself for cron synchronization. Have a pid column per row, which is populated with the process id, and probably the hostname, of the cron job processing it.
Solution 3: Divide the rows in the database among the different cron jobs based on the primary key (just like the consistent hashing algorithms memcached clients use to know which key goes to which server), so that the cron job on each machine knows which rows it should process. Then have a localized locking mechanism per box, just like the code you posted. And everything should work out well.
Hope it helps and let me know how it goes
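To make Solution 3 concrete, here is a small sketch of splitting rows among nodes by primary key. Plain modulo is used for brevity, which is an assumption of this example and not consistent hashing: changing the node count reassigns most rows. The function name, node numbers and row ids are all hypothetical.

```php
<?php
/**
 * Sketch: decide whether this node owns a row, based on its primary key.
 * $nodeIndex is this machine's position (0-based) among $nodeCount nodes.
 */
function ownsRow($primaryKey, $nodeIndex, $nodeCount) {
    return ((int) $primaryKey % $nodeCount) === $nodeIndex;
}

// Hypothetical setup: 3 web nodes, this script runs on node 1.
$rowIds = array(10, 11, 12, 13, 14, 15);
$mine = array();
foreach ($rowIds as $id) {
    if (ownsRow($id, 1, 3)) {
        $mine[] = $id;
    }
}
print_r($mine); // contains 10 and 13
```

Each node then only needs a local lock, since no two nodes ever process the same row.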
Hi,
I just wrote a post to elaborate on the above. In addition Nate, you might want to take a look at Gearman.
Cheers,
Andy
The socket idea seems better, because if the cron job dies, the socket will be freed without executing an external command (ps in your code).
Nice helper class. Thanks for your work. I would suggest defining a setLogger method to be able to inject a custom logger implementing a common interface.
Hi Zilvinas,
Thanks. Yeah this is still a very generic example and class file, surely can be customized for all good reasons
I've used the ps command and looked for the PHP script's file name with its path in the process list, like:
exec('COLUMNS=255 ps xa', $Tasks);
This way I can check not only whether the script is in memory, but also how many instances are running, and limit them to 1 or any other number…
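The counting step described above might be sketched as follows. The function name countInstances, the script path and the sample process lines are assumptions made for illustration. In a real job $tasks would be filled by exec('COLUMNS=255 ps xa', $tasks).

```php
<?php
/**
 * Sketch: count how many lines of ps output mention a given script path.
 */
function countInstances(array $tasks, $scriptPath) {
    $count = 0;
    foreach ($tasks as $line) {
        if (strpos($line, $scriptPath) !== false) {
            $count++;
        }
    }
    return $count;
}

// Sample lines standing in for real ps output.
$tasks = array(
    '  PID TTY      STAT   TIME COMMAND',
    '40818 pts/0    S+     0:00 php /var/www/cron/job.php',
    '40856 pts/1    S+     0:00 php /var/www/cron/job.php',
    '40901 pts/2    S+     0:00 php /var/www/cron/other.php',
);

$maxInstances = 1; // the running script itself counts as one instance
if (countInstances($tasks, '/var/www/cron/job.php') > $maxInstances) {
    error_log('Too many instances running, exiting');
}
```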