Customizing Redis pubsub for message persistence – Part 2


In the last post we saw how Redis can easily be modified to persist the last published message on PubSub channels. Without subscribing to the PubSub channel, we were able to get the last published message from the Redis database. In this post, I will take that idea one step further and add native capabilities within Redis to persist all the unprocessed messages published on a PubSub channel in channel-specific lists. We’ll also preserve our ability to send the last published message to clients upon subscription.

But why are we doing this?

Popular open source applications that support Redis as a backend are built on its list API. For example, let’s look at Celery, a distributed task queue written in Python. Start redis-cli MONITOR in a terminal and then start Celery in another window as follows:

$ pip install celery
$ celery worker --queues=testing --broker=redis:// --without-gossip --without-mingle --without-heartbeat

You’ll find Celery polling Redis periodically, as indicated by log lines like "BRPOP" "testing" "testing\x06\x163" "testing\x06\x166" "testing\x06\x169" "1". Celery uses the Redis PubSub mechanism (which we disabled in the above command) only for internal features. Sentry, a popular exception logging and aggregation library, internally depends upon Celery. There is an open pull request that claims to add Redis PubSub based support to Celery. In the world of Ruby, background processing frameworks like Resque and Sidekiq depend upon the Redis list API.

However, with no native support for persistence of PubSub messages in Redis, it’s not difficult to understand why adopting Redis PubSub can be tricky for some. Currently, Redis simply drops a message if no subscribers are found. Hence, the real question is whether your application can tolerate the loss of published messages (for example, messages dropped while you were upgrading your application).

To work around the persistence problem with Redis PubSub, the usual approach is to start multiple application instances, so that some instances continue to serve while others get deployed. Even then, your active instances might be experiencing a network partition and be unable to receive published messages. After all, the primary goal is to guarantee processing of every message received by Redis, irrespective of whether we are using a list or PubSub based backend. Native support for solving persistence problems with Redis PubSub is clearly desirable.

Persisting dropped Redis PubSub messages in a list

In the last post we added a single line of code to persist the last published message on a channel in a separate Redis key. We’ll now update the implementation to push every message that found no receivers onto the end of a channel-specific list. Replace the `setKey(c->db, c->argv[1], c->argv[2]);` line that we added last time with the following code:

// Persist messages in list only if no receivers were found
if (receivers == 0) {
    int j, pushed = 0, where = REDIS_TAIL;

    // Fetch list key from the database
    robj *lobj = lookupKeyWrite(c->db,c->argv[1]);

    // For every published message on the channel
    for (j = 2; j < c->argc; j++) {
        c->argv[j] = tryObjectEncoding(c->argv[j]);

        // Ensure we have our quicklist initialized
        if (!lobj) {
            lobj = createQuicklistObject();
            quicklistSetOptions(lobj->ptr, server.list_max_ziplist_size,
                                server.list_compress_depth);
            dbAdd(c->db,c->argv[1],lobj);
        }

        // Push message at the tail of the list
        listTypePush(lobj,c->argv[j],where);
        pushed++;
    }

    // Signal key watchers and internal event subscribers
    if (pushed) {
        char *event = (where == REDIS_HEAD) ? "lpush" : "rpush";
        signalModifiedKey(c->db,c->argv[1]);
        notifyKeyspaceEvent(REDIS_NOTIFY_LIST,event,c->argv[1],c->db->id);
    }
    server.dirty += pushed;
}

I have added some comments in the code for clarity. This code is merely a rip-off of the src/t_list.c:pushGenericCommand function, minus the replies that are usually sent to the client after an RPUSH command, which we don’t want here. Frankly, we could further refactor pushGenericCommand to turn the above code into a single function call.
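
To make the behaviour easier to reason about, here is a minimal in-memory model of the branch above, written in Python. It is only a sketch: ToyBroker and its attributes are hypothetical names that I made up for illustration, and nothing here is Redis code. It mirrors just one rule: a message is appended to the channel's list only when no receiver was found.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory model of PUBLISH with list persistence (not Redis code)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> list of callbacks
        self.lists = defaultdict(list)        # channel -> undelivered messages

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        receivers = 0
        for callback in self.subscribers[channel]:
            callback(channel, message)
            receivers += 1
        # Same condition as the C code: persist only if nobody received it.
        if receivers == 0:
            self.lists[channel].append(message)  # tail push, like RPUSH
        return receivers


broker = ToyBroker()
for word in ["this", "is", "gonna", "be", "awesome"]:
    broker.publish("persistent-channel", word)

# Once a subscriber exists, messages are delivered and no longer persisted.
received = []
broker.subscribe("persistent-channel", lambda ch, msg: received.append(msg))
delivered = broker.publish("persistent-channel", "live")
```

In this model the five early messages end up in the list, while the message published after the subscription is delivered and never stored.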

Run make test and start ./src/redis-server. Using ./src/redis-cli, try:

$ ./src/redis-cli 
127.0.0.1:6379> publish persistent-channel this
(integer) 0
127.0.0.1:6379> publish persistent-channel is
(integer) 0
127.0.0.1:6379> publish persistent-channel gonna
(integer) 0
127.0.0.1:6379> publish persistent-channel be
(integer) 0
127.0.0.1:6379> publish persistent-channel awesome
(integer) 0
127.0.0.1:6379> lrange persistent-channel 0 -1
1) "this"
2) "is"
3) "gonna"
4) "be"
5) "awesome"
127.0.0.1:6379>

Voila! Since we published a few messages with no active subscriber, they all got persisted in a list named after the channel itself. Incoming subscribers can now process pending messages, which otherwise would have been dropped, by fetching them from the list.

Delivering unprocessed messages to subscribers upon subscription

Instead of depending upon clients to poll the channel list length, the server can deliver unprocessed messages to subscribers upon successful subscription. Since this can overwhelm subscribers if several pending messages are waiting in the list, we may not want to do this at all and instead leave it up to the clients. Let’s at least preserve our feature from the last post, i.e. delivering the last published message to clients upon subscription.

Here we don’t want to remove the last published message from our persistent list. We simply wish to send it to the incoming subscribers. For example, imagine cases like user status, location and mood being published on separate channels. Here is a function that returns the last element of the list without removing it (equivalent to LRANGE key -1 -1):

robj *llast(redisClient *c, robj *key) {
    listTypeEntry entry;
    robj *o, *value = NULL;
    long llen;

    // Fetch list object from db
    o = lookupKeyRead(c->db, key);

    // Ensure key contains a list
    if(o != NULL && o->encoding == REDIS_ENCODING_QUICKLIST) {
        // Get list iterator for "lrange key -1 -1" use case
        llen = listTypeLength(o);
        listTypeIterator *iter = listTypeInitIterator(o, llen-1, REDIS_TAIL);

        // Fetch last item, prepare value
        listTypeNext(iter, &entry);
        quicklistEntry *qe = &entry.entry;
        if (qe->value) {
            value = createStringObject((char *)qe->value,
                                       qe->sz);
        } else {
            value = createStringObjectFromLongLong(qe->longval);
        }
        listTypeReleaseIterator(iter);
    }
    return value;
}
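
The contract of llast is easier to see in a toy model. The following Python sketch assumes a plain dict standing in for the Redis keyspace; the function on_subscribe and the dict layout are hypothetical and exist only for illustration. It shows the two properties we care about: the list is left untouched, and a missing or non-list key yields nothing to deliver.

```python
def llast(db, key):
    """Return the last element of the list stored at key, or None.

    db is a plain dict standing in for the keyspace (an assumption of this
    sketch). The stored list is never modified.
    """
    value = db.get(key)
    if not isinstance(value, list) or not value:
        return None
    return value[-1]


def on_subscribe(db, channel):
    """Build the message a new subscriber would get, or None if nothing is stored."""
    last = llast(db, channel)
    if last is None:
        return None
    return ["message", channel, last]


db = {
    "persistent-channel": ["this", "is", "gonna", "be", "awesome"],
    "not-a-list": "plain string value",
}
frame = on_subscribe(db, "persistent-channel")
```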

We now need to replace our modifications within the subscription handling functions from the last post. Note that the caller must call decrRefCount on the returned robj to free the created string object after delivery. Kindly check this GitHub commit for all the changes since the last post. You can also check out my Redis fork and play directly with the modified code.

Conclusion

We saw how easy it is to modify Redis for fun and profit. By adding native persistence capabilities, we offload to Redis itself the task of ensuring that every message it receives gets processed. After all, no amount of magical client-side logic will ever be as reliable as native support. By the way, Redis 3.0.0 was released this week with native support for clustering; check it out while it’s hot.

Leave your comments and feedback.


Customizing Redis pubsub for message persistence


Redis comes packed with a simple yet powerful PubSub API. It provides low latency and scales well. A message published on a channel is received by the subscriber(s) at the other end. However, if no active subscriber is found, the message is simply lost. This drawback rules Redis out for several use cases where persistence of unprocessed published messages is desired. It’s also probably a reason why several open source projects that support Redis as a broker are based upon its list push / pop API. In this post I will demonstrate how to modify the Redis PubSub API to support message persistence, opening possibilities for several interesting use cases.

Last Published Message

The ability to fetch the last published message on a particular channel without subscribing to the channel opens doors for several interesting use cases. src/pubsub.c:publishCommand is where Redis handles the PUBLISH command. Let’s add a line of code to persist the most recently published message on a channel:

void publishCommand(redisClient *c) {
    ....

    /* Persist last published message in channel specific key */
    setKey(c->db, c->argv[1], c->argv[2]);

    addReplyLongLong(....);
}

Above, we added a call to src/db.c:setKey function that sets the value of key c->argv[1] (channel name) to c->argv[2] (published message).

Run make from the project root directory and start ./src/redis-server. Now we can do something like:

127.0.0.1:6379> publish channel1 c1m1
(integer) 0
127.0.0.1:6379> get channel1
"c1m1"
127.0.0.1:6379> publish channel1 c1m2
(integer) 0
127.0.0.1:6379> get channel1
"c1m2"

Voila. We published a message with no subscriber. However, an incoming user can still be served with the last published message on the channel by fetching the value of key channel1 without explicitly subscribing to the channel.

Let’s take this idea one step ahead. XMPP Publish-Subscribe (XEP-0060) defines a specification for receiving the last published item. It says,

When a subscription request is successfully processed, the service MAY send the last published item to the new subscriber.

Let’s add this idea to Redis PubSub mechanism. src/pubsub.c:subscribeCommand function is where Redis processes channel subscription requests. Add the following lines of code at the end of this function.

void subscribeCommand(redisClient *c) {
    ....

    /* Send last received message on the subscribed channel(s) */
    robj *o;
    for (j = 1; j < c->argc; j++) {
        o = lookupKeyRead(c->db, c->argv[j]);
        if (o != NULL) {
            addReply(c,shared.mbulkhdr[3]);
            addReply(c,shared.messagebulk);
            addReplyBulk(c,c->argv[j]);
            addReplyBulk(c,o);
        }
    }
}

Here, post subscription, we fetch and send the last published message for all channels that the client just subscribed to. Run make and restart ./src/redis-server. Now, in a new ./src/redis-cli terminal, subscribe to channel1:

127.0.0.1:6379> subscribe channel1
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "channel1"
3) (integer) 1
1) "message"
2) "channel1"
3) "c1m2"
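
On the wire, the four addReply calls above produce one three-element multi-bulk reply. Assuming the standard Redis protocol (RESP) encoding, where an array header is `*<count>` and each bulk string is `$<length>` followed by the payload, the bytes can be reproduced with a few lines of Python. The helper names encode_bulk and encode_push are my own and are not part of any Redis client library.

```python
def encode_bulk(value: bytes) -> bytes:
    """Encode one RESP bulk string: $<length>\r\n<payload>\r\n."""
    return b"$" + str(len(value)).encode() + b"\r\n" + value + b"\r\n"


def encode_push(*parts: str) -> bytes:
    """Encode a multi-bulk reply: *<count>\r\n followed by each bulk string."""
    encoded = [part.encode("utf-8") for part in parts]
    header = b"*" + str(len(encoded)).encode() + b"\r\n"
    return header + b"".join(encode_bulk(part) for part in encoded)


# The replayed message shown in the redis-cli session above.
frame = encode_push("message", "channel1", "c1m2")
```

The PSUBSCRIBE variant discussed next differs only in having four elements, starting with "pmessage" and the pattern.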

Voila! The Redis server will now send the last published message upon subscription. But what about the PSUBSCRIBE use case?

src/pubsub.c:psubscribeCommand handles pattern based channel subscription logic. Add the following lines of code at the end of this function:

void psubscribeCommand(redisClient *c) {
    ....

    /* Send last received message on the channel(s) matching subscribed patterns */
    for (j = 1; j < c->argc; j++) {
        robj *pat = c->argv[j];
        dictIterator *di = dictGetIterator(server.pubsub_channels);
        dictEntry *de;
        while((de = dictNext(di)) != NULL) {
            robj *cobj = dictGetKey(de);
            sds channel = cobj->ptr;
            if (stringmatchlen((char*)pat->ptr,
                               sdslen(pat->ptr),
                               (char*)channel,
                               sdslen(channel), 0)) {
                robj *o = lookupKeyRead(c->db, cobj);
                if (o != NULL) {
                    addReply(c,shared.mbulkhdr[4]);
                    addReply(c,shared.pmessagebulk);
                    addReplyBulk(c,pat);
                    addReplyBulk(c,cobj);
                    addReplyBulk(c,o);
                }
            }
        }
        dictReleaseIterator(di);
    }
}

Above, for every subscribed pattern, we iterate over active server.pubsub_channels and check if the active channel matches the subscription pattern. On match we fetch and send the last published message on the channel to the client.
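
The matching loop can be sketched in Python as follows. Two assumptions are worth stating loudly: fnmatchcase from the standard library is only an approximation of the glob matching done by stringmatchlen (for example, the two differ in how they negate character classes), and replay_for_pattern together with the dict used as a keyspace are hypothetical names for illustration.

```python
from fnmatch import fnmatchcase


def replay_for_pattern(pattern, active_channels, db):
    """Collect pmessage frames for active channels that match the pattern.

    active_channels models server.pubsub_channels (channels that currently
    have at least one subscriber); db models the keyspace holding the last
    published message per channel.
    """
    frames = []
    for channel in active_channels:
        # Approximation of Redis glob matching; see the caveat above.
        if fnmatchcase(channel, pattern) and channel in db:
            frames.append(("pmessage", pattern, channel, db[channel]))
    return frames


active_channels = ["channel1", "news"]
db = {"channel1": "c1m2", "news": "n1", "channel2": "never subscribed"}
frames = replay_for_pattern("channel*", active_channels, db)
```

Note that channel2 is not replayed in this model even though a value is stored for it, because only channels with an active subscriber are iterated.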

With the previous redis-cli subscribe terminal still running, open a new terminal and try:

127.0.0.1:6379> psubscribe channel*
Reading messages... (press Ctrl-C to quit)
1) "psubscribe"
2) "channel*"
3) (integer) 1
1) "pmessage"
2) "channel*"
3) "channel1"
4) "c1m2"

Next

You can check out my Redis fork and commits under the pubsub-persistence branch. The enhancements described above can also be found in this GitHub commit.

Currently it is unclear what antirez (Salvatore Sanfilippo) plans to do further with PubSub in Redis. It stands on a solid base, and recent efforts are rightly being put behind Redis Cluster. However, I see some interesting enhancements that can be made to Redis PubSub mainline. In the next post I will take the current idea one step further and add persistence support for all or only unprocessed published messages in a Redis list (possibly with a cap or expiration on persisted messages).


Back to blogging: What to expect

Hello Readers,

I started this blog as a way to share my experiments and experiences while learning web development and computer science in general. In the first 2 years (between Apr’08 and Aug’10) I wrote as many as 100 blog posts. Quite a frenzy. In the 4 years since, I only managed to write 5-6 posts, along with nearly 45 drafts which may now never get published. The good thing is that I am back to blogging, which means there is a lot to share.

Briefly, here is what to expect (or not) in future posts:

  1. PHP – In the past, PHP dominated the content on this blog: mostly web demos, some quick hacks and some JAXL library examples. However, I have not worked actively with PHP since ’10 and have probably not touched it since ’12. Expect zero PHP.

  2. JAXL – No more PHP essentially means no more JAXL posts. In fact, I recently moved the JAXL repository to its own GitHub organization, where other collaborators can maintain, improve and work on it without requiring my active involvement. This organization also contains other repositories that I managed to open source from my startup Jaxl.

  3. XMPP – Unfortunately, I am no longer in touch with progress on XMPP specifications. The specs have evolved a lot, to the extent that some developers have reported that mod_message_carbon no longer works as expected with newer Ejabberd server versions (also, the Message Carbon extension XEP-0280 has itself been deprecated). However, XMPP will always be my preferred choice whenever I need the entire suite of user-to-user and group messaging, presence, contacts management, Jingle / SIP integration and other features baked into XMPP XEPs. For my everyday messaging needs, newer technologies like ZeroMQ, AMQP (RabbitMQ), MQTT or even Redis PubSub are more suitable.

  4. Java – After quite a journey I am now finally working full-time with Java. I still hate it, but I am trying to adapt, learn and love it for at least what it’s worth.

  5. Python – Thanks to my stint with Appurify, I had a chance to work full-time with Python. I even managed to work on some interesting open source Python projects. Even though it’s no longer my primary language, Python is always fun, especially when one is in a hurry to get things done.

  6. Golang / Erlang – I met Golang a year back. I met Erlang while hacking Ejabberd, Riak etc. for my startup Jaxl and immediately fell in love with it. Nowadays, I am in love with Golang. It’s simple and precise, and has message passing semantics (buffered channels) similar to those found in Erlang (mailboxes). I highly recommend digging into these languages and getting comfortable with the message passing programming paradigm. They will change how you approach and think about your application structure. Expect lots of Golang and some Erlang.

  7. Docker – Who is not into Docker these days? If you are not, leave this post right now and head over to the Docker user guide. That’s how important I find this piece of beauty (technology). Expect a lot about Docker in my future posts.

  8. Startups – A lot of startup fun has kept me busy since ’10; some experiences and learnings are worth sharing.

  9. Android – I have been working full-time with mobiles (both Android and iOS) since ’12. Not much application making, but a lot of hacking with the ADB protocol and libimobiledevice.

  10. System designing – Luckily, I happened to experience a lot of end-to-end system and network designing. This domain is of great interest once you start to have fun with racks, subnets, routes, switches, firewalls, DNS, multicast and the entire suite of technology under this umbrella.

Will end this post with some interesting images from the past.

Fsck'd iPhone screen
Swollen iPhone screen due to high device temperature
Rackframe
Setting up Racks


How to perform X-FACEBOOK-PLATFORM and Google Talk X-OAUTH2 XMPP authentication with PHP Jaxl library

The Jaxl library has changed significantly since it first introduced support for the X-FACEBOOK-PLATFORM XMPP authentication mechanism. Also, Google Talk now supports OAuth 2.0 Authorization, an XMPP extension that allows users to log in using OAuth 2.0 credentials.

Both these mechanisms are a big win for XMPP developers, since real-time conversation experience can now be provided to their application users without asking them for their passwords. In this blog post, I will demonstrate how to perform X-FACEBOOK-PLATFORM and X-OAUTH2 XMPP authentication mechanism using Jaxl v3.x PHP Library.

X-FACEBOOK-PLATFORM XMPP Authentication
Here is a quick guide on how to perform X-FACEBOOK-PLATFORM XMPP authentication using xfacebook_platform_client.php which comes bundled with Jaxl v3.x examples:

  • Visit the Facebook Developer Apps page and register your application
  • Once registered, visit the Facebook Access Token Tool to get the parameters required to perform X-FACEBOOK-PLATFORM authentication
  • Click on the debug button next to User Token and make sure xmpp_login is one of the extended permissions (scope)
  • Enter the downloaded Jaxl library folder and run from the command line as follows:

    $ php examples/xfacebook_platform_client.php fb_user_id_or_username fb_app_key fb_access_token

You can now take the source code of xfacebook_platform_client.php and customize it for your application needs.

Google Talk X-OAUTH2 XMPP Authentication
Here is a quick guide on how to perform Google Talk X-OAUTH2 XMPP authentication using xoauth2_gtalk_client.php which comes bundled with Jaxl v3.x examples:

  • Visit Google OAuth Playground and input https://www.googleapis.com/auth/googletalk as the required scope. Press “Authorize API” and then “Allow Access” button on the redirected page
  • In step 2, simply press “Exchange authorize code for tokens” and copy the access token
  • Enter the downloaded Jaxl library folder and run from the command line as follows:

    $ php examples/xoauth2_gtalk_client.php [email protected] access_token

You can now take the source code of xoauth2_gtalk_client.php and customize it for your application needs.

Wasn’t that simple? :)


JAXLXml – Strophe style XML Builder : Working with Jaxl – A Networking Library in PHP – Part 2

Prior to Jaxl v3.x, the ugliest piece of code inside the Jaxl library was the handling of XML packets. If you are working with the XMPP protocol, which is all about sending and receiving XML packets, it can become a nightmare if you don’t have a proper XML manipulation library in your toolkit. For Jaxl v3.x, the first thing I decided to write was the JAXLXml class, a custom XML packet implementation with no external dependencies that extends the ideas from the Strophe.Builder class written by Jack Moffitt.

JAXLXml is generic enough to find a place inside any PHP application that requires easy and elegant XML packet creation. In this blog post, I will give an exhaustive overview of how to create XML packets using JAXLXml class.

JAXLXml Constructor
Depending upon the need, there are several different ways of initializing a JAXLXml object:

  • $xml_obj = new JAXLXml($name, $ns, $attrs, $text);
  • $xml_obj = new JAXLXml($name, $ns, $attrs);
  • $xml_obj = new JAXLXml($name, $ns, $text);
  • $xml_obj = new JAXLXml($name, $attrs, $text);
  • $xml_obj = new JAXLXml($name, $attrs);
  • $xml_obj = new JAXLXml($name, $ns);
  • $xml_obj = new JAXLXml($name);

where:

  • $name – the XML node name
  • $ns – the XML namespace
  • $attrs – Key-Value (KV) pair of XML attributes
  • $text – XML content

Here are a few examples for each constructor style shown above:

JAXLXml will sanitize attributes and text values as shown below:

Manipulating Attributes, Child Nodes and Content
Below is an exhaustive list of methods available over initialized JAXLXml object $xml_obj for manipulating attributes, child nodes and content:

  • c($name, $ns=null, $attrs=array(), $text=null) : Append a child node at the current rover and update the rover to point at the newly added child node. The rover is nothing but a pointer indicating the level in the XML tree at which this and other methods operate. When a JAXLXml instance is initialized, the rover points to the top level node.
  • cnode($node) : Append a child node given by $node (a JAXLXml object) at the current rover and update the rover to point at the newly added child node.
  • t($text, $append=FALSE) : Update the text of the node pointed to by the current rover
  • top() : Move the rover back to the top of the XML tree
  • up() : Move the rover one step up the XML tree
  • attrs($attrs) : Merge new attributes specified as KV pairs $attrs with the existing attributes at the current rover.
  • match_attrs($attrs) : Accepts KV pairs of attributes $attrs and returns true if all keys exist and have the same values as specified in the passed KV pairs.
  • exists($name, $ns=null, $attrs=array()) : Checks if a child with $name exists. If found, returns the matching child as a JAXLXml object, otherwise false. If multiple children exist with the same name, this function returns the first matching child
  • update($name, $ns=null, $attrs=array(), $text=null) : Update $ns, $attrs and $text (all at once) of an existing child node $name
  • to_string($parent_ns=null) : Return the string representation of the JAXLXml object
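
The rover idea is easier to follow in code. What follows is a deliberately small Python sketch of a builder with the same navigation semantics as c(), t(), up(), top() and to_string(). It is not the JAXLXml implementation: the class name Node is hypothetical, namespaces and most of the methods listed above are left out, and escaping is delegated to the standard library.

```python
from xml.sax.saxutils import escape, quoteattr


class Node:
    """Tiny XML builder with a rover pointer (illustrative, not JAXLXml)."""

    def __init__(self, name, attrs=None, text=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.text = text
        self.children = []
        self.parent = None
        self.rover = self  # only the top-level node's rover is ever moved

    def c(self, name, attrs=None, text=None):
        """Append a child at the rover, then move the rover to that child."""
        child = Node(name, attrs, text)
        child.parent = self.rover
        self.rover.children.append(child)
        self.rover = child
        return self  # returning self is what makes chaining possible

    def t(self, text):
        """Set the text of the node the rover points at."""
        self.rover.text = text
        return self

    def up(self):
        """Move the rover one level up, stopping at the top-level node."""
        if self.rover.parent is not None:
            self.rover = self.rover.parent
        return self

    def top(self):
        """Move the rover back to the top-level node."""
        self.rover = self
        return self

    def to_string(self):
        attrs = "".join(f" {key}={quoteattr(str(val))}" for key, val in self.attrs.items())
        inner = escape(self.text or "") + "".join(ch.to_string() for ch in self.children)
        if not inner:
            return f"<{self.name}{attrs}/>"
        return f"<{self.name}{attrs}>{inner}</{self.name}>"


xml = (
    Node("message", {"to": "a@example.com"})
    .c("body").t("1 < 2")   # rover now points at <body>
    .up()                   # rover back at <message>
    .c("thread").t("t1")
    .to_string()
)
```

Every mutating method returns the top-level object, so the whole stanza is built in a single expression while the rover keeps track of where the next child or text goes.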

Method Chaining
The best thing about working with the JAXLXml class is that all the above methods are chainable, i.e. any complex XML structure can be built with a single line of code.

Here is an example building a fairly nested XML structure in a single line of code:


Working with Jaxl – A Networking Library in PHP – Part 1 – An Introduction, Philosophy and History

Development of the Jaxl library started way back in December’07 while I was working on a self-initiated project called Gtalkbots. The project is now dead; if you are interested in knowing more about it, go through the Gtalkbots BlogSpot. Jaxl v1.x was first released in Jan’09, and Jaxl v2.x followed in Aug’10. The first two versions were released as a JAbber XMPP Library for writing clients and external server components.

While working on my startup Jaxl – A Platform As A Service (PAAS) for developing real-time applications, I started experiencing v2.x limitations when my external server side components were unable to process XMPP packets at the speed they were sent by ejabberd server. I started restructuring and refactoring the library which gave birth to Jaxl v3.x. Since v3.x was initially being used for developing the entire infrastructure, it shaped up as a networking library in PHP with stable support for XMPP protocol. However, later I had to rewrite several infrastructure components in Erlang Programming Language due to several issues that PHP as a language couldn’t solve (after all PHP wasn’t made for such tasks). Finally in April’12, Jaxl v3.x was open sourced.

Jaxl v3.x is an asynchronous, non-blocking, event based networking library in PHP for writing custom TCP/IP client and server implementations. From previous versions, the Jaxl library inherits full-blown, stable support for the XMPP protocol stack. In v3.0, support for the HTTP protocol stack was also introduced. At the heart of every protocol stack sits a Core stack. It contains all the building blocks for everything that we aim to do with the Jaxl library. Both the XMPP and HTTP protocol stacks are written on top of the Core stack. In fact, the source code of these protocol implementations knows nothing about the standard (inbuilt) PHP socket and stream methods.

Philosophy
Jaxl is designed to work asynchronously in a non-blocking fashion and provides an event based callback API. Now what does all that mean?

Non-blocking and asynchronous means that when a library function like:
$jaxl->send($stanza); is called, it returns immediately, i.e. the call will NOT block further execution of your application script while waiting for $stanza to actually be sent over the connected TCP socket. In fact, when this function is called, the passed $stanza object is put into an output buffer queue, which is flushed as and when the underlying TCP socket becomes available for writes. Similarly, most of the available methods (wherever required and possible) inside the Jaxl library are non-blocking and asynchronous in nature.
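
The output buffer queue can be illustrated with a short Python sketch. This is not how Jaxl is implemented internally; BufferedSender and its methods are hypothetical names, and a local socket pair stands in for the TCP connection. The point is only that send() touches no socket at all, while flush() is what an event loop would call once the socket is reported writable.

```python
import socket
from collections import deque


class BufferedSender:
    """Queue outgoing data; write it only when the socket is ready."""

    def __init__(self):
        self.out_queue = deque()

    def send(self, data: bytes) -> None:
        """Queue data and return immediately; no socket I/O happens here."""
        self.out_queue.append(data)

    def pending(self) -> int:
        return sum(len(chunk) for chunk in self.out_queue)

    def flush(self, sock: socket.socket) -> int:
        """Write as much queued data as the socket accepts right now."""
        written = 0
        while self.out_queue:
            chunk = self.out_queue[0]
            try:
                sent = sock.send(chunk)
            except BlockingIOError:
                break  # not writable yet; retry on the next loop iteration
            written += sent
            if sent < len(chunk):
                self.out_queue[0] = chunk[sent:]  # keep the unsent tail queued
                break
            self.out_queue.popleft()
        return written


left, right = socket.socketpair()
left.setblocking(False)

sender = BufferedSender()
sender.send(b"<presence/>")      # returns at once, data is only queued
queued = sender.pending()
flushed = sender.flush(left)     # normally triggered by a writable event
received = right.recv(1024)
remaining = sender.pending()

left.close()
right.close()
```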

An event based callback API means that application code needs to register callbacks for the events it cares about, which occur during the Jaxl instance lifecycle. A list of available event callbacks with some explanation can be found here. For example, most XMPP applications will register a callback for the on_auth_success event. As and when this event occurs inside the Jaxl instance lifecycle, the registered function is called back with the necessary parameters (if any).
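
As a rough illustration of the pattern (and not of Jaxl's internals), a callback registry needs very little code. In this Python sketch, EventEmitter, add_cb and emit are names chosen for the example; only the event name on_auth_success comes from the text above.

```python
from collections import defaultdict


class EventEmitter:
    """Minimal registry mapping event names to callbacks."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def add_cb(self, event, callback):
        self._callbacks[event].append(callback)

    def emit(self, event, *args):
        """Call every callback registered for event, in registration order."""
        return [callback(*args) for callback in self._callbacks.get(event, [])]


log = []
events = EventEmitter()
events.add_cb("on_auth_success", lambda: log.append("authenticated"))

events.emit("on_auth_success")   # registered callback runs
events.emit("on_disconnect")     # nothing registered, nothing happens
```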

Related Links

  • Read library documentation
  • Download the latest and greatest source from GitHub.
  • Have any Question? Want to discuss? Need Help? Use Google Group/Forum.
  • Found something missing or a bug in the source code? Kindly report an issue.
  • Fixed a bug? Want to submit a patch? Want to improve documentation? Checkout source code and contribute to the library

XMPP Application Examples

HTTP Application Examples

Stay Tuned
In coming weeks, under this series of blog posts titled “Working with Jaxl – A Networking Library in PHP”, I will cover following major topics with sample code:

  • Explanation of each Core stack class and how to use them
  • Design of each XMPP and HTTP stack class
  • XMPP over HTTP
  • XMPP File Transfer and Multimedia Sessions
  • Understanding and Using External Jabber Components
  • Asynchronous Job/Task Queues
  • Developing Concurrent and Parallel Systems

If you have any specific topic that you would like me to cover, kindly let me know via your comments here.


Announcing Jaxl v3.x – asynchronous, non-blocking I/O, event based PHP client/server library

Jaxl v3.x is the successor of v2.x (and is NOT backward compatible), carrying over a lot of code from v2.x while throwing away the ugly parts. A lot of components have been rewritten, keeping in mind the feedback from the developer community over the last 4 years. Jaxl also borrows a few philosophies from my experience with the Erlang and Python languages.

Jaxl is an asynchronous, non-blocking I/O, event based PHP library for writing custom TCP/IP client and server implementations. From its previous versions, the library inherits full-blown, stable support for the XMPP protocol stack. In v3.0, support for the HTTP protocol stack was also added.

At the heart of every protocol stack sits a Core stack. It contains all the building blocks for everything that we aim to do with the Jaxl library. Both the XMPP and HTTP protocol stacks are written on top of the Core stack. In fact, the source code of the protocol implementations knows nothing about the standard (inbuilt) PHP socket and stream methods.

Source code on GitHub

Examples

Documentation

Group and Mailing List

Create a bug/issue

Read why v3.x was written and what traffic it has served in the past.


How to Write a Spelling Corrector in Erlang (ESpell)

Erlang is a beautiful programming language from Ericsson which I first came across while customizing the authentication flow of ejabberd about 2 years back. Ever since, I have been using Erlang for all my application backend needs, including a custom HTTP server, a custom BOSH connection manager, XMPP components and clients, … Recently I have even started churning out my application HTML pages via Erlang using erlydtl (an Erlang implementation of the Django Template Language).

Years ago, I took a successful shot at implementing Peter Norvig’s Spell Corrector in PHP. Last weekend I attempted the same “Spell Corrector” algorithm in about 45 lines of Erlang code.

ESpell:

The complete code file with comments and explanation can be found here:
https://github.com/abhinavsingh/espell

-module(espell).
-define(alphabet, "abcdefghijklmnopqrstuvwxyz").
-export([start/0, correct/1]).

%%
%% API Functions
%%

%% @doc start spell checker
start() -> train(words()).

%% @doc returns most probable correct candidate with score
correct(Word) ->
	lists:foldl(
		fun(Candidate, {Correction, Score}) -> 
			case ets:lookup(?MODULE, list_to_binary(Candidate)) of
				[{_, Counter}] when Counter > Score -> {Candidate, Counter};
				_ -> {Correction, Score}
			end
		end, {Word, 0}, get_candidates(Word)).

%%
%% Local Functions
%%

words() ->
	{ok, Bin} = file:read_file("../priv/big.txt"),
	{ok, Words} = regexp:split(binary_to_list(Bin), "[^a-zA-Z]"),
	lists:map(fun(X) -> string:to_lower(X) end, Words).

train(Features) ->
	io:fwrite("training initial word list...~n"),
	ets:new(?MODULE, [set, named_table]),
	lists:foreach(fun(X) ->
		case ets:insert_new(?MODULE, {list_to_binary(X), 1}) of
			false -> ets:update_counter(?MODULE, list_to_binary(X), 1);
			true -> true
		end
	end, Features),
	io:fwrite("training complete...~n"),
	ok.

edits1(Word) ->
	Splits = lists:foldl(fun(I, Acc) -> Acc ++ [{string:substr(Word, 1, I), string:substr(Word, I+1, string:len(Word)-I)}] end, [{"", Word}], lists:seq(1, string:len(Word))),
	Deletes = [A ++ string:substr(B, 2) || {A,B} <- Splits, B =/= []],
	Transposes = [A ++ string:substr(B, 2, 1) ++ string:substr(B, 1, 1) ++ string:substr(B, 3) || {A,B} <- Splits, string:len(B) > 1],
	Replaces = [A ++ binary_to_list(<<C>>) ++ string:substr(B, 2) || {A,B} <- Splits, B =/= [], C <- ?alphabet],
	Inserts = [A ++ binary_to_list(<<C>>) ++ B || {A,B} <- Splits, C <- ?alphabet],
	lists:usort(Deletes ++ Transposes ++ Replaces ++ Inserts).

%%edits2(Word) -> lists:usort([E2 || E1 <- edits1(Word), E2 <- edits1(E1)]).

known_edits2(Word) ->
	lists:usort([E2 || E1 <- edits1(Word), E2 <- edits1(E1), ets:member(?MODULE, list_to_binary(E2))]).

known(Words) ->
	lists:usort([Word || Word <- Words, ets:member(?MODULE, list_to_binary(Word))]).

get_candidates(Word) ->
	C1 = known([Word]),
	if 
		length(C1) > 0 -> C1;
		true -> C2 = known(edits1(Word)),
			if
				length(C2) > 0 -> C2;
				true ->	C3 = known_edits2(Word),
					if length(C3) > 0 -> C3; 
					true -> [Word] end
			end
	end.

Try It Out:
espell provides 2 simple functions for all its work:

  • start() : start espell which initiates reading initial data and training phase
  • correct(Word) : accepts 1 parameter, the word you want to correct. It returns a 2-tuple, where the 1st element is the corrected word and the 2nd element is a score (which right now simply means the number of times the corrected word was seen in the training data set)

$ cd espell
$ erlc -o ebin/ src/espell.erl
$ erl -pa ebin/
Erlang R14B03 (erts-5.8.4) [source] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.4  (abort with ^G)
1> espell:start().
training initial word list...
training complete...
ok
2> espell:correct("speling").
{"spelling",4}
5> espell:correct("somthing").
{"something",683}

This code makes extensive use of list comprehensions in Erlang, which is largely responsible for cutting the espell code down to just 45 lines.
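
For readers more at home in Python, the edits1 step translates almost one-to-one. The sketch below mirrors the Splits, Deletes, Transposes, Replaces and Inserts comprehensions of the Erlang code. It is a simplification rather than a port: correct() here skips the distance-two step (known_edits2), and the word counts come from a tiny hand-written dict instead of a training corpus, both of which are assumptions of this example.

```python
import string


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Pick the most frequent known candidate, considering distance-one edits only."""
    def known(words):
        return {w for w in words if w in counts}

    candidates = known([word]) or known(edits1(word)) or {word}
    return max(candidates, key=lambda w: counts.get(w, 0))


# Stand-in for the trained ETS table (made-up dict for this example).
counts = {"spelling": 4, "something": 683}
fixed = correct("speling", counts)
```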


JAXL library – List of available hooks for various XMPP events

Jaxl 2.x provides an event mechanism using which developers can register callbacks for various xmpp events inside their application code. This blog post will demonstrate how to register callbacks for required xmpp events and go through a list of all available hooks. Finally, we will discuss parameters that are passed to called back methods by Jaxl core.

Registering callback on XMPP events
Applications can register callbacks for various XMPP events. Jaxl core will then call the registered application method (with 2 parameters) every time the associated XMPP event occurs. Shown below are some examples of registering callbacks.

When the application callback is a function:

function postAuth($payload, $jaxl) {

}
$jaxl->addPlugin('jaxl_post_auth', 'postAuth');

When the application callback is a public static method of a class:

class MyXMPPApp {
    public static function postAuth($payload, $jaxl) {

    }
}
$jaxl->addPlugin('jaxl_post_auth', array('MyXMPPApp', 'postAuth'));

When the application callback is a public method of a class instance:

class MyXMPPApp {
    function postAuth($payload, $jaxl) {

    }
}
$MyXMPPApp = new MyXMPPApp();
$jaxl->addPlugin('jaxl_post_auth', array($MyXMPPApp, 'postAuth'));

In all the above examples, jaxl_post_auth is one of the available hooks for registering callbacks.
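The idea behind these three registration styles can be sketched as a small hook registry: each hook name maps to a list of callables, and every callable is invoked with the same two arguments. The sketch below is in Python rather than PHP and is not Jaxl's implementation; `HookRegistry`, `add_plugin` and `execute` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class HookRegistry:
    """Hypothetical stand-in for the part of a library that stores callbacks."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def add_plugin(self, hook, callback):
        # Functions, static methods and bound methods are all plain callables.
        if not callable(callback):
            raise TypeError("callback must be callable")
        self._hooks[hook].append(callback)

    def execute(self, hook, payload, core):
        # Call every callback registered on this hook, in registration order,
        # passing the two parameters each callback expects.
        return [callback(payload, core) for callback in self._hooks[hook]]


def post_auth(payload, core):
    return ("function", payload)


class MyXMPPApp:
    @staticmethod
    def post_auth_static(payload, core):
        return ("static", payload)

    def post_auth(self, payload, core):
        return ("instance", payload)


registry = HookRegistry()
registry.add_plugin("jaxl_post_auth", post_auth)
registry.add_plugin("jaxl_post_auth", MyXMPPApp.post_auth_static)
registry.add_plugin("jaxl_post_auth", MyXMPPApp().post_auth)

results = registry.execute("jaxl_post_auth", {"ok": True}, registry)
```

Whichever style is used, the registry only needs something it can call with a payload and a reference to the core object.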

List of available hooks
Below is a complete list of available hooks in order of their occurrence within a Jaxl instance life cycle:

Hooks for events related to instance connection and authentication steps in various modes:

  • jaxl_post_connect
  • jaxl_get_auth_mech
  • jaxl_get_facebook_key
  • jaxl_post_auth_failure
  • jaxl_post_auth
  • jaxl_post_handshake
  • jaxl_pre_shutdown
  • jaxl_post_disconnect
  • jaxl_get_empty_body

Hooks for events related to XMPP streams and stanzas:

  • jaxl_get_stream_error
  • jaxl_get_presence
  • jaxl_get_message
  • jaxl_get_iq_get
  • jaxl_get_iq_set
  • jaxl_get_iq_error
  • jaxl_send_message
  • jaxl_send_presence

Hooks for events related to reading/writing of XMPP packets and internal packet routing:

  • jaxl_get_xml
  • jaxl_send_xml
  • jaxl_send_body
  • jaxl_pre_handler
  • jaxl_post_handler

TO-DO: Document when each hook is called inside your application life cycle and the list of parameters passed to each callback. For now, you can call var_dump($payload); inside your callback method to inspect them.


How to write External Jabber Components in PHP using Jaxl library?

The Jabber Component Protocol (XEP-0114) documents how the XMPP protocol can be used to communicate between servers and “external” components over the Jabber network. XMPP components “bind” to a domain, usually a sub-domain of the main XMPP service, such as service.example.org.

All incoming stanzas addressed to that domain (to='service.example.org') or to entities on that domain (to='[email protected]') will be routed to your Jaxl (Jabber XMPP Library) based code. In this blog post, I will demonstrate a sample external jabber component bot written in PHP using Jaxl library.
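The routing rule described above comes down to comparing the domain part of the stanza's `to` address with the component's domain. Here is a minimal Python sketch of that check, assuming addresses of the form [local@]domain[/resource]; the function names and the sample addresses used with them are hypothetical, and the real routing is done by the XMPP server, not by your component.

```python
def domain_of(jid):
    """Return the domain part of an address shaped like [local@]domain[/resource]."""
    bare = jid.split("/", 1)[0]           # drop the resource, if any
    return bare.split("@", 1)[-1].lower()  # drop the local part, if any


def routed_to_component(to, component_domain):
    """True when a stanza addressed to `to` belongs to the component's domain."""
    return domain_of(to) == component_domain.lower()
```

With this rule, both the bare domain and any entity on it match, while an address on the parent domain does not.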

Refer to the Jaxl installation guide, usage guide and example apps if you are new to Jaxl. The component bot code demonstrated here can be obtained from Jaxl@github.

Using Jabber Component Protocol
Include Jaxl's implementation of XEP-0114 in your application code to set up the environment needed for the Jabber component protocol. Here is how this can be done at the top of your application code:

        // Initialize Jaxl Library
        $jaxl = new JAXL(array(
                'component' => JAXL_COMPONENT_HOST,
                'port' => JAXL_COMPONENT_PORT
        ));

        // Include required XEP's
        jaxl_require('JAXL0114', $jaxl); // Jabber Component Protocol

Register callback for XMPP events
Above we have set up the necessary environment for writing external Jabber component bots. Next we register callbacks for the XMPP events we need, using methods of our componentbot class.

        // Sample Component class
        class componentbot {

        }

        // Add callbacks on various event handlers
        $componentbot = new componentbot();
        JAXLPlugin::add('jaxl_pre_handshake', array($componentbot, 'doAuth'));
        JAXLPlugin::add('jaxl_post_handshake', array($componentbot, 'postAuth'));
        JAXLPlugin::add('jaxl_get_message', array($componentbot, 'getMessage'));

Component bot class
Finally, let's complete the missing pieces inside the componentbot class.

        // Sample Component class
        class componentbot {

                function doAuth() {
                        global $jaxl;

                        $jaxl->log("Going for component handshake ...", 1);
                        return JAXL_COMPONENT_PASS;
                }

                function postAuth() {
                        global $jaxl;

                        $jaxl->log("Component handshake completed ...", 1);
                }

                function getMessage($payloads) {
                        global $jaxl;

                        // echo back
                        foreach($payloads as $payload) {
                                $jaxl->sendMessage($payload['from'], $payload['body'], $payload['to']);
                        }
                }

        }
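The echo logic in getMessage is just an address swap: the reply goes to whoever sent the message and appears to come from the address that received it. A minimal Python sketch of that transformation follows; `echo_replies` and the dictionary layout are assumptions for illustration that mirror the 'from', 'body' and 'to' keys read in the PHP code above.

```python
def echo_replies(payloads):
    """Build one reply per incoming message, swapping sender and recipient."""
    return [
        {"to": p["from"], "body": p["body"], "from": p["to"]}
        for p in payloads
    ]
```

Replying from the address the message was sent to lets a single component answer on behalf of any entity on its domain.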

Configure, Setup and Run
If you have a local ejabberd installed, add the following lines to ejabberd.cfg to make the example component bot work:

  {5559, ejabberd_service, [
                          {host, "component.localhost", [{password, "pass"}]}
                           ]},

Update jaxl.ini if you chose a different password, port or host name above:

        // Connecting jabber server details
        define('JAXL_HOST_NAME', 'localhost');
        define('JAXL_HOST_DOMAIN', 'localhost');

        // Component bot setting
        define('JAXL_COMPONENT_HOST', 'component.'.JAXL_HOST_DOMAIN);
        define('JAXL_COMPONENT_PASS', 'pass');
        define('JAXL_COMPONENT_PORT', 5559);

Finally, run from command line:

root@ubuntu:~/usr/share/php/jaxl/app/componentbot# jaxl componentbot.php
[15008] 2010-08-24 01:40:03 - Socket opened to the jabber host localhost:5559 ...

Tail jaxl.log for details:

[15008] 2010-08-24 01:40:04 - Going for component handshake ...

[15008] 2010-08-24 01:40:04 - [[XMPPSend]] 63
<handshake>4d6c2e762d5ba5dca2cbd3a90a4deeb6a6fa0838</handshake>

[15008] 2010-08-24 01:40:05 - [[XMPPGet]]
<handshake/>

[15008] 2010-08-24 01:40:05 - Component handshake completed ...
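The 40-character value inside the handshake element in the log is consistent with what XEP-0114 specifies: the component sends the lowercase hex SHA-1 digest of the stream ID received from the server concatenated with the shared secret. Below is a minimal Python sketch of that computation. The function names are hypothetical and this is not Jaxl's code; the stream ID comes from the server's stream header, which is not shown in the log above.

```python
import hashlib


def component_handshake(stream_id, secret):
    """Hex SHA-1 of stream ID + shared secret, as described in XEP-0114."""
    data = (stream_id + secret).encode("utf-8")
    return hashlib.sha1(data).hexdigest()  # hexdigest() is already lowercase


def handshake_element(stream_id, secret):
    """Wrap the digest in the element the component sends to the server."""
    return "<handshake>%s</handshake>" % component_handshake(stream_id, secret)
```

The server performs the same computation with its configured password and replies with an empty handshake element when the digests match, which is the second exchange visible in the log.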

Log into your ejabberd server with a client and send a message to [email protected]. You should receive an instant response back. Congratulations!
