Back to blogging: What to expect

Hello Readers,

I started this blog as a way to share my experiments and experiences while learning web development and computer science in general. In the first two years (between Apr’08 and Aug’10) I wrote as many as 100 blog posts. Quite a frenzy. Since then I have managed only 5-6 posts in four years, plus nearly 45 drafts that may now never get published. The good news is that I am back to blogging, and I have a lot to share.

Briefly, here is what (and what not) to expect in future posts:

  1. PHP – In the past, PHP dominated the content on this blog: mostly web demos, quick hacks and JAXL library examples. However, I have not worked actively with PHP since ’10 and have barely touched it since ’12. Expect zero PHP.

  2. JAXL – No more PHP essentially means no more JAXL posts. In fact, I recently moved the JAXL repository to its own GitHub organization, where other collaborators can maintain, improve and work on it without requiring my active involvement. This organization also contains other repositories that I managed to open source from my startup Jaxl.

  3. XMPP – Unfortunately, I am no longer in touch with progress on the XMPP specifications. The specs have evolved a lot, to the extent that some developers have reported that mod_message_carbon no longer works as expected with newer ejabberd server versions (also, the Message Carbons extension, XEP-0280, has itself been deprecated). However, XMPP will always be my preferred choice whenever I need the entire suite of user-to-user and group messaging, presence, contact management, Jingle / SIP integration and the other features baked into XMPP XEPs. For my everyday messaging needs, newer technologies like ZeroMQ, AMQP (RabbitMQ), MQTT or even Redis Pub/Sub are more suitable.

  4. Java – After quite a journey, I am now working full-time with Java. I still hate it, but I am trying to adapt, learn and love it, at least for what it’s worth.

  5. Python – Thanks to my stint with Appurify, I had a chance to work full-time with Python, and I even managed to work on some interesting open source Python projects. Even though it is no longer my primary language, Python is always fun, especially when you are in a hurry to get things done.

  6. Golang / Erlang – I met Erlang while hacking on ejabberd, Riak etc. for my startup Jaxl, and immediately fell in love with it. I met Golang a year ago, and nowadays I am in love with it. It’s simple and precise, and its buffered channels give it message-passing semantics similar to Erlang’s process mailboxes. I highly recommend digging into these languages and getting comfortable with the message-passing programming paradigm. It will change how you approach and think about your application structure. Expect lots of Golang and some Erlang.

  7. Docker – Who is not into Docker these days? If you are not, stop reading this post right now and head over to the Docker user guide. That’s how important I find this beautiful piece of technology. Expect a lot about Docker in future posts.

  8. Startups – A lot of startup fun has kept me busy since ’10, and some of those experiences and lessons are worth sharing.

  9. Android – I have been working full-time with mobile devices (both Android and iOS) since ’12: not much app development, but a lot of hacking with the ADB protocol and libimobiledevice.

  10. System design – Luckily, I got to do a lot of end-to-end system and network design. This domain becomes really interesting once you start having fun with racks, subnets, routes, switches, firewalls, DNS, multicast and the entire suite of technology under this umbrella.
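For readers new to the message-passing style mentioned in item 6, here is a minimal Erlang sketch of a process that serves requests from its mailbox. The module name `mailbox_demo`, its functions and the `{echo, Msg}` message format are hypothetical, made up purely for illustration: a client sends its own pid along with the request, and the server replies to that pid.

```erlang
-module(mailbox_demo).
-export([start/0, ping/2]).

%% Spawn a process whose loop reads messages from its own mailbox.
start() ->
    spawn(fun loop/0).

%% Server loop: each request carries the sender's pid so we can reply.
loop() ->
    receive
        {From, {echo, Msg}} ->
            From ! {self(), Msg},
            loop();
        stop ->
            ok
    end.

%% Send a request, then block until the reply from that same server
%% arrives, giving up after one second.
ping(Pid, Msg) ->
    Pid ! {self(), {echo, Msg}},
    receive
        {Pid, Reply} -> {ok, Reply}
    after 1000 ->
        timeout
    end.
```

Calling `mailbox_demo:ping(Pid, hello)` on a pid returned by `mailbox_demo:start()` should return `{ok,hello}`, and `Pid ! stop` ends the loop. In Go, a buffered channel shared between goroutines plays a similar role, with one difference: a channel is a standalone value, whereas each Erlang process owns exactly one mailbox.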

I will end this post with some interesting images from the past.

[Image] Fsck'd iPhone screen: swollen due to high device temperature
[Image] Rackframe: setting up racks

How to Write a Spelling Corrector in Erlang (ESpell)

Erlang is a beautiful programming language from Ericsson, which I first came across about 2 years back while customizing the authentication flow of ejabberd. Ever since, I have been using Erlang for all my application backend needs, including a custom HTTP server, a custom BOSH connection manager, XMPP components and clients, … Recently I have even started churning out my application's HTML pages from Erlang using erlydtl (an Erlang implementation of the Django Template Language).

Years ago, I successfully implemented Peter Norvig’s spelling corrector in PHP. Last weekend I attempted the same algorithm in about 45 lines of Erlang code.

ESpell:

The complete code file, with comments and explanations, can be found here:
https://github.com/abhinavsingh/espell

-module(espell).
-define(alphabet, "abcdefghijklmnopqrstuvwxyz").
-export([start/0, correct/1]).

%%
%% API Functions
%%

%% @doc start spell checker
start() -> train(words()).

%% @doc returns most probable correct candidate with score
correct(Word) ->
	lists:foldl(
		fun(Candidate, {Correction, Score}) -> 
			case ets:lookup(?MODULE, list_to_binary(Candidate)) of
				[{_, Counter}] when Counter > Score -> {Candidate, Counter};
				_ -> {Correction, Score}
			end
		end, {Word, 0}, get_candidates(Word)).

%%
%% Local Functions
%%

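words() ->
	{ok, Bin} = file:read_file("../priv/big.txt"),
	%% re:split/3 replaces the old regexp module, which is gone from current OTP releases
	Words = re:split(Bin, "[^a-zA-Z]+", [{return, list}]),
	%% drop the empty strings produced at the start/end of the text
	[string:to_lower(X) || X <- Words, X =/= []].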

train(Features) ->
	io:fwrite("training initial word list...~n"),
	ets:new(?MODULE, [set, named_table]),
	lists:foreach(fun(X) ->
		case ets:insert_new(?MODULE, {list_to_binary(X), 1}) of
			false -> ets:update_counter(?MODULE, list_to_binary(X), 1);
			true -> true
		end
	end, Features),
	io:fwrite("training complete...~n"),
	ok.

edits1(Word) ->
	Splits = lists:foldl(fun(I, Acc) -> Acc ++ [{string:substr(Word, 1, I), string:substr(Word, I+1, string:len(Word)-I)}] end, [{"", Word}], lists:seq(1, string:len(Word))),
	Deletes = [A ++ string:substr(B, 2) || {A,B} <- Splits, B =/= []],
	Transposes = [A ++ string:substr(B, 2, 1) ++ string:substr(B, 1, 1) ++ string:substr(B, 3) || {A,B} <- Splits, string:len(B) > 1],
	Replaces = [A ++ [C] ++ string:substr(B, 2) || {A,B} <- Splits, B =/= [], C <- ?alphabet],
	Inserts = [A ++ [C] ++ B || {A,B} <- Splits, C <- ?alphabet],
	lists:usort(Deletes ++ Transposes ++ Replaces ++ Inserts).

%%edits2(Word) -> lists:usort([E2 || E1 <- edits1(Word), E2 <- edits1(E1)]).
known_edits2(Word) -> lists:usort([E2 || E1 <- edits1(Word), E2 <- edits1(E1), ets:member(?MODULE, list_to_binary(E2))]).
known(Words) -> lists:usort([Word || Word <- Words, ets:member(?MODULE, list_to_binary(Word))]).

get_candidates(Word) ->
	%% Prefer the word itself, then known 1-edit words, then known 2-edit words.
	case known([Word]) of
		[] ->
			case known(edits1(Word)) of
				[] ->
					case known_edits2(Word) of
						[] -> [Word];
						C3 -> C3
					end;
				C2 -> C2
			end;
		C1 -> C1
	end.
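The selection logic in `get_candidates/1` and `correct/1` can be summarized as follows. This is a restatement of the code above, mirroring the candidate-priority shortcut from Norvig’s essay, where a smaller-edit candidate set always wins over a larger one. Here $\mathrm{count}(c)$ is the number of occurrences of $c$ in the training file, $E_1(w)$ is the output of `edits1/1`, and $E_2(w)$ is the set of edits of those edits; `known/1` and `known_edits2/1` restrict them to words seen in training. The second formula counts the raw strings `edits1/1` generates for a word of length $n$, before `lists:usort/1` removes duplicates.

```latex
\[
\mathrm{correct}(w) = \bigl(c^*, \mathrm{count}(c^*)\bigr), \qquad
c^* = \arg\max_{c \in C(w)} \mathrm{count}(c), \qquad
C(w) =
\begin{cases}
\{w\} & \text{if } w \text{ is known},\\
\mathrm{known}(E_1(w)) & \text{else, if non-empty},\\
\mathrm{known}(E_2(w)) & \text{else, if non-empty},\\
\{w\} & \text{otherwise (score } 0\text{)}.
\end{cases}
\]
\[
\underbrace{n}_{\text{deletes}} + \underbrace{(n-1)}_{\text{transposes}}
+ \underbrace{26n}_{\text{replaces}} + \underbrace{26(n+1)}_{\text{inserts}} = 54n + 25
\]
```

For "speling" ($n = 7$) that is $54 \cdot 7 + 25 = 403$ strings before deduplication, which is why `known_edits2/1` (edits of every one of those) is only tried as a last resort.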

Try It Out:
espell exposes two simple functions:

  • start() : starts espell, which reads the initial data set and runs the training phase
  • correct(Word) : takes one parameter, the word you want to correct, and returns a 2-tuple whose first element is the corrected word and whose second element is a score (currently just the number of times the corrected word was seen in the training data set)
$ cd espell
$ erlc -o ebin/ src/espell.erl
$ erl -pa ebin/
Erlang R14B03 (erts-5.8.4) [source] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.4  (abort with ^G)
1> espell:start().
training initial word list...
training complete...
ok
2> espell:correct("speling").
{"spelling",4}
5> espell:correct("somthing").
{"something",683}

This code makes extensive use of Erlang list comprehensions, which are largely responsible for cutting espell down to just 45 lines of Erlang.
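To see how the comprehensions in `edits1/1` work on their own, here is a minimal, hypothetical rewrite of just the split, delete and transpose steps (the module `edits_demo` is illustrative and not part of the espell repository). It assumes `lists:split/2` in place of the `string:substr` calls used in espell, which yields the same n + 1 splits a little more directly.

```erlang
-module(edits_demo).
-export([splits/1, deletes/1, transposes/1]).

%% Every way to cut Word in two: {"", Word}, ..., {Word, ""}.
splits(Word) ->
    [lists:split(I, Word) || I <- lists:seq(0, length(Word))].

%% Drop the first character of each non-empty right half.
deletes(Word) ->
    [A ++ tl(B) || {A, B} <- splits(Word), B =/= []].

%% Swap the first two characters of each right half that has at least two.
%% Splits whose right half does not match [X, Y | Rest] are silently skipped.
transposes(Word) ->
    [A ++ [Y, X] ++ Rest || {A, [X, Y | Rest]} <- splits(Word)].
```

In the shell, `edits_demo:deletes("abc")` returns `["bc","ac","ab"]` and `edits_demo:transposes("abc")` returns `["bac","acb"]`. The generator pattern `{A, [X, Y | Rest]}` does the filtering itself: elements that do not match a generator pattern are skipped, which replaces the explicit `string:len(B) > 1` guard.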