Erlang is a beautiful programming language from Ericsson which i first came across while cutomizing authentication flow of ejabberd about 2 years back. Ever since then I have been using erlang for all my application backend needs including custom http server, custom bosh conn. manager, xmpp components and clients, … Recently i have even started churning my application html pages via erlang using erlydtl (an Erlang implementation of the Django Template Language).
Years ago, i gave a successful shot at implementing Peter Norvig’s Spell Corrector in PHP. Last weekend i attempted the same “Spell Corrector” algorithm in about 45 lines of Erlang code.
ESpell:
Complete code file with comments and explaination can be found here:
https://github.com/abhinavsingh/espell
-module(espell).
-define(alphabet, "abcdefghijklmnopqrstuvwxyz").
-export([start/0, correct/1]).
%%
%% API Functions
%%
%% @doc start spell checker
start() -> train(words()).
%% @doc returns most probable correct candidate with score
correct(Word) ->
lists:foldl(
fun(Candidate, {Correction, Score}) ->
case ets:lookup(?MODULE, list_to_binary(Candidate)) of
[{_, Counter}] when Counter > Score -> {Candidate, Counter};
_ -> {Correction, Score}
end
end, {Word, 0}, get_candidates(Word)).
%%
%% Local Functions
%%
words() ->
{ok, Bin} = file:read_file("../priv/big.txt"),
{ok, Words} = regexp:split(binary_to_list(Bin), "[^a-zA-Z]"),
lists:map(fun(X) -> string:to_lower(X) end, Words).
train(Features) ->
io:fwrite("training initial word list...~n"),
ets:new(?MODULE, [set, named_table]),
lists:foreach(fun(X) ->
case ets:insert_new(?MODULE, {list_to_binary(X), 1}) of
false -> ets:update_counter(?MODULE, list_to_binary(X), 1);
true -> true
end
end, Features),
io:fwrite("training complete...~n"),
ok.
edits1(Word) ->
Splits = lists:foldl(fun(I, Acc) -> Acc ++ [{string:substr(Word, 1, I), string:substr(Word, I+1, string:len(Word)-I)}] end, [{"", Word}], lists:seq(1, string:len(Word))),
Deletes = [A ++ string:substr(B, 2) || {A,B} <- Splits, B =/= []],
Transposes = [A ++ string:substr(B, 2, 1) ++ string:substr(B, 1, 1) ++ string:substr(B, 3) || {A,B} <- Splits, string:len(B) > 1],
Replaces = [A ++ binary_to_list(<>) ++ string:substr(B, 2) || {A,B} <- Splits, B =/= [], C <- ?alphabet],
Inserts = [A ++ binary_to_list(<>) ++ B || {A,B} <- Splits, C <- ?alphabet],
lists:usort(Deletes ++ Transposes ++ Replaces ++ Inserts).
%%edits2(Word) -> lists:usort([E2 || E1 <- edits1(Word), E2 <- edits1(E1)]).
known_edits2(Word) -> lists:usort([E2 || E1 <- edits1(Word), E2 <- edits1(E1), ets:member(?MODULE, list_to_binary(E2))]).
known(Words) -> lists:usort([Word || Word <- Words, ets:member(?MODULE, list_to_binary(Word))]).
get_candidates(Word) ->
C1 = known([Word]),
if
length(C1) > 0 -> C1;
true -> C2 = known(edits1(Word)),
if
length(C2) > 0 -> C2;
true -> C3 = known_edits2(Word),
if length(C3) > 0 -> C3;
true -> [Word] end
end
end.
Try It Out:
espell provides 2 simple function for all it’s working:
- start() : start espell which initiates reading initial data and training phase
- correct(Word) : this accepts 1 parameter, which is the word you want to correct. It returns a 2-tuple, where 1st element is the correct word and 2nd element is a score (which right now simply means number of times correct word was seen in training data set)
$ cd espell
$ erlc -o ebin/ src/espell.erl
$ erl -pa ebin/
Erlang R14B03 (erts-5.8.4) [source] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.8.4 (abort with ^G)
1> espell:start().
training initial word list...
training complete...
ok
2> espell:correct("speling").
{"spelling",4}
5> espell:correct("somthing").
{"something",683}
This code makes extensive use of list comprehensions in erlang, which is hugely responsible for cutting down espell code to just 45 lines of erlang.