Erlang is a beautiful programming language from Ericsson which i first came across while cutomizing authentication flow of ejabberd about 2 years back. Ever since then I have been using erlang for all my application backend needs including custom http server, custom bosh conn. manager, xmpp components and clients, … Recently i have even started churning my application html pages via erlang using erlydtl (an Erlang implementation of the Django Template Language).
Years ago, i gave a successful shot at implementing Peter Norvig’s Spell Corrector in PHP. Last weekend i attempted the same “Spell Corrector” algorithm in about 45 lines of Erlang code.
ESpell:
Complete code file with comments and explaination can be found here:
https://github.com/abhinavsingh/espell
-module(espell). -define(alphabet, "abcdefghijklmnopqrstuvwxyz"). -export([start/0, correct/1]). %% %% API Functions %% %% @doc start spell checker start() -> train(words()). %% @doc returns most probable correct candidate with score correct(Word) -> lists:foldl( fun(Candidate, {Correction, Score}) -> case ets:lookup(?MODULE, list_to_binary(Candidate)) of [{_, Counter}] when Counter > Score -> {Candidate, Counter}; _ -> {Correction, Score} end end, {Word, 0}, get_candidates(Word)). %% %% Local Functions %% words() -> {ok, Bin} = file:read_file("../priv/big.txt"), {ok, Words} = regexp:split(binary_to_list(Bin), "[^a-zA-Z]"), lists:map(fun(X) -> string:to_lower(X) end, Words). train(Features) -> io:fwrite("training initial word list...~n"), ets:new(?MODULE, [set, named_table]), lists:foreach(fun(X) -> case ets:insert_new(?MODULE, {list_to_binary(X), 1}) of false -> ets:update_counter(?MODULE, list_to_binary(X), 1); true -> true end end, Features), io:fwrite("training complete...~n"), ok. edits1(Word) -> Splits = lists:foldl(fun(I, Acc) -> Acc ++ [{string:substr(Word, 1, I), string:substr(Word, I+1, string:len(Word)-I)}] end, [{"", Word}], lists:seq(1, string:len(Word))), Deletes = [A ++ string:substr(B, 2) || {A,B} <- Splits, B =/= []], Transposes = [A ++ string:substr(B, 2, 1) ++ string:substr(B, 1, 1) ++ string:substr(B, 3) || {A,B} <- Splits, string:len(B) > 1], Replaces = [A ++ binary_to_list(<>) ++ string:substr(B, 2) || {A,B} <- Splits, B =/= [], C <- ?alphabet], Inserts = [A ++ binary_to_list(<>) ++ B || {A,B} <- Splits, C <- ?alphabet], lists:usort(Deletes ++ Transposes ++ Replaces ++ Inserts). %%edits2(Word) -> lists:usort([E2 || E1 <- edits1(Word), E2 <- edits1(E1)]). known_edits2(Word) -> lists:usort([E2 || E1 <- edits1(Word), E2 <- edits1(E1), ets:member(?MODULE, list_to_binary(E2))]). known(Words) -> lists:usort([Word || Word <- Words, ets:member(?MODULE, list_to_binary(Word))]). get_candidates(Word) -> C1 = known([Word]), if length(C1) > 0 -> C1; true -> C2 = known(edits1(Word)), if length(C2) > 0 -> C2; true -> C3 = known_edits2(Word), if length(C3) > 0 -> C3; true -> [Word] end end end.
Try It Out:
espell provides 2 simple function for all it’s working:
- start() : start espell which initiates reading initial data and training phase
- correct(Word) : this accepts 1 parameter, which is the word you want to correct. It returns a 2-tuple, where 1st element is the correct word and 2nd element is a score (which right now simply means number of times correct word was seen in training data set)
$ cd espell $ erlc -o ebin/ src/espell.erl $ erl -pa ebin/ Erlang R14B03 (erts-5.8.4) [source] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false] Eshell V5.8.4 (abort with ^G) 1> espell:start(). training initial word list... training complete... ok 2> espell:correct("speling"). {"spelling",4} 5> espell:correct("somthing"). {"something",683}
This code makes extensive use of list comprehensions in erlang, which is hugely responsible for cutting down espell code to just 45 lines of erlang.
hey good to see you back on blogging!
and on erlang! that’s wonderful.
will look fwd for more erlang.
hey good to see you back on blogging!
and on erlang! that’s wonderful.
will look fwd for more erlang.