Gmail Type Attachment – How to make one?

Google and its wide range of web applications have simply changed the way we look at the internet, be it the remarkably fast Google search engine or Gmail, Google’s mail service. You will find an enormous number of threads and forums discussing ‘How does Google achieve XYZ?’ or ‘How do they make it happen so fast and efficiently?’

One of the most talked-about things about Gmail is how they manage to make it so simple, convenient, sober, fast and easy. One Gmail feature that contributes hugely to making it easier and faster for users is its ability to attach your files while you are writing your mail.

I have tried to come up with an exact clone for gmail attachment. Kindly click here for the demo and the source files.

This is achieved by using an iframe that hosts the server-side script responsible for handling uploads. As soon as the user selects a file to upload, the upload form gets submitted to upload.php, which loads inside the iframe. Hence at no point does the user leave the page, and still the file gets uploaded. Attached below are all the files you need to implement this on your server. Just unzip the file on your server and you are ready. All the uploaded files will go in the upload folder by default. You may change that in upload.php.

Feel free to post a comment back here for any queries etc.

Cheers !

My Interview with Yahoo-Inc! (Part 1)

Hello Friends,

Last month I was interviewed by Yahoo-Inc! for a job as a software engineer. Here I would like to go over the discussions and the various question/answer sessions I went through before I got my confirmation letter 🙂 Perhaps someone looking to join Yahoo can take some tips from here.

In total I went through 7 rounds of interviews (1 telephonic, 1 HR, 1 with the manager, 3 technical, 1 general aptitude). I will try my best to recall and summarize all of them for you. But before I go ahead, I would like to thank my good friend Rajib Das for forwarding my resume at Yahoo and giving me an opportunity to have a crack at it. Here I go:

  • Telephonic Interview: As it happens with all multinational firms, you go through a telephonic screening before you are actually called for a face-to-face interview. One good thing which happened for me even before the interview started was that the interviewer had already gone through my CV thoroughly and had also checked my website Altertunes. You can be unlucky at times, if the interviewer comes with an open mind and tries to test you in areas where you lack deep knowledge.

    My telephonic interview started in the morning around 10 o’clock. It started with general about-me questions. From experience I have learnt that these about-me sessions decide to a certain extent where your interview will head. Hence, I always try to draw the interviewer’s attention to my strong areas, i.e. audio processing and web development.

    Next, luckily for me, he started discussing Altertunes and the various technologies used in developing it. He specifically asked me questions like:

    1. How have you built the spell checker algorithm for Altertunes ?
    2. From where are you playing all the mp3 files ? Are they on your server ?
    3. If you have to, how will you implement the Orkut’s people connection algorithm ? i.e. If User A visits User C’s profile where User C is not a direct connection of User A, then Orkut shows a possible shortest connection. Something like this User A -> User B -> User C.
    4. What considerations have you taken while building the auto-suggest feature for Altertunes ?

    In my opinion, though, a concrete answer does not exist for any of the above questions. There can be many possible solutions for every question he asked me. Here are my answers to his questions.

    1. I have used the spell checking algorithm described by Peter Norvig, the director of research at Google, in his post here. Many say the 20 lines of code he wrote in Python are pure genius, and without any doubt it is one of the shortest and most efficient pieces of code one can ever see. However, for Altertunes I have used a tweaked version of the same coded in PHP. For me the 20 lines of Python took 74 lines in PHP, but the results are just excellent for Altertunes. My spell checker for Altertunes gives over 90% accuracy for single word artists and over 80% accuracy for multiple word artists.
      Another simple but less accurate approach which I tried earlier was PHP’s built-in function soundex(), which implements SOUNDEX, a phonetic algorithm for indexing names by sound, as pronounced in English. Read more about SOUNDEX here. However, the spell checker using SOUNDEX is not as accurate as the algorithm described by Peter Norvig.

    2. The answer to this question lies in one word: crawlers. I wrote crawlers in PERL which crawl the web for mp3 files and index them in the Altertunes database. However, this answer didn’t satisfy the interviewer and he asked me, ‘Can you explain the architecture which you follow for developing crawlers?’ Probably he wanted to hear the word resolver from me, because as soon as I started telling him about resolvers he stopped me midway and proceeded to the next question. For more hints on how I have used PHP to develop web crawlers for Altertunes, read here.
    3. The answer to the third question was probably hidden in the question itself. A shortest path is what we want, hence we will obviously use one of the shortest path algorithms. As I didn’t have much knowledge of shortest path algorithms then, I suggested Dijkstra’s algorithm as the most appropriate for this task. However, when I later discussed this problem with one of my friends, he told me that Breadth First Search (BFS) would probably be best suited for this problem, since every connection counts as a single hop and the graph is therefore unweighted.
    4. Here he actually asked me, ‘How have you developed the auto-suggest feature for Altertunes?’ To which I replied in one word: AJAX. However, he probably wanted some more technical stuff on this topic, and hence the next question he fired was, ‘What considerations have you taken while building the auto-suggest feature for Altertunes?’ I still don’t really know what exactly he was looking for. My reply to him was, ‘I am sending each and every word typed in the input box to the server and getting a response to all requests. However, before sending the next request I wait for the previous response to come, as I wanted my users to experience an incremental search.’
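To make the first answer a little more concrete, here is a minimal Python sketch of the idea behind Norvig’s corrector: generate every string one edit away from the typed word and keep the known one with the highest count. This is not the PHP code running on Altertunes. The `ARTIST_COUNTS` table and the function names are made up for illustration, and unlike Norvig’s original this sketch only looks at candidates a single edit away.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical popularity counts; a real table would come from the artist database.
ARTIST_COUNTS = Counter({"metallica": 50, "megadeth": 30, "nirvana": 40, "madonna": 20})


def edits1(word):
    """Return every string that is one delete, transpose, replace or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts=ARTIST_COUNTS):
    """Return the most frequent known word within one edit, or the input unchanged."""
    word = word.lower()
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word
    return max(candidates, key=lambda w: counts[w])
```

Calling `correct("metalica")` with the sample table picks `metallica`, because it is the only known name one insertion away from the typo.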

    I thought this was all that he would ask me in the telephonic interview, but ailaaa ! He started firing questions on general algorithms and aptitude. A few of the questions he asked me were:

    • Given a tree, how will you create a mirror image of the tree? What algorithm will you use? What traversal will you prefer? 🙂
    • There is a linked list which contains a closed loop. How will you detect that closed loop in the linked list? 🙂
    • Given a few numbers, how will you push them onto a stack such that whenever I pop the topmost element from that stack, I get the minimum number from the bunch? He also wanted me to give the push and pop algorithms such that both insertion and popping are O(1). 🙁
    • Given two logs of wood, each of which burns completely in exactly 1 hour, how will you measure 45 minutes?

    This was all I was asked in my telephonic interview. Within 10 minutes of the interview ending I received a call from HR, asking me to come for a face-to-face interview, which I scheduled for a week later. As I had already got a feel of what would be coming in the face-to-face interview, I didn’t want to go underprepared.
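For the linked-list loop question from the phone screen, one well-known approach is Floyd’s two-pointer technique: walk the list with a slow pointer and a fast pointer, and if they ever meet there is a loop. The sketch below is only an illustration and not necessarily the answer given in the interview; the `Node` class and the `build_list` helper are assumptions made up for the example.

```python
class Node:
    """A singly linked list node (hypothetical helper for this example)."""

    def __init__(self, value):
        self.value = value
        self.next = None


def has_loop(head):
    """Floyd's cycle detection: O(n) time, O(1) extra space."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next            # moves one step
        fast = fast.next.next       # moves two steps
        if slow is fast:            # the pointers can only meet inside a loop
            return True
    return False                    # fast fell off the end, so there is no loop


def build_list(values, loop_to=None):
    """Build a list from values; optionally link the tail back to index loop_to."""
    nodes = [Node(v) for v in values]
    for current, following in zip(nodes, nodes[1:]):
        current.next = following
    if loop_to is not None and nodes:
        nodes[-1].next = nodes[loop_to]
    return nodes[0] if nodes else None
```

Because the fast pointer gains one node on the slow pointer per step, it cannot jump over it once both are inside the loop, which is why the check terminates.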

  • Face-to-Face Round 1: Before going for the in-person interview, I tried my best to make myself familiar with the mother of all algorithm books, the one by Cormen. You can download an e-copy of the same from here; the size is about 10 MB. Alternately you can search for -inurl:htm -inurl:html intitle:”index of” +(“/ebooks”|”/book”) +(chm|pdf|zip) +”cormen” in Google and get loads of sources for the same. You may change the search string to find other books in different formats. Anyways, coming back to the interview.

    Just like my telephonic interview, the 1st round started at 10 o’clock sharp. The interviewer seemed like a mixed bag to me. He asked me questions ranging over C++, PERL, PHP, Java, the client-server model, probability and even general aptitude. Here are a few of the questions I can recall:

    1. Given an HTML page, how will you represent the same in XML format ?
    2. What are the advantages of Hidden Markov Model over other pattern recognition algorithms ?
    3. Yahoo is organizing an online programming competition and I am the head. A few million programmers will be participating in this contest. I have a set of 10 programming questions with me, which I give to each programmer one by one. As the programmer submits a solution I check it, and if it is found perfect I give him the next question. Whoever completes the 10 programming questions first is the winner. You have to award the top 10 programmers. How will you go about developing such a system ? What precautions will you take while developing it ? How many CPUs will be required for such a system ? How will you protect the system from going down in case a programmer tries to submit a virus as a solution ? etc etc
    4. What’s the probability that your birthday and mine fall on the same date and the same day of the year ? (There were about 3-4 probability questions that he asked me)
    5. Draw and explain the whole client-server model ?
    6. Explain in detail what happens from the moment one submits a URL in the browser till the user sees the requested page in the browser ?
    7. Write algorithms for Breadth First Search (BFS) and Depth First Search (DFS). Which one is preferred over the other and why ?

    Then, like in the telephonic interview, he fired almost the same set of questions about Altertunes. There were probably a few more which I can’t recall at this point, possibly the ones which I wasn’t able to answer. 😉 And that was it.
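The BFS question above and the Orkut connection question from the phone screen come down to the same thing: in a graph where every edge costs one hop, a breadth first search from User A reaches User C along a shortest chain. Below is a rough Python sketch under the assumption that the friend graph fits in memory as a dictionary of adjacency lists. The `friends` structure and the function name are hypothetical, and nothing here is claimed to be how Orkut actually does it.

```python
from collections import deque


def shortest_connection(friends, start, goal):
    """Return one shortest chain of users from start to goal, or None if unreachable.

    friends maps each user to a list of direct connections (assumed input format).
    """
    if start == goal:
        return [start]
    parent = {start: None}          # doubles as the visited set
    queue = deque([start])
    while queue:
        user = queue.popleft()
        for friend in friends.get(user, ()):
            if friend in parent:
                continue            # already reached by a path that is no longer
            parent[friend] = user
            if friend == goal:
                # Walk the parent links back to start, then reverse the chain.
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(friend)
    return None
```

With `friends = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B"], "D": ["A"]}`, the call `shortest_connection(friends, "A", "C")` yields the chain User A -> User B -> User C from the question.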

  • Face-to-Face Round 2: This was a complete session of aptitude-level questions, which just kept coming one after another. These are questions over which you can probably spend the whole day and say ‘This can be done the other way round too’. Here are a few that I remember as of now:
    1. You have to weigh all the weights between 1 and 100. What is the minimum number of weights required to do so? (Probably the answer I gave was weights which are powers of 2, i.e. 1, 2, 4, 8, 16, 32 and 64)
    2. You have to weigh all the weights between 1 and 100. What is the minimum number of weights required to do so, if in this case you can use both pans to measure a particular weight ? For e.g., to weigh a 2 Kg object we can put the 3 Kg weight on one side and the 1 Kg weight together with the object on the other side. (Probably the answer I gave was weights which are powers of 3, i.e. 1, 3, 9, 27…)
    3. You have a few eggs and there is a 100-floor building. Determine the minimum number of attempts required to find the floor from which an egg will break if dropped ? (Always consider the worst case, that the egg breaks only at the top floor, and for minimum attempts proceed from the lower floors to the top floor in non-uniform steps. I was probably able to do this in 14 tries. Check if you can do the same :P)
    4. There is a huge string containing letters, and in the string there are lots and lots of palindromes. You need to find the palindrome of the longest length. How will you do it ? (Note: When I had reached a solution using stacks and queues, to complicate things he told me that a palindrome can further have a palindrome within it, e.g. PALINDROMEDALADXYXDALADEMORDNILAP. Here DALAD is a palindrome hidden inside the full string, which is itself a bigger palindrome.)
    5. There is a linked list which has a lot of circular loops in it. How will you find the longest loop in the linked list ?
    6. You have a very very large text file containing words. Consider that you can’t read it fully in the memory at once. How will you go about finding a word in the text file ?
    7. You have a very very large text file containing numbers. Consider that you can’t read it fully in the memory at once. How will you go about arranging the numbers in ascending or descending order ?
    8. A few questions on probability were discussed.

    There were a lot more which he asked me, easily 7-8 more questions I guess. However, this was the most interesting round that I had in the whole interview.
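The 14 tries in the egg question can be checked with a few lines of Python, assuming the classic version of the puzzle with exactly two eggs (the question above only says ‘a few eggs’, so this is an assumption). With d drops available, the first egg can be dropped from floor d, then d-1 floors higher, then d-2 floors higher and so on, which covers d + (d-1) + … + 1 floors. The function names below are made up for this sketch.

```python
def min_drops_two_eggs(floors):
    """Smallest d such that 1 + 2 + ... + d >= floors (two-egg assumption)."""
    drops = 0
    covered = 0
    while covered < floors:
        drops += 1
        covered += drops            # one more drop extends coverage by `drops` floors
    return drops


def first_egg_floors(floors):
    """Floors to try with the first egg: steps shrink by one after every drop."""
    step = min_drops_two_eggs(floors)
    plan, floor = [], 0
    while step > 0 and floor < floors:
        floor = min(floor + step, floors)
        plan.append(floor)
        step -= 1
    return plan
```

For 100 floors the first function gives 14, and the first egg goes to floors 14, 27, 39, 50 and so on. Whenever it breaks, the second egg checks the floors in between one by one, and the total never exceeds 14 drops.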

Probably that’s enough for the first post. I will continue to write about my other rounds in upcoming posts. Till then, if you have any questions or comments, feel free to post them here.

Life in Bangalore after IIT Guwahati

Bangalore, Bangalore, Bangalore !!!

This was what everyone wished for in their final year at IIT Guwahati. Everyone wanted to get a job in one of the multinational firms in Bangalore and like everyone else, this was what I longed for and got too! After a long wait for my joining letter,  I was finally asked to join my present employer on 21st June, 2007 and if I can recall correctly, it was on 19th June, 2007 that I landed in Bangalore.

First trip to Bangalore (2002):
It wasn’t the first time that I was visiting Bangalore. I came to Bangalore way back in 2002 for an NDA (National Defence Academy) SSB interview. That trip was my first trip outside Lucknow (my hometown) and it proved to be a disaster for many reasons:

  • I wasn’t selected in the Indian army.
  • I watched 4 movies in a span of 7 days here in Bangalore. ‘Lagaan’ and ‘Gadar’ were watched back to back on the very first day in Bangalore, followed by another one starring Amitabh Bachchan, Manoj Bajpai and Raveena Tandon (I can’t recall its name). You must be wondering how watching movies could have made this trip a disaster. Wait, read on to know about my last movie on that trip.
  • On the last day, I was already out of the NDA campus roaming the streets at around 04:00 PM, when I cared to take a look at my train ticket back to Lucknow. The train was scheduled for 08:00 PM. So at 05:00 PM, since I had a lot of time at hand, I thought why not watch another movie, probably an English one, at one of the theaters near the railway station. It was ‘The Tailor of Panama’ that I chose to watch. The movie, starring Pierce Brosnan, was as good as expected. It was about 07:00 PM when I came out of the movie hall. I rushed to the station and sat there awaiting the train’s arrival. I continued to wait, but till 08:00 PM I could neither see any rush on the platform nor hear any announcements for that train. I called home to check with my father if there were some changes in the train schedule. He asked me, “Why are you out of the train ? You must be at Hyderabad station right now, isn’t it ?” Foxed on hearing this, I told him, “I am still in Bangalore waiting for the train, but there are no announcements or anything whatsoever.” I had begun to sense that I’d committed some blunder. After the conversation with my father, I realized that the train was at 05:00 PM (the time I decided to watch a movie) and not 08:00 PM. I put down the receiver and started looking at the ticket for that magical 08:00 PM, wondering how I could misread such a thing, and I realized that the train ticket had been booked at 08:00 PM in Lucknow, and that was what I had interpreted as the train’s departure time. Hell of a mistake!
  • But that disaster trip was not going to end there. The ticket stubs of the movies I watched during my stay at Bangalore were all there in my back pocket. I reached home and my father got hold of those movie tickets. That’s when the real drama started: I was beaten to pieces.

First few days on my second trip:
Anyways, I expected a much better experience in Bangalore this time. Hotel Orchid Harsha was where we were supposed to stay for the initial 15 days. All the expenses were managed by the company, so we all spent like millionaires at Harsha. There were at least 8-10 friends and colleagues present for dinner in the hotel room. 😉 I will say that those 15 days were among the best here in Bangalore till date, when we lived life king size.

Dull, boring and Monotonous life:
However, once we were all engaged in our professional lives, we could hardly find time for the regular bakarchodi sessions we used to have back in college. It was more of a dull, boring and monotonous life for us all here. Also, a few of our friends found girlfriends for themselves, which didn’t help either (less people = less bakarchodi). 😛 At times, a long drive on bikes with friends was another thing we tried (Tau, a huge enthusiast for such long drives, owns a Karizma, and 2 accidents till date have not lessened his spirits. In fact he’s looking forward to a Hayabusa!), but our laziness and eagerness to sit in one position 😛 killed the interest in those long drive trips. Every Friday, rock music and drinks at Purple Haze was the only joy we used to have then. Back in college, I used to think that we were not addicted to alcohol and that we used to drink for fun. But after coming to Bangalore I realized that we all had become addicted to beer and hard drinks at some level or the other.

No Rock band for me:
I came to Bangalore with high expectations of forming a new rock band. However, it is really disappointing that I haven’t found a band till now. All the bands that invited me to play were either 20-25 Km from my house, or not to my taste, or just starting out with rock music. Involvement with my website ‘Altertunes.com’ also didn’t leave me enough time to play my guitar at home.

Work @ Office:
I was probably looking for a different domain in my first job. Business intelligence was probably not the thing for me (though the field is great, with great scope ahead in the market). Knowledge gained while developing my website eventually helped me in cracking Yahoo!. I just hope it will be great fun at Yahoo. Looking forward to working there !

Colleagues @ Office:
One of the things to cheer about is the team and colleagues I got to work with at Oracle. Mahesh, Eshwar, Prashanth, Karthik, Vinnet Jain (mantri), Yasser, Vineet Alagh, Ankur and Avisek (Dada), everyone made sure that I had a rocking and comfortable time at Oracle. Be it the moon walks in the lift by Ankur Jain, finance funde by Dada or mimicry by mantri, it was fun all the way. The managers had an inspirational presence in the office; seeing them handle and manage things was a great learning experience in itself. Kumar, Bala, Vishy and Michelle were all inspirational and a joy to work with. Overall, I must say that I couldn’t have asked for a better work culture in my first job.

To be continued ….

Google, Yahoo, Microsoft toolkit for startups

Welcome,

Well there is some good news for young entrepreneurs out there.

  • Google released Google Apps, which can be used for your website’s email and hosting needs. You can use all the Google applications (Gmail, Gtalk, Google Docs, Calendar, Page Creator etc) for your site, and all for free. This really helps one to focus more on the functionality and features of the website rather than wasting time maintaining such trivial things. (Altertunes currently uses Google Apps without any problem)
  • In competition, Microsoft too has released Startup Center and Microsoft Office Live Basics, which include business plans for startups, domain names, webspace and hosting with metrics, accounting software, tools etc.
  • And finally, Yahoo India too has started a similar thing for small businesses called Yahoo Small Business. However, this comes at a nominal price of Rs 999/- per year (which is acceptable considering they give you domain name registration with this). The package includes a domain name, 200 free business email ids, Site Solutions to design your website, 5 GB disk space, 200 GB data transfer and some free ad slots.

Not to forget that Google already released Google App Engine a few weeks ago, which allows you to run your application on Google’s infrastructure.

Surely the competition in the market is benefiting the small entrepreneurs in the market.

Take full advantage of this and All the best for your startup.

How to configure Ubuntu and LAMP on Windows

Hello all Linux freaks,

Having already looked at how to configure Apache-PHP-MySQL on Windows, here I will try to explain in short how to do the same on Linux. I personally don’t have a separate machine for Linux. I run Ubuntu on my Windows machine using VMware. So before we go on to see how to configure LAMP on Ubuntu, let’s see how we can have Ubuntu running on Windows.

  • For this tutorial I have used VMware-player-2.0.2-59824.exe for the VMware installation and Ubuntu-7.04-desktop-i386.zip for Ubuntu. You will need to download them from http://www.vmware.com/download/player/ and http://www.ubuntu.com/GetUbuntu/download
  • Install VMware, which is about as simple an installer as you can get. At the end it will ask you to reboot the machine; kindly do not skip this step.
  • Now open VMware, which you have just installed, and you should see something like this:

  • Now click the open button and browse to the folder where you have unzipped the Ubuntu zip file.

  • Open Ubuntu-7.04-desktop-i386.vmx and that’s it. You have just installed and configured Ubuntu on Windows. Simple, isn’t it ? You should be seeing something like this by now:

For my system, Ubuntu automatically picked up various internet settings. However when I tried running the same from my office, I had to make appropriate changes for proxy setting. Kindly do the same for running internet on your Ubuntu.

Now an important thing before we proceed:

  1. The default administrator password for Ubuntu is ubuntu
  2. By default you are not the admin or root user. Hence you will need to prefix a command with sudo (or switch user with su) to run it as administrator in the Ubuntu terminal.

Also, before we proceed further kindly check if your Ubuntu is configured correctly for internet connection. Just check by opening this blog through mozilla in ubuntu. If it works, you are all set to configure LAMP on ubuntu.

Follow these steps to configure LAMP on Ubuntu (you need to run a few commands in your terminal window)

  • Open the file /etc/apt/sources.list and comment out the deb cdrom: line (or untick the install-from-CD box in Software Sources). This will let Ubuntu install all packages directly from the repository.
  • $ sudo apt-get update
  • $ sudo apt-get upgrade
  • $ sudo apt-get install mysql-server mysql-admin apache2 php5 libapache2-mod-php5 libapache2-mod-auth-mysql php5-mysql phpmyadmin
  • $ sudo mysqladmin -u root password [YOUR_NEW_PASSWORD]
  • $ sudo /etc/init.d/mysql restart
  • $ sudo /etc/init.d/apache2 restart

That’s pretty much all we need. Now let us test our configuration and LAMP setup.

  1. $ sudo vim /var/www/phpinfo.php
  2. Type in the following few lines of code in the file:
    <?php
      phpinfo();
    ?>
  3. Open up your browser and type in http://localhost/phpinfo.php
  4. If you are able to see the PHP configuration information in your browser, PHP is working. That’s it.
  5. Next type http://localhost/phpmyadmin
  6. Login as root i.e. Username : root and Password : [YOUR_NEW_PASSWORD]
  7. If you are lucky enough you will see the phpMyAdmin console.
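If you would rather check the two URLs from a script than from the browser, something along these lines should do. It is only a sketch in Python using the standard urllib module. The function names are made up for this example, and it simply reports the HTTP status code for each page (200 means the page was served, None means nothing answered at all).

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    # An empty ProxyHandler bypasses any configured proxy, since localhost
    # should be reached directly even on a proxied office network.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code             # the server answered, but with an error code
    except (urllib.error.URLError, OSError):
        return None                 # connection refused, timeout and the like


def check_lamp(base_url="http://localhost"):
    """Check the two test pages used in the steps above."""
    return {path: http_status(base_url + path)
            for path in ("/phpinfo.php", "/phpmyadmin/")}
```

Calling `check_lamp()` on the Ubuntu machine returns a small dictionary with one status per page. A 404 for /phpinfo.php would suggest the test file is not in /var/www.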

Congratulations ! That’s pretty much exactly what we need. Now go on to do all your web development on Linux. Hail Windows 😉

I configured all this stuff long back, so if I have missed out on some issues kindly lemme know in the comments.

Essential frameworks for web development

Hello developers,

Initially, when I started working on Altertunes, I was unaware of the various frameworks which one can use for a website. Though I have never used any of them for Altertunes (the main reason being the eagerness to do everything by myself), maybe some of you out there will be interested in having a look at them:

  • http://prototypejs.org : Undoubtedly one of the best frameworks available for Ajax. Prototype is a JavaScript framework that aims to ease development of dynamic web applications.
  • http://script.aculo.us : If you are looking to have all those fade-in and fade-out effects on your website, Scriptaculous probably does the best job. Scriptaculous provides you with easy-to-use, cross-browser user interface JavaScript libraries to make your web sites and web applications fly. Though I am not too sure, I guess Ruby on Rails already comes packaged with Scriptaculous.
  • Prototype Windows Class : Another masterpiece which allows your website to have all those fancy popups with fade-in and fade-out effects.
  • Google Web Toolkit : Google Web Toolkit (GWT) makes it easier to write high-performance AJAX applications. You write your front end in the Java programming language and GWT compiles your source into highly optimized JavaScript.

You will probably find hundreds of them in the market right now. Above I have listed a few which I have tried at least once and got good performance from. However, there are various factors which decide which Ajax framework you should choose for your site, the most important being:

“It seems that a very light framework is required for a public site. If your visitors need to upload a large Javascript API, they may not visit further your website, depending on your page’s download times. When working on intranets, or professionnal services, it may be acceptable that the first access to the application may be longer, moreover when the website is used daily. Also think about on-demand Javascript and Javascript compression.” – by Michael Mahemoff

If you are looking to develop your site using flash, then there are again some excellent frameworks available in the market for the same. Two of the best which I know are:

  • OpenLaszlo : OpenLaszlo is an open source platform for creating zero-install web applications with the user interface capabilities of desktop client software. OpenLaszlo programs are written in XML and JavaScript and transparently compiled to Flash and DHTML. The Alterplayer used on Altertunes, was developed by me using OpenLaszlo. I am open for any suggestion or question you need for development in OpenLaszlo.
  • Red5 : It is an opensource flash server written in Java that supports Streaming audio and video, recording client streams and many other useful things which openlaszlo fails to do currently. So if you are looking to develop a voice chat application, Red5 is the way to go.

Above are just a few which I found useful on my way to Altertunes. There are several more untouched by me, which might be more powerful than those I have listed above. If you find any, kindly post them as a comment here.

Happy Frameworking and web development 😉

Altertunes featured in Bangaloreinc.com

Hello All,

Yesterday Altertunes was featured and reviewed by BangaloreInc.com. This is the first ever coverage of Altertunes on the web.

To read more on the same visit:
http://www.bangaloreinc.com/2008/05/11/introducing-altertunescom-music-from-the-web/

To have a look at “In conversation with Abhinav Singh, Founder of Altertunes.com” visit:
http://www.bangaloreinc.com/2008/05/11/binc-talk-time-in-conversation-with-founder-of-altertunescom/

Many thanks to Pravin Karoshi for covering Altertunes on BangaloreInc.com.

How to write crawlers and parse a page using Perl (Part 1)

Hello all perl freaks,

One of the most powerful things we can achieve using perl is extracting any content you want from a website. For example, you can use perl to extract information about all the artists from All Music, or about all cricket players and matches from CricInfo. In the past I have used perl for making web crawlers for Altertunes, and most recently I used perl to extract news from Google News.

Here I will try to explain how efficiently you can extract information by parsing html pages using perl.

To start with, let’s revise some basics of perl.

Let’s first see how we can get the HTML content of a website:

Example 1

require LWP::UserAgent;

#~ Call the gethtmlpage function by passing the url we want to save
gethtmlpage("http://abhinavsingh.com");

sub gethtmlpage {
  my $ua = LWP::UserAgent->new;
  #~ Uncomment the line below for a proxied net connection
  #~ $ua->proxy('http','http://[PROXY_URL]:[PROXY_PORT]/');
  #~ A plain GET request is enough to fetch a page
  my $response = $ua->get($_[0]);

  if ($response->is_success) {
    my $output = $response->content;
    #~ Save the page in the current folder
    open(my $fh, ">", "abhinavsingh.com.html") or die "Cannot open output file: $!";
    print $fh $output;
    close($fh);
  }
  else {
    print "Error in getting HTML page: ".$response->status_line."\n";
  }
}

If you are using PXPerl on Windows, copy and paste the above code into the SciTE perl editor (which comes packaged with PXPerl) and simply press Ctrl+F7. This will produce an html file named ‘abhinavsingh.com.html’ in your folder.

The most important feature which makes PERL and Python the default choice for web crawlers is their regular expression matching. Let’s look at some of the regular expressions we will be using for parsing an HTML page.

Example 2

$sentence = "This is a perl tutorial by Abhinav Singh at http://abhinavsingh.com";

#~ Matching $sentence for 'Abhinav Singh'
$sentence =~ m/Abhinav Singh/i;

print "Pre-Match: ".$`."\n";
print "Match: ".$&."\n";
print "Post-Match: ".$'."\n";

Copy the above code into the SciTE perl editor and press Ctrl+F7. You should see a result similar to this:

Output 2

>perl example2.pl
Pre-Match: This is a perl tutorial by
Match: Abhinav Singh
Post-Match:  at http://abhinavsingh.com
>Exit code: 0    Time: 0.962
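Since Python was mentioned above as the other common choice for crawlers, here is roughly how the same pre-match / match / post-match split could look with Python’s re module. Python has no direct counterparts of perl’s $`, $& and $' variables, so this sketch slices the string around the match object instead. The helper name split_on_match is made up for the example.

```python
import re

sentence = "This is a perl tutorial by Abhinav Singh at http://abhinavsingh.com"


def split_on_match(pattern, text):
    """Return (pre-match, match, post-match) for the first match, or None."""
    # re.IGNORECASE plays the role of the /i modifier in the perl example
    match = re.search(pattern, text, flags=re.IGNORECASE)
    if match is None:
        return None
    return text[:match.start()], match.group(0), text[match.end():]


pre, hit, post = split_on_match(r"abhinav singh", sentence)
print("Pre-Match: " + pre)
print("Match: " + hit)
print("Post-Match: " + post)
```

The three printed lines carry the same three pieces of the sentence as Output 2 above.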

Now let’s see how we can extract relevant information from a page. Suppose we are interested in extracting all information about the artist Metallica from the AllMusic website. Below I will first show you my code for the same and then its result. Finally I will discuss how I made all those regular expressions:

Example 3

require LWP::UserAgent;

$bandname = "Metallica";
getartistinfo($bandname);

sub getartistinfo {
  my %formdata;
  my $ua = LWP::UserAgent->new;
  #~ $ua->proxy('http','http://[PROXY_URL]:[PROXY_PORT]/');

  $formdata{'sql'}=$_[0];
  $formdata{'opt1'}=1;
  $formdata{'P'}='amg';

  print "Sending HTTP request for ".$_[0]."...n";
  my $response = $ua->post('http://www.allmusic.com/cg/amg.dll',%formdata);

  if ($response->is_success) {
    print "Got HTTP response... parsing output for ".$_[0]."...nn";
    $output=$response->content;

    # Extracting Overview, Biography, Discography, Songs, Credit, Charts & Awards link for the artist
    $output =~ m/cg\/amg\.dll\?p=amg&searchlink=(.*)">/;
    $BaseLink = "http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=";
    $OverviewLink = $1;
    # Each sub-page link is the overview link with its T0 code swapped for another code
    $DiscographyMainAlbumLink = $BaseLink.$OverviewLink;
    $DiscographyMainAlbumLink =~ s/T0/T20/;
    print "Discography Main Album: ".$DiscographyMainAlbumLink."\n";
    $DiscographySinglesEPLink = $BaseLink.$OverviewLink;
    $DiscographySinglesEPLink =~ s/T0/T22/;
    print "Discography Singles&EP: ".$DiscographySinglesEPLink."\n";
    $DiscographyDvDVideosLink = $BaseLink.$OverviewLink;
    $DiscographyDvDVideosLink =~ s/T0/T23/;
    print "Discography DVD Videos: ".$DiscographyDvDVideosLink."\n";
    $DiscographyAllSongsLink = $BaseLink.$OverviewLink;
    $DiscographyAllSongsLink =~ s/T0/T31/;
    print "Songs All Songs: ".$DiscographyAllSongsLink."\n";
    $DiscographyCnAAlbumsLink = $BaseLink.$OverviewLink;
    $DiscographyCnAAlbumsLink =~ s/T0/T50/;
    print "Charts & Awards Billboard Albums: ".$DiscographyCnAAlbumsLink."\n";
    $DiscographyCnASinglesLink = $BaseLink.$OverviewLink;
    $DiscographyCnASinglesLink =~ s/T0/T51/;
    print "Charts & Awards Billboard Singles: ".$DiscographyCnASinglesLink."\n";
    $DiscographyGrammyLink = $BaseLink.$OverviewLink;
    $DiscographyGrammyLink =~ s/T0/T52/;
    print "Charts & Awards Grammy Awards: ".$DiscographyGrammyLink."\n\n";

    # Extracting Title Bar
    $output =~ m/<td class="titlebar"><span class="title">(.*)<\/span><br \/>/;
    $titlebar = $1;
    print "Titlebar:\n".$titlebar."\n\n";
    $output = $';

    # Extracting Formed-Sub
    $output =~ m/Begin Formed(.*)<span>(.*)End Formed/;
    $output = $';
    $formedsub = $2;
    $formedsub =~ m/<a href=(.*)>(.*)<\/a>(.*)<a href=(.*)>(.*?)<\/a>/; # Parse $formedsub for exact string
    print "Formed: ".$2.$3.$5."\n\n";

    # Extracting timelinesubactive
    while($output =~ m/class="timeline-sub-active">(\d+)<\/div>/) {
      print "ActiveYear:".$1."\n";
      $output = $';
    }
    print "\n";

    # Extract Genre, Style titles
    $output =~ m/id="left-sidebar-title-small"(.*?)<\/tr>/;
    $suboutput = $&;
    $output = $';
    while($suboutput =~ m/id="left-sidebar-title-small"><span>(.*?)<\/span>/) {
      #~ print "Subclasses:".$1."\n";
      push(@GSM,$1);
      $suboutput = $';
    }
    #~ print "\n";

    # Extract Genre contents
    $output =~ m/<td class="list-cell"(.*?)<\/td>/;
    $suboutput = $&;
    $output = $';
    while($suboutput =~ m/<li>(.*?)<\/li>/) {
      #~ print "Genres:".$1."\n";
      $suboutput = $';
      $1 =~ m/<a href=(.*)>(.*)<\/a>/;
      push(@G,$2);
    }
    #~ print "\n";

    # Extract Style contents
    $output =~ m/<td class="list-cell"(.*?)<\/td>/;
    $suboutput = $&;
    $output = $';
    while($suboutput =~ m/<li>(.*?)<\/li>/) {
      #~ print "Styles:".$1."\n";
      $suboutput = $';
      $1 =~ m/<a href=(.*)>(.*)<\/a>/;
      push(@S,$2);
    }
    #~ print "\n";

    # Extract Mood subclass
    $output =~ m/id="left-sidebar-title-small"><span>(.*?)<\/span>/;
    $output = $';
    #~ print "Subclasses:".$1."\n\n";
    push(@GSM,$1);

    # Extract Mood Contents
    $output =~ m/id="left-sidebar-list"(.*?)<\/div>/;
    $suboutput = $&;
    $output = $';
    while($suboutput =~ m/<li>(.*?)<\/li>/) {
      #~ print "Moods:".$1."\n";
      $suboutput = $';
      $1 =~ m/<a href=(.*)>(.*)<\/a>/;
      push(@M,$2);
    }
    print "\n";

    # Print the @GSM and @G,@S,@M content
    print $GSM[0].":";
    foreach $gen (@G) {
      print $gen."\t";
    }
    print "\n\n".$GSM[1].":";
    foreach $gen (@S) {
      print $gen."\t";
    }
    print "\n\n".$GSM[2].":";
    foreach $gen (@M) {
      print $gen."\t";
    }
    print "\n\n";

    # Extract AMG Artist ID
    $output =~ m/<td class="sub-text"(.*?)<\/pre>/;
    $output = $';
    $1 =~ m/<pre>(.*)/;
    print "AMG Artist ID:".$1."\n\n";

    # Extracting Artist Mini Bio
    $output =~ m/id="artistminibio"><p>(.*)<\/p>/;
    $artistminibio = $1;
    $artistminibio =~ s/<a href(.*?)>//g; # Filtering out any link or html tags
    $artistminibio =~ s/<\/a>//g;
    $artistminibio =~ s/<i>//g;
    $artistminibio =~ s/<\/i>//g;
    print "ArtistMiniBio:\n".$artistminibio."\n\n";

    # Extracting Other Entries, Group Members, Similar Artists, Influenced By and Follower
    $output =~ m/id="large-list"><tr>(.*?)<\/table>/;
    $suboutput = $&;
    $output = $';
    # Extracting the two halves of the table
    $suboutput =~ m/<td valign="top" width="266px">(.*)<\/td><td/;
    $lefthalftemp = $1;
    $righthalftemp = $';

    while($lefthalftemp =~ m/<div class="large-list-subtitle">(.*?)<\/div>/) {
      print $1.":\n";
      $' =~ m/<ul>(.*?)<\/ul>/;
      $lefthalftemp = $';
      $li = $1;
      while($li =~ m/<li>(.*?)<\/li>/) {
        $li = $';
        $1 =~ m/<span class="libg"><a href=(.*)>(.*)<\/a><\/span>/i;
        print $2."\n";
      }
      print "\n\n";
    }

    while($righthalftemp =~ m/<div class="large-list-subtitle">(.*?)<\/div>/) {
      print $1.":\n";
      $' =~ m/<ul>(.*?)<\/ul>/;
      $righthalftemp = $';
      $li = $1;
      while($li =~ m/<li>(.*?)<\/li>/) {
        $li = $';
        $1 =~ m/<span class="libg"><a href=(.*)>(.*)<\/a><\/span>/i;
        print $2."\n";
      }
      print "\n\n";
    }
  }
}

Copy the above code into the SciTE Perl editor and press Ctrl+F7. You should see output like the following, which contains all the extracted data about the artist Metallica.

Output 3

>perl example4.pl
Sending HTTP request for Metallica...
Got HTTP response... parsing output for Metallica...

Discography Main Album: http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=METALLICA&sql=11:kifpxqe5ldte~T20
Discography Singles&EP: http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=METALLICA&sql=11:kifpxqe5ldte~T22
Discography DVD Videos: http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=METALLICA&sql=11:kifpxqe5ldte~T23
Songs All Songs: http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=METALLICA&sql=11:kifpxqe5ldte~T31
Charts & Awards Billboard Albums: http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=METALLICA&sql=11:kifpxqe5ldte~T50
Charts & Awards Billboard Singles: http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=METALLICA&sql=11:kifpxqe5ldte~T51
Charts & Awards Grammy Awards: http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=METALLICA&sql=11:kifpxqe5ldte~T52

Titlebar:
Metallica

Formed: 1981 in Los Angeles, CA

ActiveYear:80
ActiveYear:90
ActiveYear:2000

Genre:Rock

Styles:Thrash	Heavy Metal	Speed Metal	Hard Rock

Moods:Bitter	Suffocating	Fierce	Angry	Aggressive	Menacing
Gritty	Tense/Anxious	Hostile	Crunchy	Epic	Nihilistic	Fiery
Intense	Dramatic	Harsh	Ominous	Rebellious	Uncompromising
Searching	Gloomy

AMG Artist ID:P     4906

ArtistMiniBio:
Metallica was easily the best, most influential heavy metal band of the '80s,
responsible for bringing the music back to Earth.
Instead of playing the usual rock star games of metal stars of the early '80s,
the band looked and talked like they were from the street.
Metallica expanded the limits of thrash, using speed and volume not for their own sake,
but to enhance their intricately structured compositions.
The release of 1983's Kill 'Em All marked the beginning of the legitimization
of heavy metal's underground, bringing new complexity and depth to thrash metal.
With each album, the band's playing and writing improved;
James Hetfield developed a signature rhythm playing that matched his growl,
while lead guitarist Kirk Hammett... Read More...

Other Entries:
Movie Entry
Classical Music Entry

Group Members:
Kirk Hammett
James Hetfield
Dave Mustaine
Jason Newsted
Lars Ulrich
Cliff Burton
Robert Trujillo
Ron McGovney

Similar Artists:
Slayer
Anthrax
Sepultura
Machine Head
Coroner
Death
Dio
Danzig
King Diamond
Mercyful Fate
Metal Church
Overkill
Voivod
Death Angel
Queensr?che
Cancer
Corrosion of Conformity
White Zombie
Rollins Band
Melvins
Soundgarden

See Also:
Megadeth
Flotsam & Jetsam
Exodus
Rock Star Supernova

Influenced By:
Mot?rhead
The Misfits
Diamond Head
Black Sabbath
Judas Priest
Angel Witch
Iron Maiden
Saxon
Accept
Budgie
Deep Purple
Rush
AC/DC
Led Zeppelin
G.B.H.
Fear
Ted Nugent
Lynyrd Skynyrd
UFO
Thin Lizzy
Queen

Followers:
Carcass
Grindcrusher
At War
Crowbar
The Beyond
Sevendust
Boy Hits Car
Queens of the Stone Age
Roachpowder
Ossiris
Avenged Sevenfold
Trapt
Hurt
Scenes from a Movie
Sick City
Saving Abel

Performed Songs By:
James Hetfield
Lars Ulrich
Kirk Hammett
Cliff Burton
Bob Rock
Dave Mustaine
Brian Tatler
Sean Harris
Roger Taylor
"Fast" Eddie Clarke
Glenn Danzig
Jason Newsted
John Deacon
Brian May
Freddie Mercury
Lemmy
Lemmy Kilmister
Phil "Philthy Animal" Taylor
Burke Shelley

>Exit code: 0    Time: 5.940

Thus, on running the above script you get all the information about the artist Metallica from AllMusic’s Metallica page. For demonstration purposes I have extracted information only from Metallica’s main page; however, you can write similar code to extract information from Metallica’s other sub-pages on AllMusic.
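As a starting point for those sub-pages: every sub-page link in Example 3 is the overview link with its trailing T0 code swapped for another code, so the codes can be kept in one table instead of seven copies of the same three lines. The sketch below is only an illustration. The helper name subpage_links and the %SUBPAGE_CODE table are made-up names, the codes are the ones used in Example 3, and the sample link is the one from Output 3, assuming it originally ended in ~T0.

```perl
use strict;
use warnings;

# Sub-page codes as used in Example 3; the labels match its print statements.
my %SUBPAGE_CODE = (
  'Discography Main Album'            => 'T20',
  'Discography Singles&EP'            => 'T22',
  'Discography DVD Videos'            => 'T23',
  'Songs All Songs'                   => 'T31',
  'Charts & Awards Billboard Albums'  => 'T50',
  'Charts & Awards Billboard Singles' => 'T51',
  'Charts & Awards Grammy Awards'     => 'T52',
);

# Hypothetical helper: returns a label => link hash for one overview link.
sub subpage_links {
  my ($overview_link) = @_;
  my $base = 'http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=';
  my %links;
  for my $label (keys %SUBPAGE_CODE) {
    # Anchoring on the end of the string is an assumption: it avoids touching
    # a "T0" that might appear earlier in the link.
    (my $link = $base . $overview_link) =~ s/T0$/$SUBPAGE_CODE{$label}/;
    $links{$label} = $link;
  }
  return %links;
}

my %links = subpage_links('METALLICA&sql=11:kifpxqe5ldte~T0');
print "$_: $links{$_}\n" for sort keys %links;
```

Each link can then be fetched with the same LWP::UserAgent object and parsed with its own set of regular expressions.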

Meanwhile, if you are wondering how my Perl script extracts the artist information, what method I used to make sure only the relevant information is parsed from the page, or how I made all those regular expression matches, watch out for Part 2 of this blog. For now, I leave it up to you to figure out how it is all done.
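A small head start, though. The pattern Example 3 repeats most often is "match with a non-greedy capture, keep the capture, then continue on whatever follows the match". Here is that one idea in isolation, run against a made-up HTML string rather than a real AllMusic page. The helper name list_items and the sample markup are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the link text of every <li> item in $html.
sub list_items {
  my ($html) = @_;
  my @items;
  # The non-greedy (.*?) stops at the first </li>, so each pass sees one item.
  while ($html =~ m/<li>(.*?)<\/li>/) {
    my $item = $1;
    $html = $';   # keep only the text after this match for the next pass
    # Pull the visible text out of the anchor; items without a link are skipped.
    push @items, $1 if $item =~ m/<a href=[^>]*>(.*?)<\/a>/;
  }
  return @items;
}

my $sample = '<ul><li><a href="/g1">Rock</a></li><li><a href="/g2">Thrash</a></li></ul>';
print join("\t", list_items($sample)), "\n";
```

Unlike Example 3, this sketch copies $1 into a variable before running the inner match and only pushes a value when the inner match succeeds, so a list item without a link cannot leave a stale capture behind.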

Here are a few important links which will help you in making crawlers similar to those of Altertunes, and also in understanding the methods I have used above.
1. Leeds University Perl page
2. Tizag
3. Finally, the documentation which comes with PXPerl is in itself a complete guide for everything.

Hope I helped a little in your quest of making crawlers.

In the next blog I will try to wrap up this section (I am tired of writing this one as of now) 😉

All the best.

Web Development – Part 1: Apache, MySQL, PHP

Hello Friends,

Ever since I started web development and launched Altertunes, many of my friends have asked me, “How and where do I start web development?” Though there are countless tutorials on web development over the internet, I will try to jot down my experiences and learnings here.

To start with, I will just try to explain a few terms, before we go on and learn how to handle and integrate them together:

  1. Apache: is the web server used worldwide for web development. Since April 1996 Apache has been the most popular HTTP server on the World Wide Web. However, since November 2005 it has experienced a steady decline of its market share, lost mostly to Microsoft Internet Information Services. As of April 2008 Apache served 50.42% of all websites.
  2. MySQL: is a multithreaded, multi-user SQL database management system, which has more than 11 million installations. MySQL is popular for web applications and acts as the database component of the LAMP, BAMP, MAMP, and WAMP platforms (Apache-MySQL-PHP/Perl/Python on Linux/BSD/Mac/Windows), and for open-source bug tracking tools like Bugzilla. Its popularity for use with web applications is closely tied to the popularity of PHP and Ruby on Rails, which are often combined with MySQL. PHP and MySQL are essential components for running popular content management systems such as Joomla, WordPress, Drupal, and some BitTorrent trackers. Wikipedia runs on MediaWiki software, which is written in PHP and uses a MySQL database.
  3. PHP: is a widely-used, general-purpose scripting language that is especially suited for web development and can be embedded into HTML. It generally runs on a web server, taking PHP code as its input and creating web pages as output. It can be deployed on most web servers and on almost every operating system and platform, free of charge. PHP is installed on more than 20 million websites and 1 million servers, although the number of websites with PHP installed has declined since August 2005. It is also the most popular Apache module among computers using Apache as a web server. The most recent major release of PHP was version 5.2.6 on May 1, 2008.

In short, Apache is the web server which serves the web pages for the website. MySQL is the database which stores the necessary information, ranging from users’ data to users’ pictures. PHP is the server-side scripting language used to make web pages. The Apache server hands PHP code to the PHP module, which runs it and outputs an HTML page that is sent to the browser.

Let’s start with some installations and configurations. If you don’t want to handle the installation and configuration of WAMP (Apache-MySQL-PHP on Windows), you may choose to download and install the pre-packaged installer from here. However, it’s always better to configure things yourself to know the inner details. If you are brave enough to install and configure these modules, here we go:

  • Apache Installation: Go to http://httpd.apache.org/download.cgi and download the latest Win32 Binary including OpenSSL (MSI Installer) version of Apache Server Installer Available. Double click the installer and you should see something like this:

    Click next and after a few clicks you will be asked for the following parameters:

    Fill in the details as shown in the figure above, choose ‘Typical’ as the installation type, and proceed to install the Apache server.

    Open up your favorite browser and type in http://localhost. If you see a page that says “It works!” then the Apache server has been installed successfully.

  • PHP Installation: Go to http://www.php.net/downloads.php and download the latest PHP Zip Package. Unzip the file into your C: drive, and rename the folder to C:\PHP. Next, copy the file C:\PHP\php.ini-dist to C:\Windows and rename it to C:\Windows\php.ini. This is the PHP configuration file where we will be setting some important parameters later on. Now it’s time to integrate PHP with the Apache server we installed earlier. Open the Apache configuration file, which is located at C:\Program Files\Apache Software Foundation\Apache2.2\conf\httpd.conf, in Notepad (Important: you should always keep a copy of this file stored somewhere else, because an error or misconfiguration can cause your Apache server to crash!) and copy the following lines at the end of the ‘LoadModule’ section of the configuration file:

    #AddType application/x-httpd-php .php
    # Apache 2.2.x needs php5apache2_2.dll; php5apache2.dll is the Apache 2.0.x build
    LoadModule php5_module "C:/PHP/php5apache2_2.dll"
    AddType application/x-httpd-php .php .phtml .inc .php3
    AddType application/x-httpd-php-source .phps
    # configure the path to php.ini
    PHPIniDir "C:/Windows"

    Now let’s configure the root directory of our Apache server. The root directory is the base directory from which the website is served. When you type http://localhost in your browser, the server looks for the necessary files in the root directory and serves them to the browser. By default, the root directory of the Apache server is set to C:\Apache2\htdocs. You may keep the root as C:\Apache2\htdocs or change it to a directory of your choice as follows:

    Look for the ‘DocumentRoot’ section in the Apache configuration file located at C:\Program Files\Apache Software Foundation\Apache2.2\conf\httpd.conf. You should see something like this:

    # DocumentRoot: The directory out of which you will serve your
    # documents. By default, all requests are taken from this directory, but
    # symbolic links and aliases may be used to point to other locations.
    DocumentRoot "C:/Apache2/htdocs"

    Change C:/Apache2/htdocs to a directory of your choice, anywhere on your computer, for example C:/Workspace. (Paths in httpd.conf are written with forward slashes.)

    Further scroll down in the httpd.conf file and search for a section which looks like this:

    # This should be changed to whatever you set DocumentRoot to.
    <Directory "C:/Apache2/htdocs">

    Change the directory location from C:/Apache2/htdocs to whatever you set in the previous step.

    Now to test whether everything is configured correctly, create a file in the root directory of Apache server, named phpinfo.php and paste the following code into it:

    <?php phpinfo(); ?>

    Restart your Apache server from Start > All Programs > Apache HTTP Server > Control Apache Server > Restart, and open the following URL in your browser: http://localhost/phpinfo.php. If everything is set up well and you are lucky enough, you will see something like this:

    Finally, we just need to make a few changes in the PHP configuration file which we copied to C:\Windows\php.ini. Open the file in a text editor and find the section which looks like this:

    ; Directory in which the loadable extensions (modules) reside.
    extension_dir = "./"

    Change it to look like this:

    ; Directory in which the loadable extensions (modules) reside.
    extension_dir = "C:\PHP\ext"

    Also, find the section in your PHP configuration file which looks like this:

    session.save_path = "/tmp"

    and change it to:

    session.save_path = "C:\WINDOWS\temp"

    And finally we come to the end of PHP installation section. Congratulations, Good going !

  • MySQL Installation: Go to http://dev.mysql.com/downloads/mysql and download the latest MySQL database. Once the ZIP file has finished downloading, extract it using WinZip or a similar program on your computer. Double click the setup.exe file to start the installation. You should see something like this:

    Keep clicking next and keep all the default options (use the typical installation and not the detailed one). After the installation finishes, go to Start -> Run -> services.msc and start the MySQL service. (If the MySQL service doesn’t exist, the installation was not successful.) Now we need to tell PHP that MySQL exists on your system, and we need to integrate them. Go to php.ini, i.e. the PHP configuration file, and make the following changes:

    Search for something like this:
    ;extension=php_mysql.dll

    and make it like this:
    extension=php_mysql.dll (i.e. you just need to remove the semicolon)

    Additionally you need to add this:
    extension=php_mysqli.dll

    at the end of the extensions. Further copy:
    libmysql.dll to C:\Windows\System32 and restart Apache.

  • PHPMyAdmin and MySQL Query Browser Installation: Though you can always use the MySQL database through the command line, having PHPMyAdmin or MySQL Query Browser will make your database management a lot easier and simpler. To install it, download PHPMyAdmin from here. Download the ZIP file and extract it under the root directory of the Apache server, i.e. if the root directory is set as C:\Workspace, the extracted PHPMyAdmin folder should have the path C:\Workspace\phpmyadmin. Open the config.inc.php file in the PHPMyAdmin folder and make the following changes:

    $cfg['Servers'][$i]['user'] = 'root';
    $cfg['Servers'][$i]['password'] = 'YOUR_PASSWORD'; // Your MySQL Password
    /* Authentication type */
    $cfg['Servers'][$i]['auth_type'] = 'config';
    /* Server parameters */
    $cfg['Servers'][$i]['host'] = 'localhost';
    $cfg['Servers'][$i]['connect_type'] = 'tcp';
    $cfg['Servers'][$i]['compress'] = false;
    /* Select mysqli if your server has it */
    $cfg['Servers'][$i]['extension'] = 'mysql';
    /* User for advanced features */
    $cfg['Servers'][$i]['controluser'] = 'root';
    $cfg['Servers'][$i]['controlpass'] = 'YOUR_PASSWORD'; // Your MySQL Password
    /* Advanced phpMyAdmin features */
    $cfg['Servers'][$i]['pmadb'] = 'phpmyadmin'; // Database that stores phpMyAdmin's own tables
    $cfg['Servers'][$i]['bookmarktable'] = 'pma_bookmark';
    $cfg['Servers'][$i]['relation'] = 'pma_relation';
    $cfg['Servers'][$i]['table_info'] = 'pma_table_info';
    $cfg['Servers'][$i]['table_coords'] = 'pma_table_coords';
    $cfg['Servers'][$i]['pdf_pages'] = 'pma_pdf_pages';
    $cfg['Servers'][$i]['column_info'] = 'pma_column_info';
    $cfg['Servers'][$i]['history'] = 'pma_history';

    Now open the URL http://localhost/phpmyadmin, and if everything is configured correctly you should see something like this:

And that’s it. If you are able to see all the above images on your computer, it means you have configured everything correctly and you are all set up for web development.

I installed and configured all these components a while back (6 months ago), hence there is a possibility that I have missed some trivial points. If you catch any such issue, do comment about it.

Enjoy making cool webpages, and All the best for your website 🙂