SEO Analyzer v 1.2 – Adding support for Bing along with Google and Yahoo

On May 28, 2009, Microsoft announced Bing, which has now replaced Live Search. Within two weeks of its release, Bing seems to have leapfrogged Yahoo search in the U.S.

With Bing coming up as a strong contender to the Google and Yahoo search engines, I have added support for Bing in SEO Analyzer v 1.2. This will enable you to analyze a site’s ranking for a particular keyword on the Bing search engine, along with the Yahoo and Google search engines.


SEO Analyzer Future Roadmap:

  • Saving reports on spreadsheets
  • Comparing two sites for a list of keywords
  • Keyword density extraction tool

Happy SEO Analysis!

Y!OS Developer Release is Live. Yahoo! is open!

Today, Yahoo opened the developer interfaces to our Y!OS platform stack through the release of the Yahoo! Social Platform (YSP), Yahoo! Query Language (YQL), Yahoo! Application Platform (YAP), and the YDN developer dashboard.  This makes Y!OS a reality for an essential constituency: developers.  Now, anyone on the web has access to Yahoo!’s tools and data to start building applications for Yahoo!’s vast audience and the web beyond.  It’s a great step forward in rewiring Yahoo! with a social dimension and a platform architecture that’s open like never before.

Yahoo! Social Platform (YSP)
See: http://developer.yahoo.com/social

Yahoo! Query Language (YQL)
See: http://developer.yahoo.com/yql

Yahoo! Application Platform (YAP)
See: http://developer.yahoo.com/yap

Caja, security, and user privacy
For more info on Caja: http://developer.yahoo.com/yap/caja

OAuth
See: http://developer.yahoo.com/oauth

OpenSocial Support
This launch also marks Yahoo!’s first implementation of OpenSocial support.
http://opensocialapis.blogspot.com/2008/10/launched-yahoos-first-implementation-of.html

Hats off to Yahoo!
Y!OS is out there.  Yahoo! is Open.


Yahoo Web Analytics (Beta) – Far better than Google Analytics

Born out of our acquisition of IndexTools in May, Yahoo! Web Analytics (beta) provides powerful data and insights reporting that help website owners evaluate their marketing performance and tweak their website designs. They’ll get custom real-time reports and graphs that help them slice and dice metrics like sales, page views, and sources of traffic and ultimately identify ways to amp up their visitor satisfaction.

The first big deployment is Yahoo! Small Business, whose 13,000 hosted e-commerce customers can get set up just in time for the holiday shopping season at the click of a button.

Read more http://ycorpblog.com/2008/10/08/introducing-yahoo-web-analytics/

YAHOO! Announces Settlement with Carl Icahn

SUNNYVALE, Calif., July 21, 2008

Yahoo! Inc. (Nasdaq:YHOO), a leading global Internet company, announced today that it has reached an agreement with Carl Icahn to settle their pending proxy contest related to the Company’s 2008 annual meeting of stockholders.

Under the terms of the settlement agreement, eight members of Yahoo!’s current Board of Directors will stand for re-election at the 2008 annual meeting: Roy Bostock, Ronald Burkle, Eric Hippeau, Vyomesh Joshi, Arthur Kern, Mary Agnes Wilderotter, Gary Wilson and Jerry Yang. In view of the settlement agreement with Mr. Icahn, and the termination of the proxy contest, Robert Kotick has decided not to stand for re-election to the Board at the 2008 annual meeting.

Following the 2008 annual meeting, the Yahoo! Board will be expanded to 11 members. Carl Icahn will be appointed to the Board and the remaining two seats will be filled by the Board upon the recommendation of the Board’s Nominating and Governance Committee from a list of nine candidates recommended by Mr. Icahn, which includes the eight remaining members of the Icahn slate of nominees and Jonathan Miller, currently a partner in Velocity Interactive Group and former Chairman and CEO of AOL.

As part of the settlement agreement, Mr. Icahn, who owns an aggregate of 68,786,320 shares, or 4.98% of Yahoo! common stock, has agreed to withdraw his nominees for consideration at the annual meeting and to vote his Yahoo! shares in support of the Board’s nominees.

“We are gratified to have reached this agreement, which serves the best interests of all Yahoo! stockholders,” said Yahoo! Chairman Roy Bostock. “We look forward to working productively with Carl and the new members of the Board on continuing to improve the Company’s performance and enhancing stockholder value. Yahoo! is a world-class company with an extremely bright future, and collaborating together, I believe we can help the Company achieve its ambitious goals.”

“This agreement will not only allow Yahoo! to put the distraction of the proxy contest behind us, it will allow the Company to continue pursuing its strategy of being the starting point for Internet users and a must buy for advertisers,” said Yahoo! Co-founder and Chief Executive Officer Jerry Yang. “No other company in the Internet space has our unique combination of global brand, talented employees, innovative technologies and exceptional assets, attributes that will help us take advantage of the large and growing opportunity ahead of us. I look forward to working together with our new colleagues on the Board to make that happen.”

Mr. Icahn said, “I am very pleased that this settlement will allow me to work in partnership with Yahoo!’s Board and management team to help the Company achieve its full potential. While I continue to believe that the sale of the whole Company or the sale of its Search business in the right transaction must be given full consideration, I share the view that Yahoo!’s valuable collection of assets positions it well to continue expanding its online leadership and enhancing returns to stockholders. I believe this is a good outcome and that we will have a strong working relationship going forward. Additionally, I am happy that the board has agreed in the settlement agreement that any meaningful transaction, including the strategy in dealing with that transaction, will be fully discussed with the entire board before any final decision is made.”

In response to Mr. Kotick’s decision to step down from the Board, Mr. Bostock said, “I would like to personally thank Bobby for his dedicated service to Yahoo! these past 5 years. Bobby has been a valuable resource to our Board and the Company and we are grateful for his contributions. He wanted to help see the Company through this recent chapter, but made it clear to me that once the proxy contest was resolved, he was eager to focus his efforts on his work as CEO of the newly merged Activision Blizzard and his other business and civic pursuits.”

The Company intends to file the full text of the settlement agreement later today with the Securities and Exchange Commission, and will also file and mail to its stockholders, supplemental proxy material.

Forward-Looking Statements

This press release (including without limitation the statements and information in the quotations in this press release) contains forward-looking statements that involve risks and uncertainties concerning Yahoo!’s strategic and operational plans. Actual results may differ materially from those described in this release due to a number of risks and uncertainties. The potential risks and uncertainties include, among others, the expected benefits of the commercial agreement with Google may not be realized, including as a result of actions taken by United States or foreign regulatory authorities and the response or acceptance of the agreement by publishers, advertisers, users and employees; the implementation and results of Yahoo!’s ongoing strategic initiatives; the impact of organizational changes; Yahoo!’s ability to compete with new or existing competitors; reduction in spending by, or loss of, marketing services customers; the demand by customers for Yahoo!’s premium services; acceptance by users of new products and services; risks related to joint ventures and the integration of acquisitions; risks related to Yahoo!’s international operations; failure to manage growth and diversification; adverse results in litigation, including intellectual property infringement claims; Yahoo!’s ability to protect its intellectual property and the value of its brands; dependence on key personnel; dependence on third parties for technology, services, content and distribution; general economic conditions and changes in economic conditions; potential continuing uncertainty arising in connection with Microsoft’s various proposals to acquire all or part of Yahoo!; the possibility that Microsoft or another person may in the future make other proposals, or take other actions which may create uncertainty for our employees, publishers, advertisers and other business partners; and the possibility of significant costs of defense, indemnification and liability resulting from stockholder litigation relating to such proposals.

More information about potential factors that could affect Yahoo!’s business and financial results is included under the captions “Risk Factors” and “Management’s Discussion and Analysis of Financial Condition and Results of Operations” in Yahoo!’s Annual Report on Form 10-K for the fiscal year ended December 31, 2007, as amended, and the Quarterly Report on Form 10-Q for the quarter ended March 31, 2008, which are on file with the Securities and Exchange Commission (“SEC”) and available at the SEC’s website at www.sec.gov. All information in this release is as of July 21, 2008, unless otherwise noted, and Yahoo! does not intend, and undertakes no duty, to update or otherwise revise the information contained in this letter.

About Yahoo! Inc.

Yahoo! Inc. is a leading global Internet brand and one of the most trafficked Internet destinations worldwide. Yahoo! is focused on powering its communities of users, advertisers, publishers, and developers by creating indispensable experiences built on trust. Yahoo! is headquartered in Sunnyvale, California.
 

Sources:
http://www.smartbrief.com/news/aaaa/industryBW-detail.jsp?id=E5ABFF78-3F84-4507-B1B2-55696D8C85EB
http://www.scribd.com/doc/4017668/Yahoo-Announces-Settlement-with-Carl-Icahn
http://www.forbes.com/businesswire/feeds/businesswire/2008/07/21/businesswire20080721005563r1.html

My Interview with Yahoo-Inc! (Part 1)

Hello Friends,

Last month I was interviewed by Yahoo-Inc! for a job as a software engineer. Here I would like to share the discussions and the various question/answer sessions I went through before I got my confirmation letter 🙂 Perhaps someone looking to join Yahoo can pick up some tips from here.

In total I went through 7 rounds of interviews (1 telephonic, 1 HR, 1 with the manager, 3 technical, 1 general aptitude). I will try my best to recall and summarize all of them for you. But before I go ahead, I would like to thank my good friend Rajib Das for forwarding my resume at Yahoo and giving me an opportunity to have a crack at it. Here I go:

  • Telephonic Interview: As happens with all multinational firms, you go through a telephonic screening before you are actually called for a face-to-face interview. One good thing that happened for me even before the interview started was that the interviewer had already gone through my CV thoroughly and had also checked my website, Altertunes. You can be unlucky at times, if the interviewer comes in with an open mind and tries to test you in areas where you lack deep knowledge.

    My telephonic interview started early in the morning, around 10 o’clock. It started with general “about me” questions. From experience I have learnt that these “about me” sessions decide, to a certain extent, where your interview will head. Hence, I always try to draw the interviewer’s attention to my strong areas, i.e. audio processing and web development.

    Next, luckily for me, he started discussing Altertunes and the various technologies used in developing it. He specifically asked me questions like:

    1. How have you built the spell checker algorithm for Altertunes?
    2. From where are you playing all the mp3 files? Are they on your server?
    3. If you had to, how would you implement Orkut’s people connection algorithm? That is, if User A visits User C’s profile, where User C is not a direct connection of User A, Orkut shows a possible shortest connection, something like User A -> User B -> User C.
    4. What considerations have you taken while building the auto-suggest feature for Altertunes?

    In my opinion, though, a single concrete answer does not exist for any of the above questions; there can be many possible solutions to each of them. Here are my answers to his questions.

    1. I have used the spell checking algorithm described by Peter Norvig, the director of research at Google, in his post here. Many say the 20 lines of code he wrote in Python are pure genius, and it is truly one of the shortest and most efficient pieces of code one can ever see. For Altertunes, however, I have used a tweaked version of it coded in PHP. The 20 lines of Python took me 74 lines in PHP, but the results are just excellent for Altertunes. My spell checker for Altertunes gives over 90% accuracy for single-word artists and over 80% accuracy for multiple-word artists.
      Another simple but less accurate approach, which I tried earlier, uses PHP’s built-in soundex() function. SOUNDEX is a phonetic algorithm for indexing names by sound, as pronounced in English. Read more about SOUNDEX here. However, the spell checker using SOUNDEX is not as accurate as the algorithm described by Peter Norvig.

    2. The answer to this question lies in one word: crawlers. I wrote crawlers in Perl which crawl the web for mp3 files and index them in the Altertunes database. However, this answer didn’t satisfy the interviewer and he asked me, ‘Can you explain the architecture you follow for developing crawlers?’ Probably he wanted to hear the word ‘resolver’ from me, because as soon as I started telling him about resolvers he stopped me midway and proceeded to the next question. For more hints on how I have used PHP to develop web crawlers for Altertunes, read here.
    3. The answer to the third question was probably hidden in the question itself. A shortest path is what we want, hence we will obviously use one of the shortest path algorithms. As I didn’t have much knowledge of shortest path algorithms then, I suggested Dijkstra’s algorithm as the most appropriate for this task. However, when I later discussed this problem with one of my friends, he told me that Breadth First Search (BFS) would probably be best suited for this problem. That makes sense: every connection counts as one hop, so BFS reaches the target over the fewest hops.
    4. Here he actually asked me, ‘How have you developed the auto-suggest feature for Altertunes?’, to which I replied in one word: AJAX. However, he probably wanted something more technical on this topic, and hence the next question he fired was, ‘What considerations have you taken while building the auto-suggest feature for Altertunes?’ I still don’t really know what exactly he was looking for. My reply to him was, ‘I am sending each and every word typed in the input box to the server and getting a response to every request. However, before sending the next request I wait for the previous response to come, as I want my users to experience an incremental search.’
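    For readers who have not seen it, the idea behind answer 1 can be sketched in Python, the language Norvig’s post uses. This is only a minimal sketch of that approach, not his exact code and not the tweaked PHP version used on Altertunes: generate every string one or two edits away from the input and pick the known candidate with the highest count. The ARTIST_COUNTS corpus and its counts are made up purely for illustration.

```python
import re
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def words(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def edits1(word):
    """Every string one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def known(candidates, counts):
    """Keep only the candidates that appear in the corpus."""
    return {w for w in candidates if w in counts}


def correct(word, counts):
    """Most frequent known word within two edits; the word itself if none is known."""
    word = word.lower()
    candidates = (known([word], counts)
                  or known(edits1(word), counts)
                  or known((e2 for e1 in edits1(word) for e2 in edits1(e1)), counts)
                  or {word})
    return max(candidates, key=lambda w: counts.get(w, 0))


# Hypothetical corpus: artist names repeated according to an invented popularity.
ARTIST_COUNTS = Counter(words("metallica metallica metallica megadeth megadeth nirvana"))

if __name__ == "__main__":
    print(correct("metalica", ARTIST_COUNTS))
```

    A real corpus would be the artist table itself. Multiple-word artist names would need extra handling that this sketch does not attempt.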

    I thought this was all he would ask me in the telephonic interview, but ailaaa! He started firing questions on general algorithms and aptitude. A few of the questions he asked me were:

    • Given a tree, how will you create a mirror image of the tree? What algorithm will you use? What traversal will you prefer? 🙂
    • There is a linked list which contains a closed loop. How will you detect that closed loop in the linked list? 🙂
    • Given a few numbers, how will you push them onto a stack such that whenever I pop the topmost element from that stack, I get the minimum number of the bunch? He also wanted me to give the push and pop algorithms such that both insertion and popping are O(1). 🙁
    • Given two logs of wood which burn completely in 1 hour each, how will you measure 45 minutes?
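    The stack question is the one that troubled me most. A common way to read it is the classic “stack with an O(1) minimum” problem. That reading is an assumption on my part, and the class and method names below are made up for illustration. A minimal Python sketch stores, next to every value, the minimum of everything at or below it, so push, pop and the minimum lookup all take constant time:

```python
class MinStack:
    """Stack that can also report its current minimum in O(1)."""

    def __init__(self):
        # Each entry is (value, minimum of the stack up to and including this entry).
        self._items = []

    def push(self, value):
        current_min = value if not self._items else min(value, self._items[-1][1])
        self._items.append((value, current_min))

    def pop(self):
        """Remove and return the topmost value."""
        value, _ = self._items.pop()
        return value

    def minimum(self):
        """Smallest value currently on the stack."""
        return self._items[-1][1]


if __name__ == "__main__":
    stack = MinStack()
    for number in (5, 3, 7, 3, 8):
        stack.push(number)
    print(stack.minimum())
```

    The trade-off is one extra stored number per element in exchange for never having to rescan the stack.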

    As far as I can recall, this was all I was asked in my telephonic interview. Within 10 minutes of the interview I received a call from HR, asking me to come for a face-to-face interview, which I scheduled for a week later. As I had already got a feel for what would be coming in the face-to-face interview, I didn’t want to go underprepared.
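    Going back to the linked-list loop question from the telephonic round: the usual textbook answer is Floyd’s “tortoise and hare” technique, in which one pointer moves one node at a time, another moves two, and the two can only meet if the list loops back on itself. The Python below is a minimal sketch of that technique. The Node class and the build_list helper are hypothetical scaffolding for the demo, not something that came up in the interview.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def has_loop(head):
    """Return True if following .next from head ever revisits a node."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next          # one step
        fast = fast.next.next     # two steps
        if slow is fast:          # the fast pointer caught up: there is a loop
            return True
    return False                  # the fast pointer fell off the end: no loop


def build_list(values, loop_to=None):
    """Demo helper: build a list; if loop_to is an index, link the tail back to it."""
    nodes = [Node(v) for v in values]
    for current, following in zip(nodes, nodes[1:]):
        current.next = following
    if loop_to is not None and nodes:
        nodes[-1].next = nodes[loop_to]
    return nodes[0] if nodes else None


if __name__ == "__main__":
    print(has_loop(build_list([1, 2, 3, 4, 5], loop_to=2)))
    print(has_loop(build_list([1, 2, 3])))
```

    It needs no extra memory, which is usually the follow-up an interviewer asks about.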

  • Face-to-Face Round 1: Before going for the in-person interview, I tried my best to familiarize myself with the mother of all algorithm books, by Cormen. You can download an e-copy of the same from here; the size is about 10 MB. Alternatively, you can search for -inurl:htm -inurl:html intitle:”index of” +(“/ebooks”|”/book”) +(chm|pdf|zip) +”cormen” in Google and get loads of sources for the same. You may change the search string to find other books in different formats. Anyway, coming back to the interview.

    Just like my telephonic interview, the 1st round started at 10 o’clock sharp. The interviewer seemed like a mixed bag to me. He asked me questions ranging over C++, Perl, PHP, Java, the client-server model, probability and even general aptitude. Here are a few of the questions I can recall being asked:

    1. Given an HTML page, how will you represent the same in XML format?
    2. What are the advantages of the Hidden Markov Model over other pattern recognition algorithms?
    3. Yahoo is organizing an online programming competition and I am the head. There will be over a few million programmers participating in this contest. I have a set of 10 programming questions, which I give to each programmer one by one. As a programmer submits a solution I check it, and if it is found to be perfect I give him the next question. Whoever completes the 10 programming questions first is the winner. You have to award the top 10 programmers. How will you go about developing such a system? What precautions will you take while developing it? How many CPUs will be required for such a system? How will you protect the system from going down in case a programmer tries to submit a virus as a solution? etc.
    4. What’s the probability that my birthday and yours fall on the same date and the same day of the year? (There were about 3-4 probability questions that he asked me.)
    5. Draw and explain the whole client-server model.
    6. Explain in detail everything that happens from the moment one submits a URL in the browser till the user sees the requested page in the browser.
    7. Write algorithms for Breadth First Search (BFS) and Depth First Search (DFS). Which one is preferred over the other, and why?

    Then, as in the telephonic interview, he fired almost the same set of questions about Altertunes. There were probably a few more which I can’t recall at this point, possibly the ones I wasn’t able to answer. 😉 And that was it.
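    For question 7, a minimal Python sketch of both traversals may help. It is an illustration, not the answer I gave in the interview. The FRIENDS graph is made-up data that also echoes the Orkut question from the telephonic round: because every connection counts as one hop, BFS finds a chain with the fewest hops, while DFS simply goes as deep as it can first.

```python
from collections import deque


def bfs_shortest_path(graph, start, goal):
    """Return one path with the fewest hops from start to goal, or None."""
    if start == goal:
        return [start]
    parents = {start: None}            # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                if neighbour == goal:
                    # Walk the parent links back to the start.
                    path = [goal]
                    while parents[path[-1]] is not None:
                        path.append(parents[path[-1]])
                    return path[::-1]
                queue.append(neighbour)
    return None


def dfs_order(graph, start):
    """Return the nodes in the order an iterative depth-first search visits them."""
    visited, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push in reverse so neighbours are explored in their listed order.
        stack.extend(reversed(graph.get(node, [])))
    return order


# Hypothetical friendship graph (adjacency lists).
FRIENDS = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "E"],
    "D": ["A", "E"],
    "E": ["C", "D"],
}

if __name__ == "__main__":
    print(" -> ".join(bfs_shortest_path(FRIENDS, "A", "C")))
    print(dfs_order(FRIENDS, "A"))
```

    Neither is preferred in general: BFS fits “fewest hops” questions but keeps a whole frontier in memory, while DFS uses little memory and suits exhaustive exploration.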

  • Face-to-Face Round 2: This was a complete session of aptitude-level questions, which just kept coming one after another. These are questions over which you can probably spend the whole day and still say ‘This can be done the other way round too’. Here are a few that I remember as of now:
    1. You have to weigh all the weights between 1 and 100. What is the minimum number of weights required to do so? (Probably the answer I gave was weights which are powers of 2, i.e. 1, 2, 4, 8, 16, 32 and 64.)
    2. You have to weigh all the weights between 1 and 100. What is the minimum number of weights required to do so if you can use both pans to measure a particular weight? For example, if we need to weigh 2 kg, we can put a 3 kg weight on one side and a 1 kg weight plus the 2 kg object on the other side. (Probably the answer I gave was weights which are powers of 3, i.e. 1, 3, 9, 27…)
    3. You have a few eggs and there is a 100-storey building. Determine the minimum number of attempts required to find the floor from which an egg will break if we drop it. (Always consider the worst case, in which the egg breaks only at the top floor, and for minimum attempts proceed from the lower floors to the top floor in non-uniform steps. I was probably able to do this in 14 tries. Check if you can do the same :P)
    4. There is a huge string containing letters, and in the string there are lots and lots of palindromes. You need to find the longest palindrome. How will you do that? (Note: When I reached a solution using stacks and queues, to complicate things he told me that a palindrome can have another palindrome within it, e.g. PALINDROMEDALADXYXDALADEMORDNILAP. Here the string has DALAD as a palindrome hidden inside the full string, which is itself a bigger palindrome.)
    5. There is a linked list which has a lot of circular loops in it. How will you find the longest loop in the linked list?
    6. You have a very, very large text file containing words. Assume you can’t read it fully into memory at once. How will you go about finding a word in the text file?
    7. You have a very, very large text file containing numbers. Assume you can’t read it fully into memory at once. How will you go about arranging the numbers in ascending or descending order?
    8. A few questions on probability were discussed.
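    If you want to check the 14 tries from the egg question, here is a minimal Python sketch. The question only said “a few eggs”. The sketch assumes exactly two eggs, the classic version of the puzzle, because that is the case in which 14 comes out. The function names are mine. The idea is to ask how many floors can be fully resolved with a given number of eggs and drops, and to add drops until 100 floors are covered.

```python
def min_drops(eggs, floors):
    """Fewest drops that always pin down the critical floor, in the worst case."""
    # reach[e] = how many floors can be resolved with e eggs and the drops counted so far.
    reach = [0] * (eggs + 1)
    drops = 0
    while reach[eggs] < floors:
        drops += 1
        # One more drop: the floors below (egg breaks, e - 1 eggs left),
        # the floors above (egg survives, e eggs left), plus the floor just tested.
        # Going from high e to low keeps reach[e - 1] at its previous-drop value.
        for e in range(eggs, 0, -1):
            reach[e] = reach[e] + reach[e - 1] + 1
    return drops


def two_egg_first_drops(floors):
    """Floors to try with the first egg: the step shrinks by one after each drop."""
    step = min_drops(2, floors)
    plan, floor = [], 0
    while floor < floors and step > 0:
        floor = min(floor + step, floors)
        plan.append(floor)
        step -= 1
    return plan


if __name__ == "__main__":
    print(min_drops(2, 100))
    print(two_egg_first_drops(100))
```

    With two eggs, the first egg is dropped from floors 14, 27, 39 and so on. Once it breaks, the second egg walks up the gap one floor at a time, so the total never exceeds 14.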

    There were a lot more that he asked me, easily 7-8 more questions, I guess. This was the most interesting round of the whole interview.

That’s probably enough for the first post. I will continue to write about my other rounds in upcoming posts. Till then, if you have any questions or comments, feel free to post them here.
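    Coming back to the two weighing questions of this round, the answers can be checked with a short Python sketch. The function names are mine and this is only an illustration. With one pan, n weights give at most 2^n - 1 different totals, so powers of 2 are the smallest set. With both pans, every weight has three states (with the object, opposite the object, or unused), which is why powers of 3 are enough.

```python
def one_pan_weights(limit):
    """Powers of 2 needed to weigh every whole amount from 1 to limit on one pan."""
    weights, w = [], 1
    while sum(weights) < limit:
        weights.append(w)
        w *= 2
    return weights


def two_pan_weights(limit):
    """Powers of 3 needed when weights may go on either pan."""
    weights, w = [], 1
    while sum(weights) < limit:
        weights.append(w)
        w *= 3
    return weights


def two_pan_placement(target):
    """Return (same_pan, opposite_pan): weights beside the object and opposite it."""
    same_pan, opposite_pan, power = [], [], 1
    while target > 0:
        digit = target % 3
        if digit == 1:                 # this power goes opposite the object
            opposite_pan.append(power)
            target -= 1
        elif digit == 2:               # this power goes beside the object
            same_pan.append(power)
            target += 1
        target //= 3
        power *= 3
    return same_pan, opposite_pan


if __name__ == "__main__":
    print(one_pan_weights(100))
    print(two_pan_weights(100))
    print(two_pan_placement(2))
```

    For the 2 kg example from question 2, the placement puts the 1 kg weight beside the object and the 3 kg weight opposite it.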

Google, Yahoo, Microsoft toolkit for startups

Welcome,

Well there is some good news for young entrepreneurs out there.

  • Google released Google Apps, which can be used for your website’s email and hosting needs. You can use all the Google applications (Gmail, Gtalk, Google Docs, Calendar, Page Creator etc.) for your site, all for free. This really helps you focus on the functionality and features of your website rather than spending time maintaining such routine things. (Altertunes currently uses Google Apps without any problems.)
  • In competition, Microsoft too has released Startup Center and Microsoft Office Live Basics, which include business plans for startups, domain names, web space and hosting with metrics, accounting software, tools etc.
  • And finally, Yahoo India too has started a similar offering for small businesses called Yahoo Small Business. However, this comes at a nominal price of Rs 999/- per year (which is acceptable considering they give you domain name registration with it). The package includes a domain name, 200 free business email IDs, Site Solutions to design your website, 5 GB disk space, 200 GB data transfer and some free ad slots.

Not to forget that Google released Google App Engine a few weeks ago, which allows you to run your application on Google’s infrastructure.

Surely the competition in the market is benefiting small entrepreneurs.

Take full advantage of this, and all the best for your startup.

How to write crawlers and parse a page using Perl (Part 1)

Hello all perl freaks,

One of the most powerful things we can achieve using Perl is extracting any content you want from a website. For example, you can use Perl to extract information about all the artists from All Music, or information about all cricket players and matches from CricInfo. In the past I have used Perl for making web crawlers for Altertunes, and most recently I used Perl to extract news from Google News.

Here I will try to explain how efficiently you can extract information by parsing HTML pages using Perl.

To start with, let’s revise some basics of Perl.

Let’s first see how we can get the HTML content of a website:

Example 1

use strict;
use warnings;
require LWP::UserAgent;

#~ Call the gethtmlpage function by passing the url we want to save
gethtmlpage("http://abhinavsingh.com");

sub gethtmlpage {
  my ($url) = @_;
  my $ua = LWP::UserAgent->new;
  #~ Uncomment the line below for a proxied net connection
  #~ $ua->proxy('http','http://[PROXY_URL]:[PROXY_PORT]/');
  #~ Fetching a page is an HTTP GET request
  my $response = $ua->get($url);

  if ($response->is_success) {
    #~ Save the page body to a file in the current folder
    open(my $fh, '>', 'abhinavsingh.com.html') or die "Cannot open output file: $!";
    print $fh $response->content;
    close($fh);
  }
  else {
    print "Error in getting HTML page: ".$response->status_line."\n";
  }
}

If you are using PXPerl on Windows, copy-paste the above code into the SciTE Perl editor (which comes packaged with PXPerl) and simply press Ctrl+F7. This will produce an HTML file named ‘abhinavsingh.com.html’ in your folder.

The most important feature which makes Perl and Python the default choice for web crawlers is their regular expression matching ability. Let’s look at some of the regular expressions we will be using for parsing an HTML page.

Example 2

$sentence = "This is a perl tutorial by Abhinav Singh at http://abhinavsingh.com";

#~ Matching $sentence for 'Abhinav Singh'
$sentence =~ m/Abhinav Singh/i;

print "Pre-Match: ".$`."\n";
print "Match: ".$&."\n";
print "Post-Match: ".$'."\n";

Copy the above code into the SciTE Perl editor and press Ctrl+F7. You should see a result similar to this:

Output 2

>perl example2.pl
Pre-Match: This is a perl tutorial by
Match: Abhinav Singh
Post-Match:  at http://abhinavsingh.com
>Exit code: 0    Time: 0.962

Now let’s see how we can extract relevant information from a page. Suppose we are interested in extracting all information about the artist Metallica from the AllMusic website. Below I will first show you my code for the same and then its result. Finally I will discuss how I made all those regular expressions:

Example 3

require LWP::UserAgent;

$bandname = "Metallica";
getartistinfo($bandname);

sub getartistinfo {
  my %formdata;
  my $ua = LWP::UserAgent->new;
  #~ $ua->proxy('http','http://[PROXY_URL]:[PROXY_PORT]/');

  $formdata{'sql'}=$_[0];
  $formdata{'opt1'}=1;
  $formdata{'P'}='amg';

  print "Sending HTTP request for ".$_[0]."...\n";
  my $response = $ua->post('http://www.allmusic.com/cg/amg.dll',\%formdata);

  if ($response->is_success) {
    print "Got HTTP response... parsing output for ".$_[0]."...\n\n";
    $output=$response->content;

    # Extracting Overview, Biography, Discography, Songs, Credit, Charts & Awards link for the artist
    $output =~ m/cg\/amg\.dll\?p=amg&searchlink=(.*)">/;
    $BaseLink = "http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=";
    $OverviewLink = $1;
    $DiscographyMainAlbumLink = $BaseLink.$OverviewLink;
    $DiscographyMainAlbumLink =~ s/T0/T20/;
    print "Discography Main Album: ".$DiscographyMainAlbumLink."\n";
    $DiscographySinglesEPLink = $BaseLink.$OverviewLink;
    $DiscographySinglesEPLink =~ s/T0/T22/;
    print "Discography Singles&EP: ".$DiscographySinglesEPLink."\n";
    $DiscographyDvDVideosLink = $BaseLink.$OverviewLink;
    $DiscographyDvDVideosLink =~ s/T0/T23/;
    print "Discography DVD Videos: ".$DiscographyDvDVideosLink."\n";
    $DiscographyAllSongsLink = $BaseLink.$OverviewLink;
    $DiscographyAllSongsLink =~ s/T0/T31/;
    print "Songs All Songs: ".$DiscographyAllSongsLink."\n";
    $DiscographyCnAAlbumsLink = $BaseLink.$OverviewLink;
    $DiscographyCnAAlbumsLink =~ s/T0/T50/;
    print "Charts & Awards Billboard Albums: ".$DiscographyCnAAlbumsLink."\n";
    $DiscographyCnASinglesLink = $BaseLink.$OverviewLink;
    $DiscographyCnASinglesLink =~ s/T0/T51/;
    print "Charts & Awards Billboard Singles: ".$DiscographyCnASinglesLink."\n";
    $DiscographyGrammyLink = $BaseLink.$OverviewLink;
    $DiscographyGrammyLink =~ s/T0/T52/;
    print "Charts & Awards Grammy Awards: ".$DiscographyGrammyLink."\n\n";

    # Extracting Title Bar
    $output =~ m/<td class="titlebar"><span class="title">(.*)<\/span><br \/>/;
    $titlebar = $1;
    print "Titlebar:\n".$titlebar."\n\n";
    $output = $';

    # Extracting Formed-Sub
    $output =~ m/Begin Formed(.*)<span>(.*)End Formed/;
    $output = $';
    $formedsub = $2;
    $formedsub =~ m/<a href=(.*)>(.*)<\/a>(.*)<a href=(.*)>(.*?)<\/a>/; # Parse $formedsub for exact string
    print "Formed: ".$2.$3.$5."\n\n";

    # Extracting timelinesubactive
    while($output =~ m/class="timeline-sub-active">(\d+)<\/div>/) {
      print "ActiveYear:".$1."\n";
      $output = $';
    }
    print "\n";

    # Extract Genre, Style titles
    $output =~ m/id="left-sidebar-title-small"(.*?)<\/tr>/;
    $suboutput = $&;
    $output = $';
    while($suboutput =~ m/id="left-sidebar-title-small"><span>(.*?)<\/span>/) {
      #~ print "Subclasses:".$1."\n";
      push(@GSM,$1);
      $suboutput = $';
    }
    #~ print "\n";

    # Extract Genre contents
    $output =~ m/<td class="list-cell"(.*?)<\/td>/;
    $suboutput = $&;
    $output = $';
    while($suboutput =~ m/<li>(.*?)<\/li>/) {
      #~ print "Genres:".$1."\n";
      $suboutput = $';
      $1 =~ m/<a href=(.*)>(.*)<\/a>/;
      push(@G,$2);
    }
    #~ print "\n";

    # Extract Style contents
    $output =~ m/<td class="list-cell"(.*?)<\/td>/;
    $suboutput = $&;
    $output = $';
    while($suboutput =~ m/<li>(.*?)<\/li>/) {
      #~ print "Styles:".$1."\n";
      $suboutput = $';
      $1 =~ m/<a href=(.*)>(.*)<\/a>/;
      push(@S,$2);
    }
    #~ print "\n";

    # Extract Mood subclass
    $output =~ m/id="left-sidebar-title-small"><span>(.*?)<\/span>/;
    $output = $';
    #~ print "Subclasses:".$1."\n\n";
    push(@GSM,$1);

    # Extract Mood Contents
    $output =~ m/id="left-sidebar-list"(.*?)<\/div>/;
    $suboutput = $&;
    $output = $';
    while($suboutput =~ m/<li>(.*?)<\/li>/) {
      #~ print "Moods:".$1."\n";
      $suboutput = $';
      $1 =~ m/<a href=(.*)>(.*)<\/a>/;
      push(@M,$2);
    }
    print "\n";

    # Print the @GSM and @G,@S,@M content
    print $GSM[0].":";
    foreach $gen (@G) {
      print $gen."\t";
    }
    print "\n\n".$GSM[1].":";
    foreach $gen (@S) {
      print $gen."\t";
    }
    print "\n\n".$GSM[2].":";
    foreach $gen (@M) {
      print $gen."\t";
    }
    print "\n\n";

    # Extract AMG Artist ID
    $output =~ m/<td class="sub-text"(.*?)<\/pre>/;
    $output = $';
    $1 =~ m/<pre>(.*)/;
    print "AMG Artist ID:".$1."\n\n";

    # Extracting Artist Mini Bio
    $output =~ m/id="artistminibio"><p>(.*)<\/p>/;
    $artistminibio = $1;
    $artistminibio =~ s/<a href(.*?)>//g; # Filtering out any link or html tags
    $artistminibio =~ s/<\/a>//g;
    $artistminibio =~ s/<i>//g;
    $artistminibio =~ s/<\/i>//g;
    print "ArtistMiniBio:\n".$artistminibio."\n\n";

    # Extracting Other Entries, Group Members, Similar Artists, Influenced By and Follower
    $output =~ m/id="large-list"><tr>(.*?)<\/table>/;
    $suboutput = $&;
    $output = $';
    # Extracting two part of the table
    $suboutput =~ m/<td valign="top" width="266px">(.*)<\/td><td/;
    $lefthalftemp = $1;
    $righthalftemp = $';

    while($lefthalftemp =~ m/<div class="large-list-subtitle">(.*?)<\/div>/) {
      print $1.":\n";
      $' =~ m/<ul>(.*?)<\/ul>/;
      $lefthalftemp = $';
      $li = $1;
      while($li =~ m/<li>(.*?)<\/li>/) {
        $li = $';
        $1 =~ m/<span class="libg"><a href=(.*)>(.*)<\/a><\/span>/i;
        print $2."\n";
      }
      print "\n\n";
    }

    while($righthalftemp =~ m/<div class="large-list-subtitle">(.*?)<\/div>/) {
      print $1.":\n";
      $' =~ m/<ul>(.*?)<\/ul>/;
      $righthalftemp = $';
      $li = $1;
      while($li =~ m/<li>(.*?)<\/li>/) {
        $li = $';
        $1 =~ m/<span class="libg"><a href=(.*)>(.*)<\/a><\/span>/i;
        print $2."\n";
      }
      print "\n\n";
    }
  }
}

Copy the above code into the SciTE Perl editor and press Ctrl+F7. You should see output like the one below, which contains all the extracted data about the artist Metallica.

Output 3

>perl example4.pl
Sending HTTP request for Metallica...
Got HTTP response... parsing output for Metallica...

Discography Main Album: http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=METALLICA&sql=11:kifpxqe5ldte~T20
Discography Singles&EP: http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=METALLICA&sql=11:kifpxqe5ldte~T22
Discography DVD Videos: http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=METALLICA&sql=11:kifpxqe5ldte~T23
Songs All Songs: http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=METALLICA&sql=11:kifpxqe5ldte~T31
Charts & Awards Billboard Albums: http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=METALLICA&sql=11:kifpxqe5ldte~T50
Charts & Awards Billboard Singles: http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=METALLICA&sql=11:kifpxqe5ldte~T51
Charts & Awards Grammy Awards: http://www.allmusic.com/cg/amg.dll?p=amg&searchlink=METALLICA&sql=11:kifpxqe5ldte~T52

Titlebar:
Metallica

Formed: 1981 in Los Angeles, CA

ActiveYear:80
ActiveYear:90
ActiveYear:2000

Genre:Rock

Styles:Thrash	Heavy Metal	Speed Metal	Hard Rock

Moods:Bitter	Suffocating	Fierce	Angry	Aggressive	Menacing
Gritty	Tense/Anxious	Hostile	Crunchy	Epic	Nihilistic	Fiery
Intense	Dramatic	Harsh	Ominous	Rebellious	Uncompromising
Searching	Gloomy

AMG Artist ID:P     4906

ArtistMiniBio:
Metallica was easily the best, most influential heavy metal band of the '80s,
responsible for bringing the music back to Earth.
Instead of playing the usual rock star games of metal stars of the early '80s,
the band looked and talked like they were from the street.
Metallica expanded the limits of thrash, using speed and volume not for their own sake,
but to enhance their intricately structured compositions.
The release of 1983's Kill 'Em All marked the beginning of the legitimization
of heavy metal's underground, bringing new complexity and depth to thrash metal.
With each album, the band's playing and writing improved;
James Hetfield developed a signature rhythm playing that matched his growl,
while lead guitarist Kirk Hammett... Read More...

Other Entries:
Movie Entry
Classical Music Entry

Group Members:
Kirk Hammett
James Hetfield
Dave Mustaine
Jason Newsted
Lars Ulrich
Cliff Burton
Robert Trujillo
Ron McGovney

Similar Artists:
Slayer
Anthrax
Sepultura
Machine Head
Coroner
Death
Dio
Danzig
King Diamond
Mercyful Fate
Metal Church
Overkill
Voivod
Death Angel
Queensrÿche
Cancer
Corrosion of Conformity
White Zombie
Rollins Band
Melvins
Soundgarden

See Also:
Megadeth
Flotsam & Jetsam
Exodus
Rock Star Supernova

Influenced By:
Motörhead
The Misfits
Diamond Head
Black Sabbath
Judas Priest
Angel Witch
Iron Maiden
Saxon
Accept
Budgie
Deep Purple
Rush
AC/DC
Led Zeppelin
G.B.H.
Fear
Ted Nugent
Lynyrd Skynyrd
UFO
Thin Lizzy
Queen

Followers:
Carcass
Grindcrusher
At War
Crowbar
The Beyond
Sevendust
Boy Hits Car
Queens of the Stone Age
Roachpowder
Ossiris
Avenged Sevenfold
Trapt
Hurt
Scenes from a Movie
Sick City
Saving Abel

Performed Songs By:
James Hetfield
Lars Ulrich
Kirk Hammett
Cliff Burton
Bob Rock
Dave Mustaine
Brian Tatler
Sean Harris
Roger Taylor
"Fast" Eddie Clarke
Glenn Danzig
Jason Newsted
John Deacon
Brian May
Freddie Mercury
Lemmy
Lemmy Kilmister
Phil "Philthy Animal" Taylor
Burke Shelley

>Exit code: 0    Time: 5.940

Thus, running the above script gives you all the information about the artist Metallica from All Music’s Metallica page. For demonstration purposes I have extracted information only from Metallica’s main page; however, you can write similar code to extract information from Metallica’s other sub-pages on All Music.
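
If you do write similar code for other pages, the two nearly identical loops for the left and right halves of the table are a natural candidate for a helper. The following is only a sketch under assumptions: extract_lists is a hypothetical name that does not appear in the script above, the sample markup is made up to resemble the page, and the helper walks the text with the /g modifier instead of reassigning $' as the script does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): collect every
# "large-list-subtitle" section in a chunk of HTML into a hash of
# subtitle => [item names].
sub extract_lists {
    my ($html) = @_;
    my %lists;
    # With /g, each pass of the loop resumes after the previous match.
    while ($html =~ m{<div class="large-list-subtitle">(.*?)</div>.*?<ul>(.*?)</ul>}gs) {
        my ($title, $ul) = ($1, $2);
        my @items;
        while ($ul =~ m{<li>(.*?)</li>}gs) {
            my $li = $1;
            # Keep only the link text inside each list item.
            push @items, $1 if $li =~ m{<a href=[^>]*>(.*?)</a>}is;
        }
        $lists{$title} = \@items;
    }
    return \%lists;
}

# Made-up sample markup, shaped like the table cells parsed above.
my $sample = '<div class="large-list-subtitle">Group Members</div>'
           . '<ul><li><span class="libg"><a href="/x">Kirk Hammett</a></span></li>'
           . '<li><span class="libg"><a href="/y">James Hetfield</a></span></li></ul>';

my $lists = extract_lists($sample);
print "$_:\n", join("\n", @{ $lists->{$_} }), "\n\n" for sort keys %$lists;
```

Calling the same helper on $lefthalftemp and on $righthalftemp would cover both halves of the table without repeating the loop.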

Meanwhile, if you are wondering how my Perl script extracts the artist information, what method I used to make sure only the relevant information is parsed from the page, or how I built all those regular expression matches, watch out for Part 2 of this blog. For now I leave it up to you to figure out how it is all done.
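
As a small hint in the meantime, the script leans on three of Perl’s special match variables: $1 (the text captured by the first pair of parentheses), $& (the whole matched text) and $' (everything after the match). Here is a minimal sketch on a made-up string; first_bold is a hypothetical name used only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: returns the captured text,
# the whole match, and the text that follows the match.
sub first_bold {
    my ($text) = @_;
    return unless $text =~ m{<b>(.*?)</b>};
    return ($1, $&, $');
}

# Made-up input, not taken from the All Music page.
my ($capture, $matched, $rest) = first_bold('before <b>Rock</b> after');

print "capture: $capture\n";   # text inside the ( ) group
print "matched: $matched\n";   # the whole matched substring
print "rest:    $rest\n";      # everything after the match
```

Assigning $' back to the variable being searched, as the script does with $output and $li, throws away the part already processed, so the next match starts further down the page.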

Here are a few important links that will help you build crawlers similar to those of Altertunes and understand the methods I have used above.
1. Leeds University Perl page
2. Tizag
3. Finally, the documentation that comes with PXPerl is in itself a complete guide to everything.

I hope this helped a little in your quest to build crawlers.

In the next blog I will try to wrap up this section (I am tired of writing this one for now) 😉

All the best.