Parsing With Nokogiri

I was reading an article on our blog about extracting all the links from a webpage with Python. Have a look, it's a well-written article. So I decided to write an article about extracting links and image links with Ruby using Nokogiri.

What's Nokogiri?


Nokogiri is a library that acts as an HTML/XML parser. In simple terms: if you want to extract a piece of information from a website to use in your program, what would you do? Suppose we want the information inside <div id="abc">. Either I copy the source of the website into a text file manually and then search through the whole document, or I use a library that helps me extract the information directly from the website. Nokogiri is one such library.




Using Nokogiri

 Step 1.
Install the 'nokogiri' gem by typing "gem install nokogiri".


 Step 2.
Include the library in your program by typing "require 'nokogiri'". Also include the 'open-uri' library by typing "require 'open-uri'", as we will be fetching the page from a website.

 Step 3.
Now we will open the page and, with the help of a CSS selector, look for the <a> tags and pick out what's inside 'href'; that will be the link. We will do the same to obtain the images, this time reading 'src' from the <img> tags.
  
Have a look at the complete code (Explanation in comments):

       
 
Run it by typing: $ ruby nokogiri.rb in your terminal.

Thank you!