Hpricot scraping in Ruby

Require the gems and libraries you need before getting started:
require 'hpricot'
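require 'rio'
# Website URL to scrape (rio needs the http:// scheme to treat the string as a URL)
url = "http://www.funonrails.com"

# Local filename to store the downloaded page
file = "temp.html"
# Download the page and save it locally (rio's > copies the left-hand rio into the right-hand one)
rio(url) > rio(file)
# Parse the saved page with Hpricot (File.read closes the file for us)
doc = Hpricot(File.read(file))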
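If rio is not installed, the download-and-save step can be sketched with Ruby's standard Net::HTTP instead. This is a minimal sketch, not the author's method: `normalize_url` and `fetch_cached` are hypothetical helper names, and the caching rule (reuse `temp.html` if it already exists, so repeated runs do not hit the site again) is an assumption about why the page is saved locally.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: prepend "http://" when the URL has no scheme,
# because URI/Net::HTTP need one to treat the string as a web address.
def normalize_url(url)
  url =~ %r{\A[a-z][a-z0-9+.-]*://}i ? url : "http://#{url}"
end

# Hypothetical helper: return the cached copy at +path+ if it exists,
# otherwise download +url+ with Net::HTTP, save it to +path+, and return it.
# Redirects and HTTP error statuses are not handled in this sketch.
def fetch_cached(url, path)
  return File.read(path) if File.exist?(path)

  body = Net::HTTP.get(URI(normalize_url(url)))
  File.write(path, body)
  body
end
```

Usage would look like `html = fetch_cached("www.funonrails.com", "temp.html")` followed by `doc = Hpricot(html)`. Delete `temp.html` whenever you want a fresh copy of the page.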

Then use Hpricot's search methods to pull out the content you need:

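# First element matching the CSS selector (nil if nothing matches)
doc.at("div.pageTitle")
# Every matching element, returned as Hpricot::Elements
doc/"div.pageTitle"
doc.search("div.entry")
# Shorthand for doc.at
doc % "div.pageTitle"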
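The difference between `at` (first match) and `search` or `/` (all matches) can be tried without installing Hpricot. The sketch below is a stand-in, not Hpricot: it uses Ruby's REXML with XPath on a small hypothetical, well-formed snippet. `REXML::XPath.first` plays the role of `at` and `REXML::XPath.match` the role of `search`. Note that `[@class='entry']` compares the whole attribute, whereas the CSS selector `div.entry` also matches elements that carry several classes.

```ruby
require 'rexml/document'

# Hypothetical markup standing in for the saved page.
html = <<~HTML
  <html><body>
    <div class="pageTitle">Hpricot scraping in ruby</div>
    <div class="entry">First entry</div>
    <div class="entry">Second entry</div>
  </body></html>
HTML

doc = REXML::Document.new(html)

# Like Hpricot's doc.at: the first matching element, or nil when nothing matches.
title = REXML::XPath.first(doc, "//div[@class='pageTitle']")

# Like Hpricot's doc.search / doc/: an array of every matching element.
entries = REXML::XPath.match(doc, "//div[@class='entry']")

puts title.text                   # prints "Hpricot scraping in ruby"
puts entries.map(&:text).inspect  # prints ["First entry", "Second entry"]
```

REXML only accepts well-formed markup, so real-world HTML is a job for a forgiving parser such as Hpricot; the point here is only the first-versus-all behaviour.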

See the Hpricot API reference for the full list of search methods.


About sandipransing

Web Developer #ruby #rails #JS
This entry was posted in Ruby.

