RSS feeds have completely changed the way we consume information on the web. You no longer have to visit The New York Times or CNN every hour of the day to check news headlines because the feed reader is doing that for you behind the scenes.
The only problem is that not all web content is available via feeds. For instance, Amazon, eBay and Google Product Search (Froogle) are good places to find discount deals on books and electronic gadgets but, unfortunately, none of these shopping sites publish feeds.
Monitor Web Pages with Google Docs
Problem: Let's say you are looking for some discount deals on the iPod Nano. One option is to open the Google Products page and search for iPod Nano. If you don't find the right price, you repeat the same cycle the next day. This may sound easy but imagine doing this for ten other products on five different shopping sites. Tedious, right?
Solution: What we can do here is build a simple spreadsheet in Google Docs that will monitor prices across all these search pages and will present them in a table so you don't just track prices but also compare them at the same time.
To get started, you need access to Google Docs and some basic knowledge of XPath. Don't let this scare you - XPath is a simple way to access information contained inside HTML web pages. For instance, if you want all the links on a web page, the XPath expression would be //a[@href], which matches every link tag that carries a URL. Some more examples:
//strong means all the items in the web page with strong HTML tags.
//@href means all the href attributes in the web page, i.e., the URLs in that page.
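If you would like to try these expressions outside a spreadsheet, here is a minimal Python sketch using the standard library's ElementTree module. The sample page and its URLs are made up for illustration. ElementTree only understands a subset of XPath and needs well-formed markup, so treat this as a way to get a feel for the expressions rather than a scraper for real pages.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (not taken from any real site).
SAMPLE_PAGE = """
<html>
  <body>
    <p>The <strong>best deal</strong> today is on the <strong>iPod Nano</strong>.</p>
    <a href="http://example.com/deal-1">Deal one</a>
    <a href="http://example.com/deal-2">Deal two</a>
    <a name="top">An anchor without a URL</a>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE_PAGE)

# //strong : every item with a strong tag.
# ElementTree wants a leading "." so the search starts at the root element.
strong_texts = [el.text for el in root.findall(".//strong")]

# //a[@href] : every link tag that has an href attribute.
# The anchor with only a name attribute is skipped.
link_urls = [el.get("href") for el in root.findall(".//a[@href]")]

print(strong_texts)
print(link_urls)
```

Note that ElementTree cannot select attributes directly, so //@href is approximated here by matching the tags and then reading their href value.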
If you feel that writing XPath expressions is a tricky job, get the XPath checker add-on for Firefox that will help you easily determine the XPath of any element on a web page.
Import Data from Websites into Google Docs using XPath
This is the search page for 'ipod nano' inside Google Products. As you may have noticed already, the result title is formatted with the CSS class "ps-large-t" while the product price uses the class "ps-larger-t" - you can easily find these class names via Firebug or from the HTML source.
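To see how those two class names turn into a table of titles and prices, here is a rough Python sketch. Only the class names "ps-large-t" and "ps-larger-t" come from the page described above. The surrounding markup, the product names, the prices and the extract_by_class helper are all invented for illustration, and the real page structure may well differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: only the two class names come from the article.
# Products, prices and URLs are placeholders, not real data.
SAMPLE_RESULTS = """
<div>
  <div>
    <a class="ps-large-t" href="http://example.com/p/1">Sample Player A</a>
    <span class="ps-larger-t">$10.00</span>
  </div>
  <div>
    <a class="ps-large-t" href="http://example.com/p/2">Sample Player B</a>
    <span class="ps-larger-t">$12.50</span>
  </div>
</div>
"""


def extract_by_class(root, class_name):
    """Return the text of every element whose class attribute equals class_name."""
    # Equivalent to the XPath //*[@class='...'], written relative to root.
    matches = root.findall(f".//*[@class='{class_name}']")
    return [(el.text or "").strip() for el in matches]


root = ET.fromstring(SAMPLE_RESULTS)
titles = extract_by_class(root, "ps-large-t")
prices = extract_by_class(root, "ps-larger-t")

# Pair each title with its price, like two columns in a spreadsheet.
rows = list(zip(titles, prices))
for title, price in rows:
    print(f"{title}\t{price}")
```

The exact-match test on the class attribute is a simplification: an element carrying several classes at once would not match, so a real page may need a looser expression.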