Suppose you make a living selling information on the internet. You won’t make much money if people can’t find your information through Google, Yahoo and the like. That’s why “commercial types” allow search engines to access (and index) their entire website, even the pages you’d normally have to pay for.
For example, Google built a crawler called Googlebot that indexes “the internet”, or at least the HTTP part of it. The trick is: if you can fool information-selling sites into believing that you are Googlebot, the free information is yours.
The site experts-exchange dotcom sells IT answers, mostly about troubleshooting very specific software problems. This article explains how to configure the “BrightStor ARCserve Backup” application to back up to any tape in the drive:
As you can see, the answer is not readable unless you sign up (pay) and log in. But if they want this page to show up in Google, they have to give Googlebot access to it.
There are already a number of tools that help you pretend to be Googlebot, for instance this web service. Go there, fill in the above URL, and the answer is readable… Waweewah 🙂
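Under the hood, pretending to be Googlebot mostly comes down to sending the User-Agent header that Googlebot sends. Here is a minimal Python sketch of that idea. The function names are my own, and I’m only assuming the spoofer service does something similar; a site that also checks the visitor’s IP address won’t be fooled by this.

```python
import urllib.request

# User-Agent string that Googlebot 2.1 announces itself with.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_googlebot_request(url: str) -> urllib.request.Request:
    """Build a request that carries Googlebot's User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})


def fetch_as_googlebot(url: str, timeout: float = 10.0) -> str:
    """Fetch a page while claiming to be Googlebot and return its text."""
    with urllib.request.urlopen(build_googlebot_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling fetch_as_googlebot("http://www.pay-for-info.com") would return whatever HTML the site chooses to serve to a visitor with that User-Agent.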
Collection of interesting GoogleBot links
To make this ‘procedure’ a lot faster you can add a Firefox quicksearch for the mentioned Googlebot spoofer service (smart-it-consulting.com). I’ve assigned the quicksearch keyword “gb” to view a page as Googlebot. This way I can view http://www.pay-for-info.com as Googlebot by typing
in the FF address bar.
The bookmark url used for this is: http://www.smart-it-consulting.com/internet/google/googlebot-spoofer/view.htm?cBotName=Googlebot-2.1&cUrl=%s
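To show what the quicksearch does with that bookmark: Firefox replaces the %s placeholder with whatever you type after the keyword, URL-encoded. The Python sketch below mimics that substitution. The helper name expand_quicksearch is made up for illustration, and the exact escaping Firefox applies may differ slightly from quote().

```python
from urllib.parse import quote

# The bookmark URL from above, with %s as the placeholder for the typed text.
SPOOFER_TEMPLATE = (
    "http://www.smart-it-consulting.com/internet/google/googlebot-spoofer/"
    "view.htm?cBotName=Googlebot-2.1&cUrl=%s"
)


def expand_quicksearch(template: str, typed: str) -> str:
    """Replace %s in a keyword bookmark with the URL-encoded typed text."""
    return template.replace("%s", quote(typed, safe=""))


print(expand_quicksearch(SPOOFER_TEMPLATE, "http://www.pay-for-info.com"))
```

The printed URL ends in cUrl=http%3A%2F%2Fwww.pay-for-info.com, which is the address the browser actually requests from the spoofer service.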
What would also be really niccce? A Firefox plugin that puts the browser in Googlebot mode with one click. That way you wouldn’t have to tunnel through another service (and so it would be much faster).
I’m aware it is already possible to do this, but it isn’t fast: setting up all the options just to retrieve one page is tedious.
Could use this Firefox add-on instead of writing one from scratch:
“Google recommends that webmasters use DNS to verify the identity of the user agent defined ‘googlebot’ on a case-by-case basis doing a reverse DNS lookup that would verify that the suspect crawler is in the googlebot.com domain.”
Additionally, Google recommends that webmasters also do a forward DNS->IP lookup, which would prevent a potential spoofer from simply setting up their own reverse DNS that points to the googlebot.com domain space. Google posted details on its blog in September.
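For the webmaster side, the two lookups described above could look roughly like this in Python. This is only a sketch of the reverse-then-forward check, not Google’s own code. The function name and the two optional resolver parameters are my own invention; they exist so the logic can be tried without touching real DNS.

```python
import socket


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Check a visitor IP with a reverse DNS lookup followed by a forward one.

    reverse_lookup maps an IP to a hostname, forward_lookup maps a hostname
    to a list of IPs. Both default to the system resolver.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    # Step 1: reverse lookup. The hostname must be inside googlebot.com.
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False
    if not host.lower().rstrip(".").endswith(".googlebot.com"):
        return False

    # Step 2: forward lookup. The hostname must resolve back to the same IP,
    # otherwise anyone controlling their own reverse DNS could fake step 1.
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False
```

A site would call this with the remote address of any request whose User-Agent claims to be Googlebot, and serve the paywalled version when it returns False. Against a check like this, changing the User-Agent alone no longer works.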