Google’s Robot

Googlebot?
Suppose you make a living by selling information on the internet. You won’t make much money if people can’t find your information on Google, Yahoo and other search engines. That’s why “commercial types” allow search engines to access (and index) their entire website, even the pages you’d normally have to pay for.
For example, Google made a crawler called Googlebot that indexes “the internet”, or at least the HTTP part of it. The trick is: if you can fool information-selling sites into believing that you are Googlebot, then free information is yours.

Demonstration.
The site experts-exchange.com sells IT answers, mostly about troubleshooting very specific software problems. This article explains how to configure the “BrightStor ARCserve Backup” application to back up to any tape in the drive:
http://www.experts-exchange.com/Storage/Backup_Restore/Q_22509249.html
As you can see, the answer is not readable unless you sign up (pay) and log in. But if they want this page to show up in Google, they have to give Googlebot access to it.
There are already a number of tools that can help you pretend to be Googlebot, for instance the Googlebot spoofer web service at smart-it-consulting.com. Go there, fill in the above URL, and the answer is readable… Waweewah 🙂
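What such a service does on your behalf can be sketched in a few lines of Python. This is only an illustration, not the actual code of that web service: it builds (but does not send) a request whose User-Agent header carries the Googlebot string quoted further down in this post. The helper name build_googlebot_request is made up for this example.

```python
from urllib.request import Request

# User-Agent string as quoted in this post; what Google really sends may differ.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"


def build_googlebot_request(url: str) -> Request:
    """Return a request for `url` that announces itself as Googlebot (hypothetical helper)."""
    return Request(url, headers={"User-Agent": GOOGLEBOT_UA})


req = build_googlebot_request(
    "http://www.experts-exchange.com/Storage/Backup_Restore/Q_22509249.html"
)
# urllib stores header names in capitalized form, hence "User-agent" here.
print(req.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen would actually fetch the page. A site that only looks at the User-Agent header has nothing else to go on; a site that also checks where the request comes from does.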

Collection of interesting GoogleBot links

Firefox quicksearch
To make this ‘procedure’ a lot faster, you can add a Firefox quicksearch for the Googlebot spoofer service mentioned above (smart-it-consulting.com). I’ve assigned the quicksearch keyword “gb” to view a page as Googlebot. This way I can view http://www.pay-for-info.com as Googlebot by typing

gb http://www.pay-for-info.com/page-you-want.htm

in the Firefox address bar.
The bookmark URL used for this is: http://www.smart-it-consulting.com/internet/google/googlebot-spoofer/view.htm?cBotName=Googlebot-2.1&cUrl=%s
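To see what the quicksearch does with the text typed after “gb”, here is a small Python sketch of the substitution. It assumes the typed text is percent-encoded before it replaces %s; the exact encoding Firefox applies may differ in details. The function name expand_quicksearch is made up for this example.

```python
from urllib.parse import quote

# Bookmark URL from this post; %s marks where the typed text goes.
TEMPLATE = (
    "http://www.smart-it-consulting.com/internet/google/googlebot-spoofer/"
    "view.htm?cBotName=Googlebot-2.1&cUrl=%s"
)


def expand_quicksearch(template: str, typed: str) -> str:
    """Fill the %s placeholder with the percent-encoded text (hypothetical helper)."""
    # safe="" also encodes ":" and "/", so the target URL survives as one parameter value.
    return template.replace("%s", quote(typed, safe=""))


print(expand_quicksearch(TEMPLATE, "http://www.pay-for-info.com/page-you-want.htm"))
```

The encoding matters because the target address is itself a URL: without it, a “&” or “?” in the page you want would be read as part of the spoofer service’s own query string.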

Firefox Plugin

What would also be really niccce? Having a Firefox plugin that puts the browser in Googlebot mode with one click. That way you wouldn’t have to tunnel through another service, which would be much faster.
I’m aware it is already possible to do this by hand, but it isn’t fast: it’s tedious to set up all the options just to retrieve one page.

The PrefBar Firefox add-on could be used instead of writing a plugin from scratch:
http://prefbar.mozdev.org/installation.html
These are interesting JavaScript function calls and preferences to use with this add-on:
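// Start from a clean slate: no cookies or cached pages from earlier visits
prefbarClearCookies();
prefbarClearAllCache();
// Announce the browser as Googlebot
prefbarSetUseragent("Googlebot/2.1 (+http://www.google.com/bot.html)");

// The next two are Firefox preference names (see about:config), not PrefBar calls
security.enable_java = false;
javascript.enabled = false;
prefbarSetFlash(false);
prefbarSetImages(false);
// Preference that controls whether the Referer header is sent
network.http.sendRefererHeader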

Potential difficulties
“Google recommends that webmasters use DNS to verify the identity of the user agent defined ‘googlebot’ on a case-by-case basis doing a reverse DNS lookup that would verify that the suspect crawler is in the googlebot.com domain.”
Additionally, Google recommends that webmasters also do a forward DNS->IP lookup, which would prevent a potential spoofer from simply setting up their own reverse DNS that points to the googlebot.com domain space. Google posted details on its blog in September.
For the trick described above, this means that a site running both lookups will not be fooled by a spoofed user agent alone.
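The two lookups can be sketched in Python as follows. This is a minimal sketch of the check as described above, not Google’s or any site’s actual code: it only accepts host names under googlebot.com (the domain named in the quote), and the function name and the injectable reverse/forward parameters are assumptions made so the logic can be tried without real DNS.

```python
import socket


def is_verified_googlebot(ip, reverse=None, forward=None):
    """Return True only if reverse and forward DNS agree that `ip` is Googlebot."""
    # Default resolvers use the system DNS; tests can pass in their own.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])

    # Step 1: reverse lookup, IP address -> host name.
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # no reverse DNS record
        return False
    if not host.endswith(".googlebot.com"):
        return False

    # Step 2: forward lookup, host name -> IP addresses. A spoofer can point
    # their own reverse DNS at googlebot.com, but cannot make the real
    # googlebot.com zone resolve back to their address.
    try:
        return ip in forward(host)
    except OSError:  # host name does not resolve
        return False
```

A site would call is_verified_googlebot with the address of the incoming connection whenever the user agent claims to be Googlebot, and serve the paywalled page only when it returns True.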
