Evading Google Search


Yesterday CNet reported that Dell’s future laptop specifications were accessible to the general public through Google. Dell learned the hard way, but you don’t have to. There is a very easy way to block the GoogleBot from crawling and indexing parts of your website. We’ve all heard of the robots.txt file, but do you use it? If you have sensitive information on your server that you don’t want search engines to index, a robots.txt file with instructions for spiders is the simplest way to keep it out of their indexes. Here is what you do:

1. Create a file called robots.txt.

2. There are just two directives you need to know: “User-agent” and “Disallow”. The “User-agent” line specifies which spiders you would like to keep out, and the “Disallow” line tells those spiders which directories not to crawl. Here is what the content of your robots.txt file looks like if you block all spiders from the entire site:

User-agent: *
Disallow: /

And here is how it looks if you block GoogleBot from crawling your images folder:

User-agent: Googlebot
Disallow: /images/

Multiple folders:

User-agent: *
Disallow: /forum/
Disallow: /images/

3. Save the file and upload it to your web server’s root directory.
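If you want to double-check your rules before or after uploading, Python’s standard urllib.robotparser module can parse a robots.txt file and report whether a given URL is crawlable. This is just a sketch; the domain and paths are placeholders, so substitute your own.

# Sketch: check robots.txt rules with Python's standard library.
# The domain and paths below are placeholders -- use your own site.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()  # fetch and parse the live file

# Ask whether a specific user-agent may fetch a given path.
for path in ("/images/spec-sheet.pdf", "/index.html"):
    url = "https://www.example.com" + path
    print(path, "->", parser.can_fetch("Googlebot", url))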

That’s it. Well-behaved spiders, including the GoogleBot, will now skip those directories, so the files in them stay out of search results.

About the Author

Andrey Milyan was the first editor-in-chief of Search Marketing Standard, the leading print publication covering the search marketing industry. He has been following and reporting on industry developments for over 10 years. Andrey now works in the paid search sector of a prominent search marketing agency.


One Comment

  1. Anonymous

Conversely, there is stuff that Google won't touch