How will you stop web spiders from crawling certain directories on your website?

WWW wanderers, or spiders, are programs that traverse many pages on the World Wide Web by recursively retrieving linked pages. Search engines such as Google frequently spider web pages for indexing. How will you stop web spiders from crawling certain directories on your website?


A.
Place a robots.txt file in the root of your website with a listing of the directories that you don’t want to be crawled (illustrated below the answer choices)

B.
Place authentication on root directories that will prevent crawling from these spiders

C.
Enable SSL on the restricted directories which will block these spiders from crawling

D.
Place "HTTP:NO CRAWL" on the HTML pages that you don’t want the crawlers to index
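For reference, option A in practice is just a plain-text file named robots.txt served from the site root (e.g. https://example.com/robots.txt). A minimal sketch, with placeholder directory names:

    User-agent: *
    Disallow: /admin/
    Disallow: /private/
    Disallow: /backups/

The User-agent line says which crawlers the rules apply to (* means all), and each Disallow line names a path prefix that compliant spiders are asked to skip.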




Dick Steele

To be honest, none of these is strictly right, but A is the correct answer from the CEH point of view. The reason none of the answers is technically right is that nothing actually stops a spider from ignoring robots.txt; respecting that file is just a common courtesy observed by well-behaved search engines. User-run web spiders (e.g. wget, Teleport) can happily ignore it and crawl whatever they can get their hands on.
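To illustrate that point: honoring robots.txt is opt-in on the crawler’s side. A well-behaved crawler consults the file before fetching a URL, roughly as in this Python sketch using the standard library’s urllib.robotparser (the site, path, and crawler name here are made up for illustration):

    from urllib.robotparser import RobotFileParser

    # Fetch and parse the site's robots.txt (hypothetical site)
    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # A polite crawler asks before fetching; a rude one simply skips this check
    url = "https://example.com/admin/users.html"
    if rp.can_fetch("ExampleBot/1.0", url):
        print("Allowed to crawl:", url)
    else:
        print("robots.txt asks us not to crawl:", url)

Nothing enforces that check; a crawler that never calls can_fetch will happily retrieve the URL anyway, which is exactly the point made above.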