The behavior of a web crawler is the outcome of a combination of policies:
- A selection policy that states which pages to download.
- A re-visit policy that states when to check for changes to the pages.
- A politeness policy that states how to avoid overloading websites.
- A parallelization policy that states how to coordinate distributed web crawlers.
Source: http://en.wikipedia.org/wiki/Web_crawler
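
As a rough illustration of how the four policies listed above might interact inside one scheduler, here is a minimal Python sketch. The class name `CrawlScheduler`, its parameters, and the choice of hashing the host name to pick a worker are all assumptions made for this example and are not taken from the source. No network requests are made; the caller passes the current time in explicitly so the behavior is easy to follow.

```python
import heapq
import zlib
from collections import defaultdict
from urllib.parse import urlparse


class CrawlScheduler:
    """Toy URL frontier combining the four policies (hypothetical design)."""

    def __init__(self, allowed_hosts, revisit_after, per_host_delay, num_workers):
        self.allowed_hosts = set(allowed_hosts)  # selection policy
        self.revisit_after = revisit_after       # re-visit policy, in seconds
        self.per_host_delay = per_host_delay     # politeness policy, in seconds (> 0)
        self.num_workers = num_workers           # parallelization policy
        self.queue = []                          # min-heap of (due_time, url)
        self.seen = set()
        self.host_ready_at = defaultdict(float)  # earliest allowed next hit per host

    def select(self, url):
        """Selection: only pages on allowed hosts are downloaded."""
        return urlparse(url).hostname in self.allowed_hosts

    def add(self, url, now=0.0):
        """Queue a newly discovered URL if it passes the selection policy."""
        if url not in self.seen and self.select(url):
            self.seen.add(url)
            heapq.heappush(self.queue, (now, url))

    def worker_for(self, url):
        """Parallelization: map each host to one worker so workers do not overlap."""
        host = urlparse(url).hostname
        return zlib.crc32(host.encode("utf-8")) % self.num_workers

    def next_fetch(self, now):
        """Return (url, worker_id) for a page that may be fetched at `now`, else None."""
        while self.queue and self.queue[0][0] <= now:
            _, url = heapq.heappop(self.queue)
            host = urlparse(url).hostname
            ready = self.host_ready_at[host]
            if ready > now:
                # Politeness: this host was hit too recently, so postpone the URL.
                heapq.heappush(self.queue, (ready, url))
                continue
            self.host_ready_at[host] = now + self.per_host_delay
            # Re-visit: schedule the same page to be checked for changes later.
            heapq.heappush(self.queue, (now + self.revisit_after, url))
            return url, self.worker_for(url)
        return None
```

With `per_host_delay=2.0`, two pages on the same host that are both due at time 0 are handed out at times 0 and 2 rather than back to back, and each reappears in the queue `revisit_after` seconds after it was fetched. A real crawler would replace these fixed constants with measured values, for example a re-visit interval based on how often a page actually changes.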