Here is a presentation released by Google titled “Optimize your Crawling and Indexing”. It gives some background context, explains how to reduce inefficient crawling of your website and how to get your preferred URLs indexed, and ends with some resources.

This presentation is a good reference for any Search Engine Optimization (SEO) specialist.

The following tips are taken directly from the topics covered in the presentation.

Remove user-specific details from URLs.
URL parameters that don’t change the content of the page—like session IDs or sort order—can be removed from the URL and put into a cookie. By putting this information in a cookie and 301 redirecting to a “clean” URL, you retain the information and reduce the number of URLs pointing to that same content.
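
As a rough illustration of the cookie-plus-redirect idea, the Python sketch below strips user-specific parameters from a URL and builds the headers for a 301 response. The parameter names (`sessionid`, `sort`), the helper names and the example URLs are assumptions made for this example, not something taken from the presentation; a real site would wire this into its own web framework.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; substitute whatever your site actually uses.
USER_SPECIFIC_PARAMS = {"sessionid", "sort"}


def clean_url(url):
    """Return (cleaned_url, removed), where removed maps stripped params to values."""
    parts = urlsplit(url)
    kept, removed = [], {}
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower() in USER_SPECIFIC_PARAMS:
            removed[key] = value
        else:
            kept.append((key, value))
    cleaned = urlunsplit(parts._replace(query=urlencode(kept)))
    return cleaned, removed


def redirect_response(url):
    """Return (status, headers) for a 301 to the clean URL, or None if already clean."""
    cleaned, removed = clean_url(url)
    if not removed:
        return None
    headers = [("Location", cleaned)]
    # Keep the stripped information by moving it into cookies.
    for key, value in removed.items():
        headers.append(("Set-Cookie", f"{key}={quote(value)}; Path=/"))
    return 301, headers
```

Requesting `https://example.com/shoes?color=red&sessionid=abc123` would then be answered with a 301 to `https://example.com/shoes?color=red` plus a `sessionid` cookie, so the content keeps a single URL while the session information survives.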

Rein in infinite spaces.
Do you have a calendar that links to an infinite number of past or future dates (each with their own unique URL)? Do you have paginated data that returns a status code of 200 when you add &page=3563 to the URL, even if there aren’t that many pages of data? If so, you have an infinite crawl space on your website, and crawlers could be wasting their (and your!) bandwidth trying to crawl it all. Consider these tips for reining in infinite spaces.
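
One possible way to close off the pagination case is to answer out-of-range page numbers with a 404 instead of a 200. The sketch below shows only that decision; the function name `page_status` and the default of 20 items per page are assumptions for the example.

```python
import math


def page_status(page, total_items, per_page=20):
    """Return the HTTP status code a paginated listing could send for `page`."""
    if page < 1:
        return 404
    # An empty listing still has one (empty) first page.
    last_page = max(1, math.ceil(total_items / per_page))
    return 200 if page <= last_page else 404
```

With 45 items and 20 per page there are three pages, so `page_status(3, 45)` is 200 while `page_status(3563, 45)` is 404, which tells a crawler there is nothing further to follow.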

Disallow actions Googlebot can’t perform.
Using your robots.txt file, you can disallow crawling of login pages, contact forms, shopping carts, and other pages whose sole functionality is something that a crawler can’t perform. (Crawlers are notoriously cheap and shy, so they don’t usually “Add to cart” or “Contact us.”) This lets crawlers spend more of their time crawling content that they can actually do something with.
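
To make this concrete, here is a small sketch that holds an example robots.txt in a string and checks it with Python's standard `urllib.robotparser`. The paths (`/login`, `/contact`, `/cart`) are made up for illustration; your own site's paths will differ.

```python
from urllib.robotparser import RobotFileParser

# Example rules only; the paths are assumptions, not taken from the presentation.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /contact
Disallow: /cart
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def can_crawl(path, agent="Googlebot"):
    """Return True if the rules above allow `agent` to fetch `path`."""
    return parser.can_fetch(agent, path)
```

Here `can_crawl("/cart/add")` is False because the path starts with a disallowed prefix, while `can_crawl("/products/blue-shoes")` is True. Checking rules this way before deploying them can help catch a rule that blocks more than intended.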

One man, one vote. One URL, one set of content.
In an ideal world, there’s a one-to-one pairing between URL and content: each URL leads to a unique piece of content, and each piece of content can only be accessed via one URL. The closer you can get to this ideal, the more streamlined your site will be for crawling and indexing. If your CMS or current site setup makes this difficult, you can use the rel=canonical element to indicate the preferred URL for a particular piece of content.
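
As a minimal sketch of the rel=canonical approach, the helper below builds the tag that each duplicate page would carry in its `<head>`, pointing at the preferred URL. The function name and the example URL are assumptions for illustration.

```python
from html import escape


def canonical_link(preferred_url):
    """Build a <link rel="canonical"> tag pointing at the preferred URL."""
    # Escape the URL so characters such as & and " are safe inside the attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'
```

For instance, every variant of a product page (sorted, filtered, or tagged with tracking parameters) could emit `canonical_link("https://example.com/blue-shoes")` so that all of them name the same preferred URL.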

