Duplicate content is content that appears on the Internet in more than one place (URL). This is a problem because when there is more than one piece of identical content on the Internet, it is difficult for search engines to decide which version is more relevant to a given search query. To provide the best search experience, search engines rarely show multiple duplicate pieces of content and are thus forced to choose which version is most likely to be the original (or the best).
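Some of the biggest issues with duplicate content are that search engines don't know which version(s) to include in or exclude from their indices, whether to consolidate link metrics on one page or split them across the duplicates, and which version(s) to rank for a given query.
When duplicate content is present, site owners can suffer ranking and traffic losses, and search engines provide less relevant results.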
Whenever content on a site can be found at multiple URLs, it should be canonicalized for search engines. This can be done with a 301 redirect to the correct URL, with the rel=canonical tag, or in some cases with the parameter handling tool in Google Webmaster Tools.
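To make "canonicalized" concrete, here is a minimal Python sketch that maps several equivalent URLs to one canonical form. The preferred host, the forced https scheme and the set of ignored tracking parameters are hypothetical choices for illustration only; a real site would pick its own, and search engines do not run this code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical site-specific choices (assumptions, not search-engine rules).
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def canonical_url(url: str) -> str:
    """Map any of several equivalent URLs to a single canonical URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Treat the bare domain and the www host as the same site.
    if host in HOST_ALIASES:
        host = PREFERRED_HOST
    # Drop parameters that don't change the page content; sort the rest
    # so that parameter order doesn't create extra "duplicate" URLs.
    params = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in IGNORED_PARAMS
    ]
    params.sort()
    path = parts.path or "/"
    # The fragment is dropped: it is never sent to the server.
    return urlunsplit((PREFERRED_SCHEME, host, path, urlencode(params), ""))
```

With these assumptions, `http://example.com/page/?utm_source=news&id=7` and `https://WWW.Example.com/page/?id=7#top` both map to `https://www.example.com/page/?id=7`, which is the single URL the redirect or canonical tag would point to.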
In many cases, the best way to combat duplicate content is to set up a 301 redirect from the "duplicate" page to the original content page. When multiple pages with the potential to rank well are combined into a single page, they no longer compete with one another and together create a stronger relevancy and popularity signal overall. This improves the combined page's ability to rank well in search engines.
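A 301 redirect is usually configured in the web server, but the mechanism is easy to see in a small Python WSGI application. The REDIRECTS table and the page paths below are hypothetical; the point is that a request for a duplicate URL receives a 301 status and a Location header pointing at the original page.

```python
# Hypothetical mapping from duplicate paths to the original content page.
REDIRECTS = {
    "/index.html": "/",
    "/products/blue-widget-old/": "/products/blue-widget/",
}


def app(environ, start_response):
    """Minimal WSGI app: permanently redirect known duplicates, serve the rest."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 signals a permanent move, so engines can consolidate
        # ranking signals on the target URL.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>Original content</body></html>"]
```

To try it locally, one could serve it with the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and request `/index.html`; the response should be a 301 to `/`.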
Another option for dealing with duplicate content is the rel=canonical tag. It passes roughly the same amount of link juice (ranking power) as a 301 redirect and often takes much less development time to implement. The tag goes in the HTML head of a web page. It isn't a new HTML element; like nofollow, it simply uses a new rel value. For example:
<link href="http://www.example.com/canonical-version-of-page/" rel="canonical" />
This tag tells Bing and Google that the given page should be treated as a copy of the URL www.example.com/canonical-version-of-page/ and that all of the links and content metrics the engines apply should be credited toward that URL.
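When pages are generated by code, the canonical tag can be emitted by a small helper. The functions canonical_link_tag and add_canonical below are hypothetical, a minimal Python sketch assuming the canonical URL for each page is already known and the document has a single head element; html.escape keeps characters such as & from breaking the href attribute.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Return a rel=canonical link element for a page's <head>."""
    # quote=True also escapes quote characters inside the attribute value.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


def add_canonical(html_doc: str, canonical_url: str) -> str:
    """Insert the canonical tag just before the first </head> (assumes one head)."""
    tag = canonical_link_tag(canonical_url)
    return html_doc.replace("</head>", f"  {tag}\n</head>", 1)
```

For instance, `canonical_link_tag("http://www.example.com/canonical-version-of-page/")` produces the same element shown in the example above, with the attributes in a different but equivalent order.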
The meta robots tag with the value "noindex, follow" can be added to pages that shouldn't be included in a search engine's index. It allows search engine bots to crawl the links on the page but keeps the page itself out of their index. This works particularly well for pagination issues.
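For pagination, one hedged approach is to emit "noindex, follow" on every page after the first. The robots_meta helper below is hypothetical and encodes only that rule; keeping page 1 indexable is a site-specific assumption, not a requirement from the engines.

```python
def robots_meta(page_number: int) -> str:
    """Return the robots meta tag for page N of a paginated listing (hypothetical rule)."""
    if page_number > 1:
        # Later pages: let bots follow their links, but keep the page out of the index.
        return '<meta name="robots" content="noindex, follow" />'
    # First page: the default behaviour, indexable and followable.
    return '<meta name="robots" content="index, follow" />'
```

A template would call `robots_meta(current_page)` inside the head of each listing page.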
Google Webmaster Tools lets you set your site's preferred domain and specify how Google should handle various URL parameters. The main drawback of these methods is that they only work for Google: changes you make there do not affect Bing or any other search engine.
The preferred domain should be set for all sites. It is a simple way to tell Google whether a given site should be shown with or without the www prefix in the search engine result pages.