Duplicate Content For Dummies

Duplicate content is content that appears on the Internet in more than one place (URL). This is a problem because when more than one identical piece of content exists on the Internet, it is difficult for search engines to decide which version is more relevant to a given search query. To provide the best search experience, search engines rarely show multiple duplicate pieces of content, so they are forced to choose the version most likely to be the original (or the best).
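
To make "the same content at more than one URL" concrete, here is a minimal Python sketch that flags exact duplicates by hashing each page's normalized text. The `pages` mapping and its URLs are hypothetical sample data, and real search engines use far more elaborate (and unpublished) duplicate detection, so treat this only as an illustration of the definition.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(text: str) -> str:
    """Hash page text after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs whose text is identical after normalization."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        groups[content_fingerprint(text)].append(url)
    # Only groups with two or more URLs are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl results: one page reachable with and without www.
pages = {
    "http://www.example.com/raincoats": "Kids' raincoats in every size.",
    "http://example.com/raincoats": "Kids'  raincoats in every size.",
    "http://www.example.com/boots": "Rubber boots for rainy days.",
}
print(find_duplicates(pages))
# [['http://example.com/raincoats', 'http://www.example.com/raincoats']]
```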

Some of the biggest issues with duplicate content include:

Search engines don't know which version(s) to include/exclude from their indices
Search engines don't know whether to direct the link metrics (trust, authority, anchor text, link juice, etc.) to one page or split them across multiple versions
Search engines don't know which version(s) to rank for query results

When duplicate content is present, site owners can suffer ranking and traffic losses, and search engines return less relevant results.

SEO Best Practice

Whenever content on a site can be found at multiple URLs, it should be canonicalized for search engines. This can be done with a 301 redirect to the correct URL, with the rel=canonical tag, or in some cases with the parameter handling tool in Google Webmaster Tools.
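
Whichever method you use, you first have to decide what the canonical form of each URL is. The Python sketch below maps a few common variants (bare domain vs. www, a trailing index.html, a #fragment) onto one form. `CANONICAL_HOST` is a hypothetical setting, and the rules are illustrative assumptions, not a complete list of the variants search engines treat as duplicates.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical site setting: the host chosen as canonical.
CANONICAL_HOST = "www.example.com"


def canonical_url(url: str) -> str:
    """Map common URL variants of a page onto one canonical form."""
    parts = urlsplit(url)
    # hostname is already lowercased and excludes any port; this sketch
    # assumes the site is served on the default port.
    host = (parts.hostname or "").lower()
    # Treat the bare domain as an alias of the preferred www host.
    if host == CANONICAL_HOST.removeprefix("www."):
        host = CANONICAL_HOST
    path = parts.path or "/"
    # Collapse /index.html onto its directory URL.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    # Drop the fragment; browsers never send it to the server.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))
```

For example, `canonical_url("http://example.com/index.html#top")` returns `"http://www.example.com/"`. The same function can be used when generating internal links, canonical tags, or redirect targets, so that all three agree.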

301 Redirect

In many cases the best way to combat duplicate content is to set up a 301 redirect from the "duplicate" page to the original content page. When multiple pages with the potential to rank well are combined into a single page, they no longer compete with one another and together create a stronger relevancy and popularity signal. This improves the combined page's ability to rank well in the search engines.
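
As a minimal sketch of the server side, here is a plain Python WSGI application that answers requests for duplicate paths with a 301 pointing at the original page. The `REDIRECTS` table and its paths are hypothetical; on a real site the same rule is usually configured in the web server or framework instead.

```python
# Hypothetical mapping from duplicate paths to the original page.
REDIRECTS = {
    "/raincoats-for-boys": "http://www.example.com/childrens-raincoats",
    "/raincoats-for-girls": "http://www.example.com/childrens-raincoats",
}


def app(environ, start_response):
    """WSGI app: 301-redirect known duplicates, serve everything else."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 Moved Permanently tells crawlers the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<p>Page content</p>"]
```

To try it locally, pass `app` to `wsgiref.simple_server.make_server("", 8000, app)` and call `serve_forever()` on the result. A permanent 301, rather than a temporary 302, is what signals that the duplicate should be consolidated into the target.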

Rel="canonical"

Another option for dealing with duplicate content is the rel=canonical tag. It passes about the same amount of link juice (ranking power) as a 301 redirect and often takes much less development time to implement. The tag goes in the HTML head of a web page. The link element itself isn't new; like nofollow, rel=canonical simply uses a new rel value. For example:

<link href="http://www.example.com/canonical-version-of-page/" rel="canonical" />

This tag tells Bing and Google that the given page should be treated as though it were a copy of the URL www.example.com/canonical-version-of-page/ and that all of the links and content metrics the engines apply should be credited to that URL.
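
When auditing a site, it helps to check which canonical URL each page actually declares. This Python sketch uses the standard library's `html.parser` to pull out the href of a `<link rel="canonical">` element. The `find_canonical` helper is hypothetical; returning `None` when a page has several canonical tags is an assumption of this sketch, based on the fact that conflicting tags give search engines no clear signal.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonical(html: str) -> Optional[str]:
    """Return the single declared canonical URL, or None if absent or ambiguous."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if len(finder.canonicals) == 1 else None
```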

noindex, follow

The meta robots tag with the values "noindex, follow" can be used on pages that shouldn't be included in a search engine's index. It lets search engine bots crawl and follow the links on the page while keeping the page itself out of their index. This works particularly well for pagination issues.
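
For the pagination case, a template might emit the tag only on pages after the first. The Python sketch below shows that rule; `robots_meta` and the policy of keeping page 1 indexable are hypothetical choices for illustration, not a requirement of the meta robots tag.

```python
from typing import Optional

NOINDEX_FOLLOW = '<meta name="robots" content="noindex, follow">'


def robots_meta(page_number: int) -> Optional[str]:
    """Return the meta robots tag for page N of a paginated listing, or None."""
    if page_number < 1:
        raise ValueError("page numbers start at 1")
    if page_number == 1:
        # The first page stays indexable, so no tag is emitted.
        return None
    # Later pages: keep them out of the index, but let bots follow their links.
    return NOINDEX_FOLLOW
```

The returned string would be placed in the page's HTML head, alongside any other head elements.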

Parameter Handling in Google Webmaster Tools

Google Webmaster Tools allows you to set the preferred domain of your site and tell Google how to handle specific URL parameters. The main drawback to these methods is that they only work for Google. Any change you make here will not affect Bing's or any other search engine's settings.
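
Because these settings only reach Google, a complementary approach is to normalize parameters on your own site, for example when generating internal links or canonical tags. The Python sketch below is that swapped-in, site-side technique, not the Webmaster Tools feature. `IGNORED_PARAMS` is a hypothetical list of parameters assumed not to change the page's content.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that never change the page's content on this site.
IGNORED_PARAMS = {"sessionid", "ref"}


def strip_parameters(url: str) -> str:
    """Drop content-neutral parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        # Also drop utm_* campaign-tracking parameters.
        if key.lower() not in IGNORED_PARAMS and not key.lower().startswith("utm_")
    ]
    # Parameter order doesn't change the content, so normalize it.
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))
```

For example, `strip_parameters("http://www.example.com/raincoats?utm_source=mail&size=8&sessionid=abc")` returns `"http://www.example.com/raincoats?size=8"`.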

Set Preferred Domain

This should be set for all sites. It is a simple way to tell Google whether a given site should be shown with or without a www in the search engine result pages.

Additional Methods for Removing Duplicate Content

Maintain consistency when linking internally throughout a website. For example, if a webmaster determines that the canonical version of a domain is www.example.com/, then all internal links should go to http://www.example.com/example.html rather than http://example.com/example.html (notice the absence of www).
When syndicating content, make sure the syndicating website adds a link back to the original content. See Dealing With Duplicate Content for more information.
Minimize similar content. For example, rather than having one page about raincoats for boys and another about raincoats for girls that share 95% of the same content, consider expanding those pages so that each URL has distinct, relevant content. Alternatively, a webmaster could combine the two pages into a single page that is highly relevant for children's raincoats.
Remove duplicate content from search engines' indices by noindexing it with meta robots or by requesting removal via Webmaster Tools (Google and Bing).
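
The "95% of the same content" in the raincoat example is a rough notion of overlap. As a minimal sketch for spotting near-duplicate pages, this Python code compares two pages' words with the standard library's `difflib.SequenceMatcher`. The 0.9 threshold and the sample texts are hypothetical, and search engines do not publish how they measure similarity.

```python
from difflib import SequenceMatcher

# Hypothetical threshold; not a value any search engine has published.
NEAR_DUPLICATE_THRESHOLD = 0.9


def similarity(text_a: str, text_b: str) -> float:
    """Word-level similarity ratio in [0, 1] between two pages' text."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


def is_near_duplicate(text_a: str, text_b: str,
                      threshold: float = NEAR_DUPLICATE_THRESHOLD) -> bool:
    """True if the two texts overlap at least as much as the threshold."""
    return similarity(text_a, text_b) >= threshold


# Hypothetical page texts that differ by a single word.
boys = "Waterproof raincoat for boys with hood, reflective strips and pockets."
girls = "Waterproof raincoat for girls with hood, reflective strips and pockets."
```

Pairs that score above the threshold are candidates for the options in the list: expand each page with distinct content, merge them with a 301, or point one at the other with rel=canonical.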