The recent search engine updates have forced many a webmaster into website cleanups, chief among them clearing out duplicate content. For large, popular websites whose content is shared across the web, this can mean DOOM and how! These websites may have brilliant content which begs to be shared, and that sharing usually happens by passing on a link when the URL is posted on a blog or forum. The link that was shared may have all sorts of parameters attached to it: session IDs, tracking parameters, and so on.
When linked to enough times, this URL (with the extra parameters tacked onto the end) may even get indexed by search engines, essentially creating a duplicate version of the originally linked page. This causes quite a few issues, the biggest of them being duplicate content and a waste of the search engines' crawl budget for your website (which you can never have enough of), since the same page is essentially crawled multiple times.
Depressed webmasters, read on. There may now be a way out.
Step 1: Parameter Handling in Webmaster Tools:
Google’s Webmaster Tools allows a webmaster to specify the URL parameters that search engine crawlers should ignore.
This is the easiest way to deal with the additional parameters that get added to a URL. One downside of this method is that the webmaster must know these parameters in advance, which is not always the case; and by the time this data is collected, some duplicate versions of the page may already be indexed. Another downside is that any value earned by the duplicate versions of a page is not passed on to the original page.
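The same idea of ignoring parameters can be sketched in code: a URL with tracking cruft reduces to its clean form once the noise parameters are stripped. A minimal Python sketch, where the ignore list (`utm_source`, `sessionid`, etc.) is purely illustrative; the real list has to come from what shows up in your own logs:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Example tracking/session parameters to ignore. This set is an
# assumption for illustration; build your own from your server logs.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "ref"}

def strip_ignored_params(url):
    """Return the URL with known tracking/session parameters removed."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

print(strip_ignored_params(
    "https://example.com/article?id=42&utm_source=blog&sessionid=abc123"))
# -> https://example.com/article?id=42
```

Parameters that genuinely change the content (like `id=42` above) survive; only the listed noise is dropped, which mirrors what the Webmaster Tools setting asks the crawler to do.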
Step 2: Self-Canonicalization:
The canonical tag was originally introduced so that webmasters could recommend the preferred version of a page to a search engine. This especially comes in handy when you have a website with many pages that share a large amount of similar content (for example, e-commerce websites).
All pages of a site can be self-canonicalized, i.e. each page carries a canonical link pointing to itself, so that if a duplicate version of a page ever appears, the search engine is informed that the original URL is the preferred version of the page.
When there is no such duplication, you simply end up with a self-referential canonical, which is redundant but has no negative effect.
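A self-referential canonical can be generated from the requested URL at render time. A minimal Python sketch, assuming (as a deliberate simplification) that the entire query string is noise; a real site would keep any parameters that actually change the page content:

```python
from urllib.parse import urlsplit, urlunsplit
import html

def canonical_tag(request_url):
    """Build a self-referential canonical <link> for the requested page,
    dropping the query string so parameter-laden duplicates point back
    at the clean URL. Assumes query parameters never change the content."""
    scheme, netloc, path, _query, _fragment = urlsplit(request_url)
    clean = urlunsplit((scheme, netloc, path, "", ""))
    return '<link rel="canonical" href="%s" />' % html.escape(clean, quote=True)

print(canonical_tag("https://example.com/article?utm_source=blog&sessionid=abc"))
# -> <link rel="canonical" href="https://example.com/article" />
```

Emitting this tag in the `<head>` of every page means that whichever parameter-decorated variant a crawler lands on, it is told the clean URL is the one that counts.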
A downside to this strategy is that a blanket self-canonicalization effort can overwrite any deliberate canonical tags already in place on the website, so those pages must be excluded from it.
On its own, neither of the above methods may really prove to be your savior. But club them together and you have a gigantic duplicate killer. (May I call it the T-REX of webpage uniqueness?) Read on for the Eureka moment…
When Parameter Handling and Self-Canonicalization Collide:
The canonical tags ensure that any value earned by a duplicate page (in case a page with additional parameters does get indexed) is transferred to the original page, and once you have noted which external parameters to exclude, you can go ahead and exclude them through Webmaster Tools. Together this ensures that only one copy of each page is indexed by the search engines, and duplicate content on your website is dead for good!