This is one thing that I have constantly wondered about, after hearing so many things about it, many of them claiming that there is really no such thing as duplicate content.
Well, Matt Cutts touches on the subject in another of his videos, and it seems that Google uses a lot of sophisticated logic to detect duplicate content throughout the entire process of indexing your page, right up until the few microseconds before the page is displayed.
This makes sense, since Google is trying to minimise the amount of work its spiders do: the more rigorous the duplicate detection, the less crawling is necessary.
So, obviously there is a scale of duplication, and it seems that Google even notes ‘near duplicates’. In a later video post Matt mentions that you do not need to worry if you have printable versions of pages, or documents such as .doc files, that duplicate content on your site. Even if you have two identical sites, say one for Canada and one for the USA, Google will just pick the one that seems most relevant to the search.
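To get a feel for what a "scale of duplication" could look like, here is a minimal sketch of one generic textbook way to score how similar two pages are: break each text into overlapping three-word "shingles" and compare the two sets with Jaccard similarity. This is purely my own illustration and not how Google actually does it; the function names and the shingle size are assumptions made up for the example.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of overlap / size of union."""
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


def similarity(text_a, text_b, k=3):
    """Score from 0.0 (nothing shared) to 1.0 (same set of shingles)."""
    return jaccard(shingles(text_a, k), shingles(text_b, k))


if __name__ == "__main__":
    # Hypothetical page texts, invented for illustration only.
    page = "widgets shipped worldwide from our warehouse"
    printable = "widgets shipped worldwide from our warehouse"
    unrelated = "a completely different article about gardening tips"
    print(similarity(page, printable))  # identical text scores 1.0
    print(similarity(page, unrelated))  # no shared shingles scores 0.0
```

A score close to 1.0 would suggest a near duplicate, such as a printable version of the same page, while a score near 0.0 means the pages share almost nothing. Where to draw the line between the two is a judgment call, and I have no idea what threshold, if any, a real search engine uses.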