In one of Matt Cutts’ recent Google videos, he covers dynamic URLs and how they affect the way a site gets indexed.
Well, I have heard a lot about this topic, and the usual conclusion was that the number and kind of variables you put into the URL didn't matter much: if I really wanted a dynamically generated page indexed, I could link to it from a static page that Google had already indexed, and I should be right.
Interestingly, Matt touches on this, mentioning something like ‘there were search engines that would only crawl one deep into dynamically generated content’. That is exactly the approach I just described: the spider would visit dynamic URLs linked from static pages but go no further, which is where the ‘one deep’ comes in. I guess this was to stop spiders from getting caught in never-ending loops of indexing dynamic pages with similar content over and over again, wasting resources on what is often duplicate content.
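To picture the ‘one deep’ behaviour, here is a small Python sketch of a crawler that follows a dynamic URL only when it is linked from a static page, and never follows a link from one dynamic page to another. It is an illustration of the idea Matt describes, not how any real search engine works; treating “has a query string” as “dynamic” and the example site are assumptions.

```python
from collections import deque
from urllib.parse import urlparse


def is_dynamic(url: str) -> bool:
    """Treat any URL with a query string as dynamically generated (assumption)."""
    return bool(urlparse(url).query)


def crawl_one_deep(start: str, links: dict[str, list[str]]) -> list[str]:
    """Breadth-first crawl that follows a dynamic URL only when it is linked
    from a static page, so dynamic-to-dynamic links are never followed."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target in seen:
                continue
            # The "one deep" rule: a dynamic page may be reached only from a static one.
            if is_dynamic(page) and is_dynamic(target):
                continue
            seen.add(target)
            visited.append(target)
            queue.append(target)
    return visited


# Hypothetical site: a paginated list whose pages link to each other.
site_links = {
    "/index.html": ["/about.html", "/list.php?page=1"],
    "/list.php?page=1": ["/list.php?page=2", "/contact.html"],
    "/list.php?page=2": ["/list.php?page=3"],
}

print(crawl_one_deep("/index.html", site_links))
# ['/index.html', '/about.html', '/list.php?page=1', '/contact.html']
```

The first list page is crawled because the static home page links to it, but page 2 and beyond are skipped, which is the kind of endless pagination loop the rule was meant to cut off.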
But it appears this is no longer the case. Matt suggests 2 or 3 parameters at most in the URL string, and no large numbers that look like session IDs.
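If you want to audit your own URLs against that advice, a quick Python sketch like the one below flags URLs with too many query parameters or parameter values that look like session IDs. The threshold of 3 and the “16 or more hex characters” pattern are my own assumptions for illustration, not rules published by Google.

```python
import re
from urllib.parse import urlparse, parse_qsl

MAX_PARAMS = 3  # assumption: upper end of the "2 or 3 parameters" guideline
# Assumption: a long run of digits or hex characters looks like a session ID.
SESSION_LIKE = re.compile(r"^[0-9a-fA-F]{16,}$")


def url_warnings(url: str) -> list[str]:
    """Return a list of reasons a URL might be awkward for crawlers."""
    params = parse_qsl(urlparse(url).query, keep_blank_values=True)
    warnings = []
    if len(params) > MAX_PARAMS:
        warnings.append(f"{len(params)} parameters (more than {MAX_PARAMS})")
    for name, value in params:
        if SESSION_LIKE.match(value):
            warnings.append(f"'{name}' looks like a session ID")
    return warnings


print(url_warnings("/product.php?cat=shoes&id=42"))
# []
print(url_warnings("/product.php?cat=shoes&id=42&sort=price&page=2"
                   "&sid=8f14e45fceea167a5a36dedd4bea2543"))
# ['5 parameters (more than 3)', "'sid' looks like a session ID"]
```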
He also suggests using Apache’s mod_rewrite wherever the same format of parameters is being passed through. I’m sure I will write a mod_rewrite post, because it is such a useful tool, but for now, here are a few useful links…
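To show what mod_rewrite buys you, here is a Python sketch that imitates the matching a rewrite rule does: a clean, static-looking public path is matched against a regular expression and translated into the dynamic script URL that actually serves the page. In Apache itself this would be a RewriteRule directive using $1 and $2 back-references; the Python only mimics that behaviour, and the paths, script names and parameters are hypothetical.

```python
import re

# Hypothetical rules: each maps a static-looking path pattern to the
# dynamic script and query string that actually serves the page.
REWRITE_RULES = [
    (re.compile(r"^/products/([a-z-]+)/([0-9]+)$"), r"/product.php?cat=\1&id=\2"),
    (re.compile(r"^/articles/([0-9]{4})/([a-z0-9-]+)$"), r"/article.php?year=\1&slug=\2"),
]


def rewrite(path: str) -> str:
    """Return the internal URL for a public path, applying only the first
    matching rule (similar in spirit to mod_rewrite's [L] flag)."""
    for pattern, target in REWRITE_RULES:
        if pattern.match(path):
            return pattern.sub(target, path)
    return path  # no rule matched: serve the path unchanged


print(rewrite("/products/shoes/42"))  # /product.php?cat=shoes&id=42
print(rewrite("/about.html"))         # /about.html
```

The spider only ever sees /products/shoes/42, so the parameter count and any session-style values stay out of the URL it indexes.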
[…] Note: If you are using mod_rewrite and a robots.txt file, robots that respect the robots file will check it first. So if the whole site is excluded in the robots file while mod_rewrite does a 301 redirect to another site, those robots will obey the exclusion in the robots file and, in theory, never see the redirect. Robots files can get a little tricky, especially if you are using mod_rewrite to generate essentially ‘virtual’ directories and want to stop certain directories from being indexed when that directory does not really exist. I’m still trying to get to the bottom of this little intricacy. Basically, you can put in an entry for the ‘virtual’ directory, but it isn’t going to stop the bots from indexing the final destination URL (the mod_rewritten one). That said, if you have no links to the final destination URL and it is only ever reached by another URL being mod_rewritten, I think you should be fine just excluding the first URL. Useful links: […]
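Python’s standard urllib.robotparser module can be used to check both situations from the note. In the sketch below, the robots.txt contents, the /shop/ ‘virtual’ directory and the /catalog.php destination script are hypothetical. The first parser shows that a whole-site exclusion stops a compliant robot before it ever requests the page (and so before it could see a 301); the second shows that excluding the virtual directory says nothing about the rewritten destination URL.

```python
from urllib.robotparser import RobotFileParser


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical scenario 1: the whole site is excluded, while mod_rewrite
# would answer every request with a 301 to another site.
whole_site_blocked = make_parser("User-agent: *\nDisallow: /\n")

# Hypothetical scenario 2: /shop/ is a virtual directory that mod_rewrite
# maps onto /catalog.php, which robots.txt does not mention.
virtual_dir_blocked = make_parser("User-agent: *\nDisallow: /shop/\n")

# A compliant robot never requests the page, so it never sees the redirect.
print(whole_site_blocked.can_fetch("*", "https://example.com/old-page.html"))   # False

# The virtual URL is excluded, but the rewritten destination is not.
print(virtual_dir_blocked.can_fetch("*", "https://example.com/shop/shoes"))            # False
print(virtual_dir_blocked.can_fetch("*", "https://example.com/catalog.php?cat=shoes"))  # True
```

As the note says, the second case only matters if something links to /catalog.php directly; if it is only ever reached through the rewrite, excluding /shop/ is what the robot acts on.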