Google doesn't spider "id=" in URLs


Moderator: Thorsten

brianoz
Posts: 7
Joined: Tue Mar 22, 2005 2:17 pm
Location: Melbourne, Australia

Google doesn't spider "id=" in URLs

Post by brianoz »

While reading about spidering and SEO last night, I learned that Google doesn't spider pages with "id=" in the URL, although it will spider PHP scripts, which surprised me!

I know there's some new stuff coming in 1.5.0 that will help (using mod_rewrite etc.), but a really simple fix would be to remove the "id=" strings from the URLs. Google would then work immediately, even without mod_rewrite. Remember, not every server has mod_rewrite installed, and even where it is installed it may not be available to users.

So, any chance of getting the "id=" strings renamed (ident=, artno=, page=, anything like that)? The change would be trivial.
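
To illustrate how small the rename is, here is a minimal sketch. It is written in Python purely for illustration (phpMyFAQ itself is PHP), and the helper name rename_query_param, the replacement name "artno", and the example URL with its "action" and "cat" parameters are all assumptions, not phpMyFAQ's actual code or URL layout.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def rename_query_param(url: str, old: str = "id", new: str = "artno") -> str:
    """Return `url` with the query parameter `old` renamed to `new`.

    Hypothetical helper: only exact key matches are renamed, so a key
    such as "sid" is left untouched.
    """
    parts = urlsplit(url)
    pairs = [
        (new if key == old else key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Example URL is made up for this sketch.
print(rename_query_param("http://example.com/faq/index.php?action=show&cat=4&id=121"))
# http://example.com/faq/index.php?action=show&cat=4&artno=121
```

The same idea applies wherever the links are generated: the name has to change both where URLs are built and where the incoming parameter is read.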
Thorsten
Posts: 15746
Joined: Tue Sep 25, 2001 11:14 am
Location: #phpmyfaq

Post by Thorsten »

Hi,

I didn't know that. Do you have a URL for me where you found this information?

Changing the id to something similar is not a big change, but it is something for version 1.5.1.

bye
Thorsten
phpMyFAQ Maintainer and Lead Developer
amazon.de Wishlist
brianoz
Posts: 7
Joined: Tue Mar 22, 2005 2:17 pm
Location: Melbourne, Australia

Post by brianoz »

Yes, I was surprised by this too, but it's from a very popular thread on sitepoint.com, a highly reputable source, and it's consistent with Google's own comments:
"Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behaviour, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page."
http://www.sitepoint.com/forums/showthread.php?t=182915
Why doesn't Google index all of my site's pages? Why does Google update some pages more often than others?
...
3) Your site uses dynamic pages/session IDs and they are not search engine friendly. Session IDs are to search engines like garlic is to vampires. They repel them. Have your site remove them when the bots come around and you will fare much better (yes, this is ethical and acceptable to the search engines). Query strings that are either very long or contain "id=" tend to limit the number of pages some search engines will index (Google is a good example of this). Change "id=" to "page=" or something similar and you should do better. Or read the article about search engine friendly web pages.
Thorsten
Posts: 15746
Joined: Tue Sep 25, 2001 11:14 am
Location: #phpmyfaq

Post by Thorsten »

Hi,

Thanks, I plan to change the ids in 1.5.1 or 1.5.2.

bye
Thorsten
brianoz
Posts: 7
Joined: Tue Mar 22, 2005 2:17 pm
Location: Melbourne, Australia

Post by brianoz »

After posting this I realised that changing to record= will only work for Google; apparently it still doesn't work for the others.

An alternative technique that could work for nearly all search engines is using PATH_INFO via URLs such as:

http://www.whitedoggreenfrog.com/faq/index.php/en/4/121

You could vary this slightly as:

http://www.whitedoggreenfrog.com/faq/in ... -4/art-121

Sitepoint article which talks about this:
http://www.sitepoint.com/article/search ... endly-urls

This isn't such a trivial change, but I thought I'd mention it as it should solve the problem for good. What the search engines stop on is the '?'.
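
As a rough sketch of what handling such a URL involves, the snippet below splits a PATH_INFO value like "/en/4/121" into its parts. It is Python for illustration only (phpMyFAQ is PHP). Reading the three segments as language, category and article is my assumption, and FaqRoute and parse_path_info are hypothetical names.

```python
from typing import NamedTuple


class FaqRoute(NamedTuple):
    """Hypothetical container for the three path segments."""
    lang: str
    category: int
    article: int


def parse_path_info(path_info: str) -> FaqRoute:
    """Split a PATH_INFO string such as "/en/4/121" into its parts.

    Assumes exactly three segments: language code, numeric category,
    numeric article. Anything else raises ValueError.
    """
    segments = [segment for segment in path_info.split("/") if segment]
    if len(segments) != 3:
        raise ValueError(f"expected 3 path segments, got {len(segments)}")
    lang, category, article = segments
    return FaqRoute(lang, int(category), int(article))


print(parse_path_info("/en/4/121"))
# FaqRoute(lang='en', category=4, article=121)
```

Because nothing after the script name uses '?', the URL looks like a static path to a crawler, while the script still receives all the values it needs.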

Thanks for a great product Thorsten, wishlist items on the way as soon as I can find a German speaker to help me navigate Amazon.de :)
Thorsten
Posts: 15746
Joined: Tue Sep 25, 2001 11:14 am
Location: #phpmyfaq

Post by Thorsten »

Hi,

PATH_INFO does not work with PHP-CGI, so it's not an option. mod_rewrite works quite well and is available in 1.5.

bye
Thorsten
rickray
Posts: 1
Joined: Wed Dec 07, 2005 9:55 pm
Location: Portland, OR, USA

Indexing Dynamic URLs

Post by rickray »

Most dynamic pages are now indexed by Google. The other major search engines are making this change as well. Just limit the number of parameters in your URL.

Google writes:
"We're able to index dynamically generated pages. However, because our web crawler could overwhelm and crash sites that serve dynamic content, we limit the number of dynamic pages we index. In addition, our crawlers may suspect that a URL with many dynamic parameters might be the same page as another URL with different parameters. For that reason, we recommend using fewer parameters if possible. Typically, URLs with 1-2 parameters are more easily crawlable than those with many parameters."

See http://www.google.com/webmasters/2.html for details.
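
A quick way to check your own URLs against that guideline is to count the query parameters. The sketch below is Python for illustration only. The limit of 2 comes from the "1-2 parameters" wording quoted above and is a rule of thumb, not a hard limit published by Google. The function names and example URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def count_query_params(url: str) -> int:
    """Return the number of key=value pairs in the URL's query string."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def is_easily_crawlable(url: str, max_params: int = 2) -> bool:
    """Rule of thumb only: True if the URL has at most `max_params` parameters."""
    return count_query_params(url) <= max_params


# Example URLs are made up for this sketch.
print(is_easily_crawlable("http://example.com/faq/index.php?cat=4&artno=121"))
# True
print(is_easily_crawlable("http://example.com/faq/index.php?sid=9&lang=en&cat=4&artno=121"))
# False
```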