Learn Success With Randy Brown

Do it right the first time, stoopit.

Drupal: Help Avoid Duplicate Content

Anybody who knows anything about SEO will tell you over and over to avoid duplicate content. Usually this means don’t copy and paste other people’s work onto your website, and don’t buy turnkey sites that display feeds of article submissions, etc.

But if you use Drupal, there is another way to get hit with a duplicate content penalty without even knowing it: the dreaded trailing “/”. That’s right, the / (forward slash) character. As it turns out, Drupal serves the same page at both www.yoursite.com/a-page and www.yoursite.com/a-page/, and search engines treat those as two different URLs. This means that if a search-engine bot reaches your site via a link with a trailing “/”, it could index several duplicate pages or, at worst, a duplicate of your entire site.

I had never heard of this potential issue with Drupal, so big kudos to my new friend Alex of pitumbo.com, whom I met at the April WEMUG meeting. Alex pointed me to this article at blamcast.net that explains it better than I ever could (please take a minute to read it and give it a Digg).

Basically, the trick is to use a 301 redirect to remove the trailing slash from all your URLs. According to Blamcast.net, the code to put in your .htaccess file looks something like this:

# Remove trailing slashes
RewriteCond %{HTTP_HOST} ^(www\.)?yourdomain\.com$ [NC]
RewriteRule ^(.+)/$ http://%{HTTP_HOST}/$1 [R=301,L]

I quickly added this code to my .htaccess file, loaded a few test pages and, to my surprise, it did not work. I suspect the www-to-non-www redirect I use to strip “www” from all my URLs was interfering, so after some tinkering I ended up with this version, which seems to work:

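# Remove trailing slashes
# Note: the negated condition below is true for any host other than the
# literal “.grownupgeek.com”, so the rule effectively applies to every host.
RewriteCond %{HTTP_HOST} !^\.grownupgeek\.com$ [NC]
RewriteRule ^(.+)/$ http://%{HTTP_HOST}/$1 [R=301,L]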
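
If you also run a www-to-non-www redirect, one way to make the two rules cooperate is sketched below. It is a sketch under stated assumptions, not a drop-in file: example.com is a placeholder domain, the site is assumed to answer only on example.com and www.example.com, and example.com (no www) is assumed to be the preferred form. The `!-d` condition is an addition not found in either version above; it skips real directories, because Apache’s mod_dir adds their trailing slash back and the two would otherwise bounce the visitor in a redirect loop.

```apache
# Hypothetical example: canonical host is example.com (no www).
# Replace example.com with your own domain.

# 1. Strip a trailing slash and send the request straight to the
#    canonical host, so www.example.com/a-page/ needs only one 301.
#    Real directories are skipped, since mod_dir would add the slash
#    back and cause a redirect loop.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ http://example.com/$1 [R=301,L]

# 2. Send any remaining www.example.com request to example.com,
#    keeping the path.
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```

The ordering is a design choice rather than a requirement: with the www rule first, a URL like www.example.com/a-page/ still ends up at example.com/a-page, just after two redirects instead of one.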

If you’re using Drupal and are looking to squeeze every bit of SEO from it, I recommend this simple change. Thanks to blamcast.net for putting it out there.
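
Where the rule goes in the file matters. The sketch below assumes the stock Drupal .htaccess, which wraps its rewrite rules in an `<IfModule mod_rewrite.c>` block that begins with `RewriteEngine on`; the redirect should sit after that line and before Drupal’s own rules, so the 301 is issued before the request is handed off to Drupal. The domain is the same yourdomain.com placeholder used above.

```apache
<IfModule mod_rewrite.c>
  RewriteEngine on

  # Trailing-slash redirect: after "RewriteEngine on" and before
  # Drupal's own rules, so the 301 is sent first.
  RewriteCond %{HTTP_HOST} ^(www\.)?yourdomain\.com$ [NC]
  RewriteRule ^(.+)/$ http://%{HTTP_HOST}/$1 [R=301,L]

  # ... Drupal's existing RewriteCond/RewriteRule lines continue here ...
</IfModule>
```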


Categories: Drupal - SEO - The Site
pitumbo

Thanks for the kudos ;-)

I had the same problem you did with the code, since I too had set up the www -> no-www redirect before dealing with the slashy issue. If you (or your readers) have any websites running WordPress, it too has issues with the trailing slash leading to duplicate content.

2 May 07 at 10:15
nexy

Very interesting. I use Drupal on my site and I did not know about this issue. Thank you for sharing.

7 May 07 at 00:04
wethead

This is a great little tweak. Thanks for sharing!

I just used it on my site and it worked well!

15 September 07 at 18:58
chefbrad

I was going to tackle this issue myself, but I couldn’t get either version of the code to work for me. Plus, I notice your site now has a trailing slash on all your URLs. Any reason for the switch?

20 August 08 at 15:45
albspotter

Thanks for this code. I had the same problem and this helped to solve it.

3 March 09 at 01:58
rob

Hello,

That first example should probably use

RewriteCond %{HTTP_HOST} ^(www\.)?yourdomain\.com$ [NC]

Notice the “\” on the “.” by the www. This could have been causing some of your problems.

27 May 09 at 01:11
John Hoff

Thanks for the code – worked like a charm for my .html site.

29 August 09 at 08:25
Martijn

Your code snippet does work like a charm :) Not just for Drupal, but also in our company framework. One minor detail: the part where you say ‘ – That’s right, the / (backslash) character’ isn’t right at all! / would be the (forward) slash; \ is a backslash :P Still love you tho, this post saved me a lot of twiddling around with .htaccess ;)

28 September 09 at 05:47