Who’s Reading Your Sitemap?

Is your sitemap being used against you by scrapers to steal your content?  Mine was, until I started taking steps to protect it.

What is a Sitemap?

A sitemap, simply stated, is a file that lists every page on your website, making it easier for search engines to find and index your pages once you make it available to them. A sitemap can be a simple flat text file, but these days it is usually a dynamically created and updated XML file.
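To make the XML format a little more concrete, here is a minimal sketch of building a sitemap by hand in Python. The function name `build_sitemap` and the example.com URLs are made up for illustration; the `urlset`/`url`/`loc` structure and the namespace come from the sitemaps.org protocol. A real module or plugin would normally add optional fields such as `lastmod` as well.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & when serializing.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages; replace with your own list of URLs.
    print(build_sitemap(["http://example.com/", "http://example.com/about"]))
```

Writing the returned string to a file in your web root gives you a static sitemap, which is roughly what the online generators do for you.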

Contrary to popular belief, a sitemap will not increase your search engine rankings and it won’t get your site indexed any faster. A sitemap will, however, help to ensure that all of your pages get indexed by search engines. This is particularly important for pages that have no links to them. Ensuring that all of your pages are indexed is an important piece of an overall SEO strategy and, although not 100% necessary, most SEO experts recommend submitting a sitemap to at least the Big-3 (Google, Yahoo, and MSN/Live).

Drupal has a great module to generate the XML sitemap for your site, and WordPress has a few plugins also.  If you have a static website or if your CMS or blog does not have a plugin or module for creating a dynamic (always updated) sitemap you can use a site like XML-Sitemaps.com to generate a static sitemap for you.

Once you have your sitemap, submitting it to the Big-3 (or others) is usually pretty easy. Here are the URLs for submitting your sitemaps to the search engines that you probably care about. Some might require you to sign up for an account, but all are free. After you sign up, most of them offer diagnostic tools and other free services that can help with sitemap errors, as well as reports and information about your index status, search statistics, and more. Each has a slightly different process for submitting a sitemap, so be sure to read the instructions, but all are pretty easy:

You’ve submitted your sitemap, so what could go wrong?

One day while going through my logs, I saw that an IP not belonging to any search engine had downloaded my sitemap. I started watching my logs more closely, and I was seeing many non-search-engine IPs downloading my sitemap every day – oddly enough (not!), most of these IPs were from countries like India, China, and Russia – but they were basically coming from everywhere. After a little more investigation, I found these same IPs reading hundreds and thousands of my pages – what I realized is that these “scrapers” had downloaded my sitemap, then used it to crawl through and copy every single page on the site. These scrapers were not only stealing my content for use on some MFA (Made for AdSense) website, but they were also sucking up huge amounts of my (very expensive) bandwidth. This is probably common sense/common knowledge to experienced webmasters, but for me it was one of those “ah-ha!” moments when I realized what was happening.
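If you would rather not eyeball the logs, the same check can be roughly scripted. The sketch below is not what I actually ran; it assumes an Apache-style access log (IP first, request in quotes), a sitemap at /sitemap.xml, and that a real crawler's IP reverse-resolves to a hostname under one of the listed domains and resolves forward to the same IP. The function names and the suffix list are my own assumptions, so check each search engine's current documentation before trusting them.

```python
import re
import socket
from collections import Counter

# Assumed log layout: IP, two fields, [timestamp], then "METHOD path ...".
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (\S+)')

# Assumed crawler hostname suffixes; verify against each engine's docs.
CRAWLER_SUFFIXES = (".googlebot.com", ".google.com",
                    ".crawl.yahoo.net", ".search.msn.com")


def sitemap_downloads(log_lines, sitemap_path="/sitemap.xml"):
    """Count sitemap requests per client IP."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).split("?")[0] == sitemap_path:
            hits[match.group(1)] += 1
    return hits


def _reverse(ip):
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    # IPv4 only; returns the list of addresses for the hostname.
    return socket.gethostbyname_ex(host)[2]


def is_known_crawler(ip, reverse=_reverse, forward=_forward):
    """Reverse-resolve the IP, check the suffix, then confirm forward."""
    try:
        host = reverse(ip)
        return host.endswith(CRAWLER_SUFFIXES) and ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

Any IP in `sitemap_downloads(...)` for which `is_known_crawler(ip)` is false is a candidate for a closer look. The forward lookup matters because anyone can fake a user-agent string, and a reverse DNS record on its own can be set to anything by whoever controls the IP block.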

There are a few things I do to try and stop these bandwidth-sucking scraper-leeches. Back when we were still using Drupal 4.7, we used the GsiteMap module for Drupal. The GsiteMap module used a non-standard sitemap name instead of the standard domainname.com/sitemap.xml path. Just the fact that the sitemap name was non-standard apparently fooled many scrapers. So if possible, changing your sitemap name to something other than sitemap.xml will thwart many scrapers.

Since we’ve upgraded to Drupal 5 and started using the newer XML Sitemap module, we can’t (easily) change the name of the sitemap, so we immediately saw a huge increase in sitemap downloads and site-scraping. To combat them, we keep an eye on the logs: the XML Sitemap module records an entry each time the sitemap is downloaded, along with the domain name and IP that downloaded it. If it wasn’t an IP that belongs to a search engine, I use the Drupal Troll module to block that IP. For WordPress you could use the Ban plugin, and for any site you could also block the IP in .htaccess manually or via cPanel/WHM if you have it.
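For the manual .htaccess route, here is a small sketch that turns a list of offending IPs into a block of deny rules. It assumes the Apache 2.2-style `Order`/`Allow`/`Deny` directives (newer Apache 2.4 setups use `Require` rules instead), and the function name and sample addresses are hypothetical.

```python
import ipaddress


def htaccess_deny_block(ips):
    """Build Apache 2.2-style deny rules for the given IP addresses."""
    seen = []
    for ip in ips:
        # ip_address() raises ValueError for anything that is not an IP,
        # which keeps stray text out of the .htaccess file.
        normalized = str(ipaddress.ip_address(ip.strip()))
        if normalized not in seen:
            seen.append(normalized)
    lines = ["Order Allow,Deny", "Allow from all"]
    lines += [f"Deny from {ip}" for ip in seen]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Documentation-range addresses used purely as placeholders.
    print(htaccess_deny_block(["203.0.113.9", "198.51.100.7"]))
```

With `Order Allow,Deny`, a request that matches both an Allow and a Deny line is rejected, so everyone gets in except the listed IPs. Paste the output into your .htaccess by hand rather than letting a script overwrite the file, since Drupal and WordPress both keep their own rules in there.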

With Drupal you can also easily see scraping behavior by viewing “Top Visitors” in admin/logs. You can spot the scraper because it’s the IP that has 10,000 page views (or some other very large number) in the last xx hours. I verify that these IPs with unusually high page reads do not belong to a legitimate search engine, then I ban them. I don’t really worry about banning a real visitor because these scrapers usually have so many more page reads than a regular visitor that they really give themselves away and stand out like a sore thumb.
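If you are not on Drupal, a similar “Top Visitors” list can be approximated from a raw access log. This is only a sketch: it assumes each log line starts with the client IP, that the lines you pass in already cover the time window you care about, and that every request counts as a page view (images and CSS included). The function name and the default threshold are illustrative.

```python
from collections import Counter


def heavy_readers(log_lines, threshold=10000):
    """Return (ip, requests) pairs at or above the threshold, busiest first."""
    views = Counter(
        line.split(" ", 1)[0] for line in log_lines if line.strip()
    )
    return [(ip, count) for ip, count in views.most_common()
            if count >= threshold]


if __name__ == "__main__":
    # Tiny made-up sample; a real log would have thousands of lines.
    sample = ["203.0.113.9 - - x"] * 5 + ["198.51.100.7 - - x"] * 2
    print(heavy_readers(sample, threshold=3))
```

Anything this turns up still needs the search engine check before banning, since a legitimate crawler can easily top the list too.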

I know that doing these things is a bit like spitting into the wind, or like trying to clean a beach one grain of sand at a time, but it does help – and it makes me feel like I have a bit more control over MY content, MY site, and MY bandwidth. YMMV.

Mixed Messages From Google

A few weeks ago, around June 4, we suffered a bit of a Google-Slap. Overnight, all of our keywords that were ranking at #1 to #5 dropped to page two or worse – correspondingly, traffic from Google dropped by at least 50%, along with AdSense and Kontera earnings and my general sense of self-worth.

Google slaps the shit out of me

Luckily, as you can see, the drop in traffic from Google was temporary, and after about 2 weeks we recovered by about 99%. Sudden, unexpected (and undeserved?) drops in traffic from Google underscore the need for diversifying traffic sources – something I’ve been working hard to do for the last two years, but Google still accounts for about 70% of our traffic – a huge chunk.

During the pain and anguish of the GoogleSlap I was paying much closer attention to my Webmaster Tools console, looking for any signs of being de-indexed or other communications from Google, when I noticed something new. Webmaster Tools was telling me that the site is indexed, something also reflected by doing a “SITE:GrownUpGeek.com” query, which indicates over 7,000 pages are indexed – but just below that was an error saying that NONE of the pages listed in my sitemap are included in the index – something that is totally contrary to being virtually fully indexed.

Google Webmaster Tools Info

Since the beginning of June there have been many discussions at the webmaster forums about sudden drops in SERPs (like mine), PR movement, etc. – so it’s clear that Google has been up to something. Maybe this is just a confusing side-effect of Google tweaking things. If you have an explanation or theory about what Google is up to, or why I get this conflicting info from the Webmaster Tools console, please post a comment and fill me in.

SearchFeature.Com Interview

Last month I was interviewed by SearchFeature.com. Search Feature has done several interviews in the last few months with the likes of Barry Schwartz, Seth Godin, Rand Fishkin and even Mr. Search Engine himself, Danny Sullivan. So for me, to be on the same page as these guys is quite an honor.

The interview is a quick read and, since they are an SEO site, most of the questions are SEO related and focus on how I was able to learn my SEO techniques and become as relatively successful as we have in such a short time.

You can read the full interview here: SearchFeature.com Interview with Randy Brown.

If you like the interview, please give them a Sphinn:

Why I Out-Rank Mubin Ahmed for the Term “Mubin Ahmed”

Seems that lately this blog has been ranking #1 on Google for the term “Mubin Ahmed”. I would like to say that the reason I outrank Mubin’s blog for his own name is because of my ninja-like SEO skills – but the truth is it’s probably because Mubin had some server problems a few weeks ago and he’s still recovering from a Google bitch-slap. I’m sure that in a week or two he’ll be back on top for the term “Mubin Ahmed”... unless I keep making keyword-stuffed posts like this one, full of the words Ahmed and Mubin :-)

If you’re looking for Mubin Ahmed’s blog, click on over to www.mubinahmed.com

Does Your Website Have Enough Personality?

[Brand] Personality: The image or identity of a brand expressed in terms of human characteristics, such as young, old, warm, etc. Identifying a brand’s personality helps target consumers to relate more closely with the brand.

Today I had some time to mind-meld with my iPhone and was finally able to listen to a few podcasts. One of my favorites, SEO 101 | The Beginning SEO Podcast touched on the subject of a website’s personality and how important it can be in an overall SEO strategy, and how it seems like so many webmasters overlook it – focusing instead only on keywords, meta tags, page-titles, etc. Because what the hosts said on the podcast was almost exactly what I was thinking, word for word, as I was building GrownUpGeek.com, and because I think that website-personality played such an important role in the success of GrownUpGeek.com I thought I would share it.

On the podcast they brought up how you can have the best SEO in the world, rank high on your keywords, and yet it can all go to waste if a visitor comes to your site only to immediately leave (bounce) because your website looks like nothing more than a spam-farm, or simply lacks any ‘personality’ and looks like it was generated by a machine, or like every other canned website out there. They went on to discuss how writing unique content with your own ‘twist’ or building a community gives a site personality, and it’s that personality that keeps a visitor interested resulting in more page views and hopefully even return visits. Even websites that sell products could be made more interesting by writing unique product descriptions instead of using the same old canned descriptions that every other website selling the same product uses. A good example of this is W00t.com.

When I started building GrownUpGeek.com I knew the exact type of people I was building it for – beginners with computers and the internet. So from the very start the “personality” of the site was “simple”, “inviting”, and “helpful” – and the entire time I built it the way I would like to see a website: not overcrowded with advertisements, easy to navigate, and easy to read. Almost every day, I looked through the entire site asking myself, “If I came across this website, would I like it?” During the entire building process (several months) I never forgot who my primary target was, and kept them in mind with every change I made.

Later, I realized that by giving the site a female ‘persona’, “The Geek” (after my wife), it would appeal more to both female and male visitors, something that a more masculine looking website may not be able to pull off. I also thought that the softer feminine feel would be more welcoming to people new to the internet, or shy about using computers.

As the website began to morph into a community, we tried hard to maintain a feeling of helpfulness, openness, friendliness and camaraderie. As time went on and the community grew (we just broke 15,000 members this week), this strategy really paid off, resulting in a lot of word of mouth (viral) advertising and members affectionately referring to themselves as “GuGies”.

I’m not suggesting that you run out and make your website “girly” to try and attract or maintain visitors. Instead, take a moment to think about the type of visitors that you are targeting, and give your website(s) a personality compatible with them. Adding this type of thinking to your overall SEO and marketing strategy can make the difference between a one-time “bounce” and a long-term returning customer or member.

What Is YOUR SEO “Score”?

While I was checking my domain registration info at domaintools.com I noticed a few new features that weren’t there when I checked last year. One of those features is the “SEO Score”. This tool attempts to look at your domain the way that Google sees it and rank it based on several factors.

For fun I decided to compare my SEO score to a few of the “experts” and other big websites and list them here for entertainment purposes:

Another new feature, which is still in beta, is their SEO Text Browser. This tool looks at your site the way Google sees it, and it also makes suggestions to correct common errors and mistakes that might hurt your Google SEO ranking. If you’re new to SEO it might be worth the 2 minutes that it takes to play with this tool. To learn how to use the SEO Text Browser tool, go to the SEO Text Browser Instruction page.

If any of you have a higher SEO rank than me (96%) or problogger.net (97%), post a comment and brag about it!