Over the last few months I have written a few posts about fighting spammers and scrapers with custom ModSecurity rules, and in those posts I always add the warning: Watch your logs closely so that you don’t inadvertently block real humans. This raises the question: What do you check your logs for?
How to check an IP address
These are some of the things that I look at when determining if an IP address belongs to a spammer, a scraper, or a real human being. There is no hard and fast rule for knowing how legitimate every IP address is, but after you’ve done this for a while, you start to get pretty accurate.
The first thing to do is get the IP address that you’re dealing with. If you’re using WordPress, the UserOnline plugin will display the IP addresses of recent visitors. If you use Drupal, there are also a few modules for this, but I usually just look for the errors that scrapers and spammers often generate in the logs. Of course, the best way is to look into your raw web server logs.
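If you want to pull candidate IP addresses out of the raw logs, something like the following Python sketch can help. It assumes the log is in the Common or Combined Log Format that Apache and Nginx write by default, with the client IP as the first field; the function names `summarize_ips` and `busiest` are made up for this example. It counts hits per IP and how many of those hits were 4xx/5xx errors, which is only a starting point for deciding which addresses deserve a closer look.

```python
import re
from collections import Counter

# Assumption: Common/Combined Log Format, client IP in the first field.
# Lines that do not match (custom formats, odd quoting) are skipped.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "([^"]*)" (\d{3}) ')

def summarize_ips(lines):
    """Return {ip: Counter of HTTP status codes} for the given log lines."""
    summary = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        ip, _request, status = match.groups()
        summary.setdefault(ip, Counter())[status] += 1
    return summary

def busiest(summary, top=10):
    """List (ip, total_hits, error_hits), sorted by total hits, descending."""
    rows = []
    for ip, statuses in summary.items():
        total = sum(statuses.values())
        # Count client and server errors (status codes starting with 4 or 5).
        errors = sum(n for code, n in statuses.items()
                     if code.startswith(("4", "5")))
        rows.append((ip, total, errors))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]
```

To use it, open your access log and pass the file object to `summarize_ips`, then print the rows that `busiest` returns. An IP with a high hit count or mostly error responses is not proof of anything, just a candidate for the checks that follow.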
Once I get a suspicious IP address, I generally run it through a few online lookups to start determining whether it’s a spammer/scraper or an actual human with a soul. These are some of the checks I do, and what I look for:
- WHOIS: I run a whois check, usually at whois.domaintools.com. I look for where the IP is located and whether or not it looks like a server. As an example, let’s look at: 22.214.171.124 – this IP is registered to UbiquityServers.com, so it is very likely to be a scraper and very unlikely to be someone sitting at home surfing the web. It is also possible that it’s a legit search engine, but you can usually tell that by the server name. Another thing that I look at is the country of origin. My websites cater to visitors in the United States, so an IP from Poland, Russia, Turkey, etc., is automatically suspicious. Based on my data, China, India, Israel, and Brazil have the highest statistical chance of being bad for your website.
- GOOGLE: Next I’ll do a simple Google search on the IP address. A real human/home IP address will typically return only 5-25 hits on Google. A spammer or scraper IP address will often return 100-50,000 results. The more hits, the more suspicious. I also look at the first few results. If you Google Search our friends at 126.96.36.199, you will see Project HoneyPot and StopForumSpam.com both in the top 5 results. This virtually ensures that this is not an IP that you want hanging around your website. Another thing to look for is multiple “wiki” entries made by the IP address. This usually indicates that the IP is a proxy of some type, and cowards have been hiding behind it to screw up online Wiki entries.
- BLACKLISTS: Usually between the WHOIS and Google search, I’ve already made up my mind about an IP address, but when I’m not sure, I will sometimes resort to blacklist checks. My favorite IP blacklist checker is multirbl.valli.org. A quick run of 188.8.131.52 at multirbl.valli.org returns several hits (listed in red), sealing the deal – not human, or if it is a human, it has no soul!
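The blacklist check and the "does it look like a server" check from the list above can also be roughed out in a few lines of Python. This is only a sketch of the general DNS blacklist convention (reverse the IPv4 octets, append the blacklist's zone, and see whether the name resolves); it is not a description of how any particular checker site works. The zone `dnsbl.example.org` is a placeholder, and the function names are invented for this example. Substitute a blacklist whose usage policy allows your queries.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name used to look up an IPv4 address in a blacklist zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the blacklist zone has an A record for this IP.

    Caveat: a failed lookup (timeout, refused query) is also reported
    as "not listed", so treat False as "no evidence", not "clean".
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

def reverse_name(ip, lookup=socket.gethostbyaddr):
    """Return the reverse-DNS hostname for an IP, or None if there is none."""
    try:
        return lookup(ip)[0]
    except (socket.herror, socket.gaierror):
        return None

# Hypothetical usage; "dnsbl.example.org" is a placeholder zone:
# print(reverse_name("192.0.2.10"))
# print(is_listed("192.0.2.10", "dnsbl.example.org"))
```

The reverse-DNS name is a rough stand-in for the "server name" clue: a hosting-company hostname points toward a server, while an ISP-style name points toward a home connection. Neither result replaces looking at the WHOIS record and the search results yourself.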
Like I wrote at the beginning, there is usually no single black-and-white way of knowing for sure if an IP address is one you want at your website or not. But with a little practice running known bad IP addresses and known good IP addresses through just these three tests, you will start to get a feel for it. For a good IP address, try your own IP, or a variation of it (i.e., change the last few numbers). If you want some known bad IP addresses to test, let me know, I’ve got plenty!
How To See If An IP Belongs To A Spammer or Scraper by Rand Wilson