I recently worked on a competition component for a client’s website. About a month into the competition, we noticed that the number of people in the database (about 12,000) far exceeded the number of hits on the form that’d been picked up by Google Analytics (about 2,000). Hmm.
My first thought was that Google Analytics wasn’t working, so I switched on Apache logging, set up AWStats and started comparing what AWStats had to say with Google Analytics after a week or so.
To my surprise, the discrepancy was negligible (Google Analytics was very slightly lower), and could be accounted for by the users with JavaScript turned off, since Google doesn’t pick up non-JS users without some hacking about.
So, if the server logs and Google Analytics added up, what was going wrong? There can’t be someone bypassing our form and punching data directly into the Experian CheetahMail database, can there?
As it happens, no.
On a quick scan, the data that’d been captured looked perfectly legit (well-formed names, e-mail and postal addresses), so we initially ruled out a bot pumping in spam. It wasn’t until we’d started to double-check everything that we found the problem.
The e-mail addresses, names, and postal addresses all appeared legitimate but actually weren’t. A number of domains kept on cropping up:
- ontel.org.uk
- 1-bt.co.uk
- 00online.co.uk
- telonline.org.uk
- tisca.co.uk
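If you want to spot this kind of clustering in your own data, tallying entries per e-mail domain is a quick first pass. What follows is only a rough sketch, not what we actually ran: the function name `domain_counts` and the sample list are made up for illustration, and in practice you’d feed it the e-mail column from your own database.

```python
from collections import Counter


def domain_counts(emails):
    """Return (domain, count) pairs, most frequent domain first."""
    domains = (
        e.rsplit("@", 1)[1].strip().lower()
        for e in emails
        if "@" in e
    )
    return Counter(domains).most_common()


# Illustrative sample only: the last address is invented for this sketch.
sample = [
    "c.bevins.11285@1-bt.co.uk",
    "h.burke.1961@ontel.org.uk",
    "jane@example.com",
    "a.nother.42@1-bt.co.uk",
]

for domain, count in domain_counts(sample):
    print(domain, count)
```

A domain you’ve never heard of sitting near the top of that list is worth a closer look.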
The account names for the addresses were beginning to look a little suspect too:
- c.bevins.11285@1-bt.co.uk
- h.burke.1961@ontel.org.uk
- v.dunbar.2533@tisca.co.uk
- j.clarke.137@00online.co.uk
Closer inspection revealed that, for entries with e-mail addresses at these domains, the street names entered didn’t even match the street names their postcodes resolved to.
Lo and behold, a quick WHOIS on the offending domains revealed a single company: Win24 Ltd.
Win24 Ltd., it transpires, seem to be in the business of entering competitions en masse, on your behalf, and for a fee. They hammered our competition form for two days solid (shortly before we switched on Apache logging), and filled our database with extraneous entries (which have now been deleted, and the domains blacklisted).
So, if you see a ton of entries in your DB from said domains, that’s why.
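For what it’s worth, a check along these lines could be bolted on to a form handler or run over existing entries. It’s a minimal sketch under a few assumptions: the names `SUSPECT_DOMAINS`, `LOCAL_PART` and `classify` are mine, the domain set is just the list above, and the initial.surname.number pattern is only a hint, since plenty of real people have addresses shaped like that. That’s why a pattern match alone is marked for review rather than blocked.

```python
import re

# The domains listed earlier in this post.
SUSPECT_DOMAINS = {
    "ontel.org.uk",
    "1-bt.co.uk",
    "00online.co.uk",
    "telonline.org.uk",
    "tisca.co.uk",
}

# initial.surname.digits, e.g. "c.bevins.11285" (a heuristic, not proof).
LOCAL_PART = re.compile(r"^[a-z]\.[a-z]+\.\d+$")


def classify(email):
    """Return "blocked", "review" or "ok" for a submitted e-mail address."""
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep:
        return "review"  # no "@" at all: needs a human to look at it
    if domain in SUSPECT_DOMAINS:
        return "blocked"
    if LOCAL_PART.match(local):
        return "review"
    return "ok"


for address in ("v.dunbar.2533@tisca.co.uk", "jane@example.com"):
    print(address, classify(address))
```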
Footnote:
Apache access logging was switched off because doing so improves the site’s performance noticeably, and there was no CAPTCHA on the form because we deemed it a barrier to entry.
Update:
The mother of all competition websites has an article on the same subject.