Friday, February 4, 2011

ATTACK OF THE KILLER ROBOTS!!! (or at least spam-bots)

Spam - whether it's the e-mail kind or the (supposedly) edible kind, it ain't good.  I just opened my spam e-mail folder and found dozens of messages with lines like "I NEED YOUR URGENT ASSISTANCE IN TRANSFERRING THE SUM OF (USD$13.7) MILLION DOLLARS ..."

Quite a few users have registered on our journal's website.  This made me happy, until I read the detailed information on these users and found some trends.  Their first, middle, and last names tended to be identical gibberish (e.g., CAT41 CAT41 CAT41), their phone numbers were all 123456, and their countries were all Afghanistan (the first country on the alphabetical drop-down list).  Yup, these were spam-bot registrations, which made me sad.  I don't know what the goal of these spammers was, but there were often links to a handbag company's website in their information, so I guess they make money from that website's sales or web traffic.  So how do we get rid of spammers?
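For anyone who wants to weed out registrations like these after the fact, here is a rough Python sketch of the three warning signs described above.  It is only an illustration: the `Registration` record, its field names, and the "two signs or more" threshold are my own assumptions, not anything our journal software actually provides.

```python
from dataclasses import dataclass

# Hypothetical record layout; real journal software will name these fields differently.
@dataclass
class Registration:
    first_name: str
    middle_name: str
    last_name: str
    phone: str
    country: str

FIRST_COUNTRY_IN_LIST = "Afghanistan"  # top of the alphabetical country list
PLACEHOLDER_PHONE = "123456"           # phone number seen on the spam accounts

def spam_signals(reg):
    """Return the list of warning signs that a registration shows."""
    signals = []
    names = {
        reg.first_name.strip().lower(),
        reg.middle_name.strip().lower(),
        reg.last_name.strip().lower(),
    }
    # All three name fields hold the same non-empty text (e.g., CAT41 CAT41 CAT41).
    if len(names) == 1 and "" not in names:
        signals.append("identical names")
    if reg.phone.strip() == PLACEHOLDER_PHONE:
        signals.append("placeholder phone")
    if reg.country == FIRST_COUNTRY_IN_LIST:
        signals.append("default country")
    return signals

def looks_like_bot(reg, threshold=2):
    """Flag a registration showing at least `threshold` signs (an arbitrary cutoff)."""
    return len(spam_signals(reg)) >= threshold
```

A single sign is not treated as proof, since real people do register from Afghanistan.  Anything flagged this way should still get a human look before it is deleted.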

I've spent the past couple of hours looking for a solution, and I think I've found it.  There is a way to require users to enter a "Captcha" when registering.  Captchas are letters and numbers at weird angles that most machines can't read but humans can.  Other journals' managers who use our journal software assure me that Captchas really work at preventing spam registrations.  They are annoying to type (and, unfortunately, sometimes difficult to read), but that's because machines are getting better and better at reading text, so the "are you human?" tests need to be more and more difficult.

Our journal software designers have stated they will move from Captcha to reCaptcha in future versions of our software, which is great news.  reCaptcha (www.google.com/recaptcha) is like Captcha, but it not only functions as a security feature but also helps transcribe old books.  Many projects are scanning pages from old books and then using Optical Character Recognition software to turn the scanned images into typed words, making the books available digitally.  But some words are difficult for computers to read (e.g., because the print is weathered); reCaptcha uses these words to test if users are human.  It's great for security since it uses words machines have tried and failed to read, and it's good for humanity since it helps digitize old texts one word at a time (von Ahn et al. 2008).  We will gladly use reCaptcha once our software allows us to.  In the meantime, Captchas like "r33MD8" will have to do (which, ironically, is similar to the names of our spam-bot registrations).
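For the curious, here is a toy Python sketch of how a word that no machine could read can still work as a test.  As I understand von Ahn et al. (2008), each unreadable word is shown next to a "control" word whose answer is already known: a user who gets the control word right is treated as human, and their reading of the unknown word is counted as a vote.  The class name, method names, and vote threshold below are placeholders of mine, and the real service is certainly more elaborate.

```python
from collections import Counter

class TwoWordChallenge:
    """Toy model: one known control word paired with one unknown scanned word."""

    def __init__(self, control_answer, votes_needed=3):
        self.control_answer = control_answer  # answer already known to the system
        self.votes_needed = votes_needed      # placeholder threshold, not the real one
        self.votes = Counter()                # human readings of the unknown word

    def submit(self, control_guess, unknown_guess):
        """Return True if the user passes; record their reading of the unknown word."""
        if control_guess.strip().lower() != self.control_answer.lower():
            return False  # failed the known word: treat as a bot, discard the guess
        self.votes[unknown_guess.strip().lower()] += 1
        return True

    def transcription(self):
        """Return the agreed reading of the unknown word, or None if undecided."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.votes_needed else None
```

The point of the pairing is that the system never needs to know the unknown word in advance: passing the control word is what earns a user's guess a vote.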

REFERENCES
von Ahn L, Maurer B, McMillen C, Abraham D, and Blum M. 2008. reCAPTCHA: human-based character recognition via web security measures. Science 321: 1465-1468.