The constant evolution of blog spam

As a computer programmer, I find that I’m fascinated by the evolution of spam comments.

I mean, the first ones were blatant – “BUY MOAR CHEAP WATCHES AND VIAGRA!!!”

After that, spam filters got better, so the commenters started avoiding obvious phrases – “BY MOAR CH3AP W4+(HES AND V1AGR4!!!”

This made the spam filters broaden their blacklists, and the obfuscation made the spam itself harder to read. Spammers also ran into the problem that many blog owners read every comment, and the human filter was just too good to get around.
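To make the blacklist arms race concrete, here is a minimal sketch of how a filter might undo character swaps before checking a blacklist. The substitution table, the word list, and the function names are my own illustrative assumptions, not taken from any real spam filter.

```python
import re

# Hypothetical substitution table: obfuscated character -> likely letter.
LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t",
    "@": "a", "$": "s", "+": "t", "(": "c",
})

# Hypothetical blacklist; a real filter would use a much larger, maintained list.
BLACKLIST = {"cheap", "watches", "viagra"}


def normalize(text):
    """Lowercase, undo common character swaps, and drop leftover punctuation."""
    swapped = text.lower().translate(LEET_MAP)
    return re.sub(r"[^a-z\s]", "", swapped)


def blacklisted_words(text):
    """Return the set of blacklisted words found after normalization."""
    return set(normalize(text).split()) & BLACKLIST


print(blacklisted_words("BY MOAR CH3AP W4+(HES AND V1AGR4!!!"))
```

A fixed table like this is easy to dodge (a "1" could just as well stand for "l"), which is part of why the blacklists kept growing.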

So the spammers found another tactic – Comment from BuyCheap@WatchesAndViagra.com “Really great post! I’ll be looking for more of this!” The flattery was effusive and varied. Unfortunately for the spammers, the human filter was way too good for this one as well, which cut into their effectiveness.

So then they tried a more subtle approach – Comment from fred@WatchesAndViagra.com “That was a great post about <blog post title>. You’ve really opened my eyes about <random word from blog post> and its importance in all our lives.” This version was harder to catch because, when it is done well, it can seem like a very organic comment.

So the human filter became a little less reliable, but it could still recognize the comments where the drop-in portions didn’t fit with the rest of the sentence. The spammers also ran into a problem where it was often obvious that English wasn’t their first language (spammers are generally based outside of English-speaking countries, which makes them easier to spot).
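A rough sketch of how an automated filter might catch the fill-in-the-blank style: flag comments that paste the post title verbatim next to stock praise. The phrase list and the `looks_templated` function are hypothetical, and a heuristic this simple would certainly misfire on some genuine comments.

```python
import re

# Hypothetical list of stock praise phrases; tune it against your own spam.
GENERIC_PRAISE = (
    "great post",
    "opened my eyes",
    "looking for more",
    "importance in all our lives",
)


def looks_templated(comment, post_title):
    """Flag comments that quote the post title verbatim alongside stock praise."""
    # Collapse whitespace and lowercase so the comparison ignores formatting.
    text = re.sub(r"\s+", " ", comment.lower())
    title = re.sub(r"\s+", " ", post_title.lower()).strip()
    quotes_title = bool(title) and title in text
    praise_hits = sum(phrase in text for phrase in GENERIC_PRAISE)
    return quotes_title and praise_hits >= 1
```

Requiring both signals is a deliberate choice here: a real reader might quote the title, and a real reader might say "great post", but doing both in one short comment is more suspicious.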

So, time and again, I’ve seen the spam comments get more and more involved, more and more organic-looking.

Today, I saw this one as a comment on a blog post called “Body Swapping and Having a Soul”:

E-mail : *********@live.com

URL : http://www.backpackvacuums.org/

Whois : http://whois.arin.net/rest/ip/***.***.***.***

Comment:

Body swapping means that moving the body. What a amazing way to swap the body.

thanks,

Now, seriously, how cool is that? I figure the bot scans my blog post, recognizes nouns and verbs in the text (which can be done with some pretty cool open-source software like the Stanford NLP tools), then uses a separate search to pull out definitions for the words.
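My guess at that pipeline can be sketched in a few lines. This is only a toy under stated assumptions: a stopword list stands in for real part-of-speech tagging, a hard-coded dictionary stands in for the "separate search", and every name here is made up for illustration. I have no idea what the actual bot runs.

```python
# Stand-in for real part-of-speech tagging: just drop common function words.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on"}

# Hypothetical mini-dictionary standing in for the definition lookup.
DEFINITIONS = {
    "swapping": "exchanging one thing for another",
    "soul": "the spiritual part of a person",
}


def content_words(title):
    """Return the lowercase title words that are not stopwords, in order."""
    words = [w.strip(".,!?\"'").lower() for w in title.split()]
    return [w for w in words if w and w not in STOPWORDS]


def defined_words(title):
    """Map each content word that has a known definition to that definition."""
    return {w: DEFINITIONS[w] for w in content_words(title) if w in DEFINITIONS}


print(content_words("Body Swapping and Having a Soul"))
```

Knowing roughly which words a bot would lift from the title is also useful from the defender's side, since those are the words a filter should expect to see echoed back.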

I mean, yeah, the human filter can generally catch these as well (if for no other reason than we can tell the URL is an ad). But this process is evolving, and it’s not showing any signs of slowing.


Brand Gamblin
