Ramblings of Narc

When the issue isn't confused enough.

Archive for April, 2009

Silly Google Phrases

Every blogger has done a post like this. You know them, you love them, they are… the silly Google search queries people use to find your blog!

These are in reverse order of arrival, and the queries are mostly copied straight out of the referrer log. Without further ado:

  • narc ftp port (Google UK) — I’m not sure I really want to know. Do I have an FTP daemon I’m not aware of? If so, it’s probably stuck inside the LAN, since I’m not forwarding anything unexpected.
  • what a narc does to set up people — Did you really think it would be that easy? We narcs have our professional pride, you know?
  • acronym for narcs (Google Australia) — Do we really need an acronym here? “Narc” is a pretty short word already. What would be the acronym? “N”?
  • zap+ro (Google Thailand) — I’d really prefer if you didn’t, thank you. I happen to live here in .ro, and I like it.
  • pl poke data narc (Google UK) — Er… I don’t think I really want to know what that’s supposed to mean. Using Perl to poke data into my brain? No, thank you. Although, if you manage it, that’ll be a neat hack.
  • short summary of the notebook — Before I did that search, I hadn’t known The Notebook (2004) was a movie (and a novel, apparently). So here’s a short summary, then: “It’s a movie (and a book).” Happy?
  • why is vodafone website so shit? (Google UK) — Good question! Without knowing anything about their internal organization, I’d guess that most of the blame lies with their use of a very crappy technology (JavaServer Pages? That’s what the JSP stands for, yes?), which was presumably chosen because the alternatives weren’t Enterprise-y enough and/or because that’s what the consultants they hired to do the job “knew”.
  • mysql “add a fucking user” — This search actually returns a very specific result from my blog, namely my “Going Insane From Work” post, which, unfortunately, doesn’t actually answer the (implied) question. So, here it is: to “add a fucking user” to mysql, the command is: GRANT <privileges> ON <database>.<table> TO '<username>'@'<host>' [IDENTIFIED BY '<password>']. Alternatively, to leave the user at default privileges (that is, none), use: CREATE USER '<username>'@'<host>' [IDENTIFIED BY '<password>']. This, and more, can be found in the fucking MySQL manual, which you should’ve picked up like the rest of us did.
  • arguments against alcoholics anonymous — Er… why? Oh! Oh! I got one: “I’m not a drunk, I can quit whenever I like!” There’s your argument.
  • Finally, i didn’t know my friend was a narc — Well, neither did I. Which poses an interesting question: if neither of us knew, are you really my friend?

That’s it for this edition of “Silly Google Phrases”. One thing I’d like to mention, though: a lot of people have been finding my website by searching Google for… narc.ro. I find this very curious, but ultimately, as long as people find what they’re looking for, who am I to judge?

Thank you all, and good night! :)

Spam, ReCAPTCHA, and Stuff

So if you’re a visitor here who has ever so much as thought of posting a comment, you probably know that I recently (about half a year ago?) switched from Akismet to reCAPTCHA for my spam-blocking needs. ReCAPTCHA is nice, and the fact that it also makes it possible for humans to help where OCR fails is a big bonus.

However, being one of the most common types of CAPTCHA also makes it one of the most heavily attacked, and that means there are spambots that have learned how to crack it.

As evidence of this, I offer the (dozens of) spam comments I just deleted from my queue (as I was typing this, another one just showed up). The major difference between this spam and the stuff that used to pass through Akismet is the length — these new spam comments are very long. This works in my favour, of course, since it makes it easy to figure out what to delete: if it takes a tap of the Page Down key to get to the end of the comment, it’s very likely spam.
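The Page Down rule of thumb can be written down as a crude length check for triaging the queue. This is a minimal Python sketch, assuming a hypothetical `looks_like_spam` helper and a hypothetical threshold of 1,500 characters standing in for "one screenful"; the real number depends on your theme and font, and the check only flags comments for a closer look rather than deciding anything.

```python
# Hypothetical threshold: roughly one screenful of comment text.
# This number is an assumption; tune it for your own layout.
SCREENFUL_CHARS = 1500


def looks_like_spam(comment_text: str, limit: int = SCREENFUL_CHARS) -> bool:
    """Flag comments long enough to need a Page Down to reach the end.

    Only a triage hint: honest long comments exist, and spammers can
    shorten their payloads whenever they like.
    """
    return len(comment_text.strip()) > limit


queue = [
    "Nice post, thanks!",
    "buy cheap stuff " * 200,  # a very long, repetitive comment
]
for text in queue:
    print(looks_like_spam(text), text[:30])
```

Since the length of a spam comment is just as easy to change as its content, this kind of check is a convenience for the human doing the deleting, not a filter to rely on.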

However, if we disregard the content of the spam (which is easily changeable), we can see that it’s really quite a bad idea to rely on any kind of CAPTCHA by itself. It seems I have to echo the many others who have said that spam is a machine-generated problem with only human solutions.

Ultimately, every kind of anti-spam solution has drawbacks:

  • statistical analysis solutions (think Bayesian filters) will have false positives and false negatives sometimes.
  • distributed blacklists (like Akismet) fail because they’re blacklists — and enumerating badness is a failure waiting to happen[1]. On top of that, open blacklists are easy to poison, leading to… false positives, of course.
  • CAPTCHAs, as a special class of solutions, fail because they rely on computers not being able to “read” as well as humans can. The problems are that some humans cannot read as well as a computer can, and that computers are getting smarter all the time.

I’m sure there are other types of anti-spam solutions I haven’t enumerated, and likely they all fail on one point or another.

One of the best approaches to such problems is whitelisting, or enumerating goodness. This is much easier to do, since the number of honest commenters is likely much smaller and much more stable than the number of potential spammers out there.

“But wait, narc, doesn’t that mean that I have to keep an eye on my moderation queue, to whitelist the allowed commenters?” Well, obviously, but you’d have to keep an eye on it anyway to check for false positives and to delete all the spam you’re getting. So neither approach spares you that work.
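Enumerating goodness boils down to a small moderation routine: comments from whitelisted commenters go straight through, everything else lands in the moderation queue, and approving a queued comment whitelists its author. This is a minimal Python sketch under stated assumptions; the `Comment` and `Moderator` types are hypothetical, and identifying commenters by e-mail address is an illustrative simplification, not how any particular blog engine does it.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    author_email: str  # hypothetical identity key, assumed for illustration
    text: str


@dataclass
class Moderator:
    whitelist: set = field(default_factory=set)
    published: list = field(default_factory=list)
    queue: list = field(default_factory=list)

    def submit(self, comment: Comment) -> str:
        # Known-good commenters skip moderation entirely.
        if comment.author_email in self.whitelist:
            self.published.append(comment)
            return "published"
        # Everyone else waits for a human to look at it.
        self.queue.append(comment)
        return "queued"

    def approve(self, comment: Comment) -> None:
        # Human approval publishes the comment and whitelists its author,
        # so their next comment goes straight through.
        self.queue.remove(comment)
        self.published.append(comment)
        self.whitelist.add(comment.author_email)

    def reject(self, comment: Comment) -> None:
        # Spam is simply deleted; no blacklist of "badness" is kept.
        self.queue.remove(comment)


mod = Moderator()
first = Comment("reader@example.com", "Great post!")
status_first = mod.submit(first)  # unknown author, so it is queued
mod.approve(first)                # a human vouches for the author
status_second = mod.submit(Comment("reader@example.com", "Me again."))
spam = Comment("bot@example.net", "buy cheap stuff")
status_spam = mod.submit(spam)
mod.reject(spam)
print(status_first, status_second, status_spam)
```

The whitelist only ever grows with commenters a human has vouched for, which is the small, stable set argued for above. A CAPTCHA, if you keep one, would sit in front of `submit` and merely thin out what reaches the queue.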

With that said, using a solution like reCAPTCHA can greatly reduce the size of the moderation queue, which is a good enough reason to use it. But you still need to keep your eye on the ball, and you also shouldn’t forget that CAPTCHAs will keep out parts of your (potential) audience. I try to keep both in mind, and if I ever seem to have failed, I strongly encourage you to contact me and remind me.

Update: I’ve had to close comments on this post because it got targeted by a bunch of spammers (who don’t seem to have much trouble with the reCAPTCHA). Meh.