50

What are some non-captcha methods for blocking spam on my comments?

Kip
ian

18 Answers

60

In my experience, the most effective method at present is honeypot input fields that are made invisible to users via CSS (ideally using several different techniques, such as visibility:hidden, a size of 0 pixels, and absolute positioning far outside the browser window); if they are filled in anyway, you can assume the submitter is a spambot.

This blog describes a rather complex method that I've tried out myself (with 100% success so far), but I suspect that you could get the same result by skipping all the stuff with hashed field names and just adding some simple honeypot fields.
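A minimal sketch of the server-side half of the simple honeypot idea. The decoy field names (`website`, `fax`) and the function name are made up for illustration and do not come from the blog post mentioned above; the form is assumed to hide those inputs with CSS.

```php
<?php
// Hypothetical decoy field names; real visitors never see these inputs
// because the form hides them with CSS.
const HONEYPOT_FIELDS = ['website', 'fax'];

/**
 * Returns true when any honeypot field in the submitted data is non-empty.
 *
 * @param array $post Submitted form data, e.g. $_POST.
 */
function looks_like_spambot(array $post): bool
{
    foreach (HONEYPOT_FIELDS as $field) {
        if (isset($post[$field]) && trim((string) $post[$field]) !== '') {
            return true; // a hidden field was filled in
        }
    }
    return false;
}
```

It would be called as `looks_like_spambot($_POST)` before the comment is stored. Browser autofill can also populate hidden fields, so a hit is a strong hint rather than proof.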

Phil
Michael Borgwardt
  • +1 this worked on my blog too, mostly. every now and then there will be a spammer that submits to *every* form on the page though... – Kip Oct 16 '09 at 13:21
  • 8
    +1. Very cool. I also like the time limit idea. If a "user" submits a comment 74 milliseconds after requesting the page then you know something's up. – Steve Wortham Oct 16 '09 at 13:43
  • @Steve - yes, but unfortunately it's too easy for spambots to defeat. – Michael Borgwardt Oct 16 '09 at 13:50
  • also check out "detecting stealth web crawlers": http://stackoverflow.com/questions/233192/detecting-stealth-web-crawlers – Jacco Oct 16 '09 at 13:55
  • 4
    The biggest flaw of the "honeypot" methods: browsers can autofill those fields anyway. For instance, Chrome will put your email into every field if autofilling is turned on. – Alex from Jitbit Feb 11 '11 at 09:45
  • @jitbit: well, that's something to keep in mind, but can't happen with the method I linked to, which uses randomized field names that are different each time the form is displayed. The idea is to make it impossible for the spambot to know which fields must be filled and which must not be filled. – Michael Borgwardt Feb 11 '11 at 09:48
  • 4
    Note that using this method, you may incorrectly block terminal users whose browsers do not apply that sort of CSS (namely, braille-readers) – BlueRaja - Danny Pflughoeft Apr 29 '11 at 14:31
  • 1
    @BlueRaja-DannyPflughoeft, doesn't the same limitation apply to any non-pure-HTML CAPTCHA method? – Matthew Smith Jan 26 '16 at 17:52
  • Any serious Braille Reader must use CSS. – Lothar May 25 '17 at 05:04
13

1) Add session-related information to the form. Example:

<input type="hidden" name="sh" value="<?php echo dechex(crc32(session_id())); ?>" />

Then, at postback, check whether the session is valid.

2) JavaScript-only: inject the value with JavaScript at submission. Example:

<input type="hidden" id="txtKey" name="key" value="" />
<input type="submit" value="Go" onclick="document.getElementById('txtKey').value = '<?php echo dechex(crc32(session_id())) ?>';" />

3) Time limit per IP, user or session. This is quite straightforward.

4) Randomizing field names:

<?php
   $fieldkey = dechex(crc32(mt_rand().dechex(crc32(time()))));
   $_SESSION['fieldkey'] = $fieldkey;
?>
<input type="text" name="name<?php echo $fieldkey; ?>" value="" />
<input type="text" name="address<?php echo $fieldkey; ?>" value="" />   

Then you can check it over at the server side.
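One way that server-side check might look, as a sketch only. It assumes the suffix was saved in `$_SESSION['fieldkey']` as in the snippet above; the helper name `read_randomized_fields` is hypothetical.

```php
<?php
/**
 * Maps randomized field names back to their base names.
 * Returns null if any expected field is missing, which suggests the
 * submitter did not use the form that was actually served.
 *
 * @param array    $post      Submitted data, e.g. $_POST.
 * @param string   $fieldkey  Suffix stored in the session when the form was rendered.
 * @param string[] $baseNames Logical field names, e.g. ['name', 'address'].
 */
function read_randomized_fields(array $post, string $fieldkey, array $baseNames): ?array
{
    $values = [];
    foreach ($baseNames as $base) {
        $randomized = $base . $fieldkey;
        if (!array_key_exists($randomized, $post)) {
            return null;
        }
        $values[$base] = $post[$randomized];
    }
    return $values;
}
```

A call such as `read_randomized_fields($_POST, $_SESSION['fieldkey'], ['name', 'address'])` returning null would then be treated as a failed submission.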

mauris
  • 6
    In my experience, 1) and 3) are completely ineffective since many bots behave like normal users as far as sessions are concerned, and requiring JavaScript for basic functionality is unacceptable. – Michael Borgwardt Oct 16 '09 at 13:14
  • 11
    @michael: Your anti-javascript attitude is antiquated. Less than 5% of all users have their javascript deactivated. And if you've got it off, then it's your own fault. Besides, just put in a no-script warning to these folks that they can't submit. – Robert K Oct 16 '09 at 14:44
  • 7
    Javascript is nowadays the #1 source of virus infection, botnets and thus ultimately, spam. If you have it on indiscriminately, or require it for basic functionality, then you are part of the problem. – Michael Borgwardt Oct 16 '09 at 14:52
  • 1
    Even if fewer than 5% of people have JS disabled, I don't like to tie a web page's functionality to something that may not be enabled or, worse, can be **easily fooled and manipulated** by the user. However, in this case I'll assume that 99.99% of the target audience uses JavaScript, and that this use doesn't have any dangerous drawback. – Strae Dec 14 '09 at 15:27
  • 12
    @Michael Borgwardt the internet is 100% of the source for virus infection, botnets and thus ultimately spam. – Mark Dec 01 '10 at 17:43
9

Akismet has an API. Someone wrote a wrapper class (BSD license) for it over at: http://cesars.users.phpclasses.org/browse/package/4401.html

There's also a Bayesian filter class (BSD license as well): http://cesars.users.phpclasses.org/browse/package/4236.html

Kip
easement
5

This is a simple trick to block spam bots or brute-force attacks without using a captcha.

Put this in your form:

<input type="hidden" name="hash" value="<?php echo md5($secret_key.time()).','.time(); ?>" />

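Put this in your PHP code:

// $secret_key is assumed to be a private string known only to the server.
$human_typing_time = 5; /** page load (1s) + submit (1s) + typing time (3s) */
$vars = explode(',', $_POST['hash']);
// Reject if the value is malformed, the hash does not match the timestamp,
// or the form came back faster than a human could have typed.
if (count($vars) != 2 || md5($secret_key.$vars[1]) != $vars[0] || time() < $vars[1] + $human_typing_time) {
    //bot?
    exit();
}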

Depending on the size of the form, you can increase or decrease $human_typing_time.
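A variant of the same idea, sketched with `hash_hmac` and `hash_equals` in place of plain `md5`, plus an upper age limit so an old stamp cannot be replayed indefinitely. This is a swapped-in technique rather than the answer's own code; the function names, the key and the one-hour limit are assumptions.

```php
<?php
/** Builds the hidden-field value "signature,timestamp". */
function make_form_stamp(string $secretKey, int $now): string
{
    return hash_hmac('sha256', (string) $now, $secretKey) . ',' . $now;
}

/**
 * Returns true if the stamp is authentic and the form was submitted
 * neither too quickly nor too long after it was rendered.
 */
function is_plausible_human(
    string $stamp,
    string $secretKey,
    int $now,
    int $minSeconds = 5,
    int $maxSeconds = 3600
): bool {
    $parts = explode(',', $stamp);
    if (count($parts) !== 2 || !ctype_digit($parts[1])) {
        return false; // malformed value
    }
    [$signature, $issuedAt] = $parts;
    $expected = hash_hmac('sha256', $issuedAt, $secretKey);
    if (!hash_equals($expected, $signature)) {
        return false; // timestamp or signature was tampered with
    }
    $age = $now - (int) $issuedAt;
    return $age >= $minSeconds && $age <= $maxSeconds;
}
```

The form would print `make_form_stamp($secret_key, time())` into the hidden field, and the handler would call `is_plausible_human($_POST['hash'], $secret_key, time())`.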

StoneHeart
  • 3
    Can you explain me what is $secret_key? And I think you have a typo at time() < $var[1] I think it should be $vars. – nmsdvid Feb 22 '14 at 08:36
4

Naive Bayesian filters, of course:

http://blog.liip.ch/archive/2005/03/30/php-naive-bayesian-filter.html
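To give a rough idea of what such a filter does, here is a toy naive Bayes classifier with add-one smoothing. It is a sketch written for this thread, not the code from the linked post; the class name and the fixed labels `spam` and `ham` are assumptions.

```php
<?php
/** A toy naive Bayes text classifier with add-one (Laplace) smoothing. */
class TinyBayes
{
    private $wordCounts = ['spam' => [], 'ham' => []];
    private $totalWords = ['spam' => 0, 'ham' => 0];
    private $docCounts  = ['spam' => 0, 'ham' => 0];
    private $vocabulary = [];

    private function tokenize(string $text): array
    {
        // Lowercase, then split on anything that is not a letter or digit.
        return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    }

    /** $label must be 'spam' or 'ham'. */
    public function train(string $text, string $label): void
    {
        $this->docCounts[$label]++;
        foreach ($this->tokenize($text) as $word) {
            $this->wordCounts[$label][$word] = ($this->wordCounts[$label][$word] ?? 0) + 1;
            $this->totalWords[$label]++;
            $this->vocabulary[$word] = true;
        }
    }

    /** Returns 'spam' or 'ham', whichever has the higher log-probability. */
    public function classify(string $text): string
    {
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = count($this->vocabulary);
        $scores = [];
        foreach (['spam', 'ham'] as $label) {
            // Smoothed log prior for the class.
            $score = log(($this->docCounts[$label] + 1) / ($totalDocs + 2));
            foreach ($this->tokenize($text) as $word) {
                $count = $this->wordCounts[$label][$word] ?? 0;
                // Smoothed log likelihood of the word given the class.
                $score += log(($count + 1) / ($this->totalWords[$label] + $vocabSize + 1));
            }
            $scores[$label] = $score;
        }
        return $scores['spam'] > $scores['ham'] ? 'spam' : 'ham';
    }
}
```

A real filter needs a much larger training set and persistent storage for the counts; this only shows the scoring step.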

ChickenMilkBomb
4

There is the Honey Pot Theory as well. I enjoy coupling honey pots with other forms of spam reduction for best results.

http://www.projecthoneypot.org/

3

Another common approach is to give the user a simple question ("is fire hot or cold?" "what is 2 plus 7?" etc.). It is a little captcha-like, but it is more accessible to users with vision disabilities using screen readers. I think there must be a WordPress plugin that does this, because I see it very frequently on WordPress blogs.

Kip
2

As a lot of people have already proposed: use a honey pot input field. But there are two other things you need to do. First, randomize the name / id of each input field, so it is not obvious which one is the honey pot. Store the mapping of the useful fields in the session (as well as a form token, used against CSRF attacks). For example, say you need these fields: name, email, message. In your form, you will have "token", which is your token, "jzefkl46", which is the name field for this form, "ofdizhae" for email, "45sd4s2" for message and "fgdfg5qsd4" for the honey pot. In the user session, you can have something like

array("forms" => array("your-token-value" => array("jzefkl46" => "name",
                                                   "ofdizhae" => "email",
                                                   "45sd4s2" => "message",
                                                   "fgdfg5qsd4" => "honey")));

You just have to re-associate it back when you get your form data.

Second, because a bot has a fair chance of avoiding a single honey pot field (25% with one pot among four fields), multiply the number of pots. With 10 or 20 of them, you make things harder for bots without adding much overhead to your HTML.
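Both steps could be sketched roughly as follows. The function names are hypothetical, and `random_bytes` is used here as one possible source of random field names.

```php
<?php
/**
 * Builds a map of random field names to logical names, mixing the real
 * fields with several honey pots.
 */
function build_field_map(array $realFields, int $potCount): array
{
    $logical = $realFields;
    for ($i = 0; $i < $potCount; $i++) {
        $logical[] = 'honey';
    }
    shuffle($logical); // so position in the form gives nothing away

    $map = [];
    foreach ($logical as $name) {
        // The "f" prefix keeps the key a string and a valid field name.
        $map['f' . bin2hex(random_bytes(5))] = $name;
    }
    return $map;
}

/**
 * Re-associates submitted values with their logical names.
 * Returns null when a honey pot was filled in or a real field is missing.
 */
function decode_submission(array $post, array $map): ?array
{
    $values = [];
    foreach ($map as $randomName => $logicalName) {
        $value = $post[$randomName] ?? null;
        if ($logicalName === 'honey') {
            if ($value !== null && $value !== '') {
                return null; // a pot was filled: treat as a bot
            }
            continue;
        }
        if ($value === null) {
            return null; // real field absent: form was not submitted as served
        }
        $values[$logicalName] = $value;
    }
    return $values;
}
```

The map would be stored under the form token, for instance `$_SESSION['forms'][$token] = build_field_map(['name', 'email', 'message'], 10);`, and read back when the form is posted.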

Arkh
  • in my experience, most spambots blindly submit to the first form on the page. a few submit to every form. i haven't noticed any picking one at random (though i'm sure there are some) – Kip Oct 16 '09 at 13:43
  • 1
    Sure, at the moment they don't pick randomly because not a lot of people use the honey pot method. But give it 2 or 3 years and some will. Let's get a little ahead of bots while it's possible. – Arkh Oct 16 '09 at 13:51
  • I don't really understand what multiple forms have to do with honeypots, but the blog post I linked to describes how you can randomize field names *without* needing a session. However, I suspect that this extra effort is wasted: simple bots right now will go for honeypot fields with non-random names as well (or does anyone have data that contradicts this?), and if they're forced to become more sophisticated, it would not be all that hard to analyze the lexical page structure and use that to decide which fields to fill and which to skip. – Michael Borgwardt Oct 16 '09 at 14:58
  • Not multiple forms, multiple hidden fields. With one bad field for 3 good ones, a bot could just bruteforce trying to get past with one field not completed at each try. So you have 25% of spam still going through. With 10 pots, it has to leave 10 fields blank out of 13. – Arkh Oct 16 '09 at 15:32
2

Sblam! is an open-source filter similar to Akismet.

It uses naive Bayesian filtering, checks the sender's IP and links against multiple distributed blacklists, checks correctness of HTTP requests, and uses presence of JS as a hint (but not a requirement).

Kornel
2

Regular CAPTCHAs are spam-bot solvable now.

Consider instead "text CAPTCHAs" : a logic or common knowledge question, like "What's 1 + 1 ?" or "What color is General Custard's white horse?" The question can even be static (same question for every try).

Text Logic CAPTCHA

(Taken from http://matthewhutchinson.net/2010/4/21/actsastextcaptcha )

I think Jeff Atwood even uses a validation like this on his blog. (Correct me if I'm wrong)

Some resources:

rlb.usa
1

Disallow links. Without links, spam is useless.

[EDIT] As a middle way, only allow links to "good" sites (usually your own). There are only a handful of them, so you can either add them at the request of your users or hold a comment until you have verified the link. When it's good, add it.

After a while, you can turn this off and automatically reject comments with links and wait for users to complain.
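A sketch of the allow-list check, assuming links are recognised by an `http://` or `https://` prefix. The function name and the example hosts are made up.

```php
<?php
/**
 * Returns the hosts of all http(s) links in $comment that are not on the
 * allow-list. An empty result means the comment may be published.
 */
function disallowed_link_hosts(string $comment, array $allowedHosts): array
{
    preg_match_all('~https?://[^\s<>"\']+~i', $comment, $matches);
    $rejected = [];
    foreach ($matches[0] as $url) {
        $host = strtolower((string) parse_url($url, PHP_URL_HOST));
        if (!in_array($host, $allowedHosts, true)) {
            $rejected[] = $host;
        }
    }
    return array_values(array_unique($rejected));
}
```

If the result is non-empty, the comment can be rejected with an explicit message or held for review. Bare text such as `www.example.net` without a scheme is not caught by this pattern.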

Aaron Digulla
  • 1
    this doesn't work unless you just discard any posts containing links, which means sometimes valid users are going to lose their posts for no apparent reason. if you remove the links but allow the text you'll probably still get spam (at least that was my experience) – Kip Oct 16 '09 at 13:20
  • -1, how would this even be effective? – Malfist Oct 16 '09 at 13:29
  • 1
    Links are nearly the only reason why automated comment spam exists. By disallowing posts that contain links, you'll catch close to 100% of all spam. You don't have to swallow them silently either; display an explicit error message "sorry, no links allowed in comments". But still, it does make comments less useful. – Michael Borgwardt Oct 16 '09 at 13:47
  • 2
    bots will still post spam, even a text only link is better than nothing. – Malfist Oct 16 '09 at 14:17
  • 3
    Spam is useless, but the bot doesn't care. – Malfist Oct 16 '09 at 14:18
1

You could try a third-party service like Akismet. API keys are free for personal use. Also, the Zend Framework has a package for this.

Kieran Hall
1

Most bots simply fill out the whole form and send it to you. A simple trick that works is to create a normal field that you usually hide with the aid of javascript. On the server side just check whether this field has been filled. If so -- then it is spam for sure.

clops
  • 8
    hiding the field with the aid of CSS would be better... then any non-JS users wouldn't see it either. – Kip Oct 16 '09 at 13:32
1

I have cut spam on my website by about 99% with a simple mathematical question like the following:

What is 2+4 [TextBox]

The user will be able to submit the question/comment if they answer "6".
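A small sketch of how the question could be generated and checked, assuming the expected answer is kept in the session. The function names and the `math_answer` key are hypothetical.

```php
<?php
/** Creates a question and stores the expected answer in the session array. */
function make_math_question(array &$session): string
{
    $a = random_int(1, 9);
    $b = random_int(1, 9);
    $session['math_answer'] = $a + $b;
    return "What is $a+$b?";
}

/** Checks the reply once; the stored answer is discarded either way. */
function check_math_answer(array &$session, string $reply): bool
{
    $expected = $session['math_answer'] ?? null;
    unset($session['math_answer']);
    return $expected !== null && trim($reply) === (string) $expected;
}
```

Both functions would be passed `$_SESSION`. Discarding the answer after one attempt means a bot cannot keep guessing against the same question.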

It works for me, and a similar solution works for Jeff Atwood of Coding Horror!

azamsharp
0

On my blog, I have a kind of compromise captcha: I only use a captcha if the post contains a link. I also use a honeypot input field. So far, this has been nearly 100% effective. Every now and then there will be a spammer that submits something to every form which contains no links (usually something like "nice site!"). I can only assume that these people think I will e-mail them to find out who they are (using the e-mail address that only I see).

Kip
  • If they submit to every form cant you determine that as spam due to the presence of text in the honey pot? No one but a bot will post there, so if it has any contents = spam. (one edge case: autofill) – RyanS Jun 19 '13 at 16:02
  • 1
    @RyanS: I wrote this nearly four years ago so I'm not sure, but I *think* I meant that someone submits to every form visible on the screen, without populating the honeypot field. – Kip Jun 21 '13 at 01:28
0

Along with using honey pot fields, we can ban their IP automatically (which doesn't work for dynamic IPs), and especially block any links posted back by bots.

Sanil
0

Akismet is a good alternative: it checks your posts for spam and works very efficiently. You just need to load their library. http://akismet.com/development/

0

Check out some WordPress antispam plugins for examples and ideas.

There are many good antispam plugins that don't use a captcha.

Some I'd recommend: hashcash, nospamnx, typepad antispam. These all use different methods of blocking spam, and I use them all. hashcash+nospamnx block almost all spambots, and typepad antispam blocks most human-typed spam.

These are also good ones: spambam, wp-spamfree, anti-captcha, bad-behaviour, httpbl, etc.

You can also use a simple .htaccess rule that blocks any direct bot POST that does not come from your own site (check the referer).

Or simply outsource your comment system to Disqus and sleep tight.
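The referer check mentioned above can also be done in PHP instead of .htaccess. The following is a sketch of that PHP equivalent, not an .htaccess rule; the function name and host are assumptions.

```php
<?php
/**
 * Returns true when the request's Referer header names the expected host.
 *
 * @param array  $server  Request metadata, e.g. $_SERVER.
 * @param string $ownHost Host name of the site serving the form (assumed).
 */
function posted_from_own_site(array $server, string $ownHost): bool
{
    $referer = $server['HTTP_REFERER'] ?? '';
    $host = parse_url($referer, PHP_URL_HOST);
    return is_string($host) && strcasecmp($host, $ownHost) === 0;
}
```

The Referer header is trivial to forge and some browsers omit it, so this only stops the crudest bots and may turn away a few legitimate visitors.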