Fighting form spam with a little JavaScript

Joor Loohuis, June 12, 2009

Form spam is one of the many forms of abuse your website may fall victim to. Automated form submissions are often blocked with captchas, but there are other effective ways to keep garbage out of your websites and databases, using nothing but a little JavaScript.

If you have a website with one or more forms, you're probably faced with numerous submissions containing links to gambling sites, business proposals, or what seems to be gibberish. These submissions are the result of scripts crawling the web, looking for forms through which they can submit prefabricated content. The objective varies from finding opportunities to send email spam to planting links that improve search engine ranking. They may affect your mail inbox, your wiki content, your blog talkback, and any other system that receives the form data. Stopping automated submissions is a high priority.

One typical way of preventing automated form submissions is a captcha, a little image that contains a warped set of characters. The idea is that a person can read the text and retype it, but a 'machine' (a spam script, for example) has a really hard time reproducing the information. In reality, optical recognition systems are continuously getting better at processing captchas, while humans are having more and more trouble deciphering the same captchas. If you are anything like me, reproducing a captcha is a hit-or-miss procedure, often requiring you to enter all form fields more than once. Another problem with captchas is that they may be expensive to implement, in both money and time. It would be nice to have simple but effective alternatives.

A JavaScript alternative

If you have a form on a website or in a web application, it probably uses some JavaScript to validate that what the user has entered is more or less correct and complete before the information is submitted to the server. Now before you start complaining that JavaScript form validation is not sufficient, I agree with you there. Full input validation should be done on the server before anything else is done with the data. But JavaScript form validation is very useful in helping the user find out whether the entered data is complete and correct before it is sent to the server. It is an addition to server-side input validation, not a replacement.

Anyway, in the same way JavaScript is used to validate form input, it can be used to help distinguish scripted form submissions from actual submissions made by users. The idea is that many spam scripts don't support JavaScript, but merely parse the form HTML for its fields and generate content for a constructed request to the URL specified in the form's action attribute. In the case of popular web application packages for wikis, blogs, etc., the script doesn't even have to parse the form because its composition is widely known, so a spam script can be even simpler.

Using this insight, what if we require JavaScript for the form submission to work? The method is very simple. We incorporate a field with a specific name in the form (say, 'forbidden'), and rename it to something else (say, 'required') in the JavaScript submit event handler of the form. On the server side we then reject any submission that contains the field 'forbidden' and doesn't contain the field 'required'. Of course, there's no harm in making these field names a little less obvious, but the idea should be clear. In your JavaScript submit event handler, this may look like:

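function validateForm()
{
    var myform = document.getElementById('myform');
    // regular validation goes here
    // ...

    // rename the forbidden field; after a rename the lookup below
    // finds nothing, so running the handler twice is harmless
    var elm = myform.elements['forbidden'];
    if (elm) {
        elm.name = 'required';
    }
    // lets the submission proceed when the handler is attached
    // as onsubmit="return validateForm();"
    return true;
}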

As you can see, there is hardly any code involved. The elegance of the idea is that it fits nicely with just about any form validation and processing you may have.
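The server-side half of the check can be just as small. The following is a minimal sketch rather than the code used on any particular site: it assumes the submitted fields have already been parsed into a plain object, the function name `isScriptedSubmission` is hypothetical, and the field names are the illustrative ones used above.

```javascript
// Hypothetical helper. `fields` is assumed to be a plain object holding the
// parsed form fields, e.g. { name: 'Ann', message: 'Hi', required: '' }.
function isScriptedSubmission(fields) {
    var has = Object.prototype.hasOwnProperty;
    var hasForbidden = has.call(fields, 'forbidden');
    var hasRequired = has.call(fields, 'required');
    // Rule from the text: the original field name is still present and the
    // renamed field is missing, so the submit handler never ran.
    return hasForbidden && !hasRequired;
}
```

A stricter variant could flag a submission when either condition holds, which would also catch scripts that send neither field.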

Results

Talk is cheap, but does it work? To find out, we applied the approach to the guestbook on a website for a gym. They had a problem with loads of automated posts that they had to remove manually (no, they didn't want to moderate submissions before they appeared on the site). We were supposed to 'just fix it'. So we applied our spam prevention strategy, with the small addition that submissions judged to be spam were not discarded but stored in a separate table in the database. This way, if we had any false positives, no data would be lost. The results: between February 14 and April 21, 2008, the system identified 476 submissions as spam, and 4 as legitimate. There were no false negatives, and no false positives. Of course this was only one experiment on one site, but the results are definitely encouraging.
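The 'store, don't discard' idea can be sketched as follows. This is an illustration under assumptions, not the guestbook's actual code: two in-memory arrays stand in for the two database tables, the name `createGuestbook` is hypothetical, and the spam test is passed in as a function so that any check can be plugged in.

```javascript
// Hypothetical sketch: `isSpam` is any function that takes the parsed
// fields and returns true for submissions judged to be spam.
function createGuestbook(isSpam) {
    var published = [];   // stand-in for the regular guestbook table
    var quarantined = []; // stand-in for the separate spam table
    return {
        published: published,
        quarantined: quarantined,
        // Stores every submission; returns true if it was published.
        submit: function (fields) {
            var target = isSpam(fields) ? quarantined : published;
            target.push({ receivedAt: new Date(), fields: fields });
            return target === published;
        }
    };
}
```

Because nothing is thrown away, a false positive can be recovered later by moving the entry from the quarantine store to the published one.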

Multi-layered approach

Preventing form spam is best done using a multi-layered approach, rather than a single line of defense. The method described here can be combined with a graphical captcha, referrer checking of the submitted requests, input validation on the server side, and any other method you can think of. The idea behind most form spam prevention methods is not that they will stop all form spam, but that they make form spamming a lot more expensive for the spammers, decreasing their profit margins. In this case, the simpler (cheaper) approaches that merely submit forms without executing the JavaScript on the page containing the form can be easily detected and dealt with. The flip side is that people who have JavaScript turned off in their browsers can't use the forms to submit data. You have to decide for yourself if this is an acceptable situation, but in general, your audience is accustomed to JavaScript being required for almost all websites.
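As an example of one extra layer, referrer checking might look like the sketch below. It assumes the server hands you the raw Referer header and that the form is only ever served from a single known host. The function name `hasExpectedReferrer` is hypothetical.

```javascript
// Hypothetical helper: returns true only when the Referer header parses as
// a URL whose host matches the host that serves the form.
function hasExpectedReferrer(refererHeader, expectedHost) {
    try {
        return new URL(refererHeader).host === expectedHost;
    } catch (e) {
        // Missing or malformed header: the URL constructor throws.
        return false;
    }
}
```

Some browsers and proxies strip the Referer header, so a failed check is better treated as one signal among several than as proof of spam.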
