How To Help Protect Your Drigg Social News Project From Spam

Drigg for Drupal is a professional-grade set of modules that allows anyone to start their very own Digg clone website. Competing social news systems have been the victims of spammers, so how can you prevent the same fate befalling your Drigg-based site? Unlike competing systems, Drigg has some very powerful and effective modules available that can almost eradicate the spam problems others have faced and still face every day.

No website owner wants to spend vast amounts of time moderating spam each day, or to see lots of spam submissions appear in the upcoming section of their site every day. This is where adequate spam protection for submissions becomes essential. The main difference between Drigg and Pligg-based systems is that Drupal’s spam modules actually work on a live site, and they don’t rely on techniques such as limiting the number of stories a user can submit, which would also limit genuine site interaction.

Below we will show you a few modules you can install to control spam sign-ups and submissions within a Drigg system, and explain why it’s the better choice when it comes to handling spam successfully and efficiently.

Drupal Spam Module

The first Drupal module we are going to introduce you to is simply titled Spam.

The spam module provides numerous tools to auto-detect and deal with spam content that is posted to your site. Spam can be automatically unpublished and/or deleted. The spam module also provides four main mechanisms for automatically detecting spam: a trainable Bayesian filter, manually entered custom filters, counting the number of URLs, and detection of content posted from open email relays.

Features:

  • Written in PHP specifically for Drupal.
  • Highly configurable.
  • Automatically detects and unpublishes spam comments and other spam content.
  • Automatically learns to detect spam in any language using Bayesian logic.
  • Automatically learns and blocks spammer URLs.
  • Automatically blacklists IPs of learned spammers, preventing them from posting additional spam and wasting database resources.
  • Detects repeated postings of the same identical content.
  • Detects content containing too many links, or the same link over and over.
  • Supports the creation of custom filters using powerful regular expressions.
  • Can notify the user that his or her content was determined to be spam, preventing confusion over why their content doesn’t show up.
  • Can notify the site administrator in an email when spam is detected.
  • Provides ‘report as spam’ links allowing users to easily help detect spam.
  • Provides simple administrative interfaces for reviewing spam content.
  • Provides comprehensive logging to offer an understanding as to how and why content is determined to be or not to be spam.
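To make the "repeated postings of the same identical content" check concrete, here is a minimal Python sketch. It assumes content counts as identical after lowercasing and collapsing whitespace, and that fingerprints are simply kept in memory. `DuplicateDetector` and `content_fingerprint` are hypothetical names for illustration; they are not the Spam module’s own code, which is written in PHP.

```python
import hashlib
import re


def content_fingerprint(text: str) -> str:
    """Hash the text after normalising case and whitespace (an assumed normalisation)."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class DuplicateDetector:
    """Hypothetical helper that remembers fingerprints of content it has already seen."""

    def __init__(self) -> None:
        self._seen = set()

    def is_duplicate(self, text: str) -> bool:
        fingerprint = content_fingerprint(text)
        if fingerprint in self._seen:
            return True  # same content was posted before
        self._seen.add(fingerprint)
        return False
```

With this sketch, posting "Buy cheap pills now" and then "buy  cheap pills NOW" flags the second post as a duplicate. Storing a fixed-length hash rather than the full text keeps the lookup table small, which matters when spammers repost the same story hundreds of times.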

Overview:

The Bayesian filter does statistical analysis on spam content, learning from the spam and non-spam it sees to determine the likelihood that new content is or is not spam. The filter starts out knowing nothing, and has to be trained every time it makes a mistake. This is done by marking spam content on your site as spam when you see it. Each word of the spam content will be remembered and assigned a probability. The more often a word shows up in spam content, the higher the probability that future content with the same word is also spam. As most comment spam contains links back to the spammer’s websites (e.g. to sell Prozac), the Bayesian filter provides a special option to quickly learn and block content that contains links to known spammer websites.
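The training loop described above can be sketched in a few lines of Python. This is a generic naive-Bayes word-probability filter under stated assumptions (a simple lowercase tokenizer, per-word probabilities clamped to 0.01..0.99, words counted once per post); it illustrates the idea and is not the Spam module’s actual algorithm. `BayesFilter` and its methods are hypothetical names.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> set[str]:
    return set(TOKEN_RE.findall(text.lower()))


class BayesFilter:
    """Toy trainable filter that learns per-word spam probabilities."""

    def __init__(self) -> None:
        self.spam_words: Counter[str] = Counter()  # word -> spam posts containing it
        self.ham_words: Counter[str] = Counter()   # word -> good posts containing it
        self.spam_posts = 0
        self.ham_posts = 0

    def train(self, text: str, is_spam: bool) -> None:
        """Call whenever an admin marks content as spam or not spam."""
        words = tokenize(text)
        if is_spam:
            self.spam_words.update(words)
            self.spam_posts += 1
        else:
            self.ham_words.update(words)
            self.ham_posts += 1

    def word_probability(self, word: str) -> float:
        spam_freq = self.spam_words[word] / max(self.spam_posts, 1)
        ham_freq = self.ham_words[word] / max(self.ham_posts, 1)
        if spam_freq + ham_freq == 0:
            return 0.5  # never seen: neutral
        p = spam_freq / (spam_freq + ham_freq)
        return min(max(p, 0.01), 0.99)  # clamp so no single word is conclusive

    def spam_probability(self, text: str) -> float:
        probs = [self.word_probability(w) for w in tokenize(text)]
        if not probs:
            return 0.5
        # Naive Bayes combination in log space to avoid underflow:
        # P = prod(p) / (prod(p) + prod(1 - p)) = 1 / (1 + exp(diff))
        diff = sum(math.log(1 - p) for p in probs) - sum(math.log(p) for p in probs)
        if diff >= 0:
            e = math.exp(-diff)  # e <= 1, cannot overflow
            return e / (1 + e)
        return 1 / (1 + math.exp(diff))  # diff < 0, cannot overflow
```

After training on a couple of posts in each category, text made of words only ever seen in spam scores close to 1, words only seen in good posts score close to 0, and unseen words stay neutral at 0.5. That neutrality is why the filter "starts out knowing nothing" and needs marking to improve.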

The custom filtering functionality can blacklist, whitelist or greylist based on the matching of words, phrases and regular expressions. For example, a custom filter can be defined to always mark content as spam if it contains the word ‘Viagra’. Or, a custom filter can be defined to increase the probability that content is spam if it matches the case insensitive regular expression /free/i.
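Here is one way the three filter actions could fit together, sketched in Python. It assumes filters are checked in order, that a matching blacklist or whitelist rule decides immediately, and that a greylist rule nudges the spam probability by a fixed weight. `CustomFilter`, `apply_filters` and the `weight` field are hypothetical; the Spam module’s real settings may be organised differently.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class CustomFilter:
    """Hypothetical filter rule."""
    pattern: re.Pattern[str]
    action: str          # "blacklist", "whitelist" or "greylist"
    weight: float = 0.0  # probability adjustment, used only by "greylist"


def apply_filters(text: str, probability: float, filters: list[CustomFilter]) -> float:
    """Adjust a spam probability (0..1) using custom filters, checked in order."""
    for rule in filters:
        if not rule.pattern.search(text):
            continue
        if rule.action == "blacklist":
            return 1.0  # always spam
        if rule.action == "whitelist":
            return 0.0  # never spam
        if rule.action == "greylist":
            probability = min(max(probability + rule.weight, 0.0), 1.0)
    return probability


# The two examples from the text: always-spam 'Viagra', and /free/i raising the odds.
filters = [
    CustomFilter(re.compile(r"viagra", re.IGNORECASE), "blacklist"),
    CustomFilter(re.compile(r"free", re.IGNORECASE), "greylist", weight=0.3),
]
```

With these rules, "Get VIAGRA here" is always spam, while "Free Drupal themes" with a starting probability of 0.2 is only raised to 0.5, leaving the final decision to the other checks.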

The spam module can also limit the total number of URLs allowed in comments and other content, as well as the number of times the same URL can be repeated in the same content. These limits can be different for comments and for other types of content. For example, if the module is set to allow the same exact URL to appear in a comment at most twice, a comment in which “http://kerneltrap.org/” shows up three or more times will be considered spam.
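The two URL limits can be sketched as a single Python check. The regular expression below is a rough, assumed URL matcher for illustration (real-world URL detection is messier), and `exceeds_url_limits` is a hypothetical name rather than part of the module.

```python
import re
from collections import Counter

# Rough URL matcher, assumed for illustration only.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def exceeds_url_limits(text: str, max_urls: int, max_repeats: int) -> bool:
    """True if text has more than max_urls links, or repeats one link more than max_repeats times."""
    urls = URL_RE.findall(text)
    if len(urls) > max_urls:
        return True
    repeats = Counter(urls)  # exact-match count per URL
    return any(count > max_repeats for count in repeats.values())
```

Mirroring the kerneltrap example, a comment containing "http://kerneltrap.org/" three times fails with `max_repeats=2`, while the same comment with the link twice passes. Separate `max_urls` and `max_repeats` values can be passed for comments and for stories, matching the per-content-type limits described above.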

The fourth tool for detecting spam is to look up the poster’s IP address in the Distributed Server Boycott List (http://dsbl.org/). If the address is listed, it is known to come from an untrusted email server such as an open relay and is marked as spam. The theory is that most comment-spammers are also email spammers.
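The article doesn’t say exactly how the module performs this lookup. DNS-based blocklists (DNSBLs) are conventionally queried by reversing the octets of an IPv4 address, prepending them to the list’s zone name and resolving the result: an answer means the address is listed, a failed lookup means it is not. The Python sketch below assumes that convention; `dnsbl.example.org` is a placeholder zone rather than a real list, and `is_listed` needs network access.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query, e.g. 192.0.2.10 -> 10.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blocklist zone has an address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False  # no record: not listed (or the lookup failed)
```

A call such as `is_listed(poster_ip, "dnsbl.example.org")` would then decide whether to mark the post as spam. Treating a lookup failure as "not listed" errs on the side of letting content through if the blocklist is unreachable.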

As a Drupal administrator, you can decide to enable any or all of the above tools as best suited to your needs.

For Drupal 5 based Drigg sites we recommend Spam v1.1.2, which is an older version but works very, very well.

Drupal Spam Module v1.1.2

There is a port of the Spam module available for Drupal 6 and, although it is a port, it seems to work adequately and without any major bugs. Note that the Drupal 6 port is v3 of the Spam module, which we don’t think works as well as v1.1.2 for Drupal 5 just yet.

Drupal Spam Module (Drupal 6 Ported Version)

The Spam module is trainable, and it holds submitted stories and comments that it detects as spam in an unpublished area, out of your users’ view. You then decide whether those articles are spam, and can choose to delete or publish them. A time limit for automatic deletion is also configurable, along with email alerts. This is a very powerful module and very effective at protecting your Drigg-based website against spam submissions.
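The hold-and-review workflow with an automatic-deletion time limit can be sketched as a small queue in Python. Everything here is hypothetical (`SpamQueue`, `HeldItem`, integer item IDs, a `retention` period); it only shows the idea that flagged items sit hidden until an admin releases them or the retention period runs out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class HeldItem:
    item_id: int       # hypothetical story or comment ID
    held_at: datetime  # when it was flagged and unpublished


@dataclass
class SpamQueue:
    """Hypothetical review queue: flagged items wait here, hidden from users."""
    retention: timedelta  # auto-delete anything held at least this long
    items: dict[int, HeldItem] = field(default_factory=dict)

    def hold(self, item_id: int, now: datetime) -> None:
        self.items[item_id] = HeldItem(item_id, now)

    def release(self, item_id: int) -> None:
        """Admin judged the item not spam: drop it from the queue so it can be published."""
        self.items.pop(item_id, None)

    def purge_expired(self, now: datetime) -> list[int]:
        """Remove and return IDs held past the retention period (to be deleted)."""
        expired = [i for i, held in self.items.items() if now - held.held_at >= self.retention]
        for i in expired:
            del self.items[i]
        return expired
```

In practice `purge_expired` would run periodically (for example from a scheduled job), and the returned IDs would be deleted, while anything the admin releases never reaches that point.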

Drupal Troll Module (Drupal 5 Only)

The Troll module gives owners several features to administer and protect against spam users that have registered for your Drigg based website.

This module provides troll management tools for community sites, including tracking users by IP address, banning IP addresses, advanced user searching and blocking users by role.

  • user tracking by IP address
  • bans IP addresses forever or for a set duration; a ban redirects the visitor, blocking the complete site rather than just account creation and login
  • advanced user account searching
  • easy user blocking by role
  • IP block blacklisting with an import feature from Okean, SPEWS, or a custom list
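Time-limited IP bans and imported IP-block blacklists can be sketched together in Python using the standard `ipaddress` module. `IPBanList` is a hypothetical class, and the import format it accepts (one IP or CIDR range per line, `#` for comments) is an assumption; it does not reproduce the actual Okean or SPEWS list formats or the Troll module’s code.

```python
from __future__ import annotations

import ipaddress
from datetime import datetime, timedelta


class IPBanList:
    """Hypothetical ban list keyed by network, with optional expiry times."""

    def __init__(self) -> None:
        # network -> expiry time, or None for a permanent ban
        self._bans: dict[ipaddress.IPv4Network | ipaddress.IPv6Network, datetime | None] = {}

    def ban(self, cidr: str, now: datetime, duration: timedelta | None = None) -> None:
        """Ban a single IP ("203.0.113.7") or a range ("198.51.100.0/24")."""
        network = ipaddress.ip_network(cidr, strict=False)
        self._bans[network] = None if duration is None else now + duration

    def import_blocklist(self, lines: list[str], now: datetime) -> int:
        """Permanently ban each entry: one IP or CIDR per line, '#' starts a comment."""
        count = 0
        for line in lines:
            entry = line.split("#", 1)[0].strip()
            if entry:
                self.ban(entry, now)
                count += 1
        return count

    def is_banned(self, ip: str, now: datetime) -> bool:
        addr = ipaddress.ip_address(ip)
        for network, expires in self._bans.items():
            if expires is not None and now >= expires:
                continue  # temporary ban has run out
            if addr in network:  # IPv4 vs IPv6 mismatch is simply "not in"
                return True
        return False
```

A one-day ban works as a temporary warning: the address is blocked immediately and allowed back once the day has passed, while imported ranges stay banned permanently.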

Troll is a must have module if you want good spam control over any repeat offenders that will sign up to your site and try to spam.

ReCAPTCHA Module

Using reCAPTCHA instead of other Drupal CAPTCHA methods effectively stops bots from signing up to your site; it won’t, however, stop human spammers. As well as the registration form, you can easily place reCAPTCHA on the story submission and comment submission forms of your Drigg website for an added layer of protection.

It uses the reCAPTCHA web service to improve the Drupal CAPTCHA system and protect email addresses.

ReCAPTCHA is an add-on for the Drupal CAPTCHA module, which can be downloaded from the link below.

Drupal CAPTCHA Module

Conclusion

The Drupal Spam module really is excellent and very effective at handling spam submissions or comments of any kind. Being able to specify particular keywords as custom filters to identify possible spam is excellent and works really well. URL filtering is also an excellent feature, and it stops any URLs you identify as spam from being submitted again.

If you are using Drupal 5, Troll is also brilliant for banning users on several levels, including by account and by IP, which is effective and fast to do. A time limit can also be placed on a ban as a temporary warning, which is nice.

What Drupal offers over its main competitors is effective spam control once you have trained your filters on the type of spam being submitted to your site. Drupal’s modules are also fully configurable, and their functions and features can all be administered from the administration section, an essential feature that most of its competitors’ current modules lack.

No matter what system you use, spammers will always turn up, but by using Drigg for Drupal you will not face the same headaches Pligg users have faced in the past and still face today.

We will soon follow up this article with a post about how to configure the Spam module, looking at some of its features in more detail.


Article Details


Author: Lincoln on December 24th, 2008

Category: Drigg, Drupal


  1. Wim Mostrey says:

    Is there a specific reason why you don’t mention http://drupal.org/project/mollom as a means to protect a Drupal project against spam? The integration is complete, very user-friendly and free for low-traffic sites. It’s really becoming the standard in spam protection for Drupal.

  2. Lincoln says:

    Hi Wim,

    We didn’t mention Mollom in this article as we wanted to show the options Drupal users have without requiring any third-party external services. We will, however, be covering Mollom in an upcoming article.

  3. Multus says:

    I’m working on a Digg clone for a political news website in the Netherlands. So thanks for the article, very helpful to me!
