
Bayesian Spam Filters for Message Boards?


IsaacW
04-15-2005, 12:40 PM
Has anyone seen an implementation of a probability-based spam filter for message boards? Paul Graham wrote about this a long time ago in "A Plan for Spam" (http://www.paulgraham.com/spam.html), and I think it might be possible to do this on message boards as well as in inboxes. For example, much of the spam that gets posted here looks like e-mail spam: a link plus a bit of relevant-sounding text.
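For anyone who hasn't read the essay, here is a minimal Python sketch of the kind of token-based filter Graham describes. Each token gets an estimated spam probability from its counts in known spam and known good posts, and the most "interesting" tokens (those farthest from a neutral 0.5) are combined into a single score. The constants (doubling good counts, skipping rare tokens, clamping to 0.01/0.99, 0.4 for unseen tokens, 15 tokens) follow the essay as best recalled and should be checked against it. The tokenizer, function names, and the idea of feeding it forum posts are assumptions for illustration, not code Graham or 2+2 published.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters, digits, $, apostrophes, and hyphens.
    return re.findall(r"[a-z0-9$'-]+", text.lower())

def token_probs(spam_posts, good_posts, min_count=5):
    """Map each token to an estimated P(spam | token), Graham-style."""
    spam_counts = Counter(t for p in spam_posts for t in tokenize(p))
    good_counts = Counter(t for p in good_posts for t in tokenize(p))
    nbad, ngood = len(spam_posts), len(good_posts)
    probs = {}
    for tok in set(spam_counts) | set(good_counts):
        g = 2 * good_counts[tok]  # double good counts to bias against false positives
        b = spam_counts[tok]
        if g + b < min_count:     # too rare to trust; treated as unknown later
            continue
        pb = min(1.0, b / nbad)
        pg = min(1.0, g / ngood)
        probs[tok] = max(0.01, min(0.99, pb / (pg + pb)))
    return probs

def spam_probability(text, probs, unknown=0.4, n_interesting=15):
    """Combine the most extreme token probabilities into one score."""
    ps = [probs.get(t, unknown) for t in set(tokenize(text))]
    # Keep only the tokens whose probability is farthest from a neutral 0.5.
    ps = sorted(ps, key=lambda p: abs(p - 0.5), reverse=True)[:n_interesting]
    spamminess = math.prod(ps)
    hamminess = math.prod(1 - p for p in ps)
    return spamminess / (spamminess + hamminess)
```

A post would be flagged when its score exceeds some threshold (the essay uses 0.9). Doubling the good counts is the essay's way of leaning toward letting borderline messages through rather than wrongly flagging legitimate ones.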

What are the major obstacles to implementing this kind of technology to help fight our spam problem here at 2+2? Has this been attempted on a message board before?

Aren't I asking a lot of questions?

Greg J
04-15-2005, 12:47 PM
I actually think the mods do a nice job of keeping spam off here. I know you are asking a theoretical question, and not hammering the mods. I guess I'm raising an issue of prudence. Chances are, it could filter out otherwise relevant posts that contain a link through a statistical anomaly -- kind of like the one you just posted. :)

It's not a bad idea in theory -- I'm just not sure we need it.

AncientPC
04-15-2005, 12:52 PM
The difference between e-mail and forum spam is that e-mail spam can be generated automatically by a bot and set to run for hours on end.

Forum spam actually requires a human on the other end to register a username, authenticate the account, and then start posting. Most of these guys get caught pretty quickly by people using the "Report Moderator" function.

IsaacW
04-15-2005, 12:55 PM
My point certainly was not to bash the moderators here; they do a fine job. But if this proved to be an effective tool against message board spam and became widely used, perhaps the volume of message board spam would decrease as it became harder to get spam posts through.

Also, I singled out 2+2 because this is the only forum I frequent :D

IsaacW
04-15-2005, 12:58 PM
[ QUOTE ]
Forum spam actually requires a human on the other end to register a username, authenticate the account, and then start posting.

[/ QUOTE ]
I don't think this is necessarily true. I have a domain where I can receive e-mail sent to <anything>@domain.com in a single mailbox. With this, I could set up an automated signup service using randomly generated e-mail addresses. Then one just has to figure out the appropriate GET or POST URLs for the sign-in and post pages to sign in and post automatically, without any human intervention at all.

AncientPC
04-15-2005, 01:03 PM
Yes, you could probably authenticate automatically.

However, not all sign-up pages are the same, and how would you pick which forum to post in? Plus, different forums use different software . . .

IsaacW
04-15-2005, 01:10 PM
OK, so to write a good message board spambot you would have to "support" many different kinds of message board software. This is not really relevant to my original question, except to the extent that message board spam may not be a widespread enough problem to warrant such measures.

A Bayesian filter doesn't know or care whether a message was written by a computer or a person; it looks only at the content, and if it classifies a message as spam, the message is rejected.

SinSixer
04-16-2005, 04:46 AM
Autodelete any post with the word Thursday in it.

axioma
04-16-2005, 07:46 AM
Theoretically, I believe it could certainly be done with a naive Bayes filtering method.

Accuracy would be over 97% with a suitably large training set.

I have a friend who is an expert in Bayesian spam filtering.
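To make the naive Bayes suggestion concrete, here is a minimal multinomial naive Bayes sketch in Python, assuming moderators have already labeled past posts as "spam" or "ham" (good). It works in log space so long posts don't underflow, uses additive (Laplace) smoothing so a word seen in only one class doesn't zero out the other, and ignores words never seen in training. The class, function, and label names are hypothetical. The `accuracy` helper shows how a figure like "over 97%" would have to be checked, namely on held-out posts that were not used for training; the sketch itself makes no claim about what accuracy a forum like 2+2 would actually reach.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters, digits, $, apostrophes, and hyphens.
    return re.findall(r"[a-z0-9$'-]+", text.lower())

class NaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing strength

    def fit(self, posts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # number of training posts per class
        self.word_counts = {c: Counter() for c in self.classes}
        for post, label in zip(posts, labels):
            self.word_counts[label].update(tokenize(post))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, post, c):
        # log P(c) + sum over tokens of log P(token | c), with smoothing.
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denom = self.totals[c] + self.alpha * len(self.vocab)
        for tok in tokenize(post):
            if tok in self.vocab:  # skip tokens never seen in training
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
        return score

    def predict(self, post):
        return max(self.classes, key=lambda c: self.log_score(post, c))

def accuracy(model, posts, labels):
    """Fraction of held-out posts whose predicted label matches the true one."""
    hits = sum(model.predict(p) == y for p, y in zip(posts, labels))
    return hits / len(labels)
```

With real data, one would train on most of the labeled posts and call `accuracy` on the rest; evaluating on the training posts themselves would overstate the result.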