
How can I plug sa-learn into Citadel to 'train' SpamAssassin?

Here's how one site did it. (Thanks to Jon Watson for this contribution.)

Shawn recently wrote an entry on his blog detailing how to get sa-learn to work with Kolab. For the uninitiated, sa-learn is the 'learning' component of SpamAssassin. In short, you tell sa-learn that you're feeding it a bunch of spam, and it reads it and learns more about what spam looks like. Same with ham (your 'good' mail): feed sa-learn ham, tell it it's ham, and it learns what good email looks like. It's a very cool little dealio, but it's only meant to work on individual email files or on mbox or mbx mailboxes. Citadel doesn't use any of these formats - it stores everything in a database. That leaves no message file or mailbox on disk to point sa-learn at, so the obvious workarounds don't work.
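For anyone who hasn't used sa-learn, here is a minimal sketch of the file-based training described above. The `train` function, the `SA_LEARN` variable, and the example paths are made-up names for illustration; `--spam`, `--ham`, and `--mbox` are sa-learn's own switches.

```shell
#!/bin/sh
# Sketch only: train sa-learn from files on disk.
# SA_LEARN can point at a different sa-learn binary (hypothetical variable).

# train CLASS FILE [mbox]
#   CLASS is "spam" or "ham"; pass "mbox" as the third argument
#   when FILE is an mbox mailbox rather than a single message.
train() {
  class=$1
  file=$2
  format=${3:-}
  case "$class" in
    spam|ham) ;;
    *) echo "usage: train spam|ham FILE [mbox]" >&2; return 1 ;;
  esac
  if [ "$format" = "mbox" ]; then
    "${SA_LEARN:-sa-learn}" "--$class" --mbox "$file"
  else
    "${SA_LEARN:-sa-learn}" "--$class" "$file"
  fi
}

# Example calls (paths are placeholders):
#   train spam /tmp/one-message.eml
#   train ham  "$HOME/mail/known-good.mbox" mbox
```

None of this helps with Citadel directly, since there is no such file to hand over, which is where the IMAP route comes in.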

So what's a guy to do? Well, what I always do is stand on the shoulders of giants so I look clever.

Apparently, I'm far from the first person to want to extend sa-learn's capabilities past the file level. There are a few projects out there that feed sa-learn email directly from an IMAP account instead of local files. Since Citadel provides an IMAP server, this approach works well for me.

The important part of my folder setup is this:

Inbox
 |
 SALearn
 |-Spam
 |-Ham

Different IMAP servers represent this folder hierarchy in different ways. Some use a dot (inbox.salearn.spam) and some - like Citadel - use a forward slash (inbox/salearn/spam). Armed with that knowledge, I used the information from the Apache Wiki to figure out how to use fetchmail to get my spam and ham and feed it to sa-learn.
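To make the delimiter point concrete, here is a tiny sketch that rewrites a folder path for a server using a different separator. The `imap_path` helper and the exact folder names are illustrative assumptions; check what your own server reports before relying on them.

```shell
#!/bin/sh
# Sketch only: spell the same folder path with a different hierarchy delimiter.

# imap_path PATH DELIM
#   PATH is written with '/', DELIM is the single character the server uses.
imap_path() {
  printf '%s\n' "$1" | tr '/' "$2"
}

imap_path "INBOX/SALearn/Spam" "."   # dot-style servers
imap_path "INBOX/SALearn/Spam" "/"   # slash-style servers such as Citadel
```

An IMAP server reports its delimiter in the response to a LIST command, so that is the place to look if you are unsure which form to use.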

I created my fetchmailrc file as described and then made two script files: one for spam and one for ham. Yes, they can be combined into one file, but I'm a task-based thinker and I like them separate. There's only one line in each file:

Fetchmail uses the server and login information in the fetchmailrc file I created to log in to my account, grabs the email from the Spam or Ham folder respectively, and feeds it to sa-learn with the proper ham or spam switch.
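As a rough sketch of that mechanism (not the original one-liners), something along these lines should do the job. The host, user name, password, rc file path, folder names, and the `write_rc` and `learn` helpers are all placeholders I made up; the fetchmail options (`-f`, `--all`, `--silent`, `--folder`, `--mda`) are real, but check them against your fetchmail version.

```shell
#!/bin/sh
# Sketch only: every name and value below is a placeholder.

RC="${RC:-$HOME/.fetchmailrc-salearn}"

# Write a minimal rc file. fetchmail is picky about rc file permissions,
# so keep it readable by the owner only.
write_rc() {
  (
    umask 077
    cat > "$RC" <<'EOF'
poll mail.example.com protocol IMAP
  user "jon" password "changeme"
EOF
  )
  chmod 600 "$RC"
}

# learn CLASS
#   Fetch everything from the matching folder and hand each message
#   to sa-learn on stdin via fetchmail's --mda option.
#   Without --keep, fetchmail removes fetched messages from the folder,
#   so nothing gets learned twice.
learn() {
  case "$1" in
    spam) folder="INBOX/SALearn/Spam" ;;
    ham)  folder="INBOX/SALearn/Ham" ;;
    *)    echo "usage: learn spam|ham" >&2; return 1 ;;
  esac
  "${FETCHMAIL:-fetchmail}" -f "$RC" --all --silent \
    --folder "$folder" --mda "sa-learn --$1"
}

# Run as "script.sh spam" or "script.sh ham"; with no argument it does nothing.
if [ "$#" -gt 0 ]; then
  learn "$1"
fi
```

Keeping `learn spam` and `learn ham` in two separate scripts matches the task-per-file setup described above; combining them is just a matter of calling both.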

Now, the only reason this works is that I'm diligent about dragging the email in my inbox to either the spam or ham folder depending on what I want SA to learn from it. I *copy* my good mail (ham) to the ham folder and I *move* any spam mail to the spam folder. The scripts do the rest once an hour or so.

Cron it up and you're done!
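For the cron part, a minimal sketch: the helper below just prints two crontab lines that run the scripts once an hour, offset so they don't overlap. The `cron_lines` name, the script file names, and the directory are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: print hourly crontab entries for the two learning scripts.

# cron_lines [DIR]
#   DIR is where the (hypothetically named) scripts live.
cron_lines() {
  dir=${1:-$HOME/bin}
  printf '5 * * * * %s/learn-spam.sh\n' "$dir"
  printf '35 * * * * %s/learn-ham.sh\n' "$dir"
}

cron_lines
```

To install the entries, paste the output into `crontab -e`, or append it to the existing table with `(crontab -l 2>/dev/null; cron_lines) | crontab -`.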
