CORPUS_SUBMIT


 1. If you don't already have a submission account, send a request to
    submit@spamassassin.org and ask for a GA mass-check submission account.
    You will receive your username and password via email; the account is
    only good for GA mass-check submissions.  If you're interested in the
    nightly submissions, please see the CORPUS_SUBMIT_NIGHTLY document.

 2. Get the latest version of SA following the instructions in the GA
    mass-check announcement email.
 3. Now cd to the "masses" directory in the checked-out CVS code tree.
 4. Read the README to understand what mass-check does.
 5. Run mass-check against your ham mail archive.
 6. sort -k2,2 -rn ham.log | head -20
 7. Check each of those 20 messages by hand to make sure they're not spam that
    slipped through, or a forward of a spam message.
 8. Remove or re-file any misfiled messages, then repeat steps 5-7 until
    the top 20 are "clean".
 9. Repeat steps 5-8 for your spam archive until it is "clean"
    (except use sort -k2,2 -n spam.log | head -20 to look for low-scoring spam).
10. Run a mass-check for ham and spam together (one mass-check run)
11. Rename ham.log and spam.log to the appropriate filenames.  ** see note below **
12. rsync -CPcvzb ham-yourname.log spam-yourname.log username@rsync.spamassassin.org::submit
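The checks in steps 6-9 can be wrapped in two small shell functions.  This
is a minimal sketch, not part of mass-check itself: the function names
top_ham and low_spam are hypothetical, and it assumes the score is the
second whitespace-separated field of each log line and that any comment
lines in the log start with "#".

```shell
#!/bin/sh
# Hypothetical helpers for steps 6-9.  Assumes: score is field 2 of each
# log line, and lines starting with "#" are comments to be skipped.
# LC_ALL=C makes sort treat "." as the decimal point in every locale.

# Highest-scoring ham: likely spam that slipped into the ham archive.
# Usage: top_ham [logfile] [count]
top_ham() {
    grep -v '^#' "${1:-ham.log}" | LC_ALL=C sort -k2,2 -rn | head -n "${2:-20}"
}

# Lowest-scoring spam: likely ham that slipped into the spam archive.
# Usage: low_spam [logfile] [count]
low_spam() {
    grep -v '^#' "${1:-spam.log}" | LC_ALL=C sort -k2,2 -n | head -n "${2:-20}"
}
```

With no arguments, top_ham reads ham.log and low_spam reads spam.log,
each printing 20 lines, matching the commands in steps 6 and 9.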

Thanks for your help!


Note: The names of the files you need to upload depend on which type of
mass-check you've run.  The four types are the combinations of
with/without Bayes and with/without Net rules.  The resulting filenames
are:

Set 0		ham-nobayes-nonet-username.log	spam-nobayes-nonet-username.log
Set 1		ham-nobayes-net-username.log	spam-nobayes-net-username.log
Set 2		ham-bayes-nonet-username.log	spam-bayes-nonet-username.log
Set 3		ham-bayes-net-username.log	spam-bayes-net-username.log

For GA mass-check runs, there will be three announcements, one each for
sets 1-3.  Set 0 results are not requested separately, since they can be
derived by removing the net rule results from the set 1 logs.
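Step 11 can be scripted from the table above.  This is a sketch only: the
helper names log_name and rename_logs are hypothetical, and it assumes
ham.log and spam.log are in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: print the upload filename for a log kind (ham/spam),
# a GA set number (0-3) and a submission username, per the table above.
log_name() {
    kind=$1 gaset=$2 user=$3
    case $gaset in
        0) bayes=nobayes net=nonet ;;
        1) bayes=nobayes net=net ;;
        2) bayes=bayes net=nonet ;;
        3) bayes=bayes net=net ;;
        *) echo "unknown set: $gaset" >&2; return 1 ;;
    esac
    printf '%s-%s-%s-%s.log\n' "$kind" "$bayes" "$net" "$user"
}

# Hypothetical helper: rename ham.log and spam.log for the given set.
# Usage: rename_logs SET USERNAME
rename_logs() {
    set_no=$1 user=$2
    ham=$(log_name ham "$set_no" "$user") || return 1
    spam=$(log_name spam "$set_no" "$user") || return 1
    mv ham.log "$ham" && mv spam.log "$spam"
}
```

For example, after a set 1 run, "rename_logs 1 username" produces
ham-nobayes-net-username.log and spam-nobayes-net-username.log, which are
the two files to pass to rsync in step 12.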