Our Anti-spam is Ready! - Accuspam [Archive] - Page 2 - SpeedGuide.net Broadband Community


accuspam
08-08-04, 08:03 AM
Improvements Made:

1. Replying to the Daily Summary with "R" (formerly "D") in the [ ] box no longer adds the sender to the Approved Senders list; it now works as documented. This matters because the recipient of the Daily Summary (the AccuSpam user) can then deliver a possible spam email whose subject is not clearly spam, without skewing the statistics or adding the sender to his or her Approved Senders list. Note this only applied to the free version of AccuSpam, as the paid version already did the correct thing.

2. Replying to the Daily Summary with any character(s) other than a blank space in the [ ] box no longer deletes the email. Instead it acts as if "R" was typed in the box (i.e. it delivers the email without adding the sender to the Approved Senders list). This is the safest action when the recipient of the Daily Summary makes a typo. It was implemented only now because we were waiting to implement #1 above.

3. Senders no longer add themselves to the Approved Senders list when replying to the AccuSpam confirmation message; only their email is delivered. This removes any means or incentive for spammers to reply to the confirmations in order to corrupt the statistics (although this apparently had not been exploited yet). A probable future improvement will offer senders a spammer-proof (CAPTCHA, http://www.captcha.net/) option to add themselves to the Approved Senders list. Note that confirmations are only sent to senders who are not yet on the Approved Senders list, and that all such senders are added to the list when the recipient (AccuSpam user) replies to the Daily Summary with an "A" in the [ ] box. The (removed and possibly future) ability of senders to add themselves to the recipient's Approved Senders list bypasses the Daily Summary and delivers the email immediately.

4. Fixed a bug mentioned previously in this thread (and listed then on our TO DO list): when an AccuSpam user disables his or her account, all emails currently held in storage (quarantined) by AccuSpam are now released (sent back) to the user's mailbox. Formerly, these emails were deleted and lost on disabling. The advice we gave earlier in this thread, to disable only immediately after replying to a Daily Summary, is no longer necessary. Note this only applied to the free version of AccuSpam, as the paid version always did the correct thing.

5. Added an "Approved Days" field, defaulting to 90 days, to the AccuSpam Login web page. Approved Senders who do not send email within "Approved Days" are removed, so the Approved Senders list does not grow ridiculously large from one-time or infrequent senders. This also minimizes the risk of receiving forged spam that appears to come from an Approved Sender (for senders who do not use AccuSpam anti-forgery):

http://www.accuspam.com/faq.php#as_spam

-Shelby Moore
http://AccuSpam.com

iaus10
08-08-04, 08:57 AM
Again I will repeat, this is because the spams are arriving in the 3 - 5 minutes prior to when you check your email each time.

http://www.accuspam.com/faq.php#as_spam

"If you have not enabled the 100% spam blocking then the ability to block 99% will be reduced significantly if you have your email program set to check for new email every few minutes or sooner. You should either increase the frequency that you or your email program regularly check for new email to 30 minutes or more. Else, enable the 100% spam blocking."

I need to know the email address for your AccuSpam account so I can offer you more tips. You are free to email it to support@accuspam.com if you prefer.

Again I will ask you: how many spams does the Daily Summary state it deleted for you? If you tell me the email address for your AccuSpam account, I can look up that data in the database; otherwise, I need you to answer, please.

Again I will repeat that I think AccuSpam is working as advertised for you. You have to follow the instructions.
"Again I will repeat" ??

I posted my email address in a previous thread, and I pasted a copy of the daily summary (which includes the stated spams)................ whatever, forget it. I'm done.

accuspam
08-08-04, 09:25 AM
I posted my email address in a previous thread...

Oh, now I found it. You had obscured it, so I did not see it:

http://forums.speedguide.net/showpost.php?p=1375363&postcount=74

"My email is iaustin AT comcast dot net"

To tell you the truth, I am a very busy person, and I scan responses. My eye is trained to find the "@" sign. Wouldn't it be nice not to have to obscure your email address when communicating in a forum? I think so, and that is why I created AccuSpam and am not afraid to tell you that my business email address is:

shelby@coolpage.com

And my personal email address is:

coolpage@earthlink.net

Spammers feel free to email there any time. I am not afraid to write an email address the way it is supposed to be written.

...and I pasted a copy of the daily summary (which includes the stated spams)...

Yes, I remember seeing that 100+ line post, and I was not too enthralled. Why spam the whole forum by posting an entire Daily Summary? Most of the Daily Summary has no information of value to anyone other than you. All I asked for were the counts, which are only 2 or 3 lines of the Daily Summary.

And my point was that if you are going to accuse AccuSpam of not blocking spam, then provide all the counts at the same time, so the spams in your Inbox can be seen in the context of all the spams in your mailbox which WERE deleted by AccuSpam. You did not provide updated counts in this most recent post.

Indeed, now that I know your AccuSpam account, I can see:

mysql> select * from filter where userid=27;
+----------------+--------+------------+---------------+-------------+------------+--------------+
| ChangeDate     | UserId | IsUpdating | LastMsgsCount | LastMsgUIDL | NumDeleted | NumDelivered |
+----------------+--------+------------+---------------+-------------+------------+--------------+
| 20040806061824 |     27 |          0 |             0 |             |         26 |           20 |
+----------------+--------+------------+---------------+-------------+------------+--------------+
1 row in set (0.00 sec)

that you do not receive much spam at all, and most of it is being missed by AccuSpam. So now I am very sure that you have your email program set to check for new email every 30 seconds or something similarly frequent, in direct defiance of the instructions I had given you previously:

http://www.accuspam.com/faq.php#as_spam


...whatever, forget it. I'm done.

Anyway, the point is that I told you in a past post the reasons you could receive spam in your Inbox with the free version. Your email program is still checking email too frequently to work well with the free version. You did not follow the instructions in the AccuSpam FAQ, even after I pointed you to the FAQ in previous posts.

If you want to get angry with me, so be it. I also told you previously that I don't think you need AccuSpam. Comcast is already providing BrightMail to you for free. If you want 100% blocking, you will need the paid version of AccuSpam. If you want close to 99% blocking, then you already have it with BrightMail, else you need to use the free version of AccuSpam properly.

So yes, I would not expect you to try too hard to use AccuSpam. Also, in a previous post directed to you personally, I asked you not to try AccuSpam again until I gave you the green light, because I could sense that you would require more personal attention than I have to give you at the moment. Right now I am more focused on reducing the # of emails in the Daily Summary than on acquiring new users who are already using the very good BrightMail anti-spam.

accuspam
08-08-04, 09:40 AM
...My email is iaustin AT comcast dot net...

Also, a spammer can just as easily write a program to harvest this very common way of obscuring an email address as one that harvests the actual email address format.

The regular expression would look for a word, followed by whitespace, "at" (upper or lowercase), whitespace, a word, whitespace, "dot" (upper or lowercase), whitespace, and a top-level domain, where a word is any run of characters legal in an email address, excluding "@" and the period.

So don't feel too safe writing your email address that way. All you accomplish is making it more difficult to communicate legitimately.

TrevGlas
08-08-04, 10:56 AM
Tried signing up with my info, I keep getting server time out.. can ya help me out?

Paft
08-08-04, 01:22 PM
Also, a spammer can just as easily write a program to harvest this very common way of obscuring an email address as one that harvests the actual email address format.

He's right.

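#include <fstream>
#include <iostream>

int main()
{
    // Assumes the saved page contains just "user AT domain dot ext"
    std::ifstream file("retrieved_html_page.html");
    if (!file) return 1;
    char user[64], atrate[4], domain[64], dot[8], ext[8];
    file.getline(user, 64, ' ');    // e.g. "iaustin"
    file.getline(atrate, 4, ' ');   // "AT", discarded
    file.getline(domain, 64, ' ');  // e.g. "comcast"
    file.getline(dot, 8, ' ');      // "dot", discarded
    file.getline(ext, 8);           // e.g. "net"
    file.close();

    std::cout << user << '@' << domain << '.' << ext << '\n';
    return 0;
}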

I prefer to just write my email in Unicode. THAT ****s spammers up really quickly. I know I'd rather not use a regexp to try to decode a long Unicode address.

accuspam
08-09-04, 01:35 AM
Tried signing up with my info, I keep getting server time out.. can ya help me out?

Apparently your host refuses POP connections not from its own network?

Indeed what AccuSpam is reporting to you is true. The problem is apparently with "mail.mchsi.com". It will not accept a connection over port 110 (the POP port). Contact www.mchsi.com and ask them what the problem is. Perhaps they only allow connections to "mail.mchsi.com" from their own IP addresses (network)? If you are connected to internet via different ISP (not www.mchsi.com), can you still POP your email (not using WebMail)? I would guess the answer is no. You need to complain to www.mchsi.com.


telnet mail.mchsi.com 110
Trying 204.127.203.151...
telnet: connect to address 204.127.203.151: Operation timed out
telnet: Unable to connect to remote host

accuspam
08-09-04, 01:44 AM
Alerting thread followers that I updated this post with some proof that Bayesian can be defeated every time:

http://forums.speedguide.net/showpost.php?p=1374288&postcount=69

Note the post is now prominently linked from Google:

http://www.google.com/search?q=defeat+bayesian

I expect spammers to figure out how to defeat Bayesian eventually.

I am not trying to encourage spam, but I think it is important that users of Bayesian understand the risks, just as I think users understand that obscuring an email address on a public web page is not necessarily effective unless it is done in such an uncommon way that it obscures the address from most humans as well.

The reason I am pursuing AccuSpam's statistical idea, in spite of the reasonable performance many recipients currently get with other existing anti-spam, is that I reason those other anti-spam methods can and will be subverted as they become more popular.

Again I have outlined how I feel AccuSpam can be subverted as well, and explained why I think it is not likely:

http://forums.speedguide.net/showpost.php?p=1376466&postcount=97

I am currently working on implementing more effective statistical algorithms to reduce the # of spams in Daily Summaries from AccuSpam.

Prey521
08-09-04, 01:47 AM
I've been using Accuspam for the last month or so and so far, good things. I enjoy the daily summaries and replying, LOL. It's getting less and less recently.

accuspam
08-09-04, 10:44 AM
I continue to analyze the anti-spam alternatives to AccuSpam.


BAYESIAN:

I have outlined that the apparently very popular Bayesian anti-spam (used in SpamAssassin, http://www.paulgraham.com/spamfaq.html, etc.) can be defeated by spam which includes no more than a few words (or oddities) used only in spam AND which simultaneously includes words which are used in most of the recipient's non-spam.

Also, Bayesian carries a major risk that your future legitimate email which does not use the same words as your current legitimate email will be erroneously blocked. More details and examples were given in a previous post:

http://forums.speedguide.net/showpost.php?p=1374288&postcount=69

Probably the way to destroy Bayesian overnight is to distribute an email virus which sends random spams to every email address it can find on the infected user's computer and/or to a huge spam list. The spams would have no purpose (and no cost) to the spammer other than to pollute the Bayesian databases to the point that they were totally useless.

Since Bayesian can filter such a high % of spam for many people (some claim 98+%), it has gained (imho a blind) popularity, probably without adequate evaluation of the risk of losing legitimate email and of reduced effectiveness in the future. I have quoted a research paper in an earlier post which indicates that if you value a legitimate email 1000 times more than not receiving a spam email, then the risk cost of using Bayesian is greater than using no filter at all. And spammers have not even begun to fully exploit the ways Bayesian can be defeated.


BRIGHTMAIL:

Currently used by 6 of the top 10 ISPs (Comcast, MSN, Earthlink, etc.) and claiming to process 16% of all internet email, BrightMail boasts 95% spam blocking with near 0% (1 in a million claimed) blocking of non-spam. It began as an incubation company run by a former Symantec employee (Enrique Salem, who incidentally I went to high school with and knew personally) and was recently acquired by Symantec.

BrightMail uses heuristic rules written by humans to filter spam. They monitor any spam which defeats these rules on their probe network, and write new filters every few hours:

http://www.economist.com/business/displayStory.cfm?Story_ID=569825

These rules are apparently mostly combinations of content and origin reputation filters.

To defeat BrightMail, one obvious technique would be to send smaller spam runs (less chance of hitting the probe network and less time for BrightMail to get the rules out) and to randomly alter the content and origin of the spam sent, including any URLs or other aspects of the email (headers, etc.) that can offer a traceable signature. If the volume of mutations is greater than BrightMail's human economy of scale, then BrightMail either has to find a way to automate the correlation of the randomness or has to hire more humans. However, spammers' economy-of-scale costs can also increase if they have to deal in smaller spam runs and with increased effort to mutate in random, uncorrelatable ways.

If any spammer can ever figure out a way to determine which email addresses in the spam list are the spam probes, then those email addresses can be eliminated and BrightMail will suffer drastically. Since BrightMail already filters 16% of all email (and this share will grow as BrightMail gains market share), a spammer could send spam runs which are, say, 1/10000 the size of his or her mailing list, randomly selecting email addresses and, very importantly, randomly mutating all aspects of the spam sent (to be sure that the spam is not caught by an existing BrightMail rule). The spammer would then record the response rates for the different spam runs (sets of email addresses), plug this data into a huge system of simultaneous equations, begin to get some indication of which email address sets are underperforming (these must contain the spam probes), and remove them from the lists.

Another possible method spammers could employ to detect spam probes would be to send very normal emails, not likely to be caught by any type of spam filter, whose only purpose is to elicit a response (the "I love you" email virus comes to mind), i.e. emails that most people likely to respond to spam would reply to. Then all those who don't reply could be removed from the spam lists, saving the spammer the cost of sending spam to those who won't respond anyway. However, I think BrightMail replies to these. So perhaps the inverse would work better, i.e. to delete from the spam list all those who reply to a request to reply, under the logic that most of those replying are not normal humans, because normal humans would not reply to something they do not need.

But as of now, spammers just do not have the incentive to do this. Why go through all that effort to get 16% of your spam through, when you can just increase the effort to spam the other 84%?

So you really won't see spammers trying very hard to defeat these systems unless it is really easy (which is what worries me about using Bayesian) or until say 50% or more of their spam is being filtered.

Incidentally, the randomized email virus would also probably get past BrightMail and cause major havoc, although the damage would not be as permanent as the damage it would cause to Bayesian.

So again, I reiterate that we feel BrightMail is a strong alternative and will continue to be so for the foreseeable future. But BrightMail can never block 100% of incoming spam, because not all spam hits the BrightMail probe network and/or new rules cannot always be delivered in time.

accuspam
08-10-04, 11:03 PM
OVERVIEW OF ACCUSPAM STATISTICAL PERFORMANCE AND METHODS

All inventions, methods, and algorithms stated herein are the intellectual property of AccuSpam, and posting descriptions of them here in no way assigns any rights or gives a license to anyone to use them. Any policy of SpeedGuides with regard to intellectual property rights over posts to its website is superseded by this paragraph. Should SpeedGuides not agree, they may reject this post. These inventions, methods, and algorithms are incorporated into the products licensed to users by AccuSpam.com. This paragraph applies to all previous and future posts made by AccuSpam to SpeedGuides.

The trends in the statistics being built up from AccuSpam users are very encouraging. Looking at Tony's AccuSpam account as a benchmark example, given he receives on the order of 2000 spams per day, I see that AccuSpam is currently deleting 90% of the spam before the Daily Summary, and 10% of it is being summarized in the Daily Summary. However, 200 spams in the Daily Summaries per day (100 per Twice Daily Summary) is too much to browse in search of legitimate emails which may be buried. Looking at the ratios for spam domains in the current statistics of all users, and extrapolating to 10 times more users, assuming those ratios remain consistent, I project 99.5% to 99.85% deletion of spam before the Daily Summary, and 0.15% to 0.5% of it being summarized in the Daily Summary. I believe AccuSpam will continue to be near 0% false positive risk (towards 1 in a million), and of course AccuSpam prevents 100% (in the paid version) of spam from reaching the Inbox.

Contrast the above anecdotal evidence of AccuSpam's probable current and future performance with what I consider to be the best-of-breed alternative anti-spam: BrightMail.com, with 95+% spam deletion, 5% or less reaching the Inbox, and a near 0% (1 in a million claimed) false positive rate; and per-user trained Bayesian, with 98+% spam deletion often claimed, thus 2% or less reaching the Inbox, and a known false positive risk in the range of 0.03% to often 0.5%. The future risk is always high for Bayesian because, for example, you may be training on a lot of spam mentioning mortgage financing or insurance while currently receiving no legitimate email about these topics, but may in the future request some legitimate emails on those topics and find them deleted by the Bayesian filter. More details about the risks of Bayesian and BrightMail are in the following post:

http://forums.speedguide.net/showpost.php?p=1383127&postcount=111

Additionally, I have invented what I believe to be a superior variant of a Bayesian-like algorithm, applicable if necessary given AccuSpam's access to multiple users' data, which will not suffer from the high potential false positive risk and has less potential to be subverted in the ways I stated per-user Bayesian can be. If necessary, I could combine the probabilities from such an anti-spam algorithm with the statistical domain blocking algorithm to get better than a 99.85% spam deletion rate. Also, the statistical domain blocking will continue to build mass and improve. We can also apply other safe (1 in a million error rate) heuristics for known holes in the statistical domain blocking, such as defeating spammers' attempts to hide behind free email accounts by doing reverse DNS on email originating from free email domains (under the highly accurate assumption that free email users always send email from the free email provider's SMTP relays).

Fundamentally, what makes (or will make) AccuSpam's statistical domain blocking superior to traditional per-user Bayesian is that AccuSpam does not look at content, and thus cannot be tricked by content into not deleting spam or into deleting non-spam. Also, the ongoing effort of re-training AccuSpam is shared among all AccuSpam users, so the effort per user decreases as the number of AccuSpam users increases. Compared to BrightMail (as will be further elaborated below), AccuSpam's advantage is that it can build a higher statistical reach of spam probes, because each trusted AccuSpam user is a spam probe, the writing of rules is automated (it does not require hundreds of humans in a BLOC center), and the rules are updated in real time. This, for example, should make the future AccuSpam more effective than BrightMail against smaller spam runs and highly randomized spam, which, if such spam ever becomes prevalent, could make AccuSpam much less costly than BrightMail (to the ISP). Of course, I am projecting future performance, because AccuSpam cannot be compared with BrightMail statistically until we have a significant number of AccuSpam users, as compared to BrightMail's claimed 16% email market share. But AccuSpam will not need 16% email market share to compete with BrightMail, because BrightMail's users are not its spam probes (the spam probes are not 16% of the market).

Comparing AccuSpam's statistical approach to something like CloudMark.com (Vipul's Razor) or DCC, note that those statistical, user-feedback anti-spam systems correlate on content, so they can be tricked by (especially randomized) content. Also, as far as I know, those anti-spam systems do not correlate statistics on the spamness of a sender, so they cannot predict and delete future spam from the same spammer.

The 90% spam deletion rate was achieved by detecting nonexistent and forged senders, coupled with per-user statistical domain blocking. I have not yet activated the global (using all users' data) statistical domain blocking mentioned in the 2nd paragraph above, because I am waiting both for the statistics to reach a critical mass and for my implementation of a trust metric, so we can weed out any feedback from AccuSpam users which is statistically erroneous.

I will discuss some of the inventions we have made and are patenting.

I believe I have worked out an algorithm for correlating the trust of the feedback from an AccuSpam user from replying to the Daily Summaries, which makes it impossible for spammers to subvert or pollute by joining AccuSpam and approving their own spam. I refer the reader back to the list of unlikely ways that spammers could attempt to defeat AccuSpam:

http://forums.speedguide.net/showpost.php?p=1376466&postcount=97

Specifically I am addressing item #3 from the above post.

1. The naive spammer might join AccuSpam and send a bunch of email from his own domain and approve it. We catch this simply by ignoring statistical data from AccuSpam users which have a very high approved domains to disapproved domains ratio.

2. The astute spammer might join AccuSpam and send a bunch of email spoofing other domains, or even from his own domains he wishes to not use (sacrifice them for this purpose), and disapprove those. And send a bunch of email from his own domain and approve it. The rule in #1 would not catch this.

So we compute an RMS (root mean squared) error between each pair of AccuSpam users: for each domain voted on by both users, we take the difference between their ratios of disapproved to approved, square it, sum over all domains found, divide by the number of domains found, and take the square root. If a domain is voted on by only one of the two users, it contributes to the RMS error the larger of that user's ratio's differences from 0 and from 1. We then compute the mean and standard deviation of the RMS errors and ignore statistical data from AccuSpam users whose RMS error falls outside a chosen # of sigma confidence interval around the mean.

3. The more astute spammer would join multiple times to distort the mean and standard deviation. We can catch this by ignoring statistical data from AccuSpam users (and excluding them from the calculation of the RMS error mean and standard deviation) which either have an approved plus disapproved domains count below a reasonable threshold (AccuSpam users need to build enough data before being included in the global statistics), or which have an RMS error above some reasonable threshold.

4. The even more astute spammer would join multiple times and replicate the same approved and disapproved domains, so that these accounts have a relative RMS error of 0 or near 0. So a pairwise comparison across all pairs of users could be defeated. We can catch this by doing our pairwise correlations only against known spam probes (which obviously approve no domains), and possibly against AccuSpam users whose RMS error correlations to the spam probes are below a reasonable threshold.

5. The wisest spammer might approve his own domain and disapprove other real spammers' domains. Well, I welcome the spammers to please start fighting amongst each other this way :) We can also apply trust measures weighted by approval counts on a per-domain basis.

6. Another way to correlate, without relying on spam probes and arbitrary thresholds, is to use the intended recipient as the trust basis and correlate all other users' feedback to the intended recipient's feedback using RMS error. Statistics from other users can then be weighted according to the correlation of their feedback to the feedback of the recipient, thus tailoring the statistics to the people who think most like the intended recipient.

It is interesting to then relate this back to BrightMail's use of spam probes. BrightMail depends on spam probes to tell it which emails are spam: it applies its existing rules to the spam probes' email, any email that is not filtered is considered spam, additional rules are written by humans, and these rules are distributed every few hours to ISPs using BrightMail. With AccuSpam, by contrast, the spam probes merely serve as a starting point (basis) for trusting other AccuSpam users, and the AccuSpam users themselves are the spam probes. So if AccuSpam had the same market share as BrightMail, AccuSpam would have orders of magnitude more spam probes (and identify more spam) than BrightMail. Also, since AccuSpam users vote (reply to Daily Summaries) at random times throughout the day, new "rules" are written automatically and distributed in real time, as soon as the first spams arrive in users' mailboxes. Thus, given equal market share, small or highly randomized spam runs have a better chance of being detected and deleted before the Daily Summary is viewed with AccuSpam than with BrightMail. Obviously AccuSpam does not yet have anywhere near the market share of BrightMail, but the trend in the 2nd paragraph of this post indicates (anecdotally) that comparable performance will be achieved with much less market share, and very soon. Also, as spam diversifies (BrightMail was quoted as going from several 1000 attacks a few years ago to several 100,000 attacks today...), the number of humans needed at BrightMail to write new rules will increase, possibly faster than the # of BrightMail users, thus potentially increasing the cost per user of BrightMail. The # of AccuSpam users, on the other hand, will increase at a much faster rate than for BrightMail (because AccuSpam's market share is so much smaller), e.g. BrightMail might grow 50% per year while AccuSpam might grow 5000% per year, so the cost per user will be decreasing for AccuSpam.

There are other interesting theoretical comparisons to be fathomed, but the bottom line is to implement these things in AccuSpam and evaluate the results. For example, an interesting thing to ponder is what would happen to both AccuSpam and BrightMail if spammers figured out which email addresses are the spam probes. I do not think they ever will (it is more difficult for spammers to know this than for us to know which senders are spammers), but if it did happen, BrightMail could detect no spam, while AccuSpam would continue to identify spam through its previously trusted users, who would serve as the trust starting points instead of the spam probes. In other words, with AccuSpam, the users are the spam probes. Moreover, if #6 is employed, there are no "spam probes" in the BrightMail sense, and AccuSpam users serve as spam probes for each other to the degree that they correlate with each other.

-Shelby Moore
http://AccuSpam.com

accuspam
08-11-04, 01:06 AM
Expect AccuSpam performance to exceed all best-of-breed anti-spam probably within 1 more month.

Here is the best summary of current and future AccuSpam performance written to date:

http://forums.speedguide.net/showpost.php?p=1384314&postcount=112

We believe we have solved all the difficult issues now.

We are at 90% spam deletion, 100% blocking from Inbox now. 0% false positives for those willing to browse the 10% false negatives in Daily Summary.

Trending shows 99+% spam deletion, and continued 100% blocking from Inbox within a month or less. That will continue to be 0% false positives for those willing to browse the 0.15% to 0.5% false negatives in Daily Summary.

We could activate 99+% now, if we were willing to sacrifice false positives to be at the level of Bayesian. Instead, we will wait for a critical mass of statistics in order to keep AccuSpam's promise of near 0% (1 in a million) false positives.

accuspam
08-11-04, 05:21 AM
STATISTICAL BLOCKING BY DOMAIN

ACCUSPAM's USE OF BAYES THEOREM

AccuSpam's statistical blocking by domain is based on Bayesian probability, but applied differently from the way naive Bayesian is used to filter content in naive Bayesian anti-spam.

The naive Bayesian probability used by most anti-spam to filter content is:

P(a ! b) = P(b ! a) / [P(b ! a) + P(b ! ~a)], thus assuming P(a) = P(~a)

The NON-naive Bayesian probability used by AccuSpam is:

P(a ! b) = P(b ! a) * P(a) / [P(b ! a) * P(a) + P(b ! ~a) * P(~a)]

Where "!" means conditional probability "if", and "~" means complement or "not".

Thus:

a = "domain is spammer" (aka "spammer")
~a = "domain is not spammer" (aka "not spammer")
b = "email from domain is spam" (aka "spam")

P( spammer ! spam ) = P( spam ! spammer ) * P( spammer ) / [P( spam ! spammer ) * P( spammer ) + P( spam ! not spammer ) * P( not spammer )]

So, for example, let us assume that a spammer is any domain which sends more than 99% spam, i.e. that all email from any domain sending more than 99% spam is to be considered spam, allowing 1% for errors or disagreements over what is and is not spam, whether among trusted AccuSpam users or even among the emails received by a single AccuSpam user. Let us also assume that, on average, a domain which is not a spammer sends less than 30% spam, that our AccuSpam user receives 90% of his email as spam, and that the ratio of spammer domains to non-spammer domains correlates with the ratio of spam received.

This is non-intuitive, because the first intuitive thought, even of someone skilled in the art, is that blocking 99% of spam along with 1% of non-spam would yield a horrible 1% false positive rate. But we are not computing the probability that an email is spam directly. We are first computing the probability that a domain is a spammer. We thus make the assumption that the probability that a legitimate non-spam would come from a spammer (e.g. a domain that sends 99% or more spam in this example) is much less than 1%. So a non-intuitive invention herein is that we first compute the probability that a domain is a spammer, before computing the probability that an email is spam.

Then:

P( spammer ! spam ) = 0.99 * 0.9 / (0.99 * 0.9 + 0.3 * 0.1) = 0.967 = 96.7%

Conversely:

P( spammer ! non spam ) = P( non spam ! spammer ) * P( spammer ) / [P( non spam ! spammer ) * P( spammer ) + P( non spam ! not spammer ) * P( not spammer )]

P( spammer ! non spam ) = 0.01 * 0.9 / (0.01 * 0.9 + 0.7 * 0.1) = 0.114 = 11.4%
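The two posteriors above can be checked with a few lines of Python. This is a minimal sketch, not AccuSpam's code: the function name `posterior` is ours, and the inputs are simply the example's assumed rates (99% spam from spammer domains, 30% from other domains, a 90% spammer prior).

```python
def posterior(p_e_given_h: float, p_e_given_not_h: float, prior_h: float) -> float:
    """Non-naive Bayes: P(H ! E) with an explicit prior P(H)."""
    num = p_e_given_h * prior_h
    return num / (num + p_e_given_not_h * (1.0 - prior_h))


# Example assumptions from the text (not measured data).
P_SPAM_IF_SPAMMER = 0.99      # spammer domains send > 99% spam
P_SPAM_IF_NOT_SPAMMER = 0.30  # other domains send < 30% spam
P_SPAMMER = 0.90              # 90% of the user's email is spam

p_spammer_if_spam = posterior(P_SPAM_IF_SPAMMER, P_SPAM_IF_NOT_SPAMMER, P_SPAMMER)
p_spammer_if_nonspam = posterior(1 - P_SPAM_IF_SPAMMER, 1 - P_SPAM_IF_NOT_SPAMMER, P_SPAMMER)

print(f"P(spammer ! spam)     = {p_spammer_if_spam:.3f}")     # 0.967
print(f"P(spammer ! non spam) = {p_spammer_if_nonspam:.3f}")  # 0.114
```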


MORE ON BAYES THEOREM

Bayes Theorem:

P(H ! E) = P(E ! H) * P(H) / P(E) (eq.1)

where "!" is "if", H is a hypothesis and E is the evidence. Or it may be written in the form:

P(H ! E) = P(E ! H) * P(H) / [P(E ! H) * P(H) + P(E ! ~H) * P(~H)]

where ~H is the complementary hypothesis. Dividing the numerator and denominator by P(H) simplifies it slightly to:

P(H ! E) = P(E ! H) / [P(E ! H) + P(E ! ~H) * P(~H) / P(H)] (eq.2)

It can be written more generally:

P(H(i) ! E) = P(E ! H(i)) * P(H(i)) / Sum( P(E ! H(k)) * P(H(k)) )

where k ranges over the set of all possible hypotheses (which must be mutually exclusive and together exhaustive).

So Bayes Theorem (eq.1) says that if we have some evidence and want to know the probability that the hypothesis is true, then we can instead use the probability of the evidence when the hypothesis is true, times the ratio of the probabilities of the hypothesis and the evidence. However, if we assume the "a priori" probabilities of the hypothesis and its complement are the same, i.e. that P(H) = P(~H), then the equation reduces to the "naive" form:

P(H ! E) = P(E ! H) / [P(E ! H) + P(E ! ~H)] (eq.3)
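A short sketch of the three forms, to show that eq.2 is just eq.1 rearranged and that the naive eq.3 agrees with them only when the priors are equal. The function names are illustrative, and the numbers are the spammer example's assumptions.

```python
def bayes_general(likelihoods, priors, i):
    """P(H(i) ! E) = P(E ! H(i)) * P(H(i)) / Sum( P(E ! H(k)) * P(H(k)) )."""
    evidence = sum(l * p for l, p in zip(likelihoods, priors))  # this is P(E)
    return likelihoods[i] * priors[i] / evidence


def bayes_eq2(p_e_h, p_e_not_h, p_h):
    """Eq.2: numerator and denominator of the two-hypothesis form divided by P(H)."""
    return p_e_h / (p_e_h + p_e_not_h * (1 - p_h) / p_h)


def bayes_naive(p_e_h, p_e_not_h):
    """Eq.3: the naive form, only exact when P(H) = P(~H) = 0.5."""
    return p_e_h / (p_e_h + p_e_not_h)


# Two hypotheses: spammer (index 0) and not spammer (index 1).
full = bayes_general([0.99, 0.30], [0.9, 0.1], 0)
print(round(full, 3), round(bayes_eq2(0.99, 0.30, 0.9), 3), round(bayes_naive(0.99, 0.30), 3))
```

The first two values match; the naive value differs because it silently replaces the 90% prior with 50%.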


PROBABILITY GIVEN MULTIPLE INSTANCES OF EVIDENCE

So AccuSpam computes the probability that a domain is a spammer by combining the probabilities given each instance of evidence, where the evidence is the number of spam and non-spam emails received from the domain, as determined by the actions that AccuSpam users take to add non-spam senders to the Approved Senders list (either manually or by replying to the Daily Summary) and to add spam senders to the hidden disapproved list by replying to the Daily Summary.

Note that AccuSpam currently does not record the count of multiple spams or multiple non-spams from the same sender to the same AccuSpam user (recipient), i.e. AccuSpam currently counts each disapproved sender as one spam email received and each approved sender as one non-spam email received. In other words, it assumes:

P( spammer | disapproved sender ) = P( spammer | spam )

and

P( spammer | approved sender ) = P( spammer | non spam )

Adding this count to the computation would hasten the blocking of spammers who send multiple spams from the same sender address, and would only contribute to the protection against false positives for those non-spammer domains which send more than 50% spam (e.g. hotmail). This count should be easy to add and will probably be done in the future, but we currently do not view it as essential, partially because we handle free email domains specially (currently requiring higher probability thresholds, and eventually augmenting with heuristics such as reverse DNS).

We need a formula to combine the probabilities of each instance of evidence. We assume that each instance of evidence is uncorrelated with (independent of) the others, which is clearly not true in the case of a spammer, because the instances come from the same domain and probably the same spammer. We also assume that the "a priori" probability of a set of evidence correlating to reality is equal to the probability that the complementary set of evidence correlates to the complementary reality. Then we use the formula:

P( spammer | a, b, c ) = P( spammer | a ) * P( spammer | b ) * P( spammer | c ) / [P( spammer | a ) * P( spammer | b ) * P( spammer | c ) + (1 - P( spammer | a )) * (1 - P( spammer | b )) * (1 - P( spammer | c ))]

In shorthand:

P( a, b, c ) = P(a) * P(b) * P(c) / [P(a) * P(b) * P(c) + (1 - P(a)) * (1 - P(b)) * (1 - P(c))]

or, writing a, b, c for P(a), P(b), P(c):

P( a, b, c ) = abc / [abc + (1 - a)(1 - b)(1 - c)]

Which is derived here:

http://www.mathpages.com/home/kmath267.htm

Search the above page for "...suppose Mr. Red's ability to correctly identify the outcome of a TRUE/FALSE experiment is 75%, Mr. Green's is 60% and Mr. Blue's is 55%...".
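The combining formula generalizes to any number of votes. Below is a minimal sketch, assuming (as the text does) independent evidence and equal a priori probabilities; `domain_score` is a hypothetical helper that reuses the per-vote probabilities from the worked example (96.7% per disapproved sender, 11.4% per approved sender) and counts one vote per (recipient, sender) pair.

```python
from math import prod


def combine(probs):
    """Combine independent estimates P(spammer | evidence_i), assuming equal
    a priori probabilities: abc... / [abc... + (1 - a)(1 - b)(1 - c)...]."""
    yes = prod(probs)
    no = prod(1.0 - p for p in probs)
    return yes / (yes + no)


# Per-vote probabilities taken from the worked example (90% spam prior).
P_IF_DISAPPROVED = 0.967  # P(spammer | spam): one disapproved sender
P_IF_APPROVED = 0.114     # P(spammer | non spam): one approved sender


def domain_score(disapproved: int, approved: int) -> float:
    """Hypothetical helper: one vote per (recipient, sender) pair, since repeat
    emails from the same sender are not counted separately."""
    return combine([P_IF_DISAPPROVED] * disapproved + [P_IF_APPROVED] * approved)


for d, a in [(1, 0), (3, 0), (3, 1), (3, 3)]:
    print(f"{d} disapproved, {a} approved -> P(spammer) = {domain_score(d, a):.4f}")
```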


So how closely do the assumptions in the above formula model reality? Intuitively we know that, in reality, N instances of spam evidence do not indicate as high a probability of a spammer as N instances of non-spam evidence indicate the probability of not a spammer, all other factors equal, because domains that are not spammers send more spam than spammer domains send non-spam. However, this is compensated for by the use of Bayesian probabilities. If, purely to illustrate this intuitive point, we assume "all other factors equal", then P( spammer ) = P( not spammer ), so our example Bayesian probabilities change (to the naive Bayesian case):

P( spammer ! spam ) = 0.99 * 0.5 / (0.99 * 0.5 + 0.3 * 0.5) = 0.767 = 76.7%

P( spammer ! non spam ) = 0.01 * 0.5 / (0.01 * 0.5 + 0.7 * 0.5) = 0.014 = 1.4%

P( not spammer ! non spam ) = 0.7 / (0.7 + 0.01) = 0.986 = 98.6%

So we can see that, "all other factors equal", the Bayesian probability of spammer given spam (76.7%) is much less than the Bayesian probability of not spammer given non-spam (98.6%).

And any positive correlation between instances of spam evidence would just increase the probability of the reality over what the formula gives us under the assumption of independence, so in that case we are merely underestimating. With the example Bayesian probabilities, the formula will not classify a domain as a spammer once there is more than a small amount of non-spam evidence, so positive correlations between non-spams can be ignored: we are not classifying which domains are not spammers, we are classifying which domains are spammers. The former set may include a few members of the latter set. In other words, any loss of accuracy from the assumption of independence will err on the side of less spam caught (some domains not classified as spammers when in reality they are) rather than on the side of the dreaded false positive.

Any loss of accuracy in the case of false negatives (spam not caught) is compensated for by the fact that it will only take a few more votes from AccuSpam users to catch it. Spam is sent in such huge quantities that this error has no effect other than to possibly increase the proportion of users who have to reply to a Daily Summary (by an insignificant and unnoticeable amount).
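To make the argument above concrete, this sketch uses the naive ("all other factors equal") per-vote probabilities of 76.7% and 1.4% with the independence formula, and asks how many spam votes it takes to outweigh k non-spam votes. The 50% threshold is an illustrative assumption, not AccuSpam's actual blocking threshold; a higher threshold only requires more spam votes.

```python
def posterior(p_e_h, p_e_not_h, prior_h=0.5):
    """Bayes with an explicit prior; prior_h=0.5 gives the naive case."""
    num = p_e_h * prior_h
    return num / (num + p_e_not_h * (1 - prior_h))


def combine(probs):
    """Independent-evidence combination with equal a priori probabilities."""
    yes = no = 1.0
    for p in probs:
        yes *= p
        no *= 1.0 - p
    return yes / (yes + no)


# "All other factors equal": P(spammer) = P(not spammer) = 0.5.
p_spam_vote = posterior(0.99, 0.30)     # about 0.767
p_nonspam_vote = posterior(0.01, 0.70)  # about 0.014


def spam_votes_needed(nonspam_votes: int, threshold: float = 0.5) -> int:
    """Smallest number of spam votes that pushes combined P(spammer) above threshold."""
    n = 0
    while combine([p_spam_vote] * n + [p_nonspam_vote] * nonspam_votes) <= threshold:
        n += 1
    return n


for k in range(1, 4):
    print(f"{k} non-spam vote(s) -> {spam_votes_needed(k)} spam votes to exceed 50%")
```

Each non-spam vote cancels roughly three and a half spam votes, which is why errors from the independence assumption lean towards missed spam rather than false positives.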

-Shelby Moore
http://AccuSpam.com

accuspam
08-12-04, 11:25 AM
For those who are curious how the correlation described in a previous post works in the real world:

http://forums.speedguide.net/showpost.php?p=1384314&postcount=112

Here are some current results of cross-correlating AccuSpam users. UserId=20 is Tony in his support role (contactform@coolpage.com). UserId=17 is myself (shelby@coolpage.com).

The MaxRMSError measures how closely the per-domain ratios of Approved Senders to disapproved senders are correlated between a pair of users, UserId and CUserId; smaller means closer. In the table below, MaxRMSError equals RMSError plus Confidence.

The users who correlate to your UserId are helping contribute to your deletion of spam and vice versa (helping each other share effort). Actually this shared effort is not turned on yet, but will be shortly.
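A minimal sketch of an RMS comparison between two users, assuming each user's votes are stored as domain -> (approved senders, disapproved senders). This is a hypothetical illustration, not AccuSpam's code: the data layout is an assumption, and since the post does not say how the Confidence term is computed, the `rms / sqrt(n)` term below is a placeholder.

```python
from math import sqrt

# Hypothetical layout: domain -> (approved_senders, disapproved_senders).
# Assumes every listed domain has at least one vote.
Votes = dict[str, tuple[int, int]]


def approved_ratio(approved: int, disapproved: int) -> float:
    return approved / (approved + disapproved)


def rms_error(user: Votes, other: Votes) -> tuple[float, int]:
    """RMS difference of per-domain approved ratios over domains both users voted on."""
    shared = [dom for dom in user if dom in other]
    if not shared:
        return float("inf"), 0
    sq = [(approved_ratio(*user[d]) - approved_ratio(*other[d])) ** 2 for d in shared]
    return sqrt(sum(sq) / len(sq)), len(sq)


def max_rms_error(user: Votes, other: Votes) -> float:
    """Upper bound for thresholding (e.g. MaxRMSError < 0.1).
    ASSUMPTION: the confidence term is a placeholder, rms / sqrt(n)."""
    rms, n = rms_error(user, other)
    if n == 0:
        return float("inf")
    return rms + rms / sqrt(n)


alice = {"spamco.biz": (0, 12), "example.com": (5, 5)}
bob = {"spamco.biz": (0, 7), "example.com": (4, 6), "other.org": (1, 0)}
print(rms_error(alice, bob), max_rms_error(alice, bob) < 0.1)
```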

If you joined AccuSpam before the date of this posting and are curious to know your UserId, then, for privacy's sake, please email a request to:

support@accuspam.com

It is interesting to note that UserId=20 (Tony) receives the most spam (2000+ spams per day before filtering by AccuSpam) and that UserId=20 is correlated to the most other users, 8 (at this error threshold of < 0.1). Thus the correlation rewards those who have the most incoming spam by correlating them to the most other users, so mutual help is balanced. It is quite satisfying to see this work in the real world as expected in theory.


mysql> SELECT * FROM correlation WHERE MaxRMSError < 0.1 LIMIT 45;
+--------+---------+-----------+------------+-------------+
| UserId | CUserId | RMSError | Confidence | MaxRMSError |
+--------+---------+-----------+------------+-------------+
| 3 | 10 | 0.0552158 | 0.0112709 | 0.0664866 |
| 3 | 17 | 0.0613506 | 0.0118069 | 0.0731575 |
| 3 | 20 | 0.0252644 | 0.00287914 | 0.0281435 |
| 10 | 3 | 0.0552158 | 0.0112709 | 0.0664866 |
| 10 | 16 | 0.0354202 | 0.00669379 | 0.042114 |
| 10 | 19 | 0.0902866 | 0.00957036 | 0.099857 |
| 10 | 20 | 0.040897 | 0.00341998 | 0.044317 |
| 10 | 42 | 0.0716143 | 0.00843983 | 0.0800541 |
| 12 | 20 | 0.0229612 | 0.00382687 | 0.0267881 |
| 14 | 20 | 0.075312 | 0.0150624 | 0.0903744 |
| 16 | 10 | 0.0354202 | 0.00669379 | 0.042114 |
| 16 | 20 | 0.0264113 | 0.00299049 | 0.0294017 |
| 16 | 31 | 0.0387949 | 0.0086748 | 0.0474697 |
| 16 | 42 | 0.0535306 | 0.00836007 | 0.0618906 |
| 17 | 3 | 0.0613506 | 0.0118069 | 0.0731575 |
| 17 | 20 | 0.0814516 | 0.00719937 | 0.088651 |
| 19 | 10 | 0.0902866 | 0.00957036 | 0.099857 |
| 19 | 42 | 0.0839283 | 0.00701844 | 0.0909467 |
| 20 | 3 | 0.0252644 | 0.00287914 | 0.0281435 |
| 20 | 10 | 0.040897 | 0.00341998 | 0.044317 |
| 20 | 12 | 0.0229612 | 0.00382687 | 0.0267881 |
| 20 | 16 | 0.0264113 | 0.00299049 | 0.0294017 |
| 20 | 17 | 0.0814516 | 0.00719937 | 0.088651 |
| 20 | 42 | 0.0691626 | 0.00546778 | 0.0746304 |
| 20 | 62 | 0.0583288 | 0.0065625 | 0.0648913 |
| 20 | 68 | 0.0698225 | 0.0113267 | 0.0811492 |
| 31 | 16 | 0.0387949 | 0.0086748 | 0.0474697 |
| 31 | 42 | 0.0769461 | 0.0130063 | 0.0899524 |
| 32 | 62 | 0.0454675 | 0.00572837 | 0.0511959 |
| 42 | 10 | 0.0716143 | 0.00843983 | 0.0800541 |
| 42 | 16 | 0.0535306 | 0.00836007 | 0.0618906 |
| 42 | 19 | 0.0839283 | 0.00701844 | 0.0909467 |
| 42 | 20 | 0.0691626 | 0.00546778 | 0.0746304 |
| 42 | 31 | 0.0769461 | 0.0130063 | 0.0899524 |
| 51 | 16 | 0.0651428 | 0.0142153 | 0.0793581 |
| 51 | 32 | 0.0756851 | 0.01182 | 0.0875051 |
| 51 | 42 | 0.0840936 | 0.0128241 | 0.0969177 |
| 53 | 62 | 0.0495078 | 0.0350073 | 0.0845151 |
| 57 | 16 | 0.0626783 | 0.0133631 | 0.0760414 |
| 57 | 42 | 0.081343 | 0.0127036 | 0.0940467 |
| 57 | 62 | 0.0650945 | 0.018054 | 0.0831484 |
| 59 | 69 | 0.0622572 | 0.0179721 | 0.0802293 |
| 62 | 20 | 0.0583288 | 0.0065625 | 0.0648913 |
| 62 | 32 | 0.0454675 | 0.00572837 | 0.0511959 |
| 68 | 20 | 0.0698225 | 0.0113267 | 0.0811492 |
+--------+---------+-----------+------------+-------------+
45 rows in set (0.00 sec)
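The two claims above can be checked against the query output in a few lines of Python. The pairs and the UserId=20 rows below are copied from the table; that MaxRMSError equals RMSError + Confidence is an observation from these rows, not a documented definition.

```python
from collections import Counter

# (UserId, CUserId) for all 45 rows of the query output above.
PAIRS = [
    (3, 10), (3, 17), (3, 20), (10, 3), (10, 16), (10, 19), (10, 20), (10, 42),
    (12, 20), (14, 20), (16, 10), (16, 20), (16, 31), (16, 42), (17, 3), (17, 20),
    (19, 10), (19, 42), (20, 3), (20, 10), (20, 12), (20, 16), (20, 17), (20, 42),
    (20, 62), (20, 68), (31, 16), (31, 42), (32, 62), (42, 10), (42, 16), (42, 19),
    (42, 20), (42, 31), (51, 16), (51, 32), (51, 42), (53, 62), (57, 16), (57, 42),
    (57, 62), (59, 69), (62, 20), (62, 32), (68, 20),
]

# UserId=20 rows: (CUserId, RMSError, Confidence, MaxRMSError)
USER_20 = [
    (3, 0.0252644, 0.00287914, 0.0281435),
    (10, 0.040897, 0.00341998, 0.044317),
    (12, 0.0229612, 0.00382687, 0.0267881),
    (16, 0.0264113, 0.00299049, 0.0294017),
    (17, 0.0814516, 0.00719937, 0.088651),
    (42, 0.0691626, 0.00546778, 0.0746304),
    (62, 0.0583288, 0.0065625, 0.0648913),
    (68, 0.0698225, 0.0113267, 0.0811492),
]

partners = Counter(user for user, _ in PAIRS)
print(partners.most_common(3))

# Observation: MaxRMSError = RMSError + Confidence (up to the printed precision).
for cuser, rms, conf, max_rms in USER_20:
    assert abs((rms + conf) - max_rms) < 1e-6, cuser
```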

-Shelby Moore
http://AccuSpam.com

accuspam
08-13-04, 02:02 AM
Improved AccuSpam to record the count of fraudulent (faked sender) emails it automatically filters, in addition to the general count of all spams filtered.

This will now be reported in the Daily Summary emails.

Also, we will soon start reporting on our web site the real-time (continuously updated) global statistics (totals from all AccuSpam users) of spams deleted, non-spams delivered, % of email detected as spam, # of spams detected as fraud, etc.

This is so we can compete with http://BrightMail.com in terms of how they publish their data and their claims to detect fraudulent email.

An email is fraudulent when the listed sender is not who actually sent it. The sender address may not exist at all, or it may belong to a real sender who was spoofed. A big problem resulting from fraudulent emails is phishing scams (http://survey.mailfrontier.com/survey/quiztest.html), e.g. where you get an official-looking email, appearing to come from a company you do business with, that asks you to update your financial data or password. The spammers use these fraudulent phishing scams to trick you and steal your personal data. AccuSpam automatically detects and deletes these types of spam:

http://www.fortune.com/fortune/fastforward/0,15704,680552,00.html

Now AccuSpam reports to you the count of fraudulent emails it is detecting and deleting for you.

Note that to populate the database, I had to estimate the fraudulent counts (since they were not being recorded before today), so I set them to 50% of the number of spams deleted. So for AccuSpam users who joined before today, your fraud counts will start at 50% of spams deleted thus far, but over time they will trend towards the actual percentage. New AccuSpam users will get an exact count starting from 0.

-Shelby Moore
http://AccuSpam.com

accuspam
08-13-04, 02:37 AM
Here follows evidence that spammers are already starting to do some of the things I wrote (in previous posts in this thread) that can defeat (subvert) Paul Graham's naive Bayesian anti-spam content filtering, which is used by many popular anti-spam products such as Spam Assassin (used by many ISPs), SpamBayes for Outlook, etc.

Notice below the insertion of random, uncommon words in the spam. This is an attempt to defeat the filter by guessing some of a Bayesian user's "good words", so that after the user retrains the filter on this spam, the "good words" Bayesian probabilities are polluted. However, as I wrote before, this will be caught by Spam Assassin's rule "many words used only once". Even repeating the words could be flagged by a grammatical analysis. The spammers will eventually figure out they need to use plausible prose with such words, such as by seeding a madlib generator with a dictionary of such words.

If the following kind of spam is missed by your Bayesian filter and you retrain on it, then your "good words" get polluted to the extent that the words inserted below overlap your good words.

For this to be effective for spammers, they will need to insert many more words per spam, use plausible prose, and, most importantly, not also insert words that have a very strong spam signature. This spam also had "Sa;ve 6_0% ord.ering onl/ine To`day!", which is very obviously spam to a Bayesian filter that has seen any of those aberrations of English in previous spam.

coincide cowman centrifugate fluoridate nullify poise nuptial income
nato earring linger adultery avenge dram polaris cancellate hollowware
extrude bet contraption calculus chaff destitute anisotropy ridgepole
concision druid kidnapping obstacle inefficacy alligator fashion agnes
wooster mckinney keenan petticoat radish couturier lifetime colloquy
electrophoresis adobe beauty borrow moth riordan sinful comprehension
grandiloquent geodesy bog deconvolution ghetto flaxen begetting embassy
rightful prismatic hoosegow greene enoch geography demise childbirth
coequal nibelung muzak wingspan alibi wop abyss analytic intellect
arabic deity gerhardt chantry becker cinematic bugle cranford stealth
inaugurate latera monstrosity hibachi diminish adulate rush million
buzzy hydrous transgressor brilliant muzak hepatitis franc checklist
huddle postprocess egypt perseverance eigenvalue multitude gelatin
appraisal burial donner desuetude hip chauncey crony bronx handymen
isadore coquette demagogue company cucumber healthful valedictory attire
mallory stimulate stipulate
removemeplease
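A rough sketch of the "many words used only once" idea applied to word salad like the above. This is a hypothetical illustration, not Spam Assassin's actual rule: it just measures what fraction of distinct words appear exactly once.

```python
import re
from collections import Counter


def once_only_ratio(text: str) -> float:
    """Fraction of distinct lowercase words that occur exactly once in text."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(1 for c in counts.values() if c == 1) / len(counts)


salad = """coequal nibelung muzak wingspan alibi wop abyss analytic intellect
buzzy hydrous transgressor brilliant muzak hepatitis franc checklist"""
prose = "the cat sat on the mat and the dog sat on the rug"

print(round(once_only_ratio(salad), 2), round(once_only_ratio(prose), 2))
```

Ordinary prose repeats function words ("the", "on"), so its ratio stays well below that of the salad; a spammer who repeats the salad words or wraps them in plausible prose defeats this simple measure, as noted above.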

accuspam
08-13-04, 05:06 AM
Fixed a bug which caused any MIME-encoded emails (e.g. HTML, attachments, etc.) that you chose to deliver from the Daily Summary to be slightly corrupted when delivered to the Inbox in Outlook Express, and possibly also Outlook (and maybe other email programs). It seems some other email programs, such as Eudora, were robust enough to handle the problem.

Anyway, it should be fixed now.

This problem did not apply to email from senders already on the Approved Senders list.

Actually, I fixed this by working around an apparently long-standing bug in PHP, which is the language we currently use to program AccuSpam:

http://bugs.php.net/bug.php?id=29646

So the PHP mail() bug still exists, but we have altered our code so it works around the PHP bug.

The following link has further discussion of the problem we fixed:

http://www.zend.com/zend/comments/show_comment.php?article=sendmimeemailpart1&id=5733&pid=2921&days=10000&f_id=sendmimeemailpart1&mode=&kind=sl

UPDATE:

It appears that my workaround did fix the problem, and I have documented my workaround in the PHP bug report linked above (for other PHP programmers to use).

accuspam
08-13-04, 08:24 AM
Fixed an obscure bug that did not apply to most AccuSpam users.

Replies to Daily Summaries that were larger than 64K were being ignored. This is fixed.

If you had a Daily Summary that was growing larger and larger, even though you were replying, then you can reply now and the Daily Summary will be cleaned out.

accuspam
08-13-04, 04:29 PM
IMPORTANT MILESTONE

The global statistical blocking by domain is now enabled!

It is working and some domains are being blacklisted based on the correlation of votes from AccuSpam users who replied to their Daily Summaries.

This means AccuSpam users are sharing the disapproving spam domains statistically.

AccuSpam users should see a gradual decline in the length of their Daily Summaries over coming weeks.

Only AccuSpam users who reply to their Daily Summaries can benefit from being optimally correlated to other AccuSpam users, and thus have the length of their Daily Summaries decrease optimally.

The spammers have an immense number of domains. If we can get 10 times more AccuSpam users, and if all users reply to their Daily Summaries, then I project all users will see the length of their Daily Summaries shrink by a factor of 10, going from the current 90% to the 99+% mentioned in a previous post:

http://forums.speedguide.net/showpost.php?p=1384314&postcount=112

Mostly I sit back and wait for usership to grow now.

Try to encourage people to join AccuSpam.

-Shelby Moore
http://AccuSpam.com

TonyT
08-13-04, 04:42 PM
To add a bit of promo to the above last post by accuspam:

I get 2000+ spams/day. Join accuspam.com, reply to your Daily Reports and you will benefit from my spam! The more spam I get, the less spam you get! An inversion of the old phrase "less is more": "more is less".

UOD
08-13-04, 04:43 PM
Yeah man...been plugging your site when I can to those who need it. If you haven't already, I would introduce yourself and your very fine product to the people at www.eff.org

Even though I'm not using your product at this time, I do feel it is important for us all to work together to fight spam. ;)

accuspam
08-13-04, 05:03 PM
...I would introduce yourself and your very fine product to the people at www.eff.org...

Thanks but I personally do not have time to contact the EFF.

Apparently AccuSpam meets the EFF's desired anti-spam criteria 100%, better than any other existing method. Perhaps you can contact them on our behalf and let them know that Bayesian filtering and Spam Assassin, which they prominently link to (http://www.eff.org/Spam_cybersquatting_abuse/Spam/), do not meet their own criteria, but AccuSpam does:

http://www.eff.org/Spam_cybersquatting_abuse/Spam/position_on_junk_email.php

"...any measure for stopping spam must ensure that all non-spam messages reach their intended recipients. Proposed solutions that do not fulfill these minimal goals are themselves a form of Internet abuse..."

"...we would like to see the development of better filtration software on servers, something that could work interactively with the mail recipient in defining what he or she regards as spam using pattern recognition. That is, every time somebody gets a message of a sort he or she does not want, s/he could send it to the filter, thereby making that filter smarter over time, as well as giving it the ability to "learn" as spam techniques develop..."

UOD
08-13-04, 06:54 PM


Well, I'll see what I can do. I am a member of the EFF and contribute yearly with monetary donations.

accuspam
08-14-04, 03:54 AM
Well, I'll see what I can do. I am a member of the EFF and contribute yearly with monetary donations.

Thanks! Any such prominent links to AccuSpam.com will help accelerate the snowball downhill effect of ramping up the statistics AccuSpam uses to detect spammers.

As well, I am working on a Bayesian content filter that uses the statistics of all AccuSpam users and will work essentially the same way as the domain blocking. In essence, the domain blocking hypothesis is that some domains send 99.9+% spam and < 0.1% non-spam. The naive Bayesian content filtering espoused by Paul Graham (and, afaik, used by all current Bayesian anti-spam, e.g. Spam Assassin, Spam Bayes, etc.) attempts to correlate spam features which have much less than 99% probability (especially if measured globally for all correlated users), thus it needs to balance the probabilities with "good words". Whereas I am working on a Bayesian method that works the same as the domain blocking hypothesis and looks only for features of spam content which occur 99.9+% of the time in spam and < 0.1% of the time in non-spam. This improved form of Bayesian content analysis will have advantages over Paul Graham's Bayesian content filter:

1. Fundamentally it is correlating not the content of spam and non-spam (which is inherently noisy), but the volume of spam and non-spam. What makes spam is its volume, not its message. So this Bayesian does not try to decide what is bad content and good content, as Paul Graham's (www.paulgraham.com) Bayesian does; it instead just tries to find the features of spam sent in bulk that are distinct from the features of non-spam as a whole.

2. The risk for false positive (even in future) will be near 0, e.g. 1 in million (same as for domain blocking), because it takes into account the patterns of many correlated users and the many permutations of legitimate email.

3. No way for spammers to corrupt the "good words" (words in non-spam) probability, because my approach does not use the probability of the "good words", only the probability of very "bad words" (words always spam and never in non-spam).

4. The effort to identify and train on patterns is shared (divided) amongst all users, so it takes many orders of magnitude less effort than per-user (Paul Graham) Bayesian.

5. The only way for spammers to corrupt the very "bad words" is to fight with other spammers by adding more spam weight to the very "bad words" of other spammers. This is the same as for the domain blocking. The only way for one spammer to defeat AccuSpam for his domain(s) is to correlate well by disapproving the domains of the other spammers. If all spammers are fighting each other, then they actually cancel out their attempts to defeat AccuSpam, and aid AccuSpam in detecting them. For example, say there are 1000 spammers; then 999 are against each one of them, so for each spammer's domain there is 999 times as much disapproval data as approval data. If a spammer joins AccuSpam and does not disapprove his fellow spammers, then his votes are ignored, because they won't correlate well to AccuSpam users who are disapproving the spammers. It is like a dog chasing its tail: they have no way to catch it.

6. Since most single words occur both in spam and non-spam, my improved global Bayesian will look at n-grams (word combinations), since it is usually not the word "sexy" alone but the context in which "sexy" is used in a phrase that can uniquely identify a spam.
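A minimal sketch of selecting very "bad" n-grams from pooled data, assuming spam and non-spam messages from all correlated users are pooled together and each n-gram is counted once per message (so the count reflects volume). The 99.9% cutoff follows the text; `min_count` and the helper names are illustrative assumptions, not AccuSpam's design.

```python
from collections import Counter


def ngrams(text: str, n: int = 2):
    """Word n-grams, lowercased, in order of appearance."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def very_bad_ngrams(spam_msgs, ham_msgs, n=2, min_spam_ratio=0.999, min_count=3):
    """Return n-grams whose pooled occurrences are at least min_spam_ratio spam.
    Only "bad" features are kept; no "good word" probabilities are used."""
    spam = Counter(g for m in spam_msgs for g in set(ngrams(m, n)))
    ham = Counter(g for m in ham_msgs for g in set(ngrams(m, n)))
    result = set()
    for gram, s in spam.items():
        h = ham.get(gram, 0)
        if s >= min_count and s / (s + h) >= min_spam_ratio:
            result.add(gram)
    return result


spam_msgs = ["buy cheap meds now", "buy cheap meds today", "cheap meds ship fast"]
ham_msgs = ["buy cheap flights now", "that dress looks sexy"]
print(very_bad_ngrams(spam_msgs, ham_msgs))
```

"buy cheap" is rejected because it also appears in non-spam, so only the phrase seen exclusively in bulk spam survives.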

accuspam
08-14-04, 04:30 AM
I have thought of an easier way that spammers can defeat the Bayesian used in afaik all existing (Paul Graham) Bayesian anti-spam, easier than what I wrote before:

http://forums.speedguide.net/showpost.php?p=1374288&postcount=69

For each spam run, they add 4 or 5 random letters (chosen from a-z and A-Z) to the end of each word that is often used in spam (e.g. ViagraAgtU). Do not insert HTML, space, punctuation or anything between the random letters and the spam word. Simple. Done. All existing (Paul Graham) Bayesian defeated 100%.

The reason is that with 26 letters, times 2 for capitals, the number of random 4-letter combinations is (26*2) ^ 4 = 7.3 million. Thus a given suffixed spam word (e.g. ViagraAgtU) will, on average, recur only once in every 7.3 million spam runs. Given that a Bayesian filter needs to see a word many times before giving it a high spam probability, probably many millions of spam runs will still not be detected. Spammers do not have to ask the other spammers not to use their combinations, because all spammers choose randomly.

Once the 4-letter combinations start getting caught by Bayesian, they can just switch to 5 letters, which gives about 380 million (52^5) combinations. 6 letters gives about 20 billion (52^6), so detection by Bayesian would take vastly more spam runs still.
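A quick sketch of the suffix trick and the combination counts above. The helper names are hypothetical. One caveat worth making explicit: by the birthday bound, *some* suffix repeats after only about sqrt(pi * N / 2) runs (a few thousand for 4 letters), but each such repeat gives the filter just one extra observation of one token, while any *specific* suffixed word recurs only about once per N runs.

```python
import math
import random
import string

LETTERS = string.ascii_letters  # a-z and A-Z: 52 letters


def suffix_space(length: int) -> int:
    """Number of distinct random-letter suffixes of the given length."""
    return len(LETTERS) ** length


def add_random_suffix(word: str, length: int = 4, rng=random) -> str:
    """Append random letters with nothing in between (Viagra -> e.g. ViagraAgtU)."""
    return word + "".join(rng.choice(LETTERS) for _ in range(length))


def runs_until_some_repeat(length: int) -> float:
    """Birthday-bound estimate of how many runs pass before ANY suffix repeats,
    about sqrt(pi * N / 2). A SPECIFIC suffix recurs only once per N runs."""
    return math.sqrt(math.pi * suffix_space(length) / 2)


for k in (4, 5, 6):
    print(f"{k} letters: {suffix_space(k):,} suffixes, "
          f"~{runs_until_some_repeat(k):,.0f} runs before any repeat")
print(add_random_suffix("Viagra", rng=random.Random(2004)))
```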

Since my improved Bayesian correlates all users, using the same combination for all spams in a spam run will be detected by it. The way for spammers to avoid detection by my improved Bayesian is to randomize the letters for each spam in a spam run:

http://forums.speedguide.net/showpost.php?p=1386418&postcount=125

Anti-spam could attempt to identify word stems which end (or start) with randomized letter combinations, but analyzing the last letters for randomness could create false positives. Some legitimate non-spam emails contain unlikely letter combinations, e.g. hexadecimal numbers.

Anti-spam could attempt to use a dictionary of words to extract the beginning word stem, and ignore words with stems not found in the dictionary, but then the dictionary would have to contain every possible spelling of each spam word stem; otherwise the anti-spam would miss unknown and misspelled spam words (e.g. Viaqra).
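A sketch of the dictionary approach and its weakness, using a hypothetical, deliberately tiny stem dictionary: the longest dictionary word that prefixes a token is taken as its stem, so a random suffix is stripped, but a misspelled stem is never found.

```python
from typing import Optional


def strip_to_stem(token: str, dictionary: set) -> Optional[str]:
    """Return the longest dictionary word that prefixes the token
    (case-insensitive), or None if no dictionary word does."""
    low = token.lower()
    for end in range(len(low), 0, -1):
        if low[:end] in dictionary:
            return low[:end]
    return None


SPAM_STEMS = {"viagra", "cialis"}  # hypothetical dictionary

print(strip_to_stem("ViagraAgtU", SPAM_STEMS))  # viagra
print(strip_to_stem("ViaqraAgtU", SPAM_STEMS))  # None: the misspelling slips through
```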

So to be most clever, spammers should combine misspellings with random letter appendages (e.g. ViaqraAgtU) and avoid using anything (no html) but letters a-z and A-Z in their emails. They can defeat all Bayesian that way, even my improved Bayesian if they randomize each spam of spam run.

Spammers would also have to randomize any urls they insert in their spams. They should probably do it more intelligently than just adding a random "?xxxx" to the end, as this is easy for anti-spam to ignore. Instead they must randomize the domain (or the portion after a non-spam domain). It is much more costly for spammers to randomize their domains and urls. As long as spammers have a correlatable url or reply email address in their spam, BrightMail and Bayesian can correlate them, but my improved Bayesian can correlate them much faster (since it uses many users' data, and spammers can change urls quickly compared to how fast a single user's data accumulates).

And spammers could combine this with my previous ideas of inserting normal prose to help defeat and pollute the good-word probabilities of Paul Graham-type Bayesian:

http://forums.speedguide.net/showpost.php?p=1385774&postcount=117

Here is some interesting analysis and examples from another person who believes Bayesian content filtering can and will be defeated:

http://www.jerf.org/writings/bayesReport.html

UOD
08-14-04, 10:15 AM
What are your thoughts on email encryption/digital signatures? Any problems in how encrypted email interfaces with your service?

accuspam
08-15-04, 12:44 AM
...Any problems in how encrypted email interfaces with your service?

As far as I know, no conflicts in terms of the sender address statistical blocking. AccuSpam does not care about what you put in the email, as long as the normal headers exist.

However, for the global Bayesian content blocking we are considering, if the body of the email is encrypted, then that aspect of spam detection would be defeated. But I think you are referring to a signature which identifies a sender, not the encryption of the email content. In that case, I see no conflict with AccuSpam.

Note that AccuSpam does not currently propagate all the headers (it does so only for non-Approved Senders in the free version, or for all senders in the yet unreleased paid version), so any special headers (that normal email does not need) would be lost. This is on my medium-term To Do list to fix.


What are your thoughts on email encryption/digital signatures?...

If you are referring to encrypting the content of an email using public/private key (e.g. PGP) so that only the sender and recipient can decrypt, then I think that is really not needed or practical for vast majority of users.

What we really need is secure transport (e.g. SMTP and POP over SSL), so that email can not be sniffed during transmission, which is especially important now with wireless transmission. At a minimum, every user needs to demand that their ISP support APOP or POP over SSL (it is amazing how many major ISPs do not!), and then set their email program to use it, to prevent sending their email password in clear text. My ISP, Earthlink.net, supports APOP, but my Host (which is also the Host of AccuSpam), Pair.com, still does not support APOP (even after 2 years of my asking them to), and it was a source of irritation when a college student walked up to me in a coffee shop where I was connected via wireless and showed me my email password. Since then, I always change my email password before a wireless session, and then change it back afterwards. Note that other than this, Pair.com is a very secure and excellent Host. They do support most other major secure connection mechanisms, such as SSH (a secure replacement for telnet), SFTP (secure file transfer over SSH), etc.
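For the curious, APOP (RFC 1939) avoids sending the password in clear text: the server's greeting contains a unique timestamp string, and the client sends only an MD5 digest of that timestamp concatenated with the shared secret. A minimal sketch using the example values published in RFC 1939; note that MD5-based APOP is no longer considered strong, so POP over SSL/TLS is the more robust of the two options.

```python
import hashlib


def apop_digest(timestamp: str, secret: str) -> str:
    """APOP digest per RFC 1939: hex MD5 of the greeting timestamp followed by
    the shared secret. Only this digest crosses the network, not the password."""
    return hashlib.md5((timestamp + secret).encode("ascii")).hexdigest()


# Greeting and secret from the example in RFC 1939.
greeting = "+OK POP3 server ready <1896.697170952@dbc.mtview.ca.us>"
timestamp = greeting[greeting.index("<"):greeting.index(">") + 1]
print("APOP mrose", apop_digest(timestamp, "tanstaaf"))
```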

If you are instead referring to the use of a digital signature to identify that an email really came from you, then we think this is so important that it is actually part of the way AccuSpam will detect email forgery. Soon there will be a new feature on AccuSpam, where you insert a value in your signature so that all AccuSpam users can receive your email. If you don't sign up for it, AccuSpam users will still get your email, but your email address can be forged by a spammer. Initially we expect major corporations to sign up for this once we have many AccuSpam users, so that they can stop spammers from doing phishing scams (http://survey.mailfrontier.com/survey/quiztest.html) using their corporate email addresses. This will also be a free service available for instant signup to individual users.

-Shelby Moore
http://AccuSpam.com

accuspam
08-15-04, 10:54 AM
Improved the correlation of AccuSpam users by only correlating to the target user on domains the target user thinks are spammers. This was done to ensure that any attempt to approve a spammer by joining AccuSpam to pollute the global stats would be ineffective, because the attacker would also have to disapprove a greater number of other spammers in order to correlate to other users.

An unexpected benefit is that it increased the number of correlations by 50%! So we are 50% closer to critical mass. In hindsight, this makes sense (thinking to myself "why didn't I realize that!" :)). Many users will disagree on the % of spam received from non-spam domains, which ranges from 0% up to 80 or 90%. But most users will agree that the spammer domains are sending 90+% spam.
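A minimal sketch of the restricted comparison described above, assuming votes are stored as domain -> (approved senders, disapproved senders) and that "domains the target user thinks are spammers" means a spam ratio at or above a threshold. The 0.9 threshold and the data layout are illustrative assumptions, not AccuSpam's actual parameters.

```python
from math import sqrt
from typing import Optional


def spam_ratio(approved: int, disapproved: int) -> float:
    return disapproved / (approved + disapproved)


def restricted_rms(target: dict, other: dict, spammer_threshold: float = 0.9) -> Optional[float]:
    """RMS difference in per-domain spam ratio, computed ONLY over domains the
    target user treats as spammers (spam ratio >= spammer_threshold)."""
    domains = [d for d, votes in target.items()
               if d in other and spam_ratio(*votes) >= spammer_threshold]
    if not domains:
        return None
    sq = [(spam_ratio(*target[d]) - spam_ratio(*other[d])) ** 2 for d in domains]
    return sqrt(sum(sq) / len(sq))


# Two users who agree on a spammer domain but disagree about a free-mail domain.
target = {"spamco.biz": (0, 20), "hotmail.com": (10, 10)}
other = {"spamco.biz": (0, 15), "hotmail.com": (18, 2)}
print(restricted_rms(target, other))                         # spammer domains only
print(restricted_rms(target, other, spammer_threshold=0.0))  # all shared domains
```

Ignoring the disputed free-mail domain makes the two users correlate perfectly, which is consistent with the increase in correlations reported above.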

Some users may see an instant and significant decrease in the length of their Daily Summaries from this simple improvement.
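To make the restricted correlation concrete, here is a minimal sketch. It assumes each user's verdicts are stored as a mapping from sender domain to True (spammer) or False (not a spammer), and it measures agreement only over the target user's spammer domains. The function name, data layout, and agreement measure are hypothetical illustrations, not AccuSpam's actual code.

```python
from typing import Dict

# Hypothetical layout: sender domain -> True if this user judged it a spammer.
Verdicts = Dict[str, bool]


def correlation_on_target_spam(target: Verdicts, other: Verdicts) -> float:
    """Share of the target's spammer domains that the other user also judged spammers.

    Only domains the target user considers spammers are compared, so joining
    just to approve one spammer earns no correlation unless the joiner also
    disapproves many of the same spammer domains the target disapproves.
    """
    shared = [d for d, is_spam in target.items() if is_spam and d in other]
    if not shared:
        return 0.0
    agree = sum(1 for d in shared if other[d])
    return agree / len(shared)
```

Note that disagreement about a domain the target does not consider a spammer (msn.com in the usage below) has no effect, which is one way to see why restricting the comparison could let more users correlate.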

accuspam
08-15-04, 11:03 AM
I am currently having doubts whether I will implement the "improved Bayesian content filtering" I outlined in a previous post:

http://forums.speedguide.net/showpost.php?p=1386418&postcount=125

I have realized that any Bayesian filter which recognizes URLs and domains could be used effectively by spammers to blacklist any less frequently seen domain on the web, simply by sending out a lot of spam containing that domain. Chalk that up as yet another hole in Bayesian filtering that spammers could exploit.
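A toy illustration of that poisoning, using a simplified per-token spam probability of the kind Bayesian filters compute (the share of spam containing the token versus the share of non-spam containing it). This is a generic formula, not any particular product's filter, and the domain and corpora are made up.

```python
from typing import List


def token_spam_probability(token: str, spam: List[List[str]], ham: List[List[str]]) -> float:
    """Simplified per-token spam probability.

    Compares how often the token appears in spam messages versus non-spam
    messages. Returns 0.5 (no evidence either way) for a token never seen.
    """
    spam_hits = sum(1 for msg in spam if token in msg)
    ham_hits = sum(1 for msg in ham if token in msg)
    if spam_hits == 0 and ham_hits == 0:
        return 0.5
    spam_freq = spam_hits / max(len(spam), 1)
    ham_freq = ham_hits / max(len(ham), 1)
    return spam_freq / (spam_freq + ham_freq)


# One legitimate email in ten mentions a small shop's domain.
ham = [["meeting", "smallshop.example"]] + [["meeting"]] * 9
clean_spam = [["viagra"]] * 100
# A spammer stuffs the shop's domain into 90 of 100 spams.
poisoned_spam = [["viagra", "smallshop.example"]] * 90 + [["viagra"]] * 10

print(token_spam_probability("smallshop.example", clean_spam, ham))     # 0.0
print(token_spam_probability("smallshop.example", poisoned_spam, ham))  # about 0.9
```

After poisoning, the domain itself counts as strong spam evidence, so the next legitimate email that mentions it is pushed toward the spam side.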

I could ignore domains and URLs in content, and may do that as a defense against current-day spam until our critical mass builds for statistical sender blocking, but then, as outlined previously, defeating all Bayesian content filters is fairly trivial for spammers if the filter does not consider the URLs and domains in the content:

http://forums.speedguide.net/showpost.php?p=1386418&postcount=126

AccuSpam's statistical sender blocking can NOT be polluted so easily by spammers because we can detect forgery of sender. We have no corresponding way to detect forgery of content.

accuspam
08-15-04, 05:36 PM
Another major improvement has been made to AccuSpam.

The Daily Summary now lists emails in order of greatest chance of being non-spam first.

And the chance of being non-spam is listed below each email summary in the Daily Summary.

Thus the AccuSpam user can decide how far down to browse the Daily Summary based on his/her desired false positive risk.

Now there is no excuse not to reply to the Daily Summary. Some users are not replying to the Daily Summary, and if they lose a legitimate email they will have no one to blame but themselves. We cannot hold their quarantine indefinitely. We will probably automatically purge emails from the quarantine which are 14 days old and have less than a 1 in 1000 chance of being non-spam, or something reasonable like that.

Additionally, the statistical domain blocking algorithm is run again on a user's previously processed emails just before the Daily Summary is sent, so that any global data accumulated since first processing has another chance to detect the spam (as having less than a 1 in a million chance of being non-spam) and keep it out of the Daily Summary.
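The ranking and purge rules above might be sketched like this. The thresholds come from the post (less than 1 in a million is treated as detected spam; 14 days old with less than 1 in 1000 is the proposed purge rule, which the post describes as probable). The `Quarantined` record and `build_summary` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

DELETE_BELOW = 1e-6               # < 1 in a million chance of non-spam: detected spam
PURGE_BELOW = 1e-3                # proposed: < 1 in 1000 chance of non-spam ...
PURGE_AFTER = timedelta(days=14)  # ... and at least 14 days in quarantine


@dataclass
class Quarantined:
    message_id: str
    received: datetime
    p_nonspam: float  # estimated chance that the email is NOT spam


def build_summary(items: List[Quarantined], now: datetime) -> Tuple[List[Quarantined], List[Quarantined]]:
    """Return (entries to list in the Summary, entries purged as stale).

    Entries re-scored below DELETE_BELOW are dropped silently; the rest of
    the Summary is sorted with the most likely non-spam first.
    """
    keep: List[Quarantined] = []
    purged: List[Quarantined] = []
    for q in items:
        if q.p_nonspam < DELETE_BELOW:
            continue  # caught on re-run of statistical blocking, never shown
        if q.p_nonspam < PURGE_BELOW and now - q.received >= PURGE_AFTER:
            purged.append(q)
            continue
        keep.append(q)
    keep.sort(key=lambda q: q.p_nonspam, reverse=True)
    return keep, purged
```

Sorting by the non-spam estimate is what lets a user stop reading once the listed chance falls below whatever false positive risk they are willing to accept.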

I already noticed this has reduced the lengths of some users' Daily Summaries.

accuspam
08-16-04, 01:18 AM
We made an error in the improvement we made earlier:

http://forums.speedguide.net/showpost.php?p=1387136&postcount=131

which caused the Daily Summaries to contain blank entries.

This has been fixed and replacement Daily Summaries have been emailed to all users.

Do not worry. No email was lost. It was merely an error in the display of the information in the database. No information in the database was affected.

cigamkcalb
08-16-04, 01:26 AM
sounds dangerous

accuspam
08-16-04, 02:24 AM
sounds dangerous

Absolutely not dangerous.

Before sending the Daily Summaries, the data from the database is copied into an array in memory. The error was that we were not reading correctly from that array when writing the values into the text of the Daily Summary email. No manipulations are performed on the database when composing the Daily Summary, because it is purely a display operation. That is why it wasn't as crucial to test it exhaustively before release. Be confident that any code that changes the database is tested exhaustively both before and during release and continually monitored.

Besides, the database is backed up frequently.

accuspam
08-16-04, 05:07 AM
I am very happy to report that the backlog in some users' quarantines is being automatically reduced by the improvement I made to apply the statistical blocking again before sending Daily Summary.

We had a 20% increase in enabled AccuSpam users overnight!

The global statistical blocking among correlated AccuSpam users is starting to catch up with the rate that spammers use new domains.

I am confident we will see the Daily Summaries reduce from here.

The remaining major work for me is to figure out how to deal with UNSPOOFED spam from domains which do not send 100% spam, e.g. major ISP domains. We already delete the spoofed spam in most cases. Luckily, this UNSPOOFED spam from non-spammer domains is a small % of the spam being received, because ISPs have an incentive to stop spam coming from their networks. I will probably have to apply some sort of "safe" Bayesian to UNSPOOFED spam from non-spammer domains. And I may be able to apply "safe" reverse DNS to free email providers that are exclusively Webmail oriented. As well, statistical blocking by sender email address (as opposed to by domain, which only needs fewer than 1,000 users) will kick in once we have 10,000 AccuSpam users.

As always, you should never receive spam in your Inbox with the paid version of AccuSpam (and only a minute amount with the free version, if used correctly as detailed in the AccuSpam.com website FAQ).

accuspam
08-16-04, 05:17 AM
We had the first user complain angrily about AccuSpam, and I feel it is important to explain the scenario where AccuSpam will absolutely not work.

(not counting the old 2003 version of AccuSpam, which was a totally different product and algorithm).

We did not bother to ask the user why s/he wanted to disable AccuSpam, but s/he was asking for instructions to disable it and seemed angry about not receiving any of her/his email. We simply pointed her/him to the instructions for disabling that are already on the AccuSpam.com website.

We realized that the user could not have been receiving the Daily Summaries or was not properly replying to the Daily Summaries. That is the only way they could have received none of their email.

So I concluded that the user is probably running another anti-spam (probably at their ISP, possibly even without the user's knowledge), and that this anti-spam is erroneously blocking the Daily Summary emails from AccuSpam to the user.

It could very well happen that some people (possibly other anti-spam companies) who feel threatened by competition from AccuSpam will try to hurt us by blacklisting our IP address.

So if you are using AccuSpam and you do not receive the Daily Summaries, then complain to your ISP that they are erroneously blocking your legitimate email.

AccuSpam does not delete legitimate email. Sadly, most other anti-spam products do. So do not blame AccuSpam if you run another anti-spam that blocks AccuSpam. Users would be much wiser to run one anti-spam at a time.

UPDATE:

Apparently we were wrong and the problem was the user was not replying to the Daily Summaries.

Here is a copy of our email response to her/his further explanation of the problem. Note that no personal information has been disclosed. This is merely to answer a problem that other users may run into:


======AccuSpam wrote to AccuSpam user========
Thanks for explaining your problem further, especially since we did not even ask you to. That is appreciated. We had incorrectly assumed you were not receiving the Daily Summaries from us:

http://forums.speedguide.net/showpost.php?p=1387530&postcount=136

Sounds to me like you are trying to type into the Daily Summary email you received from us.

You must click "Reply" first to create a new reply email. Make sure your email program is configured to include the sender's email at the bottom when you reply to a sender. Otherwise, you need to copy and paste the email from AccuSpam into the reply.

Then you can type into the [ ] boxes in the reply email.

NOTE: you do NOT need to type a space for each message you want deleted. The spaces are already inserted by default in the Daily Summary AccuSpam sends to you.

At 08:37 PM 8/15/2004 -0400, AccuSpam user wrote:
>
>I am trying to place letters (A & R) in the Message ID brackets. I am not
>able to type anything in the spaces ... nor can I put a space in the Message
>Brackets for the messages that I want deleted.
>
>WHAT IS GOING ON?

accuspam
08-16-04, 06:06 AM
Improved the Daily Summary instructions so users understand that they do not have to manually type an empty space for each email they wish to delete. They merely reply. They only need to type the A and the R.

Added:


Note the [ ] below already have an empty space by default.


To bottom of:


- Place empty space in Message ID brackets [ ] for messages you want deleted
permanently and to permanently block sender.
Future emails from sender will be deleted.
Use empty space if sure is spam.
Note the [ ] below already have an empty space by default.
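The reply rules in these instructions (together with the earlier change that any character other than an empty space acts like "R") can be sketched as a small parser. The exact line layout of a Daily Summary entry is not shown in the thread, so the assumed format here (optional "> " reply quoting, then the bracket, then a message ID) and the function name are hypothetical. Accepting a lowercase "a" as approval is also an assumption.

```python
import re
from typing import Dict

# Assumed entry layout: optional reply quoting ("> "), "[x]", then a message ID.
ENTRY = re.compile(r"^\s*(?:>\s*)*\[([^\]]*)\]\s*(\S+)")


def parse_reply(reply_text: str) -> Dict[str, str]:
    """Map each message ID in a Daily Summary reply to an action.

    empty space   -> "delete"  (delete permanently and block sender; the default)
    "A"           -> "approve" (deliver and add sender to Approved Senders)
    anything else -> "deliver" ("R" or a typo: deliver without approving, the safest action)
    """
    actions: Dict[str, str] = {}
    for line in reply_text.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue  # not a Summary entry (instructions, signature, etc.)
        box, message_id = m.group(1), m.group(2)
        if box.strip() == "":
            actions[message_id] = "delete"
        elif box.strip().upper() == "A":  # lowercase "a" accepted: an assumption
            actions[message_id] = "approve"
        else:
            actions[message_id] = "deliver"
    return actions
```

Because an untouched "[ ]" already means delete, a user who replies without typing anything is choosing to delete every listed email, which is why the added note matters.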

accuspam
08-16-04, 06:27 AM
As I predicted, the spammers are getting more astute at attacking the popular Bayesian content filtering used by most anti-spam (not used by AccuSpam).

The following content is an attempt at normal-looking prose, but I think it is still too non-random. And as I said, the URLs are what a Bayesian filter can still correlate on, provided you do not mind your legitimate email getting blocked when spammers insert non-spam URLs in their spams:

Subject: joke inside

<DIV><FONT face=Arial size=2><A
href="http://nicepharmacy.com/?partid=arlenders">Three blondes were taking a walk in the country when they came upon a line of tracks. The first blonde said, "Those must be deer tracks!" The second blonde said, "No, stupid, anyone can tell those are rabbit tracks!" The third blondie said, "No, you idiots, those are horse tracks!" They where still arguing ten minutes later when a train hit them.</A></FONT></DIV>
<DIV><FONT face=Arial size=2><A
href="http://nicepharmacy.com/?partid=arlenders"><IMG alt="" hspace=0
src="http://222.233.52.28/d1.gif" align=baseline border=0></A></FONT></DIV>
<DIV><FONT face=Arial size=2><A
href="http://nicepharmacy.com/?partid=arlenders">A blonde got a dent in her car and took it in to the repair shop. The repairman, noticing that the woman was a blonde, decided to have a wee bit of fun. So he told her all she had to do was take it home and blow in the tailpipe until the dent popped itself out. After 15 minutes of this, the blonde's blonde friend came over and asked what she was doing. "I'm trying to pop out this dent, but it's not really working." "Duh. You have to roll up the windows first!"</A></FONT></DIV>

accuspam
08-16-04, 08:59 AM
Request for help.

To further improve AccuSpam, I need a list of the mail domains (e.g. msn.com, earthlink.net, etc.) used by subscribers of major, stable ISPs all over the world, specifically ones that we know have a bona fide non-spam subscriber base of a size that would be worth a spammer attacking.

And for each one, I need a copy of the email headers (specifically the "Received" header lines) from an email sent by a subscriber of that ISP.

The reason I need this information is to compile a database of ISP domains that support "Reverse DNS" (i.e. PTR records in DNS for their IPs), along with a list of the nameservers for each ISP (i.e. NS records). I can look up this information in DNS given the email headers.

It seems that we can delete a lot of the spam that AccuSpam currently summarizes in the Daily Summaries simply by looking for forged Reverse DNS records! This is different from how most anti-spam uses Reverse DNS. I have noticed that many spammers set a Reverse DNS record for their IP to match the lie they give in the email headers, but then, of course, the nameservers do not match those of the major ISP they are pretending to send from.

For spam from IPs which do not have a Reverse DNS record, we will not delete it (as some anti-spam does), as this would cause false positives, but we can assign a probability which, combined with other metrics, can help detect the spam.
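The first step of such a lookup is finding the IP to query. Later posts in this thread use the IP in brackets in the first (topmost) Received: header, which is written by the receiving server rather than by the sender. A minimal sketch with the Python standard library, assuming that header contains the bracketed IP as in the examples here; the helper names are hypothetical.

```python
import email
import ipaddress
import re
from typing import Optional

BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def first_hop_ip(raw_message: str) -> Optional[str]:
    """IP address in the topmost Received: header (the relay that handed us the mail)."""
    msg = email.message_from_string(raw_message)
    received = msg.get_all("Received") or []
    if not received:
        return None
    m = BRACKETED_IP.search(received[0])
    return m.group(1) if m else None


def ptr_query_name(ip: str) -> str:
    """Name queried for the PTR record, the same one `dig -x` asks for."""
    return ipaddress.ip_address(ip).reverse_pointer + "."
```

Lower Received: headers can be written by the spammer, which is why only the topmost one is trusted here. A live PTR lookup is possible with the standard library's `socket.gethostbyaddr`; the NS records of the reverse zone, which these posts also compare, need a DNS library or a tool like dig.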

Start here for lists of major ISPs:

http://navigators.com/isp.html

http://www.thelist.com/

Any contributions can be emailed to me at:

shelby@coolpage.com

If you subscribe to one of those ISPs above, simply send me an email with subject "Here is an ISP header you requested".

Thanks,
Shelby Moore
http://AccuSpam.com

accuspam
08-17-04, 02:16 AM
Implemented the pseudo-"Reverse DNS" test (somewhat different from the way other anti-spam uses "Reverse DNS"), and populated it to detect and delete sender email address forgeries from yahoo.com and hotmail.com. I can see many of these forgeries now being deleted:

http://forums.speedguide.net/showpost.php?p=1387578&postcount=139

You should see much less (if any) spam in your Daily Summaries from senders that have yahoo.com and hotmail.com in their email address.

For major non-webmail ISPs, it will not delete, because we cannot be sure legitimate email will pass "Reverse DNS" in that case, but it will assign a higher probability of spam to those that fail "Reverse DNS". Most importantly, it will delete those that forge the "Reverse DNS" of major ISPs.

FYI, this may seem counterintuitive, but it is MORE important to block forgery of free email domains than of paid email domains. Because AccuSpam deletes spam from non-existent senders, it is much more costly for spammers to obtain paid email accounts (or to use their mailing list as the senders), or to obtain their own domains, than to obtain free email accounts and forge them. The reason the spammer must forge the free email account they created is that they cannot send huge volumes of email through the webmail interface of the free email provider.
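The decision rules described above (delete for webmail-only providers that fail, only raise the spam probability for other major ISPs, and always delete a forged PTR) can be sketched as one function. A later post in this thread shows a MySQL `dns` table with `PTRRequired` and `TldNSMatches` columns; the `DomainRule` fields below are modeled on those column names, but their exact meaning is an assumption, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


def in_domain(name: str, domain: str) -> bool:
    """True if hostname `name` equals `domain` or is a subdomain of it."""
    name = name.rstrip(".").lower()
    domain = domain.rstrip(".").lower()
    return name == domain or name.endswith("." + domain)


@dataclass
class DomainRule:
    tld: str            # sender's mail domain, e.g. "web.de"
    ptr_required: bool  # webmail-only provider: legit mail always leaves via its own servers
    ns_match: str       # domain expected in the reverse zone's NS records, e.g. "cinetic.de"


def classify(rule: DomainRule, ptr_name: Optional[str], reverse_ns: List[str]) -> str:
    """Return "pass", "suspect" or "delete" for mail whose From: is in rule.tld.

    ptr_name   -- PTR record of the first-hop IP (None if there is none)
    reverse_ns -- NS records of the reverse (in-addr.arpa) zone for that IP
    """
    ptr_claims_domain = ptr_name is not None and in_domain(ptr_name, rule.tld)
    ns_ok = any(in_domain(ns, rule.ns_match) for ns in reverse_ns)
    if ptr_claims_domain and ns_ok:
        return "pass"    # relay really belongs to the provider
    if ptr_claims_domain:
        return "delete"  # PTR set to mimic the provider; nameservers give it away
    # No PTR, or a PTR naming some unrelated host.
    return "delete" if rule.ptr_required else "suspect"
```

Matching on whole labels (`in_domain`) rather than a plain string suffix keeps a look-alike such as "evilisp.example" from passing as "isp.example". A "suspect" result would feed into the spam probability rather than trigger deletion.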

accuspam
08-17-04, 11:10 AM
Forged spam from hotmail and yahoo is now eliminated from the Daily Summary. The only way to get spam from hotmail or yahoo in your Daily Summary is if the spammer actually sent it through the webmail of hotmail or yahoo (directly, or via a program which interfaces to the webmail, e.g. Outlook, Hot Popper, Yahoo Pops, etc.), and I believe yahoo and hotmail prevent sending huge volumes of email from their webmail.

Here is an example that AccuSpam detected and deleted (with "xxxxxxxx" used to obscure private AccuSpam user data):

Return-path: <cheappharmz569@hotmail.com>
Received: from defapp04.gatewaydefender.com (unverified [209.153.138.124]) by
buckeye-express.com
(Rockliffe SMTPRA 5.3.11) with ESMTP id <B0067639950@mpmail1.accesstoledo.com>;
Tue, 17 Aug 2004 10:17:14 -0400
Received: from YahooBB218124004160.bbtec.net (Not Verified[218.124.4.160]) by
xxxxxxxxxxxxxxx with DEFSCAN (v3)
id <BH0cca5226>; Tue, 17 Aug 2004 10:17:12 -0400
Message-ID: <86981256473.88020@cheappharmz569@hotmail.com>
Reply-To: "Katina Thornton" <cheappharmz569@hotmail.com>
From: "Katina Thornton" <cheappharmz569@hotmail.com>
To: xxxxxxxxxxxxxxxxx
Subject: V.iagra on s.ale, save moolah ; bdlkvdpkgogsc
Date: Tue, 17 Aug 2004 12:12:53 -0300
MIME-Version: 1.0 (produced by decimatesportsman 1.7)
Content-Type: multipart/alternative;
boundary="--945509168745767"


Here is the analysis AccuSpam did, where "38k" means it deleted the email as forgery. You can see the spammer was actually sending from 209.153.138.124, which is "defapp04.gatewaydefender.com" probably on "voyager.net" network:


27: cheappharmz569@hotmail.com
28: V.iagra on s.ale, save moolah ; bdlkvdpkgogsc
28a: hotmail.com
39
40a
38a1: hotmail.com
38a2: 209.153.138.124
RESULT:

; <<>> DiG 9.2.3rc4 <<>> -x209.153.138.124
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10170
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 1

;; QUESTION SECTION:
;124.138.153.209.in-addr.arpa. IN PTR

;; ANSWER SECTION:
124.138.153.209.in-addr.arpa. 86400 IN PTR defapp04.gatewaydefender.com.

;; AUTHORITY SECTION:
138.153.209.in-addr.arpa. 86400 IN NS e0.ns.voyager.net.
138.153.209.in-addr.arpa. 86400 IN NS e1.ns.voyager.net.
138.153.209.in-addr.arpa. 86400 IN NS e2.ns.voyager.net.

;; ADDITIONAL SECTION:
e2.ns.voyager.net. 164766 IN A 207.90.100.25

;; Query time: 25 msec
;; SERVER: 209.68.2.239#53(209.68.2.239)
;; WHEN: Tue Aug 17 10:19:51 2004
;; MSG SIZE rcvd: 169


38a3
38a4
38a9
38k

accuspam
08-18-04, 08:01 AM
Added an interesting link of example phishing scams (http://survey.mailfrontier.com/survey/quiztest.html) to previous posts below. The link was provided by Tony.

http://forums.speedguide.net/showpost.php?p=1385759&postcount=116

http://forums.speedguide.net/showpost.php?p=1386863&postcount=128

accuspam
08-18-04, 08:07 AM
Here is an explanation I gave to Tony, which I think is relevant to share with anyone using or interested in using AccuSpam.


To: tonyt@coolpagehelp.com
Subject: Re: Is this from a spammer?
Cc:


Yes it is a spam.

You received it because it arrived in your mailbox less than 3 minutes before you POPed email from your mailbox.

If using the paid version, it would be impossible for you to receive these.

The spammer sent an email from zstom@coolpage.com to contactform@coolpage.com. AccuSpam sent a confirmation to zstom@coolpage.com, because there is no way for AccuSpam to know that zstom@coolpage.com is an alias for same POP mailbox as contactform@coolpage.com. For example, AccuSpam would not know if first@msn.com is same mailbox as second@msn.com.

But AccuSpam has a way to find out. When AccuSpam finds this email in the POP mailbox, it checks its database and realizes that the confirmation arrived in the same POP mailbox it was sent from. So AccuSpam then deletes the confirmation and the original spam.

The only reason AccuSpam did not do this is that you downloaded the confirmation email below before AccuSpam had a chance to check the mailbox again. AccuSpam checked the POP mailbox, sent the confirmation, and then waited 3 minutes to check the POP mailbox again. During those 3 minutes, the confirmation came back to the same POP mailbox (because zstom@coolpage.com and contactform@coolpage.com are aliases for the same POP mailbox), and you downloaded it. That is why this form of spam can only be received with the free version, and only in the rare case that you happen to hit that 3 minute window.

AccuSpam must wait 3 minutes between inspecting your POP mailbox, because if it opened your POP mailbox more frequently than that, then you would be unable to open your own POP mailbox, because it can only be open to one client at a time.

The paid version solves this by using two mailboxes: one that AccuSpam inspects and the other that you POP from. This is called a "proxy". There are other ways we could attempt to do a proxy, but in our analysis they were all inferior to what we chose. For example, putting a proxy on the user's own computer would not work well, because it would not work with WebMail or when the user uses another computer, so we chose the dual mailbox proxy for the paid version instead.


At 09:01 PM 8/17/2004 -0400, you wrote:
>Return-Path: <cnfm_77741_HZuJwmIsI9PTY6Y5@accuspam.com>
>Delivered-To: coolpage-3dize:com-support@3dize.com
>X-Envelope-To: support@3dize.com
>Received: (qmail 98099 invoked by uid 3052); 18 Aug 2004 00:17:50 -0000
>Delivered-To: coolpage-coolpage:com-zstom@coolpage.com
>Received: (qmail 98096 invoked by uid 3052); 18 Aug 2004 00:17:50 -0000
>Date: 18 Aug 2004 00:17:50 -0000
>Message-ID: <20040818001750.98095.qmail@qs662.pair.com>
>To: zstom@coolpage.com
>Subject: Received your email: [BuddyN]
>From: "contactform@coolpage.com" <cnfm_77741_HZuJwmIsI9PTY6Y5@accuspam.com>
>Reply-To: cnfm_77741_HZuJwmIsI9PTY6Y5@accuspam.com
>
>
>I [contactform@coolpage.com] received the email from you [zstom@coolpage.com],
>containing the subject above.
>
>If you need me to reply more urgently, simply click Reply
>and send back this entire confirmation email.
>
>
>If you sent the email to [contactform@coolpage.com], the following
>does not apply to you.
>If you did NOT send an email to [contactform@coolpage.com],
>http://AccuSpam.com can help you stop forgery spam.
>
>=============================
>Join free http://AccuSpam.com
>100% spam blocked. 0% of non-spam blocked.
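The loop check described in the reply above can be sketched as follows, assuming each confirmation carries a unique Return-Path token (like the cnfm_ address in the quoted headers) that is recorded together with the mailbox that sent it. The class, its method names, and the confirm.example domain are hypothetical.

```python
import secrets
from typing import Dict, Optional, Tuple


class ConfirmationTracker:
    """Remembers which mailbox sent each confirmation (hypothetical design)."""

    def __init__(self) -> None:
        # token address -> (mailbox that sent the confirmation, id of the original email)
        self._sent: Dict[str, Tuple[str, str]] = {}

    def send_confirmation(self, mailbox: str, original_id: str) -> str:
        """Create a unique Return-Path / Reply-To address for a new confirmation."""
        token = f"cnfm_{secrets.token_hex(8)}@confirm.example"
        self._sent[token.lower()] = (mailbox, original_id)
        return token

    def self_loop_original(self, mailbox: str, return_path: str) -> Optional[str]:
        """If this message is our own confirmation arriving back in the mailbox that
        sent it (the sender was an alias of the same mailbox), return the original
        email's id so both can be deleted; otherwise return None."""
        address = return_path.strip().strip("<>").lower()
        record = self._sent.get(address)
        if record is not None and record[0] == mailbox:
            return record[1]
        return None
```

This check only runs when the mailbox is inspected, so with a 3 minute polling interval a user who POPs in between can still see the returned confirmation, which is the window the reply describes.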

accuspam
08-18-04, 09:50 AM
Added "web.de", so that it is now impossible to receive forged emails from a web.de email address, the same as was done for hotmail and yahoo (http://forums.speedguide.net/showpost.php?p=1388432&postcount=141).

Adding forgery blocking to AccuSpam for each free email provider is as easy as this:


dig -x217.72.192.221

; <<>> DiG 9.2.3rc4 <<>> -x217.72.192.221
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15696
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0

;; QUESTION SECTION:
;221.192.72.217.in-addr.arpa. IN PTR

;; ANSWER SECTION:
221.192.72.217.in-addr.arpa. 3353 IN PTR fmmailgate01.web.de.

;; AUTHORITY SECTION:
192.72.217.in-addr.arpa. 3353 IN NS nsx2.cinetic.de.
192.72.217.in-addr.arpa. 3353 IN NS nsx1.cinetic.de.

;; Query time: 2 msec
;; SERVER: 209.68.2.239#53(209.68.2.239)
;; WHEN: Wed Aug 18 09:20:27 2004
;; MSG SIZE rcvd: 124

mysql> show create table dns;
+-------+--------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+--------------------------------------------------------------------------------------------------------------------------+
| dns | CREATE TABLE `dns` (
`Tld` varchar(127) NOT NULL default '',
`MajorISP` tinyint(3) unsigned NOT NULL default '1',
`PTRSupported` tinyint(3) unsigned NOT NULL default '2',
`PTRRequired` tinyint(3) unsigned NOT NULL default '0',
`TldNSMatches` varchar(127) NOT NULL default '',
PRIMARY KEY (`Tld`)
) TYPE=MyISAM |
+-------+--------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> insert into dns values ('web.de','1','1','1','cinetic.de');
Query OK, 1 row affected (0.00 sec)
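For readers who want to try the same steps without MySQL, here is a rough Python equivalent using the standard `sqlite3` module. It derives the `TldNSMatches` value from the nameservers dig reported and stores a row like the one inserted above. Taking the last two labels of a hostname is a naive shortcut (it would be wrong for suffixes like .co.uk), the column meanings are assumed from their names, and `add_provider` is a hypothetical helper.

```python
import sqlite3
from typing import List

# Same columns and defaults as the MySQL `dns` table above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS dns (
    Tld          TEXT PRIMARY KEY,
    MajorISP     INTEGER NOT NULL DEFAULT 1,
    PTRSupported INTEGER NOT NULL DEFAULT 2,
    PTRRequired  INTEGER NOT NULL DEFAULT 0,
    TldNSMatches TEXT NOT NULL DEFAULT ''
)
"""


def registered_domain(hostname: str) -> str:
    """Naively keep the last two labels: "nsx1.cinetic.de." -> "cinetic.de"."""
    labels = hostname.rstrip(".").split(".")
    return ".".join(labels[-2:])


def add_provider(conn: sqlite3.Connection, tld: str, nameservers: List[str],
                 ptr_required: bool = True) -> str:
    """Store a provider row from the NS records of its mail gateway's reverse zone."""
    ns_domains = {registered_domain(ns) for ns in nameservers}
    if len(ns_domains) != 1:
        raise ValueError(f"nameservers span several domains: {sorted(ns_domains)}")
    ns_match = ns_domains.pop()
    # Same values as the MySQL insert: MajorISP=1, PTRSupported=1.
    conn.execute("INSERT OR REPLACE INTO dns VALUES (?, 1, 1, ?, ?)",
                 (tld, int(ptr_required), ns_match))
    return ns_match


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
add_provider(conn, "web.de", ["nsx2.cinetic.de.", "nsx1.cinetic.de."])
print(conn.execute("SELECT * FROM dns").fetchall())
```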


I was prompted to prioritize adding "web.de" (ahead of a planned comprehensive addition of all known (10,000+) free email providers) when I noticed that the following forged email from a "web.de" address was NOT blocked by our best competitor, http://BrightMail.com:


Return-Path: <qwceilqxodjbd@online.sh.cn>
Received: from 207.217.125.20 ([211.191.62.186])
by robin (EarthLink SMTP Server) with SMTP id 1bX4fK48O3NZFjX0
Tue, 17 Aug 2004 06:43:47 -0700 (PDT)
Received: from dns3.web.de (dns3.web.de [73.212.13.183]) by 211.191.62.186 with SMTP id d7AJB51Jv7;
Tue, 17 Aug 2004 18:38:44 +0400
From: "Carmen Shepard" <kzwjxg@web.de>
Reply-To: "Carmen Shepard" <kzwjxg@web.de>
Subject: of 9 but
To: lrtimmons@earthlink.net
Cc: paulzaccardi@earthlink.net, aurora51@earthlink.net, cyndi6@earthlink.net, coolpage@earthlink.net, jusnjodi@earthlink.net
Message-ID: <B84EE85692174DC@web.de>
X-Mailer: crank case 62 curses
Date: Tue, 17 Aug 2004 20:43:44 +0600
Organization: philosopher 870 brides
Mime-Version: 1.0
Content-Type: multipart/alternative;
boundary="=====250893080900=_"
X-ELNK-AV: 0

Dewey Blair,%RND_SYB ,cretin ,strengthen .%RND_SY Under ground C D !Check Your spouse and staff, Investigates anyone own cREDIT-HISTORY, Govenment don't want me to sell. hacking someone P C !Get a new passport! Disappear in your city very easy! http://acadu.bettersites.info/amite/CD3/ insomniac ,hypothetic , ,pouch ,din ,formidable ,adolphus . kinky ,lack .


Note that a reverse DNS query of the IP address in the first Received: header of the above spam does not return the "web.de" domain and "cinetic.de" nameservers, indicating it was not sent through the "web.de" webmail and is thus (with very high probability) a forged email:


dig -x211.191.62.186

; <<>> DiG 9.2.3rc4 <<>> -x211.191.62.186
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 54402
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;186.62.191.211.in-addr.arpa. IN PTR

;; Query time: 229 msec
;; SERVER: 209.68.2.239#53(209.68.2.239)
;; WHEN: Wed Aug 18 08:48:54 2004
;; MSG SIZE rcvd: 45



By contrast, look at the following email I sent from a "web.de" account I created. The IP address in its first Received: header is the one I queried to configure our database (the first dig query shown above).


Return-Path: <shelby_moore@web.de>
Received: from fmmailgate01.web.de ([217.72.192.221])
by sparrow (EarthLink SMTP Server) with ESMTP id 1bXqku7yV3NZFjV0
for <coolpage@earthlink.net>; Wed, 18 Aug 2004 06:18:14 -0700 (PDT)
Received: by fmmailgate01.web.de (8.12.6/8.12.6/webde Linux 0.7) with SMTP id i7IDHq1d016358; Wed, 18 Aug 2004 15:18:12 +0200
Received: from 203.168.2.77 by freemailng2002.web.de with HTTP;
Wed, 18 Aug 2004 15:18:07 +0200
Date: Wed, 18 Aug 2004 15:18:07 +0200
Message-Id: <30890719@web.de>
MIME-Version: 1.0
From: "Shelby Moore" <shelby_moore@web.de>
To: coolpage@earthlink.net, shelby@coolpage.com
Subject: coolpage@earthlink.net, shelby@coolpage.com
Precedence: fm-user
Organization: http://freemail.web.de/
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-ELNK-AV: 0

Test
____________________________________________________
Aufnehmen, abschicken, nah sein - So einfach ist
WEB.DE Video-Mail: http://freemail.web.de/?mc=021200



Enforcing reverse DNS on free (webmail-exclusive) email providers deletes forged spam that http://BrightMail.com apparently does not block.

I understand it is possible, though non-standard and complex, for some (1 in 10,000?) users of free email to configure an email client (see "Method 2: How to Set Up a New Account that Sends Messages by Using an SMTP Server (http://support.microsoft.com/default.aspx?scid=kb;en-us;275510&Product=oex)") to not send over the free email provider's network, but my opinion and assumption is that it simply isn't worth receiving all that forged spam from free email domains to insure against that rare chance (1 in a million overall for all email received?). Those rare cases are easily handled by adding those users to the Approved Senders list. My assumption rests on the fact that, by their nature, free email providers attract users who want webmail and an easy, free solution (not a complex one that requires paid password access to a non-open SMTP relay).

accuspam
08-18-04, 10:31 AM
At 11:55 AM 8/17/2004 -0400, you wrote:
Sirs:

Thank you very much for your free AccuSpam. I do not think I will need anything further as I am not a heavy user of email. I was just so annoyed at the spam mail and resulting pop-ups.

I just need to know if my existing blockers will interfere with your service?


As long as you are receiving the "Twice Daily Summary" emails from "AccuSpam Robot" then you are probably okay.

But if you lose a legit email, you will have to suspect your existing blockers. I noticed you are also using Spam Assassin (or am I mistaken?), which is known to delete legit email sometimes (severity depending on the Spam Assassin threshold set).

I would suggest turning off your existing blockers and seeing if AccuSpam can sufficiently block the spam. If not, then, assuming you are receiving the "Twice Daily Summary" emails from "AccuSpam Robot", turn your existing blockers back on. Repeat this test every couple of months, until you are satisfied that AccuSpam is sufficient without your existing blockers. Then leave the existing blockers off.

accuspam
08-19-04, 03:05 AM
Added a note that the free version of AccuSpam is not compatible with forwarding to the protected email address:

AccuSpam FAQ Requirements (http://accuspam.com/faq.php#as_require)

This is because the reverse dns anti-forgery I added recently must have access to the original Received: headers of the email, which are normally deleted by most forwarding methods.

Most users do not use email forwarding.

Internal ISP forwarding that retains the Received: headers is compatible.

Failure to follow this Requirement can lead to lost email.

accuspam
08-20-04, 08:40 AM
AccuSpam has announced its superior proposal for anti-forgery, called SenderKeys (http://accuspam.com/senderkeys.php)(tm):

This directly competes with DomainKeys (http://antispam.yahoo.com/domainkeys) from Yahoo, SenderID (http://www.microsoft.com/mscorp/twc/privacy/spam_senderid.mspx) (aka CallerID) from Microsoft, and SPF (http://en.wikipedia.org/wiki/Sender_Permitted_From) from Pobox.com.

AccuSpam had promised this long ago (http://forum.icann.org/lists/stld-rfp-mail/msg00060.html) in the debate about .mail TLD (http://forum.icann.org/lists/stld-rfp-mail/) at ICANN.

accuspam
08-21-04, 06:32 AM
The rough snapshot estimate of current AccuSpam performance from Tony's account:

Approximately 1300 emails per day on average were processed by AccuSpam over the last month.

Only 40 emails per Summary (80 per day, since the Summary is sent twice daily) have no probability to be spam, thus (1300 - 80) / 1300 = 94% spam deletion if Bayesian-level false positive risk is accepted.

Only 66 emails per Summary (132 per day) with greater than 99 in 100 probability to be spam, thus (1300 - 132) / 1300 = 90% spam deletion with medium false positive risk.

Only 103 emails per Summary (206 per day) in total, thus (1300 - 206) / 1300 = 84% spam deletion with essentially 0% (less than 1 in a million) false positive risk.
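The percentages above can be rechecked with a few lines of arithmetic, assuming each count is per Summary and there are two Summaries a day (the reading that the 80 and 206 figures imply). The constant and function names are only for illustration.

```python
PROCESSED_PER_DAY = 1300  # approximate daily volume for Tony's account
SUMMARIES_PER_DAY = 2     # "Twice Daily Summary"


def deletion_rate(listed_per_summary: int) -> float:
    """Fraction of processed email that never reaches the Summaries."""
    listed_per_day = listed_per_summary * SUMMARIES_PER_DAY
    return (PROCESSED_PER_DAY - listed_per_day) / PROCESSED_PER_DAY


for label, listed in [("Bayesian-level risk", 40), ("medium risk", 66), ("~0% risk", 103)]:
    print(f"{label}: {deletion_rate(listed):.0%}")
# Bayesian-level risk: 94%
# medium risk: 90%
# ~0% risk: 84%
```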

This shows about a 1% improvement from where we were last week. 10 other AccuSpam users are now correlated with Tony, compared with 8 last week. We only have about 100 AccuSpam users. We really need about 1000 for the spam deletion rate in the Daily Summaries to reach the Bayesian level without the Bayesian level of false positive risk.

-Shelby Moore
http://AccuSpam.com

accuspam
08-23-04, 07:04 AM
We are in the process of implementing anti-spam "honeypots" (aka "spam probes") to reduce the length of Daily Summaries without having to wait for more AccuSpam users.

This should hopefully be completed within August.

accuspam
08-23-04, 07:06 AM
SenderKeys Anti-Forgery proposal drastically improved and now has discussion list:

http://www.accuspam.com/senderkeys.php

It can now optionally be implemented entirely at the MTA (mail server) level without requiring MUA upgrades!

It now depends on (any one of) the 3 major anti-forgery proposals, so it will be seen as less of a threat to them and more complementary.

Rainbow
08-29-04, 12:23 PM
My Blocked senders list keeps getting bigger and bigger; I think it's time to give AccuSpam a try :)

accuspam
09-07-04, 08:02 PM
Example input we are receiving from satisfied AccuSpam users.

To: "The Old Map Company" <postmaster@oldmapxxxxxx>
Subject: Re: AccuSpam Comments
Cc:


Thanks.

Hope you do not mind if we post your comments to our Forum so others can be aware of the benefits you initially got with AccuSpam.

Actually you will find that AccuSpam will improve over time (we are still improving the algorithms), and eventually you will only get a Daily Summary if you have legit e-mail from previously unknown sender.

The Cool Page button was linked, but we added a link in the text based on your feedback.

Yes you can be sure with AccuSpam that you will never receive any spam that you did not specifically request, except as per the caveats in our FAQ if you are using the free version. If ever you need an absolute 100% insurance, you can upgrade to our paid version when it is available.


At 09:55 AM 9/7/2004 +0100, Steve wrote:
>Hello
>
>Trialing for a few days now and this all looks very promising. There was
>always the odd spam mail that demanded a quick look and the Newsletters
>(with the ad links that also demanded a look) one should have un-subscribed
>from, but never got around to hitting the button. Also I can now let my
>family have access to my mail box in the knowledge they will not be exposed
>to anything unpleasant. AccuSpam must be saving me an hour a day! That's
>more than two weeks a year, or placing a conservative value on my time as
>£10 per hour - £3,650 (US$6,500!) Congratulations.
>
>Steve Robxxxxx
>www.rag-dollxxxxxx
>
>PS You have a link missing on your site - To quickly design cool, creative
>web sites, we recommend ?
>Trust it's Coolpage!

Paft
09-07-04, 08:12 PM
SenderKeys Anti-Forgery proposal drastically improved and now has discussion list:

http://www.accuspam.com/senderkeys.php

It can now optionally be implemented entirely at the MTA (mail server) level without requiring MUA upgrades!

It now depends on (any one of) the 3 major anti-forgery proposals, so it will be seen as less of a threat to them and more complementary.

This seems like the private/public key model that PGP and GPG use already for encryption, tied in with email.

COOL.

shikaza
10-23-04, 03:29 PM
shikaza is here :irate:

accuspam
01-06-05, 07:54 AM
A superior underlying algorithm for AccuSpam will probably be released today. Nothing will need to change in the AccuSpam user interface at this time.

The new algorithm correlates (among all users in a safe manner) on highly recursive content fragments instead of domain of sender, making it less susceptible to error from excessive email forgery of a domain, and more accurate against ISPs (domains) which send both spam and non-spam.

This algorithm also effectively increases the statistical reach of AccuSpam's user count, because spam content fragments cross-correlate among users more often than spam sender domains do.
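The exact algorithm is not disclosed, so the following is only a minimal Python sketch of the general idea of correlating content fragments across users, not AccuSpam's method. The 4-word fragment size, the SHA-1 hashing, the `min_users` threshold and all names (`fragments`, `FragmentStats`, `looks_like_spam`) are hypothetical choices for illustration.

```python
import hashlib
import re
from collections import defaultdict


def fragments(body: str, size: int = 4) -> set[str]:
    """Split a message body into overlapping word fragments (shingles) and hash
    them, so the same fragment seen in different users' mailboxes can be
    compared without keeping the raw text. Hypothetical fragment definition."""
    words = re.findall(r"[a-z0-9$]+", body.lower())
    if len(words) < size:
        return {hashlib.sha1(" ".join(words).encode()).hexdigest()} if words else set()
    return {
        hashlib.sha1(" ".join(words[i:i + size]).encode()).hexdigest()
        for i in range(len(words) - size + 1)
    }


class FragmentStats:
    """Counts, across all users, where each fragment has been seen: in mail from
    unknown senders, or in mail from someone's Approved Senders."""

    def __init__(self) -> None:
        self.unknown = defaultdict(set)   # fragment -> users who got it from unknown senders
        self.approved = defaultdict(int)  # fragment -> times seen in approved-sender mail

    def record(self, user: str, body: str, sender_approved: bool) -> None:
        for frag in fragments(body):
            if sender_approved:
                self.approved[frag] += 1
            else:
                self.unknown[frag].add(user)

    def looks_like_spam(self, body: str, min_users: int = 3) -> bool:
        # Flag a message only if one of its fragments reached many different
        # users from unknown senders and never appeared in approved mail.
        return any(
            len(self.unknown.get(frag, ())) >= min_users
            and self.approved.get(frag, 0) == 0
            for frag in fragments(body)
        )
```

Because a fragment that appears in any approved sender's mail is never flagged, a newsletter that some users have approved is not treated as spam even when many other users receive it from addresses they have not approved.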

Unlike the very popular Bayesian statistics for anti-spam (e.g. as used in SpamAssassin, which many ISPs run), this algorithm continually re-trains itself, so it will not generate a false positive (delete non-spam) or a false negative (fail to block spam) when YOUR current non-spam or spam suddenly shifts in content so that (in terms of Bayesian statistics) it resembles YOUR past spam or non-spam, respectively. The risks of Bayesian filtering were detailed further in past posts:

http://forums.speedguide.net/showpost.php?p=1383127&postcount=111

http://forums.speedguide.net/showpost.php?p=1386422&postcount=126

accuspam
01-07-05, 12:04 AM
A reply we sent to a customer today:

Hi,
1
Your promo on the home page says you just sign up and carry on as before. How can that be when an approved sender list must be compiled?


A single Daily Summary email is sent to you (automatically by our robot) with a COMPACT list of temporarily blocked emails (those that we were not sure if spam or not) and you can reply to that email back to our robot with an "A" in the [ ] box next to each sender you want to receive email from.
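A minimal Python sketch of how a reply to the Daily Summary could be interpreted, following the rules documented in the earlier "Improvements Made" post: "A" delivers the email and adds the sender to Approved Senders, "R" delivers without approving, and any other non-space character is treated like "R". The line layout, the regular expression, accepting lowercase "a", and the name `parse_reply` are assumptions; the real Daily Summary format is not shown in this thread.

```python
import re

# Hypothetical Daily Summary line layout, possibly quoted with "> " by the
# user's mail program when replying:
#   > [A] alice@example.com  Lunch on Friday?
#   > [ ] deals@spam.example  You have WON
LINE = re.compile(r"^[\s>]*\[(?P<mark>[^\]]*)\]\s+(?P<sender>[^\s@]+@\S+)")


def parse_reply(text: str) -> dict[str, str]:
    """Map each sender in a Daily Summary reply to an action:
    'approve' (deliver and add to Approved Senders), 'deliver' (deliver only)
    or 'none' (box left empty, no action taken)."""
    actions: dict[str, str] = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # not a summary line (greeting, signature, etc.)
        mark = match.group("mark").strip()
        if mark.upper() == "A":  # assumption: lowercase "a" also approves
            action = "approve"
        elif mark:
            # "R" or any other non-space character, e.g. a typo, is handled
            # like "R": the safest choice is to deliver without approving.
            action = "deliver"
        else:
            action = "none"
        actions[match.group("sender")] = action
    return actions
```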

Once a sender is added to your Approved Senders list (by any method, e.g. directly, by replying to the Daily Summary, or in future by logging in), you always get email from that sender immediately in your Inbox, and the following no longer applies to that sender.


2
a) A friend of mine gives my address to another friend who emails me.

b) A person sees my address on a business card and emails me showing an interest in my product.

c) A person sees my address on a business card and emails me with info about his product, relevant to my industry.

All three above are unsolicited, all from addresses not seen before by the anti-spam software, and yet are welcome.


They will all appear in your Daily Summary email. As our usership grows, less and less spam appears in the Daily Summary. Sometime this year, all you will see in the Daily Summary are new senders. In that future scenario (where our underlying statistical spam detection is 99.99%), on the days you do not get new senders, you will not get a Daily Summary email.

See our announcement of superior statistical algorithm yesterday:

http://forums.speedguide.net/showpost.php?p=1516823&postcount=154


Without intervention by the user, no antispam system could possibly know which of the above emails are welcome and which are not.


Not true. Our underlying statistical algorithm is able to know this; we just do not have enough users yet (we have 1184 users as of today) to detect 99.99% of spam statistically. Once we have 10,000+ users, there will be an option for paid users to turn off the Daily Summary and allow new senders directly into the Inbox. Note, however, that we need to know the Approved Senders in order to drive our statistical algorithm. In that future scenario, we will be able to auto-populate the Approved Senders list by seeing that you have received email from a new sender more than once and have not chosen to block the sender. Without a Daily Summary, you would log in to AccuSpam.com to report any spam received in your Inbox. But we won't enable such an option until our underlying spam detection is 99.99%.
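The future auto-approval option could be sketched as follows. This is a hypothetical Python illustration, not AccuSpam code: it assumes "more than once" means at least two messages, that sender addresses are compared case-insensitively, and that blocking a sender permanently excludes it. The class and method names are invented.

```python
from collections import Counter


class AutoApprover:
    """Hypothetical sketch: promote a new sender to the Approved Senders list
    once mail from them has arrived more than once and the user has not
    blocked them (e.g. by reporting the mail as spam)."""

    def __init__(self, min_messages: int = 2) -> None:
        self.min_messages = min_messages
        self.seen = Counter()  # sender -> messages received so far
        self.blocked = set()
        self.approved = set()

    def block(self, sender: str) -> None:
        s = sender.lower()
        self.blocked.add(s)
        self.approved.discard(s)

    def on_message(self, sender: str) -> bool:
        """Record a received message; return True if the sender is now approved."""
        s = sender.lower()
        if s in self.blocked:
            return False
        self.seen[s] += 1
        if self.seen[s] >= self.min_messages:
            self.approved.add(s)
        return s in self.approved
```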

Right now AccuSpam is 100% because it blocks everything that is not from an Approved Sender and then compiles it into a Daily Summary. About 50% of incoming spam is detected (40+% by detecting a nonexistent sender and 10% by statistics) and is not included in the Daily Summary. The nonexistent-sender and statistical algorithms are mathematically certain never to delete a non-spam more often than once in a million spams. In other words, the false positive accuracy is always 99.9999%. The 99.99% we are aiming for refers to increasing statistical spam detection from the current 10% to 99.99%.
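The percentages above can be checked with a few lines of Python. The 40%, 10%, 99.99% and 1-in-a-million figures come from this reply; the 200 spams per day is a hypothetical mailbox used only to show the effect on the size of the Daily Summary.

```python
# Figures quoted in the reply above
nonexistent_sender = 0.40            # spam share caught by the nonexistent-sender check
statistical_now = 0.10               # spam share caught statistically today
statistical_target = 0.9999          # statistical detection AccuSpam is aiming for
false_positive_rate = 1 / 1_000_000  # at most one non-spam deleted per million spams

spam_per_day = 200  # hypothetical mailbox, for illustration only

auto_deleted = nonexistent_sender + statistical_now
accuracy_pct = (1 - false_positive_rate) * 100
reaching_summary_now = spam_per_day * (1 - auto_deleted)
reaching_summary_target = spam_per_day * (1 - statistical_target)

print(f"Spam deleted before the Daily Summary: {auto_deleted:.0%}")
print(f"False-positive accuracy: {accuracy_pct:.4f}%")
print(f"Spam reaching the Daily Summary now: {reaching_summary_now:.0f} per day")
print(f"Spam reaching it at the 99.99% target: {reaching_summary_target:.2f} per day")
```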


If this point is agreed with then how would Accuspam be different to, say, Mailwasher where mail received from an address not yet seen by Mailwasher must be viewed by the user before being manually blacklisted.


1. AccuSpam currently deletes 50% of spam automatically, before the Daily Summary, with 99.9999% accuracy. 10% is done by correlating spam and non-spam content among all users, and this detection will increase to 99.99% this year as our usership grows. The exact algorithm is secret and includes some "magic" (math) which will hopefully be patented soon. It is quite different from the Bayesian algorithm that many anti-spam products use, with some distinct advantages.

(Note that the previous statistical algorithm, which correlated sender domains, was not achieving the desired 99.9999% accuracy, because many ISPs' domains are used to send spam as well as non-spam. This wasn't a big problem, because the statistical sender-domain correlation was only affecting (detecting as spam) 10% or less of incoming email, and still with a very high accuracy. However, we have fixed this with the announcement mentioned above. It would have become a bigger problem as our usership increased, and now we have a very accurate statistical algorithm to build on as usership grows.)

2. 100% spam protection is guaranteed by the Daily Summary email, which is a much more COMPACT and SAFE way of reviewing suspect email not caught by the underlying algorithm than MailWasher, which downloads all the spam and viruses to your computer BEFORE you are shown them and choose whether to blacklist or receive them.

3. MailWasher is not correlating spam statistics with other users and has no underlying statistical way to detect spam automatically. Some products (maybe Mailwasher) will attempt to correlate only YOUR spam stats to detect spam (Bayesian), and the drawbacks of Bayesian are discussed in the link to the AccuSpam Forum I gave above.

4. MailWasher only protects email you download to your computer. AccuSpam protects your mailbox, no matter where or how you access it, e.g. using WebMail or from other computer.

5. You have to download and install MailWasher (and learn to use it with each mail program you use) on every computer you want to protect. With AccuSpam, just sign up your mailbox online in 1 minute and you are done.

6. There are sure to be technical compatibility issues for some computers and some mail programs when using a program such as MailWasher, which runs on the computer you are using. AccuSpam runs on our server and communicates with your mailbox on your ISP's server via the standard POP3 protocol, so compatibility issues are very, very rare, and any that exist are discovered when you attempt to sign up. If your ISP's POP3 mailbox server is not compatible, you won't be able to sign up for AccuSpam (we do numerous POP3 compatibility checks at signup), so you won't have nasty problems later. And such incompatibility is very, very, very rare, because POP3 is a very, very, very universal standard for email mailbox delivery.
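The post does not say which POP3 checks are run at signup, so the following is a hypothetical Python sketch using the standard-library `poplib` module. It logs in, runs STAT, and probes UIDL and TOP, which RFC 1939 lists as optional commands a server may not implement; the choice of checks and the `check_pop3` name are assumptions. The `connect` parameter exists only so the function can be exercised without a real server.

```python
import poplib


def check_pop3(host: str, user: str, password: str, port: int = 995,
               connect=poplib.POP3_SSL) -> list[str]:
    """Hypothetical signup check: return a list of problems found with the
    mailbox; an empty list means it passed."""
    problems = []
    conn = connect(host, port)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()
        try:
            conn.uidl()  # stable message IDs let a remote service track what it has seen
        except poplib.error_proto:
            problems.append("UIDL not supported")
        if count:
            try:
                conn.top(1, 0)  # fetch headers only, without downloading the body
            except poplib.error_proto:
                problems.append("TOP not supported")
    except poplib.error_proto as exc:
        problems.append(f"login or STAT failed: {exc}")
    finally:
        conn.quit()
    return problems
```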

7. If your computer crashes or gets a virus, your anti-spam protection with AccuSpam does not crash or get compromised.


And a promotional email may be sent out in bulk to subscribers, with a few extra to relevant (as above) industry participants, so if software scrubs it just because a lot of others like it have also been sent out, it will fail the user again.


Our new underlying statistical algorithm will not scrub a desirable bulk email (newsletter, etc.), because some of our users will have the sender of the desirable bulk email on their Approved Senders list, and this tells our algorithm that the content of that bulk email is not spam.

Again that is why I said we need the Approved Senders list to feed our statistical correlation algorithm. And again I said we can eventually get rid of the Daily Summary, once we reach critical mass of usership. In the meantime, it works very well, which is why we have a growing usership.

We will post our reply to our Forum for the benefit of the public knowledge.

accuspam
01-14-05, 08:19 AM
The following descriptions of AccuSpam's algorithms are not a license, nor a public grant of any rights. AccuSpam reserves all rights. A patent will be filed on these algorithms.

The Major Algorithm Update (http://forums.speedguide.net/showpost.php?p=1516823&postcount=154) is working as expected. The amount of spam summarized in the Daily Summary is drastically reduced, because most spam is being recognized and deleted (safely, with less than a 1 in a million (0.0001%) risk of losing non-spam) by this new statistical algorithm, which we will call "Chunk".

The "Chunk" algorithm has many benefits compared to the per-user Bayesian algorithm used by almost all other anti-spam products (e.g. SpamAssassin uses Bayesian filtering and is used by many ISPs):


(1) Analyzes data from all users

(2) Automatically trains itself in real-time

(3) Only needs to be told what *some* non-spam is (does not need to have every incoming email trained on). We get the non-spam data from users' whitelists.

(4) Automatically recognizes new strains of spam in real-time (does not have to be trained on new spam), e.g. "Viagra" changed to "Ciali$". Cannot be fooled by changes in spam (randomization) between spam runs.

(5) Automatically recognizes new strains of non-spam (new to one user, but not new to all users). In other words, it doesn't get confused if you contact an insurance company for a quote, but you have classified insurance emails as spam in the past.

(6) Detects a much higher rate of spam, with a much lower rate of false positives. The false positive rate can be set in the probability calculations of the algorithm (e.g. 1 in a million is 0.0001%), compared to 0.03% (1 in 3333) for Bayesian. So Bayesian will lose about one legitimate email in every 3333 received, whereas AccuSpam will essentially never (1 in a million) lose non-spam:

http://www.paulgraham.com/better.html

http://citeseer.nj.nec.com/androutsopoulos00learning.html
(See Page 9 of the PDF linked at top)


(7) 100% immune to users who misclassify non-spam as spam.

(8) Is more immune than Bayesian to users who misclassify spam as non-spam. We also monitor users to discover spammers who sign up for AccuSpam to approve their own spam. Besides, getting spam past the statistical algorithm in AccuSpam is useless, because it goes into the Daily Summary, where it is still blocked and its body is never read by users.

(9) Uses an *EXACT* probability calculation. Bayesian which counts statistical evidence, then uses an ad h