Our Anti-spam is Ready! - AccuSpam
After months of more research, fine tuning the programming and testing, our promised anti-spam solution is now up and running.
Check it out:
http://www.accuspam.com
quote
AccuSpam™ is the only anti-spam in the world which blocks 100% of spam and never fails to show you the non-spam.
fastchevy
07-21-04, 09:04 AM
How's it working?
We are just implementing an internet appliance for spam from CyberTrust as well.
(Iron Mountain) We are in the beginning stages of fine tuning.
I will be sending everyone who asks how to keep spam out of their mail to the site.
Good stuff Tony :thumb:
Prey521
07-21-04, 09:47 AM
TT, can you better explain to me the secret email account option? Thanks!
Sounds like a very complex bayesian filtering system, Tony.
:thumb:
Nice work.
Nope! Bayesian does not work well for anti-spam.
Prey:
re Secret Email Address used in paid version:
see http://accuspam.com/faq.php#as_configure
and click on "Click here to create the SECRET email account correctly."
Looks great Tony. I'll be signing up for this one and will put the word out like Norm said.
YARDofSTUF
07-21-04, 04:37 PM
Site menu still looks wrong in Firefox, like when you first started posting this stuff, but it's manageable and yes, good work Tony!!!
Nope! Bayesian does not work well for anti-spam.
With a sample size as large as yours, it would work superbly. And I use Bayesian filtering for my mail.. 99.4% accuracy.. and my sample size is probably thousands of messages less than yours!
I'm not sure why you'd say it doesn't work well for anti spam.
AceFireball
07-21-04, 05:42 PM
Site menu still looks wrong in firefox \
Same for me, Great job tho
I am signed up and will be testing it
chimdogger
07-21-04, 06:52 PM
I signed up. I will let you know how it goes. Thanks for the linkage / invite. :thumb:
Paft
I'm not sure why you'd say it doesn't work well for anti spam.
The word "well" is a relative term. 99.4% may be "well" for a home user, but for a business, that means that .6% of spam can get through (which is not too bad), and it also means that legitimate messages can get lost. In a large business, that single "lost or deleted message" can cost thousands of dollars. An email account or network with a HUGE volume of traffic means that many hours get wasted each week by employees who must take the time to cull their inboxes, weeding out those .6% spam messages.
A brief description of Bayesian (this was once on a page at the AccuSpam site but has been removed)
Bayesian (Statistical) Phrase/Word Content Filtering
(used by SpamAssassin, SpamCop, Sophos, etc)
1. Easily subvertable by future spam which uses random phrases/words or Markov chains
2. Cannot detect email viruses
3. Misses spam with no or very few phrases/words
4. Misses spam with clever methods (http://www.sophos.com/spaminfo/fieldguide/fieldguide.html) of hiding phrases/words
5. Misses some spam that has most phrases/words in common with your legitimate email
6. Occasionally blocks legitimate email which contains some phrases/words in common with spam. This research paper (http://eric.univ-lyon2.fr/~pkdd2000/Download/WS4_01.pdf) (requires Adobe Acrobat to view) shows in section 6.3 on page 9 that Bayesian is inferior to using no filter at all ("baseline"), for those who place a high cost on erroneous blocking of their non-spam email.
7. Requires initial training on large sample of your legitimate and spam emails
8. Requires periodic re-training as spam content mutates
Paft
1.) True
2.) True - No spam filtering method can, however. Only in conjunction with an antivirus can this be done.
3.) Depends on configuration. Can be configured to throw out mail with low word counts.
4.) Most Bayesian filters can be configured to scan sections of emails so that they can detect the broken URLs, and read HTML/CSS (the font/image tricks) as the messages come in. Granted there are some tricks that can't be detected, but I'd love to see your algorithm to see how you get by it.
5.) How does yours avoid that?
6.) See 5
7.) Yes it does. As does anything requiring statistical analysis.
8.) Which is constant with you flagging messages as junk or not junk in the case of errors.
I'm not doubting your software, I'm just wondering what the algorithm is you're using to bypass the Bayesian failures. :)
I understand your curiosity!
response to above:
2. We use NO antivirus in conjunction, none is needed. Email viruses are MIME TYPES and these can be detected immediately.
3. Yes, but it would also throw out legit mail too.
8. The short answer is that AccuSpam is "self-learning" and "self-training", and user response is only asked for to speed up this training cycle, as SOME bulk mail is legit, as in the user's subscriptions.
I cannot divulge what I know of the algorithm, but I can say that it is built upon this definition of what spam is:
Spam = UBE = Unauthorized Bulk Email
and that some mathematical calculations have to do with overall volume of mail being sent based upon certain criteria.
I am certain that the developer will respond in this thread later to answer any technical questions you may have. (he is presently in a time zone 12 hrs offset from me)
Ah, nice catch. Yes, you could very well check for MIME types in the application/xxxxxx family.
*grins* Self-learning and self-training. So you're definitely on a ruleset of some kind. Token/response system?
I'd definitely love to talk to the developer. :) Hell, I'd sign any NDA or anything else you'd want me to sign if that's what would be needed to even get a description of this algorithm.
*interest piqued*
The developer ( Shelby Moore) just emailed me this:
tt
1. It is working very well so far. We need another 24 hours or so to see if any bugs pop up. A notable stat thus far is that 93% of incoming email is spam (for our users) and AccuSpam is of course blocking 100% of it. Note that the statistical aspect of AccuSpam is not going to kick in until we have many more users signed up. However, AccuSpam works more accurately than other anti-spam even without the statistical effect.
2. It is definitely NOT bayesian, although we could employ bayesian in future as a way of grouping those more likely to be spam in the daily summary emails. However, we would never use bayesian as the determinant of which email to delete, because bayesian has one of the worst false positive rates of any anti-spam.
This research paper (http://eric.univ-lyon2.fr/~pkdd2000/Download/WS4_01.pdf) (requires Adobe Acrobat to view) shows in section 6.3 on page 9 that Bayesian is inferior to using no filter at all (baseline), for those who place a high cost on erroneous blocking of their non-spam email!
One of the biggest mistakes to make when comparing anti-spam systems is to focus only on the false positive accuracy (i.e. the spam detection rate) and not also focus on the false negative accuracy (i.e. the non-spam misclassification). If you have to spend all your time browsing the spam folder to find misplaced non-spam, then it is the same WORK as using no filter at all. Nothing has been gained with Bayesian over no filter at all, unless you do not care about losing an occasional non-spam email.
Spammers can use "reverse Bayesian" techniques to tweak their spam to pass through your Bayesian filter. For example, they can use a genetic algorithm to generate many different spams and then place an <img> in the email which pings their server, so the algorithm can get statistical data on which spams are passing through the filter. From that, they can completely rebuild your bayesian statistics. There are other techniques as well. Simply morphing content a lot and sending a lot of different content can subvert Bayesian.
The worst possibility which I warned Paul Graham about when he first proposed Bayesian for anti-spam to the world in 2002 (and which he mostly ignored):
Spam that learns to not be statistically identified:
http://ixazon.dynip.com/pipermail/nilsimsa/2002-December/000041.html
Is that as spammers morph spam to look more and more like your non-spam, the false positive inaccuracy (the misclassification of non-spam) increases drastically. And worse, it then becomes much more time consuming to browse the spam folder than it was before Bayesian, because the spam now looks very similar to the non-spam.
Note in that prophetic post, I predicted that spammers would end up using "reverse bayesian" techniques.
3. Unlike most other anti-spam, AccuSpam never looks at the content of the email, only the headers. This is a good indicator that AccuSpam is going to mistakenly delete an important email just because it has the word "penis" or "drug" in it. Using bayesian to group the probable spam in the daily summaries would make AccuSpam less efficient and less private (although no human is involved). There are easier and better ways for us to do this grouping, and we will be implementing this in coming weeks. But this is grouping only, not the algorithm we use to detect and delete spam. One way to do grouping better than bayesian is to leverage the best real-time blacklists. Again we would not use these to delete email (as blacklists are known to cause false positives), only to try to group the spam from the non-spam in the daily summaries.
Here is what the daily summary says, which should give you some insight what we are writing about here and also explain how the statistics works. Realize if you read all of this below, that you won't be seeing these daily summaries often, once there are many users:
Daily Summary Of Possible Spam
READ CAREFULLY PLEASE!
Please click Reply and send this entire email back,
and type an "X" in the [] boxes for only the
emails below, which you wish to be delivered to
your Inbox.
When you reply, emails without an "X" or "D",
are PERMANENTLY DELETED and can not be recovered.
Occasionally NON-SPAM EMAILS WILL APPEAR BELOW
from senders who never emailed you before, so
make sure you scan all email subjects below
before replying.
Since you joined AccuSpam:
?? spams deleted (most automatically)
?? emails delivered (most automatically)
??% of your email has been spam
100% of this spam has been blocked from your Inbox
When you reply with an "X" in the [] box, then you
will never see this message again for that sender, the
sender is added to your Approved Senders list, and all
future emails from that sender will be automatically
delivered to your Inbox.
To deliver an email below, but NOT add the sender to your
Approved Senders list and NOT auto-deliver all future
emails from the sender, then type a "D" instead
of an "X" in the [] box.
(Then a list of potential spams which were not detected via deliverable and reputation statistics)
When you reply, you are helping AccuSpam statistically
detect spam. Your reply is deleting your spam and the
spam of the other AccuSpam users. Also the replies of
other AccuSpam users is deleting your spam before this
message is sent to you, thus reducing the number of spam
subjects you must review in this message. As the number
of AccuSpam users grow, the frequency of these messages
and the number of spam subjects in them will reduce
eventually to almost never. Thus you are required to reply.
Statistically even erroneous or malicious replies of other
users can never delete your non-spam.
To illustrate the rationale for replying, assume the number
of new deliverable, unforged spam senders per day to be
10,000. Thus with 10,000 AccuSpam users, each user will
only have to review 10 or less spam subjects per day. That
takes into account an approximate factor of 10 for
statistical safety. Then with 1 million AccuSpam users (i.e.
only 1/10th of 1% of all email users), each AccuSpam user
would only have to review 1 spam subject every 10 days.
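The arithmetic in that last paragraph of the summary can be checked with a few lines of Python. This is only a sketch of the stated example: the function name is made up for illustration, and it assumes the review work is spread evenly across users, with the "factor of 10 for statistical safety" meaning each new sender gets reviewed by 10 users.

```python
def subjects_per_user_per_day(new_spam_senders_per_day, users, safety_factor=10):
    """Average number of spam subjects each user reviews per day.

    Assumption (not from AccuSpam): every new spam sender is shown to
    `safety_factor` users, and the reviews are spread evenly over all users.
    """
    return new_spam_senders_per_day * safety_factor / users


# Figures from the daily summary text: 10,000 new spam senders per day.
print(subjects_per_user_per_day(10_000, 10_000))     # 10.0 per user per day
print(subjects_per_user_per_day(10_000, 1_000_000))  # 0.1 per day, i.e. 1 every 10 days
```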
accuspam
07-21-04, 09:25 PM
First a few corrections:
> I wrote:
> This is a good indicator that AccuSpam is going to
> mistakenly delete an important email just because
> it has the word "penis" or "drug" in it.
Typo. I meant "...NOT going to mistakenly...".
> Tony wrote:
> 2. ...Email viruses are MIME TYPES and these can be
> detected immediately.
Note AccuSpam does not need to look at Mime types,
because it is blocking 100% of spam (email viruses are
spam). The only ways you could receive a spam using
AccuSpam are explained here:
http://www.accuspam.com/faq.php#as_spam
> Tony wrote:
> 8. The short answer is that Accuspam is "self-learning"
> and "self-training"...
That could be a misleading statement. AccuSpam is not
training itself, but it may seem like it is because the
statistics employed have a way of reducing the effort
for users to near nothing, so it seems like it is automatic.
But in reality, the statistics are done by the users, it just
doesn't take much effort per user, because spam is sent
in such large quantities (this all assuming we will have
many AccuSpam users to spread the work on to).
And note that the paid version can leverage the free users
so a paid user never has to answer the daily summary:
http://www.accuspam.com/faq.php#as_paid
I will write more about the statistics in next post.
Kind Regards,
Shelby Moore III
CEO 3Dize, Inc. (coolpage.com)
CEO DownloadFAST.com, Inc.
founder and main programmer of AccuSpam.com (AntiViotic.com)
main programmer of Cool Page* (1998-), Art-O-Matic* (1996-8), WordUp* (1986-90), TurboJet (1988)
contributing programmer to DownloadFAST.com* (2001-2), Corel Painter* (1993-5), Corel ArtDabbler, EOS PhotoModeler (1996), FONTZ! (1988)
shelby@coolpage.com
* denotes major involvement in massive multi-year R&D projects with millions of characters (1000s of pages) of code
accuspam
07-21-04, 09:42 PM
> 2.) True - No spam filtering method can, however.
> Only in conjunction with an antivirus can this be done.
Any anti-spam which blocks 100% can also block 100% email viruses.
Also remember 100% blocking is not enough to compare anti-spam systems. I laugh when I see the ad for the C/R system (mailmoper.com I believe) which is offering to pay $1 for every spam you receive, but they would never dare pay you $1 for each non-spam you will not receive!
So when comparing anti-spam, don't forget to compare the false positive rate also. AccuSpam is 0% false positive! It is the only one in the industry!
> 3.) Depends on configuration. Can be configured
> to throw out mail with low word counts.
Some non-spam would be thrown out too.
That nasty little false positive rate issue is the Achilles heel of all other anti-spam (except BrightMail.com, which we feel is our best competitor, but they don't do 100% blocking).
> 4.) Most Bayesian filters can be configured to scan
> sections of emails so that they can detect the broken
> URLs, and read HTML/CSS (the font/image tricks) as
> the messages come in. Granted there are some tricks
> that can't be detected, but I'd love to see your
> algorithm to see how you get by it.
One of the tricks that can not be detected by bayesian is to put all the letters of a word in a table, so that they are just letters to bayesian but visually they lay out to be a word.
And these tricks are always increasing in variety. Content filters are just "hacks" or "heuristics" in my opinion. BrightMail uses spam problems in conjunction with humans to constantly adjust these heuristics, so they are in effect using the economy-of-scale of the fact that spam is sent in large quantities. But bayesian in itself does not gain much from the fact that spam is sent in large quantities. More on that below...
See below for explanation of why AccuSpam is not subverted by these tricks.
> 5.) How does yours avoid that?
Simple. AccuSpam never looks at content. So it can not be tricked by content :)
> 6.) See 5
Ditto.
>7.) Yes it does. As does anything requiring statistical analysis.
>8.) Which is constant with you flagging messages as junk or not junk in the case of errors.
Correct, except the difference is that with AccuSpam, only a few users have to flag a message as spam and then all other users do not have to. Where "few" is defined by the statistical "significance" or "accuracy" we desire (sigma). As the number of AccuSpam users increases, then when the user is asked for this input, they will only be looking at a very few spams to classify and very infrequently. Unlike a Bayesian system where you are constantly having to look at ALL the spam in your junk folder for non-spam.
When we reach 1 million users for AccuSpam (e.g. 1/300 of # of BrightMail users), then we expect that each user will only be asked about a few spams maybe once a month.
> I'm not doubting your software, I'm just
> wondering what the algorithm is you're
> using to bypass the Bayesian failures.
Simple. We do not look at content. We look at senders and senders' domains. We run real-time statistics from that. Also we detect undeliverables and forgeries using the auto-response (which deletes a large % of the spam without asking users...maybe that is what Tony meant by self-training, but these deletes are not input into the statistics portion), which is necessary to force the spammers to reveal their true addresses. See the "How It Works" section:
http://www.accuspam.com/accuspam.php#how
Bayesian makes a statistical assumption that the words in an email occur independently of one another, given whether the email is spam or non-spam. That is why they call it "naive" Bayesian. Without that assumption, the Bayesian stats can not be computed (realistically). This is fundamentally why Bayesian for anti-spam is error prone, because that mathematical assumption is not entirely true. See Paul Graham's web site for some discussion of this math, or better, Google "naive Bayesian spam".
Whereas AccuSpam is not measuring the statistics of content. It is measuring the statistics of the opinions of which senders and domains are spam as compared to all other senders and domains. Thus we can choose any statistical accuracy we want. If we set a sigma of say 10, then we get 99.9999+% accuracy (did not take the time to calculate that exactly...just for illustration of point).
There is no equivalent way to dial in Bayesian. One can trade false negative rate for false positive rate in any statistical method for anti-spam, but with Bayesian you can never get any where near 0% false positive rate, because the underlying math assumption is not that accurate.
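To put rough numbers on "sigma", here is a small Python sketch that converts a sigma threshold into a one-sided tail probability. It assumes the underlying statistic is normally distributed, which is purely an assumption for illustration; the posts do not say which distribution or test AccuSpam uses, and the function name is made up.

```python
import math


def one_sided_tail(sigma):
    """Probability that a standard normal variable exceeds `sigma`.

    Assumes a normal distribution; this is an illustration only,
    not AccuSpam's (undisclosed) calculation.
    """
    return 0.5 * math.erfc(sigma / math.sqrt(2))


for sigma in (3, 4.75, 6, 10):
    tail = one_sided_tail(sigma)
    print(f"sigma={sigma}: error rate ~ {tail:.3g}")
```

Under that assumption, roughly 4.75 sigma already corresponds to about 1 in a million, and 10 sigma is many orders of magnitude smaller than that, which is in line with the "99.9999+%" figure given for illustration.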
So the next line of thought is to compare AccuSpam to other (than bayesian) statistical anti-spam, especially those that use data from many users.
http://www.cloudmark.com/products/spamnet/features/
I think CloudMark uses Vipul's Razor as its statistical network:
http://razor.sourceforge.net/
This is confirmed in the FAQ:
http://razor.sourceforge.net/docs/doc.php?type=text&name=FAQ
Some clear differences from AccuSpam:
1. Email is delivered before the recipient can be asked if a message is spam (the recipient is not asked whether to deliver email from a non-approved sender). Thus the system is not 100% effective.
2. Blacklisting is by content signatures, not sender address. Thus false positives result.
3. The statistics can not be as accurate, because AccuSpam is sampling a huge quantity of domains as a baseline. Razor does not measure by sender (domain or address), so its statistics have to be baselined according to a recipient trust metric. A key fact of statistics is that sampling error decreases as sample size increases.
Note that the DCC anti-spam system does not use statistical methods (standard deviation, confidence intervals, etc) and it suffers from many problems such as #2 and either #1 or whitelist management problems of Challenge Response systems:
http://www.rhyolite.com/anti-spam/dcc/
-Shelby Moore
http://AccuSpam.com
accuspam
07-21-04, 09:49 PM
>> BrightMail uses spam problems in conjunction...
Typo. Meant "BrightMail uses millions of spam probes in conjunction..."
So, basically, you're stating that because you do not look at email content, you are immune from spammers.
So how do you block spoofed From addresses (to a valid domain), people who run off from free email sites (hotmail, yahoo, etc), people who buy domains just to run an SMTP server from; and still manage to detect the perhaps .001% of users who actually WANT to get mail from, say, xxxhotb4b3s.com?
Statistically, if you're basing your assumptions using NHST (Null Hypothesis Significance Testing), you HAVE TO HAVE some sort of error level. Either Type 1 or Type 2 (False positives [calling non-spam spam], or false negatives [calling spam non-spam]) errors. If your alpha level is, say, .00001%, and your null hypothesis is that "this mail is spam", then you have an INCREDIBLY SMALL chance to correct a mistake if your software calls a mail spam (say the return email hits their server during a major network outage).
How do you avoid situations like that?
accuspam
07-21-04, 09:57 PM
A common misconception is something like "but spammers will just change their address or domain".
Again I urge the reader to read more carefully the "How It Works" section and focus on the words "undeliverable" and "forged". By detecting "undeliverable" and "forged", it becomes prohibitively expensive for the spammer to change addresses and domains often enough to escape real-time statistical detection:
http://www.accuspam.com/accuspam.php#how
As a side benefit, corporations will be able to license AccuSpam to ensure that no spammer can forge their address when emailing to AccuSpam users. That option will be coming to our web site soon. There is nothing the users have to change. It is already built in when they sign up.
Any more questions? If yes, then send me an email to:
shelby@coolpage.com
to alert me that you made a post here for me to answer. That is how confident I am in AccuSpam. I am willing to post my personal email address in a public forum! Spammers PLEASE SEND ME SPAM! :-) ;-) :o
-Shelby Moore
http://AccuSpam.com
accuspam
07-21-04, 10:21 PM
> So how do you block spoofed From addresses (to a valid domain...
Spoofed address are either undeliverable or they are forged. We detect both.
> ...you have an INCREDIBLY SMALL chance to correct a mistake if
> your software calls a mail spam...
First, the detection of undeliverables is robust, because SMTP is robust. I am sure you have seen those emails "Warning message not delivered in 4 hours...will keep trying".
Second, the undeliverables do not feed into the statistics.
Third, whatever confidence level we select for our hypothesis is the error rate we will see. So yes, 0% would be more correctly stated as 0.0001%, or 1 in a million. Yes, AccuSpam could lose 1 in 1 million non-spam emails, which is one reason why we offer an "undo" in the paid version:
http://www.accuspam.com/faq.php#as_paid
But who really cares about 1 in 1 million? Compare that to Bayesian, Challenge Response, real-time blacklists, or heuristic filters, all which are more like 1 in 1000. That is the difference between losing 1 non-spam in 1000 days (for AccuSpam) and every day (for the other anti-spam).
As I said, I only know of BrightMail which can compare to our false positive rate.
As for the false negative error rate in our hypothesis, all undetected spam is presented in the "daily" (frequency will be more like monthly as number of users grows) summary, so never delivered to Inbox without permission of user.
The bottom line is that for the user the perception will be:
1. AccuSpam detects and deletes most (soon to be 99+%) of my spam automatically, and NEVER (100% blocking) is spam delivered to my Inbox.
2. Occasionally (perhaps monthly) I get an email from AccuSpam asking whether a few spams are spam or not. I send it back with my answers. Minimal and infrequent effort.
3. "Never" (1 in 1 million) do I lose non-spam or have to go browse a spam folder.
So in essence you get 100% protection, lose 0% non-spam (1 in 1 million), and do not have the daily effort or hassle of browsing spam (as the number of AccuSpam users increases).
-Shelby Moore
http://AccuSpam.com
*nods* Gotcha. Nice system. :)
accuspam
07-21-04, 10:31 PM
Correct another typo I made in previous post :(
"One of the biggest mistakes to make when comparing anti-spam systems is to focus only on the false **NEGATIVE** accuracy (i.e. the spam detection rate) and not also focus on the false **POSITIVE** accuracy (i.e. the non-spam misclassification)."
accuspam
07-22-04, 03:33 AM
As you know, we just released AccuSpam less than 24 hours ago and it usually takes at least 24 hours to discover any post-release bugs. That is normal and expected.
I just fixed two bugs, neither of which I expect to have lost any email. The worst case was some small corruptions in emails. Note it is fixed now, but those emails which arrived before now can not be fixed. So when you process your next daily summary you may get a few of these corruptions, but they are usually not worrisome (e.g. a "=" at the end of each line). Also some attachments could have been lost, but not entire emails.
By tomorrow this fix will have fully propagated and you should not see any more problems.
For the more inquisitive minds, the specific bugs were:
1. We were failing to propagate some of the critical MIME encoding headers. Changing to a case-insensitive search fixed this.
2. We were failing to propagate the "Date:" header, so all the dates were getting changed on delivered emails. This should be fixed as well.
3. An obscure case in the bounce detection logic was fixed/improved. The case where a sender replies to the AccuSpam confirmation, but changes the subject significantly (e.g. more than "Re: " or "Re[4]: ") but also returns the body (even if changed) thus it does not look like an auto-response. This obscure case almost never happens, but just to be safe we added a check whether the sender has changed, since most bounces use "postmaster@", "mailer-daemon@", etc.. I actually found this because I lost some important email when our host support department replied (not auto-response but human reply) but mangled the subject with a ticket #.
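For anyone curious what the header bugs in items 1 and 2 look like in practice, here is a minimal Python sketch of copying headers with a case-insensitive lookup. It is not AccuSpam's code. The list of header names is an assumption (the post only says "critical MIME encoding headers" and "Date:"), and `copy_headers` is a made-up helper. Header field names in email are case-insensitive, and Python's standard `email` package looks them up that way.

```python
from email import message_from_string

# Assumed list of headers that must survive re-delivery; the post names only
# "Date:" explicitly, the MIME ones are a guess at what "critical" means.
HEADERS_TO_COPY = ("Date", "MIME-Version", "Content-Type", "Content-Transfer-Encoding")


def copy_headers(original, outgoing):
    """Copy selected headers from `original` to `outgoing`, ignoring case."""
    for name in HEADERS_TO_COPY:
        value = original.get(name)  # Message.get() matches names case-insensitively
        if value is not None:
            del outgoing[name]      # deleting a missing header raises no error
            outgoing[name] = value
    return outgoing


# A sender that writes its header names in lowercase.
raw = (
    "date: Thu, 22 Jul 2004 03:33:00 +0000\n"
    "content-transfer-encoding: quoted-printable\n"
    "\n"
    "body=\n"
)
original = message_from_string(raw)
outgoing = message_from_string("Subject: delivered\n\n")
copy_headers(original, outgoing)
print(outgoing["Date"])
print(outgoing["Content-Transfer-Encoding"])
```

A case-sensitive search for "Content-Transfer-Encoding" would miss the lowercase header above, and a quoted-printable body delivered without that header shows stray "=" characters at line ends, which matches the corruption described.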
-Shelby Moore
http://AccuSpam.com
accuspam
07-22-04, 10:54 PM
Major improvement coming!
Note these algorithms are AccuSpam's inventions and mentioning them here in no way gives rights to others to use these algorithms without a license from AccuSpam.
From the initial usership, we see that much of the spam in the Daily Summaries is coming from the same domains (as expected) but different senders (part before the @ changes) which is also as expected.
What we just realized is that we don't need to have the user manually blacklist those domains (as we had planned), and we do not have to wait for many, many users of AccuSpam to do the global domain blacklisting (that was what we planned).
We can simply apply our domain blocking statistics per user. So as a user builds up data about domains, that user's domains which are always sending spam will get statistically blocked (with a confidence that ensures 1 in a million false positives).
This should drastically reduce the # of spam subjects in the Daily Summaries.
We hope to implement it next week.
We ALSO will still retain the (as originally planned) global statistical blocking of domains and senders, but this won't be able to kick in until we have many 1000s of users.
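As a rough picture of what per-user domain blocking could look like, here is a hypothetical Python sketch. It is not the AccuSpam algorithm, which is not disclosed in this thread. The class name, the `min_spam` threshold and the "zero approvals" rule are all assumptions standing in for the real confidence calculation.

```python
from collections import defaultdict


class DomainTally:
    """Per-user tally of Daily Summary decisions, grouped by sender domain."""

    def __init__(self, min_spam=20):
        # Hypothetical threshold standing in for a real confidence test.
        self.min_spam = min_spam
        self.counts = defaultdict(lambda: [0, 0])  # domain -> [spam, approved]

    @staticmethod
    def domain_of(sender):
        # The part after the last "@", lowercased.
        return sender.rsplit("@", 1)[-1].lower()

    def record(self, sender, approved):
        """Record one decision: approved (X or D) or left as spam."""
        self.counts[self.domain_of(sender)][1 if approved else 0] += 1

    def is_blocked(self, sender):
        """Block only domains with enough spam and no approved mail at all."""
        spam, approved = self.counts.get(self.domain_of(sender), (0, 0))
        return approved == 0 and spam >= self.min_spam


tally = DomainTally(min_spam=3)
for sender in ("a1@spam.example", "b2@spam.example", "c3@spam.example"):
    tally.record(sender, approved=False)
tally.record("friend@example.org", approved=True)

print(tally.is_blocked("new-sender@spam.example"))  # True
print(tally.is_blocked("friend@example.org"))       # False
```

The point of the sketch is only that the sender part before the @ can change freely without escaping the tally, since the count is kept per domain.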
-Shelby Moore
http://AccuSpam.com
I feel like a dunce. Even after reading all of this, I still don't understand how it all works. As it is, I am preparing a huge export of my address book and typing in every possible e-mail address I might need to let through
accuspam
07-23-04, 09:43 PM
Why do you feel like you need to know how it works?
Are you experiencing some issues with using it?
It is not absolutely necessary to add your Address Book to the Approved Senders list (although more efficient), as these will get added when you scan your Daily Summaries for legit email and put an "x" in the [] boxes for those. Even if you add your Address Book, you still need to scan the Daily Summaries because you might get legit email from new senders that are not in your Address Book yet.
The main issue we are working on right now, is that AccuSpam users are seeing too many spam subjects in their Daily Summaries. The spam is being blocked, but it is too much to wade through every day to find the legit email. On average, 60% of spam is being automatically deleted and the other 40% is showing in the Daily Summaries.
The reason is because there are not enough AccuSpam users yet for the global statistical methods to kick in. When they do, then we predict AccuSpam users will see spam subjects in their Daily Summaries very rarely.
While we are waiting for the # of AccuSpam users to increase, we need to provide better than 60% performance in terms of the Daily Summaries. Realize even now, 100% of spam is blocked from the Inbox. I am just referring to the % of spam subjects in the Daily Summaries.
Within a few days or less, we will implement some extra filters to deal with the 40%:
1. We will implement the very accurate (1 in 1 million false positive) PER USER domain blocking as per previous post I made in this thread yesterday. This may delete a significant portion of the 40%. Maybe 39% of it (did not yet run the data to see how much would be caught)
2. We noticed spammers are spoofing the AccuSpam confirmation messages and these are showing up as spam in the Daily Summaries. We will detect those automatically (may have that done today).
3. We may add a Bayesian filter or other SpamAssassin filter to RANK the subjects in the Daily Summaries, but never delete from the Daily Summary using those inexact types of filters.
Be patient as you use AccuSpam, knowing that we are continually observing the results and finding ways to improve it.
Also, please give feedback here or via email to <support@accuspam.com> about any issues you want to bring to our attention, so that we know about things that need to be fixed or improved.
-Shelby Moore
http://AccuSpam.com
accuspam
07-23-04, 09:54 PM
Add to my previous post (on page 2 of this thread), that all the effort AccuSpam users are expending now to scan their Daily Summaries is being recorded and is not wasted effort.
When we flip a switch to turn on the PER USER domain blocking, all that effort will be rewarded by an instant reduction of the # of spam subjects seen in the Daily Summaries.
Keep using AccuSpam and keep scanning your Daily Summaries. You will be rewarded very soon. You are already being rewarded in terms of 60% removal of spam subjects from Daily Summaries and 100% blocking from Inbox.
It will only get better very soon.
-Shelby Moore
http://AccuSpam.com
accuspam
07-23-04, 10:43 PM
For the potential users of anti-spam, perception is apparently all that matters in marketing.
Let me tell a short story about something unrelated to anti-spam and unrelated to AccuSpam in order to make my point. I relate this to the security of passwords. It is a well known fact among us computer scientists that using a password that is composed of recognizable words is very risky, because all a hacker has to do is run through all combinations of a dictionary of known words, proper names, and word fragments. This is much faster than running through all combinations of all digits. For example, if we have a dictionary of 10,000 words and we try all 1 and 2 word combinations as passwords, then we get (10,000 + 10,000 x 10,000) = about 100 million combinations to try. A computer which can try a million combinations a second will take only about 100 seconds to crack any possible password made of 1 or 2 words. Compare that to passwords which don't use words but use random characters, e.g. "j24vshvg5s7g4b2G". Given 26 letters in the lowercase alphabet, 26 in uppercase, and 10 numeric digits, all combinations for a 16 digit password would be: (26+26+10)^26 = 47672401706823533450263330816 combinations. The same computer would take 47672401706823533450263 seconds to try all combinations, which happens to be 1511681941489838 years!!!!!
So which password do you think is more secure, the one that takes 100 seconds or the one that takes 1511681941489838 years to crack?
However, users still prefer to use passwords that contain recognizable words, because they can remember them more easily. Alas, the user has not been hacked yet, so they are under a false sense of security.
Okay so now let me relate this to anti-spam and AccuSpam.
The current state of anti-spam is that many (most?) users are currently using an anti-spam system based on Bayesian and/or heuristic rules (guesses). For example, SpamAssassin is a very popular product installed by many ISPs. Many users are quite satisfied with their current results, just as they are satisfied with their current recognizable word passwords. Just as we were all satisfied with dates in all our programs that wrapped back to 0 after the year 2000. Remember the massive effort it took to fix that before year 2000?
The problem is that all a spammer has to do is change a few things and these Bayesian and heuristic filters can go awry. For a particular user, maybe their pattern of use is such that they haven't noticed a problem yet, but the fundamental problem is lurking and will happen eventually. For example, I know a "security expert" sysadmin who swears SpamAssassin has near 0% false positive rate for him (even though the published stat is 0.5% for SpamAssassin as used in McAfee SpamKiller, which is a horrible 1 in 200 non-spam emails lost), and it could be that his use of email terminology in his non-spam is very different from the spam he is currently receiving. Also he has the knowledge and time to tweak SpamAssassin to his personal use. But once a spammer sends him email with "sysadmin" words, e.g. "server", "downtime", "pager", "linux", etc., then either his non-spam will get flagged as spam or his spam will not get caught.
Thus products like SpamAssassin will constantly require tweaking of the heuristic rules (guesses) to keep up with the changes in the "quirks" of spam that the rules detect. I bet the anti-spam companies such as Norton and McAfee would love for users to get locked into monthly updates of rules. How convenient as a way to charge for upgrades!
Whereas, AccuSpam is based on the principle of deterministic statistics. We want to solve the problem once and for all. We target the unmorphable aspect of spam, i.e. that spam is email sent in large quantities and undesired by the majority of recipients. That is the most exact and agreeable definition of spam I know of. Whereas, those Bayesian and heuristic anti-spam products are defining spam to be "bad content" or "bad headers" or "bad relay server" or a zillion other things which have some correlation to but are not what spam is.
So don't be surprised if you use recognizable word passwords and get hacked one day, and by the same line of logic, if you use a Bayesian or heuristic anti-spam (not AccuSpam), then don't be surprised if you lose important email or get swamped in spam or email viruses one day. We can apply the same logic to anti-virus software, which is also heuristic or based on previously seen viruses, not on future unknown viruses.
AccuSpam is deterministic. That is the bottom line.
For now, we have a little bit of a marketing problem because all the user cares about is the performance on the first day they use a product, not the ongoing performance. But we can say right now that AccuSpam prevents 100% of spam from reaching the Inbox. That is sure. And we can say, if you scan your Daily Summaries then you will never lose important email. And we can say that the Daily Summaries will contain less and less spam subjects as we progress...
-Shelby Moore
http://AccuSpam.com
accuspam
07-23-04, 10:58 PM
There I go again. In my haste, I made another typo.
Correction:
16 digit password would be: (26+26+10)^16
Note the "^16" instead of the "^26".
Also let me explain that "^16" is shorthand for "raised to the power of 16".
Thus the long way of writing it is:
(26+26+10) X (26+26+10) X (26+26+10) X (26+26+10) X (26+26+10) X (26+26+10) X (26+26+10) X (26+26+10) X (26+26+10) X (26+26+10) X (26+26+10) X (26+26+10) X (26+26+10) X (26+26+10) X (26+26+10) X (26+26+10) = 47672401706823533450263330816
The reason is because you have 16 digits (places to put the 26+26+10 possibilities).
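For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two search-space calculations, using the corrected exponent of 16. The constant and function names are made up for this illustration, and the 365-day year is an assumption.

```python
# Search-space sizes from the password example.
DICTIONARY_WORDS = 10_000          # dictionary size used in the example
GUESSES_PER_SECOND = 1_000_000     # attacker speed used in the example
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: 365-day year


def word_password_space(words: int) -> int:
    """Number of passwords made of 1 or 2 dictionary words."""
    return words + words * words


def random_password_space(alphabet: int, length: int) -> int:
    """Number of passwords of `length` places drawn from `alphabet` symbols."""
    return alphabet ** length


word_space = word_password_space(DICTIONARY_WORDS)
random_space = random_password_space(26 + 26 + 10, 16)

print(word_space)                        # 100010000
print(word_space // GUESSES_PER_SECOND)  # 100 (seconds)
print(random_space)
print(random_space // GUESSES_PER_SECOND // SECONDS_PER_YEAR, "years")
```

Python integers have arbitrary precision, so the 29-digit result is computed exactly rather than rounded.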
Also remember this password example has NOTHING to do with how AccuSpam works. It was merely an unrelated example to point out that users do not care about security. All they care is that passwords are easy to remember.
-Shelby Moore
http://AccuSpam.com
Bouncer
07-23-04, 11:21 PM
> Why do you feel like you need to know how it works?
> -Shelby Moore
Shelby, this is a (somewhat) technical site. People are curious about inner workings or they wouldn't be here. They also feel more confident about a system whether it's QPSK based signalling oriented, or software oriented when they have what they consider to be a reasonable understanding of the material.
Regards,
-Bouncer-
Bouncer
07-23-04, 11:35 PM
I'm going to ask a simple question.
Is Accuspam based (wholly or partially) on probability vectors as related to header info?
Regards,
-Bouncer-
accuspam
07-23-04, 11:59 PM
Good news! I just ran some queries on AccuSpam data, and very soon AccuSpam users will see their Daily Summaries drop to near 0!
It is very simple. The vast majority of unspoofed spam is coming from same domains over and over. Remember AccuSpam deletes the spoofed spam so 60% is already gone. The other 40% has been going into the Daily Summaries but none of it (100% protection) has been going into the Inbox.
Now we will be able to delete the 40% from the Daily Summaries so it will be easier to find any legit email from new senders (not have to scan so many subjects).
So for the new AccuSpam user, they will only have to disapprove (return the Daily Summary) for spam senders a few times, then AccuSpam will statistically recognize that a domain is a spam domain FOR THAT USER ONLY.
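As a rough illustration of the PER USER domain blocking idea, here is a minimal Python sketch. It is not AccuSpam's actual code: the function name, the (sender, approved) input format, and the `min_disapprovals` parameter are assumptions made for this example. The default of 4 mirrors the "disapproved more than 3 times" cutoff used in the query in this post, and the example addresses are hypothetical.

```python
from collections import Counter


def spam_domains_for_user(feedback, min_disapprovals=4):
    """Return the domains ONE user has repeatedly disapproved and never approved.

    `feedback` is an iterable of (sender_address, approved) pairs taken from
    that user's Daily Summary replies (hypothetical format).
    """
    disapproved = Counter()
    approved = set()
    for sender, is_approved in feedback:
        # Everything after the last "@" is treated as the sender's domain.
        domain = sender.rsplit("@", 1)[-1].lower()
        if is_approved:
            approved.add(domain)
        else:
            disapproved[domain] += 1
    return {
        domain
        for domain, count in disapproved.items()
        if count >= min_disapprovals and domain not in approved
    }


history = (
    [("offer%d@bulk.example" % i, False) for i in range(5)]
    + [("news@shop.example", False)] * 4
    + [("billing@shop.example", True)]
)
print(spam_domains_for_user(history))  # {'bulk.example'}
```

A domain with even one approved sender is never blocked in this sketch, which is why shop.example survives despite four disapprovals.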
The ability for users to help other users delete spam will not kick in statistically until there are closer to 100,000+ AccuSpam users, but it may not even be necessary. Apparently the improvement we will make this week as described above, will be enough to clean up the Daily Summaries very effectively.
For the more technically inquisitive, here is an excerpt of the query I did. With Tony's permission, user #20 is TonyT (you all know him here as Senior member of SpeedGuides). Tony currently has 98 emails waiting to be summarized in his next Daily Summary, from 64 unique senders. And from the 2nd query, we see that all 98 are from domains which were disapproved more than 3 times already by Tony (and probably not approved, although I did not do that query yet, but it will be part of the statistical calculation).
mysql> SELECT UserId, COUNT( DISTINCT ConfirmId), COUNT( DISTINCT SenderId ) FROM confirm WHERE Status='Confirm'
-> GROUP BY UserId ORDER BY UserId;
+--------+----------------------------+----------------------------+
| UserId | COUNT( DISTINCT ConfirmId) | COUNT( DISTINCT SenderId ) |
+--------+----------------------------+----------------------------+
...
| 20 | 98 | 64 |
...
+--------+----------------------------+----------------------------+
1700 rows in set (0.07 sec)
mysql> SELECT s.Tld, c.UserId, COUNT( DISTINCT c.ConfirmId ), COUNT( DISTINCT s.SenderId ) FROM confirm AS c
-> LEFT JOIN disapproved AS d USING( UserId )
-> LEFT JOIN sender AS s USING( SenderId )
-> WHERE Status='Confirm'
-> GROUP BY s.Tld, c.UserId HAVING COUNT( DISTINCT s.SenderId ) > 3;
+-------------------------+--------+-------------------------------+------------------------------+
| Tld | UserId | COUNT( DISTINCT c.ConfirmId ) | COUNT( DISTINCT s.SenderId ) |
+-------------------------+--------+-------------------------------+------------------------------+
| 3kinmy.com | 20 | 98 | 4 |
| 8it0cop.com | 20 | 98 | 5 |
| aa05.com | 20 | 98 | 4 |
| aagregory.com | 20 | 98 | 7 |
| absoff10.com | 20 | 98 | 5 |
| adknowledge2.com | 20 | 98 | 7 |
| adknowledgemail.com | 20 | 98 | 4 |
| blmngp.com | 20 | 98 | 4 |
| boolever.com | 20 | 98 | 4 |
| centerunionac.com | 20 | 98 | 5 |
| centerunionaj.com | 20 | 98 | 5 |
| choosemymail.com | 20 | 98 | 4 |
| classoffers.com | 20 | 98 | 6 |
| crysholef.com | 20 | 98 | 4 |
| crysholgh.com | 20 | 98 | 4 |
| estrategics.com | 20 | 98 | 4 |
| estratfg.com | 20 | 98 | 5 |
| golunga.com | 20 | 98 | 11 |
| great-dealz.net | 20 | 98 | 4 |
| greatestdealsaround.com | 20 | 98 | 4 |
| hh02.com | 20 | 98 | 4 |
| hotmail.com | 20 | 98 | 4 |
| ofcsvr.com | 20 | 98 | 4 |
| ofmsvr.com | 20 | 98 | 11 |
| pactkl.com | 20 | 98 | 4 |
| pclght11.com | 20 | 98 | 4 |
| pfiftyonemustang.com | 20 | 98 | 4 |
| rashpie.com | 20 | 98 | 12 |
| smilepop.com | 20 | 98 | 4 |
| squibkl.com | 20 | 98 | 4 |
| squibnetworks.com | 20 | 98 | 4 |
| trlcnt12.com | 20 | 98 | 4 |
| uu02.com | 20 | 98 | 4 |
| velvettooth.net | 20 | 98 | 4 |
| vmadmin.com | 20 | 98 | 11 |
| winsomab.com | 20 | 98 | 5 |
| wzone03.com | 20 | 98 | 5 |
+-------------------------+--------+-------------------------------+------------------------------+
accuspam
07-24-04, 12:12 AM
> Is Accuspam based (wholly or partially) on
> probability vectors as related to header info?
Only the From: header, and the probabilities are determined by the statistically significant opinions of AccuSpam users as to which unspoofed senders' addresses and domains are sending ONLY spam, where ONLY is statistically significant to the accuracy we have chosen. And this is calculated in a way that no single user can influence the statistics (maliciously or erroneously).
For the rare (let's use 1 in 1 million) falsely flagged non-spam, the senders whose email is deleted by AccuSpam (unlike those blocked into the Daily Summaries, who get confirmations that do not require any additional action from the sender) get a challenge response which allows them to escape the blacklist into the Daily Summary and then possibly become an Approved Sender.
Thus the statistics are self-correcting, even in the worst possible rare imaginable scenario.
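To make the "no single user can influence the statistics" point concrete, here is a minimal Python sketch of one way such a count could be kept. This is an illustration only, not the method in AccuSpam's patent application: the function name, the report format, and the `min_users` threshold are all assumptions.

```python
def globally_blocked_domains(reports, min_users=100):
    """Return domains disapproved by many DISTINCT users and approved by none.

    `reports` is an iterable of (user_id, domain, approved) tuples
    (hypothetical format). Each user counts once per domain, so one account
    repeating the same vote cannot move the result.
    """
    disapprovers = {}
    approved = set()
    for user_id, domain, is_approved in reports:
        if is_approved:
            approved.add(domain)
        else:
            disapprovers.setdefault(domain, set()).add(user_id)
    return {
        domain
        for domain, users in disapprovers.items()
        if len(users) >= min_users and domain not in approved
    }


# One user voting 1000 times is still a single vote.
one_loud_user = [(1, "victim.example", False)] * 1000
print(globally_blocked_domains(one_loud_user))  # set()
```

Storing a set of user IDs per domain, rather than a running total, is what keeps repeated votes from a single account harmless in this sketch.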
-Shelby Moore
http://AccuSpam.com
> I feel like a dunce. Even after reading all of this, I still don't understand how it all works. As it is, I am preparing a huge export of my address book and typing in every possible e-mail address I might need to let through
Burke:
This is how it works: (simplified)
1. You login to accuspam using your email info:
username:
mail server:
password:
email address:
2. Once logged in you enable or disable accuspam by the checkbox.
3. You then use your email client same as you always did.
4. When you check for new messages, accuspam will have removed the spam BEFORE it is downloaded by your mail client.
5. You will receive Daily Summaries from accuspam which contain messages that have been flagged as spam. This summary contains WHO sent the message and the SUBJECT of the message. Next to each message on this list in the Summary are these brackets: [ ]. If you wish to receive any of these flagged messages, you put the appropriate character in between the brackets and simply Reply to the Daily Summary message (send it back to accuspam). If you do not place a character in between the brackets, the message gets deleted and accuspam "learns" based on user input. As the number of users grows, the system gets "smarter" and the statistical spam blocking improves.
When someone not in your Approved Senders list sends you a message, he receives an auto response telling him that his message was received. He does NOT have to reply to this auto response. (no hassle) But this auto response has an option for him to reply, and if he does, it forces faster delivery of his message. (much like using your mail client to mark a message as 'Priority')
The Daily Summary can contain messages that you DO want to receive. In this case, you put an X in between the brackets and send the message back to accuspam. That sender will be automatically added to your Approved Senders list. By using this method (instead of adding your entire address book right away), accuspam will 'learn' faster about spam and all of your wanted messages will still arrive in your mail client's inbox. Someone whom you desire to receive messages from will only ever have ONE auto response sent to him, unless you fail to approve him via the Daily Summary. (Adding his address to the Approved Senders list also stops the auto response sent to him.)
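The bracket-marking step can be sketched in Python as follows. The exact layout of a Daily Summary line is not shown in this thread, so the "[ ] sender | subject" format, the function name, and the sample addresses here are all assumptions for illustration only.

```python
import re

# Assumed line layout: "[x] sender@example.com | Subject text".
# Leading ">" characters are allowed because mail clients usually
# quote the original message in a reply.
BOX_LINE = re.compile(
    r"^\s*(?:>\s*)*\[(?P<mark>[^\]]*)\]\s*(?P<sender>\S+)\s*\|\s*(?P<subject>.*)$"
)


def parse_summary_reply(reply_text):
    """Split a replied-to summary into (approved, rejected) entries.

    Any non-blank character between the brackets counts as an approval.
    """
    approved, rejected = [], []
    for line in reply_text.splitlines():
        match = BOX_LINE.match(line)
        if not match:
            continue  # not a summary line, e.g. a greeting or signature
        entry = (match.group("sender"), match.group("subject").strip())
        if match.group("mark").strip():
            approved.append(entry)
        else:
            rejected.append(entry)
    return approved, rejected


reply = """Thanks!
> [x] alice@example.com | Lunch on Friday?
> [ ] promo@example.net | Cheap meds
"""
approved, rejected = parse_summary_reply(reply)
print(approved)  # [('alice@example.com', 'Lunch on Friday?')]
print(rejected)  # [('promo@example.net', 'Cheap meds')]
```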
accuspam
07-24-04, 07:30 PM
TonyT, thanks for the excellent summary of how to use AccuSpam in previous post.
For those who want more details on why "naive" Bayesian is mathematically flawed, I dug up this old email I wrote. Note there may be some errors in here, because I did not take the time today to study what I had written then and see if I had corrected myself later. Anyway, this will at least stimulate the thought process about why Bayesian can fall apart if the "probabilities of features (i.e. words) are independent" assumption is not true, i.e. if spam and non-spam start sharing the same word sets.
I also publicly posted a portion of this email at the bottom of this public post:
http://forum.icann.org/lists/stld-rfp-mail/msg00061.html
Subject: ELABORATION on Probability Theory of: Multiplicative principle for anti-spam
Cc:
1. The math assumption, "P(a | b) = P(a) * P(b)", in the forwarded email below is derived as follows, where "|" is the intersection of two independent events. If we are not sure they are independent and we assume they are, then this is called "naive":
P(a | b) = P(b ! a) * P(a), where "!" is conditional probability, i.e. "if"
P(a | b) = P(b) * P(a), because P(b ! a) = P(b) if a and b are independent events.
2. Incidentally, P(a & b) = P(a) + P(b) - P(a | b), where "&" is the union of two events. The derivation is:
P(a & ~a) = 1 = P(a) + P(~a), where "~" is complement
P(a & b) = P(a) + P(b | ~a)
P(b) = P(a | b) + P(b | ~a)
P(b | ~a) = P(b) - P(a | b)
3. So the probability of spam being caught by the intersection and union of two filters (events), is P(a | b) and P(a & b), as shown above. However, if a spam is caught by the intersection of two filters, then the probability (confidence) that the caught spam is really spam is:
P(a @ b) = P(a) * P(b) / [P(a) * P(b) + (1 - P(a)) * (1 - P(b))]
when the "a priori" probability of any email being spam is 0.5 and we assume that P(a @ b) = P(~a @ ~b), i.e. that the probability that caught spam is really spam is equal to the probability that not caught spam is really not spam.
This is intuitively correct because P(0.5 @ 0.5) = 0.5. That is to say, the intersection of two filters which each catch, say, 50% of spam catches less (25%) spam, but the probability that the caught spam is really spam does not change, because the filters had an equal probability of catching spam and not catching spam.
The derivation is:
http://www.mathpages.com/home/kmath267.htm
Thus:
P(0.95 @ 0.95) = 0.95 * 0.95 / (0.95 * 0.95 + 0.05 * 0.05) = 0.997 = 99.7%
Thus although caught spam decreases by using intersection of two filters, the probability that the caught spam is really spam increases.
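The "@" combination in #3 is easy to check numerically. Here is a minimal Python sketch of that formula; the function name is made up for this illustration, and it carries the same assumptions as the text (independent filters and a 0.5 prior).

```python
def combined_confidence(p_a: float, p_b: float) -> float:
    """P(a @ b) = P(a)*P(b) / [P(a)*P(b) + (1 - P(a))*(1 - P(b))].

    Confidence that a message flagged by BOTH filters is really spam,
    assuming independent filters and a prior spam probability of 0.5.
    """
    both_right = p_a * p_b
    both_wrong = (1 - p_a) * (1 - p_b)
    return both_right / (both_right + both_wrong)


print(combined_confidence(0.5, 0.5))               # 0.5
print(round(combined_confidence(0.95, 0.95), 3))   # 0.997
```

Two coin-flip filters stay at 0.5, while two 95% filters that agree push the confidence to about 99.7%, matching the worked numbers above.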
However, note that if we have a measured false positive rate of 0.01%, then the predicted 0.3% (100% - 99.7%) is incorrect, so it means either or both of the assumptions in #3 are not true. This is very interesting, because Paul Graham's "Naive Bayesian" makes these assumptions in 2 different equations:
i) http://www.paulgraham.com/naivebayes.html
Same eq as #3
ii) http://www.paulgraham.com/spam.html
Here Paul Graham assumes:
P(a ! b) = P(b ! a) / [P(b ! a) + P(b ! ~a)], thus assuming P(a) = P(~a)
Because the Bayesian equation is:
P(a ! b) = P(b ! a) * P(a) / [P(b ! a) * P(a) + P(b ! ~a) * P(~a)]
Paul admits these assumptions:
http://www.paulgraham.com/better.html
"Probabilities in this algorithm are calculated using a degenerate case of Bayes' Rule. There are two simplifying assumptions: that the probabilities of features (i.e. words) are independent, and that we know nothing about the prior probability of an email being spam."
>Date: Thu, 15 Apr 2004 00:12:22 +0800
>From: Shelby Moore <shelby@coolpage.com>
>Subject: Multiplicative principle for anti-spam
>
>An idea to contemplate is that if you take two spam filters that are 95%
>effective, and have 1% false positive rate, then if you only delete the spam
>which is caught by *BOTH* filters, then the effectiveness is 0.95 * 0.95 =
>90.25% and the false positive rate is 0.01 * 0.01 = 0.01%. So 90 out of 100
>spams are caught and only 1 in 10,000 legit emails are caught.
>
>So one idea for anti-spam is to apply multiple highly effective filters to
>reduce the false positive rate.
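The multiplicative principle in the quoted email can be sketched in a few lines of Python. The function name is hypothetical, and the result only holds under the email's implicit assumption that the two filters make their mistakes independently of each other.

```python
def delete_only_if_both_agree(catch_a, catch_b, false_pos_a, false_pos_b):
    """Rates for a policy that deletes only what BOTH filters flag.

    Assumes the two filters err independently, so the rates multiply.
    Returns (combined catch rate, combined false positive rate).
    """
    return catch_a * catch_b, false_pos_a * false_pos_b


catch, false_pos = delete_only_if_both_agree(0.95, 0.95, 0.01, 0.01)
print(f"caught: {catch:.2%}, false positives: {false_pos:.2%}")
# caught: 90.25%, false positives: 0.01%
```

If the filters share the same rules or training data, their errors are correlated and the real false positive rate will be higher than the product suggests.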
-Shelby Moore
http://AccuSpam.com
accuspam
07-24-04, 09:28 PM
Correction to previous post:
> Good news! I just ran some queries on AccuSpam data, and
> very soon AccuSpam users will see their Daily Summaries
> drop to near 0!
There was a mistake in the query I used. It now looks like perhaps 25% or more of the spam subjects in the Daily Summaries can be eliminated with PER USER domain statistical blocking.
It could be greater, possibly even much greater, as data builds.
We will know more when we implement that next week.
Also we still have the GLOBAL domain and sender statistical blocking to look forward to as the # of AccuSpam users increase.
Once again, I want to reiterate that 100% of spam is blocked from the Inbox (paid version, or 99% for free version). We are currently seeing around 60% deleted immediately, then about 40% summarized in the Daily Summary email. We think this 40% can be reduced by at least 25% with simple PER USER domain stats blocking, to be implemented next week. So then you would have 70% deleted immediately and 30% summarized. However it could be better than that. We will know next week. And on the horizon, as the # of AccuSpam users increases, we will see that move towards something like 99.9% deleted immediately and only 0.1% summarized, due to the GLOBAL (users helping other users) statistical blocking.
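The 70% / 30% figures follow from applying the 25% reduction to the summarized 40%, not to all spam. A minimal Python sketch of that arithmetic, with a made-up function name:

```python
def split_after_reduction(summarized_share: float, reduction: float):
    """Return (deleted_share, summarized_share) after extra blocking.

    `reduction` is the fraction of the currently summarized spam
    that the new filter would delete instead.
    """
    still_summarized = summarized_share * (1 - reduction)
    return 1 - still_summarized, still_summarized


deleted, summarized = split_after_reduction(0.40, 0.25)
print(f"deleted: {deleted:.0%}, summarized: {summarized:.0%}")
# deleted: 70%, summarized: 30%
```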
I will be away for next 3 days or so. Direct any issues to Tony please.
-Shelby Moore
http://AccuSpam
Basically a global blacklist based on user input and "known" spammer domains.
Sweet. :)
> I will be away for next 3 days or so. Direct any issues to Tony please.
Gee wiz Mr. accuspam, like I don't have enough to do already! :rotfl:
accuspam
07-25-04, 11:00 PM
I have a day to work on AccuSpam before being away for 3 days. Hopefully can finish the outstanding important todo list.
> Basically a global blacklist based on user input
> and "known" spammer domains
There is more going on than that. For example, there is an anti-spoofing algorithm, which is essential to enable blocking by sender address and sender domain; otherwise a spammer could just hide behind infinite spoofed sender addresses.
And there is an overload algorithm which will automatically detect real accounts that are being used for spamming in large volume once we have a large volume of users. This is, for example, to detect that a spammer is using a real account on Hotmail, since we can not block the entire Hotmail domain.
But if I had to summarize the blocking by domain, then you are not far off. Perhaps more accurate summary would be:
A real-time, statistically accurate global (domain and sender) blacklist based on user input and a statistically accurate per user global (domain and sender) blacklist based on user input.
The only ways we can see spammers attempting to subvert our system are:
1. Proliferating the non-spoofed domains they own.
2. Responding to the confirmations sent by AccuSpam.
3. Joining AccuSpam and making user input which alters the statistics.
We do not think any of these are practical for the spammer:
1. The "cost" of acquiring new domains will be prohibitive. Not the cost of paying for the domain, as this can be done with stolen credit cards, but the "cost" of acquiring them, providing a server, and configuring them. If a spammer were to get very clever in this regard, then we could detect their IP in real-time instead of their domains. But this won't be necessary because the number of users giving feedback will far exceed the rate at which spammers can add new domains. The ROI on spam is so low, spammers can not afford the "cost". Even if AccuSpam covered 1% of all email, e.g. 20 million users, spammers still would have more incentive to ignore AccuSpam than drastically increase their costs.
2. We are monitoring this. So far none have that we have seen. If ever they do, then we can just make the "response" a little more "human effort".
3. This has been thought out and is covered in detail in the patent application we are filing. For one thing, they could never influence the statistics in a way that would cause non-spam to be lost. The most they could attempt to do is cause more spam subjects to be in the single summary email sent by the AccuSpam robot. This still would not reduce the 100% spam blocking. Practically speaking, it would be easy to statistically detect these abusive accounts; otherwise the spammer would need so many accounts that it would be as prohibitively impractical from a "cost" standpoint as the proliferation of domains in issue #1.
-Shelby Moore
http://AccuSpam.com
morbidpete
07-25-04, 11:05 PM
Quick update from me..i used to use an old e-mail account..but closed it cause of my lack of knowledge of spam back in the day..(when i was like 13)..i recently re-opened it...got like 80 something spam e-mails a day..so i saw this thread..signed up...working great bro!...love it seriously..hasn't blocked a single legit e-mail as of yet..(been using it for a week now)..like it a lot bro..thanks a bunch!
accuspam
07-25-04, 11:47 PM
A bug reported by Tony has been fixed.
This fix only affected the deleted spam counts you see in the summary emails from accuspam robot. The spam was being deleted, just the count wasn't being incremented. It is now fixed.
We were not incrementing the deleted spam count when 100% blocking not enabled (e.g. free version) and spam was deleted either due to being undeliverable or from responding to the summary from accuspam robot. Count was being incremented for disapproved senders and other scenarios.
The reason for this oversight was when we changed from dual email proxy (100% blocking) to single email address for free version signup, the spams were no longer in the PUBLIC mailbox when they were deleted, so the dual proxy code thought the spam had already been deleted.
-Shelby Moore
http://AccuSpam.com
accuspam
07-26-04, 04:41 AM
Building on my previous post:
http://forums.speedguide.net/showpost.php?p=1368469&postcount=32
I find it interesting to delve into the 650 heuristic rules (guesses) being used by SpamAssassin:
http://spamassassin.apache.org/tests.html
I think the default is SpamAssassin deletes any email with a score > 3.5, so it is relevant to look at the rules which have scores > 1.0 and which are likely to occur in non-spam email.
It is amazing to me that anyone would use a product like this if they care about not losing important email. Just read this entire post and I think you will be shocked at the kind of non-spam you could potentially lose with SpamAssassin.
Here are a few poignant ones I noticed from a quick browse of the rule list above:
a) Weird repeated double-quotation marks WEIRD_QUOTING 1.138
b) Multipart message mostly text/html MIME MIME_HTML_MOSTLY 1.591
c) Message only has text/html MIME parts MIME_HTML_ONLY 0.884
d) HTML and text parts are different MPART_ALT_DIFF 1.319
f) Message body has 70-80% blank lines BLANK_LINES_70_80 1.117
g) Message body has many words used only once UNIQUE_WORDS 2.791
h) Message body mentions many internet domains DOMAIN_RATIO 2.251
Okay, so do not expect to get your legit email if it has a confluence of repeated quote marks, mostly HTML, different HTML and text portions, mostly blank lines, many words used only once, and mentions of many domain names.
It may be that those aspects of email content are rarely non-spam, but it is not true all the time. So legit email can be lost by those rules, depending on the variety of content your non-spam senders send you.
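To show how such rule scores add up, here is a minimal Python sketch. The scores are copied from the list in this post. The 3.5 threshold is this post's own guess at the default and is kept as a parameter rather than stated as fact; the function names are made up, and this is not how SpamAssassin itself is implemented.

```python
# Scores copied from rules a) through h) quoted in this post.
RULE_SCORES = {
    "WEIRD_QUOTING": 1.138,
    "MIME_HTML_MOSTLY": 1.591,
    "MIME_HTML_ONLY": 0.884,
    "MPART_ALT_DIFF": 1.319,
    "BLANK_LINES_70_80": 1.117,
    "UNIQUE_WORDS": 2.791,
    "DOMAIN_RATIO": 2.251,
}


def total_score(rules_hit):
    """Sum the scores of every rule a message triggered."""
    return sum(RULE_SCORES[name] for name in rules_hit)


def is_flagged(rules_hit, threshold=3.5):
    """True if the summed score exceeds the threshold (3.5 is an assumption)."""
    return total_score(rules_hit) > threshold


# A legit newsletter listing many links, with varied vocabulary:
newsletter = ["UNIQUE_WORDS", "DOMAIN_RATIO"]
print(round(total_score(newsletter), 3), is_flagged(newsletter))  # 5.042 True
```

No single rule in this list crosses 3.5 on its own, so it is combinations of otherwise innocent traits that push a message over the line.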
Okay now on to the more risky rules I noticed:
i) NJABL: dialup sender did non-local SMTP RCVD_IN_NJABL_DUL 0 1.580
That rule fires when someone is traveling and uses a different relay to send their email than the ISP they are logged into. I do that all the time when I send email.
j) Received via a relay in Spamhaus XBL RCVD_IN_XBL 0 2.333
k) Received via a relay in list.dsbl.org RCVD_IN_DSBL 0 2.767
l) Received via a relay in bl.spamcop.net RCVD_IN_BL_SPAMCOP_NET 0 1.783
m) Received via a relay in RSL RCVD_IN_RSL 0 1.043
n) Relay in RBL, http://www.mail-abuse.org/rbl/ RCVD_IN_MAPS_RBL 1
o) Relay in DUL, http://www.mail-abuse.org/dul/ RCVD_IN_MAPS_DUL 1
p) Relay in RSS, http://www.mail-abuse.org/rss/ RCVD_IN_MAPS_RSS 1
q) Relay in NML, http://www.mail-abuse.org/nml/ RCVD_IN_MAPS_NML 1
What that means is that if Spamhaus, DSBL, SpamCop, or RSL does not like the email relay policy of your ISP (or of your host, if you send email from a web site), then your sent email can get blocked by SpamAssassin. It actually happened to me when I sent my mom an email from a T-Mobile wireless account from a Starbucks cafe.
r) Viagra and other drugs DRUG_ED_COMBO 1.200
Hope you do not get legitimate emails about drugs if someone in your family gets sick.
s) From: starts with nums FROM_STARTS_WITH_NUMS 1.312
t) From: contains numbers mixed in with letters FROM_HAS_MIXED_NUMS3 1.092
Hope no one you want email from ever emails you who has an email address that begins with numbers or contains numbers!
u) Reply-To: is empty REPLY_TO_EMPTY 1.280
Hope no one you want email from ever emails you who does not have the Reply-To address set in their email program.
v) Subject contains lots of white space SUBJ_HAS_SPACES 1.763
Hope no one you want email from ever emails you with spaces in the subject.
w) Date: is 3 to 6 hours after Received: date DATE_IN_FUTURE_03_06 1.417
x) Date: is 12 to 24 hours after Received: date DATE_IN_FUTURE_12_24 2.163
Hope no one you want email from ever emails you with clock set wrong on their computer.
y) Subject: contains G.a.p.p.y-T.e.x.t GAPPY_SUBJECT 1.457
Hope no one you want email from ever emails you with cutesy subject.
z) HTML font size is large HTML_FONT_SIZE_LARGE 1.068
Hope no one you want email from ever emails you with large fonts.
A) HTML: images with 0-400 bytes of words HTML_IMAGE_ONLY_04 3.080
Hope no one you want email from ever emails you with a computer screen capture or other image that contains many words.
B) HTML has a low ratio of text to image area HTML_IMAGE_RATIO_02 1.437
Hope no one you want email from ever emails you with an email that is mostly images and not much text.
ETC, ETC, ETC
-Shelby Moore
http://AccuSpam.com
accuspam
07-26-04, 10:34 AM
Fixed so that emails that spoof your own email address are deleted:
http://www.accuspam.com/faq.php#as_self
That fixed some strange messages coming from @accuspam.com, which were actually confirmation messages being sent back to the sender, the sender being yourself.
Fixed so that when one AccuSpam user emails another one for the first time, the confirmation message is automatically delivered (does not go into the summary from accuspam robot). Consider it auto-whitelisting for AccuSpam confirmations. And fixed so that spoofed emails from @accuspam.com, which are not from support@accuspam.com, are deleted (do not go into the summary from accuspam robot).
Anti-forgery (anti-spoofing) of support@accuspam.com is not yet enabled. We have seen no spam from spoofed support@accuspam.com yet. Please let us know if you do.
-Shelby Moore
http://AccuSpam.com
I signed up... but now I've disabled it. When I approve a sender: Placing an "X" in the box, I don't believe I'm then getting the valid email delivered to me. Also, I haven't seen the decrease in spam that is claimed. A dozen or so have showed up in the last few days... and I've never asked for viagra/mortgage/adult solicitations.
> I signed up... but now I've disabled it. When I approve a sender: Placing an "X" in the box, I don't believe I'm then getting the valid email delivered to me. Also, I haven't seen the decrease in spam that is claimed. A dozen or so have showed up in the last few days... and I've never asked for viagra/mortgage/adult solicitations.
see below
1. you signed up when?
2. a dozen or so spam out of how many total messages?
3. I don't believe I'm then getting the valid email delivered to me. Did you compare delivered mail to Daily Summary sender & subject line to verify this?
Typed my earlier post at work today.. didn't have time to elaborate.
1. Last week... don't remember the day
2. Probably 30 emails total
3. Verify? The subject was about a person from whom I was expecting a reply today. I replied to the accuspam email and "X"ed the (I think) appropriate box. She was added to my approved list... but I never got the original email.
I don't doubt the validity of your system... maybe I'm just not doing something correctly.
> Typed my earlier post at work today.. didn't have time to elaborate.
> 1. Last week... don't remember the day
> 2. Probably 30 emails total
> 3. Verify? The subject was about a person from whom I was expecting a reply today. I replied to the accuspam email and "X"ed the (I think) appropriate box. She was added to my approved list... but I never got the original email.
> I don't doubt the validity of your system... maybe I'm just not doing something correctly.
OK, thanks for clarifying.
I will have the developer investigate this further when he returns from his travels in a few days. As for "verify", you can check the copies of your Daily Reports that were sent to you and compare the senders and subjects with your received messages. If you replied to the Daily Reports, then there will be a copy in your Sent Messages folder, where you can verify that you put the X next to the correct messages and compare their subjects with your received messages.
accuspam
07-26-04, 09:35 PM
Please do not spread rumors which are NOT true.
I am very confident there is no such bug in AccuSpam!
If someone is on your Approved Senders list, then you will get all their email in your Inbox. If you are not getting their email in your Inbox, then I am very confident it is not AccuSpam. There could be many other causes, and you will have to provide additional information, so we can help you track down what actually happened.
If someone is not on your Approved Senders list, then you will get their email in your Daily Summary from AccuSpam, and then you merely need to type an "X" in the box for that email, and then it will be delivered to your Inbox. You are claiming that you did that, but you are claiming the email was not delivered. Let's analyze your claim so we can get to the bottom of this.
First of all, you are stating that the sender was added to your Approved Senders list when you put the "X" in the box for the Daily Summary. Is that correct?
If yes (which I think you already stated above), then we know that AccuSpam correctly reacted to your action of putting the "X" in the box. You claim you did not receive the email, but there are other possibilities. You may have accidentally deleted it from your Inbox. You may have spam filters enabled elsewhere (such as in your email program or another anti-spam that you signed up for and forgot to disable) that deleted the email. There are other possibilities that have nothing to do with AccuSpam, which we could explore.
But first, let's verify that AccuSpam is actually working correctly for you. Go turn on AccuSpam, and then have that sender email you. If you receive the email from that sender, then you know AccuSpam is working correctly and there is no bug. I am very confident this is the case. You will have to prove me wrong for me to believe otherwise, because 1000s of emails have been delivered correctly from Approved Senders. It is a very simple section of code that I just looked at again.
Also you claim that AccuSpam failed to catch many spams. First of all, you are using the free version, and it is explained here:
http://accuspam.com/faq.php#as_spam
that unless you upgrade to paid version, there are scenarios where you can still get spam. For most people, those scenarios are very rare, but it is possible that for you, those scenarios are more common. However, when you say you got spam, do you mean in your Inbox or in the Daily Summary? We do expect you to get spam in the Daily Summary. That is how the user feedback works. But this decreases over time to near nothing. If the spam was in your Inbox, then you need to read the link above. There is also the possibility that you put an "X" in the wrong box and approved the spammers. Look at your Approved Senders list and see what senders are there. You can remove any senders which you mistakenly added there.
If you are serious about this report, please work with us to help you find out what went wrong. I am very confident you will discover there is no bug in AccuSpam related to the delivery of email from Approved Senders. There may be some obscure bugs in AccuSpam, but I am very confident not in the main things such as this. This kind of thing has already been fairly well tested. However, you did say you had this problem weeks ago, and in that case it is possible there was a problem back then which has since been fixed. I do not remember any problem with delivery of email from Approved Senders, but I do remember fixing a problem with email corruption. It is possible that you did not notice the delivered email because it was corrupted. That has long since been fixed.
Lastly if you turn off AccuSpam, make sure you do it immediately AFTER receiving the delivered emails from your reply to a Daily Summary, because at the moment there is something on my todo list to fix, where when a free user (not paid) disables, then any emails in the quarantine are deleted instead of delivered. So make sure the quarantine is empty before disabling. I intend to fix that within this week.
So if you were turning on and off AccuSpam back and forth, yes that could cause it to lose email. That will be fixed. That is probably what happened to you. I am confident that if you turn it back on, your email will not be lost.
-Shelby Moore
http://AccuSpam.com
accuspam
07-26-04, 10:45 PM
Daily Summary reduction coming within days.
Will implement this during my trip or as soon as I get back.
Just keep processing your Daily Summaries in the meantime, because I can see the stats are building and I can see for example that Tony (user #20) currently has 314 emails in his quarantine (Daily Summary) and 174 of them would be deleted by the PER USER statistical domain blocking if it were implemented today. Tony was getting 2000+ spams per day, so about 80% is being deleted automatically. The rest is going into his Daily Summary. When I implement the PER USER statistical domain blocking, that remaining 20% will be chopped in half, so only about 10% of his spam will still be arriving as summarized subjects in the Daily Summary. And that will probably improve further over time.
I've seen a number of users disabling AccuSpam with no feedback from them as to why. I assume it is because the Daily Summaries still have about 20 - 40% of the spam summarized by subjects. As stated above, the stats have not kicked in yet. Please be patient.
Also I think some users expect 100% blocking from Inbox when they are using the free version, and it is explained here that that is not possible unless you use the paid version:
http://accuspam.com/faq.php#as_spam
Right now, any email larger than 32KB is going to go in your Inbox if you are using the free version. This is because you have no proxy with the free version. There is nothing AccuSpam can do about that unless you upgrade to the paid version. If you want to know, I can go into great detail about why. It has to do with the fact that both AccuSpam and you log in to the same mailbox to get the email, and thus you are racing each other to get to the email, and also that AccuSpam has to pull the email out of the mailbox to quarantine it while it is statistically analyzed. With the paid version, you use a 2nd email address transparently (no one ever knows you are) and this serves as a proxy, so that AccuSpam does not have to remove emails from the mailbox to analyze them, and then there is no race condition.
Read the link above to understand clearly why and when spam can get in your Inbox if you are using the free version. The paid version offers 100% blocking from Inbox.
-Shelby Moore
http://AccuSpam.com
"Please do not spread rumors which are NOT true."
Not purposely spreading rumors, just stating an observation... with the caveat that I may indeed be doing something incorrectly.
Perhaps I didn't wait long enough to get the email from the sender that was approved through the Daily summary.
I turned it on about 5 days ago.. just turned it off today.
Thank you for your input, it is obvious that you have put a lot of effort into this project. If I see good results with the free version in the coming days, I will definitely purchase the paid version. Your explanations have been very helpful.
One additional comment; Is there a more accessible place to place the "box" for approving/rejecting emails in the daily summaries? I feel like I'm checking and double checking to make sure I'm putting the X in the correct spot. Perhaps a brief list of suspect emails at the top of the summary, with the check box next to each listing?
We realize that YOU are not spreading rumors. That comment was directed at me for the most part, as I had stated that "it's quite possible that bugs exist" when in fact there are no bugs in the main AccuSpam code, as Shelby explained above. I was unaware that the main code was 100% completed and I spoke too soon. As with any new software, there will be adjustments to be made to improve it here & there, and these adjustments are made to minor sections of the code. Apologies if you felt that you were the target of that "rumors" statement.
accuspam
07-27-04, 10:09 PM
Not purposely spreading rumors, just stating an observation... with the caveat that I may indeed be doing something incorrectly.
Perhaps I didn't wait long enough to get the email from the sender that was approved through the Daily summary.
I turned it on about 5 days ago.. just turned it off today.
That gives the impression that you think AccuSpam can take 5 days to deliver an email, which is impossible. If you replied to the Daily Summary (which actually happens twice per day), then within 5 minutes the sender for the [] box you put an "X" in is added to Approved Senders list and all email from that sender is immediately delivered to your Inbox.
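The reply handling described here can be sketched roughly as below. This is only an illustration of the idea, not AccuSpam's actual code: the line shape is modelled on the sample Daily Summary posted in this thread, and the regular expression, the function name `parse_reply`, and the action labels are all assumptions.

```python
import re

# Assumed line shape, taken from the sample summary in this thread, e.g.
#   "> _rC399XTNB4b3frFE [X] Deliver and Approve Sender?"
# The id pattern (16 or more word characters) is a guess based on that one sample.
BOX_LINE = re.compile(r"^[>\s]*(?P<id>\w{16,})\s*\[\s*(?P<mark>[XxDd]?)\s*\]")

def parse_reply(reply_text):
    """Map each quarantined email id to the action the user's mark implies."""
    actions = {}
    for line in reply_text.splitlines():
        match = BOX_LINE.match(line)
        if not match:
            continue  # instructions, Sender:/Subject: lines, blank quote lines
        mark = match.group("mark").upper()
        if mark == "X":
            actions[match.group("id")] = "deliver_and_approve_sender"
        elif mark == "D":
            actions[match.group("id")] = "deliver_once"
        else:
            actions[match.group("id")] = "delete"  # empty box
    return actions
```

An "X" delivers the email and adds the sender to Approved Senders, a "D" delivers it once, and an empty box means the email is deleted, which matches the instructions in the Daily Summary text.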
Have you tried the tests I asked you to do yet?
Thank you for your input, it is obvious that you have put a lot of effort into this project. If I see good results with the free version in the coming days, I will definitely purchase the paid version. Your explanations have been very helpful.
One additional comment; Is there a more accessible place to place the "box" for approving/rejecting emails in the daily summaries? I feel like I'm checking and double checking to make sure I'm putting the X in the correct spot. Perhaps a brief list of suspect emails at the top of the summary, with the check box next to each listing?
Thank you also. How could we make it more brief? As it is, it only has Subject and Sender.
Currently on todo list:
1. Move the unique Id (the 16 strange characters before the [] box) to the bottom of the email, so it doesn't get in the way of browsing.
2. Add a link at the top of the Daily Summary to a secure login web page, as an alternative way to select the emails to deliver by clicking.
3. Rank the suspect emails using Bayesian and/or RBL and/or their BULKNESS count, etc.
4. Add the Urgent Emails and Quarantine folders to the AccuSpam login area.
5. Increase the 32KB to perhaps 128KB for first 30 days trial of free version.
6. Enable the PER USER statistical domain blocking.
And much more...
-Shelby Moore
http://AccuSpam.com
"If you replied to the Daily Summary (which actually happens twice per day), then within 5 minutes the sender for the [] box you put an "X" in is added to Approved Senders list and all email from that sender is immediately delivered to your Inbox."
That most definitely did not happen within 5 minutes. I still haven't gotten a particular original email that I approved in my Daily summary. I have gotten other email from that approved sender since that time.
Since yesterday morning, I've received 9 emails in my inbox, 5 were legitimate (including 2 accuspam daily summaries) and 4 were spam. It is possible that the 4 spam messages are not caught due to my using the free version.
Below is the text from one reply that I must look through to find the appropriate [] box to check. My suggestion is to put the [] box and subject closer to the top, before the instructions... I've figured it out by now... just a lot to scan through initially.
On Tuesday 27 July 2004 07:30 pm, AccuSpam Robot wrote:
> READ CAREFULLY PLEASE!
>
> Please click Reply to send this entire email back,
> and then type an "X" in the [] boxes for only the
> emails below, which you wish to be delivered to
> your Inbox.
>
> When you reply, emails without an "X" or "D",
> are PERMANENTLY DELETED and can not be recovered.
>
> Occasionally NON-SPAM EMAILS WILL APPEAR BELOW
> from senders who never emailed you before, so
> make sure you scan all email subjects below
> before replying.
>
> Since you joined AccuSpam:
> 3 spams deleted (most automatically)
> 9 emails delivered (most automatically)
> 24% of your email has been spam
> 100% of this spam has been blocked from your Inbox
>
> When you reply with an "X" in the [] box, then you
> will never see this message again for that sender, the
> sender is added to your Approved Senders list, and all
> future emails from that sender will be automatically
> delivered to your Inbox.
>
> To deliver an email below, but NOT add the sender to your
> Approved Senders list and NOT auto-deliver all future
> emails from the sender, then type a "D" instead
> of an "X" in the [] box.
>
>
>
> Sender: gcrwvjymlse@llcifllctrader.com
> Subject: What are your plans for tonight.?, A play friend would sure m...
> _rC399XTNB4b3frFE [] Deliver and Approve Sender?
>
>
>
>
> When you reply, you are helping AccuSpam statistically
> detect spam. Your reply is deleting your spam and the
> spam of the other AccuSpam users. Also the replies of
> other AccuSpam users is deleting your spam before this
> message is sent to you, thus reducing the number of spam
> subjects you must review in this message. As the number
> of AccuSpam users grow, the frequency of these messages
> and the number of spam subjects in them will reduce
> eventually to almost never. Thus you are required to reply.
> Statistically even erroneous or malicious replies of other
> users can never delete your non-spam.
>
> To illustrate the rationale for replying, assume the number
> of new deliverable, unforged spam senders per day to be
> 10,000. Thus with 10,000 AccuSpam users, each user will
> only have to review 10 or less spam subjects per day. That
> takes into account an approximate factor of 10 for
> statistical safety. Then with 1 million AccuSpam users (i.e.
> only 1/10th of 1% of all email users), each AccuSpam user
> would only have to review 1 spam subject every 10 days.
>
>
> ----------
> To stop these emails and disable AccuSpam,
> Login to your AccuSpam account,
> uncheck Enable AccuSpam, and click the Save Changes button:
>
> http://www.AccuSpam.com/login.php
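The rationale in the quoted summary (10,000 new spam senders per day, a safety factor of about 10) can be checked with a few lines of arithmetic. This is a sketch only: the function name is made up, and reading the "factor of 10" as the number of users who each review the same sender is an assumption about what the summary means.

```python
def subjects_to_review_per_day(new_spam_senders_per_day, users, safety_factor=10):
    """Average number of spam subjects each user reviews per day.

    Assumes the review work is spread evenly over all users and that each
    new sender is shown to about `safety_factor` users.
    """
    return new_spam_senders_per_day * safety_factor / users

# 10,000 users: about 10 subjects per user per day
print(subjects_to_review_per_day(10_000, 10_000))
# 1 million users: about 0.1 per day, i.e. one subject every 10 days
print(subjects_to_review_per_day(10_000, 1_000_000))
```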
Hi- the program sounds quite exciting- I'm going to try it myself before I recommend to clients. Will it also work on IMAP / Exchange mail, or only POP mail? This is a BIG issue for me, since virtually all of my clients are using exchange server mail.
Thanks!
Tim
just a bump..... waiting for TonyT
Still hoping, so still bumping. Starting to look pointless. Oh well. :(
accuspam
07-29-04, 06:30 AM
Hi- the program sounds quite exciting...Will it also work on IMAP / Exchange mail, or only POP mail?...
Thanks. IMAP is planned, but not yet implemented. And the free IMAP version will have some additional benefits over the POP version. I assume Exchange will also be supported if we are having success with the POP version.
So please give us some time (probably at least 6 weeks or more) to ramp up the POP version, then if that is successful, expect the IMAP and Exchange support to be added.
We've got very important things to finish on our todo list first (as previously mentioned in this thread).
-Shelby Moore
http://AccuSpam.com
accuspam
07-29-04, 06:50 AM
That most definitely did not happen within 5 minutes. I still haven't gotten a particular original email that I approved in my Daily summary. I have gotten other email from that approved sender since that time.
I am very confident that AccuSpam sent the email within 5 minutes of you putting an "X" in [] and replying to daily summary.
To prove to yourself that AccuSpam is doing that, I suggest you Login at AccuSpam.com, remove that particular sender from your Approved Senders list, then the next time that sender emails you, it will be in your Daily Summary. Then put the "X" in the [] box and wait 5 to 10 minutes. If you do not receive the email, then come back here and report it.
Else we can all assume that AccuSpam is working correctly and there is no bug. So if we do not get a report from you saying that you tried the above and it failed, then we can all safely assume that I am correct and that AccuSpam is working correctly in this regard.
As for why you never received the email that you claim you did not, like I said, there are a zillion possible reasons that have nothing to do with AccuSpam. For one thing, your ISP may very well be using real-time blacklists. Maybe your ISP deleted the email when AccuSpam sent it back to your mailbox. There are MANY, MANY other possibilities that have nothing to do with AccuSpam.
First step, is to try what I suggest above and verify for yourself that AccuSpam is working. If it works, then you can start to think about other reasons (not AccuSpam) to explain your claim. If it does not work, then report it here, and we can begin to investigate by having you try some specific other steps.
So first go try that. Thanks.
Since yesterday morning, I've received 9 emails in my inbox, 5 were legitimate (including 2 accuspam daily summaries) and 4 were spam. It is possible that the 4 spam messages are not caught due to my using the free version.
An interesting question may be "How many spams were deleted in that time?" (as stated in the Daily Summary you get from AccuSpam). If you get a lot of spam, you will definitely find that AccuSpam is blocking like 99% of it in the free version. If you are only getting a few spams a day, then maybe you do not even need AccuSpam unless you want the 100% blocking of paid version.
The free version can not stop all the spam from your Inbox. It will range from perhaps 90 - 99% blocking for the free version. The spams you got were either larger than 32KB or they arrived within 5 minutes of when you checked your email each time. Those are limitations of the free version which do not exist in the paid version. Note, on our todo list is to consider increasing the 32KB to 128KB for 30 day trial of free version. And possibly to decrease the 5 minute window to 3 minutes. Watch for us to post a notification of any improvements here.
Upgrade to paid version for 100% blocking from Inbox.
Below is the text from one reply that I must look through to find the appropriate [] box to check. My suggestion is to put the [] box and subject closer to the top, before the instructions... I've figured it out by now... just a lot to scan through initially.
The instructions need to be at the top for the new users else they will have no clue what to do.
I have two additional ideas (building off previous todo post):
7. Move the bulk of the daily summary instructions to the end of the email, after the user has successfully replied with an "X" in the [] box to a few summaries.
8. Number each email summary, perhaps put some type of line between each summary, and perhaps put the [] box above each email summary.
Thanks for your feedback and testing! We appreciate it.
-Shelby Moore
http://AccuSpam.com
accuspam
07-29-04, 07:09 AM
...I still haven't gotten a particular original email that I approved in my Daily summary. I have gotten other email from that approved sender since that time.
Since yesterday morning, I've received 9 emails in my inbox,...4 were spam. It is possible that the 4 spam messages are not caught due to my using the free version.
What email address do you Login to AccuSpam.com with? I want to study your account metrics.
It seems like you get quite a low amount of spam? If yes, it is probably true that your ISP is filtering some of your spam. And that would probably explain why that particular email was lost. Your ISP probably filtered it when AccuSpam resent it back to your mailbox.
If your ISP is not filtering spam, then maybe you have some spam filters enabled in your email program?
ISPs which silently filter legitimate email (even rarely) are one of the motivations I had for creating AccuSpam. I was sick of seeing our business and support email being incorrectly deleted by inaccurate spam filters.
Any way, try the suggestion I gave in previous post so we can isolate that AccuSpam is not the cause.
-Shelby Moore
http://AccuSpam.com
accuspam
07-29-04, 11:19 AM
Made an improvement to catch spoofed emails which are sent to an alias of yourself. This only affects people who have multiple aliases (email addresses) which receive email in the same mailbox:
http://www.accuspam.com/faq.php#as_self
"Note that emails to an alias of the same mailbox are also deleted."
accuspam
07-29-04, 11:31 AM
The news is getting better!
Looking at user #20 (Tony) as our reference metric (considering he was getting 2000+ spams a day before AccuSpam), I see he currently has 266 emails in quarantine waiting to go out in the next Daily Summary. And looking at the stats collected thus far from the Daily Summaries Tony has replied to, when I enable the PER USER statistical domain blocking (hopefully tomorrow), 169 of those emails will be deleted!
So this means, approximately 85+% of Tony's spam is being deleted by AccuSpam immediately. The remaining 15% is being shown to Tony in a Daily Summary (the spams are not going into Inbox).
With the PER USER statistical domain blocking enabled, I expect 95+% of Tony's spam to be deleted immediately and only 5% going into the Daily Summary (not into Inbox).
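As a quick check on those percentages, the share of spam that still reaches the Daily Summary after domain blocking can be estimated from the quarantine counts quoted in this thread. This is a back-of-the-envelope sketch; the function name is made up, and it assumes the current quarantine is representative of future Daily Summaries.

```python
def summary_share_after_blocking(current_summary_share, quarantined, would_be_deleted):
    """Fraction of all spam still listed in the Daily Summary once
    per-user domain blocking removes `would_be_deleted` of the
    `quarantined` emails."""
    remaining = quarantined - would_be_deleted
    return current_summary_share * remaining / quarantined

# Today's figures: 15% of spam reaches the summary, 169 of 266 would be deleted
print(round(summary_share_after_blocking(0.15, 266, 169), 3))
# Earlier figures in the thread: 20% reaches the summary, 174 of 314 would be deleted
print(round(summary_share_after_blocking(0.20, 314, 174), 3))
```

Both results land close to the rounded figures quoted in the posts (about 5% and about 10%).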
And this trend appears to be getting better, so probably within another week or so, Tony will be approaching 100% spam deleted immediately.
I want to make it clear, we are talking about % of spam summarized in the Daily Summary. AccuSpam never (if using paid version) delivers spam to the Inbox. The free version caveats are here:
http://www.accuspam.com/faq.php#as_spam
To summarize, the good news is that the statistical aspect of AccuSpam is going to kick in for a user only about a week or two after enabling and replying to the Daily Summaries. And it appears that the performance will be very competitive with the other types of anti-spam, except without the risk of losing legitimate email that plagues most other anti-spam.
We stand by our marketing promise:
"AccuSpamTM is the ***ONLY*** anti-spam in the world which blocks 100% of spam and never fails to show you the non-spam."
-Shelby Moore
http://AccuSpam.com
accuspam
07-29-04, 10:39 PM
Discovered that with the free version, if you receive an email from a NEW sender (someone not already in your Approved Senders list) and the email is larger than the size AccuSpam will extract from your mailbox for the free version (e.g. currently 32KB), then as expected, the email will remain in your Inbox (whether it is spam or not). Then AccuSpam mentions it in your next Daily Summary, but by the time you reply to the Daily Summary, you have already downloaded that email, which is as expected. The aspect I did not realize was that when you reply to the Daily Summary with an "X" (or not) for that email, the sender is not added to the Approved Senders list (or disapproved list) as expected, because you already downloaded the email and AccuSpam thinks the reply may be forged (the email to which it refers no longer exists, because it is no longer in your mailbox but in your Inbox).
This is not really a major problem, because the next time the sender emails you, it will be in your Daily Summary again, and if it is not larger than (currently) 32KB, then the sender will be added to the Approved Senders list when you reply to the Daily Summary and "X" it (or added to your disapproved list if you do not "X" it). However, note that for the case of a spammer who is sending spams larger than (currently) 32KB, it means you will never be able to disapprove them. Except this is also not a problem, because AccuSpam's statistical algorithm catches these spams and removes them.
Note in any case, no email is lost.
I suppose we could try to improve this obscure case by holding a record of emails which were already downloaded from mailbox for longer period of time, but this would complicate the code too much. And as explained above, an improvement is not necessary because no adverse effects are caused by above issues.
The paid version does not have this issue.
And when I increase the size the free version will extract from 32KB to perhaps 128KB, then this will reduce the likelihood of this issue to occur.
Let me explain in more detail about the free version and the issue of "proxy".
The way AccuSpam works is it checks your mailbox every so often (3 - 5 minutes for free version and less than minute for paid version) and then it analyzes the email there.
For the free version, AccuSpam must remove (extract) the emails from your mailbox before you check (download to your Inbox from) your mailbox, so that any spams are not delivered. Then for free version AccuSpam sends back to your mailbox any emails you choose to deliver (actually Approved Senders are never removed from mailbox).
So with the free version, the "proxy" (the way that AccuSpam stands between you and your incoming email) is that AccuSpam must race you to beat you to your own email. Since AccuSpam only checks every 3 - 5 minutes for the free version, if you check your email very frequently, then you will always be beating AccuSpam to your incoming email and thus AccuSpam will not catch much spam. However, if you check your email less frequently, then AccuSpam catches more email before you do. It is basically a ratio. If you check email every 60 minutes, then AccuSpam has a 60 / 63 = 95% chance of catching the spam before you do. If you check every 4 hours, then it is 240 / 243 = 99%. However, if you have your email program set to auto-check email every 30 seconds, then the chance that the free version of AccuSpam can block your spam is reduced to ONLY: 0.5 / 3.5 = 14%!!!
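The ratio used above can be written as a one-line function. This is a minimal sketch of the arithmetic in the post, using the 3 minute check interval from the examples; the function name is made up and the ratio is a rough estimate, not a measured figure.

```python
def catch_probability(user_check_minutes, accuspam_check_minutes=3.0):
    """Rough chance that the free version extracts a spam from the mailbox
    before the user's own mail check downloads it."""
    return user_check_minutes / (user_check_minutes + accuspam_check_minutes)

for minutes in (60, 240, 0.5):
    print(minutes, "min ->", round(catch_probability(minutes) * 100), "%")
```

Raising `accuspam_check_minutes` to 5 (the upper end of the 3 - 5 minute range) lowers all three figures a little.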
So if you or your email program regularly check email every few minutes or less, then free version of AccuSpam is not going to work well for you. What is worse, is that you also increase the probability of locking AccuSpam out of your mailbox, because mailboxes can only be opened to one reader at a time.
If you use the FREE version of AccuSpam, try to not regularly check email more often than every hour or so. Else you will get a lot more spam in your Inbox.
The paid version does not have this issue, because it uses two email addresses, one is your PUBLIC email address that receives all your incoming email and the other is a SECRET email address which is used in a way that it is invisible to everyone and you. Thus AccuSpam forwards only the non-spam to your SECRET email address and thus you get 100% blocking:
http://www.accuspam.com/faq.php#as_configure
So with the paid version, AccuSpam never has to remove the email from the PUBLIC mailbox, because you do not check there for your incoming email. You check your SECRET mailbox instead. With the free version, AccuSpam must remove the incoming email asap. However, there has to be some limit to the size of the emails AccuSpam will remove, else spammers could send 200MB attachments and quickly overload the AccuSpam hard disks. Currently this limit is set at 32KB, which means any email over 32KB is left in your mailbox, even if it is spam or a virus. The paid version obviously does not have this problem, as stated above.
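The free-version rules described in this post (Approved Senders are never removed, oversized emails are left alone, everything else is pulled into quarantine) could be sketched like this. It is an illustration only: the names are made up, and reading "32KB" as 32 * 1024 bytes is an assumption.

```python
FREE_SIZE_LIMIT_BYTES = 32 * 1024  # assumption: "32KB" read as 32 * 1024 bytes

def free_version_action(sender, size_bytes, approved_senders,
                        size_limit=FREE_SIZE_LIMIT_BYTES):
    """Decide what the free version does with one email found in the mailbox."""
    if sender in approved_senders:
        # Approved Senders are never removed from the mailbox
        return "leave in mailbox (approved sender)"
    if size_bytes > size_limit:
        # Too large to extract, so it stays put even if it is spam
        return "leave in mailbox (too large to extract)"
    return "extract to quarantine"

approved = {"friend@example.com"}
print(free_version_action("friend@example.com", 500_000, approved))
print(free_version_action("stranger@example.com", 40_000, approved))
print(free_version_action("stranger@example.com", 4_000, approved))
```

Passing a larger `size_limit` shows how a higher threshold moves more email into the "extract to quarantine" branch.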
In the near future, we plan to increase the 32KB to perhaps 128KB or 256KB, at least for 30 days trial of the free version. This will increase the size of the emails that free version of AccuSpam will block.
Overall this is not a major issue for most users. But if you find you are getting a lot of spam in Inbox with free version, or wondering why you keep having to "X" the same sender in the Daily Summary, then read the above.
Overall the paid version will be much superior for people who are willing to configure it:
http://www.accuspam.com/faq.php#as_configure
Note also that when we begin to license AccuSpam directly to ISPs, then the ISP can transparently implement the paid proxy for us without you needing to use dual mailboxes! But we can not wait for that, because we have to demonstrate AccuSpam working for real users in order to interest ISPs in licensing AccuSpam. And also we have to convince ISPs that AccuSpam is better than the free SpamAssassin:
http://forums.speedguide.net/showpost.php?p=1370373&postcount=46
-Shelby Moore
http://AccuSpam.com
accuspam
07-29-04, 11:34 PM
As promised, we have increased the size of the spam emails that the free version will block to 256KB from 32KB.
See prior posts from us in this thread, especially the post immediately above this one, as to why this is important.
For now, this is not limited to 30 days use of free version. If we must change it to a 30 day revert to 32KB, then free version users will get a notice before the 30 day period has expired. At some point in the future, we will probably use such a notice to urge free users to upgrade to the superior paid version.
Our intention is that the free version will always remain free though, for people who are satisfied with less than 100% spam blocking.
-Shelby Moore
http://AccuSpam.com
thepieman
07-30-04, 04:21 AM
So, basically, you're stating that because you do not look at email content, you are immune from spammers.
So how do you block spoofed From addresses (to a valid domain), people who run off from free email sites (hotmail, yahoo, etc), people who buy domains just to run an SMTP server from; and still manage to detect the perhaps .001% of users who actually WANT to get mail from, say, xxxhotb4b3s.com?
Statistically, if you're basing your assumptions on NHST (Null Hypothesis Significance Testing), you HAVE TO HAVE some sort of error level. Either Type 1 or Type 2 (False positives [calling non-spam spam], or false negatives [calling spam non-spam]) errors. If your alpha level is, say, .00001%, and your null hypothesis is that "this mail is spam", then you have an INCREDIBLY SMALL chance to correct a mistake if your software calls a mail spam (say the return email hits their server during a major network outage).
How do you avoid situations like that?
Paft...I thought you were a teenager. Forgive me if I'm wrong. Your parents got an extremely intelligent kid on their hands. Amazing.
:thumb:
Pie
accuspam
07-30-04, 04:41 AM
=======
UPDATE#2: I have just made a new post with an even easier and 100% effective way to defeat Bayesian, as compared to what I originally wrote in this post:
http://forums.speedguide.net/showpost.php?p=1386422&postcount=126
=======
Does anyone else have any experiences to share about using competing anti-spam?
I got 15 spams from last night (12 hours) in my Inbox from my BrightMail.com protected Earthlink account. I get 0 spams in my Inbox on my AccuSpam protected email account.
Worse, 2 of the spams BrightMail failed to block were phishing scams from "Earthlink" asking me to give my password and credit card data.
And 2 others were phishing scams posing as "SunTrust" bank.
Since I consider BrightMail to be the best competitor to AccuSpam (over long term as AccuSpam gains membership and we refine it), I thus consider this to be very relevant. BrightMail is currently offered on most email accounts from many major ISPs such as MSN, Earthlink, Comcast, etc..
The paid version of AccuSpam blocks 100% from Inbox!
I still need to see more "spam of the future" before I can say with certainty that SpamAssassin will degrade further. SpamAssassin is probably the main realistic competitor to AccuSpam (agree with Chris on that...even my Mom's ISP and my host Pair.com uses it apparently):
http://forums.speedguide.net/showpost.php?p=1370373&postcount=46
http://forums.speedguide.net/showpost.php?p=1374076&postcount=66
But I did see some spams recently that were almost entirely an image, and some that were empty emails. Bayesian (SpamAssassin's main algorithm) is useless against both (unless some other SpamAssassin heuristic rule flags them... and possibly also deletes the non-spams!).
The spam of the future which will defeat Bayesian is really simple to do, as I will explain below. I am surprised more spammers do not do it. I guess they will when Bayesian has become more popular.
Most Bayesian anti-spam filters look at the top 10 or so "most influential words", meaning the words with spam or non-spam probabilities farthest from 0.5. So all a spammer has to do is select words randomly from a dictionary and use the key words (such as "viagra" or "click here") only once.
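A minimal sketch of that scoring scheme, in the style of the widely described "most interesting tokens" approach, not the code of SpamAssassin or any other product. The word table, the 0.4 default for unknown words and the function name are all assumptions made up for illustration.

```python
from math import prod

# Hypothetical per-recipient table: probability that a mail containing the
# word is spam. Values are kept inside (0, 1) so the products never hit zero.
SPAM_PROB = {
    "viagra": 0.99, "click": 0.95, "here": 0.80,      # spam-leaning words
    "meeting": 0.05, "invoice": 0.10, "paft": 0.02,   # this user's ham words
    "deadline": 0.08,
}


def spam_score(tokens, spam_prob=SPAM_PROB, top_n=10, unknown=0.4):
    """Combine the top_n words whose probability is farthest from 0.5."""
    probs = [spam_prob.get(t, unknown) for t in set(tokens)]
    interesting = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    spam = prod(interesting)
    ham = prod(1.0 - p for p in interesting)
    return spam / (spam + ham)


plain = ["viagra", "click", "here"]
# Same message padded with words that happen to be strong ham words for this
# recipient, plus one word the filter has never seen.
padded = plain + ["meeting", "invoice", "paft", "deadline", "zebra"]

print(f"plain:  {spam_score(plain):.3f}")
print(f"padded: {spam_score(padded):.3f}")
```

In this toy table the padding pulls the score from near 1 to well under 0.5, but only because the padding words are ones the recipient's own legitimate mail uses. Words that are neutral or unknown to the filter contribute little or nothing.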
That will defeat Bayesian, but SpamAssassin also has some other rules which could trigger, such as "too many words used only once", so the clever spammer will repeat the words. However, a future grammatical analysis rule could still trigger, so it is best for the spammer to randomly select sentences from actual written materials from random sources (encyclopedia, etc), as this will have the best chance of randomly landing on correlations to a user's particular non-spam probabilities. I have actually seen at least one spammer doing this, and his spam almost always gets by BrightMail and I bet also past SpamAssassin. Other spammers will copy that eventually, I bet.
When spammers start defeating Bayesian this way, then not only will their spam not get caught, but as users mark these emails as spams and feed back into the Bayesian probabilities, then due to the volume of the spam versus the legitimate email, the Bayesian will begin to classify many normal words as having high spam probabilities and then more and more legitimate email will get falsely blocked.
As long as spammers are pretty stupid, e.g. repeating the spam words, using all kinds of abnormal text and tricks such as periods in middle of words, using non-words, etc, then Bayesian will work very, very well and continue to improve.
It will only take a few spammers (to start a trend) who actually have a clue about how Bayesian works to do what I wrote above and destroy Bayesian as an anti-spam algorithm.
UPDATE:
Apparently one person has proven that the technique I described above will defeat Bayesian filters:
http://news.bbc.co.uk/1/hi/technology/3458457.stm
Additionally, I do not believe it is necessary for the spammer to get feedback on which random words help the spam get past each recipient's Bayesian filter. I agree that this would be too much effort for spammers to exploit. Each recipient's filter will have different non-spam word probabilities.
Instead as I wrote above, all the spammer has to do is insert huge volumes of random words in the spam, and then the chance increases that one or some of those random words will correlate to the "non-spam" (closest to Bayesian probability of 0) words of the filter.
Also the spammer would have better results by inserting random words which are least used in normal writings, since it is the words that are used in that recipient's non-spam, but not in recipient's spam, which will trigger the Bayesian filter to think the email is non-spam. So words like "this" and "that" are useless (have probabilities near to 0.5, not indicating spam or non-spam) and are thus ignored by Bayesian filters.
Again the spammer needs to insert those words more than once (as SpamAssassin has a special rule to detect many words used only once), and it would be even better to insert real phrases (or use madlib techniques to generate plausible sentences from the random words), so that grammatical analysis cannot identify the spam.
The spammer would insert these at the end of the spam. They would not be part of the spammer's message, just extra text designed to get past many Bayesian filters.
What will then happen is that as recipients retrain their Bayesian filters to catch these spams which are not being caught, those non-spam words become neutral words, and then the false positive rate of their Bayesian filter will increase, or the false negative rate will increase, or (probably) both. In other words, the Bayesian filter will block more non-spams and block fewer spams. The effectiveness will deteriorate, and depending on how well spammers implement these techniques, it is very possible that Bayesian will become much worse than using no filter at all.
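The drift described above can be sketched with a simple frequency-ratio estimate of a word's spam probability. This is a hedged illustration: the formula is one common simplification, and every count below is invented.

```python
def token_spam_prob(spam_hits, ham_hits, n_spam, n_ham):
    """Estimate P(spam | word) from how often the word appears in each corpus."""
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    return spam_freq / (spam_freq + ham_freq)


# Hypothetical word "invoice": common in this user's legitimate mail,
# almost never seen in the 1000 spams trained so far.
before = token_spam_prob(spam_hits=1, ham_hits=50, n_spam=1000, n_ham=500)

# The user then marks 200 padded spams as spam, each containing "invoice".
after = token_spam_prob(spam_hits=201, ham_hits=50, n_spam=1200, n_ham=500)

print(f"before retraining: {before:.3f}")
print(f"after retraining:  {after:.3f}")
```

With these numbers the word moves from a strong non-spam indicator to neutral or mildly spam-leaning, which is how legitimate mail containing it starts to look suspicious.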
Again the techniques that spammers currently use such as trying to obscure the spam words using zero 0 instead of capital O, just help Bayesian to identify the spam, since such techniques are almost never used in non-spam.
Once the spammers become a little less stupid, then Bayesian is dead.
Note the following technique will NOT work well for spammers, because it is training on one set of "good" emails, and "good" emails are different for every recipient, which is one of the strengths of Bayesian filtering. Instead the spammer must use huge volumes of randomness to automatically get hits on each recipient's non-spam filter database:
http://www.spamsolution.org/filterdefeat.html
However, I agree with the author's premise about the weaknesses of Bayesian, as expressed in the link above and more eloquently in this FAQ:
http://www.spamsolution.org/faq.htm
"...The most significant problem is in dealing with "mind changes." What I might consider spam this month I may not consider spam next month..."
-Shelby Moore
http://AccuSpam.com
I have another question.
If you want an algorithm with a 100% effectiveness rate and 0% spam lost, why don't you just do this:
Add a mail filter that states that unless a certain code (user-defined) is in the subject, it will be treated as "unknown" or "spam" (depending on global spam settings). The unknowns filter into the daily summary as always, but that pass-code gets the name put directly onto the approved senders list.
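A minimal sketch of the passcode rule as proposed here. The function name, the "[k9]" passcode and the example addresses are hypothetical; this is not an AccuSpam feature.

```python
def classify(sender, subject, approved_senders, passcode):
    """Return "deliver" or "unknown" for one incoming message.

    A sender who includes the user's passcode in the subject is added to the
    approved list, so later mail from that sender no longer needs the code.
    """
    if sender in approved_senders:
        return "deliver"
    if passcode and passcode in subject:
        approved_senders.add(sender)
        return "deliver"
    return "unknown"  # would go to the daily summary (or spam, per settings)


approved = set()
first = classify("friend@example.com", "lunch? [k9]", approved, "[k9]")
second = classify("friend@example.com", "lunch again", approved, "[k9]")
stranger = classify("someone@example.net", "hello", approved, "[k9]")
print(first, second, stranger)
```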
It seems like that would be a little easier for everyone involved than reading through, what was Tony's, 288 spam headers in the daily summary.
Just 2cp.
accuspam
07-30-04, 03:03 PM
...unless a certain code (user-defined) is in the subject, it will be treated as "unknown" or "spam" (depending on global spam settings...
It is unworkable because (rhetorically) how do you get NEW senders (those not already in Approved Senders list) to insert the passcode?
However, this idea is useful as a way to email yourself, although we do not yet support it because it is not any simpler than emailing your SECRET email address:
http://www.accuspam.com/faq.php#as_self
See the next post from us for a solution to the 288 (out of 2000+) spams in the Daily Summary.
accuspam
07-30-04, 03:33 PM
Enabled the statistical blocking by domain based on the statistics of the user (the sharing of stats between users will not take effect until we have more AccuSpam users) who has replied to the Daily Summaries and built up a list of disapproved senders.
Expect to see for example Tony's 288 summaries decrease by 50% and continue to decrease over the coming weeks. How fast or how much it will decrease, I can not entirely predict, but based on trends I would not be surprised if Tony gets 80 to 90% reduction in that 288. So maybe he will end up with 30 - 60 spams in daily summary out of 2000+ spams per day incoming. That could be 98 - 99% spam detection rate, 100% spam blocking from Inbox, and hopefully still a very low false positive rate (in the range of 1 in 10,000 to 1 in million). This would exceed any other anti-spam performance in the market.
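The arithmetic behind those figures, as a small sketch. It assumes exactly 2000 incoming spams per day; the post says "2000+", so the real rate would be somewhat higher than what this prints.

```python
def detection_rate(unclassified_per_day, incoming_spam_per_day):
    """Fraction of incoming spam deleted automatically (never shown to the user)."""
    return 1.0 - unclassified_per_day / incoming_spam_per_day


INCOMING = 2000  # assumed lower bound taken from the post

for remaining in (30, 60):
    rate = detection_rate(remaining, INCOMING)
    print(f"{remaining} left in summary -> {rate:.1%} detected")
```

At exactly 2000 per day, 30 to 60 remaining summaries works out to roughly 97 to 98.5%; the 98 to 99% range quoted above needs a daily volume somewhat above 2000.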
Other AccuSpam users who are replying correctly to their Daily Summaries can expect similar improvement.
Also we have significantly refined the clarity and brevity of the Daily Summary.
On the horizon, we have the global statistical blocking to kick in as more users join AccuSpam, to further increase the spam detection rate and hopefully further raise AccuSpam above the other anti-spam options in the market.
accuspam
07-30-04, 11:41 PM
Enabled the statistical blocking by domain based on the statistics of the user alone (the sharing of stats between users will not take effect until we have more AccuSpam users) who has replied to the Daily Summaries and built up a list of disapproved senders.
Expect to see for example Tony's 288 summaries decrease by 50% and continue to decrease over the coming weeks...
We are very happy to report that there are only 66 summaries pending in Tony's next Daily Summary and it has been 8 hours since last Daily Summary was sent and only 4 more hours until next one is sent. Thus via extrapolation, we see an approximate reduction from 250+ summaries per Daily Summary to 100 summaries. This is as we expected (see quote above) and we expect this to continue to decrease as Tony builds more stats by replying to the Daily Summaries.
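The extrapolation above is a straight-line projection. A sketch of the arithmetic, using only the numbers given in the post (66 summaries after 8 hours of a 12-hour period, compared with 250 before the change):

```python
def extrapolate(count_so_far, hours_elapsed, period_hours):
    """Project a count linearly to the end of the reporting period."""
    return count_so_far * period_hours / hours_elapsed


projected = extrapolate(count_so_far=66, hours_elapsed=8, period_hours=12)
reduction = 1.0 - projected / 250  # 250 = lower end of "250+" before the change

print(f"projected summaries: {projected:.0f}")
print(f"reduction vs 250:    {reduction:.0%}")
```

This assumes spam arrives at a steady rate through the period, which real traffic may not do.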
As well, all AccuSpam users should see similar improvement.
We have another idea which we will implement by Tuesday, which should ensure that no one ever sees more than, say, 50 summaries in their Daily Summary, while ensuring that any not shown in the Daily Summary are very likely (hopefully still 1 in 10,000 to 1 in million) to be spam. This will be needed only while these stats build up and summaries reduce to near 0 over time.
So far AccuSpam is working really well. The current main challenge is to reduce the number of spams which are summarized in the Daily Summary, so that it is easier and less work to find non-spams there. We know that over time, the spam summaries decrease due to this building of statistics. The challenge is how to get new users over the hump while they build stats, so they do not lose interest before the spam summaries decrease.
As stated above, we have several ideas to limit the # of summaries and make the initial statistical training more palatable.
Realize that as the # of AccuSpam users grows, then the sharing of stats between users will take effect and then initial training will not be necessary, as the other users' stats will instantly take effect for new users.
We are in sort of a "chicken and egg" (neither can occur if other doesn't come first) quagmire, but we have ideas to improve further coming by Tuesday...
What email address do you Login to AccuSpam.com with? I want to study your account metrics.
It seems like you get quite a low amount of spam? If yes, it is probably true that your ISP is filtering some of your spam. And then that would probably explain why that particular email was lost. Your ISP probably filtered it when AccuSpam resent it back to your mailbox.
If your ISP is not filtering spam, then maybe you have some spam filters enabled in your email program?
ISPs that silently filter legitimate email (even rarely) are one of the motivations I had for creating AccuSpam. I was sick of seeing our business and support email being incorrectly deleted by inaccurate spam filters.
Any way, try the suggestion I gave in previous post so we can isolate that AccuSpam is not the cause.
-Shelby Moore
http://AccuSpam.com
I have found a possibility in regards to my not getting newly approved emails within 5 minutes (or ever), after replying to the daily summaries. I often check my email through Comcast's web interface. When I approve a sender via the daily summary, from Comcast's web interface, I do not get the email mentioned in the daily summary. When I approve a sender via the daily summary using my home POP client, I believe I am getting the expected result. Comcast does have a spam filtering (screened) process. I can only see this through their web interface. Copious amounts of spam are in that "screened" category. My email is iaustin AT comcast dot net
Sorry guys......I gave up on your service after a couple days of use.
Too much work to reply to your emails in the Daily Summary Of Possible Spam.
It was also catching legitimate emails and classifying them as spam. I do not have the time to go through an approved senders list every single time I have a new sender of some sort.
I use Spambayes...the free open source Outlook plugin. I have been using it for a year now and it's worked flawlessly. It has NEVER classified a legitimate email as being spam. It's easy to use and it sits right in Outlook as a toolbar.
Sorry guys, I tried it but it's not for me.
Now how do I get my legitimate mail off your server? Since disabling your service, I can't get my legitimate mail.
accuspam
07-31-04, 02:51 PM
Sorry guys......I gave up on your service after a couple days of use.
Too much work to reply to your emails in the Daily Summary Of Possible Spam.
Thanks for the feedback. We knew that already. I do not know if you've been following this thread, but we just reduced that load by 50% 12 hours ago, and plan another 50% reduction (to bring it to 25% of the original load) by next week.
How many spams do you receive a day with no filter?
How many spams a day were you receiving in the Daily Summaries? (assuming you were replying every day to clear them out which from below is apparently not true so then this question would be one you can not answer)
My point is, let us get a ratio here so we can compare apples to apples. It looks like before our improvement 12 hours ago, AccuSpam was only placing 20 - 40% of spam into the Daily Summaries. Apparently now we have reduced that to 10 - 20%, and by next week it should be reduced to roughly 5 - 10%. That is approaching the performance of Bayesian (typically ranges from 95 - 98%), without the risk of getting a spam in your Inbox as you do with Bayesian (AccuSpam blocks 100% from Inbox by using the Daily Summary), and without the risk of classifying a legitimate email as spam 0.5% of the time (e.g. 1 in 200 emails).
In other words, with Bayesian 2 - 5% of spam gets through to Inbox and you have to find your legit email mixed with that. And 1 in 200 or 0.5% of non-spam (and potentially much worse at any time in future, see link below) goes into your Spam folder where you will be searching for it among 95 - 98% of the spam, unlike in AccuSpam where any new senders (not yet on Approved Senders list) will only be buried in 5 - 10% spam (by next week), eventually to be reduced to near 0% spam.
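To put the comparison in absolute numbers, here is a small sketch of how many spams share the pile in which a misplaced or unclassified legitimate email would have to be found. The 1000 spams per day is an assumed volume; the percentages are the ones claimed in the post, not independent measurements.

```python
def haystack(spam_per_day, percent_in_pile):
    """Number of spams in the pile the user must scan to find a legit email."""
    return spam_per_day * percent_in_pile // 100


SPAM_PER_DAY = 1000  # assumed volume for illustration

# Bayesian: misclassified legit mail sits in the Spam folder (95 - 98% of spam).
bayesian_spam_folder = [haystack(SPAM_PER_DAY, p) for p in (95, 98)]
# AccuSpam target: mail from new senders sits in the Daily Summary (5 - 10%).
daily_summary = [haystack(SPAM_PER_DAY, p) for p in (5, 10)]

print("Bayesian spam folder:", bayesian_spam_folder)
print("Daily Summary target:", daily_summary)
```

The comparison only holds if the user actually needs to look in the Bayesian spam folder, which depends on the false positive rate being argued about in this thread.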
http://forums.speedguide.net/showpost.php?p=1374288&postcount=69
So I do not think Bayesian is even close long-term. I expect you may be back when Bayesian gets hammered by spammers as I predict (ask Tony about my past record of uncanny ability to predict future).
Actually we are seeing the usership of AccuSpam continue to increase :) The cancellations so far are only about 10% of signups, which is a reasonable ratio in any marketing campaign. Actually I doubt my other businesses, which already generate income, do that well in terms of conversion of signups to retained users.
It was also catching legitimate emails and classifying them as spam.
Absolutely false! You misunderstood what the Daily Summary is. It is a list of emails that AccuSpam could not classify and is asking you to classify. In fact we have changed the subject to make it more clear:
"Twice Daily Summary: xx Possible Non-Spams Waiting Delivery or Deletion"
The important thing to analyze is what % of spam is in these summaries. Because if it is as I claim, 10 - 20% (now, and to be 5 - 10% by next week), then this is far superior in terms of finding email from new senders there, compared with finding misclassified emails with Bayesian buried in 95 - 98% of the spam in the Spam folder.
I think probably you stopped responding to the Daily Summaries back when they were 40% of spam and let them accumulate, then they got so backlogged that finding your new senders became impractical. First of all, the 40% has been cut in half at least and will be halved again. Second of all, we realize we can not just let the summaries accumulate if you stop responding, and we will soon automatically delete these older items for you and not show you more than perhaps 30 emails in the Daily Summary. We will accomplish this by using Bayesian or other techniques to rank the emails in the Daily Summary. In short, we will take the best of Bayesian and other anti-spam and combine it with the best of AccuSpam, to eliminate the weakness of each. Bayesian alone, won't be able to come close.
I do not have the time to go through an approved senders list every single time I have a new sender of some sort.
You do not need to. You simply put an "A" in the [ A ] box next to their email in the Daily Summary. You simply scan the Daily Summaries. You never need to go look at your Approved Senders list.
You apparently misunderstood the point of the Daily Summary. You thought it was the caught spam. The caught spam is gone. It was deleted and you never saw it. That was (or is now) 80 - 95% of the spam deleted automatically. The Daily Summary is the unclassified email.
I use Spambayes...the free open source Outlook plugin. I have been using it for a year now and it's worked flawlessly. It has NEVER classified a legitimate email as being spam. It's easy to use and it sits right in Outlook as a toolbar.
Sorry guys, I tried it but it's not for me.
Now how do I get my legitimate mail off your server? Since disabling your service, I can't get my legitimate mail.
First of all, sorry is not needed. We provide a service for those who need it. Those who don't need it, need not apologize for their lack of need. Later you may need it, because I predict Bayesian is going to turn into a morass once spammers figure out my easy instructions to subvert it:
http://forums.speedguide.net/showpost.php?p=1374288&postcount=69
Bayesian is known to be one of the worst offenders in terms of misclassifying legitimate email as spam. Go back to page 1 of this thread and read the link to the research paper Tony and I both provided which shows this. Also, since your misclassified legitimate email gets buried in the 95+% spam in the spam folder, you probably would not know if email is getting misclassified, so it seems like NEVER. Maybe one day you will realize there is some important email that was lost. Or if I am wrong about that and Bayesian is truly perfect for you, the fact that your current spam happens to have PERFECTLY different word statistics from your current non-spam is no assurance of future performance. Caveat emptor!
If you have disabled AccuSpam, then all email that arrives at your mailbox is no longer analyzed or touched by AccuSpam. As for email that was in the Daily Summary, you needed to reply to the Daily Summary BEFORE you disabled. I had posted that here several times already.
Sounds to me like you never followed the instructions of replying to the summaries, so that is why AccuSpam did not work for you.
Also AccuSpam works with any email program and any POP3 email address mailbox. Your solution ties you to Outlook. Also by using that Outlook plugin instead of AccuSpam, all your spam is being downloaded to your computer BEFORE it is deleted, which means:
1. Virus files are getting downloaded.
2. As spam increases, the time to download your email increases.
3. With a viral spreading spam that sends 1000s of copies of large attachments to your mailbox, you might not even be able to get your email or your mailbox could overflow and ALL your legit email could be lost.
4. Since Bayesian is not 100%, some of those viruses are getting into your Inbox.
5. The other 95 - 98% of viruses are in your spam folder ON YOUR COMPUTER!
Etc, etc. etc.
-Shelby Moore
http://AccuSpam.com
accuspam
07-31-04, 04:04 PM
Let's analyze further why AccuSpam is superior to Bayesian anti-spam.
Users seem to like Bayesian anti-spam filtering, because with training, it can typically achieve spam deletion rates between 95 - 98%. This of course depends on the fact that spammers currently do not use the very easy trick to fool Bayesian which I detailed:
http://forums.speedguide.net/showpost.php?p=1374288&postcount=69
Comparing Bayesian on its current (not factoring a bleak future) 95 - 98% deletion rate, we have to recognize two facts about the way Bayesian works:
1. It must constantly be retrained unless your non-spam and spam remain the same.
2. To train Bayesian, when a non-spam is misclassified into the Spam folder, then you will be searching ALL the spam any way, so it is the same as using no filter at all!
So when someone says that Bayesian NEVER deletes legitimate email, what they really mean is that either a) they do not know that for sure, or b) they know that only because they are searching all their spam every day.
As long as spam content does not mutate enough to place legitimate emails in the spam folder often, then the Bayesian user can tolerate ignoring the spam folder and only retrain on spams found in the Inbox (which will be 2 - 5% of all spam typically). This works only as long as this assumption is true or true to a reasonable approximation.
So as I said, spammers are not prevented from using non-spam content. Once spammers realize this, then either you will have a much higher % of spams in the Inbox to retrain and/or a much higher % of non-spams buried in the spam folder.
So my point in comparing Bayesian to AccuSpam is that Bayesian has no future. AccuSpam is only in the first weeks of release and AccuSpam will always be unaffected by future mutations in spam content. AccuSpam currently does not even look at the content of emails it analyzes.
As we continue to tweak AccuSpam, we will approach the same spam deletion rates as Bayesian very soon (we are already at 80 - 90% to be 90 - 95% by next week hopefully) and eventually equal or exceed Bayesian in spam deletion rate. And AccuSpam will always have the advantage of blocking 100% spam from the Inbox.
And with AccuSpam, no matter how spam morphs, the user will never have to train on more than a few % of spam at most (as we reach and maintain our peak performance in coming weeks). Whereas with Bayesian, there is a real risk that the statistics can collapse, because there is nothing stopping a spammer from using non-spam content to fool Bayesian.
Whereas AccuSpam is detecting spam based on an immutable characteristic of spam, and that is that spam is sent in large quantities. If the spammers do not send in large quantities, then spam would not be a problem any more. Whereas spammers could change to using "non-spam" content as I detailed and defeat Bayesian:
http://forums.speedguide.net/showpost.php?p=1374288&postcount=69
For me, I cannot tolerate investing all that time to train Bayesian, only to have it wiped out overnight if spammers switch to using non-spam content, meaning content that statistically (Bayesian) overlaps your non-spam content. Remember what makes spam annoying is not its content, but the fact you get so much of it. So changing the content will not make spam any less spam. Whereas, decreasing quantity of spam sent in order to defeat AccuSpam would render spam no longer annoying.
-Shelby Moore
http://AccuSpam.com
Which brings me to this question:
Global blacklists. How is what you are doing any different than telling your POP server software to check the sending domain against ordb.org or some other blacklisting server, as well as keeping a private list of disallowed domains that the user has on top of that? The ordb is a "global" list, and the private list is user-only. Plus whitelisting.
Basically, that would be just as effective as what you have.
IF new_mail != spam.user AND new_mail != spam.ordb OR new_mail == mail.whitelist THEN send_mail(); ELSE delete_mail();
Or is that what you're doing, in the most simplistic terms?
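The one-line rule above can be written out as a short Python sketch. The function and list names are made up for illustration, and the domain sets are placeholders rather than real ORDB data. With the usual precedence (AND binds tighter than OR) the rule reads "whitelisted, or on neither blocklist".

```python
def should_deliver(sender_domain, user_blocklist, global_blocklist, whitelist):
    """Deliver if whitelisted, or if the domain is on neither blocklist."""
    if sender_domain in whitelist:
        return True
    return (sender_domain not in user_blocklist
            and sender_domain not in global_blocklist)


user_blocklist = {"spam.example"}
global_blocklist = {"openrelay.example"}   # stand-in for a public list lookup
whitelist = {"openrelay.example"}          # user explicitly trusts this sender

print(should_deliver("friend.example", user_blocklist, global_blocklist, whitelist))
print(should_deliver("spam.example", user_blocklist, global_blocklist, whitelist))
print(should_deliver("openrelay.example", user_blocklist, global_blocklist, whitelist))
```

Note that the whitelist wins over both blocklists here, which is what the original expression implies.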
My 2 cents re global blacklists:
Blacklists are good in that they can stop almost all spam, but they are flawed in that they stop legit mail too. For instance, it does not take a volume of UBE to have one's domain end up on a blacklist. All it takes is ONE irritated or pissed off customer, or some guy who "thinks" he knows the real source of a spam message, to get someone's legit domain on a blacklist.
And once your domain is erroneously put on a global blacklist like ORDB, then good luck getting it removed! Some of these blacklist maintainers charge a fee to have a legit domain removed from the blacklist. After all, that's how they pay for their own hosting!
Bottom line is that there are absolutely NO accurate or honest blacklists.
Let's say some spammer sends out 5 mil messages with a spoofed From: address of @smithfamily.com (and let's say smithfamily.com is a legit site). Now, some user somewhere gets a hundred spams that appear to be from smithfamily.com and then a hundred other users get the same. Eventually smithfamily.com ends up on blacklists. And Mr. & Mrs. Smith now wonder why their friends can no longer receive their newsletters of family announcements! And Mr. Smith now tries to get his domain removed from the blacklist and is told that (1) it cannot be removed or (2) it can be removed for a penalty fee.
Mr. Smith is now penalized, punished and extorted because some spammer spoofed his domain and some other criminal maintains a blacklist.
The ONLY way to maintain and manage a blacklist with judgement is to do it manually. This is too costly for any business to do.
Also AccuSpam works with any email program and any POP3 email address mailbox. Your solution ties you to Outlook. Also by using that Outlook plugin instead of AccuSpam, all your spam is being downloaded to your computer BEFORE it is deleted, which means:
-Shelby Moore
http://AccuSpam.com
I think you need to brush up on SpamBayes because it is compatible with a lot more than you think...even AOL mail. Exchange, POP3, IMAP...etc. are all supported.
http://spambayes.sourceforge.net/faq.html#compatibility
But thanks for the reply I think....I just wanted to retrieve my legit email. Good luck in your new endeavor.
Just one piece of advice. I would try to keep your instructions as simple as some of the spam that you are trying to avoid or else it will be easier to just read and delete spam. KISS.
My 2 cents re global blacklists:
Blacklists are good in that they can stop almost all spam, but they are flawed in that they stop legit mail too. For instance, it does not take a volume of UBE to have one's domain end up on a blacklist. All it takes is ONE irritated or pissed off customer, or some guy who "thinks" he knows the real source of a spam message, to get someone's legit domain on a blacklist.
And once your domain is erroneously put on a global blacklist like ORDB, then good luck getting it removed! Some of these blacklist maintainers charge a fee to have a legit domain removed from the blacklist. After all, that's how they pay for their own hosting!
Bottom line is that there are absolutely NO accurate or honest blacklists.
Let's say some spammer sends out 5 mil messages with a spoofed From: address of @smithfamily.com (and let's say smithfamily.com is a legit site). Now, some user somewhere gets a hundred spams that appear to be from smithfamily.com and then a hundred other users get the same. Eventually smithfamily.com ends up on blacklists. And Mr. & Mrs. Smith now wonder why their friends can no longer receive their newsletters of family announcements! And Mr. Smith now tries to get his domain removed from the blacklist and is told that (1) it cannot be removed or (2) it can be removed for a penalty fee.
Mr. Smith is now penalized, punished and extorted because some spammer spoofed his domain and some other criminal maintains a blacklist.
The ONLY way to maintain and manage a blacklist with judgement is to do it manually. This is too costly for any business to do.
You misunderstand the purpose of sites like the ORDB. The Open Relay DataBase. They just - ONLY - scan for open relays that users submit. The kind most often used by spammers. And if it is found, then the domain owner fixes the problem and rescans to get it removed.
I don't see that as a bad thing.
accuspam
07-31-04, 09:46 PM
I think you need to brush up on SpamBayes because it is compatible with a lot more than you think...even AOL mail. Exchange, POP3, IMAP...etc. are all supported.
http://spambayes.sourceforge.net/faq.html#compatibility
I stand by my previous statement that SpamBayes is only going to work with Outlook. Otherwise you will not be able to train the Bayesian in any near efficient manner (using the toolbar you alluded to).
If you can manage to install and configure SpamBayes for a POP3 account, which requires installing Python and numerous other steps which 99.9% of users will never be able to figure out (compare this with the instant signup of AccuSpam), it will still require you to copy and paste or do individualized special actions (no SpamBayes in your toolbar in other email applications) for each spam that arrives in your Inbox (2 - 5% of all your spam) and any non-spam that goes into the spam folder. It also requires configuring your email client to filter into folders based on the header inserted by SpamBayes. Imagine if you get 1000 spams a day: you will be doing this manual action 20 - 50 times per day (as separate actions, because there is no way to select all and click a toolbar if not using the Outlook plugin).
The Yahoo support is nothing near anything a normal user can configure. It requires dealing with a source code project for providing POP3 access for Yahoo. Then you've still got to do all the POP3 configuration mentioned above.
So realistically speaking, you are limited to Outlook with SpamBayes and SpamBayes will always have all the disadvantages of downloading all the spam and viruses to YOUR COMPUTER, whereas AccuSpam deletes and quarantines at the server.
I realize that SpamBayes claims IMAP support, but very few people have email accounts AND email programs with IMAP support.
...I would try to keep your instructions as simple as some of the spam that you are trying to avoid or else it will be easier to just read and delete spam. KISS.
Agreed, and we have simplified the instructions in the Daily Summaries. You turned it off BEFORE we had improved things (reduced spam in summaries from 20 - 40% down to 10 - 20% and improved the format and brevity).
However, I disagree that any amount of static instructions we might have (even if more than 1 page), which users only need to read once, could compete with the effort to download and manually delete 100s or 1000s of spams per week.
As I said in previous post, all that matters for users like you apparently is perception:
http://forums.speedguide.net/showpost.php?p=1368469&postcount=32
And you perceived that you were "dealing with" more spam in the early version of AccuSpam you tried as compared with your previous experience with SpamBayes. Again I think this was a confluence of:
1. Not understanding what the Daily Summaries are and how to use them.
2. Thus not using AccuSpam correctly.
3. Trying it before we had drastically cut the spam in the summaries.
4. That you do not seem to perceive the big risk of Bayesian (your perception is there is no risk when in fact there is big risk).
However, I agree with you that if I had a choice between browsing for new senders within 20-40% of my spam (when you gave up on AccuSpam early version), or using Bayesian, then I would choose Bayesian. But we are at 10-20% now for the daily summaries, and we will be at 5 - 10% hopefully by next week. And at < 5% within a month I hope. At that time, there will be no advantage to Bayesian, and we will not even have a perception problem any more.
Within a month, it will be simple. You will see the same or less spam in the single daily summary (no spam in Inbox) than you get as separate spams in your Inbox with Bayesian. And with AccuSpam, your legitimate email will never (1 in million) be silently deleted or buried in your 95+% deleted spam folder. With AccuSpam, your email from previous senders goes directly into your Inbox without risk, and email from new senders sits within the < 5% spam in the Daily Summary. Compare this to the very real risk with Bayesian that legitimate email could be buried in the 95+% deleted spam, so in effect you need to regularly browse ALL THE SPAM with Bayesian.
Thanks for the feedback and advice.
-Shelby Moore
http://AccuSpam.com
accuspam
07-31-04, 10:13 PM
What AccuSpam is doing is very, very different than filtering using public blacklists.
First, AccuSpam is deleting email from non-existent and forged senders.
Second, AccuSpam is detecting (by overload of the mailbox with confirmations) the use of free accounts to send huge volumes of emails from same sender.
Third, AccuSpam is **STATISTICALLY** detecting which senders and domains are blacklisted.
Thus the false positive rate of public blacklists does not apply to AccuSpam, because AccuSpam sets a **STATISTICAL** confidence. In essence, AccuSpam's "blacklist" is dynamic and maintained automatically by statistics.
So Tony, you are incorrect that a blacklist can not be automated, as that is essentially what the third aspect of AccuSpam is. There are other attempts at statistical collaborative blacklisting such as Vipul's Razor (Cloudmark.com). I covered the differences with AccuSpam in previous post:
http://forums.speedguide.net/showpost.php?p=1367004&postcount=20
But it is correct that most (if not all) public blacklists have very high false positive rates (delete non-spam) if used as the sole metric for deleting spam. AccuSpam does not have these problems because it is *STATISTICALLY* maintained, not manually.
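The post does not say how the statistical confidence is computed (the method is deliberately kept vague further down the thread), so the following is only a minimal sketch of one common way to blacklist "to a chosen confidence". It assumes user reports per sender domain are available as counts, and it uses the lower bound of the Wilson score interval; the function names, the 99% z-value, and the 0.9 threshold are all illustrative assumptions, not AccuSpam's actual algorithm.

```python
import math


def wilson_lower_bound(spam_votes: int, total_votes: int, z: float = 2.576) -> float:
    """Lower bound of the Wilson score interval for the true spam fraction.

    z = 2.576 corresponds to roughly 99% confidence (an assumed setting).
    """
    if total_votes == 0:
        return 0.0
    p = spam_votes / total_votes
    denom = 1 + z * z / total_votes
    centre = p + z * z / (2 * total_votes)
    margin = z * math.sqrt(p * (1 - p) / total_votes + z * z / (4 * total_votes ** 2))
    return (centre - margin) / denom


def is_blacklisted(spam_votes: int, total_votes: int, threshold: float = 0.9) -> bool:
    """Blacklist a domain only when we are confident most of its mail is spam."""
    return wilson_lower_bound(spam_votes, total_votes) >= threshold


# Three unanimous reports are not enough evidence; five hundred are.
print(is_blacklisted(3, 3))
print(is_blacklisted(500, 500))
```

The point of using a lower bound rather than the raw fraction is that a handful of reports cannot blacklist a domain, which is one way a statistically maintained list could avoid the false positives of a manually maintained one.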
Although ORB is not a blacklist per se, it can be used as one or as a weighted metric in anti-spam and thus it also can generate a lot of false positives, because the fact that someone sends an email over an open relay does not definitely mean they are spammer, although it is pretty much that way now due to the "slash and burn" collateral damage of open relay lists being used to fight spam.
-Shelby Moore
http://AccuSpam.com
Ok...I will give it another shot and a more serious look through when I have time. Don't get me wrong, what you are doing is great but it's not what I'm used to.
Fortunately for me, I don't get much spam to begin with because I really don't surf too many questionable sites, and I use Mozilla. I also take advantage of Spybot S&D and Ad-aware. I also am an avid user of Norton (Corporate Edition) for AV purposes and keep it updated daily. So for me, what I consider important is just the simple classification and categorization of spam within my inbox. When I open up my Outlook program...I go to my inbox which is free of spam....but I have two folders, a junk suspects, and a spam folder. So when I open up Outlook, all I have to do is verify what is spam in the junk folder and then delete what is in the spam folder.
But yeah...I will give it another shot.
thanks for the replies and help. :D
accuspam
07-31-04, 10:34 PM
It is interesting to quote directly from the SpamBayes training documentation:
http://cvs.sourceforge.net/viewcvs.py/*checkout*/spambayes/spambayes/README.txt?rev=HEAD&content-type=text/plain
1. "It's best to train on recent email, because your interests and the
nature of what spam looks like change over time."
2. "...you should train on a few spams and a
few hams on a regular basis. You should also try to train it on about the
same number of spams as hams."
Thus SpamBayes is recommending you:
1. Retrain often and continuously.
2. You search your entire spam folder often and continuously for the "hams", the deleted non-spams.
-Shelby Moore
http://AccuSpam.com
accuspam
07-31-04, 10:37 PM
Ok...I will give it another shot and a more serious look through when I have time...
Thanks. But please wait until I have posted here that we are indeed < 5% in the Daily Summaries. Hopefully we can reach that within a month.
accuspam
07-31-04, 10:44 PM
...Fortunately for me, I don't get much spam to begin with...my inbox which is free of spam....but I have two folders, a junk suspects, and a spam folder...delete what is in the spam folder.
I do not think AccuSpam is as attractive for people who do not get a lot of spam, unless they simply want the security of the 100% blocking or the deletion of spam at the server and other features of AccuSpam.
I doubt your experience with Bayesian is representative of someone who gets a lot of spam, because you are claiming a 100% detection rate for Bayesian, which is well documented to be not the case.
And thus you don't have much spam in your spam folder either to look for deleted non-spams.
And thus you don't have enough incoming spam to trigger the false negatives and false positives that are known to occur with Bayesian.
Most AccuSpam users are getting 100s of spams a day.
In short, you are lucky.
(not many spams, and thus your spams rarely intersect your non-spams, and thus Bayesian works okay for you).
I must be missing the entire focus of accuspam. I have read all the comments and I cannot figure out for the life of me what you are doing to filter out spam. Some of your keywords are confusing.
For example, you claim that, quote:
Second, AccuSpam is detecting (by overload of the mailbox with confirmations) the use of free accounts to send huge volumes of emails from same sender.
I assume that this means that you are looking at places like hotmail and such. However, how can you block the "same sender" if they can just automatically get a new account that has no stigma attached to it and start all over again? You can't block the entire @hotmail.com domain without causing problems, so what exactly is it that you're doing here? You don't read the content, so you don't know what is spam and what is not, and since I could go get 30+ hotmail addresses a day ON MY OWN, what's to stop a huge spamming company from getting 5,000+ addresses a day to spam you with? You can't filter the entire domain and you don't do content based filtering, so I am very confused.
Third, AccuSpam is **STATISTICALLY** detecting which senders and domains are blacklisted.
Through the use of the daily summaries? Where your users check off the spam and what's not the spam? What happens if your statistical analysis fails? And/or what happens if someone wanted to screw up your system, logged a whole SLEW of accounts on your server, and changed all the weights for the spam to be non-spam? A few good-sized spamming companies doing that could cost you a whole lot of credibility. And even if you could revert the changes, you'd still have that lost time and effort.
Unless you're willing to send out daily summaries filled with spam email subjects for your users to peruse? And if you truncate them like I read above you might, then won't your users who don't check email often get the "good" mail bumped off the list and the bad mail prevalent?
I must not be understanding something here, but I can't see how you are doing exactly what you are doing.. it seems like it can be defeated easily, if I understand the technology at all.
I do not think AccuSpam is as attractive for people who do not get a lot of spam, unless they simply want the security of the 100% blocking or the deletion of spam at the server and other features of AccuSpam.
I doubt your experience with Bayesian is representative of someone who gets a lot of spam, because you are claiming a 100% detection rate for Bayesian, which is well documented to be not the case.
And thus you don't have much spam in your spam folder either to look for deleted non-spams.
And thus you don't have enough incoming spam to trigger the false negatives and false positives that are known to occur with Bayesian.
Most AccuSpam users are getting 100s of spams a day.
In short, you are lucky.
(not many spams, and thus your spams rarely intersect your non-spams, and thus Bayesian works okay for you).
Yeah...I'm lucky. I like to think of my use of Spambayes as more like Spam management. Being that I don't get that nuisance level of spam that many others experience, I just want something that will throw the spam into a folder for later inspection. When I trained Spambayes...all I did was collect all the spam I had in my inbox over a weeklong period...throw it into its own folder and let Spambayes train on it. (That was about 100 spam emails total.) After that, it's been doing a pretty good job for me.
But in the future, it would be nice for me to not get any spam at all. If that happens through the use of your program/service or through legislative mandate...then all will be well in the universe lol. :D
accuspam
07-31-04, 11:06 PM
...I cannot figure out for the life of me what you are doing to filter out spam.
I am being purposely vague because our patent is not filed yet.
I assume that this means that you are looking at places like hotmail
No, we do not look for specific domains a priori. The algorithm itself will detect that a spammer is sending bulk email with the From: address set to an email account with limited storage. Note Gmail.com does present a problem with its 1GB free storage, but I am confident Google will decide to bounce our confirmations if received in bulk approaching > 10 MB, rather than let their system become a haven for spam senders.
The above should give you all the hints how it works if you read between the lines.
...What happens if your statistical analysis fails? And/or what happens if someone wanted to screw up your system, logged a whole SLEW of accounts on your server, and changed all the weights for the spam to be non-spam? A few good-sized spamming companies doing that could cost you a whole lot of credibility. And even if you could revert the changes, you'd still have that lost time and effort.
By definition the failure of a statistic is within a chosen confidence. We can easily statistically detect accounts that always approve the spam that the other users say is spam. That will be statistically obvious. Also these attempts can be found simply by the fact that they do not disapprove much diversity of spam domains.
...if you truncate them like I read above you might, then won't your users who don't check email often get the "good" mail bumped off the list and the bad mail prevalent?
The chance of a truncated email being a non-spam will be the accuracy of other anti-spam which will still be low enough (even if 1 in 200 which is horrible) that it can not affect statistics. Actually the chance will be much better because we are still showing say the 30 best chance to be non-spam, instead of only one (as how the metric is quoted), so I expect (1/200) ^ 30 = NEVER. Also in the rare case (NEVER) that truncation would delete a non-spam then remember all blacklisted are challenged by AccuSpam (instead of confirmed, i.e. it reverts to a C/R anti-spam in blacklist case), so the sender will eventually get on to the Approved Senders list.
Also the truncation will rarely happen in future, because if we truncate 30, then if you get 1000 spams per day (500 per 12 hours summary), then 30 / 500 = 6%, so we hope AccuSpam is better than 94% very soon...users catching other users spams...we are at 10 - 20% (80 - 90% deletion) now just using statistics for the user's own data.
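The two figures quoted above can be reproduced with a few lines of arithmetic. This is only a sketch that takes the post's numbers at face value; in particular it assumes the 30 ranking errors are independent events, which the post implies but does not justify. The function names are made up for illustration.

```python
def all_misranked_probability(per_message_error: float, shown: int) -> float:
    """Chance that every one of the `shown` top-ranked entries is wrong,
    assuming each is wrong independently with the same probability."""
    return per_message_error ** shown


def truncation_threshold(shown: int, spams_per_day: int, summaries_per_day: int = 2) -> float:
    """Fraction of spam per summary period that still fits in a truncated summary."""
    spams_per_summary = spams_per_day / summaries_per_day
    return shown / spams_per_summary


# (1/200) ^ 30 is on the order of 1e-69, the "NEVER" in the post.
print(all_misranked_probability(1 / 200, 30))

# 30 shown out of 500 spams per 12-hour summary is 6%, so truncation stops
# mattering once more than 94% of spam is classified before the summary.
print(truncation_threshold(30, 1000))
```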
It took us 6 months to think through this very different way of looking at spam.
Spam is not "bad content". Spam is email sent in bulk that was not requested.
This business email account is a good test account for Accuspam. Here's where it stands after about 1 week of using Accuspam:
Since you joined AccuSpam:
7089 spams deleted (most automatically)
82 emails delivered (most automatically)
98% of your email has been spam
100% of this spam has been blocked from your Inbox
Not too shabby eh?
Not too shabby eh?
Assuming you don't call the Daily Summaries "spam", it's not shabby at all.
Assuming you don't call the Daily Summaries "spam", it's not shabby at all.
I used the term "shabby" as a multiple pun, in that the overall volume of messages is extremely high and that 98% of them are spam. Also, the number of messages in my daily summary has been reduced 50% and will continue to drop. This is a unique test case business account in that all mail addressed to the domain goes to the one mailbox (*.coolpage.com). Thus the high volume of messages.
accuspam
08-01-04, 10:57 PM
Assuming you don't call the Daily Summaries "spam", it's not shabby at all.
If comparing this to, for example, Bayesian, where you might be able to claim 98% detection on avg (2% false negative rate) and 0.5% (1 in 200) false positive rate (misclassified non-spam), then you might overlook the fact that with AccuSpam the false positive rate is nearer to 0% (if user scans the Daily Summary) and that the Inbox is now 100% protected and 99.9% uncluttered. As well, the scanning for false positives in AccuSpam is within the (say 10%) spams in the Daily Summary only, and not in the entire spam folder (say 98%) as with Bayesian.
So it is more significant, than "not too shabby" compared to Bayesian and other best of breed anti-spam, if you consider all the factors I just stated in previous paragraph.
I got your point though. You are saying that the Daily Summary is delivered to the Inbox and it summarizes any email that AccuSpam could not classify, and which will be non-spam from new senders (not yet on Approved List) or spam. So your point I believe is that spam is still received in the Daily Summary.
But the Daily Summary is only one email (actually sent twice a day) and it only contains the summary of each unclassified email (e.g. sender and subject). So it is not the same as delivering spam to the Inbox, because it is one single email containing only summaries, so your Inbox is uncluttered and you have 100% protection from accidentally viewing or clicking a spam that might have a virus. And you are not downloading all those spams to your computer (time and risk of virus). In short, if the Daily Summary contains 30 spams, the number of emails due to spam in your Inbox is reduced from 30 to 1 per 12 hours.
Better yet, I am seeing that 80.1 to 90.3% of the spam in AccuSpam users mailboxes is currently being classified and deleted BEFORE the Daily Summary. So currently, only 9.7 to 19.9% of spam is being sent in the Daily Summary. But it is arriving as a single email, so it is really like 99.9% reduction in some sense.
Sometime this week (or next week at latest), I hope to implement a pure *INTERSECTION* (NOT union or semi-union like SpamAssassin) algorithm of our statistics, Bayesian, and RBLs to further classify the emails in the Daily Summary. Read here (at bottom of page linked) about the advantages of the math of INTERSECTION:
http://forum.icann.org/lists/stld-rfp-mail/msg00061.html
Note there is some indication that RBLs caught only 24% of spam with 34% false positive rate, but (to be tested) I believe perhaps by the time we've eliminated the 60 - 80% non-existent senders and domains, then we haven't eliminated any on the RBL, so RBL is really more like 84 - 99% detection. The high false positive rate of RBL is not a factor when using INTERSECTION instead of union:
http://www.paulgraham.com/falsepositives.html
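To make the intersection-versus-union argument concrete, here is a minimal sketch of how the combined rates behave if the classifiers err independently. Independence is a strong assumption made here for illustration (real filters are usually correlated), and the reading of "intersection" as "call it spam only when every classifier agrees" is an interpretation of the post, not a confirmed description of AccuSpam. The rates plugged in are the ones quoted in this thread.

```python
from math import prod


def intersection_rate(rates: list[float]) -> float:
    """Rate at which ALL classifiers flag a message (independence assumed)."""
    return prod(rates)


def union_rate(rates: list[float]) -> float:
    """Rate at which AT LEAST ONE classifier flags a message (independence assumed)."""
    return 1 - prod(1 - r for r in rates)


# Figures quoted in the thread: Bayesian 0.5% and RBL 34% false positives,
# Bayesian 98% and RBL 24% detection.
false_positive_rates = [0.005, 0.34]
detection_rates = [0.98, 0.24]

print("intersection false positives:", intersection_rate(false_positive_rates))
print("union false positives:       ", union_rate(false_positive_rates))
print("intersection detection:      ", intersection_rate(detection_rates))
print("union detection:             ", union_rate(detection_rates))
```

Under these assumptions the intersection's false positive rate (about 0.17%) is far below the RBL's own 34%, which is the advantage claimed above. The same multiplication also lowers detection, which is why the post's argument depends on the RBL's effective detection rate being much higher than 24% once non-existent senders have already been removed.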
At that point, I confidently expect the % of spams not classified and in the Daily Summary to drop to 5 - 10% or less. And I expect those 5 - 10% to be ranked such that normally you need only scan the first few for new senders, so the effective (psychological) rate will be approaching 99% or better. And the 0.5% false positive rate of Bayesian will not be realized. Instead we should see an effective false positive rate which is superior to what we have now (because new senders are no longer buried in 9.7% to 19.9% of spam) and which is probably approaching 0% as claimed in our marketing (e.g. on the order of 1 in 10,000 or better).
In short, I expect the # of unclassified spams in the Daily Summary to drop to a few only.
And then within a month or so, as we have enough AccuSpam users for the global statistics to kick in, I expect the Daily Summaries to become Weekly Summaries because most spam will get classified before the Daily Summary, so there is no need to send it. And as we approach a million users, I expect it to become a monthly summary and eventually we forget there are even summaries sent sometimes. At that point we will have the "holy grail" of anti-spam, which is 100% blocking and 0% false positive rate and no effort.
-Shelby Moore
http://AccuSpam.com
I have to admit that I am duly impressed with AccuSpam and hope to see it able to evolve as the tricks against it become harder and harder to block.
:thumb:
accuspam
08-01-04, 11:21 PM
...Here's where it stands after about 1 week of using Accuspam:
Since you joined AccuSpam:
7089 spams deleted (most automatically)
82 emails delivered (most automatically)
98% of your email has been spam
100% of this spam has been blocked from your Inbox...
Note it is really closer to 14,000 spams deleted and 99+% of email has been spam, because during that week, I had fixed a bug where a large % of the # of spams deleted was not being recorded:
http://forums.speedguide.net/showpost.php?p=1370300&postcount=45
accuspam
08-01-04, 11:34 PM
I have to admit that I am duly impressed with AccuSpam and hope to see it able to evolve as the tricks against it become harder and harder to block.
:thumb:
Thanks for the interest. The proof needs to come in further work. So I will probably reduce the amount of time I am spending here talking for a while.
I only see these ways a spammer can ATTEMPT to subvert AccuSpam over the long term:
http://forums.speedguide.net/showpost.php?p=1370275&postcount=43
1. Proliferating the non-spoofed domains they own.
2. Responding to the confirmations sent by AccuSpam.
3. Joining AccuSpam and making user input which alters the statistics.
4. I forgot to mention before the possibility that they can hijack a domain that also sends non-spam, such as hotmail accounts, or even break into a server.
#3 is handled by the fact that they will (probably) not be disapproving any spam, only attempting to approve their own spam. If they do disapprove spam, then they effectively help AccuSpam delete more spam or at worst they simply all cancel each other out statistically. However if they spoof legitimate senders and disapprove those, or disapprove sacrificial domains that they do not use to send spam to anyone else, this will be more challenging to detect. Correlation can be used to detect that an AccuSpam account is not well correlated with other trusted accounts in terms of domains approved and domains disapproved. :p
And additionally consider how little the incentive a spammer has to try to defeat AccuSpam's statistical algorithms, when all it will accomplish is get the spam subject and sender into the Daily Summary, never into the Inbox (if using 100% blocking of paid version or at least 99% of time if using free version correctly).
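The correlation check mentioned for #3 could be sketched along these lines. This is a hypothetical illustration, not AccuSpam's algorithm: it assumes each account's input is available as a mapping from sender domain to +1 (approved) or -1 (disapproved), builds a majority consensus from accounts already considered trusted, and scores how often another account agrees with that consensus. All names and the sample domains are invented.

```python
from collections import defaultdict


def consensus(trusted_votes: list[dict[str, int]]) -> dict[str, int]:
    """Majority vote per domain across trusted accounts (+1 approve, -1 disapprove)."""
    totals: dict[str, int] = defaultdict(int)
    for votes in trusted_votes:
        for domain, vote in votes.items():
            totals[domain] += vote
    # Domains with a tied vote carry no consensus and are left out.
    return {d: (1 if t > 0 else -1) for d, t in totals.items() if t != 0}


def agreement(account_votes: dict[str, int], consensus_votes: dict[str, int]):
    """Fraction of shared domains where the account matches the consensus.

    Returns None when there is no overlap, so nothing can be concluded.
    """
    shared = [d for d in account_votes if d in consensus_votes]
    if not shared:
        return None
    matches = sum(1 for d in shared if account_votes[d] == consensus_votes[d])
    return matches / len(shared)


trusted = [
    {"bulk-a.example": -1, "bulk-b.example": -1, "friend.example": 1},
    {"bulk-a.example": -1, "bulk-b.example": -1, "friend.example": 1},
    {"bulk-a.example": -1, "friend.example": 1},
]
majority = consensus(trusted)

# An account that approves everything, including what others call spam.
suspect = {"bulk-a.example": 1, "bulk-b.example": 1, "friend.example": 1}
print(agreement(suspect, majority))
```

An account whose agreement stays well below that of ordinary users could then be given less weight in the global statistics. As the post notes, an attacker who only disapproves sacrificial domains would have little overlap with the consensus, which is the harder case and the reason the function reports "no overlap" separately.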
#2 we could actually even disable this feature since it will be less and less necessary as the # of spams in the Daily Summary decreases. It is really only needed for the case of an urgent email sent by a new sender (one you do not know about). We can always add a http://captcha.net to it if we find spammers replying to the confirmations.
#1 is the biggest risk. Could spammers actually proliferate domains faster than AccuSpam users could statistically block them? I think the cost for the spammer would not be worth it. Why should they change their whole business model if they are only losing 1% or less of their market to AccuSpam? And unlike Bayesian where small tweaks to content can be experimented with by spammers, this would require a major cost decision by a spammer. We may have to watch for TLDs that are spammer friendly, e.g. if .ru decides to sell domains to spammers in bulk.
#4 is handled by the sending of the confirmations and the fact that most mailboxes overflow at 10MB or so. However, recently Gmail with 1GB free accounts and now Yahoo following with 100MB free accounts does present a possible challenge. The free email providers should be looking at the bounced email in real-time and disabling accounts in real-time that are generating huge volumes of bounces.
Also there is possible case of spammers spoofing legitimate domains that allow (do not bounce) any alias, i.e. *@domain.com, although this is not yet widespread as far as we've observed. The administrators of domains need to close these holes by bouncing all but recognized aliases, else they could find their domain being filtered (not just by AccuSpam but generally by many anti-spam and RBLs) and at least from AccuSpam they will get challenge email to this effect and giving them instructions how to fix. It is similar to the encouragement ISPs were given to close open relays, but in this case with much stronger incentive to the admin of the domain, who certainly does not relish millions of users receiving spoofed spams coming from his/her domain.
Also spammers could proliferate the # of free accounts they sign up for. This is another real risk. But I would also say this is a big problem for these free email providers, as the volume of spam being sent from or spoofed from their domains causes more legitimate mail from their domains to be filtered (not just by AccuSpam but generally by many anti-spam), so they will have to deal with it. If these free account providers could increase the manual (non-automatable) work of signup to say 5 minutes of effort, then manual signup becomes a COST-impractical method for the spammer if the spammer now needs 10,000 new hotmail accounts (daily) for each 1 million mailing (daily).
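The cost argument is simple arithmetic, sketched below with the post's own figures (10,000 accounts per day at 5 minutes each). The 8-hour working day used to convert hours into full-time workers is an added assumption, and the function name is invented.

```python
def signup_labour(accounts_per_day: int, minutes_per_signup: float,
                  hours_per_worker_day: float = 8.0) -> tuple[float, float]:
    """Return (person-hours per day, full-time workers needed) for manual signups."""
    hours = accounts_per_day * minutes_per_signup / 60
    return hours, hours / hours_per_worker_day


hours, workers = signup_labour(10_000, 5)
# 50,000 minutes per day: about 833 person-hours, or roughly 104 people
# doing nothing but signing up for accounts, for each 1 million mailing.
print(round(hours, 1), round(workers, 1))
```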
Delaying signup is not effective because a spammer could initiate many signup sessions staggered in time. There has to be work involved which cannot be automated by the spammer. And also this would have to not disincentivize signup by legitimate users. Possibly before completion of signup of a free account, a "hashcash" type computation could be downloaded to the browser and performed in javascript, which returns the result as the final step in signup. This may not be easy or practical (e.g. the spammer could perhaps optimize the browser), so proliferation of free accounts for spamming may present a special challenge to AccuSpam in future. AccuSpam would need to employ additional algorithms against free accounts, such as looking for fingerprints of automated account generation, such as Bayesian, 3 consonants in a row in the portion of the email address before the "@" sign, etc. Or essentially applying SpamAssassin heuristics to email from free accounts. Perhaps the most effective heuristic against email from free accounts would be to place a very high probability of spam on any email from a free account which was not relayed directly from an IP which is a server of the free email account provider, by assuming that most legitimate free email users are not savvy enough to use their free email address when sending from an email program rather than the free email provider's web interface.
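For readers unfamiliar with the "hashcash" idea, here is a minimal sketch of a proof-of-work puzzle of that type. It is written in Python for consistency with the other examples rather than the in-browser javascript the post suggests, it uses SHA-256 where the original hashcash uses SHA-1, and the challenge string, the difficulty, and the function names are all assumptions for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge: str, nonce: int, bits: int) -> bool:
    """Cheap check done by the provider: one hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits


def mint(challenge: str, bits: int) -> int:
    """Expensive search done by the client: about 2**bits hashes on average."""
    for nonce in count():
        if verify(challenge, nonce, bits):
            return nonce


# The provider issues a fresh challenge per signup session (hypothetical value).
challenge = "signup-session-123"
nonce = mint(challenge, bits=12)
print(nonce, verify(challenge, nonce, bits=12))
```

The asymmetry is the point: verifying costs one hash while minting costs thousands, and each extra bit of difficulty doubles the client's work. As the post itself cautions, a spammer with faster hardware or an optimized client pays less than an ordinary user, so this raises the cost of bulk signup without making it impossible.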
Can you think of any other ways spammers could attempt to subvert AccuSpam?
accuspam
08-03-04, 03:29 PM
Fixed a bug and corrected the database where if a new sender replied to your AccuSpam confirmation and the subject of the email being confirmed was long (e.g. > 70 characters or so), the sender could be GLOBALLY (for all AccuSpam recipients) marked undeliverable instead of adding to Approved Senders list of the intended AccuSpam recipient.
This was because our code was not correctly handling Subject headers split on to multiple lines.
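For context, long header fields such as Subject are routinely "folded" onto continuation lines that begin with a space or tab, and a parser has to unfold them before reading the value. The sketch below shows that step in isolation; it is not AccuSpam's code, the function names are invented, and the sample message is made up.

```python
import re


def unfold_headers(raw: str) -> str:
    """Join folded header lines: drop any line break that is followed by a space or tab."""
    return re.sub(r"\r?\n(?=[ \t])", "", raw)


def parse_headers(raw: str) -> dict[str, str]:
    """Parse the header block of a message into {lowercased name: value}."""
    headers: dict[str, str] = {}
    for line in unfold_headers(raw).splitlines():
        if not line:  # A blank line ends the header block.
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


raw_message = (
    "From: sender@example.com\r\n"
    "Subject: Re: a very long subject that the sending mail\r\n"
    " program folded onto a second line\r\n"
    "To: user@example.com\r\n"
    "\r\n"
    "Body text.\r\n"
)
print(parse_headers(raw_message)["subject"])
```

A parser that reads headers one physical line at a time would see only the first half of this Subject and treat the continuation as a separate, malformed line, which is the kind of mismatch that can make a confirmation reply fail to match the email it confirms.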
Please do not send me private messages re accuspam, post your questions here in this thread.
Thank you,
TonyT
Please do not send me private messages re accuspam, post your questions here in this thread.
Thank you,
TonyT
OK then.
I have disabled my ISP's spam filtering service (Comcast's "screened mail" service). That was last Saturday. Since then I've gotten 61 emails, 9 of which are legitimate (including the daily summaries)... the remaining 52 have been spam. I checked a few of them, they're under the size restriction for the free version of accuspam to ignore. I'll gladly forward all of them to you in the name of research.
Or, maybe I'll copy/paste them all into a PM and send them to Tony.
:p
accuspam
08-08-04, 07:58 AM
...disabled my ISP's spam filtering service (Comcast's "screened mail" service...
FYI, Comcast recently licensed BrightMail.com, so you disabled what we feel is one of the best anti-spam and a very strong competitor to the paid AccuSpam version. However, as you know, BrightMail does not block 100% of spam, whereas the paid version of AccuSpam does.
...last Saturday. Since then I've gotten 61 emails, 9 of which are legitimate (including the daily summaries)... the remaining 52 have been spam...
Again I will repeat, this is because you are using the free version and the spams are arriving in the 3 - 5 minutes prior to when you check your email each time.
http://www.accuspam.com/faq.php#as_spam
"If you have not enabled the 100% spam blocking then the ability to block 99% will be reduced significantly if you have your email program set to check for new email every few minutes or sooner. You should either increase the frequency that you or your email program regularly check for new email to 30 minutes or more. Else, enable the 100% spam blocking."
I need to know the email address of your account at AccuSpam so I can offer you more tips. You may email it to support@accuspam.com, if you prefer.
Again I will ask you, how many spams does the latest Daily Summary state it deleted for you? Note, if you tell me the email address for your AccuSpam account, then I can look up that data in the database, else I need you to answer please.
Again I will repeat that I think AccuSpam is working as advertised for you. You have to follow the instructions correctly for AccuSpam to achieve the stated performance levels.