PDF image spam?

Postby AnonymousDog » Mon Jul 02, 2007 1:07 pm

Getting much? Several of my sites that have a history of heavy image spam and/or stock spam are getting peppered with it. Looks like most of the tools for including pdf processing in FOCR are available, but decoder has said he isn't interested in making FOCR process pdfs.

Since it looks like FOCR isn't going to fill the need for pdf processing, does anyone have any ideas for filtering image spam delivered as pdfs? These are typical image spam emails with bayes-poison or no content aside from the attachment.
AnonymousDog
Moderator
 
Posts: 398
Joined: Fri Oct 20, 2006 12:54 pm
Location: SW Michigan

Postby mr88talent » Mon Jul 02, 2007 4:47 pm

Are you using MSRBL and sanesecurity virus definition databases?
mr88talent
Moderator
 
Posts: 1676
Joined: Tue Mar 08, 2005 4:19 pm
Location: Salt Lake City

Postby Marius » Mon Jul 02, 2007 8:47 pm

I just saw our first reported instance of this here. Blank message body with a .pdf attachment, containing the image spam.

Very clever.

mr88,

I'm not using "MSRBL and sanesecurity virus definition databases". Do you have any info for getting started on those?
Marius
 
Posts: 334
Joined: Wed Sep 13, 2006 10:39 pm
Location: VA, USA

Postby mr88talent » Mon Jul 02, 2007 10:54 pm

First install curl and rsync

Then:
Code:
cd /usr/sbin
wget http://www200.pair.com/mecham/spam/UpdateSaneSecurity.sh.txt
mv UpdateSaneSecurity.sh.txt UpdateSaneSecurity.sh
chmod +x UpdateSaneSecurity.sh
UpdateSaneSecurity.sh

crontab -e

Insert this entry. Replace MM (minutes) below with a number between 1 and 59:
MM */4 * * * /usr/sbin/UpdateSaneSecurity.sh

Amavisd-new 2.5.x can treat most of these as spam instead of viruses.

Search for 'scam' on this page:
http://www.ijs.si/software/amavisd/release-notes.txt

Logs of the last download are located in /var/tmp/clamdb/
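For anyone unsure what the finished crontab line should look like, here is a minimal Python sketch that fills in MM. The helper name cron_entry is made up for illustration; the script path and the every-four-hours schedule come from the post above. Picking the minute at random is only an assumption about why MM is left open (so that not every site hits the mirrors in the same minute).

```python
import random

SCRIPT = "/usr/sbin/UpdateSaneSecurity.sh"  # path used in the post above


def cron_entry(minute=None):
    """Build the crontab line from the post, with MM replaced by `minute`.

    `cron_entry` is a hypothetical helper, not part of the update script.
    """
    if minute is None:
        # Assumption: any minute in the suggested 1-59 range will do.
        minute = random.randint(1, 59)
    if not 1 <= minute <= 59:
        raise ValueError("minute must be between 1 and 59")
    return f"{minute} */4 * * * {SCRIPT}"


if __name__ == "__main__":
    # Paste the printed line into `crontab -e`.
    print(cron_entry())
```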

Postby Marius » Tue Jul 03, 2007 12:30 am

Thanks mr88!

One thing I was wondering about: when I enter 'UpdateSaneSecurity.sh', nothing happens, at least not visibly. Does it normally produce no output when it is run?

Postby mr88talent » Tue Jul 03, 2007 9:38 am

Correct - it should only output something if there was an error, and if you look at the code, it waits for a little while before it does anything. Look in /var/lib/clamav (or wherever your database is kept) and check whether the extra files were added: phish, scam, MSRBL. For example:

drwxr-xr-x 2 clamav clamav 4096 2007-06-16 19:48 daily.inc
-rw-r--r-- 1 clamav clamav 9351789 2007-06-10 21:16 main.cvd
-rw------- 1 clamav clamav 260 2007-06-16 19:14 mirrors.dat
-rw-r--r-- 1 clamav clamav 347982 2007-06-16 19:25 MSRBL-Images.hdb
-rw-r--r-- 1 clamav clamav 228232 2007-06-08 04:33 MSRBL-SPAM.ndb
-rw-r--r-- 1 clamav clamav 1033688 2007-06-16 19:48 phish.ndb
-rw-r--r-- 1 clamav clamav 174338 2007-06-15 02:55 phish.ndb.gz
-rw-r--r-- 1 clamav clamav 516182 2007-06-16 19:48 scam.ndb
-rw-r--r-- 1 clamav clamav 102738 2007-06-15 02:55 scam.ndb.gz
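If you would rather check this from a script than eyeball the listing, a rough Python sketch follows. It only looks for the four file names the update script is said to bring in (MSRBL-Images.hdb, MSRBL-SPAM.ndb, phish.ndb, scam.ndb); the function names are invented for this example, and /var/lib/clamav is just the default mentioned above, so adjust it if your database lives elsewhere.

```python
from pathlib import Path

# File names taken from the directory listing in this thread.
EXPECTED = ("MSRBL-Images.hdb", "MSRBL-SPAM.ndb", "phish.ndb", "scam.ndb")


def missing_signatures(present, expected=EXPECTED):
    """Return the expected signature files that are not in `present`."""
    have = set(present)
    return [name for name in expected if name not in have]


def check_db_dir(db_dir="/var/lib/clamav"):
    """List `db_dir` and report which expected files are absent (hypothetical helper)."""
    return missing_signatures(p.name for p in Path(db_dir).iterdir())


if __name__ == "__main__":
    missing = check_db_dir()
    print("all present" if not missing else "missing: " + ", ".join(missing))
```

An empty result only says the files exist; it says nothing about how fresh they are.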

Postby AnonymousDog » Tue Jul 03, 2007 5:19 pm

Decoder has included pdf scanning in the latest SVN of FOCR. Bleeding edge alert!...untested stuff. I'll be trying it out sometime this week and will report back. Decoder had some very passionate warnings that this would create an unacceptably large number of false positives; so, I'll be watching for this in particular. I see that there is a page limit setting (for limiting processing to pdfs with that number of pages or fewer); that should help somewhat.

Also, is there a way for us to do this in amavisd?
Code:
Right now I have mimedefang running 'pdfinfo $file 2>&1' for each pdf, and if it has an error, I'm quarantining the entire message.
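To make the quoted idea concrete, here is a small Python sketch of the same check. It is not amavisd or mimedefang code, just an illustration of "run pdfinfo and treat any complaint as damage". Assumptions: the pdfinfo binary from poppler-utils is on the PATH, a non-zero exit status or the word "Error" on stderr counts as damaged, and the function name is made up.

```python
import subprocess


def pdf_looks_damaged(path, run=subprocess.run):
    """Return True when pdfinfo complains about the file at `path`.

    `run` is injectable so the logic can be exercised without pdfinfo installed.
    """
    try:
        result = run(["pdfinfo", path], capture_output=True, text=True, timeout=30)
    except subprocess.TimeoutExpired:
        # Assumption: a PDF that hangs the parser is suspicious enough to flag.
        return True
    # Assumption: any non-zero status or an "Error" message means a broken PDF.
    return result.returncode != 0 or "Error" in result.stderr


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, "DAMAGED" if pdf_looks_damaged(name) else "ok")
```

What to do with a flagged message (quarantine, score, defer) is a separate policy decision; legitimate but sloppy PDF generators may trip this too.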

Postby Marius » Wed Jul 04, 2007 10:51 am

Now that I'm using the sanesecurity virus definitions I'm seeing this in my maillog:

(15780-0) Blocked INFECTED (Html.Img.Gen034.Sanesecurity.07010302), [199.171.54.203] [192.16.10.228] <reedy> -> <user>, quarantine: virus/virus-IAHlcvh91jWK, Message-ID: <00fa01c7bd28>, mail_id: IAHlcvh91jWK, Hits: -, 2340 ms
Jul 3 00:07:00 myspamserver postfix/smtp[16294]: F2B33288001: to=<user>, relay=127.0.0.1[127.0.0.1]:10024, delay=3.8, delays=1.4/0/0/2.3, dsn=2.7.1, status=sent (254 2.7.1 Ok, discarded, id=15780-08 - VIRUS: Html.Img.Gen034.Sanesecurity.07010302)

(Notice the Sanesecurity signature name in the log lines above)

Does this mean that the message was blocked using the new definitions? Would it otherwise have gotten through?

I've noticed a much higher amount of blocked virus messages since I applied the sanesecurity definitions. Very nice indeed! :)

Postby mr88talent » Wed Jul 04, 2007 11:00 am

Yes that's right. The only unfortunate thing is, these are not really viruses but they are detected as such. As I mentioned, amavisd-new 2.5.x can turn these into spam.

There is also a PDFinfo plugin available at:

http://rulesemporium.com/plugins.htm

but you have to manually request it. People have had mixed results.

Postby AnonymousDog » Wed Jul 04, 2007 12:22 pm

Mr88, are you using the combined msrbl or one/several of the others?

Also, looks like we'll have to remove the imageinfo plugin config when upgrading to SA 3.2.

Postby mr88talent » Wed Jul 04, 2007 1:20 pm

I'm using the aforementioned script which brings in:

MSRBL-Images.hdb
MSRBL-SPAM.ndb
phish.ndb
scam.ndb

I'm working on a rather large document at the moment, this is one small section:

http://www200.pair.com/mecham/spam/sa-upgrade.html

Postby Marius » Wed Jul 04, 2007 1:23 pm

AnonymousDog wrote:Also, looks like we'll have to remove the imageinfo plugin config when upgrading to SA 3.2.


Never mind. Mr88's last post cleared up some things. :)

Postby AnonymousDog » Fri Jul 06, 2007 2:00 pm

I'm not finding the clamav defs to be helping with this kind of spam. In fact, I've gotten no virus hits at all except for tests, probably because I'm rejecting so much at the door with Postfix and Postgrey. It catches EICAR and MSRBL test images, but isn't helping with the pdf spam. Decoder is working on pdf processing with pdfinfo error checking (which is needed since almost all of these pdfs are "damaged" and won't pdftops properly). He's deferred this work a bit, but I think he's interested in getting it done.

Postby AnonymousDog » Fri Jul 06, 2007 2:05 pm

What's sa-compile do? Is this new to 3.2?

Postby mr88talent » Fri Jul 06, 2007 2:23 pm

man sa-compile

yes - new to 3.2

Code:
DESCRIPTION
       sa-compile uses "re2c" to compile the SpamAssassin ruleset. This is
       then used by the "Mail::SpamAssassin::Plugin::Rule2XSBody" plugin to
       speed up SpamAssassin's operation, where possible, and when that plugin
       is loaded.

       "re2c" can match strings much faster than perl code, by constructing a
       DFA to match many simple strings in parallel, and compiling that to
       native object code.  Not all SpamAssassin rules are amenable to this
       conversion, however.

Postby AnonymousDog » Fri Jul 06, 2007 2:28 pm

Sweet! Optimization!
Is that not run after every sa-update because it takes a "long time" to process?

Postby mr88talent » Fri Jul 06, 2007 2:39 pm

Not sure, but it does take a long time to process. I set up a script to run it once a week. Hopefully one is not required to run it if a few new body rules are added/updated. I assume SpamAssassin won't duplicate rules so if a few new rules get downloaded then they should be used - whether compiled or not, but honestly I'm just hoping that's the way it works. After all, not all rules can be compiled.

Postby Marius » Fri Jul 06, 2007 2:42 pm

This morning we had everything but the kitchen sink thrown at us. Those sanesecurity definitions blocked several hundred messages in 14 minutes, and most of them were .pdf image spam messages.

Without those definitions I'm sure most, if not all of them would have made it through.

New Rule

Postby AnonymousDog » Sat Jul 14, 2007 1:37 am

A new rule in the default sa-update channel is hitting on mail with PDF attachment(s) and low text content -- "how low" I don't want to publish openly, but the rule is pretty easy to figure out if you look at it.
The line below, tacked onto spamassassin's local.cf file, will double its usual score:
Code:
score TVD_PDF_FINGER01 2.0

Postby bigbobsbbs » Thu Jul 19, 2007 11:51 am

pinoyako amavis[12315]: (12315-04) Blocked INFECTED (Email.Stk.Gen588.Sanesecurity.07071604.pdf)


This is Amazing... thanks guys...
Cheers

Bobby
bigbobsbbs
 
Posts: 58
Joined: Wed Aug 31, 2005 12:20 pm
Location: Winnipeg

Postby AnonymousDog » Thu Jul 19, 2007 2:01 pm

Make sure your cron job is running for SaneSecurity updates. That signature, Email.Stk.Gen588.Sanesecurity.07071604.pdf, isn't the most current; Email.Stk.Gen592.Sanesecurity.07071801.pdf is. If the cron job seems to be running, run the script interactively and check the logs, 'cus you're not up to date. That def DOES do a good job of catching some/most.
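A quick way to compare two signature names like these from a script is sketched below in Python. It rests on a guess: that the eight digits after "Sanesecurity." read as YYMMDD plus a two-digit sequence number (so 07071604 would be the fourth signature of 16 July 2007). Nothing in this thread confirms that layout, and the function names are made up, so treat it as an illustration only.

```python
import re

# Assumed layout: ".Sanesecurity.YYMMDDNN", optionally followed by ".pdf" etc.
STAMP = re.compile(r"\.Sanesecurity\.(\d{6})(\d{2})")


def signature_stamp(name):
    """Return (YYMMDD, sequence) parsed from a Sanesecurity signature name."""
    match = STAMP.search(name)
    if match is None:
        raise ValueError(f"no Sanesecurity stamp in {name!r}")
    return match.group(1), int(match.group(2))


def is_older(seen, latest):
    """True when `seen` carries an earlier stamp than `latest`."""
    return signature_stamp(seen) < signature_stamp(latest)


if __name__ == "__main__":
    seen = "Email.Stk.Gen588.Sanesecurity.07071604.pdf"
    latest = "Email.Stk.Gen592.Sanesecurity.07071801.pdf"
    print("out of date" if is_older(seen, latest) else "current")
```

Seeing an older signature in a log line does not by itself prove the database is stale (older signatures stay in the file), so checking the update logs is still the real test.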

Also, SARE has publicly released the PDFInfo plugin. I've barely had time to look at it. It's still experimental. It does not check for corruption of the PDF format (which is common on the ones that are just embedded image spam and no text). It does a lot of checking against md5 hashes of known spam pdfs, but that seems like an endless cat-n-mouse game; so, I won't pursue it very far (except to update the plugin frequently -- I'm not computing and entering new rules for every new hash that comes up). It also does some matching on image sizes that have frequently cropped up in pdf spam -- same downside to that as above, and it seems to me that would generate some false positives. There are lots of example and disabled rules. There's a rule, GMD_PDF_NO_TXT, that seems to do something similar to the TVD_PDF_FINGER01 rule in the sa-update standard channel; they've disabled this rule in pdfinfo.cf because it hit some ham. That's it for the disappointing parts (to me).

It does do an interesting thing: it lets you write rules that match on certain pdf meta data like author, producer, creator, title, created, and modified. I find that most pdfs are either corrupt and will pass all the tests (unless a known hash) b/c pdfinfo errors out, or are pdfs with no embedded images created by text2pdf, ImageMagick, and easyPDF (others will surely come); so, some good rules might be had there. There are also some good, heavily scored meta rules that should be very effective against known pdf spam signatures.

More later.

Postby AnonymousDog » Thu Jul 19, 2007 3:42 pm

I created some new rules to do what I want with the PDFInfo plugin. Would someone please review this for sanity and/or suggestions? You'll find an attempt to identify corrupt pdfs via missing pdf meta data. That and rules using known-bad title and producer meta data are all that's below.
Code:
# <LICENSE>
#
# Free as in beer.  Free as in speech.  You get the picture...
#
# </LICENSE>
#
# File: pdfinfouserrules.cf
# Version: 0.1
# Created: 2007-07-19
# Modified: 2007-07-19
# Author: Andy Kinnard (AnonymousDog) andyk at slcpa dot biz
# Requires: PDFInfo.pm plugin
# License: None
# Description: This plugin/ruleset combination will help you alleviate the new
#              PDF based stock spam which began to appear mid-June, 2007.
#
# Changes:
#
#   0.1 - initial ruleset.
#

ifplugin Mail::SpamAssassin::Plugin::PDFInfo

# pdf_match_details()

body          GMD_PRODUCER_UNKNOWN        eval:pdf_match_details('producer','/^unknown$/')
describe      GMD_PRODUCER_UNKNOWN        Missing PDF meta data for producer
score         GMD_PRODUCER_UNKNOWN        0.5

body          GMD_CREATED_ZERO        eval:pdf_match_details('created','/^0$/')
describe      GMD_CREATED_ZERO        Missing PDF meta data for created date
score         GMD_CREATED_ZERO        1.0

# The next four should be just meta b/c they're very common in ham and uncorrupted pdfs.  The descriptions follow from the above two.
body          __GMD_CREATOR_UNKNOWN        eval:pdf_match_details('creator','/^unknown$/')
body          __GMD_TITLE_UNTITLED        eval:pdf_match_details('title','/^untitled$/')
body          __GMD_MODIFIED_ZERO        eval:pdf_match_details('modified','/^0$/')
body          __GMD_AUTHOR_UNKNOWN        eval:pdf_match_details('author','/^unknown$/')
# End of four

body          GMD_PRODUCER_TEXT2PDF       eval:pdf_match_details('producer','/^text2pdf/')
describe      GMD_PRODUCER_TEXT2PDF       PDF meta data for producer begins with text2pdf
score         GMD_PRODUCER_TEXT2PDF       3.0

body          GMD_PRODUCER_IMAGEMAGICK       eval:pdf_match_details('producer','/^ImageMagick/')
describe      GMD_PRODUCER_IMAGEMAGICK       PDF meta data for producer begins with ImageMagick
score         GMD_PRODUCER_IMAGEMAGICK       0.001

body          GMD_PRODUCER_EASYPDF       eval:pdf_match_details('producer','/easyPDF/')
describe      GMD_PRODUCER_EASYPDF       PDF meta data for producer contains easyPDF
score         GMD_PRODUCER_EASYPDF       0.5

body          GMD_TITLE_STOCK     eval:pdf_match_details('title','/stock/')
describe      GMD_TITLE_STOCK     PDF meta data for title contains stock
score         GMD_TITLE_STOCK     2.0

# metas

meta         GMD_PDF_LIKELY_CORRUPT   ( GMD_PRODUCER_UNKNOWN && GMD_CREATED_ZERO )
describe     GMD_PDF_LIKELY_CORRUPT   Missing PDF meta data for producer and created date indicates probable PDF format corruption
score        GMD_PDF_LIKELY_CORRUPT   1.5

meta         GMD_MISSING_LESSER_DETAILS     ( __GMD_CREATOR_UNKNOWN && __GMD_TITLE_UNTITLED && __GMD_MODIFIED_ZERO && __GMD_AUTHOR_UNKNOWN )
describe     GMD_MISSING_LESSER_DETAILS     Missing PDF meta data for ALL lesser details: creator, title, modified date, and author
score        GMD_MISSING_LESSER_DETAILS     0.5

meta         __GMD_KNOWN_SPAM_PRODUCERS     ( GMD_PRODUCER_TEXT2PDF || GMD_PRODUCER_IMAGEMAGICK || GMD_PRODUCER_EASYPDF )
describe     __GMD_KNOWN_SPAM_PRODUCERS     PDF meta data for producer matches one of those deemed "known spam producer"

# This rule needs more titles to be effective
meta        __GMD_KNOWN_SPAM_TITLES     ( GMD_TITLE_STOCK )
describe    __GMD_KNOWN_SPAM_TITLES     PDF meta data for title matches one of those deemed "known spam titles"

# This rule won't be effective until __GMD_KNOWN_SPAM_TITLES is
meta        GMD_PRODUCER_AND_TITLE     ( __GMD_KNOWN_SPAM_PRODUCERS && __GMD_KNOWN_SPAM_TITLES )
describe    GMD_PRODUCER_AND_TITLE     PDF meta data for title AND producer match those deemed "known spam *"
score       GMD_PRODUCER_AND_TITLE     0.001

endif

Thanks
Last edited by AnonymousDog on Fri Jul 20, 2007 1:23 pm, edited 2 times in total.

Postby gblades » Fri Jul 20, 2007 5:26 am

I have added it to my list of rules.

FYI plugin version 0.6 has now been released and includes a rule for EASYPDF itself.
gblades
 
Posts: 66
Joined: Mon Mar 26, 2007 7:14 am

Postby AnonymousDog » Fri Jul 20, 2007 10:26 am

Are you getting any hits from this plugin at all (not necessarily my rules)? I can't get it to work on my test machine. Per 'amavisd debug-sa' it is loading ok, but I can't get it to hit on any tests. I wonder whether it's not compatible with amavisd?

Postby gblades » Fri Jul 20, 2007 11:02 am

Yes it is working fine for me.

Rule hits analysis for your rules. Fields are :-

rule name             rule description           total  ham number  ham percent  spam number  spam percent
GMD_PDF_ENCRYPTED     Attached PDF is encrypted  73     0           0            73           100
GMD_PDF_EMPTY_BODY    (none)                     50     0           0            50           100
GMD_PRODUCER_UNKNOWN  (none)                     35     2           5.7          33           94.3
GMD_PDF_STOX_M4       PDF Stox spam              27     0           0            27           100
GMD_PRODUCER_EASYPDF  (none)                     12     0           0            12           100
etc...
I use MailScanner as the main system and MailWatch to analyse the logs it creates. The rule hit analysis is part of MailWatch and is something I find incredibly useful when analysing the performance of new rules I have written.

Analysing GMD_PRODUCER_UNKNOWN further, the two ham hits were false positives, so I would reduce the score for this rule.
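For anyone reading the hit report without MailWatch at hand, the percentages are just ham and spam hits divided by total hits. A small Python sketch that parses one report line and recomputes them is below. The function names are invented, and the parser assumes the last five whitespace-separated fields are always the numbers, so a description that itself ends in a number would confuse it.

```python
def parse_hit_line(line):
    """Split 'NAME [description] total ham ham% spam spam%' into a dict."""
    head, total, ham, _ham_pct, spam, _spam_pct = line.rsplit(None, 5)
    name, _, description = head.partition(" ")
    return {
        "name": name,
        "description": description,  # empty string when the rule has none
        "total": int(total),
        "ham": int(ham),
        "spam": int(spam),
    }


def hit_rates(total, ham):
    """Return (ham percent, spam percent), rounded to one decimal place."""
    spam = total - ham
    return round(100 * ham / total, 1), round(100 * spam / total, 1)


if __name__ == "__main__":
    row = parse_hit_line("GMD_PRODUCER_UNKNOWN 35 2 5.7 33 94.3")
    print(row["name"], hit_rates(row["total"], row["ham"]))
```

With only 35 hits in total, two ham hits already move the ham rate to 5.7%, which is why a small sample like this is better used to lower a score than to raise one.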

Just off to my parents' now, so I won't have access to the internet until Sunday or Monday when I get back to work.

One thing which has been causing some people to have problems with pdfinfo is their max spamassassin message size. The default is now 250k which is fine but some historical installations have had a lower value which causes the pdf to be skipped.

Postby AnonymousDog » Fri Jul 20, 2007 1:22 pm

gblades wrote:Rule hits analysis for your rules. Fields are :-

rule name, rule description, total, ham number, ham percent, spam number, spam percent
GMD_PRODUCER_UNKNOWN 35 2 5.7 33 94.3
GMD_PRODUCER_EASYPDF 12 0 0 12 100

Analysing GMD_PRODUCER_UNKNOWN further, the two ham hits were false positives, so I would reduce the score for this rule.

I've reduced it to 0.5. Any idea what produced those ham pdfs? It's really unusual not to see a producer in a legit PDF.

Postby gblades » Sat Jul 21, 2007 4:07 pm

The first came from tag4.com and due to the email subject, who it was sent to and the fact the message was in the AWL I suspect it may be a draft for a book we are producing.

The second came from a jockey club race course barracuda spam firewall and seems to be a wedding related enquiry going by the subject.

Postby gblades » Sat Jul 21, 2007 4:59 pm

Got a mail back from tag4.com. The PDF was saved from Adobe Illustrator CS2.

Postby AnonymousDog » Mon Jul 23, 2007 5:10 pm

I'm overall displeased with the PDFInfo plugin. Its results in parsing pdfs for meta data (what it calls "details") are so different from what one gets from the pdfinfo utility (from the poppler-utils package) that it makes writing rules very difficult. In particular, producer meta data from most of the pdfs I've tested (with valid producer meta data as per pdfinfo) is completely missed by PDFInfo, resulting in an unexpectedly "unknown" producer...even for some of the producers with rules in the as-delivered ruleset (e.g., ghostscript). So far, I've only gotten easyPDF to hit. This pretty much kills the expected utility of my GMD_PRODUCER_UNKNOWN rule (and similar); I haven't been able to get any hits on GMD_CREATED_ZERO either (despite that field's being missing from all the corrupt [per pdfinfo] image-only pdfs I've received). I had expected both to be pretty predictive rules, and neither has positive utility...the first's is negative.

Results are frustratingly similar for most of the meta data types (i.e., grossly inconsistent). For instance, I've not gotten title to hit for any rule. Is this your (all's) experience with this plugin? gblades' experience with Adobe Illustrator CS2 seems to support that (as I'm pretty certain Illustrator would populate the producer field).

I'm no Perl programmer, but it looks to me like PDFInfo.pm parses the mimedecode output of each file line by line and greps for particular regex matches to find the meta data. I wonder how reliable this method is. I thought there were several CPAN packages that could do that pretty well; so, I'm surprised to see them reinvent the wheel there. Is anyone else better able to parse the perl and offer more authoritative feedback on its functions/fitness?

What we really need is a module that:
    tests the pdf(s) for corruption
    parses them reliably for metadata
    extracts embedded images and either passes them to FOCR or ocrs them itself
    uses something like pdftotext to extract body text and call SA to process with body rules
and passes a score back to SA. It looks like PDF::OCR and PDF::OCR::Thorough do most of that (but for the metadata), esp. Thorough which uses PDF::API2 to check for corruption as well as both pdftotext and tesseract to extract text.

PDF::Parse and Image::ExifTool both can extract metadata.

PDFInfo wrapped around PDF::OCR and Image::ExifTool functionality could be a winner. Anyone know how to do it? ;-)

Postby gblades » Tue Jul 24, 2007 3:34 am

No hits on GMD_CREATED_ZERO here.

Have you contacted the author with your findings and suggestions?

Postby AnonymousDog » Tue Jul 24, 2007 2:39 pm

I am so reticent in joining another mailing list; that is what they use for a forum.

Postby AnonymousDog » Tue Jul 24, 2007 8:26 pm

Subscribed and communicated. We'll see if it gets results.

Postby gblades » Wed Jul 25, 2007 3:48 am

I know what you mean. I have a shared folder which is subscribed to mailscanner, mailwatch, sare, fuzzyocr and ocrad

Postby Marius » Wed Jul 25, 2007 8:34 am

AnonymousDog wrote:I am so reticent in joining another mailing list; that is what they use for a forum.


So very, very true. I have never liked using mailing lists, but have to sometimes because much of the Linux world uses them to distribute information. Like you, I would MUCH rather use a discussion forum.

Postby Marius » Wed Jul 25, 2007 9:50 pm

The SaneSecurity 'virus' definitions that mr88 recommended earlier in this thread have been doing a superb job of stopping the .PDF image spam at our site. I haven't had ANY reported instances of these since deploying the definitions, and they have been nailing them at the rate of 200-300 per hour over the last 3 days.

Postby AnonymousDog » Thu Jul 26, 2007 6:07 pm

I concur re: SaneSecurity defs. As long as these efforts are limited to a small number of pdfs (excel, word, etc.) the signature approach will be very effective. If/When they figure out how to use bot-resident programs to convert images/content to target formats on-the-fly and with some randomness, signature approaches will become inefficient. We were clearly seeing something like that with image spam over the last year or two; so, I'm just trying to think ahead (of them).

Postby Marius » Thu Jul 26, 2007 6:11 pm

Very true. This signature approach is only a very good 'band aid' fix until a more permanent solution is found.

Postby AnonymousDog » Fri Jul 27, 2007 10:47 am

Very frustrating: I'm on the list and get their digests, but I can't seem to submit an email. Aaaaarrrrrggggghhhhh!

Gone?

Postby AnonymousDog » Wed Aug 01, 2007 5:03 pm

No pdf spam or hits on Sanesecurity defs for over two days at any site. It dropped off like a stone on July 30 between 11am and 2pm (EDT) at all my sites.

Postby Marius » Wed Aug 01, 2007 5:11 pm

Big decrease here too. We were getting sometimes thousands per day at the most, and so far today only 30 or so.

Makes you wonder doesn't it? I wonder if they have changed their messages to evade our signatures.

Postby AnonymousDog » Wed Aug 01, 2007 5:16 pm

Marius wrote:Big decrease here too. We were getting sometimes thousands per day at the most, and so far today only 30 or so.

Makes you wonder doesn't it? I wonder if they have changed their messages to evade our signatures.

We're seeing zero...nothing since Monday. Nothing is getting through either; so, they didn't just change their pdfs -- they turned it off (some bots may wander off on their own). I'd expect this to resurface as a slightly retooled attack soon (before school gets back in).

Postby Marius » Wed Aug 01, 2007 5:42 pm

AnonymousDog wrote:I'd expect this to resurface as a slightly retooled attack soon


Looks like I spoke too soon. We just got 150 or so in the last hour. :?

Postby AnonymousDog » Fri Aug 03, 2007 12:48 pm

They're back here too...all of the gold color. 48hr. respite only.

Postby gblades » Fri Aug 10, 2007 2:16 pm

PDFInfo 0.8 has been released and now works on .fdf files as well.

I mailed the author about the comments on this thread and got the following reply :-
I welcome patches... but I will not make anything available that
requires outside perl module dependancies.

Postby AnonymousDog » Fri Aug 10, 2007 5:44 pm

gblades wrote:PDFInfo 0.8 has been released and now works on .fdf files as well.

Well that's great, since the basic meta data parsing is broken anyway. :roll: There has been at least one mailing list question about broken parsing for encryption (plus my question) and no response from maintainers or anyone else.

gblades wrote:I mailed the author about the comments on this thread and got the following reply :-
I welcome patches... but I will not make anything available that
requires outside perl module dependancies.

Which is just fabulous! So, he insists on reinventing the wheel despite there being acceptable CPAN modules for getting meta data from pdfs. :shock: :?:

Ok, can anyone write the code that would call pdfinfo (as a helper app like FOCR does with netpbm binaries) to obtain meta data? Maybe he'll accept a patch that does that. You're kinda Perly, aren't you, gblades?
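To show the shape of the helper-app idea (not a PDFInfo.pm patch, which would have to be Perl), here is a Python sketch that shells out to pdfinfo and turns its "Key: value" lines into a dictionary. Assumptions: pdfinfo from poppler-utils is on the PATH and prints one field per line; the function names and the sample values in the test data are made up for illustration.

```python
import subprocess


def parse_pdfinfo(text):
    """Turn pdfinfo's 'Key:   value' output into a dict with lower-case keys."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so dates like 15:42:00 stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip().lower()] = value.strip()
    return fields


def pdf_metadata(path):
    """Run pdfinfo on `path`; raises CalledProcessError if pdfinfo reports failure."""
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, timeout=30, check=True
    )
    return parse_pdfinfo(result.stdout)


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        meta = pdf_metadata(name)
        print(name, meta.get("producer", "unknown"), meta.get("title", "untitled"))
```

A field that pdfinfo does not print simply ends up absent from the dict, which maps naturally onto the "unknown"/"untitled" defaults the rules above match on.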

Postby gblades » Sat Aug 11, 2007 4:51 am

Unfortunately perl is one language I can't program in. It's just a bit too different from other languages for me to be able to pick it up straight away. I am going on a training course later this year so I might have a go then if nobody else has done anything.

Postby mr88talent » Mon Aug 13, 2007 6:52 pm

Dog, et al:

This is a nicely written post:
http://www.renaissoft.com/pipermail/mai ... 10366.html

I would caution against scoring Botnet over 2.0 due to false positives.

Postby gblades » Tue Aug 14, 2007 4:22 am

I have been running botnet for a while but thanks for the link as I didn't know 0.8 was out. I have now upgraded and patched it.

These are the botnet rules I am using. I prefer to assign smaller scores to the individual tests and then whitelist the IPs of regular senders that cause false positives.

describe __BOTNET Relay might be a spambot or virusbot
header __BOTNET eval:botnet()
score __BOTNET 0.01

describe BOTNET_SOHO Relay might be a SOHO mail server
header BOTNET_SOHO eval:botnet_soho()
score BOTNET_SOHO -0.01

describe BOTNET_NORDNS Relay's IP address has no PTR record
header BOTNET_NORDNS eval:botnet_nordns()
score BOTNET_NORDNS 2.0

describe BOTNET_BADDNS Relay doesn't have full circle DNS
header BOTNET_BADDNS eval:botnet_baddns()
score BOTNET_BADDNS 1.0

describe BOTNET_CLIENT Relay has a client-like hostname
header BOTNET_CLIENT eval:botnet_client()
score BOTNET_CLIENT 1.0

describe BOTNET_IPINHOSTNAME Hostname contains its own IP address
header BOTNET_IPINHOSTNAME eval:botnet_ipinhostname()
score BOTNET_IPINHOSTNAME 1.0

describe BOTNET_CLIENTWORDS Hostname contains client-like substrings
header BOTNET_CLIENTWORDS eval:botnet_clientwords()
score BOTNET_CLIENTWORDS 0.5

describe BOTNET_SERVERWORDS Hostname contains server-like substrings
header BOTNET_SERVERWORDS eval:botnet_serverwords()
score BOTNET_SERVERWORDS -0.2
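Since SpamAssassin adds up the scores of whichever rules hit, the effect of splitting the score across the individual tests can be tallied with a few lines of Python. The numbers are copied from the rules above; the helper name and the example combinations of hits are hypothetical. __BOTNET is left out because, as far as I know, rules whose names begin with a double underscore are sub-rules that do not contribute to the total.

```python
# Scores copied from the rules listed above.
SCORES = {
    "BOTNET_SOHO": -0.01,
    "BOTNET_NORDNS": 2.0,
    "BOTNET_BADDNS": 1.0,
    "BOTNET_CLIENT": 1.0,
    "BOTNET_IPINHOSTNAME": 1.0,
    "BOTNET_CLIENTWORDS": 0.5,
    "BOTNET_SERVERWORDS": -0.2,
}


def botnet_points(hits):
    """Sum the scores of the Botnet rules named in `hits` (hypothetical helper)."""
    return round(sum(SCORES[name] for name in hits), 2)


if __name__ == "__main__":
    # Hypothetical relay with broken forward/reverse DNS and its IP in the hostname.
    print(botnet_points(["BOTNET_BADDNS", "BOTNET_IPINHOSTNAME"]))
```

Which of these tests fire together for a given relay depends on the Botnet plugin itself, so real totals should be taken from actual message headers rather than from this arithmetic.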

Postby AnonymousDog » Tue Aug 14, 2007 1:16 pm

That is nicely written.

I like the strategy employed with the SA rulesets leveraging mimeheaders and AWL; it's a bit like greylisting at the SA level on mail with suspicious attachments. That should be useful as an alternative to outright file type banning when there's a flood of a particular file type attachment spam.

Botnet looks pretty well conceptualized and coded (with some negative-scoring rules even).

Postby AnonymousDog » Tue Sep 11, 2007 1:31 pm

gblades wrote:I have been running botnet for a while...These are the botnet rules I am using. I prefer to assign smaller scores to the individual tests and then whitelist the IPs of regular senders that cause false positives.

I think that's very wise. I've been testing Botnet 0.8 for a few days but never ran with the default score of 5 -- way too high for testing. So I downed it to a 2 and added a meta rule with !AWL (as I am so fond of doing lately) scored at 3. The meta hits on no ham at all so far, but the standard rule hits on enough legit hosts that I would not weight the rule any higher (and may go lower)...and there are too many legit hosts with bad DNS to chase around all the time.

I find most legit hosts that hit on it are failing BOTNET_IPINHOSTNAME and/or BOTNET_BADDNS. Unfortunately, those two are probably the most powerful tests (and, in a perfect world, would be the ones we could score through the roof, esp. BADDNS).

