Re: SPAM Database... not everything clear.

On Wed, 1 Apr 1998, Chris J. Mutter wrote:
>    I want to ask how the database is updated. Is it done automatically by
> zmailer or should I put the $MAILBIN/policy-builder.sh into the root
> crontab? And is the Database somewhere explained?? For example who decides
> which addresses/domains go into the database? I'm pretty new to this
> Anti-SPAM thing but am interested (got some serious spam problem here and
> mail bombings - what to do about mailbombings on zmailer?)

If you catch a mailbombing in progress, I'd recommend simply firewalling
the source off and then notifying the proper authorities to make the
abuser cease and desist on their end.  In any case, definitely do the

> how to fight against
> those boneheads who waste bandwidth.

If you use Python, I have attached a slightly updated version of my spam
list gathering program (i.e. goes to various sources, massages data,
eliminates duplicates and outputs the result to stdout.) It's updated for
Python 1.5 and now includes the ability to use a local file as a source of
data.  Left as an exercise for the reader to incorporate into
policy-builder.sh ... 
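
As a rough illustration of the "massage and de-duplicate" step described above, here is a minimal sketch for a current Python 3 interpreter. It is not the attached program itself: the name `merge_spam_lists` and the sample entries are hypothetical, while the '@' prefixing and the skipping of blank and '#' lines follow what the attached script does.

```python
def atify(entry):
    """Prefix a bare domain with '@'; leave full addresses untouched."""
    return entry if "@" in entry else "@" + entry


def merge_spam_lists(sources):
    """Merge {source_name: lines} into one sorted, duplicate-free list.

    Blank lines and lines starting with '#' are skipped.
    """
    seen = {}
    for name, lines in sources.items():
        for raw in lines:
            entry = raw.strip()
            if not entry or entry.startswith("#"):
                continue
            # A dictionary key can only occur once, so duplicates collapse;
            # the value records the last source that listed the entry.
            seen[atify(entry)] = name
    return sorted(seen)


if __name__ == "__main__":
    # Hypothetical sample data; real input would come from the fetched lists.
    sample = {
        "list-a": ["# comment", "spammer.example", "bulk@junk.example"],
        "list-b": ["spammer.example", "", "other.example"],
    }
    for entry in merge_spam_lists(sample):
        print(entry)
```

The result goes to stdout one entry per line, so a wrapper script could redirect it into whatever file the policy database is built from.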

Roy Bixler
The University of Chicago Press
# Emacs: use -*-Python-*- mode.
# Z-mailer spam list maker
# Roy Bixler
# rcb@press-gopher.uchicago.edu
# 1 Dec. 1997

import ftplib, httplib, string, sys
from urlparse import *

# returns the contents at the given URL (must be of type "file", "http"
# or "ftp") as a list of lines
def get_url_contents(url):
    global lns
    lns = []
    url_comps = urlparse(url)
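    if (url_comps[0] == "file"):
	f = open(url_comps[2])
	ln = f.readline()
	while ln:
	    lns.append(string.rstrip(ln))
	    ln = f.readline()
	f.close()
    elif (url_comps[0] == "ftp"):
	# callback for retrlines(): collect each line as it arrives
	def ftp_line(ln):
	    lns.append(ln)
	h = ftplib.FTP(url_comps[1])
	h.login() # anonymous login
	i = string.rfind(url_comps[2], '/')
	if (i >= 0):
	    # change to the file's directory, then fetch it by base name
	    h.cwd(url_comps[2][:i+1])
	    h.retrlines("RETR "+url_comps[2][i+1:], ftp_line)
	else:
	    h.retrlines("RETR "+url_comps[2], ftp_line)
	h.quit()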
    elif (url_comps[0] == "http"):
	h = httplib.HTTP(url_comps[1])
	h.putrequest('GET', url_comps[2])
	h.putheader('Accept', 'text/html')
	h.putheader('Accept', 'text/plain')
	h.endheaders()
	errcode, errmsg, headers = h.getreply()
	# HTTP/1.1 replies seem to generate an errorcode of -1, so try
	# to handle this case.  This may simply be a manifestation of
	# a broken Python 1.4 httplib module.  This bug has been fixed
	# with Python version 1.5.
	version = sys.version[0:3]
	if ((version < "1.5") and (errcode == -1)):
	    try:
		real_errcode = string.atoi(string.split(errmsg)[1])
	    except ValueError:
		real_errcode = -1 # yes, it really is bogus :-/
	    sys.stderr.write("%d" % (real_errcode)) # Should be 200
	else:
	    sys.stderr.write("%d" % (errcode)) # Should be 200
	f = h.getfile()
	ln = f.readline()
	# once again, try to compensate for broken behavior on HTTP/1.1
	# by eating the header lines which would otherwise show up in
	# the data.  This bug has been fixed with Python version 1.5.
	if ((version < "1.5") and (errcode == -1) and (real_errcode <> -1)):
	    while ((ln) and
		   ((len(ln) > 2) or
		    (ln[0] <> "\r") or (ln[-1] <> "\n"))):
		ln = f.readline()
	while ln:
	    lns.append(string.rstrip(ln)) # Get the raw HTML
	    ln = f.readline()
    return lns

# if no @-sign is found, insert one at the beginning of the string
def atify(dom):
    if (string.find(dom, '@') == -1):
	return '@'+dom
    else:
	return dom

# add the information found at 'svc_url' to a list of junk e-mailers.
# The list consists of the dictionary 'jdict'.  'svc_name' is merely used
# for the cosmetic purpose of progress reporting.  'start_after' specifies
# a string which marks the beginning of the list and 'end_before' similarly
# specifies a marker which tells when to stop reading the list.  These are
# both optional parameters.
def add_to_junkers_dict(jdict, svc_name, svc_url, start_after='',
			end_before=''):
    sys.stderr.write("%s: (status = " % (svc_name))
    tdict = get_url_contents(svc_url)
    sys.stderr.write(") - done\n")
    i = 0
    if (start_after):
	while ((i < len(tdict)) and
	       (tdict[i][0:len(start_after)] <> start_after)):
	    i = i+1
	i = i+1
    while (i < len(tdict)):
	if ((end_before) and (tdict[i][0:len(end_before)] == end_before)):
	    break
	if ((tdict[i]) and (tdict[i][0] <> "#")):
	    jdict[atify(tdict[i])] = svc_name
	i = i+1

# and now for the main program

# start with an empty junk list
sl = {}

#add_to_junkers_dict(sl, "Local",
#		    "file:/home/rcb/spamlist1",
#		    "", "")

add_to_junkers_dict(sl, "Hilotek",
		    "<h3", "<P")
add_to_junkers_dict(sl, "Taz 1",
		    "", "")
add_to_junkers_dict(sl, "Taz 2",
		    "", "")
add_to_junkers_dict(sl, "Webeasy",
		    "", "")
add_to_junkers_dict(sl, "Znet",
		    "", "")

# we only really care about the dictionary keys
ksl = sl.keys()

# output the sorted dictionary keys
ksl.sort()
for i in ksl:
    print i
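
For readers on Python 3, where `httplib` and `urlparse` are no longer available under those names, the fetch-and-trim logic might look roughly like the sketch below. It is a rewrite under stated assumptions, not the script above: `fetch_lines` and `between_markers` are hypothetical names, `urllib.request.urlopen` stands in for the hand-written file/ftp/http branches, and the Latin-1 decoding is a guess at how the lists are encoded.

```python
import urllib.request


def fetch_lines(url, timeout=30):
    """Return the body at a file:, ftp: or http(s): URL as a list of lines."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    # Assumption: the lists are plain 8-bit text; Latin-1 never fails to decode.
    return [ln.decode("latin-1").rstrip() for ln in data.splitlines()]


def between_markers(lines, start_after="", end_before=""):
    """Keep the lines after the first one starting with start_after and
    before the next one starting with end_before (both markers optional)."""
    i = 0
    if start_after:
        while i < len(lines) and not lines[i].startswith(start_after):
            i += 1
        i += 1  # skip the marker line itself
    kept = []
    for ln in lines[i:]:
        if end_before and ln.startswith(end_before):
            break
        kept.append(ln)
    return kept
```

With these, `between_markers(fetch_lines(url), "<h3", "<P")` would play the role of the "Hilotek" call above, for whatever URL that source actually lives at.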