deny list generation request



Hi:

I have an automated script I use to generate the smtp-policy.spam file.
I'll attach it below.  After running it, I then run a version of
policy-builder.sh with the 'lynx' line commented out.  Could there be an
option, something like '-n', to skip fetching the smtp-policy.spam file
through 'lynx' and just accept it as-is?

As for the script, it is written in pretty straightforward Python (see
http://www.python.org for details) and merely merges the AOL Preferred Mail
list with the Webeasy list for the output.  At one time, I also got some
lists via FTP from ftp.cybernothing.org, but those appear to be
obsolescent now.  The idea is that it's easy (at least, theoretically) to
add new sources to generate the final list; there is a rough sketch of that
after the script.

Here is the script:

#!/usr/bin/python
#
# spam list generator
#

import ftplib, httplib, string, sys
from urlparse import urlparse

# returns a list of lines from the given url (ftp: or http: only)
def get_url_contents(url):
    # lns is kept as a module global so the nested FTP callback below
    # can append to it (this script predates nested scopes in Python)
    global lns
    lns = []
    url_comps = urlparse(url)
    if url_comps[0] == "ftp":
        # callback invoked by retrlines once per line of the retrieved file
        def ftp_line(ln):
            lns.append(ln)
        h = ftplib.FTP(url_comps[1])
        h.login()
        i = string.rfind(url_comps[2], '/')
        if i >= 0:
            h.cwd(url_comps[2][:i])
            h.retrlines("RETR " + url_comps[2][i+1:], ftp_line)
        else:
            h.retrlines("RETR " + url_comps[2], ftp_line)
        h.close()
    elif url_comps[0] == "http":
        h = httplib.HTTP(url_comps[1])
        h.putrequest('GET', url_comps[2])
        h.putheader('Accept', 'text/html')
        h.putheader('Accept', 'text/plain')
        h.endheaders()
        errcode, errmsg, headers = h.getreply()
        sys.stderr.write("%d\n" % errcode)  # should be 200
        f = h.getfile()
        ln = f.readline()
        while ln:
            ln = string.rstrip(ln)
            lns.append(ln)  # keep the raw HTML lines
            ln = f.readline()
        f.close()
    return lns

# prepend an @ sign if none is yet in the string
def atify(dom):
    if string.find(dom, '@') == -1:
        return '@' + dom
    else:
        return dom

# sl maps each junk address/domain to the name of the list it came from
sl = {}

sys.stderr.write("AOL PreferredMail:")
csl = get_url_contents("http://www.idot.aol.com/preferredmail/")
sys.stderr.write("done\n")
# skip ahead to the <MULTICOL> block that holds the domain list
i = 0
while (i < len(csl)) and (csl[i][0:9] != "<MULTICOL"):
    i = i + 1
i = i + 1  # skip the <MULTICOL directive itself
# extract junk domains until the closing </MULTICOL> tag
while (i < len(csl)) and (csl[i][0:10] != "</MULTICOL"):
    if csl[i]:
        sl[atify(csl[i])] = "AOL"
    i = i + 1

sys.stderr.write("Webeasy list:")
csl = get_url_contents("http://www.webeasy.com:8080/spam/spam_download_table")
sys.stderr.write("done\n")  # progress messages go to stderr, not stdout
for i in csl:
    if i:
        sl[atify(i)] = "Webeasy"

# print the merged, sorted list on stdout
ksl = sl.keys()
ksl.sort()
for i in ksl:
    print i
#end script
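
By way of illustration, folding in another source would look roughly like
the lines below, assuming it serves a plain list with one address or domain
per line (the URL and the "Example" tag are made up, purely to show the
shape; the lines would go into the script just before the final sort and
print):

# hypothetical extra source -- the URL is a placeholder, not a real list
sys.stderr.write("Example list:")
csl = get_url_contents("http://lists.example.com/spam-domains.txt")
sys.stderr.write("done\n")
for i in csl:
    if i:
        sl[atify(i)] = "Example"

A source with more markup around it, like the AOL page, would also need its
own little extraction loop before the entries go into 'sl', but the final
merge and sort stay the same.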

---
Roy Bixler
The University of Chicago Press
rcb@press-gopher.uchicago.edu