This section covers subsystem management, including usage and configuration examples, and explanations, some basic and some more specific, of how the pre-existing scripts work.
Figure 14 repeats the earlier picture showing the central components of the system.
The components in the picture are covered further here, along with their related auxiliary programs:
Administration of the smtpserver is described in Chapter 12, and the detailed reference is in Chapter 17.
The sendmail client compatibility program is described in the reference, Chapter 18.
Administration of the router is described in Chapter 13, and the reference is in Chapter 21.
Auxiliary programs used to support the router include:
This script has two behaviours: if the $MAILVAR/db/dbases.conf file can be found, the utility script newdbprocessor is run. If not, a set of pre-determined individual database regeneration actions is done.
A perl wrapper for the actual internal makedb utility. It takes care of things like the correct sequence of file movements after successful generation of new binary database file(s).
Classical aliases database regenerator, subset of zmailer newdb.
Behaves like the newaliases, but processes different database: fqdnaliases. Subset of zmailer newdb.
Administration of the scheduler is described in Chapter 14, and the reference is in Chapter 22.
Auxiliary programs used to support the scheduler include:
Utility for querying scheduler's view of the universe.
Program driven by the scheduler to do (usually) delivery to local mailboxes, running pipes, and writing to files.
Program to do delivery to external systems via SMTP protocol (with many supported extensions).
Handler of “hold”-channel deliveries.
There are lots of programs intended to be run under sendmail's M-entry (Mailer-entry) lines. This program supplies that interface layer to the extent that is meaningful in the ZMailer sense.
Handler of “error”-channel deliveries.
This is actually driven by the administrator via manual-expirer wrapper script.
The cornerstone of everything in busy Internet email routing is a properly working DNS server and a modern resolver library. If you use the BIND nameserver, you should be using a recent version. As of this writing (January 2006), the BIND server developers recommend version 9. They also strongly recommend that servers holding zone data (either masters or slave copies) do not do any recursive resolving, and that recursive resolving is done with servers that do not have locally mastered data.
You can get improved DNS performance by installing a local named(8), which caches replies, including negative replies.
For the file /etc/resolv.conf:
domain your.domain
nameserver 127.0.0.1
nameserver (some other server)
For the local nameserver daemon (named(8)) you should have at least the following type of configuration:
For 4.* series: /etc/named.boot
forwarders 10.12.34.56 10.45.67.89
options forward-only
For 8.* series: /etc/named.conf
options {
forward only;
forwarders {
10.12.34.56;
10.45.67.89;
};
};
For Solaris, Linux, and some other environments you probably have the file /etc/nsswitch.conf. The interesting line there is the one with the “hosts:” tag. In most cases the default setup assumes you will use e.g. NIS(+) in the system, overriding DNS and/or local files. In general that is quite a bad thing to do, especially for a DNS-intensive application like a mailer. Suggested value:
hosts: files dns
On DEC Tru64 there is another file with the same purpose as nsswitch.conf: /etc/svc.conf.
On Solaris 2.6 (and after?) there is also an “nscd” daemon (name service cache daemon), which has appeared at times to harm DNS-lookup-intensive systems. In its configuration file /etc/nscd.conf, set:
enable-cache hosts no
The intention of the security mechanisms is not to prevent address faking, but to control the privileges that are used when executing pipes and accessing files (reading and writing).
In addition to doing strict privilege control on who can do what, ZMailer has a concept of trust, which shows as a group of accounts that can claim wildly “fraudulent” data in their headers.
The trusted accounts are those listed in the ZMailer group or the “trusted” variable in the system configuration (router.cf) file.
Of course, when one uses the SMTP protocol to inject email, it is extremely easy to “fake” any source and destination envelope and visible addresses.
Having local-parts that allow delivery to arbitrary files, or which can trigger execution of arbitrary programs, can clearly lead to a huge security problem. sendmail does address this problem, but in a restrictive, and unintuitive manner. This aspect of ZMailer security has been designed to allow the privileges expected by common sense.
The responsibility for implementing this kind of security is split between the router, and the transport agent that delivers a message to an address. Since it is the Transport Agent that must enforce the security, it needs some information to guide it. Specifically, for each address it delivers to, some information about the “trustworthiness” of that address is necessary so that the transport agent can determine which privileges it can assume when delivering for that destination. This information is determined by the router, and passed to the transport agent in the message control file. The specific measure of trustworthiness chosen by ZMailer, is simply a numeric user id (uid) value representing the source of the address.
When a message comes into the mailer from an external source, the destination addresses should obviously have no privileges on the local host (when mailing to a file or a program). Similarly, common sense would indicate that locally originated mail should have the same privileges as the originator. Based on an initial user id assigned from such considerations, the privilege attached to each address is modified by the attributes of the various alias files that contain expansions of it. The algorithm to determine the appropriate privilege is to use the user id of the owner of the alias file if and only if that file is not group or world writable, and the directory containing the file is owned by the same user and is likewise neither group nor world writable. If any of these conditions do not hold, an unprivileged user id will be assigned as the privilege level of the address.
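The ownership and permission test described above can be sketched in a few lines. This is only an illustration of the stated rule, not ZMailer's actual code: the function name and the fallback uid 65534 are assumptions made for the example, and, as the text notes, parent directories further up are not examined.

```python
import os
import stat

# Hypothetical "no privileges" uid; the real value is a local configuration choice.
UNPRIVILEGED_UID = 65534


def alias_file_privilege(path, unprivileged_uid=UNPRIVILEGED_UID):
    """Return the uid that expansions from the alias file at `path` may carry.

    The owner's uid is used only if the file is neither group nor world
    writable, and its directory has the same owner and is likewise neither
    group nor world writable. Otherwise the unprivileged uid is returned.
    """
    writable_by_others = stat.S_IWGRP | stat.S_IWOTH
    try:
        file_st = os.stat(path)
        dir_st = os.stat(os.path.dirname(os.path.abspath(path)))
    except OSError:
        return unprivileged_uid          # cannot inspect: grant nothing
    if file_st.st_mode & writable_by_others:
        return unprivileged_uid          # file is group or world writable
    if dir_st.st_uid != file_st.st_uid:
        return unprivileged_uid          # directory has a different owner
    if dir_st.st_mode & writable_by_others:
        return unprivileged_uid          # directory is group or world writable
    return file_st.st_uid
```

A file with mode 0600 in a directory with mode 0700, both owned by the same user, yields that user's uid; loosening either mode to allow group or world writes drops the result to the unprivileged uid.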
It is entirely up to the transport agent whether it will honour the privilege assignment of an address, and indeed in many cases it might not make sense (for example for outbound mail). However, it is strongly recommended that appropriate measures are taken when a transport agent has no control over some action that may affect local files, security, or resources.
The described algorithm is far from perfect. The obvious dangers are:
The grandparent directories, to the Nth degree, are ignored, and may not be secure. In that case all security is lost.
There is a window of vulnerability between when the permissions are checked, and the delivery is actually made. This is the best argument for embedding the entire local-aliasing into the local-delivery agent program.
There is also another kind of security that must be addressed. That is the mechanism by which the router is told about the origin of a message. This is something that must be possible for the message receiving programs (/bin/rmail and the SMTP server are examples of these) to specify to ZMailer. The router knows of a list of trusted accounts on the system. If a message file is owned by one of these user id's, any sender specification within the message file will be believed by ZMailer. If the message file is not owned by such a trusted account, the router will cross-check the message file owner with any stated “From:” or “Sender:” address in the message header, or any origin specified in the envelope. If a discrepancy is discovered, appropriate action will be taken. This means that there is no way to forge the internal origin of a message without access to a trusted account.
Normal processing within ZMailer goes via the directories shown in Figure 11-1. A message may get sidelined or otherwise linked into other directories for several possible reasons.
[Figure 11-1]
A filesystem without following three properties is not suitable for
ZMailer's $
Most of the $ However, the $ Adding system reliability, in the form of having directory data committed to disk by the time the directory-modifying operations return with success, is a nice bonus, although in normal UFS-like cases that taxes system performance heavily. E.g. running fast-and-loose with async metadata updates in the Linux EXT2 filesystem gives you performance, but if the system crashes, your postoffice directory may be in shambles, and important email may have been lost. How exactly you combat the problem is yours to choose. Most filesystems for UNIX have lots of different options at mount time, and per-directory attributes can also be set to control these things. Check yours after you decide what kind of data loss threat you can tolerate at the expense of what speed reduction. (E.g. 300+ days of straight uptime with power surges during a thunderstorm at the end of it toasting your machine along with its disks and filesystems, but trouble-free running until then?)
There are multiple queues in ZMailer. Messages exist in one of five locations:
Submission temporary directory ($POSTOFFICE/public/)
Freezer directory ($POSTOFFICE/freezer/)
Router directory ($POSTOFFICE/router/)
Deferred directory ($POSTOFFICE/deferred/)
Scheduler directories ($POSTOFFICE/transport/, $POSTOFFICE/queue/)
And sometimes a message is also copied into the:
Postmaster analysis area ($POSTOFFICE/postman/)
In a few places inside ZMailer (in parts of the router, and more so in parts of the scheduler) the system expects the filenames to be decimal encodings of integers of 31 bits (maybe 63 bits on systems with a suitably large 'long'), and those integers (modulo something) are used as keys in several internal database lookups.
The numeric values used in filenames must be unique for the entire lifetime of the spool files.
Message submission is done by writing a temporary file into the directory $POSTOFFICE/public/. The actual format of the submitted message is described in Appendix E.
When the temporary file is completely written, flushed to disk, and closed, it is then renamed into one of the router input directories, usually into $POSTOFFICE/router/ with a name that is a decimal representation of the spool-file i-node number.
This is a way to ensure that the name of the file in the $POSTOFFICE/router/ directory is unique.
The message may also be renamed into alternate router directories, which give lower priorities on which messages to process when.
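The submission steps described above (write into public/, flush and close, then rename into router/ under the decimal i-node number) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name and the temporary file name are hypothetical, and the message body is written as-is, without the real submission format from Appendix E.

```python
import os


def submit_message(postoffice, data):
    """Write `data` into public/, then rename it into router/ by i-node number."""
    # Hypothetical temporary name; any name unique within public/ would do.
    tmp_path = os.path.join(postoffice, "public", "submit-%d.tmp" % os.getpid())
    with open(tmp_path, "xb") as spool:      # "x": fail if the name already exists
        spool.write(data)
        spool.flush()
        os.fsync(spool.fileno())             # force the contents to disk
        inode = os.fstat(spool.fileno()).st_ino
    # A rename within one filesystem keeps the i-node, so the name stays
    # unique for as long as the spool file exists.
    final_path = os.path.join(postoffice, "router", str(inode))
    os.rename(tmp_path, final_path)
    return final_path
```

Both directories must be on the same filesystem for the rename to work, which is also what makes the i-node number a valid unique key.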
Sometimes files, especially those built by the smtpserver, may be moved into alternate directories. The smtpserver “ETRN” command has two implementations; the original one works by moving the built special file to the directory $POSTOFFICE/transport/ without going through the router. The smtpserver may also move newly arrived files into the $POSTOFFICE/freezer/ directory.
FIXME: This description is from the era before the router got “daemonized” in the sense of having a separate queue processor instance (which also handles logging and log rotation) and a worker farm of routing work processes.
FIXME: The system can have multiple router processes running in parallel and competing for input files. Multiple processes may make sense when there are multiple processors in the system, allowing them to run in parallel, but perhaps also for cases where one process is handling the routing of some large list while the others (hopefully) get less costly jobs.
The router processes have a few different behaviours when they go over their input directories.
First of all, if there are ROUTERDIRS entries, those are scanned for processing after
the primary $POSTOFFICE/router/ directory is found empty.
Within each directory, the router will sort files at first into mod-time order, and then
process the oldest message first. (Unless the router has been started with the “-s” option.)
The router acquires a lock on the message (spool file) by renaming the file from its previous name to a name created with the formatting statement:
sprintf(buf, "%ld-%d", (long)filestat.st_ino, (int)getpid());
Once the router has been able to acquire a new name for the file, it starts off by creating a temporary file for the router's routing decisions. The file has a name created with the formatting statement:
sprintf(buf, "..%ld-%d", (long)filestat.st_ino, (int)getpid());
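The rename-based locking can be illustrated with a short sketch. It assumes only what the text states: a router process claims a spool file by renaming it to "inode-pid", and names its decisions file with a ".." prefix. The function names are hypothetical, and the real router does considerably more around these steps.

```python
import os


def claim_spool_file(directory, name):
    """Try to claim a spool file by renaming it; return the new name, or None."""
    old_path = os.path.join(directory, name)
    try:
        inode = os.stat(old_path).st_ino
        new_name = "%d-%d" % (inode, os.getpid())
        os.rename(old_path, os.path.join(directory, new_name))
    except FileNotFoundError:
        return None      # a competing process renamed the file first
    return new_name


def decisions_file_name(claimed_name):
    """Name of the temporary routing-decisions file for a claimed spool file."""
    return ".." + claimed_name
```

Because a rename either succeeds for exactly one process or fails with "no such file" for the others, no separate lock file is needed.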
Once the processing has completed successfully, the original input file is moved into the directory $POSTOFFICE/queue/, and the scheduler work-specification file produced by the router is moved to the $POSTOFFICE/transport/ directory with the same name as the original file.
If the routing analysis encountered problems, the message may end up moved into the directory $POSTOFFICE/deferred/, from which the command zmailer resubmit is needed to return the messages into processing. (The router logs should be consulted for the reason why the message ended up in the /deferred/ area, especially if the command zmailer resubmit is not able to get the messages processed successfully and the files end up back in the /deferred/ area.)
If the original message had errors in its RFC-822 compliance, or in some other detail, a copy of the message may end up in the directory $POSTOFFICE/postman/.
See Postmaster Analysis Area, Section 11.4.
The scheduler work specification files are in the directory $POSTOFFICE/transport/, under which there can (optionally) be one or two levels of subdirectories. The actual work files are scattered into these to keep individual directories small, and thus to speed up the directory lookups that happen when the transport agents (and also the scheduler) open files.
When the router has completed message file processing, it places the resulting files into the top-level directories of the scheduler: $POSTOFFICE/transport/ and $POSTOFFICE/queue/.
The scheduler (if so configured by “-H” option) will move the messages into “hashed
subdirectories,” when it finds the new work specification files, and then start processing them.
The transport agents are run with their CWD in directory $POSTOFFICE/transport/, and they open files relative to that location. Actual message bodies, when
needed, are opened with the path prefix “../queue/” to
the work specification file name.
Usually it is the transport agent's duty to log different permanent status reports (failures, successes) into the end of the work-specification file. Sometimes the scheduler also logs something at the end of this file. All such operations are attempted without any sort of explicit locking, instead trusting the write(2) system call to behave in an atomic manner while writing to the disk file, and having a single buffer of data to write.
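The lock-free logging relies on handing one complete record to a single write(2) call. A minimal sketch of that idea follows. The use of O_APPEND is an assumption made for the example (the text only says the record goes to the end of the file), and the function name and record format are hypothetical.

```python
import os


def append_status(path, line):
    """Append one status record using a single write(2) call.

    The whole record is prepared in one buffer first, so that concurrent
    writers do not interleave parts of their records. O_APPEND (an
    assumption here) makes the kernel position each write at end of file.
    """
    record = line.rstrip("\n").encode("utf-8") + b"\n"
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        written = os.write(fd, record)
    finally:
        os.close(fd)
    return written == len(record)    # False signals a short write
```

A caller that gets False back has a partially written record and must decide how to recover; this sketch does not attempt that.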
Once the scheduler has had all message recipient addresses processed by the transport agents, it will handle possible diagnostics on the message, and
finally it will remove the original spool-file from the $POSTOFFICE/queue/, and the work-specification file from $POSTOFFICE/transport/.
If the filename in the $POSTOFFICE/postman/ directory has an
underscore in it, the reason for the copy is soft, that is, the
message has been sent through successfully in spite of being copied into the directory.
If the filename in that directory does not have an underscore in it, that file has not been processed successfully, and the only copy of the message is now in that directory!
Usually the aforementioned underscoreless filenames are double errors, that is, error messages about error messages. There is nowhere else to send them.
The indication of error message is, of course, MAIL FROM:<> per RFC 821.
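An administrator checking the postman area mostly needs to know which files are the only surviving copies. The underscore rule above can be applied with a short sketch like this one; the function name is hypothetical, and nothing beyond the presence or absence of an underscore in the name is assumed about the filename format.

```python
import os


def classify_postman(directory):
    """Split postman/ filenames into (soft_copies, sole_copies).

    A name containing an underscore marks a copy of a message that was
    also processed normally. A name without one marks a message whose
    only copy is in this directory.
    """
    soft, sole = [], []
    for name in sorted(os.listdir(directory)):
        (soft if "_" in name else sole).append(name)
    return soft, sole
```

Files in the second list deserve attention first, since deleting them loses the message.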
If the smtpserver receives a message with content that the policy filtering system decides is dubious, it can move the message into the $POSTOFFICE/freezer/ directory with a somewhat explanatory name of the form:
sprintf(buf, "%ld.%s", (long)filestat.st_ino, causecodestring);
The files in the freezer-area are in the input format to the router, and as of this writing, there are no tools to automatically process them for obvious spams, and leave just those that were falsely triggered.
Things to place here:
- administrative stuff
- runtime command-line parameters (most important of them)
- smtpserver.conf
- PARAM entries (most common/important ones)
- SMTP policy-control
- content-policy interface
The smtpserver is ZMailer's component for receiving incoming email via the SMTP protocol, whether through a TCP channel or through Batch-SMTP.
Figure 12-1 repeats the earlier picture showing the central components of the system, and where the smtpserver is in relation to them all.
The smtpserver program actually has several operational modes.
It can operate as a stand-alone internet service socket listener, which forks off child processes that do the actual SMTP protocol service.
It can be started under the control of the inetd(8) server, where it can fulfill most of the same roles as it does in the stand-alone mode.
It can even be used to accept Batch-SMTP from incoming files (UUCP, and BITNET uses, for example).
The runtime command-line options are as follows:
smtpserver [-46aignvBV] [-p port] [-l logfile] [-s [ftveR]] [-s 'strict']
[-L maxloadaver] [-M SMTPmaxsize] [-P postoffice] [-R router]
[-C cfgfile] [-T IPv4/IPv6-address-literal]
The most commonly used command line options are:
smtpserver [-aBivn] [-s ehlo-styles] [-l logfile] [-C cfgpath]
Without any arguments the smtpserver will start as a daemon listening on TCP port 25 (SMTP).
The most important of the options are:
-a: Query IDENT information about the incoming connection. This information (if available at all) may or may not tell who is forming the connection.
-B: The session is a Batch-SMTP (a.k.a. BSMTP) session. The “-i” option is needed when feeding in the input batch file.
-i: This is an interactive session. I/O is done through stdin/stdout.
-v: Verbose trace written to stdout, for use in conjunction with “-i” and “-B”.
-n: This tells the smtpserver that it is running under inetd(8), and that the stdin/stdout file handles are actually network sockets on which we can do peer identity lookups, for example.
-s ehlo-style: Default value for the various checks done at the SMTP protocol MAIL FROM, RCPT TO, VRFY, and EXPN commands. These are overridden with the value from the EHLO patterns, if they are available (more below).
-s 'strict': Special value directing the system to be extremely picky about the incoming SMTP protocol. It is mainly for protocol compliance testing, and usually way too picky for the average (sloppy) applications out there.
-l logfilepath: Filename for the smtpserver input protocol log. This logs about everything, except actual message data.
-l 'SYSLOG': This tells the smtpserver to send all incoming SMTP protocol transactions elsewhere via the syslog facility. The syslog parameters used are: FACILITY=mail, LEVEL=debug. This option may be used in addition to the preceding file logging variant. Giving the file variant twice uses the last defined file, but this option does not affect files at all.
-C configfilepath: Full path for the smtpserver configuration, in case the default location cannot be used for some reason.
-M SMTPmaxsize: SMTPmaxsize defines the absolute maximum size we accept for incoming email. (Default: infinite) (This is a local policy issue.)
-T [IPv4-or-IPv6-address-literal]: An address literal, used in interactive test mode to check how the rules work with given inputs when the source address of the connection is the given one.
If the system has file $MAILSHARE/smtpserver.conf (by
default), that file is read for various parameters, which can override most of those possibly issued at the command
line.
An example configuration is in Figure 12-2.
Figure 12-2. Sample smtpserver.conf file
#PARAM maxsize              10000000   # Same as -M -option
#PARAM max-error-recipients        3   # More than this is probably SPAM!
#PARAM MaxSameIpSource            10   # Max simultaneous connections from
#                                      # any IP source address
#PARAM ListenQueueSize            10   # listen(2) parameter
#
# Enables of some commands:
#PARAM DEBUGcmd
#PARAM EXPNcmd
#PARAM VRFYcmd
#PARAM enable-router   # This is a security decision for you.
#                      # This is needed for EXPN/VRFY and interactive
#                      # processing of MAIL FROM and RCPT TO addresses.
#                      # However it also may allow external user entrance
#                      # to ZMailer router shell environment with suitably
#                      # pervert input, if quotation rules are broken in
#                      # the scripts.
PARAM help -------------------------------------------------------------
PARAM help This mail-server is at Yoyodyne Propulsion Inc.
PARAM help Our telephone number is: +1-234-567-8900, and ....
PARAM help -------------------------------------------------------------
# The policy database: (NOTE: See 'makedb' for its default suffixes!)
PARAM policydb $DBTYPE $MAILVAR/db/smtp-policy
# External program for received message content analysis:
#PARAM contentfilter $MAILBIN/smtp-contentfilter
....
#
# TLSv1/SSLv[23] parameters; all must be used for the system to work!
#
# See doc/guides/openssl, or:
# http://www.aet.tu-cottbus.de/personen/jaenicke/pfixtls/doc/setup.html
#
#PARAM use-tls
#PARAM tls-CAfile    $MAILVAR/db/smtpserver-CAcert.pem
#PARAM tls-cert-file $MAILVAR/db/smtpserver-cert.pem
#PARAM tls-key-file  $MAILVAR/db/smtpserver-key.pe
....
#
# HELO/EHLO-pattern   style-flags
#           [max loadavg]
#
localhost        999 ftveR
some.host.domain 999 !NO EMAIL ACCEPTED FROM YOUR MACHINE
# If the host presents itself as: HELO [1.2.3.4], be lenient to it..
# The syntax below is due to these patterns being SH-GLOB patterns,
# where brackets are special characters.
\[*\]            999 ve
# Per default demand strict syntactic adherence, including fully
# qualified addresses for MAIL FROM, and RCPT TO. To be lenient
# on that detail, remove the "R" from "veR" string below:
*                999 veR
The PARAM keywords and values are:
maxsize nn: Maximum size, in bytes, of the entire spool message, containing both the transport envelope and the actual message. That is, if the maxsize is too low and there are a lot of addresses, the message may become entirely undeliverable.
max-error-recipients nn: In case the message envelope is an error envelope (MAIL FROM:<>), then don't accept more than this many separate recipient addresses for it. The default value is 3, which should be enough for most cases. (Some SPAMs claim to be error messages, and then provide a huge number of recipient addresses.)
MaxSameIpSource nn: (Effective only on a daemon-mode server, not in the “-i” or “-n” modes.) Some systems set up multiple parallel connections to the same host (qmail ones especially, not that ZMailer has an entirely clean record on this, at least up to the 2.99.X series). We won't accept more than this many connections open in parallel from the same IP source address. The default value for this limit is 10.
ListenQueueSize nn: This relates to newer systems where the listen(2) system call can define higher limits than the traditional/original 5. This limit tells how many nascent TCP streams we can have in the SYN_RCVD state before we stop answering incoming SYN packets requesting the opening of a connection.
There are entirely deliberate denial-of-service attacks based on flooding a server with many SYNs to which it can't send replies back (because the machines the replies go to don't have network connectivity, for example), thus filling the back-queue of nascent sockets. This can also happen accidentally, as the connectivity between the client host and the server host may have a black hole into which the SYN-ACK packets disappear, so that the client is not able to complete the TCP startup three-way handshake.
Most modern systems can have this value raised into the thousands to improve the system's resiliency against malicious attacks, and most likely to provide complete immunity against the accidental “attack” by failing network routing.
DEBUGcmd: FIXME! WRITEME! #PARAM DEBUGcmd
EXPNcmd: FIXME! WRITEME! #PARAM EXPNcmd
VRFYcmd: FIXME! WRITEME! #PARAM VRFYcmd
enable-router: FIXME! WRITEME! #PARAM enable-router # This is a security decision for you.
help 'string': This adds yet another string (no quotes are used) to those that are presented to the client when it asks for “HELP” in the SMTP session.
PolicyDB dbtype dbpath: This defines the database type and the file path prefix of the binary database containing policy processing information. More on this below. The actual binary database file names are formed by appending type-specific suffixes to the path prefix. For example, an NDBM database appends “.pag” and “.dir”, while BSD-Btree appends only “.db”. (The latter has only one file, while the former has two.)
For an operative overview, see Section 12.2, and for deeper details, see Section 17.4.
contentfilter programpath: The contentfilter studies the received message at the end of the DATA or BDAT transaction, and produces a synchronous report on whether the message should be accepted or not. Unlike the PolicyDB, this does not (should not) care about the validity of the envelope source and recipient addresses, although perhaps it should consider at least the recipients in some cases, e.g. accept just about anything when the destination is <postmaster>.
For an operative overview, see Section 12.3, and for deeper details, see Section 17.5.
All lines that are neither comments nor start with the uppercase keyword “PARAM” are “EHLO-style patterns”. This is the oldest form of configuring the smtpserver, and as such, it can be seen...
Behaviour is based on glob patterns matching the HELO/EHLO name given by a remote client. Lines beginning with a “#”, or whitespace are ignored in the file, and all other lines must consist of two tokens: a shell-style (glob) pattern starting at the beginning of the line, whitespace, and a sequence of style flags. The first matching line is used. As a special case, the flags section may start with a ! character in which case the remainder of the line is a failure comment message to print at the client. This configuration capability is intended as a way to control misbehaving client software or mailers.
The meanings of the style flag characters are as follows:
f: Check MAIL FROM addresses through online processing at the attached router process
t: Check RCPT TO addresses through online processing at the attached router process
v: Allow execution of the VRFY command online at the attached router process
e: Allow execution of the EXPN command online at the attached router process
R: Require addresses to be in fully qualified (domained) form: “local@remote” (strict 821)
S: Allow sloppy input from systems incapable of respecting RFC 821 properly; WinCE1.0 does: “MAIL FROM:user@domain” :-(
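The first-match lookup on HELO/EHLO names can be sketched as follows. This is an approximation for illustration, not the smtpserver's parser: the function names are hypothetical, the optional numeric field (the max loadavg column of the sample file) is handled on the assumption that it is a plain integer, and Python's fnmatch stands in for shell glob matching. Note that fnmatch does not understand backslash escapes, so a pattern such as the sample's \[*\] would need separate handling.

```python
from fnmatch import fnmatchcase


def parse_style_lines(text):
    """Parse EHLO-style pattern lines into (pattern, max_loadavg, rest) tuples."""
    rules = []
    for line in text.splitlines():
        # Lines starting with "#" or whitespace, and empty lines, are ignored.
        if not line.strip() or line[0] in "# \t":
            continue
        parts = line.split(None, 1)
        pattern = parts[0]
        if pattern == "PARAM":               # PARAM lines are not patterns
            continue
        rest = parts[1] if len(parts) > 1 else ""
        max_loadavg = None
        fields = rest.split(None, 1)
        if fields and fields[0].isdigit():   # optional max loadavg column
            max_loadavg = int(fields[0])
            rest = fields[1] if len(fields) > 1 else ""
        rules.append((pattern, max_loadavg, rest.strip()))
    return rules


def lookup_style(rules, helo_name):
    """Return ("reject", message) or ("flags", flags) from the first matching line."""
    for pattern, _max_loadavg, rest in rules:
        if fnmatchcase(helo_name, pattern):
            if rest.startswith("!"):
                return ("reject", rest[1:].strip())
            return ("flags", rest)
    return ("flags", "")                     # nothing matched: no style flags
```

With the three plain patterns from the sample file, "localhost" maps to the flags "ftveR", "some.host.domain" is rejected with its failure message, and any other name falls through to the final "*" line and gets "veR".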
The policy database that the smtpserver uses is built with the policy-builder.sh script, which bundles together a set of policy source files:
| File | What |
|---|---|
| DB/smtp-policy.src | The boilerplate |
| DB/localnames | (“= _localnames”) |
| DB/smtp-policy.relay.manual | (“= _full_rights”) |
| DB/smtp-policy.relay | (“= _full_rights”) |
| DB/smtp-policy.mx.manual | (“= _relaytarget”) |
| DB/smtp-policy.mx | (“= _relaytarget”) |
| DB/smtp-policy.spam.manual | (“= _bulk_mail”) |
| DB/smtp-policy.spam | (“= _bulk_mail”) |
If you want, you can modify your smtp-policy.src boilerplate file as well as your installed policy-builder.sh script.
Basically these various source files (if they exist) are used to combine knowledge of valid users around us:
The controlling boilerplate, which you should modify!
Who we are — ok domains for receiving.
Who can use us as outbound relay.
Use [ip.number]/maskwidth here for listing those senders (networks) we want to trust. You may also use domains, or domain suffixes, so that the IP-reversed hostnames are accepted (but that is a bit risky due to the ease of faking the reversed domain names):
[11.22.33.00]/24
ip-reversed.host.name
.domain.suffix
The server sets its internal “always_accept” flag during the source IP address tests, before it decides what to tell the contacting client. The flag is not modified afterwards during the session.
Usage of domain names here is discouraged, as there is no way to tell that the domain “foo.bar” here has a different meaning than the same domain elsewhere, in “smtp-policy.mx” for example.
Who really are our MX clients. Use this when you really know them, and don't want just to trust that if a recipient has an MX pointing to you, it would be ok…
You can substitute this knowledge with a fuzzy feeling by using the “acceptifmx -” attribute in the generic boilerplate. List here domain names, possibly suffixes:
mx-target.dom
.mx-target.dom
You CAN also list here all POSTMASTER addresses you accept email routed to:
postmaster@local.domain
postmaster@client.domain
Those users and domains that are absolutely no-no as senders or recipients, no matter what earlier analysis has shown. (Except for those senders that we absolutely trust.)
user@domain
user@
domain
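To make the [ip.number]/maskwidth notation used for relay entries concrete, here is a small sketch of how such an entry selects source addresses. It is an illustration only: the function names are hypothetical, the policy database does its own matching, and Python's ipaddress module is used as a stand-in. The example address is written with a final octet of 0 rather than 00, because recent Python versions reject leading zeros in IPv4 octets.

```python
import ipaddress


def parse_bracketed_network(entry):
    """Parse "[a.b.c.d]/width" into a network object; None for non-IP entries."""
    if not entry.startswith("["):
        return None                  # a domain name or domain suffix entry
    address, separator, width = entry[1:].partition("]/")
    if not separator:
        return None
    # strict=False tolerates host bits set in the address part.
    return ipaddress.ip_network("%s/%s" % (address, width), strict=False)


def source_matches(entry, source_ip):
    """True if `source_ip` falls inside the network named by `entry`."""
    network = parse_bracketed_network(entry)
    if network is None:
        return False
    address = ipaddress.ip_address(source_ip)
    return address.version == network.version and address in network
```

For example, "[11.22.33.0]/24" covers 11.22.33.77 but not 11.22.34.1, while the wild-card "[0.0.0.0]/0" used in the boilerplate covers every IPv4 source.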
In the “smtp-policy.src” boilerplate file there is one particular section containing default setting statements. See Figure 12-3 for the salient details.
Figure 12-3. The smtp-policy.src file default settings fragment
...
#| ===========================
#|
#| Default handling boilerplates:
#|
#| "We are not relaying between off-site hosts, except when ..."
#|
#| You MUST uncomment one of these default-defining pairs, or the blocking
#| of relay hijack will not work at all !
#|
#| == 1st alternate: No MX target usage, no DNS existence verify
#| Will accept for reception only those domains explicitly listed
#| in ``smtp-policy.mx'' and ``localnames'' files. Will not do
#| verifications on validity/invalidity of source domains:<foo@bar>
#
# . relaycustomer - relaytarget -
# [0.0.0.0]/0 relaycustomer - relaytarget -
#
#| == 2nd alternate: No MX target usage, DNS existence verify
#| Like the 1st alternate, except will verify the sender
#| (MAIL FROM:<..>) address for existence of the DNS MX and/or
#| A/AAAA data -- e.g. validity.
#| If RBL parameters are set below, will use them.
#
# . relaycustomer - relaytarget - senderokwithdns + = _rbl1
# [0.0.0.0]/0 relaycustomer - relaytarget - senderokwithdns + = _rbl0
#
#| == 3rd alternate: MX relay trust, DNS existence verify
#| For the people who are in deep s*... That is, those who for some
#| reason have given open permissions for people to use their server
#| as MX backup for their clients, but don't know all domains valid
#| to go thru... Substitutes accurate data to user's whimsical DNS
#| maintenance activities. Vulnerable to inbound MX resource abuse.
#| If RBL parameters are set below, will use them.
. relaycustomer - acceptifmx - senderokwithdns + = _rbl1
[0.0.0.0]/0 relaycustomer - acceptifmx - senderokwithdns + = _rbl0
#| == 4th alternate: Sender & recipient DNS existence verify
#| This is more of an example for the symmetry's sake, verifies that
#| the source and destination domains are DNS resolvable, but does not
#| block relaying
#
# . senderokwithdns - acceptifdns -
# [0.0.0.0]/0 senderokwithdns - acceptifdns -
#
#|
#| You may also add ``test-dns-rbl +'' attribute pair to [0.0.0.0]/0
#| of your choice to use Paul Vixie's http://maps.vix.com/ MAPS RBL
#| system.
#|
#| These rules mean that locally accepted hostnames MUST be listed in
#| the database with ``relaytarget +'' attribute.
#|
#| ===========================
#|
#| RBL type test rules:
#| First RBL variant: NONE OF THE RBL TESTS
_rbl0 # Nothing at early phase
_rbl1 # Nothing at late phase
#| Second RBL variant: Early block with RBL+DUL+RSS
#_rbl0 test-dns-rbl +:dul.maps.vix.com:relays.mail-abuse.org
#_rbl1 # Nothing at late phase
#| Third RBL variant: Late block with RBL+DUL+RSS
#_rbl0 rcpt-dns-rbl +:dul.maps.vix.com:relays.mail-abuse.org
#_rbl1 test-rcpt-dns-rbl +
#| (The "+" at the DNS zone defines is treated as shorthand to
#| "rbl.maps.vix.com")
#|
#| The Third RBL variant means that all target domains can all by
#| themselves choose if they use RBL to do source filtering.
#| The ``= _RBL1'' test *must* be added to all domain instances
#| where the check is wanted.
#| (Including the last-resort domain default of ".")
#| (Or inverting: If some recipient domain is *not* wanting RBL-type
#| tests, that domain shall have ``test-rcpt-dns-rbl -'' attribute
#| pair given for it at the input datasets - consider smtp-policy.mx
#| file!)
#|
#| These rules mean that locally accepted hostnames MUST be listed in
#| the database with 'relaytarget +' attribute. ("acceptifmx *" allows
#| reception if the local system is amongst the MXes.)
#|
#| ===========================
#|
#| If your system has ``whoson'' server (see contrib/whoson-*.tgz),
#| you can activate it by adding 'trust-whoson +' attribute pair to
#| the wild-card IP address test: [0.0.0.0]/0 of your choice.
#|
#| ===========================
#|
#| For outbound relaying control for fixed IP address networks, see
#| comments in file: smtp-policy.relay
#|
#| ===========================
...
ZMailer can also analyse message content with an external program at the end of the DATA-dot phase and of the BDAT LAST phase (that is, when the input message is complete and the email sender expects the final acknowledgement).
The program becomes active if the PARAM entry “contentfilter” is set:
# External program for received message content analysis:
#PARAM contentfilter $MAILBIN/smtp-contentfilter
More details are at the Reference part: Section 17.5.
The router is the part of ZMailer that uses algorithms and control databases to determine what the later stages, such as the scheduler, should do with the message.
Figure 13-1 repeats the earlier picture showing the central components of the system, and where the router sits in relation to them all.
The following sections are intended to cover these topics:
What input data the router uses
What output it produces
How the router is configured, including what the 'dbases.conf' file is
How it can be tuned
FIXME:
- Intro to what the router does to the message
- How the configuration scripts are loaded
- How the standard scripts are tunable by means of databases,
specifically 'dbases.conf'
- The ROUGH logic of the standard scripts
- What to do if one wants to tune ?
The names (determined at compile-time) and interface specifications for the routing and crossbar functions, are the only crucial “magical” things one needs to contend with in a proper router configuration. The syntax and semantics of the configuration file's contents are dealt with in the following subsection. The details of the two functions introduced here are specified after that, once the necessary background information has been given.
Router behavior is controlled by a configuration file read at startup. It is a zmsh(1) script that uses facilities built into the router.
The configuration file looks like a Bourne Shell script at first glance. There are minor syntax changes from standard sh(1), but the aim is to be as close to the Bourne Shell language as is practical. In fact some aspects of variable handling are more of PERL style, and others are even LISPish.
The contents of the file are compiled into bytecode, which can then be interpreted by the router. The configuration file is usually self-contained, although an easy mechanism exists to make use of external UNIX programs when so desired. Together with a very flexible database lookup mechanism, functions, and address manipulation based on token-matching regular expressions, the configuration file language is an extremely flexible substrate to accomplish its purpose. When the language is inadequate, or if speed becomes an issue, it is possible to call built in (C coded) functions. The interface to these functions is mostly identical to what a standalone program would expect (modulo symbol name clashes and return values), to ease migration of external programs to inclusion in the router process.
Whenever the router process starts, its first action is to read its configuration file. The configuration file is a text file containing statements that are interpreted immediately as the file is read. Some statements are function definitions, in which case the function is defined at that point in reading the configuration file. The purpose of the configuration file is to provide a simple way to customize the routing behavior of the mailer, and this is primarily achieved by defining the router (see Section 21.4.2) and crossbar (see Section 21.4.3) functions.
For these to work properly, some initialization code and auxiliary functions will usually be needed.
At first sight, a configuration file looks like a Bourne shell script. The ideal is to duplicate the functionality, syntax, and to a large degree the semantics, of a shell script. Therefore, the configuration file programming language is defined in terms of its deviation from standard Bourne shell syntax and semantics. The present differences are:
No repeat statement.
Functions are allowed, parameter lists are allowed. If not enough arguments are present in a function call to exhaust the parameter list, the so-far unbound parameter variables are bound to "" (the empty string) as local variables. For example, this is the identity address rewriting function:
null (address) {
return $address # surprise!
}
Multiple-value returns are allowed. The return statement can be used to return a
non-“” (non-empty string) value from a function. The following are all legal
return statements:
return
return $address
return $channel ${next_host} ${next_address}
Variables are dynamically scoped. Local variables are the ones in a function's parameter list and those declared with the “local” statement. Only the first value of a multiple-value return may be assigned to a variable. All values are either strings or lists, so no type information, checking, or declaration is necessary.
Quoting is a bit stilted. All quotes (double, single, and back quotes) must appear in matching pairs at the beginning and at the end of a word.
CHECK! Single quotes are not stripped; double quotes cause the enclosed character sequence to be collected into a quoted-string RFC822 token.
For example, the statement:
foo `bar "`baz`"`
is parsed as:
(apply 'foo (apply 'bar (baz)))
In standard shells, IFS governs how the results of variable expansion are treated. When the expansion is not enclosed in double quotes, the result is first split at sequences of the characters contained in IFS.
ZMailer's “shell” behaves like PERL in this regard, and does not do IFS splitting on the result.
However, unlike PERL, a double-quoted evaluation does not have its contents re-evaluated.
Thus it is equally safe to do assignments like:
var1='some text here'
var2=' more text'
cat1="$var1$var2"
cat2=$var1$var2
The notable thing in this example is that both result variables are concatenations of the input strings.
However, if either input is a list of any kind, the concatenation must not be done this way. See lappend.
Because there is no implicit split by the IFS characters, ZMailer's “shell” contains the function ifssplit.
The for construct is even stranger. The classical Bourne script:
countvar='1 2 3 4 5 6'
for x in $countvar; do ... ; done
yields only a nasty surprise.
Here are two alternative ways to do it:
countvar='1 2 3 4 5 6'
for x in $(ifssplit $countvar); do ... ; done

countvar=(1 2 3 4 5 6)
for x in $(elements $countvar); do ... ; done
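The difference between the two expansion models can be illustrated outside ZMailer. The following Python sketch is only an analogy: the function names bourne_for_words and zmsh_for_words are hypothetical and are not part of ZMailer, and the second one merely models the "no implicit IFS split" behaviour described above.

```python
def bourne_for_words(value, ifs=" \t\n"):
    """Classical Bourne shell: an unquoted expansion is split on IFS characters."""
    words, current = [], ""
    for ch in value:
        if ch in ifs:
            if current:
                words.append(current)
                current = ""
        else:
            current += ch
    if current:
        words.append(current)
    return words


def zmsh_for_words(value):
    """Model of the behaviour described above: without an implicit IFS split,
    a string variable expands to exactly one word."""
    return [value]


countvar = "1 2 3 4 5 6"
print(len(bourne_for_words(countvar)))  # number of loop iterations in a Bourne shell
print(len(zmsh_for_words(countvar)))    # number of loop iterations without IFS splitting
```

In this model the explicit ifssplit call plays the role of bourne_for_words: the split happens only when the script asks for it.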
Conditional substitution forms are supported:
${variable:=value}
${variable:-value}
${variable:+value}
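The text does not spell out the meaning of the three forms. Assuming they follow the usual Bourne shell semantics (an assumption, not something confirmed here for zmsh), they can be modelled as below. The function expand and the dictionary env are hypothetical helpers for illustration only.

```python
def expand(env, name, op, value):
    """Evaluate ${name<op>value} against the variable dictionary env,
    following the usual Bourne shell rules."""
    current = env.get(name, "")
    if op == ":-":                 # use a default, leave the variable alone
        return current if current else value
    if op == ":=":                 # use a default and also assign it
        if not current:
            env[name] = value
            return value
        return current
    if op == ":+":                 # use the alternative only when the variable is set
        return value if current else ""
    raise ValueError("unknown operator: " + op)


env = {"channel": "smtp"}
print(expand(env, "channel", ":-", "local"))   # variable is set: its own value is used
print(expand(env, "host", ":-", "localhost"))  # unset: default is used, env is unchanged
print(expand(env, "host", ":=", "localhost"))  # unset: default is used and stored in env
print(expand(env, "host", ":+", "yes"))        # now set: the alternative is used
```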
Patterns (in case labels) are parsed once, the first time they are encountered.
This is like with PERL's “m/../o” patterns.
At the end of a case label body, the sequentially following case labels of the same case statement are also tried for a successful pattern match (and the corresponding case label body executed). The only exceptions (apart from encountering a return statement) are:
again: a function which retries the current case label for a match
break: continues execution after the current case statement
There is also a variant of “case” that uses regular expressions, in two flavours:
ssift: a “String Shift”, where the input string is handled as is.
tsift: a “Token Shift”, where the input string is split according to RFC-822 tokenization rules. In particular, RFC-822 special characters cause tokens to split.
With “tsift” the “.” (dot) will match any single “rfc822-token”; for example, the input string “foo.bar” has three tokens: “foo” (atom), “.” (dot, special), and “bar” (atom).
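The token split can be pictured with a simplified tokenizer. This Python sketch is not the router's tokenizer: it only separates atoms from the RFC-822 special characters, and it deliberately ignores quoted strings, comments and domain literals. The name tokenize is hypothetical.

```python
# The "specials" of RFC 822: each one is a token of its own.
RFC822_SPECIALS = set('()<>@,;:\\".[]')


def tokenize(text):
    """Split text into (kind, value) pairs of atoms and single special characters.
    Simplification: whitespace only separates tokens, and quoted strings,
    comments and domain literals are not recognised as units."""
    tokens, atom = [], ""
    for ch in text:
        if ch in RFC822_SPECIALS or ch.isspace():
            if atom:
                tokens.append(("atom", atom))
                atom = ""
            if ch in RFC822_SPECIALS:
                tokens.append(("special", ch))
        else:
            atom += ch
    if atom:
        tokens.append(("atom", atom))
    return tokens


print(tokenize("foo.bar"))
```

Under these simplifications “foo.bar” yields the same three tokens as listed above, which is why a single “.” in a tsift pattern can stand for any one of them.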
Overall usage of these “sifts” is very much like that of “case”, including the need for matching termination tokens:
ssift "$invar" in
pattern
        statements
        ;;
tfiss
tsift "$invar" in
pattern
        statements
        ;;
tfist
Various standard Bourne shell functions are not built in.
The general form of function calls in the system is:
$(funcname arguments)
It returns a scalar or list object, and the result can be stored into variables at will.
Relations and other database lookups are constructed as function calls where the relation name is the function name. More about this later.
There are currently only three entry points (i.e. magic names known to the router code) in the configuration scripts, namely the process, the router, and the crossbar functions.
The process() script function is called with a file name as argument. The file is
typically located in the $POSTOFFICE/router/ directory. The
process() is a protocol switch function which uses the form of the file name to
determine how to process different types of messages.
The router() script function is called with an address as argument, and returns a quad of (channel, host, user, attribs) as four separate values. The first three correspond to the channel the message should be sent out on (the router function can also be called to check on who sent a message), the host or node name for that channel (the semantics depend on which channel is in effect), and the address the receiving agent should transmit to. The fourth is the name of an “attribute” storage variable, from which a “privilege” value-pair is picked for recipient address security control functions.
The crossbar() function is in charge of rewriting envelope addresses, selecting
message header address munging type (a function to be called with each message header address), and possibly doing
per-message logging or enforcing restrictions deemed necessary. It takes a sender-quad and a receiver-quad as
arguments (eight parameters altogether). It returns the new values for each element of the two quads, and in
addition a function name corresponding to the function to be used to rewrite header addresses for the specific
destination. If the destination is to be ignored, returning a null function name will accomplish this.
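The shape of these two interfaces can be sketched as follows. This Python model only mirrors the argument and return counts described above. The routing rules, the attribute name "default_attributes", the rewriter names "internet" and "null", the "ignore" channel, and the order of the returned values are all invented for illustration and are not taken from the standard scripts.

```python
def router(address):
    """Toy stand-in: returns the quad (channel, host, user, attribs)."""
    local, _, domain = address.partition("@")
    if domain in ("", "localhost"):            # hypothetical "local" rule
        return ("local", "-", local, "default_attributes")
    return ("smtp", domain, address, "default_attributes")


def crossbar(f_chan, f_host, f_user, f_attr, t_chan, t_host, t_user, t_attr):
    """Toy stand-in: takes the sender and receiver quads (eight parameters) and
    returns both quads plus the name of a header rewriting function."""
    if t_chan == "ignore":                     # hypothetical: destination is dropped
        rewriter = ""                          # a null function name
    else:
        rewriter = "internet" if t_chan == "smtp" else "null"
    return (f_chan, f_host, f_user, f_attr,
            t_chan, t_host, t_user, t_attr, rewriter)


sender = router("root")
receiver = router("user@example.org")
print(crossbar(*sender, *receiver))
```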
There is a fourth script entrypoint used by the smtpserver program, namely the server(), which is used to implement smtpserver's realtime support facilities for “EXPN”, and “VRFY” commands, and optionally also to process addresses in “MAIL FROM:<…>”, and “RCPT TO:<…>” commands.
The router has several built in (C coded) functions. Their calling sequence and interface specification is exactly the same as for the functions defined in the configuration file. Some of these functions have special semantics, and they fall into three classes, as follows:
Functions that are critical to the proper functioning of the configuration file interpreter:
return: returns its argument(s) as the value of a function call
again: repeats the current case, and *sift label
break: exits case, and *sift statements
Functions that are necessary to complete the capabilities of the interpreter:
relation: defines a database to the database lookup mechanism
sh: an internal function which runs its arguments as /bin/sh would
Non-critical but recommended functions:
echo: emulates /bin/echo
exit: aborts the router with the specified status code
hostname: internal function to get and set the local idea about the system name
trace: turns on selected debugging output
untrace: turns off selected debugging output
[, test: emulates a subset of “/bin/test” (a.k.a. “/bin/[”) functionality.
The relation function is described in “Databases”, Section 13.2. The functions trace and untrace are described in connection with debugging; see Logging and Statistics, Chapter 16. (This will probably change to Reference/Router/Debugging.)
The hostname
function requires some further explanation. It is intended to emulate the BSD UNIX /bin/hostname
functionality, except that setting the hostname will only set the router's idea of the hostname,
not the system's. Doing so will enable generation of “Message-Id:” and “Received:” “trace” headers on all messages
processed by the router.
It is done this way since the router needs to know the official domain name of the local host in order to properly generate these headers, and this method is cleaner than reserving a magic variable for the purpose.
The router cannot assume the hostname reported by the system is a properly qualified domain name, so the configuration file may generate it using whichever method it chooses.
If the hostname indeed is a fully qualified domain name, then:
hostname "hostname"
Finally, note that a symbol can have both a function-value and a string-value. The string value is of course accessed using the “$” prefix convention of the Bourne shell language.
To test the configuration or routing data, proceed as shown in Figure 13-2.
Figure 13-2. Example of running tests on router
sh$ $MAILBIN/router -i (select interactive mode)
z$ rtrace (turn tracing on)
z$ router user@broken.address (the address that gave you trouble)
z$ router another@address (and so on)
Old salts can use “/usr/lib/sendmail -bt” instead of “router -i”. Once satisfied that routing works, issue the command:
zmailer router
You can also run the router directly on a message. Copy your message to some place other than the postoffice (/tmp/ is usually good), under a numeric file name. If the file name is “123”, run:
$MAILBIN/router 123
FIXME:
- Intro
- How 'dbases.conf' file works
- How the databases are defined in the deep down inside ('relation' function)
- How lookup works
Many of the decisions and actions taken by configuration file code depend on the specifics of the environment the MTA finds itself in. So, not just the facts that the local host is attached to (say) the UUCP network and a Local Area Network are important, but it is also essential to know the specific hosts that are reachable by this method. Hardcoding large amounts of such information into the configuration file is not practical. It is also undesirable to change what is really a program (the configuration file), when the information (the data) changes.
The desirable solution to this data abstraction problem is to provide a way for the configuration file programmer to manage such information externally to ZMailer, and access it from within the router. The logical way to do this is to have an interface to externally maintained databases. These databases need not be terribly complicated; after all the simplest kind of information needed is that a string is a member of some collection. This could simply correspond to finding that string as a word in a list of words.
However, there are many ways to organize databases, and the necessary interfaces cannot be known in advance. The router therefore implements a framework that allows flexible interfacing to databases, and easy extension to cover new types of databases.
To use a database, two things are needed: the name of the database, and a way of retrieving the data associated with a particular key from that database. In addition to this knowledge, the needs of an MTA do include some special processing pertinent to its activities and the kind of keys to be looked up.
Specifically, the result of the data lookup can take different forms: one may be interested only in the existence of
a datum, not its value, or one may be looking up paths in a pathalias database and need to substitute the proper thing in place of “%s” in the string returned from the database lookup. It
should be possible to specify that this kind of postprocessing should be carried out in association with a specific
data access. Similarly, there may be a need for search routines that depend on the semantics of keys or the retrieved
data. These possibilities have all been taken into consideration in the definition of a relation. A relation maps a key to a value obtained by applying the
appropriate lookup and search routines, and perhaps a postprocessing step, applied to a specified database that has a
specified access method.
The various attributes that define a relation are largely independent. There will of
course be dependencies due to the contents or other semantics of a database. In addition to the features mentioned,
each relation may optionally have associated with it a subtype, which is a string value used to tell the lookup routine
which table of several in a database one is interested in.
There are no predefined relations in the router. They must all be specified in the
configuration file before first use. This is done by calling the special function relation with various options, as indicated by the usage strings printed by the relation function when called the wrong way. See Figure 13-3.
The “-t” option specifies one of several predefined database types, each with its specific lookup routine. It determines a template for the set of attributes associated with a particular relation. The predefined database types are:
bhash: the database is in BSD/SleepyCat DB HASH format.
bind: the database is the BIND nameserver, accessed through the standard resolver routines.
btree: the database is in BSD/SleepyCat DB BTREE format.
dbm: the database is in DBM format. Note that the original dbm had no dbm_close() function, so there was no way to dissociate an active database from a process. A somewhat newer variant of dbm has the close function, and multiple dbm's can be used. (You probably won't encounter this beast at all.)
gdbm: the database is in GNU GDBM format.
headers: router internal database of various headers, and how they are to be treated.
hostsfile: /etc/hosts lookup using gethostbyname().
incore: the database is a high-speed bundle of data kept entirely in the router process core memory. This is for short-term data storage, like handling duplicate detection.
ldap: mechanism for X.500 Directory access lookup with the "Light-weight Directory Access Protocol".
ndbm: the database is in NDBM (new DBM) format. (The length of the key plus the length of the data must not exceed 1024 bytes!)
ordered: the database is a text file with key-datum pairs on each line; keys are looked up using a linewise binary search in the sorted file.
selfmatch: a special type that translates a numerical address of the format 12.34.56.78 (from within address-literal brackets) into binary form, and checks whether it is (or is not) one of our own local IP addresses. This is used in address literal testing of addresses of the type localpart@[12.34.56.78].
unordered: the database is a text file with key-datum pairs on each line; keys are looked up using a sequential search. The first match is used.
yp: Sun SunOS 4.x YP (these days "NIS") interface library.
A subtype is specified by appending it to the database type name separated by a slash, or a comma. For example,
specifying bind/mx as the argument to the “-t” option will store “mx” for reference by the access routines whenever a query to that relation is processed.
The subtypes must therefore be recognized by either the database-specific access routines (for translation into some
other form), or by the database interface itself.
For unordered and ordered database types, the datum
corresponding to a particular key may be null. This situation arises if the database is a simple list, with one key per
line and nothing else. In this situation, the use of an appropriate post-processor option (e.g. “-b”) is recommended to be able to detect whether or not the
lookup succeeded.
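The two text-file search strategies, and the ambiguity of a null datum, can be sketched in Python. This is a model of the idea only, not the router's code: the helper names are hypothetical, whitespace-separated key-datum lines are assumed, blank lines are not handled, and the key list is built in memory here whereas the text describes a binary search on the file itself.

```python
import bisect


def parse(line):
    """Split 'key datum...' into (key, datum); the datum may be empty."""
    parts = line.split(None, 1)
    return parts[0], (parts[1].strip() if len(parts) > 1 else "")


def lookup_unordered(lines, key):
    """Sequential search: the first matching line wins."""
    for line in lines:
        k, datum = parse(line)
        if k == key:
            return datum
    return None                      # search failed


def lookup_ordered(sorted_lines, key):
    """Binary search: the lines must already be sorted by key."""
    keys = [parse(line)[0] for line in sorted_lines]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return parse(sorted_lines[i])[1]
    return None                      # search failed


lines = ["zeta smtp!zeta", "alpha local!alpha", "alpha smtp!alpha", "listonly"]
print(lookup_unordered(lines, "alpha"))           # first of the two "alpha" lines
print(lookup_ordered(sorted(lines), "zeta"))
print(repr(lookup_unordered(lines, "listonly")))  # found, but the datum is empty
```

In the model a failed search is None and a key without datum is the empty string. A script that sees only strings cannot tell these apart, which is the reason for recommending a post-processor such as “-b” for simple lists.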
The “-f” option specifies the name of the database. This is typically a path that either names the actual (and single) database file, or gives the root path for a number of files comprising the database (e.g. “foo” may refer to the NDBM files “foo.pag” and “foo.dir”). For the hostsfile type of database, the /etc/hosts file is the one used (and since the normal “hosts” file access routines do not allow specifying a different file, this cannot be overridden).
The “-s” option specifies the size of the cache.
If this value is non-zero (by default it is 10), then an LRU cache of this size is maintained for previous queries to
this relation, including both positive and negative results.
The “-e” option specifies the cache data
expiration time in seconds.
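How the “-s” and “-e” settings interact can be modelled with a small cache class. This Python sketch is not ZMailer's cache: the class name RelationCache, the backend callback, and the eviction details are assumptions made for illustration. It does follow the description above in caching negative results (modelled as None) alongside positive ones.

```python
import time
from collections import OrderedDict


class RelationCache:
    """LRU cache of lookup results, including failed lookups (stored as None)."""

    def __init__(self, size=10, expiry=None, clock=time.monotonic):
        self.size, self.expiry, self.clock = size, expiry, clock
        self.entries = OrderedDict()            # key -> (value, time stored)

    def lookup(self, key, backend):
        if self.size > 0 and key in self.entries:
            value, stored = self.entries[key]
            if self.expiry is None or self.clock() - stored < self.expiry:
                self.entries.move_to_end(key)   # mark as most recently used
                return value
            del self.entries[key]               # too old: ask the database again
        value = backend(key)                    # None models a failed search
        if self.size > 0:
            self.entries[key] = (value, self.clock())
            if len(self.entries) > self.size:
                self.entries.popitem(last=False)  # evict the least recently used
        return value
```

A size of zero disables the cache in this model, so every query goes to the backend.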
The “-b” option asks that a postprocessor is applied to the database lookup result, so the empty string is returned from the relation query if the database search failed, and the key itself is returned if the search succeeded. In the latter case, any retrieved data is discarded. The option letter is short for Boolean.
The “-n” option asks that a postprocessor is
applied to the database lookup result, so the key string is returned from the relation query if the database search
failed, and the retrieved datum string is returned if the search succeeded. The option letter is short for
Non-Null.
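The two postprocessors differ only in what they return for success and for failure. A minimal Python model, using plain dictionaries as stand-in databases (the function names and sample data are hypothetical):

```python
def lookup_boolean(database, key):
    """Model of -b: the key on success, the empty string on failure."""
    return key if key in database else ""


def lookup_non_null(database, key):
    """Model of -n: the datum on success, the key itself on failure."""
    return database[key] if key in database else key


newsgroups = {"comp.mail.misc": "", "comp.unix.admin": ""}   # keys only, null data
canon = {"mailhost": "mailhost.example.org"}

print(repr(lookup_boolean(newsgroups, "comp.mail.misc")))  # membership test succeeds
print(repr(lookup_boolean(newsgroups, "alt.nonexistent")))  # membership test fails
print(lookup_non_null(canon, "mailhost"))                   # datum replaces the key
print(lookup_non_null(canon, "unknownhost"))                # key passes through unchanged
```

The “-n” form suits rewriting steps such as hostname canonicalization, where an unknown name should simply be left as it was.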
The “-l” option asks that all keys are converted
to lowercase before lookup in the database. This is mutually
exclusive with the “-u” option.
The “-u” option asks that all keys are converted
to uppercase before lookup in the database. This is mutually
exclusive with the “-l” option.
The “-d” option specifies a search routine. The most commonly used argument for this option is “pathalias”, specifying a driver that searches for the key using domain name lookup rules.
The “-C” option specifies a configuration file for the underlying database mechanism. The exact details depend on the database mechanism.
The “-%” option enables substitution of “%0” through “%9” patterns in the database lookup results with the key, an iterated partial key, or a positional parameter of the database lookup. See Reference Section 21.5.50 for more information.
Figure 13-4. Some examples of relation definitions
relation -lmt $DBTYPE -f $MAILVAR/db/aliases$DBEXT aliases
relation -lm%t $DBTYPE -f $MAILVAR/db/fqdnaliases$DBEXT fqdnaliases
relation -lm%t $DBTYPE -f $MAILVAR/db/routes$DBEXT -d pathalias routes
if [ -f /etc/resolv.conf ]; then
        relation -nt bind/cname -s 100 canon    # T_CNAME canonicalize hostname
        relation -nt bind/uname uname           # T_UNAME UUCP name
        relation -bt bind/mx neighbour          # T_MX/T_WKS/T_A reachability
        relation -t bind/mp pathalias           # T_MP pathalias lookup
else
        relation -nt hostsfile -s 100 canon     # canonicalize hostname
        relation -t unordered -f $MAILBIN/db/hosts.uucp uname
        relation -bt hostsfile neighbour
        relation -t unordered -f /dev/null pathalias
fi
Figure 13-5. More examples of alternate forms of database reference
#
# We maintain an aliases database, and may access it via NDBM,
# or via indirect indexing:
#
if [ -f $MAILBIN/db/aliases.dat ]; then
        relation -t ndbm -f $MAILBIN/db/aliases aliases
else
        relation -it ordered -f $MAILBIN/db/aliases.idx aliases
fi
Figure 13-6. More miscellaneous relation definitions to illustrate various possibilities
relation -t unordered -f /usr/lib/news/active -b newsgroup
relation -t unordered -f /usr/lib/uucp/L.sys -b ldotsys
relation -t ordered -f $MAILBIN/db/hosts.transport -d pathalias transport
The final argument to relation is not preceded by an option letter. It specifies the name the relation is known under. Note that it is quite possible for different relations to use the same database (as in the case of “bind”).
Some sample relation definitions are in Figure 13-4. That fragment defines a set of relations that can be accessed in the same way, using the same names, independent of their actual definition.
CHECK! (-i option!) As the comment says, the relation name aliases has special significance to the router. Although the relation is not special in any other way (i.e. it can be used in the normal fashion), the semantics of the data retrieved are bound by assumptions in the aliasing mechanism. (More specifically, database compilation handles this when the database is not an “ordered” or “unordered” file.)
These assumptions are that key strings are local-names, and that the corresponding datum gives a byte offset into another file (the root name of the aliases file, with a “.dat” extension), which contains the actual addresses associated with that alias.
The reason for this indirection is that the number of addresses associated with a particular alias can be very large, and this makes the traditional simple database formats inadequate. For example, quick lookup in a text file is only practical if it is sorted and has a regular structure. A large number of addresses associated with an alias makes structuring a problem. DBM files and their variations have problems too, due to the intrinsic limits of the storage method. The chosen indirection scheme avoids such problems without loss of efficiency.
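The indirection can be pictured as a small index that stores byte offsets into a separate data file. The Python sketch below uses an in-memory buffer in place of the “.dat” file. The layout it assumes (one comma-separated line of addresses per alias) and the function names are invented for illustration, since the actual file format is not described here.

```python
import io


def build(aliases):
    """Return (index, data): index maps alias name -> byte offset into data."""
    data, index = io.BytesIO(), {}
    for name, addresses in aliases.items():
        index[name] = data.tell()                 # small, fixed-size datum
        data.write((", ".join(addresses) + "\n").encode("ascii"))
    return index, data


def expand_alias(index, data, name):
    """Look up the offset, then read the address list from the data file."""
    offset = index.get(name)
    if offset is None:
        return None
    data.seek(offset)
    line = data.readline().decode("ascii").strip()
    return [address.strip() for address in line.split(",")]


index, data = build({
    "staff": ["alice@example.org", "bob@example.org"],
    "root": ["admin@example.org"],
})
print(index)
print(expand_alias(index, data, "staff"))
```

However long an address list grows, the indexed datum stays a short number, so the index itself remains easy to sort or to fit within DBM-style size limits.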
More examples are in Figure 13-6, where the first two illustrate convenient coincidences of format, and the last definition shows what might be used if outgoing channel information is maintained in a pathalias-format database (e.g. “bar smtp!bar” means to send mail to “bar” via the SMTP channel).
Pathalias is a UUCP era thing, and not quite what one would need these days, but just in case…
Accessing route databases is a rather essential capability for a mailer. At the University of Toronto, all hosts access a centrally stored database through a slightly modified nameserver program. If such a setup is not practical at your site, other methods are available. The most widespread kind of route database is produced by the pathalias program.
The current ZMailer can do two separate things, which were combined in the old pathalias idea:
relation defines the driver routine with “-d pathalias”
relation defines that “%0” through “%9” strings contained in the lookup result may be substituted (the “-%” option).
The pathalias program generates key-value pairs of the form:
uunet ai.toronto.edu!uunet!%s
.css.gov ai.toronto.edu!uunet!seismo!%s
which need to be post-processed to:
uunet ai.toronto.edu!uunet!%0
.css.gov ai.toronto.edu!uunet!seismo!%0
which, when queried about “uunet” and “beno.css.gov”, correspond to the routes:
ai.toronto.edu!uunet
ai.toronto.edu!uunet!seismo!beno.css.gov
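The two example queries can be reproduced with a short Python model. It is a sketch inferred from these two examples only: the search order (the full key first, then ever shorter domain suffixes with a leading dot) and the way an exact node-name match drops the trailing “!%0” are assumptions, and the function names are hypothetical rather than the router's pathalias driver.

```python
def load(lines):
    """Read pathalias output and post-process '%s' into '%0'."""
    database = {}
    for line in lines:
        key, template = line.split(None, 1)
        database[key] = template.strip().replace("%s", "%0")
    return database


def candidates(key):
    """Yield the key itself, then ever shorter domain suffixes with a leading dot."""
    yield key
    labels = key.split(".")
    for i in range(1, len(labels)):
        yield "." + ".".join(labels[i:])


def route(database, key):
    for candidate in candidates(key):
        template = database.get(candidate)
        if template is None:
            continue
        if candidate == key:
            # Exact node name: the route ends at the node itself.
            return template[:-len("!%0")] if template.endswith("!%0") else template
        # Subdomain gateway: the queried name is substituted for %0.
        return template.replace("%0", key)
    return None


db = load([
    "uunet ai.toronto.edu!uunet!%s",
    ".css.gov ai.toronto.edu!uunet!seismo!%s",
])
print(route(db, "uunet"))
print(route(db, "beno.css.gov"))
```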
Notice that there are two basic forms of routes listed: routes to UUCP node names and routes to subdomain gateways. Depending on the type of route query, the value returned f