
Re: Reworking on smtpserver subsystems..

On Fri, Apr 02, 2004 at 02:06:53PM +0400, Eugene Crosser wrote:
> My thoughts on this matter are as follows:
> - Forking model should be Apache-like, with preforked processes that can
> be reused up to N times.  In border case, you may have zero preforked
> servers and N=1, which gives one forked process per session.  Normally,
> you can have a pool of no more than MAX processes preforked and ready to
> serve.  This approach is very portable, efficient and does not suffer
> from problems inherent to thread model.

Transport agents are almost like that already; there is no upper bound
on the number of jobs processed per TA, though.  That doesn't matter
much since the most glaring memory leaks in them got plugged recently.

Routers are started, and run forever.  Again there is no upper bound on
the number of jobs completed, nor on memory consumption, although the
present internal "LISPish" object handling is fairly good, and has a
garbage collector.  (I do remember, back in the early 1990s, a 1000
recipient message causing the router to bloat up to 600 MB...)

In both cases, implementing such an upper bound by number of jobs, or
whatever, should be fairly trivial to do.

The  smtpserver  is a 0-prefork, N=1 (client closed -> process
terminates) thing, though.  Its code isn't clean enough to be converted
into threads, nor (exactly) into a 'return to resource pool' model.

There are upper bounds in number of running TA processes, and in number
of running routers (except when running interactive routers under
smtpserver, which is the thing that I started this thread with...)

> - All elements (smtpserver, contentfilter, router *and* scheduler with
> transport agents) should use same technology of process management and
> interprocess communication.

To a degree they do (e.g. pipes/socketpairs);
however, the details of how things are used vary, and not
all of the communication code has been librarized.
(Socketpair creation is, though:  scheduler/pipes.c.  While it
isn't placed into a library, it is used by both the scheduler and
the router to do the same things.)

> - You will need queues on persistent storage anyways.  Let's think that
> queue objects are files on the local filesystem.  The object handle will
> be path (thus an ascii string).  This is easily extendable to the queue
> kept in a database.  So the elements of the MTA will pass each other
> queue object handles.

Both the "master router" and the scheduler tell job handles to their
children.  They get back diagnostics, which are processed, or just
logged.

The present "1 message, 1 file" scheme (or two files after the router)
has benefits, but commercial systems also use approaches where the
queued messages live inside a database..

To a degree it might even make some sense, e.g. by using SleepyCat DB
in CDB mode, and possibly even with transactions.
Programs taking part in data access must have at least group-write
access (as 'set gid') to the things going on in that db (if not 
'set uid')..

Anyway, such is quite far from "minimum technology needed" approach that
ZMailer has used so far.

If the 'transport' data would live in a DB, and router would write it,
then, perhaps, the scheduler could be spiffier in its job assimilation,
hold less data in memory, and have persistent storage for temporary
diagnostics things...

In Router the 'queue' is quite simple, messages in age order are picked
up from given 'router*' directory-tree, and fed to preforked (actually
'on demand forked') processing farm.

In the scheduler, the 'queue' is a rather complicated collection of
circular buffers, and the scheduler keeps temporary in-core data about
recipients that were retried, and their diagnostics.
The queue there also contains each individual recipient specifier line..
That is (as I see it) the reason for the unpredictable process sizes
of the scheduler.  Drop a million spams into the box, and all of a
sudden you have an unmanageable queue.  (I had this situation at work
earlier this week...)

Presume that 'transport' files were to be relation entries in a DB:
What would be a smart way to build it ?  A relational DB ?  SleepyCat
BTREE ?  GDBM ?  Or should there be multiple database files, one for
each relation direction ?
  - {spool-file}  -> transport data
  - {spool-file,recipient-channel,recipient-host} -> list of files
  - {spool-file,'diags'} -> accumulated diagnostics
  - {recipient-channel,recipient-host} -> list of spool-files
  - {recipient-channel,recipient-host,'schedule'} -> scheduling data

Or, if the 'transport' files still exist, should just the queue summary
data live in these databases ?  Then there would be no need to change
libta/ctlopen.c  and friends.

> - Initial processing of existing queue after restart/reboot can be done
> by a separate tool, so no need to scan directories from router/scheduler
> processes.

Well...  it does depend..  The router's queue scanning is trivial, but
the scheduler's isn't.  Far from it.  To discover what jobs to process,
the scheduler needs to read the transport specification file, as jobs
aren't processed one job at a time, but rather one destination at a
time (plus parallelism).

> - Data access should be kept separate from processing management.  I
> mean, queue object should be opened as a file or retrieved from a
> database, while the *handle* (e.g. path to the file) passed over sockets
> or whatever.  Actual data never is passed over IPC.

This is what the router-queue-runner (the "master") does when talking
to its actual workers.  (And so does the scheduler when talking to its
transport agents.)

> - Use sockets for IPC, not doors/shm/whatever.  Normally unix domain
> sockets (but if the queue is in a database that may be TCP).  Sockets
> are the most portable of all IPC means.

For some things  doors  look rather smart, and so do POSIX Message
Queues.  They are not suitable for all jobs, of course.

socketpair(AF_xxx, SOCK_STREAM, ..),  or SysV bidirectional pipes ?
Or just two pipe(2)s ?

Sockets sure are the most portable, but it isn't the first time that
we do things "under the hood" in different ways in some situations.
See for example  scheduler/pipes.c   :-)

Nice librarization hides gory details, but still some things are not
doable at all, if certain facilities are not available..
(E.g. there is no sense in talking about SMTP if there is no TCP
networking in the system..  In the early 1990s I did things on
systems that didn't have it, though.  Networking was by means of
virtual punch cards over a BISYNC NJE network...)

I did consider  doors  for tasks needing feedback of successful
onward message passing; however, the inherent assumption is always
that there are threads, and that the application receiving the
door_call() is thread-safe.  (The door_call()er can be a non-threaded
thing.)  Also, door_call() is strictly client-server: you can never
send back more than a single reply.  (TAs are blabbermouths, and talk
back to the scheduler all the time about each recipient address in
the job.)

The most complex place needing non-trivial message passing would be
one where the smtpserver creates some socketpairs, and gives them
to a pre-fork()ed set of servers (interactive router, content filter,
rate tracker), one socketpair per server, to be used for receiving
IPC calls of various messages from the actual smtp serving process
instances.

Then each incoming SMTP connection gets (of course) the SMTP
socket, plus the other ends of these sockets/pipes, and when
a processing smtpserver instance then wants to call e.g. the router,
it must first create a socketpair, and send its remote end
to the router resource manager server (e.g. a 'multiplexer').

A way not needing fd passing capability would be to create socket
instances for each smtpserver instance, one per server it talks to.
At worst that would be two plain pipe(2)s per server..
And do remember that each smtpserver instance inherits all fds..
Say there is a 256 fd limit in the system libc stdio: to handle
even ~43 parallel incoming smtp sessions, all uses of libc stdio
must be replaced with  sfio  (like was done in the scheduler.)
The next limit would be whatever RLIMIT_NOFILE is...  (Say: 1000 ->
~160 parallel SMTP sessions -- not too bad..) ... but the fork()ing
master would have tons of fds being inherited down to the children
unneeded, and would be terribly sluggish spinning around its
master select() loop checking for IO at all those sockets..

No way...  Horrible mess!

Using named rendezvous points (e.g. listening named AF_UNIX sockets)
to contact smtpserver's auxiliary sub-servers would be workable, but
would need a secured way to do it (only the 'daemon' user can access
inside a given directory, or some such).  Long ago I noticed that
mere '600' protection with some owner does not limit AF_UNIX
connectivity.  There are no access protection semantics in the
AF_UNIX name space, apparently.

> Eugene

/Matti Aarnio	<mea@nic.funet.fi>
To unsubscribe from this list: send the line "unsubscribe zmailer" in
the body of a message to majordomo@nic.funet.fi