
Re: Reworking on smtpserver subsystems..

To: Eugene Crosser <crosser@rol.ru>
Subject: Re: Reworking on smtpserver subsystems..
From: Matti Aarnio <mea@nic.funet.fi>
Date: Mon, 5 Apr 2004 18:49:33 +0300
Cc: zmailer@nic.funet.fi
In-Reply-To: <1081154772.7677.96.camel@ariel.sovam.com>; from crosser@rol.ru on Mon, Apr 05, 2004 at 12:46:15PM +0400
Original-Recipient: rfc822;zmailer-log@nic.funet.fi
References: <20040401200023.W29421@nic.funet.fi> <1080900412.6647.49.camel@ariel.sovam.com> <20040402230422.Z29421@nic.funet.fi> <1081154772.7677.96.camel@ariel.sovam.com>
Sender: zmailer-owner@nic.funet.fi

On Mon, Apr 05, 2004 at 12:46:15PM +0400, Eugene Crosser wrote:
> On Sat, 2004-04-03 at 00:04, Matti Aarnio wrote:
> > On Fri, Apr 02, 2004 at 02:06:53PM +0400, Eugene Crosser wrote:
> > > My thoughts on this matter are as follows:
> > > 
> > > - Forking model should be Apache-like, with preforked processes that can
> 
> > Transport agents are almost like that, there is no upper bound in
> 
> First, a couple of notes: my sentiments in this message as well as in
> the previous one are more about "how I would like to see it in ideal
> world" than "how I suggest to do it" ;-)  Then, some of them exactly
> match the current status of things.  That's why I like Zmailer, after
> all :-)

I know  :-)     Nevertheless in order to get my own thoughts clear,
I am using you (and others) as a sounding board.

> What we have now are managed server farms that look for job on their own
> (router farm), "one session - one process" farms that accept jobs on
> sockets (smtpserver), farms where process management and job management
> are performed by a single entity (scheduler managing the transport
> agents farm).

In the case of routers, for a long time we had N processes running in
parallel, each looking for new jobs in the directory on its own.  To get
some sort of locking, each one renamed the file it picked up:
123 -> 123-14567
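
A minimal sketch of that rename-as-lock trick, in Python rather than
ZMailer's C.  The function name claim_job() and the use of a process id
as the suffix are assumptions made for illustration, not ZMailer's
actual code; the only thing relied on is that rename() is atomic within
one filesystem.

```python
import os
import tempfile


def claim_job(spool_dir, name, pid=None):
    """Try to claim spool file `name` by renaming it to `name-<pid>`.

    rename() is atomic within a filesystem, so when several routers race
    for the same file exactly one rename succeeds; the losers get
    FileNotFoundError and simply move on to the next candidate.
    """
    pid = os.getpid() if pid is None else pid
    claimed = f"{name}-{pid}"
    try:
        os.rename(os.path.join(spool_dir, name),
                  os.path.join(spool_dir, claimed))
    except FileNotFoundError:
        return None              # somebody else got there first
    return claimed
```

A second caller asking for the same name gets None back, which is the
whole "lock".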

Now we have one master router that scans the directories (infrequently,
about once a minute) for new jobs, but also listens on
$ZENV{'ROUTERNOTIFY'}, which is a rather simple PF_UNIX/SOCK_DGRAM socket.
If the master router gets a record telling it that there is a new job in
spool file   N/123,   and a file with that name exists, it adds the file
to the work queue and starts the job (if it is the only one to handle).
(There are two more similar PF_UNIX sockets:  SCHEDULERNOTIFY and
 INPUTNOTIFY.  See  http://www.zmailer.org/man/zmailer.conf.5zm.html )
Security-wise these are easy; a message is just a "heads up" about
something.  All an internal Mallory can do is flood the notification
socket with junk, and thus load the system.

The directory scan is there to handle missed new-msg notifications, as
well as for startup.
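
The notification-plus-scan arrangement can be sketched like this.  It
is a toy in Python, not the router's real code: the class MasterRouter,
the helper notify(), the flat spool directory (no N/ subdirectories) and
the plain-text record format are all assumptions for illustration.

```python
import os
import socket
import tempfile
from collections import deque


class MasterRouter:
    def __init__(self, spool_dir, notify_path):
        self.spool_dir = spool_dir
        self.queue = deque()
        self.seen = set()
        # Datagram socket bound to a filesystem path (PF_UNIX/SOCK_DGRAM).
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        self.sock.bind(notify_path)
        self.sock.setblocking(False)

    def _enqueue(self, name):
        # A notification is only a hint: queue the job only if the name is
        # a plain file name and that file really exists.  Junk is dropped.
        if name in self.seen or os.path.basename(name) != name:
            return
        if os.path.isfile(os.path.join(self.spool_dir, name)):
            self.seen.add(name)
            self.queue.append(name)

    def poll_notifications(self):
        """Drain every datagram that is waiting right now."""
        while True:
            try:
                data = self.sock.recv(1024)
            except BlockingIOError:
                return
            self._enqueue(data.decode("ascii", "replace").strip())

    def scan_directory(self):
        """Slow path: catches missed notifications and startup leftovers."""
        for name in sorted(os.listdir(self.spool_dir)):
            self._enqueue(name)


def notify(notify_path, name):
    """What a submitter would do after writing spool file `name`."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as s:
        s.sendto(name.encode("ascii"), notify_path)
```

Because every hint is checked against the spool directory, a bogus
datagram costs one stat() and nothing more.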

But..  interactive router usage under smtpserver is an entirely
different beast...

> What I think would make sense is a uniform model for server farms
> (Apache like), all of them getting job requests on (tcp or unix domain)
> sockets.  There may be a router farm, a contentfilter farm, and possibly
> a number of transport agent farms (e.g. one farm per channel).  Thus the
> scheduler (which probably has to be a single process) will not have to
> manage process starting and terminating, and concentrate on queue
> management.

Using Apache as a model has one disadvantage as I see it:
   *  Apache 1.3 serves clients as:  1 process = 1 connection
When the number of parallel connections is smaller than the
preforked server count, some of the servers go into an "idle" pool,
where they sit in the  accept()  you refer to below.

What I would like to do in ZMailer's smtpserver is a bit different:
- Smtpserver starts a router-multiplexer server during its init
- The multiplexer has a limited number of router processes
- If there is more demand for router activity than the preset
  limits allow, callers get served in fair(?) round-robin
- The router processes leak a bit of memory when used in
  interactive mode, therefore there must be a limit on the number
  of processed addresses (10k - 100k ?)
  (There are memory-consuming symbol lists that are not quite
   trivially flushable from a script.)
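
To make the scheduling idea concrete, here is a toy model of such a
multiplexer.  It runs no real processes; the class RouterMux, its
method names, and the reading that the address limit applies per router
process (which then gets restarted) are assumptions for illustration.

```python
from collections import deque


class RouterMux:
    """Toy model of a router multiplexer (no real processes involved)."""

    def __init__(self, max_routers, max_addresses):
        self.max_addresses = max_addresses
        self.served = [0] * max_routers     # addresses handled per slot
        self.restarts = [0] * max_routers   # how often a slot was recycled
        self.callers = deque()              # caller ids, round-robin order
        self.pending = {}                   # caller id -> deque of addresses

    def submit(self, caller, address):
        if caller not in self.pending:
            self.pending[caller] = deque()
            self.callers.append(caller)
        self.pending[caller].append(address)

    def run_round(self):
        """Give each router slot one address, rotating over the callers."""
        results = []
        for slot in range(len(self.served)):
            if not self.callers:
                break
            caller = self.callers.popleft()
            address = self.pending[caller].popleft()
            if self.pending[caller]:
                self.callers.append(caller)   # back to the tail: fairness
            else:
                del self.pending[caller]
            self.served[slot] += 1
            if self.served[slot] >= self.max_addresses:
                self.served[slot] = 0         # stand-in for kill + re-fork
                self.restarts[slot] += 1
            results.append((slot, caller, address))
        return results
```

A caller with a long list of addresses goes back to the tail after each
one, so a caller with a single address is not stuck behind it.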

The contentfilter is quite similar in its behaviour, I think.

The ratetracker is an entirely different beast.
It is unique in the sense that it doesn't run any external
server to do its bidding.  It might talk with other peers
in a multicast cluster, but even that is not truly necessary.
(And it is a _tracker_; limits are applied in policy code,
if so configured...)

>    On the other hand, farm managers could be very simple
> multiplexors, in most cases doing nothing but wait() and occasionally
> fork() (because on most systems even multiplexing is not necessary, the
> children just listen() on the same socket and the OS dispatch them as
> appropriate).

That is a fine model for  accept()  -- until you begin to worry about
thundering herds...  (That really should not be all that bad
for SMTP; for HTTP it might be another story entirely...)

> Now, smtpserver will inject jobs to the router farm over the socket, the
> router pass it to the scheduler, again, over another socket, and
> [scheduler] pass it to the transport agent much like it does now *but*
> over the socket rather than pipe.

Here I got a bit confused:  do you mean "pass the message file over
the socket", or just (as we already do with notifications) tell about
new jobs available in the queue ?

Hmm..  Yes,  lib/pipes.c ( = scheduler/pipes.c ) does confuse you
a bit  :-)

> On reboot/restart, a separate ("run once") process can traverse queue
> directories and inject old jobs emulating smtpserver for the router, and
> another one emulating router for the scheduler.

This is for the external persistent queue model.  Any further thoughts
about how that persistency could be implemented ?

The queue in the router is trivial:  jobs run in any|age-order, unless
one gets bounced around, e.g. because the router crashed on it, which
puts it at the tail of the queue --> some messages go through, but some
keep crashing the system...

> You are quite right that there is an issue with permissions handling of
> unix domain sockets on some systems (all SysVs maybe?), but there is an
> easy workaround: create them in a directory with appropriate
> permissions.

Been there, done that too, along with a couple of other tricks.
I had a regular file with selective protection (user+group read;
the utilities were 'set-gid', read the file for the current 32-bit
"secret", and passed that over the socket to identify themselves.)

A way to avoid the need for such complications is to use master-inherited
"anonymous" socketpair() sockets.

> This approach also minimizes the "fd space pollution".

After last night's coding binge, I am fairly confident that this
"pollution" will not be an issue.  The SMTP (et al.) accept()ing
master process has  lib/fdpassing.c   sockets in its fd-space
(one socket per subdaemon contact point), and those get inherited
by all smtpserver instances doing the actual processing.
The  smtpserver/subdaemons.c  file is very much "work in progress",
and its internal architecture changed twice last night...
I won't yet predict what it will be tomorrow.
(As much code-sharing as possible, etc..  Right now all have a
common  subdaemon_loop()  which gets a "subdaemon_handler" function
pointer as a parameter -- the theory being that one instance of working
code is simpler to make than 3 subtly different ones...)
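
Roughly, the combination of a master-inherited socketpair() and one
shared loop taking a handler could look like the sketch below.  It is
Python, not the code in smtpserver/subdaemons.c; spawn_subdaemon() and
the one-request-one-reply exchange are assumptions for illustration.

```python
import os
import socket


def spawn_subdaemon(handler):
    """Fork a child that serves requests on an anonymous socketpair.

    `handler` plays the role of the handler function pointer: the loop is
    shared, only the handler differs per subdaemon.
    Returns (child_pid, parent_side_socket).
    """
    parent_end, child_end = socket.socketpair(socket.AF_UNIX,
                                              socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:                      # child: the subdaemon
        parent_end.close()
        try:
            while True:
                request = child_end.recv(4096)
                if not request:       # every parent-side copy was closed
                    break
                child_end.sendall(handler(request))
        finally:
            os._exit(0)               # never fall back into the parent's code
    child_end.close()                 # parent keeps only its own end
    return pid, parent_end
```

The socket has no name in the filesystem, so nothing outside the
process family can connect to it, and no file permissions are involved.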

>  The only bottleneck continues to be the scheduler, but even that
> will need half as many descriptors as it uses now.

Half of one descriptor per transport agent is ...  integresting ;-)
I think you have missed the quirks in  lib/pipes.c  (formerly
scheduler/pipes.c):  when ZMailer is compiled on a system with the
socketpair(2) call, that is what gets used.  The fall-back uses classic
pipe(2)s, which are unidirectional and are needed in pairs.
(The presumption is that if the system doesn't have  socketpair(2),
things are grim indeed...)
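
The descriptor arithmetic is easy to see in a small sketch.  This is
Python, not lib/pipes.c; make_channel() and its return shape are
assumptions for illustration.

```python
import os
import socket


def make_channel(use_socketpair=hasattr(socket, "socketpair")):
    """Return ((parent_rfd, parent_wfd), (child_rfd, child_wfd)) as raw fds."""
    if use_socketpair:
        a, b = socket.socketpair()
        # One bidirectional descriptor per side: the read fd and the
        # write fd are the same number.
        pa, pb = a.detach(), b.detach()
        return (pa, pa), (pb, pb)
    # Fallback: a pipe is one-way, so two of them are needed,
    # which costs two descriptors per side instead of one.
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()
    return (to_parent_r, to_child_w), (to_child_r, to_parent_w)
```

Callers only ever see a (read fd, write fd) pair per side, so the rest
of the code does not care which variant was built.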

Of course I abstracted things a bit so that I have the necessary
file-descriptor massaging code in one place.

I am also considering adding named sockets to this code, but
I don't see any real use for them (except on very old Linux, which
didn't do descriptor-passing over AF_UNIX sockets.)
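
For reference, descriptor-passing over an AF_UNIX socket looks like
this in Python 3.9 or later (socket.send_fds / socket.recv_fds wrap the
SCM_RIGHTS ancillary message).  The helper names and the tag payload
are assumptions for illustration, not what lib/fdpassing.c does.

```python
import os
import socket


def pass_descriptor(channel, fd, tag=b"fd"):
    """Send open descriptor `fd` (plus a small tag) over an AF_UNIX socket."""
    socket.send_fds(channel, [tag], [fd])


def receive_descriptor(channel):
    """Receive one descriptor; returns (tag, new_fd) valid in this process."""
    data, fds, _flags, _addr = socket.recv_fds(channel, 1024, 1)
    return data, fds[0]
```

The receiver gets a new descriptor number that refers to the same open
file as the sender's, which is how an accept()ed connection can be
handed to another process.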

The proper abstraction point will be something like "connect_subdaemon_xx()"
that returns the necessary file handle(s) and hides all the gory details
of how the connection was actually made.
(And allows implementing that named-socket fallback, if necessary.)

> The smtpserver is somewhat special because it accepts real data on the
> socket, but it could still benefit from the farm approach.  Even if the
> current code does not allow process reuse (due to lack of object
> destructors and otherwise unclean memory management), it *may* still
> make sense to keep some preforked processes (or may not...)

Very deep in my "todo" set of thoughts is to use threads in smtpserver
and in scheduler, but since we have managed without so far, I haven't
done all the state abstraction that is necessary to handle it.
(E.g. there is a  SmtpState*  type thingie in smtpserver, but it isn't
quite as "free-threaded" as it needs to be...)

Anyway, smtpserver's initial reply time is dominated by a few DNS lookups
it does right after having completed fork() from the main accept()or.
And overall the beast is rather lightweight (as compared to the router
program...)

> > Present "1 message, 1 file" (or two files after router) has benefits,
> > but commercial systems use also approaches, where the queued messages
> > are inside a database..
> 
> If sockets are used to pass queue object handles, then the current file
> approach is quite easily extended to database approach, only interface
> functions will change, not the "business logic" code.  Having queue
> objects in a database would (theoretically) allow to keep queue on one
> physical server, smtpserver on another, router on the third, etc.  All
> of them would communicate over tcp sockets and access actual message
> data via particular database API.

Uh....  I do know that Oracle has a product doing something quite like
what you describe, but would it  a) really make sense,  b) be lightweight
enough to be worthwhile without huge server hardware ?
(I do count the number of syscalls/context-switches needed to achieve
something; the fewer, the better...  presuming of course that my
code doesn't otherwise waste CPU power unnecessarily.)

> > Anyway, such is quite far from "minimum technology needed" approach that
> > ZMailer has used so far.
> 
> To me, it even looks like simplification - use one technology for all
> communication...  Of course sockets are one step "higher tech" than
> pipes, but who has socket-less systems nowadays?

So true, so true..   I can't say for sure that ZMailer works
on a socket-less system at all anymore.

> > socketpair(AF_xxx, SOCK_STREAM, ..),  or SysV bidirectional pipes ?
> > Or just two pipe(2)s ?
> > 
> > They sure are the most portable, but it isn't the first time that
> > we do things "under the hood" in different ways in some situations.
> > See for example  scheduler/pipes.c   :-)
> 
> I vote for named sockets, possibly upgradable to tcp sockets in future
> (and separation of process creation from passing job requests, see
> above).

Yep.

> Eugene

/Matti Aarnio	<mea@nic.funet.fi>