Re: Multiple Routers Processing A Single Message
> We're running version 2.99.38 on IRIX 5.3 systems (using EFS filesystems)
> with NROUTERS set to 10. The (I think) relevant piece of code in
> router/functions.c hasn't changed as of 2.99.46 though.
It hasn't. I am more and more convinced that it should
revert to the old link()/unlink() method, whose
behaviour can apparently be made completely deterministic,
and which most definitely will not behave in the SysVr4
manner that we see so often...
Your locking code is an interesting idea, though :-)
> I've witnessed instances of multiple routers apparently somehow renaming the
> queue file and then processing it (redundantly), resulting in the mail
> being delivered as multiple discrete messages.
> As an example, the following entries (split and wrapped for readability)
> from the router log show three different routers grabbing the same message
> and processing it independently (right?):
> The IRIX man page for rename(2) doesn't actually say that it's an atomic
> operation. I'm thinking that maybe it's not. :-(
Yes, if rename() yields an error, the target name may already exist
while the original name has not yet been removed...
Umm... was it the EBUSY error? (At least on Solaris.)
> Would it perhaps be appropriate to put some sort of locking around the call
> to eqrename() to force single-threading in that section? It seems like the
> proper paranoid thing to do anyhow... especially for systems that don't have
> rename() and must therefore make do with a link() followed by an unlink().
Perhaps not -- those are safe from this rename() fallacy :-(
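To make the link()/unlink() idea concrete, here is a minimal sketch
of how a process might claim a queue file that way. It is only an
illustration, not the code in router/functions.c: claim_by_link() is
a hypothetical helper, and it assumes that only competing claimants
ever remove the original name.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the router sources): claim `from` by
 * hard-linking it to `to` and then removing the old name.
 * Returns 0 if this process won the claim, -1 otherwise, with errno
 * left as set by the call that failed.
 */
int claim_by_link(const char *from, const char *to)
{
    /* link() fails with EEXIST if `to` already exists, so two claimants
       using the same target name cannot both succeed. */
    if (link(from, to) != 0)
        return -1;

    /* Only one process can remove the old name. If unlink() fails,
       another claimant removed it first, so back out our new link.
       Assumption: nobody except a competing claimant removes `from`. */
    if (unlink(from) != 0) {
        int saved = errno;
        unlink(to);
        errno = saved;
        return -1;
    }
    return 0;
}
```

Each step either succeeds for exactly one caller or fails with an
error, so a loser can tell that it lost instead of going on to
process the same message.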
> I'm trying this approach (using lockf() on the original file, which I know
> isn't portable) just to see if it works. I've attached a context diff (just
> cut-n-pasted, so any TABs may be destroyed) to router/functions.c in 2.99.46
> below just so you can see what I'm talking about.
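For readers following the thread, the locking idea described in the
quoted text might look roughly like the sketch below. This is not
the poster's context diff and not the real eqrename(); the helper
name locked_rename() is hypothetical, and the sketch assumes that
every router takes the same lockf() lock before renaming.

```c
#define _XOPEN_SOURCE 700   /* ask for the XSI declaration of lockf() */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: rename `from` to `to` while holding an
 * exclusive lockf() lock on the original file.
 * Returns 0 on success, -1 if the file could not be opened, locked,
 * or renamed (for example because another process already took it).
 */
int locked_rename(const char *from, const char *to)
{
    /* lockf() needs a descriptor that is open for writing. */
    int fd = open(from, O_RDWR);
    if (fd < 0)
        return -1;              /* the name is gone: someone else won */

    /* F_TLOCK does not block: if another process holds the lock,
       give up instead of waiting. A length of 0 locks to end of file. */
    if (lockf(fd, F_TLOCK, 0) != 0) {
        close(fd);
        return -1;
    }

    /* Only the lock holder reaches this point, so renames of the same
       file are serialized across cooperating processes. */
    int rc = rename(from, to);

    close(fd);                  /* closing the descriptor drops the lock */
    return rc == 0 ? 0 : -1;
}
```

Note that lockf() locks are advisory and held per process, so this
only helps if all routers go through the same path, and it may not
behave the same way on every filesystem.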
/Matti Aarnio <email@example.com>