Re: Miraculous problem, possibly BAD!
On Thu, Dec 02, 1999 at 01:38:37PM +0300, Eugene Crosser wrote:
> I suddenly noticed something that I had not paid attention to before, and
> could hardly believe. Actually, I still cannot fully believe it. But it looks
> really bad.
>
> I noticed that mail sometimes (rarely) disappears. I started looking
> into $POSTOFFICE and found it. It was lying in queue/?/nnnn-nnnn files;
> there were *no* corresponding transport/?/* files, and mailq did not see it.
> Move those files into router/ and the mail gets delivered. There were a few
> messages half a year old! And a few quite fresh.
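The check described above (queue files with no matching transport file) can be sketched as a small script. This is only an illustration: it assumes a transport file carries the same base name as its queue file, which may not hold for every spool layout, and `find_orphans` is a hypothetical helper, not part of the mailer.

```python
import os

def find_orphans(postoffice):
    """Return queue files that have no same-named file under transport/.

    Assumption: queue/?/NAME corresponds to transport/?/NAME.
    """
    queue_root = os.path.join(postoffice, "queue")
    transport_root = os.path.join(postoffice, "transport")

    # Collect every file name present anywhere under transport/.
    transported = set()
    for _dirpath, _dirs, files in os.walk(transport_root):
        transported.update(files)

    # Any queue file whose name is not in that set is a candidate orphan.
    orphans = []
    for dirpath, _dirs, files in os.walk(queue_root):
        for name in files:
            if name not in transported:
                orphans.append(os.path.join(dirpath, name))
    return sorted(orphans)
```

Run it against a quiet system (or twice in a row), since a message that is being routed at that moment could show up as a false positive.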
I suspect the problem lurks in Solaris directory-handling
weirdness...
Perhaps the sequence in which things are handled (still) isn't quite
right, and two routers that have been processing the same file
collide over which of them gets to move the original
spoolfile, and under what name, into the QUEUEDIR.
Could you play a bit of Devil's Advocate with router/rfc822.c,
where that processing happens? Could you guess what *must* happen
for the observed effects to occur?
(Excluding the possibility of the Scheduler *not* unlinking the queue
directory file for some reason.)
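One common way to keep two workers from handling the same spool file is to have each one claim the file with rename() before touching it. The sketch below shows that pattern in Python; it is not what router/rfc822.c does, and the `claim` function and the `.claimed-<id>` suffix are made up for illustration. It relies on POSIX rename() semantics on a local filesystem: once one caller has renamed the source away, the other caller's rename fails because the source no longer exists.

```python
import os

def claim(spool_path, worker_id):
    """Try to claim spool_path for one worker.

    Returns the new, worker-private path on success, or None if some
    other worker already took the file. (Hypothetical naming scheme.)
    """
    claimed = f"{spool_path}.claimed-{worker_id}"
    try:
        # Only one rename of the same source can succeed.
        os.rename(spool_path, claimed)
    except FileNotFoundError:
        # Someone else renamed (or removed) the file first.
        return None
    return claimed
```

A worker that gets None back simply skips the file, so processing and the later move into the queue directory are only ever done by the winner.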
> I would think that these are files left over from hard reboots etc., but there
> were two cases where I *saw* messages go this way. They were
> submitted locally by the `sendmail' frontend and went directly to queue/...,
> no traces in the log.
>
> I am puzzled. Any thoughts?
>
> System is Solaris 2.5.1 and 2.6 on SPARC, filesystem is simple UFS
> on local disk.
My Linux doesn't have such problems, but then I think I run *one* router
process.. Not a competing worker-gang..
... which reminds me, in CVS I have code which has a rudimentary
scheduler-like input directory scanner and job dispatcher for
the router environment. As a side-effect, the router's "-L"
option really works, *and* one *never* needs to restart the
routers to rotate the routerlog. (Although, I think, my goal
priorities were: 1: logging fix, 2: worker gang - thus the
job-scheduling for router is rather rudimentary..)
... also, as a side-effect, the routers should *not* encounter
these Solaris-like directory rename() problems which they so
far have had... (But will it be faster than the old way ?
I don't know yet!)
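The idea of one directory scanner handing jobs to a worker gang can be sketched roughly as follows. This is a toy model only: it uses Python threads and an in-memory queue, whereas the routers are separate processes, and `dispatch` and `handle` are hypothetical names, not anything from the CVS code.

```python
import os
import queue
import threading

def dispatch(input_dir, handle, n_workers=2):
    """Scan input_dir once and hand each file to exactly one worker."""
    jobs = queue.Queue()
    done = []
    lock = threading.Lock()

    def worker():
        while True:
            path = jobs.get()
            if path is None:          # sentinel: no more work
                return
            handle(path)
            with lock:
                done.append(path)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()

    # Only the dispatcher reads the directory, so no two workers
    # are ever given the same file name.
    for name in sorted(os.listdir(input_dir)):
        jobs.put(os.path.join(input_dir, name))

    for _ in threads:
        jobs.put(None)                # one sentinel per worker
    for t in threads:
        t.join()
    return sorted(done)
```

Because the workers never scan the directory themselves, they have nothing to race over when it comes to renaming; whether that is faster than each router scanning on its own is a separate question.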
> Eugene
--
/Matti Aarnio <mea@nic.funet.fi>