
Re: Miraculous problem, possibly BAD!

On Thu, Dec 02, 1999 at 01:38:37PM +0300, Eugene Crosser wrote:
> I suddenly noticed something that I had not paid attention to before, and
> could hardly believe.  Actually, I cannot fully believe it now.  But it looks
> really bad.
> I noticed that mail sometimes (rarely) disappears.  I started looking
> into $POSTOFFICE and found it.  It was lying in queue/?/nnnn-nnnn files;
> there were *no* corresponding transport/?/* files, and mailq did not see it.
> Move those files into router/ and the mail gets delivered.  There were a few
> messages half a year old!  And a few quite fresh.
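
	The symptom described above (a file under queue/ with no
	counterpart under transport/) can be checked for mechanically.
	The following is only a minimal sketch: it *assumes* that the
	transport file carries the same relative path and name as the
	queue file, which the message does not state, and find_orphans
	is a hypothetical helper, not part of the mailer.

```python
import os


def find_orphans(postoffice):
    """Return queue files that have no same-named file under transport/."""
    queue_root = os.path.join(postoffice, "queue")
    transport_root = os.path.join(postoffice, "transport")
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(queue_root):
        # Path of the current subdirectory relative to queue/ (e.g. "A").
        rel_dir = os.path.relpath(dirpath, queue_root)
        for name in filenames:
            # ASSUMPTION: the transport file shares the queue file's
            # relative path (same subdirectory, same file name).
            candidate = os.path.normpath(
                os.path.join(transport_root, rel_dir, name))
            if not os.path.exists(candidate):
                orphans.append(os.path.join(dirpath, name))
    return sorted(orphans)
```

	Calling find_orphans() with the value of $POSTOFFICE would list
	the candidates; whether moving them into router/ is safe is a
	separate question.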

	I do suspect the problem lurks somewhere around the quirks of
	Solaris directory handling...

	Perhaps the sequence of handling things (still) isn't quite
	right, and two routers that have been processing the same file
	collide over which of them gets to move the original
	spoolfile, and under what name, into that QUEUEDIR.

	Could you play a bit of Devil's Advocate against router/rfc822.c,
	where that processing happens?  Could you guess what *must* happen
	for the observed effects to occur?
	(Excluding the possibility of the Scheduler *not* unlinking the
	 queue directory file for some reason.)
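
	As an illustration of the kind of collision suspected above,
	here is a minimal sketch of how two workers can contend for one
	spool file via rename().  It is not the code in router/rfc822.c;
	claim() and the ".claimed-by-" suffix are hypothetical names,
	and the sketch assumes a local POSIX filesystem where rename()
	of a given source succeeds for only one caller.

```python
import os


def claim(spool_path, worker_id):
    """Try to take ownership of a spool file.

    Returns the new path on success, or None if another worker
    already moved the file away.
    """
    # Hypothetical naming scheme: each worker renames to its own target,
    # so a successful rename identifies the owner unambiguously.
    claimed = f"{spool_path}.claimed-by-{worker_id}"
    try:
        os.rename(spool_path, claimed)
    except FileNotFoundError:
        # The source name is gone: some other worker won the race.
        return None
    return claimed
```

	A worker that gets None back must drop the job entirely.  If it
	carried on and later moved or unlinked files by name, that would
	be one way for a message to end up in an unexpected state.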

> I would think that these are files left over from hard reboots etc., but
> there were two cases when I *saw* messages going this way.  They were
> submitted locally by the `sendmail' frontend and went directly to queue/...,
> no traces in the log.
> I am puzzled.  Any thoughts?
> System is Solaris 2.5.1 and 2.6 on SPARC, filesystem is simple UFS
> on local disk.

	My Linux system doesn't show this, but then I think I run *one*
	router process, not a competing worker-gang.

	... which reminds me, in CVS I have code which has a rudimentary
	scheduler-like input directory scanner and job dispatcher for
	the router environment.  As a side-effect, the router's "-L"
	option now really works, *and* one *never* needs to restart the
	routers to rotate the routerlog.  (Although, I think, my goal
	priorities were: 1: logging fix, 2: worker gang, and thus the
	job-scheduling for the router is rather rudimentary.)

	... also, as a side-effect, the routers should *not* encounter
	these Solaris-like directory rename() problems which they have
	had so far...  (But will it be faster than the old way?
	I don't know yet!)
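
	The idea of a single input directory scanner feeding a worker
	gang can be sketched roughly as follows.  This is not the CVS
	code; dispatch() and its parameters are hypothetical, and
	threads stand in for what would be separate router processes.
	The point is only that one scanner reads the directory, so each
	file name is handed to exactly one worker.

```python
import os
import queue
import threading


def dispatch(input_dir, handler, n_workers=4):
    """Scan input_dir once and hand each file to exactly one worker."""
    jobs = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            path = jobs.get()
            if path is None:          # sentinel: no more work
                return
            outcome = handler(path)
            with lock:                # protect the shared results list
                results.append((path, outcome))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    # Only the scanner lists the directory; workers never pick names
    # themselves, so they cannot race each other for the same file.
    for name in sorted(os.listdir(input_dir)):
        jobs.put(os.path.join(input_dir, name))
    for _ in threads:
        jobs.put(None)                # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

	Whether such a central dispatcher is faster than workers
	scanning on their own is exactly the open question above; the
	sketch says nothing about performance.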

> Eugene

/Matti Aarnio	<mea@nic.funet.fi>