Re: weekend datadump at mea.tmt.tele.fi



[on smtp processes becoming stuck and SIGALRM being ineffective]
> Interesting ...  I never upgraded to 2.99.48 on my Solaris systems because
> I never saw the stuck 'smtp' processes as on Linux.  I am still happily
> running 2.99.47 here.  Also, since running 2.99.48p3 on Linux, I have not
> seen the stuck 'smtp' processes.  Just a datapoint ... but it would seem
> to be saying that something that's recently changed is causing the
> problem.

	Solaris is weird.  The same code works fine on Linux,
	SGI, and FreeBSD, but becomes stuck on Solaris...

	Recently the smtp-server on our main mail relay triggered an
	alert in the periodic service quality monitor by writing the
	following to the log (and calling exit() right after it):

00000#  started server pid 460 at Thu, 3 Apr 1997 17:51:59 +0300
000000# accept(): No child processes
00000#  started server pid 27408 at Sat, 5 Apr 1997 18:57:16 +0300

	Until then the smtpserver treated any accept() failure as a
	major catastrophe, but I decided that since accept() on
	Solaris can return weird errors not listed in its
	documentation, who am I to argue?  Sigh...
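
	A minimal sketch of that kind of tolerance, not the actual
	smtpserver code: accept_is_fatal() and accept_tolerant() are
	hypothetical names, and the list of errno values treated as
	fatal is an assumption.  Anything else, such as the ECHILD
	("No child processes") in the log above, is logged and retried
	after a short pause.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: errors that mean the listening socket itself
 * is unusable.  The exact list is an assumption for illustration. */
static int accept_is_fatal(int err)
{
    switch (err) {
    case EBADF:
    case ENOTSOCK:
    case EINVAL:
    case EOPNOTSUPP:
    case EFAULT:
        return 1;
    default:
        return 0;   /* includes undocumented surprises such as ECHILD */
    }
}

/* Hypothetical wrapper: returns a connected descriptor, or -1 only
 * when accept() fails with an error classified as fatal above. */
static int accept_tolerant(int listen_fd)
{
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd >= 0)
            return fd;
        if (accept_is_fatal(errno))
            return -1;
        if (errno != EINTR) {
            /* Log the oddity and back off briefly instead of exiting. */
            fprintf(stderr, "accept(): %s (retrying)\n", strerror(errno));
            sleep(1);
        }
    }
}
```

	The server would then exit only when accept_tolerant()
	returns -1.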

	Perhaps the freeze-up of the smtp client is a similar
	Solaris oddity...  Hmm.. or..

	It is possible that Solaris uses POSIX sigaction() and
	REQUIRES the SA_NODEFER flag.  The ALRM handler does, after
	all, do a longjmp() and never returns from the signal
	handler as such...

	In that case SIGALRM would stay blocked after the first
	longjmp() out of the handler, so the first timeout works
	and the second one jams..
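
	A minimal sketch of how that could be avoided, assuming the
	hypothesis above is right.  It is not the actual smtp client
	code; alarm_env, on_alarm(), install_alarm_handler() and
	wait_for_timeout() are hypothetical names.  It installs the
	handler with SA_NODEFER, so SIGALRM is not blocked while the
	handler runs, and also uses sigsetjmp()/siglongjmp() with the
	mask saved, which restores the signal mask on the jump.
	Either measure alone should be enough.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf alarm_env;    /* hypothetical name */

/* The handler never returns normally; it jumps back to the caller. */
static void on_alarm(int sig)
{
    (void)sig;
    siglongjmp(alarm_env, 1);
}

/* Install the SIGALRM handler.  SA_NODEFER keeps SIGALRM out of the
 * blocked set while the handler is running. */
static int install_alarm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NODEFER;
    return sigaction(SIGALRM, &sa, NULL);
}

/* Block until the alarm fires; returns 1 when the timeout was hit. */
static int wait_for_timeout(unsigned seconds)
{
    /* Second argument 1: save the signal mask here, so siglongjmp()
     * restores it and SIGALRM cannot stay blocked afterwards. */
    if (sigsetjmp(alarm_env, 1) != 0)
        return 1;

    alarm(seconds);
    for (;;)
        pause();    /* stands in for the blocking network I/O */
}
```

	Calling wait_for_timeout() twice in a row after
	install_alarm_handler() exercises exactly the "second
	timeout" case.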

> Roy
> rcb@press-gopher.uchicago.edu

	/Matti Aarnio <mea@nic.funet.fi>