
Re: Fix to use the saved errno in scheduler/transport.c



On Tue, Oct 18, 2005 at 11:15:31AM +0100, Alex Kiernan wrote:
> We've had a long standing problem of the scheduler aborting very
> occasionally. I went and stared at the relevant code earlier today and
> it looks like it might be the classic errno getting overwritten by a
> later syscall problem. What's stranger is the code has all the
> machinery to avoid the problem, it just doesn't actually use the saved
> value!
> 
> I see the problem so infrequently I can't really test it, but I'm
> hoping this fixes it.

Ouch..  Aeons ago I removed a bunch of abort()s from other
subsystems, but obviously left them in here..

Use of  abort()  in code should be reserved for cases of serious
s***t happening, which usually manifests itself as  SIGSEGV..
A syscall returning an odd error code is not truly fatal.

I haven't encountered this error at all, which doesn't really
preclude it from being real.  (In  smtpserver  I had a number of
odd cases where error processing failed hard, but prolonged
exposure to Solaris as a system environment cured me of most such
false expectations there..)

I think your fix gets at least halfway there, but the true fix is
to make the system get upset only about things that are truly worth
the upset, and otherwise just log and ignore them.
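
A minimal sketch of the pattern discussed above, and not the actual
code in scheduler/transport.c: capture errno immediately after the
failing syscall, report the saved copy, and return an error to the
caller instead of calling abort().  The helper name write_or_log()
and its return convention are hypothetical, chosen only for
illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the ZMailer sources).
 * Returns 0 on success, -1 on failure.  On failure, errno holds the
 * error from write(), even though logging ran in between.
 */
static int write_or_log(int fd, const void *buf, size_t len)
{
    ssize_t rc = write(fd, buf, len);

    if (rc < 0) {
        /* Save errno first: any later library call may overwrite it. */
        int saved_errno = errno;

        /* Log using the saved copy, never errno itself. */
        fprintf(stderr, "write(fd=%d) failed: %s\n",
                fd, strerror(saved_errno));

        /* Restore it so the caller sees the original error. */
        errno = saved_errno;

        /* Log and carry on; no abort() for an ordinary syscall error. */
        return -1;
    }
    return 0;
}
```

Called with an invalid descriptor, for example
write_or_log(-1, "x", 1), this logs the failure and returns -1 with
errno set to EBADF, leaving the caller to decide whether the error
matters.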


> --
> Alex Kiernan
-- 
/Matti Aarnio	<mea@nic.funet.fi>