
Re: Fix to use the saved errno in scheduler/transport.c



On 18/10/05, Matti Aarnio <mea@nic.funet.fi> wrote:
> On Tue, Oct 18, 2005 at 11:15:31AM +0100, Alex Kiernan wrote:
> > We've had a long-standing problem of the scheduler aborting very
> > occasionally. I went and stared at the relevant code earlier today and
> > it looks like it might be the classic problem of errno getting
> > overwritten by a later syscall. What's stranger is that the code has
> > all the machinery to avoid the problem; it just doesn't actually use
> > the saved value!
> >
> > I see the problem so infrequently I can't really test it, but I'm
> > hoping this fixes it.
>
> Ouh..  aeons ago I removed a bunch of abort()s from other
> subsystems, but obviously left them here..
>
> Use of  abort()  in code should be reserved for cases of serious
> s***t happening, which usually manifests itself as  SIGSEGV..
> Syscalls returning odd error codes are not truly fatal.
>
> I haven't encountered this error at all, which doesn't really
> preclude it from being real.  (In  smtpserver  I had a number of
> odd cases where error processing caused hard failures -- but prolonged
> exposure to Solaris as a system environment cured me of most such
> false expectations there..)
>

We see it once every few months (this is on Solaris FWIW).

> I think your fix gets at least half-way there, but the true fix is
> to make the system upset only about things that are truly worth
> the upset, and otherwise to just log and ignore them.
>

Agreed.
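
For anyone reading the archive later, here is a rough sketch of the
pattern under discussion. It is not the code from
scheduler/transport.c; checked_write() and enum io_verdict are
hypothetical names made up for illustration, and the access("") call
only stands in for whatever later call happens to overwrite errno.
The idea is to copy errno straight after the failing call, decide
based on that copy alone, and log rather than abort():

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical outcome codes; the real scheduler may classify differently. */
enum io_verdict { IO_OK, IO_RETRY, IO_LOGGED };

/*
 * Hypothetical helper, not ZMailer code.  Writes once, and on failure
 * reports the errno value of the write() itself through *err_out.
 * Short writes are not handled in this sketch.
 */
static enum io_verdict checked_write(int fd, const void *buf, size_t len,
                                     int *err_out)
{
    ssize_t n = write(fd, buf, len);
    if (n >= 0) {
        *err_out = 0;
        return IO_OK;
    }

    /* Save errno immediately, before any other call can change it. */
    int saved = errno;

    /* Stand-in for cleanup or logging that fails on its own and
     * overwrites errno (an empty path makes access() fail). */
    (void)access("", F_OK);

    /* From here on, only the saved copy is consulted, never errno. */
    *err_out = saved;
    if (saved == EINTR || saved == EAGAIN)
        return IO_RETRY;

    /* An odd error code is logged and handed back, not treated as fatal. */
    fprintf(stderr, "write(fd=%d) failed: %s\n", fd, strerror(saved));
    return IO_LOGGED;
}
```

The caller gets the original error in *err_out and can decide to
retry, skip the item, or carry on; nothing in this path calls abort().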

--
Alex Kiernan