Re: [ZMailer] Zmailer crashes

On Sun, Feb 01, 2009 at 03:52:44PM +0200, Matti Aarnio wrote:

> On Fri, Jan 30, 2009 at 03:32:31PM -0800, Neal Morgan wrote:
> > > On October 31, 2008 9:03 AM Ralf Baechle wrote
> > > For quite a while I've been observing these kernel messages on a Linux
> > > x86_64 system:
> > >
> > > sm[3270]: segfault at 3ba7f9f0 ip 79fbc9 sp 7fffe7c48e30 error 6 in libc-2.7.so[72d000+14d000]
> > > sm[3493] trap stack segment ip:7f0e2a121bc9 sp:7fff3240e4a0 error:0
> > > sm[3773]: segfault at 3ba7f9f0 ip 79fbc9 sp 7fff55499680 error 6 in libc-2.7.so[72d000+14d000]
> > 
> > Matti: I've been seeing these across 4 servers:
> > 
> > kernel: smtpserver[31693]: segfault at 00000000 eip b7c16371 esp bf94b018 error 4
> > 
> > kernel: router[9934]: segfault at 00000008 eip 0807fa95 esp bfdf5570 error 4
> > 
> > The interesting thing is that it only happens when booted into a 2.6.24
> > kernel.  If I reboot the same box into a 2.6.18 kernel, everything runs
> > fine (and there are no segfaults).

Older kernels don't emit this segfault message.  It was added in
commit abd4f7505bafdd6c5319fe3cb5caf9af6104e17a, which went into 2.6.23.
Could that be why you didn't notice it earlier?

> I do see them too with 2.6.26 kernel at zmailer.org server.
> A few hits per week according to kernel dmesg logs.
> I suspect glibc doing something stupid more than the program really
> going over the edge, but these are so rare that debugging them is next
> to impossible.  Previously I have seen them happen after the program
> has called exit(0).
> Anyway, I have turned on core dumps to be able to see what happens.

I've seen Zmailer stop delivering mail or stop accepting connections
on port 25.  The issue strikes relatively infrequently, but I decided to
follow your example and have just turned on core dumps; it affects sm,
smtpserver and router.  Lately the frequency of this issue seems to have
increased significantly.  I wonder if that's because I'm checking for it
more often or because of my extremely inflated mail queue, which holds
over 1,700,000 messages.

Ironically I seem to have gotten another router segfault just seconds
before I enabled core dumps ...

