
Re: [ZMailer] Zmailer crashes



On Wed, Feb 04, 2009 at 03:09:55PM +0000, Ralf Baechle wrote:

> > On Fri, Jan 30, 2009 at 03:32:31PM -0800, Neal Morgan wrote:
> > > > On October 31, 2008 9:03 AM Ralf Baechle wrote:
> > > > For quite a while I've been observing these kernel messages on a Linux
> > > > x86_64 system:
> > > >
> > > > sm[3270]: segfault at 3ba7f9f0 ip 79fbc9 sp 7fffe7c48e30 error 6 in libc-2.7.so[72d000+14d000]
> > > > sm[3493] trap stack segment ip:7f0e2a121bc9 sp:7fff3240e4a0 error:0
> > > > sm[3773]: segfault at 3ba7f9f0 ip 79fbc9 sp 7fff55499680 error 6 in libc-2.7.so[72d000+14d000]
> > > 
> > > Matti: I've been seeing these across 4 servers:
> > > 
> > > kernel: smtpserver[31693]: segfault at 00000000 eip b7c16371 esp bf94b018 error 4
> > >
> > > kernel: router[9934]: segfault at 00000008 eip 0807fa95 esp bfdf5570 error 4
> > > 
> > > The interesting thing is it only happens when booted into a 2.6.24
> > > kernel.  If I reboot the same box into a 2.6.18 kernel everything runs
> > > fine (and there are no segfaults).
> 
> Older kernels don't emit this segfault message.  It was added in
> commit abd4f7505bafdd6c5319fe3cb5caf9af6104e17a, which first went into
> 2.6.23.  Could that be why you didn't notice it earlier?
> 
> > I see them too with the 2.6.26 kernel on the zmailer.org server:
> > a few hits per week, according to the kernel dmesg logs.
> > 
> > I suspect glibc is doing something stupid rather than the program really
> > going over the edge, but these crashes are so rare that debugging them is
> > next to impossible.  Previously I have seen them happen after the program
> > had called exit(0).
> >
> > Anyway, I have turned on core dumps to be able to see what happens.
> 
> I've seen Zmailer stop delivering mail or stop accepting connections on
> port 25.  The issue strikes relatively infrequently, but I decided to
> follow your example and have just turned on core dumps; it is affecting
> sm, smtpserver and router.  Lately the issue seems to strike much more
> often.  I wonder whether that's because I'm looking for it more often or
> because of my extremely inflated mail queue, which holds over 1,700,000
> messages.
>
> Ironically, I seem to have gotten another router segfault just seconds
> before I enabled core dumps ...
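Neither message says how the core dumps were enabled. One common approach on Linux, shown here as a minimal sketch assuming Python's standard `resource` module (the helper name `enable_core_dumps` is hypothetical and not part of ZMailer), is for the process that launches the daemons to raise its own core-file size limit, which the children it starts then inherit. Running `ulimit -c unlimited` in the launching shell has a similar effect when the hard limit allows it.

```python
import resource
from typing import Tuple


def enable_core_dumps() -> Tuple[int, int]:
    """Raise the RLIMIT_CORE soft limit to the current hard limit.

    Hypothetical helper: processes started afterwards by this one inherit
    the raised limit. Returns the (soft, hard) pair now in effect.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit;
    # raising the hard limit itself requires privilege.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    # resource.RLIM_INFINITY means "no size limit".
    print("core size limit:", "unlimited" if soft == resource.RLIM_INFINITY else soft)
```

Where the core file is written, and under what name, is governed separately by `/proc/sys/kernel/core_pattern`.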

To close this old case: the issue went away for me after upgrading the
system from Fedora 8 to Fedora 10.  So I assume that, as Matti suspected,
there was indeed something toxic in glibc.

  Ralf
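The kernel messages quoted in this thread can be unpacked mechanically. The following is a minimal Python sketch, assuming only the line formats shown above (the regex and the helpers `parse_segfault` and `decode_error` are hypothetical, not part of ZMailer or any kernel tool). It extracts the fields, decodes the x86 page-fault error code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode, bit 4: instruction fetch), and, when the message names a mapped object, computes the faulting instruction's offset inside it.

```python
import re
from typing import List, Optional

# Matches the x86 kernel "segfault at" lines quoted in this thread:
# 64-bit kernels print "ip"/"sp", 32-bit ones "eip"/"esp", and the
# trailing "in <object>[<base>+<size>]" part is optional.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"e?ip (?P<ip>[0-9a-f]+) e?sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_segfault(line: str) -> Optional[dict]:
    """Parse one segfault log line; return None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    rec = {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "addr": int(m["addr"], 16),
        "ip": int(m["ip"], 16),
        "sp": int(m["sp"], 16),
        "error": int(m["error"], 16),  # the kernel prints the error code in hex
        "obj": m["obj"],
        "offset": None,
    }
    if m["base"] is not None:
        base, size = int(m["base"], 16), int(m["size"], 16)
        # Offset of the faulting instruction inside the mapped object,
        # usable as input to a disassembler or symbolizer.
        if base <= rec["ip"] < base + size:
            rec["offset"] = rec["ip"] - base
    return rec


def decode_error(code: int) -> List[str]:
    """Decode the x86 page-fault error code bits into readable flags."""
    flags = [
        "protection violation" if code & 0x1 else "page not present",
        "write" if code & 0x2 else "read",
        "user mode" if code & 0x4 else "kernel mode",
    ]
    if code & 0x10:
        flags.append("instruction fetch")
    return flags


if __name__ == "__main__":
    line = ("sm[3270]: segfault at 3ba7f9f0 ip 79fbc9 sp 7fffe7c48e30 "
            "error 6 in libc-2.7.so[72d000+14d000]")
    rec = parse_segfault(line)
    print(rec["obj"], hex(rec["offset"]), decode_error(rec["error"]))
```

For the first sm line quoted above this yields offset 0x72bc9 into libc-2.7.so (0x79fbc9 - 0x72d000) and a user-mode write to a non-present page; the 32-bit smtpserver and router lines decode to user-mode reads with no object offset, since those messages name no mapping.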
--
To unsubscribe from this list: send the line "unsubscribe zmailer" in
the body of a message to majordomo@nic.funet.fi