SMTP transport hangs


I have been experiencing this problem for a *long* time, but since it
happens rarely, it did not bother me much until recently.

Sometimes an smtp transport process hangs and stays in the process list
for a long time (days).  This effectively stops the queue for the domain
it is contacting.  The process uses zero CPU and leaves *no* traces in
the smtp log.  Netstat does not show any open connection on the receiving
end (and I cannot check whether there is a connection associated with
this process on the originating end).

Today I caught such a process and killed it with signal 11 (with -QUIT,
it leaves no core file?).  Zmailer 2.99.47 + all patches, SPARC Solaris 2.5.1.
This is what gdb shows:

Core was generated by `/usr/zmailer/bin/ta/smtp -s8H -l /var/log/zmailer/smtp'.
Program terminated with signal 11, Segmentation fault.
procfs (find_procinfo):  Couldn't locate pid 0
#0  0xef636d58 in _end ()
(gdb) bt
#0  0xef636d58 in _end ()
#1  0xef6732cc in _end ()
#2  0xef6475f8 in _end ()
#3  0xef6f2158 in _end ()
#4  0xef6d77c8 in _end ()
#5  0xef6f1bb8 in _end ()
#6  0x1fec0 in stachmyaddress (host=0x44fc7 "koi.smtp.online.ru")
    at selfaddrs.c:296
#7  0x2011c in stachmyaddresses (host=0x44fd9 "") at selfaddrs.c:420
#8  0x16604 in smtpconn (SS=0xeffff998, host=0x454b0 "office.sob.tulane.edu", 
    noMX=0) at smtp.c:1758
#9  0x163b0 in smtpopen (SS=0xeffff998, host=0x454b0 "office.sob.tulane.edu", 
    noMX=0) at smtp.c:1689
#10 0x13e28 in main (argc=0, argv=0x454b0) at smtp.c:669

"koi.smtp.online.ru" is one of the IP aliases of the local machine.
selfaddrs.c:296 is the gethostbyname() call.  I *can* believe that this
is a bug in Solaris...  though this happened under 2.4 as well.
I *could* also write a wrapper around gethostbyname() that uses alarm(),
but that sounds ugly.

Any better ideas?
Maybe the scheduler could kill lethargic children?  Something else?
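
To make the "kill lethargic children" idea concrete, here is a minimal
sketch of a wall-clock limit on a single child.  It is NOT how the
Zmailer scheduler works; reap_or_kill() and its parameters are
hypothetical names, and a real scheduler would have to track many
children without blocking and would probably want some notion of
"last activity" rather than a plain time limit.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to max_seconds for child `pid`.
 * Returns 0 if the child ended on its own, 1 if it had to be killed
 * with SIGKILL, and -1 if waitpid() failed.  *status receives the
 * wait status in the first two cases.
 */
int reap_or_kill(pid_t pid, unsigned int max_seconds, int *status)
{
    time_t deadline = time(NULL) + (time_t)max_seconds;

    for (;;) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 0;               /* child finished by itself */
        if (r < 0)
            return -1;              /* not our child, or other error */
        if (time(NULL) >= deadline)
            break;                  /* still running: give up on it */
        sleep(1);                   /* poll once per second */
    }

    /* The child overstayed its limit: kill it and collect the status. */
    kill(pid, SIGKILL);
    if (waitpid(pid, status, 0) != pid)
        return -1;

    /* If the child exited just before the kill, report a normal exit. */
    if (WIFSIGNALED(*status) && WTERMSIG(*status) == SIGKILL)
        return 1;
    return 0;
}
```

With something like this, a transport agent stuck in gethostbyname()
for days would be killed after the limit expires and the queue for
that domain could be retried, whatever the underlying cause is.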