Scheduler dumps core; problem with sfio?



Hello.

I'm using a CVS HEAD version of ZMailer, on a Linux/x86 platform.

Recently I was trying to raise the limit of simultaneous TAs
run by the scheduler (from around 1000 processes to around 3000).

So I raised the per-process limit on open fds (ulimit -n 32767), and
also increased the system-wide maximum number of open files in the kernel:
echo 512000 > /proc/sys/fs/file-max
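
To double-check that the raised limit is actually visible inside a
process, something like the following minimal C sketch can be used.
The helper names report_fd_limits and fd_fits_in_fd_set are made up for
illustration and are not part of ZMailer. The FD_SETSIZE check only
matters if descriptors are passed to select(), which is an assumption
here and not something verified against the scheduler source.

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/select.h>

/* Hypothetical helper: returns 1 if fd can be stored safely in an
 * fd_set, 0 otherwise.  Calling FD_SET() with a descriptor that is
 * negative or >= FD_SETSIZE is undefined behaviour. */
static int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: prints the soft and hard RLIMIT_NOFILE values
 * seen by this process, together with FD_SETSIZE.
 * Returns 0 on success, -1 if getrlimit() fails. */
static int report_fd_limits(FILE *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    fprintf(out, "RLIMIT_NOFILE soft=%llu hard=%llu, FD_SETSIZE=%d\n",
            (unsigned long long)rl.rlim_cur,
            (unsigned long long)rl.rlim_max,
            (int)FD_SETSIZE);
    return 0;
}
```

Calling report_fd_limits(stderr) early in main() of a small test
program started from the same shell shows whether "ulimit -n" took
effect, and fd_fits_in_fd_set(fd) can guard any FD_SET() call.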

Then I tried to run the scheduler.  It already had a large backlog of
messages to deliver (around 1 million files), so it started forking
new children, when suddenly... SEGV.  100% reproducible.

Here's the gdb output:

(gdb) bt
#0  0x00000001 in ?? ()
#1  0x08082f20 in procselhost ()
#2  0xbfdec7f0 in ?? ()
#3  0xbfdec770 in ?? ()
#4  0x00000000 in ?? ()
#5  0xbfdec768 in ?? ()
#6  0x0808ad20 in ?? ()
#7  0xbfdec778 in ?? ()
#8  0x0806b54f in sfprintf (f=0x1, form=0x1 <Address 0x1 out of bounds>) at sfprintf.c:27
#9  0xffffffff in ?? ()
#10 0xffffffff in ?? ()
#11 0xffffffff in ?? ()
#12 0xffffffff in ?? ()
#13 0xffffffff in ?? ()
#14 0xffffffff in ?? ()
#15 0xffffffff in ?? ()
#16 0xffffffff in ?? ()
#17 0xffffffff in ?? ()
#18 0xffffffff in ?? ()
#19 0xffffffff in ?? ()
#20 0xffffffff in ?? ()
#21 0xffffffff in ?? ()
#22 0xffffffff in ?? ()
#23 0x0000ffff in ?? ()
#24 0x00000007 in ?? ()
#25 0x00000006 in ?? ()
#26 0x0804be20 in sig_exit () at scheduler.c:1108
(gdb) f 0
#0  0x00000001 in ?? ()
(gdb) l
1108                    die(0, "signal");
1109            mustexit = 1;
1110    }
1111
1112    static RETSIGTYPE sig_quit(sig)
1113    int sig;
1114    {
1115            slow_shutdown = 1;
1116            freeze = 1;
1117    }
(gdb) f 8
#8  0x0806b54f in sfprintf (f=0x1, form=0x1 <Address 0x1 out of bounds>) at sfprintf.c:27
27              rv = sfvprintf(f,form,args);
(gdb) l
22              reg char*       form;
23              va_start(args);
24              f = va_arg(args,Sfio_t*);
25              form = va_arg(args,char*);
26      #endif
27              rv = sfvprintf(f,form,args);
28
29              va_end(args);
30              return rv;
31      }
(gdb) f 26
#26 0x0804be20 in sig_exit () at scheduler.c:1108
1108                    die(0, "signal");
(gdb) l
1103            if (querysocket6 >= 0) {                /* give up mailq socket asap */
1104                    close(querysocket6);
1105                    querysocket6 = -1;
1106            }
1107            if (canexit)
1108                    die(0, "signal");
1109            mustexit = 1;
1110    }
1111
1112    static RETSIGTYPE sig_quit(sig)

# /usr/local/zmailer/bin/scheduler -V
ZMailer scheduler (2.99.57.pre4 #1: Thu Aug  3 22:36:38 CEST 2006)
  root@localhost:/root/qnex/zmailer/zmailer/scheduler
Copyright 1992 Rayan S. Zachariassen
Copyright 1992-2004 Matti Aarnio
Configured with command: 'CC='gcc' CFLAGS='-g -O2' ./configure
'--with-openssl' '--with-ta-mmap' '--prefix=/usr/local/zmailer''

# file /usr/local/zmailer/bin/scheduler
/usr/local/zmailer/bin/scheduler: ELF 32-bit LSB executable, Intel
80386, version 1 (SYSV), for GNU/Linux 2.2.0, dynamically linked (uses
shared libs), not stripped

To me it seems like there could be a bug within sfio which makes
it unable to handle too many simultaneously open files...

Any idea how to fix it? :)

  Regards,
     Dawid