
Re: Router bug



> ZMailer router (2.99.38 #1: Mon Sep 30 04:02:27 EDT 1996), SunOS
> 4.1.3_U1 with latest resolver libraries. 
> 
> When the router expands a mailing list of 2187 recipients it silently
> dies. It gets to the point where it is listing the recipients in the router
> log file, and then it simply stops. The next router in order takes on
> the task and then dies. Pretty soon: no routers...

	I got a copy of the list from Scott, and although he
	says otherwise, I think this is what is happening:

	Well, from  /var/log/mail/router:

router[17016]: malloc(32744): virtual memory exceeded, sleeping

	When it gets that, it sleeps 60 seconds AND TRIES TO MALLOC
	AGAIN -- but fails, because it exceeds my default ulimits:

	(This is a Solaris machine, but SPARC anyway..)

sol:/var/spool/postoffice/router|530# ps -lp 17016
 F S   UID   PID  PPID  C PRI NI     ADDR     SZ    WCHAN TTY      TIME CMD
 8 S     0 17016     1  0  41 20 f6562680   4290 f6562850 ?       18:58 router
sol:/var/spool/postoffice/router|531# ulimit -a
core file size (blocks)  800000
data seg size (kbytes)   15000
file size (blocks)       unlimited
stack size (kbytes)      8192
cpu time (seconds)       unlimited
pipe size (512 bytes)    10
open files               64
virtual memory (kbytes)  2097151
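
	A rough sketch of why the sleep-and-retry cannot help here.
	It is Python for illustration only, not router code.  Assumptions:
	ps reports SZ in pages, the page size is 4 kB (other SPARC
	models may differ), and SZ counts the whole process, not just
	the data segment, so the comparison is approximate.  The function
	names are made up for this example.

```python
import resource


def pages_to_kbytes(pages, page_size_bytes=4096):
    """Convert a ps SZ value (pages) to kilobytes; page size is an assumption."""
    return pages * page_size_bytes // 1024


def retry_alloc(used_kb, request_kb, limit_kb, retries=3):
    """Model a 'sleep, then malloc again' loop against a fixed per-process limit.

    Returns True if any attempt would fit under the limit.
    """
    for _ in range(retries):
        if used_kb + request_kb <= limit_kb:
            return True
        # The router sleeps 60 seconds at this point.  Sleeping frees
        # nothing inside this process, so the next attempt sees the
        # same numbers and fails the same way.
    return False


def data_seg_limit():
    """Return (soft, hard) data segment limits in bytes, like 'ulimit -d'."""
    return resource.getrlimit(resource.RLIMIT_DATA)


if __name__ == "__main__":
    size_kb = pages_to_kbytes(4290)          # SZ from the ps output above
    print(size_kb, "kB process size vs 15000 kB data seg limit")
    print("retry succeeds:", retry_alloc(size_kb, 32, 15000))
    print("data seg limit on this host:", data_seg_limit())
```

	Waiting only helps when the shortage is system-wide (swap freed
	by other processes); against a per-process ulimit the retry is
	bound to fail unless the limit is raised.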

	This is because expanding this big list eats A LOT of memory.
	It used to eat a lot more, though...
	I recall a 600-recipient list eating 140 MB; then I began to
	rewrite parts of the router to trim down their memory usage.
	Nevertheless it is a pig :-(

	Perhaps it is not a hopelessly BIG pig; the following is from
	nic.funet.fi, after it had run through the list expansion:

bash# ps wwl30113
UID   PID PPID CP PRI NI   VSZ  RSS WCHAN S TT     TIME COMMAND
  0 30113    1 93  52  0 13.1M 9.7M -     R ??   4:58.70 /l/mail/bin/router -dkn 4

	Odd.. SMALLER than the 16 MB on SPARC, but nic.funet.fi does
	not log individual addresses to the /var/log/mail/router
	file, it just puts summaries there.

	Scratch that .... eh ?  nic.funet.fi DOES log everything,
	addresses incoming to the router, AND results going out.
	Still it was smaller on the 64-bit Alpha than on the 32-bit SPARC ???

	Quite a puzzle..
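
	For a sense of scale, a back-of-the-envelope per-recipient
	figure can be worked out from the numbers quoted in this message.
	Assumptions: the 140 MB is a recollection, not a measurement;
	the 13.1M VSZ includes the router's baseline footprint, so the
	second figure is an upper bound; 1 MB is taken as 1024 kB; and
	the nic.funet.fi run is taken to be the same 2187-recipient list.
	The helper name is invented for this example.

```python
def kb_per_recipient(total_mb, recipients):
    """Average memory per recipient in kB, taking 1 MB = 1024 kB."""
    return total_mb * 1024 / recipients


# Recalled older behaviour: 600 recipients, about 140 MB.
old = kb_per_recipient(140, 600)

# Current run on nic.funet.fi: 2187 recipients, VSZ 13.1M
# (whole process, so this overstates the per-recipient cost).
new = kb_per_recipient(13.1, 2187)

print(f"old: about {old:.0f} kB per recipient")
print(f"new: at most about {new:.1f} kB per recipient")
```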

> Splitting the list into two separate lists works, so it seems to be
> size related.

	Indeed -- it used to grow something like N! until
	I fixed it a bit.  Now it should grow pretty much linearly.
	(It is still growing too much for my taste, but that is
	 another story.)

> sdb

	/Matti Aarnio <mea@nic.funet.fi>