
Re: zmailer memory leak status



> When last we left, zmailer was leaking memory in large chunks.  

	Yes, mostly because of a bug in the Toronto version of
	lib/linebuffer.c, though others are possible too.

> Since then, I have recompiled with the latest version of the 
> toronto malloc routine as suggested and tested it.  At first
> it appeared to solve the problem.  I ran 1300 messages through
> to local accounts and the memory leak was still there but 
> significantly reduced.  However, over the last two days, it has
> come back with a vengeance (150MB) for one router process.

	I concur, the new Toronto malloc does diminish memory
	bloat somewhat, but didn't solve my Solaris fault-problem
	which keeps me from making a "release".
	(But malloc and linebuffer.c are separate systems.)

> I can only infer that its related to non local traffic.
> 
> At present, I see two options for correcting this problem:
> 
> 1) try a different malloc routine as suggested in previous
>    mail

	Oddly the Toronto malloc has some problems with
	GCC -- most probably just headers, but that
	mess may hide real problems as well..

> 2) Do a clean compile under the SUN development environment
>    with clean SUN malloc's then track down and eliminate
>    the memory leaks (I will probably add Testcenter from
>    Centerline for this purpose).

	What are these "Testcenter" and "Centerline" ?
	Are they part of SunPRO  SparcWORKS ?

> I propose to do the latter.
> 
> PS it appears that zmailer does not implement reliable signal 
> handling, one more thing to fix.

	In what sense / what platform / which modules ?
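
	For reference, one common meaning of "reliable" here is the
	POSIX sigaction() interface as opposed to plain signal(), whose
	System V flavour resets the handler to the default once it has
	fired.  A minimal sketch in C, assuming a POSIX system; the names
	on_wakeup, install_reliable_handler and take_pending_wakeup are
	made up for illustration and are not Zmailer code:

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX signal interfaces */
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set from the handler, consumed by the main loop. */
static volatile sig_atomic_t pending_wakeup = 0;

/* The handler only records that the signal arrived. */
static void on_wakeup(int signo)
{
    (void)signo;
    pending_wakeup = 1;
}

/* Install a handler that stays installed after delivery and asks the
 * kernel to restart interrupted system calls.  Returns 0 on success,
 * -1 on failure (as sigaction() does). */
int install_reliable_handler(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_wakeup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Read and clear the flag with the signal blocked, so a signal that
 * arrives between the read and the clear is not lost.
 * Returns 1 if a wakeup was pending, 0 otherwise. */
int take_pending_wakeup(int signo)
{
    sigset_t block, old;
    int was;

    sigemptyset(&block);
    sigaddset(&block, signo);
    sigprocmask(SIG_BLOCK, &block, &old);
    was = pending_wakeup;
    pending_wakeup = 0;
    sigprocmask(SIG_SETMASK, &old, NULL);
    return was;
}
```

	A daemon would call install_reliable_handler() once at startup
	(SIGUSR1 is only an example choice of signal) and check
	take_pending_wakeup() in its main loop.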

> This leads to the following questions:
> 
> 1) mia indicated (he/she?) was working on a newer version.
>    Does anyone know what branches of source have been created
>    from the toronto base ? What are the feature differences ?  Any
>    recommendations ?

	mea -- Matti E Aarnio,  "matthew" for you English speakers :)
		(If that can solve the  he/she  for you..  My own mother
		 tongue doesn't have  female/male  differentiation at all.
		 Only  "he" (person) vs. "it" (things, animals,..)   )

	There are a lot of differences, and plenty of bugfixes..
	Get the source:

	ftp.funet.fi:/pub/unix/mail/zmailer/zmailer-2.2.1-mea-DATECODE.tar.gz

> 2) I'm going to recompile on Solaris 2.3 (an AT&T 5.4 base system)
>    I tend to target that environment instead of SUN 4.X, any 
>    thoughts ?

	I am fairly successful at it, but when an external list file is
	larger than 4 kB,  fread() called by linebuffer.c  drops
	core..  :-(  Same source works all right on 4.x, and a BINARY
	from 4.x runs just fine on Solaris 2.3!  "I don't like it"...

> Last some observations:
> 
> Someone mentioned that only their first router process was working;
> in point of fact they all seem to work, but only under heavy loads, so
> on a system with light email loads, the first router accumulates 
> most of the time.

	Well, what would you expect an asynchronous parallel system
	to do ?  I would say all of them will get work even under
	light loads, but then if the system load is light, there is
	no point in keeping more routers available for momentary
	queue buildup -- if the average routing turnaround time is
	too long (say more than 15 minutes), then adding more routers
	is ok.  If the average is under a minute, and it happens with
	a single router, then maybe Zmailer is overkill for your uses...

> Second,  it is painfully slow.  I will do some profiling but I 
> suspect it is related to the router CPU time and the heavy use of file
> I/O (e.g. heavy logging, lots of stat info on disk).

	No, it isn't slow. A route decision with a static table,
	or to a local user with no frills (.forward) will happen
	in under a second -- faster than I can type anyway.
	The DNS lookups over  the network are slow.  When you
	have a lot of addresses to resolve, it will take some time..

	Of course it takes a moment to pick up a file, but
	a rescan interval of 10-20 seconds should not be a
	problem.  Though if you compare it to  sendmail's
	"instant" processing -- the one instance you start
	for message submission will also do routing and
	delivery -- then it will make a difference.

	However at serious loads, when the usual rapidly-spawned
	bunch of sendmails will literally kill the system
	(ok, sendmail 8.x has some smarts on this based on
	 system load average), Zmailer's  /lib/sendmail will
	just submit the file for later routing etc.
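
	The submit-now, route-later hand-off can be sketched as writing
	the message under a temporary name and renaming it into the queue
	directory, so that a scanning router never sees a half-written
	file.  This is an illustration only; the directory layout, the
	".tmp." prefix and the function name submit_message are
	assumptions, not Zmailer's actual spool format:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PATH_BUF 1024  /* arbitrary limit for this sketch */

/* Write `body` to queue_dir/.tmp.NAME, then rename it to queue_dir/NAME.
 * rename() within one filesystem is atomic on POSIX systems, so a
 * scanner that skips ".tmp.*" names only ever sees complete files.
 * A real mailer would also fsync before the rename.
 * Returns 0 on success, -1 on any failure. */
int submit_message(const char *queue_dir, const char *name, const char *body)
{
    char tmp_path[PATH_BUF];
    char final_path[PATH_BUF];
    FILE *fp;
    int n;

    assert(queue_dir != NULL && name != NULL && body != NULL);

    n = snprintf(tmp_path, sizeof tmp_path, "%s/.tmp.%s", queue_dir, name);
    if (n < 0 || (size_t)n >= sizeof tmp_path)
        return -1;
    n = snprintf(final_path, sizeof final_path, "%s/%s", queue_dir, name);
    if (n < 0 || (size_t)n >= sizeof final_path)
        return -1;

    fp = fopen(tmp_path, "w");
    if (fp == NULL)
        return -1;
    if (fputs(body, fp) == EOF) {
        fclose(fp);
        remove(tmp_path);
        return -1;
    }
    if (fclose(fp) != 0) {
        remove(tmp_path);
        return -1;
    }
    if (rename(tmp_path, final_path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

	The submitting program returns as soon as the rename succeeds;
	routing and delivery happen whenever the next rescan finds the file.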

> Moving forward, I'm thinking of the following changes:
> 
> 1) add light weight threads to the router process, and clean up
>    logging

	Not so easy, and especially not truly portable :-(
	One of the Zmailer premises is to do things with
	as low-level technology as possible.
	The minimum at the moment is a working setreuid(); all the
	rest can be found/faked easily, but that is the cornerstone
	of the security features.

> 2) Add a write through cache for status information

	What would that mean ?   (I mean "status information")

	A cache of route-results --  domain -> route
	would help by avoiding repeated slow DNS lookups.
	(A common cache-server for it, not (necessarily) caches
	 in each router process!)
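
	As a rough illustration of the  domain -> route  cache idea, here
	is a small direct-mapped table with expiry times, in C.  It is a
	sketch only: the table size, the string limits and all names
	(route_cache_put, route_cache_get, ...) are hypothetical, and in
	the common-server arrangement such a table would sit inside the
	cache server rather than in each router:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define CACHE_SLOTS 256  /* hypothetical sizes for the sketch */
#define DOMAIN_LEN  128
#define ROUTE_LEN   128

struct route_entry {
    char   domain[DOMAIN_LEN];  /* lower-cased key; "" marks an empty slot */
    char   route[ROUTE_LEN];    /* cached routing result */
    time_t expires;             /* entry is stale once now >= expires */
};

struct route_cache {
    struct route_entry slot[CACHE_SLOTS];
};

/* Lower-case src into dst.  Returns 0 on success, -1 if it does not fit. */
static int normalize(char *dst, size_t dstlen, const char *src)
{
    size_t i;

    for (i = 0; src[i] != '\0'; i++) {
        if (i + 1 >= dstlen)
            return -1;
        dst[i] = (char)tolower((unsigned char)src[i]);
    }
    dst[i] = '\0';
    return 0;
}

/* Simple string hash reduced to a slot index. */
static unsigned hash_key(const char *key)
{
    unsigned long h = 5381;

    while (*key != '\0')
        h = h * 33 + (unsigned char)*key++;
    return (unsigned)(h % CACHE_SLOTS);
}

void route_cache_init(struct route_cache *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
}

/* Store a result; a colliding older entry is simply overwritten.
 * Over-long or empty domains and over-long routes are not cached. */
void route_cache_put(struct route_cache *c, const char *domain,
                     const char *route, time_t now, time_t ttl)
{
    char key[DOMAIN_LEN];
    struct route_entry *e;

    assert(c != NULL);
    if (normalize(key, sizeof key, domain) != 0 || key[0] == '\0')
        return;
    if (strlen(route) >= ROUTE_LEN)
        return;
    e = &c->slot[hash_key(key)];
    strcpy(e->domain, key);
    strcpy(e->route, route);
    e->expires = now + ttl;
}

/* Return the cached route, or NULL on a miss or an expired entry. */
const char *route_cache_get(struct route_cache *c, const char *domain,
                            time_t now)
{
    char key[DOMAIN_LEN];
    struct route_entry *e;

    assert(c != NULL);
    if (normalize(key, sizeof key, domain) != 0 || key[0] == '\0')
        return NULL;
    e = &c->slot[hash_key(key)];
    if (strcmp(e->domain, key) != 0)
        return NULL;
    if (now >= e->expires) {
        e->domain[0] = '\0';  /* drop the stale entry */
        return NULL;
    }
    return e->route;
}
```

	The caller passes the current time in, so only a miss costs a real
	(slow) lookup, and a short ttl keeps stale routes from lingering.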

> 3) Utilize signals and rpcs to tell other processes when there 
>    is work to be done instead of the file system

	Possible, but again it might call for too high-level
	things.  And then, are you sure the filesystem mediated
	interaction would be considerably more "expensive", than
	using signals/rpcs ?

> 4) Implement a unified logging architecture
> 
> 5) Develop a MIB-II interface and a Tivoli object management
>    interface

	Would you have any more detailed ideas ?  (on 4 and 5)
	MIB-II is the SNMP one (RFC 1213), but "Tivoli" doesn't tell me much.
	(I think you don't mean the amusement park in Copenhagen, Denmark)

> Any thoughts about these ideas ?
> 
> Thanks,
> 
> John Scharber


	/Matti Aarnio	<mea@nic.funet.fi> <mea@utu.fi>