
Re: Issues for Solaris 2.5.1 and 2.99.45



> As you figured out... I am not specifying "smtp $host". I am doing "smtp -e 
> -s".

	Ah!  Leave out the '-e' -- it makes the smtp transporter also
	process messages it is not supposed to touch.
	(Well, '-e' did make sense before I rewrote the scheduler.)

> > > (3) I have 6 smtp processes looping, apparently. I am unable to truss or
> > > debug them. I sent them a SIGABRT and then gdb the core file and got:
> > > 
> > > #0  0x148b0 in process (SS=0xeffff3c8, dp=0x3fb7c, smtpstatus=0, host=Cannot 
> > > access memory at address 0xeffff184.
> > > ) at smtp.c:701
> > > 701             for (rp = rphead = dp->recipients; rp != NULL; rp = rp->next) {
> > 
> > 	???  Perhaps a faulty  transport-spec  file ?
> 
> Ok. Not sure where I should look for this... Any help is appreciated.

	I suspect the '-e' is the reason for this too.
	If it recurs, look at the main() level of the smtp transporter:
	the variables "channel", "host", and "file" there may
	contain a clue.

> > > (4) HUPing the scheduler process does not cause new processes to write a new
> > > scheduler.perflog. Killing the scheduler process does not kill the children,
> > > as documented. HUPing the router process only causes that one router process
> > > to write to the new log file, the others continue writing to the old (it would
> > > seem they never get HUPed).
> > 
> > 	- Scheduler HUP -- The logfn processing (of log file) is handled by
> > 			   the  lib/loginit.c  routine, which does not know
> > 			   about the perflog.
> 
> I presume I should disable the perflog then... since it would seem
> that there aren't any programs to grok this information for useful reporting,
> yes? I guess I would point the perflog at /dev/null so I can get the
> nice -Q output, yes?

	Just leave the '-l scheduler.perflog' option out of the
	SCHEDULEROPTIONS.

	Version 2.99.47 will handle 'SIGHUP' properly for them both.

> > 
> > 	- Router HUP    -- You must HUP the GROUP, not single ("top parent")
> > 			   process that forked off some siblings to do
> > 			   parallelized routing
> 
> How is this done? Sorry for such a stupid question.

    I am tempted to say: "man 2 kill", but...

	routerpid=`head -1 $POSTOFFICE/.pid.router`
	kill -HUP -$routerpid

    That is, the pid listed in  .pid.router  is that of the process-group leader.
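
    A slightly more defensive variant of the same idea, as a sketch only.
    The helper names read_leader_pid and hup_group are made up for
    illustration, and the sketch assumes (as above) that the first line of
    the pid file holds the pid of the process-group leader.

```shell
#!/bin/sh
# Hypothetical helper: print the process-group leader's pid taken from
# the first line of a pid file; fail if the file is unreadable or the
# first line is not a plain number.
read_leader_pid() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(head -n 1 "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric content
    esac
    echo "$pid"
}

# Hypothetical helper: send SIGHUP to every process in that group.
# The leading '-' before the pid makes kill address the whole
# process group instead of a single process.
hup_group() {
    pid=$(read_leader_pid "$1") || return 1
    kill -HUP "-$pid"
}
```

    It would be called as:  hup_group "$POSTOFFICE/.pid.router"
    The numeric check is there so that a damaged pid file cannot turn
    into a kill with an unintended argument.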

> > 
> > 	- The children of the scheduler will die some day, once they are done
> > 	  with all the job-specs they have received.  However smtp connection
> > 	  timeouts (for example) are not always very short...
> 
> Can there be an option to the scheduler which says that it should send
> a SIGTERM or SIGKILL to all children??? Then I can reliably roll over
> the scheduler log file.

	The log files should be opened with the  close-on-exec  flag..
	... not that it should matter.  Child processes have only
	FDs 0, 1 and 2 open, and those are pipes from the scheduler.

> > 	- For the 2.99.47 I have done some checking on SMTP timeouts, and
> > 	  how sure they are to happen.  Perhaps I have been able to
> > 	  improve the sureness a bit now.
> 
> good.

	I have also tried to do some additional High-Availability things
	there -- we want to use ZMailer on an H-A system in which the system
	components back each other up, but it becomes rather hairy when it
	comes to the MX processing, and to detecting which IP address in
	the system is for what kind of use...

> > > (7) What is considered to be the last 'stable' release of zmailer? Is anyone
> > > moving 200,000 messages per day on Solaris 2.5.1???
> > 
> > 	I have 2.99.46p3 at mailhost.utu.fi, though I don't know
> > 	how many mails it processes daily -- fairly many..
> > 	.. and no, I would not call 46p3 STABLE, for that matter
> > 	all of 46 are somewhat buggy.
> 
> Is there a previous version I should go to instead??

	I would say 2.99.45 -- except that you have problems with it..

> /mrg

	/Matti Aarnio