
Re: Issues for Solaris 2.5.1 and 2.99.45

> Solaris 2.5.1
> Zmailer 2.99.45
> A few problems. I tried pushing about 100,000 messages through it 
> as my first acid test (real traffic).
> (1) Logging scheduler output I got 53MB of the following message type:
> 19970317192004 DBGdiag: # smtpclient:15563: lockaddr: file '541305-13635' host 
> '?host?' expected '~' found ' '

	It means that two smtp-clients have been processing the same
	message in parallel.  That should never happen.

	The presence of '?host?' probably means that you are running the
	scheduler without sending host-selectors down to the TA programs.
	That is, you have the  smtp/*  entry defined in the ancient manner of:
		command="smtp $host"

	Umm... No, it doesn't look that way from what you describe below.

> (2) I tried setting USE_FCNTLLOCK in transports/libta/lockaddr.c. After
> futzing with the header files to get fcntl.h incorporated, this code
> did not build. Looking at the fcntl code in lockaddr.c, I don't think
> this code works.

	No, it does not work.  I never finished that code.
	(And I am not certain it would be cheaper for the system
	 than the current method -- having HUNDREDS of small log
	 regions in dozens of files...)
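
	For illustration only, this is roughly what taking an fcntl-style
	lock on one small region of a file looks like.  It is a minimal
	Python sketch, not the unfinished code in lockaddr.c; the function
	names, the one-byte default region and the offsets are hypothetical.

```python
import errno
import fcntl
import os


def try_lock_region(fd, offset, length=1):
    """Try to take an exclusive, non-blocking lock on `length` bytes at `offset`.

    Returns True if the lock was obtained and False if another process
    already holds a conflicting lock.  Hypothetical helper for illustration.
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, offset, os.SEEK_SET)
    except OSError as exc:
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return False  # region is locked by someone else
        raise  # anything else (bad descriptor, no lock support) is a real error
    return True


def unlock_region(fd, offset, length=1):
    """Release a lock previously taken on the same byte range."""
    fcntl.lockf(fd, fcntl.LOCK_UN, length, offset, os.SEEK_SET)
```

	Each recipient address would need its own region, which is where
	the "hundreds of small regions" cost mentioned above comes from.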

> (3) I have 6 smtp processes looping, apparently. I am unable to truss or
> debug them. I sent them a SIGABRT and then gdb the core file and got:
> #0  0x148b0 in process (SS=0xeffff3c8, dp=0x3fb7c, smtpstatus=0, host=Cannot 
> access memory at address 0xeffff184.
> ) at smtp.c:701
> 701             for (rp = rphead = dp->recipients; rp != NULL; rp = rp->next) {

	???  Perhaps a faulty  transport-spec  file?

> (4) HUPing the scheduler process does not cause new processes to write a new
> scheduler.perflog. Killing the scheduler process does not kill the children,
> as documented. HUPing the router process only causes that one router process
> to write to the new log file, the others continue writing to the old (it would
> seem they never get HUPed).

	- Scheduler HUP -- Reopening of the log file (logfn) is handled by
			   the  lib/loginit.c  routine, which does not know
			   about the perflog.

	- Router HUP    -- You must HUP the process GROUP, not just the
			   single ("top parent") process that forked off
			   the children doing the parallelized routing.

	- The children of the scheduler will die eventually, once they are
	  done with all the job-specs they have received.  However, SMTP
	  connection timeouts (for example) are not always very short...

	- For 2.99.47 I have checked the SMTP timeouts, and how reliably
	  they actually fire.  I may have been able to make them
	  somewhat more reliable.
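
	Signalling the whole group rather than one process can be sketched
	as below.  This is a minimal Python example, assuming the router
	children are still in the same process group as the top parent;
	the function name and the  send  parameter are hypothetical.

```python
import os
import signal


def hup_process_group(pid, send=os.killpg):
    """Send SIGHUP to every process in the process group `pid` belongs to.

    `send` defaults to os.killpg; it is a parameter only so the function
    can be exercised without delivering a real signal.
    """
    pgid = os.getpgid(pid)      # look up the group of the given process
    send(pgid, signal.SIGHUP)   # signal the group, not just one member
    return pgid
```

	From a shell the usual equivalent is  kill -HUP  with a negative
	process-group id as the target.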

> (5) process.cf has a return 0 just prior to the log info: line that prevents
> the info: line from being generated. This, of course, breaks any stats
> gathering you might want.
> (6) I can't find any documentation about the output of mailq -Q. Can someone
> describe the following please?:
>    smtp/mx1.gzic.gd.cn/0 R=9  A=3  P=22688 HA=1482s FA=1484s OF=9  QA=14h18m56s

	R = Recipient addresses in the thread
	A = Attempts done on this channel/host selector
	P = Process PID of the Transport Agent
	HA = Hunger Age -- time since the TA last reported a need for "food"
	FA = Feed Age   -- time since a job was last sent to the TA
	OF = OverFeed   -- how many of the jobs sent to the TA it has
			   not yet processed
	QA = QueueAge   -- Age of the oldest message in the thread
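
	For stats gathering, a line like the one quoted above can be
	split into these fields mechanically.  A minimal Python sketch,
	assuming the layout of that single sample line: a selector followed
	by KEY=value tokens, with ages written using only h, m and s
	units.  The function names and the "selector" key are hypothetical.

```python
import re

# Fields that hold an age such as "1482s" or "14h18m56s".
AGE_FIELDS = {"HA", "FA", "QA"}
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def age_to_seconds(text):
    """Convert an age such as '14h18m56s' to a number of seconds."""
    if not re.fullmatch(r"(?:\d+[hms])+", text):
        raise ValueError("unrecognised age: %r" % text)
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)([hms])", text))


def parse_mailq_q_line(line):
    """Parse one 'mailq -Q' thread line into a dict."""
    selector, *tokens = line.split()
    fields = {"selector": selector}
    for token in tokens:
        key, _, value = token.partition("=")
        if key in AGE_FIELDS:
            fields[key] = age_to_seconds(value)
        elif value.isdigit():
            fields[key] = int(value)
        else:
            fields[key] = value  # unknown format: keep the raw text
    return fields
```

	Ages are returned in seconds so that threads can be compared or
	sorted directly.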

> (7) What is considered to be the last 'stable' release of zmailer? Is anyone
> moving 200,000 messages per day on Solaris 2.5.1???

	I run 2.99.46p3 at mailhost.utu.fi, though I don't know
	how many mails it processes daily -- fairly many.
	And no, I would not call 46p3 STABLE; for that matter,
	all of the 46 releases are somewhat buggy.

> Thanks!!
> /mrg

	/Matti Aarnio <mea@nic.funet.fi>