
Re: Maxring...



> Hi all, hi Matti ;-)
> I've run into a problem of contention between 2 mailbox deliveries,
> where one comes back with an error "error (File exists) creating ...",
> actually bouncing the message.
> 
> Tracking the thing, it seems to be caused by 2 messages ;-) at once
> to the same mailbox BUT one coming through an alias expansion. This
> way, the host part of the channel/host selector is different and
> bypasses the (intended?) filter of only one mailbox proc at a time.

	There should be a mailbox write interlock at the delivery
	phase in every case, but if it fails for some reason,
	then you are likely to see errors like the one you see.

	Now to find the interlock failure reason...

	... I think I see the reason:
	- mailbox1 tests to see if the mailbox exists, and it does not
	- mailbox2 tests to see if the mailbox exists, and it does not
	- mailbox1 tries to do exclusive file creation (and succeeds)
	- mailbox2 tries to do exclusive file creation (and fails)

	It is a matter of milliseconds, most likely.
	If the file exists, then the code takes a different path to
	acquire the interlock, and append to the file.

	I will put this in the BUGS file, from which I pull out things
	to tinker with...  (The fix should be simple:  when the file exists,
	retry the file creation interlock acquisition, and then probably
	just append to the file.)
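
	A minimal sketch of that idea in C, not the actual mailbox
	code: the function name open_mailbox_for_append() and the
	retry limit of 5 are assumptions made up for illustration.
	It tries the exclusive creation first, and when that fails
	with "File exists" it falls back to the append path instead
	of reporting an error.  The write interlock itself would
	still be acquired on the returned descriptor afterwards and
	is not shown here.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
 * Opens `path` for appending, creating it if it does not exist.
 * Returns an open file descriptor, or -1 with errno set. */
int open_mailbox_for_append(const char *path, mode_t mode)
{
    int tries;

    for (tries = 0; tries < 5; ++tries) {   /* retry limit is arbitrary */
        /* Exclusive creation: of two racing processes, exactly one wins. */
        int fd = open(path, O_WRONLY | O_APPEND | O_CREAT | O_EXCL, mode);
        if (fd >= 0)
            return fd;
        if (errno != EEXIST)
            return -1;          /* a real error, report it to the caller */

        /* Lost the race, or the file was already there: take the
         * "file exists" path and open it without O_CREAT. */
        fd = open(path, O_WRONLY | O_APPEND);
        if (fd >= 0)
            return fd;
        if (errno != ENOENT)
            return -1;
        /* The file vanished between the two calls; try creating again. */
    }
    errno = EAGAIN;
    return -1;
}
```

	With this, the loser of the race ends up appending to the
	file the winner has just created, rather than bouncing the
	message with "File exists".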

> Actually, the distributed scheduler.conf (2.99.27) has local/* with
> maxring=20, maxchannel=20 and a comment saying that only one host
> is done at a time, but I fail to understand if the comment is coherent
> with the data ... ;-)
> 
> In fact, it seems that maxring is maxkidthreads, i.e. max procs for
> config/channel/host triplet, so ...

	Well, ``rings'' are all of those  channel/host  threads that match
	the same configuration selector.  See picture at:
		doc/zmnewsched1.ps
	which I drew for myself as a map of how things interact, when I was
	designing the new scheduler...  Most of the pointers are not named
	in the picture, but they can be found from the scheduler.h
	data-structure definitions.

	As you noted, in your case the `host'-parts of the message
	delivery addresses were different, and thus came the catch
	of starting two threads for the same actual delivery address.

> -tron
> --
> Carlos G Mendioroz  <tron@secyt.gov.ar>

	/Matti Aarnio