Re: scheduler/hold loop



> 
> I have a loop in the "hold" transport.
> 
> It is slowly eating up memory. This is zmailer 2.99.45.

	Nope, at least the "hold" program itself does not grow.
	Now if you could get strace/truss (or whatever you use)
	to show longer strings, we might learn something :)

> The hold process is:
> 
> write(1, " # h u n g r y\n", 8)                 = 8
> read(0, " # i d l e\n W / J / 5 4".., 8192)     = 100
	The TA first gets "#idle\n", then (in the same pipe read())
	it gets a pointer into a TWO-LEVEL HASHED subdirectory,
	"W/J/54..."
> write(1, " # h u n g r y\n", 8)                 = 8
	This is the response to "#idle": "Thanks, I got it, but I am
	still free for a job"...  It then starts by opening the files
	and reading (mmap()ing) them in.
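
	Because one read() on the pipe can carry several messages,
	the receiver has to split its input on newlines and keep any
	incomplete tail for the next read.  A minimal sketch of that
	idea follows; it is not the ZMailer code.  The names
	split_messages and classify are hypothetical, and treating
	every line that starts with '#' as a control message is an
	assumption based only on the two seen in this trace.

```python
def split_messages(pending: bytes, chunk: bytes):
    """Append chunk to pending and return (complete_lines, leftover)."""
    data = pending + chunk
    # Everything before the last newline is complete; the tail may be partial.
    *lines, leftover = data.split(b"\n")
    return [ln.decode("ascii") for ln in lines], leftover


def classify(line: str) -> str:
    # Assumption: '#' marks a control message such as "#idle";
    # anything else is treated as an opaque job specifier.
    return "control" if line.startswith("#") else "jobspec"


# One read() delivering "#idle" and a job specifier together, as in the trace.
# Only the file path is known from the trace; the rest of the real
# job-specifier line is not shown there and is left out here.
lines, rest = split_messages(b"", b"#idle\nW/J/541381-24826\n")
kinds = [(classify(ln), ln) for ln in lines]
```

	Here kinds holds one control entry followed by one jobspec
	entry, and rest is empty because the chunk ended on a newline.
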
> open("W/J/541381-24826", O_RDWR)                = 3
> fstat(3, 0xEFFFF230)                            = 0
> mmap(0x00000000, 3322, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xEF780000
> read(3, " i   5 4 1 3 8 1 - 2 4 8".., 3322)     = 3322
> open("../queue/W/J/541381-24826", O_RDONLY)     = 4
> fstat(4, 0xEFFFF230)                            = 0
> mmap(0x00000000, 3744, PROT_READ, MAP_SHARED, 4, 0) = 0xEF690000
> write(1, " 5 4 1 3 8 1 / 8 8 4\t t".., 138)     = 138
> write(1, " 5 4 1 3 8 1 / 7 6 0\t t".., 138)     = 138

	Hmm... These are the status report lines, though they are
	much longer than what is visible here...

> munmap(0xEF690000, 3744)                        = 0
> munmap(0xEF780000, 3322)                        = 0
> close(3)                                        = 0
> close(4)                                        = 0

	Tail of the processing: once the responses have been
	sent, it is time to release the locks and return the
	resources.

> The scheduler process is:
> 
> read(33, " # h u n g r y\n 5 4 1 3".., 2048)    = 292
> write(33, " # i d l e\n", 6)                    = 6
> write(2, " s c h e d u l e r :   m".., 73)      = 73

	It gets '#hungry' and a couple of status reports, then
	it writes something to stderr, probably relating
	to this case.

> poll(0xEFFFD478, 1, 0)                          = 0
> poll(0xEFFFD598, 1, 0)                          = 0
> write(33, " W / J / 5 4 1 3 8 1 - 2".., 44)     = 44
> poll(0xEFFFD598, 1, 0)                          = 0
> poll(0xEFFFD588, 201, 0)                        = 1
> 
> /mrg

	I am rewriting parts of the scheduler-TA interaction protocol,
	and while doing so it occurred to me that perhaps the "#idle"
	message should be counted the same way as any other job-spec message.
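
	To illustrate what "counted the same way" could mean, here is
	a toy model, not the scheduler's real bookkeeping.  It assumes
	the TA answers every message from the scheduler, "#idle"
	included, with one "#hungry", and that the scheduler keeps a
	per-TA count of unanswered messages.  TaChannel, replay and
	the counters are hypothetical names.

```python
class TaChannel:
    """Toy scheduler-side view of one transport agent (hypothetical model)."""

    def __init__(self, count_idle: bool):
        self.count_idle = count_idle
        self.outstanding = 0   # messages sent that still await a "#hungry"
        self.unexpected = 0    # "#hungry" replies with nothing outstanding

    def send(self, msg: str) -> None:
        # Job specifiers are always counted; "#idle" only if count_idle is set.
        if msg != "#idle" or self.count_idle:
            self.outstanding += 1

    def on_hungry(self) -> None:
        if self.outstanding > 0:
            self.outstanding -= 1
        else:
            self.unexpected += 1


def replay(count_idle: bool) -> TaChannel:
    ch = TaChannel(count_idle)
    # Scheduler-to-TA messages in the order seen in the trace; the TA is
    # assumed to answer each of them with exactly one "#hungry".
    for msg in ["#idle", "W/J/541381-24826"]:
        ch.send(msg)
        ch.on_hungry()
    return ch
```

	In this model replay(False) ends with one unexpected
	"#hungry", while replay(True) stays balanced.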

	Also, it looks like the 'hold' channel needs a deeper
	understanding of the 'host' concept.  It didn't pay attention
	to the 'host' selector, and perhaps was rejected for that reason:
	it received two job-specifiers for the same file, and the first
	one processed them both.  (I am not convinced of this explanation;
	I must do a test...)
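
	The suspected failure can be shown with a toy model.  This is
	only a sketch of the hypothesis above, under the assumption
	that the scheduler issues one job-specifier per host for the
	same file; the recipient records, the host names and the
	functions process and run are all made up for illustration.

```python
def process(recipients, host=None):
    """Mark pending recipients done and return how many were handled.

    host=None models a TA that ignores the host selector.
    """
    handled = 0
    for r in recipients:
        if r["done"]:
            continue
        if host is not None and r["host"] != host:
            continue
        r["done"] = True
        handled += 1
    return handled


def run(honor_selector: bool):
    # Two recipients on different (made-up) hosts in one control file.
    recipients = [
        {"host": "a.example", "done": False},
        {"host": "b.example", "done": False},
    ]
    # One job-specifier per host, both pointing at the same file.
    return [
        process(recipients, h if honor_selector else None)
        for h in ("a.example", "b.example")
    ]
```

	With the selector ignored, the first job-specifier handles
	both recipients and the second finds nothing left to do; with
	the selector honored, each handles its own recipient.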

/Matti Aarnio