Re: scheduler/hold loop
>
> I have a loop in the "hold" transport.
>
> It is slowly eating up memory. This is zmailer 2.99.45.
Nope, at least the "hold" program does not grow.
Now if you could get strace/truss (whatever)
to show longer strings, we might learn something :)
> The hold process is:
>
> write(1, " # h u n g r y\n", 8) = 8
> read(0, " # i d l e\n W / J / 5 4".., 8192) = 100
The TA first gets "#idle\n", then (in the same pipe read())
it gets a pointer into the two-level hashed subdirectory,
"W/J/54..."
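Since one read() on the pipe can carry several messages, the reader
has to split the buffer itself. A minimal sketch of that idea follows.
The function name and the sample job line are made up for illustration;
this is not the actual zmailer code.

```python
def split_messages(buf: bytes):
    """Split one pipe read into control lines, job lines, and leftover.

    Control lines start with '#'; everything else is treated as a
    job line. An incomplete trailing line is returned as leftover so
    the caller can prepend it to the next read.
    """
    *complete, leftover = buf.split(b"\n")
    control, jobs = [], []
    for raw in complete:
        if not raw:
            continue  # skip empty lines
        text = raw.decode("ascii", errors="replace")
        if text.startswith("#"):
            control.append(text)
        else:
            jobs.append(text)
    return control, jobs, leftover
```

With a buffer like b"#idle\nW/J/000000-00000\n" (an invented file name)
this gives one control line and one job line, with nothing left over.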
> write(1, " # h u n g r y\n", 8) = 8
This is the response to "#idle" -- "Thanks, I got it, but I am
still free for a job"... It then starts by opening the files
and reading (mmap()ing) them in.
> open("W/J/541381-24826", O_RDWR) = 3
> fstat(3, 0xEFFFF230) = 0
> mmap(0x00000000, 3322, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xEF780000
> read(3, " i 5 4 1 3 8 1 - 2 4 8".., 3322) = 3322
> open("../queue/W/J/541381-24826", O_RDONLY) = 4
> fstat(4, 0xEFFFF230) = 0
> mmap(0x00000000, 3744, PROT_READ, MAP_SHARED, 4, 0) = 0xEF690000
> write(1, " 5 4 1 3 8 1 / 8 8 4\t t".., 138) = 138
> write(1, " 5 4 1 3 8 1 / 7 6 0\t t".., 138) = 138
Hmm.. These are the status report lines, though they are
way longer than visible here...
> munmap(0xEF690000, 3744) = 0
> munmap(0xEF780000, 3322) = 0
> close(3) = 0
> close(4) = 0
Tail of the processing -- once the responses have been
sent, it is time to release the locks and return the
resources.
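The open / fstat / mmap / munmap / close sequence seen in the trace
can be sketched roughly as below. This only mirrors the read-only
mapping of the message file; the helper name is hypothetical, and the
real transport agent is written in C and also takes locks, which this
sketch leaves out.

```python
import mmap
import os


def read_spool_file(path):
    """Map a file read-only, copy its contents, then release everything."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if size == 0:
            return b""  # an empty file cannot be mapped
        view = mmap.mmap(fd, size, access=mmap.ACCESS_READ)
        try:
            return view[:size]
        finally:
            view.close()  # corresponds to munmap()
    finally:
        os.close(fd)  # corresponds to close()
```

The try/finally nesting is there so that the mapping and the descriptor
are given back even if something fails in between.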
> The scheduler process is:
>
> read(33, " # h u n g r y\n 5 4 1 3".., 2048) = 292
> write(33, " # i d l e\n", 6) = 6
> write(2, " s c h e d u l e r : m".., 73) = 73
It gets '#hungry' and a couple of status reports, then
it writes something to stderr, probably relating
to this case.
> poll(0xEFFFD478, 1, 0) = 0
> poll(0xEFFFD598, 1, 0) = 0
> write(33, " W / J / 5 4 1 3 8 1 - 2".., 44) = 44
> poll(0xEFFFD598, 1, 0) = 0
> poll(0xEFFFD588, 201, 0) = 1
>
> /mrg
I am rewriting parts of the scheduler-TA interaction protocols,
and while doing so it occurred to me that perhaps the "#idle" message
should be counted the same way as any other job-spec message.
Also, it looks like the 'hold' channel needs a deeper
understanding of the 'host' concept. It didn't pay attention
to the 'host' selector, and was perhaps thus rejected when
it received two job-specifiers for the same file and the first one
processed them both. (I am not convinced of this explanation,
I must do a test...)
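To illustrate the suspected effect (and only that; this is a toy model
of the untested hypothesis, with invented names and data, not the
hold channel's real logic):

```python
def handle_job(pending, host, honor_host):
    """Process one job-specifier and return the recipients handled.

    pending    -- list of (recipient, host) pairs still waiting in the file;
                  handled entries are removed from it
    host       -- the 'host' selector given with this job-specifier
    honor_host -- if False, the selector is ignored (the suspected bug)
    """
    if honor_host:
        picked = [p for p in pending if p[1] == host]
    else:
        picked = list(pending)
    for p in picked:
        pending.remove(p)
    return picked
```

With two recipients on different hosts in one file and honor_host set
to False, the first job-specifier handles both, and the second finds
nothing left to do. With honor_host set to True each specifier gets
only its own recipient.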
/Matti Aarnio