Re: scheduler queue structure
On Mon, Feb 14, 2000 at 08:27:38PM +0300, Eugene Crosser wrote:
> how comes that under load there are a lot of files in the
> $POSTOFFICE/transport and $POSTOFFICE/queue directories,
> while there *also* are files in the [A-Z] subdirectories?
> Is this because the router places the files in the directories,
> and the scheduler moves them over to subdirectories later?
Yes. Files that are in the subdirs (presuming your scheduler
runs in -H or -HH mode) are already known to the scheduler; files
directly in $POSTOFFICE/transport/ are not yet assimilated.
> I experienced a burst of load today, the transport queue
> grew up to 29000 entries and scheduler slowed down to a halt.
> I suppose this was because directory operations became very
> slow when it became overcrowded.
To know for sure, one should run syscall tracing on the scheduler
to see how many microseconds the various calls take. E.g. if readdir()
or rename() take a lot of time, then the trouble is there.
I haven't done proper profiling of the programs for over a year;
maybe I should profile at least the scheduler...
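
Short of a full syscall trace, a rough first check is to time the calls on the crowded directory directly. The following is only a sketch under stated assumptions: the function `measure_scan_cost()` and the `struct scan_cost` record are hypothetical names made up for illustration, not anything from the scheduler sources, and it measures wall-clock time per call rather than what a real tracer would report.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical result record; not a ZMailer structure. */
struct scan_cost {
    long   entries;      /* entries seen, excluding "." and ".." */
    double readdir_sec;  /* wall-clock seconds spent inside readdir() */
    double stat_sec;     /* wall-clock seconds spent inside stat() */
};

/* Monotonic clock reading in seconds. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Scan dirpath once, timing readdir() and stat() separately.
   Returns 0 on success, -1 if the directory cannot be opened. */
int measure_scan_cost(const char *dirpath, struct scan_cost *out)
{
    char path[4096];
    struct stat st;
    struct dirent *de;
    DIR *dp;
    double t0, t1;

    assert(dirpath != NULL && out != NULL);
    out->entries = 0;
    out->readdir_sec = 0.0;
    out->stat_sec = 0.0;

    dp = opendir(dirpath);
    if (dp == NULL)
        return -1;

    for (;;) {
        t0 = now_sec();
        de = readdir(dp);
        t1 = now_sec();
        out->readdir_sec += t1 - t0;
        if (de == NULL)
            break;
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        snprintf(path, sizeof path, "%s/%s", dirpath, de->d_name);
        t0 = now_sec();
        (void)stat(path, &st);   /* result ignored; only the cost matters */
        t1 = now_sec();
        out->stat_sec += t1 - t0;
        out->entries++;
    }
    closedir(dp);
    return 0;
}
```

Calling `measure_scan_cost()` on the transport directory and comparing `readdir_sec` with `stat_sec` gives a first impression of which call dominates. Results on a warm cache will look much better than on a loaded system, so it is best run while the directory is actually crowded.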
Still... I do know (feel) that there are scheduler responsiveness
issues in several forms:
 - mailq socket interactivity
 - input directory message assimilation
   (occasional *very* long blockings may happen here if
   dirqueuescan() must work through a large number of new files..)
   (readdir() for e.g. 20 000 files takes some time, but stat()
   for them will take *a lot* more..)
 - TA interaction (The OverFeed trick is used to hide this)
Rewriting the thing to be thread-safe, and have threads for each
TA process, plus input and mailq services would perhaps improve
responsiveness -- but with a penalty of having it radically "high
Reading the code a bit, I wonder how you got newents_limit raised
from its default value of 400? Do you use the -E option to the scheduler?
It *should* keep the responsiveness under (some) control, unless
the number of files in the directory kills things completely at
Thinking of it some more, perhaps (as there already is a time server,
and thus asking time is cheap), I could push at least the mailq
service polling into dirqueuescan() -- once per second, or so.
Things might not be any faster in such a directory bloat case, but at
least the mailq would respond quickly.
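
The idea of servicing mailq from inside the scan loop could look roughly like the sketch below. Everything in it is an assumption for illustration: `scan_with_polling()`, the callback types and the `demo_*` helpers are hypothetical names, and the real dirqueuescan() is structured differently. The clock is passed in as a function to stand for the cheap time source, which also lets the demo use a fake clock instead of waiting for real seconds to pass.

```c
#include <time.h>

/* All names here are hypothetical; none come from the ZMailer sources. */
typedef time_t (*clock_fn)(void);                 /* cheap time source */
typedef void (*poll_fn)(void *ctx);               /* mailq servicing */
typedef void (*entry_fn)(long index, void *ctx);  /* per-file work */

/* Handle nentries files, calling poll_service whenever the clock has
   moved to a new second. Returns the number of polls performed. */
long scan_with_polling(long nentries, entry_fn handle_entry,
                       poll_fn poll_service, clock_fn get_time, void *ctx)
{
    long polls = 0;
    time_t last_poll = get_time();

    for (long i = 0; i < nentries; i++) {
        handle_entry(i, ctx);

        time_t now = get_time();
        if (now != last_poll) {      /* a second boundary was crossed */
            poll_service(ctx);
            last_poll = now;
            polls++;
        }
    }
    return polls;
}

/* --- small deterministic demo with a fake clock --- */

struct demo_state { long handled; long polled; };

static long fake_ticks;

/* Fake clock: advances by one "second" every 100 calls. */
static time_t fake_clock(void)
{
    return (time_t)(fake_ticks++ / 100);
}

static void demo_entry(long index, void *ctx)
{
    (void)index;
    ((struct demo_state *)ctx)->handled++;
}

static void demo_poll(void *ctx)
{
    ((struct demo_state *)ctx)->polled++;
}

/* Scan 1000 fake entries and report how often the poll callback ran. */
long demo_scan(void)
{
    struct demo_state st = { 0, 0 };

    fake_ticks = 0;
    scan_with_polling(1000, demo_entry, demo_poll, fake_clock, &st);
    return st.polled;
}
```

With the fake clock advancing once per 100 calls, `demo_scan()` services the poll callback 10 times over 1000 entries. The scan itself takes just as long as before; the only gain is that the poll callback gets a turn about once per second while it runs.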
> Any comments?
/Matti Aarnio <firstname.lastname@example.org>