
Re: scheduler queue structure



On Mon, Feb 14, 2000 at 08:27:38PM +0300, Eugene Crosser wrote:
> Matti,
> 
> how come, under load, there are a lot of files in the
> $POSTOFFICE/transport and $POSTOFFICE/queue directories,
> while there *also* are files in the [A-Z] subdirectories?
>
> Is this because the router places the files in the directories,
> and the scheduler moves them over to subdirectories later?

  Yes.  Files that are in subdirs (presuming your scheduler
  runs in -H or -HH mode) are known to the scheduler; files at
  $POSTOFFICE/transport/ have not yet been assimilated.
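  As an illustration only, the two-stage flow (the router drops files
  at the top of $POSTOFFICE/transport, the scheduler later moves them
  into [A-Z] subdirectories) can be sketched as below.  This is a
  minimal Python sketch, not ZMailer's code: the CRC32-based choice of
  subdirectory and the names subdir_for() and assimilate() are
  hypothetical, and "assimilated" here simply means "moved"; the real
  scheduler's bookkeeping is not modeled.

```python
import os
import zlib

# Single-level [A-Z] subdirectories (assumed here to match -H mode).
SUBDIRS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]


def subdir_for(name):
    """Pick a subdirectory for a queue file (hypothetical CRC32 hash)."""
    return SUBDIRS[zlib.crc32(name.encode()) % len(SUBDIRS)]


def assimilate(transport_dir):
    """Move plain files at the top of transport_dir into [A-Z] subdirs.

    Returns a list of (filename, subdir) pairs for the files moved.
    """
    with os.scandir(transport_dir) as it:
        # Only plain files are new work; the A..Z subdirectories are skipped.
        new_files = [e for e in it if e.is_file(follow_symlinks=False)]
    moved = []
    for entry in new_files:
        sub = subdir_for(entry.name)
        target_dir = os.path.join(transport_dir, sub)
        os.makedirs(target_dir, exist_ok=True)
        # rename() within one filesystem moves the file without copying data.
        os.rename(entry.path, os.path.join(target_dir, entry.name))
        moved.append((entry.name, sub))
    return moved
```

  A second call finds only the subdirectories at the top level and
  moves nothing, which mirrors "already known to the scheduler".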

> I experienced a burst of load today; the transport queue
> grew to 29000 entries and the scheduler slowed to a halt.
> I suppose this was because directory operations became very
> slow once the directory became overcrowded.

  To know for sure, one should trace the scheduler's syscalls
  to see how many microseconds the various calls take.  E.g. if readdir()
  or rename() takes lots of time, then that is where the trouble is.
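  One way to get those numbers without touching the code is strace's
  -T option, which shows the time spent in each system call.  A rough
  in-process analogue is sketched below in Python: it times one
  directory listing (standing in for the readdir() loop) against one
  stat() per entry.  The function name time_scan() is hypothetical, and
  the figures depend heavily on the filesystem and on whether the
  directory is already cached.

```python
import os
import time


def time_scan(directory):
    """Time one directory listing against one stat() per entry.

    Returns (entry_count, listing_microseconds, stat_microseconds).
    """
    t0 = time.perf_counter()
    names = os.listdir(directory)                # the readdir() loop, in effect
    t1 = time.perf_counter()
    for name in names:
        os.stat(os.path.join(directory, name))   # one stat() syscall per entry
    t2 = time.perf_counter()
    return len(names), (t1 - t0) * 1e6, (t2 - t1) * 1e6
```

  Running it over a directory of tens of thousands of files, once cold
  and once warm, shows whether the stat() pass dominates.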

  I haven't done proper profiling of the programs for over a year;
  maybe I should at least profile the scheduler...

  Still... I do know (feel) that there are scheduler responsiveness
  issues in several forms:
	- mailq socket interactivity
	- input directory message assimilation
	  (occasional *very* long blocking may happen here if
	   dirqueuescan() must handle a large number of new files..)
	  (readdir() for e.g. 20 000 files takes some time, but stat()
	   for them will take *a lot* more..)
	- TA interaction (The OverFeed trick is used to hide this)

  Rewriting the thing to be thread-safe, with a thread for each
  TA process plus threads for the input and mailq services, would
  perhaps improve responsiveness, but at the cost of turning it into
  radically "high technology" stuff.
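  The thread-per-service layout described above might look roughly
  like this hypothetical Python sketch: each service (input scanning,
  mailq, one per TA process) gets its own thread and inbox queue, so a
  slow service stalls only itself.  The service names and the
  worker()/run_services() functions are illustrative assumptions;
  Python threads are not C threads, and the sketch shows only the
  structure, not the locking a thread-safe scheduler would need around
  its shared data.

```python
import queue
import threading


def worker(name, inbox, results):
    """Serve one service's inbox until the None sentinel arrives."""
    while True:
        job = inbox.get()
        if job is None:
            break
        results.put((name, job))   # stand-in for real per-service work


def run_services(jobs_by_service):
    """Run one thread per service; return all (service, job) pairs handled."""
    results = queue.Queue()          # queue.Queue does its own locking
    inboxes = {name: queue.Queue() for name in jobs_by_service}
    threads = [
        threading.Thread(target=worker, args=(name, inboxes[name], results))
        for name in jobs_by_service
    ]
    for t in threads:
        t.start()
    for name, jobs in jobs_by_service.items():
        for job in jobs:
            inboxes[name].put(job)
        inboxes[name].put(None)      # tell that service's thread to stop
    for t in threads:
        t.join()
    handled = []
    while not results.empty():
        handled.append(results.get())
    return handled
```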

  Reading the code a bit, I wonder how you got newents_limit raised
  from its default value of 400.  Do you use the -E option to the
  scheduler?  It *should* keep the responsiveness under (some) control,
  unless the sheer number of files in the directory kills things
  completely at those levels...
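  Assuming newents_limit caps how many new directory entries are taken
  in per scan pass (the message gives its default as 400 and suggests
  -E changes it), the effect can be sketched as follows.  This is a
  hypothetical Python illustration, not the scheduler's code: assuming
  each batch is moved out of the directory before the next pass, files
  beyond the limit simply wait, so other work gets a turn in between.
  It does not help if merely reading the overcrowded directory is
  itself the bottleneck.

```python
import os

NEWENTS_LIMIT = 400  # default quoted in the message; assumed raised by -E


def scan_batch(directory, limit=NEWENTS_LIMIT):
    """Return the names of at most `limit` new plain files from one pass."""
    batch = []
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                batch.append(entry.name)
                if len(batch) >= limit:
                    break   # leave the rest for the next pass
    return batch
```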

  Thinking of it some more, perhaps (as there already is a time server,
  and thus asking for the time is cheap) I could push at least the mailq
  service polling into dirqueuescan(), once per second or so.
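  The once-per-second idea can be sketched as a scan loop that checks a
  cheap clock after each entry and services mailq only when the
  interval has elapsed.  In this hypothetical Python sketch, process()
  and poll_mailq() stand in for the scheduler's per-file work and mailq
  socket handling, and time.monotonic() stands in for the cheap time
  server; the clock is a parameter so the loop can be tested with a
  fake clock.

```python
import time


def dirqueuescan_sketch(entries, process, poll_mailq, interval=1.0,
                        clock=time.monotonic):
    """Process entries, servicing mailq at most once per `interval` seconds."""
    next_poll = clock() + interval
    for entry in entries:
        process(entry)
        now = clock()                 # assumed cheap, like the time server
        if now >= next_poll:
            poll_mailq()              # answer pending mailq clients
            next_poll = now + interval
```

  Polling happens only between entries, so one very slow process() call
  still delays mailq; the check bounds the delay per entry, not overall.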

  Things might not be any faster in such a directory-bloat case, but at
  least mailq will respond quickly.

> Any comments?
> Eugene

-- 
/Matti Aarnio	<mea@nic.funet.fi>