
Re: Mailbox Hashing



On Tue, Apr 04, 2000 at 03:12:03PM +0000, Dan wrote:
> . How would Zmailer's performance be affected if it hashed mailboxes, or
>   stored them in Maildir format?
> . How do large mail spool files affect Zmailer?
> . Or are those problems alleviated with the help of the router?
> . What would be the maximum number of users supported by the FS without
>   mail spool hashing?
> These are very interesting questions. Any ideas? Thanks.

  These depend on what kind of filesystem you have.
  Most classical UNIX filesystems do linear searches inside the directory,
  and they do those for  readdir()  as well as for *all* file lookups,
  scanning until a match is found or the end of the directory is reached.

  Under such conditions, a large (and likely fragmented!) directory
  data area means extremely poor performance for directory operations.
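  A toy Python model makes the linear cost visible.  It is not filesystem
  code: `LinearDirectory` and `average_cost` are hypothetical names, and the
  model simply assumes a directory is an unordered list of entries that is
  scanned from the start on every lookup, as described above.

```python
class LinearDirectory:
    """Toy model of a classical UNIX directory: entries are kept in
    creation order and every lookup scans them from the start."""

    def __init__(self):
        self.entries = []      # (name, inode) pairs in creation order
        self.comparisons = 0   # total name comparisons made by lookups

    def create(self, name, inode):
        # Simplification: append without the duplicate check (itself a
        # full scan) that a real filesystem would perform first.
        self.entries.append((name, inode))

    def lookup(self, name):
        """Return the inode for name, or None after scanning every entry."""
        for entry_name, inode in self.entries:
            self.comparisons += 1
            if entry_name == name:
                return inode
        return None


def average_cost(n):
    """Average comparisons per successful lookup in a directory of n files."""
    d = LinearDirectory()
    for i in range(n):
        d.create(f"user{i}", i)
    for i in range(n):
        d.lookup(f"user{i}")
    return d.comparisons / n


if __name__ == "__main__":
    for n in (100, 1000):
        print(n, "files:", average_cost(n), "comparisons per lookup")
```

  In this model a successful lookup costs (n + 1) / 2 comparisons on
  average and a failed one costs n, so both grow in direct proportion to
  the directory size.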

  In a Linux environment the so-called  dircache  does help a bit, but not
  when you have 100 000 files in a single directory.

  On Solaris I recall the magic limit being a directory size of around
  60 kB, after which the search becomes extremely sluggish (roughly 50 ms
  instead of 2 ms, as I heard back when we used Squid for web proxies.)

  Most likely the directories grew slowly, so their data blocks were
  fragmented all over the disk, making the search slower still.

  There are alternatives for cases where a single directory contains a
  *huge* number of files.  One such is ReiserFS for Linux.  Those
  filesystems use B-trees to order directory entries.
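  Why ordering helps can be sketched in Python with a sorted list searched
  by binary search.  This is a simplification: ReiserFS keeps a balanced
  tree on disk, not a sorted array, but both give roughly logarithmic
  rather than linear lookup cost.  `SortedDirectory` is a hypothetical
  illustration, not code from any real filesystem.

```python
import bisect


class SortedDirectory:
    """Toy model of an ordered directory index.

    Names are kept sorted, so a lookup takes about log2(n) probes instead
    of the n/2 average of a linear scan. The sorted list is only a stand-in
    for the on-disk balanced tree a filesystem such as ReiserFS uses."""

    def __init__(self):
        self.names = []      # entry names, always kept in sorted order
        self.inodes = {}     # name -> inode number

    def create(self, name, inode):
        if name not in self.inodes:
            bisect.insort(self.names, name)   # insert, keeping order
        self.inodes[name] = inode

    def lookup(self, name):
        """Binary search; returns (inode or None, probes used)."""
        lo, hi, probes = 0, len(self.names), 0
        while lo < hi:
            probes += 1
            mid = (lo + hi) // 2
            if self.names[mid] < name:
                lo = mid + 1
            else:
                hi = mid
        if lo < len(self.names) and self.names[lo] == name:
            return self.inodes[name], probes
        return None, probes


if __name__ == "__main__":
    d = SortedDirectory()
    for i in range(100_000):
        d.create(f"user{i:06d}", i)
    print(d.lookup("user054321"))   # found within at most 17 probes
    print(d.lookup("nobody"))       # a miss also needs at most 17 probes
```

  Doubling the directory size adds only about one probe per lookup, which
  is why tree-ordered directories stay usable where linear ones bog down.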

  With that background explained, answers in order:

  - Hashing mailboxes means scattering their storage paths over one or two
    levels of hashed directories, à la:
	/path/to/[A-Z]/[A-Z]/unique_userid
    (How those hashes are derived is side-stepped for now.)

    The intention is to scatter users' files across the directories so that
    no single subdirectory becomes very large, which speeds up access.

    The Maildir format creates large directories sooner or later.
    (And ZMailer doesn't support writing it: nobody has contributed
     the code, nor have I had any interest in writing it myself.)

  - Large single mailbox spool files again depend on basic filesystem
    performance.  ZMailer just appends to them.

  - Large single MESSAGE spool files, now there is an interesting
    animal...  Please don't let any single message exceed 2 GB minus a bit.
    The message body is processed by a streaming state machine, so the
    processing time grows linearly with message size.

  - Helped by the router?  There are plans to have the router help with
    some aspects of MESSAGE processing: the router should do the MIME
    structure scanning, so that post-processing (conversions) can then be
    done in the transport agent programs without pre-scanning the body.

  - The maximum number of user files (or subdirectories) on any given
    filesystem is a parameter of that filesystem.  Usually the possible
    number of files in a directory is far larger than the possible number
    of subdirectories.  On Solaris UFS I think the maximum number of
    subdirectories is circa 32000 (each subdirectory's ".." entry is a
    hard link to its parent, and link counts are capped), while the
    maximum number of files is *far* larger; it's just that the system
    becomes hopelessly slow...
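  The hashed layout from the first point can be sketched in Python.  The
  hash derivation is deliberately left open in this post, so everything
  here is an assumption: `hashed_mailbox_path` and `bucket_sizes` are
  hypothetical helpers, and MD5 is just one arbitrary way to spread user
  ids evenly.  This is not how ZMailer derives its paths.

```python
import hashlib
import os
import string
from collections import Counter

LETTERS = string.ascii_uppercase  # the [A-Z] alphabet of the path pattern


def hashed_mailbox_path(spool_root, userid, levels=2):
    """Return spool_root/X/.../userid with `levels` directory letters A-Z.

    Hypothetical scheme: each level takes one byte of the MD5 digest of
    userid, reduced modulo 26. MD5 is used only for spreading, not for
    security; the slight modulo bias (256 is not a multiple of 26) is
    harmless for this purpose.
    """
    digest = hashlib.md5(userid.encode("utf-8")).digest()
    if not 1 <= levels <= len(digest):
        raise ValueError("levels must be between 1 and 16")
    parts = [LETTERS[digest[i] % len(LETTERS)] for i in range(levels)]
    return os.path.join(spool_root, *parts, userid)


def bucket_sizes(spool_root, userids, levels=2):
    """Count how many mailboxes land in each hashed leaf directory."""
    return Counter(
        os.path.dirname(hashed_mailbox_path(spool_root, u, levels))
        for u in userids
    )


if __name__ == "__main__":
    print(hashed_mailbox_path("/path/to", "dan"))
    sizes = bucket_sizes("/path/to", [f"user{i}" for i in range(10_000)])
    print(len(sizes), "directories; largest holds", max(sizes.values()))
```

  With 10 000 users and two levels of 26 letters, the mailboxes spread over
  up to 676 leaf directories, roughly 15 per directory on average, instead
  of 10 000 entries in a single one.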


-- 
/Matti Aarnio	<mea@nic.funet.fi>