Re: mailbox files with "hole" on NFS, linux 2.6
Matti Aarnio wrote:
>>Occasionally a mailbox is noticed with a big size, with a "hole" in the
>>middle. It starts as a normal mailbox, then there are lots of zeroes, and
>>before the end there is more normal mailbox data. Apparently the zeroes are
>>not real:
> Could you determine the exact byte offsets of where the hole begins
> and where it ends? The hole size and its edge offsets are
> indicative of possible fault paths.
0000000 F r o m M A I L E R - D A E M
0000020 O N T u e J u n 2 1 2 2
0000040 : 2 5 : 5 7 2 0 0 5 \n D a t e
0000060 : 2 1 J u n 2 0 0 5 2 2
0000100 : 2 5 : 5 7 + 0 4 0 0 \n F r o
0000120 m : M a i l S y s t e m I
0000140 n t e r n a l D a t a < M A
0000160 I L E R - D A E M O N @ g n o m
...
0001000 t h t h e d a t a r e s e
0001020 t t o i n i t i a l v a l
0001040 u e s . \n \n \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0001060 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
64450300 \0
64450301
and that's all (there is no trailing part).
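
For larger files, reading the edges off an od dump by hand gets tedious. Below is a minimal sketch of a scanner that reports the byte offsets of long runs of NUL bytes. The function name find_zero_runs and the min_len threshold are made up for illustration and are not part of Zmailer or any IMAP server.

```python
def find_zero_runs(path, min_len=4096, chunk_size=1 << 20):
    """Return (start, end) byte offsets of NUL runs of at least min_len bytes.

    Offsets are decimal; end is exclusive. The file is read in chunks so
    that a large mailbox does not have to fit in memory.
    """
    runs = []
    run_start = None   # start offset of the NUL run currently being tracked
    offset = 0         # file offset of the first byte of the current chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            pos = 0
            while pos < len(chunk):
                if chunk[pos] == 0:
                    if run_start is None:
                        run_start = offset + pos
                    # Jump to the first non-NUL byte (or to the chunk end).
                    rest = chunk[pos:].lstrip(b"\0")
                    pos = len(chunk) - len(rest)
                else:
                    if run_start is not None:
                        if offset + pos - run_start >= min_len:
                            runs.append((run_start, offset + pos))
                        run_start = None
                    # Jump to the next NUL byte (or to the chunk end).
                    nxt = chunk.find(b"\0", pos)
                    pos = len(chunk) if nxt == -1 else nxt
            offset += len(chunk)
    # A run that reaches end-of-file is closed here.
    if run_start is not None and offset - run_start >= min_len:
        runs.append((run_start, offset))
    return runs
```

Assuming the dump above was produced by od -c with its default octal offsets, the zeroes start at octal 1046 (decimal 550) and run to the end of the file at octal 64450301 (decimal 13783233), so the scanner should report a single run covering that range for this mailbox.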
Another problem that I can recall (and which is apparently unrelated to
IMAP) is that sometimes (again, rarely!) Zmailer appends messages to a
mailbox and reports normal delivery, but the data is not in the file.
> Some reports even said that mounting over TCP allowed
> perfect functioning with no errors appearing, while the
> default of UDP failed every now and then...
Hmm, maybe I could try TCP...
> The local-delivery process isn't the only one modifying
> your mailbox files. Your POP/IMAP server modifies them too.
> Running without fcntl-locking ... you are using dot-locking?
Yes. I had problems with NFS locking a very long time ago (when the
server was on a Solaris machine).
It's quite possible that it's the IMAP server that triggers the problem. I
currently run UW IMAP4rev1 2004.350 (both for POP and IMAP).
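
For readers unfamiliar with dot-locking: the lock is simply a file named after the mailbox with ".lock" appended, and every writer must create it before touching the mailbox. A common NFS-safe way to create it is to make a uniquely named temporary file and hard-link it to the lock name, then check the link count. The sketch below illustrates that general scheme only. It is not taken from Zmailer or UW IMAP, and the names acquire_dotlock and release_dotlock are hypothetical.

```python
import os
import socket
import time


def acquire_dotlock(mailbox, retries=10, delay=1.0):
    """Try to take <mailbox>.lock; return the lock path, or None on failure."""
    lock_path = mailbox + ".lock"
    # Unique per host/process/time, so creating it cannot race with anyone.
    tmp_path = "%s.%s.%d.%d" % (
        lock_path, socket.gethostname(), os.getpid(), int(time.time()))
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    os.close(fd)
    try:
        for _ in range(retries):
            try:
                os.link(tmp_path, lock_path)
            except OSError:
                # The reply to link() can be lost or misreported over NFS,
                # so the link count below decides, not the return status.
                pass
            if os.stat(tmp_path).st_nlink == 2:
                return lock_path      # we own the lock
            time.sleep(delay)
        return None                   # somebody else holds the lock
    finally:
        os.unlink(tmp_path)


def release_dotlock(lock_path):
    """Drop a lock previously returned by acquire_dotlock()."""
    os.unlink(lock_path)
```

This only protects the mailbox if every program that modifies it (local delivery, POP, IMAP) uses the same lock file name and honours it.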
> The NFS protocol is stateless, so every read and write
> supplies knowledge of at what offset the operation is
> about to happen, and therefore the most likely place for
> the "wrong offset bug" is at the client side.
>
> Next most likely is a bug in the NFS server code.
> With the present kernel-implemented NFS server some of the previous
> bugs are no longer present, but who knows what else happens...
I would rather think it's a client bug because it clearly depends on the
kernel version on the client (but of course it could be a server bug
triggered by particular client behavior).
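
To make the "wrong offset" scenario concrete: a write that lands past the current end of a file leaves a gap that reads back as zeroes, which is what the dump above looks like. The sketch below reproduces the effect on a local file with an explicit-offset write (Unix only, via os.pwrite). It is a simulation of the symptom under that assumption, not of the NFS client code, and the helper name write_at is made up.

```python
import os


def write_at(path, data, offset):
    """Write data at an explicit byte offset, as an NFS WRITE request would."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.pwrite(fd, data, offset)
    finally:
        os.close(fd)


def demo_hole(path):
    """Return the file contents after one correct and one misplaced write."""
    first = b"From a\n"
    write_at(path, first, 0)
    # A correct append would use offset len(first). A client working from a
    # wrong idea of the file size writes further out instead.
    write_at(path, b"From b\n", 1000)
    with open(path, "rb") as f:
        return f.read()
```

After demo_hole() the file is 1007 bytes long, and everything between the first message and offset 1000 reads back as NUL bytes, even though nobody ever wrote those zeroes.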
> A single-bit corruption in the network protocol is also possible,
> and in the early days of SunOS and NFS it was customary NOT
> to do UDP checksums, for performance reasons, and NFS got a very
> bad reputation... Single-bit corruption is also possible
> in computer memory, but I trust you are paranoid enough
> to run with ECC memory and all checks and alerts active?
I may not be ;-)
But I don't *ever* get this problem when the clients are running a 2.4 kernel.
Eugene