
Re: mailbox files with "hole" on NFS, linux 2.6



Matti Aarnio wrote:

>>Occasionally a mailbox is noticed with a big size, with a "hole" in the
>>middle.  It starts as a normal mailbox, then there are lots of zeroes, and
>>before the end there is more normal mailbox data.  Apparently the zeroes are
>>not real:

> Could you determine the exact byte offsets of where the hole begins
> and where it ends?  The hole size and its edge offsets are
> indicative of possible fault paths.

0000000   F   r   o   m       M   A   I   L   E   R   -   D   A   E   M
0000020   O   N       T   u   e       J   u   n       2   1       2   2
0000040   :   2   5   :   5   7       2   0   0   5  \n   D   a   t   e
0000060   :       2   1       J   u   n       2   0   0   5       2   2
0000100   :   2   5   :   5   7       +   0   4   0   0  \n   F   r   o
0000120   m   :       M   a   i   l       S   y   s   t   e   m       I
0000140   n   t   e   r   n   a   l       D   a   t   a       <   M   A
0000160   I   L   E   R   -   D   A   E   M   O   N   @   g   n   o   m
...
0001000   t   h       t   h   e       d   a   t   a       r   e   s   e
0001020   t       t   o       i   n   i   t   i   a   l       v   a   l
0001040   u   e   s   .  \n  \n  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0001060  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
64450300  \0
64450301

and that's all (there is no trailing part).
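
The offsets printed by od are octal: in the dump above the zeroes start at 0001046 (decimal 550) and run to the end of the file at 64450301 (decimal 13783233). To get hole edges as decimal offsets for other mailboxes, something like the following sketch could be used. The function name `find_zero_runs` and the `min_len` threshold are assumptions chosen for illustration.

```python
import os

def find_zero_runs(path, min_len=4096, chunk_size=1 << 20):
    """Return (start, end) byte offsets (end exclusive) of every run of
    NUL bytes in the file that is at least min_len bytes long."""
    runs = []
    run_start = None   # offset where the current NUL run began, if any
    offset = 0         # file offset of the start of the current chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            i, n = 0, len(chunk)
            while i < n:
                if chunk[i] == 0:
                    if run_start is None:
                        run_start = offset + i
                    # Skip ahead to the first non-NUL byte in this chunk.
                    i = n - len(chunk[i:].lstrip(b"\0"))
                else:
                    if run_start is not None:
                        if offset + i - run_start >= min_len:
                            runs.append((run_start, offset + i))
                        run_start = None
                    # Skip ahead to the next NUL byte in this chunk.
                    j = chunk.find(b"\0", i)
                    i = n if j == -1 else j
            offset += n
    # A run that reaches the end of the file (as in the dump above).
    if run_start is not None and offset - run_start >= min_len:
        runs.append((run_start, offset))
    return runs
```

Calling `find_zero_runs("/path/to/mailbox")` (the path is a placeholder) returns a list of offset pairs; a run whose end equals `os.path.getsize(path)` has no trailing data after it.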

Another problem that I can recall (and which is apparently unrelated to
IMAP) is that sometimes (again, rarely!) Zmailer appends messages to a
mailbox and reports normal delivery, but the data is not in the file.

> Some reports even said that mounting over TCP
> worked perfectly, with no errors appearing, while
> the default of using UDP failed every now and then...

Hmm, maybe I could try TCP...

> The local-delivery process isn't the only one modifying
> your mailbox files.  Also your POP/IMAP server modifies them.
> Running without fcntl-locking ... you are using dot-locking ?

Yes.  I had problems with NFS locking a very long time ago (when the
server was on a Solaris machine).
It's quite possible that it's the IMAP server that triggers the problem.  I
currently run UW IMAP4rev1 2004.350 (both for POP and IMAP).
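
For reference, one common way of doing dot-locking that is meant to be safe over NFS is to create a uniquely named file and hard-link it to `<mailbox>.lock`, then check the link count instead of trusting the return value of link(). The sketch below illustrates that idea only; it is not necessarily what Zmailer or UW IMAP actually do, and the function names, retry count and delay are assumptions.

```python
import os
import socket
import time

def dotlock_acquire(mailbox, retries=10, delay=1.0):
    """Try to create mailbox + '.lock'. Return the lock path on success,
    or None if the lock could not be obtained."""
    lock = mailbox + ".lock"
    # Unique temporary name in the same directory as the lock file.
    tmp = "%s.%s.%d.%d" % (lock, socket.gethostname(), os.getpid(),
                           int(time.time()))
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    os.close(fd)
    try:
        for _ in range(retries):
            try:
                os.link(tmp, lock)
            except OSError:
                # The reply to link() can be misleading over NFS,
                # so the link count below is what decides.
                pass
            if os.stat(tmp).st_nlink == 2:
                return lock          # our file is now also named `lock`
            time.sleep(delay)
        return None
    finally:
        os.unlink(tmp)               # the lock name, if created, remains

def dotlock_release(lock):
    """Remove a lock file obtained from dotlock_acquire()."""
    os.unlink(lock)
```

A delivery agent and a POP/IMAP server only exclude each other this way if both use the same lock file name and both honour it.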

> The NFS protocol is stateless, so every read and write
> carries the offset at which the operation is
> to happen, and therefore the most likely place for
> the "wrong offset bug" is on the client side.
> 
> The next most likely is a bug in the NFS server code.
> With the present kernel-implemented NFS server some of the previous
> bugs are no longer present, but who knows what else happens...

I would rather think it's a client bug, because it clearly depends on the
kernel version on the client (but of course it could be a server bug
triggered by particular client behavior).

> A single-bit corruption in the network protocol is also possible,
> and in the early days of SunOS and NFS it was customary NOT
> to do UDP checksums for performance reasons, and NFS got a very
> bad reputation...  Single-bit corruption is also possible
> in computer memory, but I trust you are paranoid enough
> to run with ECC memory and all checks and alerts active?

I may not be ;-)
But I don't *ever* get this problem when clients are running 2.4 kernel.
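
As a side note on the checksum point, the 16-bit ones' complement sum used by UDP does catch any single flipped bit when it is enabled. The sketch below computes that sum over a byte string only; the real UDP checksum also covers a pseudo-header with addresses and lengths, which is left out here, and the function name is an assumption.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\0"                      # pad to a whole 16-bit word
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                     # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

payload = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
corrupted = bytes([payload[0] ^ 0x04]) + payload[1:]   # flip one bit
print(hex(internet_checksum(payload)))                 # 0x220d
print(internet_checksum(payload) != internet_checksum(corrupted))  # True
```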

Eugene
