Strange growth of CPU usage



Hello.

A few days ago our routers started to behave strangely. Until then the 
average CPU usage they generated was about 8%, while average iowait on the 
machine was about 32. After we cleaned out the postoffice/postman directory 
(it held a large number of files), iowait went down to about 13 but, 
unexpectedly, CPU usage went up to about 20%. It has stayed at that level 
ever since. What's more, the rate of processed messages has not changed 
either. It looks as if the routers were doing some extra work and wasting 
CPU power.

We ran truss on the router processes and found something possibly unusual: a 
great many llseek(1, 0, SEEK_CUR) calls. For example, this is what the 
router does when it picks up a message file from the router directory:

-------------------------------------------------
29889:  getdents64(4, 0x0011BF80, 1048)                 = 1048
29889:  stat("97-29891", 0xFFBEF5D0)                    = 0
29889:  kill(29891, SIG#0)                              = 0
29889:  stat("2304", 0xFFBEF5D0)                        = 0
29889:  stat("2304", 0xFFBEF0C8)                        = 0
29889:  rename("2304", "2304-29889")                    = 0
29889:  llseek(1, 0, SEEK_CUR)                          = 14649065
29889:  llseek(1, 0, SEEK_CUR)                          = 14649065
29889:  llseek(1, 0, SEEK_CUR)                          = 14649065
29889:  llseek(1, 0, SEEK_CUR)                          = 14649065
29889:  llseek(1, 0, SEEK_CUR)                          = 14649065
29889:  llseek(1, 0, SEEK_CUR)                          = 14649065
29889:  llseek(1, 0, SEEK_CUR)                          = 14649065
29889:  llseek(1, 0, SEEK_CUR)                          = 14649065
29889:  llseek(1, 0, SEEK_CUR)                          = 14649065
29889:  llseek(1, 0, SEEK_CUR)                          = 14649065
29889:  llseek(1, 0, SEEK_CUR)                          = 14649065
29889:  llseek(1, 0, SEEK_CUR)                          = 14649065
29889:  open("2304-29889", O_RDONLY)                    = 6
29889:  time()                                          = 993112746
29889:  fstat(6, 0x001EBB70)                            = 0
-------------------------------------------------

There are places where the router does hundreds of llseeks for no apparent 
reason. File descriptor 1 is the log file, yet I can't see any logging 
activity there.
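
To put a number on "hundreds of llseeks", the truss output can be tallied per 
system call and first argument. The following is only a rough sketch in 
Python: the function name count_syscalls and the regular expression are my 
own assumptions, based on the line layout in the excerpt above, and lines in 
any other layout (for example failed calls reported with Err#) are simply 
skipped.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
#   29889:  llseek(1, 0, SEEK_CUR)    = 14649065
# Lines that do not fit this layout are ignored.
LINE_RE = re.compile(r"^\s*(\d+):\s+(\w+)\((.*)\)\s+=\s+(\S+)")

def count_syscalls(lines):
    """Return a Counter keyed by (syscall name, first argument)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        name, args = m.group(2), m.group(3)
        # The first argument is usually the file descriptor or path.
        first_arg = args.split(",")[0].strip() if args else ""
        counts[(name, first_arg)] += 1
    return counts

# A few lines copied from the trace above, used as sample input.
SAMPLE = """\
29889:  rename("2304", "2304-29889")                    = 0
29889:  llseek(1, 0, SEEK_CUR)                          = 14649065
29889:  llseek(1, 0, SEEK_CUR)                          = 14649065
29889:  open("2304-29889", O_RDONLY)                    = 6
""".splitlines()

if __name__ == "__main__":
    # Print the most frequent (syscall, first argument) pairs first.
    for (name, arg), n in count_syscalls(SAMPLE).most_common():
        print(f"{n:6d}  {name}({arg}, ...)")
```

Fed a full truss capture instead of SAMPLE, this would show how large a share 
of all calls the llseek on descriptor 1 accounts for.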

We are using zmailer-2.99.51 on Solaris 7.

If anybody knows what's up here, let me know. Thanks for any help.

Regards,

-- 
Bartosz Klimek, Onet.pl S.A.
e-mail: bartoszk@onet.pl