
Re: Malloc bug in scheduler



>> I tried to install zmailer-2.99.49p9 (with patch1) on AIX 4.2.1,
>> but there is a bug in scheduler that makes it wait for ever.
>
>	I know of a problem on router, but this one is new to me.
>
>> Scheduler log says this:
>> 
>> scheduler[9682]: malloc(4294965232): virtual memory exceeded, sleeping
>> scheduler[9682]: malloc(4294965232): virtual memory exceeded, sleeping
>> scheduler[9682]: malloc(4294965232): virtual memory exceeded, sleeping
>> scheduler[9682]: malloc(4294965232): virtual memory exceeded, sleeping
>> 
>> It seems to me scheduler calls emalloc with a negative argument. The
>> other processes do not produce these errors.

I was wrong about the negative argument.
In scheduler/resources.c, resources_query_nofiles() gets rc=0 and
rl.rlim_cur=2147483647. In transport.c, stashprocess() then sees
i=2147483647 and ends up calling malloc(4294965232).

Undefining HAVE_SETRLIMIT makes it work, of course.

When doing a small test:

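#include <stdio.h>
#include <unistd.h>        /* getdtablesize() */
#include <sys/resource.h>  /* getrlimit(), struct rlimit */
int main(void)
{
  struct rlimit rl;
  printf("getrlimit(RLIMIT_NOFILE,&rl)=%d\n", getrlimit(RLIMIT_NOFILE,&rl));
  /* rlim_cur is not necessarily an int, so cast it before printing */
  printf("rl.rlim_cur=%lu\n", (unsigned long)rl.rlim_cur);
  printf("getdtablesize()=%d\n", getdtablesize());
  return 0;
}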

I get:
getrlimit(RLIMIT_NOFILE,&rl)=0
rl.rlim_cur=2147483647
getdtablesize()=2000

It seems RLIMIT_NOFILE is not defined in AIX 4.1.5, so the problem never
occurs there.

In resource.h (AIX4.2.1):
#define RLIMIT_NOFILE   7               /* max # allocated fds--not enforced */
 
This looks like the maximum number of files in the system, not the number of
open files per process, and it is reasonable for such a limit to be a signed
32-bit number. Solaris 2.5 gives 64 for both rl.rlim_cur and getdtablesize(),
HP-UX 10.20 gives 60 for both, and Digital Unix 4.0 gives 4096 for both, so
they do not seem to be referring to the same thing as AIX 4.2.1 (NOFILE -
Number of Open FILEs / NO of FILEs).

As I see it, the fix is to ignore RLIMIT_NOFILE on AIX and use
getdtablesize() instead.

----
Mail:	Martin Wendel, IT-Support, Uppsala university, S-751 08 Uppsala, Sweden
Phone:	+46-18-4717780, Fax: +46-18-4717725