
"SNMP" monitoring of ZMailer system ...



This has long been one of the "TODO" items.
Now that I finally got the scheduler to track properly how many
messages it has stored, I was confident enough to add a more
generic way to track and query these counters.

Here is what  SiteConfig.in  says  (i.e. the  zmailer.conf  template):

  # SNMP-like global system instance monitoring datablock path:
  # This _file_ has absolute path (no substitutions are allowed),
  # it is shared in between all principal subsystem components
  # in mmap(MAP_SHARED, MAP_READ|MAP_WRITE) mode. Counters in
  # this file are NEVER reset.  Gauges are managed as shadows
  # of subsystem internal state.
  SNMPSHAREDFILE=@POSTOFFICE@/.zmailer.SNMP.block

I did think for a while about using a SysV SHM segment, but decided
against it in the end.  Systems that can support this at all can do
it with mmap(), I believe.
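
As a rough sketch of the mmap() approach (not the actual ZMailer code):
the struct layout, the field names and the demo_attach()/demo_detach()
helpers below are all made up for illustration.  Note that in the real
mmap() call the read/write flags are PROT_READ|PROT_WRITE, and
MAP_SHARED is what makes the stores land in the file, so the values
survive process restarts and reboots.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for ftruncate() under strict -std= modes */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical layout; the real ZMailer block has many more fields. */
struct demo_block {
    unsigned int received_messages;   /* counter: only ever incremented */
    int          agents_active;       /* gauge: shadows internal state  */
};

/* Map `path` shared and read-write, creating and sizing the file if
 * needed.  Returns NULL on failure. */
static struct demo_block *demo_attach(const char *path)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0)
        return NULL;

    /* A new (too short) file is grown to the block size; the added
     * bytes read back as zero. */
    if (fstat(fd, &st) < 0 ||
        (st.st_size < (off_t)sizeof(struct demo_block) &&
         ftruncate(fd, (off_t)sizeof(struct demo_block)) < 0)) {
        close(fd);
        return NULL;
    }

    p = mmap(NULL, sizeof(struct demo_block),
             PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping outlives the descriptor */
    return p == MAP_FAILED ? NULL : (struct demo_block *)p;
}

/* Unmap a block obtained from demo_attach(). */
static void demo_detach(struct demo_block *b)
{
    if (b != NULL)
        munmap(b, sizeof *b);
}
```

A process would call demo_attach() once at startup and then simply
increment fields through the returned pointer; a second attach of the
same file, even from another process, sees the same values.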

One reason behind this choice is that I want some of those values
to be very PERSISTENT -- over machine reboots, even.


There are some disadvantages, though:

When you restart the scheduler with an actual queue in it, these
two counters (shown by  "mailq -M") will count the pre-existing queue
again:

  printf("SC.ReceivedMessages             %10u\n",
         M.mtaReceivedMessagesSc);
  printf("SC.ReceivedRecipients           %10u\n",
         M.mtaReceivedRecipientsSc);

On the other hand, a normal system should be running hundreds of
days in between restarts, so that isn't worth bothering about.


The shared memory block contains a "magic" signature value that
essentially tells its version.  When I decide to change the
structure, I will also change the magic.  That keeps new binaries
from messing up the old structure.  You will notice this by
"mailq -M" giving an error (which should explain the problem quite
precisely).  Just "rm" the old file, and restart the servers.
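
The idea of the magic check could be sketched roughly like this.  The
magic value, the struct and the function name are invented here, and
treating a zero magic as "freshly created file" is my assumption for
the sketch, not necessarily what ZMailer does:

```c
#include <stdio.h>

/* Hypothetical magic; it would be changed whenever the layout changes. */
#define DEMO_MAGIC 0x5a4d0001u

/* Hypothetical block header followed by one sample counter. */
struct demo_hdr {
    unsigned int magic;
    unsigned int received_messages;
};

/* Returns 0 if the block may be used, -1 if it belongs to another
 * version of the layout.  The block is never modified on mismatch. */
static int demo_check_magic(struct demo_hdr *h)
{
    if (h->magic == 0) {              /* all-zero: new file, claim it */
        h->magic = DEMO_MAGIC;
        return 0;
    }
    if (h->magic != DEMO_MAGIC) {
        fprintf(stderr,
                "shared block has magic 0x%08x, expected 0x%08x; "
                "remove the old file and restart the servers\n",
                h->magic, DEMO_MAGIC);
        return -1;
    }
    return 0;
}
```

The point of refusing to touch a block with a foreign magic is that
fields may have moved, so any write could corrupt somebody else's data.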


I have also experimented a bit with making a web-based "mailq"
view into the running system.  See:
   http://vger.kernel.org/z/

Things will still change.  Most likely the "mailq -M" variable names
have stabilized now, but many of them don't have data feeders yet,
and some of them show bad data.  E.g. observe these gauges:
   SC.TransportAgentsActive-G             -53
   SC.TransportAgentsIdle-G                53
A negative number of processes is a bug.

Now if somebody with a bit more experience with RRDTOOL and/or MRTG
could step up and whip up graphers for these variables....
When parsing  "mailq -M" output, do it by means of the variable name
labels, not by their line numbers in the output
(i.e. the line positions WILL change).

I have thought of writing a perl module that extracts these counter
and gauge values, but "mailq -M" does it well enough, and if you
run the monitor every 5 minutes, doing a fork and reading from a
pipe isn't all that much trouble (compared to perl module coding..)
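
For what it is worth, label-based extraction over a pipe could look
roughly like this in C.  This is only a sketch: demo_read_var() is a
made-up helper, and it assumes every interesting output line has the
form "label value", like the printf() lines and the gauge samples
shown above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for popen()/pclose() under strict -std= modes */
#endif
#include <stdio.h>
#include <string.h>

/* Run `cmd` (e.g. "mailq -M") through the shell and look for the
 * first line whose first token equals `label`.  On a match the number
 * following the label is stored in *out and 0 is returned; otherwise
 * -1 is returned and *out is left untouched. */
static int demo_read_var(const char *cmd, const char *label, long *out)
{
    char line[256];
    char name[128];
    long value;
    int  found = -1;
    FILE *fp = popen(cmd, "r");

    if (fp == NULL)
        return -1;

    /* Keep draining the pipe after a match, so the child process can
     * finish writing normally instead of being killed by SIGPIPE. */
    while (fgets(line, sizeof line, fp) != NULL) {
        if (found != 0 &&
            sscanf(line, "%127s %ld", name, &value) == 2 &&
            strcmp(name, label) == 0) {
            *out = value;
            found = 0;
        }
    }
    pclose(fp);
    return found;
}
```

A grapher front-end would then call something like
demo_read_var("mailq -M", "SC.ReceivedMessages", &v)  once per
variable, or, better, collect all wanted labels in one pass.  Either
way the lookup is by name, so reordered or added lines do no harm.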

I just thought ... that I might move the whole "mailq -M" business
into the MAILQ-v2 protocol; then there would be a simple way to
extract the data remotely, presuming that the scheduler is running.

... but then there is also a need for some other tool to test
that the shared segment is all right.  Presently  smtpserver, router,
and scheduler try to attach to it and, if that fails, keep using a
local data buffer instead of the global one.

Should they complain at start time if the attachment isn't
successful?  They could, after all.  They shouldn't (IMO)
refuse to operate.
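
The "complain but keep going" behaviour could be sketched as below.
All names are hypothetical and the one-field struct merely stands in
for the real block; the only point is that the caller always gets a
usable pointer, plus a flag telling whether it is the shared one.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for the real counter block. */
struct demo_counters {
    unsigned int received_messages;
};

/* Process-private buffer used when the shared file is unavailable. */
static struct demo_counters demo_local;

/* Never returns NULL.  *shared is set to 1 when the returned pointer
 * refers to the shared mapping, 0 when it is the local fallback. */
static struct demo_counters *demo_counters_open(const char *path, int *shared)
{
    int fd = open(path, O_RDWR);

    *shared = 0;
    if (fd >= 0) {
        struct stat st;
        void *p = MAP_FAILED;

        /* Only map files that are large enough; touching pages past
         * the end of a short file would raise SIGBUS. */
        if (fstat(fd, &st) == 0 &&
            st.st_size >= (off_t)sizeof(struct demo_counters))
            p = mmap(NULL, sizeof(struct demo_counters),
                     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);
        if (p != MAP_FAILED) {
            *shared = 1;
            return (struct demo_counters *)p;
        }
    }

    /* Complain once at start time, but do not refuse to operate. */
    fprintf(stderr, "warning: cannot attach %s; counters stay process-local\n",
            path);
    return &demo_local;
}
```

With this shape the rest of the program never needs to care which
buffer it got; only the monitoring side loses visibility.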

-- 
/Matti Aarnio	<mea@nic.funet.fi>