
router stalls on incomplete/misconfigured NS entries



I'm experiencing a stalled router in the following case:

The From: line of an incoming message contains an address for which
the DNS lookups fail as follows:

search_res: deferred: connected.com: cname (host name lookup) error
search_res: deferred: connected.com.: mx (host name lookup) error
search_res: deferred: connected.com.: a (host name lookup) error
	
ZMailer appears to contain code similar to the 'host' program, which
prints the same string as the last line above. The router wants to
verify the source address as well as the destination, so it queries
DNS to verify the origin of the message. A normally running router is
supposed to reach a HOLD stage for this type of outgoing destination,
but it never does. (Remember that this message is incoming and the
local destination is trivial; the router hangs trying to verify the
source address.)

The older 'host' program I have shows the same symptom: it just hangs.
(Newer versions exit cleanly after a timeout.)
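
The difference between "hangs" and "exits cleanly after a timeout" can be
illustrated with a minimal sketch of a lookup that is given a hard deadline.
This is not ZMailer or 'host' code; the function name lookup_with_deadline,
the resolver parameter, and the 5-second default are all assumptions made
for illustration. The lookup runs in a helper thread, and the caller stops
waiting once the deadline passes and reports the address as deferred.

```python
import queue
import socket
import threading


def lookup_with_deadline(host, timeout_s=5.0, resolver=socket.gethostbyname):
    """Resolve host, but never wait longer than timeout_s seconds.

    Returns ("ok", address), ("error", message) or ("deferred", None).
    The name and the return convention are hypothetical.
    """
    result = queue.Queue(maxsize=1)

    def worker():
        # The resolver call itself may block for a long time; it runs in
        # a daemon thread so it cannot keep the caller or the process alive.
        try:
            result.put(("ok", resolver(host)))
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            result.put(("error", str(exc)))

    threading.Thread(target=worker, daemon=True).start()
    try:
        return result.get(timeout=timeout_s)
    except queue.Empty:
        # Deadline passed: give up waiting and let the caller move on.
        return ("deferred", None)
```

Note that the stalled lookup is abandoned rather than cancelled; the point is
only that the caller gets control back after a bounded time.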

The interactive router does eventually (several minutes later) get to
a holding pattern if I query it to resolve an outgoing address:

ZMailer router (2.2.m8 #2: Thu May  5 21:27:44 MDT 1994)
  PostMaster@Phys.UAlberta.CA:/DEVEL/zmailer.221m8/router
Copyright 1992 Rayan S. Zachariassen

z$ router test@connected.com
<jmack.interactive@relay.Phys.UAlberta.CA>: address: test@connected.com
[minutes pass...]

search_res: deferred: connected.com: cname (host name lookup) error
search_res: deferred: connected.com.: mx (host name lookup) error
[minutes pass...]

search_res: deferred: connected.com.: a (host name lookup) error
<jmack.interactive@relay.Phys.UAlberta.CA>: deferred: 
[minutes pass...]

NS:connected.com./a: test@connected.com
(((hold NS:connected.com./a test@connected.com default_attributes)))


But under both 2.28mea and my testbed 2.93mea, the router either goes
IDLE or dies, and messages pile up in the queue. After some lengthy
time (perhaps hours, if there is more than one consecutive problem
message in the queue), the message may get delivered, but at the
expense of all the others, which are held up by this awful time-out
handling.

A few bad source messages of this type in the queue can create a 
situation where no mail gets to the scheduler for several hours, or even 
days.

I've come in several mornings now to find 100 or so entries queued for 
delivery, but with the router doing absolutely nothing (other than failing 
to time-out on the query in a reasonable time). 5 of those pesky problem 
entries had held up the router overnight.

Here is generally what I see: (Ultrix, zm 2.28mea)

>mailq -sv
MailQ on localhost
100 entries in router queue: processing
0 entries in scheduler queue: idle
Transport queue is empty

>ps -aux|grep router
root      2454   0.0  3.6 1.95M  728K ?  I   1:41 /zmailer/bin/router -dkn1

The only way I can get things running again is to pull the offending 
message(s) from the router directory, and then explicitly kill and restart 
the router ('zmailer kill router' will NOT kill that process!).

The particular address that recently proved problematic for our site
appears in the sample router entry below (destined for a local user; the
source is a list):

--
external
rcvdfrom hebron.connected.com ([162.148.251.254])
with SMTP
from <angrychr@connected.com>
to <afaust@Phys.UAlberta.CA>
Received: (from angrychr@localhost) by hebron.connected.com (8.6.9/8.6.9)
    id NAA01848; Tue, 13 Sep 1994 13:46:49 -0700
Date: Tue, 13 Sep 1994 13:46:48 -0700 (PDT)
From: Alice In Chains List <angrychr@connected.com>
To: Alice In Chains List <KhadejahJ@aol.com>
Subject: Re: Alice in Chains - The Glam Rocks Years 
Message-ID: <Pine.SUN.3.90.940913134507.1715A-100000@hebron.connected.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
--

Try sticking this file in the router directory (change the 'to' line to be
local to your site) and see what I mean!

For this particular address, something seems to be amiss in the DNS
(likely a missing CNAME), since there is no 'A' entry and no 'MX'
entry either. It may be a SLIP host which goes away and takes any
additional DNS info away with it. The domain exists, but it has no
CNAME, MX, or A entry for its generic identity.

The main problem for ZMailer is that this makes the router pause, die,
or otherwise spin its wheels :-(

Ideally, all name servers would be perfectly configured and this would
never happen (but, true to form, it is always bound to happen). In any
case, ZMailer should never choke because of somebody else's problem or
network oddity.

Any ideas on changing the router's behavior to (quickly) skip over these
problem entries and get on with delivering the remaining ones? Is there
some configurable parameter in the config files that provides a
reasonable workaround?
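
The behavior being asked for can be sketched roughly as follows. This is
not ZMailer code or configuration; DeferralCache, route_queue, the
verify_source callback, and the 10-minute hold time are hypothetical names
and values chosen for illustration. A message whose source domain cannot be
verified is put on hold, the domain is remembered for a while, and later
messages from the same domain are held at once without another slow query,
so the rest of the queue keeps moving.

```python
import time


class DeferralCache:
    """Remembers domains whose lookups were recently deferred (hypothetical)."""

    def __init__(self, ttl_s=600.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._expiry = {}

    @staticmethod
    def _key(domain):
        # Treat "connected.com", "connected.com." and "Connected.COM" alike.
        return domain.lower().rstrip(".")

    def mark(self, domain):
        self._expiry[self._key(domain)] = self.clock() + self.ttl_s

    def is_deferred(self, domain):
        key = self._key(domain)
        expiry = self._expiry.get(key)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expiry[key]  # hold period over, allow a fresh query
            return False
        return True


def route_queue(messages, verify_source, cache):
    """Split (msg_id, sender_domain) pairs into routed and held message ids."""
    routed, held = [], []
    for msg_id, sender_domain in messages:
        if cache.is_deferred(sender_domain):
            held.append(msg_id)  # known-bad domain: skip the slow lookup
            continue
        if verify_source(sender_domain) == "deferred":
            cache.mark(sender_domain)
            held.append(msg_id)
        else:
            routed.append(msg_id)
    return routed, held
```

With such a scheme, five problem entries in a queue of 100 would cost one
slow lookup per bad domain rather than one per message, and the other 95
would still reach the scheduler.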

Thanks,
--
James S. MacKinnon             Office: P-139 Avahd-Bhatia Physics Lab
Computing/Networking           Voice : (403) 492-8226
Department of Physics
University of Alberta          email : Jim.MacKinnon@Phys.UAlberta.CA
Edmonton, Canada T6G 2N5             : jmack@Phys.UAlberta.CA