Wednesday, February 18, 2015

Restarting Agent Manager on Domino 9.0.1 may crash your server....


Update: Thanks to everyone who commented to point out that this was fixed.  We were all so focussed on Poodle that we only applied the fixes to the servers that serve HTML.  Turns out that IBM Domino 9.0.1 FIX PACK 3 is a good fix to have on all of your servers. 

KILLER AGENTS ! 

Just a fun tidbit we discovered today (fortunately on the test server, rather than the production server).

It seems that a bug was introduced in Domino 9.0.1 which doesn't like having the Agent Manager restarted.

Specifically, via the commands:

TELL AMGR QUIT

and

LOAD AMGR

Under normal use, you'd probably have no reason to issue those commands on your server console, but if you had a runaway agent or if you were testing/debugging, you might.

Shortly after Agent Manager loads (in our case, in under 10 seconds), the server will start to report things like:

AMgr: Console command 'LOG.NSF' is unknown
AMgr: Console command 'admin4.NSF' is unknown

The actual name of the database will be different depending upon your system but the problem is the same. The server starts referring to databases like they were console commands.
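
If you keep console output in a text file, a quick way to spot the symptom is to scan it for the message above. This is only a rough sketch: the pattern is copied from the two lines quoted in this post, the function name is made up, and how you get hold of the console text is left to you.

```python
import re

# Pattern based only on the two sample console lines quoted in the post.
AMGR_UNKNOWN = re.compile(r"AMgr: Console command '([^']+)' is unknown")


def find_misread_databases(console_lines):
    """Return database names that AMgr reported as unknown console commands.

    Hypothetical helper: only names ending in .nsf (any case) are kept,
    since the symptom is databases being treated as commands.
    """
    hits = []
    for line in console_lines:
        match = AMGR_UNKNOWN.search(line)
        if match and match.group(1).lower().endswith(".nsf"):
            hits.append(match.group(1))
    return hits


# The first two lines are from the post; the third is a made-up filler line.
sample = [
    "AMgr: Console command 'LOG.NSF' is unknown",
    "AMgr: Console command 'admin4.NSF' is unknown",
    "Some unrelated console line",
]
print(find_misread_databases(sample))
```

Any non-empty result shortly after a LOAD AMGR would suggest the server has hit this problem.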

After a while, the server becomes hard to access and you either need to get to a remote console to shut down Agent Manager or access the server via services and shut down the Domino server (and then reboot).

After a reboot, it all starts working again -- provided that you leave Agent Manager alone. 

Turns out that there has been an APAR for it  (and here) since 17 June 2014 (but it's closed, not sure if that's okay).

Fixing it
Apparently the fix is to "Do not set Log_AgentManager." and "Remove unnecessary MQClose" (thanks IBM, that's really clear).

From what I can gather, this is something to do with the Notes.INI variable:

log_agentmanager=1

Which our server didn't even have. I added this INI variable and set it to 0 (but didn't restart the Domino server, so it's not a proper test). After I restarted the Agent Manager, the problem recurred. I tried setting it to 1 and restarting the Agent Manager... I'm not sure if I just hit good timing but the problem seems to have disappeared.
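
To check whether a server has the variable set at all, you could read a copy of its Notes.INI and look for the line. A minimal sketch, assuming the file is plain name=value lines and that variable names can be matched case-insensitively (both are my assumptions, and the function name and sample text are invented for illustration):

```python
def read_ini_setting(ini_text, name):
    """Return the value of the first 'name=value' line matching name, or None.

    Hypothetical helper: matches names case-insensitively, which is an
    assumption about how Notes.INI variables are treated.
    """
    wanted = name.lower()
    for raw in ini_text.splitlines():
        key, sep, value = raw.partition("=")
        if sep and key.strip().lower() == wanted:
            return value.strip()
    return None


# Made-up sample content; only the log_agentmanager line comes from the post.
sample_ini = "[Notes]\nlog_agentmanager=1\n"

print(read_ini_setting(sample_ini, "Log_AgentManager"))
print(read_ini_setting("[Notes]\n", "Log_AgentManager"))
```

A result of None means the variable is absent, which was the state our server started in.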

Really though, best to avoid agent manager commands during office hours on the production servers if you can help it. 

(one final thing... it looks like Thomas Hampel blogged about this last May, so thank you!)

2 comments:

Lars Berntrop-Bos said...

Following the links and looking up SPR CSAO9FR9ZS reveals this should be fixed as of 9.0.1 FP2. Can you please confirm which version of Domino you are running? If it is FP2 or above please report this to IBM!

Anonymous said...

Hi Gavin,

This is fixed in FP2:

http://www-10.lotus.com/ldd/fixlist.nsf/Public/6E143EA62D16E98B85257CFC006E057A?OpenDocument

Best regards,

Mathieu