Release 0.6.4 - April 2006
Appendix C. Sample Scripts

A set of sample scripts is included in this appendix. These are not intended to represent an advanced or complex application of NodeBrain. Our intent here is to provide examples that are relatively easy to understand, so we avoid complexity.
The scripts in this appendix build on each other. For example, the first script is an alarm script included in subsequent scripts for alarming. The state monitor script generates events for the event monitor script. The diagnostic script presents a state to the state monitor script.
We use the notation “…” at the beginning of a line in this appendix to show that a long single-line command has been split across multiple lines for illustration. This is not valid NodeBrain syntax; these commands must be entered on a single line.
Mary had a              =>   Mary had a little lamb.
… little lamb.
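As a rough illustration of this notation, the following Python sketch folds lines that begin with “…” back into the preceding line. The function name `join_continuations` is hypothetical and not part of NodeBrain, and joining the pieces with a single space is an assumption about how the illustrated lines were split.

```python
def join_continuations(lines, marker="…"):
    """Fold lines starting with the continuation marker into the previous line."""
    commands = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between commands
        if line.startswith(marker) and commands:
            # Continuation: append the text after the marker to the prior command.
            commands[-1] = commands[-1] + " " + line[len(marker):].strip()
        else:
            commands.append(line)
    return commands


illustrated = ["Mary had a", "… little lamb."]
print(join_continuations(illustrated))  # ['Mary had a little lamb.']
```

A helper like this could be used to turn the listings in this appendix back into single-line commands before feeding them to nb.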
C.1 Sample Alarm Script

The sample script provided here is intended to illustrate a relatively simple alarming structure that automatically switches from detailed to summary alarms and back to detailed alarms depending on the rate of similar alarms. Here we use a hypothetical host command “notify” to distribute alarms via email, text pager, or any other notification mechanism available on the host system.
# File: alarm.nb
define alarm expert cache:(~(1h):msgid(^50,100,200,400));
alarm. define group cell;
alarm. define severity cell;
alarm. define text cell;

alarm. define r0 if(msgid._hitState="normal"):
… $ - notify source=map appl=map group="$${group}"
… severity="$${severity}" text="$${msgid} A[$${msgid._hits}] $${text}"

alarm. define r1 if(msgid._hitState<>"normal"):
… $ alarms. alert ("MAP001","message ids had $${msgid._hits}
… level A summary cases over ${alarm._interval}"),
… group="$${group}",
… severity="critical",
… text="$${msgid._hits} level A summary cases of $${msgid}
… over ${alarm._interval}, suppressing to avoid message flood"

define alarmS expert cache:(~(1h):msgid,summary(5,10,20));
alarms. define group cell;
alarms. define severity cell;
alarms. define text cell;

alarms. define r0 if(summary._hitState="normal"):
… $ - notify source=map appl=map group="$${group}"
… severity="$${severity}"
… text="$${msgid} B[$${summary._hits}] $${text}"

alarms. define r1 if(summary._hitState<>"normal"):
… $ alarm. alert ("$${msgid}"),
… group="$${group}",
… severity="critical",
… text="$${summary._hits} $${summary} in ${alarmS._interval}"
This script is included in other scripts using the SOURCE command.
source alarm.nb;
Alarms are passed to this script by alerting the alarmS context.
alarms. alert ("msgId","summaryMsgText"),
… group="group",severity="severity",text="msgText";
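The general idea of switching between detailed and summary alarms based on rate can be sketched in Python. This is a toy model only: it does not reproduce the threshold semantics of the NodeBrain cache, and the class name `AlarmThrottle` and its `window_seconds` and `limit` parameters are hypothetical. It assumes a single limit per message id within a sliding interval.

```python
from collections import defaultdict, deque


class AlarmThrottle:
    """Toy model: detailed alarms at low rates, summaries at high rates."""

    def __init__(self, window_seconds=3600, limit=50):
        self.window = window_seconds    # hypothetical: plays the role of the ~(1h) interval
        self.limit = limit              # hypothetical: hits in the interval that trigger summaries
        self.hits = defaultdict(deque)  # msgid -> timestamps of recent hits

    def alert(self, msgid, now):
        """Record one alarm and return 'detail' or 'summary' for it."""
        recent = self.hits[msgid]
        recent.append(now)
        # Expire hits that have aged out of the interval.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        return "detail" if len(recent) < self.limit else "summary"


throttle = AlarmThrottle(window_seconds=3600, limit=3)
modes = [throttle.alert("APP001", t) for t in (0, 10, 20, 30, 4000)]
print(modes)  # ['detail', 'detail', 'summary', 'summary', 'detail']
```

In this run the third and fourth alarms arrive quickly enough to be summarized, and the alarm at t=4000 is detailed again because the earlier hits have expired.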
C.2 Sample Event Monitoring Script

The sample script provided here is intended to illustrate a very simple event monitoring application.
# file: event.nb
portray sample; # This identity must be defined in private.nb
set log="/tmp/event.log";
define l1 listener type="NBP",port=49999;
source alarm.nb; # include alarm experts
define cAB expert cache:(~(4h):a,b(20,30,60));
cAB. define r1 if(b._hitState):
… $ alarms. alert ("MAP001","b's with $${b._hits} where a=$${a}"),
… severity="$${b._hitState}",
… group="abc",
… text="$${b._hits} events where a=$${a} and b=$${b}";

define event expert;
event. define type cell;
event. define a cell; # some event attribute
event. define b cell; # another event attribute

event. define r1 if(type~~"^weird|^goofy|^silly"): $ cAB. assert (a,b);
This script monitors for 20, 30 and 60 occurrences of weird, goofy, or silly events for any given combination of a and b.
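The counting described here can be modeled with a short Python sketch. It is not NodeBrain code: the function name `crossings` and the event stream are hypothetical, and the four-hour interval is ignored for simplicity. The thresholds and the regular expression are taken from the script above.

```python
import re
from collections import Counter

THRESHOLDS = (20, 30, 60)  # from cache:(~(4h):a,b(20,30,60))
INTERESTING = re.compile(r"^weird|^goofy|^silly")


def crossings(events):
    """Return (a, b, hits) each time an (a, b) count reaches a threshold."""
    hits = Counter()
    reached = []
    for event_type, a, b in events:
        if not INTERESTING.search(event_type):
            continue  # events of other types are ignored
        hits[(a, b)] += 1
        if hits[(a, b)] in THRESHOLDS:
            reached.append((a, b, hits[(a, b)]))
    return reached


# Hypothetical stream: 15 rounds of four events each.
one_round = [("goofy", 1, 2), ("weird", 1, 3), ("silly", 1, 2), ("sorry", 1, 2)]
print(crossings(one_round * 15))  # [(1, 2, 20), (1, 2, 30)]
```

In this stream the combination a=1, b=2 collects 30 hits and reaches two thresholds, while a=1, b=3 collects only 15 hits and reaches none.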
This monitor is started with the following nb command.
nb event.nb
Events are passed to this monitor via a batch instance of nb.
nb ":>example event alert type=\"goofy\",a=1,b=2;"
Here we assume the brain “example” has been defined in the local private.nb.
declare example brain sample@host:49999;
You might experiment with this script by executing the following Perl script and then inspecting the monitor log /tmp/event.log.
#!/usr/bin/perl
for($i=0;$i<15;$i++){
  system("nb \":>example event alert type=\\\"goofy\\\",a=1,b=2;\"");
  system("nb \":>example event alert type=\\\"weird\\\",a=1,b=3;\"");
  system("nb \":>example event alert type=\\\"silly\\\",a=1,b=2;\"");
  system("nb \":>example event alert type=\\\"sorry\\\",a=1,b=2;\"");
}

C.3 Sample State Monitoring Script

This example builds on the event monitor script, event.nb, of the prior section. We include a cache table to keep track of down processes and an expert to schedule process state checks and respond to asserted states.
# file: state.nb
source event.nb; # build on event monitor
# cache of down processes
define cHostProcess expert cache:(host,process);

# process state monitoring rules
# assert host="<hostName>",process="<processName>",status="up|down";
#
define state expert;
state. define r0 on(~(1m)):- processCheck.pl
state. define r1 on(status="down" and not cHostProcess(host,process)):
… event alert type="weird",a=state.host,b=state.process;
state. define r2 on(status="up"):
… cHostProcess. assert !(state.host,state.process);
This is not a good process monitor, just a simple example. We use a hypothetical Perl script, processCheck.pl, to check for application processes that should be running. (Normally this knowledge might be included in the rules, but we are delegating that logic to the Perl script to keep this simple.) The process checking script must report process states to the monitor with an assertion.
nb ":>example state assert
… host=\"host\",
… process=\"process\",
… status=\"state\"";
Convince yourself that any given process must go down at least 20 times within a four-hour period before this monitor triggers an alarm.
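One way to reason about this is to count how often the reported status changes to "down". The sketch below is a Python model, not NodeBrain code. It assumes that an on rule responds only when its condition becomes true, not on every repeated report of the same status. The function name `down_transitions` is hypothetical, and the model ignores the cHostProcess condition and the four-hour interval.

```python
def down_transitions(statuses):
    """Count how many times the status changes to 'down'."""
    count = 0
    previous = None
    for status in statuses:
        if status == "down" and previous != "down":
            count += 1  # a new transition into the down state
        previous = status
    return count


# A process that stays down for 30 consecutive checks yields one transition.
print(down_transitions(["down"] * 30))        # 1
# A process that flaps 20 times yields 20 transitions.
print(down_transitions(["down", "up"] * 20))  # 20
```

Under that assumption, a long outage contributes a single event to the cAB cache, so only a process that repeatedly recovers and fails again would reach the first threshold of 20.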
Now let’s consider a different structure for the expert called “state” in this example. Here we have independent rules for each process and an application made up of multiple processes. The processCheck.pl script checks one process name at a time.
# state.'host'.'process' assert state=<n>;
#
define state expert;
state. define 'myhost.com'.'sally' context;
state.'myhost.com'.'sally' define r0 on(~(1m)):- processCheck.pl sally
state.'myhost.com'.'sally' define r1 on(state<1):
… alarm. alert ("MAP002"),group="process",severity="critical",
… text="sally process is down on localhost";
state.'myhost.com'.'silly' define r0 on(~(20m)):- processCheck.pl silly
state.'myhost.com'.'silly' define r1 on(state<5):
… alarm. alert ("MAP002"),group="process",severity="critical",
… text="fewer than 5 silly processes on myhost.com";
state. define r1 on('myhost.com'.'sally'.state=0 and
… 'myhost.com'.'silly'.state=0):
… alarm. alert ("MAP002"),group="process",severity="critical",
… text="silly-sally application is down on myhost.com";

C.4 Sample Diagnostic Script

This is a trivial diagnostic script. For a problem this simple, one might prefer to use a more common programming language. However, it does illustrate an important notion about diagnostic scripts: only values required to “solve” rule conditions are obtained.
#!/usr/local/bin/nb –solve
# File: sally.nb

# Consultant scripts written in Perl.
define process expert;
process. use:processCount.pl sally
process. define count cell; # number of sally processes running

define ping expert :pingStatus.pl sally-db.mydomain.com
ping. define status cell; # 0 – not pinged, 1 – pinged

define tran expert :tranCount.pl
tran. define count cell; # transaction count over past hour

define cpu expert :getCpu.pl
cpu. define percent cell; # percent of cpu used over 30 seconds

# Named cell expressions
define down cell process.count<1 or ping.status=0;
define degraded cell process.count>0 && tran.count<5 && cpu.percent>95;

# Rules whose conditions must be solved
define r0 on(down) state=0;
define r1 on(!down and degraded) state=0.5;
define r2 on(state):
… >example state.'local'.'sally'. assert state=$${state};
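The notion that only required values are obtained can be illustrated with a small Python model of on-demand evaluation. This is a sketch under stated assumptions, not a description of how NodeBrain solves conditions: the `Consultant` class and the sample readings are hypothetical stand-ins for the Perl consultant scripts, and Python's short-circuit `or` and `and` stand in for the solver.

```python
class Consultant:
    """Wraps a value source and records whether it was ever consulted."""

    def __init__(self, name, fetch):
        self.name = name
        self._fetch = fetch
        self.consulted = False
        self._value = None

    def value(self):
        if not self.consulted:  # fetch at most once, and only on demand
            self._value = self._fetch()
            self.consulted = True
        return self._value


# Hypothetical readings standing in for the consultant scripts.
process_count = Consultant("process.count", lambda: 0)
ping_status = Consultant("ping.status", lambda: 1)
tran_count = Consultant("tran.count", lambda: 12)
cpu_percent = Consultant("cpu.percent", lambda: 40)


def down():
    # Python's `or` stops as soon as the left side is true.
    return process_count.value() < 1 or ping_status.value() == 0


def degraded():
    return (process_count.value() > 0 and tran_count.value() < 5
            and cpu_percent.value() > 95)


state = None
if down():
    state = 0
elif degraded():
    state = 0.5

used = [c.name for c in (process_count, ping_status, tran_count, cpu_percent)
        if c.consulted]
print(state, used)  # 0 ['process.count']
```

With no sally processes running, the state is settled after consulting the process count alone; the ping, transaction, and CPU sources are never invoked in this model.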
Copyright © 2003-2006 The Boeing Company