NodeBrain Module Reference
Release 0.6.4 - April 2006


2        Cache Module
2.1      Event Caches
2.2      Cache Definition
2.3      Cache Attributes
2.4      Cache Assertions
2.5      Cache Intervals
2.6      Cache Thresholds
2.7      Cache Rules
2.8      Cache Terms
2.9      Cache Conditions


2          Cache Module

 

The Cache module implements a single skill by the same name.  A cache expert is a table whose rows can be set to expire if not refreshed within some interval of time.  Cache functionality is described in other sections of this document because it has been a built-in feature.  It is still built-in because it is statically linked into nb, but is now packaged as a skill module.  This makes it relatively easy to experiment with enhanced or custom cache modules.

 

2.1        Event Caches

 

To perform real-time event correlation on a stream of events, we must compare the parameters of each new event with parameters of prior events.  For this purpose, NodeBrain provides a simple structure called an event cache, a memory resident table for relatively short-term storage of events.

 

An event cache is logically a table where each row represents an event, defined by the values in each column.  We implemented an event cache as a tree structure with embedded counters for measuring repetition and variation.  There may be any number of attributes (columns), but we’ll use three here to illustrate the concepts.

 

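            [Figure: event cache drawn as boxes (nodes) in columns Root, Attribute 1, Attribute 2, Attribute 3, with hit, kid, and row counts below each box]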


 

Each node in a cache structure contains an event attribute value and three counters: hits, kids, and rows.

 

  • Hits – number of times we’ve seen the value represented by a node
  • Kids – number of subordinate nodes in the next column (branches)
  • Rows – number of rows in the subordinate sub-cache (leaves)

 

The hit count is used to measure repetition.  When we “add” a row (set of values) to an event cache, if a node already exists for a given value, we just increment the hit counter instead of inserting a new node. The first number below a box (node) above represents the hit count.  You will notice that hit counts sum from right to left, so the total hit count in the root node is the sum of the hit counts in each column.

 

The kid and row counts are used to measure variation.  We show these counts above as the second and third number below a box.  The root node in our example has 3 kids and 7 rows.  The first node in the “Attribute 1” column has 2 kids and 3 rows. 

We can also represent an event cache as a sideways tree.  What we call the kid count is just the number of branches on the right side of a node.  What we call the row count is just the number of final nodes to the right of a node.  A final node on the right represents a complete row, and only has a hit count.  The event (A,B,B) has occurred 7 times (hit count is 7).  The event (A,B) has a hit count of 9, a kid count of 2 and a row count of 2.  The event (A) has a hit count of 14, a kid count of 2 and a row count of 3.  Notice that both hits and rows sum from right to left.

 

 

            [Figure: the same event cache drawn as a sideways tree, with columns Root, Attribute 1, Attribute 2, Attribute 3]

 

 

Events are retained in an event cache for a specified interval.  Let’s say the cache interval is 4 hours and we last asserted (B,B,A) at 02:00.  At 06:00, the hit counter on the final node A of (B,B,A) is decremented.  We then move from right to left decrementing counts down to the root node.  If a hit counter goes to zero, the node is removed.  This causes the kid count on the node to the left to be decremented.  So at any time the counters represent what has happened over the past cache interval.  This is a sliding interval, not a fixed interval.  At 06:23, the cache represents the activity from 02:23 to 06:23.
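The sliding interval can be sketched in Python as follows.  This is only an illustration under simplifying assumptions, not the module’s implementation: it keeps a flat hit count per row instead of a tree, time is passed in explicitly as minutes since midnight, and the names ExpiringRows, assert_row, and contains are hypothetical.

```python
import heapq


class ExpiringRows:
    """Flat stand-in for an event cache: hit counts per row, sliding interval."""

    def __init__(self, interval):
        self.interval = interval      # cache interval, in minutes here
        self.hits = {}                # row -> current hit count
        self.pending = []             # min-heap of (expiration time, row)

    def expire(self, now):
        """Apply every decrement that has come due by time `now`."""
        while self.pending and self.pending[0][0] <= now:
            _, row = heapq.heappop(self.pending)
            self.hits[row] -= 1
            if self.hits[row] == 0:
                del self.hits[row]    # row removed when its hit count reaches zero

    def assert_row(self, row, now):
        """Count one assertion and schedule its decrement one interval later."""
        self.expire(now)
        self.hits[row] = self.hits.get(row, 0) + 1
        heapq.heappush(self.pending, (now + self.interval, row))

    def contains(self, row, now):
        self.expire(now)
        return row in self.hits


cache = ExpiringRows(interval=4 * 60)                      # 4 hour interval
cache.assert_row(("B", "B", "A"), now=2 * 60)              # asserted at 02:00
still_there = cache.contains(("B", "B", "A"), now=5 * 60 + 59)   # checked at 05:59
gone = not cache.contains(("B", "B", "A"), now=6 * 60)           # checked at 06:00
```

Each assertion schedules its own decrement, so a row that is asserted repeatedly stays in the cache until one interval after its last assertion.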

 

This makes rules of the two forms listed below easy to implement.  Both use a cache (x,y) with an interval of T, and in both we assert (x,y) when we get (A,x,y).  For the first form, when we get (C,x,y) we test for (x,y) and respond if it is true.  For the second form, when we get (C,x,y) we assert not (x,y), which removes the entry, and we respond whenever an (x,y) entry expires.

 

1)  if (A,x,y) occurs followed by (C,x,y) within T seconds, then …

2)  if (A,x,y) occurs without (C,x,y) within T seconds, then …
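Both rule forms can be sketched in Python.  This is a minimal illustration, assuming events arrive as (time, kind, x, y) tuples in time order; the function names and the sample events are hypothetical and are not part of NodeBrain.

```python
def followed_within(events, T):
    """Form 1: report each (C,x,y) that arrives while (x,y) is still cached."""
    expires = {}                      # (x, y) -> time the entry expires
    matches = []
    for t, kind, x, y in events:
        if kind == "A":
            expires[(x, y)] = t + T   # assert (x,y)
        elif kind == "C" and expires.get((x, y), t) > t:
            matches.append((t, x, y))  # (x,y) tested true: respond
    return matches


def missing_within(events, T, end_time):
    """Form 2: report each (x,y) that expires without a (C,x,y) arriving."""
    expires = {}
    timeouts = []

    def flush(now):
        for key in [k for k, e in expires.items() if e <= now]:
            timeouts.append((expires.pop(key), *key))   # entry expired: respond

    for t, kind, x, y in events:
        flush(t)
        if kind == "A":
            expires[(x, y)] = t + T   # assert (x,y)
        elif kind == "C":
            expires.pop((x, y), None)  # assert not (x,y)
    flush(end_time)
    return sorted(timeouts)


events = [
    (0, "A", "h1", "disk"),
    (10, "A", "h2", "cpu"),
    (30, "C", "h1", "disk"),    # arrives inside the interval
    (100, "C", "h2", "cpu"),    # arrives after the entry has expired
]
followed = followed_within(events, T=60)
missing = missing_within(events, T=60, end_time=200)
```

With these sample events, the first form responds to (h1,disk) at time 30 and the second form responds to (h2,cpu) when its entry expires at time 70.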

 

Up to three thresholds may be defined for each type of counter (hits, kids, rows) for each attribute (column).  A reset threshold may also be specified.

 

 

 

When a counter crosses a threshold for the first time, it triggers a response.  We represent a trigger point with a vertical bar in the figure above.  The actual response is determined by rules that are described later.  For now, let’s just say the cache triggers the response rules.  Any given threshold (minor, major, or critical) will not trigger a second time for a given counter until the count has dropped to the reset threshold.  The reset threshold is used as evidence that we have ended an abnormal episode and returned to a normal state.  If a threshold is crossed again, it is treated as a new episode.  If a reset threshold is not specified, it defaults to zero.

 

 


 

2.2        Cache Definition

 

 

   Syntax:

   cacheDefineCmd    ::= define š term š expert cacheSpec •

   cacheSpec         ::= cache ":(" [ cacheRootSpec ":" ] cacheAttrList ")"

   cacheRootSpec     ::= [ "!" ] [ cacheInterval ] [ cacheThresholds ]

   cacheInterval     ::= "~(" integer ( s | m | h | d ) ")"

   cacheThresholds   ::= [ cacheHitSpec ] [ cacheKidSpec ] [ cacheRowSpec ]
   cacheHitSpec      ::= "(" cacheThreshold ")"
   cacheKidSpec      ::= "[" cacheThreshold "]"
   cacheRowSpec      ::= "{" cacheThreshold "}"
   cacheThreshold    ::= [ "^" integer "," ] integer [ "," integer [ "," integer ] ]

   cacheAttrList     ::= cacheAttrSpec { "," cacheAttrSpec }
   cacheAttrSpec     ::= term [ cacheThresholds ]

 

 

 

 

 

The cache module supports a relatively complex syntax for specifying various cache parameters.  This complexity is illustrated by the following example.

 

          define connie expert cache:(~(3h):x(100,200,300)[3,6,9]{10,20,30},y,z);

 

We describe the individual parameters in the following sections as we cover related topics.

 

2.3        Cache Attributes

 

Cache attributes are specified as a list of names within parentheses.  To define a cache named abc with attributes a, b and c, the following define statement is used.

 

          define abc expert cache:(a,b,c);

 

The terms a, b, and c are automatically defined within the context and may be referenced as if they had been explicitly defined as follows.

 

          abc. define a cell;

          abc. define b cell;

          abc. define c cell;

 


2.4        Cache Assertions

 

Rows are added to a cache, and counters incremented, when we assert values.

 

          abc. assert ("NodeBrain","patch","Solaris");
          abc. assert ("NodeBrain","patch","Linux");
          abc. assert ("NodeBrain","defect","HP-UX");
          abc. assert ("NodeBrain","patch","Linux");
          abc. assert ("cron","patch","Linux");

 

After the assertions above, the abc cache root node would have 5 hits, 4 rows and 2 kids.  Each unique partial row would have counts as shown.

 

          () 5 hits, 4 rows, 2 kids
          ("NodeBrain") 4 hits, 3 rows, 2 kids
          ("NodeBrain","patch") 3 hits, 2 rows, 2 kids
          ("NodeBrain","patch","Solaris") 1 hit
          ("NodeBrain","patch","Linux") 2 hits
          ("NodeBrain","defect") 1 hit, 1 row, 1 kid
          ("NodeBrain","defect","HP-UX") 1 hit
          ("cron") 1 hit, 1 row, 1 kid
          ("cron","patch") 1 hit, 1 row, 1 kid
          ("cron","patch","Linux") 1 hit
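The counting can be modelled with a small tree in Python.  This is a sketch for illustration only, not the module’s implementation; the class and method names (Node, EventCache, assert_row, counts) are hypothetical.  It replays the five assertions above and returns (hits, rows, kids) for any partial row.

```python
class Node:
    """One cache node: counters for a value plus its nodes in the next column."""

    def __init__(self):
        self.hits = 0         # times this (partial) row has been asserted
        self.rows = 0         # complete rows below this node
        self.children = {}    # value in next column -> Node

    @property
    def kids(self):
        return len(self.children)


class EventCache:
    def __init__(self):
        self.root = Node()

    def assert_row(self, *values):
        # Walk (and create) the path root -> ... -> final node for this row.
        path = [self.root]
        for value in values:
            path.append(path[-1].children.setdefault(value, Node()))
        is_new_row = path[-1].hits == 0
        for node in path:
            node.hits += 1                # repetition is counted at every level
        if is_new_row:
            for node in path[:-1]:
                node.rows += 1            # a new row adds variation above it

    def counts(self, *values):
        node = self.root
        for value in values:
            node = node.children[value]
        return node.hits, node.rows, node.kids


abc = EventCache()
abc.assert_row("NodeBrain", "patch", "Solaris")
abc.assert_row("NodeBrain", "patch", "Linux")
abc.assert_row("NodeBrain", "defect", "HP-UX")
abc.assert_row("NodeBrain", "patch", "Linux")
abc.assert_row("cron", "patch", "Linux")
```

Here abc.counts() describes the root node and abc.counts("NodeBrain", "patch") describes the partial row ("NodeBrain","patch").  A final node carries only a meaningful hit count; its row and kid counts stay at zero in this sketch.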

 

The assertion syntax illustrated above is normally used when we want to include assertions for terms within the expert’s context but not managed by the skill module.  This provides additional information for use by the expert’s rules.

          abc. assert ("NodeBrain","patch","Solaris"),component="nb",status="beta";

If that is not a requirement, one may prefer the following syntax.

          assert abc("NodeBrain","patch","Linux");

 

When a cache assertion is a rule action, avoiding the context prefix enables the elimination of the verb ASSERT and simplifies our reference to terms within the context where the rule is defined.

 

          define failedSystem expert cache:(~(20m):node);
          define event expert;
          event. assert ?.type,?.system,?.text;

          event. define r1 on(type="Failed") failedSystem(system);
          … instead of …
          event. define r1 on(type="Failed"):failedSystem. assert (event.system);
          …
          event. alert type="Failed",system="happynode.com";

 


 

 

2.5        Cache Intervals

 

A cache interval is optional.  If specified, it determines the life of cache assertions.  The following example specifies a cache interval of 4 hours.

 

          define abc expert cache:(~(4h):a,b,c);

 

Four hours after a row is asserted to this cache the hit counter for that row is decremented.  When a hit counter goes to zero, a row is removed.  This may have an impact on cache conditions and rules.

 

          abc("fred","joe","sam")               # cache condition

          abc. assert ("fred","joe","sam");     # cache assertion

 

If the assertion above is not repeated within four hours, the cache condition will transition from true to false when the row expires.  As long as the assertion is repeated in less than 4 hours the cache condition remains true.

 

Specifying a “!” in front of the time interval will cause the context to be alerted each time a row expires.

 

          define abc expert cache:(!~(4h):a,b,c);

 

 

2.6        Cache Thresholds

 

Cache thresholds are specified as a list of numbers following an attribute.  The type of threshold is specified by the choice of enclosing symbols.

 

          ()       - hits

          []       - kids

          {}      - rows

 

The command below defines a cache with a hit threshold of 20, a kid threshold of 3 and a row threshold of 10 for the attribute “a”.

 

          define abc expert cache:(a(20)[3]{10},b,c);

 

When a set of values is asserted to a cache and a threshold is reached, the cache expert is automatically alerted.  Rules are written to respond to these alerts.  We’ll cover that in a bit.

 

Four thresholds may be set for each counter of an attribute: reset, minor, major, and critical.  If a reset threshold is specified, it must be the first value in the list and must be prefixed with a caret “^”.  The following example specifies thresholds of 20, 100, and 250 for the hit count on any row in the cache.  A reset threshold of 5 is specified.

 

          define abc expert cache:(a,b,c(^5,20,100,250));

 

When a threshold is reached on a given counter the expert is alerted.  For a given counter to trigger on the same threshold again, it must first drop down to the reset threshold.  By default the reset threshold is zero.
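The interaction between thresholds and the reset threshold can be sketched in Python.  This is an illustration only, assuming a counter that moves one step at a time; the ThresholdCounter class and its method names are hypothetical, and the values match the (^5,20,100,250) example above.

```python
class ThresholdCounter:
    LEVELS = ("minor", "major", "critical")

    def __init__(self, thresholds, reset=0):
        self.thresholds = list(thresholds)   # up to three increasing limits
        self.reset = reset                   # defaults to zero, as in the text
        self.count = 0
        self.fired = set()                   # levels already triggered this episode

    def increment(self):
        """Count one assertion; return the level triggered, if any."""
        self.count += 1
        for level, limit in zip(self.LEVELS, self.thresholds):
            if self.count >= limit and level not in self.fired:
                self.fired.add(level)
                return level
        return None

    def decrement(self):
        """Count one expiration; dropping to the reset threshold ends the episode."""
        self.count -= 1
        if self.count <= self.reset:
            self.fired.clear()


counter = ThresholdCounter([20, 100, 250], reset=5)
first = [counter.increment() for _ in range(25)]    # count climbs from 0 to 25
for _ in range(15):
    counter.decrement()                             # 25 down to 10, above reset
second = [counter.increment() for _ in range(10)]   # 10 back up to 20
for _ in range(15):
    counter.decrement()                             # 20 down to 5, reset reached
third = [counter.increment() for _ in range(15)]    # 5 back up to 20
```

The minor threshold triggers once in the first pass, stays silent in the second pass because the count never dropped to 5, and triggers again in the third pass as a new episode.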

 

The following example specifies thresholds of 3, 9 and 27 for (a,b) kids, or the number of unique values of c for a given value of a and b.

 

          define abc expert cache:(a,b[3,9,27],c);

 

All of these examples may be combined in a single statement because each type of threshold may be specified for any attribute.  Note, however, that the last attribute only has a hit counter since it has no kids or subordinate rows.

          define abc expert cache:(a(20)[3]{10},b[3,9,27],c(^5,20,100,250));

 

To specify thresholds at the cache root level, use a colon before the first attribute.  The following definition establishes a hit threshold of 1000 and a kid threshold of 3 at the cache root level.  The abc expert will be alerted when 3 unique values of “a” are asserted or 1000 assertions are made.

 

          define abc expert cache:((1000)[3]:a,b[3,9,27],c);

 

 

 

2.7        Cache Rules

 

Cache rules operate like any other rule within a context, but you need to base them on terms that are implicitly defined and alert commands that come from “within” a cache when thresholds are reached.  Consider the following definitions.

 

          define abc expert cache:(~(4h):a,b[10,20],c(40,60));

          abc. define r1 if(b._kidState):$ action including $${a} $${b} $${b._kids}
          abc. define r2 if(c._hitState):$ action including $${a} $${b} $${c} $${c._hits}

 

When a cache assertion causes a counter to hit a threshold, the context is alerted with values assigned to special terms that describe the threshold condition.  For example, suppose the following assertion produces a tenth value of c for (a,b).  In other words, “sam” is the tenth value of c for (“fred”,“joe”) within the previous 4 hours.

          abc. assert ("fred","joe","sam");

The internal alert is equivalent to the following.

          abc. alert a="fred", b="joe", c="sam", b._kidState=1, b._kids=10;

 

2.8        Cache Terms

 

We have referenced terms that are automatically defined when a cache is defined and asserted by a cache when it alerts the context.  Here is a list of them.

 

          attribute             cache attribute (one for each column)

          attribute._hits       number of times the node has been asserted
          attribute._kids       number of nodes in the next column
          attribute._rows       number of rows a node participates in

          attribute._hitState   hit threshold reached
          attribute._kidState   kid threshold reached
          attribute._rowState   row threshold reached

          _interval             cache interval in text appropriate for a message
          _action               "expire" (only when "!" precedes the interval)

 

 

2.9        Cache Conditions

 

A cache condition returns a value of True if the parameter list matches a row in the cache, a value of Unknown if any of the arguments are Unknown, and otherwise returns False. 

 

Suppose we want to take some action when an event of Type T2 occurs within 5 minutes after an event of Type T1 if both events have the same value for attributes A and B.  This could be accomplished with the following rule set.

 

          define event expert;                                    # define a context to be alerted
          event. define t1ab expert cache:(~(5m):a,b);            # define cache
          event. define r1 if(Type="T1") t1ab(A,B);               # populate cache
          event. define r2 if(Type="T2" and t1ab(A,B)):action     # lookup

 

The cache condition t1ab(A,B) in rule r2 is True when the t1ab cache contains an entry for the current values of A and B.  If either A or B is Unknown, the cache condition is Unknown.  Otherwise, the cache condition is False.
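The three possible results of a cache condition can be sketched in Python.  This is an illustration only: the cache is modelled as a plain set of rows, Python’s None stands in for Unknown, and the function name cache_condition is hypothetical.

```python
UNKNOWN = None   # stand-in for NodeBrain's Unknown value


def cache_condition(rows, *args):
    """Return True, False, or UNKNOWN for a lookup of `args` in `rows`."""
    if any(arg is UNKNOWN for arg in args):
        return UNKNOWN               # an Unknown argument makes the condition Unknown
    return tuple(args) in rows       # otherwise True only if the row is present


t1ab = {("man", "happy"), ("sister", "good")}
hit = cache_condition(t1ab, "man", "happy")       # row is in the cache
miss = cache_condition(t1ab, "pilot", 52)         # row is not in the cache
unknown = cache_condition(t1ab, "man", UNKNOWN)   # one argument is Unknown
```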

 

The event stream for this context is generated through a series of commands of the following form.

 

          event. alert Type="type",A="a",B="b";

When an event of type T1 occurs, rule r1 asserts (A,B) to the cache.  This inserts an entry for the current values of A and B.  This entry expires 5 minutes later unless it is asserted again.  When an event of type T2 occurs, rule r2 will fire if the cache contains an entry for the values of A and B.  If the following events occur within a 5 minute period, the final event will cause rule r2 to fire.

          event. alert Type="T1",A="man",B="happy";
          event. alert Type="T2",A="pilot",B=52;
          event. alert Type="T1",A="sister",B="good";
          event. alert Type="T0",A="buddy",B="cool";
          event. alert Type="T2",A="man",B="happy";

         

If you defined the cache without scheduled expiration of entries, you must explicitly delete entries when appropriate.

 

          event. define t1ab expert cache:(a,b);      # define cache
          event.t1ab. assert ("abc","xyz");           # insert entry if new
          event.t1ab. assert !("abc","xyz");          # delete entry
          event.t1ab. assert !("abc");                # delete group of entries
          event.t1ab. assert !();                     # delete all entries
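The three delete forms differ only in how much of a row they name.  A minimal Python sketch, assuming the cache is modelled as a set of rows and that a delete removes every row beginning with the given values (the function name delete_matching and the sample rows are hypothetical):

```python
def delete_matching(rows, *prefix):
    """Return the rows that remain after deleting every row starting with `prefix`."""
    return {row for row in rows if row[:len(prefix)] != prefix}


rows = {("abc", "xyz"), ("abc", "uvw"), ("def", "xyz")}
one_entry = delete_matching(rows, "abc", "xyz")   # like assert !("abc","xyz")
one_group = delete_matching(rows, "abc")          # like assert !("abc")
everything = delete_matching(rows)                # like assert !()
```

An empty prefix matches every row, which is why the last form empties the cache.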

 

With or without an expiration period, you may want to delete entries based on some condition.  This is simply a way of forcing the cache condition to be False, just as asserting an entry forces it to be True.  So, you can think of a cache condition as a dynamic set of named Boolean switches.  You address a specific switch via the argument list.

 

Because a cache condition cache(x) is False when cache(x) has not been asserted or the assertion has expired, we say that a cache uses the closed world assumption.  This is just a name for saying what isn’t known to be true is assumed to be false.  Because of this assumption, a cache condition will never return a value of Unknown when the arguments are known; it is always True or False.  This means the condition ?cache(x) is never true.

 

Although it may seem logically inconsistent, a cache will accept an assertion to an Unknown state, and arrive at a False state.

 

          assert ?cache(x);  # make the condition cache(x) false

 

This is consistent with the closed world assumption.  Once we no longer know something to be True, it is immediately assumed to be False. 


Copyright © 2003-2006 The Boeing Company