Availability Measurement
Availability is a measure of successful polling operations versus failed polling operations. The Availability Success Rate is expressed as a percentage and reflects how often Lithium was able to obtain monitoring data from the Device. An average Response Time is also tracked for the polling operations. A single SNMP GET request for one SNMP OID is an example of a single polling operation for Availability.
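As a rough illustration of the arithmetic described above, the following Python sketch computes a Success Rate and an average Response Time from polling results. The function names are hypothetical and are not part of Lithium. The behaviour when no polling operations have been recorded is an assumption (the sketch returns None), and no unit is assumed for the response times.

```python
from statistics import mean

def success_rate(successes, failures):
    """Return the percentage of polling operations that succeeded.

    Assumption: with no operations recorded there is no meaningful
    rate, so None is returned rather than 0 or 100.
    """
    total = successes + failures
    if total == 0:
        return None
    return 100.0 * successes / total

def average_response_time(response_times):
    """Return the mean response time of the polling operations.

    The unit is whatever the caller supplies; None if the list is empty.
    """
    return mean(response_times) if response_times else None

# Nine SNMP GET requests answered, one timed out.
print(success_rate(9, 1))  # 90.0
```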
All Devices have an "Availability" container that contains a "Master" object and an object per protocol being used to monitor the device. The "Master" object is an aggregated total of the Successful and Failed polling operations across all protocols being used to monitor the Device.
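The aggregation into the "Master" object can be sketched as follows, assuming it is a plain sum of the Successful and Failed counts from each protocol object. The class, function, and protocol key names are hypothetical and only illustrate the idea; they do not reflect Lithium's internal data model.

```python
from dataclasses import dataclass

@dataclass
class AvailabilityCounts:
    """Successful and failed polling operations for one Availability object."""
    successes: int = 0
    failures: int = 0

    def rate(self):
        """Success Rate as a percentage, or None if nothing was polled (assumption)."""
        total = self.successes + self.failures
        return None if total == 0 else 100.0 * self.successes / total

def master_counts(per_protocol):
    """Sum the counts of every protocol object into one "Master" total."""
    master = AvailabilityCounts()
    for counts in per_protocol.values():
        master.successes += counts.successes
        master.failures += counts.failures
    return master

# Hypothetical device monitored over two protocols.
device = {
    "snmp": AvailabilityCounts(successes=8, failures=2),
    "icmp": AvailabilityCounts(successes=10, failures=0),
}
master = master_counts(device)
print(master.successes, master.failures, master.rate())  # 18 2 90.0
```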
Triggers (thresholds) are applied to the "Success Rate" Metric under the "Master" Availability object. The default Triggers are as follows:
- If the Success Rate remains at 0% for twice the Device Refresh Interval then a Critical Incident is raised.
- If the Success Rate remains between 61% and 90% for 10 minutes then a Warning Incident is raised.
- If the Success Rate remains between 1% and 60% for 10 minutes then an Impaired Incident is raised.
The Triggers are designed to raise a Critical Incident, alerting to a total failure in communication with the monitored device, if this condition persists for twice the configured Refresh Interval. That is, if the device does not respond to any requests for monitoring data in two consecutive polling intervals then a Critical Incident is raised.
If there is a persistent partial loss of communication with the device for 10 minutes then a Warning or Impaired Incident is raised.
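The default Triggers can be summarised in a short Python sketch. It is an illustration of the rules as described here, not Lithium's implementation, and the function names are hypothetical. Two assumptions are made: a Success Rate above 0% but below 1% is treated as Impaired, and the caller already knows how long the Success Rate has stayed continuously within its current range.

```python
PARTIAL_LOSS_SECONDS = 10 * 60  # Warning and Impaired must persist for 10 minutes

def severity_band(rate):
    """Map a Master Success Rate (percent) to the Incident it could raise."""
    if rate == 0:
        return "Critical"
    if rate <= 60:   # assumption: rates between 0% and 1% count as Impaired
        return "Impaired"
    if rate <= 90:
        return "Warning"
    return None      # above 90%: no default Trigger applies

def incident(rate, seconds_in_band, refresh_interval_seconds):
    """Return the Incident to raise, or None if the condition has not persisted.

    seconds_in_band is how long the rate has stayed in its current range.
    """
    severity = severity_band(rate)
    if severity is None:
        return None
    if severity == "Critical":
        required = 2 * refresh_interval_seconds
    else:
        required = PARTIAL_LOSS_SECONDS
    return severity if seconds_in_band >= required else None

# With a 5 minute Refresh Interval, total failure must last 10 minutes.
print(incident(0, 600, 300))   # Critical
print(incident(75, 600, 300))  # Warning
print(incident(40, 599, 300))  # None
```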