OvercommitAggr-Check
Reports the overcommitment rate of one or more aggregates.
Usage
$ check_netapp_pro.pl OvercommitAggr -H <host> [-m relative|absolute|growth] [-w <n> -c <n>] -s <vserver|node> [...] [--help]
Description
The plugin calculates how far aggregates are overcommitted by thin-provisioned
volumes. It can check one, several or all aggregates (controlled by
--include / --exclude).
Depending on --metric, it checks either the absolute overcommitment in
bytes or the relative overcommitment in percent.
The special metric growth interpolates past growth rates into the future.
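To make the difference between the two non-growth metrics concrete, here is a minimal Python sketch. It assumes that the absolute overcommitment is the sum of the provisioned sizes of all volumes in an aggregate minus the aggregate size, and that the relative value expresses this as a percentage of the aggregate size; these definitions, as well as the names `Volume` and `overcommitment`, are illustrative assumptions, not taken from the plugin source.

```python
from dataclasses import dataclass

GiB = 1024 ** 3


@dataclass
class Volume:
    name: str
    aggregate: str
    size: int  # provisioned (thin) size in bytes


def overcommitment(aggr: str, aggr_size: int, volumes: list[Volume]) -> tuple[int, float]:
    """Return (absolute in bytes, relative in percent) for one aggregate.

    Assumed definitions (hypothetical, not from the plugin source):
      absolute = sum of provisioned volume sizes - aggregate size
      relative = absolute as a percentage of the aggregate size
    """
    provisioned = sum(v.size for v in volumes if v.aggregate == aggr)
    absolute = provisioned - aggr_size
    relative = 100.0 * absolute / aggr_size
    return absolute, relative


vols = [Volume("vol1", "aggr1", 600 * GiB), Volume("vol2", "aggr1", 600 * GiB)]
absolute, relative = overcommitment("aggr1", 1000 * GiB, vols)
print(absolute // GiB, relative)  # → 200 20.0
```

A negative result under these definitions simply means the aggregate is not overcommitted.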
Handling of size-less volumes
As DataONTAP may return volumes without any size attribute, these
volumes can be skipped and excluded from the calculations by setting the switch
--ignore_size_less_vol. We recommend checking carefully why such
volumes exist before setting this switch.
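The effect of --ignore_size_less_vol on the sum of volume sizes can be sketched as follows. This is a hypothetical helper, not the plugin's code: a size of `None` stands for a volume returned without a size attribute, and raising an error when the switch is not set is an assumption made only to show that such volumes cannot silently enter the calculation.

```python
from typing import Optional


def provisioned_size(sizes: dict[str, Optional[int]], ignore_size_less_vol: bool) -> int:
    """Sum the volume sizes of one aggregate (hypothetical helper).

    A value of None stands for a volume that DataONTAP returned without a size.
    """
    size_less = sorted(name for name, size in sizes.items() if size is None)
    if size_less and not ignore_size_less_vol:
        # Assumption: without the switch the calculation is refused.
        raise ValueError(f"volumes without size: {', '.join(size_less)}")
    # With the switch, size-less volumes are simply left out of the sum.
    return sum(size for size in sizes.values() if size is not None)


sizes = {"vol1": 500, "vol2": None, "vol3": 250}
print(provisioned_size(sizes, ignore_size_less_vol=True))  # → 750
```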
Understanding and using the growth-metric
This metric can be used to raise an alarm if the usage of an aggregate would reach a given threshold within a given time, based on the present trend. The most important values in that calculation are:
the retrospective period: --lookbehind
the forecast period: --lookahead
warning / critical thresholds for the usage at the end of the forecast period (in % of aggregate size): -w / -c
The check then takes the delta within the retrospective period, interpolates this trend into the future (until the end of the forecast period) and compares the interpolated value with the given thresholds.
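The calculation described above amounts to a linear projection. The following Python sketch assumes a straight-line trend and treats a threshold as reached when the projected value equals it; the function names are hypothetical and the exact arithmetic inside the plugin may differ. The numbers mirror the growth example further below (lookbehind of 24 hours, lookahead of 48 hours, warning at 80%).

```python
def projected_usage_pct(used_then: int, used_now: int, lookbehind_s: int,
                        lookahead_s: int, aggr_size: int) -> float:
    """Linear projection of aggregate usage to the end of the forecast period."""
    delta = used_now - used_then                           # growth within --lookbehind
    projected = used_now + delta * lookahead_s / lookbehind_s
    return 100.0 * projected / aggr_size


def state(pct: float, warn: float, crit: float) -> str:
    # Assumption: a threshold counts as reached when the value equals it.
    if pct >= crit:
        return "CRITICAL"
    if pct >= warn:
        return "WARNING"
    return "OK"


DAY = 86400
# All sizes in GiB: 1000 GiB aggregate, 650 GiB used one day ago, 700 GiB used now.
pct = projected_usage_pct(650, 700, lookbehind_s=1 * DAY, lookahead_s=2 * DAY, aggr_size=1000)
print(pct, state(pct, warn=80, crit=90))  # → 80.0 WARNING
```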
Collecting History (--metric=growth)
If the metric growth is selected, this check needs to collect its own
history, independent of the getters. (So for this check you do not need to set --stm,
the short-term memory (history), on any of the getters.)
The check requires a minimum amount of short-term memory about the past to be able to
interpolate historical trends into the future and calculate the growth
from them. The appropriate value for the short-term memory (--stm) depends on
the lookbehind value: the short-term memory must be at least as long as the
lookbehind period, preferably with some margin. E.g. for --lookbehind=1d
you should set --stm=1d, or even better --stm=25h.
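The relation between --lookbehind and --stm can be checked with a small Python sketch. It assumes duration values consisting of a number with an optional unit suffix (s, m, h, d), with a bare number read as seconds as in --stm=10; the parser and the function names are hypothetical, not the plugin's own option handling. The "borderline" case reflects a likely reason for the recommended margin: since the check runs at intervals, the oldest call-file kept by --stm=1d may be slightly younger than one day.

```python
import re

# Assumed unit suffixes; a bare number is read as seconds (as in --stm=10).
UNITS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(value: str) -> int:
    match = re.fullmatch(r"(\d+)([smhd]?)", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def check_stm(lookbehind: str, stm: str) -> str:
    lb, st = to_seconds(lookbehind), to_seconds(stm)
    if st < lb:
        return "too short: history can never cover the lookbehind"
    if st == lb:
        return "borderline: the oldest kept call-file may be slightly younger than the lookbehind"
    return "ok"


for stm in ("10", "1d", "25h"):
    print(stm, "->", check_stm("1d", stm))
```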
Handling of reduced or missing history
For aggregates without enough collected history, the check tries to calculate the growth
with a reduced lookbehind and prints a hint about this reduction. If the history is too
short for a meaningful reduction, the aggregate is not calculated at all and a value of 0%
is printed. In both cases the exit value is increased according to --reduced_history.
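The decision between full, reduced and no calculation might look like this Python sketch. The cut-off `min_history_s` for a "meaningful" reduction is a hypothetical parameter, since the actual limit is not documented here, and the escalation of the exit value via --reduced_history is not modeled.

```python
from typing import Optional


def effective_lookbehind(call_times: list[float], now: float, lookbehind_s: float,
                         min_history_s: float) -> tuple[Optional[float], bool]:
    """Return (lookbehind to use, reduced?).

    None means the aggregate is not calculated and 0% is reported.
    min_history_s is a hypothetical cut-off for a "meaningful" reduction.
    """
    if not call_times:
        return None, True
    available = now - min(call_times)        # how far the history reaches back
    if available >= lookbehind_s:
        return lookbehind_s, False           # full lookbehind possible
    if available < min_history_s:
        return None, True                    # too little history: report 0%
    return available, True                   # reduced lookbehind, print a hint


now = 100_000
print(effective_lookbehind([now - 43_200], now, 86_400, 3_600))  # → (43200, True)
```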
Debugging the growth-metric
Interpolating historical data into the future is not trivial; therefore configuring, and even more so debugging, the growth metric can produce surprising results unless you are aware of the background.
The point is that this check maintains its own history for the growth calculations. So each time you run this check, the following happens:
The present state of the volume sizes is saved into a call-file (a call-file is the container for data at a given point in time).
Any call-file older than the value given with --stm is deleted.
This has the following consequences:
If you run this check with e.g. --stm=10, you will delete nearly all history (only call-files from within the last 10 seconds would be kept). Therefore the maximum lookbehind that could return a calculated growth in that scenario would be a ridiculous 10 seconds.
If you run this check manually (on the command line), you have to take into account the interval between the program invocations. E.g. running this check twice within 20 seconds with a lookbehind of one day (1d) cannot return a valid growth value, as the history reaches just 20 seconds into the past. Only if you have already run the check before with the same store-dir may you choose a longer lookbehind period.
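Both consequences can be reproduced with an in-memory simulation of the two steps performed on every run (save a call-file, then delete call-files older than --stm). `CallFileStore` is a hypothetical stand-in for the store-dir; the real check writes files, and its exact age comparison is assumed here.

```python
from dataclasses import dataclass, field


@dataclass
class CallFileStore:
    """In-memory stand-in for the store-dir; the real check writes files."""
    stm_s: float
    calls: list[tuple[float, dict[str, int]]] = field(default_factory=list)

    def run(self, now: float, volume_sizes: dict[str, int]) -> None:
        # 1. Save the present state of the volume sizes as a new call-file.
        self.calls.append((now, dict(volume_sizes)))
        # 2. Delete every call-file older than --stm.
        self.calls = [(t, d) for t, d in self.calls if now - t <= self.stm_s]

    def history_span(self, now: float) -> float:
        """How far the remaining history reaches into the past."""
        return now - min(t for t, _ in self.calls) if self.calls else 0


short = CallFileStore(stm_s=10)        # like --stm=10
long = CallFileStore(stm_s=90_000)     # like --stm=25h
for store in (short, long):
    store.run(0, {"vol1": 100})
    store.run(20, {"vol1": 110})       # second manual run 20 s later
print(short.history_span(20), long.history_span(20))  # → 0 20
```

Even with the generous --stm, two manual runs 20 seconds apart leave only 20 seconds of history, so a one-day lookbehind cannot be satisfied.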
Recommendations for the growth metric:
Try to use the default values for --lookbehind and --stm first.
If debugging on the command line, choose a temporary storedir different from the one configured for the monitoring system.
If you want to debug the existing historical data by running the check on the command line, choose a large --stm so that you do not accidentally delete the long-lasting history of call-files. An overly long --stm does no harm if the check is run only occasionally.
Adding --explore=history when running the check with --metric=growth will show you how far the history stretches into the past and give you an idea of the possible lookbehind.
When configuring the service checks, choose the --stm carefully. It should be a bit longer than the largest lookbehind you are planning to use, but not too long, as this may fill the monitoring server with unnecessary call-files.
Simple Examples
Checks the filer. Returns a list of aggregates together with their overcommitment in percent.
Returns a list of aggregates together with their overcommitment in bytes. Warns if the sum of all thin-provisioned volumes within one aggregate exceeds the aggregate size by more than 100 GiB.
Checks the growth of volumes per aggregate. Sends a warning message if the growth recorded during the last 24 hours would lead to aggregate usage exceeding 80% in the next 48 hours.
Same as above, but harder to read for most human beings.