About Dependencies
Most of the check_netapp_pro check-scripts get their data from a local
store-file, which is written by a collector (also known as a
getter, because the collector scripts are named get_netapp*.pl).
The getter-scripts are configured in the monitoring-system and run as if
they were normal service-checks.
Some very simple checks do not require any collector. They run like a traditional stand-alone plugin and retrieve their data directly from the filer.
Please keep in mind that all checks from the Performance-Bundle get their
data through a special performance-data getter (get_netapp_perfdata.pl).
Examples
The following examples show how the data for specific checks is collected. The relation between each check and the objects it depends on is fully documented in the tables below.
Example 1: Single Dependency
The following example shows which getter must have been run so that the Usage of volumes can be checked on a cluster-mode filer:
./get_netapp_cm.pl -H my_cdot_filer --object=volume
Same as above but for a 7-mode filer:
./get_netapp_7m.pl -H my_7m_filer --object=volume
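If one monitoring configuration covers both cluster-mode and 7-mode filers, the choice of the generic getter can be scripted. The following is a minimal POSIX shell sketch, not part of check_netapp_pro: the helper names `getter_for_mode` and `collect_volumes` and the `RUN` switch are hypothetical, and only the script names and the `-H`/`--object` options from the commands above are used. By default it only prints the commands; set `RUN=1` to execute them.

```shell
#!/bin/sh
# Hypothetical helper: map a filer mode to the matching generic getter.
# "cm" = cluster-mode (cDOT), "7m" = 7-mode.
getter_for_mode() {
  case "$1" in
    cm) echo "./get_netapp_cm.pl" ;;
    7m) echo "./get_netapp_7m.pl" ;;
    *)  echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Collect the volume object for the given mode and filer.
# Prints the command by default; set RUN=1 to execute it.
collect_volumes() {
  mode="$1"
  filer="$2"
  getter=$(getter_for_mode "$mode") || return 1
  if [ "${RUN:-0}" = 1 ]; then
    "$getter" -H "$filer" --object=volume
  else
    echo "$getter -H $filer --object=volume"
  fi
}

collect_volumes cm my_cdot_filer   # cluster-mode command
collect_volumes 7m my_7m_filer     # 7-mode command
```

Keeping the mode decision in one place means the rest of the configuration only has to pass the filer name and its mode.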
Example 2: Performance Data
To check the per-volume latency (PerfVolume), the performance getter must have been run at least twice, configured with the correct object and mode as shown below:
./get_netapp_perfdata.pl -H my_cdot_filer --object=volume --mode=cm
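Because the performance getter must have run at least twice before PerfVolume has data, a newly added filer can be seeded by hand. The following is a minimal shell sketch, assuming a pause of 60 seconds between the two runs (`INTERVAL` is an arbitrary choice, not a documented requirement); in normal operation the monitoring system's scheduling provides the spacing. `run` is a hypothetical helper that prints the command unless `RUN=1` is set.

```shell
#!/bin/sh
# Seed the performance store for PerfVolume by running the perfdata getter twice.
# FILER and INTERVAL are example values; INTERVAL=60 is an arbitrary assumption.
FILER="${FILER:-my_cdot_filer}"
INTERVAL="${INTERVAL:-60}"

# Hypothetical helper: print the command, or execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

seed_perfdata() {
  i=1
  while [ "$i" -le 2 ]; do
    run ./get_netapp_perfdata.pl -H "$FILER" --object=volume --mode=cm
    # Only pause between real runs, not after the last one or in print mode.
    if [ "$i" -lt 2 ] && [ "${RUN:-0}" = 1 ]; then
      sleep "$INTERVAL"
    fi
    i=$((i + 1))
  done
}

seed_perfdata
```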
Example 3: Complex Dependency
Some checks need data from more than one object. The following example shows the requirements for monitoring the overcommitment of aggregates with the check OvercommitAggr:
./get_netapp_cm.pl -H my_cdot_filer --object=volume
./get_netapp_cm.pl -H my_cdot_filer --object=aggregate
The next example shows the requirements for checking Snapshots. It also shows that an object's data is retrieved either by the generic getter get_netapp_cm.pl or, in some cases, by a specialized getter for that object such as get_netapp_snapshot.pl.
./get_netapp_cm.pl -H my_cdot_filer --object=volume
./get_netapp_snapshot.pl -H my_cdot_filer # the snapshot-object has its own getter
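Where a check needs several objects, the getter calls can be grouped per check so that none is forgotten. The POSIX shell sketch below bundles the two requirements from the examples above; the function names and the `RUN` switch are hypothetical, and by default the commands are only printed.

```shell
#!/bin/sh
# Collect all objects that a check depends on, per the examples above.
FILER="${FILER:-my_cdot_filer}"

# Hypothetical helper: print the command, or execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# OvercommitAggr needs both the volume and the aggregate object.
collect_for_overcommit() {
  for obj in volume aggregate; do
    run ./get_netapp_cm.pl -H "$FILER" --object="$obj"
  done
}

# Snapshots needs the volume object from the generic getter
# plus the snapshot data from its specialized getter.
collect_for_snapshots() {
  run ./get_netapp_cm.pl -H "$FILER" --object=volume
  run ./get_netapp_snapshot.pl -H "$FILER"
}

collect_for_overcommit
collect_for_snapshots
```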
Dependencies by Bundle and Check
The tables below list all dependencies between checks and the objects collected by the getters. If no dependency is listed, the check is a traditional stand-alone check.
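When many checks are deployed, it can help to keep a machine-readable excerpt of these tables. The following shell sketch encodes a few rows from the Advanced and Base bundles as a `case` lookup and prints the matching getter calls. It assumes that every encoded object is collected on a cluster-mode filer by the generic `get_netapp_cm.pl` getter; objects with their own getters (such as the snapshot data) and all Performance-Bundle objects are deliberately left out, and `objects_for_check`/`print_getters` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical excerpt of the dependency tables as a lookup function.
# Only a few Advanced/Base bundle rows are encoded here.
objects_for_check() {
  case "$1" in
    AggregateState|Raidstatus|StorageUtilization) echo "aggregate" ;;
    AutosizeMode|VolumeState|SnapshotLessVolume)  echo "volume" ;;
    OvercommitAggr|Usage)                         echo "volume aggregate" ;;
    Disk|DiskCount|DiskPaths)                     echo "disk" ;;
    LunAlignment|LunSize|LunState)                echo "lun" ;;
    check_netapp*)                                echo "" ;;  # stand-alone, no getter
    *) return 1 ;;                                            # not in this excerpt
  esac
}

# Print the getter calls a check needs, ASSUMING every object above is
# collected by the generic cluster-mode getter get_netapp_cm.pl.
print_getters() {
  check="$1"
  filer="$2"
  objs=$(objects_for_check "$check") || {
    echo "no entry for $check in this excerpt" >&2
    return 1
  }
  for obj in $objs; do
    echo "./get_netapp_cm.pl -H $filer --object=$obj"
  done
}

print_getters OvercommitAggr my_cdot_filer
```

Stand-alone checks (all check_netapp* scripts in the tables) yield no getter calls, which mirrors the rule that an empty dependency column means no collector is needed.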
Advanced Bundle
check | depends on object(s) |
---|---|
AggregateState checks the aggregate states and alarms if an aggregate is not online (configurable). | aggregate |
AutosizeMode checks whether all autosized volumes are set to the given autosize mode (grow, grow_shrink, ...). | volume |
check_netapp7_cluster checks the status of the high availability service (connected, taken over, takeover failed, ...). | |
check_netapp7_fcpstats monitors the FCP adapters for CRC errors and other values. | |
check_netapp7_snapvault monitors the status and lag time of SnapVault relations on 7m filers only (cDOT filers are checked with the SnapMirror checks). | |
check_netapp7_vfiler monitors the status of a vFiler (whether the vFiler is running and whether its network resources are configured). | |
check_netapp_anycli is used for building checks from simple CLI commands. | |
check_netapp_asup monitors the ASUP log and alarms if failed transmissions or collections are found. | |
check_netapp_license checks the filer for expiring (demo) licenses. | |
check_netapp_nfs-persist checks for non-persistent NFS shares. | |
check_netapp_process checks for runaway processes on a filer (as shown by the ps command). | |
check_netapp_quotas monitors quotas on a NetApp filer (cluster mode only). | |
check_netapp_scrub sends an alarm if the timestamp of an aggregate's last scrub exceeds a certain age. | |
check_netapp_takeover sends an alarm if the storage failover facility is disabled or otherwise not active. | |
check_netapp_time checks the filer's NTP configuration (at least one NTP server must be configured) and measures the drift between the filer's system time and the monitoring server. It can alarm if that drift gets too high. | |
check_netapp_unused_lun checks for LUNs which are online but have no initiator connected. | |
DiskCount counts the disks matching definable criteria (disk type, container (spare, ...), storage pool). Mostly used to monitor the number of spare disks of a certain type. | disk |
DiskPathQuality checks disk path qualities, reports I/O error percentages and raises a CRITICAL error whenever an error percentage is above zero. | disk |
DiskPaths checks whether each disk has a given number and pattern of paths (A/B, B/A, ABAB, ABBA, ...). | disk |
FCPAdapter checks the operational status of all FCP adapters. | fcp-adapter |
IfGrp checks whether an interface group has enough links in up state to still be redundant. | ifgroup (7m), net-port (cm), ifconfig (7m) |
Job checks for failed jobs. | job |
LunAlignment searches for misaligned LUNs and alarms if a certain number of misaligned LUNs is reached. | lun |
LunSize checks the unused but allocated blocks inside a LUN and notifies the admin if they exceed a certain number (the admin may then run an unmap procedure on VMware). | lun |
LunState checks the LUN states and alarms if a LUN is offline or not mapped to an initiator. | lun |
NetInterface checks whether a network interface's current port differs from its home port (output of the CLI command `network interface show -is-home false`). It can also check the interface's operational mode (up/down). | net-interface (cm) |
OvercommitAggr returns a list of aggregates together with their overcommitment in percent. Overcommitment is the relation between an aggregate's size and the total size of all its (thin-provisioned) volumes. | volume, aggregate |
Raidstatus alarms if one of the RAIDs is degraded. | aggregate |
ReportIOPS reports how many IOPS are consumed by a given tenant. | volume |
ReportSpace reports how much space (in bytes) is consumed by a given tenant. | volume |
ServiceProcessor checks the status of the nodes' service processors and whether they are correctly configured (autoupdate, IP address). | service-processor |
ShelfBay checks the shelf and disk port status. It can alarm on disks in BYP status. | shelf-bay |
Sis checks dedup values (stale fingerprint percentage, run time of the last successful operation). | sis |
SisStatus finds volumes whose compression or deduplication is not enabled. | sis |
SnapMirrorMetrics checks and logs SnapMirrors (including type Vault): lag time, last transfer duration, last transfer size. | snap-mirror |
SnapMirrorState checks and logs SnapMirrors (including type Vault): health, mirror state. | snap-mirror |
SnapshotChangeRate calculates and monitors the change rate (daily data change) of snapshots in gigabytes per day. | volume |
SnapshotLessVolume searches for volumes which do not have snapshots. | volume |
StorageUtilization answers the question “Am I effectively using the storage capacity available to my applications?” | aggregate |
UnprotectedVolume checks for volumes not protected by SnapMirror. | volume, snapmirror-destination |
UsageTrend estimates how long it would take until an aggregate or volume is full if the trend of the last 48h (configurable) continued. It checks both bytes and inodes. | aggregate, volume |
VolumeAge searches for and flags volumes which were created a (configurable) long time ago. A high age may indicate a forgotten and unused volume clone. The logic can also be inverted to search for volumes with an exceptionally short age (created within the last day or so). | volume |
VolumeAutosize checks a volume's total size and alerts when the volume is close to being full relative to the autosize maximum. | volume |
VolumeState checks the volume states and alarms if a volume is not online (configurable). | volume |
Vserver monitors the admin state or the operational status of a Vserver (running, stopped, inconsistent or defunct). | vserver |
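The OvercommitAggr row describes overcommitment as the relation between an aggregate's size and the total size of its volumes, but leaves the exact formula open. One plausible reading, used here purely as an assumption, is the sum of the volume sizes divided by the aggregate size, expressed in percent, so that values above 100 mean more space is provisioned than physically exists. A small shell/awk sketch of that arithmetic with made-up sizes (`overcommit_percent` is a hypothetical name):

```shell
#!/bin/sh
# ASSUMED formula (not confirmed by the plugin documentation):
#   overcommitment % = (sum of volume sizes) / (aggregate size) * 100
# Arguments: aggregate size, then the sizes of its volumes (same unit, e.g. GB).
overcommit_percent() {
  aggr_size="$1"
  shift
  printf '%s\n' "$@" | awk -v size="$aggr_size" '
    { total += $1 }
    END {
      if (size + 0 <= 0) { print "aggregate size must be > 0" > "/dev/stderr"; exit 1 }
      printf "%.1f\n", total / size * 100
    }'
}

# Made-up example: a 1000 GB aggregate carrying three thin-provisioned volumes.
overcommit_percent 1000 600 500 400
```

With these numbers the sketch prints 150.0, i.e. 1500 GB provisioned on 1000 GB of aggregate space.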
Base Bundle
check | depends on object(s) |
---|---|
check_netapp7_head monitors the 7m heads' hardware objects (fans, NVRAM, power supplies and temperature sensors). | |
check_netapp_health monitors the system health and sends an alarm if the system health status is anything other than 'ok'. | |
check_netapp_spare monitors the status of the spare-low condition (alarms if no suitable spare disk is available). | |
Disk checks for failed, offline or unassigned disks on the filer. | disk |
Head monitors the head's hardware objects (fans, NVRAM, power supplies, health state, temperature sensors). | head |
NetPort checks whether the network interfaces are enabled. | net-port |
NetPort7m checks whether the network interfaces are enabled. | ifconfig |
ShelfEnvironment checks the shelf status, power supplies, temperature, fans, voltage sensors and current sensors on the shelves. | shelf-environment |
Snapshots checks whether the snap reserve is still sufficient. Thresholds are set in percent; performance data can be either in percent or absolute (bytes). Additional criteria are the age or name of the snapshot. This can be used to monitor snapshot backups and whether they are up to date, or to find snapshots related to a specific application like SNMV and check all volumes for left-over snapshots. | vol_snapshot, aggr_snapshot, volume, aggregate |
Uptime checks the seconds since the last reboot. | head |
Usage checks the used space in volumes and aggregates. Thresholds can be set in GB or percent. | volume, aggregate |
MetroCluster Bundle
check | depends on object(s) |
---|---|
check_netapp_mc_config checks a MetroCluster's mode and configuration state. | |
ClusterPeerHealth checks the health of cluster peer relationships by evaluating several ping and health statuses. | cluster-peer-health |
MetroClusterVserver sends an alarm if the configuration state of a MetroCluster vserver changes to unhealthy. | metrocluster-vserver |
SyncMirror checks the mirror status of MetroCluster aggregates. | aggregate |
Performance Bundle
check | depends on object(s) |
---|---|
BadlyPerformingDisks checks all disks in a NetApp system or in a specific RAID group. If a certain number of them perform badly (i.e. have a high utilization), an alarm is sent. | disk |
BufferCache checks several metrics of the system buffer cache (i.e. system memory), like buffers being read, buffers being written, empty (unused) buffers, buffers with modified data, buffers associated with CP IO, ... | bufcache |
FlashCache checks several metrics of the external FlashCache (PAM II), like external cache hit rate, average latency of read I/Os, number of WAFL buffers served off the external cache, ... | ext_cache_obj |
LunLatency checks the 'latency' and 'operations per second' (ops) per LUN. Shows details for total, read, write and other. NetApp recommends monitoring latency as the primary performance indicator. | lun |
NVRAM checks data rates and latency of the NVRAM. | nvram |
PerfAggregate checks the 'latency', 'transfer-rate' and other performance counters per aggregate. Shows details for total, read, write and other. Averages and totals over all aggregates of the filer can also be measured and monitored, which allows monitoring the aggregate latency and aggregate transfer rate on the filer level. | aggregate |
PerfCpu checks one or all processors in a NetApp system for their utilization. | processor |
PerfDisk checks all disks in a NetApp system for their utilization (percentage of time there was at least one outstanding request to the disk). Optionally, the check can be limited to the disks of a single aggregate. | disk, disk_constituent |
PerfHostadapter checks and counts rates per host adapter (Fibre Channel, Serial Attached SCSI, and parallel SCSI). | hostadapter |
PerfIf checks and counts transfer rates and errors per network interface (ifnet). Especially useful for monitoring 10GbE ports. | ifnet |
PerfLif checks and counts transfer rates and errors per network interface (lif) for DataONTAP 8.2.x or higher. | lif |
PerfQtree checks some ops counters per qtree (nfs-ops, cifs-ops, ...). | qtree |
PerfSys checks various performance counters of the NetApp system (mostly operations/second and transfer rates). Counters supported: net_data_sent, dafs_ops, total_ops, disk_data_written, net_data_recv, cifs_ops, streaming_pkts, http_ops, nfs_ops, fcp_ops, disk_data_read, iscsi_ops. | system |
PerfSysNode checks various performance counters of the NetApp system (mostly operations/second and transfer rates). Counters supported: net_data_sent, dafs_ops, total_ops, disk_data_written, net_data_recv, cifs_ops, streaming_pkts, http_ops, nfs_ops, fcp_ops, disk_data_read, iscsi_ops. The check evaluates these counters per node and works only with DataONTAP 8.3 or later. | system_node |
PerfTcpIp checks CRC errors and packets sent/received for both the IP and TCP layer. | tcp, ip |
PerfVolume checks the 'latency' and 'operations per second' (ops) per volume. Shows details for total, read, write and other. NetApp recommends monitoring latency as the primary performance indicator. | volume |
Wafl reads WAFL performance counters like cp_count twice and calculates the rate of CPs per second. Different types of consistency points (wafl-timer, back-to-back, ...) can be checked. The information gathered by this plugin corresponds to the CPty column of 'sysstat -x 1'. | wafl |
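The Wafl row describes reading a counter such as cp_count twice and deriving a per-second rate. The underlying arithmetic is the difference of the two counter values divided by the elapsed time between the readings. The following is a minimal sketch with made-up sample values; `counter_rate` is a hypothetical helper and is not claimed to be how the plugin computes the rate internally.

```shell
#!/bin/sh
# Derive a per-second rate from two readings of an increasing counter
# such as the WAFL cp_count. counter_rate is a hypothetical helper.
# Arguments: value1 time1 value2 time2 (times in epoch seconds).
counter_rate() {
  awk -v c1="$1" -v t1="$2" -v c2="$3" -v t2="$4" 'BEGIN {
    if (t2 + 0 <= t1 + 0) { print "second sample must be later than the first" > "/dev/stderr"; exit 1 }
    if (c2 + 0 < c1 + 0)  { print "counter decreased (reset?)" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", (c2 - c1) / (t2 - t1)
  }'
}

# Made-up samples taken 60 seconds apart.
counter_rate 120000 1700000000 120150 1700000060
```

Here 150 consistency points over 60 seconds give 2.50 CPs per second. A single reading is not enough for such a rate, since there is no earlier value to subtract.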