Install the Nagios plugins:

Download: http://7.down.119g.com:7766/7/52DB48B15572B98C6FCD8AAEC2EF4D2AAD7640D3/nagios-plugins-1.4.16.tar.gz

    tar zxvf nagios-plugins-1.4.16.tar.gz
    cd nagios-plugins-1.4.16
    ./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios
    make && make install
Install NRPE:

Download: http://jaist.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.13/nrpe-2.13.tar.gz

    tar zxvf nrpe-2.13.tar.gz
    cd nrpe-2.13
    ./configure
    make all
    make install-plugin
    make install-daemon
    make install-daemon-config
    make install-xinetd
Download the check_megaraid_sas script, a Nagios plugin written in Perl that collects monitoring data through the MegaCli command.

Download: http://www.techno-obscura.com/~delgado/code/check_megaraid_sas

Edit the script:

    # vi check_megaraid_sas

a. Find line 35:

    use lib qw(/usr/lib/nagios/plugins /usr/lib64/nagios/plugins); # possible pathes to your Nagios plugins and utils.pm

Change it to:

    use lib qw(/usr/local/nagios/libexec); # possible pathes to your Nagios plugins and utils.pm

Note: /usr/local/nagios/libexec is the plugin path on the host where NRPE runs.
b. Find lines 52-53:

    my $megaclibin = '/usr/sbin/MegaCli'; # the full path to your MegaCli binary
    my $megacli = "sudo $megaclibin"; # how we actually call MegaCli

Change them to (the only difference is that sudo is dropped):

    my $megaclibin = '/usr/sbin/MegaCli'; # the full path to your MegaCli binary
    my $megacli = "$megaclibin"; # how we actually call MegaCli

Note: /usr/sbin/MegaCli is the absolute path to the MegaCli binary.
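The two edits can also be applied without opening an editor. The following is only a sketch: the function name `patch_check_megaraid_sas` is hypothetical, and it assumes the two lines in your copy of the script start at column one and match the text quoted above. If your copy differs, sed leaves those lines unchanged, so check the result afterwards.

```shell
#!/bin/bash
# Sketch: apply edits (a) and (b) to check_megaraid_sas with sed.
# patch_check_megaraid_sas is a hypothetical helper name, not part of the plugin.
#   $1 = path to the script
#   $2 = plugin directory (assumed default: /usr/local/nagios/libexec)
patch_check_megaraid_sas() {
    local script=$1 libexec=${2:-/usr/local/nagios/libexec}
    # -i.bak edits the file in place and keeps the original as "$script.bak".
    # First expression: replace the directory list inside "use lib qw(...)".
    # Second expression: drop "sudo" from the MegaCli call.
    sed -i.bak \
        -e "s|^use lib qw([^)]*);|use lib qw($libexec);|" \
        -e 's|^my \$megacli = "sudo \$megaclibin";|my $megacli = "$megaclibin";|' \
        "$script"
}
```

Calling `patch_check_megaraid_sas ./check_megaraid_sas` rewrites the file and leaves a `check_megaraid_sas.bak` copy to diff against.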
#!/bin/bash
# check memory script
# sunny 2008.2.15

# Total memory (MB)
TOTAL=`free -m | head -2 | tail -1 | gawk '{print $2}'`
# Free memory (MB)
FREE=`free -m | head -2 | tail -1 | gawk '{print $4}'`
# Free percentage: free * 100 / total
FREETMP=`expr $FREE \* 100 / $TOTAL`

# 15% or more free: OK
if [ $FREETMP -ge 15 ]
then
    echo "OK: The total memory is $TOTAL MB,the free memory is $FREE MB($FREETMP%)"
    exit 0
fi
# 6% to 14% free: WARNING (both conditions must hold, hence && rather than ||)
if [ $FREETMP -ge 6 ] && [ $FREETMP -lt 15 ]
then
    echo "WARNING: The total memory is $TOTAL MB,the free memory is $FREE MB($FREETMP%)"
    exit 1
fi
# 5% or less free: ERROR
if [ $FREETMP -le 5 ]
then
    echo "ERROR: The total memory is $TOTAL MB,the free memory is $FREE MB($FREETMP%)"
    exit 2
fi
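The threshold logic of the memory check can be isolated into a small function, which makes the boundaries easy to test without calling `free`. This is a minimal sketch: the name `classify_free_pct` is hypothetical, and the defaults (15 and 6) are taken from the script above. The return values follow the usual Nagios plugin convention of 0 for OK, 1 for WARNING and 2 for CRITICAL.

```shell
#!/bin/bash
# Sketch: map a free-memory percentage to a Nagios state.
# classify_free_pct is a hypothetical helper; thresholds default to the
# values used in the memory check script (warn below 15, critical below 6).
#   $1 = free memory in percent (integer)
#   $2 = warning threshold, $3 = critical threshold (both optional)
classify_free_pct() {
    local pct=$1 warn=${2:-15} crit=${3:-6}
    if [ "$pct" -ge "$warn" ]; then
        echo "OK"
        return 0
    elif [ "$pct" -ge "$crit" ]; then
        echo "WARNING"
        return 1
    else
        echo "CRITICAL"
        return 2
    fi
}
```

Using if/elif/else means every value falls into exactly one branch, so a misplaced `||` cannot hide the critical case.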
LVS monitoring:
vi check_lvs.sh
MySQL monitoring:
On the MySQL server to be monitored, create a database dedicated to Nagios:
    mysql> create database nagdb default CHARSET=utf8;
    mysql> grant select on nagdb.* to 'nagios'@'192.168.1.100';
    mysql> update mysql.user set Password = PASSWORD('nagios') where user='nagios';
    mysql> flush privileges;
define host{
        name                    rhel-name         ; The name of this host template
        use                     generic-host      ; This template inherits other values from the generic-host template
        check_period            24x7              ; By default, Linux hosts are checked round the clock
        check_interval          5                 ; Actively check the host every 5 minutes
        retry_interval          1                 ; Schedule host check retries at 1 minute intervals
        max_check_attempts      3                 ; Check each Linux host 3 times (max)
        check_command           check-host-alive  ; Default command to check Linux hosts
        notification_period     workhours         ; Linux admins hate to be woken up, so we only notify during the day
                                                  ; Note that the notification_period variable is being overridden from
                                                  ; the value that is inherited from the generic-host template!
        notification_interval   120               ; Resend notifications every 2 hours
        notification_options    d,u,r             ; Only send notifications for specific host states
        contact_groups          admins            ; Notifications get sent to the admins by default
        register                0                 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
####
define service{
        name                            rhel-sys  ; The 'name' of this service template
        active_checks_enabled           1         ; Active service checks are enabled
        passive_checks_enabled          1         ; Passive service checks are enabled/accepted
        parallelize_check               1         ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1         ; We should obsess over this service (if necessary)
        check_freshness                 0         ; Default is to NOT check service 'freshness'
        notifications_enabled           1         ; Service notifications are enabled
        event_handler_enabled           1         ; Service event handler is enabled
        flap_detection_enabled          1         ; Flap detection is enabled
        failure_prediction_enabled      1         ; Failure prediction is enabled
        process_perf_data               1         ; Process performance data
        retain_status_information       1         ; Retain status information across program restarts
        retain_nonstatus_information    1         ; Retain non-status information across program restarts
        is_volatile                     0         ; The service is not volatile
        check_period                    24x7      ; The service can be checked at any time of the day
        max_check_attempts              5         ; Re-check the service up to 5 times in order to determine its final (hard) state
        normal_check_interval           10        ; Check the service every 10 minutes under normal conditions
        retry_check_interval            2         ; Re-check the service every two minutes until a hard state can be determined
        contact_groups                  admins    ; Notifications get sent out to everyone in the 'admins' group
        notification_options            u,c,r     ; Send notifications about unknown, critical, and recovery events
        notification_interval           60        ; Re-notify about service problems every hour
        notification_period             24x7      ; Notifications can be sent out at any time
        register                        0         ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
####
define service{
        name                            rhel-raid ; The 'name' of this service template
        active_checks_enabled           1         ; Active service checks are enabled
        passive_checks_enabled          1         ; Passive service checks are enabled/accepted
        parallelize_check               1         ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1         ; We should obsess over this service (if necessary)
        check_freshness                 0         ; Default is to NOT check service 'freshness'
        notifications_enabled           1         ; Service notifications are enabled
        event_handler_enabled           1         ; Service event handler is enabled
        flap_detection_enabled          1         ; Flap detection is enabled
        failure_prediction_enabled      1         ; Failure prediction is enabled
        process_perf_data               1         ; Process performance data
        retain_status_information       1         ; Retain status information across program restarts
        retain_nonstatus_information    1         ; Retain non-status information across program restarts
        is_volatile                     0         ; The service is not volatile
        check_period                    24x7      ; The service can be checked at any time of the day
        max_check_attempts              5         ; Re-check the service up to 5 times in order to determine its final (hard) state
        normal_check_interval           10        ; Check the service every 10 minutes under normal conditions
        retry_check_interval            2         ; Re-check the service every two minutes until a hard state can be determined
        contact_groups                  admins    ; Notifications get sent out to everyone in the 'admins' group
        notification_options            w,c       ; Send notifications about warning and critical events only
        notification_interval           0         ; Notify once per problem; 0 disables re-notification
        notification_period             24x7      ; Notifications can be sent out at any time
        register                        0         ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
#####
define service{
        name                            rhel-service ; The 'name' of this service template
        active_checks_enabled           1         ; Active service checks are enabled
        passive_checks_enabled          1         ; Passive service checks are enabled/accepted
        parallelize_check               1         ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1         ; We should obsess over this service (if necessary)
        check_freshness                 0         ; Default is to NOT check service 'freshness'
        notifications_enabled           1         ; Service notifications are enabled
        event_handler_enabled           1         ; Service event handler is enabled
        flap_detection_enabled          1         ; Flap detection is enabled
        failure_prediction_enabled      1         ; Failure prediction is enabled
        process_perf_data               1         ; Process performance data
        retain_status_information       1         ; Retain status information across program restarts
        retain_nonstatus_information    1         ; Retain non-status information across program restarts
        is_volatile                     0         ; The service is not volatile
        check_period                    24x7      ; The service can be checked at any time of the day
        max_check_attempts              3         ; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval           3         ; Check the service every 3 minutes under normal conditions
        retry_check_interval            1         ; Re-check the service every minute until a hard state can be determined
        contact_groups                  admins    ; Notifications get sent out to everyone in the 'admins' group
        notification_options            w,u,c,r   ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           10        ; Re-notify about service problems every 10 minutes
        notification_period             24x7      ; Notifications can be sent out at any time
        register                        0         ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
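The three service templates differ mainly in how quickly a problem becomes a hard state and therefore triggers a notification. The arithmetic can be sketched as below. The function names are hypothetical, the results are in minutes under the assumption that `interval_length` is left at its default of 60 seconds, and check run time and scheduling latency are ignored.

```shell
#!/bin/bash
# Sketch: rough timing implied by a service template.
# Function names are hypothetical; units are minutes, assuming the default
# interval_length of 60 seconds.

# Minutes between the first failed check and the hard state.
# The first failure counts as attempt 1, so (max_check_attempts - 1)
# retries follow, each retry_check_interval minutes apart.
#   $1 = max_check_attempts, $2 = retry_check_interval
minutes_to_hard_state() {
    local max_attempts=$1 retry_interval=$2
    echo $(( (max_attempts - 1) * retry_interval ))
}

# Approximate worst case from the moment a service breaks (just after a
# passing check) until the hard state is reached.
#   $1 = normal_check_interval, $2 = max_check_attempts, $3 = retry_check_interval
worst_case_minutes() {
    local normal_interval=$1 max_attempts=$2 retry_interval=$3
    echo $(( normal_interval + (max_attempts - 1) * retry_interval ))
}
```

With the values above, `minutes_to_hard_state 3 1` (rhel-service) gives 2 and `minutes_to_hard_state 5 2` (rhel-sys and rhel-raid) gives 8, so rhel-service reacts noticeably faster.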
define host{
        name                    mysql-server      ; The name of this host template
        use                     generic-host      ; This template inherits other values from the generic-host template
        check_period            24x7              ; By default, Linux hosts are checked round the clock
        check_interval          5                 ; Actively check the host every 5 minutes
        retry_interval          1                 ; Schedule host check retries at 1 minute intervals
        max_check_attempts      3                 ; Check each Linux host 3 times (max)
        check_command           check-host-alive  ; Default command to check Linux hosts
        notification_period     workhours         ; Linux admins hate to be woken up, so we only notify during the day
                                                  ; Note that the notification_period variable is being overridden from
                                                  ; the value that is inherited from the generic-host template!
        notification_interval   120               ; Resend notifications every 2 hours
        notification_options    d,u,r             ; Only send notifications for specific host states
        contact_groups          admins            ; Notifications get sent to the admins by default
        register                0                 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
# vi mysql.cfg

define service{
        use                     generic-service,mysql-server
        host_name               mysql
        service_description     Mysqld_pnp
        check_command           check_mysqld!nagios!nagios!nagdb
}
Here is a sample monitoring configuration:

    vi mysql.cfg

define host{
        use                     linux-server,mysql-server
        host_name               mysql
        alias                   My mysql Host
        address                 192.168.34.101
}
# In every service below, change host_name to match the host defined above.

# MySQL availability
define service{
        use                     generic-service,mysql-server
        host_name               mysql
        service_description     Mysqld
        check_command           check_mysql!nagios!nagios!10!60
}

#define service{
#        use                     generic-service,mysql-server
#        host_name               mysql
#        service_description     Mysqld_pnp
#        check_command           check_mysqld!nagios!nagios!nagdb
#}

# Logged-in users
define service{
        use                     generic-service,mysql-server
        host_name               mysql
        service_description     CHECK USERS
        check_command           check_nrpe!check_users
}

# System load
define service{
        use                     generic-service,mysql-server
        host_name               mysql
        service_description     Load
        check_command           check_nrpe!check_load
}

# Disk usage, SDA1
define service{
        use                     generic-service,mysql-server
        host_name               mysql
        service_description     SDA1
        check_command           check_nrpe!check_sd1
}

# Disk usage, SDA2
define service{
        use                     generic-service,mysql-server
        host_name               mysql
        service_description     SDA2
        check_command           check_nrpe!check_sd2
}

# Zombie processes
define service{
        use                     generic-service,mysql-server
        host_name               mysql
        service_description     Zombie
        check_command           check_nrpe!check_zombie_procs
}

# Total number of processes
define service{
        use                     generic-service,mysql-server
        host_name               mysql
        service_description     total procs
        check_command           check_nrpe!check_total_procs
}

# CPU usage
define service{
        use                     generic-service,mysql-server
        host_name               mysql
        service_description     Cpu
        check_command           check_nrpe!check_cpu
}

# Memory usage
define service{
        use                     generic-service,mysql-server
        host_name               mysql
        service_description     Mem
        check_command           check_nrpe!check_mem
}

#define service{
#        use                     generic-service,mysql-server
#        host_name               mysql
#        service_description     Http
#        check_command           check_http!/
#}

# Ping
define service{
        use                     generic-service,mysql-server
        host_name               mysql
        service_description     Ping
        check_command           check_ping!100.0,20%!500.0,60%
}

#define service{
#        use                     generic-service,mysql-server
#        host_name               mysql
#        service_description     check_memcached_11211
#        check_command           check_memcached_11211!80!100
#}

#define service{
#        use                     generic-service,mysql-server
#        host_name               mysql
#        service_description     check_memcached_response_11211
#        check_command           check_memcached_response_11211!300!500
#}

#define service{
#        use                     generic-service,mysql-server
#        host_name               mysql
#        service_description     check_memcached_hit
#        check_command           check_memcached_hit!10!5
#}
[Error 1] Clicking the Map link in the Nagios web interface fails with:

    The requested URL /nagios/cgi-bin/statusmap.cgi was not found on this server

Solution: statusmap.cgi depends on the gd development package. Install it with yum, then re-run configure in the Nagios source tree and rebuild and reinstall Nagios so that the CGIs are built with gd support:

    # yum -y install gd gd-devel
    # ./configure --with-gd-lib=/usr/lib --with-gd-inc=/usr/include
    # make all
    # make install
    # make install-init
    # make install-config
    # make install-commandmode
[Error 2] Ordinary users (every user except nagiosadmin) who click links such as Services in the Nagios web interface get the following error:

    It appears as though you do not have permission to view information for any of the hosts you requested...
    If you believe this is an error, check the HTTP server authentication requirements for accessing this CGI
    and check the authorization options in your CGI configuration file.

Cause: the authenticated user is not authorized. Edit etc/cgi.cfg. By default only nagiosadmin is listed there, so a newly created user has to be added before it can view these pages. Separate multiple users with commas.

    authorized_for_system_information=nagiosadmin
    authorized_for_configuration_information=nagiosadmin
    authorized_for_system_commands=nagiosadmin
    authorized_for_all_services=nagiosadmin
    authorized_for_all_hosts=nagiosadmin
    authorized_for_all_service_commands=nagiosadmin
    authorized_for_all_host_commands=nagiosadmin

For any user other than nagiosadmin, append the user name to the end of each line, for example:

    authorized_for_system_information=nagiosadmin,admin
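Adding a user to all seven lines by hand is easy to get wrong, so the change can be scripted. The following is a sketch under stated assumptions: `add_cgi_user` is a hypothetical helper name, the file is assumed to use the plain `authorized_for_...=user1,user2` form without trailing spaces, and `sed -E` with `-i.bak` is assumed to be available (as in GNU sed).

```shell
#!/bin/bash
# Sketch: append a user to every authorized_for_* line in cgi.cfg.
# add_cgi_user is a hypothetical helper, not part of Nagios.
#   $1 = path to cgi.cfg
#   $2 = user name to add (plain letters, digits, or underscores)
add_cgi_user() {
    local cfg=$1 user=$2
    # For each authorized_for_* line: if the user is not already listed as a
    # whole comma-separated entry, append ",user" to the end of the line.
    # -i.bak keeps the original file as "$cfg.bak".
    sed -E -i.bak \
        "/^authorized_for_[a-z_]+=/{/[=,]${user}(,|\$)/!s/\$/,${user}/;}" \
        "$cfg"
}
```

For example, `add_cgi_user /usr/local/nagios/etc/cgi.cfg admin` (path assumed from the install prefix used earlier) turns `authorized_for_all_hosts=nagiosadmin` into `authorized_for_all_hosts=nagiosadmin,admin`. Running it a second time changes nothing, because the user is then already listed.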