
A Powerful Monitoring Tool: Nagios

Date: 2016-05-23 19:38:51

Tags: server, monitoring, role

Nagios Installation
Base environment
[root@m01 yum.repos.d]# cat /etc/redhat-release
CentOS release 6.7 (Final)
[root@m01 yum.repos.d]# uname -r
2.6.32-573.el6.x86_64
[root@m01 yum.repos.d]# uname -m
x86_64

1. Prepare three servers
Management IP    Role      Notes
10.0.0.61        nagios    Nagios server
10.0.0.8         web01     Monitored client server
10.0.0.7         web02     Monitored client server

2. Configure the yum repository
[root@m01 ~]# ping www.baidu.com    # make sure the host can reach the Internet
PING www.a.shifen.com (61.135.169.121) 56(84) bytes of data.
64 bytes from 61.135.169.121: icmp_seq=1 ttl=128 time=3.99 ms

cd /etc/yum.repos.d/
/bin/mv CentOS-Base.repo CentOS-Base.repo.oldboy.ori
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos.repo

3. Work around Perl compilation problems
[root@m01 yum.repos.d]# echo 'export LC_ALL=C' >>/etc/profile
[root@m01 yum.repos.d]# tail -1 /etc/profile
export LC_ALL=C
[root@m01 yum.repos.d]# source /etc/profile
[root@m01 yum.repos.d]# echo $LC_ALL
C
[root@m01 yum.repos.d]# cd ~
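
Running the echo above a second time appends a duplicate line to /etc/profile. A minimal sketch of an idempotent alternative follows; `append_once` is a hypothetical helper name introduced here, not something from the original article.

```shell
# append_once FILE LINE
# Append LINE to FILE only if an identical line is not already present.
# Hypothetical helper, not part of the original walkthrough.
append_once() {
    file=$1
    line=$2
    # -x: match the whole line, -F: fixed string, -q: no output
    if ! grep -qxF -e "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (as root):
#   append_once /etc/profile 'export LC_ALL=C'
```

The same helper would also fit the /etc/rc.local entries added later in this walkthrough.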

4. Disable the firewall and SELinux
[root@m01 ~]# /etc/init.d/iptables stop
[root@m01 ~]# /etc/init.d/iptables status
iptables: Firewall is not running.
[root@m01 ~]# chkconfig iptables off
[root@m01 ~]# chkconfig --list iptables   
iptables        0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@m01 ~]# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
Editing the configuration file makes the change permanent, but it only takes effect after a reboot.
[root@m01 ~]# getenforce
Disabled
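
Because the configuration file is only read at boot, it can help to look at the configured mode separately from the running mode. The sketch below assumes a hypothetical helper named `selinux_config_mode`; `setenforce 0` is the standard command for switching a running system to permissive mode until the next reboot.

```shell
# selinux_config_mode FILE
# Print the mode set on the SELINUX= line of an SELinux config file.
# Hypothetical helper; comment lines and SELINUXTYPE= are ignored.
selinux_config_mode() {
    sed -n 's/^SELINUX=//p' "$1"
}

# Example (as root):
#   setenforce 0                              # permissive now, until reboot
#   getenforce                                # mode of the running system
#   selinux_config_mode /etc/selinux/config   # mode applied at next boot
```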

5. Keep the system time synchronized
[root@m01 ~]# crontab -l
#time sync by oldboy at 2010-2-1
*/5 * * * * /usr/sbin/ntpdate time.nist.gov >/dev/null 2>&1

6. Install the packages required by the Nagios server
yum install gcc glibc glibc-common -y
yum install gd gd-devel -y
yum install mysql-server -y
yum install httpd php php-gd -y

[root@m01 ~]# rpm -qa mysql httpd php
httpd-2.2.15-47.el6.centos.4.x86_64
php-5.3.3-46.el6_7.1.x86_64
mysql-5.1.73-5.el6_7.1.x86_64

7. Create the users and groups required by the Nagios server
[root@m01 ~]# /usr/sbin/useradd nagios
[root@m01 ~]# /usr/sbin/useradd apache -M -s /sbin/nologin
useradd: user 'apache' already exists
[root@m01 ~]# /usr/sbin/groupadd nagcmd                  
[root@m01 ~]# /usr/sbin/usermod -a -G nagcmd nagios
[root@m01 ~]# /usr/sbin/usermod -a -G nagcmd apache
[root@m01 ~]# id -n -G nagios      
nagios nagcmd
[root@m01 ~]# id -n -G apache
apache nagcmd

8. Upload the software packages to a working directory, or download them by URL
mkdir -p /home/oldboy/tools/nagios
cd /home/oldboy/tools/nagios
rz

====================================================
Installing the Nagios server
tar xf nagios-3.5.1.tar.gz
cd nagios
./configure --with-command-group=nagcmd
make all
make install
make install-init
make install-config
make install-commandmode

1. Install the Nagios web configuration file and create a login user
make install-webconf
htpasswd -bc /usr/local/nagios/etc/htpasswd.users oldboy 123456
cat /usr/local/nagios/etc/htpasswd.users
/etc/init.d/httpd reload

2. Set the e-mail address that receives monitoring alerts
cp /usr/local/nagios/etc/objects/contacts.cfg{,.ori}
sed -i 's#nagios@localhost#976199267@qq.com#g' /usr/local/nagios/etc/objects/contacts.cfg
To send mail through a third-party mail provider, add the following line to /etc/mail.rc:
[root@m01 tools]# tail -1 /etc/mail.rc 
set from=18516688992@163.com smtp=smtp.163.com smtp-auth-user=18516688992 smtp-auth-password=tian123 smtp-auth=login

3. Configure the Apache service and enable it at boot
[root@m01 tools]# /etc/init.d/httpd start
Starting httpd:
[root@m01 tools]# /etc/init.d/httpd restart
Stopping httpd:                                            [  OK  ]
Starting httpd: httpd: Could not reliably determine the server's fully qualified domain name, using 172.16.1.61 for ServerName
                                                           [  OK  ]
[root@m01 tools]# chkconfig httpd on
[root@m01 tools]# netstat -lntup|grep httpd
tcp        0      0 :::80                       :::*                        LISTEN      53291/httpd  

Log in from a browser:
10.0.0.61/nagios
Enter the user name and password:
oldboy
123456
If the Nagios Core page appears, the installation is working.

4. Install the Nagios plugins package
Install the base dependencies
yum install perl-devel openssl-devel -y
Install the nagios-plugins package
wget https://nagios-plugins.org/download/nagios-plugins-1.4.16.tar.gz
[root@m01 tools]# ls nagios-plugins-1.4.16.tar.gz
nagios-plugins-1.4.16.tar.gz
[root@m01 tools]# tar xf nagios-plugins-1.4.16.tar.gz
[root@m01 tools]# cd nagios-plugins-1.4.16
[root@m01 nagios-plugins-1.4.16]# ./configure  --with-nagios-user=nagios --with-nagios-group=nagios --enable-perl-modules --with-mysql
[root@m01 nagios-plugins-1.4.16]# make
[root@m01 nagios-plugins-1.4.16]# make install

5. Install NRPE
ls /usr/local/nagios/libexec/check_nrpe
[root@m01 nagios-plugins-1.4.16]# cd ..
tar xf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
[root@m01 nrpe-2.12]# ls /usr/local/nagios/libexec/|wc -l
60

Check the check_nrpe plugin
[root@m01 tools]# ls /usr/local/nagios/libexec/check_nrpe
/usr/local/nagios/libexec/check_nrpe
[root@m01 nrpe-2.12]# ls /usr/local/nagios/libexec/|wc -l
60
At this point the software installation on the Nagios server is complete.

6. Configure and start the Nagios service
Enable the Nagios service at boot
[root@m01 tools]# chkconfig nagios on
[root@m01 tools]# chkconfig --list nagios     
nagios          0:off   1:off   2:on    3:on    4:on    5:on    6:off
A better approach, according to the author, is to start it from /etc/rc.local:
[root@m01 tools]# echo "/etc/init.d/nagios start">>/etc/rc.local
[root@m01 tools]# tail -1 /etc/rc.local
/etc/init.d/nagios start
Check the configuration syntax
[root@m01 tools]# /etc/init.d/nagios checkconfig
Running configuration check... OK.
Start the Nagios service
[root@m01 tools]# /etc/init.d/nagios start
Starting nagios: done.
Check the Nagios server process and port
[root@m01 tools]# ps -ef |grep nagios|grep -v grep
nagios   15895     1  0 16:41 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
[root@m01 tools]# netstat -lntup|grep nagios

===============================================
Nagios Client Installation
1. Base environment
[root@m01 yum.repos.d]# cat /etc/redhat-release
CentOS release 6.7 (Final)
[root@m01 yum.repos.d]# uname -r
2.6.32-573.el6.x86_64
[root@m01 yum.repos.d]# uname -m
x86_64

2. Prepare two servers
Management IP    Role      Notes
10.0.0.8         web01     Monitored client server
10.0.0.7         web02     Monitored client server

3. Work around Perl compilation problems
[root@m01 yum.repos.d]# echo 'export LC_ALL=C' >>/etc/profile
[root@m01 yum.repos.d]# tail -1 /etc/profile
export LC_ALL=C
[root@m01 yum.repos.d]# source /etc/profile
[root@m01 yum.repos.d]# echo $LC_ALL

4. Disable the firewall and SELinux
[root@m01 ~]# /etc/init.d/iptables stop
[root@m01 ~]# /etc/init.d/iptables status
iptables: Firewall is not running.
[root@m01 ~]# chkconfig iptables off
[root@m01 ~]# chkconfig --list iptables   
iptables        0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@m01 ~]# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
Editing the configuration file makes the change permanent, but it only takes effect after a reboot.
[root@m01 ~]# getenforce
Disabled

5. Keep the system time synchronized
[root@m01 ~]# crontab -l
#time sync by oldboy at 2010-2-1
*/5 * * * * /usr/sbin/ntpdate time.nist.gov >/dev/null 2>&1

=============================================
Installation
1. Install the base system packages
yum install gcc glibc glibc-common -y
yum install mysql-server -y

[root@m01 ~]# rpm -qa mysql
mysql-5.1.73-5.el6_7.1.x86_64


2. Upload the software packages to a working directory, or download them by URL
mkdir -p /home/oldboy/tools/nagios
cd /home/oldboy/tools/nagios
rz

unzip -q oldboy_training_nagios_soft.zip

3. Add the nagios user
[root@web01 nagios]# useradd nagios -M -s /sbin/nologin
[root@web01 nagios]# id nagios
uid=508(nagios) gid=508(nagios) groups=508(nagios)

4. Install the nagios-plugins package
[root@web02 nagios]# yum install perl-devel perl-CPAN openssl-devel -y
[root@web02 nagios]# tar xf nagios-plugins-1.4.16.tar.gz
[root@web02 nagios]# cd nagios-plugins-1.4.16
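[root@web02 nagios-plugins-1.4.16]# ./configure  --with-nagios-user=nagios --with-nagios-group=nagios --enable-perl-modules --with-mysql
[root@web02 nagios-plugins-1.4.16]# make
[root@web02 nagios-plugins-1.4.16]# make install
Check the number of plugins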
[root@web01 nagios]# ls /usr/local/nagios/libexec/|wc -l
61

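5. Install NRPE
[root@m01 nagios-plugins-1.4.16]# cd ..
ls /usr/local/nagios/libexec/check_nrpe
tar xf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin
The author notes that the next two commands report errors; they are the targets that install the nrpe daemon and the nrpe.cfg file used in the steps below.
make install-daemon
make install-daemon-config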
[root@m01 nrpe-2.12]# ls /usr/local/nagios/libexec/|wc -l
60

Check the check_nrpe plugin
[root@m01 tools]# ls /usr/local/nagios/libexec/check_nrpe
/usr/local/nagios/libexec/check_nrpe
[root@m01 nrpe-2.12]# ls /usr/local/nagios/libexec/|wc -l
60

6. Install the other required plugins (Perl modules)
[root@web01 nrpe-2.12]# cd ..
[root@web01 nagios]#
#---------- separator ----------
tar zxf Params-Validate-0.91.tar.gz
cd Params-Validate-0.91
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
tar zxf Class-Accessor-0.31.tar.gz
cd Class-Accessor-0.31
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
tar zxf Config-Tiny-2.12.tar.gz
cd Config-Tiny-2.12
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
tar zxf Math-Calc-Units-1.07.tar.gz
cd Math-Calc-Units-1.07
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
tar zxf Regexp-Common-2010010201.tar.gz
cd Regexp-Common-2010010201
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
tar zxf Nagios-Plugin-0.34.tar.gz
cd Nagios-Plugin-0.34
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
#yum install sysstat -y

If any of these builds fails, the Perl environment variable (LC_ALL, set in step 3) was probably not set beforehand.
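
The six builds above all follow the same pattern, so they can also be written as a loop. This is only a sketch: `install_perl_modules` and the `RUN` dry-run switch are hypothetical names introduced here, and the tarballs are assumed to sit in the current directory.

```shell
# RUN is a hypothetical dry-run switch: with RUN=echo the commands are
# printed instead of executed; with RUN empty they run for real.
RUN=${RUN:-}

# install_perl_modules NAME-VERSION...
# Unpack, build and install each module, stopping at the first failure.
install_perl_modules() {
    for pkg in "$@"; do
        $RUN tar zxf "$pkg.tar.gz" || return 1
        (
            # The subshell keeps the caller's working directory unchanged.
            $RUN cd "$pkg" &&
            $RUN perl Makefile.PL &&
            $RUN make &&
            $RUN make install
        ) || return 1
    done
}

# Example, in the same order as above:
#   install_perl_modules Params-Validate-0.91 Class-Accessor-0.31 \
#       Config-Tiny-2.12 Math-Calc-Units-1.07 \
#       Regexp-Common-2010010201 Nagios-Plugin-0.34
```

Stopping at the first failure keeps a broken module from being hidden by the output of the ones that follow it.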

7. Set up the memory and disk I/O monitoring plugin scripts
yum install dos2unix -y
/bin/cp /home/oldboy/tools/nagios/check_memory.pl  /usr/local/nagios/libexec/
/bin/cp /home/oldboy/tools/nagios/check_iostat  /usr/local/nagios/libexec/
chmod 755 /usr/local/nagios/libexec/check_memory.pl
chmod 755 /usr/local/nagios/libexec/check_iostat
dos2unix /usr/local/nagios/libexec/check_memory.pl
dos2unix /usr/local/nagios/libexec/check_iostat

8. Configure the NRPE service on the Nagios client
cd /usr/local/nagios/etc/
[root@web02 etc]# sed -n '79p' nrpe.cfg
allowed_hosts=127.0.0.1
[root@web01 etc]# sed -i 's#allowed_hosts=127.0.0.1#allowed_hosts=127.0.0.1,10.0.0.61#g' nrpe.cfg
[root@web01 etc]# sed -n '79p' nrpe.cfg
allowed_hosts=127.0.0.1,10.0.0.61
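
The sed substitution above depends on the line reading exactly allowed_hosts=127.0.0.1, and running it twice adds the server address twice. A slightly more defensive sketch is shown below; `allow_nrpe_host` is a hypothetical helper, and it assumes nrpe.cfg contains a single allowed_hosts line. If nrpe is already running, it has to be restarted to pick up the change.

```shell
# allow_nrpe_host CFG IP
# Add IP to the allowed_hosts list in CFG unless it is already listed.
# Hypothetical helper; assumes one allowed_hosts= line in the file.
allow_nrpe_host() {
    cfg=$1
    ip=$2
    current=$(sed -n 's/^allowed_hosts=//p' "$cfg")
    case ",$current," in
        *",$ip,"*) return 0 ;;          # already present, nothing to do
    esac
    new=${current:+$current,}$ip        # append with a comma if the list is non-empty
    tmp=$(mktemp) || return 1
    # Write through a temp file and copy back so ownership and mode are kept.
    sed "s/^allowed_hosts=.*/allowed_hosts=$new/" "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example:
#   allow_nrpe_host /usr/local/nagios/etc/nrpe.cfg 10.0.0.61
```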

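9. Open nrpe.cfg in vim, press Shift+G in command mode to jump to the end of the file, and make the following changes
Step 1: comment out lines 199-203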
#command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
#command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
#command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
#command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
#command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
Step 2: add the new checks below them:
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_mem]=/usr/local/nagios/libexec/check_memory.pl -w 10% -c 3%
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 15% -c 7% -p /
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_iostat]=/usr/local/nagios/libexec/check_iostat -w 6 -c 10
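
After editing, it is worth confirming which command names are actually active, since the server can only call names defined here. The sketch below uses two hypothetical helpers: `list_nrpe_commands` reads the file, and `check_all_local` assumes the nrpe daemon is already running on this host and that check_nrpe is installed at the path used in this walkthrough.

```shell
# list_nrpe_commands CFG
# Print the name of every active (uncommented) command[...] entry in CFG.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# check_all_local CFG
# Call every active command through the local nrpe daemon, one per line.
check_all_local() {
    for name in $(list_nrpe_commands "$1"); do
        printf '%s: ' "$name"
        /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c "$name"
    done
}

# Example:
#   list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg
#   check_all_local    /usr/local/nagios/etc/nrpe.cfg
```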
10. Start the NRPE daemon on the Nagios client
[root@web02 etc]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
Check that it started
[root@web02 etc]# netstat -lntup|grep nrpe
tcp        0      0 0.0.0.0:5666                0.0.0.0:*                   LISTEN      24505/nrpe         
[root@web02 etc]# ps -ef |grep nrpe |grep -v grep
nagios   24505     1  0 19:56 ?        00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

Restart tip (no restart is needed at this point)
#pkill nrpe
#/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

11. Start NRPE at boot
[root@web01 etc]# echo "#nagios nrpe process cmd by wangtian 2016-5-22">>/etc/rc.local
[root@web01 etc]# echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d">>/etc/rc.local
Check the result
[root@web01 etc]# tail -2 /etc/rc.local
#nagios nrpe process cmd by wangtian 2016-5-22
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d


===============================================================================
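Monitoring from the Nagios server
Edit the main configuration file (beginners can skip this; if you need it, see page 582 of the book)
[root@m01 tools]# cp /usr/local/nagios/etc/nagios.cfg{,.ori}
[root@m01 tools]# vim /usr/local/nagios/etc/nagios.cfg +34
Add the following host and service configuration files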
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg
cfg_dir=/usr/local/nagios/etc/objects/services/
Then comment out the following:
# Definitions for monitoring the local (Linux) host
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

Generate hosts.cfg from the existing data
[root@m01 tools]# cd /usr/local/nagios/etc/objects/
[root@m01 objects]# head -51 localhost.cfg >hosts.cfg
[root@m01 objects]# chown nagios.nagios /usr/local/nagios/etc/objects/hosts.cfg

Then create a new, empty services.cfg file
[root@m01 objects]# touch services.cfg
[root@m01 objects]# chown nagios.nagios /usr/local/nagios/etc/objects/services.cfg
Finally, create the directory for per-service configuration files
[root@m01 objects]# mkdir services
[root@m01 objects]# chown -R nagios.nagios /usr/local/nagios/etc/objects/services
Check the result
[root@m01 objects]# ls -lrt
total 60
-rw-rw-r-- 1 nagios nagios 10812 May 22 15:14 templates.cfg
-rw-rw-r-- 1 nagios nagios  7716 May 22 15:14 commands.cfg
-rw-rw-r-- 1 nagios nagios  3208 May 22 15:14 timeperiods.cfg
-rw-rw-r-- 1 nagios nagios  5403 May 22 15:14 localhost.cfg
-rw-rw-r-- 1 nagios nagios  4019 May 22 15:14 windows.cfg
-rw-rw-r-- 1 nagios nagios  3124 May 22 15:14 printer.cfg
-rw-rw-r-- 1 nagios nagios  3293 May 22 15:14 switch.cfg
-rw-r--r-- 1 root   root    2166 May 22 15:26 contacts.cfg.ori
-rw-rw-r-- 1 nagios nagios  2166 May 22 15:28 contacts.cfg
-rw-r--r-- 1 nagios nagios  1870 May 22 20:36 hosts.cfg
-rw-r--r-- 1 nagios nagios     0 May 22 20:38 services.cfg
drwxr-xr-x 2 nagios nagios  4096 May 22 20:39 services

====================================================================

Configuring monitored items on the Nagios server
1. Define the Nagios client hosts to monitor
[root@m01 objects]# cd /usr/local/nagios/etc/objects/
[root@m01 objects]# cp hosts.cfg{,.ori}
[root@m01 objects]# egrep -v "#|^$" hosts.cfg.ori >hosts.cfg
[root@m01 objects]# vim hosts.cfg
Check the result
[root@m01 objects]# cat hosts.cfg  
define host{
        use                     linux-server           
        host_name               web01
        alias                   web01
        address                 10.0.0.8
        }
define host{
        use                     linux-server           
        host_name               web02
        alias                   web02
        address                 10.0.0.7
        }
define hostgroup{
        hostgroup_name  linux-servers
        alias           Linux Servers
        members         web01,web02    
        }
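
With more than a couple of clients, typing each host block by hand becomes error-prone. As a rough sketch, the blocks can be generated from a name and an address; `gen_host` is a hypothetical helper, and it assumes every host inherits from the linux-server template as in the file above.

```shell
# gen_host NAME ADDRESS
# Print a Nagios host definition that inherits from the linux-server template.
# Hypothetical helper; the alias is simply set to the host name.
gen_host() {
    printf 'define host{\n'
    printf '        use                     linux-server\n'
    printf '        host_name               %s\n' "$1"
    printf '        alias                   %s\n' "$1"
    printf '        address                 %s\n' "$2"
    printf '        }\n'
}

# Example: print the two host blocks used in this walkthrough
#   gen_host web01 10.0.0.8
#   gen_host web02 10.0.0.7
```

The hostgroup definition still has to list the generated host names in its members line.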

2. Configure services.cfg to define the resources and services to monitor
[root@m01 objects]# cp services.cfg{,.ori}
[root@m01 objects]# vim services.cfg
[root@m01 objects]# cat services.cfg                          
define service {
        use                         generic-service           
        host_name                   web01,web02
        service_description         Disk Partition
        check_command               check_nrpe!check_disk
        }
define service {
        use                         generic-service           
        host_name                   web01,web02
        service_description         Swap Usage
        check_command               check_nrpe!check_swap
        }
define service {
        use                         generic-service           
        host_name                   web01,web02
        service_description         MEM Usage
        check_command               check_nrpe!check_mem
        }
define service {
        use                         generic-service           
        host_name                   web01,web02
        service_description         Current Load
        check_command               check_nrpe!check_load
        }
define service {
        use                         generic-service           
        host_name                   web01,web02
        service_description         Disk Iostat
        check_command               check_nrpe!check_iostat!5!11
        }
define service {
        use                         generic-service           
        host_name                   web01,web02
        service_description         PING
        check_command               check_ping!100.0,20%!500.0,60%
        }
       
3. Define the check_nrpe command used by the services in services.cfg
[root@m01 objects]# cp commands.cfg{,.ori}
[root@m01 objects]# vim commands.cfg
[root@m01 objects]# tail -5 commands.cfg
# 'check_nrpe' command definition
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
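
In a check_command value, the parts separated by "!" become $ARG1$, $ARG2$ and so on, and this definition only uses $ARG1$. The sketch below imitates that expansion for illustration; `expand_check_nrpe` is a hypothetical helper, and it assumes $USER1$ points at /usr/local/nagios/libexec, the plugin directory used throughout this walkthrough.

```shell
# expand_check_nrpe CHECK_COMMAND HOST_ADDRESS
# Show the command line that results from the check_nrpe definition above.
# Hypothetical illustration only; Nagios does this expansion internally.
expand_check_nrpe() {
    check_command=$1
    host_address=$2
    # Field 1 is the command name, field 2 becomes $ARG1$;
    # later fields are not referenced by the definition and are dropped.
    arg1=$(printf '%s\n' "$check_command" | cut -d'!' -f2)
    printf '/usr/local/nagios/libexec/check_nrpe -H %s -c %s\n' "$host_address" "$arg1"
}

# Example:
#   expand_check_nrpe 'check_nrpe!check_iostat!5!11' 10.0.0.8
```

Seen this way, the thresholds that take effect for check_iostat are the ones in the client's nrpe.cfg, because the extra "!5!11" values never reach the command line.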

4. Check the syntax
/etc/init.d/nagios checkconfig
If the check reports OK, Nagios can be started:
/etc/init.d/nagios start
If Nagios is already running, run /etc/init.d/nagios reload instead.
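
The check and the reload can be tied together so that a broken configuration is never loaded. A minimal sketch, assuming the init script installed earlier; `safe_reload` and the `NAGIOS_INIT` variable are hypothetical names introduced here.

```shell
# Path to the init script; overridable for testing. Hypothetical variable name.
NAGIOS_INIT=${NAGIOS_INIT:-/etc/init.d/nagios}

# safe_reload
# Reload Nagios only if the configuration check succeeds.
safe_reload() {
    if "$NAGIOS_INIT" checkconfig; then
        "$NAGIOS_INIT" reload
    else
        echo "configuration check failed; not reloading" >&2
        return 1
    fi
}

# Example (as root, after editing any .cfg file):
#   safe_reload
```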

Open http://<server IP>/nagios in a browser to see the results.

=====================================================================================
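Configuring alerts (e-mail alerts were set up earlier; extend this for other alert channels as needed)
Configuring alerts means editing contacts.cfg. Every operations engineer in the company can be added to this file, and contacts can be grouped if necessary.

Steps for configuring alerts:
(1)    Add contacts and contact groups in contacts.cfg:
define contact{
    contact_name    oldboy-pager
    use             generic-contact
    alias           Nagios users
    pager           18901398229
    }
(2)    Add the notification commands in commands.cfg:
define command {
    command_name    notify-host-by-pager
    command_line    $USER1$/sms_send "$HOSTSTATE$ alert for $HOSTNAME$" $CONTACTPAGER$
    }
define command {
    command_name    notify-service-by-pager
    command_line    $USER1$/sms_send "$HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$" $CONTACTPAGER$
    }
(3)    Adjust the contact template to add the notification commands (defined in commands.cfg):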
define contact{
    name                            generic-contact
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r,f,s
    host_notification_options       d,u,r,f,s
    service_notification_commands   notify-service-by-email,notify-service-by-pager
    host_notification_commands      notify-host-by-email,notify-host-by-pager
    register                        0
    }
(4)    Add the alert contacts and contact groups in hosts.cfg and services.cfg, or add them to the templates:
contact_groups          admins,group1,group2,user1
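
The article does not show the sms_send program that the pager commands call. To test the notification path before a real SMS gateway is wired in, a stand-in can be used. The sketch below is purely hypothetical: it only appends each alert to a log file, and the `SMS_LOG` variable and log format are assumptions made here.

```shell
# Where alerts are recorded. Hypothetical default path.
SMS_LOG=${SMS_LOG:-/tmp/sms_send.log}

# sms_send MESSAGE PAGER
# Stand-in for a real SMS sender: log the alert instead of sending it.
sms_send() {
    if [ $# -ne 2 ]; then
        echo "usage: sms_send MESSAGE PAGER" >&2
        return 3
    fi
    printf '%s to=%s msg=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$2" "$1" >> "$SMS_LOG"
}

# To use it as a script, save this as /usr/local/nagios/libexec/sms_send,
# make it executable, and add a final line:  sms_send "$@"
```

The log file has to be writable by the nagios user, since notification commands run as that user.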

 

 

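Troubleshooting notes

(1) Fetching a value from the client fails:

[root@client1 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.0.0.2 -c check_disk

CHECK_NRPE: Error - Could not complete SSL handshake. # the handshake failed

# This is easy to diagnose. Run the same check against the loopback address:

[root@client1 ~]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_disk

# If this returns a value, the address of the calling host has not been added. Edit the allowed_hosts=127.0.0.1 line in nrpe.cfg.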

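(2) The status is CRITICAL

# This means the connection failed: either the service is not running or the firewall is still on. Run the check by hand first:

/usr/local/nagios/libexec/check_nrpe -H 10.0.0.2 -c check_disk

# The IP address and command name can be changed as needed. This command usually reveals the cause, because it is exactly how Nagios fetches the monitored value.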

(3) The command line returns a value, but the web interface does not.

define service {
        use                  generic-service
        host_name            02-client1,01-nagios
        service_description  Disk Partition
        check_command        check_nrpe!check_disk ; this argument is most likely defined incorrectly
}

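(4) Unable to read output

# This error means the plugin that fetches the value is not executable, or the plugin itself is broken. Either way, the fault lies with the plugin.

command[check_mem]=/usr/local/nagios/libexec/check_memory.pl -w 6% -c 3% # check_memory.pl is the plugin

[root@nagios libexec]# chmod +x check_memory.pl # run this; if the check still fails, the plugin itself is at fault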

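Summary: when the web interface shows a problem, check in this order:

(1) Nagios itself and its configuration files;

(2) On the server, run:

/usr/local/nagios/libexec/check_nrpe -H <monitored host address> -c <command name>

(3) On the client, run locally:

/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c <command name>

(4) Run the underlying command from nrpe.cfg directly:

command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 8% -p / # run the part after the equals sign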
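
When running plugins by hand at any of these layers, the exit code matters as much as the text: Nagios plugins return 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. A small sketch for making that visible follows; `plugin_state` and `run_check` are hypothetical helper names.

```shell
# plugin_state CODE
# Translate a plugin exit code into the Nagios state name.
plugin_state() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# run_check COMMAND [ARGS...]
# Run a plugin command line, then report its exit code as a state.
run_check() {
    rc=0
    "$@" || rc=$?
    echo "state: $(plugin_state "$rc") (exit code $rc)"
    return "$rc"
}

# Example, working inwards from the server to the plugin itself:
#   run_check /usr/local/nagios/libexec/check_nrpe -H 10.0.0.8 -c check_disk
#   run_check /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_disk
#   run_check /usr/local/nagios/libexec/check_disk -w 20% -c 8% -p /
```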


Original article: http://wtlinux.blog.51cto.com/11253430/1782049
