Deploying an Automated Monitoring System on Linux: Nagios

Posted: 2018-01-09 20:17:27

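Introduction to Nagios

Nagios is an open-source application for monitoring systems and networks.

- It monitors local and remote services through its large collection of plugins

- When a monitored object becomes abnormal, it alerts the administrator promptly

- It ships with a set of ready-made monitoring plugins that can be called directly

- Custom shell scripts can also be used as checks, which makes it suitable for monitoring all kinds of business services

- Object states, logs, and alerts can be viewed on a web page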

Setting up the Nagios monitoring server

* This lab uses three source tarballs:

nagios-4.2.4.tar.gz

nagios-plugins-2.1.4.tar.gz

nrpe-3.0.1.tar.gz

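Installing Nagios (from source)

Preparation: install the build tools, then create the user and group that will be passed to configure

//install the build tools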

# yum -y install gcc gcc-c++

# rpm -q gcc gcc-c++

gcc-4.8.5-4.el7.x86_64

gcc-c++-4.8.5-4.el7.x86_64

//create the service account

# useradd nagios

# groupadd nagcmd

# usermod -G nagcmd nagios

Installing Nagios: unpack, configure, compile, install, and inspect the result

1) Unpack

# ls nagios/

nagios-4.2.4.tar.gz nagios-plugins-2.1.4.tar.gz nrpe-3.0.1.tar.gz

# cd nagios/

# tar -zxf nagios-4.2.4.tar.gz

# cd nagios-4.2.4

2) Configure

# ./configure --help | more //page through the configure help

--disable //disable a feature

--enable //enable a feature

--with-nagios-user=<user>

sets user name to run nagios //user that runs nagios

--with-nagios-group=<grp>

sets group name to run nagios //group that runs nagios

--with-command-user=<user>

sets user name for command access //user allowed to submit commands

--with-command-group=<grp>

sets group name for command access //group allowed to submit commands

//configure the build with the account and group created above

# ./configure --with-nagios-user=nagios --with-nagios-group=nagcmd --with-command-user=nagios --with-command-group=nagcmd

3) Compile

# make all //run the build

.... //when the build finishes it prints the following installation help

make install

- This installs the main program, CGIs, and HTML files

make install-init

- This installs the init script in /etc/rc.d/init.d

make install-commandmode

- This installs and configures permissions on the

directory for holding the external command file

make install-config

- This installs *SAMPLE* config files in /usr/local/nagios/etc

You'll have to modify these sample files before you can

use Nagios. Read the HTML documentation for more info

on doing this. Pay particular attention to the docs on

object configuration files, as they determine what/how

things get monitored!

make install-webconf

- This installs the Apache config file for the Nagios

web interface

make install-exfoliation

- This installs the Exfoliation theme for the Nagios

web interface

make install-classicui

- This installs the classic theme for the Nagios

web interface

...

4) Install and inspect the result

# make install //install the program files

# ls /usr/local/nagios/

bin libexec sbin share var

# make install-init //install the init script

/usr/bin/install -c -m 755 -d -o root -g root /etc/rc.d/init.d

/usr/bin/install -c -m 755 -o root -g root daemon-init /etc/rc.d/init.d/nagios

*** Init script installed ***

# ls /etc/rc.d/init.d/

functions nagios netconsole network README rhnsd

# make install-commandmode //set permissions on the external command directory

/usr/bin/install -c -m 775 -o nagios -g nagcmd -d /usr/local/nagios/var/rw

chmod g+s /usr/local/nagios/var/rw

*** External command directory configured ***

# make install-config //install the sample configuration

# ls /usr/local/nagios/etc //Nagios configuration files

cgi.cfg nagios.cfg objects resource.cfg

# make install-webconf //install the Apache configuration for the web interface

/usr/bin/install -c -m 644 sample-config/httpd.conf /etc/httpd/conf.d/nagios.conf

if [ 0 -eq 1 ]; then \

ln -s /etc/httpd/conf.d/nagios.conf /etc/apache2/sites-enabled/nagios.conf; \

*** Nagios/Apache conf file installed ***

# ls /etc/httpd/conf.d/nagios.conf //Apache configuration for the Nagios web interface

/etc/httpd/conf.d/nagios.conf

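# make install-exfoliation //install the Exfoliation theme for the web interface

Inspecting the installation directory /usr/local/nagios

# cd /usr/local/nagios/

# ls

bin etc libexec sbin share var

# ls bin/ //program binaries

nagios //the daemon; with -v it verifies the configuration

nagiostats //prints monitoring statistics on the command line

# ls etc/ //configuration files

# ls etc/objects/

# ls libexec/ //monitoring plugins

# ls sbin/ //CGI programs used by the web interface

# ls share/ //Nagios web pages

# ls var/ //Nagios runtime data

Installing the monitoring plugins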

# cd /root/nagios/

# tar -zxf nagios-plugins-2.1.4.tar.gz

# cd nagios-plugins-2.1.4/

# ./configure && make && make install

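# ls /usr/local/nagios/libexec/ //list the installed plugins

Starting the Nagios monitoring service

1) Set the user name (nagiosadmin) and password (your choice) for the monitoring web page

# vim /etc/httpd/conf.d/nagios.conf

Alias /nagios "/usr/local/nagios/share" //directory holding the web pages

AuthUserFile /usr/local/nagios/etc/htpasswd.users //user authentication file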

# which htpasswd

/usr/bin/htpasswd

# rpm -qf /usr/bin/htpasswd

httpd-tools-2.4.6-40.el7.x86_64

//create the user authentication file with htpasswd

# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

// -c creates the file

New password: //123456 in this lab

Re-type new password:

Adding password for user nagiosadmin

//view the authentication file just created

# cat /usr/local/nagios/etc/htpasswd.users

nagiosadmin:$apr1$Y82sv4lW$WSL2QI05tcXY59l8GFouo/

2) Start the nagios service

// start | stop | check status: /etc/rc.d/init.d/nagios start | stop | status

# /etc/rc.d/init.d/nagios start

Reloading systemd: [ OK ]

Starting nagios (via systemctl): [ OK ]

# /etc/rc.d/init.d/nagios status

nagios (pid 2055) is running...

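3) Restart the web server so that it loads the Nagios configuration

# systemctl restart httpd.service

Accessing the monitoring page from a client

# ping -c2 192.168.4.21 //test network connectivity

# firefox http://192.168.4.21/nagios

When prompted, log in with the user name and password created with htpasswd above.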

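Map: monitoring topology map

Hosts: information about the monitored hosts

Host Groups: information about the monitored host groups

Services: status of the monitored services [the view used most often]

Services

//checks that run against the local host by default, with no extra configuration

Service            What it monitors              Plugin
Current Load       CPU load                      check_load
Current Users      number of logged-in users     check_users
HTTP               web server status             check_http
PING               ping reachability             check_ping
Root Partition     root partition                check_disk
SSH                ssh service                   check_ssh
Swap Usage         swap space                    check_swap
Total Processes    total number of processes     check_procs

Monitoring states:

OK: normal

Warning: warning

Unknown: the state cannot be determined

Critical: serious error

Pending: the check is still in progress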

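Configuring the nagios service

1) How monitoring works

When the nagios service starts, it loads the main configuration file nagios.cfg, and the configuration calls the monitoring plugins.

Operators can set the thresholds for each plugin (a warning value and a critical value).

A plugin compares the value it measures against the thresholds, and the result determines the state that is displayed:

measured value < warning value: OK

warning value < measured value < critical value: Warning

measured value > critical value: Critical

Unknown: the check is misconfigured

Pending: data is still being collected

For plugins that measure a remaining amount, such as free disk space in check_disk, the comparison runs the other way: the state worsens as the value falls below the thresholds.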
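The comparison described above can be sketched as a small shell function. This is only an illustration for a "higher is worse" integer metric; the function name classify is made up for this example and is not part of Nagios. It follows the check_users rule, where a state is raised only when the value is greater than the threshold.

```shell
#!/bin/sh
# classify VALUE WARN CRIT  (hypothetical helper, for illustration only)
# Prints OK, WARNING, or CRITICAL for an integer metric where larger is worse.
classify() {
    if [ "$1" -gt "$3" ]; then
        echo "CRITICAL"      # value exceeds the critical threshold
    elif [ "$1" -gt "$2" ]; then
        echo "WARNING"       # value exceeds only the warning threshold
    else
        echo "OK"            # value is within both thresholds
    fi
}

classify 3 10 20
classify 3 1 3
classify 3 1 2
```

With 3 logged-in users, the three calls print OK, WARNING, and CRITICAL, the same states that check_users reports for these thresholds in the exercises of this article.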

2) Using the monitoring plugins

# cd /usr/local/nagios/libexec/ //enter the plugin directory

# ./check_users --help //show the plugin's help

-w, --warning=INTEGER

Set WARNING status if more than INTEGER users are logged in

-c, --critical=INTEGER

Set CRITICAL status if more than INTEGER users are logged in

Plugin exercises:

# ./check_users -w 1 -c 3 //warn above 1 logged-in user, critical above 3

USERS WARNING - 3 users currently logged in |users=3;1;3;0

# ./check_users -w 1 -c 2

USERS CRITICAL - 3 users currently logged in |users=3;1;2;0

# ./check_users -w 10 -c 20

USERS OK - 3 users currently logged in |users=3;10;20;0

# ./check_http -H 192.168.4.4

connect to address 192.168.4.4 and port 80: No route to host

HTTP CRITICAL - Unable to open TCP socket

# ./check_http -H 192.168.4.254

HTTP WARNING: HTTP/1.1 403 Forbidden - 5179 bytes in 0.002 second response time |time=0.001511s;;;0.000000 size=5179B;;;0

# ./check_ping -H 192.168.4.4 -w 10,50% -c 20,75% -p 3 //warn at a round-trip average of 10 ms or 50% packet loss, critical at 20 ms or 75% packet loss

PING OK - Packet loss = 0%, RTA = 0.42 ms|rta=0.418000ms;10.000000;20.000000;0.000000 pl=0%;50;75;0

# ./check_ping -H 192.168.4.100 -w 10,50% -c 20,75% -p 3

CRITICAL - Host Unreachable (192.168.4.100)

# ./check_disk -w 50% -c 25% -p / //warn when free space on / falls below 50%, critical below 25%

DISK OK - free space: / 47849 MB (93% inode=99%);| /=3325MB;25587;38381;0;51175

# ./check_disk -w 50% -c 25% -p /boot

DISK OK - free space: /boot 339 MB (68% inode=99%);| /boot=157MB;248;372;0;496

# dd if=/dev/zero of=/boot/test.txt bs=1M count=200

200+0 records in

200+0 records out

209715200 bytes (210 MB) copied, 0.407923 s, 514 MB/s

# ./check_disk -w 50% -c 25% -p /boot

DISK WARNING - free space: /boot 139 MB (28% inode=99%);| /boot=357MB;248;372;0;496

# rm -rf /boot/test.txt

# ./check_disk -w 50% -c 25% -p /boot

DISK OK - free space: /boot 339 MB (68% inode=99%);| /boot=157MB;248;372;0;496

# ./check_ssh -H 192.168.4.4

SSH OK - OpenSSH_6.6.1 (protocol 2.0) | time=0.016415s;;;0.000000;10.000000

# ./check_swap -w 20% -c 10% /

SWAP OK - 100% free (2047 MB out of 2047 MB) |swap=2047MB;409;204;0;2047

# ./check_procs -w 50 -c 51 -s R //running processes

PROCS OK: 1 process with STATE = R | procs=1;50;51;0;

# ./check_procs -w 50 -c 51 //processes in any state

PROCS CRITICAL: 136 processes | procs=136;50;51;0;

# ./check_procs -w 50 -c 51 -s Z //zombie processes

PROCS OK: 0 processes with STATE = Z | procs=0;50;51;0;

# ./check_procs -w 50 -c 51 -s S //sleeping processes

PROCS OK: 47 processes with STATE = S | procs=47;50;51;0;

# ./check_load -w 1.00,3.00,9.00 -c 2.00,6.00,11.00

OK - load average: 0.00, 0.01, 0.05|load1=0.000;1.000;2.000;0; load5=0.010;3.000;6.000;0; load15=0.050;9.000;11.000;0;

# uptime

01:38:33 up 6:04, 3 users, load average: 0.00, 0.01, 0.05

# ./check_tcp -H 192.168.4.4 -p 22

TCP OK - 0.001 second response time on 192.168.4.4 port 22|time=0.000507s;;;0.000000;10.000000

# ./check_tcp -H 192.168.4.4 -p 21

connect to address 192.168.4.4 and port 21: No route to host
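Since the introduction mentions that custom shell scripts can serve as checks, here is a minimal sketch of one. It follows the plugin convention visible in the outputs above: one status line, optional performance data after the "|", and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The name check_file_count and the idea of counting files in a directory are assumptions made for this example; no such plugin ships with nagios-plugins.

```shell
#!/bin/sh
# check_file_count DIR WARN CRIT  (hypothetical custom plugin, illustration only)
# Counts regular files directly inside DIR and compares the count
# against the warning and critical thresholds.
check_file_count() {
    dir=$1; warn=$2; crit=$3
    if [ ! -d "$dir" ]; then
        echo "FILES UNKNOWN - $dir is not a directory"
        return 3
    fi
    # -maxdepth is a GNU/BSD find extension; it keeps the count non-recursive.
    count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    if [ "$count" -gt "$crit" ]; then
        state=CRITICAL; code=2
    elif [ "$count" -gt "$warn" ]; then
        state=WARNING; code=1
    else
        state=OK; code=0
    fi
    # Status text first, performance data (label=value;warn;crit;min) after the pipe.
    echo "FILES $state - $count files in $dir |files=$count;$warn;$crit;0"
    return $code
}
# To use this as a standalone plugin file, the script would end with:
#   check_file_count "$@"; exit $?
```

Saved as an executable file in /usr/local/nagios/libexec/, a script like this could then be referenced from commands.cfg in the same way as the bundled plugins.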

3) The configuration files

# cd /usr/local/nagios/etc/

# ls

cgi.cfg htpasswd.users nagios.cfg (main configuration file) objects resource.cfg (macro definitions)

# vim nagios.cfg //main configuration file

...

//the files under objects/ are loaded in order

cfg_file=/usr/local/nagios/etc/objects/commands.cfg

cfg_file=/usr/local/nagios/etc/objects/contacts.cfg

cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg

cfg_file=/usr/local/nagios/etc/objects/templates.cfg

cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

...

//the main configuration file loads all the other configuration files, so checking it checks the entire configuration

# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg //check the configuration for errors

//the check command is long and awkward to type, so give it the alias checknagios

//temporary alias (current shell only)

# alias checknagios='/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg'

# checknagios

//permanent, system-wide alias

# vim /etc/bashrc

# sed -n '2p' /etc/bashrc

alias checknagios='/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg'

# source /etc/bashrc //apply /etc/bashrc to the current shell

# checknagios

# /usr/local/nagios/bin/nagiostats -c /usr/local/nagios/etc/nagios.cfg //print monitoring statistics on the command line

//CGI configuration file

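# vim cgi.cfg

main_config_file=/usr/local/nagios/etc/nagios.cfg //location of the main configuration file

physical_html_path=/usr/local/nagios/share //directory holding the web pages

url_html_path=/nagios //URL path of the web interface

authorized_for_system_information=nagiosadmin //user allowed to view this information

authorized_for_configuration_information=nagiosadmin

//macro definitions

# vim resource.cfg

$USER1$=/usr/local/nagios/libexec //defines the macro $USER1$ as the plugin directory

Monitored objects and templates: /usr/local/nagios/etc/objects

# cd objects/

# ls

commands.cfg (check command definitions) localhost.cfg (checks for the local host) switch.cfg timeperiods.cfg

contacts.cfg printer.cfg templates.cfg windows.cfg

//file that defines the check commands

# vim commands.cfg

//format of a command definition

define command{

command_name name ; a name of your choice

command_line path/plugin arguments ; the command that is actually run (usually a plugin)

}

//a service passes arguments to a command in the form: command_name!value1!value2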
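The "!" notation can be easier to follow with a small demonstration. The sketch below splits a check_command value the way the notation suggests: the part before the first "!" is the command name, and each following field becomes $ARG1$, $ARG2$, and so on, which Nagios substitutes into command_line. The function split_check_command is a made-up helper for this illustration and is not a Nagios tool.

```shell
#!/bin/sh
# split_check_command "name!arg1!arg2"  (hypothetical helper, illustration only)
# Prints the command name followed by each argument as ARGn=value.
split_check_command() {
    old_ifs=$IFS
    IFS='!'
    set -- $1          # split the single argument on "!" into positional parameters
    IFS=$old_ifs
    echo "command_name=$1"
    shift
    n=1
    for arg in "$@"; do
        echo "ARG$n=$arg"
        n=$((n + 1))
    done
}

split_check_command 'check_local_users!1!2'
```

For the value check_local_users!1!2 used later in this article, the sketch prints command_name=check_local_users, ARG1=1, and ARG2=2, so a command_line written with -w $ARG1$ -c $ARG2$ would receive -w 1 -c 2.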

//email address that receives alert notifications

# grep -n email contacts.cfg

34 email nagios@localhost

//file that defines the time periods

# vim timeperiods.cfg

//file that defines the object templates

# vim templates.cfg

define contact{

name generic-contact ; name of this contact template

service_notification_period 24x7 ; time period during which service notifications may be sent; "24x7" is defined in timeperiods.cfg

host_notification_period 24x7 ; time period during which host notifications may be sent; "24x7" is defined in timeperiods.cfg

service_notification_options w,u,c,r ; service states that trigger a notification: w = warning, u = unknown, c = critical, r = recovery

host_notification_options d,u,r ; host states that trigger a notification: d = down, u = unreachable, r = recovery

service_notification_commands notify-service-by-email ; how service notifications are delivered (email or SMS, for example); here email, using "notify-service-by-email" defined in commands.cfg

host_notification_commands notify-host-by-email ; how host notifications are delivered; here email, using "notify-host-by-email" defined in commands.cfg

}

define host{

name generic-host ; name of this host template; it is not the name of a real machine but the name that host definitions refer to

notifications_enabled 1

event_handler_enabled 1

flap_detection_enabled 1

failure_prediction_enabled 1

process_perf_data 1

retain_status_information 1

retain_nonstatus_information 1

notification_period 24x7 ; time period during which notifications may be sent

}

define host{

name linux-server ; name of this host template

use generic-host ; inherit every setting of generic-host; inheritance is used throughout the Nagios configuration

check_period 24x7 ; time period during which the host is checked

check_interval 5 ; interval between host checks, here 5 minutes

retry_interval 1 ; interval between retries, in minutes

max_check_attempts 10 ; maximum number of check attempts; when a check fails, Nagios retries up to 10 times before treating the host as down, because the failure may only be caused by a brief network congestion or similar

check_command check-host-alive ; command used to check the host; "check-host-alive" is defined in commands.cfg

notification_period workhours ; time period during which host notifications may be sent; "workhours" is defined in timeperiods.cfg

notification_interval 120 ; minutes to wait before notifying again while the problem remains unresolved; set to 0 to notify only once per event

notification_options d,u,r ; host states that trigger a notification: d = down, u = unreachable, r = recovery

contact_groups admins ; contact group to notify; "admins" is defined in contacts.cfg

}

define service{

name generic-service ; name of this service template

active_checks_enabled 1

passive_checks_enabled 1

parallelize_check 1

obsess_over_service 1

check_freshness 0

notifications_enabled 1

event_handler_enabled 1

flap_detection_enabled 1

failure_prediction_enabled 1

process_perf_data 1

retain_status_information 1

retain_nonstatus_information 1

is_volatile 0

check_period 24x7 ; time period during which the service is checked

max_check_attempts 3 ; maximum number of check attempts for the service

normal_check_interval 10 ; interval between one regular check and the next, here 10 minutes

retry_check_interval 2 ; interval between retries, in minutes

contact_groups admins ; contact group to notify, as above

notification_options w,u,c,r ; service states that trigger a notification: w = warning, u = unknown, c = critical, r = recovery

notification_interval 60 ; minutes to wait before notifying again while the problem remains unresolved; set to 0 to notify only once per event

notification_period 24x7 ; time period during which notifications may be sent

}

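//configuration for monitoring the local host

# vim localhost.cfg

define host{ ; defines a monitored host

use linux-server ; host template to inherit from; the name must match a template in templates.cfg

host_name <host name> ; shown in the Host column of the monitoring page

address <IP address of the host> ; address of the monitored host

}

define hostgroup{ ; defines a host group

}

define service{ ; defines a monitored service

use local-service ; service template to inherit from; the name must match a template in templates.cfg

host_name <host name> ; must match the host_name of the host defined above

service_description PING ; description shown on the monitoring page

check_command check_ping!100.0,20%!500.0,60% ; a command defined in commands.cfg

}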

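Nagios monitoring in practice

How a monitoring configuration is put together

1) Define the check commands

2) Define the monitored objects

3) Have the nagios service load the configuration

Monitoring the local host //Nagios monitors the local host by default

Adding a new check: the /boot partition of the local host

# vim commands.cfg //define the check command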

define command{

command_name check_local_boot

command_line $USER1$/check_disk -w 50% -c 25% -p /boot

}

# vim localhost.cfg

define service{

use local-service

host_name localhost

service_description boot

check_command check_local_boot

}

Changing the thresholds of an existing check: number of logged-in users

define service{

use local-service ; Name of service template to use

host_name localhost

service_description Current Users

check_command check_local_users!1!2

}

Removing an existing check (by commenting it out)

#define service{

# use local-service ; Name of service template to use

# host_name localhost

# service_description Swap Usage

# check_command check_local_swap!20!10

# }

# checknagios //check for errors after every configuration change

//the main configuration file loads every file under objects/, so checking it covers whichever file was edited

# /etc/init.d/nagios stop

Stopping nagios (via systemctl): [ OK ]

# /etc/init.d/nagios start

Starting nagios (via systemctl): [ OK ]
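Because every change is followed by the same check-then-restart sequence, the two steps can be tied together so that the service is only restarted when the check passes. The sketch below is one way to do that; reload_if_valid is a hypothetical helper name, and it simply takes the verification command and the restart command as strings.

```shell
#!/bin/sh
# reload_if_valid VERIFY_CMD RESTART_CMD  (hypothetical helper, illustration only)
# Runs RESTART_CMD only if VERIFY_CMD exits successfully.
reload_if_valid() {
    if sh -c "$1" >/dev/null 2>&1; then
        sh -c "$2"
    else
        echo "configuration check failed; service left untouched" >&2
        return 1
    fi
}

# Intended use on the monitoring server, with the paths from this article:
#   reload_if_valid \
#     '/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg' \
#     '/etc/init.d/nagios stop && /etc/init.d/nagios start'
```

The verification output is discarded here to keep the sketch short; when the check fails, running checknagios by hand shows the actual errors.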

//open the monitoring page to view the results

Alert notifications

# vim contacts.cfg //recipient of alert emails

define contact{

contact_name nagiosadmin

use generic-contact ; contact template to inherit from

alias Nagios Admin

email nagios@localhost

}

# vim templates.cfg //object templates

define contact{

name generic-contact ; name of this contact template

service_notification_period 24x7 ; time period during which service notifications may be sent; "24x7" is defined in timeperiods.cfg

host_notification_period 24x7 ; time period during which host notifications may be sent; "24x7" is defined in timeperiods.cfg

}

# vim commands.cfg //command definitions

# 'notify-host-by-email' command definition

define command{

command_name notify-host-by-email

command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$

}

# 'notify-service-by-email' command definition

define command{

command_name notify-service-by-email

command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$

}
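To see what these command_line entries produce, the message body can be rendered by hand. The sketch below mimics the service notification with ordinary shell arguments standing in for the Nagios macros; render_service_alert is a made-up function for this illustration, and the Date/Time line is left out to keep the output reproducible. The key point is that printf "%b" expands the \n sequences into real line breaks before the text is piped to mail.

```shell
#!/bin/sh
# render_service_alert TYPE SERVICE HOST ADDRESS STATE OUTPUT
# (hypothetical helper; the arguments stand in for $NOTIFICATIONTYPE$,
#  $SERVICEDESC$, $HOSTALIAS$, $HOSTADDRESS$, $SERVICESTATE$, $SERVICEOUTPUT$)
render_service_alert() {
    # %b makes printf interpret the backslash escapes inside the argument.
    printf "%b" "***** Nagios *****\n\nNotification Type: $1\n\nService: $2\nHost: $3\nAddress: $4\nState: $5\n\nAdditional Info:\n\n$6\n"
}

render_service_alert PROBLEM http web12 192.168.4.12 CRITICAL "Connection refused"
```

In the real command the rendered text is piped to /usr/bin/mail -s "subject" $CONTACTEMAIL$, which is how the messages shown in the mailbox later in this article are produced.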

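Commonly used Nagios macros

Host macros

$HOSTNAME$ short host name (for example "web"), taken from host_name in the host definition

$HOSTADDRESS$ host address, taken from address in the host definition

Service macros

$SERVICESTATE$ current state of the service: OK, WARNING, UNKNOWN, or CRITICAL

$SERVICEDESC$ description of the service

Contact macros

$CONTACTNAME$ name of the contact, as defined in the contacts file

Notification macros

$NOTIFICATIONTYPE$ one of: "PROBLEM", "RECOVERY", "ACKNOWLEDGEMENT", "FLAPPINGSTART", "FLAPPINGSTOP", "FLAPPINGDISABLED", "DOWNTIMESTART", "DOWNTIMEEND", or "DOWNTIMECANCELLED"

Date/time macros

$LONGDATETIME$ current date and time stamp

File macros

$LOGFILE$ location of the log file

$MAINCONFIGFILE$ location of the main configuration file

Other macros

$ADMINEMAIL$ global administrator email address

$ARGn$ the n-th argument passed to a command (notification, event handler, service check, and so on); Nagios supports up to 32 argument macros

//three users are logged in to the local host, which reaches the critical state; check the resulting alert

//by default the alert is delivered through the local mail server

# mail -u nagios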

...

***** Nagios *****

Notification Type: PROBLEM

Service: Current Users

Host: localhost

Address: 127.0.0.1

State: CRITICAL

Date/Time: Mon Jan 8 05:06:23 EST 2018

Additional Info:

USERS CRITICAL - 3 users currently logged in

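Monitoring a remote host: 192.168.4.12

Monitoring public data (network services)

Web service, sshd service, database service

1 Define the check commands in commands.cfg

# cd /usr/local/nagios/libexec/

//test the commands by hand first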

# ./check_tcp -H 192.168.4.12 -p 22

TCP OK - 0.001 second response time on 192.168.4.12 port 22|time=0.000667s;;;0.000000;10.000000

# ./check_tcp -H 192.168.4.12 -p 80

connect to address 192.168.4.12 and port 80: Connection refused

# ./check_tcp -H 192.168.4.12 -p 3306

TCP OK - 0.001 second response time on 192.168.4.12 port 3306|time=0.001477s;;;0.000000;10.000000

//define the custom check commands

# vim /usr/local/nagios/etc/objects/commands.cfg

define command{

command_name check_12_ssh

command_line $USER1$/check_tcp -H 192.168.4.12 -p 22

}

define command{

command_name check_12_http

command_line $USER1$/check_tcp -H 192.168.4.12 -p 80

}

define command{

command_name check_12_mysql

command_line $USER1$/check_tcp -H 192.168.4.12 -p 3306

}

2 Create ser12.cfg, the configuration file for monitored host 12

# grep -v '#' localhost.cfg > ser12.cfg

# vim ser12.cfg

define host{

use linux-server

host_name web12

address 192.168.4.12

}

define service{

use local-service

host_name web12

service_description ssh

check_command check_12_ssh

}

define service{

use local-service

host_name web12

service_description http

check_command check_12_http

}

define service{

use local-service

host_name web12

service_description mysql

check_command check_12_mysql

}

//general pattern: one define host{} block, followed by one define service{} block per check

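3 Load the configuration file for remote host 12 from the main configuration file nagios.cfg

# vim /usr/local/nagios/etc/nagios.cfg

//comment out the local-host configuration file

#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

cfg_file=/usr/local/nagios/etc/objects/ser12.cfg

4 Check the configuration with the checknagios alias defined earlier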

# checknagios

Nagios Core 4.2.4

Last Modified: 12-07-2016

License: GPL

Website: https://www.nagios.org

Reading configuration data...

Read main config file okay...

Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...

Checked 11 services.

Checked 2 hosts.

Checked 1 host groups.

Checked 0 service groups.

Checked 1 contacts.

Checked 1 contact groups.

Checked 28 commands.

Checked 5 time periods.

Checked 0 host escalations.

Checked 0 service escalations.

Checking for circular paths...

Checked 2 hosts

Checked 0 service dependencies

Checked 0 host dependencies

Checked 5 timeperiods

Checking global event handlers...

Checking obsessive compulsive processor commands...

Checking misc settings...

Total Warnings: 0

Total Errors: 0

Things look okay - No serious problems were detected during the pre-flight check

5 Restart the nagios service

# /etc/init.d/nagios stop

Stopping nagios (via systemctl): [ OK ]

# /etc/init.d/nagios start

Starting nagios (via systemctl): [ OK ]

6 Open the monitoring page to view the results

7 Check the mailbox

# mail -u nagios

***** Nagios *****

Notification Type: PROBLEM

Service: http

Host: web12

Address: 192.168.4.12

State: CRITICAL

Date/Time: Mon Jan 8 20:56:01 EST 2018

Additional Info:

connect to address 192.168.4.12 and port 80: Connection refused

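[root@web12 ~]# netstat -pantu | grep httpd //on web12, httpd listens on port 8090 rather than 80, which is why the check on port 80 is refused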

tcp6 0 0 :::8090 :::* LISTEN 3304/httpd

Monitoring private data (system health)

a Configure the monitored host (host 12)

1) Install nagios-plugins, which provides the plugins that collect the data

# rpm -q gcc gcc-c++

gcc-4.8.5-4.el7.x86_64

gcc-c++-4.8.5-4.el7.x86_64

# tar -zxf nagios-plugins-2.1.4.tar.gz

# cd nagios-plugins-2.1.4/

# ./configure && make && make install

# ls /usr/local/nagios/libexec/

# cd /usr/local/nagios/libexec/

# ./check_users -w 1 -c 2

USERS OK - 1 users currently logged in |users=1;1;2;0

# ./check_disk -w 50% -c 25% -p /

DISK OK - free space: / 43674 MB (85% inode=99%);| /=7500MB;25587;38381;0;51175

2) Run the NRPE service

2.1 Install the build dependency

# yum -y install openssl-devel

2.2 Build and install

# cd

# useradd nagios

# tar -zxf nrpe-3.0.1.tar.gz

# cd nrpe-3.0.1/

# ./configure

# make

Please enter make [option] where [option] is one of:

all builds nrpe and check_nrpe

nrpe builds nrpe only

check_nrpe builds check_nrpe only

install-groups-users add the users and groups if they do not exist

install install nrpe and check_nrpe

install-plugin install the check_nrpe plugin

install-daemon install the nrpe daemon

install-config install the nrpe configuration file

install-inetd install the startup files for inetd, launchd, etc.

install-init install the startup files for init, systemd, etc.

# make all

# make install

# make install-config

# make install-init //RHEL 7 starts the service through systemd

* docs/NRPE.pdf //documentation

2.3 Edit the main configuration file

# vim /usr/local/nagios/etc/nrpe.cfg

allowed_hosts=127.0.0.1,192.168.4.21 //addresses allowed to talk to this service

//the nrpe_ prefix only distinguishes these command names from the plugin names; it is not required

command[nrpe_check_users]=/usr/local/nagios/libexec/check_users -w 1 -c 2

command[nrpe_check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20

command[nrpe_check_root]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /

command[nrpe_check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z

command[nrpe_check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200

2.4 Start the service

# systemctl start nrpe.service

# systemctl enable nrpe.service

# netstat -pantu | grep :5666

tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 27221/nrpe

tcp6 0 0 :::5666 :::* LISTEN 27221/nrpe

2.5 Test the configuration //run each command defined above

# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -p 5666 -c nrpe_check_users

USERS OK - 1 users currently logged in |users=1;1;2;0

# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -p 5666 -c nrpe_check_load

OK - load average: 0.00, 0.01, 0.05|load1=0.000;15.000;30.000;0; load5=0.010;10.000;25.000;0; load15=0.050;5.000;20.000;0;

# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -p 5666 -c nrpe_check_root

DISK OK - free space: / 43661 MB (85% inode=99%);| /=7513MB;40940;46057;0;51175

# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -p 5666 -c nrpe_check_zombie_procs

PROCS OK: 0 processes with STATE = Z | procs=0;5;10;0;

# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -p 5666 -c nrpe_check_total_procs

PROCS OK: 130 processes | procs=130;150;200;0;
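Typing one check_nrpe call per command gets repetitive as the list in nrpe.cfg grows. As a sketch, the command names can be read straight out of the configuration file and tested in a loop. The helper name list_nrpe_commands is made up for this example; it relies only on the command[name]=... line format shown above and skips commented-out lines.

```shell
#!/bin/sh
# list_nrpe_commands CFG  (hypothetical helper, illustration only)
# Prints the name inside command[...] for every active command line in CFG.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Intended use on the monitored host, with the paths from this article:
#   for c in $(list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg); do
#       /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -p 5666 -c "$c"
#   done
```

Each iteration prints the same kind of status line as the manual tests above, which makes a missing or misspelled command easy to spot.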

b Configure the monitoring server (host 21)

1) Install the plugin that connects to the NRPE service

# yum -y install openssl-devel

# cd

# tar -zxf nrpe-3.0.1.tar.gz

# cd nrpe-3.0.1/

# ./configure

# make all

# make install-plugin

//test the plugin

# /usr/local/nagios/libexec/check_nrpe -H 192.168.4.12 -p 5666 -c nrpe_check_users

USERS OK - 1 users currently logged in |users=1;1;2;0

2) Define the commands that fetch the private data (in commands.cfg)

define command{

command_name check_12_users

command_line $USER1$/check_nrpe -H 192.168.4.12 -p 5666 -c nrpe_check_users

}

define command{

command_name check_12_load

command_line $USER1$/check_nrpe -H 192.168.4.12 -p 5666 -c nrpe_check_load

}

define command{

command_name check_12_root

command_line $USER1$/check_nrpe -H 192.168.4.12 -p 5666 -c nrpe_check_root

}

define command{

command_name check_12_zombie_procs

command_line $USER1$/check_nrpe -H 192.168.4.12 -p 5666 -c nrpe_check_zombie_procs

}

define command{

command_name check_12_total_procs

command_line $USER1$/check_nrpe -H 192.168.4.12 -p 5666 -c nrpe_check_total_procs

}
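The five definitions above differ only in the check name, so they can also be generated instead of typed. The following is a sketch under that assumption; gen_nrpe_commands is a hypothetical helper, and it hard-codes the check_<tag>_<name> and nrpe_check_<name> naming scheme and port 5666 used in this article.

```shell
#!/bin/sh
# gen_nrpe_commands TAG ADDRESS NAME...  (hypothetical helper, illustration only)
# Prints one "define command" block per NAME, following the naming used above.
gen_nrpe_commands() {
    tag=$1; addr=$2; shift 2
    for name in "$@"; do
        printf 'define command{\n'
        printf '    command_name check_%s_%s\n' "$tag" "$name"
        # $USER1$ is kept literal so that Nagios expands it, not the shell.
        printf '    command_line $USER1$/check_nrpe -H %s -p 5666 -c nrpe_check_%s\n' "$addr" "$name"
        printf '}\n\n'
    done
}

gen_nrpe_commands 12 192.168.4.12 users load root zombie_procs total_procs
```

The output could be appended to commands.cfg with a redirect. Generating the blocks this way also keeps the command names consistent with the names used in nrpe.cfg on the monitored host.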

3) Use the new commands in the configuration file of the monitored host

# vim /usr/local/nagios/etc/objects/ser12.cfg

define service{

use local-service

host_name web12

service_description users

check_command check_12_users

}

define service{

use local-service

host_name web12

service_description load

check_command check_12_load

}

define service{

use local-service

host_name web12

service_description root

check_command check_12_root

}

define service{

use local-service

host_name web12

service_description zombie_proc

check_command check_12_zombie_procs

}

define service{

use local-service

host_name web12

service_description total_procs

check_command check_12_total_procs

}
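A service whose check_command does not match any command_name in commands.cfg is an easy mistake to make when the names are this similar. The configuration check with nagios -v should report it, but a quick cross-check can also be sketched with standard tools. The helper undefined_commands below is made up for this example and assumes the one-directive-per-line layout used in this article.

```shell
#!/bin/sh
# undefined_commands COMMANDS_CFG OBJECT_CFG  (hypothetical helper, illustration only)
# Prints every check_command used in OBJECT_CFG that has no matching
# command_name in COMMANDS_CFG. Arguments after "!" are ignored.
undefined_commands() {
    defined=$(awk '$1 == "command_name" { print $2 }' "$1")
    awk '$1 == "check_command" { split($2, parts, "!"); print parts[1] }' "$2" |
    sort -u |
    while read -r cmd; do
        # -x matches the whole line, -F treats the name as a fixed string.
        printf '%s\n' "$defined" | grep -qxF -- "$cmd" || echo "$cmd"
    done
}

# Intended use on the monitoring server, with the paths from this article:
#   cd /usr/local/nagios/etc/objects && undefined_commands commands.cfg ser12.cfg
```

An empty result means every service refers to a command that exists; any name that is printed needs to be corrected in one of the two files.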

4) Check the configuration with checknagios

# checknagios

Nagios Core 4.2.4

Last Modified: 12-07-2016

License: GPL

Website: https://www.nagios.org

Reading configuration data...

Read main config file okay...

Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...

Checked 8 services.

Checked 1 hosts.

Checked 0 host groups.

Checked 0 service groups.

Checked 1 contacts.

Checked 1 contact groups.

Checked 33 commands.

Checked 5 time periods.

Checked 0 host escalations.

Checked 0 service escalations.

Checking for circular paths...

Checked 1 hosts

Checked 0 service dependencies

Checked 0 host dependencies

Checked 5 timeperiods

Checking global event handlers...

Checking obsessive compulsive processor commands...

Checking misc settings...

Total Warnings: 0

Total Errors: 0

Things look okay - No serious problems were detected during the pre-flight check

5) Restart the nagios service

# /etc/init.d/nagios stop

Stopping nagios (via systemctl): [ OK ]

# /etc/init.d/nagios start

Starting nagios (via systemctl): [ OK ]

6) Open the monitoring page to view the results

Original article: http://blog.51cto.com/13558754/2059158
