Monitoring processes and system state with Monit

Date: 2016-08-12 22:03:17

Tags: monit

References:

http://heylinux.com/archives/3063.html

https://mmonit.com/wiki/Monit/ConfigurationExamples

https://mmonit.com/wiki/Monit/Gmail

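Monit's main strengths are a simple, readable configuration file and support for monitoring both processes and system state. It offers flexible checks and check intervals, and it can raise alerts and respond to failures, for example by restarting a service or running a command.

Installation and configuration:

Monit is packaged in the EPEL repository, so you must configure EPEL first. Then install it: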

yum -y install monit
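If EPEL is not configured yet, the two steps can be sketched as below. This assumes a CentOS-family system where the `epel-release` package is available from the default repositories (on other RHEL variants the repository may need to be added another way), and the function name `install_monit` is only illustrative.

```shell
# Enable EPEL, then install Monit from it.
# Assumption: the epel-release package is installable from the default repos.
install_monit() {
    yum -y install epel-release && yum -y install monit
}

# Usage (as root):
#   install_monit
```

The second `yum` call only runs if the first one succeeds, so a missing EPEL repository is reported before Monit is attempted.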

Here is the configuration file after my changes:

[root@aliyun_test ~]# cat /etc/monit.conf

###############################################################################

## Monit control file

###############################################################################

## Comments begin with a '#' and extend through the end of the line. Keywords

## are case insensitive. All path's MUST BE FULLY QUALIFIED, starting with '/'.

## Below you will find examples of some frequently used statements. For

## information about the control file and a complete list of statements and

## options, please have a look in the Monit manual.

###############################################################################

## Global section

###############################################################################

## Start Monit in the background (run as a daemon):

set daemon 30 # check services at 30 seconds intervals

# with start delay 240 # optional: delay the first check by 4-minutes (by

# # default Monit check immediately after Monit start)

## Set syslog logging. If you want to log to a standalone log file instead,

## specify the full path to the log file

set logfile syslog

## Set the location of the Monit lock file which stores the process id of the

## running Monit instance. By default this file is stored in $HOME/.monit.pid

set pidfile /var/run/monit.pid

## Set the location of the Monit id file which stores the unique id for the

## Monit instance. The id is generated and stored on first Monit start. By

## default the file is placed in $HOME/.monit.id.

set idfile /var/.monit.id

## Set the location of the Monit state file which saves monitoring states

## on each cycle. By default the file is placed in $HOME/.monit.state. If

## the state file is stored on a persistent filesystem, Monit will recover

## the monitoring state across reboots. If it is on temporary filesystem, the

## state will be lost on reboot which may be convenient in some situations.

set statefile /var/.monit.state

## Set the list of mail servers for alert delivery. Multiple servers may be

## specified using a comma separator. If the first mail server fails, Monit

# will use the second mail server in the list and so on. By default Monit uses

# port 25 - it is possible to override this with the PORT option.

set mailserver localhost # primary mailserver

# backup.bar.baz port 10025, # backup mailserver on port 10025

# localhost # fallback relay

## By default Monit will drop alert events if no mail servers are available.

## If you want to keep the alerts for later delivery retry, you can use the

## EVENTQUEUE statement. The base directory where undelivered alerts will be

## stored is specified by the BASEDIR option. You can limit the queue size

## by using the SLOTS option (if omitted, the queue is limited by space

## available in the back end filesystem).

set eventqueue

basedir /var/monit # set the base directory where events will be stored

# slots 100 # optionally limit the queue size

## Send status and events to M/Monit (for more information about M/Monit

## see http://mmonit.com/). By default Monit registers credentials with

## M/Monit so M/Monit can smoothly communicate back to Monit and you don't

## have to register Monit credentials manually in M/Monit. It is possible to

## disable credential registration using the commented out option below.

## Though, if safety is a concern we recommend instead using https when

## communicating with M/Monit and send credentials encrypted.

# set mmonit http://monit:monit@192.168.1.10:8080/collector

# # and register without credentials # Don't register credentials

## Monit by default uses the following format for alerts if the mail-format

## statement is missing:

## --8<--

set mail-format {

from: monit@$HOST

subject: monit alert -- $EVENT $SERVICE

message: $EVENT Service $SERVICE

Date: $DATE

Action: $ACTION

Host: $HOST

Description: $DESCRIPTION

Your faithful employee,

Monit

}

## --8<--

## You can override this message format or parts of it, such as subject

## or sender using the MAIL-FORMAT statement. Macros such as $DATE, etc.

## are expanded at runtime. For example, to override the sender, use:

# set mail-format { from: monit@foo.bar }

## You can set alert recipients who will receive alerts if/when a

## service defined in this file has errors. Alerts may be restricted on

## events by using a filter as in the second example below.

set alert 13817419446@139.com

## Do not alert when Monit starts, stops or performs a user initiated action.

## This filter is recommended to avoid getting alerts for trivial cases.

# set alert your-name@your.domain not on { instance, action }

## Monit has an embedded HTTP interface which can be used to view status of

## services monitored and manage services from a web interface. The HTTP

## interface is also required if you want to issue Monit commands from the

## command line, such as 'monit status' or 'monit restart service'. The reason

## for this is that the Monit client uses the HTTP interface to send these

## commands to a running Monit daemon. See the Monit Wiki if you want to

## enable SSL for the HTTP interface.

set httpd port 2812 and

use address localhost # only accept connection from localhost

allow localhost # allow localhost to connect to the server and

allow admin:monit # require user 'admin' with password 'monit'

###############################################################################

## Services

###############################################################################

## Check general system resources such as load average, cpu and memory

## usage. Each test specifies a resource, conditions and the action to be

## performed should a test fail.

# check system $HOST

# if loadavg (1min) > 4 then alert

# if loadavg (5min) > 2 then alert

# if cpu usage > 95% for 10 cycles then alert

# if memory usage > 75% then alert

# if swap usage > 25% then alert

## Check if a file exists, checksum, permissions, uid and gid. In addition

## to alert recipients in the global section, customized alert can be sent to

## additional recipients by specifying a local alert handler. The service may

## be grouped using the GROUP option. More than one group can be specified by

## repeating the 'group name' statement.

# check file apache_bin with path /usr/local/apache/bin/httpd

# if failed checksum and

# expect the sum 8f7f419955cefa0b33a2ba316cba3659 then unmonitor

# if failed permission 755 then unmonitor

# if failed uid root then unmonitor

# if failed gid root then unmonitor

# alert security@foo.bar on {

# checksum, permission, uid, gid, unmonitor

# } with the mail-format { subject: Alarm! }

# group server

## Check that a process is running, in this case Apache, and that it respond

## to HTTP and HTTPS requests. Check its resource usage such as cpu and memory,

## and number of children. If the process is not running, Monit will restart

## it by default. In case the service is restarted very often and the

## problem remains, it is possible to disable monitoring using the TIMEOUT

## statement. This service depends on another service (apache_bin) which

## is defined above.

# check process apache with pidfile /usr/local/apache/logs/httpd.pid

# start program = "/etc/init.d/httpd start" with timeout 60 seconds

# stop program = "/etc/init.d/httpd stop"

# if cpu > 60% for 2 cycles then alert

# if cpu > 80% for 5 cycles then restart

# if totalmem > 200.0 MB for 5 cycles then restart

# if children > 250 then restart

# if loadavg(5min) greater than 10 for 8 cycles then stop

# if failed host www.tildeslash.com port 80 protocol http

# and request "/somefile.html"

# then restart

# if failed port 443 type tcpssl protocol http

# with timeout 15 seconds

# then restart

# if 3 restarts within 5 cycles then unmonitor

# depends on apache_bin

# group server

## Check filesystem permissions, uid, gid, space and inode usage. Other services,

## such as databases, may depend on this resource and an automatically graceful

## stop may be cascaded to them before the filesystem will become full and data

## lost.

# check filesystem datafs with path /dev/sdb1

# start program = "/bin/mount /data"

# stop program = "/bin/umount /data"

# if failed permission 660 then unmonitor

# if failed uid root then unmonitor

# if failed gid disk then unmonitor

# if space usage > 80% for 5 times within 15 cycles then alert

# if space usage > 99% then stop

# if inode usage > 30000 then alert

# if inode usage > 99% then stop

# group server

## Check a file's timestamp. In this example, we test if a file is older

## than 15 minutes and assume something is wrong if it's not updated. Also,

## if the file size exceeds a given limit, execute a script

# check file database with path /data/mydatabase.db

# if failed permission 700 then alert

# if failed uid data then alert

# if failed gid data then alert

# if timestamp > 15 minutes then alert

# if size > 100 MB then exec "/my/cleanup/script" as uid dba and gid dba

## Check directory permission, uid and gid. An event is triggered if the

## directory does not belong to the user with uid 0 and gid 0. In addition,

## the permissions have to match the octal description of 755 (see chmod(1)).

# check directory bin with path /bin

# if failed permission 755 then unmonitor

# if failed uid 0 then unmonitor

# if failed gid 0 then unmonitor

## Check a remote host availability by issuing a ping test and check the

## content of a response from a web server. Up to three pings are sent and

## connection to a port and an application level network check is performed.

# check host myserver with address 192.168.1.1

# if failed ping then alert

# if failed port 3306 protocol mysql with timeout 15 seconds then alert

# if failed port 80 protocol http

# and request /some/path with content = "a string"

# then alert

## Check a network link status (up/down), link capacity changes, saturation

## and bandwidth usage.

# check network public with interface eth0

# if failed link then alert

# if changed link then alert

# if saturation > 90% then alert

# if download > 10 MB/s then alert

# if total upload > 1 GB in last hour then alert

## Check custom program status output.

# check program myscript with path /usr/local/bin/myscript.sh

# if status != 0 then alert

###############################################################################

## Includes

###############################################################################

## It is possible to include additional configuration parts from other files or

## directories.

# include /etc/monit.d/*

# set daemon mode timeout to 1 minute

set daemon 60

# Include all files from /etc/monit.d/

include /etc/monit.d/*

Note: for email alerts to work with this configuration, a mail server must be listening on port 25 on localhost.

Next, set up monitoring for MySQL:
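One way to confirm that requirement is to look for a listener on port 25 in the socket listing. The helper below is a minimal sketch: `has_listener` is a hypothetical name, and it assumes the column layout printed by `ss -lnt` (state in the first column, local address and port in the fourth).

```shell
# Succeed if the socket listing on stdin shows a listener on the given port.
# Assumes `ss -lnt` style columns: State Recv-Q Send-Q Local-Address:Port ...
has_listener() {
    awk -v p=":$1" '
        $1 == "LISTEN" && substr($4, length($4) - length(p) + 1) == p { found = 1 }
        END { exit !found }
    '
}

# Usage on the monitored host:
#   ss -lnt | has_listener 25 && echo "SMTP is listening" || echo "nothing on port 25"
```

Matching on the `:25` suffix of the local address avoids false positives from ports such as 125 or 2525.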

vim /etc/monit.d/mysql

check process mysql with pidfile /mydata/data/aliyun_test.pid

start program = "/etc/init.d/mysqld start" with timeout 10 seconds

stop program = "/etc/init.d/mysqld stop"

if failed port 3306 protocol mysql

with timeout 10 seconds

then restart

if 3 restarts within 5 cycles then unmonitor

group server
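Before starting the service it can be worth running Monit's built-in syntax check (`monit -t`) so that a typo in `/etc/monit.d/mysql` is caught early. A minimal sketch, assuming the `monit` binary is on the PATH and the same `service` wrapper used in this article; the function name is hypothetical:

```shell
# Run Monit's control-file syntax check and start the service only if it passes.
start_monit_if_valid() {
    if monit -t; then
        service monit start
    else
        echo "monit -t reported a syntax error; service not started" >&2
        return 1
    fi
}

# Usage (as root):
#   start_monit_if_valid
```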

Start the monit service:

service monit start

[root@aliyun_test ~]# tail -f /var/log/monit

[CST Aug 12 13:01:00] info : Monit daemon with pid [20358] stopped

[CST Aug 12 13:01:00] info : 'aliyun_test' Monit 5.14 stopped

[CST Aug 12 13:01:00] info : Starting Monit 5.14 daemon with http interface at [localhost]:2812

[CST Aug 12 13:01:00] info : Starting Monit HTTP server at [localhost]:2812

[CST Aug 12 13:01:00] info : Monit HTTP server started

[CST Aug 12 13:01:00] info : 'aliyun_test' Monit 5.14 started

Simulate a MySQL crash by stopping the service manually:

service mysqld stop

Watch the log:

[CST Aug 12 13:51:04] error : 'mysql' process is not running

[CST Aug 12 13:51:04] info : 'mysql' trying to restart

[CST Aug 12 13:51:04] info : 'mysql' start: /etc/init.d/mysqld

[CST Aug 12 13:52:05] info : 'mysql' process is running with pid 21246
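To see how often Monit has had to step in, the restart attempts in the log can be counted. This is a small sketch that assumes the log format shown above, with the service name in single quotes followed by "trying to restart"; `restart_count` is a hypothetical helper name.

```shell
# Count the restart attempts Monit logged for one service.
# Reads Monit log lines on stdin; $1 is the service name from the check statement.
restart_count() {
    # grep -c exits non-zero when the count is 0, so mask that status
    grep -c -F "'$1' trying to restart" || true
}

# Usage:
#   restart_count mysql < /var/log/monit
```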

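The alert emails (shown as screenshots in the original post):

The first email reports "does not exist" for mysql, meaning the process has failed and Monit is about to restart it.

The second email shows that the mysql service is back online and the fault has been resolved.

Note: Monit also has a web interface, which is set up in the configuration file. The setting used above is: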

set httpd port 2812 and

use address localhost # only accept connection from localhost

allow localhost # allow localhost to connect to the server and

allow admin:monit # require user 'admin' with password 'monit'

Change it to the following, where PUBLIC_IP stands for the server's public IP address:

set httpd port 2812 and

use address PUBLIC_IP # listen on the public IP instead of localhost

allow PUBLIC_IP # accept connections from this IP

allow admin:monit # require user 'admin' with password 'monit'

allow @monit # allow users of group 'monit' to connect (rw)

allow @users readonly # allow users of group 'users' to connect readonly

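Then open the interface in a browser to view the monitoring status.

To learn more about Monit, see the references at the top of this article.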
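The same status can also be fetched from a script. The sketch below assumes the `admin:monit` credentials and port 2812 from the configuration above, that `curl` is installed, and that the Monit build serves its status report at `/_status?format=xml`; the function name is hypothetical.

```shell
# Fetch Monit's status report as XML over its HTTP interface.
# $1: the host or IP that the "use address" line binds to.
# Assumption: credentials and port match the httpd section shown above.
monit_status_xml() {
    curl -s -u admin:monit "http://$1:2812/_status?format=xml"
}

# Usage:
#   monit_status_xml PUBLIC_IP
```

Because this sends the password over plain HTTP, it is best kept to trusted networks unless SSL is enabled for the interface.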


Original article: http://huangsir007.blog.51cto.com/6159353/1837281
