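Tags: mongos
I. Incident description
Our production game servers use a sharded MongoDB cluster, and the game's configuration routes query requests through a mongos instance. A developer reported that the application got the following error when connecting to mongos: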
Failed to connect to: 10.4.4.66:28018: send_package: error reading from socket: The socket is closed
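Shortly afterwards the connection worked again.
10.4.4.66 is the IP address of the mongos host.
II. Incident analysis
1. Check that the mongos process is healthy and that the mongo shell can log in to mongos. Everything was normal.
2. Check the mongos log file for anything abnormal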
***** SERVER RESTARTED *****
Thu Apr 23 15:02:01.726 [mongosMain] MongoS version 2.4.6 starting: pid=7553 port=28018 64-bit host=XXXXXXX 1 (--help for usage)
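The log shows that mongos was restarted at exactly that time. After a restart, mongos has a series of startup tasks to complete, such as connecting to the config servers to fetch information about the backend shards, so the application reports errors during this window.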
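Because the error only lasts for the short window while mongos is down or still starting, a caller can often ride it out by retrying. The following is a minimal shell sketch of that idea, not the application's actual code (the article does not show it). The helper name `retry` and its arguments are hypothetical; only the host and port in the usage comment come from the error message above.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS is used up.
retry() {
    local attempts="$1"
    local delay="$2"
    local i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the command succeeds.
        if "$@"; then
            return 0
        fi
        echo "attempt $i/$attempts failed" >&2
        # Wait before the next attempt, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example (assumes the mongo shell is installed on the client):
# retry 5 2 mongo --host 10.4.4.66 --port 28018 --eval 'db.runCommand({ping: 1})'
```

A fixed delay keeps the sketch short. With a watchdog that checks once a minute, the total retry time would need to cover roughly a minute of downtime to be useful.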
3. Check the Linux system log
$ sudo grep mongos /var/log/messages
Apr 10 15:35:38 localhost sz[32066]: [xxxx] check_mongos.sh/ZMODEM: 211 Bytes, 229 BPS
Apr 23 14:50:18 localhost sz[5794]: [xxxxx] mongos/ZMODEM: 297 Bytes, 151 BPS
Apr 23 15:01:55 localhost kernel: [20387] 497 20387 694326 427932 0 0 0 mongos
Apr 23 15:01:55 localhost kernel: Out of memory: Kill process 20387 (mongos) score 890 or sacrifice child
Apr 23 15:01:55 localhost kernel: Killed process 20387, UID 497, (mongos) total-vm:2777304kB, anon-rss:1711700kB, file-rss:28kB
This shows that the system ran out of memory and the kernel's OOM killer killed the mongos process.
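To pull only the OOM kills out of a noisy system log, the "Killed process" lines can be reduced to a compact summary. This is a rough sketch that assumes the kernel message format shown in the output above; other kernel versions word this line differently, and the function name `oom_kills` is hypothetical.

```shell
# oom_kills [FILE...]
# Hypothetical helper: prints one summary line per "Killed process" kernel
# message. Reads the given files, or standard input when no file is given.
# Assumes a 15-character syslog timestamp and the message layout shown above.
oom_kills() {
    grep -E 'kernel: Killed process [0-9]+' "$@" |
    sed -E 's/^(.{15}) .*Killed process ([0-9]+),.*\(([^)]+)\).*anon-rss:([0-9]+)kB.*/\1 pid=\2 name=\3 anon_rss_kB=\4/'
}

# Example:
# sudo cat /var/log/messages | oom_kills
```

For the kill recorded above this reports an anonymous RSS of 1711700 kB, about 1.6 GiB, for the mongos process.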
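4. Check the log of the script that restarts mongos
To make sure mongos is restarted automatically if its process dies, we added a scheduled task that checks every minute whether the mongos process exists and starts mongos if it does not.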
mongos is not running
Starting mongod: about to fork child process, waiting until server is ready for connections.
forked process: 7553
all output going to: /data/app_data/mongodb/log/mongos.log
child process started successfully, parent exiting
[ OK ]
After the OOM killer killed mongos, the script detected that mongos was down and started it again.
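The once-a-minute check described in step 4 could look roughly like the sketch below. The article does not show the real script, so this is only an illustration: the function names `is_running` and `ensure_running`, the init script path, and the crontab paths are all assumptions.

```shell
# is_running NAME
# Succeeds when a process whose name is exactly NAME exists.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# ensure_running NAME START_COMMAND [ARGS...]
# Hypothetical helper: reports the state of NAME and runs START_COMMAND
# only when no such process is found.
ensure_running() {
    local name="$1"
    shift
    if is_running "$name"; then
        echo "$name is running"
    else
        echo "$name is not running"
        "$@"
    fi
}

# Example call inside the check script (init script path is an assumption):
# ensure_running mongos /etc/init.d/mongos start
#
# Example crontab entry running the check every minute (paths are assumptions):
# * * * * * /path/to/check_mongos.sh >> /path/to/check_mongos.log 2>&1
```

A watchdog like this only shortens the outage. The process still gets killed whenever memory runs out, which is why step 5 matters.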
5. Investigate why mongos ran out of memory
This article comes from the "Linux SA John" blog. Please keep this attribution: http://john88wang.blog.51cto.com/2165294/1637695