A chain of problems set off by one Logstash warning: "Reached open files limit: 4095, set by the 'max_open_files' option or default, files yet to open: 375248"

!!! Please upgrade your java version, the current version 1.7.0_45-mockbuild_2013_10_22_03_37-b00 may cause problems. We recommend a minimum version of 1.7.0_51
{:timestamp=>"2018-11-02T17:46:20.823000+0800", :message=>"Pipeline main started"}
!!! Please upgrade your java version, the current version 1.7.0_45-mockbuild_2013_10_22_03_37-b00 may cause problems. We recommend a minimum version of 1.7.0_51
{:timestamp=>"2018-11-02T17:46:25.727000+0800", :message=>"Pipeline main started"}
{:timestamp=>"2018-11-02T17:47:10.274000+0800", :message=>"Reached open files limit: 4095, set by the 'max_open_files' option or default, files yet to open: 375248", :level=>:warn}
{:timestamp=>"2018-11-02T17:47:15.429000+0800", :message=>"Reached open files limit: 4095, set by the 'max_open_files' option or default, files yet to open: 375248", :level=>:warn}
{:timestamp=>"2018-11-02T17:47:39.819000+0800", :message=>"Reached open files limit: 4095, set by the 'max_open_files' option or default, files yet to open: 375248", :level=>:warn}
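To see whether the backlog shrinks after a configuration change, the limit and the pending count can be pulled out of these warning lines. This is a minimal sketch: the helper name open_files_backlog and the log path in the usage comment are assumptions, not something from the original setup.

```shell
# Print "<limit> <files yet to open>" for each matching warning line.
# Reads the files given as arguments, or stdin when none are given.
open_files_backlog() {
    sed -n 's/.*Reached open files limit: \([0-9][0-9]*\),.*files yet to open: \([0-9][0-9]*\).*/\1 \2/p' "$@"
}

# Usage (the log path is an assumption; use wherever Logstash writes its log):
#   open_files_backlog /var/log/logstash/logstash.log | tail -n 1
```

If the second number stays the same across restarts, the change made so far has probably not reached the process that emits the warning.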




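Given the warning Logstash was throwing, my first thought was to change a configuration setting. I was new to Logstash, though, so which hosts needed changing? I started by running ulimit -a on the load-balancer server that hosts Logstash, and found that its limits had already been changed and raised to the maximum: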

[linkage@dandianwg-gate3 bin]$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 193095
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65535
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


[linkage@dandianwg-es1 ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 128456
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 1024
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
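Note that ulimit -a reports the limits of the current login shell. A process that is already running keeps the limits it was started with, and on Linux those can be read from /proc/<pid>/limits. A minimal sketch, with nofile_from_limits as a hypothetical helper name:

```shell
# Print the soft and hard "Max open files" values from a /proc/<pid>/limits file.
nofile_from_limits() {
    awk '/^Max open files/ { print $4, $5 }' "$1"
}

# Usage: inspect the current shell ($$); substitute the PID of the
# Logstash or Elasticsearch process to check a running service.
#   nofile_from_limits "/proc/$$/limits"
```

Comparing this value with the ulimit -n of a fresh login shows whether the service needs a restart to pick up a new limit.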

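# The ES cluster host, by contrast, still had open files at 1024, so I started changing the configuration on the ES cluster hosts: edit /etc/security/limits.conf and /etc/security/limits.d/90-nproc.conf, and raise every limit to 65535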

[root@dandianwg-es1 ~]$ vi /etc/security/limits.conf 
# End of file
*       soft    nproc   65535
*       hard    nproc   65535
*       soft    nofile  65535
*       hard    nofile  65535

[root@dandianwg-es1 ~]$
[root@dandianwg-es1 ~]$ vi /etc/security/limits.d/90-nproc.conf
# Default limit for number of users processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
*          soft    nproc     65535
root       soft    nproc     65535 
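Because pam_limits reads both /etc/security/limits.conf and the files under /etc/security/limits.d, it helps to list the active entries for one item across all of them before logging in again. A minimal sketch; show_limit is a hypothetical helper name:

```shell
# Print "file: domain type item value" for every active (uncommented)
# entry of ITEM found in the given limits files.
show_limit() {
    item=$1; shift
    awk -v item="$item" \
        '$1 !~ /^#/ && NF >= 4 && $3 == item { print FILENAME ": " $1, $2, $3, $4 }' "$@"
}

# Usage:
#   show_limit nofile /etc/security/limits.conf /etc/security/limits.d/*.conf
#   show_limit nproc  /etc/security/limits.conf /etc/security/limits.d/*.conf
```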


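Logging in to the ES cluster host as root, ulimit -a showed the new settings in effect. After switching to the regular user, though, they had not taken effect: open files was still 1024. (This one was baffling. After a lot of configuration tweaking and searching on Baidu, I happened across an article saying it was related to ssh.)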
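The article is not quoted here, so what exactly it blamed on ssh is unknown. One commonly reported cause, offered only as an assumption, is sshd running with UsePAM set to no: in that case the PAM session stack is skipped and pam_limits never applies limits.conf to ssh logins. A minimal sketch for checking that setting, with usepam_value as a hypothetical helper (it ignores Match blocks):

```shell
# Print the first active UsePAM value in an sshd_config file,
# or "unset" if the file has no uncommented UsePAM line.
usepam_value() {
    awk 'tolower($1) == "usepam" { print tolower($2); found = 1; exit }
         END { if (!found) print "unset" }' "$1"
}

# Usage:
#   usepam_value /etc/ssh/sshd_config
#   grep -r pam_limits /etc/pam.d/
```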


[linkage@dandianwg-es1 ~]$ cat /etc/security/limits.d/90-nproc.conf 
# Default limit for number of users processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
*          soft    nproc     unlimited
root       soft    nproc     unlimited


[linkage@dandianwg-app bin]$ ssh -V
OpenSSH_5.3p1, OpenSSL 1.0.0-fips 29 Mar 2010

[linkage@dandianwg-app bin]$ openssl version -a
OpenSSL 1.0.1s  1 Mar 2016
built on: Wed Aug 17 15:42:37 2016
platform: linux-x86_64
options:  bn(64,64) rc4(16x,int) des(idx,cisc,16,int) idea(int) blowfish(idx) 
OPENSSLDIR: "/usr/local/ssl"
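The two outputs above disagree: ssh reports OpenSSL 1.0.0-fips, while the openssl binary on the PATH is 1.0.1s under /usr/local/ssl. A small sketch for comparing the two version strings; openssl_ver is a hypothetical helper name:

```shell
# Extract the OpenSSL version token from `ssh -V` or `openssl version` output on stdin.
openssl_ver() {
    sed -n 's/.*OpenSSL \([^ ]*\).*/\1/p'
}

# Usage (ssh -V writes to stderr, hence the redirect):
#   ssh_ssl=$(ssh -V 2>&1 | openssl_ver)
#   cli_ssl=$(openssl version | openssl_ver)
#   [ "$ssh_ssl" = "$cli_ssl" ] || echo "mismatch: ssh=$ssh_ssl openssl=$cli_ssl"
```

A mismatch like this suggests more than one OpenSSL installation on the host, which matters when OpenSSH is later rebuilt from source.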



tar -xvf openssh-7.9p1.tar.gz
tar -xvf openssl-1.0.2p.tar.gz
tar -xvf zlib-1.2.11.tar.gz

cd /export/home/tools/zlib-1.2.11
./configure --prefix=/usr 
make -j4 && make install

cd /export/home/tools/openssl-1.0.2p
./config --prefix=/usr  --shared  zlib
make -j4 && make install

cd /export/home/tools/openssh-7.9  

  ./configure --prefix=/usr --with-pam --with-zlib --with-md5-passwords --without-openssl-header-check

make && make install

sed -i 's/SSHD=\/usr\/local\/sbin\/sshd/SSHD=\/usr\/sbin\/sshd/g'  /etc/init.d/sshd
service sshd restart

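# Pitfall 3: because this was a source install, the build succeeded and the version check looked normal, yet nobody could log in any more, neither the regular user nor root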

[linkage@dandianwg-gate2 bin]$ ssh
The authenticity of host ( cant be established.
RSA key fingerprint is b8:f3:61:35:51:2e:34:22:d5:b4:96:b9:e9:a4:cc:f5.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added (RSA) to the list of known hosts.
linkage@ password: 
Permission denied, please try again.
linkage@ password: 
Permission denied, please try again.
linkage@ password: 
Permission denied (publickey,password,keyboard-interactive).




cd /export/home/tools/openssh-7.9 
./configure --prefix=/usr --with-pam --with-zlib --with-md5-passwords --without-openssl-header-check
make && make install
sed -i 's/SSHD=\/usr\/local\/sbin\/sshd/SSHD=\/usr\/sbin\/sshd/g'  /etc/init.d/sshd
service sshd restart

cd /export/home/tools/openssh-7.9  
./configure --with-ssl-dir=/usr/local/ssl --with-openssl-includes=/usr/local/ssl/include/openssl --with-openssl-libraries=/usr/local/ssl/lib
make && make install
sed -i 's/SSHD=\/usr\/sbin\/sshd/SSHD=\/usr\/local\/sbin\/sshd/g'  /etc/init.d/sshd
service sshd restart


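# After the ssh upgrade, I tested logins as both the regular user and root, and the open files setting was now in effect. (After a series of twists and turns, the problem of the open files change not applying to regular users was solved.)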


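# Back to the main topic: fixing the Logstash warning. Filling in all the pitfalls above should already have resolved it, so why was Logstash still throwing the open files warning???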



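So all the pitfall-filling above only sorted out the ulimit configuration of the ES cluster. The warning was actually coming from the app host; all that was needed was to raise the limit configuration on the app host to the maximum.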
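Applying the same limits on the app host can be made repeatable so that running it twice does not pile up duplicate lines. A minimal sketch, assuming root access on the app host; set_limit is a hypothetical helper, and the values mirror the ones used on the ES hosts above:

```shell
# set_limit FILE DOMAIN TYPE ITEM VALUE
# Replace any active entry for DOMAIN/TYPE/ITEM in FILE, or append one if absent.
set_limit() {
    file=$1 domain=$2 type=$3 item=$4 value=$5
    tmp=$(mktemp) || return 1
    # Keep every line except an existing uncommented entry for the same key.
    awk -v d="$domain" -v t="$type" -v i="$item" \
        '!($1 == d && $2 == t && $3 == i)' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    printf '%s\t%s\t%s\t%s\n' "$domain" "$type" "$item" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (as root, then log in again and check with ulimit -n):
#   set_limit /etc/security/limits.conf '*' soft nofile 65535
#   set_limit /etc/security/limits.conf '*' hard nofile 65535
```

The new limit only applies to sessions and processes started afterwards, so Logstash has to be restarted from a fresh login to pick it up.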



