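Our company built a custom cloud desktop platform on OpenStack for its customers. The architecture is 3 controller nodes + N compute nodes + N Ceph nodes with Cinder. It had been fairly stable, but today some customers reported that double-clicking the cloud instance icon did not open the cloud desktop, or that it took several clicks to get in. Investigation showed the cause was very high memory usage on 2 of the controller nodes, which had to be freed. The detailed procedure follows.
1. Check memory usage on the controller nodes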
[root@node-6 ~]# top
top - 15:29:40 up 18 days, 17:15, 1 user, load average: 1.27, 1.83, 2.09
Tasks: 1062 total, 6 running, 1056 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.2%us, 1.1%sy, 0.0%ni, 86.4%id, 0.1%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 99032136k total, 97932848k used, 1099288k free, 233100k buffers
Swap: 33554428k total, 295916k used, 33258512k free, 38017888k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2926 rabbitmq 20 0 2904m 449m 2676 S 97.4 0.5 0:12.85 beam.smp
1868 root 10 -10 44932 11m 7240 S 15.0 0.0 2149:53 ovs-vswitchd
7982 mysql -2 0 24.8g 2.8g 138m S 9.4 3.0 4617:11 mysqld
......
"Mem: 99032136k total, 97932848k used" shows that the controller node's memory is almost entirely used.
Check the status of the nova services from the controller node:
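This version of top counts buffers and page cache as part of "used" (97932848k used is exactly 99032136k total minus 1099288k free). A minimal sketch, assuming the standard /proc/meminfo field names, that subtracts buffers and cache to estimate the memory actually held by processes; the function name app_used_kb is our own, not a system tool:

```shell
#!/bin/sh
# Estimate memory held by processes: "used" (total - free) minus buffers and page cache.
# Reads text in /proc/meminfo format (values in kB); defaults to the live file.
# app_used_kb is a hypothetical helper name.
app_used_kb() {
    awk '
        /^MemTotal:/ { total = $2 }
        /^MemFree:/  { free  = $2 }
        /^Buffers:/  { buf   = $2 }
        /^Cached:/   { cache = $2 }
        END { print total - free - buf - cache }
    ' "${1:-/proc/meminfo}"
}
```

Plugging in the figures from the top output above (total 99032136k, free 1099288k, buffers 233100k, cached 38017888k) gives 59681860 kB, close to the 59740848k "used" that top shows right after the caches are dropped in step 2. In other words, 38250988k of the "used" figure was buffers and page cache.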
[root@node-6 ~]# nova-manage service list
Binary Host Zone Status State Updated_At
nova-consoleauth node-6.domain.tld internal enabled :-) 2017-05-04 07:26:39
nova-scheduler node-6.domain.tld internal enabled :-) 2017-05-04 07:26:40
nova-conductor node-6.domain.tld internal enabled :-) 2017-05-04 07:26:49
nova-cert node-6.domain.tld internal enabled :-) 2017-05-04 07:26:47
nova-consoleauth node-7.domain.tld internal enabled :-) 2017-05-04 07:26:43
nova-scheduler node-7.domain.tld internal enabled :-) 2017-05-04 07:26:48
nova-conductor node-7.domain.tld internal enabled :-) 2017-05-04 07:26:48
nova-consoleauth node-8.domain.tld internal enabled :-) 2017-05-04 07:26:39
nova-scheduler node-8.domain.tld internal enabled :-) 2017-05-04 07:26:42
nova-conductor node-8.domain.tld internal enabled :-) 2017-05-04 07:26:48
nova-cert node-7.domain.tld internal enabled :-) 2017-05-04 07:26:41
nova-cert node-8.domain.tld internal enabled :-) 2017-05-04 07:26:39
nova-compute node-4.domain.tld nova disabled :-) 2017-05-04 07:26:40
nova-compute node-10.domain.tld nova enabled :-) 2017-05-04 07:26:47
nova-compute node-15.domain.tld nova enabled :-) 2017-05-04 07:26:44
nova-compute node-16.domain.tld nova enabled :-) 2017-05-04 07:26:45
nova-compute node-11.domain.tld nova enabled :-) 2017-05-04 07:26:41
nova-compute node-17.domain.tld nova enabled :-) 2017-05-04 07:26:39
nova-compute node-5.domain.tld nova enabled :-) 2017-05-04 07:26:46
nova-compute node-9.domain.tld nova enabled :-) 2017-05-04 07:26:41
nova-compute node-13.domain.tld nova enabled :-) 2017-05-04 07:26:42
nova-compute node-14.domain.tld nova enabled :-) 2017-05-04 07:26:46
nova-compute node-12.domain.tld nova enabled :-) 2017-05-04 07:26:41
nova-compute node-26.domain.tld nova enabled :-) 2017-05-04 07:26:48
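That covers memory usage and nova service status on the controller. nova-compute on node-4 shows as disabled (a libvirtd problem on that node); running "/etc/init.d/libvirtd restart" on that compute node usually fixes it. Although Linux has its own memory management, memory on these servers currently does not come back down on its own once it is full, and that easily causes communication failures between nodes. This is what leaves users unable to enter the cloud desktop by double-clicking the cloud instance icon, and the only remedy is to step in manually and free the memory.
The current procedure is: free memory on the controller nodes --> restart rabbitmq-server on the controller nodes --> restart nova-compute on each compute node. Freeing memory does carry some risk: it may disconnect the cloud instances of users who are currently working and disrupt their sessions.
2. Free memory on the first controller node and restart rabbitmq-server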
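To spot problem services like node-4 without reading the whole list, the output can be filtered. A minimal sketch; the function name problem_services is hypothetical, and it assumes the six-column layout shown above, where the State column reads ":-)" for a live heartbeat and "XXX" for a service that has stopped reporting:

```shell
#!/bin/sh
# Print services that are disabled (Status column) or down (State column = XXX).
# Expected input columns: Binary Host Zone Status State Updated_At
# The header row never matches, since its 4th field is "Status".
# problem_services is a hypothetical helper name.
problem_services() {
    awk '$4 == "disabled" || $5 == "XXX" { print $1, $2, $4, $5 }'
}
```

Usage: `nova-manage service list | problem_services`. Against the listing above it would print only the node-4 nova-compute row.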
[root@node-6 ~]# sync
[root@node-6 ~]# echo 3 > /proc/sys/vm/drop_caches
[root@node-6 ~]# top
top - 15:30:47 up 18 days, 17:16, 1 user, load average: 1.29, 1.73, 2.04
Tasks: 1063 total, 4 running, 1059 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.2%us, 1.1%sy, 0.0%ni, 86.4%id, 0.1%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 99032136k total, 59740848k used, 39291288k free, 9576k buffers
Swap: 33554428k total, 295916k used, 33258512k free, 214468k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2926 rabbitmq 20 0 3222m 735m 2676 S 35.4 0.8 0:19.17 beam.smp
8392 keystone 20 0 423m 66m 3524 S 20.5 0.1 15:43.19 keystone-all
1868 root 10 -10 44932 11m 7240 S 14.9 0.0 2150:02 ovs-vswitchd
7982 mysql -2 0 24.8g 2.8g 138m S 9.3 3.0 4617:44 mysqld
......
[root@node-6 ~]# /etc/init.d/rabbitmq-server restart
Restarting rabbitmq-server: FAILED - check /var/log/rabbitmq/shutdown_log, _err
RabbitMQ is going to make 3 attempts to find master node and start.
3 attempts left to start RabbitMQ Server before consider start failed.
SUCCESS
Setting policy "ha-all" for pattern "." to "{\"ha-mode\":\"all\", \"ha-sync-mode\":\"automatic\"}" with priority "0" ...
...done.
rabbitmq-server.
[root@node-6 ~]# ssh node-7
Warning: Permanently added 'node-7,192.168.10.6' (RSA) to the list of known hosts.
Last login: Tue May 2 09:40:05 2017 from 192.168.10.4
3. Free memory on the second controller node and restart rabbitmq-server:
[root@node-7 ~]# top
top - 15:30:53 up 44 days, 6 min, 1 user, load average: 1.33, 1.44, 1.63
Tasks: 1081 total, 5 running, 1076 sleeping, 0 stopped, 0 zombie
Cpu(s): 11.3%us, 1.0%sy, 0.0%ni, 87.5%id, 0.1%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 99032136k total, 98589812k used, 442324k free, 273588k buffers
Swap: 33554428k total, 141844k used, 33412584k free, 35998936k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12728 neutron 20 0 549m 216m 2016 S 37.7 0.2 15:41.11 neutron-server
26690 keystone 20 0 327m 66m 3428 R 24.5 0.1 13:40.80 keystone-all
26698 keystone 20 0 327m 67m 3428 S 16.9 0.1 12:57.10 keystone-all
19852 nova 20 0 555m 197m 3332 S 15.1 0.2 502:36.04 nova-api
1936 root 10 -10 47608 14m 7220 S 9.4 0.0 3790:59 ovs-vswitchd
[root@node-7 ~]# sync
[root@node-7 ~]# echo 3 > /proc/sys/vm/drop_caches
[root@node-7 ~]# /etc/init.d/rabbitmq-server restart
Restarting rabbitmq-server: rmdir: failed to remove `/var/run/rabbitmq': Directory not empty
RabbitMQ is going to make 3 attempts to find master node and start.
3 attempts left to start RabbitMQ Server before consider start failed.
SUCCESS
Setting policy "ha-all" for pattern "." to "{\"ha-mode\":\"all\", \"ha-sync-mode\":\"automatic\"}" with priority "0" ...
...done.
rabbitmq-server.
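Steps 2 and 3 run the same commands on each controller, so they can be scripted. A minimal sketch, assuming root SSH access between nodes as in the session above; the CONTROLLERS list, the run_on helper and the DRY_RUN switch are hypothetical. It handles one controller at a time, so the other RabbitMQ nodes (the "ha-all" mirroring policy appears in the restart output) stay up while one restarts:

```shell
#!/bin/sh
# Hypothetical list of controllers to clean; override by exporting CONTROLLERS.
CONTROLLERS="${CONTROLLERS:-node-6 node-7}"

# Run a command on a host over ssh, or only print it when DRY_RUN=1.
run_on() {
    target=$1
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "[$target] $*"
    else
        ssh "$target" "$*"
    fi
}

# Drop the page cache, then restart rabbitmq-server, one controller at a time.
clean_controllers() {
    for host in $CONTROLLERS; do
        run_on "$host" 'sync && echo 3 > /proc/sys/vm/drop_caches'
        run_on "$host" '/etc/init.d/rabbitmq-server restart'
    done
}
```

Running `DRY_RUN=1 clean_controllers` first shows exactly what would be executed. The sketch does not wait for the cluster to settle between controllers; checking `rabbitmqctl cluster_status` before moving on may be worthwhile.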
4. Restart the nova-compute service on the compute nodes.
[root@node-7 ~]# ssh node-4
Warning: Permanently added 'node-4,192.168.10.2' (RSA) to the list of known hosts.
Last login: Thu May 4 15:31:29 2017 from 192.168.10.6
[root@node-4 ~]# /etc/init.d/libvirtd restart
Stopping libvirtd daemon: [FAILED]
Starting libvirtd daemon: [ OK ]
[root@node-4 ~]# /etc/init.d/openstack-nova-compute restart
Stopping openstack-nova-compute: [FAILED]
Starting openstack-nova-compute: [ OK ]
[root@node-4 ~]# exit
logout
Connection to node-4 closed.
5. From the controller node, check the nova service status of every node
[root@
Binary Host Zone Status State Updated_At
nova-consoleauth node-6.domain.tld internal enabled :-) 2017-05-04 07:32:09
nova-scheduler node-6.domain.tld internal enabled :-) 2017-05-04 07:32:10
nova-conductor node-6.domain.tld internal enabled :-) 2017-05-04 07:32:16
nova-cert node-6.domain.tld internal enabled :-) 2017-05-04 07:32:07
nova-consoleauth node-7.domain.tld internal enabled :-) 2017-05-04 07:32:13
nova-scheduler node-7.domain.tld internal enabled :-) 2017-05-04 07:32:08
nova-conductor node-7.domain.tld internal enabled :-) 2017-05-04 07:32:16
nova-consoleauth node-8.domain.tld internal enabled :-) 2017-05-04 07:32:09
nova-scheduler node-8.domain.tld internal enabled :-) 2017-05-04 07:32:12
nova-conductor node-8.domain.tld internal enabled :-) 2017-05-04 07:32:16
nova-cert node-7.domain.tld internal enabled :-) 2017-05-04 07:32:11
nova-cert node-8.domain.tld internal enabled :-) 2017-05-04 07:32:09
nova-compute node-4.domain.tld nova enabled :-) 2017-05-04 07:32:05
nova-compute node-10.domain.tld nova enabled :-) 2017-05-04 07:32:07
nova-compute node-15.domain.tld nova enabled :-) 2017-05-04 07:32:06
nova-compute node-16.domain.tld nova enabled :-) 2017-05-04 07:32:10
nova-compute node-11.domain.tld nova enabled :-) 2017-05-04 07:32:12
nova-compute node-17.domain.tld nova enabled :-) 2017-05-04 07:32:14
nova-compute node-5.domain.tld nova enabled :-) 2017-05-04 07:32:16
nova-compute node-9.domain.tld nova enabled :-) 2017-05-04 07:32:11
nova-compute node-13.domain.tld nova enabled :-) 2017-05-04 07:32:12
nova-compute node-14.domain.tld nova enabled :-) 2017-05-04 07:32:13
nova-compute node-12.domain.tld nova enabled :-) 2017-05-04 07:32:15
nova-compute node-26.domain.tld nova enabled :-) 2017-05-04 07:32:11
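6. The nova services on all nodes are back to normal. In general, after restarting rabbitmq-server on the controllers, you can also restart nova-compute on each compute node so that it re-establishes its connection (judge from the actual situation whether this is needed).
Restart libvirtd and openstack-nova-compute on compute node node-5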
[root@node-7 ~]# ssh node-5
Warning: Permanently added 'node-5,192.168.10.3' (RSA) to the list of known hosts.
Last login: Sat Apr 22 10:37:43 2017 from 192.168.10.4
[root@node-5 ~]# /etc/init.d/libvirtd restart
Stopping libvirtd daemon: [ OK ]
Starting libvirtd daemon: [ OK ]
[root@node-5 ~]# /etc/init.d/open
openstack-nova-compute openvswitch
[root@node-5 ~]# /etc/init.d/openstack-nova-compute restart
Stopping openstack-nova-compute: [ OK ]
Starting openstack-nova-compute: [ OK ]
Restart libvirtd and openstack-nova-compute on compute node node-9
[root@node-9 ~]# /etc/init.d/libvirtd restart
Stopping libvirtd daemon: [ OK ]
Starting libvirtd daemon: [ OK ]
[root@node-9 ~]# /etc/init.d/openstack-nova-compute restart
Stopping openstack-nova-compute: [ OK ]
Starting openstack-nova-compute: [ OK ]
Restart libvirtd and openstack-nova-compute on compute node node-10
[root@node-9 ~]# ssh node-10
Warning: Permanently added 'node-10,192.168.10.9' (RSA) to the list of known hosts.
Last login: Tue Apr 25 16:30:26 2017 from 192.168.10.6
[root@node-10 ~]# /etc/init.d/libvirtd restart
Stopping libvirtd daemon: [ OK ]
Starting libvirtd daemon: [ OK ]
[root@node-10 ~]# /etc/init.d/openstack-nova-compute restart
Stopping openstack-nova-compute: [ OK ]
Starting openstack-nova-compute: [ OK ]
...... Repeat the same steps on the remaining compute nodes.
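The per-node restarts can also be looped. A minimal sketch under the same assumptions as the manual session (root SSH access, SysV init scripts); the COMPUTES list and the DRY_RUN switch are hypothetical. The two restarts are joined with ";" rather than "&&", so a non-zero exit from the libvirtd restart does not skip the nova-compute restart:

```shell
#!/bin/sh
# Hypothetical list of compute nodes; override by exporting COMPUTES.
COMPUTES="${COMPUTES:-node-4 node-5 node-9 node-10}"

# Restart libvirtd, then nova-compute, on each compute node in turn.
# With DRY_RUN=1 the commands are only printed.
restart_computes() {
    cmd='/etc/init.d/libvirtd restart; /etc/init.d/openstack-nova-compute restart'
    for node in $COMPUTES; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "[$node] $cmd"
        else
            # ssh returns the exit status of the last command (the nova-compute restart).
            ssh "$node" "$cmd" || echo "nova-compute restart failed on $node" >&2
        fi
    done
}
```

Nodes are handled one after another rather than in parallel, which keeps the output readable and limits how many desktops are disturbed at once.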
This article is from the "福州恒达电脑" blog; please be sure to keep this attribution: http://fzhddn.blog.51cto.com/12650899/1922090