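Docker is the most fundamental component of OpenShift. You need a cluster-wide view of Docker health across the master and node instances. The following should be monitored on every node: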
Check Name | Description | Storage Driver | Sample Alerting Logic |
Docker Daemon | Check that docker is running on a system | devicemapper | systemctl is-active docker |
Docker Daemon | Check that docker is running on a system | overlay2 | systemctl is-active docker |
Docker Storage | Check that docker’s storage has adequate space. The overlay2 check assumes LV_Name is dockerlv and VG is dockervg. | devicemapper | echo $(echo "$(docker info 2>/dev/null | awk '/Data Space Available/ {print $4}') / $(docker info 2>/dev/null | awk '/Data Space Total/ {print $4}')" | bc -l) '>' 0.3 | bc -l |
Docker Storage | Check that docker’s storage has adequate space. The overlay2 check assumes LV_Name is dockerlv and VG is dockervg. | overlay2 | echo "$(df -h | awk '/dockervg-dockerlv/ {print $5}' | awk -F% '{print $1}') > 70" | bc |
Docker Metadata Storage | Check that docker’s metadata storage volume is not full | devicemapper | echo $(echo "$(docker info 2>/dev/null | awk '/Metadata Space Available/ {print $4}') / $(docker info 2>/dev/null | awk '/Metadata Space Total/ {print $4}')" | bc -l) '>' 0.3 | bc -l |
Docker Metadata Storage | Check that docker’s metadata storage volume is not full | overlay2 | N/A with overlay2 |
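The devicemapper storage checks print 1 or 0 from bc, which a monitoring agent still has to interpret. Below is a minimal sketch of the same free-space ratio test expressed as an exit status. The helper name storage_ok and its return codes are assumptions made for illustration, not part of Docker or OpenShift, and the sketch assumes both values are reported in the same unit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Docker or OpenShift command).
# storage_ok AVAILABLE TOTAL [MIN_FREE_RATIO]
# Returns 0 when AVAILABLE / TOTAL is greater than MIN_FREE_RATIO (default 0.3),
# 1 when it is not, and 2 when the inputs are missing or not numeric.
storage_ok() {
  local available="$1" total="$2" min_ratio="${3:-0.3}"
  awk -v a="$available" -v t="$total" -v m="$min_ratio" 'BEGIN {
    if (a !~ /^[0-9.]+$/ || t !~ /^[0-9.]+$/ || t + 0 == 0) exit 2
    exit (((a / t) > m) ? 0 : 1)
  }'
}

# Example wiring on a devicemapper host (commented out so the sketch runs anywhere):
# avail=$(docker info 2>/dev/null | awk '/Data Space Available/ {print $4}')
# total=$(docker info 2>/dev/null | awk '/Data Space Total/ {print $4}')
# storage_ok "$avail" "$total" || echo "ALERT: docker storage has 30% or less free"
```

Returning a distinct code for unusable input keeps a missing docker info field from being mistaken for a healthy or a full volume.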
Check Name | Description | Relevant Hosts | OCP Version | Sample Alerting Logic |
Etcd Service | Check that etcd is active | Masters | <= 3.9 | systemctl is-active etcd |
Etcd Service | Check that etcd is active | Masters | >= 3.10 | oc get pods -n kube-system --no-headers -o=custom-columns=POD:.metadata.name,STATUS:.status.phase | grep -i "master-etcd" | grep -i "running" | if [ $( wc -l) -eq $(oc get pods -n kube-system --no-headers -o=custom-columns=POD:.metadata.name | grep etcd | wc -l) ]; then exit 0; else exit 1; fi |
Etcd Storage | Check that the etcd volume is not too full. This check assumes the etcd storage (/var/lib/etcd) is provisioned with a separate logical volume. | Masters | <= 3.9 | echo "$(lvs | awk '/etcd/ {print $4}') > 70" | bc or echo "$(df -h | awk '/etcd/ {print $5}' | awk -F% '{print $1}') > 70" | bc |
Etcd Storage | Check that the etcd volume is not too full. This check assumes the etcd storage (/var/lib/etcd) is provisioned with a separate logical volume. | Masters | >= 3.10 | echo "$(lvs | awk '/etcd/ {print $4}') > 70" | bc or echo "$(df -h | awk '/etcd/ {print $5}' | awk -F% '{print $1}') > 70" | bc |
Master API Service (single master) | Check that the Master API Service or pods are active | Masters | <= 3.9 | systemctl is-active atomic-openshift-master |
Master API Service (single master) | Check that the Master API Service or pods are active | Masters | >= 3.10 | Same as multi-master check. |
Master API Service (multi-master) | Check that the Master API Service or pods are active | Masters | <= 3.9 | systemctl is-active atomic-openshift-master-api |
Master API Service (multi-master) | Check that the Master API Service or pods are active | Masters | >= 3.10 | oc get pods -n kube-system --no-headers -o=custom-columns=POD:.metadata.name,STATUS:.status.phase | grep -i "master-api" | grep -i "running" | if [ $( wc -l) -eq $(oc get pods -n kube-system --no-headers -o=custom-columns=POD:.metadata.name | grep -i "master-api" | wc -l) ]; then exit 0; else exit 1; fi |
Master Controllers Service (multi-master) | Check that the Master Controllers Service or pods are active | Masters | <= 3.9 | systemctl is-active atomic-openshift-master-controllers |
Master Controllers Service (multi-master) | Check that the Master Controllers Service or pods are active | Masters | >= 3.10 | oc get pods -n kube-system --no-headers -o=custom-columns=POD:.metadata.name,STATUS:.status.phase | grep -i "master-controller" | grep -i "running" | if [ $( wc -l) -eq $(oc get pods -n kube-system --no-headers -o=custom-columns=POD:.metadata.name | grep -i "master-controller" | wc -l) ]; then exit 0; else exit 1; fi |
Node Service | Check that the node service is active | All Nodes | <= 3.9 | systemctl is-active atomic-openshift-node |
Node Service | Check that the node service is active | All Nodes | >= 3.10 | systemctl is-active atomic-openshift-node |
Node Storage | Check that the node’s local data storage volume is not too full. This check assumes the node storage (/var/lib/origin) is provisioned with a separate logical volume. | All Nodes | <= 3.9 | echo "$(lvs | awk '/origin/ {print $4}') > 70" | bc or echo "$(df -h | awk '/origin/ {print $5}' | awk -F% '{print $1}') > 70" | bc |
Node Storage | Check that the node’s local data storage volume is not too full. This check assumes the node storage (/var/lib/origin) is provisioned with a separate logical volume. | All Nodes | >= 3.10 | echo "$(lvs | awk '/origin/ {print $4}') > 70" | bc or echo "$(df -h | awk '/origin/ {print $5}' | awk -F% '{print $1}') > 70" | bc |
OpenVSwitch Service | Check that the openvswitch service or pods are active | All Nodes | <= 3.9 | systemctl is-active openvswitch |
OpenVSwitch Service | Check that the openvswitch service or pods are active | All Nodes | >= 3.10 | oc get pods -n openshift-sdn --no-headers -o=custom-columns=POD:.metadata.name,STATUS:.status.phase | grep -i "ovs-" | grep -i "running" | if [ $( wc -l) -eq $(oc get nodes --no-headers | wc -l) ]; then exit 0; else exit 1; fi |
SDN Service | Check that all the SDN pods are active | All Nodes | <= 3.9 | N/A |
SDN Service | Check that all the SDN pods are active | All Nodes | >= 3.10 | oc get pods -n openshift-sdn --no-headers -o=custom-columns=POD:.metadata.name,STATUS:.status.phase | grep -i "sdn-" | grep -i "running" | if [ $( wc -l) -eq $(oc get nodes --no-headers | wc -l) ]; then exit 0; else exit 1; fi |
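The pod-based checks for 3.10 and later all follow one pattern: list pod names and phases, keep the pods of one component, and succeed only if every one of them is running. A minimal sketch of that pattern as a reusable filter is shown below. The function name all_pods_running is hypothetical, and unlike the table rows it also fails when no pod matches at all, which is an added assumption about what should count as unhealthy.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an oc subcommand).
# all_pods_running PATTERN
# Reads "POD STATUS" lines on stdin, as produced by
#   oc get pods --no-headers -o=custom-columns=POD:.metadata.name,STATUS:.status.phase
# Returns 0 only if at least one pod name matches PATTERN and every matching
# pod is in the Running phase; returns 1 otherwise.
all_pods_running() {
  awk -v pat="$1" '
    $1 ~ pat { total++; if (tolower($2) == "running") running++ }
    END { exit ((total > 0 && running == total) ? 0 : 1) }
  '
}

# Example wiring (commented out so the sketch runs without a cluster):
# oc get pods -n kube-system --no-headers \
#   -o=custom-columns=POD:.metadata.name,STATUS:.status.phase \
#   | all_pods_running 'master-api' || echo "ALERT: master-api pods not all Running"
```

Because the filter only reads stdin, the same function can serve the etcd, master-api, master-controller, ovs and sdn checks by changing the namespace and the pattern.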
Many OpenShift components expose HTTP endpoints for health checks and related operations. These endpoints should be monitored:
Check Name | Description | Sample Alerting Logic |
OpenShift Master API Server | Check the health of a master API Endpoint | curl -s https://console.c1-ocp.myorg.com:8443/healthz | grep ok |
Router | Check the health of the Router | curl http://router.default.svc.cluster.local:1936/healthz | grep 200 |
Registry | Check the health of the Registry | curl -I https://docker-registry.default.svc.cluster.local:5000/healthz | grep 200 |
Logging | Check the health of the EFK Logging Stack | Because of the various components and complexities involved, we recommend the OpenShift Logging health check script. |
Metrics | Check the health of the Metrics Stack | Because of the various components and complexities involved, we recommend the OpenShift Metrics health check script. |
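The curl checks above can be combined into one probe that reports each endpoint and returns a single exit status. The following is a rough sketch only: the function names http_status, is_healthy and check_endpoints are hypothetical, treating HTTP 200 as the sole healthy response is an assumption, and the 5 second timeout is an arbitrary choice. Clusters with self-signed certificates would additionally need curl to be given a CA bundle.

```shell
#!/usr/bin/env bash
# All function names below are hypothetical helpers for illustration.

# http_status URL: prints the HTTP status code; curl prints 000 when the request fails.
http_status() {
  curl -s -o /dev/null -m 5 -w '%{http_code}' "$1" </dev/null 2>/dev/null || true
}

# is_healthy CODE: returns 0 only for HTTP 200 (an assumption of this sketch).
is_healthy() {
  [ "$1" = "200" ]
}

# check_endpoints: reads "NAME URL" lines on stdin, prints one result line per
# endpoint, and returns 1 if any endpoint is unhealthy, 0 otherwise.
check_endpoints() {
  local name url code failed=0
  while read -r name url; do
    [ -n "$name" ] || continue
    code=$(http_status "$url")
    if is_healthy "$code"; then
      echo "OK $name"
    else
      echo "FAIL $name (HTTP ${code:-000})"
      failed=1
    fi
  done
  return "$failed"
}

# Example wiring with the endpoints from the table (commented out; needs cluster access):
# check_endpoints <<'EOF'
# master-api https://console.c1-ocp.myorg.com:8443/healthz
# router http://router.default.svc.cluster.local:1936/healthz
# registry https://docker-registry.default.svc.cluster.local:5000/healthz
# EOF
```

Keeping the status lookup in its own function makes it easy to substitute a different probe, for example one that matches the response body instead of the status code.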
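Author: Petter Liu
Source: http://www.cnblogs.com/wintersun/
Copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise this notice must be retained and a link to the original article must be placed prominently on the article page; otherwise the author reserves the right to pursue legal liability.
This article is also published on my independent blog, Petter Liu Blog.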
Original article: https://www.cnblogs.com/wintersun/p/12540293.html