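Kubernetes Container Cloud Platform: Technical Implementation Plan
1. Initialize the servers
2. Deploy the etcd cluster, including certificate generation
3. Deploy the master (kube-apiserver, scheduler, controller-manager)
4. Deploy the nodes (kubelet, proxy, docker)
5. Deploy the network component, dashboard, ingress controller, automatic PV provisioning, CoreDNS, and so on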
Hostname | IP | Role | Notes |
---|---|---|---|
centos7-node1 | 192.168.56.11 | k8s-master | etcd-1 |
centos7-node2 | 192.168.56.12 | k8s-node1 | etcd-2 |
centos7-node3 | 192.168.56.13 | k8s-node2 | etcd-3 |
$ yum -y install git epel-release ansible
$ git clone https://github.com/lizhenliang/ansible-install-k8s.git
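$ mkdir ~/binary_pkg # put the required software packages in this directory
$ cd ansible-install-k8s # edit all.yml under group_vars to set the package paths and cluster IPs, then edit the hosts file in this directory
$ ansible-playbook -i hosts single-master-deploy.yml # deploy a single-master cluster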
$ kubectl create deploy web --image=lizhenliang/java-demo
$ kubectl expose deploy web --port=80 --target-port=8080 --type=NodePort
Back up etcd on a kubeadm-deployed cluster (certificates under /etc/kubernetes/pki/etcd):
$ ETCDCTL_API=3 etcdctl snapshot save snap.db --endpoints=https://192.168.56.11:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key
Back up etcd on a binary-deployed cluster (certificates under /data/etcd/ssl):
$ cd /data/etcd/bin
$ ETCDCTL_API=3 ./etcdctl snapshot save snap.db --endpoints=https://192.168.56.11:2379 --cacert=/data/etcd/ssl/ca.pem --cert=/data/etcd/ssl/server.pem --key=/data/etcd/ssl/server-key.pem
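For regular backups it helps to give each snapshot a timestamped name instead of overwriting snap.db. The following is a minimal sketch for the binary deployment, not a finished backup tool: the endpoint and certificate paths are the ones used above, while BACKUP_DIR and the function names snapshot_path and backup_command are assumptions made up for this example. The script only prints the command so it can be reviewed before it is run.

```bash
#!/bin/sh
# Sketch: build a timestamped `etcdctl snapshot save` command (binary deployment).
# ENDPOINT and SSL_DIR mirror the values used above; BACKUP_DIR is a hypothetical path.
ENDPOINT="https://192.168.56.11:2379"
SSL_DIR="/data/etcd/ssl"
BACKUP_DIR="${BACKUP_DIR:-/data/etcd/backup}"

# Print the snapshot file path for a given timestamp (for example 20200808-120000).
snapshot_path() {
    echo "${BACKUP_DIR}/snap-$1.db"
}

# Print the full backup command instead of running it.
backup_command() {
    echo "ETCDCTL_API=3 /data/etcd/bin/etcdctl snapshot save $(snapshot_path "$1")" \
         "--endpoints=${ENDPOINT}" \
         "--cacert=${SSL_DIR}/ca.pem --cert=${SSL_DIR}/server.pem --key=${SSL_DIR}/server-key.pem"
}

backup_command "$(date +%Y%m%d-%H%M%S)"
```

Once the printed command looks right, it can be executed by piping the script output to sh, for example from a cron job.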
Restore steps for a kubeadm deployment, where etcd runs as a static pod:
$ mv /etc/kubernetes/manifests /etc/kubernetes/manifests.bak
$ mv /var/lib/etcd /var/lib/etcd.bak
$ ETCDCTL_API=3 etcdctl snapshot restore snap.db --data-dir=/var/lib/etcd
$ mv /etc/kubernetes/manifests.bak /etc/kubernetes/manifests
Restore steps for a binary deployment, where etcd runs as a service on every etcd node:
$ systemctl stop kube-apiserver
$ ansible etcd -m service -a "name=etcd state=stopped"
$ ansible etcd -m shell -a "mv /var/lib/etcd/default.etcd /var/lib/etcd/default.etcd.bak"
$ cd /data/etcd/bin/ # the snapshot to restore is stored here
$ ansible etcd -m copy -a "src=/data/etcd/bin/snap.db dest=/data/etcd/bin/"
# Run the following command on etcd node 1
$ ETCDCTL_API=3 ./etcdctl snapshot restore snap.db --name etcd-1 --initial-cluster="etcd-1=https://192.168.56.11:2380,etcd-2=https://192.168.56.12:2380,etcd-3=https://192.168.56.13:2380" --initial-cluster-token=etcd-cluster --initial-advertise-peer-urls=https://192.168.56.11:2380 --data-dir=/var/lib/etcd/default.etcd
# Run the following command on etcd node 2
$ ETCDCTL_API=3 ./etcdctl snapshot restore snap.db --name etcd-2 --initial-cluster="etcd-1=https://192.168.56.11:2380,etcd-2=https://192.168.56.12:2380,etcd-3=https://192.168.56.13:2380" --initial-cluster-token=etcd-cluster --initial-advertise-peer-urls=https://192.168.56.12:2380 --data-dir=/var/lib/etcd/default.etcd
# Run the following command on etcd node 3
$ ETCDCTL_API=3 ./etcdctl snapshot restore snap.db --name etcd-3 --initial-cluster="etcd-1=https://192.168.56.11:2380,etcd-2=https://192.168.56.12:2380,etcd-3=https://192.168.56.13:2380" --initial-cluster-token=etcd-cluster --initial-advertise-peer-urls=https://192.168.56.13:2380 --data-dir=/var/lib/etcd/default.etcd
$ ansible etcd -m service -a "name=etcd state=started"
$ systemctl start kube-apiserver
$ ansible
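The three restore commands above differ only in the member name and the advertised peer URL, so they are easy to get wrong when typed by hand. As a rough sketch, the loop below prints one restore command per member from a single member list. The names and IPs come from the host table; the variable MEMBERS and the functions initial_cluster and restore_command are hypothetical helpers, not part of etcd or of the ansible playbook.

```bash
#!/bin/sh
# Sketch: print the `etcdctl snapshot restore` command for each etcd member.
# Member names and IPs are taken from the host table; helper names are hypothetical.
MEMBERS="etcd-1=192.168.56.11 etcd-2=192.168.56.12 etcd-3=192.168.56.13"

# Build the --initial-cluster value: name=https://ip:2380, comma-separated.
initial_cluster() {
    out=""
    for m in $MEMBERS; do
        name=${m%%=*}
        ip=${m#*=}
        out="${out:+$out,}${name}=https://${ip}:2380"
    done
    echo "$out"
}

# Print the restore command for one member (arguments: name, ip).
restore_command() {
    echo "ETCDCTL_API=3 ./etcdctl snapshot restore snap.db --name $1" \
         "--initial-cluster=\"$(initial_cluster)\"" \
         "--initial-cluster-token=etcd-cluster" \
         "--initial-advertise-peer-urls=https://$2:2380" \
         "--data-dir=/var/lib/etcd/default.etcd"
}

for m in $MEMBERS; do
    restore_command "${m%%=*}" "${m#*=}"
done
```

Each printed line is meant to be run on the matching node from /data/etcd/bin, after the old data directory has been moved away as shown above.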
In a binary deployment, the apiserver and etcd certificates are self-signed with cfssl or openssl, so their expiry time can be chosen freely. The client certificate that kubelet uses to connect to the apiserver, however, is issued automatically by the controller-manager component and is valid for one year by default. Once it expires, kubelet can no longer connect to the apiserver and stops working; the log reports a certificate expiry error (x509: certificate has expired or is not yet valid).
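Enabling kubelet certificate rotation solves this problem.
1. Configure the kube-controller-manager component
vi /etc/kubernetes/manifests/kube-controller-manager.yaml
spec:
  containers:
  - command:
    - kube-controller-manager
    - --experimental-cluster-signing-duration=87600h0m0s
    - --feature-gates=RotateKubeletServerCertificate=true
    …
Add the two flags shown above:
- experimental-cluster-signing-duration=87600h0m0s issues kubelet client certificates with a validity of 10 years (87600 hours = 3650 days). On Kubernetes 1.19 and later this flag is named cluster-signing-duration.
- feature-gates=RotateKubeletServerCertificate=true enables issuing of kubelet server certificates
After the change, recreate the pod so that it takes effect:
kubectl delete pod kube-controller-manager-k8s-master -n kube-system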
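2. Configure the kubelet component
kubelet certificate rotation is enabled by default.
# vi /var/lib/kubelet/config.yaml
...
rotateCertificates: true
3. Test
Pick a node and first check the validity period of its current client certificate: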
# cd /var/lib/kubelet/pki
# openssl x509 -in kubelet-client-current.pem -noout -dates
notBefore=May 25 09:01:24 2020 GMT
notAfter=May 25 09:01:24 2021 GMT
Change the server time to simulate a certificate that is about to expire:
# date -s "2021-5-20"
# systemctl restart kubelet
Check the validity period again; the certificate is now valid for ten years:
# openssl x509 -in kubelet-client-current.pem -noout -dates
notBefore=Aug 8 15:44:55 2020 GMT
notAfter=May 23 09:05:30 2030 GMT
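Rather than reading the dates by eye, the remaining lifetime can be computed from the notAfter value. The snippet below is a small sketch under two assumptions: GNU date is available (for date -d), and days_left is a hypothetical helper name chosen for this example. It uses the first notAfter value shown above together with the simulated date 2021-05-20.

```bash
#!/bin/sh
# Sketch: days remaining before a certificate's notAfter date.
# Assumes GNU date (for `date -d`); days_left is a hypothetical helper name.

# Arguments: the notAfter string as printed by openssl, and "now" in epoch seconds.
days_left() {
    end=$(date -u -d "$1" +%s)
    echo $(( (end - $2) / 86400 ))
}

# Example: the original notAfter value, evaluated at the simulated date (UTC).
now=$(date -u -d "2021-05-20 00:00:00" +%s)
days_left "May 25 09:01:24 2021 GMT" "$now"
```

On a real node the first argument can be taken from the certificate itself, for example with openssl x509 -in kubelet-client-current.pem -noout -enddate | cut -d= -f2, and the second from date +%s.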
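If you deployed with the instructor's binary method, the validity is already set to 5 years by default, so the certificate will not expire during the first 5 years.
Check on any node: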
# cd /opt/kubernetes/ssl
# openssl x509 -in kubelet-client-current.pem -noout -dates
notBefore=Aug 8 15:54:54 2020 GMT
notAfter=Aug 7 07:38:00 2025 GMT
To make sure the certificate is renewed automatically once the 5 years are up, certificate rotation and automatic approval must also be enabled.
1. Enable certificate rotation
# vi /opt/kubernetes/cfg/kubelet-config.yml
...
rotateCertificates: true
# systemctl restart kubelet
2. Configure automatic approval of kubelet certificate requests
# Automatically approve the CSR for a node's first certificate request
kubectl create clusterrolebinding node-client-auto-approve-csr --clusterrole=system:certificates.k8s.io:certificatesigningrequests:nodeclient --user=kubelet-bootstrap
# Automatically approve renewal of kubelet client certificates
kubectl create clusterrolebinding node-client-auto-renew-crt --clusterrole=system:certificates.k8s.io:certificatesigningrequests:selfnodeclient --group=system:nodes
# Automatically approve renewal of kubelet server certificates
kubectl create clusterrolebinding node-server-auto-renew-crt --clusterrole=system:certificates.k8s.io:certificatesigningrequests:selfnodeserver --group=system:nodes
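To confirm that automatic approval is working, it is useful to check whether any certificate signing requests are still stuck in Pending. The filter below is a sketch that reads kubectl get csr output on stdin and prints the names of pending requests. The helper name pending_csrs is hypothetical, and the sample rows use made-up CSR names purely to show the expected shape of the input.

```bash
#!/bin/sh
# Sketch: list CSR names whose last column (CONDITION) is "Pending".
# Reads `kubectl get csr` output on stdin; pending_csrs is a hypothetical helper.
pending_csrs() {
    awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Illustrative sample only (made-up names), shaped like `kubectl get csr` output.
pending_csrs <<'EOF'
NAME        AGE   SIGNERNAME                                    REQUESTOR           CONDITION
csr-aaaaa   10m   kubernetes.io/kube-apiserver-client-kubelet   kubelet-bootstrap   Pending
csr-bbbbb   2d    kubernetes.io/kube-apiserver-client-kubelet   system:node:node1   Approved,Issued
EOF
```

On a real cluster the input would come from kubectl get csr | pending_csrs. If requests remain pending after the bindings above are created, they can be approved by hand with kubectl certificate approve followed by the CSR name.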
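Image pull failures: usually an authentication problem; the image pull secret must exist in the same namespace as the workload that uses it.
Scheduling problems: the pod cannot be scheduled and stays Pending; check node taints, scheduling policies, tolerations, and the health of the candidate nodes.
Service and Deployment cannot communicate: often a namespace problem; read the events in the kubectl describe output and check whether the labels match.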
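The label check in the last item can be made concrete. A Service only routes to pods whose labels contain every key=value pair of its selector. The sketch below tests that condition in plain shell; selector_matches is a hypothetical helper, and both arguments are comma-separated key=value lists in the form shown by kubectl get svc -o wide (SELECTOR column) and kubectl get pod --show-labels. The hash in the sample labels is made up.

```bash
#!/bin/sh
# Sketch: succeed only if every key=value pair of a Service selector
# appears in a pod's label list. selector_matches is a hypothetical helper.
# Runs in a subshell so the IFS change does not leak to the caller.
selector_matches() (
    IFS=,
    for pair in $1; do
        case ",$2," in
            *",$pair,"*) ;;      # this pair is present, keep checking
            *) exit 1 ;;         # a selector pair is missing from the labels
        esac
    done
    exit 0
)

selector_matches "app=web" "app=web,pod-template-hash=5d4c8b7f9" && echo "selector matches"
selector_matches "app=web" "app=api,pod-template-hash=5d4c8b7f9" || echo "selector does not match"
```

When the selector does not match any pod, kubectl get endpoints for the Service shows no addresses, which is usually the quickest way to spot the problem.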
1. iptables mode
iptables-save |grep web
-A KUBE-SEP-D4VB4DJ76HXQK5R7 -p tcp -m comment --comment "default/web:" -m tcp -j DNAT --to-destination 10.244.1.4:8080
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.0.0.149/32 -p tcp -m comment --comment "default/web: cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.0.0.149/32 -p tcp -m comment --comment "default/web: cluster IP" -m tcp --dport 80 -j KUBE-SVC-BIJGBSD4RZCCZX5R
-A KUBE-SVC-BIJGBSD4RZCCZX5R -m comment --comment "default/web:" -j KUBE-SEP-D4VB4DJ76HXQK5R7
Resulting flow (cluster IP:port -> pod IP:port): 10.0.0.149:80 -> 10.244.1.4:8080
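When a Service has many backends, picking the pod endpoints out of the full rule dump by hand gets tedious. The filter below is a sketch that extracts the DNAT targets for one Service from iptables-save output. The helper name dnat_targets is hypothetical; it relies only on the "namespace/service" string that kube-proxy puts into the rule comments, as seen in the rules above.

```bash
#!/bin/sh
# Sketch: print the pod endpoints (DNAT targets) of one Service.
# Reads iptables-save output on stdin; $1 is the "namespace/service" string
# used in the rule comments. dnat_targets is a hypothetical helper.
dnat_targets() {
    grep -- "--comment \"$1:" | sed -n 's/.*--to-destination \([0-9.]*:[0-9]*\).*/\1/p'
}

# Sample input: two of the rules shown above.
dnat_targets "default/web" <<'EOF'
-A KUBE-SEP-D4VB4DJ76HXQK5R7 -p tcp -m comment --comment "default/web:" -m tcp -j DNAT --to-destination 10.244.1.4:8080
-A KUBE-SVC-BIJGBSD4RZCCZX5R -m comment --comment "default/web:" -j KUBE-SEP-D4VB4DJ76HXQK5R7
EOF
```

On a node it would be used as iptables-save | dnat_targets default/web, and the result can be compared with the pod IPs reported by kubectl get pod -o wide.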
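2. ipvs mode
View the ipvs rules: ipvsadm -L -n
The kube-ipvs0 interface is bound to the cluster IPs; it captures the packets, which are then forwarded to the backends according to the ipvs rules.
NodePort packet flow:
user -> NodePort -> iptables/ipvs -> pod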
kubectl get svc,deploy,pod -n xxx
kubectl describe pod xxxx
kubectl exec -it podxxxx -c xxxx -n xxxx -- bash # enter the specified container
Original article: https://blog.51cto.com/13812615/2520575