Also, whenever a calico component stays stuck in the Running 0/1 state, the fix below is worth trying.
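1. Problem description
The pinpoint microservice tracing system was deployed with docker on a node of the k8s cluster. It worked at first, but a day later every calico component had failed to start and none of the microservices in the cluster could serve traffic.
calico reported the following errors: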
# kubectl logs -f calico-node-p6hsk -n kube-system
2021-11-26 07:28:04.920 [INFO][8] startup/startup.go 299: Early log level set to info
2021-11-26 07:28:04.920 [INFO][8] startup/startup.go 315: Using NODENAME environment for node name
2021-11-26 07:28:04.920 [INFO][8] startup/startup.go 327: Determined node name: binary-k8s-node2
2021-11-26 07:28:04.925 [INFO][8] startup/startup.go 359: Checking datastore connection
2021-11-26 07:28:04.976 [INFO][8] startup/startup.go 383: Datastore connection verified
2021-11-26 07:28:04.977 [INFO][8] startup/startup.go 104: Datastore is ready
2021-11-26 07:28:05.086 [INFO][8] startup/startup.go 425: Initialize BGP data
2021-11-26 07:28:05.087 [INFO][8] startup/startup.go 664: Using autodetected IPv4 address on interface br-a32444aa3aae: 172.18.0.1/16
2021-11-26 07:28:05.087 [INFO][8] startup/startup.go 495: Node IPv4 changed, will check for conflicts
2021-11-26 07:28:05.105 [WARNING][8] startup/startup.go 1010: Calico node 'binary-k8s-master1' is already using the IPv4 address 172.18.0.1.
2021-11-26 07:28:05.106 [INFO][8] startup/startup.go 263: Clearing out-of-date IPv4 address from this node IP="172.18.0.1/16"
2021-11-26 07:28:05.139 [WARNING][8] startup/startup.go 1214: Terminating
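Nothing special had been done to the K8S cluster, yet calico broke and would not start again. Restarting the cluster did not help either.
2. Solution
Looking closely at the log, one line stands out: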
2021-11-26 07:28:05.087 [INFO][8] startup/startup.go 664: Using autodetected IPv4 address on interface br-a32444aa3aae: 172.18.0.1/16
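What is br-a32444aa3aae? According to the log, calico autodetected the address 172.18.0.1 on the interface br-a32444aa3aae, and that address conflicts with one already registered in calico by another node.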
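When the log is long, the relevant line can be pulled out mechanically. The following is a minimal sketch, not part of the original troubleshooting: the function name autodetected_ipv4 is hypothetical, and it assumes the log line has the same wording as the one quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read calico-node startup logs on stdin and print
# "<interface> <address>" for every autodetection line.
autodetected_ipv4() {
  sed -n 's/.*Using autodetected IPv4 address on interface \([^:]*\): \(.*\)$/\1 \2/p'
}

# Demo on the log line quoted in this post (no cluster needed).
printf '%s\n' '2021-11-26 07:28:05.087 [INFO][8] startup/startup.go 664: Using autodetected IPv4 address on interface br-a32444aa3aae: 172.18.0.1/16' \
  | autodetected_ipv4
# prints: br-a32444aa3aae 172.18.0.1/16
```

Against a live cluster the same function would be fed from kubectl, for example `kubectl logs calico-node-p6hsk -n kube-system | autodetected_ipv4`, using the pod name from the log above.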
Checking on the node2 host confirms it: the interface exists and carries exactly this IP.
# ifconfig
br-a32444aa3aae: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.18.0.1  netmask 255.255.0.0  broadcast 172.19.255.255
        inet6 fe80::42:8ff:feb9:2492  prefixlen 64  scopeid 0x20<link>
        ether 02:42:08:b9:24:92  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
The interface looks like one created by docker, so let's list the docker networks:
# docker network ls
NETWORK ID     NAME                           DRIVER    SCOPE
55efa42d705d   bridge                         bridge    local
aba466ac1f9c   host                           host      local
89494ad04935   none                           null      local
a32444aa3aae   pinpoint-docker-185_pinpoint   bridge    local
Indeed, br-a32444aa3aae corresponds to the network a32444aa3aae, which belongs to our pinpoint deployment. Remove this network:
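The match between the interface and the network is not a coincidence: for user-defined bridge networks, docker by default names the host interface "br-" followed by the short network ID. A small sketch of that mapping follows; the function name bridge_to_network_id is hypothetical, and it assumes the network was not created with a custom bridge name.

```shell
#!/bin/sh
# Hypothetical helper: strip the "br-" prefix from a docker bridge
# interface name to recover the short docker network ID.
bridge_to_network_id() {
  printf '%s\n' "${1#br-}"
}

# Demo with the interface from this post.
bridge_to_network_id br-a32444aa3aae
# prints: a32444aa3aae

# On the affected host, the network's name and subnet could then be
# confirmed before removal (requires docker, so it is left commented out):
#   docker network inspect "$(bridge_to_network_id br-a32444aa3aae)" \
#     --format '{{.Name}} {{range .IPAM.Config}}{{.Subnet}}{{end}}'
```

Confirming the subnet first makes it clear that the network being removed is the one whose gateway address calico picked up.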
# docker network rm a32444aa3aae
a32444aa3aae
With the network removed, checking again shows that the calico pods have started successfully:
# kubectl get pod -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
calico-kube-controllers-97769f7c7-dsdrk   1/1     Running   1          38m
calico-node-jkl6q                         1/1     Running   0          8m38s
calico-node-pgstp                         1/1     Running   0          8m39s
calico-node-vssbk                         1/1     Running   0          8m39s
coredns-6cc56c94bd-m7pzr                  1/1     Running   1          30m
3. Summary
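Programs deployed with docker or docker-compose are best kept off the nodes of a K8S cluster. Every service deployed with docker-compose gets its own docker network, and the bridge address of that network can conflict with the calico network.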
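If such a deployment has to stay on a cluster node, another option that was not tried in this post is to tell calico which interfaces to consider when autodetecting the node address, through the IP_AUTODETECTION_METHOD environment variable of calico-node. The sketch below only prints the command instead of running it. It assumes the DaemonSet is named calico-node in the kube-system namespace (consistent with the pod names above), and the helper name calico_autodetect_cmd and the interface name eth0 are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the kubectl command that sets
# Calico's IPv4 autodetection method on the calico-node DaemonSet.
# $1 is the method, e.g. "interface=eth0" or "skip-interface=br-.*".
calico_autodetect_cmd() {
  printf 'kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=%s\n' "$1"
}

# Example: ignore docker-created br-* bridges during autodetection.
calico_autodetect_cmd 'skip-interface=br-.*'
```

Printing the command first leaves room to review it, since changing the DaemonSet environment restarts the calico-node pods on every node.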