Kubernetes Troubleshooting Notes
Background: these errors came up while creating a Prometheus instance with a Deployment in a Kubernetes environment. The prometheus-deploy.yaml file used is shown below:
# Error 1
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-core
  namespace: monitoring
  labels:
    app: prometheus
    component: core
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: core
  template:
    metadata:
      name: prometheus-main
      labels:
        app: prometheus
        component: core
    spec:
      serviceAccountName: prometheus-k8s
      # nodeSelector:
      #   kubernetes.io/hostname: $YOUr_IP
      containers:
        - name: prometheus
          image: bitnami/prometheus:2.47.2
          # Error 3
          command: ["/bin/bash", "-ce", "tail -f /dev/null"]
          imagePullPolicy: IfNotPresent
          args:
            - '--storage.tsdb.retention=15d'
            - '--config.file=/etc/prometheus/prometheus.yaml'
            - '--storage.tsdb.path=/home/prometheus_data'
            - '--web.enable-lifecycle'
          ports:
            - name: webui
              containerPort: 9090
          resources:
            # Error 2
            # requests:
            #   cpu: 20000m
            #   memory: 20000M
            # limits:
            #   cpu: 20000m
            #   memory: 20000M
          securityContext:
            privileged: true
          volumeMounts:
            - name: data
              mountPath: /home/prometheus_data
            - name: config-volume
              mountPath: /etc/prometheus
            - name: rules-volume
              mountPath: /etc/prometheus-rules
            - name: time
              mountPath: /etc/localtime
      volumes:
        - name: data
          hostPath:
            path: /home/cdnadmin/prometheus_data
        - name: config-volume
          configMap:
            name: prometheus-core
        - name: rules-volume
          configMap:
            name: prometheus-rules
        - name: time
          hostPath:
            path: /etc/localtime
kubectl apply -f prometheus-deploy.yaml
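Running apply produced the errors below. The YAML above is the file after all fixes; the comments "Error 1" to "Error 3" mark the places that were changed. Each error is analyzed in turn.
Error 1: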
error: resource mapping not found for name: "prometheus-core" namespace: "monitoring" from "prometheus-deploy.yaml":
no matches for kind "Deployment" in version "extensions/v1beta1"
ensure CRDs are installed first
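The message no matches for kind "Deployment" in version "extensions/v1beta1" means the cluster does not serve the Deployment resource under the apiVersion given in the file. The apiVersion field was originally set to extensions/v1beta1. To find out which version the cluster does serve for Deployment, run:
kubectl explain deploy
Fix: change the value of apiVersion in prometheus-deploy.yaml to apps/v1.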
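For reference, here is a minimal sketch of the fields involved in the change, using the names from the manifest above (the Pod spec is omitted, so this fragment is not a complete Deployment). One thing to watch when migrating: extensions/v1beta1 is no longer served as of Kubernetes 1.16, and apps/v1 requires spec.selector, which must match the labels on the Pod template. A manifest written for the old API group may need that block added.

```yaml
apiVersion: apps/v1        # was: extensions/v1beta1
kind: Deployment
metadata:
  name: prometheus-core
  namespace: monitoring
spec:
  selector:                # required in apps/v1
    matchLabels:
      app: prometheus
      component: core
  template:
    metadata:
      labels:              # must match spec.selector.matchLabels
        app: prometheus
        component: core
```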
Error 2:
The Pod stays in the Pending state. Describing the Pod shows the following event:
0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 1 node(s) had untolerated taint {node.kubernetes.io/unreachable: },
2 Insufficient cpu, 2 Insufficient memory. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling..
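In short: one node has an untolerated taint (node-role.kubernetes.io/control-plane), one node has an untolerated taint (node.kubernetes.io/unreachable), and the remaining two nodes do not have enough CPU and memory, so no node is suitable for scheduling.
Fix:
(1) For the taints, run the following commands to remove them. Note that node.kubernetes.io/unreachable is added automatically when the control plane loses contact with a node, so it will come back until that node itself is healthy again. Removing the control-plane taint allows ordinary workloads to run on the control-plane node, which is usually acceptable only in a test environment.
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
kubectl taint nodes --all node.kubernetes.io/unreachable-
(2) For the resource shortage: the nodes in my environment have nowhere near 20 CPU cores (20000m) or 20000M of memory, so I simply commented out the resources section (requests and limits) in prometheus-deploy.yaml.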
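As an alternative to removing taints cluster-wide and dropping the resources section entirely, the Pod template can tolerate the control-plane taint and request an amount the nodes can actually provide. The fragment below is only a sketch: the request and limit values are placeholders chosen for illustration, not figures from the original environment, and should be sized to the cluster. A toleration without an effect field matches every effect for that key.

```yaml
# Fragment of spec.template.spec in prometheus-deploy.yaml
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists            # matches the taint regardless of its value
containers:
  - name: prometheus
    image: bitnami/prometheus:2.47.2
    resources:
      requests:
        cpu: 500m               # placeholder value (assumption)
        memory: 512Mi           # placeholder value (assumption)
      limits:
        cpu: "1"                # placeholder value (assumption)
        memory: 1Gi             # placeholder value (assumption)
```

No toleration is shown for node.kubernetes.io/unreachable on purpose: a node carrying that taint is not reporting to the control plane, so placing the Pod there would not help.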
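Error 3:
After starting, the Pod restarts continuously and reports Back-off restarting failed container….
The Back-off restarting failed container warning event means the container's main process keeps exiting. A common cause is that the container started from the specified image has no long-running process, so it exits right after starting and is restarted over and over.
Fix: in the containers section of the Deployment, set the command field to something that succeeds and never exits (for example, command: ["/bin/bash", "-ce", "tail -f /dev/null"]). This replaces the image's default entrypoint and keeps the container running.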
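One point worth keeping in mind with this workaround: in a Pod spec, command replaces the image's ENTRYPOINT and args replaces its CMD. With the override in place the container runs bash rather than the Prometheus binary, and the flags under args are not passed to Prometheus. The fragment below sketches how the override might be treated as a temporary debugging aid. Whether the image's own entrypoint starts cleanly with these flags is an assumption that has to be checked against the container logs.

```yaml
# Fragment of spec.template.spec.containers in prometheus-deploy.yaml
- name: prometheus
  image: bitnami/prometheus:2.47.2
  # Debugging only: keeps the container alive so it can be inspected with
  # `kubectl exec`. Remove this line once the cause of the crash is known,
  # so that the image's own entrypoint receives the args below.
  command: ["/bin/bash", "-ce", "tail -f /dev/null"]
  args:
    - '--storage.tsdb.retention=15d'
    - '--config.file=/etc/prometheus/prometheus.yaml'
    - '--storage.tsdb.path=/home/prometheus_data'
    - '--web.enable-lifecycle'
```

Before adding the override, the output of the previously crashed container can be read with kubectl logs and the --previous flag, which often shows the actual reason for the exit.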
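With the errors resolved, running apply again succeeded. Finally, check the status of the newly created Deployment and Pod, for example with kubectl get deployments,pods -n monitoring.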