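Background: a Prometheus instance was created in a Kubernetes environment using a Deployment. The prometheus-deploy.yaml file is shown below in its final, working state; the comments "Error 1" to "Error 3" mark the places that had to be changed: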

# Error 1
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-core
  namespace: monitoring
  labels:
    app: prometheus
    component: core
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: core
  template:
    metadata:
      name: prometheus-main
      labels:
        app: prometheus
        component: core
    spec:
      serviceAccountName: prometheus-k8s
      # nodeSelector:
      #   kubernetes.io/hostname: $YOUr_IP
      containers:
      - name: prometheus
        image: bitnami/prometheus:2.47.2 
        # Error 3
        command: ["/bin/bash", "-ce", "tail -f /dev/null"]
        imagePullPolicy: IfNotPresent
        args:
          - '--storage.tsdb.retention=15d'
          - '--config.file=/etc/prometheus/prometheus.yaml'
          - '--storage.tsdb.path=/home/prometheus_data'
          - '--web.enable-lifecycle' 
        ports:
        - name: webui
          containerPort: 9090
        resources:
          # Error 2
          #requests:
            #cpu: 20000m
            #memory: 20000M
          #limits:
            #cpu: 20000m
            #memory: 20000M
        securityContext:
          privileged: true
        volumeMounts:
        - name: data
          mountPath: /home/prometheus_data
        - name: config-volume
          mountPath: /etc/prometheus
        - name: rules-volume
          mountPath: /etc/prometheus-rules
        - name: time
          mountPath: /etc/localtime
      volumes:
      - name: data
        hostPath:
          path: /home/cdnadmin/prometheus_data 
      - name: config-volume
        configMap:
          name: prometheus-core
      - name: rules-volume
        configMap:
          name: prometheus-rules
      - name: time
        hostPath:
          path: /etc/localtime


kubectl apply -f prometheus-deploy.yaml

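Applying the file produced the following errors (see the places marked Error 1 to Error 3 in the YAML file). Each is analyzed in turn below.

Error 1: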

error: resource mapping not found for name: "prometheus-core" namespace: "monitoring" from "prometheus-deploy.yaml": 
no matches for kind "Deployment" in version "extensions/v1beta1"
ensure CRDs are installed first

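The message no matches for kind "Deployment" in version "extensions/v1beta1" means that the cluster does not serve the Deployment resource under the apiVersion given in the file. The apiVersion field was originally set to extensions/v1beta1, which was removed for Deployments in Kubernetes 1.16. To find the version the cluster actually serves for Deployment, run: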

kubectl explain deploy

Fix: change the value of apiVersion in prometheus-deploy.yaml to apps/v1.
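
One detail is worth checking when moving a manifest from extensions/v1beta1 to apps/v1: apps/v1 requires spec.selector, and the selector must match the labels of the Pod template. The following is a trimmed sketch of the relevant fields from the file above, with everything else omitted for brevity:

```yaml
apiVersion: apps/v1          # was extensions/v1beta1
kind: Deployment
metadata:
  name: prometheus-core
  namespace: monitoring
spec:
  replicas: 1
  selector:                  # mandatory in apps/v1
    matchLabels:
      app: prometheus
      component: core
  template:
    metadata:
      labels:                # must satisfy the selector above
        app: prometheus
        component: core
    spec:
      containers:
      - name: prometheus
        image: bitnami/prometheus:2.47.2
```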

Error 2:

The Pod stays in the Pending state, and the Pod details show the following error:

0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 1 node(s) had untolerated taint {node.kubernetes.io/unreachable: }, 
2 Insufficient cpu, 2 Insufficient memory. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling..

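In short: one node carries the taint node-role.kubernetes.io/control-plane, which the Pod does not tolerate; one node carries the taint node.kubernetes.io/unreachable, which the Pod does not tolerate either; and the remaining two nodes do not have enough CPU and memory. As a result, no node can accept the Pod.

Fix:

(1) For the taints, run the following commands to remove them. Note that node.kubernetes.io/unreachable is applied automatically by the node controller when a node stops reporting its status, so it will come back until that node is healthy again.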

kubectl taint nodes --all node-role.kubernetes.io/control-plane-
kubectl taint nodes --all node.kubernetes.io/unreachable-

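(2) For the resource shortage: the nodes in this environment do not have 20000m of CPU (20 cores) or 20000M of memory available, so the resources section of prometheus-deploy.yaml (both requests and limits) was simply commented out.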
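
As an alternative to removing the control-plane taint and dropping the resources section entirely, the Pod can be given a toleration for that taint and a request that the nodes can actually satisfy. The fragment below is only a sketch of the Pod template spec: the request and limit values are illustrative assumptions, not figures from the original environment, and should be sized to the allocatable capacity of the nodes.

```yaml
# Fragment of spec.template.spec in prometheus-deploy.yaml
tolerations:
- key: node-role.kubernetes.io/control-plane
  operator: Exists           # tolerate the taint regardless of its value
  effect: NoSchedule
containers:
- name: prometheus
  image: bitnami/prometheus:2.47.2
  resources:
    requests:                # hypothetical values, adjust to the cluster
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "1"
      memory: 2Gi
```

Keeping a modest request in place lets the scheduler account for Prometheus when placing other Pods, which is lost when the section is commented out.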

Error 3:

After starting, the Pod restarts continuously and reports Back-off restarting failed container….

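A Back-off restarting failed container warning event means that the container's main process keeps exiting shortly after it starts, so the kubelet restarts it with an increasing delay. A common cause is an image that is started without a long-running foreground process: the container starts successfully, exits immediately, and is restarted over and over.

Fix: in the containers section of the Deployment, set the command field to a command that succeeds and never exits (for example, command: ["/bin/bash", "-ce", "tail -f /dev/null"]). Be aware that command replaces the image's entrypoint, so with this setting the args are handed to bash and Prometheus itself is not started. The container stays Running, which makes this useful for inspecting the container, but the reason the original process exited still has to be found in the logs of the failed container.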
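
Once the underlying startup failure has been diagnosed, the override can be taken out again so that the args reach Prometheus. The fragment below is a sketch of that end state, assuming the image's default entrypoint launches Prometheus and that the cause of the crash has already been fixed; the args are copied unchanged from the file above.

```yaml
# Fragment of spec.template.spec.containers in prometheus-deploy.yaml
- name: prometheus
  image: bitnami/prometheus:2.47.2
  imagePullPolicy: IfNotPresent
  # Debugging aid only: uncommenting this replaces the image entrypoint,
  # keeps the container alive, and means Prometheus is not started.
  # command: ["/bin/bash", "-ce", "tail -f /dev/null"]
  args:                      # passed to the image entrypoint when command is unset
    - '--storage.tsdb.retention=15d'
    - '--config.file=/etc/prometheus/prometheus.yaml'
    - '--storage.tsdb.path=/home/prometheus_data'
    - '--web.enable-lifecycle'
```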

With the errors resolved, applying the file again succeeds. Finally, check the status of the newly created Deployment and Pod.
