Kubernetes Troubleshooting Notes

Background: a Prometheus instance was created in a Kubernetes cluster using a Deployment. The manifest prometheus-deploy.yaml is shown below after all fixes were applied:

# Error 1: originally extensions/v1beta1
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-core
  namespace: monitoring
  labels:
    app: prometheus
    component: core
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: core
  template:
    metadata:
      name: prometheus-main
      labels:
        app: prometheus
        component: core
    spec:
      serviceAccountName: prometheus-k8s
      # nodeSelector:
      #   kubernetes.io/hostname: $YOUR_IP
      containers:
      - name: prometheus
        image: bitnami/prometheus:2.47.2
        # Error 3: keeps the container alive, but replaces the image entrypoint (see Error 3)
        command: ["/bin/bash", "-ce", "tail -f /dev/null"]
        imagePullPolicy: IfNotPresent
        args:
          - '--storage.tsdb.retention.time=15d'  # replaces the deprecated --storage.tsdb.retention
          - '--config.file=/etc/prometheus/prometheus.yaml'
          - '--storage.tsdb.path=/home/prometheus_data'
          - '--web.enable-lifecycle'
        ports:
        - name: webui
          containerPort: 9090
        resources:
          # Error 2: requests/limits of 20000m CPU and 20000M memory did not fit any node
          #requests:
            #cpu: 20000m
            #memory: 20000M
          #limits:
            #cpu: 20000m
            #memory: 20000M
        securityContext:
          privileged: true
        volumeMounts:
        - name: data
          mountPath: /home/prometheus_data
        - name: config-volume
          mountPath: /etc/prometheus
        - name: rules-volume
          mountPath: /etc/prometheus-rules
        - name: time
          mountPath: /etc/localtime
      volumes:
      - name: data
        hostPath:
          path: /home/cdnadmin/prometheus_data
      - name: config-volume
        configMap:
          name: prometheus-core
      - name: rules-volume
        configMap:
          name: prometheus-rules
      - name: time
        hostPath:
          path: /etc/localtime


kubectl apply -f prometheus-deploy.yaml

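Applying the original version of this file produced the errors below, one for each of the error markers 1 to 3 in the YAML. Each is analyzed in turn.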

Error 1:

error: resource mapping not found for name: "prometheus-core" namespace: "monitoring" from "prometheus-deploy.yaml": no matches for kind "Deployment" in version "extensions/v1beta1"
ensure CRDs are installed first

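no matches for kind "Deployment" in version "extensions/v1beta1" means the API server does not serve the Deployment resource under that apiVersion. The manifest originally used extensions/v1beta1, which Kubernetes stopped serving for Deployment in version 1.16, so check which group and version the cluster uses for Deployment: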

kubectl explain deploy

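kubectl explain reports Deployment under group apps, version v1. Fix: change apiVersion in prometheus-deploy.yaml to apps/v1. Note that apps/v1 requires spec.selector, and its labels must match spec.template.metadata.labels; the manifest above already satisfies this (app: prometheus, component: core).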
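Before running kubectl apply, a manifest can be checked for apiVersion/kind pairs the API server no longer serves. The Python sketch below illustrates the idea under simplifying assumptions: the lookup table REMOVED_APIS is hand-written and covers only a few well-known removals (it is not a complete list), the helper name find_removed_apis is hypothetical, and only unindented apiVersion: and kind: lines are read, so no YAML library is needed. The Kubernetes Deprecated API Migration Guide remains the authoritative source.

```python
import re

# Hand-written, deliberately incomplete table (an assumption, not an official list):
# (apiVersion, kind) -> (replacement apiVersion, Kubernetes version that stopped serving it)
REMOVED_APIS = {
    ("extensions/v1beta1", "Deployment"): ("apps/v1", "1.16"),
    ("apps/v1beta1", "Deployment"): ("apps/v1", "1.16"),
    ("apps/v1beta2", "Deployment"): ("apps/v1", "1.16"),
    ("extensions/v1beta1", "DaemonSet"): ("apps/v1", "1.16"),
    ("extensions/v1beta1", "ReplicaSet"): ("apps/v1", "1.16"),
    ("extensions/v1beta1", "Ingress"): ("networking.k8s.io/v1", "1.22"),
}


def find_removed_apis(manifest: str) -> list[str]:
    """Return one warning per YAML document that uses a removed apiVersion."""
    findings = []
    # Multi-document manifests are separated by lines containing only '---'.
    for doc in re.split(r"^---[ \t]*$", manifest, flags=re.MULTILINE):
        fields = {}
        for line in doc.splitlines():
            # Only unindented keys describe the resource itself.
            if line.startswith(("apiVersion:", "kind:")):
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        pair = (fields.get("apiVersion"), fields.get("kind"))
        if pair in REMOVED_APIS:
            replacement, removed_in = REMOVED_APIS[pair]
            findings.append(
                f"{pair[1]}: {pair[0]} is not served since {removed_in}; use {replacement}"
            )
    return findings
```

For example, find_removed_apis(open("prometheus-deploy.yaml").read()) would have flagged the original extensions/v1beta1 version of this manifest.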

Error 2:

The Pod stayed in the Pending state, and its details (kubectl describe pod) showed the following scheduling error:

0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 1 node(s) had untolerated taint {node.kubernetes.io/unreachable: }, 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling..

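In short: one node (the control-plane node) has the taint node-role.kubernetes.io/control-plane and one node has the taint node.kubernetes.io/unreachable, and the Pod tolerates neither of them ("untolerated"); the other two nodes do not have enough free CPU or memory. Preemption cannot help on any node either, so no node can run the Pod.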
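The scheduler message packs several per-node reasons into one line. The hypothetical Python helper below splits the part before preemption: into a node count and a {reason: number of nodes} mapping; it assumes the message format shown above and is only a reading aid, not a Kubernetes API. One thing it makes visible: the counts add up to 6 although there are only 4 nodes, because the same two nodes are short of both CPU and memory.

```python
import re

# Matches the leading "0/N nodes are available: ..." part of a FailedScheduling
# message, stopping at the period that precedes " preemption:" (or end of text).
_HEADER = re.compile(
    r"0/(?P<total>\d+) nodes are available: (?P<reasons>.*?)\.(?= preemption:|$)"
)


def parse_failed_scheduling(message: str) -> tuple[int, dict[str, int]]:
    """Split a scheduler message into (node count, {reason: number of nodes})."""
    match = _HEADER.match(message)
    if match is None:
        raise ValueError("not a FailedScheduling message")
    reasons = {}
    for item in match.group("reasons").split(", "):
        # Each item looks like "<count> <reason>".
        count, _, reason = item.partition(" ")
        reasons[reason] = int(count)
    return int(match.group("total")), reasons
```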

Fix:

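(1) For the taints, remove them from all nodes with the commands below (the trailing - removes a taint). Removing the control-plane taint lets ordinary workloads run on the control-plane node, which is usually acceptable only in a test cluster. The node.kubernetes.io/unreachable taint is different: the node controller adds it while a node is unreachable and adds it back if it is removed by hand, so the lasting fix is to restore that node (kubectl get nodes shows it as NotReady).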

kubectl taint nodes --all node-role.kubernetes.io/control-plane-
kubectl taint nodes --all node.kubernetes.io/unreachable-
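Removing a taint affects every Pod in the cluster. A narrower option is to give this Pod a toleration under spec.template.spec.tolerations of the Deployment (for the control-plane taint: key node-role.kubernetes.io/control-plane, operator Exists, effect NoSchedule). To make "untolerated" concrete, the Python sketch below models the matching rule described in the Kubernetes taints and tolerations documentation: effects must match unless the toleration's effect is empty, an empty key with operator Exists matches every taint, and otherwise the keys must match and either the operator is Exists or the values are equal. Only NoSchedule and NoExecute taints block scheduling. The Taint and Toleration classes are simplified stand-ins, not the Kubernetes client library's types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"  # NoSchedule, PreferNoSchedule or NoExecute


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key + operator Exists tolerates every taint
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches all effects


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """True if this single toleration matches the taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if not tol.key:
        # An empty key is only meaningful together with operator Exists.
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    return tol.operator == "Exists" or tol.value == taint.value


def untolerated(taints: list[Taint], tolerations: list[Toleration]) -> list[Taint]:
    """Hard taints (NoSchedule/NoExecute) that no toleration matches."""
    return [
        t for t in taints
        if t.effect in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, t) for tol in tolerations)
    ]
```

Tolerating node.kubernetes.io/unreachable would only allow scheduling onto a node that is down, so it would most likely not address that part of the error.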

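(2) For the resources, the original manifest requested (and limited the container to) 20000m CPU, i.e. 20 cores, and 20000M memory, i.e. 20 × 10^9 bytes (about 18.6 GiB). No node in this environment has that much, so the resources section of prometheus-deploy.yaml (requests and limits) was simply commented out. Setting smaller values that fit the nodes would also work; with no requests or limits at all, the Pod runs in the BestEffort QoS class.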
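To see why these values could not fit, it helps to convert the quantities. The Python sketch below is a simplified parser for Kubernetes resource quantities (m for milli, decimal suffixes k, M, G, T, P, E and binary suffixes Ki, Mi, Gi, Ti, Pi, Ei); it does not validate its input. The fits check compares each request with a node's allocatable amount; the real scheduler also subtracts the requests of Pods already on the node, which this sketch omits. The node size used in the test (4 CPUs, 8Gi of memory) is hypothetical.

```python
from decimal import Decimal

# Multipliers for Kubernetes quantity suffixes (simplified).
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6, "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12, "P": Decimal(10) ** 15, "E": Decimal(10) ** 18,
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40, "Pi": Decimal(2) ** 50, "Ei": Decimal(2) ** 60,
}


def parse_quantity(q: str) -> Decimal:
    """Convert e.g. '20000m' -> 20 (cores) or '8Gi' -> 8589934592 (bytes)."""
    # Check two-character binary suffixes before one-character ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)


def fits(requests: dict[str, str], allocatable: dict[str, str]) -> bool:
    """True if every requested resource is within the node's allocatable amount."""
    return all(
        parse_quantity(amount) <= parse_quantity(allocatable[name])
        for name, amount in requests.items()
    )
```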

Error 3:

After starting, the Pod kept restarting and reported Back-off restarting failed container….

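The Back-off restarting failed container warning means the container's main process keeps exiting and kubelet keeps restarting it with a growing delay (the Pod shows CrashLoopBackOff). A common cause is an image whose main process does not stay in the foreground, so the container exits right after starting; a process that crashes on startup (for example because of a bad config file or missing permissions on the data directory) produces the same symptom. kubectl logs <pod> -n monitoring --previous shows the output of the last failed run.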
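The restart delay grows exponentially. Per the Kubernetes Pod lifecycle documentation, kubelet by default waits 10s, 20s, 40s, and so on between restarts, capped at five minutes, and resets the delay after the container has run for 10 minutes without problems. The short Python sketch below only reproduces that default schedule (newer releases may tune these defaults); crashloop_delays is a hypothetical name.

```python
def crashloop_delays(restarts: int, initial: int = 10, cap: int = 300) -> list[int]:
    """Seconds kubelet waits before each of the first `restarts` restarts (default schedule)."""
    delays = []
    delay = initial
    for _ in range(restarts):
        delays.append(delay)
        # Double the delay each time, but never beyond the cap (300 s = 5 min).
        delay = min(delay * 2, cap)
    return delays


if __name__ == "__main__":
    print(crashloop_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```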

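Fix used here: in the containers section of the Deployment, set command to an instruction that succeeds and never exits, for example command: ["/bin/bash", "-ce", "tail -f /dev/null"]. Note that command replaces the image's ENTRYPOINT and the args entries are then handed to bash as positional parameters that are never used, so Prometheus itself does not start: the container merely stays alive. This is handy for debugging (kubectl exec into the container and inspect the config file and data directory), but to actually run Prometheus the override has to be removed again once the real cause of the crash is fixed.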
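How command and args combine with the image is easy to misjudge. The Kubernetes documentation on defining a command and arguments for a container lists four cases, modelled in the Python sketch below (effective_argv is a hypothetical name). The image ENTRYPOINT and CMD values in the test are placeholders, not the actual bitnami/prometheus settings; docker image inspect shows the real ones.

```python
from typing import Optional


def effective_argv(
    image_entrypoint: list[str],
    image_cmd: list[str],
    command: Optional[list[str]] = None,
    args: Optional[list[str]] = None,
) -> list[str]:
    """Final process argv for a container, per the Kubernetes command/args rules."""
    if command:
        # command replaces ENTRYPOINT, and the image's CMD is dropped as well.
        return command + (args or [])
    if args:
        # args alone replaces CMD but keeps the image's ENTRYPOINT.
        return image_entrypoint + args
    # Neither set: run the image as built.
    return image_entrypoint + image_cmd
```

For this manifest, the result is /bin/bash -ce "tail -f /dev/null" followed by the four Prometheus flags. bash assigns those flags to $0, $1, ... and never uses them, so the container stays up while Prometheus does not run. Keeping only args would instead yield the image ENTRYPOINT plus the flags, which starts Prometheus provided that the ENTRYPOINT launches it.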

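After these fixes, applying the manifest again succeeded. Finally, check the status of the new Deployment and Pod:

kubectl get deployment,pod -n monitoring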
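kubectl rollout status deployment/prometheus-core -n monitoring waits for a rollout to finish. The Python sketch below roughly mirrors, in simplified form, the checks that command performs on a Deployment's JSON; the function names rollout_complete and load_deployment are hypothetical, and the progress-deadline check is omitted. Keep in mind that without a readiness probe a replica counts as available as soon as its container is running, so with the tail -f /dev/null override the Deployment reports success even though Prometheus itself is not running.

```python
import json
import subprocess


def load_deployment(name: str, namespace: str) -> dict:
    """Fetch a Deployment as JSON via kubectl (requires cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "deployment", name, "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def rollout_complete(deploy: dict) -> tuple[bool, str]:
    """Simplified rollout check: (done?, human-readable reason)."""
    spec_replicas = deploy.get("spec", {}).get("replicas", 1)
    generation = deploy.get("metadata", {}).get("generation", 0)
    status = deploy.get("status", {})
    # Zero-valued status fields are omitted from the JSON, hence the defaults.
    if status.get("observedGeneration", 0) < generation:
        return False, "waiting for the controller to observe the new spec"
    updated = status.get("updatedReplicas", 0)
    if updated < spec_replicas:
        return False, f"{updated} of {spec_replicas} replicas updated"
    if status.get("replicas", 0) > updated:
        return False, "old replicas are still terminating"
    available = status.get("availableReplicas", 0)
    if available < updated:
        return False, f"{available} of {updated} updated replicas available"
    return True, "rollout complete"
```

Usage against a live cluster: rollout_complete(load_deployment("prometheus-core", "monitoring")).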
