Kubernetes Advanced Series: Kubernetes Events in Detail and Persistence Options

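1. What is a Kubernetes Event

A Kubernetes Event is a resource object that shows what is happening inside the cluster. The components of a Kubernetes system report the events that occur at runtime to the Kubernetes API Server.

Examples include the decisions the scheduler made, or why certain Pods were evicted from a node. You can list events with kubectl get event or kubectl describe pod <podname> to see what has happened in the cluster. By default these commands only show events from the last hour.
Because events are resource objects, they are stored in the etcd cluster behind the Kubernetes API Server. To keep the disk from filling up, a retention policy is enforced: an event is deleted one hour after its last occurrence (on the kube-apiserver this period is set with the --event-ttl flag, which defaults to one hour).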
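
To make the retention rule concrete, here is a minimal sketch in Go. It only models the rule "drop an event once its last occurrence is older than the TTL". The storedEvent type, the prune function, and the sample data are hypothetical names made up for illustration; the real API server relies on etcd expiring the keys rather than on a loop like this.

```go
package main

import (
	"fmt"
	"time"
)

// storedEvent is a hypothetical, minimal stand-in for an Event kept in etcd.
type storedEvent struct {
	Name     string
	LastSeen time.Time
}

// prune keeps the events whose last occurrence is less than ttl before now.
func prune(events []storedEvent, now time.Time, ttl time.Duration) []storedEvent {
	kept := make([]storedEvent, 0, len(events))
	for _, e := range events {
		if now.Sub(e.LastSeen) < ttl {
			kept = append(kept, e)
		}
	}
	return kept
}

// keptNames runs prune over two made-up events with a one-hour TTL.
func keptNames() []string {
	now := time.Date(2022, 2, 20, 23, 40, 0, 0, time.UTC)
	events := []storedEvent{
		{Name: "old-event", LastSeen: now.Add(-90 * time.Minute)},
		{Name: "recent-event", LastSeen: now.Add(-5 * time.Minute)},
	}
	var names []string
	for _, e := range prune(events, now, time.Hour) {
		names = append(names, e.Name)
	}
	return names
}

func main() {
	fmt.Println(keptNames()) // only recent-event survives the one-hour TTL
}
```

This is also why events need a separate persistence solution if you want to inspect them after the TTL has passed.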

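2. Why Kubernetes Events are needed

Kubernetes has a distributed architecture. The apiserver is the interaction hub of the whole cluster and the component clients mainly talk to, while the kubelet is the worker on each node that carries out the actual tasks. When users create a resource, they want to see more than its final state (usually Running): they also want to see how the resource got there and which steps it went through. This feedback matters a great deal for debugging. Some tasks fail or get stuck at a particular step, and with this information we can pinpoint the problem.

3. Kubernetes Event example

Let's start with a simple example to see what events in a Kubernetes cluster look like.

Create a new namespace called zmc, then create a deployment called redis in it. After that, list all events in this namespace.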

[root@master1 ~]# kubectl create ns zmc
namespace/zmc created
[root@master1 ~]#  kubectl -n zmc create deployment redis --image=redis
deployment.apps/redis created
[root@master1 ~]# kubectl -n zmc get deploy
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
redis   1/1     1            1           35s
[root@master1 ~]# kubectl -n zmc get events
LAST SEEN   TYPE     REASON              OBJECT                        MESSAGE
67s         Normal   Scheduled           pod/redis-6749d7bd65-m6cmj    Successfully assigned zmc/redis-6749d7bd65-m6cmj to node2
66s         Normal   Pulling             pod/redis-6749d7bd65-m6cmj    Pulling image "redis"
55s         Normal   Pulled              pod/redis-6749d7bd65-m6cmj    Successfully pulled image "redis" in 11.720629892s
48s         Normal   Created             pod/redis-6749d7bd65-m6cmj    Created container redis
48s         Normal   Started             pod/redis-6749d7bd65-m6cmj    Started container redis
67s         Normal   SuccessfulCreate    replicaset/redis-6749d7bd65   Created pod: redis-6749d7bd65-m6cmj
67s         Normal   ScalingReplicaSet   deployment/redis              Scaled up replica set redis-6749d7bd65 to 1

However, by default kubectl get events does not list events in the order in which they occurred, so we usually add the --sort-by='{.metadata.creationTimestamp}' option to sort the output by time.

Sorted by time, the result looks like this:

[root@master1 ~]#  kubectl -n zmc get events --sort-by='{.metadata.creationTimestamp}'
LAST SEEN   TYPE     REASON              OBJECT                        MESSAGE
3m12s       Normal   Scheduled           pod/redis-6749d7bd65-m6cmj    Successfully assigned zmc/redis-6749d7bd65-m6cmj to node2
3m12s       Normal   SuccessfulCreate    replicaset/redis-6749d7bd65   Created pod: redis-6749d7bd65-m6cmj
3m12s       Normal   ScalingReplicaSet   deployment/redis              Scaled up replica set redis-6749d7bd65 to 1
3m11s       Normal   Pulling             pod/redis-6749d7bd65-m6cmj    Pulling image "redis"
3m          Normal   Pulled              pod/redis-6749d7bd65-m6cmj    Successfully pulled image "redis" in 11.720629892s
2m53s       Normal   Created             pod/redis-6749d7bd65-m6cmj    Created container redis
2m53s       Normal   Started             pod/redis-6749d7bd65-m6cmj    Started container redis
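
The effect of sorting by creation time can be sketched in Go as follows. The event type and the second offsets are assumptions chosen to mirror the LAST SEEN column above, not values read from the cluster. A stable sort is used so that events created in the same second keep their original relative order; we do not claim this is how kubectl implements --sort-by.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// event is a hypothetical, trimmed-down view of a Kubernetes Event.
type event struct {
	Reason  string
	Created time.Time
}

// sortByCreation orders events by creation time, oldest first.
// SliceStable keeps the input order for events with equal timestamps.
func sortByCreation(events []event) {
	sort.SliceStable(events, func(i, j int) bool {
		return events[i].Created.Before(events[j].Created)
	})
}

// sortedReasons sorts the events in the unsorted order of the first listing
// and returns their reasons as a comma-separated string.
func sortedReasons() string {
	base := time.Date(2022, 2, 20, 23, 17, 17, 0, time.UTC) // assumed start time
	at := func(sec int) time.Time { return base.Add(time.Duration(sec) * time.Second) }
	events := []event{
		{"Scheduled", at(0)},
		{"Pulling", at(1)},
		{"Pulled", at(12)},
		{"Created", at(19)},
		{"Started", at(19)},
		{"SuccessfulCreate", at(0)},
		{"ScalingReplicaSet", at(0)},
	}
	sortByCreation(events)
	reasons := make([]string, len(events))
	for i, e := range events {
		reasons[i] = e.Reason
	}
	return strings.Join(reasons, ",")
}

func main() {
	fmt.Println(sortedReasons())
}
```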

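From the steps above we can see that events really are a kind of resource in a Kubernetes cluster. When the state of a resource in the cluster changes, new events may be generated.

4. A closer look at Kubernetes Events

4.1. Listing the names of all events in the current namespace

Since events are a resource in the Kubernetes cluster, each one should carry its name in metadata.name so that it can be operated on individually. We can print the names with the following command: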

[root@master1 ~]#  kubectl -n zmc get events --sort-by='{.metadata.creationTimestamp}' -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
redis-6749d7bd65-m6cmj.16d5a1b59bce1473
redis-6749d7bd65.16d5a1b59b9ad7c4
redis.16d5a1b5990ac00c
redis-6749d7bd65-m6cmj.16d5a1b5cc73736d
redis-6749d7bd65-m6cmj.16d5a1b8870e4081
redis-6749d7bd65-m6cmj.16d5a1ba20df9afd
redis-6749d7bd65-m6cmj.16d5a1ba2c4418d8
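
Each name above has the form <object name>.<hex suffix>. As far as we know, the client-go event recorder builds the suffix from the event's Unix timestamp in nanoseconds, written in hexadecimal. The article's source does not state this, so treat it as an assumption. Under that assumption, a small Go sketch can split a name and recover the time (splitEventName is a hypothetical helper):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// splitEventName splits "<object>.<hex>" at the last dot and interprets the
// suffix as Unix nanoseconds in hexadecimal (an assumption, see the text).
// The last dot is used because object names may themselves contain dots.
func splitEventName(name string) (string, time.Time, error) {
	i := strings.LastIndex(name, ".")
	if i < 0 {
		return "", time.Time{}, fmt.Errorf("event name %q has no suffix", name)
	}
	nanos, err := strconv.ParseInt(name[i+1:], 16, 64)
	if err != nil {
		return "", time.Time{}, fmt.Errorf("parse suffix of %q: %w", name, err)
	}
	return name[:i], time.Unix(0, nanos).UTC(), nil
}

func main() {
	names := []string{
		"redis.16d5a1b5990ac00c",
		"redis-6749d7bd65-m6cmj.16d5a1ba2c4418d8",
	}
	for _, n := range names {
		obj, ts, err := splitEventName(n)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(obj, ts.Format(time.RFC3339))
	}
}
```

If the assumption holds, the decoded times fall on the day the example was run, which is consistent with the timestamps in the YAML output later in this article.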

4.2. Events in kubectl describe

We can run describe on the Deployment object and on the Pod object separately, which gives the following results (intermediate output omitted).

For the Deployment:

[root@master1 ~]# kubectl -n zmc describe deploy/redis 
......
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  11m   deployment-controller  Scaled up replica set redis-6749d7bd65 to 1

For the Pod:

[root@master1 ~]# kubectl -n zmc describe pods redis-6749d7bd65-m6cmj
Name:         redis-6749d7bd65-m6cmj
Namespace:    zmc
.......
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  13m   default-scheduler  Successfully assigned zmc/redis-6749d7bd65-m6cmj to node2
  Normal  Pulling    13m   kubelet            Pulling image "redis"
  Normal  Pulled     13m   kubelet            Successfully pulled image "redis" in 11.720629892s
  Normal  Created    13m   kubelet            Created container redis
  Normal  Started    13m   kubelet            Started container redis

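We can see that when we describe different resource objects, the events shown are only the ones directly related to that object. When describing the Deployment, we do not see the events of the Pod.

This tells us that an Event object contains information about the resource object it describes, and that the two are directly linked.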
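
The link between an event and its object can be sketched as a simple filter in Go. The objectRef and event types below are hypothetical, trimmed-down stand-ins for the real API types, and eventsFor is an illustrative helper, not how kubectl describe is implemented.

```go
package main

import "fmt"

// objectRef is a hypothetical, reduced reference to the object an event is about.
type objectRef struct {
	Kind      string
	Namespace string
	Name      string
}

// event is a hypothetical, reduced Event carrying its involved object.
type event struct {
	InvolvedObject objectRef
	Reason         string
}

// eventsFor returns only the events whose involved object equals ref.
func eventsFor(events []event, ref objectRef) []event {
	var out []event
	for _, e := range events {
		if e.InvolvedObject == ref {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	pod := objectRef{"Pod", "zmc", "redis-6749d7bd65-m6cmj"}
	deploy := objectRef{"Deployment", "zmc", "redis"}
	events := []event{
		{pod, "Scheduled"},
		{pod, "Pulling"},
		{deploy, "ScalingReplicaSet"},
	}
	// Describing the Deployment only surfaces its own event.
	for _, e := range eventsFor(events, deploy) {
		fmt.Println(e.InvolvedObject.Kind, e.Reason)
	}
}
```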

4.3. Going further with Kubernetes Events

Consider the following example: create a Deployment that uses an image that does not exist.

[root@master1 ~]#  kubectl -n zmc create deployment non-exist --image=zmcno
deployment.apps/non-exist created
[root@master1 ~]# kubectl -n zmc get pods
NAME                         READY   STATUS         RESTARTS   AGE
non-exist-54d7bcffbc-7kq4b   0/1     ErrImagePull   0          12s
redis-6749d7bd65-m6cmj       1/1     Running        0          16m

We can see that the Pod is in the ErrImagePull state. Now list the events in the current namespace (the earlier records for deploy/redis are omitted):

[root@master1 ~]# kubectl -n zmc get events --sort-by='{.metadata.creationTimestamp}'
LAST SEEN   TYPE      REASON              OBJECT                            MESSAGE
......
69s         Normal    SuccessfulCreate    replicaset/non-exist-54d7bcffbc   Created pod: non-exist-54d7bcffbc-7kq4b
69s         Normal    ScalingReplicaSet   deployment/non-exist              Scaled up replica set non-exist-54d7bcffbc to 1
69s         Normal    Scheduled           pod/non-exist-54d7bcffbc-7kq4b    Successfully assigned zmc/non-exist-54d7bcffbc-7kq4b to node2
21s         Normal    Pulling             pod/non-exist-54d7bcffbc-7kq4b    Pulling image "zmcno"
18s         Warning   Failed              pod/non-exist-54d7bcffbc-7kq4b    Error: ErrImagePull
18s         Warning   Failed              pod/non-exist-54d7bcffbc-7kq4b    Failed to pull image "zmcno": rpc error: code = Unknown desc = Error response from daemon: pull access denied for zmcno, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
3s          Warning   Failed              pod/non-exist-54d7bcffbc-7kq4b    Error: ImagePullBackOff
3s          Normal    BackOff             pod/non-exist-54d7bcffbc-7kq4b    Back-off pulling image "zmcno"

Run describe on this Pod:

[root@master1 ~]# kubectl -n zmc describe pod non-exist-54d7bcffbc-7kq4b
Name:         non-exist-54d7bcffbc-7kq4b
Namespace:    zmc
.......
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  2m26s                default-scheduler  Successfully assigned zmc/non-exist-54d7bcffbc-7kq4b to node2
  Normal   Pulling    54s (x4 over 2m25s)  kubelet            Pulling image "zmcno"
  Warning  Failed     50s (x4 over 2m22s)  kubelet            Failed to pull image "zmcno": rpc error: code = Unknown desc = Error response from daemon: pull access denied for zmcno, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
  Warning  Failed     50s (x4 over 2m22s)  kubelet            Error: ErrImagePull
  Warning  Failed     23s (x6 over 2m21s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    11s (x7 over 2m21s)  kubelet            Back-off pulling image "zmcno"

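This output differs from the earlier output for the Pod that started correctly. The main difference is in the Age column, where we now see entries such as 11s (x7 over 2m21s).

This means that an event of this kind has occurred 7 times over the last 2m21s, and that the most recent occurrence was 11s ago.

Yet when we ran kubectl get events directly, we did not see 7 repeated events. This shows that Kubernetes automatically merges repeated events.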
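
The merging behaviour can be sketched in Go as follows. This is a deliberately simplified model under our own assumptions: repeated events are keyed by object, reason, and message, and each repeat only bumps a counter and the last-seen time. The names eventKey, aggregated, and aggregator are hypothetical, and the real logic in client-go is more involved.

```go
package main

import (
	"fmt"
	"time"
)

// eventKey identifies events that are considered duplicates of each other.
type eventKey struct {
	Object, Reason, Message string
}

// aggregated is the single record kept for all occurrences of one key.
type aggregated struct {
	Count       int
	First, Last time.Time
}

type aggregator struct {
	byKey map[eventKey]*aggregated
}

func newAggregator() *aggregator {
	return &aggregator{byKey: make(map[eventKey]*aggregated)}
}

// observe records one occurrence: the first creates the record, repeats update it.
func (a *aggregator) observe(k eventKey, at time.Time) *aggregated {
	if agg, ok := a.byKey[k]; ok {
		agg.Count++
		agg.Last = at
		return agg
	}
	agg := &aggregated{Count: 1, First: at, Last: at}
	a.byKey[k] = agg
	return agg
}

// age renders the record in the style of the Age column of kubectl describe.
func (g *aggregated) age(now time.Time) string {
	last := now.Sub(g.Last).Round(time.Second)
	if g.Count == 1 {
		return last.String()
	}
	first := now.Sub(g.First).Round(time.Second)
	return fmt.Sprintf("%s (x%d over %s)", last, g.Count, first)
}

// demoAge replays seven made-up BackOff occurrences and renders their age.
func demoAge() string {
	start := time.Date(2022, 2, 20, 23, 33, 14, 0, time.UTC)
	k := eventKey{"pod/non-exist-54d7bcffbc-7kq4b", "BackOff", `Back-off pulling image "zmcno"`}
	a := newAggregator()
	var agg *aggregated
	for _, sec := range []int{0, 15, 40, 65, 90, 115, 130} { // assumed offsets in seconds
		agg = a.observe(k, start.Add(time.Duration(sec)*time.Second))
	}
	return agg.age(start.Add(141 * time.Second))
}

func main() {
	fmt.Println(demoAge()) // prints: 11s (x7 over 2m21s)
}
```

Seven occurrences end up as one stored record, which is why kubectl get events shows a single row while describe reports the count.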

4.4. The fields of the Kubernetes Event resource object explained

The following is an arbitrarily chosen event (found with the method shown earlier), printed in YAML format:

[root@master1 ~]# kubectl -n zmc get events non-exist-54d7bcffbc-7kq4b.16d5a293c26b7082 -o yaml
apiVersion: v1
count: 19
eventTime: null
firstTimestamp: "2022-02-20T23:33:14Z"
involvedObject:
  apiVersion: v1
  fieldPath: spec.containers{zmcno}
  kind: Pod
  name: non-exist-54d7bcffbc-7kq4b
  namespace: zmc
  resourceVersion: "1540948"
  uid: a68c9ffe-4025-4254-9dda-f6eab5d4edb9
kind: Event
lastTimestamp: "2022-02-20T23:38:16Z"
message: Back-off pulling image "zmcno"
metadata:
  creationTimestamp: "2022-02-20T23:33:14Z"
  name: non-exist-54d7bcffbc-7kq4b.16d5a293c26b7082
  namespace: zmc
  resourceVersion: "1541908"
  selfLink: /api/v1/namespaces/zmc/events/non-exist-54d7bcffbc-7kq4b.16d5a293c26b7082
  uid: c8505943-4740-4a15-bb30-c4727508be40
reason: BackOff
reportingComponent: ""
reportingInstance: ""
source:
  component: kubelet
  host: node2
type: Normal

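The main fields have the following meanings:

count: how many times this kind of event has occurred so far
firstTimestamp and lastTimestamp: the time of the first and of the most recent occurrence of this event
involvedObject: the resource object directly associated with this event, that is, the object that triggered it. Its structure is as follows: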

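type ObjectReference struct {
    Kind            string    // kind of the referenced object, e.g. Pod
    Namespace       string    // namespace of the referenced object
    Name            string    // name of the referenced object
    UID             types.UID // unique ID of the referenced object
    APIVersion      string    // API version of the referenced object
    ResourceVersion string    // resource version at the time of the reference
    FieldPath       string    // part of the object concerned, e.g. spec.containers{zmcno}
}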

source: the component that reported the event. Its structure is as follows:

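type EventSource struct {
    Component string // component that generated the event, e.g. kubelet
    Host      string // node on which the event was generated, e.g. node2
}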

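reason: a short summary or a fixed code. It is well suited for filtering and is mainly meant to be machine-readable. There are currently more than 50 such codes.
message: a more detailed, human-readable description
type: currently there are only two types, Normal (normal events) and Warning (warning events). The source code also documents what each one means: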

// staging/src/k8s.io/api/core/v1/types.go
const (
    // Information only and will not cause any problems
    EventTypeNormal string = "Normal"
    // These events are to warn that something might go wrong
    EventTypeWarning string = "Warning"
)
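
To tie the fields together, the sketch below decodes a list of events from JSON, for example the output of kubectl get events -o json, and keeps only the Warning events. The struct types are our own reduced versions that cover only the fields discussed above, and the sample document is made up from the values seen in this article. The JSON key names follow the YAML output shown earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Reduced, hypothetical versions of the API types with only the fields we need.
type objectReference struct {
	Kind      string `json:"kind"`
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
}

type eventSource struct {
	Component string `json:"component"`
	Host      string `json:"host"`
}

type event struct {
	Type           string          `json:"type"`
	Reason         string          `json:"reason"`
	Message        string          `json:"message"`
	Count          int             `json:"count"`
	InvolvedObject objectReference `json:"involvedObject"`
	Source         eventSource     `json:"source"`
}

type eventList struct {
	Items []event `json:"items"`
}

// warnings decodes an event list and returns the events of type Warning.
func warnings(raw []byte) ([]event, error) {
	var list eventList
	if err := json.Unmarshal(raw, &list); err != nil {
		return nil, fmt.Errorf("decode event list: %w", err)
	}
	var out []event
	for _, e := range list.Items {
		if e.Type == "Warning" {
			out = append(out, e)
		}
	}
	return out, nil
}

// sample is a made-up document shaped like a list of events.
const sample = `{"items":[
 {"type":"Normal","reason":"BackOff","message":"Back-off pulling image \"zmcno\"","count":19,
  "involvedObject":{"kind":"Pod","namespace":"zmc","name":"non-exist-54d7bcffbc-7kq4b"},
  "source":{"component":"kubelet","host":"node2"}},
 {"type":"Warning","reason":"Failed","message":"Error: ImagePullBackOff","count":6,
  "involvedObject":{"kind":"Pod","namespace":"zmc","name":"non-exist-54d7bcffbc-7kq4b"},
  "source":{"component":"kubelet","host":"node2"}}
]}`

func main() {
	ws, err := warnings([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range ws {
		fmt.Printf("%s %s/%s: %s (x%d, from %s)\n",
			e.Reason, e.InvolvedObject.Kind, e.InvolvedObject.Name, e.Message, e.Count, e.Source.Component)
	}
}
```

Filtering on type and reason rather than on message follows the intent described above: reason is the machine-readable field, while message is meant for people.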