An Introduction to EaseProbe, an SLA Health-Probing Tool

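EaseProbe is an open-source health/status checking tool. It supports the following probe methods: HTTP, TCP, TLS, shell commands, SSH commands, host resource usage, and native clients.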

I like this tool a lot, and after thinking it over for a few days, I decided to introduce it this way.

Requirements:

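  • Monitor the availability of an HTTP endpoint and the string it returns: a response of success means the check passed, and a response of failure means it failed.
  • Monitor the availability of a TCP port.
  • Monitor SSL certificate expiry.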
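To make the first requirement concrete, here is a minimal Go sketch (EaseProbe itself is written in Go) of what such an HTTP check involves: the status code must fall in an accepted range, the body must contain success, and it must not contain failure. This is not EaseProbe's code; the function names are hypothetical, and the ranges and strings simply mirror the success_code, contain, and not_contain values in the configuration used later in this article. The sketch also assumes redirects should not be followed, so that 3xx codes are checked as returned.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// codeRange is an inclusive [low, high] status range, like success_code: [200,206].
type codeRange [2]int

// checkResponse applies the three rules: status in an accepted range,
// body contains want, and body does not contain unwanted.
func checkResponse(code int, body string, ranges []codeRange, want, unwanted string) error {
	inRange := false
	for _, r := range ranges {
		if code >= r[0] && code <= r[1] {
			inRange = true
			break
		}
	}
	if !inRange {
		return fmt.Errorf("unexpected status code %d", code)
	}
	if want != "" && !strings.Contains(body, want) {
		return fmt.Errorf("body does not contain %q", want)
	}
	if unwanted != "" && strings.Contains(body, unwanted) {
		return fmt.Errorf("body contains %q", unwanted)
	}
	return nil
}

// probeHTTP sends a GET (with optional basic auth) and checks the response.
func probeHTTP(url, user, pass string, timeout time.Duration) error {
	client := &http.Client{
		Timeout: timeout,
		// Assumption: return 3xx responses as-is instead of following them.
		CheckRedirect: func(*http.Request, []*http.Request) error { return http.ErrUseLastResponse },
	}
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	if user != "" {
		req.SetBasicAuth(user, pass)
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	ranges := []codeRange{{200, 206}, {300, 308}}
	return checkResponse(resp.StatusCode, string(body), ranges, "success", "failure")
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: httpcheck <url>")
		return
	}
	if err := probeHTTP(os.Args[1], "", "", 30*time.Second); err != nil {
		fmt.Println("DOWN:", err)
		os.Exit(1)
	}
	fmt.Println("UP")
}
```

For example, go run main.go http://192.168.96.121/zabbix prints UP, or DOWN followed by the rule that failed. Keeping the pure checkResponse separate from the network call makes the rules easy to test on their own.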

Given these requirements, what would an ops engineer do?

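  1. Install Zabbix: use Zabbix web monitoring to check the HTTP endpoint, use the net.tcp.service[service,,] item to check the TCP port, write a script and configure an item for it to check SSL certificate expiry, and configure DingTalk alerts.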

  2. Install Prometheus, use blackbox to monitor HTTP, TCP, and SSL certificate expiry, then install Alertmanager and configure DingTalk alerts.

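What if there were a tool where you only had to fill in a few config files to meet all of the requirements above, and it could do even more? Wouldn't that be awesome?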

Of course, some experts will say a shell script would be simpler. In that case, move your mouse to the X in the top-right corner and click it.

Background

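I often need to know whether some API endpoint or middleware has gone down. We already have monitoring systems such as Zabbix and Prometheus, but they are rather heavyweight. Below I introduce a lightweight tool, EaseProbe, which has already earned 1.5k stars on GitHub. Its author is the well-known Chen Hao (左耳朵耗子); Google him if you want to know more.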

Homepage: https://github.com/megaease/easeprobe

Official description:

EaseProbe is a simple, standalone, and lightweight tool that can do health/status checking, written in Go.


Hands-on

How are the requirements above met? Take a look at the following configuration file:

http: # http probes
  - name: prometheus
    url: http://192.168.96.121:9090/prometheus/graph
    username: admin
    password: 123456
    success_code:
      - [200,206] # the code >=200 and <= 206
      - [300,308] # the code >=300 and <= 308
    # Response Checking
    contain: "success" # response body must contain this string, if not the probe is considered failed.
    not_contain: "failure" # response body must NOT contain this string, if it does the probe is considered failed.
  - name: zabbix
    url: http://192.168.96.121/zabbix
    username: admin
    password: 123456
    success_code:
      - [200,206] # the code >=200 and <= 206
      - [300,308] # the code >=300 and <= 308
    # Response Checking
    contain: "success" # response body must contain this string, if not the probe is considered failed.
    not_contain: "failure" # response body must NOT contain this string, if it does the probe is considered failed.

tcp:
  - name: nginx1
    host: 192.168.96.60:80
  - name: nginx2
    host: 192.168.96.58:80


tls:
  - name: xx SSL certificate expiring within 30 days
    host: xx.cn:443
    proxy: socks5://proxy.server:1080 # Optional. Only support socks5.
                                    # Also support the `ALL_PROXY` environment.
    insecure_skip_verify: true # don't check cert validity
    expire_skip_verify: false # if true, don't check the cert expiry date
    alert_expire_before: 720h # alert if cert expire date is before X, the value is a Duration, see https://pkg.go.dev/time#ParseDuration. example: 1h, 1m, 1s. expire_skip_verify must be false to use this feature.
    # root_ca_pem_path: /path/to/root/ca.pem # ignore if root_ca_pem is present
    # root_ca_pem: |
    #   -----BEGIN CERTIFICATE-----

  - name: yy SSL certificate expiring within 30 days
    host: yy.cn:443
    proxy: socks5://proxy.server:1080 # Optional. Only support socks5.
                                    # Also support the `ALL_PROXY` environment.
    insecure_skip_verify: true # don't check cert validity
    expire_skip_verify: false # if true, don't check the cert expiry date
    alert_expire_before: 720h # alert if cert expire date is before X, the value is a Duration, see https://pkg.go.dev/time#ParseDuration. example: 1h, 1m, 1s. expire_skip_verify must be false to use this feature.
    # root_ca_pem_path: /path/to/root/ca.pem # ignore if root_ca_pem is present
    # root_ca_pem: |
    #   -----BEGIN CERTIFICATE-----




settings:
  timeformat: "2006-01-02 15:04:05 Z07:00"
  # check the following link to see the time zone list
  # https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
  timezone: "Asia/Shanghai" #  default: UTC
  probe:
    timeout: 30s # the time out for all probes
    interval: 1m # probe every minute for all probes
  # SLA Report schedule
  sla:
    #  minutely, hourly, daily, weekly (Sunday), monthly (Last Day), none
    schedule: "weekly"
    # UTC time, the format is 'hour:min:sec'
    time: "23:59"
  # built-in web server (status/SLA page)
  http:
    port: 8181
    log:
      file: /var/log/easeprobe-http-access.log # access log file location

# Notification Configuration
notify:
  dingtalk:
    - name: "dingtalk alert service"
      webhook: "https://oapi.dingtalk.com/robot/send?access_token=123456"
      secret: "" # sign secret if set
  log:
    - name: log file # local log file
      file: /var/log/easeprobe.log

Alert examples

Here are a few example alerts.
The SSL certificate will expire in 89 hours:
(Screenshot: SSL certificate expiry alert)

Redis is down:
(Screenshot: Redis down alert)

The SLA report, which can be scheduled minutely, hourly, daily, weekly, or monthly:
(Screenshot: SLA report)

Other features

EaseProbe supports many other powerful features; see the official documentation for details.


Postscript

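This is not meant as a comparison with Zabbix or Prometheus; they are not really comparable. EaseProbe is simply lighter, and small requirements don't need such heavyweight software. Similarly, for a centralized logging platform, rsyslog is a better fit than ELK when saving resources is the priority.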
