Wise-IoU, an Author's Guide: A Bounding Box Loss Based on a Dynamic Non-Monotonic Focusing Mechanism


Paper: Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism

Authors: Zanjia Tong (童赞嘉), Yuhang Chen (陈宇杭), Zewei Xu (许泽蔚), Rong Yu (余荣, advisor)

GitHub: https://github.com/Instinct323/wiou

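Abstract: Object detection is a core problem in computer vision, and its performance depends on the design of the loss function. The bounding box loss is an important component of the object detection loss, and a well-defined one brings a significant performance gain to the detection model. Most recent work assumes that the examples in the training data are of high quality and concentrates on strengthening the fitting ability of the bounding box loss. We note, however, that object detection training sets contain low-quality examples, and blindly strengthening bounding box regression on them clearly harms the improvement of detection performance. Focal-EIoU v1 was proposed to address this problem, but because its focusing mechanism is static, it does not fully exploit the potential of a non-monotonic focusing mechanism. Based on this view, we propose a dynamic non-monotonic focusing mechanism and design Wise-IoU (WIoU). The dynamic non-monotonic focusing mechanism uses the "outlier degree" instead of IoU to evaluate the quality of anchor boxes and provides a wise gradient gain allocation strategy. This strategy reduces the competitiveness of high-quality anchor boxes while also reducing the harmful gradients generated by low-quality examples. This allows WIoU to focus on ordinary-quality anchor boxes and improve the overall performance of the detector. When WIoU is applied to the state-of-the-art single-stage detector YOLOv7, AP-75 on the MS-COCO dataset improves from 53.03% to 54.50%.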

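Foreword: My computing power is limited, so I ran the experiments only on YOLOv7. And because the full MS-COCO calls for a larger model (more parameters) and training one model on it takes 3 days, I trained on only a quarter of the data (28,474 training images). The amount of experimentation falls far short of other work, but the reaction of the Chinese community over the past few days makes me feel this paper can still be saved, haha. To support the work of computer vision researchers, I decided to come here and walk through the theory and the code in practice.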

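Existing Work

Denote the anchor box as $\vec{B}=[x\ y\ w\ h]$ and the target box as $\vec{B_{gt}}=[x_{gt}\ y_{gt}\ w_{gt}\ h_{gt}]$.

IoU measures the degree of overlap between the predicted box and the ground-truth box in object detection. With $W_i, H_i$ the width and height of the overlapping region and $S_u$ the area of the union, the IoU loss is defined as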

$\mathcal{L}_{IoU} = 1 - IoU = 1 - \frac{W_i H_i}{S_u}$

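IoU also has a fatal flaw, which can be seen in the formula below. When the bounding boxes do not overlap $(W_i=0\ \text{or}\ H_i=0)$, the gradient backpropagated from $\mathcal{L}_{IoU}$ vanishes. As a result, the width $W_i$ of the overlapping region cannot be updated during training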

$\frac{\partial \mathcal{L}_{IoU}}{\partial W_i} = \left\{\begin{matrix} -H_i\frac{IoU + 1}{S_u},W_i >0\\ 0,W_i = 0 \end{matrix}\right.$
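The vanishing gradient can be checked numerically. The following is a minimal pure-Python sketch, not the author's code: the helper names (`iou_loss`, `shift_x`, `numeric_grad_x`) and the example boxes are assumptions made for illustration, and the gradient is estimated by a central finite difference instead of autograd.

```python
def iou_loss(box, box_gt):
    """L_IoU = 1 - W_i * H_i / S_u for boxes given as (x0, y0, x1, y1)."""
    # Width and height of the overlap, clamped at zero when the boxes are disjoint
    w_i = max(0.0, min(box[2], box_gt[2]) - max(box[0], box_gt[0]))
    h_i = max(0.0, min(box[3], box_gt[3]) - max(box[1], box_gt[1]))
    area = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    s_u = area + area_gt - w_i * h_i
    return 1.0 - w_i * h_i / s_u


def shift_x(box, dx):
    """Move a box horizontally by dx."""
    return (box[0] + dx, box[1], box[2] + dx, box[3])


def numeric_grad_x(box, box_gt, eps=1e-4):
    """Central finite difference of L_IoU with respect to a horizontal shift."""
    upper = iou_loss(shift_x(box, eps), box_gt)
    lower = iou_loss(shift_x(box, -eps), box_gt)
    return (upper - lower) / (2 * eps)


target = (0.0, 0.0, 2.0, 2.0)
overlapping = (1.0, 0.0, 3.0, 2.0)  # shares half of its area with the target
disjoint = (5.0, 0.0, 7.0, 2.0)     # no overlap at all

grad_overlapping = numeric_grad_x(overlapping, target)  # nonzero: the box can be pulled back
grad_disjoint = numeric_grad_x(disjoint, target)        # zero: the loss is flat at 1
```

For the disjoint box the loss stays at 1 no matter how the box is nudged, so an optimizer driven only by $\mathcal{L}_{IoU}$ receives no signal telling it which way to move.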

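Existing work considers many geometric factors related to the bounding boxes and constructs a penalty term $\mathcal{R}_i$ to solve this problem. Existing bounding box losses are all additive and follow the paradigm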

$\mathcal{L}_i = \mathcal{L}_{IoU} + \mathcal{R}_i$

Distance-IoU

DIoU defines the penalty term as the normalized length of the line connecting the center points

$\mathcal{R}_{DIoU} = \frac{(x-x_{gt})^2 + (y-y_{gt})^2}{W^2_g + H^2_g}$

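At the same time, it gives the size $W_g, H_g$ of the smallest enclosing box a negative gradient, which enlarges $W_g, H_g$ and hinders the predicted box from overlapping with the target box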

$\frac {\partial \mathcal{R}_{DIoU}} {\partial W_g} = -2W_g \frac{(x-x_{gt})^2 + (y-y_{gt})^2}{(W^2_g + H^2_g)^2} < 0$

$\frac {\partial \mathcal{R}_{DIoU}} {\partial H_g} = -2H_g \frac{(x-x_{gt})^2 + (y-y_{gt})^2}{(W^2_g + H^2_g)^2} < 0$
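As a sanity check on the sign of this gradient, here is a small pure-Python sketch. The function names, the example geometry, and the learning rate of 0.1 are illustrative assumptions, not values from the paper.

```python
def r_diou(dx, dy, w_g, h_g):
    """R_DIoU: squared center distance over the squared diagonal of the enclosing box."""
    return (dx ** 2 + dy ** 2) / (w_g ** 2 + h_g ** 2)


def grad_w_g(dx, dy, w_g, h_g):
    """Analytic dR_DIoU / dW_g."""
    return -2 * w_g * (dx ** 2 + dy ** 2) / (w_g ** 2 + h_g ** 2) ** 2


# Hypothetical geometry: center offset (1, 0.5) inside a 4 x 3 enclosing box
dx, dy, w_g, h_g = 1.0, 0.5, 4.0, 3.0

eps = 1e-6
numeric = (r_diou(dx, dy, w_g + eps, h_g) - r_diou(dx, dy, w_g - eps, h_g)) / (2 * eps)
analytic = grad_w_g(dx, dy, w_g, h_g)

# One gradient-descent step on W_g alone: a negative gradient makes W_g grow
w_g_next = w_g - 0.1 * analytic
```

Minimizing $\mathcal{R}_{DIoU}$ through $W_g$ therefore rewards a larger enclosing box, which is the opposite of what overlap needs.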

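Undeniably, though, the distance metric is an extremely effective solution and has become an essential factor of efficient bounding box losses. EIoU builds on this and increases the penalty on the distance metric. Its penalty term is defined as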

$\mathcal{R}_{EIoU} = \mathcal{R}_{DIoU} + \frac{(x-x_{gt})^2}{W^2_g} + \frac{(y-y_{gt})^2}{H^2_g}$
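To see how much the extra terms add, the two penalties can be compared on the same geometry. This is only an illustrative sketch: the function names and numbers are made up, and `r_eiou` follows the formula exactly as written in this post.

```python
def r_diou(dx, dy, w_g, h_g):
    """R_DIoU for a center offset (dx, dy) and an enclosing box of size (w_g, h_g)."""
    return (dx ** 2 + dy ** 2) / (w_g ** 2 + h_g ** 2)


def r_eiou(dx, dy, w_g, h_g):
    """R_EIoU as written above: R_DIoU plus per-axis normalized squared offsets."""
    return r_diou(dx, dy, w_g, h_g) + dx ** 2 / w_g ** 2 + dy ** 2 / h_g ** 2


# Hypothetical geometry: center offset (2, 0) inside a 4 x 3 enclosing box
penalty_diou = r_diou(2.0, 0.0, 4.0, 3.0)   # 4 / 25
penalty_eiou = r_eiou(2.0, 0.0, 4.0, 3.0)   # 4 / 25 + 4 / 16
```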

Complete-IoU

Building on $\mathcal{R}_{DIoU}$, CIoU adds consideration of aspect ratio consistency

$\mathcal{R}_{CIoU} = \mathcal{R}_{DIoU} + \alpha v, \alpha = \frac{v}{\mathcal{L}_{IoU} + v}$

where $v$ describes the aspect ratio consistency

$v=\frac{4}{\pi^2} (\tan^{-1}\frac{w}{h} - \tan^{-1}\frac{w_{gt}}{h_{gt}}) ^2$

$\frac{\partial v}{\partial w} = \frac{8}{\pi^2} (\tan^{-1}\frac{w}{h} - \tan^{-1}\frac{w_{gt}}{h_{gt}}) \frac{h}{h^2+w^2}$

$\frac{\partial v}{\partial h} =- \frac{8}{\pi^2} (\tan^{-1}\frac{w}{h} - \tan^{-1}\frac{w_{gt}}{h_{gt}}) \frac{w}{h^2+w^2}$

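The gradients backpropagated from $v$ satisfy $\frac{\partial v}{\partial h} = -\frac{w}{h} \frac{\partial v}{\partial w}$, which means $v$ cannot provide gradients of the same sign for the width and height of the predicted box. From the earlier analysis, DIoU produces a negative gradient $\frac {\partial \mathcal{R}_{DIoU}} {\partial W_g}$. When this negative gradient exactly cancels $\frac {\partial \mathcal{L}_{IoU}} {\partial W_g}$, the predicted box cannot be optimized. CIoU's consideration of aspect ratio consistency breaks this deadlock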
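The relation between the two partial derivatives of $v$ can be verified numerically. The sketch below uses only the standard library. The function names and the example box sizes are assumptions, and the analytic gradient is obtained by applying the chain rule to the definition of $v$.

```python
import math


def v_term(w, h, w_gt, h_gt):
    """Aspect ratio consistency v from CIoU."""
    return 4 / math.pi ** 2 * (math.atan(w / h) - math.atan(w_gt / h_gt)) ** 2


def grad_v(w, h, w_gt, h_gt):
    """Analytic (dv/dw, dv/dh), from the chain rule applied to v_term."""
    diff = math.atan(w / h) - math.atan(w_gt / h_gt)
    common = 8 / math.pi ** 2 * diff / (h ** 2 + w ** 2)
    return common * h, -common * w


# Hypothetical sizes: a wide predicted box against a square target
w, h, w_gt, h_gt = 3.0, 1.0, 1.0, 1.0
dv_dw, dv_dh = grad_v(w, h, w_gt, h_gt)

# Central finite differences as an independent check
eps = 1e-6
num_dw = (v_term(w + eps, h, w_gt, h_gt) - v_term(w - eps, h, w_gt, h_gt)) / (2 * eps)
num_dh = (v_term(w, h + eps, w_gt, h_gt) - v_term(w, h - eps, w_gt, h_gt)) / (2 * eps)
```

Here `dv_dw` is positive and `dv_dh` is negative: the box is too wide, so $v$ pushes the width down and the height up, never both in the same direction.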

Scylla-IoU

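Zhora Gevorgyan showed that center-aligned bounding boxes converge faster, and built SIoU from an angle cost, a distance cost, and a shape cost. The angle cost describes the minimum angle between the line connecting the box centers and the x and y axes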

$\mathcal{\varLambda} = \sin(2\sin^{-1}\frac{\min(|x-x_{gt}|,|y-y_{gt}|)}{\sqrt{(x-x_{gt})^2 + (y-y_{gt})^2} + \epsilon})$

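The distance cost describes the normalized distance between the center points of the two boxes along the x-axis and y-axis, and its penalty strength is positively correlated with the angle cost. The distance cost is defined as

$\Delta = \frac{1}{2}\sum_{t=x,y}(1 - e^{-\gamma \rho_{t}}),\gamma = 2 - \varLambda$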

$\left\{\begin{matrix} \rho_x = (\frac{x - x_{gt}}{W_g})^2\\ \rho_y = (\frac{y - y_{gt}}{H_g})^2 \end{matrix}\right.$

The shape cost describes the difference in shape between the two boxes and is nonzero when their sizes differ. The shape cost is defined as

$\Omega = \frac{1}{2}\sum_{t=w,h}(1 - e^{-\omega_t})^{\theta},\theta = 4$

$\left\{\begin{matrix} \omega_w = \frac{|w-w_{gt}|}{\max(w,w_{gt})}\\ \omega_h = \frac{|h-h_{gt}|}{\max(h,h_{gt})} \end{matrix}\right.$

$\mathcal{R}_{SIoU}$ is similar to $\mathcal{R}_{CIoU}$: both consist of a distance cost and a shape cost

$\mathcal{R}_{SIoU} = \Delta + \Omega$
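Putting the three costs together, the penalty can be sketched in plain Python as follows. This is an illustrative scalar version, not the author's tensor code: the function name `r_siou`, the `(x, y, w, h)` box layout, and the small `eps` in the angle cost are assumptions.

```python
import math


def r_siou(box, box_gt, theta=4, eps=1e-4):
    """R_SIoU = distance cost + shape cost for boxes given as (x, y, w, h)."""
    x, y, w, h = box
    x_gt, y_gt, w_gt, h_gt = box_gt
    # Size of the smallest enclosing box
    w_g = max(x + w / 2, x_gt + w_gt / 2) - min(x - w / 2, x_gt - w_gt / 2)
    h_g = max(y + h / 2, y_gt + h_gt / 2) - min(y - h / 2, y_gt - h_gt / 2)
    dx, dy = x - x_gt, y - y_gt
    # Angle cost: 0 when the centers are axis-aligned, close to 1 at 45 degrees
    ratio = min(abs(dx), abs(dy)) / (math.hypot(dx, dy) + eps)
    angle = math.sin(2 * math.asin(ratio))
    gamma = 2 - angle
    # Distance cost over the x and y axes
    rho_x, rho_y = (dx / w_g) ** 2, (dy / h_g) ** 2
    distance = 0.5 * ((1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y)))
    # Shape cost over width and height
    omega_w = abs(w - w_gt) / max(w, w_gt)
    omega_h = abs(h - h_gt) / max(h, h_gt)
    shape = 0.5 * ((1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta)
    return distance + shape


target = (0.0, 0.0, 2.0, 2.0)
same = r_siou(target, target)                    # identical boxes: no penalty
shifted = r_siou((1.0, 1.0, 2.0, 2.0), target)   # same shape, moved diagonally
stretched = r_siou((0.0, 0.0, 4.0, 2.0), target) # same center, twice as wide
```

A pure shift only triggers the distance cost and a pure resize only triggers the shape cost, which makes the two contributions easy to inspect separately.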

Our Method

Wise-IoU v1

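Because training data inevitably contains low-quality examples, geometric metrics such as distance and aspect ratio aggravate the penalty on low-quality examples and degrade the generalization performance of the model. A good loss function should weaken the penalty of geometric metrics when the anchor box coincides well with the target box, since intervening less in training gives the model better generalization. On this basis, we construct a distance attention from the distance metric and obtain WIoU v1, which has a two-layer attention mechanism

$\mathcal{R}_{WIoU} \in [1,e)$, which significantly amplifies the $\mathcal{L}_{IoU}$ of ordinary-quality anchor boxes
$\mathcal{L}_{IoU} \in [0,1]$, which significantly reduces the $\mathcal{R}_{WIoU}$ of high-quality anchor boxes and, when the anchor box coincides well with the target box, significantly reduces its focus on the center distance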

$\mathcal{L}_{WIoUv1} = \mathcal{R}_{WIoU} \mathcal{L}_{IoU}$

$\mathcal{R}_{WIoU} = \exp(\frac{(x-x_{gt})^2 + (y-y_{gt})^2}{(W^2_g + H^2_g)^*})$

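To prevent $\mathcal{R}_{WIoU}$ from producing gradients that hinder convergence, $W_g, H_g$ are detached from the computational graph (the superscript * denotes this operation). Because this effectively eliminates the factor that hinders convergence, we do not introduce new metrics such as aspect ratio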
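The forward value of WIoU v1 can be sketched with scalars. The function name `wiou_v1` and the example boxes are assumptions. Detaching only affects backpropagation, so in this pure-Python version $W_g, H_g$ are simply ordinary numbers.

```python
import math


def wiou_v1(box, box_gt):
    """Return (R_WIoU, L_IoU, L_WIoUv1) for boxes given as (x0, y0, x1, y1)."""
    # IoU loss
    w_i = max(0.0, min(box[2], box_gt[2]) - max(box[0], box_gt[0]))
    h_i = max(0.0, min(box[3], box_gt[3]) - max(box[1], box_gt[1]))
    s_u = ((box[2] - box[0]) * (box[3] - box[1])
           + (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1]) - w_i * h_i)
    l_iou = 1.0 - w_i * h_i / s_u
    # Distance attention: W_g and H_g act as constants, as if detached from the graph
    w_g = max(box[2], box_gt[2]) - min(box[0], box_gt[0])
    h_g = max(box[3], box_gt[3]) - min(box[1], box_gt[1])
    dx = (box[0] + box[2] - box_gt[0] - box_gt[2]) / 2
    dy = (box[1] + box[3] - box_gt[1] - box_gt[3]) / 2
    r_wiou = math.exp((dx ** 2 + dy ** 2) / (w_g ** 2 + h_g ** 2))
    return r_wiou, l_iou, r_wiou * l_iou


target = (0.0, 0.0, 2.0, 2.0)
r_same, l_same, loss_same = wiou_v1(target, target)                # perfect match
r_near, l_near, loss_near = wiou_v1((0.1, 0.0, 2.1, 2.0), target)  # high-quality anchor
r_far, l_far, loss_far = wiou_v1((5.0, 0.0, 7.0, 2.0), target)     # no overlap
```

The well-aligned anchor gets an attention factor barely above 1, while the distant one has its $\mathcal{L}_{IoU}$ amplified, with the factor always staying below $e$.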

Wise-IoU v2

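Focal Loss designs a monotonic focusing mechanism for cross entropy that effectively reduces the contribution of easy examples to the loss value. This lets the model focus on hard examples and improves classification performance. Similarly, we construct a monotonic focusing coefficient $\mathcal{L}_{IoU}^{\gamma*}$ for $\mathcal{L}_{WIoUv1}$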

$\mathcal{L}_{WIoUv2} = \mathcal{L}_{IoU}^{\gamma *} \mathcal{L}_{WIoUv1}, \gamma > 0$

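During training, the gradient gain $\mathcal{L}_{IoU}^{\gamma *}$ decreases as $\mathcal{L}_{IoU}$ decreases, which slows convergence in the late stage of training. We therefore introduce the mean of $\mathcal{L}_{IoU}$ as a normalizing factor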

$\mathcal{L}_{WIoUv2} = (\frac{\mathcal{L}_{IoU}^*}{\ \overline{\mathcal{L}_{IoU}}\ })^{\gamma} \mathcal{L}_{WIoUv1}$

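Here $\overline{\mathcal{L}_{IoU}}$ is a running mean with momentum $m$. Dynamically updating the normalizing factor keeps the gradient gain $r= (\frac{\mathcal{L}_{IoU}^*}{\ \overline{\mathcal{L}_{IoU}}\ })^{\gamma}$ at a relatively high level overall, which solves the slow convergence in the late stage of training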
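A minimal sketch of the running mean and the resulting gain is shown below. The class and function names, the momentum of 0.01, and the batch values are illustrative assumptions. $\gamma = 0.5$ is chosen to mirror the `.sqrt()` used in the class code later in this post.

```python
class RunningIoUMean:
    """Exponential moving average of L_IoU, initialized to 1."""

    def __init__(self, momentum):
        self.momentum = momentum
        self.value = 1.0

    def update(self, batch_l_iou):
        """Blend the mean L_IoU of the current batch into the running value."""
        batch_mean = sum(batch_l_iou) / len(batch_l_iou)
        self.value = (1 - self.momentum) * self.value + self.momentum * batch_mean
        return self.value


def gain_v2(l_iou, mean_l_iou, gamma=0.5):
    """Monotonic gradient gain r = (L_IoU / mean L_IoU) ** gamma."""
    return (l_iou / mean_l_iou) ** gamma


mean = RunningIoUMean(momentum=0.01)  # momentum chosen only for illustration
batch = [0.2, 0.4, 0.6]               # hypothetical L_IoU values of three anchors
mean.update(batch)
gains = [gain_v2(l, mean.value) for l in batch]
```

The gain is 1 for an anchor whose loss equals the running mean and grows monotonically with $\mathcal{L}_{IoU}$, so dividing by the mean keeps the gains from shrinking as the whole batch improves.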

Wise-IoU v3

We define the outlier degree to describe the quality of an anchor box:

$\beta = \frac{\mathcal{L}_{IoU}^*}{\ \overline{\mathcal{L}_{IoU}}\ } \in [0, +\infty)$

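A small outlier degree means the anchor box is of high quality. We assign it a small gradient gain so that bounding box regression focuses on ordinary-quality anchor boxes. Assigning a small gradient gain to anchor boxes with a large outlier degree effectively prevents low-quality examples from producing large harmful gradients. Using $\beta$, we construct a non-monotonic focusing coefficient and apply it to WIoU v1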

$\mathcal{L}_{WIoUv3}= r\mathcal{L}_{WIoUv1},\ r=\frac{\beta}{\delta \alpha^{\beta - \delta}}$

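Here $\delta$ makes $r=1$ when $\beta = \delta$. An anchor box receives the highest gradient gain when its outlier degree satisfies $\beta = C$ ($C$ is a constant). Because $\overline{\mathcal{L}_{IoU}}$ is dynamic, the quality criterion for anchor boxes is dynamic as well, which lets WIoU v3 adopt the gradient gain allocation strategy best suited to the current situation at every moment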
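The shape of $r$ is easy to explore numerically. In the sketch below, $\alpha = 1.9$ and $\delta = 3$ are taken from the default arguments of `_scaled_loss` in the class code later in this post. The closed form for the peak, $\beta = 1/\ln\alpha$, is derived here by setting $\partial r / \partial \beta = 0$ and is not quoted from the paper. The function name and the search grid are assumptions.

```python
import math


def gain_v3(beta, alpha=1.9, delta=3.0):
    """Non-monotonic gradient gain r = beta / (delta * alpha ** (beta - delta))."""
    return beta / (delta * alpha ** (beta - delta))


alpha, delta = 1.9, 3.0

# dr/d(beta) is proportional to (1 - beta * ln(alpha)), so the peak sits at 1 / ln(alpha)
peak = 1 / math.log(alpha)

# Brute-force check of the peak on a grid of outlier degrees in (0, 10]
grid = [i / 1000 for i in range(1, 10001)]
best = max(grid, key=lambda b: gain_v3(b, alpha, delta))

gain_at_delta = gain_v3(delta, alpha, delta)  # equals 1 by construction
```

With these values the gain rises until the outlier degree is about 1.56 and then decays, so both very good and very poor anchors are down-weighted relative to ordinary ones.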

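To keep low-quality anchor boxes from falling behind early in training, we initialize $\overline{\mathcal{L}_{IoU}}=1$ so that anchor boxes with $\mathcal{L}_{IoU}=1$ have the highest gradient gain. To maintain this strategy in the early stage of training, a small momentum $m$ is needed to delay the time at which $\overline{\mathcal{L}_{IoU}}$ approaches its real value $\overline{\mathcal{L}_{IoU-real}}$. For training with $n$ batches per epoch (so that $tn$ is the number of updates over $t$ epochs), we recommend setting the momentum to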

$m = 1- \sqrt[tn]{0.5},\ tn > 7000$

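With this setting, after $t$ epochs of training we have $\overline{\mathcal{L}_{IoU}} = 0.5(1+\overline{\mathcal{L}_{IoU-real}})$. In the middle and late stages of training, WIoU v3 assigns small gradient gains to low-quality anchor boxes to reduce harmful gradients. At the same time, it focuses on ordinary-quality anchor boxes to improve the localization performance of the model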
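The halfway property follows from $(1-m)^{tn} = 0.5$ and can be simulated directly. The sketch treats $t \cdot n$ as the total number of updates of the running mean. The values of `t`, `n`, and `real_mean` are hypothetical, and the real mean is held constant, which is a simplification.

```python
def wiou_momentum(t, n):
    """m = 1 - 0.5 ** (1 / (t * n)), chosen so that (1 - m) ** (t * n) == 0.5."""
    return 1 - 0.5 ** (1 / (t * n))


t, n = 10, 1000    # hypothetical: t * n = 10000 updates, above the suggested 7000
m = wiou_momentum(t, n)

real_mean = 0.3    # hypothetical constant "real" mean of L_IoU
mean = 1.0         # the running mean starts at 1
for _ in range(t * n):
    mean = (1 - m) * mean + m * real_mean

halfway = 0.5 * (1 + real_mean)  # value the running mean should have reached
```

After $tn$ updates the gap between the running mean and the real mean has been halved, so `mean` lands on `halfway` up to floating-point error.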

Core Code

The class below can compute the existing bounding box losses (IoU, GIoU, DIoU, CIoU, EIoU, SIoU, WIoU). Its core class variables are

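iou_mean: the running mean $\overline{\mathcal{L}_{IoU}}$, initialized to 1 every time the program starts. If training is interrupted and the value is reset, restore it to its value before the interruption. Otherwise performance will improve more slowly
monotonous: indicates whether the bounding box loss uses the monotonic focusing mechanism (e.g., WIoU v2) or the non-monotonic focusing mechanism (e.g., WIoU v3). See the class docstring for details
_momentum: follows the setting $m = 1- \sqrt[tn]{0.5},\ tn > 7000$. When $m$ is small enough, the IoU on the validation set barely affects the value of $\overline{\mathcal{L}_{IoU}}$, and there is no need to call the eval and train functions to specify the mode. Otherwise, call eval and train to specify the mode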

In addition, the focusing mechanism scales the value of the bounding box loss, which is implemented by the instance method _scaled_loss
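The advice on restoring iou_mean after an interruption could be followed along these lines. This is only a sketch under assumptions: `IoUCalStandIn` is a hypothetical stand-in for the real class, and `save_loss_state`, `load_loss_state`, and the JSON file are illustrative choices. In a real training script the value would more likely be stored alongside the model checkpoint.

```python
import json
import os
import tempfile


class IoUCalStandIn:
    """Hypothetical stand-in exposing only the class variable discussed above."""
    iou_mean = 1.0


def save_loss_state(path, loss_cls):
    """Write the running mean to disk so that it survives a restart."""
    with open(path, "w") as f:
        json.dump({"iou_mean": loss_cls.iou_mean}, f)


def load_loss_state(path, loss_cls):
    """Restore the running mean saved by save_loss_state."""
    with open(path) as f:
        loss_cls.iou_mean = json.load(f)["iou_mean"]


# Simulate: training has moved the running mean, then the process restarts
path = os.path.join(tempfile.mkdtemp(), "loss_state.json")
IoUCalStandIn.iou_mean = 0.42
save_loss_state(path, IoUCalStandIn)

IoUCalStandIn.iou_mean = 1.0          # value after a fresh start
load_loss_state(path, IoUCalStandIn)  # back to the value before the interruption
```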

import math

import torch


class IoU_Cal:
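    ''' pred, target: boxes as x0,y0,x1,y1
        monotonous: {
            None: no focusing mechanism (original loss)
            True: monotonic focusing mechanism (e.g. WIoU v2)
            False: non-monotonic focusing mechanism (e.g. WIoU v3)}'''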
    iou_mean = 1.
    monotonous = False
    # m = 1 - 0.5 ** (1 / (t * n)) with t * n = 7000
    _momentum = 1 - 0.5 ** (1 / 7000)
    _is_train = True

    def __init__(self, pred, target):
        self.pred, self.target = pred, target
        self._fget = {
            # x,y,w,h
            'pred_xy': lambda: (self.pred[..., :2] + self.pred[..., 2: 4]) / 2,
            'pred_wh': lambda: self.pred[..., 2: 4] - self.pred[..., :2],
            'target_xy': lambda: (self.target[..., :2] + self.target[..., 2: 4]) / 2,
            'target_wh': lambda: self.target[..., 2: 4] - self.target[..., :2],
            # x0,y0,x1,y1
            'min_coord': lambda: torch.minimum(self.pred[..., :4], self.target[..., :4]),
            'max_coord': lambda: torch.maximum(self.pred[..., :4], self.target[..., :4]),
            # The overlapping region
            'wh_inter': lambda: self.min_coord[..., 2: 4] - self.max_coord[..., :2],
            's_inter': lambda: torch.prod(torch.relu(self.wh_inter), dim=-1),
            # The area covered
            's_union': lambda: torch.prod(self.pred_wh, dim=-1) +
                               torch.prod(self.target_wh, dim=-1) - self.s_inter,
            # The smallest enclosing box
            'wh_box': lambda: self.max_coord[..., 2: 4] - self.min_coord[..., :2],
            's_box': lambda: torch.prod(self.wh_box, dim=-1),
            'l2_box': lambda: torch.square(self.wh_box).sum(dim=-1),
            # The central points' connection of the bounding boxes
            'd_center': lambda: self.pred_xy - self.target_xy,
            'l2_center': lambda: torch.square(self.d_center).sum(dim=-1),
            # IoU loss, i.e. 1 - IoU (not the IoU itself)
            'iou': lambda: 1 - self.s_inter / self.s_union
        }
        self._update(self)

    def __setitem__(self, key, value):
        self._fget[key] = value

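    def __getattr__(self, item):
        # Only called when normal lookup fails: evaluate the lambda once and cache the result
        fget = self.__dict__.get('_fget', {})
        if item not in fget:
            raise AttributeError(item)
        if callable(fget[item]):
            fget[item] = fget[item]()
        return fget[item]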

    @classmethod
    def train(cls):
        cls._is_train = True

    @classmethod
    def eval(cls):
        cls._is_train = False

    @classmethod
    def _update(cls, self):
        # Update the running mean of the IoU loss, in training mode only
        if cls._is_train:
            cls.iou_mean = (1 - cls._momentum) * cls.iou_mean + \
                           cls._momentum * self.iou.detach().mean().item()

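    def _scaled_loss(self, loss, gamma=1.9, delta=3):
        # gamma and delta play the roles of alpha and delta in the WIoU v3 gain r
        if isinstance(self.monotonous, bool):
            if self.monotonous:
                # Monotonic focusing (WIoU v2): r = (L_IoU / mean) ** 0.5
                loss *= (self.iou.detach() / self.iou_mean).sqrt()
            else:
                # Non-monotonic focusing (WIoU v3): r = beta / (delta * alpha ** (beta - delta))
                beta = self.iou.detach() / self.iou_mean
                alpha = delta * torch.pow(gamma, beta - delta)
                loss *= beta / alpha
        return loss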

    @classmethod
    def IoU(cls, pred, target, self=None):
        self = self if self else cls(pred, target)
        return self.iou

    @classmethod
    def WIoU(cls, pred, target, self=None):
        self = self if self else cls(pred, target)
        dist = torch.exp(self.l2_center / self.l2_box.detach())
        return self._scaled_loss(dist * self.iou)

    @classmethod
    def EIoU(cls, pred, target, self=None):
        self = self if self else cls(pred, target)
        penalty = self.l2_center / self.l2_box.detach() \
                  + torch.square(self.d_center / self.wh_box.detach()).sum(dim=-1)
        return self._scaled_loss(self.iou + penalty)

    @classmethod
    def GIoU(cls, pred, target, self=None):
        self = self if self else cls(pred, target)
        return self._scaled_loss(self.iou + (self.s_box - self.s_union) / self.s_box)

    @classmethod
    def DIoU(cls, pred, target, self=None):
        self = self if self else cls(pred, target)
        return self._scaled_loss(self.iou + self.l2_center / self.l2_box)

    @classmethod
    def CIoU(cls, pred, target, eps=1e-4, self=None):
        self = self if self else cls(pred, target)
        v = 4 / math.pi ** 2 * \
            (torch.atan(self.pred_wh[..., 0] / (self.pred_wh[..., 1] + eps)) -
             torch.atan(self.target_wh[..., 0] / (self.target_wh[..., 1] + eps))) ** 2
        alpha = v / (self.iou + v)
        return self._scaled_loss(self.iou + self.l2_center / self.l2_box + alpha.detach() * v)

    @classmethod
    def SIoU(cls, pred, target, theta=4, self=None):
        self = self if self else cls(pred, target)
        # Angle Cost
        angle = torch.arcsin(torch.abs(self.d_center).min(dim=-1)[0] / (self.l2_center.sqrt() + 1e-4))
        angle = torch.sin(2 * angle) - 2
        # Dist Cost
        dist = angle[..., None] * torch.square(self.d_center / self.wh_box)
        dist = 2 - torch.exp(dist[..., 0]) - torch.exp(dist[..., 1])
        # Shape Cost
        d_shape = torch.abs(self.pred_wh - self.target_wh)
        big_shape = torch.maximum(self.pred_wh, self.target_wh)
        w_shape = 1 - torch.exp(- d_shape[..., 0] / big_shape[..., 0])
        h_shape = 1 - torch.exp(- d_shape[..., 1] / big_shape[..., 1])
        shape = w_shape ** theta + h_shape ** theta
        return self._scaled_loss(self.iou + (dist + shape) / 2)

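To bring WIoU v3 into YOLOv7, first locate the loss function in train_aux.py. ComputeLoss is used during eval and can be ignored. ComputeLossAuxOTA is used during training, so find its source code and modify it

In the initialization function, make a small change to specify which loss function to use

Then modify the __call__ function (the modified lines are marked with bookmarks)

Then locate the bbox_iou function and change how the bounding box loss is computed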

def bbox_iou(box1, box2, type_, x1y1x2y2=True):
    # Returns (loss, IoU) of box1 to box2, with the loss chosen by type_ (e.g. 'WIoU'). box1 is 4, box2 is nx4
    box2 = box2.T

    # Get the coordinates of bounding boxes
    if x1y1x2y2:  # x1, y1, x2, y2 = box1
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]
    else:  # transform from xywh to xyxy
        b1_x1, b1_x2 = box1[0] - box1[2] / 2, box1[0] + box1[2] / 2
        b1_y1, b1_y2 = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2
        b2_x1, b2_x2 = box2[0] - box2[2] / 2, box2[0] + box2[2] / 2
        b2_y1, b2_y2 = box2[1] - box2[3] / 2, box2[1] + box2[3] / 2

    # Stack the box coordinates into (..., 4) tensors in x1, y1, x2, y2 order
    b1 = torch.stack([b1_x1, b1_y1, b1_x2, b1_y2], dim=-1)
    b2 = torch.stack([b2_x1, b2_y1, b2_x2, b2_y2], dim=-1)

    self = IoU_Cal(b1, b2)
    loss = getattr(IoU_Cal, type_)(b1, b2, self=self)
    iou = 1 - self.iou

    return loss, iou
