Static Quantization with Eager Mode in PyTorch (beta)

This tutorial shows how to do post-training static quantization, and illustrates two more advanced techniques, per-channel quantization and quantization-aware training, that can further improve the model's accuracy. Note that quantization is currently only supported on CPUs, so we will not be using GPUs / CUDA in this tutorial.

By the end of this tutorial, you will see how quantization in PyTorch can result in a significant decrease in model size while increasing speed. Furthermore, you'll see how to easily apply some of the advanced quantization techniques shown here so that your quantized models take much less of an accuracy hit than they otherwise would.

Warning: we use a lot of boilerplate code from other PyTorch repos to, for example, define the MobileNetV2 model architecture, define data loaders, and so on. We of course encourage you to read it; but if you want to get straight to the quantization features, feel free to skip ahead to the section "4. Post-training static quantization".

We'll start by doing the necessary imports:

import numpy as np
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader
from torchvision import datasets
import torchvision.transforms as transforms
import os
import time
import sys
import torch.quantization

# Setup warnings
import warnings
warnings.filterwarnings(
    action='ignore',
    category=DeprecationWarning,
    module=r'.*'
)
warnings.filterwarnings(
    action='default',
    module=r'torch.quantization'
)

# Specify random seed for repeatable results
torch.manual_seed(191009)

1. Model architecture

We first define the MobileNetV2 model architecture, with several notable modifications to enable quantization:

  • Replacing addition with nn.quantized.FloatFunctional (a short sketch of why follows below)
  • Inserting QuantStub and DeQuantStub at the beginning and end of the network
  • Replacing ReLU6 with ReLU

Note: this code is taken from here.
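
As a minimal sketch of the first modification (a hypothetical toy module, not part of the tutorial model): quantized tensors do not support ordinary Python addition in eager mode, and routing the add through nn.quantized.FloatFunctional gives the operation its own activation observer during calibration.

import torch
import torch.nn as nn

class AddSketch(nn.Module):
    def __init__(self):
        super(AddSketch, self).__init__()
        # FloatFunctional carries an observer for the output of the addition
        self.skip_add = nn.quantized.FloatFunctional()

    def forward(self, x, y):
        # Behaves like x + y in float mode; becomes a true quantized add after convert
        return self.skip_add.add(x, y)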

from torch.quantization import QuantStub, DeQuantStub

def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

class ConvBNReLU(nn.Sequential):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_planes, momentum=0.1),
            # Replace with ReLU
            nn.ReLU(inplace=False)
        )

class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(round(inp * expand_ratio))
        self.use_res_connect = self.stride == 1 and inp == oup

        layers = []
        if expand_ratio != 1:
            # pw
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1))
        layers.extend([
            # dw
            ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim),
            # pw-linear
            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
            nn.BatchNorm2d(oup, momentum=0.1),
        ])
        self.conv = nn.Sequential(*layers)
        # Replace torch.add with floatfunctional
        self.skip_add = nn.quantized.FloatFunctional()

    def forward(self, x):
        if self.use_res_connect:
            return self.skip_add.add(x, self.conv(x))
        else:
            return self.conv(x)

class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, width_mult=1.0, inverted_residual_setting=None, round_nearest=8):
        """
        MobileNet V2 main class

        Args:
            num_classes (int): Number of classes
            width_mult (float): Width multiplier - adjusts number of channels in each layer by this amount
            inverted_residual_setting: Network structure
            round_nearest (int): Round the number of channels in each layer to be a multiple of this number
            Set to 1 to turn off rounding
        """
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = 32
        last_channel = 1280

        if inverted_residual_setting is None:
            inverted_residual_setting = [
                # t, c, n, s
                [1, 16, 1, 1],
                [6, 24, 2, 2],
                [6, 32, 3, 2],
                [6, 64, 4, 2],
                [6, 96, 3, 1],
                [6, 160, 3, 2],
                [6, 320, 1, 1],
            ]

        # only check the first element, assuming user knows t,c,n,s are required
        if len(inverted_residual_setting) == 0 or len(inverted_residual_setting[0]) != 4:
            raise ValueError("inverted_residual_setting should be non-empty "
                             "or a 4-element list, got {}".format(inverted_residual_setting))

        # building first layer
        input_channel = _make_divisible(input_channel * width_mult, round_nearest)
        self.last_channel = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)
        features = [ConvBNReLU(3, input_channel, stride=2)]
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * width_mult, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        features.append(ConvBNReLU(input_channel, self.last_channel, kernel_size=1))
        # make it nn.Sequential
        self.features = nn.Sequential(*features)
        self.quant = QuantStub()
        self.dequant = DeQuantStub()
        # building classifier
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(self.last_channel, num_classes),
        )

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):

        x = self.quant(x)

        x = self.features(x)
        x = x.mean([2, 3])
        x = self.classifier(x)
        x = self.dequant(x)
        return x

    # Fuse Conv+BN and Conv+BN+Relu modules prior to quantization
    # This operation does not change the numerics
    def fuse_model(self):
        for m in self.modules():
            if type(m) == ConvBNReLU:
                torch.quantization.fuse_modules(m, ['0', '1', '2'], inplace=True)
            if type(m) == InvertedResidual:
                for idx in range(len(m.conv)):
                    if type(m.conv[idx]) == nn.Conv2d:
                        torch.quantization.fuse_modules(m.conv, [str(idx), str(idx + 1)], inplace=True)

2. Helper functions

We next define several helper functions to help with model evaluation. These mostly come from here.

class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self, name, fmt=':f'):
        self.name = name
        self.fmt = fmt
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
        return fmtstr.format(**self.__dict__)

def accuracy(output, target, topk=(1,)):
    """Computes the accuracy over the k top predictions for the specified values of k"""
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)

        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))

        res = []
        for k in topk:
            correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res

def evaluate(model, criterion, data_loader, neval_batches):
    model.eval()
    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')
    cnt = 0
    with torch.no_grad():
        for image, target in data_loader:
            output = model(image)
            loss = criterion(output, target)
            cnt += 1
            acc1, acc5 = accuracy(output, target, topk=(1, 5))
            print('.', end = '')
            top1.update(acc1[0], image.size(0))
            top5.update(acc5[0], image.size(0))
            if cnt >= neval_batches:
                return top1, top5

    return top1, top5

def load_model(model_file):
    model = MobileNetV2()
    state_dict = torch.load(model_file)
    model.load_state_dict(state_dict)
    model.to('cpu')
    return model

def print_size_of_model(model):
    torch.save(model.state_dict(), "temp.p")
    print('Size (MB):', os.path.getsize("temp.p")/1e6)
    os.remove('temp.p')

3. Define dataset and data loaders

As our last major setup step, we define the data loaders for our training and testing sets.

ImageNet data

The specific dataset we've created for this tutorial contains just 1000 images from the ImageNet data, one from each class (at just over 250 MB, it is small enough to download relatively easily). The URL for this custom dataset is:

https://s3.amazonaws.com/pytorch-tutorial-assets/imagenet_1k.zip

To download this data locally using Python, you could use:

import os
import requests

url = 'https://s3.amazonaws.com/pytorch-tutorial-assets/imagenet_1k.zip'
# expanduser() so that '~' resolves to the home directory
filename = os.path.expanduser('~/Downloads/imagenet_1k_data.zip')

r = requests.get(url)

with open(filename, 'wb') as f:
    f.write(r.content)

For this tutorial, we downloaded this data and moved it to the right place using these lines from the Makefile.

To run the code in this tutorial using the entire ImageNet dataset, on the other hand, you could download the data with torchvision, following here. For example, to download the training set and apply some standard transforms to it, you could use:

import torchvision
import torchvision.transforms as transforms

imagenet_dataset = torchvision.datasets.ImageNet(
    '~/.data/imagenet',
    split='train',
    download=True,
    transform=transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ]))

With the data downloaded, we show functions below that define the data loaders we'll use to read in this data. These functions mostly come from here.

def prepare_data_loaders(data_path):

    traindir = os.path.join(data_path, 'train')
    valdir = os.path.join(data_path, 'val')
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

    dataset = torchvision.datasets.ImageFolder(
        traindir,
        transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize,
        ]))

    dataset_test = torchvision.datasets.ImageFolder(
        valdir,
        transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            normalize,
        ]))

    train_sampler = torch.utils.data.RandomSampler(dataset)
    test_sampler = torch.utils.data.SequentialSampler(dataset_test)

    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=train_batch_size,
        sampler=train_sampler)

    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=eval_batch_size,
        sampler=test_sampler)

    return data_loader, data_loader_test

Next, we'll load the pretrained MobileNetV2 model. We provide the URL to download the model from torchvision here.

data_path = 'data/imagenet_1k'
saved_model_dir = 'data/'
float_model_file = 'mobilenet_pretrained_float.pth'
scripted_float_model_file = 'mobilenet_quantization_scripted.pth'
scripted_quantized_model_file = 'mobilenet_quantization_scripted_quantized.pth'

train_batch_size = 30
eval_batch_size = 30

data_loader, data_loader_test = prepare_data_loaders(data_path)
criterion = nn.CrossEntropyLoss()
float_model = load_model(saved_model_dir + float_model_file).to('cpu')

Next, we'll "fuse modules"; this can both make the model faster, by saving on memory accesses, while also improving numerical accuracy. While this can be used with any model, it is especially common with quantized models.

print('\n Inverted Residual Block: Before fusion \n\n', float_model.features[1].conv)
float_model.eval()

# Fuses modules
float_model.fuse_model()

# Note fusion of Conv+BN+Relu and Conv+Relu
print('\n Inverted Residual Block: After fusion\n\n',float_model.features[1].conv)

Inverted Residual Block: Before fusion

 Sequential(
  (0): ConvBNReLU(
    (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)

 Inverted Residual Block: After fusion

 Sequential(
  (0): ConvBNReLU(
    (0): ConvReLU2d(
      (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)
      (1): ReLU()
    )
    (1): Identity()
    (2): Identity()
  )
  (1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1))
  (2): Identity()
)

Finally, to get a "baseline" accuracy, let's see the accuracy of our un-quantized model with fused modules:

num_eval_batches = 10

print("Size of baseline model")
print_size_of_model(float_model)

top1, top5 = evaluate(float_model, criterion, data_loader_test, neval_batches=num_eval_batches)
print('Evaluation accuracy on %d images, %2.2f'%(num_eval_batches * eval_batch_size, top1.avg))
torch.jit.save(torch.jit.script(float_model), saved_model_dir + scripted_float_model_file)

Size of baseline model
Size (MB): 13.999657
..........Evaluation accuracy on 300 images, 77.67

We see 78% accuracy on 300 images, a solid baseline for ImageNet, especially considering our model is just 14.0 MB.

This will be our baseline to compare against. Next, let's try different quantization methods.

4. Post-training static quantization

Post-training static quantization involves not just converting the weights from float to int, as in dynamic quantization, but also performing the additional step of first feeding batches of data through the network and computing the resulting distributions of the different activations (specifically, this is done by inserting observer modules at different points that record this data). These distributions are then used to determine how the different activations should be quantized at inference time (a simple technique would be to simply divide the entire range of activations into 256 levels, but we support more sophisticated methods as well). Importantly, this additional step allows us to pass quantized values between operations instead of converting these values to floats, and then back to ints, between every operation, resulting in a significant speed-up.
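
To make the "256 levels" idea concrete, here is a minimal numeric sketch (illustrative values only, not the observer implementation PyTorch actually uses): an observed activation range is mapped onto a uint8 grid by an affine scale and zero point.

min_val, max_val = -2.0, 6.0          # hypothetical observed activation range
scale = (max_val - min_val) / 255.0   # step size of the 256-level grid
zero_point = round(-min_val / scale)  # integer level representing the float value 0.0
print('scale = %.6f, zero_point = %d' % (scale, zero_point))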

num_calibration_batches = 10

myModel = load_model(saved_model_dir + float_model_file).to('cpu')
myModel.eval()

# Fuse Conv, bn and relu
myModel.fuse_model()

# Specify quantization configuration
# Start with simple min/max range estimation and per-tensor quantization of weights
myModel.qconfig = torch.quantization.default_qconfig
print(myModel.qconfig)
torch.quantization.prepare(myModel, inplace=True)

# Calibrate first
print('Post Training Quantization Prepare: Inserting Observers')
print('\n Inverted Residual Block:After observer insertion \n\n', myModel.features[1].conv)

# Calibrate with the training set
evaluate(myModel, criterion, data_loader, neval_batches=num_calibration_batches)
print('Post Training Quantization: Calibration done')

# Convert to quantized model
torch.quantization.convert(myModel, inplace=True)
print('Post Training Quantization: Convert done')
print('\n Inverted Residual Block: After fusion and quantization, note fused modules: \n\n',myModel.features[1].conv)

print("Size of model after quantization")
print_size_of_model(myModel)

top1, top5 = evaluate(myModel, criterion, data_loader_test, neval_batches=num_eval_batches)
print('Evaluation accuracy on %d images, %2.2f'%(num_eval_batches * eval_batch_size, top1.avg))

QConfig(activation=functools.partial(<class 'torch.quantization.observer.MinMaxObserver'>, reduce_range=True), weight=functools.partial(<class 'torch.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric))
Post Training Quantization Prepare: Inserting Observers

 Inverted Residual Block:After observer insertion

 Sequential(
  (0): ConvBNReLU(
    (0): ConvReLU2d(
      (0): Conv2d(
        32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32
        (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)
      )
      (1): ReLU(
        (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)
      )
    )
    (1): Identity()
    (2): Identity()
  )
  (1): Conv2d(
    32, 16, kernel_size=(1, 1), stride=(1, 1)
    (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)
  )
  (2): Identity()
)
..........Post Training Quantization: Calibration done
Post Training Quantization: Convert done

 Inverted Residual Block: After fusion and quantization, note fused modules:

 Sequential(
  (0): ConvBNReLU(
    (0): QuantizedConvReLU2d(32, 32, kernel_size=(3, 3), stride=(1, 1), scale=0.1516050398349762, zero_point=0, padding=(1, 1), groups=32)
    (1): Identity()
    (2): Identity()
  )
  (1): QuantizedConv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), scale=0.17719413340091705, zero_point=63)
  (2): Identity()
)
Size of model after quantization
Size (MB): 3.631847
..........Evaluation accuracy on 300 images, 66.67

For this quantized model, we see an accuracy of about 67% on these same 300 images. Nevertheless, we did reduce the size of our model down to just under 3.6 MB, almost a 4x decrease.

In addition, we can significantly improve on the accuracy simply by using a different quantization configuration. We repeat the same exercise with the recommended configuration for quantizing for x86 architectures. This configuration does the following:

  • Quantizes weights on a per-channel basis
  • Uses a histogram observer that collects a histogram of activations and then picks quantization parameters in an optimal manner

per_channel_quantized_model = load_model(saved_model_dir + float_model_file)
per_channel_quantized_model.eval()
per_channel_quantized_model.fuse_model()
per_channel_quantized_model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
print(per_channel_quantized_model.qconfig)

torch.quantization.prepare(per_channel_quantized_model, inplace=True)
evaluate(per_channel_quantized_model,criterion, data_loader, num_calibration_batches)
torch.quantization.convert(per_channel_quantized_model, inplace=True)
top1, top5 = evaluate(per_channel_quantized_model, criterion, data_loader_test, neval_batches=num_eval_batches)
print('Evaluation accuracy on %d images, %2.2f'%(num_eval_batches * eval_batch_size, top1.avg))
torch.jit.save(torch.jit.script(per_channel_quantized_model), saved_model_dir + scripted_quantized_model_file)

QConfig(activation=functools.partial(<class 'torch.quantization.observer.HistogramObserver'>, reduce_range=True), weight=functools.partial(<class 'torch.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric))
....................Evaluation accuracy on 300 images, 74.67

Changing just this quantization configuration method raised the accuracy to nearly 75%! Still, this is about 3% worse than the ~78% baseline achieved above. So let's try quantization-aware training.

5. Quantization-aware training

Quantization-aware training (QAT) is the quantization method that typically results in the highest accuracy. With QAT, all weights and activations are "fake quantized" during both the forward and backward passes of training: that is, float values are rounded to mimic int8 values, but all computations are still done with floating point numbers. Thus, all the weight adjustments during training are made while "aware" of the fact that the model will ultimately be quantized; after quantizing, therefore, this method usually yields higher accuracy than either dynamic quantization or post-training static quantization.
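
To make "fake quantized" concrete, here is a small numeric sketch (illustrative only, not torch's FakeQuantize module): each value is snapped to an int8 grid and immediately de-quantized, so the arithmetic stays in floating point but sees the quantization error.

import torch

def fake_quantize(x, scale, zero_point, qmin=-128, qmax=127):
    # Snap each value to the integer grid and clamp to the int8 range...
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    # ...then immediately de-quantize so downstream math stays in float
    return (q - zero_point) * scale

x = torch.randn(4)
print(x)
print(fake_quantize(x, scale=0.1, zero_point=0))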

The overall workflow for actually performing QAT is very similar to before:

  • We can use the same model as before: there is no additional preparation needed for quantization-aware training.
  • We need to use a qconfig specifying what kind of fake-quantization is to be inserted after weights and activations, instead of specifying observers.

We first define a training function:

def train_one_epoch(model, criterion, optimizer, data_loader, device, ntrain_batches):
    model.train()
    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')
    avgloss = AverageMeter('Loss', ':1.5f')

    cnt = 0
    for image, target in data_loader:
        start_time = time.time()
        print('.', end = '')
        cnt += 1
        image, target = image.to(device), target.to(device)
        output = model(image)
        loss = criterion(output, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        acc1, acc5 = accuracy(output, target, topk=(1, 5))
        top1.update(acc1[0], image.size(0))
        top5.update(acc5[0], image.size(0))
        avgloss.update(loss, image.size(0))
        if cnt >= ntrain_batches:
            print('Loss', avgloss.avg)

            print('Training: * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'
                  .format(top1=top1, top5=top5))
            return

    print('Full imagenet train set:  * Acc@1 {top1.global_avg:.3f} Acc@5 {top5.global_avg:.3f}'
          .format(top1=top1, top5=top5))
    return

We fuse modules as before:

qat_model = load_model(saved_model_dir + float_model_file)
qat_model.fuse_model()

optimizer = torch.optim.SGD(qat_model.parameters(), lr = 0.0001)
qat_model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')

Finally, prepare_qat performs the "fake quantization", preparing the model for quantization-aware training:

torch.quantization.prepare_qat(qat_model, inplace=True)
print('Inverted Residual Block: After preparation for QAT, note fake-quantization modules \n',qat_model.features[1].conv)

Inverted Residual Block: After preparation for QAT, note fake-quantization modules
 Sequential(
  (0): ConvBNReLU(
    (0): ConvBnReLU2d(
      32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
      (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (weight_fake_quant): FakeQuantize(
        fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([1], dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_channel_symmetric, ch_axis=0, scale=tensor([1.]), zero_point=tensor([0])
        (activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
      )
      (activation_post_process): FakeQuantize(
        fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([1], dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.]), zero_point=tensor([0])
        (activation_post_process): MovingAverageMinMaxObserver(min_val=inf, max_val=-inf)
      )
    )
    (1): Identity()
    (2): Identity()
  )
  (1): ConvBn2d(
    32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False
    (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (weight_fake_quant): FakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([1], dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_channel_symmetric, ch_axis=0, scale=tensor([1.]), zero_point=tensor([0])
      (activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
    )
    (activation_post_process): FakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([1], dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.]), zero_point=tensor([0])
      (activation_post_process): MovingAverageMinMaxObserver(min_val=inf, max_val=-inf)
    )
  )
  (2): Identity()
)

Training a quantized model with high accuracy requires accurate modeling of numerics at inference. For quantization-aware training, therefore, we modify the training loop to:

  • Switch batch norm to use running mean and variance towards the end of training, to better match inference numerics.
  • Also freeze the quantizer parameters (scale and zero-point) and fine tune the weights.

num_train_batches = 20

# Train and check accuracy after each epoch
for nepoch in range(8):
    train_one_epoch(qat_model, criterion, optimizer, data_loader, torch.device('cpu'), num_train_batches)
    if nepoch > 3:
        # Freeze quantizer parameters
        qat_model.apply(torch.quantization.disable_observer)
    if nepoch > 2:
        # Freeze batch norm mean and variance estimates
        qat_model.apply(torch.nn.intrinsic.qat.freeze_bn_stats)

    # Check the accuracy after each epoch
    quantized_model = torch.quantization.convert(qat_model.eval(), inplace=False)
    quantized_model.eval()
    top1, top5 = evaluate(quantized_model,criterion, data_loader_test, neval_batches=num_eval_batches)
    print('Epoch %d :Evaluation accuracy on %d images, %2.2f'%(nepoch, num_eval_batches * eval_batch_size, top1.avg))

....................Loss tensor(2.0747, grad_fn=<DivBackward0>)
Training: * Acc@1 56.167 Acc@5 77.333
..........Epoch 0 :Evaluation accuracy on 300 images, 77.67
....................Loss tensor(2.0358, grad_fn=<DivBackward0>)
Training: * Acc@1 54.833 Acc@5 78.500
..........Epoch 1 :Evaluation accuracy on 300 images, 77.00
....................Loss tensor(2.0417, grad_fn=<DivBackward0>)
Training: * Acc@1 54.667 Acc@5 77.333
..........Epoch 2 :Evaluation accuracy on 300 images, 74.67
....................Loss tensor(1.9055, grad_fn=<DivBackward0>)
Training: * Acc@1 56.833 Acc@5 78.667
..........Epoch 3 :Evaluation accuracy on 300 images, 76.33
....................Loss tensor(1.9055, grad_fn=<DivBackward0>)
Training: * Acc@1 58.167 Acc@5 80.000
..........Epoch 4 :Evaluation accuracy on 300 images, 77.00
....................Loss tensor(1.7821, grad_fn=<DivBackward0>)
Training: * Acc@1 60.500 Acc@5 82.833
..........Epoch 5 :Evaluation accuracy on 300 images, 76.33
....................Loss tensor(1.8145, grad_fn=<DivBackward0>)
Training: * Acc@1 58.833 Acc@5 82.333
..........Epoch 6 :Evaluation accuracy on 300 images, 74.33
....................Loss tensor(1.6930, grad_fn=<DivBackward0>)
Training: * Acc@1 63.000 Acc@5 81.333
..........Epoch 7 :Evaluation accuracy on 300 images, 75.67

Here, we just perform quantization-aware training for a small number of epochs. Nevertheless, quantization-aware training yields an accuracy of over 71% on the entire imagenet dataset, which is close to the floating point accuracy of 71.9%.

More on quantization-aware training:

  • QAT is a super-set of post-training quantization techniques that allows for more debugging. For example, we can analyze whether the accuracy of the model is limited by weight or activation quantization.
  • We can also simulate the accuracy of a quantized model in floating point, since we are using fake-quantization to model the numerics of actual quantized arithmetic (see the sketch after this list).
  • We can mimic post-training quantization easily too.
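As a sketch of the simulation point above (reusing the helpers defined earlier, and assuming qat_model is still in its prepared, un-converted state): evaluating the prepared model directly runs float arithmetic with fake-quantization inserted, which simulates the quantized model's accuracy.

qat_model.eval()
top1, top5 = evaluate(qat_model, criterion, data_loader_test, neval_batches=num_eval_batches)
print('Simulated accuracy (fake-quant, float arithmetic): %2.2f' % top1.avg)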

Speedup from quantization

Finally, let's confirm something we alluded to above: do our quantized models actually perform inference faster? Let's test it:

def run_benchmark(model_file, img_loader):
    elapsed = 0
    model = torch.jit.load(model_file)
    model.eval()
    num_batches = 5
    # Run the scripted model on a few batches of images
    for i, (images, target) in enumerate(img_loader):
        if i < num_batches:
            start = time.time()
            output = model(images)
            end = time.time()
            elapsed = elapsed + (end-start)
        else:
            break
    num_images = images.size()[0] * num_batches

    print('Elapsed time: %3.0f ms' % (elapsed/num_images*1000))
    return elapsed

run_benchmark(saved_model_dir + scripted_float_model_file, data_loader_test)

run_benchmark(saved_model_dir + scripted_quantized_model_file, data_loader_test)

Elapsed time:   7 ms
Elapsed time:   4 ms

Running this locally on a MacBook Pro, the regular model took 61 ms and the quantized model just 20 ms, illustrating the typical 2-4x speedup we see for quantized models compared to floating point ones.
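
Note that timings like these are sensitive to the number of CPU threads in use. If you benchmark locally, pinning PyTorch to a single thread (an optional step, not reflected in the output above) makes the float-vs-quantized comparison more stable:

torch.set_num_threads(1)
run_benchmark(saved_model_dir + scripted_float_model_file, data_loader_test)
run_benchmark(saved_model_dir + scripted_quantized_model_file, data_loader_test)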

Conclusion

In this tutorial, we showed two quantization methods, post-training static quantization and quantization-aware training, described what they do "under the hood", and showed how to use them in PyTorch.
