2017 HyperNetworks notes


HYPERNETWORKS

These notes cover the HyperNetworks paper from Google.

Introduction

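The paper proposes using a small network (the hypernetwork) to generate the weights of a larger network (the main network). The main network behaves like the model in any other paper: it maps raw inputs to the desired targets.

The hypernetwork's input is information about the structure of the weights. Note that this input carries no information about the raw data.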
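
The split between the two networks can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's architecture: the hypernetwork here is a single linear map `P`, and the names `hypernetwork`, `main_layer`, `P`, and `z`, as well as all sizes, are hypothetical. The point it shows is that the weights `W` are computed from the embedding `z` alone, while the data `x` only ever reaches the main network.

```python
import random

def matvec(W, x):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def hypernetwork(z, P, n_out, n_in):
    """Map an embedding z to an (n_out x n_in) weight matrix.

    P holds the hypernetwork's own parameters, shape (n_out * n_in, len(z)).
    Assumption: a single linear map, chosen only to keep the sketch short.
    """
    flat = matvec(P, z)
    return [flat[r * n_in:(r + 1) * n_in] for r in range(n_out)]

def main_layer(x, W):
    """The main network: maps raw input x to an output using generated weights."""
    return matvec(W, x)

rng = random.Random(0)
n_in, n_out, n_z = 4, 3, 2          # hypothetical sizes
z = [rng.uniform(-1, 1) for _ in range(n_z)]
P = [[rng.uniform(-1, 1) for _ in range(n_z)] for _ in range(n_out * n_in)]

W = hypernetwork(z, P, n_out, n_in)  # depends on z and P only, never on x
x = [1.0, 0.5, -0.5, 2.0]            # raw data goes to the main network only
y = main_layer(x, W)
```

In training, gradients with respect to `W` would flow back into `P` and `z`, so the trainable parameters are those of the hypernetwork rather than `W` itself.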

This paper differs from HyperNEAT.

In HyperNEAT, the input to the hypernetwork is the virtual coordinates of each individual weight. In this paper, the input is an embedding vector that describes the entire weights of a given layer.
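
To make the "one embedding vector per layer" input concrete, here is a small sketch in which a single hypernetwork is reused for two layers of the main network, each layer having its own embedding. The layer names, sizes, embedding values, and the linear form of the hypernetwork are all assumptions made for illustration.

```python
import random

rng = random.Random(0)
N_Z, N_OUT, N_IN = 2, 3, 3   # hypothetical sizes

# One hypernetwork (here a single linear map P) shared by every layer.
P = [[rng.uniform(-1, 1) for _ in range(N_Z)] for _ in range(N_OUT * N_IN)]

# One embedding vector per main-network layer (values are made up).
layer_embeddings = {
    "layer1": [0.5, -1.0],
    "layer2": [0.2, 0.3],
}

def generate_weights(z):
    """Turn a layer embedding into that layer's full weight matrix."""
    flat = [sum(p * zi for p, zi in zip(row, z)) for row in P]
    return [flat[r * N_IN:(r + 1) * N_IN] for r in range(N_OUT)]

# Every layer's weights come from the same P, so layers are tied together
# through P while still differing through their own embeddings.
weights = {name: generate_weights(z) for name, z in layer_embeddings.items()}
```

Because both weight matrices are produced by the same `P`, the layers are not independent, yet they are not identical either; the only per-layer parameters are the short embedding vectors.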

Our embedding vectors can be fixed parameters that are also learned during end-to-end training, allowing approximate weight-sharing within a layer and across layers of the main network.

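The embedding vectors can be fixed parameters that are learned together with the rest of the model, which allows approximate weight sharing within a layer or across layers of the main network.
The embedding vectors can also be generated dynamically, which lets the weights of a recurrent network adapt to the timestep and to the input sequence.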
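
The dynamic case can be sketched as follows. This is a simplified illustration, not the paper's exact formulation: a small "hyper" RNN reads the current input and the main hidden state, emits an embedding `z_t`, and a linear map turns `z_t` into the full recurrent matrix `W_h` for that step. All names (`A`, `C`, `P`, `W_x`, `step`) and sizes are hypothetical, and generating the full matrix at every step is an assumption made for brevity.

```python
import math
import random

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

N_X, N_H, N_HYPER, N_Z = 2, 3, 2, 2   # hypothetical sizes

W_x = rand_matrix(N_H, N_X)                    # main RNN input weights (not generated here)
A = rand_matrix(N_HYPER, N_H + N_X + N_HYPER)  # hyper RNN weights
C = rand_matrix(N_Z, N_HYPER)                  # hyper state -> embedding z_t
P = rand_matrix(N_H * N_H, N_Z)                # embedding -> flattened W_h

def step(x_t, h_prev, hyper_prev):
    # The hyper RNN sees the main hidden state, the input, and its own state.
    hyper = [math.tanh(v) for v in matvec(A, h_prev + x_t + hyper_prev)]
    z_t = matvec(C, hyper)
    # Generate this timestep's recurrent weights from the embedding.
    flat = matvec(P, z_t)
    W_h = [flat[r * N_H:(r + 1) * N_H] for r in range(N_H)]
    # Ordinary RNN update, but with the freshly generated W_h.
    pre = [a + b for a, b in zip(matvec(W_h, h_prev), matvec(W_x, x_t))]
    h = [math.tanh(v) for v in pre]
    return h, hyper, W_h

h, hyper = [0.0] * N_H, [0.0] * N_HYPER
recurrent_weights = []
for x_t in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    h, hyper, W_h = step(x_t, h, hyper)
    recurrent_weights.append(W_h)
```

Unlike a standard RNN, where the recurrent matrix is the same at every step, `recurrent_weights` here holds a different matrix per timestep, which is what "non-shared" weights means in this setting.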

The method is trained end-to-end, which the authors argue makes it more efficient than HyperNEAT: The main difference between our approach and HyperNEAT is that hypernetworks in our approach are trained end-to-end with gradient descent together with the main network, and therefore are more efficient.

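A hypernetwork can generate non-shared weights for an LSTM, and the resulting model outperforms the standard LSTM on some tasks. On some image classification tasks, a hypernetwork can also be used to generate the weights of a convolutional neural network.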

MOTIVATION AND RELATED WORK

The authors' motivation is as follows:

evolutionary computing: It is difficult to directly operate in large search spaces consisting of millions of weight parameters, a more efficient method is to evolve a smaller network to generate the structure of weights for a larger network, so that the search is constrained within the much smaller weight space.

The related work falls into three categories:

HyperNEAT

Most reported results using these methods, however, are in small scales, perhaps because they are both slow to train and require heuristics to be efficient. The main difference between our approach and HyperNEAT is that hypernetworks in our approach are trained end-to-end with gradient descent together with the main network, and therefore are more efficient.

fast weights
Even before the work on HyperNEAT and DCT, Schmidhuber (1992; 1993) has suggested the concept of fast weights in which one network can produce context-dependent weight changes for a second network.
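The authors stress that the work following the fast-weights idea mainly contributes to convolutional networks, whereas this paper's contribution is to recurrent networks: These studies however did not explore the use of this approach to recurrent networks, which is a main contribution of our work.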