Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks

Convolutional neural networks (CNNs) have recently shown outstanding image classification performance in the large-scale visual recognition challenge (ILSVRC2012). The success of CNNs is attributed to their ability to learn rich mid-level image representations, as opposed to the hand-designed low-level features used in other image classification methods. Learning CNNs, however, amounts to estimating millions of parameters and requires a very large number of annotated image samples. This requirement currently prevents the application of CNNs to problems with limited training data. In this work we show how image representations learned with CNNs on large-scale annotated datasets can be efficiently transferred to other visual recognition tasks with a limited amount of training data. We design a method to reuse layers trained on the ImageNet dataset to compute mid-level image representations for images in the PASCAL VOC dataset. We show that despite differences in image statistics and tasks between the two datasets, the transferred representation leads to significantly improved results for object and action classification, outperforming the current state of the art on the PASCAL VOC 2007 and 2012 datasets. We also show promising results for object and action localization.

Source: http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Oquab_Learning_and_Transferring_2014_CVPR_paper.pdf
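
A minimal PyTorch sketch of the transfer recipe the abstract describes: reuse ImageNet-trained layers as a frozen mid-level feature extractor and train only a new classification head on the target task. torchvision's AlexNet and a 20-class multi-label head (the PASCAL VOC object classes) are stand-ins here, not the authors' exact architecture or training setup (the paper trains two new adaptation layers rather than a single linear layer).

    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_TARGET_CLASSES = 20  # PASCAL VOC object classes

    # Load a network pre-trained on ImageNet and freeze its convolutional
    # layers so they act as a fixed mid-level feature extractor.
    net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
    for p in net.features.parameters():
        p.requires_grad = False

    # Replace the 1000-way ImageNet output layer with a head for the
    # target task; only unfrozen parameters receive gradient updates.
    net.classifier[6] = nn.Linear(4096, NUM_TARGET_CLASSES)

    optimizer = torch.optim.SGD(
        (p for p in net.parameters() if p.requires_grad),
        lr=1e-3, momentum=0.9)
    criterion = nn.BCEWithLogitsLoss()  # VOC labels are multi-label

    def train_step(images, labels):
        optimizer.zero_grad()
        loss = criterion(net(images), labels)
        loss.backward()
        optimizer.step()
        return loss.item()

    # Dummy batch just to show the shapes involved.
    train_step(torch.randn(4, 3, 224, 224), torch.zeros(4, NUM_TARGET_CLASSES))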

How transferable are features in deep neural networks?

Many deep neural networks trained on natural images exhibit a curious phenomenon in common: on the first layer they learn features similar to Gabor filters and color blobs. Such first-layer features appear not to be specific to a particular dataset or task, but general in that they are applicable to many datasets and tasks. Features must eventually transition from general to specific by the last layer of the network, but this transition has not been studied extensively. In this paper we experimentally quantify the generality versus specificity of neurons in each layer of a deep convolutional neural network and report a few surprising results. Transferability is negatively affected by two distinct issues: (1) the specialization of higher layer neurons to their original task at the expense of performance on the target task, which was expected, and (2) optimization difficulties related to splitting networks between co-adapted neurons, which was not expected. In an example network trained on ImageNet, we demonstrate that either of these two issues may dominate, depending on whether features are transferred from the bottom, middle, or top of the network. We also document that the transferability of features decreases as the distance between the base task and target task increases, but that transferring features even from distant tasks can be better than using random features. A final surprising result is that initializing a network with transferred features from almost any number of layers can produce a boost to generalization that lingers even after fine-tuning to the target dataset.

Source: http://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks.pdf
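
A sketch of the paper's core experiment, under the assumption of a toy sequential CNN: copy the first n layers from a network trained on the base task into a new network, then either freeze them or allow fine-tuning before training on the target task (roughly the paper's AnB and AnB+ conditions).

    import copy
    import torch.nn as nn

    def make_net(num_classes=10):
        # A toy stand-in for the paper's ImageNet-scale architecture.
        return nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, num_classes))

    def transfer_first_n(base, target, n, fine_tune=False):
        # Copy the first n modules from `base`; freeze them unless the
        # transferred layers are allowed to adapt to the target task.
        for i in range(n):
            target[i] = copy.deepcopy(base[i])
            if not fine_tune:
                for p in target[i].parameters():
                    p.requires_grad = False
        return target

    base_a = make_net()  # pretend this was trained on base task A
    net_AnB = transfer_first_n(base_a, make_net(), n=2)                   # frozen
    net_AnB_plus = transfer_first_n(base_a, make_net(), n=2, fine_tune=True)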

Visualizing and Understanding Convolutional Networks

Large Convolutional Network models have recently demonstrated impressive classification performance on the ImageNet benchmark (Krizhevsky et al., 2012). However, there is no clear understanding of why they perform so well, or how they might be improved. In this paper we address both issues. We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. Used in a diagnostic role, these visualizations allow us to find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark. We also perform an ablation study to discover the performance contribution from different model layers. We show that our ImageNet model generalizes well to other datasets: when the softmax classifier is retrained, it convincingly beats the current state-of-the-art results on the Caltech-101 and Caltech-256 datasets.

Source: https://arxiv.org/pdf/1311.2901.pdf
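
The paper's deconvnet visualization is involved, but its starting point, reading out intermediate feature activations, can be sketched with PyTorch forward hooks. AlexNet is an assumed stand-in for the paper's model, and this is not the authors' technique for projecting activations back to pixel space.

    import torch
    from torchvision import models

    model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()
    activations = {}

    def save_activation(name):
        def hook(module, inputs, output):
            activations[name] = output.detach()
        return hook

    # Record the output of every convolutional layer during a forward pass.
    for idx, layer in enumerate(model.features):
        if isinstance(layer, torch.nn.Conv2d):
            layer.register_forward_hook(save_activation(f"conv{idx}"))

    with torch.no_grad():
        model(torch.randn(1, 3, 224, 224))  # a dummy image

    for name, act in activations.items():
        # Per-channel mean activation hints at which feature maps respond.
        print(name, tuple(act.shape), act.mean(dim=(0, 2, 3))[:3])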

Understanding Synthetic Gradients and Decoupled Neural Interfaces

When training neural networks, the use of Synthetic Gradients (SG) allows layers or modules to be trained without update locking, that is, without waiting for a true error gradient to be backpropagated, resulting in Decoupled Neural Interfaces (DNIs). This ability to update parts of a neural network asynchronously and with only local information was demonstrated to work empirically by Jaderberg et al. (2016). However, there has been very little demonstration of what changes DNIs and SGs impose from a functional, representational, and learning-dynamics point of view. In this paper, we study DNIs through the use of synthetic gradients on feed-forward networks to better understand their behaviour and to elucidate their effect on optimisation. We show that the incorporation of SGs does not affect the representational strength of the learning system for a neural network, and we prove the convergence of the learning system for linear and deep linear models. On practical problems we investigate the mechanism by which synthetic gradient estimators approximate the true loss and, surprisingly, how that leads to drastically different layer-wise representations. Finally, we also expose the relationship between synthetic gradients and other error approximation techniques, and find a unifying language for their discussion and comparison.

Source: https://arxiv.org/pdf/1703.00522.pdf
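
A minimal sketch of a synthetic-gradient update at a single split point in a feed-forward network, with illustrative module sizes; the full DNI setup in Jaderberg et al. (2016) also conditions the synthetic gradient model on labels and decouples many modules at once.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    layer1 = nn.Linear(10, 32)   # lower module: updated with synthetic gradients
    layer2 = nn.Linear(32, 1)    # upper module: updated with true gradients
    sg = nn.Linear(32, 32)       # synthetic gradient model: h -> predicted dL/dh

    opt1 = torch.optim.SGD(layer1.parameters(), lr=0.01)
    opt2 = torch.optim.SGD(layer2.parameters(), lr=0.01)
    opt_sg = torch.optim.SGD(sg.parameters(), lr=0.001)

    def step(x, y):
        # The lower module updates immediately with the predicted gradient,
        # without waiting for the true error signal (no update locking).
        h = torch.relu(layer1(x))
        opt1.zero_grad()
        h.backward(sg(h.detach()).detach())
        opt1.step()

        # The upper module computes the true loss and the true dL/dh.
        h_in = h.detach().requires_grad_(True)
        loss = F.mse_loss(layer2(h_in), y)
        opt2.zero_grad()
        loss.backward()
        opt2.step()

        # The synthetic gradient model regresses toward the true gradient.
        opt_sg.zero_grad()
        F.mse_loss(sg(h.detach()), h_in.grad.detach()).backward()
        opt_sg.step()
        return loss.item()

    step(torch.randn(4, 10), torch.randn(4, 1))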