Understanding Synthetic Gradients and Decoupled Neural Interfaces

When training neural networks, the use of Synthetic Gradients (SG) allows layers or modules to be trained without update locking – without waiting for a true error gradient to be backpropagated – resulting in Decoupled Neural Interfaces (DNIs). This unlocked ability to update parts of a neural network asynchronously and with only local information was demonstrated to work empirically in Jaderberg et al. (2016). However, there has been very little demonstration of what changes DNIs and SGs impose from a functional, representational, and learning-dynamics point of view. In this paper, we study DNIs through the use of synthetic gradients on feed-forward networks to better understand their behaviour and elucidate their effect on optimisation. We show that the incorporation of SGs does not affect the representational strength of the learning system for a neural network, and prove the convergence of the learning system for linear and deep linear models. On practical problems we investigate the mechanism by which synthetic gradient estimators approximate the true loss, and, surprisingly, how that leads to drastically different layer-wise representations. Finally, we also expose the relationship of synthetic gradients to other error approximation techniques and find a unifying language for discussion and comparison.

Source: https://arxiv.org/pdf/1703.00522.pdf
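
To make the idea concrete, here is a minimal sketch (not the authors' code) of a two-module feed-forward network trained with a synthetic gradient, written with PyTorch and made-up layer sizes. The lower module updates immediately using a gradient *predicted* at the interface by a small SG model, so it never waits for the true gradient to be backpropagated from the loss; the SG model itself is regressed onto the true interface gradient whenever that gradient becomes available. The particular update ordering and the linear SG estimator are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

module_a = nn.Sequential(nn.Linear(10, 32), nn.ReLU())   # lower module
module_b = nn.Sequential(nn.Linear(32, 1))               # upper module
sg_model = nn.Linear(32, 32)                             # predicts dL/dh at the interface

opt_a = torch.optim.SGD(module_a.parameters(), lr=0.01)
opt_b = torch.optim.SGD(module_b.parameters(), lr=0.01)
opt_sg = torch.optim.SGD(sg_model.parameters(), lr=0.001)

loss_fn = nn.MSELoss()

for step in range(100):
    x = torch.randn(16, 10)   # toy inputs
    y = torch.randn(16, 1)    # toy targets

    # Forward through the lower module.
    h = module_a(x)

    # Update module_a right away using the synthetic gradient at h,
    # without waiting for the true gradient from the loss.
    synthetic_grad = sg_model(h.detach())
    opt_a.zero_grad()
    h.backward(gradient=synthetic_grad.detach())
    opt_a.step()

    # Upper module: ordinary forward/backward on the true loss.
    h_detached = h.detach().requires_grad_(True)
    loss = loss_fn(module_b(h_detached), y)
    opt_b.zero_grad()
    loss.backward()
    opt_b.step()

    # Train the SG model to regress onto the true interface gradient.
    true_grad = h_detached.grad.detach()
    sg_loss = ((sg_model(h.detach()) - true_grad) ** 2).mean()
    opt_sg.zero_grad()
    sg_loss.backward()
    opt_sg.step()
```

In this sketch the two modules still run in sequence, but nothing in module_a's update depends on module_b's backward pass, which is the decoupling that would let the modules be placed on different devices or updated asynchronously.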
