LSE Research Online LSE Library Services

Communication-efficient stochastic gradient descent, with applications to neural networks

Alistarh, Dan, Grubic, Demjan, Liu, Jerry, Tomioka, Ryota and Vojnovic, Milan ORCID: 0000-0003-1382-022X (2017) Communication-efficient stochastic gradient descent, with applications to neural networks. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S. and Garnett, R., (eds.) Advances in Neural Information Processing Systems 30. Curran Associates, Inc., Long Beach, CA, pp. 1707-1718.

Full text not available from this repository.

Abstract

Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to its excellent scalability properties. A fundamental barrier when parallelizing SGD is the high bandwidth cost of communicating gradient updates between nodes; consequently, several lossy compression heuristics have been proposed, by which nodes only communicate quantized gradients. Although effective in practice, these heuristics do not always guarantee convergence, and it is not clear whether they can be improved. In this paper, we propose Quantized SGD (QSGD), a family of compression schemes for gradient updates which provides convergence guarantees. QSGD allows the user to smoothly trade off communication bandwidth and convergence time: nodes can adjust the number of bits sent per iteration, at the cost of possibly higher variance. We show that this trade-off is inherent, in the sense that improving it past some threshold would violate information-theoretic lower bounds. QSGD guarantees convergence for convex and non-convex objectives, under asynchrony, and can be extended to stochastic variance-reduced techniques. When applied to training deep neural networks for image classification and automated speech recognition, QSGD leads to significant reductions in end-to-end training time. For example, on 16 GPUs, we can train the ResNet152 network to full accuracy on ImageNet 1.8x faster than the full-precision variant.
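The abstract does not spell out the quantization rule, and the full text is not held in this repository. The sketch below illustrates the bits-versus-variance trade-off it describes, assuming a stochastic-rounding quantizer of the kind used in QSGD-style schemes: each coordinate is rounded randomly to one of `s` levels of the vector's 2-norm, so the quantized vector is an unbiased estimate of the original. The function names `qsgd_quantize` and `naive_bits` are hypothetical, and the bit count uses a naive fixed-width encoding, not the encoding used in the paper.

```python
import math
import random


def qsgd_quantize(v, s, rng=random):
    """Stochastically quantize vector v to s levels per coordinate (sketch).

    Coordinate i becomes ||v||_2 * sign(v_i) * level / s, where level is
    floor(r) or floor(r) + 1 with r = s * |v_i| / ||v||_2, rounding up with
    probability r - floor(r). The expected output therefore equals v.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    out = []
    for x in v:
        # Position of |x| on the grid {0, 1/s, ..., 1} scaled by s;
        # clamp guards against float round-off pushing r past s.
        r = min(s * abs(x) / norm, float(s))
        level = math.floor(r)
        if rng.random() < r - level:  # round up with probability r - level
            level += 1
        out.append(norm * math.copysign(1.0, x) * level / s)
    return out


def naive_bits(n, s):
    """Hypothetical fixed-width cost: a 32-bit norm, plus one sign bit and
    ceil(log2(s + 1)) bits for the level index of each of n coordinates."""
    return 32 + n * (1 + math.ceil(math.log2(s + 1)))
```

Lowering `s` shrinks `naive_bits` but raises the mean squared error of each quantized gradient; raising it does the reverse. That is a small-scale view of the communication/variance trade-off the abstract refers to.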

Item Type: Book Section
Official URL: http://papers.nips.cc/paper/6768-communication-eff...
Additional Information: © 2017 Neural Information Processing Systems Foundation, Inc.
Divisions: Statistics
Subjects: H Social Sciences > HA Statistics
Date Deposited: 23 Nov 2017 10:54
Last Modified: 20 Dec 2024 00:17
URI: http://eprints.lse.ac.uk/id/eprint/85698
