Title: Efficient Computation of Low-Rank Representations to Reduce Memory Requirements in Deep Learning
Abstract: Computing an orthogonal basis that approximates the range or corange of a matrix is a ubiquitous problem in computational science and engineering. In many applications, rapidly decaying singular values allow such a basis to be used to approximate a linear operator by its restriction to a low-rank subspace, which significantly reduces computational and storage demands. A powerful approach for constructing a basis with a specified rank or approximation tolerance is the (adaptive) randomized range finder. In this talk, we introduce a novel variant of this algorithm, based on the blocked Householder QR decomposition, optimized for modern GPU accelerators. This development is motivated by its potential to substantially lower memory requirements during the training of deep neural networks such as transformers. We discuss the GaLore (Gradient Low-Rank Projection) training framework and demonstrate how the randomized range finder can be employed to derive low-rank representations of optimizer states. We close with potential avenues for future research.
Carolin Penke (Forschungszentrum Jülich)
SeMath
Colloquium
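
For readers unfamiliar with the randomized range finder named in the abstract, the following is a minimal sketch of the basic fixed-rank version in NumPy. It is not the blocked Householder GPU variant presented in the talk, and it omits the adaptive stopping criterion. The function name, the `oversample` parameter, and the synthetic matrix `G` (a stand-in for a gradient matrix of exact rank 8) are assumptions made for illustration only.

```python
import numpy as np

def randomized_range_finder(A, rank, oversample=5, seed=0):
    """Return Q with orthonormal columns whose span approximates range(A).

    Illustrative fixed-rank sketch; name and parameters are hypothetical.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, m, n)      # number of random samples to draw
    Omega = rng.standard_normal((n, k))   # Gaussian test matrix
    Y = A @ Omega                         # random linear combinations of A's columns
    Q, _ = np.linalg.qr(Y)                # reduced QR; columns of Q are orthonormal
    return Q

# Synthetic low-rank matrix standing in for a gradient (assumption: exact rank 8).
rng = np.random.default_rng(1)
G = rng.standard_normal((256, 8)) @ rng.standard_normal((8, 128))

Q = randomized_range_finder(G, rank=8)
G_low = Q.T @ G                           # compact representation of G in the basis Q
G_back = Q @ G_low                        # lift back to the original shape
rel_err = np.linalg.norm(G - G_back) / np.linalg.norm(G)
```

Here `G_low` has shape 13 x 128 instead of 256 x 128, which is the kind of storage saving the abstract alludes to when quantities are kept in the projected space. The oversampling of a few extra columns is a common heuristic to make the sampled subspace capture the dominant range more reliably.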