Low-rank compression of neural nets: Learning the rank of each layer

Model compression is typically carried out via quantization, low-rank approximation, or pruning, and a variety of algorithms for each of these techniques have been proposed in recent years.
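
For illustration only, below is a minimal sketch (not the method of this paper) of low-rank compression of a single layer: a weight matrix is replaced by a rank-r factorization obtained from a truncated SVD. The function name low_rank_compress, the layer shape, and the chosen rank are hypothetical.

import numpy as np

def low_rank_compress(W, rank):
    # Approximate an (m x n) weight matrix W by U_r @ V_r of rank `rank`.
    # Storage drops from m*n to rank*(m + n) parameters when rank < m*n/(m + n).
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # (m, rank), singular values folded into U
    V_r = Vt[:rank, :]             # (rank, n)
    return U_r, V_r

# Example: compress a hypothetical 512 x 256 layer to rank 32.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))
U_r, V_r = low_rank_compress(W, rank=32)
rel_err = np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W)
print(f"params: {W.size} -> {U_r.size + V_r.size}, relative error: {rel_err:.3f}")

In a fixed-rank scheme the rank of every layer must be chosen by hand; the point of learning the rank of each layer is to make that choice part of the optimization itself.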

BibTeX: