WMT14 English-German

Non-Autoregressive Translation. Given a source sentence x, an autoregressive translation (AT) model generates each target word y_t conditioned on the previously generated words y_{<t}, which leads to high latency at the decoding stage.
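
The sequential dependence described above can be illustrated with a minimal sketch. It assumes a greedy decoding loop and a toy stand-in for the model; the names `greedy_autoregressive_decode`, `toy_next_word`, and the `</s>` end marker are hypothetical and chosen only for illustration, not taken from the source. The point is that step t cannot start until all of y_{<t} exists, so the number of sequential model calls grows with the target length.

```python
from typing import Callable, List, Sequence

EOS = "</s>"  # hypothetical end-of-sentence marker (an assumption for this sketch)


def greedy_autoregressive_decode(
    source: Sequence[str],
    next_word: Callable[[Sequence[str], Sequence[str]], str],
    max_len: int = 50,
) -> List[str]:
    """Generate target words one at a time; step t needs the whole prefix y_<t."""
    target: List[str] = []
    for _ in range(max_len):
        # One sequential model call per target word: this is the latency bottleneck.
        word = next_word(source, target)
        if word == EOS:
            break
        target.append(word)
    return target


def toy_next_word(source: Sequence[str], prefix: Sequence[str]) -> str:
    """Stand-in for a trained model: emits the source words in upper case, in order."""
    if len(prefix) < len(source):
        return source[len(prefix)].upper()
    return EOS


print(greedy_autoregressive_decode(["guten", "morgen"], toy_next_word))
# prints ['GUTEN', 'MORGEN']
```

In this toy run the loop makes three sequential calls (two words plus the end marker). A real AT model would replace `toy_next_word` with a network forward pass, and those passes cannot be run in parallel across target positions.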

BibTeX: