Blockwise Parallel Decoding for Deep Autoregressive Models

Better & Faster Large Language Models via Multi-token Prediction