Blockwise Parallel Decoding for Deep Autoregressive Models
Better & Faster Large Language Models via Multi-token Prediction