Featured

Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models

NVIDIA’s Nemotron-Labs Diffusion models aim to speed up text generation by drafting and refining tokens in parallel instead of one at a time.

12h ago · huggingface.co · 2 min read

Hugging Face’s post explains NVIDIA’s attempt to make language models faster and more flexible with diffusion-style decoding. The pitch is that one model can handle autoregressive, diffusion, and self-speculative generation, with speed gains and modest accuracy tradeoffs.

Most chatbots write one word piece at a time, like a person filling in a sentence slowly.

NVIDIA made a new kind of model that can guess several pieces at once, then fix them step by step. It is like sketching a whole drawing quickly, then sharpening the lines.

That can make the model faster. The article says it can also work in a few different ways, so developers can choose the balance between speed and accuracy.

A different way to generate text

The post says most LLMs still work autoregressively, producing one token at a time, with each token depending on the ones before it. NVIDIA argues that this creates a speed bottleneck, especially for latency-sensitive workloads, because the GPU spends much of its time moving data to and from memory rather than doing useful computation.

Nemotron-Labs Diffusion uses a different approach. Instead of generating strictly left to right, it drafts multiple tokens in parallel and then refines them over several steps. NVIDIA says that lets the model make better use of modern GPUs, revise earlier tokens, and support fill-in-the-middle style tasks. It also gives developers a built-in way to trade output quality against speed and inference cost by changing the number of refinement steps.
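To make the draft-then-refine idea concrete, here is a minimal toy sketch in Python. It is not NVIDIA's algorithm: `toy_denoiser`, `diffusion_decode`, and the confidence rule are all hypothetical stand-ins, and the toy "model" always guesses correctly, so it only illustrates how the number of refinement steps sets the number of forward passes, not the quality side of the tradeoff.

```python
import math

MASK = None  # marks a position that has not been committed yet


def toy_denoiser(tokens, target):
    """Hypothetical stand-in for one forward pass over the whole block.

    Returns a (guess, confidence) pair per position. This toy always guesses
    the target token and is simply more confident about earlier positions.
    """
    return [(target[i], 1.0 / (1 + i)) for i in range(len(tokens))]


def diffusion_decode(target, num_steps):
    """Fill a fully masked block in at most num_steps forward passes."""
    tokens = [MASK] * len(target)
    forward_passes = 0
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        if not masked:
            break
        preds = toy_denoiser(tokens, target)
        forward_passes += 1
        # Spread the remaining masked positions evenly over the remaining steps.
        k = math.ceil(len(masked) / (num_steps - step))
        # Commit the k most confident guesses; the rest stay masked for now.
        for i in sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]:
            tokens[i] = preds[i][0]
    return tokens, forward_passes


target = ["the", "cat", "sat", "on", "the", "mat", "today", "."]
fast, fast_passes = diffusion_decode(target, num_steps=2)
slow, slow_passes = diffusion_decode(target, num_steps=8)
print(fast_passes, slow_passes)  # 2 8
```

With eight positions, two steps commit four tokens per pass while eight steps commit one per pass. In a real model, fewer steps would mean each pass has to commit more low-confidence guesses, which is where the quality cost would come from.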

One model, three modes

The model family is designed to support three modes in one checkpoint: standard autoregressive generation, diffusion generation, and self-speculation. In self-speculation, the model drafts candidate tokens and then verifies them autoregressively. The post says this means developers can switch modes at deployment time without changing their application much.
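The draft-and-verify loop described for self-speculation can be sketched with a toy example. Everything here is an assumption made for illustration: `ar_next` is a hypothetical autoregressive rule, `draft_block` is a hypothetical parallel drafter with deliberate errors, and the acceptance rule (keep the longest verified prefix, then take the verifier's token) is the common speculative-decoding pattern rather than anything the post confirms.

```python
def ar_next(prefix):
    """Hypothetical autoregressive rule: the next token is len(prefix) mod 5."""
    return len(prefix) % 5


def draft_block(prefix, k):
    """Hypothetical parallel drafter: proposes k tokens at once.

    It agrees with ar_next except at every position where pos % 4 == 3,
    where it deliberately drafts the wrong token.
    """
    block = []
    for j in range(k):
        pos = len(prefix) + j
        tok = pos % 5
        if pos % 4 == 3:
            tok = (tok + 1) % 5  # deliberate draft error
        block.append(tok)
    return block


def self_speculative_decode(prompt, max_new, k):
    """Draft k tokens per round, then keep only what the AR rule confirms."""
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < max_new:
        draft = draft_block(seq, k)
        rounds += 1
        accepted = []
        # A real system would score all drafted positions in one pass;
        # this toy checks them one by one for clarity.
        for tok in draft:
            expected = ar_next(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # take the verifier's token, stop
                break
        seq.extend(accepted)
    return seq[: len(prompt) + max_new], rounds


output, rounds = self_speculative_decode([], max_new=8, k=4)
print(output, rounds)  # [0, 1, 2, 3, 4, 0, 1, 2] 2
```

The output matches what the autoregressive rule alone would produce, but it takes two draft-and-verify rounds instead of eight single-token steps.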

What NVIDIA claims it achieves

NVIDIA says the 8B model slightly improves accuracy over Qwen3 8B, by 1.2% on average, while also improving throughput. The post claims diffusion mode produces 2.6x more tokens per forward pass than autoregressive (AR) models, and that self-speculation goes higher still. It also says the 8B model was trained on 1.3T pretraining tokens and 45B supervised fine-tuning tokens.
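As a rough back-of-the-envelope reading of the tokens-per-forward-pass claim, the Python sketch below converts it into a forward-pass count. It assumes an AR model yields exactly one token per pass, so 2.6x means about 2.6 tokens per pass. The helper name and the 1,000-token example are illustrative, and the figure says nothing about wall-clock time, since the cost of each pass is not given in this summary.

```python
import math


def forward_passes_needed(num_tokens, tokens_per_pass):
    """Forward passes required to emit num_tokens at a given average rate."""
    return math.ceil(num_tokens / tokens_per_pass)


ar_passes = forward_passes_needed(1000, 1.0)         # one token per pass
diffusion_passes = forward_passes_needed(1000, 2.6)  # claimed diffusion rate
print(ar_passes, diffusion_passes)  # 1000 385
```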

The piece presents the release as a practical step toward faster text generation, not just a lab curiosity. The key message is that diffusion-style models may become a usable production option alongside standard LLMs, especially where speed matters more than perfect fidelity to conventional decoding.

Key points

Nemotron-Labs Diffusion tries to speed up text generation with parallel drafting and refinement.
NVIDIA says the model family supports autoregressive, diffusion, and self-speculation modes.
The 8B model is claimed to improve accuracy versus Qwen3 8B while boosting throughput.
The post says the 8B model was trained on 1.3T pretraining tokens and 45B supervised fine-tuning tokens.
The release is positioned as a practical deployment option, not just a research demo.

Tags: ai, llms, research, open-source, tools
