Audio Examples

Audio examples for a submission to ICASSP 2026: Generative Modeling of Text-Aligned Speech Tokens via Discrete Diffusion

All audio examples are randomly selected from the LibriSpeech test-clean set.

Comparison between auto-regressive and discrete-diffusion models

For the discrete-diffusion models (DDMs), we use the ancestral-conf-top-k sampler with 50 inference steps.
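The sampler is not described on this page, so the following is only a rough sketch of what a confidence-based top-k unmasking loop for a masked discrete-diffusion model can look like. It assumes the sampler name means: sample a candidate token at every masked position (the ancestral part), then keep only the k most confident candidates per step. The names `MASK`, `toy_denoiser`, and `sample_conf_top_k` are hypothetical, the denoiser is a random stand-in for a trained model, and the even reveal schedule is an assumption rather than the submission's actual schedule.

```python
import math
import random

MASK = -1  # hypothetical id marking a position that is still masked


def toy_denoiser(tokens, vocab_size, rng):
    """Random stand-in for a trained model: one probability vector per position."""
    probs = []
    for _ in tokens:
        weights = [rng.random() for _ in range(vocab_size)]
        total = sum(weights)
        probs.append([w / total for w in weights])
    return probs


def sample_conf_top_k(length, vocab_size, num_steps, seed=0):
    """Start fully masked and reveal the most confident samples at each step."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break  # everything is already revealed
        probs = toy_denoiser(tokens, vocab_size, rng)

        # Sample one candidate token per masked position and record its probability.
        candidates = {}
        for i in masked:
            tok = rng.choices(range(vocab_size), weights=probs[i])[0]
            candidates[i] = (tok, probs[i][tok])

        # Assumed schedule: spread the remaining positions evenly over the remaining steps.
        k = math.ceil(len(masked) / (num_steps - step))

        # Commit only the k candidates with the highest confidence.
        chosen = sorted(masked, key=lambda i: candidates[i][1], reverse=True)[:k]
        for i in chosen:
            tokens[i] = candidates[i][0]
    return tokens


if __name__ == "__main__":
    print(sample_conf_top_k(length=20, vocab_size=8, num_steps=50))
```

Under this schedule, a single step commits every position at once, while larger step counts reveal fewer tokens per step, so more tokens are sampled with earlier choices already fixed. That is the trade-off the step-count comparison on this page lets you hear.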

8455-210777-0066

original

auto-regressive

discrete-diffusion


8555-284447-0022

original

auto-regressive

discrete-diffusion


4992-41806-0005

original

auto-regressive

discrete-diffusion


5639-40744-0019

original

auto-regressive

discrete-diffusion


8224-274381-0002

original

auto-regressive

discrete-diffusion


Comparison of different numbers of inference steps

260-123286-0004

original

1 step

10 steps

25 steps

50 steps

100 steps


2830-3980-0040

original

1 step

10 steps

25 steps

50 steps

100 steps


5105-28233-0004

original

1 step

10 steps

25 steps

50 steps

100 steps


2094-142345-0029

original

1 step

10 steps

25 steps

50 steps

100 steps


7176-92135-0034

original

1 step

10 steps

25 steps

50 steps

100 steps


Comparison of different inference lengths for DDMs

908-31957-0006

original

70%

100%

130%


4970-29093-0002

original

70%

100%

130%


4077-13751-0005

original

70%

100%

130%


8463-287645-0006

original

70%

100%

130%


260-123440-0000

original

70%

100%

130%