Model Details

Domain:

Speech

Task:

Speech-to-text

Speech recognition (ASR)

Model Access:

Open weights (unrestricted)

Introduction

parakeet-tdt-0.6b-v3 is a 600-million-parameter multilingual automatic speech recognition (ASR) model designed for high-throughput speech-to-text transcription. It extends the parakeet-tdt-0.6b-v2 model by expanding language support from English to 25 European languages. The model automatically detects the language of the audio and transcribes it without requiring additional prompting. It is part of a series of models that leverage the Granary [1, 2] multilingual corpus as their primary training dataset.
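Because the model detects the language on its own, transcription needs nothing beyond the audio itself. A minimal sketch follows, assuming the checkpoint is loaded through the NVIDIA NeMo toolkit (`nemo_toolkit[asr]`) under the Hugging Face identifier `nvidia/parakeet-tdt-0.6b-v3`. The helper name `transcribe_files` is hypothetical, and the exact return type of `transcribe` may vary between NeMo releases, so the helper accepts both plain strings and hypothesis objects with a `.text` attribute.

```python
from typing import Any, List, Optional

# Hugging Face identifier of the checkpoint described above.
MODEL_NAME = "nvidia/parakeet-tdt-0.6b-v3"


def load_model() -> Any:
    """Load the pretrained checkpoint via NeMo (assumed to be installed)."""
    # Imported lazily so that this module can be imported without NeMo.
    import nemo.collections.asr as nemo_asr

    return nemo_asr.models.ASRModel.from_pretrained(model_name=MODEL_NAME)


def transcribe_files(paths: List[str], model: Optional[Any] = None) -> List[str]:
    """Transcribe audio files; no language hint or prompt is passed."""
    if model is None:
        model = load_model()
    results = model.transcribe(paths)
    # Some NeMo releases return strings, others return hypothesis objects
    # that expose the transcript as `.text`; handle both.
    return [r if isinstance(r, str) else r.text for r in results]
```

With NeMo installed, a call such as `transcribe_files(["sample.wav"])` (the file name is a placeholder) would download the weights on first use and return one transcript per input file. Passing an already loaded `model` avoids reloading the checkpoint on every call.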

Training

Training Code Accessibility

cc-by-4.0 https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3

Hardware

NVIDIA A100

Hardware Quantity

128

Size Notes: The model was trained on a combination of the Granary dataset's ASR subset and the in-house NeMo ASR Set 3.0 dataset:

10,000 hours of human-transcribed data from NeMo ASR Set 3.0, including LibriSpeech (960 hours), Fisher Corpus, National Speech Corpus Part 1, VCTK, Europarl-ASR, Multilingual LibriSpeech, Mozilla Common Voice (v7.0), and AMI.

660,000 hours of pseudo-labeled data from Granary [1, 2], including YTC [7], MOSEL [8], and YODAS [9].
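To put the two data sources in proportion, the hour counts quoted in the size notes can be tallied directly. The short sketch below uses only those two figures; the function name `training_mix` is illustrative.

```python
# Hour counts quoted in the size notes above.
HUMAN_TRANSCRIBED_HOURS = 10_000   # NeMo ASR Set 3.0
PSEUDO_LABELED_HOURS = 660_000     # Granary ASR subset


def training_mix(human_hours: int, pseudo_hours: int) -> dict:
    """Return the total hours and each source's share of that total."""
    total = human_hours + pseudo_hours
    return {
        "total_hours": total,
        "human_share": human_hours / total,
        "pseudo_share": pseudo_hours / total,
    }


mix = training_mix(HUMAN_TRANSCRIBED_HOURS, PSEUDO_LABELED_HOURS)
print(f"total: {mix['total_hours']:,} h")                 # total: 670,000 h
print(f"human-transcribed: {mix['human_share']:.1%}")     # human-transcribed: 1.5%
print(f"pseudo-labeled: {mix['pseudo_share']:.1%}")       # pseudo-labeled: 98.5%
```

By these figures, pseudo-labeled audio makes up the large majority of the roughly 670,000 training hours.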

Parameters

600000000

Notes: 0.6B
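As a rough guide to what 0.6B parameters means in practice, the memory needed for the weights alone can be estimated from the parameter count and the numeric precision. The sketch below takes the nominal 600-million figure from the introduction; the exact count of the released checkpoint may differ slightly, and the estimate excludes activations, decoding state, and framework overhead.

```python
# Nominal parameter count (0.6B); the exact checkpoint size is an assumption.
PARAMETER_COUNT = 600_000_000

# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Estimate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(PARAMETER_COUNT, dtype):.1f} GB")
```

Under these assumptions the weights occupy about 2.4 GB in float32 and about 1.2 GB in a 16-bit format.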

Authors

Monica Sekoyan, Nithin Rao Koluguri, Nune Tadevosyan, Piotr Zelasko, Travis Bartley, Nikolay Karpov, Jagadeesh Balam, Boris Ginsburg

Related Models

Cosmos-Predict2.5 2B

NVIDIA-Nemotron-Nano-12B-v2

NVIDIA-Nemotron-Nano-9B-v2

Canary 1B v2
