AIPortalX

Selected filter: Speech

By default, only production models are shown

47 Models found

Google DeepMind

Gemini Robotics-ER 1.5

By Google DeepMind
Domain
Vision, Language, Speech
Task
Instruction interpretation, Robotic manipulation, Image captioning, +5 more
Alibaba

Qwen3-Omni-30B-A3B

By Alibaba
Domain
Multimodal, Language, Vision, +1 more
Task
Language modeling, Language generation, Question answering, +6 more
Resemble AI

Chatterbox Multilingual

By Resemble AI
Domain
Speech
Task
Text-to-speech (TTS), Speech synthesis
Microsoft

MAI-Voice-1

By Microsoft
Domain
Speech
Task
Text-to-speech (TTS), Speech synthesis
OpenAI

gpt-realtime

By OpenAI
Domain
Speech, Vision, Language
Task
Speech recognition (ASR), Speech synthesis, Visual question answering, +1 more
NVIDIA

Canary 1B v2

By NVIDIA
Domain
Speech
Task
Speech recognition (ASR), Translation, Speech-to-text
NVIDIA

Parakeet-tdt-0.6b-v3

By NVIDIA
Domain
Speech
Task
Speech-to-text, Speech recognition (ASR)
Google

Gemini 2.5 Deep Think

By Google
Domain
Language, Multimodal, Vision, +2 more
Task
Language modeling, Language generation, Mathematical reasoning, +6 more
Google DeepMind

Gemini 2.5 Flash-Lite Jun 2024

By Google DeepMind
Domain
Language, Vision, Video, +1 more
Task
Language modeling, Language generation, Question answering, +9 more
Google DeepMind

Gemini 2.5 Flash Native Audio

By Google DeepMind
Domain
Speech
Task
Speech-to-speech, Audio question answering, Text-to-speech (TTS)
Fish Audio

OpenAudio-S1-mini

By Fish Audio
Domain
Speech
Task
Speech synthesis, Text-to-speech (TTS)
Google

Gemma 3n

By Google
Domain
Language, Multimodal, Speech
Task
Language modeling, Language generation, Question answering, +7 more
Google DeepMind

Gemini 2.5 Flash

By Google DeepMind
Domain
Language, Multimodal, Vision, +1 more
Task
Language modeling, Language generation, Question answering, +9 more
Google DeepMind

Gemini 2.5 Pro

By Google DeepMind
Domain
Language, Vision, Video, +1 more
Task
Language modeling, Language generation, Question answering, +6 more
Google

Chirp 3 HD Text-to-Speech

By Google
Domain
Speech
Task
Text-to-speech (TTS), Speech synthesis