AIPortalX

Selected Filters: Speech (Domain)

By default, only production models are shown

42 Models found

Google DeepMind

Gemini Robotics-ER 1.5

By Google DeepMind
Domain
Vision
Language
Speech
Task
Instruction interpretation
Robotic manipulation
Image captioning
Object detection
+5 more
Alibaba

Qwen3-Omni-30B-A3B

By Alibaba
Domain
Multimodal
Language
Vision
Speech
+1 more
Task
Language modeling
Language generation
Question answering
Visual question answering
+6 more
Resemble AI

Chatterbox Multilingual

By Resemble AI
Domain
Speech
Task
Text-to-speech (TTS)
Speech synthesis
Microsoft

MAI-Voice-1

By Microsoft
Domain
Speech
Task
Text-to-speech (TTS)
Speech synthesis
OpenAI

gpt-realtime

By OpenAI
Domain
Speech
Vision
Language
Task
Speech recognition (ASR)
Speech synthesis
Visual question answering
Speech-to-speech
+1 more
NVIDIA

Canary 1B v2

By NVIDIA
Domain
Speech
Task
Speech recognition (ASR)
Translation
Speech-to-text
NVIDIA

Parakeet-tdt-0.6b-v3

By NVIDIA
Domain
Speech
Task
Speech-to-text
Speech recognition (ASR)
Google DeepMind

Gemini 2.5 Flash-Lite Jun 2024

By Google DeepMind
Domain
Language
Vision
Video
Speech
+1 more
Task
Language modeling
Language generation
Question answering
Translation
+9 more
Google DeepMind

Gemini 2.5 Flash Native Audio

By Google DeepMind
Domain
Speech
Task
Speech-to-speech
Audio question answering
Text-to-speech (TTS)
Speech synthesis
Fish Audio

OpenAudio-S1-mini

By Fish Audio
Domain
Speech
Task
Speech synthesis
Text-to-speech (TTS)
Google

Gemma 3n

By Google
Domain
Language
Multimodal
Speech
Vision
Task
Language modeling
Language generation
Question answering
Chat
+7 more
Google DeepMind

Gemini 2.5 Flash

By Google DeepMind
Domain
Language
Multimodal
Vision
Speech
+1 more
Task
Language modeling
Language generation
Question answering
Code generation
+9 more
Google DeepMind

Gemini 2.5 Pro

By Google DeepMind
Domain
Language
Vision
Video
Multimodal
+1 more
Task
Language modeling
Language generation
Question answering
Code generation
+6 more
Google

Chirp 3 HD Text-to-Speech

By Google
Domain
Speech
Task
Text-to-speech (TTS)
Speech synthesis
Google

Chirp 3 Speech-to-Text

By Google
Domain
Speech
Task
Speech recognition (ASR)
Speech-to-text
Translation
Reka AI

Reka Flash 3

By Reka AI
Domain
Multimodal
Language
Vision
Video
+1 more
Task
Chat
Code generation
Language modeling
Language generation
+6 more