AIPortalX

Selected filters: Task: Image captioning

By default, only production models are shown

31 Models found

Claude Opus 4.5
By Anthropic
Domain: Language, Multimodal, Vision
Task: Code generation, Language modeling, Language generation (+13 more)

Gemini Robotics-ER 1.5
By Google DeepMind
Domain: Vision, Language, Speech
Task: Instruction interpretation, Robotic manipulation, Image captioning (+5 more)

Qwen3-Omni-30B-A3B
By Alibaba
Domain: Multimodal, Language, Vision (+1 more)
Task: Language modeling, Language generation, Question answering (+6 more)

GLM-4.5-Air
By Zhipu AI
Domain: Language
Task: Language modeling, Language generation, Question answering (+4 more)

Grok 4
By xAI
Domain: Language, Multimodal, Vision
Task: Language modeling, Language generation, Question answering (+4 more)

Claude Opus 4
By Anthropic
Domain: Language, Multimodal, Vision
Task: Code generation, Language modeling, Language generation (+13 more)

Claude Sonnet 4
By Anthropic
Domain: Language, Multimodal, Vision
Task: Code generation, Language modeling, Language generation (+13 more)

Gemini 2.5 Flash
By Google DeepMind
Domain: Language, Multimodal, Vision (+1 more)
Task: Language modeling, Language generation, Question answering (+9 more)

TerraMind
By IBM
Domain: Earth science, Vision
Task: Image captioning, Flood mapping, Crop mapping (+3 more)

Gemini 2.5 Pro
By Google DeepMind
Domain: Language, Vision, Video (+1 more)
Task: Language modeling, Language generation, Question answering (+6 more)

Baichuan-Omni-1.5
By Baichuan
Domain: Multimodal, Language, Speech (+2 more)
Task: Language modeling, Language generation, Question answering (+8 more)

Aria
By Rhymes AI
Domain: Multimodal, Language, Video
Task: Language modeling, Language generation, Visual question answering (+4 more)

Grounding Dino L
By Tsinghua University
Domain: Vision
Task: Object detection, Image captioning

Cambrian-1-13B
By New York University (NYU)
Domain: Multimodal, Vision, Language
Task: Image captioning, Visual question answering, Character recognition (OCR)

Claude 3.5 Sonnet
By Anthropic
Domain: Multimodal, Language, Vision
Task: Chat, Image captioning, Code generation (+1 more)