AIPortalX

Selected Filters

Task: Image captioning

By default, only production models are shown

31 Models found


Claude Opus 4.5

By Anthropic
Domain
Language
Multimodal
Vision
Task
Code generation
Language modeling
Language generation
Quantitative reasoning
+13 more

Gemini Robotics-ER 1.5

By Google DeepMind
Domain
Vision
Language
Speech
Task
Instruction interpretation
Robotic manipulation
Image captioning
Object detection
+5 more

Qwen3-Omni-30B-A3B

By Alibaba
Domain
Multimodal
Language
Vision
Speech
+1 more
Task
Language modeling
Language generation
Question answering
Visual question answering
+6 more

GLM-4.5-Air

By Zhipu AI
Domain
Language
Task
Language modeling
Language generation
Question answering
Visual question answering
+4 more

Grok 4

By xAI
Domain
Language
Multimodal
Vision
Task
Language modeling
Language generation
Question answering
Search
+4 more

Claude Opus 4

By Anthropic
Domain
Language
Multimodal
Vision
Task
Code generation
Language modeling
Language generation
Quantitative reasoning
+13 more

Claude Sonnet 4

By Anthropic
Domain
Language
Multimodal
Vision
Task
Code generation
Language modeling
Language generation
Quantitative reasoning
+13 more

Gemini 2.5 Flash

By Google DeepMind
Domain
Language
Multimodal
Vision
Speech
+1 more
Task
Language modeling
Language generation
Question answering
Code generation
+9 more

TerraMind

By IBM
Domain
Earth science
Vision
Task
Image captioning
Flood Mapping
Crop Mapping
Crop Segmentation
+3 more

Gemini 2.5 Pro

By Google DeepMind
Domain
Language
Vision
Video
Multimodal
+1 more
Task
Language modeling
Language generation
Question answering
Code generation
+6 more

Baichuan-Omni-1.5

By Baichuan
Domain
Multimodal
Language
Speech
Vision
+2 more
Task
Language modeling
Language generation
Question answering
Audio question answering
+8 more

Aria

By Rhymes AI
Domain
Multimodal
Language
Video
Vision
Task
Language modeling
Language generation
Visual question answering
Image captioning
+4 more

Grounding Dino L

By Tsinghua University
Domain
Vision
Task
Object detection
Image captioning

Cambrian-1-13B

By New York University (NYU)
Domain
Multimodal
Vision
Language
Task
Image captioning
Visual question answering
Character recognition (OCR)

Claude 3.5 Sonnet

By Anthropic
Domain
Multimodal
Language
Vision
Task
Chat
Image captioning
Code generation
Language modeling
+1 more

VILA1.5-40B

By NVIDIA
Domain
Multimodal
Language
Vision
Video
Task
Chat
Visual question answering
Image captioning
Language modeling
+3 more