MedSigLIP is a variant of SigLIP (Sigmoid Loss for Language Image Pre-training) that is trained to encode medical images and text into a common embedding space. Developers can use MedSigLIP to accelerate the development of healthcare AI applications. MedSigLIP was trained on a variety of de-identified medical image and text pairs, including chest X-rays, dermatology images, ophthalmology images, histopathology slides, and slices of CT and MRI volumes, along with associated descriptions or reports. MedSigLIP contains a 400M parameter vision encoder and 400M parameter text encoder. It supports 448x448 image resolution and up to 64 text tokens. MedSigLIP is recommended for medical image interpretation applications that do not require text generation, such as data-efficient classification, zero-shot classification, and semantic image retrieval. For medical applications that require text generation, MedGemma is recommended.
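As a rough illustration of how zero-shot classification works in a common embedding space, the sketch below ranks candidate text labels by their cosine similarity to an image embedding. It is a minimal sketch under stated assumptions, not MedSigLIP's actual API: the function names `cosine_similarity` and `zero_shot_classify` are hypothetical, and the three-dimensional vectors are toy stand-ins for the embeddings the vision and text encoders would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, label_embeddings):
    """Return the best-matching label and the score for every label.

    label_embeddings maps each candidate text prompt to its embedding.
    No classifier is trained: the labels are compared to the image
    directly in the shared embedding space.
    """
    scores = {
        label: cosine_similarity(image_embedding, embedding)
        for label, embedding in label_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy values (assumption): real embeddings would come from the
# image and text encoders and have far more dimensions.
image_embedding = [0.9, 0.1, 0.2]
label_embeddings = {
    "a chest X-ray with pleural effusion": [0.8, 0.2, 0.1],
    "a normal chest X-ray": [0.1, 0.9, 0.3],
}

best_label, scores = zero_shot_classify(image_embedding, label_embeddings)
print(best_label)
```

Semantic image retrieval follows the same pattern in reverse: embed a text query once, then rank a collection of image embeddings by similarity to it.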
Training Code Accessibility: The use of MedSigLIP is governed by the Health AI Developer Foundations terms of use. https://huggingface.co/google/medsiglip-448
Parameters: 800000000
Notes: "MedSigLIP contains a 400M parameter vision encoder and 400M parameter text encoder"