MedSigLIP is a variant of SigLIP (Sigmoid Loss for Language Image Pre-training) that is trained to encode medical images and text into a common embedding space. Developers can use MedSigLIP to accelerate building healthcare-based AI applications. MedSigLIP was trained on a variety of de-identified medical image and text pairs, including chest X-rays, dermatology images, ophthalmology images, histopathology slides, and slices of CT and MRI volumes, along with associated descriptions or reports. MedSigLIP contains a 400M parameter vision encoder and 400M parameter text encoder, it supports 448x448 image resolution with up to 64 text tokens. MedSigLIP is recommended for medical image interpretation applications without a need for text generation, such as data-efficient classification, zero-shot classification, and semantic image retrieval. For medical applications that require text generation, MedGemma is recommended.
Notes: "MedSigLIP contains a 400M parameter vision encoder and 400M parameter text encoder,"