On the first anniversary of the release of Mistral 7B, the model that revolutionized independent frontier AI innovation for millions, we are proud to introduce two new state-of-the-art models for on-device computing and at-the-edge use cases. We call them les Ministraux: Ministral 3B and Ministral 8B. These models set a new frontier in knowledge, commonsense reasoning, function-calling, and efficiency in the sub-10B category, and can be used out of the box or fine-tuned for a variety of use cases, from orchestrating agentic workflows to creating specialist task workers. Both models support up to 128k context length (currently 32k on vLLM), and Ministral 8B has a special interleaved sliding-window attention pattern for faster and more memory-efficient inference.
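To give a feel for why sliding-window attention saves memory, here is a minimal sketch of a causal sliding-window attention mask. The window size and the exact interleaving pattern used in Ministral 8B are not detailed here, so the `window=4` value and the helper function below are illustrative assumptions, not the model's actual configuration:

```python
import numpy as np

def sliding_window_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask where position i may attend to positions j
    satisfying i - window < j <= i: causal attention restricted
    to the most recent `window` tokens."""
    i = np.arange(seq_len)[:, None]  # query positions (column vector)
    j = np.arange(seq_len)[None, :]  # key positions (row vector)
    return (j <= i) & (j > i - window)

# Hypothetical example: with a window of 4 over 8 tokens,
# token 6 attends only to tokens 3, 4, 5, and 6.
mask = sliding_window_causal_mask(seq_len=8, window=4)
```

Because each query attends to at most `window` keys instead of all preceding tokens, the attention cost and KV-cache footprint grow with the window size rather than the full sequence length, which is what makes long contexts tractable on edge hardware.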
Both models are released under the Mistral Commercial License and the Mistral Research License. For self-deployed use, please reach out to us for commercial licenses. We will also assist you in lossless quantization of the models for your specific use cases to derive maximum performance. The model weights for Ministral 8B Instruct are available for research use at https://huggingface.co/mistralai/Ministral-8B-Instruct-2410. Both models will be available from our cloud partners shortly.
Ministral 8B model specifications:
- Architecture: Dense Transformer
- Parameters: 8,019,808,256
- Layers: 36
- Heads: 32