Today, we're adding a new, highly specialized tool to the Gemma 3 toolkit: Gemma 3 270M, a compact, 270-million parameter model designed from the ground up for task-specific fine-tuning, with strong instruction-following and text structuring capabilities already trained in.

- Compact and capable architecture: Our new model has a total of 270 million parameters: 170 million embedding parameters due to a large vocabulary size and 100 million for our transformer blocks. Thanks to the large vocabulary of 256k tokens, the model can handle specific and rare tokens, making it a strong base model to be further fine-tuned in specific domains and languages (see the parameter-count sketch after this list).
- Extreme energy efficiency: A key advantage of Gemma 3 270M is its low power consumption. Internal tests on a Pixel 9 Pro SoC show the INT4-quantized model used just 0.75% of the battery for 25 conversations, making it our most power-efficient Gemma model.
- Instruction following: An instruction-tuned model is released alongside a pre-trained checkpoint. While this model is not designed for complex conversational use cases, it's a strong model that follows general instructions right out of the box.
- Production-ready quantization: Quantization-Aware Trained (QAT) checkpoints are available, enabling you to run the models at INT4 precision with minimal performance degradation, which is essential for deploying on resource-constrained devices.
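As a rough sanity check on the parameter split above, the sketch below reconstructs the embedding count from the stated 256k vocabulary. The exact vocabulary size and the hidden width of 640 are assumptions for illustration (neither is given in the excerpt); the hidden width is chosen so the embedding table lands near the stated 170 million parameters.

```python
# Back-of-the-envelope check of the Gemma 3 270M parameter split.
# Assumptions (not stated in the excerpt): a vocabulary of 262,144 tokens ("256k")
# and a hidden width of 640; both are illustrative values only.

VOCAB_SIZE = 262_144   # "256k tokens" from the announcement (exact value assumed)
HIDDEN_DIM = 640       # assumed model width for illustration

embedding_params = VOCAB_SIZE * HIDDEN_DIM   # one embedding vector per vocabulary entry

stated_embedding_params = 170_000_000        # figures quoted in the post
stated_transformer_params = 100_000_000
stated_total = 270_000_000

print(f"estimated embedding parameters: {embedding_params / 1e6:.0f}M")  # ~168M, close to the stated 170M
print(f"stated split: {stated_embedding_params / 1e6:.0f}M embedding "
      f"+ {stated_transformer_params / 1e6:.0f}M transformer "
      f"= {stated_total / 1e6:.0f}M total")
```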
Notes: 6 FLOP / parameter / token * 270 * 10^6 parameters * 6 * 10^12 tokens = 9.72e+21 FLOP
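The note above uses the common C ≈ 6·N·D training-compute approximation. A minimal sketch of that arithmetic, using the 270 million parameters and 6 trillion training tokens from these notes:

```python
# Training-compute estimate via the standard C ≈ 6 * N * D approximation
# (6 FLOP per parameter per training token).

FLOP_PER_PARAM_PER_TOKEN = 6
N_PARAMS = 270e6   # 270 million parameters
N_TOKENS = 6e12    # 6 trillion training tokens

total_flop = FLOP_PER_PARAM_PER_TOKEN * N_PARAMS * N_TOKENS
print(f"estimated training compute: {total_flop:.2e} FLOP")  # 9.72e+21 FLOP
```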
Size Notes: "the 270M with 6 trillion tokens"