This blog post introduces SmolLM, a family of state-of-the-art small models with 135M, 360M, and 1.7B parameters, trained on a new high-quality dataset. It covers data curation, model evaluation, and usage.
Notes: "Therefore, we decided to train the 1.7B model on 1 trillion tokens" (https://huggingface.co/blog/smollm#experiments). The model doesn't use any CNNs or RNNs (https://huggingface.co/blog/smollm#hyperparameters-choice), so I will approximate it as a dense transformer. Assuming the model saw 1 trillion tokens during training, the 6ND approximation yields: Training compute = # of active parameters * # of tokens * 6 FLOP/parameter/token = 1.71e9 parameters * 1e12 tokens * 6 FLOP/parameter/token ~= 1.03e22 FLOP. "We also instruction tuned the models using publicly available permissive instruction datasets. We trained all three models for one epoch on the permissive subset of the WebInstructSub dataset, combined with StarCoder2-Self-OSS-Instruct. Following this, we performed DPO (Direct Preference Optimization) for one epoch: using HelpSteer for the 135M and 1.7B models... We followed the training parameters from the Zephyr-Gemma recipe in the alignment handbook, but adjusted the SFT (Supervised Fine-Tuning) learning rate to 3e-4" (https://huggingface.co/blog/smollm#evaluation).
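The 6ND estimate above can be reproduced with a short sketch. The parameter count and token count come from the notes; the factor of 6 FLOP per parameter per token is the standard heuristic for dense transformers (forward plus backward pass).

```python
# 6ND training-compute estimate for SmolLM-1.7B.
N = 1.71e9   # active parameters per forward pass (from the blog's hyperparameter table)
D = 1e12     # pretraining tokens (1T, per the blog post)

training_compute = 6 * N * D  # FLOP
print(f"{training_compute:.2e} FLOP")  # 1.03e+22 FLOP
```

Note the result is an estimate of pretraining compute only; the SFT and DPO stages described above add a comparatively negligible amount on top.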
Size Notes: Pretraining tokens: 1T
Notes: The image here, https://huggingface.co/blog/smollm#hyperparameters-choice, shows that SmolLM-1.7B has 1.71B parameters.