NVIDIA-Nemotron-Nano-12B-v2 is a large language model (LLM) trained from scratch by NVIDIA and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning behavior can be controlled via the system prompt: if the user prefers the final answer without intermediate reasoning traces, the model can be configured to skip them, albeit with a slight decrease in accuracy on harder prompts that require reasoning. Conversely, allowing the model to generate a reasoning trace first generally yields higher-quality final solutions.

The model was fine-tuned from NVIDIA-Nemotron-Nano-12B-v2-Base and was further compressed into NVIDIA-Nemotron-Nano-9B-v2. It uses a hybrid architecture consisting primarily of Mamba-2 and MLP layers combined with just six Attention layers; for architectural details, refer to the Nemotron-H tech report. The model was trained using Megatron-LM and NeMo-RL. Supported languages include English, German, Spanish, French, Italian, and Japanese. The model was improved using Qwen and is ready for commercial use.
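The reasoning toggle described above can be exercised with a short script. This is a minimal sketch, assuming the model is loaded through Hugging Face transformers and that `/think` / `/no_think` system-prompt directives switch reasoning traces on and off; the model ID, directive strings, and generation settings are assumptions to verify against the official model card.

```python
# Minimal sketch: toggling reasoning traces via the system prompt.
# Assumptions (not confirmed by this entry): Hugging Face transformers loading,
# and "/think" / "/no_think" system-prompt directives for reasoning control.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/NVIDIA-Nemotron-Nano-12B-v2"  # assumed Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

def generate(prompt: str, reasoning: bool = True) -> str:
    # The system prompt selects whether a reasoning trace precedes the answer.
    messages = [
        {"role": "system", "content": "/think" if reasoning else "/no_think"},
        {"role": "user", "content": prompt},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=1024)
    # Decode only the newly generated tokens.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(generate("What is 17 * 24?", reasoning=True))   # reasoning trace + final answer
print(generate("What is 17 * 24?", reasoning=False))  # final answer only
```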
Notes: 6 FLOP/parameter/token × 12e9 parameters × 21.1e12 tokens = 1.5192e+24 FLOP (reproduced in the sketch after these notes)
Size Notes: 21.1T training tokens
Notes: 12B parameters
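The training-compute figure above follows the standard 6ND approximation (6 FLOP per parameter per token). A minimal check in Python, using the parameter and token counts from these notes:

```python
# Reproduces the 6*N*D training-compute estimate from the notes above.
params = 12e9      # N: 12B parameters (Notes)
tokens = 21.1e12   # D: 21.1T training tokens (Size Notes)
flop = 6 * params * tokens
print(f"{flop:.4e} FLOP")  # -> 1.5192e+24 FLOP
```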