NVIDIA-Nemotron-Nano-12B-v2 is a large language model (LLM) trained from scratch by NVIDIA and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning behavior can be controlled via the system prompt: if the user prefers the final answer without intermediate reasoning traces, the model can be configured to skip them, albeit with a slight decrease in accuracy on harder prompts that require reasoning. Conversely, allowing the model to generate a reasoning trace first generally yields higher-quality final solutions.

The model was fine-tuned from NVIDIA-Nemotron-Nano-12B-v2-Base and was further compressed into NVIDIA-Nemotron-Nano-9B-v2. It uses a hybrid architecture consisting primarily of Mamba-2 and MLP layers combined with just six Attention layers; for architectural details, refer to the Nemotron-H tech report. The model was trained using Megatron-LM and NeMo-RL. Supported languages include English, German, Spanish, French, Italian, and Japanese. The model was improved using Qwen and is ready for commercial use.
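The reasoning toggle described above can be exercised with a short script. This is a minimal sketch, assuming the model is loaded through Hugging Face transformers and that `/think` / `/no_think` system-prompt directives switch reasoning traces on and off; the model ID, directive strings, and generation settings are assumptions to verify against the official model card.

```python
# Minimal sketch: toggling reasoning traces via the system prompt.
# Assumptions (not confirmed by this entry): Hugging Face transformers loading,
# and "/think" / "/no_think" system-prompt directives for reasoning control.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/NVIDIA-Nemotron-Nano-12B-v2"  # assumed Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

def generate(prompt: str, reasoning: bool = True) -> str:
    # The system prompt selects whether a reasoning trace precedes the answer.
    messages = [
        {"role": "system", "content": "/think" if reasoning else "/no_think"},
        {"role": "user", "content": prompt},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=1024)
    # Decode only the newly generated tokens.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(generate("What is 17 * 24?", reasoning=True))   # reasoning trace + final answer
print(generate("What is 17 * 24?", reasoning=False))  # final answer only
```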
Notes: 6 FLOP/parameter/token × 12e9 parameters × 21.1e12 tokens = 1.5192e+24 FLOP (reproduced in the sketch after these notes)
Size Notes: 21.1T training tokens
Notes: 12B parameters
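The training-compute figure above follows the standard 6ND approximation (6 FLOP per parameter per token). A minimal check in Python, using the parameter and token counts from these notes:

```python
# Reproduces the 6*N*D training-compute estimate from the notes above.
params = 12e9      # N: 12B parameters (Notes)
tokens = 21.1e12   # D: 21.1T training tokens (Size Notes)
flop = 6 * params * tokens
print(f"{flop:.4e} FLOP")  # -> 1.5192e+24 FLOP
```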