NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning behavior can be controlled via the system prompt: if the user prefers the final answer without intermediate reasoning traces, the model can be configured to skip them, albeit with a slight decrease in accuracy on harder prompts that require reasoning. Conversely, allowing the model to generate a reasoning trace first generally results in higher-quality final solutions to queries and tasks.

The model uses a hybrid architecture consisting primarily of Mamba-2 and MLP layers combined with just four Attention layers. For details on the architecture, please refer to the Nemotron-H tech report. Supported languages: English, German, Spanish, French, Italian, and Japanese. Improved using Qwen. This model is ready for commercial use.
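As a concrete illustration of toggling reasoning through the system prompt, here is a minimal sketch using Hugging Face transformers. The control strings "/think" and "/no_think" are assumptions for illustration; consult the model's chat template for the authoritative syntax.

```python
# Minimal sketch: controlling the reasoning trace via the system prompt.
# The "/think" and "/no_think" control strings are assumed here for
# illustration; check the model card's chat template for the exact syntax.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/NVIDIA-Nemotron-Nano-9B-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

messages = [
    # Use "/no_think" instead to request a final answer without a reasoning trace.
    {"role": "system", "content": "/think"},
    {"role": "user", "content": "What is 17 * 24?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```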
Notes: Cumulative compute: 1.53e24 FLOP [reported]. A back-of-the-envelope check using the standard 6 FLOP/parameter/token approximation: 6 × 9e9 parameters × 21.1e12 tokens ≈ 1.1394e24 FLOP.
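The reported figure is somewhat higher than this 6·N·D estimate. As a quick sanity check of the arithmetic, using the parameter and token counts from the notes above:

```python
# Sanity check of the training-compute estimate via the common 6*N*D rule
# (6 FLOP per parameter per token), with N and D from the notes above.
n_params = 9e9       # 9B parameters
n_tokens = 21.1e12   # 21.1T training tokens
flop = 6 * n_params * n_tokens
print(f"{flop:.4e} FLOP")  # -> 1.1394e+24
```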
Size Notes: 21.1T training tokens
Notes: 9B parameters