Model Summary: Granite-3.1-2B-Base extends the context length of Granite-3.0-2B-Base from 4K to 128K using a progressive training strategy: the supported context length is increased in increments while RoPE theta is adjusted, until the model has adapted to the target length of 128K. This long-context pre-training stage used approximately 500B tokens.
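For intuition, below is a minimal sketch of this kind of staged context extension with RoPE theta scaling. The stage lengths and theta values are illustrative assumptions, not the actual Granite-3.1 training schedule.

```python
# Sketch of progressive context-length extension with RoPE theta scaling.
# Stage lengths and theta values are illustrative assumptions only.
import torch

def rope_frequencies(head_dim: int, theta: float) -> torch.Tensor:
    """Inverse frequencies used by rotary position embeddings."""
    return 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))

# Hypothetical schedule: each stage lengthens the context window and raises
# theta so rotary angles stay well separated at the longer range.
stages = [
    (4_096, 10_000.0),
    (16_384, 250_000.0),
    (65_536, 2_000_000.0),
    (131_072, 10_000_000.0),
]

for context_len, theta in stages:
    inv_freq = rope_frequencies(head_dim=64, theta=theta)
    positions = torch.arange(context_len).float()
    angles = torch.outer(positions, inv_freq)  # (context_len, head_dim // 2)
    # ... continue pre-training at this context length with these RoPE angles ...
    print(f"stage: context={context_len}, theta={theta:.0e}, angles={tuple(angles.shape)}")
```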
Notes: 6 FLOP/token/parameter * 2.5*10^9 parameters * 12*10^12 tokens = 1.8*10^23 FLOP
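The same estimate, spelled out with the standard 6ND approximation (6 FLOP per parameter per token for a combined forward and backward pass; the exact accounting is an assumption):

```python
# Training-compute estimate via the 6 * N * D approximation.
flop_per_param_per_token = 6
params = 2.5e9   # 2.5B parameters
tokens = 12e12   # 12T training tokens

total_flop = flop_per_param_per_token * params * tokens
print(f"{total_flop:.1e} FLOP")  # 1.8e+23 FLOP
```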
Size Notes: 12T tokens
Notes: 2.5B parameters
Model Architecture: Granite-3.1-2B-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are GQA and RoPE, an MLP with SwiGLU, RMSNorm, and shared input/output embeddings.
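As an illustration of how these components fit together, here is a minimal PyTorch sketch of one such decoder block. The dimensions, head counts, and hidden size are placeholder assumptions, not the published Granite-3.1-2B configuration; shared input/output embeddings would correspond, at the model level, to tying the output head to the token embedding matrix.

```python
# Sketch of a decoder block with GQA + RoPE attention, SwiGLU MLP, and RMSNorm.
# All sizes below are illustrative assumptions, not the Granite-3.1-2B config.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Normalize by root-mean-square instead of mean/variance (no bias).
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

def apply_rope(x, theta: float = 10_000.0):
    # x: (batch, heads, seq, head_dim); rotate pairs of channels by position-dependent angles.
    b, h, s, d = x.shape
    inv_freq = 1.0 / (theta ** (torch.arange(0, d, 2, device=x.device).float() / d))
    angles = torch.outer(torch.arange(s, device=x.device).float(), inv_freq)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1).flatten(-2)

class DecoderBlock(nn.Module):
    def __init__(self, dim=2048, n_heads=32, n_kv_heads=8, hidden=5632):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.attn_norm = RMSNorm(dim)
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)
        self.mlp_norm = RMSNorm(dim)
        # SwiGLU MLP: SiLU-gated up projection followed by a down projection.
        self.gate_proj = nn.Linear(dim, hidden, bias=False)
        self.up_proj = nn.Linear(dim, hidden, bias=False)
        self.down_proj = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        b, s, _ = x.shape
        h = self.attn_norm(x)
        q = self.q_proj(h).view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(h).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(h).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        q, k = apply_rope(q), apply_rope(k)
        # GQA: each group of query heads shares one key/value head.
        k = k.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.o_proj(attn.transpose(1, 2).reshape(b, s, -1))
        h = self.mlp_norm(x)
        return x + self.down_proj(F.silu(self.gate_proj(h)) * self.up_proj(h))
```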