Model Summary: Granite-3.1-8B-Base extends the context length of Granite-3.0-8B-Base from 4K to 128K using a progressive training strategy, increasing the supported context length in increments while adjusting the RoPE theta until the model had successfully adapted to the desired context length of 128K. This long-context pre-training stage was performed using approximately 500B tokens.
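A minimal sketch of how such a progressive RoPE adjustment typically works, assuming the standard RoPE frequency formulation; the stage lengths and theta values below are illustrative assumptions, not figures from the model card.

```python
import torch

def rope_inv_freq(head_dim: int, theta: float) -> torch.Tensor:
    """Standard RoPE inverse frequencies for a given base theta."""
    return 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))

# Hypothetical progressive schedule: the context length grows in increments
# while the RoPE base theta is raised so that long positions remain resolvable.
# (Illustrative values only; the actual schedule is not published here.)
stages = [(4_096, 10_000.0), (32_768, 1_000_000.0), (131_072, 10_000_000.0)]

for ctx_len, theta in stages:
    inv_freq = rope_inv_freq(head_dim=128, theta=theta)
    positions = torch.arange(ctx_len).float()
    # Outer product gives the rotation angles applied to query/key channel pairs.
    angles = torch.outer(positions, inv_freq)
    print(f"ctx={ctx_len:>7}  theta={theta:.0e}  max_angle={angles.max():.1f}")
```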
Notes: Training compute via 6ND = 6 FLOP/parameter/token × 8.1×10^9 parameters × 12×10^12 tokens ≈ 5.832×10^23 FLOP
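The arithmetic can be reproduced directly with the standard C ≈ 6·N·D approximation, using the parameter and token counts stated above:

```python
# Approximate training compute: C ≈ 6 FLOP per parameter per token.
N = 8.1e9   # parameters
D = 12e12   # training tokens
C = 6 * N * D
print(f"{C:.3e} FLOP")  # 5.832e+23 FLOP
```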
Size Notes: 12T training tokens
Notes: 8.1B parameters
Model Architecture: Granite-3.1-8B-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings.
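A minimal sketch of two of the listed components, RMSNorm and a SwiGLU MLP, assuming their standard formulations; the dimensions used are illustrative and are not the model's actual hidden sizes.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square norm: rescale by RMS and a learned gain, no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLUMLP(nn.Module):
    """Gated MLP: SiLU(x W_gate) * (x W_up), projected back to the model dimension."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(nn.functional.silu(self.gate(x)) * self.up(x))

x = torch.randn(2, 16, 512)                  # (batch, sequence, illustrative dim)
y = SwiGLUMLP(512, 1408)(RMSNorm(512)(x))    # normalize, then gated MLP
print(y.shape)                               # torch.Size([2, 16, 512])
```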