The "Tootsie Roll" Process A core premise of the Marin 8B run was that we didn't fully know the best recipe— so we just started training with what we had, and planned to adapt along the way. Internally, we referred to this as the "Tootsie" process, a reference to Tootsie Rolls, which use a "graining" process where each day's batch contains a bit of the previous day's, seeding crystallization or something. (We are not food scientists.) This is admittedly a bit of a strained metaphor, but the idea was that we'd keep folding in new data, training techniques, and whatever else as the training process went on. (As it would turn out, dear reader, we would often change more than the data...) Model Basics Model Size We decided to build a roughly 7-8 billion parameter model mostly out of pragmatism: we initially only had reserved capacity to train a model of that size for long enough. Architecture We settled on the Llama architecture for the usual reasons: it has been shown to work well, easier to plug into existing inference stacks, no one ever got fired for buying IBM, etc. We used the same settings as Llama 3.1 8B.
Compute: 6.12e+23 FLOP (6 FLOP / parameter / token × 8×10^9 parameters × 12.75×10^12 tokens)
Tokens: 12.75T
Parameters: 8B
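To spell out the arithmetic, here is a minimal sketch in plain Python of the standard 6ND compute approximation (roughly 6 FLOP per parameter per token for a combined forward and backward pass) applied to the numbers above:

```python
# Rough training-compute estimate via the standard 6 * N * D approximation:
# ~6 FLOP per parameter per token (forward + backward pass).
n_params = 8e9        # ~8B parameters
n_tokens = 12.75e12   # 12.75T training tokens

flops = 6 * n_params * n_tokens
print(f"{flops:.3g} FLOP")  # ~6.12e+23 FLOP
```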
Architecture details (Llama 3 8B):
Hidden size: 4096
Feedforward size: 14336
Number of layers: 32
Number of attention heads: 32
Number of KV heads: 8
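For concreteness, here is a hedged sketch of those hyperparameters written as a Hugging Face transformers.LlamaConfig. This is purely illustrative rather than our actual training configuration, and fields not listed above (vocabulary size, RoPE parameters, context length, and so on) are left at the library defaults here.

```python
from transformers import LlamaConfig

# The hyperparameters listed above, expressed as a Hugging Face LlamaConfig.
# Anything not listed in this section is left at the library's defaults and
# may differ from the settings actually used for training.
config = LlamaConfig(
    hidden_size=4096,          # model / embedding dimension
    intermediate_size=14336,   # feedforward (MLP) dimension
    num_hidden_layers=32,      # transformer blocks
    num_attention_heads=32,    # query heads
    num_key_value_heads=8,     # KV heads (grouped-query attention)
)
print(config)
```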