Code is seldom written in a single left-to-right pass; it is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, in which regions of code have been randomly masked and moved to the end of each file, allowing code infilling with bidirectional context. Ours is the first generative model able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable re-naming. We find that the ability to condition on bidirectional context substantially improves performance on these tasks, while the model still performs comparably to left-to-right-only models pretrained at similar scale on standard program synthesis benchmarks. The InCoder models and code are publicly released.
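For concreteness, here is a minimal sketch of the infilling pretraining transform the abstract describes (mask a random span and move it to the end of the file, so a left-to-right model learns to condition on bidirectional context). The sentinel token names are illustrative assumptions, not the paper's exact vocabulary, and the real objective can mask multiple spans:

```python
import random

def causal_mask_transform(tokens, sentinel="<MASK:0>", eom="<EOM>"):
    """Mask one random contiguous span and move it to the end of the sequence.
    Token names here are illustrative, not InCoder's actual vocabulary."""
    start = random.randrange(len(tokens))
    end = random.randrange(start + 1, len(tokens) + 1)
    # Left context, sentinel in place of the span, right context,
    # then the sentinel again followed by the masked span as the infill target.
    return (tokens[:start] + [sentinel] + tokens[end:]
            + [sentinel] + tokens[start:end] + [eom])

# Example: the model is trained left-to-right on the rearranged sequence.
print(" ".join(causal_mask_transform("def add(a, b):\n    return a + b".split())))
```

At inference time, infilling works by placing the sentinel at the edit location and letting the model generate the span after it, zero-shot.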
Notes: per Table 5, training required 3 zettaFLOPs (3e21 FLOP). Also, "INCODER-6.7B was trained on 248 V100 GPUs for 24 days." Hardware method: 125e12 FLOP/s (V100 fp16 peak) * 248 GPUs * 24 days * 24 h * 3600 s * 0.3 assumed utilization ≈ 2e22 FLOP. The gap against the reported 3e21 suggests their utilization was quite low (under 5%), or that the 24 days was just calendar time.
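A quick Python check of the arithmetic above (125 TFLOP/s is the V100's peak fp16 tensor throughput; 0.3 is the assumed utilization from the note):

```python
# Back-of-envelope check of the hardware-method estimate above.
V100_PEAK_FLOPS = 125e12                  # V100 peak fp16 tensor throughput, FLOP/s
gpus, days, assumed_util = 248, 24, 0.3

total_gpu_flop = V100_PEAK_FLOPS * gpus * days * 24 * 3600  # FLOP at 100% utilization
print(f"at 30% utilization: {total_gpu_flop * assumed_util:.1e} FLOP")  # ~1.9e22

# Compare against the ~3e21 FLOP reported in Table 5:
reported_flop = 3e21
print(f"implied utilization: {reported_flop / total_gpu_flop:.1%}")     # ~4.7%
```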
Size Notes: 216 GB total (159 GB code + 57 GB StackOverflow): "Our final pre-training corpus contains a total of 159 GB of code, 52 GB of it in Python, and a total of 57 GB of content from StackOverflow."
Parameter Notes: 6.7B parameters.