We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of "textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accuracy 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent properties compared to phi-1-base, our model before our finetuning stage on a dataset of coding exercises, and phi-1-small, a smaller model with 350M parameters trained with the same pipeline as phi-1 that still achieves 45% on HumanEval.
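For reference, pass@1 here is functional-correctness accuracy: a problem counts as solved if a generated completion passes the benchmark's unit tests. The sketch below shows the standard unbiased pass@k estimator from the HumanEval/Codex evaluation protocol; the function name and the sample counts in the example are illustrative, not figures from the phi-1 evaluation.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn from n generations (c of which are correct) passes.

    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative only: with 200 samples per problem and 101 correct,
# the estimated pass@1 for that problem is 0.505.
print(pass_at_k(n=200, c=101, k=1))  # 0.505
```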
Notes: Training compute estimate.
6ND method: 6 * 1.3e9 parameters * 51e9 tokens = 3.978e+20 FLOP
Hardware method: 3.12e+14 FLOP/s (A100 peak) * 8 GPUs * 103 hours * 3600 sec/hour * 0.3 (assumed utilization) = 2.7765504e+20 FLOP
Geometric mean: sqrt(2.7765504e+20 * 3.978e+20) = 3.3234195e+20 FLOP
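The arithmetic above can be reproduced with a short script; the utilization factor (0.3) and the 103 wall-clock hours are the note's own assumptions, not independently verified figures.

```python
# Reproduces the training-compute estimate above (all inputs are the
# note's own assumptions: 0.3 utilization, 103 wall-clock hours).

params = 1.3e9          # phi-1 parameter count
tokens = 51e9           # training tokens up to the phi-1-base checkpoint
flops_param_method = 6 * params * tokens          # 6ND approximation
# ≈ 3.978e20 FLOP

a100_flops = 312e12     # A100 peak FP16/BF16 throughput, FLOP/s
gpus = 8
hours = 103
utilization = 0.3       # assumed
flops_hardware_method = a100_flops * gpus * hours * 3600 * utilization
# ≈ 2.777e20 FLOP

geometric_mean = (flops_param_method * flops_hardware_method) ** 0.5
print(f"{flops_param_method:.4e}, {flops_hardware_method:.4e}, {geometric_mean:.4e}")
# ≈ 3.978e20, 2.777e20, 3.323e20
```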
Size Notes: Training data:
• A filtered code-language dataset, which is a subset of The Stack and StackOverflow, obtained by using a language model-based classifier (about 6B tokens).
• A synthetic textbook dataset consisting of <1B tokens of GPT-3.5-generated Python textbooks.
• A small synthetic exercises dataset consisting of ~180M tokens of Python exercises and solutions.
For the 1.3B models, phi-1 and phi-1-base are checkpoints after training on 51B tokens (770 GPU hours).
Training tokens: 54B tokens (7B unique tokens).
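As a rough consistency check, the three components sum to about 7B unique tokens, so 54B training tokens corresponds to on the order of 8 passes over the data. The sketch below just restates that arithmetic; the epoch count is an inference from the note's own numbers, not a separately reported value.

```python
# Rough token-budget arithmetic implied by the note above.
filtered_code = 6e9        # filtered subset of The Stack / StackOverflow
synthetic_textbooks = 1e9  # "<1B" GPT-3.5-generated Python textbooks (upper bound)
synthetic_exercises = 180e6

unique_tokens = filtered_code + synthetic_textbooks + synthetic_exercises
training_tokens = 54e9

print(f"unique ≈ {unique_tokens/1e9:.1f}B tokens")      # ≈ 7.2B (note rounds to 7B)
print(f"passes ≈ {training_tokens/unique_tokens:.1f}")  # ≈ 7.5, i.e. roughly 8 epochs
```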
Notes: 1.3B parameters. The architecture of the 1.3B-parameter phi-1 model consists of 24 layers, a hidden dimension of 2048, an MLP inner dimension of 8192, and 32 attention heads of dimension 64 each.
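A back-of-the-envelope parameter count from this architecture roughly recovers the 1.3B figure. The vocabulary size below (~51,200 tokens), the tied input/output embeddings, and the omission of biases and layer norms are assumptions for illustration, not values stated in the note.

```python
# Back-of-the-envelope parameter count for the stated phi-1 architecture.
# Assumptions (not stated in the note): ~51,200-token vocabulary, tied
# input/output embeddings, biases and layer norms ignored.

n_layers = 24
d_model = 2048
d_ff = 8192            # MLP inner dimension
n_heads = 32           # 32 heads x 64 dims = 2048, consistent with d_model
vocab_size = 51_200    # assumed

attn_params = 4 * d_model * d_model          # Q, K, V, output projections
mlp_params = 2 * d_model * d_ff              # up- and down-projection
per_layer = attn_params + mlp_params         # ≈ 50.3M per layer
embedding = vocab_size * d_model             # ≈ 0.10B

total = n_layers * per_layer + embedding
print(f"total ≈ {total/1e9:.2f}B parameters")  # ≈ 1.31B
```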