GPT-3 demonstrates the remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billions of tokens. Here we address some remaining issues less reported in the GPT-3 paper, such as a non-English LM, the performance of different-sized models, and the effect of recently introduced prompt optimization on in-context learning. To achieve this, we introduce HyperCLOVA, a Korean variant of the 82B GPT-3 trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean-specific tokenization, HyperCLOVA with our training configuration shows state-of-the-art in-context zero-shot and few-shot learning performance on various downstream tasks in Korean. We also show the performance benefits of prompt-based learning and demonstrate how it can be integrated into the prompt engineering pipeline. We then discuss the possibility of materializing the No Code AI paradigm by providing AI prototyping capabilities to non-experts of ML through HyperCLOVA Studio, an interactive prompt engineering interface. Lastly, we demonstrate the potential of our methods with three successful in-house applications.
Notes: "For experiments in Section 4, the model trained with 150B is used for fair comparison, because not all models are finished training at the same iteration. However, experiments in Section 5.2 use the model trained with 300B tokens, as HyperCLOVA Studio provided the 39B and 82B models trained with 300B tokens." 82e9 connections * 2 FLOP/connection * 300e9 tokens * 3 backward pass = 1.476e23 FLOP Calculation using GPU time corroborates this: - "Our model is based on megatron-LM (Shoeybi et al., 2019) and trained on the NVIDIA Superpod, which includes 128 strongly clustered DGX servers with 1,024 A100 GPUs." - "It takes 13.4 days to train a model with 82B parameters with 150B tokens." Assume 300B tokens takes twice as long, 26.8 days. - Assume the default of 30% utilization rate for large language models. 1024 A100 GPUs * 312e12 FLOP/second * 0.3 utilization * 26.8 days * 24 * 60 * 60 seconds/day = 2.219e+23 FLOP
Size Notes: "However, experiments in Section 5.2 use the model trained with 300B tokens, as HyperCLOVA Studio provided the 39B and 82B models trained with 300B tokens." "We introduce HyperCLOVA, a large-scale Korean in-context learning-based LM with nearly 100B parameters, by constructing a large Korean-centric corpus of 560B tokens." Based on tokenizing the Hyperclova article itself using OpenAI's tiktoken BPE tokenizer (https://github.com/openai/tiktoken), there are 3285 tokens for 1069 words - about 3 tokens per word. This work uses a special tokenizer, but based on Figure 5 in Appendix E, the number of tokens seems similar between different tokenization methods. Based on that, 5.6e11 Korean tokens ~= 1.9e11 words
Notes: "We introduce a Korean in-context large-scale LM with 82B parameters, i.e., HyperCLOVA. This is the first discovery on near 100B-scale non-English LM." According to media reports, HyperCLOVA has 204B parameters (i.e. a different version than in the paper) https://m.koreaherald.com/view.php?ud=20210525000824