Transformer language model archive

GPT-2 code and models for unsupervised multitask learning.

GPT-2 is the OpenAI release behind “Language Models are Unsupervised Multitask Learners,” packaging transformer language model code, byte-pair encoding tools, generation scripts, and staged checkpoints from 124M to 1.5B parameters.

Transformer · WebText · Encoder · Sample Scripts · Model Card · Archive

Overview

A compact release that shaped modern text generation research.

01

Unsupervised multitask framing

GPT-2 demonstrated that a large language model trained on broad web text could perform many tasks through prompting, without task-specific supervised fine-tuning.
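
A minimal sketch of how such prompting looks in practice: tasks are posed as plain-text continuations rather than as fine-tuned task heads. The prompt wording below is an illustrative assumption, except "TL;DR:", which the paper describes as its zero-shot summarization cue.

    # Zero-shot tasks framed as text continuation; the model completes the prompt.
    # Wording is an illustrative assumption, apart from the paper's "TL;DR:" cue.
    article = "GPT-2 checkpoints were released in stages during 2019, ending with the 1.5B model."
    summarization_prompt = article + "\nTL;DR:"
    translation_prompt = "english: The model was released in stages.\nfrench:"
    question_prompt = "Q: How many GPT-2 checkpoint sizes does the repository reference?\nA:"
    print(summarization_prompt)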

02

Repository-first release

The official repository includes source code, model download tooling, byte-pair encoder files, interactive and unconditional sampling scripts, and domain-conditional sample examples.
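
A hedged sketch of the byte-pair encoder in use, assuming a checkout of the archived repository with the 124M files already downloaded into models/124M; the get_encoder signature shown matches the final archived revision and may differ in earlier ones.

    # Load the repository's byte-level BPE encoder and round-trip a string.
    import sys
    sys.path.insert(0, "src")         # src/encoder.py ships with the repository
    import encoder

    enc = encoder.get_encoder("124M", "models")
    ids = enc.encode("Language models are unsupervised multitask learners.")
    print(ids)                        # token ids from the 50,257-entry vocabulary
    print(enc.decode(ids))            # decodes back to the original string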

03

Staged checkpoint history

The public release moved through multiple checkpoint sizes, culminating in the 1.5B parameter model and the accompanying model card with intended-use and limitation notes.

Released Models

Four checkpoint sizes, one transformer language model family.

The GPT-2 repository references 124M, 355M, 774M, and 1.5B parameter model directories. The staged approach made it easier to study capability scaling, output quality, compute requirements, and safety tradeoffs across model sizes. A rough parameter-count sketch follows the size list below.

124M

Smallest released checkpoint, practical for quick local experiments and repository smoke tests.

355M

Mid-size checkpoint released after the initial public code and sample release.

774M

Larger checkpoint that expanded public access before the full model release.

1.5B

Full-size GPT-2 model release, paired with the model card and broader safety discussion.
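
A back-of-the-envelope parameter-count sketch from the published configurations (layers and embedding width per size, a 50,257-token vocabulary, a 1,024-token context); the per-block estimate ignores biases and layer norms, so the totals are approximate.

    # Approximate parameter counts: ~12 * n_embd^2 per transformer block
    # (attention + MLP), plus token and position embeddings.
    configs = {
        "124M": (12, 768),
        "355M": (24, 1024),
        "774M": (36, 1280),
        "1.5B": (48, 1600),
    }
    VOCAB, CONTEXT = 50257, 1024

    for name, (n_layer, n_embd) in configs.items():
        blocks = 12 * n_layer * n_embd ** 2
        embeddings = (VOCAB + CONTEXT) * n_embd
        print(f"{name}: ~{(blocks + embeddings) / 1e6:.0f}M parameters")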

Repository Workflow

What the original project helps you run.

Install dependencies

The repository documents Python 3 setup and dependency installation before downloading model checkpoints.
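
A minimal environment check before running the archived scripts, assuming the documented Python 3 setup and a separately installed TensorFlow 1.x build alongside requirements.txt.

    # Confirm the interpreter and TensorFlow are in place before downloading models.
    import sys

    assert sys.version_info[0] == 3, "the GPT-2 scripts expect Python 3"

    try:
        import tensorflow as tf
        print("TensorFlow", tf.__version__)   # the archived code targets 1.x releases
    except ImportError:
        print("Install TensorFlow 1.x, then: pip3 install -r requirements.txt")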

Download checkpoints

A download script retrieves model files for a chosen size such as 124M, 355M, 774M, or 1.5B.
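
A hedged sketch of driving the documented download script from Python; the shell form in the repository's instructions is python3 download_model.py 124M, run from the repository root.

    # Fetch one checkpoint size; files land under models/<size>/ (encoder.json,
    # vocab.bpe, hparams.json, and the TensorFlow checkpoint files).
    import subprocess
    import sys

    MODEL_SIZE = "124M"                        # also accepts 355M, 774M, 1.5B
    subprocess.run([sys.executable, "download_model.py", MODEL_SIZE], check=True)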

Generate samples

Interactive and unconditional sample scripts expose the core text generation behavior from the released models.
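
A hedged sketch of invoking the two sampling scripts, assuming dependencies and the 124M checkpoint are in place; the flags mirror the documented usage, and exact defaults may differ across revisions.

    # Unconditional generation: the model free-runs with top-k sampling.
    import subprocess
    import sys

    subprocess.run(
        [sys.executable, "src/generate_unconditional_samples.py",
         "--model_name", "124M", "--nsamples", "1", "--top_k", "40"],
        check=True,
    )

    # Interactive conditional generation: prompts are typed at the console.
    subprocess.run(
        [sys.executable, "src/interactive_conditional_samples.py",
         "--model_name", "124M", "--top_k", "40"],
        check=True,
    )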

Study limitations

The model card describes intended use, out-of-scope use, dataset context, and known biases or misuse risks.

Safety Context

GPT-2 is useful historical infrastructure, not a neutral source of truth.

Generated text can be false

GPT-2 predicts plausible continuations, so outputs can sound coherent while inventing facts, citations, events, or claims.

WebText carries bias

The model reflects patterns in its training distribution, including social bias, toxic language, and uneven coverage across topics and groups.

Use with clear boundaries

Research demos, educational exploration, and controlled text generation are better fits than high-stakes decisions or identity-sensitive inference.

FAQ

Fast answers for understanding the GPT-2 repository.

What is GPT-2?

GPT-2 is a transformer language model release from OpenAI, associated with the paper “Language Models are Unsupervised Multitask Learners.”

What model sizes were released?

The repository references four model sizes: 124M, 355M, 774M, and 1.5B parameters.

What is WebText?

WebText is the web-text training corpus used to train GPT-2, built from outbound Reddit links and filtered as described in the paper and model card.

Is the original repository still active?

The GitHub repository is archived. It remains useful as historical reference code and model-release documentation.