eleutherai/gpt-neox

★ 7,429 | Python | Audience · researcher | Complexity · 5/5 | Setup · hard

Mindmap

mindmap
  root((gpt-neox))
  What it does
    Train large LLMs
    Multi-GPU coordination
    Not for inference
  Tech stack
    Python
    PyTorch
    DeepSpeed
    Megatron-LM
  Models and configs
    GPT-NeoX-20B
    Pythia suite
    Falcon and LLaMA configs
  Infrastructure
    Slurm clusters
    AWS and CoreWeave
    Supercomputers


Things people build with this

USE CASE 1

Train a large language model from scratch on a GPU cluster using predefined configs for Pythia, LLaMA, or Falcon

USE CASE 2

Fine-tune an existing model using preference learning methods on cloud infrastructure like AWS or CoreWeave

USE CASE 3

Run a distributed training job on a supercomputer with Slurm integration and MPI coordination
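For the Slurm use case, each launched process needs to know its position in the job. The following is a minimal sketch, not GPT-NeoX's own launcher code: it assumes one task per GPU started with `srun`, and reads the standard Slurm variables `SLURM_PROCID`, `SLURM_NTASKS`, and `SLURM_LOCALID`. The `RankInfo` class and `rank_info_from_slurm` function are hypothetical names introduced for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RankInfo:
    """Position of one process inside a distributed job (illustrative type)."""
    rank: int        # global index of this process across all nodes
    world_size: int  # total number of processes in the job
    local_rank: int  # index of this process on its own node


def rank_info_from_slurm(env: Optional[Mapping[str, str]] = None) -> RankInfo:
    """Derive rank information from Slurm's per-task environment variables.

    Assumes the job step was launched with one task per GPU.
    """
    env = os.environ if env is None else env
    return RankInfo(
        rank=int(env["SLURM_PROCID"]),
        world_size=int(env["SLURM_NTASKS"]),
        local_rank=int(env["SLURM_LOCALID"]),
    )
```

Passing a plain dictionary instead of the real environment makes the function easy to check off-cluster. How GPT-NeoX itself wires these values into its launcher should be confirmed in the repository.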

Tech stack

Python · PyTorch · DeepSpeed · Megatron-LM · CUDA · Slurm · MPI

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires a multi-GPU cluster with CUDA. It is designed for research organizations with large-scale compute, not individual developers.

In plain English

GPT-NeoX is a Python library built by EleutherAI for training very large language models from scratch on clusters of GPUs. A language model is the kind of AI system that powers tools like ChatGPT, capable of generating and understanding text. Training one from scratch requires enormous amounts of compute and careful coordination across many machines running in parallel.

GPT-NeoX is designed for that process, not for running or chatting with a pre-existing model. The README explicitly states that if you are not trying to train a model with billions of parameters from scratch, this is probably the wrong library to use, and recommends the Hugging Face transformers library for general inference needs instead.

The library builds on top of two other systems: NVIDIA Megatron-LM and Microsoft DeepSpeed, both of which handle splitting a model across many GPUs and coordinating the training process. GPT-NeoX adds its own optimizations on top of those, including support for a wider range of hardware configurations and cluster management tools such as Slurm and MPI. It has been run at scale on cloud providers like AWS and CoreWeave, as well as on government supercomputers including Oak Ridge National Lab systems and the LUMI system in Finland.

The project was used to train several published open-source models, including GPT-NeoX-20B and the Pythia suite. It ships with predefined configurations for popular architectures including Pythia, PaLM, Falcon, and LLaMA 1 and 2. More recent additions include Mixture-of-Experts support, AMD GPU support, and preference learning methods for fine-tuning.

This is primarily a research and engineering tool for organizations with access to large GPU clusters. It is maintained by EleutherAI, a nonprofit AI research organization. The full README is longer than what was shown.
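To make "splitting a model across many GPUs" concrete, here is a minimal sketch of the arithmetic behind combined tensor, pipeline, and data parallelism as used in Megatron-LM and DeepSpeed style training. It is an illustration under stated assumptions, not code from GPT-NeoX: the function names are hypothetical, and the batch-size relation follows DeepSpeed's documented rule that the global batch equals the per-GPU micro-batch times the gradient accumulation steps times the number of data-parallel replicas.

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return how many full model replicas fit on `world_size` GPUs.

    Each replica occupies tensor_parallel * pipeline_parallel GPUs;
    the remaining factor of the GPU count is the data-parallel degree.
    """
    if world_size <= 0 or tensor_parallel <= 0 or pipeline_parallel <= 0:
        raise ValueError("all sizes must be positive integers")
    gpus_per_replica = tensor_parallel * pipeline_parallel
    if world_size % gpus_per_replica != 0:
        raise ValueError(
            f"{world_size} GPUs cannot be divided into replicas of {gpus_per_replica} GPUs"
        )
    return world_size // gpus_per_replica


def global_batch_size(micro_batch_per_gpu: int, grad_accum_steps: int, dp_size: int) -> int:
    """Number of samples consumed per optimizer step across the whole job."""
    return micro_batch_per_gpu * grad_accum_steps * dp_size


if __name__ == "__main__":
    # Hypothetical job: 16 GPUs, each replica split 2 ways by tensor and 4 ways by pipeline.
    dp = data_parallel_size(world_size=16, tensor_parallel=2, pipeline_parallel=4)
    print("data-parallel replicas:", dp)
    print("global batch size:", global_batch_size(4, 8, dp))
```

With 16 GPUs, a tensor-parallel degree of 2, and a pipeline-parallel degree of 4, each replica uses 8 GPUs, which leaves 2 data-parallel replicas. The numbers are examples only; the actual configuration keys and defaults should be taken from the repository.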

Copy-paste prompts

Prompt 1

Set up GPT-NeoX to train a Pythia-410M model on my 8-GPU cluster using DeepSpeed ZeRO stage 3

Prompt 2

Modify a GPT-NeoX predefined config to train a Falcon architecture model on AWS with 16 A100 GPUs

Prompt 3

How do I enable Mixture-of-Experts support in GPT-NeoX and configure the number of experts for a 7B model?

Prompt 4

Configure GPT-NeoX to run RLHF preference fine-tuning on an existing pretrained checkpoint


Verify against the repo before relying on details.