Train a large language model from scratch on a GPU cluster using predefined configs for Pythia, LLaMA, or Falcon
Fine-tune an existing model using preference learning methods on cloud infrastructure like AWS or CoreWeave
Run a distributed training job on a supercomputer with Slurm integration and MPI coordination
Requires a multi-GPU cluster with CUDA. Designed for research organizations with large-scale compute, not individual developers.
GPT-NeoX is a Python library built by EleutherAI for training very large language models from scratch on clusters of GPUs. A language model is the kind of AI system, capable of generating and understanding text, that powers tools like ChatGPT. Training one from scratch requires enormous amounts of compute and careful coordination across many machines running in parallel. GPT-NeoX is designed for that process, not for running or chatting with a pre-existing model. The README explicitly states that if you are not trying to train a model with billions of parameters from scratch, this is probably the wrong library to use, and it recommends the Hugging Face transformers library for general inference needs instead.

The library builds on two other systems, NVIDIA Megatron-LM and Microsoft DeepSpeed, both of which handle splitting a model across many GPUs and coordinating the training process. GPT-NeoX adds its own optimizations on top of those, including support for a wider range of hardware configurations and for cluster management tools such as Slurm and MPI. It has been run at scale on cloud providers like AWS and CoreWeave, as well as on government supercomputers including Oak Ridge National Lab systems and the LUMI system in Finland.

The project was used to train several published open-source models, including GPT-NeoX-20B and the Pythia suite. It ships with predefined configurations for popular architectures including Pythia, PaLM, Falcon, and LLaMA 1 and 2. More recent additions include Mixture-of-Experts support, AMD GPU support, and preference learning methods for fine-tuning.

This is primarily a research and engineering tool for organizations with access to large GPU clusters. It is maintained by EleutherAI, a nonprofit AI research organization.
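To make "splitting a model across many GPUs" a little more concrete, here is a minimal sketch of the bookkeeping involved. It assumes the common scheme in which tensor parallelism and pipeline parallelism divide one copy of the model, and the remaining GPUs hold additional copies (data parallelism). The summary above does not spell out this scheme, and the function and parameter names are hypothetical illustrations, not part of the GPT-NeoX, Megatron-LM, or DeepSpeed APIs.

```python
def data_parallel_degree(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return how many full model replicas fit on `world_size` GPUs.

    Hypothetical helper for illustration only. One replica occupies
    tensor_parallel * pipeline_parallel GPUs; whatever is left over is
    used for additional replicas that each see different data.
    """
    if min(world_size, tensor_parallel, pipeline_parallel) < 1:
        raise ValueError("all sizes must be positive integers")

    gpus_per_replica = tensor_parallel * pipeline_parallel
    if world_size % gpus_per_replica != 0:
        raise ValueError(
            f"{world_size} GPUs cannot be divided evenly into replicas "
            f"of {gpus_per_replica} GPUs each"
        )
    return world_size // gpus_per_replica


if __name__ == "__main__":
    # Assumed example cluster: 96 GPUs, each layer split 2 ways,
    # the layer stack split into 4 pipeline stages.
    replicas = data_parallel_degree(world_size=96, tensor_parallel=2, pipeline_parallel=4)
    print(f"model replicas: {replicas}")
```

The divisibility check reflects why cluster size and parallelism settings have to be chosen together: a layout that does not divide the GPU count evenly leaves hardware idle or cannot be launched at all.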
Verify against the repo before relying on details.