optimalscale/lmflow

★ 8,487 | Python | Audience · researcher | Complexity · 4/5 | License · Apache 2.0 | Setup · hard

Mindmap

mindmap
  root((lmflow))
    Training methods
      Basic finetuning
      Instruction tuning
      Preference alignment
      Efficient inference
    Supported models
      Llama family
      Phi-3
      ChatGLM
      Baichuan
    Setup
      Python 3.9+
      pip install
      Multi-GPU via Accelerate
    Community
      Discord
      Slack
      WeChat


Things people build with this

USE CASE 1

Finetune a Llama or Phi-3 model on your own dataset to create a domain-specific AI assistant

USE CASE 2

Train a model to follow instructions more reliably using your own curated examples

USE CASE 3

Run efficient inference on finetuned models in memory-constrained GPU environments

Tech stack

Python · PyTorch · Hugging Face · Accelerate · CUDA

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires a CUDA-capable NVIDIA GPU and Python 3.9+. Multi-GPU training requires additional Accelerate configuration.
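The requirements above can be checked before installing anything. The following is a minimal sketch, not part of LMFlow: the `preflight` helper is hypothetical, and looking for `nvidia-smi` on the PATH is only a rough hint that an NVIDIA driver is present, not proof that CUDA works with PyTorch.

```python
import shutil
import sys

# Minimum interpreter version, taken from the stated requirement (Python 3.9+).
MIN_PYTHON = (3, 9)


def preflight(version_info=None):
    """Return basic environment checks as a dict (hypothetical helper).

    `version_info` defaults to the running interpreter; pass a tuple such as
    (3, 8) to check a different version.
    """
    version = tuple((version_info or sys.version_info)[:2])
    return {
        "python_ok": version >= MIN_PYTHON,
        # Assumption: nvidia-smi on PATH suggests an NVIDIA driver is installed.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, passed in preflight().items():
        print(f"{name}: {'ok' if passed else 'MISSING'}")
```

A failing `python_ok` means the interpreter is too old for the stated requirement; a missing `nvidia-smi` is worth investigating before attempting GPU training.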

Use freely for any purpose, including commercial use, as long as you include the Apache 2.0 license notice and copyright.

In plain English

LMFlow is a Python toolkit for taking an existing large language model, the kind that powers chat assistants and text generators, and training it further on your own data so it behaves differently or knows about a specific domain. This process is called finetuning. Rather than building a model from scratch, which requires enormous computing resources, finetuning starts from a model that already understands language and adjusts it with a smaller set of examples. LMFlow is designed to make that process more accessible to researchers and developers.

The toolkit supports a range of training approaches beyond basic finetuning. It includes methods for making models follow instructions better, for aligning model behavior with human preferences, for running models more efficiently in memory-constrained settings, and for speeding up text generation at inference time. Recent updates added support for a Hugging Face library called Accelerate, which handles running training across multiple GPUs or machines.

On the model side, LMFlow works with many of the well-known open-source language models, including the Llama family, ChatGLM, Baichuan, Phi-3, and others. Users provide their training data in a format the toolkit specifies, configure a shell script with their chosen options, and run the training job. The README and documentation cover dataset formatting, conversation templates for different model families, and scripts for various training configurations.

The project also released its own finetuned models under the name Robin, based on Llama, and published benchmarks comparing open-source chat models. A web demo is available at lmflow.com.

The codebase is open source under the Apache 2.0 license, requires Python 3.9 or later, and can be installed via pip. Community support is available through Discord, Slack, and WeChat channels. The project received a Best Demo Paper award at the NAACL 2024 academic conference.
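The data-preparation step mentioned above (providing training data in a format the toolkit specifies) can be sketched as follows. This is a minimal illustration under stated assumptions: the top-level `type` and `instances` keys, the `text2text` type name, and the `input`/`output` field names are assumed here and should be checked against the LMFlow dataset documentation. The helper names and the output path are hypothetical.

```python
import json
from pathlib import Path


def build_dataset(pairs):
    """Wrap (input, output) string pairs in a JSON-serializable dict.

    Assumed layout (verify against the LMFlow docs): a top-level type tag
    plus a list of instances, each with "input" and "output" fields.
    """
    instances = [{"input": prompt, "output": answer} for prompt, answer in pairs]
    return {"type": "text2text", "instances": instances}


def write_dataset(pairs, path):
    """Write the dataset to `path` as UTF-8 JSON and return the Path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(build_dataset(pairs), ensure_ascii=False, indent=2)
    path.write_text(payload, encoding="utf-8")
    return path


if __name__ == "__main__":
    # Illustrative examples only; replace with your own curated data.
    examples = [
        ("How do I reset my password?", "Open Settings, then choose Reset password."),
        ("Where can I download my invoice?", "Invoices are listed under Billing."),
    ]
    out = write_dataset(examples, "data/support/train.json")
    print(f"wrote {len(examples)} instances to {out}")
```

Keeping dataset construction separate from file writing makes it easy to validate the structure in memory before anything is written, which helps if the expected field names turn out to differ.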

Copy-paste prompts

Prompt 1

Write a shell script using LMFlow to finetune Llama-3 on a custom dataset of customer support conversations in the LMFlow format

Prompt 2

How do I format my training data as a JSON dataset for LMFlow instruction-following finetuning?

Prompt 3

Show me how to configure LMFlow with Hugging Face Accelerate to run finetuning across two NVIDIA GPUs

Prompt 4

How do I reduce GPU memory usage when running LMFlow inference on a finetuned model?


Verify against the repo before relying on details.