nlpxucan/wizardlm

★ 9,482 | Python | Audience · researcher | Complexity · 5/5 | License | Setup · hard

Mindmap

mindmap
  root((wizardlm))
    Models
      WizardLM general
      WizardCoder coding
      WizardMath math
    Training Method
      Evol-Instruct
      Auto-generated data
    Tech Stack
      Python
      PyTorch
      HuggingFace
    Use Cases
      Local AI assistant
      Math tutoring
      Research reproduction


Things people build with this

USE CASE 1

Download WizardCoder to run a local AI coding assistant that generates and explains code without sending data to a cloud service.

USE CASE 2

Use WizardMath to build a math tutoring tool that solves grade-school and competition problems step by step.

USE CASE 3

Run the Evol-Instruct training scripts to reproduce the method and apply it to your own custom dataset.

Tech stack

Python · PyTorch · Transformers

Getting it running

Difficulty · hard | Time to first run · 1 day+

Running the 33B or 70B parameter models requires one or more high-memory GPUs; even the smaller sizes need significant VRAM.
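To get a feel for the hardware requirement, the sketch below estimates memory for the model weights alone as parameter count times bytes per parameter. The function name `weight_memory_gb` and the precision table are illustrative assumptions, not part of the repository, and the estimate ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
# Rough, weights-only memory estimate. This is a back-of-the-envelope sketch:
# activations, KV cache, and framework overhead are NOT included.

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, dtype: str = "fp16") -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    One billion parameters at one byte each is 1 GB, so the result is
    simply the parameter count in billions times bytes per parameter.
    """
    return params_billions * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    # Sizes mentioned on this page: the smallest (1B) and the two largest.
    for size in (1, 33, 70):
        row = ", ".join(
            f"{dtype}: {weight_memory_gb(size, dtype):.1f} GB"
            for dtype in ("fp16", "int8", "int4")
        )
        print(f"{size}B -> {row}")
```

By this estimate, a 33B model in fp16 needs about 66 GB for weights alone, which is more than a single consumer GPU offers; lower-precision formats reduce the footprint at some cost in output quality.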

The code is Apache 2.0 (free for commercial use); the training data is Creative Commons BY-NC 4.0 (research and non-commercial use only).

In plain English

WizardLM is a research project from Microsoft that produced a family of AI language models trained to follow complex instructions more reliably than earlier models of similar size. The project contains three distinct models: WizardLM for general conversation and instruction following, WizardCoder for writing and understanding code, and WizardMath for solving math problems. All three are built using a method the team calls Evol-Instruct, in which a simpler set of training examples is automatically expanded into a larger, more varied, and more challenging set by having an AI generate progressively harder versions of each example.

WizardCoder is the most prominent part of the repository in terms of benchmark results. As of early 2024, the 33-billion-parameter version achieved scores on standard coding benchmarks that the team reported as competitive with or surpassing GPT-3.5-Turbo and Gemini Pro. WizardMath similarly focuses on grade-school and competition-style math problems, with the 70-billion-parameter version outperforming GPT-3.5 on one benchmark (GSM8K) at the time of release. WizardLM itself targets general complex instructions and was accepted as a paper at ICLR 2024.

All three model families are available for download from HuggingFace. The models come in several sizes, ranging from 1 billion to 70 billion parameters, so users with different hardware can choose a version that fits their available memory and compute. The underlying base models include Llama, Mistral, and DeepSeek-Coder, depending on the version.

The code in the repository covers training scripts for reproducing the Evol-Instruct process and evaluation scripts for running the benchmarks. It requires Python 3.9 or later. Data produced by the project is licensed under Creative Commons BY-NC 4.0, meaning it can be used for research and non-commercial purposes. The code itself is Apache 2.0 licensed. The project has a Discord community and a homepage with additional details.

Development appears to have been most active between 2023 and early 2024, corresponding to the period when these benchmarks were published.
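The Evol-Instruct idea described above, expanding a seed set by repeatedly asking a model for a harder version of each example, can be sketched roughly as follows. This is a minimal illustration and not the repository's training code: the prompt wording in `EVOLVE_TEMPLATE`, the function `evolve_dataset`, and the `rewrite` callable (standing in for a call to a language model) are all hypothetical names introduced here.

```python
from typing import Callable, List

# Hypothetical prompt wording; the project's actual evolution prompts may differ.
EVOLVE_TEMPLATE = (
    "Rewrite the following instruction into a more complex version that is "
    "still answerable.\n\nInstruction: {instruction}"
)


def evolve_dataset(
    seeds: List[str],
    rewrite: Callable[[str], str],
    rounds: int = 3,
) -> List[str]:
    """Expand `seeds` by asking `rewrite` for a harder version each round.

    `rewrite` is any function that takes a prompt string and returns the
    model's rewritten instruction. Each round evolves the previous round's
    output, so difficulty is meant to increase step by step.
    """
    dataset = list(seeds)   # everything collected so far, seeds included
    seen = set(seeds)       # used to drop exact duplicates
    current = list(seeds)   # instructions to evolve in the next round
    for _ in range(rounds):
        next_round = []
        for instruction in current:
            evolved = rewrite(EVOLVE_TEMPLATE.format(instruction=instruction)).strip()
            # Skip empty or repeated outputs, a very simple quality filter.
            if evolved and evolved not in seen:
                seen.add(evolved)
                dataset.append(evolved)
                next_round.append(evolved)
        current = next_round
    return dataset


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: appends an extra requirement."""
    instruction = prompt.split("Instruction: ", 1)[1]
    return instruction + " Also justify each step."


if __name__ == "__main__":
    for item in evolve_dataset(["Add 2 and 3."], fake_llm, rounds=2):
        print(item)
```

In practice `rewrite` would wrap a real model call, and a stronger filter than exact-duplicate removal would likely be needed to discard failed or degenerate rewrites.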

Copy-paste prompts

Prompt 1

I want to run WizardCoder-33B locally on my GPU, what hardware do I need and how do I set it up with the Transformers library?

Prompt 2

Using the WizardMath model from HuggingFace, write a Python script that takes a math word problem as input and returns a step-by-step solution.

Prompt 3

Explain the Evol-Instruct method and show me how to apply it to generate progressively harder versions of my own training examples.

Prompt 4

How do I run WizardCoder's evaluation scripts to benchmark it on HumanEval and compare the results to the reported scores?


Verify against the repo before relying on details.