deepseek-ai/deepseek-r1

★ 92,019Audience · researcherComplexity · 4/5LicenseSetup · hard

Mindmap

mindmap
  root((deepseek-r1))
    What it does
      Step-by-step reasoning
      Maths and code tasks
      Reinforcement learning
    Model Family
      DeepSeek-R1-Zero
      DeepSeek-R1
      Distilled 1.5B to 70B
    Use Cases
      Local inference
      Fine-tuning
      Benchmarking
    Tech Stack
      Hugging Face weights
      Transformer models
    Audience
      AI researchers
      ML engineers

mindmap root((deepseek-r1)) What it does Step-by-step reasoning Maths and code tasks Reinforcement learning Model Family DeepSeek-R1-Zero DeepSeek-R1 Distilled 1.5B to 70B Use Cases Local inference Fine-tuning Benchmarking Tech Stack Hugging Face weights Transformer models Audience AI researchers ML engineers

Click or tap to explore — scroll the page freely

Things people build with this

USE CASE 1

Download and run a smaller distilled DeepSeek-R1 model locally to answer maths problems with step-by-step reasoning.

USE CASE 2

Fine-tune a distilled checkpoint on your own dataset to build a specialised reasoning assistant.

USE CASE 3

Study the paper and training methodology to understand how reinforcement learning can replace supervised fine-tuning for reasoning.

USE CASE 4

Benchmark the 70B model against other open models for code generation tasks.

Tech stack

PythonPyTorch

Getting it running

Difficulty · hard Time to first run · 1h+

Model weights are large and hosted on Hugging Face, running the full model requires significant GPU memory (80GB+ for the largest variant).

MIT licence, use freely for any purpose, including commercial, as long as you keep the copyright notice.

In plain English

DeepSeek-R1 is the public release of a family of large language models built by DeepSeek AI that are designed to be good at step-by-step reasoning, solving maths problems, writing code, and working through long chains of thought before producing an answer. The repository contains documentation, evaluation results, an accompanying paper, and links to download the actual model weights from Hugging Face. It does not contain the model itself as code, the heavy machine-learning weights live separately and are loaded by other software. The README describes the family in two parts. The first is the post-training method: rather than the usual approach of teaching the model with curated example answers (supervised fine-tuning) before reinforcement learning, the team applied reinforcement learning directly to a base model. The result, called DeepSeek-R1-Zero, learned to produce long chains of thought and self-check its answers, but suffered from issues like repetition and language mixing. DeepSeek-R1 adds "cold-start" data and additional stages to fix those issues. According to the README, DeepSeek-R1 reaches performance comparable to OpenAI-o1 on maths, code, and reasoning benchmarks. The second part is distillation: the team used data produced by DeepSeek-R1 to fine-tune smaller open-source models in sizes ranging from 1.5B up to 70B parameters, so users with less computing power can still benefit. Someone might use this repository to download the weights, run the models locally, study the paper, or fine-tune the smaller distilled checkpoints for specific tasks. The project is released under the MIT licence.

Copy-paste prompts

Prompt 1

I want to run DeepSeek-R1 locally. Which distilled model size should I pick for a machine with 16GB RAM, and how do I download and load it from Hugging Face?

Prompt 2

Write a Python script that loads DeepSeek-R1-Distill-Qwen-7B from Hugging Face and prompts it to solve a maths problem with step-by-step reasoning shown.

Prompt 3

Explain the difference between DeepSeek-R1-Zero and DeepSeek-R1 and when I would choose one over the other for fine-tuning.

Prompt 4

I want to fine-tune DeepSeek-R1-Distill-1.5B on a custom reasoning dataset. Walk me through the training setup including hardware requirements.

Open on GitHub → Explain another repo

← deepseek-ai on gitmyhub — every repo by this author, as a profile.

Verify against the repo before relying on details.