deepseek-ai/deepseek-v3

★ 103,409PythonAudience · researcherComplexity · 5/5LicenseSetup · hard

Mindmap

mindmap
  root((DeepSeek-V3))
    What it does
      Open LLM
      Chat and coding
      API alternative
    Architecture
      Mixture of Experts
      671B total params
      37B active params
      128K context
    Training
      FP8 precision
      14.8T tokens
      RL fine-tuning
    Use Cases
      Local deployment
      AI research
      Product integration

mindmap root((DeepSeek-V3)) What it does Open LLM Chat and coding API alternative Architecture Mixture of Experts 671B total params 37B active params 128K context Training FP8 precision 14.8T tokens RL fine-tuning Use Cases Local deployment AI research Product integration

Click or tap to explore — scroll the page freely

Things people build with this

USE CASE 1

Run a state-of-the-art AI chat model on your own servers without paying per-call API fees to a commercial provider.

USE CASE 2

Research Mixture-of-Experts architecture by studying how the model activates only 37B of its 671B parameters per query.

USE CASE 3

Integrate a strong open-weights model into your product as a drop-in alternative to closed-source commercial APIs.

Tech stack

Python

Getting it running

Difficulty · hard Time to first run · 1day+

Full model requires a large multi-GPU cluster, quantized versions reduce but do not eliminate the significant hardware requirements.

Code is MIT-licensed, model weights are under a separate model license agreement, check Hugging Face for the weight usage terms before commercial use.

In plain English

DeepSeek-V3 is an open-source large language model, the kind of AI that powers chat assistants. The repository contains the model's description, technical paper, instructions for downloading the weights, evaluation results, and code for running the model locally. Its main appeal, per the README, is that it is a very large model whose performance is comparable to leading closed-source models, yet it was trained at a much lower cost and is released under an open license. Under the hood, DeepSeek-V3 uses a Mixture-of-Experts design (often shortened to MoE). The model has 671 billion total parameters, but for any single piece of input only 37 billion are actually used, which keeps the cost of generating each answer low while still letting the overall model be very capable. The README explains that the architecture builds on a previous version (DeepSeek-V2) and adds an auxiliary-loss-free load-balancing strategy and a Multi-Token Prediction training objective for better performance. The training was done in FP8 mixed precision, a numerical format that uses less memory, on 14.8 trillion tokens of text. After pre-training, the model went through supervised fine-tuning and reinforcement learning, including distilling reasoning ability from DeepSeek's R1 reasoning model. The context window is 128K tokens. You would use DeepSeek-V3 if you want a strong open-weights model to run locally or integrate into your own product, if you are doing research on MoE architectures, or if you want to evaluate a state-of-the-art model without relying on a commercial API. The code is released under an MIT license, while the weights are under a separate model agreement, with downloads on Hugging Face.

Copy-paste prompts

Prompt 1

Walk me through downloading DeepSeek-V3 weights from Hugging Face and running inference on a multi-GPU server.

Prompt 2

How does DeepSeek-V3's Mixture-of-Experts design work, and what does the difference between 671B total and 37B active parameters mean for my hardware requirements?

Prompt 3

I want to fine-tune DeepSeek-V3 on a custom dataset, what are the minimum hardware requirements and where do I start?

Prompt 4

What quantization options are available for running DeepSeek-V3 on consumer hardware, and which gives the best quality-to-speed trade-off?

Prompt 5

Compare DeepSeek-V3's strengths on coding tasks vs GPT-4 so I can decide whether to use it in my project.

Open on GitHub → Explain another repo

← deepseek-ai on gitmyhub — every repo by this author, as a profile.

Verify against the repo before relying on details.