DeepSeek LLM is a collection of open-weight AI language models released by DeepSeek AI for research and commercial use. The models answer questions, write code, solve math problems, and hold conversations in both English and Chinese. They were trained from scratch on two trillion tokens, the small units of text a model reads during training.

Two sizes are available: a 7 billion parameter version that runs on modest hardware, and a 67 billion parameter version that competes with large models from other organizations. Each size comes in two forms: a base model trained on general text, and a chat model fine-tuned to follow instructions and hold conversations. The 67B chat model scores well on coding and mathematics tasks, including a Hungarian national high school exam it had not seen during training.

The models are hosted on Hugging Face, a platform where AI models are shared and downloaded. You load a model with the Python library Transformers, pass in your text, and receive a response. The README includes short code examples showing how to load a model and generate output, and how to run a multi-turn conversation with the chat version.

The repository includes evaluation results comparing DeepSeek LLM against other publicly available models on reasoning, reading comprehension, coding, and Chinese language benchmarks. It also hosts intermediate training checkpoints, which researchers can download from cloud storage to study how the model's capabilities developed over the course of training.

The code in the repository is released under the MIT license, while a separate model license governs use of the model weights. Commercial use is permitted under the model license terms.
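The loading and multi-turn chat flow described above can be sketched roughly as follows. This is a minimal illustration, not the README's own example: the model id `deepseek-ai/deepseek-llm-7b-chat` is an assumption to confirm on Hugging Face, the helper names `add_turn`, `load`, `reply`, and `demo` are hypothetical, and `device_map="auto"` assumes the `accelerate` package is installed.

```python
from typing import Dict, List

# Assumed Hugging Face model id; confirm the exact name before use.
MODEL_ID = "deepseek-ai/deepseek-llm-7b-chat"

Message = Dict[str, str]


def add_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history list with one more message appended."""
    # Only user and assistant turns are used in this sketch.
    if role not in ("user", "assistant"):
        raise ValueError(f"unexpected role: {role!r}")
    return history + [{"role": role, "content": content}]


def load(model_id: str = MODEL_ID):
    """Load the tokenizer and model; the first call downloads the weights."""
    # Imported here so the pure helper above works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # assumes the accelerate package is available
    )
    return tokenizer, model


def reply(tokenizer, model, history: List[Message], max_new_tokens: int = 100) -> str:
    """Generate the assistant's next message for the given conversation."""
    inputs = tokenizer.apply_chat_template(
        history,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)


def demo() -> None:
    """Run a two-turn conversation, feeding the first answer back as context."""
    tokenizer, model = load()
    history = add_turn([], "user", "Who are you?")
    answer = reply(tokenizer, model, history)
    history = add_turn(history, "assistant", answer)
    history = add_turn(history, "user", "Please answer again in Chinese.")
    print(reply(tokenizer, model, history))
```

Calling `demo()` runs the exchange end to end. The multi-turn behavior comes entirely from resending the growing `history` list on each call, so the model sees earlier turns as context.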
Verify against the repo before relying on details.