intel/ipex-llm

★ 8,801
Python
Audience · researcher
Complexity · 4/5
Setup · hard

Mindmap

mindmap
  root((IPEX-LLM))
    Status
      Officially archived
      Known security issues
    What It Did
      LLM inference on Intel GPUs
      Model fine-tuning
      Quantization support
    Supported Hardware
      Intel Arc GPU
      Intel integrated graphics
      Core Ultra NPU
    Integrations
      Ollama and llama.cpp
      HuggingFace models
      LangChain and vLLM

Things people build with this

USE CASE 1

Research how Intel GPU acceleration was applied to running open-source LLMs like Llama and DeepSeek locally.

USE CASE 2

Study 4-bit and 8-bit quantization techniques for fitting large language models into consumer GPU memory.
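As a study aid for this use case, here is a minimal sketch of the generic technique behind 4-bit and 8-bit weight quantization: group-wise symmetric quantization, where each group of weights shares one floating-point scale and each weight is stored as a small signed integer. This is an illustration under stated assumptions, not IPEX-LLM's actual kernels or storage format. The function names and the default group size of 32 are hypothetical choices made for the example.

```python
import random
from typing import List, Tuple

# Hypothetical helper names; generic technique, NOT IPEX-LLM's real format.


def quantize_group(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric quantization of one group: signed ints plus one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit (the -8 / -128 code stays unused)
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid dividing by zero for all-zero groups
    # Round to the nearest integer level; the clamp only guards against float round-off.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats by multiplying each integer by the group scale."""
    return [v * scale for v in q]


def quantize(weights: List[float], bits: int, group_size: int = 32) -> List[Tuple[List[int], float]]:
    """Split a flat weight list into fixed-size groups and quantize each one separately."""
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


def dequantize(groups: List[Tuple[List[int], float]]) -> List[float]:
    """Concatenate the dequantized groups back into one flat list."""
    out: List[float] = []
    for q, scale in groups:
        out.extend(dequantize_group(q, scale))
    return out


def max_error(a: List[float], b: List[float]) -> float:
    """Largest absolute difference between two equal-length lists."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    random.seed(0)
    w = [random.gauss(0.0, 0.02) for _ in range(256)]  # toy stand-in for one weight row
    for bits in (8, 4):
        approx = dequantize(quantize(w, bits))
        print(f"{bits}-bit: max abs error {max_error(w, approx):.6f}")
```

Each 4-bit code takes half a byte plus a share of its group's scale, which is where the memory savings come from. Keeping a separate scale per group limits the damage from an outlier weight to the other weights in its own group, at the cost of storing more scales.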

Tech stack

Python, PyTorch, ONNX, Ollama, LangChain, vLLM

Getting it running

Difficulty · hard
Time to first run · 1 day+

The project is officially archived by Intel and has known security vulnerabilities, so it is not suitable for production use. It also requires Intel GPU hardware, such as an Arc, Flex, or Max GPU.

In plain English

Important note before anything else: this project has been officially archived by Intel. Intel states that it will no longer provide maintenance, bug fixes, or new releases, will not accept patches, and has identified the project as having known security issues. Anyone considering it for active use should treat it as unsupported.

While it was active, IPEX-LLM was a library for accelerating inference and fine-tuning of large language models on Intel hardware: Intel GPUs (including the Arc, Flex, and Max discrete GPU lines), Intel integrated graphics, and the neural processing unit (NPU) in newer Intel Core Ultra processors. The goal was to let people run capable models locally on consumer and workstation Intel hardware rather than relying on cloud services.

The library supported over 70 models, including well-known open-source families such as Llama, Mistral, DeepSeek, and Qwen. It also offered low-bit quantization (for example, 4-bit and 8-bit) to shrink models enough to fit within the limited memory of consumer graphics cards. It was designed to plug into popular existing tools such as Ollama, llama.cpp, HuggingFace models, LangChain, and vLLM, so developers could add Intel GPU acceleration with few changes to their existing code.

One notable feature was the ability to run very large models, such as DeepSeek's 671-billion-parameter models, on one or two Intel Arc graphics cards by splitting the workload; such models would otherwise require expensive enterprise hardware.

Because the project is archived and has known security issues, it is appropriate only for historical reference or research, not for production deployment.
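The description above leans on two memory ideas: quantization shrinks the weights, and splitting spreads them across devices. A back-of-envelope sketch makes the scale concrete. It assumes that every weight uses the same bit width and that the weights split evenly, and it ignores quantization scales, activations, and the KV cache. The helper names are hypothetical, and this is plain arithmetic, not a description of how IPEX-LLM actually placed a 671B model.

```python
# Back-of-envelope weight memory for quantized LLMs (illustrative arithmetic only).
# Assumptions: every weight uses the same bit width, weights split evenly across
# devices, and quantization scales, activations, and the KV cache are ignored.

GIB = 2 ** 30  # bytes per GiB


def weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / GIB


def per_device_gib(num_params: float, bits_per_weight: float, num_devices: int) -> float:
    """Idealized even split of the weights across num_devices."""
    return weight_gib(num_params, bits_per_weight) / num_devices


if __name__ == "__main__":
    for label, params in (("7B", 7e9), ("671B", 671e9)):
        for bits in (16, 8, 4):
            print(f"{label:>4} @ {bits:>2}-bit: {weight_gib(params, bits):7.1f} GiB")
    print(f"671B @ 4-bit over 2 devices: {per_device_gib(671e9, 4, 2):.1f} GiB each")
```

Under these assumptions, 4-bit weights for a 7B model come to about 3.3 GiB, while a 671B model still needs roughly 312 GiB, or about 156 GiB per device even with an ideal two-way split. That is far more than any single consumer graphics card holds, so check the repo's own documentation for how the 671B setup actually divided the work.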

Copy-paste prompts

Prompt 1

How did IPEX-LLM integrate with Ollama to provide Intel GPU acceleration for running local LLMs on Arc hardware?

Prompt 2

What quantization methods did IPEX-LLM support for reducing model size to fit on a consumer Intel Arc GPU?

Prompt 3

How did IPEX-LLM split a large model like DeepSeek 671B across two Intel Arc cards to run inference?

Prompt 4

What is the difference between how IPEX-LLM handled Intel discrete GPUs versus the NPU in Intel Core Ultra processors?

Verify against the repo before relying on details.