explaingit

chenfei-wu/taskmatrix

34,126 | Python | Audience · researcher | Complexity · 4/5 | Dormant | Setup · hard

TLDR

A research system that lets you edit and understand images through conversation by connecting a language model to specialized visual AI tools.

Mindmap

mindmap
  root((repo))
    What it does
      Chat-based image editing
      Multi-tool coordination
      Visual understanding
    Key components
      Language model backbone
      Visual foundation models
      Pluggable tool modules
    Use cases
      Image generation workflows
      Object detection tasks
      Interactive image editing
    Tech stack
      Python framework
      CUDA GPU support
      Hugging Face models
    How it works
      Parse user intent
      Chain visual tools
      Return results

Things people build with this

USE CASE 1

Turn a photo into a sketch and then colorize it through natural conversation.

USE CASE 2

Find and segment all objects of a specific type (like cats) in an image by describing what you want.

USE CASE 3

Extend an image infinitely outward in any direction using chained inpainting and captioning.

USE CASE 4

Answer questions about image content and perform multi-step visual editing without writing code.

Tech stack

Python, CUDA, OpenAI API, Hugging Face, LangChain, Stable Diffusion, Segment Anything

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires a CUDA GPU, several large model downloads (Stable Diffusion, Segment Anything), an OpenAI API key, and LangChain orchestration setup.
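Before starting the large model downloads, it can help to confirm that the cheaper prerequisites are in place. The following is a minimal sketch and not part of the repository: the `preflight` function is a hypothetical helper, and it assumes the API key is supplied through the `OPENAI_API_KEY` environment variable and that an NVIDIA driver install puts `nvidia-smi` on the PATH.

```python
import os
import shutil
from typing import Dict, Mapping


def preflight(env: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Report which prerequisites appear to be present (hypothetical helper)."""
    return {
        # Assumption: the key is provided via the OPENAI_API_KEY variable.
        "openai_api_key": bool(env.get("OPENAI_API_KEY", "").strip()),
        # Finding nvidia-smi only suggests an NVIDIA driver is installed;
        # it does not prove that CUDA works with your PyTorch build.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{check}: {'ok' if ok else 'missing'}")
```

A passing check is only a first filter. Model weights, GPU memory, and library versions still need to be verified against the repository's own instructions.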

License could not be detected automatically. Check the repository's LICENSE file before use.

In plain English

TaskMatrix is a research project from Microsoft that connects a large language model (like ChatGPT) to a collection of specialized visual AI tools, allowing users to work with images through natural conversation. The core idea is that a general-purpose language model is very good at understanding instructions and planning but cannot directly manipulate images, while dedicated "visual foundation models" like Stable Diffusion, ControlNet, and BLIP are extremely good at image tasks but require specific prompts and programmatic calls. TaskMatrix acts as the bridge between these two worlds.

When you type a request such as "turn this photo into a sketch and then colorize it" or "find all the cats in this image and segment them," the language model interprets what you want, decides which sequence of visual tools to use, calls them in order, and returns the result as part of the conversation. Each visual capability (generating images from text, editing by instruction, extracting depth maps, answering questions about an image, detecting objects by description, and more) is wrapped as a pluggable module that you can load onto available GPU memory.

The project introduces a "template" concept, where complex multi-step workflows can be pre-defined and reused. For example, extending an image infinitely outward in any direction is handled by a template that chains together image captioning, inpainting, and visual question-answering models without any additional training.

You would use TaskMatrix if you are a researcher exploring how AI agents can coordinate multiple specialized models, or if you want to experiment with a conversational interface for sophisticated image editing and understanding tasks. It is a Python project that requires a CUDA GPU for most visual models, uses OpenAI's API for the language model backbone, and integrates tools from Hugging Face, LangChain, and Meta's Segment Anything Model.
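The interpret, plan, and chain loop described above can be illustrated with a toy sketch. Nothing here is taken from the TaskMatrix codebase: the `Tool` class, the `REGISTRY` and `TEMPLATES` dictionaries, and the keyword-matching `plan` function are all hypothetical stand-ins. In the real system a language model chooses the tools and each tool wraps an actual visual model; here each tool only renames the file so the chaining is visible.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Tool:
    """Hypothetical pluggable tool: a name, a description, and a callable."""
    name: str
    description: str
    run: Callable[[str], str]  # takes an image path, returns a new image path


def make_stub(suffix: str) -> Callable[[str], str]:
    # Stand-in for a real model call: it only records the step in the file name.
    def run(image_path: str) -> str:
        stem, ext = os.path.splitext(image_path)
        return f"{stem}_{suffix}{ext}"
    return run


REGISTRY: Dict[str, Tool] = {
    tool.name: tool
    for tool in (
        Tool("sketch", "Convert a photo into a line sketch", make_stub("sketch")),
        Tool("colorize", "Add color to a sketch", make_stub("color")),
        Tool("segment", "Segment objects matching a description", make_stub("mask")),
    )
}

# A "template" here is simply a named, reusable sequence of tool names.
TEMPLATES: Dict[str, List[str]] = {"sketch_then_color": ["sketch", "colorize"]}


def plan(request: str) -> List[str]:
    """Toy planner: pick tools whose names appear in the request, in order.

    Assumption: a real system would ask the language model to choose tools
    from their descriptions instead of matching keywords.
    """
    lowered = request.lower()
    hits = [(lowered.find(name), name) for name in REGISTRY if name in lowered]
    return [name for _, name in sorted(hits)]


def run_steps(steps: List[str], image_path: str) -> str:
    # Feed each tool's output into the next tool.
    for name in steps:
        image_path = REGISTRY[name].run(image_path)
    return image_path


def execute(request: str, image_path: str) -> Tuple[List[str], str]:
    steps = plan(request)
    return steps, run_steps(steps, image_path)


if __name__ == "__main__":
    print(execute("Turn this photo into a sketch and then colorize it", "photo.png"))
    print(run_steps(TEMPLATES["sketch_then_color"], "photo.png"))
```

The design point is that the planner only sees tool names and descriptions, so adding a capability means registering one more `Tool` and leaving the dispatch loop unchanged.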

Copy-paste prompts

Prompt 1
How do I set up TaskMatrix to connect ChatGPT with visual tools like Stable Diffusion and ControlNet for image editing?
Prompt 2
Show me how to create a custom template in TaskMatrix that chains multiple visual models together for a specific image workflow.
Prompt 3
How can I use TaskMatrix to build a conversational interface where users describe image edits in plain English?
Prompt 4
What visual foundation models does TaskMatrix support, and how do I add a new specialized tool to the system?
Prompt 5
Walk me through an example of TaskMatrix coordinating multiple AI models to perform a complex image task like infinite outpainting.

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.