microsoft/jarvis

Analysis updated 2026-05-18

★ 24,693 | Python | Audience · researcher | Complexity · 4/5 | Setup · hard

Mindmap

mindmap
  root((repo))
    How it works
      ChatGPT plans tasks
      Selects expert models
      Runs in sequence
      Synthesizes results
    Key components
      Task planning stage
      Model selection stage
      Task execution stage
      Response generation
    Use cases
      Multi-step image tasks
      Complex AI workflows
      Model orchestration
    Tech stack
      Python
      PyTorch
      ChatGPT API
      Hugging Face models
    Setup requirements
      OpenAI API key
      Hugging Face account
      GPU memory optional

What do people build with it?

USE CASE 1

Automatically break down complex image tasks (like pose detection + image generation) and execute them end-to-end.

USE CASE 2

Build AI agent systems that coordinate multiple specialized models without manually writing orchestration code.

USE CASE 3

Research how large language models can act as central planners for multi-model AI workflows.

What is it built with?

Python | PyTorch | ChatGPT | Hugging Face

How does it compare?

	microsoft/jarvis	spotdl/spotify-downloader	agentscope-ai/agentscope
Stars	24,693	24,657	24,646
Language	Python	Python	Python
Setup difficulty	hard	moderate	moderate
Complexity	4/5	2/5	3/5
Audience	researcher	vibe coder	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1h+

Requires OpenAI API key, multiple Hugging Face model downloads, and PyTorch/CUDA setup for inference.

Use freely for any purpose including commercial, as long as you keep the copyright notice.

In plain English

JARVIS (also known as HuggingGPT) is a Microsoft research project that uses a large language model (LLM), specifically ChatGPT, as a central coordinator. It automatically plans and executes complex AI tasks by delegating work to specialized AI models hosted on Hugging Face.

Here is how it works: when you give JARVIS a complicated request like "describe the poses in this photo and generate a new image based on them," ChatGPT breaks the task into steps, selects the appropriate expert models from the Hugging Face model hub (for pose detection, image generation, etc.), runs them in the right order, collects the outputs, and synthesizes a final response. The LLM acts as the brain; the specialist models act as the hands.

The workflow has four stages: task planning (ChatGPT figures out what needs to be done), model selection (ChatGPT picks which Hugging Face models to use based on their descriptions), task execution (the models run), and response generation (ChatGPT summarizes the results). A lightweight mode exists that does not require downloading models locally.

Researchers studying AI agent architectures or multi-model orchestration would use this project. It requires an OpenAI API key and a Hugging Face account. The full local setup needs significant GPU memory (24 GB VRAM recommended) and disk space. It is built in Python with PyTorch.
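The four stages described above can be illustrated with a small sketch. This is not JARVIS's actual code: the `Task` class, the `MODEL_POOL` mapping, the model names, and every function below are hypothetical stand-ins, and the planner is hard-coded where a real system would call the LLM. It only shows the shape of the flow: plan, select a model per task, execute in dependency order, then summarize.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    id: int
    kind: str                      # e.g. "pose-detection"
    deps: list[int] = field(default_factory=list)  # ids of tasks this one needs


# Hypothetical model pool: task kind -> placeholder model name.
MODEL_POOL = {
    "pose-detection": "example/pose-model",
    "image-generation": "example/image-model",
}


def plan_tasks(request: str) -> list[Task]:
    """Stage 1 stand-in: a real system would ask the LLM for this plan."""
    return [
        Task(id=0, kind="pose-detection"),
        Task(id=1, kind="image-generation", deps=[0]),
    ]


def select_model(task: Task) -> str:
    """Stage 2 stand-in: look up a model by task kind."""
    return MODEL_POOL[task.kind]


def execute_task(model: str, inputs: list[str]) -> str:
    """Stage 3 stand-in: pretend to run the model and describe the call."""
    return f"{model}({', '.join(inputs)})"


def generate_response(request: str, results: dict[int, str]) -> str:
    """Stage 4 stand-in: list each task's output instead of calling the LLM."""
    lines = [f"Request: {request}"]
    for task_id in sorted(results):
        lines.append(f"Task {task_id}: {results[task_id]}")
    return "\n".join(lines)


def run_pipeline(request: str) -> str:
    pending = plan_tasks(request)
    results: dict[int, str] = {}
    while pending:
        # A task is ready once all of its dependencies have produced output.
        ready = [t for t in pending if all(d in results for d in t.deps)]
        if not ready:
            raise ValueError("cyclic or unsatisfiable task dependencies")
        for task in ready:
            # Tasks with no dependencies consume the original request.
            inputs = [results[d] for d in task.deps] or [request]
            results[task.id] = execute_task(select_model(task), inputs)
            pending.remove(task)
    return generate_response(request, results)


if __name__ == "__main__":
    print(run_pipeline("photo.jpg"))
```

The dependency list on each task is what lets the second step consume the first step's output, which is one simple way to model "runs them in the right order."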

Copy-paste prompts

Prompt 1

How do I set up JARVIS to take a photo, detect poses, and generate a new image based on those poses?

Prompt 2

Show me how to add a custom Hugging Face model to JARVIS's available model pool for task execution.

Prompt 3

What's the lightweight mode in JARVIS and how do I use it without downloading models locally?

Prompt 4

How does JARVIS decide which Hugging Face model to use for each step in a multi-stage task?

Frequently asked questions

What is jarvis?

A system that uses ChatGPT to coordinate and execute complex AI tasks by automatically selecting and running specialized models from Hugging Face.

What language is jarvis written in?

Mainly Python. The stack also includes PyTorch, ChatGPT, and Hugging Face models.

What license does jarvis use?

Use freely for any purpose including commercial, as long as you keep the copyright notice.

How hard is jarvis to set up?

Setup difficulty is rated hard, with an hour or more to a first successful run.

Who is jarvis for?

Mainly researchers.

Verify against the repo before relying on details.