explaingit

xiincs/claude-code-vision-skill

12PythonAudience · developerComplexity · 2/5Setup · moderate

TLDR

Adds image-analysis capability to Claude Code setups that lack vision support, by routing image questions to an external vision-capable AI model such as GPT-4o, Qwen, or Doubao.

Mindmap

mindmap
  root((claude-code-vision-skill))
    What it does
      Routes images to vision AI
      Returns analysis text
    Supported Providers
      GPT-4o
      Qwen
      Doubao
    Setup
      Python install script
      API key config
    Use Cases
      Analyze screenshots
      Check UI layouts
      Read charts
Click or tap to explore — scroll the page freely

Code map

Detail Auto

An interactive map of this repo's files and how they connect — its source is parsed live in your browser. Click Visualize to build it.

filefunction / class

Things people build with this

USE CASE 1

Analyze UI screenshots or charts inside a text-only Claude Code session by routing them to GPT-4o, Qwen, or Doubao.

USE CASE 2

Automatically check the visual layout of web pages by pairing this skill with a browser-harness skill.

USE CASE 3

Override the default vision provider at call time to switch between Doubao, Qwen, or GPT-4o for different tasks.

Tech stack

Python

Getting it running

Difficulty · moderate Time to first run · 30min

Requires an API key from one of three providers (Doubao, Qwen, or OpenAI GPT-4o) configured via environment variable before use.

No license information was mentioned in the explanation.

In plain English

Claude Code is a coding assistant that can run as the underlying AI model for various tasks. Some versions of Claude Code are backed by models that lack the ability to process images, meaning they can only work with text. This project adds image analysis capability to those setups by routing image questions to a separate AI model that does support visual input. The idea is straightforward: when you have an image you want Claude Code to analyze, such as a screenshot, a user interface layout, or a chart, this skill passes that image to a vision-capable model and returns the result. The supported models come from three providers: Doubao (a Chinese AI service), Qwen (another Chinese AI service from Alibaba), and OpenAI's GPT-4o. You configure which one to use by setting an API key in an environment variable. Installation is handled by a Python script that walks through the necessary steps: asking which provider you want to use, setting up the API key, and updating a configuration file that Claude Code reads when starting up. The configuration update inserts skill instructions into a global Claude Code settings file, with markers so the content can be replaced cleanly on future updates. Once installed, the skill can be called from the command line with an image file and a question in plain text. You can also specify a particular provider at call time if you want to override the default. The README mentions that this skill is designed to pair with a separate browser-harness skill for checking the visual layout of web pages automatically. The README is written primarily in Chinese, reflecting its intended audience. The project is small: one installation script, one vision script, and a skill definition file.

Copy-paste prompts

Prompt 1
Using the claude-code-vision-skill, analyze this screenshot of my app's login page and list any visual issues you notice: [attach screenshot.png]
Prompt 2
Set up claude-code-vision-skill with GPT-4o as the provider, then run it on a bar chart image and extract the data values shown.
Prompt 3
Show me how to call the claude-code-vision-skill command-line tool with the --provider flag to switch from Qwen to GPT-4o for a single image.
Prompt 4
Modify the claude-code-vision-skill install script to add support for a fourth vision provider with a custom API endpoint.
Open on GitHub → Explain another repo

← xiincs on gitmyhub — every repo by this author, as a profile.

Verify against the repo before relying on details.