explaingit

github-linguist/linguist

13,467 Ruby Audience · developer Complexity · 3/5 Setup · moderate

TLDR

Linguist is the Ruby library GitHub uses internally to detect programming languages in a repo, generate the colored language bar, and control syntax highlighting.

Mindmap

mindmap
  root((linguist))
    What it does
      Language detection
      GitHub language bar
      Syntax highlighting
    Features
      Per-file breakdown
      JSON output
      Vendor file filtering
    Tech stack
      Ruby gem
      C extensions
      git history reader
    Use cases
      Repo language stats
      Single file detection
      CI scripting
    Config
      .gitattributes overrides
      Per-revision analysis

Things people build with this

USE CASE 1

Run Linguist on a git repository to get a percentage breakdown of which programming languages are used.

USE CASE 2

Detect the programming language and MIME type of a single file from the command line.

USE CASE 3

Override Linguist's language guesses for specific files in your repo using a .gitattributes file.

USE CASE 4

Emit language stats as JSON for use in CI pipelines or custom scripts.
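
For the CI use case, one possible approach is to feed the JSON output into a short Ruby script. The sketch below assumes the output is a JSON object keyed by language name, with `size` and `percentage` fields for each language. That shape, the sample data, and the helper names `language_share` and `ruby_majority?` are assumptions made for illustration, so check the actual output of your installed version first.

```ruby
require "json"

# Hypothetical sample standing in for Linguist's JSON output.
# The field names ("size", "percentage") are an assumed shape, not a guarantee.
SAMPLE_OUTPUT = <<~JSON
  {"Ruby": {"size": 7000, "percentage": "70.00"},
   "C": {"size": 2500, "percentage": "25.00"},
   "Shell": {"size": 500, "percentage": "5.00"}}
JSON

# Returns the share of `language` as a Float, or 0.0 if the language is absent.
def language_share(json_text, language)
  stats = JSON.parse(json_text)
  entry = stats[language]
  return 0.0 if entry.nil?

  entry["percentage"].to_f
end

# Example CI gate: true when Ruby makes up at least `threshold` percent.
def ruby_majority?(json_text, threshold: 50.0)
  language_share(json_text, "Ruby") >= threshold
end

puts "Ruby share: #{language_share(SAMPLE_OUTPUT, 'Ruby')}%"
```

In a real pipeline, the sample string would be replaced by the captured output of the command, and the script would exit non-zero when the check fails.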

Tech stack

Ruby, C, CMake, ICU

Getting it running

Difficulty · moderate Time to first run · 30min

Requires system packages (cmake, ICU, OpenSSL) and a non-system Ruby install; the Ruby bundled with macOS often causes problems.

In plain English

Linguist is the Ruby library that GitHub itself uses to figure out which programming languages a repository contains. When you visit a project page on GitHub and see that colored breakdown bar showing something like 70% Ruby and 25% C, that data comes from Linguist. Beyond language detection, the library also helps GitHub ignore binary or vendored files, hide auto-generated content from diffs, and apply the right syntax highlighting.

You can install and run Linguist yourself as a Ruby gem. Because it relies on two compiled dependencies, one for character encoding and one for reading git history, you need some system packages installed first. The README lists the exact commands for macOS (via Homebrew) and Ubuntu, covering things like cmake, ICU, and OpenSSL. It also warns that the version of Ruby bundled with macOS often causes problems, and recommends using a separate Ruby install via Homebrew, rbenv, or a similar tool.

Once installed, a command-line tool called github-linguist works in two modes. Point it at a folder or git repository and it prints each detected language with its percentage and total byte size. Point it at a single file and it reports that file's type, MIME type, and detected language. You can run it against a specific git revision, like a tag or branch, so you can see how the language mix looked at any point in history.

Several flags adjust the output. One shows a per-file breakdown instead of just totals. Another reveals which detection strategy was used for each file, such as file extension, filename pattern, or heuristic analysis. You can also emit everything as JSON for use in scripts or other tools.

Projects can override Linguist's guesses through a .gitattributes file, forcing specific files or extensions to be counted as a different language.

The README includes documentation links for deeper topics like how the detection works internally, how to configure overrides, and how to contribute fixes when a repo's language is being reported incorrectly.
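
The .gitattributes overrides described above might look like the sample embedded in the sketch below. The `linguist-language`, `linguist-vendored`, and `linguist-generated` attributes are the ones Linguist's override documentation describes; the paths are made up, and the small parser `linguist_overrides` is purely illustrative. It only shows what each line expresses and is not how Linguist itself reads attributes.

```ruby
# Hypothetical .gitattributes content; the paths are invented for illustration.
SAMPLE_GITATTRIBUTES = <<~ATTRS
  # Count every .rb file as Java
  *.rb linguist-language=Java
  # Mark third-party code as vendored so it is left out of the stats
  third_party/** linguist-vendored
  # Mark generated docs as generated
  docs/generated/** linguist-generated
ATTRS

# Illustrative parser: maps each path pattern to its linguist-* attributes.
# Bare attributes (no "=") are recorded as true.
def linguist_overrides(text)
  text.each_line.with_object({}) do |line, overrides|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    pattern, *attrs = line.split(/\s+/)
    attrs.each do |attr|
      next unless attr.start_with?("linguist-")

      name, value = attr.split("=", 2)
      (overrides[pattern] ||= {})[name] = value.nil? ? true : value
    end
  end
end

linguist_overrides(SAMPLE_GITATTRIBUTES).each do |pattern, attrs|
  puts "#{pattern}: #{attrs}"
end
```

To use overrides in a real project, the three attribute lines would go into a .gitattributes file at the repository root and be committed like any other file.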

Copy-paste prompts

Prompt 1
I have a mixed Python and JavaScript repo. Show me how to run Linguist on it to get a language percentage breakdown as JSON.
Prompt 2
Linguist is counting my vendored third-party code in my language stats. How do I exclude those folders using .gitattributes?
Prompt 3
Show me how to install Linguist on Ubuntu and use it to detect the language of a single file from the terminal.
Prompt 4
I want to use Linguist in a Ruby script to get the detected language of every file in a repo. Give me a working code example.

Verify against the repo before relying on details.