Analysis updated 2026-07-05 · repo last pushed 2015-01-03
Fix a repository showing the wrong primary language on GitHub.
Exclude vendored or generated files from your language statistics.
Force GitHub to count specific files as a particular language.
Identify the programming languages in a local codebase.
| | snatchev/linguist | joshuakgoldberg/mastodon | moritzheiber/mysql |
|---|---|---|---|
| Language | Ruby | Ruby | Ruby |
| Last pushed | 2015-01-03 | 2024-05-11 | 2013-08-18 |
| Maintenance | Dormant | Dormant | Dormant |
| Setup difficulty | moderate | hard | moderate |
| Complexity | 2/5 | 4/5 | 3/5 |
| Audience | developer | ops devops | ops devops |
Figures from each repo's GitHub metadata at analysis time.
It is a Ruby library that requires Ruby installed and some configuration to run locally on your own repository.
Linguist is the tool GitHub uses to figure out what programming languages are in your code repository. When you look at a repo on GitHub and see a colored bar showing "70% Ruby, 20% JavaScript, 10% CSS," that breakdown is produced by this library. It also powers the syntax highlighting you see when you browse files on the site.
The detection process works in layers. Most files are identified by their extension: a .rb file is Ruby, a .py file is Python. But some extensions are ambiguous; a .h file could be C, C++, or Objective-C. For those cases, Linguist first applies some common-sense rules, then falls back to a statistical classifier that looks at the actual content of the file to make an educated guess. Beyond detection, it also filters out "noise" files, such as vendored third-party code sitting in directories like vendor/ and generated files like minified JavaScript, so they don't skew your language stats or clutter diffs.
Anyone who manages a GitHub repository benefits from this, though most people never think about it because it just works in the background. The people who interact with it directly are typically those whose repo language gets misidentified (say, a project showing up as "HTML" when it's really a JavaScript app) and who want to fix it. Linguist lets you override its defaults by adding a .gitattributes file to your project, where you can explicitly tell it which language a file should be counted as, or flag certain paths as vendored so they're excluded from stats.
The project is notable for its transparency and community-driven approach. GitHub actively encourages users to submit pull requests when a language is misdetected, and the full list of recognized languages lives in a human-readable YAML file that anyone can read and propose changes to. It's a rare example of a core platform feature being maintained as open source that the community can directly shape.
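
The layered detection described above can be illustrated with a minimal Ruby sketch. This is not Linguist's actual code: the constant names, the `detect_language` method, and the two regular expressions are hypothetical, and the final step simply returns the first candidate where a real tool would consult a statistical classifier.

```ruby
# Hypothetical sketch of layered language detection; not Linguist's real code.

# Layer 1: extensions that map to exactly one language.
UNAMBIGUOUS = {
  ".rb" => "Ruby",
  ".py" => "Python"
}.freeze

# Extensions shared by several languages.
AMBIGUOUS = {
  ".h" => ["C", "C++", "Objective-C"]
}.freeze

# Layer 2: simple content rules, checked in order (illustrative patterns only).
HEADER_RULES = [
  ["Objective-C", /^\s*@(interface|protocol|end)\b/],
  ["C++", /^\s*(class|namespace|template)\b/]
].freeze

# Returns a language name, or nil when the extension is unknown.
def detect_language(path, content)
  ext = File.extname(path).downcase
  return UNAMBIGUOUS[ext] if UNAMBIGUOUS.key?(ext)

  candidates = AMBIGUOUS[ext]
  return nil unless candidates

  HEADER_RULES.each do |language, pattern|
    return language if candidates.include?(language) && content.match?(pattern)
  end

  # Layer 3 stand-in: a real tool would ask a statistical classifier here.
  candidates.first
end
```

Calling `detect_language("foo.h", "@interface Foo\n@end\n")` takes the second layer, because the extension alone cannot settle the question and the content rule for Objective-C matches.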
Linguist is the tool GitHub uses to detect what programming languages are in your repository. It identifies languages by file extension and content, filters out third-party and generated files, and lets you manually override results.
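
To make the manual override concrete, the sketch below embeds a sample .gitattributes file and reads it with a deliberately simplified parser. The attribute names `linguist-language` and `linguist-vendored` are the ones documented by upstream github/linguist; check that this fork supports them before relying on them. The paths, the `parse_overrides` and `overrides_for` helpers, and the use of `File.fnmatch` as a stand-in for git's pattern matching are all assumptions made for illustration.

```ruby
# Sample .gitattributes content; both paths are hypothetical.
GITATTRIBUTES = <<~ATTRS
  # Count these templates as Ruby
  app/views/*.erb linguist-language=Ruby
  # Exclude this directory from language stats
  third_party/* linguist-vendored
ATTRS

# Simplified parser: returns [pattern, attributes] pairs.
# Real gitattributes syntax has more cases (negation, quoting) than this handles.
def parse_overrides(text)
  lines = text.each_line.map(&:strip)
  lines.reject { |line| line.empty? || line.start_with?("#") }.map do |line|
    pattern, *attrs = line.split
    parsed = attrs.to_h do |attr|
      key, value = attr.split("=", 2)
      [key, value || true] # a bare attribute such as linguist-vendored means "set"
    end
    [pattern, parsed]
  end
end

# Collects every attribute whose pattern matches the given path.
# File.fnmatch only approximates git's own matching rules.
def overrides_for(path, rules)
  rules.each_with_object({}) do |(pattern, attrs), merged|
    merged.merge!(attrs) if File.fnmatch(pattern, path, File::FNM_PATHNAME)
  end
end
```

With these rules, `overrides_for("third_party/lib.js", parse_overrides(GITATTRIBUTES))` reports the file as vendored, while a path that matches no pattern yields an empty hash and keeps its detected language.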
Mainly Ruby.
Dormant — no commits in 2+ years (last push 2015-01-03).
This project is maintained by GitHub as open source, though the specific license is not stated here.
Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.
Mainly developers.
Verify against the repo before relying on details.