explaingit

explosion/spacy

33,585
Python
Audience · developer
Complexity · 3/5
Maintained
License
Setup · easy

TLDR

Python library for understanding and analyzing text: tokenization, named entity recognition, grammar parsing, and text classification, with support for 70+ languages and pretrained pipelines for a subset of them.

Mindmap

mindmap
  root((spaCy))
    What it does
      Tokenization
      Named entity recognition
      Grammar parsing
      Text classification
    Input & Output
      Raw text input
      Structured annotations
      Extracted entities
    Use cases
      Customer support routing
      Document analysis
      Information extraction
    Tech stack
      Python
      Cython
      CUDA optional
    Audience
      Developers
      Data scientists
      ML engineers

Things people build with this

USE CASE 1

Build a customer support system that automatically routes tickets by topic using text analysis.

USE CASE 2

Extract people, organizations, and locations mentioned in news articles or documents.

USE CASE 3

Analyze legal contracts to find key entities, dates, and obligations automatically.

USE CASE 4

Create an information extraction pipeline that pulls structured facts from scientific papers or research documents.

Tech stack

Python · Cython · CUDA · BERT · pip · conda

Getting it running

Difficulty · easy
Time to first run · 5 min
Use freely for any purpose, including commercial use, as long as you keep the copyright notice.

In plain English

spaCy is a Python library for Natural Language Processing (NLP), the branch of AI that deals with understanding and analyzing human language in text. It provides tools for common language processing tasks: tokenization (splitting text into words and sentences), part-of-speech tagging (identifying nouns, verbs, adjectives), named entity recognition (finding people, organizations, and locations mentioned in text), dependency parsing (understanding sentence grammar structure), and text classification. It also integrates with transformer-based models like BERT, which are large neural networks pre-trained on massive amounts of text that can be fine-tuned for specific language tasks.

spaCy is designed to be production-ready rather than a research tool: it prioritizes the speed and reliability that real-world applications need when they process large volumes of text. It supports over 70 languages, and pretrained pipelines are available for a subset of them; these can be downloaded and used immediately without any training. For custom needs, it includes a full training system to create your own models from labeled data.

Someone would use spaCy when building applications that need to extract meaning or structure from text: a customer support system that routes tickets by topic, a legal document analyzer that finds mentioned entities and dates, a news aggregator that identifies key people and organizations in articles, or an information extraction pipeline that pulls facts from scientific papers.

The tech stack is Python, with Cython (a language that compiles Python-like code into fast C extensions) used internally for performance-critical parts. It installs via pip or conda and can use GPU acceleration through CUDA.
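
To make the tokenization and entity extraction described above concrete, here is a minimal sketch. It assumes spaCy v3 is installed and deliberately uses a blank English pipeline, so no pretrained model has to be downloaded. Because a blank pipeline has no statistical entity recognizer, the sketch substitutes spaCy's rule-based `entity_ruler` component; the sample sentence and the three patterns are hypothetical and only serve as illustration.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only), so nothing is downloaded.
# A pretrained pipeline would add tagging, parsing, and statistical NER on top.
nlp = spacy.blank("en")

# Rule-based stand-in for named entity recognition.
# The patterns below are hypothetical examples, not part of spaCy itself.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Jane Doe"},
    {"label": "ORG", "pattern": "Acme Corp"},
    {"label": "GPE", "pattern": "Berlin"},
])

# Running the pipeline returns a Doc: the text split into tokens plus annotations.
doc = nlp("Jane Doe joined Acme Corp in Berlin last year.")

# Tokenization: each token exposes its original text.
tokens = [token.text for token in doc]

# Entities: each span carries its text and a label.
entities = [(ent.text, ent.label_) for ent in doc.ents]

print(tokens)
print(entities)
```

With a downloaded pretrained pipeline, `spacy.load(...)` would take the place of `spacy.blank("en")` and the ruler would usually be unnecessary, since entities would then be predicted by the model rather than matched against fixed patterns.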

Copy-paste prompts

Prompt 1
Show me how to use spaCy to extract all person names and organizations from a block of text.
Prompt 2
How do I train a custom spaCy model to recognize domain-specific entities in my industry's documents?
Prompt 3
Write a spaCy script that takes customer support tickets and classifies them by topic using text analysis.
Prompt 4
How do I use spaCy with a transformer model like BERT to improve accuracy on my NLP task?
Prompt 5
Show me how to tokenize and parse the grammar structure of sentences using spaCy.

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.