explaingit

stanfordnlp/corenlp

10,074 · Java · Audience · researcher · Complexity · 3/5 · License · Setup · moderate

TLDR

Stanford CoreNLP is a Java library from Stanford that extracts structured information from text (named entities, grammatical structure, coreference, and more) across 8 languages, including English, Chinese, and Arabic.

Mindmap

mindmap
  root((CoreNLP))
    What it does
      Named entity recognition
      Part of speech tagging
      Coreference resolution
    Tech Stack
      Java
      Maven and Ant
      ML and deep learning
    Languages
      English Arabic Chinese
      French German Spanish
      Hungarian Italian
    Audience
      NLP researchers
      Search developers

Things people build with this

USE CASE 1

Extract all names of people, organizations, and places from a document collection and normalize dates and numbers into standard formats.

USE CASE 2

Build a search or summarization tool that understands the grammatical structure of text, not just keyword matching.

USE CASE 3

Analyze multilingual text in Arabic, Chinese, French, German, Hungarian, Italian, or Spanish for academic research.

USE CASE 4

Resolve coreference in a document to map pronouns and noun phrases back to the entity they refer to.

Tech stack

Java, Maven, Ant

Getting it running

Difficulty · moderate; Time to first run · 30 min

Language model files are downloaded separately for each language. The English models are bundled, but the other languages require additional downloads.

Free to use and modify, but if you distribute software that includes this library you must also release your source code under the same GPL license.

In plain English

Stanford CoreNLP is a Java library from Stanford University that takes raw text and automatically extracts structured information from it. Give it a sentence or a document and it will identify the part of speech of each word, find the base form of each word, recognize names of people, organizations, and places, resolve dates and numbers into standard formats, map out the grammatical structure of sentences, and figure out when different phrases in the text refer to the same entity. These are the building blocks behind search tools, document summarizers, and other applications that need to understand language rather than just find keywords.

The toolkit was first built for English but now supports Arabic, Chinese, French, German, Hungarian, Italian, and Spanish at varying levels of depth. The underlying techniques are a mix of rule-based logic, traditional machine learning models, and newer deep learning components, depending on the task. It is widely used in academic research, commercial products, and government applications.

To use it, you add the library to a Java project via Maven or by downloading the jar files directly. Language models, the trained files the library needs to do its analysis, are downloaded separately per language. Smaller English models come bundled by default; larger specialized ones are available as additional downloads or from the Hugging Face Hub. Once the models are in place, running all of the analysis tools on a piece of text takes about two lines of code.

The project is released under the GNU General Public License version 2 or later. That license permits free use and modification but does not allow you to incorporate the library into proprietary software you distribute to others without releasing your source code.

The README covers build instructions for both Ant and Maven, model download links for each supported language, and links to the main documentation site at stanfordnlp.github.io/CoreNLP. Stable releases come out several times a year, and the latest development code is always available directly from the repository.
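
The "about two lines of code" claim can be illustrated with a minimal sketch. It assumes the CoreNLP jar and its English models jar are already on the classpath (with Maven, that is typically the `edu.stanford.nlp:stanford-corenlp` artifact plus the same artifact with the `models` classifier). The class name `PipelineSketch`, the helper `formatRow`, and the sample sentence are hypothetical choices made for illustration. The annotator list is one reasonable selection, not the only one.

```java
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

import java.util.Properties;

// Assumption: stanford-corenlp and its English "models" jar are on the classpath.
public class PipelineSketch {

    // Hypothetical helper: joins one token's annotations into a tab-separated row.
    static String formatRow(String word, String pos, String lemma, String ner) {
        return String.join("\t", word, pos, lemma, ner);
    }

    public static void main(String[] args) {
        // Select the annotators to run. Order matters: later annotators
        // (lemma, ner) rely on the output of earlier ones (tokenize, pos).
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");

        // The core of the example: build the pipeline, then annotate a document.
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        CoreDocument doc = new CoreDocument("Stanford University is located in California.");
        pipeline.annotate(doc);

        // Print word, part-of-speech tag, lemma, and named-entity tag per token.
        for (CoreLabel token : doc.tokens()) {
            System.out.println(formatRow(token.word(), token.tag(), token.lemma(), token.ner()));
        }
    }
}
```

Building the pipeline loads the model files, so it is usually worth constructing one `StanfordCoreNLP` instance and reusing it across documents. Adding annotators such as `parse` or `coref` to the list extends the same pattern to grammatical structure and coreference, at the cost of more memory and startup time.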

Copy-paste prompts

Prompt 1
Using Stanford CoreNLP in Java, extract all named entities from a news article and print each entity, its type, and character offset as a CSV.
Prompt 2
Show me how to add Stanford CoreNLP to a Maven project and run the full annotation pipeline on a sentence with just two lines of Java code.
Prompt 3
Using Stanford CoreNLP's coreference resolution annotator, find all pronouns in a paragraph and map each one to the entity it refers to.
Prompt 4
How do I download the French language model for Stanford CoreNLP and run part-of-speech tagging on a French sentence?

Verify against the repo before relying on details.