Find the best-performing model for a specific NLP task, such as machine translation or question answering, to use as a baseline.
Identify benchmark datasets and evaluation metrics for an NLP problem you're working on.
Discover how much room for improvement exists in a particular language processing task.
Compare different approaches and models to decide which direction to pursue for a new NLP project.
This repository is a community-maintained reference tracking the best-known results in Natural Language Processing (NLP), the field of AI concerned with understanding and generating human language. NLP covers many specific tasks: translating text between languages, answering questions, detecting who or what is mentioned in text, summarizing documents, recognizing speech, analyzing sentiment, and dozens more. For each task, the repository describes what the task involves, lists the standard benchmark datasets used to evaluate models, and records the best scores achieved by published research, known as the "state of the art" (SOTA). It covers tasks in multiple languages, including English, Chinese, Vietnamese, Hindi, French, Spanish, Korean, and others. It is aimed at AI researchers and engineers who want to understand what problems exist in NLP, which datasets are used to test solutions, and how well current methods perform. It serves as a starting point for choosing which approach or model to build on for a new NLP project, or for gauging how much room for improvement remains in a given task. This is a reading and reference resource, not runnable software. Contributions from the community are welcome.
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.