explaingit

facebookresearch/parlai

10,630
Python
Audience · researcher
Complexity · 4/5
Setup · moderate

TLDR

A Python framework from Facebook Research for building and testing conversational AI systems, bundling 100+ dialogue datasets, pre-trained models, and tools to collect human-labeled data or deploy agents to Facebook Messenger.

Mindmap

mindmap
  root((ParlAI))
    What It Does
      Train dialogue AI
      Evaluate chatbots
      Collect human data
    Tech Stack
      Python
    Key Features
      100+ datasets
      Pre-trained models
      Mechanical Turk
      Messenger deploy
    Use Cases
      Chitchat research
      Visual QA
      Task-based dialogue
    Audience
      AI researchers
      NLP practitioners

Things people build with this

USE CASE 1

Train and evaluate a chatbot on one of ParlAI's 100+ included conversation datasets without writing custom data loaders.

USE CASE 2

Collect new training data from human workers via Amazon Mechanical Turk using the built-in data collection pipeline.

USE CASE 3

Deploy a conversational AI agent to Facebook Messenger for real-user testing without switching frameworks.

USE CASE 4

Run visual question answering experiments by pulling an image-dialogue dataset directly from ParlAI's unified library.

Tech stack

Python

Getting it running

Difficulty · moderate
Time to first run · 30min

Requires Python 3.8+. Windows is not officially supported.
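
The two requirements above can be checked before installing. The following is a minimal sketch using only the Python standard library; `check_environment` and `MIN_PYTHON` are hypothetical names chosen for illustration and are not part of ParlAI.

```python
import platform
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the requirements above


def check_environment(version=None, system=None):
    """Return a list of problems found; an empty list means none were found.

    Both arguments default to the running interpreter and operating system,
    and can be overridden to try out other combinations.
    """
    version = tuple(version or sys.version_info[:2])
    system = system or platform.system()
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found; 3.8 or higher is required"
        )
    if system == "Windows":
        problems.append("Windows is not officially supported")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

An empty result only means these two stated requirements are met; it does not guarantee that every dependency will install cleanly.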

In plain English

ParlAI, pronounced "par-lay", is a Python framework from Facebook Research for building, training, and testing AI systems that carry on conversations. The framework is aimed at researchers who study dialogue: systems that answer questions, maintain chitchat, complete tasks through conversation, or respond to images.

One of its main offerings is a unified collection of over 100 publicly available conversation datasets, all accessible through the same code interface. Instead of spending time finding each dataset and writing custom loading code, you can pull from a large library of research datasets including question-answering sets, open-domain chat corpora, and visual question answering collections. The framework also includes a set of pre-built baseline models and a collection of pre-trained models you can load and run without training anything yourself.

Beyond datasets and models, ParlAI supports collecting new training data from human workers through Amazon Mechanical Turk, and connecting conversation agents to real users through Facebook Messenger. This makes it possible to go from training and evaluation in research settings to actual human interaction without switching tools.

The framework requires Python 3.8 or higher and runs on Linux or macOS. Windows is not officially supported, though the README notes that some users have had success with it. Installation is available through pip.

Facebook Research created and maintains the project. It was described in a 2017 academic paper titled "ParlAI: A Dialog Research Software Platform" and has continued to grow since. The project website at parl.ai has additional documentation and tutorials, and an interactive notebook-based tutorial is available for anyone who wants to try it without setting up a local environment.
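
To illustrate what "the same code interface" can look like in practice, here is a rough sketch that builds `parlai` command lines for two different datasets. It only assembles and prints the commands; it does not run them. The helper `parlai_command` is hypothetical. The script names (`display_data`, `eval_model`), the `--task` and `--model-file` flags, the task names, and the model zoo path are recalled from ParlAI's documentation and should be treated as assumptions to verify against the version you install.

```python
import shlex


def parlai_command(script, task, model_file=None):
    """Build an argument list for the `parlai` command-line tool (hypothetical helper)."""
    cmd = ["parlai", script, "--task", task]
    if model_file is not None:
        cmd += ["--model-file", model_file]
    return cmd


# The command has the same shape whether the task is question answering
# or open-domain chitchat; only the task name changes.
show_squad = parlai_command("display_data", "squad")
eval_blender = parlai_command(
    "eval_model",
    "blended_skill_talk",
    model_file="zoo:blender/blender_90M/model",  # assumed model zoo path
)

if __name__ == "__main__":
    for cmd in (show_squad, eval_blender):
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell once ParlAI is installed. Expect the first run of a given task or model to download data.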

Copy-paste prompts

Prompt 1
How do I install ParlAI and run a pre-trained conversational model on an included dataset like BlendedSkillTalk?
Prompt 2
I want to fine-tune a dialogue model in ParlAI on my own question-answering dataset. Walk me through the required data format and the training command.
Prompt 3
How do I use ParlAI's Mechanical Turk integration to collect human-to-human conversation data for model training?
Prompt 4
Show me how to evaluate a ParlAI model on a test split and interpret the F1 and perplexity metrics it reports.
Prompt 5
What datasets in ParlAI cover open-domain chitchat, and how do I list all available datasets from the command line?
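
For readers unfamiliar with the two metrics named in Prompt 4, the sketch below shows one common way they are defined: F1 as the harmonic mean of word-level precision and recall between a model reply and a reference, and perplexity as the exponential of the average per-token negative log-likelihood. This is a simplified illustration, not ParlAI's implementation. The function names are hypothetical, and the tokenization here is plain lowercasing and whitespace splitting, which may differ from the text normalization ParlAI applies.

```python
import math
from collections import Counter


def unigram_f1(guess, answer):
    """Word-overlap F1 between a model reply and a reference reply."""
    guess_tokens = guess.lower().split()
    answer_tokens = answer.lower().split()
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((Counter(guess_tokens) & Counter(answer_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(guess_tokens)
    recall = overlap / len(answer_tokens)
    return 2 * precision * recall / (precision + recall)


def perplexity(token_nlls):
    """Exponential of the mean negative log-likelihood per token (natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


if __name__ == "__main__":
    print(unigram_f1("the cat sat", "a cat sat down"))
    print(perplexity([math.log(2), math.log(2)]))
```

Higher F1 means more word overlap with the reference. Lower perplexity means the model assigned higher probability to the reference text.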

Verify against the repo before relying on details.