explaingit

future-house/paper-qa

8,485 | Python | Audience · researcher | Complexity · 2/5 | License | Setup · moderate

TLDR

PaperQA2 lets you ask plain-English questions about a folder of research PDFs and get answers with exact citations pointing to the specific paper and page each claim came from, using AI-powered search rather than memorized knowledge.

Mindmap

mindmap
  root((paper-qa))
    What it does
      Ask questions about PDFs
      Cited answers
      Contradiction detection
    How it works
      RAG pipeline
      Iterative search agent
      Contextual summarization
    Input formats
      PDF
      Word documents
      HTML and text
    Setup
      pip install
      OpenAI key default
      Local models via LiteLLM

Things people build with this

USE CASE 1

Ask questions about a collection of research papers and get cited answers pointing to specific pages and papers

USE CASE 2

Build a literature review assistant that searches your local PDF library and synthesizes findings with references

USE CASE 3

Check whether papers in your collection contradict each other using AI-powered analysis

Tech stack

Python · OpenAI · LiteLLM · pip

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires an OpenAI API key by default; switching to a local or alternative model requires additional LiteLLM configuration.
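As a rough illustration of what that additional configuration can look like, the sketch below builds a model list in the format used by LiteLLM's Router. The model name `ollama/llama3.2`, the local URL, and the helper `local_llm_config` are assumptions chosen for the example, not values taken from this page. How the dictionary is handed to PaperQA2 is left to a comment, because the exact settings fields should be checked against the current README.

```python
# A minimal sketch, assuming a model served locally by Ollama.
# The helper name, model name, and URL are illustrative assumptions.

def local_llm_config(model_name: str, api_base: str) -> dict:
    """Build a LiteLLM Router-style config with a single model entry."""
    return {
        "model_list": [
            {
                # Alias that callers use to refer to this model.
                "model_name": model_name,
                # Parameters LiteLLM forwards when calling the model.
                "litellm_params": {
                    "model": model_name,
                    "api_base": api_base,
                },
            }
        ]
    }


# 11434 is Ollama's default port; adjust it for your own server.
config = local_llm_config("ollama/llama3.2", "http://localhost:11434")

# Assumption: PaperQA2 accepts a dictionary like this through its settings
# object. Verify the field names in the README before relying on them.
print(config)
```

Keeping the configuration in a plain dictionary makes it easy to reuse one entry for both the answering model and the summarization model, if the tool lets you set them separately.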

Use freely for any purpose, including commercial use, as long as you include the Apache 2.0 license notice and copyright.

In plain English

PaperQA2 is a Python tool for asking questions about scientific papers and getting answers with specific citations back to the source text. You point it at a folder of PDFs or other documents and ask a question in plain English. It finds relevant passages, summarizes them, and produces an answer that tells you which paper and which page each claim came from.

The underlying technique is retrieval-augmented generation (RAG): an AI language model is combined with a search system rather than relying only on what the model has memorized. PaperQA2 adds several refinements on top of basic RAG. It can run as an agent that iterates, refining its search queries when the first results are not good enough. It automatically fetches metadata about papers, including citation counts and retraction status. It also applies an additional step, contextual summarization, to improve the quality of retrieved passages before passing them to the language model. The README reports that this pipeline has exceeded human performance on benchmarks involving scientific question answering, summarization, and contradiction detection.

Installation is through pip, and basic use takes three steps: install the package, put PDFs in a folder, and run pqa ask with your question. By default the tool uses OpenAI models for both the language model and the embedding step that finds relevant documents, but it supports a wide range of other models through the LiteLLM library. Local models can also be used if you do not want to send data to an external service.

The tool supports PDFs, plain text files, Microsoft Office documents, HTML, and source code files. It can maintain an index of a local document collection and reuse it across sessions without reprocessing everything each time. External vector databases can be plugged in for larger collections.

PaperQA2 is developed by the research organization FutureHouse, is open source under the Apache 2.0 license, and has an accompanying academic paper describing its architecture and benchmark results. The full README is longer than the portion summarized here.
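To make the retrieve-then-answer idea concrete, here is a toy sketch of retrieval with page-level citations using only the Python standard library. It is not PaperQA2's implementation: keyword overlap stands in for embedding search, string concatenation stands in for the language model, and the passages, file names, and function names are all hypothetical placeholders.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """A chunk of text that remembers where it came from."""
    paper: str
    page: int
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: list[Passage], top_k: int = 2) -> list[Passage]:
    """Rank passages by word overlap with the question (a stand-in for embeddings)."""
    query = tokenize(question)
    scored = [(len(query & tokenize(p.text)), p) for p in passages]
    relevant = [(score, p) for score, p in scored if score > 0]
    # Stable sort: passages with equal scores keep their original order.
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in relevant[:top_k]]


def answer_with_citations(question: str, passages: list[Passage], top_k: int = 2) -> str:
    """Join the retrieved passages, tagging each with its paper and page.

    A real system would pass the passages to a language model here.
    """
    hits = retrieve(question, passages, top_k)
    if not hits:
        return "No relevant passages found."
    return " ".join(f"{p.text} ({p.paper}, p. {p.page})" for p in hits)


# Placeholder passages, invented purely for illustration.
PASSAGES = [
    Passage("paper_a.pdf", 3, "The alpha method reduces error on the toy dataset."),
    Passage("paper_b.pdf", 7, "The beta method increases speed."),
    Passage("paper_c.pdf", 1, "Unrelated notes about funding."),
]

print(answer_with_citations("Which method reduces error?", PASSAGES, top_k=1))
```

The point of the sketch is that every passage carries its paper and page from the start, so the final answer can cite its sources without any extra lookup.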

Copy-paste prompts

Prompt 1
Using PaperQA2, set up a pipeline to answer the question 'What are the known side effects of metformin in elderly patients?' against my folder of 50 clinical trial PDFs
Prompt 2
How do I configure PaperQA2 to use a local AI model via LiteLLM instead of OpenAI so my research papers stay private?
Prompt 3
Write a Python script using PaperQA2 to index a folder of PDFs and answer a list of research questions from a text file
Prompt 4
How do I maintain a persistent PaperQA2 index across sessions so I don't reprocess the same PDFs each time?

Verify against the repo before relying on details.