Analysis updated 2026-06-21
Copy data from Salesforce, Stripe, and PostgreSQL into your Snowflake data warehouse on a nightly schedule.
Give a LangChain or OpenAI agent live access to CRM and support ticket data using Airbyte's Agent SDK.
Build a custom connector for an internal API using Airbyte's no-code Connector Builder, without writing Python.
Self-host a complete ELT pipeline on your own servers to keep sensitive business data off third-party services.
| | airbytehq/airbyte | openai/chatgpt-retrieval-plugin | rasahq/rasa |
|---|---|---|---|
| Stars | 21,209 | 21,210 | 21,153 |
| Language | Python | Python | Python |
| Setup difficulty | hard | hard | hard |
| Complexity | 4/5 | 4/5 | 4/5 |
| Audience | data | developer | developer |
Figures from each repo's GitHub metadata at analysis time.
Self-hosted deployment requires Docker Compose or Kubernetes; full setup also involves configuring source and destination connectors.
Airbyte is an open-source data movement platform that copies data from sources such as APIs, databases, and files into destinations such as data warehouses, data lakes, and databases. This kind of data pipeline is often called ELT (Extract, Load, Transform). Airbyte provides a catalog of 600+ pre-built connectors covering popular services and databases, and also lets you build new connectors using a no-code Connector Builder or a low-code SDK. It can be self-hosted (deployed on your own infrastructure) or used via Airbyte Cloud. Syncs can be orchestrated through integrations with tools like Airflow, Dagster, or Kestra, or via Airbyte's own API. Beyond traditional data warehousing use cases, Airbyte also offers an Agent SDK that lets AI agents and LLM applications access live business data from CRMs, support tools, SaaS APIs, and databases in real time. The Agent SDK integrates with frameworks like LangChain, OpenAI Agents, pydantic-ai, and FastMCP. The project is licensed under MIT and Elastic License v2, has an active community on Slack, and keeps its roadmap publicly visible on GitHub. Enterprise features are available for larger organizations.
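As a rough illustration of triggering a sync through the API rather than an orchestrator, the sketch below builds an HTTP request that asks for one sync job on an existing connection. The base URL, the `/jobs` path, the `connectionId` and `jobType` payload fields, and bearer-token authentication are assumptions here and should be checked against the Airbyte API reference for your deployment; `build_sync_request` and `trigger_sync` are hypothetical helper names, not part of any Airbyte library.

```python
import json
import urllib.request

# Assumption: base URL of the hosted Airbyte API. A self-hosted
# deployment would expose its own base URL instead.
API_BASE = "https://api.airbyte.com/v1"


def build_sync_request(
    connection_id: str, token: str, api_base: str = API_BASE
) -> urllib.request.Request:
    """Build (but do not send) a request asking for one sync job."""
    # Assumption: the jobs endpoint accepts these two fields.
    payload = {"connectionId": connection_id, "jobType": "sync"}
    return urllib.request.Request(
        url=f"{api_base}/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def trigger_sync(connection_id: str, token: str) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_sync_request(connection_id, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the URL and payload without network access; a scheduler (for example a nightly cron job) would call `trigger_sync` with a real connection ID and token.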
An open-source data pipeline platform with 600+ pre-built connectors to move data from APIs and databases into data warehouses, plus an Agent SDK for giving AI apps live access to business data.
Mainly Python. The stack also includes Java and Docker.
The core is MIT licensed (free for any use). Some components are under Elastic License v2, which prohibits offering Airbyte itself as a managed cloud service.
Setup difficulty is rated hard, with roughly 1h+ to a first successful run.
Aimed mainly at a data audience.
Verify against the repo before relying on details.