apache/arrow

Analysis updated 2026-06-24

★ 16,736 | C++ | Audience · data | Complexity · 4/5 | License | Setup · moderate

Mindmap

mindmap
  root((arrow))
    Inputs
      Parquet files
      CSV files
      Database queries
    Outputs
      Columnar memory buffers
      Arrow IPC streams
      Flight RPC payloads
    Use Cases
      Pass data between tools
      Build analytics pipelines
      Connect to databases
    Tech Stack
      C++
      Python
      Java
      Rust
    Components
      Columnar Format
      Arrow Flight
      ADBC
      Parquet readers


What do people build with it?

USE CASE 1

Move large dataframes between Python and a database with zero copy

USE CASE 2

Build a data service that streams Arrow records over the network with Flight

USE CASE 3

Read and write Parquet files from any supported language

USE CASE 4

Connect to databases using ADBC instead of language-specific drivers
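As a rough illustration of the streaming use case, the sketch below writes a small table to an in-memory Arrow IPC stream and reads it back with pyarrow. The helper name `ipc_round_trip` and the sample columns are hypothetical, and the import sits inside the function only so the snippet loads where pyarrow is not installed. Treat it as a minimal sketch under those assumptions, not a Flight service.

```python
def ipc_round_trip(columns):
    """Serialize a dict of columns to an Arrow IPC stream and read it back.

    `columns` maps column names to Python lists (hypothetical sample input).
    """
    import pyarrow as pa  # local import: assumes pyarrow is installed

    table = pa.table(columns)

    # Write the table into an in-memory buffer using the IPC stream format.
    sink = pa.BufferOutputStream()
    with pa.ipc.new_stream(sink, table.schema) as writer:
        writer.write_table(table)
    buffer = sink.getvalue()

    # Open a stream reader over the same buffer and collect all batches.
    reader = pa.ipc.open_stream(buffer)
    return reader.read_all()
```

Calling `ipc_round_trip({"id": [1, 2, 3], "name": ["a", "b", "c"]})` should return a table with the same schema and rows as the input.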

What is it built with?

C++, Python, Java, Rust, Go, JavaScript

How does it compare?

	apache/arrow	zerotier/zerotierone	espressif/arduino-esp32
Stars	16,736	16,732	16,759
Language	C++	C++	C++
Setup difficulty	moderate	easy	moderate
Complexity	4/5	3/5	3/5
Audience	data	ops devops	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate Time to first run · 30min

Pick the right language binding (pyarrow, arrow-rs, etc) and accept that the C++ core build is heavy if you compile from source.
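Before considering a source build, it can help to check whether a prebuilt Python binding is already present. The sketch below uses only the standard library to report whether `pyarrow` is importable. The function name `pyarrow_status` is hypothetical, and the suggested `pip install pyarrow` command assumes a pip-based environment.

```python
import importlib.util
from importlib import metadata


def pyarrow_status() -> str:
    """Report whether the pyarrow package is importable in this environment."""
    # find_spec locates the package without actually importing it.
    if importlib.util.find_spec("pyarrow") is None:
        return "pyarrow not installed; try `pip install pyarrow`"
    try:
        return f"pyarrow {metadata.version('pyarrow')} is available"
    except metadata.PackageNotFoundError:
        # Importable, but installed without distribution metadata.
        return "pyarrow is importable but has no package metadata"


print(pyarrow_status())
```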

Apache 2.0 lets you use, modify, and redistribute the code for any purpose including commercial use, as long as you keep the license notice.

In plain English

apache/arrow is a universal standard for how data is stored and moved between programs, with libraries available in over a dozen programming languages. Rather than each data tool inventing its own internal data format, Arrow defines a single shared in-memory layout, called a columnar format (meaning data is organized by column rather than by row), that makes moving data between tools fast and efficient.

The problem it solves is data exchange overhead. Without a shared standard, passing data between two different programs (say, a database and a data analytics library) usually requires serializing the data into a file format and deserializing it back on the other side, which wastes time. Arrow lets programs share data directly in memory with zero-copy transfers, meaning no unnecessary data duplication.

Key components include the Arrow Columnar Format (the in-memory data layout standard), the Arrow IPC format for efficient data transmission between processes, Arrow Flight (a protocol for building high-performance data services over a network), ADBC (Arrow Database Connectivity, an API for connecting to databases in an Arrow-native way), and readers and writers for common file formats including Parquet and CSV.

Libraries are available for C++, Python, R, Java, Go, Rust, JavaScript, Ruby, Julia, Swift, and more. Each language implementation follows the same underlying format, meaning data can move between them without conversion.

You would use Apache Arrow when building data pipelines, analytics tools, or anything where multiple programs need to share large datasets quickly. It is an Apache Software Foundation project.
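To make "columnar" and "zero-copy" concrete, here is a small standard-library sketch. It does not use Arrow itself: `array` and `memoryview` stand in for Arrow buffers, and the sample rows are made up for illustration. The idea is only loosely analogous to how an Arrow buffer can be handed to another library without duplicating its bytes.

```python
from array import array

# Row-oriented: one record per tuple, so a column's values are scattered.
rows = [(1, 10.5), (2, 20.0), (3, 30.25)]

# Column-oriented: each column lives in one contiguous typed buffer.
ids = array("q", (r[0] for r in rows))      # 64-bit signed integers
prices = array("d", (r[1] for r in rows))   # 64-bit floats

# A memoryview exposes the same bytes without copying them.
shared = memoryview(prices)
total = sum(shared)

# A write through the original buffer is visible through the view,
# which shows that both names refer to one block of memory.
prices[0] = 11.5
assert shared[0] == 11.5
```

Scanning one column touches a single contiguous buffer, which is the main reason columnar layouts suit analytics workloads.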

Copy-paste prompts

Prompt 1

Show me how to read a Parquet file in Python with pyarrow and convert it to a Pandas DataFrame without copying memory

Prompt 2

Write a minimal Arrow Flight server in Python that streams a table to a client

Prompt 3

Compare Arrow IPC format vs Parquet and tell me which one to use for streaming vs storage

Prompt 4

Help me move data from a Polars DataFrame to a DuckDB query using Arrow with no serialization

Prompt 5

Set up an ADBC connection from Python to Postgres and read a query result as an Arrow table

Frequently asked questions

What is arrow?

A shared in-memory columnar data format with libraries in 12+ languages, so programs can pass large datasets to each other without serializing or copying.

What language is arrow written in?

Mainly C++. The stack also includes Python, Java, and Rust.

What license does arrow use?

Apache 2.0 lets you use, modify, and redistribute the code for any purpose including commercial use, as long as you keep the license notice.

How hard is arrow to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is arrow for?

Mainly data practitioners, such as people building data pipelines and analytics tools.


Verify against the repo before relying on details.