xiaorenwu234/faasflow

25 · Python
Audience · researcher
Complexity · 5/5
Setup · hard

TLDR

FaaSFlow is an academic research prototype of a serverless workflow engine that reduces scheduling overhead and data-transfer latency when chaining small cloud functions, published at ASPLOS 2022.

Mindmap

mindmap
  root((faasflow))
    What It Does
      Serverless workflow engine
      Reduces scheduling overhead
      Faster data transfer
    Key Techniques
      WorkerSP local scheduling
      Adaptive local memory transfer
    Cluster Requirements
      8 machines minimum
      1 database and gateway
      7 worker nodes
    Research Context
      ASPLOS 2022 paper
      8 benchmark workflows
    Use Cases
      Reproduce experiments
      Compare scheduling modes

Things people build with this

USE CASE 1

Reproduce the benchmark experiments from the ASPLOS 2022 FaaSFlow paper using the included test scripts and eight built-in workflows.

USE CASE 2

Compare centralized versus worker-local scheduling overhead by switching between modes with a single script flag.

USE CASE 3

Study how passing data through local memory instead of a shared database affects latency between co-located serverless functions.

USE CASE 4

Measure end-to-end latency and tail latency under bandwidth limits using the provided experiment scripts.
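
The latency measurements in Use Case 4 can be illustrated with a minimal sketch. This is not one of the repository's experiment scripts: the function name `latency_summary`, its parameters, and the nearest-rank percentile method are assumptions chosen for illustration. It shows how a mean and a tail value (here the 99th percentile) are derived from a list of per-request end-to-end latencies.

```python
import statistics


def latency_summary(samples_ms, tail_percentile=99):
    """Summarize end-to-end latencies (hypothetical helper, not from the repo).

    samples_ms: iterable of per-request latencies in milliseconds.
    tail_percentile: integer percentile (1-100) reported as the tail latency.
    """
    ordered = sorted(samples_ms)
    if not ordered:
        raise ValueError("need at least one latency sample")
    if not 1 <= tail_percentile <= 100:
        raise ValueError("tail_percentile must be between 1 and 100")

    # Nearest-rank percentile: the smallest sample such that at least
    # tail_percentile percent of all samples are less than or equal to it.
    # Integer ceiling division avoids floating-point rounding surprises.
    rank = (tail_percentile * len(ordered) + 99) // 100
    tail = ordered[rank - 1]

    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "tail": tail,
    }


summary = latency_summary(range(1, 101))
skewed = latency_summary([10, 20, 30, 40, 1000])
```

With few samples the tail value collapses to the maximum, as in `skewed`, which is one reason tail-latency experiments need many requests per configuration.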

Tech stack

Python

Getting it running

Difficulty · hard
Time to first run · 1 day+

Requires a cluster of at least eight physical or virtual machines with specific services configured and IP addresses set on each node.

In plain English

FaaSFlow is a research prototype of a serverless workflow engine, published as part of a paper accepted at ASPLOS 2022, a major computer systems conference. The project addresses a specific performance problem: when you run a chain of small functions in a serverless environment (the kind where each step is an independent unit of code that spins up on demand), there is overhead in scheduling each step and in passing data between steps. FaaSFlow proposes two techniques to reduce that overhead.

The first technique is called WorkerSP, which shifts scheduling decisions to the individual worker machines rather than a central coordinator. The idea is that if a worker already knows what steps come next in a workflow, it can handle the handoff itself without waiting for a central manager to respond each time.

The second technique is an adaptive storage layer that uses a machine's local memory to transfer data between two functions running on the same physical node, instead of writing to and reading from a shared database in between.

The repository is structured around reproducing the experiments from the paper. Setup requires a cluster of at least eight machines: one for the database and gateway, and seven as worker nodes. The README walks through installing dependencies on each machine, configuring IP addresses in config files, and starting the right services in the right order. Experiments are run via Python scripts that measure scheduling overhead, data transfer time, end-to-end latency, and tail latency under network bandwidth limits.

This is academic research code, not a production-ready system you would deploy to run real applications. It is designed so other researchers can reproduce the benchmarks from the paper. The eight benchmark workflows used in the experiments are built into the setup scripts, and the test scripts accept flags to switch between the two scheduling modes and the data modes being compared.
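
The two techniques described above can be sketched together in a few lines. This is a toy model under stated assumptions, not FaaSFlow's implementation: the `Worker` class, its fields, and the plain dictionaries standing in for local memory and the shared database are all hypothetical, and the sketch ignores memory limits, failures, and networking. It shows a worker that, after a function finishes, looks up the successors itself (worker-side scheduling) and keeps the output in local memory when the consumer is co-located, falling back to the shared store otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker node; every name here is illustrative only."""
    name: str
    successors: dict   # function name -> list of successor function names
    placement: dict    # function name -> name of the worker hosting it
    local_memory: dict = field(default_factory=dict)

    def put(self, key, value, consumer, remote_store):
        """Keep data in local memory if the consumer runs on this node."""
        if self.placement[consumer] == self.name:
            self.local_memory[key] = value
            return "local"
        # Consumer lives on another node: go through the shared store.
        remote_store[key] = value
        return "remote"

    def get(self, key, remote_store):
        """Read from local memory first, then from the shared store."""
        if key in self.local_memory:
            return self.local_memory[key]
        return remote_store[key]

    def on_function_done(self, func, output, remote_store):
        """Worker-side scheduling: pick the next steps without contacting
        a central coordinator, using the locally known workflow graph."""
        decisions = []
        for nxt in self.successors.get(func, []):
            where = self.put((func, nxt), output, nxt, remote_store)
            decisions.append((nxt, self.placement[nxt], where))
        return decisions


# A three-function workflow: "resize" feeds both "filter" and "upload".
placement = {"resize": "w1", "filter": "w1", "upload": "w2"}
successors = {"resize": ["filter", "upload"]}
shared_db = {}  # stands in for the shared database

w1 = Worker("w1", successors, placement)
decisions = w1.on_function_done("resize", b"img", shared_db)
```

Here `decisions` records, for each successor, which worker hosts it and whether its input went through local memory or the shared store. Only the edge to `upload`, which runs on a different node, touches `shared_db`.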

Copy-paste prompts

Prompt 1
Walk me through setting up the FaaSFlow 8-machine cluster: configuring IP addresses in the config files and starting gateway and worker services in the correct order.
Prompt 2
Run the FaaSFlow scheduling overhead benchmark and explain what the output latency metrics mean in plain English.
Prompt 3
Explain the difference between WorkerSP distributed scheduling and centralized scheduling in FaaSFlow and when each one performs better.
Prompt 4
Help me adapt the FaaSFlow benchmark scripts to add a ninth workflow and measure its end-to-end latency using the same test harness.

Verify against the repo before relying on details.