explaingit

angel-ml/angel

6,788 | Java | Audience · data | Complexity · 5/5 | License | Setup · hard

TLDR

A distributed machine learning platform from Tencent and Peking University that trains large models with billions of parameters across many machines using a Parameter Server architecture, integrated with Hadoop and Spark clusters.

Mindmap

mindmap
  root((Angel ML))
    Architecture
      Parameter Server
      Worker machines
      Server machines
    Integrations
      YARN cluster
      Spark on Angel
    Algorithms
      Logistic regression
      Gradient boosted trees
      Graph neural networks
    License
      Apache 2.0
    Audience
      Data engineers
      ML researchers

Things people build with this

USE CASE 1

Train a machine learning model too large to fit on one machine by distributing its parameters across a cluster

USE CASE 2

Run gradient boosted trees or logistic regression on massive datasets using your existing Hadoop or Spark infrastructure

USE CASE 3

Train graph neural networks for node classification or link prediction on large-scale graph data

USE CASE 4

Integrate Angel into an existing Spark data pipeline for the model training step without rebuilding your infrastructure

Tech stack

Java · Spark · Hadoop · YARN

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires a running Hadoop YARN cluster; not suitable for single-machine or laptop use.

Use freely for any purpose including commercial, with attribution and license notice required.

In plain English

Angel is a distributed machine learning platform developed jointly by Tencent and Peking University. Its core purpose is training machine learning models on very large datasets, particularly when the model itself has so many parameters that it would not fit on a single machine. It grew out of Tencent's internal experience with the data volumes of a major internet company.

The system is built around an idea called a Parameter Server. In simple terms, the model's parameters (the numbers that get adjusted during training) are split across many server machines, while separate worker machines process the training data and send updates back. This split allows training on datasets and model sizes that would be impractical on a single computer.

Angel runs on YARN, the resource management layer commonly used in Hadoop clusters. It also integrates with Spark, a popular distributed data processing tool, through a component called Spark on Angel. Teams already using Spark for data pipelines can therefore use Angel for the model training step without rebuilding their infrastructure from scratch.

The repository includes a long list of algorithms. On the traditional machine learning side it covers logistic regression, support vector machines, factorization machines, k-means clustering, gradient boosted decision trees, and others. There is also a graph computing module that includes algorithms for ranking pages by importance, detecting communities, finding common connections between nodes, and training graph neural networks for tasks like node classification or link prediction.

The project is open source under the Apache 2.0 license and is active under the Linux Foundation's Deep Learning Foundation. Several academic papers have been published about the system and its components, including work presented at major database and machine learning research venues.
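
The Parameter Server idea described above can be illustrated with a small single-process sketch. This is not Angel's API: Angel itself is written in Java and runs on YARN, whereas the toy below is plain Python, and every name in it (`ParameterShard`, `ToyParameterServer`, `worker_gradient`, the modulo placement rule) is hypothetical and chosen only for illustration. It assumes workers take turns pulling weights and pushing gradients, which glosses over the concurrency and networking a real system has to handle.

```python
import math
import random


class ParameterShard:
    """One 'server' holding a slice of the weights (hypothetical toy class)."""

    def __init__(self, indices):
        self.weights = {i: 0.0 for i in indices}

    def push(self, index, grad, lr):
        # Apply one gradient-descent update to a weight this shard owns.
        self.weights[index] -= lr * grad


class ToyParameterServer:
    """Assigns weight i to shard i % num_shards (an assumed placement rule)."""

    def __init__(self, dim, num_shards):
        self.dim = dim
        self.shards = [ParameterShard(range(s, dim, num_shards))
                       for s in range(num_shards)]

    def pull(self):
        # Workers fetch the current weights by gathering every shard's slice.
        weights = [0.0] * self.dim
        for shard in self.shards:
            for i, w in shard.weights.items():
                weights[i] = w
        return weights

    def push(self, grad, lr):
        # Route each gradient component to the shard that owns that weight.
        for i, g in enumerate(grad):
            self.shards[i % len(self.shards)].push(i, g, lr)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def worker_gradient(weights, partition):
    """Average logistic-regression gradient over one worker's data partition."""
    grad = [0.0] * len(weights)
    for x, y in partition:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for i, xi in enumerate(x):
            grad[i] += (p - y) * xi
    return [g / len(partition) for g in grad]


def train(partitions, dim, num_shards=2, epochs=200, lr=0.5):
    ps = ToyParameterServer(dim, num_shards)
    for _ in range(epochs):
        for partition in partitions:  # each partition stands in for one worker
            ps.push(worker_gradient(ps.pull(), partition), lr)
    return ps


def make_data(n, seed=0):
    # Synthetic, linearly separable data: label is 1 when x1 + x2 > 0.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        rows.append(([1.0, x1, x2], 1 if x1 + x2 > 0 else 0))
    return rows


def accuracy(weights, rows):
    hits = sum((sigmoid(sum(w * xi for w, xi in zip(weights, x))) > 0.5) == (y == 1)
               for x, y in rows)
    return hits / len(rows)


if __name__ == "__main__":
    data = make_data(200)
    ps = train([data[:100], data[100:]], dim=3)
    print("shard sizes:", [len(s.weights) for s in ps.shards])
    print("training accuracy:", accuracy(ps.pull(), data))
```

The point of the sketch is the division of labour: no single shard ever holds the whole model, and workers only see the data partition they were given. A production system would add what this toy leaves out, such as network transport, concurrent updates from many workers, and fault tolerance.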

Copy-paste prompts

Prompt 1
I have a Hadoop YARN cluster and want to train a logistic regression model on a 100GB dataset using Angel. Walk me through the setup and job submission.
Prompt 2
Explain how Angel's Parameter Server architecture works. How do workers and servers split the training job across machines?
Prompt 3
Show me how to use Spark on Angel to run a factorization machine model on my existing Spark dataset.
Prompt 4
I want to run graph community detection on a large graph using Angel's graph computing module. Which algorithm should I use and how do I configure it?
Prompt 5
How do I submit an Angel training job to a YARN cluster and monitor its progress from the command line?

Verify against the repo before relying on details.