Analysis updated 2026-07-03
Train a model to classify nodes in a large social network (such as detecting spam accounts) using GraphSAGE embeddings.
Generate embeddings for protein interaction data to cluster proteins by function without needing labeled examples.
Run inductive learning on a graph so new nodes added after training still get valid embeddings without retraining.
Reproduce the experiments from the 2017 'Inductive Representation Learning on Large Graphs' paper using the included datasets.
| | williamleif/graphsage | canonical/cloud-init | boris-code/feapder |
|---|---|---|---|
| Stars | 3,687 | 3,687 | 3,686 |
| Language | Python | Python | Python |
| Setup difficulty | moderate | moderate | moderate |
| Complexity | 4/5 | 3/5 | 3/5 |
| Audience | researcher | ops devops | developer |
Figures from each repo's GitHub metadata at analysis time.
Requires TensorFlow (an older version pinned in the repo) and NumPy; the Docker setup helps manage the specific dependency versions.
GraphSAGE is a research algorithm and code library from Stanford University for learning representations of the nodes in very large graphs. A graph here is any collection of items connected by relationships: users and friendships, proteins and interactions, documents and links. GraphSAGE produces a numerical summary (called an embedding) for each node that captures both the node's own features and the structure of its neighborhood.

The core idea is that instead of looking at every connection a node has, GraphSAGE samples a fixed-size random subset of neighbors at each layer and aggregates their information. Because the sample size is fixed, the cost per node stays bounded, which makes it practical to work with graphs containing hundreds of thousands or millions of nodes, too large for approaches that must process every neighbor of every node. The algorithm is also inductive, meaning it can generate embeddings for nodes it has never seen before, such as new users who join a network after training is complete.

The code supports two modes. In supervised mode, you provide labeled examples and the model learns embeddings that help classify nodes (for example, predicting which category a piece of content belongs to). In unsupervised mode, it learns embeddings based on which nodes appear together in random walks through the graph, with no labels required. The resulting embeddings can then be fed into other machine learning models for tasks like classification or clustering. Several aggregation strategies are available for combining a node's neighbor information, including mean, max-pooling, and an approach based on a sequence model (an LSTM).

The repository includes a small protein interaction dataset to test with, and links to the full datasets used in the original paper. Running the code requires Python with TensorFlow, NumPy, and a few other scientific libraries. A Docker setup is included to make installing the right versions easier. This code accompanies the 2017 paper "Inductive Representation Learning on Large Graphs," and the README asks users to cite that paper if they use this work.
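The sample-and-aggregate idea can be sketched in a few lines of NumPy. This is not the repo's TensorFlow implementation: the helper names (`sample_neighbors`, `mean_aggregator`, `make_maxpool_aggregator`, `sage_layer`, `embed`) are hypothetical, the weights are random rather than trained, and details such as applying ReLU on every layer and letting an isolated node aggregate over itself are simplifying assumptions. Each layer concatenates a node's own vector with an aggregate of a fixed-size neighbor sample, applies a linear map and ReLU, and L2-normalizes the result. The last lines illustrate the inductive property: a node added after "training" is embedded with the same weights.

```python
import numpy as np

def sample_neighbors(adj, node, num_samples, rng):
    """Uniformly sample a fixed-size set of neighbors (with replacement if there are too few)."""
    neighbors = adj.get(node, [])
    if not neighbors:
        return [node]  # assumption: an isolated node aggregates over itself
    replace = len(neighbors) < num_samples
    return [int(u) for u in rng.choice(neighbors, size=num_samples, replace=replace)]

def mean_aggregator(neigh_vecs):
    """Element-wise mean of the sampled neighbor vectors."""
    return neigh_vecs.mean(axis=0)

def make_maxpool_aggregator(W_pool, b_pool):
    """Max-pooling: one ReLU layer applied to each neighbor, then an element-wise max."""
    def aggregate(neigh_vecs):
        return np.maximum(neigh_vecs @ W_pool + b_pool, 0.0).max(axis=0)
    return aggregate

def sage_layer(h, adj, W, num_samples, rng, aggregator):
    """One layer: concat(self, aggregated sampled neighbors) -> linear -> ReLU -> L2-normalize."""
    out = {}
    for v in h:
        neigh = sample_neighbors(adj, v, num_samples, rng)
        agg = aggregator(np.stack([h[u] for u in neigh]))
        z = np.maximum(np.concatenate([h[v], agg]) @ W, 0.0)
        out[v] = z / (np.linalg.norm(z) + 1e-12)
    return out

def embed(features, adj, layers, seed=0):
    """Run the layers in order; each layer is (weight matrix, sample size, aggregator)."""
    rng = np.random.default_rng(seed)
    h = {v: np.asarray(x, dtype=float) for v, x in features.items()}
    for W, num_samples, aggregator in layers:
        h = sage_layer(h, adj, W, num_samples, rng, aggregator)
    return h

# Toy graph with random (untrained) weights; real weights would be learned.
rng = np.random.default_rng(42)
d, hidden, pool, out_dim = 4, 8, 6, 3
features = {v: rng.normal(size=d) for v in range(6)}
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3, 5], 5: [4]}
layers = [
    # Layer 1: mean aggregator, input is concat(self, mean) of size 2 * d.
    (rng.normal(size=(2 * d, hidden)), 3, mean_aggregator),
    # Layer 2: max-pool aggregator, input is concat(self, pooled) of size hidden + pool.
    (rng.normal(size=(hidden + pool, out_dim)), 2,
     make_maxpool_aggregator(rng.normal(size=(hidden, pool)), np.zeros(pool))),
]
emb = embed(features, adj, layers)

# Inductive step: node 6 arrives after "training"; the same weights embed it.
features[6] = rng.normal(size=d)
adj[6] = [2, 5]
adj[2].append(6)
adj[5].append(6)
emb_after = embed(features, adj, layers)
print(emb_after[6].shape)  # (3,)
```

Because the weights act on sampled neighborhoods rather than on a fixed list of nodes, nothing in `embed` has to change when the graph grows; only the features and adjacency are updated.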
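In unsupervised mode, the training signal comes from random-walk co-occurrence: nodes that appear near each other on short walks should get similar embeddings, while randomly drawn "negative" nodes should get dissimilar ones. The sketch below builds positive pairs from walks and evaluates a negative-sampling loss in the spirit of the paper's unsupervised objective, -log σ(z_u · z_v) minus the sum of log σ(-z_u · z_n) over sampled negatives n. Assumptions: the function names are hypothetical, negatives are drawn in proportion to node degree (the repo's exact sampler may differ), and the embeddings `z` are random stand-ins rather than outputs of a trained model. In practice this loss would be minimized with respect to the GraphSAGE weights.

```python
import numpy as np

def random_walks(adj, walk_len, walks_per_node, rng):
    """Uniform random walks of up to walk_len nodes, starting from every node."""
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            for _ in range(walk_len - 1):
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break  # dead end: stop this walk early
                walk.append(int(rng.choice(nbrs)))
            walks.append(walk)
    return walks

def cooccurrence_pairs(walks, window):
    """Positive (u, v) pairs: distinct nodes within `window` steps of each other on a walk."""
    pairs = []
    for walk in walks:
        for i, u in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i and walk[j] != u:
                    pairs.append((u, walk[j]))
    return pairs

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))."""
    return -np.logaddexp(0.0, -x)

def unsupervised_loss(z, pairs, adj, num_neg, rng):
    """Average over positive pairs of -log s(z_u.z_v) - sum_n log s(-z_u.z_n)."""
    keys = list(adj)
    nodes = np.array(keys)
    deg = np.array([len(adj[v]) for v in keys], dtype=float)
    probs = deg / deg.sum()  # assumption: degree-proportional negative sampling
    total = 0.0
    for u, v in pairs:
        pos = log_sigmoid(z[u] @ z[v])
        negs = rng.choice(nodes, size=num_neg, p=probs)
        neg = sum(log_sigmoid(-(z[u] @ z[int(n)])) for n in negs)
        total += -pos - neg
    return total / len(pairs)

rng = np.random.default_rng(0)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
walks = random_walks(adj, walk_len=5, walks_per_node=2, rng=rng)
pairs = cooccurrence_pairs(walks, window=2)
z = {v: rng.normal(size=4) for v in adj}  # stand-in embeddings, e.g. from a forward pass
loss = unsupervised_loss(z, pairs, adj, num_neg=3, rng=rng)
print(round(float(loss), 3))
```

Only the pair-generation step depends on the graph structure; the loss itself sees nothing but embedding dot products, which is why no labels are needed.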
GraphSAGE is a Stanford research library that learns a numerical fingerprint for every node in a large graph (like a social network or protein database) so you can classify or cluster them.
Mainly Python. The stack also includes TensorFlow and NumPy.
MIT license: use freely for any purpose, including commercial use, as long as you keep the copyright notice.
Setup difficulty is rated moderate, with roughly an hour or more to a first successful run.
Audience: mainly researchers.
Verify against the repo before relying on details.