explaingit

dmlc/dgl

14,273 · Python · Audience · researcher · Complexity · 4/5 · Setup · hard

TLDR

A Python library for building and training deep learning models on graph-structured data like social networks and molecules, built on top of PyTorch, MXNet, or TensorFlow.

Mindmap

mindmap
  root((repo))
    What it does
      Graph deep learning
      Message passing
      GNN training
    Backends
      PyTorch
      Apache MXNet
      TensorFlow
    Domains
      Social networks
      Molecules
      Knowledge graphs
      NLP as graphs
    Scale
      Multi-GPU
      Distributed training
      Billions of nodes

Things people build with this

USE CASE 1

Train a graph neural network to classify nodes in a social network by predicting user attributes from connection patterns.

USE CASE 2

Apply deep learning to molecular property prediction using the DGL-LifeSci companion package for chemistry and drug discovery.

USE CASE 3

Run well-known graph models on the Open Graph Benchmark without writing code using the DGL-Go command-line tool.

USE CASE 4

Scale a graph learning model to billions of nodes using DGL's distributed multi-GPU training support.

Tech stack

Python · PyTorch · Apache MXNet · TensorFlow · CUDA

Getting it running

Difficulty · hard
Time to first run · 1h+

GPU acceleration requires CUDA, distributed training needs multiple GPUs or multiple machines, and setup complexity varies significantly by backend.

In plain English

DGL (Deep Graph Library) is a Python package that makes it easier to build and train machine learning models that operate on graph-structured data. A graph, in this context, is not a chart but a network of nodes and edges, like a social network where people are nodes and friendships are edges, or a molecule where atoms are nodes and chemical bonds are edges. Many real-world problems are naturally represented this way, and DGL provides the tools to apply modern deep learning techniques to them.

The library sits on top of existing deep learning frameworks, specifically PyTorch, Apache MXNet, and TensorFlow. You bring your preferred framework, and DGL adds a high-performance graph object alongside it that can live on either a CPU or a GPU. At the core of DGL is a message-passing system: nodes and edges can exchange and aggregate information through the graph structure, which is the fundamental operation behind most graph neural network architectures.

For researchers, DGL includes example implementations of many published graph neural network models covering tasks like node classification, link prediction, and graph classification. A command-line tool called DGL-Go lets you train and evaluate well-known models without writing code. For advanced users, DGL supports distributed training across multiple GPUs and multiple machines, with internal optimizations that allow it to scale to graphs with billions of nodes and edges.

The project is used as one of the standard platforms for major graph learning benchmark suites, including the Open Graph Benchmark. Companion packages extend it to specific domains: DGL-LifeSci for biology and chemistry, DGL-KE for knowledge graphs, and Graph4NLP for natural language processing tasks that can be modeled as graphs.

Learning resources include a 120-minute introductory tutorial, a full user guide available in English and Chinese, and a discussion forum. The full README is longer than what was shown.
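To make the message-passing idea described in this section concrete, here is a minimal, dependency-free sketch of one round of it: every node collects the feature vectors sent along its incoming edges and averages them. This is not DGL's API or implementation. The function name `mean_aggregate`, the `inbox` structure, and the choice to leave a node unchanged when it receives no messages are all assumptions made for illustration.

```python
from collections import defaultdict


def mean_aggregate(num_nodes, edges, features):
    """One round of message passing with mean aggregation.

    edges is a list of (src, dst) pairs; features[i] is node i's vector.
    Hypothetical helper for illustration, not part of DGL.
    """
    # Message step: each edge carries the source node's features to the destination.
    inbox = defaultdict(list)
    for src, dst in edges:
        inbox[dst].append(features[src])

    # Reduce step: each node averages the messages it received.
    updated = []
    for node in range(num_nodes):
        messages = inbox.get(node)
        if not messages:
            # Assumption: a node with no incoming edges keeps its own features.
            updated.append(list(features[node]))
            continue
        dim = len(messages[0])
        updated.append(
            [sum(m[i] for m in messages) / len(messages) for i in range(dim)]
        )
    return updated


# Tiny directed graph: 0 -> 1, 2 -> 1, 1 -> 2
edges = [(0, 1), (2, 1), (1, 2)]
features = [[1.0, 0.0], [0.0, 1.0], [3.0, 2.0]]
result = mean_aggregate(3, edges, features)
print(result)  # [[1.0, 0.0], [2.0, 1.0], [0.0, 1.0]]
```

In DGL itself this pattern is expressed on the graph object (for example through `update_all` with a message function and a reduce function) and runs on tensors from the chosen backend rather than Python lists; check the DGL documentation for the exact signatures and for how nodes without incoming edges are handled.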

Copy-paste prompts

Prompt 1
Show me how to define a graph neural network in DGL with PyTorch that classifies nodes in a citation network, step by step from graph construction to training loop.
Prompt 2
I want to predict molecular properties using DGL. How do I represent a molecule as a DGL graph and run a GNN on it with DGL-LifeSci?
Prompt 3
Explain how message passing works in DGL and show me an example where nodes aggregate neighbor features, similar to a GraphSAGE layer.
Prompt 4
How do I train a DGL model across multiple GPUs using DGL's distributed training API on a single machine with 4 GPUs?

Verify against the repo before relying on details.