explaingit

cloudpilot-ai/hermes

17 · Go · Audience · ops devops · Complexity · 4/5 · License · Setup · hard

TLDR

A Kubernetes tool that makes container startup over 22x faster (in the README's vLLM benchmark) using lazy image loading. It builds SOCI indexes automatically in the background, so application teams never need to change their Docker build pipelines.

Mindmap

mindmap
  root((hermes))
    What it does
      Speeds up container starts
      Lazy image loading
      Builds SOCI indexes
    Architecture
      Controller watches images
      Node daemon hooks runtime
      Policy-based rules
    Results
      22x faster startup
      15s vs 5min 34s
    Tech
      Go
      Kubernetes
      SOCI format

Things people build with this

USE CASE 1

Speed up AI model serving container cold starts in Kubernetes from several minutes to under 20 seconds.

USE CASE 2

Enable lazy image loading across a cluster without requiring application teams to modify their Docker build pipelines.

USE CASE 3

Reduce pod startup latency for large container images during Kubernetes autoscaling bursts.

Tech stack

Go · Kubernetes · SOCI

Getting it running

Difficulty · hard · Time to first run · 1h+

Requires a running Kubernetes cluster with cluster-admin permissions to deploy the controller and per-node daemon components.

Apache 2.0 licensed: use it freely in any project, including commercial ones, as long as you keep the copyright and license notices.

In plain English

Hermes is a tool for Kubernetes clusters that dramatically speeds up how quickly new application containers can start. The core problem it solves is that, to launch a container, Kubernetes normally has to download the entire container image first, which can take several minutes for large images. Hermes lets containers start running before the full download is complete, a technique called lazy loading.

The underlying technology is SOCI (Seekable OCI), which lets a container read only the parts of an image it needs right away and fetch the rest in the background from the original image registry. The catch is that SOCI normally requires application teams to modify their build pipelines to produce SOCI indexes alongside their images. Hermes removes that requirement. Instead, a platform or infrastructure team installs Hermes and creates configuration rules called HermesPolicies that specify which container images should be optimized. Hermes watches the cluster for running containers that match those rules, builds the SOCI indexes itself in the background, caches them, and makes them available to worker nodes. Application teams keep publishing standard container images with no changes to their build processes.

The README's benchmark shows the scale of the gain. A 10.8 GB AI model serving image (vLLM) that normally takes 5 minutes 34 seconds to pull before a container can start was started in 15 seconds with Hermes, a speedup of over 22 times. The 15 seconds is measured after the SOCI artifact has already been prepared: the container becomes usable in that time instead of waiting for the full download.

Hermes runs as two main components: a controller that watches for matching images, builds the SOCI metadata, and serves it to nodes; and a per-node daemon that integrates with the container runtime and uses the controller-managed metadata during container startup.
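The controller flow described above (match running containers against policies, build each needed index once in the background, cache the result) can be sketched in Go as follows. This is illustrative only: `Policy`, `RunningContainer`, `IndexCache`, and `Reconcile` are hypothetical names, the real HermesPolicy fields are not documented in this summary, and a real controller would watch the Kubernetes API with informers rather than take a slice of containers and block until builds finish.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Policy is a hypothetical, simplified stand-in for a HermesPolicy.
type Policy struct {
	Namespaces    []string // empty means any namespace
	ImagePrefixes []string // select images whose reference starts with one of these
}

// RunningContainer is what the controller observes in the cluster.
type RunningContainer struct {
	Namespace string
	Image     string
}

// Matches reports whether the policy selects this container.
func (p Policy) Matches(c RunningContainer) bool {
	inScope := len(p.Namespaces) == 0
	for _, ns := range p.Namespaces {
		if ns == c.Namespace {
			inScope = true
		}
	}
	if !inScope {
		return false
	}
	for _, prefix := range p.ImagePrefixes {
		if strings.HasPrefix(c.Image, prefix) {
			return true
		}
	}
	return false
}

// IndexCache ensures each image is indexed at most once, however many replicas run it.
type IndexCache struct {
	mu      sync.Mutex
	indexes map[string]string // image -> index reference ("" while building)
}

// claim returns true only for the first caller asking about an image.
func (c *IndexCache) claim(image string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, seen := c.indexes[image]; seen {
		return false
	}
	c.indexes[image] = ""
	return true
}

func (c *IndexCache) store(image, ref string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.indexes[image] = ref
}

// Reconcile builds indexes in the background for matching, not-yet-indexed images.
func Reconcile(policies []Policy, observed []RunningContainer, cache *IndexCache) []string {
	var wg sync.WaitGroup
	var started []string
	for _, c := range observed {
		for _, p := range policies {
			if !p.Matches(c) {
				continue
			}
			if cache.claim(c.Image) {
				started = append(started, c.Image)
				wg.Add(1)
				go func(image string) {
					defer wg.Done()
					cache.store(image, "index-for:"+image) // stand-in for the real SOCI index build
				}(c.Image)
			}
			break // the first matching policy is enough
		}
	}
	wg.Wait()
	return started
}

func main() {
	policies := []Policy{{Namespaces: []string{"inference"}, ImagePrefixes: []string{"registry.example.com/vllm"}}}
	observed := []RunningContainer{
		{"inference", "registry.example.com/vllm:0.5"},
		{"inference", "registry.example.com/vllm:0.5"}, // second replica, same image
		{"web", "registry.example.com/vllm:0.5"},       // namespace not selected
		{"inference", "docker.io/library/nginx:1.27"},  // image not selected
	}
	cache := &IndexCache{indexes: map[string]string{}}
	fmt.Println(Reconcile(policies, observed, cache)) // [registry.example.com/vllm:0.5]
	fmt.Println(Reconcile(policies, observed, cache)) // [] (already indexed)
}
```

Claiming an image in the cache before starting its build goroutine is what keeps several replicas of the same image from triggering duplicate index builds.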
It is written in Go and licensed under Apache 2.0.

Copy-paste prompts

Prompt 1
I have a Kubernetes cluster running large AI model containers that take 5+ minutes to pull. Walk me through installing Hermes and creating a HermesPolicy to speed up startup for my vLLM containers.
Prompt 2
How do I write a HermesPolicy that targets all containers in a specific namespace and enables lazy SOCI loading for them?
Prompt 3
After installing Hermes, how do I verify it is working and check the SOCI index build status for a specific container image?
Prompt 4
What are the cluster prerequisites for running Hermes and how does the node daemon integrate with the container runtime on each worker node?

Verify against the repo before relying on details.