Automatically investigate a failing Kubernetes deployment by collecting logs and events, then getting an AI diagnosis with a recommended fix.
Build a platform that streams real-time troubleshooting progress to a team dashboard while an AI investigates cluster issues.
Use this architecture doc as a blueprint for designing your own AI-assisted DevOps investigation tool.
Currently a design document only: no working code or setup guide exists yet, and an implementation would require a live Kubernetes cluster and an LLM API key.
This repository contains the high-level design document for an AI-powered troubleshooting platform aimed at Kubernetes environments. Kubernetes is a system used to run and manage containerized applications across multiple servers, and diagnosing failures in it often requires reading through logs, events, and status reports from many different sources at once. The goal of this project is to automate that investigation process using an AI model.
The planned architecture has four main layers. The first layer connects to a live Kubernetes cluster and collects diagnostic information: the health of individual application instances, their logs, scheduling events, deployment status, and network configuration. The second layer takes all of that collected data and passes it to a large language model, which can be Claude, GPT, or DeepSeek depending on configuration. The model reasons about what went wrong, correlates signals from the different sources, and produces a root cause diagnosis along with a confidence percentage and specific fix recommendations. The third layer is a backend service called InsForge that handles user authentication, stores the history of past investigations, and streams live progress updates as an investigation runs. The fourth layer is a front-end dashboard where users trigger investigations by entering a cluster name and namespace, then watch real-time status updates and read the AI's diagnosis and suggested remediation steps.
The repository appears to be in a design and planning stage rather than a finished product. The README is structured as a formal architecture document with ASCII diagrams showing how data flows between components, but there is no setup guide or working code described.
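Since the repository describes the design without code, the following is only a minimal sketch of how the first three layers might fit together, not the author's implementation. Every name in it (`DiagnosticBundle`, `Diagnosis`, `build_prompt`, `parse_diagnosis`, `investigate`, `fake_model`) is hypothetical, as is the assumption that the model replies in JSON. Cluster collection and the LLM call are replaced by in-memory stand-ins so the example runs without a cluster or an API key.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class DiagnosticBundle:
    """Layer 1 output: signals gathered from the cluster (hypothetical fields)."""
    namespace: str
    pod_statuses: dict[str, str] = field(default_factory=dict)
    logs: dict[str, str] = field(default_factory=dict)
    events: list[str] = field(default_factory=list)


@dataclass
class Diagnosis:
    """Layer 2 output: root cause, confidence percentage, and fix recommendations."""
    root_cause: str
    confidence_pct: int
    fixes: list[str]


def build_prompt(bundle: DiagnosticBundle) -> str:
    """Flatten all collected signals into one prompt so the model can correlate them."""
    lines = [f"Namespace: {bundle.namespace}", "Pod statuses:"]
    lines += [f"  {pod}: {status}" for pod, status in sorted(bundle.pod_statuses.items())]
    lines.append("Events:")
    lines += [f"  {event}" for event in bundle.events]
    lines.append("Logs:")
    lines += [f"  [{pod}] {text}" for pod, text in sorted(bundle.logs.items())]
    # Assumed reply format; the design document does not specify one.
    lines.append('Reply as JSON with keys "root_cause", "confidence_pct", "fixes".')
    return "\n".join(lines)


def parse_diagnosis(reply: str) -> Diagnosis:
    """Parse the model's JSON reply and clamp the confidence to the 0-100 range."""
    data = json.loads(reply)
    confidence = max(0, min(100, int(data["confidence_pct"])))
    return Diagnosis(str(data["root_cause"]), confidence, [str(f) for f in data["fixes"]])


def investigate(
    bundle: DiagnosticBundle, ask_model: Callable[[str], str]
) -> Iterator[tuple[str, object]]:
    """Yield progress events as the investigation runs, then the final diagnosis.

    A backend (layer 3) could forward these events to the dashboard as they arrive.
    """
    yield ("progress", "diagnostics collected")
    prompt = build_prompt(bundle)
    yield ("progress", "asking model")
    yield ("result", parse_diagnosis(ask_model(prompt)))


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned answer for demonstration."""
    return json.dumps({
        "root_cause": "Missing DATABASE_URL environment variable",
        "confidence_pct": 140,  # deliberately out of range to show clamping
        "fixes": ["Set DATABASE_URL in the Deployment spec"],
    })


example_bundle = DiagnosticBundle(
    namespace="shop",
    pod_statuses={"web-1": "CrashLoopBackOff"},
    logs={"web-1": "error: DATABASE_URL not set"},
    events=["Back-off restarting failed container"],
)

if __name__ == "__main__":
    for kind, payload in investigate(example_bundle, fake_model):
        print(kind, payload)
```

Writing `investigate` as a generator is one way to model the streamed progress updates the document describes: each step is emitted as soon as it completes, and the caller decides how to deliver it to the dashboard. Passing the model in as a plain callable keeps the choice of Claude, GPT, or DeepSeek a configuration detail outside the pipeline.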
Repository author: iam-veeramalla (profile on gitmyhub).
Verify against the repo before relying on details.