Turning your incident data into a knowledge graph
Spencer Cheng
22 April, 2026
Every incident your team resolves is a lesson waiting to be structured. A knowledge graph transforms scattered post-mortems into a living system your AI can reason over — turning institutional memory into a real-time advantage.
Every time an incident happens, your team generates a wealth of information — timelines, root causes, affected services, and the people involved. Most of this knowledge lives in post-mortems that nobody reads twice. What if you could turn all of that into a structured knowledge graph that your AI systems could actually reason over?
Post-mortems are written in free-form prose. They're useful in the moment but nearly impossible to query at scale. When your 200th incident hits, you can't easily answer questions like which past incidents affected the same service, who resolved them, and which runbooks worked.
The data is there — it's just locked in unstructured text.
A knowledge graph models entities (services, engineers, incidents, runbooks) as nodes and their relationships as edges. Once your incidents are in graph form, you can ask questions that flat documents can't answer.
This is the foundation for AI-powered incident response — an LLM with access to your knowledge graph can reason over your entire incident history, not just a single post-mortem.
Use an LLM to parse each post-mortem and extract structured data:
{
  "incident_id": "INC-1042",
  "services_affected": ["payments-api", "checkout-service"],
  "root_cause": "misconfigured rate limit after deploy",
  "resolved_by": ["alice@gleg.ai"],
  "duration_minutes": 47,
  "runbooks_used": ["RB-22", "RB-08"]
}
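LLM output is not guaranteed to match the shape you asked for, so it can help to validate each record before it reaches the graph. The following is a minimal sketch, assuming the model returns one JSON object with the fields shown above. `IncidentRecord` and `parse_incident` are hypothetical names, and the call to the model itself is left out.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


# Field names mirror the example record; the class name is hypothetical.
@dataclass(frozen=True)
class IncidentRecord:
    incident_id: str
    services_affected: tuple[str, ...]
    root_cause: str
    resolved_by: tuple[str, ...]
    duration_minutes: int
    runbooks_used: tuple[str, ...]


def parse_incident(raw: str) -> IncidentRecord:
    """Parse one LLM response; raise ValueError on missing or mistyped fields."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    # Required non-empty strings.
    for key in ("incident_id", "root_cause"):
        if not isinstance(data.get(key), str) or not data[key]:
            raise ValueError(f"{key} must be a non-empty string")

    # Required lists of strings (an empty list is allowed).
    for key in ("services_affected", "resolved_by", "runbooks_used"):
        value = data.get(key)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key} must be a list of strings")

    # bool is a subclass of int in Python, so exclude it explicitly.
    duration = data.get("duration_minutes")
    if isinstance(duration, bool) or not isinstance(duration, int) or duration < 0:
        raise ValueError("duration_minutes must be a non-negative integer")

    return IncidentRecord(
        incident_id=data["incident_id"],
        services_affected=tuple(data["services_affected"]),
        root_cause=data["root_cause"],
        resolved_by=tuple(data["resolved_by"]),
        duration_minutes=duration,
        runbooks_used=tuple(data["runbooks_used"]),
    )
```

Records that fail validation can be set aside for a retry or a manual look rather than silently adding bad nodes to the graph.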
Map your entities to a graph schema:
Incident → AFFECTED → Service
Engineer → RESOLVED → Incident
Incident → USED → Runbook
Incident → CAUSED_BY → RootCause
Give your AI agent a set of graph query tools. When an incident fires, it can instantly surface similar past incidents, the engineers who resolved them, and the runbooks that worked.
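To make the schema and the query tools concrete, here is a small in-memory sketch in plain Python. It assumes records shaped like the extraction example and stores each relationship as a (source, relation, target) triple. `record_to_edges`, `IncidentGraph`, and `similar_incidents` are hypothetical names, and a real deployment would more likely sit on a graph database than on dictionaries.

```python
from __future__ import annotations

from collections import defaultdict

# One edge per relationship in the schema: (source, relation, target).
Edge = tuple[str, str, str]


def record_to_edges(record: dict) -> list[Edge]:
    """Map one extracted incident record onto the four schema relations."""
    incident = record["incident_id"]
    edges: list[Edge] = [(incident, "AFFECTED", s) for s in record["services_affected"]]
    edges += [(engineer, "RESOLVED", incident) for engineer in record["resolved_by"]]
    edges += [(incident, "USED", runbook) for runbook in record["runbooks_used"]]
    edges.append((incident, "CAUSED_BY", record["root_cause"]))
    return edges


class IncidentGraph:
    """Hypothetical in-memory graph, indexed in both directions."""

    def __init__(self) -> None:
        self._out: dict[tuple[str, str], set[str]] = defaultdict(set)
        self._in: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, edges: list[Edge]) -> None:
        for source, relation, target in edges:
            self._out[(source, relation)].add(target)
            self._in[(target, relation)].add(source)

    def targets(self, source: str, relation: str) -> set[str]:
        return set(self._out.get((source, relation), ()))

    def sources(self, target: str, relation: str) -> set[str]:
        return set(self._in.get((target, relation), ()))

    def similar_incidents(self, service: str) -> list[dict]:
        """Query tool: past incidents on `service`, with resolvers and runbooks."""
        return [
            {
                "incident": incident,
                "resolved_by": sorted(self.sources(incident, "RESOLVED")),
                "runbooks": sorted(self.targets(incident, "USED")),
                "root_causes": sorted(self.targets(incident, "CAUSED_BY")),
            }
            for incident in sorted(self.sources(service, "AFFECTED"))
        ]


graph = IncidentGraph()
graph.add(record_to_edges({
    "incident_id": "INC-1042",
    "services_affected": ["payments-api", "checkout-service"],
    "root_cause": "misconfigured rate limit after deploy",
    "resolved_by": ["alice@gleg.ai"],
    "duration_minutes": 47,
    "runbooks_used": ["RB-22", "RB-08"],
}))
print(graph.similar_incidents("payments-api"))
```

One design note: this sketch uses the free-text root cause as a node, so two incidents only share a RootCause node when the text matches exactly. Normalizing root causes into a fixed set of categories during extraction would be one way around that, though that step is an assumption and not something described here.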
At Gleg, we've seen teams cut their mean time to resolution by over 40% once their AI has access to a well-structured incident knowledge graph. The AI doesn't replace your on-call engineer — it gives them a 10-year institutional memory in the first 30 seconds of an incident.
The graph also gets smarter over time. Every resolved incident adds new edges, new patterns, and new context. The longer you run it, the more valuable it becomes.
You don't need to graph every incident at once. Start with your last 50 post-mortems, extract entities with an LLM, and build a small graph. Run a few queries. See what surfaces. Then scale from there.
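For a first pass over a small batch, even simple counts can show what is worth graphing. The sketch below assumes a list of extracted records with the same fields as the example record. `summarize` is a hypothetical helper, not part of any library.

```python
from __future__ import annotations

from collections import Counter


def summarize(records: list[dict], top: int = 3) -> dict:
    """Count how often each service and runbook appears across a batch."""
    service_counts: Counter[str] = Counter()
    runbook_counts: Counter[str] = Counter()
    for record in records:
        # set() so a value repeated inside one record is counted once.
        service_counts.update(set(record["services_affected"]))
        runbook_counts.update(set(record["runbooks_used"]))
    return {
        "most_affected_services": service_counts.most_common(top),
        "most_used_runbooks": runbook_counts.most_common(top),
    }
```

Services and runbooks that show up repeatedly in the summary are natural starting points for the first graph queries.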
The hardest part isn't the technology — it's committing to writing post-mortems consistently enough that the data is worth graphing.