An agent asked “why is my app returning 500s?” runs a full incident triage: it checks pod health, recent Kubernetes events, error logs, and deployment rollout history across multiple execute calls, reasoning about each result before deciding what to check next.
The Triage Flow
This isn’t a single code block; it’s how the agent thinks. Each step is one execute call, but the agent decides what to check based on what it finds.
Step 1 — Pod Health Check
async () => {
  const clusterId = "cls_abc123"; // resolved by the agent from conversation
  const namespace = "production"; // resolved by the agent from conversation

  const kube = (path) => cnap.request({
    method: "GET",
    path: `/v1/clusters/${clusterId}/kube/${path}`,
  }).then(r => r.body);

  const pods = await kube(`api/v1/namespaces/${namespace}/pods`);

  return pods.items.map(p => ({
    name: p.metadata.name,
    phase: p.status.phase,
    restarts: p.status.containerStatuses?.reduce((s, c) => s + c.restartCount, 0) || 0,
    ready: p.status.containerStatuses?.every(c => c.ready) || false,
    containers: p.status.containerStatuses?.map(c => ({
      name: c.name,
      ready: c.ready,
      restarts: c.restartCount,
      state: Object.keys(c.state || {})[0],
      reason: c.state?.waiting?.reason || c.state?.terminated?.reason || null,
    })),
  }));
}
The agent sees a pod in CrashLoopBackOff with 12 restarts. It decides to check events and logs.
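As a rough illustration of the judgment the agent applies to the Step 1 output, the sketch below filters that summary down to pods worth a closer look. The helper name `findSuspectPods` and the restart threshold of 5 are assumptions made for this example, not part of the platform; the input shape mirrors what Step 1 returns, and the sample values are made up.

```javascript
// Hypothetical helper: picks pods from the Step 1 summary that deserve follow-up.
// maxRestarts is an arbitrary illustrative threshold.
function findSuspectPods(summary, maxRestarts = 5) {
  return summary
    // Completed pods (phase "Succeeded") are never ready, so skip them
    .filter(p => p.phase !== "Succeeded" && (!p.ready || p.restarts > maxRestarts))
    .map(p => ({
      name: p.name,
      restarts: p.restarts,
      // First non-null container reason, e.g. "CrashLoopBackOff"
      reason: (p.containers || []).map(c => c.reason).find(Boolean) || null,
    }));
}

// Sample data shaped like the Step 1 return value (values are made up)
const sample = [
  {
    name: "web-example", phase: "Running", restarts: 0, ready: true,
    containers: [{ name: "web", ready: true, restarts: 0, state: "running", reason: null }],
  },
  {
    name: "api-proxy-example", phase: "Running", restarts: 12, ready: false,
    containers: [{ name: "api", ready: false, restarts: 12, state: "waiting", reason: "CrashLoopBackOff" }],
  },
];

const suspects = findSuspectPods(sample);
console.log(suspects);
```

In practice the agent makes this call by reasoning over the returned summary rather than by running a fixed filter; the sketch only shows which fields carry the signal.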
Step 2 — Recent Events
async () => {
  const clusterId = "cls_abc123"; // resolved by the agent from conversation
  const namespace = "production"; // from step 1
  const podName = "api-proxy-7f8b4c..."; // from step 1 results

  const kube = (path) => cnap.request({
    method: "GET",
    path: `/v1/clusters/${clusterId}/kube/${path}`,
  }).then(r => r.body);

  const events = await kube(
    `api/v1/namespaces/${namespace}/events?fieldSelector=involvedObject.name=${podName}`
  );

  // Sort by last timestamp, return most recent
  return events.items
    .sort((a, b) => new Date(b.lastTimestamp) - new Date(a.lastTimestamp))
    .slice(0, 15)
    .map(e => ({
      type: e.type,
      reason: e.reason,
      message: e.message,
      count: e.count,
      last: e.lastTimestamp,
    }));
}
Events show OOMKilled — the container ran out of memory. The agent checks logs to confirm.
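Kubernetes also records the most recent termination on the pod object itself, under `status.containerStatuses[].lastState.terminated`, where an out-of-memory kill shows up with reason `OOMKilled` and typically exit code 137. The sketch below reads that field from a raw pod object as a cross-check on the events; the helper name `lastTerminations` and the sample pod are illustrative assumptions.

```javascript
// Hypothetical helper: extracts the last termination record of each container
// from a raw pod object (the shape returned by the Kubernetes pods API).
function lastTerminations(pod) {
  return (pod.status?.containerStatuses || [])
    .filter(c => c.lastState?.terminated)
    .map(c => ({
      container: c.name,
      reason: c.lastState.terminated.reason,
      exitCode: c.lastState.terminated.exitCode,
      finishedAt: c.lastState.terminated.finishedAt,
    }));
}

// Made-up pod status for illustration only
const samplePod = {
  metadata: { name: "api-proxy-example" },
  status: {
    containerStatuses: [
      {
        name: "api",
        restartCount: 12,
        state: { waiting: { reason: "CrashLoopBackOff" } },
        lastState: {
          terminated: { reason: "OOMKilled", exitCode: 137, finishedAt: "2024-01-01T00:00:00Z" },
        },
      },
    ],
  },
};

const terminations = lastTerminations(samplePod);
const oomKilled = terminations.some(t => t.reason === "OOMKilled");
console.log(terminations, oomKilled);
```

The Step 1 summary reports only the current `state`, so a crash-looping container appears there as `CrashLoopBackOff`; `lastState` is where the reason for the previous exit lives.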
Step 3 — Error Logs
async () => {
  const clusterId = "cls_abc123"; // resolved by the agent from conversation
  const namespace = "production"; // from step 1
  const podName = "api-proxy-7f8b4c..."; // from step 1 results

  const logs = await cnap.request({
    method: "GET",
    path: `/v1/clusters/${clusterId}/kube/api/v1/namespaces/${namespace}/pods/${podName}/log`,
    query: { tailLines: "200", previous: "true" },
  }).then(r => r.body);

  // Filter for errors and warnings
  const lines = logs.split("\n");
  const errors = lines.filter(l =>
    /error|fatal|panic|exception|oom|killed/i.test(l)
  );

  return {
    total_lines: lines.length,
    error_lines: errors.length,
    errors: errors.slice(-20),
  };
}
Note the previous: "true" query parameter: it makes the agent fetch logs from the container instance that crashed, not the one currently restarting. The agent finds memory allocation failures in the last 20 error lines.
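To make the Step 3 filter concrete, here is a self-contained sketch that applies the same keyword pattern to a few made-up log lines and also tallies matches per keyword. The name `summarizeLogs`, the per-keyword tally, and the sample lines are assumptions added for illustration; unlike Step 3, this version drops empty lines before counting.

```javascript
// Same keywords as the Step 3 regular expression
const KEYWORDS = ["error", "fatal", "panic", "exception", "oom", "killed"];

// Hypothetical helper: filters log text for error-like lines and counts keyword hits
function summarizeLogs(text) {
  const lines = text.split("\n").filter(l => l.length > 0);
  const pattern = new RegExp(KEYWORDS.join("|"), "i");
  const errors = lines.filter(l => pattern.test(l));

  const counts = {};
  for (const line of errors) {
    const lower = line.toLowerCase();
    for (const k of KEYWORDS) {
      if (lower.includes(k)) counts[k] = (counts[k] || 0) + 1;
    }
  }

  return {
    total_lines: lines.length,
    error_lines: errors.length,
    counts,
    errors: errors.slice(-20), // keep only the most recent matches
  };
}

// Made-up log output for illustration only
const sampleLogs = [
  "INFO listening on :8080",
  "WARN heap usage at 95%",
  "ERROR failed to allocate buffer: out of memory",
  "FATAL runtime: out of memory",
  "",
].join("\n");

const summary = summarizeLogs(sampleLogs);
console.log(summary);
```

The pattern matches substrings, so a keyword like `oom` would also hit unrelated words such as "room"; that trade-off keeps the filter cheap, and the agent reads the surviving lines rather than trusting the counts alone.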
Step 4 — Deployment Rollout History
async () => {
  const clusterId = "cls_abc123"; // resolved by the agent from conversation
  const namespace = "production"; // from step 1
  const deploymentName = "api-proxy"; // from step 1 results

  const kube = (path) => cnap.request({
    method: "GET",
    path: `/v1/clusters/${clusterId}/kube/${path}`,
  }).then(r => r.body);

  const [deployment, replicaSets] = await Promise.all([
    kube(`apis/apps/v1/namespaces/${namespace}/deployments/${deploymentName}`),
    kube(`apis/apps/v1/namespaces/${namespace}/replicasets`),
  ]);

  // Revision number recorded by the deployment controller (0 if missing)
  const revisionOf = (rs) =>
    parseInt(rs.metadata.annotations?.["deployment.kubernetes.io/revision"] || "0", 10);

  // Find ReplicaSets owned by this deployment, newest revision first
  const owned = replicaSets.items
    .filter(rs => rs.metadata.ownerReferences?.some(
      o => o.kind === "Deployment" && o.name === deploymentName))
    .sort((a, b) => revisionOf(b) - revisionOf(a));

  return {
    current_image: deployment.spec.template.spec.containers[0]?.image,
    current_limits: deployment.spec.template.spec.containers[0]?.resources?.limits,
    revisions: owned.slice(0, 5).map(rs => ({
      revision: rs.metadata.annotations?.["deployment.kubernetes.io/revision"],
      image: rs.spec.template.spec.containers[0]?.image,
      replicas: rs.status.replicas,
      created: rs.metadata.creationTimestamp,
    })),
  };
}
The agent finds that the latest revision changed the image and also removed the memory limits. Root cause identified.
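The comparison behind that conclusion can be sketched as a diff between two ReplicaSet objects, looking at the first container's image and resource limits. The helper name `diffRevisions`, the registry host, the image tags, and the limit values below are all invented for illustration.

```javascript
// Hypothetical helper: compares the first container of two ReplicaSet objects.
// Limit values are compared as strings, so "1Gi" vs "1024Mi" counts as a change.
function diffRevisions(previous, current) {
  const firstContainer = rs => rs.spec.template.spec.containers[0] || {};
  const prev = firstContainer(previous);
  const curr = firstContainer(current);

  const prevLimits = prev.resources?.limits || {};
  const currLimits = curr.resources?.limits || {};
  const keys = new Set([...Object.keys(prevLimits), ...Object.keys(currLimits)]);

  const limitChanges = [...keys]
    .filter(k => prevLimits[k] !== currLimits[k])
    .map(k => ({ resource: k, from: prevLimits[k] ?? null, to: currLimits[k] ?? null }));

  return {
    imageChanged: prev.image !== curr.image,
    image: { from: prev.image ?? null, to: curr.image ?? null },
    limitChanges,
  };
}

// Made-up ReplicaSet fragments for illustration only
const makeReplicaSet = (image, limits) => ({
  spec: { template: { spec: { containers: [{ name: "api", image, resources: { limits } }] } } },
});

const previousRs = makeReplicaSet("registry.example.com/api-proxy:1.4.0", { cpu: "500m", memory: "512Mi" });
const currentRs = makeReplicaSet("registry.example.com/api-proxy:1.5.0", { cpu: "500m" });

const diff = diffRevisions(previousRs, currentRs);
console.log(JSON.stringify(diff, null, 2));
```

A `to: null` entry means the limit disappeared in the newer revision, which is the pattern the agent spots in this scenario.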
Why This Matters
An SRE manually doing this would:
kubectl get pods — check status
kubectl describe pod — read events
kubectl logs --previous — check crash logs
kubectl rollout history — check what changed
That’s 4 separate commands with raw output they need to mentally parse. The agent does it in 4 execute calls, but each one filters and extracts only what’s relevant. The LLM reasons about structured findings, not walls of YAML.
More importantly, the agent adapts. It doesn’t run a fixed checklist: it sees OOMKilled and decides to check previous container logs and deployment history. A traditional MCP tool would need a pre-built “debug pod” tool that tries to anticipate every scenario.