@a5c-ai/krate 5.0.1-staging.f672fe79b
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/Dockerfile +29 -0
- package/README.md +183 -0
- package/bin/krate-demo.mjs +23 -0
- package/bin/krate-server.mjs +14 -0
- package/dist/krate-controller-ui.json +2407 -0
- package/dist/krate-lifecycle.json +201 -0
- package/dist/krate-runtime-snapshot.json +2955 -0
- package/dist/krate-summary.json +687 -0
- package/docs/README.md +61 -0
- package/docs/agents/README.md +83 -0
- package/docs/agents/acceptance-test-matrix.md +193 -0
- package/docs/agents/agent-mux-adapter-contract.md +167 -0
- package/docs/agents/agent-mux-source-map.md +310 -0
- package/docs/agents/agent-run-memory-import-spec.md +256 -0
- package/docs/agents/agent-stack-management-spec.md +421 -0
- package/docs/agents/api-contract-spec.md +309 -0
- package/docs/agents/artifacts-writeback-spec.md +145 -0
- package/docs/agents/chart-packaging-spec.md +128 -0
- package/docs/agents/ci-orchestration-spec.md +140 -0
- package/docs/agents/context-assembly-spec.md +219 -0
- package/docs/agents/controller-reconciliation-spec.md +255 -0
- package/docs/agents/crd-schema-spec.md +315 -0
- package/docs/agents/decision-log-open-questions.md +169 -0
- package/docs/agents/developer-implementation-checklist.md +329 -0
- package/docs/agents/dispatching-design.md +262 -0
- package/docs/agents/glossary.md +66 -0
- package/docs/agents/implementation-blueprint.md +324 -0
- package/docs/agents/implementation-rollout-slices.md +251 -0
- package/docs/agents/memory-context-integration-spec.md +194 -0
- package/docs/agents/memory-ontology-schema-spec.md +253 -0
- package/docs/agents/memory-operations-runbook.md +121 -0
- package/docs/agents/mvp-vertical-slice-spec.md +146 -0
- package/docs/agents/observability-audit-spec.md +265 -0
- package/docs/agents/operator-runbook.md +174 -0
- package/docs/agents/org-memory-api-payload-examples.md +333 -0
- package/docs/agents/org-memory-controller-sequence-spec.md +181 -0
- package/docs/agents/org-memory-e2e-fixture-plan.md +161 -0
- package/docs/agents/org-memory-ui-implementation-map.md +114 -0
- package/docs/agents/org-memory-vertical-slice-spec.md +168 -0
- package/docs/agents/org-resource-model-delta-spec.md +111 -0
- package/docs/agents/org-route-resource-model-spec.md +183 -0
- package/docs/agents/org-scoping-namespace-spec.md +114 -0
- package/docs/agents/rbac-secrets-management-spec.md +406 -0
- package/docs/agents/repository-page-integration-spec.md +255 -0
- package/docs/agents/resource-contract-examples.md +808 -0
- package/docs/agents/resource-relationship-map.md +190 -0
- package/docs/agents/security-threat-model.md +188 -0
- package/docs/agents/shared-memory-company-brain-spec.md +358 -0
- package/docs/agents/storage-migration-spec.md +168 -0
- package/docs/agents/subagent-orchestration-spec.md +152 -0
- package/docs/agents/system-overview.md +88 -0
- package/docs/agents/tools-mcp-skills-spec.md +189 -0
- package/docs/agents/traceability-matrix.md +79 -0
- package/docs/agents/ui-flow-spec.md +211 -0
- package/docs/agents/ui-ux-system-spec.md +426 -0
- package/docs/agents/workspace-lifecycle-spec.md +166 -0
- package/docs/architecture-spec.md +78 -0
- package/docs/components/control-plane.md +78 -0
- package/docs/components/data-plane.md +69 -0
- package/docs/components/hooks-events.md +67 -0
- package/docs/components/identity-rbac-policy.md +73 -0
- package/docs/components/kubevela-oam.md +70 -0
- package/docs/components/operations-publishing.md +81 -0
- package/docs/components/runners-ci.md +66 -0
- package/docs/components/web-ui.md +94 -0
- package/docs/external/README.md +47 -0
- package/docs/external/bidirectional-sync-design.md +134 -0
- package/docs/external/cicd-interface.md +64 -0
- package/docs/external/external-backend-controllers.md +170 -0
- package/docs/external/external-backend-crds.md +234 -0
- package/docs/external/external-backend-ui-spec.md +151 -0
- package/docs/external/external-backend-ux-flows.md +115 -0
- package/docs/external/external-object-mapping.md +125 -0
- package/docs/external/git-forge-interface.md +68 -0
- package/docs/external/github-integration-design.md +151 -0
- package/docs/external/issue-tracking-interface.md +66 -0
- package/docs/external/provider-capability-manifests.md +204 -0
- package/docs/external/provider-catalog.md +139 -0
- package/docs/external/provider-rollout-testing.md +78 -0
- package/docs/external/research-results.md +48 -0
- package/docs/external/security-auth-permissions.md +81 -0
- package/docs/external/sync-state-machines.md +108 -0
- package/docs/external/unified-external-backend-model.md +107 -0
- package/docs/external/user-facing-changes.md +67 -0
- package/docs/gaps.md +161 -0
- package/docs/install.md +94 -0
- package/docs/krate-design.md +334 -0
- package/docs/local-minikube.md +55 -0
- package/docs/ontology/README.md +32 -0
- package/docs/ontology/bounded-contexts.md +29 -0
- package/docs/ontology/events-and-hooks.md +32 -0
- package/docs/ontology/oam-kubevela.md +32 -0
- package/docs/ontology/operations-and-release.md +25 -0
- package/docs/ontology/personas-and-actors.md +32 -0
- package/docs/ontology/policies-and-invariants.md +33 -0
- package/docs/ontology/problem-space.md +30 -0
- package/docs/ontology/resource-contracts.md +40 -0
- package/docs/ontology/resource-taxonomy.md +42 -0
- package/docs/ontology/runners-and-ci.md +29 -0
- package/docs/ontology/solution-space.md +24 -0
- package/docs/ontology/storage-and-data-boundaries.md +29 -0
- package/docs/ontology/validation-matrix.md +24 -0
- package/docs/ontology/web-ui-excellent-flows.md +32 -0
- package/docs/ontology/workflows.md +39 -0
- package/docs/ontology/world.md +35 -0
- package/docs/product-requirements.md +62 -0
- package/docs/roadmap-mvp.md +87 -0
- package/docs/system-requirements.md +90 -0
- package/docs/tests/README.md +53 -0
- package/docs/tests/agent-qa-plan.md +63 -0
- package/docs/tests/browser-ui-tests.md +62 -0
- package/docs/tests/ci-quality-gates.md +48 -0
- package/docs/tests/coverage-model.md +64 -0
- package/docs/tests/e2e-scenario-tests.md +53 -0
- package/docs/tests/fixtures-test-data.md +63 -0
- package/docs/tests/observability-reliability-tests.md +54 -0
- package/docs/tests/product-test-matrix.md +145 -0
- package/docs/tests/qa-adoption-roadmap.md +130 -0
- package/docs/tests/qa-automation-plan.md +101 -0
- package/docs/tests/security-compliance-tests.md +57 -0
- package/docs/tests/test-framework-tools.md +88 -0
- package/docs/tests/test-suite-layout.md +121 -0
- package/docs/tests/unit-integration-tests.md +48 -0
- package/docs/todo-kyverno +714 -0
- package/docs/user-stories.md +78 -0
- package/examples/minikube-demo.yaml +190 -0
- package/examples/oam-application.yaml +23 -0
- package/examples/policy-kyverno-pr-title.yaml +18 -0
- package/package.json +63 -0
- package/scripts/build.mjs +29 -0
- package/scripts/setup-minikube.mjs +65 -0
- package/scripts/smoke.mjs +37 -0
- package/scripts/validate-doc-coverage.mjs +152 -0
- package/scripts/validate-package.mjs +93 -0
- package/scripts/validate-ui.mjs +207 -0
- package/src/agent-approval-controller.js +123 -0
- package/src/agent-context-bundles.js +242 -0
- package/src/agent-dispatch-controller.js +86 -0
- package/src/agent-mux-client.js +280 -0
- package/src/agent-permission-review.js +162 -0
- package/src/agent-stack-controller.js +296 -0
- package/src/agent-trigger-controller.js +108 -0
- package/src/api-controller.js +206 -0
- package/src/argocd-gitops.js +43 -0
- package/src/auth.js +265 -0
- package/src/component-catalog.js +41 -0
- package/src/control-plane.js +136 -0
- package/src/controller-client.js +38 -0
- package/src/controller-ui.js +538 -0
- package/src/data-plane.js +178 -0
- package/src/gitea-backend.js +95 -0
- package/src/handoff.js +98 -0
- package/src/hooks-events.js +63 -0
- package/src/http-server.js +151 -0
- package/src/identity-policy.js +86 -0
- package/src/index.js +30 -0
- package/src/kubernetes-controller.js +812 -0
- package/src/kubernetes-resource-gateway.js +48 -0
- package/src/operations.js +112 -0
- package/src/resource-model.js +203 -0
- package/src/runners-ci.js +48 -0
- package/src/runtime.js +196 -0
- package/src/web-ui.js +40 -0
- package/tests/agent-approval-controller.test.js +173 -0
- package/tests/agent-context-bundles.test.js +278 -0
- package/tests/agent-dispatch-controller.test.js +176 -0
- package/tests/agent-mux-client.test.js +204 -0
- package/tests/agent-permission-review.test.js +209 -0
- package/tests/agent-resources.test.js +212 -0
- package/tests/agent-stack-controller.test.js +221 -0
- package/tests/agent-trigger-controller.test.js +211 -0
- package/tests/deployment.test.js +395 -0
- package/tests/e2e/lifecycle.test.js +117 -0
- package/tests/krate.test.js +727 -0
|
@@ -0,0 +1,146 @@
|
|
|
1
|
+
# Agent MVP vertical slice spec
|
|
2
|
+
|
|
3
|
+
## Purpose
|
|
4
|
+
|
|
5
|
+
This document defines the smallest coherent implementation that proves Krate agent orchestration without attempting every spec at once. It should be the first code implementation target after the docs-only phase.
|
|
6
|
+
|
|
7
|
+
## MVP goal
|
|
8
|
+
|
|
9
|
+
A repository admin can define one read-only/diagnostic agent stack, review its Kubernetes-native permissions, manually dispatch it from a repository Code or Runs page, and see a CI-like `AgentDispatchRun` with context preview, permission snapshot, Agent Mux session binding when configured, and no write-back unless explicitly approved in a later slice.
|
|
10
|
+
|
|
11
|
+
## Included user flow
|
|
12
|
+
|
|
13
|
+
1. Admin creates minimal agent resources:
   - `AgentServiceAccount`;
   - `AgentRoleBinding` using read-only role template;
   - `AgentToolProfile` with read-only filesystem and no broad network;
   - optional `AgentMcpServer` if Agent Mux capability lookup requires it;
   - `AgentStack` for diagnostic/read-only work.
|
|
19
|
+
2. Admin runs permission review and sees `Ready=True` or exact blockers.
|
|
20
|
+
3. User opens `/orgs/[org]/repositories/[repo]/code` or `/orgs/[org]/repositories/[repo]/runs`.
|
|
21
|
+
4. User opens dispatch composer.
|
|
22
|
+
5. Krate assembles `AgentContextBundle` preview and permission review.
|
|
23
|
+
6. User confirms dispatch.
|
|
24
|
+
7. Krate creates `AgentDispatchRun` and `AgentDispatchAttempt` before Agent Mux launch.
|
|
25
|
+
8. Run appears in `/agents/runs` and repository Runs page.
|
|
26
|
+
9. If Agent Mux is configured, Krate launches and binds session/run IDs.
|
|
27
|
+
10. Run detail shows queued/running/completed/failed state, context digest, permission snapshot, event timeline, and linked session placeholder or chat panel.
|
|
28
|
+
|
|
29
|
+
## Included resources
|
|
30
|
+
|
|
31
|
+
Configuration:
|
|
32
|
+
|
|
33
|
+
- `AgentStack`;
|
|
34
|
+
- `AgentToolProfile`;
|
|
35
|
+
- `AgentServiceAccount`;
|
|
36
|
+
- `AgentRoleBinding`;
|
|
37
|
+
- `AgentSecretGrant` only for model provider token if required by deployment mode;
|
|
38
|
+
- `AgentConfigGrant` only if the selected tool/adapter needs non-secret config.
|
|
39
|
+
|
|
40
|
+
Execution:
|
|
41
|
+
|
|
42
|
+
- `AgentContextBundle`;
|
|
43
|
+
- `AgentDispatchRun`;
|
|
44
|
+
- `AgentDispatchAttempt`;
|
|
45
|
+
- `AgentCapabilityRequirement`;
|
|
46
|
+
- optional `AgentSession` projection if Agent Mux binds successfully;
|
|
47
|
+
- optional `AgentArtifact` for diagnosis summary.
|
|
48
|
+
|
|
49
|
+
## Included routes
|
|
50
|
+
|
|
51
|
+
Global:
|
|
52
|
+
|
|
53
|
+
- `/agents` overview;
|
|
54
|
+
- `/agents/stacks` minimal stack list/builder;
|
|
55
|
+
- `/agents/runs` dispatch list;
|
|
56
|
+
- `/agents/runs/[run]` run detail;
|
|
57
|
+
- `/agents/permissions` permission review panel.
|
|
58
|
+
|
|
59
|
+
Repository:
|
|
60
|
+
|
|
61
|
+
- `/orgs/[org]/repositories/[repo]/code` dispatch entry point;
|
|
62
|
+
- `/orgs/[org]/repositories/[repo]/runs` agent run rows beside pipeline/job rows;
|
|
63
|
+
- `/orgs/[org]/repositories/[repo]/settings/agents` minimal settings panel if sub-routes are available, otherwise embed in existing settings page.
|
|
64
|
+
|
|
65
|
+
API:
|
|
66
|
+
|
|
67
|
+
- generic resource API remains supported;
|
|
68
|
+
- `POST /api/agents/permissions/review`;
|
|
69
|
+
- `POST /api/agents/runs`;
|
|
70
|
+
- `GET /api/agents/runs`;
|
|
71
|
+
- `GET /api/agents/runs/:run`;
|
|
72
|
+
- `GET /api/agents/runs/:run/events` or watch equivalent.
|
|
73
|
+
|
|
74
|
+
## Explicitly deferred
|
|
75
|
+
|
|
76
|
+
- automatic trigger rules;
|
|
77
|
+
- issue/PR mention dispatch;
|
|
78
|
+
- write-back approvals and branch pushes;
|
|
79
|
+
- full Secret/ConfigMap grant wizard;
|
|
80
|
+
- subagent execution;
|
|
81
|
+
- workspace provisioning/rebase lifecycle;
|
|
82
|
+
- artifact write-back;
|
|
83
|
+
- retention jobs;
|
|
84
|
+
- full observability dashboards;
|
|
85
|
+
- broad chart feature gates beyond safe defaults;
|
|
86
|
+
- production MCP server management.
|
|
87
|
+
|
|
88
|
+
## MVP acceptance criteria
|
|
89
|
+
|
|
90
|
+
### Resource model
|
|
91
|
+
|
|
92
|
+
- Agent MVP kinds are in `src/resource-model.js` and `src/kubernetes-controller.js`.
|
|
93
|
+
- Generic `/api/controller/resources?kind=AgentStack` can list stacks.
|
|
94
|
+
- Agent resources render in advanced resource tables without custom UI hacks.
|
|
95
|
+
|
|
96
|
+
### Permission review
|
|
97
|
+
|
|
98
|
+
- Missing ServiceAccount blocks stack readiness.
|
|
99
|
+
- Missing role binding blocks stack readiness.
|
|
100
|
+
- Missing required Secret grant blocks stack readiness.
|
|
101
|
+
- Permission review response includes no Secret values.
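
The readiness blockers above can be sketched as a pure function. This is a minimal illustration, not the code in `src/agent-permission-review.js`: the function name `reviewStackPermissions` and field names such as `spec.serviceAccountRef`, `spec.requiredSecrets`, and `spec.stackRef` are assumptions made for the example.

```javascript
// Hypothetical sketch: field names and resource shapes are assumptions, not Krate's API.
function reviewStackPermissions(stack, resources) {
  const has = (kind, name) =>
    resources.some((r) => r.kind === kind && r.metadata.name === name);
  const blockers = [];

  const sa = stack.spec.serviceAccountRef;
  if (!sa || !has("AgentServiceAccount", sa)) {
    blockers.push({ reason: "MissingServiceAccount", ref: sa ?? null });
  }

  const bound =
    Boolean(sa) &&
    resources.some(
      (r) => r.kind === "AgentRoleBinding" && r.spec.serviceAccountRef === sa
    );
  if (!bound) blockers.push({ reason: "MissingRoleBinding", ref: sa ?? null });

  // Required Secrets are matched by name and key only; values are never read.
  for (const req of stack.spec.requiredSecrets ?? []) {
    const granted = resources.some(
      (r) =>
        r.kind === "AgentSecretGrant" &&
        r.spec.stackRef === stack.metadata.name &&
        r.spec.secretName === req.name &&
        r.spec.key === req.key
    );
    if (!granted) {
      blockers.push({ reason: "MissingSecretGrant", ref: `${req.name}/${req.key}` });
    }
  }

  return { stack: stack.metadata.name, ready: blockers.length === 0, blockers };
}
```

Because the response is built only from names, keys, and reasons, it cannot leak Secret values by construction.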
|
|
102
|
+
|
|
103
|
+
### Manual dispatch
|
|
104
|
+
|
|
105
|
+
- Dispatch creates run and attempt before Agent Mux launch.
|
|
106
|
+
- Context bundle digest is attached to the attempt.
|
|
107
|
+
- Permission snapshot digest is attached to the attempt.
|
|
108
|
+
- Denied dispatch returns actionable policy/RBAC/grant reason.
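
A minimal sketch of the ordering these criteria describe, assuming an in-memory `store` array, a precomputed `review` object, and an optional `launch` callback. All of these names are illustrative and do not come from `src/agent-dispatch-controller.js`.

```javascript
// Hypothetical sketch: `store`, `review`, and `launch` are illustrative stand-ins.
function dispatchManually({ store, review, contextDigest, launch }) {
  if (!review.ready) {
    // Denied dispatch: surface the blocker reasons so the caller can act on them.
    return { accepted: false, reasons: review.blockers.map((b) => b.reason) };
  }

  const seq = store.filter((r) => r.kind === "AgentDispatchRun").length + 1;
  const run = { kind: "AgentDispatchRun", name: `adr-${seq}`, phase: "Queued" };
  const attempt = {
    kind: "AgentDispatchAttempt",
    name: `${run.name}-1`,
    run: run.name,
    contextBundleDigest: contextDigest,
    permissionSnapshotDigest: review.digest,
  };

  // Persist both records before any launch so the run is visible even if launch fails.
  store.push(run, attempt);

  // Launch is optional in this sketch: with no callback the run simply stays queued.
  if (launch) launch(attempt);
  return { accepted: true, run, attempt };
}
```

Writing the records first means the Runs page can show the dispatch regardless of what happens on the Agent Mux side.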
|
|
109
|
+
|
|
110
|
+
### UI
|
|
111
|
+
|
|
112
|
+
- Code page shows dispatch entry point and disabled/missing-permission states.
|
|
113
|
+
- Runs page shows agent dispatch rows beside pipeline rows.
|
|
114
|
+
- Run detail shows source breadcrumbs, status, context, permission, and Agent Mux binding state.
|
|
115
|
+
- Empty states are server-projected through controller UI model.
|
|
116
|
+
|
|
117
|
+
### Agent Mux
|
|
118
|
+
|
|
119
|
+
- If the gateway is unavailable, the run is marked failed/degraded with a clear condition and Krate stays usable.
|
|
120
|
+
- If gateway is configured, run/session IDs are stored exactly once.
|
|
121
|
+
- Unsupported actions are disabled based on capability snapshot.
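
The three Agent Mux criteria can be illustrated with a small binding helper. This is a sketch under stated assumptions: `gateway.launch`, the phase names, the condition types, and the action names are hypothetical and are not taken from `src/agent-mux-client.js`.

```javascript
// Hypothetical sketch: `gateway.launch` and the condition/phase names are assumptions.
function launchAndBind(attempt, gateway) {
  if (attempt.agentMuxRunId) {
    // Already bound: never overwrite stored IDs on retry or replay.
    return attempt;
  }
  if (!gateway) {
    attempt.phase = "Degraded";
    attempt.condition = { type: "AgentMuxUnavailable", status: "True" };
    return attempt;
  }
  try {
    const { runId, sessionId } = gateway.launch(attempt.name);
    attempt.agentMuxRunId = runId;
    attempt.agentMuxSessionId = sessionId;
    attempt.phase = "Running";
  } catch (err) {
    attempt.phase = "Failed";
    attempt.condition = { type: "AgentMuxLaunchFailed", message: err.message };
  }
  return attempt;
}

// An action is enabled only when the capability snapshot lists it.
function enabledActions(requested, capabilitySnapshot) {
  return requested.map((action) => ({
    action,
    enabled: capabilitySnapshot.includes(action),
  }));
}
```

The early return on an existing `agentMuxRunId` is what makes the binding idempotent in this sketch.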
|
|
122
|
+
|
|
123
|
+
## MVP tests
|
|
124
|
+
|
|
125
|
+
Required minimum tests:
|
|
126
|
+
|
|
127
|
+
- resource schema test for `AgentStack` and `AgentDispatchRun`;
|
|
128
|
+
- permission review missing grant test;
|
|
129
|
+
- manual dispatch creates run/attempt/context test;
|
|
130
|
+
- Agent Mux unavailable fallback test;
|
|
131
|
+
- UI validation for Code dispatch entry and run detail empty/pending states;
|
|
132
|
+
- package validation for CRDs/examples.
|
|
133
|
+
|
|
134
|
+
## MVP non-negotiables
|
|
135
|
+
|
|
136
|
+
- No Secret values in browser, status, logs, prompt preview, or audit.
|
|
137
|
+
- No automatic writes to PRs, branches, checks, issues, or releases.
|
|
138
|
+
- No privileged ServiceAccount or Secret access for untrusted forks.
|
|
139
|
+
- No UI-only permission checks for enabled actions.
|
|
140
|
+
- No Agent Mux state as source of truth for Krate repository resources.
|
|
141
|
+
|
|
142
|
+
## Org memory MVP slice
|
|
143
|
+
|
|
144
|
+
The org memory MVP is described in [Org memory vertical slice spec](./org-memory-vertical-slice-spec.md). It should be treated as the first coherent agent-memory build target: one org, one repository, one company brain repo, one manual dispatch with memory snapshot, and one summary-only run-memory import.
|
|
145
|
+
|
|
146
|
+
This slice is intentionally narrower than full agent orchestration. It excludes raw `.a5c` artifact retention, broad automation triggers, vector search, cross-org sharing, and advanced subagent orchestration.
|
|
@@ -0,0 +1,265 @@
|
|
|
1
|
+
# Agent observability and audit spec
|
|
2
|
+
|
|
3
|
+
## Purpose
|
|
4
|
+
|
|
5
|
+
Agent runs should be observable like CI runs and auditable like privileged control-plane actions. This document defines metrics, events, logs, traces, audit records, and UI projections for agent orchestration.
|
|
6
|
+
|
|
7
|
+
It is grounded in current Krate behavior: `src/controller-ui.js` already exposes metrics, events, validation checks, and an empty `auditLog`, while `/api/watch/orgs/[org]/*` streams live Kubernetes watch events into UI panels.
|
|
8
|
+
|
|
9
|
+
## Observability principles
|
|
10
|
+
|
|
11
|
+
- Every run is traceable from source event to trigger execution to dispatch run to Agent Mux session to artifacts/write-back.
|
|
12
|
+
- Every privileged decision has an audit event.
|
|
13
|
+
- User-facing run pages should explain queueing, execution, approvals, and failures without requiring raw logs first.
|
|
14
|
+
- Metrics should be repository-, stack-, runner-, and trigger-scoped.
|
|
15
|
+
- Secret values are never logged; Secret/ConfigMap metadata can be logged only by name/key/purpose/digest.
|
|
16
|
+
|
|
17
|
+
## Correlation IDs
|
|
18
|
+
|
|
19
|
+
Every agent operation should carry:
|
|
20
|
+
|
|
21
|
+
- `correlationId`: request/session-wide trace.
|
|
22
|
+
- `sourceEventUid`: webhook/CI/comment/label/manual source.
|
|
23
|
+
- `triggerExecutionUid`.
|
|
24
|
+
- `dispatchRunUid`.
|
|
25
|
+
- `attemptUid`.
|
|
26
|
+
- `agentMuxRunId`.
|
|
27
|
+
- `agentMuxSessionId`.
|
|
28
|
+
- `contextBundleDigest`.
|
|
29
|
+
- `permissionSnapshotDigest`.
|
|
30
|
+
- `artifactDigest` where applicable.
|
|
31
|
+
|
|
32
|
+
These IDs should appear in events, logs, audit records, and UI details.
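
One way to keep these IDs consistent across events, logs, and audit records is to stamp them from a shared context object. The sketch below assumes a hypothetical `withCorrelation` helper and a `correlation` sub-object; only the field names come from the list above.

```javascript
// Field names follow the list above; the helper itself is a hypothetical sketch.
const CORRELATION_FIELDS = [
  "correlationId", "sourceEventUid", "triggerExecutionUid", "dispatchRunUid",
  "attemptUid", "agentMuxRunId", "agentMuxSessionId",
  "contextBundleDigest", "permissionSnapshotDigest", "artifactDigest",
];

function withCorrelation(record, context) {
  const ids = {};
  for (const field of CORRELATION_FIELDS) {
    // Copy only the IDs known so far; later stages can add the rest.
    if (context[field] !== undefined) ids[field] = context[field];
  }
  return { ...record, correlation: ids };
}

// Example: an early-stage event carries only the IDs that exist at that point.
const event = withCorrelation(
  { type: "AgentDispatchQueued" },
  { correlationId: "krate-example", dispatchRunUid: "adr-01hx", unrelated: "dropped" }
);
```

Using an allow-list keeps unrelated context (which may be sensitive) out of the stamped record.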
|
|
33
|
+
|
|
34
|
+
## Event taxonomy
|
|
35
|
+
|
|
36
|
+
| Event | Producer | UI surface |
| --- | --- | --- |
| `AgentTriggerMatched` | trigger controller | rules, source page, run detail |
| `AgentTriggerCoalesced` | trigger controller | rules, source page |
| `AgentTriggerRejected` | trigger controller | rules, denied dispatch panel |
| `AgentContextAssembled` | context bundle controller | dispatch composer, run detail |
| `AgentPermissionReviewCompleted` | permission review | stack builder, run detail, audit |
| `AgentDispatchQueued` | dispatch controller | run list, pipeline page |
| `AgentRunnerAssigned` | dispatch controller | run detail |
| `AgentMuxLaunchRequested` | dispatch controller | run detail |
| `AgentMuxSessionBound` | Agent Mux client | run/session page |
| `AgentToolCallStarted` | Agent Mux event projection | observability timeline |
| `AgentToolCallApprovalRequested` | Agent Mux/client | approval inbox, run detail |
| `AgentSubagentStarted` | Agent Mux event projection | subagent tree |
| `AgentArtifactProduced` | dispatch controller | run detail, PR/issue page |
| `AgentWriteBackRequested` | approval/write-back controller | approval inbox |
| `AgentWriteBackApplied` | approval/write-back controller | run detail, source page |
| `AgentDispatchCompleted` | dispatch controller | run list, source page |
| `AgentDispatchFailed` | dispatch controller | run detail, source page |
|
|
55
|
+
|
|
56
|
+
## Metrics
|
|
57
|
+
|
|
58
|
+
### Global metrics
|
|
59
|
+
|
|
60
|
+
- active dispatches;
|
|
61
|
+
- queued dispatches;
|
|
62
|
+
- pending approvals;
|
|
63
|
+
- running Agent Mux sessions;
|
|
64
|
+
- failed dispatches by reason;
|
|
65
|
+
- average queue wait;
|
|
66
|
+
- average run duration;
|
|
67
|
+
- approval wait duration;
|
|
68
|
+
- token/cost estimates by stack/provider;
|
|
69
|
+
- trigger coalescing/rejection rates;
|
|
70
|
+
- missing permission warning counts.
|
|
71
|
+
|
|
72
|
+
### Repository metrics
|
|
73
|
+
|
|
74
|
+
- dispatches per repository/ref/PR/issue;
|
|
75
|
+
- failed CI repair attempts;
|
|
76
|
+
- write-back approvals/applied actions;
|
|
77
|
+
- workspace recoveries/rebase conflicts;
|
|
78
|
+
- top failing trigger rules;
|
|
79
|
+
- secrets/config grants used by active stacks.
|
|
80
|
+
|
|
81
|
+
### Runner metrics
|
|
82
|
+
|
|
83
|
+
- queue depth by runner pool;
|
|
84
|
+
- wait latency p50/p95;
|
|
85
|
+
- active attempts;
|
|
86
|
+
- trusted vs untrusted usage;
|
|
87
|
+
- runner ServiceAccount denials;
|
|
88
|
+
- cost by pool/repository.
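
As an illustration of the p50/p95 wait latency metric, the following sketch uses the nearest-rank percentile method over per-pool queue waits. The attempt shape (`pool`, `queuedAt`, `startedAt` in milliseconds) is an assumption for the example.

```javascript
// Nearest-rank percentile over a list of numbers; returns null for an empty list.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical attempt shape: { pool, queuedAt, startedAt } with millisecond timestamps.
function waitLatencyByPool(attempts) {
  const byPool = new Map();
  for (const { pool, queuedAt, startedAt } of attempts) {
    if (startedAt === undefined) continue; // still queued, no wait time yet
    const waits = byPool.get(pool) ?? [];
    waits.push(startedAt - queuedAt);
    byPool.set(pool, waits);
  }
  const result = {};
  for (const [pool, waits] of byPool) {
    result[pool] = { p50: percentile(waits, 50), p95: percentile(waits, 95) };
  }
  return result;
}
```

Nearest-rank is chosen here only because it is simple and always returns an observed value; a production metrics backend would typically use histogram buckets instead.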
|
|
89
|
+
|
|
90
|
+
### Agent Mux metrics
|
|
91
|
+
|
|
92
|
+
- launch latency;
|
|
93
|
+
- session bind latency;
|
|
94
|
+
- stream reconnect count;
|
|
95
|
+
- tool call count/duration;
|
|
96
|
+
- subagent count/duration;
|
|
97
|
+
- adapter rejection count;
|
|
98
|
+
- transcript/event bytes.
|
|
99
|
+
|
|
100
|
+
## Audit records
|
|
101
|
+
|
|
102
|
+
Audit records should be append-only and queryable by repository, user, stack, trigger, dispatch, and target object.
|
|
103
|
+
|
|
104
|
+
Required audit event classes:
|
|
105
|
+
|
|
106
|
+
- permission/grant changes;
|
|
107
|
+
- role/service-account changes;
|
|
108
|
+
- stack save and readiness changes;
|
|
109
|
+
- trigger lifecycle changes;
|
|
110
|
+
- dispatch creation and cancellation;
|
|
111
|
+
- context refresh;
|
|
112
|
+
- approval decisions;
|
|
113
|
+
- write-back actions;
|
|
114
|
+
- secret/config rotation impact;
|
|
115
|
+
- native RBAC drift;
|
|
116
|
+
- policy bypass or denial.
|
|
117
|
+
|
|
118
|
+
Audit record required fields:
|
|
119
|
+
|
|
120
|
+
```yaml
type: AgentApprovalDecision
actor:
  kind: User
  name: tmusk
  kubernetesUser: tmusk@example.com
source:
  repository: krate
  dispatchRun: adr-01hx
  attempt: ada-01hx-1
target:
  kind: PullRequest
  name: krate/42
decision:
  allowed: true
  reason: ApprovedByMaintainer
digests:
  contextBundle: sha256:...
  permissionSnapshot: sha256:...
  artifact: sha256:...
metadata:
  correlationId: krate-...
  agentMuxRunId: run_01hx
  agentMuxSessionId: ses_01hx
```
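
A minimal sketch of an append-only audit log that enforces the top-level sections shown in the YAML example. The `createAuditLog` factory and its in-memory storage are assumptions for illustration; a real implementation would persist records durably.

```javascript
// The required sections mirror the YAML example; the factory itself is hypothetical.
const REQUIRED_SECTIONS = ["type", "actor", "source", "target", "decision", "digests", "metadata"];

function createAuditLog() {
  const records = []; // private: callers can append and query, never edit or delete
  return {
    append(record) {
      const missing = REQUIRED_SECTIONS.filter((key) => record[key] === undefined);
      if (missing.length > 0) {
        throw new Error(`audit record missing: ${missing.join(", ")}`);
      }
      // Store a deep copy so later mutation of the caller's object cannot alter the log.
      records.push(JSON.parse(JSON.stringify(record)));
      return records.length;
    },
    query(predicate) {
      // Return copies so query results cannot be used to rewrite history.
      return records.filter(predicate).map((r) => JSON.parse(JSON.stringify(r)));
    },
  };
}
```

A query by repository would then look like `log.query((r) => r.source.repository === "krate")`.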
|
|
145
|
+
|
|
146
|
+
## Logs
|
|
147
|
+
|
|
148
|
+
Controller logs should be structured and include:
|
|
149
|
+
|
|
150
|
+
- controller name;
|
|
151
|
+
- reconciliation key;
|
|
152
|
+
- correlation ID;
|
|
153
|
+
- resource kind/name;
|
|
154
|
+
- phase/condition changes;
|
|
155
|
+
- external call target without secrets;
|
|
156
|
+
- duration and result.
|
|
157
|
+
|
|
158
|
+
Do not log:
|
|
159
|
+
|
|
160
|
+
- Secret values;
|
|
161
|
+
- raw authorization headers;
|
|
162
|
+
- full prompt when it may contain sensitive context;
|
|
163
|
+
- full transcript unless explicitly configured for a safe environment.
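
The exclusions above can be applied before a log entry is written. This sketch assumes a hypothetical `redactForLog` helper and an illustrative set of sensitive key names; the actual key list would depend on the deployment.

```javascript
// Hypothetical sketch: the sensitive key names are assumptions; tune them per deployment.
const SENSITIVE_KEY = /^(authorization|cookie|token|password|secretValue|prompt|transcript)$/i;

function redactForLog(value) {
  if (Array.isArray(value)) return value.map(redactForLog);
  if (value === null || typeof value !== "object") return value;
  const out = {};
  for (const [key, inner] of Object.entries(value)) {
    // Replace the whole value for sensitive keys; recurse into everything else.
    out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redactForLog(inner);
  }
  return out;
}

// Example: structured controller log entry with an external call target.
const entry = redactForLog({
  controller: "agent-dispatch",
  correlationId: "krate-example",
  target: { url: "https://agent-mux.example/launch", headers: { Authorization: "Bearer abc" } },
  secretName: "model-token",
  prompt: "full prompt text",
});
```

Metadata such as `secretName` passes through unchanged, which matches the principle that Secret metadata may be logged by name while values may not.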
|
|
164
|
+
|
|
165
|
+
## Traces
|
|
166
|
+
|
|
167
|
+
Trace spans should cover:
|
|
168
|
+
|
|
169
|
+
- API request;
|
|
170
|
+
- permission review;
|
|
171
|
+
- context assembly;
|
|
172
|
+
- trigger evaluation;
|
|
173
|
+
- dispatch creation;
|
|
174
|
+
- runner placement;
|
|
175
|
+
- Agent Mux launch;
|
|
176
|
+
- event stream reconciliation;
|
|
177
|
+
- artifact persistence;
|
|
178
|
+
- approval/write-back.
|
|
179
|
+
|
|
180
|
+
## UI projections
|
|
181
|
+
|
|
182
|
+
`src/controller-ui.js` should eventually expose:
|
|
183
|
+
|
|
184
|
+
```json
{
  "metrics": {
    "agentDispatches": 12,
    "agentApprovals": 3,
    "agentMissingPermissions": 2
  },
  "auditLog": [],
  "views": {
    "agents": {
      "activeRuns": [],
      "pendingApprovals": [],
      "recentFailures": [],
      "missingPermissions": []
    }
  }
}
```
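
A sketch of how such a projection might be derived from run, approval, and permission review data. The input shapes and the choice to count only pending approvals in `agentApprovals` are assumptions for this example, not behavior confirmed by `src/controller-ui.js`.

```javascript
// Hypothetical sketch: input shapes and counting rules are assumptions.
function projectAgentViews({ runs, approvals, reviews }) {
  const activeRuns = runs.filter((r) => r.phase === "Queued" || r.phase === "Running");
  const recentFailures = runs.filter((r) => r.phase === "Failed");
  const pendingApprovals = approvals.filter((a) => a.decision === undefined);
  const missingPermissions = reviews.flatMap((review) =>
    review.blockers.map((b) => ({ stack: review.stack, reason: b.reason }))
  );
  return {
    metrics: {
      agentDispatches: runs.length,
      agentApprovals: pendingApprovals.length,
      agentMissingPermissions: missingPermissions.length,
    },
    auditLog: [],
    views: {
      agents: { activeRuns, pendingApprovals, recentFailures, missingPermissions },
    },
  };
}
```

Deriving the metrics from the same arrays that back the views keeps the counters and the lists from drifting apart.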
|
|
202
|
+
|
|
203
|
+
Run detail must show:
|
|
204
|
+
|
|
205
|
+
- event timeline;
|
|
206
|
+
- transcript/session state;
|
|
207
|
+
- queue and runner timings;
|
|
208
|
+
- permission snapshot;
|
|
209
|
+
- context digest;
|
|
210
|
+
- artifacts and approvals;
|
|
211
|
+
- audit trail for write-back.
|
|
212
|
+
|
|
213
|
+
## Alerts
|
|
214
|
+
|
|
215
|
+
Recommended alert conditions:
|
|
216
|
+
|
|
217
|
+
- Agent Mux gateway unavailable;
|
|
218
|
+
- dispatch queue wait p95 over threshold;
|
|
219
|
+
- approval backlog over threshold;
|
|
220
|
+
- repeated adapter launch rejection;
|
|
221
|
+
- native RBAC drift on owned role;
|
|
222
|
+
- missing Secret/ConfigMap grant blocks active stack;
|
|
223
|
+
- untrusted ref attempted privileged secret access;
|
|
224
|
+
- write-back failure after approval;
|
|
225
|
+
- retention job failure.
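
Several of these conditions reduce to simple threshold checks. The sketch below assumes hypothetical metric and threshold field names and covers only a subset of the list; alert names are illustrative.

```javascript
// Hypothetical sketch: metric names, thresholds, and alert names are assumptions.
function evaluateAlerts(metrics, thresholds) {
  const alerts = [];
  if (!metrics.gatewayReachable) alerts.push("AgentMuxGatewayUnavailable");
  if (metrics.queueWaitP95Ms > thresholds.queueWaitP95Ms) alerts.push("DispatchQueueWaitHigh");
  if (metrics.pendingApprovals > thresholds.pendingApprovals) alerts.push("ApprovalBacklogHigh");
  if (metrics.writeBackFailuresAfterApproval > 0) alerts.push("WriteBackFailedAfterApproval");
  if (metrics.retentionJobFailures > 0) alerts.push("RetentionJobFailed");
  return alerts;
}
```

Keeping the evaluation pure makes it easy to unit test against recorded metric snapshots.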
|
|
226
|
+
|
|
227
|
+
## Acceptance criteria
|
|
228
|
+
|
|
229
|
+
- A user can follow a failed CI-triggered agent run from source event through dispatch, session, artifacts, approval, and write-back.
|
|
230
|
+
- Every privileged grant/approval/write-back has an audit record.
|
|
231
|
+
- Missing permission warnings appear in metrics and UI.
|
|
232
|
+
- Agent Mux stream disconnects produce visible stale/reconnect state.
|
|
233
|
+
- No log/audit/UI surface contains Secret values.
|
|
234
|
+
|
|
235
|
+
## Memory observability
|
|
236
|
+
|
|
237
|
+
Memory events must be auditable because memory can change agent behavior.
|
|
238
|
+
|
|
239
|
+
Required events:
|
|
240
|
+
|
|
241
|
+
- `agent.memory.ref.resolved` with requested ref, resolved commit, mode, and requester.
|
|
242
|
+
- `agent.memory.query.executed` with snapshot, query modes, counts, truncation, and denied scopes.
|
|
243
|
+
- `agent.memory.snapshot.created` with ontology/index/query digests.
|
|
244
|
+
- `agent.memory.update.proposed` with source run, paths, diff digest, and validation result.
|
|
245
|
+
- `agent.memory.update.approved`, `agent.memory.update.merged`, and `agent.memory.update.rejected`.
|
|
246
|
+
- `agent.memory.ontology.invalid` with parse/error counts and blocking status.
|
|
247
|
+
|
|
248
|
+
Metrics should include query latency, index age, validation failures, denied memory reads, update merge latency, and historical-memory dispatch count.
|
|
249
|
+
|
|
250
|
+
## Org and memory audit fields
|
|
251
|
+
|
|
252
|
+
Every agent, memory, deployment, and repository audit event should include:
|
|
253
|
+
|
|
254
|
+
- `organizationRef`;
|
|
255
|
+
- `namespace`;
|
|
256
|
+
- actor user/group/service account;
|
|
257
|
+
- repository/deployment refs when applicable;
|
|
258
|
+
- memory repository and resolved commit when memory is used;
|
|
259
|
+
- session ID and run ID when Agent Mux or Babysitter participates;
|
|
260
|
+
- journal digest when `.a5c` run memory is imported;
|
|
261
|
+
- cross-org sharing policy ID when a cross-org ref is admitted.
|
|
262
|
+
|
|
263
|
+
## Org memory sequence audit coverage
|
|
264
|
+
|
|
265
|
+
Each sequence in `org-memory-controller-sequence-spec.md` should emit audit records for preflight, admission decision, Git ref resolution, memory query, context snapshot, Agent Mux launch, memory import collection, redaction, validation, review, merge, and cross-org denial. Audit records must include org, namespace, source refs, resolved commit, and digest fields where applicable.
|
|
@@ -0,0 +1,174 @@
|
|
|
1
|
+
# Agent operator runbook
|
|
2
|
+
|
|
3
|
+
## Purpose
|
|
4
|
+
|
|
5
|
+
This runbook describes how an operator should install, enable, validate, troubleshoot, and safely disable Krate agent orchestration once implemented. It is docs-only and aligns with the current chart/API validation surfaces.
|
|
6
|
+
|
|
7
|
+
## Current baseline
|
|
8
|
+
|
|
9
|
+
Before agent implementation, these commands should remain green:
|
|
10
|
+
|
|
11
|
+
```powershell
npm run validate:docs
npm run package:check
npm run ui:validate
npm run check
```
|
|
17
|
+
|
|
18
|
+
Current install surfaces:
|
|
19
|
+
|
|
20
|
+
- chart: `charts/krate`;
|
|
21
|
+
- values: `charts/krate/values.yaml`;
|
|
22
|
+
- CRDs: `charts/krate/crds/`;
|
|
23
|
+
- controller API: `/api/controller`;
|
|
24
|
+
- resource API: `/api/controller/resources`;
|
|
25
|
+
- watch API: `/api/watch/orgs/[org]/*`.
|
|
26
|
+
|
|
27
|
+
## Enablement checklist
|
|
28
|
+
|
|
29
|
+
When agents are implemented, enable in this order:
|
|
30
|
+
|
|
31
|
+
1. Install/upgrade CRDs for agent config resources.
|
|
32
|
+
2. Enable `agents.enabled=true` in Helm values.
|
|
33
|
+
3. Configure Agent Mux gateway URL and credentials through `existingSecret`.
|
|
34
|
+
4. Configure default untrusted runner pool.
|
|
35
|
+
5. Configure default agent runtime ServiceAccount.
|
|
36
|
+
6. Enable permission review without auto-dispatch.
|
|
37
|
+
7. Create a read-only agent stack.
|
|
38
|
+
8. Validate stack readiness and permission review.
|
|
39
|
+
9. Enable manual dispatch for a test repository.
|
|
40
|
+
10. Enable trigger rules only after manual dispatch is stable.
|
|
41
|
+
11. Enable write-back approvals last.
|
|
42
|
+
|
|
43
|
+
## Preflight checks
|
|
44
|
+
|
|
45
|
+
The operator should verify:
|
|
46
|
+
|
|
47
|
+
- Kubernetes API reachable from controller pod;
|
|
48
|
+
- Krate CRDs installed;
|
|
49
|
+
- native RBAC allows controller to manage intended resources;
|
|
50
|
+
- Agent Mux gateway reachable from controller namespace;
|
|
51
|
+
- runner pools exist and trust tiers are correct;
|
|
52
|
+
- Secret/ConfigMap grant management feature gate configured;
|
|
53
|
+
- NetworkPolicy allows only required Agent Mux/MCP egress;
|
|
54
|
+
- `/api/controller` reports healthy controller model;
|
|
55
|
+
- `/api/watch/orgs/[org]/repositories` streams events.
|
|
56
|
+
|
|
57
|
+
## Safe default policy
|
|
58
|
+
|
|
59
|
+
Default install should be safe:
|
|
60
|
+
|
|
61
|
+
- agents disabled;
|
|
62
|
+
- trigger rules disabled;
|
|
63
|
+
- manual dispatch disabled;
|
|
64
|
+
- write-back approval required;
|
|
65
|
+
- no privileged secrets on forks;
|
|
66
|
+
- untrusted runner pool default;
|
|
67
|
+
- no broad Secret read for web pod;
|
|
68
|
+
- Agent Mux gateway optional/degraded if absent.
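
A sketch of how the safe defaults might be expressed and checked. Only `agents.enabled` appears in this runbook as a Helm value; every other key below is a hypothetical name chosen for illustration and does not describe `charts/krate/values.yaml`.

```javascript
// Hypothetical sketch: only `agents.enabled` is named in the runbook; other keys are assumed.
const SAFE_DEFAULTS = {
  agents: {
    enabled: false,
    triggerRules: { enabled: false },
    manualDispatch: { enabled: false },
    writeBack: { requireApproval: true },
    defaultRunnerPool: "untrusted",
  },
};

// Report overrides that weaken the default policy.
function unsafeOverrides(values) {
  const agents = values.agents ?? {};
  const findings = [];
  if (agents.triggerRules?.enabled && !agents.manualDispatch?.enabled) {
    findings.push("trigger rules enabled before manual dispatch");
  }
  if (agents.writeBack?.requireApproval === false) {
    findings.push("write-back approval not required");
  }
  if (agents.defaultRunnerPool !== undefined && agents.defaultRunnerPool !== "untrusted") {
    findings.push("default runner pool is not untrusted");
  }
  return findings;
}
```

A check like this could run as part of package or chart validation so that unsafe overrides are flagged before install.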
|
|
69
|
+
|
|
70
|
+
## Common operations
|
|
71
|
+
|
|
72
|
+
### Create first read-only stack
|
|
73
|
+
|
|
74
|
+
1. Create `AgentServiceAccount` with read-only role template.
|
|
75
|
+
2. Create `AgentToolProfile` with filesystem read-only and network deny.
|
|
76
|
+
3. Create `AgentStack` using read-only tool profile.
|
|
77
|
+
4. Run permission review.
|
|
78
|
+
5. Confirm stack `Ready=True`.
|
|
79
|
+
|
|
80
|
+
### Grant a tool Secret
|
|
81
|
+
|
|
82
|
+
1. Open Secret grant wizard or apply `AgentSecretGrant`.
|
|
83
|
+
2. Scope to stack/tool, repository, refs, trigger sources, and purpose.
|
|
84
|
+
3. Confirm Secret key metadata exists.
|
|
85
|
+
4. Confirm stack readiness updates.
|
|
86
|
+
5. Run dry-run before dispatch.
|
|
87
|
+
|
|
88
|
+
### Enable CI diagnosis
|
|
89
|
+
|
|
90
|
+
1. Create CI diagnosis stack.
|
|
91
|
+
2. Create `AgentTriggerRule` in draft.
|
|
92
|
+
3. Dry-run against failed `Pipeline`/`Job` payload.
|
|
93
|
+
4. Validate context bundle preview and permission review.
|
|
94
|
+
5. Set lifecycle to active.
|
|
95
|
+
6. Watch `AgentTriggerExecution` and `AgentDispatchRun` resources.
|
|
96
|
+
|
|
97
|
+
### Rotate Secret
|
|
98
|
+
|
|
99
|
+
1. Update native Secret using approved process.
|
|
100
|
+
2. Confirm `AgentSecretGrant` status updates metadata version.
|
|
101
|
+
3. Check affected stacks/rules/runs.
|
|
102
|
+
4. Retry/resume only after fresh permission review.
|
|
103
|
+
|
|
104
|
+
## Troubleshooting
|
|
105
|
+
|
|
106
|
+
| Symptom | Check | Likely fix |
| --- | --- | --- |
| stack not ready | stack conditions | fix adapter, ServiceAccount, grant, MCP, or skill dependency. |
| dispatch denied | permission review response | add least-privilege role/grant or change runner/source. |
| Agent Mux launch fails | attempt status and adapter rejection | update launch options or adapter configuration. |
| session stuck pending | Agent Mux gateway and session binding | retry binding or inspect gateway logs. |
| no watch updates | `/api/watch/orgs/<org>/<resource>` | check Kubernetes watch/RBAC/network. |
| Secret grant missing | capability requirements | create scoped `AgentSecretGrant`. |
| fork run gets privileged pool | trigger/runner trust policy | force untrusted pool and audit policy violation. |
| write-back duplicated | idempotency key/audit | fix write-back controller idempotency before retrying. |
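
For the duplicated write-back row, a minimal sketch of idempotent application keyed on run, target, and artifact digest. The key format, the `createWriteBackApplier` factory, and the `apply` callback are assumptions for illustration; a real controller would keep the applied keys in durable storage.

```javascript
// Hypothetical sketch: the key format and `apply` callback are assumptions.
function createWriteBackApplier(apply) {
  const applied = new Map();
  return function applyOnce({ dispatchRun, target, artifactDigest }) {
    const key = `${dispatchRun}:${target}:${artifactDigest}`;
    if (applied.has(key)) {
      // Same run, target, and artifact: return the earlier result instead of writing again.
      return { key, duplicate: true, result: applied.get(key) };
    }
    const result = apply({ dispatchRun, target, artifactDigest });
    // Record only after success so a failed write-back can still be retried.
    applied.set(key, result);
    return { key, duplicate: false, result };
  };
}
```

With this shape, a retry after a timeout produces `duplicate: true` rather than a second comment or commit on the target.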
|
|
116
|
+
|
|
117
|
+
## Rollback and disablement
|
|
118
|
+
|
|
119
|
+
To disable safely:
|
|
120
|
+
|
|
121
|
+
1. Pause trigger rules.
|
|
122
|
+
2. Disable manual dispatch.
|
|
123
|
+
3. Wait for active dispatches or cancel them.
|
|
124
|
+
4. Disable write-back actions.
|
|
125
|
+
5. Keep read-only run/artifact/audit views available.
|
|
126
|
+
6. Disable Agent Mux gateway integration.
|
|
127
|
+
7. Leave CRDs installed until retained records are exported or pruned.
|
|
128
|
+
|
|
129
|
+
Emergency disable:
|
|
130
|
+
|
|
131
|
+
- set `agents.enabled=false` or feature gates false;
|
|
132
|
+
- revoke Agent Mux gateway secret;
|
|
133
|
+
- revoke privileged `AgentSecretGrant` and `AgentRoleBinding` resources;
|
|
134
|
+
- scale agent controllers down if necessary;
|
|
135
|
+
- preserve audit and run records.
|
|
136
|
+
|
|
137
|
+
## Operational metrics to watch
|
|
138
|
+
|
|
139
|
+
- active dispatches;
|
|
140
|
+
- queued dispatches;
|
|
141
|
+
- failed dispatches;
|
|
142
|
+
- pending approvals;
|
|
143
|
+
- Agent Mux launch failures;
|
|
144
|
+
- permission review denials;
|
|
145
|
+
- RBAC drift;
|
|
146
|
+
- missing grants;
|
|
147
|
+
- write-back failures;
|
|
148
|
+
- retention job failures.
|
|
149
|
+
|
|
150
|
+
## Support bundle
|
|
151
|
+
|
|
152
|
+
A safe support bundle should include:
|
|
153
|
+
|
|
154
|
+
- controller model from `/api/controller` with secrets redacted;
|
|
155
|
+
- stack/rule/run resource YAML;
|
|
156
|
+
- permission review response;
|
|
157
|
+
- conditions from ServiceAccount/RoleBinding/SecretGrant/ConfigGrant resources;
|
|
158
|
+
- event timeline;
|
|
159
|
+
- Agent Mux run/session IDs;
|
|
160
|
+
- audit records;
|
|
161
|
+
- chart values with Secret values redacted.
|
|
162
|
+
|
|
163
|
+
It must not include Secret values, raw tokens, kubeconfigs, private keys, or full transcripts unless explicitly approved.
|
|
164
|
+
|
|
165
|
+
## Company brain operations
|
|
166
|
+
|
|
167
|
+
Operators should treat the memory repository like production configuration:
|
|
168
|
+
|
|
169
|
+
- verify `AgentMemoryRepository` health before enabling required-memory stacks;
|
|
170
|
+
- keep ontology validation green before allowing memory update merges;
|
|
171
|
+
- monitor index age and query failures;
|
|
172
|
+
- use historical memory refs for reproducible incident replay;
|
|
173
|
+
- disable affected `AgentMemorySource` scopes before reverting a bad memory commit;
|
|
174
|
+
- preserve `AgentMemorySnapshot` records even when source policies change.
|