@a5c-ai/krate 5.0.1-staging.69cb593ea → 5.0.1-staging.6be34ee2a

@@ -1,431 +1,2535 @@
1
1
  # Krate Architecture Specification v2
2
2
 
3
- > Derived from implementation. Last updated from source analysis.
3
+ > Exhaustive architecture reference derived from implementation source code.
4
+ > Source: `packages/krate/core/src/`, `packages/krate/sdk/src/`, `packages/krate/web/`, `packages/krate/cli/`
5
+
6
+ ---
4
7
 
5
8
  ## 1. System Overview
6
9
 
7
10
  Krate is a Kubernetes-native Git forge runtime built as a monorepo with four packages:
8
11
 
9
- | Package | NPM Name | Role | Path |
10
- |---------|-----------|------|------|
11
- | **core** | `@a5c-ai/krate` | Resource model, controllers, HTTP API server | `packages/krate/core/` |
12
- | **sdk** | `@a5c-ai/krate-sdk` | Client SDK re-exporting core helpers for web/CLI consumers | `packages/krate/sdk/` |
13
- | **cli** | `@a5c-ai/krate-cli` | CLI entrypoint and MCP server mode | `packages/krate/cli/` |
14
- | **web** | `@a5c-ai/krate-web` | Next.js 16 + React 19 web console | `packages/krate/web/` |
12
+ | Package | NPM Name | Role | Path | Dependencies |
13
+ |---------|-----------|------|------|--------------|
14
+ | **core** | `@a5c-ai/krate` | Resource model, controllers, HTTP API server | `packages/krate/core/` | Zero external (Node.js built-ins only) |
15
+ | **sdk** | `@a5c-ai/krate-sdk` | Client SDK re-exporting core helpers for web/CLI consumers | `packages/krate/sdk/` | Re-exports from core |
16
+ | **cli** | `@a5c-ai/krate-cli` | CLI entrypoint and MCP server mode | `packages/krate/cli/` | Imports from core |
17
+ | **web** | `@a5c-ai/krate-web` | Next.js 16 + React 19 web console | `packages/krate/web/` | Imports from sdk |
15
18
 
16
19
  **Design principles:**
17
20
  - Pure ESM JavaScript (Node 20+), zero external runtime dependencies in core
18
21
  - Kubernetes-first: all resources are K8s API objects (CRDs or aggregated)
19
- - CRD-driven: 75 CustomResourceDefinitions under `krate.a5c.ai/v1alpha1`
22
+ - CRD-driven: 76 CustomResourceDefinitions under `krate.a5c.ai/v1alpha1`
20
23
  - Controller pattern: each domain has a controller with explicit boundary declarations
24
+ - Intent-based: domain controllers produce manifests/specs and never execute kubectl themselves; only `kubernetes-controller.js` spawns kubectl
21
25
 
22
26
  ```mermaid
23
27
  graph TB
24
- subgraph "Krate Platform"
28
+ subgraph "Browser"
25
29
  WEB[Web Console<br/>Next.js 16 + React 19]
26
- CLI[CLI + MCP Server]
27
- SDK[SDK Layer]
28
- CORE[Core Controllers]
29
30
  end
30
-
31
+
32
+ subgraph "SDK Layer"
33
+ SDK_INDEX[sdk/src/index.js<br/>65+ re-exports]
34
+ ATLAS[sdk/src/atlas-graph-client.js<br/>Stack layer catalog]
35
+ end
36
+
37
+ subgraph "Core Controllers"
38
+ HTTP[http-server.js<br/>Node HTTP handler]
39
+ API[api-controller.js<br/>Facade/orchestrator]
40
+ STACK[agent-stack-controller.js]
41
+ DISPATCH[agent-dispatch-controller.js]
42
+ TRIGGER[agent-trigger-controller.js]
43
+ WORKSPACE[agent-workspace-controller.js]
44
+ APPROVAL[agent-approval-controller.js]
45
+ MEMORY[agent-memory-controller.js]
46
+ MEMQ[agent-memory-query.js]
47
+ PERM[agent-permission-review.js]
48
+ AUDIT[audit-controller.js]
49
+ NOTIFY[notification-controller.js]
50
+ RUNNER[runner-controller.js]
51
+ EVENTBUS[event-bus.js]
52
+ CACHE[snapshot-cache.js]
53
+ end
54
+
55
+ subgraph "Resource Layer"
56
+ GATEWAY[kubernetes-resource-gateway.js]
57
+ CLIENT[kubernetes-controller.js<br/>kubectl spawning]
58
+ RESMODEL[resource-model.js<br/>76 kinds]
59
+ end
60
+
61
+ subgraph "External Subsystem"
62
+ WEBHOOK[external/webhook-controller.js]
63
+ SYNC[external/sync-controller.js]
64
+ CONFLICT[external/conflict-controller.js]
65
+ WRITE[external/write-controller.js]
66
+ GITHUB_ADAPTER[external/github/]
67
+ end
68
+
31
69
  subgraph "Infrastructure"
32
- K8S[Kubernetes API<br/>etcd storage]
70
+ K8S[Kubernetes API<br/>etcd CRD storage]
33
71
  PG[PostgreSQL<br/>Aggregated storage]
34
72
  GITEA[Gitea<br/>Git hosting]
35
73
  end
36
-
37
- WEB --> SDK
38
- CLI --> SDK
39
- SDK --> CORE
40
- CORE --> K8S
41
- CORE --> PG
42
- CORE --> GITEA
74
+
75
+ WEB --> SDK_INDEX
76
+ SDK_INDEX --> API
77
+ HTTP --> API
78
+ API --> GATEWAY
79
+ API --> DISPATCH
80
+ API --> TRIGGER
81
+ API --> APPROVAL
82
+ API --> MEMORY
83
+ API --> WORKSPACE
84
+ API --> SYNC
85
+ API --> WEBHOOK
86
+ API --> CONFLICT
87
+ API --> WRITE
88
+ DISPATCH --> PERM
89
+ DISPATCH --> STACK
90
+ DISPATCH --> WORKSPACE
91
+ DISPATCH --> APPROVAL
92
+ DISPATCH --> MEMORY
93
+ GATEWAY --> CLIENT
94
+ CLIENT --> K8S
95
+ CLIENT --> PG
96
+ GATEWAY --> GITEA
97
+ EVENTBUS --> HTTP
98
+ ```
99
+
100
+ ---
101
+
102
+ ## 2. Package Dependency Graph
103
+
104
+ ### 2.1 Import Hierarchy (Strict)
105
+
106
+ ```
107
+ web → sdk → core
108
+ cli → core
109
+ ```
110
+
111
+ The web package NEVER imports directly from core. The SDK acts as the public API surface.
112
+
113
+ ### 2.2 Core Internal Dependencies
114
+
115
+ ```mermaid
116
+ graph LR
117
+ HTTP[http-server] --> API[api-controller]
118
+ HTTP --> CTRL_UI[controller-ui]
119
+ HTTP --> GATEWAY[kubernetes-resource-gateway]
120
+ HTTP --> EVENTBUS[event-bus]
121
+ API --> GATEWAY
122
+ API --> DISPATCH[agent-dispatch-controller]
123
+ API --> TRIGGER[agent-trigger-controller]
124
+ API --> APPROVAL[agent-approval-controller]
125
+ API --> WORKSPACE[agent-workspace-controller]
126
+ API --> MEMORY_CTRL[agent-memory-controller]
127
+ API --> PERM[agent-permission-review]
128
+ API --> SYNC[external/sync-controller]
129
+ API --> WEBHOOK_CTRL[external/webhook-controller]
130
+ API --> WRITE_CTRL[external/write-controller]
131
+ API --> CONFLICT_CTRL[external/conflict-controller]
132
+ DISPATCH --> PERM
133
+ DISPATCH --> STACK[agent-stack-controller]
134
+ DISPATCH --> CONTEXT[agent-context-bundles]
135
+ DISPATCH --> MUX[agent-mux-client]
136
+ DISPATCH --> MEMORY_CTRL
137
+ DISPATCH --> APPROVAL
138
+ DISPATCH --> WORKSPACE
139
+ GATEWAY --> CLIENT[kubernetes-controller]
140
+ CLIENT --> RESMODEL[resource-model]
141
+ STACK --> PERM
142
+ TRIGGER --> DISPATCH
43
143
  ```
44
144
 
145
+ ### 2.3 Circular Dependency Prevention
146
+
147
+ - Controllers import only `resource-model.js` and their declared `delegatesTo` modules
148
+ - Every controller has a `BOUNDARY` constant declaring what it owns and what it must not own
149
+ - The api-controller is the only fan-out point that imports multiple controllers
150
+ - No controller imports the api-controller (prevents upward dependency)
151
+
45
152
  ---
46
153
 
47
- ## 2. Data Model
154
+ ## 3. Request Lifecycle
155
+
156
+ ### 3.1 From Browser Click to Kubectl Apply
157
+
158
+ ```mermaid
159
+ sequenceDiagram
160
+ participant Browser
161
+ participant NextJS as Next.js App Router
162
+ participant WebAPI as Web API Route
163
+ participant SDK as fetchControllerUiModel()
164
+ participant HTTP as Krate HTTP Server
165
+ participant API as createKrateApiController
166
+ participant Gateway as KubernetesResourceGateway
167
+ participant Client as KubernetesResourceClient
168
+ participant Kubectl as kubectl process
169
+
170
+ Browser->>NextJS: Page load / navigation
171
+ NextJS->>WebAPI: GET /api/orgs/[org]/repositories
172
+ WebAPI->>SDK: fetchControllerUiModel({ baseUrl, org })
173
+ SDK->>HTTP: GET /api/controller?org=acme
174
+ HTTP->>API: controller.snapshot()
175
+ API->>Gateway: resourceGateway.snapshot()
176
+ Gateway->>Client: resourceClient.snapshot()
177
+ Client->>Kubectl: spawnSync('kubectl', ['get', ...])
178
+ Kubectl-->>Client: JSON stdout
179
+ Client-->>Gateway: parsed resources
180
+ Gateway-->>API: snapshot object
181
+ API-->>HTTP: withArchitecture(snapshot)
182
+ HTTP->>HTTP: createControllerUiModel(snapshot, { organization })
183
+ HTTP-->>SDK: JSON response (UI model)
184
+ SDK-->>WebAPI: structured data
185
+ WebAPI-->>NextJS: props
186
+ NextJS-->>Browser: rendered HTML
187
+ ```
188
+
189
+ ### 3.2 Function Call Chain (Exact)
190
+
191
+ 1. `createKrateHttpHandler()` receives Node.js `IncomingMessage`
192
+ 2. URL parsed: `new URL(request.url, 'http://localhost')`
193
+ 3. Route matching via regex: `/^\/api\/orgs\/([^/]+)\/resources$/`
194
+ 4. Org extracted from URL path segment
195
+ 5. `createKrateApiController({ namespace: orgNamespaceName(org) })` instantiated per-request
196
+ 6. Controller method called (e.g., `listResource`, `applyResource`)
197
+ 7. Cross-org admission check in `applyResource()`: verifies `spec.organizationRef` matches namespace
198
+ 8. `resourceGateway.apply(resource)` delegates to kubectl
199
+ 9. `clearSnapshotCache()` invalidates stale data
200
+ 10. `globalEventBus.emitResourceChange(kind, name, operation)` broadcasts SSE
201
+ 11. JSON response written via `send(response, status, body)`
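Steps 2 to 5 of the chain above can be sketched as follows. This is a minimal illustration, not the handler's code: `matchOrgResourcesRoute` is a hypothetical helper, and `orgNamespaceName` is re-implemented here on the assumption that it returns `krate-org-<org>`.

```javascript
// Hypothetical sketch of steps 2-5. Only the regex is taken from the text;
// every function name and return shape here is an assumption.
const ORG_RESOURCES_ROUTE = /^\/api\/orgs\/([^/]+)\/resources$/;

// Assumed behavior of orgNamespaceName(): prefix the org slug with 'krate-org-'.
function orgNamespaceName(org) {
  return `krate-org-${org}`;
}

// Parse the request URL, match the route, and derive the org namespace.
// Returns null when the path is not the org resources route.
function matchOrgResourcesRoute(requestUrl) {
  const url = new URL(requestUrl, 'http://localhost');
  const match = ORG_RESOURCES_ROUTE.exec(url.pathname);
  if (!match) return null;
  const org = decodeURIComponent(match[1]);
  return { org, namespace: orgNamespaceName(org) };
}
```

Matching on `url.pathname` rather than the raw request URL keeps query strings out of the route regex.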
48
202
 
49
- ### 2.1 Storage Split
203
+ ---
50
204
 
51
- Krate manages **76 resource kinds** across two storage backends:
205
+ ## 4. Snapshot Pipeline
52
206
 
53
- - **CONFIG storage (etcd)**: 44 kinds — organizational configuration, policies, agent definitions. Stored as Kubernetes CRDs.
54
- - **AGGREGATED storage (postgres)**: 32 kinds — operational data, event records, runtime state. Stored in PostgreSQL.
207
+ ### 4.1 `getControllerSnapshot()` Step by Step
55
208
 
56
- Source: `packages/krate/core/src/resource-model.js`
209
+ Source: `packages/krate/core/src/kubernetes-controller.js` lines 352-497
57
210
 
58
- ### 2.2 Resource Schema
211
+ ```mermaid
212
+ flowchart TD
213
+ A[Start: getControllerSnapshot] --> B{kubectl config current-context}
214
+ B -->|Fail| C[Return degraded snapshot]
215
+ B -->|OK| D{kubectl version --client=true}
216
+ D -->|Fail| C
217
+ D -->|OK| E[kubectl get apiservice v1alpha1.krate.a5c.ai]
218
+ E --> F[kubectl get crd -o json]
219
+ F --> G{Filter CRDs by group}
220
+ G --> H[krate.a5c.ai CRDs]
221
+ G --> I[core.oam.dev CRDs]
222
+ G --> J[kyverno.io CRDs]
223
+ H --> K[Build discoveredPluralSet]
224
+ K --> L[List platform-scoped resources]
225
+ L --> M[Determine org namespaces]
226
+ M --> N[List org-scoped resources per namespace]
227
+ N --> O[Get events in platform namespace]
228
+ O --> P[Run SubjectAccessReview for each CRD]
229
+ P --> Q[Discover Kyverno controllers]
230
+ Q --> R[Return full snapshot object]
231
+ ```
59
232
 
60
- Every resource follows the Kubernetes object model:
233
+ ### 4.2 Org Namespace Discovery
61
234
 
62
235
  ```javascript
63
- {
64
- apiVersion: 'krate.a5c.ai/v1alpha1',
65
- kind: '<ResourceKind>',
66
- metadata: {
67
- name: '<unique-name>',
68
- namespace: 'krate-org-<org>',
69
- labels: { 'krate.a5c.ai/org': '<org>' },
70
- annotations: {}
71
- },
72
- spec: { /* kind-specific specification */ },
73
- status: { /* reconciled status */ }
236
+ function organizationNamespaces(organizations, bindings, fallbackNamespace) {
237
+   // 1. Extract namespaceName from Organization specs
238
+   // 2. Extract namespace from OrgNamespaceBinding specs
239
+   // 3. Deduplicate with Set
240
+   // 4. Fallback: KRATE_ADMIN_ORG, KRATE_ORG, or 'default'
74
241
  }
75
242
  ```
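The four commented steps could be filled in roughly like this. It is a sketch under stated assumptions, not the function in `kubernetes-controller.js`: the field paths `spec.namespaceName` and `spec.namespace` are inferred from the comments, and the way the fallback combines `fallbackNamespace` with the environment (including the `krate-org-` prefix) is a guess. The `env` parameter is added only to make the sketch testable.

```javascript
// Illustrative sketch; the real organizationNamespaces() may differ.
function organizationNamespacesSketch(organizations, bindings, fallbackNamespace, env = process.env) {
  const namespaces = new Set(); // Step 3: a Set deduplicates as entries are added

  // Step 1: namespaceName from Organization specs (field path is an assumption)
  for (const organization of organizations || []) {
    const name = organization?.spec?.namespaceName;
    if (name) namespaces.add(name);
  }

  // Step 2: namespace from OrgNamespaceBinding specs (field path is an assumption)
  for (const binding of bindings || []) {
    const name = binding?.spec?.namespace;
    if (name) namespaces.add(name);
  }

  if (namespaces.size > 0) return [...namespaces];

  // Step 4: fallback order from the comment above. Preferring an explicit
  // fallbackNamespace and mapping the org to 'krate-org-<org>' are assumptions.
  const fallbackOrg = env.KRATE_ADMIN_ORG || env.KRATE_ORG || 'default';
  return [fallbackNamespace || `krate-org-${fallbackOrg}`];
}
```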
76
243
 
77
- ### 2.3 Domain Organization
244
+ ### 4.3 In-Cluster Detection
78
245
 
79
- | Domain Context | CONFIG Kinds | AGGREGATED Kinds | Purpose |
80
- |----------------|-------------|------------------|---------|
81
- | **identity** | Organization, OrgNamespaceBinding, User, Team, Invite, IdentityMapping, AuthProvider, AgentServiceAccount, AgentRoleBinding, AgentSecretGrant, AgentConfigGrant | — | Users, teams, identity, RBAC |
82
- | **data-plane** | Repository, SSHKey, RepositoryPermission, RefPolicy | — | Repository management |
83
- | **control-plane** | BranchProtection | PullRequest, Issue, Review | Code review lifecycle |
84
- | **policy** | PolicyProfile, PolicyTemplate, PolicyBinding, PolicyExceptionRequest | — | Kyverno policy management |
85
- | **agents** | AgentStack, AgentSubagent, AgentToolProfile, AgentMcpServer, AgentSkill, AgentTriggerRule, AgentContextLabel, KrateWorkspacePolicy, AgentAdapter, AgentTransportBinding, AgentProviderConfig, KrateProject, AgentGatewayConfig, AgentMemoryRepository, AgentMemorySource, AgentMemoryOntology, AgentMemoryAssociation | AgentDispatchRun, AgentDispatchAttempt, AgentSession, AgentContextBundle, KrateArtifact, AgentApproval, AgentTriggerExecution, AgentCapabilityRequirement, WorkItemSessionLink, WorkItemWorkspaceLink, AgentSessionTranscript, AgentSessionAttachment, KrateWorkspaceRuntime, AgentMemorySnapshot, AgentMemoryQuery, AgentMemoryUpdate, AgentRunMemoryImport | Agent orchestration |
86
- | **workspaces** | KrateWorkspace | — | Git workspace management |
87
- | **external-backends** | ExternalBackendProvider, ExternalBackendBinding, ExternalBackendSyncPolicy, ExternalProviderCapabilityManifest | ExternalWebhookDelivery, ExternalSyncEvent, ExternalSyncState, ExternalWriteIntent, ExternalSyncConflict, ExternalObjectLink | External system integration |
88
- | **runners-ci** | RunnerPool | Pipeline, Job | CI/CD execution |
89
- | **hooks-events** | WebhookSubscription | WebhookDelivery | Webhook management |
90
- | **web-ui** | View, Selector | — | Saved views and selectors |
246
+ Source: `inClusterKubectlConfig()` at line 724
247
+
248
+ When running inside a Kubernetes pod:
249
+ - Checks `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT`
250
+ - Reads `/var/run/secrets/kubernetes.io/serviceaccount/token`
251
+ - Reads `/var/run/secrets/kubernetes.io/serviceaccount/ca.crt`
252
+ - Adds `--server`, `--certificate-authority`, `--token` args to all kubectl calls
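A rough sketch of the detection described above, assuming the checks run in this order and that `KRATE_DISABLE_IN_CLUSTER_KUBECTL` is compared against the string `'true'`. `inClusterKubectlArgsSketch` and its injected `readFile` callback are hypothetical; the actual `inClusterKubectlConfig()` is not shown in this document.

```javascript
// Hypothetical sketch of in-cluster detection. File access is injected as
// readFile(path) -> string | null so the function has no I/O of its own.
function inClusterKubectlArgsSketch(env, readFile) {
  // Opt-outs: explicit disable flag (string comparison is an assumption) or a KUBECONFIG.
  if (env.KRATE_DISABLE_IN_CLUSTER_KUBECTL === 'true') return [];
  if (env.KUBECONFIG) return [];

  const host = env.KUBERNETES_SERVICE_HOST;
  if (!host) return []; // Not running inside a pod
  const port = env.KUBERNETES_SERVICE_PORT || '443';

  const saDir = env.KRATE_SERVICE_ACCOUNT_DIR || '/var/run/secrets/kubernetes.io/serviceaccount';
  const token = readFile(`${saDir}/token`);
  const caPath = `${saDir}/ca.crt`;
  if (!token || readFile(caPath) === null) return [];

  // kubectl takes the CA as a file path and the token as a literal value.
  return [
    `--server=https://${host}:${port}`,
    `--certificate-authority=${caPath}`,
    `--token=${token.trim()}`,
  ];
}
```

In a real process, `readFile` could wrap `fs.readFileSync(path, 'utf8')` in a `try`/`catch` that returns `null` on failure. The sketch does not bracket IPv6 hosts.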
253
+
254
+ ### 4.4 kubectl Execution Model
255
+
256
+ ```javascript
257
+ function runKubectl(args, options) {
258
+   // Uses spawnSync (synchronous) for snapshot queries
259
+   // Timeout: KRATE_KUBECTL_TIMEOUT_MS (default 3000ms)
260
+   // Max buffer: KRATE_KUBECTL_MAX_BUFFER_BYTES (default 32MB)
261
+   // windowsHide: true (prevents console flash on Windows)
262
+   // encoding: 'utf8'
263
+ }
264
+ ```
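As a sketch of how those settings might be turned into `spawnSync` options: `kubectlSpawnPlanSketch` is a hypothetical helper, and only the option names (`timeout`, `maxBuffer`, `windowsHide`, `encoding`) are real Node.js `child_process` options. How `runKubectl` actually assembles them is not shown in this document.

```javascript
// Hypothetical helper: resolve the kubectl binary and spawnSync options from
// the environment, using the defaults listed in the comments above.
function kubectlSpawnPlanSketch(args, env = process.env) {
  return {
    command: env.KRATE_KUBECTL || 'kubectl',
    args,
    options: {
      encoding: 'utf8',   // stdout/stderr come back as strings
      windowsHide: true,  // no console window flash on Windows
      timeout: Number(env.KRATE_KUBECTL_TIMEOUT_MS || 3000),
      maxBuffer: Number(env.KRATE_KUBECTL_MAX_BUFFER_BYTES || 32 * 1024 * 1024),
    },
  };
}
```

The resulting plan would then be passed to `spawnSync(plan.command, plan.args, plan.options)` from `node:child_process`.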
265
+
266
+ ### 4.5 Environment Variables Affecting Snapshot
267
+
268
+ | Variable | Default | Purpose |
269
+ |----------|---------|---------|
270
+ | `KRATE_KUBECTL` | `kubectl` | Path to kubectl binary |
271
+ | `KRATE_NAMESPACE` | `krate-system` | Platform namespace |
272
+ | `KRATE_KUBECTL_TIMEOUT_MS` | `3000` | kubectl spawn timeout |
273
+ | `KRATE_KUBECTL_MAX_BUFFER_BYTES` | `33554432` | Max stdout buffer (32MB) |
274
+ | `KRATE_DISABLE_IN_CLUSTER_KUBECTL` | `false` | Skip in-cluster detection |
275
+ | `KUBECONFIG` | (none) | If set, disables in-cluster mode |
276
+ | `KUBERNETES_SERVICE_HOST` | (none) | In-cluster API server host |
277
+ | `KUBERNETES_SERVICE_PORT` | `443` | In-cluster API server port |
278
+ | `KRATE_SERVICE_ACCOUNT_DIR` | `/var/run/secrets/kubernetes.io/serviceaccount` | SA mount path |
279
+ | `KRATE_ORG` | `default` | Fallback org for namespace discovery |
280
+ | `KRATE_ADMIN_ORG` | (none) | Admin org for namespace discovery |
91
281
 
92
282
  ---
93
283
 
94
- ## 3. Control Plane
284
+ ## 5. Stale-While-Revalidate Cache
285
+
286
+ Source: `packages/krate/core/src/snapshot-cache.js`
287
+
288
+ ### 5.1 Cache Architecture
289
+
290
+ ```mermaid
291
+ graph LR
292
+ subgraph "Per-Org Cache Map"
293
+ A["orgCacheMap (Map)"]
294
+ A --> B["'' → {data, timestamp, revalidating}"]
295
+ A --> C["'acme' → {data, timestamp, revalidating}"]
296
+ A --> D["'beta' → {data, timestamp, revalidating}"]
297
+ end
298
+ subgraph "Legacy Single-Org"
299
+ E["snapshotCache = {data, timestamp, org}"]
300
+ end
301
+ A -.->|sync| E
302
+ ```
303
+
304
+ ### 5.2 TTL Configuration
305
+
306
+ ```javascript
307
+ export const CACHE_TTL_MS = Number(process.env.KRATE_SNAPSHOT_CACHE_TTL_MS || 30_000);
308
+ ```
309
+
310
+ ### 5.3 staleWhileRevalidate Algorithm
311
+
312
+ ```javascript
313
+ async function staleWhileRevalidate(org, revalidateFn, swrOptions = {}) {
314
+   const ttlMs = swrOptions.ttlMs ?? CACHE_TTL_MS; // Fresh window: 30s
314
+   const staleMs = swrOptions.staleMs ?? ttlMs * 5; // Max stale: 150s
315
+
316
+   // CASE 1: Fresh (< 30s old) → return immediately
317
+   // CASE 2: Stale but usable (30s-150s) → return immediately, revalidate in background
318
+   // CASE 3: Stale and revalidating → return stale data (another caller is refreshing)
319
+   // CASE 4: No cache or too old (>150s) → block on revalidation
321
+ }
322
+ ```
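One way the four cases could be implemented is sketched below. This is an illustration of the technique, not the contents of `snapshot-cache.js`: `classifyEntry` and the `Sketch` suffix are hypothetical, the cache map is local to the example, and the treatment of the exact boundary ages is an assumption.

```javascript
// Illustrative stale-while-revalidate sketch; not the code in snapshot-cache.js.
const CACHE_TTL_MS = 30_000;
const orgCacheMap = new Map(); // org -> { data, timestamp, revalidating }

// Map a cache entry to one of the four cases.
function classifyEntry(entry, now, ttlMs, staleMs) {
  if (!entry) return 'miss';                                   // CASE 4: no cache
  const age = now - entry.timestamp;
  if (age < ttlMs) return 'fresh';                             // CASE 1
  if (age > staleMs) return 'miss';                            // CASE 4: too old
  return entry.revalidating ? 'stale-revalidating' : 'stale';  // CASE 3 / CASE 2
}

async function staleWhileRevalidateSketch(org, revalidateFn, swrOptions = {}) {
  const ttlMs = swrOptions.ttlMs ?? CACHE_TTL_MS;
  const staleMs = swrOptions.staleMs ?? ttlMs * 5;
  const entry = orgCacheMap.get(org);
  const state = classifyEntry(entry, Date.now(), ttlMs, staleMs);

  if (state === 'fresh' || state === 'stale-revalidating') return entry.data;

  if (state === 'stale') {
    entry.revalidating = true; // Later callers see CASE 3 until the refresh settles
    Promise.resolve()
      .then(revalidateFn)
      .then((data) => orgCacheMap.set(org, { data, timestamp: Date.now(), revalidating: false }))
      .catch(() => { entry.revalidating = false; }); // Keep serving stale data on failure
    return entry.data;
  }

  // 'miss': block until fresh data is available.
  const data = await revalidateFn();
  orgCacheMap.set(org, { data, timestamp: Date.now(), revalidating: false });
  return data;
}
```

The `revalidating` flag is what keeps concurrent callers from each starting their own refresh during the stale window.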
323
+
324
+ ### 5.4 Cache Invalidation Triggers
95
325
 
96
- Source: `packages/krate/core/src/kubernetes-controller.js`, `kubernetes-controller-async.js`, `kubernetes-resource-gateway.js`
326
+ `clearSnapshotCache()` is called on:
327
+ - `applyResource()` success
328
+ - `applyResourceForOrg()` success
329
+ - `deleteResource()` success
330
+ - `deleteResourceForOrg()` success
331
+
332
+ ---
97
333
 
98
- ### 3.1 Resource Reconciliation
334
+ ## 6. Authentication Flow
99
335
 
100
- The control plane uses kubectl to interact with the Kubernetes API server. Resources are reconciled through:
336
+ ### 6.1 Complete OAuth Flow
101
337
 
102
- 1. **kubectl get** — list/get resources by kind and namespace
103
- 2. **kubectl apply** — create/update resources declaratively
104
- 3. **kubectl delete** — remove resources
338
+ Source: `packages/krate/core/src/auth.js`
105
339
 
106
- The `createKubernetesResourceGateway()` provides an async wrapper over kubectl operations.
340
+ ```mermaid
341
+ sequenceDiagram
342
+ participant User
343
+ participant Browser
344
+ participant LoginPage as /login page
345
+ participant AuthRoute as /api/auth/github
346
+ participant GitHub as GitHub OAuth
347
+ participant CallbackRoute as /api/auth/callback/github
348
+ participant AuthModule as auth.js
349
+ participant K8s as Kubernetes
350
+
351
+ User->>Browser: Click "Sign in with GitHub"
352
+ Browser->>LoginPage: Navigate
353
+ LoginPage->>AuthRoute: GET /api/auth/github
354
+ AuthRoute->>AuthModule: buildAuthorizationRedirect({ provider, requestUrl })
355
+ AuthModule-->>AuthRoute: { url, state, redirectUri }
356
+ AuthRoute->>Browser: 302 Redirect to GitHub
357
+
358
+ Browser->>GitHub: GET /login/oauth/authorize?client_id=...&redirect_uri=...&scope=read:user+user:email&state=...
359
+ GitHub->>User: Show authorization prompt
360
+ User->>GitHub: Authorize
361
+ GitHub->>Browser: 302 Redirect to /api/auth/callback/github?code=ABC&state=...
362
+
363
+ Browser->>CallbackRoute: GET /api/auth/callback/github?code=ABC&state=...
364
+ CallbackRoute->>AuthModule: exchangeOAuthCodeForProfile({ provider, code, requestUrl })
365
+ AuthModule->>GitHub: POST /login/oauth/access_token (code + client_secret)
366
+ GitHub-->>AuthModule: { access_token: "gho_..." }
367
+ AuthModule->>GitHub: GET /user (Authorization: Bearer gho_...)
368
+ GitHub-->>AuthModule: { login, id, email, name }
369
+ AuthModule->>AuthModule: normalizeProviderProfile(provider, profile)
370
+ AuthModule-->>CallbackRoute: { provider, subject, email, displayName, username, groups, admin }
371
+
372
+ CallbackRoute->>AuthModule: registerLoginProfile({ controller, namespace, profile })
373
+ AuthModule->>K8s: applyResource(User)
374
+ AuthModule->>K8s: applyResource(IdentityMapping)
375
+ CallbackRoute->>AuthModule: createSessionCookie(config, profile, { secret })
376
+ AuthModule-->>CallbackRoute: "krate_session=base64url.hmac; Path=/; HttpOnly; SameSite=Lax"
377
+ CallbackRoute->>Browser: Set-Cookie + 302 to /orgs/[org]
378
+ ```
107
379
 
108
- ### 3.2 Namespace Scoping
380
+ ### 6.2 Session Cookie Structure
109
381
 
110
382
  ```
111
- Namespace pattern: krate-org-{orgSlug}
383
+ krate_session = base64url(payload) . hmac_sha256_base64url(payload, secret)
112
384
  ```
113
385
 
114
- Each organization gets an isolated Kubernetes namespace (`krate-org-<org>`). The `orgNamespaceName(org)` helper (from `org-scoping.js`) computes the namespace name.
386
+ Payload JSON:
387
+ ```json
388
+ { "provider": "github", "subject": "12345", "user": "octocat" }
389
+ ```
390
+
391
+ ### 6.3 Session Verification
115
392
 
116
- Platform-scoped resources (Organization, OrgNamespaceBinding) live in `krate-system` namespace.
393
+ ```javascript
394
+ function parseSessionCookie(config, cookieValue, options) {
395
+   // 1. Split on first '.' → [payload, signature]
395
+   // 2. If signed + no secret: reject (null)
396
+   // 3. If unsigned + secret configured: reject (null)
397
+   // 4. If signed + secret: compute expected HMAC, timingSafeEqual
398
+   // 5. If match: decode base64url → JSON.parse → extract user/subject/provider
399
+   // 6. Return { cookieName, provider, subject, user } or null
401
+ }
402
+ ```
117
403
 
118
- ### 3.3 Org Isolation
404
+ ### 6.4 Delegated Identity (Proxy Auth)
119
405
 
120
- - `OrgNamespaceBinding` maps one organization to exactly one tenant namespace
121
- - All org-scoped resources carry `metadata.labels['krate.a5c.ai/org']`
122
- - API routes extract org from URL path and scope controllers to that namespace
406
+ Headers examined:
407
+ - `x-forwarded-user` (configurable via `KRATE_AUTH_DELEGATED_USER_HEADER`)
408
+ - `x-forwarded-groups` (configurable via `KRATE_AUTH_DELEGATED_GROUPS_HEADER`)
409
+ - `x-forwarded-email` (configurable via `KRATE_AUTH_DELEGATED_EMAIL_HEADER`)
123
410
 
124
- ### 3.4 Stale-While-Revalidate Cache
411
+ Local development auto-login:
412
+ - Active when `NODE_ENV !== 'production'` (or `KRATE_AUTH_DELEGATED_LOCAL_DEVELOPMENT=true`)
413
+ - Default user: `KRATE_AUTH_DELEGATED_LOCAL_USER` or `'local-developer'`
414
+ - Default groups: `KRATE_AUTH_DELEGATED_LOCAL_GROUPS` or `'krate:repo-admins'`
125
415
 
126
- Source: `packages/krate/core/src/snapshot-cache.js`
416
+ ### 6.5 Admin Detection
127
417
 
418
+ Admin status is derived from group membership:
128
419
  ```javascript
129
- CACHE_TTL_MS = 30_000 // 30 seconds, configurable via KRATE_SNAPSHOT_CACHE_TTL_MS
420
+ admin: groups.includes('krate:platform-engineers') || groups.includes('krate:repo-admins')
130
421
  ```
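Putting the delegated headers and the admin rule together, a proxy-supplied profile might be built as follows. This is a sketch under stated assumptions: `delegatedProfileSketch` is a hypothetical name, the groups header is assumed to be comma-separated, and the returned shape is illustrative rather than the one `auth.js` produces.

```javascript
// Hypothetical sketch: derive a profile from reverse-proxy headers.
// `headers` is expected to use lowercase keys, as Node's request.headers does.
function delegatedProfileSketch(headers, env = {}) {
  const userHeader = (env.KRATE_AUTH_DELEGATED_USER_HEADER || 'x-forwarded-user').toLowerCase();
  const groupsHeader = (env.KRATE_AUTH_DELEGATED_GROUPS_HEADER || 'x-forwarded-groups').toLowerCase();
  const emailHeader = (env.KRATE_AUTH_DELEGATED_EMAIL_HEADER || 'x-forwarded-email').toLowerCase();

  const username = headers[userHeader];
  if (!username) return null; // No delegated identity on this request

  // Assumption: groups arrive as a comma-separated list.
  const groups = String(headers[groupsHeader] || '')
    .split(',')
    .map((group) => group.trim())
    .filter(Boolean);

  return {
    username,
    email: headers[emailHeader] || '',
    groups,
    // Admin rule as stated above: membership in either group grants admin.
    admin: groups.includes('krate:platform-engineers') || groups.includes('krate:repo-admins'),
  };
}
```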
131
422
 
132
- - Per-org cache map stores `{ data, timestamp, revalidating }`
133
- - Returns stale data immediately while refreshing in background
134
- - Prevents kubectl overhead on repeated snapshot requests
423
+ ### 6.6 All Auth Environment Variables
424
+
425
+ | Variable | Default | Purpose |
426
+ |----------|---------|---------|
427
+ | `KRATE_AUTH_COOKIE_NAME` | `krate_session` | Cookie name |
428
+ | `KRATE_SESSION_SECRET` | `''` | HMAC signing secret |
429
+ | `KRATE_AUTH_GITHUB_ENABLED` | `true` | Enable GitHub provider |
430
+ | `KRATE_AUTH_GITHUB_CLIENT_ID` | `''` | OAuth client ID |
431
+ | `KRATE_AUTH_GITHUB_CLIENT_SECRET` | `''` | OAuth client secret |
432
+ | `KRATE_AUTH_GITHUB_AUTHORIZATION_URL` | `https://github.com/login/oauth/authorize` | Auth endpoint |
433
+ | `KRATE_AUTH_GITHUB_TOKEN_URL` | `https://github.com/login/oauth/access_token` | Token endpoint |
434
+ | `KRATE_AUTH_GITHUB_USERINFO_URL` | `https://api.github.com/user` | Profile endpoint |
435
+ | `KRATE_AUTH_GITHUB_SCOPES` | `read:user user:email` | OAuth scopes |
436
+ | `KRATE_AUTH_SSO_ENABLED` | `false` | Enable OIDC provider |
437
+ | `KRATE_AUTH_SSO_PROVIDER_NAME` | `Workspace SSO` | Display label |
438
+ | `KRATE_AUTH_SSO_ISSUER_URL` | `''` | OIDC issuer |
439
+ | `KRATE_AUTH_SSO_CLIENT_ID` | `''` | OIDC client ID |
440
+ | `KRATE_AUTH_SSO_CLIENT_SECRET` | `''` | OIDC client secret |
441
+ | `KRATE_AUTH_SSO_AUTHORIZATION_URL` | `''` | OIDC auth endpoint |
442
+ | `KRATE_AUTH_SSO_TOKEN_URL` | `''` | OIDC token endpoint |
443
+ | `KRATE_AUTH_SSO_USERINFO_URL` | `''` | OIDC profile endpoint |
444
+ | `KRATE_AUTH_SSO_SCOPES` | `openid profile email groups` | OIDC scopes |
445
+ | `KRATE_AUTH_DELEGATED_IDENTITY_ENABLED` | `false` | Enable proxy auth |
446
+ | `KRATE_AUTH_DELEGATED_USER_HEADER` | `x-forwarded-user` | User header |
447
+ | `KRATE_AUTH_DELEGATED_GROUPS_HEADER` | `x-forwarded-groups` | Groups header |
448
+ | `KRATE_AUTH_DELEGATED_EMAIL_HEADER` | `x-forwarded-email` | Email header |
449
+ | `KRATE_AUTH_DELEGATED_LOCAL_DEVELOPMENT` | auto | Enable local dev fallback |
450
+ | `KRATE_AUTH_DELEGATED_LOCAL_USER` | `local-developer` | Dev username |
451
+ | `KRATE_AUTH_DELEGATED_LOCAL_EMAIL` | `''` | Dev email |
452
+ | `KRATE_AUTH_DELEGATED_LOCAL_GROUPS` | `krate:repo-admins` | Dev groups |
453
+ | `KRATE_ADMIN_ORG` | (none) | Bootstrap admin org |
454
+ | `KRATE_ADMIN_USERNAME` | (none) | Bootstrap admin user |
135
455
 
136
456
  ---
137
457
 
138
- ## 4. Data Plane
458
+ ## 7. Resource Lifecycle
139
459
 
140
- ### 4.1 Aggregated Resources
460
+ ### 7.1 From UI Form Submit to SSE Update
141
461
 
142
- Stored in PostgreSQL (in-memory for development):
143
-
144
- | Kind | Purpose |
145
- |------|---------|
146
- | PullRequest | Review unit with source/target refs, checks, merge lifecycle |
147
- | Issue | Project-scoped work item with labels, comments, backend sync |
148
- | Review | Approval/comment/change-request for a pull request |
149
- | Pipeline | CI pipeline run state, trust tier, steps, resume point |
150
- | Job | Executable CI step with service-account scope |
462
+ ```mermaid
463
+ sequenceDiagram
464
+ participant UI as Web Console
465
+ participant Route as API Route
466
+ participant HTTP as HTTP Handler
467
+ participant API as ApiController
468
+ participant Gateway as ResourceGateway
469
+ participant Client as KubernetesClient
470
+ participant Kubectl as kubectl
471
+ participant K8s as Kubernetes API
472
+ participant Cache as SnapshotCache
473
+ participant Bus as EventBus
474
+ participant SSE as SSE Stream
475
+
476
+ UI->>Route: POST fetch('/api/orgs/acme/resources', { body: resource })
477
+ Route->>HTTP: Forward request
478
+ HTTP->>HTTP: scopeResource(resource, org)
479
+ Note over HTTP: Add namespace, labels, organizationRef
480
+ HTTP->>API: scopedController.applyResource(scopedResource)
481
+ API->>API: Cross-org admission check
482
+ Note over API: Verify spec.organizationRef matches namespace
483
+ API->>Gateway: resourceGateway.apply(resource)
484
+ Gateway->>Client: resourceClient.applyResource(resource)
485
+ Client->>Client: withOrgScope(resource, { namespace })
486
+ Client->>Client: ensureNamespace(targetNs)
487
+ Client->>Kubectl: spawnSync(['apply', '-f', '-', '-o', 'json'], { input: JSON })
488
+ Kubectl->>K8s: kubectl apply -f -
489
+ K8s-->>Kubectl: Applied resource JSON
490
+ Kubectl-->>Client: stdout
491
+ Client-->>Gateway: { operation: 'apply', resource }
492
+ Gateway-->>API: result
493
+ API->>Cache: clearSnapshotCache()
494
+ API->>Bus: globalEventBus.emitResourceChange(kind, name, 'apply')
495
+ Bus->>SSE: writer(event) → response.write('data: {...}\n\n')
496
+ SSE->>UI: EventSource message received
497
+ UI->>UI: Re-fetch / optimistic update
498
+ API-->>HTTP: { operation, resource }
499
+ HTTP-->>Route: 201 JSON
500
+ Route-->>UI: Success response
501
+ ```
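The EventBus-to-SSE hop in the diagram above can be illustrated with a small in-memory bus. This is a minimal sketch, not `event-bus.js`: `createEventBusSketch`, `toSseFrame`, and the event payload shape are all hypothetical. Only the `data: {...}\n\n` framing comes from the diagram.

```javascript
// Minimal illustrative event bus; the payload shape is an assumption.
function createEventBusSketch() {
  const writers = new Set();
  return {
    // Register a writer; returns an unsubscribe function for connection close.
    subscribe(writer) {
      writers.add(writer);
      return () => writers.delete(writer);
    },
    // Fan a resource change out to every connected writer.
    emitResourceChange(kind, name, operation) {
      const event = { type: 'resource-change', kind, name, operation };
      for (const writer of writers) writer(event);
    },
  };
}

// Serialize one event as a Server-Sent Events frame (blank line terminates it).
function toSseFrame(event) {
  return `data: ${JSON.stringify(event)}\n\n`;
}
```

In an HTTP handler, each subscriber would typically be `(event) => response.write(toSseFrame(event))` on a response sent with `Content-Type: text/event-stream`, with the unsubscribe function called when the request closes.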
151
502
 
152
- ### 4.2 Git Layer (Gitea)
503
+ ### 7.2 scopeResource Function
153
504
 
154
- Source: `packages/krate/core/src/gitea-service.js`, `gitea-backend.js`
505
+ ```javascript
506
+ function scopeResource(resource, org) {
507
+   const namespace = orgNamespaceName(org); // 'krate-org-acme'
508
+   return {
509
+     ...resource,
510
+     metadata: {
511
+       ...(resource.metadata || {}),
512
+       namespace,
513
+       labels: {
514
+         ...(resource.metadata?.labels || {}),
515
+         'krate.a5c.ai/org': org,
516
+         'krate.a5c.ai/namespace': namespace
517
+       }
518
+     },
519
+     spec: { ...(resource.spec || {}), organizationRef: org }
520
+   };
521
+ }
522
+ ```
155
523
 
156
- - Repository storage and hosting
157
- - Branch management (create, list, delete)
158
- - Tree/blob API for code browsing
159
- - SSH key reconciliation with repository access
524
+ ### 7.3 Cross-Org Admission
160
525
 
161
- ### 4.3 Search Index
526
+ In `applyResource()`:
527
+ ```javascript
528
+ const resourceOrg = resource.spec?.organizationRef;
529
+ const resourceNs = resource.metadata?.namespace;
530
+ if (resourceOrg) {
531
+   const expectedNs = orgNamespaceName(resourceOrg);
532
+   if (resourceNs && resourceNs !== expectedNs) {
533
+     throw new Error(`Cross-org namespace mismatch`);
534
+   }
535
+ }
536
+ ```
162
537
 
163
- The HTTP API exposes `POST /api/orgs/:org/repositories/:repo/search-index` to enqueue search indexing for a repository.
538
+ In `deleteResourceForOrg()`:
539
+ ```javascript
540
+ // Verify existing resource namespace matches org
541
+ if (!resourceNs || resourceNs !== orgNs) {
542
+   throw new Error(`Cross-org denial`);
543
+ }
544
+ ```
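To make the apply-side check concrete, here is a self-contained sketch that wraps it in a function and exercises both outcomes. `assertSameOrgSketch` is a hypothetical wrapper, and `orgNamespaceName` is re-implemented on the assumption that it returns `krate-org-<org>`.

```javascript
// Assumed behavior of orgNamespaceName(): prefix the org slug with 'krate-org-'.
function orgNamespaceName(org) {
  return `krate-org-${org}`;
}

// Hypothetical wrapper around the applyResource() admission check shown above.
// Returns the resource unchanged when it passes, throws on a mismatch.
function assertSameOrgSketch(resource) {
  const resourceOrg = resource.spec?.organizationRef;
  const resourceNs = resource.metadata?.namespace;
  if (resourceOrg && resourceNs && resourceNs !== orgNamespaceName(resourceOrg)) {
    throw new Error('Cross-org namespace mismatch');
  }
  return resource;
}

// A resource that claims org 'acme' but targets another org's namespace.
const smuggled = {
  kind: 'Repository',
  metadata: { name: 'demo', namespace: 'krate-org-beta' },
  spec: { organizationRef: 'acme' },
};

let rejected = false;
try {
  assertSameOrgSketch(smuggled);
} catch (error) {
  rejected = true; // The mismatch is caught before anything reaches kubectl
}
```

As written, the apply-side check passes resources that omit either `spec.organizationRef` or `metadata.namespace`, whereas the delete-side check rejects a missing namespace.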
164
545
 
165
546
  ---
166
547
 
167
- ## 5. Agent Orchestration
548
+ ## 8. Agent Dispatch Lifecycle
168
549
 
169
- Source: `packages/krate/core/src/agent-stack-controller.js`, `agent-dispatch-controller.js`, `agent-workspace-controller.js`, `agent-trigger-controller.js`, `agent-approval-controller.js`
550
+ ### 8.1 Complete Flow
170
551
 
171
- ### 5.1 Lifecycle Flow
552
+ Source: `packages/krate/core/src/agent-dispatch-controller.js`
172
553
 
173
554
  ```mermaid
174
555
  sequenceDiagram
175
- participant Trigger as AgentTriggerRule
176
- participant Stack as AgentStack
177
- participant Dispatch as AgentDispatchRun
178
- participant Attempt as AgentDispatchAttempt
179
- participant Session as AgentSession
180
- participant Workspace as KrateWorkspace
181
-
182
- Trigger->>Dispatch: Event matches rule → create run
183
- Dispatch->>Stack: Resolve stack definition
184
- Stack->>Dispatch: Capabilities, approval mode
185
- Dispatch->>Attempt: Create execution attempt
186
- Attempt->>Session: Bind Agent Mux session
187
- Attempt->>Workspace: Provision workspace (PVC)
188
- Session->>Workspace: Mount and execute
189
- ```
190
-
191
- ### 5.2 Stack Definition
192
-
193
- `AgentStack` defines a reusable agent with:
194
- - Base agent and adapter selection
195
- - Model and prompt configuration
196
- - MCP servers, skills, subagents
197
- - Tool profiles and approval mode
198
- - Runner policy and runtime identity
199
-
200
- ### 5.3 Trigger System
201
-
202
- `AgentTriggerRule` routes events to stacks based on:
203
- - CI failures (`pipeline-failure`)
204
- - Webhook events
205
- - Comments on issues/PRs
206
- - Label additions
207
- - Cron schedules
208
- - Manual dispatch
209
-
210
- ### 5.4 Supporting Resources
211
-
212
- | Resource | Role |
213
- |----------|------|
214
- | AgentAdapter | Transport type, capabilities matrix, auth requirements |
215
- | AgentTransportBinding | Endpoint, protocol, auth, health, reconnect policy |
216
- | AgentProviderConfig | Model provider with API base, auth, rate limits |
217
- | AgentGatewayConfig | Agent Mux gateway connection settings |
218
- | AgentContextBundle | Immutable prompt/context snapshot |
219
- | AgentApproval | Human approval gate for tools, secrets, write-back |
220
- | AgentSessionTranscript | Chat transcript with message nodes and cost |
556
+ participant UI as Dispatch Button
557
+ participant HTTP as HTTP Handler
558
+ participant API as ApiController
559
+ participant Dispatch as DispatchController
560
+ participant Perm as PermissionReviewer
561
+ participant Stack as StackController
562
+ participant Memory as MemoryController
563
+ participant Approval as ApprovalController
564
+ participant Workspace as WorkspaceController
565
+ participant Context as ContextBundler
566
+ participant Mux as AgentMuxClient
567
+
568
+ UI->>HTTP: POST /api/orgs/:org/agents/dispatch
569
+ HTTP->>API: controller.dispatchAgent(input)
570
+ API->>API: snapshot = await this.snapshot()
571
+ API->>Dispatch: createManualDispatch({ ...input, resources: snapshot.resources })
572
+
573
+ Dispatch->>Dispatch: 1. Find AgentStack by name in resources
574
+ alt Stack not found
575
+ Dispatch-->>API: { error: true, reason: 'stack-not-found' }
576
+ end
577
+
578
+ Dispatch->>Perm: 2. reviewPermissions({ repository, ref, actor, agentStack, resources })
579
+ Note over Perm: Check cross-org, fork, SA, roles, secrets, configs
580
+ alt Permission denied
581
+ Dispatch-->>API: { error: true, reason: 'permission-denied' }
582
+ end
583
+
584
+ Dispatch->>Memory: 3. Memory snapshot (if AgentMemoryRepository exists)
585
+ Memory-->>Dispatch: memorySnapshot resource
586
+
587
+ alt Requires approval
588
+ Dispatch->>Dispatch: Create AgentDispatchRun (phase: AwaitingApproval)
589
+ Dispatch->>Approval: createApprovalRequest({ dispatchRun, action: 'secret-access' })
590
+ Dispatch-->>API: { run, approval, awaitingApproval: true }
591
+ end
592
+
593
+ Dispatch->>Workspace: 4. findReusableWorkspace({ org, repo, branch })
594
+ alt Reusable found
595
+ Workspace-->>Dispatch: claimWorkspace result
596
+ else No reusable
597
+ Dispatch->>Workspace: createWorkspace({ org, repo, branch })
598
+ Workspace-->>Dispatch: { workspace, pvcManifest }
599
+ end
600
+
601
+ Dispatch->>Context: 5. assembleContextBundle({ stack, repository, ref })
602
+ Context-->>Dispatch: contextBundle resource
603
+
604
+ Dispatch->>Dispatch: 6. Create AgentDispatchRun + AgentDispatchAttempt
605
+
606
+ Dispatch->>Mux: 7. agentMuxClient.launchSession({ stack, contextBundle })
607
+ alt Mux available and launch succeeds
608
+ Mux-->>Dispatch: { runId, sessionId }
609
+ Dispatch->>Dispatch: run.status.phase = 'Running'
610
+ Dispatch->>Mux: subscribeToEvents(runId, handler)
611
+ Dispatch->>Mux: reconcileTranscript(sessionId, events)
612
+ else Mux unavailable
613
+ Dispatch->>Dispatch: run.status.phase = 'Queued'
614
+ Dispatch->>Dispatch: condition: AgentMuxBound=False
615
+ end
616
+
617
+ Dispatch-->>API: { run, attempt, contextBundle, workspace, transcript }
618
+ ```
619
+
620
+ ### 8.2 Permission Review Steps
621
+
622
+ 1. Resolve AgentStack from resources
623
+ 2. Validate approvalMode (yolo/prompt/deny)
624
+ 3. Cross-org denial: agent org vs repository org
625
+ 4. Expand capabilities from stack spec (tools, MCP, skills, subagents)
626
+ 5. Untrusted fork detection (`refs/pull/\d+/`)
627
+ 6. Check AgentServiceAccount binding
628
+ 7. Check AgentRoleBinding for subject
629
+ 8. Check AgentSecretGrant for agent
630
+ 9. Check AgentConfigGrant for agent
631
+ 10. Compute decision: `allowed`, `requires-approval`, or `denied`
632
+
633
+ ### 8.3 Decision Matrix
634
+
635
+ | approvalMode | Errors | Fork | Decision |
636
+ |-------------|--------|------|----------|
637
+ | `deny` | any | any | `denied` |
638
+ | `yolo` | none | false | `allowed` |
639
+ | `yolo` | none | true | `allowed` (warnings only) |
640
+ | `prompt` | none | false | `requires-approval` |
641
+ | `prompt` | none | true | `requires-approval` |
642
+ | any | has errors | any | `denied` |
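
The matrix above can be read as a small pure function. The following is a minimal sketch under stated assumptions, not the reviewer's actual code: the name `computeDecision`, the `errors` array, the `fork` flag, and the `'untrusted-fork'` warning label are all hypothetical.

```javascript
// Hypothetical helper mirroring the decision matrix in 8.3.
// Assumption: errors is an array of strings and fork is a boolean.
function computeDecision({ approvalMode, errors = [], fork = false }) {
  const warnings = [];
  // 'deny' mode and any review error both short-circuit to 'denied'.
  if (approvalMode === 'deny' || errors.length > 0) {
    return { decision: 'denied', warnings };
  }
  // A fork never changes the decision by itself; it only adds a warning
  // (the warning label is an assumption made for this sketch).
  if (fork) warnings.push('untrusted-fork');
  if (approvalMode === 'prompt') return { decision: 'requires-approval', warnings };
  if (approvalMode === 'yolo') return { decision: 'allowed', warnings };
  // Assumption: an unrecognized approvalMode is treated as denied.
  return { decision: 'denied', warnings };
}
```

Checking `deny` and errors first keeps the two "any" rows of the table in one place, so the remaining branches only have to distinguish `prompt` from `yolo`.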
221
643
 
222
644
  ---
223
645
 
224
- ## 6. External Backend Pipeline
646
+ ## 9. External Sync Pipeline
647
+
648
+ ### 9.1 Complete Flow
649
+
650
+ ```mermaid
651
+ sequenceDiagram
652
+ participant Ext as External Provider (GitHub)
653
+ participant Ingress as POST /api/orgs/:org/agents/webhooks/ingest
654
+ participant HTTP as HTTP Handler
655
+ participant Normalize as normalizeWebhookEvent()
656
+ participant API as ApiController
657
+ participant Trigger as TriggerController
658
+ participant Webhook as WebhookController
659
+ participant Sync as SyncController
660
+ participant Conflict as ConflictController
661
+ participant Write as WriteController
662
+ participant K8s as Kubernetes
663
+
664
+ Ext->>Ingress: POST with X-Hub-Signature-256 header
665
+ Ingress->>HTTP: Match /api/orgs/:org/agents/webhooks/ingest
666
+ HTTP->>Normalize: normalizeWebhookEvent(body, org)
667
+ Note over Normalize: Pattern match: workflow_run, PR, comment, label, push
668
+ Normalize-->>HTTP: Canonical event { type, source, repository, ref, actor, payload }
669
+ HTTP->>API: processWebhookEvent({ event, organizationRef, namespace })
670
+ API->>API: snapshot()
671
+ API->>Trigger: createAgentTriggerController({ dispatchController })
672
+ API->>Trigger: processEvent({ event, resources, namespace, organizationRef })
673
+
674
+ Trigger->>Trigger: evaluateEvent — match event type against each rule's sources
675
+ loop For each matching rule
676
+ Trigger->>Trigger: Dedup check (existing TriggerExecution with same eventUid)
677
+ alt Not duplicate
678
+ Trigger->>Trigger: createTriggerExecution (phase: Dispatching)
679
+ Trigger->>Dispatch: createManualDispatch(...)
680
+ end
681
+ end
682
+ Trigger-->>API: { processed, dispatched, skipped, executions }
683
+ ```
684
+
685
+ ### 9.2 Webhook Event Normalization
686
+
687
+ Source: `normalizeWebhookEvent()` in `http-server.js`
688
+
689
+ | GitHub Action/Shape | Krate Event Type | Source Kind |
690
+ |--------------------|-----------------|-------------|
691
+ | `completed` + `workflow_run.conclusion=failure` | `ci-failure` | Pipeline |
692
+ | `opened` + `pull_request` | `pr-opened` | PullRequest |
693
+ | `created` + `comment` | `comment` | Issue/PullRequest |
694
+ | `labeled` | `label-added` | Issue/PullRequest |
695
+ | `opened` + `issue` (no PR) | `issue-created` | Issue |
696
+ | `ref` + `commits` | `push` | Repository |
697
+ | (fallback) | `webhook` | WebhookDelivery |
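
As an illustration of the table above, the classification could be expressed as an ordered chain of shape checks. This is a sketch, not the body of `normalizeWebhookEvent()`: the function name `classifyWebhook` and the `sourceKind` field are hypothetical, and the real function also fills in `source`, `repository`, `ref`, `actor`, and `payload`.

```javascript
// Hypothetical classifier following the rows of the normalization table.
// Assumption: body is the parsed GitHub webhook JSON payload.
function classifyWebhook(body = {}) {
  const { action } = body;
  // GitHub marks PR-backed issues with an issue.pull_request object.
  const onPullRequest = Boolean(body.pull_request || (body.issue && body.issue.pull_request));
  const threadKind = onPullRequest ? 'PullRequest' : 'Issue';

  if (action === 'completed' && body.workflow_run && body.workflow_run.conclusion === 'failure') {
    return { type: 'ci-failure', sourceKind: 'Pipeline' };
  }
  // Checked before the issue rule so "opened + issue" only matches when no PR is present.
  if (action === 'opened' && body.pull_request) return { type: 'pr-opened', sourceKind: 'PullRequest' };
  if (action === 'created' && body.comment) return { type: 'comment', sourceKind: threadKind };
  if (action === 'labeled') return { type: 'label-added', sourceKind: threadKind };
  if (action === 'opened' && body.issue) return { type: 'issue-created', sourceKind: 'Issue' };
  if (body.ref && Array.isArray(body.commits)) return { type: 'push', sourceKind: 'Repository' };
  return { type: 'webhook', sourceKind: 'WebhookDelivery' };
}
```

The order of the checks matters: the most specific shapes come first and the generic `webhook` fallback comes last.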
698
+
699
+ ### 9.3 HMAC Verification
700
+
701
+ Source: `external/webhook-controller.js`
702
+
703
+ ```javascript
704
+ verifyHmacSignature(body, signature) {
705
+ // 1. Reject if no signature header
706
+ // 2. Reject if not prefixed with 'sha256='
707
+ // 3. Compute expected: 'sha256=' + createHmac('sha256', secret).update(body).digest('hex')
708
+ // 4. timingSafeEqual(Buffer.from(expected), Buffer.from(signature))
709
+ // 5. Return { valid: true/false, reason }
710
+ }
711
+ ```
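
The five commented steps above can be written out as runnable code. This is a sketch under assumptions: the secret is passed as a third parameter here (the source appears to read it from controller state), and the `reason` strings are hypothetical.

```javascript
const { createHmac, timingSafeEqual } = require('node:crypto');

// Sketch of the steps listed in 9.3. Assumption: secret is passed explicitly.
function verifyHmacSignature(body, signature, secret) {
  // 1. Reject if no signature header was supplied.
  if (!signature) return { valid: false, reason: 'missing-signature' };
  // 2. Reject anything that is not a sha256 signature.
  if (!signature.startsWith('sha256=')) return { valid: false, reason: 'unsupported-algorithm' };
  // 3. Compute the expected signature over the raw request body.
  const expected = 'sha256=' + createHmac('sha256', secret).update(body).digest('hex');
  const expectedBuf = Buffer.from(expected);
  const actualBuf = Buffer.from(signature);
  // 4. timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expectedBuf.length !== actualBuf.length || !timingSafeEqual(expectedBuf, actualBuf)) {
    return { valid: false, reason: 'signature-mismatch' };
  }
  // 5. Report the result.
  return { valid: true, reason: null };
}
```

The body must be the raw bytes received on the wire; re-serializing parsed JSON can change whitespace or key order and invalidate the signature.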
712
+
713
+ ### 9.4 Sync Controller Ownership Modes
714
+
715
+ | Mode | Krate Writes | External Writes |
716
+ |------|-------------|-----------------|
717
+ | `bidirectional` | Allowed | Allowed |
718
+ | `external-owned` | Blocked | Allowed |
719
+ | `krate-owned` | Allowed | Blocked |
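
One way to encode the ownership table is a frozen lookup keyed by mode. This is an illustrative sketch only: `OWNERSHIP_RULES`, `isWriteAllowed`, and the `'krate'` / `'external'` origin labels are hypothetical names, not identifiers from `sync-controller.js`.

```javascript
// Hypothetical lookup table mirroring the ownership modes in 9.4.
const OWNERSHIP_RULES = Object.freeze({
  'bidirectional': Object.freeze({ krate: true, external: true }),
  'external-owned': Object.freeze({ krate: false, external: true }),
  'krate-owned': Object.freeze({ krate: true, external: false }),
});

// Returns true when a write from the given origin is allowed under the mode.
// Assumption: unknown modes and unknown origins are blocked.
function isWriteAllowed(mode, origin) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  if (!has(OWNERSHIP_RULES, mode)) return false;
  const rule = OWNERSHIP_RULES[mode];
  return has(rule, origin) ? rule[origin] : false;
}
```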
720
+
721
+ ### 9.5 Watermark Tracking
225
722
 
226
- Source: `packages/krate/core/src/external/`
723
+ - Per-binding watermark stored as ISO timestamp
724
+ - Only advances forward (new timestamp must be > current)
725
+ - Persisted as `ExternalSyncWatermark` CRD resource
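
The forward-only rule can be sketched as a comparison of parsed timestamps. The helper name `advanceWatermark` and its handling of unparseable input are assumptions made for illustration.

```javascript
// Hypothetical helper: returns the watermark that should be stored after
// seeing `candidate`. The watermark never moves backwards.
function advanceWatermark(current, candidate) {
  const next = Date.parse(candidate);
  // Assumption: an unparseable candidate leaves the watermark untouched.
  if (Number.isNaN(next)) return current;
  // Strictly greater, per "new timestamp must be > current".
  if (current != null && next <= Date.parse(current)) return current;
  return new Date(next).toISOString();
}
```

Comparing parsed epoch values rather than raw strings avoids surprises when two ISO timestamps use different offsets or precision.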
227
726
 
228
- ### 6.1 Pipeline Architecture
727
+ ---
728
+
729
+ ## 10. Memory Query Pipeline
730
+
731
+ ### 10.1 From Search Form to Results
229
732
 
230
733
  ```mermaid
231
- graph LR
232
- A[ExternalBackendProvider] --> B[ExternalBackendBinding]
233
- B --> C[ExternalWebhookDelivery]
234
- C --> D[ExternalSyncEvent]
235
- D --> E{Conflict?}
236
- E -->|Yes| F[ExternalSyncConflict]
237
- E -->|No| G[ExternalSyncState]
238
- F --> H[Resolution]
239
- H --> G
240
- G --> I[ExternalWriteIntent]
241
- I --> J[ExternalObjectLink]
242
- ```
243
-
244
- ### 6.2 Controllers
245
-
246
- | Controller | File | Responsibility |
247
- |-----------|------|----------------|
248
- | WebhookController | `external/webhook-controller.js` | HMAC-SHA256 verification, dedup, async event queue |
249
- | SyncController | `external/sync-controller.js` | Sync event processing, state management, watermarks |
250
- | ConflictController | `external/conflict-controller.js` | Conflict detection, resolution strategies |
251
- | WriteController | `external/write-controller.js` | Write intent queuing, approval, execution |
252
- | ProviderAdapter | `external/provider-adapter.js` | Provider-specific translation |
253
-
254
- ### 6.3 GitHub Adapter
255
-
256
- Source: `packages/krate/core/src/external/github/`
257
-
258
- - `auth.js` — GitHub App authentication, installation tokens
259
- - `git-forge.js` — Repository, branch, PR operations
260
- - `issue-tracking.js` — Issues, labels, comments
261
- - `cicd.js` — Actions, workflows, check runs
262
- - `index.js` — Unified GitHub adapter facade
734
+ sequenceDiagram
735
+ participant UI as Memory Search Form
736
+ participant HTTP as HTTP Handler
737
+ participant API as ApiController
738
+ participant Memory as MemoryController
739
+ participant QueryEngine as queryMemory()
740
+
741
+ UI->>HTTP: POST /api/orgs/:org/agents/memory/query { query, mode, kinds, depth }
742
+ HTTP->>API: queryAgentMemory({ query, mode, ... })
743
+ API->>Memory: queryMemory({ query, mode, organizationRef })
744
+ Memory->>QueryEngine: queryMemory({ records, documents, edges, query, mode })
745
+
746
+ alt mode = 'graph-only'
747
+ QueryEngine->>QueryEngine: queryGraph({ records, edges, query, kinds, depth })
748
+ Note over QueryEngine: buildAdjacency → filter by nodeKind → score → follow edges → sort
749
+ else mode = 'grep-only'
750
+ QueryEngine->>QueryEngine: queryGrep({ documents, query, paths, context })
751
+ Note over QueryEngine: filter by glob → line-by-line search → extract context
752
+ else mode = 'graph-and-grep'
753
+ QueryEngine->>QueryEngine: queryGraph + queryGrep (both)
754
+ end
755
+
756
+ QueryEngine-->>Memory: { graph, grep, stats }
757
+ Memory-->>API: result
758
+ API-->>HTTP: JSON response
759
+ HTTP-->>UI: { graph: { matches, totalMatches }, grep: { excerpts, totalMatches } }
760
+ ```
761
+
762
+ ### 10.2 Graph Scoring Algorithm
763
+
764
+ ```javascript
765
+ function scoreRecord(record, lowerQuery) {
766
+ const id = String(record.id || '').toLowerCase();
767
+ const attrs = JSON.stringify(record.attributes || {}).toLowerCase();
768
+ if (id.includes(lowerQuery)) return 2; // ID match: higher priority
769
+ if (attrs.includes(lowerQuery)) return 1; // Attribute match
770
+ return 0; // No match
771
+ }
772
+ ```
773
+
774
+ ### 10.3 Edge Traversal (BFS)
775
+
776
+ ```javascript
777
+ function followEdges(startId, adjacency, maxDepth) {
778
+ // BFS from startId up to maxDepth hops
779
+ // visited Set prevents cycles
780
+ // Returns flat array of all encountered edges
781
+ }
782
+ ```
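
The commented outline above could be filled in roughly as follows. This is a sketch under assumptions: edges are taken to be objects shaped `{ from, to }`, the adjacency map is taken to be a `Map` from node id to the edges touching it, and edges are followed in both directions. None of these details are confirmed by the source.

```javascript
// Assumed adjacency shape: Map<nodeId, Edge[]>, with each edge indexed under both ends.
function buildAdjacency(edges) {
  const adjacency = new Map();
  for (const edge of edges) {
    for (const id of [edge.from, edge.to]) {
      if (!adjacency.has(id)) adjacency.set(id, []);
      adjacency.get(id).push(edge);
    }
  }
  return adjacency;
}

// Breadth-first traversal from startId, at most maxDepth hops away.
function followEdges(startId, adjacency, maxDepth) {
  const visited = new Set([startId]); // prevents revisiting nodes on cycles
  const seenEdges = new Set();        // each edge is reported once
  const found = [];
  let frontier = [startId];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const id of frontier) {
      for (const edge of adjacency.get(id) || []) {
        if (!seenEdges.has(edge)) {
          seenEdges.add(edge);
          found.push(edge);
        }
        const neighbor = edge.from === id ? edge.to : edge.from;
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return found; // flat array of all encountered edges
}
```

Processing one frontier per depth level keeps the hop count explicit, so `maxDepth` bounds the traversal even on densely connected graphs.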
783
+
784
+ ### 10.4 Grep Highlighting
785
+
786
+ Match output format:
787
+ ```javascript
788
+ {
789
+ path: 'docs/design.md',
790
+ lineNumber: 42,
791
+ line: 'The agent memory stores knowledge graphs...',
792
+ highlighted: 'The agent **memory** stores knowledge graphs...',
793
+ context: '...\nThe agent memory stores knowledge graphs...\n...',
794
+ contextStart: 41,
795
+ contextEnd: 43
796
+ }
797
+ ```
263
798
 
264
799
  ---
265
800
 
266
- ## 7. Memory System
801
+ ## 11. Workspace Provisioning
267
802
 
268
- Source: `packages/krate/core/src/agent-memory-controller.js`, `agent-memory-query.js`, `agent-memory-import.js`, `agent-memory-repository-source-controller.js`
803
+ ### 11.1 PVC-Based Provisioning
269
804
 
270
- ### 7.1 Pipeline
805
+ Source: `packages/krate/core/src/agent-workspace-controller.js`
271
806
 
272
807
  ```mermaid
273
- graph LR
274
- A[AgentMemoryRepository] --> B[AgentMemorySource]
275
- B --> C[AgentMemoryOntology]
276
- C --> D[AgentMemorySnapshot]
277
- D --> E[AgentMemoryQuery]
278
- F[AgentRunMemoryImport] --> A
279
- G[AgentMemoryAssociation] --> A
280
- H[AgentMemoryUpdate] --> A
808
+ flowchart TD
809
+ A[Dispatch trigger] --> B{Find reusable workspace?}
810
+ B -->|Yes: same repo+branch+Ready| C[claimWorkspace]
811
+ B -->|No| D[createWorkspace]
812
+
813
+ C --> E[Mark phase=InUse, set runRef]
814
+ D --> F[Generate workspace name]
815
+ F --> G[Generate PVC manifest]
816
+ G --> H[Create KrateWorkspace resource]
817
+
818
+ E --> I[getMountSpec]
819
+ H --> I
820
+
821
+ I --> J["Return { volume, volumeMount }"]
822
+ J --> K[Attach to AgentDispatchRun.spec.mountSpec]
823
+ ```
824
+
825
+ ### 11.2 PVC Manifest Structure
826
+
827
+ ```javascript
828
+ {
829
+ apiVersion: 'v1',
830
+ kind: 'PersistentVolumeClaim',
831
+ metadata: {
832
+ name: 'krate-ws-<workspace-name>',
833
+ namespace: '<org-namespace>',
834
+ labels: {
835
+ 'krate.a5c.ai/workspace': '<workspace-name>',
836
+ 'krate.a5c.ai/org': '<org>'
837
+ }
838
+ },
839
+ spec: {
840
+ storageClassName: 'standard', // configurable via volumeSpec.storageClassName
841
+ accessModes: ['ReadWriteOnce'], // configurable
842
+ resources: { requests: { storage: '10Gi' } } // configurable via volumeSpec.capacity
843
+ }
844
+ }
281
845
  ```
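
A builder for this manifest might look like the sketch below. The function name `buildPvcManifest` and the `volumeSpec.accessModes` field are assumptions; `volumeSpec.storageClassName` and `volumeSpec.capacity` come from the comments in the manifest above.

```javascript
// Hypothetical builder producing the PVC manifest shape shown in 11.2.
function buildPvcManifest({ workspaceName, namespace, org, volumeSpec = {} }) {
  return {
    apiVersion: 'v1',
    kind: 'PersistentVolumeClaim',
    metadata: {
      name: `krate-ws-${workspaceName}`,
      namespace,
      labels: {
        'krate.a5c.ai/workspace': workspaceName,
        'krate.a5c.ai/org': org,
      },
    },
    spec: {
      // Defaults mirror the values documented above.
      storageClassName: volumeSpec.storageClassName || 'standard',
      accessModes: volumeSpec.accessModes || ['ReadWriteOnce'], // field name assumed
      resources: { requests: { storage: volumeSpec.capacity || '10Gi' } },
    },
  };
}
```

Keeping the builder pure (no cluster calls) matches the boundary table, which lists "K8s API" under what the workspace controller must not own.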
282
846
 
283
- ### 7.2 Query Engine
847
+ ### 11.3 Codespace Pod Spec
284
848
 
285
- Source: `packages/krate/core/src/agent-memory-query.js`
849
+ When `launchCodespace()` is called:
850
+ - Image: `codercom/code-server:latest` (configurable)
851
+ - CPU: 1 core limit, 250m request
852
+ - Memory: 2Gi limit, 512Mi request
853
+ - Port: 8080
854
+ - Volume: PVC mount at `/workspace`
855
+ - Env: `KRATE_WORKSPACE`, `KRATE_ORG`, `GIT_AUTHOR_NAME`, `GIT_AUTHOR_EMAIL`
856
+ - Service: ClusterIP on port 8080
857
+ - URL pattern: `http://codespace-svc-<ws>.<namespace>.svc.cluster.local:8080`
286
858
 
287
- Three query modes:
288
- - **graph-only** — Graph traversal with adjacency, depth, nodeKind filtering, relevance scoring
289
- - **grep-only** — Full-text grep with context extraction
290
- - **graph-and-grep** — Combined query execution
859
+ ### 11.4 Workspace Phase Transitions
291
860
 
861
+ ```
862
+ Pending → Ready → InUse → Ready (release)
863
+ → Archived (archive)
864
+ → Terminating (delete)
865
+ Archived → Active (recover)
866
+ ```
867
+
868
+ ---
869
+
870
+ ## 12. Notification Pipeline
871
+
872
+ Source: `packages/krate/core/src/notification-controller.js`
873
+
874
+ ### 12.1 Event-to-Notification Mapping
875
+
876
+ | Source Event Type | Notification Type | Severity |
877
+ |-------------------|------------------|----------|
878
+ | `AgentDispatchRun` (completed) | `run-complete` | info |
879
+ | `AgentDispatchRun` (failed) | `run-complete` | error |
880
+ | `AgentApproval` (pending) | `approval-needed` | warning |
881
+ | `ExternalSyncConflict` | `conflict-detected` | warning |
882
+ | `KrateWorkspace` (claimed) | `workspace-ready` | info |
883
+ | (default) | `system` | info |
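
The mapping table can be sketched as a lookup with a default. The event shape used here (`kind` plus a `state` string) and the name `mapEventSketch` are assumptions; the real `mapEventToNotification(event)` may inspect different fields.

```javascript
// Hypothetical mapping from a source event to { type, severity }.
// Assumption: event.kind is the resource kind and event.state is one of
// 'completed', 'failed', 'pending', 'claimed' where applicable.
function mapEventSketch(event = {}) {
  const { kind, state } = event;
  if (kind === 'AgentDispatchRun' && state === 'completed') return { type: 'run-complete', severity: 'info' };
  if (kind === 'AgentDispatchRun' && state === 'failed') return { type: 'run-complete', severity: 'error' };
  if (kind === 'AgentApproval' && state === 'pending') return { type: 'approval-needed', severity: 'warning' };
  if (kind === 'ExternalSyncConflict') return { type: 'conflict-detected', severity: 'warning' };
  if (kind === 'KrateWorkspace' && state === 'claimed') return { type: 'workspace-ready', severity: 'info' };
  // Everything else falls through to the default row.
  return { type: 'system', severity: 'info' };
}
```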
884
+
885
+ ### 12.2 Notification Delivery Flow
886
+
887
+ ```mermaid
888
+ sequenceDiagram
889
+ participant Controller as Any Controller
890
+ participant NotifCtrl as NotificationController
891
+ participant Store as In-Memory Store (Map)
892
+ participant Bus as EventBus
893
+ participant SSE as SSE Stream
894
+ participant Bell as NotificationBell
895
+
896
+ Controller->>NotifCtrl: createNotification(event)
897
+ NotifCtrl->>NotifCtrl: mapEventToNotification(event)
898
+ NotifCtrl->>Store: store.get(org).push(notification)
899
+ NotifCtrl->>Bus: emit({ type: 'notification', ... })
900
+ Bus->>SSE: Forward to all subscribers
901
+ SSE->>Bell: EventSource receives
902
+ Bell->>Bell: Increment unread count badge
903
+ ```
904
+
905
+ ### 12.3 User Preferences
906
+
907
+ Default preferences:
292
908
  ```javascript
293
- queryGraph({ records, edges, query, kinds, depth })
294
- queryGrep({ documents, query, contextLines })
295
- queryMemory({ records, documents, edges, query, mode, kinds, depth, contextLines })
909
+ { runs: true, approvals: true, conflicts: true, workspaces: true, sound: false, desktop: false }
296
910
  ```
297
911
 
298
- ### 7.3 Memory Import
912
+ ---
913
+
914
+ ## 13. Event Bus and SSE Streaming
915
+
916
+ ### 13.1 Event Bus Implementation
917
+
918
+ Source: `packages/krate/core/src/event-bus.js`
919
+
920
+ - Uses a `Set<Function>` for listeners (O(1) add/remove)
921
+ - `emit(event)` iterates all listeners synchronously
922
+ - `emitResourceChange(kind, name, operation)` adds timestamp
923
+ - Global singleton: `globalEventBus`
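
A minimal bus with these properties could look like the following sketch. The factory name `createEventBus` is hypothetical; `subscribe`, `unsubscribe`, `emit`, and `emitResourceChange` are the method names used in this document, and the event fields follow the `resource-change` frame shown in 13.2.

```javascript
// Sketch of a Set-backed synchronous event bus.
function createEventBus() {
  const listeners = new Set(); // O(1) add and remove

  function emit(event) {
    // Listeners run synchronously, in insertion order.
    for (const listener of listeners) listener(event);
  }

  return {
    subscribe(listener) { listeners.add(listener); },
    unsubscribe(listener) { listeners.delete(listener); },
    emit,
    emitResourceChange(kind, name, operation) {
      emit({ type: 'resource-change', kind, name, operation, timestamp: new Date().toISOString() });
    },
  };
}
```

Because delivery is synchronous, a listener that throws would interrupt delivery to later listeners in this sketch; whether the real bus isolates listener errors is not stated in the source.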
924
+
925
+ ### 13.2 SSE Endpoint
926
+
927
+ Route: `GET /api/orgs/:org/agents/events/stream`
928
+
929
+ Response headers:
930
+ ```
931
+ Content-Type: text/event-stream
932
+ Cache-Control: no-cache
933
+ Connection: keep-alive
934
+ X-Accel-Buffering: no
935
+ ```
299
936
 
300
- `AgentRunMemoryImport` imports curated babysitter run metadata into org memory with redaction and review controls.
937
+ Protocol:
938
+ 1. Initial: `data: {"type":"connected"}\n\n`
939
+ 2. Every 30s: `data: {"type":"heartbeat"}\n\n`
940
+ 3. On resource change: `data: {"type":"resource-change","kind":"...","name":"...","operation":"apply","timestamp":"..."}\n\n`
941
+ 4. On client disconnect: `clearInterval(heartbeat)`, `globalEventBus.unsubscribe(writer)`
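
The four protocol steps can be sketched as a frame formatter plus a small attach function. The names `formatSseFrame` and `attachStream` are hypothetical, and `response` is assumed to expose `writeHead` and `write` like a Node.js `http.ServerResponse`.

```javascript
// Serializes one event as a Server-Sent Events data frame.
function formatSseFrame(event) {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Wires a response to the bus and returns a cleanup function that the caller
// is expected to invoke when the client disconnects.
function attachStream(response, bus, heartbeatMs = 30000) {
  response.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'X-Accel-Buffering': 'no',
  });
  response.write(formatSseFrame({ type: 'connected' }));      // step 1
  const writer = (event) => response.write(formatSseFrame(event));
  bus.subscribe(writer);                                       // step 3
  const heartbeat = setInterval(() => writer({ type: 'heartbeat' }), heartbeatMs); // step 2
  return function close() {                                    // step 4
    clearInterval(heartbeat);
    bus.unsubscribe(writer);
  };
}
```

In a real handler the returned `close` function would typically be registered on the request's `close` event, so that the heartbeat timer and the listener do not leak after a disconnect.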
301
942
 
302
943
  ---
303
944
 
304
- ## 8. Authentication Model
945
+ ## 14. Async Utilities
305
946
 
306
- Source: `packages/krate/core/src/auth.js`
947
+ Source: `packages/krate/core/src/async-controller.js`
948
+
949
+ ### 14.1 Event Batcher
950
+
951
+ ```javascript
952
+ createEventBatcher(handler, { maxBatchSize: 50, flushIntervalMs: 1000 })
953
+ ```
307
954
 
308
- ### 8.1 Providers
955
+ Behavior:
956
+ - Accumulates events in array
957
+ - Flushes when `batch.length >= maxBatchSize` (fire-and-forget)
958
+ - Flushes on timer (setTimeout) when batch has items but below threshold
959
+ - `flush()` forces immediate flush (awaitable)
960
+ - `stop()` clears timer and buffer
309
961
 
310
- | Provider | Type | Configuration |
311
- |----------|------|---------------|
312
- | GitHub | OAuth 2.0 | `KRATE_AUTH_GITHUB_*` env vars |
313
- | SSO | OIDC | `KRATE_AUTH_SSO_*` env vars |
314
- | Delegated | Header-based | `KRATE_AUTH_DELEGATED_*` env vars |
962
+ ### 14.2 Retry Policy
315
963
 
316
- ### 8.2 Session Management
964
+ ```javascript
965
+ createRetryPolicy({ maxRetries: 3, baseDelayMs: 1000, maxDelayMs: 30000, jitter: true })
966
+ ```
967
+
968
+ Delay formula: `min(baseDelayMs * 2^attempt, maxDelayMs)` with optional full-jitter `[0, capped]`
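
As a worked instance of the formula, the sketch below computes the delay for a given attempt. The function name `computeDelay` and the injectable `random` parameter are assumptions added for illustration and testability; they are not part of the `createRetryPolicy` signature shown above.

```javascript
// Exponential backoff with an upper cap and optional full jitter.
// Assumption: attempt is zero-based, so attempt 0 waits about baseDelayMs.
function computeDelay(attempt, options = {}, random = Math.random) {
  const { baseDelayMs = 1000, maxDelayMs = 30000, jitter = true } = options;
  const capped = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
  // Full jitter picks a value between 0 and the capped delay; flooring makes
  // the upper bound exclusive in this sketch.
  return jitter ? Math.floor(random() * capped) : capped;
}
```

With the defaults and jitter disabled, attempts 0 through 5 give 1000, 2000, 4000, 8000, 16000, and 30000 ms, the last value being clamped by `maxDelayMs`.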
969
+
970
+ ### 14.3 Delivery Queue
971
+
972
+ ```javascript
973
+ createDeliveryQueue(processor, { concurrency: 5, retryPolicy })
974
+ ```
975
+
976
+ - In-memory ordered queue
977
+ - Up to `concurrency` items processed in parallel
978
+ - Each item retried per retryPolicy on failure
979
+ - `drain()` returns Promise that resolves when queue is empty and all active items complete
980
+ - `stop()` clears queue and resolves all drain waiters
981
+
982
+ ### 14.4 Checkpointer
983
+
984
+ ```javascript
985
+ createCheckpointer(storage = new Map())
986
+ ```
987
+
988
+ Simple key-value store: `save(key, value)`, `load(key)`, `clear(key)`, `listKeys()`
989
+
990
+ ---
991
+
992
+ ## 15. Controller Boundary Declarations
993
+
994
+ Every controller exports a frozen boundary object. This serves as both documentation and runtime introspection.
995
+
996
+ | Controller | Source File | Role | Owns | Must Not Own |
997
+ |-----------|-------------|------|------|--------------|
998
+ | KubernetesResourceClient | `kubernetes-controller.js` | kubectl execution | command exec, API discovery, access checks, watch streams | HTTP routes, pages, forge DTOs |
999
+ | KrateKubernetesReconciler | `kubernetes-controller.js` | Resource reconciliation | repo status, identity projection, hosting intent, policy sync | HTTP routes, pages, API DTOs |
1000
+ | KubernetesResourceGateway | `kubernetes-resource-gateway.js` | API port delegation | resource definitions, CRUD delegation, namespace scoping | HTTP routes, page flows, reconciliation |
1001
+ | KrateApiController | `api-controller.js` | HTTP facade | validation, DTOs, errors, workflow affordances, UI snapshots | kubectl execution, reconciliation loops |
1002
+ | AgentStackController | `agent-stack-controller.js` | Stack readiness | capability resolution, conditions, readiness, MCP health | secrets, dispatch execution, Mux sessions |
1003
+ | AgentDispatchController | `agent-dispatch-controller.js` | Dispatch orchestration | dispatch creation, attempt lifecycle, session binding, workspace | secrets, UI rendering |
1004
+ | AgentWorkspaceController | `agent-workspace-controller.js` | Workspace provisioning | workspace creation, PVC gen, git specs, mount specs, reuse, codespace | git execution, K8s API, secrets |
1005
+ | AgentTriggerController | `agent-trigger-controller.js` | Event routing | normalization, rule matching, trigger records, dispatch initiation | event sourcing, webhook delivery |
1006
+ | AgentApprovalController | `agent-approval-controller.js` | Approval gates | approval creation, decision recording, lookup, dedup | secrets, agent execution, UI |
1007
+ | AgentMemoryQuery | `agent-memory-query.js` | Query execution | graph traversal, filtering, scoring, grep, context extraction | persistence, HTTP, K8s, secrets |
1008
+ | WebhookController | `external/webhook-controller.js` | Inbound webhooks | HMAC validation, delivery records, dedup, event queue | resource persistence, ownership |
1009
+ | SyncController | `external/sync-controller.js` | External sync | normalization, upsert, watermarks, ownership, tombstones | HMAC, webhook delivery |
1010
+ | ConflictController | `external/conflict-controller.js` | Conflict detection | detection, resolution, superseded cleanup | write intent, sync scheduling |
1011
+ | WriteController | `external/write-controller.js` | Write intents | creation, approval gate, retry, idempotency | conflict resolution, sync state |
1012
+ | AuditController | `audit-controller.js` | Audit log | event recording, streaming, replay, metrics | identity, storage, git |
1013
+ | RunnerController | `runner-controller.js` | Runner pools | pool validation, lifecycle, scheduling, pod specs, capacity | K8s API calls, actual pod creation |
1014
+ | NotificationController | `notification-controller.js` | Notifications | creation, listing, read state, preferences | event dispatch, UI rendering, push |
1015
+ | PermissionReviewer | `agent-permission-review.js` | Permission review | capability expansion, grant resolution, snapshot creation | secrets, K8s API, runtime execution |
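
To illustrate what a frozen boundary object might look like, here is a sketch for the WebhookController row. The constant name, the field names (`controller`, `sourceFile`, `role`, `owns`, `mustNotOwn`), and the `ownsConcern` helper are hypothetical; the source only states that each controller exports a frozen boundary object.

```javascript
// Hypothetical boundary declaration built from the WebhookController row.
const WEBHOOK_CONTROLLER_BOUNDARY = Object.freeze({
  controller: 'WebhookController',
  sourceFile: 'external/webhook-controller.js',
  role: 'Inbound webhooks',
  // Object.freeze is shallow, so nested arrays are frozen separately.
  owns: Object.freeze(['HMAC validation', 'delivery records', 'dedup', 'event queue']),
  mustNotOwn: Object.freeze(['resource persistence', 'ownership']),
});

// Runtime introspection helper: does this boundary claim a concern?
function ownsConcern(boundary, concern) {
  return boundary.owns.includes(concern);
}
```

Freezing the nested arrays as well as the outer object means a call such as `owns.push(...)` throws a `TypeError` instead of silently widening the declared boundary.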
317
1016
 
318
- - Cookie name: `krate_session` (configurable via `KRATE_AUTH_COOKIE_NAME`)
319
- - Format: `base64url(payload).hmac_sha256_signature`
320
- - HMAC secret: `KRATE_SESSION_SECRET`
321
- - Timing-safe comparison for signature verification
322
- - `HttpOnly; SameSite=Lax` cookie attributes
1017
+ ---
1018
+
1019
+ ## 16. Concurrency Model
323
1020
 
324
- ### 8.3 Delegated Identity
1021
+ ### 16.1 Single-Threaded Event Loop
325
1022
 
326
- For environments with upstream proxy authentication:
327
- - `x-forwarded-user` header (user identity)
328
- - `x-forwarded-groups` header (group memberships)
329
- - `x-forwarded-email` header (email address)
330
- - Local development fallback with configurable defaults
1023
+ Krate core runs on Node.js's single-threaded event loop:
1024
+ - All kubectl calls use `spawnSync` (blocking) during snapshot collection
1025
+ - API request handling is async (Node HTTP server)
1026
+ - Background revalidation uses `Promise.resolve().then(...)` (microtask)
1027
+ - No worker threads or clustering in the core package
331
1028
 
332
- ### 8.4 Auth Middleware
1029
+ ### 16.2 Concurrent Access Patterns
333
1030
 
334
- All mutating API routes require authentication. The session cookie is parsed and verified on each request. Admin detection uses group membership (`krate:platform-engineers`, `krate:repo-admins`).
1031
+ | Pattern | Mechanism |
1032
+ |---------|-----------|
1033
+ | Multiple orgs cached | Per-org Map entries, independent TTLs |
1034
+ | SSE connections | Set of listener functions, one per connection |
1035
+ | Background revalidation | `revalidating` flag prevents thundering herd |
1036
+ | Event bus | Synchronous iteration over Set (no races) |
1037
+ | Audit store | Append-only array, seq counter |
1038
+ | Notification store | Per-org array, no locking needed |
335
1039
 
336
1040
  ---
337
1041
 
338
- ## 9. Deployment Architecture
1042
+ ## 17. Error Handling Strategy
1043
+
1044
+ ### 17.1 HTTP Layer
1045
+
1046
+ ```javascript
1047
+ try {
1048
+ // Route matching and handler execution
1049
+ } catch (error) {
1050
+ return send(response, 400, { error: 'bad_request', message: error.message });
1051
+ }
1052
+ ```
1053
+
1054
+ All unhandled errors in route handlers become 400 responses.
1055
+
1056
+ ### 17.2 Controller Layer
1057
+
1058
+ Controllers return error objects instead of throwing:
1059
+ ```javascript
1060
+ { error: true, reason: 'stack-not-found', message: 'AgentStack not found' }
1061
+ ```
1062
+
1063
+ ### 17.3 kubectl Layer
1064
+
1065
+ - `allowFailure: true` — returns `{ ok: false }`, caller decides
1066
+ - `allowFailure: false` — throws Error with `commandFailure()` message
1067
+
1068
+ ### 17.4 Audit Event Failures
1069
+
1070
+ ```javascript
1071
+ function emitAuditEvent(resource, operation) {
1072
+ try { ... } catch { /* Audit failures must not crash apply operations */ }
1073
+ }
1074
+ ```
1075
+
1076
+ ### 17.5 Background Revalidation Failures
1077
+
1078
+ ```javascript
1079
+ try { const fresh = await revalidateFn(); ... }
1080
+ catch { orgCacheMap.set(key, { ...current, revalidating: false }); }
1081
+ ```
1082
+
1083
+ ---
339
1084
 
340
- ### 9.1 Helm Chart
1085
+ ## 18. Deployment Architecture
341
1086
 
342
- The Krate Helm chart deploys multi-container pods:
1087
+ ### 18.1 Container Topology
343
1088
 
344
- | Container | Role |
345
- |-----------|------|
346
- | api | HTTP API server (port 3080) |
347
- | controllers | Background reconciliation controllers |
348
- | web | Next.js web console |
349
- | webhook-worker | Inbound webhook processing |
1089
+ | Container | Port | Role |
1090
+ |-----------|------|------|
1091
+ | api | 3080 | HTTP API server (`krate serve`) |
1092
+ | controllers | — | Background reconciliation (future) |
1093
+ | web | 3000 | Next.js web console |
1094
+ | webhook-worker | — | Inbound webhook processing |
350
1095
 
351
- ### 9.2 Infrastructure Requirements
1096
+ ### 18.2 CRD Management
352
1097
 
353
- - **AKS** (Azure Kubernetes Service) or compatible K8s cluster
354
- - **ACR** (Azure Container Registry) for image storage
355
- - **cert-manager** for TLS certificate provisioning
356
- - **nginx ingress** controller for HTTP routing
357
- - **PostgreSQL** for aggregated resource storage
358
- - **Gitea** for Git hosting backend
1098
+ - 76 CRDs under `krate.a5c.ai/v1alpha1`
1099
+ - All use `x-kubernetes-preserve-unknown-fields: true`
1100
+ - All namespaced
1101
+ - Platform resources (Organization, OrgNamespaceBinding) in `krate-system`
1102
+ - Org resources in `krate-org-<slug>` namespaces
359
1103
 
360
- ### 9.3 CRD Management
1104
+ ### 18.3 Infrastructure Requirements
361
1105
 
362
- 75 CRDs are defined under `krate.a5c.ai/v1alpha1`. All use:
363
- - `x-kubernetes-preserve-unknown-fields: true` for spec extensibility
364
- - Namespaced scope (platform resources in `krate-system`)
365
- - Labels for org association
1106
+ | Component | Purpose |
1107
+ |-----------|---------|
1108
+ | AKS (or compatible K8s) | Container orchestration |
1109
+ | ACR (or registry) | Image storage |
1110
+ | cert-manager | TLS provisioning |
1111
+ | nginx ingress | HTTP routing |
1112
+ | PostgreSQL | Aggregated resource storage |
1113
+ | Gitea | Git hosting backend |
1114
+ | Kyverno (optional) | Policy engine |
1115
+ | KubeVela (optional) | Application delivery |
366
1116
 
367
1117
  ---
368
1118
 
369
- ## 10. Performance Architecture
1119
+ ## 19. Data Storage Boundaries
370
1120
 
371
- ### 10.1 Caching Strategy
1121
+ | Storage Backend | Resource Count | Access Pattern |
1122
+ |----------------|---------------|----------------|
1123
+ | etcd (CRDs) | 44 CONFIG kinds | kubectl get/apply/delete |
1124
+ | PostgreSQL | 32 AGGREGATED kinds | In-memory during dev, runtime queries |
1125
+ | Gitea | Repository content | HTTP API, SSH |
1126
+ | In-memory | Notifications, audit, runners | Per-process, non-persistent |
1127
+ | Snapshot cache | Derived views | Stale-while-revalidate |
372
1128
 
373
- | Layer | Mechanism | TTL |
374
- |-------|-----------|-----|
375
- | Snapshot cache | Stale-while-revalidate | 30s (configurable) |
376
- | Per-org cache | Map-based, independent revalidation | 30s |
377
- | kubectl | Async spawn with output buffering | Per-request |
1129
+ ---
378
1130
 
379
- ### 10.2 Async Patterns
1131
+ ## 20. Configuration Reference
380
1132
 
381
- Source: `packages/krate/core/src/async-controller.js`
1133
+ ### 20.1 Core Server
382
1134
 
383
- | Utility | Purpose |
384
- |---------|---------|
385
- | `createEventBatcher` | Accumulates events, flushes by size or interval |
386
- | `createRetryPolicy` | Exponential backoff with jitter |
387
- | `createDeliveryQueue` | Ordered async delivery with error isolation |
388
- | `createCheckpointer` | Progress checkpoints for long-running operations |
1135
+ | Variable | Default | Purpose |
1136
+ |----------|---------|---------|
1137
+ | `KRATE_NAMESPACE` | `krate-system` | Platform namespace |
1138
+ | `KRATE_ORG` | `default` | Default organization |
1139
+ | `KRATE_SNAPSHOT_CACHE_TTL_MS` | `30000` | Cache freshness TTL |
1140
+ | `KRATE_GITEA_HTTP_URL` | (none) | Gitea API base URL |
389
1141
 
390
- ### 10.3 Event Bus
1142
+ ### 20.2 External Integrations
391
1143
 
392
- Source: `packages/krate/core/src/event-bus.js`
1144
+ | Variable | Default | Purpose |
1145
+ |----------|---------|---------|
1146
+ | `KRATE_KYVERNO_MODE` | auto | Kyverno integration mode |
1147
+ | `KRATE_KYVERNO_ENABLED` | (none) | Enable BYO Kyverno |
1148
+ | `KRATE_KYVERNO_NAMESPACE` | `kyverno` | Kyverno deployment namespace |
1149
+ | `KRATE_KYVERNO_POLICY_NAMESPACE` | platform ns | Policy storage namespace |
1150
+ | `KRATE_KUBEVELA_NAMESPACE` | `vela-system` | KubeVela system namespace |
393
1151
 
394
- - Global singleton (`globalEventBus`)
395
- - Pub/sub pattern: `subscribe(fn)`, `unsubscribe(fn)`, `emit(event)`
396
- - SSE streaming via `/api/orgs/:org/agents/events/stream`
397
- - 30s heartbeat for connection keepalive
1152
+ ### 20.3 Runtime Identity
1153
+
1154
+ | Variable | Default | Purpose |
1155
+ |----------|---------|---------|
1156
+ | `KRATE_SERVICE_ACCOUNT_DIR` | `/var/run/secrets/kubernetes.io/serviceaccount` | SA mount |
1157
+ | `KRATE_SERVICE_ACCOUNT_TOKEN` | `<SA_DIR>/token` | Token file path |
1158
+ | `KRATE_SERVICE_ACCOUNT_CA` | `<SA_DIR>/ca.crt` | CA cert path |
398
1159
 
399
1160
  ---
400
1161
 
401
- ## 11. Security Model
1162
+ ## 21. Resource Reconciliation Deep Dive
1163
+
1164
+ > Source: `packages/krate/core/src/kubernetes-controller.js`
1165
+
1166
+ ### 21.1 KRATE_RESOURCES Array
1167
+
1168
+ The `KRATE_RESOURCES` array (exported at module level) defines every resource the Krate control plane manages. Each entry carries the following fields:
402
1169
 
403
- ### 11.1 Session Security
1170
+ | Field | Type | Meaning |
1171
+ |-------|------|---------|
1172
+ | `kind` | string | PascalCase K8s kind (e.g. `'Organization'`) |
1173
+ | `plural` | string | Lowercase plural used in kubectl (e.g. `'organizations'`) |
1174
+ | `group` | string? | API group. Defaults to `krate.a5c.ai` when absent. KubeVela uses `core.oam.dev`, core K8s uses `''`. |
1175
+ | `namespaced` | boolean | Whether the resource lives in a namespace (`true`) or is cluster-scoped (`false`) |
1176
+ | `namespace` | string? | Fixed namespace override (e.g. `'krate-system'` for platform resources, `'vela-system'` for KubeVela defs) |
1177
+ | `storage` | string | Backend store: `'etcd'`, `'postgres'`, `'kubevela'`, `'kyverno'`, `'kyverno-reports'`, or `'core'` |
1178
+ | `platformScoped` | boolean? | When `true`, listed only from the platform namespace — not from per-org namespaces |
404
1179
 
405
- - HMAC-SHA256 signing of session cookies
406
- - `timingSafeEqual` for signature comparison (prevents timing attacks)
407
- - Signed cookies rejected when no secret configured
408
- - Unsigned cookies rejected when secret is configured
1180
+ **Resource categories and counts:**
409
1181
 
410
- ### 11.2 API Security
1182
+ | Storage | Count | Examples |
1183
+ |---------|-------|----------|
1184
+ | etcd (Krate CRDs) | 46 | Organization, User, Team, Repository, AgentStack, AgentSubagent, AgentToolProfile, AgentMcpServer, AgentSkill, AgentTriggerRule, AgentContextLabel, KrateWorkspacePolicy, AgentServiceAccount, AgentRoleBinding, AgentSecretGrant, AgentConfigGrant, AgentAdapter, AgentTransportBinding, AgentProviderConfig, KrateProject, AgentGatewayConfig, AgentMemoryRepository, AgentMemorySource, AgentMemoryOntology, AgentMemoryAssociation, ExternalBackendProvider, ExternalBackendBinding, ExternalBackendSyncPolicy |
1185
+ | postgres (aggregated) | 13 | PullRequest, Issue, Review, Pipeline, Job, WebhookDelivery, AgentDispatchRun, AgentDispatchAttempt, AgentSession, AgentContextBundle, KrateArtifact, AgentApproval, KrateWorkspace, AgentTriggerExecution, KrateWorkspaceRuntime, AgentSessionTranscript |
1186
+ | kubevela | 11 | KubeVelaApplication, KubeVelaApplicationRevision, KubeVelaComponentDefinition, KubeVelaWorkloadDefinition, KubeVelaTraitDefinition, KubeVelaScopeDefinition, KubeVelaPolicyDefinition, KubeVelaPolicy, KubeVelaWorkflowStepDefinition, KubeVelaWorkflow, KubeVelaResourceTracker |
1187
+ | kyverno / kyverno-reports | 10 | KyvernoPolicy, KyvernoClusterPolicy, KyvernoValidatingPolicy, KyvernoMutatingPolicy, KyvernoGeneratingPolicy, KyvernoDeletingPolicy, KyvernoImageValidatingPolicy, KyvernoPolicyException, PolicyReport, ClusterPolicyReport |
1188
+ | core (K8s built-in) | 2 | Secret, ConfigMap — excluded from snapshot, accessed on-demand |
411
1189
 
412
- - Auth middleware on all mutating routes
413
- - Org scoping prevents cross-tenant access
414
- - Namespace isolation in Kubernetes
1190
+ **Platform-scoped definitions** (only listed from `krate-system`):
1191
+ - `Organization` (namespace: `KRATE_PLATFORM_NAMESPACE`)
1192
+ - `OrgNamespaceBinding` (namespace: `KRATE_PLATFORM_NAMESPACE`)
415
1193
 
416
- ### 11.3 Webhook Security
1194
+ ### 21.2 getControllerSnapshot() — Step-by-Step
417
1195
 
418
- - HMAC-SHA256 verification for inbound webhooks
419
- - Timing-safe signature comparison
420
- - Deduplication by delivery ID
1196
+ `getControllerSnapshot(options)` is the synchronous entrypoint (uses `spawnSync`) that produces the full cluster state snapshot.
1197
+
1198
+ #### Step 1: currentContextResult()
1199
+
1200
+ ```javascript
1201
+ function currentContextResult(options) {
1202
+ const inCluster = inClusterKubectlConfig(options.env);
1203
+ if (inCluster) return { ok: true, stdout: `${inCluster.context}\n`, ... };
1204
+ return runKubectl(['config', 'current-context'], { ...options, allowFailure: true });
1205
+ }
1206
+ ```
1207
+
1208
+ Checks for in-cluster mode first (via `KUBERNETES_SERVICE_HOST` plus the service account files at `/var/run/secrets/kubernetes.io/serviceaccount/`). If in-cluster mode is detected, it returns a synthetic result with context `'in-cluster'`; otherwise it runs `kubectl config current-context`.
1209
+
1210
+ **Failure mode:** Returns `{ ok: false }` — snapshot proceeds to build a degraded response with `kubectl.available: false`.
1211
+
1212
+ #### Step 2: versionResult
1213
+
1214
+ ```javascript
1215
+ runKubectl(['version', '--client=true', '-o', 'json'], { allowFailure: true })
1216
+ ```
1217
+
1218
+ Extracts `clientVersion.gitVersion` from JSON output. If both context and version fail, the snapshot is returned early with empty resource maps and `kubectl.available: false`.
1219
+
1220
+ #### Step 3: CRD Discovery Loop
1221
+
1222
+ ```javascript
1223
+ const crdResult = runKubectl(['get', 'crd', '-o', 'json'], { allowFailure: true });
1224
+ const discoveredCrds = crdResult.ok
1225
+ ? parseKubernetesList(crdResult.stdout).items.filter((crd) =>
1226
+ [KRATE_API_GROUP, KUBEVELA_API_GROUP].includes(crd.spec?.group) ||
1227
+ KYVERNO_DISCOVERY_GROUPS.has(crd.spec?.group))
1228
+ : [];
1229
+ const discoveredPluralSet = new Set(
1230
+ discoveredCrds.map((crd) => `${crd.spec?.group}/${crd.spec?.names?.plural}`)
1231
+ );
1232
+ ```
421
1233
 
422
- ### 11.4 Kubernetes RBAC
1234
+ Queries all CRDs in the cluster, then keeps only those belonging to `krate.a5c.ai`, `core.oam.dev`, `kyverno.io`, `policies.kyverno.io`, or `wgpolicyk8s.io`. The resulting `discoveredPluralSet` decides which resources are actually queried, which avoids 404s for uninstalled CRDs.
423
1235
 
424
- - ClusterRole for Krate CRD access
425
- - ServiceAccount binding per agent (`AgentServiceAccount`)
426
- - `AgentRoleBinding` for managed RBAC projection
427
- - `AgentSecretGrant` / `AgentConfigGrant` for explicit secret access
1236
+ #### Step 4: platformScopedDefinitions vs orgScopedDefinitions
428
1237
 
429
- ### 11.5 CRD Extensibility
1238
+ ```javascript
1239
+ const platformScopedDefinitions = snapshotResources.filter((d) => d.platformScoped);
1240
+ const orgScopedDefinitions = snapshotResources.filter((d) => !d.platformScoped);
1241
+ ```
1242
+
1243
+ Platform-scoped resources (Organization, OrgNamespaceBinding) are listed first from their fixed namespace. This is required because org namespaces are derived from the Organization/OrgNamespaceBinding resources.
1244
+
1245
+ #### Step 5: organizationNamespaces() — Fallback Chain
1246
+
1247
+ ```javascript
1248
+ function organizationNamespaces(organizations, bindings, fallbackNamespace) {
1249
+ // 1. Extract spec.namespaceName from Organization items
1250
+ // 2. Extract spec.namespace from OrgNamespaceBinding items
1251
+ // 3. Deduplicate into a Set
1252
+ // If non-empty → return those namespaces
1253
+ // 4. Fallback: KRATE_ADMIN_ORG → orgNamespaceName(adminOrg)
1254
+ // 5. Fallback: KRATE_ORG || 'default' → orgNamespaceName(defaultOrg)
1255
+ // 6. Final fallback: platformNamespace itself
1256
+ }
1257
+ ```
1258
+
1259
+ The `orgNamespaceName(org)` function generates `krate-org-${slug}` from the org slug.
1260
+
1261
+ #### Step 6: Parallel Org-Scoped Resource Listing
1262
+
1263
+ For each org-scoped definition:
1264
+ 1. Skip if `shouldListSnapshotDefinition()` returns false (CRD not discovered and not krate.a5c.ai group)
1265
+ 2. Compute target namespaces: fixed namespace → use it alone; otherwise → all org namespaces
1266
+ 3. For each namespace, run `kubectl get <plural>.<group> -n <ns> -o json --ignore-not-found`
1267
+ 4. Flatten all items into `resources[definition.kind]`
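The listing steps above can be sketched as a small planning function. This is only an illustration under stated assumptions: `shouldList` is a stand-in for `shouldListSnapshotDefinition()` based solely on the rule quoted in item 1, `planListCommands` is a hypothetical helper, and the sample definitions and namespaces are made up.

```javascript
const KRATE_API_GROUP = 'krate.a5c.ai';

// Stand-in for the skip rule in item 1: Krate's own group is always listed,
// other groups only when their CRD was discovered.
function shouldList(definition, discoveredPluralSet) {
  const group = definition.group ?? KRATE_API_GROUP;
  return group === KRATE_API_GROUP || discoveredPluralSet.has(`${group}/${definition.plural}`);
}

// Items 2 and 3: pick the target namespaces, then build one kubectl argv per namespace.
// Core-group resources (group '') are excluded from the snapshot, so they are not handled here.
function planListCommands(definition, orgNamespaces) {
  const group = definition.group ?? KRATE_API_GROUP;
  const namespaces = definition.namespace ? [definition.namespace] : orgNamespaces;
  return namespaces.map((ns) => [
    'get', `${definition.plural}.${group}`, '-n', ns, '-o', 'json', '--ignore-not-found',
  ]);
}

// Hypothetical inputs for illustration only.
const discovered = new Set(['core.oam.dev/applications']);
const definitions = [
  { kind: 'Repository', plural: 'repositories' },
  { kind: 'KubeVelaApplication', plural: 'applications', group: 'core.oam.dev' },
  { kind: 'KyvernoPolicy', plural: 'policies', group: 'kyverno.io' },
];
const orgNamespaces = ['krate-org-acme', 'krate-org-beta'];

const plan = definitions
  .filter((d) => shouldList(d, discovered))
  .flatMap((d) => planListCommands(d, orgNamespaces));
console.log(plan.map((argv) => argv.join(' ')));
```

In this sketch the Kyverno definition is skipped because its CRD is not in the discovered set, while each remaining definition fans out to one command per org namespace.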
1268
+
1269
+ #### Step 7: Event Collection
1270
+
1271
+ ```javascript
1272
+ runKubectl(['get', 'events', '-n', namespace, '-o', 'json', '--ignore-not-found'], { allowFailure: true })
1273
+ ```
1274
+
1275
+ Collects Kubernetes events from the platform namespace only (not org namespaces).
1276
+
1277
+ #### Step 8: Permission Matrix (canI Checks)
1278
+
1279
+ ```javascript
1280
+ const permissions = await Promise.all(
1281
+ snapshotResources
1282
+ .filter((d) => discoveredPluralSet.has(`${d.group || KRATE_API_GROUP}/${d.plural}`))
1283
+ .map(async (d) => ({
1284
+ kind: d.kind,
1285
+ plural: d.plural,
1286
+ verbs: Object.fromEntries(
1287
+ ['get', 'list', 'watch', 'create', 'update', 'patch', 'delete']
1288
+ .map((verb) => [verb, canI(verb, d, { kubectl, namespace, timeoutMs, env })])
1289
+ )
1290
+ }))
1291
+ );
1292
+ ```
1293
+
1294
+ For every discovered CRD, runs `kubectl auth can-i <verb> <plural>.<group> -n <ns>` for all 7 standard verbs. Result is `true`/`false` per verb.
1295
+
1296
+ #### Step 9: Kyverno Discovery
1297
+
1298
+ `discoverKyverno()` is called with the `discoveredPluralSet`. It:
1299
+ 1. Filters `KYVERNO_RESOURCES` to only those with discovered CRDs
1300
+ 2. Lists each Kyverno resource from the Kyverno policy namespace
1301
+ 3. Lists Kyverno controller deployments from the kyverno namespace
1302
+ 4. Runs `canI` checks for Kyverno resources
1303
+ 5. Extracts policy reports and violations
1304
+ 6. Reports degraded conditions if CRDs exist but controllers are missing
1305
+
1306
+ #### Step 10: Return Shape
1307
+
1308
+ ```typescript
1309
+ {
1310
+ source: 'kubernetes',
1311
+ mode: 'kubernetes-api',
1312
+ namespace: string, // Platform namespace
1313
+ generatedAt: string, // ISO timestamp
1314
+ correlationId: string, // UUID for request correlation
1315
+ kubectl: {
1316
+ binary: string, // kubectl path
1317
+ context: string | null, // Current context name
1318
+ clientVersion: string | null, // e.g. 'v1.28.3'
1319
+ available: boolean, // true if both context + version succeeded
1320
+ errors: string[] // Command failure messages
1321
+ },
1322
+ apiService: object | null, // Raw APIService JSON for krate API
1323
+ crds: object[], // Discovered Krate/KubeVela/Kyverno CRDs
1324
+ resources: Record<string, object[]>, // Map: kind → items array
1325
+ kyverno: object, // Full Kyverno discovery state
1326
+ events: object[], // K8s events from platform namespace
1327
+ permissions: object[], // Per-kind verb permission map
1328
+ storage: object, // Storage boundary descriptions
1329
+ commands: object[] // Generated kubectl commands per kind
1330
+ }
1331
+ ```
1332
+
1333
+ ---
1334
+
1335
+ ## 22. Controller-UI Model Construction
1336
+
1337
+ > Source: `packages/krate/core/src/controller-ui.js`
1338
+
1339
+ ### 22.1 createControllerUiModel(source, options)
1340
+
1341
+ Transforms a raw Kubernetes snapshot into a UI-ready model consumed by the web console.
1342
+
1343
+ **Parameters:**
1344
+ - `source` — Raw snapshot object (from `getControllerSnapshot()`) or a controller with `.snapshot()` method
1345
+ - `options.organization` / `options.org` — Requested org slug
1346
+
1347
+ **Pipeline:**
1348
+
1349
+ ```
1350
+ normalizeSnapshot(source)
1351
+ → ensureOrganizations(snapshot.resources.Organization)
1352
+ → resolve activeOrg (by slug match or first)
1353
+ → filterResourceItemsForOrg() per kind
1354
+ → assemble domain views (agent, delivery, policy, identity)
1355
+ → compute metrics
1356
+ → format events
1357
+ → build validation checks
1358
+ → return full model
1359
+ ```
1360
+
1361
+ ### 22.2 Organization Resolution
1362
+
1363
+ ```javascript
1364
+ function ensureOrganizations(organizations, platformNamespace) {
1365
+ if (organizations.length) return organizations.map((org) => ({
1366
+ name: org.metadata?.name,
1367
+ slug: org.spec?.slug || org.metadata?.name,
1368
+ displayName: org.spec?.displayName || org.spec?.slug || org.metadata?.name,
1369
+ namespace: org.spec?.namespaceName || orgNamespaceName(org.spec?.slug || org.metadata?.name),
1370
+ platformNamespace
1371
+ }));
1372
+ // Fallback: synthesize a 'default' org
1373
+ return [{ name: 'default', slug: 'default', displayName: 'Default org',
1374
+ namespace: 'krate-org-default', platformNamespace }];
1375
+ }
1376
+ ```
1377
+
1378
+ Active org is selected by matching `requestedOrg` against slug or name, falling back to `organizations[0]`.
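The selection rule in the sentence above can be sketched as follows. `resolveActiveOrg` is a hypothetical name used for illustration; the source only states the matching rule, not the helper that implements it.

```javascript
// Hypothetical helper: match the requested org against slug or name,
// otherwise fall back to the first organization in the list.
function resolveActiveOrg(organizations, requestedOrg) {
  return (
    organizations.find((org) => org.slug === requestedOrg || org.name === requestedOrg) ||
    organizations[0]
  );
}

// Made-up organizations shaped like the output of ensureOrganizations().
const orgs = [
  { name: 'acme-corp', slug: 'acme', namespace: 'krate-org-acme' },
  { name: 'beta', slug: 'beta', namespace: 'krate-org-beta' },
];

console.log(resolveActiveOrg(orgs, 'beta').namespace);
console.log(resolveActiveOrg(orgs, 'unknown').slug);
```

Because `ensureOrganizations()` always returns at least the synthesized `default` org, the `organizations[0]` fallback is never `undefined` in that flow.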
1379
+
1380
+ ### 22.3 Resource Filtering by Org
1381
+
1382
+ ```javascript
1383
+ function filterResourceItemsForOrg(definition, items, org) {
1384
+ if (definition.kind === 'Organization') return items.filter((item) => item.spec?.slug === org); // slug match
1385
+ if (definition.kind === 'OrgNamespaceBinding') return filterByOrg(items, org); // label/ref match
1386
+ if (definition.namespace && definition.namespace !== orgNamespaceName(org)) return items; // system-level: return all
1387
+ return filterByOrg(items, org); // default: label/ref match
1388
+ }
1389
+
1390
+ function filterByOrg(items, org) {
1391
+ const orgNamespace = orgNamespaceName(org);
1392
+ return items.filter((item) =>
1393
+ item.spec?.organizationRef === org ||
1394
+ item.metadata?.labels?.['krate.a5c.ai/org'] === org ||
1395
+ item.metadata?.namespace === orgNamespace
1396
+ );
1397
+ }
1398
+ ```
1399
+
1400
+ ### 22.4 Agent View Assembly
1401
+
1402
+ The `agentView` object is constructed from 14+ filtered resource arrays:
1403
+
1404
+ ```javascript
1405
+ const agentView = {
1406
+ org: activeOrg?.slug,
1407
+ stacks: { count, items: AgentStack[] },
1408
+ runs: { count, items: AgentDispatchRun[], active: [...non-terminal] },
1409
+ rules: { count, items: AgentTriggerRule[] },
1410
+ sessions: { count, items: AgentSession[] },
1411
+ workspaces: { count, items: KrateWorkspace[] },
1412
+ approvals: { count, items: AgentApproval[], pending: [...phase=Pending] },
1413
+ adapters: { count, items: AgentAdapter[] },
1414
+ providers: { count, items: AgentProviderConfig[] },
1415
+ projects: { count, items: KrateProject[] },
1416
+ gateway: AgentGatewayConfig | null,
1417
+ transcripts: { count, items: AgentSessionTranscript[] },
1418
+ memoryRepositories: { count, items: AgentMemoryRepository[] },
1419
+ memorySnapshots: { count, items: AgentMemorySnapshot[] },
1420
+ memoryImports: { count, items: AgentRunMemoryImport[], pending: [...] }
1421
+ };
1422
+ ```
1423
+
1424
+ ### 22.5 Delivery View (KubeVela)
1425
+
1426
+ `createDeliveryView()` assembles:
1427
+ - `installed` — boolean (any KubeVela definitions present)
1428
+ - `counts` — applications, releases, components, workloads, traits, scopes, policies, automations, managedResources
1429
+ - `capabilityCatalog` — names of installed component/trait/scope/policy/workflow-step definitions
1430
+ - `applications[]` — enriched with services, workflow status, releases, managed resources, YAML
1431
+ - `runtime` — releases, automations, policies, managedResources summaries
1432
+
1433
+ ### 22.6 Policy Engine View
1434
+
1435
+ `createPolicyEngineView()` produces:
1436
+ - `engine: 'kyverno'`, `mode`, `health` (disabled/ready/degraded)
1437
+ - `profiles`, `templates`, `bindings`, `exceptionRequests` — summarized via `policySummary()`
1438
+ - `kyvernoResources` — count per Kyverno kind
1439
+ - `controllers[]` — deployment health (name, ready, replicas)
1440
+ - `reports` — policyReports count, clusterPolicyReports count, results array
1441
+ - `violations[]` — filtered results with fail/error/warn status
1442
+
1443
+ ### 22.7 Identity View
1444
+
1445
+ `createIdentityView()` produces a fully-expanded view of:
1446
+ - `counts` — users, teams, pendingInvites, mappings, repositoryGrants, sshKeys
1447
+ - `providers[]` — name, label, type, enabled, phase
1448
+ - `users[]` — with email, teams, admin flag, disabled state
1449
+ - `teams[]` — with members, maintainers, repositoryGrants
1450
+ - `invites[]` — with email, role, teams, phase, expiresAt
1451
+ - `mappings[]` — with provider, subject, workspace/repository identity
1452
+ - `permissions[]` — with repository, subject, permission level, revoked state
1453
+ - `sshKeys[]` — with owner, scope, repository, revoked
1454
+ - `reconciliation` — counts, phases, statuses, nextActions (human-readable intents)
1455
+
1456
+ ### 22.8 Metrics
1457
+
1458
+ ```javascript
1459
+ metrics: {
1460
+ components, resources, events, auditEntries,
1461
+ users, teams, invites, repositories, pullRequests, issues, projects,
1462
+ pipelines, jobs, runnerPools, webhookDeliveries,
1463
+ policyViolations, policyBindings, deployments, releases,
1464
+ agentStacks, agentRuns, agentSessions,
1465
+ greenChecks, totalChecks
1466
+ }
1467
+ ```
1468
+
1469
+ ### 22.9 Events Formatting
1470
+
1471
+ The last 8 events are formatted as:
1472
+ ```javascript
1473
+ { type, storage: 'kubernetes', resource: 'Kind/namespace/name', actor, allowed: true, message }
1474
+ ```
1475
+
1476
+ ---
1477
+
1478
+ ## 23. HTTP Server Route Handlers
1479
+
1480
+ > Source: `packages/krate/core/src/http-server.js`
1481
+
1482
+ ### 23.1 Server Factory
1483
+
1484
+ ```javascript
1485
+ export function createKrateHttpServer(options) {
1486
+ return createServer(createKrateHttpHandler(options));
1487
+ }
1488
+
1489
+ export function createKrateHttpHandler({ runtime, controller }) {
1490
+ return async function handleKrateRequest(request, response) { ... };
1491
+ }
1492
+ ```
1493
+
1494
+ All routes use regex pattern matching against `url.pathname`. JSON responses are sent via `send(response, status, body)` with `content-type: application/json; charset=utf-8`.
1495
+
1496
+ ### 23.2 Route Table
1497
+
1498
+ | Method | Pattern | Handler |
1499
+ |--------|---------|---------|
1500
+ | GET | `/healthz` | Returns `{ ok: true, project: 'Krate' }` |
1501
+ | GET | `/api/controller?org=:org` | Full UI model via `createControllerUiModel(snapshot, { organization })` |
1502
+ | GET/POST | `/api/orgs` | List orgs / create organization |
1503
+ | GET/POST | `/api/orgs/:org/resources` | List resources by kind (query `?kind=`) / apply resource |
1504
+ | GET/DELETE | `/api/orgs/:org/resources/:kind/:name` | Get or delete specific resource |
1505
+ | GET/POST | `/api/orgs/:org/repositories` | List / create repositories |
1506
+ | GET/DELETE | `/api/orgs/:org/repositories/:name` | Get / delete specific repository |
1507
+ | GET/POST | `/api/orgs/:org/snapshot` | Get runtime snapshot / import snapshot |
1508
+ | GET | `/api/orgs/:org/runtime-resources/:kind` | List runtime resources by kind |
1509
+ | POST | `/api/orgs/:org/repositories/:name/objects` | Record git object |
1510
+ | POST | `/api/orgs/:org/repositories/:name/search-index` | Enqueue search index |
1511
+ | POST | `/api/orgs/:org/pullrequests` | Create pull request |
1512
+ | POST | `/api/orgs/:org/pullrequests/:name/reviews` | Add review |
1513
+ | POST | `/api/orgs/:org/pullrequests/:name/checks/complete` | Complete pipeline check |
1514
+ | POST | `/api/orgs/:org/pullrequests/:name/merge` | Merge pull request |
1515
+ | POST | `/api/orgs/:org/agents/approvals/:name/decide` | Approve/deny agent approval |
1516
+ | POST | `/api/orgs/:org/agents/webhooks/ingest` | Webhook ingestion (GitHub/Gitea normalization) |
1517
+ | POST | `/api/orgs/:org/agents/events/pipeline-failure` | Pipeline failure event |
1518
+ | POST | `/api/orgs/:org/agents/events/comment` | Comment event |
1519
+ | POST | `/api/orgs/:org/agents/events/label` | Label event |
1520
+ | POST | `/api/orgs/:org/agents/triggers/process` | Evaluate event against trigger rules |
1521
+ | POST | `/api/orgs/:org/agents/memory/query` | Memory graph+grep search |
1522
+ | GET/POST | `/api/orgs/:org/secrets` | List / create secrets (AgentSecretGrant) |
1523
+ | DELETE | `/api/orgs/:org/secrets/:name` | Delete secret grant |
1524
+ | GET/POST | `/api/orgs/:org/secret-grants` | List / create secret grants |
1525
+ | POST | `/api/orgs/:org/external/sync` | Trigger external sync for binding |
1526
+ | POST | `/api/orgs/:org/external/conflicts/:name/resolve` | Resolve external sync conflict |
1527
+ | POST | `/api/orgs/:org/external/write-intents/:name/approve` | Approve write intent |
1528
+ | POST | `/api/orgs/:org/external/write-intents/:name/cancel` | Cancel write intent |
1529
+ | GET | `/api/orgs/:org/agents/events/stream` | **SSE endpoint** |
1530
+
1531
+ ### 23.3 SSE Endpoint Implementation
1532
+
1533
+ ```javascript
1534
+ const sseMatch = url.pathname.match(/^\/api\/orgs\/([^/]+)\/agents\/events\/stream$/);
1535
+ if (request.method === 'GET' && sseMatch) {
1536
+ response.writeHead(200, {
1537
+ 'Content-Type': 'text/event-stream',
1538
+ 'Cache-Control': 'no-cache',
1539
+ 'Connection': 'keep-alive',
1540
+ 'X-Accel-Buffering': 'no', // Disable nginx buffering
1541
+ });
1542
+ response.write('data: {"type":"connected"}\n\n');
1543
+
1544
+ const writer = (event) => {
1545
+ response.write(`data: ${JSON.stringify(event)}\n\n`);
1546
+ };
1547
+ globalEventBus.subscribe(writer);
1548
+
1549
+ const interval = setInterval(() => {
1550
+ response.write('data: {"type":"heartbeat"}\n\n');
1551
+ }, 30000); // 30-second heartbeat
1552
+
1553
+ request.on('close', () => {
1554
+ clearInterval(interval);
1555
+ globalEventBus.unsubscribe(writer);
1556
+ });
1557
+ }
1558
+ ```
1559
+
1560
+ **Key behaviors:**
1561
+ - Sends `{"type":"connected"}` immediately on connection
1562
+ - Subscribes a writer function to `globalEventBus`
1563
+ - Sends heartbeat every 30 seconds to keep connection alive through proxies
1564
+ - On client disconnect: clears interval and unsubscribes from event bus
1565
+ - No CORS headers (handled by proxy or web framework)
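On the consuming side, the stream is a sequence of `data: <json>` frames separated by blank lines. The following is a minimal client-side sketch of splitting such a stream into events. `parseSseChunk` is a hypothetical helper, not part of Krate, and it assumes every frame carries a single `data:` line, as in the handler above.

```javascript
// Hypothetical helper: extracts complete SSE frames from a text buffer and
// returns the parsed payloads plus whatever partial frame is left over.
function parseSseChunk(buffer) {
  const events = [];
  let rest = buffer;
  let boundary;
  while ((boundary = rest.indexOf('\n\n')) !== -1) {
    const frame = rest.slice(0, boundary);
    rest = rest.slice(boundary + 2);
    for (const line of frame.split('\n')) {
      if (line.startsWith('data: ')) events.push(JSON.parse(line.slice(6)));
    }
  }
  return { events, rest };
}

// Simulated network chunk: two complete frames and one partial frame.
const chunk =
  'data: {"type":"connected"}\n\n' +
  'data: {"type":"heartbeat"}\n\n' +
  'data: {"type":"run-st';
const { events, rest } = parseSseChunk(chunk);

// Consumers would typically ignore the control frames.
const payloadEvents = events.filter((e) => e.type !== 'connected' && e.type !== 'heartbeat');
console.log(events.length, payloadEvents.length, rest);
```

The leftover `rest` string should be prepended to the next chunk so that frames split across network reads are parsed once they complete.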
1566
+
1567
+ ### 23.4 Webhook Event Normalization
1568
+
1569
+ `normalizeWebhookEvent(body, org)` maps raw GitHub/Gitea payloads:
1570
+
1571
+ | Condition | Normalized Type |
1572
+ |-----------|-----------------|
1573
+ | `action='completed'` + `workflow_run.conclusion='failure'` | `ci-failure` |
1574
+ | `action='opened'` + `pull_request` present | `pr-opened` |
1575
+ | `action='created'` + `comment` present | `comment` |
1576
+ | `action='labeled'` | `label-added` |
1577
+ | `action='opened'` + `issue` (no PR) | `issue-created` |
1578
+ | `ref` + `commits` present | `push` |
1579
+ | fallback | `webhook` |
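The table above can be read as an ordered series of checks. The sketch below assumes the rows are evaluated top to bottom (which is what makes the "no PR" qualifier on `issue-created` hold) and covers only the type classification. `classifyWebhook` is a hypothetical name, and the real `normalizeWebhookEvent(body, org)` presumably returns a richer object.

```javascript
// Hypothetical classifier mirroring the table rows in order.
function classifyWebhook(body) {
  if (body.action === 'completed' && body.workflow_run?.conclusion === 'failure') return 'ci-failure';
  if (body.action === 'opened' && body.pull_request) return 'pr-opened';
  if (body.action === 'created' && body.comment) return 'comment';
  if (body.action === 'labeled') return 'label-added';
  // Reached only when no pull_request was present on an 'opened' action.
  if (body.action === 'opened' && body.issue) return 'issue-created';
  if (body.ref && body.commits) return 'push';
  return 'webhook';
}

// Made-up payload fragments for illustration.
const samples = [
  { action: 'completed', workflow_run: { conclusion: 'failure' } },
  { action: 'opened', pull_request: { number: 7 } },
  { action: 'opened', issue: { number: 12 } },
  { ref: 'refs/heads/main', commits: [] },
  { action: 'deleted' },
];
console.log(samples.map(classifyWebhook));
```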
1580
+
1581
+ ### 23.5 Error Handling
1582
+
1583
+ All routes are wrapped in a try/catch. Unhandled errors return:
1584
+ ```json
1585
+ { "error": "bad_request", "message": "<error.message>" }
1586
+ ```
1587
+ with status 400. Unmatched routes return 404:
1588
+ ```json
1589
+ { "error": "not_found", "method": "GET", "path": "/unknown" }
1590
+ ```
1591
+
1592
+ ---
1593
+
1594
+ ## 24. Async Snapshot Architecture
1595
+
1596
+ > Source: `packages/krate/core/src/kubernetes-controller-async.js`
1597
+
1598
+ ### 24.1 runKubectlAsync(args, options)
1599
+
1600
+ Promise-based kubectl wrapper using `child_process.spawn`. Returns the same shape as the sync `runKubectl`:
1601
+
1602
+ ```typescript
1603
+ {
1604
+ ok: boolean,
1605
+ status: number | null,
1606
+ signal: string | null,
1607
+ stdout: string,
1608
+ stderr: string,
1609
+ error: string | null,
1610
+ command: string // Reconstructed command string for diagnostics
1611
+ }
1612
+ ```
1613
+
1614
+ **Timeout handling:**
1615
+ - Timer fires after `timeoutMs` (default 3000 ms; overridable via `KRATE_KUBECTL_TIMEOUT_MS`)
1616
+ - Sends SIGTERM to the child process
1617
+ - If `allowFailure: true` → resolves with `{ ok: false, error: 'kubectl timed out...' }`
1618
+ - If `allowFailure: false` → rejects with Error
1619
+
1620
+ **stdin:** If `options.input` is provided, writes to child stdin then closes it.
1621
+
1622
+ ### 24.2 getControllerSnapshotAsync(options) — Parallel Execution Strategy
1623
+
1624
+ Three-phase parallel execution:
1625
+
1626
+ **Phase 1:** Context + version in parallel
1627
+ ```javascript
1628
+ const [contextResult, versionResult] = await Promise.all([
1629
+ inClusterContext || runKubectlAsync(['config', 'current-context'], ...),
1630
+ runKubectlAsync(['version', '--client=true', '-o', 'json'], ...)
1631
+ ]);
1632
+ ```
1633
+
1634
+ **Phase 2:** API service + CRD discovery in parallel
1635
+ ```javascript
1636
+ const [apiServiceResult, crdResult] = await Promise.all([
1637
+ runKubectlAsync(['get', 'apiservice', KRATE_API_VERSIONED_GROUP, ...]),
1638
+ runKubectlAsync(['get', 'crd', '-o', 'json'], ...)
1639
+ ]);
1640
+ ```
1641
+
1642
+ **Phase 3:** All resource kinds in parallel
1643
+ ```javascript
1644
+ // First: platform-scoped (to discover org namespaces)
1645
+ const platformResults = await Promise.all(
1646
+ platformScopedDefs.map((definition) => runKubectlAsync([...]))
1647
+ );
1648
+ // Derive org namespaces from results
1649
+ const orgNamespaces = resolveOrgNamespaces(resources.Organization, ...);
1650
+
1651
+ // Then: all org-scoped resources in parallel (each definition across all namespaces)
1652
+ const orgResults = await Promise.all(
1653
+ orgScopedDefs.map(async (definition) => {
1654
+ const itemArrays = await Promise.all(
1655
+ namespaces.map((ns) => runKubectlAsync([...]))
1656
+ );
1657
+ return { definition, items: itemArrays.flat() };
1658
+ })
1659
+ );
1660
+ ```
1661
+
1662
+ **Error fallback:** On any unexpected error, the function dynamically imports the synchronous `getControllerSnapshot()` and falls back to it:
1663
+ ```javascript
1664
+ catch (error) {
1665
+ const { getControllerSnapshot } = await import('./kubernetes-controller.js');
1666
+ return getControllerSnapshot(options);
1667
+ }
1668
+ ```
1669
+
1670
+ ### 24.3 getPartialSnapshot(kinds, options)
1671
+
1672
+ Fetches only the requested resource kinds. Used by pages that need a subset (e.g. only `AgentStack` + `AgentSession`).
1673
+
1674
+ ```javascript
1675
+ export async function getPartialSnapshot(kinds = [], options = {}) {
1676
+ // 1. Resolve each kind string to a KRATE_RESOURCES definition (skip unknown)
1677
+ // 2. If any org-scoped kind is needed, pre-fetch Organization + OrgNamespaceBinding
1678
+ // to compute orgNamespaces
1679
+ // 3. Fetch all requested definitions in parallel (across all applicable namespaces)
1680
+ // 4. Return { source: 'kubernetes', mode: 'partial', namespace, generatedAt, resources }
1681
+ }
1682
+ ```
1683
+
1684
+ Return shape is minimal — no kubectl metadata, no permissions, no events. Just `resources: Record<kind, items[]>`.
1685
+
1686
+ ### 24.4 watchResourceChanges(callback, options)
1687
+
1688
+ Lightweight watch that invalidates the snapshot cache on any change.
1689
+
1690
+ ```javascript
1691
+ export function watchResourceChanges(callback, options = {}) {
1692
+ const watchKinds = options.kinds || ['Organization', 'AgentStack', 'AgentSession'];
1693
+ const children = []; // Array of spawned child processes
1694
+
1695
+ for (const kind of watchKinds) {
1696
+ const child = spawn(kubectl, [...args, '--watch', '-o', 'json'], ...);
1697
+ let buffer = '';
1698
+ child.stdout.on('data', (chunk) => {
1699
+ buffer += chunk.toString();
1700
+ // Parse newline-delimited JSON objects
1701
+ while ((newlineIdx = buffer.indexOf('\n')) !== -1) {
1702
+ const item = safeJson(line);
1703
+ if (item) {
1704
+ clearSnapshotCache(); // Invalidate ALL cached snapshots
1705
+ callback(kind, item); // User callback (errors swallowed)
1706
+ }
1707
+ }
1708
+ });
1709
+ children.push(child);
1710
+ }
1711
+
1712
+ return { stop() { children.forEach(c => c.kill('SIGTERM')); } };
1713
+ }
1714
+ ```
1715
+
1716
+ **Key behaviors:**
1717
+ - Default watched kinds: Organization, AgentStack, AgentSession
1718
+ - Uses `--watch -o json` for streaming JSON from kubectl
1719
+ - Parses newline-delimited JSON (not JSON array)
1720
+ - On any valid object: calls `clearSnapshotCache()` (see §25)
1721
+ - Returns `{ stop }` cleanup handle for graceful shutdown
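The buffering step in the watch loop can be isolated as a small line parser. This sketch assumes, as the text above describes, that each watch event arrives as one JSON object per line. `createJsonLineParser` is a hypothetical helper, and the inline `try`/`catch` stands in for `safeJson`.

```javascript
// Hypothetical helper: accumulates stream chunks and emits one parsed object
// per complete line, ignoring blank or unparsable lines.
function createJsonLineParser(onItem) {
  let buffer = '';
  return function push(chunk) {
    buffer += chunk;
    let newlineIdx;
    while ((newlineIdx = buffer.indexOf('\n')) !== -1) {
      const line = buffer.slice(0, newlineIdx).trim();
      buffer = buffer.slice(newlineIdx + 1);
      if (!line) continue;
      let item = null;
      try {
        item = JSON.parse(line);
      } catch {
        item = null; // Same effect as a safeJson() that returns null on failure
      }
      if (item) onItem(item);
    }
  };
}

// Simulated stdout chunks: an object split across two reads, plus a bad line.
const received = [];
const push = createJsonLineParser((item) => received.push(item));
push('{"kind":"AgentStack"}\n{"kind":"Agen');
push('tSession"}\nnot json\n');
console.log(received.map((item) => item.kind));
```

Keeping the partial tail in `buffer` between calls is what allows an object split across two `data` events to be parsed once its newline arrives.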
1722
+
1723
+ ### 24.5 Differences from Sync Version
1724
+
1725
+ | Aspect | Sync (`getControllerSnapshot`) | Async (`getControllerSnapshotAsync`) |
1726
+ |--------|------|------|
1727
+ | Process execution | `spawnSync` | `spawn` + Promise |
1728
+ | Resource listing | Sequential loop | `Promise.all` parallel |
1729
+ | Permission checks | Inline `canI` per resource | Skipped (returns `[]`) |
1730
+ | Kyverno discovery | Full `discoverKyverno()` | Returns `emptyKyverno()` stub |
1731
+ | Error recovery | Throws or returns degraded | Falls back to sync version |
1732
+ | Event collection | Included | Included (async) |
1733
+
1734
+ ---
1735
+
1736
+ ## 25. Snapshot Cache Architecture
1737
+
1738
+ > Source: `packages/krate/core/src/snapshot-cache.js`
1739
+
1740
+ ### 25.1 Data Structures
1741
+
1742
+ ```javascript
1743
+ // Per-org cache: Map<string, CacheEntry>
1744
+ const orgCacheMap = new Map();
1745
+
1746
+ // CacheEntry shape:
1747
+ { data: object, timestamp: number, revalidating: boolean }
1748
+
1749
+ // Legacy single-org cache (backward compatibility with controller-client.js):
1750
+ let snapshotCache = { data: null, timestamp: 0, org: null };
1751
+ ```
1752
+
1753
+ ### 25.2 Constants
1754
+
1755
+ ```javascript
1756
+ export const CACHE_TTL_MS = Number(process.env.KRATE_SNAPSHOT_CACHE_TTL_MS || 30_000);
1757
+ ```
1758
+
1759
+ Default: 30 seconds. Configurable via the `KRATE_SNAPSHOT_CACHE_TTL_MS` environment variable.
1760
+
1761
+ ### 25.3 staleWhileRevalidate(org, revalidateFn, swrOptions)
1762
+
1763
+ Full algorithm:
1764
+
1765
+ ```javascript
1766
+ export async function staleWhileRevalidate(org, revalidateFn, swrOptions = {}) {
1767
+ const ttlMs = swrOptions.ttlMs ?? CACHE_TTL_MS; // Fresh window (default 30s)
1768
+ const staleMs = swrOptions.staleMs ?? ttlMs * 5; // Max staleness (default 150s)
1769
+ const key = org ?? ''; // Cache key: empty string when no org is given
+ const entry = orgCacheMap.get(key);
1770
+ const now = Date.now();
1771
+
1772
+ const isFresh = entry?.data && (now - entry.timestamp) < ttlMs;
1773
+ const isStale = entry?.data && (now - entry.timestamp) < staleMs;
1774
+
1775
+ // Case 1: Fresh — return immediately, no revalidation
1776
+ if (isFresh) return entry.data;
1777
+
1778
+ // Case 2: Stale + not already revalidating — return stale, background refresh
1779
+ if (isStale && !entry.revalidating) {
1780
+ orgCacheMap.set(key, { ...entry, revalidating: true });
1781
+ Promise.resolve().then(async () => {
1782
+ try {
1783
+ const fresh = await revalidateFn();
1784
+ setOrgCache(fresh, org); // Updates both orgCacheMap and legacy cache
1785
+ } catch {
1786
+ // Clear revalidating flag so future requests can retry
1787
+ orgCacheMap.set(org ?? '', { ...orgCacheMap.get(org ?? ''), revalidating: false });
1788
+ }
1789
+ });
1790
+ return entry.data; // Return stale immediately
1791
+ }
1792
+
1793
+ // Case 3: Stale + already revalidating — return stale (another caller is refreshing)
1794
+ if (isStale && entry.revalidating) return entry.data;
1795
+
1796
+ // Case 4: No usable cache — block on revalidation
1797
+ const fresh = await revalidateFn();
1798
+ setOrgCache(fresh, org);
1799
+ return fresh;
1800
+ }
1801
+ ```
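The four cases reduce to a classification of the entry's age against the two windows. The sketch below isolates that decision so the boundaries can be checked with concrete numbers. `classifyCacheEntry` is a hypothetical helper for illustration, with defaults taken from the 30-second TTL and the `ttlMs * 5` stale window described above.

```javascript
// Hypothetical helper: returns which branch of staleWhileRevalidate() would run.
function classifyCacheEntry(entry, now, ttlMs = 30_000, staleMs = ttlMs * 5) {
  if (!entry?.data) return 'miss'; // Case 4: nothing cached
  const age = now - entry.timestamp;
  if (age < ttlMs) return 'fresh'; // Case 1
  if (age < staleMs) {
    // Case 3 if a refresh is already in flight, otherwise Case 2
    return entry.revalidating ? 'stale-revalidating' : 'stale-refresh';
  }
  return 'miss'; // Case 4: too old to serve
}

// An entry written at t = 0, inspected at several later times.
const cached = { data: { resources: {} }, timestamp: 0, revalidating: false };
console.log(classifyCacheEntry(cached, 10_000));
console.log(classifyCacheEntry(cached, 30_000));
console.log(classifyCacheEntry(cached, 150_000));
```

With the defaults, an entry is served without refresh for the first 30 seconds, served stale while refreshing from 30 up to (but not including) 150 seconds, and treated as a miss afterwards.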
1802
+
1803
+ ### 25.4 Cache API Summary
1804
+
1805
+ | Function | Behavior |
1806
+ |----------|----------|
1807
+ | `getOrgCache(org)` | Returns `CacheEntry` or `null` |
1808
+ | `setOrgCache(data, org)` | Stores entry with `Date.now()` timestamp, clears `revalidating` |
1809
+ | `clearOrgCache(org)` | Removes single org entry; clears legacy if matching |
1810
+ | `clearSnapshotCache()` | Clears ALL orgs + legacy cache |
1811
+ | `isCacheFresh(org, ttlMs?)` | `(Date.now() - entry.timestamp) < ttlMs` |
1812
+ | `cachedOrgs()` | Returns `[...orgCacheMap.keys()]` for introspection |
1813
+ | `setSnapshotCache(data, org)` | Legacy API: updates both stores |
1814
+ | `getSnapshotCache()` | Legacy API: returns `{ data, timestamp, org }` |
1815
+
1816
+ ### 25.5 Background Revalidation
1817
+
1818
+ The revalidation promise is fire-and-forget (`Promise.resolve().then(async () => {...})`). On error:
1819
+ - The `revalidating` flag is cleared so the next caller can try again
1820
+ - The stale data remains available until `staleMs` expires
1821
+ - No retries — the next request triggers a new revalidation attempt
1822
+
1823
+ ---
1824
+
1825
+ ## 26. Auth System Deep Dive
1826
+
1827
+ > Source: `packages/krate/core/src/auth.js`
1828
+
1829
+ ### 26.1 createAuthProviderConfig(env)
1830
+
1831
+ Parses environment variables into a provider configuration object:
1832
+
1833
+ ```javascript
1834
+ return {
1835
+ session: { cookieName: env.KRATE_AUTH_COOKIE_NAME || 'krate_session' },
1836
+ delegatedIdentity: {
1837
+ enabled: env.KRATE_AUTH_DELEGATED_IDENTITY_ENABLED === 'true',
1838
+ userHeader: env.KRATE_AUTH_DELEGATED_USER_HEADER || 'x-forwarded-user',
1839
+ groupsHeader: env.KRATE_AUTH_DELEGATED_GROUPS_HEADER || 'x-forwarded-groups',
1840
+ emailHeader: env.KRATE_AUTH_DELEGATED_EMAIL_HEADER || 'x-forwarded-email',
1841
+ localDevelopment: {
1842
+ enabled: delegatedLocalDevelopmentEnabled(env), // true unless NODE_ENV=production
1843
+ user: env.KRATE_AUTH_DELEGATED_LOCAL_USER || 'local-developer',
1844
+ email: env.KRATE_AUTH_DELEGATED_LOCAL_EMAIL || '',
1845
+ groups: env.KRATE_AUTH_DELEGATED_LOCAL_GROUPS || 'krate:repo-admins'
1846
+ }
1847
+ },
1848
+ providers: {
1849
+ github: { id: 'github', label: 'GitHub', type: 'github', enabled, clientId, clientSecret, authorizationUrl, tokenUrl, userInfoUrl, scopes },
1850
+ sso: { id: 'sso', label: '<configurable>', type: 'oidc', enabled, issuerUrl, clientId, clientSecret, authorizationUrl, tokenUrl, userInfoUrl, scopes }
1851
+ }
1852
+ };
1853
+ ```
1854
+
1855
+ **Provider enablement:**
1856
+ - GitHub: enabled unless `KRATE_AUTH_GITHUB_ENABLED=false`
1857
+ - SSO: enabled only when `KRATE_AUTH_SSO_ENABLED=true`
1858
+
1859
+ ### 26.2 listEnabledAuthProviders(config)
1860
+
1861
+ ```javascript
1862
+ Object.values(config.providers).filter((p) => p.enabled && p.clientId && p.authorizationUrl)
1863
+ ```
1864
+
1865
+ Returns only providers that are enabled and have both a `clientId` and an `authorizationUrl` configured.
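As a quick illustration of that filter, the sketch below runs it against a made-up configuration. The client ID and URL values are placeholders, and wrapping the expression in a function body is an assumption about how `listEnabledAuthProviders` is defined.

```javascript
// Assumed wrapper around the filter expression shown above.
function listEnabledAuthProviders(config) {
  return Object.values(config.providers).filter(
    (p) => p.enabled && p.clientId && p.authorizationUrl
  );
}

// Placeholder configuration: SSO is enabled but has no client ID yet.
const sampleConfig = {
  providers: {
    github: {
      id: 'github',
      enabled: true,
      clientId: 'example-client-id',
      authorizationUrl: 'https://github.com/login/oauth/authorize',
    },
    sso: {
      id: 'sso',
      enabled: true,
      clientId: '',
      authorizationUrl: 'https://idp.example.com/authorize',
    },
  },
};

console.log(listEnabledAuthProviders(sampleConfig).map((p) => p.id));
```

A provider that is switched on but only partially configured is therefore hidden from the login options rather than producing a broken redirect.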
1866
+
1867
+ ### 26.3 buildAuthorizationRedirect({ provider, requestUrl, state })
1868
+
1869
+ 1. Validates provider is enabled with clientId and authorizationUrl
1870
+ 2. Constructs `redirectUri` = `${protocol}://${host}/api/auth/callback/${provider.id}`
1871
+ 3. Builds authorization URL with query params: `response_type=code`, `client_id`, `redirect_uri`, `scope`, `state`
1872
+ 4. Returns `{ url, state, redirectUri }`
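The URL construction in steps 2 and 3 can be sketched with the standard `URL` API. This is an illustration rather than the actual implementation: `buildRedirectSketch` is a hypothetical name, validation (step 1) is omitted, and treating `scopes` as an array joined with spaces is an assumption.

```javascript
// Hypothetical sketch of steps 2 to 4; provider validation is omitted.
function buildRedirectSketch(provider, requestUrl, state) {
  const request = new URL(requestUrl);
  // URL.protocol already includes the trailing colon (for example 'https:').
  const redirectUri = `${request.protocol}//${request.host}/api/auth/callback/${provider.id}`;

  const url = new URL(provider.authorizationUrl);
  url.searchParams.set('response_type', 'code');
  url.searchParams.set('client_id', provider.clientId);
  url.searchParams.set('redirect_uri', redirectUri);
  url.searchParams.set('scope', provider.scopes.join(' ')); // Assumes an array of scopes
  url.searchParams.set('state', state);

  return { url: url.toString(), state, redirectUri };
}

// Placeholder provider and request values.
const redirect = buildRedirectSketch(
  {
    id: 'github',
    clientId: 'example-client-id',
    authorizationUrl: 'https://github.com/login/oauth/authorize',
    scopes: ['read:user', 'user:email'],
  },
  'https://krate.example.com/login',
  'state-token'
);
console.log(redirect.redirectUri);
console.log(redirect.url);
```

Using `searchParams.set` lets the URL API handle percent-encoding of the redirect URI and the scope list.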
1873
+
1874
+ State token generation: `Math.random().toString(36).slice(2) + Date.now().toString(36)`
1875
+
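The four steps above can be sketched roughly as follows. This is an illustration under assumptions, not the Krate implementation: it assumes `provider.scopes` is an array joined with spaces and that `requestUrl` is an absolute URL string.

```javascript
// Sketch of the redirect builder described in 26.3.
// Assumptions: `scopes` is an array, `requestUrl` is an absolute URL string.
function buildAuthorizationRedirect({ provider, requestUrl, state }) {
  if (!provider || !provider.enabled || !provider.clientId || !provider.authorizationUrl) {
    throw new Error('provider is not enabled or not fully configured');
  }
  const request = new URL(requestUrl);
  // URL.protocol already includes the trailing colon, e.g. "https:".
  const redirectUri = `${request.protocol}//${request.host}/api/auth/callback/${provider.id}`;

  const url = new URL(provider.authorizationUrl);
  url.searchParams.set('response_type', 'code');
  url.searchParams.set('client_id', provider.clientId);
  url.searchParams.set('redirect_uri', redirectUri);
  url.searchParams.set('scope', (provider.scopes || []).join(' '));
  url.searchParams.set('state', state);
  return { url: url.toString(), state, redirectUri };
}

const redirect = buildAuthorizationRedirect({
  provider: { id: 'github', enabled: true, clientId: 'abc', authorizationUrl: 'https://github.com/login/oauth/authorize', scopes: ['read:user'] },
  requestUrl: 'https://krate.example.com/api/auth/login/github',
  state: 'state-123'
});
console.log(redirect.redirectUri);
```

Using `URL` and `searchParams` leaves percent-encoding of the redirect URI and scopes to the platform rather than to string concatenation.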
1876
+ ### 26.4 exchangeOAuthCodeForProfile({ provider, code, requestUrl, fetchImpl })
1877
+
1878
+ 1. POSTs to `provider.tokenUrl` with `grant_type=authorization_code`, `code`, `redirect_uri`, `client_id`, `client_secret`
1879
+ 2. Extracts `access_token` from JSON response
1880
+ 3. GETs `provider.userInfoUrl` with `Authorization: Bearer <token>`
1881
+ 4. Normalizes profile via `normalizeProviderProfile(provider, profile)`
1882
+
1883
+ **Profile normalization:**
1884
+ - GitHub: extracts `login`, `id`, `email`, `name`; groups = `[]`
1885
+ - OIDC/SSO: extracts `sub`, `email`, `preferred_username`, `groups` (comma-split if string)
1886
+ - Admin detection: groups include `krate:platform-engineers` or `krate:repo-admins`
1887
+
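A minimal sketch of the normalization rules listed above. The output field names (`subject`, `username`, `displayName`, `admin`) are assumptions chosen for illustration; only the input fields and the admin group names come from the text.

```javascript
const ADMIN_GROUPS = ['krate:platform-engineers', 'krate:repo-admins'];

// Accepts an array or a comma-separated string and returns a clean array.
function toGroupList(groups) {
  if (Array.isArray(groups)) return groups.map(String);
  if (typeof groups === 'string') return groups.split(',').map((g) => g.trim()).filter(Boolean);
  return [];
}

// Hypothetical stand-in for normalizeProviderProfile(provider, profile).
function normalizeProviderProfile(provider, profile) {
  const isGitHub = provider.type === 'github';
  const groups = isGitHub ? [] : toGroupList(profile.groups);
  const fields = isGitHub
    ? { subject: String(profile.id), username: profile.login, displayName: profile.name || profile.login }
    : { subject: profile.sub, username: profile.preferred_username || profile.email, displayName: profile.preferred_username || profile.email };
  return {
    provider: provider.id,
    ...fields,
    email: profile.email || '',
    groups,
    admin: groups.some((g) => ADMIN_GROUPS.includes(g))
  };
}

const oidcProfile = normalizeProviderProfile(
  { id: 'sso', type: 'oidc' },
  { sub: 'u-1', email: 'dev@example.com', preferred_username: 'dev', groups: 'dev, krate:repo-admins' }
);
const githubProfile = normalizeProviderProfile(
  { id: 'github', type: 'github' },
  { login: 'octo', id: 42, email: null, name: 'Octo' }
);
```

Under these rules a GitHub login can never be detected as admin through groups, because its group list is always empty.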
1888
+ ### 26.5 registerLoginProfile({ controller, namespace, profile })
1889
+
1890
+ 1. Determines org from `KRATE_ADMIN_ORG || KRATE_ORG || 'default'`
1891
+ 2. Detects bootstrap admin: compares profile username/email against `KRATE_ADMIN_USERNAME`
1892
+ 3. Calls `mapLoginProfileToKrateIdentity()` to produce User + IdentityMapping resources
1893
+ 4. Applies both via `controller.applyResource()`
1894
+ 5. Returns `{ identity, user, mapping, userResult, mappingResult }`
1895
+
1896
+ ### 26.6 createSessionCookie(config, profile, options)
1897
+
1898
+ ```javascript
1899
+ // 1. Build JSON payload
1900
+ const payload = Buffer.from(JSON.stringify({
1901
+ provider: profile.provider,
1902
+ subject: profile.subject,
1903
+ user: profile.username || profile.email
1904
+ })).toString('base64url');
1905
+
1906
+ // 2. Sign if KRATE_SESSION_SECRET is set
1907
+ if (secret) {
1908
+ const signature = createHmac('sha256', secret).update(payload).digest('base64url');
1909
+ value = `${payload}.${signature}`;
1910
+ } else {
1911
+ value = payload; // Unsigned (development mode)
1912
+ }
1913
+
1914
+ // 3. Return Set-Cookie header value
1915
+ return `${cookieName}=${value}; Path=/; HttpOnly; SameSite=Lax`;
1916
+ ```
1917
+
1918
+ ### 26.7 parseSessionCookie(config, cookieValue, options)
1919
+
1920
+ ```javascript
1921
+ // 1. Detect signature presence (indexOf('.'))
1922
+ // 2. Reject: signed cookie + no secret, or unsigned cookie + secret configured
1923
+ // 3. If signed: extract payload + signature, verify with HMAC-SHA256 + timingSafeEqual
1924
+ // 4. Decode: JSON.parse(Buffer.from(payload, 'base64url'))
1925
+ // 5. Return { cookieName, provider, subject, user } or null on any failure
1926
+ ```
1927
+
1928
+ **Security properties:**
1929
+ - Constant-time comparison via `timingSafeEqual`
1930
+ - Rejects mismatched length buffers before comparison
1931
+ - Silent failure (returns null) — no error messages leaked
1932
+
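The verification steps in 26.7 can be sketched as below. This is a simplified illustration, not the Krate source: it works on the raw cookie value, takes the secret as a parameter, and omits `cookieName` and `options`.

```javascript
const { createHmac, timingSafeEqual } = require('node:crypto');

function signPayload(payload, secret) {
  return createHmac('sha256', secret).update(payload).digest('base64url');
}

// Returns { provider, subject, user } or null on any failure.
function parseSessionValue(value, secret) {
  try {
    const dot = value.indexOf('.');
    const signed = dot !== -1;
    // Reject signed cookie without secret, and unsigned cookie with secret.
    if (signed !== Boolean(secret)) return null;

    let payload = value;
    if (signed) {
      payload = value.slice(0, dot);
      const given = Buffer.from(value.slice(dot + 1));
      const expected = Buffer.from(signPayload(payload, secret));
      // timingSafeEqual throws on unequal lengths, so check first.
      if (given.length !== expected.length) return null;
      if (!timingSafeEqual(given, expected)) return null;
    }
    const { provider, subject, user } = JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
    return { provider, subject, user };
  } catch {
    return null; // Silent failure: malformed base64url or JSON
  }
}

const demoSecret = 'demo-secret'; // placeholder value
const demoPayload = Buffer.from(JSON.stringify({ provider: 'github', subject: '42', user: 'octo' })).toString('base64url');
const signedValue = `${demoPayload}.${signPayload(demoPayload, demoSecret)}`;
console.log(parseSessionValue(signedValue, demoSecret));
```

Splitting on the first `.` is safe here because the base64url alphabet does not contain a dot.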
1933
+ ### 26.8 mapLoginProfileToKrateIdentity(profile)
1934
+
1935
+ Creates two Krate CRD resources:
1936
+
1937
+ **User resource:**
1938
+ ```javascript
1939
+ createResource('User', { name: userName, namespace, labels: { role } }, {
1940
+ organizationRef, displayName, email, username, teams, admin, disabled: false
1941
+ }, { phase: 'Active', lastLoginProvider, groups })
1942
+ ```
1943
+
1944
+ **IdentityMapping resource:**
1945
+ ```javascript
1946
+ createResource('IdentityMapping', { name: `${provider}-${userName}`, namespace }, {
1947
+ organizationRef, user, provider, subject, email,
1948
+ workspaceIdentity: { name, uid, groups }, // From mapOidcIdentity()
1949
+ repositoryIdentity: { username, email }
1950
+ }, { phase: 'Synced' })
1951
+ ```
1952
+
1953
+ ### 26.9 profileFromDelegatedHeaders(headers, config, options)
1954
+
1955
+ For reverse-proxy authentication (e.g. OAuth2 Proxy, Authelia):
1956
+ 1. Reads user from configured header (`x-forwarded-user` by default)
1957
+ 2. Falls back to the local development profile if the user header is absent and the request comes from localhost
1958
+ 3. Reads email from email header
1959
+ 4. Reads groups from groups header (comma-separated string → array)
1960
+ 5. Admin detection: same group check as OAuth flow
1961
+ 6. Returns profile with `delegatedIdentitySource: 'proxy-header' | 'local-development'`
1962
+
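As a rough sketch of the header flow above: the `isLocalRequest` option, the `null` return when no identity is available, and the `username` field name are assumptions for illustration. Headers are assumed to be a plain object with lower-cased keys, as Node's `http` module provides.

```javascript
const DELEGATED_ADMIN_GROUPS = ['krate:platform-engineers', 'krate:repo-admins'];

// `delegated` has the shape of `delegatedIdentity` from 26.1.
function profileFromDelegatedHeaders(headers, delegated, { isLocalRequest = false } = {}) {
  const read = (name) => String(headers[name.toLowerCase()] || '').trim();
  let user = read(delegated.userHeader);
  let email = read(delegated.emailHeader);
  let rawGroups = read(delegated.groupsHeader);
  let source = 'proxy-header';

  if (!user) {
    const local = delegated.localDevelopment;
    if (!local.enabled || !isLocalRequest) return null; // assumed behavior
    user = local.user;
    email = local.email;
    rawGroups = local.groups;
    source = 'local-development';
  }

  const groups = rawGroups.split(',').map((g) => g.trim()).filter(Boolean);
  return {
    username: user,
    email,
    groups,
    admin: groups.some((g) => DELEGATED_ADMIN_GROUPS.includes(g)),
    delegatedIdentitySource: source
  };
}

const delegatedConfig = {
  userHeader: 'x-forwarded-user',
  groupsHeader: 'x-forwarded-groups',
  emailHeader: 'x-forwarded-email',
  localDevelopment: { enabled: true, user: 'local-developer', email: '', groups: 'krate:repo-admins' }
};
const proxied = profileFromDelegatedHeaders(
  { 'x-forwarded-user': 'alice', 'x-forwarded-groups': 'dev, ops' },
  delegatedConfig
);
const local = profileFromDelegatedHeaders({}, delegatedConfig, { isLocalRequest: true });
```

Because the headers are trusted as-is, this mode is only safe when the proxy strips any client-supplied copies of these headers before forwarding.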
1963
+ ### 26.10 normalizeName(value)
1964
+
1965
+ ```javascript
1966
+ String(value || 'user').toLowerCase()
1967
+ .replace(/[^a-z0-9-]+/g, '-') // Non-alphanumeric → dash
1968
+ .replace(/^-+|-+$/g, '') // Trim leading/trailing dashes
1969
+ .slice(0, 63) // K8s name length limit
1970
+ || 'user' // Fallback if empty after normalization
1971
+ ```
1972
+
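A few sample inputs illustrate the normalization chain. The function body is copied from the snippet above and wrapped so it can be run directly; the inputs are made up.

```javascript
function normalizeName(value) {
  return String(value || 'user').toLowerCase()
    .replace(/[^a-z0-9-]+/g, '-')
    .replace(/^-+|-+$/g, '')
    .slice(0, 63)
    || 'user';
}

console.log(normalizeName('Jane Doe@Example.com')); // jane-doe-example-com
console.log(normalizeName('___'));                  // user (empty after trimming dashes)
console.log(normalizeName('x'.repeat(100)).length); // 63
```

Since trimming happens before `slice(0, 63)`, a long name whose 63rd character is a dash can still end with a dash after truncation.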
1973
+ ### 26.11 KRATE_SESSION_SECRET Flow
1974
+
1975
+ | Scenario | Cookie Format | Verification |
1976
+ |----------|---------------|--------------|
1977
+ | No secret (dev) | `base64url(json)` | Accepts any base64url payload |
1978
+ | Secret set (prod) | `base64url(json).hmac_sha256_base64url` | Rejects unsigned, verifies HMAC |
1979
+ | Signed cookie + no secret | — | Rejected (returns null) |
1980
+ | Unsigned cookie + secret | — | Rejected (returns null) |
1981
+
1982
+ ---
1983
+
1984
+ ## 27. Event Bus Architecture
1985
+
1986
+ > Source: `packages/krate/core/src/event-bus.js`
1987
+
1988
+ ### 27.1 createEventBus() — Factory
1989
+
1990
+ ```javascript
1991
+ export function createEventBus() {
1992
+ const listeners = new Set();
1993
+ return { subscribe(fn), unsubscribe(fn), emit(event), emitResourceChange(kind, name, operation) };
1994
+ }
1995
+ ```
1996
+
1997
+ ### 27.2 globalEventBus — Module-Level Singleton
1998
+
1999
+ ```javascript
2000
+ export const globalEventBus = createEventBus();
2001
+ ```
2002
+
2003
+ Shared across the entire Node.js process. Imported by:
2004
+ - `http-server.js` — SSE endpoint subscribes writer functions
2005
+ - `api-controller.js` — emits after `applyResource()` / `deleteResource()`
2006
+
2007
+ ### 27.3 Listener Management
2008
+
2009
+ - `subscribe(fn)` — adds to `Set<Function>` (deduplication via reference equality)
2010
+ - `unsubscribe(fn)` — removes from Set
2011
+
2012
+ ### 27.4 emit(event) — Synchronous Broadcast
2013
+
2014
+ ```javascript
2015
+ emit(event) {
2016
+ for (const fn of listeners) {
2017
+ fn(event); // Synchronous invocation — no error boundary
2018
+ }
2019
+ }
2020
+ ```
2021
+
2022
+ All subscribers are called synchronously in iteration order. An exception thrown by a subscriber propagates to the emitter and stops the loop, so subscribers later in the iteration order do not receive that event.
2023
+
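The following sketch fills in the factory from 27.1 with the behavior described in 27.3 to 27.5, to show what the missing error boundary means in practice. Method bodies are reconstructed from the descriptions, so treat them as an approximation of the source.

```javascript
function createEventBus() {
  const listeners = new Set();
  const bus = {
    subscribe(fn) { listeners.add(fn); },
    unsubscribe(fn) { listeners.delete(fn); },
    emit(event) {
      for (const fn of listeners) fn(event); // no try/catch around subscribers
    },
    emitResourceChange(kind, name, operation) {
      bus.emit({ type: 'resource-change', kind, name, operation, timestamp: new Date().toISOString() });
    }
  };
  return bus;
}

const demoBus = createEventBus();
const received = [];
const first = (e) => received.push(`first:${e.kind}`);
const faulty = () => { throw new Error('subscriber failed'); };
const last = (e) => received.push(`last:${e.kind}`);

demoBus.subscribe(first);
demoBus.subscribe(first); // same reference, stored once
demoBus.subscribe(faulty);
demoBus.subscribe(last);

let emitError = null;
try {
  demoBus.emitResourceChange('Repository', 'my-repo', 'apply');
} catch (err) {
  emitError = err; // the subscriber's exception reaches the emitter
}
console.log(received); // `last` never ran because `faulty` threw first
```

A caller that needs isolation between subscribers would have to wrap each `fn(event)` call in its own `try`/`catch`.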
2024
+ ### 27.5 emitResourceChange(kind, name, operation)
2025
+
2026
+ Convenience wrapper producing structured events:
2027
+
2028
+ ```javascript
2029
+ {
2030
+ type: 'resource-change',
2031
+ kind: string, // e.g. 'Repository', 'AgentDispatchRun'
2032
+ name: string, // Resource metadata.name
2033
+ operation: string, // 'apply' | 'delete'
2034
+ timestamp: string // ISO 8601
2035
+ }
2036
+ ```
2037
+
2038
+ ### 27.6 Integration Flow
2039
+
2040
+ ```
2041
+ api-controller.applyResource()
2042
+ → globalEventBus.emitResourceChange('Repository', 'my-repo', 'apply')
2043
+ → SSE writer in http-server.js
2044
+ → response.write('data: {"type":"resource-change",...}\n\n')
2045
+ → Browser EventSource receives event
2046
+ ```
2047
+
2048
+ ### 27.7 Memory Model
2049
+
2050
+ - `listeners` is a `Set` — no persistence, no durability
2051
+ - Events are fire-and-forget — if no subscribers exist, events are dropped
2052
+ - No event replay or history — late subscribers miss past events
2053
+ - No backpressure — slow subscribers block the emit loop
2054
+
2055
+ ---
2056
+
2057
+ ## 28. Gitea Service Layer
2058
+
2059
+ > Source: `packages/krate/core/src/gitea-service.js`, `packages/krate/core/src/gitea-backend.js`
2060
+
2061
+ ### 28.1 createGiteaService(options) — High-Level Service
2062
+
2063
+ ```javascript
2064
+ export function createGiteaService(options = {}) {
2065
+ const giteaUrl = options.giteaUrl || process.env.KRATE_GITEA_HTTP_URL;
2066
+ if (!giteaUrl) return null; // Callers must check and fall back to mock
2067
+ const backend = createGiteaBackend({ baseUrl: giteaUrl, token, fetchImpl });
2068
+ return { available: true, baseUrl, listTree, getBlob, listBranches, getFileContent, createRepository };
2069
+ }
2070
+ ```
2071
+
2072
+ **Returns `null`** when `KRATE_GITEA_HTTP_URL` is not set — this is the availability check. Web routes that need tree/blob data try Gitea first, then fall back to mock responses.
2073
+
2074
+ ### 28.2 Service Methods
2075
+
2076
+ | Method | Gitea API Endpoint | Returns |
2077
+ |--------|-------------------|---------|
2078
+ | `listTree(org, repo, ref, path)` | `GET /api/v1/repos/{owner}/{repo}/contents/{path}?ref={ref}` | `[{ path, type: 'blob'|'tree', size, sha, name }]` or `null` (404) |
2079
+ | `getBlob(org, repo, ref, path)` | `GET /api/v1/repos/{owner}/{repo}/raw/{path}?ref={ref}` | Raw text content or `null` (404) |
2080
+ | `listBranches(org, repo)` | `GET /api/v1/repos/{owner}/{repo}/branches` | `[{ name, sha, protected }]` or `null` |
2081
+ | `getFileContent(org, repo, ref, path)` | `GET /api/v1/repos/{owner}/{repo}/contents/{path}?ref={ref}` | `{ path, content, size, sha, encoding, lastCommit }` or `null` |
2082
+ | `createRepository(org, name, opts)` | Delegates to `backend.createRepository()` | Gitea API response |
2083
+
2084
+ **Error handling:**
2085
+ - 404 → returns `null` (graceful degradation)
2086
+ - Other non-OK status → throws `Error('Gitea GET <path> failed with <status>')`
2087
+
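The error-handling rule can be sketched as a small GET helper. The helper name `giteaGet` and the stubbed `fetchImpl` are hypothetical and exist only so the example runs without a Gitea instance; the `Authorization: token <token>` header follows 28.4.

```javascript
// Hypothetical helper: 404 becomes null, other failures throw.
async function giteaGet(fetchImpl, baseUrl, path, token) {
  const headers = token ? { Authorization: `token ${token}` } : {};
  const response = await fetchImpl(`${baseUrl}/api/v1${path}`, { method: 'GET', headers });
  if (response.status === 404) return null; // graceful degradation
  if (!response.ok) throw new Error(`Gitea GET ${path} failed with ${response.status}`);
  return response.json();
}

// Stub standing in for fetch; the repository names are made up.
const stubFetch = async (url) => {
  if (url.includes('/missing/')) return { ok: false, status: 404, json: async () => ({}) };
  if (url.includes('/broken/')) return { ok: false, status: 500, json: async () => ({}) };
  return { ok: true, status: 200, json: async () => [{ name: 'main' }] };
};

giteaGet(stubFetch, 'http://gitea.example', '/repos/acme/app/branches')
  .then((branches) => console.log(branches));
```

Callers therefore need to distinguish `null` (resource not found, fall back to mock data) from a thrown error (Gitea reachable but failing).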
2088
+ ### 28.3 createGiteaBackend(options) — Low-Level HTTP Client
2089
+
2090
+ ```javascript
2091
+ export function createGiteaBackend({ baseUrl, token, fetchImpl }) {
2092
+ async function request(method, path, body) {
2093
+ const response = await fetchImpl(`${root}/api/v1${path}`, { method, headers: {...}, body? });
2094
+ if (!response.ok) throw new Error(`Gitea ${method} ${path} failed with ${response.status}`);
2095
+ return response.status === 204 ? null : response.json();
2096
+ }
2097
+ return { role: 'gitea-backend', baseUrl, ...methods };
2098
+ }
2099
+ ```
2100
+
2101
+ **Backend methods (all use `request()` internally):**
2102
+
2103
+ | Method | HTTP | Gitea API Path |
2104
+ |--------|------|----------------|
2105
+ | `createOrganization({ name, fullName, description, visibility })` | POST | `/orgs` |
2106
+ | `createUser({ username, email, fullName, password, mustChangePassword })` | POST | `/admin/users` |
2107
+ | `editUser({ username, email, fullName, active, admin })` | PATCH | `/admin/users/{username}` |
2108
+ | `addUserSshKey({ title, key, readOnly })` | POST | `/user/keys` |
2109
+ | `createRepository({ owner, name, private, defaultBranch, description })` | POST | `/orgs/{owner}/repos` or `/user/repos` |
2110
+ | `addDeployKey({ owner, repo, title, key, readOnly })` | POST | `/repos/{owner}/{repo}/keys` |
2111
+ | `addCollaborator({ owner, repo, username, permission })` | PUT | `/repos/{owner}/{repo}/collaborators/{username}` |
2112
+ | `addTeamRepository({ org, team, repo, owner, permission })` | PUT | `/teams/{team}/repos/{owner}/{repo}` |
2113
+ | `createTeam({ org, name, permission, units })` | POST | `/orgs/{org}/teams` |
2114
+ | `addTeamMember({ team, username })` | PUT | `/teams/{team}/members/{username}` |
2115
+ | `protectBranch({ owner, repo, branch, approvals, statusChecks })` | POST | `/repos/{owner}/{repo}/branch_protections` |
2116
+ | `createIssue({ owner, repo, title, body, labels, assignees })` | POST | `/repos/{owner}/{repo}/issues` |
2117
+ | `createPullRequest({ owner, repo, title, head, base, body })` | POST | `/repos/{owner}/{repo}/pulls` |
2118
+ | `createWebhook({ owner, repo, url, events, secret })` | POST | `/repos/{owner}/{repo}/hooks` |
2119
+
2120
+ ### 28.4 Authentication
2121
+
2122
+ All requests include `Authorization: token <token>` header when `token` is provided (from `KRATE_GITEA_TOKEN` environment variable).
2123
+
2124
+ ---
2125
+
2126
+ ## 29. Notification Controller
2127
+
2128
+ > Source: `packages/krate/core/src/notification-controller.js`
2129
+
2130
+ ### 29.1 Event-to-Notification Mapping
2131
+
2132
+ | Event Type | Event Status/Condition | Notification Type | Severity |
2133
+ |------------|----------------------|-------------------|----------|
2134
+ | `AgentDispatchRun` | `status='completed'` | `run-complete` | info |
2135
+ | `AgentDispatchRun` | `status='failed'` | `run-complete` | error |
2136
+ | `AgentDispatchRun` | other status | `run-complete` | info |
2137
+ | `AgentApproval` | `status='pending'` | `approval-needed` | warning |
2138
+ | `ExternalSyncConflict` | any | `conflict-detected` | warning |
2139
+ | `KrateWorkspace` | `claimed=true` | `workspace-ready` | info |
2140
+ | fallback | any | `system` | info |
2141
+
2142
+ ### 29.2 Notification Shape
2143
+
2144
+ ```javascript
2145
+ {
2146
+ id: string, // crypto.randomUUID()
2147
+ type: string, // 'run-complete' | 'approval-needed' | 'conflict-detected' | 'workspace-ready' | 'system'
2148
+ title: string, // Human-readable title
2149
+ message: string, // Detailed message
2150
+ severity: string, // 'info' | 'warning' | 'error'
2151
+ resourceRef: any, // Optional reference to the triggering resource
2152
+ createdAt: string, // ISO 8601 timestamp
2153
+ read: boolean, // Read state (default false)
2154
+ org: string // Organization slug
2155
+ }
2156
+ ```
2157
+
2158
+ ### 29.3 Storage Model
2159
+
2160
+ ```javascript
2161
+ const store = new Map(); // org → notifications[] (in-memory, no persistence)
2162
+ const prefsStore = new Map(); // userId → preferences object
2163
+ ```
2164
+
2165
+ ### 29.4 API Methods
2166
+
2167
+ | Method | Signature | Behavior |
2168
+ |--------|-----------|----------|
2169
+ | `createNotification(event)` | `(object) → notification` | Maps event → notification, pushes to org store |
2170
+ | `listNotifications(org, opts)` | `(string, { unreadOnly?, limit?, since? })` | Sort by createdAt desc, apply filters, cap to limit (default 20) |
2171
+ | `markAsRead(notificationId)` | `(string) → boolean` | Scans all org stores, sets `read=true`, returns success |
2172
+ | `markAllAsRead(org)` | `(string) → number` | Marks all unread in org, returns count |
2173
+ | `getUnreadCount(org)` | `(string) → number` | `.filter(n => !n.read).length` |
2174
+ | `getPreferences(userId)` | `(string) → prefs` | Returns merged defaults + stored prefs |
2175
+ | `updatePreferences(userId, prefs)` | `(string, object) → prefs` | Deep-merge into stored prefs |
2176
+
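A condensed sketch of the mapping table in 29.1 and three of the methods above. The event field names (`type`, `status`, `claimed`, `org`, `title`, `message`, `createdAt`) are assumptions about the input shape; the notification shape follows 29.2.

```javascript
const { randomUUID } = require('node:crypto');

const notificationStore = new Map(); // org -> notifications[]

function mapEvent(event) {
  if (event.type === 'AgentDispatchRun') {
    return { type: 'run-complete', severity: event.status === 'failed' ? 'error' : 'info' };
  }
  if (event.type === 'AgentApproval' && event.status === 'pending') return { type: 'approval-needed', severity: 'warning' };
  if (event.type === 'ExternalSyncConflict') return { type: 'conflict-detected', severity: 'warning' };
  if (event.type === 'KrateWorkspace' && event.claimed === true) return { type: 'workspace-ready', severity: 'info' };
  return { type: 'system', severity: 'info' };
}

function createNotification(event) {
  const notification = {
    id: randomUUID(),
    ...mapEvent(event),
    title: event.title || event.type,
    message: event.message || '',
    resourceRef: event.resourceRef ?? null,
    createdAt: event.createdAt || new Date().toISOString(), // explicit value assumed for the demo
    read: false,
    org: event.org || 'default'
  };
  if (!notificationStore.has(notification.org)) notificationStore.set(notification.org, []);
  notificationStore.get(notification.org).push(notification);
  return notification;
}

function listNotifications(org, { unreadOnly = false, limit = 20, since } = {}) {
  return [...(notificationStore.get(org) || [])]
    // ISO 8601 UTC strings in the same format compare correctly as strings.
    .filter((n) => (!unreadOnly || !n.read) && (!since || n.createdAt > since))
    .sort((a, b) => b.createdAt.localeCompare(a.createdAt))
    .slice(0, limit);
}

function markAllAsRead(org) {
  const unread = (notificationStore.get(org) || []).filter((n) => !n.read);
  unread.forEach((n) => { n.read = true; });
  return unread.length;
}

createNotification({ type: 'AgentDispatchRun', status: 'failed', org: 'acme', createdAt: '2024-01-01T00:00:00.000Z' });
createNotification({ type: 'AgentApproval', status: 'pending', org: 'acme', createdAt: '2024-01-02T00:00:00.000Z' });
console.log(listNotifications('acme').map((n) => `${n.type}/${n.severity}`));
```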
2177
+ ### 29.5 Default Preferences
2178
+
2179
+ ```javascript
2180
+ { runs: true, approvals: true, conflicts: true, workspaces: true, sound: false, desktop: false }
2181
+ ```
2182
+
2183
+ ---
2184
+
2185
+ ## 30. Runner Controller
2186
+
2187
+ > Source: `packages/krate/core/src/runner-controller.js`
2188
+
2189
+ ### 30.1 validateRunnerPool(resource)
2190
+
2191
+ Validates a RunnerPool resource:
2192
+
2193
+ | Check | Error Reason | Message |
2194
+ |-------|--------------|---------|
2195
+ | resource is null/undefined | `missing-resource` | resource is required |
2196
+ | no `metadata.name` | `missing-name` | metadata.name is required |
2197
+ | no `spec.organizationRef` | `missing-org` | spec.organizationRef is required |
2198
+ | `warmReplicas` not non-negative int | `invalid-min-replicas` | must be non-negative integer |
2199
+ | `maxReplicas` not positive int | `invalid-max-replicas` | must be positive integer |
2200
+ | `warmReplicas > maxReplicas` | `replicas-conflict` | must not exceed maxReplicas |
2201
+
2202
+ Returns `{ valid: true, name, organizationRef, warmReplicas, maxReplicas, image, labels }` on success.
2203
+
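The checks in the table can be sketched as a sequence of early returns. The failure shape `{ valid: false, reason, message }` and the defaults applied when `warmReplicas` or `maxReplicas` are omitted are assumptions; the success shape is trimmed to the replica fields.

```javascript
function fail(reason, message) {
  return { valid: false, reason, message }; // assumed failure shape
}

function validateRunnerPool(resource) {
  if (!resource) return fail('missing-resource', 'resource is required');
  const { metadata = {}, spec = {} } = resource;
  if (!metadata.name) return fail('missing-name', 'metadata.name is required');
  if (!spec.organizationRef) return fail('missing-org', 'spec.organizationRef is required');

  // Defaults are hypothetical; the text does not say how omitted values are treated.
  const { warmReplicas = 0, maxReplicas = 1 } = spec;
  if (!Number.isInteger(warmReplicas) || warmReplicas < 0) {
    return fail('invalid-min-replicas', 'warmReplicas must be non-negative integer');
  }
  if (!Number.isInteger(maxReplicas) || maxReplicas < 1) {
    return fail('invalid-max-replicas', 'maxReplicas must be positive integer');
  }
  if (warmReplicas > maxReplicas) {
    return fail('replicas-conflict', 'warmReplicas must not exceed maxReplicas');
  }
  return { valid: true, name: metadata.name, organizationRef: spec.organizationRef, warmReplicas, maxReplicas };
}

console.log(validateRunnerPool({
  metadata: { name: 'default-pool' },
  spec: { organizationRef: 'acme', warmReplicas: 3, maxReplicas: 2 }
}).reason);
```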
2204
+ ### 30.2 getPoolStatus(pool)
2205
+
2206
+ ```javascript
2207
+ return {
2208
+ poolName,
2209
+ idle: runners.filter(status === 'Idle').length,
2210
+ active: runners.filter(status === 'Running').length,
2211
+ terminating: runners.filter(status === 'Terminating').length,
2212
+ total: poolRunners.length,
2213
+ desired: pool.spec.warmReplicas,
2214
+ maxReplicas: pool.spec.maxReplicas,
2215
+ phase: total === 0 ? 'Empty' : active > 0 ? 'Active' : 'Idle',
2216
+ scaling: total < desired ? 'ScalingUp' : total > max ? 'ScalingDown' : 'Stable'
2217
+ };
2218
+ ```
2219
+
2220
+ ### 30.3 getCapacity(pool)
2221
+
2222
+ ```javascript
2223
+ return {
2224
+ poolName,
2225
+ maxReplicas,
2226
+ used: runners.filter(status === 'Running').length,
2227
+ available: Math.max(0, maxReplicas - used),
2228
+ utilizationPct: Math.round((used / maxReplicas) * 100)
2229
+ };
2230
+ ```
2231
+
2232
+ ### 30.4 createRunner(pool, runRef)
2233
+
2234
+ 1. Generates runner ID: `runner-${poolName}-${Date.now()}-${random5chars}`
2235
+ 2. Determines initial status: `'Running'` if runRef provided, `'Idle'` otherwise
2236
+ 3. Calls `generatePodSpec()` to build the K8s Pod manifest
2237
+ 4. Stores in in-memory `runners` Map
2238
+ 5. If runRef → stores in `jobAssignments` Map
2239
+
2240
+ ### 30.5 generatePodSpec({ runnerId, pool }, workspace)
2241
+
2242
+ Produces a complete K8s Pod manifest:
2243
+
2244
+ ```javascript
2245
+ {
2246
+ apiVersion: 'v1',
2247
+ kind: 'Pod',
2248
+ metadata: {
2249
+ name: `runner-${runnerId}`,
2250
+ namespace: pool.metadata.namespace || 'krate-org-default',
2251
+ labels: {
2252
+ 'krate.a5c.ai/runner': runnerId,
2253
+ 'krate.a5c.ai/pool': poolName,
2254
+ 'krate.a5c.ai/org': organizationRef
2255
+ }
2256
+ },
2257
+ spec: {
2258
+ serviceAccountName: spec.serviceAccount || 'krate-runner',
2259
+ restartPolicy: 'Never',
2260
+ containers: [{
2261
+ name: 'runner',
2262
+ image: spec.image || 'ubuntu:24.04',
2263
+ env: [
2264
+ { name: 'KRATE_ORG', value: organizationRef },
2265
+ { name: 'KRATE_RUN_ID', value: runId },
2266
+ { name: 'KRATE_WORKSPACE_PATH', value: '/workspace' }
2267
+ ],
2268
+ volumeMounts: workspace ? [{ name: 'workspace', mountPath: '/workspace' }] : [],
2269
+ resources: {
2270
+ limits: spec.resourceLimits || { cpu: '2', memory: '4Gi' },
2271
+ requests: spec.resourceRequests || { cpu: '500m', memory: '1Gi' }
2272
+ }
2273
+ }],
2274
+ volumes: workspace ? [{
2275
+ name: 'workspace',
2276
+ persistentVolumeClaim: { claimName: `krate-ws-${runId}` }
2277
+ }] : []
2278
+ }
2279
+ }
2280
+ ```
2281
+
2282
+ ### 30.6 scheduleJob(pool, job)
2283
+
2284
+ 1. Check if job already assigned → return existing runner (reused=true)
2285
+ 2. Find idle runner in pool → assign it (status → Running)
2286
+ 3. Check capacity → if none available, return `{ error: true, reason: 'no-capacity' }`
2287
+ 4. Create new runner via `createRunner(pool, jobRef)`
2288
+
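The scheduling order above (existing assignment, idle runner, capacity check, new runner) can be sketched with two in-memory Maps. Assumptions: jobs are identified by a `ref` field, capacity is measured as the number of `Running` runners in the pool, and the Pod spec generation from 30.5 is left out.

```javascript
const runners = new Map();        // runnerId -> runner record
const jobAssignments = new Map(); // job ref -> runnerId

function createRunner(pool, runRef) {
  const suffix = Math.random().toString(36).slice(2, 7);
  const runner = {
    id: `runner-${pool.metadata.name}-${Date.now()}-${suffix}`,
    pool: pool.metadata.name,
    status: runRef ? 'Running' : 'Idle',
    runRef: runRef || null
  };
  runners.set(runner.id, runner);
  if (runRef) jobAssignments.set(runRef, runner.id);
  return runner;
}

function scheduleJob(pool, job) {
  // 1. Job already assigned: return the same runner.
  const assignedId = jobAssignments.get(job.ref);
  if (assignedId && runners.has(assignedId)) return { runner: runners.get(assignedId), reused: true };

  // 2. Prefer an idle runner from the same pool.
  const poolRunners = [...runners.values()].filter((r) => r.pool === pool.metadata.name);
  const idle = poolRunners.find((r) => r.status === 'Idle');
  if (idle) {
    idle.status = 'Running';
    idle.runRef = job.ref;
    jobAssignments.set(job.ref, idle.id);
    return { runner: idle, reused: false };
  }

  // 3. Capacity check against maxReplicas.
  const used = poolRunners.filter((r) => r.status === 'Running').length;
  if (used >= pool.spec.maxReplicas) return { error: true, reason: 'no-capacity' };

  // 4. Create a new runner bound to the job.
  return { runner: createRunner(pool, job.ref), reused: false };
}

const demoPool = { metadata: { name: 'default' }, spec: { maxReplicas: 2 } };
const warmRunner = createRunner(demoPool); // one idle runner
const firstAssignment = scheduleJob(demoPool, { ref: 'job-a' });
const repeatAssignment = scheduleJob(demoPool, { ref: 'job-a' });
const secondAssignment = scheduleJob(demoPool, { ref: 'job-b' });
const rejected = scheduleJob(demoPool, { ref: 'job-c' });
```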
2289
+ ### 30.7 terminateRunner(runnerId)
2290
+
2291
+ 1. Look up runner in Map
2292
+ 2. Remove job assignment if any
2293
+ 3. Set status to `'Terminating'`, record `terminatedAt`
2294
+ 4. Remove from runners Map
2295
+
2296
+ ---
2297
+
2298
+ ## 31. External Backend Pipeline
2299
+
2300
+ > Source: `packages/krate/core/src/external/`
2301
+
2302
+ ### 31.1 Provider Registration (provider-resource-factory.js)
2303
+
2304
+ ```javascript
2305
+ export function createDefaultProviderRegistry() {
2306
+ const registry = createProviderRegistry(); // Map<type, adapter>
2307
+ registry.register('github', buildGitHubAdapterDescriptor());
2308
+ return registry;
2309
+ }
2310
+ ```
2311
+
2312
+ The GitHub adapter descriptor exposes factory methods (`createForge`, `createIssueTracker`, `createCicd`) for credential-bound instances, plus stub credential-free interfaces.
2313
+
2314
+ **Provider Registry API:**
2315
+ - `register(type, adapter)` — stores adapter by type key
2316
+ - `get(type)` → adapter or null
2317
+ - `list()` → `[...adapters.keys()]`
2318
+
2319
+ **Adapter validation contract** (from `provider-adapter.js`):
2320
+ - Required: `descriptor()`, `health()`, `normalizeWebhook(payload)`, `verifyWebhook(request)`
2321
+ - At least one of: `issueTracking`, `cicd`, `gitForge`
2322
+
2323
+ ### 31.2 Webhook Ingestion (webhook-controller.js)
2324
+
2325
+ ```javascript
2326
+ export function createWebhookController({ secret }) {
2327
+ const deliveries = new Map(); // deliveryId → record
2328
+ const subscribers = []; // event handlers
2329
+ return { verifyHmacSignature, createDeliveryRecord, recordDelivery, isDuplicate, onEvent, processDelivery };
2330
+ }
2331
+ ```
2332
+
2333
+ **verifyHmacSignature(body, signature):**
2334
+ - Requires `sha256=` prefix
2335
+ - Computes `sha256=` + HMAC-SHA256(secret, body).hex()
2336
+ - Constant-time comparison via `timingSafeEqual` on the full strings (not just digests)
2337
+ - Returns `{ valid: boolean, reason: string|null }`
2338
+
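A minimal sketch of the signature check described above. Here the secret is passed as an explicit first argument rather than captured by the controller closure, and the `reason` strings are placeholders, since the text does not list the actual values.

```javascript
const { createHmac, timingSafeEqual } = require('node:crypto');

function verifyHmacSignature(secret, body, signature) {
  if (typeof signature !== 'string' || !signature.startsWith('sha256=')) {
    return { valid: false, reason: 'missing-prefix' }; // placeholder reason
  }
  const expected = 'sha256=' + createHmac('sha256', secret).update(body).digest('hex');
  const given = Buffer.from(signature);
  const wanted = Buffer.from(expected);
  // Compare the full prefixed strings; lengths must match for timingSafeEqual.
  if (given.length !== wanted.length || !timingSafeEqual(given, wanted)) {
    return { valid: false, reason: 'signature-mismatch' }; // placeholder reason
  }
  return { valid: true, reason: null };
}

const hookSecret = 'webhook-secret'; // placeholder value
const rawBody = JSON.stringify({ action: 'opened' });
const goodSignature = 'sha256=' + createHmac('sha256', hookSecret).update(rawBody).digest('hex');
console.log(verifyHmacSignature(hookSecret, rawBody, goodSignature));
```

The check must run over the raw request body bytes; re-serializing a parsed payload can change whitespace or key order and break the comparison.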
2339
+ **processDelivery({ deliveryId, eventType, payload, rawBody }):**
2340
+ 1. Dedup check: `if (isDuplicate(deliveryId)) → { queued: 0, duplicate: true }`
2341
+ 2. Create delivery record with timestamp
2342
+ 3. Store in `deliveries` Map
2343
+ 4. Emit to all subscribers
2344
+ 5. Return `{ queued: subscriberCount, duplicate: false, deliveryId }`
2345
+
2346
+ ### 31.3 Event Normalization (sync-controller.js)
2347
+
2348
+ ```javascript
2349
+ normalizeEvent(rawEvent) → {
2350
+ eventType,
2351
+ action: rawEvent.action || 'unknown',
2352
+ nativeId,
2353
+ providerRef,
2354
+ resourceKind,
2355
+ data: rawEvent.data || {},
2356
+ receivedAt: rawEvent.receivedAt || now,
2357
+ canonicalAt: now
2358
+ }
2359
+ ```
2360
+
2361
+ ### 31.4 Resource Upsert (sync-controller.js)
2362
+
2363
+ ```javascript
2364
+ upsertResource({ kind, localName, namespace, spec, externalEnvelope }) → resource
2365
+ ```
2366
+
2367
+ The `externalEnvelope` contains:
2368
+ - `nativeId` — provider's identifier (e.g. GitHub issue number)
2369
+ - `url` — canonical URL on the provider
2370
+ - `etag` — version marker for conflict detection
2371
+ - `providerRef` — which ExternalBackendProvider this came from
2372
+
2373
+ On upsert:
2374
+ - `firstSyncedAt` is preserved from existing resource
2375
+ - `lastSyncedAt` is always updated to now
2376
+ - Stored in internal `resources` Map keyed by `${namespace}/${kind}/${localName}`
2377
+ - Fire-and-forget `persistFn(resource)` called
2378
+
2379
+ ### 31.5 Watermark Tracking (sync-controller.js)
2380
+
2381
+ ```javascript
2382
+ updateWatermark(bindingRef, timestamp)
2383
+ // Only advances if new timestamp > current (monotonic)
2384
+ // Persists as ExternalSyncWatermark CRD resource
2385
+
2386
+ getWatermark(bindingRef) → string | null
2387
+ ```
2388
+
2389
+ Per-binding state stored in `watermarks` Map. Prevents re-processing of already-synced events.
2390
+
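The monotonic rule can be sketched as below. The boolean return value and the use of `Date.parse` for comparison are assumptions, and persistence as an `ExternalSyncWatermark` resource is omitted.

```javascript
const watermarks = new Map(); // bindingRef -> ISO 8601 timestamp

// Returns true when the watermark advanced (return value is an assumption).
function updateWatermark(bindingRef, timestamp) {
  const current = watermarks.get(bindingRef);
  if (current && Date.parse(timestamp) <= Date.parse(current)) return false;
  watermarks.set(bindingRef, timestamp);
  return true;
}

function getWatermark(bindingRef) {
  return watermarks.get(bindingRef) ?? null;
}

updateWatermark('github-acme', '2024-05-01T10:00:00Z');
updateWatermark('github-acme', '2024-05-01T09:00:00Z'); // older, ignored
console.log(getWatermark('github-acme')); // 2024-05-01T10:00:00Z
```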
2391
+ ### 31.6 Ownership Modes (sync-controller.js)
2392
+
2393
+ ```javascript
2394
+ applyOwnershipMode({ ownershipMode, operation, origin }) → { allowed, reason }
2395
+ ```
2396
+
2397
+ | Mode | Krate Write | External Write |
2398
+ |------|-------------|----------------|
2399
+ | `bidirectional` | allowed | allowed |
2400
+ | `external-owned` | **blocked** | allowed |
2401
+ | `krate-owned` | allowed | **blocked** |
2402
+ | unknown | blocked | blocked |
2403
+
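The table translates directly into a lookup. In this sketch the `origin` values `'krate'` and `'external'` and the `reason` strings are assumptions; `operation` is accepted but unused because the table does not vary by operation.

```javascript
const WRITE_RULES = new Map([
  ['bidirectional', { krate: true, external: true }],
  ['external-owned', { krate: false, external: true }],
  ['krate-owned', { krate: true, external: false }]
]);

function applyOwnershipMode({ ownershipMode, operation, origin }) {
  const rules = WRITE_RULES.get(ownershipMode);
  if (!rules) return { allowed: false, reason: 'unknown-ownership-mode' }; // unknown mode blocks everything
  if (rules[origin] !== true) return { allowed: false, reason: `${origin}-write-blocked` };
  return { allowed: true, reason: null };
}

console.log(applyOwnershipMode({ ownershipMode: 'external-owned', operation: 'update', origin: 'krate' }));
```

A `Map` is used rather than a plain object so that mode names such as `constructor` cannot match inherited properties.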
2404
+ ### 31.7 Conflict Detection (conflict-controller.js)
2405
+
2406
+ ```javascript
2407
+ detectConflict({ resourceRef, fieldPath, localValue, externalValue, namespace, organizationRef })
2408
+ ```
2409
+
2410
+ - If `localValue === externalValue` → `{ conflict: null }` (no conflict)
2411
+ - If different → creates `ExternalSyncConflict` resource with phase `'Open'`
2412
+ - Conflict name: `conflict-${resourceRef}-${fieldPath}-${timestamp}`
2413
+
2414
+ ### 31.8 Conflict Resolution (conflict-controller.js)
2415
+
2416
+ ```javascript
2417
+ resolveConflict({ conflictName, strategy, resolvedValue, resources })
2418
+ ```
2419
+
2420
+ | Strategy | Chosen Value | New Phase |
2421
+ |----------|-------------|-----------|
2422
+ | `prefer-external` | `spec.externalValue` | `Resolved` |
2423
+ | `prefer-krate` | `spec.localValue` | `Resolved` |
2424
+ | `manual` | `resolvedValue` param (required) | `Resolved` |
2425
+ | `ignore` | undefined | `Ignored` |
2426
+
2427
+ **Superseded check:** `supersededCheck({ resourceRef, fieldPath, resources })` marks all Open conflicts for the same field as `'Superseded'` when a new sync event arrives.
2428
+
2429
+ ### 31.9 Write Intents (write-controller.js)
2430
+
2431
+ **Phase lifecycle:**
2432
+
2433
+ ```
2434
+ PendingApproval → ReadyToSend → Sending → Succeeded
2435
+ ↘ Retrying → Sending (loop, up to maxRetries)
2436
+ ↘ Failed
2437
+ PendingApproval → Rejected
2438
+ ```
2439
+
2440
+ **createWriteIntent({ interfaceKey, operation, payload, resourceRef, requiresApproval, maxRetries }):**
2441
+ - Generates idempotency key via `getIdempotencyKey()`
2442
+ - Initial phase: `PendingApproval` if `requiresApproval`, else `ReadyToSend`
2443
+
2444
+ **approveWriteIntent({ intentName, approvedBy, resources }):**
2445
+ - Validates current phase is `PendingApproval`
2446
+ - Transitions to `ReadyToSend` with approver + timestamp
2447
+
2448
+ **executeWriteIntent({ intentName, resources, executor, onPhaseChange }):**
2449
+ - Validates current phase is `ReadyToSend`
2450
+ - Calls `executor()` (an async function that performs the external API call)
2451
+ - On success: phase → `Succeeded` with `externalResult`
2452
+ - On failure: retries up to `maxRetries`, phase cycles through `Retrying`
2453
+ - After exhausting retries: phase → `Failed` with `lastError`
2454
+
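The retry loop in `executeWriteIntent` can be sketched as a small state machine. This version operates on an intent object directly instead of looking it up by `intentName` in `resources`, and the `attempts` status field and `invalid-phase` reason are assumptions.

```javascript
async function executeWriteIntent(intent, executor, onPhaseChange = () => {}) {
  if (intent.status.phase !== 'ReadyToSend') return { error: true, reason: 'invalid-phase' };

  const setPhase = (phase, extra = {}) => {
    Object.assign(intent.status, { phase, ...extra });
    onPhaseChange(phase);
  };
  const maxRetries = intent.spec.maxRetries ?? 0;

  // One initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    setPhase('Sending', { attempts: attempt + 1 });
    try {
      const externalResult = await executor();
      setPhase('Succeeded', { externalResult });
      return intent;
    } catch (err) {
      const lastError = String((err && err.message) || err);
      setPhase(attempt < maxRetries ? 'Retrying' : 'Failed', { lastError });
    }
  }
  return intent;
}

// Demo executor that fails twice and then succeeds.
let calls = 0;
const flakyExecutor = async () => {
  calls += 1;
  if (calls < 3) throw new Error(`attempt ${calls} failed`);
  return { id: 101 };
};
const demoIntent = { spec: { maxRetries: 2 }, status: { phase: 'ReadyToSend' } };
const phases = [];
const execution = executeWriteIntent(demoIntent, flakyExecutor, (p) => phases.push(p));
execution.then(() => console.log(phases.join(' > ')));
```

The sketch retries immediately; a real executor loop would usually wait between attempts, which the text does not specify.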
2455
+ ### 31.10 Idempotency Key Generation (write-controller.js)
2456
+
2457
+ ```javascript
2458
+ export function getIdempotencyKey({ interfaceKey, operation, resourceRef, payload }) {
2459
+ const canonical = JSON.stringify({ interfaceKey, operation, resourceRef, payload }, sortedKeys);
2460
+ // djb2 hash, XOR variant (each character is combined with ^ rather than +)
2461
+ let hash = 5381;
2462
+ for (let i = 0; i < canonical.length; i++) {
2463
+ hash = ((hash << 5) + hash) ^ canonical.charCodeAt(i);
2464
+ hash = hash >>> 0; // Keep 32-bit unsigned
2465
+ }
2466
+ return `idem-${interfaceKey}-${operation}-${hash.toString(16)}`;
2467
+ }
2468
+ ```
2469
+
2470
+ Deterministic — same inputs always produce the same key. Used to prevent duplicate write operations.
2471
+
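The snippet above passes an undefined `sortedKeys` argument to `JSON.stringify`. One way to get a key-order-independent canonical form is sketched below; the recursive `canonicalize` helper is an assumption, not the Krate implementation, while the hash loop is copied from the snippet.

```javascript
// Hypothetical helper: rebuilds objects with keys in sorted order, recursively.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value && typeof value === 'object') {
    return Object.keys(value).sort().reduce((out, key) => {
      out[key] = canonicalize(value[key]);
      return out;
    }, {});
  }
  return value;
}

function getIdempotencyKey({ interfaceKey, operation, resourceRef, payload }) {
  const canonical = JSON.stringify(canonicalize({ interfaceKey, operation, resourceRef, payload }));
  let hash = 5381;
  for (let i = 0; i < canonical.length; i++) {
    hash = ((hash << 5) + hash) ^ canonical.charCodeAt(i);
    hash = hash >>> 0; // keep 32-bit unsigned
  }
  return `idem-${interfaceKey}-${operation}-${hash.toString(16)}`;
}

const keyA = getIdempotencyKey({
  interfaceKey: 'github', operation: 'createIssue', resourceRef: 'issue-1',
  payload: { title: 'Bug', labels: ['a'], meta: { x: 1, y: 2 } }
});
const keyB = getIdempotencyKey({
  interfaceKey: 'github', operation: 'createIssue', resourceRef: 'issue-1',
  payload: { meta: { y: 2, x: 1 }, labels: ['a'], title: 'Bug' }
});
console.log(keyA === keyB); // true: property order does not affect the key
```

Because the hash is only 32 bits wide, two different payloads can in principle map to the same key, so the key is better treated as a duplicate hint than as a uniqueness guarantee.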
2472
+ ### 31.11 Persistence Callbacks
2473
+
2474
+ All controllers (sync, conflict, write) accept an optional `persistFn`:
2475
+
2476
+ ```javascript
2477
+ function persist(resource) {
2478
+ if (typeof persistFn === 'function') {
2479
+ Promise.resolve(persistFn(resource)).catch(() => {}); // Fire-and-forget
2480
+ }
2481
+ }
2482
+ ```
2483
+
2484
+ The persistFn is called with a fully-formed K8s-style CRD resource. Errors are swallowed — the caller wires monitoring separately if needed.
2485
+
2486
+ ### 31.12 GitHub Adapter (external/github/)
2487
+
2488
+ **auth.js — JWT Signing and Token Exchange:**
2489
+
2490
+ ```javascript
2491
+ // createGitHubJwt({ appId, privateKey, expiresInSeconds })
2492
+ // - RS256 for PEM keys (production)
2493
+ // - HS256 fallback for non-PEM keys (test mode)
2494
+ // - Returns: header.payload.signature (base64url encoded)
2495
+
2496
+ // exchangeInstallationToken({ appJwt, installationId, fetchImpl })
2497
+ // - POST https://api.github.com/app/installations/{id}/access_tokens
2498
+ // - Returns: { token, expiresAt }
2499
+ ```
430
2500
 
431
- All CRDs use `x-kubernetes-preserve-unknown-fields: true` allowing spec evolution without CRD version bumps.
2501
+ **git-forge.js — GitHubGitForge class:**
2502
+
2503
+ | Method | GitHub API |
2504
+ |--------|-----------|
2505
+ | `listRepositories()` | GET `/installation/repositories` |
2506
+ | `getPullRequest({ repo, pullNumber })` | GET `/repos/{owner}/{repo}/pulls/{number}` |
2507
+ | `createPullRequest({ repo, title, head, base, body })` | POST `/repos/{owner}/{repo}/pulls` |
2508
+ | `mergePullRequest({ repo, pullNumber, mergeMethod, commitTitle })` | PUT `/repos/{owner}/{repo}/pulls/{number}/merge` |
2509
+ | `listRefs({ repo })` | GET `/repos/{owner}/{repo}/branches` + `/tags` (parallel) |
2510
+ | `syncDeployKeys({ repo, desiredKeys })` | GET+DELETE+POST `/repos/{owner}/{repo}/keys` |
2511
+ | `syncBranchProtection({ repo, branch, ... })` | PUT `/repos/{owner}/{repo}/branches/{branch}/protection` |
2512
+
2513
+ **issue-tracking.js — GitHubIssueTracking class:**
2514
+
2515
+ | Method | GitHub API |
2516
+ |--------|-----------|
2517
+ | `listIssues({ repo, state })` | GET `/repos/{owner}/{repo}/issues?state={state}` |
2518
+ | `createIssue({ repo, title, body, labels })` | POST `/repos/{owner}/{repo}/issues` |
2519
+ | `updateIssue({ repo, issueNumber, title, body, labels })` | PATCH `/repos/{owner}/{repo}/issues/{number}` |
2520
+ | `closeIssue({ repo, issueNumber })` | PATCH `/repos/{owner}/{repo}/issues/{number}` (state: closed) |
2521
+ | `listComments({ repo, issueNumber })` | GET `/repos/{owner}/{repo}/issues/{number}/comments` |
2522
+ | `createComment({ repo, issueNumber, body })` | POST `/repos/{owner}/{repo}/issues/{number}/comments` |
2523
+
2524
+ **cicd.js — GitHubCicd class:**
2525
+
2526
+ | Method | GitHub API |
2527
+ |--------|-----------|
2528
+ | `listWorkflowRuns({ repo, workflowId? })` | GET `/repos/{owner}/{repo}/actions/runs` or `/actions/workflows/{id}/runs` |
2529
+ | `listJobs({ repo, runId })` | GET `/repos/{owner}/{repo}/actions/runs/{id}/jobs` |
2530
+ | `rerunWorkflow({ repo, runId })` | POST `/repos/{owner}/{repo}/actions/runs/{id}/rerun` |
2531
+ | `cancelWorkflow({ repo, runId })` | POST `/repos/{owner}/{repo}/actions/runs/{id}/cancel` |
2532
+ | `createCheck({ repo, name, headSha, status, conclusion, detailsUrl, output })` | POST `/repos/{owner}/{repo}/check-runs` |
2533
+ | `updateCheck({ repo, checkRunId, status, conclusion, output })` | PATCH `/repos/{owner}/{repo}/check-runs/{id}` |
2534
+
2535
+ All GitHub classes use `X-GitHub-Api-Version: 2022-11-28` header and Bearer token auth.