agentboot 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (66)
  1. package/README.md +8 -7
  2. package/agentboot.config.json +4 -1
  3. package/package.json +2 -2
  4. package/scripts/cli.ts +42 -14
  5. package/scripts/compile.ts +30 -7
  6. package/scripts/dev-sync.ts +1 -1
  7. package/scripts/lib/config.ts +17 -1
  8. package/scripts/validate.ts +12 -7
  9. package/.github/ISSUE_TEMPLATE/persona-request.md +0 -62
  10. package/.github/ISSUE_TEMPLATE/quality-feedback.md +0 -67
  11. package/.github/workflows/cla.yml +0 -25
  12. package/.github/workflows/validate.yml +0 -49
  13. package/.idea/agentboot.iml +0 -9
  14. package/.idea/misc.xml +0 -6
  15. package/.idea/modules.xml +0 -8
  16. package/.idea/vcs.xml +0 -6
  17. package/CLAUDE.md +0 -230
  18. package/CONTRIBUTING.md +0 -168
  19. package/PERSONAS.md +0 -156
  20. package/core/instructions/baseline.instructions.md +0 -133
  21. package/core/instructions/security.instructions.md +0 -186
  22. package/core/personas/code-reviewer/SKILL.md +0 -175
  23. package/core/personas/security-reviewer/SKILL.md +0 -233
  24. package/core/personas/test-data-expert/SKILL.md +0 -234
  25. package/core/personas/test-generator/SKILL.md +0 -262
  26. package/core/traits/audit-trail.md +0 -182
  27. package/core/traits/confidence-signaling.md +0 -172
  28. package/core/traits/critical-thinking.md +0 -129
  29. package/core/traits/schema-awareness.md +0 -132
  30. package/core/traits/source-citation.md +0 -174
  31. package/core/traits/structured-output.md +0 -199
  32. package/docs/ci-cd-automation.md +0 -548
  33. package/docs/claude-code-reference/README.md +0 -21
  34. package/docs/claude-code-reference/agentboot-coverage.md +0 -484
  35. package/docs/claude-code-reference/feature-inventory.md +0 -906
  36. package/docs/cli-commands-audit.md +0 -112
  37. package/docs/cli-design.md +0 -924
  38. package/docs/concepts.md +0 -1117
  39. package/docs/config-schema-audit.md +0 -121
  40. package/docs/configuration.md +0 -645
  41. package/docs/delivery-methods.md +0 -758
  42. package/docs/developer-onboarding.md +0 -342
  43. package/docs/extending.md +0 -448
  44. package/docs/getting-started.md +0 -298
  45. package/docs/knowledge-layer.md +0 -464
  46. package/docs/marketplace.md +0 -822
  47. package/docs/org-connection.md +0 -570
  48. package/docs/plans/architecture.md +0 -2429
  49. package/docs/plans/design.md +0 -2018
  50. package/docs/plans/prd.md +0 -1862
  51. package/docs/plans/stack-rank.md +0 -261
  52. package/docs/plans/technical-spec.md +0 -2755
  53. package/docs/privacy-and-safety.md +0 -807
  54. package/docs/prompt-optimization.md +0 -1071
  55. package/docs/test-plan.md +0 -972
  56. package/docs/third-party-ecosystem.md +0 -496
  57. package/domains/compliance-template/README.md +0 -173
  58. package/domains/compliance-template/traits/compliance-aware.md +0 -228
  59. package/examples/enterprise/agentboot.config.json +0 -184
  60. package/examples/minimal/agentboot.config.json +0 -46
  61. package/tests/REGRESSION-PLAN.md +0 -705
  62. package/tests/TEST-PLAN.md +0 -111
  63. package/tests/cli.test.ts +0 -705
  64. package/tests/pipeline.test.ts +0 -608
  65. package/tests/validate.test.ts +0 -278
  66. package/tsconfig.json +0 -62
@@ -1,807 +0,0 @@
# Privacy, Safety & the Prompt Confidentiality Model

How AgentBoot balances organizational learning with individual psychological safety.
This is a philosophy document first, a technical design second.

---

## The Tension

Organizations need to optimize prompts. That requires data — what developers are
asking, how personas respond, where they succeed and fail. Without this data,
prompt optimization is guesswork.

But developers need psychological safety. The path from "I don't know how this works"
to "I understand it deeply" is paved with embarrassing questions, false starts, and
wrong turns. If developers believe their every interaction is being watched, judged,
and reported, they will:

1. **Stop asking questions.** They'll pretend to know things they don't.
2. **Stop experimenting.** They'll stick to safe, known prompts.
3. **Stop trusting the tool.** AI becomes a surveillance instrument, not a partner.
4. **Game the metrics.** They'll optimize for looking smart, not for getting help.

This kills the entire value proposition. An AI persona system that makes developers
afraid to use it is worse than no system.

---

## The PR Analogy

Every developer understands this distinction intuitively:

| Your IDE | Your PR |
|----------|---------|
| Private | Public |
| Messy | Clean |
| Full of false starts | Only the final result |
| You talk to yourself | You present to the team |
| "Wait, how does this work again?" | "Implemented X using Y pattern" |
| No judgment | Reviewed by peers |

**Nobody reviews your IDE history.** Nobody sees the 47 times you typed something,
deleted it, and tried again. Nobody sees the Stack Overflow tabs. Nobody knows you
asked the AI "what is a mutex" for the third time this month.

The PR is the artifact. The IDE is the workshop. The workshop is private.

**AgentBoot must apply the same principle to AI interactions.** The persona's output
(findings, reviews, generated code) is the PR — visible, reviewable, measurable.
The developer's prompts and conversation are the IDE — private, protected, not
reported.

---

## The Confidentiality Model

### Three Tiers of Data

```
┌─────────────────────────────────────────────────────────┐
│ Tier 1: PRIVATE (Developer's Workshop)                  │
│                                                         │
│ - Raw prompts typed by the developer                    │
│ - Conversation history with AI                          │
│ - Questions asked ("what does this function do?")       │
│ - False starts and deleted attempts                     │
│ - Session transcripts                                   │
│ - Files read during exploration                         │
│                                                         │
│ WHO SEES THIS: The developer. No one else.              │
│ WHERE IT LIVES: Developer's machine only.               │
│ RETENTION: Session duration (or developer's choice).    │
│                                                         │
├─────────────────────────────────────────────────────────┤
│ Tier 2: PRIVILEGED (Non-Human Analysis)                 │
│                                                         │
│ - Aggregated patterns extracted by LLM analysis         │
│ - "Developers frequently ask about auth patterns"       │
│ - "The security reviewer's false positive rate is 34%"  │
│ - "Average prompt length is increasing over time"       │
│ - Token usage and cost (anonymized)                     │
│                                                         │
│ WHO SEES THIS: The developer first. Then aggregated     │
│ anonymized insights shared with the org. Never raw      │
│ prompts, never attributed to individuals.               │
│ WHERE IT LIVES: Local analysis → anonymized aggregate.  │
│                                                         │
├─────────────────────────────────────────────────────────┤
│ Tier 3: ORGANIZATIONAL (Persona Output)                 │
│                                                         │
│ - Review findings posted to PRs                         │
│ - Generated test files committed to repos               │
│ - Compliance audit logs (required by policy)            │
│ - Persona invocation counts (not who, just how many)    │
│ - Persona effectiveness metrics (aggregate)             │
│                                                         │
│ WHO SEES THIS: The team, the org, compliance.           │
│ WHERE IT LIVES: PR comments, repos, telemetry.          │
│ RETENTION: Org's data retention policy.                 │
│                                                         │
└─────────────────────────────────────────────────────────┘
```

### The Key Principle

**AgentBoot will never collect, transmit, or surface raw developer prompts.**
The organization gets aggregate patterns, not transcripts. AgentBoot will not build
features that exfiltrate developer prompts to organizational dashboards, managers,
or analytics pipelines.

This is not optional or configurable. It's a design invariant.

### The Honest Caveat

AgentBoot's privacy commitment covers **what AgentBoot does**. It does not and cannot
override what the API provider or the organization does independently:

- **Anthropic's Compliance API** (Enterprise plan) gives org admins programmatic
  access to conversation content for regulatory compliance and auditing. This is an
  Anthropic feature, not an AgentBoot feature. AgentBoot neither enables nor prevents
  it.
- **Enterprise data exports** allow Primary Owners to request conversation data.
- **Network-level monitoring** (DLP, proxy logging) can capture API traffic regardless
  of any application-level privacy design.

AgentBoot's position: **we will not be the tool that does this.** If an org wants to
monitor developer AI interactions, that capability exists through Anthropic's
Compliance API and enterprise data governance tools. AgentBoot's role is prompt
optimization through aggregate, anonymized metrics — not surveillance. The
distinction matters for developer trust: "AgentBoot doesn't report your prompts"
is a meaningful promise even if the org has other channels.

Developers should understand that their prompts go to the Claude API (which their
org may have compliance access to), the same way they understand that their Slack
messages go to Slack's servers (which their org admin can export). The privacy
boundary AgentBoot enforces is between the developer and **AgentBoot's analytics** —
not between the developer and the universe.

---

## Privileged Analysis: The `/insights` Model

The challenge: how do you extract optimization value from private data without
exposing it?

**Answer: a non-human intermediary analyzes private data and outputs only aggregate,
anonymized insights.**

### The Trust Boundary

The developer already trusts Anthropic's API with their prompts — that's what happens
every time they type in Claude Code. The `/insights` analysis uses that **same trust
boundary** (a Claude API call via Haiku or Sonnet). It's not a new data flow — it's
another API call using the developer's existing auth.

What the developer is protected from is their **employer/org** seeing their raw prompts.
The privacy boundary is between the developer and the organization, not between the
developer and the API provider.

```
Developer → Claude API (already trusted, already happening)
        │
        ▼
  /insights analysis
  (pattern extraction via Haiku/Sonnet)
        │
        ▼
  Developer sees insights FIRST
        │
        ▼ (developer approves)
  Anonymized aggregate → Org Dashboard
```

There is no local LLM requirement. No new infrastructure. The same API the developer
uses for coding is used for insights analysis.

### How It Works

```
Developer's Machine
┌────────────────────────────────────┐
│                                    │
│  Session transcripts               │
│  Raw prompts                       │
│  Conversation history              │
│         │                          │
│         ▼                          │
│  ┌──────────────────────────┐      │
│  │ /insights skill          │      │
│  │ (sends transcripts to    │      │
│  │ Claude API — same as     │      │
│  │ any other prompt —       │      │
│  │ extracts patterns)       │      │
│  └──────────┬───────────────┘      │
│             │                      │
│             ▼                      │
│  ┌──────────────────────────┐      │
│  │ Developer Review         │      │
│  │ (developer sees the      │      │
│  │ insights FIRST and       │      │
│  │ approves what gets       │      │
│  │ shared)                  │      │
│  └──────────┬───────────────┘      │
│             │                      │
└─────────────┼──────────────────────┘
              │ (approved insights only)
              ▼
   ┌──────────────────────────┐
   │ Org Aggregate Dashboard  │
   │ (anonymized patterns     │
   │ from all developers)     │
   └──────────────────────────┘
```

### What the Developer Sees (`/insights`)

```
$ /insights

Personal Prompt Insights (last 7 days)
──────────────────────────────────────

Sessions: 23
Total prompts: 187
Avg prompt length: 42 words
Most-used personas: code-reviewer (34), gen-tests (28), security-reviewer (12)

Patterns:
  - You frequently ask about authentication patterns (12 times).
    → Consider: the auth-patterns skill has this context built in.
  - Your security reviews take 2.3x longer than average.
    → This is likely because you review larger diffs. Consider
      splitting large PRs.
  - You often rephrase the same question when the first answer
    isn't useful.
    → The code-reviewer persona has a 23% rephrase rate for you.
      This suggests the persona's output format may not match
      your expectations. Consider filing feedback.

Cost: $14.20 this week (vs. $18.50 team average)

──────────────────────────────────────

Share anonymized insights with your team? [y/N]
(This shares PATTERNS only, never your actual prompts)
```

### What the Org Sees (Aggregate Dashboard)

```
Org Prompt Insights (last 30 days)
──────────────────────────────────

Active developers: 47 / 52 (90% adoption)
Total persona invocations: 12,400
Total cost: $8,200

Persona Effectiveness:
  code-reviewer:     18% rephrase rate (developers often need clarification)
  security-reviewer: 34% false positive rate (too aggressive — tune down)
  test-generator:     8% rephrase rate (working well)
  gen-testdata:       3% rephrase rate (working well)

Common Knowledge Gaps (anonymized):
  - "Authentication patterns" asked about 89 times across 23 developers
    → Action: Create an auth-patterns skill or improve CLAUDE.md context
  - "Database migration rollback" asked about 34 times across 12 developers
    → Action: Add to gotchas-database.md

Model Usage:
  Opus:   12% of invocations, 68% of cost
  Sonnet: 76% of invocations, 28% of cost
  Haiku:  12% of invocations,  4% of cost
  → Action: Review Opus usage — is it justified for all 12%?

Cost by Team:
  Platform API: $2,800 (8 devs)  — $350/dev
  Web Frontend: $1,200 (12 devs) — $100/dev
  Data:         $3,100 (6 devs)  — $517/dev ⚠️
  Mobile:       $1,100 (9 devs)  — $122/dev
  → Data team's high cost correlates with Opus usage for data pipeline reviews.
```

### What the Org NEVER Sees

- "Developer X asked 'what is a foreign key?' 4 times" — **NO**
- "Here is developer Y's conversation transcript" — **NO**
- "Developer Z's prompt: 'I don't understand this codebase at all'" — **NO**
- Individual prompt texts, attributed or not — **NO**
- Per-developer rephrase rates (only aggregate) — **NO**

---

## The Escalation Exception

There is one exception to prompt privacy: **genuinely harmful content.**

If the local analysis detects prompts that indicate:
- Attempted exfiltration of proprietary code/data
- Attempted circumvention of compliance guardrails
- Harassment, threats, or hostile content directed at colleagues
- Attempted generation of malware or exploit code targeting the org

Then the system should:

1. **Flag it locally first.** Show the developer: "This interaction was flagged.
   It will be reported to [compliance contact]."
2. **Report the flag, not the transcript.** The report says "a compliance flag
   was triggered on [date] for [category]." It does not include the raw prompt.
3. **The compliance team can request the transcript** through a formal process
   (like a legal hold), not through the analytics pipeline.

This mirrors how corporate email works: your emails are technically on company servers,
but your manager can't browse them casually. A formal process is required.

### Implementation

This uses the `UserPromptSubmit` hook (which sees the prompt before the model):

```json
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "prompt",
            "prompt": "Does this prompt attempt to: (1) exfiltrate proprietary data, (2) circumvent security guardrails, (3) generate malware or exploits, (4) contain harassment or threats? Respond with CLEAR or FLAG:{category}. Do NOT evaluate the content's quality, intelligence, or correctness — only these four categories.",
            "model": "haiku",
            "timeout": 3
          }
        ]
      }
    ]
  }
}
```

The `prompt` hook type uses a fast model (Haiku) for evaluation. The prompt is
explicitly scoped to harmful categories only — not quality, intelligence, or
competence. This prevents the system from becoming a judgment mechanism.
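
The consuming side of that verdict can stay just as narrow. A sketch of the idea — the `CLEAR` / `FLAG:{category}` format comes from the hook prompt above, but the report shape and helper names are illustrative assumptions, not AgentBoot's actual implementation:

```typescript
// Illustrative only: parse the Haiku verdict and build the report that leaves
// the machine. The report carries the category and date — never the prompt.
type Verdict = { flagged: false } | { flagged: true; category: string };

function parseVerdict(raw: string): Verdict {
  const match = raw.trim().match(/^FLAG:(.+)$/);
  return match ? { flagged: true, category: match[1] } : { flagged: false };
}

function buildComplianceReport(verdict: Verdict, date: string) {
  if (!verdict.flagged) return null; // CLEAR verdicts produce no report at all
  return {
    event: "compliance_flag",
    category: verdict.category, // e.g. "malware"
    date,                       // when the flag was triggered
    // Deliberately absent: the prompt text, the transcript, any content.
  };
}

const report = buildComplianceReport(parseVerdict("FLAG:malware"), "2026-03-19");
```

The transcript itself is only reachable through the separate formal process described above, never through this code path.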

---

## Two Types of Prompts, Two Privacy Models

There are two fundamentally different types of prompts in AgentBoot. They have
different privacy models because they have different "submit" boundaries.

### Type 1: Persona Definitions (SKILL.md, traits, instructions)

These are code. They live in the personas repo. They go through PRs. The standard
local-first → CI-gate model applies:

| Tool | Local (private) | CI (visible after PR) |
|------|----------------|----------------------|
| `agentboot lint` | Full detail: which rules failed, where, why | Pass/fail + error count |
| `agentboot test` | Full output: expected vs. actual | Pass/fail summary |
| `agentboot cost-estimate` | Per-persona cost projection | Not run in CI |

**"Submit" = opening the PR to the personas repo.** Before that, iterate privately.
After that, CI validation and team review are fair game — just like code.

### Type 2: Developer Prompts (conversations with Claude Code)

These are conversations. **They have no submit moment.** There is no PR for
"explain this function" or "I don't understand this codebase."

These are **always private.** The only thing that crosses the private→public
boundary is what the developer **chooses to publish**: a PR comment, committed
code, a filed issue. The conversation that produced that output stays private.

| Tool | What the developer sees | What the org sees |
|------|------------------------|-------------------|
| `/insights` | Personal patterns and suggestions | Nothing (unless developer opts in to share anonymized aggregate) |
| Telemetry | N/A (developer doesn't see telemetry) | Persona invocation counts, cost, findings — no prompts, no developer IDs |

**There is no "after submit" state for developer prompts.** They are always in
the private zone. AgentBoot's optimization tools for developer prompts operate
on aggregates and patterns extracted via `/insights` — never on the prompts
themselves, and never visible to the org unless the developer explicitly opts in.

See [`docs/prompt-optimization.md`](prompt-optimization.md#two-types-of-prompts-two-different-models)
for how each optimization tool maps to these two types.

---

## Building a Learning Culture, Not a Surveillance Culture

### What AgentBoot Should Do

**Normalize asking questions:**
- The SME discoverability fragment says "Ask me anything about [domain]"
- Persona output never says "you should have known this"
- The `/insights` skill frames knowledge gaps as opportunities, not failures

**Celebrate improvement, not perfection:**
- `/insights` shows "Your rephrase rate dropped from 28% to 15% — your prompts
  are getting more effective" — private, to the developer only
- Team metrics show "code review rephrase rate dropped 8% this month" — no
  individual attribution

**Make prompt quality a shared responsibility:**
- When the org sees "auth patterns asked about 89 times," the action item is
  "improve the auth documentation," not "find out who doesn't know auth"
- High rephrase rates are a **persona quality problem**, not a developer
  intelligence problem. "Developers need to rephrase 23% of the time" means
  the persona's output is unclear, not that developers are unclear.

**Provide safe spaces to learn:**
- Personal skills (`~/.claude/skills/`) are private
- User-level CLAUDE.md is private
- Session history is on the developer's machine
- `/insights` is opt-in for sharing

### What AgentBoot Must NOT Do

- **Never surface individual developer prompts** to anyone other than that developer
- **Never rank developers** by prompt quality, question frequency, or AI usage
- **Never gamify** — no leaderboards, badges, or "prompt of the week"
- **Never shame** — no "your prompts are below team average" messages
- **Never correlate** AI usage with performance reviews
- **Never make AI usage mandatory** — skeptics opt out without penalty

---

## Technical Architecture

### What Gets Collected (Telemetry — Tier 3)

The audit trail hooks collect only persona output metrics:

```json
{
  "event": "persona_invocation",
  "persona_id": "code-reviewer",
  "timestamp": "2026-03-19T14:30:00Z",
  "model": "sonnet",
  "input_tokens": 8400,
  "output_tokens": 3200,
  "duration_ms": 45000,
  "cost_usd": 0.089,
  "findings_count": { "CRITICAL": 0, "ERROR": 1, "WARN": 3, "INFO": 2 },
  "scope": "team:platform/api"
}
```

**Note what's absent:** No developer ID. No prompt text. No conversation content.
No file paths read. The telemetry is about the **persona**, not the developer.

If the org needs to know adoption by team (not individual), the `scope` field
provides that without identifying who within the team invoked the persona.
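
An emitter can make that absence a structural guarantee rather than a convention. A sketch — the allowlist mirrors the event above, but the helper itself is illustrative, not AgentBoot's actual emitter:

```typescript
// Illustrative sketch: build a Tier 3 telemetry event from an explicit
// allowlist, so prompt text or developer identity cannot leak in by accident.
const TELEMETRY_FIELDS = new Set([
  "event", "persona_id", "timestamp", "model", "input_tokens",
  "output_tokens", "duration_ms", "cost_usd", "findings_count", "scope",
]);

function toTelemetryEvent(raw: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(raw)) {
    if (TELEMETRY_FIELDS.has(key)) out[key] = value; // anything else is dropped
  }
  return out;
}

const sample = toTelemetryEvent({
  event: "persona_invocation",
  persona_id: "code-reviewer",
  scope: "team:platform/api",
  prompt: "why does this auth check fail?", // never emitted
  developer_email: "alice@acme.com",        // never emitted
});
```

Fields not on the allowlist never reach the emitted event, which is the design intent: privacy by construction, not by discipline.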

### What Stays Local (Private — Tier 1)

- Claude Code session transcripts: `~/.claude/projects/{project}/{sessionId}/`
- Auto memory: `~/.claude/projects/{project}/memory/`
- Agent memory: `.claude/agent-memory-local/` (gitignored)
- Local settings: `.claude/settings.local.json` (gitignored)

AgentBoot does not read, transmit, or reference these. They are Claude Code's
native private storage.

### What Gets Analyzed Locally (Privileged — Tier 2)

The `/insights` skill (or `agentboot insights`) runs as a normal Claude API call:

1. Reads local session transcripts (Tier 1 data)
2. Sends them to Claude API for pattern extraction (Haiku for speed/cost, same
   trust boundary the developer already uses for every prompt)
3. Presents insights to the developer (private — only the developer sees them)
4. Developer optionally approves sharing anonymized patterns (Tier 3)

No new data flow is created. The developer already sends prompts to the Claude API
every time they use Claude Code. The `/insights` analysis is just another API call.

The analysis prompt is explicitly designed to extract patterns, not judge:

```markdown
Analyze these session transcripts and extract:
1. Most frequently asked topics (not the questions themselves)
2. Persona rephrase rate (how often the developer re-asks in different words)
3. Knowledge gaps (topics where the developer asks the same type of question repeatedly)
4. Persona friction points (where the persona's output consistently doesn't match expectations)

Do NOT:
- Quote any developer prompt
- Judge the quality or intelligence of any question
- Identify specific knowledge deficiencies
- Produce output that could embarrass the developer if shared

Frame everything as PERSONA improvement opportunities, not developer deficiencies.
```
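
For intuition, the rephrase-rate metric that prompt asks for could also be approximated mechanically. A rough sketch, assuming transcripts are available as an ordered list of prompt strings — the word-overlap heuristic and the 0.5 threshold are arbitrary illustrative choices, not AgentBoot's algorithm:

```typescript
// Rough illustration: two consecutive prompts that share most of their words
// are counted as a rephrase. Overlap measure and threshold are sketch choices.
function wordOverlap(a: string, b: string): number {
  const wordsA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const wordsB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  if (wordsA.size === 0 || wordsB.size === 0) return 0;
  let shared = 0;
  for (const w of wordsA) if (wordsB.has(w)) shared++;
  return shared / Math.min(wordsA.size, wordsB.size);
}

function rephraseRate(prompts: string[], threshold = 0.5): number {
  if (prompts.length < 2) return 0;
  let rephrases = 0;
  for (let i = 1; i < prompts.length; i++) {
    if (wordOverlap(prompts[i - 1], prompts[i]) >= threshold) rephrases++;
  }
  return rephrases / (prompts.length - 1);
}
```

Note that even this mechanical version outputs only a number, never the prompts it was computed from — the same output contract the LLM analysis prompt enforces.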

### Configuration

```jsonc
{
  "privacy": {
    "telemetry": {
      "enabled": true,
      "includeDevId": false,        // Default: no developer identity
      "devIdFormat": "hashed",      // If includeDevId: true → "hashed" (anonymous) or "email" (attributed)
      "includeCost": true,          // Cost tracking
      "includeScope": true,         // Team-level attribution
      "destination": "local"        // "local" = NDJSON file; "http" = webhook
    },
    "insights": {
      "enabled": true,
      "autoShareAnonymized": false, // Developer must opt in to share
      "escalation": {
        "enabled": true,
        "categories": ["exfiltration", "guardrail-circumvention", "malware", "harassment"],
        "contact": "security@acme-corp.com"
      }
    },
    "rawPrompts": {
      "collect": false,       // AgentBoot does not collect raw prompts
      "transmit": false,      // AgentBoot does not transmit raw prompts
      "surfaceToOrg": false   // AgentBoot does not surface raw prompts to org dashboards
    }
  }
}
```

The `rawPrompts` section has three `false` fields that cannot be set to `true`.
They exist in the schema to make AgentBoot's design intent explicit.

Note: these fields control what **AgentBoot** does. They do not (and cannot) control
what the API provider (Anthropic) offers through its own Compliance API or what the
org does through network-level monitoring. See "The Honest Caveat" above.
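
One way a loader could enforce a "cannot be set to `true`" invariant at config-load time. This is a hypothetical sketch — the function name and error wording are not AgentBoot's actual code, only the field names mirror the schema above:

```typescript
// Sketch: reject any config that tries to flip the rawPrompts invariants.
type RawPromptsConfig = { collect?: boolean; transmit?: boolean; surfaceToOrg?: boolean };

function assertRawPromptsInvariant(rawPrompts: RawPromptsConfig = {}): void {
  for (const field of ["collect", "transmit", "surfaceToOrg"] as const) {
    if (rawPrompts[field] === true) {
      throw new Error(
        `privacy.rawPrompts.${field} is a design invariant and cannot be set to true`,
      );
    }
  }
}

assertRawPromptsInvariant({ collect: false });        // passes
// assertRawPromptsInvariant({ surfaceToOrg: true }); // would throw
```

Failing loudly at load time, rather than silently ignoring the value, keeps the invariant visible to anyone who tries to change it.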

---

## The Org Owner's Perspective: Measuring ROI Without Surveillance

The privacy model protects developers. But an org owner has a legitimate duty to
measure return on investment, identify who's getting value from the tooling, and
ensure the investment is justified. These aren't surveillance impulses — they're
fiduciary responsibilities.

The question isn't "can I read their prompts?" (you respect that boundary). The
question is: **what metrics can I get that tell me who's thriving, who needs help,
and who's not engaging — without seeing what they type?**

### The Right Analogy: Measuring Code Output, Not Keystrokes

You already measure developer effectiveness without watching them type:
- You see PR throughput, not how many times they hit backspace
- You see test pass rates, not how many times they ran tests locally
- You see bug escape rates, not their Stack Overflow search history
- You see sprint velocity, not their IDE open hours

Apply the same model to AI usage. Measure **outputs and outcomes**, not inputs
and conversations.

### Metrics the Org CAN See (Without Violating Privacy)

#### Tier A: Usage Metrics (From Telemetry — No Developer IDs Required)

These measure whether the investment is being used at all.

| Metric | What it tells you | How it's collected |
|--------|------------------|-------------------|
| **Seats active / seats licensed** | Adoption rate | API key usage (Anthropic Console) |
| **Sessions per day (org-wide)** | Overall engagement | Telemetry aggregate |
| **Persona invocations per day** | Which personas deliver value | SubagentStart/Stop hooks |
| **Cost per team per month** | Budget tracking | Telemetry `scope` field |
| **Model mix** (% Haiku/Sonnet/Opus) | Cost efficiency | Telemetry `model` field |

These are anonymous by default. You know "the platform team ran 340 code reviews
this month" — not which individual ran them.
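
The team-level rollup needs nothing beyond the `scope` field. A sketch — the event shape follows the telemetry example earlier, while the helper itself is illustrative:

```typescript
// Sketch: aggregate per-event telemetry to team totals using only `scope`.
// No developer identity is present in the input, so none can appear in the output.
interface TelemetryEvent { scope: string; cost_usd: number }

function costByTeam(events: TelemetryEvent[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    totals.set(e.scope, (totals.get(e.scope) ?? 0) + e.cost_usd);
  }
  return totals;
}

const totals = costByTeam([
  { scope: "team:platform/api", cost_usd: 0.089 },
  { scope: "team:platform/api", cost_usd: 0.121 },
  { scope: "team:web/frontend", cost_usd: 0.05 },
]);
```

The same fold works for invocation counts or model mix: group by `scope`, sum or count, and the dashboard numbers above fall out.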

#### Tier B: Outcome Metrics (From Artifacts — Naturally Attributed)

These measure whether AI usage produces better results. They come from the artifacts
developers **choose to publish** (PRs, commits, deployments) — not from AI conversations.

| Metric | What it tells you | Source |
|--------|------------------|--------|
| **PR review turnaround** | Speed of code review | GitHub/GitLab API |
| **Findings-to-fix ratio** | Are persona findings getting fixed? | PR comment resolution data |
| **Bug escape rate** | Bugs in prod that a persona should have caught | Incident tracking |
| **Test coverage delta** | Did test generation personas increase coverage? | CI coverage reports |
| **PR rejection rate** | Are PRs getting better before review? | Git/PR data |
| **Time to first commit** (new hires) | Is onboarding faster? | Git history |
| **Compliance audit pass rate** | Are guardrails working? | Compliance tooling |

These metrics are naturally tied to individuals because PRs are attributed. But
they measure the **outcome** (the code), not the **process** (the conversation).
This is exactly how engineering management already works.

#### Tier C: Individual Usage Metrics (Opt-In or Policy-Declared)

Here's where it gets nuanced. Some orgs need per-developer usage data for cost
allocation, license justification, or identifying who needs training. AgentBoot
can support this **if the org explicitly configures it and communicates the policy.**

```jsonc
{
  "privacy": {
    "telemetry": {
      "includeDevId": true,    // ⚠️ Opt-in: org must set this explicitly
      "devIdFormat": "hashed"  // "hashed" = anonymized ID; "email" = real identity
    }
  }
}
```

When `includeDevId` is `true`, telemetry includes a developer identifier. The org
chooses the format:

| Format | What the org sees | Use case |
|--------|------------------|----------|
| `false` (default) | No developer identity | Privacy-first (recommended) |
| `"hashed"` | Consistent anonymous ID (same person = same hash, but not reversible to a name) | Usage patterns without names — "developer X7f3a uses 3x more Opus than average" |
| `"email"` | Real developer email | Full attribution — requires clear communication to the team |

**AgentBoot's recommendation: start with `false` or `"hashed"`.** Full attribution
should only be enabled when the org has communicated the policy to the team and
explained why. Surprise surveillance destroys trust. Announced measurement builds
accountability.
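
What `"hashed"` implies can be sketched with a keyed hash: the same developer always maps to the same ID, but dashboard consumers cannot reverse it. The salt handling and 8-character truncation here are assumptions for illustration, not AgentBoot's actual scheme:

```typescript
import { createHmac } from "node:crypto";

// Sketch: a keyed (HMAC) hash yields a consistent, non-reversible developer ID.
// The salt would live in a restricted-access system, never in the dashboard.
function hashedDevId(email: string, orgSalt: string): string {
  return createHmac("sha256", orgSalt)
    .update(email.trim().toLowerCase())
    .digest("hex")
    .slice(0, 8); // short, dashboard-friendly ID
}

const id1 = hashedDevId("Alice@acme.com", "org-secret");
const id2 = hashedDevId("alice@acme.com", "org-secret");
```

Using an HMAC rather than a plain hash matters: without the secret salt, anyone holding the telemetry could rebuild the email-to-hash mapping by hashing the company directory.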
623
-
624
- ### What Each Format Gives the Org
625
-
626
- **`includeDevId: false`** (default — no individual tracking):
627
-
628
- ```
629
- Org Dashboard:
630
- Platform team: 47 sessions/day, $2,800/mo, 340 reviews
631
- Web team: 31 sessions/day, $1,200/mo, 180 reviews
632
- Data team: 22 sessions/day, $3,100/mo, 95 reviews ⚠️ high cost/session
633
- ```
634
-
635
- You know team-level patterns. You don't know individuals. This is sufficient
636
- for budget tracking and persona effectiveness. It's NOT sufficient for
637
- per-developer usage analysis.
638
-
**`includeDevId: "hashed"`** (anonymous individual tracking):

```
Org Dashboard:
  Developer a3f2... : 12 sessions/day, $14/day,   85% persona usage
  Developer 7b1c... :  8 sessions/day, $9/day,    72% persona usage
  Developer e4d8... :  1 session/day,  $0.80/day, 15% persona usage  ⚠️
  Developer 9a0f... :  0 sessions in 14 days  ⚠️
```

You see usage patterns and can identify outliers — but you can't see WHO they are.
The hash is consistent (same person, same hash), so you can track trends over time.
But you need a separate process to resolve the hash to a person if needed (the
mapping exists only in a restricted-access lookup table).

This is the sweet spot for most orgs. You can answer "are people using the tools?"
and "is anyone spending way too much?" without creating a name-and-shame dynamic.

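The "consistent but not reversible" property can be sketched with a keyed hash. Everything below is illustrative: the secret name, function name, and 8-character truncation are assumptions, not AgentBoot's actual implementation (the restricted lookup table described above could equally back a random ID).

```typescript
import { createHmac } from "node:crypto";

// Hypothetical sketch — not AgentBoot's actual code. An org-held secret keys the
// hash, so the same email always yields the same ID, but the ID cannot be
// reversed to an email without the secret.
const ORG_SECRET = process.env.AGENTBOOT_ORG_SECRET ?? "example-only-secret";

function hashedDevId(email: string): string {
  return createHmac("sha256", ORG_SECRET)
    .update(email.trim().toLowerCase()) // normalize: Alice@acme.com == alice@acme.com
    .digest("hex")
    .slice(0, 8); // short, dashboard-friendly ID like "a3f2..."
}
```

With a plain (unkeyed) hash, anyone could precompute hashes of known company emails and deanonymize the IDs; keying the hash with a secret held only by the restricted-access process prevents that.
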
**`includeDevId: "email"`** (full attribution):

```
Org Dashboard:
  alice@acme.com : 12 sessions/day, $14/day,   85% persona usage, top reviewer
  bob@acme.com   :  8 sessions/day, $9/day,    72% persona usage
  carol@acme.com :  1 session/day,  $0.80/day, 15% persona usage  ⚠️
  dave@acme.com  :  0 sessions in 14 days  ⚠️
```

Full visibility. This is legitimate for cost allocation (chargeback to teams),
license optimization (reassigning unused seats), and identifying training needs.
But it MUST be communicated to the team in advance: "We track AI tool usage
the same way we track cloud resource usage — per developer, for cost management."

### Interpreting Usage Patterns

**Usage alone is a bad metric.** A developer with 0 AI sessions might be a veteran
who writes great code without AI, someone who doesn't know the tools exist (an
onboarding gap), or someone who tried it and didn't find value (a persona quality
issue). Low usage is a signal to investigate, not a judgment.

**Outcome metrics are what matter.** Combine AI usage data with the metrics you
already track:

| Signal | Effective Usage | Low Adoption | Adoption Without Structure |
|--------|----------------|-------------|---------------------------|
| PR throughput | High, consistent | Varies | Inconsistent |
| AI persona usage | Moderate-high, varied personas | Zero or near-zero | High sessions but low persona usage |
| Findings-to-fix ratio | High (acts on review findings) | N/A (no reviews) | Low (ignores findings) |
| Cost efficiency | Moderate cost per PR | $0 (no AI) | High (lots of rephrasing, exploration) |
| Bug escape rate | Low | Varies | Medium |

**The "adoption without structure" pattern is the most actionable.** A developer
with a high session count but low persona usage is spending time and money talking to
AI without the structure that personas provide. The right response is training
and better onboarding — improving the `/learn` skill, persona discoverability, and
prompting tips.

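The three columns above could be operationalized as a simple classifier over per-developer aggregates. A sketch — the field names, thresholds, and function name are invented for illustration, not something AgentBoot ships:

```typescript
// Illustrative only — field names and thresholds are assumptions, not AgentBoot's.
interface DevUsage {
  sessionsPerDay: number; // from aggregate telemetry
  personaRate: number;    // fraction of sessions invoking at least one persona
}

type Pattern = "effective" | "low-adoption" | "adoption-without-structure";

function classify(u: DevUsage): Pattern {
  if (u.sessionsPerDay < 0.5) return "low-adoption";            // near-zero usage
  if (u.personaRate < 0.3) return "adoption-without-structure"; // busy, but unstructured
  return "effective";
}
```

A real analysis would weight the outcome signals from the table (PR throughput, findings-to-fix ratio, bug escape rate) rather than usage counts alone; this sketch only captures the two usage columns.
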
### Recommended Dashboard for Org Owners

```
AgentBoot Org Dashboard (Monthly)
─────────────────────────────────

Investment Summary:
  Total AI spend:         $8,200 (52 developers)
  Avg spend/developer:    $158/mo
  Median spend/developer: $120/mo
  Top quartile:           $280/mo
  Bottom quartile:        $40/mo

ROI Indicators:
  PR review turnaround: -34% (faster since deployment)
  Bug escape rate:      -22% (fewer prod bugs)
  Test coverage:        +15% (from test generation personas)
  Onboarding time:      -40% (new hires productive faster)

Adoption:
  Active seats:        47/52 (90%)
  Daily active users:  38 (73%)
  Weekly active users: 45 (87%)
  Persona usage rate:  68% of sessions invoke at least one persona

Cost Efficiency:
  Opus usage: 12% of invocations, 68% of cost
  → Recommendation: audit Opus usage for model downgrade candidates

Team Breakdown:
  Platform (8 devs):   $2,800 — $350/dev — highest value (most reviews)
  Web (12 devs):       $1,200 — $100/dev — moderate, mostly test gen
  Data (6 devs):       $3,100 — $517/dev — ⚠️ investigate high cost
  Mobile (9 devs):     $1,100 — $122/dev — healthy
  Unassigned (5 devs): $0 — ⚠️ not configured or not using

Attention Items:
  ⚠ 5 licensed developers with zero usage in 30 days
    → Action: check onboarding status, offer training
  ⚠ Data team cost is 3x average
    → Action: review model selection (likely Opus overuse)
  ⚠ 32% of sessions don't use any persona
    → Action: improve persona discoverability (SME fragment, /prompting-tips)
```

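The Cost Efficiency line is the kind of aggregate that can be computed without any conversation content. A sketch, with an invented event shape (the interface and function name are assumptions, not AgentBoot's telemetry schema):

```typescript
// Illustrative aggregate — the event shape is an assumption for this sketch.
interface ModelInvocation {
  model: "opus" | "sonnet" | "haiku";
  costUsd: number;
}

// Returns the share of invocations and the share of total cost attributable to
// one model, e.g. "Opus: 12% of invocations, 68% of cost" in the dashboard above.
function modelShare(events: ModelInvocation[], model: ModelInvocation["model"]) {
  const totalCost = events.reduce((sum, e) => sum + e.costUsd, 0);
  const matching = events.filter((e) => e.model === model);
  const matchingCost = matching.reduce((sum, e) => sum + e.costUsd, 0);
  return {
    invocationShare: matching.length / events.length,
    costShare: matchingCost / totalCost,
  };
}
```

Note the input carries only model names and costs; nothing in the computation needs, or could leak, prompt text.
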
### What This Dashboard Does NOT Show

- Individual developer prompts or conversations
- Individual developer rephrase rates or question topics
- Ranking of developers by AI skill or prompt quality
- Which developers asked "dumb" questions
- Session transcripts or conversation excerpts

The dashboard shows **investment metrics** (cost, adoption, ROI) and **outcome
metrics** (PR quality, bug rates, coverage). It never shows **process metrics**
(what developers typed, how many times they rephrased, what they asked about).

### The Escalation Path for Outliers

When the dashboard shows an outlier (5 developers with zero usage, the data team at
3x cost), the response flows through **management**, not through AgentBoot:

1. **Zero usage:** The manager has a conversation: "Hey, we invested in this tooling.
   Want me to set up a 15-minute walkthrough?" — not "the dashboard shows you
   haven't used AI."
2. **High cost:** The manager reviews with the team: "Our team's AI spend is 3x the
   org average. Let's look at which personas we're running on Opus and whether
   Sonnet would work." — not "Alice spent $40 yesterday."
3. **Low persona adoption:** The platform team improves discoverability: a better
   CLAUDE.md fragment, a `/prompting-tips` skill, a team demo — not "30% of
   developers aren't using personas correctly."

The dashboard informs management actions. It doesn't automate them.

---

## How This Fits the User Spectrum

| Segment | What they experience |
|---------|---------------------|
| **Power Users** | Full `/insights` with detailed personal analytics. Opt-in sharing. |
| **Willing Adopters** | "Ask anything, no one sees your questions." Gradual comfort → use `/insights` later. |
| **Skeptics** | "We don't monitor your AI conversations. Here's the privacy architecture." The technical proof matters to this audience. |
| **Non-Engineers** | Same privacy model. Their Cowork interactions are equally private. |
| **IT / Platform** | Aggregate dashboard. Team-level metrics. No individual surveillance. Escalation for compliance only. |
| **Org Owner / Exec** | Investment dashboard: cost, adoption, ROI indicators, outcome metrics. Per-developer usage if policy allows (hashed or attributed). Never prompts. |

---

## The Commitment

AgentBoot's privacy model is a **product differentiator**, not just a policy. In a
market where enterprises are deploying AI monitoring tools, AgentBoot takes the
opposite stance: **we help organizations improve their AI governance without being
the tool that surveils their developers.**

We're honest about the boundaries:

- AgentBoot will never collect or surface raw prompts. That's our commitment.
- Anthropic's Compliance API gives Enterprise orgs access to conversation content.
  That's Anthropic's product, and it exists whether AgentBoot is installed or not.
- Organizations that want conversation monitoring have that option through their
  API provider. AgentBoot is not that channel and will not become it.

This commitment should be:

1. **In the README** — visible to every evaluator
2. **Honest about the ecosystem** — acknowledge that other channels exist
3. **In AgentBoot's architecture** — our telemetry schema has no prompt fields
4. **In the pitch** — "your developers will trust AgentBoot because we optimize
   from aggregates, not transcripts"

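Point 3 can be made concrete: the privacy property is structural when the schema simply has no field that could carry prompt or completion text. A sketch of such an event shape — the interface and field names are invented for illustration, not AgentBoot's published schema:

```typescript
// Hypothetical event shape — field names are assumptions, not AgentBoot's schema.
// There is deliberately no field that could hold prompt or completion text.
interface TelemetryEvent {
  timestamp: string; // ISO-8601
  team: string;      // team-level attribution
  devId?: string;    // omitted, hashed, or email, per org policy
  persona?: string;  // persona name, if one was invoked
  model: string;     // e.g. "sonnet"
  costUsd: number;
}

const event: TelemetryEvent = {
  timestamp: "2025-01-15T10:30:00Z",
  team: "platform",
  persona: "code-reviewer",
  model: "sonnet",
  costUsd: 0.04,
};
```

A schema like this is auditable by inspection: a reviewer can verify the no-prompts claim without trusting runtime behavior.
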
The best prompt optimization system is one that developers feed willingly because they
trust it with their worst questions.