agentboot 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (78)
  1. package/.github/ISSUE_TEMPLATE/persona-request.md +62 -0
  2. package/.github/ISSUE_TEMPLATE/quality-feedback.md +67 -0
  3. package/.github/workflows/cla.yml +25 -0
  4. package/.github/workflows/validate.yml +49 -0
  5. package/.idea/agentboot.iml +9 -0
  6. package/.idea/misc.xml +6 -0
  7. package/.idea/modules.xml +8 -0
  8. package/.idea/vcs.xml +6 -0
  9. package/CLA.md +98 -0
  10. package/CLAUDE.md +230 -0
  11. package/CONTRIBUTING.md +168 -0
  12. package/LICENSE +191 -0
  13. package/NOTICE +4 -0
  14. package/PERSONAS.md +156 -0
  15. package/README.md +172 -0
  16. package/agentboot.config.json +207 -0
  17. package/bin/agentboot.js +17 -0
  18. package/core/gotchas/README.md +35 -0
  19. package/core/instructions/baseline.instructions.md +133 -0
  20. package/core/instructions/security.instructions.md +186 -0
  21. package/core/personas/code-reviewer/SKILL.md +175 -0
  22. package/core/personas/code-reviewer/persona.config.json +11 -0
  23. package/core/personas/security-reviewer/SKILL.md +233 -0
  24. package/core/personas/security-reviewer/persona.config.json +11 -0
  25. package/core/personas/test-data-expert/SKILL.md +234 -0
  26. package/core/personas/test-data-expert/persona.config.json +10 -0
  27. package/core/personas/test-generator/SKILL.md +262 -0
  28. package/core/personas/test-generator/persona.config.json +10 -0
  29. package/core/traits/audit-trail.md +182 -0
  30. package/core/traits/confidence-signaling.md +172 -0
  31. package/core/traits/critical-thinking.md +129 -0
  32. package/core/traits/schema-awareness.md +132 -0
  33. package/core/traits/source-citation.md +174 -0
  34. package/core/traits/structured-output.md +199 -0
  35. package/docs/ci-cd-automation.md +548 -0
  36. package/docs/claude-code-reference/README.md +21 -0
  37. package/docs/claude-code-reference/agentboot-coverage.md +484 -0
  38. package/docs/claude-code-reference/feature-inventory.md +906 -0
  39. package/docs/cli-commands-audit.md +112 -0
  40. package/docs/cli-design.md +924 -0
  41. package/docs/concepts.md +1117 -0
  42. package/docs/config-schema-audit.md +121 -0
  43. package/docs/configuration.md +645 -0
  44. package/docs/delivery-methods.md +758 -0
  45. package/docs/developer-onboarding.md +342 -0
  46. package/docs/extending.md +448 -0
  47. package/docs/getting-started.md +298 -0
  48. package/docs/knowledge-layer.md +464 -0
  49. package/docs/marketplace.md +822 -0
  50. package/docs/org-connection.md +570 -0
  51. package/docs/plans/architecture.md +2429 -0
  52. package/docs/plans/design.md +2018 -0
  53. package/docs/plans/prd.md +1862 -0
  54. package/docs/plans/stack-rank.md +261 -0
  55. package/docs/plans/technical-spec.md +2755 -0
  56. package/docs/privacy-and-safety.md +807 -0
  57. package/docs/prompt-optimization.md +1071 -0
  58. package/docs/test-plan.md +972 -0
  59. package/docs/third-party-ecosystem.md +496 -0
  60. package/domains/compliance-template/README.md +173 -0
  61. package/domains/compliance-template/traits/compliance-aware.md +228 -0
  62. package/examples/enterprise/agentboot.config.json +184 -0
  63. package/examples/minimal/agentboot.config.json +46 -0
  64. package/package.json +63 -0
  65. package/repos.json +1 -0
  66. package/scripts/cli.ts +1069 -0
  67. package/scripts/compile.ts +1000 -0
  68. package/scripts/dev-sync.ts +149 -0
  69. package/scripts/lib/config.ts +137 -0
  70. package/scripts/lib/frontmatter.ts +61 -0
  71. package/scripts/sync.ts +687 -0
  72. package/scripts/validate.ts +421 -0
  73. package/tests/REGRESSION-PLAN.md +705 -0
  74. package/tests/TEST-PLAN.md +111 -0
  75. package/tests/cli.test.ts +705 -0
  76. package/tests/pipeline.test.ts +608 -0
  77. package/tests/validate.test.ts +278 -0
  78. package/tsconfig.json +62 -0
@@ -0,0 +1,1071 @@
# Prompt & Cost Optimization

AgentBoot's core claim is "prompts as code." If prompts are code, they need the same
discipline as code: linting, testing, measurement, optimization, and review. This doc
covers how AgentBoot helps organizations write better prompts, spend less on tokens,
and measure the effectiveness of their personas.

---

## The Problem

Most organizations adopting AI agents have no prompt discipline:

- Persona prompts are written once and never measured
- No one knows which personas cost the most or deliver the least value
- Trait definitions are vague ("be thorough") instead of specific ("check for null safety on every nullable parameter")
- Context bloat goes unnoticed — CLAUDE.md files grow to 800 lines because no one prunes
- Model selection is vibes-based ("use Opus for everything" or "use Sonnet for everything")
- There's no feedback loop — a persona that produces 90% false positives keeps running

AgentBoot must close these gaps because **the prompts it generates are the product.**
A governance framework that produces poor prompts is worse than no framework — it's
actively wasting money and developer trust.

---

## The Local-First Principle

Every optimization tool in AgentBoot follows the same model that developers already
use for code: **run it locally first, gate it in CI second.**

```
Developer's machine (private)             CI / shared (visible)
─────────────────────────────             ─────────────────────

agentboot lint                            agentboot validate --strict
"Your prompt has vague language"          PR blocked: trait reference missing
"Token budget exceeded by 800"            PR blocked: schema validation failed
Nobody sees this but you.                 Fair game — you submitted it.

agentboot test --type behavioral          agentboot test --ci
"Security reviewer missed SQLi"           Test results posted to PR
Fix it before you push.                   Team sees pass/fail, not your drafts.

agentboot cost-estimate                   agentboot metrics (aggregate)
"This persona costs $0.56/run"            "Team cost: $8,200/mo"
Personal planning tool.                   Org-level, anonymized.

/insights                                 Org dashboard
"You rephrase auth questions often"       "Auth patterns asked 89 times (team)"
Private to you.                           No individual attribution.
```

This mirrors exactly how code works:
- `eslint` locally → fix before anyone sees → CI catches what you missed → fair game
- `npm test` locally → fix failures privately → CI runs on PR → results are public
- Your local git history is messy → your PR is clean

### Two Types of Prompts, Two Different Models

**Type 1: Persona definitions** (SKILL.md, traits, instructions, gotchas rules).
These are code. They live in the personas repo. They go through PRs.

- "Submit" = open the PR to the personas repo
- Before submit: lint, test, cost-estimate locally — private, iterate freely
- After submit: CI validates, team reviews — fair game
- This is identical to the code workflow. No new model needed.

**Type 2: Developer prompts** (what someone types into Claude Code during their
workday). These are conversations. They have **no submit moment.** There is no PR
for "explain this function" or "what is a mutex."

- These are **always private**. There is no "after submit" state.
- AgentBoot's optimization tools for developer prompts (`/insights`, telemetry)
  operate on aggregates and patterns, never on the prompts themselves.
- The only "output" that crosses the private→public boundary is what the developer
  **chooses to publish**: a PR comment, committed code, a filed issue. The
  conversation that produced that output stays private.

```
Persona definitions (Type 1)         Developer prompts (Type 2)
─────────────────────────────        ─────────────────────────────

Local editing (private)              Always private
        │                                    │
        ▼                                    ▼
agentboot lint (private)             /insights (private)
agentboot test (private)             Telemetry: aggregates only
        │                                    │
        ▼                                    ▼
PR to personas repo (submit)         Developer CHOOSES to publish
        │                                    │
        ▼                                    ▼
CI validates (fair game)             PR comment, committed code,
Team reviews (fair game)             filed issue (fair game)
                                     Everything else stays private
```

The optimization tools in this doc mostly target **Type 1** (persona definitions) —
linting, testing, cost estimation for the prompts that the platform team authors.
For **Type 2** (developer conversations), see
[`docs/privacy-and-safety.md`](privacy-and-safety.md) for the privacy model.

This isn't just about privacy — it's about **learning without humiliation.** A
developer who's new to prompt engineering needs to be able to write a bad persona
definition, see the linter tell them it's vague, fix it, and submit the good version.
If the linter's feedback were visible to the team, they'd never experiment. And a
developer asking Claude "what is a foreign key" for the third time needs to know
that question will never surface anywhere.

---

## 1. Prompt Linting (`agentboot lint`)

Static analysis of persona prompts, trait definitions, and instructions — catching
problems before they reach production.

### Lint Rules

**Token budget rules:**

| Rule | Severity | What it catches |
|------|----------|-----------------|
| `prompt-too-long` | WARN at 500 lines, ERROR at 1000 | Persona prompts that exceed context budgets |
| `claude-md-too-long` | WARN at 200 lines, ERROR at 500 | Generated CLAUDE.md exceeding CC's effective limit |
| `trait-too-long` | WARN at 100 lines | Individual traits that should be split |
| `total-context-estimate` | WARN at 30% of context window | Combined persona + traits + instructions exceed budget |

**Quality rules:**

| Rule | Severity | What it catches |
|------|----------|-----------------|
| `vague-instruction` | WARN | "Be thorough", "try to", "if possible" — weak language |
| `conflicting-instructions` | ERROR | Two traits or instructions that contradict each other |
| `missing-output-format` | WARN | Reviewer persona with no structured output specification |
| `missing-severity-levels` | WARN | Reviewer persona without CRITICAL/ERROR/WARN/INFO definitions |
| `hardcoded-paths` | ERROR | Absolute paths that won't work across machines |
| `hardcoded-model` | WARN | Model name in prose (should be in frontmatter) |
| `unused-trait` | WARN | Trait defined but not composed by any persona |
| `missing-anti-patterns` | INFO | Trait without "What Not To Do" section |
| `missing-activation-condition` | WARN | Trait without "When This Trait Is Active" section |
| `duplicate-instruction` | WARN | Same instruction appears in multiple places |
| `no-examples` | INFO | Persona with no example input/output |

**Security rules:**

| Rule | Severity | What it catches |
|------|----------|-----------------|
| `credential-in-prompt` | ERROR | API keys, tokens, passwords in prompt text |
| `internal-url` | ERROR | Internal URLs that shouldn't be in distributed prompts |
| `pii-in-example` | ERROR | Real names, emails, etc. in persona examples |

### Implementation

```bash
# LOCAL (private — developer's machine)
agentboot lint                          # Lint everything
agentboot lint --fix                    # Auto-fix what's possible (trim whitespace, etc.)
agentboot lint --persona code-reviewer  # Lint one persona

# CI (visible — runs on PR to personas repo)
agentboot lint --severity error --format json   # Errors only, machine-readable
# CI posts: "Lint: 0 errors, 3 warnings" — not the warning details
```

The linter operates on source files (before compilation). It catches problems that
the build system's validate step doesn't — validate checks schema correctness; lint
checks prompt quality.

**Local vs. CI behavior:** When run locally, the linter shows full detail (which rules
failed, where, suggestions). When run in CI (`--ci` flag), it reports pass/fail and
counts only — the detailed feedback stays in the CI log, not posted to the PR comment.
The developer can check the CI log if their PR fails, but the team just sees "lint
failed: 2 errors."

### Custom Lint Rules

Organizations can define custom lint rules in `agentboot.config.json`:

```jsonc
{
  "lint": {
    "rules": {
      "prompt-too-long": { "warn": 300, "error": 600 },  // Stricter than default
      "vague-instruction": "error",                      // Upgrade to error
      "total-context-estimate": { "warn": 20 }           // 20% of context window
    },
    "custom": [
      {
        "id": "no-passive-voice",
        "pattern": "should be|could be|might be",
        "message": "Use imperative voice: 'Verify X' not 'X should be verified'",
        "severity": "warn"
      }
    ]
  }
}
```
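A minimal sketch of how a custom rule like this could be applied. The `CustomRule` shape mirrors the `"custom"` entries in the config above; the `applyRule` helper and `Finding` type are illustrative, not AgentBoot's actual implementation:

```typescript
// Hypothetical sketch: apply one custom regex lint rule to a prompt file.
// The rule shape mirrors the "custom" entries in agentboot.config.json.
interface CustomRule {
  id: string;
  pattern: string;   // regex alternation, as written in the config
  message: string;
  severity: "warn" | "error";
}

interface Finding {
  ruleId: string;
  line: number;      // 1-indexed, like the linter's output
  severity: "warn" | "error";
  message: string;
}

function applyRule(rule: CustomRule, source: string): Finding[] {
  const re = new RegExp(rule.pattern, "i");
  return source.split("\n").flatMap((text, i) =>
    re.test(text)
      ? [{ ruleId: rule.id, line: i + 1, severity: rule.severity, message: rule.message }]
      : []
  );
}
```

A real linter would also need to skip fenced code blocks and YAML frontmatter so that example code inside a persona doesn't trip prose rules.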

---

## 2. Token Budget System

Every persona should have a token budget — the estimated context cost of loading it.
The build system calculates this and the linter enforces it.

### Budget Calculation

```
Persona context cost =
    persona SKILL.md body tokens
  + sum(composed trait tokens)
  + sum(always-on instruction tokens)
  + sum(path-scoped rules likely to activate)
  + estimated tool definitions (if MCP servers scoped)
```

The build system calculates this for each persona and emits it in the compiled output:

```yaml
---
name: security-reviewer
estimated_tokens: 4200
budget_limit: 6000
model: sonnet
---
```
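The sum above can be sketched as follows. This uses the crude "one token per ~4 characters" heuristic rather than a real tokenizer, and omits the path-scoped-rule and tool-definition terms — a rough estimate only, not AgentBoot's actual calculation:

```typescript
// Rough sketch of the budget calculation above. estimateTokens uses the
// chars/4 heuristic; a real implementation would use the model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

interface PersonaSources {
  skillBody: string;       // SKILL.md body
  traits: string[];        // composed trait bodies
  instructions: string[];  // always-on instruction bodies
}

// First three terms of the "Persona context cost" sum; path-scoped rules
// and tool definitions are omitted for brevity.
function personaContextCost(p: PersonaSources): number {
  return (
    estimateTokens(p.skillBody) +
    p.traits.reduce((sum, t) => sum + estimateTokens(t), 0) +
    p.instructions.reduce((sum, t) => sum + estimateTokens(t), 0)
  );
}
```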

### Budget Enforcement

```jsonc
{
  "lint": {
    "tokenBudget": {
      "perPersona": 6000,      // Max tokens for any single persona
      "perTrait": 1500,        // Max tokens for any single trait
      "totalAlwaysOn": 3000,   // Max tokens for always-on instructions
      "claudeMd": 2000         // Max tokens for generated CLAUDE.md
    }
  }
}
```

### Why This Matters

Claude Code's context window is 200k tokens. But effective adherence drops sharply
after the first ~50k tokens of instructions. A persona that loads 15k tokens of
instructions is pushing against the useful limit, especially when combined with
file reads, tool definitions, and conversation history. Keeping personas lean (under
6k tokens) leaves room for the actual work.

---

## 3. Model Selection Optimization

Not every persona needs Opus. AgentBoot should guide organizations toward cost-effective
model assignment.

### Model Selection Matrix

| Persona Type | Recommended Model | Reasoning |
|---|---|---|
| Code reviewer (standard) | Sonnet | Pattern matching, style checks — Sonnet handles well |
| Security reviewer | Opus | Deep reasoning about attack vectors, subtle vulnerabilities |
| Test generator | Sonnet | Structured output, pattern application |
| Test data expert | Haiku | Simple data generation, templated output |
| Architecture reviewer | Opus | Cross-file reasoning, system-level understanding |
| Cost reviewer | Sonnet | Rule-based checks, pattern matching |
| Compliance guardrail | Haiku | Pattern matching, fast response needed |
| Domain expert (SME) | Sonnet | Knowledge retrieval, explanation |

### Cost Impact

| Model | Input $/M tokens | Output $/M tokens | Relative Cost |
|-------|------------------|-------------------|---------------|
| Haiku | $0.80 | $4.00 | 1x |
| Sonnet | $3.00 | $15.00 | ~4x |
| Opus | $15.00 | $75.00 | ~19x |

A security review that costs $5 on Opus costs $1 on Sonnet. If Sonnet's quality is
sufficient for the task, that's $4 saved per invocation. Across 50 developers running
10 reviews/day, that's $2,000/day.
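Per-invocation cost follows directly from the pricing table. A sketch (the function name is illustrative; the rates are the table's):

```typescript
// Per-invocation cost from the pricing table above. Rates are dollars per
// million tokens, so cost = (in * inputRate + out * outputRate) / 1e6.
const PRICING: Record<string, { input: number; output: number }> = {
  haiku: { input: 0.8, output: 4.0 },
  sonnet: { input: 3.0, output: 15.0 },
  opus: { input: 15.0, output: 75.0 },
};

function invocationCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const rate = PRICING[model];
  return (inputTokens * rate.input + outputTokens * rate.output) / 1_000_000;
}
```

For example, ~12k input and ~5k output tokens on Opus works out to about $0.56; the same volume on Sonnet is about $0.11.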

### `agentboot cost-estimate`

```bash
$ agentboot cost-estimate

Cost Estimate (per invocation)
─────────────────────────────────────────────────

Persona             Model    Est. Input    Est. Output   Est. Cost
code-reviewer       sonnet   ~8k tokens    ~3k tokens    $0.07
security-reviewer   opus     ~12k tokens   ~5k tokens    $0.56
test-generator      sonnet   ~6k tokens    ~8k tokens    $0.14
test-data-expert    haiku    ~4k tokens    ~6k tokens    $0.03

Monthly estimate (50 devs, 10 invocations/day/dev, ~21 working days):
  code-reviewer:       $735/mo
  security-reviewer:   $5,880/mo  ⚠️ Consider Sonnet for routine scans
  test-generator:      $1,470/mo
  test-data-expert:    $315/mo
  ────────────────────
  Total:               $8,400/mo

Optimization suggestions:
  ⚠ security-reviewer on Opus costs ~5x what it would on Sonnet.
    Consider: Sonnet for routine PR reviews, Opus for deep security audits only.
    Estimated savings: ~$4,700/mo
```

---

## 4. Prompt Effectiveness Metrics

You can't improve what you don't measure. AgentBoot should track persona effectiveness
over time.

### What to Measure

**Efficiency metrics** (automated, from telemetry):

| Metric | What it measures | Source |
|--------|------------------|--------|
| Tokens per invocation | Context efficiency | Audit trail hook |
| Cost per invocation | Dollar cost | Audit trail hook |
| Time to completion | Latency | Audit trail hook |
| Tool calls per invocation | How much exploration the persona does | PostToolUse hook |
| Compaction frequency | Whether the persona exhausts context | PostCompact hook |

**Quality metrics** (requires evaluation):

| Metric | What it measures | Source |
|--------|------------------|--------|
| Finding accuracy | % of findings that developers act on | PR review data |
| False positive rate | % of findings developers dismiss | PR review data |
| Severity calibration | Are CRITICALs actually critical? | Post-hoc analysis |
| Coverage | % of issues caught by persona vs. missed | Bug tracking correlation |
| Developer satisfaction | Do developers trust this persona? | Survey / NPS |

**Business metrics** (organizational):

| Metric | What it measures | Source |
|--------|------------------|--------|
| Adoption rate | % of developers using personas regularly | Session telemetry |
| Time to first invocation | How quickly new devs start using personas | Onboarding tracking |
| Review turnaround | Time from PR open to persona review complete | CI telemetry |
| Bug escape rate | Bugs in production that a persona should have caught | Incident correlation |

### Telemetry Implementation

The audit-trail trait (which all personas should compose) emits structured telemetry:

```json
{
  "event": "persona_invocation",
  "persona_id": "security-reviewer",
  "persona_version": "1.2.0",
  "model": "sonnet",
  "scope": "team:platform/api",
  "input_tokens": 8420,
  "output_tokens": 3200,
  "thinking_tokens": 12000,
  "tool_calls": 7,
  "duration_ms": 45000,
  "cost_usd": 0.089,
  "findings_count": { "CRITICAL": 0, "ERROR": 1, "WARN": 3, "INFO": 2 },
  "suggestions": 2,
  "timestamp": "2026-03-19T14:30:00Z",
  "session_id": "abc123"
}
```

Emitted via an async `Stop` hook so it doesn't slow down the developer. Appended to
a local NDJSON file or posted to an HTTP endpoint (configurable).

### `agentboot metrics`

```bash
agentboot metrics                           # Show all metrics
agentboot metrics --persona code-reviewer   # One persona
agentboot metrics --team api                # One team
agentboot metrics --period 30d              # Last 30 days
agentboot metrics --format json             # Machine-readable
```

Reads from the NDJSON telemetry log. No external database required for V1.
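The NDJSON aggregation behind a command like this is straightforward — one JSON object per line, grouped into per-persona totals. A sketch (field names follow the telemetry event above; the function is illustrative, not `agentboot metrics` itself):

```typescript
// Sketch: aggregate an NDJSON telemetry log into per-persona totals.
// Each line is one persona_invocation event, as shown above.
interface TelemetryEvent {
  persona_id: string;
  cost_usd: number;
  input_tokens: number;
  output_tokens: number;
}

function aggregateByPersona(
  ndjson: string
): Map<string, { invocations: number; costUsd: number }> {
  const totals = new Map<string, { invocations: number; costUsd: number }>();
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue; // skip blank/trailing lines
    const ev = JSON.parse(line) as TelemetryEvent;
    const t = totals.get(ev.persona_id) ?? { invocations: 0, costUsd: 0 };
    t.invocations += 1;
    t.costUsd += ev.cost_usd;
    totals.set(ev.persona_id, t);
  }
  return totals;
}
```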

---

## 5. Prompt Writing Best Practices

AgentBoot should encode prompt engineering best practices into its scaffolding and
documentation, so every persona starts with good patterns.

### The AgentBoot Prompt Style Guide

**Structure every persona prompt with these sections:**

```markdown
## Identity (who you are)
One sentence. Role + specialization + stance.

## Setup (what to do first)
Numbered steps the persona runs before producing output.
1. Read the diff / file / context
2. Load extension files if they exist
3. Determine operating mode from arguments

## Rules (what to check)
Numbered checklist. Specific, imperative, testable.
Each rule should be falsifiable — you can point to code and say "this violates rule 3."

## Output Format (how to report)
Exact schema. Severity levels defined. Example output provided.

## What Not To Do (anti-patterns)
Explicit exclusions. "Do not review code quality — defer to code-reviewer."
Prevents scope creep and reduces false positives.
```

**Rules for writing effective instructions:**

1. **Imperative voice.** "Verify that..." not "It should be verified that..."
2. **Specific over general.** "Check that every async function has a try/catch" not "Handle errors properly"
3. **Falsifiable.** Every instruction should be testable — you can write a test case that either passes or fails against it
4. **Scoped.** Each instruction addresses one concern. Don't combine "check for SQL injection AND verify test coverage" in one bullet
5. **Examples over descriptions.** Show what a violation looks like, not just describe it
6. **Cite sources.** "Per OWASP A03:2021 — Injection" not "security best practice"
7. **Include confidence guidance.** "Flag as WARN if uncertain, ERROR only if confirmed"
8. **Limit to 20 rules per persona.** Beyond 20, adherence drops. Split into multiple personas if needed.

### Prompt Templates

`agentboot add persona` should scaffold with these patterns baked in:

```bash
$ agentboot add persona my-reviewer

Created: core/personas/my-reviewer/
├── SKILL.md               # Scaffolded with Identity/Setup/Rules/Output/Anti-patterns
└── persona.config.json

Next: Edit SKILL.md to define your reviewer's rules.
Run: agentboot lint --persona my-reviewer to check quality.
```

The scaffolded SKILL.md includes placeholder sections with inline guidance:

```markdown
---
name: my-reviewer
description: [One line — what triggers this persona and what it does]
version: 1.0.0
traits:
  critical-thinking: MEDIUM
  structured-output: true
  source-citation: true
---

## Identity

You are a [role] specializing in [domain]. Your job is to find
[what you're looking for] — not to [what you explicitly don't do].

## Setup

1. Run `git diff HEAD` to see current changes. If no changes, run
   `git diff HEAD~1` to review the most recent commit.
2. For each changed file, read the **full file** for context.
3. If `.claude/extensions/my-reviewer.md` exists, read it for
   project-specific rules.

## Rules

<!-- Keep to 20 rules max. Each should be:
     - Imperative voice ("Verify that..." not "It should be...")
     - Specific and testable
     - One concern per rule -->

1. **[Rule name]:** [Specific, testable instruction]
2. **[Rule name]:** [Specific, testable instruction]

## Output Format

<!-- Use the structured-output trait's format. Customize severity thresholds. -->

Findings report with severity classifications:
- **CRITICAL**: [What qualifies — e.g., "blocks release, violates compliance"]
- **ERROR**: [What qualifies — e.g., "must fix before merge"]
- **WARN**: [What qualifies — e.g., "should address, not blocking"]
- **INFO**: [What qualifies — e.g., "observation only"]

## What Not To Do

- Do not review [out-of-scope concern] — defer to [other-persona].
- Do not suggest refactoring unless it directly addresses a finding.
- Do not praise the code. Your job is to find problems.
```

---

## 6. Prompt Testing (`agentboot test`)

Beyond linting (static analysis), personas need behavioral testing — does the prompt
actually produce the expected output when given known input?

### Test Types

**Deterministic tests** (no LLM call, fast, free):
- Frontmatter validation (schema, required fields)
- Token budget verification
- Trait composition verification (all referenced traits exist and compose)
- Output format schema validation (if `--json-schema` is specified)

**Behavioral tests** (LLM call, slower, costs money):
- Given known-bad code, does the security reviewer find the vulnerability?
- Given clean code, does the code reviewer avoid false positives?
- Given PHI in input, does the guardrail block it?
- Given a FHIR resource, does the domain expert recognize it?

**Regression tests** (LLM call, compare against baseline):
- "This persona produced these findings last week. After the prompt change,
  does it still find the same issues?" (snapshot testing)

### Test File Format

````yaml
# tests/security-reviewer.test.yaml
persona: security-reviewer
model: sonnet          # Use cheaper model for tests
max_budget_usd: 0.50

cases:
  - name: "Catches SQL injection"
    input: |
      Review this code:
      ```python
      def get_user(user_id):
          query = f"SELECT * FROM users WHERE id = {user_id}"
          return db.execute(query)
      ```
    expect:
      findings_min: 1
      severity_includes: ["CRITICAL", "ERROR"]
      text_includes: ["SQL injection", "parameterized"]

  - name: "No false positives on safe code"
    input: |
      Review this code:
      ```python
      def get_user(user_id: int):
          return db.execute("SELECT * FROM users WHERE id = %s", (user_id,))
      ```
    expect:
      findings_max: 0
      severity_excludes: ["CRITICAL", "ERROR"]

  - name: "Structured output format"
    input: "Review the file src/auth/login.ts"
    expect:
      json_schema: "./schemas/review-output.json"
````
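Evaluating an `expect` block against a persona run can be sketched like this. The `Expectation` and `RunResult` shapes mirror the YAML fields above but are illustrative, and `severity_includes` is interpreted here as "at least one finding has a severity in this list" — an assumption, not the test runner's documented semantics:

```typescript
// Sketch: check one test case's expect block against a persona's findings.
// Field names mirror the YAML above; types are illustrative.
interface Expectation {
  findings_min?: number;
  findings_max?: number;
  severity_includes?: string[];  // assumed: at least one finding matches
  text_includes?: string[];      // each phrase must appear in the output
}

interface RunResult {
  findings: { severity: string; text: string }[];
}

// Returns a list of failure messages; empty means the case passed.
function checkExpectation(expect: Expectation, run: RunResult): string[] {
  const failures: string[] = [];
  const n = run.findings.length;
  const allText = run.findings.map(f => f.text).join("\n").toLowerCase();
  if (expect.findings_min !== undefined && n < expect.findings_min)
    failures.push(`expected >= ${expect.findings_min} findings, got ${n}`);
  if (expect.findings_max !== undefined && n > expect.findings_max)
    failures.push(`expected <= ${expect.findings_max} findings, got ${n}`);
  if (expect.severity_includes &&
      !run.findings.some(f => expect.severity_includes!.includes(f.severity)))
    failures.push(`no finding with severity in [${expect.severity_includes.join(", ")}]`);
  for (const phrase of expect.text_includes ?? [])
    if (!allText.includes(phrase.toLowerCase()))
      failures.push(`output missing phrase "${phrase}"`);
  return failures;
}
```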

### Running Tests

```bash
# LOCAL (private — iterate until tests pass)
agentboot test                              # Run all tests
agentboot test --persona security-reviewer  # One persona
agentboot test --type deterministic         # Free tests only
agentboot test --type behavioral            # LLM tests only
agentboot test --update-snapshots           # Update regression baselines
agentboot test --max-budget 5.00            # Cost cap for test suite

# CI (visible — runs on PR, reports pass/fail)
agentboot test --ci                         # Exit codes + summary only
# CI posts: "Tests: 12 passed, 0 failed" — not the test details
```

**Local vs. CI behavior:** Locally, you see full output — which test cases passed,
which failed, what the persona produced vs. what was expected. In CI, the PR gets
a pass/fail summary. If a test fails, the developer checks the CI log privately.
The team sees "test failed" not "your persona prompt produced garbage output for
the SQL injection test case."

---

## 7. Developer Prompt Development (Type 2 — Always Private)

Sections 1–6 above cover Type 1 prompts (persona definitions that go through PRs).
This section covers Type 2 — the developer's daily interactions with Claude Code.
These prompts are never submitted, never reviewed, and never visible to anyone else.
But they're where most of the value (and waste) lives.

A developer who doesn't know how to ask for what they need wastes time, tokens, and
trust. A developer who's learned to prompt effectively gets 10x the value from the
same tooling. AgentBoot should help developers get better at this — privately.

### The Prompt Development Lifecycle

Every developer prompt goes through a cycle, even if it happens in seconds:

```
Intent         →   Prompt           →   Response       →   Evaluation
(what I need)      (what I typed)       (what I got)       (was it useful?)
   │                                                            │
   └──────────────────────── Rephrase if not ───────────────────┘
```

The rephrase loop is where tokens and time are wasted. A developer who rephrases
3 times to get the right answer spends 4x the tokens of one who gets it on the
first try. Improving that first-try success rate is the highest-leverage
optimization — and it has to happen privately.
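One way a rephrase signal could be computed locally: flag a prompt as a likely rephrase of the previous one when their word overlap (Jaccard similarity) crosses a threshold. This is an illustrative sketch with an assumed threshold, not how `/insights` actually works:

```typescript
// Illustrative rephrase detector: Jaccard word-overlap between consecutive
// prompts. Threshold and helper names are assumptions.
function wordSet(prompt: string): Set<string> {
  return new Set(prompt.toLowerCase().split(/\W+/).filter(Boolean));
}

function jaccard(a: Set<string>, b: Set<string>): number {
  const inter = [...a].filter(w => b.has(w)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : inter / union;
}

// Fraction of prompts that look like a rephrase of the one before.
function rephraseRate(prompts: string[], threshold = 0.6): number {
  if (prompts.length < 2) return 0;
  let rephrases = 0;
  for (let i = 1; i < prompts.length; i++) {
    if (jaccard(wordSet(prompts[i - 1]), wordSet(prompts[i])) >= threshold) rephrases++;
  }
  return rephrases / (prompts.length - 1);
}
```

Everything here runs on the developer's own transcripts; only the resulting rate (never the prompts) would feed an aggregate.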
615
+
616
+ ### What AgentBoot Can Provide (All Private, All Local)
617
+
618
+ **Prompt patterns library:**
619
+ AgentBoot should ship a personal skill (`/prompting-tips` or similar) with patterns
620
+ for effective prompting in common scenarios:
621
+
622
+ ```
623
+ /prompting-tips
624
+
625
+ Common patterns for effective prompts:
626
+
627
+ INSTEAD OF TRY
628
+ ───────────────────────────── ──────────────────────────────────
629
+ "Fix the bug" "The test in auth.test.ts:47 fails
630
+ with 'undefined is not a function'.
631
+ The relevant code is in auth.ts:30-50."
632
+
633
+ "Review this code" "/review-code src/api/users.ts"
634
+ (Use the persona — it has structured
635
+ output and consistent rules)
636
+
637
+ "Make it better" "Refactor getUserById to handle the
638
+ case where the user is soft-deleted.
639
+ The current behavior returns null
640
+ but callers expect a 404 error."
641
+
642
+ "How does auth work?" "Read src/auth/ and explain the
643
+ authentication flow from login
644
+ to token refresh, including which
645
+ middleware runs on each request."
646
+ ```
647
+
648
+ This skill loads on-demand (not always-on), costs nothing when not used, and teaches
649
+ by example.
650
+
651
**Personal `/insights` analysis:**
As described in the privacy doc, `/insights` analyzes the developer's session
transcripts and identifies patterns — privately. Key signals:

- **Rephrase rate:** How often the developer asks the same question in different
  words. A high rephrase rate means the developer isn't getting what they need on
  the first try. That could be a prompting issue or a persona quality issue.
- **Specificity trend:** Are prompts getting more specific over time? This indicates
  the developer is learning effective prompting patterns.
- **Persona discovery:** Is the developer using available personas or doing things
  manually? "You ran `git diff | head -100` and then asked Claude to review it.
  The `/review-code` persona does this automatically."
- **Cost awareness:** "Your sessions average $2.40/day. The team average is $1.80.
  Your longest sessions are code exploration — consider using the Explore subagent,
  which uses a cheaper model."

All of this is private. The developer sees it, nobody else.
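
The rephrase-rate signal doesn't need anything sophisticated. A minimal sketch (all names hypothetical, not AgentBoot's actual implementation): treat consecutive prompts whose word overlap crosses a threshold as rephrasings.

```typescript
// Hypothetical sketch of the rephrase-rate signal: two consecutive
// prompts that share most of their vocabulary are probably rephrasings
// of the same request.
function wordSet(prompt: string): Set<string> {
  return new Set(prompt.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard similarity over word sets: |A ∩ B| / |A ∪ B|.
function similarity(a: string, b: string): number {
  const sa = wordSet(a);
  const sb = wordSet(b);
  const shared = Array.from(sa).filter((w) => sb.has(w)).length;
  const union = new Set([...Array.from(sa), ...Array.from(sb)]).size;
  return union === 0 ? 0 : shared / union;
}

// Fraction of prompts that look like a rephrase of the one before them.
// The 0.6 threshold is a guess, not a documented value.
function rephraseRate(prompts: string[], threshold = 0.6): number {
  if (prompts.length < 2) return 0;
  let rephrases = 0;
  for (let i = 1; i < prompts.length; i++) {
    if (similarity(prompts[i - 1], prompts[i]) >= threshold) rephrases++;
  }
  return rephrases / (prompts.length - 1);
}
```

A real implementation would read session transcripts rather than an in-memory array, and would likely use something better than word overlap, but the shape of the signal is this simple.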

**Context-aware prompting hints:**
AgentBoot's always-on CLAUDE.md can include a lightweight prompting guide that
nudges the developer toward effective patterns:

```markdown
## Prompting Guidelines

When asking for code review, use /review-code instead of describing what to check.
When asking about a specific file, reference it by path: "Read src/auth/login.ts and..."
When reporting a bug, include: the error message, the file and line, and what you expected.
```

This is ~50 tokens, always loaded, and teaches through proximity — the developer sees
it when they start a session and gradually internalizes the patterns.

**Prompt templates in skills:**
Skills with `argument-hint` and argument substitution (`$ARGUMENTS`, `$1`, `$2`) are
prompt templates:

```markdown
---
name: explain-code
description: Explain how a piece of code works
argument-hint: "[file-path] [specific-question]"
---

Read $1 completely. Then explain:
1. What this code does (one paragraph)
2. The key design decisions and why they were made
3. How it connects to the rest of the codebase
4. Any non-obvious behavior or edge cases

If a specific question was provided: $2
```

The developer types `/explain-code src/auth/middleware.ts "why is there a double-check on token expiry?"` and gets a structured, effective prompt without having to craft it from scratch. The skill IS the prompt template.
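
Under the hood, this is plain string templating. A minimal sketch (hypothetical, not the actual Claude Code implementation) of expanding arguments into a skill body:

```typescript
// Hypothetical sketch: expand $ARGUMENTS (all args joined) and $1, $2, ...
// (positional args) in a skill body. Missing positions become empty strings.
function expandTemplate(body: string, args: string[]): string {
  return body
    .replace(/\$ARGUMENTS/g, args.join(" "))
    .replace(/\$(\d+)/g, (_, n) => args[Number(n) - 1] ?? "");
}

// Example: the /explain-code skill body with two arguments filled in.
const prompt = expandTemplate(
  "Read $1 completely. If a specific question was provided: $2",
  ["src/auth/middleware.ts", "why is there a double-check on token expiry?"]
);
```

Replacing `$ARGUMENTS` before the positional placeholders keeps the two forms from interfering with each other.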

**Learning through personas:**
The personas themselves are teaching tools. When a developer invokes `/review-code`
and sees:

```
[ERROR] src/api/users.ts:47
  Missing null check on userId before database call.
  Recommendation: Add guard clause.
  Confidence: HIGH
  Source: src/middleware/auth.ts:12 — pattern used on all other endpoints
```

They're learning: "oh, null checks before DB calls, and I should cite patterns
from elsewhere in the codebase." Next time they write code, they'll add the null
check before the persona has to tell them. The persona is a feedback loop that
teaches the org's standards through repeated exposure — privately, one developer
at a time.

### What This Means for AgentBoot

The prompt-development experience for developers is primarily delivered through:

1. **Personas** — structured prompts that work better than ad-hoc questions
2. **Skills** — prompt templates with argument hints and substitution
3. **`/insights`** — private analytics on prompting patterns
4. **Prompting tips** — a lightweight personal skill with example patterns
5. **Always-on hints** — ~50 tokens in CLAUDE.md with contextual prompting guidance

None of these require collecting, transmitting, or surfacing developer prompts.
They work by **giving developers better tools** (personas, skills, templates) so
their prompts are effective from the start, and **private feedback** (`/insights`)
so they can improve over time without anyone watching.

---


## 8. Prompt Ingestion (`agentboot add prompt`)

Before the marketplace, before the PR to the personas repo, there's the moment
someone says: "Hey, try this prompt — it's great for catching auth bugs." It's in
a Slack message. Or a blog post. Or a tweet. Or scribbled on a sticky note.

This is how most knowledge sharing actually works. Not through formal contribution
processes, but through "I know a guy who gave me this awesome prompt."

`agentboot add prompt` is the on-ramp from informal sharing to governed content. It
takes raw text — any prompt, rule, tip, or instruction — and converts it into the
right AgentBoot structure.

### How It Works

```bash
# Paste a raw prompt
agentboot add prompt "Always check for null safety before database calls.
Verify that every nullable parameter has a guard clause. If you find a
database call without null checking, flag it as ERROR."

# AgentBoot analyzes and suggests:
#
# Analyzed your prompt. Here's what I'd make from it:
#
# Type:    Rule (path-scoped gotcha)
# Name:    null-safety-database
# Scope:   Activates on: **/*.ts, **/*.js (files with database calls)
# Content: Formatted with paths: frontmatter + imperative rules
#
# Preview:
# ┌──────────────────────────────────────────────────┐
# │ ---                                              │
# │ paths:                                           │
# │   - "**/*.ts"                                    │
# │   - "**/*.js"                                    │
# │   - "**/db/**"                                   │
# │   - "**/repositories/**"                         │
# │ description: "Null safety before database calls" │
# │ ---                                              │
# │                                                  │
# │ # Null Safety — Database Calls                   │
# │                                                  │
# │ - Verify that every nullable parameter has a     │
# │   guard clause before any database call.         │
# │ - Flag database calls without null checking      │
# │   as ERROR.                                      │
# └──────────────────────────────────────────────────┘
#
# Actions:
#   [1] Save as gotcha rule       → .claude/rules/null-safety-database.md
#   [2] Save as trait             → core/traits/null-safety.md
#   [3] Add to existing persona   → append to code-reviewer rules
#   [4] Save to personal rules    → ~/.claude/rules/null-safety.md (private)
#   [5] Dry run — show me what this would look like in context
#   [6] Edit first — open in editor before saving
```

### Input Sources

```bash
# Raw text (typed or pasted)
agentboot add prompt "Always verify RLS is enabled on new tables"

# From a file
agentboot add prompt --file ~/Downloads/auth-tips.md

# From clipboard
agentboot add prompt --clipboard

# From a URL (blog post, gist, tweet)
agentboot add prompt --url https://blog.example.com/postgres-gotchas

# Interactive (opens editor for multi-line input)
agentboot add prompt --interactive

# From stdin (pipe from another command)
cat slack-message.txt | agentboot add prompt --stdin
```

### What the Classifier Does

The raw prompt goes through classification to determine what it should become:

| Signal in the Prompt | Classified As | Destination |
|---|---|---|
| "Always...", "Never...", "Verify that..." | **Rule / Gotcha** | `.claude/rules/` |
| Technology-specific warning with examples | **Gotcha** (path-scoped) | `.claude/rules/` with `paths:` |
| Behavioral stance ("be skeptical", "cite sources") | **Trait** | `core/traits/` |
| Complete review workflow with output format | **Persona** | `core/personas/` |
| Single-use instruction ("for this PR, check X") | **Session instruction** | Not persisted — add to CLAUDE.md or use as-is |
| Vague/motivational ("write good code") | **Rejected** | "This is too vague to be actionable. Try: [specific suggestion]" |

Classification runs through the same Claude API access the developer already has
(Haiku, for speed). It's a single prompt that analyzes the input and suggests
the right type, name, scope, and file path.
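
A cheap heuristic pass could also catch the obvious cases from the signal table before spending an API call. A sketch, with every name hypothetical and the regexes deliberately crude:

```typescript
type PromptKind = "rule" | "trait" | "persona" | "session" | "rejected" | "unknown";

// Hypothetical first-pass classifier mirroring the signal table.
// Anything it can't decide falls through to the model for a real answer.
function classifyPrompt(text: string): PromptKind {
  const t = text.toLowerCase().trim();
  if (t.split(/\s+/).length < 4) return "rejected"; // too vague to be actionable
  if (/^(always|never|verify that)\b/.test(t)) return "rule";
  if (/(be skeptical|cite sources|ask clarifying)/.test(t)) return "trait";
  if (/output format/.test(t) && /(review|workflow)/.test(t)) return "persona";
  if (/^for this (pr|session|commit)\b/.test(t)) return "session";
  return "unknown"; // send to the Haiku classification call
}
```

In practice the model call does the real work; a pre-pass like this only short-circuits the unambiguous inputs.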

### Dry Run Mode

```bash
agentboot add prompt "Check FHIR resources for valid CodeableConcept" --dry-run

# Dry Run — nothing will be written
#
# Input: "Check FHIR resources for valid CodeableConcept"
#
# Classification: Gotcha rule (domain-specific, path-scoped)
# Suggested paths: ["**/fhir/**", "**/*resource*"]
#
# Would write to: .claude/rules/fhir-codeable-concept.md
#
# Lint results:
#   ⚠ WARN: vague-instruction — "valid" is not specific enough.
#     Suggestion: "Verify CodeableConcept includes system, code, and display fields"
#
# Token impact: +45 tokens to context when paths match
#
# No files modified. Run without --dry-run to save.
```

The dry run shows what WOULD happen: classification, destination, lint results,
and token impact — without writing anything. This is the safe way to evaluate
someone else's prompt before incorporating it.
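
The token impact figure can be a pure build-time estimate, with no API call. A common rough heuristic (an assumption here, not AgentBoot's documented method) is about four characters per token for English/Markdown text:

```typescript
// Rough token estimate: ~4 characters per token for English/Markdown.
// Good enough for budget warnings; not a substitute for a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical helper producing the dry-run line; the 200-token budget
// default is an illustration, not a documented AgentBoot value.
function tokenImpactReport(ruleBody: string, budget = 200): string {
  const tokens = estimateTokens(ruleBody);
  const status = tokens > budget ? "over budget" : "ok";
  return `+${tokens} tokens to context when paths match (${status})`;
}
```

A 180-character rule body would report as roughly `+45 tokens`, matching the order of magnitude shown in the dry-run output above.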

### The Sharing Spectrum

This feature fills the gap between "I have a raw prompt" and "it's in the
marketplace":

```
Informal                                                        Formal
──────────────────────────────────────────────────────────────────────

"Try this       agentboot       In my org's       Org's        Public
 prompt"    →   add prompt  →   personas repo →   private  →   marketplace
                (classify,      (PR, review,      marketplace  (community)
                format,         CI validates)
                save locally)

Slack           Private/Local   Team-visible      Org-wide     Public
message         (my machine)    (after PR)        (plugin)     (everyone)
```

Most prompts stay on the left side forever — and that's fine. The developer adds
it as a personal rule or gotcha, it helps them, nobody else needs to know. But
the pipeline to formalize it is there when the prompt proves its value.

### Batch Ingestion

For orgs migrating to AgentBoot from an existing CLAUDE.md or custom setup:

```bash
# Ingest an existing CLAUDE.md and classify each instruction
agentboot add prompt --file .claude/CLAUDE.md --batch

# Analyzing 47 instructions in CLAUDE.md...
#
# Classification:
#   12 → gotcha rules (path-scopeable)
#    8 → traits (behavioral)
#    3 → persona instructions (should be in specific persona)
#   18 → always-on rules (keep in CLAUDE.md)
#    4 → too vague (need rewriting)
#    2 → org-specific (keep but don't share)
#
# [1] Apply all suggestions (creates files, rewrites CLAUDE.md)
# [2] Review one by one
# [3] Export classification report (review offline)
```

This is the migration tool. An org with an 800-line CLAUDE.md can decompose it into
proper AgentBoot structure — gotchas with path scoping, traits that compose, and a
lean CLAUDE.md with only the always-on essentials. The classification does the
analysis; the developer approves each decision.

### What This Means for Novice Users

A developer who's never written a trait or a gotchas rule doesn't need to learn the
format first. They paste the prompt they already use. AgentBoot classifies it,
formats it, and puts it in the right place. The developer learns the structure by
seeing what AgentBoot produced — "oh, that's what a gotcha looks like" — not by
reading documentation.

Over time, they start writing traits and gotchas directly because they've seen
enough examples. The `add prompt` command is a scaffold that teaches by doing.

When they have something worth sharing, `agentboot publish` is one more step.
But that's a graduation moment, not a starting requirement.

---

## 9. Continuous Optimization Loop

Metrics feed back into prompt improvements in a structured cycle:

```
  ┌──────────────────────────────────────────┐
  │                                          │
  ▼                                          │
Write/Edit ──► Lint ──► Build ──► Deploy     │
persona         │         │          │       │
                │         │          │       │
              Tests     Token    Telemetry   │
                │       budget       │       │
                │         │          │       │
                └─────────┴──────────┘       │
                          │                  │
                Metrics ──► Review ──────────┘
                dashboard   (weekly)
```

### Weekly Review Process

1. **Pull metrics:** `agentboot metrics --period 7d`
2. **Identify outliers:**
   - Personas with high false positive rates → tighten rules
   - Personas with high token usage → compress prompts
   - Personas rarely invoked → investigate why (not useful? not discoverable?)
   - Personas on Opus that could run on Sonnet → test downgrade
3. **Update prompts:** Edit SKILL.md based on findings
4. **Run tests:** `agentboot test` to verify changes don't regress
5. **Lint:** `agentboot lint` to check quality
6. **Deploy:** `agentboot build && agentboot sync`
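
Steps 1 and 2 are mechanical enough to script. A sketch of reading NDJSON telemetry and surfacing per-persona averages (the record shape here is an assumption; AgentBoot's actual telemetry schema may differ):

```typescript
// Assumed telemetry record shape, not AgentBoot's documented schema.
interface TelemetryRecord {
  persona: string;
  tokens: number;
  dismissed: boolean; // a dismissed finding is a likely false positive
}

interface PersonaStats {
  invocations: number;
  avgTokens: number;
  dismissRate: number;
}

// NDJSON: one JSON object per line; blank lines are skipped.
function aggregate(ndjson: string): Map<string, PersonaStats> {
  const buckets = new Map<string, { n: number; tokens: number; dismissed: number }>();
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue;
    const rec = JSON.parse(line) as TelemetryRecord;
    const b = buckets.get(rec.persona) ?? { n: 0, tokens: 0, dismissed: 0 };
    b.n += 1;
    b.tokens += rec.tokens;
    if (rec.dismissed) b.dismissed += 1;
    buckets.set(rec.persona, b);
  }
  const stats = new Map<string, PersonaStats>();
  buckets.forEach((b, persona) => {
    stats.set(persona, {
      invocations: b.n,
      avgTokens: b.tokens / b.n,
      dismissRate: b.dismissed / b.n,
    });
  });
  return stats;
}
```

Flagging outliers is then a filter over the stats map: personas whose `dismissRate` or `avgTokens` sit well above the fleet median become the week's review candidates.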

### Automation Hooks

AgentBoot could generate a `/optimize` skill that automates parts of this:

```markdown
/optimize

Analyzes the last 7 days of persona telemetry and suggests:
- Prompt compression opportunities (traits that could be shorter)
- Model downgrade candidates (personas running on Opus that perform equally on Sonnet)
- False positive patterns (findings that are consistently dismissed)
- Coverage gaps (file types or directories with no persona coverage)
```

---

## 10. Context Efficiency Patterns

Techniques AgentBoot applies to keep generated output token-efficient.

### @import Over Inlining

For Claude Code repos, traits stay as separate files and CLAUDE.md uses `@import`
references. A trait shared by 4 personas is loaded once, not inlined 4 times.

**Savings:** If `critical-thinking.md` is 800 tokens and composed by 4 personas,
inlining costs 3,200 tokens. @import costs 800 tokens. Savings: 2,400 tokens.
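
The arithmetic generalizes: inlining costs trait size times consumer count, importing costs trait size once. As a tiny sketch (names hypothetical):

```typescript
// Token cost of a shared trait: inlined into each persona vs. imported once.
const inlineCost = (traitTokens: number, personas: number) => traitTokens * personas;
const importCost = (traitTokens: number) => traitTokens;
const savings = (traitTokens: number, personas: number) =>
  inlineCost(traitTokens, personas) - importCost(traitTokens);
```

`savings(800, 4)` reproduces the 2,400-token figure above, and the gap grows linearly with every additional persona that composes the trait.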

### Progressive Disclosure via Skills

Always-on instructions (CLAUDE.md) should be minimal. Specialized knowledge lives
in skills that load on-demand:

- **CLAUDE.md** (always loaded): org name, team, basic conventions (~200 lines)
- **Skills** (loaded on invocation): persona prompts, domain knowledge, review rules

This is why the `context: fork` pattern matters — the skill forks to a subagent with
its own context, so the persona's 4,000-token prompt doesn't pollute the main
conversation.

### Path-Scoped Rules Over Always-On

Gotchas rules with `paths:` frontmatter load only when relevant files are touched.
A database gotchas file (500 tokens) is zero-cost when working on frontend code.
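
Activation is just glob matching against the touched file paths. A simplified sketch that handles only `*` and `**` (a real implementation would use a full glob library such as minimatch):

```typescript
// Simplified glob-to-regex conversion for rule activation; illustration only.
function globToRegExp(glob: string): RegExp {
  const pattern = glob
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*\*/g, "§§")               // protect ** before handling *
    .replace(/\*/g, "[^/]*")              // * matches within one path segment
    .replace(/§§/g, ".*");                // ** matches across segments
  return new RegExp(`^${pattern}$`);
}

// A rule loads only if one of its `paths:` globs matches a touched file.
function ruleActivates(paths: string[], changedFile: string): boolean {
  return paths.some((p) => globToRegExp(p).test(changedFile));
}
```

So a rule scoped to `["**/db/**", "**/*.ts"]` activates for `src/db/query.ts` but contributes zero tokens when the session only touches, say, `src/ui/button.tsx`.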

### Model-Appropriate Effort

In `persona.config.json`:

```json
{
  "effort": "medium"
}
```

The `effort` field maps to the `--effort` flag. Extended thinking tokens are billed
as output tokens. A test data generator doesn't need deep reasoning — set effort to
`low`. A security reviewer benefits from thinking — set it to `high`. Default to
`medium`.

### Compact Instructions

```markdown
# In CLAUDE.md
## Compact instructions
When compacting, preserve: persona invocation history, findings from reviews,
test results. Discard: file read contents, exploratory searches, tool output.
```

This teaches Claude what to keep when the context window fills up. Without it,
compaction may discard persona findings that need to be referenced later.

---

## What AgentBoot Needs to Build

| Component | Phase | Description |
|-----------|-------|-------------|
| `agentboot lint` | V1 | Static prompt analysis (token budget, quality, security rules) |
| Token budget calculation | V1 | Estimate persona context cost at build time |
| Prompt templates | V1 | `agentboot add persona` scaffolds with best-practice structure |
| Prompt style guide | V1 | Documentation of effective prompt patterns |
| `agentboot test --type deterministic` | V1 | Schema, budget, composition tests (free) |
| Model selection guidance | V1 | Matrix in docs + model field in persona.config.json |
| `/prompting-tips` skill | V1 | Personal skill with effective prompting patterns |
| `argument-hint` in all skills | V1 | Prompt templates via skill invocation |
| Prompting hints in CLAUDE.md | V1 | ~50-token always-on contextual guidance |
| `agentboot cost-estimate` | V1.5 | Per-persona cost projection |
| Telemetry hooks | V1.5 | Generate async Stop/SubagentStop hooks for metrics |
| `agentboot test --type behavioral` | V1.5 | LLM-based tests with expected findings |
| `/insights` (personal) | V1.5 | Private rephrase rate, specificity, persona discovery |
| `agentboot metrics` | V2 | Read NDJSON telemetry, produce reports |
| Custom lint rules | V2 | Org-defined lint rules in config |
| Regression/snapshot tests | V2 | Compare persona output across versions |
| `/optimize` skill | V2+ | Automated prompt improvement suggestions |
| A/B testing | V2+ | Run two persona versions side-by-side, compare metrics |

---

*See also:*
- [`docs/concepts.md`](concepts.md) — structured telemetry, self-improvement reflections
- [`docs/ci-cd-automation.md`](ci-cd-automation.md) — `claude -p` with `--json-schema` and `--max-budget-usd`
- [`docs/claude-code-reference/feature-inventory.md`](claude-code-reference/feature-inventory.md) — /cost, /compact, model pricing
- [Manage costs effectively — Claude Code Docs](https://code.claude.com/docs/en/costs)
- [Demystifying evals for AI agents — Anthropic](https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents)

Sources:
- [Claude Code Cost Optimisation Guide — systemprompt.io](https://systemprompt.io/guides/claude-code-cost-optimisation)
- [AI Agent Metrics: How Elite Teams Evaluate — Galileo](https://galileo.ai/blog/ai-agent-metrics)
- [promptfoo — GitHub](https://github.com/promptfoo/promptfoo)