@event4u/agent-config 2.20.1 → 2.23.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (37)
  1. package/.agent-src/commands/agent-status.md +16 -0
  2. package/.agent-src/rules/caveman-speak.md +2 -0
  3. package/.agent-src/skills/adversarial-review/SKILL.md +2 -1
  4. package/.agent-src/skills/canvas-design/SKILL.md +11 -6
  5. package/.agent-src/skills/compress-memory/SKILL.md +119 -0
  6. package/.agent-src/skills/fe-design/SKILL.md +8 -0
  7. package/.agent-src/skills/prompt-optimizer/SKILL.md +29 -5
  8. package/.agent-src/skills/react-shadcn-ui/SKILL.md +9 -0
  9. package/.agent-src/skills/refine-prompt/SKILL.md +57 -0
  10. package/.agent-src/skills/tailwind-engineer/SKILL.md +14 -0
  11. package/.agent-src/templates/agents/agent-project-settings.example.yml +53 -1
  12. package/.claude-plugin/marketplace.json +2 -1
  13. package/CHANGELOG.md +101 -138
  14. package/README.md +5 -5
  15. package/docs/architecture.md +2 -2
  16. package/docs/archive/CHANGELOG-pre-2.20.0.md +159 -0
  17. package/docs/benchmarks.md +74 -0
  18. package/docs/catalog.md +5 -3
  19. package/docs/contracts/caveman-telemetry.md +83 -0
  20. package/docs/contracts/compression-default-kill-criterion.md +82 -35
  21. package/docs/contracts/cost-summary-schema.md +107 -0
  22. package/docs/contracts/file-ownership-matrix.json +48 -0
  23. package/docs/guidelines/prompt-templates.md +166 -0
  24. package/package.json +1 -1
  25. package/scripts/_lib/bench_caveman.py +273 -0
  26. package/scripts/_lib/bench_caveman_report.py +152 -0
  27. package/scripts/bench_compress_memory.py +168 -0
  28. package/scripts/bench_run.py +119 -1
  29. package/scripts/caveman_stats.py +119 -0
  30. package/scripts/check_command_count_messaging.py +2 -2
  31. package/scripts/compress_memory.py +172 -0
  32. package/scripts/cost_by_conversation.py +78 -0
  33. package/scripts/cost_summary.py +97 -0
  34. package/scripts/update_counts.py +7 -5
  35. package/scripts/validate_caveman_carveouts.py +129 -0
  36. package/scripts/validate_safe_paths.py +118 -0
  37. package/scripts/verify_roadmap_closure.py +327 -0
@@ -0,0 +1,83 @@
1
+ ---
2
+ stability: beta
3
+ keep-beta-until: 2026-08-15
4
+ ---
5
+
6
+ # caveman telemetry — multiplier contract
7
+
8
+ > **Status:** suspended (kill-criterion not met in `caveman-v1`).
9
+ > Telemetry surface records `caveman_delta_tokens = 0` until a v2 bench
10
+ > proves a positive multiplier on the load-bearing `vs_terse` arm.
11
+
12
+ ## Constant
13
+
14
+ | Key | Value | Provenance |
15
+ |---|---|---|
16
+ | `caveman_multiplier_version` | `v1` | Tied to `bench/reports/caveman-v1.{json,md}` |
17
+ | `caveman_multiplier_value` | `0.9155` | `median(terse_control_tokens / compressed_tokens)` over the 10-prompt v1 corpus |
18
+ | `caveman_multiplier_p10` | `0.4506` | 10th percentile (worst-case carve-out-tax prompts) |
19
+ | `caveman_multiplier_p90` | `2.3664` | 90th percentile (pure-prose prompts where caveman wins) |
20
+ | `caveman_multiplier_active` | `false` | **Suspended** — kill-criterion not met (`vs_terse` median −9.27 %) |
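As a rough illustration of the provenance formula `median(terse_control_tokens / compressed_tokens)`, the value could be computed as below. This is a sketch only: the helper name `multiplier_value` and the `(terse, compressed)` pair layout are assumptions, not taken from the bench scripts, and no v1 corpus numbers are reproduced here.

```python
from statistics import median


def multiplier_value(pairs: list[tuple[int, int]]) -> float:
    """Median of terse_control_tokens / compressed_tokens.

    `pairs` holds one (terse_control_tokens, compressed_tokens) tuple per
    prompt; this layout is a hypothetical choice for the sketch.
    """
    return median(terse / compressed for terse, compressed in pairs)
```

A ratio above `1.0` for a prompt means the compressed arm used fewer tokens than the terse-control arm on that prompt.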
21
+
22
+ The **active** flag gates whether the multiplier is applied to runtime
23
+ telemetry. While `false`, `scripts/caveman_stats.py` reports
24
+ `caveman_delta_tokens = 0` regardless of `speak_scope` setting.
25
+
26
+ ## How the multiplier is interpreted
27
+
28
+ `caveman_estimated_uncompressed_tokens = caveman_compressed_tokens × M`,
29
+ where `M = caveman_multiplier_value`.
30
+
31
+ `caveman_delta_tokens = caveman_estimated_uncompressed_tokens − caveman_compressed_tokens`.
32
+
33
+ - `M > 1.0` → caveman compresses; `delta` is **positive** (saving).
34
+ - `M = 1.0` → break-even; no delta surfaced.
35
+ - `M < 1.0` → caveman costs more than the terse baseline; `delta` is
36
+ **negative**. Surfacing a negative saving is misleading for the
37
+ user (looks like a bug), so the contract is to **suspend the
38
+ multiplier** and record `delta = 0` until a v2 bench lifts `M`
39
+ above `1.0` on the load-bearing arm.
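The interpretation above can be sketched as follows. This is not the code of `scripts/caveman_stats.py` (the diff does not show it); the function name and parameters are hypothetical, and the sketch assumes the suspension is enforced purely through the `caveman_multiplier_active` flag.

```python
def caveman_delta_tokens(compressed_tokens: int, multiplier: float, active: bool) -> int:
    """Estimated tokens saved for one session (hypothetical helper)."""
    # Suspended multiplier: the contract records a delta of 0.
    if not active:
        return 0
    # estimated_uncompressed = compressed * M
    estimated_uncompressed = compressed_tokens * multiplier
    # delta = estimated_uncompressed - compressed
    return round(estimated_uncompressed - compressed_tokens)
```

With `active=False` the result is `0` for any multiplier, which matches the suspended state described in this contract.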
40
+
41
+ ## Why suspended after v1
42
+
43
+ The `caveman-v1` bench (`bench/reports/caveman-v1.md`, 30 calls,
44
+ 2026-05-16) found:
45
+
46
+ - Median savings vs raw uncompressed: **+23.51 %** (inflated by the
47
+ carve-out-tax-free pure-prose prompts).
48
+ - Median savings vs terse-control: **−9.27 %** (load-bearing).
49
+ - Carve-out-heavy prompts (path-list −108 %, mode-marker −123 %)
50
+ drag the median negative.
51
+
52
+ The terse-control arm is the kill-criterion baseline per
53
+ [`compression-default-kill-criterion.md`](compression-default-kill-criterion.md).
54
+ Until a v2 bench (broader corpus or a re-tuned dialect) lifts the
55
+ `vs_terse` median to ≥ 0 %, the multiplier stays suspended.
56
+
57
+ ## How to lift the suspension
58
+
59
+ 1. Run an extended bench against a broader corpus (Phase 3+ work).
60
+ 2. If `median(savings_vs_terse) ≥ 0` (and ideally ≥ 30 % to flip the
61
+ rule default), recompute `caveman_multiplier_value`.
62
+ 3. Update this contract: bump `caveman_multiplier_version` to `v2`,
63
+ set `caveman_multiplier_active = true`, cite the new bench file.
64
+ 4. The change is reversible — drop back to `v1` if a regression
65
+ appears.
66
+
67
+ ## Consumers
68
+
69
+ - [`scripts/caveman_stats.py`](../../scripts/caveman_stats.py) — reads
70
+ this constant, computes per-session / per-conversation / lifetime
71
+ deltas from `agents/cost-tracking/sessions.jsonl`.
72
+ - [`scripts/cost_summary.py`](../../scripts/cost_summary.py) — emits
73
+ the stable JSON contract for inter-tool consumption per
74
+ [`cost-summary-schema.md`](cost-summary-schema.md).
75
+ - `agent-status` skill — surfaces the per-session delta in the
76
+ status report under the `[caveman: …]` widget.
77
+
78
+ ## See also
79
+
80
+ - [`compression-default-kill-criterion.md`](compression-default-kill-criterion.md) — the rule-default-flip gate; this multiplier is gated on the same `vs_terse` arm.
81
+ - [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) — provenance for the `v1` value.
82
+ - [`bench/reports/caveman-v2.md`](../../bench/reports/caveman-v2.md) — input-side (orthogonal); does NOT feed this multiplier (this multiplier is output-side).
83
+ - [`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md) — runtime rule the multiplier measures.
@@ -5,14 +5,16 @@ keep-beta-until: 2026-08-14
5
5
 
6
6
  # Compression default — kill-criterion
7
7
 
8
- > **Status:** parked, criterion-deferred · **Owner:** `step-4-measurement-and-benchmark.md`
9
- > closeout phase · **Source:** [`council-synthesis.md` § 7](../../agents/audit-2026-05-14-north-star/council-synthesis.md)
8
+ > **Status:** v1-measured · criterion not met · default stays `off` · **Owner:** `step-16-caveman-substance.md`
9
+ > Phase 1 closeout · **Sources:** [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) ·
10
+ > [`council-synthesis.md` § 7](../../agents/audit-2026-05-14-north-star/council-synthesis.md) ·
11
+ > [`caveman-v1-kc-verdict.json`](../../agents/council-responses/caveman-v1-kc-verdict.json) <!-- council-ref-allowed: ADR decision trace for v1 kill-criterion verdict -->
10
12
 
11
13
  ## Rule
12
14
 
13
15
  ```
14
- DEFAULT STAYS OFF UNTIL `task bench` PRODUCES A NUMBER.
15
- DECISION OWNED BY step-4 CLOSEOUT, NOT BY THIS DOC OR BY step-99.
16
+ DEFAULT STAYS OFF UNTIL `task bench -- --caveman` PRODUCES A POSITIVE vs_terse MEDIAN.
17
+ DECISION OWNED BY THE NEXT BENCH CLOSEOUT, NOT BY THIS DOC.
16
18
  ```
17
19
 
18
20
  1. **Current state.** `caveman.speak_scope` defaults `off`. Carve-outs
@@ -21,49 +23,94 @@ DECISION OWNED BY step-4 CLOSEOUT, NOT BY THIS DOC OR BY step-99.
21
23
  [`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md)
22
24
  but the feature is non-promoted: no skill recommends turning it on,
23
25
  no preset enables it, no profile depends on it.
24
- 2. **Baseline window.** 60 days from the first green run of
25
- `task bench` against the locked 25-prompt corpus
26
- (`step-4-measurement-and-benchmark.md`
27
- Phase 2). The corpus, the model, and the cost-tracker are frozen
28
- for the window; mid-window changes restart the clock.
29
- 3. **Decision points.** After the window closes, `step-4` closeout
30
- reads `docs/parity/bench.json` and applies exactly one of:
31
-
32
- | Measured tokens saved | Quality regression on corpus | Verdict |
26
+ 2. **Baselines.** Every published `bench/reports/caveman-v<N>.{json,md}`
27
+ measures three arms (`compressed` · `terse-control` ·
28
+ `uncompressed`) and reports two savings columns:
29
+ - `vs_raw`: median savings against the uncompressed arm.
30
+ - `vs_terse`: **load-bearing** median savings against the
31
+ `Answer concisely.` terse-control arm. `vs_raw` is inflated by the
32
+ carve-out-tax-free pure-prose case and is **not** the gate metric.
33
+ 3. **Decision table.** Read the latest `bench/reports/caveman-v<N>.md`
34
+ and apply exactly one of:
35
+
36
+ | Measured `vs_terse` median | Quality regression on corpus | Verdict |
33
37
  |---|---|---|
34
- | < 30 % | any | **Deprecate**: remove `caveman-speak` rule, archive `caveman-compress` script, retire `caveman.*` settings keys with a one-release deprecation window |
35
- | ≥ 30 % | < 5 % | **Flip default on**: `caveman.speak_scope` defaults to a non-`off` value, carve-outs stay, statusline surfaces lifetime tokens saved |
36
- | ≥ 30 % | ≥ 5 % | **Hold**: repeat the window once with tuned intensity ladder; second hold → deprecate |
38
+ | < 0 % | any | **Criterion not met: defer.** Keep default `off`. No telemetry multiplier. Next move owned by the corpus-widening / methodology-revision step that produces `caveman-v<N+1>`. |
39
+ | ≥ 0 % and < 30 % | any | **Hold.** Keep default `off`. Authorised follow-up: widen corpus or tune carve-out share; no default flip. |
40
+ | ≥ 30 % | < 5 % | **Flip default on** — `caveman.speak_scope` defaults to a non-`off` value (separate roadmap), carve-outs stay, statusline surfaces lifetime tokens saved. |
41
+ | ≥ 30 % | ≥ 5 % | **Hold** — repeat the window once with tuned intensity ladder; second hold → deprecate. |
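The new decision table can be read as a small pure function. The sketch below is illustrative only: the function name and the verdict labels are hypothetical shorthand, and it assumes the second row covers medians from 0 % up to (but not including) 30 %.

```python
def kill_criterion_verdict(vs_terse_median_pct: float, quality_regression_pct: float) -> str:
    """Map a bench result onto one row of the decision table (sketch)."""
    if vs_terse_median_pct < 0:
        # Row 1: criterion not met, default stays off.
        return "defer"
    if vs_terse_median_pct < 30:
        # Row 2: hold, widen corpus or tune carve-out share.
        return "hold"
    if quality_regression_pct < 5:
        # Row 3: savings >= 30 % with acceptable quality.
        return "flip-default-on"
    # Row 4: savings >= 30 % but quality regressed by >= 5 %.
    return "hold-and-repeat"
```

For example, a `vs_terse` median of −9.27 % falls into the first row whatever the quality figure is.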
37
42
 
38
43
  "Quality regression" = host-side rubric on the corpus per
39
- `step-4-measurement-and-benchmark.md` Phase 3. Numbers checked into
40
- `docs/parity/bench.json` as the decision artefact.
44
+ `benchmark-report-schema.md`. Numbers checked into the published
45
+ `caveman-v<N>.json` as the decision artefact.
41
46
  4. **No interim flip.** The default does not move on anecdote,
42
- gut feeling, or a single benchmark snapshot. The 60-day window and
43
- the table above are the only path to a default change.
47
+ gut feeling, or a single positive prompt. Only a published
48
+ `caveman-v<N>` report with a `vs_terse` median in the "Flip" row
49
+ above authorises a default change, under a follow-up roadmap.
50
+
51
+ ## v1 verdict (2026-05-16)
52
+
53
+ [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
54
+ landed 30 calls · $0.0805 · 0 errors · `claude-sonnet-4-5`:
55
+
56
+ | Metric | Median | p10 | p90 |
57
+ |---|---:|---:|---:|
58
+ | `vs_raw` savings | +23.51 % | -18.29 % | +52.53 % |
59
+ | **`vs_terse` savings** | **−9.27 %** | **−109.85 %** | +51.32 % |
60
+ | Realised carve-out share (compressed arm) | 30.67 % | — | — |
61
+
62
+ Per row 1 of the table, the v1 verdict is **criterion not met — defer**.
63
+ Default stays `off`; no telemetry multiplier ships; no rule retirement
64
+ in this roadmap. Wins exist only on pure-prose prompts (caveman-09
65
+ +50.5 %, caveman-10 +58.4 %); carve-out-heavy prompts drag the median
66
+ negative (caveman-04 path-list −108 %, caveman-06 mode-marker −123 %).
67
+
68
+ ### Council split (recorded, not decisive)
69
+
70
+ Council run [`caveman-v1-kc-verdict.json`](../../agents/council-responses/caveman-v1-kc-verdict.json) <!-- council-ref-allowed: ADR decision trace for v1 kill-criterion verdict -->
71
+ (2 members · 1 round · $0.0514 actual) split:
72
+
73
+ - **`claude-sonnet-4-5`** → Decision A.1 (deprecate now) + Decision B.3
74
+ (suspend telemetry). Reasoning: the roadmap pinned `vs_terse` as
75
+ load-bearing; the data falsified it; retreating to `vs_raw` is
76
+ post-hoc rationalisation.
77
+ - **`gpt-4o`** → Decision A.3 (hold + re-bench with widened corpus +
78
+ revised terse-control prompt) + Decision B.2 (per-category
79
+ multipliers, suppress negatives). Reasoning: 10 prompts is a
80
+ razor-thin sample; the terse-control prompt may under-compress; the
81
+ carve-out validator (Phase 4) is not yet shipped, so we are
82
+ measuring a half-implemented feature.
83
+
84
+ **Synthesis (criterion-not-met + defer).** Both members agreed `vs_terse`
85
+ is the right gate. Neither's strongest path is taken in full inside
86
+ step-16: deprecation is reserved for a follow-up roadmap once v2 confirms
87
+ v1; re-bench is reserved for a follow-up roadmap with the methodology
88
+ revision the council requested. Step-16 ships the infrastructure (corpus,
89
+ bench arm, validator), records the v1 verdict, suspends the telemetry
90
+ multiplier, and hands the deprecate-vs-rebench call to the v2 roadmap.
44
91
 
45
92
  ## Why this is parked, not decided
46
93
 
47
- The council split (Opus = remove now, o1 = measure-then-decide) is
48
- real. Either branch is wrong-shaped without numbers. The kill-criterion
49
- gives the audit a deterministic resolution path and stops every
50
- downstream roadmap from re-litigating compression on every PR.
94
+ The 2026-05-14 council split (Opus = remove now, o1 = measure-then-decide)
95
+ predated v1 numbers. The 2026-05-16 council split (Sonnet = deprecate now,
96
+ GPT-4o = re-bench) is informed by v1 but disagrees on which methodological
97
+ weakness is decisive. The kill table above gives every future bench run a
98
+ deterministic resolution path and stops every downstream roadmap from
99
+ re-litigating compression on every PR.
51
100
 
52
101
  ## Cross-references
53
102
 
54
- - ``step-99-north-star-restructure.md` § Phase 4`
55
- parks this criterion, does not decide.
56
- - `step-4-measurement-and-benchmark.md`
57
- owns `task bench`, the corpus, and the closeout that applies the
58
- table above.
59
- - `step-10-caveman-parity.md`
60
- — implements the carve-outs and the statusline integration the
61
- "flip default on" branch depends on; blocks the default flip until
62
- acceptance is green.
103
+ - [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
104
+ v1 measurement; canonical baseline this doc cites.
105
+ - [`docs/benchmarks.md`](../benchmarks.md)
106
+ cadence + when the next bench run is mandatory.
107
+ - [`caveman-telemetry`](caveman-telemetry.md)
108
+ multiplier contract; records the suspended state v2 must lift.
63
109
  - [`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md)
64
110
  — runtime rule; reads `caveman.speak_scope` from settings.
65
111
 
66
112
  ## Done
67
113
 
68
- This doc exists to keep the decision visible. It is **not** an action
69
- item. `step-4` closeout closes the loop.
114
+ This doc reflects the v1 verdict. It is **not** an action item. The next
115
+ bench closeout (against `caveman-v2` once a widened corpus or revised
116
+ methodology is shipped) closes the loop.
@@ -0,0 +1,107 @@
1
+ ---
2
+ stability: beta
3
+ keep-beta-until: 2026-08-15
4
+ ---
5
+
6
+ # cost-summary schema (`cost-summary/v1`)
7
+
8
+ Stable JSON contract for inter-tool consumption of cost-tracking data
9
+ emitted by [`scripts/cost_summary.py`](../../scripts/cost_summary.py).
10
+ Schema-versioned so downstream consumers can pin and migrate explicitly.
11
+
12
+ Design reference: Ruflo `scripts/summary.mjs` (upstream cite). Our shape
13
+ diverges to align with the local `agents/cost-tracking/sessions.jsonl`
14
+ fields and the caveman-suspended-multiplier contract.
15
+
16
+ ## Envelope
17
+
18
+ ```json
19
+ {
20
+ "schema_version": "cost-summary/v1",
21
+ "generated_at": "2026-05-16T23:45:00Z",
22
+ "totals": { ... },
23
+ "by_session": [ ... ],
24
+ "by_conversation": [ ... ],
25
+ "by_model": [ ... ]
26
+ }
27
+ ```
28
+
29
+ | Field | Type | Notes |
30
+ |---|---|---|
31
+ | `schema_version` | string | Pinned to `cost-summary/v1`. Downstream consumers MUST refuse unknown versions. |
32
+ | `generated_at` | string (ISO-8601 UTC, `Z` suffix) | Emit time. |
33
+ | `totals` | object | Lifetime aggregates — see `totals` below. |
34
+ | `by_session` | array | Per `sessionId` row; ordered by `sessionId` ascending. |
35
+ | `by_conversation` | array | Per `conversation_id` row; ordered by `conversation_id` ascending. |
36
+ | `by_model` | array | Per `model` row; ordered by `model` ascending. |
37
+
38
+ ## `totals` shape
39
+
40
+ ```json
41
+ {
42
+ "sessions": 123,
43
+ "total_cost_usd": 1.2345,
44
+ "input_tokens": 100000,
45
+ "output_tokens": 50000,
46
+ "caveman_delta_tokens": 0,
47
+ "caveman_multiplier_version": "v1",
48
+ "caveman_multiplier_active": false
49
+ }
50
+ ```
51
+
52
+ `caveman_delta_tokens` is always `0` while
53
+ `caveman_multiplier_active == false` — see
54
+ [`caveman-telemetry.md`](caveman-telemetry.md) for the suspension contract.
55
+
56
+ ## `by_session` / `by_conversation` row shape
57
+
58
+ ```json
59
+ {
60
+ "key": "<sessionId or conversation_id>",
61
+ "sessions": 12,
62
+ "total_cost_usd": 0.4567,
63
+ "input_tokens": 8000,
64
+ "output_tokens": 4500,
65
+ "caveman_delta_tokens": 0
66
+ }
67
+ ```
68
+
69
+ The `key` field is the grouping identifier; consumers identify the
70
+ group by inspecting which array the row lives in.
71
+
72
+ ## `by_model` row shape
73
+
74
+ ```json
75
+ {
76
+ "model": "claude-3-5-sonnet-20241022",
77
+ "sessions": 12,
78
+ "total_cost_usd": 0.4567,
79
+ "input_tokens": 8000,
80
+ "output_tokens": 4500
81
+ }
82
+ ```
83
+
84
+ `by_model` omits caveman fields — the multiplier is dialect-scoped, not
85
+ model-scoped.
86
+
87
+ ## Stability guarantees
88
+
89
+ - **Field additions** are **non-breaking**: consumers MUST ignore unknown fields.
90
+ - **Field removals or renames** bump the `schema_version` minor (`v1` → `v2`).
91
+ - **Type changes** bump the major (`v1.*` → `v2.0`).
92
+ - Downstream consumers SHOULD pin to a specific `schema_version` and
93
+ refuse unknown ones; the pin is the migration boundary.
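A downstream consumer following these guarantees might look like the sketch below. It is an assumption-laden example, not part of the package: `load_cost_summary` and `SUPPORTED_SCHEMA` are hypothetical names, and only the `totals` fields shown in this schema are read.

```python
import json

# The single version this hypothetical consumer pins to.
SUPPORTED_SCHEMA = "cost-summary/v1"


def load_cost_summary(raw: str) -> dict:
    """Parse a cost summary, refusing unknown schema versions."""
    doc = json.loads(raw)
    version = doc.get("schema_version")
    if version != SUPPORTED_SCHEMA:
        raise ValueError(f"unsupported schema_version: {version!r}")
    totals = doc["totals"]
    # Pick only the known fields; anything else in `totals` is ignored,
    # so field additions stay non-breaking for this consumer.
    return {
        "total_cost_usd": totals["total_cost_usd"],
        "input_tokens": totals["input_tokens"],
        "output_tokens": totals["output_tokens"],
        "caveman_delta_tokens": totals["caveman_delta_tokens"],
    }
```

Selecting fields by name, rather than validating the whole object strictly, is what lets the consumer tolerate additive changes while still failing fast on a version it does not know.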
94
+
95
+ ## Downstream consumers
96
+
97
+ - `agent-status` skill — surfaces lifetime / current-conversation slice.
98
+ - Future `cost-export-to-monitoring` scripts (deferred; trigger:
99
+ consumer request) would wrap this JSON to push to Prometheus / OTLP.
100
+
101
+ ## See also
102
+
103
+ - [`caveman-telemetry.md`](caveman-telemetry.md) — defines the
104
+ `caveman_*` fields and the suspended-multiplier contract.
105
+ - [`scripts/cost_summary.py`](../../scripts/cost_summary.py) — implementation.
106
+ - [`scripts/cost_by_conversation.py`](../../scripts/cost_by_conversation.py) — narrower per-conversation lens with the same JSONL source.
107
+ - [`scripts/caveman_stats.py`](../../scripts/caveman_stats.py) — caveman-only delta lens with the same JSONL source.
@@ -1800,6 +1800,12 @@
1800
1800
  "load_context": [],
1801
1801
  "load_context_eager": []
1802
1802
  },
1803
+ ".agent-src.uncompressed/skills/compress-memory/SKILL.md": {
1804
+ "kind": "skill",
1805
+ "rule_type": null,
1806
+ "load_context": [],
1807
+ "load_context_eager": []
1808
+ },
1803
1809
  ".agent-src.uncompressed/skills/content-funnel-design/SKILL.md": {
1804
1810
  "kind": "skill",
1805
1811
  "rule_type": null,
@@ -6396,6 +6402,13 @@
6396
6402
  "via": "self",
6397
6403
  "depth": 0
6398
6404
  },
6405
+ {
6406
+ "source": ".agent-src.uncompressed/rules/caveman-speak.md",
6407
+ "target": ".agent-src.uncompressed/skills/compress-memory/SKILL.md",
6408
+ "type": "READ_ONLY",
6409
+ "via": "body_link",
6410
+ "depth": 1
6411
+ },
6399
6412
  {
6400
6413
  "source": ".agent-src.uncompressed/rules/cli-output-handling.md",
6401
6414
  "target": ".agent-src.uncompressed/rules/cli-output-handling.md",
@@ -8048,6 +8061,34 @@
8048
8061
  "via": "self",
8049
8062
  "depth": 0
8050
8063
  },
8064
+ {
8065
+ "source": ".agent-src.uncompressed/skills/compress-memory/SKILL.md",
8066
+ "target": ".agent-src.uncompressed/rules/caveman-speak.md",
8067
+ "type": "READ_ONLY",
8068
+ "via": "body_link",
8069
+ "depth": 1
8070
+ },
8071
+ {
8072
+ "source": ".agent-src.uncompressed/skills/compress-memory/SKILL.md",
8073
+ "target": ".agent-src.uncompressed/rules/role-mode-adherence.md",
8074
+ "type": "READ_ONLY",
8075
+ "via": "body_link",
8076
+ "depth": 1
8077
+ },
8078
+ {
8079
+ "source": ".agent-src.uncompressed/skills/compress-memory/SKILL.md",
8080
+ "target": ".agent-src.uncompressed/skills/agents-md-thin-root/SKILL.md",
8081
+ "type": "READ_ONLY",
8082
+ "via": "body_link",
8083
+ "depth": 1
8084
+ },
8085
+ {
8086
+ "source": ".agent-src.uncompressed/skills/compress-memory/SKILL.md",
8087
+ "target": ".agent-src.uncompressed/skills/compress-memory/SKILL.md",
8088
+ "type": "WRITE",
8089
+ "via": "self",
8090
+ "depth": 0
8091
+ },
8051
8092
  {
8052
8093
  "source": ".agent-src.uncompressed/skills/content-funnel-design/SKILL.md",
8053
8094
  "target": ".agent-src.uncompressed/skills/activation-design/SKILL.md",
@@ -10442,6 +10483,13 @@
10442
10483
  "via": "body_link",
10443
10484
  "depth": 1
10444
10485
  },
10486
+ {
10487
+ "source": ".agent-src.uncompressed/skills/refine-prompt/SKILL.md",
10488
+ "target": ".agent-src.uncompressed/skills/prompt-optimizer/SKILL.md",
10489
+ "type": "READ_ONLY",
10490
+ "via": "body_link",
10491
+ "depth": 1
10492
+ },
10445
10493
  {
10446
10494
  "source": ".agent-src.uncompressed/skills/refine-prompt/SKILL.md",
10447
10495
  "target": ".agent-src.uncompressed/skills/refine-prompt/SKILL.md",
@@ -0,0 +1,166 @@
1
+ # Prompt Templates
2
+
3
+ Reference catalogue of prompt structures the [`prompt-optimizer`](../../.agent-src.uncompressed/skills/prompt-optimizer/SKILL.md)
4
+ skill picks from during the **Develop** step of the 4-D methodology.
5
+ Cited by the [`refine-prompt`](../../.agent-src.uncompressed/skills/refine-prompt/SKILL.md)
6
+ skill in `mini` mode for stack-aware shaping.
7
+
8
+ Templates are tools, not dogma. Pick by request type, not by upstream
9
+ whitelist — see the [Rejection note](#rejection-note) at the bottom.
10
+
11
+ ## When-to-pick rubric
12
+
13
+ | Request shape | First-choice template | Fallback |
14
+ |---|---|---|
15
+ | One-shot technical change (refactor, lint fix) | **RTF** | File-Scope |
16
+ | Multi-file refactor across a codebase | **File-Scope** | RTF |
17
+ | Marketing / brand / tone-heavy copy | **CO-STAR** | CRISPE |
18
+ | Mixed-audience explainer (technical + lay) | **CRISPE** | CO-STAR |
19
+ | Step-by-step explainer / tutorial | **RISEN** | Few-Shot |
20
+ | Pattern-heavy task (classification, extraction) | **Few-Shot** | RISEN |
21
+ | Multi-step reasoning or math | **CoT** | ReAct |
22
+ | Tool-using agent (web search, file ops) | **ReAct** | CoT |
23
+ | Image AI (Midjourney, SD, DALL·E) | **Visual Descriptor** | Reference-Image-Edit |
24
+ | Image edit from a source image | **Reference-Image-Edit** | Visual Descriptor |
25
+ | ComfyUI / node-graph image workflows | **ComfyUI** | Visual Descriptor |
26
+ | Reverse-engineer an existing good prompt | **Prompt Decompiler** | — |
27
+ | Honest critical feedback on a finished artifact (post, design, naming, proposal) — any domain | **Honest Sparring Partner** | CRISPE |
28
+
29
+ ## Text templates
30
+
31
+ ### RTF — Role · Task · Format
32
+ Smallest viable structure. Three lines: who the AI is, what to do,
33
+ how the output should look. Best for one-shot technical asks.
34
+
35
+ ### CO-STAR — Context · Objective · Style · Tone · Audience · Response
36
+ Six-slot structure originally from the Singapore GovTech prompt
37
+ study. Best for copy where audience and tone carry the message.
38
+
39
+ ### RISEN — Role · Input · Steps · Expectation · Narrowing
40
+ Step-explicit shape. Best for tutorials, walkthroughs, and any
41
+ output where the reader follows the prompt's structure.
42
+
43
+ ### CRISPE — Capacity · Role · Insight · Statement · Personality · Experiment
44
+ Six-slot, persona-heavy. Best when the *voice* matters more than
45
+ the structure (creative writing, explainers with a strong narrator).
46
+
47
+ ### CoT — Chain-of-Thought
48
+ Append "Think step by step" or "Reason aloud before answering" to
49
+ any base template. Best for multi-step reasoning, math, planning.
50
+ Pairs well with RTF or RISEN.
51
+
52
+ ### Few-Shot
53
+ Two to five worked examples inline before the actual ask. Best for
54
+ classification, extraction, format-mimicking. Costs tokens — keep
55
+ examples minimal and representative.
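A Few-Shot prompt of this shape could be assembled as in the sketch below. The `Input:` / `Output:` labels and the function name are assumptions made for illustration; the catalogue itself does not prescribe a layout.

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Build a few-shot prompt from (input, output) example pairs (sketch)."""
    # The template calls for two to five worked examples.
    if not 2 <= len(examples) <= 5:
        raise ValueError("expected two to five examples")
    blocks = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    # The actual ask goes last, with the output left open for the model.
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)
```

Keeping the examples as data makes it easy to trim them when token cost matters.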
56
+
57
+ ### File-Scope
58
+ Codebase-aware variant of RTF. Names the files in scope, the
59
+ allowed-edit list, and the "do not touch" list explicitly. Best for
60
+ agent-driven refactors where blast radius matters.
61
+
62
+ ```
63
+ Role: senior TypeScript engineer
64
+ Files in scope: src/auth/*.ts, tests/auth/*.ts
65
+ Do not modify: src/db/*, package.json, tsconfig.json
66
+ Task: migrate from jsonwebtoken to jose; keep the exported API stable
67
+ Format: unified diff
68
+ ```
69
+
70
+ ### ReAct — Reason + Act
71
+ Interleaved thought / action / observation loop. Best for agents
72
+ with tools — web search, file ops, shell. Each cycle:
73
+
74
+ ```
75
+ Thought: <why I need this next step>
76
+ Action: <tool call>
77
+ Observation: <result>
78
+ ... repeat ...
79
+ Final Answer: <synthesis>
80
+ ```
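The cycle above might be driven by a loop like the following sketch. Everything here is an assumption for illustration: `ask_model` stands in for whatever LLM call is used, the `Action: <tool> <argument>` line format is a hypothetical convention, and `tools` is a plain dict of callables.

```python
from typing import Callable


def react_loop(
    ask_model: Callable[[str], str],
    tools: dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Run Thought / Action / Observation cycles until a Final Answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = ask_model(transcript)
        transcript += reply + "\n"
        for line in reply.splitlines():
            if line.startswith("Final Answer:"):
                return line[len("Final Answer:"):].strip()
            if line.startswith("Action:"):
                # Assumed format: "Action: <tool name> <argument>".
                name, _, arg = line[len("Action:"):].strip().partition(" ")
                tool = tools.get(name)
                observation = tool(arg) if tool else f"unknown tool {name!r}"
                # Feed the tool result back to the model on the next cycle.
                transcript += f"Observation: {observation}\n"
                break
    raise RuntimeError("no final answer within max_steps")
```

The `max_steps` cap is a design choice of this sketch so that a model which never emits a final answer cannot loop forever.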
81
+
82
+ ### Honest Sparring Partner
83
+ Domain-agnostic stance template for getting honest critical feedback on
84
+ a finished artifact — blog post, design draft, naming decision, business
85
+ proposal, care-plan, marketing copy. Works for any role (engineer,
86
+ graphic designer, nurse, founder) because the role slot is filled by
87
+ the user's actual profession, not hard-coded.
88
+
89
+ Five-slot shape:
90
+
91
+ ```
92
+ Role: <user's domain> — e.g. graphic designer, geriatric nurse, founder
93
+ Stance: honest sparring partner, not yes-man. Push back when something
94
+ is weak; acknowledge when something is solid; stay silent when
95
+ there is nothing substantive to add. No flattery openings, no
96
+ artificial criticism for its own sake.
97
+ Context-fit: ask ONE clarifying question only if real context is missing
98
+ (role, audience, constraint). Otherwise answer directly.
99
+ Artifact: <the finished thing — paste, link, or describe>
100
+ Ask: <what kind of reaction the user wants — "does the argument hold?",
101
+ "is the naming clear?", "would this land with audience X?">
102
+ ```
103
+
104
+ **Anti-pattern this rejects:** "what do you honestly think?" prompts that
105
+ either default to praise ("looks great!") or default to manufactured
106
+ criticism ("here are 5 problems...") regardless of whether the work
107
+ warrants either reaction. The stance slot makes the honest-when-warranted
108
+ contract explicit.
109
+
110
+ **Package equivalents** — inside this agent-config, the
111
+ [`adversarial-review`](../../.agent-src.uncompressed/skills/adversarial-review/SKILL.md)
112
+ skill implements the same stance via an Attack-Defend-Revise loop and is
113
+ the right tool when the user submits finished work for a critical take.
114
+ This template is for **end-users prompting their own LLM** (ChatGPT,
115
+ Claude, Gemini) outside this package.
116
+
117
+ ## Image templates
118
+
119
+ ### Visual Descriptor
120
+ Subject · style · composition · lighting · medium · mood · technical
121
+ parameters (aspect ratio, resolution). Best for Midjourney, Stable
122
+ Diffusion, DALL·E from a blank canvas.
123
+
124
+ ### Reference-Image-Edit
125
+ Source image reference + change set + preservation set. Names what
126
+ to change, what to keep, and the desired output framing. Best for
127
+ inpainting, style transfer, character consistency.
128
+
129
+ ### ComfyUI
130
+ Node-graph-aware. Names the workflow nodes (KSampler, CLIPTextEncode,
131
+ VAEDecode) and the parameter intent per node rather than the
132
+ parameter values. Best for advanced SD pipelines.
133
+
134
+ ## Reverse template
135
+
136
+ ### Prompt Decompiler
137
+ Given an existing good output, reconstruct the prompt that would
138
+ produce it. Used to mine prompts from public LLM artifacts. Not a
139
+ shaping template — a forensic one.
140
+
141
+ ## Rejection note
142
+
143
+ Upstream `nidhinjs/prompt-master` claims that only five techniques
144
+ are "safe" for production prompting:
145
+
146
+ - few-shot
147
+ - role assignment
148
+ - structured output
149
+ - constraint-based
150
+ - chain-of-thought
151
+
152
+ This package **rejects** that whitelist. CO-STAR, RISEN, CRISPE,
153
+ ReAct, and the image-AI templates above are first-class. The
154
+ "5 safe" framing came from a single benchmark on a single LLM
155
+ generation — it does not generalise. See AI Council session
156
+ `agents/council-responses/prompt-master-mini.json` (2026-05-17) for the analysis behind this rejection. <!-- council-ref-allowed: ADR decision trace -->
157
+
158
+ The right gate is request-type fit, not technique-whitelist
159
+ membership.
160
+
161
+ ## See also
162
+
163
+ - [`prompt-optimizer`](../../.agent-src.uncompressed/skills/prompt-optimizer/SKILL.md) — engine-outbound; cites this catalogue in its Develop step
164
+ - [`refine-prompt`](../../.agent-src.uncompressed/skills/refine-prompt/SKILL.md) — engine-inbound; uses templates in `mini` mode for stack-aware shaping
165
+ - [`prompt-engineering-patterns`](../../.agent-src.uncompressed/skills/prompt-engineering-patterns/SKILL.md) — production-LLM prompt patterns (sibling skill, not a catalogue)
166
+ - AI Council session: `agents/council-responses/prompt-master-mini.json` (2026-05-17) <!-- council-ref-allowed: ADR decision trace -->
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@event4u/agent-config",
3
- "version": "2.20.1",
3
+ "version": "2.23.0",
4
4
  "description": "Shared agent configuration \u2014 skills, rules, commands, guidelines, and templates for AI coding tools",
5
5
  "license": "MIT",
6
6
  "private": false,