@event4u/agent-config 2.20.1 → 2.23.0
- package/.agent-src/commands/agent-status.md +16 -0
- package/.agent-src/rules/caveman-speak.md +2 -0
- package/.agent-src/skills/adversarial-review/SKILL.md +2 -1
- package/.agent-src/skills/canvas-design/SKILL.md +11 -6
- package/.agent-src/skills/compress-memory/SKILL.md +119 -0
- package/.agent-src/skills/fe-design/SKILL.md +8 -0
- package/.agent-src/skills/prompt-optimizer/SKILL.md +29 -5
- package/.agent-src/skills/react-shadcn-ui/SKILL.md +9 -0
- package/.agent-src/skills/refine-prompt/SKILL.md +57 -0
- package/.agent-src/skills/tailwind-engineer/SKILL.md +14 -0
- package/.agent-src/templates/agents/agent-project-settings.example.yml +53 -1
- package/.claude-plugin/marketplace.json +2 -1
- package/CHANGELOG.md +101 -138
- package/README.md +5 -5
- package/docs/architecture.md +2 -2
- package/docs/archive/CHANGELOG-pre-2.20.0.md +159 -0
- package/docs/benchmarks.md +74 -0
- package/docs/catalog.md +5 -3
- package/docs/contracts/caveman-telemetry.md +83 -0
- package/docs/contracts/compression-default-kill-criterion.md +82 -35
- package/docs/contracts/cost-summary-schema.md +107 -0
- package/docs/contracts/file-ownership-matrix.json +48 -0
- package/docs/guidelines/prompt-templates.md +166 -0
- package/package.json +1 -1
- package/scripts/_lib/bench_caveman.py +273 -0
- package/scripts/_lib/bench_caveman_report.py +152 -0
- package/scripts/bench_compress_memory.py +168 -0
- package/scripts/bench_run.py +119 -1
- package/scripts/caveman_stats.py +119 -0
- package/scripts/check_command_count_messaging.py +2 -2
- package/scripts/compress_memory.py +172 -0
- package/scripts/cost_by_conversation.py +78 -0
- package/scripts/cost_summary.py +97 -0
- package/scripts/update_counts.py +7 -5
- package/scripts/validate_caveman_carveouts.py +129 -0
- package/scripts/validate_safe_paths.py +118 -0
- package/scripts/verify_roadmap_closure.py +327 -0
package/docs/contracts/caveman-telemetry.md
@@ -0,0 +1,83 @@
|
|
|
1
|
+
---
|
|
2
|
+
stability: beta
|
|
3
|
+
keep-beta-until: 2026-08-15
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# caveman telemetry — multiplier contract
|
|
7
|
+
|
|
8
|
+
> **Status:** suspended (kill-criterion not met in `caveman-v1`).
|
|
9
|
+
> Telemetry surface records `caveman_delta_tokens = 0` until a v2 bench
|
|
10
|
+
> proves a positive multiplier on the load-bearing `vs_terse` arm.
|
|
11
|
+
|
|
12
|
+
## Constant
|
|
13
|
+
|
|
14
|
+
| Key | Value | Provenance |
|
|
15
|
+
|---|---|---|
|
|
16
|
+
| `caveman_multiplier_version` | `v1` | Tied to `bench/reports/caveman-v1.{json,md}` |
|
|
17
|
+
| `caveman_multiplier_value` | `0.9155` | `median(terse_control_tokens / compressed_tokens)` over the 10-prompt v1 corpus |
|
|
18
|
+
| `caveman_multiplier_p10` | `0.4506` | 10th percentile (worst-case carve-out-tax prompts) |
|
|
19
|
+
| `caveman_multiplier_p90` | `2.3664` | 90th percentile (pure-prose prompts where caveman wins) |
|
|
20
|
+
| `caveman_multiplier_active` | `false` | **Suspended** — kill-criterion not met (`vs_terse` median −9.27 %) |
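As a rough illustration of the provenance formula for `caveman_multiplier_value`, the sketch below takes the median of per-prompt `terse_control_tokens / compressed_tokens` ratios. The token counts are made-up placeholders, not v1 bench data, and `multiplier_from_pairs` is a hypothetical helper name rather than anything shipped in the package.

```python
from statistics import median


def multiplier_from_pairs(pairs):
    """Median ratio over (terse_control_tokens, compressed_tokens) pairs.

    Hypothetical helper: it only mirrors the formula quoted in the table.
    """
    return median(terse / compressed for terse, compressed in pairs)


# Placeholder counts for three prompts (illustrative only, not bench results).
sample = [(90, 100), (200, 100), (50, 100)]
print(multiplier_from_pairs(sample))  # 0.9
```

A value below `1.0`, as here, means the compressed arm used more tokens than the terse-control arm on the median prompt.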
|
|
21
|
+
|
|
22
|
+
The **active** flag gates whether the multiplier is applied to runtime
|
|
23
|
+
telemetry. While `false`, `scripts/caveman_stats.py` reports
|
|
24
|
+
`caveman_delta_tokens = 0` regardless of `speak_scope` setting.
|
|
25
|
+
|
|
26
|
+
## How the multiplier is interpreted
|
|
27
|
+
|
|
28
|
+
`caveman_estimated_uncompressed_tokens = caveman_compressed_tokens × M`,
|
|
29
|
+
where `M = caveman_multiplier_value`.
|
|
30
|
+
|
|
31
|
+
`caveman_delta_tokens = caveman_estimated_uncompressed_tokens − caveman_compressed_tokens`.
|
|
32
|
+
|
|
33
|
+
- `M > 1.0` → caveman compresses; `delta` is **positive** (saving).
|
|
34
|
+
- `M = 1.0` → break-even; no delta surfaced.
|
|
35
|
+
- `M < 1.0` → caveman costs more than the terse baseline; `delta` is
|
|
36
|
+
**negative**. Surfacing a negative saving is misleading for the
|
|
37
|
+
user (looks like a bug), so the contract is to **suspend the
|
|
38
|
+
multiplier** and record `delta = 0` until a v2 bench lifts `M`
|
|
39
|
+
above `1.0` on the load-bearing arm.
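The interpretation rules above can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not the code in `scripts/caveman_stats.py`: the function name and signature are hypothetical, and only the formula and the suspension gate are taken from this contract.

```python
def caveman_delta_tokens(compressed_tokens: int, multiplier: float, active: bool) -> int:
    """Estimated tokens saved for one session (hypothetical helper)."""
    # Suspension gate: while the multiplier is inactive, always report 0.
    if not active:
        return 0
    # estimated_uncompressed = compressed * M; delta = estimated - compressed.
    estimated_uncompressed = compressed_tokens * multiplier
    return round(estimated_uncompressed - compressed_tokens)


# With the v1 state (active = false) the delta is 0 whatever the multiplier is.
print(caveman_delta_tokens(1000, 0.9155, active=False))  # 0
```

Keeping the gate as the first check means a negative delta can never leak into telemetry while the contract is suspended.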
|
|
40
|
+
|
|
41
|
+
## Why suspended after v1
|
|
42
|
+
|
|
43
|
+
The `caveman-v1` bench (`bench/reports/caveman-v1.md`, 30 calls,
|
|
44
|
+
2026-05-16) found:
|
|
45
|
+
|
|
46
|
+
- Median savings vs raw uncompressed: **+23.51 %** (inflated by the
|
|
47
|
+
carve-out-tax-free pure-prose prompts).
|
|
48
|
+
- Median savings vs terse-control: **−9.27 %** (load-bearing).
|
|
49
|
+
- Carve-out-heavy prompts (path-list −108 %, mode-marker −123 %)
|
|
50
|
+
drag the median negative.
|
|
51
|
+
|
|
52
|
+
The terse-control arm is the kill-criterion baseline per
|
|
53
|
+
[`compression-default-kill-criterion.md`](compression-default-kill-criterion.md).
|
|
54
|
+
Until a v2 bench (broader corpus or a re-tuned dialect) lifts the
|
|
55
|
+
`vs_terse` median to ≥ 0 %, the multiplier stays suspended.
|
|
56
|
+
|
|
57
|
+
## How to lift the suspension
|
|
58
|
+
|
|
59
|
+
1. Run an extended bench against a broader corpus (Phase 3+ work).
|
|
60
|
+
2. If `median(savings_vs_terse) ≥ 0` (and ideally ≥ 30 % to flip the
|
|
61
|
+
rule default), recompute `caveman_multiplier_value`.
|
|
62
|
+
3. Update this contract: bump `caveman_multiplier_version` to `v2`,
|
|
63
|
+
set `caveman_multiplier_active = true`, cite the new bench file.
|
|
64
|
+
4. The change is reversible — drop back to `v1` if a regression
|
|
65
|
+
appears.
|
|
66
|
+
|
|
67
|
+
## Consumers
|
|
68
|
+
|
|
69
|
+
- [`scripts/caveman_stats.py`](../../scripts/caveman_stats.py) — reads
|
|
70
|
+
this constant, computes per-session / per-conversation / lifetime
|
|
71
|
+
deltas from `agents/cost-tracking/sessions.jsonl`.
|
|
72
|
+
- [`scripts/cost_summary.py`](../../scripts/cost_summary.py) — emits
|
|
73
|
+
the stable JSON contract for inter-tool consumption per
|
|
74
|
+
[`cost-summary-schema.md`](cost-summary-schema.md).
|
|
75
|
+
- `agent-status` skill — surfaces the per-session delta in the
|
|
76
|
+
status report under the `[caveman: …]` widget.
|
|
77
|
+
|
|
78
|
+
## See also
|
|
79
|
+
|
|
80
|
+
- [`compression-default-kill-criterion.md`](compression-default-kill-criterion.md) — the rule-default-flip gate; this multiplier is gated on the same `vs_terse` arm.
|
|
81
|
+
- [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) — provenance for the `v1` value.
|
|
82
|
+
- [`bench/reports/caveman-v2.md`](../../bench/reports/caveman-v2.md) — input-side (orthogonal); does NOT feed this multiplier (this multiplier is output-side).
|
|
83
|
+
- [`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md) — runtime rule the multiplier measures.
|
|
package/docs/contracts/compression-default-kill-criterion.md
@@ -5,14 +5,16 @@ keep-beta-until: 2026-08-14
|
|
|
5
5
|
|
|
6
6
|
# Compression default — kill-criterion
|
|
7
7
|
|
|
8
|
-
> **Status:**
|
|
9
|
-
> closeout
|
|
8
|
+
> **Status:** v1-measured · criterion not met · default stays `off` · **Owner:** `step-16-caveman-substance.md`
|
|
9
|
+
> Phase 1 closeout · **Sources:** [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) ·
|
|
10
|
+
> [`council-synthesis.md` § 7](../../agents/audit-2026-05-14-north-star/council-synthesis.md) ·
|
|
11
|
+
> [`caveman-v1-kc-verdict.json`](../../agents/council-responses/caveman-v1-kc-verdict.json) <!-- council-ref-allowed: ADR decision trace for v1 kill-criterion verdict -->
|
|
10
12
|
|
|
11
13
|
## Rule
|
|
12
14
|
|
|
13
15
|
```
|
|
14
|
-
DEFAULT STAYS OFF UNTIL `task bench` PRODUCES A
|
|
15
|
-
DECISION OWNED BY
|
|
16
|
+
DEFAULT STAYS OFF UNTIL `task bench -- --caveman` PRODUCES A POSITIVE vs_terse MEDIAN.
|
|
17
|
+
DECISION OWNED BY THE NEXT BENCH CLOSEOUT, NOT BY THIS DOC.
|
|
16
18
|
```
|
|
17
19
|
|
|
18
20
|
1. **Current state.** `caveman.speak_scope` defaults `off`. Carve-outs
|
|
@@ -21,49 +23,94 @@ DECISION OWNED BY step-4 CLOSEOUT, NOT BY THIS DOC OR BY step-99.
|
|
|
21
23
|
[`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md)
|
|
22
24
|
but the feature is non-promoted: no skill recommends turning it on,
|
|
23
25
|
no preset enables it, no profile depends on it.
|
|
24
|
-
2. **
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
|
|
26
|
+
2. **Baselines.** Every published `bench/reports/caveman-v<N>.{json,md}`
|
|
27
|
+
measures three arms (`compressed` · `terse-control` ·
|
|
28
|
+
`uncompressed`) and reports two savings columns:
|
|
29
|
+
- `vs_raw` — median savings against the uncompressed arm.
|
|
30
|
+
- `vs_terse` — **load-bearing** median savings against the
|
|
31
|
+
`Answer concisely.` terse-control arm. `vs_raw` is inflated by the
|
|
32
|
+
carve-out-tax-free pure-prose case and is **not** the gate metric.
|
|
33
|
+
3. **Decision table.** Read the latest `bench/reports/caveman-v<N>.md`
|
|
34
|
+
and apply exactly one of:
|
|
35
|
+
|
|
36
|
+
| Measured `vs_terse` median | Quality regression on corpus | Verdict |
|
|
33
37
|
|---|---|---|
|
|
34
|
-
| <
|
|
35
|
-
|
|
|
36
|
-
| ≥ 30 % |
|
|
38
|
+
| < 0 % | any | **Criterion not met — defer.** Keep default `off`. No telemetry multiplier. Next move owned by the corpus-widening / methodology-revision step that produces `caveman-v<N+1>`. |
|
|
39
|
+
| 0 % – < 30 % | any | **Hold.** Keep default `off`. Authorised follow-up: widen corpus or tune carve-out share; no default flip. |
|
|
40
|
+
| ≥ 30 % | < 5 % | **Flip default on** — `caveman.speak_scope` defaults to a non-`off` value (separate roadmap), carve-outs stay, statusline surfaces lifetime tokens saved. |
|
|
41
|
+
| ≥ 30 % | ≥ 5 % | **Hold** — repeat the window once with tuned intensity ladder; second hold → deprecate. |
|
|
37
42
|
|
|
38
43
|
"Quality regression" = host-side rubric on the corpus per
|
|
39
|
-
`
|
|
40
|
-
`
|
|
44
|
+
`benchmark-report-schema.md`. Numbers checked into the published
|
|
45
|
+
`caveman-v<N>.json` as the decision artefact.
|
|
41
46
|
4. **No interim flip.** The default does not move on anecdote,
|
|
42
|
-
gut feeling, or a single
|
|
43
|
-
|
|
47
|
+
gut feeling, or a single positive prompt. Only a published
|
|
48
|
+
`caveman-v<N>` report with a `vs_terse` median in the "Flip" row
|
|
49
|
+
above authorises a default change, under a follow-up roadmap.
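The decision table in item 3 can be read as a pure function of the two measured numbers. The sketch below is one possible encoding, assuming both inputs are expressed in percent; the function name and the short verdict labels are hypothetical, and the "second hold → deprecate" follow-up is not modelled.

```python
def kill_criterion_verdict(vs_terse_median_pct: float, quality_regression_pct: float) -> str:
    """Map a bench result to one row of the decision table (hypothetical labels)."""
    if vs_terse_median_pct < 0:
        return "defer"            # row 1: criterion not met, default stays off
    if vs_terse_median_pct < 30:
        return "hold"             # row 2: widen corpus or tune carve-out share
    if quality_regression_pct < 5:
        return "flip-default-on"  # row 3: savings >= 30 %, regression < 5 %
    return "hold-and-repeat"      # row 4: savings >= 30 %, regression >= 5 %


# The v1 median of -9.27 % lands in row 1 regardless of quality regression.
print(kill_criterion_verdict(-9.27, 0.0))  # defer
```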
|
|
50
|
+
|
|
51
|
+
## v1 verdict (2026-05-16)
|
|
52
|
+
|
|
53
|
+
[`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
|
|
54
|
+
landed 30 calls · $0.0805 · 0 errors · `claude-sonnet-4-5`:
|
|
55
|
+
|
|
56
|
+
| Metric | Median | p10 | p90 |
|
|
57
|
+
|---|---:|---:|---:|
|
|
58
|
+
| `vs_raw` savings | +23.51 % | −18.29 % | +52.53 % |
|
|
59
|
+
| **`vs_terse` savings** | **−9.27 %** | **−109.85 %** | +51.32 % |
|
|
60
|
+
| Realised carve-out share (compressed arm) | 30.67 % | — | — |
|
|
61
|
+
|
|
62
|
+
Per row 1 of the table, the v1 verdict is **criterion not met — defer**.
|
|
63
|
+
Default stays `off`; no telemetry multiplier ships; no rule retirement
|
|
64
|
+
in this roadmap. Wins exist only on pure-prose prompts (caveman-09
|
|
65
|
+
+50.5 %, caveman-10 +58.4 %); carve-out-heavy prompts drag the median
|
|
66
|
+
negative (caveman-04 path-list −108 %, caveman-06 mode-marker −123 %).
|
|
67
|
+
|
|
68
|
+
### Council split (recorded, not decisive)
|
|
69
|
+
|
|
70
|
+
Council run [`caveman-v1-kc-verdict.json`](../../agents/council-responses/caveman-v1-kc-verdict.json) <!-- council-ref-allowed: ADR decision trace for v1 kill-criterion verdict -->
|
|
71
|
+
(2 members · 1 round · $0.0514 actual) split:
|
|
72
|
+
|
|
73
|
+
- **`claude-sonnet-4-5`** → Decision A.1 (deprecate now) + Decision B.3
|
|
74
|
+
(suspend telemetry). Reasoning: the roadmap pinned `vs_terse` as
|
|
75
|
+
load-bearing; the data falsified it; retreating to `vs_raw` is
|
|
76
|
+
post-hoc rationalisation.
|
|
77
|
+
- **`gpt-4o`** → Decision A.3 (hold + re-bench with widened corpus +
|
|
78
|
+
revised terse-control prompt) + Decision B.2 (per-category
|
|
79
|
+
multipliers, suppress negatives). Reasoning: 10 prompts is a
|
|
80
|
+
razor-thin sample; the terse-control prompt may under-compress; the
|
|
81
|
+
carve-out validator (Phase 4) is not yet shipped, so we are
|
|
82
|
+
measuring a half-implemented feature.
|
|
83
|
+
|
|
84
|
+
**Synthesis (criterion-not-met + defer).** Both members agreed `vs_terse`
|
|
85
|
+
is the right gate. Neither member's strongest path is taken in full inside
|
|
86
|
+
step-16: deprecation is reserved for a follow-up roadmap once v2 confirms
|
|
87
|
+
v1; re-bench is reserved for a follow-up roadmap with the methodology
|
|
88
|
+
revision the council requested. Step-16 ships the infrastructure (corpus,
|
|
89
|
+
bench arm, validator), records the v1 verdict, suspends the telemetry
|
|
90
|
+
multiplier, and hands the deprecate-vs-rebench call to the v2 roadmap.
|
|
44
91
|
|
|
45
92
|
## Why this is parked, not decided
|
|
46
93
|
|
|
47
|
-
The council split (Opus = remove now, o1 = measure-then-decide)
|
|
48
|
-
|
|
49
|
-
|
|
50
|
-
|
|
94
|
+
The 2026-05-14 council split (Opus = remove now, o1 = measure-then-decide)
|
|
95
|
+
predated v1 numbers. The 2026-05-16 council split (Sonnet = deprecate now,
|
|
96
|
+
GPT-4o = re-bench) is informed by v1 but disagrees on which methodological
|
|
97
|
+
weakness is decisive. The kill table above gives every future bench run a
|
|
98
|
+
deterministic resolution path and stops every downstream roadmap from
|
|
99
|
+
re-litigating compression on every PR.
|
|
51
100
|
|
|
52
101
|
## Cross-references
|
|
53
102
|
|
|
54
|
-
-
|
|
55
|
-
—
|
|
56
|
-
- `
|
|
57
|
-
—
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
— implements the carve-outs and the statusline integration the
|
|
61
|
-
"flip default on" branch depends on; blocks the default flip until
|
|
62
|
-
acceptance is green.
|
|
103
|
+
- [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
|
|
104
|
+
— v1 measurement; canonical baseline this doc cites.
|
|
105
|
+
- [`docs/benchmarks.md`](../benchmarks.md)
|
|
106
|
+
— cadence + when the next bench run is mandatory.
|
|
107
|
+
- [`caveman-telemetry`](caveman-telemetry.md)
|
|
108
|
+
— multiplier contract; records the suspended state v2 must lift.
|
|
63
109
|
- [`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md)
|
|
64
110
|
— runtime rule; reads `caveman.speak_scope` from settings.
|
|
65
111
|
|
|
66
112
|
## Done
|
|
67
113
|
|
|
68
|
-
This doc
|
|
69
|
-
|
|
114
|
+
This doc reflects the v1 verdict. It is **not** an action item. The next
|
|
115
|
+
bench closeout (against `caveman-v2` once a widened corpus or revised
|
|
116
|
+
methodology is shipped) closes the loop.
|
|
package/docs/contracts/cost-summary-schema.md
@@ -0,0 +1,107 @@
|
|
|
1
|
+
---
|
|
2
|
+
stability: beta
|
|
3
|
+
keep-beta-until: 2026-08-15
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# cost-summary schema (`cost-summary/v1`)
|
|
7
|
+
|
|
8
|
+
Stable JSON contract for inter-tool consumption of cost-tracking data
|
|
9
|
+
emitted by [`scripts/cost_summary.py`](../../scripts/cost_summary.py).
|
|
10
|
+
Schema-versioned so downstream consumers can pin and migrate explicitly.
|
|
11
|
+
|
|
12
|
+
Design reference: Ruflo `scripts/summary.mjs` (upstream cite). Our shape
|
|
13
|
+
diverges to align with the local `agents/cost-tracking/sessions.jsonl`
|
|
14
|
+
fields and the caveman-suspended-multiplier contract.
|
|
15
|
+
|
|
16
|
+
## Envelope
|
|
17
|
+
|
|
18
|
+
```json
|
|
19
|
+
{
|
|
20
|
+
"schema_version": "cost-summary/v1",
|
|
21
|
+
"generated_at": "2026-05-16T23:45:00Z",
|
|
22
|
+
"totals": { ... },
|
|
23
|
+
"by_session": [ ... ],
|
|
24
|
+
"by_conversation": [ ... ],
|
|
25
|
+
"by_model": [ ... ]
|
|
26
|
+
}
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
| Field | Type | Notes |
|
|
30
|
+
|---|---|---|
|
|
31
|
+
| `schema_version` | string | Pinned to `cost-summary/v1`. Downstream consumers MUST refuse unknown versions. |
|
|
32
|
+
| `generated_at` | string (ISO-8601 UTC, `Z` suffix) | Emit time. |
|
|
33
|
+
| `totals` | object | Lifetime aggregates — see `totals` below. |
|
|
34
|
+
| `by_session` | array | Per `sessionId` row; ordered by `sessionId` ascending. |
|
|
35
|
+
| `by_conversation` | array | Per `conversation_id` row; ordered by `conversation_id` ascending. |
|
|
36
|
+
| `by_model` | array | Per `model` row; ordered by `model` ascending. |
|
|
37
|
+
|
|
38
|
+
## `totals` shape
|
|
39
|
+
|
|
40
|
+
```json
|
|
41
|
+
{
|
|
42
|
+
"sessions": 123,
|
|
43
|
+
"total_cost_usd": 1.2345,
|
|
44
|
+
"input_tokens": 100000,
|
|
45
|
+
"output_tokens": 50000,
|
|
46
|
+
"caveman_delta_tokens": 0,
|
|
47
|
+
"caveman_multiplier_version": "v1",
|
|
48
|
+
"caveman_multiplier_active": false
|
|
49
|
+
}
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
`caveman_delta_tokens` is always `0` while
|
|
53
|
+
`caveman_multiplier_active == false` — see
|
|
54
|
+
[`caveman-telemetry.md`](caveman-telemetry.md) for the suspension contract.
|
|
55
|
+
|
|
56
|
+
## `by_session` / `by_conversation` row shape
|
|
57
|
+
|
|
58
|
+
```json
|
|
59
|
+
{
|
|
60
|
+
"key": "<sessionId or conversation_id>",
|
|
61
|
+
"sessions": 12,
|
|
62
|
+
"total_cost_usd": 0.4567,
|
|
63
|
+
"input_tokens": 8000,
|
|
64
|
+
"output_tokens": 4500,
|
|
65
|
+
"caveman_delta_tokens": 0
|
|
66
|
+
}
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
The `key` field is the grouping identifier; consumers identify the
|
|
70
|
+
group by inspecting which array the row lives in.
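To make the row shape concrete, here is a hedged sketch of how rows like these could be grouped and ordered. It is not the implementation in `scripts/cost_summary.py`. The input field names `cost_usd`, `input_tokens`, and `output_tokens` are assumptions about the JSONL records, and `sessions` is assumed to count records per key. Only `sessionId`, `conversation_id`, the output field names, and the ascending order come from this schema.

```python
from collections import defaultdict


def group_rows(records: list[dict], key_field: str) -> list[dict]:
    """Aggregate records into rows keyed by `key_field`, sorted ascending by key."""
    groups: dict[str, dict] = defaultdict(lambda: {
        "sessions": 0,
        "total_cost_usd": 0.0,
        "input_tokens": 0,
        "output_tokens": 0,
        "caveman_delta_tokens": 0,  # stays 0 while the multiplier is suspended
    })
    for rec in records:
        row = groups[rec[key_field]]
        row["sessions"] += 1
        row["total_cost_usd"] += rec["cost_usd"]        # assumed input field
        row["input_tokens"] += rec["input_tokens"]      # assumed input field
        row["output_tokens"] += rec["output_tokens"]    # assumed input field
    return [{"key": key, **groups[key]} for key in sorted(groups)]
```

Calling `group_rows(records, "sessionId")` would feed `by_session`, and `group_rows(records, "conversation_id")` would feed `by_conversation`.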
|
|
71
|
+
|
|
72
|
+
## `by_model` row shape
|
|
73
|
+
|
|
74
|
+
```json
|
|
75
|
+
{
|
|
76
|
+
"model": "claude-3-5-sonnet-20241022",
|
|
77
|
+
"sessions": 12,
|
|
78
|
+
"total_cost_usd": 0.4567,
|
|
79
|
+
"input_tokens": 8000,
|
|
80
|
+
"output_tokens": 4500
|
|
81
|
+
}
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
`by_model` omits caveman fields — the multiplier is dialect-scoped, not
|
|
85
|
+
model-scoped.
|
|
86
|
+
|
|
87
|
+
## Stability guarantees
|
|
88
|
+
|
|
89
|
+
- **Field additions** are **non-breaking**: consumers MUST ignore unknown fields.
|
|
90
|
+
- **Field removals or renames** bump the `schema_version` minor (`v1` → `v2`).
|
|
91
|
+
- **Type changes** bump the major (`v1.*` → `v2.0`).
|
|
92
|
+
- Downstream consumers SHOULD pin to a specific `schema_version` and
|
|
93
|
+
refuse unknown ones; the pin is the migration boundary.
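A downstream consumer following these guarantees might look like the sketch below. The function name `read_totals` and the choice of returned fields are assumptions made for illustration; the pinned version string and the field names come from the schema above.

```python
import json

PINNED_SCHEMA = "cost-summary/v1"


def read_totals(text: str) -> dict:
    """Parse a cost-summary document and return a few known `totals` fields."""
    doc = json.loads(text)
    version = doc.get("schema_version")
    # Refuse any version other than the pinned one.
    if version != PINNED_SCHEMA:
        raise ValueError(f"unsupported schema_version: {version!r}")
    totals = doc["totals"]
    # Read only the fields we know; unknown fields are ignored, not rejected.
    return {
        "total_cost_usd": totals["total_cost_usd"],
        "caveman_delta_tokens": totals["caveman_delta_tokens"],
        "caveman_multiplier_active": totals["caveman_multiplier_active"],
    }
```

Because the consumer picks fields by name, a later field addition leaves it working unchanged, while a version bump fails loudly at the pin.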
|
|
94
|
+
|
|
95
|
+
## Downstream consumers
|
|
96
|
+
|
|
97
|
+
- `agent-status` skill — surfaces lifetime / current-conversation slice.
|
|
98
|
+
- Future `cost-export-to-monitoring` scripts (deferred; trigger:
|
|
99
|
+
consumer request) would wrap this JSON to push to Prometheus / OTLP.
|
|
100
|
+
|
|
101
|
+
## See also
|
|
102
|
+
|
|
103
|
+
- [`caveman-telemetry.md`](caveman-telemetry.md) — defines the
|
|
104
|
+
`caveman_*` fields and the suspended-multiplier contract.
|
|
105
|
+
- [`scripts/cost_summary.py`](../../scripts/cost_summary.py) — implementation.
|
|
106
|
+
- [`scripts/cost_by_conversation.py`](../../scripts/cost_by_conversation.py) — narrower per-conversation lens with the same JSONL source.
|
|
107
|
+
- [`scripts/caveman_stats.py`](../../scripts/caveman_stats.py) — caveman-only delta lens with the same JSONL source.
|
|
package/docs/contracts/file-ownership-matrix.json
@@ -1800,6 +1800,12 @@
|
|
|
1800
1800
|
"load_context": [],
|
|
1801
1801
|
"load_context_eager": []
|
|
1802
1802
|
},
|
|
1803
|
+
".agent-src.uncompressed/skills/compress-memory/SKILL.md": {
|
|
1804
|
+
"kind": "skill",
|
|
1805
|
+
"rule_type": null,
|
|
1806
|
+
"load_context": [],
|
|
1807
|
+
"load_context_eager": []
|
|
1808
|
+
},
|
|
1803
1809
|
".agent-src.uncompressed/skills/content-funnel-design/SKILL.md": {
|
|
1804
1810
|
"kind": "skill",
|
|
1805
1811
|
"rule_type": null,
|
|
@@ -6396,6 +6402,13 @@
|
|
|
6396
6402
|
"via": "self",
|
|
6397
6403
|
"depth": 0
|
|
6398
6404
|
},
|
|
6405
|
+
{
|
|
6406
|
+
"source": ".agent-src.uncompressed/rules/caveman-speak.md",
|
|
6407
|
+
"target": ".agent-src.uncompressed/skills/compress-memory/SKILL.md",
|
|
6408
|
+
"type": "READ_ONLY",
|
|
6409
|
+
"via": "body_link",
|
|
6410
|
+
"depth": 1
|
|
6411
|
+
},
|
|
6399
6412
|
{
|
|
6400
6413
|
"source": ".agent-src.uncompressed/rules/cli-output-handling.md",
|
|
6401
6414
|
"target": ".agent-src.uncompressed/rules/cli-output-handling.md",
|
|
@@ -8048,6 +8061,34 @@
|
|
|
8048
8061
|
"via": "self",
|
|
8049
8062
|
"depth": 0
|
|
8050
8063
|
},
|
|
8064
|
+
{
|
|
8065
|
+
"source": ".agent-src.uncompressed/skills/compress-memory/SKILL.md",
|
|
8066
|
+
"target": ".agent-src.uncompressed/rules/caveman-speak.md",
|
|
8067
|
+
"type": "READ_ONLY",
|
|
8068
|
+
"via": "body_link",
|
|
8069
|
+
"depth": 1
|
|
8070
|
+
},
|
|
8071
|
+
{
|
|
8072
|
+
"source": ".agent-src.uncompressed/skills/compress-memory/SKILL.md",
|
|
8073
|
+
"target": ".agent-src.uncompressed/rules/role-mode-adherence.md",
|
|
8074
|
+
"type": "READ_ONLY",
|
|
8075
|
+
"via": "body_link",
|
|
8076
|
+
"depth": 1
|
|
8077
|
+
},
|
|
8078
|
+
{
|
|
8079
|
+
"source": ".agent-src.uncompressed/skills/compress-memory/SKILL.md",
|
|
8080
|
+
"target": ".agent-src.uncompressed/skills/agents-md-thin-root/SKILL.md",
|
|
8081
|
+
"type": "READ_ONLY",
|
|
8082
|
+
"via": "body_link",
|
|
8083
|
+
"depth": 1
|
|
8084
|
+
},
|
|
8085
|
+
{
|
|
8086
|
+
"source": ".agent-src.uncompressed/skills/compress-memory/SKILL.md",
|
|
8087
|
+
"target": ".agent-src.uncompressed/skills/compress-memory/SKILL.md",
|
|
8088
|
+
"type": "WRITE",
|
|
8089
|
+
"via": "self",
|
|
8090
|
+
"depth": 0
|
|
8091
|
+
},
|
|
8051
8092
|
{
|
|
8052
8093
|
"source": ".agent-src.uncompressed/skills/content-funnel-design/SKILL.md",
|
|
8053
8094
|
"target": ".agent-src.uncompressed/skills/activation-design/SKILL.md",
|
|
@@ -10442,6 +10483,13 @@
|
|
|
10442
10483
|
"via": "body_link",
|
|
10443
10484
|
"depth": 1
|
|
10444
10485
|
},
|
|
10486
|
+
{
|
|
10487
|
+
"source": ".agent-src.uncompressed/skills/refine-prompt/SKILL.md",
|
|
10488
|
+
"target": ".agent-src.uncompressed/skills/prompt-optimizer/SKILL.md",
|
|
10489
|
+
"type": "READ_ONLY",
|
|
10490
|
+
"via": "body_link",
|
|
10491
|
+
"depth": 1
|
|
10492
|
+
},
|
|
10445
10493
|
{
|
|
10446
10494
|
"source": ".agent-src.uncompressed/skills/refine-prompt/SKILL.md",
|
|
10447
10495
|
"target": ".agent-src.uncompressed/skills/refine-prompt/SKILL.md",
|
|
package/docs/guidelines/prompt-templates.md
@@ -0,0 +1,166 @@
|
|
|
1
|
+
# Prompt Templates
|
|
2
|
+
|
|
3
|
+
Reference catalogue of prompt structures the [`prompt-optimizer`](../../.agent-src.uncompressed/skills/prompt-optimizer/SKILL.md)
|
|
4
|
+
skill picks from during the **Develop** step of the 4-D methodology.
|
|
5
|
+
Cited by the [`refine-prompt`](../../.agent-src.uncompressed/skills/refine-prompt/SKILL.md)
|
|
6
|
+
skill in `mini` mode for stack-aware shaping.
|
|
7
|
+
|
|
8
|
+
Templates are tools, not dogma. Pick by request type, not by upstream
|
|
9
|
+
whitelist — see the [Rejection note](#rejection-note) at the bottom.
|
|
10
|
+
|
|
11
|
+
## When-to-pick rubric
|
|
12
|
+
|
|
13
|
+
| Request shape | First-choice template | Fallback |
|
|
14
|
+
|---|---|---|
|
|
15
|
+
| One-shot technical change (refactor, lint fix) | **RTF** | File-Scope |
|
|
16
|
+
| Multi-file refactor across a codebase | **File-Scope** | RTF |
|
|
17
|
+
| Marketing / brand / tone-heavy copy | **CO-STAR** | CRISPE |
|
|
18
|
+
| Mixed-audience explainer (technical + lay) | **CRISPE** | CO-STAR |
|
|
19
|
+
| Step-by-step explainer / tutorial | **RISEN** | Few-Shot |
|
|
20
|
+
| Pattern-heavy task (classification, extraction) | **Few-Shot** | RISEN |
|
|
21
|
+
| Multi-step reasoning or math | **CoT** | ReAct |
|
|
22
|
+
| Tool-using agent (web search, file ops) | **ReAct** | CoT |
|
|
23
|
+
| Image AI (Midjourney, SD, DALL·E) | **Visual Descriptor** | Reference-Image-Edit |
|
|
24
|
+
| Image edit from a source image | **Reference-Image-Edit** | Visual Descriptor |
|
|
25
|
+
| ComfyUI / node-graph image workflows | **ComfyUI** | Visual Descriptor |
|
|
26
|
+
| Reverse-engineer an existing good prompt | **Prompt Decompiler** | — |
|
|
27
|
+
| Honest critical feedback on a finished artifact (post, design, naming, proposal) — any domain | **Honest Sparring Partner** | CRISPE |
|
|
28
|
+
|
|
29
|
+
## Text templates
|
|
30
|
+
|
|
31
|
+
### RTF — Role · Task · Format
|
|
32
|
+
Smallest viable structure. Three lines: who the AI is, what to do,
|
|
33
|
+
how the output should look. Best for one-shot technical asks.
|
|
34
|
+
|
|
35
|
+
### CO-STAR — Context · Objective · Style · Tone · Audience · Response
|
|
36
|
+
Six-slot structure originally from the Singapore GovTech prompt
|
|
37
|
+
study. Best for copy where audience and tone carry the message.
|
|
38
|
+
|
|
39
|
+
### RISEN — Role · Input · Steps · Expectation · Narrowing
|
|
40
|
+
Step-explicit shape. Best for tutorials, walkthroughs, and any
|
|
41
|
+
output where the reader follows the prompt's structure.
|
|
42
|
+
|
|
43
|
+
### CRISPE — Capacity · Role · Insight · Statement · Personality · Experiment
|
|
44
|
+
Six-slot, persona-heavy. Best when the *voice* matters more than
|
|
45
|
+
the structure (creative writing, explainers with a strong narrator).
|
|
46
|
+
|
|
47
|
+
### CoT — Chain-of-Thought
|
|
48
|
+
Append "Think step by step" or "Reason aloud before answering" to
|
|
49
|
+
any base template. Best for multi-step reasoning, math, planning.
|
|
50
|
+
Pairs well with RTF or RISEN.
|
|
51
|
+
|
|
52
|
+
### Few-Shot
|
|
53
|
+
Two to five worked examples inline before the actual ask. Best for
|
|
54
|
+
classification, extraction, format-mimicking. Costs tokens — keep
|
|
55
|
+
examples minimal and representative.
|
|
56
|
+
|
|
57
|
+
### File-Scope
|
|
58
|
+
Codebase-aware variant of RTF. Names the files in scope, the
|
|
59
|
+
allowed-edit list, and the "do not touch" list explicitly. Best for
|
|
60
|
+
agent-driven refactors where blast radius matters.
|
|
61
|
+
|
|
62
|
+
```
|
|
63
|
+
Role: senior TypeScript engineer
|
|
64
|
+
Files in scope: src/auth/*.ts, tests/auth/*.ts
|
|
65
|
+
Do not modify: src/db/*, package.json, tsconfig.json
|
|
66
|
+
Task: migrate from jsonwebtoken to jose; keep the exported API stable
|
|
67
|
+
Format: unified diff
|
|
68
|
+
```
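For callers that assemble File-Scope prompts programmatically, a small renderer along these lines may help. It is only a sketch: the function name and parameters are hypothetical, and the slot labels are copied from the example above.

```python
def render_file_scope_prompt(role, files_in_scope, do_not_modify, task, output_format):
    """Render the five File-Scope slots as one prompt string (hypothetical helper)."""
    lines = [
        f"Role: {role}",
        f"Files in scope: {', '.join(files_in_scope)}",
        f"Do not modify: {', '.join(do_not_modify)}",
        f"Task: {task}",
        f"Format: {output_format}",
    ]
    return "\n".join(lines)


prompt = render_file_scope_prompt(
    role="senior TypeScript engineer",
    files_in_scope=["src/auth/*.ts", "tests/auth/*.ts"],
    do_not_modify=["src/db/*", "package.json", "tsconfig.json"],
    task="migrate from jsonwebtoken to jose; keep the exported API stable",
    output_format="unified diff",
)
print(prompt)
```

Passing the two file lists as separate arguments keeps the allowed-edit set and the "do not touch" set from being merged by accident.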
|
|
69
|
+
|
|
70
|
+
### ReAct — Reason + Act
|
|
71
|
+
Interleaved thought / action / observation loop. Best for agents
|
|
72
|
+
with tools — web search, file ops, shell. Each cycle:
|
|
73
|
+
|
|
74
|
+
```
|
|
75
|
+
Thought: <why I need this next step>
|
|
76
|
+
Action: <tool call>
|
|
77
|
+
Observation: <result>
|
|
78
|
+
... repeat ...
|
|
79
|
+
Final Answer: <synthesis>
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
### Honest Sparring Partner
|
|
83
|
+
Domain-agnostic stance template for getting honest critical feedback on
|
|
84
|
+
a finished artifact — blog post, design draft, naming decision, business
|
|
85
|
+
proposal, care-plan, marketing copy. Works for any role (engineer,
|
|
86
|
+
graphic designer, nurse, founder) because the role slot is filled by
|
|
87
|
+
the user's actual profession, not hard-coded.
|
|
88
|
+
|
|
89
|
+
Five-slot shape:
|
|
90
|
+
|
|
91
|
+
```
|
|
92
|
+
Role: <user's domain> — e.g. graphic designer, geriatric nurse, founder
|
|
93
|
+
Stance: honest sparring partner, not yes-man. Push back when something
|
|
94
|
+
is weak; acknowledge when something is solid; stay silent when
|
|
95
|
+
there is nothing substantive to add. No flattery openings, no
|
|
96
|
+
artificial criticism for its own sake.
|
|
97
|
+
Context-fit: ask ONE clarifying question only if real context is missing
|
|
98
|
+
(role, audience, constraint). Otherwise answer directly.
|
|
99
|
+
Artifact: <the finished thing — paste, link, or describe>
|
|
100
|
+
Ask: <what kind of reaction the user wants — "does the argument hold?",
|
|
101
|
+
"is the naming clear?", "would this land with audience X?">
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
**Anti-pattern this rejects:** "what do you honestly think?" prompts that
|
|
105
|
+
either default to praise ("looks great!") or default to manufactured
|
|
106
|
+
criticism ("here are 5 problems...") regardless of whether the work
|
|
107
|
+
warrants either reaction. The stance slot makes the honest-when-warranted
|
|
108
|
+
contract explicit.
|
|
109
|
+
|
|
110
|
+
**Package equivalents** — inside this agent-config, the
|
|
111
|
+
[`adversarial-review`](../../.agent-src.uncompressed/skills/adversarial-review/SKILL.md)
|
|
112
|
+
skill implements the same stance via an Attack-Defend-Revise loop and is
|
|
113
|
+
the right tool when the user submits finished work for a critical take.
|
|
114
|
+
This template is for **end-users prompting their own LLM** (ChatGPT,
|
|
115
|
+
Claude, Gemini) outside this package.
|
|
116
|
+
|
|
117
|
+
## Image templates
|
|
118
|
+
|
|
119
|
+
### Visual Descriptor
|
|
120
|
+
Subject · style · composition · lighting · medium · mood · technical
|
|
121
|
+
parameters (aspect ratio, resolution). Best for Midjourney, Stable
|
|
122
|
+
Diffusion, DALL·E from a blank canvas.
|
|
123
|
+
|
|
124
|
+
### Reference-Image-Edit
|
|
125
|
+
Source image reference + change set + preservation set. Names what
|
|
126
|
+
to change, what to keep, and the desired output framing. Best for
|
|
127
|
+
inpainting, style transfer, character consistency.
|
|
128
|
+
|
|
129
|
+
### ComfyUI
|
|
130
|
+
Node-graph-aware. Names the workflow nodes (KSampler, CLIPTextEncode,
|
|
131
|
+
VAEDecode) and the parameter intent per node rather than the
|
|
132
|
+
parameter values. Best for advanced SD pipelines.
|
|
133
|
+
|
|
134
|
+
## Reverse template
|
|
135
|
+
|
|
136
|
+
### Prompt Decompiler
|
|
137
|
+
Given an existing good output, reconstruct the prompt that would
|
|
138
|
+
produce it. Used to mine prompts from public LLM artifacts. Not a
|
|
139
|
+
shaping template — a forensic one.
|
|
140
|
+
|
|
141
|
+
## Rejection note
|
|
142
|
+
|
|
143
|
+
Upstream `nidhinjs/prompt-master` claims that only five techniques
|
|
144
|
+
are "safe" for production prompting:
|
|
145
|
+
|
|
146
|
+
- few-shot
|
|
147
|
+
- role assignment
|
|
148
|
+
- structured output
|
|
149
|
+
- constraint-based
|
|
150
|
+
- chain-of-thought
|
|
151
|
+
|
|
152
|
+
This package **rejects** that whitelist. CO-STAR, RISEN, CRISPE,
|
|
153
|
+
ReAct, and the image-AI templates above are first-class. The
|
|
154
|
+
"5 safe" framing came from a single benchmark on a single LLM
|
|
155
|
+
generation — it does not generalise. See AI Council session
|
|
156
|
+
`agents/council-responses/prompt-master-mini.json` (2026-05-17) for the analysis behind this rejection. <!-- council-ref-allowed: ADR decision trace -->
|
|
157
|
+
|
|
158
|
+
The right gate is request-type fit, not technique-whitelist
|
|
159
|
+
membership.
|
|
160
|
+
|
|
161
|
+
## See also
|
|
162
|
+
|
|
163
|
+
- [`prompt-optimizer`](../../.agent-src.uncompressed/skills/prompt-optimizer/SKILL.md) — engine-outbound; cites this catalogue in its Develop step
|
|
164
|
+
- [`refine-prompt`](../../.agent-src.uncompressed/skills/refine-prompt/SKILL.md) — engine-inbound; uses templates in `mini` mode for stack-aware shaping
|
|
165
|
+
- [`prompt-engineering-patterns`](../../.agent-src.uncompressed/skills/prompt-engineering-patterns/SKILL.md) — production-LLM prompt patterns (sibling skill, not a catalogue)
|
|
166
|
+
- AI Council session: `agents/council-responses/prompt-master-mini.json` (2026-05-17) <!-- council-ref-allowed: ADR decision trace -->
|
package/package.json
CHANGED