npm - @event4u/agent-config - Versions diffs - 6.0.0 → 6.1.0 - Mend

@event4u/agent-config 6.0.0 → 6.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (378)

package/src/scripts/render_benchmark_md.py CHANGED

@@ -79,6 +79,16 @@ def safe_load(path: Path | None) -> dict:
         return {}
+def latest_trackb_with_rdp() -> dict:
+    """Latest Track B report for the third condition (`with-rdp`), or {}."""
+    reports = sorted(REPORTS_DIR.glob("*-ab-trackb-with-rdp.json"))
+    return safe_load(reports[-1]) if reports else {}
+def _delta_pct(a: float | None, b: float | None) -> str:
+    return fmt_pct((a or 0) - (b or 0))
 def fmt_pct(value: float | None) -> str:
     if value is None:
         return "—"
@@ -91,33 +101,79 @@ def fmt_num(value: float | None, places: int = 2) -> str:
     return f"{value:.{places}f}"
-def render_headline(track_a: dict, track_b: dict) -> str:
-    a_results = (track_a.get("with") or {}).get("results", {})
-    a_without = (track_a.get("without") or {}).get("results", {})
-    b_results = (track_b.get("with") or {}).get("results", {})
-    b_without = (track_b.get("without") or {}).get("results", {})
-    a_with_acc = a_results.get("trigger_accuracy")
-    a_wo_acc = a_without.get("trigger_accuracy")
-    b_with_comp = b_results.get("completion_rate")
-    b_wo_comp = b_without.get("completion_rate")
+def fmt_int(value: float | None) -> str:
+    if value is None:
+        return "—"
+    return f"{int(value):,}"
+def _delta_num(a: float | None, b: float | None, places: int = 3) -> str:
+    d = (a or 0) - (b or 0)
+    return f"{d:+.{places}f}"
+def render_headline(track_a: dict, track_b: dict, track_b_rdp: dict) -> str:
+    wo = (track_b.get("without") or {}).get("results", {})
+    wi = (track_b.get("with") or {}).get("results", {})
+    rd = (track_b_rdp or {}).get("results", {})
+    mode = wi.get("mode") or wo.get("mode") or rd.get("mode") or "—"
+    total = wi.get("total") or wo.get("total") or rd.get("total") or 0
+    dry = mode != "live"
     lines = [
         "## Headline",
         "",
-        "> **Track A confirms surface availability** — a precondition, not an impact metric. "
-        "For the impact view (cost-ladder + behaviour with vs. without), see "
-        "[`docs/value.md`](value.md).",
+        "> **Lift of agent-config on the host model — NOT a model-vs-model benchmark.** "
+        "This measures what the package + the RDP reasoning lift do to a *fixed* host "
+        "model on a neutral fixture; it is not comparable to public SWE-bench / "
+        "Fable-5 model scores (different question entirely).",
+        "",
+    ]
+    if dry:
+        lines += [
+            "> ⚠️ **DRY RUN — no model calls were made; every cell is 0/N by construction.** "
+            "This shows the *shape* the real numbers will fill. Run `task bench:ab:live` "
+            "(billable) for actual results.",
+            "",
+        ]
+    err_bits = []
+    for name, res in (("without", wo), ("with", wi), ("with-rdp", rd)):
+        e = res.get("errored") or 0
+        if e:
+            err_bits.append(f"{name}: {e}/{res.get('total', 0)}")
+    lines += [
+        f"> ⚠️ **Low statistical power: corpus N={total} (< 40).** Directional only; "
+        "per-cell N is shown below. The `long × mechanical` cell is intentionally "
+        "empty (documented hole, not an error).",
+        "",
+    ]
+    if err_bits:
+        lines += [
+            "> ⚠️ **Some tasks errored (rate-limit / budget-cap / timeout) and are "
+            "excluded from the hit-rate** — they are NOT content failures. Errored "
+            f"counts — {'; '.join(err_bits)}. Hit-rate is computed over completed tasks only.",
+            "",
+        ]
+    lines += [
+        "_Host model + inference config (temp / top-p / max-tokens) are recorded in "
+        "Methodology and must be cited with any quoted number._",
+        "",
+        "### Table 1 — Package value (without → with)",
+        "",
+        "| Metric | without | with | delta |",
+        "|---|---|---|---|",
+        f"| Success / hit-rate | {fmt_pct(wo.get('completion_rate'))} | {fmt_pct(wi.get('completion_rate'))} | {_delta_pct(wi.get('completion_rate'), wo.get('completion_rate'))} |",
+        f"| Mean wall-time | {fmt_num(wo.get('mean_wall_time'))}s | {fmt_num(wi.get('mean_wall_time'))}s | {fmt_num((wi.get('mean_wall_time') or 0) - (wo.get('mean_wall_time') or 0))}s |",
+        f"| Ask-vs-act ratio | {fmt_num(wo.get('ask_vs_act_ratio'), 3)} | {fmt_num(wi.get('ask_vs_act_ratio'), 3)} | {_delta_num(wi.get('ask_vs_act_ratio'), wo.get('ask_vs_act_ratio'))} |",
+        f"| Total tokens | {fmt_int(wo.get('total_tokens'))} | {fmt_int(wi.get('total_tokens'))} | {fmt_int((wi.get('total_tokens') or 0) - (wo.get('total_tokens') or 0))} |",
+        "",
+        "### Table 2 — RDP reasoning lift (with → with-rdp)",
         "",
-        "| Metric | with | without | delta |",
+        "| Metric | with | with-rdp | delta |",
         "|---|---|---|---|",
-        f"| Track A surface-availability | {fmt_pct(a_with_acc)} | {fmt_pct(a_wo_acc)} | "
-        f"{fmt_pct((a_with_acc or 0) - (a_wo_acc or 0))} _(structural — files present)_ |",
-        f"| Track B completion-rate  | {fmt_pct(b_with_comp)} | {fmt_pct(b_wo_comp)} | "
-        f"{fmt_pct((b_with_comp or 0) - (b_wo_comp or 0))} |",
-        f"| Track B mean wall-time   | {fmt_num(b_results.get('mean_wall_time'))}s "
-        f"| {fmt_num(b_without.get('mean_wall_time'))}s | "
-        f"{fmt_num((b_results.get('mean_wall_time') or 0) - (b_without.get('mean_wall_time') or 0))}s |",
-        f"| Track B ask-vs-act ratio | {fmt_num(b_results.get('ask_vs_act_ratio'), 3)} "
-        f"| {fmt_num(b_without.get('ask_vs_act_ratio'), 3)} | — |",
+        f"| Success / hit-rate | {fmt_pct(wi.get('completion_rate'))} | {fmt_pct(rd.get('completion_rate'))} | {_delta_pct(rd.get('completion_rate'), wi.get('completion_rate'))} |",
+        f"| Mean wall-time | {fmt_num(wi.get('mean_wall_time'))}s | {fmt_num(rd.get('mean_wall_time'))}s | {fmt_num((rd.get('mean_wall_time') or 0) - (wi.get('mean_wall_time') or 0))}s |",
+        f"| Ask-vs-act ratio | {fmt_num(wi.get('ask_vs_act_ratio'), 3)} | {fmt_num(rd.get('ask_vs_act_ratio'), 3)} | {_delta_num(rd.get('ask_vs_act_ratio'), wi.get('ask_vs_act_ratio'))} |",
+        f"| Total tokens | {fmt_int(wi.get('total_tokens'))} | {fmt_int(rd.get('total_tokens'))} | {fmt_int((rd.get('total_tokens') or 0) - (wi.get('total_tokens') or 0))} |",
         "",
     ]
     return "\n".join(lines)
@@ -153,37 +209,68 @@ def render_track_a(track_a: dict) -> str:
     return "\n".join(lines)
-def render_track_b(track_b: dict) -> str:
+def render_track_b(track_b: dict, track_b_rdp: dict) -> str:
     lines = ["## Track B — Task completion", ""]
-    with_data = (track_b.get("with") or {}).get("results", {})
-    without_data = (track_b.get("without") or {}).get("results", {})
-    mode = with_data.get("mode") or without_data.get("mode") or "—"
+    wo = (track_b.get("without") or {}).get("results", {})
+    wi = (track_b.get("with") or {}).get("results", {})
+    rd = (track_b_rdp or {}).get("results", {})
+    mode = wi.get("mode") or wo.get("mode") or rd.get("mode") or "—"
     lines.append(f"- Mode: `{mode}`")
-    if not with_data and not without_data:
-        lines.append("")
-        lines.append("_No Track B reports yet. Run `task bench:ab:track-b`._")
-        lines.append("")
+    if not (wo or wi or rd):
+        lines += ["", "_No Track B reports yet. Run `task bench:ab:track-b`._", ""]
         return "\n".join(lines)
-    lines.extend(
-        [
-            f"- with → **{fmt_pct(with_data.get('completion_rate'))}** "
-            f"({with_data.get('passed', 0)}/{with_data.get('total', 0)})",
-            f"- without → **{fmt_pct(without_data.get('completion_rate'))}** "
-            f"({without_data.get('passed', 0)}/{without_data.get('total', 0)})",
-            "",
-            "Per-category:",
-            "",
-            "| Category | with | without | delta |",
-            "|---|---|---|---|",
-        ]
-    )
-    with_cats = with_data.get("per_category", {})
-    without_cats = without_data.get("per_category", {})
-    for cat in sorted(set(with_cats) | set(without_cats)):
-        w = with_cats.get(cat, {}).get("completion_rate") or 0
-        wo = without_cats.get(cat, {}).get("completion_rate") or 0
+    lines += [
+        f"- without → **{fmt_pct(wo.get('completion_rate'))}** ({wo.get('passed', 0)}/{wo.get('total', 0)})",
+        f"- with → **{fmt_pct(wi.get('completion_rate'))}** ({wi.get('passed', 0)}/{wi.get('total', 0)})",
+        f"- with-rdp → **{fmt_pct(rd.get('completion_rate'))}** ({rd.get('passed', 0)}/{rd.get('total', 0)})",
+        "",
+        "### Per 2×2 cell (success-rate per condition; per-cell N in parens)",
+        "",
+        "| Cell (duration × cognitive) | N | without | with | with-rdp |",
+        "|---|---|---|---|---|",
+    ]
+    wo_c, wi_c, rd_c = wo.get("per_cell", {}), wi.get("per_cell", {}), rd.get("per_cell", {})
+    cells = sorted(set(wo_c) | set(wi_c) | set(rd_c)) or [
+        "short/reasoning-heavy", "short/mechanical",
+        "long/reasoning-heavy", "long/mechanical",
+    ]
+    for cell in cells:
+        n = (wi_c.get(cell) or wo_c.get(cell) or rd_c.get(cell) or {}).get("total", 0)
+        lines.append(
+            f"| {cell} | {n} | {fmt_pct(wo_c.get(cell, {}).get('completion_rate'))} "
+            f"| {fmt_pct(wi_c.get(cell, {}).get('completion_rate'))} "
+            f"| {fmt_pct(rd_c.get(cell, {}).get('completion_rate'))} |"
+        )
+    lines += [
+        "",
+        "### Per 2×2 cell — mean tokens per condition",
+        "",
+        "| Cell (duration × cognitive) | without | with | with-rdp |",
+        "|---|---|---|---|",
+    ]
+    for cell in cells:
+        lines.append(
+            f"| {cell} | {fmt_int(wo_c.get(cell, {}).get('mean_tokens'))} "
+            f"| {fmt_int(wi_c.get(cell, {}).get('mean_tokens'))} "
+            f"| {fmt_int(rd_c.get(cell, {}).get('mean_tokens'))} |"
+        )
+    lines += [
+        "",
+        "_`short × mechanical` mean-tokens across conditions answers \"are short "
+        "tasks more expensive?\"; `long × reasoning-heavy` answers \"do long tasks "
+        "get cheaper / better?\"._",
+        "",
+        "### Per category",
+        "",
+        "| Category | without | with | with-rdp |",
+        "|---|---|---|---|",
+    ]
+    wo_cat, wi_cat, rd_cat = wo.get("per_category", {}), wi.get("per_category", {}), rd.get("per_category", {})
+    for cat in sorted(set(wo_cat) | set(wi_cat) | set(rd_cat)):
         lines.append(
-            f"| {cat} | {fmt_pct(w)} | {fmt_pct(wo)} | {fmt_pct(w - wo)} |"
+            f"| {cat} | {fmt_pct(wo_cat.get(cat, {}).get('completion_rate'))} "
+            f"| {fmt_pct(wi_cat.get(cat, {}).get('completion_rate'))} "
+            f"| {fmt_pct(rd_cat.get(cat, {}).get('completion_rate'))} |"
         )
     lines.append("")
     return "\n".join(lines)
@@ -265,8 +352,10 @@ def render(quiet: bool = False) -> int:
     b_with, b_without = latest_pair("ab-trackb")
     track_a = {"with": safe_load(a_with), "without": safe_load(a_without)}
     track_b = {"with": safe_load(b_with), "without": safe_load(b_without)}
+    track_b_rdp = latest_trackb_with_rdp()
     have_data = bool(
-        track_a["with"] or track_a["without"] or track_b["with"] or track_b["without"]
+        track_a["with"] or track_a["without"]
+        or track_b["with"] or track_b["without"] or track_b_rdp
     )
     if not have_data:
         OUT_PATH.parent.mkdir(parents=True, exist_ok=True)
@@ -282,9 +371,9 @@ def render(quiet: bool = False) -> int:
         "> Generated by `scripts/render_benchmark_md.py`. Source of truth: "
         "`internal/bench/reports/ab/`. Re-render anytime with `task bench:ab:diff`.",
         "",
-        render_headline(track_a, track_b),
+        render_headline(track_a, track_b, track_b_rdp),
         render_track_a(track_a),
-        render_track_b(track_b),
+        render_track_b(track_b, track_b_rdp),
         render_methodology(track_a, track_b),
         render_history(),
     ]

package/src/scripts/schemas/command.schema.json CHANGED

@@ -16,7 +16,12 @@
     "tier": {
       "type": "integer",
       "enum": [0, 1, 2],
-      "description": "Command-surface tier per docs/contracts/command-surface-tiers.md. 0 = daily-driver (rendered in default `./agent-config --help`), 1 = power-user (rendered with `--tier=1`), 2 = maintenance/internal (rendered only with `--tier=all`). Default for new commands is 2 — promotion is gated by ADR criteria, never by author preference."
+      "description": "Command-surface tier per docs/contracts/command-surface-tiers.md. 0 = daily-driver (rendered in default `./agent-config --help`), 1 = power-user (rendered with `--tier=1`), 2 = maintenance/internal (rendered only with `--tier=all`). Default for new commands is 2 — promotion is gated by ADR criteria, never by author preference. BACK-COMPAT ALIAS since ADR-090: `visibility:` is the named source of truth (visible↔0, advanced↔1, internal↔2); `tier:` is retained as a derived integer alias and dual-emitted in the discovery manifest during the deprecation window. When both are present they MUST agree (enforced by lint_command_tiers.py)."
+    },
+    "visibility": {
+      "type": "string",
+      "enum": ["visible", "advanced", "internal"],
+      "description": "Command-surface visibility (ADR-090) — the NAMED source of truth that supersedes the integer `tier:` proxy. `visible` = daily-driver (default `./agent-config --help`, ↔ tier 0), `advanced` = power-user (`--tier=1`, ↔ tier 1), `internal` = maintenance/hidden (`--tier=all` only, ↔ tier 2). Read by the surface classifier (commands ls/explain), the per-pack visible-command budget audit, and the discovery-manifest builder, each preferring `visibility:` and falling back to `tier:`. Default for new commands is `internal`; promotion is gated by ADR criteria per docs/contracts/command-surface-tiers.md."
     },
     "description": {
       "type": "string",

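Note on `visibility` versus `tier`: the two schema descriptions above state a fixed mapping (visible↔0, advanced↔1, internal↔2), a "prefer `visibility:`, fall back to `tier:`" read order, and a rule that both must agree when present. A minimal sketch of that resolution logic follows. The function name `resolve_visibility` and the use of `ValueError` are hypothetical; the real check lives in `lint_command_tiers.py`, which is not part of this diff. Treating a command with neither key as `internal` is also an assumption, based on the stated default for new commands.

```python
from typing import Optional

# Mapping stated in the schema descriptions.
VISIBILITY_TO_TIER = {"visible": 0, "advanced": 1, "internal": 2}
TIER_TO_VISIBILITY = {tier: name for name, tier in VISIBILITY_TO_TIER.items()}


def resolve_visibility(frontmatter: dict) -> str:
    """Hypothetical helper: prefer `visibility`, fall back to `tier`."""
    visibility: Optional[str] = frontmatter.get("visibility")
    tier: Optional[int] = frontmatter.get("tier")

    if visibility is not None and visibility not in VISIBILITY_TO_TIER:
        raise ValueError(f"unknown visibility: {visibility!r}")
    if tier is not None and tier not in TIER_TO_VISIBILITY:
        raise ValueError(f"unknown tier: {tier!r}")

    # When both keys are present they must agree.
    if visibility is not None and tier is not None:
        if VISIBILITY_TO_TIER[visibility] != tier:
            raise ValueError(
                f"visibility {visibility!r} disagrees with tier {tier}"
            )

    if visibility is not None:
        return visibility
    if tier is not None:
        return TIER_TO_VISIBILITY[tier]
    # Assumption: absent keys fall back to the default for new commands.
    return "internal"


print(resolve_visibility({"visibility": "advanced"}))  # advanced
print(resolve_visibility({"tier": 0}))                 # visible
```
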
package/src/scripts/security_audit_config.py ADDED

@@ -0,0 +1,153 @@
+#!/usr/bin/env python3
+"""P3.1 — consumer-facing agent-config security audit (road-to-security-pillar.md).
+Points the Phase-1 detection logic at a *consumer's assembled* agent config —
+instruction files (CLAUDE.md, AGENTS.md, .cursor/rules, copilot-instructions),
+MCP configs (.mcp.json, .cursor/mcp.json, claude_desktop_config.json), settings
++ hooks (.claude/settings.json), and installed skills — and emits an A–F score
+with a per-category breakdown mapped to the OWASP Top 10 for Agentic
+Applications (ASI).
+Detection is the same library as the self-audit gate (so there is one source of
+truth for the patterns) under the same false-positive containment convention.
+This is decision-support, not a guarantee: detection is probabilistic.
+Usage:
+  python3 src/scripts/security_audit_config.py [--root DIR] [--json]
+"""
+from __future__ import annotations
+import argparse
+import sys
+from pathlib import Path
+HERE = Path(__file__).resolve().parent
+sys.path.insert(0, str(HERE))
+from _lib import security_lint as sl  # noqa: E402
+import lint_hidden_unicode as p11  # noqa: E402
+import lint_instruction_smuggling as p12  # noqa: E402
+import lint_mcp_config_security as p13  # noqa: E402
+import lint_skill_frontmatter_safety as p14  # noqa: E402
+# Consumer config surfaces (globs relative to --root).
+SURFACES = [
+    "CLAUDE.md", "AGENTS.md", "GEMINI.md", ".clinerules", ".windsurfrules",
+    ".github/copilot-instructions.md",
+    ".cursor/rules/**/*", ".cursorrules",
+    ".claude/skills/**/SKILL.md", ".claude/commands/**/*.md",
+    ".claude/settings.json", ".claude/settings.local.json",
+    ".mcp.json", ".cursor/mcp.json", "claude_desktop_config.json",
+]
+# check id → (category, OWASP-ASI tag)
+CATEGORY = {
+    "hidden-unicode": ("Agents/Rules", "ASI01 Goal Hijack"),
+    "instruction-smuggling": ("Agents/Rules", "ASI01 Goal Hijack"),
+    "mcp-config-security": ("MCP", "ASI04 Supply Chain"),
+    "dangerous-frontmatter": ("Permissions", "ASI03 Privilege Abuse"),
+}
+SECRET_HINT = "secret"  # mcp finding mentioning a secret → Secrets category
+CATEGORIES = ["Secrets", "Permissions", "Hooks", "MCP", "Agents/Rules"]
+# Deduction per finding (full weight); weighted findings scale by their weight.
+_DEDUCT = {"HIGH": 25.0, "MED": 5.0, "LOW": 2.0}
+def _grade(score: float) -> str:
+    return ("A" if score >= 90 else "B" if score >= 80 else "C" if score >= 70
+            else "D" if score >= 60 else "F")
+def _category(f) -> str:
+    if f.check == "mcp-config-security" and SECRET_HINT in f.message.lower():
+        return "Secrets"
+    return CATEGORY.get(f.check, ("Agents/Rules", ""))[0]
+def _iter_targets(root: Path):
+    seen = set()
+    for pattern in SURFACES:
+        for p in root.glob(pattern):
+            if p.is_file() and p not in seen:
+                seen.add(p)
+                yield p
+def audit(root: Path) -> dict:
+    findings = []
+    for p in _iter_targets(root):
+        try:
+            sf = sl.scan_path(p, root)
+        except (UnicodeDecodeError, OSError):
+            continue
+        for mod in (p11, p12, p13, p14):
+            try:
+                findings.extend(mod._scan(sf))
+            except Exception:
+                pass
+    per_cat = {c: 100.0 for c in CATEGORIES}
+    cat_findings = {c: [] for c in CATEGORIES}
+    for f in findings:
+        cat = _category(f)
+        per_cat[cat] -= _DEDUCT.get(f.severity, 2.0) * float(f.weight)
+        cat_findings[cat].append(f)
+    for c in per_cat:
+        per_cat[c] = max(0.0, per_cat[c])
+    overall = round(sum(per_cat.values()) / len(CATEGORIES), 1)
+    return {
+        "root": str(root),
+        "overall_score": overall,
+        "overall_grade": _grade(overall),
+        "categories": {
+            c: {
+                "score": round(per_cat[c], 1),
+                "grade": _grade(per_cat[c]),
+                "owasp": next((CATEGORY[fl.check][1] for fl in cat_findings[c]
+                               if fl.check in CATEGORY), ""),
+                "findings": [
+                    {"path": fl.path, "line": fl.line, "check": fl.check,
+                     "severity": fl.severity, "message": fl.message,
+                     "weight": fl.weight}
+                    for fl in cat_findings[c]
+                ],
+            }
+            for c in CATEGORIES
+        },
+    }
+def _print(report: dict) -> None:
+    print(f"Agent-config security audit — {report['root']}")
+    print(f"Overall: {report['overall_grade']} ({report['overall_score']}/100)\n")
+    for c in CATEGORIES:
+        cat = report["categories"][c]
+        tag = f" · {cat['owasp']}" if cat["owasp"] else ""
+        print(f"  {cat['grade']}  {c:<12} {cat['score']:>5}/100{tag}")
+        for f in cat["findings"]:
+            loc = f"{f['path']}:{f['line']}" if f["line"] else f["path"]
+            w = "" if f["weight"] >= 1.0 else f" (weight {f['weight']:g})"
+            print(f"        [{f['severity']}] {loc}{w}: {f['message']}")
+    print("\n> Decision support, not a guarantee — detection is probabilistic. "
+          "Pair with /threat-model and judge-security-auditor for a deep pass.")
+def main() -> int:
+    ap = argparse.ArgumentParser(description=__doc__, epilog=sl.GUIDELINE_EPILOG)
+    ap.add_argument("--root", default=".", help="consumer repo root to audit (default: cwd)")
+    ap.add_argument("--json", action="store_true")
+    args = ap.parse_args()
+    report = audit(Path(args.root))
+    if args.json:
+        import json
+        print(json.dumps(report, indent=2))
+    else:
+        _print(report)
+    # Audit is advisory: always exit 0 (it informs, it does not gate the consumer).
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
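
Note on the scoring arithmetic: each of the five categories starts at 100, loses `_DEDUCT[severity] * weight` per finding, is clamped at 0, and the overall score is the plain mean of the five category scores. The worked example below re-implements only that arithmetic, using the constants and grade thresholds from the added file. The `(category, severity, weight)` tuple shape and the sample findings are made up for illustration; the real script works on finding objects returned by the lint modules.

```python
CATEGORIES = ["Secrets", "Permissions", "Hooks", "MCP", "Agents/Rules"]
DEDUCT = {"HIGH": 25.0, "MED": 5.0, "LOW": 2.0}


def grade(score: float) -> str:
    return ("A" if score >= 90 else "B" if score >= 80 else "C" if score >= 70
            else "D" if score >= 60 else "F")


def score(findings):
    """findings: iterable of (category, severity, weight) tuples (illustrative shape)."""
    per_cat = {c: 100.0 for c in CATEGORIES}
    for category, severity, weight in findings:
        # Unknown severities fall back to the LOW deduction, as in the script.
        per_cat[category] -= DEDUCT.get(severity, 2.0) * float(weight)
    # Clamp each category at zero before averaging.
    per_cat = {c: max(0.0, s) for c, s in per_cat.items()}
    overall = round(sum(per_cat.values()) / len(CATEGORIES), 1)
    return per_cat, overall


findings = [
    ("MCP", "HIGH", 1.0),          # 100 - 25  = 75   -> C
    ("Agents/Rules", "MED", 0.5),  # 100 - 2.5 = 97.5 -> A
]
per_cat, overall = score(findings)
print(per_cat["MCP"], grade(per_cat["MCP"]))  # 75.0 C
print(overall, grade(overall))                # 94.5 A

# Five HIGH findings in one category clamp it to 0; the mean is still 80.
_, worst_single = score([("Secrets", "HIGH", 1.0)] * 5)
print(worst_single, grade(worst_single))      # 80.0 B
```

Because the overall score is an unweighted mean over five categories, a single category can pull the overall score down by at most 20 points. The per-category grades are therefore the more informative part of the report when findings cluster in one area.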

package/dist/agent-src/commands/chat-history/learn.md DELETED

@@ -1,184 +0,0 @@
----
-model_tier: medium
-name: chat-history-learn
-pack: meta
-tier: 2
-cluster: chat-history
-sub: learn
-skills: [learning-to-rule-or-skill]
-description: Pick a prior chat-history session and mine it for project-improving learnings — runs learning-to-rule-or-skill on the picked session, drafts proposal(s) under agents/proposals/
-suggestion:
-  eligible: true
-  trigger_description: "extract a learning from a past session, mine chat-history for proposals, what did we learn last session, codify a pattern from a prior session"
-  trigger_context: "user wants to derive a rule/skill/guideline proposal from the content of one prior session"
-workspaces:
-  - agent-config-maintainer
-packs:
-  - meta
----
-<!-- cloud_safe: noop -->
-# /chat-history learn
-User-driven **learning extraction** from a prior session. Surfaces
-prior sessions logged in `agents/runtime/.agent-chat-history` as numbered options,
-the user picks **one**, the agent reads that session's entries and
-runs the [`learning-to-rule-or-skill`](../../skills/learning-to-rule-or-skill/SKILL.md)
-workflow on the content — surfacing repeated mistakes, successful
-patterns, or constraints worth codifying as a rule, skill, or
-guideline proposal.
-This is the **project-improvement** counterpart to
-[`/chat-history import`](import.md): `import` renders a session
-verbatim into the current chat for the user to act on; `learn`
-mines a session for proposals that improve the agent or the
-project itself.
-## When NOT to use
-- Pull a prior session into the current chat verbatim — use
-  [`/chat-history import`](import.md).
-- Capture a learning that originated **in the current** session —
-  invoke the [`learning-to-rule-or-skill`](../../skills/learning-to-rule-or-skill/SKILL.md)
-  skill directly. `learn` is for prior-session mining only.
-- Bulk-mine all sessions — out of scope for v1. One session per
-  invocation; multi-pick is v2.
-## Steps
-### 1. Check if enabled
-Read `chat_history.enabled` from `.agent-settings.yml`. If `false`
-or the section is missing, say so and stop:
-```
-> 📒 chat-history is disabled (chat_history.enabled = false).
-> Set it to true in .agent-settings.yml to start logging.
-```
-### 2. List sessions
-Run `scripts/chat_history.py sessions --json --limit 20 --summary`.
-The helper returns an array of
-`{id, count, first_ts, last_ts, preview, summary}` sorted by
-`last_ts` desc. The `summary` field is built inside the helper
-from ≤10 sampled entries per session (5 oldest + 5 newest) —
-token-cheap, no full-body read needed for the picker. Empty
-buckets are excluded by default.
-If the array is empty, stop:
-```
-> 📒 No prior sessions found in agents/runtime/.agent-chat-history.
-```
-### 3. Surface as numbered options
-Render each session as a numbered option (per the `user-interaction`
-rule — Iron Law: numbered options for any picker). Lead with the
-helper's `summary` field — the rough arc the user picks by
-(`<first user msg> → <last user msg>`, or
-`(N entries — no user prompts; t-mix: …)` for tool-only sessions).
-Keep the session `id` **internal** for step 5's `read --session <id>`
-call; never render it in the listing. Format:
-```
-> Pick a session to mine for learnings:
->
-> 1. {summary}
->    {YYYY-MM-DD HH:MM}  ·  {count} entries
-> 2. ...
-> ...
-> N. abort — do not extract any learning
-```
-Format the timestamp as `YYYY-MM-DD HH:MM` (drop seconds + timezone
-— orientation, not forensics). Do not truncate or rewrite `summary`
-— the helper already shapes it. Always include an explicit abort
-option last. Track option-number → `id` internally so step 5 calls
-`scripts/chat_history.py read --session <id>` with the right id.
-### 4. Wait for the pick
-**One question per turn** (per `ask-when-uncertain`). Do not chain
-the listing with anything else; do not auto-pick; do not surface a
-default. Wait for the user's response.
-If the user picks the abort option, stop without reading.
-### 5. Read the picked session
-Run `scripts/chat_history.py read --session <id>` with the picked
-`id`. Hold the entries in working memory — do **not** render them
-verbatim into the chat. The verbatim path is `import`'s job; here
-the entries are input to step 6.
-### 6. Run `learning-to-rule-or-skill`
-Apply the [`learning-to-rule-or-skill`](../../skills/learning-to-rule-or-skill/SKILL.md)
-procedure on the session content:
-1. **Scan** the entries for candidate learnings — repeated
-   mistakes, successful patterns, friction points, or constraints
-   stated by the user.
-2. **Pass each candidate through the Promotion Gate** (§ 0 of the
-   skill): repetition, impact, failure pattern, non-duplication,
-   scope fit, minimal. Drop candidates that fail any gate.
-3. **For each surviving candidate**, run § 4 (search protocol — all
-   four steps), then decide rule / skill / guideline / update / no
-   action per § 3 of the skill.
-4. **Draft a proposal** for every candidate that warrants one,
-   following § 8 of the skill (proposal template under
-   `agents/proposals/<id>.md`).
-If multiple candidates survive, draft them as **separate**
-proposals — do not merge unrelated learnings into one.
-### 7. Surface the result
-Hand back to the user with a structured summary per surviving
-candidate:
-```
-> 📒 Mined session {id} — {N} candidate(s) surfaced
-> 1. {learning title}
->    Decision: {rule|skill|guideline|update|no action}
->    Proposal: agents/proposals/{proposal_id}.md
->    Gate: {pass|fail — reason}
-> 2. ...
-```
-If no candidate cleared the Promotion Gate, say so explicitly:
-```
-> 📒 Mined session {id} — no candidate cleared the Promotion Gate.
-```
-Do **not** open a PR, do **not** commit the proposals — proposal
-files land in `agents/proposals/` (gitignored or curated per
-project policy) for the user to review and route via
-`upstream-contribute` or merge into `agents/overrides/`.
-## Gotchas
-- **Promotion Gate is hard.** A grep miss is not proof of
-  non-duplication — § 4 of the skill mandates the four-step search
-  protocol. Do not skip it.
-- **One pick per invocation.** Multi-pick is v2. If the user wants
-  to mine a second session, run `/chat-history learn` again.
-- **Read-only on the log.** This command never writes to
-  `agents/runtime/.agent-chat-history`. It writes proposal drafts under
-  `agents/proposals/` only.
-- **No auto-promotion.** Drafted proposals stay in `proposals/`
-  until the user routes them. `learn` never invokes
-  `upstream-contribute` itself.
-## See also
-- [`/chat-history import`](import.md) — verbatim render of a prior session
-- [`learning-to-rule-or-skill`](../../skills/learning-to-rule-or-skill/SKILL.md) — the workflow this command orchestrates
-- [`upstream-contribute`](../../skills/upstream-contribute/SKILL.md) — promote a project-scoped proposal upstream
-- [`scripts/chat_history.py`](../../../scripts/chat_history.py) — `sessions` and `read --session` CLI surface
-- [`user-interaction`](../../rules/user-interaction.md) — numbered-options Iron Law
-- [`ask-when-uncertain`](../../rules/ask-when-uncertain.md) — one-question-per-turn Iron Law
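
Note on step 3 of the removed command: the listing rules (lead with `summary`, show a `YYYY-MM-DD HH:MM` timestamp and entry count, keep the session `id` internal, put an explicit abort option last) can be sketched as a small rendering function. This is an illustration only, not code from the package. The name `render_picker` is hypothetical, and two details are assumptions the removed file does not settle: that timestamps are ISO 8601 strings, and that the displayed timestamp is `last_ts`.

```python
from datetime import datetime


def render_picker(sessions):
    """Return (listing_text, option_to_id) for a list of session dicts."""
    lines = ["> Pick a session to mine for learnings:", ">"]
    option_to_id = {}
    for n, session in enumerate(sessions, start=1):
        # Assumption: ISO 8601 input; drop seconds and timezone for display.
        ts = datetime.fromisoformat(session["last_ts"]).strftime("%Y-%m-%d %H:%M")
        # The summary is rendered as-is; the helper already shapes it.
        lines.append(f"> {n}. {session['summary']}")
        lines.append(f">    {ts}  ·  {session['count']} entries")
        # The id is tracked internally and never rendered.
        option_to_id[n] = session["id"]
    # The abort option always comes last.
    lines.append(f"> {len(sessions) + 1}. abort — do not extract any learning")
    return "\n".join(lines), option_to_id


sessions = [
    {
        "id": "s-001",
        "count": 12,
        "first_ts": "2025-01-02T02:00:00+00:00",
        "last_ts": "2025-01-02T03:04:05+00:00",
        "preview": "",
        "summary": "fix lint → ship",
    }
]
text, option_to_id = render_picker(sessions)
print(text)
```

The returned `option_to_id` mapping is what a later step would use to turn the user's numeric pick into the `--session <id>` argument.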