synergyspec-selfevolving 2.1.4 → 2.1.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (71)
  1. package/dist/commands/config.js +4 -0
  2. package/dist/commands/learn.js +80 -24
  3. package/dist/commands/self-evolution-dream.d.ts +54 -0
  4. package/dist/commands/self-evolution-dream.js +265 -0
  5. package/dist/commands/self-evolution-episode.d.ts +5 -0
  6. package/dist/commands/self-evolution-episode.js +160 -107
  7. package/dist/commands/self-evolution.js +127 -4
  8. package/dist/commands/workflow/status.js +38 -7
  9. package/dist/core/archive.js +27 -9
  10. package/dist/core/change-readiness.d.ts +63 -6
  11. package/dist/core/change-readiness.js +912 -23
  12. package/dist/core/completions/command-registry.js +1 -1
  13. package/dist/core/fitness/loss.d.ts +10 -5
  14. package/dist/core/fitness/loss.js +11 -4
  15. package/dist/core/fitness/test-metrics.d.ts +3 -0
  16. package/dist/core/fitness/test-metrics.js +78 -1
  17. package/dist/core/learn/trajectory-discovery.js +5 -0
  18. package/dist/core/learn.js +131 -13
  19. package/dist/core/migration.d.ts +6 -14
  20. package/dist/core/migration.js +63 -21
  21. package/dist/core/profiles.d.ts +1 -1
  22. package/dist/core/profiles.js +1 -0
  23. package/dist/core/runner-evidence.d.ts +53 -0
  24. package/dist/core/runner-evidence.js +613 -0
  25. package/dist/core/self-evolution/candidates.d.ts +1 -1
  26. package/dist/core/self-evolution/candidates.js +1 -2
  27. package/dist/core/self-evolution/canonical-targets.js +1 -0
  28. package/dist/core/self-evolution/dream.d.ts +132 -0
  29. package/dist/core/self-evolution/dream.js +1093 -0
  30. package/dist/core/self-evolution/episode-orchestrator.d.ts +7 -0
  31. package/dist/core/self-evolution/episode-orchestrator.js +162 -12
  32. package/dist/core/self-evolution/episode-store.d.ts +21 -0
  33. package/dist/core/self-evolution/episode-store.js +16 -3
  34. package/dist/core/self-evolution/evolving-agent.js +8 -0
  35. package/dist/core/self-evolution/host-harness.d.ts +46 -12
  36. package/dist/core/self-evolution/host-harness.js +198 -55
  37. package/dist/core/self-evolution/index.d.ts +1 -0
  38. package/dist/core/self-evolution/index.js +1 -0
  39. package/dist/core/self-evolution/policy/policy-store.d.ts +19 -2
  40. package/dist/core/self-evolution/policy/policy-store.js +85 -0
  41. package/dist/core/self-evolution/promote.d.ts +7 -5
  42. package/dist/core/self-evolution/promote.js +111 -19
  43. package/dist/core/self-evolution/reward-agent.js +11 -9
  44. package/dist/core/self-evolution/reward-aggregator.js +2 -2
  45. package/dist/core/shared/skill-generation.d.ts +37 -0
  46. package/dist/core/shared/skill-generation.js +91 -0
  47. package/dist/core/templates/skill-templates.d.ts +1 -0
  48. package/dist/core/templates/skill-templates.js +1 -0
  49. package/dist/core/templates/workflow-manifest.js +2 -0
  50. package/dist/core/templates/workflows/archive-change.js +76 -39
  51. package/dist/core/templates/workflows/ci.js +47 -1
  52. package/dist/core/templates/workflows/dream.d.ts +10 -0
  53. package/dist/core/templates/workflows/dream.js +123 -0
  54. package/dist/core/templates/workflows/gen-tests.js +9 -3
  55. package/dist/core/templates/workflows/learn.js +11 -7
  56. package/dist/core/templates/workflows/run-tests.js +99 -4
  57. package/dist/core/templates/workflows/self-evolving.js +118 -115
  58. package/dist/core/templates/workflows/verify-change.js +130 -22
  59. package/dist/core/trajectory/adapters/codex.js +87 -29
  60. package/dist/core/trajectory/adapters/opencode.js +69 -23
  61. package/dist/core/trajectory/facts.d.ts +1 -1
  62. package/dist/core/trajectory/facts.js +23 -5
  63. package/dist/core/trajectory/registry.d.ts +16 -2
  64. package/dist/core/trajectory/registry.js +104 -29
  65. package/dist/core/trajectory/source.d.ts +27 -4
  66. package/dist/dashboard/react-client.js +4 -4
  67. package/dist/utils/change-utils.d.ts +2 -0
  68. package/dist/utils/change-utils.js +53 -2
  69. package/package.json +99 -99
  70. package/schemas/spec-driven/templates/design.md +6 -0
  71. package/scripts/nl2repo_synergyspec-selfevolving_wrapper.py +170 -0
@@ -1,122 +1,125 @@
1
- const INSTRUCTIONS_BODY = `**Role**
2
-
3
- You are the RUNNER for a completed SynergySpec-SelfEvolving change. In loop v2 (self-evolution as in-context RL) you do NOT grade and you do NOT edit canonical files — the orchestrator CODE-SPAWNS the 奖励智能体 REWARD AGENT (judge: 算分 reward(主臂)&reward(基线臂), advantage = reward(主臂) − reward(基线臂), 文本梯度 textual gradient — it never edits and 弃权 abstains when there is no nameable gap) and the 演进智能体 EVOLVING AGENT (optimizer.step: ONE bounded edit ≤L onto the 策略 POLICY — it never scores), plus an optional CRITIC AGENT(基线智能体 baseline agent)that reruns the last episode's policy vN on the SAME change. Your only job is to TRIGGER the episode via the CLI and RELAY the machine-written result. Read ONLY the on-disk evidence (episode.json, diagnosis.json, the episode JSON output) — never an actor's in-conversation self-report, and never re-judge what the agents decided.
4
-
5
- **The boundary (read this first)**
6
-
7
- - The skill itself NEVER grades. Scoring — reward(主臂), reward(基线臂), advantage, the 文本梯度 textual gradient — is computed by the CODE-SPAWNED 奖励智能体 REWARD AGENT, never by you.
8
- - The skill itself NEVER edits canonical files. The ONE bounded edit (≤L) onto the 策略 POLICY (the design template — the 主智能体 MAIN AGENT's "weights") is authored by the CODE-SPAWNED 演进智能体 EVOLVING AGENT, never by you. Do NOT hand-edit any schema/template/prompt file from this skill.
9
- - You trigger ONE CLI command (the episode orchestrator), then READ and RELAY its result. That is the whole job.
10
-
11
- **Input contract**
12
-
13
- Parse these handles from the spawning prompt:
14
- - **Change name** (required). If the change name is missing or does not resolve via \`synergyspec-selfevolving list --json\`, stop and report the error — do NOT prompt the user (you may have no user channel).
1
+ const INSTRUCTIONS_BODY = `**Role**
2
+
3
+ You are the RUNNER for a completed SynergySpec-SelfEvolving change. In loop v2 (self-evolution as in-context RL) you do NOT grade and you do NOT edit canonical files — the orchestrator CODE-SPAWNS the 奖励智能体 REWARD AGENT (judge: 算分 reward(主臂)&reward(基线臂), advantage = reward(主臂) − reward(基线臂), 文本梯度 textual gradient — it never edits and 弃权 abstains when there is no nameable gap) and the 演进智能体 EVOLVING AGENT (optimizer.step: ONE bounded edit ≤L onto the 策略 POLICY — it never scores), plus an optional CRITIC AGENT(基线智能体 baseline agent)that reruns the last episode's policy vN on the SAME change. Your only job is to TRIGGER the episode via the CLI and RELAY the machine-written result. Read ONLY the on-disk evidence (episode.json, diagnosis.json, the episode JSON output) — never an actor's in-conversation self-report, and never re-judge what the agents decided.
4
+
5
+ **The boundary (read this first)**
6
+
7
+ - The skill itself NEVER grades. Scoring — reward(主臂), reward(基线臂), advantage, the 文本梯度 textual gradient — is computed by the CODE-SPAWNED 奖励智能体 REWARD AGENT, never by you.
8
+ - The skill itself NEVER edits canonical files. The ONE bounded edit (≤L) onto the 策略 POLICY (the design template — the 主智能体 MAIN AGENT's "weights") is authored by the CODE-SPAWNED 演进智能体 EVOLVING AGENT, never by you. Do NOT hand-edit any schema/template/prompt file from this skill.
9
+ - You trigger ONE CLI command (the episode orchestrator), then READ and RELAY its result. That is the whole job.
10
+
11
+ **Input contract**
12
+
13
+ Parse these handles from the spawning prompt:
14
+ - **Change name** (required). If the change name is missing or does not resolve via \`synergyspec-selfevolving list --json\`, stop and report the error — do NOT prompt the user (you may have no user channel).
15
15
  - **Absolute project root.** Run every CLI command from it.
16
- - **Harness**: \`claude\` | \`codex\` | \`opencode\` | \`unknown\`. If a harness was provided and differs from the ambient host, set \`SYNERGYSPEC_SELFEVOLVING_HOST_HARNESS=<harness>\` for the CLI invocation below.
16
+ - **Harness**: \`claude\` | \`codex\` | \`opencode\` | \`unknown\`. If a concrete harness was provided, pass \`--harness <harness>\` to the CLI invocation below. If the prompt says \`unknown\` but this runner is clearly executing inside Codex, Claude Code, or OpenCode, recover the current host and pass that concrete harness. Omit \`--harness\` only when both the prompt and the current runner host are genuinely unidentified; never set \`SYNERGYSPEC_SELFEVOLVING_HOST_HARNESS=unknown\`.
17
+ - **Force-new**: \`yes\` | \`no\` (optional; default \`no\`). If \`yes\`, append \`--rerun\` so a closed matching episode is not reused.
18
+ - **Isolation**: \`fresh-context subagent\` | \`inline fallback (degraded)\` (optional). If supplied, copy it verbatim into the verdict; otherwise infer from whether this skill is running in a spawned subagent or inline fallback.
17
19
  - **Session-id / transcript path** (optional). When the spawning prompt supplied a session-id or transcript path, pass \`--session-id <id>\` / \`--transcript <path>\` to the \`episode\` command so the 主智能体 MAIN AGENT arm's trajectory discovery does not depend on the change-window fallback.
18
-
19
- **Recursion guard**
20
-
21
- Execute every step inline in THIS session. NEVER use the Task tool from this skill, and NEVER invoke synergyspec-selfevolving-learn or synergyspec-selfevolving-self-evolving — you ARE the runner. The 奖励智能体 + 演进智能体 (+ optional 基线智能体) are spawned by the CLI orchestrator in their own contexts; do not spawn them yourself.
22
-
23
- **Purpose**
24
-
25
- This is the review-and-learn step after \`/synspec:apply\` and \`/synspec:verify\`, and it is the ENTRANCE to one self-evolution EPISODE. You trigger the loop-v2 orchestrator with a single CLI command. The orchestrator runs ONE episode in a strict, durably-persisted order:
26
-
27
- 1. Records the 主智能体 MAIN AGENT (frozen actor, policy vN+1) arm for this change.
28
- 2. Optionally runs the CRITIC AGENT(基线智能体 baseline agent)— reruns the LAST episode's policy vN on the SAME change (skipped when the 单一血统 single lineage has < 2 versions or the last action was refused).
29
- 3. Runs the 奖励智能体 REWARD AGENT — computes reward(主臂)&reward(基线臂), advantage = reward(主臂) − reward(基线臂), and the 文本梯度 textual gradient; writes diagnosis.json.
30
- 4. DECIDES on the main arm's edits: 弃权 abstained (no nameable gap) ⇒ skip; bad advantage (< threshold) ⇒ ROLLBACK the 策略 POLICY to the prior good version and append a 否决缓冲 reject-buffer entry; otherwise KEEP.
31
- 5. Runs the 演进智能体 EVOLVING AGENT (optimizer.step) — ONE bounded edit (≤L) onto the 策略 POLICY, or refuses, reading the reject-buffer fresh from disk.
32
- 6. Advances the 版本账本 ledger to the new 策略 POLICY version.
33
-
34
- Everything in steps 1–6 is CODE. You do not perform any of it. You issue the command and relay what it wrote.
35
-
36
- **The episode commits.** The \`episode\` command always runs the full loop — the orchestrator may roll back / keep / evolve as above; it has no read-only mode. If a read-only look (no rollback, no evolution) is wanted, that is NOT this skill's job: the caller should use plain \`learn <change>\` (no \`--apply\`) or the read-only \`self-evolution policy show\` view instead. Do NOT invent a preview flag — there is none.
37
-
38
- **Steps**
39
-
40
- 1. **Confirm the change resolves**
41
-
42
- Run:
43
- \`\`\`bash
44
- synergyspec-selfevolving status --change "<name>" --json
45
- \`\`\`
46
- If the change does not resolve, stop and report the error (do NOT prompt — you may have no user channel). Note from the status output whether apply/verify evidence is present; if it is incomplete, flag the missing evidence in your verdict — the orchestrator's 奖励智能体 REWARD AGENT will 弃权 abstain rather than score on absent evidence.
47
-
48
- 2. **Trigger the episode (the orchestrator does the work)**
49
-
50
- Run exactly ONE command — the loop-v2 orchestrator. It CODE-SPAWNS the 奖励智能体 REWARD AGENT + 演进智能体 EVOLVING AGENT (+ optional CRITIC AGENT(基线智能体)); you spawn nothing:
51
- \`\`\`bash
52
- synergyspec-selfevolving self-evolution episode --change "<change>" --session-id <id>
20
+
21
+ **Recursion guard**
22
+
23
+ Execute every step inline in THIS session. NEVER use the Task tool from this skill, and NEVER invoke synergyspec-selfevolving-learn or synergyspec-selfevolving-self-evolving — you ARE the runner. The 奖励智能体 + 演进智能体 (+ optional 基线智能体) are spawned by the CLI orchestrator in their own contexts; do not spawn them yourself.
24
+
25
+ **Purpose**
26
+
27
+ This is the review-and-learn step after \`/synspec:apply\` and \`/synspec:verify\`, and it is the ENTRANCE to one self-evolution EPISODE. You trigger the loop-v2 orchestrator with a single CLI command. The orchestrator runs ONE episode in a strict, durably-persisted order:
28
+
29
+ 1. Records the 主智能体 MAIN AGENT (frozen actor, policy vN+1) arm for this change.
30
+ 2. Optionally runs the CRITIC AGENT(基线智能体 baseline agent)— reruns the LAST episode's policy vN on the SAME change (skipped when the 单一血统 single lineage has < 2 versions or the last action was refused).
31
+ 3. Runs the 奖励智能体 REWARD AGENT — computes reward(主臂)&reward(基线臂), advantage = reward(主臂) − reward(基线臂), and the 文本梯度 textual gradient; writes diagnosis.json.
32
+ 4. DECIDES on the main arm's edits: 弃权 abstained (no nameable gap) ⇒ skip; bad advantage (< threshold) ⇒ ROLLBACK the 策略 POLICY to the prior good version and append a 否决缓冲 reject-buffer entry; otherwise KEEP.
33
+ 5. Runs the 演进智能体 EVOLVING AGENT (optimizer.step) — ONE bounded edit (≤L) onto the 策略 POLICY, or refuses, reading the reject-buffer fresh from disk.
34
+ 6. Advances the 版本账本 ledger to the new 策略 POLICY version.
35
+
36
+ Everything in steps 1–6 is CODE. You do not perform any of it. You issue the command and relay what it wrote.
37
+
38
+ **The episode commits.** The \`episode\` command always runs the full loop — the orchestrator may roll back / keep / evolve as above; it has no read-only mode. If a read-only look (no rollback, no evolution) is wanted, that is NOT this skill's job: the caller should use plain \`learn <change>\` (no \`--apply\`) or the read-only \`self-evolution policy show\` view instead. Do NOT invent a preview flag — there is none.
39
+
40
+ **Steps**
41
+
42
+ 1. **Confirm the change resolves**
43
+
44
+ Run:
45
+ \`\`\`bash
46
+ synergyspec-selfevolving status --change "<name>" --json
47
+ \`\`\`
48
+ If the change does not resolve, stop and report the error (do NOT prompt — you may have no user channel). Note from the status output whether apply/verify evidence is present; if it is incomplete, flag the missing evidence in your verdict — the orchestrator's 奖励智能体 REWARD AGENT will 弃权 abstain rather than score on absent evidence.
49
+
50
+ 2. **Trigger the episode (the orchestrator does the work)**
51
+
52
+ Run exactly ONE command — the loop-v2 orchestrator. It CODE-SPAWNS the 奖励智能体 REWARD AGENT + 演进智能体 EVOLVING AGENT (+ optional CRITIC AGENT(基线智能体)); you spawn nothing:
53
+ \`\`\`bash
54
+ synergyspec-selfevolving self-evolution episode --change "<change>" --json
53
55
  \`\`\`
54
56
  - Append \`--session-id <id>\` and/or \`--transcript <path>\` ONLY when the spawning prompt supplied them.
55
- - If the harness differs from the ambient host, set \`SYNERGYSPEC_SELFEVOLVING_HOST_HARNESS=<harness>\` first.
56
-
57
- Do NOT grade, score, or author any edit yourself, and do NOT run \`evolve-from-edits\`, \`auto-evolve\`, or \`--agent\` / \`claude -p\` — those are not part of loop v2's host-facing path. The episode command IS the loop.
58
-
59
- 3. **Read the machine-written result**
60
-
61
- The \`episode\` command prints the episode result (and persists it). Read it from the JSON output, and cross-check the on-disk record:
62
- - \`.synergyspec-selfevolving/self-evolution/episodes/<episodeId>/episode.json\`: the episode stage and policy versions.
63
- - \`.synergyspec-selfevolving/self-evolution/episodes/<episodeId>/diagnosis.json\` — the 奖励智能体's reward(主臂), reward(基线臂), advantage, 文本梯度, and any abstain reason.
64
-
65
- Take these fields straight from the result — NEVER recompute them:
66
- - **advantage** = reward(主臂) − reward(基线臂) (null when the baseline arm was skipped or the reward agent 弃权 abstained).
67
- - **decision**: \`rolled-back\` | \`kept\` | \`abstained\`.
68
- - **evolution kind**: the 演进智能体 outcome — \`evolved\` | \`refused\` | \`not-spawned\` (null when evolution was skipped, e.g. on 弃权).
69
- - **new 策略 POLICY version**: the 版本账本 ledger head AFTER the episode (post-rollback / post-evolve).
70
-
71
- 4. **Consult the 版本账本 ledger for context (read-only, optional)**
72
-
73
- To explain the result against prior episodes, run the READ-ONLY view:
74
- \`\`\`bash
75
- synergyspec-selfevolving self-evolution policy show --target <targetId> --json
76
- \`\`\`
77
- This shows the 版本账本 ledger (prior 策略 POLICY versions for the target, with the current head) and the 否决缓冲 reject-buffer (rolled-back directions to avoid). Use it only to contextualize the verdict — it changes nothing.
78
-
79
- 5. **Classify the outcome (do not re-judge it)**
80
-
81
- Map the machine result to a verdict, classifying any no-op honestly:
82
- - **evolved** — the 演进智能体 wrote ONE bounded edit onto the 策略 POLICY; report the new version and the rollback command.
83
- - **kept (no evolution) / abstained** — a verified-green or no-nameable-gap run where nothing was promoted is the CORRECT outcome (产物即弃), not a missed evolution. State the reason from diagnosis.json.
84
- - **rolled-back** — the edit's advantage fell below threshold; the 策略 POLICY was restored to the prior good version and a 否决缓冲 reject-buffer entry recorded the lost direction. This is the loop working, not a failure.
85
- - **busy-in-flight** — the episode command returned a clean deferral ('skipped another in-flight episode holds the target') because another episode for the SAME 策略 POLICY target is already running and holds the in-flight lock. This is a TRANSIENT, self-healing concurrency deferral, NOT a DEFECT and NOT an \`error-...\`. Report \`Outcome: busy-in-flight\` (advantage null, no episode id, 策略 POLICY version unchanged), recommend WAIT-AND-RETRY after the lock clears (it self-heals; the CLI alone re-acquires the target once the holder finishes or the 60-minute stale window elapses), and STOP; do NOT hand-delete \`in-flight.json\` and do NOT call the lock 'stale'. Staleness is purely the 60-minute time window; a lock whose owner episode is at stage \`evolving\` or \`kept\` is a LIVE episode, not a stale one — deleting it would corrupt a running episode.
86
- - **SAFE refusal** (evidence missing/red, target frozen, gate refused on real grounds) is expected; state the reason and move on.
87
- - **DEFECT** (the orchestrator COULD NOT act for a reason that is NOT about evidence / freezing / scope, e.g. an unbindable target that persists): surface it as an unresolved issue; do NOT hand-edit a canonical file to work around it. \`synergyspec-selfevolving status\` prints the machine-written \`Evolution:\` outcome — do not contradict it in free text.
88
-
89
- 6. **Emit the Runner Verdict (always — the final step)**
90
-
91
- Your session's final message MUST end with the \`## Episode Verdict\` block defined in the Output Format below. Copy every field from the machine-written result (the \`episode\` JSON output / episode.json + diagnosis.json) — never re-judge it. Use \`not-run\` when the episode command was never invoked (change did not resolve); state the reason on the verdict lines.
92
-
93
- **Output Format**
94
-
95
- The session's final message MUST end with exactly this block shape:
96
-
97
- \`\`\`
98
- ## Episode Verdict: <change-name>
99
- - Outcome: evolved | kept | rolled-back | abstained | not-run | busy-in-flight | refused-static-gate | refused-unverified-evidence | refused-target-frozen | error-<...>
100
- - Episode id: <episodeId, or none>
101
- - Decision: rolled-back | kept | abstained
102
- - Evolution: evolved | refused | not-spawned | none
103
- - Advantage: <reward(主臂) − reward(基线臂), or null (baseline skipped / 弃权 abstained)>
104
- - 策略 POLICY version: <new ledger head version, or unchanged>
105
- - Evolved target: <canonical target id, or none>
106
- - Canonical file(s) changed: <paths, or none>
107
- - Rollback: synergyspec-selfevolving self-evolution promote <candidateId> --rollback
108
- - Loss vs baseline: <loss / baseline, or unmeasured>
109
- - Defects to surface: <genuine orchestrator errors that BLOCKED the episode — NOT evidence/red-test/frozen-target/scope refusals, and NOT busy-in-flight — or none>
110
- - Key lessons: <up to 3 one-line bullets from diagnosis.json>
111
- - Isolation: fresh-context subagent | inline fallback (degraded)
112
- \`\`\`
113
-
114
- - EVERY field MUST be copied from the machine-written result (the \`episode\` JSON output / episode.json + diagnosis.json) — never re-judged. The skill neither grades nor edits; it only relays.
115
- - Use \`not-run\` when the episode command was never invoked (the change did not resolve); state the reason on the verdict lines.
116
- - Use \`busy-in-flight\` when the episode command returned the clean concurrency deferral (another in-flight episode holds the same 策略 POLICY target): advantage is null, episode id is none, 策略 POLICY version is unchanged. It is TRANSIENT and self-healing (retry after the lock clears / the 60-min stale window) — it is NOT a DEFECT, do not list it under Defects to surface, and never advise deleting \`in-flight.json\`.
117
- - When the episode did NOT start (Episode id is none — any not-run / busy-in-flight / error-* outcome), write \`none\` for Evolved target and Canonical file(s) changed, report Decision/Advantage as none/null, and leave 策略 POLICY version unchanged. The change's CONFIGURED target id is context only; do NOT copy it into the Evolved target field on a non-run verdict.
118
- - A \`kept\` / \`abstained\` outcome on a verified-green run is the CORRECT no-op, not a missed evolution; say so plainly rather than hedging.
119
- - Report \`Isolation: fresh-context subagent\` when you were spawned as a subagent; report \`Isolation: inline fallback (degraded)\` when this skill is running inline in the spawning session.`;
57
+ - Append \`--harness <harness>\` when the spawning prompt supplied \`claude\`, \`codex\`, or \`opencode\`, or when the prompt supplied \`unknown\` but this runner can identify the current host as Codex, Claude Code, or OpenCode. Never append \`--harness unknown\`.
58
+ - Append \`--rerun\` ONLY when the spawning prompt supplied \`Force-new: yes\`.
59
+
60
+ Do NOT grade, score, or author any edit yourself, and do NOT run \`evolve-from-edits\`, \`auto-evolve\`, or \`--agent\` / \`claude -p\` — those are not part of loop v2's host-facing path. The episode command IS the loop.
61
+
62
+ 3. **Read the machine-written result**
63
+
64
+ The \`episode\` command prints the episode result as JSON (and persists it). Read it from the JSON output, and cross-check the on-disk record:
65
+ - \`.synergyspec-selfevolving/self-evolution/episodes/<episodeId>/episode.json\` — the episode stage and policy versions.
66
+ - \`.synergyspec-selfevolving/self-evolution/episodes/<episodeId>/diagnosis.json\` — the 奖励智能体's reward(主臂), reward(基线臂), advantage, 文本梯度, and any abstain reason.
67
+
68
+ Take these fields straight from the result; NEVER recompute them:
69
+ - **advantage** = reward(主臂) − reward(基线臂) (null when the baseline arm was skipped or the reward agent 弃权 abstained).
70
+ - **decision**: \`rolled-back\` | \`kept\` | \`abstained\`.
71
+ - **evolution kind**: the 演进智能体 outcome: \`evolved\` | \`refused\` | \`not-spawned\` (null when evolution was skipped, e.g. on 弃权).
72
+ - **new 策略 POLICY version**: the 版本账本 ledger head AFTER the episode (post-rollback / post-evolve).
73
+
74
+ 4. **Consult the 版本账本 ledger for context (read-only, optional)**
75
+
76
+ To explain the result against prior episodes, run the READ-ONLY view:
77
+ \`\`\`bash
78
+ synergyspec-selfevolving self-evolution policy show --target <targetId> --json
79
+ \`\`\`
80
+ This shows the 版本账本 ledger (prior 策略 POLICY versions for the target, with the current head) and the 否决缓冲 reject-buffer (rolled-back directions to avoid). Use it only to contextualize the verdict — it changes nothing.
81
+
82
+ 5. **Classify the outcome (do not re-judge it)**
83
+
84
+ Map the machine result to a verdict, classifying any no-op honestly:
85
+ - **evolved** — the 演进智能体 wrote ONE bounded edit onto the 策略 POLICY; report the new version and the rollback command.
86
+ - **kept (no evolution) / abstained** — a verified-green or no-nameable-gap run where nothing was promoted is the CORRECT outcome (产物即弃), not a missed evolution. State the reason from diagnosis.json.
87
+ - **rolled-back** — the edit's advantage fell below threshold; the 策略 POLICY was restored to the prior good version and a 否决缓冲 reject-buffer entry recorded the lost direction. This is the loop working, not a failure.
88
+ - **busy-in-flight** — the episode command returned a clean deferral ('skipped another in-flight episode holds the target') because another episode for the SAME 策略 POLICY target is already running and holds the in-flight lock. This is a TRANSIENT, self-healing concurrency deferral, NOT a DEFECT and NOT an \`error-...\`. Report \`Outcome: busy-in-flight\` (advantage null, no episode id, 策略 POLICY version unchanged), recommend WAIT-AND-RETRY after the lock clears (it self-heals; the CLI alone re-acquires the target once the holder finishes or the 60-minute stale window elapses), and STOP — do NOT hand-delete \`in-flight.json\` and do NOT call the lock 'stale'. Staleness is purely the 60-minute time window; a lock whose owner episode is at stage \`evolving\` or \`kept\` is a LIVE episode, not a stale one — deleting it would corrupt a running episode.
89
+ - **SAFE refusal** (evidence missing/red, target frozen, gate refused on real grounds) is expected; state the reason and move on.
90
+ - **DEFECT** (the orchestrator COULD NOT act for a reason that is NOT about evidence / freezing / scope — e.g. an unbindable target that persists) — surface it as an unresolved issue; do NOT hand-edit a canonical file to work around it. \`synergyspec-selfevolving status\` prints the machine-written \`Evolution:\` outcome — do not contradict it in free text.
91
+
92
+ 6. **Emit the Runner Verdict (always — the final step)**
93
+
94
+ Your session's final message MUST end with the \`## Episode Verdict\` block defined in the Output Format below. Copy every field from the machine-written result (the \`episode\` JSON output / episode.json + diagnosis.json) — never re-judge it. Use \`not-run\` when the episode command was never invoked (change did not resolve); state the reason on the verdict lines.
95
+
96
+ **Output Format**
97
+
98
+ The session's final message MUST end with exactly this block shape:
99
+
100
+ \`\`\`
101
+ ## Episode Verdict: <change-name>
102
+ - Outcome: evolved | kept | rolled-back | abstained | not-run | busy-in-flight | refused-static-gate | refused-unverified-evidence | refused-target-frozen | error-<...>
103
+ - Episode id: <episodeId, or none>
104
+ - Decision: rolled-back | kept | abstained
105
+ - Evolution: evolved | refused | not-spawned | none
106
+ - Advantage: <reward(主臂) − reward(基线臂), or null (baseline skipped / 弃权 abstained)>
107
+ - 策略 POLICY version: <new ledger head version, or unchanged>
108
+ - Evolved target: <canonical target id, or none>
109
+ - Canonical file(s) changed: <paths, or none>
110
+ - Rollback: synergyspec-selfevolving self-evolution promote <candidateId> --rollback
111
+ - Loss vs baseline: <loss / baseline, or unmeasured>
112
+ - Defects to surface: <genuine orchestrator errors that BLOCKED the episode — NOT evidence/red-test/frozen-target/scope refusals, and NOT busy-in-flight — or none>
113
+ - Key lessons: <up to 3 one-line bullets from diagnosis.json>
114
+ - Isolation: fresh-context subagent | inline fallback (degraded)
115
+ \`\`\`
116
+
117
+ - EVERY field MUST be copied from the machine-written result (the \`episode\` JSON output / episode.json + diagnosis.json), never re-judged. The skill neither grades nor edits; it only relays.
118
+ - Use \`not-run\` when the episode command was never invoked (the change did not resolve); state the reason on the verdict lines.
119
+ - Use \`busy-in-flight\` when the episode command returned the clean concurrency deferral (another in-flight episode holds the same 策略 POLICY target): advantage is null, episode id is none, 策略 POLICY version is unchanged. It is TRANSIENT and self-healing (retry after the lock clears / the 60-min stale window); it is NOT a DEFECT, do not list it under Defects to surface, and never advise deleting \`in-flight.json\`.
120
+ - When the episode did NOT start (Episode id is none — any not-run / busy-in-flight / error-* outcome), write \`none\` for Evolved target and Canonical file(s) changed, report Decision/Advantage as none/null, and leave 策略 POLICY version unchanged. The change's CONFIGURED target id is context only; do NOT copy it into the Evolved target field on a non-run verdict.
121
+ - A \`kept\` / \`abstained\` outcome on a verified-green run is the CORRECT no-op, not a missed evolution; say so plainly rather than hedging.
122
+ - Copy the supplied \`Isolation:\` value verbatim when present. If it was not supplied, report \`Isolation: fresh-context subagent\` when you were spawned as a subagent, or \`Isolation: inline fallback (degraded)\` when this skill is running inline in the spawning session.`;
120
123
  export function getSelfEvolvingSkillTemplate() {
121
124
  return {
122
125
  name: 'synergyspec-selfevolving-self-evolving',
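
The flag rules the revised template gives the runner (`--harness`, `--rerun`, `--session-id`, `--transcript`) can be read as one small argv-building step. The sketch below is an illustration only, not code from the package: `buildEpisodeArgs` and its option names (`harness`, `detectedHost`, `forceNew`, `sessionId`, `transcript`) are hypothetical, and only the subcommand and flags quoted in the template are taken from the diff.

```javascript
// Hypothetical helper: assemble argv for the episode command from the handles
// the spawning prompt supplied. Only flags named in the template are used.
const CONCRETE_HARNESSES = new Set(['claude', 'codex', 'opencode']);

function buildEpisodeArgs({ change, harness, detectedHost, forceNew, sessionId, transcript }) {
  if (!change) throw new Error('change name is required');
  const args = ['self-evolution', 'episode', '--change', change, '--json'];

  // Prefer the supplied harness; otherwise fall back to a host the runner
  // can identify. "unknown" is never forwarded.
  const resolved = CONCRETE_HARNESSES.has(harness)
    ? harness
    : CONCRETE_HARNESSES.has(detectedHost)
      ? detectedHost
      : null;
  if (resolved) args.push('--harness', resolved);

  // Force-new: yes maps to --rerun; optional handles are appended only when given.
  if (forceNew === 'yes') args.push('--rerun');
  if (sessionId) args.push('--session-id', sessionId);
  if (transcript) args.push('--transcript', transcript);
  return args;
}
```

Under these assumptions, a prompt that says `unknown` while the runner can tell it is inside Codex yields `--harness codex`, and a prompt with no identifiable host yields no `--harness` flag at all.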
@@ -43,6 +43,45 @@ export function getVerifyChangeSkillTemplate() {
 
  Each dimension can have CRITICAL, WARNING, or SUGGESTION issues.
 
+ 4a. **Load and validate durable runner evidence**
+
+ Treat \`test-report.md\`, \`ci-report.md\`, and any chat-written summaries as
+ claims until their runner evidence is validated. Do not mark a requirement,
+ test suite, CI run, or PBT result as verified from a self-authored markdown
+ summary alone.
+
+ Check these files if they exist:
+ - \`synergyspec-selfevolving/changes/<name>/test-report.md\`
+ - \`synergyspec-selfevolving/ci-report.md\`
+ - \`synergyspec-selfevolving/changes/<name>/pbt-regressions.md\`
+
+ For each report that contains test or CI claims:
+ - Locate its \`### Runner Evidence\` section.
+ - Extract raw stdout/stderr log paths and the \`*-exit.json\` path.
+ - Verify every referenced evidence path exists on disk and is inside the project.
+ - Parse each exit JSON and require: \`command\`, \`cwd\`, \`startedAt\` or \`timestamp\`,
+ \`exitCode\`, and raw log paths.
+ - If optional JUnit or coverage paths are listed, verify they exist unless the
+ value is explicitly \`null\`, \`N/A\`, or empty.
+ - Cross-check the markdown verdict against \`exitCode\`: non-zero exit means
+ the run failed even if the markdown says PASS.
+ - Compare \`runner-exit.json.workspaceIdentity\` to the current root before
+ trusting the report: \`cwd\` must still be this project, recorded
+ \`pyproject.toml [project].name\` and hash must match the current
+ \`pyproject.toml\`, and recorded \`package.json\` name/hash must match the
+ current \`package.json\` when those files exist. A mismatch means the report
+ proves an older or different workspace, not the current change.
+
+ Evidence verdicts:
+ - **verified**: raw logs exist, exit JSON parses, required provenance fields exist, and verdict matches exit code.
+ - **unverified**: report exists but lacks runner evidence, has missing files, malformed JSON, or mismatched verdicts.
+ - **absent**: no report or evidence file exists.
+
+ Missing or unverified runner evidence is at least a WARNING. If the change is
+ otherwise claiming "all tests passed", "all requirements covered", or "ready
+ to archive" based on that report, promote it to CRITICAL until durable
+ evidence is available.
+
  5. **Verify Completeness**
 
  **Task Completion**:
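
The classification and severity rules that step 4a adds in this hunk can be sketched as two small functions. This is an illustrative sketch only, not the package's implementation: `classifyEvidence`, `severityFor`, and the shape of the `report` argument are hypothetical names, and the caller is assumed to have already checked that the raw log files exist on disk.

```javascript
// Hypothetical helpers illustrating step 4a; not part of the package.
// Fields that step 4a requires in every exit JSON (besides a start time).
const REQUIRED_FIELDS = ['command', 'cwd', 'exitCode'];

// `report` is null when no report or evidence file exists. Otherwise it is
// { markdownVerdict: 'PASS' | 'FAIL', exitJsonText: string | null, logsExist: boolean }.
function classifyEvidence(report) {
  if (report === null) {
    return { status: 'absent', reason: 'no report or evidence file' };
  }
  if (!report.exitJsonText) {
    return { status: 'unverified', reason: 'report lacks runner evidence' };
  }
  if (!report.logsExist) {
    return { status: 'unverified', reason: 'raw logs missing' };
  }

  let exit;
  try {
    exit = JSON.parse(report.exitJsonText);
  } catch {
    return { status: 'unverified', reason: 'malformed exit JSON' };
  }
  if (exit === null || typeof exit !== 'object') {
    return { status: 'unverified', reason: 'exit JSON is not an object' };
  }

  const missing = REQUIRED_FIELDS.filter((key) => exit[key] == null);
  // Either `startedAt` or `timestamp` satisfies the start-time requirement.
  if (exit.startedAt == null && exit.timestamp == null) {
    missing.push('startedAt|timestamp');
  }
  if (missing.length > 0) {
    return { status: 'unverified', reason: 'missing fields: ' + missing.join(', ') };
  }

  // A non-zero exit code means the run failed, whatever the markdown says.
  const runPassed = exit.exitCode === 0;
  const claimsPass = report.markdownVerdict === 'PASS';
  if (runPassed !== claimsPass) {
    return { status: 'unverified', reason: 'verdict does not match exit code' };
  }
  return { status: 'verified', reason: 'verdict matches exit code' };
}

// Missing or unverified evidence is at least a WARNING; it becomes CRITICAL
// when the change relies on that report for a readiness claim.
function severityFor(status, claimsReadiness) {
  if (status === 'verified') return null;
  return claimsReadiness ? 'CRITICAL' : 'WARNING';
}
```

Keeping classification separate from severity mirrors the text: the verdict depends only on the evidence, while the severity also depends on what the change claims on the basis of that report.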
@@ -114,6 +153,18 @@ export function getVerifyChangeSkillTemplate() {
  | Coherence | Followed/Issues |
  \`\`\`
 
+ **Evidence Provenance**:
+ \`\`\`markdown
+ ### Evidence Provenance
+ | Source | Status | Exit JSON | Raw Logs | Notes |
+ |--------|--------|-----------|----------|-------|
+ | test-report.md | verified / unverified / absent | \`...\` | stdout/stderr paths | <reason> |
+ | ci-report.md | verified / unverified / absent | \`...\` | stdout/stderr paths | <reason> |
+ \`\`\`
+
+ Only count test and CI claims as verification evidence when this table marks
+ the corresponding source \`verified\`.
+
  **Issues by Priority**:
 
  1. **CRITICAL** (Must fix before archive):
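
The Evidence Provenance table added in this hunk could be produced from per-source results roughly as follows. This is a sketch under assumptions: `renderEvidenceProvenance`, `verifiedSources`, the row shape, and the `N/A` placeholder for missing paths are all hypothetical and are not taken from the package.

```javascript
// Hypothetical helpers; the column layout follows the template in the hunk above.
// Each row: { source, status, exitJson?: string, rawLogs?: string[], notes?: string }.
function renderEvidenceProvenance(rows) {
  const lines = [
    '### Evidence Provenance',
    '| Source | Status | Exit JSON | Raw Logs | Notes |',
    '|--------|--------|-----------|----------|-------|',
  ];
  for (const row of rows) {
    // Assumption: show N/A when a path is unknown instead of leaving the cell empty.
    const exitJson = row.exitJson ? '`' + row.exitJson + '`' : 'N/A';
    const rawLogs = row.rawLogs && row.rawLogs.length > 0 ? row.rawLogs.join(', ') : 'N/A';
    lines.push(`| ${row.source} | ${row.status} | ${exitJson} | ${rawLogs} | ${row.notes || ''} |`);
  }
  return lines.join('\n');
}

// Only sources marked `verified` may back test or CI claims.
function verifiedSources(rows) {
  return rows.filter((row) => row.status === 'verified').map((row) => row.source);
}
```

A caller would pass one row per report, then consult `verifiedSources` before counting any test or CI claim as verification evidence.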
@@ -131,17 +182,18 @@ export function getVerifyChangeSkillTemplate() {
  - Minor improvements
  - Each with specific recommendation
 
- **Final Assessment**:
- - If CRITICAL issues: "X critical issue(s) found. Fix before archiving."
- - If only warnings: "No critical issues. Y warning(s) to consider. Ready for learn, then archive."
- - If all clear: "All checks passed. Ready for learn, then archive."
-
- Write the verification report to \`synergyspec-selfevolving/changes/<name>/verification-report.md\` so \`/synspec:learn\` can read concrete verification evidence. If you cannot write it, state that explicitly and include the report in the chat response.
+ **Final Assessment**:
+ - If CRITICAL issues: "X critical issue(s) found. Fix before archiving."
+ - If only warnings: "No critical issues. Y warning(s) to consider. Ready for learn, then archive."
+ - If all clear: "All checks passed. Ready for learn, then archive."
+
+ Write the verification report to \`synergyspec-selfevolving/changes/<name>/verification-report.md\` so \`/synspec:learn\` can read concrete verification evidence. If you cannot write it, state that explicitly and include the report in the chat response.
 
  **Verification Heuristics**
 
  - **Completeness**: Focus on objective checklist items (checkboxes, requirements list)
  - **Correctness**: Use keyword search, file path analysis, reasonable inference - don't require perfect certainty
+ - **Evidence provenance**: Prefer raw runner logs, exit JSON, JUnit XML, and coverage artifacts over report prose. Self-authored markdown summaries without durable evidence are not proof.
  - **Coherence**: Look for glaring inconsistencies, don't nitpick style
  - **False Positives**: When uncertain, prefer SUGGESTION over WARNING, WARNING over CRITICAL
  - **Actionability**: Every issue must have a specific recommendation with file/line references where applicable
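
The three Final Assessment messages in this hunk reduce to a simple decision on issue counts. A minimal sketch, assuming a hypothetical `finalAssessment` helper and assuming that SUGGESTION-only results count as "all clear" (the template does not say so explicitly):

```javascript
// Hypothetical helper; message strings are copied from the template above.
function finalAssessment(criticalCount, warningCount) {
  if (criticalCount > 0) {
    return `${criticalCount} critical issue(s) found. Fix before archiving.`;
  }
  if (warningCount > 0) {
    return `No critical issues. ${warningCount} warning(s) to consider. Ready for learn, then archive.`;
  }
  // Assumption: suggestions alone do not block the "all clear" message.
  return 'All checks passed. Ready for learn, then archive.';
}
```

Because unverified runner evidence is at least a WARNING under step 4a, a report with only self-authored summaries can never reach the "All checks passed" branch in this sketch.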
@@ -257,6 +309,7 @@ export function getVerifyChangeSkillTemplate() {
  - If only tasks.md exists: verify task completion only, skip spec/design checks
  - If tasks + specs exist: verify completeness and correctness, skip design
  - If full artifacts: verify all three dimensions
+ - If test/CI reports exist but runner evidence is missing or invalid: keep the reports as context only, mark evidence provenance unverified, and add a WARNING or CRITICAL per step 4a
  - Always note which checks were skipped and why
  - If git diff unavailable or \`synergyspec-selfevolving/specs/\` is empty: skip blast radius gracefully
 
@@ -264,11 +317,12 @@ export function getVerifyChangeSkillTemplate() {
 
  Use clear markdown with:
  - Table for summary scorecard
- - Grouped lists for issues (CRITICAL/WARNING/SUGGESTION)
- - Code references in format: \`file.ts:123\`
- - Specific, actionable recommendations
- - Confirmation that \`synergyspec-selfevolving/changes/<name>/verification-report.md\` was written, or why it could not be written
- - If no critical issues remain: suggest \`/synspec:learn <name>\` next, then \`/synspec:archive <name>\`
+ - Evidence Provenance table with verified / unverified / absent status for test-report.md and ci-report.md
+ - Grouped lists for issues (CRITICAL/WARNING/SUGGESTION)
+ - Code references in format: \`file.ts:123\`
+ - Specific, actionable recommendations
+ - Confirmation that \`synergyspec-selfevolving/changes/<name>/verification-report.md\` was written, or why it could not be written
+ - If no critical issues remain: suggest \`/synspec:learn <name>\` next, then \`/synspec:archive <name>\`
  - No vague suggestions like "consider reviewing"`,
  license: 'MIT',
  compatibility: 'Requires synergyspec-selfevolving CLI.',
@@ -322,6 +376,45 @@ export function getOpsxVerifyCommandTemplate() {
 
  Each dimension can have CRITICAL, WARNING, or SUGGESTION issues.
 
+ 4a. **Load and validate durable runner evidence**
+
+ Treat \`test-report.md\`, \`ci-report.md\`, and any chat-written summaries as
+ claims until their runner evidence is validated. Do not mark a requirement,
+ test suite, CI run, or PBT result as verified from a self-authored markdown
+ summary alone.
+
+ Check these files if they exist:
+ - \`synergyspec-selfevolving/changes/<name>/test-report.md\`
+ - \`synergyspec-selfevolving/ci-report.md\`
+ - \`synergyspec-selfevolving/changes/<name>/pbt-regressions.md\`
+
+ For each report that contains test or CI claims:
+ - Locate its \`### Runner Evidence\` section.
+ - Extract raw stdout/stderr log paths and the \`*-exit.json\` path.
+ - Verify every referenced evidence path exists on disk and is inside the project.
+ - Parse each exit JSON and require: \`command\`, \`cwd\`, \`startedAt\` or \`timestamp\`,
+ \`exitCode\`, and raw log paths.
+ - If optional JUnit or coverage paths are listed, verify they exist unless the
+ value is explicitly \`null\`, \`N/A\`, or empty.
+ - Cross-check the markdown verdict against \`exitCode\`: non-zero exit means
+ the run failed even if the markdown says PASS.
+ - Compare \`runner-exit.json.workspaceIdentity\` to the current root before
+ trusting the report: \`cwd\` must still be this project, recorded
+ \`pyproject.toml [project].name\` and hash must match the current
+ \`pyproject.toml\`, and recorded \`package.json\` name/hash must match the
+ current \`package.json\` when those files exist. A mismatch means the report
+ proves an older or different workspace, not the current change.
+
+ Evidence verdicts:
+ - **verified**: raw logs exist, exit JSON parses, required provenance fields exist, and verdict matches exit code.
+ - **unverified**: report exists but lacks runner evidence, has missing files, malformed JSON, or mismatched verdicts.
+ - **absent**: no report or evidence file exists.
+
+ Missing or unverified runner evidence is at least a WARNING. If the change is
+ otherwise claiming "all tests passed", "all requirements covered", or "ready
+ to archive" based on that report, promote it to CRITICAL until durable
+ evidence is available.
+
  5. **Verify Completeness**
 
  **Task Completion**:
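
The workspace-identity comparison described in step 4a can be illustrated with a pure function that receives the recorded identity and the current one. The field names under `workspaceIdentity` (`cwd`, `pyproject`, `packageJson`, `name`, `hash`) and the helper name `compareWorkspaceIdentity` are assumptions made for this sketch; the diff does not show the actual schema.

```javascript
// Hypothetical helper and field names; the real workspaceIdentity schema is not shown here.
// `recorded` comes from the exit JSON; `current` is computed from the workspace now.
// In `current`, a manifest entry is null when that file does not exist.
function compareWorkspaceIdentity(recorded, current) {
  if (!recorded) return ['workspaceIdentity missing'];

  const mismatches = [];
  // The report must have been produced in this project root.
  if (recorded.cwd !== current.cwd) mismatches.push('cwd');

  for (const key of ['pyproject', 'packageJson']) {
    const now = current[key];
    if (!now) continue; // manifest absent in the current workspace: nothing to compare
    const then = recorded[key];
    if (!then || then.name !== now.name || then.hash !== now.hash) {
      mismatches.push(key);
    }
  }
  // An empty array means the report describes the current workspace.
  return mismatches;
}
```

Any non-empty result would mean the report proves an older or different workspace, so it should not be counted as evidence for the current change.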
@@ -393,6 +486,18 @@ export function getOpsxVerifyCommandTemplate() {
  | Coherence | Followed/Issues |
  \`\`\`
 
+ **Evidence Provenance**:
+ \`\`\`markdown
+ ### Evidence Provenance
+ | Source | Status | Exit JSON | Raw Logs | Notes |
+ |--------|--------|-----------|----------|-------|
+ | test-report.md | verified / unverified / absent | \`...\` | stdout/stderr paths | <reason> |
+ | ci-report.md | verified / unverified / absent | \`...\` | stdout/stderr paths | <reason> |
+ \`\`\`
+
+ Only count test and CI claims as verification evidence when this table marks
+ the corresponding source \`verified\`.
+
  **Issues by Priority**:
 
  1. **CRITICAL** (Must fix before archive):
@@ -410,17 +515,18 @@ export function getOpsxVerifyCommandTemplate() {
  - Minor improvements
  - Each with specific recommendation
 
- **Final Assessment**:
- - If CRITICAL issues: "X critical issue(s) found. Fix before archiving."
- - If only warnings: "No critical issues. Y warning(s) to consider. Ready for learn, then archive."
- - If all clear: "All checks passed. Ready for learn, then archive."
-
- Write the verification report to \`synergyspec-selfevolving/changes/<name>/verification-report.md\` so \`/synspec:learn\` can read concrete verification evidence. If you cannot write it, state that explicitly and include the report in the chat response.
+ **Final Assessment**:
+ - If CRITICAL issues: "X critical issue(s) found. Fix before archiving."
+ - If only warnings: "No critical issues. Y warning(s) to consider. Ready for learn, then archive."
+ - If all clear: "All checks passed. Ready for learn, then archive."
+
+ Write the verification report to \`synergyspec-selfevolving/changes/<name>/verification-report.md\` so \`/synspec:learn\` can read concrete verification evidence. If you cannot write it, state that explicitly and include the report in the chat response.
 
  **Verification Heuristics**
 
  - **Completeness**: Focus on objective checklist items (checkboxes, requirements list)
  - **Correctness**: Use keyword search, file path analysis, reasonable inference - don't require perfect certainty
+ - **Evidence provenance**: Prefer raw runner logs, exit JSON, JUnit XML, and coverage artifacts over report prose. Self-authored markdown summaries without durable evidence are not proof.
  - **Coherence**: Look for glaring inconsistencies, don't nitpick style
  - **False Positives**: When uncertain, prefer SUGGESTION over WARNING, WARNING over CRITICAL
  - **Actionability**: Every issue must have a specific recommendation with file/line references where applicable
@@ -536,6 +642,7 @@ export function getOpsxVerifyCommandTemplate() {
  - If only tasks.md exists: verify task completion only, skip spec/design checks
  - If tasks + specs exist: verify completeness and correctness, skip design
  - If full artifacts: verify all three dimensions
+ - If test/CI reports exist but runner evidence is missing or invalid: keep the reports as context only, mark evidence provenance unverified, and add a WARNING or CRITICAL per step 4a
  - Always note which checks were skipped and why
  - If git diff unavailable or \`synergyspec-selfevolving/specs/\` is empty: skip blast radius gracefully
 
@@ -543,11 +650,12 @@ export function getOpsxVerifyCommandTemplate() {
 
  Use clear markdown with:
  - Table for summary scorecard
- - Grouped lists for issues (CRITICAL/WARNING/SUGGESTION)
- - Code references in format: \`file.ts:123\`
- - Specific, actionable recommendations
- - Confirmation that \`synergyspec-selfevolving/changes/<name>/verification-report.md\` was written, or why it could not be written
- - If no critical issues remain: suggest \`/synspec:learn <name>\` next, then \`/synspec:archive <name>\`
+ - Evidence Provenance table with verified / unverified / absent status for test-report.md and ci-report.md
+ - Grouped lists for issues (CRITICAL/WARNING/SUGGESTION)
+ - Code references in format: \`file.ts:123\`
+ - Specific, actionable recommendations
+ - Confirmation that \`synergyspec-selfevolving/changes/<name>/verification-report.md\` was written, or why it could not be written
+ - If no critical issues remain: suggest \`/synspec:learn <name>\` next, then \`/synspec:archive <name>\`
  - No vague suggestions like "consider reviewing"`
  };
  }