synergyspec-selfevolving 2.1.4 → 2.1.6
- package/dist/commands/config.js +4 -0
- package/dist/commands/learn.js +80 -24
- package/dist/commands/self-evolution-dream.d.ts +54 -0
- package/dist/commands/self-evolution-dream.js +265 -0
- package/dist/commands/self-evolution-episode.d.ts +5 -0
- package/dist/commands/self-evolution-episode.js +160 -107
- package/dist/commands/self-evolution.js +127 -4
- package/dist/commands/workflow/status.js +38 -7
- package/dist/core/archive.js +27 -9
- package/dist/core/change-readiness.d.ts +63 -6
- package/dist/core/change-readiness.js +912 -23
- package/dist/core/completions/command-registry.js +1 -1
- package/dist/core/fitness/loss.d.ts +10 -5
- package/dist/core/fitness/loss.js +11 -4
- package/dist/core/fitness/test-metrics.d.ts +3 -0
- package/dist/core/fitness/test-metrics.js +78 -1
- package/dist/core/learn/trajectory-discovery.js +5 -0
- package/dist/core/learn.js +131 -13
- package/dist/core/migration.d.ts +6 -14
- package/dist/core/migration.js +63 -21
- package/dist/core/profiles.d.ts +1 -1
- package/dist/core/profiles.js +1 -0
- package/dist/core/runner-evidence.d.ts +53 -0
- package/dist/core/runner-evidence.js +613 -0
- package/dist/core/self-evolution/candidates.d.ts +1 -1
- package/dist/core/self-evolution/candidates.js +1 -2
- package/dist/core/self-evolution/canonical-targets.js +1 -0
- package/dist/core/self-evolution/dream.d.ts +132 -0
- package/dist/core/self-evolution/dream.js +1093 -0
- package/dist/core/self-evolution/episode-orchestrator.d.ts +7 -0
- package/dist/core/self-evolution/episode-orchestrator.js +162 -12
- package/dist/core/self-evolution/episode-store.d.ts +21 -0
- package/dist/core/self-evolution/episode-store.js +16 -3
- package/dist/core/self-evolution/evolving-agent.js +8 -0
- package/dist/core/self-evolution/host-harness.d.ts +46 -12
- package/dist/core/self-evolution/host-harness.js +198 -55
- package/dist/core/self-evolution/index.d.ts +1 -0
- package/dist/core/self-evolution/index.js +1 -0
- package/dist/core/self-evolution/policy/policy-store.d.ts +19 -2
- package/dist/core/self-evolution/policy/policy-store.js +85 -0
- package/dist/core/self-evolution/promote.d.ts +7 -5
- package/dist/core/self-evolution/promote.js +111 -19
- package/dist/core/self-evolution/reward-agent.js +11 -9
- package/dist/core/self-evolution/reward-aggregator.js +2 -2
- package/dist/core/shared/skill-generation.d.ts +37 -0
- package/dist/core/shared/skill-generation.js +91 -0
- package/dist/core/templates/skill-templates.d.ts +1 -0
- package/dist/core/templates/skill-templates.js +1 -0
- package/dist/core/templates/workflow-manifest.js +2 -0
- package/dist/core/templates/workflows/archive-change.js +76 -39
- package/dist/core/templates/workflows/ci.js +47 -1
- package/dist/core/templates/workflows/dream.d.ts +10 -0
- package/dist/core/templates/workflows/dream.js +123 -0
- package/dist/core/templates/workflows/gen-tests.js +9 -3
- package/dist/core/templates/workflows/learn.js +11 -7
- package/dist/core/templates/workflows/run-tests.js +99 -4
- package/dist/core/templates/workflows/self-evolving.js +118 -115
- package/dist/core/templates/workflows/verify-change.js +130 -22
- package/dist/core/trajectory/adapters/codex.js +87 -29
- package/dist/core/trajectory/adapters/opencode.js +69 -23
- package/dist/core/trajectory/facts.d.ts +1 -1
- package/dist/core/trajectory/facts.js +23 -5
- package/dist/core/trajectory/registry.d.ts +16 -2
- package/dist/core/trajectory/registry.js +104 -29
- package/dist/core/trajectory/source.d.ts +27 -4
- package/dist/dashboard/react-client.js +4 -4
- package/dist/utils/change-utils.d.ts +2 -0
- package/dist/utils/change-utils.js +53 -2
- package/package.json +99 -99
- package/schemas/spec-driven/templates/design.md +6 -0
- package/scripts/nl2repo_synergyspec-selfevolving_wrapper.py +170 -0
|
@@ -1,122 +1,125 @@
|
|
|
1
|
-
const INSTRUCTIONS_BODY = `**Role**
|
|
2
|
-
|
|
3
|
-
You are the RUNNER for a completed SynergySpec-SelfEvolving change. In loop v2 (self-evolution as in-context RL) you do NOT grade and you do NOT edit canonical files — the orchestrator CODE-SPAWNS the 奖励智能体 REWARD AGENT (judge: 算分 reward(主臂)&reward(基线臂), advantage = reward(主臂) − reward(基线臂), 文本梯度 textual gradient — it never edits and 弃权 abstains when there is no nameable gap) and the 演进智能体 EVOLVING AGENT (optimizer.step: ONE bounded edit ≤L onto the 策略 POLICY — it never scores), plus an optional CRITIC AGENT(基线智能体 baseline agent)that reruns the last episode's policy vN on the SAME change. Your only job is to TRIGGER the episode via the CLI and RELAY the machine-written result. Read ONLY the on-disk evidence (episode.json, diagnosis.json, the episode JSON output) — never an actor's in-conversation self-report, and never re-judge what the agents decided.
|
|
4
|
-
|
|
5
|
-
**The boundary (read this first)**
|
|
6
|
-
|
|
7
|
-
- The skill itself NEVER grades. Scoring — reward(主臂), reward(基线臂), advantage, the 文本梯度 textual gradient — is computed by the CODE-SPAWNED 奖励智能体 REWARD AGENT, never by you.
|
|
8
|
-
- The skill itself NEVER edits canonical files. The ONE bounded edit (≤L) onto the 策略 POLICY (the design template — the 主智能体 MAIN AGENT's "weights") is authored by the CODE-SPAWNED 演进智能体 EVOLVING AGENT, never by you. Do NOT hand-edit any schema/template/prompt file from this skill.
|
|
9
|
-
- You trigger ONE CLI command (the episode orchestrator), then READ and RELAY its result. That is the whole job.
|
|
10
|
-
|
|
11
|
-
**Input contract**
|
|
12
|
-
|
|
13
|
-
Parse these handles from the spawning prompt:
|
|
14
|
-
- **Change name** (required). If the change name is missing or does not resolve via \`synergyspec-selfevolving list --json\`, stop and report the error — do NOT prompt the user (you may have no user channel).
|
|
1
|
+
const INSTRUCTIONS_BODY = `**Role**
|
|
2
|
+
|
|
3
|
+
You are the RUNNER for a completed SynergySpec-SelfEvolving change. In loop v2 (self-evolution as in-context RL) you do NOT grade and you do NOT edit canonical files — the orchestrator CODE-SPAWNS the 奖励智能体 REWARD AGENT (judge: 算分 reward(主臂)&reward(基线臂), advantage = reward(主臂) − reward(基线臂), 文本梯度 textual gradient — it never edits and 弃权 abstains when there is no nameable gap) and the 演进智能体 EVOLVING AGENT (optimizer.step: ONE bounded edit ≤L onto the 策略 POLICY — it never scores), plus an optional CRITIC AGENT(基线智能体 baseline agent)that reruns the last episode's policy vN on the SAME change. Your only job is to TRIGGER the episode via the CLI and RELAY the machine-written result. Read ONLY the on-disk evidence (episode.json, diagnosis.json, the episode JSON output) — never an actor's in-conversation self-report, and never re-judge what the agents decided.
|
|
4
|
+
|
|
5
|
+
**The boundary (read this first)**
|
|
6
|
+
|
|
7
|
+
- The skill itself NEVER grades. Scoring — reward(主臂), reward(基线臂), advantage, the 文本梯度 textual gradient — is computed by the CODE-SPAWNED 奖励智能体 REWARD AGENT, never by you.
|
|
8
|
+
- The skill itself NEVER edits canonical files. The ONE bounded edit (≤L) onto the 策略 POLICY (the design template — the 主智能体 MAIN AGENT's "weights") is authored by the CODE-SPAWNED 演进智能体 EVOLVING AGENT, never by you. Do NOT hand-edit any schema/template/prompt file from this skill.
|
|
9
|
+
- You trigger ONE CLI command (the episode orchestrator), then READ and RELAY its result. That is the whole job.
|
|
10
|
+
|
|
11
|
+
**Input contract**
|
|
12
|
+
|
|
13
|
+
Parse these handles from the spawning prompt:
|
|
14
|
+
- **Change name** (required). If the change name is missing or does not resolve via \`synergyspec-selfevolving list --json\`, stop and report the error — do NOT prompt the user (you may have no user channel).
|
|
15
15
|
- **Absolute project root.** Run every CLI command from it.
|
|
16
|
-
- **Harness**: \`claude\` | \`codex\` | \`opencode\` | \`unknown\`. If a harness was provided
|
|
16
|
+
- **Harness**: \`claude\` | \`codex\` | \`opencode\` | \`unknown\`. If a concrete harness was provided, pass \`--harness <harness>\` to the CLI invocation below. If the prompt says \`unknown\` but this runner is clearly executing inside Codex, Claude Code, or OpenCode, recover the current host and pass that concrete harness. Omit \`--harness\` only when both the prompt and the current runner host are genuinely unidentified; never set \`SYNERGYSPEC_SELFEVOLVING_HOST_HARNESS=unknown\`.
|
|
17
|
+
- **Force-new**: \`yes\` | \`no\` (optional; default \`no\`). If \`yes\`, append \`--rerun\` so a closed matching episode is not reused.
|
|
18
|
+
- **Isolation**: \`fresh-context subagent\` | \`inline fallback (degraded)\` (optional). If supplied, copy it verbatim into the verdict; otherwise infer from whether this skill is running in a spawned subagent or inline fallback.
|
|
17
19
|
- **Session-id / transcript path** (optional). When the spawning prompt supplied a session-id or transcript path, pass \`--session-id <id>\` / \`--transcript <path>\` to the \`episode\` command so the 主智能体 MAIN AGENT arm's trajectory discovery does not depend on the change-window fallback.
|
|
18
|
-
|
|
19
|
-
**Recursion guard**
|
|
20
|
-
|
|
21
|
-
Execute every step inline in THIS session. NEVER use the Task tool from this skill, and NEVER invoke synergyspec-selfevolving-learn or synergyspec-selfevolving-self-evolving — you ARE the runner. The 奖励智能体 + 演进智能体 (+ optional 基线智能体) are spawned by the CLI orchestrator in their own contexts; do not spawn them yourself.
|
|
22
|
-
|
|
23
|
-
**Purpose**
|
|
24
|
-
|
|
25
|
-
This is the review-and-learn step after \`/synspec:apply\` and \`/synspec:verify\`, and it is the ENTRANCE to one self-evolution EPISODE. You trigger the loop-v2 orchestrator with a single CLI command. The orchestrator runs ONE episode in a strict, durably-persisted order:
|
|
26
|
-
|
|
27
|
-
1. Records the 主智能体 MAIN AGENT (frozen actor, policy vN+1) arm for this change.
|
|
28
|
-
2. Optionally runs the CRITIC AGENT(基线智能体 baseline agent)— reruns the LAST episode's policy vN on the SAME change (skipped when the 单一血统 single lineage has < 2 versions or the last action was refused).
|
|
29
|
-
3. Runs the 奖励智能体 REWARD AGENT — computes reward(主臂)&reward(基线臂), advantage = reward(主臂) − reward(基线臂), and the 文本梯度 textual gradient; writes diagnosis.json.
|
|
30
|
-
4. DECIDES on the main arm's edits: 弃权 abstained (no nameable gap) ⇒ skip; bad advantage (< threshold) ⇒ ROLLBACK the 策略 POLICY to the prior good version and append a 否决缓冲 reject-buffer entry; otherwise KEEP.
|
|
31
|
-
5. Runs the 演进智能体 EVOLVING AGENT (optimizer.step) — ONE bounded edit (≤L) onto the 策略 POLICY, or refuses, reading the reject-buffer fresh from disk.
|
|
32
|
-
6. Advances the 版本账本 ledger to the new 策略 POLICY version.
|
|
33
|
-
|
|
34
|
-
Everything in steps 1–6 is CODE. You do not perform any of it. You issue the command and relay what it wrote.
|
|
35
|
-
|
|
36
|
-
**The episode commits.** The \`episode\` command always runs the full loop — the orchestrator may roll back / keep / evolve as above; it has no read-only mode. If a read-only look (no rollback, no evolution) is wanted, that is NOT this skill's job: the caller should use plain \`learn <change>\` (no \`--apply\`) or the read-only \`self-evolution policy show\` view instead. Do NOT invent a preview flag — there is none.
|
|
37
|
-
|
|
38
|
-
**Steps**
|
|
39
|
-
|
|
40
|
-
1. **Confirm the change resolves**
|
|
41
|
-
|
|
42
|
-
Run:
|
|
43
|
-
\`\`\`bash
|
|
44
|
-
synergyspec-selfevolving status --change "<name>" --json
|
|
45
|
-
\`\`\`
|
|
46
|
-
If the change does not resolve, stop and report the error (do NOT prompt — you may have no user channel). Note from the status output whether apply/verify evidence is present; if it is incomplete, flag the missing evidence in your verdict — the orchestrator's 奖励智能体 REWARD AGENT will 弃权 abstain rather than score on absent evidence.
|
|
47
|
-
|
|
48
|
-
2. **Trigger the episode (the orchestrator does the work)**
|
|
49
|
-
|
|
50
|
-
Run exactly ONE command — the loop-v2 orchestrator. It CODE-SPAWNS the 奖励智能体 REWARD AGENT + 演进智能体 EVOLVING AGENT (+ optional CRITIC AGENT(基线智能体)); you spawn nothing:
|
|
51
|
-
\`\`\`bash
|
|
52
|
-
synergyspec-selfevolving self-evolution episode --change "<change>" --
|
|
20
|
+
|
|
21
|
+
**Recursion guard**
|
|
22
|
+
|
|
23
|
+
Execute every step inline in THIS session. NEVER use the Task tool from this skill, and NEVER invoke synergyspec-selfevolving-learn or synergyspec-selfevolving-self-evolving — you ARE the runner. The 奖励智能体 + 演进智能体 (+ optional 基线智能体) are spawned by the CLI orchestrator in their own contexts; do not spawn them yourself.
|
|
24
|
+
|
|
25
|
+
**Purpose**
|
|
26
|
+
|
|
27
|
+
This is the review-and-learn step after \`/synspec:apply\` and \`/synspec:verify\`, and it is the ENTRANCE to one self-evolution EPISODE. You trigger the loop-v2 orchestrator with a single CLI command. The orchestrator runs ONE episode in a strict, durably-persisted order:
|
|
28
|
+
|
|
29
|
+
1. Records the 主智能体 MAIN AGENT (frozen actor, policy vN+1) arm for this change.
|
|
30
|
+
2. Optionally runs the CRITIC AGENT(基线智能体 baseline agent)— reruns the LAST episode's policy vN on the SAME change (skipped when the 单一血统 single lineage has < 2 versions or the last action was refused).
|
|
31
|
+
3. Runs the 奖励智能体 REWARD AGENT — computes reward(主臂)&reward(基线臂), advantage = reward(主臂) − reward(基线臂), and the 文本梯度 textual gradient; writes diagnosis.json.
|
|
32
|
+
4. DECIDES on the main arm's edits: 弃权 abstained (no nameable gap) ⇒ skip; bad advantage (< threshold) ⇒ ROLLBACK the 策略 POLICY to the prior good version and append a 否决缓冲 reject-buffer entry; otherwise KEEP.
|
|
33
|
+
5. Runs the 演进智能体 EVOLVING AGENT (optimizer.step) — ONE bounded edit (≤L) onto the 策略 POLICY, or refuses, reading the reject-buffer fresh from disk.
|
|
34
|
+
6. Advances the 版本账本 ledger to the new 策略 POLICY version.
|
|
35
|
+
|
|
36
|
+
Everything in steps 1–6 is CODE. You do not perform any of it. You issue the command and relay what it wrote.
|
|
37
|
+
|
|
38
|
+
**The episode commits.** The \`episode\` command always runs the full loop — the orchestrator may roll back / keep / evolve as above; it has no read-only mode. If a read-only look (no rollback, no evolution) is wanted, that is NOT this skill's job: the caller should use plain \`learn <change>\` (no \`--apply\`) or the read-only \`self-evolution policy show\` view instead. Do NOT invent a preview flag — there is none.
|
|
39
|
+
|
|
40
|
+
**Steps**
|
|
41
|
+
|
|
42
|
+
1. **Confirm the change resolves**
|
|
43
|
+
|
|
44
|
+
Run:
|
|
45
|
+
\`\`\`bash
|
|
46
|
+
synergyspec-selfevolving status --change "<name>" --json
|
|
47
|
+
\`\`\`
|
|
48
|
+
If the change does not resolve, stop and report the error (do NOT prompt — you may have no user channel). Note from the status output whether apply/verify evidence is present; if it is incomplete, flag the missing evidence in your verdict — the orchestrator's 奖励智能体 REWARD AGENT will 弃权 abstain rather than score on absent evidence.
|
|
49
|
+
|
|
50
|
+
2. **Trigger the episode (the orchestrator does the work)**
|
|
51
|
+
|
|
52
|
+
Run exactly ONE command — the loop-v2 orchestrator. It CODE-SPAWNS the 奖励智能体 REWARD AGENT + 演进智能体 EVOLVING AGENT (+ optional CRITIC AGENT(基线智能体)); you spawn nothing:
|
|
53
|
+
\`\`\`bash
|
|
54
|
+
synergyspec-selfevolving self-evolution episode --change "<change>" --json
|
|
53
55
|
\`\`\`
|
|
54
56
|
- Append \`--session-id <id>\` and/or \`--transcript <path>\` ONLY when the spawning prompt supplied them.
|
|
55-119
|
-
[removed lines 55-119: text truncated in this rendering. Surviving fragments: - \`.synergyspec-selfevolving/self-evolution/episodes/<episodeId>/ (line 63), - ** (lines 67-69 and 83-87), - Use \` (line 116); the remaining lines are empty or bare list markers]
|
|
57
|
+
- Append \`--harness <harness>\` when the spawning prompt supplied \`claude\`, \`codex\`, or \`opencode\`, or when the prompt supplied \`unknown\` but this runner can identify the current host as Codex, Claude Code, or OpenCode. Never append \`--harness unknown\`.
|
|
58
|
+
- Append \`--rerun\` ONLY when the spawning prompt supplied \`Force-new: yes\`.
|
|
59
|
+
|
|
60
|
+
Do NOT grade, score, or author any edit yourself, and do NOT run \`evolve-from-edits\`, \`auto-evolve\`, or \`--agent\` / \`claude -p\` — those are not part of loop v2's host-facing path. The episode command IS the loop.
|
|
61
|
+
|
|
62
|
+
3. **Read the machine-written result**
|
|
63
|
+
|
|
64
|
+
The \`episode\` command prints the episode result as JSON (and persists it). Read it from the JSON output, and cross-check the on-disk record:
|
|
65
|
+
- \`.synergyspec-selfevolving/self-evolution/episodes/<episodeId>/episode.json\` — the episode stage and policy versions.
|
|
66
|
+
- \`.synergyspec-selfevolving/self-evolution/episodes/<episodeId>/diagnosis.json\` — the 奖励智能体's reward(主臂), reward(基线臂), advantage, 文本梯度, and any abstain reason.
|
|
67
|
+
|
|
68
|
+
Take these fields straight from the result — NEVER recompute them:
|
|
69
|
+
- **advantage** = reward(主臂) − reward(基线臂) (null when the baseline arm was skipped or the reward agent 弃权 abstained).
|
|
70
|
+
- **decision**: \`rolled-back\` | \`kept\` | \`abstained\`.
|
|
71
|
+
- **evolution kind**: the 演进智能体 outcome — \`evolved\` | \`refused\` | \`not-spawned\` (null when evolution was skipped, e.g. on 弃权).
|
|
72
|
+
- **new 策略 POLICY version**: the 版本账本 ledger head AFTER the episode (post-rollback / post-evolve).
|
|
73
|
+
|
|
74
|
+
4. **Consult the 版本账本 ledger for context (read-only, optional)**
|
|
75
|
+
|
|
76
|
+
To explain the result against prior episodes, run the READ-ONLY view:
|
|
77
|
+
\`\`\`bash
|
|
78
|
+
synergyspec-selfevolving self-evolution policy show --target <targetId> --json
|
|
79
|
+
\`\`\`
|
|
80
|
+
This shows the 版本账本 ledger (prior 策略 POLICY versions for the target, with the current head) and the 否决缓冲 reject-buffer (rolled-back directions to avoid). Use it only to contextualize the verdict — it changes nothing.
|
|
81
|
+
|
|
82
|
+
5. **Classify the outcome (do not re-judge it)**
|
|
83
|
+
|
|
84
|
+
Map the machine result to a verdict, classifying any no-op honestly:
|
|
85
|
+
- **evolved** — the 演进智能体 wrote ONE bounded edit onto the 策略 POLICY; report the new version and the rollback command.
|
|
86
|
+
- **kept (no evolution) / abstained** — a verified-green or no-nameable-gap run where nothing was promoted is the CORRECT outcome (产物即弃), not a missed evolution. State the reason from diagnosis.json.
|
|
87
|
+
- **rolled-back** — the edit's advantage fell below threshold; the 策略 POLICY was restored to the prior good version and a 否决缓冲 reject-buffer entry recorded the lost direction. This is the loop working, not a failure.
|
|
88
|
+
- **busy-in-flight** — the episode command returned a clean deferral ('skipped — another in-flight episode holds the target') because another episode for the SAME 策略 POLICY target is already running and holds the in-flight lock. This is a TRANSIENT, self-healing concurrency deferral, NOT a DEFECT and NOT an \`error-...\`. Report \`Outcome: busy-in-flight\` (advantage null, no episode id, 策略 POLICY version unchanged), recommend WAIT-AND-RETRY after the lock clears (it self-heals; the CLI alone re-acquires the target once the holder finishes or the 60-minute stale window elapses), and STOP — do NOT hand-delete \`in-flight.json\` and do NOT call the lock 'stale'. Staleness is purely the 60-minute time window; a lock whose owner episode is at stage \`evolving\` or \`kept\` is a LIVE episode, not a stale one — deleting it would corrupt a running episode.
|
|
89
|
+
- **SAFE refusal** (evidence missing/red, target frozen, gate refused on real grounds) is expected; state the reason and move on.
|
|
90
|
+
- **DEFECT** (the orchestrator COULD NOT act for a reason that is NOT about evidence / freezing / scope — e.g. an unbindable target that persists) — surface it as an unresolved issue; do NOT hand-edit a canonical file to work around it. \`synergyspec-selfevolving status\` prints the machine-written \`Evolution:\` outcome — do not contradict it in free text.
|
|
91
|
+
|
|
92
|
+
6. **Emit the Runner Verdict (always — the final step)**
|
|
93
|
+
|
|
94
|
+
Your session's final message MUST end with the \`## Episode Verdict\` block defined in the Output Format below. Copy every field from the machine-written result (the \`episode\` JSON output / episode.json + diagnosis.json) — never re-judge it. Use \`not-run\` when the episode command was never invoked (change did not resolve); state the reason on the verdict lines.
|
|
95
|
+
|
|
96
|
+
**Output Format**
|
|
97
|
+
|
|
98
|
+
The session's final message MUST end with exactly this block shape:
|
|
99
|
+
|
|
100
|
+
\`\`\`
|
|
101
|
+
## Episode Verdict: <change-name>
|
|
102
|
+
- Outcome: evolved | kept | rolled-back | abstained | not-run | busy-in-flight | refused-static-gate | refused-unverified-evidence | refused-target-frozen | error-<...>
|
|
103
|
+
- Episode id: <episodeId, or none>
|
|
104
|
+
- Decision: rolled-back | kept | abstained
|
|
105
|
+
- Evolution: evolved | refused | not-spawned | none
|
|
106
|
+
- Advantage: <reward(主臂) − reward(基线臂), or null (baseline skipped / 弃权 abstained)>
|
|
107
|
+
- 策略 POLICY version: <new ledger head version, or unchanged>
|
|
108
|
+
- Evolved target: <canonical target id, or none>
|
|
109
|
+
- Canonical file(s) changed: <paths, or none>
|
|
110
|
+
- Rollback: synergyspec-selfevolving self-evolution promote <candidateId> --rollback
|
|
111
|
+
- Loss vs baseline: <loss / baseline, or unmeasured>
|
|
112
|
+
- Defects to surface: <genuine orchestrator errors that BLOCKED the episode — NOT evidence/red-test/frozen-target/scope refusals, and NOT busy-in-flight — or none>
|
|
113
|
+
- Key lessons: <up to 3 one-line bullets from diagnosis.json>
|
|
114
|
+
- Isolation: fresh-context subagent | inline fallback (degraded)
|
|
115
|
+
\`\`\`
|
|
116
|
+
|
|
117
|
+
- EVERY field MUST be copied from the machine-written result (the \`episode\` JSON output / episode.json + diagnosis.json) — never re-judged. The skill neither grades nor edits; it only relays.
|
|
118
|
+
- Use \`not-run\` when the episode command was never invoked (the change did not resolve); state the reason on the verdict lines.
|
|
119
|
+
- Use \`busy-in-flight\` when the episode command returned the clean concurrency deferral (another in-flight episode holds the same 策略 POLICY target): advantage is null, episode id is none, 策略 POLICY version is unchanged. It is TRANSIENT and self-healing (retry after the lock clears / the 60-min stale window) — it is NOT a DEFECT, do not list it under Defects to surface, and never advise deleting \`in-flight.json\`.
|
|
120
|
+
- When the episode did NOT start (Episode id is none — any not-run / busy-in-flight / error-* outcome), write \`none\` for Evolved target and Canonical file(s) changed, report Decision/Advantage as none/null, and leave 策略 POLICY version unchanged. The change's CONFIGURED target id is context only — do NOT copy it into the Evolved target field on a non-run verdict.
|
|
121
|
+
- A \`kept\` / \`abstained\` outcome on a verified-green run is the CORRECT no-op, not a missed evolution — say so plainly rather than hedging.
|
|
122
|
+
- Copy the supplied \`Isolation:\` value verbatim when present. If it was not supplied, report \`Isolation: fresh-context subagent\` when you were spawned as a subagent, or \`Isolation: inline fallback (degraded)\` when this skill is running inline in the spawning session.`;
|
|
120
123
|
export function getSelfEvolvingSkillTemplate() {
|
|
121
124
|
return {
|
|
122
125
|
name: 'synergyspec-selfevolving-self-evolving',
|
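The flag rules in the runner template above (pass a concrete `--harness`, never `--harness unknown`, append `--rerun` only for `Force-new: yes`, append session handles only when supplied) can be sketched as a small argument builder. This is a minimal illustration, not code from the package: `buildEpisodeArgs`, the fields of its `input` object, and `detectedHost` are hypothetical names; only the CLI flags themselves come from the template text.

```javascript
// Hypothetical helper: assembles argv for the `self-evolution episode`
// command following the flag rules stated in the template.
// Function and field names are illustrative assumptions.
const CONCRETE_HARNESSES = new Set(['claude', 'codex', 'opencode']);

function buildEpisodeArgs(input) {
  const { change, harness, detectedHost, forceNew, sessionId, transcript } = input;
  if (!change) {
    // The template says to stop and report, never to prompt the user.
    throw new Error('change name is required');
  }
  const args = ['self-evolution', 'episode', '--change', change, '--json'];

  // Prefer the harness named in the prompt; otherwise fall back to a host
  // the runner identified itself. `unknown` is never passed through.
  let resolved = null;
  if (CONCRETE_HARNESSES.has(harness)) {
    resolved = harness;
  } else if (CONCRETE_HARNESSES.has(detectedHost)) {
    resolved = detectedHost;
  }
  if (resolved) args.push('--harness', resolved);

  // `--rerun` only when the prompt supplied `Force-new: yes`.
  if (forceNew === 'yes') args.push('--rerun');

  // Session handles are appended only when they were supplied.
  if (sessionId) args.push('--session-id', sessionId);
  if (transcript) args.push('--transcript', transcript);
  return args;
}
```

The returned array would be handed to a process spawner together with the `synergyspec-selfevolving` binary; the sketch deliberately stops short of running anything.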
|
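Step 4 of the episode order described above (abstain, roll back on a bad advantage, otherwise keep) is performed by the orchestrator, never by the runner. Purely to make the rule concrete, here is a sketch under stated assumptions: `decideEpisode`, the `diagnosis` field names, and the threshold value are all hypothetical, and treating a skipped baseline arm as `kept` is this sketch's reading of "otherwise KEEP", not a confirmed behavior.

```javascript
// Hypothetical sketch of the orchestrator-side decision rule.
// Field names (`abstained`, `rewardMain`, `rewardBaseline`) are assumptions.
function decideEpisode(diagnosis, threshold) {
  // The reward agent abstained (no nameable gap): nothing to decide on.
  if (diagnosis.abstained) {
    return { decision: 'abstained', advantage: null };
  }
  // Baseline arm skipped: advantage is null, so it cannot trigger a rollback.
  if (diagnosis.rewardBaseline == null) {
    return { decision: 'kept', advantage: null };
  }
  // advantage = reward(main arm) - reward(baseline arm)
  const advantage = diagnosis.rewardMain - diagnosis.rewardBaseline;
  const decision = advantage < threshold ? 'rolled-back' : 'kept';
  return { decision, advantage };
}
```

A runner following the template would read `decision` and `advantage` from the episode JSON rather than call anything like this itself.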
@@ -43,6 +43,45 @@ export function getVerifyChangeSkillTemplate() {
|
|
|
43
43
|
|
|
44
44
|
Each dimension can have CRITICAL, WARNING, or SUGGESTION issues.
|
|
45
45
|
|
|
46
|
+
4a. **Load and validate durable runner evidence**
|
|
47
|
+
|
|
48
|
+
Treat \`test-report.md\`, \`ci-report.md\`, and any chat-written summaries as
|
|
49
|
+
claims until their runner evidence is validated. Do not mark a requirement,
|
|
50
|
+
test suite, CI run, or PBT result as verified from a self-authored markdown
|
|
51
|
+
summary alone.
|
|
52
|
+
|
|
53
|
+
Check these files if they exist:
|
|
54
|
+
- \`synergyspec-selfevolving/changes/<name>/test-report.md\`
|
|
55
|
+
- \`synergyspec-selfevolving/ci-report.md\`
|
|
56
|
+
- \`synergyspec-selfevolving/changes/<name>/pbt-regressions.md\`
|
|
57
|
+
|
|
58
|
+
For each report that contains test or CI claims:
|
|
59
|
+
- Locate its \`### Runner Evidence\` section.
|
|
60
|
+
- Extract raw stdout/stderr log paths and the \`*-exit.json\` path.
|
|
61
|
+
- Verify every referenced evidence path exists on disk and is inside the project.
|
|
62
|
+
- Parse each exit JSON and require: \`command\`, \`cwd\`, \`startedAt\` or \`timestamp\`,
|
|
63
|
+
\`exitCode\`, and raw log paths.
|
|
64
|
+
- If optional JUnit or coverage paths are listed, verify they exist unless the
|
|
65
|
+
value is explicitly \`null\`, \`N/A\`, or empty.
|
|
66
|
+
- Cross-check the markdown verdict against \`exitCode\`: non-zero exit means
|
|
67
|
+
the run failed even if the markdown says PASS.
|
|
68
|
+
- Compare \`runner-exit.json.workspaceIdentity\` to the current root before
|
|
69
|
+
trusting the report: \`cwd\` must still be this project, recorded
|
|
70
|
+
\`pyproject.toml [project].name\` and hash must match the current
|
|
71
|
+
\`pyproject.toml\`, and recorded \`package.json\` name/hash must match the
|
|
72
|
+
current \`package.json\` when those files exist. A mismatch means the report
|
|
73
|
+
proves an older or different workspace, not the current change.
|
|
74
|
+
|
|
75
|
+
Evidence verdicts:
|
|
76
|
+
- **verified**: raw logs exist, exit JSON parses, required provenance fields exist, and verdict matches exit code.
|
|
77
|
+
- **unverified**: report exists but lacks runner evidence, has missing files, malformed JSON, or mismatched verdicts.
|
|
78
|
+
- **absent**: no report or evidence file exists.
|
|
79
|
+
|
|
80
|
+
Missing or unverified runner evidence is at least a WARNING. If the change is
|
|
81
|
+
otherwise claiming "all tests passed", "all requirements covered", or "ready
|
|
82
|
+
to archive" based on that report, promote it to CRITICAL until durable
|
|
83
|
+
evidence is available.
|
|
84
|
+
|
|
46
85
|
5. **Verify Completeness**
|
|
47
86
|
|
|
48
87
|
**Task Completion**:
|
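The checks in step 4a above (evidence paths exist inside the project, exit JSON parses and carries the required fields, markdown verdict agrees with `exitCode`) can be sketched as one classifier. This is an illustration only: `classifyEvidence`, the `report` object shape, and the `stdoutPath` / `stderrPath` field names are assumptions; the template names the required fields `command`, `cwd`, `startedAt` or `timestamp`, and `exitCode`, but not how the log paths are keyed.

```javascript
const fs = require('node:fs');
const path = require('node:path');

const REQUIRED_FIELDS = ['command', 'cwd', 'exitCode'];
const defaultIo = { exists: fs.existsSync, read: (p) => fs.readFileSync(p, 'utf8') };

// Hypothetical classifier returning verified / unverified / absent.
// `io` is injectable so the logic can be exercised without touching disk.
function classifyEvidence(report, io = defaultIo) {
  const unverified = (reason) => ({ status: 'unverified', reason });
  if (!report.reportExists) return { status: 'absent', reason: 'no report' };
  if (!report.exitJsonPath) return unverified('no runner evidence listed');

  const root = path.resolve(report.projectRoot);
  // Resolve a path and accept it only if it stays inside the project and exists.
  const resolveInside = (p) => {
    const abs = path.resolve(root, p);
    const rel = path.relative(root, abs);
    const escapes = rel === '' || rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel);
    return !escapes && io.exists(abs) ? abs : null;
  };

  const exitPath = resolveInside(report.exitJsonPath);
  if (!exitPath) return unverified('exit JSON missing or outside project');

  let exit;
  try {
    exit = JSON.parse(io.read(exitPath));
  } catch (err) {
    return unverified('exit JSON is malformed');
  }
  if (exit === null || typeof exit !== 'object') return unverified('exit JSON is malformed');

  const missing = REQUIRED_FIELDS.filter((key) => exit[key] === undefined);
  if (!exit.startedAt && !exit.timestamp) missing.push('startedAt|timestamp');
  if (missing.length > 0) return unverified('missing fields: ' + missing.join(', '));

  // Assumed keys for the raw log paths.
  const logs = [exit.stdoutPath, exit.stderrPath];
  if (logs.some((p) => !p || !resolveInside(p))) return unverified('raw log missing');

  // A non-zero exit means the run failed even if the markdown says PASS.
  const ranGreen = exit.exitCode === 0;
  if ((report.markdownVerdict === 'PASS') !== ranGreen) {
    return unverified('verdict contradicts exitCode');
  }
  return { status: 'verified', reason: 'verdict matches exit code' };
}
```

Note that a report which honestly records a failing run still classifies as `verified` here: the status describes whether the evidence backs the claim, not whether the tests passed.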
|
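The workspace-identity comparison in step 4a above says a report only counts when its recorded `cwd` and manifest name/hash still match the current workspace. A rough sketch of the `package.json` half of that check follows. The shape of `workspaceIdentity` used here (`{ cwd, packageJson: { name, hash } }`), the choice of SHA-256, and the function names are all assumptions; the diff does not show how the package actually records or hashes these values, and the `pyproject.toml` comparison is left out.

```javascript
const crypto = require('node:crypto');

// Assumed hash: SHA-256 over the raw manifest text, hex-encoded.
function sha256(text) {
  return crypto.createHash('sha256').update(text).digest('hex');
}

// Hypothetical check: does the recorded identity still describe the
// workspace we are verifying? `current.packageJsonText` is the raw text of
// the current package.json, or undefined when the file does not exist.
function sameWorkspace(recorded, current) {
  // `cwd` must still be this project.
  if (recorded.cwd !== current.cwd) return false;

  if (current.packageJsonText !== undefined) {
    const pkg = JSON.parse(current.packageJsonText);
    const rec = recorded.packageJson;
    if (!rec) return false;
    if (rec.name !== pkg.name) return false;
    if (rec.hash !== sha256(current.packageJsonText)) return false;
  }
  return true;
}
```

A `false` result corresponds to the template's conclusion that the report proves an older or different workspace, so the evidence would be marked unverified.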
@@ -114,6 +153,18 @@ export function getVerifyChangeSkillTemplate() {
|
|
|
114
153
|
| Coherence | Followed/Issues |
|
|
115
154
|
\`\`\`
|
|
116
155
|
|
|
156
|
+
**Evidence Provenance**:
|
|
157
|
+
\`\`\`markdown
|
|
158
|
+
### Evidence Provenance
|
|
159
|
+
| Source | Status | Exit JSON | Raw Logs | Notes |
|
|
160
|
+
|--------|--------|-----------|----------|-------|
|
|
161
|
+
| test-report.md | verified / unverified / absent | \`...\` | stdout/stderr paths | <reason> |
|
|
162
|
+
| ci-report.md | verified / unverified / absent | \`...\` | stdout/stderr paths | <reason> |
|
|
163
|
+
\`\`\`
|
|
164
|
+
|
|
165
|
+
Only count test and CI claims as verification evidence when this table marks
|
|
166
|
+
the corresponding source \`verified\`.
|
|
167
|
+
|
|
117
168
|
**Issues by Priority**:
|
|
118
169
|
|
|
119
170
|
1. **CRITICAL** (Must fix before archive):
|
|
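The Evidence Provenance table added in the hunk above has a fixed column layout, so it lends itself to being generated from the per-source verdicts. The sketch below is illustrative: `renderProvenanceTable`, `countableSources`, and the row field names are hypothetical, and `N/A` for empty cells is this sketch's choice; only the heading and column names are taken from the template.

```javascript
// Hypothetical renderer for the Evidence Provenance table.
// Row fields (`source`, `status`, `exitJson`, `logs`, `notes`) are assumptions.
function renderProvenanceTable(rows) {
  const header = [
    '### Evidence Provenance',
    '| Source | Status | Exit JSON | Raw Logs | Notes |',
    '|--------|--------|-----------|----------|-------|',
  ];
  const body = rows.map((row) => {
    const exitJson = row.exitJson ? '`' + row.exitJson + '`' : 'N/A';
    const logs = row.logs && row.logs.length > 0 ? row.logs.join(', ') : 'N/A';
    const notes = row.notes || '';
    return '| ' + [row.source, row.status, exitJson, logs, notes].join(' | ') + ' |';
  });
  return header.concat(body).join('\n');
}

// Only sources marked `verified` may back a test or CI claim.
function countableSources(rows) {
  return rows.filter((row) => row.status === 'verified').map((row) => row.source);
}
```

Keeping the filter next to the renderer makes it harder for a report to count a source that the table itself marks `unverified` or `absent`.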
@@ -131,17 +182,18 @@ export function getVerifyChangeSkillTemplate() {
|
|
|
131
182
|
- Minor improvements
|
|
132
183
|
- Each with specific recommendation
|
|
133
184
|
|
|
134
|
-
**Final Assessment**:
|
|
135
|
-
- If CRITICAL issues: "X critical issue(s) found. Fix before archiving."
|
|
136
|
-
- If only warnings: "No critical issues. Y warning(s) to consider. Ready for learn, then archive."
|
|
137
|
-
- If all clear: "All checks passed. Ready for learn, then archive."
|
|
138
|
-
|
|
139
|
-
Write the verification report to \`synergyspec-selfevolving/changes/<name>/verification-report.md\` so \`/synspec:learn\` can read concrete verification evidence. If you cannot write it, state that explicitly and include the report in the chat response.
|
|
185
|
+
**Final Assessment**:
|
|
186
|
+
- If CRITICAL issues: "X critical issue(s) found. Fix before archiving."
|
|
187
|
+
- If only warnings: "No critical issues. Y warning(s) to consider. Ready for learn, then archive."
|
|
188
|
+
- If all clear: "All checks passed. Ready for learn, then archive."
|
|
189
|
+
|
|
190
|
+
Write the verification report to \`synergyspec-selfevolving/changes/<name>/verification-report.md\` so \`/synspec:learn\` can read concrete verification evidence. If you cannot write it, state that explicitly and include the report in the chat response.
|
|
140
191
|
|
|
141
192
|
**Verification Heuristics**
|
|
142
193
|
|
|
143
194
|
- **Completeness**: Focus on objective checklist items (checkboxes, requirements list)
|
|
144
195
|
- **Correctness**: Use keyword search, file path analysis, reasonable inference - don't require perfect certainty
|
|
196
|
+
- **Evidence provenance**: Prefer raw runner logs, exit JSON, JUnit XML, and coverage artifacts over report prose. Self-authored markdown summaries without durable evidence are not proof.
|
|
145
197
|
- **Coherence**: Look for glaring inconsistencies, don't nitpick style
|
|
146
198
|
- **False Positives**: When uncertain, prefer SUGGESTION over WARNING, WARNING over CRITICAL
|
|
147
199
|
- **Actionability**: Every issue must have a specific recommendation with file/line references where applicable
|
|
@@ -257,6 +309,7 @@ export function getVerifyChangeSkillTemplate() {
|
|
|
257
309
|
- If only tasks.md exists: verify task completion only, skip spec/design checks
|
|
258
310
|
- If tasks + specs exist: verify completeness and correctness, skip design
|
|
259
311
|
- If full artifacts: verify all three dimensions
|
|
312
|
+
- If test/CI reports exist but runner evidence is missing or invalid: keep the reports as context only, mark evidence provenance unverified, and add a WARNING or CRITICAL per step 4a
|
|
260
313
|
- Always note which checks were skipped and why
|
|
261
314
|
- If git diff unavailable or \`synergyspec-selfevolving/specs/\` is empty: skip blast radius gracefully
|
|
262
315
|
|
|
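The escalation rule referenced in the hunk above (missing or unverified runner evidence is at least a WARNING, promoted to CRITICAL when the report also claims readiness) can be written as a small mapping. This is a sketch under assumptions: `evidenceSeverity` and its arguments are hypothetical, and detecting the claim by substring match is a simplification; the three claim phrases are quoted from step 4a.

```javascript
// Claim phrases quoted from step 4a of the verify template.
const READINESS_CLAIMS = ['all tests passed', 'all requirements covered', 'ready to archive'];

// Hypothetical mapping from an evidence status to an issue severity.
// Returns null when there is nothing to raise.
function evidenceSeverity(status, reportText) {
  if (status === 'verified') return null;
  const text = (reportText || '').toLowerCase();
  const claimsReadiness = READINESS_CLAIMS.some((claim) => text.includes(claim));
  // Unbacked readiness claims are promoted from WARNING to CRITICAL.
  return claimsReadiness ? 'CRITICAL' : 'WARNING';
}
```

For example, an `unverified` report whose summary reads "All tests passed." would map to `CRITICAL`, while an `absent` report with no such claim stays at `WARNING`.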
@@ -264,11 +317,12 @@ export function getVerifyChangeSkillTemplate() {
|
|
|
264
317
|
|
|
265
318
|
Use clear markdown with:
|
|
266
319
|
- Table for summary scorecard
|
|
267-271
|
-
[removed lines 267-271: five list items whose text is truncated in this rendering]
|
|
320
|
+
- Evidence Provenance table with verified / unverified / absent status for test-report.md and ci-report.md
|
|
321
|
+
- Grouped lists for issues (CRITICAL/WARNING/SUGGESTION)
|
|
322
|
+
- Code references in format: \`file.ts:123\`
|
|
323
|
+
- Specific, actionable recommendations
|
|
324
|
+
- Confirmation that \`synergyspec-selfevolving/changes/<name>/verification-report.md\` was written, or why it could not be written
|
|
325
|
+
- If no critical issues remain: suggest \`/synspec:learn <name>\` next, then \`/synspec:archive <name>\`
|
|
272
326
|
- No vague suggestions like "consider reviewing"`,
|
|
273
327
|
license: 'MIT',
|
|
274
328
|
compatibility: 'Requires synergyspec-selfevolving CLI.',
|
|
@@ -322,6 +376,45 @@ export function getOpsxVerifyCommandTemplate() {
|
|
|
322
376
|
|
|
323
377
|
Each dimension can have CRITICAL, WARNING, or SUGGESTION issues.
|
|
324
378
|
|
|
379
|
+
4a. **Load and validate durable runner evidence**
|
|
380
|
+
|
|
381
|
+
Treat \`test-report.md\`, \`ci-report.md\`, and any chat-written summaries as
|
|
382
|
+
claims until their runner evidence is validated. Do not mark a requirement,
|
|
383
|
+
test suite, CI run, or PBT result as verified from a self-authored markdown
|
|
384
|
+
summary alone.
|
|
385
|
+
|
|
386
|
+
Check these files if they exist:
|
|
387
|
+
- \`synergyspec-selfevolving/changes/<name>/test-report.md\`
|
|
388
|
+
- \`synergyspec-selfevolving/ci-report.md\`
|
|
389
|
+
- \`synergyspec-selfevolving/changes/<name>/pbt-regressions.md\`
|
|
390
|
+
|
|
391
|
+
For each report that contains test or CI claims:
|
|
392
|
+
- Locate its \`### Runner Evidence\` section.
|
|
393
|
+
- Extract raw stdout/stderr log paths and the \`*-exit.json\` path.
|
|
394
|
+
- Verify every referenced evidence path exists on disk and is inside the project.
|
|
395
|
+
- Parse each exit JSON and require: \`command\`, \`cwd\`, \`startedAt\` or \`timestamp\`,
|
|
396
|
+
\`exitCode\`, and raw log paths.
|
|
397
|
+
- If optional JUnit or coverage paths are listed, verify they exist unless the
|
|
398
|
+
value is explicitly \`null\`, \`N/A\`, or empty.
|
|
399
|
+
- Cross-check the markdown verdict against \`exitCode\`: non-zero exit means
|
|
400
|
+
the run failed even if the markdown says PASS.
|
|
401
|
+
- Compare \`runner-exit.json.workspaceIdentity\` to the current root before
|
|
402
|
+
trusting the report: \`cwd\` must still be this project, recorded
|
|
403
|
+
\`pyproject.toml [project].name\` and hash must match the current
|
|
404
|
+
\`pyproject.toml\`, and recorded \`package.json\` name/hash must match the
|
|
405
|
+
current \`package.json\` when those files exist. A mismatch means the report
|
|
406
|
+
proves an older or different workspace, not the current change.
|
|
407
|
+
|
|
408
|
+
Evidence verdicts:
|
|
409
|
+
- **verified**: raw logs exist, exit JSON parses, required provenance fields exist, and verdict matches exit code.
|
|
410
|
+
- **unverified**: report exists but lacks runner evidence, has missing files, malformed JSON, or mismatched verdicts.
|
|
411
|
+
- **absent**: no report or evidence file exists.
|
|
412
|
+
|
|
413
|
+
Missing or unverified runner evidence is at least a WARNING. If the change is
|
|
414
|
+
otherwise claiming "all tests passed", "all requirements covered", or "ready
|
|
415
|
+
to archive" based on that report, promote it to CRITICAL until durable
|
|
416
|
+
evidence is available.
|
|
417
|
+
|
|
325
418
|
5. **Verify Completeness**
|
|
326
419
|
|
|
327
420
|
**Task Completion**:
|
|
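Reviewer note: step 4a in the hunk above describes a validation procedure in prose. A minimal sketch of how its verdict and severity rules might be applied follows. It is illustrative only: `classifyEvidence`, `severityFor`, and the `report` shape (`exists`, `verdict`, `exitJsonText`, `logPaths`) are hypothetical names, `fileExists` is injected by the caller, and the path-containment and `workspaceIdentity` checks from step 4a are deliberately left out.

```javascript
// Hypothetical sketch of the step 4a verdict rules. Field names on `report`
// are assumptions; only the exit JSON keys come from the template text.
const REQUIRED_FIELDS = ['command', 'cwd', 'exitCode'];

function classifyEvidence(report, fileExists) {
  if (!report || !report.exists) {
    return { status: 'absent', reason: 'no report or evidence file exists' };
  }
  if (!report.exitJsonText) {
    return { status: 'unverified', reason: 'report lacks runner evidence' };
  }

  let exit;
  try {
    exit = JSON.parse(report.exitJsonText);
  } catch {
    return { status: 'unverified', reason: 'malformed exit JSON' };
  }
  if (exit === null || typeof exit !== 'object') {
    return { status: 'unverified', reason: 'exit JSON is not an object' };
  }

  // Required provenance fields: command, cwd, exitCode, and startedAt or timestamp.
  const missing = REQUIRED_FIELDS.filter((field) => exit[field] === undefined);
  if (exit.startedAt === undefined && exit.timestamp === undefined) {
    missing.push('startedAt|timestamp');
  }
  if (missing.length > 0) {
    return { status: 'unverified', reason: `missing fields: ${missing.join(', ')}` };
  }

  // Raw stdout/stderr logs must be listed and present on disk.
  const logs = report.logPaths || [];
  if (logs.length === 0 || !logs.every((p) => fileExists(p))) {
    return { status: 'unverified', reason: 'raw logs missing' };
  }

  // A non-zero exit code means failure even if the markdown says PASS.
  const claimsPass = /\bpass/i.test(report.verdict || '');
  if (claimsPass !== (exit.exitCode === 0)) {
    return { status: 'unverified', reason: 'verdict does not match exit code' };
  }
  return { status: 'verified', reason: 'ok' };
}

// Missing or unverified evidence is at least a WARNING; it becomes CRITICAL
// when the change relies on that report for a readiness claim.
function severityFor(status, reliesOnReport) {
  if (status === 'verified') return null;
  return reliesOnReport ? 'CRITICAL' : 'WARNING';
}
```

Injecting `fileExists` keeps the classification logic testable without touching the filesystem; a real caller could pass `fs.existsSync`.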
@@ -393,6 +486,18 @@ export function getOpsxVerifyCommandTemplate() {
|
|
|
393
486
|
| Coherence | Followed/Issues |
|
|
394
487
|
\`\`\`
|
|
395
488
|
|
|
489
|
+
**Evidence Provenance**:
|
|
490
|
+
\`\`\`markdown
|
|
491
|
+
### Evidence Provenance
|
|
492
|
+
| Source | Status | Exit JSON | Raw Logs | Notes |
|
|
493
|
+
|--------|--------|-----------|----------|-------|
|
|
494
|
+
| test-report.md | verified / unverified / absent | \`...\` | stdout/stderr paths | <reason> |
|
|
495
|
+
| ci-report.md | verified / unverified / absent | \`...\` | stdout/stderr paths | <reason> |
|
|
496
|
+
\`\`\`
|
|
497
|
+
|
|
498
|
+
Only count test and CI claims as verification evidence when this table marks
|
|
499
|
+
the corresponding source \`verified\`.
|
|
500
|
+
|
|
396
501
|
**Issues by Priority**:
|
|
397
502
|
|
|
398
503
|
1. **CRITICAL** (Must fix before archive):
|
|
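Reviewer note: the Evidence Provenance table added in the hunk above has a fixed column layout, so it could be generated rather than hand-written. The sketch below assumes a hypothetical row shape (`source`, `status`, `exitJson`, `rawLogs`, `notes`) and uses `N/A` as a placeholder for empty cells, which is an assumption rather than something the template mandates.

```javascript
// Hypothetical sketch: render the Evidence Provenance table from row objects.
function renderProvenanceTable(rows) {
  const tick = '`';
  const lines = [
    '### Evidence Provenance',
    '| Source | Status | Exit JSON | Raw Logs | Notes |',
    '|--------|--------|-----------|----------|-------|',
  ];
  for (const r of rows) {
    // Exit JSON path is shown as inline code; empty cells fall back to N/A.
    const exitCell = r.exitJson ? tick + r.exitJson + tick : 'N/A';
    const logsCell = r.rawLogs && r.rawLogs.length > 0 ? r.rawLogs.join(', ') : 'N/A';
    lines.push(`| ${r.source} | ${r.status} | ${exitCell} | ${logsCell} | ${r.notes} |`);
  }
  return lines.join('\n');
}

// Test and CI claims count as verification evidence only when the table
// marks the corresponding source as verified.
function countsAsEvidence(rows, source) {
  return rows.some((r) => r.source === source && r.status === 'verified');
}
```

A caller would build one row per report (for example `test-report.md` and `ci-report.md`) and consult `countsAsEvidence` before citing either report in the scorecard.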
@@ -410,17 +515,18 @@ export function getOpsxVerifyCommandTemplate() {
|
|
|
410
515
|
- Minor improvements
|
|
411
516
|
- Each with specific recommendation
|
|
412
517
|
|
|
413
|
-
**Final Assessment**:
|
|
414
|
-
- If CRITICAL issues: "X critical issue(s) found. Fix before archiving."
|
|
415
|
-
- If only warnings: "No critical issues. Y warning(s) to consider. Ready for learn, then archive."
|
|
416
|
-
- If all clear: "All checks passed. Ready for learn, then archive."
|
|
417
|
-
|
|
418
|
-
Write the verification report to \`synergyspec-selfevolving/changes/<name>/verification-report.md\` so \`/synspec:learn\` can read concrete verification evidence. If you cannot write it, state that explicitly and include the report in the chat response.
|
|
518
|
+
**Final Assessment**:
|
|
519
|
+
- If CRITICAL issues: "X critical issue(s) found. Fix before archiving."
|
|
520
|
+
- If only warnings: "No critical issues. Y warning(s) to consider. Ready for learn, then archive."
|
|
521
|
+
- If all clear: "All checks passed. Ready for learn, then archive."
|
|
522
|
+
|
|
523
|
+
Write the verification report to \`synergyspec-selfevolving/changes/<name>/verification-report.md\` so \`/synspec:learn\` can read concrete verification evidence. If you cannot write it, state that explicitly and include the report in the chat response.
|
|
419
524
|
|
|
420
525
|
**Verification Heuristics**
|
|
421
526
|
|
|
422
527
|
- **Completeness**: Focus on objective checklist items (checkboxes, requirements list)
|
|
423
528
|
- **Correctness**: Use keyword search, file path analysis, reasonable inference - don't require perfect certainty
|
|
529
|
+
- **Evidence provenance**: Prefer raw runner logs, exit JSON, JUnit XML, and coverage artifacts over report prose. Self-authored markdown summaries without durable evidence are not proof.
|
|
424
530
|
- **Coherence**: Look for glaring inconsistencies, don't nitpick style
|
|
425
531
|
- **False Positives**: When uncertain, prefer SUGGESTION over WARNING, WARNING over CRITICAL
|
|
426
532
|
- **Actionability**: Every issue must have a specific recommendation with file/line references where applicable
|
|
@@ -536,6 +642,7 @@ export function getOpsxVerifyCommandTemplate() {
|
|
|
536
642
|
- If only tasks.md exists: verify task completion only, skip spec/design checks
|
|
537
643
|
- If tasks + specs exist: verify completeness and correctness, skip design
|
|
538
644
|
- If full artifacts: verify all three dimensions
|
|
645
|
+
- If test/CI reports exist but runner evidence is missing or invalid: keep the reports as context only, mark evidence provenance unverified, and add a WARNING or CRITICAL per step 4a
|
|
539
646
|
- Always note which checks were skipped and why
|
|
540
647
|
- If git diff unavailable or \`synergyspec-selfevolving/specs/\` is empty: skip blast radius gracefully
|
|
541
648
|
|
|
@@ -543,11 +650,12 @@ export function getOpsxVerifyCommandTemplate() {
|
|
|
543
650
|
|
|
544
651
|
Use clear markdown with:
|
|
545
652
|
- Table for summary scorecard
|
|
546
|
-
-
|
|
547
|
-
-
|
|
548
|
-
-
|
|
549
|
-
-
|
|
550
|
-
-
|
|
653
|
+
- Evidence Provenance table with verified / unverified / absent status for test-report.md and ci-report.md
|
|
654
|
+
- Grouped lists for issues (CRITICAL/WARNING/SUGGESTION)
|
|
655
|
+
- Code references in format: \`file.ts:123\`
|
|
656
|
+
- Specific, actionable recommendations
|
|
657
|
+
- Confirmation that \`synergyspec-selfevolving/changes/<name>/verification-report.md\` was written, or why it could not be written
|
|
658
|
+
- If no critical issues remain: suggest \`/synspec:learn <name>\` next, then \`/synspec:archive <name>\`
|
|
551
659
|
- No vague suggestions like "consider reviewing"`
|
|
552
660
|
};
|
|
553
661
|
}
|