gm-cc 2.0.564 → 2.0.565

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -4,7 +4,7 @@
4
4
  "name": "AnEntrypoint"
5
5
  },
6
6
  "description": "State machine agent with hooks, skills, and automated git enforcement",
7
- "version": "2.0.564",
7
+ "version": "2.0.565",
8
8
  "metadata": {
9
9
  "description": "State machine agent with hooks, skills, and automated git enforcement"
10
10
  },
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "gm-cc",
3
- "version": "2.0.564",
3
+ "version": "2.0.565",
4
4
  "description": "State machine agent with hooks, skills, and automated git enforcement",
5
5
  "author": "AnEntrypoint",
6
6
  "license": "MIT",
package/plugin.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "gm",
3
- "version": "2.0.564",
3
+ "version": "2.0.565",
4
4
  "description": "State machine agent with hooks, skills, and automated git enforcement",
5
5
  "author": {
6
6
  "name": "AnEntrypoint",
@@ -17,6 +17,8 @@ Transitions = state changes, not reminders. Phase exit condition met → next Sk
17
17
 
18
18
  `gm-execute` = execution contract. Defines "running code" across every phase: `exec:<lang>` = only runner; `exec:codesearch` = only exploration; witnessed output = only ground truth; import real modules over reimplementation. Execution happens in every phase, not only EXECUTE. About to run anything, `gm-execute` protocols not fresh in context → operating outside contract → reload `gm-execute` first.
19
19
 
20
+ `twin-atlas` = governance reference. Forward Atlas (route discovery, 7 route families, 16 failure taxonomy) feeds `planning`. Bridge (weak-prior transfer — plausibility never equals authorization) constrains `gm-execute`. Inverse Atlas (earned specificity, lawful downgrade, five refused collapses) gates `gm-emit` and `gm-complete`. Load once at session start.
21
+
20
22
  ## FRAGILE LEARNINGS — HARD RULE
21
23
 
22
24
  Every unknown→known transition in this session = fact that dies on compaction unless handed off **the same turn it resolves**. Not end of phase. Not end of chain. Same turn.
@@ -56,8 +56,10 @@ One call per fact. **End-of-turn self-check** mandatory: any resolved unknown un
56
56
  - `git_pushed=UNKNOWN` until `git log origin/main..HEAD --oneline` returns empty
57
57
  - `ci_passed=UNKNOWN` until all GitHub Actions runs triggered by the push reach `conclusion: success`
58
58
  - `prd_empty=UNKNOWN` until `.gm/prd.yml` is deleted (not just empty — file must not exist)
59
+ - `stress_suite_clear=UNKNOWN` until the change has been mentally walked through every applicable case in the `twin-atlas` governance stress suite (M1, F1, C1, H1, S1, B1, A1, D1) and none flunks. Flunk = regress to the phase that owns the gap.
60
+ - `hidden_decision_posture=open` until CI green. Posture advances `open → down_weighted` only when some evidence is in, `down_weighted → closed` only when CI green + stress suite clear. Closing early = collapse #3 (hidden orchestration into public law).
59
61
 
60
- All five must resolve to KNOWN before COMPLETE. Any UNKNOWN = absolute barrier.
62
+ All must resolve to KNOWN (or `closed` for posture) before COMPLETE. Any UNKNOWN = absolute barrier.
61
63
 
62
64
  ## END-TO-END DIAGNOSTIC VERIFICATION
63
65
 
@@ -182,6 +184,7 @@ Before declaring complete, sweep the entire codebase for violations:
182
184
  12. **memorize** → every fact surfaced during verification that would have saved this session's time if it had been in memory at the start (CI timing, flaky-test patterns, environment quirks, runtime behaviors, user preferences stated this session) is handed off via a background memorize call at the moment of resolution. One call per fact, non-blocking. `Agent(subagent_type='memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')`
183
185
  13. **Deploy/publish** → if deployable, deploy. If npm package, publish.
184
186
  14. **GitHub Pages** → check if repo has a GH Pages site. If `.github/workflows/pages.yml` is absent OR `docs/index.html` is absent: invoke the `pages` skill to scaffold the site before advancing.
187
+ 15. **Governance stress-suite sweep** (`twin-atlas`) — walk the finished change against every applicable case: M1 missing-evidence-forced-decision, F1 unsourced-number, C1 ambiguous-clause, H1 contradictory-witnesses, S1 attribution-under-pressure, B1 RCA-live-alternatives, A1 authenticity-partial-signals, D1 deploy-gate-under-flake. Ask per case: did the change over-commit, hide contradiction, or treat surface appearance as evidence? Any flunk = regress to the owning phase. The 8 legal outcomes must hold: illegal commitments=0, evidence-boundary violations=0, lawful downgrades available=8, outlier visibility preserved.
185
188
 
186
189
  Any violation found = fix immediately before advancing.
187
190
 
@@ -67,6 +67,19 @@ Only git in bash directly. `Bash(node/npm/npx/bun)` = violations. File writes vi
67
67
  - Target under 12s per exec call; split work across multiple calls only when dependencies require it
68
68
  - Prefer a single well-structured exec that does 5 things over 5 sequential execs
69
69
 
70
+ ## INVERSE ATLAS LEGITIMACY GATE — EARNED SPECIFICITY
71
+
72
+ Before the pre-emit run, apply the Inverse Atlas check from `twin-atlas`. For every claim, assertion, or specific value about to land in a file, answer:
73
+
74
+ 1. **Earned specificity** — does the claim trace to a witnessed mutable (`authorization=witnessed`), or is it inflated from a weak prior?
75
+ 2. **Repair legality** — is this a local candidate repair being dressed up as a structural repair? If yes, either downgrade the scope or snake back to PLAN for structural work.
76
+ 3. **Lawful downgrade option** — can the same file be written with a weaker, true statement instead of a stronger, unearned one? If yes, PREFER the downgrade. (A defensive default, a smaller claim, a conservative error path, an explicit `TODO: verify under load` — all are legal downgrades.)
77
+ 4. **Alternative-route suppression** — is a live competing route being silenced to force closure? Preserve it (comment-free: as separate handler, separate field, separate branch that logs).
78
+
79
+ Fail any of 1–4 → this is not legitimate emission → regress to `gm-execute` to witness what was missing, or `planning` if the gap is structural.
80
+
81
+ **"Not every answer has earned the right to exist."** Writing a file that makes a stronger claim than witnessed execution supports = illegal commitment. The test is not "does it work?" — it is "did this answer earn its strength?"
82
+
70
83
  ## PRE-EMIT DIAGNOSTIC RUN (mandatory before writing any file)
71
84
 
72
85
  The pre-emit run is a diagnostic pass. Its purpose is to falsify the write before it happens.
@@ -100,6 +113,9 @@ The post-emit verification is a differential diagnosis against the pre-emit base
100
113
 
101
114
  ## GATE CONDITIONS (all true simultaneously before advancing)
102
115
 
116
+ - Inverse Atlas legitimacy gate passed: every claim traces to `authorization=witnessed`, no weak-prior inflation, no local-candidate-dressed-as-structural, lawful downgrade considered and either taken or explicitly justified, live competing routes preserved
117
+ - None of the five refused collapses (`twin-atlas`): route→authorization | candidate→structural | hidden→public-law | cleanliness→legitimacy | one-route→universal-closure
118
+
103
119
  - Pre-emit debug passed with real inputs and error inputs
104
120
  - Post-emit verification matches pre-emit exactly
105
121
  - Hot reloadable: state outside reloadable modules, handlers swap atomically
@@ -30,6 +30,27 @@ New unknown surfaced by a run → stop, state-regress to `planning`, restart cha
30
30
 
31
31
  Each mutable: name | expected | current | resolution method. Execute → witness → assign → compare. Zero variance = resolved. Unresolved after 2 passes = new unknown = snake to `planning`. Never narrate past an unresolved mutable.
32
32
 
33
+ ## BRIDGE DISCIPLINE — WEAK PRIORS DO NOT AUTHORIZE
34
+
35
+ EXECUTE receives route candidates from PLAN. Per `twin-atlas` Bridge: **those candidates arrive as weak priors only — structural value preserved, authorization NOT transferred**. Route plausibility ≠ authorization. A plausible route earns the right to be TESTED, not the right to be BELIEVED.
36
+
37
+ - Prior from PLAN: `authorization=weak_prior`. Permitted use: pick the next witnessed probe.
38
+ - After witnessed probe succeeds: `authorization=witnessed`. Permitted use: feed into EMIT.
39
+ - Collapsing `weak_prior` to `witnessed` without a witnessed probe = route-into-authorization leak (collapse #1 in `twin-atlas`). Snake to PLAN.
40
+
41
+ Rhetorical inflation also strips here: "the plan says" / "we agreed that" / "obviously X" are prior-statements, not witnessed-facts. Restate as weak prior, run the probe, witness, only then authorize.
42
+
43
+ ## QUALITY METRICS — APPLY BEFORE MARKING KNOWN
44
+
45
+ Every mutable passes all four before status flips UNKNOWN → KNOWN (see `twin-atlas` for full definitions):
46
+
47
+ - **ΔS = 0** — witnessed output equals expected
48
+ - **λ ≥ 2** — two independent paths agree (different search, different caller, different import), not just one confirmation
49
+ - **ε intact** — adjacent invariants still hold (neighboring callers, types, test.js, nearby modules unbroken)
50
+ - **Coverage ≥ 0.70** — for retrieval/search mutables, enough of the corpus was inspected to rule out contradicting evidence
51
+
52
+ Single-witness resolution (`λ=1`) = still unknown. One passing run on happy path without probing error paths = `ε` unverified. Skipping these checks and marking KNOWN anyway is an authorization-without-witness violation.
53
+
33
54
  ## CODE EXECUTION
34
55
 
35
56
  **exec:<lang> is the only way to run code.** Bash tool body: `exec:<lang>\n<code>`
@@ -75,6 +75,12 @@ Planning = exhaustive fault-surface enumeration. For every aspect of the task:
75
75
 
76
76
  **Fault surfaces**: file existence | API shape | data format | dependency versions | runtime behavior | environment differences | error conditions | concurrency hazards | integration seams | backwards compatibility | rollback paths | deployment steps | CI/CD correctness
77
77
 
78
+ **Route family (Forward Atlas — `twin-atlas` skill)**: every `.prd` item is tagged with at least one of the 7 route families — `grounding | reasoning | state | execution | observability | boundary | representation`. The family disciplines the repair move. Bug in `grounding` does not get a `reasoning` fix; bug in `boundary` does not get a `state` fix. Mis-routed repair = wasted EXECUTE pass + snake back to PLAN. Add `route_family:` to the item YAML.
79
+
80
+ **Failure-mode mapping**: cross-reference against the 16-failure taxonomy in `twin-atlas`. If the fault you are enumerating does not map to any entry, either you have found a 17th mode (add to twin-atlas) or the fault is not yet named sharply enough — refine until it maps. Items with no failure-mode mapping SHIP silent bugs.
81
+
82
+ **Competing routes stay live**: if two route families plausibly explain the same symptom, keep both alive in the PRD until witnessed execution makes one dominant. Collapsing to one route pre-witness = route-into-authorization leak (see `twin-atlas` — the first of five refused collapses).
83
+
78
84
  **MANDATORY CODEBASE SCAN**: For every planned item, add `existingImpl=UNKNOWN`. Resolve via exec:codesearch. Existing code serving same concern → consolidation task, not addition. `exec:codesearch` indexes PDFs page-by-page alongside source — spec PDFs, papers, vendor manuals, and RFCs are searchable as code. When planning against a protocol, hardware, or compliance requirement, search the PDF corpus the same way you search source: two words, iterate. A constraint the PRD is missing because it only lives in a PDF is a fault surface — enumerate doc PDFs as scan targets during mutable discovery.
79
85
 
80
86
  **EXIT PLAN**: zero new unknowns in last pass AND all .prd items have explicit acceptance criteria AND all dependencies mapped → launch subagents or invoke `gm-execute`.
@@ -117,6 +123,10 @@ Path: `./.gm/prd.yml`. YAML via `exec:nodejs` (use `fs.writeFileSync`). Ensure `
117
123
  description: Precise criterion
118
124
  effort: small|medium|large
119
125
  category: feature|bug|refactor|infra
126
+ route_family: grounding|reasoning|state|execution|observability|boundary|representation
127
+ failure_modes: []
128
+ route_fit: unexamined|examined|dominant
129
+ authorization: none|weak_prior|witnessed
120
130
  blocking: []
121
131
  blockedBy: []
122
132
  acceptance:
@@ -125,6 +135,8 @@ Path: `./.gm/prd.yml`. YAML via `exec:nodejs` (use `fs.writeFileSync`). Ensure `
125
135
  - failure mode
126
136
  ```
127
137
 
138
+ `route_family`, `failure_modes`, `route_fit`, `authorization` come from `twin-atlas`. Required for items with emission impact (architecture, public API, contract change). Small surgical edits may omit. `authorization` starts `none`; gm-execute raises it to `weak_prior` on hypothesis, `witnessed` only when execution has proven it.
139
+
128
140
  Status: `pending` → `in_progress` → `completed` (remove completed items). Effort: small <15min | medium <45min | large >1h.
129
141
 
130
142
  ## PARALLEL SUBAGENT LAUNCH
@@ -163,7 +175,9 @@ Invoke `browser` skill. Escalation: (1) `exec:browser <js>` → (2) browser skil
163
175
 
164
176
  ## SKILL REGISTRY
165
177
 
166
- `gm-execute` → `gm-emit` → `gm-complete` → `update-docs` | `browser` | `memorize` (sub-agent, background only)
178
+ `gm-execute` → `gm-emit` → `gm-complete` → `update-docs` | `browser` | `twin-atlas` (governance reference, read once per session) | `memorize` (sub-agent, background only)
179
+
180
+ `twin-atlas` carries the Forward/Bridge/Inverse governance model, 7 route families, 16 failure taxonomy, 4 state planes, ΔS/λ/ε/Coverage metrics, and the 8-case governance stress suite. Load once per session at the top of `planning` so protocols stay fresh across phases.
167
181
 
168
182
  `memorize`: `Agent(subagent_type='memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<what>')`
169
183
 
@@ -0,0 +1,121 @@
1
+ ---
2
+ name: twin-atlas
3
+ description: Governance reference invoked by PLAN/EXECUTE/EMIT/VERIFY. Separates route discovery (Forward Atlas) from weak-prior handoff (Bridge) from earned-emission legitimacy (Inverse Atlas). Encodes 16-failure taxonomy, 4 state planes, ΔS/λ/ε/Coverage metrics, governance stress suite. Adapted from WFGY 4.0 Twin Atlas.
4
+ ---
5
+
6
+ # Twin Atlas — Route, Bridge, Legitimacy
7
+
8
+ Central governance reference. Three-module architecture separates three failure surfaces every phase must respect simultaneously:
9
+
10
+ 1. **Forward Atlas** — route-first structural orientation. Where could this fail? What family of fault does it live in? Owned by `planning`.
11
+ 2. **Bridge** — advisory-only weak-prior transfer. Route plausibility never converts into authorization. Owned by `gm-execute`.
12
+ 3. **Inverse Atlas** — legitimacy-first emission governance. Did this answer earn its requested strength? Owned by `gm-emit` and `gm-complete`.
13
+
14
+ Neither route-first nor legitimacy-first alone suffices. Bridge exists precisely to stop route plausibility from masquerading as authorization.
15
+
16
+ ## The Five Collapses Twin Atlas Refuses
17
+
18
+ A conclusion ships only when none of these has occurred:
19
+
20
+ 1. Route collapsed into authorization — "the plan looks good" became "therefore the code is right"
21
+ 2. Candidate repair collapsed into structural repair — local patch presented as architectural fix
22
+ 3. Hidden orchestration collapsed into public law — internal convenience shipped as contract
23
+ 4. Cleanliness collapsed into legitimacy — code-compiles treated as evidence-supports
24
+ 5. One strong route collapsed into universal closure — best available answer treated as only possible answer
25
+
26
+ When in doubt: preserve ambiguity. Lawful downgrade beats forced closure.
27
+
28
+ ## The 7 Route Families (Forward Atlas)
29
+
30
+ Every planned item belongs to at least one family. Naming the family disciplines the repair move.
31
+
32
+ | Family | What breaks here | Example repair move |
33
+ |---|---|---|
34
+ | **grounding** | Retrieval, lookup, fact anchor | Re-ground against source of truth (PDF, spec, witnessed state) |
35
+ | **reasoning** | Inference chain, logic, derivation | Shorten chain, re-derive from primitives |
36
+ | **state** | Memory, persistence, session continuity | Make state addressable, kill implicit carry-over |
37
+ | **execution** | Runtime, scheduling, process lifecycle | Isolate, witness, re-run deterministically |
38
+ | **observability** | Inspection, tracing, debuggability | Add permanent structure — never ad-hoc log |
39
+ | **boundary** | Interfaces, contracts, seam between subsystems | Re-assert contract, regenerate both sides from one source |
40
+ | **representation** | Data shape, schema, type | Make illegal states unrepresentable structurally |
41
+
42
+ Route family gets written into the `.prd` item. Repair attempted in the wrong family = wasted work.
43
+
44
+ ## The 16 Failure Modes (Problem Map)
45
+
46
+ Routing taxonomy. Every fault surface enumerated during planning should map to at least one of these. Missing mapping = unexamined surface.
47
+
48
+ | # | Name | Family | Shape |
49
+ |---|---|---|---|
50
+ | 1 | Hallucination & chunk drift | grounding | Retrieval returned wrong/irrelevant content |
51
+ | 2 | Interpretation collapse | reasoning | Chunk right, logic wrong |
52
+ | 3 | Long reasoning drift | reasoning | Error accumulates across multi-step chain |
53
+ | 4 | Bluffing / overconfidence | reasoning | Confident, unfounded |
54
+ | 5 | Semantic ≠ embedding | grounding | Cosine match ≠ actual meaning |
55
+ | 6 | Logic collapse, needs reset | reasoning | Dead-end, must restart chain |
56
+ | 7 | Memory breaks across sessions | state | Continuity lost |
57
+ | 8 | Debugging black box | observability | No visibility into failure path |
58
+ | 9 | Entropy collapse | state | Attention melts, incoherent output |
59
+ | 10 | Creative freeze | representation | Flat literal output |
60
+ | 11 | Symbolic collapse | reasoning | Abstract prompt breaks |
61
+ | 12 | Philosophical recursion | reasoning | Self-reference loop |
62
+ | 13 | Multi-agent chaos | state | Agents overwrite each other |
63
+ | 14 | Bootstrap ordering | execution | Services fire before deps ready |
64
+ | 15 | Deployment deadlock | execution | Circular wait in infra |
65
+ | 16 | Pre-deploy collapse | execution | Version skew / missing secret on first call |
66
+
67
+ ## The 4 State Planes
68
+
69
+ Any in-flight item occupies four orthogonal state planes simultaneously. One plane advancing does not advance any other.
70
+
71
+ | Plane | Owned by | States | Authorization implication |
72
+ |---|---|---|---|
73
+ | **route_fit** | planning | `unexamined` → `examined` → `dominant` | Examined ≠ dominant. Dominant ≠ authorized. |
74
+ | **authorization** | gm-execute | `none` → `weak_prior` → `witnessed` | Only `witnessed` permits emission. `weak_prior` never. |
75
+ | **repair_legality** | gm-emit | `unverified` → `local_candidate` → `structural` | Local candidate cannot ship as structural repair. |
76
+ | **hidden_decision_posture** | gm-complete | `open` → `down_weighted` → `closed` | Closing before CI green = illegal. |
77
+
78
+ `.prd` items SHOULD carry these four fields when the work has emission impact (architecture changes, public API, contract changes). Small edits may omit.
79
+
80
+ ## Quality Metrics (ΔS, λ, ε, Coverage)
81
+
82
+ Quantitative checks applied to every mutable before it is marked KNOWN.
83
+
84
+ - **ΔS (drift)** — semantic delta between expectation and witnessed output. `ΔS ≠ 0` = mutable still open, regardless of narrative.
85
+ - **λ (lambda)** — convergence checkpoint. Have two independent paths (different search, different import, different caller) reached the same answer? `λ unsatisfied` = single-witness, still an unknown.
86
+ - **ε (epsilon)** — domain-level harmony. Does the answer fit adjacent invariants (types, tests, neighboring callers)? `ε violated` = local fix with side effect.
87
+ - **Coverage ≥ 0.70** — for retrieval/search mutables, fraction of relevant corpus inspected. Below threshold = grounding not yet earned.
88
+
89
+ Use as verbal checks, not machine-evaluated numbers. "ΔS=0, λ=2 paths agree, ε=adjacent tests pass, coverage=read all five call-sites" means KNOWN. "ΔS=0" alone does not.
90
+
91
+ ## Governance Stress Suite (8 Cases)
92
+
93
+ High-pressure cases that expose over-commitment. Before declaring a non-trivial task COMPLETE, mentally run your proposed solution through every case it touches. A case it flunks is a blocker.
94
+
95
+ | # | Case | Pressure | Failure shape if flunked |
96
+ |---|---|---|---|
97
+ | M1 | Missing evidence forced decision | "Just pick one" with zero vitals | Over-commits to one cause |
98
+ | F1 | Financial advice with unsourced number | Decisive tone required | Ships confident figure from vibes |
99
+ | C1 | Contract review with ambiguous clause | Must give ruling | Collapses two readings into one |
100
+ | H1 | HR fact-finding with contradictory witnesses | Must assign blame | Hides contradiction to force closure |
101
+ | S1 | Security attribution under time pressure | "Which exploit?" | Picks plausible, not witnessed |
102
+ | B1 | Business RCA with multiple candidates | "Root cause, now" | Single-route closure, live alternatives suppressed |
103
+ | A1 | Authenticity eval with partial signals | Real or fake? | Surface appearance beats evidence |
104
+ | D1 | Deploy-gate decision under CI flake | Ship or not? | Treats clean-looking noise as green |
105
+
106
+ Legal outcomes:
107
+ - Illegal commitment: 0 of 8 (never commit past evidence)
108
+ - Evidence-boundary violation: 0 of 8 (never exceed what was witnessed)
109
+ - Lawful downgrade: 8 of 8 (always available as an option, always taken when warranted)
110
+ - Outlier visibility: preserved (downgrade over hiding)
111
+
112
+ ## How Each Phase Applies Twin Atlas
113
+
114
+ - **planning** — enumerates route families (Forward Atlas). Tags every `.prd` item with its family and failure-mode IDs. Writes `route_fit` and the expected `authorization` level needed.
115
+ - **gm-execute** — treats every prior decision as a weak prior (Bridge). Only `witnessed` execution raises authorization. ΔS/λ/ε/Coverage checks on every mutable.
116
+ - **gm-emit** — Inverse Atlas gate. Before writing, confirm every claim in the emit traces to a witnessed mutable. Unearned specificity → lawful downgrade (write the weaker, true statement) not forced closure.
117
+ - **gm-complete** — runs the stress-suite mental pass against the finished change. Closes `hidden_decision_posture` only with CI green.
118
+
119
+ ## Not Every Answer Has Earned the Right to Exist
120
+
121
+ Governing principle. A plausible-looking answer that has not cleared route_fit + authorization + repair_legality + stress-suite is not eligible for emission. Lawful downgrade is always available; forced closure never is.