npm - gm-cc - Versions diffs - 2.0.563 → 2.0.565 - Mend

gm-cc 2.0.563 → 2.0.565

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16) hide show

package/.claude-plugin/marketplace.json +1 -1
package/bin/plugkit-darwin-arm64 +0 -0
package/bin/plugkit-darwin-x64 +0 -0
package/bin/plugkit-linux-arm64 +0 -0
package/bin/plugkit-linux-x64 +0 -0
package/bin/plugkit-win32-arm64.exe +0 -0
package/bin/plugkit-win32-x64.exe +0 -0
package/bin/plugkit.exe +0 -0
package/package.json +1 -1
package/plugin.json +1 -1
package/skills/gm/SKILL.md +2 -0
package/skills/gm-complete/SKILL.md +4 -1
package/skills/gm-emit/SKILL.md +16 -0
package/skills/gm-execute/SKILL.md +21 -0
package/skills/planning/SKILL.md +15 -1
package/skills/twin-atlas/SKILL.md +121 -0

package/.claude-plugin/marketplace.json CHANGED Viewed

@@ -4,7 +4,7 @@
     "name": "AnEntrypoint"
   },
   "description": "State machine agent with hooks, skills, and automated git enforcement",
-  "version": "2.0.563",
+  "version": "2.0.565",
   "metadata": {
     "description": "State machine agent with hooks, skills, and automated git enforcement"
   },

package/bin/plugkit-darwin-arm64 CHANGED Viewed

Binary file

package/bin/plugkit-darwin-x64 CHANGED Viewed

Binary file

package/bin/plugkit-linux-arm64 CHANGED Viewed

Binary file

package/bin/plugkit-linux-x64 CHANGED Viewed

Binary file

package/bin/plugkit-win32-arm64.exe CHANGED Viewed

Binary file

package/bin/plugkit-win32-x64.exe CHANGED Viewed

Binary file

package/bin/plugkit.exe CHANGED Viewed

Binary file

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "gm-cc",
-  "version": "2.0.563",
+  "version": "2.0.565",
   "description": "State machine agent with hooks, skills, and automated git enforcement",
   "author": "AnEntrypoint",
   "license": "MIT",

package/plugin.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "gm",
-  "version": "2.0.563",
+  "version": "2.0.565",
   "description": "State machine agent with hooks, skills, and automated git enforcement",
   "author": {
     "name": "AnEntrypoint",

package/skills/gm/SKILL.md CHANGED Viewed

@@ -17,6 +17,8 @@ Transitions = state changes, not reminders. Phase exit condition met → next Sk
 `gm-execute` = execution contract. Defines "running code" across every phase: `exec:<lang>` = only runner; `exec:codesearch` = only exploration; witnessed output = only ground truth; import real modules over reimplementation. Execution happens in every phase, not only EXECUTE. About to run anything, `gm-execute` protocols not fresh in context → operating outside contract → reload `gm-execute` first.
+`twin-atlas` = governance reference. Forward Atlas (route discovery, 7 route families, 16 failure taxonomy) feeds `planning`. Bridge (weak-prior transfer — plausibility never equals authorization) constrains `gm-execute`. Inverse Atlas (earned specificity, lawful downgrade, five refused collapses) gates `gm-emit` and `gm-complete`. Load once at session start.
 ## FRAGILE LEARNINGS — HARD RULE
 Every unknown→known transition in this session = fact that dies on compaction unless handed off **the same turn it resolves**. Not end of phase. Not end of chain. Same turn.

package/skills/gm-complete/SKILL.md CHANGED Viewed

@@ -56,8 +56,10 @@ One call per fact. **End-of-turn self-check** mandatory: any resolved unknown un
 - `git_pushed=UNKNOWN` until `git log origin/main..HEAD --oneline` returns empty
 - `ci_passed=UNKNOWN` until all GitHub Actions runs triggered by the push reach `conclusion: success`
 - `prd_empty=UNKNOWN` until `.gm/prd.yml` is deleted (not just empty — file must not exist)
+- `stress_suite_clear=UNKNOWN` until the change has been mentally walked through every applicable case in the `twin-atlas` governance stress suite (M1, F1, C1, H1, S1, B1, A1, D1) and none flunks. Flunk = regress to the phase that owns the gap.
+- `hidden_decision_posture=open` until CI green. Posture advances `open → down_weighted` only when some evidence is in, `down_weighted → closed` only when CI green + stress suite clear. Closing early = collapse #3 (hidden orchestration into public law).
-All five must resolve to KNOWN before COMPLETE. Any UNKNOWN = absolute barrier.
+All must resolve to KNOWN (or `closed` for posture) before COMPLETE. Any UNKNOWN = absolute barrier.
 ## END-TO-END DIAGNOSTIC VERIFICATION
@@ -182,6 +184,7 @@ Before declaring complete, sweep the entire codebase for violations:
 12. **memorize** → every fact surfaced during verification that would have saved this session's time if it had been in memory at the start (CI timing, flaky-test patterns, environment quirks, runtime behaviors, user preferences stated this session) is handed off via a background memorize call at the moment of resolution. One call per fact, non-blocking. `Agent(subagent_type='memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')`
 13. **Deploy/publish** → if deployable, deploy. If npm package, publish.
 14. **GitHub Pages** → check if repo has a GH Pages site. If `.github/workflows/pages.yml` is absent OR `docs/index.html` is absent: invoke the `pages` skill to scaffold the site before advancing.
+15. **Governance stress-suite sweep** (`twin-atlas`) — walk the finished change against every applicable case: M1 missing-evidence-forced-decision, F1 unsourced-number, C1 ambiguous-clause, H1 contradictory-witnesses, S1 attribution-under-pressure, B1 RCA-live-alternatives, A1 authenticity-partial-signals, D1 deploy-gate-under-flake. Ask per case: did the change over-commit, hide contradiction, or treat surface appearance as evidence? Any flunk = regress to the owning phase. The 8 legal outcomes must hold: illegal commitments=0, evidence-boundary violations=0, lawful downgrades available=8, outlier visibility preserved.
 Any violation found = fix immediately before advancing.

package/skills/gm-emit/SKILL.md CHANGED Viewed

@@ -67,6 +67,19 @@ Only git in bash directly. `Bash(node/npm/npx/bun)` = violations. File writes vi
 - Target under 12s per exec call; split work across multiple calls only when dependencies require it
 - Prefer a single well-structured exec that does 5 things over 5 sequential execs
+## INVERSE ATLAS LEGITIMACY GATE — EARNED SPECIFICITY
+Before the pre-emit run, apply the Inverse Atlas check from `twin-atlas`. For every claim, assertion, or specific value about to land in a file, answer:
+1. **Earned specificity** — does the claim trace to a witnessed mutable (`authorization=witnessed`), or is it inflated from a weak prior?
+2. **Repair legality** — is this a local candidate repair being dressed up as a structural repair? If yes, either downgrade the scope or snake back to PLAN for structural work.
+3. **Lawful downgrade option** — can the same file be written with a weaker, true statement instead of a stronger, unearned one? If yes, PREFER the downgrade. (A defensive default, a smaller claim, a conservative error path, an explicit `TODO: verify under load` — all are legal downgrades.)
+4. **Alternative-route suppression** — is a live competing route being silenced to force closure? Preserve it (comment-free: as separate handler, separate field, separate branch that logs).
+Fail any of 1–4 → this is not legitimate emission → regress to `gm-execute` to witness what was missing, or `planning` if the gap is structural.
+**"Not every answer has earned the right to exist."** Writing a file that makes a stronger claim than witnessed execution supports = illegal commitment. The test is not "does it work?" — it is "did this answer earn its strength?"
 ## PRE-EMIT DIAGNOSTIC RUN (mandatory before writing any file)
 The pre-emit run is a diagnostic pass. Its purpose is to falsify the write before it happens.
@@ -100,6 +113,9 @@ The post-emit verification is a differential diagnosis against the pre-emit base
 ## GATE CONDITIONS (all true simultaneously before advancing)
+- Inverse Atlas legitimacy gate passed: every claim traces to `authorization=witnessed`, no weak-prior inflation, no local-candidate-dressed-as-structural, lawful downgrade considered and either taken or explicitly justified, live competing routes preserved
+- None of the five refused collapses (`twin-atlas`): route→authorization | candidate→structural | hidden→public-law | cleanliness→legitimacy | one-route→universal-closure
 - Pre-emit debug passed with real inputs and error inputs
 - Post-emit verification matches pre-emit exactly
 - Hot reloadable: state outside reloadable modules, handlers swap atomically

package/skills/gm-execute/SKILL.md CHANGED Viewed

@@ -30,6 +30,27 @@ New unknown surfaced by a run → stop, state-regress to `planning`, restart cha
 Each mutable: name | expected | current | resolution method. Execute → witness → assign → compare. Zero variance = resolved. Unresolved after 2 passes = new unknown = snake to `planning`. Never narrate past an unresolved mutable.
+## BRIDGE DISCIPLINE — WEAK PRIORS DO NOT AUTHORIZE
+EXECUTE receives route candidates from PLAN. Per `twin-atlas` Bridge: **those candidates arrive as weak priors only — structural value preserved, authorization NOT transferred**. Route plausibility ≠ authorization. A plausible route earns the right to be TESTED, not the right to be BELIEVED.
+- Prior from PLAN: `authorization=weak_prior`. Permitted use: pick the next witnessed probe.
+- After witnessed probe succeeds: `authorization=witnessed`. Permitted use: feed into EMIT.
+- Collapsing `weak_prior` to `witnessed` without a witnessed probe = route-into-authorization leak (collapse #1 in `twin-atlas`). Snake to PLAN.
+Rhetorical inflation also strips here: "the plan says" / "we agreed that" / "obviously X" are prior-statements, not witnessed-facts. Restate as weak prior, run the probe, witness, only then authorize.
+## QUALITY METRICS — APPLY BEFORE MARKING KNOWN
+Every mutable passes all four before status flips UNKNOWN → KNOWN (see `twin-atlas` for full definitions):
+- **ΔS = 0** — witnessed output equals expected
+- **λ ≥ 2** — two independent paths agree (different search, different caller, different import), not just one confirmation
+- **ε intact** — adjacent invariants still hold (neighboring callers, types, test.js, nearby modules unbroken)
+- **Coverage ≥ 0.70** — for retrieval/search mutables, enough of the corpus was inspected to rule out contradicting evidence
+Single-witness resolution (`λ=1`) = still unknown. One passing run on happy path without probing error paths = `ε` unverified. Skipping these checks and marking KNOWN anyway is an authorization-without-witness violation.
 ## CODE EXECUTION
 **exec:<lang> is the only way to run code.** Bash tool body: `exec:<lang>\n<code>`

package/skills/planning/SKILL.md CHANGED Viewed

@@ -75,6 +75,12 @@ Planning = exhaustive fault-surface enumeration. For every aspect of the task:
 **Fault surfaces**: file existence | API shape | data format | dependency versions | runtime behavior | environment differences | error conditions | concurrency hazards | integration seams | backwards compatibility | rollback paths | deployment steps | CI/CD correctness
+**Route family (Forward Atlas — `twin-atlas` skill)**: every `.prd` item is tagged with at least one of the 7 route families — `grounding | reasoning | state | execution | observability | boundary | representation`. The family disciplines the repair move. Bug in `grounding` does not get a `reasoning` fix; bug in `boundary` does not get a `state` fix. Mis-routed repair = wasted EXECUTE pass + snake back to PLAN. Add `route_family:` to the item YAML.
+**Failure-mode mapping**: cross-reference against the 16-failure taxonomy in `twin-atlas`. If the fault you are enumerating does not map to any entry, either you have found a 17th mode (add to twin-atlas) or the fault is not yet named sharply enough — refine until it maps. Items with no failure-mode mapping SHIP silent bugs.
+**Competing routes stay live**: if two route families plausibly explain the same symptom, keep both alive in the PRD until witnessed execution makes one dominant. Collapsing to one route pre-witness = route-into-authorization leak (see `twin-atlas` — the first of five refused collapses).
 **MANDATORY CODEBASE SCAN**: For every planned item, add `existingImpl=UNKNOWN`. Resolve via exec:codesearch. Existing code serving same concern → consolidation task, not addition. `exec:codesearch` indexes PDFs page-by-page alongside source — spec PDFs, papers, vendor manuals, and RFCs are searchable as code. When planning against a protocol, hardware, or compliance requirement, search the PDF corpus the same way you search source: two words, iterate. A constraint the PRD is missing because it only lives in a PDF is a fault surface — enumerate doc PDFs as scan targets during mutable discovery.
 **EXIT PLAN**: zero new unknowns in last pass AND all .prd items have explicit acceptance criteria AND all dependencies mapped → launch subagents or invoke `gm-execute`.
@@ -117,6 +123,10 @@ Path: `./.gm/prd.yml`. YAML via `exec:nodejs` (use `fs.writeFileSync`). Ensure `
   description: Precise criterion
   effort: small|medium|large
   category: feature|bug|refactor|infra
+  route_family: grounding|reasoning|state|execution|observability|boundary|representation
+  failure_modes: []
+  route_fit: unexamined|examined|dominant
+  authorization: none|weak_prior|witnessed
   blocking: []
   blockedBy: []
   acceptance:
@@ -125,6 +135,8 @@ Path: `./.gm/prd.yml`. YAML via `exec:nodejs` (use `fs.writeFileSync`). Ensure `
     - failure mode
 ```
+`route_family`, `failure_modes`, `route_fit`, `authorization` come from `twin-atlas`. Required for items with emission impact (architecture, public API, contract change). Small surgical edits may omit. `authorization` starts `none`; gm-execute raises it to `weak_prior` on hypothesis, `witnessed` only when execution has proven it.
 Status: `pending` → `in_progress` → `completed` (remove completed items). Effort: small <15min | medium <45min | large >1h.
 ## PARALLEL SUBAGENT LAUNCH
@@ -163,7 +175,9 @@ Invoke `browser` skill. Escalation: (1) `exec:browser <js>` → (2) browser skil
 ## SKILL REGISTRY
-`gm-execute` → `gm-emit` → `gm-complete` → `update-docs` | `browser` | `memorize` (sub-agent, background only)
+`gm-execute` → `gm-emit` → `gm-complete` → `update-docs` | `browser` | `twin-atlas` (governance reference, read once per session) | `memorize` (sub-agent, background only)
+`twin-atlas` carries the Forward/Bridge/Inverse governance model, 7 route families, 16 failure taxonomy, 4 state planes, ΔS/λ/ε/Coverage metrics, and the 8-case governance stress suite. Load once per session at the top of `planning` so protocols stay fresh across phases.
 `memorize`: `Agent(subagent_type='memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<what>')`

package/skills/twin-atlas/SKILL.md ADDED Viewed

@@ -0,0 +1,121 @@
+---
+name: twin-atlas
+description: Governance reference invoked by PLAN/EXECUTE/EMIT/VERIFY. Separates route discovery (Forward Atlas) from weak-prior handoff (Bridge) from earned-emission legitimacy (Inverse Atlas). Encodes 16-failure taxonomy, 4 state planes, ΔS/λ/ε/Coverage metrics, governance stress suite. Adapted from WFGY 4.0 Twin Atlas.
+---
+# Twin Atlas — Route, Bridge, Legitimacy
+Central governance reference. Three-module architecture separates three failure surfaces every phase must respect simultaneously:
+1. **Forward Atlas** — route-first structural orientation. Where could this fail? What family of fault does it live in? Owned by `planning`.
+2. **Bridge** — advisory-only weak-prior transfer. Route plausibility never converts into authorization. Owned by `gm-execute`.
+3. **Inverse Atlas** — legitimacy-first emission governance. Did this answer earn its requested strength? Owned by `gm-emit` and `gm-complete`.
+Neither route-first nor legitimacy-first alone suffices. Bridge exists precisely to stop route plausibility from masquerading as authorization.
+## The Five Collapses Twin Atlas Refuses
+A conclusion ships only when none of these has occurred:
+1. Route collapsed into authorization — "the plan looks good" became "therefore the code is right"
+2. Candidate repair collapsed into structural repair — local patch presented as architectural fix
+3. Hidden orchestration collapsed into public law — internal convenience shipped as contract
+4. Cleanliness collapsed into legitimacy — code-compiles treated as evidence-supports
+5. One strong route collapsed into universal closure — best available answer treated as only possible answer
+When in doubt: preserve ambiguity. Lawful downgrade beats forced closure.
+## The 7 Route Families (Forward Atlas)
+Every planned item belongs to at least one family. Naming the family disciplines the repair move.
+| Family | What breaks here | Example repair move |
+|---|---|---|
+| **grounding** | Retrieval, lookup, fact anchor | Re-ground against source of truth (PDF, spec, witnessed state) |
+| **reasoning** | Inference chain, logic, derivation | Shorten chain, re-derive from primitives |
+| **state** | Memory, persistence, session continuity | Make state addressable, kill implicit carry-over |
+| **execution** | Runtime, scheduling, process lifecycle | Isolate, witness, re-run deterministically |
+| **observability** | Inspection, tracing, debuggability | Add permanent structure — never ad-hoc log |
+| **boundary** | Interfaces, contracts, seam between subsystems | Re-assert contract, regenerate both sides from one source |
+| **representation** | Data shape, schema, type | Make illegal states unrepresentable structurally |
+Route family gets written into the `.prd` item. Repair attempted in the wrong family = wasted work.
+## The 16 Failure Modes (Problem Map)
+Routing taxonomy. Every fault surface enumerated during planning should map to at least one of these. Missing mapping = unexamined surface.
+| # | Name | Family | Shape |
+|---|---|---|---|
+| 1 | Hallucination & chunk drift | grounding | Retrieval returned wrong/irrelevant content |
+| 2 | Interpretation collapse | reasoning | Chunk right, logic wrong |
+| 3 | Long reasoning drift | reasoning | Error accumulates across multi-step chain |
+| 4 | Bluffing / overconfidence | reasoning | Confident, unfounded |
+| 5 | Semantic ≠ embedding | grounding | Cosine match ≠ actual meaning |
+| 6 | Logic collapse, needs reset | reasoning | Dead-end, must restart chain |
+| 7 | Memory breaks across sessions | state | Continuity lost |
+| 8 | Debugging black box | observability | No visibility into failure path |
+| 9 | Entropy collapse | state | Attention melts, incoherent output |
+| 10 | Creative freeze | representation | Flat literal output |
+| 11 | Symbolic collapse | reasoning | Abstract prompt breaks |
+| 12 | Philosophical recursion | reasoning | Self-reference loop |
+| 13 | Multi-agent chaos | state | Agents overwrite each other |
+| 14 | Bootstrap ordering | execution | Services fire before deps ready |
+| 15 | Deployment deadlock | execution | Circular wait in infra |
+| 16 | Pre-deploy collapse | execution | Version skew / missing secret on first call |
+## The 4 State Planes
+Any in-flight item occupies four orthogonal state planes simultaneously. One plane advancing does not advance any other.
+| Plane | Owned by | States | Authorization implication |
+|---|---|---|---|
+| **route_fit** | planning | `unexamined` → `examined` → `dominant` | Examined ≠ dominant. Dominant ≠ authorized. |
+| **authorization** | gm-execute | `none` → `weak_prior` → `witnessed` | Only `witnessed` permits emission. `weak_prior` never. |
+| **repair_legality** | gm-emit | `unverified` → `local_candidate` → `structural` | Local candidate cannot ship as structural repair. |
+| **hidden_decision_posture** | gm-complete | `open` → `down_weighted` → `closed` | Closing before CI green = illegal. |
+`.prd` items SHOULD carry these four fields when the work has emission impact (architecture changes, public API, contract changes). Small edits may omit.
+## Quality Metrics (ΔS, λ, ε, Coverage)
+Quantitative checks applied to every mutable before it is marked KNOWN.
+- **ΔS (drift)** — semantic delta between expectation and witnessed output. `ΔS ≠ 0` = mutable still open, regardless of narrative.
+- **λ (lambda)** — convergence checkpoint. Have two independent paths (different search, different import, different caller) reached the same answer? `λ unsatisfied` = single-witness, still an unknown.
+- **ε (epsilon)** — domain-level harmony. Does the answer fit adjacent invariants (types, tests, neighboring callers)? `ε violated` = local fix with side effect.
+- **Coverage ≥ 0.70** — for retrieval/search mutables, fraction of relevant corpus inspected. Below threshold = grounding not yet earned.
+Use as verbal checks, not machine-evaluated numbers. "ΔS=0, λ=2 paths agree, ε=adjacent tests pass, coverage=read all five call-sites" means KNOWN. "ΔS=0" alone does not.
+## Governance Stress Suite (8 Cases)
+High-pressure cases that expose over-commitment. Before declaring a non-trivial task COMPLETE, mentally run your proposed solution through every case it touches. A case it flunks is a blocker.
+| # | Case | Pressure | Failure shape if flunked |
+|---|---|---|---|
+| M1 | Missing evidence forced decision | "Just pick one" with zero vitals | Over-commits to one cause |
+| F1 | Financial advice with unsourced number | Decisive tone required | Ships confident figure from vibes |
+| C1 | Contract review with ambiguous clause | Must give ruling | Collapses two readings into one |
+| H1 | HR fact-finding with contradictory witnesses | Must assign blame | Hides contradiction to force closure |
+| S1 | Security attribution under time pressure | "Which exploit?" | Picks plausible, not witnessed |
+| B1 | Business RCA with multiple candidates | "Root cause, now" | Single-route closure, live alternatives suppressed |
+| A1 | Authenticity eval with partial signals | Real or fake? | Surface appearance beats evidence |
+| D1 | Deploy-gate decision under CI flake | Ship or not? | Treats clean-looking noise as green |
+Legal outcomes:
+- Illegal commitment: 0 of 8 (never commit past evidence)
+- Evidence-boundary violation: 0 of 8 (never exceed what was witnessed)
+- Lawful downgrade: 8 of 8 (always available as an option, always taken when warranted)
+- Outlier visibility: preserved (downgrade over hiding)
+## How Each Phase Applies Twin Atlas
+- **planning** — enumerates route families (Forward Atlas). Tags every `.prd` item with its family and failure-mode IDs. Writes `route_fit` and the expected `authorization` level needed.
+- **gm-execute** — treats every prior decision as a weak prior (Bridge). Only `witnessed` execution raises authorization. ΔS/λ/ε/Coverage checks on every mutable.
+- **gm-emit** — Inverse Atlas gate. Before writing, confirm every claim in the emit traces to a witnessed mutable. Unearned specificity → lawful downgrade (write the weaker, true statement) not forced closure.
+- **gm-complete** — runs the stress-suite mental pass against the finished change. Closes `hidden_decision_posture` only with CI green.
+## Not Every Answer Has Earned the Right to Exist
+Governing principle. A plausible-looking answer that has not cleared route_fit + authorization + repair_legality + stress-suite is not eligible for emission. Lawful downgrade is always available; forced closure never is.