npm - create-byan-agent - Versions diffs - 2.20.1 → 2.22.0 - Mend

create-byan-agent 2.20.1 → 2.22.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (94)

package/CHANGELOG.md CHANGED

@@ -9,6 +9,137 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
+## [2.22.0] - 2026-06-09
+### Changed - byan_dispatch routes the model tier by task nature, not by size
+`byan_dispatch` fused two unrelated decisions into one route string
+(`mcp-worker-haiku`, `main-thread-opus`): a short sequential task was downgraded
+to haiku purely on its length, and a long one was pinned up to opus. That is the
+size-driven mis-tiering the native-workflow doctrine (`native-tiers.js`) was built
+to forbid, so the two routers disagreed. This decouples the two axes and makes
+`native-tiers.js` the single source of truth for the model tier across both worlds.
+- **Two independent axes.** `dispatch.js` now returns
+  `{ score, strategy, nature, tier, model, parallelizable, reasoning }`. STRATEGY
+  (where the work runs: `main-thread` / `agent-subagent-worktree` / `mcp-worker`)
+  stays derived from the scalar score + `parallelizable`. TIER (which model) is
+  derived from the task NATURE, decoupled from size.
+- **One source of truth.** `dispatch.js` imports `classifyLeaf` / `tierFor` /
+  `TIER_MODEL` directly from `native-tiers.js` — a one-way dependency toward the
+  tier authority rather than a duplicated rule. Only an `exploration` nature
+  downgrades to `haiku`; `implementation` / `verification` / `analysis` (and any
+  unmatched task) stay `deep` (inherit the session model). No pin-up to opus.
+- **Conservative by default.** An optional `nature` arg sets the tier directly;
+  absent or invalid, the task text is classified, whose own default is
+  `implementation` (deep) — so a miss protects the work instead of downgrading it.
+- **Consumers realigned.** The `byan_dispatch` tool schema gains an optional
+  `nature` enum; the three consuming skills (byan-byan Phase 4, byan-hermes-dispatch
+  step 3, byan-orchestrate) read `strategy` + `model` from the new shape. The
+  fused-route strings are dropped from the live routing path.
+- 22 dispatch unit tests pin the contract (no downgrade for protected natures,
+  exploration to haiku, no pin-up, conservative fallback, strategy preserved across
+  the score bands) plus the hermes-e2e non-regression. Template re-synced.
+### Known debt
+The legacy fused-route vocabulary (`mcp-worker-haiku` / `main-thread-opus`) still
+appears in two doctrine docs (`_byan/worker/workers.md`,
+`_byan/workflow/simple/byan/feature-workflow.md`) and a dead, unreferenced parallel
+router (`src/core/dispatcher/execution-router.js` + its test). These have no live
+consumer and are scoped to a follow-up cleanup.
+## [2.21.0] - 2026-06-08
+### Added - Template fidelity sync (the published package matches its CHANGELOG)
+Only `install/templates/` ships on npm (`package.json` `files[]`), but the dev
+code lives at root `_byan/` and `.claude/`. With no mechanism to mirror root into
+the template, the template had drifted: 81 stale files had accumulated across
+several chantiers, so the routing and ledger work below existed at root yet was
+absent from the package a user would install. This adds the missing mechanism and
+re-aligns the template.
+- **Sync tool** `_byan/mcp/byan-mcp-server/lib/template-sync.js` +
+  `bin/byan-sync-template.js`: re-syncs every file already in the template from its
+  root twin, adds an explicit target list, and excludes runtime seeds
+  (`_byan/memoire/**`). The mirrored perimeter is the template itself rather than a
+  walk of root, so dev-only files do not leak into the package. `--check` reports
+  drift and exits non-zero without writing.
+- **First-run result.** 79 stale files re-synced and the 7 missing routing/ledger
+  artifacts added, so the shipped `server.js` registers the `byan_suitability`
+  tools and the downgraded workflows ship as intended. Runtime seeds left
+  untouched.
+- **Anti-recidive gate.** A fourth pre-commit gate runs `byan-sync-template.js
+  --check` and blocks a commit whose template has drifted from root. It is a no-op
+  for an installed user (the tool is dev-only, so the gate self-disables there).
+- 19 unit tests: idempotence, exclusion of runtime seeds, drift detection,
+  atomic-copy rollback, and perimeter tightness. Guide in
+  `docs/template-fidelity.md`.
+### Added - Model routing for native workflows (tier the leaves, keep heavy ones inherited)
+The 20 native-workflow scripts (`.claude/workflows/*.js`) all ran every `agent()`
+leaf on the session model (Opus by default): the read-the-file leaf paid the same
+tier as the implement-and-verify leaf. This wires BYAN's complexity doctrine into
+the Workflow tool's `opts.model` lever, conservatively.
+- **Single source of truth** `_byan/mcp/byan-mcp-server/lib/native-tiers.js`: the
+  tier vocabulary (`cheap`/`balanced`/`deep`), a label-driven leaf classifier, and
+  the model map. `deep` is an OMISSION (inherit the session model), not a pin — we
+  only ever route DOWN, and only exploration leaves.
+- **Anti-downgrade guard** in `workflows-lint.js` (`modelRoutingViolations`),
+  folded into `validateContract`, so `byan-lint-workflows` and the pre-commit gate
+  reject any protected (implement/verify/analysis) leaf carrying a downgrade or any
+  unknown model literal.
+- **Conservative application.** Of 19 exploration-labelled leaves, only 5 are
+  downgraded to `haiku` (`dev-story:load-story` + the 4 excalidraw
+  `load-resources`). An adversarial review pass (3 skeptics) caught 4 candidates
+  whose output feeds a downstream gate/score without a re-read
+  (`document-discovery`, `parse-epics`, the two `discover-tests`); those were
+  reverted to `deep`.
+- **Regression guard** `test/native-routing-integration.test.js` pins the invariant
+  on the shipped scripts. Contract documented in `docs/native-workflows-contract.md`.
+### Added - Model-suitability ledger (advisory learning layer above the routing floor)
+The static routing floor does not widen itself. The suitability ledger learns,
+per `(model x leaf)`, whether a cheap model proved adequate, and advises keep /
+watch / demote — above the floor, with a human deciding. It does not edit routing
+and the linter floor stays the hard gate.
+- **Math** `_byan/mcp/byan-mcp-server/lib/suitability.js`: a Beta-Bernoulli
+  posterior, pure and deterministic (no clock/RNG/IO). The verdict reads the
+  credible LOWER bound, so a thin sample stays `watch` (a high mean over 3 runs
+  is not `keep-cheap`); `keep-cheap` needs roughly 30 clean outcomes.
+- **Store** `lib/suitability-store.js`: the sole write path, atomic tmp+rename,
+  best-effort no-op that does not throw or corrupt the ledger on a failed write.
+- **Feeder** `lib/suitability-feeder.js`: maps an adversarial-panel verdict to a
+  binary outcome (at least half refute = flagged).
+- **MCP tools** `byan_suitability_record` / `byan_suitability_report`, **CLI**
+  `bin/byan-suitability.js` (read-only), and **skill** `byan-suitability` (the
+  hybrid wiring: the script returns DATA, the skill records via MCP).
+- 38 unit tests. Auto-promotion is deferred (phase 2) so a hot-hand streak cannot
+  slip a downgrade past human review.
+### Changed - Widened the safe-downgrade set (5 -> 11 leaves)
+An adversarial panel (one skeptic per leaf, each asked to PROVE the leaf is
+analysis) re-judged 6 deep exploration leaves whose output is re-read or
+re-synthesized by a later Opus step. The 6 cleared as genuine reads and now run
+on haiku, doubling the downgraded set:
+- document-project: scan-existing-docs (renamed from existing-docs) and source-tree
+- the four excalidraw context leaves: read-context (wireframe), read-requirements
+  (flowchart), context-scan (dataflow), parse-spec-intent (diagram)
+Five labels were honestly renamed so the deterministic classifier reads them as
+exploration; a leaf the panel found to be genuine analysis would have stayed deep.
+The 4 earlier reverts (document-discovery, parse-epics, the two discover-tests)
+were re-checked with token net-math and stay deep: adding a re-read is net-negative
+or marginal there. native-routing-integration.test.js floor raised 1 -> 11; the
+panel verdicts seed the suitability ledger.
 ## [2.20.1] - 2026-06-04
 ### Fixed - Post-audit hotfix (adversarial self-audit of 2.20.0)

package/install/templates/.claude/CLAUDE.md CHANGED

@@ -3,6 +3,7 @@
 > Projet propulse par BYAN (Merise Agile + TDD + 71 Mantras)
 > Installer: `npx create-byan-agent`
 > GitHub: https://github.com/Yan-Acadenice/BYAN
+> Carte du systeme de fichiers (agents, workflows, commandes, projets): voir `_byan/INDEX.md` (genere par `byan-build-index`)
 ## Hermes - Dispatcher Universel
@@ -37,10 +38,29 @@ Voir @.claude/rules/hermes-dispatcher.md pour les commandes Hermes.
 - Simplicite d'abord - Rasoir d'Ockham (Mantra #37)
 - Challenge Before Confirm - Valider avant d'accepter (Mantra IA-16)
+## L'agent dans l'equipe BYAN
+Les agents BYAN forment une equipe — leurs personnalites complementaires se renforcent. Diversifier la personnalite, c'est elargir la surface de competence collective.
+Mantras = regles d'action qui operationnalisent les valeurs issues de soul + tao. Chaine : Soul/Tao -> Valeurs -> Mantras -> Comportement.
+```
+Soul (identite)
+  + Tao (voix)
+    -> Valeurs (lignes rouges, convictions)
+      -> Mantras (regles d'action)
+        -> Comportement
+```
+Cette chaine s'incarne dans chaque agent ; l'equipe complete la couvre dans toutes ses dimensions.
+Doctrine d'equipe complete (template role-in-team, analogie orchestre, principes de complementarite) : voir @.claude/rules/team-doctrine.md
 ## Commandes Utiles
 - `@hermes` → Dispatcher universel (recommandations, routage, pipelines)
 - Agent disponibles: voir @.claude/rules/byan-agents.md
+- Doctrine d'equipe: voir @.claude/rules/team-doctrine.md
 - Methodologie: voir @.claude/rules/merise-agile.md
 - Systeme de confiance epistemique: voir @.claude/rules/elo-trust.md
 - Protocol fact-check scientifique: voir @.claude/rules/fact-check.md
@@ -88,6 +108,6 @@ Protocole : lock du scope -> build complet -> self-verify >= 3 passes -> complet
 - Outils MCP : `byan_strict_lock_scope`, `byan_strict_self_verify`, `byan_strict_complete`, `byan_strict_status`, `byan_strict_abort`, `byan_strict_suggest`
 - Activation : `byan_fd_start strict:true`, skill `byan-strict`, ou mots-cles (prod, client, livrable...)
 - Filet final : `.githooks/pre-commit` bloque le commit si une session strict est engagee mais non completee
-- Persistance : sessions poussees vers l'API byan_web (autorite ; local = miroir/fallback offline)
+- Persistance : sessions poussees vers l'API byan_web (autorite ; local = miroir/fallback offline) via `lib/strict-sync.js` ; migration `033` + `routes/strict-sessions.js` cote byan_web
 Detail complet : voir @.claude/rules/strict-mode.md

package/install/templates/.claude/rules/byan-agents.md CHANGED

@@ -65,6 +65,7 @@
 | `quick-spec` | Spec rapide conversationnelle |
 | `quick-dev` | Dev rapide (brownfield) |
 | `elo-workflow` | Consulter et gerer le score ELO (via menu [ELO] du BYAN) |
+| `byan-sync-rules` | Regenerer les artefacts du mode strict depuis strict-mode.yaml |
 ## Comment Invoquer un Agent

package/install/templates/.claude/rules/hermes-dispatcher.md CHANGED

@@ -34,6 +34,7 @@ Quand un utilisateur decrit une tache, Hermes recommande le bon agent:
 | creer agent, workflow, module | byan (Builder) |
 | brainstorm, idees, innovation | brainstorming-coach (Carson) |
 | optimiser, tokens, performance | carmack (Optimizer) |
+| prod, livrable, complet, anti-downgrade | skill byan-strict (Strict Mode) |
 ## Pipelines Predefinies

package/install/templates/.claude/rules/team-doctrine.md ADDED

@@ -0,0 +1,102 @@
+# Doctrine d'equipe BYAN
+> Tout agent BYAN est un membre d'equipe avant d'etre un specialiste.
+> Sa singularite n'a de sens que par contraste avec ses pairs.
+## Enonces canoniques
+**1. L'equipe avant l'individu.**
+Les agents BYAN forment une equipe — leurs personnalites complementaires se renforcent. Diversifier la personnalite, c'est elargir la surface de competence collective.
+**2. La chaine doctrinale.**
+Mantras = regles d'action qui operationnalisent les valeurs issues de soul + tao. Chaine : Soul/Tao -> Valeurs -> Mantras -> Comportement.
+## Schema de la chaine
+```
+Soul (identite)
+  + Tao (voix)
+    -> Valeurs (lignes rouges, convictions)
+      -> Mantras (regles d'action)
+        -> Comportement
+          -> Equipe (N agents complementaires)
+            <- orchestre par Hermes (dispatcher)
+```
+## Analogie orchestre
+| Element BYAN | Equivalent musical |
+|--------------|--------------------|
+| Soul | Le musicien (identite) |
+| Tao | Le timbre (signature sonore) |
+| Valeurs | L'ethique de l'interpretation |
+| Mantras | Les techniques de jeu |
+| Equipe | L'orchestre (N voix complementaires) |
+| Hermes | Le chef d'orchestre (dispatch) |
+| Workflows | La partition |
+Un soliste isole peut briller. Un orchestre couvre toutes les frequences. BYAN est un orchestre — chaque agent occupe une frequence specifique, complementaire des autres.
+## Principes de complementarite
+1. **Singularite obligatoire** — Deux agents ne peuvent pas avoir le meme role. Si un agent existe deja pour la tache, ne pas en creer un nouveau : enrichir l'existant.
+2. **Couverture totale** — L'equipe complete doit couvrir l'ensemble du cycle (analyse, planning, dev, test, docs, innovation, meta).
+3. **Voix distinctes** — Le tao d'un agent doit le distinguer auditivement des autres (registre, signatures, vocabulaire).
+4. **Convictions explicites** — Les valeurs (lignes rouges) doivent etre nommees, pas implicites.
+## Quand activer la doctrine
+- **Party-mode** : invocation parallele de plusieurs agents — chacun apporte sa frequence propre
+- **Multi-agent dispatch** : Hermes choisit en fonction du role, pas du hasard
+- **Brainstorm collaboratif** : la diversite de personnalites genere plus d'angles
+- **Creation d'agent** : verifier qu'il n'y a pas redondance avec un membre existant
+## Template canonique : section role-in-team
+Tout agent BYAN primaire doit contenir une section `## Mon role dans l'equipe BYAN` structuree ainsi :
+```markdown
+## Mon role dans l'equipe BYAN
+**Persona** : {{nom de la persona, ex: Mary, Winston, Amelia}}
+**Frequence** : {{une phrase qui resume la voix singuliere de l'agent}}
+**Specialite** : {{ce que cet agent fait que personne d'autre ne fait aussi bien}}
+**Mes complementaires directs** :
+- `@{{agent-X}}` — {{relation : avant moi, apres moi, en parallele, en miroir}}
+- `@{{agent-Y}}` — {{relation}}
+**Quand m'invoquer** :
+- {{scenario 1 declencheur}}
+- {{scenario 2 declencheur}}
+**Quand NE PAS m'invoquer** :
+- {{cas ou un autre agent est plus adapte}} → preferer `@{{autre-agent}}`
+```
+### Regles de remplissage
+1. **Persona** : extraite du frontmatter `description` ou du soul.md (champ `persona`/`nom`).
+2. **Frequence** : 1 phrase, derivee du tao.md (registre, signatures verbales). Si pas de tao : extraire du soul.md.
+3. **Specialite** : 1 phrase qui distingue cet agent de tous les autres. Si on peut la dire d'un autre agent, c'est rate.
+4. **Complementaires** : minimum 2, maximum 4. Lister les agents avec qui celui-ci collabore en pipeline ou en parallele.
+5. **Quand m'invoquer** : 2 a 4 scenarios concrets (mots-cles utilisateur).
+6. **Quand NE PAS m'invoquer** : minimum 1 cas avec redirection explicite.
+### Anti-pattern
+```markdown
+## Mon role dans l'equipe BYAN
+Je suis un agent BYAN. Je fais des trucs utiles.
+Invoquez-moi quand vous avez besoin de moi.
+```
+C'est du generique. Un agent qui ne sait pas se distinguer de ses pairs n'a pas sa place dans l'orchestre.
+## References
+- Activation soul/tao : `_byan/core/activation/soul-activation.md`
+- Soul de BYAN (createur du systeme) : `_byan/soul.md`
+- Hermes dispatcher : `.claude/rules/hermes-dispatcher.md`
+- Liste complete des agents : `.claude/rules/byan-agents.md`

package/install/templates/.claude/skills/byan-byan/SKILL.md CHANGED

@@ -56,11 +56,16 @@ Never call `byan_update_apply` without explicit user consent. That tool returns
 ### Phase 4 — DISPATCH
 - **Who** : you + user. Route each feature to the right BYAN component.
-- **Decision table** per feature :
-  - **Score < 15** → inline main-thread, no subagent
-  - **Score 15-39 parallelizable** → agent-subagent-worktree (use `byan_dispatch` MCP tool to verify)
-  - **Score 15-39 sequential** → mcp-worker-haiku
-  - **Score ≥ 40** → main-thread-opus or delegate to `byan-hermes-dispatch`
+- **Decision table** per feature — TWO independent axes (`byan_dispatch` returns both) :
+  - **Strategy** (WHERE it runs), from the score :
+    - **Score < 15** → inline main-thread, no subagent
+    - **Score 15-39 parallelizable** → agent-subagent-worktree
+    - **Score 15-39 sequential** → mcp-worker
+    - **Score ≥ 40** → main-thread (heavy) or delegate to `byan-hermes-dispatch`
+  - **Model tier** (WHICH model), from the task NATURE — not its size (`byan_dispatch` returns it as `model`, via native-tiers, the single source of truth) :
+    - nature `exploration` (load/read/scan/list/parse/fetch...) → `haiku`
+    - nature `implementation` / `verification` / `analysis` / unknown → deep = **inherit the session model**
+    - Keep protected work (verify/analysis/implement) off haiku regardless of size ; no pin-up to opus. Pass an explicit `nature` to `byan_dispatch` when you know it.
 - **Output** : a table `{ feature → specialist → model → strategy → estimated_tokens }`.
 - **If no specialist matches** : halt. Ask user whether to run INT (agent recruitment) first. Do NOT fallback silently to general-purpose.
 - **Exit gate** : user validates the mapping.
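
The two-axis decision table above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's `dispatch.js` or `native-tiers.js`: the function names (`classifyNature`, `strategyFor`, `dispatch`) are hypothetical, the score is taken as an input rather than computed from the task, the exploration verb list is copied from the Phase 4 text, and the `balanced` tier is left out because the diff does not state which model it maps to.

```javascript
// Hypothetical sketch of the two independent axes; not the shipped implementation.
const TIER_MODEL = { cheap: 'haiku', deep: null }; // null = inherit the session model

// Verb list taken from the Phase 4 text; the real classifier may recognise more.
const EXPLORATION_VERBS = ['load', 'read', 'scan', 'list', 'parse', 'fetch'];
const NATURES = ['exploration', 'implementation', 'verification', 'analysis'];

// Classify from the leading word only, defaulting to the protected nature.
function classifyNature(task) {
  const words = task.toLowerCase().split(/[^a-z]+/);
  const first = words.find(Boolean); // first non-empty token, or undefined
  return EXPLORATION_VERBS.includes(first) ? 'exploration' : 'implementation';
}

// STRATEGY axis: where the work runs, from the score bands in the table.
function strategyFor(score, parallelizable) {
  if (score < 15 || score >= 40) return 'main-thread';
  return parallelizable ? 'agent-subagent-worktree' : 'mcp-worker';
}

// TIER axis: which model, from the nature only. Size never changes the tier.
function dispatch({ task, score, parallelizable = false, nature }) {
  const resolved = NATURES.includes(nature) ? nature : classifyNature(task);
  const tier = resolved === 'exploration' ? 'cheap' : 'deep';
  return {
    score,
    strategy: strategyFor(score, parallelizable),
    nature: resolved,
    tier,
    model: TIER_MODEL[tier],
    parallelizable,
  };
}
```

In this sketch an invalid or missing `nature` falls back to text classification, and an unmatched task lands on `implementation`, so a classification miss keeps the session model rather than downgrading.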

package/install/templates/.claude/skills/byan-byan-test/SKILL.md CHANGED

@@ -8,5 +8,5 @@ description: BYAN Test - Token Optimized Version (-46%)
 ## Rules
 - This is a TEST version of BYAN optimized for token reduction (-46%)
-- Full agent: _byan/agent/byan-test/byan-test.md (new layout); if absent, _byan/*/agents/byan-test.md (legacy layout). 116 lines vs 215 original
+- Full agent: _byan/bmb/agents/byan-test.md (116 lines vs 215 original)
 - Original BYAN still available via bmad-agent-byan

package/install/templates/.claude/skills/byan-hermes-dispatch/SKILL.md CHANGED

@@ -50,18 +50,19 @@ Match keywords against the routing table below. Pick the single best match. If n
 ### 3. Pick the execution strategy (MCP call)
-Call the `byan_dispatch` MCP tool with `{ task: <goal>, parallelizable: <bool> }`. It returns `{ strategy, score, reasoning }` where strategy is one of :
+Call the `byan_dispatch` MCP tool with `{ task: <goal>, parallelizable: <bool>, nature?: <leaf-type> }`. It returns `{ score, strategy, nature, tier, model, reasoning }` — TWO independent axes :
-- `main-thread` — do it inline, no delegation
-- `agent-subagent-worktree` — spawn Agent tool with isolation worktree
-- `mcp-worker-haiku` — spawn Agent tool with Haiku model, no worktree
-- `main-thread-opus` — keep in the current thread (don't delegate, Opus needed)
+- **strategy** (WHERE it runs), from the score :
+  - `main-thread` — do it inline, no delegation
+  - `agent-subagent-worktree` — spawn Agent tool with isolation worktree
+  - `mcp-worker` — spawn Agent tool, no worktree
+- **model** (WHICH model), from the task NATURE via native-tiers, not its size : `haiku` (exploration only) or `null` = deep (inherit the session model). Pass an explicit `nature` (`exploration`/`implementation`/`verification`/`analysis`) when you know it; protected natures stay off haiku.
 ### 4. Spawn the work
-Depending on strategy :
+Depending on strategy (apply the returned `model` whenever you spawn) :
-**`main-thread` or `main-thread-opus`** : do not spawn. Execute inline yourself.
+**`main-thread`** : do not spawn. Execute inline yourself — the work runs on the session model.
 **`agent-subagent-worktree`** : call the Agent tool with :
 ```
@@ -76,7 +77,9 @@ prompt: |
   When done, write a concise report (< 200 words).
 ```
-**`mcp-worker-haiku`** : same Agent tool call but without `isolation`, and add `model: "haiku"` in the prompt's instruction block if the receiving subagent honors it.
+**`mcp-worker`** : same Agent tool call but without `isolation`. Set the Agent's `model` to the returned `model` — `haiku` for exploration nature, otherwise omit `model` to inherit the session model. The tier follows the task nature, not its size.
+For any spawned strategy : pass `model` to the Agent tool when it is non-null; omit it when null so the subagent inherits the session model.
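
The spawn rule above (set `model` only when it is non-null) can be sketched like this. The helper name `agentOptionsFor` and the option-object shape are assumptions made for illustration; only the `isolation` and `model` notions come from the skill text, and this is not a documented Agent tool schema.

```javascript
// Hypothetical helper: turns a byan_dispatch-style result into spawn options.
function agentOptionsFor({ strategy, model }, prompt) {
  // main-thread work is executed inline, so there is nothing to spawn.
  if (strategy === 'main-thread') return null;

  const options = { prompt };
  // Only the worktree strategy asks for isolation.
  if (strategy === 'agent-subagent-worktree') options.isolation = 'worktree';
  // Omit the key entirely when model is null so the subagent inherits the session model.
  if (model != null) options.model = model;
  return options;
}
```

Leaving the key out, rather than passing `model: null`, keeps "inherit" as an omission instead of a value the receiving side has to interpret.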
 ### 5. Specialist stub path lookup

package/install/templates/.claude/skills/byan-orchestrate/SKILL.md CHANGED

@@ -9,7 +9,7 @@ You compose three existing building blocks into one multi-role flow :
 | Block | Role |
 |-------|------|
-| `byan_dispatch` MCP tool | Per-task execution strategy (main-thread / agent-subagent-worktree / mcp-worker-haiku / main-thread-opus) + complexity score |
+| `byan_dispatch` MCP tool | Per-task strategy (main-thread / agent-subagent-worktree / mcp-worker) from the score + model tier by NATURE (haiku for exploration, else inherit the session model) + complexity score |
 | `byan-hermes-dispatch` skill | Specialist lookup (architect, dev, analyst, …) from a routing table |
 | `party-mode-native` workflow | Parallel spawn via Agent tool + worktree + coordination JSON |
@@ -42,7 +42,7 @@ Use this a priori mapping — override only if the task clearly needs more :
 | architect, quinn, tea, creative-problem-solver | opus | Deep reasoning, trade-offs |
 | carmack, rachid, marc, patnote | haiku | Narrow mechanical tasks |
-Then call `byan_dispatch` with each role's goal to get a complexity score. If the score demands a different tier (score >= 40 → bump to opus ; score < 15 → inline, no subagent), **override the default for that role**.
+Then call `byan_dispatch` with each role's goal (and `nature` when known). Use its `score` for the STRATEGY only (score < 15 → inline, no subagent ; 15-39 → subagent/worker ; ≥ 40 → keep the heavy role in the main thread) and its nature-based `model` as the tier signal. The score sets WHERE the role runs, not WHICH model — keep protected roles (verify/analysis/implement) off haiku regardless of size, and avoid pinning a role up to opus on size alone. The per-role table above is the a-priori floor; `byan_dispatch`'s nature `model` refines it.
 ### 3. Compute the execution plan
@@ -69,7 +69,7 @@ Group roles by `parallelizable_with` graph. For each parallel cluster :
 - If cluster has N > 1 roles AND all use `agent-subagent-worktree` strategy → use the **party-mode-native** workflow : `coordination.initSession(roles, …)`, then dispatch all Agent tool calls in a single message.
 - If cluster has N = 1 OR strategy = `main-thread` → execute inline in the current turn.
-- If strategy = `mcp-worker-haiku` → spawn an Agent tool call WITHOUT worktree (faster boot, single-turn).
+- If strategy = `mcp-worker` → spawn an Agent tool call WITHOUT worktree (faster boot, single-turn) ; set the Agent's model to the role's nature-based `model` (haiku for exploration, omit otherwise to inherit the session model).
 For each Agent tool call, the prompt must start with :
 ```

package/install/templates/.claude/skills/byan-suitability/SKILL.md ADDED

@@ -0,0 +1,71 @@
+---
+name: byan-suitability
+description: Advisory model-suitability ledger — record adversarial verdicts, read learned ratings, human decides downgrades
+---
+# BYAN Model-Suitability Ledger (advisory)
+This skill operates the model-suitability ledger: a registry, keyed by
+`(model x leaf)`, that learns from outcomes whether a CHEAP model is safe on a
+given workflow leaf. It is the learning layer that sits ABOVE the static
+conservative default and the linter floor — it does not weaken either. It only
+advises; a human decides whether to keep, watch, or demote a downgrade.
+## What this is NOT
+- It does not edit `.claude/workflows/*.js`. Zero auto-edit of routing.
+- It does not touch the linter floor (`workflows-lint.js`). The floor still
+  blocks a protected-leaf downgrade at commit time, regardless of the ledger.
+- It does not auto-promote or auto-demote. The verdict is a recommendation for a
+  human, not an action. (Auto-promotion is a deferred phase-2 capability and was
+  deliberately killed in design review: a hot-hand streak must not slip a
+  downgrade past human review.)
+## The math (why a thin sample does not say "keep")
+Each `(model x leaf)` pair holds a Beta-Bernoulli posterior over the cheap
+model's adequacy rate. The verdict reads the credible interval, not the point
+estimate:
+- `keep-cheap` — the credible LOWER bound is at or above the keep threshold
+  (default 0.85). Only sustained success earns this (~30 clean outcomes).
+- `demote` — the credible UPPER bound is at or below the demote threshold
+  (default 0.70). Clear evidence the cheap model fails too often.
+- `watch` — anything in between, including every thin sample. A wide interval
+  (low n) lands here, so "92% over 3 runs" reads as `watch`, not `keep-cheap`.
+The report surfaces the lower bound and `n` by design, not a bare percentage,
+because the same point estimate means different things at n=3 and n=300.
+## Wiring — feeder B (the hybrid pattern)
+The signal is the adversarial VALIDATE pass: N skeptics (an odd panel, e.g. 3)
+each try to REFUTE that the cheap model is adequate on one downgraded leaf. A
+leaf is flagged (cheap inadequate) when at least half refute.
+A `.claude/workflows/*.js` script cannot call MCP tools or write state (the
+sandbox / state-coupling rule). So the wiring is hybrid:
+1. The adversarial pass returns its per-leaf verdicts as DATA:
+   `[{ model, leafId, refutedVotes, totalVotes }, ...]`.
+2. On the main-thread turn (where MCP tools fire), map each verdict to an
+   outcome with `verdictsToOutcomes` from
+   `_byan/mcp/byan-mcp-server/lib/suitability-feeder.js`
+   (`success = the cheap model survived the panel`).
+3. For each outcome, call the MCP tool `byan_suitability_record`
+   (`{ model, leafId, success, source: 'adversarial-pass' }`). This is the only
+   write path to the ledger; `record` is best-effort and does not throw.
+## Reading the ledger
+- MCP: `byan_suitability_report` (optional `model` filter) — returns advisory
+  rows, most-actionable first, each with the lower bound, `n`, and a verdict.
+- CLI: `node _byan/mcp/byan-mcp-server/bin/byan-suitability.js [--model haiku] [--json]`
+  — the same data, read-only.
+## Honest caveat
+Today only a handful of exploration leaves are downgraded, and all are already
+cheap, so the ledger produces little actionable signal in the short term. This
+is foundation — an evidence rail for when the workflow leaf-set grows — not an
+immediate token win. Do not oversell a `keep-cheap` on a thin `n`.
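
The verdict rule and the feeder rule described in this skill can be sketched as below. This is an illustration under assumptions, not the package's `suitability.js` or `suitability-feeder.js`: it assumes a uniform Beta(1, 1) prior and a tail mass of 0.025 on each side, neither of which the source states, and the function names are hypothetical. The 0.85 and 0.70 thresholds are the defaults quoted in the skill text.

```javascript
// Hypothetical sketch of the Beta-Bernoulli verdict; pure, no clock, RNG, or IO.
const KEEP = 0.85;   // keep threshold (default quoted in the skill text)
const DEMOTE = 0.70; // demote threshold (default quoted in the skill text)

// CDF of Beta(a, b) at x for integer a, b >= 1, using the identity
// I_x(a, b) = P(Binomial(a + b - 1, x) >= a). Adequate for small counts;
// large counts would need log-space arithmetic to avoid underflow.
function betaCdf(x, a, b) {
  const n = a + b - 1;
  let term = Math.pow(1 - x, n); // binomial term for j = 0
  let below = 0;                 // accumulates P(Binomial < a)
  for (let j = 0; j < a; j++) {
    below += term;
    term *= ((n - j) / (j + 1)) * (x / (1 - x)); // step to the j + 1 term
  }
  return 1 - below;
}

// Quantile by bisection; valid because the CDF is monotone in x.
function betaQuantile(q, a, b) {
  let lo = 0;
  let hi = 1;
  for (let i = 0; i < 50; i++) {
    const mid = (lo + hi) / 2;
    if (betaCdf(mid, a, b) < q) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}

// The verdict reads the interval bounds, never the point estimate.
function verdict(successes, failures, tail = 0.025) {
  const a = successes + 1; // uniform prior adds one pseudo-success
  const b = failures + 1;  // and one pseudo-failure
  const lower = betaQuantile(tail, a, b);
  const upper = betaQuantile(1 - tail, a, b);
  const n = successes + failures;
  if (lower >= KEEP) return { verdict: 'keep-cheap', lower, upper, n };
  if (upper <= DEMOTE) return { verdict: 'demote', lower, upper, n };
  return { verdict: 'watch', lower, upper, n };
}

// Feeder rule from the text: a leaf is flagged when at least half the panel refutes.
function survivedPanel({ refutedVotes, totalVotes }) {
  return refutedVotes * 2 < totalVotes;
}
```

Under these assumed settings, three clean runs give a lower bound of roughly 0.40 and stay `watch`, while thirty clean runs give roughly 0.89 and reach `keep-cheap`. A different prior or credible level would move that crossover.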

package/install/templates/.claude/workflows/create-excalidraw-dataflow.js CHANGED

@@ -98,7 +98,7 @@ const context = await agent(
     `Request level=${JSON.stringify(level)}; requirements=${JSON.stringify(requirements)}. ` +
     `Report which of (level, processes, data stores, external entities) are clear and which are missing. ` +
     `Per the source, if ALL requirements are clear we may skip directly to structure planning.`,
-  { label: 'context-analysis', phase: 'CONTEXT' }
+  { label: 'context-scan', model: 'haiku', phase: 'CONTEXT' }
 )
 // --- Steps 1-4: Level, Requirements, Theme, Plan structure -----------------
@@ -129,7 +129,7 @@ const resources = await agent(
     `- helpers: ${HELPERS} (standard DFD notation + Excalidraw element shapes)\n` +
     `Report the element templates (process ellipse, data-store rectangle/parallel-lines, external-entity ` +
     `rectangle, labeled-arrow) and the color/stroke values you will apply. Plan structure: ${JSON.stringify(plan)}`,
-  { label: 'load-resources', phase: 'RESOURCES' }
+  { label: 'load-resources', phase: 'RESOURCES', model: 'haiku' }
 )
 // --- Step 6: Build DFD Elements --------------------------------------------

package/install/templates/.claude/workflows/create-excalidraw-diagram.js CHANGED

@@ -63,7 +63,7 @@ const analysis = await agent(
     `From this, extract and report a normalized structured intent: the resolved diagram type, the exhaustive list of ` +
     `components/entities, the exhaustive list of relationships (with direction), and the notation rules that apply ` +
     `for that type. If the spec is contradictory or missing a relationship endpoint, flag it explicitly (do not invent).`,
-  { label: 'contextual-analysis', phase: 'ANALYZE' }
+  { label: 'parse-spec-intent', model: 'haiku', phase: 'ANALYZE' }
 )
 // --- Source step 5: Plan Diagram Structure ---------------------------------
@@ -89,7 +89,7 @@ const resources = await agent(
     `Merge the chosen theme into the template. Theme = ${JSON.stringify(theme)}.\n` +
     `Report the resolved template skeleton, the available library items, and the merged theme color map ` +
     `(component fill, database fill, service fill, border/accent stroke, text stroke #1e1e1e, arrow stroke).`,
-  { label: 'load-resources', phase: 'LOAD-RESOURCES' }
+  { label: 'load-resources', phase: 'LOAD-RESOURCES', model: 'haiku' }
 )
 // --- Source step 7: Build Diagram Elements ---------------------------------

package/install/templates/.claude/workflows/create-excalidraw-flowchart.js CHANGED

@@ -102,7 +102,7 @@ const context = await agent(
     `decisionPoints=${JSON.stringify(decisionPoints)} outputFile=${JSON.stringify(outputFile)} ` +
     `theme=${theme ? 'provided' : 'none (will default to Professional Blue palette)'}.\n` +
     `Do NOT ask questions — those were answered at the human gate. Just confirm the understanding in 2-3 lines.`,
-  { label: 'context-restate', phase: 'CONTEXT' }
+  { label: 'read-requirements', model: 'haiku', phase: 'CONTEXT' }
 )
 // === STEP 4 (PLAN) ==========================================================
@@ -127,7 +127,7 @@ const resources = await agent(
     `Merge the theme colors (${theme ? JSON.stringify(theme) : 'Professional Blue default: fill #e3f2fd, accent #1976d2, decision #fff3e0, text #1e1e1e'}) ` +
     `onto the template. Report which template fields the flowchart will use and the resolved color palette. ` +
     `If a file is missing, say so explicitly — do not invent its contents.`,
-  { label: 'load-resources', phase: 'RESOURCES' }
+  { label: 'load-resources', phase: 'RESOURCES', model: 'haiku' }
 )
 // === STEP 6 (BUILD) =========================================================

package/install/templates/.claude/workflows/create-excalidraw-wireframe.js CHANGED

@@ -87,7 +87,7 @@ const context = await agent(
     `device=${JSON.stringify(device)}, theme=${JSON.stringify(theme)}, output=${JSON.stringify(outputFile)}.\n` +
     `Restate these requirements cleanly and flag any that are still ambiguous (do NOT ask the user — ` +
     `this engine runs headless; surface ambiguity as a note for the gate).`,
-  { label: 'context', phase: 'CONTEXT' }
+  { label: 'read-context', model: 'haiku', phase: 'CONTEXT' }
 )
 // --- STEP 5: Plan Wireframe Structure --------------------------------------
@@ -110,7 +110,7 @@ const resources = await agent(
     `- the chosen theme: ${JSON.stringify(theme)} (use a theme.json if one exists).\n` +
     `Summarize the wireframe template primitives, the relevant library elements, the theme color tokens, ` +
     `and the element-creation constraints from helpers (grid 20px, containerId on text, grouping).`,
-  { label: 'load-resources', phase: 'LOAD' }
+  { label: 'load-resources', phase: 'LOAD', model: 'haiku' }
 )
 // --- STEP 7: Build Wireframe Elements --------------------------------------

package/install/templates/.claude/workflows/dev-story.js CHANGED

@@ -54,7 +54,7 @@ const loaded = await agent(
     `Read the COMPLETE story file. Parse Story, Acceptance Criteria, Tasks/Subtasks, Dev Notes, File List, Status. ` +
     `Identify the FIRST incomplete task (unchecked [ ]). Report the story key and that task. ` +
     `If no story is found or the file is inaccessible, say so explicitly (do not invent one).`,
-  { label: 'load-story', phase: 'LOAD' }
+  { label: 'load-story', phase: 'LOAD', model: 'haiku' }
 )
 phase('RGR')

package/install/templates/.claude/workflows/document-project.js CHANGED

@@ -94,7 +94,8 @@ const existingDocs = await agent(
   'and the owning part id when multi-part. ' +
   'Do NOT ask the user for extra focus areas — that is a human gate; just return the inventory.',
   {
-    label: 'existing-docs',
+    label: 'scan-existing-docs',
+    model: 'haiku',
     phase: 'EXISTING_DOCS',
     schema: {
       type: 'object',
@@ -212,6 +213,7 @@ const sourceTree = await agent(
   'Produce the content for source-tree-analysis.md.',
   {
     label: 'source-tree',
+    model: 'haiku',
     phase: 'SOURCE_TREE',
     schema: {
       type: 'object',
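
The `model: 'haiku'` literals added to the workflow leaves in these hunks are what the CHANGELOG's anti-downgrade guard is described as checking. A minimal sketch of that kind of check follows. It is not the package's `workflows-lint.js`: the function name, the leaf record shape (`label`, `nature`, `model`), and the allow-list are assumptions, and each leaf's nature is taken as already classified rather than derived from its label.

```javascript
// Hypothetical sketch of an anti-downgrade check over already-classified leaves.
const KNOWN_MODELS = new Set(['haiku']); // assumed allow-list of model literals

function modelRoutingViolationsSketch(leaves) {
  const violations = [];
  for (const { label, nature, model } of leaves) {
    // No model set means "deep": the leaf inherits the session model.
    if (model === undefined) continue;
    if (!KNOWN_MODELS.has(model)) {
      violations.push(`${label}: unknown model literal "${model}"`);
    } else if (nature !== 'exploration') {
      // Anything that is not exploration counts as protected, including unknown natures.
      violations.push(`${label}: ${nature} leaf must not carry a downgrade`);
    }
  }
  return violations;
}
```

Treating every non-exploration nature as protected mirrors the conservative default stated in the CHANGELOG: a leaf the classifier cannot place keeps the session model.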

package/install/templates/.githooks/pre-commit CHANGED

@@ -1,8 +1,9 @@
 #!/usr/bin/env bash
-# BYAN pre-commit hook. Three gates run in order:
+# BYAN pre-commit hook. Four gates run in order:
 #   1. Strict Mode gate  : block if a strict session is engaged but not completed.
 #   2. Native-workflow lint : block if a .claude/workflows/*.js couples to state.
-#   3. Mantra floor      : block if a Gen3 persona source scores below the floor.
+#   3. Template fidelity : block if install/templates/ drifted from root.
+#   4. Mantra floor      : block if a Gen3 persona source scores below the floor.
 #
 # Install :
 #   git config core.hooksPath .githooks
@@ -63,6 +64,23 @@ if [ -f "$WF_LINT" ]; then
   fi
 fi
+# Template fidelity gate — only install/templates/ ships on npm (package.json
+# files[]), but the dev code lives at root _byan/ and .claude/. Without a sync the
+# template drifts, and a published version can promise features its package does
+# not contain. This gate blocks a commit whose template has drifted from root on
+# any mirrored path; runtime seeds under _byan/memoire/ are excluded. Re-sync with
+# the apply command, then restage. No-op if the tool is absent.
+TEMPLATE_SYNC="_byan/mcp/byan-mcp-server/bin/byan-sync-template.js"
+if [ -f "$TEMPLATE_SYNC" ]; then
+  if ! node "$TEMPLATE_SYNC" --check --root "$(git rev-parse --show-toplevel)"; then
+    echo ""
+    echo "Commit blocked : install/templates/ has drifted from root."
+    echo "Re-sync with 'node $TEMPLATE_SYNC' then restage the template, or bypass"
+    echo "with 'git commit --no-verify' (emergency only)."
+    exit 1
+  fi
+fi
 if [ ! -f "$VALIDATOR" ]; then
   exit 0
 fi
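
The `--check` behaviour that this gate relies on (report drift, exit non-zero, write nothing) can be sketched as a pure function. This is an illustration under assumptions, not the package's `template-sync.js`: files are modelled as in-memory `Map<relativePath, content>` values so the example runs without touching disk, the function names are hypothetical, and the explicit target list mentioned in the CHANGELOG is left out.

```javascript
// Hypothetical sketch of template drift detection; performs no reads or writes.
const EXCLUDED_PREFIX = '_byan/memoire/'; // runtime seeds are never mirrored

function findDrift(templateFiles, rootFiles) {
  const drifted = [];
  // Walk the template, not root, so dev-only root files cannot leak into the package.
  for (const [path, content] of templateFiles) {
    if (path.startsWith(EXCLUDED_PREFIX)) continue;
    if (!rootFiles.has(path)) continue; // no root twin to compare against
    if (rootFiles.get(path) !== content) drifted.push(path);
  }
  return drifted.sort();
}

// Check mode: non-zero when anything drifted, mirroring a gate's exit status.
function checkExitCode(templateFiles, rootFiles) {
  return findDrift(templateFiles, rootFiles).length > 0 ? 1 : 0;
}
```

Because the comparison iterates over the template's own file set, a file that exists only at root never appears in the result, which is the perimeter property the CHANGELOG describes.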