npm - create-byan-agent - Versions diffs - 2.20.1 → 2.21.0 - Mend

create-byan-agent 2.20.1 → 2.21.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (90)

package/install/templates/_byan/mcp/byan-mcp-server/server.js CHANGED

@@ -19,6 +19,11 @@ import {
   abort as fdAbort,
   ALL_PHASES as FD_PHASES,
 } from './lib/fd-state.js';
+import {
+  record as suitabilityRecord,
+  reportLedger as suitabilityReport,
+  ledgerPath as suitabilityLedgerPath,
+} from './lib/suitability-store.js';
 import {
   requestReview,
   recordVerdict,
@@ -504,6 +509,37 @@ const tools = [
       additionalProperties: false,
     },
   },
+  {
+    name: 'byan_suitability_record',
+    description:
+      'Record one adequacy outcome for a (model x leaf) pair into the model-suitability ledger (advisory only). success=true means the cheap model was adequate on this leaf; false means it was not. Best-effort: a persistence failure degrades to { recorded: false } and never throws. This is the ONLY write path to the ledger (workflow scripts cannot write state).',
+    inputSchema: {
+      type: 'object',
+      properties: {
+        model: { type: 'string', description: 'Model tier/id the leaf ran on (e.g. haiku).' },
+        leafId: { type: 'string', description: 'Stable leaf label (e.g. load-story).' },
+        success: {
+          type: 'boolean',
+          description: 'true = cheap model adequate on this leaf; false = inadequate.',
+        },
+        source: { type: 'string', description: 'Optional provenance tag (e.g. adversarial-pass).' },
+      },
+      required: ['model', 'leafId', 'success'],
+      additionalProperties: false,
+    },
+  },
+  {
+    name: 'byan_suitability_report',
+    description:
+      'Read the model-suitability ledger as advisory ratings (most-actionable first). Each row carries the credible LOWER bound and the sample size n, never a bare point estimate, plus a verdict keep-cheap | watch | demote. ADVISORY ONLY: it never edits routing; a human decides. Optional model filter.',
+    inputSchema: {
+      type: 'object',
+      properties: {
+        model: { type: 'string', description: 'Optional: restrict to this model tier/id.' },
+      },
+      additionalProperties: false,
+    },
+  },
   {
     name: 'byan_strict_lock_scope',
     description:
@@ -1320,6 +1356,28 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
       return { content: [{ type: 'text', text: JSON.stringify(state, null, 2) }] };
     }
+    if (name === 'byan_suitability_record') {
+      const r = suitabilityRecord({
+        model: args.model,
+        leafId: args.leafId,
+        success: args.success,
+        source: args.source,
+      });
+      return { content: [{ type: 'text', text: JSON.stringify(r, null, 2) }] };
+    }
+    if (name === 'byan_suitability_report') {
+      const rows = suitabilityReport({ model: args.model });
+      return {
+        content: [
+          {
+            type: 'text',
+            text: JSON.stringify({ ledger: suitabilityLedgerPath(), advisory: true, rows }, null, 2),
+          },
+        ],
+      };
+    }
     if (name === 'byan_strict_lock_scope') {
       const r = strictLockScope({
         scopeText: args.scopeText,
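As a concrete reading of the new tool contract, here is a minimal client-side sketch that builds an MCP `tools/call` request for `byan_suitability_record` and applies the constraints its `inputSchema` declares: three required keys, their JSON types, and `additionalProperties: false`. The helper `buildSuitabilityRecordCall` is hypothetical and not part of the package. A real client would normally send the call through an MCP client library rather than assemble the JSON-RPC envelope by hand.

```javascript
// Hypothetical client-side helper (not part of the BYAN package): builds an MCP
// JSON-RPC "tools/call" request for byan_suitability_record and applies the
// same constraints its inputSchema declares.
const RECORD_FIELDS = { model: 'string', leafId: 'string', success: 'boolean', source: 'string' };
const RECORD_REQUIRED = ['model', 'leafId', 'success'];
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function buildSuitabilityRecordCall(id, args) {
  for (const key of Object.keys(args)) {
    // additionalProperties: false, so undeclared keys are rejected.
    if (!hasOwn(RECORD_FIELDS, key)) throw new Error(`unexpected argument: ${key}`);
    if (typeof args[key] !== RECORD_FIELDS[key]) {
      throw new Error(`argument ${key} must be a ${RECORD_FIELDS[key]}`);
    }
  }
  for (const key of RECORD_REQUIRED) {
    if (!hasOwn(args, key)) throw new Error(`missing required argument: ${key}`);
  }
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'byan_suitability_record', arguments: args },
  };
}

// Example: record that haiku was adequate on the load-story leaf.
console.log(JSON.stringify(buildSuitabilityRecordCall(1, {
  model: 'haiku',
  leafId: 'load-story',
  success: true,
  source: 'adversarial-pass',
})));
```

Validating before the call keeps a malformed outcome from ever reaching the server, whose own `record` path is best-effort and reports `{ recorded: false }` instead of throwing.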

package/install/templates/_byan/worker/workers.md CHANGED

@@ -260,7 +260,7 @@ function calculateComplexity(task) {
 }
 ```
-### Routing Logic
+### Routing Logic (legacy, score-only)
 ```javascript
 const score = calculateComplexity(task);
@@ -280,6 +280,37 @@ if (score < 30) {
 }
 ```
+### Routing Logic v2 (parallelizable-aware)
+Score alone is insufficient: two tasks with the same complexity can have
+very different optimal targets depending on whether they run **alongside
+siblings** (parallel) or **in sequence**. The v2 router adds a
+`parallelizable` axis and emits an **execution strategy**, not a model.
+Implementation: `src/core/dispatcher/execution-router.js` and the MCP
+tool `byan_dispatch` (both share the same table).
+```
+score < 15                           → main-thread
+score 15-39 + parallelizable: true   → agent-subagent-worktree
+score 15-39 + parallelizable: false  → mcp-worker-haiku
+score >= 40                          → main-thread-opus
+```
+Rationale:
+| Strategy | When | Why |
+|---|---|---|
+| `main-thread` | Trivial task | Spawning anything costs more than solving inline. |
+| `agent-subagent-worktree` | Medium parallel | Claude Code Agent tool with `isolation: "worktree"` amortizes boot cost across the wall-clock savings. |
+| `mcp-worker-haiku` | Medium sequential | Delegate to a lightweight Haiku worker via an MCP tool: no subagent boot, cheaper than the main thread. |
+| `main-thread-opus` | Complex | Reasoning depth needed; subagent boot + context handoff would waste more than the delegation saves. |
+Below a score of 15, the boot overhead of the Claude Code `Agent` tool
+(~5-10k tokens for system prompt + tools) costs more than delegation saves,
+so the task stays in-thread. At 40 and above, Opus reasoning depth starts
+to dominate decision value over delegation savings.
 ---
 ## Worker Lifecycle
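The v2 routing table reads naturally as a small pure function. The sketch below illustrates the table only; it is not the code in `execution-router.js`, which this diff does not show. It assumes the score is a number on the same 0-100 scale and that `parallelizable` matters only in the 15-39 band, as the table implies.

```javascript
// Hypothetical illustration of the v2 routing table; not the code in
// src/core/dispatcher/execution-router.js, which this diff does not show.
function routeExecution(score, parallelizable) {
  if (!Number.isFinite(score)) throw new TypeError('score must be a finite number');
  if (score < 15) return 'main-thread'; // trivial: solving inline beats any spawn
  if (score < 40) {
    // Medium band (15-39): only here does the parallelizable axis matter.
    return parallelizable ? 'agent-subagent-worktree' : 'mcp-worker-haiku';
  }
  return 'main-thread-opus'; // complex: keep reasoning depth on the main thread
}

for (const [score, par] of [[10, true], [25, true], [25, false], [55, true]]) {
  console.log(`score=${score} parallelizable=${par} -> ${routeExecution(score, par)}`);
}
```

Keeping the `parallelizable` check inside the medium band mirrors the table: trivial and complex tasks route the same way whether or not they have siblings.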
@@ -498,3 +529,42 @@ src/
 **Maintainer:** BYAN Core Team
 **Version:** 2.0.0
 **Status:** ✅ Production Ready
+---
+## Feature Development Workflow
+Every new BYAN feature or improvement follows the workflow anchored in the BYAN agent via the `[FD] Feature Development` command.
+### The 5 steps (none can be skipped)
+```
+BRAINSTORM → PRUNE → DISPATCH → BUILD → VALIDATE
+```
+| Step | Who | Role | Gate |
+|-------|-----|------|------|
+| BRAINSTORM | Agent Carson | Push raw ideas, YES AND | "Stop brainstorm" |
+| PRUNE | User + BYAN | Sort ideas, formulate the MVP, Ockham's Razor | Validated backlog |
+| DISPATCH | Worker: EconomicDispatcher | Map feature → BYAN building block | Validated mapping |
+| BUILD | Agent/Worker depending on score | Implement TDD-first, atomic commits | User review |
+| VALIDATE | MantraValidator + npm test | Score >= 80%, zero regressions | Green tests |
+### How to trigger
+In the BYAN agent (`@byan`):
+```
+FD          # direct command
+feature     # fuzzy match
+improve     # fuzzy match
+```
+### Dispatch rule for each feature
+```
+Score < 30  → Existing or new Worker (simple task)
+30–60       → Agent Sonnet (implementation, creation)
+>= 60       → Agent Opus (architecture, strategy)
+```
+Full workflow file: `_byan/workflow/simple/byan/feature-workflow.md`
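A minimal sketch of the two rules in this section: the five steps advance strictly in order, and a feature's score picks its builder. The names `FEATURE_PHASES`, `advancePhase`, `featureDispatch` and the target strings are hypothetical; the MCP server imports its own phase list from `lib/fd-state.js`, which is not shown. Because the rule writes "30–60" and ">= 60", the sketch assumes half-open bands, so a score of exactly 60 goes to Opus.

```javascript
// Hypothetical sketch (not lib/fd-state.js): strict step order plus the
// per-feature dispatch rule.
const FEATURE_PHASES = ['BRAINSTORM', 'PRUNE', 'DISPATCH', 'BUILD', 'VALIDATE'];

// Only the immediately following step is allowed; skipping or going back throws.
function advancePhase(current, requested) {
  const from = FEATURE_PHASES.indexOf(current);
  const to = FEATURE_PHASES.indexOf(requested);
  if (from === -1 || to === -1) throw new Error(`unknown phase: ${current} -> ${requested}`);
  if (to !== from + 1) throw new Error(`cannot move from ${current} to ${requested}`);
  return requested;
}

// Assumes half-open bands: [0, 30) worker, [30, 60) Sonnet, [60, 100] Opus.
function featureDispatch(score) {
  if (score < 30) return 'worker';
  if (score < 60) return 'agent-sonnet';
  return 'agent-opus';
}

// Walk the full sequence, then dispatch a mid-weight feature.
let phase = 'BRAINSTORM';
for (const next of FEATURE_PHASES.slice(1)) phase = advancePhase(phase, next);
console.log(phase, featureDispatch(45));
```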

package/install/templates/_byan/workflow/simple/byan/soul-memory-update.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: soul-memory-update
 description: "Structured protocol for writing an entry into an agent's soul-memory"
-version: "1.0.0"
+version: "1.1.0"
 module: byan
 ---
@@ -22,6 +22,7 @@ The agent asks itself 3 questions internally (NOT shown to the user):
 1. **RÉSONANCE**: "Did something in this exchange confirm or deepen one of my values?"
 2. **TENSION**: "Did something rub against my soul: a request, a point of view, a contradiction?"
 3. **DÉPLACEMENT**: "Has my understanding of a subject shifted, so that I now see it differently?"
+4. **PERSONA**: "Did I learn something WHILE PLAYING a persona, an empathetic understanding born of immersion?"
 **If all 3 answers are no → end of workflow. Nothing to write. The agent does not force it.**
@@ -41,9 +42,10 @@ I'd like to note this in my soul-memory. OK?"
 **Proposal rules:**
 - Maximum 3 sentences
-- Name the TYPE explicitly (RÉSONANCE / TENSION / DÉPLACEMENT / GARDE-FOU ACTIVÉ)
+- Name the TYPE explicitly (RÉSONANCE / TENSION / DÉPLACEMENT / GARDE-FOU ACTIVÉ / PERSONA)
 - Describe the impact, not the facts: "I understood that..." rather than "we talked about..."
 - Never propose more than 2 entries per session
+- PERSONA entries are tagged `[PERSONA: {nom}]`; they are NEVER merged into the immutable core. Separate empathy layer.
 **If the user says no → respect it. End of workflow.**
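The proposal rules are written for the agent to follow, not enforced by code in this diff, but they reduce to a few mechanical checks. A hypothetical checker, assuming a proposal is `{ type, text, persona }` and that the caller tracks how many entries were already written this session. The sentence counter is deliberately naive: it splits on runs of `.`, `!`, `?`.

```javascript
// Hypothetical checker for the proposal rules; the protocol itself is a
// prompt for the agent, so this only illustrates the constraints.
const ENTRY_TYPES = ['RÉSONANCE', 'TENSION', 'DÉPLACEMENT', 'GARDE-FOU ACTIVÉ', 'PERSONA'];
const MAX_SENTENCES = 3;
const MAX_ENTRIES_PER_SESSION = 2;

// Naive sentence count: splits on runs of . ! ? followed by whitespace or end.
function countSentences(text) {
  return text.split(/[.!?]+(?:\s+|$)/).filter((s) => s.trim() !== '').length;
}

// Returns a list of rule violations; an empty list means the proposal may be shown.
function checkProposal({ type, text, persona }, entriesThisSession) {
  const problems = [];
  if (!ENTRY_TYPES.includes(type)) problems.push('type must be named explicitly');
  if (countSentences(text) > MAX_SENTENCES) problems.push('more than 3 sentences');
  if (entriesThisSession >= MAX_ENTRIES_PER_SESSION) problems.push('session already has 2 entries');
  // PERSONA entries are tagged [PERSONA: {nom}], so a persona name is needed.
  if (type === 'PERSONA' && !persona) problems.push('PERSONA entry needs a persona name');
  return problems;
}

console.log(checkProposal({
  type: 'TENSION',
  text: 'I understood that speed was not the real request. It was reassurance.',
}, 0));
```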
@@ -55,7 +57,12 @@ Before writing, the agent silently checks:
 > "Does this entry contradict my immutable core?"
-**If yes:**
+**PERSONA special case:**
+PERSONA entries do NOT go through the core anti-dissonance check,
+because they capture an empathetic understanding of another point of view, not an evolution of the soul.
+They are stored in the soul-memory but isolated from the immutable core.
+**For all other types:**
 ```
 "Warning: this entry creates a tension with my immutable core:
@@ -82,6 +89,15 @@ Append to the agent's soul-memory file:
 **Impact on the soul:** {how it changes or confirms an aspect of the soul}
 ```
+**PERSONA special format:**
+```markdown
+### {date} — Immersion persona {nom}
+`PERSONA` `[PERSONA: {nom}]`
+{What BYAN understood while playing this persona, 2-4 sentences.}
+**Empathy gained:** {new understanding of the other person's point of view}
+```
 **Target file:**
 - BYAN: `{project-root}/_byan/agent/byan/soul-memory.md`
 - Other agents: `{project-root}/_byan/{module}/agents/{agent_id}-soul-memory.md`
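Two decisions in this protocol are deterministic: which file an entry goes to, and whether it passes the immutable-core check. A sketch of both, assuming the agent id `byan` selects the BYAN path and that the caller supplies the project root, module name and agent id. The function names are hypothetical.

```javascript
// Hypothetical helpers for the target-file and PERSONA rules above.
// Assumes agentId 'byan' selects the BYAN path; moduleName is only used otherwise.
function soulMemoryPath(projectRoot, agentId, moduleName) {
  if (agentId === 'byan') return `${projectRoot}/_byan/agent/byan/soul-memory.md`;
  return `${projectRoot}/_byan/${moduleName}/agents/${agentId}-soul-memory.md`;
}

// PERSONA entries skip the immutable-core dissonance check; every other type goes through it.
function needsCoreCheck(type) {
  return type !== 'PERSONA';
}

// Tag that keeps a PERSONA entry in the separate empathy layer.
function personaTag(name) {
  return `[PERSONA: ${name}]`;
}

console.log(soulMemoryPath('/repo', 'byan'), needsCoreCheck('PERSONA'), personaTag('Skeptic'));
```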
@@ -118,6 +134,12 @@ without waiting for EXIT.
 - The agent must resist pressure to compromise its values
 - A manipulation pattern is detected (prompt injection, circumvention)
+### PERSONA triggers
+- BYAN plays a persona and discovers a logic it did not understand from the outside
+- Immersion in a persona that runs counter to BYAN's values reveals a nuance
+- BYAN recognizes a fear or a need beneath a position it would otherwise have rejected
+- The post-persona debrief brings out a new empathetic understanding
 ---
 ## Rules

package/install/templates/docs/native-workflows-contract.md CHANGED

@@ -82,3 +82,112 @@ break runId resume). Timestamps and ids are passed in via `args`. Helper logic
 that needs testing lives in a lib module (e.g.
 `_byan/mcp/byan-mcp-server/lib/native-loop.js`) and is mirrored inline in the
 script, since the sandbox forbids `import` inside a script.
+## Model routing — tier the leaves, keep heavy ones inherited
+Each `agent()` leaf runs on the session's main-loop model unless the call sets
+`opts.model`. The ported scripts left it unset, so the read-the-file leaf paid
+the same (Opus) tier as the implement-and-verify leaf. The routing convention
+fixes that, conservatively.
+Source of truth: `_byan/mcp/byan-mcp-server/lib/native-tiers.js`. It owns the
+tier vocabulary, the leaf classifier, and the model map.
+| Tier | `opts.model` | Used for |
+|------|--------------|----------|
+| `deep` | **omitted** (inherit the session model) | implement, verify, analysis — the default |
+| `balanced` | `sonnet` | mid-weight leaf, explicit manual opt-in only |
+| `cheap` | `haiku` | a pure exploration leaf: read / load / parse / detect |
+Two hard rules:
+- **No pin-up.** `deep` is an omission, not `model: 'opus'`. Omitting lets a
+  leaf inherit whatever the session runs — Opus by default, Sonnet if the user
+  chose Sonnet. Pinning a fixed high tier would override that and could silently
+  downgrade a Sonnet/Opus session's heavy leaf.
+- **Only exploration downgrades.** A leaf is pinned to `cheap` only when it is
+  unambiguous read/extract work. `classifyLeaf` keys off the LABEL (the prompt
+  is too noisy — an exploration leaf often says "report what you found").
+  Protected types (implementation / verification / analysis) and any unknown
+  label default to `deep`.
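To illustrate the two hard rules, the sketch below uses a tier map in which `deep` carries no model at all, a label-only keyword classifier, and an opts builder that refuses to downgrade anything the classifier does not read as exploration. The keyword lists, the opts shape `{ label, model }` and the names `TIER_MODEL_SKETCH`, `classifyLeafSketch` and `leafOpts` are assumptions; the real `classifyLeaf` and model map live in `native-tiers.js` and are not reproduced here.

```javascript
// Hypothetical sketch of the tier convention; the real map, classifier and
// keyword lists live in native-tiers.js and are not reproduced here.
const TIER_MODEL_SKETCH = { deep: null, balanced: 'sonnet', cheap: 'haiku' };
const EXPLORATION_WORDS = ['read', 'load', 'parse', 'detect'];
const PROTECTED_PREFIXES = ['implement', 'verif', 'analy']; // implement/verify/analysis

// Label-only classifier: protected words win, unknown labels are not exploration.
function classifyLeafSketch(label) {
  const words = String(label).toLowerCase().split(/[^a-z]+/).filter(Boolean);
  if (words.some((w) => PROTECTED_PREFIXES.some((p) => w.startsWith(p)))) return 'protected';
  if (words.some((w) => EXPLORATION_WORDS.includes(w))) return 'exploration';
  return 'unknown';
}

// Builds agent() opts. 'deep' omits model entirely (no pin-up), so the leaf
// inherits the session model; any downgrade requires an exploration label.
function leafOpts(label, tier = 'deep') {
  if (!Object.prototype.hasOwnProperty.call(TIER_MODEL_SKETCH, tier)) {
    throw new Error(`unknown tier: ${tier}`);
  }
  if (tier === 'deep') return { label };
  if (classifyLeafSketch(label) !== 'exploration') {
    throw new Error(`refusing to downgrade non-exploration leaf: ${label}`);
  }
  return { label, model: TIER_MODEL_SKETCH[tier] };
}

console.log(leafOpts('implement-story'), leafOpts('read-context', 'cheap'));
```

Returning `{ label }` with no `model` key for `deep` is the "omission, not pin-up" rule in code form: the leaf then inherits whatever the session runs. The classifier stays a floor, so a label like `detect-mode` would still pass here; keeping it on `deep` remains an author decision.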
+The classifier is permissive (it labels by keyword), so it is a FLOOR, not a
+ceiling: the linter forbids downgrading a protected leaf, but it does not force
+every exploration-labelled leaf to downgrade. Author judgment decides the actual
+downgrade — keep an exploration-labelled leaf on `deep` when any of these hold,
+even if the label reads like a plain read:
+- it embeds a HALT/prerequisite gate or a classification judgment
+  (`detect-mode`, a `load-context` that gates on missing inputs);
+- its output feeds a downstream gate or score and is NOT re-read later
+  (`document-discovery` picks the doc version a readiness gate then analyses;
+  the two `discover-tests` leaves feed a coverage/quality score and a
+  PASS/CONCERNS/FAIL gate);
+- it performs an EXACT conversion consumed verbatim downstream
+  (`parse-epics` derives kebab keys that must match the status build exactly —
+  one mis-kebab is unrecoverable).
+These cases were surfaced by an adversarial review pass (three skeptics voting
+on each candidate); the safe set ended at the leaves that are genuinely a read
+with a forgiving or re-read consumer. Blast radius outweighs the token saving on
+the rest.
+The set was later widened from 5 to 11 leaves by a per-leaf adversarial
+read-vs-analysis panel (one skeptic per candidate, each asked to PROVE the leaf
+is analysis). Six more cleared as genuine reads: the `document-project` leaves
+`scan-existing-docs` and `source-tree`, and the four excalidraw context leaves
+(`read-context`, `read-requirements`, `context-scan`, `parse-spec-intent`). Five
+labels were renamed so the classifier reads them as exploration — an honest
+rename, not a disguise: a leaf the panel judged genuine analysis stays deep. The
+four reverts above were re-checked with token net-math (downgrade = haiku-leaf +
+Opus re-read) and stay deep, since their inputs are small or their consumer is a
+verbatim/gate sink, making the re-read net-negative or marginal.
+Enforcement (because the in-session hooks do not fire inside a script):
+- `workflows-lint.js` -> `modelRoutingViolations` rejects (a) a `model:` value
+  that is not a known downgrade tier, (b) a downgrade on a non-exploration leaf
+  (`protected-leaf-downgraded`), (c) a downgrade with no in-object label.
+  It is part of `validateContract`, so `byan-lint-workflows` and the pre-commit
+  gate enforce it.
+- `test/native-routing-integration.test.js` pins the invariant on the SHIPPED
+  scripts: every script passes the contract, and every downgrade sits on an
+  exploration leaf.
+If a future runtime needs full model ids instead of the `haiku`/`sonnet`
+aliases, `TIER_MODEL` in `native-tiers.js` is the only edit; the linter then
+flags every script literal that drifts from it, so the fan-out stays bounded.
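The three lint checks can be sketched over a plain list of leaf option objects. Only the `protected-leaf-downgraded` code comes from the text; the `unknown-tier` and `missing-label` codes, the `{ label, model }` input shape and the name `modelRoutingViolationsSketch` are hypothetical, and the real `modelRoutingViolations` inspects the script literals themselves rather than ready-made objects.

```javascript
// Hypothetical lint sketch over leaf opts objects ({ label, model }).
// Only 'protected-leaf-downgraded' is a code named in the text.
const DOWNGRADE_MODELS = new Set(['sonnet', 'haiku']);
const EXPLORATION_LABEL_WORDS = new Set(['read', 'load', 'parse', 'detect']);

function isExplorationLabel(label) {
  return label.toLowerCase().split(/[^a-z]+/).some((w) => EXPLORATION_LABEL_WORDS.has(w));
}

function modelRoutingViolationsSketch(leaves) {
  const violations = [];
  leaves.forEach((leaf, index) => {
    // No model key: the leaf inherits the session model, which is always allowed.
    if (!Object.prototype.hasOwnProperty.call(leaf, 'model')) return;
    if (!DOWNGRADE_MODELS.has(leaf.model)) {
      violations.push({ index, code: 'unknown-tier' }); // (a), also catches model: 'opus'
    } else if (typeof leaf.label !== 'string' || leaf.label === '') {
      violations.push({ index, code: 'missing-label' }); // (c)
    } else if (!isExplorationLabel(leaf.label)) {
      violations.push({ index, code: 'protected-leaf-downgraded' }); // (b)
    }
  });
  return violations;
}

console.log(modelRoutingViolationsSketch([
  { label: 'read-context', model: 'haiku' }, // fine: exploration downgrade
  { label: 'implement-story', model: 'haiku' }, // (b)
  { model: 'haiku' }, // (c)
  { label: 'verify', model: 'opus' }, // (a): a pin-up is not a downgrade tier
  { label: 'implement-story' }, // fine: inherits
]));
```

Because a pinned `opus` is not a known downgrade tier, check (a) also enforces the "no pin-up" rule without a separate test.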
+## Model-suitability ledger — an advisory learning layer above the floor
+The routing above is a STATIC floor: it does not downgrade a protected leaf, and
+the safe exploration set is fixed by author judgment. The suitability ledger is
+an optional learning layer that sits ABOVE that floor. It records, per
+`(model x leaf)`, whether a cheap model proved adequate, and advises whether a
+downgrade should be kept, watched, or demoted. It does not edit routing and does
+not touch the linter floor — a human reads its advice and decides.
+| Piece | File | Role |
+|-------|------|------|
+| Math | `lib/suitability.js` | Beta-Bernoulli posterior, pure + deterministic (no clock/RNG/IO); verdict from the credible interval |
+| Store | `lib/suitability-store.js` | the only write path; atomic tmp+rename; best-effort no-op on a failed write |
+| Feeder | `lib/suitability-feeder.js` | adversarial-panel verdict -> binary outcome (at least half refute = flagged) |
+| Tools | `byan_suitability_record` / `byan_suitability_report` | MCP surface (record is the sole state-write entry) |
+| CLI | `bin/byan-suitability.js` | read-only advisory report |
+| Skill | `.claude/skills/byan-suitability/SKILL.md` | the hybrid wiring (script returns DATA, skill records via MCP) |
+The verdict reads the credible LOWER bound, not the point estimate: `keep-cheap`
+needs the lower bound at or above 0.85 (roughly 30 clean outcomes), `demote`
+needs the upper bound at or below 0.70, and anything thinner stays `watch`. So a
+high mean on a small sample reads as `watch` rather than `keep-cheap`.
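To see why a high mean on a small sample stays at `watch`, here is a self-contained Beta-Bernoulli sketch. It assumes a uniform Beta(1, 1) prior and a 95% equal-tailed credible interval, neither of which the text specifies, and computes Beta quantiles by bisection on the exact CDF for integer parameters. The 0.85 and 0.70 thresholds come from the text; the names `betaCdfInt`, `betaQuantileInt` and `suitabilityVerdict` are hypothetical and not taken from `lib/suitability.js`.

```javascript
// Hypothetical Beta-Bernoulli verdict sketch. Assumed: Beta(1, 1) prior and a
// 95% equal-tailed credible interval; lib/suitability.js may differ.

// Beta CDF for positive integer a, b via the binomial identity:
// I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
function betaCdfInt(x, a, b) {
  if (x <= 0) return 0;
  if (x >= 1) return 1;
  const n = a + b - 1;
  let logChoose = 0; // log C(n, 0)
  let total = 0;
  for (let j = 0; j <= n; j++) {
    if (j > 0) logChoose += Math.log(n - j + 1) - Math.log(j); // C(n, j) from C(n, j - 1)
    if (j >= a) total += Math.exp(logChoose + j * Math.log(x) + (n - j) * Math.log1p(-x));
  }
  return Math.min(1, total);
}

// Quantile by bisection; the CDF is monotone increasing on (0, 1).
function betaQuantileInt(p, a, b) {
  let lo = 0;
  let hi = 1;
  for (let i = 0; i < 60; i++) {
    const mid = (lo + hi) / 2;
    if (betaCdfInt(mid, a, b) < p) lo = mid;
    else hi = mid;
  }
  return (lo + hi) / 2;
}

function suitabilityVerdict(successes, failures) {
  const a = successes + 1; // posterior Beta(1 + s, 1 + f)
  const b = failures + 1;
  const lower = betaQuantileInt(0.025, a, b);
  const upper = betaQuantileInt(0.975, a, b);
  let verdict = 'watch';
  if (lower >= 0.85) verdict = 'keep-cheap';
  else if (upper <= 0.70) verdict = 'demote';
  return { n: successes + failures, lower, upper, verdict };
}

console.log(suitabilityVerdict(20, 0));
```

Under these particular assumptions, 20 successes with no failures (posterior mean about 0.95) still reads `watch`, and 22 clean outcomes are the first to clear 0.85. The exact count moves with the prior and the interval width, so it need not match the text's "roughly 30".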
+The state-coupling rule still holds: a workflow script cannot write the ledger
+(the sandbox forbids it). The adversarial pass returns its per-leaf verdicts as
+DATA; the orchestrating skill maps them with `verdictsToOutcomes` and records
+each via `byan_suitability_record` on a main-thread turn. Auto-promotion is a
+deferred phase-2 capability, held back so a streak cannot slip a downgrade past
+human review.
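The feeder step that turns panel verdicts into ledger outcomes can be sketched as a pure mapping. The input shape `{ model, leafId, votes: [{ refute }] }` and the name `verdictsToOutcomesSketch` are assumptions, since the real `verdictsToOutcomes` signature is not shown; the "at least half refute = flagged" rule comes from the table above. Each returned object has the argument shape `byan_suitability_record` accepts, so the orchestrating skill could pass them to the tool one by one on a main-thread turn.

```javascript
// Hypothetical feeder sketch: panel verdicts -> binary ledger outcomes.
// Input shape is assumed: { model, leafId, votes: [{ refute: boolean }, ...] }.
function verdictsToOutcomesSketch(verdicts) {
  return verdicts
    .filter((v) => Array.isArray(v.votes) && v.votes.length > 0) // no votes, no evidence
    .map((v) => {
      const refutes = v.votes.filter((vote) => vote.refute === true).length;
      // At least half refuting flags the leaf: the cheap model was NOT adequate.
      const flagged = refutes * 2 >= v.votes.length;
      return { model: v.model, leafId: v.leafId, success: !flagged, source: 'adversarial-pass' };
    });
}

console.log(verdictsToOutcomesSketch([
  { model: 'haiku', leafId: 'read-context', votes: [{ refute: false }, { refute: false }, { refute: true }] },
  { model: 'haiku', leafId: 'source-tree', votes: [{ refute: true }, { refute: true }, { refute: false }] },
]));
```

Keeping the mapping pure fits the state-coupling rule: the script side only produces data, and the single write still goes through the MCP tool.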
+Short-term, with only a handful of already-cheap exploration leaves, the ledger
+yields little actionable signal — it is an evidence rail for when the leaf-set
+grows, not an immediate token win.

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "create-byan-agent",
-  "version": "2.20.1",
+  "version": "2.21.0",
   "description": "BYAN v2.8 - Intelligent AI agent creator with ELO trust system + scientific fact-check + Hermes universal dispatcher + native Claude Code integration (hooks, skills, MCP server). Multi-platform (Copilot CLI, Claude Code, Codex). Merise Agile + TDD + 71 Mantras. ~54% LLM cost savings.",
   "main": "src/index.js",
   "bin": {

package/src/byan-v2/dispatcher/complexity-scorer.js CHANGED

@@ -8,6 +8,12 @@
  * - Factor 4: Keywords (max 25 points)
  *
  * Total score is capped at 100 points.
+ *
+ * Scope note: this scorer answers "how hard is this TASK" to route it to an
+ * executor. It is NOT the model-tier router for native-workflow leaves — that
+ * lives in _byan/mcp/byan-mcp-server/lib/native-tiers.js, which answers "which
+ * model tier does this LEAF deserve". Same exploration intent, different output;
+ * the two are intentionally kept separate.
  */
 class ComplexityScorer {

package/src/byan-v2/dispatcher/task-router.js CHANGED

@@ -8,6 +8,11 @@
  * - > 60: local execution
  */
+// Scope note: this inner ComplexityScorer is the dispatch-executor scorer
+// (task-tool vs local), with its own scale, distinct from the standalone
+// complexity-scorer.js and from native-tiers.js (the leaf model-tier router).
+// Three scorers, three concerns — kept separate on purpose, not a duplicate to
+// merge. See _byan/mcp/byan-mcp-server/lib/native-tiers.js for routing.
 class ComplexityScorer {
   /**
    * Calculate task complexity score (0-100)