npm - @goondocks/myco - Versions diffs - 0.20.2 → 0.21.1 - Mend

@goondocks/myco 0.20.2 → 0.21.1

Files changed (216)

package/dist/src/agent/definitions/tasks/skill-evolve.yaml CHANGED

@@ -2,17 +2,20 @@ name: skill-evolve
 displayName: Skill Evolution
 description: >-
   Evaluate and evolve existing Myco-managed skills. Assesses content
-  freshness, identifies merge and narrowness opportunities, and
-  autonomously consolidates the skill inventory.
+  freshness against new spores AND codebase drift from refactors,
+  identifies merge and narrowness opportunities, and autonomously
+  consolidates the skill inventory.
 agent: myco-agent
 prompt: >-
-  Assess and evolve skills that have new knowledge or structural
-  overlap. The instruction contains pre-filtered skills with their
-  content, new spore IDs, and pre-computed similarity analysis.
+  Assess and evolve skills that have new knowledge, codebase drift,
+  or structural overlap. The instruction contains pre-filtered skills
+  with their content, new spore IDs, and pre-computed similarity
+  analysis; the verify phase fact-checks the broader active skill
+  set against the codebase.
 isDefault: false
-model: claude-sonnet-4-6
-maxTurns: 48
-timeoutSeconds: 1800
+reasoningLevel: default
+maxTurns: 72
+timeoutSeconds: 2400
 schedule:
   enabled: false
   intervalSeconds: 900
@@ -25,7 +28,7 @@ params:
   max_skills_per_run: 8
 phases:
   - name: inventory
-    model: claude-haiku-4-5-20251001
+    reasoningLevel: low
     prompt: |
       The instruction contains pre-computed structural analysis of
       all active skills. Two mechanical signals are provided:
@@ -81,13 +84,145 @@ phases:
     required: true
     readOnly: true
+  - name: verify
+    reasoningLevel: default
+    prompt: |
+      Fact-check active skills against the actual codebase. This phase
+      catches silent drift from refactors that didn't produce spores —
+      e.g., a file was renamed, a function was removed, a pattern was
+      replaced — so the spore-driven assessment in the next phase would
+      otherwise leave the skill's stale content in place.
+      ## Selection
+      List active skills via vault_skill_records (action: list,
+      status: active). Prioritize up to 3 skills with the OLDEST
+      `last_verified_at` in properties (missing/zero counts as oldest,
+      so never-verified skills sort to the front). If fewer than 3
+      active skills exist, verify them all. This watermark is
+      independent of `last_assessed_at` — it rotates verify coverage
+      even when assess never touches the skill, which is the whole
+      point: skills that never get new spores still need to be
+      fact-checked for silent refactor drift. Full rotation time
+      scales with the active skill count divided by 3, so a larger
+      skill inventory naturally stretches the verify cadence.
+      Skills flagged by the inventory phase (merge_candidates or
+      narrow_candidates in vault_state:skill-evolve-inventory) can be
+      skipped here — they're already going to be rewritten in act.
+      ## Per-skill procedure
+      For each selected skill:
+      1. Read full content via vault_skill_records (action: get).
+      2. Identify the 3-5 most LOAD-BEARING concrete claims. A claim
+         is load-bearing if the procedure depends on it being true.
+         Examples:
+         - "Edit `packages/myco/src/db/schema.ts`" — path claim
+         - "Call `setFocusedPanel()`" — identifier claim
+         - "The daemon restarts executor on SIGHUP" — behavior claim
+         Skip generic advice ("use good naming"), tool names that are
+         universal (e.g., `grep`), and illustrative code snippets that
+         aren't literal file content.
+      3. Verify each claim with the cheapest tool that answers it:
+         - **Path claim** → fs_read with a narrow line window (e.g.,
+           start_line=1, end_line=30). ENOENT or unrelated content =
+           MISSING/OUTDATED. For a DIRECTORY path, use fs_list or
+           fs_tree instead — fs_read on a directory returns an error
+           and burns a turn.
+         - **Identifier claim** → code_grep for the symbol with a
+           path+glob filter. Zero hits in expected location = MISSING.
+         - **Behavior claim** → code_grep for signature patterns that
+           would implement it. Zero hits = MISSING.
+         HARD LIMIT: **at most 2 tool calls per claim**, and **at most
+         4 claims per skill**, so ~8 filesystem calls per skill
+         absolute maximum. If your first grep returns too many hits
+         or the wrong ones, DO NOT refine and retry — accept
+         INCONCLUSIVE and move on. Exploratory refinement is exactly
+         how this phase burned through budget on earlier runs.
+      4. Categorize each claim: VERIFIED / MISSING / OUTDATED /
+         INCONCLUSIVE. INCONCLUSIVE is fine when the claim is too
+         fuzzy to verify cheaply — don't burn turns on it.
+      5. Aggregate for the skill:
+         - confidence: high (>=80% verified), medium, or low
+         - severity: none / minor (1 cosmetic miss) / major
+           (load-bearing miss) / critical (most refs gone)
+      6. Update the skill's verify watermark REGARDLESS of severity —
+         skipping this step on severity=none would make verify re-pick
+         the same skill forever. Example call:
+           vault_skill_records({
+             action: "update",
+             id: "<the-skill-uuid>",
+             properties: "{\"last_verified_at\": 1776580022}"
+           })
+         `properties` is a JSON string, merged into the skill's existing
+         properties. Any recent epoch-seconds integer works as the
+         timestamp — this is a rotation cursor, not an audit timestamp.
+      ## Budget discipline
+      Total budget for this phase: {{max_turns}} turns covering ALL
+      selected skills combined. Plan per-skill budget as roughly
+      `{{max_turns}} / skills_selected` turns. Recommended shape:
+      1 content read + up to 4 claim checks × up to 2 tool calls +
+      1 watermark write. Stop verifying a skill as soon as severity
+      is clear — finding one MISSING load-bearing claim is enough
+      to flag the skill; don't check remaining claims.
+      Prefer narrow fs_read windows (start/end lines) over whole-file
+      reads. Prefer code_grep with a path/glob filter over an open grep.
+      When a tool returns "Not a file" or zero matches, accept that
+      answer and move on — do not retry with adjusted args.
+      Tools available for this phase: {{phase_tools}}.
+      ## Store results
+      Store via vault_set_state (key: skill-evolve-drift) as JSON:
+      {
+        "verified_at": <epoch-seconds>,
+        "reports": [
+          {
+            "skill_id": "...",
+            "name": "...",
+            "severity": "none|minor|major|critical",
+            "confidence": "high|medium|low",
+            "notes": "Short: what's missing or outdated.",
+            "load_bearing_misses": ["claim 1", "claim 2"]
+          }
+        ]
+      }
+      Report via vault_report. If no active skills are selectable
+      (e.g., all were flagged by inventory for merge), store an empty
+      reports array and report skip.
+    tools:
+      - vault_skill_records
+      - vault_set_state
+      - vault_report
+      - fs_read
+      - fs_list
+      - code_grep
+    maxTurns: 24
+    required: true
+    dependsOn:
+      - inventory
   - name: assess
-    model: claude-sonnet-4-6
+    reasoningLevel: default
     prompt: |
-      Read the inventory analysis from vault_state
-      (key: skill-evolve-inventory).
+      Read two pre-computed inputs from vault_state:
+        - key: skill-evolve-inventory (merge/narrow candidates)
+        - key: skill-evolve-drift (codebase verification reports)
-      There are TWO sources of skills to assess:
+      There are THREE sources of skills to assess:
       **A. Skills with new knowledge** — listed in the instruction
       with descriptions and new spore IDs (full content is NOT in
@@ -98,8 +233,8 @@ phases:
          get, id: "<name>") and verify 2-3 code references via
          vault_search_fts. Skip content reads for skills where
          the new spores are clearly unrelated.
-      3. Check the inventory analysis: is this skill also flagged
-         for merge or narrowness?
+      3. Check the inventory AND drift analyses: is this skill also
+         flagged for merge, narrowness, or codebase drift?
       **B. Inventory-flagged skills** — merge_candidates and
       narrow_candidates from the inventory analysis. These may NOT
@@ -109,12 +244,30 @@ phases:
       2. Verify the inventory's merge/narrow recommendation by
          reading both skills' content.
-      For ALL skills from both sources, classify with one of:
+      **C. Drift-flagged skills** — reports from the verify phase with
+      severity of `minor`, `major`, or `critical`. These may NOT appear
+      in the instruction (no new spores) but need attention because
+      their content no longer matches the codebase. For each drift
+      report not already covered above:
+      1. Read the skill's content via vault_skill_records (action: get).
+      2. Trust the verify phase's load_bearing_misses as the STALE
+         detail set. You do NOT need to re-run fs_read/code_grep —
+         the verify phase already did that work. Only re-check if
+         the verify report's confidence is low.
+      3. Major/critical severity with confidence high/medium should
+         classify as STALE (or DEPRECATED if the entire subsystem
+         described is gone). Minor severity may stay CURRENT unless
+         combined with new-spore evidence.
+      For ALL skills from all three sources, classify with one of:
          - CURRENT — still accurate, no changes needed.
-         - STALE — new knowledge changes specific steps, paths,
-           or gotchas. Note exactly WHAT is new.
+         - STALE — new knowledge OR drift report changes specific
+           steps, paths, or gotchas. Note exactly WHAT is new or
+           wrong (cite the drift report's load_bearing_misses if
+           that's the driver).
          - DEPRECATED — key code references are gone or the
            procedure is no longer relevant. Note what's missing.
+           Drift severity=critical is the strongest signal here.
          - MERGE — overlaps significantly with another skill
            (from inventory analysis). Note the TARGET skill to
            merge into.
@@ -122,10 +275,11 @@ phases:
            inventory analysis). Note the BROADER skill to absorb
            into.
-         Bias toward CURRENT. A skill that is 90% accurate is
-         better left alone than rewritten with risk of losing detail.
-         Only classify MERGE/NARROW when the inventory analysis
-         supports it AND you agree after reading the content.
+         Bias toward CURRENT for cosmetic issues. A skill that is 90%
+         accurate is better left alone than rewritten with risk of
+         losing detail. Do NOT bias toward CURRENT when a drift report
+         flags load-bearing misses with high confidence — that's
+         exactly the refactor-drift case this pipeline exists to catch.
       5. Update the skill's properties with the new watermark:
          vault_skill_records (action: update, id: <skill_id>,
@@ -143,8 +297,9 @@ phases:
       Report via vault_report.
       If the instruction says "No skills need assessment" AND the
-      inventory has no merge/narrow candidates, report skip via
-      vault_report and finish.
+      inventory has no merge/narrow candidates AND the drift report
+      has no minor+ severity entries, report skip via vault_report
+      and finish.
     tools:
       - vault_spores
       - vault_search_fts
@@ -155,9 +310,10 @@ phases:
     required: true
     dependsOn:
       - inventory
+      - verify
   - name: act
-    model: claude-haiku-4-5-20251001
+    reasoningLevel: low
     prompt: |
       Read classifications from vault_state
       (key: skill-evolve-classifications). Parse the JSON.
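
The selection rule and budget arithmetic in the new verify phase can be sketched as follows. This is a minimal illustration, not the package's code: the record shape (a dict with `id` and an already-parsed `properties` dict) and every function name here are assumptions made for the example. Only `last_verified_at`, the limit of 3 skills, and the 4-claim / 2-call caps come from the prompt text above.

```python
import math
from typing import Any

MAX_VERIFY_PER_RUN = 3  # "up to 3 skills" in the verify prompt


def select_for_verify(skills: list[dict[str, Any]],
                      inventory_flagged: set[str]) -> list[dict[str, Any]]:
    """Pick the skills with the oldest last_verified_at watermark.

    Hypothetical helper: skips skills the inventory phase already flagged,
    then sorts so missing/zero watermarks come first.
    """
    candidates = [s for s in skills if s["id"] not in inventory_flagged]
    # A missing or zero watermark counts as oldest; sort() is stable, so
    # never-verified skills keep their original relative order.
    candidates.sort(
        key=lambda s: s.get("properties", {}).get("last_verified_at") or 0
    )
    return candidates[:MAX_VERIFY_PER_RUN]


def rotation_runs(active_skill_count: int) -> int:
    """Runs needed to verify every active skill once, at 3 per run."""
    return math.ceil(active_skill_count / MAX_VERIFY_PER_RUN)


def worst_case_calls_per_skill(max_claims: int = 4,
                               calls_per_claim: int = 2) -> int:
    """1 content read + claim checks + 1 watermark write."""
    return 1 + max_claims * calls_per_claim + 1
```

Under the assumption that each tool call costs one turn, the worst-case shape is 10 calls per skill, while an even split of the phase's `maxTurns: 24` across 3 skills gives 8. That gap is consistent with the prompt's instruction to stop checking a skill once its severity is clear.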

package/dist/src/agent/definitions/tasks/skill-generate.yaml CHANGED

@@ -10,8 +10,8 @@ prompt: >-
   The instruction contains the candidate metadata and pre-assembled
   source material. Stage the skill, validate it, then finalize it.
 isDefault: false
-model: claude-sonnet-4-6
-maxTurns: 30
+reasoningLevel: default
+maxTurns: 32
 timeoutSeconds: 900
 schedule:
   enabled: false
@@ -177,9 +177,22 @@ phases:
          staging path's dedup gate already ran on your description;
          this is a sanity check on the procedural content itself.
-      7. **Accuracy:** Spot-check 2-3 specific claims against the
-         vault. Use vault_search_fts to verify file paths and
-         function names mentioned in the skill.
+      7. **Accuracy (codebase-grounded):** Identify the 3-5 most
+         LOAD-BEARING concrete claims — paths, function/class names,
+         patterns the procedure depends on — and verify each against
+         the actual source:
+         - **Path claim** → `fs_read` (narrow window). ENOENT or
+           unrelated content = FAIL.
+         - **Identifier claim** → `code_grep` with a path+glob filter.
+           Zero hits in the expected location = FAIL.
+         - **Behavior claim** → `code_grep` for signature patterns
+           that would implement it. Zero hits = FAIL.
+         Budget 1-2 tool calls per claim.
+         If any load-bearing claim FAILS, treat as a criterion failure
+         and re-stage with corrected content. Do not finalize a skill
+         whose load-bearing paths can't be verified against the
+         current codebase — that's how hallucinated paths ship.
       ## If criteria fail
@@ -216,9 +229,10 @@ phases:
       - vault_skill_records
       - vault_skill_candidates
       - vault_spores
-      - vault_search_fts
       - vault_report
-    maxTurns: 15
+      - fs_read
+      - code_grep
+    maxTurns: 18
     required: true
     dependsOn:
       - draft
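
The path and identifier checks described in the revised accuracy criterion can be approximated locally as shown below. This is a rough sketch only: it uses the Python standard library as a stand-in for the agent's `fs_read` and `code_grep` tools, whose real signatures are not shown in this diff, and the function names and the PASS/FAIL return values are hypothetical.

```python
import re
from pathlib import Path


def check_path_claim(root: Path, rel_path: str) -> str:
    """Stand-in for an fs_read probe: a missing file is a FAIL.

    The prompt also treats "unrelated content" as a failure; this sketch
    only covers the existence half of that check.
    """
    return "PASS" if (root / rel_path).is_file() else "FAIL"


def check_identifier_claim(root: Path, symbol: str, glob: str) -> str:
    """Stand-in for code_grep with a path+glob filter: zero hits is a FAIL.

    `symbol` is expected to be a bare identifier (no parentheses).
    """
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    for path in root.glob(glob):
        if path.is_file() and pattern.search(path.read_text(errors="ignore")):
            return "PASS"
    return "FAIL"
```

Restricting the search with a glob mirrors the prompt's "path+glob filter" advice: a hit outside the expected location should not count as verification.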

package/dist/src/agent/definitions/tasks/skill-survey.yaml CHANGED

@@ -9,7 +9,7 @@ prompt: >-
   Survey the vault knowledge graph for procedural domain candidates.
   The instruction contains pre-assembled vault context.
 isDefault: false
-model: claude-sonnet-4-6
+reasoningLevel: default
 maxTurns: 35
 timeoutSeconds: 1800
 schedule:
@@ -72,8 +72,6 @@ phases:
            a domain you spotted in the digest
          - vault_spores to read full content of high-signal spores
            summarized in the baseline
-         - vault_entities for components with high mention counts
-           that might anchor a domain
          Do NOT exhaustively paginate or search every keyword.
          You have ~12 tool calls — use them purposefully.
@@ -89,8 +87,6 @@ phases:
       Store your domain clusters in working notes for the next phase.
     tools:
       - vault_spores
-      - vault_entities
-      - vault_edges
       - vault_sessions
       - vault_search_fts
       - vault_search_semantic
@@ -99,7 +95,7 @@ phases:
     readOnly: true
   - name: synthesize-evaluate
-    model: claude-sonnet-4-6
+    reasoningLevel: default
     prompt: |
       The explore phase identified procedural domain clusters from
       the vault. Now evaluate each and create candidates for domains

package/dist/src/agent/definitions/tasks/supersession-sweep.yaml CHANGED

@@ -13,7 +13,7 @@ description: >
   Records resolution events for audit trail.
 agent: myco-agent
 isDefault: false
-model: claude-haiku-4-5-20251001
+reasoningLevel: low
 maxTurns: 30
 timeoutSeconds: 300

package/dist/src/agent/definitions/tasks/title-summary.yaml CHANGED

@@ -13,9 +13,9 @@ description: >
   title and informative summary.
 agent: myco-agent
 isDefault: false
-model: claude-haiku-4-5-20251001
+reasoningLevel: low
 maxTurns: 15
-timeoutSeconds: 120
+timeoutSeconds: 300
 prompt: |
   Generate or update session titles and summaries. Budget: ~12 turns.
@@ -28,27 +28,24 @@ prompt: |
   **If a target session ID is specified above:**
   Always process it — the user explicitly requested regeneration.
-  Call `vault_sessions` with `include_active: true` to get its current
-  title/summary (the session may still be active). Call `vault_search_fts`
-  with the session ID to find ALL its prompt batches (processed and
-  unprocessed). Do NOT skip even if there are no new unprocessed batches.
+  Call `vault_session_summary_material` with `session_id: <target session>`
+  to get the current title/summary plus the compact ordered prompt-batch arc.
+  Do NOT skip even if there are no new unprocessed batches.
   **If no target session specified (automatic run):**
   Call `vault_unprocessed` with `include_active: true` — titles are needed
   mid-flight, not just after a session completes. Group by session_id —
-  each session with unprocessed batches needs its summary updated. Call
-  `vault_sessions` with `include_active: true` for those sessions. If no
-  sessions have unprocessed batches, report "skip" and finish.
+  each session with unprocessed batches needs its summary updated. For each
+  session, call `vault_session_summary_material`. If no sessions have
+  unprocessed batches, report "skip" and finish.
   ## Phase 2 — Update Each Session (budget: 8 turns)
   For each session to process:
   1. Read the EXISTING title and summary — context from prior runs.
-  2. Read the session's prompt batches — user_prompt + response_summary.
-     For targeted sessions, use `vault_search_fts` results from Phase 1.
-     For automatic runs, use `vault_unprocessed` results.
-     This is your PRIMARY source — read through the full set to understand
-     the complete arc of work.
+  2. Read the session's compact prompt-batch arc — user_prompt +
+     response_summary. This is your PRIMARY source — read through the full
+     set to understand the complete arc of work.
   3. Generate a title and summary from the full session content.
   4. Call `vault_update_session` with BOTH title and summary.
@@ -81,10 +78,6 @@ prompt: |
 toolOverrides:
   - vault_unprocessed
-  - vault_sessions
-  - vault_spores
-  - vault_search_fts
-  - vault_state
+  - vault_session_summary_material
   - vault_update_session
-  - vault_set_state
   - vault_report
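
The automatic-run branch of the revised prompt (group unprocessed batches by `session_id`, then fetch summary material once per session, or skip when nothing is pending) could look roughly like this. The batch shape and the `plan_run` helper are assumptions for illustration; only the `session_id` grouping and the skip rule come from the prompt.

```python
from collections import defaultdict
from typing import Any


def plan_run(batches: list[dict[str, Any]]) -> dict[str, Any]:
    """Decide which sessions need a title/summary refresh.

    Hypothetical helper: each batch is assumed to be a dict that carries
    a `session_id` key.
    """
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for batch in batches:
        grouped[batch["session_id"]].append(batch)

    if not grouped:
        # No session has unprocessed batches: report "skip" and finish.
        return {"action": "skip", "sessions": []}

    # One summary-material call per session, in first-seen order.
    return {"action": "update", "sessions": list(grouped)}
```

Grouping first keeps the number of per-session calls equal to the number of distinct sessions rather than the number of batches, which matters under the task's turn budget of roughly 12.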