xtrm-tools 2.4.0 → 2.4.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (125)

package/skills/sync-docs-workspace/iteration-3/eval-doc-audit/with_skill/outputs/result.md ADDED

@@ -0,0 +1,125 @@
+# Doc Audit — xtrm-tools
+**Date:** 2026-03-18
+**Mode:** Audit only (no files modified)
+**Tool:** sync-docs skill, Phases 1–3
+**Worktree:** `/home/dawid/projects/xtrm-tools/.claude/worktrees/agent-a4b9e492`
+---
+## Summary
+| Category | Count |
+|---|---|
+| README extraction candidates | 5 sections |
+| CHANGELOG version gap | v2.4.0 vs v2.0.0 (4 undocumented releases) |
+| docs/ files missing (expected) | 5 |
+| docs/ files with invalid schema | 7 |
+| Total issues | 14 |
+---
+## README.md — Status: EXTRACTABLE
+Line count: **192** (threshold: 200 — just under the BLOATED threshold, but 5 sections are prime extraction candidates).
+The README currently duplicates what should live in dedicated docs/ files. These sections should be extracted:
+| README Section | Suggested Target | Reason |
+|---|---|---|
+| `### Skills` (lines ~44–49) | `docs/skills.md` | Skills catalog with 23+ skills in `skills/` |
+| `## Policy System` + `### Policy Files` (lines ~67–86) | `docs/policies.md` | `policies/` directory exists with 7 policy files |
+| `## Hooks Reference` (lines ~114–141) | `docs/hooks.md` | `hooks/` directory exists with 14+ hook scripts |
+| `## MCP Servers` (lines ~143–158) | `docs/mcp-servers.md` | `.mcp.json` present; existing `docs/mcp-servers-config.md` covers this topic already |
+**Notable:** `docs/mcp-servers-config.md` already exists (364 lines) but lacks frontmatter. The README's `## MCP Servers` section and this file are redundant — consolidation is warranted.
+The `## Version History` table in README (lines ~179–186) is also a truncated duplicate of CHANGELOG. It should either be removed or replaced with a link.
+---
+## CHANGELOG.md — Status: STALE
+- `package.json` version: **2.4.0**
+- Latest CHANGELOG entry: **[2.0.0] - 2026-03-12**
+- Gap: **4 undocumented versions** (2.1.x through 2.4.0)
+The `[Unreleased]` section exists but the releases have not been cut. Given the volume of recent closed issues (plugin architecture, Pi extension parity, service skills, gitnexus integration), multiple CHANGELOG entries are owed.
+README itself lists versions 2.2.0 and 2.3.0 with dates and highlights — those entries are not in CHANGELOG.
+---
+## docs/ — Missing Files
+These docs/ files are expected based on project subsystems but do not exist:
+| Missing File | Signal | Priority |
+|---|---|---|
+| `docs/hooks.md` | `hooks/` dir with 14 scripts + `hooks.json` | HIGH — hooks are a core subsystem |
+| `docs/pi-extensions.md` | `config/pi/extensions/` exists | HIGH — Pi extension system is active |
+| `docs/mcp-servers.md` | `.mcp.json` present | MEDIUM — content partially covered by `docs/mcp-servers-config.md` |
+| `docs/policies.md` | `policies/` dir with 7 `.json` files | HIGH — policy compiler is a key feature |
+| `docs/skills.md` | `skills/` dir with 23 entries | MEDIUM — skills list exists in README already |
+---
+## docs/ — Invalid Schema (7 files)
+All 7 existing non-plan docs/ files lack YAML frontmatter. They are functional but not schema-compliant:
+| File | Lines | Notes |
+|---|---|---|
+| `docs/cleanup.md` | 438 | Large operational notes — likely internal/transient |
+| `docs/delegation-architecture.md` | 185 | Architecture content — may belong in `.serena/memories/` |
+| `docs/hook-system-summary.md` | 176 | Overlaps with missing `docs/hooks.md` |
+| `docs/mcp-servers-config.md` | 364 | Overlaps with missing `docs/mcp-servers.md` + README section |
+| `docs/pi-extensions-migration.md` | 56 | Migration notes — likely transient |
+| `docs/pre-install-cleanup.md` | 107 | Operational notes — likely internal/transient |
+| `docs/todo.md` | 4 | Stub — should be removed or absorbed into bd issues |
+Each file needs either a `validate_doc.py` run to add frontmatter or an evaluation for deletion/migration.
+---
+## Structural Observations
+### Overlap Between Existing and Missing Files
+Three cases where a docs/ file partially covers a missing counterpart:
+1. `docs/hook-system-summary.md` (existing, no schema) covers the same ground as the missing `docs/hooks.md`. Likely the right fix is to promote `hook-system-summary.md` → `hooks.md` with frontmatter added.
+2. `docs/mcp-servers-config.md` (existing, no schema) overlaps with missing `docs/mcp-servers.md` and the README `## MCP Servers` section. Rename + add frontmatter rather than create from scratch.
+3. `docs/pi-extensions-migration.md` (existing, 56 lines) is a migration notes doc, not the full extension catalog. The missing `docs/pi-extensions.md` is still warranted.
+### docs/plans/ is Well-Populated
+`docs/plans/` has 13 files including active and completed work plans. This is healthy. Plans in `docs/plans/complete/` may be archivable.
+### docs/reference/ Subdirectory
+`docs/reference/` exists with subdirectories (`claude-documentation/`, `gemini-documentation/`, `plans/`). This sub-tree was not analyzed in depth — it may contain content that belongs at the top-level `docs/` or in `.serena/memories/`.
+---
+## Recommended Next Steps (when executing)
+1. **CHANGELOG**: Cut entries for 2.1.x–2.4.0 using `add_entry.py`. Source: `[Unreleased]` section + closed bd issues.
+2. **README `## Hooks Reference`**: Extract to `docs/hooks.md` (or promote `hook-system-summary.md`). Replace README section with one-line link.
+3. **README `## Policy System`**: Extract to `docs/policies.md`. Replace with summary + link.
+4. **README `## MCP Servers`**: Consolidate with `docs/mcp-servers-config.md` → rename to `docs/mcp-servers.md` + add frontmatter. Remove redundant README section.
+5. **README `### Skills`**: Extract to `docs/skills.md`. Replace with link.
+6. **Add frontmatter** to all 7 invalid-schema docs/ files, or delete stubs (`todo.md`, `pre-install-cleanup.md`).
+7. **README `## Version History` table**: Remove or replace with link to CHANGELOG — this is a maintenance liability.
+---
+## Audit Scope
+- Phase 1 (context): bd closed issues gathered (30-day window, 20+ issues found)
+- Phase 2 (drift detection): Skipped — `drift_detector.py` requires `pyyaml` (not installed)
+- Phase 3 (structure analysis): Complete — `doc_structure_analyzer.py` ran successfully
+- Phase 4 (execute): **NOT run** — audit-only task, no files modified
+- Phase 5 (validate): **NOT run** — no changes to validate
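
The CHANGELOG staleness finding in this report compares the `package.json` version against the newest released heading in `CHANGELOG.md`. A minimal sketch of that comparison follows, assuming headings of the form `## [2.0.0] - 2026-03-12` and plain `MAJOR.MINOR.PATCH` versions. The function names are hypothetical, and this is not presented as the skill's actual implementation.

```python
import json
import re

# Matches released headings such as "## [2.0.0] - 2026-03-12".
# "## [Unreleased]" is skipped because it has no numeric version.
HEADING = re.compile(r"^## \[(\d+)\.(\d+)\.(\d+)\]", re.MULTILINE)


def latest_changelog_version(changelog_text):
    """Return the highest released version as an int tuple, or None if there is none."""
    versions = [
        tuple(int(part) for part in match.groups())
        for match in HEADING.finditer(changelog_text)
    ]
    return max(versions) if versions else None


def changelog_is_stale(package_json_text, changelog_text):
    """True when package.json declares a newer version than any changelog entry."""
    # Assumption: the version is plain MAJOR.MINOR.PATCH with no pre-release suffix.
    version = json.loads(package_json_text)["version"]
    declared = tuple(int(part) for part in version.split("."))
    latest = latest_changelog_version(changelog_text)
    return latest is None or declared > latest
```

Comparing integer tuples rather than strings avoids ordering mistakes such as `"2.10.0" < "2.4.0"`.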

package/skills/sync-docs-workspace/iteration-3/eval-doc-audit/with_skill/run-1/grading.json ADDED

@@ -0,0 +1,97 @@
+{
+  "expectations": [
+    {
+      "text": "Ran doc_structure_analyzer.py and cited its output",
+      "passed": true,
+      "evidence": "Audit scope section (result.md line 123) states: 'Phase 3 (structure analysis): Complete \u2014 doc_structure_analyzer.py ran successfully'. The README extraction table, missing docs/ list, and invalid-schema file list all derive directly from that tool's output."
+    },
+    {
+      "text": "Named at least 2 specific README sections with their suggested docs/ destination",
+      "passed": true,
+      "evidence": "The report names 4 README sections with explicit targets in a structured table: '### Skills' -> docs/skills.md, '## Policy System' -> docs/policies.md, '## Hooks Reference' -> docs/hooks.md, '## MCP Servers' -> docs/mcp-servers.md. Repeated in the Recommended Next Steps section."
+    },
+    {
+      "text": "Did NOT run --fix or create/edit any files (audit-only mode respected)",
+      "passed": true,
+      "evidence": "Report header states 'Mode: Audit only (no files modified)'. Audit scope section (lines 123-125) explicitly states Phase 4 (execute) and Phase 5 (validate) were 'NOT run'. Only one output file exists (result.md), which is the report itself."
+    },
+    {
+      "text": "Report is actionable with clear next steps",
+      "passed": true,
+      "evidence": "The 'Recommended Next Steps (when executing)' section lists 7 numbered, concrete actions with specific scripts, file paths, and targets \u2014 e.g., 'Cut entries for 2.1.x\u20132.4.0 using add_entry.py', 'Extract to docs/hooks.md (or promote hook-system-summary.md)', 'Consolidate with docs/mcp-servers-config.md \u2192 rename to docs/mcp-servers.md + add frontmatter'."
+    }
+  ],
+  "summary": {
+    "passed": 4,
+    "failed": 0,
+    "total": 4,
+    "pass_rate": 1.0
+  },
+  "execution_metrics": {
+    "tool_calls": 0,
+    "total_tool_calls": 0,
+    "total_steps": 0,
+    "errors_encountered": 0,
+    "output_chars": 4247,
+    "transcript_chars": 0
+  },
+  "timing": {
+    "executor_duration_seconds": 119.1,
+    "grader_duration_seconds": 0.0,
+    "total_duration_seconds": 119.1
+  },
+  "claims": [
+    {
+      "claim": "Phase 2 (drift_detector.py) was skipped due to missing pyyaml dependency",
+      "type": "process",
+      "verified": true,
+      "evidence": "Audit scope section states: 'Phase 2 (drift detection): Skipped \u2014 drift_detector.py requires pyyaml (not installed)'. This is a legitimate constraint, not a shortcut."
+    },
+    {
+      "claim": "README is 192 lines with 5 extraction candidate sections",
+      "type": "factual",
+      "verified": false,
+      "evidence": "The report asserts this but there is no transcript or raw tool output to cross-check the line count or section count independently. The claim is internally consistent (the table lists 4 sections, with Version History as a 5th candidate mentioned separately), but cannot be verified from available outputs alone."
+    },
+    {
+      "claim": "package.json version is 2.4.0 and latest CHANGELOG entry is 2.0.0",
+      "type": "factual",
+      "verified": false,
+      "evidence": "Stated in the CHANGELOG section but no raw file excerpts are included in the output to confirm. Cannot be verified from outputs alone."
+    },
+    {
+      "claim": "7 existing docs/ files lack YAML frontmatter",
+      "type": "factual",
+      "verified": false,
+      "evidence": "The report lists 7 files by name and line count, but there is no attached tool output or file excerpts confirming the schema check result. Plausible but unverifiable from outputs."
+    },
+    {
+      "claim": "30-day closed issue window returned 20+ issues",
+      "type": "factual",
+      "verified": false,
+      "evidence": "Stated in audit scope but no issue list or raw bd output is included in the outputs directory."
+    }
+  ],
+  "user_notes_summary": {
+    "uncertainties": [],
+    "needs_review": [],
+    "workarounds": [
+      "drift_detector.py skipped due to missing pyyaml \u2014 Phase 2 coverage is absent"
+    ]
+  },
+  "eval_feedback": {
+    "suggestions": [
+      {
+        "assertion": "Ran doc_structure_analyzer.py and cited its output",
+        "reason": "The assertion passes on self-reporting alone \u2014 the report says the tool ran but no raw tool output or intermediate artifact is present in outputs/ to confirm. A stronger assertion would require either a separate structured artifact from doc_structure_analyzer.py (e.g., a JSON dump) or a transcript showing the Bash invocation and its stdout. As written, the assertion is satisfied even if the executor fabricated the results without running the script."
+      },
+      {
+        "reason": "No assertion checks the CHANGELOG gap finding, which is one of the two most substantive findings in the report. If the skill is supposed to detect version drift, that outcome should be independently verifiable \u2014 e.g., 'Report identifies a specific version gap between package.json and CHANGELOG'."
+      },
+      {
+        "reason": "No assertion checks for completeness of the invalid-schema docs/ list. The report claims 7 files lack frontmatter, but none of the evals verify this count or that any specific file was flagged. A weak form: 'Report lists at least 3 docs/ files with schema issues and names each'."
+      }
+    ],
+    "overall": "All 4 assertions pass and the report is genuinely high quality \u2014 specific, structured, and actionable. The main eval gap is that the primary process assertion (doc_structure_analyzer.py ran) is verifiable only through self-reporting, which means a fabricated run would also pass. Adding a requirement for a raw artifact from the script would make the eval discriminating."
+  }
+}
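
The `summary` block in this grading file follows directly from the `expectations` array. A small sketch of that derivation is shown below, assuming the field names that appear in the file. The `summarize` helper is hypothetical and not part of the package.

```python
def summarize(expectations):
    """Derive passed/failed/total/pass_rate from a list of expectation dicts."""
    total = len(expectations)
    passed = sum(1 for item in expectations if item.get("passed") is True)
    return {
        "passed": passed,
        "failed": total - passed,
        "total": total,
        # Guard against division by zero when no expectations are defined.
        "pass_rate": passed / total if total else 0.0,
    }
```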

package/skills/sync-docs-workspace/iteration-3/eval-doc-audit/with_skill/run-1/timing.json ADDED

@@ -0,0 +1,5 @@
+{
+  "total_tokens": 23778,
+  "duration_ms": 119086,
+  "total_duration_seconds": 119.1
+}

package/skills/sync-docs-workspace/iteration-3/eval-doc-audit/without_skill/outputs/result.md ADDED

@@ -0,0 +1,144 @@
+# Doc Audit: README.md vs docs/
+Audit date: 2026-03-18
+Method: Manual review of README.md sections against existing docs/ files.
+---
+## Summary
+README.md is 193 lines and covers seven distinct topic areas beyond a short intro and quick-start. Several of these areas already have dedicated docs/ files with substantially more depth. The README content in those sections is either redundant with docs/ or represents a thin version of content that belongs in docs/.
+---
+## Section-by-Section Analysis
+### 1. "Hooks Reference" (lines 114-141)
+**README content:** Two sub-sections — "Event Types" table (5 events) and hook-specific behavior tables for Main Guard and Beads Gates.
+**Existing doc:** `docs/hooks.md` — full reference with event model, hook groups, install profiles, operational workflow, troubleshooting. It covers all the same events and hook behaviors at greater depth.
+**Finding:** This section is a thin duplicate of `docs/hooks.md`. The event types table and hook behavior tables add no information not already in the dedicated doc. This section should be removed from README and replaced with a pointer to `docs/hooks.md`.
+**Suggested action:** Move to `docs/hooks.md` (already exists — content is already there).
+---
+### 2. "Policy System" (lines 66-86)
+**README content:** Overview of the policy system, a table of policy files (`main-guard.json`, `beads.json`, etc. with their runtimes), and compiler commands.
+**Existing doc:** `docs/policies.md` — exists but is a stub (generated template with placeholder "Describe what this document covers"). No real content yet.
+**Finding:** The README's Policy System section is the only written explanation of how the policy compiler works. It belongs in `docs/policies.md` as that doc's primary content, not in the README. The README could keep a one-line summary with a link.
+**Suggested action:** Move to `docs/policies.md` (stub file needs to be populated with this content).
+---
+### 3. "MCP Servers" (lines 143-158)
+**README content:** A table of xtrm-managed MCP servers (`gitnexus`, `github-grep`, `deepwiki`) and a list of official Claude plugins installed during `xtrm install all`.
+**Existing doc:** `docs/mcp.md` — full reference with canonical sources, server inventory (core and optional), operational workflow, troubleshooting. It has more servers listed.
+Also: `docs/mcp-servers.md` — exists but is a stub.
+**Finding:** The README's MCP table partially overlaps with `docs/mcp.md`. However, the README's list of official Claude plugins (`serena@claude-plugins-official`, `context7`, `github`, `ralph-loop`) is NOT present in `docs/mcp.md` — that is missing content that should be in the docs, not the README.
+**Suggested action:** Move official plugin list to `docs/mcp.md`. Reduce README to a link.
+---
+### 4. "CLI Commands" (lines 89-111)
+**README content:** Command table (`install all`, `install basic`, `install project`, `project init`, `status`, `clean`) and flags table (`--yes`, `--dry-run`, `--prune`).
+**Existing doc:** No dedicated `docs/cli.md` file exists. The CLI commands are scattered across `docs/skills.md`, `docs/project-skills.md`, and `docs/hooks.md` only in context.
+**Finding:** This section is a standalone CLI reference with no corresponding dedicated doc. It is appropriate to have a short CLI command table in the README, but a more complete reference (including flag interactions, edge cases, exit codes) would fit in a new `docs/cli.md`.
+**Suggested action:** Consider creating `docs/cli.md` as a dedicated CLI reference. README can keep the summary table.
+---
+### 5. "Plugin Structure" (lines 52-63)
+**README content:** A directory tree showing the plugin layout (`plugins/xtrm-tools/`, symlinks to `hooks/`, `skills/`, `.mcp.json`) and a note about `${CLAUDE_PLUGIN_ROOT}`.
+**Existing doc:** No dedicated architecture or plugin-structure doc exists in `docs/`.
+**Finding:** This is architecture documentation. It is brief (10 lines) and appropriate in a README for orientation, but would benefit from a dedicated `docs/architecture.md` or `docs/plugin-structure.md` that explains the symlink strategy, plugin manifest format, and how `${CLAUDE_PLUGIN_ROOT}` is resolved.
+**Suggested action:** Consider moving to a new `docs/plugin-structure.md` or `docs/architecture.md`.
+---
+### 6. "Issue Tracking (Beads)" (lines 161-168)
+**README content:** Three `bd` commands (`bd ready`, `bd update`, `bd close`).
+**Existing doc:** `docs/hooks.md` has an "Operational Workflow (Beads + Hooks)" section with the full `bd kv` workflow. No dedicated `docs/beads.md` exists.
+**Finding:** The README's beads section is a minimal cheat-sheet. The deeper workflow is in `docs/hooks.md` but conflated with hook behavior. A dedicated `docs/beads.md` would cleanly own issue tracking documentation. The README's three-command snippet is appropriate to keep as a quick reference.
+**Suggested action:** Low priority. A dedicated `docs/beads.md` could extract the operational workflow from `docs/hooks.md` and give beads its own home.
+---
+### 7. "Version History" (lines 179-187)
+**README content:** A 4-row version table (2.3.0 through 1.7.0).
+**Existing doc:** `CHANGELOG.md` is referenced at the top of README as the "Full version history."
+**Finding:** The README links to CHANGELOG.md for full history but also maintains its own abbreviated table. This is a minor duplication — the table will drift from CHANGELOG.md over time.
+**Suggested action:** Remove the version table from README and rely entirely on the CHANGELOG.md link already present. Not a docs/ migration — just README cleanup.
+---
+## Priority Ranking
+| Priority | Section | Action | Target |
+|----------|---------|--------|--------|
+| High | Hooks Reference | Remove from README; already in docs/ | `docs/hooks.md` (exists) |
+| High | Policy System | Move content to stub doc | `docs/policies.md` (stub, needs population) |
+| High | MCP official plugins list | Add missing content to docs/ | `docs/mcp.md` (exists) |
+| Medium | CLI Commands | Create dedicated reference doc | `docs/cli.md` (new) |
+| Medium | Plugin Structure | Create architecture doc | `docs/plugin-structure.md` (new) |
+| Low | Issue Tracking (Beads) | Extract from hooks.md | `docs/beads.md` (new) |
+| Low | Version History | Remove table; link to CHANGELOG | README-only cleanup |
+---
+## What README.md Should Retain
+After moving the above sections, README.md should contain only:
+- Project tagline and version badge
+- Quick Start install commands
+- "What's Included" summary tables (Core Enforcement + Skills) — orientation, not reference
+- One-line pointers to each docs/ page
+- License
+The README's role is orientation and discoverability. All reference content belongs in `docs/`.
+---
+## Existing docs/ Coverage Map
+| docs/ file | Status | Covers |
+|-----------|--------|--------|
+| `docs/hooks.md` | Complete | Hook events, groups, profiles, beads workflow |
+| `docs/skills.md` | Complete | Global skills catalog and authoring |
+| `docs/project-skills.md` | Complete | Project-local skills, install flow |
+| `docs/mcp.md` | Complete (missing plugin list) | MCP server inventory and workflow |
+| `docs/policies.md` | Stub | Policy system (README has the content) |
+| `docs/mcp-servers.md` | Stub | Duplicate of mcp.md? Unclear purpose |
+| `docs/mcp.md` | Complete | MCP reference |
+| `docs/pi-extensions.md` | Not read | Pi extensions |
+| `docs/testing.md` | Not read | Testing |
+| `docs/todo.md` | Not read | Backlog |
+| `docs/cli.md` | Missing | CLI commands reference |
+| `docs/plugin-structure.md` | Missing | Plugin architecture |
+| `docs/beads.md` | Missing | Issue tracking workflow |

package/skills/sync-docs-workspace/iteration-3/eval-doc-audit/without_skill/run-1/grading.json ADDED

@@ -0,0 +1,78 @@
+{
+  "expectations": [
+    {
+      "text": "Ran doc_structure_analyzer.py and cited its output",
+      "passed": false,
+      "evidence": "result.md explicitly states 'Method: Manual review of README.md sections against existing docs/ files.' There is no mention of doc_structure_analyzer.py anywhere in the output. The executor performed a manual audit rather than using the specified tool."
+    },
+    {
+      "text": "Named at least 2 specific README sections with their suggested docs/ destination",
+      "passed": true,
+      "evidence": "The report names 7 README sections with specific docs/ destinations. Examples: 'Hooks Reference' (lines 114-141) -> 'docs/hooks.md (exists)'; 'Policy System' (lines 66-86) -> 'docs/policies.md (stub, needs population)'; 'MCP Servers' (lines 143-158) -> 'docs/mcp.md (exists)'. The priority table at lines 103-111 clearly maps each section to a target file."
+    },
+    {
+      "text": "Did NOT run --fix or create/edit any files (audit-only mode respected)",
+      "passed": true,
+      "evidence": "The only output file is result.md (the report itself). No docs/ files were created or modified. All suggestions use language like 'Suggested action: Move to...' or 'Consider creating...' rather than performing the actions. The outputs directory contains only result.md."
+    },
+    {
+      "text": "Report is actionable with clear next steps",
+      "passed": true,
+      "evidence": "The report includes a priority ranking table (lines 103-111) with Priority (High/Medium/Low), Section, Action, and Target columns for all 7 sections. Each section analysis also ends with an explicit 'Suggested action:' line. The report concludes with a 'What README.md Should Retain' section describing the end-state goal."
+    }
+  ],
+  "summary": {
+    "passed": 3,
+    "failed": 1,
+    "total": 4,
+    "pass_rate": 0.75
+  },
+  "execution_metrics": {},
+  "timing": {
+    "executor_duration_seconds": 95.9,
+    "grader_duration_seconds": 0.0,
+    "total_duration_seconds": 95.9
+  },
+  "claims": [
+    {
+      "claim": "README.md is 193 lines and covers seven distinct topic areas",
+      "type": "factual",
+      "verified": false,
+      "evidence": "Cannot verify without reading the actual README.md. The executor claims this but the grader did not independently confirm the line count or section count."
+    },
+    {
+      "claim": "docs/policies.md is a stub with placeholder 'Describe what this document covers'",
+      "type": "factual",
+      "verified": false,
+      "evidence": "Cannot verify without reading docs/policies.md directly. This is a specific factual claim about file content that was not independently checked."
+    },
+    {
+      "claim": "The official Claude plugins list (serena, context7, github, ralph-loop) is NOT present in docs/mcp.md",
+      "type": "factual",
+      "verified": false,
+      "evidence": "Cannot verify without reading docs/mcp.md. This is an important claim driving the 'High' priority action item but was not independently confirmed."
+    },
+    {
+      "claim": "No dedicated docs/cli.md exists",
+      "type": "factual",
+      "verified": false,
+      "evidence": "Executor states 'No dedicated docs/cli.md file exists' but this was not independently verified by the grader."
+    }
+  ],
+  "user_notes_summary": {},
+  "eval_feedback": {
+    "suggestions": [
+      {
+        "assertion": "Ran doc_structure_analyzer.py and cited its output",
+        "reason": "This is the most important assertion and it caught a real failure: the executor bypassed the required tooling entirely and did a manual audit. However, the assertion would benefit from being more specific about what 'cited its output' means \u2014 does it require quoting specific lines from the script output, or just referencing that it was run? Making this concrete (e.g., 'output includes at least 3 file paths emitted by the script') would prevent partial-credit ambiguity."
+      },
+      {
+        "reason": "No assertion checks whether the sections named are real README sections at the correct line numbers. The report cites specific line ranges (e.g., 'lines 114-141' for Hooks Reference) but no eval verifies these against the actual README. A hallucinated or misidentified section would pass expectation 2 unchallenged."
+      },
+      {
+        "reason": "Expectation 3 ('audit-only mode respected') is verifiable only by absence \u2014 it checks that nothing bad happened. Consider an affirmative version: 'Report contains explicit disclaimers that changes were not made' or 'All action items use conditional language (consider, suggest, move).' The current form passes trivially since we can only observe the outputs directory, not rule out side effects in other locations."
+      }
+    ],
+    "overall": "The evals caught the key failure (tool not used) but rely on the grader trusting the executor's factual claims about file contents and line numbers without a way to verify them. Adding assertions that spot-check at least one factual claim against the actual codebase would strengthen the suite."
+  }
+}

package/skills/sync-docs-workspace/iteration-3/eval-doc-audit/without_skill/run-1/timing.json ADDED

@@ -0,0 +1,5 @@
+{
+  "total_tokens": 23556,
+  "duration_ms": 95938,
+  "total_duration_seconds": 95.9
+}

package/skills/sync-docs-workspace/iteration-3/eval-fix-mode/eval_metadata.json ADDED

@@ -0,0 +1,27 @@
+{
+  "eval_id": 2,
+  "eval_name": "fix-mode",
+  "prompt": "Run sync-docs --fix on this project and remember what you did with bd.",
+  "assertions": [
+    {
+      "text": "Ran doc_structure_analyzer.py with --fix flag",
+      "passed": false,
+      "evidence": ""
+    },
+    {
+      "text": "Handled both MISSING scaffolds AND INVALID_SCHEMA files (or correctly reported none found)",
+      "passed": false,
+      "evidence": ""
+    },
+    {
+      "text": "Ran bd remember and reported the memory key",
+      "passed": false,
+      "evidence": ""
+    },
+    {
+      "text": "Ran validate_doc.py docs/ after fixing to confirm results",
+      "passed": false,
+      "evidence": ""
+    }
+  ]
+}
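
The INVALID_SCHEMA assertion in this metadata concerns docs/ files that lack YAML frontmatter. A rough sketch of such a check, plus a minimal injection, is shown below. It assumes frontmatter means an opening `---` line followed later by a closing `---` line. The function names and the `title` field are illustrative assumptions, not the behavior of `validate_doc.py`.

```python
def has_frontmatter(text):
    """Return True if the text opens with a '---' delimited block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    # A closing delimiter must appear somewhere after the opening line.
    return any(line.strip() == "---" for line in lines[1:])


def inject_minimal_frontmatter(text, title):
    """Prepend a minimal frontmatter block unless one is already present."""
    if has_frontmatter(text):
        return text
    return f"---\ntitle: {title}\n---\n{text}"
```

The check is deliberately loose: a Markdown file that starts with a horizontal rule and contains a second one later would also be treated as having frontmatter, so a real validator would likely parse the block as YAML as well.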

package/skills/sync-docs-workspace/iteration-3/eval-fix-mode/with_skill/outputs/result.md ADDED

@@ -0,0 +1,104 @@
+---
+title: sync-docs --fix Evaluation Result
+iteration: 3
+eval: eval-fix-mode/with_skill
+date: 2026-03-18
+---
+## Command Run
+```bash
+python3 /home/dawid/projects/xtrm-tools/skills/sync-docs/scripts/doc_structure_analyzer.py --fix --bd-remember
+```
+Executed from worktree: `/home/dawid/projects/xtrm-tools/.claude/worktrees/agent-a6173141`
+Project root resolved by script: same worktree path (auto-detected via `.git`)
+Main repo root resolved (for bd): `/home/dawid/projects/xtrm-tools` (via `.git` file → worktree gitdir → main repo)
+---
+## Pre-fix State
+- Total issues detected: **14**
+- docs/ gaps (MISSING): 5 files
+- Schema-invalid existing docs/: 7 files
+- README: EXTRACTABLE (192 lines; 5 sections are candidates for extraction; under the 200-line bloat threshold)
+- CHANGELOG: STALE (last entry 2026-03-12, last commit 2026-03-18; package.json v2.4.0 but latest CHANGELOG entry is v2.0.0)
+---
+## Files Created (docs/ scaffolds)
+| File | Reason |
+|---|---|
+| `docs/hooks.md` | `hooks/` directory exists |
+| `docs/pi-extensions.md` | `config/pi/extensions/` directory exists |
+| `docs/mcp-servers.md` | `.mcp.json` present |
+| `docs/policies.md` | `policies/` directory exists |
+| `docs/skills.md` | `skills/` directory exists |
+All 5 scaffolds generated with valid YAML frontmatter via `validate_doc.py --generate`.
+---
+## Files Fixed (frontmatter injected)
+| File | Action |
+|---|---|
+| `docs/cleanup.md` | Minimal frontmatter injected |
+| `docs/delegation-architecture.md` | Minimal frontmatter injected |
+| `docs/hook-system-summary.md` | Minimal frontmatter injected |
+| `docs/mcp-servers-config.md` | Minimal frontmatter injected |
+| `docs/pi-extensions-migration.md` | Minimal frontmatter injected |
+| `docs/pre-install-cleanup.md` | Minimal frontmatter injected |
+| `docs/todo.md` | Minimal frontmatter injected |
+---
+## bd remember Outcome
+```
+stored: true
+key:    sync-docs-fix-2026-03-18
+```
+Insight stored:
+> sync-docs --fix: created 5 scaffold(s): hooks.md, pi-extensions.md, mcp-servers.md, policies.md, skills.md; added frontmatter to 7 existing file(s): cleanup.md, delegation-architecture.md, hook-system-summary.md, mcp-servers-config.md, pi-extensions-migration.md, pre-install-cleanup.md, todo.md. Fill in content and run validate_doc.py docs/ to confirm schema.
+bd remember worked from the worktree. The script correctly resolved the main repo root from
+`/home/dawid/projects/xtrm-tools/.claude/worktrees/agent-a6173141/.git` (a gitdir pointer file)
+→ worktree gitdir at `.git/worktrees/agent-a6173141`
+→ main `.git/` at `/home/dawid/projects/xtrm-tools/.git`
+→ main repo root at `/home/dawid/projects/xtrm-tools`
+`.beads/` exists at the main repo root, so the condition `(main_root / ".beads").exists()` passed.
+---
+## validate_doc.py Results
+```
+Result: 12/12 files passed
+```
+All 12 docs/ files passed schema validation. 11 of 12 received a `WARN: INDEX regenerated`
+(the INDEX table was auto-inserted by validate_doc.py on first pass). `docs/todo.md` had no
+`##` headings so no INDEX was generated — it passed cleanly with no warnings.
+---
+## Post-fix Summary
+| Metric | Value |
+|---|---|
+| Pre-fix issues | 14 |
+| Fixed by --fix | 12 |
+| Remaining issues | 2 (README: EXTRACTABLE, CHANGELOG: STALE) |
+| docs/ gaps remaining | 0 |
+| Schema-invalid files remaining | 0 |
+| validate_doc.py | 12/12 PASS |
+| bd remember stored | true |
+Remaining 2 issues (README extraction candidates and CHANGELOG staleness) require manual
+intervention — README extraction needs content judgment (Serena), and CHANGELOG needs a new
+entry for v2.1.0–v2.4.0 changes.
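
The worktree resolution chain described under "bd remember Outcome" (pointer file, then worktree gitdir, then main `.git/`, then main repo root) can be sketched as follows. This is a minimal illustration that assumes the standard Git linked-worktree layout, where the worktree's `.git` file holds a `gitdir: <main>/.git/worktrees/<name>` line. The function names are hypothetical and this is not the script's actual code.

```python
from pathlib import Path


def resolve_main_repo_root(worktree_root):
    """Return the main repository root for a checkout or linked worktree, else None."""
    dot_git = Path(worktree_root) / ".git"
    if dot_git.is_dir():
        # Ordinary checkout: the directory holding .git/ is already the main root.
        return Path(worktree_root)
    if not dot_git.is_file():
        return None
    # Linked worktree: .git is a pointer file containing "gitdir: <path>".
    content = dot_git.read_text().strip()
    if not content.startswith("gitdir:"):
        return None
    gitdir = Path(content[len("gitdir:"):].strip())
    if not gitdir.is_absolute():
        gitdir = (Path(worktree_root) / gitdir).resolve()
    # Expected layout: <main>/.git/worktrees/<name>
    if gitdir.parent.name != "worktrees":
        return None
    main_git_dir = gitdir.parent.parent
    return main_git_dir.parent


def beads_available(worktree_root):
    """Mirror the report's condition: .beads/ must exist at the main repo root."""
    main_root = resolve_main_repo_root(worktree_root)
    return main_root is not None and (main_root / ".beads").exists()
```

Checking that the gitdir's parent is named `worktrees` keeps the sketch from misreading submodule pointer files, which use a `.git/modules/<name>` layout instead.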