npm - murmur8 - Versions diffs - 4.4.0 → 4.5.1 - Mend

murmur8 4.4.0 → 4.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

package/.blueprint/features/feature_feedback-test/FEATURE_SPEC.md +229 -0
package/.blueprint/features/feature_feedback-test/IMPLEMENTATION_PLAN.md +25 -0
package/.blueprint/features/feature_feedback-test/handoff-alex.md +20 -0
package/.blueprint/features/feature_feedback-test/handoff-cass.md +21 -0
package/.blueprint/features/feature_feedback-test/handoff-nigel.md +20 -0
package/.blueprint/features/feature_feedback-test/story-config-management.md +103 -0
package/.blueprint/features/feature_feedback-test/story-parse-pipeline.md +65 -0
package/.blueprint/features/feature_feedback-test/story-validation-normalisation.md +99 -0
package/README.md +18 -0
package/SKILL.md +35 -24
package/package.json +1 -1
package/src/commands/history.js +41 -2
package/src/history.js +31 -0
package/src/index.js +2 -1
package/src/murm.js +50 -0

package/.blueprint/features/feature_feedback-test/FEATURE_SPEC.md ADDED

@@ -0,0 +1,229 @@
+# Feature Specification — Feedback Module Test Suite
+## 1. Feature Intent
+**Why this feature exists.**
+- **Problem being addressed:** The `src/feedback.js` module provides foundational logic for the agent feedback loop — schema validation, quality gate evaluation, key normalisation, config management, and feedback parsing. No test file directly imports and exercises this module's exported API. Existing tests (feature_feedback-loop, feature_compressed-feedback) re-implement helper logic inline rather than testing the real module, leaving the production code untested.
+- **User need:** Developers maintaining or extending `src/feedback.js` need confidence that the exported functions behave correctly and that regressions are caught immediately by the test suite.
+- **System alignment:** Per SYSTEM_SPEC.md:Section 7 (Implementation Rules), "tests are contracts" and the suite must be green before a feature is considered complete. Untested production modules violate this principle and expose the pipeline to silent breakage.
+> This feature creates a direct unit-test harness for `src/feedback.js`, closing the coverage gap introduced by the feedback-loop and compressed-feedback features.
+---
+## 2. Scope
+### In Scope
+- Unit tests that `require('../src/feedback')` and call its exported functions directly
+- Coverage of all exported functions:
+  - `validateFeedback(feedback)` — schema validation
+  - `normalizeFeedbackKeys(feedback)` — `rec` → `recommendation` normalisation
+  - `parseFeedbackFromOutput(output)` — regex extraction and JSON parsing
+  - `shouldPause(feedback, config)` — quality gate decision logic
+  - `getDefaultConfig()` — default config shape and values
+  - `readConfig()` — file read with fallback to defaults
+  - `writeConfig(config)` — file write
+  - `setConfigValue(key, value)` — validated config mutation
+  - `resetConfig()` — restores defaults
+  - `displayConfig()` — smoke test (no crash, correct output shape)
+- File system isolation using `tmp` directories, matching the pattern established by `feature_feedback-loop.test.js`
+- Edge cases: corrupt config file, missing config file, boundary rating values, both `rec` and `recommendation` keys present
+### Out of Scope
+- Testing agent prompt text (covered by feature_compressed-feedback)
+- Integration tests spanning multiple modules (covered by feature_feedback-loop)
+- Testing `displayConfig` output formatting exhaustively (smoke test only)
+- Testing the insights calibration or issue-correlation logic (covered by feature_feedback-loop:Feedback Insights)
+---
+## 3. Actors Involved
+### Developer / Test Runner
+- **Can do:** Run `node --test test/feature_feedback-test.test.js` to verify `src/feedback.js` behaviour
+- **Cannot do:** Modify production code via test execution
+### src/feedback.js (module under test)
+- **Provides:** All exported functions listed in Section 2
+- **Constrained by:** Existing call sites; test must not require API changes
+### File System (test isolation)
+- **Provides:** Temporary directories for config file read/write tests
+- **Pattern:** `fs.mkdtempSync` setup / `fs.rmSync` teardown per describe block
+---
+## 4. Behaviour Overview
+### Happy Path: All Exported Functions Are Tested
+1. Test file imports `src/feedback.js` module
+2. Each exported function has one or more test cases covering:
+   - Correct inputs → expected outputs
+   - Boundary inputs → correct handling
+   - Invalid inputs → appropriate rejection or graceful degradation
+3. File system tests use isolated `tmp` directories to avoid cross-test pollution
+4. `process.chdir` is restored after each file-system test group
+5. All tests pass green; CI accepts the file
+### Alternative: Config File Corruption
+1. Test writes deliberately malformed JSON to the config file path
+2. `readConfig()` catches the parse error and returns defaults
+3. No exception propagates; test asserts returned value equals `getDefaultConfig()`
+### Alternative: Boundary Rating Validation
+1. Tests cover ratings 1 and 5 (valid boundaries) and 0 and 6 (invalid, outside the range)
+2. Tests cover `rating: 3.5` (non-integer, invalid) and `rating: 3` (integer, valid)
+3. `validateFeedback` returns `{ valid: false, errors: [...] }` for all invalid cases
+---
+## 5. State & Lifecycle Interactions
+- **State-creating:** None — the test file does not introduce new runtime state
+- **State-constrained:** Tests manage transient file system state (tmp directories)
+- **Module lifecycle:** `require('../src/feedback')` is resolved once per test file run; config file paths are relative and resolved against `process.cwd()`, which tests temporarily redirect
+**Key constraint:** `src/feedback.js` uses a module-level constant `CONFIG_FILE = '.claude/feedback-config.json'` resolved relative to `process.cwd()`. Tests must `process.chdir(testDir)` before any call that reads/writes config, and `process.chdir` back to the original working directory in teardown.
+---
+## 6. Rules & Decision Logic
+### Rule 1: Direct Module Import Required
+- **Description:** Tests must import the real `src/feedback.js` rather than re-implementing its logic
+- **Rationale:** Inline reimplementation does not catch bugs in production code
+- **Inputs:** `require('../src/feedback')`
+- **Outputs:** Live module reference
+- **Type:** Structural constraint
+### Rule 2: Isolated File System Per Describe Block
+- **Description:** Each describe block that touches the config file must set up and tear down its own `tmp` directory
+- **Inputs:** `fs.mkdtempSync`, `process.chdir`
+- **Outputs:** Isolated state per describe block
+- **Type:** Deterministic
+### Rule 3: Boundary Coverage for Rating
+- **Description:** Rating validation must be tested at values 0, 1, 3, 5, 6 and non-integer 3.5
+- **Type:** Deterministic
+### Rule 4: Dual-Key Normalisation Coverage
+- **Description:** `normalizeFeedbackKeys` must be tested for: `rec` only, `recommendation` only, both present (recommendation wins), neither present
+- **Type:** Deterministic
+### Rule 5: Parse-and-Validate Pipeline
+- **Description:** At least one test must chain `parseFeedbackFromOutput` → `normalizeFeedbackKeys` → `validateFeedback` to verify the end-to-end extraction path works against the real module
+- **Type:** Integration within module boundary
+---
+## 7. Dependencies
+### System Components
+- **src/feedback.js:** Module under test — no modifications required
+- **node:test, node:assert:** Node.js built-in test runner and assertions (Node 18+)
+- **fs, path, os:** Standard library for file system isolation
+### File Dependencies
+- Input: `src/feedback.js` (read-only from test perspective)
+- Output: `test/feature_feedback-test.test.js` (new file)
+### Existing Test Patterns
+- Isolation pattern from `test/feature_feedback-loop.test.js` (setupTestDir / teardownTestDir)
+- Module import pattern from `test/feature_theme-adoption.test.js` and `test/feature_config-factory.test.js`
+---
+## 8. Non-Functional Considerations
+### Performance
+- All tests are synchronous file system operations on tmp dirs; expected runtime < 100ms total
+### Maintainability
+- Tests are structured to mirror `src/feedback.js` exported API, making it easy to add tests as the module evolves
+- Describe block names match function groups: `validateFeedback`, `normalizeFeedbackKeys`, `parseFeedbackFromOutput`, `shouldPause`, `Config Management`
+### Error Tolerance
+- Tmp directory teardown uses `{ force: true }` to tolerate partial cleanup on test failure
+### No Side Effects
+- Tests do not modify any project-level `.claude/` files; all file I/O is confined to `tmp` directories
+---
+## 9. Assumptions & Open Questions
+### Assumptions
+- ASSUMPTION: `src/feedback.js` exports are stable; no API changes are required to make it testable
+- ASSUMPTION: `process.chdir` correctly redirects the module's relative path resolution for `CONFIG_FILE`
+- ASSUMPTION: Node.js 18+ is available (required by SYSTEM_SPEC.md:Section 2)
+- ASSUMPTION: `displayConfig` writes to stdout; smoke test asserts it does not throw
+### Open Questions
+- Should `displayConfig` be tested with a captured stdout mock, or is a non-throw assertion sufficient? (INFERRED: non-throw is sufficient for this feature)
+- Should `setConfigValue` with unknown keys be tested? (INFERRED: yes, as the function throws a typed error that should be verified)
+- Are there any async code paths in `src/feedback.js`? (INFERRED: no — all operations are synchronous based on current implementation)
+---
+## 10. Impact on System Specification
+### Reinforces Existing Assumptions
+- Per SYSTEM_SPEC.md:Section 7, "tests are contracts" and "green suite required" — this feature closes a gap where contracts were implied but not enforced
+- Per SYSTEM_SPEC.md:Section 8 (Traceability), tests that directly import production modules create a firmer traceability chain than tests using reimplemented helpers
+### No Contradiction
+This feature introduces no new behaviour, state, or API. It adds test coverage for existing production code. No system spec update is required.
+---
+## 11. Handover to BA (Cass)
+### Story Themes
+1. **Direct Module Tests:** Tests that import and exercise `validateFeedback`, `normalizeFeedbackKeys`, `parseFeedbackFromOutput`, and `shouldPause` via the real module
+2. **Config Management Tests:** Tests for `readConfig`, `writeConfig`, `setConfigValue`, `resetConfig`, and `getDefaultConfig` with file system isolation
+3. **End-to-End Parse Pipeline:** A chained test covering `parseFeedbackFromOutput` → `normalizeFeedbackKeys` → `validateFeedback` as an integrated path
+### Expected Story Boundaries
+- Story 1: Validation and normalisation functions (no file system needed)
+- Story 2: Config management functions (file system isolation required)
+- Story 3: Parse pipeline (combines Stories 1 and 2 patterns)
+### Areas Needing Careful Story Framing
+- `process.chdir` usage must be clearly framed as test infrastructure, not production behaviour
+- The distinction between this test file and `feature_feedback-loop.test.js` must be explicit: this tests the real module; that uses inline helpers
+- `displayConfig` story should be framed as a smoke test, not a full output assertion
+---
+## 12. Change Log (Feature-Level)
+| Date       | Change                              | Reason                                  | Raised By |
+|------------|-------------------------------------|-----------------------------------------|-----------|
+| 2026-05-19 | Initial feature specification       | Close test coverage gap for src/feedback.js | Alex  |
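
Reviewer sketch for the "Config File Corruption" path and the `process.chdir` constraint in the spec above. This is a minimal illustration, not the code in `src/feedback.js`: `getDefaultConfig` and `readConfig` here are hypothetical stand-ins that only mimic the behaviour the spec describes, `withTempCwd` is an invented helper name, and the empty `issueMappings` is a placeholder for the six mappings the real module is said to define.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Path taken from the spec; resolved against process.cwd() at call time.
const CONFIG_FILE = '.claude/feedback-config.json';

// Hypothetical stand-in: the real defaults also carry six issue mappings.
function getDefaultConfig() {
  return { minRatingThreshold: 3.0, enabled: true, issueMappings: {} };
}

// Hypothetical stand-in: a missing or malformed file falls back to defaults.
function readConfig() {
  try {
    return JSON.parse(fs.readFileSync(CONFIG_FILE, 'utf8'));
  } catch {
    return getDefaultConfig();
  }
}

// Invented helper: run fn with a fresh tmp dir as the working directory,
// then restore the original cwd and remove the directory even if fn throws.
function withTempCwd(fn) {
  const originalCwd = process.cwd();
  const testDir = fs.mkdtempSync(path.join(os.tmpdir(), 'feedback-test-'));
  process.chdir(testDir);
  try {
    return fn(testDir);
  } finally {
    process.chdir(originalCwd);
    fs.rmSync(testDir, { recursive: true, force: true });
  }
}

// Write deliberately malformed JSON, then read it back through readConfig().
const fromCorruptFile = withTempCwd(() => {
  fs.mkdirSync('.claude');
  fs.writeFileSync(CONFIG_FILE, '{bad json');
  return readConfig();
});
```

In the real suite the same setup and teardown would sit in `before`/`after` hooks and `readConfig` would come from `require('../src/feedback')`; the `try`/`finally` form is used here only so the sketch stays self-contained.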

package/.blueprint/features/feature_feedback-test/IMPLEMENTATION_PLAN.md ADDED

@@ -0,0 +1,25 @@
+# Implementation Plan — feedback-test
+## Summary
+This feature adds a test suite for `src/feedback.js`. All 34 tests were written and verified green by Nigel prior to this planning phase — `src/feedback.js` already exports every required function and no production code changes are needed. Implementation is test-only: the test file and its artifact already exist and pass.
+## Files to Create/Modify
+| Path | Action | Purpose |
+|------|--------|---------|
+| `test/feature_feedback-test.test.js` | Already created (Nigel) | 34 tests covering all exported feedback functions |
+| `test/artifacts/feature_feedback-test/test-spec.md` | Already created (Nigel) | AC-to-test-ID mapping and assumptions |
+| `src/feedback.js` | No change required | All required exports already present and correct |
+## Implementation Steps
+1. **Verify tests pass as-is** — Run `node --test test/feature_feedback-test.test.js` to confirm all 34 tests are green. Addresses all test IDs (T-VN-*, T-CM-*, T-PP-*).
+2. **No production code changes needed** — `src/feedback.js` already exports `validateFeedback`, `normalizeFeedbackKeys`, `parseFeedbackFromOutput`, `shouldPause`, `getDefaultConfig`, `readConfig`, `writeConfig`, `setConfigValue`, `displayConfig`, and `resetConfig` with correct behaviour.
+3. **Commit the new test artefacts** — Stage and commit `test/feature_feedback-test.test.js` and `test/artifacts/feature_feedback-test/test-spec.md` along with this plan.
+## Risks / Questions
+- None. Nigel confirmed all 34 tests pass against the unmodified production file before handoff.

package/.blueprint/features/feature_feedback-test/handoff-alex.md ADDED

@@ -0,0 +1,20 @@
+## Handoff Summary
+**For:** Cass
+**Feature:** feedback-test
+### Key Decisions
+- Scope is unit tests for `src/feedback.js` exclusively — the production module is imported directly, not reimplemented inline
+- All ten exported functions are covered: `validateFeedback`, `normalizeFeedbackKeys`, `parseFeedbackFromOutput`, `shouldPause`, `getDefaultConfig`, `readConfig`, `writeConfig`, `setConfigValue`, `resetConfig`, and `displayConfig`
+- File system tests use `tmp` directory isolation with `process.chdir` (matching `feature_feedback-loop.test.js` pattern)
+- `displayConfig` is smoke-tested only (non-throw assertion); full stdout capture is out of scope
+- One chained integration test covers the full parse pipeline: `parseFeedbackFromOutput` → `normalizeFeedbackKeys` → `validateFeedback`
+### Files Created
+- .blueprint/features/feature_feedback-test/FEATURE_SPEC.md
+### Open Questions
+- Whether `displayConfig` warrants stdout capture mocking (deferred; non-throw is sufficient for now)
+- Async paths in `src/feedback.js`: inferred none exist, but Cass should confirm before writing stories
+### Critical Context
+The key distinction from existing feedback tests: `feature_feedback-loop.test.js` and `feature_compressed-feedback.test.js` both re-implement feedback logic as inline helpers — they do not import `src/feedback.js`. This feature exists precisely to test the real production module. Stories must keep this boundary clear. The output file is `test/feature_feedback-test.test.js` and must use `require('../src/feedback')`.

package/.blueprint/features/feature_feedback-test/handoff-cass.md ADDED

@@ -0,0 +1,21 @@
+## Handoff Summary
+**For:** Nigel
+**Feature:** feedback-test
+### Key Decisions
+- Split into three stories matching Alex's expected boundaries: validation/normalisation (pure), config management (file I/O), and parse pipeline (chained integration)
+- `normalizeFeedbackKeys` with both `rec` and `recommendation` present: production code does NOT delete `rec` when `recommendation` already exists — both keys are preserved; ACs reflect actual implementation
+- File system isolation pattern documented explicitly per `feature_feedback-loop.test.js`; `process.chdir` restore is mandatory in teardown
+- `displayConfig` framed as smoke test (no-throw only); no stdout capture required
+- All functions confirmed synchronous — no async/await needed in tests
+### Files Created
+- .blueprint/features/feature_feedback-test/story-validation-normalisation.md
+- .blueprint/features/feature_feedback-test/story-config-management.md
+- .blueprint/features/feature_feedback-test/story-parse-pipeline.md
+### Open Questions
+- None
+### Critical Context
+Output file is `test/feature_feedback-test.test.js` using `require('../src/feedback')` — not inline reimplementations. `CONFIG_FILE` is resolved relative to `process.cwd()`; all config tests must `chdir` into a `tmp` dir before calling any read/write function. Story-parse-pipeline covers Rule 5 (chained integration test) from FEATURE_SPEC.md. Rating boundary values to test: 0, 1, 3, 5, 6, 3.5.

package/.blueprint/features/feature_feedback-test/handoff-nigel.md ADDED

@@ -0,0 +1,20 @@
+## Handoff Summary
+**For:** Codey
+**Feature:** feedback-test
+### Key Decisions
+- Tests import `require('../src/feedback')` directly — no inline reimplementation anywhere
+- Config tests use `before`/`after` (not `beforeEach`/`afterEach`) since all config tests share one tmp dir with sequential state
+- `normalizeFeedbackKeys` dual-key test asserts both keys are preserved (production does NOT delete `rec` when `recommendation` already exists)
+- `displayConfig` covered as smoke test only (no stdout capture)
+- All 34 tests are synchronous; no async/await used
+### Files Created
+- test/artifacts/feature_feedback-test/test-spec.md
+- test/feature_feedback-test.test.js
+### Open Questions
+- None
+### Critical Context
+All 34 tests pass green (`node --test test/feature_feedback-test.test.js`). No changes to `src/feedback.js` are required — the existing exports satisfy all ACs. The Config Management describe block uses a single shared tmp dir (`before`/`after`), so tests within it run sequentially and depend on one another for state (e.g. T-CM-4.1 leaves a modified config file that T-CM-5.1 then resets). Codey need not modify production code; this feature is test-only.

package/.blueprint/features/feature_feedback-test/story-config-management.md ADDED

@@ -0,0 +1,103 @@
+# Story: Config Management Functions
+## User Story
+As a developer maintaining `src/feedback.js`,
+I want direct unit tests for `getDefaultConfig`, `readConfig`, `writeConfig`, `setConfigValue`, `resetConfig`, and `displayConfig`,
+so that config persistence logic in the production module is verified against real file I/O in an isolated environment.
+---
+## Acceptance Criteria
+**Given** `getDefaultConfig()` is called,
+**When** the function returns,
+**Then** the result has `minRatingThreshold: 3.0`, `enabled: true`, and an `issueMappings` object containing all six standard mappings defined in `src/feedback.js`.
+**Given** a `tmp` directory is set as `process.cwd()` and no config file exists there,
+**When** `readConfig()` is called,
+**Then** it returns a value equal to `getDefaultConfig()` and does not throw.
+**Given** a `tmp` directory is set as `process.cwd()` and `.claude/feedback-config.json` contains valid JSON,
+**When** `readConfig()` is called,
+**Then** it returns the parsed config object matching the written content.
+**Given** a `tmp` directory is set as `process.cwd()` and `.claude/feedback-config.json` contains malformed JSON (e.g. `{bad json`),
+**When** `readConfig()` is called,
+**Then** it returns `getDefaultConfig()` and does not throw.
+**Given** a `tmp` directory is set as `process.cwd()`,
+**When** `writeConfig(config)` is called with a valid config object,
+**Then** `.claude/feedback-config.json` is created at the expected path and its content parses back to the original config object.
+**Given** a `tmp` directory is set as `process.cwd()` and a config file exists,
+**When** `setConfigValue('minRatingThreshold', '4.5')` is called,
+**Then** `readConfig()` returns a config with `minRatingThreshold: 4.5`.
+**Given** a `tmp` directory is set as `process.cwd()`,
+**When** `setConfigValue('enabled', 'false')` is called,
+**Then** `readConfig()` returns a config with `enabled: false`.
+**Given** a `tmp` directory is set as `process.cwd()`,
+**When** `setConfigValue` is called with an unknown key (e.g. `'nonExistentKey'`),
+**Then** it throws an `Error` whose message contains `'Unknown config key'`.
+**Given** a `tmp` directory is set as `process.cwd()` and a modified config file exists,
+**When** `resetConfig()` is called,
+**Then** `readConfig()` returns a value equal to `getDefaultConfig()`.
+**Given** a `tmp` directory is set as `process.cwd()`,
+**When** `displayConfig()` is called,
+**Then** it does not throw (smoke test only — output format is not asserted).
+---
+## File System Isolation Pattern
+Each describe block that exercises config file I/O must follow this pattern:
+```
+before(): testDir = fs.mkdtempSync(os.tmpdir() + path.sep + 'feedback-test-')
+          originalCwd = process.cwd()
+          process.chdir(testDir)
+after():  process.chdir(originalCwd)
+          fs.rmSync(testDir, { recursive: true, force: true })
+```
+This mirrors the isolation pattern in `test/feature_feedback-loop.test.js`.
+The `CONFIG_FILE` constant in `src/feedback.js` is `.claude/feedback-config.json`, resolved relative to `process.cwd()`. Tests must `chdir` before calling any function that reads or writes config.
+---
+## `setConfigValue` Invalid Input Cases
+| key                   | value     | Expected behaviour                              |
+|-----------------------|-----------|-------------------------------------------------|
+| `minRatingThreshold`  | `'0.5'`   | throws — below minimum (1.0)                    |
+| `minRatingThreshold`  | `'5.5'`   | throws — above maximum (5.0)                    |
+| `minRatingThreshold`  | `'abc'`   | throws — not a number                           |
+| `enabled`             | `'yes'`   | throws — not `'true'` or `'false'`              |
+| `nonExistentKey`      | `'val'`   | throws — unknown key                            |
+---
+## Out of Scope
+- Testing `displayConfig` output format or colour rendering
+- Testing stdout mock/capture for `displayConfig`
+- Testing `validateFeedback`, `normalizeFeedbackKeys`, `parseFeedbackFromOutput`, `shouldPause` (covered in story-validation-normalisation.md)
+- End-to-end pipeline chain (covered in story-parse-pipeline.md)
+- Any modification of `src/feedback.js` production code
+- Modifying project-level `.claude/` files
+---
+## Implementation Notes
+- Import: `const { getDefaultConfig, readConfig, writeConfig, setConfigValue, resetConfig, displayConfig } = require('../src/feedback')`
+- Also import: `fs`, `os`, `path` for file system isolation
+- Group under a single `describe('Config Management', ...)` or per-function sub-describes
+- `displayConfig` reads config via `readConfig()`, so it also requires `chdir` setup
+- See: `.blueprint/features/feature_feedback-test/FEATURE_SPEC.md` for full rules and constraints
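
Reviewer sketch of the `setConfigValue` invalid-input table from the story above. The real validation lives in `src/feedback.js` and is not part of this diff, so `parseConfigValue` is a hypothetical function written only to match the documented cases: string inputs, a 1.0 to 5.0 range for `minRatingThreshold`, strict `'true'`/`'false'` for `enabled`, and an error containing `Unknown config key` for anything else. The other error messages are invented.

```javascript
// Hypothetical validator: returns the parsed value or throws an Error.
function parseConfigValue(key, value) {
  switch (key) {
    case 'minRatingThreshold': {
      const n = Number(value);
      if (!Number.isFinite(n)) {
        throw new Error(`minRatingThreshold must be a number, got '${value}'`);
      }
      if (n < 1.0 || n > 5.0) {
        throw new Error(`minRatingThreshold must be between 1.0 and 5.0, got ${n}`);
      }
      return n;
    }
    case 'enabled':
      if (value !== 'true' && value !== 'false') {
        throw new Error(`enabled must be 'true' or 'false', got '${value}'`);
      }
      return value === 'true';
    default:
      throw new Error(`Unknown config key: ${key}`);
  }
}

// The rejected rows from the story table, expressed as data.
const invalidCases = [
  ['minRatingThreshold', '0.5'],
  ['minRatingThreshold', '5.5'],
  ['minRatingThreshold', 'abc'],
  ['enabled', 'yes'],
  ['nonExistentKey', 'val'],
];

// Collect the error message for each rejected row (null if nothing threw).
const rejections = invalidCases.map(([key, value]) => {
  try {
    parseConfigValue(key, value);
    return null;
  } catch (err) {
    return err.message;
  }
});
```

Driving the cases from a table keeps the test close to the story's own layout, so adding a row to the story means adding one array entry to the test.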

package/.blueprint/features/feature_feedback-test/story-parse-pipeline.md ADDED

@@ -0,0 +1,65 @@
+# Story: End-to-End Parse Pipeline
+## User Story
+As a developer maintaining `src/feedback.js`,
+I want an integrated test that chains `parseFeedbackFromOutput` → `normalizeFeedbackKeys` → `validateFeedback` using the real production module,
+so that the complete feedback extraction and validation path is verified end-to-end within the module boundary.
+---
+## Acceptance Criteria
+**Given** an agent output string containing a valid `FEEDBACK: { "rating": 4, "issues": [], "rec": "proceed" }` block,
+**When** the output is passed to `parseFeedbackFromOutput`, the result to `normalizeFeedbackKeys`, and that result to `validateFeedback`,
+**Then** `parseFeedbackFromOutput` returns a non-null object, `normalizeFeedbackKeys` returns an object with `recommendation: 'proceed'` (not `rec`), and `validateFeedback` returns `{ valid: true, errors: [] }`.
+**Given** an agent output string containing `FEEDBACK: { "rating": 2, "issues": ["unclear-scope"], "rec": "pause" }`,
+**When** the same three-step chain is applied,
+**Then** `validateFeedback` returns `{ valid: true, errors: [] }` (both `rec`-normalised recommendation and rating are valid), and the normalised object has `recommendation: 'pause'`.
+**Given** an agent output string with `FEEDBACK: { "rating": 0, "issues": [], "recommendation": "proceed" }`,
+**When** the three-step chain is applied,
+**Then** `validateFeedback` returns `{ valid: false, errors: [...] }` with an error referencing the invalid rating.
+**Given** an agent output string with no `FEEDBACK:` marker,
+**When** `parseFeedbackFromOutput` is called,
+**Then** it returns `null` and the pipeline terminates at that stage (normalisation and validation are not called with null).
+---
+## Pipeline Sequence (Explicit)
+```
+input: raw output string
+  └─► parseFeedbackFromOutput(output)
+        → null           → pipeline terminates (no further steps)
+        → parsed object  → continue
+            └─► normalizeFeedbackKeys(parsed)
+                  → normalised object
+                      └─► validateFeedback(normalised)
+                            → { valid, errors }
+```
+All three functions are called on the real `src/feedback.js` module export. No step reimplements logic inline.
+---
+## Out of Scope
+- File system I/O (not required for this pipeline — all functions are in-memory)
+- `shouldPause` integration (not part of the parse pipeline; covered in story-validation-normalisation.md)
+- Config management functions (covered in story-config-management.md)
+- Any modification of `src/feedback.js` production code
+- Exhaustive permutations of each step (those are covered in story-validation-normalisation.md)
+---
+## Implementation Notes
+- Import: `const { parseFeedbackFromOutput, normalizeFeedbackKeys, validateFeedback } = require('../src/feedback')`
+- Group under `describe('Parse Pipeline', ...)` or similar
+- No file system setup required — all three functions operate on in-memory values
+- This story's tests serve as the single chained integration test called for in FEATURE_SPEC.md:Rule 5
+- The `rec` → `recommendation` normalisation step is critical: the raw parsed object uses `rec`, and `validateFeedback` accepts both keys — but the test should verify normalisation works correctly in the chain
+- See: `.blueprint/features/feature_feedback-test/FEATURE_SPEC.md` for Rule 5 context
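
Reviewer sketch of the three-step chain described in the story above. All three functions are hypothetical stand-ins written from the behaviour the stories describe (regex extraction, `rec` to `recommendation` renaming, integer rating from 1 to 5); the real implementations in `src/feedback.js` are not shown in this diff and may differ, for example in the exact regex, in whether the input object is copied or mutated, and in error wording. `runPipeline` is an invented name.

```javascript
// Stand-in: extract the JSON object that follows a FEEDBACK: marker.
function parseFeedbackFromOutput(output) {
  const match = /FEEDBACK:\s*(\{.*\})/s.exec(output);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null; // malformed JSON after the marker
  }
}

// Stand-in: rename rec to recommendation unless recommendation already exists.
function normalizeFeedbackKeys(feedback) {
  const out = { ...feedback };
  if ('rec' in out && !('recommendation' in out)) {
    out.recommendation = out.rec;
    delete out.rec;
  }
  return out; // when both keys are present, both are preserved
}

// Stand-in: schema check returning { valid, errors }.
function validateFeedback(feedback) {
  const errors = [];
  const { rating, issues } = feedback;
  if (!Number.isInteger(rating) || rating < 1 || rating > 5) {
    errors.push('rating must be an integer from 1 to 5');
  }
  if (!Array.isArray(issues) || !issues.every((i) => typeof i === 'string')) {
    errors.push('issues must be an array of strings');
  }
  const rec = feedback.recommendation ?? feedback.rec;
  if (!['proceed', 'pause', 'revise'].includes(rec)) {
    errors.push('recommendation must be proceed, pause, or revise');
  }
  return { valid: errors.length === 0, errors };
}

// Chain the steps; stop at the first stage if nothing was parsed.
function runPipeline(output) {
  const parsed = parseFeedbackFromOutput(output);
  if (parsed === null) return null;
  const normalised = normalizeFeedbackKeys(parsed);
  return { normalised, result: validateFeedback(normalised) };
}

const ok = runPipeline('Done.\nFEEDBACK: { "rating": 4, "issues": [], "rec": "proceed" }');
const badRating = runPipeline('FEEDBACK: { "rating": 0, "issues": [], "recommendation": "proceed" }');
const noMarker = runPipeline('No structured feedback here.');
```

The three sample inputs correspond to the first, third, and fourth acceptance criteria; the null check in `runPipeline` is what keeps normalisation and validation from being called with `null`.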

package/.blueprint/features/feature_feedback-test/story-validation-normalisation.md ADDED

@@ -0,0 +1,99 @@
+# Story: Validation and Normalisation Functions
+## User Story
+As a developer maintaining `src/feedback.js`,
+I want direct unit tests for `validateFeedback`, `normalizeFeedbackKeys`, `parseFeedbackFromOutput`, and `shouldPause`,
+so that regressions in the production module are caught immediately by the test suite without relying on inline reimplementations.
+---
+## Acceptance Criteria
+**Given** a feedback object with a valid integer rating (1–5), an array of strings as issues, and a valid recommendation (`proceed`, `pause`, or `revise`),
+**When** `validateFeedback(feedback)` is called,
+**Then** it returns `{ valid: true, errors: [] }`.
+**Given** a feedback object with a rating of `0` (below range), `6` (above range), `3.5` (non-integer), or a value that is not a number,
+**When** `validateFeedback(feedback)` is called,
+**Then** it returns `{ valid: false, errors: [...] }` containing an appropriate error message for each invalid rating.
+**Given** a feedback object where `issues` is not an array, or contains non-string elements,
+**When** `validateFeedback(feedback)` is called,
+**Then** it returns `{ valid: false, errors: [...] }` containing an error message describing the issues field violation.
+**Given** a feedback object with `rec` key only (no `recommendation` key),
+**When** `normalizeFeedbackKeys(feedback)` is called,
+**Then** it returns an object with `recommendation` set to the `rec` value and no `rec` key present.
+**Given** a feedback object with both `rec` and `recommendation` keys,
+**When** `normalizeFeedbackKeys(feedback)` is called,
+**Then** `recommendation` retains its original value (wins over `rec`) and `rec` is not deleted (both remain as-is per the production implementation).
+**Given** an agent output string containing a `FEEDBACK: { ... }` JSON block with valid content,
+**When** `parseFeedbackFromOutput(output)` is called,
+**Then** it returns the parsed feedback object.
+**Given** a feedback object with `recommendation: 'pause'` and a rating above `minRatingThreshold`,
+**When** `shouldPause(feedback, config)` is called,
+**Then** it returns `true`.
+**Given** a feedback object with `recommendation: 'proceed'` and a rating below `minRatingThreshold`,
+**When** `shouldPause(feedback, config)` is called,
+**Then** it returns `true` (rating-based gate triggers independently of recommendation).
+---
+## Test Boundary Details
+### `validateFeedback` — rating boundary values to test
+| Value | Expected valid |
+|-------|---------------|
+| 0     | false         |
+| 1     | true          |
+| 3     | true          |
+| 5     | true          |
+| 6     | false         |
+| 3.5   | false         |
+### `normalizeFeedbackKeys` — key scenarios to test
+| Scenario                              | Expected result                          |
+|---------------------------------------|------------------------------------------|
+| `rec` only                            | Renamed to `recommendation`; `rec` removed |
+| `recommendation` only                 | Unchanged                                |
+| Both `rec` and `recommendation`       | Both keys preserved; `recommendation` value unchanged |
+| Neither `rec` nor `recommendation`    | Object returned unchanged                |
+### `parseFeedbackFromOutput` — scenarios to test
+| Input                                 | Expected result   |
+|---------------------------------------|-------------------|
+| Valid `FEEDBACK: { ... }` block       | Parsed object     |
+| No `FEEDBACK:` marker                 | `null`            |
+| Malformed JSON after `FEEDBACK:`      | `null`            |
+### `shouldPause` — scenarios to test
+| rating | minRatingThreshold | recommendation | Expected |
+|--------|--------------------|----------------|----------|
+| 4      | 3.0                | 'proceed'      | false    |
+| 2      | 3.0                | 'proceed'      | true     |
+| 4      | 3.0                | 'pause'        | true     |
+| 2      | 3.0                | 'pause'        | true     |
+---
+## Out of Scope
+- Config file system interaction (covered in story-config-management.md)
+- End-to-end parse pipeline chain (covered in story-parse-pipeline.md)
+- `displayConfig` output assertion (smoke-tested in story-config-management.md)
+- Any modification of `src/feedback.js` production code
+- Testing agent prompt text or insights correlation logic
+---
+## Implementation Notes
+- Import: `const { validateFeedback, normalizeFeedbackKeys, parseFeedbackFromOutput, shouldPause } = require('../src/feedback')`
+- No file system setup required for this story — all functions are pure or in-memory
+- Describe block names should match function names: `validateFeedback`, `normalizeFeedbackKeys`, `parseFeedbackFromOutput`, `shouldPause`
+- See: `.blueprint/features/feature_feedback-test/FEATURE_SPEC.md` for full rules
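
Reviewer sketch of the `shouldPause` scenario table from the story above, written as a table-driven check. The `shouldPause` below is a hypothetical stand-in that satisfies the four documented rows; it assumes a rating exactly equal to the threshold does not pause and ignores any `enabled` flag, neither of which the story specifies.

```javascript
// Stand-in: pause on an explicit 'pause' recommendation or a low rating.
function shouldPause(feedback, config) {
  return (
    feedback.recommendation === 'pause' ||
    feedback.rating < config.minRatingThreshold
  );
}

// Rows copied from the story's scenario table.
const scenarios = [
  { rating: 4, minRatingThreshold: 3.0, recommendation: 'proceed', expected: false },
  { rating: 2, minRatingThreshold: 3.0, recommendation: 'proceed', expected: true },
  { rating: 4, minRatingThreshold: 3.0, recommendation: 'pause', expected: true },
  { rating: 2, minRatingThreshold: 3.0, recommendation: 'pause', expected: true },
];

// Evaluate each row and keep only the ones that disagree with the table.
const mismatches = scenarios.filter(
  ({ rating, minRatingThreshold, recommendation, expected }) =>
    shouldPause({ rating, recommendation }, { minRatingThreshold }) !== expected
);
```

An empty `mismatches` array means the stand-in agrees with every row; against the real module the same loop would import `shouldPause` from `../src/feedback` instead.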

package/README.md CHANGED

@@ -327,6 +327,24 @@ analyzes:               recommends:            calibrates:
 • Trends                • And feedback issues
 ```
+### Accessing Module Data
+Data is collected from both invocation methods and is accessible via CLI commands:
+| Data | `/implement-feature` (skill) | `npx murmur8 murm` (CLI) | How to access |
+|------|------------------------------|--------------------------|---------------|
+| **Per-stage timing** (alex, cass, nigel, codey) | Recorded by orchestrating agent | Merged from worktree on successful merge | `npx murmur8 history` |
+| **Feedback ratings** (agent-to-agent) | Recorded by feedback micro-Tasks | Merged from worktree on successful merge | `npx murmur8 history`, `npx murmur8 insights --feedback` |
+| **Token cost per stage** | Recorded by orchestrating agent | Merged from worktree on successful merge | `npx murmur8 history --cost` |
+| **Batch summary** (total duration, feature outcomes) | N/A (single feature) | Recorded at batch completion | `npx murmur8 history` |
+| **Success/failure status** | Recorded per run | Recorded per feature + batch | `npx murmur8 history --stats` |
+| **Retry attempts & strategies** | Recorded on failure | Merged from worktree on successful merge | `npx murmur8 insights --failures` |
+| **Bottleneck analysis** | Derived from history | Derived from history | `npx murmur8 insights --bottlenecks` |
+| **Smart retry recommendations** | Used live during pipeline | Used live during pipeline | Automatic on failure |
+| **Diff preview** | Shown before commit | Shown per worktree before merge | Interactive during pipeline |
+**How worktree history merging works:** When `npx murmur8 murm` runs, each feature pipeline executes `/implement-feature` inside an isolated git worktree. The skill records per-stage data to `.claude/pipeline-history.json` within that worktree. After a successful merge, murmur8 reads this file and appends its entries to the main project's history before cleaning up the worktree. Failed/conflicted worktrees preserve their history for debugging.
 ## Directory Structure
 ```

package/SKILL.md CHANGED

@@ -138,7 +138,7 @@ If no history exists, skip this step silently.
 ### Step 5: Initialize
 Create/read `{QUEUE}`. Ensure dirs exist: `mkdir -p {FEAT_DIR} {TEST_DIR}`
-Unless `--no-history`, start a history entry (slug, startedAt, stages, feedback).
+Unless `--no-history`, note the pipeline start time (ISO 8601 UTC) in your working context as `PIPELINE_START`.
 ---
@@ -146,7 +146,7 @@ Unless `--no-history`, start a history entry (slug, startedAt, stages, feedback)
 **Announce:** `} Alex — creating feature spec`
-**History:** Record `stages.alex.startedAt` before spawning.
+**History:** Note `ALEX_START` (ISO 8601 UTC) before spawning.
 **Runtime prompt:** `.blueprint/prompts/alex-runtime.md`
@@ -204,7 +204,7 @@ Brief summary (5 bullets max): intent, key behaviours, scope, story themes, tens
 **On completion:**
 1. Verify `{FEAT_SPEC}` and `{FEAT_DIR}/handoff-alex.md` exist
-2. **Record history:** `stages.alex = { completedAt, durationMs, status: "success" }`
+2. Note `ALEX_END` and compute `ALEX_DURATION_MS`
 3. Update queue: move feature to `cassQueue`
 4. If `--pause-after=alex`: Show output path, ask user to continue
@@ -247,7 +247,7 @@ FEEDBACK: {"rating":N,"issues":["..."],"rec":"proceed|pause|revise"}
 **Announce:** ` } Cass — writing user stories`
-**History:** Record `stages.cass.startedAt` before spawning.
+**History:** Note `CASS_START` (ISO 8601 UTC) before spawning.
 **Runtime prompt:** `.blueprint/prompts/cass-runtime.md`
@@ -311,7 +311,7 @@ Brief summary: story count, filenames, behaviours covered (5 bullets max)
 **On completion:**
 1. Verify at least one `story-*.md` exists in `{FEAT_DIR}`
 2. Verify `{FEAT_DIR}/handoff-cass.md` exists
-2. **Record history:** `stages.cass = { completedAt, durationMs, status: "success" }`
+2. Note `CASS_END` and compute `CASS_DURATION_MS`
 3. Update queue: move feature to `nigelQueue`
 4. If `--pause-after=cass`: Show story paths, ask user to continue
@@ -349,7 +349,7 @@ FEEDBACK: {"rating":N,"issues":["..."],"rec":"proceed|pause|revise"}
 **Announce:** `  } Nigel — building test spec`
-**History:** Record `stages.nigelSpec.startedAt` before spawning.
+**History:** Note `NIGEL_SPEC_START` (ISO 8601 UTC) before spawning.
 **Runtime prompt:** `.blueprint/prompts/nigel-runtime.md`
@@ -412,7 +412,7 @@ Brief summary: test case count planned, AC coverage %, assumptions (5 bullets ma
 **On completion:**
 1. Verify `{TEST_SPEC}` and `{FEAT_DIR}/handoff-nigel.md` exist
-2. **Record history:** `stages.nigelSpec = { completedAt, durationMs, status: "success" }`
+2. Note `NIGEL_SPEC_END` and compute `NIGEL_SPEC_DURATION_MS`
 **On failure:** See [Error Handling with Retry](#error-handling-with-smart-retry)
@@ -422,7 +422,7 @@ Brief summary: test case count planned, AC coverage %, assumptions (5 bullets ma
 **Announce:** `  } Nigel — writing executable tests`
-**History:** Record `stages.nigelTests.startedAt` before spawning.
+**History:** Note `NIGEL_TESTS_START` (ISO 8601 UTC) before spawning.
 Use the Task tool with `subagent_type="general-purpose"`:
@@ -460,7 +460,7 @@ Brief summary: test count, file(s) written, any tests deferred
 **On completion:**
 1. Verify `{TEST_FILE}` exists
-2. **Record history:** `stages.nigelTests = { completedAt, durationMs, status: "success" }`
+2. Note `NIGEL_TESTS_END` and compute `NIGEL_TESTS_DURATION_MS`
 3. Update queue: move feature to `codeyQueue`
 4. If `--pause-after=nigel`: Show test paths, ask user to continue
@@ -499,7 +499,7 @@ FEEDBACK: {"rating":N,"issues":["..."],"rec":"proceed|pause|revise"}
 **Announce:** `   } Codey — drafting implementation plan`
-**History:** Record `stages.codeyPlan.startedAt` before spawning.
+**History:** Note `CODEY_PLAN_START` (ISO 8601 UTC) before spawning.
 **Runtime prompt:** `.blueprint/prompts/codey-plan-runtime.md`
@@ -556,7 +556,7 @@ Brief summary: files planned, step count, identified risks
 **On completion:**
 1. Verify `{PLAN}` exists
-2. **Record history:** `stages.codeyPlan = { completedAt, durationMs, status: "success" }`
+2. Note `CODEY_PLAN_END` and compute `CODEY_PLAN_DURATION_MS`
 3. If `--pause-after=codey-plan`: Show plan path, ask user to continue
 **On failure:** See [Error Handling with Retry](#error-handling-with-smart-retry)
@@ -567,7 +567,7 @@ Brief summary: files planned, step count, identified risks
 **Announce:** `    } Codey — implementing feature`
-**History:** Record `stages.codeyImplement.startedAt` before spawning.
+**History:** Note `CODEY_IMPL_START` (ISO 8601 UTC) before spawning.
 **Runtime prompt:** `.blueprint/prompts/codey-implement-runtime.md`
@@ -637,13 +637,13 @@ for each step in IMPLEMENTATION_PLAN.steps:
 **On all steps complete:**
 1. Run full test suite: `node --test {TEST_FILE}`
-2. **Record history:** `stages.codeyImplement = { completedAt, durationMs, status: "success", stepsCompleted: N }`
+2. Note `CODEY_IMPL_END`, compute `CODEY_IMPL_DURATION_MS`, and note `STEPS_COMPLETED`
 3. Update queue: move feature to `completed`
 4. Proceed to auto-commit (unless `--no-commit`)
 **On partial failure:**
 1. Record which steps completed and which failed
-2. **Record history:** `stages.codeyImplement = { status: "partial", stepsCompleted: M, totalSteps: N, failedAt: step }`
+2. Note partial completion: `STEPS_COMPLETED=M`, `TOTAL_STEPS=N`, `FAILED_AT_STEP=step`
 3. Report to user with option to continue manually
 **On failure:** See [Error Handling with Retry](#error-handling-with-smart-retry)
@@ -694,17 +694,28 @@ After commit, remove the slug's row from `{BACKLOG}` (if it exists). Stage with
 **Modules:** `src/history.js`, `src/cost.js`
-Unless `--no-history` flag is set, finalize the history entry:
+Unless `--no-history` flag is set, build the history entry JSON from the timestamps noted during the run and write it via the CLI:
-```javascript
-historyEntry.status = "success";
-historyEntry.completedAt = new Date().toISOString();
-historyEntry.totalDurationMs = completedAt - startedAt;
-historyEntry.commitHash = "{hash}";
-historyEntry.totalTokens = { input: N, output: M };
-historyEntry.totalCost = X.XXX;
-// Save to .claude/pipeline-history.json
-```
+```bash
+node bin/cli.js history record '{
+  "slug": "{slug}",
+  "status": "success",
+  "startedAt": "<PIPELINE_START>",
+  "completedAt": "<now ISO 8601>",
+  "totalDurationMs": <elapsed ms>,
+  "commitHash": "<hash or null>",
+  "stages": {
+    "alex":             { "startedAt": "<ALEX_START>",        "completedAt": "<ALEX_END>",        "durationMs": <ALEX_DURATION_MS>,        "status": "success" },
+    "cass":             { "startedAt": "<CASS_START>",        "completedAt": "<CASS_END>",        "durationMs": <CASS_DURATION_MS>,        "status": "success" },
+    "nigel-spec":       { "startedAt": "<NIGEL_SPEC_START>",  "completedAt": "<NIGEL_SPEC_END>",  "durationMs": <NIGEL_SPEC_DURATION_MS>,  "status": "success" },
+    "nigel-tests":      { "startedAt": "<NIGEL_TESTS_START>", "completedAt": "<NIGEL_TESTS_END>", "durationMs": <NIGEL_TESTS_DURATION_MS>, "status": "success" },
+    "codey-plan":       { "startedAt": "<CODEY_PLAN_START>",  "completedAt": "<CODEY_PLAN_END>",  "durationMs": <CODEY_PLAN_DURATION_MS>,  "status": "success" },
+    "codey-implement":  { "startedAt": "<CODEY_IMPL_START>",  "completedAt": "<CODEY_IMPL_END>",  "durationMs": <CODEY_IMPL_DURATION_MS>,  "status": "success", "stepsCompleted": <N> }
+  }
+}'
+```
+Omit stages that were skipped (e.g. cass when `--skip-stories` was used). Set `status` to `"failed"` and add `"failedStage": "<stage>"` on failure, or `"paused"` and `"pausedAfter": "<stage>"` on pause.
 **Display summary:** Stage status (✓/✗), test count, duration, commit hash, feedback ratings, cost breakdown per stage.
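
Reviewer sketch: the new flow asks the orchestrating agent to assemble the entry from noted timestamps. The arithmetic involved might look like the following. `buildHistoryEntry` and `durationMs` are hypothetical helpers, not part of murmur8, and the slug and timestamps are made-up sample values. Only the field names are taken from the JSON template above.

```javascript
// Hypothetical helpers for illustration; murmur8 does not ship these.

// Elapsed milliseconds between two ISO 8601 timestamps.
function durationMs(startIso, endIso) {
  return new Date(endIso).getTime() - new Date(startIso).getTime();
}

// Builds a success entry shaped like the `history record` template.
// stageTimes maps a stage name to { startedAt, completedAt }.
function buildHistoryEntry(slug, pipelineStart, pipelineEnd, stageTimes, commitHash = null) {
  const stages = {};
  for (const [name, t] of Object.entries(stageTimes)) {
    stages[name] = {
      startedAt: t.startedAt,
      completedAt: t.completedAt,
      durationMs: durationMs(t.startedAt, t.completedAt),
      status: 'success',
    };
  }
  return {
    slug,
    status: 'success',
    startedAt: pipelineStart,
    completedAt: pipelineEnd,
    totalDurationMs: durationMs(pipelineStart, pipelineEnd),
    commitHash,
    stages,
  };
}

// Sample values only. Skipped stages are simply left out of stageTimes.
const entry = buildHistoryEntry(
  'demo-feature',
  '2025-01-01T10:00:00.000Z',
  '2025-01-01T10:05:00.000Z',
  { alex: { startedAt: '2025-01-01T10:00:00.000Z', completedAt: '2025-01-01T10:01:30.000Z' } }
);

// The serialized string is what would be passed as the JSON argument.
console.log(JSON.stringify(entry, null, 2));
```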

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "murmur8",
-  "version": "4.4.0",
+  "version": "4.5.1",
   "description": "Multi-agent workflow framework for automated feature development",
   "main": "src/index.js",
   "bin": {

package/src/commands/history.js CHANGED

@@ -1,7 +1,7 @@
 /**
  * history command - View pipeline execution history
  */
-const { displayHistory, showStats, clearHistory, exportHistory } = require('../history');
+const { displayHistory, showStats, clearHistory, exportHistory, recordHistory, updateStage } = require('../history');
 const { parseFlags } = require('./utils');
 const description = 'View pipeline execution history';
@@ -10,7 +10,46 @@ async function run(args) {
   const flags = parseFlags(args);
   const subArg = args[1];
-  if (subArg === 'clear') {
+  if (subArg === 'record') {
+    const jsonArg = args[2];
+    if (!jsonArg) {
+      console.error('Usage: history record \'{"slug":"...","status":"...","startedAt":"...","completedAt":"...","totalDurationMs":N}\'');
+      process.exit(1);
+    }
+    let entry;
+    try {
+      entry = JSON.parse(jsonArg);
+    } catch (err) {
+      console.error(`Invalid JSON: ${err.message}`);
+      process.exit(1);
+    }
+    if (!entry.slug || !entry.status || !entry.startedAt || !entry.completedAt || entry.totalDurationMs === undefined) {
+      console.error('Entry must include: slug, status, startedAt, completedAt, totalDurationMs');
+      process.exit(1);
+    }
+    const ok = recordHistory(entry);
+    if (!ok) process.exit(1);
+    console.log(`Recorded history entry for "${entry.slug}" (${entry.status})`);
+  } else if (subArg === 'update-stage') {
+    // history update-stage <slug> <stage> '<json>'
+    const slug = args[2];
+    const stage = args[3];
+    const jsonArg = args[4];
+    if (!slug || !stage || !jsonArg) {
+      console.error('Usage: history update-stage <slug> <stage> \'{"durationMs":N,"status":"success"}\'');
+      process.exit(1);
+    }
+    let stageData;
+    try {
+      stageData = JSON.parse(jsonArg);
+    } catch (err) {
+      console.error(`Invalid JSON: ${err.message}`);
+      process.exit(1);
+    }
+    const ok = updateStage(slug, stage, stageData);
+    if (!ok) process.exit(1);
+    console.log(`Updated stage "${stage}" for "${slug}"`);
+  } else if (subArg === 'clear') {
     await clearHistory({ force: flags.force });
   } else if (subArg === 'export') {
     const exportOpts = {};
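
Reviewer sketch: the required-field check in the `record` branch treats `totalDurationMs` differently from the other four fields. It is rejected only when `undefined`, so a duration of `0` is accepted, while the others are rejected when falsy. The standalone function below restates that check so the behaviour can be tested in isolation. `missingFields` is a hypothetical name, not something the package exports.

```javascript
// Hypothetical restatement of the validation in the `record` branch.
const REQUIRED_TRUTHY = ['slug', 'status', 'startedAt', 'completedAt'];

// Returns the names of fields that would make the CLI reject the entry.
function missingFields(entry) {
  const missing = REQUIRED_TRUTHY.filter((key) => !entry[key]);
  // Only `undefined` counts as missing here, so 0 is a valid duration.
  if (entry.totalDurationMs === undefined) missing.push('totalDurationMs');
  return missing;
}

console.log(missingFields({
  slug: 'demo',
  status: 'success',
  startedAt: '2025-01-01T10:00:00.000Z',
  completedAt: '2025-01-01T10:00:00.000Z',
  totalDurationMs: 0,
}));
console.log(missingFields({ slug: 'demo' }));
```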

package/src/history.js CHANGED

@@ -90,6 +90,36 @@ function storeStageFeedback(slug, stage, feedback) {
   }
 }
+/**
+ * Updates (merges) stage data into the most recent history entry for a slug.
+ * Used by the CLI skill to record per-stage timing after each pipeline step.
+ * @param {string} slug - Feature slug
+ * @param {string} stage - Stage name (alex, cass, nigel, codey-plan, codey-implement)
+ * @param {object} data - Stage fields to merge (startedAt, completedAt, durationMs, status, etc.)
+ * @returns {boolean} True if updated successfully
+ */
+function updateStage(slug, stage, data) {
+  try {
+    const history = readHistoryFile();
+    if (history.error) {
+      console.warn('Warning: History file is corrupted, cannot update stage.');
+      return false;
+    }
+    const entry = history.findLast(e => e.slug === slug);
+    if (!entry) {
+      console.warn(`Warning: No history entry found for slug: ${slug}`);
+      return false;
+    }
+    if (!entry.stages) entry.stages = {};
+    entry.stages[stage] = { ...entry.stages[stage], ...data };
+    writeHistoryFile(history);
+    return true;
+  } catch (err) {
+    console.warn(`Warning: Failed to update stage: ${err.message}`);
+    return false;
+  }
+}
 function formatDuration(ms) {
   const seconds = Math.floor(ms / 1000);
   const minutes = Math.floor(seconds / 60);
@@ -427,6 +457,7 @@ module.exports = {
   writeHistoryFile,
   recordHistory,
   storeStageFeedback,
+  updateStage,
   displayHistory,
   showStats,
   clearHistory,
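
Reviewer sketch: `updateStage` relies on two behaviours that are easy to miss. `findLast` selects the most recent entry for a slug, and the object spread merges new fields over existing ones instead of replacing the stage. The in-memory version below drops the file I/O to show just that. `updateStageInMemory` and the sample data are hypothetical, and `Array.prototype.findLast` needs Node.js 18 or later.

```javascript
// In-memory illustration of the merge semantics; no file access.
function updateStageInMemory(history, slug, stage, data) {
  // findLast picks the newest entry when a slug appears more than once.
  const entry = history.findLast((e) => e.slug === slug);
  if (!entry) return false;
  if (!entry.stages) entry.stages = {};
  // Later keys win, earlier keys not present in `data` are kept.
  entry.stages[stage] = { ...entry.stages[stage], ...data };
  return true;
}

// Sample data: two runs of the same slug, the second without stages yet.
const sampleHistory = [
  { slug: 'demo', stages: { alex: { startedAt: 'T0' } } },
  { slug: 'demo' },
];

updateStageInMemory(sampleHistory, 'demo', 'alex', { durationMs: 5 });
updateStageInMemory(sampleHistory, 'demo', 'alex', { status: 'success' });
console.log(JSON.stringify(sampleHistory, null, 2));
```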

package/src/index.js CHANGED

@@ -1,7 +1,7 @@
 const { init } = require('./init');
 const { update } = require('./update');
 const { validate, formatOutput, checkNodeVersion } = require('./validate');
-const { recordHistory, displayHistory, showStats, clearHistory, storeStageFeedback } = require('./history');
+const { recordHistory, displayHistory, showStats, clearHistory, storeStageFeedback, updateStage } = require('./history');
 const {
   readConfig,
   writeConfig,
@@ -108,6 +108,7 @@ module.exports = {
   showStats,
   clearHistory,
   storeStageFeedback,
+  updateStage,
   // Retry module exports
   readConfig,
   writeConfig,

package/src/murm.js CHANGED

@@ -5,6 +5,7 @@ const { execSync, spawn } = require('child_process');
 const fs = require('fs');
 const readline = require('readline');
 const theme = require('./theme');
+const { recordHistory, readHistoryFile, writeHistoryFile } = require('./history');
 const CONFIG_FILE = '.claude/murm-config.json';
 const LOCK_FILE = '.claude/murm.lock';
@@ -19,6 +20,27 @@ const LEGACY_QUEUE_FILE = '.claude/parallel-queue.json';
 let runningProcesses = new Map();
 let isAborting = false;
+const HISTORY_FILE = '.claude/pipeline-history.json';
+function mergeWorktreeHistory(worktreePath) {
+  const worktreeHistoryPath = path.join(worktreePath, HISTORY_FILE);
+  if (!fs.existsSync(worktreeHistoryPath)) return [];
+  try {
+    const worktreeEntries = JSON.parse(fs.readFileSync(worktreeHistoryPath, 'utf8'));
+    if (!Array.isArray(worktreeEntries) || worktreeEntries.length === 0) return [];
+    const mainHistory = readHistoryFile();
+    if (mainHistory.error) return worktreeEntries;
+    mainHistory.push(...worktreeEntries);
+    writeHistoryFile(mainHistory);
+    return worktreeEntries;
+  } catch {
+    return [];
+  }
+}
 /**
  * Migrate a legacy file path to the new path.
  * If the old file exists and the new one doesn't, rename it.
@@ -1284,6 +1306,11 @@ async function runMurm(slugs, options = {}) {
         if (mergeResult.success) {
           feature.status = 'murm_complete';
           console.log(`[${timestamp}] ${result.slug}: ${theme.MESSAGES.mergedAndLanded} \u2713`);
+          // Merge per-stage history from worktree before cleanup
+          const merged = mergeWorktreeHistory(feature.worktreePath);
+          if (merged.length > 0) {
+            feature.historyMerged = true;
+          }
           removeWorktree(result.slug);
         } else if (mergeResult.conflict) {
           feature.status = 'merge_conflict';
@@ -1342,6 +1369,29 @@ async function runMurm(slugs, options = {}) {
       });
   }
+    // Record batch-level history
+    recordHistory({
+      slug: slugs.join('+'),
+      mode: 'murmuration',
+      status: summary.failed === 0 && summary.conflicts === 0 ? 'success' : 'partial',
+      startedAt: queue.startedAt,
+      completedAt: new Date().toISOString(),
+      totalDurationMs: Date.now() - new Date(queue.startedAt).getTime(),
+      baseBranch,
+      features: queue.features.map(f => ({
+        slug: f.slug,
+        status: f.status,
+        startedAt: f.startedAt,
+        completedAt: f.completedAt
+      })),
+      summary: {
+        total: slugs.length,
+        completed: summary.completed,
+        failed: summary.failed,
+        conflicts: summary.conflicts
+      }
+    });
     return { success: summary.failed === 0 && summary.conflicts === 0, summary };
   } finally {
     // Always release lock when done
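
Reviewer sketch: in `mergeWorktreeHistory` above, the branch `if (mainHistory.error) return worktreeEntries;` returns the worktree entries without writing them. The caller's `merged.length > 0` check would then set `feature.historyMerged = true` even though nothing reached the main history file. One possible variant, shown in memory with hypothetical names, returns an empty array whenever the main history is unusable. It assumes an unreadable history is signalled by a non-array value, which may not match how `readHistoryFile` reports errors.

```javascript
// Hypothetical in-memory variant; not the code shipped in src/murm.js.
// Returns the entries that were actually appended to mainHistory.
function mergeEntriesSketch(mainHistory, worktreeEntries) {
  if (!Array.isArray(worktreeEntries) || worktreeEntries.length === 0) return [];
  // Assumption: an unreadable main history is represented by a non-array
  // value such as { error: '...' }. Nothing is merged in that case.
  if (!Array.isArray(mainHistory)) return [];
  mainHistory.push(...worktreeEntries);
  return worktreeEntries;
}

const main = [{ slug: 'a' }];
const mergedOk = mergeEntriesSketch(main, [{ slug: 'b' }]);
const mergedBad = mergeEntriesSketch({ error: 'corrupt' }, [{ slug: 'b' }]);
console.log(main.length, mergedOk.length, mergedBad.length);
```

With this shape the return value can be used directly as the "history was merged" signal.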