npm - @hanzlaa/rcode - Versions diffs - 3.6.4 → 3.6.6 - Mend

@hanzlaa/rcode 3.6.4 → 3.6.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5) hide show

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@hanzlaa/rcode",
-  "version": "3.6.4",
+  "version": "3.6.6",
   "description": "rcode — the AI team that never forgets. Persistent memory, specialist agents, and slash commands for AI IDEs. Works in Claude Code, Cursor, Gemini, VS Code, and Antigravity.",
   "main": "cli/index.js",
   "bin": {

package/rihal/skills/actions/4-implementation/rihal-debug/SKILL.md CHANGED Viewed

@@ -1,7 +1,6 @@
 ---
 name: rihal-debug
-internal: true
-description: Root-cause debugging via the scientific method.
+description: Root-cause debugging via the scientific method. Enforces investigate-before-fix, structured hypothesis iteration, multi-component evidence gathering, and architectural escalation after 3 failed fixes.
 triggers:
   # English
   - "debug this"
@@ -12,11 +11,15 @@ triggers:
   - "track this down"
   - "narrow down the bug"
   - "scientific method"
+  - "bug fix"
+  - "something is broken"
   # Roman Urdu / Hindi
   - "kharab kyu hai"
   - "bug dhoondo"
   - "fix karo bug"
   - "theek karo"
+  - "kya masla hai"
+  - "kyu kaam nahi kar raha"
   # Arabic native
   - "صحّح هذا"
   - "ما المشكلة"
@@ -29,29 +32,136 @@ user-invocable: false
 @.rihal/references/karpathy-guidelines.md
+## The Iron Law
+```
+NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.
+```
+If you have not completed Phase 1, you cannot propose a fix. "It seems to work" is a red flag — keep investigating until the mechanism is clear. Symptom fixes are failure.
 ## Overview
-Debugging is investigation, not pattern-matching. Each iteration narrows the problem space — never widens it. The skill enforces a written hypothesis, an experiment that distinguishes "yes" from "no", and a captured observation. Random fixes that "happen to work" are not allowed — the bug must be understood.
+Debugging is investigation, not pattern-matching. Each iteration narrows the problem space — never widens it. The skill enforces a written hypothesis, an experiment that distinguishes "yes" from "no", and a captured observation. Random fixes are not allowed — the bug must be understood before the fix is written.
+## Phase 1 — Root Cause Investigation
+**BEFORE attempting ANY fix:**
+1. **Reproduce consistently.** Write the exact steps. If not reproducible, make it reproducible first — anything else is guessing.
+2. **Read the error carefully.** Don't skim stack traces. Note file paths, line numbers, error codes. They often contain the exact answer.
+3. **Check recent changes.** `git diff`, recent commits, new dependencies, config changes, environment differences.
+4. **Gather evidence in multi-component systems.**
+   When the system has multiple layers (API → service → DB, CI → build → signing, frontend → backend → queue):
+   Add diagnostic instrumentation at EACH component boundary BEFORE proposing fixes:
+   ```
+   For EACH boundary:
+     - Log what data enters the component
+     - Log what data exits the component
+     - Verify env/config propagation
+     - Check state at each layer
+   Run ONCE to gather evidence showing WHERE it breaks.
+   THEN identify the failing component.
+   THEN investigate that specific component.
+   ```
+   Example:
+   ```bash
+   # Layer 1: incoming request
+   console.log('[L1] body:', req.body, 'userId:', req.user?.id)
+   # Layer 2: service call
+   console.log('[L2] args to createTask:', args)
+   # Layer 3: DB query
+   console.log('[L3] Prisma input:', data)
+   ```
+   This reveals which layer fails — not guessing.
-## Workflow
+5. **Trace data flow backward.** Where does the bad value originate? What called this function with that bad value? Keep tracing up until you find the source. Fix at source, not at symptom.
-1. **Reproduce the bug.** Write the exact steps. If you can't reproduce it, the first job is making it reproducible — anything else is guessing.
-2. **State the hypothesis.** "I think the bug is in <component>; specifically <mechanism>." One sentence, falsifiable.
-3. **Design the experiment.** What single test, log line, or dataflow change would distinguish a true hypothesis from a false one?
-4. **Run it. Capture the observation.** Console output verbatim, screenshot, stack trace, network response — whatever the experiment produced.
-5. **Update the hypothesis.** Either confirmed (now narrow to the next layer) or refuted (form a new hypothesis based on what was observed).
-6. **Stop conditions:** the bug is reproducible from a unit test (then hand to `rihal-prove-it`), OR the root cause is a known external constraint (e.g. third-party API behaviour) that you record in `incidents/known-issues.md`.
-7. **Never apply a fix without understanding why it works.** "It seems to fix it" is a red flag — keep investigating until the mechanism is clear.
+## Phase 2 — Pattern Analysis
-## Sentry / observability integration
+Before forming a hypothesis, find the comparison point:
+1. **Find working examples.** Locate similar code in the same codebase that works. What's different?
+2. **Read reference implementations completely.** Don't skim — partial understanding guarantees bugs.
+3. **List every difference**, however small. Don't assume "that can't matter."
+4. **Check assumptions.** What config, environment, or state does this code assume?
+## Phase 3 — Hypothesis and Experiment
+Scientific method:
+1. **State ONE hypothesis.** "I think X is the root cause because Y." Write it down. Be specific, not vague.
+2. **Design the minimal experiment.** What single test, log line, or code change would confirm or refute this hypothesis?
+3. **Run it. Capture the observation verbatim.** Console output, stack trace, network response — whatever was produced.
+4. **Update.** Confirmed → Phase 4. Refuted → form a new hypothesis based on what was observed. Do NOT add more fixes on top.
+## Phase 4 — Implementation
+1. **Create a failing test first.** Simplest possible reproduction. Use `rihal-prove-it` for writing the test that locks the fix in.
+2. **Implement ONE fix.** Address the root cause identified. No "while I'm here" improvements. No bundled refactors.
+3. **Verify.** Test passes. No other tests broken. Issue actually resolved.
+4. **If fix doesn't work:** STOP. Count fix attempts.
+   - < 3 attempts: return to Phase 1 with new information.
+   - **≥ 3 attempts: STOP — this is an architectural problem.**
+## Architectural Escalation (after 3 failed fixes)
+Pattern that signals architectural problem:
+- Each fix reveals new coupling or shared state in a different place
+- Fixes require "massive refactoring" to implement
+- Each fix creates new symptoms elsewhere
+When this pattern appears:
+1. Stop attempting fixes
+2. Ask: is this pattern fundamentally sound, or are we continuing through inertia?
+3. Discuss with the user before attempting more fixes
+4. Consider `/rihal-council` for a cross-functional review
+## Sentry / Observability Integration
 If the project has Sentry (`@sentry/*` in `package.json` or `sentry-sdk` in Python):
-- Quote the actual Sentry issue ID and stack trace in the hypothesis section
-- Look at breadcrumbs for the chain of events leading to the error
-- Check the issue's "first seen / last seen" — recurring or one-off matters
+- Quote the actual Sentry issue ID and stack trace in the hypothesis
+- Read breadcrumbs for the chain of events leading to the error
+- Check "first seen / last seen" — recurring or one-off matters
 - Cross-reference with deployment timestamps to identify regressions
+## Red Flags — STOP and return to Phase 1
+If you catch yourself thinking any of these:
+- "Quick fix for now, investigate later"
+- "Just try changing X and see if it works"
+- "Add multiple changes and run tests"
+- "It's probably X, let me fix that"
+- "I don't fully understand but this might work"
+- "It seems to fix it"
+- "One more fix attempt" (when already tried 2+)
+- Proposing solutions before tracing data flow
+- Each fix reveals a new problem in a different place
+**ALL of these mean: STOP. Return to Phase 1.**
+## Common Rationalizations
+| Excuse | Reality |
+|--------|---------|
+| "Issue is simple, don't need process" | Simple bugs have root causes too. Process is fast for simple bugs. |
+| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
+| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
+| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
+| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
+| "One more fix attempt" (after 2+) | 3+ failures = architectural problem. Escalate, don't fix again. |
 ## Output Format
 ```
@@ -59,9 +169,12 @@ Reproduction:
   <exact steps>
   <observed vs expected>
+Phase 1 — Evidence
+  <what layers were instrumented and what they showed>
 Iteration 1
-  Hypothesis: <falsifiable claim>
-  Experiment: <what we did>
+  Hypothesis: <falsifiable claim — "I think X because Y">
+  Experiment: <the single test/log that would confirm or refute>
   Observation: <verbatim output>
   Outcome: confirmed | refuted | partial
@@ -69,26 +182,30 @@ Iteration N
   ...
 Root cause:
-  <one paragraph explanation of the actual mechanism>
+  <one paragraph — the actual mechanism, not the symptom>
 Fix scope:
-  <minimum change that fixes the cause, not the symptom>
+  <minimum change that addresses the cause>
 Regression test:
-  <hand off to rihal-prove-it for the test that locks the fix in>
+  <hand to rihal-prove-it — the test that locks the fix in>
 ```
-Do NOT include: "tried X and it seems to work"; speculative "maybe it's caching"; broad refactors disguised as bug fixes.
+Do NOT include: "tried X and it seems to work" · speculative "maybe it's caching" · broad refactors disguised as bug fixes.
 ## Examples
-**Happy path** — "Login fails for Arabic usernames" → reproduce: POST `/login` with `محمد` returns 500 → hypothesis: encoding boundary in URL parsing → experiment: add hex-dump log of the raw request → observation: bytes are UTF-8 but the Postgres driver re-encodes as Latin-1 → root cause: client_encoding mismatch → fix: pin client_encoding=utf8 → regression test asserts non-ASCII login returns 200.
+**Happy path** — "Login fails for Arabic usernames" → reproduce: POST `/login` with `محمد` returns 500 → Phase 1: hex-dump log of raw request body → observation: UTF-8 bytes, but Postgres driver re-encodes as Latin-1 → root cause: `client_encoding` mismatch → fix: pin `client_encoding=utf8` in connection string → regression test asserts non-ASCII login returns 200.
+**Multi-component** — "Tasks not appearing after creation" → instrument three layers: controller logs input, service logs DB call args, DB query logs row count → observation: service receives correct args, DB returns `rowCount: 0` → hypothesis: wrong table name in query → confirmed → one-line fix, regression test added.
+**Edge case — flaky test** — Passes locally, fails in CI 30% of the time → hypothesis: race condition → experiment: `--runInBand` → still flaky → next hypothesis: filesystem timing → experiment: `await fs.stat` after write → confirmed → fix.
-**Edge case — flaky test** — Test passes locally, fails in CI 30% of the time → hypothesis: race condition → experiment: run with `--runInBand` → observation: still flaky → next hypothesis: filesystem timing → experiment: await fs.stat after write → confirmed → fix.
+**Negative — shotgun fix** — "I added a try/catch around the whole function and now it doesn't crash." Refuse. The exception is silently swallowed; the bug still exists. Restore the throw and form a real hypothesis.
-**Negative — shotgun fix** — "I added a try/catch around the whole function and now it doesn't crash". Refuse. The exception is now silently swallowed; the bug still exists. Restore the throw and form a real hypothesis.
+**Architectural escalation** — Three separate fixes attempted (missing await, wrong env var, stale cache) — each fix exposed a new problem elsewhere. Stop. The async data-flow design is wrong. Escalate to `/rihal-council` before attempting Fix #4.
 ## Memory Bank Hooks
-- **Reads:** `.rihal/memory/incidents/known-issues.md` (so prior debugging context is loaded), `.rihal/memory/project/stack.md` (Sentry presence)
-- **Writes:** append the root cause to `.rihal/memory/incidents/post-mortems/YYYYMMDD-<slug>.md` when an incident is resolved; remove the entry from `known-issues.md` once the fix is verified in production
+- **Reads:** `.rihal/memory/incidents/known-issues.md` (prior debugging context), `.rihal/memory/project/stack.md` (Sentry presence, observability tools)
+- **Writes:** append root cause to `.rihal/memory/incidents/post-mortems/YYYYMMDD-<slug>.md` when resolved; remove from `known-issues.md` once fix is verified in production

package/rihal/skills/actions/4-implementation/rihal-dev-story/SKILL.md CHANGED Viewed

@@ -32,15 +32,39 @@ Follow the instructions in ./workflow.md.
 - Reads story file first; executes tasks in order
 - Marks tasks [x] only when implementation AND tests pass
 - Updates story's File List and Dev Agent Record sections
-- Reports: "Story complete. N tasks done. Tests: PASS (X). Files: [list]."
+- Runs two-stage automated review before marking complete: spec compliance → code quality
+- Reports: "Story complete. N tasks done. Tests: PASS (X). Files: [list]. Reviews: SPEC ✅ QUALITY ✅"
 - Do NOT invent scope beyond the story
 - Do NOT commit with red tests
+## Review Protocol
+After all tasks complete, dispatches two fresh reviewer subagents before handing off to human review:
+**Stage 1 — Spec Compliance:** Confirms every AC is implemented, nothing extra was built. Repeats until COMPLIANT.
+**Stage 2 — Code Quality:** Reviews naming, error handling, test depth, security, maintainability. Fixes High-severity issues; logs Medium issues for human reviewer. Repeats until APPROVED/APPROVED_WITH_NOTES.
+## Model Selection
+When dispatching reviewer subagents or sub-tasks:
+- Mechanical tasks (isolated, clear spec, 1-2 files) → cheapest/fastest model
+- Integration tasks (multi-file, pattern matching) → standard model
+- Architecture, design, or review tasks → most capable model
+## Implementer Status Protocol
+When running as a subagent implementer, report one of:
+- **DONE** — all requirements met, tests pass
+- **DONE_WITH_CONCERNS** — complete but flagging doubts about correctness or scope
+- **NEEDS_CONTEXT** — cannot proceed without specific missing information
+- **BLOCKED** — cannot complete; caller must restructure or escalate
 ## Examples
 ### Happy Path
 **Input:** "dev this story: .rihal/phases/phase-02/stories/story-005.md"
-**Expected behavior:** Read story, execute tasks in order, write tests, run suite after each task, mark checkboxes, update File List.
+**Expected behavior:** Read story, execute tasks in order, write tests, run suite after each task, mark checkboxes, update File List. After all tasks: dispatch spec compliance reviewer, dispatch code quality reviewer, mark as "review".
 ### Edge Case: Missing Story
 **Input:** "dev the login story" (story file doesn't exist)
@@ -49,3 +73,7 @@ Follow the instructions in ./workflow.md.
 ### Edge Case: Red Tests Mid-Execution
 **Input:** (task 2 breaks a test from task 1)
 **Expected behavior:** STOP. Report regression. Fix before continuing.
+### Edge Case: Spec Compliance Fails Review
+**Input:** Implementation complete but reviewer finds missing AC
+**Expected behavior:** Fix the gap, re-run tests, re-dispatch spec compliance reviewer. Do not proceed to code quality review until spec compliance passes.

package/rihal/skills/actions/4-implementation/rihal-dev-story/workflow.md CHANGED Viewed

@@ -263,6 +263,21 @@ Load config from `{project-root}/.rihal/config.json` and resolve:
   <step n="5" goal="Implement task following red-green-refactor cycle">
     <critical>FOLLOW THE STORY FILE TASKS/SUBTASKS SEQUENCE EXACTLY AS WRITTEN - NO DEVIATION</critical>
+    <!-- Model selection guidance — applies when dispatching sub-tasks to subagents -->
+    <model_selection>
+      Mechanical tasks (isolated function, clear spec, 1-2 files) → use cheapest/fastest model
+      Integration tasks (multi-file coordination, pattern matching) → use standard model
+      Architecture, design, or review tasks → use most capable model
+    </model_selection>
+    <!-- Implementer status protocol — use when completing tasks as a subagent -->
+    <status_protocol>
+      DONE: All requirements met, tests pass, committed.
+      DONE_WITH_CONCERNS: Complete but flagging doubts — describe the concern. Caller decides before review.
+      NEEDS_CONTEXT: Cannot proceed without missing information — specify exactly what is needed.
+      BLOCKED: Cannot complete despite context — describe the blocker. Caller must restructure or escalate.
+    </status_protocol>
     <action>Review the current task/subtask from the story file - this is your authoritative implementation guide</action>
     <action>Plan implementation following red-green-refactor cycle</action>
@@ -410,6 +425,65 @@ Load config from `{project-root}/.rihal/config.json` and resolve:
     <action if="definition-of-done validation fails">HALT - Address DoD failures before completing</action>
   </step>
+  <step n="9.5" goal="Two-stage automated review: spec compliance then code quality">
+    <critical>Both stages must pass before the story is marked complete. Never skip either stage. Spec compliance must pass before starting code quality review.</critical>
+    <output>🔍 **Two-Stage Review** — verifying before handing off to human review</output>
+    <!-- STAGE 1: Spec Compliance -->
+    <output>
+    ━━━ Stage 1: Spec Compliance Review ━━━
+    </output>
+    <action>Spawn a fresh spec-compliance reviewer subagent. Provide it:
+      - Full story file contents (especially Acceptance Criteria and Tasks sections)
+      - List of all modified files from the File List section
+      - Brief: "Review that every AC is satisfied in the code. Flag anything built outside spec. Flag any AC with no corresponding implementation."
+    </action>
+    <action>Reviewer reports one of: COMPLIANT | NON_COMPLIANT (with specific gaps listed)</action>
+    <check if="spec compliance reviewer reports NON_COMPLIANT">
+      <action>Fix each gap: implement missing ACs, remove any out-of-spec additions</action>
+      <action>Re-run tests to confirm fixes pass</action>
+      <action>Re-dispatch spec compliance reviewer with the same prompt</action>
+      <action>Repeat until COMPLIANT</action>
+    </check>
+    <output>✅ Stage 1 passed — implementation is spec-compliant</output>
+    <!-- STAGE 2: Code Quality Review -->
+    <output>
+    ━━━ Stage 2: Code Quality Review ━━━
+    </output>
+    <action>Spawn a fresh code quality reviewer subagent. Provide it:
+      - All modified files (from File List)
+      - Story title and ACs (context for what was being built)
+      - Project coding standards (contents of CLAUDE.md or project-context.md if available)
+      - Brief: "Review code quality: naming conventions, error handling, test coverage depth, security, performance, maintainability. Severity: High (must fix) | Medium (should fix) | Low (note only)."
+    </action>
+    <action>Reviewer reports one of: APPROVED | APPROVED_WITH_NOTES | CHANGES_REQUIRED (severity breakdown)</action>
+    <check if="code quality reviewer reports CHANGES_REQUIRED (High severity issues)">
+      <action>Fix all High-severity issues</action>
+      <action>Re-run tests to confirm fixes pass</action>
+      <action>Re-dispatch code quality reviewer with the same prompt</action>
+      <action>Repeat until APPROVED or APPROVED_WITH_NOTES</action>
+    </check>
+    <check if="Medium-severity issues exist">
+      <action>Fix Medium-severity issues when the fix is straightforward and low-risk</action>
+      <action>Log unfixed Medium issues in Dev Agent Record → Completion Notes for human reviewer awareness</action>
+    </check>
+    <output>✅ Stage 2 passed — code quality verified</output>
+    <output>
+    ✅ **Two-stage review complete** — story is spec-compliant and quality-approved.
+       Ready for human review.
+    </output>
+  </step>
   <step n="10" goal="Completion communication and user support">
     <action>Execute the enhanced definition-of-done checklist using the validation framework</action>
     <action>Prepare a concise summary in Dev Agent Record → Completion Notes</action>

package/rihal/workflows/plan-spawn-planner.md CHANGED Viewed

@@ -114,6 +114,37 @@ Output consumed by /rihal-execute. Plans need:
 </downstream_consumer>
 <deep_work_rules>
+## File Structure Map (REQUIRED — before task decomposition)
+Before writing any task, produce a file structure map listing every file this plan will create or modify:
+```
+FILES_TO_CREATE:
+  - exact/path/to/new/file.ts  — responsibility: [one sentence]
+FILES_TO_MODIFY:
+  - exact/path/to/existing.ts  — what changes: [one sentence]
+FILES_FOR_TESTS:
+  - tests/exact/path/test.ts   — tests for: [one sentence]
+```
+Rules:
+- Each file has one clear responsibility — if you can't describe it in one sentence, split the file
+- Files that change together should live together (split by responsibility, not layer)
+- This map is what informs task decomposition — each task should produce self-contained changes
+- In existing codebases: follow established patterns; only restructure files if a file is genuinely unwieldy and the split is included as its own task
+## No-Placeholders Rule (HARD BLOCKER)
+Every step must contain the actual content the executor needs. These are **plan failures** — never write them:
+- "TBD", "TODO", "implement later", "fill in details"
+- "Add appropriate error handling" / "add validation" / "handle edge cases" (without code)
+- "Write tests for the above" (without actual test code)
+- "Similar to Task N" — copy the code; executor may read tasks out of order
+- Steps that describe what to do without showing how (code blocks required for code steps)
+- References to types, functions, or methods not yet defined in any task in this plan
+If a step would require TBD content, either: (a) do the research now and fill it in, or (b) split into a research task that outputs a decision, followed by an implementation task that consumes it.
 ## Anti-Shallow Execution Rules (MANDATORY)
 Every task MUST include these fields — they are NOT optional:
@@ -185,6 +216,8 @@ Every task MUST include these fields — they are NOT optional:
 </deep_work_rules>
 <quality_gate>
+- [ ] File structure map written before first task (files_to_create / files_to_modify / files_for_tests)
+- [ ] No placeholder patterns: no TBD/TODO/implement-later, no "similar to Task N", no code steps without code
 - [ ] SPRINT.md files created in phase directory
 - [ ] Each plan has valid frontmatter including `files_modified:` array aggregating all `<files>` paths across tasks (consumed by execute.md intra-wave overlap checker)
 - [ ] Tasks are specific and actionable
@@ -195,10 +228,21 @@ Every task MUST include these fields — they are NOT optional:
 - [ ] Every task has `<done>` with a single observable acceptance sentence (Dimension 2 requirement)
 - [ ] Every `<action>` contains concrete values (no "align X with Y" without specifying what)
 - [ ] Tasks extending existing code have `<interfaces>` with relevant signatures
+- [ ] Type/name consistency: function names, types, and method signatures match across all tasks (no rename drift)
 - [ ] Dependencies correctly identified
 - [ ] Waves assigned for parallel execution
 - [ ] must_haves derived from phase goal
 </quality_gate>
+<self_review>
+After writing the complete plan, review the spec with fresh eyes before handing off:
+1. **Spec coverage** — skim each requirement in the phase goal / CONTEXT.md decisions. Can you point to a task that implements it? List any gaps; add tasks if needed.
+2. **Placeholder scan** — search the plan for the no-placeholder patterns listed above. Fix any found inline.
+3. **Type consistency** — check that function names, types, and method signatures used in later tasks match what earlier tasks define. A method called `clearLayers()` in Task 3 but `clearFullLayers()` in Task 7 is a bug.
+Fix issues inline. No sub-agent needed — this is a quick self-check before the sprint-checker runs.
+</self_review>
 ```
 ```