npm - claude-self-reflect - Versions diffs - 7.1.10 → 7.1.11 - Mend

claude-self-reflect 7.1.10 → 7.1.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

package/README.md +36 -78
package/docs/design/GRADER_PROMPT.md +81 -0
package/docs/design/batch_ground_truth_generator.py +496 -0
package/docs/design/batch_import_all_projects.py +477 -0
package/docs/design/batch_import_v3.py +278 -0
package/docs/design/conversation-analyzer/SKILL.md +133 -0
package/docs/design/conversation-analyzer/SKILL_V2.md +218 -0
package/docs/design/conversation-analyzer/extract_structured.py +186 -0
package/docs/design/extract_events_v3.py +533 -0
package/docs/design/import_existing_batch.py +188 -0
package/docs/design/recover_all_batches.py +297 -0
package/docs/design/recover_batch_results.py +287 -0
package/package.json +4 -1

package/README.md CHANGED

@@ -26,7 +26,7 @@ Give Claude perfect memory of all your conversations. Search past discussions in
 **100% Local by Default** • **20x Faster** • **Zero Configuration** • **Production Ready**
-> **Latest: v7.1.9 Cross-Project Iteration Memory** - Ralph loops now share memory across ALL projects automatically. [Learn more →](#ralph-loop-memory-integration-v719)
+> **Latest: v7.1.9 Cross-Project Iteration Memory** - Ralph loops now share memory across ALL projects automatically. [Learn more →](#ralph-loop-memory)
 ## Why This Exists
@@ -39,9 +39,8 @@ Claude starts fresh every conversation. You've solved complex bugs, designed arc
 - [The Magic](#the-magic)
 - [Before & After](#before--after)
 - [Real Examples](#real-examples)
-- [NEW: Real-time Indexing Status](#new-real-time-indexing-status-in-your-terminal)
+- [Ralph Loop Memory](#ralph-loop-memory)
 - [Key Features](#key-features)
-- [Ralph Loop Memory Integration](#ralph-loop-memory-integration)
 - [Code Quality Insights](#code-quality-insights)
 - [Architecture](#architecture)
 - [Requirements](#requirements)
@@ -133,17 +132,44 @@ Claude: "Found conversations about JWT patterns including User.authenticate
         PKCE, and social login integration."
 ```
-## NEW: Real-time Indexing Status in Your Terminal
+## Ralph Loop Memory
-See your conversation indexing progress directly in your statusline:
+<div align="center">
+<img src="docs/images/ralph-loop-csr.png" alt="Ralph Loop with CSR Memory - From hamster wheel to upward spiral" width="800"/>
+</div>
+**The difference between spinning in circles and building on every iteration.**
+Use the [ralph-wiggum plugin](https://github.com/anthropics/claude-code-plugins/tree/main/ralph-wiggum) for long tasks? CSR gives your Ralph loops **persistent memory across sessions and projects**.
+### Without CSR: The Hamster Wheel
+- Each context compaction = everything forgotten
+- Same mistakes repeated across iterations
+- No learning from past sessions
+- Cross-project insights lost forever
+### With CSR: The Upward Spiral
+- **Automatic backup** before context compaction
+- **Anti-pattern injection** - "DON'T RETRY THESE" surfaces first
+- **Success pattern learning** - reuse what worked before
+- **Cross-project memory** - learn from ALL your projects
+### Quick Setup
+```bash
+./scripts/ralph/install_hooks.sh        # Install hooks globally
+./scripts/ralph/install_hooks.sh --check  # Verify installation
+```
-### Fully Indexed (100%)
-![Statusline showing 100% indexed](docs/images/statusbar-1.png)
+### How It Works
+1. Start a Ralph loop: `/ralph-wiggum:ralph-loop "Build feature X"`
+2. Work naturally - CSR hooks capture state automatically
+3. **Stop hook** stores each iteration's learnings
+4. **PreCompact hook** backs up state before compaction
+5. Next session retrieves past insights, failed approaches, and wins
-### Active Indexing (50% with backlog)
-![Statusline showing 50% indexed with 7h backlog](docs/images/statusbar-2.png)
+> **v7.1.9+**: Cross-project iteration memory - hooks work for ALL projects, entries tagged with `project_{name}` for global searchability.
-Works with [Claude Code Statusline](https://github.com/sirmalloc/ccstatusline) - shows progress bars, percentages, and indexing lag in real-time! The statusline also displays MCP connection status (✓ Connected) and collection counts (28/29 indexed).
+[Full documentation →](docs/development/ralph-memory-integration.md)
 ## Code Quality Insights
@@ -325,74 +351,6 @@ Claude: [Searches across ALL your projects]
 </details>
-<details>
-<summary><b>Ralph Loop Memory Integration (v7.1.9+)</b></summary>
-<div align="center">
-<img src="docs/images/ralph-loop-csr.png" alt="Ralph Loop with CSR Memory - From hamster wheel to upward spiral" width="800"/>
-</div>
-Use the [ralph-wiggum plugin](https://github.com/anthropics/claude-code-plugins/tree/main/ralph-wiggum) for long tasks? CSR automatically gives Ralph loops **cross-session AND cross-project memory**:
-**Core Features:**
-- **Automatic backup** before context compaction
-- **Past session retrieval** when starting new Ralph loops
-- **Failed approach tracking** - never repeat the same mistakes
-- **Success pattern learning** - reuse what worked before
-**v7.1.9 Cross-Project Iteration Memory (NEW!):**
-- **Global Hook System** - Hooks fire for ALL projects, not just CSR
-- **Stop Hook** - Captures iteration state after every Claude response
-- **PreCompact Hook** - Backs up Ralph state before context compaction
-- **Project Tagging** - Each entry tagged with `project_{name}` for cross-project visibility
-- **Automatic Storage** - No manual protocol needed, hooks capture everything
-**v7.1+ Enhanced Features:**
-- **Error Signature Deduplication** - Normalizes errors (removes line numbers, paths, timestamps)
-- **Output Decline Detection** - Circuit breaker pattern detects >70% output drop
-- **Confidence-Based Exit** - 0-100 scoring (tasks complete, tests passing, no errors)
-- **Anti-Pattern Injection** - "DON'T RETRY THESE" surfaces failed approaches first
-- **Work Type Tracking** - IMPLEMENTATION/TESTING/DEBUGGING/DOCUMENTATION
-- **Error-Centric Search** - Find past sessions by error pattern
-**Setup (one-time):**
-```bash
-./scripts/ralph/install_hooks.sh        # Install CSR hooks globally
-./scripts/ralph/install_hooks.sh --check  # Verify installation
-```
-**How it works:**
-1. Start a Ralph loop in ANY project: `/ralph-wiggum:ralph-loop "Build feature X"`
-2. Work naturally - state is tracked in `.claude/ralph-loop.local.md`
-3. **Stop hook** captures each iteration automatically
-4. **PreCompact hook** backs up state before compaction
-5. All entries stored with project tags for cross-project search
-**Cross-Project Example:**
-```bash
-# In project-a: learned Docker fix
-# In project-b: CSR finds it automatically
-reflect_on_past("docker memory issue")
-# Returns: project_a session with Docker solution
-```
-**Verified proof (2026-01-05):**
-```
-# Hook collection shows cross-project entries:
-Total entries: 23
-1. 05:25:06 📦 PreCompact | 📧 emailmon
-2. 05:22:00 🔄 Iteration  | 🔍 claude-self-reflect
-3. 05:13:51 🔄 Iteration  | 📧 emailmon
-# Entries include project tags:
-tags: ['__csr_hook_auto__', 'project_emailmon', 'ralph_iteration']
-```
-[Full documentation →](docs/development/ralph-memory-integration.md)
-</details>
 <details>
 <summary><b>Memory Decay</b></summary>
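
Note on the README hunk above: both the old and new text say hook entries are tagged with `project_{name}` so they can be found from any project, and the removed example shows a tag list of `['__csr_hook_auto__', 'project_emailmon', 'ralph_iteration']`. The following is a minimal sketch of that tagging convention, not the package's actual implementation. The function names `build_hook_tags` and `entries_for_project` and the entry dictionary shape are hypothetical and exist only for illustration.

```python
# Illustrative sketch only: these helpers are NOT part of claude-self-reflect.
# The tag strings mirror the example shown in the removed README section.

HOOK_MARKER = "__csr_hook_auto__"  # marker tag seen in the README example


def build_hook_tags(project_name: str, entry_kind: str) -> list[str]:
    """Build a tag list following the `project_{name}` convention."""
    return [HOOK_MARKER, f"project_{project_name}", entry_kind]


def entries_for_project(entries: list[dict], project_name: str) -> list[dict]:
    """Return the entries whose tags include the given project's tag."""
    wanted = f"project_{project_name}"
    return [entry for entry in entries if wanted in entry.get("tags", [])]


if __name__ == "__main__":
    # Hypothetical entries; the "text" field is an assumption for the demo.
    entries = [
        {"text": "docker memory fix", "tags": build_hook_tags("emailmon", "ralph_iteration")},
        {"text": "import retry", "tags": build_hook_tags("claude-self-reflect", "ralph_iteration")},
    ]
    for entry in entries_for_project(entries, "emailmon"):
        print(entry["text"], entry["tags"])
```

Because every entry carries its project tag alongside a shared marker, a search can either stay global (match on the marker) or narrow to one project (match on the project tag).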

package/docs/design/GRADER_PROMPT.md ADDED

@@ -0,0 +1,81 @@
+# Ground Truth Evaluation Prompt
+You are evaluating a code generation session for ground truth labeling. Your task is to assess the quality and effectiveness of the solution provided.
+## Evaluation Criteria
+### 1. Functional Correctness (0.0-1.0)
+- Does the solution address the user's request?
+- Are there any bugs or errors in the implementation?
+- Do builds and tests pass?
+- Is the code logically sound?
+### 2. Design Quality (0.0-1.0)
+- Does the solution follow best practices?
+- Is the code well-structured and maintainable?
+- Are appropriate patterns and architectures used?
+- Is the implementation efficient?
+### 3. Completeness (0.0-1.0)
+- Were all requirements addressed?
+- Are edge cases handled?
+- Is error handling appropriate?
+- Is the solution production-ready?
+## Scoring Guidelines
+**0.9-1.0 (Excellent)**
+- All requirements met with high quality
+- Best practices followed throughout
+- Well-tested and robust
+- Production-ready code
+**0.7-0.8 (Good)**
+- Requirements met with minor issues
+- Generally follows best practices
+- Some testing coverage
+- Mostly production-ready
+**0.5-0.6 (Acceptable)**
+- Core requirements met
+- Some quality issues
+- Limited testing
+- Needs refinement
+**0.3-0.4 (Needs Work)**
+- Partial implementation
+- Quality concerns
+- Missing tests
+- Significant gaps
+**0.0-0.2 (Inadequate)**
+- Major issues or incomplete
+- Does not meet requirements
+- Serious bugs or design flaws
+## Response Format
+Provide your evaluation in the following structure:
+```json
+{
+  "functional_correctness": 0.0-1.0,
+  "design_quality": 0.0-1.0,
+  "completeness": 0.0-1.0,
+  "overall_grade": 0.0-1.0,
+  "reasoning": "Detailed explanation of scores",
+  "strengths": ["List of strong points"],
+  "weaknesses": ["List of areas for improvement"],
+  "confidence": 0.0-1.0
+}
+```
+## Context Placeholders
+The following sections will be filled in for each evaluation:
+- `<request>`: The user's original request
+- `<solution>`: The implementation provided
+- `<tier1_results>`: Build and test results
+- `<rubric>`: Specific evaluation criteria for this session
+- `<narrative>`: Full conversation narrative for context
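
Note on the GRADER_PROMPT.md hunk above: the prompt asks the grader for a JSON object with five scores in the range 0.0-1.0, a `reasoning` string, and `strengths`/`weaknesses` lists. A consumer of that output would probably want to check the shape before trusting it. The sketch below shows one way to do so with the standard library; it is not taken from the package. The names `validate_grade` and `grade_band` are hypothetical, and mapping a score to a band by its lower bound (for example, 0.85 falls under "Good") is an assumption, since the prompt's bands leave gaps between ranges.

```python
import json

# Field names come from the "Response Format" section of the prompt.
SCORE_FIELDS = (
    "functional_correctness",
    "design_quality",
    "completeness",
    "overall_grade",
    "confidence",
)
LIST_FIELDS = ("strengths", "weaknesses")

# Lower bounds for each band; treating them as thresholds is an assumption.
BANDS = (
    (0.9, "Excellent"),
    (0.7, "Good"),
    (0.5, "Acceptable"),
    (0.3, "Needs Work"),
    (0.0, "Inadequate"),
)


def validate_grade(raw: str) -> dict:
    """Parse a grader response and check it against the documented shape."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    for field in SCORE_FIELDS:
        value = data.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{field} must be a number")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{field} must be between 0.0 and 1.0")
    if not isinstance(data.get("reasoning"), str):
        raise ValueError("reasoning must be a string")
    for field in LIST_FIELDS:
        items = data.get(field)
        if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
            raise ValueError(f"{field} must be a list of strings")
    return data


def grade_band(score: float) -> str:
    """Map a 0.0-1.0 score to the band whose lower bound it meets."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "Inadequate"  # unreachable for valid scores, kept for clarity
```

Rejecting malformed responses early keeps a single bad grader output from silently skewing any aggregate built on these scores.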