npm - homunculus-code - Versions diffs - 0.1.0 - Mend

homunculus-code 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35) hide show

package/CONTRIBUTING.md +56 -0
package/LICENSE +21 -0
package/README.md +443 -0
package/bin/init.js +317 -0
package/commands/eval-skill.md +48 -0
package/commands/evolve.md +67 -0
package/commands/improve-skill.md +50 -0
package/core/evaluate-session.js +173 -0
package/core/observe.sh +51 -0
package/core/prune-instincts.js +159 -0
package/docs/nightly-agent.md +130 -0
package/examples/reference/README.md +47 -0
package/examples/reference/architecture.yaml +886 -0
package/examples/reference/evolved-agents/assistant-explorer.md +86 -0
package/examples/reference/evolved-agents/shell-debugger.md +108 -0
package/examples/reference/evolved-agents/tdd-runner.md +112 -0
package/examples/reference/evolved-evals/api-system-diagnosis.eval.yaml +125 -0
package/examples/reference/evolved-evals/assistant-system-management.eval.yaml +123 -0
package/examples/reference/evolved-evals/claude-code-reference.eval.yaml +394 -0
package/examples/reference/evolved-evals/development-verification-patterns.eval.yaml +117 -0
package/examples/reference/evolved-evals/multi-agent-design-patterns.eval.yaml +151 -0
package/examples/reference/evolved-evals/shell-automation-patterns.eval.yaml +209 -0
package/examples/reference/evolved-evals/tdd-workflow.eval.yaml +191 -0
package/examples/reference/evolved-evals/workflows.eval.yaml +148 -0
package/examples/reference/evolved-skills/api-system-diagnosis.md +234 -0
package/examples/reference/evolved-skills/assistant-system-management.md +199 -0
package/examples/reference/evolved-skills/development-verification-patterns.md +243 -0
package/examples/reference/evolved-skills/multi-agent-design-patterns.md +259 -0
package/examples/reference/evolved-skills/shell-automation-patterns.md +347 -0
package/examples/reference/evolved-skills/tdd-workflow.md +272 -0
package/examples/reference/evolved-skills/workflows.md +237 -0
package/package.json +25 -0
package/templates/CLAUDE.md.template +36 -0
package/templates/architecture.template.yaml +41 -0
package/templates/rules/evolution-system.md +29 -0

package/CONTRIBUTING.md ADDED Viewed

@@ -0,0 +1,56 @@
+# Contributing to Homunculus
+Thanks for your interest in contributing! Here's how you can help.
+## Ways to Contribute
+### Share Your Evolution
+The most valuable contribution: **show us what your system evolved into.** Open a Discussion in the "Show & Tell" category with:
+- How long you've been running Homunculus
+- What goals you defined
+- What the system generated (instincts, skills, agents)
+- Anything surprising
+### Report Issues
+Found a bug? Open an issue with:
+- What you expected
+- What happened
+- Steps to reproduce
+- Your environment (OS, Node version, Claude Code version)
+### Improve Documentation
+- Fix typos or unclear explanations
+- Add examples from your experience
+- Translate to other languages
+### Contribute Code
+1. Fork the repo
+2. Create a feature branch
+3. Make your changes
+4. Test with `npx homunculus-code init` in a fresh directory
+5. Open a PR
+## Development
+```bash
+git clone https://github.com/JavanC/Homunculus.git
+cd Homunculus
+# Test the init wizard
+mkdir /tmp/test-project && cd /tmp/test-project
+node ~/Homunculus/bin/init.js
+# Test with --yes flag (non-interactive)
+node ~/Homunculus/bin/init.js --yes
+```
+## Code Style
+- Plain JavaScript (no TypeScript, no build step)
+- Node.js 18+ features OK
+- Shell scripts: `#!/usr/bin/env bash` + `set -euo pipefail`
+- Keep it simple — this is a seed, not a framework
+## License
+By contributing, you agree that your contributions will be licensed under the MIT License.

package/LICENSE ADDED Viewed

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Javan Chen
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED Viewed

@@ -0,0 +1,443 @@
+# Homunculus for Claude Code
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Claude Code](https://img.shields.io/badge/Claude%20Code-v2.1.70+-blue)](https://docs.anthropic.com/en/docs/claude-code)
+[![Node.js](https://img.shields.io/badge/Node.js-18+-green)](https://nodejs.org)
+**Stop tuning your AI. Let it tune itself.**
+You spend hours tweaking prompts, writing rules, researching new features, configuring tools. Homunculus flips this: define your goals, and the system evolves itself — skills, agents, hooks, scripts, everything — while you focus on actual work.
+```bash
+npx homunculus-code init
+```
+One command. Answer a few questions. Start using Claude Code. Your assistant begins evolving.
+> **Proof it works:** One developer ran this system for 15 days. It auto-generated 168 behavioral patterns, converged them into 7 tested skills, created 3 specialized agents, 15 commands, and 19 automation scripts. The nightly agent alone made 134 commits across 11 nights — improving the system while the developer slept. [See results →](#real-world-results)
+---
+## Why
+Every Claude Code power user hits the same wall:
+```
+Week 1:  "Claude is amazing!"
+Week 2:  "Let me add some rules and hooks..."
+Week 3:  "I need skills, agents, MCP servers, custom commands..."
+Week 4:  "I spend more time configuring Claude than using it."
+```
+Sound familiar? Even with tools like OpenClaw that can generate new skills, you're still the one deciding *what* to improve, *when* to improve it, and *whether* the improvement actually worked. The AI can write code, but it can't set its own direction.
+**Here's the difference:**
+```
+Without Homunculus:                      With Homunculus:
+  You notice tests keep failing          Goal tree detects test_pass_rate dropped
+  → You research testing patterns        → Nightly agent auto-evolves tdd skill
+  → You write a pre-commit skill         → Eval confirms 100% pass
+  → You test it manually                 → Morning report: "Fixed. Here's what changed."
+  → It breaks after a Claude update      → Next update? System re-evaluates and adapts
+  → You fix it again...                  → You slept through all of this
+```
+---
+## The Core Idea: Goal Tree
+Most AI tools optimize locally — "you did X, so I'll remember X." Homunculus optimizes **globally** — toward goals you define in a tree:
+```
+                        🎯 My AI Assistant
+                     ┌──────────┼──────────┐
+                     │          │          │
+                 Code Quality  Speed   Knowledge
+                 ┌───┴───┐     │      ┌───┴───┐
+              Testing  Review  Tasks  Research  Memory
+```
+Each node defines **why** it exists, **how** to measure it, and **what** currently implements it:
+```yaml
+# architecture.yaml — from the reference system
+autonomous_action:
+  purpose: "Act without waiting for human commands"
+  goals:
+    nightly_research:
+      purpose: "Discover better approaches while developer sleeps"
+      realized_by: heartbeat/heartbeat.sh        # a shell script
+      health_check:
+        command: "test $(find logs/ -mtime -1 | wc -l) -gt 0"
+        expected: "nightly agent ran in last 24h"
+    task_management:
+      purpose: "Track and complete tasks autonomously"
+      realized_by: quest-board/server.js          # a web app
+      metrics:
+        - name: forge_completion_rate
+          healthy: "> 70%"
+continuous_evolution:
+  purpose: "Improve over time without human intervention"
+  goals:
+    pattern_extraction:
+      purpose: "Learn from every session"
+      realized_by: scripts/evaluate-session.js    # a Node script
+    skill_aggregation:
+      purpose: "Converge patterns into tested skills"
+      realized_by: homunculus/evolved/skills/      # evolved artifacts
+```
+The `realized_by` field can point to **anything**:
+| Type | Example | When to use |
+|------|---------|-------------|
+| Skill | `skills/tdd-workflow.md` | Behavioral knowledge |
+| Agent | `agents/code-reviewer.md` | Specialized AI subagent |
+| Hook | `hooks/pre-commit.sh` | Automated trigger |
+| Script | `scripts/deploy.sh` | Shell automation |
+| Cron | `launchd/nightly-check.plist` | Scheduled task |
+| MCP | `mcp-servers/github` | External integration |
+| Rule | `rules/security.md` | Claude Code behavioral rule |
+| Command | `commands/quality-gate.md` | Slash command workflow |
+**Goals are stable. Implementations evolve.** The same goal can go from a mental note → instinct → skill → hook → agent. The system replaces and upgrades implementations while keeping goals intact.
+---
+## How It Evolves
+```
+    You use Claude Code normally
+                │
+    ┌───────────┼───────────┐
+    ▼           ▼           ▼
+ Observe    Health Check  Research
+ (hooks)    (goal tree)   (nightly)
+    │           │           │
+    └───────────┼───────────┘
+                ▼
+         ┌────────────┐
+         │  Evolve     │   Goals stay the same.
+         │  ────────── │   Implementations get better.
+         │  Extract    │
+         │  Converge   │
+         │  Eval       │
+         │  Replace    │
+         └────────────┘
+```
+**Three inputs, one engine:**
+1. **Observe** — hooks watch your tool usage, extract recurring patterns into "instincts"
+2. **Health check** — the goal tree identifies which goals are unhealthy → focus there
+3. **Research** — a nightly agent scans for better approaches and proposes experiments
+The evolution engine then:
+- Extracts behavioral patterns → **instincts** (confidence-scored, auto-decay)
+- Converges related instincts → **skills** (tested with eval specs)
+- Evaluates and improves skills → **eval → improve loop** until 100% pass
+- Proposes replacements → instinct becomes hook, script becomes agent
+- Prunes outdated implementations → automatic archival
+---
+## Quick Start
+### 1. Install
+```bash
+npx homunculus-code init
+```
+The wizard asks you a few questions and sets everything up:
+```
+🧬 Homunculus — Self-evolving AI Assistant
+? What's your project name? my-app
+? What's your main goal? Build a reliable SaaS product
+✅ Created homunculus/ structure
+✅ Generated architecture.yaml with your goals
+✅ Added observation hook to Claude Code
+✅ Copied evolution scripts
+Done! Start using Claude Code. Your assistant will evolve.
+```
+### 2. Use Claude Code Normally
+That's it. The system observes, extracts, and evolves in the background. Check progress anytime:
+```bash
+claude "/eval-skill"         # Run skill evaluations
+claude "/improve-skill"      # Auto-improve a skill
+claude "/evolve"             # Converge instincts into skills
+```
+### 3. Refine Your Goals (Optional)
+As you use the system, refine `architecture.yaml` — add sub-goals, metrics, and health checks. The more specific your goals, the smarter the evolution.
+---
+## Key Concepts
+### Goal Tree (`architecture.yaml`)
+The central nervous system. Every goal has a purpose, metrics, and health checks. The evolution system reads this to decide **what** to improve and **how** to measure success.
+### Instincts
+Small behavioral patterns auto-extracted from your usage. They have confidence scores that grow with reinforcement and decay over time (half-life: 90 days). Think of them as the system's "muscle memory."
+### Skills
+When multiple instincts cover the same area, they converge into a skill — a tested, versioned knowledge module. Every skill has an eval spec with scenario tests. Skills that fail eval get automatically improved.
+### Eval Specs
+Scenario-based tests for skills. Each scenario defines context, expected behavior, and anti-patterns. The system runs evals and auto-improves until 100% pass rate.
+### Replaceable Implementations
+The core principle: **same goal, evolving implementation.**
+```
+Goal: "Catch bugs before merge"
+  v1: instinct → "remember to run tests"
+  v2: skill    → tdd-workflow.md (with eval spec)
+  v3: hook     → pre-commit.sh (automated)
+  v4: agent    → code-reviewer.md (AI-powered)
+```
+---
+## Nightly Agent
+This is what makes the system truly autonomous. Without it, you'd still need to manually run `/eval-skill`, `/improve-skill`, `/evolve`. The nightly agent **does all of that for you** while you sleep.
+A scheduled agent (via `launchd` on macOS or `cron` on Linux) runs a heartbeat loop every night:
+```
+ You go to sleep
+        │
+        ▼
+ ┌──────────────────────────────────────────────┐
+ │  Nightly Agent (heartbeat loop)              │
+ │                                              │
+ │  1. Health Check                             │
+ │     Scan goal tree → which goals are red?    │
+ │                                              │
+ │  2. Evolve                                   │
+ │     Run /eval-skill on all skills            │
+ │     Run /improve-skill on failing ones       │
+ │     Converge new instincts → skills          │
+ │     Prune outdated instincts                 │
+ │                                              │
+ │  3. Research                                 │
+ │     Scan tech news, changelogs, community    │
+ │     "Is there a better way to achieve X?"    │
+ │                                              │
+ │  4. Experiment                               │
+ │     Generate hypotheses from weak goals      │
+ │     Run experiments in isolated worktrees    │
+ │     Merge if passed, discard if failed       │
+ │                                              │
+ │  5. Report                                   │
+ │     Generate morning report                  │
+ └──────────────────────────────────────────────┘
+        │
+        ▼
+ You wake up to a smarter assistant + a report
+```
+**Here's what a real morning report looks like:**
+```markdown
+## Morning Report — 2026-03-22
+### What Changed Overnight
+- Improved skill: claude-code-reference v4.6 → v4.8
+  (added coverage for CC v2.1.77-80 features)
+- Archived 3 outdated instincts (covered by evolved skills)
+- New experiment passed: eval noise threshold set to 5pp
+### Goal Health
+- continuous_evolution:  ✅ healthy (7 skills, all 100% eval)
+- code_quality:          ✅ healthy (148/148 tests passing)
+- resource_awareness:    ⚠️ attention (context usage trending up)
+  → Queued experiment: split large skill into chapters
+### Research Findings
+- Claude Code v2.1.81: new --bare flag could speed up headless mode
+  → Experiment queued for tomorrow night
+- New pattern detected in community: writer/reviewer agent separation
+  → Instinct created, will converge if reinforced
+### Suggested Actions (for you)
+- Review 2 knowledge card candidates from overnight research
+- Approve experiment: context reduction via skill splitting
+```
+In our reference system, the nightly agent produced **134 commits across 11 nights** — evolving skills, running experiments, researching Claude Code updates, and archiving outdated patterns. All without any human input.
+The nightly agent is what turns Homunculus from "a tool you use" into "a system that grows on its own."
+See [docs/nightly-agent.md](docs/nightly-agent.md) for setup.
+---
+## Real-World Results
+Built and tested on a real personal AI assistant. In **15 days** (starting from zero):
+| What evolved | Count | Details |
+|-------------|-------|---------|
+| Instincts | **168** | 84 active + 84 auto-archived (system prunes itself) |
+| Skills | **7** | All 100% eval pass rate (93 test scenarios) |
+| Subagents | **3** | Auto-extracted from repetitive main-thread patterns |
+| Slash commands | **15** | Workflow automations (forge-dev, quality-gate, eval...) |
+| Scripts | **19** | Session lifecycle, health checks, evolution reports |
+| Hooks | **11** | Observation, compaction, quality gates |
+| Rules | **6** | Core patterns, evolution system, knowledge management |
+| Scheduled agents | **4** | Nightly heartbeat, Discord bridge, daily news, trading |
+| ADRs | **8** | Architecture decision records |
+| Experiments | **13** | Structured A/B tests with pass/fail tracking |
+| Goal tree | **9 goals / 46+ sub-goals** | Each with health checks and metrics |
+| Total commits | **1,235** | System iterates fast |
+The nightly agent alone: **134 commits across 11 nights**.
+[See the full reference implementation →](examples/reference/)
+---
+## What Makes This Different
+| | Homunculus | OpenClaw | Cursor Rules | Claude Memory |
+|---|---|---|---|---|
+| **Goal-driven** | Goal tree with metrics + health checks | No | No | No |
+| **Learns from usage** | Auto-observation → instincts → skills | Skill generation | Manual | Auto-memory |
+| **Quality control** | Eval specs + scenario tests | None | None | None |
+| **Autonomous overnight** | Nightly agent: eval + improve + research + experiment | No | No | No |
+| **Self-improving** | Eval → improve → replace loop | Partial | No | No |
+| **Meta-evolution** | Evolution mechanism evolves itself | No | No | No |
+| **Implementation agnostic** | Skills, agents, hooks, scripts, MCP, cron... | Skills only | Rules only | Memory only |
+OpenClaw is great at generating skills on demand. Homunculus goes further: it decides *what* to improve based on goal health, *validates* improvements with evals, and does it all *autonomously* overnight. They solve different problems — OpenClaw is a power tool, Homunculus is an operating system for evolution.
+---
+## What Gets Generated
+After `npx homunculus-code init`:
+```
+your-project/
+├── architecture.yaml           # Your goal tree (the brain)
+├── homunculus/
+│   ├── instincts/
+│   │   ├── personal/           # Auto-extracted patterns
+│   │   └── archived/           # Auto-pruned old patterns
+│   ├── evolved/
+│   │   ├── skills/             # Converged, tested knowledge
+│   │   ├── agents/             # Specialized subagents
+│   │   └── evals/              # Skill evaluation specs
+│   └── experiments/            # A/B test tracking
+├── .claude/
+│   ├── rules/
+│   │   └── evolution-system.md # How Claude should evolve
+│   └── commands/
+│       ├── eval-skill.md       # /eval-skill
+│       ├── improve-skill.md    # /improve-skill
+│       └── evolve.md           # /evolve
+└── scripts/
+    ├── observe.sh              # Observation hook
+    ├── evaluate-session.js     # Pattern extraction
+    └── prune-instincts.js      # Automatic cleanup
+```
+---
+## Advanced: Meta-Evolution
+The evolution mechanism itself evolves:
+- **Instinct survival rate** too low? → Automatically raise extraction thresholds
+- **Eval discrimination** too low? → Add harder boundary scenarios
+- **Skill convergence** too slow? → Adjust aggregation triggers
+Tracked via three metrics:
+1. `instinct_survival_rate` — % of instincts that survive 14 days
+2. `skill_convergence` — time from first instinct to evolved skill
+3. `eval_discrimination` — % of eval scenarios that actually distinguish between versions
+---
+## Requirements
+- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) v2.1.70+
+- Node.js 18+
+- macOS or Linux
+---
+## FAQ
+<details>
+<summary><strong>Is this only for Claude Code?</strong></summary>
+The concepts (goal tree, eval-driven evolution, replaceable implementations) are tool-agnostic. The current implementation targets Claude Code hooks and commands, but the core pipeline could be adapted to other AI harnesses.
+</details>
+<details>
+<summary><strong>Does it cost extra API credits?</strong></summary>
+The observation hook is lightweight (no API calls). Instinct extraction uses a short Claude call per session (~$0.01). The nightly agent is optional and budget-configurable.
+</details>
+<details>
+<summary><strong>Can I use my existing CLAUDE.md and rules?</strong></summary>
+Yes. `npx homunculus-code init` adds to your project without overwriting existing files. Your current setup becomes the starting point for evolution.
+</details>
+<details>
+<summary><strong>How is this different from Claude Code's built-in memory?</strong></summary>
+Claude's memory records facts. Homunculus evolves *behavior* — tested skills, automated hooks, specialized agents — all driven by goals you define, with quality gates that prevent regression.
+</details>
+<details>
+<summary><strong>How does this compare to OpenClaw?</strong></summary>
+OpenClaw is excellent at generating skills on demand — it's a powerful tool. Homunculus solves a different problem: autonomous, goal-directed evolution. It decides what needs improving (via goal health), validates improvements (via evals), and does the work overnight (via the nightly agent). You could use both — OpenClaw for on-demand skill generation, Homunculus for the autonomous evolution layer on top.
+</details>
+---
+## Philosophy
+> "Your AI assistant should be a seed, not a statue."
+Stop spending your evenings tuning AI. Plant a seed, define your goals, and let it grow. The more you use it, the better it gets — and it tells you exactly how and why through eval scores, goal health checks, and morning reports.
+---
+## Contributing
+See [CONTRIBUTING.md](CONTRIBUTING.md).
+## License
+MIT
+---
+**Built by [Javan](https://github.com/JavanC) and his self-evolving AI assistant.**