aether-colony 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

Files changed (38)
  1. package/.claude/commands/ant/ant.md +89 -0
  2. package/.claude/commands/ant/build.md +504 -0
  3. package/.claude/commands/ant/colonize.md +94 -0
  4. package/.claude/commands/ant/continue.md +674 -0
  5. package/.claude/commands/ant/feedback.md +65 -0
  6. package/.claude/commands/ant/flag.md +108 -0
  7. package/.claude/commands/ant/flags.md +139 -0
  8. package/.claude/commands/ant/focus.md +42 -0
  9. package/.claude/commands/ant/init.md +129 -0
  10. package/.claude/commands/ant/migrate-state.md +140 -0
  11. package/.claude/commands/ant/organize.md +210 -0
  12. package/.claude/commands/ant/pause-colony.md +87 -0
  13. package/.claude/commands/ant/phase.md +86 -0
  14. package/.claude/commands/ant/plan.md +409 -0
  15. package/.claude/commands/ant/redirect.md +53 -0
  16. package/.claude/commands/ant/resume-colony.md +83 -0
  17. package/.claude/commands/ant/status.md +122 -0
  18. package/.claude/commands/ant/watch.md +220 -0
  19. package/LICENSE +21 -0
  20. package/README.md +258 -0
  21. package/bin/cli.js +196 -0
  22. package/package.json +35 -0
  23. package/runtime/DISCIPLINES.md +93 -0
  24. package/runtime/QUEEN_ANT_ARCHITECTURE.md +347 -0
  25. package/runtime/aether-utils.sh +760 -0
  26. package/runtime/coding-standards.md +197 -0
  27. package/runtime/debugging.md +207 -0
  28. package/runtime/docs/pheromones.md +213 -0
  29. package/runtime/learning.md +254 -0
  30. package/runtime/planning.md +159 -0
  31. package/runtime/tdd.md +257 -0
  32. package/runtime/utils/atomic-write.sh +213 -0
  33. package/runtime/utils/colorize-log.sh +132 -0
  34. package/runtime/utils/file-lock.sh +122 -0
  35. package/runtime/utils/watch-spawn-tree.sh +185 -0
  36. package/runtime/verification-loop.md +159 -0
  37. package/runtime/verification.md +116 -0
  38. package/runtime/workers.md +671 -0
package/runtime/learning.md ADDED
@@ -0,0 +1,254 @@
# Colony Learning Discipline

## Overview

The colony learns from every phase. Patterns observed during builds become **instincts** - atomic learned behaviors with confidence scoring that improve future work.

## The Instinct Model

An instinct is a small learned behavior:

```yaml
id: prefer-composition
trigger: "when designing component architecture"
confidence: 0.7
domain: "architecture"
source: "phase-3-observation"
evidence:
  - "Composition pattern succeeded in Phase 3"
  - "User approved component structure"
action: "Use composition over inheritance for component reuse"
```

**Properties:**
- **Atomic** — one trigger, one action
- **Confidence-weighted** — 0.3 = tentative, 0.9 = near certain
- **Domain-tagged** — architecture, testing, code-style, debugging, workflow
- **Evidence-backed** — tracks the observations that created it

## Confidence Scoring

| Score | Meaning | Behavior |
|-------|---------|----------|
| 0.3 | Tentative | Suggested but not enforced |
| 0.5 | Moderate | Applied when relevant |
| 0.7 | Strong | Auto-applied in matching contexts |
| 0.9 | Near-certain | Core colony behavior |

**Confidence increases when:**
- Pattern is repeatedly observed across phases
- Pattern leads to successful outcomes
- No corrections or rework needed

**Confidence decreases when:**
- Pattern leads to errors or rework
- User provides negative feedback
- Contradicting evidence appears

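The adjustment rules above can be sketched as a small helper. This is an illustrative sketch, not part of the colony runtime: the function name, the 0.1 step, and the clamp to [0.0, 1.0] are assumptions.

```python
def adjust_confidence(confidence: float, success: bool, step: float = 0.1) -> float:
    """Nudge an instinct's confidence up on success, down on failure.

    The 0.1 step follows the lifecycle described in this document; the
    clamp to [0.0, 1.0] is an assumption to keep scores in range.
    """
    delta = step if success else -step
    return max(0.0, min(1.0, round(confidence + delta, 2)))

print(adjust_confidence(0.5, True))   # 0.6
print(adjust_confidence(0.3, False))  # 0.2
```

Repeated successes walk a tentative 0.3 instinct up toward the 0.9 core-behavior band; a single failure walks it back down.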
## Pattern Detection

During each phase, observe for:

### 1. Success Patterns
What worked well:
- Approaches that completed without issues
- Code structures that passed all tests
- Workflows that were efficient

### 2. Error Resolutions
What was learned from debugging:
- Root causes discovered
- Fixes that worked
- Architectural insights

### 3. User Corrections
What the user redirected:
- Feedback via `/ant:feedback`
- Redirects via `/ant:redirect`
- Explicit corrections

### 4. Tool Preferences
What tools/patterns were effective:
- Testing approaches that caught bugs
- File organization that worked
- Command sequences that succeeded

## Instinct Structure

Store in `memory.instincts`:

```json
{
  "instincts": [
    {
      "id": "instinct_<timestamp>",
      "trigger": "when X",
      "action": "do Y",
      "confidence": 0.5,
      "domain": "testing",
      "source": "phase-2",
      "evidence": ["observation 1", "observation 2"],
      "created_at": "<ISO-8601>",
      "last_applied": null,
      "applications": 0,
      "successes": 0
    }
  ]
}
```

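A minimal Python sketch of building a record with this shape. The helper name `new_instinct` and its defaults are hypothetical; only the field names come from the schema above.

```python
import json
import time

def new_instinct(trigger, action, domain, source, evidence, confidence=0.5):
    """Build a dict matching the memory.instincts schema above."""
    return {
        "id": f"instinct_{int(time.time())}",       # timestamp-based id, per the schema
        "trigger": trigger,
        "action": action,
        "confidence": confidence,
        "domain": domain,
        "source": source,
        "evidence": list(evidence),
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "last_applied": None,
        "applications": 0,
        "successes": 0,
    }

record = new_instinct("when X", "do Y", "testing", "phase-2", ["observation 1"])
print(json.dumps({"instincts": [record]}, indent=2))
```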
## Instinct Lifecycle

```
Observation
     │
     ▼
┌─────────────────────────────┐
│   Pattern Detected          │
│   (success/error/feedback)  │
└─────────────────────────────┘
     │
     ▼
┌─────────────────────────────┐
│   Instinct Created          │
│   (confidence: 0.3-0.5)     │
└─────────────────────────────┘
     │ Applied in future phases
     ▼
┌─────────────────────────────┐
│   Confidence Adjusted       │
│   (+0.1 success, -0.1 fail) │
└─────────────────────────────┘
     │ Reaches 0.9
     ▼
┌─────────────────────────────┐
│   Core Behavior             │
│   (always applied)          │
└─────────────────────────────┘
```

## Extracting Instincts

After each phase, the Prime Worker should identify:

1. **What patterns led to success?**
   - Approach taken
   - Tools used effectively
   - Code structures that worked

2. **What was learned from errors?**
   - Root causes found
   - Fixes that worked
   - What to avoid

3. **What feedback was received?**
   - User corrections
   - Focus areas emphasized
   - Patterns to avoid

Format for extraction:

```json
"patterns_observed": [
  {
    "type": "success",
    "trigger": "when implementing API endpoints",
    "action": "use repository pattern with dependency injection",
    "evidence": "All endpoint tests passed first try"
  },
  {
    "type": "error_resolution",
    "trigger": "when debugging async code",
    "action": "check for missing await statements first",
    "evidence": "Root cause was missing await in 3 cases"
  },
  {
    "type": "user_feedback",
    "trigger": "when structuring components",
    "action": "prefer smaller, focused components",
    "evidence": "User feedback: 'keep components small'"
  }
]
```

## Applying Instincts

When starting a new phase, workers should:

1. **Check relevant instincts** for the task domain
2. **Apply high-confidence instincts** (≥0.7) automatically
3. **Consider moderate instincts** (0.5-0.7) as suggestions
4. **Log applications** for confidence tracking

Format in worker prompt:

```
--- COLONY INSTINCTS ---
Relevant learned patterns for this phase:

[0.9] testing: Always run tests before claiming completion
[0.7] architecture: Use composition over inheritance
[0.5] code-style: Prefer functional patterns (tentative)
```

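The selection rules above can be sketched as a formatter for that prompt block. The thresholds (omit below 0.5, mark 0.5-0.7 tentative) come from this section; the function name and dict fields are assumptions for illustration.

```python
def instinct_block(instincts, domains):
    """Render the COLONY INSTINCTS prompt section.

    Instincts below 0.5 confidence or outside the task domains are
    omitted; 0.5-0.7 are marked tentative; >=0.7 appear plainly.
    """
    lines = ["--- COLONY INSTINCTS ---",
             "Relevant learned patterns for this phase:", ""]
    for inst in sorted(instincts, key=lambda i: -i["confidence"]):
        if inst["domain"] not in domains or inst["confidence"] < 0.5:
            continue
        note = "" if inst["confidence"] >= 0.7 else " (tentative)"
        lines.append(f"[{inst['confidence']:.1f}] {inst['domain']}: {inst['action']}{note}")
    return "\n".join(lines)
```

Feeding it records shaped like the `memory.instincts` schema reproduces a block in the format shown above.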
## Reporting Learned Patterns

Prime Worker output should include:

```json
"learning": {
  "patterns_observed": [
    {
      "type": "success|error_resolution|user_feedback",
      "trigger": "when X",
      "action": "do Y",
      "evidence": "observation"
    }
  ],
  "instincts_applied": ["instinct_id_1", "instinct_id_2"],
  "instinct_outcomes": [
    {"id": "instinct_id_1", "success": true},
    {"id": "instinct_id_2", "success": false}
  ]
}
```

## Colony Memory Evolution

Over time, instincts with high confidence become colony-wide behaviors:

| Threshold | Evolution |
|-----------|-----------|
| 0.9+ with 5+ applications | Core behavior - always applied |
| 3+ related instincts | Cluster into skill document |
| Domain-specific cluster | Specialist worker enhancement |

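The first promotion rule in the table is a simple predicate; a sketch, with field names taken from the instinct schema:

```python
def is_core_behavior(instinct):
    """First row of the evolution table: 0.9+ confidence, 5+ applications."""
    return instinct["confidence"] >= 0.9 and instinct["applications"] >= 5

print(is_core_behavior({"confidence": 0.9, "applications": 5}))  # True
```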
## Integration Points

### /ant:build
- Workers receive relevant instincts in prompt
- Workers report patterns observed
- Workers log instinct applications

### /ant:continue
- Extract patterns from build results
- Create/update instincts
- Adjust confidence based on outcomes

### /ant:status
- Show learned instincts with confidence
- Show application statistics
- Show recent pattern discoveries

### /ant:feedback
- Creates user_feedback instinct
- High initial confidence (0.7)
- Immediate application to current work

## Privacy

- Instincts stay local in `.aether/data/COLONY_STATE.json`
- Only patterns extracted, not raw code
- Export capability for sharing (future)
package/runtime/planning.md ADDED
@@ -0,0 +1,159 @@
# Planning Discipline

## Purpose

Write comprehensive implementation plans with bite-sized tasks, assuming workers have zero context. Each step should be a single action, and each task 2-5 minutes of work.

## Core Principles

- **DRY** - Don't Repeat Yourself
- **YAGNI** - You Aren't Gonna Need It
- **TDD** - Test-Driven Development (test first, always)
- **Frequent Commits** - One commit per feature/fix

## Bite-Sized Task Granularity

**Each step is ONE action:**

```
"Write the failing test" - step
"Run it to make sure it fails" - step
"Implement the minimal code to make the test pass" - step
"Run the tests and make sure they pass" - step
"Commit" - step
```

**NOT acceptable:**
- "Implement authentication with tests" (too big)
- "Add validation" (too vague)
- "Set up the project" (needs breakdown)

## Task Structure

Each task should include:

### 1. Files Section

```markdown
**Files:**
- Create: `exact/path/to/file.py`
- Modify: `exact/path/to/existing.py:123-145`
- Test: `tests/exact/path/to/test.py`
```

Always use exact paths. No "somewhere in src/" ambiguity.

### 2. Steps with TDD Flow

```markdown
**Step 1: Write the failing test**

\`\`\`python
def test_specific_behavior():
    result = function(input)
    assert result == expected
\`\`\`

**Step 2: Run test to verify it fails**

Run: `pytest tests/path/test.py::test_name -v`
Expected: FAIL with "function not defined"

**Step 3: Write minimal implementation**

\`\`\`python
def function(input):
    return expected
\`\`\`

**Step 4: Run test to verify it passes**

Run: `pytest tests/path/test.py::test_name -v`
Expected: PASS

**Step 5: Commit**

\`\`\`bash
git add tests/path/test.py src/path/file.py
git commit -m "feat: add specific feature"
\`\`\`
```

### 3. Expected Output

For every command, specify expected output:

```markdown
Run: `npm test`
Expected: 47 passing, 0 failing
```

Not just "run tests" - specify what success looks like.

## Phase Structure

```markdown
### Phase N: [Component Name]

**Goal:** [One sentence describing what this phase achieves]

**Dependencies:** [What must exist before this phase]

**Tasks:**

#### Task N.1: [Specific action]

**Files:**
- Create: `src/components/Button.tsx`
- Test: `tests/components/Button.test.tsx`

**Steps:**
(TDD steps as above)

**Success Criteria:**
- Tests pass for Button component
- Component renders without errors
```

## Quality Checks

Before a plan is complete, verify:

1. **Exact paths** - Every file reference is a complete path
2. **Complete code** - No "add appropriate code" placeholders
3. **Expected outputs** - Every command has an expected result
4. **TDD flow** - Test before implementation for each feature
5. **No assumptions** - A worker could execute with zero context
6. **Commit points** - Clear where to commit

## Integration with Colony

The Route-Setter Ant follows this discipline when generating plans.

Each task in `plan.phases[N].tasks` should be:
- Executable in 2-5 minutes
- Self-contained with file paths
- Verifiable with specific success criteria

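These criteria can be checked mechanically. A lint sketch, assuming illustrative field names (`files`, `steps`) rather than the actual `plan.phases[N].tasks` schema:

```python
# Step texts this discipline forbids (see Red Flags below in the doc)
VAGUE_STEPS = {"add tests", "implement feature", "update file", "handle edge cases"}

def lint_task(task):
    """Flag tasks that break the discipline: missing paths, vague steps."""
    problems = []
    if not task.get("files"):
        problems.append("no exact file paths listed")
    for step in task.get("steps", []):
        if step.lower().strip(".") in VAGUE_STEPS:
            problems.append(f"vague step: {step!r}")
    return problems

print(lint_task({"files": [], "steps": ["Add tests", "Write the failing test"]}))
```

An empty result means the task at least names its files and avoids the known-vague step texts; it does not prove the steps are truly atomic.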
## Red Flags

**Never in a plan:**
- "Add tests" (which tests? for what?)
- "Implement feature" (what specifically?)
- "Update file" (which file? what changes?)
- "Handle edge cases" (which cases?)

**Always in a plan:**
- Exact file paths
- Complete code snippets
- Exact commands to run
- Expected outputs/results
- One action per step

## Why This Matters

- Workers can execute without asking questions
- Progress is measurable (step by step)
- Debugging is easier (clear steps to trace)
- Commits are atomic and meaningful
- Quality is built in (TDD from start)
package/runtime/tdd.md ADDED
@@ -0,0 +1,257 @@
# Test-Driven Development Discipline

## The Iron Law

```
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
```

If you didn't watch the test fail, you don't know if it tests the right thing.

Wrote code before the test? Delete it. Start over. No exceptions.

## When to Use

**Always:**
- New features
- Bug fixes
- Refactoring
- Behavior changes

**The only exceptions (must be explicit):**
- Throwaway prototypes (will be deleted)
- Generated/config code
- User explicitly opts out

Thinking "skip TDD just this once"? Stop. That's rationalization.

## Red-Green-Refactor Cycle

```
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│   RED ────────► VERIFY RED ────────► GREEN                  │
│   Write         (must fail           Write minimal          │
│   failing        correctly)          code to pass           │
│   test                                    │                 │
│                                           ▼                 │
│                                     VERIFY GREEN            │
│                                     (must pass,             │
│                                      all green)             │
│                                           │                 │
│   ◄───────────── REFACTOR ◄───────────────┘                 │
│   Next test      Clean up                                   │
│   (stay green)                                              │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

### RED - Write Failing Test

Write ONE minimal test showing desired behavior.

```typescript
// GOOD: Clear name, tests real behavior, one thing
test('rejects empty email with error message', async () => {
  const result = await submitForm({ email: '' });
  expect(result.error).toBe('Email required');
});

// BAD: Vague name, tests mock not code
test('validation works', async () => {
  const mock = jest.fn();
  // ...
});
```

**Requirements:**
- One behavior per test
- Clear descriptive name
- Real code (mocks only if unavoidable)

### VERIFY RED - Watch It Fail (MANDATORY)

```bash
npm test path/to/test.test.ts
```

Confirm:
- Test **fails** (not errors)
- Failure message is expected
- Fails because feature missing (not typos)

**Test passes immediately?** You're testing existing behavior. Fix the test.

**Test errors?** Fix the error, re-run until it fails correctly.

### GREEN - Minimal Code

Write the **simplest** code that passes the test.

```typescript
// GOOD: Just enough to pass
function validateEmail(email: string): string | null {
  if (!email?.trim()) return 'Email required';
  return null;
}

// BAD: Over-engineered (YAGNI)
function validateEmail(
  email: string,
  options?: { customMessage?: string; allowEmpty?: boolean }
): ValidationResult { /* ... */ }
```

Don't add features beyond the test. Don't refactor other code. Don't "improve."

### VERIFY GREEN - Watch It Pass (MANDATORY)

```bash
npm test path/to/test.test.ts
```

Confirm:
- Test passes
- Other tests still pass
- Output pristine (no errors, warnings)

**Test fails?** Fix code, not test.
**Other tests fail?** Fix now.

### REFACTOR - Clean Up

Only after green:
- Remove duplication
- Improve names
- Extract helpers

**Keep tests green. Don't add behavior.**

### REPEAT

Write the next failing test for the next behavior.

## Test Types

### Unit Tests
- Individual functions
- Component logic
- Pure functions
- < 50ms each

### Integration Tests
- API endpoints
- Database operations
- Service interactions

### E2E Tests
- Critical user flows
- Complete workflows
- Browser automation

## Coverage Requirements

- Minimum 80% coverage target
- All edge cases covered
- Error scenarios tested
- Boundary conditions verified

Verify with:

```bash
npm run test:coverage
```

## Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. The test takes 30 seconds. |
| "I'll test after" | Tests that pass immediately prove nothing. |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Unverified code is debt. |
| "Need to explore first" | Fine. Throw away the exploration, start with TDD. |
| "TDD will slow me down" | TDD is faster than debugging after the fact. |
| "Keep as reference" | You'll adapt it. That's testing after. Delete. |

## Red Flags - STOP and Start Over

If you catch yourself:
- Writing code before the test
- Test passes immediately (didn't fail first)
- Can't explain why the test failed
- Rationalizing "just this once"
- "I already manually tested it"
- "Keep as reference"
- "This is different because..."

**ALL mean: Delete the code. Start over with TDD.**

## Testing Anti-Patterns

| Anti-Pattern | Problem | Fix |
|--------------|---------|-----|
| Testing implementation | Breaks on refactor | Test behavior/output |
| Brittle selectors | `.css-class-xyz` | `[data-testid="x"]` |
| Test interdependence | Tests depend on order | Each test sets up its own data |
| Mock everything | Tests the mock, not the code | Real code, inject deps |
| One giant test | Can't isolate failure | One behavior per test |

## TDD Report Format

When reporting TDD work:

```
🧪 TDD Report
=============

Feature: {what was implemented}

Cycle 1:
  RED: test('rejects empty email')
  Verified: ✓ Failed with "undefined is not 'Email required'"
  GREEN: Added email validation
  Verified: ✓ Passes

Cycle 2:
  RED: test('rejects invalid format')
  Verified: ✓ Failed with expected message
  GREEN: Added regex check
  Verified: ✓ Passes

Coverage: 87% (target: 80%)
All tests: 47/47 passing
```

## Bug Fix with TDD

1. Write a failing test reproducing the bug
2. Verify it fails (proves the bug exists)
3. Fix the bug
4. Verify the test passes
5. The test now prevents regression

**Never fix bugs without a test.**

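The five steps, sketched in Python with an invented off-by-one pagination bug (the doc's code examples are TypeScript; `paginate` and the bug itself are hypothetical):

```python
def paginate(items, page, size):
    """Fixed version. The original (hypothetical) bug computed
    start = page * size for 1-based pages, so page 1 skipped items."""
    start = (page - 1) * size
    return items[start:start + size]

# Steps 1-2: this test reproduced the bug -- before the fix it failed,
# because paginate([1, 2, 3, 4], page=1, size=2) returned [3, 4]
def test_first_page_returns_first_items():
    assert paginate([1, 2, 3, 4], page=1, size=2) == [1, 2]

# Steps 3-5: with the fix applied the test passes and guards the regression
test_first_page_returns_first_items()
```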
## Integration with Debugging

When debugging reveals a root cause:
1. Write the test that would have caught it
2. Verify the test fails
3. Apply the fix
4. Verify the test passes

This ensures the bug never returns.

## Verification Checklist

Before marking work complete:

- [ ] Every new function has a test
- [ ] Watched each test fail before implementing
- [ ] Each test failed for the expected reason
- [ ] Wrote minimal code to pass
- [ ] All tests pass
- [ ] Coverage meets threshold
- [ ] No skipped/disabled tests

Can't check all boxes? You skipped TDD. Start over.