aether-colony 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

Files changed (38)
  1. package/.claude/commands/ant/ant.md +89 -0
  2. package/.claude/commands/ant/build.md +504 -0
  3. package/.claude/commands/ant/colonize.md +94 -0
  4. package/.claude/commands/ant/continue.md +674 -0
  5. package/.claude/commands/ant/feedback.md +65 -0
  6. package/.claude/commands/ant/flag.md +108 -0
  7. package/.claude/commands/ant/flags.md +139 -0
  8. package/.claude/commands/ant/focus.md +42 -0
  9. package/.claude/commands/ant/init.md +129 -0
  10. package/.claude/commands/ant/migrate-state.md +140 -0
  11. package/.claude/commands/ant/organize.md +210 -0
  12. package/.claude/commands/ant/pause-colony.md +87 -0
  13. package/.claude/commands/ant/phase.md +86 -0
  14. package/.claude/commands/ant/plan.md +409 -0
  15. package/.claude/commands/ant/redirect.md +53 -0
  16. package/.claude/commands/ant/resume-colony.md +83 -0
  17. package/.claude/commands/ant/status.md +122 -0
  18. package/.claude/commands/ant/watch.md +220 -0
  19. package/LICENSE +21 -0
  20. package/README.md +258 -0
  21. package/bin/cli.js +196 -0
  22. package/package.json +35 -0
  23. package/runtime/DISCIPLINES.md +93 -0
  24. package/runtime/QUEEN_ANT_ARCHITECTURE.md +347 -0
  25. package/runtime/aether-utils.sh +760 -0
  26. package/runtime/coding-standards.md +197 -0
  27. package/runtime/debugging.md +207 -0
  28. package/runtime/docs/pheromones.md +213 -0
  29. package/runtime/learning.md +254 -0
  30. package/runtime/planning.md +159 -0
  31. package/runtime/tdd.md +257 -0
  32. package/runtime/utils/atomic-write.sh +213 -0
  33. package/runtime/utils/colorize-log.sh +132 -0
  34. package/runtime/utils/file-lock.sh +122 -0
  35. package/runtime/utils/watch-spawn-tree.sh +185 -0
  36. package/runtime/verification-loop.md +159 -0
  37. package/runtime/verification.md +116 -0
  38. package/runtime/workers.md +671 -0
package/runtime/learning.md ADDED
@@ -0,0 +1,254 @@
# Colony Learning Discipline

## Overview

The colony learns from every phase. Patterns observed during builds become **instincts** - atomic learned behaviors with confidence scoring that improve future work.

## The Instinct Model

An instinct is a small learned behavior:

```yaml
id: prefer-composition
trigger: "when designing component architecture"
confidence: 0.7
domain: "architecture"
source: "phase-3-observation"
evidence:
  - "Composition pattern succeeded in Phase 3"
  - "User approved component structure"
action: "Use composition over inheritance for component reuse"
```

**Properties:**
- **Atomic** — one trigger, one action
- **Confidence-weighted** — 0.3 = tentative, 0.9 = near certain
- **Domain-tagged** — architecture, testing, code-style, debugging, workflow
- **Evidence-backed** — tracks the observations that created it

## Confidence Scoring

| Score | Meaning | Behavior |
|-------|---------|----------|
| 0.3 | Tentative | Suggested but not enforced |
| 0.5 | Moderate | Applied when relevant |
| 0.7 | Strong | Auto-applied in matching contexts |
| 0.9 | Near-certain | Core colony behavior |

**Confidence increases when:**
- Pattern is repeatedly observed across phases
- Pattern leads to successful outcomes
- No corrections or rework needed

**Confidence decreases when:**
- Pattern leads to errors or rework
- User provides negative feedback
- Contradicting evidence appears

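The adjustment rules above can be sketched as a small helper. This is an illustrative sketch, not part of the colony runtime: the function name, the 0.1 step, and the clamp to [0.0, 1.0] are assumptions.

```python
def adjust_confidence(confidence: float, success: bool, step: float = 0.1) -> float:
    """Nudge an instinct's confidence up on success, down on failure.

    The 0.1 step follows the lifecycle described in this document; the
    clamp to [0.0, 1.0] is an assumption to keep scores in range.
    """
    delta = step if success else -step
    return max(0.0, min(1.0, round(confidence + delta, 2)))

print(adjust_confidence(0.5, True))   # 0.6
print(adjust_confidence(0.3, False))  # 0.2
```

Repeated successes walk a tentative 0.3 instinct up toward the 0.9 core-behavior band; a single failure walks it back down.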
## Pattern Detection

During each phase, observe for:

### 1. Success Patterns
What worked well:
- Approaches that completed without issues
- Code structures that passed all tests
- Workflows that were efficient

### 2. Error Resolutions
What was learned from debugging:
- Root causes discovered
- Fixes that worked
- Architectural insights

### 3. User Corrections
What the user redirected:
- Feedback via `/ant:feedback`
- Redirects via `/ant:redirect`
- Explicit corrections

### 4. Tool Preferences
What tools/patterns were effective:
- Testing approaches that caught bugs
- File organization that worked
- Command sequences that succeeded

## Instinct Structure

Store in `memory.instincts`:

```json
{
  "instincts": [
    {
      "id": "instinct_<timestamp>",
      "trigger": "when X",
      "action": "do Y",
      "confidence": 0.5,
      "domain": "testing",
      "source": "phase-2",
      "evidence": ["observation 1", "observation 2"],
      "created_at": "<ISO-8601>",
      "last_applied": null,
      "applications": 0,
      "successes": 0
    }
  ]
}
```

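A minimal Python sketch of building a record with this shape. The helper name `new_instinct` and its defaults are hypothetical; only the field names come from the schema above.

```python
import json
import time

def new_instinct(trigger, action, domain, source, evidence, confidence=0.5):
    """Build a dict matching the memory.instincts schema above."""
    return {
        "id": f"instinct_{int(time.time())}",       # timestamp-based id, per the schema
        "trigger": trigger,
        "action": action,
        "confidence": confidence,
        "domain": domain,
        "source": source,
        "evidence": list(evidence),
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "last_applied": None,
        "applications": 0,
        "successes": 0,
    }

record = new_instinct("when X", "do Y", "testing", "phase-2", ["observation 1"])
print(json.dumps({"instincts": [record]}, indent=2))
```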
## Instinct Lifecycle

```
Observation
     │
     ▼
┌─────────────────────────────┐
│   Pattern Detected          │
│   (success/error/feedback)  │
└─────────────────────────────┘
     │
     ▼
┌─────────────────────────────┐
│   Instinct Created          │
│   (confidence: 0.3-0.5)     │
└─────────────────────────────┘
     │ Applied in future phases
     ▼
┌─────────────────────────────┐
│   Confidence Adjusted       │
│   (+0.1 success, -0.1 fail) │
└─────────────────────────────┘
     │ Reaches 0.9
     ▼
┌─────────────────────────────┐
│   Core Behavior             │
│   (always applied)          │
└─────────────────────────────┘
```

## Extracting Instincts

After each phase, the Prime Worker should identify:

1. **What patterns led to success?**
   - Approach taken
   - Tools used effectively
   - Code structures that worked

2. **What was learned from errors?**
   - Root causes found
   - Fixes that worked
   - What to avoid

3. **What feedback was received?**
   - User corrections
   - Focus areas emphasized
   - Patterns to avoid

Format for extraction:

```json
"patterns_observed": [
  {
    "type": "success",
    "trigger": "when implementing API endpoints",
    "action": "use repository pattern with dependency injection",
    "evidence": "All endpoint tests passed first try"
  },
  {
    "type": "error_resolution",
    "trigger": "when debugging async code",
    "action": "check for missing await statements first",
    "evidence": "Root cause was missing await in 3 cases"
  },
  {
    "type": "user_feedback",
    "trigger": "when structuring components",
    "action": "prefer smaller, focused components",
    "evidence": "User feedback: 'keep components small'"
  }
]
```

## Applying Instincts

When starting a new phase, workers should:

1. **Check relevant instincts** for the task domain
2. **Apply high-confidence instincts** (≥0.7) automatically
3. **Consider moderate instincts** (0.5-0.7) as suggestions
4. **Log applications** for confidence tracking

Format in worker prompt:

```
--- COLONY INSTINCTS ---
Relevant learned patterns for this phase:

[0.9] testing: Always run tests before claiming completion
[0.7] architecture: Use composition over inheritance
[0.5] code-style: Prefer functional patterns (tentative)
```

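The selection rules above can be sketched as a formatter for that prompt block. The thresholds (omit below 0.5, mark 0.5-0.7 tentative) come from this section; the function name and dict fields are assumptions for illustration.

```python
def instinct_block(instincts, domains):
    """Render the COLONY INSTINCTS prompt section.

    Instincts below 0.5 confidence or outside the task domains are
    omitted; 0.5-0.7 are marked tentative; >=0.7 appear plainly.
    """
    lines = ["--- COLONY INSTINCTS ---",
             "Relevant learned patterns for this phase:", ""]
    for inst in sorted(instincts, key=lambda i: -i["confidence"]):
        if inst["domain"] not in domains or inst["confidence"] < 0.5:
            continue
        note = "" if inst["confidence"] >= 0.7 else " (tentative)"
        lines.append(f"[{inst['confidence']:.1f}] {inst['domain']}: {inst['action']}{note}")
    return "\n".join(lines)
```

Feeding it records shaped like the `memory.instincts` schema reproduces a block in the format shown above.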
## Reporting Learned Patterns

Prime Worker output should include:

```json
"learning": {
  "patterns_observed": [
    {
      "type": "success|error_resolution|user_feedback",
      "trigger": "when X",
      "action": "do Y",
      "evidence": "observation"
    }
  ],
  "instincts_applied": ["instinct_id_1", "instinct_id_2"],
  "instinct_outcomes": [
    {"id": "instinct_id_1", "success": true},
    {"id": "instinct_id_2", "success": false}
  ]
}
```

## Colony Memory Evolution

Over time, instincts with high confidence become colony-wide behaviors:

| Threshold | Evolution |
|-----------|-----------|
| 0.9+ with 5+ applications | Core behavior - always applied |
| 3+ related instincts | Cluster into skill document |
| Domain-specific cluster | Specialist worker enhancement |

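The first promotion rule in the table is a simple predicate; a sketch, with field names taken from the instinct schema:

```python
def is_core_behavior(instinct):
    """First row of the evolution table: 0.9+ confidence, 5+ applications."""
    return instinct["confidence"] >= 0.9 and instinct["applications"] >= 5

print(is_core_behavior({"confidence": 0.9, "applications": 5}))  # True
```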
## Integration Points

### /ant:build
- Workers receive relevant instincts in prompt
- Workers report patterns observed
- Workers log instinct applications

### /ant:continue
- Extract patterns from build results
- Create/update instincts
- Adjust confidence based on outcomes

### /ant:status
- Show learned instincts with confidence
- Show application statistics
- Show recent pattern discoveries

### /ant:feedback
- Creates user_feedback instinct
- High initial confidence (0.7)
- Immediate application to current work

## Privacy

- Instincts stay local in `.aether/data/COLONY_STATE.json`
- Only patterns extracted, not raw code
- Export capability for sharing (future)
package/runtime/planning.md ADDED
@@ -0,0 +1,159 @@
# Planning Discipline

## Purpose

Write comprehensive implementation plans with bite-sized tasks, assuming workers have zero context. Each step should be a single action, and each task 2-5 minutes of work.

## Core Principles

- **DRY** - Don't Repeat Yourself
- **YAGNI** - You Aren't Gonna Need It
- **TDD** - Test-Driven Development (test first, always)
- **Frequent Commits** - One commit per feature/fix

## Bite-Sized Task Granularity

**Each step is ONE action:**

```
"Write the failing test" - step
"Run it to make sure it fails" - step
"Implement the minimal code to make the test pass" - step
"Run the tests and make sure they pass" - step
"Commit" - step
```

**NOT acceptable:**
- "Implement authentication with tests" (too big)
- "Add validation" (too vague)
- "Set up the project" (needs breakdown)

## Task Structure

Each task should include:

### 1. Files Section

```markdown
**Files:**
- Create: `exact/path/to/file.py`
- Modify: `exact/path/to/existing.py:123-145`
- Test: `tests/exact/path/to/test.py`
```

Always use exact paths. No "somewhere in src/" ambiguity.

### 2. Steps with TDD Flow

```markdown
**Step 1: Write the failing test**

\`\`\`python
def test_specific_behavior():
    result = function(input)
    assert result == expected
\`\`\`

**Step 2: Run test to verify it fails**

Run: `pytest tests/path/test.py::test_name -v`
Expected: FAIL with "function not defined"

**Step 3: Write minimal implementation**

\`\`\`python
def function(input):
    return expected
\`\`\`

**Step 4: Run test to verify it passes**

Run: `pytest tests/path/test.py::test_name -v`
Expected: PASS

**Step 5: Commit**

\`\`\`bash
git add tests/path/test.py src/path/file.py
git commit -m "feat: add specific feature"
\`\`\`
```

### 3. Expected Output

For every command, specify expected output:

```markdown
Run: `npm test`
Expected: 47 passing, 0 failing
```

Not just "run tests" - specify what success looks like.

## Phase Structure

```markdown
### Phase N: [Component Name]

**Goal:** [One sentence describing what this phase achieves]

**Dependencies:** [What must exist before this phase]

**Tasks:**

#### Task N.1: [Specific action]

**Files:**
- Create: `src/components/Button.tsx`
- Test: `tests/components/Button.test.tsx`

**Steps:**
(TDD steps as above)

**Success Criteria:**
- Tests pass for Button component
- Component renders without errors
```

## Quality Checks

Before a plan is complete, verify:

1. **Exact paths** - Every file reference is a complete path
2. **Complete code** - No "add appropriate code" placeholders
3. **Expected outputs** - Every command has an expected result
4. **TDD flow** - Test before implementation for each feature
5. **No assumptions** - A worker could execute with zero context
6. **Commit points** - Clear where to commit

## Integration with Colony

The Route-Setter Ant follows this discipline when generating plans.

Each task in `plan.phases[N].tasks` should be:
- Executable in 2-5 minutes
- Self-contained with file paths
- Verifiable with specific success criteria

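These criteria can be checked mechanically. A lint sketch, assuming illustrative field names (`files`, `steps`) rather than the actual `plan.phases[N].tasks` schema:

```python
# Step texts this discipline forbids (see Red Flags below in the doc)
VAGUE_STEPS = {"add tests", "implement feature", "update file", "handle edge cases"}

def lint_task(task):
    """Flag tasks that break the discipline: missing paths, vague steps."""
    problems = []
    if not task.get("files"):
        problems.append("no exact file paths listed")
    for step in task.get("steps", []):
        if step.lower().strip(".") in VAGUE_STEPS:
            problems.append(f"vague step: {step!r}")
    return problems

print(lint_task({"files": [], "steps": ["Add tests", "Write the failing test"]}))
```

An empty result means the task at least names its files and avoids the known-vague step texts; it does not prove the steps are truly atomic.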
## Red Flags

**Never in a plan:**
- "Add tests" (which tests? for what?)
- "Implement feature" (what specifically?)
- "Update file" (which file? what changes?)
- "Handle edge cases" (which cases?)

**Always in a plan:**
- Exact file paths
- Complete code snippets
- Exact commands to run
- Expected outputs/results
- One action per step

## Why This Matters

- Workers can execute without asking questions
- Progress is measurable (step by step)
- Debugging is easier (clear steps to trace)
- Commits are atomic and meaningful
- Quality is built in (TDD from start)
package/runtime/tdd.md ADDED
@@ -0,0 +1,257 @@
# Test-Driven Development Discipline

## The Iron Law

```
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
```

If you didn't watch the test fail, you don't know if it tests the right thing.

Wrote code before the test? Delete it. Start over. No exceptions.

## When to Use

**Always:**
- New features
- Bug fixes
- Refactoring
- Behavior changes

**The only exceptions (must be explicit):**
- Throwaway prototypes (will be deleted)
- Generated/config code
- User explicitly opts out

Thinking "skip TDD just this once"? Stop. That's rationalization.

## Red-Green-Refactor Cycle

```
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│   RED ────────► VERIFY RED ────────► GREEN                  │
│   Write         (must fail           Write minimal          │
│   failing        correctly)          code to pass           │
│   test                                    │                 │
│                                           ▼                 │
│                                     VERIFY GREEN            │
│                                     (must pass,             │
│                                      all green)             │
│                                           │                 │
│   ◄───────────── REFACTOR ◄───────────────┘                 │
│   Next test      Clean up                                   │
│   (stay green)                                              │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

### RED - Write Failing Test

Write ONE minimal test showing desired behavior.

```typescript
// GOOD: Clear name, tests real behavior, one thing
test('rejects empty email with error message', async () => {
  const result = await submitForm({ email: '' });
  expect(result.error).toBe('Email required');
});

// BAD: Vague name, tests mock not code
test('validation works', async () => {
  const mock = jest.fn();
  // ...
});
```

**Requirements:**
- One behavior per test
- Clear descriptive name
- Real code (mocks only if unavoidable)

### VERIFY RED - Watch It Fail (MANDATORY)

```bash
npm test path/to/test.test.ts
```

Confirm:
- Test **fails** (not errors)
- Failure message is expected
- Fails because feature missing (not typos)

**Test passes immediately?** You're testing existing behavior. Fix the test.

**Test errors?** Fix the error, re-run until it fails correctly.

### GREEN - Minimal Code

Write the **simplest** code that passes the test.

```typescript
// GOOD: Just enough to pass
function validateEmail(email: string): string | null {
  if (!email?.trim()) return 'Email required';
  return null;
}

// BAD: Over-engineered (YAGNI)
function validateEmail(
  email: string,
  options?: { customMessage?: string; allowEmpty?: boolean }
): ValidationResult { /* ... */ }
```

Don't add features beyond the test. Don't refactor other code. Don't "improve."

### VERIFY GREEN - Watch It Pass (MANDATORY)

```bash
npm test path/to/test.test.ts
```

Confirm:
- Test passes
- Other tests still pass
- Output pristine (no errors, warnings)

**Test fails?** Fix code, not test.
**Other tests fail?** Fix now.

### REFACTOR - Clean Up

Only after green:
- Remove duplication
- Improve names
- Extract helpers

**Keep tests green. Don't add behavior.**

### REPEAT

Write the next failing test for the next behavior.

## Test Types

### Unit Tests
- Individual functions
- Component logic
- Pure functions
- < 50ms each

### Integration Tests
- API endpoints
- Database operations
- Service interactions

### E2E Tests
- Critical user flows
- Complete workflows
- Browser automation

## Coverage Requirements

- Minimum 80% coverage target
- All edge cases covered
- Error scenarios tested
- Boundary conditions verified

Verify with:

```bash
npm run test:coverage
```

## Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. The test takes 30 seconds. |
| "I'll test after" | Tests that pass immediately prove nothing. |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Unverified code is debt. |
| "Need to explore first" | Fine. Throw away the exploration, start with TDD. |
| "TDD will slow me down" | TDD is faster than debugging after the fact. |
| "Keep as reference" | You'll adapt it. That's testing after. Delete. |

## Red Flags - STOP and Start Over

If you catch yourself:
- Writing code before the test
- Test passes immediately (didn't fail first)
- Can't explain why the test failed
- Rationalizing "just this once"
- "I already manually tested it"
- "Keep as reference"
- "This is different because..."

**ALL mean: Delete the code. Start over with TDD.**

## Testing Anti-Patterns

| Anti-Pattern | Problem | Fix |
|--------------|---------|-----|
| Testing implementation | Breaks on refactor | Test behavior/output |
| Brittle selectors | `.css-class-xyz` | `[data-testid="x"]` |
| Test interdependence | Tests depend on order | Each test sets up its own data |
| Mock everything | Tests the mock, not the code | Real code, inject deps |
| One giant test | Can't isolate failure | One behavior per test |

## TDD Report Format

When reporting TDD work:

```
🧪 TDD Report
=============

Feature: {what was implemented}

Cycle 1:
  RED: test('rejects empty email')
  Verified: ✓ Failed with "undefined is not 'Email required'"
  GREEN: Added email validation
  Verified: ✓ Passes

Cycle 2:
  RED: test('rejects invalid format')
  Verified: ✓ Failed with expected message
  GREEN: Added regex check
  Verified: ✓ Passes

Coverage: 87% (target: 80%)
All tests: 47/47 passing
```

## Bug Fix with TDD

1. Write a failing test reproducing the bug
2. Verify it fails (proves the bug exists)
3. Fix the bug
4. Verify the test passes
5. The test now prevents regression

**Never fix bugs without a test.**

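The five steps, sketched in Python with an invented off-by-one pagination bug (the doc's code examples are TypeScript; `paginate` and the bug itself are hypothetical):

```python
def paginate(items, page, size):
    """Fixed version. The original (hypothetical) bug computed
    start = page * size for 1-based pages, so page 1 skipped items."""
    start = (page - 1) * size
    return items[start:start + size]

# Steps 1-2: this test reproduced the bug -- before the fix it failed,
# because paginate([1, 2, 3, 4], page=1, size=2) returned [3, 4]
def test_first_page_returns_first_items():
    assert paginate([1, 2, 3, 4], page=1, size=2) == [1, 2]

# Steps 3-5: with the fix applied the test passes and guards the regression
test_first_page_returns_first_items()
```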
## Integration with Debugging

When debugging reveals a root cause:
1. Write the test that would have caught it
2. Verify the test fails
3. Apply the fix
4. Verify the test passes

This ensures the bug never returns.

## Verification Checklist

Before marking work complete:

- [ ] Every new function has a test
- [ ] Watched each test fail before implementing
- [ ] Each test failed for the expected reason
- [ ] Wrote minimal code to pass
- [ ] All tests pass
- [ ] Coverage meets threshold
- [ ] No skipped/disabled tests

Can't check all boxes? You skipped TDD. Start over.