compound-agent 1.5.0 → 1.6.1
- package/CHANGELOG.md +87 -1
- package/README.md +57 -44
- package/dist/cli.js +3652 -2593
- package/dist/cli.js.map +1 -1
- package/dist/index.d.ts +149 -15
- package/dist/index.js +2001 -1374
- package/dist/index.js.map +1 -1
- package/docs/research/spec_design/decision_theory_specifications_and_multi_criteria_tradeoffs.md +0 -0
- package/docs/research/spec_design/design_by_contract.md +251 -0
- package/docs/research/spec_design/domain_driven_design_strategic_modeling.md +183 -0
- package/docs/research/spec_design/formal_specification_methods.md +161 -0
- package/docs/research/spec_design/logic_and_proof_theory_under_the_curry_howard_correspondence.md +250 -0
- package/docs/research/spec_design/natural_language_formal_semantics_abuguity_in_specifications.md +259 -0
- package/docs/research/spec_design/requirements_engineering.md +234 -0
- package/docs/research/spec_design/systems_engineering_specifications_emergent_behavior_interface_contracts.md +149 -0
- package/docs/research/spec_design/what_is_this_about.md +305 -0
- package/llms.txt +1 -1
- package/package.json +1 -1
|
@@ -0,0 +1,305 @@
|
|
|
1
|
+
# The Big Picture: What Are These Papers About?
|
|
2
|
+
|
|
3
|
+
These 8 papers form a coherent body of research around one central question:
|
|
4
|
+
|
|
5
|
+
How do we precisely describe what software systems should do, and why is that so hard?
|
|
6
|
+
|
|
7
|
+
They attack this question from different angles. Here's how they fit together:
|
|
8
|
+
|
|
9
|
+
THE SPECIFICATION LANDSCAPE
|
|
10
|
+
|
|
11
|
+
┌─────────────────────────────────────────────────────────┐
│                                                         │
│    PROBLEM SPACE                SOLUTION SPACE          │
│  (What do we need?)         (How do we express it?)     │
│                                                         │
│   ┌──────────────┐           ┌───────────────────┐      │
│   │ Requirements │──────────>│    Formal Spec    │      │
│   │ Engineering  │           │      Methods      │      │
│   └──────┬───────┘           │ (TLA+, Alloy, Z)  │      │
│          │                   └─────────┬─────────┘      │
│          │                             │                │
│   ┌──────▼───────┐           ┌─────────▼─────────┐      │
│   │ Domain-Driven│           │     Design by     │      │
│   │    Design    │──────────>│     Contract      │      │
│   └──────┬───────┘           └─────────┬─────────┘      │
│          │                             │                │
│          │        WHY IT'S HARD        │                │
│          │  ┌────────────────┐         │                │
│          └──│  NL Ambiguity  │─────────┘                │
│             └────────────────┘                          │
│                                                         │
│  DEEPER FOUNDATIONS                AT SCALE             │
│   ┌──────────────┐           ┌───────────────────┐      │
│   │ Curry-Howard │           │   Systems Eng.    │      │
│   │  (proofs =   │           │     (emergent     │      │
│   │  programs)   │           │     behavior)     │      │
│   └──────────────┘           └───────────────────┘      │
│                                                         │
│                WHEN OBJECTIVES CONFLICT                 │
│        ┌──────────────────────────────────────┐         │
│        │   Decision Theory & Multi-Criteria   │         │
│        │              Trade-offs              │         │
│        └──────────────────────────────────────┘         │
└─────────────────────────────────────────────────────────┘
|
|
45
|
+
|
|
46
|
+
---
|
|
47
|
+
# Paper by Paper: Key Concepts Explained
|
|
48
|
+
|
|
49
|
+
## 1. Requirements Engineering -- "What do we actually need?"
|
|
50
|
+
|
|
51
|
+
This is the starting point of any software project. Requirements Engineering (RE) is about discovering, writing down, and maintaining what a system should do.
|
|
52
|
+
|
|
53
|
+
Core insight: RE isn't one method -- it's 8 families of approaches, each with trade-offs:
|
|
54
|
+
|
|
55
|
+
Accessibility                             Precision
(anyone can                               (mathematically
understand)                               rigorous)
     │                                         │
     ▼                                         ▼
┌──────────┐  ┌──────────┐  ┌──────────┐  ┌─────────┐
│Interviews│  │Templates │  │Goal      │  │Formal   │
│Scenarios │  │Standards │  │Models    │  │Methods  │
│Stories   │  │Patterns  │  │Use Cases │  │Proofs   │
└──────────┘  └──────────┘  └──────────┘  └─────────┘
Easy to       Scalable      Systematic    Provably
do, but       but rigid     but costly    correct but
imprecise                                 expert-only
|
|
68
|
+
|
|
69
|
+
Key takeaway: No single approach dominates. Teams combine methods based on risk. Safety-critical systems need formal methods; startups need lightweight stories. The pain points remain: incomplete
|
|
70
|
+
requirements, communication flaws, and volatile requirements.
|
|
71
|
+
|
|
72
|
+
---
|
|
73
|
+
## 2. Natural Language & Ambiguity -- "Why plain English specs fail"
|
|
74
|
+
|
|
75
|
+
This paper explains why writing specs is fundamentally hard. It's not a process problem -- it's a linguistic one. Natural language is inherently ambiguous.
|
|
76
|
+
|
|
77
|
+
The 9 sources of trouble:
|
|
78
|
+
|
|
79
|
+
┌──────────────────────┬───────────────────────────────────────────────────────────────┐
|
|
80
|
+
│ Source │ Example in a Spec │
|
|
81
|
+
├──────────────────────┼───────────────────────────────────────────────────────────────┤
|
|
82
|
+
│ Lexical ambiguity │ "The system shall handle errors" (log? retry? ignore?) │
|
|
83
|
+
├──────────────────────┼───────────────────────────────────────────────────────────────┤
|
|
84
|
+
│ Structural ambiguity │ "Every user reads some report" (same report? different ones?) │
|
|
85
|
+
├──────────────────────┼───────────────────────────────────────────────────────────────┤
|
|
86
|
+
│ Vagueness │ "The system shall be fast" (how fast?) │
|
|
87
|
+
├──────────────────────┼───────────────────────────────────────────────────────────────┤
|
|
88
|
+
│ Indexicals │ "The current user" (current when?) │
|
|
89
|
+
├──────────────────────┼───────────────────────────────────────────────────────────────┤
|
|
90
|
+
│ Presupposition │ "Resume the process" (assumes it was running) │
|
|
91
|
+
├──────────────────────┼───────────────────────────────────────────────────────────────┤
|
|
92
|
+
│ Implicature │ "Some tests passed" (implies not all) │
|
|
93
|
+
├──────────────────────┼───────────────────────────────────────────────────────────────┤
|
|
94
|
+
│ Speech acts │ "The system should..." (obligation? suggestion?) │
|
|
95
|
+
├──────────────────────┼───────────────────────────────────────────────────────────────┤
|
|
96
|
+
│ Context-dependence │ Meaning shifts across teams/domains │
|
|
97
|
+
├──────────────────────┼───────────────────────────────────────────────────────────────┤
|
|
98
|
+
│ Underspecification │ Deliberately leaving details open │
|
|
99
|
+
└──────────────────────┴───────────────────────────────────────────────────────────────┘
|
|
100
|
+
|
|
101
|
+
Key takeaway: Ambiguity isn't sloppy writing. It's built into how language works. You can't eliminate it -- you can only manage it by choosing the right level of formality for your context.
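
The structural-ambiguity row in the table above can be made concrete. As a sketch, write $U$ for the set of users, $R$ for the set of reports, and $\mathrm{reads}(u, r)$ for "user $u$ reads report $r$" (all three are notation introduced here for illustration, not taken from the paper). The sentence "Every user reads some report" then admits two non-equivalent formalizations:

```latex
\begin{align*}
\text{(each user may read a different report)} \quad
  & \forall u \in U.\ \exists r \in R.\ \mathrm{reads}(u, r) \\
\text{(one report is read by everyone)} \quad
  & \exists r \in R.\ \forall u \in U.\ \mathrm{reads}(u, r)
\end{align*}
```

The second reading implies the first but not the other way around, so an implementation built against the weaker reading can satisfy the prose while violating what the author may have meant.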
|
|
102
|
+
|
|
103
|
+
---
|
|
104
|
+
## 3. Domain-Driven Design -- "Speak the same language"
|
|
105
|
+
|
|
106
|
+
DDD tackles the gap between how domain experts think and how developers code. The thesis: projects fail not from bad specs, but from specs that encode a wrong understanding of the domain.
|
|
107
|
+
|
|
108
|
+
Three pillars:
|
|
109
|
+
|
|
110
|
+
┌──────────────────────────────────────────────┐
│             UBIQUITOUS LANGUAGE              │
│  A shared vocabulary used by EVERYONE:       │
│  domain experts, devs, docs, and code        │
│                                              │
│  "Policy" means the same thing in meetings,  │
│  in Jira tickets, AND in class names         │
└────────────────────┬─────────────────────────┘
                     │
          ┌──────────┴──────────┐
          ▼                     ▼
 ┌─────────────────┐  ┌───────────────────┐
 │ BOUNDED CONTEXTS│  │ SUBDOMAIN         │
 │                 │  │ CLASSIFICATION    │
 │ A boundary where│  │                   │
 │ one model and   │  │ Core: your edge   │
 │ one language    │  │ Supporting: needed│
 │ stay consistent │  │ Generic: buy it   │
 └─────────────────┘  └───────────────────┘
|
|
129
|
+
|
|
130
|
+
Why this matters for specs: If your "Order" means one thing in billing and another in shipping, your specification is already broken before you write a single line of formal logic. DDD ensures the conceptual foundation is right before you formalize anything.
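
The "Order" example can be sketched in code. This is a minimal illustration, assuming two hypothetical bounded contexts (billing and shipping); the class names, fields, and the `to_shipping` translation function are invented here and do not come from the paper. Each context keeps its own model of an order, and the translation between them is explicit rather than implied by a shared class.

```python
from dataclasses import dataclass


# Hypothetical billing context: an "Order" is something to be paid for.
@dataclass(frozen=True)
class BillingOrder:
    order_id: str
    amount_due_cents: int
    paid: bool = False


# Hypothetical shipping context: an "Order" is something to be delivered.
@dataclass(frozen=True)
class ShippingOrder:
    order_id: str
    address: str
    weight_grams: int


def to_shipping(order: BillingOrder, address: str, weight_grams: int) -> ShippingOrder:
    """Explicit translation at the context boundary; only paid orders may cross."""
    if not order.paid:
        raise ValueError(f"order {order.order_id} is not paid yet")
    return ShippingOrder(order.order_id, address, weight_grams)


paid = BillingOrder("o-1", 4200, paid=True)
shipment = to_shipping(paid, "1 Example Street", 350)
```

Keeping two small models with a visible translation step means a change to what "Order" means in billing cannot silently change its meaning in shipping.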
|
|
131
|
+
|
|
132
|
+
---
|
|
133
|
+
## 4. Formal Specification Methods -- "Math instead of prose"
|
|
134
|
+
|
|
135
|
+
Once you understand the domain, how do you express behavior precisely? These four methods use mathematics:
|
|
136
|
+
|
|
137
|
+
                BEHAVIORAL              STRUCTURAL
               (what happens            (what exists
               over time)               and relates)
                     │                       │
     ┌───────────────┤            ┌──────────┤
     ▼               ▼            ▼          ▼
┌─────────┐     ┌─────────┐  ┌─────────┐  ┌──────┐
│  TLA+   │     │   VDM   │  │  Alloy  │  │  Z   │
│         │     │         │  │         │  │      │
│Temporal │     │Stepwise │  │SAT-based│  │Schema│
│logic +  │     │refine-  │  │bounded  │  │calc- │
│set      │     │ment     │  │analysis │  │ulus  │
│theory   │     │         │  │         │  │      │
└─────────┘     └─────────┘  └─────────┘  └──────┘

Concurrent      Abstract →   "Show me a   Modular
systems,        Concrete,    counter-     math specs
safety &        proof-driven example"     for data-
liveness                     exploration  intensive
                                          systems
|
|
157
|
+
|
|
158
|
+
The core trade-off: Automation vs. Expressiveness
|
|
159
|
+
|
|
160
|
+
- Alloy: Fully automatic (SAT solver finds bugs for you), but only checks bounded scopes
|
|
161
|
+
- TLA+: Rich temporal reasoning, but model checking hits state explosion
|
|
162
|
+
- Z/VDM: Expressive and modular, but proof effort is manual
|
|
163
|
+
|
|
164
|
+
Key takeaway: Each lives in a different sweet spot. Use Alloy for fast "what-if" exploration, TLA+ for concurrent algorithm verification, Z/VDM for data-heavy systems with refinement paths.
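
To give a feel for what "only checks bounded scopes" means, here is a minimal sketch of bounded exhaustive checking in plain Python. It is a brute-force stand-in for the idea, not Alloy and not a SAT solver; the `transfer` model and the property being checked are hypothetical examples chosen for illustration.

```python
from itertools import product


def transfer(src: int, dst: int, amount: int) -> tuple[int, int]:
    """Hypothetical model under analysis: moves `amount` with no funds check."""
    return src - amount, dst + amount


def find_counterexample(scope: int):
    """Exhaustively check 'no balance goes negative' for all values in 0..scope."""
    values = range(scope + 1)
    for src, dst, amount in product(values, repeat=3):
        new_src, new_dst = transfer(src, dst, amount)
        if new_src < 0 or new_dst < 0:
            return {"src": src, "dst": dst, "amount": amount}
    return None  # no violation within the bound; says nothing beyond it


counterexample = find_counterexample(scope=3)
```

A small scope is often enough to surface a design flaw, which is the bet bounded analysis makes. The flip side is visible in the `None` branch: finding nothing within the bound is not a proof that the property holds in general.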
|
|
165
|
+
|
|
166
|
+
---
|
|
167
|
+
## 5. Design by Contract -- "Agreements between code components"
|
|
168
|
+
|
|
169
|
+
DbC makes behavior explicit at the code level through three elements:
|
|
170
|
+
|
|
171
|
+
   CALLER                        CALLEE
   ──────                        ──────

"I promise the input      "I promise the output
 is valid"                 is correct"
      │                           │
      ▼                           ▼
┌────────────┐    call    ┌────────────────┐
│PRECONDITION│───────────>│ POSTCONDITION  │
│            │            │                │
│   x > 0    │            │  result = √x   │
└────────────┘            └────────────────┘
                     │
              ┌──────▼──────┐
              │  INVARIANT  │
              │   (always   │
              │    true)    │
              │ balance >= 0│
              └─────────────┘
|
|
190
|
+
|
|
191
|
+
The paper covers four levels of contracts:
|
|
192
|
+
|
|
193
|
+
1. Code-level DbC (Eiffel, JML): Pre/post/invariants in the language itself
|
|
194
|
+
2. Behavioral subtyping: Subtypes must honor parent contracts (Liskov substitution principle)
|
|
195
|
+
3. Example-based specs (BDD/Gherkin): Given-When-Then scenarios as executable contracts
|
|
196
|
+
4. Component contracts: Assume-guarantee reasoning at architecture level
|
|
197
|
+
|
|
198
|
+
Key takeaway: Contracts work at every scale -- from a single function to an entire system architecture. The trade-off is always formality vs. accessibility: formal contracts enable proofs, BDD scenarios enable communication with stakeholders.
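
The three elements can be sketched with plain assertions. This is a minimal illustration in Python rather than Eiffel or JML; the `checked_sqrt` function and the `Account` class are hypothetical names that mirror the `x > 0`, `result = √x`, and `balance >= 0` labels in the diagram above.

```python
import math


def checked_sqrt(x: float) -> float:
    """Square root with an explicit contract."""
    # Precondition: the caller's promise.
    assert x > 0, "precondition violated: x must be positive"
    result = math.sqrt(x)
    # Postcondition: the callee's promise.
    assert math.isclose(result * result, x), "postcondition violated"
    return result


class Account:
    """Hypothetical account whose class invariant is balance >= 0."""

    def __init__(self, balance: int = 0) -> None:
        self.balance = balance
        self._check_invariant()

    def _check_invariant(self) -> None:
        assert self.balance >= 0, "invariant violated: balance must be >= 0"

    def withdraw(self, amount: int) -> None:
        # Precondition guards the invariant before the state changes.
        assert 0 < amount <= self.balance, "precondition violated"
        self.balance -= amount
        self._check_invariant()  # invariant re-checked after every mutation


account = Account(10)
account.withdraw(4)
```

Python strips `assert` statements when run with `-O`, so this sketch only documents and checks the contract during development; languages with native DbC support make the same checks part of the interface.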
|
|
199
|
+
|
|
200
|
+
---
|
|
201
|
+
## 6. Curry-Howard Correspondence -- "Proofs ARE programs"
|
|
202
|
+
|
|
203
|
+
This is the deepest theoretical paper. Its insight is profound:
|
|
204
|
+
|
|
205
|
+
LOGIC WORLD                     PROGRAMMING WORLD
───────────                     ─────────────────

Proposition      ═══════════    Type
Proof            ═══════════    Program
Proof checking   ═══════════    Type checking
Proof            ═══════════    Program
simplification                  execution

A → B            ═══════════    Function: A → B
A ∧ B            ═══════════    Tuple: (A, B)
A ∨ B            ═══════════    Either A or B
∀x.P(x)          ═══════════    Dependent function
∃x.P(x)          ═══════════    Dependent pair
|
|
219
|
+
|
|
220
|
+
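What this means practically: Writing a type signature is stating a theorem. Writing a program that type-checks against that signature is proving the theorem. A type checker is an automated proof verifier. Strictly, the reading is sound only in total languages such as Coq and Agda, where every program terminates; in a language with unrestricted recursion, a non-terminating program can inhabit any type.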
|
|
221
|
+
|
|
222
|
+
The paper traces this through 5 increasingly powerful systems:
|
|
223
|
+
|
|
224
|
+
Simple types ──> Dependent types ──> HoTT
(Haskell)        (Coq, Agda)         (cutting edge)
                                       │
Linear logic ──> Session types         │
(resources)      (protocols)           │
                                       ▼
Classical logic via CPS              Types = Spaces
(control flow = proof)               Proofs = Paths
                                     Equality = Equivalence
|
|
233
|
+
|
|
234
|
+
Key takeaway: This isn't just theory. Proof assistants like Coq and Agda use Curry-Howard to let you write programs that are mathematically guaranteed correct. The spec IS the code.
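
The first three connective rows of the correspondence table can be checked directly in a proof assistant. The following is a small sketch in Lean 4 (chosen here for illustration; the paper's own examples may use a different system). Each `example` states a proposition as a type and supplies a program as its proof.

```lean
-- A → B corresponds to a function type: applying the function is modus ponens.
example (A B : Prop) (f : A → B) (a : A) : B := f a

-- A ∧ B corresponds to a pair: swapping the components proves B ∧ A.
example (A B : Prop) (h : A ∧ B) : B ∧ A := ⟨h.2, h.1⟩

-- A ∨ B corresponds to a tagged union: case analysis proves B ∨ A.
example (A B : Prop) (h : A ∨ B) : B ∨ A :=
  match h with
  | Or.inl a => Or.inr a
  | Or.inr b => Or.inl b
```

If any of these terms had the wrong shape, the file would fail to type-check, which is exactly the "type checking is proof checking" row of the table.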
|
|
235
|
+
|
|
236
|
+
---
|
|
237
|
+
## 7. Systems Engineering -- "When the whole is more than the parts"
|
|
238
|
+
|
|
239
|
+
This paper tackles what happens at scale: individual components can each meet their specs perfectly, yet the whole system fails due to emergent behavior.
|
|
240
|
+
|
|
241
|
+
Component A:  ✓ meets spec  ┐
Component B:  ✓ meets spec  ├──> System: ✗ FAILS
Component C:  ✓ meets spec  │    (unexpected interaction)
Interface AB: ✓ defined     │
Interface BC: ✓ defined     ┘
|
|
246
|
+
|
|
247
|
+
WHY? Implicit interfaces, resource contention, timing dependencies, environmental factors
|
|
248
|
+
|
|
249
|
+
Five approaches to manage this:
|
|
250
|
+
|
|
251
|
+
1. Lifecycle standards (NASA, DoD, ISO 15288): Structured processes with iterative verification
|
|
252
|
+
2. Interface management: ICDs, IRDs -- treating interfaces as first-class engineering objects
|
|
253
|
+
3. MBSE (SysML): Replace documents with analyzable models
|
|
254
|
+
4. Formal methods (CSP, model checking): Mathematically verify interaction properties
|
|
255
|
+
5. Emergent-behavior architectures: Design for monitoring and adapting to surprises
|
|
256
|
+
|
|
257
|
+
Key takeaway: No single technique catches everything. You need layers of defense: process governance + structural models + formal verification + runtime monitoring.
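
Resource contention, one of the causes listed above, is easy to show in miniature. The sketch below is a toy model under stated assumptions: a hypothetical shared pool of 10 connections and a per-component limit of 6, neither of which comes from the paper. Every component passes its local check, yet the system-level property fails.

```python
POOL_SIZE = 10  # hypothetical shared connection pool


def component_ok(peak_connections: int, limit: int = 6) -> bool:
    """Local spec: a component may hold at most `limit` connections."""
    return peak_connections <= limit


def system_ok(peaks: list[int], pool_size: int = POOL_SIZE) -> bool:
    """System-level property: simultaneous peaks must fit in the shared pool."""
    return sum(peaks) <= pool_size


peaks = {"A": 5, "B": 6}
local_results = {name: component_ok(p) for name, p in peaks.items()}
global_result = system_ok(list(peaks.values()))
```

Neither component spec mentions the other component, so no amount of local verification would catch the overload; the property only exists at the level where the shared pool is visible.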
|
|
258
|
+
|
|
259
|
+
---
|
|
260
|
+
## 8. Decision Theory & Trade-offs -- "When you can't have it all"
|
|
261
|
+
|
|
262
|
+
Every spec involves trade-offs: performance vs. cost, security vs. usability, speed vs. accuracy. This paper presents three frameworks for handling them:
|
|
263
|
+
|
|
264
|
+
MCDA                    PARETO                  SATISFICING
"Score and rank"        "Show all options"      "Good enough"

Criteria × Weights      Find the frontier       Set thresholds
     │                       │                       │
     ▼                       ▼                       ▼
┌──────────┐            ○  ○                    ─ ─ ─ ─ ─ ─
│ Alt A: 87│              ○  ○ ← Pareto         threshold │
│ Alt B: 72│                ○  ○ frontier         │       │
│ Alt C: 91│ ← pick           ○  ○              ──┼───────┼──
└──────────┘                    ○  ○              │   ✓   │
                                               "first one │
                                               that clears│
                                               the bar"   │

When: Regulatory        When: Design            When: Large
decisions, clear        exploration,            search space,
stakeholders            engineering             limited time
|
|
282
|
+
|
|
283
|
+
The fundamental tension: When do you commit to value judgments?
|
|
284
|
+
- MCDA: upfront (you assign weights first)
|
|
285
|
+
- Pareto: deferred (explore all efficient options, then choose)
|
|
286
|
+
- Satisficing: adaptive (set aspiration levels, adjust as you go)
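
The difference in when value judgments enter can be seen by running all three strategies on the same data. The following is a minimal sketch; the alternatives, criteria, weights, and thresholds are hypothetical numbers made up for illustration, and the weighted sum is only one of several MCDA aggregation methods.

```python
# Hypothetical alternatives scored on two criteria where higher is better.
alternatives = {
    "A": {"performance": 9, "affordability": 4},
    "B": {"performance": 6, "affordability": 7},
    "C": {"performance": 5, "affordability": 5},
}


def mcda_best(alts, weights):
    """MCDA (weighted sum): value judgments are fixed upfront via the weights."""
    def score(scores):
        return sum(weights[c] * scores[c] for c in weights)
    return max(alts, key=lambda name: score(alts[name]))


def dominates(x, y):
    """x dominates y if it is at least as good everywhere and better somewhere."""
    return all(x[c] >= y[c] for c in x) and any(x[c] > y[c] for c in x)


def pareto_frontier(alts):
    """Pareto: keep every alternative that no other alternative dominates."""
    return [name for name, s in alts.items()
            if not any(dominates(o, s) for other, o in alts.items() if other != name)]


def satisfice(alts, thresholds):
    """Satisficing: return the first alternative that clears every threshold."""
    for name, s in alts.items():
        if all(s[c] >= t for c, t in thresholds.items()):
            return name
    return None


best = mcda_best(alternatives, {"performance": 0.7, "affordability": 0.3})
frontier = pareto_frontier(alternatives)
good_enough = satisfice(alternatives, {"performance": 5, "affordability": 5})
```

With these numbers the weighted sum picks A, the Pareto frontier keeps A and B for a later human choice, and satisficing stops at B. The same data yields three different answers because each method commits to priorities at a different moment.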
|
|
287
|
+
|
|
288
|
+
---
|
|
289
|
+
# The Unifying Thread
|
|
290
|
+
|
|
291
|
+
All 8 papers converge on one insight: there is no silver bullet for specification. Every approach trades off along these axes:
|
|
292
|
+
|
|
293
|
+
Formal ◄──────────────────────────► Informal
(precise, verifiable,               (accessible, flexible,
expensive, expert-only)             cheap, ambiguous)

Upfront ◄──────────────────────────► Emergent
(commit early,                       (discover as you go,
stable but rigid)                    adaptive but risky)

Local ◄────────────────────────────► Global
(one component,                      (whole system,
tractable)                           intractable)
|
|
304
|
+
|
|
305
|
+
The best practitioners combine approaches: lightweight elicitation to start, DDD to align language, formal methods for critical components, contracts for interfaces, and decision frameworks to navigate trade-offs. The papers collectively make a strong case that understanding why specification is hard (linguistics, logic, emergence, competing objectives) is the prerequisite for doing it well.
|
package/llms.txt
CHANGED
|
@@ -2,7 +2,7 @@
|
|
|
2
2
|
|
|
3
3
|
> Semantic memory plugin for Claude Code. Captures lessons from mistakes, corrections, and discoveries. Stores them in git-tracked JSONL with SQLite FTS5 indexing and local vector embeddings (EmbeddingGemma-300M via node-llama-cpp). Retrieves relevant knowledge at session start, during planning, and before architectural decisions. Install as a dev dependency, run `npx ca setup`, and every work cycle compounds -- mistakes become searchable lessons that prevent recurrence.
|
|
4
4
|
|
|
5
|
-
Compound Agent is a TypeScript CLI tool (`ca`) installable via npm/pnpm. It integrates with Claude Code through hooks (SessionStart, PreCompact, UserPromptSubmit, PostToolUseFailure, PostToolUse) and slash commands (/compound:
|
|
5
|
+
Compound Agent is a TypeScript CLI tool (`ca`) installable via npm/pnpm. It integrates with Claude Code through hooks (SessionStart, PreCompact, UserPromptSubmit, PostToolUseFailure, PostToolUse) and slash commands (/compound:spec-dev, /compound:plan, /compound:work, /compound:review, /compound:compound, /compound:cook-it). Storage uses a three-layer architecture: Beads for issue tracking, Semantic Memory for knowledge, and Workflows for structured development phases.
|
|
6
6
|
|
|
7
7
|
## Core Documentation
|
|
8
8
|
|