@simpleapps-com/augur-skills 2026.4.16 → 2026.4.17
- package/package.json +1 -1
- package/skills/simpleapps/bash-simplicity/SKILL.md +21 -17
- package/skills/simpleapps/code-contracts/SKILL.md +270 -0
- package/skills/simpleapps/code-contracts/apply.md +154 -0
- package/skills/simpleapps/code-contracts/audit.md +173 -0
- package/skills/simpleapps/code-contracts/vocabulary.md +196 -0
- package/skills/simpleapps/project-defaults/SKILL.md +14 -13
- package/skills/simpleapps/wiki/SKILL.md +4 -4
- package/skills/simpleapps/work-habits/SKILL.md +26 -19
package/package.json
CHANGED
package/skills/simpleapps/bash-simplicity/SKILL.md
CHANGED
@@ -18,8 +18,8 @@ The entire plugin system exists to remove the user as the bottleneck. Every perm
 
 | Tier | Method | Speed | Example |
 |------|--------|-------|---------|
-| 1 | Dedicated tools (Read,
-| 2 | Simple Bash (one command, no operators) | **MAY** run immediately if pre-approved | `pnpm typecheck` |
+| 1 | Dedicated tools (Read, Edit, Write) | **WILL** run immediately, zero permission chance | `Read(file_path: "repo/src/foo.ts")` |
+| 2 | Simple Bash (one command, no operators) | **MAY** run immediately if pre-approved | `pnpm typecheck`, `grep -rn pattern repo` |
 | 3 | Complex Bash (operators, plumbing) | **WILL** trigger a permission prompt | `pnpm typecheck 2>&1; echo $?` |
 
 Prefer tier 1 over tier 2. Use tier 2 only when no dedicated tool exists. NEVER use tier 3.
@@ -37,26 +37,32 @@ The Bash tool is a managed environment, not a raw shell. It already captures std
 | Limit output | Returned in full | `\| head`, `\| tail`, `\| grep` |
 | Run the next step | Make a separate tool call | `&&`, `;`, `\|\|` |
 | Pass output to another command | Write to a tmp file | `$(...)`, backticks |
-| Run inline code | Use Read/
+| Run inline code | Use Read/Edit tools | `node -e`, `python -c` |
 
 **One command per Bash call. No operators. No plumbing. If the command has a `;`, `&&`, `|`, `$()`, `2>&1`, or `2>/dev/null` in it, it is wrong.**
 
 ## Use Dedicated Tools
 
-Dedicated tools are faster, require no permission, and produce better output. MUST use them instead of Bash equivalents:
+Dedicated tools are faster, require no permission, and produce better output. MUST use them instead of Bash equivalents when one exists:
 
 | Instead of | Use |
 |------------|-----|
-| `grep`, `rg` | Grep tool |
-| `find`, `ls` (for search) | Glob tool |
 | `cat`, `head`, `tail` | Read tool |
 | `sed`, `awk` | Edit tool |
 | `echo >`, `cat <<EOF` | Write tool |
 
-
+**Search is now Bash-only.** Claude Code 2.1.117 removed the dedicated Grep and Glob tools. Search files with one of:
+
+| Use case | Bash command |
+|----------|--------------|
+| Search file contents | `grep -rn <pattern> <path>` or `rg <pattern> <path>` |
+| Find files by name | `find <path> -name <pattern>` |
+| List directory entries | `ls <path>` |
+
+Reserve Bash for these and for commands that never had a dedicated tool: build tools, test runners, git, package managers, system commands.
 
 These commands are **denied** in project settings and will always be rejected. Do not attempt them:
-`cd`, `cat`, `
+`cd`, `cat`, `sed`, `awk`, `head`, `tail`, `sleep`, `kill`, `pkill`
 
 MUST NOT use `node -e` or `python -c` to run inline scripts. These trigger permission prompts. If you need to read a file, use the Read tool. If you need to process data, do it in your response, not in a shell script.
 
@@ -64,13 +70,11 @@ MUST NOT use `node -e` or `python -c` to run inline scripts. These trigger permi
 
 If a Bash call is denied, do NOT retry the same command and do NOT ask the user to approve it. Before anything else, check for a tool equivalent or shell plumbing that can be decomposed:
 
-- `grep`/`rg` → Grep tool (for files on disk); for command output, the Bash tool already returned it — read what you have
-- `find`/`ls` → Glob tool
 - `cat`/`head`/`tail` → Read tool
 - `sed`/`awk` → Edit tool
 - `|`, `2>&1`, `&&`, `;`, `$()` → split into separate calls; the Bash tool already captures stdout, stderr, and exit code
 
-Worked example: `pnpm --filter <package> typecheck 2>&1 | grep -c "error TS"` is denied because of the pipe
+Worked example: `pnpm --filter <package> typecheck 2>&1 | grep -c "error TS"` is denied because of the pipe and redirection. The fix is to run `pnpm --filter <package> typecheck` alone — the Bash tool returns the full output and exit code — then count "error TS" occurrences in the returned output yourself. No pipe, no redirection, no retry. (`grep` itself is allowed; the deny is on the shell plumbing around it.)
 
 ## Background Tasks
 
@@ -96,21 +100,21 @@ Do not retry the server start until the user confirms the port is free.
 
 ## Cross-Project Searching
 
-When looking at another project's code,
+When looking at another project's code, search with Bash directly using the project path. MUST keep it to one simple command per call — no pipes, no `-exec`, no `2>&1 | head`.
 
 Wrong: `find {path}/repo -name "*.ts" -exec grep -l "pattern" {} \; 2>/dev/null | head -10`
-Right: `
+Right: `grep -rln --include="*.ts" "pattern" {path}/repo`
 
-Wrong: `ls {path}/repo/src/components
-Right: `
+Wrong: `ls {path}/repo/src/components/ | head`
+Right: `ls {path}/repo/src/components/`
 
-All project paths are known and predictable (see `simpleapps:wiki` Cross-Project Wiki Access).
+All project paths are known and predictable (see `simpleapps:wiki` Cross-Project Wiki Access). Use the known path; do not search the entire filesystem.
 
 ## Subagent Responsibility
 
 Subagents do NOT inherit this skill. They see only the prompt you give them. The primary agent MUST brief every subagent on bash-simplicity before delegating shell work, and owns the output that comes back.
 
-Every subagent prompt that touches Bash MUST include a one-liner: "One command per Bash call. No operators. Use dedicated tools (Read,
+Every subagent prompt that touches Bash MUST include a one-liner: "One command per Bash call. No operators. Use dedicated tools (Read, Edit, Write) over their shell equivalents (`cat`, `sed`, `awk`, `echo >`). Search with Bash directly: `grep -rn`, `find`, `ls` — Claude Code 2.1.117 removed the Grep/Glob tools."
 
 If a subagent returns a command containing any forbidden operator (see the table above), that is the primary agent's failure. Reject and ask for a re-plan, or translate into separate simple calls. Do not execute it. A subagent violating this is running on a stale prompt; fix the prompt.
 
package/skills/simpleapps/code-contracts/SKILL.md
@@ -0,0 +1,270 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: code-contracts
|
|
3
|
+
description: When working in load-bearing code (the 5-10% where correctness matters most — money math, auth, concurrency, state machines, security boundaries), tighten the type system first, then add formal contracts (@requires/@ensures/@invariant/@trusted) in the host language's native comment syntax. Use Unicode glyphs (∀, ∈, ≥, ℕ) for AI priming, paired with ASCII gloss for human readers. The contracts pay off through three independent mechanisms: priming agent reasoning, surfacing tests, and adding context-window value that agents themselves report. Drift between contract and code is a defect.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Code Contracts
|
|
7
|
+
|
|
8
|
+
> **Status: EXPERIMENTAL** for the priming hypothesis (mechanism #1). The test-surfacing (#2) and agent-reported-value (#3) mechanisms are observed in practice. Apply selectively to load-bearing code; treat as a defensible bet across three channels — pays off if any one of them lands.
|
|
9
|
+
|
|
10
|
+
Persistent prompt engineering encoded in source code. Formal contracts on load-bearing functions pay off through three independent mechanisms.
|
|
11
|
+
|
|
12
|
+
## Three mechanisms — why this is worth the context cost
|
|
13
|
+
|
|
14
|
+
| # | Mechanism | What it does | Status |
|
|
15
|
+
|---|-----------|--------------|--------|
|
|
16
|
+
| 1 | Latent-space priming | Unicode glyph rarity pulls hidden state toward the formal-methods neighborhood — sharper reasoning on the *next edit* | Hypothesis (defensible bet) |
|
|
17
|
+
| 2 | Test surfacing | Explicit clauses make intent legible — agents generate *new and better tests* using each clause as an oracle | **Observed in practice** |
|
|
18
|
+
| 3 | Agent-reported value | Agents themselves report that contracts add useful context for complex / load-bearing methods | **Observed in practice** |
|
|
19
|
+
|
|
20
|
+
The contracts are the artifact. The three mechanisms are *consequences*. Even if mechanism #1 is weaker than hoped, mechanisms #2 and #3 are already paying off.
|
|
21
|
+
|
|
22
|
+
## What this is — and is not
|
|
23
|
+
|
|
24
|
+
This is **not** documentation. It is **not** a parallel comment style for human readers alone. It is a cognitive switch that targets future agents reading the file, paired with an ASCII gloss that bridges human readers.
|
|
25
|
+
|
|
26
|
+
Models trained on F\*/Lean/Dafny/Coq corpora develop a "spec-then-implement" reasoning policy — slower, more rigorous, explicit about pre/postconditions and effects. Plain TS/PHP/Python does not activate that policy because the surface form does not match. Unicode-heavy contract prose in the same file *does* match — the model upshifts into the more rigorous mode for the function it precedes.
|
|
27
|
+
|
|
28
|
+
The annotations do not add information the model could not infer. They change the *mode* the model reasons in. Same idea as "let's reason step by step" or "you are an expert at X" — moved out of the system prompt and into the artifact, where it primes every future agent that touches the code, not just the current session.
|
|
29
|
+
|
|
30
|
+
## When to use it
|
|
31
|
+
|
|
32
|
+
MUST apply ONLY in **load-bearing code** — the 5-10% of functions where a subtle bug compounds:
|
|
33
|
+
|
|
34
|
+
- Money math (pricing, tax, fees, totals, currency conversion, rounding)
|
|
35
|
+
- Auth and permission decisions
|
|
36
|
+
- Concurrency and ordering (locks, queues, retries, idempotency keys)
|
|
37
|
+
- State machines and protocols (multi-step flows that must not be entered out of order)
|
|
38
|
+
- Security boundaries (input validation at trust boundaries, sanitization, sinks like SQL/HTML/shell)
|
|
39
|
+
- Algorithms with non-trivial invariants (custom sort/search variants, bespoke data structures)
|
|
40
|
+
|
|
41
|
+
MUST NOT apply by default to:
|
|
42
|
+
|
|
43
|
+
- Getters, setters, simple field accessors
|
|
44
|
+
- Glue code, plumbing, framework adapters
|
|
45
|
+
- UI components, presentational code
|
|
46
|
+
- One-off scripts, throwaway code
|
|
47
|
+
- Test bodies (the test name is the spec)
|
|
48
|
+
|
|
49
|
+
Annotation density has a real context-window cost on every read. MUST concentrate it where the rigorous-reasoning upgrade matters most.
|
|
50
|
+
|
|
51
|
+
## Two surfaces per clause
|
|
52
|
+
|
|
53
|
+
Every clause has two surfaces riding the same line:
|
|
54
|
+
|
|
55
|
+
| Surface | Purpose |
|
|
56
|
+
|---------|---------|
|
|
57
|
+
| **Formal** (Unicode) | Activates the rigorous-reasoning latent circuits in readers with the bandwidth to parse it. Rarity is the activation. |
|
|
58
|
+
| **Prose** (gloss + assumptions) | Renders the same content as natural language **with assumptions named**, so readers with less parsing bandwidth derive the same conclusions without paying a notation tax. |
|
|
59
|
+
|
|
60
|
+
These are not redundant translations. **The prose surface carries assumptions, not just symbols.**
|
|
61
|
+
|
|
62
|
+
### The gloss MUST name assumptions
|
|
63
|
+
|
|
64
|
+
When a clause depends on something the formal notation does not make explicit — a precondition the caller is presumed to satisfy, a sentinel value the contract treats as out-of-scope, a side condition the body relies on — the gloss MUST name it. Naming what is *not* part of the contract is as important as naming what is.
|
|
65
|
+
|
|
66
|
+
The formal notation cannot say "and X is assumed to hold." The gloss can. Examples:
|
|
67
|
+
|
|
68
|
+
- `@requires q ≥ 0 // q is non-negative; assumes caller validated input, no NaN check here`
|
|
69
|
+
- `@ensures result ∈ ℕ // result is a non-negative integer; overflow is caller's responsibility`
|
|
70
|
+
- `@invariant balance ≥ 0 // balance never negative; assumes no concurrent mutators outside this lock`
|
|
71
|
+
|
|
72
|
+
Without explicit assumption-naming, a reader must derive the assumptions from negative space — by inspecting callers, the implementation, related tests. That derivation is the parsing tax that costs careful reasoning capacity. The gloss eliminates the tax by naming it directly.
|
|
73
|
+
|
|
74
|
+
### Why both surfaces
|
|
75
|
+
|
|
76
|
+
Dense formal notation can crowd out attention to factual claims when parsing it costs the reader most of their bandwidth — the cognitive cost of parsing symbols leaves less capacity for engaging with what the contract actually says. Pairing the formal surface with prose-and-assumptions means:
|
|
77
|
+
|
|
78
|
+
- Readers with high parsing bandwidth get the formal surface activating careful reasoning **plus** the gloss catching what the formal notation leaves implicit
|
|
79
|
+
- Readers with less parsing bandwidth derive the same conclusions from the prose alone, no tax paid
|
|
80
|
+
- Either reader gets a working contract; neither is left to derive the assumptions from negative space
|
|
81
|
+
|
|
82
|
+
This is also why the gloss helps human readers: a junior dev seeing `∀ x ∈ xs. x ≥ 0` cold has nowhere to grab. The same dev seeing `∀ x ∈ xs. x ≥ 0 // every x in xs is non-negative; assumes upstream validation` reads it once and starts building intuition for the symbol set with the assumptions made explicit.
|
|
83
|
+
|
|
84
|
+
### Two pairing patterns
|
|
85
|
+
|
|
86
|
+
Pick one and apply consistently within a file.
|
|
87
|
+
|
|
88
|
+
```ts
|
|
89
|
+
// Pattern A — inline assumption-naming gloss after the formal clause
|
|
90
|
+
@requires q ≥ 0 // q is non-negative; assumes caller validated input
|
|
91
|
+
@ensures result ∈ ℕ // result is a natural number; overflow is caller's responsibility
|
|
92
|
+
```
|
|
93
|
+
|
|
94
|
+
```ts
|
|
95
|
+
// Pattern B — bracketed gloss on the same line
|
|
96
|
+
@requires q ≥ 0 (q is non-negative; assumes caller validated input)
|
|
97
|
+
@ensures ∀ x ∈ items. x.qty ≥ 0 (every item has non-negative qty; empty array is allowed)
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
Pattern A is denser; Pattern B reads more like prose. Either works.
|
|
101
|
+
|
|
102
|
+
See `vocabulary.md` for the full glyph palette, the clause-first derivation table, and per-language examples.
|
|
103
|
+
|
|
104
|
+
## Order of preference
|
|
105
|
+
|
|
106
|
+
When a function deserves a contract, work down this list. MUST use the first form that expresses the property; SHOULD fall back to looser forms only when the tighter one cannot.
|
|
107
|
+
|
|
108
|
+
### 1. Tighten the type system first
|
|
109
|
+
|
|
110
|
+
A type the compiler enforces beats a comment the compiler ignores. The model reads types the same way it reads contracts.
|
|
111
|
+
|
|
112
|
+
```ts
|
|
113
|
+
// Loose type with comment contract
|
|
114
|
+
// @requires y !== 0
|
|
115
|
+
function divide(x: number, y: number): number { return x / y; }
|
|
116
|
+
|
|
117
|
+
// Branded type that makes a violated precondition unrepresentable
|
|
118
|
+
type NonZero = number & { readonly __brand: 'NonZero' };
|
|
119
|
+
function divide(x: number, y: NonZero): number { return x / y; }
|
|
120
|
+
```
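The brand exists only at the type level, so it helps only if values enter the branded type through a single checked gate. A minimal sketch of such a gate follows. `toNonZero` and `safeDivide` are hypothetical names introduced here for illustration and are not part of the package; rejecting NaN and infinities is likewise an assumption of this sketch.

```typescript
// Hypothetical sketch: the only place that casts to NonZero also performs
// the runtime check, so every NonZero value has passed it.
type NonZero = number & { readonly __brand: 'NonZero' };

// Returns null instead of throwing, so the caller must handle the zero case.
// Assumption of this sketch: NaN and ±Infinity are rejected as well.
function toNonZero(n: number): NonZero | null {
  return isFinite(n) && n !== 0 ? (n as NonZero) : null;
}

function divide(x: number, y: NonZero): number {
  return x / y;
}

// The check happens once, at the boundary; divide() itself stays check-free.
function safeDivide(x: number, rawDivisor: number): number | null {
  const y = toNonZero(rawDivisor);
  return y === null ? null : divide(x, y);
}
```

With this shape, `divide(1, 0)` no longer type-checks, and the zero case is handled in exactly one place.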
|
|
121
|
+
|
|
122
|
+
Tools by language:
|
|
123
|
+
|
|
124
|
+
- **TypeScript** — branded types, narrow union types, `readonly`, template literal types, exhaustive `switch` over discriminated unions
|
|
125
|
+
- **PHP** — typed properties, `readonly`, enums (8.1+), Psalm template types
|
|
126
|
+
- **Python** — `Literal`, `Final`, `NewType`, `Annotated`, `Protocol`, `assert_never`
|
|
127
|
+
|
|
128
|
+
### 2. Use the language's checker-enforced annotation
|
|
129
|
+
|
|
130
|
+
When the type system cannot express the property, reach for an annotation a real static analyzer enforces. Drift produces a tool error, not just an agent surprise.
|
|
131
|
+
|
|
132
|
+
| Language | Tool | Real annotations |
|
|
133
|
+
|----------|------|------------------|
|
|
134
|
+
| PHP | Psalm | `@psalm-assert`, `@psalm-pure`, `@psalm-immutable`, `@psalm-mutation-free` |
|
|
135
|
+
| PHP | PHPStan | `@phpstan-assert`, `@phpstan-pure`, generic types |
|
|
136
|
+
| TypeScript | tsc + ESLint | branded types, `eslint-plugin-functional` for purity, exhaustiveness checks via the `never` type (a user-defined `assertNever` helper) |
|
|
137
|
+
| Python | mypy / pyright | `assert_type`, `TypeGuard`, `TypeIs`, `Never`, `@final` |
|
|
138
|
+
|
|
139
|
+
### 3. Formal contract for the residual
|
|
140
|
+
|
|
141
|
+
For properties no checker can express — algebraic laws, multi-step protocol invariants, invariants over external state — leave a contract in the host language's native comment syntax with Unicode + ASCII gloss.
|
|
142
|
+
|
|
143
|
+
Annotation forms:
|
|
144
|
+
|
|
145
|
+
- `@requires <precondition>` — must hold of inputs at call time
|
|
146
|
+
- `@ensures <postcondition>` — guaranteed of result / observable state
|
|
147
|
+
- `@invariant <property>` — holds at loop head, between method calls, across state transitions
|
|
148
|
+
- `@trusted <param>` — value inlined into a security-sensitive sink; origin must be trusted code, never user input
|
|
149
|
+
- `@pure` / `@mutates X` / `@throws Y` / `@io` — effect declaration
|
|
150
|
+
- `@property <law>` — algebraic law (idempotence, associativity, commutativity, monotonicity)
|
|
151
|
+
- `@time <bound>` / `@space <bound>` — asymptotic complexity. Use `Θ(...)` for tight bound, `O(...)` for upper bound only, `Ω(...)` for lower bound. Most contracts write `O(...)`; SHOULD prefer `Θ(...)` when the bound is actually tight, because `O(n²)` is technically true of an `O(n)` function and that looseness primes the wrong reasoning.
|
|
152
|
+
|
|
153
|
+
See `vocabulary.md` for the clause-first derivation table (clause shape → encoding tier), the full glyph palette, and the complexity-notation glossary.
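As an illustration of how several of these forms might sit together on one function, here is a minimal sketch. `clampQuantity` and its parameters are hypothetical, chosen only so that each clause is easy to check; the layout follows the inline-gloss pattern described earlier.

```typescript
/**
 * Clamp a requested quantity into the orderable range.
 *
 * @requires min ≤ max                    // bounds are ordered; assumes q, min, max are not NaN (not checked here)
 * @ensures  min ≤ result ∧ result ≤ max  // result lies inside the range
 * @property clamp(clamp(q)) = clamp(q)   // idempotent: clamping twice changes nothing
 * @pure                                  // no I/O, no mutation
 * @time Θ(1)                             // constant time, tight bound
 */
function clampQuantity(q: number, min: number, max: number): number {
  return Math.min(Math.max(q, min), max);
}
```

Each clause here names a property that a test could check directly, and the gloss on `@requires` states what the function does not defend against.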
|
|
154
|
+
|
|
155
|
+
## Style rule — form is the activation
|
|
156
|
+
|
|
157
|
+
The form MUST match formal-language conventions. Informal prose does not switch the reasoning mode. There are four tiers, listed weakest to strongest; the strongest pairs Unicode formal notation with an assumption-naming gloss:
|
|
158
|
+
|
|
159
|
+
- ✗ informal English: `// always positive` — neither activation nor assumptions
|
|
160
|
+
- ✗ ASCII formal alone: `@ensures result >= 0` — weak priming, no assumption-naming
|
|
161
|
+
- ◐ Unicode formal + translation gloss: `@ensures result ≥ 0 // result >= 0` — primes the formal surface, gloss is redundant translation
|
|
162
|
+
- ✓ Unicode formal + assumption-naming gloss: `@ensures result ≥ 0 // result is non-negative; overflow is caller's responsibility` — primes high-bandwidth readers AND gives low-bandwidth readers the same content as prose with assumptions explicit
|
|
163
|
+
|
|
164
|
+
The shape signals "this is a function to reason about formally," not "this is a function to skim and pattern-match." The gloss makes the contract robust across reader capacities.
|
|
165
|
+
|
|
166
|
+
## Semantic-ambiguity second pass
|
|
167
|
+
|
|
168
|
+
After writing range and type constraints, ask: **does any constant or sentinel in this function carry more than one meaning?**
|
|
169
|
+
|
|
170
|
+
Formal annotations bias toward easy formal targets — range, type, sign. They miss *semantic overloading*: a single value standing for two distinct domain states. The blind spot is structural — `@requires x ≥ 0` cannot express "and `0` is distinct from `null`."
|
|
171
|
+
|
|
172
|
+
Common patterns to flag:
|
|
173
|
+
|
|
174
|
+
- `value ?? 0` (or `?? ""`, `?? -1`) where the coalesced-from value and the coalesced-to value carry different meanings downstream
|
|
175
|
+
- Sentinel integers (`-1` for "not found", `999` for "all", `0` for "default")
|
|
176
|
+
- Empty string vs missing string
|
|
177
|
+
- `0` returned from a counter that also legitimately returns `0`
|
|
178
|
+
|
|
179
|
+
When you find one, lift the sentinel into the type — `null | { kind: 'loaded'; price: PositiveAmount } | { kind: 'call-for-price' }` — so the two states are unrepresentable as the same value. Then the formal annotation regains coverage.
|
|
180
|
+
|
|
181
|
+
### Real example
|
|
182
|
+
|
|
183
|
+
`@simpleapps-com/augur-utils` `derive-price.ts` was written with the contract treatment, including a self-aware note about IEEE-754 vs ℝ. It still missed this exact blind spot:
|
|
184
|
+
|
|
185
|
+
```ts
|
|
186
|
+
const unitPrice = priceData.unitPrice ?? 0; // null collapses to 0
|
|
187
|
+
isCallForPrice: unitPrice === 0, // 0 means "call for price"
|
|
188
|
+
```
|
|
189
|
+
|
|
190
|
+
The contract correctly enforces `unitPrice ≥ 0 ∧ Number.isFinite(unitPrice)` — but cannot say "and `null` is distinct from `0`," because `null` was already coalesced away. The fix is to handle `priceData.unitPrice === null` *before* the coalesce, lifting the two states into the type.
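One way the fix could look, as a sketch only: branch on `null` before any coalescing and return a discriminated union. The `PriceState` alias and the `derivePriceState` function are assumptions made for illustration, and the real `derive-price.ts` may be shaped differently. Whether a genuine `0` means "call for price" is a domain decision that this sketch simply assumes.

```typescript
type PositiveAmount = number & { readonly __brand: 'PositiveAmount' };

// Three distinct states; none of them can be mistaken for another.
type PriceState =
  | null                                        // price data absent, which is not the same as zero
  | { kind: 'loaded'; price: PositiveAmount }
  | { kind: 'call-for-price' };

// Hypothetical helper: classifies a raw unit price without using `?? 0`.
function derivePriceState(unitPrice: number | null): PriceState {
  if (unitPrice === null) return null;          // handled before any coalescing
  if (!isFinite(unitPrice) || unitPrice < 0) {
    throw new RangeError(`invalid unitPrice: ${unitPrice}`);
  }
  if (unitPrice === 0) return { kind: 'call-for-price' };  // assumption: 0 means call for price
  return { kind: 'loaded', price: unitPrice as PositiveAmount };
}
```

Once the states are separate in the type, a clause such as `@ensures result.price > 0` on the `'loaded'` branch says something the old `unitPrice ≥ 0` clause could not.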
|
|
191
|
+
|
|
192
|
+
## Drift is a defect — not a sync target
|
|
193
|
+
|
|
194
|
+
If the code contradicts an annotation, that is a bug. MUST decide which is wrong and fix it. MUST NOT silently rewrite the annotation to match incorrect code.
|
|
195
|
+
|
|
196
|
+
Unenforced annotations that drift do active harm — they prime future agents toward the wrong invariant. A wrong contract is worse than no contract.
|
|
197
|
+
|
|
198
|
+
When you find drift while editing:
|
|
199
|
+
|
|
200
|
+
1. Read the contract carefully
|
|
201
|
+
2. Read the code carefully
|
|
202
|
+
3. Decide which one captures the intended behavior — look at call sites, tests, related code
|
|
203
|
+
4. Fix whichever is wrong
|
|
204
|
+
5. If you cannot tell which is intended, MUST stop and ask the user. MUST NOT guess.
|
|
205
|
+
|
|
206
|
+
### Tautological postconditions
|
|
207
|
+
|
|
208
|
+
A related anti-pattern: postconditions that mirror the constructor. `@ensures result.kind === 'loaded'` on `function loaded(item) { return { kind: 'loaded', item } }` adds nothing — the constructor already guarantees it. Useful postconditions assert properties the *reader of the call* would not derive from the constructor alone (algebraic laws, conservation invariants, observable state changes). Restating the constructor adds bookkeeping without adding reasoning, and is a sign the contract was back-fitted rather than written clause-first.
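To make the contrast concrete, here is a sketch of postconditions that carry information the return type does not. `splitCents` is a hypothetical function written for this example; the conservation clause is the kind of property a caller could not derive from `number[]` alone.

```typescript
/**
 * Split an integer amount of cents into `parts` shares.
 *
 * @requires total ∈ ℕ ∧ parts ∈ ℕ ∧ parts ≥ 1   // non-negative integer cents, at least one share; assumes caller validated integers
 * @ensures  Σ result = total                     // conservation: no cent is lost or created
 * @ensures  max(result) − min(result) ≤ 1        // shares differ by at most one cent
 */
function splitCents(total: number, parts: number): number[] {
  const base = Math.floor(total / parts);
  const remainder = total % parts;
  const shares: number[] = [];
  for (let i = 0; i < parts; i++) {
    // The first `remainder` shares absorb the leftover cents, one each.
    shares.push(base + (i < remainder ? 1 : 0));
  }
  return shares;
}
```

For example, `splitCents(100, 3)` gives shares of 34, 33 and 33 cents, which sum back to 100.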
|
|
209
|
+
|
|
210
|
+
## Why mechanism #1 works (priming hypothesis)
|
|
211
|
+
|
|
212
|
+
LLM behavior is conditional on context shape. Models trained on verified-language corpora develop latent circuits for spec-then-implementation reasoning. Native-syntax contracts in the F\*/Lean/Dafny shape *plausibly* activate those circuits during code generation, review, and refactoring — even though the host language has no formal semantics.
|
|
213
|
+
|
|
214
|
+
**Rarity is the activation.** The Unicode glyphs (∀, ∃, ⟨⟩, ↦, ⊑, ≥, ≤, ≠, ⇒, ∧, ∨, ¬, ℕ, ℝ, ∈, ∉) co-occur in training data with theorem-prover output, type-theory papers, and formal-methods source. Sampling tokens with those glyphs pulls hidden state toward that neighborhood. ASCII transliterations (`forall`, `>=`, `=>`) live in commoner code-review text — for AI priming, that familiarity dampens the shift.
|
|
215
|
+
|
|
216
|
+
The asymmetry: if the priming hypothesis holds, annotations added today become more valuable as proof-trained models improve. The developer does no extra work; the priming benefit grows with each model upgrade.
|
|
217
|
+
|
|
218
|
+
## Why mechanism #2 works (test surfacing)
|
|
219
|
+
|
|
220
|
+
Distinct from priming. **Observed in practice:** when a function carries explicit clauses, the agent generates *new and better tests* on subsequent edits — tests it would not have surfaced reading the implementation alone.
|
|
221
|
+
|
|
222
|
+
Contracts make intent legible. Each clause is an oracle for at least one test class:
|
|
223
|
+
|
|
224
|
+
- Each `@requires` → input-boundary tests (`q < 0`, `q = 0`, `q = NaN`)
|
|
225
|
+
- Each `@ensures` → property to assert on the output
|
|
226
|
+
- Each `@invariant` → multi-step test (call sequence, then check invariant)
|
|
227
|
+
- Each `@trusted` → fuzz-test target on untrusted inputs
|
|
228
|
+
- Each `@time O(...)` → scaling benchmark (assert the bound holds at n=10, n=100, n=1000)
|
|
229
|
+
- Each `@space O(...)` → memory-stability test (assert the bound holds across input sizes)
|
|
230
|
+
- Clauses also expose edge cases by negation: `@requires q ≥ 0` makes the agent ask "what about q < 0? what about NaN?"
|
|
231
|
+
|
|
232
|
+
Without the contract, the agent has only the function body to inspect — and the body rarely announces its boundaries explicitly. With the contract, every clause is a test-case oracle. This effect is observable session-by-session — test count, edge-case coverage, mutation-test kill rate.
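A sketch of how clauses might map onto tests. The function `lineTotalCents` and the checks below are hypothetical, and they are written without a test framework so the example stays self-contained; a real suite would use the project's own runner.

```typescript
/**
 * @requires unitCents ∈ ℕ ∧ q ∈ ℕ   // both are non-negative integers; this sketch enforces it rather than assuming it
 * @ensures  result = unitCents · q  // exact integer product; assumes the product stays below 2^53
 */
function lineTotalCents(unitCents: number, q: number): number {
  for (const v of [unitCents, q]) {
    if (!isFinite(v) || Math.floor(v) !== v || v < 0) {
      throw new RangeError(`expected a non-negative integer, got ${v}`);
    }
  }
  return unitCents * q;
}

// Helper for the negative cases: true if calling f throws.
function throws(f: () => unknown): boolean {
  try { f(); return false; } catch { return true; }
}

// @requires → input-boundary tests, including the negation of each clause.
const boundaryChecks: boolean[] = [
  throws(() => lineTotalCents(100, -1)),   // q < 0
  throws(() => lineTotalCents(100, NaN)),  // q = NaN
  throws(() => lineTotalCents(100, 1.5)),  // q ∉ ℕ
  lineTotalCents(100, 0) === 0,            // q = 0 is allowed
];

// @ensures → property asserted on the output.
const outputChecks: boolean[] = [
  lineTotalCents(250, 4) === 1000,
  lineTotalCents(199, 3) === 597,
];
```

Each entry traces back to one clause, which is what makes a missing test visible: a clause with no corresponding check is a gap.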
|
|
233
|
+
|
|
234
|
+
### Close cousin: improvement-opportunity surfacing
|
|
235
|
+
|
|
236
|
+
The same legibility that surfaces tests also surfaces *improvement opportunities*. A function annotated `@time O(n²)` invites the agent reading it to ask "can this be improved to `O(n log n)`?" The complexity contract makes the target visible. Without it, the agent maintains the function as written; with it, the agent has an explicit invariant to question. Same mechanism (legible intent → visible gap), different output (refactor candidates instead of test cases).
|
|
237
|
+
|
|
238
|
+
This is why complexity contracts are not just documentation. A function honestly labeled `@time O(n²)` and called from a hot loop is a refactor target the agent will surface on the next read.
|
|
239
|
+
|
|
240
|
+
## Why mechanism #3 works (agent-reported value)
|
|
241
|
+
|
|
242
|
+
Agents themselves report, in active session use, that the contract content adds useful context for complex / load-bearing methods. The cost of carrying contracts in the context window is paid back in usefulness, not just consumed. This is qualitative confirmation only, but it is a real-world signal that the practice pays off in routine session work.
|
|
243
|
+
|
|
244
|
+
## Evidence and falsifiability
|
|
245
|
+
|
|
246
|
+
Mechanism #1 is **plausible but not directly measured for in-source contracts**. Cite-able adjacent evidence:
|
|
247
|
+
|
|
248
|
+
- **ContractEval** (arXiv 2510.12047) — LLM contract-satisfaction rises from 0% (vanilla) to ~50% when contracts are stated in the prompt. Demonstrates the mechanism works for *prompt-supplied* contracts.
|
|
249
|
+
- **Specification-Guided Repair of Dafny Programs with LLMs** (arXiv 2507.03659) — LLMs reason measurably better when Dafny pre/postconditions are present.
|
|
250
|
+
- **Type-Constrained Code Generation** (arXiv 2504.09246) — type annotations cut hallucinated APIs and compilation errors >50%. Supports "tighten the type system first."
|
|
251
|
+
- **CoT mech-interp** (arXiv 2402.18312, arXiv 2507.22928) — surface cues like "let's think step by step" route through identifiable internal circuits in larger models. Supports "surface form conditions reasoning mode."
|
|
252
|
+
|
|
253
|
+
**Falsifiable prediction:** annotating a load-bearing function with these contracts produces, on the next agent edit, fewer correctness regressions and/or more rigorous reasoning traces than the same function un-annotated.
|
|
254
|
+
|
|
255
|
+
**Practical signals before the formal experiment runs:**
|
|
256
|
+
|
|
257
|
+
- Test-count / edge-case-coverage / mutation-kill-rate delta between Unicode-bearing and ASCII-bearing contracts on equivalent functions (mechanism #2; observable session-by-session)
|
|
258
|
+
- Fewer correctness regressions in functions carrying contract annotations (mechanism #1; quarter-scale)
|
|
259
|
+
- Fewer "oops, missed the case where X" follow-up commits to annotated functions
|
|
260
|
+
- Agent feedback on whether the contracts are pulling weight in their reasoning (mechanism #3; per-session)
|
|
261
|
+
|
|
262
|
+
If signals trend the wrong way over a quarter of usage, the bet has not paid off and the skill SHOULD be downgraded or removed.
|
|
263
|
+
|
|
264
|
+
Until then, treat this skill as a defensible bet across three independent mechanisms.
|
|
265
|
+
|
|
266
|
+
## See also
|
|
267
|
+
|
|
268
|
+
- **`vocabulary.md`** — full Unicode glyph palette, ASCII gloss patterns, clause-first derivation table, per-language examples
|
|
269
|
+
- **`audit.md`** — the audit modes (per-file + session-aware), invoked by `/contract-audit`
|
|
270
|
+
- **`apply.md`** — six-phase loop (Orient → Read → Draft → Trim → Discover → Verify) for adding contracts to existing code
|
|
package/skills/simpleapps/code-contracts/apply.md
@@ -0,0 +1,154 @@
|
|
|
1
|
+
# Code Contracts — Apply
|
|
2
|
+
|
|
3
|
+
The six-phase loop for adding contracts to existing code. Use when starting from a function that does not yet carry contracts and you have decided (manually or via `audit.md`) it is load-bearing enough to warrant them.
|
|
4
|
+
|
|
5
|
+
## When to invoke this
|
|
6
|
+
|
|
7
|
+
Three triggers:
|
|
8
|
+
|
|
9
|
+
1. **Audit recommended it.** A `/contract-audit` report named this file as a contract candidate.
|
|
10
|
+
2. **You're already editing the file** for unrelated reasons and noticed it deserves contracts. Add them as part of the edit pass; do not split into a separate task.
|
|
11
|
+
3. **A bug fix landed here recently.** The fix is the highest-quality signal that the function carries non-trivial invariants worth documenting. Walk the loop after the fix.
|
|
12
|
+
|
|
13
|
+
MUST NOT invoke broadly. Applying contracts to non-load-bearing code is itself a failure of the practice — it produces JSDoc bloat without the priming or test-surfacing payoff.
|
|
14
|
+
|
|
15
|
+
## The six phases
|
|
16
|
+
|
|
17
|
+
Run them in order. Each phase has a discrete output that feeds the next.
|
|
18
|
+
|
|
19
|
+
### Phase 1: Orient
|
|
20
|
+
|
|
21
|
+
Read the file's recent history. Bug fixes are bug-locus candidates — they reveal where the invariants matter.
|
|
22
|
+
|
|
23
|
+
```
|
|
24
|
+
git -C repo log --oneline -10 -- <path>
|
|
25
|
+
```
|
|
26
|
+
|
|
27
|
+
Flag any `fix:` commits as locus candidates. For each, read the commit message and the diff — what invariant was being violated? Capture it; phase 3 will turn it into a clause.
|
|
28
|
+
|
|
29
|
+
Re-verify any wiki rule that may apply (e.g., the project's PHP conventions for contract docblocks, naming conventions, helper patterns) before invoking it. Wiki content drifts; do not rely on stale recall.
|
|
30
|
+
|
|
31
|
+
**Output:** a short list of candidate invariants pulled from recent fix commits.
|
|
32
|
+
|
|
33
|
+
### Phase 2: Read
|
|
34
|
+
|
|
35
|
+
Read three things, in order:
|
|
36
|
+
|
|
37
|
+
1. **The target file** — full content, not skimmed.
|
|
38
|
+
2. **The sibling methods it dispatches to** — if `processOrder` calls `validateLineItems` and `applyTax`, read both.
|
|
39
|
+
3. **The heaviest callers** — find the top three call sites and read enough of each to understand what they assume about this function's behavior.
|
|
40
|
+
|
|
41
|
+
Reasoning from method names alone produces false recommendations. The contract clauses MUST come from observed behavior (the body) AND observed assumptions (the call sites). Skip this phase and the contracts will be plausible-but-wrong — the worst kind of contract.
|
|
42
|
+
|
|
43
|
+
**Output:** a working understanding of what the function actually does, what its callers assume, and any drift between intent and implementation.
|
|
44
|
+
|
|
45
|
+
### Phase 3: Draft
|
|
46
|
+
|
|
47
|
+
Write the contract clauses, clause-first.
|
|
48
|
+
|
|
49
|
+
For load-bearing files, write a **file-level docblock** that frames the file's role in the system. For each load-bearing method, write a per-method contract:
|
|
50
|
+
|
|
51
|
+
```php
/**
 * @requires <precondition>  (ASCII gloss)
 * @ensures <postcondition>  (ASCII gloss)
 * @invariant <property>  (ASCII gloss, where applicable)
 *
 * Footgun: <named footgun from phase 1 or 2 — bug that hit, sentinel ambiguity, etc.>
 *   (ASCII gloss of the footgun)
 */
```
|
|
61
|
+
|
|
62
|
+
Use Unicode glyphs paired with ASCII gloss per the dual-audience pattern in `SKILL.md`. See `vocabulary.md` for the glyph palette and per-language examples.
|
|
63
|
+
|
|
64
|
+
Cite specifics:
|
|
65
|
+
|
|
66
|
+
- The bug-fix commit (e.g., "Footgun: see fix in <sha> — empty array vs single-zero array were silently the same precondition")
|
|
67
|
+
- Sibling files where applicable (e.g., "see `OrderTotal.php` for how the result is consumed")
|
|
68
|
+
- Wiki rules where they apply (file:line citations)
|
|
69
|
+
|
|
70
|
+
**Output:** a draft docblock per load-bearing method, plus a file-level overview if the file's role warrants one.
|
|
71
|
+
|
|
72
|
+
### Phase 4: Trim
|
|
73
|
+
|
|
74
|
+
Cut anything an agent could recover from reading the code itself. The contract MUST add information, not restate the implementation.
|
|
75
|
+
|
|
76
|
+
Concrete cuts:
|
|
77
|
+
|
|
78
|
+
- Postconditions that mirror the constructor (e.g., `@ensures result.kind === 'loaded'` on `function loaded(item) { return { kind: 'loaded', item } }`) — drop them.
|
|
79
|
+
- Range assertions that the type system already enforces (`@requires q ≥ 0` when the parameter type is `Natural`) — drop them.
|
|
80
|
+
- Prose that narrates the body (`// loop builds the running sum`) — drop them.
|
|
81
|
+
- Comments that explain syntax (`// using ?? for default`) — drop them.
|
|
82
|
+
|
|
83
|
+
What stays:
|
|
84
|
+
|
|
85
|
+
- Footguns that aren't visible in the body (sentinel ambiguities, cross-file trust boundaries, ordering invariants, conservation properties)
|
|
86
|
+
- Algebraic laws (idempotence, commutativity, monotonicity)
|
|
87
|
+
- Cross-method invariants
|
|
88
|
+
- Anything a future reader would not derive from the body alone
|
|
89
|
+
|
|
90
|
+
**Output:** a tight contract that earns every line.
|
|
91
|
+
|
|
92
|
+
### Phase 5: Discover
|
|
93
|
+
|
|
94
|
+
Ask: **what invariant did you surface that does not belong in this docblock but should be captured as a wiki rule?**
|
|
95
|
+
|
|
96
|
+
The discover phase is the highest-leverage step and the easiest to skip. While writing contracts, you often surface unstated conventions — a parameter ordering rule, a naming convention, a trust assumption that applies across many files. These do not belong in any one docblock but are the kind of knowledge that evaporates on `/clear`.
|
|
97
|
+
|
|
98
|
+
For each discovered convention:
|
|
99
|
+
|
|
100
|
+
- Decide where it belongs (project wiki page, shared skill, a `README.md` next to the code)
|
|
101
|
+
- Write it down (or offer to write it down — the user MAY want to phrase it themselves)
|
|
102
|
+
- Cite the file/line that surfaced it
|
|
103
|
+
|
|
104
|
+
**Output:** zero or more wiki-bound captures, written or queued.
|
|
105
|
+
|
|
106
|
+
Examples of the kind of invariants this phase catches:
|
|
107
|
+
|
|
108
|
+
- A `$siteId`-first-parameter rule across helper functions that was implicit in the code but not documented
|
|
109
|
+
- A trust assumption that strings labeled "internal" never reach SQL without going through a quoter — true in practice, undocumented
|
|
110
|
+
- A "delete flag is `'Y'`, never `null`" convention that the code relies on but no test or comment names
|
|
111
|
+
|
|
112
|
+
### Phase 6: Verify
|
|
113
|
+
|
|
114
|
+
For each clause written in phase 3 (and not cut in phase 4), ask: **what tests does this clause demand that the test suite does not yet have?**
|
|
115
|
+
|
|
116
|
+
Walk the clauses:
|
|
117
|
+
|
|
118
|
+
- Each `@requires` → name the missing input-boundary tests (negative inputs, NaN, empty, max, etc.)
|
|
119
|
+
- Each `@ensures` → name the missing output-property assertions
|
|
120
|
+
- Each `@invariant` → name the missing multi-step tests that exercise the invariant across call sequences
|
|
121
|
+
- Each `@trusted` → name the missing fuzz cases on untrusted-input simulation
|
|
122
|
+
- Each `@time O(...)` / `Θ(...)` → name the missing scaling benchmark (assert the bound holds at n=10, n=100, n=1000)
|
|
123
|
+
- Each `@space O(...)` / `Θ(...)` → name the missing memory-stability test across input sizes
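
The clause walk can be sketched as a lookup from tag to the kind of test it demands. This is a rough illustration only: the function name `missing_tests`, the regular expression, and the sample docblock are assumptions, and the output lists candidate tests without checking whether the suite already has them.

```python
import re

# Clause tag -> kind of test the clause demands (wording follows the list above).
TEST_KIND = {
    "requires": "input-boundary tests",
    "ensures": "output-property assertions",
    "invariant": "multi-step tests across call sequences",
    "trusted": "fuzz cases on untrusted-input simulation",
    "time": "scaling benchmark",
    "space": "memory-stability test across input sizes",
}

# Captures "@tag rest-of-line"; tags with no body (such as @pure) do not match.
TAG = re.compile(r"@(\w+)\s+(.*)")

def missing_tests(docblock: str) -> list[str]:
    """List one candidate test per contract clause found in the docblock."""
    gaps = []
    for line in docblock.splitlines():
        match = TAG.search(line)
        if match and match.group(1) in TEST_KIND:
            gaps.append(f"{TEST_KIND[match.group(1)]}: {match.group(2).strip()}")
    return gaps

# Hypothetical docblock used only for illustration.
doc = """
 * @requires items.length > 0
 * @ensures result >= 0
 * @time Θ(n)
 * @pure
"""
print(missing_tests(doc))
```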
|
|
124
|
+
|
|
125
|
+
This phase is the test-gap report for the contract. It is parallel to phase 5: phase 5 captures wiki-bound knowledge gaps, phase 6 captures test-bound knowledge gaps. Complexity-claim gaps are especially load-bearing — an unverified `@time Θ(n)` claim drifts to `O(n²)` silently across refactors.
|
|
126
|
+
|
|
127
|
+
The output of phase 6 is a list of tests to write. **Do not write them in this loop** — that's a separate task. The list is the artifact; whether the user writes the tests now or later is their call.
|
|
128
|
+
|
|
129
|
+
**Output:** a list of missing tests, one per clause, ordered by which would catch the most likely class of regression first.
|
|
130
|
+
|
|
131
|
+
## Convention authority
|
|
132
|
+
|
|
133
|
+
The PHP-side convention for contract docblocks lives in `wiki/PHP-Conventions.md § Contract Docblocks for Load-Bearing Methods` (in the originating repo's wiki). MUST cite that page rather than duplicating its content here. This skill's role is the *workflow*; the *convention* is a per-repo artifact.
|
|
134
|
+
|
|
135
|
+
For TS and Python, the conventions are folded directly into `SKILL.md` and `vocabulary.md` because those languages do not yet have a per-repo conventions page that lives elsewhere.
|
|
136
|
+
|
|
137
|
+
## Canonical examples
|
|
138
|
+
|
|
139
|
+
Reference the existing contracted methods so an agent can model new work on a verified example:
|
|
140
|
+
|
|
141
|
+
- `packages/roark/src/MathUtils.php::normalizeL2`
|
|
142
|
+
- `packages/roark/src/StringUtils.php::makeKey`
|
|
143
|
+
- `packages/roark/src/Helpers/CacheHelper.php` (file-level overview plus `tryLock`, `lock`, `isLocked`, `get`, `getLockKeyByName`)
|
|
144
|
+
- `packages/roark/src/Enums/CacheTtl.php` (file-level overview plus `toSeconds`)
|
|
145
|
+
- `packages/roark/src/Enums/StatusChar.php` (file-level overview plus `description`)
|
|
146
|
+
- `packages/open_search/src/Helpers/ItemsHelper.php` (file-level overview plus `applyIndexAction`, `getIndexAction`, `modifyIndex`)
|
|
147
|
+
|
|
148
|
+
(All paths are in the `simpleapps-com/augur` repo. Cross-repo read may be needed; the WIP-side `code-contracts-cluster.md` notes this as an open item.)
|
|
149
|
+
|
|
150
|
+
## Reference
|
|
151
|
+
|
|
152
|
+
- `SKILL.md` — the writing skill (auto-triggered on load-bearing edits) and the three-mechanism framing
|
|
153
|
+
- `vocabulary.md` — full glyph palette + clause-first derivation table + per-language examples
|
|
154
|
+
- `audit.md` — the audit modes (per-file + session-aware)
|
|
@@ -0,0 +1,173 @@
|
|
|
1
|
+
# Code Contracts — Audit
|
|
2
|
+
|
|
3
|
+
The audit modes invoked by `/contract-audit`. Two modes; pick by argument:
|
|
4
|
+
|
|
5
|
+
| Mode | Trigger | What it does |
|------|---------|--------------|
| **Per-file** | `/contract-audit <file>` | Walks the file for security seams and contract gaps; proposes contracts and a test-gap report |
| **Session-aware** | `/contract-audit` (no args) | Scans files **read or written in this session**; for each, evaluates load-bearing-ness, existing contracts, and recommendation |
|
|
9
|
+
|
|
10
|
+
The session-aware mode is the high-leverage one: it surfaces contract candidates *while context is fresh*, instead of relying on the user to remember to run the audit later.
|
|
11
|
+
|
|
12
|
+
## Frame the audit as bug discovery, not documentation
|
|
13
|
+
|
|
14
|
+
The value of the exercise is that the discipline of writing contracts forces real bugs to surface; the annotations themselves are a secondary artifact. Every audit MUST produce a *report*, not auto-applied annotations, so that the human stays in the loop on what to annotate vs. fix.
|
|
15
|
+
|
|
16
|
+
## Per-file audit
|
|
17
|
+
|
|
18
|
+
For a given file, walk these four questions in order:
|
|
19
|
+
|
|
20
|
+
### 1. Where do types not capture a runtime constraint?
|
|
21
|
+
|
|
22
|
+
Look for:
|
|
23
|
+
|
|
24
|
+
- `string` parameters inlined into SQL, HTML, shell, or other security-sensitive sinks — trusted vs. untrusted origin is invisible
|
|
25
|
+
- Coupled optional fields (e.g. `afterKey` requires `afterKeyColumn`)
|
|
26
|
+
- Non-null assertions (`!`) whose safety depends on a runtime guard elsewhere
|
|
27
|
+
- Sentinel values (`-1`, `0`, `999`, empty string) carrying domain meaning the type cannot express
|
|
28
|
+
- Numbers that should be ℕ, ℤ⁺, or a refinement (e.g. a percentage in `[0, 100]`)
|
|
29
|
+
- Loops over inputs of unknown size — implicit `O(n)` or `O(n²)` claims with no contract to make the cost visible
|
|
30
|
+
|
|
31
|
+
For each finding, propose a contract clause AND the type-level fix that would make the clause unnecessary (the order-of-preference rule from `SKILL.md`). For complexity findings, propose `@time` / `@space` clauses *and* flag the function as a refactor candidate if the asymptotic class is plausibly improvable (e.g., nested-loop O(n²) where a hash-keyed O(n) is reachable).
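
As an illustration of pairing a clause with the type-level fix that makes it unnecessary, here is a minimal Python sketch for the percentage case. The names `Percentage`, `mk_percentage`, and `apply_discount` are hypothetical. Note that `typing.NewType` is enforced only by static checkers, so the guarantee assumes callers obtain values through the constructor.

```python
from typing import NewType

# A refinement of float: values are meant to lie in [0, 100].
Percentage = NewType("Percentage", float)

def mk_percentage(value: float) -> Percentage:
    """Smart constructor: the only sanctioned way to obtain a Percentage."""
    # NaN fails both comparisons, so it is rejected here as well.
    if not (0.0 <= value <= 100.0):
        raise ValueError(f"percentage out of range: {value!r}")
    return Percentage(value)

def apply_discount(price_cents: int, pct: Percentage) -> int:
    # No "@requires 0 <= pct <= 100" clause is needed: the parameter type carries it.
    return round(price_cents * (100.0 - pct) / 100.0)

print(apply_discount(1000, mk_percentage(25.0)))
```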
|
|
32
|
+
|
|
33
|
+
### 2. Does the function's name and docstring match its actual behavior?
|
|
34
|
+
|
|
35
|
+
Read the function name, its docstring (if any), and its body. Look for:
|
|
36
|
+
|
|
37
|
+
- Plural names returning singular results (e.g. `guardrailPlugins` returning one plugin)
|
|
38
|
+
- Verbs that misstate the operation (`getX` that mutates, `isX` that returns a string)
|
|
39
|
+
- Docstrings describing intent that the body doesn't enforce
|
|
40
|
+
- Functions that do N+1 things when the name promises 1
|
|
41
|
+
|
|
42
|
+
For each finding, decide: rename the function OR rewrite the body. Drift between name and behavior is a defect by the same rule that applies to contracts.
|
|
43
|
+
|
|
44
|
+
### 3. Does this file produce values consumed elsewhere with implicit contracts?
|
|
45
|
+
|
|
46
|
+
This is the **highest-value finding** and the hardest to surface. Walk every value the file emits — return values, exported constants, structures pushed into shared state, strings written to global queries, fields assigned on shared objects.
|
|
47
|
+
|
|
48
|
+
For each emitted value, ask:
|
|
49
|
+
|
|
50
|
+
- Where is it consumed? (Other files in the repo)
|
|
51
|
+
- What does the consumer assume about its shape, format, sanitization, or origin?
|
|
52
|
+
- Is that assumption written down anywhere?
|
|
53
|
+
|
|
54
|
+
The producer/consumer pair has an *implicit contract* in the gap. Surface it. Either annotate the producer with a `@trusted`/`@ensures` clause, or harden the consumer to defend against violation. Cross-file trust boundaries are where the worst bugs live.
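
The "harden the consumer" option might look like the following sketch, in which a consumer that inlines a column name into SQL checks it against an allowlist instead of trusting the producer. The names `ALLOWED_SORT_COLUMNS` and `build_order_by` and the column set are assumptions made for illustration.

```python
# Hypothetical allowlist; a real one would come from the schema.
ALLOWED_SORT_COLUMNS = frozenset({"created_at", "name", "price"})

def build_order_by(column: str) -> str:
    """Consumer side: inlines `column` into SQL, so it defends the boundary itself."""
    if column not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"untrusted sort column: {column!r}")
    return f"ORDER BY {column} ASC"

print(build_order_by("name"))
```

With the check in place, the implicit contract "the producer only emits known column names" becomes an explicit failure at the boundary rather than an assumption.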
|
|
55
|
+
|
|
56
|
+
### 4. For each contract this audit produces, what tests does the contract demand that the test suite does not yet have?
|
|
57
|
+
|
|
58
|
+
This is the test-gap report. For each clause from questions 1–3:
|
|
59
|
+
|
|
60
|
+
- `@requires` → list missing input-boundary tests (negative inputs, NaN, empty, max, etc.)
|
|
61
|
+
- `@ensures` → list missing output-property assertions
|
|
62
|
+
- `@invariant` → list missing multi-step tests that exercise the invariant across calls
|
|
63
|
+
- `@trusted` → list missing fuzz cases on untrusted-input simulation
|
|
64
|
+
- `@time O(...)` / `@time Θ(...)` → list missing scaling benchmarks (assert the bound holds at n=10, n=100, n=1000)
|
|
65
|
+
- `@space O(...)` / `@space Θ(...)` → list missing memory-stability tests across input sizes
|
|
66
|
+
|
|
67
|
+
The test-gap pass turns the audit from "find bugs" into "find bugs *and* find the test that would have caught them next time." Complexity-claim gaps are especially load-bearing — without scaling tests, the claim is unfalsifiable and an `O(n²)` regression slips into an `O(n)`-claimed function unchecked.
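
A scaling check for a `Θ(n)` claim could be sketched as follows. To keep the result deterministic, this version counts loop iterations instead of measuring wall-clock time, which is an assumption of the sketch and not something the audit prescribes. The function `total_cents`, its injected counter, and the factor of 2 are hypothetical.

```python
def total_cents(items, counter):
    """Sum qty * unit_price, recording one step per line item."""
    total = 0
    for qty, unit_price in items:
        counter["steps"] += 1  # one unit of work per element
        total += qty * unit_price
    return total

def steps_for(n):
    """Run the function on an input of size n and return the work it recorded."""
    counter = {"steps": 0}
    total_cents([(1, 100)] * n, counter)
    return counter["steps"]

sizes = [10, 100, 1000]
steps = [steps_for(n) for n in sizes]
ratios = [s / n for s, n in zip(steps, sizes)]

# For a linear function, steps per element stays roughly constant as n grows.
# A quadratic regression would make the ratio grow by about 10x per size step.
assert max(ratios) <= 2 * min(ratios)
```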
|
|
68
|
+
|
|
69
|
+
## When NOT to annotate
|
|
70
|
+
|
|
71
|
+
This section MUST appear in every audit report, before the per-section findings. Without it, the audit produces JSDoc bloat instead of signal.
|
|
72
|
+
|
|
73
|
+
Skip annotations on:
|
|
74
|
+
|
|
75
|
+
- **Pure transforms whose contract is fully expressed in the type signature.** A function `(n: NonZero) => number` already says `@requires n !== 0` in the type — no comment needed.
|
|
76
|
+
- **Test files.** The test name is the spec. Annotating tests adds noise.
|
|
77
|
+
- **Generated code.** Don't annotate; the generator is the spec.
|
|
78
|
+
- **One-off scripts and throwaway code.** Not load-bearing; not worth the context cost.
|
|
79
|
+
- **Glue code, plumbing, framework adapters, UI components, getters/setters.** Same scope rule as `SKILL.md`.
|
|
80
|
+
|
|
81
|
+
If the comment would just restate the type, it MUST NOT be added. The audit's signal is bugs surfaced and gaps in tests, not annotation count.
|
|
82
|
+
|
|
83
|
+
## Output format
|
|
84
|
+
|
|
85
|
+
Every audit produces a markdown report with this structure:
|
|
86
|
+
|
|
87
|
+
```markdown
# Audit: <file or session>

## Scope
<file paths audited; load-bearing assessment per file>

## When NOT to annotate (carryover)
<note any patterns in this audit that fall in the skip list — pure transforms whose type already says it, tests, generated, etc.>

## Findings

### 1. Types that don't capture runtime constraints
<list per-file findings; for each, propose contract + type-level fix>

### 2. Name / docstring drift
<list per-file findings; for each, propose rename OR body fix>

### 3. Cross-file trust boundaries (highest-value)
<list producer/consumer pairs with implicit contracts; propose @trusted/@ensures on the producer or hardening on the consumer>

### 4. Test gaps
<list per-finding tests that the proposed contracts demand and the suite does not have>

## Suggested next steps
<ordered list — usually: fix bugs found in section 1, address drift in 2, harden boundaries in 3, write missing tests in 4>
```
|
|
113
|
+
|
|
114
|
+
The audit is **read-only**. MUST NOT auto-apply contracts or modify code or tests. The human reviews the report and decides what to act on.
|
|
115
|
+
|
|
116
|
+
## Session-aware mode
|
|
117
|
+
|
|
118
|
+
When `/contract-audit` is invoked with no arguments, run a session-aware scan instead of per-file.
|
|
119
|
+
|
|
120
|
+
### Discover the candidate set
|
|
121
|
+
|
|
122
|
+
Look at the files **read or written in this session** (using session context, recent tool-call history, or open editor state). Filter to:
|
|
123
|
+
|
|
124
|
+
- Source files in `repo/` (not test files, not config, not generated)
|
|
125
|
+
- Files with at least one substantive read or edit this session
|
|
126
|
+
- Files large enough to plausibly contain load-bearing functions (skip files < ~50 lines unless the agent has reason to believe they're security-critical)
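
The filter can be approximated with a small predicate. Every heuristic here is an assumption made for illustration: the test-file naming patterns, the config extensions, and the function name `is_candidate`. The "substantive read or edit" and "generated" criteria are not modelled.

```python
from pathlib import PurePosixPath

CONFIG_SUFFIXES = {".json", ".yaml", ".yml", ".toml", ".lock"}
TEST_DIRS = {"tests", "__tests__"}

def is_candidate(path: str, line_count: int, security_critical: bool = False) -> bool:
    """Decide whether a session file belongs in the audit candidate set."""
    p = PurePosixPath(path)
    if p.parts[:1] != ("repo",):
        return False  # only source files under repo/
    name = p.name
    if ".test." in name or ".spec." in name or name.startswith("test_"):
        return False  # test file by name
    if TEST_DIRS.intersection(p.parts):
        return False  # test file by directory
    if p.suffix in CONFIG_SUFFIXES:
        return False  # configuration, not source
    # Small files are skipped unless flagged as security-critical.
    return line_count >= 50 or security_critical

print(is_candidate("repo/src/order.ts", 120))
```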
|
|
127
|
+
|
|
128
|
+
### Score each file
|
|
129
|
+
|
|
130
|
+
For each candidate, evaluate:
|
|
131
|
+
|
|
132
|
+
| Dimension | Question |
|-----------|----------|
| Load-bearing-ness | Does this file contain money math, auth, concurrency, state machines, security boundaries, or non-trivial algorithms? |
| Existing contracts | Are there already `@requires` / `@ensures` / `@invariant` / `@trusted` clauses? |
| Worth adding | If no contracts present, would adding them surface a bug or fill a test gap? |
|
|
137
|
+
|
|
138
|
+
The agent has been in the file's context this session — use that. Don't re-derive load-bearing-ness from cold.
|
|
139
|
+
|
|
140
|
+
### Output: ranked recommendations
|
|
141
|
+
|
|
142
|
+
Produce a ranked list:
|
|
143
|
+
|
|
144
|
+
```markdown
# Session-aware contract audit — N candidates

## Recommended (highest leverage)

1. **`<path>`** — <one-line rationale: load-bearing reason + observed gap>
   Suggested action: <add contracts to function X; harden the boundary at line Y; etc.>

2. **`<path>`** — ...

## Already covered

<files with contracts already in place — note any drift the agent observed during the session>

## Skip (not load-bearing)

<files in the candidate set that don't meet the scope rule; brief reason>
```
|
|
162
|
+
|
|
163
|
+
Keep the rationale short. The user picks the top item and runs `/contract-audit <path>` to do a per-file deep dive.
|
|
164
|
+
|
|
165
|
+
### Why this mode pays off
|
|
166
|
+
|
|
167
|
+
The user does not have to remember to run an audit. The agent surfaces candidates while the context is fresh. Mechanism #3 (agent-reported value) is put into practice here: the agent saw the file in this session and can score it; a cold audit cannot.
|
|
168
|
+
|
|
169
|
+
## Reference
|
|
170
|
+
|
|
171
|
+
- `SKILL.md` — the writing skill (auto-triggered on load-bearing edits)
|
|
172
|
+
- `vocabulary.md` — full glyph palette + clause-first derivation table + per-language examples
|
|
173
|
+
- `apply.md` — six-phase loop for adding contracts to existing code (when this audit recommends "add contracts to function X")
|
|
@@ -0,0 +1,196 @@
|
|
|
1
|
+
# Code Contracts — Vocabulary Reference
|
|
2
|
+
|
|
3
|
+
The full glyph palette, ASCII gloss patterns, clause-first derivation table, and per-language examples. Loaded by `SKILL.md` on demand.
|
|
4
|
+
|
|
5
|
+
## Glyph palette
|
|
6
|
+
|
|
7
|
+
A small, opinionated set. MUST stay narrow — each glyph the agent emits should be one a reader has seen elsewhere in this codebase, not a novelty.
|
|
8
|
+
|
|
9
|
+
| Family | Glyphs | ASCII gloss |
|--------|--------|-------------|
| Logic | `∀`, `∃`, `∧`, `∨`, `¬`, `⇒`, `⇔` | forall, exists, and, or, not, implies, iff |
| Comparison | `≤`, `≥`, `≠`, `≡`, `≜` | <=, >=, !=, ==, := (definition) |
| Sets | `∈`, `∉`, `⊆`, `⊂`, `∪`, `∩`, `∅` | in, not in, subset of, proper subset, union, intersection, empty |
| Numbers | `ℕ`, `ℤ`, `ℚ`, `ℝ` | natural, integer, rational, real |
| Functions | `→`, `↦`, `∘` | function-of, mapsto, compose |
| Brackets | `⟨ ⟩`, `⟦ ⟧` | tuple, denotation |
| Complexity | `Θ`, `O`, `Ω`, `ω`, `~`, superscripts (`²`, `³`, `ⁿ`) | tight bound, upper bound, lower bound, strict lower bound, asymptotic equivalence, exponentiation |
| Limits | `→ ∞`, `lim` | "as n grows without bound" |
|
|
19
|
+
|
|
20
|
+
### What stays ASCII
|
|
21
|
+
|
|
22
|
+
- Tag prefixes (`@requires`, `@ensures`, `@invariant`, `@trusted`, `@pure`) — for tooling compatibility (Psalm, PHPStan, JSDoc tooling, mypy, pyright)
|
|
23
|
+
- Operators inside type signatures (TS/PHP/Python syntax — `&`, `|`, `:`, `?`, etc.)
|
|
24
|
+
- Inline gloss text on every clause (the dual-audience pattern)
|
|
25
|
+
|
|
26
|
+
Unicode lives inside the *clause body*. ASCII gloss lives on the same or next line.
|
|
27
|
+
|
|
28
|
+
### Common glyph confusions to avoid
|
|
29
|
+
|
|
30
|
+
- `⇒` (implication) vs `→` (function arrow) vs `↦` (mapsto). Use `⇒` for logical implication, `→` for "function from A to B," `↦` for "x mapsto f(x)."
|
|
31
|
+
- `≡` (logical equivalence / equal-by-definition in some traditions) vs `≜` (definition). Prefer `≜` for definitions, `≡` for "same up to" relations.
|
|
32
|
+
- `⊆` (subset-or-equal) vs `⊂` (proper subset). Most contract uses want `⊆`.
|
|
33
|
+
- `Θ(f)` (tight bound) vs `O(f)` (upper bound only) vs `Ω(f)` (lower bound). Most code calling itself "O(n log n)" is actually `Θ(n log n)`: the bound is tight, not just an upper limit. SHOULD use `Θ` when the bound is tight; an `O` claim is still true in that case but less informative, so reserve `O` for functions that may grow more slowly than f. Loose `O` claims prime the wrong reasoning ("this is at most O(n²)" when the agent should think "this is exactly Θ(n log n)").
|
|
34
|
+
|
|
35
|
+
## Pairing patterns
|
|
36
|
+
|
|
37
|
+
Pick one and apply consistently within a file. **The gloss MUST carry assumption-naming, not just translation** (see `SKILL.md` § "Two surfaces per clause"). The Unicode formal surface activates careful reasoning in readers with bandwidth to parse it; the prose gloss carries the same content for readers without that bandwidth, plus the assumptions the formal notation cannot express.
|
|
38
|
+
|
|
39
|
+
### Pattern A — inline assumption-naming gloss after the formal clause
|
|
40
|
+
|
|
41
|
+
```ts
@requires q ≥ 0        // q is non-negative; assumes caller validated input
@ensures result ∈ ℕ    // result is a natural number; overflow is caller's responsibility
```
|
|
45
|
+
|
|
46
|
+
Denser. Reads as a column of formal clauses with the prose-and-assumptions as a sidebar.
|
|
47
|
+
|
|
48
|
+
### Pattern B — bracketed gloss on the same line
|
|
49
|
+
|
|
50
|
+
```ts
@requires q ≥ 0  (q is non-negative; assumes caller validated input)
@ensures ∀ x ∈ items. x.qty ≥ 0  (every item has non-negative qty; empty array is allowed)
```
|
|
54
|
+
|
|
55
|
+
Less dense. Reads more like prose. Better when the clauses are short and the assumption-naming is short.
|
|
56
|
+
|
|
57
|
+
### What the gloss is NOT
|
|
58
|
+
|
|
59
|
+
A redundant translation. `@requires q ≥ 0 // q >= 0` is the wrong gloss: both halves say the same thing, and neither names what is assumed. The gloss MUST add at least one of:
|
|
60
|
+
|
|
61
|
+
- An assumption the formal notation does not express (no NaN, integer not float, validated upstream)
|
|
62
|
+
- A side condition (locking, transactional context, ordering)
|
|
63
|
+
- A boundary the contract treats as out-of-scope (overflow, empty input, sentinels)
|
|
64
|
+
|
|
65
|
+
If the gloss is exactly `@requires q ≥ 0 // q >= 0`, drop it — it's bookkeeping, not load-bearing prose.
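
A crude check for the redundant-translation case can be sketched by translating glyphs to their ASCII gloss and comparing. This is illustrative only: the name `is_redundant_gloss` is hypothetical, the glyph map covers only part of the Comparison row of the palette, and a gloss that paraphrases in words without adding an assumption would not be caught.

```python
# Subset of the Comparison row of the glyph palette.
GLYPH_TO_ASCII = {"≥": ">=", "≤": "<=", "≠": "!=", "≡": "=="}

def strip_spaces(text: str) -> str:
    """Remove all whitespace so spacing differences do not matter."""
    return "".join(text.split())

def is_redundant_gloss(clause: str, gloss: str) -> bool:
    """True when the gloss is only the ASCII spelling of the formal clause."""
    translated = clause
    for glyph, ascii_form in GLYPH_TO_ASCII.items():
        translated = translated.replace(glyph, ascii_form)
    return strip_spaces(translated) == strip_spaces(gloss)

print(is_redundant_gloss("q ≥ 0", "q >= 0"))
print(is_redundant_gloss("q ≥ 0", "q is non-negative; assumes caller validated input"))
```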
|
|
66
|
+
|
|
67
|
+
## Clause-first derivation table
|
|
68
|
+
|
|
69
|
+
Write the clause first. The encoding falls out of the clause shape.
|
|
70
|
+
|
|
71
|
+
| Clause shape | Tier | Encoding |
|--------------|------|----------|
| Refinement: `q : ℕ ∧ q > 0` *(q is a positive natural)* | 1 | Branded type + smart constructor: `type PositiveInt = number & { readonly __brand: 'PositiveInt' }; function mkPositive(n: number): PositiveInt` |
| Sum: `Status ≜ Loaded(Item) ∨ NotFound ∨ CallForPrice` *(tagged union of three states)* | 1 | Discriminated union: `{ kind: 'loaded'; item: Item } \| { kind: 'not-found' } \| { kind: 'call-for-price' }` |
| Predicate: `∀ x ∈ xs. p(x)` *(every x in xs satisfies p)* | 2 | `xs.every(p)` |
| Predicate: `∃ x ∈ xs. p(x)` *(some x in xs satisfies p)* | 2 | `xs.some(p)` |
| Effect: `pure`, `mutates X`, `throws Y` | 2 | `@psalm-pure`, `@psalm-mutation-free`, `eslint-plugin-functional` |
| Otherwise (algebraic law, multi-step protocol invariant, external state) | 3 | Formal-prose annotation in JSDoc / docstring |
|
|
79
|
+
|
|
80
|
+
The discipline: write the clause **before** the function. Reversing the order — writing the function and back-fitting a contract — bypasses the cognitive work and produces tautological annotations.
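
The table's encodings are written in TypeScript and PHP tooling terms. A rough Python counterpart for the sum and predicate rows might look like the sketch below. The class names and the `describe` function are hypothetical, and frozen dataclasses with `isinstance` checks are one possible encoding, not one the table prescribes.

```python
from dataclasses import dataclass
from typing import Union

# Sum: Status ≜ Loaded(Item) ∨ NotFound ∨ CallForPrice
@dataclass(frozen=True)
class Loaded:
    item: str

@dataclass(frozen=True)
class NotFound:
    pass

@dataclass(frozen=True)
class CallForPrice:
    pass

Status = Union[Loaded, NotFound, CallForPrice]

def describe(status: Status) -> str:
    """Handle each variant explicitly; the final branch covers CallForPrice."""
    if isinstance(status, Loaded):
        return f"loaded {status.item}"
    if isinstance(status, NotFound):
        return "not found"
    return "call for price"

qtys = [3, 1, 2]
all_positive = all(q > 0 for q in qtys)  # ∀ q ∈ qtys. q > 0
any_bulk = any(q >= 3 for q in qtys)     # ∃ q ∈ qtys. q ≥ 3
print(describe(Loaded("widget")), all_positive, any_bulk)
```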
|
|
81
|
+
|
|
82
|
+
## Per-language examples
|
|
83
|
+
|
|
84
|
+
### TypeScript
|
|
85
|
+
|
|
86
|
+
```ts
/**
 * Compute order total in cents.
 *
 * @requires items.length > 0  // at least one line item; empty cart is caller's responsibility
 * @requires ∀ i ∈ items. i.qty > 0  // every item has positive qty; assumes upstream validation
 * @ensures result === Σ(items, i ↦ i.qty × i.unitPrice)
 *   // result is the sum of qty × unitPrice; assumes integer cents, no rounding here
 * @ensures result ∈ ℕ  // result is non-negative; overflow is caller's responsibility
 * @time Θ(n)  // linear in n = items.length
 * @space O(1)  // auxiliary space — accumulator only
 * @pure
 */
function totalCents(items: LineItem[]): number { ... }
```
|
|
101
|
+
|
|
102
|
+
### PHP
|
|
103
|
+
|
|
104
|
+
```php
/**
 * Rebuild the latest snapshot from the event log.
 *
 * @requires $events is sorted ascending by occurredAt
 *   (events are chronological; ordering is caller's responsibility, not validated here)
 *
 * @ensures result ≜ fold($events, replay)
 *   (result is the snapshot replayable from the event log; assumes replay is deterministic)
 *
 * @invariant during fold, accumulator state ≡ replay($events[0..i])
 *   (at each step i, accumulator equals replay of events 0 through i; holds only if events are immutable during fold)
 *
 * @time Θ(n)  // linear fold over events; n = count($events)
 * @space O(1)  // running accumulator, no buffering of events
 * @pure
 *
 * Footgun: $events === ∅ vs $events === [genesis] are not the same precondition.
 *   (empty array means "no genesis"; caller must distinguish from "not yet bootstrapped")
 */
function rebuild(array $events): Snapshot { ... }
```
|
|
126
|
+
|
|
127
|
+
### Python
|
|
128
|
+
|
|
129
|
+
```python
def transfer(src: Account, dst: Account, cents: int) -> None:
    """
    Transfer cents from src to dst, atomically.

    @requires cents > 0  # cents must be positive; zero-amount is caller's responsibility
    @requires src.balance ≥ cents  # sufficient funds; assumes balance was read inside the same lock
    @requires src ≠ dst  # src and dst differ; self-transfer is caller's bug, not ours
    @ensures src.balance ≡ old(src.balance) − cents
        # src.balance decreases by cents; assumes no concurrent mutators outside this lock
    @ensures dst.balance ≡ old(dst.balance) + cents
        # dst.balance increases by cents; same locking assumption
    @invariant src.balance + dst.balance ≡ old(src.balance) + old(dst.balance)
        # total balance is conserved; holds across the atomic boundary, not mid-flight
    @time Θ(1)  # constant — three field updates
    @space Θ(1)
    @mutates src, dst
    """
```
|
|
148
|
+
|
|
149
|
+
## Annotation forms reference
|
|
150
|
+
|
|
151
|
+
- `@requires <precondition>` — must hold of inputs at call time
|
|
152
|
+
- `@ensures <postcondition>` — guaranteed of result / observable state
|
|
153
|
+
- `@invariant <property>` — holds at loop head, between method calls, across state transitions
|
|
154
|
+
- `@trusted <param>` — value inlined into a security-sensitive sink (SQL, HTML, shell); origin must be trusted code, never user input
|
|
155
|
+
- `@pure` — no observable effects (no mutation, no IO, no throws under normal inputs)
|
|
156
|
+
- `@mutates <state>` — names the state mutated (a parameter, a field, a global)
|
|
157
|
+
- `@throws <type>` — names the exceptions that may be raised
|
|
158
|
+
- `@io` — performs IO (filesystem, network, console)
|
|
159
|
+
- `@property <law>` — algebraic law the function satisfies (idempotence, commutativity, associativity, monotonicity)
|
|
160
|
+
- `@time <bound>` — runtime complexity. Use `Θ(...)` for tight bound, `O(...)` for upper bound only, `Ω(...)` for lower bound. Amortized: `@time Θ(1) amortized`. Worst/avg split: `@time worst Θ(n²) avg Θ(n log n)`.
|
|
161
|
+
- `@space <bound>` — auxiliary space complexity (excluding input). Same `Θ`/`O`/`Ω` conventions.
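
An `@property` law is directly checkable. The sketch below tests idempotence, assuming a hypothetical key-normalizing helper; `normalize_key`, `is_idempotent`, and the sample inputs are invented for illustration and do not describe any function in the referenced repos.

```python
def normalize_key(raw: str) -> str:
    """Hypothetical helper: trim, lowercase, and join whitespace-separated parts with '-'."""
    return "-".join(raw.strip().lower().split())

def is_idempotent(f, samples) -> bool:
    """@property idempotence: applying f twice equals applying it once."""
    return all(f(f(x)) == f(x) for x in samples)

samples = ["  Hello World ", "already-normal", "", "MiXeD   Case\tTabs"]
assert is_idempotent(normalize_key, samples)
```

A fixed sample list keeps the check deterministic; a property-based testing library could generate inputs instead, at the cost of a dependency.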
|
|
162
|
+
|
|
163
|
+
## Glossary
|
|
164
|
+
|
|
165
|
+
A reader unfamiliar with the symbols can use this section. Over repeated exposure these become routine.
|
|
166
|
+
|
|
167
|
+
| Symbol | Reads as | Means |
|--------|----------|-------|
| `∀ x ∈ S. P(x)` | "for all x in S, P of x" | every element of S satisfies the predicate P |
| `∃ x ∈ S. P(x)` | "there exists x in S such that P of x" | at least one element of S satisfies P |
| `A ⇒ B` | "A implies B" | if A then B |
| `A ⇔ B` | "A iff B" | A holds exactly when B holds |
| `A ∧ B` | "A and B" | both A and B hold |
| `A ∨ B` | "A or B" | at least one of A, B holds |
| `¬A` | "not A" | A does not hold |
| `x ∈ S` | "x is in S" | x is an element of S |
| `S ⊆ T` | "S is a subset of T" | every element of S is in T |
| `S ∪ T` | "S union T" | elements in S or T (or both) |
| `S ∩ T` | "S intersect T" | elements in both S and T |
| `∅` | "empty" | the empty set or empty collection |
| `ℕ` | "naturals" | non-negative integers (0, 1, 2, …) |
| `ℤ` | "integers" | …, -1, 0, 1, … |
| `ℝ` | "reals" | the real numbers (note IEEE-754 ≠ ℝ) |
| `f : A → B` | "f from A to B" | f is a function from set A to set B |
| `x ↦ f(x)` | "x mapsto f of x" | the function that maps x to f(x) |
| `≜` | "is defined as" | left side is defined to mean the right |
| `≡` | "equivalent to" | the two are interchangeable in this context |
| `Σ(xs, f)` | "sum of f over xs" | sum of f(x) for each x in xs |
| `Θ(f(n))` | "Theta of f of n" | tight asymptotic bound — function grows at the same rate as f(n), up to constant factors |
| `O(f(n))` | "Big-O of f of n" | upper bound only — function grows at most as fast as f(n) |
| `Ω(f(n))` | "Big-Omega of f of n" | lower bound — function grows at least as fast as f(n) |
| `ω(f(n))` | "little-omega of f of n" | strictly faster than f(n) |
| `f ~ g` | "f is asymptotic to g" | f(n)/g(n) → 1 as n → ∞ |
| `n → ∞` | "as n grows without bound" | the limiting case for asymptotic claims |
|
|
195
|
+
|
|
196
|
+
The glossary is intentionally short. Other symbols MAY be added when load-bearing across multiple files; bare additions for a single use SHOULD be avoided.
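As a rough illustration of how several glossary entries might translate into executable checks, here is a minimal Python sketch. The sets `S` and `T`, the predicate `is_even`, and the helpers `implies` and `sigma` are hypothetical names chosen for this example; they are not part of the skill files.

```python
import math

# Hypothetical example sets, chosen only for illustration.
S = {2, 4, 6}
T = {2, 4, 6, 8}


def is_even(x: int) -> bool:
    """Predicate P used in the quantifier examples."""
    return x % 2 == 0


def implies(a: bool, b: bool) -> bool:
    """A ⇒ B is false only when A holds and B does not."""
    return (not a) or b


def sigma(xs, f):
    """Σ(xs, f): sum of f(x) for each x in xs."""
    return sum(f(x) for x in xs)


# ∀ x ∈ S. P(x) corresponds to all(...)
all_even = all(is_even(x) for x in S)
# ∀ over ∅ is vacuously true
all_over_empty = all(is_even(x) for x in set())
# ∃ x ∈ S. P(x) corresponds to any(...)
some_above_five = any(x > 5 for x in S)

# S ⊆ T, S ∪ T, S ∩ T map onto Python's set operators
subset = S <= T
union = S | T
intersection = S & T

# Σ(xs, f) with f = x ↦ x²
total = sigma([1, 2, 3], lambda x: x * x)

# IEEE-754 ≠ ℝ: exact equality fails where real arithmetic would succeed
float_sum_exact = (0.1 + 0.2 == 0.3)
float_sum_close = math.isclose(0.1 + 0.2, 0.3)
```

The last two lines show why the glossary flags the gap between `ℝ` and IEEE-754: a contract stated over the reals may need a tolerance when checked against floats.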
|
|
@@ -148,30 +148,30 @@ Every project SHOULD configure `.claude/settings.local.json` with these deny rul
|
|
|
148
148
|
},
|
|
149
149
|
"permissions": {
|
|
150
150
|
"allow": [
|
|
151
|
-
"Bash(
|
|
151
|
+
"Bash(basename:*)",
|
|
152
|
+
"Bash(dirname:*)",
|
|
153
|
+
"Bash(find:*)",
|
|
154
|
+
"Bash(grep:*)",
|
|
152
155
|
"Bash(ls:*)",
|
|
153
|
-
"Bash(
|
|
156
|
+
"Bash(lsof:*)",
|
|
154
157
|
"Bash(md5:*)",
|
|
155
158
|
"Bash(md5sum:*)",
|
|
159
|
+
"Bash(pnpm:*)",
|
|
160
|
+
"Bash(pwd:*)",
|
|
156
161
|
"Bash(readlink:*)",
|
|
162
|
+
"Bash(rg:*)",
|
|
163
|
+
"Bash(wc:*)",
|
|
157
164
|
"Bash(which:*)",
|
|
158
|
-
"Bash(basename:*)",
|
|
159
|
-
"Bash(dirname:*)",
|
|
160
|
-
"Bash(pwd:*)",
|
|
161
|
-
"Bash(lsof:*)",
|
|
162
165
|
"mcp__plugin_simpleapps_augur-api__*"
|
|
163
166
|
],
|
|
164
167
|
"deny": [
|
|
165
168
|
"Bash(awk:*)",
|
|
166
169
|
"Bash(cat:*)",
|
|
167
170
|
"Bash(cd:*)",
|
|
168
|
-
"Bash(find:*)",
|
|
169
171
|
"Bash(for:*)",
|
|
170
|
-
"Bash(grep:*)",
|
|
171
172
|
"Bash(head:*)",
|
|
172
173
|
"Bash(kill:*)",
|
|
173
174
|
"Bash(pkill:*)",
|
|
174
|
-
"Bash(rg:*)",
|
|
175
175
|
"Bash(sed:*)",
|
|
176
176
|
"Bash(sleep:*)",
|
|
177
177
|
"Bash(tail:*)",
|
|
@@ -187,16 +187,17 @@ Why each is denied:
|
|
|
187
187
|
- **`awk`**: Use the Edit tool instead.
|
|
188
188
|
- **`cat`**: Use the Read tool instead.
|
|
189
189
|
- **`cd`**: MUST NOT use in any Bash command, including compound commands (`cd /path && git`). Use `git -C repo` for git, path arguments for everything else. Compound cd+git commands trigger an unblockable Claude Code security prompt that interrupts the user even when `cd` is denied.
|
|
190
|
-
- **`
|
|
191
|
-
- **`for`**: Shell loops are unnecessary; use dedicated tools or make multiple tool calls instead.
|
|
192
|
-
- **`grep`**: Use the Grep tool instead.
|
|
190
|
+
- **`for`**: Shell loops are unnecessary; make multiple tool calls instead.
|
|
193
191
|
- **`head`/`tail`**: Use the Read tool with `offset` and `limit` parameters instead.
|
|
194
192
|
- **`kill`/`pkill`**: Use `TaskStop` to manage background processes. `TaskStop` cleanly shuts down the task and updates Claude Code's internal tracking.
|
|
195
|
-
- **`rg`**: Use the Grep tool instead (it uses ripgrep internally).
|
|
196
193
|
- **`sed`**: Use the Edit tool instead.
|
|
197
194
|
- **`sleep`**: Unnecessary; use proper sequencing or background tasks.
|
|
198
195
|
- **`Edit(~/.claude/plugins/**)` / `Write(~/.claude/plugins/**)`**: The installed plugin tree is a cache. Marketplace updates clobber it. To change plugin behavior, edit the plugin's source repo (e.g., `~/projects/simpleapps/augur-skills/`) instead.
|
|
199
196
|
|
|
197
|
+
Why `find`, `grep`, and `rg` are now ALLOWED (previously denied):
|
|
198
|
+
|
|
199
|
+
Claude Code 2.1.117 removed the dedicated Grep and Glob tools. Search is now done via Bash. Denying `grep`/`find`/`rg` while no dedicated alternative exists makes the agent unable to search anything. These commands are allowed by default; the bash-simplicity skill still applies (one command per call, no shell plumbing).
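Because `find`, `grep`, and `rg` moved from the deny list to the allow list, an older `settings.local.json` could end up listing the same command in both. The following is a minimal sketch of a check for that overlap, assuming rules keep the `Bash(<command>:*)` shape shown above. The helper names `bash_commands` and `conflicting_commands` are hypothetical, and the sketch makes no claim about how Claude Code itself resolves such a conflict.

```python
import re

# Matches rules of the form Bash(<command>:*), as used in the settings above.
RULE = re.compile(r"^Bash\(([^:()]+):\*\)$")


def bash_commands(rules):
    """Return the set of command names named by Bash(<command>:*) rules."""
    names = set()
    for rule in rules:
        match = RULE.match(rule)
        if match:  # non-Bash rules (e.g. mcp__ patterns) are ignored
            names.add(match.group(1))
    return names


def conflicting_commands(permissions):
    """Return, sorted, the commands that appear in both allow and deny."""
    allow = bash_commands(permissions.get("allow", []))
    deny = bash_commands(permissions.get("deny", []))
    return sorted(allow & deny)


# A subset of the updated rules from this release.
updated = {
    "allow": ["Bash(find:*)", "Bash(grep:*)", "Bash(ls:*)", "Bash(rg:*)",
              "mcp__plugin_simpleapps_augur-api__*"],
    "deny": ["Bash(awk:*)", "Bash(cat:*)", "Bash(cd:*)", "Bash(sed:*)"],
}

# A hypothetical stale file that still denies grep after allowing it.
stale = {
    "allow": ["Bash(grep:*)", "Bash(rg:*)"],
    "deny": ["Bash(grep:*)", "Bash(sed:*)"],
}

updated_conflicts = conflicting_commands(updated)
stale_conflicts = conflicting_commands(stale)
```

An empty result for the updated rules and `grep` for the stale ones is the expected outcome; any non-empty result is a prompt to reconcile the file by hand.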
|
|
200
|
+
|
|
200
201
|
## Bin Scripts (PATH)
|
|
201
202
|
|
|
202
203
|
The augur-skills plugin includes shell scripts (`cld`, `cldo`, `tmcld`, etc.) in `plugins/simpleapps/bin/`. When installed via the Claude Code marketplace, these live at:
|
|
@@ -188,12 +188,12 @@ Every wiki on the machine is a local knowledge base. When looking for how someth
|
|
|
188
188
|
2. Pull the latest for all wikis before searching:
|
|
189
189
|
- `git -C {projectRoot}/clients/*/wiki pull` (one call per wiki, not a glob)
|
|
190
190
|
- `git -C {projectRoot}/simpleapps/*/wiki pull`
|
|
191
|
-
3. Search across all wikis with
|
|
192
|
-
- `
|
|
193
|
-
- `
|
|
191
|
+
3. Search across all wikis with Bash `grep`:
|
|
192
|
+
- `grep -rn --include="*.md" "<pattern>" {projectRoot}/clients/*/wiki/`
|
|
193
|
+
- `grep -rn --include="*.md" "<pattern>" {projectRoot}/simpleapps/*/wiki/`
|
|
194
194
|
4. Read the matching pages to get the full context
|
|
195
195
|
|
|
196
|
-
|
|
196
|
+
Discover which projects have wikis with Bash `ls`: `ls -d {projectRoot}/clients/*/wiki` and `ls -d {projectRoot}/simpleapps/*/wiki`.
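The "one call per wiki" rule can be sketched as a small planner that expands the wiki directories first and then emits one argument list per directory. This is only an illustration: the function names are hypothetical, the demo tree (including the `acme` client) is invented, and the sketch builds commands without running them.

```python
import tempfile
from pathlib import Path


def wiki_dirs(project_root):
    """List wiki directories under clients/*/wiki and simpleapps/*/wiki."""
    root = Path(project_root)
    found = []
    for group in ("clients", "simpleapps"):
        found.extend(p for p in root.glob(f"{group}/*/wiki") if p.is_dir())
    return sorted(found)


def pull_commands(project_root):
    """One `git -C <wiki> pull` argument list per wiki, never a glob."""
    return [["git", "-C", str(w), "pull"] for w in wiki_dirs(project_root)]


def search_commands(project_root, pattern):
    """One `grep -rn --include=*.md <pattern> <wiki>` argument list per wiki."""
    return [["grep", "-rn", "--include=*.md", pattern, str(w)]
            for w in wiki_dirs(project_root)]


# Demo on a throwaway tree; the project names are hypothetical.
with tempfile.TemporaryDirectory() as root:
    for rel in ("clients/acme/wiki", "simpleapps/augur-skills/wiki"):
        (Path(root) / rel).mkdir(parents=True)
    demo_pulls = pull_commands(root)
    demo_searches = search_commands(root, "deploy")
    demo_tails = [Path(cmd[2]).relative_to(root).as_posix() for cmd in demo_pulls]
```

Each returned list is a single command with no shell operators, which keeps every call within the simple-Bash tier.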
|
|
197
197
|
|
|
198
198
|
The wikis are kept fresh by `/curate-wiki` runs across projects. Searching locally is instant and requires no internet access; the knowledge is already on the machine.
|
|
199
199
|
|
|
@@ -11,13 +11,14 @@ Do not add features, refactor surrounding code, or "improve" beyond the request.
|
|
|
11
11
|
|
|
12
12
|
## Use the right tool
|
|
13
13
|
|
|
14
|
-
Prefer dedicated tools over Bash equivalents. They are faster, need no permissions, and produce cleaner output:
|
|
14
|
+
Prefer dedicated tools over Bash equivalents when one exists. They are faster, need no permissions, and produce cleaner output:
|
|
15
15
|
- Read not `cat`/`head`/`tail`
|
|
16
|
-
- Grep not `grep`/`rg`
|
|
17
|
-
- Glob not `find`/`ls`
|
|
18
16
|
- Edit not `sed`/`awk`
|
|
17
|
+
- Write not `echo >`/`cat <<EOF`
|
|
19
18
|
|
|
20
|
-
|
|
19
|
+
Search is Bash-only — Claude Code 2.1.117 removed the dedicated Grep and Glob tools. Use `grep -rn`, `rg`, `find`, and `ls` directly via Bash, one command per call (no `-exec`, no piping to `head`).
|
|
20
|
+
|
|
21
|
+
Reserve Bash for searches above and for commands that never had a dedicated tool.
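To make "one command per call" concrete, here is a rough heuristic for spotting shell plumbing before a command is issued. It is a sketch built on Python's `shlex` tokenizer, not the logic Claude Code applies; `is_simple_command` and `OPERATOR_CHARS` are hypothetical names. It errs on the side of caution: a quoted argument that consists only of operator characters (for example `"|"`) is also flagged.

```python
import shlex

# Characters that form shell control or redirection operators, plus backtick.
OPERATOR_CHARS = set("();<>|&`")


def is_simple_command(command: str) -> bool:
    """Heuristic: True when the command has no operators, substitution, or -exec."""
    try:
        lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
        tokens = list(lexer)
    except ValueError:
        # Unbalanced quotes: treat as not simple rather than guess.
        return False
    if not tokens:
        return False
    for token in tokens:
        # A token made only of operator characters is a pipe, redirect, etc.
        if token and all(ch in OPERATOR_CHARS for ch in token):
            return False
        # `find ... -exec` runs a second command, which the text above rules out.
        if token == "-exec":
            return False
    return True
```

For instance, `is_simple_command("grep -rn pattern repo")` returns `True`, while `is_simple_command("pnpm typecheck 2>&1; echo $?")` returns `False` because of the redirect and the `;`.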
|
|
21
22
|
|
|
22
23
|
MUST NOT use `cd` in any Bash command, not even in compound commands like `cd /path && git log`. Use `git -C repo` for git, and path arguments for everything else. The `cd` deny rule does not suppress Claude Code's built-in security prompt for compound cd+git commands, so any `cd` usage will interrupt the user.
|
|
23
24
|
|
|
@@ -29,12 +30,19 @@ Before asking the user for credentials, tokens, siteId, domain, or any site-spec
|
|
|
29
30
|
|
|
30
31
|
When debugging in the browser, MUST check for error overlays (red error pill/badge at the bottom of the page) before guessing at the problem. Click it, read the full error, stack trace, and source location. The answer is almost always right there.
|
|
31
32
|
|
|
32
|
-
##
|
|
33
|
+
## Context discipline
|
|
34
|
+
|
|
35
|
+
Every file read, command output, and subagent response sits in context for the rest of the session. The agent behaviors that matter:
|
|
36
|
+
|
|
37
|
+
- Broad exploration ("where is X wired up", "how does Y work") → delegate to an Explore subagent with a word cap. The entire exploration happens outside your context; only the returned summary costs you tokens. This is the single biggest lever for keeping the main thread slim. The trick is asking for everything you will need up front — file paths, line numbers, surrounding context, edge cases — in one specific request. A complete request yields a complete answer; a vague one forces a second round-trip that erases the saving. See `subagent-briefing.md` for the required briefing elements.
|
|
38
|
+
- Do not re-read files already loaded in this session. Trust the earlier Read.
|
|
39
|
+
- After `grep -n` gives you a line number, Read with `offset`/`limit`, not the whole file.
|
|
40
|
+
- Commands with large output (test runs, build logs, long grep results): redirect to a `tmp/` file, then `grep` that file or Read only the parts you need.
|
|
41
|
+
- Do not duplicate subagent work. If you delegated the search, use the answer — do not re-run the greps inline to verify.
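The targeted-read habit above can be sketched as two small helpers: one that parses a `file:line:text` search hit, and one that turns the line number into a read window. Both function names are hypothetical, and the sketch assumes `offset` is the 1-based line at which reading starts; check the Read tool's actual semantics before relying on the numbers.

```python
def parse_grep_hit(hit: str):
    """Split a `path:line:text` search hit into (path, line_number, text).

    Assumes the path itself contains no colon.
    """
    path, line, text = hit.split(":", 2)
    return path, int(line), text


def read_window(match_line: int, context: int = 20) -> dict:
    """Return offset/limit covering `context` lines either side of the match.

    Assumption: offset is the 1-based first line to read.
    """
    if match_line < 1:
        raise ValueError("line numbers start at 1")
    start = max(1, match_line - context)
    limit = (match_line - start) + context + 1
    return {"offset": start, "limit": limit}


path, line_number, _ = parse_grep_hit("repo/src/foo.ts:120:export const x = 1")
window = read_window(line_number)
```

With a match on line 120 and the default context, the window covers lines 100 through 140, which is 41 lines instead of the whole file.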
|
|
33
42
|
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
-
|
|
37
|
-
- Two sentences that answer the question beat two pages that fill the context window
|
|
43
|
+
## Response length
|
|
44
|
+
|
|
45
|
+
Be complete and concise. Accuracy and completeness come first — do not truncate a real answer to look terse. But verbosity is not thoroughness. Every token sent to the user is a token they are expected to read; too many tokens raise cognitive load and annoy them. Output tokens also cost ~5x input on Opus, so the waste compounds. Multi-option writeups, draft code blocks, and "here are my thoughts" bullets are the default failure mode when a paragraph would cover it. Say what's needed, then stop.
|
|
38
46
|
|
|
39
47
|
## Verify your own work
|
|
40
48
|
|
|
@@ -96,24 +104,23 @@ These actions hide problems; they do not fix them. If a rule or test seems wrong
|
|
|
96
104
|
|
|
97
105
|
## Branch hygiene before starting work
|
|
98
106
|
|
|
99
|
-
The lifecycle commands `/wip`, `/investigate`, and `/implement` MUST verify branch state before
|
|
107
|
+
Branch management is the agent's job, not the user's. The lifecycle commands `/wip`, `/investigate`, and `/implement` MUST verify branch state before starting, but the agent SHOULD handle the safe transitions itself rather than blocking the user with "go run `git switch` first."
|
|
100
108
|
|
|
101
109
|
When invoked for issue `#N`:
|
|
102
110
|
|
|
103
111
|
1. Run `git -C repo branch --show-current` → branch `B`
|
|
104
112
|
2. Run `git -C repo status --porcelain` → working tree state `T`
|
|
105
113
|
|
|
106
|
-
|
|
107
|
-
|
|
108
|
-
|
|
109
|
-
|
|
110
|
-
|
|
111
|
-
|
|
114
|
+
| `B` | `T` | Action |
|
|
115
|
+
|-----|-----|--------|
|
|
116
|
+
| Contains `N` | any | Proceed — continuing in-flight work for this issue |
|
|
117
|
+
| `main` / `master` | clean | For `/wip` and `/investigate` (read/scaffold only): proceed on `main`. For `/implement` (code changes): create the branch yourself with `git -C repo switch -c <type>/<N>-<slug>`, then proceed. Derive `<type>` from the issue title prefix (`feat:`, `fix:`, `chore:`, `docs:`). Derive `<slug>` from the issue title (lowercase-hyphenated, ≤40 chars). |
|
|
118
|
+
| `main` / `master` | dirty | HARD STOP — uncommitted changes need to land somewhere first. Report exactly which files are modified. Let the user decide (commit on a branch, discard, stash). MUST NOT touch their changes. |
|
|
119
|
+
| Belongs to a different issue (`feat/M-…`, `M ≠ N`) | any | HARD STOP — tell the user to `/submit` the in-flight work first, then re-run. MUST NOT switch branches with uncommitted work present. |
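The branch-name derivation in the `/implement` row can be sketched as follows. This is one possible reading of "lowercase-hyphenated, ≤40 chars", not a reference implementation: the function name `branch_name` is hypothetical, and raising an error for a title without a recognised prefix is an assumption, since the table does not say what to do in that case.

```python
import re

# Type prefixes named in the table above.
TITLE_PREFIX = re.compile(r"^(feat|fix|chore|docs):\s*(.*)$")


def branch_name(issue_number: int, issue_title: str) -> str:
    """Build <type>/<N>-<slug> from an issue number and title."""
    match = TITLE_PREFIX.match(issue_title.strip())
    if not match:
        # Assumption: the source does not define a fallback type.
        raise ValueError("issue title has no recognised type prefix")
    issue_type, rest = match.groups()
    # Lowercase, collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", rest.lower()).strip("-")
    # Cap at 40 characters and avoid ending on a hyphen after truncation.
    slug = slug[:40].rstrip("-")
    return f"{issue_type}/{issue_number}-{slug}"


example = branch_name(42, "feat: Add wiki search via Bash grep")
```

The result would then be passed to `git -C repo switch -c <name>` as a single command.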
|
|
112
120
|
|
|
113
|
-
|
|
114
|
-
- If `B` is `main`/`master` but `T` is dirty: tell the user the uncommitted changes need to land somewhere (their own branch + `/submit`, or explicit discard) before starting new work.
|
|
121
|
+
The HARD STOPs only fire when the user's working state would be lost or mixed by proceeding. Clean main + known issue is not a stop condition — `/implement` creates the branch itself; `/wip` and `/investigate` proceed in place because they don't write code.
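The decision table can also be read as a small pure function. The sketch below is an interpretation under stated assumptions: "contains `N`" is approximated by parsing the `<type>/<N>-<slug>` pattern, the return strings are invented labels, and a branch that matches none of the rows (for example a hypothetical `develop`) is reported as `"unspecified"` because the table does not cover it.

```python
import re

# Assumption: issue branches follow <type>/<N>-<slug>.
ISSUE_BRANCH = re.compile(r"^[a-z]+/(\d+)-")


def issue_of(branch: str):
    """Return the issue number encoded in a branch name, or None."""
    match = ISSUE_BRANCH.match(branch)
    return int(match.group(1)) if match else None


def decide(branch: str, porcelain: str, issue: int, command: str) -> str:
    """Map (B, T) plus the lifecycle command to an action label."""
    dirty = bool(porcelain.strip())  # any `status --porcelain` output means dirty
    owner = issue_of(branch)
    if owner == issue:
        return "proceed"  # continuing in-flight work for this issue
    if branch in ("main", "master"):
        if dirty:
            return "hard-stop: uncommitted changes on main"
        if command == "/implement":
            return "create-branch"  # code changes get their own branch
        return "proceed"  # /wip and /investigate do not write code
    if owner is not None:
        return "hard-stop: submit in-flight work first"
    return "unspecified"  # row not present in the table
```

Keeping the decision free of side effects means it can be checked against every row of the table before any `git` command runs.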
|
|
115
122
|
|
|
116
|
-
|
|
123
|
+
Branching mistakes compound silently, and the cost of recovery grows with the number of commands run before they are caught. The remedy is for the agent to make the safe transition autonomously, not to alarm the user every time the workflow requires a routine `git switch`.
|
|
117
124
|
|
|
118
125
|
## Track progress
|
|
119
126
|
|