counselors 0.4.12 → 0.5.0
- package/README.md +164 -1
- package/assets/presets/bughunt.yml +33 -0
- package/assets/presets/contracts.yml +16 -0
- package/assets/presets/hotspots.yml +35 -0
- package/assets/presets/invariants.yml +17 -0
- package/assets/presets/regression.yml +16 -0
- package/assets/presets/security.yml +17 -0
- package/dist/cli.js +1834 -479
- package/dist/cli.js.map +1 -1
- package/package.json +2 -1
package/README.md
CHANGED
@@ -118,6 +118,137 @@ counselors run -t opus,opus,opus "Review this" # Run the same tool multiple tim
 | `--json` | Output manifest as JSON |
 | `-o, --output-dir <dir>` | Base output directory |
 
+### `loop [prompt]`
+
+Multi-round dispatch — agents iterate, seeing prior outputs each round.
+
+Each round dispatches to all tools in parallel. Starting from round 2, each agent receives the outputs from all prior rounds, so it can build on previous analysis and avoid repeating findings.
+
+```text
+input: user prompt/focus (e.g.: "focus on the auth module", "look at the sidebar component")
+  |
+  +--> with --preset:
+  |      [repo discovery phase] --> [prompt-writing phase] --> execution prompt (includes boilerplate)
+  +--> without --preset:
+         inline arg prompt:
+           default: [repo discovery phase] --> [prompt-writing phase] --> enhanced execution prompt
+           opt-out: --no-inline-enhancement (skip discovery/prompt-writing)
+         file/stdin prompt: used as provided (discovery/prompt-writing skipped)
+
+all modes: execution boilerplate is always appended
+
+execution prompt
+  |
+  v
++------------------------------- loop rounds -------------------------------+
+| round 1: dispatch to all selected tools in parallel                       |
+|          write per-tool outputs + round notes                             |
+|                                                                           |
+| round N>1: execution prompt + references to prior round outputs           |
+|            (new findings, challenge/refine prior findings)                |
+|            dispatch in parallel, write outputs + notes                    |
+|                                                                           |
+| stop when:                                                                |
+|   - max rounds reached, or                                                |
+|   - duration expires, or                                                  |
+|   - convergence threshold reached, or                                     |
+|   - user aborts (Ctrl+C after current round)                              |
++---------------------------------------------------------------------------+
+  |
+  v
+final notes + run manifest
+```
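The stop conditions in the diagram can be read as a small predicate. The following is a minimal Python sketch, not the counselors implementation: `LoopLimits` and `should_stop` are hypothetical names, and the rule that `--duration` without `--rounds` lifts the round cap is modeled on the README's flag descriptions. The convergence check is left out because the README does not define the threshold.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_ROUNDS = 3  # documented default for --rounds


@dataclass
class LoopLimits:
    """Hypothetical container for the loop limits described in the README."""
    rounds: Optional[int] = None              # value of --rounds, if given
    duration_seconds: Optional[float] = None  # value of --duration, if given

    def max_rounds(self) -> Optional[int]:
        # An explicit --rounds wins; --duration alone means no round cap.
        if self.rounds is not None:
            return self.rounds
        if self.duration_seconds is not None:
            return None
        return DEFAULT_ROUNDS


def should_stop(limits: LoopLimits, rounds_done: int,
                elapsed_seconds: float, abort_requested: bool) -> bool:
    """Return True when no further round should be dispatched."""
    if abort_requested:  # first Ctrl+C: stop once the current round is done
        return True
    cap = limits.max_rounds()
    if cap is not None and rounds_done >= cap:
        return True
    if (limits.duration_seconds is not None
            and elapsed_seconds >= limits.duration_seconds):
        return True
    return False
```

The predicate is meant to be evaluated between rounds, which matches the diagram's note that an abort takes effect after the current round.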
+
+```text
+Round behavior:
+
+  round 1 prompt = base execution prompt
+
+
+  round N prompt = base execution prompt
+                   // Base execution prompt is amended with...
+                   + "Prior Round Outputs" section
+                   + @refs to recent prior tool outputs
+                   + instruction to avoid duplicate findings, challenge/refine
+                     prior claims, and expand from prior leads
+```
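The round behavior above amounts to string assembly. Here is a minimal Python sketch under stated assumptions: `build_round_prompt` is a hypothetical helper, and the `## Prior Round Outputs` heading level, the `@path` reference syntax, and the instruction wording are illustrative guesses rather than the text counselors actually emits.

```python
from typing import List


def build_round_prompt(base_prompt: str, round_number: int,
                       prior_output_paths: List[str]) -> str:
    """Return the prompt for a round, adding prior-round references after round 1."""
    if round_number <= 1 or not prior_output_paths:
        # Round 1 uses the base execution prompt unchanged.
        return base_prompt
    # Assumed reference syntax: one "@path" line per prior tool output.
    refs = "\n".join(f"@{path}" for path in prior_output_paths)
    return (
        f"{base_prompt}\n\n"
        "## Prior Round Outputs\n\n"
        f"{refs}\n\n"
        "Avoid duplicate findings, challenge or refine prior claims, "
        "and expand from prior leads.\n"
    )
```

For example, `build_round_prompt("Review src/auth", 2, ["round-1/opus.md"])` returns the base prompt followed by the extra section, while any call with `round_number=1` returns the base prompt as-is.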
+
+```bash
+counselors loop "Find and fix test gaps in src/auth/" --rounds 5
+counselors loop --duration 30m "Hunt for edge cases"
+counselors loop --preset bughunt "src/api" --tools opus,codex
+counselors loop --preset hotspots "critical request path" --group smart
+counselors loop --list-presets
+```
+
+| Flag | Description |
+|------|-------------|
+| `--rounds <N>` | Number of dispatch rounds (default: 3) |
+| `--duration <time>` | Max total duration (e.g. `"30m"`, `"1h"`). If set without `--rounds`, runs unlimited rounds until time expires |
+| `--preset <name-or-path>` | Use a built-in preset (e.g. `"bughunt"`) or a custom `.yml/.yaml` preset file |
+| `--list-presets` | List built-in presets and exit |
+| `--no-inline-enhancement` | For non-preset inline prompts, skip discovery + prompt-writing enhancement |
+
+Plus all `run` flags: `-f`, `-t`, `-g`, `--context`, `--read-only`, `--dry-run`, `--json`, `-o`.
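A wrapper script that passes `--duration` may want to reason about the value before calling the CLI. The sketch below converts strings like `"30m"` or `"1h"` to seconds. The accepted grammar is an assumption: the README only shows `30m` and `1h` here (and `7d`, `36h` for `cleanup --older-than`), so the `s`/`m`/`h`/`d` unit set and the single-integer form are guesses, and `parse_duration` is a hypothetical helper rather than part of counselors.

```python
import re

# Assumed unit table; only "m" and "h" appear in the loop documentation.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert strings such as "30m" or "1h" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]
```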
+
+**SIGINT handling:** First Ctrl+C finishes the current round gracefully. Second Ctrl+C force-exits immediately.
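The two-stage interrupt behavior can be sketched with a counter. This is an illustrative Python version, not the CLI's own handler: `AbortState` and `install` are hypothetical names, and exit code 130 follows the common shell convention of 128 plus the signal number rather than anything the README states.

```python
import signal
import sys


class AbortState:
    """Tracks how many interrupts have arrived (hypothetical helper)."""

    def __init__(self) -> None:
        self.interrupts = 0

    @property
    def stop_after_round(self) -> bool:
        # True once the user has asked to stop; the loop checks this between rounds.
        return self.interrupts >= 1

    def handle_sigint(self, signum, frame) -> None:
        self.interrupts += 1
        if self.interrupts >= 2:
            # Second Ctrl+C: exit immediately (130 is an assumed exit code).
            sys.exit(130)
        print("Finishing current round; press Ctrl+C again to force exit.",
              file=sys.stderr)


def install(state: AbortState) -> None:
    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGINT, state.handle_sigint)
```

The first interrupt only flips a flag, so in-flight tool processes can finish and their outputs can still be written.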
+
+**Presets** provide domain-specific multi-round workflows.
+
+Built-ins:
+- `bughunt` — bugs, edge cases, and missing test coverage
+- `security` — exploitable vulnerabilities and high-impact security flaws
+- `invariants` — impossible states and state synchronization problems
+- `regression` — behavior changes likely to break existing callers/users
+- `contracts` — mismatches between API producers and consumers
+- `hotspots` — high-impact bottlenecks, including O(n^2)+ patterns
+
+Custom presets (code-grounded):
+
+```yaml
+name: auth-audit
+description: |
+  Audit authentication and authorization code paths for real issues.
+  Ground every claim in repository evidence.
+  For each finding, include concrete file paths and explain the exact control/data flow.
+  Do not speculate about behavior that is not visible in code.
+defaultRounds: 3
+defaultReadOnly: bestEffort
+```
+
+```bash
+counselors loop --preset ./presets/auth-audit.yml "src/auth and middleware"
+counselors loop --preset ./presets/auth-audit.yml "session + token flows" --dry-run
+```
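Before handing a custom preset to `--preset`, it can help to sanity-check its fields. The sketch below validates an already-parsed mapping (for example, the result of loading the YAML with a parser of your choice). It is an assumption-laden illustration: `validate_preset` is hypothetical, treating `name` and `description` as required is a guess, and `enforced` and `bestEffort` are simply the only `defaultReadOnly` values that appear in this release's preset files.

```python
from typing import Any, Dict, List

# Only these two values appear in the preset files shown; others may exist.
KNOWN_READ_ONLY = {"enforced", "bestEffort"}


def validate_preset(preset: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an already-parsed preset mapping."""
    problems: List[str] = []
    for key in ("name", "description"):
        value = preset.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{key}: expected a non-empty string")
    rounds = preset.get("defaultRounds")
    # bool is a subclass of int in Python, so reject it explicitly.
    if rounds is not None and (isinstance(rounds, bool)
                               or not isinstance(rounds, int) or rounds < 1):
        problems.append("defaultRounds: expected a positive integer")
    read_only = preset.get("defaultReadOnly")
    if read_only is not None and read_only not in KNOWN_READ_ONLY:
        problems.append(f"defaultReadOnly: unrecognized value {read_only!r}")
    return problems
```

An empty list means the mapping passed these checks; it does not guarantee that counselors itself will accept the file.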
+
+Guidelines for "truth of the code" presets:
+- Write `description` so findings must cite concrete evidence (file paths, functions, branches, tests).
+- Require the agent to separate observed behavior from assumptions and call out unknowns explicitly.
+- Ask for reproducible checks (commands/tests) for each high-confidence claim.
+- Keep the focus target narrow in the prompt argument (specific dirs, modules, or request paths).
+
+### `mkdir [prompt]`
+
+Create a counselors output directory and optionally write `prompt.md` without dispatching.
+
+If you do not provide a prompt (arg, `-f`, or stdin), `mkdir` creates only the containing directory.
+
+Useful when an orchestrating agent wants counselors to own output-dir creation and just return paths.
+
+```bash
+counselors mkdir --json
+counselors mkdir "Review the auth flow for edge cases" --json
+echo "prompt" | counselors mkdir --json
+cat prompt.md | counselors mkdir --json
+counselors mkdir -f prompt.md --json
+```
+
+The JSON output includes:
+- `outputDir`
+- `promptFilePath` (`null` when no prompt was provided)
+- `slug`
+- `promptSource` (`none`, `inline`, `file`, or `stdin`)
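An orchestrating script would typically parse this JSON before deciding what to do next. The following Python sketch reads the four documented fields into a typed record. `MkdirResult` and `parse_mkdir_json` are hypothetical names, the sample payload is made up for illustration, and any additional fields the real output may contain are ignored.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class MkdirResult:
    output_dir: str
    prompt_file_path: Optional[str]  # None when no prompt was provided
    slug: str
    prompt_source: str               # "none", "inline", "file", or "stdin"


def parse_mkdir_json(stdout: str) -> MkdirResult:
    """Parse the stdout of `counselors mkdir --json` into a typed record."""
    data = json.loads(stdout)
    source = data["promptSource"]
    if source not in {"none", "inline", "file", "stdin"}:
        raise ValueError(f"unexpected promptSource: {source!r}")
    return MkdirResult(
        output_dir=data["outputDir"],
        prompt_file_path=data.get("promptFilePath"),
        slug=data["slug"],
        prompt_source=source,
    )


# Hypothetical payload; real paths and slugs will differ.
sample = ('{"outputDir": "/tmp/out/example", "promptFilePath": null, '
          '"slug": "example", "promptSource": "none"}')
result = parse_mkdir_json(sample)
```

Checking `result.prompt_file_path is None` is the natural way to detect the directory-only case described above.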
+
 ### `init`
 
 Interactive setup wizard. Discovers installed AI CLIs, lets you pick tools and models, runs validation tests.
@@ -163,6 +294,14 @@ counselors cleanup --dry-run --older-than 7d
 counselors cleanup --older-than 36h --yes
 ```
 
+### `config`
+
+Print the config file path and the full resolved configuration as JSON.
+
+```bash
+counselors config
+```
+
 ### `tools`
 
 Manage configured tools.
@@ -274,6 +413,24 @@ Each run creates a directory under your configured output directory (`defaults.o
 
 If the `{slug}` directory already exists, counselors appends a timestamp suffix to avoid collisions.
 
+For multi-round runs (`loop`), each round gets its own subdirectory:
+
+```
+<outputDir>/{slug}/
+  round-1/
+    prompt.md
+    {tool-id}.md
+    {tool-id}.stderr
+    round-notes.md
+  round-2/
+    prompt.md            # augmented with prior round outputs
+    {tool-id}.md
+    round-notes.md
+  ...
+  final-notes.md         # combined notes across all rounds
+  run.json               # manifest with rounds array
+```
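Given that layout, a post-processing script can gather the per-tool reports round by round. The Python sketch below is one way to do it, with assumptions stated: `collect_tool_outputs` is a hypothetical helper, and it treats every `*.md` file in a round directory other than `prompt.md` and `round-notes.md` as a tool output, which follows the tree shown but is not confirmed by the README.

```python
import re
from pathlib import Path
from typing import Dict, List

_ROUND_DIR = re.compile(r"round-(\d+)")
_NON_TOOL_FILES = {"prompt.md", "round-notes.md"}


def collect_tool_outputs(run_dir: Path) -> Dict[int, List[Path]]:
    """Map each round number to its per-tool Markdown outputs, in round order."""
    rounds: Dict[int, List[Path]] = {}
    for child in run_dir.iterdir():
        match = _ROUND_DIR.fullmatch(child.name)
        if match is None or not child.is_dir():
            continue  # skips final-notes.md, run.json, and anything unrelated
        outputs = sorted(
            p for p in child.glob("*.md") if p.name not in _NON_TOOL_FILES
        )
        rounds[int(match.group(1))] = outputs
    # Sort numerically so that round-10 comes after round-2.
    return dict(sorted(rounds.items()))
```

Sorting on the parsed integer rather than the directory name keeps long runs in the right order.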
+
 ## Skill / slash command
 
 Install `/counselors` as a skill in Claude Code or other agents:
@@ -286,7 +443,7 @@ counselors skill
 counselors agent
 ```
 
-The skill template provides a multi-phase workflow: gather context, select agents, assemble prompt,
+The skill template provides a multi-phase workflow: gather context, select agents, choose dispatch mode (`run` vs `loop`), assemble prompt/focus, create prompt files via `counselors mkdir` when needed, dispatch, read results, and synthesize a combined answer.
 
 ## How is this different from...?
 
@@ -374,6 +531,12 @@ Codex also found 2 bugs all agents acknowledged: dedup by name drops valid sugge
 | Phase ordering | All agree: keep phases 4 and 5 separate, add a Phase 0 for compat checker |
 | `renderer.metrics` hack | All agree: high to extremely high risk of breakage in 0.4.0 |
 
+**Topic: Multi-round test gap hunting** — _`loop --preset test`_
+
+> counselors loop --preset test --scope src/auth/ --rounds 3
+
+Round 1 discovers the test landscape and finds initial gaps. Round 2 reads the round-1 reports and hunts for edge cases the first round missed. Round 3 goes deeper on anything still uncovered. Each agent independently builds on prior findings without repeating them.
+
 ## Security
 
 - **Environment allowlisting**: Child processes only receive allowlisted environment variables (PATH, HOME, API keys, proxy settings, etc.) — no full `process.env` leak.
package/assets/presets/bughunt.yml
@@ -0,0 +1,33 @@
+name: bughunt
+description: |
+  You are hunting for real correctness bugs, edge-case failures, and missing tests that would allow regressions.
+
+  Prioritize:
+  - User-visible correctness failures over style issues
+  - High-blast-radius bugs over speculative nits
+  - Findings likely to produce meaningful failing tests
+
+  Look for:
+  - Logic errors: wrong conditionals, off-by-one, incorrect defaults, null/undefined handling gaps
+  - Boundary and error-path failures: empty inputs, max/min values, partial failures, cleanup/rollback gaps
+  - Validation and contract gaps: unchecked inputs, missing type guards, mismatched return assumptions
+  - Concurrency/order bugs: TOCTOU, race conditions, shared mutable state hazards, invalid async state transitions
+  - Resource-lifecycle bugs: unclosed handles, unreleased locks, dangling listeners, swallowed exceptions in finally blocks
+  - Missing test coverage on risky branches: error handlers, retry logic, fallback paths, migration paths
+
+  Multi-round rule:
+  - Prioritize novel findings not already reported in prior rounds.
+  - If you repeat a prior finding, add new evidence, sharper impact analysis, or a better test strategy.
+
+  For each finding, include:
+  - severity: critical | high | medium | low
+  - confidence: high | medium | low
+  - location: file path + function/method name
+  - evidence: concrete code pattern and why it can fail at runtime
+  - impact: user/system consequence if triggered
+  - minimal fix: smallest safe change
+  - test idea: a concrete failing test scenario (inputs + expected behavior)
+
+  Skip trivial style comments unless they hide a correctness bug.
+defaultRounds: 3
+defaultReadOnly: enforced
package/assets/presets/contracts.yml
@@ -0,0 +1,16 @@
+name: contracts
+description: |
+  You are auditing for API contract drift across server handlers, shared types, clients, validators, and tests.
+
+  Look for:
+  - Request/response field mismatches between API handlers and consumers (missing, renamed, retyped, or re-nested fields)
+  - Optional vs required drift across schemas, runtime validators, and TypeScript/PHP/Python types
+  - Enum and status value drift across backend models, API serializers, and frontend/client assumptions
+  - Inconsistent error contracts: different status codes or error payload shapes for similar failure classes
+  - Versioning and backward-compatibility breaks (silent behavior changes, removed fields, stricter parsing)
+  - Serialization mismatches (date/time formats, number/string coercion, nullability handling)
+  - Documentation/examples/spec files that no longer match actual implementation behavior
+
+  For each issue found, include the producer and consumer locations, describe the concrete contract mismatch, and explain the runtime impact. Suggest a contract test or integration test that would fail before the fix and pass after it.
+defaultRounds: 3
+defaultReadOnly: enforced
package/assets/presets/hotspots.yml
@@ -0,0 +1,35 @@
+name: hotspots
+description: |
+  You are auditing for high-impact performance bottlenecks, with emphasis on asymptotic complexity and scaling behavior.
+
+  Prioritize:
+  - Hot-path issues over cold-path micro-optimizations
+  - Large wins with low implementation risk
+  - Evidence-backed findings over speculative tuning
+
+  Look for:
+  - Accidental O(n^2)+ patterns: nested scans, repeated sort/filter passes, per-item linear lookups
+  - N+1 access patterns across database, API, filesystem, queue, or cache boundaries
+  - Unbounded traversal/fan-out work that grows poorly with input size
+  - Repeated expensive work that should be cached, memoized, batched, or precomputed
+  - Serialization/parsing churn on hot paths (JSON encode/decode loops, repeated cloning/transforms)
+  - Large allocations/copies in tight loops instead of incremental reuse
+  - Missing pagination, streaming, chunking, or backpressure that causes latency/memory spikes
+
+  Multi-round rule:
+  - Prioritize novel hotspots not already reported in prior rounds.
+  - If you repeat a hotspot, add stronger evidence, better complexity analysis, or a lower-risk fix.
+
+  For each finding, include:
+  - severity: critical | high | medium | low
+  - confidence: high | medium | low
+  - location: file path + function/method name
+  - evidence: exact code path and operation causing the cost
+  - complexity: define input variables (for example n, m) and estimate before/after Big-O
+  - impact: expected latency/throughput/memory effect and where it appears
+  - minimal fix: smallest safe change (index/map usage, batching, caching, pagination, etc.)
+  - validation idea: benchmark/profiling or test strategy to confirm the gain
+
+  Skip tiny micro-optimizations unless they are clearly on a critical hot path.
+defaultRounds: 4
+defaultReadOnly: enforced
package/assets/presets/invariants.yml
@@ -0,0 +1,17 @@
+name: invariants
+description: |
+  You are auditing a codebase for state synchronization issues, impossible states, and state management anti-patterns.
+
+  Look for:
+  - Boolean explosion: multiple booleans creating 2^n states where many combinations are impossible (e.g. isLoading && isError both true)
+  - Impossible states: bags of optionals instead of discriminated unions — types that allow combinations that should never exist
+  - Magic strings: string literals used for status/state comparisons instead of enums or constants
+  - Status mismatches: database enums not matching code enums (different spelling, different count, different casing)
+  - Duplicated state: the same data stored in multiple locations that can get out of sync
+  - Derived state stored: computed values persisted when they could be calculated on the fly (e.g. totalCount stored instead of items.length)
+  - Missing state machines: complex multi-step flows or status fields with 4+ values managed with ad-hoc conditionals instead of explicit state machines
+  - Single source of truth violations: the same authoritative data defined in multiple places (validation rules duplicated client/server, type definitions copied across files, permissions checked in both frontend and backend with different logic)
+
+  For each issue found, include the file path, the specific code pattern, and what can go wrong. Suggest the minimal fix — discriminated union, enum extraction, computed getter, or state machine. Focus on drift that can cause real bugs at runtime, not theoretical concerns.
+defaultRounds: 3
+defaultReadOnly: enforced
package/assets/presets/regression.yml
@@ -0,0 +1,16 @@
+name: regression
+description: |
+  You are auditing a codebase for regression risk with emphasis on behavior changes that can break existing users or dependent systems.
+
+  Look for:
+  - Contract drift in function signatures, return shapes, event payloads, or CLI flags that callers may rely on
+  - Removed or weakened guards (validation, authorization, null checks) that previously prevented invalid states
+  - Refactors that changed control flow or ordering semantics in subtle ways (initialization order, retry order, cleanup timing)
+  - Partial migrations where old and new code paths can diverge in behavior
+  - Error handling regressions: swallowed exceptions, changed status codes, missing rollback/cleanup in failure paths
+  - Feature flag or config default changes that alter runtime behavior without clear migration handling
+  - Tests that assert implementation details but miss observable behavior, leaving real regressions undetected
+
+  For each issue found, include file path and function/method name, explain the user-visible regression risk, and suggest a concrete failing test that would catch it. Prioritize high-blast-radius risks over low-impact code style concerns.
+defaultRounds: 3
+defaultReadOnly: enforced
package/assets/presets/security.yml
@@ -0,0 +1,17 @@
+name: security
+description: |
+  You are a security engineer reviewing a codebase with an attacker's mindset. Your goal is to find exploitable vulnerabilities, not theoretical concerns.
+
+  Look for:
+  - Injection flaws: SQL, NoSQL, OS command, LDAP, or template injection from unsanitized user input reaching queries, shells, or eval
+  - Broken authentication: weak password handling, missing rate limiting, session fixation, credential exposure in logs or error messages
+  - Broken access control: missing authorization checks, insecure direct object references (IDOR), privilege escalation, path traversal
+  - Sensitive data exposure: secrets in source code, unencrypted storage or transit, excessive data in API responses, PII in logs
+  - Cross-site scripting (XSS): reflected, stored, or DOM-based — user input reaching HTML, attributes, or JavaScript without escaping
+  - Insecure deserialization: untrusted data passed to deserializers (pickle, unserialize, JSON.parse of executable content, yaml.load)
+  - Security misconfiguration: verbose error messages leaking internals, debug mode in production, default credentials, overly permissive CORS
+  - Missing cryptographic controls: hardcoded keys, weak algorithms (MD5, SHA1 for passwords), predictable tokens, improper random number generation
+
+  For each vulnerability found, include the file path, the vulnerable code pattern, how an attacker would exploit it, and the specific fix. Prioritize findings by exploitability — a real injection flaw matters more than a missing security header.
+defaultRounds: 3
+defaultReadOnly: enforced