counselors 0.4.12 → 0.5.0

package/README.md CHANGED
@@ -118,6 +118,137 @@ counselors run -t opus,opus,opus "Review this" # Run the same tool multiple tim
  | `--json` | Output manifest as JSON |
  | `-o, --output-dir <dir>` | Base output directory |
 
+ ### `loop [prompt]`
+
+ Multi-round dispatch — agents iterate, seeing prior outputs each round.
+
+ Each round dispatches to all tools in parallel. Starting from round 2, each agent receives the outputs from all prior rounds, so it can build on previous analysis and avoid repeating findings.
+
+ ```text
+ input: user prompt/focus (e.g.: "focus on the auth module", "look at the sidebar component")
+   |
+   +--> with --preset:
+   |      [repo discovery phase] --> [prompt-writing phase] --> execution prompt (includes boilerplate)
+   +--> without --preset:
+          inline arg prompt:
+            default: [repo discovery phase] --> [prompt-writing phase] --> enhanced execution prompt
+            opt-out: --no-inline-enhancement (skip discovery/prompt-writing)
+          file/stdin prompt: used as provided (discovery/prompt-writing skipped)
+
+ all modes: execution boilerplate is always appended
+
+ execution prompt
+   |
+   v
+ +------------------------------- loop rounds -------------------------------+
+ | round 1: dispatch to all selected tools in parallel                       |
+ |          write per-tool outputs + round notes                             |
+ |                                                                           |
+ | round N>1: execution prompt + references to prior round outputs           |
+ |            (new findings, challenge/refine prior findings)                |
+ |            dispatch in parallel, write outputs + notes                    |
+ |                                                                           |
+ | stop when:                                                                |
+ |   - max rounds reached, or                                                |
+ |   - duration expires, or                                                  |
+ |   - convergence threshold reached, or                                     |
+ |   - user aborts (Ctrl+C after current round)                              |
+ +---------------------------------------------------------------------------+
+   |
+   v
+ final notes + run manifest
+ ```
+
+ ```text
+ Round behavior:
+
+ round 1 prompt = base execution prompt
+
+
+ round N prompt = base execution prompt
+                  // Base execution prompt is amended with...
+                  + "Prior Round Outputs" section
+                  + @refs to recent prior tool outputs
+                  + instruction to avoid duplicate findings, challenge/refine
+                    prior claims, and expand from prior leads
+ ```
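
The round behavior above can be sketched in a few lines of Python. This is only an illustration of the described amendment, not the package's code: the helper name `build_round_prompt`, the exact heading text, and the instruction wording are assumptions.

```python
from typing import List


def build_round_prompt(base_prompt: str, round_number: int, prior_outputs: List[str]) -> str:
    """Return the prompt for one round (hypothetical helper, not counselors' own code)."""
    # Round 1 (or a round with nothing to reference) uses the base prompt unchanged.
    if round_number <= 1 or not prior_outputs:
        return base_prompt
    # Later rounds append @refs to earlier tool outputs plus an instruction block.
    refs = "\n".join(f"@{path}" for path in prior_outputs)
    return (
        f"{base_prompt}\n\n"
        "## Prior Round Outputs\n"
        f"{refs}\n\n"
        "Avoid duplicate findings, challenge or refine prior claims, "
        "and expand from prior leads.\n"
    )
```

Calling `build_round_prompt("Review src/auth", 2, ["round-1/opus.md"])` would return the base prompt followed by the reference section, while round 1 gets the base prompt as-is.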
+
+ ```bash
+ counselors loop "Find and fix test gaps in src/auth/" --rounds 5
+ counselors loop --duration 30m "Hunt for edge cases"
+ counselors loop --preset bughunt "src/api" --tools opus,codex
+ counselors loop --preset hotspots "critical request path" --group smart
+ counselors loop --list-presets
+ ```
+
+ | Flag | Description |
+ |------|-------------|
+ | `--rounds <N>` | Number of dispatch rounds (default: 3) |
+ | `--duration <time>` | Max total duration (e.g. `"30m"`, `"1h"`). If set without `--rounds`, runs unlimited rounds until time expires |
+ | `--preset <name-or-path>` | Use a built-in preset (e.g. `"bughunt"`) or a custom `.yml/.yaml` preset file |
+ | `--list-presets` | List built-in presets and exit |
+ | `--no-inline-enhancement` | For non-preset inline prompts, skip discovery + prompt-writing enhancement |
+
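
As a rough illustration of the `--duration` value format, the sketch below converts strings such as `30m` or `1h` into seconds. The accepted unit set (`s`, `m`, `h`, `d`) and the function name `parse_duration` are assumptions made for this example; the README only shows `30m` and `1h` here, and `36h` and `7d` for `cleanup --older-than`.

```python
import re

# Assumed unit table; the real CLI may accept a different set of suffixes.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert a value like '30m' into a number of seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]
```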
+ Plus all `run` flags: `-f`, `-t`, `-g`, `--context`, `--read-only`, `--dry-run`, `--json`, `-o`.
+
+ **SIGINT handling:** First Ctrl+C finishes the current round gracefully. Second Ctrl+C force-exits immediately.
+
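
The stop conditions and the graceful first Ctrl+C can be pictured as checks made between rounds. The following is a minimal sketch under stated assumptions: `run_rounds`, `dispatch_round`, and `abort_requested` are hypothetical names, the convergence threshold is left out because the README does not define it, and a real implementation would set the abort flag from a signal handler.

```python
import time
from typing import Callable, Optional


def run_rounds(
    dispatch_round: Callable[[int], None],
    max_rounds: Optional[int] = 3,
    duration_seconds: Optional[float] = None,
    abort_requested: Callable[[], bool] = lambda: False,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run rounds until a stop condition holds; return the number of completed rounds."""
    deadline = None if duration_seconds is None else clock() + duration_seconds
    completed = 0
    while True:
        # Stop when the round budget is used up (None means unlimited rounds).
        if max_rounds is not None and completed >= max_rounds:
            break
        # Stop when the time budget has expired.
        if deadline is not None and clock() >= deadline:
            break
        # Checked only between rounds, so an in-flight round always finishes.
        if abort_requested():
            break
        dispatch_round(completed + 1)
        completed += 1
    return completed
```

Passing `max_rounds=None` together with `duration_seconds` mirrors the documented behavior of `--duration` without `--rounds`: rounds continue until the time budget is spent.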
+ **Presets** provide domain-specific multi-round workflows.
+
+ Built-ins:
+ - `bughunt` — bugs, edge cases, and missing test coverage
+ - `security` — exploitable vulnerabilities and high-impact security flaws
+ - `invariants` — impossible states and state synchronization problems
+ - `regression` — behavior changes likely to break existing callers/users
+ - `contracts` — mismatches between API producers and consumers
+ - `hotspots` — high-impact bottlenecks, including O(n^2)+ patterns
+
+ Custom presets (code-grounded):
+
+ ```yaml
+ name: auth-audit
+ description: |
+   Audit authentication and authorization code paths for real issues.
+   Ground every claim in repository evidence.
+   For each finding, include concrete file paths and explain the exact control/data flow.
+   Do not speculate about behavior that is not visible in code.
+ defaultRounds: 3
+ defaultReadOnly: bestEffort
+ ```
+
+ ```bash
+ counselors loop --preset ./presets/auth-audit.yml "src/auth and middleware"
+ counselors loop --preset ./presets/auth-audit.yml "session + token flows" --dry-run
+ ```
+
+ Guidelines for "truth of the code" presets:
+ - Write `description` so findings must cite concrete evidence (file paths, functions, branches, tests).
+ - Require the agent to separate observed behavior from assumptions and call out unknowns explicitly.
+ - Ask for reproducible checks (commands/tests) for each high-confidence claim.
+ - Keep the focus target narrow in the prompt argument (specific dirs, modules, or request paths).
+
+ ### `mkdir [prompt]`
+
+ Create a counselors output directory and optionally write `prompt.md` without dispatching.
+
+ If you do not provide a prompt (arg, `-f`, or stdin), `mkdir` creates only the containing directory.
+
+ Useful when an orchestrating agent wants counselors to own output-dir creation and just return paths.
+
+ ```bash
+ counselors mkdir --json
+ counselors mkdir "Review the auth flow for edge cases" --json
+ echo "prompt" | counselors mkdir --json
+ cat prompt.md | counselors mkdir --json
+ counselors mkdir -f prompt.md --json
+ ```
+
+ The JSON output includes:
+ - `outputDir`
+ - `promptFilePath` (`null` when no prompt was provided)
+ - `slug`
+ - `promptSource` (`none`, `inline`, `file`, or `stdin`)
+
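
An orchestrating script might consume that JSON roughly as follows. This is a sketch that parses a hand-written sample rather than invoking the CLI: the four field names come from the list above, while the `MkdirResult` class, the `parse_mkdir_output` helper, and the sample values are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional

_PROMPT_SOURCES = {"none", "inline", "file", "stdin"}


@dataclass(frozen=True)
class MkdirResult:
    output_dir: str
    prompt_file_path: Optional[str]  # None when no prompt was provided
    slug: str
    prompt_source: str


def parse_mkdir_output(raw: str) -> MkdirResult:
    """Parse the JSON printed by `counselors mkdir --json` (hypothetical helper)."""
    data = json.loads(raw)
    source = data["promptSource"]
    if source not in _PROMPT_SOURCES:
        raise ValueError(f"unexpected promptSource: {source!r}")
    return MkdirResult(
        output_dir=data["outputDir"],
        prompt_file_path=data["promptFilePath"],
        slug=data["slug"],
        prompt_source=source,
    )


# Illustrative payload only; real paths and slugs will differ.
sample = (
    '{"outputDir": "/tmp/out/review-auth", "promptFilePath": null, '
    '"slug": "review-auth", "promptSource": "none"}'
)
result = parse_mkdir_output(sample)
```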
  ### `init`
 
  Interactive setup wizard. Discovers installed AI CLIs, lets you pick tools and models, runs validation tests.
@@ -163,6 +294,14 @@ counselors cleanup --dry-run --older-than 7d
  counselors cleanup --older-than 36h --yes
  ```
 
+ ### `config`
+
+ Print the config file path and the full resolved configuration as JSON.
+
+ ```bash
+ counselors config
+ ```
+
  ### `tools`
 
  Manage configured tools.
@@ -274,6 +413,24 @@ Each run creates a directory under your configured output directory (`defaults.o
 
  If the `{slug}` directory already exists, counselors appends a timestamp suffix to avoid collisions.
 
+ For multi-round runs (`loop`), each round gets its own subdirectory:
+
+ ```
+ <outputDir>/{slug}/
+   round-1/
+     prompt.md
+     {tool-id}.md
+     {tool-id}.stderr
+     round-notes.md
+   round-2/
+     prompt.md             # augmented with prior round outputs
+     {tool-id}.md
+     round-notes.md
+   ...
+   final-notes.md          # combined notes across all rounds
+   run.json                # manifest with rounds array
+ ```
+
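
Given that layout, a script could gather the per-tool reports of each round as sketched below. The function name `collect_round_outputs` is hypothetical, and the code assumes only the file names shown in the tree (`prompt.md`, `round-notes.md`, `{tool-id}.md`); it does not read `run.json`, whose schema is not described here.

```python
from pathlib import Path
from typing import Dict, List

# Files in a round directory that are not per-tool reports.
_NON_TOOL_FILES = {"prompt.md", "round-notes.md"}


def collect_round_outputs(run_dir: Path) -> Dict[int, List[Path]]:
    """Map each round number to its sorted per-tool report files."""
    rounds: Dict[int, List[Path]] = {}
    for round_dir in run_dir.glob("round-*"):
        suffix = round_dir.name[len("round-"):]
        # Skip anything that is not a directory named round-<number>.
        if not round_dir.is_dir() or not suffix.isdigit():
            continue
        rounds[int(suffix)] = sorted(
            path for path in round_dir.glob("*.md")
            if path.name not in _NON_TOOL_FILES
        )
    # Return rounds in numeric order so round-10 sorts after round-2.
    return dict(sorted(rounds.items()))
```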
  ## Skill / slash command
 
  Install `/counselors` as a skill in Claude Code or other agents:
@@ -286,7 +443,7 @@ counselors skill
  counselors agent
  ```
 
- The skill template provides a multi-phase workflow: gather context, select agents, assemble prompt, dispatch via `counselors run`, read results, and synthesize a combined answer.
+ The skill template provides a multi-phase workflow: gather context, select agents, choose dispatch mode (`run` vs `loop`), assemble prompt/focus, create prompt files via `counselors mkdir` when needed, dispatch, read results, and synthesize a combined answer.
 
  ## How is this different from...?
 
@@ -374,6 +531,12 @@ Codex also found 2 bugs all agents acknowledged: dedup by name drops valid sugge
  | Phase ordering | All agree: keep phases 4 and 5 separate, add a Phase 0 for compat checker |
  | `renderer.metrics` hack | All agree: high to extremely high risk of breakage in 0.4.0 |
 
+ **Topic: Multi-round test gap hunting** — _`loop --preset test`_
+
+ > counselors loop --preset test --scope src/auth/ --rounds 3
+
+ Round 1 discovers the test landscape and finds initial gaps. Round 2 reads the round-1 reports and hunts for edge cases the first round missed. Round 3 goes deeper on anything still uncovered. Each agent independently builds on prior findings without repeating them.
+
  ## Security
 
  - **Environment allowlisting**: Child processes only receive allowlisted environment variables (PATH, HOME, API keys, proxy settings, etc.) — no full `process.env` leak.
@@ -0,0 +1,33 @@
+ name: bughunt
+ description: |
+   You are hunting for real correctness bugs, edge-case failures, and missing tests that would allow regressions.
+
+   Prioritize:
+   - User-visible correctness failures over style issues
+   - High-blast-radius bugs over speculative nits
+   - Findings likely to produce meaningful failing tests
+
+   Look for:
+   - Logic errors: wrong conditionals, off-by-one, incorrect defaults, null/undefined handling gaps
+   - Boundary and error-path failures: empty inputs, max/min values, partial failures, cleanup/rollback gaps
+   - Validation and contract gaps: unchecked inputs, missing type guards, mismatched return assumptions
+   - Concurrency/order bugs: TOCTOU, race conditions, shared mutable state hazards, invalid async state transitions
+   - Resource-lifecycle bugs: unclosed handles, unreleased locks, dangling listeners, swallowed exceptions in finally blocks
+   - Missing test coverage on risky branches: error handlers, retry logic, fallback paths, migration paths
+
+   Multi-round rule:
+   - Prioritize novel findings not already reported in prior rounds.
+   - If you repeat a prior finding, add new evidence, sharper impact analysis, or a better test strategy.
+
+   For each finding, include:
+   - severity: critical | high | medium | low
+   - confidence: high | medium | low
+   - location: file path + function/method name
+   - evidence: concrete code pattern and why it can fail at runtime
+   - impact: user/system consequence if triggered
+   - minimal fix: smallest safe change
+   - test idea: a concrete failing test scenario (inputs + expected behavior)
+
+   Skip trivial style comments unless they hide a correctness bug.
+ defaultRounds: 3
+ defaultReadOnly: enforced
@@ -0,0 +1,16 @@
+ name: contracts
+ description: |
+   You are auditing for API contract drift across server handlers, shared types, clients, validators, and tests.
+
+   Look for:
+   - Request/response field mismatches between API handlers and consumers (missing, renamed, retyped, or re-nested fields)
+   - Optional vs required drift across schemas, runtime validators, and TypeScript/PHP/Python types
+   - Enum and status value drift across backend models, API serializers, and frontend/client assumptions
+   - Inconsistent error contracts: different status codes or error payload shapes for similar failure classes
+   - Versioning and backward-compatibility breaks (silent behavior changes, removed fields, stricter parsing)
+   - Serialization mismatches (date/time formats, number/string coercion, nullability handling)
+   - Documentation/examples/spec files that no longer match actual implementation behavior
+
+   For each issue found, include the producer and consumer locations, describe the concrete contract mismatch, and explain the runtime impact. Suggest a contract test or integration test that would fail before the fix and pass after it.
+ defaultRounds: 3
+ defaultReadOnly: enforced
@@ -0,0 +1,35 @@
+ name: hotspots
+ description: |
+   You are auditing for high-impact performance bottlenecks, with emphasis on asymptotic complexity and scaling behavior.
+
+   Prioritize:
+   - Hot-path issues over cold-path micro-optimizations
+   - Large wins with low implementation risk
+   - Evidence-backed findings over speculative tuning
+
+   Look for:
+   - Accidental O(n^2)+ patterns: nested scans, repeated sort/filter passes, per-item linear lookups
+   - N+1 access patterns across database, API, filesystem, queue, or cache boundaries
+   - Unbounded traversal/fan-out work that grows poorly with input size
+   - Repeated expensive work that should be cached, memoized, batched, or precomputed
+   - Serialization/parsing churn on hot paths (JSON encode/decode loops, repeated cloning/transforms)
+   - Large allocations/copies in tight loops instead of incremental reuse
+   - Missing pagination, streaming, chunking, or backpressure that causes latency/memory spikes
+
+   Multi-round rule:
+   - Prioritize novel hotspots not already reported in prior rounds.
+   - If you repeat a hotspot, add stronger evidence, better complexity analysis, or a lower-risk fix.
+
+   For each finding, include:
+   - severity: critical | high | medium | low
+   - confidence: high | medium | low
+   - location: file path + function/method name
+   - evidence: exact code path and operation causing the cost
+   - complexity: define input variables (for example n, m) and estimate before/after Big-O
+   - impact: expected latency/throughput/memory effect and where it appears
+   - minimal fix: smallest safe change (index/map usage, batching, caching, pagination, etc.)
+   - validation idea: benchmark/profiling or test strategy to confirm the gain
+
+   Skip tiny micro-optimizations unless they are clearly on a critical hot path.
+ defaultRounds: 4
+ defaultReadOnly: enforced
@@ -0,0 +1,17 @@
+ name: invariants
+ description: |
+   You are auditing a codebase for state synchronization issues, impossible states, and state management anti-patterns.
+
+   Look for:
+   - Boolean explosion: multiple booleans creating 2^n states where many combinations are impossible (e.g. isLoading && isError both true)
+   - Impossible states: bags of optionals instead of discriminated unions — types that allow combinations that should never exist
+   - Magic strings: string literals used for status/state comparisons instead of enums or constants
+   - Status mismatches: database enums not matching code enums (different spelling, different count, different casing)
+   - Duplicated state: the same data stored in multiple locations that can get out of sync
+   - Derived state stored: computed values persisted when they could be calculated on the fly (e.g. totalCount stored instead of items.length)
+   - Missing state machines: complex multi-step flows or status fields with 4+ values managed with ad-hoc conditionals instead of explicit state machines
+   - Single source of truth violations: the same authoritative data defined in multiple places (validation rules duplicated client/server, type definitions copied across files, permissions checked in both frontend and backend with different logic)
+
+   For each issue found, include the file path, the specific code pattern, and what can go wrong. Suggest the minimal fix — discriminated union, enum extraction, computed getter, or state machine. Focus on drift that can cause real bugs at runtime, not theoretical concerns.
+ defaultRounds: 3
+ defaultReadOnly: enforced
@@ -0,0 +1,16 @@
+ name: regression
+ description: |
+   You are auditing a codebase for regression risk with emphasis on behavior changes that can break existing users or dependent systems.
+
+   Look for:
+   - Contract drift in function signatures, return shapes, event payloads, or CLI flags that callers may rely on
+   - Removed or weakened guards (validation, authorization, null checks) that previously prevented invalid states
+   - Refactors that changed control flow or ordering semantics in subtle ways (initialization order, retry order, cleanup timing)
+   - Partial migrations where old and new code paths can diverge in behavior
+   - Error handling regressions: swallowed exceptions, changed status codes, missing rollback/cleanup in failure paths
+   - Feature flag or config default changes that alter runtime behavior without clear migration handling
+   - Tests that assert implementation details but miss observable behavior, leaving real regressions undetected
+
+   For each issue found, include file path and function/method name, explain the user-visible regression risk, and suggest a concrete failing test that would catch it. Prioritize high-blast-radius risks over low-impact code style concerns.
+ defaultRounds: 3
+ defaultReadOnly: enforced
@@ -0,0 +1,17 @@
+ name: security
+ description: |
+   You are a security engineer reviewing a codebase with an attacker's mindset. Your goal is to find exploitable vulnerabilities, not theoretical concerns.
+
+   Look for:
+   - Injection flaws: SQL, NoSQL, OS command, LDAP, or template injection from unsanitized user input reaching queries, shells, or eval
+   - Broken authentication: weak password handling, missing rate limiting, session fixation, credential exposure in logs or error messages
+   - Broken access control: missing authorization checks, insecure direct object references (IDOR), privilege escalation, path traversal
+   - Sensitive data exposure: secrets in source code, unencrypted storage or transit, excessive data in API responses, PII in logs
+   - Cross-site scripting (XSS): reflected, stored, or DOM-based — user input reaching HTML, attributes, or JavaScript without escaping
+   - Insecure deserialization: untrusted data passed to deserializers (pickle, unserialize, JSON.parse of executable content, yaml.load)
+   - Security misconfiguration: verbose error messages leaking internals, debug mode in production, default credentials, overly permissive CORS
+   - Missing cryptographic controls: hardcoded keys, weak algorithms (MD5, SHA1 for passwords), predictable tokens, improper random number generation
+
+   For each vulnerability found, include the file path, the vulnerable code pattern, how an attacker would exploit it, and the specific fix. Prioritize findings by exploitability — a real injection flaw matters more than a missing security header.
+ defaultRounds: 3
+ defaultReadOnly: enforced