all-hands-cli 0.1.11 → 0.1.13
|
@@ -1,203 +1,157 @@
|
|
|
1
1
|
# Validation Tooling
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
Programmatic validation replaces human supervision. Validation suites compound from stochastic exploration into deterministic gates.
|
|
4
4
|
|
|
5
5
|
## Crystallization Lifecycle
|
|
6
6
|
|
|
7
|
-
|
|
8
|
-
|
|
9
|
-
1. **Stochastic exploration** — Agent-driven exploratory testing using model intuition discovers patterns
|
|
7
|
+
1. **Stochastic exploration** — Agent-driven exploratory testing discovers patterns
|
|
10
8
|
2. **Pattern crystallization** — Discovered patterns become deterministic checks
|
|
11
9
|
3. **CI/CD entrenchment** — Deterministic checks gate releases
|
|
12
10
|
4. **Frontier shift** — Stochastic exploration moves to new unknowns
|
|
13
11
|
|
|
14
|
-
|
|
12
|
+
Every domain has both a stochastic dimension (exploratory) and a deterministic dimension (binary pass/fail).
|
|
15
13
|
|
|
16
14
|
## Suite Existence Threshold
|
|
17
15
|
|
|
18
|
-
A
|
|
16
|
+
A suite must have a meaningful stochastic dimension to justify existing. Deterministic-only tools (type checking, linting, formatting) are test commands in acceptance criteria and CI/CD — they are NOT suites.
|
|
19
17
|
|
|
20
18
|
## Repository Agnosticism
|
|
21
19
|
|
|
22
|
-
This
|
|
23
|
-
- Reference existing default validation suites shipped with this repo (currently: xcode-automation, browser-automation)
|
|
24
|
-
- Use generic/hypothetical descriptions that any target repository can map to their own context
|
|
25
|
-
|
|
26
|
-
When examples are needed, use **snippets from the existing default suites** rather than naming suites or commands that belong to a specific target project. Target repositories create their own suites for their domains — this file teaches how to create and structure them, not what they should be called.
|
|
20
|
+
This file MUST NOT contain project-specific references. All examples must either reference default suites shipped with this repo (currently: xcode-automation, browser-automation) or use generic descriptions. This file teaches patterns, not inventories.
|
|
27
21
|
|
|
28
|
-
|
|
22
|
+
Project-specific references cause agents to look for suites that don't exist in target repos and couple the harness to a single project. If a pattern needs a concrete example, draw it from xcode-automation or browser-automation.
|
|
29
23
|
|
|
30
24
|
## Creating Validation Tooling
|
|
31
25
|
|
|
32
26
|
Follow `.allhands/flows/shared/CREATE_VALIDATION_TOOLING_SPEC.md` for the full process. This creates a spec, not an implementation.
|
|
33
27
|
|
|
34
28
|
### Research Phase
|
|
35
|
-
-
|
|
36
|
-
-
|
|
29
|
+
- `ah tavily search "<validation_type> testing tools"` for available tools
|
|
30
|
+
- `ah perplexity research "best practices <validation_type> testing <technology>"` for best practices
|
|
37
31
|
- Determine whether the domain has a meaningful stochastic dimension before proceeding
|
|
38
|
-
-
|
|
32
|
+
- `ah tools --list` to check existing MCP integrations
|
|
39
33
|
|
|
40
34
|
### Tool Validation Phase
|
|
41
|
-
|
|
35
|
+
Research produces assumptions; running the tool produces ground truth:
|
|
42
36
|
- Install and verify tool responds to `--help`
|
|
43
37
|
- Create a minimal test target (temp directory, not committed)
|
|
44
38
|
- Execute representative stochastic workflows
|
|
45
|
-
-
|
|
39
|
+
- Try commands against codebase-relevant scenarios
|
|
46
40
|
- Document divergences from researched documentation
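The first two checks in the Tool Validation Phase list can be sketched in shell. This is a minimal sketch under stated assumptions, not part of the harness: `verify_tool` and `make_test_target` are hypothetical helper names, and the only thing assumed about the tool is that it exits 0 when given `--help`.

```bash
# Hypothetical helpers for the Tool Validation Phase; not shipped with the harness.

# Succeeds only if the tool can be resolved by the shell and
# answers --help with exit status 0.
verify_tool() {
  command -v "$1" >/dev/null 2>&1 || return 1
  "$1" --help >/dev/null 2>&1
}

# Creates a throwaway directory for a minimal test target and prints its path.
# The directory lives under the system temp location, so it is never committed.
make_test_target() {
  mktemp -d "${TMPDIR:-/tmp}/validation-target.XXXXXX"
}
```

A typical call would be `verify_tool <tool> && target=$(make_test_target)`, with the directory removed once the representative workflows have been exercised.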
|
|
47
41
|
|
|
48
42
|
### Suite Writing Philosophy
|
|
49
43
|
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
-
|
|
53
|
-
- **
|
|
54
|
-
- **Motivation framing**: Frame around harness value: reducing human-in-loop supervision, verifying code quality, confirming implementation matches expectations.
|
|
55
|
-
- **Exploration categories**: Describe with enough command specificity to orient. For untested territory, prefer motivations over prescriptive sequences — the agent extrapolates better from goals than rigid steps. For patterns verified through testing, state them authoritatively (see below).
|
|
44
|
+
- **`--help` as prerequisite**: Suites MUST instruct agents to run `<tool> --help` before exploration. Suites MUST NOT replicate full command docs.
|
|
45
|
+
- **Inline command examples**: Weave brief examples into use-case motivations as calibration anchors — not exhaustive catalogs.
|
|
46
|
+
- **Motivation framing**: Frame around reducing human-in-loop supervision, verifying quality, confirming implementation matches expectations.
|
|
47
|
+
- **Exploration categories**: Enough command specificity to orient. Untested territory: motivations over prescriptive sequences. Verified patterns: state authoritatively.
|
|
56
48
|
|
|
57
|
-
Formula: **motivations
|
|
49
|
+
Formula: **motivations + inline command examples + `--help` for progressive disclosure**.
|
|
58
50
|
|
|
59
51
|
### Proven vs Untested Guidance
|
|
60
52
|
|
|
61
|
-
Validation
|
|
53
|
+
- **Proven patterns** (verified via Tool Validation Phase): State authoritatively within use-case motivations. Override generic tool docs when they conflict. Example: "`xctrace` requires `--device '<UDID>'` for simulator" is a hard requirement discovered through testing, stated directly alongside the motivation.
|
|
54
|
+
- **Untested edge cases**: Define the motivation and reference analogous solved examples. Do NOT write prescriptive steps for unverified scenarios — frontier models given clear motivation and a reference example extrapolate better than they follow rigid untested instructions.
|
|
62
55
|
|
|
63
|
-
|
|
64
|
-
- **Untested edge cases** (not yet exercised in this repo): Define the **motivation** (what the agent should achieve and why) and reference **analogous solved examples** from proven patterns. Do NOT write prescriptive step-by-step instructions for scenarios that haven't been verified — unverified prescriptions can mislead the agent into rigid sequences that don't match reality. Instead, trust that a frontier model given clear motivation and a reference example of how a similar problem was solved will extrapolate the correct approach through stochastic exploration.
|
|
65
|
-
|
|
66
|
-
**Why this matters**: Frontier models produce emergent, adaptive behavior when given goals and reference points. Unverified prescriptive instructions constrain this emergence and risk encoding incorrect assumptions. Motivation + examples activate the model's reasoning about the problem space; rigid untested instructions bypass it. The Tool Validation Phase exists to convert untested guidance into proven patterns over time — the crystallization lifecycle in action.
|
|
56
|
+
The Tool Validation Phase converts untested guidance into proven patterns over time — the crystallization lifecycle in action.
|
|
67
57
|
|
|
68
58
|
### Evidence Capture
|
|
69
59
|
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
- **Agent (self-verification)**: Primitives used during the observe-act-verify loop (state checks, assertions, console output). Real-time, not recorded.
|
|
73
|
-
- **Engineer (review artifacts)**: Trust evidence produced after exploration (recordings, screenshots, traces, reports).
|
|
60
|
+
- **Agent (self-verification)**: State checks, assertions, console output during observe-act-verify. Real-time, not recorded.
|
|
61
|
+
- **Engineer (review artifacts)**: Recordings, screenshots, traces, reports produced after exploration.
|
|
74
62
|
|
|
75
63
|
Pattern: explore first, capture second.
|
|
76
64
|
|
|
77
65
|
## Validation Suite Schema
|
|
78
66
|
|
|
79
|
-
Run `ah schema validation-suite` for the authoritative schema. Key sections
|
|
67
|
+
Run `ah schema validation-suite` for the authoritative schema. Key sections:
|
|
80
68
|
|
|
81
|
-
- **Stochastic Validation**: Agent-driven exploratory testing
|
|
69
|
+
- **Stochastic Validation**: Agent-driven exploratory testing
|
|
82
70
|
- **Deterministic Integration**: Binary pass/fail commands that gate completion
|
|
83
71
|
|
|
84
|
-
List available suites: `ah validation-tools list`
|
|
85
|
-
|
|
86
72
|
## Integration with Prompt Execution
|
|
87
73
|
|
|
88
|
-
Prompt files reference
|
|
89
|
-
1. Agent reads
|
|
90
|
-
2. Agent runs
|
|
74
|
+
Prompt files reference suites in `validation_suites` frontmatter. During execution:
|
|
75
|
+
1. Agent reads **Stochastic Validation** during implementation
|
|
76
|
+
2. Agent runs **Deterministic Integration** for acceptance criteria gating
|
|
91
77
|
3. Validation review (`PROMPT_VALIDATION_REVIEW.md`) confirms pass/fail
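As an illustration of step 1, the suites named in a prompt file could be read out of its frontmatter with a small shell function. This is only a sketch: `list_validation_suites` is a hypothetical name, and the layout it parses (frontmatter delimited by `---` lines, with `validation_suites` written as an inline list) is an assumption, since this diff does not show the prompt file format.

```bash
# Hypothetical helper: prints one suite name per line from a prompt file.
# Assumed layout (not confirmed by the source):
#   ---
#   validation_suites: [browser-automation, xcode-automation]
#   ---
list_validation_suites() {
  awk '
    /^---[[:space:]]*$/ { fences++; next }     # count frontmatter delimiters
    fences == 1 && /^validation_suites:/ {
      sub(/^validation_suites:[[:space:]]*/, "")
      gsub(/\[|\]|"|,/, " ")                   # drop brackets, quotes, commas
      n = split($0, names, " ")
      for (i = 1; i <= n; i++) print names[i]
    }
    fences >= 2 { exit }                       # ignore the prompt body
  ' "$1"
}
```

Only the first frontmatter block is inspected, so a `validation_suites:` line that appears in the prompt body is ignored.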
|
|
92
78
|
|
|
93
79
|
## Command Documentation Principle
|
|
94
80
|
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
**External tooling commands — Document explicitly**: Commands from external tools (`xctrace`, `xcrun simctl`, `agent-browser`, `playwright`, `curl`, etc.) are stable, unfamiliar to agents by default, and unlikely to change with codebase evolution. Document specific commands, flags, and use cases inline with motivations. Example from xcode-automation: `xcrun xctrace record --template 'Time Profiler' --device '<UDID>' --attach '<PID>'` — the flags, ordering constraints, and PID discovery method are all external tool knowledge that the suite documents explicitly.
|
|
98
|
-
|
|
99
|
-
**Internal codebase commands — Document patterns, not inventories**: Project-specific scripts, test commands, and codebase-specific CLI wrappers evolve rapidly. Instead:
|
|
100
|
-
1. **Document core infrastructure commands explicitly** — commands that boot services, manage environments, and are foundational to validation in the target project. These are stable and essential per-project, but suites should teach agents how to discover them (e.g., "check `package.json` scripts" or "run `--help`"), not hardcode specific script names.
|
|
101
|
-
2. **Teach patterns for everything else** — naming conventions, where to discover project commands, what categories mean, and how to build upon them.
|
|
102
|
-
3. **Document motivations** — why different test categories exist, when to use which, what confidence each provides.
|
|
103
|
-
|
|
104
|
-
Per **Frontier Models are Capable**: An agent given patterns + motivations + discovery instructions outperforms one given stale command inventories. Suites that teach patterns age gracefully; suites that enumerate commands require maintenance on every change.
|
|
81
|
+
- **External tooling** (xctrace, simctl, playwright, etc.) — Document explicitly: commands, flags, use cases inline with motivations. Stable and unfamiliar to agents by default. Example from xcode-automation: `xcrun xctrace record --template 'Time Profiler' --device '<UDID>' --attach '<PID>'` — flags, ordering constraints, and PID discovery are external tool knowledge that belongs in the suite.
|
|
82
|
+
- **Internal codebase commands** — Document patterns, not inventories: teach discovery (`package.json` scripts, `--help`), naming conventions, motivations for test categories. Pattern-based suites age gracefully; command inventories require constant maintenance.
|
|
105
83
|
|
|
106
84
|
## Decision Tree Requirement
|
|
107
85
|
|
|
108
|
-
Every
|
|
109
|
-
- Distinguish
|
|
110
|
-
- Show where
|
|
111
|
-
- Surface
|
|
112
|
-
- Cleanly articulate multiple expected use cases within a single suite
|
|
86
|
+
Every suite MUST include a decision tree routing agents to the correct validation approach:
|
|
87
|
+
- Distinguish relevant instructions per scenario (e.g., UI-only vs full E2E)
|
|
88
|
+
- Show where stochastic vs deterministic testing applies
|
|
89
|
+
- Surface branch points where other suites must be utilized (e.g., "Does this branch have native code changes? → Yes → follow xcode-automation decision tree")
|
|
113
90
|
|
|
114
|
-
The decision tree replaces flat prerequisite lists with structured routing
|
|
91
|
+
The decision tree replaces flat prerequisite lists with structured routing — an agent follows the branch matching their situation, skipping irrelevant setup.
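The routing idea can be made concrete with a small shell sketch. Decision trees in suites are written as prose for agents, so this is only an analogy: `route_suites` is a hypothetical function, and the mapping from file extensions to the two default suites is an illustrative assumption, not a rule defined by the harness.

```bash
# Hypothetical router: reads changed file paths on stdin and prints the
# default suites whose decision trees an agent would follow next.
# The extension-to-suite mapping below is an assumption for illustration.
route_suites() {
  native=0
  web=0
  while IFS= read -r path; do
    case "$path" in
      *.swift|*.xcodeproj/*) native=1 ;;    # native code changed
      *.tsx|*.jsx|*.html|*.css) web=1 ;;    # browser UI changed
    esac
  done
  [ "$web" -eq 1 ] && echo "browser-automation"
  [ "$native" -eq 1 ] && echo "xcode-automation"
  return 0
}
```

For example, `git diff --name-only main | route_suites` would print both suite names for a cross-platform change and nothing for a documentation-only change, which mirrors how a branch point skips irrelevant setup.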
|
|
115
92
|
|
|
116
93
|
## tmux Session Management Standard
|
|
117
94
|
|
|
118
|
-
|
|
95
|
+
Suites requiring long-running processes MUST use tmux:
|
|
119
96
|
|
|
120
97
|
```bash
|
|
121
|
-
#
|
|
98
|
+
# -t $TMUX_PANE pins split to agent's window, not user's focused window
|
|
122
99
|
tmux split-window -h -d -t $TMUX_PANE \
|
|
123
100
|
-c /path/to/repo '<command>'
|
|
124
101
|
```
|
|
125
102
|
|
|
126
|
-
**Observability**:
|
|
127
|
-
|
|
128
|
-
**
|
|
129
|
-
|
|
130
|
-
**Worktree isolation**: Each worktree uses unique ports (via `.env.local`), so tmux sessions in different worktrees don't conflict. Agents must use the correct repo path (`-c`) for the worktree they're operating in.
|
|
103
|
+
- **Observability**: Verify via `tmux capture-pane -p -t <pane_id>` before proceeding
|
|
104
|
+
- **Teardown**: Reverse order. `tmux send-keys -t <pane_id> C-c` or kill the pane
|
|
105
|
+
- **Worktree isolation**: Unique ports per worktree (`.env.local`), correct repo path (`-c`)
|
|
131
106
|
|
|
132
107
|
Reference xcode-automation as the canonical tmux pattern.
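The observability and teardown bullets can be sketched as two shell helpers. This is a hedged sketch rather than the canonical xcode-automation pattern: `wait_for_pane` and `teardown_panes` are hypothetical names, and the readiness marker is whatever text the long-running process is expected to print. Only `tmux capture-pane -p -t` and `tmux send-keys -t ... C-c` from the text above are used.

```bash
# Hypothetical wrappers around the tmux commands named in this section.

# Polls a pane's visible output until a readiness marker appears.
# Usage: wait_for_pane <pane_id> <marker> [attempts]
wait_for_pane() {
  pane=$1
  marker=$2
  attempts=${3:-30}
  while [ "$attempts" -gt 0 ]; do
    if tmux capture-pane -p -t "$pane" | grep -qF -- "$marker"; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}

# Sends Ctrl-C to each pane in the reverse of the order given,
# so the process started last is stopped first.
# Usage: teardown_panes <pane_id>...
teardown_panes() {
  reversed=""
  for pane in "$@"; do
    reversed="$pane $reversed"
  done
  for pane in $reversed; do
    tmux send-keys -t "$pane" C-c
  done
}
```

To obtain a pane id for these helpers, `tmux split-window` accepts `-P -F '#{pane_id}'`, which prints the id of the pane it creates.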
|
|
133
108
|
|
|
134
109
|
## Hypothesis-First Validation Workflow
|
|
135
110
|
|
|
136
|
-
New suites
|
|
111
|
+
New suites: draft, then test on a feature branch before marking guidance as proven.
|
|
137
112
|
|
|
138
|
-
1. **Draft**: Write suite
|
|
139
|
-
2. **Test on feature branch**:
|
|
140
|
-
3. **Verify & adjust**: Document what works, what doesn't
|
|
141
|
-
4. **Solidify**: Only
|
|
113
|
+
1. **Draft**: Write suite based on plan/analysis (mark unverified practices as hypotheses)
|
|
114
|
+
2. **Test on feature branch**: Exercise practices hands-on
|
|
115
|
+
3. **Verify & adjust**: Document what works, what doesn't
|
|
116
|
+
4. **Solidify**: Only verified practices become authoritative guidance
|
|
142
117
|
|
|
143
|
-
The plan/handoff document persists as the hypothesis record
|
|
118
|
+
The plan/handoff document persists as the hypothesis record for future work.
|
|
144
119
|
|
|
145
120
|
## Cross-Referencing Between Suites
|
|
146
121
|
|
|
147
|
-
**Reference**
|
|
148
|
-
|
|
149
|
-
**Inline** when the command is simple and stable (e.g., `xcrun simctl boot <UDID>`) — no need to send agents to another document for a single command.
|
|
150
|
-
|
|
151
|
-
Decision trees are the natural place for cross-references — branch points that route to another suite's decision tree. Example from browser-automation: "Does the change affect native iOS rendering? → Yes → follow xcode-automation decision tree for build and simulator verification."
|
|
152
|
-
|
|
153
|
-
## Testing Scenario Matrix
|
|
154
|
-
|
|
155
|
-
Target repositories should build a scenario matrix mapping their validation scenarios to suite combinations. The matrix documents which suites apply to which types of changes, so agents can quickly determine what validation is needed. Structure as a table:
|
|
122
|
+
- **Reference** for complex multi-step setup — point to the authoritative suite's decision tree
|
|
123
|
+
- **Inline** for simple, stable commands — no redirect needed for a single command
|
|
156
124
|
|
|
157
|
-
|
|
158
|
-
|----------|----------|-------|
|
|
159
|
-
| _Description of change type_ | _Which suites apply_ | _Any special setup or cross-references_ |
|
|
125
|
+
Decision tree branch points are the natural place for cross-references.
|
|
160
126
|
|
|
161
|
-
|
|
127
|
+
## Suite Discoverability
|
|
162
128
|
|
|
163
|
-
|
|
164
|
-
|----------|----------|-------|
|
|
165
|
-
| Browser UI changes only | browser-automation | Dev server must be running |
|
|
166
|
-
| Native iOS/macOS changes | xcode-automation | Simulator setup via session defaults |
|
|
167
|
-
| Cross-platform changes (web + native) | browser-automation + xcode-automation | Each suite's decision tree routes to the relevant validation path |
|
|
129
|
+
Suite discovery is programmatic, not manual. No maintained inventories or mapping tables.
|
|
168
130
|
|
|
169
|
-
|
|
131
|
+
- **During creation**: `ah validation-tools list` — check for overlap and cross-reference points before creating a new suite.
|
|
132
|
+
- **During utilization**: Agents run `ah validation-tools list` to discover suites via glob patterns and descriptions. Decision trees handle routing.
|
|
170
133
|
|
|
171
134
|
## Environment Management Patterns
|
|
172
135
|
|
|
173
|
-
|
|
174
|
-
|
|
175
|
-
**ENV injection**: Document how the target project injects environment variables for different contexts (local development, testing, production). Suites should teach the pattern (e.g., "check for `.env.*` files and wrapper scripts") rather than hardcoding specific variable names.
|
|
176
|
-
|
|
177
|
-
**Service isolation**: When validation requires running services (dev servers, databases, bundlers), document how to avoid port conflicts across concurrent worktrees or parallel agent sessions. Reference the suite's ENV Configuration table for relevant variables.
|
|
178
|
-
|
|
179
|
-
**Worktree isolation**: Each worktree should use unique ports and isolated service instances where possible. Suites should document which resources need isolation and how to configure it (e.g., xcode-automation documents simulator isolation via dedicated simulator clones and derived data paths).
|
|
180
|
-
|
|
181
|
-
## Suite Creation Guidance
|
|
136
|
+
Suites depending on environment configuration should document:
|
|
182
137
|
|
|
183
|
-
|
|
138
|
+
- **ENV injection**: Teach discovery patterns (e.g., "check `.env.*` files") rather than hardcoding variable names
|
|
139
|
+
- **Service isolation**: How to avoid port conflicts across concurrent worktrees/sessions
|
|
140
|
+
- **Worktree isolation**: Unique ports and isolated service instances per worktree
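The ENV injection bullet describes a discovery pattern; one way it might look in shell is sketched below. `list_env_names` is a hypothetical helper, and the assumption is that the target project keeps `KEY=value` or `export KEY=value` lines in `.env.*` files. It prints variable names only, never values, so nothing project-specific has to be hardcoded in a suite.

```bash
# Hypothetical helper: lists variable names (not values) declared in the
# .env.* files of a directory, sorted and de-duplicated.
# Usage: list_env_names [dir]
list_env_names() {
  dir=${1:-.}
  for file in "$dir"/.env.*; do
    [ -f "$file" ] || continue
    # Strip an optional "export " prefix, then print KEY from KEY=value lines.
    sed -n \
      -e 's/^[[:space:]]*export[[:space:]]\{1,\}//' \
      -e 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' \
      "$file"
  done | sort -u
}
```

Running it in two worktrees and comparing the values of any port-like names is one way to spot the conflicts that the service and worktree isolation bullets warn about.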
|
|
184
141
|
|
|
185
|
-
|
|
142
|
+
## Suite Creation Checklist
|
|
186
143
|
|
|
187
|
-
|
|
188
|
-
|
|
189
|
-
|
|
190
|
-
|
|
191
|
-
|
|
192
|
-
|
|
193
|
-
6. Document proven vs untested guidance per the Hypothesis-First Validation Workflow
|
|
144
|
+
1. Follow `ah schema validation-suite`
|
|
145
|
+
2. Validate stochastic dimension meets existence threshold
|
|
146
|
+
3. External tools explicit, internal commands via patterns + discovery
|
|
147
|
+
4. Include a Decision Tree
|
|
148
|
+
5. Use tmux standard for long-running processes
|
|
149
|
+
6. Mark proven vs untested guidance
|
|
194
150
|
7. Cross-reference other suites at decision tree branch points
|
|
195
151
|
|
|
196
|
-
**Structural templates
|
|
197
|
-
- xcode-automation — external-tool-heavy suite (MCP tools, xctrace, simctl). Reference for suites that primarily wrap external CLI tools with agent-driven exploration.
|
|
198
|
-
- browser-automation — dual-dimension suite (agent-browser stochastic, Playwright deterministic). Reference for suites that have both agent-driven exploration and scripted CI-gated tests.
|
|
152
|
+
**Structural templates**: xcode-automation (external-tool-heavy), browser-automation (dual stochastic/deterministic).
|
|
199
153
|
|
|
200
154
|
## Related References
|
|
201
155
|
|
|
202
|
-
- [`tools-commands-mcp-hooks.md`](tools-commands-mcp-hooks.md) —
|
|
203
|
-
- [`knowledge-compounding.md`](knowledge-compounding.md) —
|
|
156
|
+
- [`tools-commands-mcp-hooks.md`](tools-commands-mcp-hooks.md) — Validation using hooks, CLI commands, or MCP tools
|
|
157
|
+
- [`knowledge-compounding.md`](knowledge-compounding.md) — Crystallized patterns compounding into persistent knowledge
|
package/package.json
CHANGED
package/.allhands/README.md
DELETED
|
@@ -1,75 +0,0 @@
|
|
|
1
|
-
# All Hands CLI
|
|
2
|
-
|
|
3
|
-
Internal CLI for the All Hands agentic harness.
|
|
4
|
-
|
|
5
|
-
## Installation
|
|
6
|
-
|
|
7
|
-
```bash
|
|
8
|
-
cd .allhands/harness
|
|
9
|
-
npm install
|
|
10
|
-
```
|
|
11
|
-
|
|
12
|
-
The `ah` command is automatically installed to `~/.local/bin/ah` when you run `npx all-hands init`. This shim finds and executes the project-local `.allhands/harness/ah` from any subdirectory.
|
|
13
|
-
|
|
14
|
-
For local development, copy the shim to your PATH:
|
|
15
|
-
```bash
|
|
16
|
-
cp .allhands/harness/ah ~/.local/bin/ah
|
|
17
|
-
```
|
|
18
|
-
|
|
19
|
-
### Universal Ctags (for `ah docs` command)
|
|
20
|
-
|
|
21
|
-
```bash
|
|
22
|
-
# macOS
|
|
23
|
-
brew install universal-ctags
|
|
24
|
-
|
|
25
|
-
# Ubuntu/Debian
|
|
26
|
-
sudo apt install universal-ctags
|
|
27
|
-
```
|
|
28
|
-
|
|
29
|
-
### AST-grep (for advanced code search)
|
|
30
|
-
|
|
31
|
-
```bash
|
|
32
|
-
# macOS
|
|
33
|
-
brew install ast-grep
|
|
34
|
-
|
|
35
|
-
# cargo
|
|
36
|
-
cargo install ast-grep --locked
|
|
37
|
-
```
|
|
38
|
-
|
|
39
|
-
### Desktop Notifications (macOS)
|
|
40
|
-
|
|
41
|
-
```bash
|
|
42
|
-
brew install --cask notifier
|
|
43
|
-
```
|
|
44
|
-
|
|
45
|
-
## Language Servers (for LSP tool)
|
|
46
|
-
|
|
47
|
-
```bash
|
|
48
|
-
npm install -g typescript-language-server typescript pyright
|
|
49
|
-
brew install swift
|
|
50
|
-
```
|
|
51
|
-
|
|
52
|
-
## Environment Variables
|
|
53
|
-
|
|
54
|
-
Check `.env.ai.example` for what you should populate `.env.ai` with.
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
## Quick Start
|
|
58
|
-
|
|
59
|
-
```bash
|
|
60
|
-
ah <command>
|
|
61
|
-
```
|
|
62
|
-
|
|
63
|
-
The `ah` command works from any directory within an all-hands project.
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
## Project Settings
|
|
67
|
-
|
|
68
|
-
Project-specific configuration lives in `.allhands/settings.json`:
|
|
69
|
-
|
|
70
|
-
```json
|
|
71
|
-
{
|
|
72
|
-
"$schema": "./harness/src/schemas/settings.schema.json",
|
|
73
|
-
}
|
|
74
|
-
```
|
|
75
|
-
|