hatch3r 1.7.5 → 1.8.0

Files changed (75)
  1. package/README.md +2 -2
  2. package/agents/hatch3r-context-rules.md +22 -6
  3. package/agents/hatch3r-creator.md +2 -1
  4. package/agents/hatch3r-handoff-loader.md +1 -1
  5. package/agents/hatch3r-implementer.md +8 -0
  6. package/agents/hatch3r-learnings-loader.md +1 -1
  7. package/agents/hatch3r-reviewer.md +2 -0
  8. package/agents/shared/user-content-templates.md +31 -1
  9. package/commands/hatch3r-agent-customize.md +4 -0
  10. package/commands/hatch3r-api-spec.md +7 -0
  11. package/commands/hatch3r-benchmark.md +7 -0
  12. package/commands/hatch3r-board-fill.md +7 -0
  13. package/commands/hatch3r-board-groom.md +4 -0
  14. package/commands/hatch3r-board-init.md +51 -0
  15. package/commands/hatch3r-board-pickup.md +8 -0
  16. package/commands/hatch3r-board-refresh.md +4 -0
  17. package/commands/hatch3r-board-shared.md +6 -6
  18. package/commands/hatch3r-bug-plan.md +7 -0
  19. package/commands/hatch3r-codebase-map.md +8 -0
  20. package/commands/hatch3r-command-customize.md +4 -0
  21. package/commands/hatch3r-context-health.md +5 -0
  22. package/commands/hatch3r-create.md +57 -4
  23. package/commands/hatch3r-debug.md +7 -0
  24. package/commands/hatch3r-dep-audit.md +4 -0
  25. package/commands/hatch3r-feature-plan.md +7 -0
  26. package/commands/hatch3r-handoff.md +7 -0
  27. package/commands/hatch3r-healthcheck.md +4 -0
  28. package/commands/hatch3r-hooks.md +4 -0
  29. package/commands/hatch3r-learn.md +16 -0
  30. package/commands/hatch3r-migration-plan.md +7 -0
  31. package/commands/hatch3r-onboard.md +7 -0
  32. package/commands/hatch3r-pr-resolve.md +8 -1
  33. package/commands/hatch3r-project-spec.md +8 -0
  34. package/commands/hatch3r-quick-change.md +7 -0
  35. package/commands/hatch3r-recipe.md +4 -0
  36. package/commands/hatch3r-refactor-plan.md +7 -0
  37. package/commands/hatch3r-release.md +5 -0
  38. package/commands/hatch3r-revision.md +7 -0
  39. package/commands/hatch3r-roadmap.md +8 -0
  40. package/commands/hatch3r-rule-customize.md +4 -0
  41. package/commands/hatch3r-security-audit.md +4 -0
  42. package/commands/hatch3r-skill-customize.md +4 -0
  43. package/commands/hatch3r-test-plan.md +7 -0
  44. package/commands/hatch3r-workflow.md +9 -1
  45. package/dist/cli/index.js +2600 -777
  46. package/dist/cli/index.js.map +1 -1
  47. package/package.json +8 -5
  48. package/rules/hatch3r-agent-orchestration-detail.md +3 -0
  49. package/rules/hatch3r-agent-orchestration-detail.mdc +3 -0
  50. package/rules/hatch3r-agent-orchestration.md +25 -2
  51. package/rules/hatch3r-agent-orchestration.mdc +25 -2
  52. package/rules/hatch3r-iteration-summary.md +2 -0
  53. package/rules/hatch3r-iteration-summary.mdc +2 -0
  54. package/rules/hatch3r-observability-tracing-detail.md +7 -148
  55. package/rules/hatch3r-observability-tracing-detail.mdc +6 -148
  56. package/rules/hatch3r-observability-tracing.md +154 -6
  57. package/rules/hatch3r-observability-tracing.mdc +154 -6
  58. package/skills/hatch3r-agent-customize/SKILL.md +10 -0
  59. package/skills/hatch3r-ai-feature/SKILL.md +2 -0
  60. package/skills/hatch3r-api-spec/SKILL.md +68 -0
  61. package/skills/hatch3r-cli-csvkit/SKILL.md +2 -2
  62. package/skills/hatch3r-cli-duckdb/SKILL.md +3 -3
  63. package/skills/hatch3r-cli-jq/SKILL.md +4 -0
  64. package/skills/hatch3r-cli-miller/SKILL.md +2 -2
  65. package/skills/hatch3r-cli-overview/SKILL.md +1 -1
  66. package/skills/{hatch3r-cli-xsv → hatch3r-cli-qsv}/SKILL.md +20 -18
  67. package/skills/hatch3r-cli-stagehand/SKILL.md +48 -16
  68. package/skills/hatch3r-command-customize/SKILL.md +10 -0
  69. package/skills/hatch3r-customize/SKILL.md +3 -0
  70. package/skills/hatch3r-design-system-detect/SKILL.md +2 -0
  71. package/skills/hatch3r-observability-verify/SKILL.md +4 -3
  72. package/skills/hatch3r-reliability-verify/SKILL.md +2 -0
  73. package/skills/hatch3r-rule-customize/SKILL.md +10 -0
  74. package/skills/hatch3r-skill-customize/SKILL.md +10 -0
  75. package/skills/hatch3r-ui-ux-verify/SKILL.md +2 -0
@@ -19,48 +19,71 @@ Browserbase Stagehand — AI-driven browser automation
19
19
 
20
20
  ## When to Use
21
21
 
22
- Reach for `stagehand` when the task is in the **browser** category and the agent would otherwise call an MCP tool or read large outputs into context.
22
+ Reach for `stagehand` when the task is in the **browser** category and the agent would otherwise call an MCP tool or read large outputs into context. v3 (released 2025-10-29) operates directly on the Chrome DevTools Protocol — choose Stagehand when the target page changes shape often enough that hand-written selectors break, or when a prompt is the most compact spec of intent.
23
23
 
24
24
  ## Token Cost
25
25
 
26
26
  CLI tools return structured stdout that fits in <1KB for typical queries; equivalent MCP calls regularly exceed 10KB.
27
27
  Reference: Anthropic engineering (Nov 4 2025) — code-execution-over-MCP yields 98.7% token reduction.
28
28
 
29
+ ## v3 Driver Model
30
+
31
+ v3 dropped the hard Playwright dependency and exposes a modular driver layer. Pick the driver that matches the host environment:
32
+
33
+ - **CDP-native (default):** Stagehand talks Chrome DevTools Protocol directly — no test-runner dependency, smallest install, Bun-compatible.
34
+ - **Playwright peer:** install `playwright-core` alongside Stagehand to reuse existing Playwright fixtures, traces, or `@playwright/test` reporters.
35
+ - **Puppeteer peer:** install `puppeteer-core` to share a launcher with existing Puppeteer scripts.
36
+ - **Patchright peer:** install `patchright-core` for stealth-patched CDP profiles.
37
+
38
+ `playwright-core`, `puppeteer-core`, and `patchright-core` are peer dependencies in v3 — install only the driver you use.
39
+
29
40
  ## Recipes
30
41
 
31
42
  ```bash
32
- npx stagehand init
43
+ npx create-browser-app
33
44
  ```
34
- Scaffold a Stagehand project with sample TypeScript actions and a `stagehand.config.ts`.
45
+ Scaffold a v3 Stagehand project with TypeScript wiring, a `stagehand.config.ts`, and an example `act`/`extract`/`observe` script. Replaces the v2 `npx stagehand init` workflow.
35
46
 
36
47
  ```bash
37
- npx stagehand run scripts/login.ts
48
+ node scripts/login.ts
38
49
  ```
39
- Execute an AI-driven action script; Stagehand resolves selectors from natural-language intent at runtime.
50
+ Execute an AI-driven action script. The script imports `Stagehand` from `@browserbasehq/stagehand`, calls `stagehand.act("click the login button")`, and Stagehand resolves the action at runtime via CDP — no test runner required.
40
51
 
41
52
  ```bash
42
- npx stagehand record --selector-mode=ai
53
+ npx browse get markdown https://example.com
43
54
  ```
44
- Record an interactive session, capturing AI-resolved selectors for replay.
55
+ One-shot page extraction via `browse-cli` (v0.6+). Returns structured Markdown the agent can consume directly; cheaper than spawning a full Stagehand session for a single read.
45
56
 
46
57
  ```bash
47
- npx stagehand observe https://example.com 'find the login form'
58
+ npx browse cdp wss://browser.example.com
59
+ ```
60
+ Attach to an existing CDP endpoint (Browserbase managed session, local Chrome, or a custom launcher). Useful when the script delegates browser lifecycle to another supervisor.
61
+
62
+ ```typescript
63
+ // scripts/observe.ts — observe primitive returns actions without executing
64
+ import { Stagehand } from "@browserbasehq/stagehand";
65
+ const stagehand = new Stagehand({ env: "LOCAL" });
66
+ await stagehand.init();
67
+ const actions = await stagehand.observe("find the login form");
68
+ console.log(JSON.stringify(actions, null, 2));
69
+ await stagehand.close();
48
70
  ```
49
- One-shot observation returns the structured action(s) without executing them. Useful for dry-run agent loops.
71
+ Dry-run agent loop: `observe` returns the candidate action set without performing it, so a caller can route the decision (execute, ask the user, or reject).
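The routing step described above (execute, ask the user, or reject) can be sketched as a small pure function. This is only an illustration: the `CandidateAction` shape, the method names, and the allow-lists are hypothetical and are not taken from Stagehand's actual return type, so adapt them to whatever `observe` returns in your version.

```typescript
// Hypothetical shape for one observed candidate. The real Stagehand
// return type may differ; these field names are assumptions.
interface CandidateAction {
  description: string;
  method: string; // e.g. "click", "fill"
}

type Decision = "execute" | "ask_user" | "reject";

// Hypothetical policy: low-risk interactions run automatically, anything
// that enters data needs confirmation, everything else is rejected.
const AUTO_METHODS = new Set(["click", "hover", "scroll"]);
const CONFIRM_METHODS = new Set(["fill", "type", "press"]);

function routeAction(action: CandidateAction): Decision {
  if (AUTO_METHODS.has(action.method)) return "execute";
  if (CONFIRM_METHODS.has(action.method)) return "ask_user";
  return "reject";
}

// Example candidates, standing in for the output of an observe call.
const observed: CandidateAction[] = [
  { description: "Login button", method: "click" },
  { description: "Password field", method: "fill" },
  { description: "Unknown widget", method: "dragAndDrop" },
];

const decisions = observed.map((a) => ({
  description: a.description,
  decision: routeAction(a),
}));
console.log(decisions);
```

Keeping the policy separate from the browser session means it can be unit-tested without an LLM round-trip.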
50
72
 
51
73
  ## Wrong Choice When
52
74
 
53
- - **Deterministic E2E test flow with stable selectors:** the AI resolution adds latency and flakiness for selectors you already control. Use `hatch3r-cli-playwright` (tier 2) instead.
54
- - **High-volume scraping at scale:** Stagehand's per-action LLM round-trip is cost-prohibitive past a few hundred pages; use the Browserbase remote-browser product or raw Playwright with explicit selectors.
55
- - **Headless CI in air-gapped environments:** Stagehand requires outbound LLM API access for selector resolution; offline environments fail open-loop.
75
+ - **High-volume scraping at scale:** Stagehand's per-action LLM round-trip is cost-prohibitive past a few hundred pages; use the Browserbase managed-browser product, raw CDP with cached locators (v3's `deepLocator`), or Stagehand's action cache once a workflow is recorded as a deterministic script.
76
+ - **Headless CI in air-gapped environments:** Stagehand requires outbound LLM API access for selector resolution; offline environments fail the `act`/`extract`/`observe` calls. Pre-record actions with v3's automatic action cache, then replay the cached deterministic script in the air-gapped runner.
77
+ - **Workflows already covered by a stable test suite:** if Playwright tests with hand-tuned locators already pass green, Stagehand adds an LLM round-trip per step with no behavioural gain. Use `hatch3r-cli-playwright` (tier 2) for the test surface; reserve Stagehand for the agent-driven exploratory flows.
56
78
 
57
79
  ## Alternatives
58
80
 
59
81
  | Tool | When to prefer |
60
82
  |------|----------------|
61
- | `hatch3r-cli-playwright` (tier 2) | Stable selectors, deterministic CI, no LLM round-trips needed |
62
- | Browserbase managed browsers | Production scale, session recording, anti-bot evasion |
63
- | Skyvern / Browser-Use | Workflow-style automation with embedded LLM agents |
83
+ | `hatch3r-cli-playwright` (tier 2) | Existing test fixtures, deterministic CI, no LLM round-trips needed |
84
+ | Browserbase managed browsers | Production scale, session recording, anti-bot evasion, CAPTCHA solving |
85
+ | Stagehand action cache (built into v3) | Same workflow re-run many times: record once, replay deterministically |
86
+ | Skyvern / Browser-Use | Workflow-style automation with embedded LLM agents and built-in task loops |
64
87
 
65
88
  ## Detection / Install
66
89
 
@@ -72,8 +95,17 @@ command -v stagehand
72
95
  Install (mac):
73
96
 
74
97
  ```bash
75
- # npm
98
+ # npm — v3 (Oct 29 2025); drivers are peer deps, install only what you use
76
99
  npm install -g @browserbasehq/stagehand
100
+ # Add a driver only if you need Playwright/Puppeteer/Patchright interop:
101
+ # npm install -g playwright-core # OR
102
+ # npm install -g puppeteer-core # OR
103
+ # npm install -g patchright-core
77
104
  ```
78
105
 
106
+ References:
107
+ - v3 release announcement (2025-10-29): https://www.browserbase.com/blog/stagehand-v3
108
+ - Latest npm releases: https://github.com/browserbase/stagehand/releases
109
+ - v3 docs: https://docs.stagehand.dev/v3/get_started/introduction
110
+
79
111
  Homepage: https://github.com/browserbase/stagehand
@@ -5,9 +5,19 @@ tags: [customize]
5
5
  quality_charter: agents/shared/quality-charter.md
6
6
  efficiency_patterns: agents/shared/efficiency-patterns.md
7
7
  cache_friendly: true
8
+ redirect_to: hatch3r-customize
8
9
  ---
9
10
  # Command Customization
10
11
 
11
12
  > **This skill has been consolidated.** Use the `hatch3r-customize` skill with `type: command`.
12
13
 
13
14
  For command-specific reference (YAML schema, examples), see the `hatch3r-command-customize` command.
15
+
16
+ ## Rejected Merge Alternative (D16.3 add-vs-remove bias)
17
+
18
+ Per `governance/audit/domains/D16-compound-system.md` SA 16.3, the default recommendation on functional overlap is MERGE rather than removal. Full deletion of this redirect file was rejected for two reasons:
19
+
20
+ 1. **Preserves UX entry points.** Users typed `/hatch3r-command-customize` or referenced the id `hatch3r-command-customize` (per `commands/hatch3r-command-customize.md:2` and sibling redirects) before consolidation. Deleting the id breaks those entry points without a redirect target.
21
+ 2. **Signals umbrella canonicality.** The `redirect_to: hatch3r-customize` frontmatter field marks `hatch3r-customize` as the single source of truth — tooling, audit scans, and adapters can resolve any redirect to the canonical without re-reading body prose.
22
+
23
+ The 13-LOC redirect cost is paid once per type; the umbrella body lives in `skills/hatch3r-customize/SKILL.md`.
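As a rough sketch of how tooling might resolve a redirect from frontmatter alone, the snippet below follows `redirect_to` fields until it reaches a skill that has none. The helper names (`readFrontmatter`, `resolveCanonical`) and the in-memory `skills` map are hypothetical; hatch3r's own resolver is not shown in this diff and may work differently.

```typescript
// Minimal frontmatter reader: handles only flat `key: value` lines between
// the opening and closing `---` fences. A real adapter would use a YAML parser.
function readFrontmatter(markdown: string): Record<string, string> {
  const lines = markdown.split(/\r?\n/);
  const fields: Record<string, string> = {};
  if (lines.length === 0 || lines[0].trim() !== "---") return fields;
  for (const line of lines.slice(1)) {
    if (line.trim() === "---") break;
    const idx = line.indexOf(":");
    if (idx > 0) fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields;
}

// Follow redirect_to links until a skill without one is reached.
// Throws on unknown ids and on redirect cycles.
function resolveCanonical(id: string, skills: Map<string, string>): string {
  const seen = new Set<string>();
  let current = id;
  while (true) {
    if (seen.has(current)) throw new Error(`redirect cycle at ${current}`);
    seen.add(current);
    const body = skills.get(current);
    if (body === undefined) throw new Error(`unknown skill id: ${current}`);
    const target = readFrontmatter(body)["redirect_to"];
    if (!target) return current;
    current = target;
  }
}

// Hypothetical in-memory stand-in for the SKILL.md files on disk.
const skills = new Map<string, string>([
  [
    "hatch3r-command-customize",
    "---\ncache_friendly: true\nredirect_to: hatch3r-customize\n---\n# Command Customization\n",
  ],
  [
    "hatch3r-customize",
    "---\ncache_friendly: true\n---\n# Artifact Customization Management\n",
  ],
]);

console.log(resolveCanonical("hatch3r-command-customize", skills)); // hatch3r-customize
```

Only the frontmatter is read, which matches the stated goal of resolving redirects without re-reading body prose.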
@@ -5,9 +5,12 @@ tags: [customize]
5
5
  quality_charter: agents/shared/quality-charter.md
6
6
  efficiency_patterns: agents/shared/efficiency-patterns.md
7
7
  cache_friendly: true
8
+ canonical_for: [hatch3r-agent-customize, hatch3r-command-customize, hatch3r-rule-customize, hatch3r-skill-customize]
8
9
  ---
9
10
  # Artifact Customization Management
10
11
 
12
+ > **Canonical entry point.** Four type-specific skills (`hatch3r-agent-customize`, `hatch3r-command-customize`, `hatch3r-rule-customize`, `hatch3r-skill-customize`) redirect here via `redirect_to: hatch3r-customize` frontmatter. Their body documents the rejected-merge alternative per `governance/audit/domains/D16-compound-system.md` SA 16.3.
13
+
11
14
  ## Quick Start
12
15
 
13
16
  ```
@@ -4,6 +4,8 @@ type: skill
4
4
  description: Detect existing design tokens, component library, and theming convention in a project before authoring new UI primitives — output a concise inventory for downstream implementers
5
5
  tags: [ui, design-system, frontend]
6
6
  quality_charter: agents/shared/quality-charter.md
7
+ efficiency_patterns: agents/shared/efficiency-patterns.md
8
+ cache_friendly: true
7
9
  ---
8
10
  # Design System Detection Workflow
9
11
 
@@ -4,6 +4,8 @@ type: skill
4
4
  description: Verification gate before declaring an agent-produced service done — OTel span coverage on request path, structured-log + trace-id correlation, SLO definition, error-tracking integration, GenAI semconv on AI features
5
5
  tags: [review, performance, devops]
6
6
  quality_charter: agents/shared/quality-charter.md
7
+ efficiency_patterns: agents/shared/efficiency-patterns.md
8
+ cache_friendly: true
7
9
  ---
8
10
  # Observability Verification Gate
9
11
 
@@ -79,7 +81,7 @@ Never under-fan-out to save tokens. Token cost is dominated by quality and compl
79
81
  Applies only when the feature calls an LLM or runs an agent:
80
82
 
81
83
  - GenAI semconv span on every LLM call carrying `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.response.finish_reasons`. Cache-hit flag emitted as a span attribute when the provider returns one.
82
- - Tools invoked by the agent emit `tool.{name}.execute` spans per `rules/hatch3r-observability-tracing-detail.md`. Each tool span carries `tool.name`, `tool.input_hash`, `tool.output_status`, `tool.duration_ms`.
84
+ - Tools invoked by the agent emit `tool.{name}.execute` spans per `rules/hatch3r-observability-tracing.md` § "AI Agent Instrumentation". Each tool span carries `tool.name`, `tool.input_hash`, `tool.output_status`, `tool.duration_ms`.
83
85
  - Cost telemetry per request: a metric counter `gen_ai.tokens_total{direction, model, agent_name}` and a histogram `gen_ai.request_duration_ms`.
84
86
  - GenAI spans sampled at 50-100% in production — higher than general spans because volume is low and per-call cost is high.
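One way to check the attribute requirement in the first bullet is to compare a recorded span's attributes against the listed keys. The sketch below works on a plain object rather than a real OpenTelemetry span, and the sample values (`example-provider`, `example-model`) are placeholders; the function name `missingGenAiAttributes` is hypothetical.

```typescript
// Attribute keys named in the checklist above.
const REQUIRED_GENAI_ATTRIBUTES = [
  "gen_ai.system",
  "gen_ai.request.model",
  "gen_ai.usage.input_tokens",
  "gen_ai.usage.output_tokens",
  "gen_ai.response.finish_reasons",
] as const;

type SpanAttributes = Record<string, string | number | string[]>;

// Return the required keys that a recorded span is missing.
function missingGenAiAttributes(attrs: SpanAttributes): string[] {
  return REQUIRED_GENAI_ATTRIBUTES.filter((key) => !(key in attrs));
}

// Hypothetical span attributes, as a test exporter might capture them.
const llmSpan: SpanAttributes = {
  "gen_ai.system": "example-provider",
  "gen_ai.request.model": "example-model",
  "gen_ai.usage.input_tokens": 1200,
  "gen_ai.usage.output_tokens": 340,
};

// Reports gen_ai.response.finish_reasons as the only missing key.
console.log(missingGenAiAttributes(llmSpan));
```

A gate could fail whenever this list is non-empty for any span on the LLM call path.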
85
87
 
@@ -119,8 +121,7 @@ The orchestrator running this skill emits a single-line verdict per gate (`GATE_
119
121
  - `rules/hatch3r-observability.md`
120
122
  - `rules/hatch3r-observability-logging.md`
121
123
  - `rules/hatch3r-observability-metrics.md`
122
- - `rules/hatch3r-observability-tracing.md`
123
- - `rules/hatch3r-observability-tracing-detail.md`
124
+ - `rules/hatch3r-observability-tracing.md` (includes AI agent instrumentation; was previously split as `-detail`)
124
125
 
125
126
  ## References
126
127
 
@@ -4,6 +4,8 @@ type: skill
4
4
  description: Reliability verification gate before declaring an agent-produced service done — SLO defined, kill switch, timeouts, retries, probes, runbook, staged rollout
5
5
  tags: [review, devops]
6
6
  quality_charter: agents/shared/quality-charter.md
7
+ efficiency_patterns: agents/shared/efficiency-patterns.md
8
+ cache_friendly: true
7
9
  ---
8
10
  # Reliability Verification Gate
9
11
 
@@ -5,9 +5,19 @@ tags: [customize]
5
5
  quality_charter: agents/shared/quality-charter.md
6
6
  efficiency_patterns: agents/shared/efficiency-patterns.md
7
7
  cache_friendly: true
8
+ redirect_to: hatch3r-customize
8
9
  ---
9
10
  # Rule Customization
10
11
 
11
12
  > **This skill has been consolidated.** Use the `hatch3r-customize` skill with `type: rule`.
12
13
 
13
14
  For rule-specific reference (scope overrides, YAML schema), see the `hatch3r-rule-customize` command.
15
+
16
+ ## Rejected Merge Alternative (D16.3 add-vs-remove bias)
17
+
18
+ Per `governance/audit/domains/D16-compound-system.md` SA 16.3, the default recommendation on functional overlap is MERGE rather than removal. Full deletion of this redirect file was rejected for two reasons:
19
+
20
+ 1. **Preserves UX entry points.** Users typed `/hatch3r-rule-customize` or referenced the id `hatch3r-rule-customize` (per `rules/hatch3r-browser-verification.md:57` and sibling cross-references) before consolidation. Deleting the id breaks those entry points without a redirect target.
21
+ 2. **Signals umbrella canonicality.** The `redirect_to: hatch3r-customize` frontmatter field marks `hatch3r-customize` as the single source of truth — tooling, audit scans, and adapters can resolve any redirect to the canonical without re-reading body prose.
22
+
23
+ The 13-LOC redirect cost is paid once per type; the umbrella body lives in `skills/hatch3r-customize/SKILL.md`.
@@ -5,9 +5,19 @@ tags: [customize]
5
5
  quality_charter: agents/shared/quality-charter.md
6
6
  efficiency_patterns: agents/shared/efficiency-patterns.md
7
7
  cache_friendly: true
8
+ redirect_to: hatch3r-customize
8
9
  ---
9
10
  # Skill Customization
10
11
 
11
12
  > **This skill has been consolidated.** Use the `hatch3r-customize` skill with `type: skill`.
12
13
 
13
14
  For skill-specific reference (YAML schema, examples), see the `hatch3r-skill-customize` command.
15
+
16
+ ## Rejected Merge Alternative (D16.3 add-vs-remove bias)
17
+
18
+ Per `governance/audit/domains/D16-compound-system.md` SA 16.3, the default recommendation on functional overlap is MERGE rather than removal. Full deletion of this redirect file was rejected for two reasons:
19
+
20
+ 1. **Preserves UX entry points.** Users typed `/hatch3r-skill-customize` or referenced the id `hatch3r-skill-customize` (per `rules/hatch3r-browser-verification.md:58` and sibling cross-references) before consolidation. Deleting the id breaks those entry points without a redirect target.
21
+ 2. **Signals umbrella canonicality.** The `redirect_to: hatch3r-customize` frontmatter field marks `hatch3r-customize` as the single source of truth — tooling, audit scans, and adapters can resolve any redirect to the canonical without re-reading body prose.
22
+
23
+ The 13-LOC redirect cost is paid once per type; the umbrella body lives in `skills/hatch3r-customize/SKILL.md`.
@@ -4,6 +4,8 @@ type: skill
4
4
  description: UI/UX verification gate before declaring a feature done — axe-core, scripted keyboard trace, accessibility-tree snapshot, four-state coverage, visual-regression baseline, one human screen-reader pass per release
5
5
  tags: [ui, ux, a11y]
6
6
  quality_charter: agents/shared/quality-charter.md
7
+ efficiency_patterns: agents/shared/efficiency-patterns.md
8
+ cache_friendly: true
7
9
  ---
8
10
  # UI/UX Verification Gate
9
11