npm - all-hands-cli - Versions diffs - 0.1.0 - Mend

all-hands-cli 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (305)

package/.allhands/README.md +75 -0
package/.allhands/agents/compounder.yaml +15 -0
package/.allhands/agents/coordinator.yaml +17 -0
package/.allhands/agents/documentor.yaml +15 -0
package/.allhands/agents/e2e-test-planner.yaml +17 -0
package/.allhands/agents/emergent.yaml +22 -0
package/.allhands/agents/executor.yaml +14 -0
package/.allhands/agents/ideation.yaml +11 -0
package/.allhands/agents/initiative-steering.yaml +19 -0
package/.allhands/agents/judge.yaml +13 -0
package/.allhands/agents/planner.yaml +19 -0
package/.allhands/agents/pr-reviewer.yaml +15 -0
package/.allhands/docs.json +5 -0
package/.allhands/docs.local.json +26 -0
package/.allhands/flows/COMPOUNDING.md +203 -0
package/.allhands/flows/COORDINATION.md +89 -0
package/.allhands/flows/CORE.md +87 -0
package/.allhands/flows/DOCUMENTATION.md +218 -0
package/.allhands/flows/E2E_TEST_PLAN_BUILDING.md +140 -0
package/.allhands/flows/EMERGENT_PLANNING.md +57 -0
package/.allhands/flows/IDEATION_SCOPING.md +154 -0
package/.allhands/flows/INITIATIVE_STEERING.md +110 -0
package/.allhands/flows/JUDGE_REVIEWING.md +79 -0
package/.allhands/flows/PROMPT_TASK_EXECUTION.md +68 -0
package/.allhands/flows/PR_REVIEWING.md +43 -0
package/.allhands/flows/SPEC_PLANNING.md +216 -0
package/.allhands/flows/harness/WRITING_HARNESS_FLOWS.md +27 -0
package/.allhands/flows/harness/WRITING_HARNESS_KNOWLEDGE.md +27 -0
package/.allhands/flows/harness/WRITING_HARNESS_ORCHESTRATION.md +27 -0
package/.allhands/flows/harness/WRITING_HARNESS_SKILLS.md +27 -0
package/.allhands/flows/harness/WRITING_HARNESS_TOOLS.md +27 -0
package/.allhands/flows/harness/WRITING_HARNESS_VALIDATION_TOOLING.md +27 -0
package/.allhands/flows/shared/CODEBASE_UNDERSTANDING.md +72 -0
package/.allhands/flows/shared/CREATE_HARNESS_SPEC.md +48 -0
package/.allhands/flows/shared/CREATE_SPEC.md +41 -0
package/.allhands/flows/shared/CREATE_VALIDATION_TOOLING_SPEC.md +70 -0
package/.allhands/flows/shared/DOCUMENTATION_DISCOVERY.md +123 -0
package/.allhands/flows/shared/DOCUMENTATION_WRITER.md +101 -0
package/.allhands/flows/shared/EMERGENT_REFINEMENT_ANALYSIS.md +76 -0
package/.allhands/flows/shared/EXTERNAL_TECH_GUIDANCE.md +97 -0
package/.allhands/flows/shared/IDEATION_CODEBASE_GROUNDING.md +49 -0
package/.allhands/flows/shared/PLAN_DEEPENING.md +152 -0
package/.allhands/flows/shared/PROMPT_TASKS_CURATION.md +113 -0
package/.allhands/flows/shared/PROMPT_VALIDATION_REVIEW.MD +99 -0
package/.allhands/flows/shared/QUICK_PREMORTEM.md +70 -0
package/.allhands/flows/shared/RESEARCH_GUIDANCE.md +38 -0
package/.allhands/flows/shared/REVIEW_OPTIONS_BREAKDOWN.md +68 -0
package/.allhands/flows/shared/SKILL_EXTRACTION.md +84 -0
package/.allhands/flows/shared/SPEC_FLOW_ANALYSIS.md +119 -0
package/.allhands/flows/shared/TDD_WORKFLOW.md +109 -0
package/.allhands/flows/shared/UTILIZE_VALIDATION_TOOLING.md +84 -0
package/.allhands/flows/shared/WRITING_HARNESS_FLOWS.md +11 -0
package/.allhands/flows/shared/WRITING_HARNESS_MCP_TOOLS.md +84 -0
package/.allhands/flows/shared/jury/ARCHITECTURE_REVIEW.md +91 -0
package/.allhands/flows/shared/jury/BEST_PRACTICES_REVIEW.md +80 -0
package/.allhands/flows/shared/jury/CLAIM_VERIFICATION_REVIEW.md +101 -0
package/.allhands/flows/shared/jury/EXPECTATIONS_FIT_REVIEW.md +78 -0
package/.allhands/flows/shared/jury/MAINTAINABILITY_REVIEW.md +110 -0
package/.allhands/flows/shared/jury/PROMPTS_EXPECTATIONS_FIT.md +74 -0
package/.allhands/flows/shared/jury/PROMPTS_FLOW_ANALYSIS.md +92 -0
package/.allhands/flows/shared/jury/PROMPTS_YAGNI.md +78 -0
package/.allhands/flows/shared/jury/PROMPT_PREMORTEM.md +125 -0
package/.allhands/flows/shared/jury/SECURITY_REVIEW.md +86 -0
package/.allhands/flows/shared/jury/YAGNI_REVIEW.md +82 -0
package/.allhands/flows/wip/DEBUG_INVESTIGATION.md +162 -0
package/.allhands/flows/wip/MEMORY_RECALL.md +62 -0
package/.allhands/harness/ah +131 -0
package/.allhands/harness/package-lock.json +5292 -0
package/.allhands/harness/package.json +52 -0
package/.allhands/harness/src/__tests__/e2e/commands.test.ts +307 -0
package/.allhands/harness/src/__tests__/e2e/event-loop.test.ts +539 -0
package/.allhands/harness/src/__tests__/e2e/hooks.test.ts +427 -0
package/.allhands/harness/src/__tests__/e2e/new-initiative-routing.test.ts +137 -0
package/.allhands/harness/src/__tests__/e2e/run-e2e.ts +109 -0
package/.allhands/harness/src/__tests__/e2e/specs-type.test.ts +210 -0
package/.allhands/harness/src/__tests__/e2e/validation-hooks.test.ts +669 -0
package/.allhands/harness/src/__tests__/e2e/validation-path-consistency.test.ts +354 -0
package/.allhands/harness/src/__tests__/e2e/validation.test.ts +528 -0
package/.allhands/harness/src/__tests__/harness/assertions.ts +318 -0
package/.allhands/harness/src/__tests__/harness/cli-runner.ts +359 -0
package/.allhands/harness/src/__tests__/harness/fixture.ts +384 -0
package/.allhands/harness/src/__tests__/harness/hook-runner.ts +411 -0
package/.allhands/harness/src/__tests__/harness/index.ts +122 -0
package/.allhands/harness/src/cli.ts +36 -0
package/.allhands/harness/src/commands/complexity.ts +177 -0
package/.allhands/harness/src/commands/context7.ts +202 -0
package/.allhands/harness/src/commands/docs.ts +557 -0
package/.allhands/harness/src/commands/hooks.ts +24 -0
package/.allhands/harness/src/commands/index.ts +51 -0
package/.allhands/harness/src/commands/knowledge.ts +382 -0
package/.allhands/harness/src/commands/memories.ts +302 -0
package/.allhands/harness/src/commands/notify.ts +61 -0
package/.allhands/harness/src/commands/oracle.ts +158 -0
package/.allhands/harness/src/commands/perplexity.ts +220 -0
package/.allhands/harness/src/commands/planning.ts +245 -0
package/.allhands/harness/src/commands/schema.ts +73 -0
package/.allhands/harness/src/commands/skills.ts +128 -0
package/.allhands/harness/src/commands/solutions.ts +353 -0
package/.allhands/harness/src/commands/spawn.ts +158 -0
package/.allhands/harness/src/commands/specs.ts +532 -0
package/.allhands/harness/src/commands/tavily.ts +226 -0
package/.allhands/harness/src/commands/tools.ts +579 -0
package/.allhands/harness/src/commands/trace.ts +327 -0
package/.allhands/harness/src/commands/tui.ts +960 -0
package/.allhands/harness/src/commands/validate.ts +143 -0
package/.allhands/harness/src/commands/validation-tools.ts +108 -0
package/.allhands/harness/src/hooks/context.ts +1442 -0
package/.allhands/harness/src/hooks/enforcement.ts +170 -0
package/.allhands/harness/src/hooks/index.ts +54 -0
package/.allhands/harness/src/hooks/lifecycle.ts +229 -0
package/.allhands/harness/src/hooks/notification.ts +104 -0
package/.allhands/harness/src/hooks/observability.ts +551 -0
package/.allhands/harness/src/hooks/session.ts +88 -0
package/.allhands/harness/src/hooks/shared.ts +815 -0
package/.allhands/harness/src/hooks/transcript-parser.ts +208 -0
package/.allhands/harness/src/hooks/validation.ts +617 -0
package/.allhands/harness/src/lib/__tests__/ctags.test.ts +244 -0
package/.allhands/harness/src/lib/__tests__/docs-validation.test.ts +344 -0
package/.allhands/harness/src/lib/__tests__/mcp-runtime.test.ts +190 -0
package/.allhands/harness/src/lib/__tests__/schema.test.ts +861 -0
package/.allhands/harness/src/lib/base-command.ts +198 -0
package/.allhands/harness/src/lib/cli-daemon.ts +343 -0
package/.allhands/harness/src/lib/compaction.ts +313 -0
package/.allhands/harness/src/lib/ctags.ts +497 -0
package/.allhands/harness/src/lib/docs-validation.ts +907 -0
package/.allhands/harness/src/lib/event-loop.ts +662 -0
package/.allhands/harness/src/lib/flows.ts +155 -0
package/.allhands/harness/src/lib/git.ts +276 -0
package/.allhands/harness/src/lib/knowledge-worker.ts +72 -0
package/.allhands/harness/src/lib/knowledge.ts +810 -0
package/.allhands/harness/src/lib/llm.ts +255 -0
package/.allhands/harness/src/lib/mcp-client.ts +432 -0
package/.allhands/harness/src/lib/mcp-daemon.ts +486 -0
package/.allhands/harness/src/lib/mcp-runtime.ts +418 -0
package/.allhands/harness/src/lib/notification.ts +115 -0
package/.allhands/harness/src/lib/opencode/index.ts +70 -0
package/.allhands/harness/src/lib/opencode/profiles.ts +300 -0
package/.allhands/harness/src/lib/opencode/prompts/codesearch.md +98 -0
package/.allhands/harness/src/lib/opencode/prompts/knowledge-aggregator.md +67 -0
package/.allhands/harness/src/lib/opencode/runner.ts +281 -0
package/.allhands/harness/src/lib/oracle.ts +926 -0
package/.allhands/harness/src/lib/planning-utils.ts +150 -0
package/.allhands/harness/src/lib/planning.ts +605 -0
package/.allhands/harness/src/lib/pr-review.ts +225 -0
package/.allhands/harness/src/lib/prompts.ts +522 -0
package/.allhands/harness/src/lib/schema.ts +418 -0
package/.allhands/harness/src/lib/schemas/agent-profile.ts +141 -0
package/.allhands/harness/src/lib/schemas/template-vars.ts +138 -0
package/.allhands/harness/src/lib/session.ts +164 -0
package/.allhands/harness/src/lib/specs.ts +348 -0
package/.allhands/harness/src/lib/tldr.ts +829 -0
package/.allhands/harness/src/lib/tmux.ts +1051 -0
package/.allhands/harness/src/lib/trace-store.ts +714 -0
package/.allhands/harness/src/mcp/__tests__/index.test.ts +46 -0
package/.allhands/harness/src/mcp/_template.ts +47 -0
package/.allhands/harness/src/mcp/filesystem.ts +33 -0
package/.allhands/harness/src/mcp/index.ts +69 -0
package/.allhands/harness/src/mcp/playwright.ts +34 -0
package/.allhands/harness/src/mcp/xcodebuild.ts +29 -0
package/.allhands/harness/src/schemas/docs.schema.json +44 -0
package/.allhands/harness/src/schemas/settings.schema.json +214 -0
package/.allhands/harness/src/tui/actions.ts +227 -0
package/.allhands/harness/src/tui/file-viewer-modal.ts +270 -0
package/.allhands/harness/src/tui/index.ts +1574 -0
package/.allhands/harness/src/tui/modal.ts +232 -0
package/.allhands/harness/src/tui/prompts-pane.ts +186 -0
package/.allhands/harness/src/tui/status-pane.ts +434 -0
package/.allhands/harness/tsconfig.json +22 -0
package/.allhands/harness/vitest.config.ts +13 -0
package/.allhands/pillars.md +33 -0
package/.allhands/principles.md +88 -0
package/.allhands/schemas/alignment.yaml +51 -0
package/.allhands/schemas/documentation.yaml +10 -0
package/.allhands/schemas/prompt.yaml +92 -0
package/.allhands/schemas/skill.yaml +34 -0
package/.allhands/schemas/solution.yaml +131 -0
package/.allhands/schemas/spec.yaml +67 -0
package/.allhands/schemas/validation-suite.yaml +49 -0
package/.allhands/schemas/workflow.yaml +51 -0
package/.allhands/settings.json +57 -0
package/.allhands/skills/claude-code-patterns/SKILL.md +60 -0
package/.allhands/skills/claude-code-patterns/docs/context-hygiene.md +19 -0
package/.allhands/skills/harness-maintenance/SKILL.md +449 -0
package/.allhands/skills/harness-maintenance/references/core-architecture.md +187 -0
package/.allhands/skills/harness-maintenance/references/harness-skills.md +87 -0
package/.allhands/skills/harness-maintenance/references/knowledge-compounding.md +78 -0
package/.allhands/skills/harness-maintenance/references/tools-commands-mcp-hooks.md +115 -0
package/.allhands/skills/harness-maintenance/references/validation-tooling.md +77 -0
package/.allhands/skills/harness-maintenance/references/writing-flows.md +84 -0
package/.allhands/validation/browser-automation.md +109 -0
package/.allhands/validation/xcode-automation.md +195 -0
package/.allhands/workflows/documentation.md +86 -0
package/.allhands/workflows/investigation.md +81 -0
package/.allhands/workflows/milestone.md +91 -0
package/.allhands/workflows/optimization.md +85 -0
package/.allhands/workflows/refactor.md +99 -0
package/.allhands/workflows/triage.md +81 -0
package/.claude/README.md +1 -0
package/.claude/agents/explorer.md +10 -0
package/.claude/agents/researcher.md +11 -0
package/.claude/agents/task-runner.md +8 -0
package/.claude/settings.json +231 -0
package/.env.ai.example +7 -0
package/.github/workflows/npm-publish.yml +69 -0
package/.internal.json +45 -0
package/.tldr/config.json +11 -0
package/.tldrignore +90 -0
package/CLAUDE.md +6 -0
package/README.md +98 -0
package/bin/sync-cli.js +7552 -0
package/concerns.md +7 -0
package/docs/README.md +41 -0
package/docs/agents/README.md +24 -0
package/docs/agents/agent-configuration-system.md +86 -0
package/docs/agents/execution-agents.md +50 -0
package/docs/agents/knowledge-agents.md +61 -0
package/docs/agents/orchestration-agent.md +57 -0
package/docs/agents/planning-agents.md +84 -0
package/docs/agents/quality-review-agents.md +67 -0
package/docs/agents/workflow-agent-orchestration.md +69 -0
package/docs/flows/README.md +44 -0
package/docs/flows/compounding.md +126 -0
package/docs/flows/coordination.md +72 -0
package/docs/flows/core-harness-integration.md +63 -0
package/docs/flows/documentation-orchestration.md +98 -0
package/docs/flows/e2e-test-plan-building.md +83 -0
package/docs/flows/emergent-refinement.md +104 -0
package/docs/flows/flow-authoring-and-mcp-tools.md +89 -0
package/docs/flows/judge-reviewing.md +112 -0
package/docs/flows/plan-deepening-and-research.md +107 -0
package/docs/flows/plan-review-jury.md +114 -0
package/docs/flows/pr-reviewing.md +54 -0
package/docs/flows/prompt-task-execution.md +119 -0
package/docs/flows/spec-planning.md +162 -0
package/docs/flows/type-specific-scoping-flows.md +49 -0
package/docs/flows/validation-and-skills-integration.md +145 -0
package/docs/flows/wip/wip-flows.md +102 -0
package/docs/harness/README.md +23 -0
package/docs/harness/agent-profiles.md +84 -0
package/docs/harness/cli/README.md +24 -0
package/docs/harness/cli/cli-entry-and-command-discovery.md +91 -0
package/docs/harness/cli/docs-command.md +87 -0
package/docs/harness/cli/knowledge-command.md +91 -0
package/docs/harness/cli/minor-cli-commands.md +65 -0
package/docs/harness/cli/oracle-command.md +113 -0
package/docs/harness/cli/planning-command.md +95 -0
package/docs/harness/cli/schema-and-validation-commands.md +154 -0
package/docs/harness/cli/search-commands.md +97 -0
package/docs/harness/cli/spawn-command.md +136 -0
package/docs/harness/cli/specs-command.md +102 -0
package/docs/harness/cli/tools-command.md +122 -0
package/docs/harness/cli/trace-command.md +122 -0
package/docs/harness/cli-daemon.md +92 -0
package/docs/harness/event-loop.md +184 -0
package/docs/harness/hooks/README.md +15 -0
package/docs/harness/hooks/context-hooks.md +96 -0
package/docs/harness/hooks/lifecycle-and-observability-hooks.md +135 -0
package/docs/harness/hooks/validation-hooks.md +97 -0
package/docs/harness/test-harness.md +149 -0
package/docs/harness/tui.md +176 -0
package/docs/memories.md +20 -0
package/docs/solutions/agentic-issues/premature-agent-deletion-tui-action-dependency-20260130.md +49 -0
package/docs/solutions/agentic-issues/ref-anchor-scope-mismatch-skill-references-20260131.md +55 -0
package/docs/solutions/agentic-issues/tautological-tests-routing-20260131.md +52 -0
package/docs/solutions/integration_issue/blocktool-output-format-mismatch-hook-runner-20260130.md +52 -0
package/docs/solutions/integration_issue/dual-validation-path-divergence-schema-20260130.md +66 -0
package/docs/solutions/security-issues/unsanitized-domain-path-join-20260131.md +52 -0
package/docs/solutions/test-failures/event-loop-mock-ordering-checkAgentWindows-20260130.md +63 -0
package/docs/sync-cli/README.md +19 -0
package/docs/sync-cli/cli-entrypoint-and-commands.md +39 -0
package/docs/sync-cli/commands/README.md +11 -0
package/docs/sync-cli/commands/pull-manifest-command.md +36 -0
package/docs/sync-cli/commands/push-command.md +84 -0
package/docs/sync-cli/commands/sync-command.md +71 -0
package/docs/sync-cli/systems/README.md +14 -0
package/docs/sync-cli/systems/git-and-github-integration.md +49 -0
package/docs/sync-cli/systems/interactive-ui.md +43 -0
package/docs/sync-cli/systems/manifest-and-distribution.md +51 -0
package/docs/sync-cli/systems/path-resolution.md +42 -0
package/package.json +46 -0
package/scripts/install-shim.sh +40 -0
package/scripts/pre-pack.sh +25 -0
package/specs/harness-maintenance-skill.spec.md +138 -0
package/specs/roadmap/git-spec-lifecycle-management.spec.md +113 -0
package/specs/sync-init-flag.spec.md +117 -0
package/specs/unified-workflow-orchestration.spec.md +250 -0
package/specs/validation-tooling-practice.spec.md +98 -0
package/specs/workflow-domain-configuration.spec.md +265 -0
package/src/commands/pull-manifest.ts +31 -0
package/src/commands/push.ts +344 -0
package/src/commands/sync.ts +289 -0
package/src/lib/constants.ts +10 -0
package/src/lib/dotfiles.ts +36 -0
package/src/lib/fs-utils.ts +18 -0
package/src/lib/gh.ts +40 -0
package/src/lib/git.ts +63 -0
package/src/lib/gitignore.ts +167 -0
package/src/lib/manifest.ts +121 -0
package/src/lib/marker-sync.ts +39 -0
package/src/lib/paths.ts +38 -0
package/src/lib/target-lines.ts +66 -0
package/src/lib/ui.ts +78 -0
package/src/sync-cli.ts +120 -0
package/target-lines.json +23 -0
package/tsconfig.json +20 -0

package/.allhands/validation/xcode-automation.md ADDED

@@ -0,0 +1,195 @@
+---
+name: xcode-automation
+description: "Xcode-based validation for iOS/macOS native implementations — exploratory build verification, performance profiling, UI automation, and runtime analysis"
+globs:
+  - "**/*.swift"
+  - "**/*.m"
+  - "**/*.h"
+  - "**/*.xib"
+  - "**/*.storyboard"
+  - "**/*.xcodeproj/**"
+  - "**/*.xcworkspace/**"
+  - "**/ios/**"
+  - "**/macos/**"
+  - "**/Podfile"
+  - "**/Package.swift"
+  - "**/*.entitlements"
+  - "**/*.plist"
+tools:
+  - "xcodebuildmcp"
+  - "xctrace"
+---
+## Purpose
+This suite validates native Apple platform quality across a unified domain: build integrity, runtime performance, UI interaction correctness, and resource profiling. These are sub-concerns within a single validation domain — the Xcode build and runtime environment — not separate suites.
+The stochastic dimension uses agent-driven Xcode automation to build projects, deploy to simulators, explore UI via accessibility-based interaction, capture logs, and probe performance characteristics using profiling instruments. The deterministic dimension (unit tests, snapshot tests via `xcodebuild test`) is planned but not yet implemented.
+Per **Agentic Validation Tooling**, this suite meets the existence threshold: the stochastic dimension (exploratory build verification, UI automation, performance profiling, memory analysis) provides meaningful agent-driven validation beyond what deterministic tests alone can cover.
+## Tooling
+### XcodeBuildMCP (stochastic dimension)
+- **Harness integration**: Registered as MCP server `xcodebuild` — access via `ah tools xcodebuild`. Run `ah tools xcodebuild --help-tool` for full parameter schemas before exploration.
+- MCP server wrapping `xcodebuild`, `simctl`, and AXe (accessibility-based UI automation). 63+ tools across workflow groups — exposed as MCP tool calls via `ah tools xcodebuild:<tool>`.
+- **Workflow groups**: Only `simulator` is enabled by default. Enable additional groups (`ui-automation`, `logging`, `project-discovery`, `session-management`, `simulator-management`) via `.xcodebuildmcp/config.yaml` in the target project.
+- **Session defaults model**: `session-set-defaults` (hyphenated) persists workspace path, scheme, simulator, and configuration across subsequent calls — reduces token overhead significantly. Use `workspacePath` for CocoaPods projects (`.xcworkspace`), `projectPath` for standalone `.xcodeproj`. Always set session defaults before exploration.
+- **Tool discovery first**: Run `ah tools xcodebuild` to see all available tools, then `ah tools xcodebuild --help-tool` for parameter schemas. Tool awareness shapes what you attempt, so treat discovery as a prerequisite, not an afterthought.
+### xctrace (stochastic dimension — profiling)
+- **Installation**: Ships with Xcode. Verify with `xcrun xctrace version`.
+- CLI for Instruments profiling — not an MCP tool; invoked directly via shell. Run `xcrun xctrace help` and `xcrun xctrace record --help` before any profiling — the subcommand vocabulary (record, export, list, symbolicate) determines what analysis is possible.
+- **Template-based recording**: `xcrun xctrace list templates` reveals the available profiling templates. A template defines which instruments are active during a recording session.
+- **Attachment by PID**: `--attach` requires a numeric PID, not a process name. For simulator apps, `xcrun simctl spawn <UDID> launchctl list | grep <bundle_id>` returns the simulator-internal PID, which xctrace cannot use. Instead, find the **host PID** via `pgrep -f "appname.app/appname"`. For `--launch` mode, all flags (`--time-limit`, `--output`, `--no-prompt`) must come **before** the `--launch -- <bundle_id>` terminator — flags after `--` are passed to the launched app, not xctrace.
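The flag-ordering rule for `--launch` mode can be sketched as a small shell helper. This is a minimal sketch, not part of the harness: the function name `xctrace_launch_cmd`, its dry-run behavior (it prints the command instead of running it), and the bundle ID `com.example.app` are assumptions for illustration. The flags themselves are the ones named above.

```shell
# Hypothetical helper (not part of the harness): prints an xctrace command
# for --launch mode so the flag order can be reviewed before running it.
# Usage: xctrace_launch_cmd <template> <time-limit> <output-path> <bundle-id>
xctrace_launch_cmd() {
  # Every xctrace flag comes first; anything after "--" belongs to the app.
  printf "xcrun xctrace record --template '%s' --time-limit %s --output %s --no-prompt --launch -- %s\n" \
    "$1" "$2" "$3" "$4"
}

# com.example.app is a placeholder bundle ID.
xctrace_launch_cmd 'Time Profiler' 30s /tmp/profile.trace com.example.app
```

Printing the command first makes it easy to confirm that nothing intended for xctrace ends up after the `--` terminator.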
+## Stochastic Validation
+Agent-driven exploratory Xcode validation. This section teaches WHAT to validate and WHY — MCP tool discovery and `xctrace --help` teach HOW.
+### Core Loop
+**Prerequisite**: Run `ah tools xcodebuild --help-tool` for parameter schemas, then `session-set-defaults` with the target project. Run `xcrun xctrace help` to internalize profiling vocabulary.
+Discover project → build → deploy to simulator → explore UI → capture logs → profile performance → analyze results.
+This is the thinking pattern to internalize, not a command sequence:
+- Always discover before building — `discover_projs` reveals workspace/project structure, `list_schemes` shows available build targets. Never assume scheme names.
+- Set session defaults early — `workspacePath` (for CocoaPods projects) or `projectPath`, scheme, simulator name, and configuration persist across calls. This eliminates repetitive parameter passing and reduces token cost.
+- **Visible simulator preferred** — call `open_sim` before `boot_sim` to make the simulator visible. Headless mode (`boot_sim` without `open_sim`) provides no visual feedback on what the agent is doing. Visible simulators let engineers observe agent-driven UI interactions in real time. Default to visible; use headless only in CI environments.
+- Verify build success before deployment — build failures surface dependency issues, missing signing, or configuration problems that must be resolved before any runtime validation. For React Native / Expo projects, `npx expo prebuild --platform ios --clean` must run before the first build to ensure all XCFramework slices are downloaded.
+- Use `preferXcodebuild=true` for clean builds — the incremental build system (`xcodemake`) can produce incomplete `.app` bundles (missing Info.plist, executables). Always use `--preferXcodebuild=true` on the first build after `clean` or `expo prebuild`.
+- Check logs after interactions — native logs via `start_sim_log_cap` (requires `bundleId`), JS logs via direct `log stream` with `subsystem == "com.facebook.react.log"` for RN/Expo apps (see Log capture and analysis use case). Runtime crashes, constraint violations, and warnings appear in logs, not the UI.
+- Profile after functional verification — profiling a broken app wastes time. Confirm the app runs correctly first, then measure performance.
+### Use Cases
+These seed categories guide exploration. Per **Frontier Models are Capable**, the agent extrapolates deeper investigation from these starting points.
+- **Build verification**: `discover_projs` to find workspace, `list_schemes` to enumerate targets, `build_sim --preferXcodebuild=true` to compile for simulator. Verify clean builds succeed. Exercise `clean` then rebuild to catch incremental build artifacts masking errors. Check `show_build_settings` for unexpected configuration (wrong SDK, missing preprocessor flags, incorrect deployment target).
+- **Deploy and run**: For a reliable deploy, use the multi-step sequence: `build_sim` → `get_sim_app_path` → `get_app_bundle_id` → `install_app_sim` → `launch_app_sim`. The composite `build_run_sim` is convenient but can time out on long builds. Verify the app launches without crashes; `launch_app_logs_sim` captures stdout/stderr from launch. Boot specific simulator devices via `list_sims` and `boot_sim` to test across device classes (iPhone SE, iPhone 16 Pro Max, iPad).
+- **UI automation and verification**: Enable the `ui-automation` workflow group. Verification uses two complementary methods — both are required, not interchangeable:
+  - **Programmatic verification** (`describe_ui`): Captures the full accessibility hierarchy with precise frame coordinates. Use for asserting specific state changes: element labels, button presence/absence, text content. Call `describe_ui` after each interaction to confirm the expected state change occurred (e.g., "Like count: 0" → "Like count: 3"). This is the primary method for **semantic state verification** — did the right data appear?
+  - **Visual verification** (`screenshot` + read the image): Captures a screenshot and the agent MUST visually inspect it to verify layout correctness, rendering quality, and visual state. This catches issues invisible to the accessibility hierarchy: clipped text, overlapping elements, incorrect colors, broken layouts, missing images, compressed frames. This is the primary method for **visual/layout verification** — does it look right?
+  - Per **Agentic Validation Tooling**, the agent is the observer — it must use both its programmatic and visual senses. Taking a screenshot without reading it provides no validation value. `describe_ui` alone misses rendering bugs. Use `describe_ui` for "is the state correct?" and `screenshot` (visually inspected) for "does it render correctly?"
+  - Interact via `tap`, `swipe`, `type_text`, `key_press` using accessibility labels from `describe_ui` (preferred) or coordinates. Walk critical user flows: onboarding, navigation between screens, form submission, back navigation.
+- **Log capture and analysis**: Two separate log channels exist, one for native output and one for JS:
+  - **Native logs** (`start_sim_log_cap`): Requires `bundleId`. Captures structured `os_log` messages filtered by `subsystem == "<bundleId>"`. Surfaces: constraint ambiguity warnings (Auto Layout issues), main thread violations, memory warnings, unhandled exceptions, API deprecation notices, missing `UIBackgroundModes` entries. Use `captureConsole=true` to additionally capture the app process's stdout/stderr (note: this relaunches the app on start and **terminates it on stop** — plan the lifecycle accordingly).
+  - **JS logs for React Native / Expo**: JavaScript `console.log` output routes through Hermes JSI → `RCTLog` → Apple's `os_log` under subsystem `com.facebook.react.log` (category `javascript`) — NOT the app's bundle ID subsystem and NOT stdout/stderr. This means `start_sim_log_cap` will **not** capture JS console.log messages, because it filters by the app's bundle ID subsystem. Two approaches to capture JS logs:
+    1. **Direct `log stream`** (preferred for automation): `xcrun simctl spawn <UDID> log stream --level=debug --predicate 'subsystem == "com.facebook.react.log"'` — captures JS `console.log` output in real time via the simulator's unified log system. Run in background, exercise the app, then inspect the output.
+    2. **Metro terminal output**: When Metro is running (`expo start`), JS logs also appear in Metro's stdout as `LOG  [message]`. If Metro is running as a background task, its output file contains all JS logs.
+  - For comprehensive validation, use both channels: `start_sim_log_cap` for native-level diagnostics and direct `log stream` (or Metro output) for JS-level state change verification. The JS log channel is essential for verifying that UI automation interactions produce the expected application-level state changes.
+- **Performance profiling**: Use `xctrace` after confirming the app runs correctly. Find the **host PID** via `pgrep -f "appname.app/appname"` (not `launchctl list`, which returns the simulator-internal PID). Then: `xcrun xctrace record --template 'Time Profiler' --device '<UDID>' --attach '<PID>' --time-limit 30s --output /tmp/profile.trace --no-prompt`. Export with `xcrun xctrace export --input /tmp/profile.trace --toc` to understand the trace structure (schemas, tables), then use `--xpath` queries to extract specific tables. For Expo/RN apps, look for `com.facebook.react.runtime.JavaScript` (Hermes JS thread) and `hades` (Hermes GC) in thread samples.
+- **Memory analysis**: Same host PID discovery, then `xcrun xctrace record --template 'Leaks' --device '<UDID>' --attach '<PID>' --time-limit 60s --output /tmp/leaks.trace --no-prompt`. The `Leaks` template includes the `Allocations` instrument — a single recording provides both leak detection and heap allocation statistics. Export leak results via `xcrun xctrace export --input /tmp/leaks.trace --xpath '/trace-toc/run[@number="1"]/tracks/track[@name="Leaks"]/details/detail[@name="Leaks"]'` and allocation statistics via the `Allocations` track. The standalone `Allocations` template uses deferred recording mode, making CLI export less straightforward — prefer `Leaks` for combined analysis. Exercise flows repeatedly and check for monotonic heap growth indicating retain cycles.
+- **Combined profiling + UI automation**: Run `xctrace record` in the background (long `--time-limit` or no limit), then exercise the app via xcodebuild MCP UI automation (`describe_ui`, `tap`, `gesture`) while profiling captures the runtime behavior. This surfaces memory leaks, performance regressions, and hangs in actual user flows, not just idle state. **Ordering matters**: `xctrace --attach` sessions end when the target process exits. If `stop_sim_log_cap` (console mode) terminates the app while xctrace is recording, the trace ends early. Always let xctrace reach its `--time-limit` or stop it explicitly before terminating the app or stopping console log capture.
+- **Animation quality**: `xcrun xctrace record --template 'Animation Hitches' --device '<UDID>' --attach '<PID>' --time-limit 30s --output /tmp/hitches.trace --no-prompt` while scrolling, navigating, and animating. A hitch duration above 33 ms (two frames at 60 Hz) indicates dropped frames visible to users. **Note**: Animation Hitches is **not supported on simulator**; it requires a physical device.
+- **Additional profiling templates**: Beyond the core three, `App Launch` measures startup time (critical for RN/Expo apps with large JS bundles), `Network` captures HTTP request timing and payload sizes, `SwiftUI` profiles SwiftUI-specific rendering (not relevant for RN), `Swift Concurrency` profiles async/await patterns, `CPU Counters` provides low-level CPU performance data, and `Power Profiler` measures battery impact.
+- **Evidence capture**: Per **Agentic Validation Tooling**, two audiences require different artifacts. Agent self-verification (real-time `describe_ui` checks, screenshot visual inspection, log stream monitoring) happens during the observe-act-verify loop. Engineer review artifacts (`screenshot` images, `record_sim_video` recordings, xctrace `.trace` files, captured log output) are produced after exploration. Pattern: explore first, then capture review evidence — but the agent MUST visually read screenshots it takes during exploration, not just save them.
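The attach-mode profiling steps from the use cases above can be sketched as a dry-run helper. This is an illustrative sketch under stated assumptions: `profile_attach_plan` is a hypothetical name, the UDID and PID values are placeholders, and the function only prints the two commands (record, then table-of-contents export) rather than executing them. The numeric check mirrors the requirement that `--attach` takes a PID, not a process name.

```shell
# Hypothetical helper: prints the attach-mode recording command followed by
# the table-of-contents export, using the flags shown in the use cases above.
# Usage: profile_attach_plan <template> <udid> <host-pid> <time-limit> <trace-path>
profile_attach_plan() {
  # --attach needs a numeric host PID; reject anything else early.
  case $3 in
    ''|*[!0-9]*)
      echo "error: host PID must be numeric, got '$3'" >&2
      return 1
      ;;
  esac
  printf "xcrun xctrace record --template '%s' --device '%s' --attach '%s' --time-limit %s --output %s --no-prompt\n" \
    "$1" "$2" "$3" "$4" "$5"
  printf "xcrun xctrace export --input %s --toc\n" "$5"
}

# Placeholder values; on a real machine the PID would come from something
# like: pgrep -f "appname.app/appname"
profile_attach_plan 'Time Profiler' 'SIM-UDID' 12345 30s /tmp/profile.trace
```

Swapping the template argument for `'Leaks'` and the time limit for `60s` gives the memory-analysis variant described above.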
+### Resilience
+Stochastic exploration in the Xcode environment has unique failure modes. These patterns prevent death spirals:
+- Max 3 retries on any build or interaction, then report failure and move on
+- `screenshot` on failure — capture simulator state before recovery attempts
+- Simulator reset if app becomes unresponsive — `stop_app_sim`, then re-launch. If simulator itself hangs, `erase_sims` for a clean slate
+- Code signing bail-out — provisioning profile or certificate errors: report the exact error and move on. These require human intervention.
+- CocoaPods/SPM resolution — dependency resolution failures: check `Podfile.lock` freshness, try `clean` and rebuild. Report if unresolvable.
+- Stale session defaults — if switching between projects or schemes, always call `session-set-defaults` again. Stale defaults cause confusing errors.
+- xctrace attachment failures — `--attach` requires the **host PID** (via `pgrep -f "appname.app/appname"`), not the simulator-internal PID from `launchctl list`. If the app exits before profiling starts, use `--launch` mode — but note that `--launch` does not pass URL scheme arguments, so Expo/RN apps may fail to connect to the Metro bundler. For Expo apps, prefer `--attach` after launching the app via xcodebuild MCP or `expo start`.
+- Incomplete `.app` bundles — if `install_app_sim` fails with "Missing bundle ID", the build produced a partial bundle. Run `clean` then `build_sim --preferXcodebuild=true` (the incremental builder can produce incomplete output).
+- React Native / Expo setup — `npx expo prebuild --platform ios --clean` must run before first build to download XCFramework simulator slices. Without this, the `[CP] Copy XCFrameworks` build phase fails with rsync errors.
+Use `ah tools xcodebuild --help-tool` and `xcrun xctrace help` for all available operations. This suite teaches what to validate and why — the tools teach how.
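The retry cap and screenshot-on-failure patterns above could be combined along these lines. This is a minimal sketch, not the harness's implementation: `run_with_retries`, `action`, and `capture_state` are hypothetical names, and "max 3 retries" is read here as one initial attempt plus up to three retries.

```python
from typing import Callable


def run_with_retries(
    action: Callable[[], bool],
    capture_state: Callable[[], None],
    max_retries: int = 3,
) -> dict:
    """Run `action`; after each failure capture state, then retry up to `max_retries` times."""
    failures = 0
    while True:
        if action():
            return {"ok": True, "failures": failures}
        failures += 1
        # Capture evidence (for example a screenshot) before any recovery attempt.
        capture_state()
        if failures > max_retries:
            # Retry budget exhausted: report failure and let the caller move on.
            return {"ok": False, "failures": failures}
```

Returning a result instead of raising keeps the "report failure and move on" behaviour explicit at the call site.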
+### Simulator Visibility
+Per **Agentic Validation Tooling**, programmatic validation replaces human supervision — but engineer trust requires observability. **Visible simulators are preferred** over headless for local development and validation:
+- Call `open_sim` before `boot_sim` to make the Simulator.app window visible
+- Visible mode lets engineers observe agent-driven UI interactions, verify screenshot quality, and spot issues the agent might miss
+- `boot_sim` alone boots headless (no window) — appropriate for CI but not for interactive validation sessions
+- The `.xcodebuildmcp/config.yaml` `sessionDefaults.simulatorName` targets the device; visibility is controlled by whether Simulator.app is open
+### Simulator Isolation (Multi-Worktree)
+When running validation across multiple worktrees simultaneously, each needs an isolated simulator, derived data path, and Metro port to prevent contention:
+- **Dedicated simulator**: `xcrun simctl create "<worktree>-iPhone16Pro" "iPhone 16 Pro"` creates a new, dedicated simulator device with that name (a fresh device, not a clone of an existing one). Target by UDID via `session-set-defaults --simulatorId=<UDID>`.
+- **Derived data isolation**: `build_sim --derivedDataPath=<path>` or `-derivedDataPath` on any build tool keeps build products separate per worktree.
+- **AGENT_ID isolation**: The harness MCP daemon already isolates sessions by `AGENT_ID` — parallel agents get independent XcodeBuildMCP sessions automatically.
+- **Expo Metro port isolation**: Each worktree's Metro bundler must run on a unique port. Use `expo start --port <port>` (e.g., 8081, 8082, 8083). The built app connects to the port specified at launch time — if switching ports, the app must be rebuilt with `expo run:ios` targeting the new port, or the `RCT_METRO_PORT` environment variable must be set before build. The `--port` flag on `expo start` only controls where Metro listens; the app's compiled bundler URL must match.
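One way to keep the per-worktree resources above from colliding is to derive them all from the worktree name in one place. The sketch below is illustrative only: `plan_isolation`, the `WorktreeIsolation` fields, and the `/tmp/derived-data` root are assumptions, while the simulator naming pattern and the 8081 starting port follow the examples in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorktreeIsolation:
    simulator_name: str
    derived_data_path: str
    metro_port: int


def plan_isolation(
    worktrees: list[str],
    base_port: int = 8081,
    build_root: str = "/tmp/derived-data",  # hypothetical location
) -> dict[str, WorktreeIsolation]:
    """Assign each worktree its own simulator name, DerivedData path, and Metro port."""
    plans = {}
    # Sort so that the same set of worktrees always gets the same port assignment.
    for index, name in enumerate(sorted(worktrees)):
        plans[name] = WorktreeIsolation(
            simulator_name=f"{name}-iPhone16Pro",
            derived_data_path=f"{build_root}/{name}",
            metro_port=base_port + index,
        )
    return plans
```

The resulting `metro_port` would be passed both to `expo start --port` and, via `RCT_METRO_PORT`, to the build, since the two must match.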
+### Test Target Setup (Expo / React Native)
+For Expo projects used as validation targets:
+1. `npm install` in the project root
+2. `npx expo prebuild --platform ios --clean` — generates `ios/` directory with proper XCFramework slices. CocoaPods install is handled automatically.
+3. **Build using preconfigured package.json scripts when they exist** — check for `"ios"` script (typically `expo run:ios`). Use `npm run ios -- --device "<SimulatorName>"` to target a specific simulator. Fall back to `expo run:ios` directly if no script exists. For already-built apps testing JS-only changes, use `expo start` instead.
+4. `expo run:ios` handles the full pipeline: build → install → launch. However, in non-interactive environments it may log `"Skipping dev server"` — the Metro bundler won't start automatically. Start `expo start --port 8081` separately to serve the JS bundle, then reload the app via UI automation (`tap --label "Reload"` on the red box).
+5. First build after `expo prebuild --clean` should use `--preferXcodebuild=true` if building via xcodebuild MCP tools directly.
+The workspace path for session defaults is `./ios/<project>.xcworkspace` (not `.xcodeproj`) when CocoaPods are in use. Use `simulatorId` (UDID) rather than `simulatorName` — the two parameters are **mutually exclusive** in `session-set-defaults`, and many tools require the UDID. Prefer `simulatorId` for reliable targeting.
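The two constraints in the paragraph above can be checked before calling `session-set-defaults`. This is a hedged sketch: the `simulatorId` and `simulatorName` keys come from the text, but `workspacePath` as a key name and the `validate_session_defaults` helper are assumptions made for illustration.

```python
def validate_session_defaults(defaults: dict) -> list[str]:
    """Return a list of problems found in a proposed session-defaults mapping."""
    problems = []
    if "simulatorId" in defaults and "simulatorName" in defaults:
        problems.append("simulatorId and simulatorName are mutually exclusive")
    # `workspacePath` is an assumed key name for the workspace setting.
    workspace = defaults.get("workspacePath", "")
    if workspace and not workspace.endswith(".xcworkspace"):
        problems.append("workspacePath should point at the .xcworkspace when CocoaPods are in use")
    return problems
```

An empty list means the defaults satisfy both rules; anything else is better reported before a build than discovered through a confusing tool error.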
+### Expo Dev Server and HMR
+For JS-only change validation (no native code changes):
+- `expo start --port <port>` starts the Metro bundler for Hot Module Replacement. JS file changes propagate to the running app automatically without rebuild.
+- Verify HMR changes via `describe_ui` — check that `AXLabel` values reflect the updated text after file save.
+- The Metro bundler must be running on the port the app expects (default 8081). The app connects to the bundler URL passed at launch time.
+- For native code changes, a full rebuild via `expo run:ios` or `build_sim` is required.
+### Deterministic Teardown
+Tear down in reverse order of setup to prevent orphaned processes. **Ordering is critical**: stopping log capture (console mode) terminates the app, which in turn terminates any attached xctrace session:
+1. **Wait for / stop xctrace**: If profiling is active, either let it reach `--time-limit` or terminate it first. An attached xctrace session will end when the app process exits, so stopping it before step 2 ensures a clean trace file.
+2. **Stop the app**: `stop_app_sim --bundleId "<bundle_id>"` — terminates the app in the simulator
+3. **Stop active log captures**: `stop_sim_log_cap --logSessionId "<id>"` for any active sessions. Console capture mode terminates the app on stop — skip step 2 if using console capture.
+4. **Kill JS log stream**: If a background `log stream` process was started for JS log capture (subsystem `com.facebook.react.log`), terminate it.
+5. **Clear session defaults**: `session-clear-defaults --all true` — prevents stale defaults from affecting the next session
+6. **Kill the Metro dev server**: Terminate the `expo start` process (background task or PID)
+7. **Clean trace artifacts**: Remove `.trace` bundles from `/tmp` or working directory
+8. **Simulator**: Leave running for potential reuse by other worktrees. Only shut down via `xcrun simctl shutdown <UDID>` if explicitly cleaning up.
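The ordering rules above, including the "skip step 2 if using console capture" exception, can be expressed as a small planner. The step labels and the `teardown_steps` function are hypothetical; the sketch only decides the order and leaves the actual tool calls to the caller.

```python
def teardown_steps(
    profiling_active: bool,
    console_capture: bool,
    js_log_stream: bool,
    shutdown_simulator: bool = False,
) -> list[str]:
    """Return teardown step labels in the order described in the list above."""
    steps = []
    if profiling_active:
        # Stop xctrace first so the trace file is finalized before the app exits.
        steps.append("stop_xctrace")
    if not console_capture:
        # Console capture terminates the app when stopped, so an explicit stop is redundant.
        steps.append("stop_app")
    steps.append("stop_log_captures")
    if js_log_stream:
        steps.append("kill_js_log_stream")
    steps += ["clear_session_defaults", "kill_metro", "clean_trace_artifacts"]
    if shutdown_simulator:
        # The default is to leave the simulator running for reuse by other worktrees.
        steps.append("shutdown_simulator")
    return steps
```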
+### Bad State Detection
+Detect app failures by comparing `describe_ui` output against expected state:
+- **Red box / error screen**: UI hierarchy contains `redbox-dismiss`, `redbox-reload`, `redbox-copy` buttons — the app hit a JS error or couldn't connect to the dev server
+- **Home screen instead of app**: `AXLabel` shows system app names (Safari, Messages, etc.) with `pid` belonging to SpringBoard — the app crashed or was terminated
+- **Empty view hierarchy**: Single `Application` node with no children — the app is loading or hung during initialization
+- **Stale PID**: `describe_ui` returns elements with a different `pid` than expected — the app was relaunched (possibly by `captureConsole` or xctrace `--launch`)
+On bad state detection: `screenshot` for evidence, then attempt recovery via `stop_app_sim` → `launch_app_sim`. If the simulator itself is unresponsive, `erase_sims` for a clean slate.
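A classifier for the four bad states above might look like the following. The node shape is an assumption: the sketch treats `describe_ui` output as a nested dict with `type`, `AXLabel`, `identifier`, `pid`, and `children` keys, and it takes the SpringBoard PID as a parameter. Only `AXLabel`, `pid`, and the `redbox-*` names come from the text; the real output format may differ.

```python
REDBOX_MARKERS = {"redbox-dismiss", "redbox-reload", "redbox-copy"}


def walk(node: dict):
    """Yield a node and all of its descendants (assumed `children` key)."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def classify_ui_state(root: dict, expected_pid: int, springboard_pid: int) -> str:
    """Map a UI hierarchy to one of: red_box, empty_hierarchy, home_screen, stale_pid, ok."""
    nodes = list(walk(root))
    texts = {n.get(key) for n in nodes for key in ("AXLabel", "identifier")}
    if texts & REDBOX_MARKERS:
        return "red_box"
    if len(nodes) == 1 and root.get("type") == "Application":
        return "empty_hierarchy"
    pids = {n["pid"] for n in nodes if "pid" in n}
    if springboard_pid in pids:
        return "home_screen"
    if pids and expected_pid not in pids:
        return "stale_pid"
    return "ok"
```

Any result other than `ok` would trigger the evidence-then-recovery sequence described in the paragraph above.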
+## Deterministic Integration
+**Planned — not yet implemented.**
+The deterministic dimension for this suite will use `xcodebuild test` for CI-gated binary pass/fail validation:
+- **Unit tests**: XCTest suites validating business logic, model layer, and service interfaces. Run via `xcodebuild test -scheme <scheme> -destination 'platform=iOS Simulator,name=<device>'`.
+- **Snapshot tests**: Point-Free's `swift-snapshot-testing` or similar for visual regression of individual views/components. Baseline images committed to repo, fail CI on drift.
+- **Performance tests**: XCTest `measure {}` blocks with baselines for critical code paths. Fail on regression beyond configured deviation.
+These will be implemented as the suite matures through the crystallization lifecycle — current stochastic exploration patterns will inform which deterministic gates are most valuable.
+## ENV Configuration
+| Variable | Required | Dimension | Purpose |
+|----------|----------|-----------|---------|
+| `XCODE_WORKSPACE` | No | Both | Path to `.xcworkspace` (discovered automatically if not set) |
+| `XCODE_SCHEME` | No | Both | Build scheme name (discovered via `list_schemes` if not set) |
+| `SIMULATOR_NAME` | No | Stochastic | Target simulator device (e.g., `iPhone 16 Pro`) |
+| `XCODEBUILDMCP_WORKFLOWS` | No | Stochastic | Comma-separated workflow groups to enable beyond `simulator` |
+| `DERIVED_DATA_PATH` | No | Both | Custom DerivedData location for build isolation |
+| `RCT_METRO_PORT` | No | Stochastic | Metro bundler port for Expo/RN apps (default 8081). Must match `expo start --port` |
+| `CI` | Auto | Deterministic | Set by CI environment; controls test retry and reporter behavior |
+Project-specific configuration should be committed as `.xcodebuildmcp/config.yaml` in the target project root. This file controls which workflow groups are enabled and sets session defaults (workspace path, scheme, simulator, configuration). The agent should still discover the target project's workspace structure at execution time via `discover_projs` — the config file provides defaults, not overrides.
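For reference, the variables in the table above could be read into one structure as sketched below. The variable names and the 8081 default come from the table; the `load_suite_env` helper, the returned key names, and treating any non-empty `CI` value as "running in CI" are assumptions.

```python
import os
from typing import Mapping, Optional


def load_suite_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the suite's optional environment variables, applying documented defaults."""
    env = os.environ if env is None else env
    raw_workflows = env.get("XCODEBUILDMCP_WORKFLOWS", "")
    return {
        # None means "discover at execution time", as the table describes.
        "workspace": env.get("XCODE_WORKSPACE"),
        "scheme": env.get("XCODE_SCHEME"),
        "simulator_name": env.get("SIMULATOR_NAME"),
        "extra_workflows": [w.strip() for w in raw_workflows.split(",") if w.strip()],
        "derived_data_path": env.get("DERIVED_DATA_PATH"),
        "metro_port": int(env.get("RCT_METRO_PORT") or "8081"),
        # Assumption: any non-empty CI value counts as a CI run.
        "ci": bool(env.get("CI")),
    }
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise without touching the real environment.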

package/.allhands/workflows/documentation.md ADDED

@@ -0,0 +1,86 @@
+---
+name: documentation
+type: documentation
+planning_depth: focused
+jury_required: false
+max_tangential_hypotheses: 1
+required_ideation_questions:
+  - "What areas need documentation?"
+  - "Who is the audience?"
+  - "Any existing docs to extend or replace?"
+  - "What format and location?"
+---
+## Domain Knowledge
+### Audience-First Thinking
+Documentation specs are organized around audiences, not features. The same system may need different documentation for different readers:
+| Audience | Focus | Depth |
+|----------|-------|-------|
+| **Developers** | APIs, architecture, contribution guides | Technical, code-level |
+| **End users** | Features, workflows, troubleshooting | Task-oriented, no internals |
+| **Ops/SRE** | Runbooks, monitoring, deployment | Operational, procedure-focused |
+| **New team members** | Onboarding, architecture overview, conventions | Progressive, context-building |
+### Documentation State Vocabulary
+Existing documentation falls into identifiable states that inform the approach:
+| State | Meaning | Action |
+|-------|---------|--------|
+| **Outdated** | Exists but no longer accurate | Update with current reality |
+| **Missing** | No documentation exists | Create from scratch |
+| **Scattered** | Information exists across multiple locations | Consolidate and organize |
+| **Wrong** | Actively misleading | Correct with high priority |
+### Format Taxonomy
+Documentation format should match audience and content type:
+| Format | Best For |
+|--------|----------|
+| **README** | Project overview, quickstart, contribution guide |
+| **Docs site** | Comprehensive reference, tutorials, guides |
+| **Inline code docs** | API reference, function-level documentation |
+| **Runbooks** | Operational procedures, incident response |
+## Ideation Guidance
+Per **Knowledge Compounding**, documentation compounds value when it targets the right audience with the right depth.
+### Probe Guidance
+- Probe vague coverage requests — demand specific areas and audiences
+- Distinguish between "no docs" and "wrong docs" — the approach differs significantly
+### Output Sections
+Spec body sections for documentation domain:
+- **Motivation**: Why current documentation is insufficient
+- **Goals**: Coverage targets by audience and area
+- **Technical Considerations**: Existing docs state, format preferences, location
+- **Open Questions**: Unknowns the planner should investigate
+## Planning Considerations
+### Coverage-by-Audience-and-Area Framing
+Planning should organize documentation work as a coverage matrix:
+- Rows: areas/features needing documentation
+- Columns: audiences requiring documentation
+- Cells: specific documentation deliverables
+This framing prevents gaps and avoids redundant documentation across audiences.
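The coverage matrix described above can be sketched as a mapping from (area, audience) pairs to deliverables, with empty cells reported as gaps. The `coverage_gaps` name and the sample areas are hypothetical; whether an empty cell is a real gap or an intentional omission remains a planning decision.

```python
def coverage_gaps(
    areas: list[str],
    audiences: list[str],
    deliverables: dict[tuple[str, str], list[str]],
) -> list[tuple[str, str]]:
    """Return (area, audience) cells that have no documentation deliverable yet."""
    return [
        (area, audience)
        for area in areas          # rows: areas/features
        for audience in audiences  # columns: audiences
        if not deliverables.get((area, audience))
    ]


if __name__ == "__main__":
    cells = {("auth", "Developers"): ["API reference"], ("auth", "Ops/SRE"): ["Runbook"]}
    print(coverage_gaps(["auth", "billing"], ["Developers", "Ops/SRE"], cells))
```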
+### Existing Documentation Assessment
+Before writing new documentation, planning should assess what exists:
+- Audit existing docs for accuracy and completeness
+- Identify reusable content vs content needing replacement
+- Map existing documentation to the coverage matrix
+### Prompt Output Range
+Documentation specs produce 2-5 focused prompts. Each prompt typically covers one audience or one major area.

package/.allhands/workflows/investigation.md ADDED

@@ -0,0 +1,81 @@
+---
+name: investigation
+type: investigation
+planning_depth: focused
+jury_required: false
+max_tangential_hypotheses: 2
+required_ideation_questions:
+  - "What's broken / what's the issue?"
+  - "What evidence do you have?"
+  - "What does 'fixed' look like?"
+  - "Any constraints?"
+  - "Any suspected root causes?"
+---
+## Domain Knowledge
+### Problem-Evidence-Fix Framing
+Investigation specs follow a symptom-first structure: capture what's wrong, gather evidence, define what "fixed" means. The engineer leads with observed symptoms; any suspected causes are recorded separately, because root cause identification is the investigation's output, not its input.
+### Evidence Vocabulary
+Evidence types to surface and categorize:
+| Evidence Type | Examples |
+|---------------|----------|
+| **Error logs** | Stack traces, error messages, log patterns |
+| **Reproduction steps** | Exact sequence to trigger the issue |
+| **Affected scope** | Users affected, environments, frequency |
+| **Temporal patterns** | When it started, intermittent vs constant, correlation with deploys |
+| **Metrics** | Error rates, latency spikes, resource exhaustion signals |
+### Suspected Root Causes as Hypothesis Seeds
+Engineer-provided suspected causes are hypothesis seeds, not conclusions. They inform investigation direction but should not constrain the search space. Weight them alongside evidence-based hypotheses generated during planning.
+### Knowledge Gap Detection
+| Signal | Action |
+|--------|--------|
+| "It just broke" (no timeline) | Probe for recent changes, deploys, config updates |
+| "It happens sometimes" (no pattern) | Probe for environmental differences, load conditions |
+| "I think it's X" (premature diagnosis) | Acknowledge hypothesis, still gather full evidence |
+| Symptom described as cause | Redirect to observable behavior — "what do you see?" |
+## Ideation Guidance
+Per **Ideation First**, the investigation interview captures the problem space so the planner can ground hypotheses in evidence.
+### Probe Guidance
+- Probe vague symptom descriptions — demand concrete evidence
+- Separate symptoms from suspected causes — capture both but label them distinctly
+### Output Sections
+Spec body sections for investigation domain:
+- **Motivation**: The problem and its impact
+- **Goals**: Success criteria from "what does fixed look like"
+- **Technical Considerations**: Evidence, constraints, suspected causes
+- **Open Questions**: Unknowns the planner should investigate
+## Planning Considerations
+### Focused Research
+Research should focus on the problem domain rather than on broad codebase exploration. Investigation planning should:
+- Ground hypotheses in gathered evidence
+- Prioritize hypotheses by evidence weight and impact
+- Design diagnostic steps that narrow the search space efficiently
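Prioritizing hypotheses by evidence weight and impact could be sketched as a simple sort. The scoring rule (weight multiplied by impact, both on a 0 to 1 scale) and the `Hypothesis` fields are assumptions chosen for illustration; the text does not prescribe a formula. Engineer-suspected causes are scored the same way as evidence-derived ones, in line with treating them as hypothesis seeds.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    statement: str
    evidence_weight: float  # 0..1, how strongly gathered evidence supports it (assumed scale)
    impact: float           # 0..1, how much of the problem it would explain (assumed scale)
    engineer_suspected: bool = False


def prioritize(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Order hypotheses by evidence weight times impact, highest first."""
    return sorted(hypotheses, key=lambda h: h.evidence_weight * h.impact, reverse=True)
```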
+### Hypothesis-Driven Investigation Approach
+Prompts should be structured as hypothesis validation steps:
+- Each prompt tests one or more hypotheses
+- Early prompts gather diagnostic data; later prompts apply fixes
+- Evidence correlation patterns guide hypothesis ordering
+### Prompt Output Range
+Investigation specs produce 2-5 focused prompts. Investigation is inherently iterative — fewer, targeted prompts are preferred over broad sweeps.

package/.allhands/workflows/milestone.md ADDED

@@ -0,0 +1,91 @@
+---
+name: milestone
+type: milestone
+planning_depth: deep
+jury_required: true
+max_tangential_hypotheses: 5
+required_ideation_questions:
+  - "What are you trying to accomplish?"
+  - "Why does this matter and what worries you about this?"
+  - "What can you handle vs need automated?"
+  - "What would success look like?"
+---
+## Domain Knowledge
+### Core Interview Dimensions
+The `required_ideation_questions` elicit each dimension directly. Also infer dimensions passively from engineer behavior:
+| Dimension | Elicit via | Infer from |
+|-----------|------------|------------|
+| Goals | "What are you trying to accomplish?" | Problem description |
+| Motivations | "Why does this matter?" | Frustrations expressed |
+| Concerns | "What worries you about this?" | Caveats/hedging |
+| Desires | "What would ideal look like?" | Enthusiasm |
+| Capabilities | "What can you handle vs need automated?" | Technical language |
+| Expectations | "What would success look like?" | Examples given |
+### Category Deep Dives
+Work through relevant categories based on milestone scope. Each category surfaces domain-specific concerns that engineers often underspecify:
+| Category | Key Questions | Knowledge Gap Signals |
+|----------|---------------|----------------------|
+| **User Experience** | "Walk through: user opens this first time - what happens?" | Describes features instead of journeys |
+| **Data & State** | "What needs to be stored? Where does data come from/go?" | Says "just a database" without schema thinking |
+| **Technical** | "What systems must this work with? Constraints?" | Picks tech without understanding tradeoffs |
+| **Scale** | "How many users/requests? Now vs future?" | Says "millions" without infrastructure thinking |
+| **Integrations** | "External services? APIs consumed/created?" | Assumes integrations are simple |
+| **Security** | "Who should do what? Sensitive data?" | Says "just basic login" |
+### Additional Knowledge Gap Signals
+| Signal | Action |
+|--------|--------|
+| Conflicting requirements | Surface the conflict explicitly and ask whether to apply the Disposable Variants Approach |
+### Completeness Check
+Before transitioning from ideation to spec writing, verify coverage:
+| Area | Verified |
+|------|----------|
+| Problem statement clear | [ ] |
+| Technical constraints understood | [ ] |
+| User expectations deeply understood | [ ] |
+| Every discernible milestone element has either a user expectation or an open question for downstream agents | [ ] |
+| No "To Be Discussed" items remaining | [ ] |
+If gaps exist, return to surveying for specific categories.
+## Ideation Guidance
+Per **Ideation First**, engineers control depth — domain config ensures coverage without forcing depth.
+### Probe Guidance
+- Probe vague responses with category deep dives
+- Detect knowledge gaps using the signal tables
+### Guiding Principles Synthesis
+Synthesize guiding principles from the engineer's philosophy expressed during ideation. Validate synthesized principles with the engineer before proceeding to spec writing.
+### Output Sections
+Spec body sections for milestone domain:
+- **Motivation**: Implicit in goals — why this matters
+- **Goals**: What the engineer is trying to accomplish
+- **Technical Considerations**: Grounded in codebase reality from exploration subtasks
+- **Open Questions**: For architect to research/decide during planning
+### Optional: Spec Flow Analysis
+Before or after creating the spec, offer flow analysis for complex features. Recommended for user-facing features with multiple paths, complex integrations, or features with unclear scope boundaries.
+## Planning Considerations
+### Prompt Output Range
+Milestone specs produce 5-15 coordinated prompts. Prompts must be fully autonomous — no human intervention during execution.

package/.allhands/workflows/optimization.md ADDED

@@ -0,0 +1,85 @@
+---
+name: optimization
+type: optimization
+planning_depth: focused
+jury_required: false
+max_tangential_hypotheses: 2
+required_ideation_questions:
+  - "What's slow / expensive?"
+  - "What are the performance targets?"
+  - "How should improvements be measured?"
+  - "Current baseline metrics?"
+  - "Any constraints?"
+---
+## Domain Knowledge
+### Performance Vocabulary
+Optimization specs are grounded in quantitative measurement. Key performance dimensions:
+| Dimension | Metrics | Examples |
+|-----------|---------|----------|
+| **Latency** | Response time, P50/P95/P99 | "API responds in 200ms P95" |
+| **Throughput** | Requests/sec, items/sec | "Process 1000 events/sec" |
+| **Resource usage** | Memory, CPU, disk, connections | "Stay under 512MB RSS" |
+| **Cost** | $/request, $/month, compute hours | "Reduce Lambda cost by 40%" |
+### Baseline-Target-Measurement Triple
+Every optimization must establish three things:
+1. **Baseline**: Current measured performance ("now it takes 2s P95")
+2. **Target**: Concrete improvement goal ("reduce to 500ms P95")
+3. **Measurement**: How improvement is verified ("benchmark suite X, dashboard Y")
+Without all three, the optimization is underspecified. Probe for missing elements.
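A minimal sketch of the triple as a record with a completeness check follows. The `OptimizationTarget` class and its `missing` method are hypothetical names; the example values reuse the ones quoted in the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OptimizationTarget:
    baseline: Optional[str] = None     # current measured performance
    target: Optional[str] = None       # concrete improvement goal
    measurement: Optional[str] = None  # how improvement is verified

    def missing(self) -> list[str]:
        """Names of the elements still to be probed for."""
        return [name for name in ("baseline", "target", "measurement") if not getattr(self, name)]


if __name__ == "__main__":
    spec = OptimizationTarget(baseline="2s P95", target="500ms P95")
    print(spec.missing())
```

A non-empty result marks the optimization as underspecified and tells the interviewer which element to ask about next.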
+### Knowledge Gap Detection
+| Signal | Action |
+|--------|--------|
+| "It feels slow" (no numbers) | Demand concrete metrics — profile first |
+| "Make it faster" (no target) | Probe for acceptable thresholds |
+| "Optimize everything" (no focus) | Identify the bottleneck — what's the user-facing pain? |
+| Assumes cause without profiling | Redirect to measurement — "have you profiled this?" |
+## Ideation Guidance
+Per **Ideation First**, the optimization interview captures measurable targets so the planner can create profiling-first hypotheses.
+### Probe Guidance
+- Probe vague targets — demand concrete numbers
+- Verify baseline metrics exist or flag measurement as a prerequisite task
+### Output Sections
+Spec body sections for optimization domain:
+- **Motivation**: What's slow/expensive and why it matters
+- **Goals**: Performance targets with measurable thresholds
+- **Technical Considerations**: Baseline metrics, measurement approach, constraints
+- **Open Questions**: Unknowns the planner should profile or research
+## Planning Considerations
+### Profiling-First Approach
+Planning should front-load measurement and profiling:
+- First prompt(s) establish baseline measurements if not already available
+- Optimization prompts follow, each targeting a specific bottleneck
+- Final prompt verifies targets are met against the same measurement approach
+### Measurement Method Validation
+The measurement approach itself must be validated — unreliable benchmarks produce unreliable results. Planning should ensure the measurement tooling is trustworthy before optimizing against it.
+### Backwards Compatibility Constraints
+Optimization must not change observable behavior. Planning should surface:
+- API contract preservation requirements
+- Data format compatibility
+- Feature flag needs for gradual rollout of performance changes
+### Prompt Output Range
+Optimization specs produce 2-6 focused prompts. Measurement setup may require its own prompt if baselines don't exist.

package/.allhands/workflows/refactor.md ADDED

@@ -0,0 +1,99 @@
+---
+name: refactor
+type: refactor
+planning_depth: focused
+jury_required: false
+max_tangential_hypotheses: 2
+required_ideation_questions:
+  - "What's the scope?"
+  - "What invariants must be preserved?"
+  - "What's the target architecture?"
+  - "Incremental or big-bang?"
+  - "Any constraints?"
+---
+## Domain Knowledge
+### Invariant Preservation
+Refactor specs are defined as much by what must NOT change as by what should. Key invariant categories:
+| Invariant Type | Examples |
+|----------------|----------|
+| **API contracts** | Public function signatures, REST endpoints, event schemas |
+| **Observable behavior** | Output for given inputs, side effects, error handling |
+| **Test coverage** | Existing tests continue to pass without modification |
+| **External interfaces** | Database schemas, file formats, wire protocols |
+Invariants are the safety rails of a refactor — they define the transformation's constraints and enable confident validation.
+### Current-State to Target-Architecture Framing
+Every refactor must articulate:
+1. **Current state**: What exists now and why it's problematic
+2. **Target architecture**: The desired end state — pattern, structure, naming, organization
+3. **Transformation path**: How to get from current to target — incremental stages or atomic landing
+### Migration Strategy Dimension
+The incremental vs big-bang decision shapes the entire plan:
+| Strategy | When to Use | Planning Impact |
+|----------|-------------|-----------------|
+| **Incremental** | Large scope, dependent consumers, high risk | Multiple prompts with intermediate stable states |
+| **Big-bang** | Small scope, isolated module, low risk | Fewer prompts, atomic transformation |
+| **Feature-flagged** | Parallel old/new paths needed during transition | Additional prompt for flag setup and cleanup |
+### Knowledge Gap Detection
+| Signal | Action |
+|--------|--------|
+| "Clean up the code" (no target) | Probe for specific target architecture |
+| "Refactor everything" (no scope) | Demand scope boundaries — which modules/files? |
+| "It should just work the same" (vague invariants) | Enumerate specific contracts to preserve |
+| No mention of tests | Surface test coverage as explicit invariant |
+## Ideation Guidance
+Per **Ideation First**, the refactor interview captures scope boundaries and invariants so the planner can create safe transformation hypotheses.
+### Probe Guidance
+- Probe vague scope boundaries — demand specific modules, files, or patterns
+- Enumerate invariants explicitly — don't assume the engineer has considered all contract surfaces
+### Output Sections
+Spec body sections for refactor domain:
+- **Motivation**: Why the current structure is problematic
+- **Goals**: Target architecture and preserved invariants
+- **Non-Goals**: What's explicitly out of scope (unique to refactor)
+- **Technical Considerations**: Migration strategy, constraints, coordination needs
+- **Open Questions**: Unknowns the planner should investigate
+## Planning Considerations
+### Feature Flag Consideration
+For staged delivery, planning should evaluate whether feature flags are needed:
+- Parallel old/new code paths during transition
+- Gradual migration of dependent consumers
+- Rollback capability for high-risk transformations
+### Dependent Consumer Coordination
+When refactoring shared code, planning must account for:
+- Which consumers depend on the current interface
+- Migration order for dependent consumers
+- Whether consumers can be updated atomically or need compatibility shims
+### Test Coverage Preservation
+Planning should verify and maintain test coverage:
+- Existing tests pass against both current and target architectures during transition
+- New tests cover the target architecture's specific patterns
+- Test migration may require its own prompt for large refactors
+### Prompt Output Range
+Refactor specs produce 2-7 focused prompts. Incremental refactors with many dependent consumers trend toward the higher end.
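The Prompt Output Range figures stated in the five workflow files of this diff can be gathered into a single lookup, for example to sanity-check a plan before execution. The ranges are taken from the text; the `PROMPT_RANGES` table and `prompt_count_in_range` helper are illustrative names, not part of the package.

```python
# (min, max) prompts per workflow, as stated in each file's "Prompt Output Range" section.
PROMPT_RANGES = {
    "documentation": (2, 5),
    "investigation": (2, 5),
    "milestone": (5, 15),
    "optimization": (2, 6),
    "refactor": (2, 7),
}


def prompt_count_in_range(workflow: str, count: int) -> bool:
    """Check whether a planned prompt count falls inside the workflow's stated range."""
    low, high = PROMPT_RANGES[workflow]
    return low <= count <= high
```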