all-hands-cli 0.1.0
- package/.allhands/README.md +75 -0
- package/.allhands/agents/compounder.yaml +15 -0
- package/.allhands/agents/coordinator.yaml +17 -0
- package/.allhands/agents/documentor.yaml +15 -0
- package/.allhands/agents/e2e-test-planner.yaml +17 -0
- package/.allhands/agents/emergent.yaml +22 -0
- package/.allhands/agents/executor.yaml +14 -0
- package/.allhands/agents/ideation.yaml +11 -0
- package/.allhands/agents/initiative-steering.yaml +19 -0
- package/.allhands/agents/judge.yaml +13 -0
- package/.allhands/agents/planner.yaml +19 -0
- package/.allhands/agents/pr-reviewer.yaml +15 -0
- package/.allhands/docs.json +5 -0
- package/.allhands/docs.local.json +26 -0
- package/.allhands/flows/COMPOUNDING.md +203 -0
- package/.allhands/flows/COORDINATION.md +89 -0
- package/.allhands/flows/CORE.md +87 -0
- package/.allhands/flows/DOCUMENTATION.md +218 -0
- package/.allhands/flows/E2E_TEST_PLAN_BUILDING.md +140 -0
- package/.allhands/flows/EMERGENT_PLANNING.md +57 -0
- package/.allhands/flows/IDEATION_SCOPING.md +154 -0
- package/.allhands/flows/INITIATIVE_STEERING.md +110 -0
- package/.allhands/flows/JUDGE_REVIEWING.md +79 -0
- package/.allhands/flows/PROMPT_TASK_EXECUTION.md +68 -0
- package/.allhands/flows/PR_REVIEWING.md +43 -0
- package/.allhands/flows/SPEC_PLANNING.md +216 -0
- package/.allhands/flows/harness/WRITING_HARNESS_FLOWS.md +27 -0
- package/.allhands/flows/harness/WRITING_HARNESS_KNOWLEDGE.md +27 -0
- package/.allhands/flows/harness/WRITING_HARNESS_ORCHESTRATION.md +27 -0
- package/.allhands/flows/harness/WRITING_HARNESS_SKILLS.md +27 -0
- package/.allhands/flows/harness/WRITING_HARNESS_TOOLS.md +27 -0
- package/.allhands/flows/harness/WRITING_HARNESS_VALIDATION_TOOLING.md +27 -0
- package/.allhands/flows/shared/CODEBASE_UNDERSTANDING.md +72 -0
- package/.allhands/flows/shared/CREATE_HARNESS_SPEC.md +48 -0
- package/.allhands/flows/shared/CREATE_SPEC.md +41 -0
- package/.allhands/flows/shared/CREATE_VALIDATION_TOOLING_SPEC.md +70 -0
- package/.allhands/flows/shared/DOCUMENTATION_DISCOVERY.md +123 -0
- package/.allhands/flows/shared/DOCUMENTATION_WRITER.md +101 -0
- package/.allhands/flows/shared/EMERGENT_REFINEMENT_ANALYSIS.md +76 -0
- package/.allhands/flows/shared/EXTERNAL_TECH_GUIDANCE.md +97 -0
- package/.allhands/flows/shared/IDEATION_CODEBASE_GROUNDING.md +49 -0
- package/.allhands/flows/shared/PLAN_DEEPENING.md +152 -0
- package/.allhands/flows/shared/PROMPT_TASKS_CURATION.md +113 -0
- package/.allhands/flows/shared/PROMPT_VALIDATION_REVIEW.MD +99 -0
- package/.allhands/flows/shared/QUICK_PREMORTEM.md +70 -0
- package/.allhands/flows/shared/RESEARCH_GUIDANCE.md +38 -0
- package/.allhands/flows/shared/REVIEW_OPTIONS_BREAKDOWN.md +68 -0
- package/.allhands/flows/shared/SKILL_EXTRACTION.md +84 -0
- package/.allhands/flows/shared/SPEC_FLOW_ANALYSIS.md +119 -0
- package/.allhands/flows/shared/TDD_WORKFLOW.md +109 -0
- package/.allhands/flows/shared/UTILIZE_VALIDATION_TOOLING.md +84 -0
- package/.allhands/flows/shared/WRITING_HARNESS_FLOWS.md +11 -0
- package/.allhands/flows/shared/WRITING_HARNESS_MCP_TOOLS.md +84 -0
- package/.allhands/flows/shared/jury/ARCHITECTURE_REVIEW.md +91 -0
- package/.allhands/flows/shared/jury/BEST_PRACTICES_REVIEW.md +80 -0
- package/.allhands/flows/shared/jury/CLAIM_VERIFICATION_REVIEW.md +101 -0
- package/.allhands/flows/shared/jury/EXPECTATIONS_FIT_REVIEW.md +78 -0
- package/.allhands/flows/shared/jury/MAINTAINABILITY_REVIEW.md +110 -0
- package/.allhands/flows/shared/jury/PROMPTS_EXPECTATIONS_FIT.md +74 -0
- package/.allhands/flows/shared/jury/PROMPTS_FLOW_ANALYSIS.md +92 -0
- package/.allhands/flows/shared/jury/PROMPTS_YAGNI.md +78 -0
- package/.allhands/flows/shared/jury/PROMPT_PREMORTEM.md +125 -0
- package/.allhands/flows/shared/jury/SECURITY_REVIEW.md +86 -0
- package/.allhands/flows/shared/jury/YAGNI_REVIEW.md +82 -0
- package/.allhands/flows/wip/DEBUG_INVESTIGATION.md +162 -0
- package/.allhands/flows/wip/MEMORY_RECALL.md +62 -0
- package/.allhands/harness/ah +131 -0
- package/.allhands/harness/package-lock.json +5292 -0
- package/.allhands/harness/package.json +52 -0
- package/.allhands/harness/src/__tests__/e2e/commands.test.ts +307 -0
- package/.allhands/harness/src/__tests__/e2e/event-loop.test.ts +539 -0
- package/.allhands/harness/src/__tests__/e2e/hooks.test.ts +427 -0
- package/.allhands/harness/src/__tests__/e2e/new-initiative-routing.test.ts +137 -0
- package/.allhands/harness/src/__tests__/e2e/run-e2e.ts +109 -0
- package/.allhands/harness/src/__tests__/e2e/specs-type.test.ts +210 -0
- package/.allhands/harness/src/__tests__/e2e/validation-hooks.test.ts +669 -0
- package/.allhands/harness/src/__tests__/e2e/validation-path-consistency.test.ts +354 -0
- package/.allhands/harness/src/__tests__/e2e/validation.test.ts +528 -0
- package/.allhands/harness/src/__tests__/harness/assertions.ts +318 -0
- package/.allhands/harness/src/__tests__/harness/cli-runner.ts +359 -0
- package/.allhands/harness/src/__tests__/harness/fixture.ts +384 -0
- package/.allhands/harness/src/__tests__/harness/hook-runner.ts +411 -0
- package/.allhands/harness/src/__tests__/harness/index.ts +122 -0
- package/.allhands/harness/src/cli.ts +36 -0
- package/.allhands/harness/src/commands/complexity.ts +177 -0
- package/.allhands/harness/src/commands/context7.ts +202 -0
- package/.allhands/harness/src/commands/docs.ts +557 -0
- package/.allhands/harness/src/commands/hooks.ts +24 -0
- package/.allhands/harness/src/commands/index.ts +51 -0
- package/.allhands/harness/src/commands/knowledge.ts +382 -0
- package/.allhands/harness/src/commands/memories.ts +302 -0
- package/.allhands/harness/src/commands/notify.ts +61 -0
- package/.allhands/harness/src/commands/oracle.ts +158 -0
- package/.allhands/harness/src/commands/perplexity.ts +220 -0
- package/.allhands/harness/src/commands/planning.ts +245 -0
- package/.allhands/harness/src/commands/schema.ts +73 -0
- package/.allhands/harness/src/commands/skills.ts +128 -0
- package/.allhands/harness/src/commands/solutions.ts +353 -0
- package/.allhands/harness/src/commands/spawn.ts +158 -0
- package/.allhands/harness/src/commands/specs.ts +532 -0
- package/.allhands/harness/src/commands/tavily.ts +226 -0
- package/.allhands/harness/src/commands/tools.ts +579 -0
- package/.allhands/harness/src/commands/trace.ts +327 -0
- package/.allhands/harness/src/commands/tui.ts +960 -0
- package/.allhands/harness/src/commands/validate.ts +143 -0
- package/.allhands/harness/src/commands/validation-tools.ts +108 -0
- package/.allhands/harness/src/hooks/context.ts +1442 -0
- package/.allhands/harness/src/hooks/enforcement.ts +170 -0
- package/.allhands/harness/src/hooks/index.ts +54 -0
- package/.allhands/harness/src/hooks/lifecycle.ts +229 -0
- package/.allhands/harness/src/hooks/notification.ts +104 -0
- package/.allhands/harness/src/hooks/observability.ts +551 -0
- package/.allhands/harness/src/hooks/session.ts +88 -0
- package/.allhands/harness/src/hooks/shared.ts +815 -0
- package/.allhands/harness/src/hooks/transcript-parser.ts +208 -0
- package/.allhands/harness/src/hooks/validation.ts +617 -0
- package/.allhands/harness/src/lib/__tests__/ctags.test.ts +244 -0
- package/.allhands/harness/src/lib/__tests__/docs-validation.test.ts +344 -0
- package/.allhands/harness/src/lib/__tests__/mcp-runtime.test.ts +190 -0
- package/.allhands/harness/src/lib/__tests__/schema.test.ts +861 -0
- package/.allhands/harness/src/lib/base-command.ts +198 -0
- package/.allhands/harness/src/lib/cli-daemon.ts +343 -0
- package/.allhands/harness/src/lib/compaction.ts +313 -0
- package/.allhands/harness/src/lib/ctags.ts +497 -0
- package/.allhands/harness/src/lib/docs-validation.ts +907 -0
- package/.allhands/harness/src/lib/event-loop.ts +662 -0
- package/.allhands/harness/src/lib/flows.ts +155 -0
- package/.allhands/harness/src/lib/git.ts +276 -0
- package/.allhands/harness/src/lib/knowledge-worker.ts +72 -0
- package/.allhands/harness/src/lib/knowledge.ts +810 -0
- package/.allhands/harness/src/lib/llm.ts +255 -0
- package/.allhands/harness/src/lib/mcp-client.ts +432 -0
- package/.allhands/harness/src/lib/mcp-daemon.ts +486 -0
- package/.allhands/harness/src/lib/mcp-runtime.ts +418 -0
- package/.allhands/harness/src/lib/notification.ts +115 -0
- package/.allhands/harness/src/lib/opencode/index.ts +70 -0
- package/.allhands/harness/src/lib/opencode/profiles.ts +300 -0
- package/.allhands/harness/src/lib/opencode/prompts/codesearch.md +98 -0
- package/.allhands/harness/src/lib/opencode/prompts/knowledge-aggregator.md +67 -0
- package/.allhands/harness/src/lib/opencode/runner.ts +281 -0
- package/.allhands/harness/src/lib/oracle.ts +926 -0
- package/.allhands/harness/src/lib/planning-utils.ts +150 -0
- package/.allhands/harness/src/lib/planning.ts +605 -0
- package/.allhands/harness/src/lib/pr-review.ts +225 -0
- package/.allhands/harness/src/lib/prompts.ts +522 -0
- package/.allhands/harness/src/lib/schema.ts +418 -0
- package/.allhands/harness/src/lib/schemas/agent-profile.ts +141 -0
- package/.allhands/harness/src/lib/schemas/template-vars.ts +138 -0
- package/.allhands/harness/src/lib/session.ts +164 -0
- package/.allhands/harness/src/lib/specs.ts +348 -0
- package/.allhands/harness/src/lib/tldr.ts +829 -0
- package/.allhands/harness/src/lib/tmux.ts +1051 -0
- package/.allhands/harness/src/lib/trace-store.ts +714 -0
- package/.allhands/harness/src/mcp/__tests__/index.test.ts +46 -0
- package/.allhands/harness/src/mcp/_template.ts +47 -0
- package/.allhands/harness/src/mcp/filesystem.ts +33 -0
- package/.allhands/harness/src/mcp/index.ts +69 -0
- package/.allhands/harness/src/mcp/playwright.ts +34 -0
- package/.allhands/harness/src/mcp/xcodebuild.ts +29 -0
- package/.allhands/harness/src/schemas/docs.schema.json +44 -0
- package/.allhands/harness/src/schemas/settings.schema.json +214 -0
- package/.allhands/harness/src/tui/actions.ts +227 -0
- package/.allhands/harness/src/tui/file-viewer-modal.ts +270 -0
- package/.allhands/harness/src/tui/index.ts +1574 -0
- package/.allhands/harness/src/tui/modal.ts +232 -0
- package/.allhands/harness/src/tui/prompts-pane.ts +186 -0
- package/.allhands/harness/src/tui/status-pane.ts +434 -0
- package/.allhands/harness/tsconfig.json +22 -0
- package/.allhands/harness/vitest.config.ts +13 -0
- package/.allhands/pillars.md +33 -0
- package/.allhands/principles.md +88 -0
- package/.allhands/schemas/alignment.yaml +51 -0
- package/.allhands/schemas/documentation.yaml +10 -0
- package/.allhands/schemas/prompt.yaml +92 -0
- package/.allhands/schemas/skill.yaml +34 -0
- package/.allhands/schemas/solution.yaml +131 -0
- package/.allhands/schemas/spec.yaml +67 -0
- package/.allhands/schemas/validation-suite.yaml +49 -0
- package/.allhands/schemas/workflow.yaml +51 -0
- package/.allhands/settings.json +57 -0
- package/.allhands/skills/claude-code-patterns/SKILL.md +60 -0
- package/.allhands/skills/claude-code-patterns/docs/context-hygiene.md +19 -0
- package/.allhands/skills/harness-maintenance/SKILL.md +449 -0
- package/.allhands/skills/harness-maintenance/references/core-architecture.md +187 -0
- package/.allhands/skills/harness-maintenance/references/harness-skills.md +87 -0
- package/.allhands/skills/harness-maintenance/references/knowledge-compounding.md +78 -0
- package/.allhands/skills/harness-maintenance/references/tools-commands-mcp-hooks.md +115 -0
- package/.allhands/skills/harness-maintenance/references/validation-tooling.md +77 -0
- package/.allhands/skills/harness-maintenance/references/writing-flows.md +84 -0
- package/.allhands/validation/browser-automation.md +109 -0
- package/.allhands/validation/xcode-automation.md +195 -0
- package/.allhands/workflows/documentation.md +86 -0
- package/.allhands/workflows/investigation.md +81 -0
- package/.allhands/workflows/milestone.md +91 -0
- package/.allhands/workflows/optimization.md +85 -0
- package/.allhands/workflows/refactor.md +99 -0
- package/.allhands/workflows/triage.md +81 -0
- package/.claude/README.md +1 -0
- package/.claude/agents/explorer.md +10 -0
- package/.claude/agents/researcher.md +11 -0
- package/.claude/agents/task-runner.md +8 -0
- package/.claude/settings.json +231 -0
- package/.env.ai.example +7 -0
- package/.github/workflows/npm-publish.yml +69 -0
- package/.internal.json +45 -0
- package/.tldr/config.json +11 -0
- package/.tldrignore +90 -0
- package/CLAUDE.md +6 -0
- package/README.md +98 -0
- package/bin/sync-cli.js +7552 -0
- package/concerns.md +7 -0
- package/docs/README.md +41 -0
- package/docs/agents/README.md +24 -0
- package/docs/agents/agent-configuration-system.md +86 -0
- package/docs/agents/execution-agents.md +50 -0
- package/docs/agents/knowledge-agents.md +61 -0
- package/docs/agents/orchestration-agent.md +57 -0
- package/docs/agents/planning-agents.md +84 -0
- package/docs/agents/quality-review-agents.md +67 -0
- package/docs/agents/workflow-agent-orchestration.md +69 -0
- package/docs/flows/README.md +44 -0
- package/docs/flows/compounding.md +126 -0
- package/docs/flows/coordination.md +72 -0
- package/docs/flows/core-harness-integration.md +63 -0
- package/docs/flows/documentation-orchestration.md +98 -0
- package/docs/flows/e2e-test-plan-building.md +83 -0
- package/docs/flows/emergent-refinement.md +104 -0
- package/docs/flows/flow-authoring-and-mcp-tools.md +89 -0
- package/docs/flows/judge-reviewing.md +112 -0
- package/docs/flows/plan-deepening-and-research.md +107 -0
- package/docs/flows/plan-review-jury.md +114 -0
- package/docs/flows/pr-reviewing.md +54 -0
- package/docs/flows/prompt-task-execution.md +119 -0
- package/docs/flows/spec-planning.md +162 -0
- package/docs/flows/type-specific-scoping-flows.md +49 -0
- package/docs/flows/validation-and-skills-integration.md +145 -0
- package/docs/flows/wip/wip-flows.md +102 -0
- package/docs/harness/README.md +23 -0
- package/docs/harness/agent-profiles.md +84 -0
- package/docs/harness/cli/README.md +24 -0
- package/docs/harness/cli/cli-entry-and-command-discovery.md +91 -0
- package/docs/harness/cli/docs-command.md +87 -0
- package/docs/harness/cli/knowledge-command.md +91 -0
- package/docs/harness/cli/minor-cli-commands.md +65 -0
- package/docs/harness/cli/oracle-command.md +113 -0
- package/docs/harness/cli/planning-command.md +95 -0
- package/docs/harness/cli/schema-and-validation-commands.md +154 -0
- package/docs/harness/cli/search-commands.md +97 -0
- package/docs/harness/cli/spawn-command.md +136 -0
- package/docs/harness/cli/specs-command.md +102 -0
- package/docs/harness/cli/tools-command.md +122 -0
- package/docs/harness/cli/trace-command.md +122 -0
- package/docs/harness/cli-daemon.md +92 -0
- package/docs/harness/event-loop.md +184 -0
- package/docs/harness/hooks/README.md +15 -0
- package/docs/harness/hooks/context-hooks.md +96 -0
- package/docs/harness/hooks/lifecycle-and-observability-hooks.md +135 -0
- package/docs/harness/hooks/validation-hooks.md +97 -0
- package/docs/harness/test-harness.md +149 -0
- package/docs/harness/tui.md +176 -0
- package/docs/memories.md +20 -0
- package/docs/solutions/agentic-issues/premature-agent-deletion-tui-action-dependency-20260130.md +49 -0
- package/docs/solutions/agentic-issues/ref-anchor-scope-mismatch-skill-references-20260131.md +55 -0
- package/docs/solutions/agentic-issues/tautological-tests-routing-20260131.md +52 -0
- package/docs/solutions/integration_issue/blocktool-output-format-mismatch-hook-runner-20260130.md +52 -0
- package/docs/solutions/integration_issue/dual-validation-path-divergence-schema-20260130.md +66 -0
- package/docs/solutions/security-issues/unsanitized-domain-path-join-20260131.md +52 -0
- package/docs/solutions/test-failures/event-loop-mock-ordering-checkAgentWindows-20260130.md +63 -0
- package/docs/sync-cli/README.md +19 -0
- package/docs/sync-cli/cli-entrypoint-and-commands.md +39 -0
- package/docs/sync-cli/commands/README.md +11 -0
- package/docs/sync-cli/commands/pull-manifest-command.md +36 -0
- package/docs/sync-cli/commands/push-command.md +84 -0
- package/docs/sync-cli/commands/sync-command.md +71 -0
- package/docs/sync-cli/systems/README.md +14 -0
- package/docs/sync-cli/systems/git-and-github-integration.md +49 -0
- package/docs/sync-cli/systems/interactive-ui.md +43 -0
- package/docs/sync-cli/systems/manifest-and-distribution.md +51 -0
- package/docs/sync-cli/systems/path-resolution.md +42 -0
- package/package.json +46 -0
- package/scripts/install-shim.sh +40 -0
- package/scripts/pre-pack.sh +25 -0
- package/specs/harness-maintenance-skill.spec.md +138 -0
- package/specs/roadmap/git-spec-lifecycle-management.spec.md +113 -0
- package/specs/sync-init-flag.spec.md +117 -0
- package/specs/unified-workflow-orchestration.spec.md +250 -0
- package/specs/validation-tooling-practice.spec.md +98 -0
- package/specs/workflow-domain-configuration.spec.md +265 -0
- package/src/commands/pull-manifest.ts +31 -0
- package/src/commands/push.ts +344 -0
- package/src/commands/sync.ts +289 -0
- package/src/lib/constants.ts +10 -0
- package/src/lib/dotfiles.ts +36 -0
- package/src/lib/fs-utils.ts +18 -0
- package/src/lib/gh.ts +40 -0
- package/src/lib/git.ts +63 -0
- package/src/lib/gitignore.ts +167 -0
- package/src/lib/manifest.ts +121 -0
- package/src/lib/marker-sync.ts +39 -0
- package/src/lib/paths.ts +38 -0
- package/src/lib/target-lines.ts +66 -0
- package/src/lib/ui.ts +78 -0
- package/src/sync-cli.ts +120 -0
- package/target-lines.json +23 -0
- package/tsconfig.json +20 -0
@@ -0,0 +1,112 @@
---
description: "Jury-based implementation review orchestration with seven specialized reviewer agents covering architecture, security, YAGNI, maintainability, best practices, expectations fit, and claim verification"
---

# Judge Reviewing

After implementation is complete, the judge orchestrates a jury of specialized reviewers to assess the work against planning artifacts and the original spec. The design embodies **Quality Engineering**: the question is not "does this work?" but "which issues matter enough to fix?" The engineer makes all final decisions.

[ref:.allhands/flows/JUDGE_REVIEWING.md::7a26793]

## Jury Architecture

```mermaid
flowchart TD
    J[Judge Orchestrator] -->|spawn parallel| BP[Best Practices Review]
    J --> EF[Expectations Fit Review]
    J --> SEC[Security Review]
    J --> Y[YAGNI Review]
    J --> M[Maintainability Review]
    J --> AR[Architecture Review]
    J --> CV[Claim Verification Review]

    BP --> SYN[Feedback Synthesis]
    EF --> SYN
    SEC --> SYN
    Y --> SYN
    M --> SYN
    AR --> SYN
    CV --> SYN

    SYN --> ENG[Engineer Decision]
    ENG --> FIX[review-fix Prompts]
```

All seven reviewers run in parallel as independent subtasks. The judge never reads jury flow files directly -- per **Context is Precious**, each subtask loads only its own review flow.
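
The fan-out and fan-in shape of the diagram can be sketched in TypeScript. This is only an illustration under stated assumptions: `Reviewer`, `Finding`, `runReviewer`, and `runJury` are hypothetical names, and `runReviewer` is a stand-in for however the harness actually spawns a subtask, which this document does not show.

```typescript
// Hypothetical shapes; the harness's real subtask API is not shown in this document.
interface Finding { reviewer: string; summary: string; }
interface Reviewer { name: string; flowPath: string; }

// Stand-in for spawning a subtask. Each subtask receives only its own flow path,
// so the orchestrator never has to read the jury flow files itself.
async function runReviewer(reviewer: Reviewer): Promise<Finding[]> {
  return [{ reviewer: reviewer.name, summary: `reviewed using ${reviewer.flowPath}` }];
}

// Fan out to every reviewer in parallel, then collect all findings for synthesis.
async function runJury(reviewers: Reviewer[]): Promise<Finding[]> {
  const perReviewer = await Promise.all(reviewers.map((r) => runReviewer(r)));
  return perReviewer.reduce((all, findings) => all.concat(findings), [] as Finding[]);
}
```

`Promise.all` preserves input order, so findings arrive grouped by reviewer in the order the reviewers were listed.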

## Reviewer Specializations

### Domain Best Practices

[ref:.allhands/flows/shared/jury/BEST_PRACTICES_REVIEW.md::d8bd995]

Spawned per domain touched by the implementation (expo/react-native, trpc/serverless, database/drizzle/supabase, web/tanstack/nextjs, dev tooling, CI/CD). Each reviewer extracts domain skills and codebase knowledge, then compares implementation against extracted patterns for compliance, preferences, pitfalls, and consistency.

Fallback: if no skill findings exist for a domain, the reviewer falls back to web research. Per **Knowledge Compounding**, this gap signals a missing skill that should be created.

### Expectations Fit

[ref:.allhands/flows/shared/jury/EXPECTATIONS_FIT_REVIEW.md::40d4d15]

Verifies implementation honors desires, concerns, and decisions from ideation and planning. Checks: engineer desires implemented, success criteria met, concerns addressed, planning decisions honored, scope matched, and -- critically -- goal achievement versus mere task completion.

### Security

[ref:.allhands/flows/shared/jury/SECURITY_REVIEW.md::7e98745]

Reviews against OWASP Top 10 and common vulnerability patterns. File-type-aware: API endpoints get input validation and auth checks, database queries get parameterization review, frontend gets XSS and CSRF analysis, config files get secrets exposure scanning.

### YAGNI

[ref:.allhands/flows/shared/jury/YAGNI_REVIEW.md::0fca163]

Detects over-engineering with source-awareness: agentic over-reach gets highest priority (P1), post-planning engineer decisions get P2, original planning decisions get P3. Agents systematically over-engineer, so this reviewer provides a deliberate counterweight.

Patterns detected: beyond-scope implementation, unused code, over-abstraction, unnecessary feature flags, impossible error handling, premature optimization, orphaned artifacts, dead exports, and defensive overkill.
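
The source-aware prioritization above amounts to a lookup from where a piece of over-engineering originated to its priority. A minimal sketch follows, assuming hypothetical names (`FindingSource`, `YagniFinding`, `prioritize`); only the P1/P2/P3 assignments come from the text.

```typescript
// Hypothetical labels for where the over-engineering originated.
type FindingSource = "agentic-overreach" | "post-planning-decision" | "original-plan";
type Priority = "P1" | "P2" | "P3";

// Mapping taken from the description above: agentic over-reach ranks highest.
const PRIORITY_BY_SOURCE: Record<FindingSource, Priority> = {
  "agentic-overreach": "P1",
  "post-planning-decision": "P2",
  "original-plan": "P3",
};

interface YagniFinding { pattern: string; source: FindingSource; }

// Attach a priority to each finding and list the most urgent ones first.
function prioritize(findings: YagniFinding[]): Array<YagniFinding & { priority: Priority }> {
  return findings
    .map((f) => ({ pattern: f.pattern, source: f.source, priority: PRIORITY_BY_SOURCE[f.source] }))
    .sort((a, b) => a.priority.localeCompare(b.priority));
}
```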

### Maintainability

[ref:.allhands/flows/shared/jury/MAINTAINABILITY_REVIEW.md::2d6ec3f]

Targets agentic anti-patterns specifically:

| Anti-Pattern | Description |
|-------------|-------------|
| Hallucination | Imports that don't exist, APIs used incorrectly |
| Duplication | Re-implementing existing utilities |
| Miscommunication | Prompt A establishes pattern, Prompt B ignores it |
| Inconsistency | Different approaches for same problem |
| Over-abstraction | Unnecessary wrappers, premature generalization |
| Orphaned artifacts | Files created but never connected |

Includes LOC reduction estimates for simplification recommendations.

### Architecture

[ref:.allhands/flows/shared/jury/ARCHITECTURE_REVIEW.md::3ec3664]

Checks SOLID principles and dependency direction (toward stable abstractions), detects circular dependencies and layer violations, and analyzes boundary crossings. Compares the implementation against both documented architecture and implicit patterns discovered via `ah knowledge docs search`.

### Claim Verification

[ref:.allhands/flows/shared/jury/CLAIM_VERIFICATION_REVIEW.md::5bbb331]

Verifies factual assertions in prompts and alignment docs against actual codebase state. Categorizes claims as existence, absence, behavior, or location claims, each with different verification methods. Flags common verification failure patterns like "based on search results..." or "similar to X, Y does..."

## Feedback Synthesis

[ref:.allhands/flows/shared/REVIEW_OPTIONS_BREAKDOWN.md::739ad0a]

After all reviewers complete, findings are organized into actionable options:

| Severity | Criteria | Engineer Action |
|----------|----------|-----------------|
| Blocking | Prevents goal achievement, broken functionality | Must address |
| Recommended | Best practice violations, potential issues | Should address |
| Optional | Style improvements, minor enhancements | May address |

Duplicates across reviewers are combined and elevated -- repeated concerns prove urgency. The engineer chooses which issues to accept and which to decline, with reasoning documented in the alignment doc.
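
One way to picture "combined and elevated" is the sketch below. It rests on assumptions that are not in the source: duplicates are matched by an identical `issue` string, a merged option keeps the highest severity reported, and an issue raised by more than one reviewer is bumped exactly one level. All names are hypothetical.

```typescript
type Severity = "optional" | "recommended" | "blocking";

const RANK: Record<Severity, number> = { optional: 0, recommended: 1, blocking: 2 };

interface ReviewFinding { reviewer: string; issue: string; severity: Severity; }
interface ReviewOption { issue: string; severity: Severity; reviewers: string[]; }

// Assumption: elevation means moving up one level, capped at "blocking".
function elevate(severity: Severity): Severity {
  return severity === "optional" ? "recommended" : "blocking";
}

function synthesize(findings: ReviewFinding[]): ReviewOption[] {
  const byIssue = new Map<string, ReviewOption>();
  for (const f of findings) {
    const existing = byIssue.get(f.issue);
    if (existing === undefined) {
      byIssue.set(f.issue, { issue: f.issue, severity: f.severity, reviewers: [f.reviewer] });
    } else {
      // Combine duplicates, keeping the highest severity any reviewer reported.
      existing.reviewers.push(f.reviewer);
      if (RANK[f.severity] > RANK[existing.severity]) existing.severity = f.severity;
    }
  }
  const options: ReviewOption[] = [];
  byIssue.forEach((o) => {
    // Issues raised by more than one reviewer are elevated one level.
    const severity = o.reviewers.length > 1 ? elevate(o.severity) : o.severity;
    options.push({ issue: o.issue, severity, reviewers: o.reviewers });
  });
  // Most severe options first, ready for the engineer to accept or decline.
  return options.sort((a, b) => RANK[b.severity] - RANK[a.severity]);
}
```

The function only orders and groups; accepting or declining each option stays with the engineer, as the section describes.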

## Output: Review-Fix Prompts

Accepted fixes become `type: review-fix` prompts created per [ref:.allhands/flows/shared/PROMPT_TASKS_CURATION.md::1abf30b]. Per **Knowledge Compounding**, declined items are documented with reasoning to prevent future re-suggestion of rejected approaches.

@@ -0,0 +1,107 @@
---
description: "Research workflows for deepening plans with parallel skill application, codebase pattern discovery, solutions search, and external technology guidance"
---

# Plan Deepening and Research

Research in the harness is never open-ended browsing. Every research flow serves a specific consumer -- an ideation interview, a planning session, a prompt being enhanced -- and each has constraints that prevent context waste. This document covers the four research-oriented flows and how they compose.

## Research Tool Selection

[ref:.allhands/flows/shared/RESEARCH_GUIDANCE.md::eb9185c]

Before any research begins, agents must select the right tool for the depth needed:

```mermaid
flowchart TD
    Q[Research Question] --> D{What kind of answer?}
    D -->|Broad synthesis with citations| P[ah perplexity research]
    D -->|+ Twitter/X community insights| PG[ah perplexity research --grok-challenge]
    D -->|Find source URLs| T[ah tavily search]
    D -->|Full content from known URL| TE[ah tavily extract]
    D -->|Challenge findings with social signals| PC[ah perplexity research --challenge]
    D -->|GitHub content| GH[gh CLI directly]
```

The combination strategy is deliberate: when unsure, run multiple tools in parallel and compare result quality. Per **Context is Precious**, this is faster than sequential attempts with wrong tools.
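
The decision diagram can be read as a lookup table. The sketch below mirrors it; the command strings are copied from the diagram, while the `ResearchNeed` labels and the `selectTools` helper are hypothetical names introduced for illustration.

```typescript
// Hypothetical labels for the branches of the diagram above.
type ResearchNeed =
  | "broad-synthesis"
  | "community-insights"
  | "find-sources"
  | "extract-known-url"
  | "challenge-findings"
  | "github-content";

// Command strings as they appear in the diagram.
const TOOL_FOR_NEED: Record<ResearchNeed, string> = {
  "broad-synthesis": "ah perplexity research",
  "community-insights": "ah perplexity research --grok-challenge",
  "find-sources": "ah tavily search",
  "extract-known-url": "ah tavily extract",
  "challenge-findings": "ah perplexity research --challenge",
  "github-content": "gh",
};

// When unsure, pass several candidate needs: the result is the set of tools
// to launch in parallel, with duplicates removed.
function selectTools(needs: ResearchNeed[]): string[] {
  const tools: string[] = [];
  for (const need of needs) {
    const tool = TOOL_FOR_NEED[need];
    if (tools.indexOf(tool) === -1) tools.push(tool);
  }
  return tools;
}
```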

## Codebase Understanding

[ref:.allhands/flows/shared/CODEBASE_UNDERSTANDING.md::b10cce8]

This flow governs how agents explore the codebase without consuming excessive context. It enforces a strict search hierarchy:

| Priority | Tool | Use When |
|----------|------|----------|
| 1st | `ah knowledge docs search` | Any discovery task -- returns engineered knowledge with "why" context |
| 2nd | `tldr semantic search` / grep | Knowledge search insufficient, need code-level patterns |
| 3rd | LSP | Known symbol name from knowledge search results |
| 4th | `ah solutions search` / `ah memories search` | Similar problem solved before, or engineer preferences exist |
| 5th | `ast-grep` | Structured code pattern matching as last resort |

Knowledge search results include `insight` (engineering knowledge), `lsp_entry_points` (files with exploration rationale), and `design_notes` (architectural decisions). This is richer than raw file reads and costs fewer tokens.
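
As a rough illustration of the hierarchy, the sketch below walks the table in priority order and stops at the first tool that returns anything. This is a deliberate simplification: it ignores the "Use When" conditions (for example, LSP only applies once a symbol name is known), and `searchInOrder` plus its `run` callback are hypothetical, not part of the `ah` CLI.

```typescript
// Tool labels in the priority order given by the table above.
const SEARCH_HIERARCHY: string[] = [
  "ah knowledge docs search",
  "tldr semantic search / grep",
  "LSP",
  "ah solutions search / ah memories search",
  "ast-grep",
];

interface SearchOutcome { tool: string; hits: string[]; }

// `run` is a caller-supplied function that executes one tool and returns its hits.
function searchInOrder(
  query: string,
  run: (tool: string, query: string) => string[],
): SearchOutcome | null {
  for (const tool of SEARCH_HIERARCHY) {
    const hits = run(tool, query);
    // Stop at the first tool that yields results so later tools cost no context.
    if (hits.length > 0) return { tool, hits };
  }
  return null;
}
```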

### Query Formatting

Queries must be complete sentences, not keyword soup. `"how does the retry mechanism handle rate limits when calling external APIs"` outperforms `"retry rate limit api"` because the knowledge system indexes on semantic meaning.

## External Technology Guidance

[ref:.allhands/flows/shared/EXTERNAL_TECH_GUIDANCE.md::9766b03]

When implementation requires external libraries or services, this flow provides two parallel research channels:

| Channel | Tool | Returns |
|---------|------|---------|
| Documentation | `ah context7 search` | Official API references, configuration patterns, version-specific behaviors |
| Open source exploration | `gh search` + local clone to `.reposearch/` | Real implementation patterns, architectural decisions, library usage examples |

The clone-and-browse approach leverages the agent's file navigation capabilities -- regex search, pattern matching, and direct file reading across a cloned repository. This is more capable than API-based code search for understanding implementation patterns.

## Plan Deepening

[ref:.allhands/flows/shared/PLAN_DEEPENING.md::97692d7]

Plan deepening is an optional enhancement phase that runs after planning is complete but before execution begins. It enriches prompts with research findings without changing their scope.

### Spawned Research Axes

```mermaid
flowchart LR
    PD[Plan Deepening] --> S[Skill Application]
    PD --> SO[Solutions Search]
    PD --> CB[Codebase Patterns]
    PD --> ER[External Research]

    S --> SY[Synthesis by Prompt]
    SO --> SY
    CB --> SY
    ER --> SY

    SY --> CD{Conflicts?}
    CD -->|Yes| ENG[Flag for Engineer]
    CD -->|No| EN[Enhance Prompts]
```

Each axis runs in parallel:

- **Skill application**: Matches available skills to plan domains, extracts patterns and gotchas
- **Solutions search**: Checks `ah solutions search` and `ah memories search` for relevant past learnings
- **Codebase patterns**: Discovers existing implementations of similar patterns via `CODEBASE_UNDERSTANDING.md`
- **External research**: For novel technologies or high-risk domains via `RESEARCH_GUIDANCE.md`

### Enhancement Constraints

Plan deepening adds a `## Research Insights` section to each prompt. It preserves all original content -- tasks, acceptance criteria, and scope are never modified. If research conflicts with the current plan, conflicts are flagged for engineer review rather than resolved automatically.
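
The constraints above (append only, never rewrite, flag conflicts) can be sketched as a pure function. `Insight`, `DeepenResult`, and `deepenPrompt` are hypothetical names, and skipping the section when there are no compatible insights is an assumption of this sketch rather than documented behavior.

```typescript
interface Insight { text: string; conflictsWithPlan: boolean; }
interface DeepenResult { prompt: string; flaggedForEngineer: string[]; }

function deepenPrompt(original: string, insights: Insight[]): DeepenResult {
  // Conflicting findings are never merged in; they are surfaced for engineer review.
  const flaggedForEngineer = insights.filter((i) => i.conflictsWithPlan).map((i) => i.text);
  const compatible = insights.filter((i) => !i.conflictsWithPlan);

  // Assumption: with nothing compatible to add, the prompt is returned untouched.
  if (compatible.length === 0) return { prompt: original, flaggedForEngineer };

  const section = ["## Research Insights", ""]
    .concat(compatible.map((i) => `- ${i.text}`))
    .join("\n");

  // Append only: the original text is kept byte-for-byte as the prefix of the result.
  const separator = original.slice(-1) === "\n" ? "\n" : "\n\n";
  return { prompt: original + separator + section + "\n", flaggedForEngineer };
}
```

Because the original text is always a prefix of the result, a caller can check that tasks, acceptance criteria, and scope were left alone with a simple prefix comparison.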

### When to Use Plan Deepening

| Scenario | Recommended? |
|----------|-------------|
| Complex architectural decisions | Yes |
| High-risk domains (security, payments, data migrations) | Yes |
| Novel technologies not yet in codebase | Yes |
| Large specs with many unknowns | Yes |
| Straightforward feature work with established patterns | No -- adds overhead without proportional value |

Per **Knowledge Compounding**, plan deepening is where past solutions and skills compound most directly into future work. Each research finding that makes it into a prompt prevents an executor from having to rediscover the same knowledge.

@@ -0,0 +1,114 @@
---
description: "Pre-execution jury review of planning artifacts: expectations fit, flow analysis, YAGNI detection, and premortem risk analysis run before any prompt is executed"
---

# Plan Review Jury

The plan review jury runs after planning is complete but before any prompt executes. While the [judge review](judge-reviewing.md) evaluates finished implementation, this jury evaluates the plan itself -- catching issues when they are cheapest to fix. Per **Quality Engineering**, detecting risks before execution prevents wasted agent cycles.

## Jury Composition

The plan review jury has four members, each spawned as a parallel subtask during the planning phase described in [ref:.allhands/flows/SPEC_PLANNING.md::cc0b192]:

```mermaid
flowchart TD
    PL[Planning Agent] -->|spawn parallel| EF[Expectations Fit]
    PL --> FA[Flow Analysis]
    PL --> YG[YAGNI]
    PL --> PM[Premortem]

    EF --> RO[Review Options Breakdown]
    FA --> RO
    YG --> RO
    PM --> RO

    RO --> ENG[Engineer Decision]
    ENG -->|Accept fix| NP[New/Amended Prompts]
    ENG -->|Decline| DOC[Document in Alignment Doc]
```
|
|
29
|
+
|
|
30
|
+
## Expectations Fit
|
|
31
|
+
|
|
32
|
+
[ref:.allhands/flows/shared/jury/PROMPTS_EXPECTATIONS_FIT.md::7fee0b6]
|
|
33
|
+
|
|
34
|
+
Treats the spec doc as ground truth and traces every engineer expectation to the prompts that address it. Detects:

- Spec desires not covered by any prompt
- Prompts that contradict spec expectations
- Alignment doc decisions that deviate from spec without explanation
- Coverage holes where an engineer expectation has no implementing prompt

Output is gap-oriented: P1 for missing coverage, P2 for inconsistencies, P3 for ambiguities that could be interpreted multiple ways.
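
The coverage check described above can be sketched as a set difference between spec expectations and what the prompts claim to address. This is a minimal illustration only: the `Prompt` class, its `covers` field, and the `E1`-style expectation IDs are hypothetical and do not come from the harness.

```python
from dataclasses import dataclass, field


@dataclass
class Prompt:
    name: str
    # Hypothetical field: IDs of the spec expectations this prompt addresses.
    covers: set[str] = field(default_factory=set)


def find_coverage_gaps(expectations: list[str], prompts: list[Prompt]) -> list[str]:
    """Return expectations that no prompt claims to address, in spec order."""
    covered = set().union(*(p.covers for p in prompts))
    return [e for e in expectations if e not in covered]


prompts = [Prompt("01-auth", {"E1"}), Prompt("02-api", {"E3"})]
for gap in find_coverage_gaps(["E1", "E2", "E3"], prompts):
    # Missing coverage is reported at P1, matching the priority scheme above.
    print(f"P1: {gap} has no implementing prompt")
```

Inconsistencies (P2) and ambiguities (P3) need judgment about meaning, so they are left out of this mechanical sketch.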

## Flow Analysis

[ref:.allhands/flows/shared/jury/PROMPTS_FLOW_ANALYSIS.md::16202d3]

Analyzes prompt dependencies and ordering with a derisking lens. The core questions come from thinking like a tech lead:

| Priority | Question |
|----------|----------|
| Feasibility | Which prompts reveal if implementation is even possible? |
| Stability | Which prompts prove core architecture works? |
| Blockers | Which prompts unblock the most other work? |
| Confidence | Which prompts give earliest signal on success? |
| Wiring | Do prompts plan how components connect? |

The analysis builds a dependency graph, identifies the critical path, recommends reordering for maximum derisking, and assesses parallelization opportunities with merge-risk ratings (safe, medium, high based on file overlap).
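
As a rough sketch of the mechanics, the dependency graph, critical path, and merge-risk rating could be computed as below. The prompt names, the shape of the `deps` mapping, and the overlap thresholds (0 shared files is safe, 1-2 is medium, 3 or more is high) are assumptions made for illustration; the source only states that the rating is based on file overlap.

```python
from graphlib import TopologicalSorter


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Order prompts so every prompt comes after the prompts it depends on."""
    return list(TopologicalSorter(deps).static_order())


def critical_path(deps: dict[str, set[str]]) -> list[str]:
    """Return one longest dependency chain (the critical path)."""
    best: dict[str, list[str]] = {}
    for node in execution_order(deps):
        preds = deps.get(node, set())
        # Longest chain ending at any predecessor, or empty for a root prompt.
        prev = max((best[p] for p in preds), key=len, default=[])
        best[node] = prev + [node]
    return max(best.values(), key=len)


def merge_risk(files_a: list[str], files_b: list[str]) -> str:
    """Rate parallel-execution merge risk by shared files (assumed thresholds)."""
    overlap = len(set(files_a) & set(files_b))
    if overlap == 0:
        return "safe"
    return "medium" if overlap <= 2 else "high"


# Hypothetical plan: 02 and 03 both depend on 01; 04 depends on both.
deps = {"02": {"01"}, "03": {"01"}, "04": {"02", "03"}}
print(execution_order(deps))
print(critical_path(deps))
print(merge_risk(["src/api.ts", "src/db.ts"], ["src/db.ts"]))
```

Recommending a reorder for derisking is a judgment call layered on top of this graph, so it is not modeled here.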

## YAGNI

[ref:.allhands/flows/shared/jury/PROMPTS_YAGNI.md::3b8082f]

Evaluates planning artifacts for over-engineering with decision-source awareness:

| Source | Priority |
|--------|----------|
| Agent-proposed complexity | Higher priority -- agents systematically over-engineer |
| Engineer-decided complexity | Lower priority -- explicit awareness, but perspective still offered |

Detects premature abstraction, future-proofing, over-configuration, defensive complexity, feature creep within planned prompts, and scope bloat (10+ files or 7+ tasks). Emergent prompts and disposable variants are explicitly excluded from feature creep detection -- they exist to discover value by design.
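
The two mechanical parts of this check, the scope-bloat threshold and the feature-creep exemption, can be sketched as follows. The thresholds come from the text above; the function names and the prompt type labels `emergent` and `disposable-variant` are hypothetical stand-ins for however the harness actually tags those prompts.

```python
# Hypothetical labels for prompt kinds excluded from feature creep detection.
EXEMPT_FROM_FEATURE_CREEP = {"emergent", "disposable-variant"}


def is_scope_bloat(file_count: int, task_count: int) -> bool:
    """Flag a prompt that touches 10+ files or carries 7+ tasks."""
    return file_count >= 10 or task_count >= 7


def subject_to_feature_creep_check(prompt_type: str) -> bool:
    """Emergent prompts and disposable variants are skipped by design."""
    return prompt_type not in EXEMPT_FROM_FEATURE_CREEP


print(is_scope_bloat(file_count=12, task_count=3))
print(subject_to_feature_creep_check("emergent"))
```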

## Premortem

[ref:.allhands/flows/shared/jury/PROMPT_PREMORTEM.md::aef49b5]

Identifies failure modes before they happen, using a verification protocol that prevents false alarms:

### Risk Taxonomy

| Category | Symbol | Meaning |
|----------|--------|---------|
| Tiger | `[TIGER]` | Clear threat requiring action or explicit acceptance |
| Paper Tiger | `[PAPER]` | Looks threatening but acceptable with evidence |
| Elephant | `[ELEPHANT]` | Unspoken concern nobody raised yet |

### Verification Protocol

Before flagging any Tiger, the premortem reviewer must confirm three things: the relevant prompts were read, the alignment doc was checked for existing mitigation, and the concern is actually in scope. If any check fails, the concern cannot be flagged as a Tiger.
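
A minimal sketch of this gate, assuming the reviewer records the outcome of each check as a boolean. The `Concern` class and its field names are hypothetical and only mirror the three checks listed above.

```python
from dataclasses import dataclass


@dataclass
class Concern:
    description: str
    prompts_read: bool            # relevant prompts were actually read
    alignment_doc_checked: bool   # alignment doc was checked for existing mitigation
    in_scope: bool                # concern falls within the planned scope


def can_flag_as_tiger(concern: Concern) -> bool:
    """All three verification checks must hold before a Tiger is raised."""
    return concern.prompts_read and concern.alignment_doc_checked and concern.in_scope


unverified = Concern("No rollback plan for migration", True, False, True)
print(can_flag_as_tiger(unverified))
```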

### Checklist Dimensions

The premortem works through four systematic categories: prompt completeness (acceptance criteria, validation tooling, dependencies, scope), technical risks (external dependencies, breaking changes, migration paths, security, error handling), integration risks (component wiring, feature flags, cross-prompt testing), and process risks (requirements clarity, validation suite coverage, parallel execution conflicts).

## Feedback Synthesis

All four jury members' findings flow into [ref:.allhands/flows/shared/REVIEW_OPTIONS_BREAKDOWN.md::739ad0a], which ranks items P1/P2/P3, deduplicates across reviewers, and presents options to the engineer.

### Premortem Integration Rules

| Premortem Finding | Maps To | Handling |
|-------------------|---------|----------|
| Tigers (high severity) | P1 | Require explicit accept or fix decision |
| Tigers (medium severity) | P2 | Recommend addressing, allow skip |
| Elephants | Discussion points | Surface to engineer, document response |
| Paper Tigers | Acknowledged | Note as acceptable risk in alignment doc |
| Checklist gaps | P2 or P3 | Prompt amendments to close gaps |

The engineer's decisions -- including explicitly accepted risks -- are documented in the alignment doc per **Knowledge Compounding**, preventing future jury reviews from re-raising the same concerns.
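
The integration rules above amount to a lookup from finding type to review priority. The sketch below encodes the table directly; the category keys (`tiger`, `paper_tiger`, and so on) and the `route_finding` function are hypothetical names chosen for illustration.

```python
from typing import Optional

# (category, severity) -> (maps to, handling), copied from the integration rules.
HANDLING = {
    ("tiger", "high"): ("P1", "Require explicit accept or fix decision"),
    ("tiger", "medium"): ("P2", "Recommend addressing, allow skip"),
    ("elephant", None): ("Discussion point", "Surface to engineer, document response"),
    ("paper_tiger", None): ("Acknowledged", "Note as acceptable risk in alignment doc"),
    ("checklist_gap", None): ("P2 or P3", "Prompt amendments to close gaps"),
}


def route_finding(category: str, severity: Optional[str] = None) -> tuple[str, str]:
    """Look up how a premortem finding enters the review options breakdown."""
    try:
        return HANDLING[(category, severity)]
    except KeyError:
        # The rules do not cover this combination (e.g. a low-severity Tiger).
        raise ValueError(f"no integration rule for {category!r} / {severity!r}")


print(route_finding("tiger", "high"))
print(route_finding("paper_tiger"))
```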

## Quick Premortem Variant

[ref:.allhands/flows/shared/QUICK_PREMORTEM.md::1aa1d20]

For single prompts (emergent refinement, review-fix, PR-review), a lightweight 3-minute premortem variant exists. It answers five questions (biggest failure mode, external dependency risk, rollback feasibility, uncovered edge cases, unclear requirements) and returns a go/adjust/block recommendation. Full premortems are reserved for complete milestone prompt sets.
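
One way to picture the quick variant is as five answers reduced to a single recommendation. The source does not say how answers map to go/adjust/block, so the three-level answer scale (`ok`, `concern`, `blocker`) and the reduction rule below are assumptions, not the flow's actual logic.

```python
QUESTIONS = (
    "biggest failure mode",
    "external dependency risk",
    "rollback feasibility",
    "uncovered edge cases",
    "unclear requirements",
)


def recommend(answers: dict[str, str]) -> str:
    """Reduce five answers to go/adjust/block (assumed rule: worst answer wins)."""
    missing = [q for q in QUESTIONS if q not in answers]
    if missing:
        raise ValueError(f"unanswered questions: {missing}")
    levels = set(answers.values())
    if "blocker" in levels:
        return "block"
    if "concern" in levels:
        return "adjust"
    return "go"


answers = {q: "ok" for q in QUESTIONS}
answers["rollback feasibility"] = "concern"
print(recommend(answers))
```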
@@ -0,0 +1,54 @@
---
description: "Flow for processing external PR review feedback into actionable prompts with decision documentation for accepted and declined items"
---

# PR Reviewing Flow

The PR reviewing flow bridges the gap between external code review feedback and the harness prompt model. PR comments arrive as unstructured text; this flow synthesizes them into structured decisions and actionable prompts.

Per **Knowledge Compounding**, what the engineer declines is as important as what they accept. Declined items are documented to prevent future agents from re-suggesting rejected approaches.

## Flow Sequence

```mermaid
sequenceDiagram
    participant PR as PR Comments
    participant Flow as PR Review Flow
    participant Eng as Engineer
    participant Align as Alignment Doc
    participant Prompts as Prompt Files

    Flow->>PR: gh pr view --comments
    Flow->>Flow: Synthesize feedback
    Flow->>Flow: Structure via REVIEW_OPTIONS_BREAKDOWN
    Flow->>Eng: Present grouped options
    Eng->>Flow: Accept / Decline decisions
    Flow->>Align: Document declined items + reasoning
    Flow->>Prompts: Create review-fix prompts for accepted items
```

## Feedback Processing

The flow reads PR comments via `gh` CLI, then structures feedback using the review options breakdown methodology from [ref:.allhands/flows/shared/REVIEW_OPTIONS_BREAKDOWN.md::739ad0a]. Feedback is grouped by severity and effort before presentation to the engineer.

## Decision Handling

| Decision | Action | Why |
|----------|--------|-----|
| Accepted | Create `type: review-fix` prompt with PR comment context | Feeds into prompt execution loop |
| Declined | Document in alignment doc with engineer reasoning | Prevents re-suggestion by future agents |

Both outcomes produce artifacts. The accepted path creates prompts following [ref:.allhands/flows/shared/PROMPT_TASKS_CURATION.md::1abf30b]. The declined path updates the alignment doc (schema via `ah schema alignment`) with explicit rejection rationale. This dual-tracking ensures no review feedback is lost regardless of the engineer's decision.
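
The dual-tracking idea can be sketched as a function that always returns an artifact, whichever way the engineer decides. Only `type: review-fix` comes from the source; the dictionary shapes, key names, and the `handle_decision` function are hypothetical and do not reflect the real prompt or alignment doc schemas.

```python
def handle_decision(comment: str, accepted: bool, reasoning: str = "") -> dict:
    """Turn one review item into an artifact; no decision path drops feedback."""
    if accepted:
        # Accepted items become review-fix prompts carrying the PR comment.
        return {
            "artifact": "prompt",
            "frontmatter": {"type": "review-fix"},
            "body": comment,
        }
    # Declined items are recorded with the engineer's rationale.
    return {
        "artifact": "alignment_doc_entry",
        "declined": comment,
        "reasoning": reasoning,
    }


print(handle_decision("Extract retry logic into a helper", accepted=True))
print(handle_decision("Switch to GraphQL", accepted=False, reasoning="Out of scope"))
```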

## Prompt Creation

Review-fix prompts include:
- Frontmatter `type: review-fix`
- PR comment context in the body
- Acceptance criteria derived from the reviewer's concern

The flow completes once all prompts are created and the alignment doc is updated. These prompts then enter the normal execution loop.

## Source Flow

[ref:.allhands/flows/PR_REVIEWING.md::63f8508]
@@ -0,0 +1,119 @@
---
description: "Prompt execution lifecycle with stochastic/deterministic validation phases: context gathering, implementation with exploratory validation, deterministic acceptance gate, and completion protocol with race-condition-safe ordering"
---

# Prompt Task Execution

Prompts are the atomic unit of work in the harness. Each prompt file IS the task -- a self-contained specification with tasks, acceptance criteria, validation suites, and skills. The execution flow ensures every prompt passes through a validation gate before completion and that all work is documented for downstream agents.

## Execution Lifecycle

```mermaid
stateDiagram-v2
    [*] --> ContextGathering
    ContextGathering --> Implementation
    Implementation --> Validation
    Validation --> Implementation: FAIL
    Validation --> Completion: PASS
    Completion --> [*]

    state Implementation {
        [*] --> ExecuteTasks
        ExecuteTasks --> HandleDeviation: deviation detected
        HandleDeviation --> ExecuteTasks: non-architectural
        HandleDeviation --> Blocked: architectural change needed
    }
```

## Validation Dimensions in Execution

Per **Agentic Validation Tooling**, the two validation dimensions apply at different phases of prompt execution:

| Phase | Dimension | What Happens |
|-------|-----------|--------------|
| **Implementation** | Stochastic | Agent reads suite **Stochastic Validation** sections; uses model intuition to probe edge cases, test user flows, verify quality beyond deterministic checks |
| **Validation Gate** | Deterministic | Agent runs suite **Deterministic Integration** commands; binary pass/fail gates completion |

Stochastic exploration during implementation informs quality but is not an acceptance criterion. Acceptance criteria must be deterministic -- drawn from suite Deterministic Integration sections per [ref:.allhands/flows/shared/UTILIZE_VALIDATION_TOOLING.md::1df56ac].

## Context Gathering

[ref:.allhands/flows/PROMPT_TASK_EXECUTION.md::9baf478]

The executor reads the prompt file and the alignment doc. If FAILURE SUMMARY sections exist from prior attempts, the agent adapts to their redirections. Additional codebase search is available but typically unnecessary -- per **Context is Precious**, prompts should contain sufficient context from the planning phase.

## Prompt Curation Principles

[ref:.allhands/flows/shared/PROMPT_TASKS_CURATION.md::1abf30b]

Prompts are designed with strict context budget awareness:

| Context Usage | Agent Quality | Implication |
|---------------|--------------|-------------|
| 0-30% | Peak | Thorough, comprehensive work |
| 30-50% | Good | Solid execution |
| 50-70% | Degrading | Efficiency mode kicks in |
| 70%+ | Poor | Rushed, minimal output |

This drives hard scope limits: 2-6 tasks per prompt, a target of at most 50% context usage, and a mandatory split for any prompt that modifies 7 or more files. The principle is **Context is Precious** applied at the individual agent level.
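
The bands and limits above can be expressed as two small checks. The band edges in the table overlap at 30%, 50%, and 70%, so the sketch assumes each boundary value belongs to the lower-quality band; that choice and the function names are assumptions for illustration.

```python
def agent_quality(context_pct: float) -> str:
    """Map context usage (percent) to the quality band from the table."""
    if context_pct < 30:
        return "Peak"
    if context_pct < 50:
        return "Good"
    if context_pct < 70:
        return "Degrading"
    return "Poor"


def scope_violations(task_count: int, file_count: int) -> list[str]:
    """List which hard scope limits a planned prompt breaks."""
    problems = []
    if not 2 <= task_count <= 6:
        problems.append("task count outside 2-6")
    if file_count >= 7:
        problems.append("modifies 7+ files, must be split")
    return problems


print(agent_quality(45))
print(scope_violations(task_count=8, file_count=9))
```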

### Key Curation Rules

- Completed prompts (`status: done`) are immutable -- create new prompts to extend or fix
- Prompts must plan **wiring**, not just artifacts -- tasks connect components via API calls, imports, and state flow
- Skills and validation suites are embedded in frontmatter, not discovered at execution time
- Testing is NOT a separate prompt -- validation happens via `validation_suites` attached to feature prompts

## Deviation Handling

Per **Frontier Models are Capable**, executors handle deviations autonomously:

```mermaid
flowchart TD
    D[Deviation Detected] --> T{Type?}
    T -->|Bug/error| F1[Fix immediately, document in summary]
    T -->|Missing critical functionality| F2[Add immediately, document in summary]
    T -->|Blocking issue| F3[Fix to unblock, document in summary]
    T -->|Architectural change needed| F4[Stop, set status: blocked]
```

The boundary is clear: executors can fix anything within the prompt's conceptual scope but must stop and escalate if the fix requires new database tables, major schema changes, or new services.
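
The decision tree above reduces to one branch that escalates and three that proceed. The enum and the `handle_deviation` function are hypothetical names; the actions and the `blocked` status come from the diagram.

```python
from enum import Enum


class Deviation(Enum):
    BUG = "bug/error"
    MISSING_CRITICAL = "missing critical functionality"
    BLOCKING = "blocking issue"
    ARCHITECTURAL = "architectural change needed"


def handle_deviation(kind: Deviation) -> dict:
    """Return the executor's next action for a detected deviation."""
    if kind is Deviation.ARCHITECTURAL:
        # New tables, major schema changes, or new services: stop and escalate.
        return {"action": "stop", "status": "blocked"}
    # Everything else is fixed in place and recorded in the summary.
    return {"action": "fix", "document_in_summary": True}


for kind in Deviation:
    print(kind.value, "->", handle_deviation(kind))
```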

## Validation Gate

[ref:.allhands/flows/shared/PROMPT_VALIDATION_REVIEW.MD::cfed59b]

Every prompt must pass validation review before completion. The reviewer is a separate subtask that provides independent judgment.

### Review Dimensions

The reviewer checks three levels of completion:

| Level | What it Means |
|-------|---------------|
| Existence | Files were created or modified |
| Substantive | Implementation is real, not placeholder |
| Wired | Components are connected (imports, API calls, state rendering) |

Task completion does not equal goal achievement. The reviewer specifically scans for stub patterns: `TODO`, `return null`, `return {}`, empty handlers, orphaned code, and unconnected APIs.
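
A text-level scan for some of these stub patterns might look like the sketch below. The regular expressions are illustrative guesses aimed at JavaScript-style source, not the reviewer's actual rules, and they do not attempt to find orphaned code or unconnected APIs, which need cross-file analysis.

```python
import re

# Illustrative patterns only; label -> compiled regex.
STUB_PATTERNS = {
    "TODO": re.compile(r"\bTODO\b"),
    "return null": re.compile(r"\breturn\s+null\b"),
    "return {}": re.compile(r"\breturn\s*\{\s*\}"),
    "empty handler": re.compile(r"=>\s*\{\s*\}"),
}


def scan_for_stubs(source: str) -> list[tuple[int, str]]:
    """Return (line number, pattern label) for every stub pattern hit."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in STUB_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


sample = (
    "function load() {\n"
    "  // TODO: fetch data\n"
    "  return null;\n"
    "}\n"
    "button.onClick = () => {};\n"
)
for lineno, label in scan_for_stubs(sample):
    print(f"line {lineno}: {label}")
```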

### Validation Quality Critique

The reviewer evaluates not just whether tests pass, but whether the validation itself is meaningful. Red flags include mock tests that don't test real implementation, validation curated to look like it passes, and redundant tests that add no confidence.

### Iteration Protocol

If validation fails, the executor acts on the feedback and resubmits. After a second attempt that still runs into genuine limitations, the executor may explain the compromises it made. The reviewer may still reject, but blocking rigidly on diminishing returns wastes cycles.

## Completion Protocol

The completion order is critical and exists to prevent race conditions with parallel agents:

1. Write success summary to prompt file (including deviations handled)
2. Append summary to alignment doc's Prompt Summaries section
3. Commit implementation changes only (prompt and alignment files are NOT git tracked)
4. Set `status: done` in frontmatter (must be after summaries)
5. Rename prompt file with `-DONE` suffix
6. Stop

Per **Knowledge Compounding**, the alignment doc summary enables other agents to see completed work without reading each individual prompt -- reducing context consumption across the entire agent fleet.
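
The ordering of the completion steps can be sketched as a single function whose statement order mirrors the numbered list. This is an illustration under assumptions: the `SUCCESS SUMMARY` heading, the summary line format in the alignment doc, the exact placement of the `-DONE` suffix before the file extension, and the `commit` callback are all hypothetical.

```python
import re
from pathlib import Path
from typing import Callable


def complete_prompt(
    prompt: Path, alignment: Path, summary: str, commit: Callable[[], None]
) -> Path:
    """Run the completion steps in order and return the renamed prompt path."""
    # 1. Write the success summary to the prompt file.
    with prompt.open("a") as f:
        f.write(f"\n## SUCCESS SUMMARY\n{summary}\n")
    # 2. Append the summary to the alignment doc.
    with alignment.open("a") as f:
        f.write(f"- {prompt.stem}: {summary}\n")
    # 3. Commit implementation changes only (caller supplies the commit step).
    commit()
    # 4. Only now mark the prompt done, so readers never see done without summaries.
    text = prompt.read_text()
    prompt.write_text(re.sub(r"(?m)^status:.*$", "status: done", text, count=1))
    # 5. Rename with the -DONE suffix (assumed to go before the extension).
    done = prompt.with_name(f"{prompt.stem}-DONE{prompt.suffix}")
    prompt.rename(done)
    # 6. Stop.
    return done
```

Setting the status after both summaries are written is the point of the ordering: a parallel agent that sees `status: done` can rely on the summaries already being present.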