buildanything 1.6.0 → 1.7.1
- package/.claude-plugin/marketplace.json +2 -1
- package/.claude-plugin/plugin.json +10 -2
- package/README.md +71 -61
- package/agents/agentic-identity-trust.md +65 -311
- package/agents/data-consolidation-agent.md +3 -22
- package/agents/design-brand-guardian.md +52 -275
- package/agents/design-image-prompt-engineer.md +67 -196
- package/agents/design-ui-designer.md +37 -361
- package/agents/design-ux-architect.md +51 -434
- package/agents/design-ux-researcher.md +48 -299
- package/agents/design-whimsy-injector.md +58 -405
- package/agents/engineering-backend-architect.md +39 -202
- package/agents/engineering-data-engineer.md +41 -236
- package/agents/engineering-devops-automator.md +73 -258
- package/agents/engineering-frontend-developer.md +33 -206
- package/agents/engineering-mobile-app-builder.md +36 -446
- package/agents/engineering-rapid-prototyper.md +34 -428
- package/agents/engineering-security-engineer.md +44 -204
- package/agents/engineering-senior-developer.md +18 -138
- package/agents/engineering-technical-writer.md +40 -302
- package/agents/marketing-app-store-optimizer.md +63 -276
- package/agents/marketing-social-media-strategist.md +38 -87
- package/agents/project-management-experiment-tracker.md +62 -156
- package/agents/report-distribution-agent.md +4 -24
- package/agents/sales-data-extraction-agent.md +3 -22
- package/agents/specialized-cultural-intelligence-strategist.md +41 -62
- package/agents/specialized-developer-advocate.md +65 -234
- package/agents/support-analytics-reporter.md +76 -306
- package/agents/support-executive-summary-generator.md +26 -172
- package/agents/support-finance-tracker.md +67 -362
- package/agents/support-legal-compliance-checker.md +40 -497
- package/agents/support-support-responder.md +40 -532
- package/agents/testing-accessibility-auditor.md +67 -271
- package/agents/testing-api-tester.md +58 -274
- package/agents/testing-evidence-collector.md +48 -170
- package/agents/testing-performance-benchmarker.md +75 -236
- package/agents/testing-reality-checker.md +49 -192
- package/agents/testing-test-results-analyzer.md +70 -276
- package/agents/testing-tool-evaluator.md +52 -368
- package/agents/testing-workflow-optimizer.md +66 -415
- package/bin/setup.js +45 -0
- package/bin/sync-version.js +38 -0
- package/commands/add-feature.md +98 -0
- package/commands/build.md +156 -93
- package/commands/dogfood.md +43 -0
- package/commands/fix.md +89 -0
- package/commands/idea-sweep.md +19 -82
- package/commands/refactor.md +68 -0
- package/commands/ux-review.md +81 -0
- package/commands/verify.md +43 -0
- package/hooks/session-start +5 -10
- package/package.json +4 -1
- package/agents/agents-orchestrator.md +0 -365
- package/agents/data-analytics-reporter.md +0 -52
- package/agents/lsp-index-engineer.md +0 -312
- package/agents/macos-spatial-metal-engineer.md +0 -335
- package/agents/marketing-content-creator.md +0 -52
- package/agents/marketing-growth-hacker.md +0 -52
- package/agents/product-sprint-prioritizer.md +0 -152
- package/agents/product-trend-researcher.md +0 -157
- package/agents/project-management-project-shepherd.md +0 -192
- package/agents/project-management-studio-operations.md +0 -198
- package/agents/project-management-studio-producer.md +0 -201
- package/agents/project-manager-senior.md +0 -133
- package/agents/support-infrastructure-maintainer.md +0 -616
- package/agents/terminal-integration-specialist.md +0 -68
- package/agents/visionos-spatial-engineer.md +0 -52
- package/agents/xr-cockpit-interaction-specialist.md +0 -30
- package/agents/xr-immersive-developer.md +0 -30
- package/agents/xr-interface-architect.md +0 -30
- package/commands/protocols/brainstorm.md +0 -99
- package/commands/protocols/build-fix.md +0 -52
- package/commands/protocols/cleanup.md +0 -56
- package/commands/protocols/design.md +0 -287
- package/commands/protocols/eval-harness.md +0 -62
- package/commands/protocols/metric-loop.md +0 -94
- package/commands/protocols/planning.md +0 -56
- package/commands/protocols/verify.md +0 -63
@@ -1,62 +0,0 @@
-# Eval Harness Protocol
-
-You are the orchestrator. Phase 6.1 audits are complete. Before running the metric loop, define formal eval cases that are concrete, executable, and reproducible. This replaces subjective narrative audits with deterministic pass/fail tests.
-
-## How This Differs from the Metric Loop
-
-The metric loop answers "how good is this?" (qualitative score 0-100, iterative improvement).
-The eval harness answers "does this specific behavior work reliably?" (binary pass/fail, deterministic).
-
-They are complementary: eval harness failures become specific issues for the metric loop to fix.
-
-## Step 0: Define Eval Cases
-
-YOU (the orchestrator) define eval cases based on:
-- Audit findings from Phase 6.1 (highest-severity items first)
-- Architecture doc (API contracts, auth model, data validation rules)
-- Design doc (core user flows, edge cases)
-
-Write eval cases to `docs/plans/.build-state.md` under `## Eval Harness`:
-
-| # | Name | Action | Expected Result | pass@k | Severity |
-|---|------|--------|-----------------|--------|----------|
-
-**Severity thresholds (non-negotiable):**
-- CRITICAL: pass@5 (must pass 5/5 — 100% reliability)
-- HIGH: pass@4 (must pass 4/5 — 80% reliability)
-- MEDIUM: pass@3 (must pass 3/5 — 60% reliability)
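The severity thresholds listed here can be sketched as a small lookup. This is only an illustration, assuming every case is run five times and that "pass@k" in this protocol means "at least k of those five runs pass"; the names `REQUIRED_PASSES`, `CaseResult`, and `verdict` are hypothetical and do not come from the package.

```python
from dataclasses import dataclass

# Assumption: every eval case is executed 5 times, and the severity
# sets the minimum number of passing runs (5/5, 4/5, 3/5).
RUNS_PER_CASE = 5
REQUIRED_PASSES = {"CRITICAL": 5, "HIGH": 4, "MEDIUM": 3}


@dataclass
class CaseResult:
    name: str
    severity: str  # "CRITICAL", "HIGH", or "MEDIUM"
    passes: int    # number of passing runs out of RUNS_PER_CASE


def verdict(result: CaseResult) -> str:
    """Return a PASS/FAIL line for one eval case."""
    if not 0 <= result.passes <= RUNS_PER_CASE:
        raise ValueError(f"passes must be between 0 and {RUNS_PER_CASE}")
    required = REQUIRED_PASSES[result.severity]
    status = "PASS" if result.passes >= required else "FAIL"
    return f"{status} ({result.passes}/{RUNS_PER_CASE} passed, needs {required})"
```

Under this reading, a CRITICAL case that passes 4 of 5 runs fails, while a HIGH case with the same count passes.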
-
-Aim for 8-15 eval cases. Cover: auth boundaries, input validation, error handling, core happy path, primary edge cases.
-
-**Eval cases must be concrete and executable** — actual commands (curl, function calls, UI interactions), not descriptions. Bad: "Auth should work." Good: "curl -X GET /api/recipes without Authorization header → expect 401."
-
-## Step 1: Run Eval
-
-Call the Agent tool — description: "Run eval harness" — mode: "bypassPermissions" — prompt:
-
-"[COMPLEXITY: M] Run these eval cases. For each case, execute the action the specified number of times (k). Report per case: PASS (N/k passed, meets threshold) or FAIL (N/k passed, below threshold). Include the actual result on failures. [paste eval case table]"
-
-<HARD-GATE>
-The eval agent RUNS cases. It does NOT define them. Case definition is the orchestrator's job.
-</HARD-GATE>
-
-## Step 2: Score
-
-Count PASS cases / total cases. This is the eval baseline. Record to `docs/plans/.build-state.md`.
-
-## Step 3: Feed into Metric Loop
-
-Any FAIL case with severity CRITICAL or HIGH becomes a candidate issue for the Phase 6.2 metric loop. Pass the failure details (case name, action, expected vs actual) as context when defining the metric loop's metric.
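Steps 2 and 3 amount to a count and a filter. The following is a minimal sketch under the assumption that each case has already been reduced to a pass/fail outcome; `EvalOutcome`, `eval_baseline`, and `metric_loop_candidates` are illustrative names, not part of the package.

```python
from typing import List, NamedTuple


class EvalOutcome(NamedTuple):
    name: str
    severity: str  # "CRITICAL", "HIGH", or "MEDIUM"
    passed: bool   # True if the case met its threshold
    expected: str
    actual: str


def eval_baseline(outcomes: List[EvalOutcome]) -> str:
    """Step 2: PASS cases over total cases, e.g. '7/10'."""
    passed = sum(1 for outcome in outcomes if outcome.passed)
    return f"{passed}/{len(outcomes)}"


def metric_loop_candidates(outcomes: List[EvalOutcome]) -> List[EvalOutcome]:
    """Step 3: failing CRITICAL or HIGH cases become candidate issues."""
    return [
        outcome
        for outcome in outcomes
        if not outcome.passed and outcome.severity in ("CRITICAL", "HIGH")
    ]
```

Failing MEDIUM cases still count against the baseline but are not forwarded as candidates in this sketch, which matches the wording of Step 3.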
-
-## Step 4: Re-evaluate After Metric Loop
-
-After the Phase 6.2 metric loop exits, re-run the eval harness. All CRITICAL cases must now pass. If any CRITICAL case still fails, flag it for the Reality Checker in Step 6.3.
-
----
-
-## Rules
-
-- Eval cases are defined by the ORCHESTRATOR, not by the eval agent.
-- pass@k thresholds are non-negotiable per severity level.
-- Re-run eval after metric loop to verify fixes — this is the exit gate.
-- Eval failures feed into the metric loop as specific, concrete issues — not vague audit findings.
@@ -1,94 +0,0 @@
-# Metric Loop Protocol
-
-You are the orchestrator. You are about to run a metric-driven iteration loop on an artifact (code, architecture, docs, etc.) to drive it toward a quality target.
-
-## Step 0: Define Your Metric
-
-Before iterating, YOU define the metric for this specific context. Consider:
-- What is the artifact? (a task implementation, a security audit, an architecture doc, etc.)
-- What does "good" look like? (all tests pass, zero critical vulns, all acceptance criteria met, etc.)
-- Is the metric quantitative (test pass rate, vuln count, coverage %) or qualitative (architecture completeness, doc clarity)?
-
-Write a **Metric Definition** block to `docs/plans/.build-state.md`:
-
-```
-## Active Metric Loop
-Phase: [current phase]
-Artifact: [what you're iterating on]
-Metric: [what you're measuring, in one sentence]
-How to measure: [what the measurement agent should do — run tests, audit code, check criteria, etc.]
-Target: [score 0-100 at which you stop]
-Max iterations: [hard cap, default 5]
-```
-
-Then create a score log table:
-
-```
-| Iter | Score | Delta | Top Issue | Files |
-|------|-------|-------|-----------|-------|
-```
-
-When starting a new metric loop, REPLACE the previous Active Metric Loop section (if any). There is only ever ONE active metric loop. Previous loop results should already be recorded in their phase's section above. When the loop completes (Step 2 exit), rename the section header from `## Active Metric Loop` to `## Completed Metric Loop — [Phase N]` and leave it for historical reference.
-
-If you are in Phase 5, also record the current sub-step for the overall task cycle (not all of these are within the metric loop itself):
-```
-Sub-step: [5.1 Implement | 5.1b Cleanup | 5.2 Metric Loop | 5.3 Loop Exit | 5.4 Verify]
-```
-This tells the orchestrator exactly where to resume after context compaction.
-
-## Step 1: MEASURE
-
-Call the Agent tool — description: "Measure [metric]" — prompt:
-
-"[How to measure, from your metric definition]. Score the current state 0-100. Return your response with a clear SCORE: [number] line, a list of FINDINGS, and the single TOP ISSUE most likely to improve the score if fixed."
-
-Read the agent's response. You need: the SCORE, the TOP ISSUE, and the file paths for diagnosis in Step 3. Record the score to `docs/plans/.build-state.md`. The full findings list is useful for diagnosis but does NOT need to persist in your context across iterations — once you've picked the top issue, the details of lower-priority findings can go. Append a row to the score log in `docs/plans/.build-state.md`:
-
-| Iter | Score | Delta | Top Issue | Files |
-|------|-------|-------|-----------|-------|
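Appending a row to this score log could look roughly like the sketch below. The protocol does not say what to write in the Delta column on the first iteration, so using `n/a` there is an assumption, as is the helper name `score_log_row`.

```python
from typing import Optional, Sequence


def score_log_row(
    iteration: int,
    score: int,
    previous_score: Optional[int],
    top_issue: str,
    files: Sequence[str],
) -> str:
    """Format one Markdown row for the Iter/Score/Delta/Top Issue/Files table."""
    # Assumption: the first iteration has no previous score, so Delta is "n/a".
    delta = "n/a" if previous_score is None else f"{score - previous_score:+d}"
    # Escape pipes so free-text issues cannot break the table layout.
    issue = top_issue.replace("|", "\\|")
    return f"| {iteration} | {score} | {delta} | {issue} | {', '.join(files)} |"
```

Keeping the signed delta in the log makes the stall condition in Step 2 easy to read straight off the table.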
-
-## Step 2: CHECK EXIT
-
-Stop the loop if ANY of these:
-
-- **Score >= target** → done. Log "Target met at iteration [N]."
-- **Iteration >= max** → done. Log "Max iterations reached. Final score: [N]."
-- **Stall: last 2 scores show no improvement** (delta <= 0 twice in a row) → done. Log "Stalled at score [N]."
-
-On stall or max iterations:
-- **Interactive mode:** present score history + top remaining issue to user. Ask for direction.
-- **Autonomous mode:** if score >= 60% of target, accept with warning. Otherwise skip. Log to `docs/plans/build-log.md`.
-
-If not exiting, continue to Step 3.
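The exit conditions in Step 2 can be expressed as one function over the recorded scores. This is a sketch under two assumptions that the protocol leaves open: the iteration number equals the number of scores recorded so far, and the conditions are checked in the order they are listed. `check_exit` and `autonomous_decision` are hypothetical names.

```python
from typing import List, Optional


def check_exit(scores: List[int], target: int, max_iterations: int = 5) -> Optional[str]:
    """Return the log message if the loop should stop, otherwise None."""
    if not scores:
        return None
    iteration = len(scores)  # assumption: one recorded score per iteration
    current = scores[-1]
    if current >= target:
        return f"Target met at iteration {iteration}."
    if iteration >= max_iterations:
        return f"Max iterations reached. Final score: {current}."
    # Stall: the last two deltas are both <= 0, which needs three scores.
    if len(scores) >= 3:
        last_two_deltas = (scores[-2] - scores[-3], scores[-1] - scores[-2])
        if all(delta <= 0 for delta in last_two_deltas):
            return f"Stalled at score {current}."
    return None


def autonomous_decision(final_score: int, target: int) -> str:
    """Autonomous mode: accept with a warning at 60% of target or more."""
    # Integer arithmetic avoids floating-point surprises at the boundary.
    return "accept with warning" if final_score * 100 >= 60 * target else "skip"
```

For example, scores of 50, 50, 48 against a target of 80 stall at 48, and autonomous mode would then accept with a warning because 48 is exactly 60% of 80.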
-
-## Step 3: DIAGNOSE
-
-Look at the findings from Step 1. Pick the ONE highest-impact issue — the single fix most likely to move the score. Do not try to fix everything at once. This is the autoresearch insight: one targeted change per iteration, measured impact.
-
-## Step 4: IMPROVE
-
-Call the Agent tool — description: "Fix [top issue]" — mode: "bypassPermissions" — prompt:
-
-"TARGETED FIX: [specific issue to fix, from diagnosis]. CONTEXT: [relevant architecture/criteria]. Make this specific change. Do not refactor unrelated code. Commit: 'fix: [description]'."
-
-> **Do NOT pass the measurement agent's full findings to this agent. Only pass the single diagnosed issue and relevant file paths.**
-
-## Step 5: LOOP
-
-Return to Step 1. Re-measure the artifact after the fix.
-
----
-
-## Rules
-
-<HARD-GATE>
-AUTHOR-BIAS ELIMINATION: The measurement agent and the fix agent must NEVER share context.
-- They MUST be separate Agent tool calls (separate subprocesses, separate context windows).
-- The fix agent receives ONLY: (a) the single top issue diagnosed in Step 3, (b) the relevant file paths, (c) the acceptance criteria. It does NOT receive the measurement agent's full findings, score breakdown, or other issues.
-- The measurement agent in the next iteration does NOT know what the fix agent did — it measures the artifact fresh.
-- Rationale: When a reviewer shares context with an implementer, the implementer unconsciously optimizes for the reviewer's framing rather than actual quality.
-</HARD-GATE>
-- One fix per iteration. Measure its impact before fixing the next thing.
-- Track ALL scores in `docs/plans/.build-state.md` so the history survives context compaction.
-- If context was compacted mid-loop: read `docs/plans/.build-state.md`, find the Active Metric Loop section, resume from the last recorded iteration.
-- CONTEXT HYGIENE: Measurement agents are analysis agents — read their full output for diagnosis. But once you've picked the top issue (Step 3) and dispatched the fix (Step 4), the detailed findings from THAT iteration are spent. Don't accumulate findings across iterations — each measurement is fresh.
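One way to keep the author-bias rule hard to break is to build the fix prompt from a function whose signature simply has no parameter for the score or the full findings. The sketch below fills in the Step 4 template; the function name and the way file paths and criteria are folded into the CONTEXT slot are assumptions for illustration.

```python
from typing import Sequence


def build_fix_prompt(
    top_issue: str,
    files: Sequence[str],
    acceptance_criteria: str,
    commit_description: str,
) -> str:
    """Fill the Step 4 template using only the three inputs the rule allows."""
    # Deliberately no parameters for the score, score breakdown, or other findings.
    context = f"files: {', '.join(files)}; acceptance criteria: {acceptance_criteria}"
    return (
        f"TARGETED FIX: {top_issue}. CONTEXT: {context}. "
        "Make this specific change. Do not refactor unrelated code. "
        f"Commit: 'fix: {commit_description}'."
    )
```

The orchestrator would call this once per iteration, after Step 3 has picked the single top issue.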
@@ -1,56 +0,0 @@
-# Planning Protocol
-
-You are the orchestrator converting a validated Design Document and Architecture Document into an ordered, developer-ready task list.
-
-## Input
-
-You need two documents before running this protocol:
-- **Design Document** (`docs/plans/YYYY-MM-DD-[topic]-design.md`) — scope, user flows, data model, tech stack
-- **Architecture Document** (`docs/plans/architecture.md`) — services, API contracts, database schema, component tree
-
-## Step 1: Break Down
-
-Decompose the architecture into ordered, atomic tasks. Each task must be:
-
-- **Implementable independently** — a developer agent can build it without needing unfinished work from other tasks
-- **Testable** — there are concrete acceptance criteria that can be verified
-- **Scoped to MVP** — if the design doc says a feature is deferred, do not create tasks for it
-
-For each task:
-
-```
-### Task [N]: [name]
-**Type:** frontend / backend / integration / infrastructure
-**Description:** [what to build, 2-3 sentences]
-**Acceptance Criteria:**
-- [ ] [specific, verifiable criterion]
-- [ ] [specific, verifiable criterion]
-**Dependencies:** [task numbers that must complete first, or "none"]
-**Size:** S (< 1 hour) / M (1-3 hours) / L (3+ hours)
-```
-
-## Step 2: Order
-
-Order tasks by dependency chain, then by priority within each dependency level:
-
-1. Infrastructure/scaffolding first (project setup, database schema, base config)
-2. Core data model and API endpoints
-3. Primary user flow (the main thing the user does)
-4. Supporting features
-5. Polish, error handling, edge cases
-
-Flag any circular dependencies — these indicate an architecture problem that needs resolution before building.
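Ordering by dependency chain and flagging circular dependencies is a topological sort. A minimal sketch using Python's standard `graphlib` module follows; it assumes tasks are identified by their task numbers and it does not model the priority ordering within a dependency level. `order_tasks` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def order_tasks(dependencies: Dict[int, Set[int]]) -> List[int]:
    """Return task numbers so that every task comes after its dependencies.

    `dependencies` maps a task number to the task numbers that must
    complete first (an empty set for "none").
    """
    sorter = TopologicalSorter(dependencies)
    try:
        return list(sorter.static_order())
    except CycleError as err:
        # The second element of args holds the detected cycle as a list of nodes.
        cycle = " -> ".join(str(task) for task in err.args[1])
        raise ValueError(f"Circular dependency: {cycle}") from err
```

With tasks 2 and 3 depending on 1, and 4 depending on both, task 1 comes first and task 4 last; two tasks that depend on each other raise a `ValueError` naming the cycle.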
-
-## Step 3: Validate
-
-Check the task list against the design doc:
-
-- Every feature in MVP scope has at least one task
-- No task exceeds the MVP boundary
-- No task is too large (L tasks should be split if possible)
-- Dependency chains are no deeper than 3 levels
-- Acceptance criteria are specific enough that a developer agent can verify them without ambiguity
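The depth check in this list is easy to automate. The sketch below assumes one particular way of counting, which the protocol does not spell out: a task with no dependencies sits at level 1, and every other task sits one level below its deepest dependency. It also assumes the graph is acyclic, since circular dependencies are flagged in Step 2. The names are illustrative.

```python
from functools import lru_cache
from typing import Dict, List, Set


def chain_depths(dependencies: Dict[int, Set[int]]) -> Dict[int, int]:
    """Map each task number to the depth of its longest dependency chain."""

    @lru_cache(maxsize=None)
    def depth(task: int) -> int:
        # Assumption: no cycles; a cyclic graph would recurse without end.
        deps = dependencies.get(task, set())
        return 1 + max((depth(dep) for dep in deps), default=0)

    return {task: depth(task) for task in dependencies}


def too_deep(dependencies: Dict[int, Set[int]], max_depth: int = 3) -> List[int]:
    """Return the tasks whose dependency chain exceeds max_depth levels."""
    depths = chain_depths(dependencies)
    return sorted(task for task, level in depths.items() if level > max_depth)
```

A straight chain of four tasks would report the fourth as too deep under this counting.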
-
-## Step 4: Save
-
-Save to `docs/plans/sprint-tasks.md`.
@@ -1,63 +0,0 @@
-# Verification Protocol
-
-You are the orchestrator. You are about to run a deterministic verification gate — a fast, sequential pass/fail check that catches regressions before expensive audit agents run.
-
-## When to Run
-
-Run this protocol at every phase boundary: after scaffolding, after each task, before final review. It is cheap. Run it often.
-
-## Step 1: Detect Stack
-
-Before running checks, detect the project's stack from manifest files:
-
-| Manifest | Stack | Build | Types | Lint | Test | Security |
-|----------|-------|-------|-------|------|------|----------|
-| `package.json` | Node | `npm run build` | `npx tsc --noEmit` | `npm run lint` | `npm test` | `npm audit` |
-| `requirements.txt` / `pyproject.toml` | Python | — | `mypy .` | `ruff check .` | `pytest` | `pip audit` |
-| `go.mod` | Go | `go build ./...` | (included in build) | `golangci-lint run` | `go test ./...` | `govulncheck ./...` |
-| `Cargo.toml` | Rust | `cargo build` | (included in build) | `cargo clippy` | `cargo test` | `cargo audit` |
-
-Skip any check that does not apply (e.g., skip Build for a pure Python script, skip Type-Check for JavaScript without TypeScript). A skipped check counts as PASS.
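Stack detection from manifest files can be sketched as a lookup over the project root. This is an illustration only: it checks the top-level directory, reports every stack it finds, and uses the hypothetical names `MANIFESTS` and `detect_stacks`; how the package's agent actually detects the stack is not shown in this diff.

```python
from pathlib import Path
from typing import List

# Manifest file name -> stack, taken from the table in Step 1.
MANIFESTS = {
    "package.json": "Node",
    "requirements.txt": "Python",
    "pyproject.toml": "Python",
    "go.mod": "Go",
    "Cargo.toml": "Rust",
}


def detect_stacks(project_root: str) -> List[str]:
    """Return the stacks whose manifest exists directly under project_root."""
    root = Path(project_root)
    found: List[str] = []
    for manifest, stack in MANIFESTS.items():
        # Two Python manifests map to one stack, so avoid duplicates.
        if (root / manifest).is_file() and stack not in found:
            found.append(stack)
    return found
```

An empty result would mean no known manifest was found, in which case every stack-specific check is skipped and, per the rule above, counts as PASS.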
-
-## Step 2: Run Checks Sequentially
-
-Call the Agent tool — description: "Verify [phase name]" — mode: "bypassPermissions" — prompt:
-
-"Run the Verification Protocol. Execute all 6 checks sequentially, stop on first failure. Report: VERIFY: PASS (6/6) or VERIFY: FAIL at step [N] — [check name]: [reason]."
-
-The agent runs these checks in order, stopping on the first FAIL:
-
-| # | Check | What it does |
-|---|-------|-------------|
-| 1 | Build | Project compiles/bundles without errors |
-| 2 | Type-Check | No type errors (tsc, mypy, etc.) |
-| 3 | Lint | No lint violations |
-| 4 | Test | All tests pass |
-| 5 | Security | No known vulnerabilities in deps |
-| 6 | Diff Review | `git diff` of uncommitted changes — no debug code, no secrets, no obvious regressions |
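The sequential, stop-on-first-failure behaviour of these six checks can be sketched as a short loop. To keep the example runnable without a real project, the actual command execution is passed in as a callback; `run_check`, `VerifyResult`, and `run_verification` are hypothetical names, and formatting the final `VERIFY:` report line is left to the caller.

```python
from typing import Callable, NamedTuple, Optional

# The six checks, in the order given by the table in Step 2.
CHECKS = ["Build", "Type-Check", "Lint", "Test", "Security", "Diff Review"]


class VerifyResult(NamedTuple):
    passed: bool
    step: Optional[int] = None    # 1-based index of the failing check
    check: Optional[str] = None   # name of the failing check
    reason: Optional[str] = None  # failure reason reported by run_check


def run_verification(run_check: Callable[[str], Optional[str]]) -> VerifyResult:
    """Run checks in order and stop at the first failure.

    run_check(name) returns None when the check passes or is skipped,
    and a failure reason string otherwise.
    """
    for step, name in enumerate(CHECKS, start=1):
        reason = run_check(name)
        if reason is not None:
            return VerifyResult(False, step, name, reason)
    return VerifyResult(True)
```

Because the loop returns on the first failure, later checks such as Security are never started when Test fails, which keeps the gate cheap.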
-
-<HARD-GATE>
-ONE AGENT, ONE PASS: The orchestrator spawns exactly ONE agent for the entire verification. This is a single Agent tool call, not 6 separate agents. The agent runs each check as a sequential shell command and evaluates the result before proceeding.
-</HARD-GATE>
-
-## Step 3: Handle Result
-
-**On PASS:** Log `VERIFY: PASS (6/6)` to `docs/plans/.build-state.md`. Proceed to next phase.
-
-**On FAIL:** Read the failure reason and spawn a targeted fix agent:
-
-| Failed Check | Fix Strategy |
-|-------------|-------------|
-| Build / Type-Check / Lint | Run the Build-Fix Protocol (`commands/protocols/build-fix.md`). It isolates the first error, fixes it, rebuilds, detects cascade resolution, and reverts bad fixes automatically. |
-| Test | Spawn fix agent: "Fix the failing test: [test name]. Read the test, read the implementation, fix the implementation — not the test — unless the test is wrong." |
-| Security | Spawn fix agent: "Resolve vulnerability: [advisory]. Update the dependency or apply the recommended remediation." |
-| Diff Review | Spawn fix agent: "Remove debug code / hardcoded secrets / regressions found in diff review: [details]." |
-
-After the fix agent completes, re-run verification from Step 2.
-
-<HARD-GATE>
-MAX 3 FIX ATTEMPTS: If verification fails 3 times on the same phase:
-- **Interactive mode:** present the failure history to the user. Ask for direction.
-- **Autonomous mode:** log the failure to `docs/plans/build-log.md` and proceed with a warning.
-Do not loop forever.
-</HARD-GATE>
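The verify, fix, and re-verify cycle with its cap can be sketched as a bounded loop. The gate's title speaks of three fix attempts while its body counts three failed verifications; this sketch follows the body text, so the fix callback runs at most twice before the loop gives up. That reading, and the names `verify_with_fixes`, `verify`, and `fix`, are assumptions.

```python
from typing import Callable

MAX_FAILED_VERIFICATIONS = 3


def verify_with_fixes(
    verify: Callable[[], bool],
    fix: Callable[[], None],
    interactive: bool,
) -> str:
    """Re-run verification after each fix, giving up after three failures."""
    failures = 0
    while True:
        if verify():
            return "proceed"
        failures += 1
        if failures >= MAX_FAILED_VERIFICATIONS:
            # Interactive: hand the failure history to the user.
            # Autonomous: log the failure and continue with a warning.
            return "ask user" if interactive else "proceed with warning"
        fix()
```

In a real orchestrator, `verify` would wrap the single verification agent call from Step 2 and `fix` would dispatch the strategy matching the failed check.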