multi-forge 0.2.0__py3-none-any.whl
- forge/__init__.py +3 -0
- forge/_extensions/agents/.gitkeep +0 -0
- forge/_extensions/commands/.gitkeep +0 -0
- forge/_extensions/skills/analyze/SKILL.md +87 -0
- forge/_extensions/skills/challenge/SKILL.md +91 -0
- forge/_extensions/skills/consensus/SKILL.md +120 -0
- forge/_extensions/skills/consensus/resources/code_consensus_evaluation.md +94 -0
- forge/_extensions/skills/consensus/resources/consensus_evaluation.md +70 -0
- forge/_extensions/skills/consensus/resources/synthesis.md +101 -0
- forge/_extensions/skills/debate/SKILL.md +116 -0
- forge/_extensions/skills/debate/resources/code_debate_evaluation.md +101 -0
- forge/_extensions/skills/debate/resources/debate_evaluation.md +90 -0
- forge/_extensions/skills/panel/SKILL.md +141 -0
- forge/_extensions/skills/panel/resources/synthesis.md +103 -0
- forge/_extensions/skills/qa/SKILL.md +704 -0
- forge/_extensions/skills/qa/resources/checklist/0-enable.md +78 -0
- forge/_extensions/skills/qa/resources/checklist/1-preflight.md +24 -0
- forge/_extensions/skills/qa/resources/checklist/10-resume.md +143 -0
- forge/_extensions/skills/qa/resources/checklist/11-config.md +150 -0
- forge/_extensions/skills/qa/resources/checklist/12-search.md +58 -0
- forge/_extensions/skills/qa/resources/checklist/13-guard.md +237 -0
- forge/_extensions/skills/qa/resources/checklist/14-workflow.md +305 -0
- forge/_extensions/skills/qa/resources/checklist/15-skills.md +155 -0
- forge/_extensions/skills/qa/resources/checklist/16-handoff.md +224 -0
- forge/_extensions/skills/qa/resources/checklist/17-info.md +50 -0
- forge/_extensions/skills/qa/resources/checklist/18-disable.md +84 -0
- forge/_extensions/skills/qa/resources/checklist/19-uninstall.md +146 -0
- forge/_extensions/skills/qa/resources/checklist/2-extensions.md +188 -0
- forge/_extensions/skills/qa/resources/checklist/20-cleanup.md +36 -0
- forge/_extensions/skills/qa/resources/checklist/3-auth.md +234 -0
- forge/_extensions/skills/qa/resources/checklist/4-proxy.md +481 -0
- forge/_extensions/skills/qa/resources/checklist/5-session.md +541 -0
- forge/_extensions/skills/qa/resources/checklist/6-hooks.md +275 -0
- forge/_extensions/skills/qa/resources/checklist/7-costs.md +309 -0
- forge/_extensions/skills/qa/resources/checklist/8-status-line.md +174 -0
- forge/_extensions/skills/qa/resources/checklist/9-direct-commands.md +146 -0
- forge/_extensions/skills/qa/resources/checklist.md +103 -0
- forge/_extensions/skills/qa/resources/report-template.md +62 -0
- forge/_extensions/skills/qa/scripts/start-container.sh +529 -0
- forge/_extensions/skills/qa/scripts/walkthrough-state.py +1137 -0
- forge/_extensions/skills/review/SKILL.md +125 -0
- forge/_extensions/skills/review/references/claude-4.6.md +474 -0
- forge/_extensions/skills/review/references/claude-4.7.md +710 -0
- forge/_extensions/skills/review/references/gemini-3.1.md +546 -0
- forge/_extensions/skills/review/references/gpt-5.5.md +490 -0
- forge/_extensions/skills/review/references/skills-writing-guide.md +1588 -0
- forge/_extensions/skills/review/resources/code-anthropic.md +160 -0
- forge/_extensions/skills/review/resources/code-gemini.md +184 -0
- forge/_extensions/skills/review/resources/code-openai.md +203 -0
- forge/_extensions/skills/review/resources/code.md +160 -0
- forge/_extensions/skills/review-docs/SKILL.md +121 -0
- forge/_extensions/skills/review-docs/resources/docs-anthropic.md +170 -0
- forge/_extensions/skills/review-docs/resources/docs-gemini.md +204 -0
- forge/_extensions/skills/review-docs/resources/docs-openai.md +231 -0
- forge/_extensions/skills/review-docs/resources/docs.md +170 -0
- forge/_extensions/skills/smoke-test/SKILL.md +27 -0
- forge/_extensions/skills/smoke-test/scripts/smoke-test.sh +118 -0
- forge/_extensions/skills/understand/SKILL.md +148 -0
- forge/_extensions/skills/understand/resources/code-anthropic.md +163 -0
- forge/_extensions/skills/understand/resources/code-gemini.md +194 -0
- forge/_extensions/skills/understand/resources/code-openai.md +181 -0
- forge/_extensions/skills/understand/resources/code.md +163 -0
- forge/_extensions/skills/understand/resources/docs-anthropic.md +177 -0
- forge/_extensions/skills/understand/resources/docs-gemini.md +202 -0
- forge/_extensions/skills/understand/resources/docs-openai.md +191 -0
- forge/_extensions/skills/understand/resources/docs.md +177 -0
- forge/_extensions/skills/walkthrough/SKILL.md +599 -0
- forge/_extensions/skills/walkthrough/resources/checklist.md +765 -0
- forge/_extensions/skills/walkthrough/scripts/run-in-repo.sh +118 -0
- forge/_extensions/skills/walkthrough/scripts/setup-test-repo.sh +198 -0
- forge/_extensions/skills/walkthrough/scripts/walkthrough-state.py +1137 -0
- forge/backend/__init__.py +174 -0
- forge/backend/adapters/__init__.py +38 -0
- forge/backend/adapters/litellm.py +158 -0
- forge/backend/creation.py +89 -0
- forge/backend/registry.py +178 -0
- forge/cli/__init__.py +16 -0
- forge/cli/auth.py +483 -0
- forge/cli/backend.py +298 -0
- forge/cli/claude.py +411 -0
- forge/cli/config_cmd.py +303 -0
- forge/cli/extensions.py +1001 -0
- forge/cli/gc.py +165 -0
- forge/cli/guard.py +1018 -0
- forge/cli/guards.py +106 -0
- forge/cli/handoff.py +110 -0
- forge/cli/hooks/__init__.py +36 -0
- forge/cli/hooks/_group.py +20 -0
- forge/cli/hooks/_helpers.py +149 -0
- forge/cli/hooks/commands.py +1677 -0
- forge/cli/hooks/direct_commands.py +1304 -0
- forge/cli/hooks/install.py +232 -0
- forge/cli/hooks/policy.py +151 -0
- forge/cli/hooks/read_hygiene.py +74 -0
- forge/cli/hooks/verification.py +370 -0
- forge/cli/logs.py +406 -0
- forge/cli/main.py +292 -0
- forge/cli/proxy.py +1821 -0
- forge/cli/proxy_costs.py +313 -0
- forge/cli/search.py +416 -0
- forge/cli/session.py +892 -0
- forge/cli/session_addendum.py +81 -0
- forge/cli/session_fork.py +750 -0
- forge/cli/session_handoff.py +141 -0
- forge/cli/session_lifecycle.py +2053 -0
- forge/cli/session_manage.py +1336 -0
- forge/cli/session_memory.py +201 -0
- forge/cli/status_line.py +1398 -0
- forge/cli/workflow.py +1964 -0
- forge/config/__init__.py +110 -0
- forge/config/dataclass_utils.py +88 -0
- forge/config/defaults/__init__.py +0 -0
- forge/config/defaults/backends/__init__.py +0 -0
- forge/config/defaults/backends/litellm.yaml +196 -0
- forge/config/defaults/templates/__init__.py +0 -0
- forge/config/defaults/templates/litellm-anthropic-local.yaml +33 -0
- forge/config/defaults/templates/litellm-anthropic.yaml +24 -0
- forge/config/defaults/templates/litellm-gemini-flash-local.yaml +37 -0
- forge/config/defaults/templates/litellm-gemini-local.yaml +32 -0
- forge/config/defaults/templates/litellm-gemini-test.yaml +34 -0
- forge/config/defaults/templates/litellm-gemini.yaml +21 -0
- forge/config/defaults/templates/litellm-openai-codex-local.yaml +36 -0
- forge/config/defaults/templates/litellm-openai-local.yaml +38 -0
- forge/config/defaults/templates/litellm-openai.yaml +28 -0
- forge/config/defaults/templates/openrouter-anthropic.yaml +23 -0
- forge/config/defaults/templates/openrouter-deepseek.yaml +26 -0
- forge/config/defaults/templates/openrouter-gemini-flash.yaml +26 -0
- forge/config/defaults/templates/openrouter-gemini.yaml +23 -0
- forge/config/defaults/templates/openrouter-glm.yaml +23 -0
- forge/config/defaults/templates/openrouter-kimi.yaml +30 -0
- forge/config/defaults/templates/openrouter-minimax.yaml +26 -0
- forge/config/defaults/templates/openrouter-openai-codex.yaml +23 -0
- forge/config/defaults/templates/openrouter-openai.yaml +28 -0
- forge/config/defaults/templates/openrouter-qwen.yaml +25 -0
- forge/config/loader.py +675 -0
- forge/config/schema.py +448 -0
- forge/core/__init__.py +5 -0
- forge/core/auth/__init__.py +67 -0
- forge/core/auth/capabilities.py +219 -0
- forge/core/auth/credentials_file.py +244 -0
- forge/core/auth/protocols.py +18 -0
- forge/core/auth/secrets.py +243 -0
- forge/core/auth/template_secrets.py +112 -0
- forge/core/data/__init__.py +5 -0
- forge/core/data/model_catalog.yaml +1522 -0
- forge/core/data/pricing.yaml +140 -0
- forge/core/data/system_prompt_addendums/__init__.py +0 -0
- forge/core/data/system_prompt_addendums/gemini.md +330 -0
- forge/core/data/system_prompt_addendums/openai.md +328 -0
- forge/core/llm/__init__.py +231 -0
- forge/core/llm/clients/__init__.py +14 -0
- forge/core/llm/clients/base.py +115 -0
- forge/core/llm/clients/litellm.py +619 -0
- forge/core/llm/clients/openai_compat.py +244 -0
- forge/core/llm/clients/openrouter.py +234 -0
- forge/core/llm/credentials.py +439 -0
- forge/core/llm/detection.py +86 -0
- forge/core/llm/errors.py +44 -0
- forge/core/llm/protocols.py +80 -0
- forge/core/llm/types.py +176 -0
- forge/core/logging.py +146 -0
- forge/core/models/__init__.py +91 -0
- forge/core/models/catalog.py +467 -0
- forge/core/models/pricing.py +165 -0
- forge/core/models/types.py +167 -0
- forge/core/naming.py +212 -0
- forge/core/ops/__init__.py +73 -0
- forge/core/ops/context.py +141 -0
- forge/core/ops/gc.py +802 -0
- forge/core/ops/proxy.py +146 -0
- forge/core/ops/resolution.py +135 -0
- forge/core/ops/session.py +344 -0
- forge/core/ops/session_context.py +548 -0
- forge/core/paths.py +38 -0
- forge/core/process.py +54 -0
- forge/core/reactive/__init__.py +38 -0
- forge/core/reactive/cost_tracking.py +300 -0
- forge/core/reactive/env.py +180 -0
- forge/core/reactive/proxy.py +78 -0
- forge/core/reactive/routing.py +622 -0
- forge/core/reactive/session_runner.py +185 -0
- forge/core/reactive/structured_output.py +62 -0
- forge/core/reactive/tagger.py +94 -0
- forge/core/reactive/throttle.py +132 -0
- forge/core/state/__init__.py +59 -0
- forge/core/state/exceptions.py +59 -0
- forge/core/state/io.py +140 -0
- forge/core/state/lock.py +99 -0
- forge/core/state/timestamps.py +60 -0
- forge/core/transcript.py +78 -0
- forge/core/typing_helpers.py +24 -0
- forge/core/workqueue/__init__.py +67 -0
- forge/core/workqueue/queue.py +552 -0
- forge/core/workqueue/types.py +63 -0
- forge/guard/__init__.py +26 -0
- forge/guard/deterministic/__init__.py +26 -0
- forge/guard/deterministic/base.py +158 -0
- forge/guard/deterministic/coding_standards.py +256 -0
- forge/guard/deterministic/registry.py +148 -0
- forge/guard/deterministic/tdd.py +171 -0
- forge/guard/engine.py +216 -0
- forge/guard/protocols.py +91 -0
- forge/guard/queries.py +96 -0
- forge/guard/semantic/__init__.py +34 -0
- forge/guard/semantic/promotion.py +18 -0
- forge/guard/semantic/supervisor.py +813 -0
- forge/guard/semantic/verdict.py +183 -0
- forge/guard/store.py +124 -0
- forge/guard/team/__init__.py +6 -0
- forge/guard/team/config.py +24 -0
- forge/guard/team/handlers.py +209 -0
- forge/guard/team/prompts.py +41 -0
- forge/guard/types.py +125 -0
- forge/guard/workflow/__init__.py +17 -0
- forge/guard/workflow/branches.py +67 -0
- forge/guard/workflow/config.py +63 -0
- forge/guard/workflow/divergence.py +113 -0
- forge/guard/workflow/policy.py +87 -0
- forge/guard/workflow/stages.py +205 -0
- forge/install/__init__.py +55 -0
- forge/install/cli.py +281 -0
- forge/install/exceptions.py +163 -0
- forge/install/hooks.py +109 -0
- forge/install/installer.py +1037 -0
- forge/install/models.py +321 -0
- forge/install/preset.py +272 -0
- forge/install/settings_merge.py +831 -0
- forge/install/tracking.py +238 -0
- forge/install/version.py +141 -0
- forge/proxy/__init__.py +0 -0
- forge/proxy/base_client.py +181 -0
- forge/proxy/client_adapter.py +476 -0
- forge/proxy/client_factory.py +531 -0
- forge/proxy/converters.py +1206 -0
- forge/proxy/cost_logger.py +132 -0
- forge/proxy/cost_tracker.py +242 -0
- forge/proxy/data_models.py +338 -0
- forge/proxy/error_hints.py +92 -0
- forge/proxy/metrics.py +222 -0
- forge/proxy/model_spec.py +158 -0
- forge/proxy/proxies.py +333 -0
- forge/proxy/proxy_identity.py +134 -0
- forge/proxy/proxy_orchestrator.py +1018 -0
- forge/proxy/proxy_startup.py +54 -0
- forge/proxy/server.py +1561 -0
- forge/proxy/utils.py +537 -0
- forge/review/__init__.py +6 -0
- forge/review/adversarial.py +111 -0
- forge/review/consensus.py +236 -0
- forge/review/engine.py +356 -0
- forge/review/models.py +437 -0
- forge/review/resources/__init__.py +5 -0
- forge/review/resources/codereview-performance.md +85 -0
- forge/review/resources/codereview-quick.md +75 -0
- forge/review/resources/codereview-security.md +92 -0
- forge/review/resources/codereview.md +85 -0
- forge/review/resources/docreview-quick.md +75 -0
- forge/review/resources/docreview.md +86 -0
- forge/review/resources/thinkdeep.md +89 -0
- forge/review/routing.py +368 -0
- forge/review/synthesis.py +73 -0
- forge/runtime_config.py +438 -0
- forge/search/__init__.py +55 -0
- forge/search/bm25_store.py +264 -0
- forge/search/content_store.py +197 -0
- forge/search/engine.py +352 -0
- forge/search/exceptions.py +51 -0
- forge/search/extractor.py +234 -0
- forge/search/index_state.py +295 -0
- forge/search/store.py +215 -0
- forge/search/tokenizer.py +24 -0
- forge/session/__init__.py +130 -0
- forge/session/active.py +339 -0
- forge/session/artifacts.py +202 -0
- forge/session/claude/__init__.py +50 -0
- forge/session/claude/cleanup.py +105 -0
- forge/session/claude/invoke.py +236 -0
- forge/session/claude/paths.py +200 -0
- forge/session/cleanup.py +216 -0
- forge/session/config.py +34 -0
- forge/session/direct_model.py +107 -0
- forge/session/effective.py +169 -0
- forge/session/exceptions.py +255 -0
- forge/session/handoff.py +881 -0
- forge/session/handoff_agent.py +544 -0
- forge/session/hooks/__init__.py +35 -0
- forge/session/hooks/models.py +73 -0
- forge/session/hooks/session_start.py +507 -0
- forge/session/identity.py +84 -0
- forge/session/index.py +553 -0
- forge/session/manager.py +1506 -0
- forge/session/models.py +572 -0
- forge/session/overrides.py +344 -0
- forge/session/plan_resolution.py +286 -0
- forge/session/prev_sessions.py +128 -0
- forge/session/store.py +431 -0
- forge/session/validation.py +47 -0
- forge/session/worktree/__init__.py +65 -0
- forge/session/worktree/cleanup.py +262 -0
- forge/session/worktree/config_copy.py +203 -0
- forge/session/worktree/create.py +332 -0
- forge/sidecar/__init__.py +29 -0
- forge/sidecar/container.py +161 -0
- forge/sidecar/docker.py +86 -0
- forge/sidecar/secrets.py +19 -0
- multi_forge-0.2.0.dist-info/METADATA +242 -0
- multi_forge-0.2.0.dist-info/RECORD +311 -0
- multi_forge-0.2.0.dist-info/WHEEL +4 -0
- multi_forge-0.2.0.dist-info/entry_points.txt +2 -0
- multi_forge-0.2.0.dist-info/licenses/LICENSE +203 -0
- multi_forge-0.2.0.dist-info/licenses/NOTICE +14 -0
@@ -0,0 +1,101 @@
+# Adversarial Code Evaluation
+
+```xml
+<role>
+You are a senior code evaluator performing a structured adversarial assessment.
+{stance_prompt}
+You identify bugs, design issues, security concerns, and performance problems.
+You provide actionable feedback with specific code references.
+</role>
+
+<behavior>
+- Read all code in scope before forming opinions
+- Cite specific file:line references for every finding
+- Evaluate strictly on technical merits
+- Support every claim with evidence or reasoning
+- Cover ALL files in ONE pass -- do not present partial results
+- Be specific: "potential null dereference at auth.py:45" not "might have issues"
+- Provide a clear verdict with confidence level
+</behavior>
+
+<scope_constraints>
+- Review only what's in scope
+- Do not expand to adjacent code unless directly affected
+- If tests exist for reviewed code, check them for coverage gaps
+</scope_constraints>
+```
+
+---
+
+## Code Under Evaluation
+
+{target}
+
+---
+
+## Evaluation Framework
+
+### 1. Quality
+
+- Logic errors and edge cases
+- Error handling: are errors caught, propagated, and surfaced correctly?
+- Type safety: do type annotations match runtime behavior?
+- Test coverage: are critical paths tested?
+
+### 2. Security
+
+- Input validation at trust boundaries
+- Injection vectors (command, SQL, path traversal)
+- Secrets in code or logs
+- Authentication and authorization gaps
+
+### 3. Performance
+
+- Unnecessary allocations or copies in hot paths
+- N+1 query patterns
+- Missing caching where data is reused
+- Blocking calls in async contexts
+
+### 4. Architecture
+
+- Component boundaries: is coupling appropriate?
+- Dependency direction: do imports flow the right way?
+- Abstraction level: is complexity in the right place?
+- Interface contracts: are public APIs stable and well-defined?
+
+### 5. Risks
+
+- What could go wrong in production?
+- What is the blast radius of failure?
+- Missing error recovery or graceful degradation?
+- Deployment or migration risks?
+
+### 6. Recommendation
+
+- Overall verdict: ACCEPT, ACCEPT_WITH_CONDITIONS, or REJECT
+- Confidence level: LOW, MEDIUM, HIGH
+- Key conditions (if ACCEPT_WITH_CONDITIONS)
+
+---
+
+## Output Format
+
+````xml
+<output_format>
+Respond with a structured evaluation in JSON:
+
+{
+  "verdict": "ACCEPT" | "ACCEPT_WITH_CONDITIONS" | "REJECT",
+  "confidence": "LOW" | "MEDIUM" | "HIGH",
+  "key_findings": [
+    {"category": "quality|security|performance|architecture|risks",
+     "finding": "specific finding with file:line reference",
+     "severity": "critical|high|medium|low"}
+  ],
+  "recommendation": "1-2 sentence summary of your recommendation",
+  "conditions": ["condition 1", "condition 2"]
+}
+
+Wrap the JSON in a ```json code fence.
+</output_format>
+````
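Reviewer note: the output format above asks the evaluator to reply with a fenced JSON object whose `verdict`, `confidence`, and `severity` values come from fixed sets. A minimal sketch of how a caller might extract and check such a reply follows. The function name `parse_evaluation` and the regular expression are illustrative assumptions, not code from the package.

```python
import json
import re

# Allowed values, copied from the output format in the template.
VERDICTS = {"ACCEPT", "ACCEPT_WITH_CONDITIONS", "REJECT"}
CONFIDENCE_LEVELS = {"LOW", "MEDIUM", "HIGH"}
SEVERITIES = {"critical", "high", "medium", "low"}

# Built at runtime so this example never contains a literal triple backtick.
FENCE = "`" * 3
JSON_FENCE_RE = re.compile(FENCE + r"json\s*\n(.*?)\n\s*" + FENCE, re.DOTALL)


def parse_evaluation(response: str) -> dict:
    """Hypothetical helper: pull the fenced JSON out of a reply and validate it."""
    match = JSON_FENCE_RE.search(response)
    if match is None:
        raise ValueError("no json code fence found in response")
    data = json.loads(match.group(1))
    if data.get("verdict") not in VERDICTS:
        raise ValueError(f"unexpected verdict: {data.get('verdict')!r}")
    if data.get("confidence") not in CONFIDENCE_LEVELS:
        raise ValueError(f"unexpected confidence: {data.get('confidence')!r}")
    for finding in data.get("key_findings", []):
        if finding.get("severity") not in SEVERITIES:
            raise ValueError(f"unexpected severity: {finding.get('severity')!r}")
    return data
```

Rejecting unknown enum values early keeps a malformed reply from being counted as a verdict later in the pipeline.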
@@ -0,0 +1,90 @@
+# Structured Evaluation
+
+```xml
+<role>
+You are a technical evaluator performing a structured assessment.
+{stance_prompt}
+</role>
+
+<behavior>
+- Evaluate strictly on technical merits
+- Support every claim with evidence or reasoning
+- Be specific: cite exact trade-offs, not vague concerns
+- Provide a clear verdict with confidence level
+</behavior>
+```
+
+---
+
+## Proposal Under Evaluation
+
+{proposal}
+
+---
+
+## Evaluation Framework
+
+### 1. Feasibility
+
+- Can this be implemented with the available technology and resources?
+- What are the key technical dependencies?
+- Are there proven precedents or is this novel?
+
+### 2. Correctness
+
+- Does the proposal solve the stated problem?
+- Are there logical gaps or incorrect assumptions?
+- Does it handle edge cases and failure modes?
+
+### 3. Trade-offs
+
+- What does this approach gain vs alternatives?
+- What does it cost (complexity, performance, maintenance)?
+- Are the trade-offs appropriate for the context?
+
+### 4. Risks
+
+- What could go wrong in implementation?
+- What could go wrong in production?
+- What is the blast radius of failure?
+
+### 5. Completeness
+
+- Are all requirements addressed?
+- Are there missing considerations?
+- What would need to be added before this is production-ready?
+
+### 6. Alternatives
+
+- What other approaches could solve this problem?
+- Why might they be better or worse?
+
+### 7. Recommendation
+
+- Overall verdict: ACCEPT, ACCEPT_WITH_CONDITIONS, or REJECT
+- Confidence level: LOW, MEDIUM, HIGH
+- Key conditions (if ACCEPT_WITH_CONDITIONS)
+
+---
+
+## Output Format
+
+````xml
+<output_format>
+Respond with a structured evaluation in JSON:
+
+{
+  "verdict": "ACCEPT" | "ACCEPT_WITH_CONDITIONS" | "REJECT",
+  "confidence": "LOW" | "MEDIUM" | "HIGH",
+  "key_findings": [
+    {"category": "feasibility|correctness|trade-offs|risks|completeness",
+     "finding": "specific finding",
+     "severity": "critical|high|medium|low"}
+  ],
+  "recommendation": "1-2 sentence summary of your recommendation",
+  "conditions": ["condition 1", "condition 2"]
+}
+
+Wrap the JSON in a ```json code fence.
+</output_format>
+````
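Reviewer note: this template mixes named placeholders (`{stance_prompt}`, `{proposal}`) with literal JSON braces in the output format, so a plain `str.format` call would treat the JSON braces as fields and fail. The sketch below shows one way to fill only the named placeholders. How the package actually renders its templates is not visible in this diff; `render_template` and its regular expression are assumptions made for illustration.

```python
import re

# Matches {name} where name is lowercase letters and underscores only,
# so JSON braces followed by quotes or newlines are left alone.
PLACEHOLDER_RE = re.compile(r"\{([a-z_]+)\}")


def render_template(template: str, values: dict) -> str:
    """Hypothetical helper: replace known {name} placeholders, keep the rest."""

    def substitute(match):
        name = match.group(1)
        # Unknown names are kept verbatim rather than raising.
        return values.get(name, match.group(0))

    return PLACEHOLDER_RE.sub(substitute, template)


example = 'Role\n{stance_prompt}\n\n{proposal}\n\n{\n"verdict": "ACCEPT"\n}'
rendered = render_template(
    example,
    {"stance_prompt": "Argue against the proposal.", "proposal": "Adopt a cache."},
)
```

Leaving unknown placeholders untouched is a design choice for this sketch; a stricter variant could raise on any name missing from `values` to catch typos.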
@@ -0,0 +1,141 @@
+---
+name: forge:panel
+description: Multi-model panel review. Multiple models review independently, then findings are synthesized.
+disable-model-invocation: true
+argument-hint: '[target: path or instruction] [--output path] [--code] [--models m1,m2] [--roles r1,r2] [--review-type type] [--severity level]'
+context: fork
+allowed-tools: Bash, Read
+---
+
+# Panel Review
+
+Run a panel review: fans out the same review task to multiple models in parallel, then synthesizes findings.
+
+## Usage
+
+```
+/forge:panel [target] [--code] [--models model1,model2]
+```
+
+## Arguments
+
+| Argument | Required | Description |
+| --------------- | -------- | ---------------------------------------------------------------------------- |
+| `target` | Optional | File, directory, or instruction on what to review (defaults to cwd) |
+| `--code` | Optional | Switch: use code review framework (default: document review) |
+| `--models` | Optional | Comma-separated model list (default: Forge workflow defaults) |
+| `--roles` | Optional | Comma-separated reviewer roles (security, performance, architecture, ...) |
+| `--review-type` | Optional | Review focus: full, security, performance, quick (security/perf need --code) |
+| `--severity` | Optional | Minimum severity to report: high or critical |
+| `--output` | Optional | Write result to file instead of conversation (e.g., `review.md`) |
+
+**Available models:** !`forge workflow list-models`
+
+Only use models with status **ready** in the table above. If the default set includes unavailable models, pass
+`--models <ready models>` explicitly. If the user explicitly requested an unavailable model, stop and tell them what
+proxy or credential is missing rather than silently substituting. If no models are ready, tell the user what's missing
+and stop.
+
+## Models Used
+
+| Model | Strength | Via |
+| ------------------------ | ----------------------------------- | ----------------------- |
+| `gpt-5.5` | Logical problems, systematic review | openrouter-openai proxy |
+| `gemini-3.1-pro-preview` | Balanced analysis, large context | openrouter-gemini proxy |
+| `claude-opus` | Stable Claude Opus 4.6 reasoning | Direct Anthropic |
+
+Selectable direct Claude workers include `claude-opus-4.6`, `claude-opus-4.6-1m`, and `claude-opus-4.7`. Use
+`claude-opus-4.7` as a bounded review/quorum worker when the prompt has a concrete target and should require file:line
+evidence. You can include both 4.6 and 4.7 in one panel, for example:
+
+```bash
+forge workflow panel src/ --code --models claude-opus-4.6,claude-opus-4.7 --json --cwd "$(pwd)"
+```
+
+---
+
+## Execution
+
+### Step 1: Resolve Target and Flags
+
+Parse `$ARGUMENTS` into a positional target and optional flags. The target is the first non-flag value (file path,
+directory, or free-form instruction). Strip any leading `@` prefix on the target (Claude Code file reference syntax). If
+no target is found, default to the current working directory.
+
+Recognized flags (extract from `$ARGUMENTS` if present):
+
+- `--code` — switch
+- `--models <value>` — comma-separated model list
+- `--roles <value>` — comma-separated role list
+- `--review-type <value>` — one of: full, security, performance, quick
+- `--severity <value>` — one of: high, critical
+- `--output <path>` — write result to file instead of conversation
+
+Never ask the user to clarify. If `$ARGUMENTS` contains anything, proceed immediately.
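Reviewer note: Step 1 is carried out by the model that reads the skill, not by shipped code, but the parsing rules can be sketched in a few lines. In the sketch below the name `parse_arguments`, the choice to join all non-flag tokens into one free-form target, and the use of `"."` for the current working directory are assumptions made for illustration.

```python
# Flags that take a value, and flags that are plain switches (from the list above).
VALUE_FLAGS = ("--models", "--roles", "--review-type", "--severity", "--output")
SWITCH_FLAGS = ("--code",)


def parse_arguments(arguments: str):
    """Hypothetical helper: split an argument string into (target, flags)."""
    tokens = arguments.split()
    flags = {}
    positional = []
    index = 0
    while index < len(tokens):
        token = tokens[index]
        if token in SWITCH_FLAGS:
            flags[token] = True
            index += 1
        elif token in VALUE_FLAGS and index + 1 < len(tokens):
            flags[token] = tokens[index + 1]
            index += 2
        else:
            # Assumption: every remaining token belongs to the target,
            # so a multi-word instruction survives as one string.
            positional.append(token)
            index += 1
    target = " ".join(positional)
    if target.startswith("@"):
        target = target[1:]  # strip the file-reference prefix
    return (target or "."), flags
```

Whitespace splitting is used instead of shell-style quoting so that an instruction containing an apostrophe does not raise a parsing error.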
+
+### Step 2: Run Multi-Model Review
+
+Execute the panel workflow, forwarding all parsed flags:
+
+```bash
+forge workflow panel <target> [--code] [--models <models>] [--roles <roles>] [--review-type <type>] [--severity <sev>] --json --cwd "$(pwd)"
+```
+
+Omit any flag the user didn't specify.
+
+Parse the JSON output. The structure is:
+
+```json
+{
+  "prompt": "...",
+  "results": {
+    "gpt-5.5": {"response": "...", "error": null, "success": true, "duration_seconds": 45.2},
+    "gemini-3.1-pro-preview": {"response": "...", "error": null, "success": true, "duration_seconds": 38.1},
+    "claude-opus": {"response": "...", "error": null, "success": true, "duration_seconds": 52.7}
+  },
+  "resolved_models": {
+    "gpt-5.5": {
+      "requested_model": "gpt-5.5",
+      "resolved_model": "openai/gpt-5.5",
+      "provider": "openrouter",
+      "proxy": "openrouter-openai",
+      "template": "openrouter-openai",
+      "source": "preferred_proxy"
+    }
+  },
+  "successful": 3,
+  "failed": 0
+}
+```
+
+### Step 3: Synthesize Results
+
+Read `${CLAUDE_SKILL_DIR}/resources/synthesis.md` for synthesis instructions. If the file is missing, report the actual
+missing-path problem and stop. Then respond with:
+
+0. Resolved models used: one line per worker from `resolved_models`, including requested model, resolved model ref,
+   provider, proxy, and template
+1. Consensus issues (found by 2+ models)
+2. Unique findings from each model
+3. Conflict resolution
+4. Unified priority list
+5. Suggested fix order based on dependencies
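Reviewer note: item 0 asks for one line per worker built from the `resolved_models` object shown in Step 2. A minimal sketch of that formatting is below; the line layout and the `-` placeholder for absent fields are assumptions, since the skill does not prescribe an exact format.

```python
FIELDS = ("requested_model", "resolved_model", "provider", "proxy", "template")


def format_resolved_models(resolved_models: dict) -> list:
    """Hypothetical helper: one summary line per worker."""
    lines = []
    for worker, info in resolved_models.items():
        # Assumption: a missing or null field is shown as "-".
        parts = [f"{name}={info.get(name) or '-'}" for name in FIELDS]
        lines.append(f"{worker}: " + " ".join(parts))
    return lines
```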
+
+**Output routing:** If `--output` was specified, write the complete synthesis to that path using the Write tool (create
+parent directories if needed). Print a one-line confirmation: `Wrote synthesis to {path}`. Do not also print the full
+result in the conversation. If `--output` was not specified, print the result in the conversation as usual.
+
+---
+
+## Error Handling
+
+- If 1 model fails: Include its error, synthesize from successful models
+- If 2+ models fail: Report failure, do not attempt synthesis
+- If proxy not available: `forge workflow panel` skips that model and reports the error in JSON
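Reviewer note: the failure rules above can be checked mechanically against the `results` object from Step 2. The sketch below is one reading of them; `plan_synthesis` is a hypothetical name, and the extra guard for zero successful responses is an assumption not stated in the skill.

```python
import json


def plan_synthesis(raw_json: str) -> dict:
    """Hypothetical helper: decide whether synthesis should run."""
    payload = json.loads(raw_json)
    responses = {}
    errors = {}
    for model, result in payload["results"].items():
        if result["success"]:
            responses[model] = result["response"]
        else:
            errors[model] = result["error"]
    # Rule from the list above: two or more failures means no synthesis.
    # Assumption: with no successful response there is nothing to synthesize.
    synthesize = len(errors) < 2 and len(responses) > 0
    return {"synthesize": synthesize, "responses": responses, "errors": errors}
```

With a single failure the returned `errors` entry can be quoted in the report while synthesis proceeds from the remaining responses.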
+
+## Requirements
+
+- **Forge CLI**: `forge` must be on PATH
+- **Claude CLI**: workflow workers run through local `claude -p`; `claude` must be on PATH in this Bash environment
+- **Proxies**: GPT-5.5 and Gemini require active proxies (`forge proxy create openrouter-openai`)
+- **List available models**: `forge workflow list-models`
@@ -0,0 +1,103 @@
+# Multi-Model Synthesis Instructions
+
+You have received responses from multiple AI models reviewing the same target. Your task is to synthesize these into a
+unified, actionable report.
+
+## Synthesis Framework
+
+### 1. Identify Consensus Issues
+
+Issues found by **2 or more models** have higher confidence. List these first:
+
+```markdown
+## Consensus Findings (High Confidence)
+
+### Critical
+- **[Issue]** (found by: gpt-5.5, gemini-3.1-pro-preview)
+  - Location: `file.py:123`
+  - Impact: [description]
+  - Fix: [suggestion]
+```
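Reviewer note: the "2 or more models" rule amounts to grouping findings and counting distinct reporters. The sketch below groups by a `file:line` location string, which is an assumption; in practice models describe the same issue differently, so the matching key would need more care. `split_consensus` is a hypothetical name.

```python
from collections import defaultdict


def split_consensus(findings_by_model: dict):
    """Hypothetical helper: separate locations reported by 2+ models from the rest."""
    models_by_location = defaultdict(set)
    for model, locations in findings_by_model.items():
        for location in locations:
            models_by_location[location].add(model)
    consensus = {}
    unique = {}
    for location, models in models_by_location.items():
        bucket = consensus if len(models) >= 2 else unique
        bucket[location] = sorted(models)
    return consensus, unique
```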
+
+### 2. Catalog Unique Findings
+
+Each model has different strengths. Unique findings may be valid insights others missed:
+
+| Model | Strength | Unique Finding Type |
+| ---------------------- | ------------ | ------------------------------------------ |
+| gpt-5.5 | Logic errors | Edge cases, off-by-one, null handling |
+| gemini-3.1-pro-preview | Pragmatic | Missing tests, documentation gaps |
+| claude-opus | Architecture | Coupling, abstraction leaks, design issues |
+
+### 3. Resolve Conflicts
+
+When models disagree:
+
+1. **Examine the target** directly to verify claims
+2. **Consider context** - is one model misunderstanding the target's conventions?
+3. **Note uncertainty** if unresolvable
+
+```markdown
+## Disputed Findings
+
+- **[Issue]**: gpt-5.5 says X, gemini says Y
+  - My assessment: [your determination after examining the target]
+```
+
+### 3.5. Extract Cross-Review Insights
+
+What does the *combination* of reviews reveal that no single review shows?
+
+1. **Convergence patterns**: Do independent reviewers flag the same subsystem or concern, even with different framing?
+   Shared convergence on an area amplifies its importance.
+2. **Blind spots from disagreement**: When one model flags a risk that others ignore, note whether the others lacked
+   evidence or lacked the analytical frame to see it.
+3. **Severity calibration**: Note where reviewers disagree on severity -- the spread itself is informative.
+4. **Mechanical/parsing findings**: Findings based on literal parsing (syntax errors, invalid markup, broken links,
+   wrong field names) are uniquely valuable from multi-model review. Elevate these regardless of which single model
+   found them.
+
+### 4. Create Unified Priority List
+
+Rank all validated findings by:
+
+1. **Severity**: Critical > High > Medium > Low
+2. **Confidence**: Consensus > Unique (verified) > Unique (unverified)
+3. **Scope**: Widespread > Isolated
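Reviewer note: the three ranking criteria read naturally as a lexicographic sort key, with severity compared first and scope last. That tie-break order is an interpretation of the numbered list, and the `Finding` class and label strings below are hypothetical.

```python
from dataclasses import dataclass

# Lower rank sorts first; orderings follow the three criteria above.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}
CONFIDENCE_RANK = {"consensus": 0, "unique-verified": 1, "unique-unverified": 2}
SCOPE_RANK = {"widespread": 0, "isolated": 1}


@dataclass
class Finding:
    title: str
    severity: str
    confidence: str
    scope: str


def prioritize(findings):
    """Hypothetical helper: order findings by severity, then confidence, then scope."""
    return sorted(
        findings,
        key=lambda f: (
            SEVERITY_RANK[f.severity],
            CONFIDENCE_RANK[f.confidence],
            SCOPE_RANK[f.scope],
        ),
    )
```

Because `sorted` is stable, findings that tie on all three criteria keep their input order.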
+
+### 5. Suggest Fix Order
+
+Consider dependencies when ordering fixes:
+
+```markdown
+## Recommended Fix Order
+
+1. [Critical issue] - blocks other fixes
+2. [High issue] - foundation for others
+3. [Medium issues] - can be parallelized
+4. [Low issues] - nice to have
+```
+
+## Output Format
+
+```markdown
+# Multi-Model Review: [Target Name]
+
+## Summary
+- Models consulted: 3 (gpt-5.5, gemini-3.1-pro-preview, claude-opus)
+- Consensus issues: N
+- Unique findings: N
+- Conflicts resolved: N
+
+## Consensus Findings (High Confidence)
+[...]
+
+## Unique Findings Worth Noting
+[...]
+
+## Disputed or Uncertain
+[...]
+
+## Recommended Fix Order
+[...]
+```