@exaudeus/workrail 3.1.0 → 3.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@exaudeus/workrail",
- "version": "3.1.0",
+ "version": "3.2.0",
  "description": "Step-by-step workflow enforcement for AI agents via MCP",
  "license": "MIT",
  "repository": {
@@ -1,7 +1,7 @@
  {
  "id": "coding-task-workflow-agentic",
  "name": "Agentic Task Dev Workflow (Lean • Notes-First • WorkRail Executor)",
- "version": "2.1.0",
+ "version": "1.0.0",
  "description": "Lean variant of the agentic coding workflow. Merges triage, inputs gate, context gathering, and re-triage into a single Understand & Classify phase. Reduces context variable count and removes top-level clarificationPrompts. Same quality guarantees with fewer tokens.",
  "recommendedPreferences": {
  "recommendedAutonomy": "guided",
@@ -58,7 +58,7 @@
  { "var": "rigorMode", "equals": "QUICK" }
  ]
  },
- "prompt": "Generate a lightweight design inline. QUICK rigor means the path is clear and risk is low.\n\nProduce two mandatory candidates:\n1. The simplest possible change that satisfies acceptance criteria\n2. Follow the existing repo pattern for this kind of change\n\nFor each candidate:\n- One-sentence summary\n- Key tradeoff\n- Failure mode to watch\n- Philosophy fit (name specific principles)\n\nCompare and recommend. If both converge on the same approach, say so honestly.\n\nWrite the output to `design-candidates.md` using the same structure as the deep design path:\n- Problem Understanding\n- Candidates (each with: summary, tradeoff, failure mode, philosophy fit)\n- Comparison and Recommendation\n\nSet context variable:\n- `designSummary` one-paragraph summary of the recommendation and why",
+ "prompt": "Generate a lightweight design inline. QUICK rigor means the path is clear and risk is low.\n\nProduce two mandatory candidates:\n1. The simplest possible change that satisfies acceptance criteria\n2. Follow the existing repo pattern for this kind of change\n\nFor each candidate:\n- One-sentence summary\n- Which tensions it resolves and which it accepts\n- How it relates to existing repo patterns (follows / adapts / departs)\n- Failure mode to watch\n- Philosophy fit (name specific principles)\n\nCompare and recommend. If both converge on the same approach, say so honestly.\n\nWrite the output to `design-candidates.md` with this structure:\n- Problem Understanding (core tensions, what makes it hard)\n- Philosophy Constraints (which principles matter for this problem)\n- Candidates (each with: summary, tensions resolved/accepted, failure mode, philosophy fit)\n- Comparison and Recommendation\n- Open Questions (if any remain)",
  "requireConfirmation": false
  },
  {
@@ -112,13 +112,30 @@
  },
  "body": [
  {
- "id": "phase-2a-review-design",
- "title": "Review Design for Gaps, Issues, and Improvements",
- "prompt": "Review the selected architecture using the explicit tradeoffs and failure modes from Phase 1 as your review criteria — not a generic gaps checklist.\n\nTargeted review (derived from Phase 1 outputs):\n1. Are the `acceptedTradeoffs` actually acceptable? For each accepted tradeoff, verify it won't violate acceptance criteria or invariants under realistic conditions.\n2. Are the `identifiedFailureModes` actually handled? For each failure mode, trace through the design and confirm there's a mitigation path. If not, flag it.\n3. Does the selected approach's relationship to existing repo patterns hold up? If it 'adapts' an existing pattern, verify the adaptation doesn't break the invariants the original pattern protects.\n4. Is there a simpler version of the selected approach that still satisfies acceptance criteria? (Complexity must continue to justify itself.)\n\nCompare against the runner-up:\n- Are there elements from the runner-up that would strengthen the selected approach without adding complexity?\n- Would a hybrid resolve an accepted tradeoff that's bothering you?\n\nPhilosophy alignment: does the architecture respect the user's active coding rules?\n\nBefore delegating, state your current assessment: what do you think the strongest and weakest parts of the design are right now?\n\nMode-adaptive delegation:\n- QUICK: self-review only\n- STANDARD: optionally spawn ONE WorkRail Executor running `routine-hypothesis-challenge` focused specifically on the accepted tradeoffs and failure modes\n- THOROUGH: spawn TWO WorkRail Executors — `routine-hypothesis-challenge` on tradeoffs + `routine-execution-simulation` on failure modes\n\nAfter receiving subagent output (if used), interrogate it against your pre-assessment. Do not adopt their framing wholesale. State what changed your thinking and what didn't.\n\nIf issues are found, fix the design (update `selectedApproach`, `architectureRationale`, `pivotTriggers`, `acceptedTradeoffs`, `identifiedFailureModes`) before continuing.\n\nSet context variables:\n- `designFindings`\n- `designRevised`",
+ "id": "phase-2a-pre-assess-design-review",
+ "title": "Pre-Assess Design Review",
+ "prompt": "Before the detailed design review, state your current assessment in 2-4 sentences.\n\nSay:\n- what you think the strongest part of the selected design is right now\n- what you think the weakest part is right now\n- which tradeoff or failure mode worries you most\n\nThis is your reference point for interpreting the review findings.\n\nSet context variable:\n- `designReviewAssessment`",
  "requireConfirmation": false
  },
  {
- "id": "phase-2b-loop-decision",
+ "id": "phase-2b-design-review-core",
+ "title": "Design Review Core",
+ "templateCall": {
+ "templateId": "wr.templates.routine.design-review",
+ "args": {
+ "deliverableName": "design-review-findings.md"
+ }
+ },
+ "requireConfirmation": false
+ },
+ {
+ "id": "phase-2c-synthesize-design-review",
+ "title": "Synthesize Design Review Findings",
+ "prompt": "Read `design-review-findings.md` and synthesize the review into workflow-owned decisions.\n\nPart A — Compare against your pre-assessment:\nRevisit `designReviewAssessment`.\n- What did the review confirm?\n- What did it surface that you missed?\n- What changed your mind and what held firm?\n\nPart B — Optional mode-adaptive challenge around the review findings:\n- QUICK: self-synthesize only\n- STANDARD: optionally spawn ONE WorkRail Executor running `routine-hypothesis-challenge` focused on the most serious review finding\n- THOROUGH: optionally spawn TWO WorkRail Executors — `routine-hypothesis-challenge` on the most serious finding + `routine-execution-simulation` on the most dangerous failure mode\n\nPart C — Decide:\nInterpret the findings yourself. Do not adopt the review artifact or any subagent framing wholesale.\n\nIf issues are found, fix the design (update `selectedApproach`, `architectureRationale`, `pivotTriggers`, `acceptedTradeoffs`, `identifiedFailureModes`) before continuing.\n\nSet context variables:\n- `designFindings`\n- `designRevised`",
+ "requireConfirmation": false
+ },
+ {
+ "id": "phase-2d-loop-decision",
  "title": "Design Review Loop Decision",
  "prompt": "Provide a loop control artifact.\n\nDecision rules:\n- if `designFindings` is non-empty and design was revised -> continue (verify the revision)\n- if `designFindings` is empty -> stop\n- if max iterations reached -> stop and document remaining concerns\n\nOutput exactly:\n```json\n{\n \"artifacts\": [{\n \"kind\": \"wr.loop_control\",\n \"decision\": \"continue\"\n }]\n}\n```",
  "requireConfirmation": false,
@@ -242,13 +259,24 @@
  },
  "body": [
  {
- "id": "phase-7a-verify-and-fix",
- "title": "Verify Integration and Fix Issues",
- "prompt": "Perform integration verification across all implemented slices.\n\nRequired:\n- verify acceptance criteria\n- map invariants to concrete proof (tests, build results, explicit reasoning)\n- run whole-task validation commands\n- identify any invariant violations or regressions\n- confirm the implemented result aligns with the user's coding philosophy, naming any tensions explicitly\n- review cumulative drift across all slices\n- check whether repeated small compromises added up to a larger pattern problem\n\nIf issues are found, fix them immediately:\n- apply code fixes\n- re-run affected tests\n- update `implementation_plan.md` if the fix changed boundaries or approach\n\nSet context variables:\n- `integrationFindings`\n- `integrationPassed`\n- `regressionDetected`",
+ "id": "phase-7a-final-verification-core",
+ "title": "Final Verification Core",
+ "templateCall": {
+ "templateId": "wr.templates.routine.final-verification",
+ "args": {
+ "deliverableName": "final-verification-findings.md"
+ }
+ },
+ "requireConfirmation": false
+ },
+ {
+ "id": "phase-7b-fix-and-summarize",
+ "title": "Fix Issues and Summarize Verification",
+ "prompt": "Read `final-verification-findings.md` and turn it into workflow-owned decisions and fixes.\n\nRequired:\n- interpret the findings yourself rather than rubber-stamping them\n- identify any invariant violations or regressions that must be fixed now\n- if issues are found, fix them immediately\n- re-run affected tests\n- update `implementation_plan.md` if the fix changed boundaries or approach\n\nSet context variables:\n- `integrationFindings`\n- `integrationPassed`\n- `regressionDetected`",
  "requireConfirmation": false
  },
  {
- "id": "phase-7b-loop-decision",
+ "id": "phase-7c-loop-decision",
  "title": "Final Verification Loop Decision",
  "prompt": "Provide a loop control artifact.\n\nDecision rules:\n- if `integrationFindings` is non-empty and fixes were applied -> continue (re-verify the fixes)\n- if `integrationFindings` is empty or all issues resolved -> stop and produce handoff\n- if max iterations reached -> stop and document remaining concerns\n\nWhen stopping, include the handoff summary:\n- acceptance criteria status\n- invariant status\n- test/build summary\n- concise PR/MR description draft (why, test plan, rollout notes)\n- follow-up tickets\n- any philosophy tensions accepted intentionally and why\n\nKeep the handoff concise and executive-level. Do not auto-merge or push unless the user explicitly asks.\n\nOutput exactly:\n```json\n{\n \"artifacts\": [{\n \"kind\": \"wr.loop_control\",\n \"decision\": \"continue\"\n }]\n}\n```",
  "requireConfirmation": true,
@@ -0,0 +1,60 @@
+ {
+ "id": "routine-design-review",
+ "name": "Design Review Routine",
+ "version": "1.0.0",
+ "description": "Reviews a selected design using explicit tradeoffs, failure modes, simpler-alternative checks, runner-up comparison, and philosophy alignment. Produces a reusable design-review findings artifact.",
+ "clarificationPrompts": [
+ "What design artifact or summary should I review?",
+ "What tradeoffs, failure modes, and runner-up option are available?",
+ "What artifact name should I produce?"
+ ],
+ "preconditions": [
+ "A selected design or design summary is available",
+ "Accepted tradeoffs and failure modes are available",
+ "A runner-up or alternative approach is available",
+ "The dev's philosophy or rules are available"
+ ],
+ "metaGuidance": [
+ "PURPOSE: review the quality of a selected design, not generate a fresh design from scratch.",
+ "ROLE: you are a reviewer looking for real gaps, not generic criticism.",
+ "PHILOSOPHY: name tensions by principle when they matter.",
+ "SIMPLICITY: always ask whether a simpler version would still satisfy acceptance criteria."
+ ],
+ "steps": [
+ {
+ "id": "step-review-tradeoffs",
+ "title": "Step 1: Review Accepted Tradeoffs",
+ "prompt": "Review the selected design by walking through the accepted tradeoffs explicitly.\n\nFor each accepted tradeoff:\n- Verify it will not violate acceptance criteria or invariants under realistic conditions\n- Identify what would make the tradeoff no longer acceptable\n- Note any hidden assumptions\n\nWorking notes:\n- Tradeoff review\n- Hidden assumptions\n- Conditions under which the tradeoff fails",
+ "agentRole": "You are a reviewer validating whether accepted tradeoffs are actually acceptable.",
+ "requireConfirmation": false
+ },
+ {
+ "id": "step-review-failure-modes",
+ "title": "Step 2: Review Failure Modes",
+ "prompt": "Review the identified failure modes for the selected design.\n\nFor each failure mode:\n- Trace whether the design handles it adequately\n- Identify missing mitigations\n- Note which failure mode is most dangerous if it occurs\n\nWorking notes:\n- Failure mode coverage\n- Missing mitigations\n- Highest-risk failure mode",
+ "agentRole": "You are a failure analyst checking whether the design can survive realistic problems.",
+ "requireConfirmation": false
+ },
+ {
+ "id": "step-compare-runner-up",
+ "title": "Step 3: Compare Against Runner-Up and Simpler Alternatives",
+ "prompt": "Compare the selected design against the runner-up and a simpler possible variant.\n\nReview:\n- Whether the runner-up has elements worth pulling into the selected design\n- Whether a hybrid would resolve an uncomfortable tradeoff without adding much complexity\n- Whether a simpler version of the selected design would still satisfy acceptance criteria\n\nWorking notes:\n- Runner-up strengths worth borrowing\n- Simpler alternative analysis\n- Hybrid opportunities",
+ "agentRole": "You are comparing options honestly rather than defending the current favorite.",
+ "requireConfirmation": false
+ },
+ {
+ "id": "step-review-philosophy",
+ "title": "Step 4: Review Philosophy Alignment",
+ "prompt": "Review the selected design against the dev's philosophy.\n\nName the principles that matter for this design and assess:\n- Which principles are satisfied clearly\n- Which principles are under tension\n- Which tensions are acceptable versus risky\n\nWorking notes:\n- Relevant principles\n- Satisfied principles\n- Tensions and why they matter",
+ "agentRole": "You are checking whether the design respects what the dev actually values.",
+ "requireConfirmation": false
+ },
+ {
+ "id": "step-deliver",
+ "title": "Step 5: Deliver Design Review Findings",
+ "prompt": "Create `{deliverableName}`.\n\nRequired structure:\n- Tradeoff Review\n- Failure Mode Review\n- Runner-Up / Simpler Alternative Review\n- Philosophy Alignment\n- Findings (Red / Orange / Yellow or equivalent severity)\n- Recommended Revisions\n- Residual Concerns\n\nOptimize for concise, actionable findings that the main workflow can interpret and decide on.",
+ "agentRole": "You are delivering a review artifact for the main workflow to synthesize and act on.",
+ "requireConfirmation": false
+ }
+ ]
+ }
@@ -0,0 +1,62 @@
+ {
+ "id": "routine-final-verification",
+ "name": "Final Verification Routine",
+ "version": "1.0.0",
+ "description": "Performs reusable final verification over acceptance criteria, invariants, validation evidence, regressions, cumulative drift, and philosophy alignment. Produces a proof-oriented verification artifact built around claim -> evidence -> gap -> severity -> readiness verdict.",
+ "clarificationPrompts": [
+ "What implementation or slices should I verify?",
+ "What acceptance criteria and invariants must hold?",
+ "What validation commands or evidence are available?",
+ "What artifact name should I produce?"
+ ],
+ "preconditions": [
+ "Implementation is available for review",
+ "Acceptance criteria are available",
+ "Invariants are available",
+ "A deterministic validation path exists"
+ ],
+ "metaGuidance": [
+ "PURPOSE: verify whether the whole task is truly done, not just locally green.",
+ "ROLE: you are a verifier proving or disproving readiness using evidence.",
+ "PROOF: map every readiness claim to tests, build output, artifacts, or explicit reasoning.",
+ "SEVERITY: every gap should be classified clearly so the caller knows what blocks shipping.",
+ "DRIFT: look for cumulative compromise, not just isolated defects."
+ ],
+ "steps": [
+ {
+ "id": "step-map-claims-to-proof",
+ "title": "Step 1: Map Acceptance Criteria and Invariants to Proof",
+ "prompt": "Map the implementation's readiness claims to concrete proof.\n\nFor each acceptance criterion and invariant:\n- state the claim clearly\n- identify the strongest supporting evidence (test, build output, artifact, code reasoning)\n- note whether the proof is strong, partial, or missing\n- record any gap that prevents the claim from being fully proven\n\nWorking notes:\n- Claim -> proof matrix\n- Strong / partial / missing proof\n- Gaps that weaken readiness",
+ "agentRole": "You are a verifier mapping claims to concrete proof.",
+ "requireConfirmation": false
+ },
+ {
+ "id": "step-review-validation-evidence",
+ "title": "Step 2: Review Validation Evidence Quality",
+ "prompt": "Review the overall validation evidence quality.\n\nCheck:\n- whether the right validation commands were run\n- whether the evidence is trustworthy and sufficient for readiness\n- whether any critical area has only weak or indirect proof\n- whether additional validation would materially change confidence\n\nWorking notes:\n- Validation commands reviewed\n- Evidence strength assessment\n- Missing or weak proof",
+ "agentRole": "You are checking whether the validation story is actually strong enough to trust.",
+ "requireConfirmation": false
+ },
+ {
+ "id": "step-classify-gaps-and-regressions",
+ "title": "Step 3: Classify Gaps, Regressions, and Drift by Severity",
+ "prompt": "Review the implementation for regressions, drift, and unresolved gaps.\n\nCheck:\n- invariant violations or regressions\n- whether repeated small compromises added up to a larger pattern problem\n- whether the implementation still matches intended plan boundaries\n- whether any proof gaps should block shipping versus merely lower confidence\n\nClassify each issue by severity:\n- Red: blocks readiness\n- Orange: should be fixed before shipping if possible\n- Yellow: acceptable tension or bounded follow-up\n\nWorking notes:\n- Regressions found\n- Drift assessment\n- Severity-classified gaps",
+ "agentRole": "You are looking for the subtle ways a task can go wrong even when individual slices seemed fine.",
+ "requireConfirmation": false
+ },
+ {
+ "id": "step-review-philosophy",
+ "title": "Step 4: Review Philosophy Alignment",
+ "prompt": "Review the final result against the dev's philosophy.\n\nAssess:\n- which principles are clearly satisfied\n- which tensions remain intentionally accepted\n- which philosophy violations should be severity Red, Orange, or Yellow\n- whether any philosophy concern changes the readiness verdict\n\nWorking notes:\n- Satisfied principles\n- Accepted tensions\n- Severity-classified philosophy concerns",
+ "agentRole": "You are checking whether the finished result still reflects the dev's standards.",
+ "requireConfirmation": false
+ },
+ {
+ "id": "step-deliver",
+ "title": "Step 5: Deliver Final Verification Findings",
+ "prompt": "Create `{deliverableName}`.\n\nRequired structure:\n- Readiness Claims and Proof Matrix\n - claim\n - supporting evidence\n - proof strength (strong / partial / missing)\n - proof gap\n- Validation Evidence Summary\n- Severity-Classified Gaps\n - Red (blocking)\n - Orange (should fix)\n - Yellow (accepted tension / follow-up)\n- Regression / Drift Review\n- Philosophy Alignment\n- Recommended Fixes\n- Readiness Verdict\n - Ready\n - Ready with Accepted Tensions\n - Not Ready\n\nOptimize for a compact artifact the main workflow can use to decide whether to fix, re-verify, or hand off.",
+ "agentRole": "You are delivering a verification artifact the main workflow can interpret and act on.",
+ "requireConfirmation": false
+ }
+ ]
+ }
@@ -1,113 +1,69 @@
  {
  "id": "routine-hypothesis-challenge",
  "name": "Hypothesis Challenge Routine",
- "version": "1.1.0",
- "description": "Adversarial testing of hypotheses using an Ideate -> Plan -> Execute strategy. Configurable rigor levels (1-5) allow for progressively deeper skepticism and stress testing.",
+ "version": "1.0.0",
+ "description": "Lean adversarial review of a hypothesis, recommendation, or diagnosis. Produces the strongest counter-argument, exposes weak assumptions and evidence gaps, identifies likely failure modes, and defines the critical tests needed to keep, revise, or reject the current claim.",
  "clarificationPrompts": [
- "What hypotheses or assumptions should I challenge?",
- "What rigor level do you need? (1=Surface, 3=Deep, 5=Maximum)",
- "What evidence supports these hypotheses?",
- "What context should I consider? (bug description, findings, constraints)"
+ "What hypothesis, recommendation, or diagnosis should I challenge?",
+ "What evidence currently supports it?",
+ "What depth do you need? (QUICK / STANDARD / THOROUGH; legacy rigor=1/3/5 is still accepted for compatibility)",
+ "What artifact name should I produce?"
  ],
  "preconditions": [
- "Hypotheses or assumptions are clearly stated",
- "Rigor level (1-5) is specified",
- "Supporting evidence is available",
- "Agent has read access to relevant context"
+ "A target hypothesis, recommendation, or diagnosis is available",
+ "Supporting evidence or reasoning is available",
+ "Relevant context is available for challenge"
  ],
  "metaGuidance": [
- "**ROUTINE PURPOSE:**",
- "This routine performs adversarial testing. It separates strategy (Ideating attack vectors) from execution (Challenging/Stress Testing).",
- "**PHASES:**",
- "1. IDEATE: Brainstorm attack vectors and common flaws",
- "2. STRATEGIZE: Define the challenge plan for the requested rigor",
- "3. EXECUTE: Run the challenge (Surface/Deep/Max)",
- "4. SYNTHESIZE: Deliver verdicts and alternative explanations",
- "**CORE PRINCIPLES:**",
- "- ADVERSARIAL: Actively try to disprove, don't confirm",
- "- SYSTEMATIC: Challenge assumptions, logic, and evidence",
- "- CONSTRUCTIVE: Goal is to strengthen truth, not just destroy"
+ "PURPOSE: strengthen truth by trying to break the current story.",
+ "ROLE: you are an adversarial reviewer, not a neutral summarizer.",
+ "SCOPE: challenge assumptions, evidence quality, and likely failure modes.",
+ "DISCIPLINE: produce concrete counter-arguments and critical tests, not vague skepticism.",
+ "DEPTH: QUICK = strongest counter only; STANDARD = counter + failure-mode review; THOROUGH = add alternative explanations and sharper discrimination tests.",
+ "COMPATIBILITY: prefer depth language (QUICK / STANDARD / THOROUGH). Treat legacy rigor values as adapter input, not the primary model."
  ],
  "steps": [
  {
- "id": "step-0-ideate-vectors",
- "title": "Step 0: Ideate Challenge Strategy",
- "prompt": "**IDEATE ATTACK VECTORS**\n\nBefore diving into specific hypotheses, step back and look at the whole picture.\n\n**YOUR MISSION:** Brainstorm ways to break these hypotheses.\n\n**EXECUTE:**\n1. Review all hypotheses together\n2. Identify shared assumptions (do they all assume X?)\n3. Brainstorm classes of failure (Concurrency? State? Logic?)\n4. Identify weak points in the evidence provided\n\n**REFLECT:**\n- Are these hypotheses too similar?\n- Are they missing a whole category of explanation?\n- What is the most likely \"unknown unknown\"?\n\n**WORKING NOTES:**\n- Common Assumptions\n- Potential Attack Vectors (e.g., Race Conditions, Edge Cases)\n- Evidence Weaknesses",
- "agentRole": "You are a red-team strategist planning your attack.",
- "requireConfirmation": false,
- "guidance": [
- "BRAINSTORM: Look for systemic issues first",
- "CATEGORIES: Think in categories (Logic, Data, Timing, Environment)",
- "SKEPTICISM: Assume the hypotheses are wrong. Why?"
- ]
+ "id": "step-load-target",
+ "title": "Step 1: Load the Target Claim and Evidence",
+ "prompt": "Load the current claim you are challenging.\n\nCapture:\n- the target claim in one sentence\n- the main assumptions it depends on\n- the strongest supporting evidence currently available\n- what result would count as meaningful disproof\n\nKeep this step compact and precise. The goal is to define exactly what is under challenge and what would falsify it.",
+ "agentRole": "You are defining exactly what claim is on trial and what would falsify it.",
+ "requireConfirmation": false
  },
  {
- "id": "step-1-plan-challenge",
- "title": "Step 1: Plan Challenge Tactics",
- "prompt": "**DEFINE CHALLENGE PLAN**\n\nNow define your specific tactics for the requested rigor level.\n\n**YOUR MISSION:** Create a concrete plan to test these hypotheses.\n\n**EXECUTE:**\n1. Map attack vectors to specific hypotheses\n2. Define **Key Questions** to answer\n3. Select **Stress Tests** or **Counter-Examples** to search for\n4. Define criteria for \"disproof\"\n\n**DELIVERABLE:**\nCreate `challenge-strategy.md`:\n- Attack Plan for each Hypothesis\n- Key Assumptions to Probe\n- Required Evidence check",
- "agentRole": "You are a lead auditor defining the scope of the audit.",
- "requireConfirmation": false,
- "guidance": [
- "TACTICS: Be specific (e.g., \"Check for null user in auth flow\")",
- "CRITERIA: What would convince you the hypothesis is false?"
- ]
+ "id": "step-break-claim",
+ "title": "Step 2: Find the Strongest Counter-Argument",
+ "prompt": "Find the strongest case against the current claim.\n\nChallenge it by asking:\n- What is the strongest counter-argument or competing explanation?\n- What evidence could be interpreted differently?\n- What hidden assumption is carrying too much weight?\n- What would a sharp skeptic say first?\n\nOptimize for the single strongest attack, not a long list of weak objections.",
+ "agentRole": "You are a sharp skeptic trying to overturn the current favorite with the strongest available attack.",
+ "requireConfirmation": false
  },
  {
- "id": "step-execute-rigor-1",
- "title": "Execution: Rigor 1 (Surface)",
- "runCondition": {
- "var": "rigor",
- "gte": 1
- },
- "prompt": "**EXECUTE RIGOR 1: SURFACE CHALLENGE**\n\nExecute your challenge plan at Rigor 1 (Surface).\n\n**MISSION:** Identify obvious flaws and simple counter-examples.\n\n**EXECUTE:**\n1. Follow `challenge-strategy.md`\n2. Check for obvious logical gaps\n3. Identify simple counter-examples\n4. Check for Occam's Razor alternatives\n\n**WORKING NOTES:**\n- Obvious Flaws\n- Simple Counter-Examples\n- Better Alternatives",
- "agentRole": "You are a skeptical reviewer looking for quick wins.",
- "requireConfirmation": false,
- "guidance": [
- "FOCUS: Obvious errors, simple logic gaps",
- "SPEED: Don't dig deep yet, look for low-hanging fruit"
- ]
- },
- {
- "id": "step-execute-rigor-3",
- "title": "Execution: Rigor 3 (Deep)",
- "runCondition": {
- "var": "rigor",
- "gte": 3
- },
- "prompt": "**EXECUTE RIGOR 3: DEEP CHALLENGE**\n\nExecute your challenge plan at Rigor 3 (Deep Analysis).\n\n**MISSION:** Deeply challenge with edge cases and hidden assumptions.\n\n**EXECUTE:**\n1. Follow `challenge-strategy.md`\n2. Expose hidden assumptions\n3. Generate systematic edge cases\n4. Analyze timing and environment factors\n\n**WORKING NOTES:**\n- Hidden Assumptions Exposed\n- Edge Case Analysis\n- Environmental Factors",
- "agentRole": "You are a rigorous auditor digging for structural flaws.",
- "requireConfirmation": false,
- "guidance": [
- "FOCUS: Unstated assumptions, boundary conditions",
- "DEPTH: trace logic chains to find breaks"
- ]
+ "id": "step-review-failure-modes",
+ "title": "Step 3: Review Weak Evidence and Likely Failure Modes",
+ "prompt": "Probe where the current claim could fail under realistic pressure.\n\nReview:\n- the weakest part of the evidence chain\n- the most likely failure modes if the claim is wrong\n- edge cases or environmental factors that could invalidate the conclusion\n- contradictions, unexplained facts, or missing proof\n\nFocus on the few things most likely to flip the conclusion rather than exhaustive enumeration.",
+ "agentRole": "You are testing whether the current story survives realistic pressure and real evidence quality.",
+ "requireConfirmation": false
  },
  {
- "id": "step-execute-rigor-5",
- "title": "Execution: Rigor 5 (Maximum)",
+ "id": "step-thorough-alternatives",
+ "title": "Step 4: Generate Alternative Explanations and Critical Tests",
  "runCondition": {
- "var": "rigor",
- "gte": 5
+ "or": [
+ { "var": "depth", "equals": "THOROUGH" },
+ { "var": "rigorMode", "equals": "THOROUGH" },
+ { "var": "rigor", "gte": 5 }
+ ]
  },
- "prompt": "**EXECUTE RIGOR 5: MAX CHALLENGE**\n\nExecute your challenge plan at Rigor 5 (Maximum Skepticism).\n\n**MISSION:** Try to break it completely.\n\n**EXECUTE:**\n1. Follow `challenge-strategy.md`\n2. Exhaustive assumption enumeration\n3. Extreme edge cases and adversarial inputs\n4. Second-order effects and chaos scenarios\n\n**WORKING NOTES:**\n- Exhaustive Challenges\n- Extreme Scenarios\n- Disproof Attempts",
- "agentRole": "You are a relentless adversary trying to prove it wrong.",
- "requireConfirmation": false,
- "guidance": [
- "FOCUS: Breaking the system, extreme edge cases",
- "MINDSET: Trust nothing, verify everything"
- ]
+ "prompt": "For THOROUGH review, go beyond the primary counter-argument into alternatives and discrimination strategy.\n\nProduce:\n- the 1-2 strongest alternative explanations or competing hypotheses\n- why each might beat the current claim\n- the critical tests, observations, or traces that would discriminate between them\n- what result would cause you to keep, revise, or reject the current claim\n\nThis step exists to make THOROUGH meaningfully deeper than STANDARD, not just wordier.",
58
+ "agentRole": "You are building the shortest path to proving which explanation survives.",
59
+ "requireConfirmation": false
99
60
  },
100
61
  {
101
- "id": "step-synthesize",
102
- "title": "Step 5: Synthesize Verdicts",
103
- "prompt": "**SYNTHESIZE VERDICTS**\n\nSynthesize all challenges into final verdicts.\n\n**MISSION:** Deliver clear judgments on each hypothesis.\n\n**EXECUTE:**\n1. Review all Working Notes\n2. Assign Verdicts (Keep/Revise/Reject)\n3. Prioritize Alternatives\n4. Define Critical Tests\n\n**DELIVERABLE:**\nCreate `{deliverableName}`:\n- Executive Summary\n- Hypothesis Analysis & Verdicts\n- Critical Tests Needed\n- Recommendations",
104
- "agentRole": "You are a judge delivering the final verdict.",
105
- "requireConfirmation": false,
106
- "guidance": [
107
- "VERDICTS: Be decisive based on evidence",
108
- "ALTERNATIVES: Propose concrete, better explanations",
109
- "ACTION: What specifically needs to be tested?"
110
- ]
62
+ "id": "step-deliver",
63
+ "title": "Step 5: Deliver the Challenge Verdict",
64
+ "prompt": "Create `{deliverableName}`.\n\nRequired structure:\n- Target Claim\n- Strongest Counter-Argument\n- Weak Assumptions / Evidence Gaps\n- Likely Failure Modes\n- Alternative Explanations (if explored)\n- Critical Tests\n- Verdict: Keep / Revise / Reject\n- Next Action\n\nOptimize for a compact artifact that a main workflow can interrogate and act on immediately. Prefer decisive arguments over exhaustive ceremony.",
+ "agentRole": "You are delivering a decisive challenge artifact for the main workflow or caller to synthesize.",
+ "requireConfirmation": false
  }
  ]
- }
+ }
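The reworked `routine-hypothesis-challenge` prefers depth language (QUICK / STANDARD / THOROUGH) while still accepting legacy rigor values (1/3/5) as adapter input, and its THOROUGH step now gates on `depth`, `rigorMode`, or `rigor >= 5`. A minimal sketch of what such a caller-side adapter could look like, assuming a plain dict of context variables; the function names, the 3/5 thresholds for legacy values, and the STANDARD fallback default are illustrative assumptions, not part of the package:

```python
# Hypothetical adapter mirroring the routine's compatibility guidance:
# prefer explicit depth language, fall back to legacy 1-5 rigor values.
def normalize_depth(context: dict) -> str:
    """Map a step context to QUICK / STANDARD / THOROUGH."""
    depth = context.get("depth") or context.get("rigorMode")
    if depth in ("QUICK", "STANDARD", "THOROUGH"):
        return depth
    rigor = context.get("rigor")  # legacy 1-5 scale, treated as adapter input
    if isinstance(rigor, int):
        if rigor >= 5:
            return "THOROUGH"
        if rigor >= 3:
            return "STANDARD"
        return "QUICK"
    return "STANDARD"  # assumed default when nothing is specified


def runs_thorough_step(context: dict) -> bool:
    # Mirrors the step-thorough-alternatives runCondition from the diff:
    # depth == THOROUGH, or rigorMode == THOROUGH, or legacy rigor >= 5.
    return (
        context.get("depth") == "THOROUGH"
        or context.get("rigorMode") == "THOROUGH"
        or (isinstance(context.get("rigor"), int) and context["rigor"] >= 5)
    )
```

Note the precedence choice: explicit depth language wins over a legacy rigor value, which matches the guidance to treat rigor as adapter input rather than the primary model.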