@exaudeus/workrail 3.77.0 → 3.78.0
- package/dist/application/services/compiler/ref-registry.js +2 -1
- package/dist/console-ui/assets/{index-D9pYbwS0.js → index-CtQZQTW-.js} +1 -1
- package/dist/console-ui/index.html +1 -1
- package/dist/coordinators/pipeline-run-context.d.ts +18 -0
- package/dist/daemon/core/session-context.js +4 -7
- package/dist/daemon/types.d.ts +4 -4
- package/dist/manifest.json +19 -19
- package/dist/v2/durable-core/schemas/artifacts/discovery-handoff.d.ts +3 -0
- package/dist/v2/durable-core/schemas/artifacts/discovery-handoff.js +1 -0
- package/dist/v2/usecases/console-service.js +15 -4
- package/dist/v2/usecases/console-types.d.ts +3 -0
- package/docs/ideas/backlog.md +43 -32
- package/package.json +1 -1
- package/workflows/routines/hypothesis-challenge.json +2 -2
- package/workflows/wr.discovery.json +219 -88
@@ -1,7 +1,7 @@
  {
  "id": "wr.discovery",
  "name": "Discovery Workflow",
- "version": "3.
+ "version": "3.5.0",
  "metricsProfile": "research",
  "validatedAgainstSpecVersion": 3,
  "description": "Use this to explore and think through a problem end-to-end. Moves between landscape exploration, problem framing, candidate generation, adversarial challenge, and uncertainty resolution.",
@@ -21,6 +21,20 @@
  "wr.features.subagent_guidance"
  ],
  "assessments": [
+ {
+ "id": "framing-specificity-gate",
+ "purpose": "The problem frame has a specific falsification condition before entering candidate generation. Lighter gate used on the landscape_first path.",
+ "dimensions": [
+ {
+ "id": "framing_specificity",
+ "purpose": "A concrete falsification condition is named -- one thing that if discovered true would change the path or direction. Generic caveats do not count.",
+ "levels": [
+ "vague",
+ "specific"
+ ]
+ }
+ ]
+ },
  {
  "id": "framing-rigor-gate",
  "purpose": "The problem frame is specific enough to constrain candidate generation -- grounded with a concrete falsification condition and independent of the proposed solution.",
@@ -106,7 +120,14 @@
  "Parallelism: parallelize independent research lenses, stakeholder lenses, and bounded cognitive routines. Serialize synthesis, recommendation decisions, and canonical document writes.",
  "Capability fallbacks: if web browsing is unavailable, use repo context, user context, and internal knowledge, then record the evidence gaps explicitly. If delegation is unavailable, do the passes yourself in sequence.",
  "Doc discipline: keep append-only decision reasoning in the design doc. Record assumptions, contradictions, abandoned paths, and why the selected direction won.",
- "Boundary: this workflow can end with a recommendation memo, prototype or test plan, or a research-informed direction. It should not implement production code."
+ "Boundary: this workflow can end with a recommendation memo, prototype or test plan, or a research-informed direction. It should not implement production code.",
+ "Framing lock-in: no LLM-driven reframing before landscape research -- no quality gain was found (N=280). Pre-landscape framing lock-in is a known residual risk. Phase 0a surfaces assumptions but does not guarantee the frame is correct.",
+ "FrameValidityCheck in Phase 1g is the structural defense: after landscape research, check whether new information challenges the current frame before generating candidates.",
+ "Autonomous operation: in fully autonomous runs without human confirmation gates, the recommendation is provisional. Surface selectionTier from Phase 3e to the user. QUICK mode always produces provisional_recommendation regardless of observed signals.",
+ "Confidence calibration: recommendationConfidenceBand can be downgraded but never upgraded once a phase is complete. If challenge or review reveals new uncertainty, downgrade. Upstream findings cannot be washed out by downstream optimism.",
+ "Cross-family challenge: use a verified different model family for challenge/selection executors where possible. If cross-family cannot be verified, default to Rung 2 (same-family steelmanning) -- conservative degradation over optimistic claim.",
+ "Challenge quality self-judgment: structural/tactical/surface classification is self-produced -- a known residual risk. New evidence justifying a position change must reference a specific finding already recorded in designDocPath or a candidate output file.",
+ "Position-bias correction: the order-randomization check in Phase 3e is a prompt-level approximation -- it reduces but does not eliminate position bias. Structural randomization requires future engine support."
  ],
  "functionDefinitions": [
  {
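The "Confidence calibration" rule added in this hunk (a band may be downgraded but never upgraded) can be sketched as a small helper. This is an illustrative sketch, not code from the package: the band names `low`/`medium`/`high` and the function `applyBandUpdate` are assumptions, since the diff only names `'medium'` explicitly.

```typescript
// Assumed band names, ordered from least to most confident.
type ConfidenceBand = "low" | "medium" | "high";

const BAND_ORDER: readonly ConfidenceBand[] = ["low", "medium", "high"];

// Hypothetical helper: accepts a proposed band only if it is lower than the
// current one, so later phases can downgrade but never upgrade.
function applyBandUpdate(current: ConfidenceBand, proposed: ConfidenceBand): ConfidenceBand {
  return BAND_ORDER.indexOf(proposed) < BAND_ORDER.indexOf(current) ? proposed : current;
}

// A downstream "high" does not override an upstream "medium".
const afterReview = applyBandUpdate("medium", "high");
console.log(afterReview);
```

Keeping the update as a pure function over an ordered list makes the "no upgrade" property easy to check in isolation.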
@@ -120,14 +141,30 @@
  {
  "name": "prototypeSpecTemplate",
  "definition": "Use this shape for a lightweight prototype or validation artifact:\n## Goal\n## Non-goals\n## Learning question\n## Artifact type\n## What will be exercised\n## Falsification criteria\n## Test scenarios\n## Expected signals\n## Pivot / stop rule"
+ },
+ {
+ "name": "FrameValidityCheckTemplate",
+ "definition": "Use this shape when performing a FrameValidityCheck:\n## Frame Validity Check\n- Current frame (one sentence)\n- New information from landscape/research not present when frame was set\n- frameChallenge: valid | partial | needs_reframe\n- reframeRequired: { revisedFrame: string, changedAssumptions: string[] } | null\nNote: frameChallenge = 'needs_reframe' with reframeRequired = null is structurally invalid -- if the frame needs revision, provide the revision before proceeding."
+ },
+ {
+ "name": "SelectionOutputTemplate",
+ "definition": "Use this shape for the SelectionOutput (structured output value, not a doc section):\n## Selection Output\n- tier: strong_recommendation | provisional_recommendation | insufficient_signal\n- tierRationale: explicit reasoning checkable against the signals below\n- challengeQualityObserved: structural | tactical | surface\n- semanticDiversityObserved: high | medium | low\n- framingResolutionStatus: resolved | partial | unresolved\n- requiredActionOnInsufficient: what the user must do before acting (null if not insufficient_signal)\nTier rules:\n strong_recommendation: structural challenge + high/medium diversity + resolved framing\n provisional_recommendation: tactical challenge OR medium diversity OR partial framing\n insufficient_signal: surface challenge OR low diversity OR unresolved framing OR single-family with no verified external grounding OR QUICK mode (always provisional)\nNote: tier assignment uses only observable signals -- not self-reported confidence."
+ },
+ {
+ "name": "SelectionEvidenceTemplate",
+ "definition": "Use this shape for the fresh-context selection executor's return artifact:\n## Selection Evidence\n- Candidate ranking (ordered, one-sentence rationale per candidate)\n- challengeQualityObserved: structural | tactical | surface\n- semanticDiversityObserved: high | medium | low\n- framingResolutionStatus: resolved | partial | unresolved\n- keyReasoning: what most influenced the ranking\n- caveat: any limitation (single-family, no external grounding, model family unverified, etc.)"
+ },
+ {
+ "name": "framingDeepProcedureTemplate",
+ "definition": "Shared framing procedure (used by full_spectrum and design_first paths):\n- Capture the tensions as a structured list -- not just a count. Record identifiedTensions as an array of one-sentence tension descriptions. This list is passed verbatim to candidate generation executors.\n- Name ONE specific concrete condition that would make the current framing wrong -- not a generic caveat, but a specific thing that if discovered true would change the path or direction. Record as primaryFramingRisk in the design doc.\n- Record philosophySources as a list of file paths or Memory entry names encoding the decision-maker's principles and constraints. (Domain-specific lookup rules per problemDomain.) Empty list is acceptable if none exist."
  }
  ],
  "steps": [
  {
  "id": "phase-0-reframe",
- "title": "Phase 0a:
+ "title": "Phase 0a: Surface Assumptions and Define Success",
  "promptBlocks": {
- "goal": "
+ "goal": "Surface the assumptions baked into the stated goal, commit to what success looks like, and identify what would make the current framing wrong -- before any research begins. This is an assumption-commitment step: the output is a set of explicit commitments the rest of the workflow is held to, not a claim that reframing has improved the problem statement.",
  "constraints": [
  [
  {
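The tier rules in `SelectionOutputTemplate` above can be read as a decision function over the three observed signals. The sketch below is one possible reading, not the package's implementation. Its assumptions: QUICK mode is checked first and yields `provisional_recommendation` (following the "Autonomous operation" guidance, since the template text lists QUICK under `insufficient_signal` but annotates it "always provisional"); medium diversity does not by itself block `strong_recommendation`; and the "single-family with no verified external grounding" condition is not modeled. The `quickMode` flag and `assignTier` name are hypothetical.

```typescript
type ChallengeQuality = "structural" | "tactical" | "surface";
type Diversity = "high" | "medium" | "low";
type FramingStatus = "resolved" | "partial" | "unresolved";
type Tier = "strong_recommendation" | "provisional_recommendation" | "insufficient_signal";

interface SelectionSignals {
  challengeQualityObserved: ChallengeQuality;
  semanticDiversityObserved: Diversity;
  framingResolutionStatus: FramingStatus;
  quickMode: boolean; // hypothetical stand-in for rigorMode = QUICK
}

function assignTier(s: SelectionSignals): Tier {
  // Assumption: QUICK always yields a provisional recommendation.
  if (s.quickMode) return "provisional_recommendation";
  // Any one weak signal is enough for insufficient_signal.
  if (
    s.challengeQualityObserved === "surface" ||
    s.semanticDiversityObserved === "low" ||
    s.framingResolutionStatus === "unresolved"
  ) {
    return "insufficient_signal";
  }
  // Diversity is high or medium at this point (assumed acceptable for strong).
  if (s.challengeQualityObserved === "structural" && s.framingResolutionStatus === "resolved") {
    return "strong_recommendation";
  }
  return "provisional_recommendation";
}
```

Checking the weakest tier before the strongest means a single poor signal cannot be outvoted by two good ones, which matches the note that tier assignment uses only observable signals.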
@@ -154,7 +191,8 @@
  "The success criteria would let a skeptic determine whether the work actually succeeded.",
  "If the goal was a solution-statement, the underlying problem is stated independently of the proposed solution.",
  "The reframed problem is meaningfully different from the original goal wording, or you have explicitly noted why the original framing was already problem-shaped.",
- "`idealEndState` describes the best achievable outcome, not the most defensible one. If they are the same, say so explicitly."
+ "`idealEndState` describes the best achievable outcome, not the most defensible one. If they are the same, say so explicitly.",
+ "For each alternative frame produced: name one candidate solution it would generate that the original frame would not. If you cannot name such a candidate, the frames are not substantively different -- iterate until they are."
  ]
  },
  "requireConfirmation": false
@@ -258,7 +296,8 @@
  "Gather a current-state summary, existing approaches or precedents, option categories, notable contradictions, strong constraints from the world, and evidence gaps.",
  "If `delegationAvailable = true`, decide whether parallel research is likely to give you a better read here. If yes, spawn TWO WorkRail Executors SIMULTANEOUSLY running `routine-context-gathering` with focus=COMPLETENESS and focus=DEPTH, then synthesize the outputs yourself. If not, keep going yourself and record why solo work is enough.",
  "Update `designDocPath` using `landscapePacketTemplate`.",
- "Set these keys in the next `continue_workflow` call's `context` object: `landscapeSummary`, `landscapeGapCount`, `contradictionCount`, `precedentCount`, `retriageNeeded`."
+ "Set these keys in the next `continue_workflow` call's `context` object: `landscapeSummary`, `landscapeGapCount`, `contradictionCount`, `precedentCount`, `retriageNeeded`.",
+ "Before proceeding: confirm landscape findings are recorded in `designDocPath`. Do not begin synthesis or candidate generation until the landscape packet is committed to the doc."
  ],
  "verify": [
  "Important contradictions or evidence gaps are explicit.",
@@ -293,7 +332,8 @@
  "Gather a current-state summary, existing approaches or precedents, option categories, notable contradictions, strong constraints from the world, and evidence gaps.",
  "If `delegationAvailable = true` and `rigorMode != QUICK`, decide whether parallel research is likely to sharpen the landscape meaningfully. If yes, spawn TWO WorkRail Executors SIMULTANEOUSLY running `routine-context-gathering` with focus=COMPLETENESS and focus=DEPTH, then synthesize the outputs yourself. If not, keep going yourself and record why solo work is enough.",
  "Update `designDocPath` using `landscapePacketTemplate`.",
- "Set these keys in the next `continue_workflow` call's `context` object: `landscapeSummary`, `landscapeGapCount`, `contradictionCount`, `precedentCount`, `retriageNeeded`."
+ "Set these keys in the next `continue_workflow` call's `context` object: `landscapeSummary`, `landscapeGapCount`, `contradictionCount`, `precedentCount`, `retriageNeeded`.",
+ "Before proceeding: confirm landscape findings are recorded in `designDocPath`. Do not begin synthesis or candidate generation until the landscape packet is committed to the doc."
  ],
  "verify": [
  "Important contradictions or evidence gaps are explicit.",
@@ -328,7 +368,8 @@
  "Gather a current-state summary, the main existing approaches or precedents, hard constraints from the world, obvious contradictions, and evidence gaps.",
  "If `delegationAvailable = true` and `rigorMode = THOROUGH`, decide whether a parallel scan is worth the extra step. If yes, spawn bounded research support and synthesize it yourself. If not, keep going yourself and record why solo work is enough.",
  "Update `designDocPath` using `landscapePacketTemplate`.",
- "Set these keys in the next `continue_workflow` call's `context` object: `landscapeSummary`, `landscapeGapCount`, `contradictionCount`, `precedentCount`, `retriageNeeded`."
+ "Set these keys in the next `continue_workflow` call's `context` object: `landscapeSummary`, `landscapeGapCount`, `contradictionCount`, `precedentCount`, `retriageNeeded`.",
+ "Before proceeding: confirm landscape findings are recorded in `designDocPath`. Do not begin synthesis or candidate generation until the landscape packet is committed to the doc."
  ],
  "verify": [
  "The design-first path is still grounded in reality rather than invention alone.",
@@ -370,6 +411,20 @@
  "The stakeholder/problem packet covers the core tension, primary stakeholders, success criteria, and framing risk."
  ]
  },
+ "assessmentRefs": [
+ "framing-specificity-gate"
+ ],
+ "assessmentConsequences": [
+ {
+ "when": {
+ "anyEqualsLevel": "vague"
+ },
+ "effect": {
+ "kind": "require_followup",
+ "guidance": "framing_specificity = vague: name a single concrete thing that, if discovered true, would change the path or direction chosen. Generic caveats do not count."
+ }
+ }
+ ],
  "functionReferences": [
  "problemFrameTemplate()"
  ],
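The `assessmentConsequences` block added above pairs a `when.anyEqualsLevel` condition with a `require_followup` effect. The diff does not show how the engine evaluates it; the sketch below assumes the plausible reading that the consequence fires when any assessed dimension is at the named level. The `requiredFollowups` function and the `dimensionLevels` map are hypothetical names for illustration only.

```typescript
// Shape taken from the JSON in this hunk.
interface AssessmentConsequence {
  when: { anyEqualsLevel: string };
  effect: { kind: "require_followup"; guidance: string };
}

// Hypothetical evaluator. Assumption: "anyEqualsLevel" matches if at least
// one dimension in the assessment result is at that level.
function requiredFollowups(
  dimensionLevels: { [dimensionId: string]: string },
  consequences: AssessmentConsequence[]
): string[] {
  const dimensionIds = Object.keys(dimensionLevels);
  return consequences
    .filter((c) => dimensionIds.some((id) => dimensionLevels[id] === c.when.anyEqualsLevel))
    .map((c) => c.effect.guidance);
}

const specificityGate: AssessmentConsequence[] = [
  {
    when: { anyEqualsLevel: "vague" },
    effect: { kind: "require_followup", guidance: "Name a concrete falsification condition." },
  },
];

// A vague framing yields one follow-up; a specific framing yields none.
const followups = requiredFollowups({ framing_specificity: "vague" }, specificityGate);
```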
@@ -413,6 +468,17 @@
  "`philosophySources` is recorded -- empty list is acceptable if none exist."
  ]
  },
+ "assessmentConsequences": [
+ {
+ "when": {
+ "anyEqualsLevel": "vague"
+ },
+ "effect": {
+ "kind": "require_followup",
+ "guidance": "framing_specificity = vague: name a concrete falsification condition. framing_independence = solution_embedded: restate the problem without the proposed solution -- what outcome or pain are you solving for?"
+ }
+ }
+ ],
  "functionReferences": [
  "problemFrameTemplate()"
  ],
@@ -456,6 +522,17 @@
  "`philosophySources` is recorded -- empty list is acceptable if none exist."
  ]
  },
+ "assessmentConsequences": [
+ {
+ "when": {
+ "anyEqualsLevel": "vague"
+ },
+ "effect": {
+ "kind": "require_followup",
+ "guidance": "framing_specificity = vague: name a concrete falsification condition. framing_independence = solution_embedded: restate the problem without the proposed solution -- what outcome or pain are you solving for?"
+ }
+ }
+ ],
  "functionReferences": [
  "problemFrameTemplate()"
  ],
@@ -463,38 +540,27 @@
  },
  {
  "id": "phase-1g-retriage",
- "title": "Phase 1g: Re-Triage
- "runCondition": {
- "or": [
- {
- "var": "retriageNeeded",
- "equals": true
- },
- {
- "var": "pathRecommendation",
- "equals": "design_first"
- },
- {
- "var": "pathRecommendation",
- "equals": "full_spectrum"
- }
- ]
- },
+ "title": "Phase 1g: Re-Triage and Frame Validity Check",
  "promptBlocks": {
- "goal": "
+ "goal": "Check whether the landscape and framing work revealed anything that challenges the current problem frame, then reassess the path with real evidence instead of just the initial wording.",
  "constraints": [
  "Base the re-triage on actual evidence gathered so far, not on the original default path alone.",
- "Only change the path or rigor when the center of gravity has materially shifted."
+ "Only change the path or rigor when the center of gravity has materially shifted.",
+ "The FrameValidityCheck is required on every path -- do not skip it."
  ],
  "procedure": [
+ "Perform a FrameValidityCheck using `FrameValidityCheckTemplate`: what did the landscape and framing work reveal that was not present in the frame set in Phase 0a? Does any new information challenge the current problem frame? Set `frameChallenge`: valid | partial | needs_reframe. If needs_reframe: provide `reframeRequired` with `revisedFrame` and `changedAssumptions`; record the revised frame in `designDocPath` before proceeding. If valid or partial: proceed, recording any partial concerns in `designDocPath`.",
+ "Set `framingResolutionStatus` based on FrameValidityCheck result: `resolved` (frameChallenge = valid), `partial` (frameChallenge = partial), `unresolved` (frameChallenge = needs_reframe and not yet resolved).",
  "Review whether the dominant uncertainty is still what you thought it was at the start.",
  "Review whether `landscapeGapCount`, `contradictionCount`, or `framingRiskCount` materially change the center of gravity.",
  "Review whether the task now clearly needs a prototype, more research, or a broader synthesis pass.",
  "Confirm or adjust `pathRecommendation` and `rigorMode`.",
  "Set `pathChangedAfterContext`, `needsPrototype`, and `needsFurtherResearch`.",
- "Set these keys in the next `continue_workflow` call's `context` object: `pathRecommendation`, `rigorMode`, `pathChangedAfterContext`, `needsPrototype`, `needsFurtherResearch`."
+ "Set these keys in the next `continue_workflow` call's `context` object: `pathRecommendation`, `rigorMode`, `pathChangedAfterContext`, `needsPrototype`, `needsFurtherResearch`, `framingResolutionStatus`."
  ],
  "verify": [
+ "FrameValidityCheck was completed and `framingResolutionStatus` is set.",
+ "If `frameChallenge = needs_reframe`: `reframeRequired` is provided and the revised frame is recorded in `designDocPath` before proceeding.",
  "Any path change is justified by the evidence gathered so far.",
  "Prototype and research needs are called explicitly rather than left implicit."
  ]
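The Phase 1g procedure above maps a FrameValidityCheck result onto `framingResolutionStatus` and treats `needs_reframe` without a revision as structurally invalid. A minimal sketch of that mapping follows. The type and function names are illustrative assumptions, and the diff does not say which status applies once a reframe has been resolved, so that case is not modeled here.

```typescript
type FrameChallenge = "valid" | "partial" | "needs_reframe";
type FramingResolutionStatus = "resolved" | "partial" | "unresolved";

interface Reframe {
  revisedFrame: string;
  changedAssumptions: string[];
}

interface FrameValidityCheck {
  frameChallenge: FrameChallenge;
  reframeRequired: Reframe | null;
}

// Hypothetical helper: rejects the structurally invalid combination, then
// applies the valid/partial/needs_reframe mapping from the procedure text.
function toFramingResolutionStatus(check: FrameValidityCheck): FramingResolutionStatus {
  if (check.frameChallenge === "needs_reframe" && check.reframeRequired === null) {
    throw new Error("frameChallenge = needs_reframe requires a non-null reframeRequired");
  }
  switch (check.frameChallenge) {
    case "valid":
      return "resolved";
    case "partial":
      return "partial";
    case "needs_reframe":
      return "unresolved";
  }
}
```

Failing fast on the invalid combination mirrors the template's note that the revision must be provided before proceeding.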
@@ -528,10 +594,12 @@
  "State what kind of uncertainty remains: recommendation uncertainty, research uncertainty, or prototype-learning uncertainty."
  ],
  "procedure": [
+ "Before generating criteria: name the abstract principles at play in this problem independent of any specific candidate or solution. What does a good answer look like in principle, before you know what the candidates are?",
  "Produce a concise synthesis of the opportunity, the 3-5 criteria the final direction must satisfy, the strongest framing risk, and the current best explanation of what success looks like. Also surface which of the `challengedAssumptions` from phase-0a are still unresolved -- any candidate that silently bets on a challenged assumption must name it as a risk.",
  "When generating `decisionCriteria`: criteria that come from constraints and anti-goals are necessary but not sufficient. They are compatibility thresholds -- every viable candidate will satisfy them. At least one criterion must be quality-aspirational: derived from `idealEndState`, not from constraints. Quality-aspirational criteria ask 'which candidate is best?' not 'which candidates pass?'. Examples by domain -- software: 'which requires the fewest changes to add a new phase?', 'which would a senior engineer be proudest of in two years?'; product: 'which best positions us for the market shift we expect in 18 months?', 'which gives us the most learning per unit of investment?'; ux: 'which best serves users who are stressed or in a hurry?', 'which scales to the full vision without a redesign?'; personal: 'which aligns best with the values I've said matter most?', 'which would I regret least in 10 years?'; general: 'which creates the most optionality for future decisions?'. If `solutionAmbitiousness = 'ideal_solution'`, this quality-aspirational criterion is mandatory -- not optional.",
  "Set `candidateCountTarget` adaptively: QUICK = 2-3, STANDARD = 3-4, THOROUGH = 4-5.",
  "Identify and read the overarching vision or north-star document for this problem. For `problemDomain = 'software'`: look for `docs/vision.md`, `VISION.md`, or a product vision section in project docs. For `problemDomain = 'product'`: look for a product strategy doc, company mission, or OKRs. For `problemDomain = 'ux'`: look for a design vision, brand guide, or experience principles. For `problemDomain = 'personal'`: look for stated personal goals or values the user has shared. For `problemDomain = 'general'`: ask yourself what the overarching goal behind this decision is. Then answer: (1) How does solving this problem serve the overarching vision? If it doesn't, say so explicitly. (2) Which long-term goals does this decision need to leave room for? Name at least one seam or constraint the solution must honor. (3) Does any candidate direction foreclose something that matters long-term? Add at least one vision-alignment criterion to `decisionCriteria`. If no vision document exists, use your best understanding of the direction from context.",
+ "Record `decisionCriteria` (with explicit weights or priority order) and your interpretation of `idealEndState` in `designDocPath` before proceeding to candidate generation. In Phase 3e you will be required to quote these weights verbatim -- any modification after seeing candidates must be flagged explicitly.",
  "Set these keys in the next `continue_workflow` call's `context` object: `decisionCriteria`, `riskiestAssumption`, `candidateCountTarget`, `needsPrototype`, `needsFurtherResearch`, `pathReady`."
  ],
  "verify": [
@@ -542,34 +610,6 @@
  ]
  },
  "promptFragments": [
- {
- "id": "p2-challenge-needed",
- "when": {
- "var": "needsChallenge",
- "equals": true
- },
- "text": "Because the framing still needs challenge, produce the strongest case against it yourself before moving on."
- },
- {
- "id": "p2-challenge-deleg",
- "when": {
- "and": [
- {
- "var": "needsChallenge",
- "equals": true
- },
- {
- "var": "delegationAvailable",
- "equals": true
- },
- {
- "var": "rigorMode",
- "not_equals": "QUICK"
- }
- ]
- },
- "text": "Also decide whether a delegated challenge is worth the extra step. If yes, spawn ONE WorkRail Executor running `routine-hypothesis-challenge` against the current framing, then synthesize it yourself. If not, record why your own challenge is enough."
- },
  {
  "id": "p2-full-balance",
  "when": {
@@ -616,7 +656,8 @@
  "goal": "Generate enough genuinely distinct candidates for me to support a real choice and write `design-candidates.md`.",
  "constraints": [
  "Create at least 2 materially different candidates, and add another if the set still clusters too tightly.",
- "Include at least one runner-up strong enough that switching to it would feel real, not hypothetical."
+ "Include at least one runner-up strong enough that switching to it would feel real, not hypothetical.",
+ "Diversity interventions (persona rotation, verbalized sampling) are not applied in QUICK mode. This is intentional -- QUICK accepts lower diversity for speed. The selectionTier will be provisional_recommendation regardless of observed signals."
  ],
  "procedure": [
  "Include one simplest plausible direction and one direction that better serves the selected path's emphasis.",
@@ -626,7 +667,8 @@
  ],
  "verify": [
  "`design-candidates.md` contains a genuinely distinct quick candidate set rather than shallow variations.",
- "The quick path still leaves a real choice on the table."
+ "The quick path still leaves a real choice on the table.",
+ "Confirm candidates rely on different primary mechanisms -- not just different labels on the same approach. If they cluster, add one more generation pass before proceeding."
  ]
  },
  "promptFragments": [
@@ -667,13 +709,14 @@
  "For `design_first`, require at least one direction that meaningfully reframes the problem instead of only packaging obvious solutions.",
  "For `landscape_first`, require the candidate set to clearly reflect landscape precedents, constraints, and contradictions rather than drifting into free invention.",
  "For `THOROUGH`, require one extra push if the first spread still feels clustered or too safe.",
- "If `delegationAvailable = true` and `rigorMode != QUICK`: assign 2-3 non-overlapping focus angles before spawning anything. Then spawn the corresponding WorkRail Executors SIMULTANEOUSLY, each running `wr.routine-tension-driven-design`. The ONLY way to pass context to each executor is via its goal string -- executors start with clean sessions and cannot read the main agent's context variables. Build a self-contained goal string for each executor using these exact markers
+ "If `delegationAvailable = true` and `rigorMode != QUICK`: assign 2-3 non-overlapping focus angles before spawning anything. Then spawn the corresponding WorkRail Executors SIMULTANEOUSLY, each running `wr.routine-tension-driven-design`. The ONLY way to pass context to each executor is via its goal string -- executors start with clean sessions and cannot read the main agent's context variables. Build a self-contained goal string for each executor using these exact markers: 'FOCUS ANGLE: <the assigned angle as a concrete instruction> | PERSONA: <a mundane ordinary persona for this executor, e.g. \"a pragmatic ops engineer who has seen many systems fail\" or \"a product manager who has shipped features that missed the mark\"> | SAMPLING: Generate 5 distinct approaches with the probability (0.0-1.0) that a typical AI assistant would suggest each. Include at least 2 with probability < 0.10. Label each: Approach [N] (p=[probability]): [description]. | PROBLEM: <reframedProblem> | TENSIONS: <bullet list of identified tensions> | CRITERIA: <bullet list of decisionCriteria> | IDEAL END STATE: <idealEndState> | RISKIEST ASSUMPTION: <riskiestAssumption> | PHILOSOPHY: <philosophySources paths or summary> | OUTPUT FILE: design-candidates-angle-N.md'. Assign distinct angles -- concrete examples: (A) 'Anchor every candidate to the ideal end state -- build the best achievable design if effort and scope were no constraint', (B) 'Anchor every candidate to the riskiest assumption -- build the most defensible design that does not bet on any single assumption being true', (C) 'Anchor every candidate to the primary framing risk -- what would the design look like if the current problem framing is wrong?'. After all executors complete, read each executor's output file. Synthesize: (1) Does the final set span from idealEndState to most-defensible? If not, name the gap. (2) Does at least one candidate from each executor's angle survive? If an angle is absent, justify it. (3) Does cross-executor comparison yield any insight no single executor reached? If yes, add it as a new candidate. (4) For each candidate: name its primary mechanism in one sentence. Confirm all primary mechanisms are distinct. If two share a mechanism, merge them into the stronger version and note what was collapsed -- a set where all candidates share a primary mechanism is a failed generation pass. Write the synthesized result to `design-candidates.md`. If `delegationAvailable = false`, generate candidates yourself and record why solo execution was used.",
  "Write these expectations into `designDocPath` so the later synthesis can judge whether the injected routine met them."
  ],
  "verify": [
  "The path-specific expectations for candidate generation are explicit before the injected routine runs.",
- "If delegation was used: each executor received a self-contained briefing with a distinct non-overlapping focus angle. The synthesized candidate set reflects main-agent judgment, not a flat concatenation of executor outputs.",
- "If delegation was
+ "If delegation was used: each executor received a self-contained briefing with a distinct non-overlapping focus angle and a distinct mundane persona. The synthesized candidate set reflects main-agent judgment, not a flat concatenation of executor outputs.",
+ "If delegation was used: the primary mechanism of each candidate is named and confirmed distinct. If two share a mechanism, they were merged before finalizing.",
+ "If delegation was skipped: the reason is recorded. After Phase 3c injected routine completes, name the primary mechanism of each generated candidate. Confirm they are distinct. If two share a mechanism, merge and regenerate before proceeding to Phase 3d."
  ]
  },
  "promptFragments": [
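The delegation step above requires a self-contained goal string per executor, built from fixed markers separated by ` | `. As a rough illustration of that format, the sketch below assembles one such string. The `ExecutorBrief` interface, the `buildGoalString` function, and the choice to render bullet lists inline with `- ` prefixes are assumptions made for this example; only the marker names, their order, and the sampling instruction come from the diff.

```typescript
// Hypothetical input shape; field names are illustrative only.
interface ExecutorBrief {
  focusAngle: string;
  persona: string;
  problem: string;
  tensions: string[];
  criteria: string[];
  idealEndState: string;
  riskiestAssumption: string;
  philosophy: string;
  angleIndex: number; // fills the N in design-candidates-angle-N.md
}

// Sampling instruction quoted from the workflow text.
const SAMPLING_INSTRUCTION =
  "Generate 5 distinct approaches with the probability (0.0-1.0) that a typical AI assistant would suggest each. " +
  "Include at least 2 with probability < 0.10. Label each: Approach [N] (p=[probability]): [description].";

function buildGoalString(brief: ExecutorBrief): string {
  // Assumption: bullet lists are flattened onto one line with "- " prefixes.
  const bullets = (items: string[]): string => items.map((item) => "- " + item).join("; ");
  return [
    "FOCUS ANGLE: " + brief.focusAngle,
    "PERSONA: " + brief.persona,
    "SAMPLING: " + SAMPLING_INSTRUCTION,
    "PROBLEM: " + brief.problem,
    "TENSIONS: " + bullets(brief.tensions),
    "CRITERIA: " + bullets(brief.criteria),
    "IDEAL END STATE: " + brief.idealEndState,
    "RISKIEST ASSUMPTION: " + brief.riskiestAssumption,
    "PHILOSOPHY: " + brief.philosophy,
    "OUTPUT FILE: design-candidates-angle-" + brief.angleIndex + ".md",
  ].join(" | ");
}

const exampleGoal = buildGoalString({
  focusAngle: "Anchor every candidate to the ideal end state",
  persona: "a pragmatic ops engineer who has seen many systems fail",
  problem: "example problem statement",
  tensions: ["speed vs. rigor"],
  criteria: ["fewest changes to add a new phase"],
  idealEndState: "example end state",
  riskiestAssumption: "example assumption",
  philosophy: "none recorded",
  angleIndex: 1,
});
```

Because executors start with clean sessions, everything they need has to travel inside this one string, which is why every marker is always emitted even when a field is short.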
@@ -713,12 +756,23 @@
  },
  {
  "id": "phase-3d-select-direction",
- "title": "Phase 3d: Challenge
+ "title": "Phase 3d: Challenge Direction",
  "assessmentRefs": [
  "candidate-quality-gate"
  ],
+ "assessmentConsequences": [
+ {
+ "when": {
+ "anyEqualsLevel": "shallow"
+ },
+ "effect": {
+ "kind": "require_followup",
+ "guidance": "candidate_distinctness = shallow: name the primary mechanism of each candidate; merge any sharing the same mechanism; generate a replacement. quality_criterion_present = absent: add a criterion from idealEndState asking 'which is best?' not just 'which passes?'"
+ }
+ }
+ ],
  "promptBlocks": {
- "goal": "Read `design-candidates.md
+ "goal": "Read `design-candidates.md` and challenge the leading option externally before selection. You are challenging, not selecting -- selection happens in Phase 3e.",
  "constraints": [
  [
  {
@@ -726,32 +780,84 @@
|
|
|
726
780
|
"refId": "wr.refs.adversarial_challenge_rules"
|
|
727
781
|
}
|
|
728
782
|
],
|
|
729
|
-
"
|
|
730
|
-
"
|
|
783
|
+
"Challenge comes before selection. Do not choose a winner in this step.",
|
|
784
|
+
"External challenge is required -- not optional. Use the degradation ladder."
|
|
731
785
|
],
|
|
732
786
|
"procedure": [
|
|
733
|
-
"Compare candidates against `pathRecommendation` and `decisionCriteria
|
|
734
|
-
"
|
|
735
|
-
"
|
|
736
|
-
"
|
|
787
|
+
"Compare candidates against `pathRecommendation` and `decisionCriteria` to identify the leading candidate and strongest alternative.",
|
|
788
|
+
"Challenge the leading candidate using the cross-family degradation ladder: Rung 1 (verified cross-family): spawn a WorkRail Executor from a verified different model family running `routine-hypothesis-challenge` against the leading candidate, providing the leading candidate, `decisionCriteria`, `acceptedTradeoffs`, `identifiedFailureModes`, and the strongest runner-up. Rung 2 (single-family fallback, or if cross-family cannot be verified): spawn a same-family WorkRail Executor running `routine-hypothesis-challenge` with explicit steelmanning instructions -- the executor must construct the strongest possible case for the runner-up, not just find flaws in the leader. Rung 3 (no delegation available): record the gap, note that no external challenge was possible, and set `recommendationConfidenceBand` to at most 'medium'. Record which rung was used.",
|
|
789
|
+
"Classify each challenge finding before acting on it: structural (questions a fundamental assumption -- would require changing the direction), tactical (questions scope or implementation -- changes within the direction), surface (style or completeness -- does not change the direction). Only structural findings should trigger reconsideration of the leading candidate. Tactical findings: record as implementation risks. Surface findings: note and set aside.",
|
|
790
|
+
"Complete this sentence at least 3 times with genuinely distinct reasons: 'This direction will fail if ___.' Record as pre-mortem risks in `designDocPath`. Generic completions ('too complex', 'team bandwidth') do not count.",
|
|
791
|
+
"Record `challengeQualityObserved` (structural | tactical | surface -- the highest-quality finding), `acceptedTradeoffs`, `identifiedFailureModes`, and the rung used.",
|
|
792
|
+
"Set these keys in the next `continue_workflow` call's `context` object: `challengeQualityObserved`, `acceptedTradeoffs`, `identifiedFailureModes`, `hasStrongAlternative`, `needsPrototype`, `needsFurtherResearch`."
|
|
793
|
+
],
|
|
794
|
+
"outputRequired": {
|
|
795
|
+
"notesMarkdown": "Describe the leading candidate, the strongest alternative, what the challenge found, the challenge quality classification, and the rung used."
|
|
796
|
+
},
|
|
797
|
+
"verify": [
|
|
798
|
+
"An external challenge executor was spawned -- not self-produced. The rung used is recorded.",
|
|
799
|
+
"Each challenge finding is classified as structural / tactical / surface before being acted on.",
|
|
800
|
+
"Pre-mortem has at least 3 distinct specific failure conditions -- not generic caveats.",
|
|
801
|
+
"`challengeQualityObserved` is set.",
|
|
802
|
+
"Has any position changed during this step? If yes: the new evidence must reference a specific finding already recorded in a candidate output file or the challenge output -- not reasoning generated in this step. 'On reflection' does not count. If no qualifying evidence, restore the prior position and record: 'Position change without new evidence detected.'"
|
|
803
|
+
]
|
|
804
|
+
},
|
|
805
|
+
"promptFragments": [
|
|
806
|
+
{
|
|
807
|
+
"id": "p3c-full-balance",
|
|
808
|
+
"when": {
|
|
809
|
+
"var": "pathRecommendation",
|
|
810
|
+
"equals": "full_spectrum"
|
|
811
|
+
},
|
|
812
|
+
"text": "Because this is `full_spectrum`, ensure the challenge tests both landscape fit and framing fit -- not just one."
|
|
813
|
+
},
|
|
814
|
+
{
|
|
815
|
+
"id": "p3c-prototype-test",
|
|
816
|
+
"when": {
|
|
817
|
+
"var": "needsPrototype",
|
|
818
|
+
"equals": true
|
|
819
|
+
},
|
|
820
|
+
"text": "Because the work may still need a prototype, ensure the challenge specifically tests the direction's biggest uncertainty."
|
|
821
|
+
}
|
|
822
|
+
],
|
|
823
|
+
"requireConfirmation": false
|
|
824
|
+
},
|
|
825
|
+
{
|
|
826
|
+
"id": "phase-3e-select-direction",
|
|
827
|
+
"title": "Phase 3e: Select Direction",
|
|
828
|
+
"promptBlocks": {
|
|
829
|
+
"goal": "Read the challenge findings from Phase 3d, apply the criteria you pre-committed to in Phase 2, and select the winning direction. Assign a typed SelectionOutput tier from observable signals only.",
|
|
830
|
+
"constraints": [
|
|
831
|
+
"Use criteria weights you recorded in Phase 2 -- quote them before reading candidates. Do not adjust weights after seeing candidates.",
|
|
832
|
+
"Tier assignment uses only observable signals: challengeQualityObserved, semantic diversity, framingResolutionStatus. Not self-reported confidence."
|
|
833
|
+
],
|
|
834
|
+
"procedure": [
|
|
835
|
+
"Quote the `decisionCriteria` and weights you recorded in `designDocPath` during Phase 2 before reading any candidates. If your weights have changed since then, flag the change explicitly and justify it -- weight drift toward the leading candidate is a red flag.",
|
|
836
|
+
"Order-bias check: consider each candidate as if it were presented to you first. Does your ranking change by position? If yes, name which bias is affecting you and correct for it. Note: this is a prompt-level approximation of structural order randomization -- it reduces but does not eliminate position bias.",
|
|
837
|
+
"If `rigorMode != QUICK`: spawn a fresh-context WorkRail Executor with ONLY: the candidate summaries, the quoted `decisionCriteria` with weights, and `challengeQualityObserved` from Phase 3d. Do NOT include full discovery context, session history, or the design doc. The executor returns `SelectionEvidenceTemplate` output. Read the executor's ranked evidence and assign the SelectionOutput tier using `SelectionOutputTemplate`. You assign the tier -- the executor provides evidence.",
|
|
838
|
+
"If `rigorMode = QUICK`: skip fresh-context selection entirely. Set `selectionTier = provisional_recommendation`. Record in `designDocPath`: 'QUICK mode: fresh-context selection was not applied. Tier is provisional regardless of apparent signal quality.'",
|
|
737
839
|
"Choose `selectedDirection` and `runnerUpDirection`.",
|
|
738
|
-
"Quality ceiling check:
|
|
739
|
-
"
|
|
840
|
+
"Quality ceiling check: compare `selectedDirection` to `idealEndState` from Phase 0a. Does it reach the ideal? If not, name exactly what it falls short of and why that shortfall is justified. If `solutionAmbitiousness = 'ideal_solution'` and the selected direction is the most conservative candidate, provide a specific reason each more ambitious candidate was ruled out -- 'too complex' alone is not sufficient. Also confirm that at least one candidate in the final set is consistent with an alternative frame from Phase 0a; if none is, record the gap.",
|
|
841
|
+
"Apply downgrade-only rule: `recommendationConfidenceBand` may be downgraded from the Phase 3d baseline if challenge findings revealed uncertainty; it may not be upgraded.",
|
|
842
|
+
"Record `acceptedTradeoffs`, `identifiedFailureModes`, `selectionTier`, and what would trigger a switch to the runner-up.",
|
|
740
843
|
"Update `designDocPath` Decision Log with why the winner won, why the runner-up lost, and how the winner compares to `idealEndState`.",
|
|
741
|
-
"Set these keys in the next `continue_workflow` call's `context` object: `selectedDirection`, `runnerUpDirection`, `acceptedTradeoffs`, `identifiedFailureModes`, `hasStrongAlternative`, `needsPrototype`, `needsFurtherResearch`, `recommendationConfidenceBand`."
|
|
844
|
+
"Set these keys in the next `continue_workflow` call's `context` object: `selectedDirection`, `runnerUpDirection`, `acceptedTradeoffs`, `identifiedFailureModes`, `hasStrongAlternative`, `needsPrototype`, `needsFurtherResearch`, `recommendationConfidenceBand`, `selectionTier`."
|
|
742
845
|
],
|
|
743
846
|
"outputRequired": {
|
|
744
|
-
"notesMarkdown": "Explain the winning direction
|
|
847
|
+
"notesMarkdown": "Explain the winning direction, the strongest alternative, what the challenge changed or failed to change, and the selectionTier with explicit rationale citing observable signals."
|
|
745
848
|
},
|
|
746
849
|
"verify": [
|
|
747
|
-
"
|
|
748
|
-
"
|
|
749
|
-
"
|
|
850
|
+
"`selectionTier` is assigned from observable signals -- not self-reported confidence.",
|
|
851
|
+
"`decisionCriteria` weights were quoted from Phase 2 before reading candidates; any changes flagged explicitly.",
|
|
852
|
+
"Quality ceiling check explicitly compared `selectedDirection` to `idealEndState`.",
|
|
853
|
+
"`recommendationConfidenceBand` was not upgraded from the Phase 3d value.",
|
|
854
|
+
"If `rigorMode = QUICK`: `selectionTier = provisional_recommendation`, no exceptions.",
|
|
855
|
+
"If `selectionTier = insufficient_signal`: `requiredActionOnInsufficient` is non-null and specific."
|
|
750
856
|
]
|
|
751
857
|
},
|
|
752
858
|
"promptFragments": [
|
|
753
859
|
{
|
|
754
|
-
"id": "
|
|
860
|
+
"id": "p3e-full-balance",
|
|
755
861
|
"when": {
|
|
756
862
|
"var": "pathRecommendation",
|
|
757
863
|
"equals": "full_spectrum"
|
|
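Reviewer sketch of the cross-family degradation ladder introduced in Phase 3d above. It encodes only what the procedure text states: Rung 1 when a different model family is verified, Rung 2 when delegation is available but cross-family cannot be verified, Rung 3 when no delegation is available, with `recommendationConfidenceBand` capped at 'medium' on Rung 3. The `DelegationStatus` type and both function names are hypothetical and are not part of the workflow schema.

```typescript
type Rung = 1 | 2 | 3;
type ConfidenceBand = "low" | "medium" | "high";

// Hypothetical inputs: the workflow does not define how these facts are detected.
interface DelegationStatus {
  delegationAvailable: boolean; // can a WorkRail Executor be spawned at all?
  crossFamilyVerified: boolean; // is a different model family verifiably available?
}

// Pick the highest rung of the ladder that the environment supports.
function chooseRung(status: DelegationStatus): Rung {
  if (!status.delegationAvailable) {
    return 3; // no external challenge possible; the gap must be recorded
  }
  return status.crossFamilyVerified ? 1 : 2;
}

// Rung 3 limits the confidence band to at most "medium"; other rungs leave it unchanged.
function capBandForRung(rung: Rung, band: ConfidenceBand): ConfidenceBand {
  if (rung === 3 && band === "high") {
    return "medium";
  }
  return band;
}
```

The cap is applied separately from rung selection so that the rung used can still be recorded on its own, as the procedure requires.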
@@ -759,7 +865,7 @@
|
|
|
759
865
|
"text": "Because this is `full_spectrum`, do not let the final choice overfit either the landscape or the framing. Make the winner earn both."
|
|
760
866
|
},
|
|
761
867
|
{
|
|
762
|
-
"id": "
|
|
868
|
+
"id": "p3e-prototype-test",
|
|
763
869
|
"when": {
|
|
764
870
|
"var": "needsPrototype",
|
|
765
871
|
"equals": true
|
|
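Reviewer sketch of two Phase 3e `verify` rules from the hunk above, written as a check: in QUICK mode `selectionTier` must be `provisional_recommendation`, and when `selectionTier` is `insufficient_signal` the `requiredActionOnInsufficient` field must be non-null and specific. The tier values come from the handoff artifact in this diff; the `SelectionOutput` interface and the `checkSelectionOutput` helper are assumptions, since the actual `SelectionOutputTemplate` is not shown here.

```typescript
type SelectionTier =
  | "strong_recommendation"
  | "provisional_recommendation"
  | "insufficient_signal";

// Hypothetical subset of the selection output; the real template may carry more fields.
interface SelectionOutput {
  selectionTier: SelectionTier;
  requiredActionOnInsufficient: string | null;
}

// Returns a list of rule violations; an empty list means both rules hold.
function checkSelectionOutput(rigorMode: string, output: SelectionOutput): string[] {
  const problems: string[] = [];

  // QUICK mode skips fresh-context selection, so the tier is always provisional.
  if (rigorMode === "QUICK" && output.selectionTier !== "provisional_recommendation") {
    problems.push("QUICK mode requires selectionTier = provisional_recommendation");
  }

  // An insufficient-signal result must say what action is required next.
  // "Specific" is approximated here as "not empty"; real specificity needs judgment.
  const action = output.requiredActionOnInsufficient;
  if (output.selectionTier === "insufficient_signal" && (action === null || action.trim() === "")) {
    problems.push("insufficient_signal requires a non-null requiredActionOnInsufficient");
  }

  return problems;
}
```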
@@ -831,13 +937,17 @@
|
|
|
831
937
|
"If the review surfaces bounded issues but not a direction change, record them as residual concerns instead of pretending the design is perfect."
|
|
832
938
|
],
|
|
833
939
|
"procedure": [
|
|
940
|
+
"Before acting on review findings: classify each finding as structural (questions a fundamental assumption -- would change the direction), tactical (questions implementation -- changes within the direction), or surface (style, completeness). Only structural findings should trigger direction revision.",
|
|
834
941
|
"Compare the review findings to your own prior reasoning.",
|
|
835
942
|
"State what changed your mind, what did not, and why.",
|
|
836
943
|
"If issues are real, update `selectedDirection`, `runnerUpDirection`, `acceptedTradeoffs`, `identifiedFailureModes`, and the Decision Log in `designDocPath`.",
|
|
944
|
+
"If review findings reveal new uncertainty, downgrade `recommendationConfidenceBand` accordingly. Do not upgrade it above the Phase 3e value.",
|
|
837
945
|
"Set these keys in the next `continue_workflow` call's `context` object: `directionFindings`, `directionRevised`, `needsPrototype`, `needsFurtherResearch`, `recommendationConfidenceBand`."
|
|
838
946
|
],
|
|
839
947
|
"verify": [
|
|
840
|
-
"Your decision reflects synthesized judgment, not copied review output."
|
|
948
|
+
"Your decision reflects synthesized judgment, not copied review output.",
|
|
949
|
+
"Each review finding was classified as structural / tactical / surface before being acted on.",
|
|
950
|
+
"`recommendationConfidenceBand` was not upgraded above the Phase 3e value."
|
|
841
951
|
]
|
|
842
952
|
},
|
|
843
953
|
"requireConfirmation": false
|
|
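Reviewer sketch of the structural / tactical / surface classification that this hunk and Phase 3d both add. Per the procedure text, only structural findings trigger reconsideration of the direction, tactical findings are recorded as implementation risks, and surface findings are noted and set aside. The ordering used for `challengeQualityObserved` (structural above tactical above surface) is an assumption based on the phrase "the highest-quality finding"; the action labels and type names are hypothetical.

```typescript
type FindingClass = "structural" | "tactical" | "surface";

// Hypothetical shape of a recorded challenge or review finding.
interface Finding {
  summary: string;
  classification: FindingClass;
}

type FindingAction =
  | "reconsider_direction"
  | "record_implementation_risk"
  | "note_and_set_aside";

// Assumed ranking: a higher number means a higher-quality (more fundamental) finding.
const RANK: Record<FindingClass, number> = { surface: 0, tactical: 1, structural: 2 };

// Map each class to the handling the workflow text describes.
function actionFor(classification: FindingClass): FindingAction {
  switch (classification) {
    case "structural":
      return "reconsider_direction";
    case "tactical":
      return "record_implementation_risk";
    default:
      return "note_and_set_aside";
  }
}

// The class of the highest-ranked finding, or undefined when there are none.
function highestQuality(findings: Finding[]): FindingClass | undefined {
  let best: FindingClass | undefined = undefined;
  for (const finding of findings) {
    if (best === undefined || RANK[finding.classification] > RANK[best]) {
      best = finding.classification;
    }
  }
  return best;
}
```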
@@ -924,11 +1034,13 @@
|
|
|
924
1034
|
],
|
|
925
1035
|
"procedure": [
|
|
926
1036
|
"Consolidate the recommendation, strongest alternative, confidence band, and residual risks in `designDocPath`.",
|
|
1037
|
+
"Apply the downgrade-only rule: `recommendationConfidenceBand` may be downgraded if remaining uncertainty justifies it. It may not be upgraded above the Phase 3e value.",
|
|
927
1038
|
"Set these keys in the next `continue_workflow` call's `context` object: `finalConfidenceBand`, `residualRiskCount`, `handoffReady`."
|
|
928
1039
|
],
|
|
929
1040
|
"verify": [
|
|
930
1041
|
"The direct recommendation path is justified by the actual remaining uncertainty.",
|
|
931
|
-
"The handoff can proceed without pretending unresolved work has vanished."
|
|
1042
|
+
"The handoff can proceed without pretending unresolved work has vanished.",
|
|
1043
|
+
"`recommendationConfidenceBand` was not upgraded above the Phase 3e value."
|
|
932
1044
|
]
|
|
933
1045
|
},
|
|
934
1046
|
"requireConfirmation": false
|
|
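Reviewer sketch of the downgrade-only rule that this diff repeats across the later phases: `recommendationConfidenceBand` may be lowered, but may not be raised above the Phase 3e value. The band values match the `confidenceBand` field of the handoff artifact; the function name and the idea of enforcing the rule in code are assumptions, since the workflow expresses it as a prompt instruction.

```typescript
type ConfidenceBand = "low" | "medium" | "high";

// Bands listed from least to most confident, so array position can be compared.
const BAND_ORDER: ConfidenceBand[] = ["low", "medium", "high"];

// Returns the proposed band unless it would exceed the Phase 3e baseline,
// in which case the baseline is kept.
function applyDowngradeOnly(
  phase3eBaseline: ConfidenceBand,
  proposed: ConfidenceBand
): ConfidenceBand {
  const exceedsBaseline =
    BAND_ORDER.indexOf(proposed) > BAND_ORDER.indexOf(phase3eBaseline);
  return exceedsBaseline ? phase3eBaseline : proposed;
}
```

For example, with a Phase 3e baseline of "medium", a later phase proposing "high" keeps "medium", while a proposal of "low" is accepted.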
@@ -963,13 +1075,15 @@
|
|
|
963
1075
|
"procedure": [
|
|
964
1076
|
"Gather the missing evidence that could actually change the recommendation and update the Landscape Packet and Decision Log in `designDocPath`.",
|
|
965
1077
|
"Update assumptions, confidence, and the recommended next action for me.",
|
|
1078
|
+
"Apply the downgrade-only rule: `recommendationConfidenceBand` may be downgraded if this pass reveals new uncertainty. It may not be upgraded above the Phase 3e value.",
|
|
966
1079
|
"Set these keys in the next `continue_workflow` call's `context` object: `resolutionInsightCount`, `invalidatedAssumptionCount`, `remainingGapCount`, `resolutionContinueRecommended`, `recommendationConfidenceBand`."
|
|
967
1080
|
],
|
|
968
1081
|
"outputRequired": {
|
|
969
1082
|
"notesMarkdown": "Summarize what the research follow-up taught you and how it changed the recommendation."
|
|
970
1083
|
},
|
|
971
1084
|
"verify": [
|
|
972
|
-
"The pass materially addressed a real missing evidence gap."
|
|
1085
|
+
"The pass materially addressed a real missing evidence gap.",
|
|
1086
|
+
"`recommendationConfidenceBand` was not upgraded above the Phase 3e value."
|
|
973
1087
|
]
|
|
974
1088
|
},
|
|
975
1089
|
"requireConfirmation": false
|
|
@@ -1032,13 +1146,15 @@
|
|
|
1032
1146
|
"Create or update a minimal prototype/test spec in `designDocPath` using `prototypeSpecTemplate`, then run the cheapest test that can still falsify the direction or give it real support.",
|
|
1033
1147
|
"Capture what worked, what failed, and what would falsify it.",
|
|
1034
1148
|
"Update assumptions, confidence, and the recommended next action for me.",
|
|
1149
|
+
"Apply the downgrade-only rule: `recommendationConfidenceBand` may be downgraded if this pass reveals new uncertainty. It may not be upgraded above the Phase 3e value.",
|
|
1035
1150
|
"Set these keys in the next `continue_workflow` call's `context` object: `resolutionInsightCount`, `invalidatedAssumptionCount`, `remainingGapCount`, `resolutionContinueRecommended`, `recommendationConfidenceBand`."
|
|
1036
1151
|
],
|
|
1037
1152
|
"outputRequired": {
|
|
1038
1153
|
"notesMarkdown": "Summarize what the prototype or test pass taught you and how it changed the recommendation."
|
|
1039
1154
|
},
|
|
1040
1155
|
"verify": [
|
|
1041
|
-
"The pass materially addressed the selected remaining uncertainty."
|
|
1156
|
+
"The pass materially addressed the selected remaining uncertainty.",
|
|
1157
|
+
"`recommendationConfidenceBand` was not upgraded above the Phase 3e value."
|
|
1042
1158
|
]
|
|
1043
1159
|
},
|
|
1044
1160
|
"functionReferences": [
|
|
@@ -1079,8 +1195,19 @@
|
|
|
1079
1195
|
"assessmentRefs": [
|
|
1080
1196
|
"recommendation-confidence-gate"
|
|
1081
1197
|
],
|
|
1198
|
+
"assessmentConsequences": [
|
|
1199
|
+
{
|
|
1200
|
+
"when": {
|
|
1201
|
+
"anyEqualsLevel": "uncompared"
|
|
1202
|
+
},
|
|
1203
|
+
"effect": {
|
|
1204
|
+
"kind": "require_followup",
|
|
1205
|
+
"guidance": "ideal_state_comparison = uncompared: compare the selected direction to idealEndState and name what it falls short of with justification. residual_risks_named = generic: replace generic caveats with specific risks naming what would concretely invalidate the recommendation."
|
|
1206
|
+
}
|
|
1207
|
+
}
|
|
1208
|
+
],
|
|
1082
1209
|
"promptBlocks": {
|
|
1083
|
-
"goal": "Validate
|
|
1210
|
+
"goal": "Validate the recommendation using a falsification-focused fresh-context check, then confirm the right confidence level and caveats.",
|
|
1084
1211
|
"constraints": [
|
|
1085
1212
|
[
|
|
1086
1213
|
{
|
|
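Reviewer sketch of one possible reading of the `assessmentConsequences` block added above. It assumes that `when.anyEqualsLevel` matches when any dimension of the referenced assessment was scored at that level; this diff does not show the engine code, so that semantics is an assumption. The dimension names come from the `guidance` string, while the level values "compared" and "specific" in the usage note, and all type and function names, are hypothetical.

```typescript
// Shape copied from the JSON in this hunk.
interface AssessmentConsequence {
  when: { anyEqualsLevel: string };
  effect: { kind: string; guidance: string };
}

// Hypothetical result shape: dimension id -> level chosen by the assessor.
type AssessmentResult = { [dimensionId: string]: string };

// Returns the consequences whose level appears on at least one dimension (assumed semantics).
function triggeredConsequences(
  result: AssessmentResult,
  consequences: AssessmentConsequence[]
): AssessmentConsequence[] {
  return consequences.filter((consequence) =>
    Object.keys(result).some(
      (dimensionId) => result[dimensionId] === consequence.when.anyEqualsLevel
    )
  );
}
```

Under this reading, a result such as `{ ideal_state_comparison: "compared", residual_risks_named: "generic" }` would not trigger the consequence, because the condition names only "uncompared" even though the guidance also covers the "generic" case. Whether the real engine behaves this way is not confirmed by the diff.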
@@ -1092,13 +1219,17 @@
|
|
|
1092
1219
|
"If important contradictions or gaps remain, downgrade confidence and say so explicitly."
|
|
1093
1220
|
],
|
|
1094
1221
|
"procedure": [
|
|
1095
|
-
"
|
|
1096
|
-
"
|
|
1222
|
+
"Spawn a fresh-context WorkRail Executor with ONLY: the original problem statement, `idealEndState` from Phase 0a, and the full content of `designDocPath`. Do NOT include `selectionTier`, session history, or Phase 3e rationale. The executor answers: (1) Based only on this document and the original problem, what important considerations appear to be missing? (2) What single piece of evidence, if discovered, would most likely change the recommendation? (3) What failure modes are not addressed by the recommendation? Read the executor's findings. For each gap the executor identifies that the workflow did not address: record as a residual risk, or if material, the recommendation-confidence-gate will require follow-up.",
|
|
1223
|
+
"Final check: `recommendationConfidenceBand` must not exceed the Phase 3e value. If Phase 6 work reveals no new uncertainty, the band stays at the Phase 3e value.",
|
|
1224
|
+
"Ensure the design doc is complete enough for a human to use.",
|
|
1097
1225
|
"Set these keys in the next `continue_workflow` call's `context` object: `finalConfidenceBand`, `residualRiskCount`, `handoffReady`."
|
|
1098
1226
|
],
|
|
1099
1227
|
"verify": [
|
|
1228
|
+
"The fresh-context validation executor was spawned and its findings were read.",
|
|
1100
1229
|
"The confidence band matches the evidence quality and remaining risks.",
|
|
1101
|
-
"
|
|
1230
|
+
"`recommendationConfidenceBand` was not upgraded above the Phase 3e value.",
|
|
1231
|
+
"The handoff is recommendation-ready, not just thought-complete.",
|
|
1232
|
+
"Has any position changed since Phase 3e? If yes: the new evidence must reference a specific finding in `designDocPath` or the Phase 6 executor output -- not reasoning generated in this step. If no qualifying evidence, restore the Phase 3e position and record: 'Position change without new evidence detected -- Phase 3e position restored.'"
|
|
1102
1233
|
]
|
|
1103
1234
|
},
|
|
1104
1235
|
"promptFragments": [
|
|
@@ -1146,8 +1277,8 @@
|
|
|
1146
1277
|
],
|
|
1147
1278
|
"procedure": [
|
|
1148
1279
|
"Update `designDocPath` with a final summary containing the selected path, problem framing, landscape takeaways, chosen direction, strongest alternative, why it lost, confidence band, residual risks, and next actions.",
|
|
1149
|
-
"In the final chat output, tell me the selected path, the chosen direction, the key reason it won, and where to find `designDocPath`.",
|
|
1150
|
-
"When writing the final answer, also emit an enriched wr.discovery_handoff artifact in your complete_step call:\n{\n \"kind\": \"wr.discovery_handoff\",\n \"version\": 1,\n \"selectedDirection\": \"<one sentence: the chosen approach>\",\n \"designDocPath\": \"<path to design doc, or empty string>\",\n \"confidenceBand\": \"high\" | \"medium\" | \"low\",\n \"keyInvariants\": [\"<invariant that must hold>\", ...],\n \"rejectedDirections\": [{\"direction\": \"<approach>\", \"reason\": \"<why rejected>\"}, ...],\n \"implementationConstraints\": [\"<thing the coding agent MUST NOT violate>\", ...],\n \"keyCodebaseLocations\": [{\"path\": \"<file path>\", \"relevance\": \"<why relevant>\"}, ...]\n}\nThe implementationConstraints and
|
|
1280
|
+
"In the final chat output, tell me: the selected path, the chosen direction, the key reason it won, the `selectionTier` and what it means for how much to trust this recommendation, and where to find `designDocPath`. If `selectionTier = insufficient_signal`: explicitly tell me that human review is required before acting on this recommendation. If `selectionTier = provisional_recommendation`: note what would need to be true for the recommendation to reach strong_recommendation.",
|
|
1281
|
+
"When writing the final answer, also emit an enriched wr.discovery_handoff artifact in your complete_step call:\n{\n \"kind\": \"wr.discovery_handoff\",\n \"version\": 1,\n \"selectedDirection\": \"<one sentence: the chosen approach>\",\n \"designDocPath\": \"<path to design doc, or empty string>\",\n \"confidenceBand\": \"high\" | \"medium\" | \"low\",\n \"selectionTier\": \"strong_recommendation\" | \"provisional_recommendation\" | \"insufficient_signal\",\n \"keyInvariants\": [\"<invariant that must hold>\", ...],\n \"rejectedDirections\": [{\"direction\": \"<approach>\", \"reason\": \"<why rejected>\"}, ...],\n \"implementationConstraints\": [\"<thing the coding agent MUST NOT violate>\", ...],\n \"keyCodebaseLocations\": [{\"path\": \"<file path>\", \"relevance\": \"<why relevant>\"}, ...]\n}\nThe implementationConstraints, keyCodebaseLocations, and selectionTier fields are especially important -- they orient downstream agents and signal how much to trust the recommendation."
|
|
1151
1282
|
],
|
|
1152
1283
|
"verify": [
|
|
1153
1284
|
"The design doc reads like a coherent human artifact.",
|