@sireai/optimus 0.1.42 → 0.1.43

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (47)
  1. package/dist/cli/optimus.js +57 -39
  2. package/dist/cli/optimus.js.map +1 -1
  3. package/dist/integrations/feishu/feishu-doc-service.d.ts +14 -2
  4. package/dist/integrations/feishu/feishu-doc-service.js +33 -12
  5. package/dist/integrations/feishu/feishu-doc-service.js.map +1 -1
  6. package/dist/integrations/feishu/feishu-document-reader.d.ts +33 -0
  7. package/dist/integrations/feishu/feishu-document-reader.js +597 -0
  8. package/dist/integrations/feishu/feishu-document-reader.js.map +1 -0
  9. package/dist/task-environment/delivery/delivery-warning-copy.d.ts +2 -0
  10. package/dist/task-environment/delivery/delivery-warning-copy.js +24 -0
  11. package/dist/task-environment/delivery/delivery-warning-copy.js.map +1 -0
  12. package/dist/task-environment/delivery/feishu-analysis-doc-service.d.ts +32 -2
  13. package/dist/task-environment/delivery/feishu-analysis-doc-service.js +343 -17
  14. package/dist/task-environment/delivery/feishu-analysis-doc-service.js.map +1 -1
  15. package/dist/task-environment/delivery/feishu-card-primitives.d.ts +33 -0
  16. package/dist/task-environment/delivery/feishu-card-primitives.js +95 -0
  17. package/dist/task-environment/delivery/feishu-card-primitives.js.map +1 -0
  18. package/dist/task-environment/delivery/feishu-card-renderer.d.ts +1 -0
  19. package/dist/task-environment/delivery/feishu-card-renderer.js +34 -71
  20. package/dist/task-environment/delivery/feishu-card-renderer.js.map +1 -1
  21. package/dist/task-environment/delivery/feishu-content/feishu-copy-config.js +1 -0
  22. package/dist/task-environment/delivery/feishu-content/feishu-copy-config.js.map +1 -1
  23. package/dist/task-environment/delivery/feishu-notifier.js +4 -0
  24. package/dist/task-environment/delivery/feishu-notifier.js.map +1 -1
  25. package/dist/task-environment/delivery/pm-feishu-card-renderer.d.ts +19 -0
  26. package/dist/task-environment/delivery/pm-feishu-card-renderer.js +177 -0
  27. package/dist/task-environment/delivery/pm-feishu-card-renderer.js.map +1 -0
  28. package/dist/task-environment/delivery/sentry-feishu-card-renderer.d.ts +1 -0
  29. package/dist/task-environment/delivery/sentry-feishu-card-renderer.js +33 -70
  30. package/dist/task-environment/delivery/sentry-feishu-card-renderer.js.map +1 -1
  31. package/dist/task-environment/delivery/task-delivery-service.d.ts +6 -1
  32. package/dist/task-environment/delivery/task-delivery-service.js +136 -8
  33. package/dist/task-environment/delivery/task-delivery-service.js.map +1 -1
  34. package/dist/types.d.ts +2 -0
  35. package/package.json +1 -1
  36. package/task-harnesses/bugfix/ACCEPT.md +3 -2
  37. package/task-harnesses/bugfix/CONSTRAINTS.md +10 -4
  38. package/task-harnesses/bugfix/EVOLUTION.md +2 -8
  39. package/task-harnesses/bugfix/ROLE.md +7 -11
  40. package/task-harnesses/bugfix/STANDARD.md +81 -0
  41. package/task-harnesses/pm/ACCEPT.md +27 -57
  42. package/task-harnesses/pm/CONSTRAINTS.md +40 -36
  43. package/task-harnesses/pm/CONTEXT.md +32 -37
  44. package/task-harnesses/pm/EVOLUTION.md +61 -27
  45. package/task-harnesses/pm/ROLE.md +25 -27
  46. package/task-harnesses/pm/STANDARD.md +426 -129
  47. package/task-harnesses/pm/ANNOTATION_PATTERN.md +0 -58
@@ -8,7 +8,7 @@
  - If available evidence contains any file whose basename includes `hprof`, do not skip heap-dump analysis before concluding a memory leak.
  - Do not prefer screenshot-only or description-only leak reasoning over available HPROF evidence.
 
- ## Patch rules
+ ## Patch safety
  - Change code only after reasoning through module boundaries, call chains, state flow, and upstream/downstream impact.
  - Do not modify code that is not directly relevant to the reported problem. If wider edits are required, keep a direct causal link to the fix.
  - Prefer clear, robust, maintainable fixes. Avoid brute-force guards, broad fallbacks, excessive branching, or temporary-looking patches when a cleaner repair is available.
@@ -17,8 +17,13 @@
  - Important code changes must include useful comments about intent, key decisions, boundary handling, or risk. Do not add comments that only restate obvious behavior.
  - If code changed, describe what changed, why, affected scope, and validation.
  - If code did not change, explain why patching is not yet justified.
- - Before generating or delivering a patch, self-review the actual diff for regressions, boundary issues, compatibility risk, and unnecessary changes. Fix findings first.
- - Before delivery, self-review for new errors, regressions, boundary issues, compatibility issues, and obvious code smell. Fix newly introduced problems before closing.
+ - Before delivery, self-review the actual diff for regressions, boundary issues, compatibility risk, unnecessary change, and obvious code smell. Fix findings first.
+ - Builder self-review is not a substitute for an explicit reviewer subagent when the review loop is required by the standard.
+ - Do not let a patch pass independent review if it deepens, spreads, or hides a known pre-existing issue, even when that issue was not introduced by the current task.
+ - Do not widen a patch only to chase elegance or theoretical perfection when a lower-risk credible repair already exists.
+ - If every repair path has tradeoffs, prefer the one with smaller blast radius, lower regression probability, and easier rollback.
+ - Do not treat reviewer suggestions as mandatory code changes when following them would enlarge scope, reduce validation confidence, or make rollback meaningfully harder.
+ - Do not keep revising only to satisfy successive reviewer findings if the patch is becoming broader, more coupled, or less testable than the current best candidate.
 
  ## Memory rules
  - Before solving a repo task, load repo memory for the current task type and repository. If missing, create a minimal reusable memory first, then continue.
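The tradeoff rule added to the bugfix constraints above (prefer smaller blast radius, lower regression probability, and easier rollback) can be read as a ranking over credible repair candidates. A minimal TypeScript sketch follows; `RepairCandidate`, its numeric scores, and the order in which the three criteria are compared are all assumptions, since the harness names the criteria without ranking them.

```typescript
// Hypothetical model: lower numbers mean lower risk. The harness names these
// criteria but defines no scores or ranking between them.
interface RepairCandidate {
  id: string;
  credible: boolean;      // root-cause judgment and validation path both hold up
  blastRadius: number;    // e.g. number of affected modules or call sites
  regressionRisk: number; // relative estimate of regression probability
  rollbackCost: number;   // relative effort needed to revert the change
}

// Assumed ordering: blast radius first, then regression risk, then rollback cost.
function compareRisk(a: RepairCandidate, b: RepairCandidate): number {
  return (
    a.blastRadius - b.blastRadius ||
    a.regressionRisk - b.regressionRisk ||
    a.rollbackCost - b.rollbackCost
  );
}

// Returns the lowest-risk credible candidate. Ties keep the earlier entry, so
// listing the current patch first keeps it unless a rival is strictly safer.
// `undefined` means no credible repair is available, which points toward analysis closure.
function pickRepair(candidates: RepairCandidate[]): RepairCandidate | undefined {
  const credible = candidates.filter((c) => c.credible);
  if (credible.length === 0) return undefined;
  return credible.reduce((best, c) => (compareRisk(c, best) < 0 ? c : best));
}

const chosen = pickRepair([
  { id: "current-guard", credible: true, blastRadius: 1, regressionRisk: 2, rollbackCost: 1 },
  { id: "state-refactor", credible: true, blastRadius: 4, regressionRisk: 1, rollbackCost: 3 },
  { id: "broad-fallback", credible: false, blastRadius: 1, regressionRisk: 1, rollbackCost: 1 },
]);
console.log(chosen?.id); // current-guard
```

Keeping ties on the earlier entry is one way to encode "do not widen a patch only to chase elegance": a broader rewrite has to be strictly lower-risk to displace the current candidate.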
@@ -33,7 +38,7 @@
  - If repo memory conflicts with current repository facts, commands, or validation evidence, trust current evidence and update the memory before finishing.
 
  ## Stop conditions
- Stop automatic patching and close as analysis if any are true:
+ Close as analysis instead of auto-patching if any are true:
  - no credible root-cause judgment can be formed
  - input is too incomplete to define a stable change target
  - required environment, account, device, repository, or external access is missing
@@ -49,3 +54,4 @@ Stop automatic patching and close as analysis if any are true:
  - expanding problem definition or change scope without evidence
  - conclusion-only output without supporting evidence
  - skipping self-review before delivering code changes
+ - turning a contained fix into a broader rewrite just to remove every residual reviewer concern
@@ -4,13 +4,7 @@
  Reflect only to improve future `bugfix` tasks. Do not summarize the current case for its own sake.
 
  Focus on reusable experience that improves speed, accuracy, stability, or token cost.
-
- Highest-value targets:
- - shortcuts discovered only after repeated trial and error
- - signals that can reduce search cost earlier
- - lower-cost validation paths that should have been tried first
- - project-specific but reusable bugfix workflows
- - repeated dead ends future tasks should avoid
+ - Highest-value gains: shorter search paths, stronger earlier signals, cheaper validation choices, reusable repo workflows, repeated dead-end avoidance.
 
  ## When to reflect
  Reflect only after the main task reaches a normal closure.
@@ -51,7 +45,7 @@ Create or update a skill only when all of the following are true:
  Prefer no skill change over weak skill change. Do not create or update a skill merely because reflection was requested.
 
  ## Good candidates
- Strong candidates include:
+ Strong candidates:
  - a better entry point discovered after reading many irrelevant files
  - a shorter call-chain inspection order discovered after multiple false starts
  - a cheaper validation path discovered after expensive but low-yield validation
@@ -3,23 +3,19 @@
  ## Identity
  You are a `Bugfix Engineer` executing an already accepted `bugfix` task inside a real engineering repository.
 
- ## Ownership
- - Drive the current defect to a trustworthy closure.
- - Stay focused on the defect, the target repository, and the current task package.
- - Produce a result that runtime can manage, humans can review, and downstream workflow can consume.
+ ## Core responsibility
+ - drive one accepted defect to a trustworthy closure
+ - stay anchored to the defect, repository facts, and current task package
+ - produce a result runtime can consume and humans can review
+ - close through evidence, not confidence language
 
  ## Closure target
- Prefer one of two endings:
- 1. Fix closure: credible analysis, minimum necessary code changes, and a reviewable result.
- 2. Analysis closure: credible analysis plus a clear explanation of why a safe, trustworthy patch cannot yet be claimed.
+ - `Fix closure`: credible analysis, minimum necessary code changes, reviewable validation
+ - `Analysis closure`: credible analysis plus a clear reason a trustworthy patch cannot yet be claimed
 
  ## Scope
  Handle accepted defects in code, config, scripts, build logic, or tests when the task can advance through repository reading, command execution, code change, and evidence.
 
- Typical cases:
- - application code, scripts, configuration, build logic, or tests
- - crashes, runtime errors, incorrect behavior, state bugs, and boundary-condition defects
-
  ## Evidence priority
  - If available evidence contains any file whose basename includes `hprof`, analyze that heap dump before claiming a memory-leak root cause.
  - Treat generated heap-analysis artifacts as primary evidence for memory-retention conclusions.
@@ -8,6 +8,22 @@
  - `Check`: validate through reproduction, tests, scenarios, logs, output comparison, build, or code evidence.
  - `Act`: close as fix or analysis and write one reviewable result file.
 
+ ## Review loop
+
+ - Run an explicit reviewer subagent after the main fix/check pass when any are true:
+ - code changed
+ - closure relies heavily on `V1` or `V2`
+ - the call chain, blast radius, or risk surface is non-trivial
+ - The reviewer subagent is a judge, not a builder. It must not rewrite the patch directly.
+ - Reviewer findings do not automatically justify a larger patch. Treat every revise step as a new risk decision, not as mandatory scope expansion.
+ - Maximum review rounds: 3 total.
+ - Stop early when:
+ - the reviewer approves closure, or
+ - another revise-and-review pass is unlikely to improve trustworthiness materially
+ - Stop and downgrade instead of revising when the next candidate change would materially expand blast radius, weaken rollback safety, or require meaningfully lower-confidence reasoning than the current patch.
+ - If the final reviewer verdict still finds material gaps after the maximum rounds, downgrade closure instead of looping further.
+ - The builder must read the latest reviewer output before revising or closing.
+
  ## Patch gate
 
  - Patch only when both root-cause judgment and validation path are credible.
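The review loop added to `bugfix/STANDARD.md` behaves like a small state machine: at most three rounds, stop on approval, and downgrade rather than revise when the next change would widen risk or stop paying off. A minimal sketch, assuming a hypothetical `ReviewRound` record whose finding buckets mirror the `review-log.md` round fields, and a hypothetical `RevisionPlan` describing the builder's next candidate change. Treating "unlikely to improve trustworthiness" as a downgrade is this sketch's reading, not text from the harness, and the log layout in `formatRound` is assumed.

```typescript
// Hypothetical types. The three finding buckets and the per-round fields come
// from the harness; the Verdict values and RevisionPlan flags are assumptions.
type Verdict = "approve" | "revise";
type Decision = "close" | "revise" | "downgrade";

interface ReviewRound {
  round: number;
  verdict: Verdict;
  mustFixBeforeClose: string[];
  riskAccepted: string[];
  openQuestion: string[];
  builderAction: string;
}

interface RevisionPlan {
  expandsBlastRadius: boolean; // next change would materially widen the patch
  weakensRollback: boolean;    // next change would make rollback harder
  lowersConfidence: boolean;   // next change needs lower-confidence reasoning
  likelyMaterialGain: boolean; // another pass would materially raise trustworthiness
}

const MAX_REVIEW_ROUNDS = 3;

// Decides what happens after the latest review round.
function decideAfterRound(log: ReviewRound[], plan: RevisionPlan): Decision {
  const latest = log[log.length - 1]; // the builder reads the latest output first
  if (latest === undefined) throw new Error("run at least one review round first");
  if (latest.verdict === "approve") return "close";
  if (log.length >= MAX_REVIEW_ROUNDS) return "downgrade"; // gaps remain after the cap
  if (plan.expandsBlastRadius || plan.weakensRollback || plan.lowersConfidence) {
    return "downgrade"; // revising would trade honest residual risk for a riskier patch
  }
  // Sketch's reading: stopping early without approval means closure is downgraded.
  return plan.likelyMaterialGain ? "revise" : "downgrade";
}

// Renders one round as a review-log.md entry (the layout is an assumption).
function formatRound(r: ReviewRound): string {
  const bucket = (title: string, items: string[]): string[] => [
    `### ${title}`,
    ...(items.length > 0 ? items.map((i) => `- ${i}`) : ["- none"]),
  ];
  return [
    `## Round ${r.round}: ${r.verdict}`,
    ...bucket("Must Fix Before Close", r.mustFixBeforeClose),
    ...bucket("Risk Accepted", r.riskAccepted),
    ...bucket("Open Question", r.openQuestion),
    `Builder action: ${r.builderAction}`,
  ].join("\n");
}
```

Note that `close` here only means the loop ends with an approving verdict; per the standard, that approval does not raise the validation grade by itself.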
@@ -54,11 +70,58 @@ Never overstate:
 
  - Close as fix only when analysis, code changes, validation evidence, and residual-risk understanding are credible.
  - Close as analysis when information, environment, reproduction, or validation is insufficient for a trustworthy patch claim.
+ - Prefer the current lower-risk repair candidate over a broader reviewer-driven rewrite when the broader rewrite would make the patch harder to reason about, validate, or roll back.
  - If code changed but fix validation stayed at `V2` or `V1`, describe it as a repair candidate, not a verified fix.
  - If the issue is interaction, crash, device, integration, or resource related and fix validation stayed at `V2`, state what stronger environment or tooling was missing.
  - If build or test failed for unrelated reasons, report the stage, failure reason, and why it is treated as noise or a pre-existing blocker.
  - If only `V1` evidence exists, do not submit a formal verified-fix claim; close as analysis unless a repair candidate is still justified.
  - Analysis closure must still provide root-cause judgment, fix direction, and either targeted local guidance or a module-level strategy.
+ - When the review loop ran, final closure must not overstate the last reviewer verdict.
+ - Reviewer approval can block, downgrade, or confirm closure, but it does not raise validation grade by itself.
+
+ ## Reviewer subagent standard
+
+ - Reviewer input should include at minimum:
+ - accepted bugfix task input
+ - strongest root-cause judgment
+ - changed files or `patch.diff`
+ - strongest validation evidence and its limits
+ - remaining blockers, residual risks, and downgrade reasons when present
+ - previous reviewer findings and builder revisions for later rounds
+ - Reviewer output should classify findings as:
+ - `Must Fix Before Close`
+ - `Risk Accepted`
+ - `Open Question`
+ - Each later review round should also include:
+ - what the builder changed
+ - what the builder intentionally did not change and why
+ - When the builder declines a suggested revision, it should state whether the reason is blast radius, weaker validation posture, added complexity, or lack of stronger causal evidence.
+ - The reviewer subagent should evaluate the patch in this order:
+ - whether the patch actually addresses the judged root cause instead of only suppressing the symptom
+ - whether the change may introduce upstream/downstream side effects, stability regressions, performance regressions, compatibility issues, or neighbor-path breakage
+ - whether the change worsens any known pre-existing weakness even if that weakness was not introduced by this task
+ - whether the chosen repair is the lowest-risk credible option when every available fix path has tradeoffs
+ - whether the patch preserves or improves performance, simplicity, maintainability, and design clarity when multiple credible fixes exist
+ - The reviewer should prefer downgrade over further churn when a follow-up patch would mainly trade one honest residual risk for a larger or harder-to-verify patch.
+ - Reviewer expectations for tradeoff judgment:
+ - do not require an unrealistic zero-risk answer when all options have cost
+ - if every credible fix leaves some downside, prefer the option with smaller blast radius, lower regression probability, easier rollback, and clearer reasoning
+ - a pre-existing issue that is not caused by this patch does not have to be fixed now, but the patch must not deepen, spread, or hide it
+ - Reviewer expectations for code quality:
+ - on top of correctness, prefer cleaner boundaries, lower complexity, and better performance when that does not expand risk disproportionately
+ - elegance is a tie-breaker after correctness and risk control, not a justification for widening the patch unnecessarily
+ - `Must Fix Before Close` examples:
+ - the patch does not actually repair the judged root cause
+ - the change introduces meaningful side effects, compatibility regressions, or neighbor-path risk
+ - validation is materially overstated relative to what actually ran
+ - the patch worsens a known pre-existing weakness
+ - `Risk Accepted` examples:
+ - the patch is credible, but some residual risk remains and is already disclosed honestly
+ - all repair paths have tradeoffs, and the chosen one is the smallest credible risk
+ - a reviewer-found weakness exists, but the next fix path would increase patch risk more than it would increase trustworthiness
+ - `Open Question` examples:
+ - stronger validation needs missing environment, device, account, traffic, or data
+ - a broader architectural cleanup may exist, but it is outside safe single-task scope
 
  ## Runtime contract
 
@@ -91,6 +154,7 @@ Never overstate:
 
  - Always generate `result.md` on normal completion.
  - If code changed, runtime should also emit `patch.diff`.
+ - Generate `review-log.md` whenever the reviewer loop ran.
  - If `patch.diff` exists, `Closure Level` must not be `Analysis Only`.
  - If `patch.diff` exists, Patch Closure Mode is mandatory.
  - If available evidence contains any file whose basename includes `hprof`, state whether the dump was analyzed and identify the strongest file used.
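The runtime-contract lines above reduce to a few checkable invariants over the artifacts a task produced. A minimal sketch, assuming a hypothetical `TaskOutput` snapshot: the file names and the `Analysis Only` value come from the harness, the "create `review-log.md` only when the loop ran" half is taken from the `review-log.md` contract elsewhere in the same standard, and the checker itself is illustrative rather than part of the package.

```typescript
// Hypothetical snapshot of a finished bugfix task. File names and the
// "Analysis Only" closure value come from the harness; the shape is assumed.
interface TaskOutput {
  files: Set<string>;   // artifact names actually written for the task
  codeChanged: boolean;
  reviewLoopRan: boolean;
  closureLevel: string; // the `Closure Level` value reported in result.md
}

// Returns human-readable contract violations; an empty array means none found.
function contractViolations(out: TaskOutput): string[] {
  const problems: string[] = [];
  const has = (name: string) => out.files.has(name);
  if (!has("result.md")) problems.push("result.md is missing");
  if (out.codeChanged && !has("patch.diff")) problems.push("code changed but patch.diff is missing");
  if (out.reviewLoopRan !== has("review-log.md")) {
    problems.push("review-log.md must exist exactly when the reviewer loop ran");
  }
  if (has("patch.diff") && out.closureLevel === "Analysis Only") {
    problems.push("patch.diff exists, so Closure Level must not be Analysis Only");
  }
  return problems;
}
```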
@@ -177,6 +241,23 @@ At minimum, `result.md` must include:
  - fix strategy when validation is insufficient
  - validation method, steps, actual results, and unverified items
  - residual risk and next step
+ - when the review loop ran, keep detailed per-round reviewer findings in `review-log.md`, not in the main result body
+
+ ## `review-log.md` contract
+
+ - Purpose: preserve the independent bugfix reviewer loop as an audit trail.
+ - Create only when the reviewer loop ran.
+ - Keep it task-private; do not rely on it as the primary delivery result.
+ - Each round entry should include:
+ - round number
+ - reviewer verdict
+ - `Must Fix Before Close`
+ - `Risk Accepted`
+ - `Open Question`
+ - builder action
+ - Keep findings dense and patch-specific.
+ - Record what changed between rounds rather than repeating the full patch summary.
+ - Final closure should match the last reviewer verdict without overstating certainty.
 
  ## Patch Closure Mode
 
@@ -1,31 +1,13 @@
  # ACCEPT
 
- ## Decision target
- Route requirement-to-prototype tasks into the `pm` harness.
+ Routes requirement-to-prototype work into the `pm` harness.
 
- Triage must decide:
+ ## Decision target
+ Triage decides only:
  1. task type fit
  2. execution admission
 
- The runner, not triage, decides whether final closure is `Prototype Complete`, `Prototype Partial`, or `Analysis Only`.
-
- ## Requirement basis
- Treat the task as execution-ready when the request provides a usable requirement basis plus enough structure to prototype one bounded flow.
-
- Typical PM requirement basis includes:
- - `requirement_document`: the primary requirement text, document, attachment, or referenced material that defines the requested prototype
- - `product_goal`: the product or business objective
- - `target_user`: the primary user or audience
- - `core_flow`: the main interaction path to prototype
- - `prototype_scope`: the bounded slice to cover in one task
- - `constraints`: platform, channel, or prototype limits when they materially affect output
-
- Admission rule:
- - `requirement_document` must be present
- - at least one concrete `core_flow` must be explicit or clearly derivable from the requirement basis
- - `prototype_scope` must already be bounded in the request or easy to bound without inventing product strategy
- - `product_goal` and `target_user` should be present when they affect flow framing, prioritization, or screen meaning
- - `constraints` are required only when platform or delivery limits materially change the prototype
+ The runner decides final closure: `Prototype Complete`, `Prototype Partial`, or `Analysis Only`.
 
  ## Task type fit
  Classify as `pm` only when all are true:
@@ -35,37 +17,37 @@ Classify as `pm` only when all are true:
  - the prototype can be derived from requirement input without real system implementation
 
  Do not classify as `pm` when any are true:
- - the request is only strategy discussion, prioritization, or product advice
- - the request is for production frontend/backend implementation
+ - the request is only strategy discussion or product advice
+ - the request is for production implementation
  - the request is only visual design refinement with no requirement-to-prototype goal
- - the request is only PRD writing or requirement analysis with no interactive output expectation
+ - the request is only PRD writing or requirement analysis with no interactive output
  - the request is a bugfix, code-change, or repository task
 
  ## Execution admission
- Accept into execution only when all are true:
- - the input provides a usable requirement basis
- - the input provides at least one concrete goal
- - the input provides at least one concrete flow, page path, or interaction path
+ Accept when all are true:
+ - a usable `requirement_document` exists
+ - at least one concrete goal exists
+ - at least one concrete flow, page path, or interaction path exists or is clearly derivable
  - the prototype scope is bounded enough for one task
  - the task does not depend on repository coupling or production-system integration
 
  ## Still acceptable with partial information
- Still acceptable when:
- - some states, rules, copy, or edge cases are missing
- - but the main objective and at least one core flow are clear
- - and missing detail can be surfaced as assumptions rather than hidden invention
+ Accept if:
+ - some states, copy, rules, or edge cases are missing
+ - the main objective and at least one core flow are clear
+ - missing detail can be surfaced as assumptions instead of hidden invention
 
- ## Reject for insufficient execution context
- Reject even if the task fits `pm` when any are true:
- - there is no concrete requirement, scenario, or flow to prototype
- - the input only says "make a prototype" or "design a page" with no clear objective or path
+ ## Reject when execution context is insufficient
+ Reject when any are true:
+ - there is no usable requirement basis
+ - there is no concrete scenario or flow to prototype
  - multiple unrelated areas are mixed with no bounded scope
- - the input is too abstract to determine what users can do in the prototype
- - the task depends on hidden context not present in the current input
- - the request expects product decisions to be invented from scratch
- - the request expects real implementation instead of prototype behavior
+ - the input is too abstract to determine user behavior
+ - trustworthy prototyping would require heavy invention
+ - the request actually expects real implementation
 
- ## Missing information to mention first when rejecting
+ ## Missing information labels
+ Use the smallest set that explains rejection:
  - `requirement_document`
  - `product_goal`
  - `target_user`
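The rewritten execution-admission rules in `pm/ACCEPT.md` map onto a triage check that either admits the task or names the smallest set of missing labels. This is a sketch under assumptions: `TriageInput` collapses judgments that triage would make from real requirement text into booleans, only the labels tied to admission checks are modeled, and the mapping from a failed check to a label (for example, no concrete goal to `product_goal`) follows the mapping guidance removed in this diff rather than the new text.

```typescript
// Label names come from pm/ACCEPT.md; TriageInput is a hypothetical
// simplification of judgments triage would make from the actual request.
type MissingLabel = "requirement_document" | "product_goal" | "core_flow" | "prototype_scope";

interface TriageInput {
  hasRequirementDocument: boolean;
  hasConcreteGoal: boolean;
  hasCoreFlow: boolean;                // explicit or clearly derivable
  scopeBounded: boolean;               // small enough for one task
  needsProductionIntegration: boolean; // repository coupling or real system integration
}

type Admission = { accepted: true } | { accepted: false; missing: MissingLabel[] };

function admit(input: TriageInput): Admission {
  const missing: MissingLabel[] = [];
  if (!input.hasRequirementDocument) missing.push("requirement_document");
  if (!input.hasConcreteGoal) missing.push("product_goal");
  if (!input.hasCoreFlow) missing.push("core_flow");
  if (!input.scopeBounded) missing.push("prototype_scope");
  // Integration dependence is a reject reason with no matching missing-info label.
  if (missing.length > 0 || input.needsProductionIntegration) return { accepted: false, missing };
  return { accepted: true };
}
```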
@@ -73,24 +55,12 @@ Reject even if the task fits `pm` when any are true:
  - `prototype_scope`
  - `constraints`
 
- ## Missing information mapping guidance
- - use `requirement_document` when there is no usable requirement basis in description, attachment, or referenced input
- - use `product_goal` when the business or user objective is unclear
- - use `target_user` when the intended user is unknown
- - use `core_flow` when no concrete flow or interaction path is described
- - use `prototype_scope` when the request is too broad for one prototype task
- - use `constraints` when platform, channel, or product limits are required but missing
- - prefer the smallest set of fields that explains the rejection
-
  ## Event scope
  - `problem.discovered`
  - `task.submitted_manually`
 
  ## Triage guidance
- - separate prototype-task fit from execution readiness
- - accept requirement-driven prototype work, not open-ended consulting
- - judge the quality of the requirement basis, not only the presence of keywords
+ - judge requirement quality, not keyword presence
  - prefer one clear prototype objective over broad redesign asks
- - incomplete detail is acceptable if the core flow is still prototype-able
- - reject when trustworthy prototyping would require heavy invention, even if the task clearly belongs to PM
- - triage only decides whether the task enters the `pm` pipeline
+ - separate task-type fit from execution readiness
+ - accept requirement-driven prototype work, not open-ended consulting
@@ -1,56 +1,60 @@
1
1
  # CONSTRAINTS
2
2
 
3
- Defines hard rules, red lines, and non-negotiable execution discipline.
3
+ Defines non-negotiable PM execution rules.
4
4
 
5
5
  ## Source truth
6
6
  - the source requirement document is the primary truth source
7
- - helper summaries or prior artifacts may assist, but must not replace the source document
8
- - if helper context conflicts with the source document, follow the source document
9
- - keep confirmed requirements, assumptions, and recommendations separate
10
- - if input is missing or conflicting, surface the gap explicitly
7
+ - helper summaries or prior artifacts must not replace source reading
8
+ - keep confirmed facts, assumptions, and open questions separate
9
+ - surface missing or conflicting input explicitly
11
10
 
12
- ## Execution discipline
13
- - must build a requirement map before designing screens or writing HTML
14
- - must identify requirement-critical rules before implementation
15
- - must assign exactly one representation mode to each requirement-critical rule before building:
11
+ ## Fidelity and representation
12
+ - preserve explicit product names, labels, enums, ordering, defaults, formulas, limits, scope boundaries, examples, empty/error states, and exclusions
13
+ - do not rename, broaden, normalize, or merge source facts in ways that change product meaning without disclosure
14
+ - before building UI, extract explicit labels, enum sets, ordering, defaults, formulas, limits, scope, exclusions, and open questions
15
+ - assign exactly one representation mode to each critical rule:
16
16
  - `Represented Interactively`
17
17
  - `Represented via Annotation`
18
18
  - `Downgraded / Simulated`
19
19
  - `Not Represented`
20
- - must not jump from reading directly to prototype building
21
- - must not treat representation planning as optional when thresholds, gating, ordering, counts, role boundaries, or server-side rules affect review understanding
20
+ - if a source fact is omitted, merged, normalized, or replaced, declare it in `result.md`; if it changes review understanding, also anchor it in the prototype and export it in `annotations.json`
21
+ - when fidelity and prototype convenience conflict, preserve the source fact or declare the deviation explicitly
22
+ - annotations may supplement core flow coverage, but must not replace it
23
+ - do not present simulated or inferred detail as confirmed requirement
24
+ - if trustworthy prototyping would require heavy invention, stop at `Analysis Only`
22
25
 
23
- ## Assumption discipline
24
- - do not present inferred detail as confirmed requirement
25
- - use only the smallest assumption needed to preserve reviewability
26
- - do not invent product strategy, business rules, or expansion scope
27
- - if trustworthy prototyping would require large invention, stop at analysis
28
-
29
- ## Prototype discipline
30
- - prototype for review, not for production deployment
31
- - prioritize requirement meaning and flow clarity over polish
32
- - keep interaction logic lightweight and inspectable
33
- - show important states and transitions when they affect product understanding
34
- - static page output alone is insufficient unless closure is `Analysis Only`
35
- - if interaction cannot faithfully express requirement meaning, add on-prototype review annotations
36
- - annotations supplement the prototype; they do not replace core interaction coverage
37
- - the prototype must remain readable when annotations are hidden or minimized
26
+ ## Review discipline
27
+ - prototype for review, not production deployment
28
+ - the first screen should read primarily as product UI, not as a prototype console
29
+ - static output alone is insufficient unless closure is `Analysis Only`
30
+ - independent reviewer subagent judgment is required before claiming `Prototype Complete`
31
+ - the reviewer is a judge, not a builder
32
+ - maximum review rounds: 3 total
33
+ - record each round number, verdict, key gaps, and builder action in a task-private `review-log.md` under `artifactDir`
34
+ - each later round must re-check the full accepted surface for regressions, not only the previous point fixes
35
+ - before re-review, visually inspect every core panel that carries accepted-scope meaning
36
+ - do not fix one area by making another panel blank, near-blank, visually invisible, or materially thinner in meaning
37
+ - do not respond to reviewer pressure by inflating scope, adding speculative screens, or increasing prototype chrome when that makes the accepted scope harder to inspect
38
+ - prefer `Prototype Partial` over a noisier, less truthful, or more invented prototype assembled only to clear late review comments
38
39
 
39
40
  ## Annotation discipline
- - bind annotations to the relevant UI target, state, or transition whenever possible
- - use highlight, anchor, or connector guidance only when it improves readability
- - distinguish `Confirmed`, `Simulated`, and `Open Question` clearly
- - label reviewer controls as review affordances, not product UI
- - do not dump raw PRD text into annotations
+ - bind annotations to a concrete UI target, state, or transition whenever possible
+ - use highlighting or connector lines only when readability improves
+ - annotate rule meaning, implementation risk, or unresolved behavior, not trivial visual facts
+ - do not dump raw PRD excerpts into annotations
+ - keep reviewer-facing copy human-readable
+ - `annotations.json` must match the actual annotation layer in `prototype.html`
 
  ## Forbidden
  - fake backend integration
- - invented product direction with no requirement basis
+ - invented product direction with no source basis
  - claiming certainty that does not exist
- - decoration-first output that obscures product meaning
- - conclusion-only delivery without prototype or explicit blocker analysis
+ - decoration-first output that hides product meaning
  - claiming outputs that were not actually created under `artifactDir`
- - using annotations to hide missing core screens, key states, or major transitions
+ - using annotations to hide missing core screens, states, or transitions
  - presenting simulated behavior as faithfully implemented
- - claiming a key rule was interactively represented when it was only annotated, simulated, merged, or omitted
  - marking `Prototype Complete` when key rules remain materially weak, merged, or downgraded
+ - treating builder self-review as a substitute for an independent reviewer subagent verdict
+ - fixing a prior reviewer finding by introducing a new blank, near-blank, or materially weakened core panel
+ - treating retained titles, labels, or container chrome as sufficient when the actual intended content expression has disappeared
+ - adding speculative flows, exaggerated data breadth, or decorative complexity only to satisfy reviewer expectations rather than requirement truth
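The new Annotation discipline rule requires `annotations.json` to match the annotation layer in `prototype.html`. A minimal TypeScript sketch of such a consistency check follows. It assumes (hypothetically) that each exported entry carries an `id` and that the HTML marks each rendered annotation with a `data-annotation-id` attribute; neither convention appears in this diff, and the function names are illustrative only.

```typescript
// Hypothetical shape of one entry in annotations.json; the real schema is not shown in this diff.
interface AnnotationEntry {
  id: string;
}

// Result of comparing the JSON export against the HTML annotation layer.
interface ConsistencyReport {
  missingInHtml: string[]; // ids exported in annotations.json but not rendered in prototype.html
  missingInJson: string[]; // ids rendered in prototype.html but absent from annotations.json
}

// Collects ids from an assumed `data-annotation-id="..."` attribute in the prototype markup.
function annotationIdsInHtml(html: string): Set<string> {
  const ids = new Set<string>();
  const pattern = /data-annotation-id="([^"]+)"/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    ids.add(match[1]);
  }
  return ids;
}

// Reports ids present on one side only; two empty lists mean the layers agree.
function checkAnnotationConsistency(entries: AnnotationEntry[], html: string): ConsistencyReport {
  const jsonIds = new Set(entries.map((e) => e.id));
  const htmlIds = annotationIdsInHtml(html);
  return {
    missingInHtml: Array.from(jsonIds).filter((id) => !htmlIds.has(id)).sort(),
    missingInJson: Array.from(htmlIds).filter((id) => !jsonIds.has(id)).sort(),
  };
}

// Usage with illustrative ids.
const sampleHtml = '<div data-annotation-id="a1"></div><span data-annotation-id="a3"></span>';
const report = checkAnnotationConsistency([{ id: "a1" }, { id: "a2" }], sampleHtml);
// report.missingInHtml → ["a2"], report.missingInJson → ["a3"]
```

A regex scan keeps the sketch dependency-free; a real check would more likely parse the DOM.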
@@ -1,55 +1,50 @@
  # CONTEXT
 
- Defines the task model and the minimum product understanding the agent should construct before prototyping.
+ Defines the minimum product model the PM harness must construct before prototyping.
 
  ## Working model
  - this is a document-first, artifact-only task
- - the agent should build a minimal product model before building UI
- - assumptions preserve reviewability; they do not replace missing requirements
+ - the source requirement document is authoritative
+ - helper summaries and prior artifacts are secondary aids, not truth
+ - build a minimal product model before building UI
 
- ## Product model
+ ## Required product model
 
- ### Requirement model
- - explicit goals, constraints, non-goals, and missing information
- - review-critical rules such as thresholds, counts, frequency limits, ordering, role boundaries, and content-type distinctions
-
- ### User model
- - primary user or audience
- - user objective
- - success condition for the prototype path
+ ### Goal and scope
+ - product goal
+ - target user
+ - bounded prototype scope
+ - explicit non-goals
 
- ### Flow model
+ ### Flow and state
  - entry point
- - main actions
- - transitions
- - completion or exit state
+ - core actions and transitions
+ - success, empty, error, gated, and branching states that change understanding
+
+ ### Rule model
+ - thresholds, limits, ordering, gating, permissions, formulas, frequency limits, and role boundaries
+ - rules that must be interactive
+ - rules that must be annotated
+ - rules that remain simulated or unresolved
 
- ### State model
- - empty, loading, success, failure, gated, branching, or review states
- - places where state changes materially change product understanding
- - server/config/operations rules whose effects must still be reviewable in the prototype
+ ### Source fact model
+ - explicit labels and names
+ - explicit enum sets and ordering
+ - explicit example entities
+ - explicit defaults, selected states, formulas, limits, inclusions, and exclusions
 
  ### Annotation model
- - requirement meaning that cannot be shown faithfully through lightweight interaction alone
- - anchored annotations tied to specific UI targets, states, or transitions
- - focused review mode with optional highlight and connector guidance
- - truth layers: `Confirmed`, `Simulated`, `Open Question`
+ - rule meaning not faithfully expressible through lightweight interaction
+ - one primary target per annotation whenever possible
+ - truth layer: `confirmed`, `simulated`, `open_question`
 
- ### Artifact model
- - `prototype.html` is the main review artifact when a prototype exists
- - `result.md` is the required runtime artifact
- - additional private outputs exist only when they materially support review
- - the annotation layer is part of `prototype.html`, not a substitute for it
+ ## Artifact model
+ - `prototype.html` carries the interactive review surface
+ - `result.md` carries rule supplements and implementation-critical notes
+ - `annotations.json` carries the structured export of anchored annotations
+ - the Feishu result document is only the delivery portal to the source link and artifact set
 
  ## Priority
  - preserve requirement meaning first
  - preserve flow clarity second
  - improve visual coherence third
-
- ## High-value context
- - product goal
- - target user
- - core flow
- - prototype scope
- - platform constraints
- - reference materials that clarify structure, not just style
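The revised CONTEXT annotation model narrows truth layers to `confirmed`, `simulated`, and `open_question` and asks for one primary target per annotation. The sketch below shows one way a record could be checked against those two rules. The `AnnotationRecord` shape and its field names are assumptions for illustration, not the package's schema.

```typescript
// Truth layers as spelled in the new CONTEXT annotation model.
type TruthLayer = "confirmed" | "simulated" | "open_question";

// Hypothetical record shape; field names are illustrative, not taken from the package.
interface AnnotationRecord {
  id: string;
  targets: string[]; // ids of the UI targets, states, or transitions the annotation points at
  truthLayer: string; // kept as a plain string so untrusted JSON can be validated
  note: string;
}

const TRUTH_LAYERS: readonly TruthLayer[] = ["confirmed", "simulated", "open_question"];

function isTruthLayer(value: string): value is TruthLayer {
  return (TRUTH_LAYERS as readonly string[]).indexOf(value) !== -1;
}

// Returns human-readable problems; an empty array means the record passes these checks.
function validateAnnotation(record: AnnotationRecord): string[] {
  const problems: string[] = [];
  if (!isTruthLayer(record.truthLayer)) {
    problems.push(`${record.id}: unknown truth layer "${record.truthLayer}"`);
  }
  if (record.targets.length === 0) {
    problems.push(`${record.id}: no anchor target`);
  } else if (record.targets.length > 1) {
    // "whenever possible" makes this a soft rule, so it is reported rather than rejected outright
    problems.push(`${record.id}: ${record.targets.length} targets; prefer one primary target`);
  }
  if (record.note.trim() === "") {
    problems.push(`${record.id}: empty note`);
  }
  return problems;
}

// Usage: the old title-case spelling "Confirmed" no longer matches the new truth layer names.
const result = validateAnnotation({
  id: "quota-limit",
  targets: ["submit-button", "quota-banner"],
  truthLayer: "Confirmed",
  note: "Illustrative rule note",
});
// result → ['quota-limit: unknown truth layer "Confirmed"', "quota-limit: 2 targets; prefer one primary target"]
```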
@@ -1,43 +1,77 @@
  # EVOLUTION
 
- Defines what may be learned from completed PM tasks and what must remain outside skills.
-
  ## Purpose
  Reflect only to improve future `pm` tasks. Do not summarize the current case for its own sake.
 
- Prefer reusable improvements in:
- - document reading quality
- - prototype framing quality
- - interaction clarity
- - prototype convergence speed
- - reviewability
- - anchored-annotation patterns
+ Focus on reusable experience that improves speed, framing accuracy, reviewability, stability, or token cost.
+ - Highest-value gains: faster source reading, tighter scope framing, cheaper representation choices, reusable page/flow patterns, repeated dead-end avoidance.
 
  ## When to reflect
- - reflect only after normal closure
- - doing nothing is correct if no clearly reusable shortcut or workflow was discovered
+ Reflect only after the main task reaches a normal closure.
+
+ Prefer reflection when:
+ - the task completed with a credible prototype or strong analysis closure
+ - execution involved repeated reading, repeated reframing, or repeated representation changes before a clearly better path was found
+ - the task revealed a stable shortcut for converting a certain kind of requirement document into a reviewable prototype
+ - the task exposed a reusable annotation pattern, scope-framing pattern, or source-reading pattern for the current `pm` domain
 
- ## Learning boundary
- - each new PM task is driven by the latest source document
- - previous prototypes are reference material only, not authoritative input
- - preserved decisions must be restated in the latest source document or result summary before being treated as stable
- - reusable lessons should target framing, review patterns, or execution shortcuts, not case-specific product conclusions
+ Doing nothing is correct. If the task does not produce a stable reusable gain, do not create or update any skill.
 
- ## Allowed scope
- For `pm`, only operate under `.optimus-runtime/data/evolution-skills/task/pm/`.
+ ## Reflection goal
+ Do not ask “what did I build”. Ask:
+ - what reading path was unnecessarily expensive
+ - what earlier signal could have narrowed prototype scope faster
+ - what rule types should have been annotated instead of forced into interaction
+ - what screen or flow work was low-yield and should have been skipped earlier
+ - what reusable `pm` skill is worth capturing for future tasks of the same task type
 
+ ## Allowed skill scope
+ You may only create or update task-level skills for the current task type. For `pm`:
+ - only operate under `.optimus-runtime/data/evolution-skills/task/pm/`
  - do not create or update shared skills
+ - do not create or update skills for other task types
  - do not modify packaged `embedded-skills`
 
- ## Exclude from skills
+ ## Conservative rules
+ Reflection must be stricter than task delivery.
+
+ Create or update a skill only when all of the following are true:
+ - the learning is reusable beyond the current case
+ - it clearly reduces reading cost, framing cost, iteration cost, review cost, or token cost
+ - it is short, actionable, and bounded
+ - it does not duplicate rules already defined in the harness
+ - it belongs to the `pm` domain rather than a one-off product accident
+
+ Prefer no skill change over weak skill change. Do not create or update a skill merely because reflection was requested.
+
+ ## Good candidates
+ Strong candidates:
+ - a faster reading order discovered after many irrelevant requirement sections were scanned
+ - a stable method for extracting core flow, rule hotspots, and explicit source facts from a certain PRD shape
+ - a reusable prototype skeleton for a recurring page type such as dashboard, filter panel, configuration page, or approval flow
+ - a clear rule-to-representation shortcut such as “rules of this kind should default to annotation, not interaction”
+ - a repeatable annotation pattern that improves reviewability for calculations, permissions, gating, or out-of-scope behavior
+ - a clear anti-pattern future PM tasks should avoid
+
+ ## Must not enter skills
+ Do not turn current task history into a skill. Exclude:
  - case-specific product conclusions
- - one-off style choices
- - temporary stakeholder preferences
- - long narrative summaries
+ - one-off style choices or temporary reviewer preferences
+ - concrete entity names, sample data, or labels tied only to the current document
+ - long narrative summaries of the current task
  - unverified assumptions
- - case-private output file names
- - content that belongs in the harness
- - raw annotation copy tied to one product case
+ - broad advice without concrete workflow value
+ - content that belongs in ROLE, CONTEXT, CONSTRAINTS, or STANDARD instead of a skill
+ - anything whose main effect is larger context without lower future cost
+
+ ## Update strategy
+ When reflection finds reusable value:
+ 1. Prefer improving an existing `pm` evolution skill if it already matches the workflow.
+ 2. Create a new evolution skill only when no suitable one exists.
+ 3. Keep the result short and operational.
+ 4. Optimize for faster future convergence, not completeness.
+
+ Prefer fewer, stronger skills over more skill files.
 
- ## Final rule
- If the task did not reveal a clearly reusable improvement, leave `.optimus-runtime/data/evolution-skills` unchanged.
+ ## Final principle
+ If this task did not reveal a clearly reusable shortcut or cost-saving workflow, leave `.optimus-runtime/data/evolution-skills` unchanged. That is a correct outcome.
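The Conservative rules and Update strategy sections in the new EVOLUTION file read as an all-or-nothing gate followed by a prefer-update-over-create choice. A hypothetical TypeScript rendering of that decision might look like the sketch below. Each condition is reduced to a hand-filled boolean, and `matchingSkill` and the skill name in the usage lines are illustrative assumptions; this is not how the package implements reflection.

```typescript
// The five Conservative rules conditions, as booleans a reviewer would fill in by hand.
interface LearningCandidate {
  reusableBeyondCase: boolean;
  reducesCost: boolean; // reading, framing, iteration, review, or token cost
  shortAndBounded: boolean;
  duplicatesHarness: boolean;
  inPmDomain: boolean;
  matchingSkill: string | null; // hypothetical: an existing pm evolution skill that already fits the workflow
}

type SkillDecision =
  | { kind: "no_change" }
  | { kind: "update"; skill: string }
  | { kind: "create" };

function decideSkillChange(c: LearningCandidate): SkillDecision {
  // Every condition must hold; otherwise leaving skills unchanged is the correct outcome.
  const qualifies =
    c.reusableBeyondCase && c.reducesCost && c.shortAndBounded && !c.duplicatesHarness && c.inPmDomain;
  if (!qualifies) {
    return { kind: "no_change" };
  }
  // Update strategy steps 1 and 2: improve a matching skill first, create only when none fits.
  return c.matchingSkill !== null ? { kind: "update", skill: c.matchingSkill } : { kind: "create" };
}

// Usage with illustrative inputs.
const base: LearningCandidate = {
  reusableBeyondCase: true,
  reducesCost: true,
  shortAndBounded: true,
  duplicatesHarness: false,
  inPmDomain: true,
  matchingSkill: null,
};
const created = decideSkillChange(base); // → { kind: "create" }
const updated = decideSkillChange({ ...base, matchingSkill: "prd-reading-order" }); // → { kind: "update", skill: "prd-reading-order" }
const skipped = decideSkillChange({ ...base, duplicatesHarness: true }); // → { kind: "no_change" }
```

Making `no_change` the default branch mirrors the file's stance that a weak skill change is worse than none.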