npm - @sireai/optimus - Versions diffs - 0.1.9 → 0.1.11 - Mend

@sireai/optimus 0.1.9 → 0.1.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (72)

package/task-harnesses/bugfix/STANDARD.md CHANGED

@@ -19,30 +19,45 @@
 ## Validation policy
 - A claimed fix requires validation.
-- Prefer stronger evidence before weaker evidence.
-- Report the strongest level reached and why stronger levels were unavailable.
+- Prefer the highest-reliability validation that is feasible now, not the cheapest one.
+- If a stronger level is blocked, state the blocker and downgrade explicitly.
+- Report exactly one strongest token and keep the method/result details below it.
+- Also report exactly one validation grade `V1` to `V5`.
-1. `L4 functional`: real device, simulator, or another directly runnable environment
-2. `L3 self-check`: local unit tests, targeted custom tests, lightweight scenario injection
-3. `L2 build`: relevant compile target, module build, or targeted test task
-4. `L1 code evidence`: static reasoning, call-chain review, diff review
+Reliability and cost order:
-### Android order
+1. `V5`: `device_verified` - highest confidence, highest cost
+2. `V4`: `simulator_verified`, `scenario_verified` - very high or high confidence, high or medium-high cost
+3. `V3`: `regression_tests_passed`, `unit_tests_passed` - medium-high or medium confidence, medium or medium-high cost
+4. `V2`: `module_build_passed`, `compile_passed`, `targeted_tests_passed` - partial executable proof, medium to lower confidence, low-medium to medium cost
+5. `V1`: `code_reviewed` - lowest confidence, lowest cost
-1. Prefer real-device or simulator validation when `adb` and runnable targets exist.
-2. Otherwise prefer local tests or scenario injection.
-3. Use device-side automated checks only when local validation cannot cover the behavior.
-4. Fall back to compile validation such as `compileDebugKotlin` or relevant unit tests.
-5. Use code-only validation only when all stronger forms are unavailable.
+Token contract:
+- `device_verified` -> `V5`: real device, real business path, fix behavior observed
+- `simulator_verified` -> `V4`: simulator/emulator, real business path, fix behavior observed
+- `scenario_verified` -> `V4`: directly runnable real path or end-to-end scenario executed in a near-real environment
+- `regression_tests_passed` -> `V3`: multiple relevant existing regression/integration cases passed
+- `unit_tests_passed` -> `V3`: real project unit tests passed
+- `module_build_passed` -> `V2`: real module/package build passed
+- `compile_passed` -> `V2`: real compile target passed
+- `targeted_tests_passed` -> `V2`: targeted harness, stub, mock, injected script, minimal API-surface check, or one-off focused executable proof passed
+- `code_reviewed` -> `V1`: no executable validation succeeded; conclusion relies on code/log/evidence review only
+Never overstate:
+- Stub, mock, temporary harness, minimal API surface, injected script, or ad hoc executable proof must be `targeted_tests_passed`, not `scenario_verified`.
+- Real compile/build did not pass: do not claim `compile_passed` or `module_build_passed`.
+- No real device/simulator path was exercised: do not claim `device_verified`, `simulator_verified`, or `scenario_verified`.
 ## Closure policy
 - Close as fix only when analysis, code changes, validation evidence, and residual-risk understanding are credible.
 - Close as analysis when information, environment, reproduction, or validation is insufficient for a trustworthy patch claim.
-- If code changed but validation reached only `L2` or `L1`, describe it as a repair candidate, not a verified fix.
-- If the issue is interaction, crash, device, integration, or resource related and validation stayed at `L2`, state what stronger environment or tooling was missing.
+- If code changed but fix validation stayed at `V2` or `V1`, describe it as a repair candidate, not a verified fix.
+- If the issue is interaction, crash, device, integration, or resource related and fix validation stayed at `V2`, state what stronger environment or tooling was missing.
 - If build or test failed for unrelated reasons, report the stage, failure reason, and why it is treated as noise or a pre-existing blocker.
-- If only `L1` evidence exists, do not submit a formal patch claim; close as analysis.
+- If only `V1` evidence exists, do not submit a formal verified-fix claim; close as analysis unless a repair candidate is still justified.
 - Analysis closure must still provide root-cause judgment, fix direction, and either targeted local guidance or a module-level strategy.
 ## Runtime contract
@@ -81,40 +96,75 @@
 - Before writing `result.md`, determine `Closure Level`, then follow exactly one language mode:
   - `Verified Fix` or `Repair Candidate`: Patch Closure Mode; all narrative sections are English
   - `Analysis Only`: Analysis Closure Mode; narrative sections are Chinese
-- `Validation Summary` stays a single English token in all cases.
+- `Reproduction` / `复现情况` uses the compact form `<token> (R*)` when grade is known.
+- `Fix Validation` / `修复验证` uses the compact form `<token> (V*)` when grade is known.
 - Use repository-relative code paths only; never use absolute local paths.
 - Commands, logs, stack traces, API errors, and identifiers may stay in their original language when needed.
 - If closure is patch-related and any narrative field is Chinese, the output is invalid and must be rewritten.
 ## Mandatory field-name contract
-Do not rename these downstream-consumed keys:
+These keys are parsed by downstream delivery code. Keep exact capitalization and wording. Do not rename them.
+Two layers are consumed:
+- Delivery Summary keys: compact, high-density fields for cards, comments, and status output
+- Detail keys: strongest conclusion, evidence, and next-action fields reused by downstream renderers
-- English patch-mode keys:
-  - `Validation Summary`
+Patch mode:
+- Delivery Summary keys:
+  - `Root Cause`
+  - `Fix`
+  - `Reproduction`
+  - `Fix Validation`
+  - `Impact Check`
+  - `Confidence`
+  - `Blocking Point`
+- Detail keys:
   - `Strongest Current Conclusion`
-  - `Analysis Summary`
   - `Key Evidence`
   - `Recommended Action`
   - `Analysis Doc URL`
-- Chinese analysis-mode keys:
-  - `验证摘要`
+Analysis mode:
+- Delivery Summary keys:
+  - `根因摘要`
+  - `修复建议`
+  - `复现情况`
+  - `修复验证`
+  - `影响评估`
+  - `确定性`
+  - `阻塞点`
+- Detail keys:
   - `当前最强结论`
   - `分析摘要`
   - `关键证据`
   - `建议动作`
   - `分析文档链接`
-Keep exact capitalization and wording.
 ## Result content
 At minimum, `result.md` must include:
+- one compact `Delivery Summary` / `交付摘要` block using the exact Delivery Summary keys above
+  Keep each summary value dense and short enough for comment/card reuse.
+- a reproduction summary as one high-density English token
+  Allowed values: `naturally_reproduced`, `induced_reproduced`, `historical_evidence_matched`, `not_reproduced`
+- a reproduction grade inside `Reproduction` / `复现情况`
+  Allowed values: `R1`, `R2`, `R3`, `R4`
 - a validation summary as one high-density English token
   Allowed values: `device_verified`, `simulator_verified`, `scenario_verified`, `unit_tests_passed`, `targeted_tests_passed`, `regression_tests_passed`, `compile_passed`, `module_build_passed`, `code_reviewed`
   Forbidden generic values: `validation_completed`, `tests_passed`, `verified`, `passed`, `done`
-  If multiple validations were performed, report only the strongest one.
+  If multiple validations were performed, report only the strongest one by the reliability order above.
+- a validation grade inside `Fix Validation` / `修复验证`
+  Allowed values: `V1`, `V2`, `V3`, `V4`, `V5`
+  It must match the selected validation token.
+- an impact-check summary token
+  Allowed values: `neighbor_paths_checked`, `partial_neighbor_check`, `not_checked`
+- a confidence token
+  Allowed values: `C1`, `C2`, `C3`, `C4`
 - problem summary and impact scope
 - category: functional, stability, performance, or compatibility
 - reproduction likelihood: always, high, low, or unknown
@@ -135,40 +185,49 @@ At minimum, `result.md` must include:
 - All narrative text must be English.
 - Do not emit Chinese prose in headings, bullets, conclusions, evidence, patch notes, validation narrative, risks, or next steps.
 - Do not emit `Analysis Summary` in patch closure output.
-- Patch closure must include:
-  - `Strongest Current Conclusion`
-  - `Key Evidence`
-  - `Recommended Action`
+- Patch closure must include the exact Patch Detail keys from the field-name contract.
 ### Patch Closure Template
 ```md
 # Bugfix Result
-## Summary
+## Delivery Summary
+- Root Cause:
+- Fix:
+- Reproduction:
+- Fix Validation:
+- Impact Check:
+- Confidence:
+- Blocking Point: `None` if not blocked
+## Detail
+### Summary
 - Problem:
 - Impact:
 - Category: Functional / Stability / Performance / Compatibility
 - Reproduction Likelihood: Always / High / Low / Unknown
-## Root Cause
+### Root Cause
 - Strongest Current Conclusion:
 - Key Evidence:
 - Relevant Code Locations:
-## Change
+### Change
 - Closure Level: Verified Fix / Repair Candidate
 - Patch Notes:
 - Fix Strategy:
 - Blocking Point: `None` if not blocked
-## Validation
+### Validation
 - Validation Summary: exactly one short English token, strongest validation only
+- Validation Grade: exactly one token, `V1` to `V5`, matching `Validation Summary`
 - Method:
 - Result:
 - Unverified Items:
-## Risks
+### Risks
 - Residual Risk:
 - Recommended Action:
 ```
@@ -181,7 +240,7 @@ At minimum, `result.md` must include:
 - Narrative text must remain Chinese.
 - `验证摘要` must still be a single English token.
 - Do not force patch-delivery prose such as `Patch Notes` into analysis closure output.
-- Analysis closure must include the Chinese contract keys for strongest conclusion, summary, evidence, and recommendation.
+- Analysis closure must include the exact Analysis Detail keys from the field-name contract.
 ### Analysis Closure Template
@@ -190,29 +249,41 @@ Use the following Chinese output structure exactly:
 ```md
 # 缺陷分析结果
-## 问题概述
+## 交付摘要
+- 根因摘要:
+- 修复建议:
+- 复现情况: `<token> (R*)` when grade is known
+- 修复验证: `<token> (V*)` when grade is known
+- 影响评估:
+- 确定性:
+- 阻塞点:
+## 详细分析
+### 问题概述
 - 问题:
 - 影响:
 - 分类: 功能 / 稳定性 / 性能 / 兼容性
 - 复现概率: 必现 / 高概率 / 低概率 / 未知
-## 分析结论
+### 分析结论
 - 当前最强结论:
 - 分析摘要:
 - 关键证据:
 - 相关代码位置:
-## 修复判断
+### 修复判断
 - Closure Level: Analysis Only
 - 阻塞点:
 - 建议动作:
-## 验证情况
+### 验证情况
 - 验证摘要: exactly one short English token, strongest validation only
+- 验证等级: exactly one token, `V1` to `V5`, matching `验证摘要`
 - 已验证内容:
 - 未验证内容:
-## 风险
+### 风险
 - 主要风险:
 - 建议动作:
 - 分析文档链接:
@@ -236,6 +307,8 @@ Patch closure examples:
 - If code changed, ensure `result.md` and `patch.diff` do not contradict each other.
 - If important code changed, ensure explanatory comments are present.
 - If validation was performed, ensure claims are not overstated.
+- Ensure validation token matches the strongest proof actually executed, not the intended proof.
+- Ensure `Delivery Summary` / `交付摘要` is present and consistent with the detailed sections below it.
 - Distinguish confirmed facts from inference.
 - For patch closure, ensure `Strongest Current Conclusion`, `Key Evidence`, and `Recommended Action` are English before returning.
@@ -245,7 +318,13 @@ Patch closure examples:
 - if `patch.diff` exists, closure mode is not `Analysis Only`
 - patch closure uses English narrative only
 - analysis closure uses Chinese narrative only
-- required downstream field names are exact
+- Delivery Summary keys are exact
+- Detail keys are exact
 - patch closure does not emit `Analysis Summary`
 - analysis closure does not mix in patch-delivery sections
-- `Validation Summary` or `验证摘要` is exactly one strongest English token
+- `Reproduction` or `复现情况` uses exactly one allowed token and an `R*` grade when grade is present
+- `Fix Validation` or `修复验证` uses exactly one strongest allowed validation token and a matching `V*` grade when grade is present
+- `Impact Check` or `影响评估` uses exactly one allowed token
+- `Confidence` or `确定性` uses exactly one allowed token
+- detailed validation section still contains `Validation Summary` / `验证摘要`
+- detailed validation section still contains `Validation Grade` / `验证等级`
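
The token contract and final checks added to `STANDARD.md` in 0.1.11 lend themselves to a mechanical check. The following is a minimal Python sketch, not part of the package: the function names `check_fix_validation` and `strongest` are hypothetical, and using the listed token order as a tie-break within one grade is an assumption. The token-to-grade pairs and the forbidden values are copied from the diff above.

```python
import re

# Token -> grade pairs copied from the "Token contract" list in STANDARD.md 0.1.11.
# Insertion order follows the "Reliability and cost order" list (strongest first).
TOKEN_GRADE = {
    "device_verified": "V5",
    "simulator_verified": "V4",
    "scenario_verified": "V4",
    "regression_tests_passed": "V3",
    "unit_tests_passed": "V3",
    "module_build_passed": "V2",
    "compile_passed": "V2",
    "targeted_tests_passed": "V2",
    "code_reviewed": "V1",
}

# Generic values the standard explicitly forbids.
FORBIDDEN = {"validation_completed", "tests_passed", "verified", "passed", "done"}

# Compact form "<token> (V*)"; the grade is optional ("when grade is known").
_COMPACT = re.compile(r"^\s*`?([a-z_]+)`?\s*(?:\((V[1-5])\))?\s*$")


def check_fix_validation(value: str) -> list[str]:
    """Return the problems found in a `Fix Validation` value (empty if none)."""
    match = _COMPACT.match(value)
    if match is None:
        return [f"not in '<token> (V*)' form: {value!r}"]
    token, grade = match.groups()
    if token in FORBIDDEN:
        return [f"forbidden generic value: {token}"]
    if token not in TOKEN_GRADE:
        return [f"unknown validation token: {token}"]
    if grade is not None and grade != TOKEN_GRADE[token]:
        return [f"{token} requires {TOKEN_GRADE[token]}, got {grade}"]
    return []


def strongest(tokens: list[str]) -> str:
    """Pick the single strongest token by the reliability order.

    Assumption: within one grade, the order in which the standard lists
    the tokens is used as a tie-break.
    """
    order = list(TOKEN_GRADE)
    return min(tokens, key=order.index)


if __name__ == "__main__":
    print(check_fix_validation("scenario_verified (V5)"))
    print(strongest(["compile_passed", "unit_tests_passed"]))
```

A check like this covers only the format and the token/grade match. Whether the token reflects the proof actually executed, as the "Never overstate" rules require, still needs human or agent judgment.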

package/embedded-skills/shared/video-keyframe-analyzer/references/encountered-problems.md DELETED

@@ -1,12 +0,0 @@
-# Encountered Problems
-Known operational issues from real Jira recording analysis:
-- Jira attachment links may require browser/CAS auth; direct curl can return a login page.
-- Browser downloads may appear first as hidden temporary files before the final filename is visible.
-- Interactive shell environment variables may not reach non-interactive task commands.
-- Cloud video understanding can be blocked by network sandboxing.
-- Remote video processing can be slow even for short recordings.
-- Remote polling can fail with transient partial reads.
-The local keyframe workflow avoids most of these by reducing video analysis to local image artifacts.

package/embedded-skills/shared/video-keyframe-analyzer/references/triage-checklist.md DELETED

@@ -1,48 +0,0 @@
-# Triage Checklist
-Use this compact structure after reviewing keyframes.
-## Summary
-- User flow
-- Visible failure
-- Post-failure state
-## Timeline
-- `00:00-00:03`: entry state
-- `00:03-00:08`: visible user action or transition
-- `00:08-00:10`: anomaly
-- `00:10+`: post-failure behavior
-## Repro Steps
-Infer only what the recording supports:
-1. Open app and enter the visible module.
-2. Perform the visible interaction.
-3. Observe the incorrect outcome.
-## Observed vs Expected
-- Observed: what the UI actually did.
-- Expected: what should happen in a healthy flow.
-## Evidence
-Useful visual evidence includes:
-- app returns to launcher unexpectedly
-- relaunch lands on login or wrong page
-- dialog remains visible after state changes
-- loading spinner or white screen stays unchanged
-- expected toast, navigation, or dismissal never appears
-## Open Questions
-Keep speculation separate:
-- account or environment state
-- network condition
-- exact trigger if not visible
-- whether the issue reproduces across devices or accounts