npm - @sireai/optimus - Versions diffs - 0.1.9 → 0.1.10 - Mend

@sireai/optimus 0.1.9 → 0.1.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (69)

package/task-harnesses/bugfix/STANDARD.md CHANGED

@@ -19,30 +19,45 @@
 ## Validation policy
 - A claimed fix requires validation.
-- Prefer stronger evidence before weaker evidence.
-- Report the strongest level reached and why stronger levels were unavailable.
+- Prefer the highest-reliability validation that is feasible now, not the cheapest one.
+- If a stronger level is blocked, state the blocker and downgrade explicitly.
+- Report exactly one strongest token and keep the method/result details below it.
+- Also report exactly one validation grade `V1` to `V5`.
-1. `L4 functional`: real device, simulator, or another directly runnable environment
-2. `L3 self-check`: local unit tests, targeted custom tests, lightweight scenario injection
-3. `L2 build`: relevant compile target, module build, or targeted test task
-4. `L1 code evidence`: static reasoning, call-chain review, diff review
+Reliability and cost order:
-### Android order
+1. `V5`: `device_verified` - highest confidence, highest cost
+2. `V4`: `simulator_verified`, `scenario_verified` - very high or high confidence, high or medium-high cost
+3. `V3`: `regression_tests_passed`, `unit_tests_passed` - medium-high or medium confidence, medium or medium-high cost
+4. `V2`: `module_build_passed`, `compile_passed`, `targeted_tests_passed` - partial executable proof, medium to lower confidence, low-medium to medium cost
+5. `V1`: `code_reviewed` - lowest confidence, lowest cost
-1. Prefer real-device or simulator validation when `adb` and runnable targets exist.
-2. Otherwise prefer local tests or scenario injection.
-3. Use device-side automated checks only when local validation cannot cover the behavior.
-4. Fall back to compile validation such as `compileDebugKotlin` or relevant unit tests.
-5. Use code-only validation only when all stronger forms are unavailable.
+Token contract:
+- `device_verified` -> `V5`: real device, real business path, fix behavior observed
+- `simulator_verified` -> `V4`: simulator/emulator, real business path, fix behavior observed
+- `scenario_verified` -> `V4`: directly runnable real path or end-to-end scenario executed in a near-real environment
+- `regression_tests_passed` -> `V3`: multiple relevant existing regression/integration cases passed
+- `unit_tests_passed` -> `V3`: real project unit tests passed
+- `module_build_passed` -> `V2`: real module/package build passed
+- `compile_passed` -> `V2`: real compile target passed
+- `targeted_tests_passed` -> `V2`: targeted harness, stub, mock, injected script, minimal API-surface check, or one-off focused executable proof passed
+- `code_reviewed` -> `V1`: no executable validation succeeded; conclusion relies on code/log/evidence review only
+Never overstate:
+- Stub, mock, temporary harness, minimal API surface, injected script, or ad hoc executable proof must be `targeted_tests_passed`, not `scenario_verified`.
+- Real compile/build did not pass: do not claim `compile_passed` or `module_build_passed`.
+- No real device/simulator path was exercised: do not claim `device_verified`, `simulator_verified`, or `scenario_verified`.
 ## Closure policy
 - Close as fix only when analysis, code changes, validation evidence, and residual-risk understanding are credible.
 - Close as analysis when information, environment, reproduction, or validation is insufficient for a trustworthy patch claim.
-- If code changed but validation reached only `L2` or `L1`, describe it as a repair candidate, not a verified fix.
-- If the issue is interaction, crash, device, integration, or resource related and validation stayed at `L2`, state what stronger environment or tooling was missing.
+- If code changed but fix validation stayed at `V2` or `V1`, describe it as a repair candidate, not a verified fix.
+- If the issue is interaction, crash, device, integration, or resource related and fix validation stayed at `V2`, state what stronger environment or tooling was missing.
 - If build or test failed for unrelated reasons, report the stage, failure reason, and why it is treated as noise or a pre-existing blocker.
-- If only `L1` evidence exists, do not submit a formal patch claim; close as analysis.
+- If only `V1` evidence exists, do not submit a formal verified-fix claim; close as analysis unless a repair candidate is still justified.
 - Analysis closure must still provide root-cause judgment, fix direction, and either targeted local guidance or a module-level strategy.
 ## Runtime contract
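The token contract, reliability order, and closure policy in this hunk can be read as a lookup table plus a few decision rules. The Python sketch below is illustrative only. `RELIABILITY_ORDER`, `strongest_validation`, and `default_closure_level` are hypothetical names. The tie-break order inside a grade is assumed to follow the listing order of the reliability table. The closure helper also ignores the non-validation conditions that the policy requires, such as credible analysis and residual-risk understanding.

```python
from typing import Iterable, Tuple

# Tokens from strongest to weakest, paired with their grade.
# Within one grade, the listing order of the reliability table is
# ASSUMED to be the tie-break order (hypothetical interpretation).
RELIABILITY_ORDER: Tuple[Tuple[str, str], ...] = (
    ("device_verified", "V5"),
    ("simulator_verified", "V4"),
    ("scenario_verified", "V4"),
    ("regression_tests_passed", "V3"),
    ("unit_tests_passed", "V3"),
    ("module_build_passed", "V2"),
    ("compile_passed", "V2"),
    ("targeted_tests_passed", "V2"),
    ("code_reviewed", "V1"),
)


def strongest_validation(performed: Iterable[str]) -> Tuple[str, str]:
    """Return the single strongest (token, grade) among validations that actually passed."""
    done = set(performed)
    unknown = done - {token for token, _ in RELIABILITY_ORDER}
    if unknown:
        raise ValueError(f"not an allowed validation token: {sorted(unknown)}")
    for token, grade in RELIABILITY_ORDER:
        if token in done:
            return token, grade
    # No executable validation succeeded: only code/log review remains.
    return "code_reviewed", "V1"


def default_closure_level(grade: str, code_changed: bool, candidate_justified: bool = False) -> str:
    """Simplified closure rule; 'Verified Fix' still requires the other credibility checks."""
    if not code_changed:
        return "Analysis Only"
    if grade == "V1" and not candidate_justified:
        return "Analysis Only"
    if grade in ("V1", "V2"):
        return "Repair Candidate"
    return "Verified Fix"
```

For example, `strongest_validation(["compile_passed", "unit_tests_passed"])` returns `("unit_tests_passed", "V3")`. Only `unit_tests_passed` would be reported, even though a compile also passed.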
@@ -81,7 +96,8 @@
 - Before writing `result.md`, determine `Closure Level`, then follow exactly one language mode:
   - `Verified Fix` or `Repair Candidate`: Patch Closure Mode; all narrative sections are English
   - `Analysis Only`: Analysis Closure Mode; narrative sections are Chinese
-- `Validation Summary` stays a single English token in all cases.
+- `Reproduction` / `复现情况` uses the compact form `<token> (R*)` when grade is known.
+- `Fix Validation` / `修复验证` uses the compact form `<token> (V*)` when grade is known.
 - Use repository-relative code paths only; never use absolute local paths.
 - Commands, logs, stack traces, API errors, and identifiers may stay in their original language when needed.
 - If closure is patch-related and any narrative field is Chinese, the output is invalid and must be rewritten.
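The compact `<token> (R*)` and `<token> (V*)` forms are simple enough to format and parse mechanically. This is a minimal sketch. It assumes that tokens are lowercase snake_case and that a field without a known grade carries only the bare token. `format_compact` and `parse_compact` are hypothetical helpers, not part of the package.

```python
import re
from typing import Optional, Tuple

# ASSUMED grammar: a lowercase snake_case token, optionally followed by
# a space and a parenthesized grade such as "(V3)" or "(R2)".
_COMPACT = re.compile(r"(?P<token>[a-z_]+)(?: \((?P<grade>[RV]\d)\))?")
# Highest grade per prefix: R1-R4 for reproduction, V1-V5 for fix validation.
_MAX_GRADE = {"R": 4, "V": 5}


def format_compact(token: str, grade: Optional[str] = None) -> str:
    """Render the compact form; omit the suffix when the grade is unknown."""
    return f"{token} ({grade})" if grade else token


def parse_compact(value: str, prefix: str) -> Tuple[str, Optional[str]]:
    """Split '<token> (R*)' or '<token> (V*)' into (token, grade) and check the grade range."""
    if prefix not in _MAX_GRADE:
        raise ValueError(f"unknown grade prefix: {prefix}")
    match = _COMPACT.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not in compact form: {value!r}")
    token, grade = match.group("token"), match.group("grade")
    if grade is not None:
        level = int(grade[1:])
        if grade[0] != prefix or not 1 <= level <= _MAX_GRADE[prefix]:
            raise ValueError(f"grade {grade} not allowed here (expected {prefix}1-{prefix}{_MAX_GRADE[prefix]})")
    return token, grade
```

The grade prefix is passed explicitly, so an `R*` grade placed in `Fix Validation` (or a `V*` grade placed in `Reproduction`) is rejected rather than silently accepted.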
@@ -91,14 +107,25 @@
 Do not rename these downstream-consumed keys:
 - English patch-mode keys:
-  - `Validation Summary`
+  - `Root Cause`
+  - `Fix`
+  - `Reproduction`
+  - `Fix Validation`
+  - `Impact Check`
+  - `Confidence`
+  - `Blocking Point`
   - `Strongest Current Conclusion`
-  - `Analysis Summary`
   - `Key Evidence`
   - `Recommended Action`
   - `Analysis Doc URL`
 - Chinese analysis-mode keys:
-  - `验证摘要`
+  - `根因摘要`
+  - `修复建议`
+  - `复现情况`
+  - `修复验证`
+  - `影响评估`
+  - `确定性`
+  - `阻塞点`
   - `当前最强结论`
   - `分析摘要`
   - `关键证据`
@@ -111,10 +138,25 @@ Keep exact capitalization and wording.
 At minimum, `result.md` must include:
+- one compact summary block for downstream delivery consumers
+  Patch mode fields: `Root Cause`, `Fix`, `Reproduction`, `Fix Validation`, `Impact Check`, `Confidence`, `Blocking Point`
+  Analysis mode fields: `根因摘要`, `修复建议`, `复现情况`, `修复验证`, `影响评估`, `确定性`, `阻塞点`
+  Keep each summary value dense and short enough for comment/card reuse.
+- a reproduction summary as one high-density English token
+  Allowed values: `naturally_reproduced`, `induced_reproduced`, `historical_evidence_matched`, `not_reproduced`
+- a reproduction grade inside `Reproduction` / `复现情况`
+  Allowed values: `R1`, `R2`, `R3`, `R4`
 - a validation summary as one high-density English token
   Allowed values: `device_verified`, `simulator_verified`, `scenario_verified`, `unit_tests_passed`, `targeted_tests_passed`, `regression_tests_passed`, `compile_passed`, `module_build_passed`, `code_reviewed`
   Forbidden generic values: `validation_completed`, `tests_passed`, `verified`, `passed`, `done`
-  If multiple validations were performed, report only the strongest one.
+  If multiple validations were performed, report only the strongest one by the reliability order above.
+- a validation grade inside `Fix Validation` / `修复验证`
+  Allowed values: `V1`, `V2`, `V3`, `V4`, `V5`
+  It must match the selected validation token.
+- an impact-check summary token
+  Allowed values: `neighbor_paths_checked`, `partial_neighbor_check`, `not_checked`
+- a confidence token
+  Allowed values: `C1`, `C2`, `C3`, `C4`
 - problem summary and impact scope
 - category: functional, stability, performance, or compatibility
 - reproduction likelihood: always, high, low, or unknown
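The allowed-value lists in this hunk can be checked mechanically before `result.md` is handed downstream. This sketch assumes that each summary value has already been split into a token and an optional grade. `check_summary` and the set names are hypothetical. The token-to-grade table is copied from the token contract in the validation policy.

```python
from typing import List, Optional

# Allowed values copied from the required-content list; names are illustrative.
REPRODUCTION = {"naturally_reproduced", "induced_reproduced", "historical_evidence_matched", "not_reproduced"}
REPRODUCTION_GRADES = {"R1", "R2", "R3", "R4"}
VALIDATION_GRADE = {
    "device_verified": "V5",
    "simulator_verified": "V4",
    "scenario_verified": "V4",
    "regression_tests_passed": "V3",
    "unit_tests_passed": "V3",
    "module_build_passed": "V2",
    "compile_passed": "V2",
    "targeted_tests_passed": "V2",
    "code_reviewed": "V1",
}
FORBIDDEN_GENERIC = {"validation_completed", "tests_passed", "verified", "passed", "done"}
IMPACT_CHECK = {"neighbor_paths_checked", "partial_neighbor_check", "not_checked"}
CONFIDENCE = {"C1", "C2", "C3", "C4"}


def check_summary(
    reproduction: str,
    reproduction_grade: Optional[str],
    validation: str,
    validation_grade: Optional[str],
    impact_check: str,
    confidence: str,
) -> List[str]:
    """Return a list of problems; an empty list means the values conform."""
    problems: List[str] = []
    if reproduction not in REPRODUCTION:
        problems.append(f"unknown reproduction token: {reproduction}")
    if reproduction_grade is not None and reproduction_grade not in REPRODUCTION_GRADES:
        problems.append(f"unknown reproduction grade: {reproduction_grade}")
    if validation in FORBIDDEN_GENERIC:
        problems.append(f"generic validation token is forbidden: {validation}")
    elif validation not in VALIDATION_GRADE:
        problems.append(f"unknown validation token: {validation}")
    elif validation_grade is not None and validation_grade != VALIDATION_GRADE[validation]:
        # The grade must match the selected validation token.
        problems.append(f"{validation} requires {VALIDATION_GRADE[validation]}, got {validation_grade}")
    if impact_check not in IMPACT_CHECK:
        problems.append(f"unknown impact-check token: {impact_check}")
    if confidence not in CONFIDENCE:
        problems.append(f"unknown confidence token: {confidence}")
    return problems
```

A pair such as `compile_passed` with `V3` is reported as a mismatch, because `compile_passed` is a `V2` token.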
@@ -145,30 +187,42 @@ At minimum, `result.md` must include:
 ```md
 # Bugfix Result
-## Summary
+## Delivery Summary
+- Root Cause:
+- Fix:
+- Reproduction:
+- Fix Validation:
+- Impact Check:
+- Confidence:
+- Blocking Point: `None` if not blocked
+## Detail
+### Summary
 - Problem:
 - Impact:
 - Category: Functional / Stability / Performance / Compatibility
 - Reproduction Likelihood: Always / High / Low / Unknown
-## Root Cause
+### Root Cause
 - Strongest Current Conclusion:
 - Key Evidence:
 - Relevant Code Locations:
-## Change
+### Change
 - Closure Level: Verified Fix / Repair Candidate
 - Patch Notes:
 - Fix Strategy:
 - Blocking Point: `None` if not blocked
-## Validation
+### Validation
 - Validation Summary: exactly one short English token, strongest validation only
+- Validation Grade: exactly one token, `V1` to `V5`, matching `Validation Summary`
 - Method:
 - Result:
 - Unverified Items:
-## Risks
+### Risks
 - Residual Risk:
 - Recommended Action:
 ```
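The compact summary uses the same seven fields in both language modes, so one renderer can emit either block. This is a hedged sketch with three limits. `render_delivery_summary` is hypothetical. It produces only the summary block, not the detail sections. It fills an empty `Blocking Point` with `None` in patch mode only, because only the English template states that default.

```python
from typing import Dict

# Field order from the compact summary block; headings from the two templates.
PATCH_KEYS = ("Root Cause", "Fix", "Reproduction", "Fix Validation", "Impact Check", "Confidence", "Blocking Point")
ANALYSIS_KEYS = ("根因摘要", "修复建议", "复现情况", "修复验证", "影响评估", "确定性", "阻塞点")
HEADINGS = {"patch": "## Delivery Summary", "analysis": "## 交付摘要"}


def render_delivery_summary(values: Dict[str, str], mode: str) -> str:
    """Render the summary block; `values` is keyed by the English patch-mode field names."""
    if mode not in HEADINGS:
        raise ValueError(f"unknown mode: {mode}")
    keys = PATCH_KEYS if mode == "patch" else ANALYSIS_KEYS
    lines = [HEADINGS[mode]]
    for english_key, output_key in zip(PATCH_KEYS, keys):
        value = values.get(english_key, "")
        if mode == "patch" and english_key == "Blocking Point" and not value:
            value = "None"  # English template: `None` if not blocked
        # rstrip() keeps empty fields as "- Key:" like the templates.
        lines.append(f"- {output_key}: {value}".rstrip())
    return "\n".join(lines)
```

Keeping a single English-keyed input means the field order cannot drift between the two templates.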
@@ -190,29 +244,41 @@ Use the following Chinese output structure exactly:
 ```md
 # 缺陷分析结果
-## 问题概述
+## 交付摘要
+- 根因摘要:
+- 修复建议:
+- 复现情况: `<token> (R*)` when grade is known
+- 修复验证: `<token> (V*)` when grade is known
+- 影响评估:
+- 确定性:
+- 阻塞点:
+## 详细分析
+### 问题概述
 - 问题:
 - 影响:
 - 分类: 功能 / 稳定性 / 性能 / 兼容性
 - 复现概率: 必现 / 高概率 / 低概率 / 未知
-## 分析结论
+### 分析结论
 - 当前最强结论:
 - 分析摘要:
 - 关键证据:
 - 相关代码位置:
-## 修复判断
+### 修复判断
 - Closure Level: Analysis Only
 - 阻塞点:
 - 建议动作:
-## 验证情况
+### 验证情况
 - 验证摘要: exactly one short English token, strongest validation only
+- 验证等级: exactly one token, `V1` to `V5`, matching `验证摘要`
 - 已验证内容:
 - 未验证内容:
-## 风险
+### 风险
 - 主要风险:
 - 建议动作:
 - 分析文档链接:
@@ -236,6 +302,8 @@ Patch closure examples:
 - If code changed, ensure `result.md` and `patch.diff` do not contradict each other.
 - If important code changed, ensure explanatory comments are present.
 - If validation was performed, ensure claims are not overstated.
+- Ensure validation token matches the strongest proof actually executed, not the intended proof.
+- Ensure `Delivery Summary` / `交付摘要` is present and consistent with the detailed sections below it.
 - Distinguish confirmed facts from inference.
 - For patch closure, ensure `Strongest Current Conclusion`, `Key Evidence`, and `Recommended Action` are English before returning.
@@ -248,4 +316,9 @@ Patch closure examples:
 - required downstream field names are exact
 - patch closure does not emit `Analysis Summary`
 - analysis closure does not mix in patch-delivery sections
-- `Validation Summary` or `验证摘要` is exactly one strongest English token
+- `Reproduction` or `复现情况` uses exactly one allowed token and an `R*` grade when grade is present
+- `Fix Validation` or `修复验证` uses exactly one strongest allowed validation token and a matching `V*` grade when grade is present
+- `Impact Check` or `影响评估` uses exactly one allowed token
+- `Confidence` or `确定性` uses exactly one allowed token
+- detailed validation section still contains `Validation Summary` / `验证摘要`
+- detailed validation section still contains `Validation Grade` / `验证等级`
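Several of the self-check items can be automated for patch mode. This minimal sketch assumes that each field is written as a single `- Key: value` bullet, and `bullet_fields` and `self_check_patch` are hypothetical names. It checks three things: that the summary is present, that `Analysis Summary` is absent, and that `Fix Validation` agrees with the detailed `Validation Summary` / `Validation Grade`. It does not re-check token/grade pairing or the remaining items.

```python
import re
from typing import Dict, List

# ASSUMED layout: one "- Key: value" bullet per line, as in the templates.
_BULLET = re.compile(r"- (?P<key>[^:]+):\s*(?P<value>.*)")


def bullet_fields(markdown: str) -> Dict[str, str]:
    """Map each bullet key to its first value (keys such as Blocking Point repeat)."""
    fields: Dict[str, str] = {}
    for line in markdown.splitlines():
        match = _BULLET.fullmatch(line.strip())
        if match:
            fields.setdefault(match.group("key").strip(), match.group("value").strip())
    return fields


def self_check_patch(markdown: str) -> List[str]:
    """Return problems found; an empty list means these checks passed."""
    problems: List[str] = []
    lines = [line.strip() for line in markdown.splitlines()]
    if "## Delivery Summary" not in lines:
        problems.append("missing Delivery Summary")
    fields = bullet_fields(markdown)
    if "Analysis Summary" in fields:
        problems.append("patch closure must not emit Analysis Summary")
    summary = fields.get("Validation Summary")
    grade = fields.get("Validation Grade")
    if summary is None or grade is None:
        problems.append("detailed validation section lacks Validation Summary or Validation Grade")
    else:
        compact = fields.get("Fix Validation")
        # The grade suffix may be omitted when unknown, so the bare token is accepted too.
        if compact not in (f"{summary} ({grade})", summary):
            problems.append(f"Fix Validation {compact!r} does not match {summary} ({grade})")
    return problems
```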

package/embedded-skills/shared/video-keyframe-analyzer/references/encountered-problems.md DELETED

@@ -1,12 +0,0 @@
-# Encountered Problems
-Known operational issues from real Jira recording analysis:
-- Jira attachment links may require browser/CAS auth; direct curl can return a login page.
-- Browser downloads may appear first as hidden temporary files before the final filename is visible.
-- Interactive shell environment variables may not reach non-interactive task commands.
-- Cloud video understanding can be blocked by network sandboxing.
-- Remote video processing can be slow even for short recordings.
-- Remote polling can fail with transient partial reads.
-The local keyframe workflow avoids most of these by reducing video analysis to local image artifacts.

package/embedded-skills/shared/video-keyframe-analyzer/references/triage-checklist.md DELETED

@@ -1,48 +0,0 @@
-# Triage Checklist
-Use this compact structure after reviewing keyframes.
-## Summary
-- User flow
-- Visible failure
-- Post-failure state
-## Timeline
-- `00:00-00:03`: entry state
-- `00:03-00:08`: visible user action or transition
-- `00:08-00:10`: anomaly
-- `00:10+`: post-failure behavior
-## Repro Steps
-Infer only what the recording supports:
-1. Open app and enter the visible module.
-2. Perform the visible interaction.
-3. Observe the incorrect outcome.
-## Observed vs Expected
-- Observed: what the UI actually did.
-- Expected: what should happen in a healthy flow.
-## Evidence
-Useful visual evidence includes:
-- app returns to launcher unexpectedly
-- relaunch lands on login or wrong page
-- dialog remains visible after state changes
-- loading spinner or white screen stays unchanged
-- expected toast, navigation, or dismissal never appears
-## Open Questions
-Keep speculation separate:
-- account or environment state
-- network condition
-- exact trigger if not visible
-- whether the issue reproduces across devices or accounts