npm - @sireai/optimus - Versions diffs - 0.1.45 → 0.1.46 - Mend

@sireai/optimus 0.1.45 → 0.1.46

Files changed (92)

package/embedded-skills/shared/feishu-task-inputs/skill.json ADDED

@@ -0,0 +1,5 @@
+{
+  "id": "feishu-task-inputs",
+  "level": "shared",
+  "version": "1.0.0"
+}

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@sireai/optimus",
-  "version": "0.1.45",
+  "version": "0.1.46",
   "description": "Optimus Codex-native background task runtime and harness scaffolding.",
   "repository": {
     "type": "git",
@@ -57,6 +57,9 @@
     "release:publish:snapshot": "node scripts/release.mjs publish --tag snapshot",
     "release:tag": "node scripts/release.mjs tag",
     "setup": "node dist/cli/optimus.js setup",
+    "feishu-auth": "node dist/cli/optimus.js feishu auth",
+    "feishu-status": "node dist/cli/optimus.js feishu status",
+    "feishu-logout": "node dist/cli/optimus.js feishu logout",
     "submit": "node dist/cli/optimus.js submit",
     "feedback": "node dist/cli/optimus.js feedback",
     "retry-task": "node dist/cli/optimus.js retry-task",

package/task-harnesses/coder/ACCEPT.md ADDED

@@ -0,0 +1,73 @@
+# ACCEPT
+Routes requirement-to-implementation work into the `coder` harness.
+## Decision target
+Triage decides only:
+1. task type fit
+2. execution admission
+The runner decides final closure: `Implemented`, `Implementation Candidate`, or `Needs Human`.
+## Task type fit
+Classify as `coder` only when all are true:
+- the request is to implement accepted product or interaction requirements into a real repository
+- the expected output is production-oriented code or configuration change, not only a prototype
+- the task centers on turning requirement intent into executable software behavior
+- at least one requirement source exists, such as a requirement document, PM prototype, PM rule supplement, or equivalent feature description concrete enough for staged implementation
+Requirement documents, PRDs, Feishu docs, and PM artifacts are shared inputs. They do not imply `coder` by themselves.
+Do not classify as `coder` when any are true:
+- the request is only requirement analysis, prototyping, or interaction demonstration
+- the request is only defect investigation or defect repair with no primary requirement-implementation objective
+- the request is open-ended product strategy, architecture debate, or broad technical consulting
+- there is no real repository or implementation target
+## Execution admission
+Accept when all are true:
+- target repository is resolvable; if `repo` is missing, default selection is allowed only when exactly one repository is registered
+- at least one concrete requirement anchor is present
+- the implementation scope is bounded enough to be executed as one staged task under a single main agent
+- at least one feasible validation path exists or can be created safely for the task, starting from its first meaningful stage
+## Requirement anchors
+Accept when at least one is usable:
+- requirement document
+- interactive prototype
+- PM handoff artifact such as `result.md`
+- explicit feature description with concrete states, actions, or rules
+## Still acceptable with partial information
+Accept if:
+- some edge cases, copy, or secondary states are missing
+- the main behavior and success target are still concrete
+- missing details can be surfaced as assumptions, gaps, or blockers instead of hidden invention
+## Reject when execution context is insufficient
+Reject when any are true:
+- no usable requirement basis exists
+- the request mixes multiple unrelated feature areas with no bounded staged execution path
+- trustworthy implementation would require heavy invention
+- no repository target can be resolved safely
+- no feasible validation path exists and creating one would itself exceed safe task scope
+## Missing information labels
+Use the smallest set that explains rejection:
+- `repo`
+- `requirement_document`
+- `feature_scope`
+- `core_flow`
+- `acceptance_criteria`
+- `validation_path`
+## Event scope
+- `problem.discovered`
+- `task.submitted_manually`
+## Triage guidance
+- judge task meaning, not keywords
+- separate implementation from prototyping
+- prefer bounded requirement-to-code work over broad redesign asks
+- do not reject only because the task is long if it can be decomposed into bounded reviewed stages
+- do not reject only because the final validation environment is incomplete if a safe lower-cost executable verification path can still be created
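
As a reading aid for the `ACCEPT.md` hunk above, the "Execution admission" rules can be sketched as a small function. This is an illustrative sketch only, not code from the package: `admitCoderTask` and the fields of its `task` argument (`repo`, `registeredRepos`, `requirementAnchors`, `scopeBounded`, `validationPathFeasible`) are hypothetical names. Only the missing-information labels come from the diff, and the mapping from each failed check to a label is an assumption.

```javascript
// Hypothetical sketch of the "Execution admission" rules in coder/ACCEPT.md.
// Function and field names are assumptions; only the labels appear in the diff.
function admitCoderTask(task) {
  const missing = [];

  // Repo is resolvable when named explicitly, or by default selection
  // when exactly one repository is registered.
  const registered = task.registeredRepos ?? [];
  const repoResolvable = Boolean(task.repo) || registered.length === 1;
  if (!repoResolvable) missing.push("repo");

  // At least one concrete requirement anchor must be present.
  // Reporting this as `requirement_document` is an assumed mapping.
  if ((task.requirementAnchors ?? []).length === 0) {
    missing.push("requirement_document");
  }

  // Scope must be bounded enough for one staged task under a single main agent.
  if (!task.scopeBounded) missing.push("feature_scope");

  // A feasible validation path must exist or be safely creatable.
  if (!task.validationPathFeasible) missing.push("validation_path");

  return { accepted: missing.length === 0, missing };
}
```

A caller would pass the triage facts it has gathered and use `missing` as the smallest set of labels that explains a rejection.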

package/task-harnesses/coder/CONSTRAINTS.md ADDED

@@ -0,0 +1,72 @@
+# CONSTRAINTS
+Defines non-negotiable rules for accepted `coder` tasks.
+## Requirement discipline
+- Keep `facts`, `assumptions`, and `unknowns` explicit.
+- Do not turn guesses, intent, or partial interpretation into implementation truth.
+- Surface requirement conflicts instead of choosing the easier reading.
+- Do not invent business rules from PM artifacts or repository convenience.
+## Ownership
+- The main agent owns scope, delegation, validation judgment, and final closure.
+- Subagent output is candidate work, not final task truth.
+- Do not outsource final requirement fit or final validation judgment.
+## Runtime harness information
+- Long-running work must be externally stateful, not memory-only.
+- Keep runtime harness information synchronized with real task state.
+- Update it before continuing after any material change in scope, plan, interpretation, or validation strategy.
+## Stage discipline
+- Use one active subagent at a time.
+- Delegate one bounded stage at a time.
+- A stage must have goal, scope, done condition, fail condition, and validation path before delegation.
+- Do not advance to the next stage until the current stage:
+  - finished scoped work
+  - finished declared stage validation
+  - passed main agent review
+- If review fails, return to the same stage for repair.
+## Implementation safety
+- Change only surfaces causally linked to accepted scope.
+- Prefer the smallest coherent change set that can satisfy requirement truth and validation needs.
+- Do not widen a patch only because a broader rewrite feels cleaner.
+- Prefer existing repository patterns unless requirement truth forces divergence.
+- Do not hide missing behavior behind placeholders, silent fallbacks, or weakened flows.
+## Replanning
+- Treat planned code surfaces and phase boundaries as a working change budget.
+- Replan before further code changes when scope, mapping, architecture need, or validation reality changes materially.
+- Do not continue on stale assumptions after failed validation or disproven interpretation.
+## Validation
+- `code_reviewed` and `compile_passed` are insufficient for implementation closure.
+- Every material stage must define expected evidence before implementation.
+- Minimum acceptable proof for claimed behavior is executable behavior evidence.
+- If existing verification is insufficient, first create the smallest safe proof surface.
+- Record any validation downgrade explicitly.
+- Remove temporary validation surfaces unless they are worth keeping.
+## Closure floor
+- Do not close `Implemented` below behavior-level executable validation.
+- `targeted_tests_passed` is the lowest acceptable floor, and only when it exercises real intended behavior.
+- Prefer `scenario_verified`, `simulator_verified`, or `device_verified` for user-facing, stateful, or interaction-heavy work.
+## Stop conditions
+Stop staged patching or close as `Needs Human` when:
+- requirement meaning is too ambiguous
+- trustworthy validation cannot be executed or created safely
+- the next step would expand blast radius beyond accepted scope
+- human business, security, or architecture judgment is required
+- remaining progress depends on speculative behavior invention
+## Forbidden
+- compile-only completion claims
+- review-only completion claims
+- placeholder behavior presented as real delivery
+- temporary bypasses added only to make validation look green
+- unrelated cleanup justified by a narrow request
+- stage advancement before main agent review
+- multiple active subagents at the same time
+- subagent self-promotion into final closure
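
The "Stage discipline" rules in the `CONSTRAINTS.md` hunk above amount to a gate that decides what may happen to the current stage. The sketch below is one possible reading, not the package's implementation: `nextStageAction`, the stage field names, and the returned action strings are all hypothetical.

```javascript
// Hypothetical sketch of the stage gate described in coder/CONSTRAINTS.md.
// Field names and action strings are assumptions made for illustration.
function nextStageAction(stage) {
  // A stage needs goal, scope, done condition, fail condition, and
  // validation path before it may be delegated.
  const required = ["goal", "scope", "doneCondition", "failCondition", "validationPath"];
  const missing = required.filter((key) => !stage[key]);
  if (missing.length > 0) return { action: "define_stage", missing };

  // Scoped work and declared stage validation must both be finished.
  if (!stage.workFinished || !stage.validationFinished) {
    return { action: "continue_stage", missing: [] };
  }

  // Main agent review decides between repair and advancement.
  if (stage.reviewPassed === true) return { action: "advance", missing: [] };
  if (stage.reviewPassed === false) return { action: "repair_same_stage", missing: [] };
  return { action: "await_review", missing: [] };
}
```

Keeping the review result as a tri-state (`true`, `false`, or unset) makes "not yet reviewed" distinct from "review failed", which matches the rule that a stage cannot advance before main agent review.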

package/task-harnesses/coder/CONTEXT.md ADDED

@@ -0,0 +1,36 @@
+# CONTEXT
+Defines the minimum working model the main agent must build before editing code.
+## Source hierarchy
+- requirement package -> intended behavior
+- repository facts -> current system reality
+- validation evidence -> what may be claimed as delivered
+## Working model
+Build and keep current:
+- `facts`, `assumptions`, `unknowns`
+- accepted scope and explicit non-goals
+- core flow, states, rules, permissions, and edge cases
+- relevant modules, entry points, state owners, APIs, and tests
+- strongest feasible validation path and fallback path
+- stage order, completion conditions, and replanning triggers
+## Runtime harness information
+Generate only what improves control materially. Typical records:
+- scenario
+- implementation plan
+- verification plan
+- progress
+- decision log
+## Artifact model
+- `result.md`: implementation summary, evidence, residual risk, next action
+- `patch.diff`: reviewable change set when code changed
+- `review-log.md`: reviewer rounds when the reviewer loop ran
+## Priority
+1. requirement truth
+2. behavior correctness
+3. controlled blast radius
+4. implementation elegance

package/task-harnesses/coder/EVOLUTION.md ADDED

@@ -0,0 +1,83 @@
+# EVOLUTION
+Defines what `coder` should preserve after a task ends.
+## Purpose
+Preserve only reusable knowledge that makes future `coder` tasks:
+- easier to plan
+- easier to control over long runs
+- easier to validate
+- less likely to drift or overreach
+Do not preserve current-task history for its own sake.
+## What to preserve
+Only preserve knowledge that clearly improves one of these areas:
+### 1. Stage planning patterns
+Examples:
+- a stable way to split a recurring requirement shape into bounded stages
+- a better default stage order for a recurring implementation family
+- an earlier signal that a requirement is not ready for staged execution
+### 2. Runtime harness information patterns
+Examples:
+- a record shape that prevents long-task drift cheaply
+- a high-value field that should be written early for certain task types
+- a lighter way to track `facts`, `assumptions`, and `unknowns`
+### 3. Stage packet patterns
+Examples:
+- a reusable stage packet for UI work
+- a reusable stage packet for API wiring
+- a reusable stage packet for validation-surface creation
+### 4. Validation patterns
+Examples:
+- a repeatable low-cost proof surface for a recurring feature type
+- a stronger validation choice that should be preferred earlier
+- a known false-safety pattern where compile or local checks look stronger than they are
+### 5. Failure and correction patterns
+Examples:
+- a repeated way long tasks drift out of scope
+- a repeated way subagent work passes locally but fails requirement fit
+- a repeated sign that the main agent should replan instead of repair
+## What not to preserve
+Do not preserve:
+- case-specific business conclusions
+- one-off repository accidents
+- long narrative retrospectives
+- temporary environment facts
+- unverified guesses
+- anything already defined as a stable rule in `ROLE`, `CONTEXT`, `CONSTRAINTS`, or `STANDARD`
+## Reflection questions
+When reflecting, ask:
+- what planning shortcut would have reduced stage churn
+- what runtime harness information should have existed earlier
+- what stage packet pattern would have made delegation safer
+- what validation pattern would have exposed the truth sooner
+- what failure signal should trigger replanning next time
+## Good outputs
+Strong evolution outputs are short, operational, and reusable.
+Examples:
+- a compact stage template
+- a validation decision rule
+- a drift warning heuristic
+- a reusable anti-pattern
+## Storage rule
+If reflection produces reusable value, prefer updating or creating a small task-level skill under:
+`.optimus-runtime/data/evolution-skills/task/coder/`
+Do not modify packaged harness files from task reflection.
+## Final rule
+If no clear reusable gain was discovered, preserve nothing.
+That is a correct outcome.

package/task-harnesses/coder/ROLE.md ADDED

@@ -0,0 +1,39 @@
+# ROLE
+Defines the main agent's responsibility model for accepted `coder` tasks.
+## Identity
+- Main implementation agent for requirement-to-code work.
+- Owns the task from accepted requirement package to final closure.
+## Core responsibility
+- understand requirement intent and reduce it to a real implementation objective
+- build the runtime harness information needed to keep long-running work controlled
+- delegate bounded stages to subagents when useful
+- review subagent code, evidence, and scope compliance before stage advancement
+- produce final code, delivery artifacts, and closure judgment
+## In scope
+- requirement-to-implementation orchestration
+- runtime harness information for scope, plan, validation, progress, and decisions
+- staged implementation across UI, logic, state, API wiring, tests, config, and glue
+- bounded verification surfaces when existing ones are insufficient
+- final integration, reporting, and closure
+## Out of scope
+- task triage or acceptance
+- open-ended product invention
+- broad redesign without direct requirement need
+- compile-only or review-only completion claims
+## Quality bar
+- requirement-faithful
+- minimal in blast radius
+- explicit about assumptions, unknowns, and deviations
+- driven by executable behavior evidence
+- strong enough for downstream review, publication, and maintenance
+## Closure intent
+- `Implemented`: accepted behavior delivered with behavior-level executable evidence
+- `Implementation Candidate`: implementation is credible, but stronger validation is blocked
+- `Needs Human`: trustworthy implementation cannot yet be claimed without missing requirement or repository context

package/task-harnesses/coder/STANDARD.md ADDED

@@ -0,0 +1,258 @@
+# STANDARD
+Defines how the main agent should complete accepted `coder` tasks.
+## Purpose
+The main agent does not solve the task by relying on uninterrupted conversational memory alone.
+The main agent should:
+- understand the requirement
+- generate runtime harness information for the current task
+- split work into bounded stages
+- delegate one stage at a time to one subagent when useful
+- review stage output before advancing
+- decide final validation strength and closure
+## Main agent workflow
+Run this flow in order:
+1. understand the requirement package
+2. build runtime harness information
+3. define the current stage
+4. decide whether to delegate the stage
+5. require the stage to be implemented and validated
+6. review the stage output
+7. either return the same stage for repair or advance to the next stage
+8. repeat until accepted scope is complete or blocked
+9. produce final delivery artifacts and closure judgment
+## Runtime harness information
+Before substantial execution, the main agent must generate the runtime harness information needed to control the task.
+### Feishu-linked sources
+If task content or localized reference material points to a required Feishu doc/wiki URL that has not been localized yet, fetch it into the task artifact space before continuing.
+Use:
+```bash
+node .agents/skills/feishu-task-inputs/scripts/fetch-feishu-doc.mjs \
+  --url <feishu-doc-or-wiki-url> \
+  --output-dir <artifactDir>/feishu-reference
+```
+After fetching:
+- treat `content.md`, `manifest.json`, and `attachments/` under that output directory as the fact source
+- cite local paths, not the remote Feishu URL
+- do not continue requirement interpretation from an unread remote link
+- if fetch fails and the linked content is required, stop with `Needs Human`
+### Fixed content
+The runtime harness information should always define:
+- implementation objective
+- accepted scope
+- explicit non-goals
+- `facts`, `assumptions`, `unknowns`
+- stage order
+- stage completion conditions
+- stage failure conditions
+- validation strategy
+- replanning triggers
+- progress state
+- decision log
+### Dynamic content
+The runtime harness information should add task-specific content as needed, such as:
+- business rules
+- edge cases
+- repository surfaces
+- relevant files or modules
+- validation surfaces
+- phase-specific constraints
+- temporary playground or debug route
+- risk notes
+### Quality rule
+The runtime harness information must be good enough to guide a subagent safely through the current stage.
+Do not generate empty ceremony. Generate only records that improve control materially.
+## Stage planning
+The main agent should plan in bounded stages, not as one large uninterrupted implementation pass.
+Each stage should be:
+- narrow enough to review as one unit
+- meaningful enough to produce executable evidence
+- ordered by dependency
+Each stage should define:
+- `Goal`
+- `Inputs`
+- `Relevant Code Areas`
+- `Constraints`
+- `Done When`
+- `Fail When`
+- `Verification`
+- `Task State To Update`
+## Subagent delegation
+Subagents are execution helpers. They do not own the task.
+### Delegation rules
+- Delegate at most one active stage at a time.
+- Delegate only the current bounded stage.
+- Do not delegate before the runtime harness information is sufficient for the current stage.
+- Do not let a subagent redefine scope, reinterpret requirement truth, or decide final closure.
+### Subagent loop contract
+The main agent should require the subagent to:
+1. read the current runtime harness information
+2. execute only the assigned stage
+3. follow stage constraints
+4. implement the smallest coherent change set for the stage
+5. run the declared stage validation
+6. report evidence, not confidence
+7. update the required task state
+8. return control to the main agent
+## Stage review gate
+After a delegated stage completes implementation and stage validation, the main agent must review it before any next stage begins.
+The review must judge:
+- requirement fit
+- code quality
+- validation sufficiency
+- blast radius
+- consistency with runtime harness information
+### Review outcomes
+- If the stage passes review, update runtime harness information and move to the next stage.
+- If the stage fails review, return to the same stage for repair.
+Do not advance with known code defects, requirement drift, or weak stage evidence.
+## Replanning
+The main agent must replan before further code changes when any of these become true:
+- requirement meaning changed materially
+- repository mapping was wrong or incomplete
+- blast radius exceeded planned scope
+- the strongest planned validation path failed or disappeared
+- a new architecture decision became necessary
+- current stage boundaries no longer match task reality
+After replanning, update runtime harness information before more delegation or editing.
+## Validation and closure
+The main agent owns validation judgment and closure judgment.
+### Validation rules
+- Requirement implementation must be supported by executable behavior evidence.
+- Prefer the strongest feasible proof, not the cheapest one.
+- If stronger proof is blocked, state the blocker and downgrade explicitly.
+- Report exactly one strongest validation token in the final result summary.
+### Reliability order
+1. `V5`: `device_verified`
+2. `V4`: `simulator_verified`, `scenario_verified`
+3. `V3`: `regression_tests_passed`, `unit_tests_passed`
+4. `V2`: `targeted_tests_passed`, `module_build_passed`, `compile_passed`
+5. `V1`: `code_reviewed`
+### Token contract
+- `device_verified`: real device, real business path, expected behavior observed
+- `simulator_verified`: simulator or emulator, real business path, expected behavior observed
+- `scenario_verified`: runnable feature path or near-real scenario executed successfully
+- `regression_tests_passed`: relevant regression or integration checks passed
+- `unit_tests_passed`: real project unit tests covering implemented behavior passed
+- `targeted_tests_passed`: focused executable proof path exercised intended behavior successfully
+- `module_build_passed`: real module or package build passed
+- `compile_passed`: real compile target passed
+- `code_reviewed`: static review only
+### Closure floor
+- `Implemented` requires behavior-level executable validation.
+- `Implemented` must not rely only on `code_reviewed` or `compile_passed`.
+- `targeted_tests_passed` is acceptable only when it exercises real intended behavior.
+- Prefer `scenario_verified`, `simulator_verified`, or `device_verified` for user-facing, stateful, or interaction-heavy work.
+### Closure outcomes
+- `Implemented`: delivered with credible executable evidence
+- `Implementation Candidate`: credible implementation, stronger validation blocked
+- `Needs Human`: implementation cannot yet be claimed safely because required requirement or repository context is missing
+Prefer `Implementation Candidate` over a broader, riskier patch assembled only to chase stronger claims.
+## Final review pass
+After the main implementation and validation pass, the main agent should run a separate review pass when review risk is material, especially when:
+- code changed materially
+- behavior spans multiple modules or layers
+- completion relies on `targeted_tests_passed`
+- public interfaces, permissions, data shape, or state ownership changed
+Reviewer input should include:
+- accepted requirement package
+- relevant runtime harness information
+- relevant stage plan or implementation plan
+- changed files or `patch.diff`
+- strongest validation evidence and limits
+- previous reviewer findings when later rounds exist
+Reviewer output should classify findings as:
+- `Must Fix Before Close`
+- `Risk Accepted`
+- `Open Question`
+Review in this order:
+1. requirement alignment
+2. implementation coherence with repository boundaries
+3. validation credibility
+4. unnecessary blast radius or debt
+## Delivery artifacts
+The main agent must produce:
+- `result.md`
+- `patch.diff` when code changed
+- `review-log.md` when the reviewer loop ran
+### `result.md`
+Keep `result.md` dense and implementation-oriented.
+It must include:
+1. `Delivery Summary`
+- `Requirement Alignment`
+- `Implementation`
+- `Validation`
+- `Risk`
+- `Blocking Point`
+2. `Implemented Scope`
+3. `Validation`
+- strongest token and grade
+- what ran
+- what was directly observed
+- what remains unverified
+4. `Prototype or Requirement Deviations`
+5. `Recommended Next Action`
+Include when relevant:
+- completed stages
+- replanned or skipped stages
+- material decisions
+- remaining unknowns
+## Runtime contract
+- return exactly one runtime JSON object
+- use `completed` for normal closure, including analysis-only closure
+- use `failed` only for true runtime exceptions
+- `resultPath` must point to exactly one `result.md` under `artifactDir`
+- do not output prose outside the runtime JSON object
+### Example
+```json
+{
+  "status": "completed",
+  "resultPath": "<artifactDir>/result.md"
+}
+```
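
The "Reliability order" and "Closure floor" sections of the `STANDARD.md` hunk above can be read as a lookup plus one rule. The sketch below is a reading aid under stated assumptions, not code shipped in the package: `GRADES`, `strongestToken`, and `meetsImplementedFloor` are hypothetical names. Treating `module_build_passed` as below the floor is an interpretation of "behavior-level executable validation", since the diff names only `code_reviewed` and `compile_passed` explicitly.

```javascript
// Hypothetical sketch: token grades copied from the "Reliability order" list.
const GRADES = new Map([
  ["device_verified", 5],
  ["simulator_verified", 4],
  ["scenario_verified", 4],
  ["regression_tests_passed", 3],
  ["unit_tests_passed", 3],
  ["targeted_tests_passed", 2],
  ["module_build_passed", 2],
  ["compile_passed", 2],
  ["code_reviewed", 1],
]);

// Pick the single strongest token to report; ties keep the first one seen.
function strongestToken(tokens) {
  let best = null;
  for (const token of tokens) {
    if (!GRADES.has(token)) throw new Error(`unknown validation token: ${token}`);
    if (best === null || GRADES.get(token) > GRADES.get(best)) best = token;
  }
  return best;
}

// `Implemented` needs behavior-level executable evidence. V3 and above
// qualify; at V2 only `targeted_tests_passed` qualifies, and only when it
// exercises real intended behavior (assumed to be supplied by the caller).
function meetsImplementedFloor(token, exercisesRealBehavior) {
  if (!GRADES.has(token)) throw new Error(`unknown validation token: ${token}`);
  if (GRADES.get(token) >= 3) return true;
  return token === "targeted_tests_passed" && exercisesRealBehavior === true;
}
```

Grade alone cannot express the floor, because `targeted_tests_passed` shares `V2` with build-only tokens; that is why the sketch special-cases it instead of comparing numbers.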

package/task-harnesses/coder/manifest.json ADDED

@@ -0,0 +1,13 @@
+{
+  "id": "coder",
+  "triageRules": [
+    "ACCEPT.md"
+  ],
+  "executionRules": [
+    "ROLE.md",
+    "CONSTRAINTS.md",
+    "CONTEXT.md",
+    "STANDARD.md",
+    "EVOLUTION.md"
+  ]
+}

package/task-harnesses/pm/ACCEPT.md CHANGED

@@ -15,6 +15,9 @@ Classify as `pm` only when all are true:
 - the expected output is a prototype artifact, not production code
 - the task centers on flow, structure, interaction, or state presentation
 - the prototype can be derived from requirement input without real system implementation
+- the default artifact should read as a product demo, not as a mixed review console
+Requirement documents, PRDs, Feishu docs, and similar materials are shared inputs. They do not imply `pm` by themselves.
 Do not classify as `pm` when any are true:
 - the request is only strategy discussion or product advice
@@ -27,7 +30,9 @@ Do not classify as `pm` when any are true:
 Accept when all are true:
 - a usable `requirement_document` exists
 - at least one concrete goal exists
+- at least one target user or role context is identifiable
 - at least one concrete flow, page path, or interaction path exists or is clearly derivable
+- critical rules are present at a level that bounds the product behavior truthfully
 - the prototype scope is bounded enough for one task
 - the task does not depend on repository coupling or production-system integration
@@ -51,7 +56,9 @@ Use the smallest set that explains rejection:
 - `requirement_document`
 - `product_goal`
 - `target_user`
+- `entry_trigger`
 - `core_flow`
+- `critical_rules`
 - `prototype_scope`
 - `constraints`

package/task-harnesses/pm/CONSTRAINTS.md CHANGED

@@ -20,11 +20,13 @@ Defines non-negotiable PM execution rules.
 - when fidelity and prototype convenience conflict, preserve the source fact or declare the deviation explicitly
 - do not present simulated or inferred detail as confirmed requirement
 - if trustworthy prototyping would require heavy invention, stop at `Analysis Only`
+- if source screenshots or embedded UI images exist but were not actually accessed, do not claim screenshot-level fidelity or use `Prototype Complete` unless another equally direct visual source was available
 ## Review discipline
 - prototype for review, not production deployment
 - the first screen should read primarily as product UI, not as a prototype console
 - `prototype.html` default view must contain product UI and interaction only, not delivery commentary
+- `prototype.html` default view must behave like a user-facing demo surface, not a blended product-plus-review workspace
 - static output alone is insufficient unless closure is `Analysis Only`
 - independent reviewer subagent judgment is required before claiming `Prototype Complete`
 - the reviewer is a judge, not a builder
@@ -43,9 +45,12 @@ Defines non-negotiable PM execution rules.
 - claiming certainty that does not exist
 - decoration-first output that hides product meaning
 - persistent `scope`, `exclusions`, `confirmed`, `simulated`, `assumption`, `open_question`, or `truth status` blocks inside the default prototype page unless the source requirement itself explicitly asks for such a panel as product UI
+- reviewer cockpits, operator consoles, debug benches, source reference strips, screenshot galleries, telemetry/event-log panels, ranking tables, or requirement explainer sidebars inside the default prototype page unless the source explicitly defines them as product UI
+- source-gap narration inside product UI such as "the source did not specify", "threshold not provided", or similar review commentary
 - claiming outputs that were not actually created under `artifactDir`
 - presenting simulated behavior as faithfully implemented
 - marking `Prototype Complete` when key rules remain materially weak, merged, or downgraded
+- marking `Prototype Complete` when source screenshots / embedded UI images existed but were unavailable and the prototype relied only on text or alt descriptions for visual reconstruction
 - treating builder self-review as a substitute for an independent reviewer subagent verdict
 - fixing a prior reviewer finding by introducing a new blank, near-blank, or materially weakened core panel
 - treating retained titles, labels, or container chrome as sufficient when the actual intended content expression has disappeared