npm - @fro.bot/systematic - Versions diffs - 2.7.2 → 2.8.0 - Mend

@fro.bot/systematic 2.7.2 → 2.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (58)

package/skills/ce-work-beta/references/codex-delegation-workflow.md DELETED

@@ -1,327 +0,0 @@
-# Codex Delegation Workflow
-When `delegation_active` is true, code implementation is delegated to the Codex CLI (`codex exec`) instead of being implemented directly. The orchestrating OpenCode agent retains control of planning, review, git operations, and orchestration.
-## Delegation Decision
-If `work_delegate_decision` is `ask`, present the recommendation and wait for the user's choice before proceeding.
-**When recommending Codex delegation:**
-> "Codex delegation active. [N] implementation units -- delegating in one batch."
-> 1. Delegate to Codex *(recommended)*
-> 2. Execute with OpenCode instead
-**When recommending Codex delegation, multiple batches:**
-> "Codex delegation active. [N] implementation units -- delegating in [X] batches."
-> 1. Delegate to Codex *(recommended)*
-> 2. Execute with OpenCode instead
-**When recommending OpenCode (all units are trivial):**
-> "Codex delegation active, but these are small changes where the cost of delegating outweighs having OpenCode do them."
-> 1. Execute with OpenCode *(recommended)*
-> 2. Delegate to Codex anyway
-If the user chooses the delegation option, proceed to Pre-Delegation Checks below. If the user chooses the OpenCode option, set `delegation_active` to false and return to standard execution in the parent skill.
-If `work_delegate_decision` is `auto` (the default), state the execution plan in one line and proceed without waiting: "Codex delegation active. Delegating [N] units in [X] batch(es)." If all units are trivial, set `delegation_active` to false and proceed: "Codex delegation active. All units are trivial -- executing with OpenCode."
-## Pre-Delegation Checks
-Run these checks **once before the first batch**. If any check fails, fall back to standard mode for the remainder of the plan execution. Do not re-run on subsequent batches.
-**0. Platform Gate**
-Codex delegation is only supported when the orchestrating agent is running in OpenCode. If the current session is Codex, Gemini CLI, or any other platform, set `delegation_active` to false and proceed in standard mode.
-**1. Environment Guard**
-Check whether the current agent is already running inside a Codex sandbox:
-```bash
-if [ -n "$CODEX_SANDBOX" ] || [ -n "$CODEX_SESSION_ID" ]; then
-  echo "inside_sandbox=true"
-else
-  echo "inside_sandbox=false"
-fi
-```
-If `inside_sandbox` is true, delegation would recurse or fail.
-- If `delegation_source` is `argument`: emit "Already inside Codex sandbox -- using standard mode." and set `delegation_active` to false.
-- If `delegation_source` is `config` or `default`: set `delegation_active` to false silently.
-**2. Availability Check**
-**Codex availability (pre-resolved):**
-!`command -v codex >/dev/null 2>&1 && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_FOUND"`
-If the line above shows `CODEX_AVAILABLE`, proceed to the next check.
-If it shows `CODEX_NOT_FOUND`, the Codex CLI is not installed. Emit "Codex CLI not found (install via `npm install -g @openai/codex` or `brew install codex`) -- using standard mode." and set `delegation_active` to false.
-If it shows an unresolved command string, run `command -v codex` using a shell tool. If the command prints a path, proceed. If it fails or prints nothing, emit the same message and set `delegation_active` to false.
-**3. Consent Flow**
-If `consent_granted` is not true (from config `work_delegate_consent`):
-Present a one-time consent warning using the platform's blocking question tool (`question` in OpenCode, `request_user_input` in Codex, `ask_user` in Gemini, `ask_user` in Pi (requires the `pi-ask-user` extension)). The consent warning explains:
-- Delegation sends implementation units to `codex exec` as a structured prompt
-- **yolo mode** (`--yolo`): Full system access including network. Required for verification steps that run tests or install dependencies. **Recommended.**
-- **full-auto mode** (`--full-auto`): Workspace-write sandbox, no network access.
-Present the sandbox mode choice: (1) yolo (recommended), (2) full-auto.
-On acceptance:
-- Resolve the repo root: `git rev-parse --show-toplevel`. Write `work_delegate_consent: true` and `work_delegate_sandbox: <chosen-mode>` to `<repo-root>/.systematic/config.local.yaml`
-- To write: (1) if file or directory does not exist, create `<repo-root>/.systematic/` and write the YAML file; (2) if file exists, merge new keys preserving existing keys
-- Update `consent_granted` and `sandbox_mode` in the resolved state
-On decline:
-- Ask whether to disable delegation entirely for this project
-- If yes: write `work_delegate: false` to `<repo-root>/.systematic/config.local.yaml` (using the same repo root resolved above). To write: (1) if file or directory does not exist, create `<repo-root>/.systematic/` and write the YAML file; (2) if file exists, merge new keys preserving existing keys. Set `delegation_active` to false, proceed in standard mode
-- If no: set `delegation_active` to false for this invocation only, proceed in standard mode
-**Headless consent:** If running in a headless or non-interactive context, delegation proceeds only if `work_delegate_consent` is already `true` in the config file. If consent is not recorded, set `delegation_active` to false silently.
-## Batching
-Delegate all units in one batch. If the plan exceeds 5 units, split into batches at the plan's own phase boundaries, or in groups of roughly 5 -- never splitting units that share files. Skip delegation entirely if every unit is trivial.
-## Prompt Template
-At the start of delegated execution, create a per-run OS-temp scratch directory via `mktemp -d` and capture its **absolute path** for all downstream use. All scratch files for this invocation live under that directory. Do not use `.context/`: these scratch files are per-run throwaways that are left for the OS temp directory to purge (see Scratch cleanup below), matching the repo Scratch Space convention for one-shot artifacts. Do not pass unresolved shell-variable strings to non-shell tools (Write, Read); use the absolute path returned by `mktemp -d`.
-```bash
-SCRATCH_DIR="$(mktemp -d -t ce-work-codex-XXXXXX)"
-echo "$SCRATCH_DIR"
-```
-Refer to the echoed absolute path as `<scratch-dir>` throughout the rest of this workflow.
-Before each batch, write a prompt file to `<scratch-dir>/prompt-batch-<batch-num>.md`.
-Build the prompt from the batch's implementation units using these XML-tagged sections:
-```xml
-<task>
-[For a single-unit batch: Goal from the implementation unit.
-For a multi-unit batch: list each unit with its Goal, stating the concrete
-job, repository context, and expected end state for each.]
-</task>
-<files>
-[Combined file list from all units in the batch -- files to create, modify, or read.]
-</files>
-<patterns>
-[File paths from all units' "Patterns to follow" fields. If no patterns:
-"No explicit patterns referenced -- follow existing conventions in the
-modified files."]
-</patterns>
-<approach>
-[For a single-unit batch: Approach from the unit.
-For a multi-unit batch: list each unit's approach, noting dependencies
-and suggested ordering.]
-</approach>
-<constraints>
-- Do NOT run git commit, git push, or create PRs -- the orchestrating agent handles all git operations
-- Restrict all modifications to files within the repository root
-- Keep changes tightly scoped to the stated task -- avoid unrelated refactors, renames, or cleanup
-- Resolve the task fully before stopping -- do not stop at the first plausible answer
-- If you discover mid-execution that you need to modify files outside the repo root, complete what you can within the repo and report what you could not do via the result schema issues field
-</constraints>
-<testing>
-Before writing tests, check whether the plan's test scenarios cover all
-categories that apply to each unit. Supplement gaps before writing tests:
-- Happy path: core input/output pairs from each unit's goal
-- Edge cases: boundary values, empty/nil inputs, type mismatches
-- Error/failure paths: invalid inputs, permission denials, downstream failures
-- Integration: cross-layer scenarios that mocks alone won't prove
-Write tests that name specific inputs and expected outcomes. If your changes
-touch code with callbacks, middleware, or event handlers, verify the
-interaction chain works end-to-end.
-</testing>
-<verify>
-After implementing, run ALL test files together in a single command (not
-per-file). Cross-file contamination (e.g., mocked globals leaking between
-test files) only surfaces when tests run in the same process. If tests
-fail, fix the issues and re-run until they pass. Do not report status
-"completed" unless verification passes. This is your responsibility --
-the orchestrator will not re-run verification independently.
-[Test and lint commands from the project. Use the union of all units'
-verification commands as a single combined invocation.]
-</verify>
-<output_contract>
-Report your result via the --output-schema mechanism. Fill in every field:
-- status: "completed" ONLY if all changes were made AND verification passes,
-  "partial" if incomplete, "failed" if no meaningful progress
-- files_modified: array of file paths you changed
-- issues: array of strings describing any problems, gaps, or out-of-scope
-  work discovered
-- summary: one-paragraph description of what was done
-- verification_summary: what you ran to verify (command and outcome).
-  Example: "Ran `bun test` -- 14 tests passed, 0 failed."
-  If no verification was possible, say why.
-</output_contract>
-```
-## Result Schema
-Write the result schema to `<scratch-dir>/result-schema.json` (using the absolute path captured at the start) once at the start of delegated execution:
-```json
-{
-  "type": "object",
-  "properties": {
-    "status": { "enum": ["completed", "partial", "failed"] },
-    "files_modified": { "type": "array", "items": { "type": "string" } },
-    "issues": { "type": "array", "items": { "type": "string" } },
-    "summary": { "type": "string" },
-    "verification_summary": { "type": "string" }
-  },
-  "required": ["status", "files_modified", "issues", "summary", "verification_summary"],
-  "additionalProperties": false
-}
-```
-Each batch's result is written to `<scratch-dir>/result-batch-<batch-num>.json` via the `-o` flag. On plan failure, files are left in place for debugging.
-If the result JSON is absent or malformed after a successful exit code, classify as task failure.
-## Execution Loop
-Initialize a `consecutive_failures` counter at 0 before the first batch.
-**Clean-baseline preflight:** Before the first batch, verify there are no uncommitted changes to tracked files:
-```bash
-git diff --quiet HEAD
-```
-This intentionally ignores untracked files. Only staged or unstaged modifications to tracked files make rollback unsafe. However, if untracked files exist at paths in the batch's planned Files list, rollback (`git clean -fd -- <paths>`) would delete them. If such overlaps are detected, warn the user and recommend committing or stashing those files before proceeding.
-If tracked files are dirty, stop and present options: (1) commit current changes, (2) stash explicitly (`git stash push -m "pre-delegation"`), (3) continue in standard mode (sets `delegation_active` to false). Do not auto-stash user changes.
-**Delegation invocation:** For each batch, execute these as **separate Bash tool calls** (not combined into one):
-**Step A — Launch (background, separate Bash call):**
-Write the prompt file, then make a single Bash tool call with `run_in_background: true` set on the tool parameter. This call returns immediately and has no timeout ceiling.
-Substitute the literal absolute path captured at setup for every `<scratch-dir>` below. Each Bash tool call starts a fresh shell, so the `$SCRATCH_DIR` variable from the setup snippet is not preserved — an unresolved `$SCRATCH_DIR` would expand empty and break result detection.
-```bash
-# Substitute the resolved sandbox_mode value (yolo or full-auto) from the skill state
-SANDBOX_MODE="<sandbox_mode>"
-# Resolve sandbox flag
-if [ "$SANDBOX_MODE" = "full-auto" ]; then
-  SANDBOX_FLAG="--full-auto"
-else
-  SANDBOX_FLAG="--dangerously-bypass-approvals-and-sandbox"
-fi
-codex exec \
-  $SANDBOX_FLAG \
-  --output-schema "<scratch-dir>/result-schema.json" \
-  -o "<scratch-dir>/result-batch-<batch-num>.json" \
-  - < "<scratch-dir>/prompt-batch-<batch-num>.md"
-```
-**Conditional flags** — only include each line when the corresponding skill-state value is set:
-- If `delegate_model` is set, insert `  -m "<delegate_model>" \` as a line before `$SANDBOX_FLAG`.
-- If `delegate_effort` is set, insert `  -c 'model_reasoning_effort="<delegate_effort>"' \` as a line before `$SANDBOX_FLAG`.
-When either value is unset, omit its line entirely — Codex resolves the default from the user's `~/.codex/config.toml` (and ultimately the CLI's own built-in default). Do not substitute a placeholder string for unset values.
-Critical: `run_in_background: true` must be set as a **Bash tool parameter**, not as a shell `&` suffix. The tool parameter is what removes the timeout ceiling. A shell `&` inside a foreground Bash call still hits the 2-minute default timeout.
-Quoting is critical for the `-c` flag when present: use single quotes around the entire key=value and double quotes around the TOML string value inside. Example: `-c 'model_reasoning_effort="high"'`.
-Do not improvise CLI flags or modify this invocation template beyond the documented conditional insertions.
-**Step B — Poll (foreground, separate Bash calls):**
-After the launch call returns, make a **new, separate** foreground Bash tool call that polls for the result file. This keeps the agent's turn active so the user cannot interfere with the working tree.
-Substitute the literal absolute path captured at setup for `<scratch-dir>`. The shell variable from Step A does not survive across separate Bash tool calls.
-```bash
-RESULT_FILE="<scratch-dir>/result-batch-<batch-num>.json"
-for i in $(seq 1 6); do
-  test -s "$RESULT_FILE" && echo "DONE" && exit 0
-  sleep 10
-done
-echo "Waiting for Codex..."
-```
-If the output is "Waiting for Codex...", issue the same polling command again as another separate Bash call. Repeat until the output is "DONE", then read the result file and proceed to classification.
-**Polling termination conditions:** Stop polling when any of these conditions is met:
-- **Result file appears** (output is "DONE") -- proceed to result classification normally.
-- **Background process exits with non-zero code** -- classify as CLI failure (row 1). Rollback and fall back to standard mode.
-- **Background process exits with zero code but result file is absent** -- classify as task failure (row 2: exit 0, result JSON missing). Rollback and increment `consecutive_failures`.
-- **5 polling rounds** elapse (~5 minutes) without the result file appearing and without a background process notification -- treat as a hung process. Classify as CLI failure (row 1). Rollback and fall back to standard mode.
-**Result classification:** Codex is responsible for running verification internally and fixing failures before reporting -- the orchestrator does not re-run verification independently.
-| # | Signal | Classification | Action |
-|---|--------|---------------|--------|
-| 1 | Exit code != 0 | CLI failure | Rollback to HEAD. Fall back to standard mode for ALL remaining work. |
-| 2 | Exit code 0, result JSON missing or malformed | Task failure | Rollback to HEAD. Increment `consecutive_failures`. |
-| 3 | Exit code 0, `status: "failed"` | Task failure | Rollback to HEAD. Increment `consecutive_failures`. |
-| 4 | Exit code 0, `status: "partial"` | Partial success | Keep the diff. Complete remaining work locally, verify, and commit. Increment `consecutive_failures`. |
-| 5 | Exit code 0, `status: "completed"` | Success | Commit changes. Reset `consecutive_failures` to 0. |
-**Result handoff — surface to user:** After reading the result JSON and before committing or rolling back, display a summary so the user sees what happened. Format:
-> **Codex batch <batch-num> — <classification>**
-> <summary from result JSON>
->
-> **Files:** <comma-separated list from files_modified>
-> **Verification:** <verification_summary from result JSON>
-> **Issues:** <issues list, or "None">
-On failure or partial results, include the classification reason (e.g., "status: failed", "result JSON missing") so the user understands why the orchestrator is rolling back or completing locally.
-Keep this brief — the goal is transparency, not a wall of text. One short block per batch.
-**Rollback procedure:**
-```bash
-git checkout -- .
-git clean -fd -- <paths from the batch's combined Files list>
-```
-Do NOT use bare `git clean -fd` without path arguments.
-**Commit on success:**
-```bash
-git add $(git diff --name-only HEAD; git ls-files --others --exclude-standard)
-git commit -m "feat(<scope>): <batch summary>"
-```
-**Between batches** (plans split into multiple batches): Report what completed, test results, and what's next. Continue immediately unless the user intervenes -- the checkpoint exists so the user *can* steer, not so they *must*.
-**Circuit breaker:** After 3 consecutive failures, set `delegation_active` to false and emit: "Codex delegation disabled after 3 consecutive failures -- completing remaining units in standard mode."
-**Scratch cleanup:** No explicit cleanup needed — OS temp handles eventual cleanup (macOS `$TMPDIR` periodic purge; Linux/WSL `/tmp` reboot or periodic cleanup). Leaving `<scratch-dir>` in place after the run also preserves intermediate artifacts for debugging if anything went wrong.
-## Mixed-Model Attribution
-When some units are executed by Codex and others locally:
-- If all units used delegation: attribute to the Codex model
-- If all units used standard mode: attribute to the current agent's model
-- If mixed: note which units were delegated in the PR description and credit both models
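
Reader's note on the deleted workflow above: the Result classification table and the circuit breaker can be read as a small decision function. The following is a minimal Python sketch of that reading, not code from the package. The names `classify_result` and `next_failure_count` are hypothetical, and treating an unrecognized `status` value as malformed JSON (row 2) is an assumption the source does not state.

```python
import json
from typing import Optional, Tuple


def classify_result(exit_code: int, result_text: Optional[str]) -> Tuple[int, str]:
    """Map a Codex exit code and raw result JSON to (table row, classification)."""
    if exit_code != 0:
        return 1, "CLI failure"
    try:
        result = json.loads(result_text) if result_text else None
    except json.JSONDecodeError:
        result = None
    if not isinstance(result, dict):
        # Missing or malformed result JSON after a zero exit code.
        return 2, "Task failure"
    status = result.get("status")
    if status == "failed":
        return 3, "Task failure"
    if status == "partial":
        return 4, "Partial success"
    if status == "completed":
        return 5, "Success"
    # Assumption: an unknown status is handled like malformed JSON.
    return 2, "Task failure"


def next_failure_count(row: int, consecutive_failures: int) -> int:
    """Update the consecutive_failures counter for a classified batch."""
    if row == 5:
        return 0  # success resets the counter
    if row in (2, 3, 4):
        return consecutive_failures + 1
    # Row 1 falls back to standard mode; the table does not touch the counter.
    return consecutive_failures


CIRCUIT_BREAKER_LIMIT = 3  # "After 3 consecutive failures" in the workflow

if __name__ == "__main__":
    row, label = classify_result(0, '{"status": "partial"}')
    count = next_failure_count(row, 2)
    print(row, label, count, count >= CIRCUIT_BREAKER_LIMIT)
```

In this sketch a partial result keeps the diff but still counts toward the circuit breaker, as row 4 of the table specifies.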

package/skills/ce-work-beta/references/shipping-workflow.md DELETED

@@ -1,129 +0,0 @@
-# Shipping Workflow
-This file contains the shipping workflow (Phase 3-4). Load it only when all Phase 2 tasks are complete and execution transitions to quality check.
-## Phase 3: Quality Check
-1. **Run Core Quality Checks**
-   Always run before submitting:
-   ```bash
-   # Run full test suite (use project's test command)
-   # Examples: bin/rails test, npm test, pytest, go test, etc.
-   # Run linting (per AGENTS.md)
-   # Use linting-agent before pushing to origin
-   ```
-2. **Code Review** (REQUIRED)
-   Every change gets reviewed before shipping. The depth scales with the change's risk profile, but review itself is never skipped.
-   **Tier 2: Full review (default)** -- REQUIRED unless Tier 1 criteria are explicitly met. Invoke the `ce-code-review` skill with `mode:autofix` to run specialized reviewer agents, auto-apply safe fixes, and record residual downstream work in the per-run artifact. When the plan file path is known, pass it as `plan:<path>`. This is the mandatory default -- proceed to Tier 1 only after confirming every criterion below.
-   **Tier 1: Inline self-review** -- A lighter alternative permitted only when **all four** criteria are true. Before choosing Tier 1, explicitly state which criteria apply and why. If any criterion is uncertain, use Tier 2.
-   - Purely additive (new files only, no existing behavior modified)
-   - Single concern (one skill, one component -- not cross-cutting)
-   - Pattern-following (implementation mirrors an existing example with no novel logic)
-   - Plan-faithful (no scope growth, no deferred questions resolved with surprising answers)
-3. **Residual Work Gate** (REQUIRED when Tier 2 ran)
-   After Tier 2 code review completes, inspect the Residual Actionable Work summary it returned (or read the run artifact directly if the summary was not emitted). If one or more residual `downstream-resolver` findings remain, do not proceed to Final Validation until the user decides how to handle them.
-   Ask the user using the platform's blocking question tool (`question` in OpenCode with `ToolSearch select:question` pre-loaded if needed, `request_user_input` in Codex, `ask_user` in Gemini, `ask_user` in Pi (requires the `pi-ask-user` extension)). Fall back to numbered options in chat only when the harness genuinely lacks a blocking tool. Never silently skip the gate.
-   Stem: `Code review found N residual finding(s) the skill did not auto-fix. How should the agent proceed?`
-   Options (four or fewer, self-contained labels):
-   - `Apply/fix now` — loop back into review with focused fixes; the agent investigates each finding, applies changes where safe, and re-runs review.
-   - `File tickets via project tracker` — load `references/tracker-defer.md` in Interactive mode; the agent files tickets in the project's detected tracker (or `gh` fallback, or leaves them in the report if no sink exists) and proceeds to Final Validation.
-   - `Accept and proceed` — record the residual findings verbatim in a durable "Known Residuals" sink before shipping. If a PR will be created or updated in Phase 4, include them in the PR description's "Known Residuals" section (the agent owns this when calling `ce-commit-push-pr`). If the user later chooses the no-PR `ce-commit` path, create `docs/residual-review-findings/<branch-or-head-sha>.md`, include the accepted findings and source review-run context, stage it with the implementation commit, and mention the file path in the final summary. The user has acknowledged the risk, but the findings must not live only in the transient session.
-   - `Stop — do not ship` — abort the shipping workflow. The user will handle findings manually before re-invoking.
-   Skip this gate entirely when the review reported `Residual actionable work: none.` or when only Tier 1 (inline self-review) was used. Do not proceed past this gate on an `Accept and proceed` decision until the agent has recorded whether the durable sink is `PR Known Residuals` or `docs/residual-review-findings/<branch-or-head-sha>.md`.
-4. **Final Validation**
-   - All tasks marked completed
-   - Testing addressed -- tests pass and new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed)
-   - Linting passes
-   - Code follows existing patterns
-   - Figma designs match (if applicable)
-   - No console errors or warnings
-   - If the plan has a `Requirements` section (or legacy `Requirements Trace`), verify each requirement is satisfied by the completed work
-   - If any `Deferred to Implementation` questions were noted, confirm they were resolved during execution
-5. **Prepare Operational Validation Plan** (REQUIRED)
-   - Add a `## Post-Deploy Monitoring & Validation` section to the PR description for every change.
-   - Include concrete:
-     - Log queries/search terms
-     - Metrics or dashboards to watch
-     - Expected healthy signals
-     - Failure signals and rollback/mitigation trigger
-     - Validation window and owner
-   - If there is truly no production/runtime impact, still include the section with: `No additional operational monitoring required` and a one-line reason.
-## Phase 4: Ship It
-1. **Prepare Evidence Context**
-   Do not invoke `ce-demo-reel` directly in this step. Evidence capture belongs to the PR creation or PR description update flow, where the final PR diff and description context are available.
-   Note whether the completed work has observable behavior (UI rendering, CLI output, API/library behavior with a runnable example, generated artifacts, or workflow output). The `ce-commit-push-pr` skill will ask whether to capture evidence only when evidence is possible.
-2. **Update Plan Status**
-   If the input document has YAML frontmatter with a `status` field, update it to `completed`:
-   ```
-   status: active  ->  status: completed
-   ```
-3. **Commit and Create Pull Request**
-   Load the `ce-commit-push-pr` skill to handle committing, pushing, and PR creation. The skill handles convention detection, branch safety, logical commit splitting, adaptive PR descriptions, and attribution badges.
-   When providing context for the PR description, include:
-   - The plan's summary and key decisions
-   - Testing notes (tests added/modified, manual testing performed)
-   - Evidence context from step 1, so `ce-commit-push-pr` can decide whether to ask about capturing evidence
-   - Figma design link (if applicable)
-   - The Post-Deploy Monitoring & Validation section (see Phase 3 Step 5)
-   - Any "Known Residuals" accepted in the Phase 3 Residual Work Gate, rendered as a dedicated section in the PR body with severity, file:line, and title per finding
-   If the user prefers to commit without creating a PR, load the `ce-commit` skill instead.
-4. **Notify User**
-   - Summarize what was completed
-   - Link to PR (if one was created)
-   - Note any follow-up work needed
-   - Suggest next steps if applicable
-## Quality Checklist
-Before creating PR, verify:
-- [ ] All clarifying questions asked and answered
-- [ ] All tasks marked completed
-- [ ] Testing addressed -- tests pass AND new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed)
-- [ ] Linting passes (use linting-agent)
-- [ ] Code follows existing patterns
-- [ ] Figma designs match implementation (if applicable)
-- [ ] Evidence decision handled by `ce-commit-push-pr` when the change has observable behavior
-- [ ] Commit messages follow conventional format
-- [ ] PR description includes Post-Deploy Monitoring & Validation section (or explicit no-impact rationale)
-- [ ] Code review completed (inline self-review or full `ce-code-review`)
-- [ ] PR description includes summary, testing notes, and evidence when captured
-- [ ] PR description includes Compound Engineered badge with accurate model and harness
-## Code Review Tiers
-Every change gets reviewed. The tier determines depth, not whether review happens.
-**Tier 2 (full review)** -- REQUIRED default. Invoke `ce-code-review mode:autofix` with `plan:<path>` when available. Safe fixes are applied automatically; residual work is recorded in the run artifact for downstream routing. Always use this tier unless all four Tier 1 criteria are explicitly confirmed.
-**Tier 1 (inline self-review)** -- permitted only when all four are true (state each explicitly before choosing):
-- Purely additive (new files only, no existing behavior modified)
-- Single concern (one skill, one component -- not cross-cutting)
-- Pattern-following (mirrors an existing example, no novel logic)
-- Plan-faithful (no scope growth, no surprising deferred-question resolutions)
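
Reader's note on the deleted workflow above: the review-tier rule and the Residual Work Gate condition reduce to two small predicates. The sketch below is one possible Python rendering, not code from the package. `Tier1Criteria`, `review_tier`, and `needs_residual_gate` are hypothetical names, and modelling an uncertain criterion as `None` is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier1Criteria:
    """The four Tier 1 criteria; None means the criterion is uncertain."""
    purely_additive: Optional[bool]
    single_concern: Optional[bool]
    pattern_following: Optional[bool]
    plan_faithful: Optional[bool]


def review_tier(criteria: Tier1Criteria) -> int:
    """Return 1 only when all four criteria are explicitly true, else 2."""
    checks = (
        criteria.purely_additive,
        criteria.single_concern,
        criteria.pattern_following,
        criteria.plan_faithful,
    )
    # An uncertain (None) or false criterion falls through to the Tier 2 default.
    return 1 if all(value is True for value in checks) else 2


def needs_residual_gate(tier: int, residual_findings: int) -> bool:
    """The gate applies only after Tier 2 review with residual findings left."""
    return tier == 2 and residual_findings > 0


if __name__ == "__main__":
    uncertain = Tier1Criteria(True, True, None, True)
    tier = review_tier(uncertain)
    print(tier, needs_residual_gate(tier, residual_findings=3))
```

Defaulting to Tier 2 whenever any criterion is not explicitly confirmed mirrors the workflow's statement that full review is the required default.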