@laitszkin/apollo-toolkit 2.13.2 → 2.13.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/AGENTS.md CHANGED
@@ -42,7 +42,6 @@ This repository enables users to install and run a curated set of reusable agent
42
42
  - Users can prepare and open open-source pull requests from existing changes.
43
43
  - Users can generate storyboard image sets from chapters, novels, articles, or scripts.
44
44
  - Users can configure OpenClaw from official documentation, including `~/.openclaw/openclaw.json`, skills loading, SecretRefs, CLI edits, and validation or repair workflows.
45
- - Users can investigate production or local simulation runs, calibrate reusable presets, and fix toolchain realism gaps between harness behavior and expected on-chain behavior.
46
45
  - Users can record multi-account spending and balance changes in monthly Excel ledgers with summary analytics and charts.
47
46
  - Users can recover missing or archived `docs/plans/...` spec sets from issue context, git history, and repository evidence before continuing feature work.
48
47
  - Users can review the current git change set from an unbiased reviewer perspective to find abstraction opportunities and simplification candidates.
package/CHANGELOG.md CHANGED
@@ -4,6 +4,19 @@ All notable changes to this repository are documented in this file.
4
4
 
5
5
  ## [Unreleased]
6
6
 
7
+ ## [v2.13.4] - 2026-04-05
8
+
9
+ ### Changed
10
+ - Update `learn-skill-from-conversations` so it must inventory the current repository's existing skills first, weigh repeated user corrections and error-driven lessons more heavily, extract duplicated workflow fragments into shared skills when warranted, wrap repeatedly customized external skills, and keep project-specific tooling patterns in the owning project's `~/.codex/skills/`.
11
+
12
+ ### Fixed
13
+ - Synchronize `package-lock.json` metadata with the current package version and CLI bin aliases before release publication.
14
+
15
+ ## [v2.13.3] - 2026-04-05
16
+
17
+ ### Removed
18
+ - Remove `production-sim-debug` skill as it is no longer actively maintained or needed.
19
+
7
20
  ## [v2.13.2] - 2026-04-05
8
21
 
9
22
  ### Changed
package/README.md CHANGED
@@ -35,7 +35,6 @@ A curated skill catalog for Codex, OpenClaw, Trae, and Claude Code with a manage
35
35
  - open-source-pr-workflow
36
36
  - openai-text-to-image-storyboard
37
37
  - openclaw-configuration
38
- - production-sim-debug
39
38
  - recover-missing-plan
40
39
  - record-spending
41
40
  - resolve-review-comments
@@ -10,7 +10,12 @@ This skill extracts the latest conversations from `~/.codex/sessions` and `~/.co
10
10
  - Stops immediately when there are no recent sessions
11
11
  - Cleans up `sessions` files older than 7 days after reading
12
12
  - Deletes `archived_sessions` files after reading them
13
+ - Reads existing skills in the current working repository before proposing new ones
14
+ - Prioritizes repeated user corrections, reported errors, tool failures, and reusable workflow gaps
15
+ - Encourages extracting duplicated workflow fragments into shared skills when several skills need the same pattern
16
+ - Wraps repeatedly customized external skills in a local skill when that produces a more reusable workflow
13
17
  - Defaults to creating a new skill unless strong overlap is confirmed
18
+ - Keeps project-specific tool workflows out of the shared catalog and places them in the relevant project's `~/.codex/skills/`
14
19
  - Validates each changed skill with `quick_validate.py`
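The 7-day retention rule in the list above can be sketched as a small cleanup routine. This is illustrative only, not toolkit code: the session directory layout and the `.jsonl` extension are assumptions.

```python
# Illustrative sketch of the "clean up sessions files older than 7 days" rule.
# The file layout under the sessions root and the .jsonl suffix are assumptions.
import time
from pathlib import Path

def prune_old_sessions(root: Path, max_age_days: int = 7) -> list[Path]:
    """Delete session files older than max_age_days and return what was removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in root.rglob("*.jsonl"):
        # Compare each file's modification time against the retention cutoff.
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```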
15
20
 
16
21
  ## Project Structure
@@ -40,7 +45,7 @@ python3 scripts/extract_recent_conversations.py --lookback-minutes 1440
40
45
  ```
41
46
 
42
47
  - If output is `NO_RECENT_CONVERSATIONS`, no action is required.
43
- - Otherwise, review extracted `[USER]` / `[ASSISTANT]` messages and apply updates through `skill-creator`.
48
+ - Otherwise, review extracted `[USER]` / `[ASSISTANT]` messages, compare the lessons against existing skills in the current repository, and apply updates through `skill-creator`.
44
49
 
45
50
  ## License
46
51
 
@@ -15,9 +15,9 @@ description: Learn and evolve the local skill library from recent Codex conversa
15
15
  ## Standards
16
16
 
17
17
  - Evidence: Extract recent Codex session history first and derive reusable lessons only from actual conversation patterns.
18
- - Execution: Inventory the current skill catalog before editing, prioritize repeated requests, user corrections, reported errors, and post-completion follow-up asks that reveal missing closure, then prefer a focused update to the strongest related skill or create a new skill only when the overlap is weak.
19
- - Quality: Take no action when there are no recent sessions, avoid unrelated broad refactors, and validate every changed skill.
20
- - Output: Report the analyzed sessions, extracted lessons, created or updated skills, and the reasoning behind each decision.
18
+ - Execution: Inventory the current working directory's existing skills before editing, prioritize repeated requests, user corrections, tool failures, logic bugs, architecture mismatches, documentation drift, and post-completion follow-up asks that reveal missing closure, then prefer a focused update to the strongest related skill or create a new skill only when the overlap is weak.
19
+ - Quality: Take no action when there are no recent sessions, avoid unrelated broad refactors, keep shared skills cross-project reusable, route project-specific tooling patterns into the relevant project's `~/.codex/skills/`, and validate every changed skill.
20
+ - Output: Report the analyzed sessions, extracted lessons, created or updated skills, shared-vs-project-specific placement decisions, and the reasoning behind each decision.
21
21
 
22
22
  ## Overview
23
23
 
@@ -46,10 +46,15 @@ python3 ~/.codex/skills/learn-skill-from-conversations/scripts/extract_recent_co
46
46
  ### 2) Derive reusable lessons
47
47
 
48
48
  - Identify repeated user needs, recurring friction, and repeated manual workflows.
49
+ - Focus especially on repeated needs, repeated user corrections, and user-reported errors, then ask how a skill can prevent the same failure mode from recurring.
49
50
  - Give extra weight to moments where the user corrected the agent, rejected an earlier interpretation, or pointed out a missing preference or requirement.
50
51
  - Give extra weight to user-reported errors, regressions, or avoidable mistakes, then ask how a skill can prevent repeating that failure mode.
52
+ - Treat tool-call failures, broken code paths, logic mistakes, weak architecture choices, and outputs that drifted from official documentation as valuable evidence when they expose a missing reusable guardrail or workflow.
51
53
  - Treat a user follow-up that asks for cleanup or an omitted finalization step immediately after the assistant reported completion as evidence that the workflow's completion criteria were incomplete.
52
54
  - When that kind of follow-up recurs, tighten the owning skill's completion checklist before considering any new-skill extraction.
55
+ - Even when a user request was highly specific, check whether the underlying workflow can be generalized into a reusable skill for the same class of tasks.
56
+ - When multiple existing skills use a near-identical workflow fragment, consider extracting that fragment into a dedicated shared skill instead of leaving the duplication in place.
57
+ - When an external skill is repeatedly used with the same user-specific customization layer, prefer wrapping it in a new local skill that encodes those standing conventions.
53
58
  - Ignore one-off issues that do not provide reusable value.
54
59
  - Distinguish between:
55
60
  - repeated trigger intent that deserves a new skill
@@ -58,7 +63,7 @@ python3 ~/.codex/skills/learn-skill-from-conversations/scripts/extract_recent_co
58
63
 
59
64
  ### 3) Decide new skill vs existing skill (default: new skill)
60
65
 
61
- - First read the relevant skills already present in the working repository so you do not create a duplicate under a different name.
66
+ - First read the relevant skills already present in the repository in the current working directory (for example `apollo-toolkit`) so you do not create a duplicate under a different name.
62
67
  - Prefer creating a new skill.
63
68
  - Edit an existing skill only when the lesson is strongly related.
64
69
  - Treat relation as strong only when all three conditions hold:
@@ -67,12 +72,17 @@ python3 ~/.codex/skills/learn-skill-from-conversations/scripts/extract_recent_co
67
72
  - The update does not dilute the existing skill's scope.
68
73
  - When the recurring lesson is mainly about preventing a known mistake, prefer updating the existing skill that should have prevented it instead of creating a parallel skill.
69
74
  - When several skills repeat the same narrow workflow fragment, prefer extracting that fragment into a dedicated shared skill instead of copying the same guidance again.
75
+ - When the strongest candidate is an external skill but the user repeatedly adds the same customization or policy layer, create a wrapper skill that calls that external skill while encoding the recurring local conventions.
76
+ - Decide whether the lesson is cross-project reusable before choosing the destination:
77
+ - Cross-project reusable skills belong in the shared skill library.
78
+ - Project-specific workflows, especially ones tied to custom tools from a single repository, belong in that project's `~/.codex/skills/` directory instead of the shared catalog.
70
79
  - If uncertain, create a new skill instead of expanding an old one.
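The decision rules above can be encoded compactly. This is a hypothetical sketch, not toolkit code: it captures the three-condition strong-overlap test, the "default to new skill" fallback, and the shared-vs-project placement choice; the return labels are invented for illustration.

```python
# Hypothetical encoding of the decision rules in this section.

def choose_action(same_trigger: bool, same_workflow: bool, keeps_scope: bool) -> str:
    """Edit an existing skill only when all three overlap conditions hold."""
    if same_trigger and same_workflow and keeps_scope:
        return "update-existing-skill"
    # Default, and the answer whenever uncertain: create a new skill.
    return "create-new-skill"

def choose_destination(cross_project_reusable: bool) -> str:
    """Cross-project lessons go to the shared library; others stay project-local."""
    return "shared-skill-library" if cross_project_reusable else "project ~/.codex/skills/"

print(choose_action(True, True, False))  # create-new-skill
print(choose_destination(False))         # project ~/.codex/skills/
```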
71
80
 
72
81
  ### 4) Apply changes through skill-creator
73
82
 
74
83
  - Explicitly follow `$skill-creator` workflow before editing skills.
75
- - For new skills, initialize with `~/.codex/skills/.system/skill-creator/scripts/init_skill.py`, then complete `SKILL.md` and required resources.
84
+ - For new shared skills, initialize with `~/.codex/skills/.system/skill-creator/scripts/init_skill.py`, then complete `SKILL.md` and required resources.
85
+ - For new project-specific skills, create or update them under the relevant project's `~/.codex/skills/` directory instead of the shared catalog.
76
86
  - For existing skills, make minimal focused edits and keep behavior consistent.
77
87
 
78
88
  ### 5) Validate every changed skill
@@ -87,10 +97,11 @@ python3 ~/.codex/skills/.system/skill-creator/scripts/quick_validate.py <skill-p
87
97
 
88
98
  ### 6) Report result
89
99
 
90
- - Summarize analyzed sessions, repeated needs, user corrections, error-driven lessons, created/updated skills, and why each decision was made.
100
+ - Summarize analyzed sessions, repeated needs, user corrections, error-driven lessons, created/updated skills, placement decisions, and why each decision was made.
91
101
 
92
102
  ## Guardrails
93
103
 
94
104
  - Take no action when there are no sessions in the last 24 hours.
95
105
  - Avoid broad refactors across unrelated skills.
96
106
  - Avoid duplicate skills when an existing skill is strongly related.
107
+ - Do not promote project-specific tool usage into the shared catalog unless the workflow is clearly reusable across repositories.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@laitszkin/apollo-toolkit",
3
- "version": "2.13.2",
3
+ "version": "2.13.4",
4
4
  "description": "Apollo Toolkit npm installer for managed skill copying across Codex, OpenClaw, and Trae.",
5
5
  "license": "MIT",
6
6
  "author": "LaiTszKin",
@@ -1,21 +0,0 @@
1
- MIT License
2
-
3
- Copyright (c) 2026 LaiTszKin
4
-
5
- Permission is hereby granted, free of charge, to any person obtaining a copy
6
- of this software and associated documentation files (the "Software"), to deal
7
- in the Software without restriction, including without limitation the rights
8
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
- copies of the Software, and to permit persons to whom the Software is
10
- furnished to do so, subject to the following conditions:
11
-
12
- The above copyright notice and this permission notice shall be included in all
13
- copies or substantial portions of the Software.
14
-
15
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
- SOFTWARE.
@@ -1,91 +0,0 @@
1
- # Production Sim Debug
2
-
3
- An agent skill for investigating production or local simulation runs when the observed behavior diverges from the intended market scenario or expected liquidation/remediation outcomes.
4
-
5
- This skill helps agents reproduce a bounded simulation run, inspect the real preset and runtime artifacts, separate product bugs from local harness drift, and apply the smallest realistic fix before rerunning the same scenario.
6
-
7
- ## What this skill provides
8
-
9
- - A workflow for bounded production/local simulation diagnosis.
10
- - A decision tree for separating runtime logic bugs from harness, stub, preset, and persistence issues.
11
- - A repeatable way to audit the active run directory, logs, and event database before drawing conclusions.
12
- - Guidance for turning recurring ad hoc scenarios into named presets and documented test cases.
13
- - Emphasis on rerunning the same scenario after a fix instead of relying only on unit tests.
14
-
15
- ## Repository structure
16
-
17
- - `SKILL.md`: Main skill definition, workflow, and output contract.
18
- - `agents/openai.yaml`: Agent interface metadata and default prompt.
19
-
20
- ## Installation
21
-
22
- 1. Clone this repository.
23
- 2. Copy this folder to your Codex skills directory:
24
-
25
- ```bash
26
- mkdir -p "$CODEX_HOME/skills"
27
- cp -R production-sim-debug "$CODEX_HOME/skills/production-sim-debug"
28
- ```
29
-
30
- ## Usage
31
-
32
- Invoke the skill in your prompt:
33
-
34
- ```text
35
- Use $production-sim-debug to run this repository's production local simulation with the named preset for 5 minutes, explain why remediations or liquidations did not land, and fix any harness or runtime-alignment issues you confirm.
36
- ```
37
-
38
- Best results come from including:
39
-
40
- - workspace path
41
- - canonical simulation entrypoint
42
- - preset or scenario name
43
- - run duration
44
- - expected market shape or success criteria
45
- - the run directory to inspect, if it already exists
46
- - whether toolchain fixes are in scope or the task is read-only
47
-
48
- If the repository already has a named preset system, prefer using it instead of describing the scenario only in prose.
49
-
50
- ## Example
51
-
52
- ### Input prompt
53
-
54
- ```text
55
- Use $production-sim-debug for this repository.
56
-
57
- Workspace: /workspace/pangu
58
- Entrypoint: ./scripts/run-production-local-sim.sh stress-test-1
59
- Duration: 5 minutes
60
- Expectations:
61
- - Jupiter free tier
62
- - mostly oracle-blocked positions that can be unlocked by remediation
63
- - some directly executable opportunities
64
- - evidence-backed explanation for why liquidations did or did not land
65
- ```
66
-
67
- ### Expected response shape
68
-
69
- ```text
70
- 1) Scenario contract
71
- - Named preset, duration, and run directory used.
72
-
73
- 2) Observed outcomes
74
- - Event-table counts, dominant skip reasons, and runtime stage reached.
75
-
76
- 3) Root cause
77
- - Whether the main blocker was product logic, quote budget, preset design, or harness/stub drift.
78
-
79
- 4) Fixes applied
80
- - Toolchain or runtime fixes with file paths.
81
-
82
- 5) Validation
83
- - Rerun or targeted tests proving the intended stage now executes.
84
-
85
- 6) Remaining gaps
86
- - Any realism differences still left between local simulation and chain behavior.
87
- ```
88
-
89
- ## License
90
-
91
- MIT. See `LICENSE`.
@@ -1,187 +0,0 @@
1
- ---
2
- name: production-sim-debug
3
- description: Investigate production or local simulation runs for runtime-toolchain drift, harness bugs, preset mistakes, unrealistic local stubs, or mismatches between expected and observed liquidation outcomes. Use when users ask to run bounded production simulations, explain why simulated liquidations or remediations did not happen, calibrate presets, or fix local simulation tooling so it better matches real on-chain behavior.
4
- ---
5
-
6
- # Production Sim Debug
7
-
8
- ## Dependencies
9
-
10
- - Required: `systematic-debug` for evidence-first root-cause analysis when a simulation shows failing or missing expected behavior.
11
- - Conditional: `scheduled-runtime-health-check` when the user wants a bounded production/local simulation run executed and observed; `read-github-issue` when the requested simulation work is driven by a remote issue; `marginfi-development` when liquidation, health, receivership, or instruction-order conclusions depend on official marginfi docs/source; `jupiter-development` when swap, quote, routing, or rate-limit conclusions depend on official Jupiter docs; `open-github-issue` when confirmed toolchain gaps should be published.
12
- - Optional: none.
13
- - Fallback: If the relevant simulation entrypoint, preset, logs, or run artifacts cannot be found, stop and report the missing evidence instead of inferring behavior from stale docs or memory.
14
-
15
- ## Standards
16
-
17
- - Evidence: Base conclusions on the actual preset, runtime command, logs, SQLite event store, local stub responses, the code paths that generated them, and official protocol or validator documentation whenever feasibility or instruction legality is in question.
18
- - Execution: Reproduce with the exact scenario first, verify the bounded-run contract against the actual script/env implementation before launch, separate product logic failures from simulation-toolchain failures, verify protocol-sensitive claims against official docs or upstream source before changing code or specs, make the smallest realistic toolchain fix, and rerun the same bounded scenario to validate.
19
- - Quality: Prefer harness or stub fixes that improve realism over one-off scenario hacks, avoid duplicating existing workflow skills, and record reusable presets when a scenario becomes part of the regular test suite.
20
- - Output: Return the scenario contract, observed outcomes, root-cause chain, fixes applied, validation evidence, and any remaining realism gaps.
21
-
22
- ## Goal
23
-
24
- Use this skill to debug simulation workflows where the repository exposes a production-like local run path, but the observed outcomes are distorted by presets, harness logic, local stubs, event persistence, or runtime scheduling constraints.
25
-
26
- ## Workflow
27
-
28
- ### 1) Lock the simulation contract before touching code
29
-
30
- - Identify the exact entrypoint, preset, duration, runtime mode, and rate-limit tier the user expects.
31
- - Read the preset or scenario definition from the repository before assuming what the test means.
32
- - Capture the intended success criteria explicitly, such as:
33
- - successful liquidation count
34
- - remediation count
35
- - oracle-block registration
36
- - profit ranking behavior
37
- - quote budget behavior
38
- - If the scenario is ad hoc but likely to recur, prefer turning it into a named preset instead of leaving it as an undocumented shell invocation.
39
-
40
- ### 2) Reproduce with the real bounded run path
41
-
42
- - Use the same production/local simulation script the repository already treats as canonical.
43
- - Prefer a bounded run window with a stable run name and output directory.
44
- - As soon as the harness prints the active run name or output directory, record it and treat that path as the canonical artifact root for the rest of the investigation.
45
- - Before launch, read the script or wrapper that enforces the run duration and confirm the real control surface, such as the exact env var name, CLI flag, shutdown helper, and artifact path conventions.
46
- - Do not assume a generic `RUNTIME_SECS`-style variable is wired correctly; verify the actual variable names and stop path from code or scripts first.
47
- - Save and inspect the exact artifacts produced by that run:
48
- - main runtime log
49
- - actor or stub logs
50
- - generated env/config files
51
- - SQLite or other persistence outputs
52
- - scenario manifest or preset-resolved settings
53
- - Do not trust older run directories when the user asks about a new execution.
54
- - If the run exceeds the agreed bounded window, stop it promptly, preserve the partial artifacts, and treat the overrun itself as a toolchain bug or contract mismatch to diagnose.
55
-
56
- ### 3) Audit the artifact chain before diagnosing product logic
57
-
58
- - Confirm that you are reading the correct database and log files for the active run.
59
- - Verify that the event tables you expect are actually the ones written by the runtime.
60
- - When the run appears "too clean" or fully zeroed, inspect startup selection counters first, such as candidate pool size, listener/tracked-position counts, or the repository's equivalent admission signals, before concluding there were simply no opportunities.
61
- - Check whether missing results come from:
62
- - no candidate selection
63
- - no worker completion
64
- - planner failure
65
- - event persistence mismatch
66
- - reading the wrong file
67
- - Treat this artifact audit as mandatory; repeated failures in the recent chats came from toolchain alignment errors before they came from liquidation logic.
68
-
69
- ### 4) Separate product failures from toolchain realism failures
70
-
71
- - When the suspected blocker touches protocol rules, instruction legality, quote semantics, or liquidation invariants, verify the claim against the relevant official docs or upstream source before assigning blame.
72
- - When the current spec or planned fix assumes a local-simulation capability, verify that the capability is actually supported by the validator and program ownership model before implementing it.
73
- - For every major blocker, explicitly classify the result as one of:
74
- - production bot problem
75
- - simulation environment problem
76
- - both
77
- - Treat "both" as a first-class result when a bot bug and a local-harness realism gap are stacked in the same flow.
78
-
79
- - Classify each blocker into one of these buckets:
80
- - preset design mismatch
81
- - runtime scheduling or budget behavior
82
- - stub or mock response unrealism
83
- - local validator or cloned-state setup drift
84
- - account ordering / remaining-account mismatch
85
- - event-generation or persistence bug
86
- - genuine product logic bug
87
- - If the symptom is caused by the local harness, fix the harness instead of masking it in runtime logic.
88
- - If a local stub inflates or distorts profitability, preserve the runtime behavior and calibrate the stub.
89
- - If a scenario intentionally stresses one dimension, make sure the harness is not accidentally stressing unrelated dimensions.
90
-
91
- ### 4.3) Collapse infeasible simulation designs quickly
92
-
93
- - If official docs or upstream source prove that the proposed local-simulation design is impossible under the current architecture, stop trying to force the implementation through.
94
- - Treat this as a first-class debugging outcome, not as an implementation blocker to hand-wave away.
95
- - Name the precise external constraint, such as:
96
- - validator preload behavior only applying at genesis/startup
97
- - account data mutability being restricted to the owner program
98
- - protocol instruction allowlists rejecting the proposed transaction shape
99
- - When a live spec or plan still claims that infeasible design as in scope, update the spec artifacts immediately so they only describe the remaining feasible scope.
100
- - Prefer narrowing the scenario to the strongest still-valid readiness or realism checks rather than leaving impossible tasks marked as pending.
101
-
102
- ### 4.1) Map the observed failure to the real pipeline stage
103
-
104
- - Do not treat every `liquidation_event` row as evidence that the run reached verification or execution.
105
- - Reconstruct the stage explicitly, such as:
106
- - candidate discovery
107
- - local estimate
108
- - solver candidate quote
109
- - verification or pre-submit re-quote
110
- - simulation
111
- - execution
112
- - When logs or event rows expose `stage`, `bucket`, `reason`, or similar structured fields, use them to explain exactly where the attempt stopped.
113
- - When the user is confused by counts, distinguish:
114
- - unique positions
115
- - candidate attempts
116
- - quote attempts
117
- - verification attempts
118
- - executed liquidations
119
-
120
- ### 4.2) Audit quote-budget behavior before calling the strategy broken
121
-
122
- - Check whether a high quote count reflects many unique positions or repeated coarse/refinement exploration on the same few positions.
123
- - Trace how the runtime reserves verification capacity versus non-verification capacity, and explain which bucket was exhausted.
124
- - If the strategy relies on local oracle estimates before quoting, verify whether the admission threshold is merely "positive estimate" or something stricter before assuming those candidates were all strong.
125
- - When quote pressure appears unreasonable, tie the explanation back to the actual solver-step count, coarse/refinement selection logic, and the number of cross-mint candidates in the run.
126
-
127
- ### 5) Trace the full decision tree for missed liquidations or remediations
128
-
129
- - Follow the candidate from discovery through:
130
- - local profitability estimate
131
- - health precheck
132
- - oracle-block classification
133
- - remediation registration and rearm
134
- - quote admission
135
- - quote calibration
136
- - pre-submit verification
137
- - final execution or skip reason
138
- - When the runtime reports a generic or overloaded failure label, reopen the logs and derive a finer-grained breakdown before proposing fixes.
139
- - Distinguish fail-closed behavior from broken behavior; not all skipped liquidations are bugs.
140
-
141
- ### 6) Fix the narrowest realistic cause
142
-
143
- - Prefer minimal fixes that improve realism or observability at the root cause:
144
- - add preset support to shell tooling instead of hardcoding another branch
145
- - make oracle-blocked paths avoid external quote I/O when a local estimate is sufficient
146
- - make stubs preserve run-specific metadata instead of falling back to unrealistic defaults
147
- - keep continuous oracle updates realistic without breaking the runtime's own core feeds
148
- - Add or update regression tests when the bug is in the harness, runtime decision tree, or event persistence path.
149
- - If the scenario becomes a durable benchmark, add or update the named preset and the developer docs in the same change.
150
-
151
- ### 7) Re-run the same scenario and compare outcomes
152
-
153
- - After the fix, rerun the same scenario or the shortest bounded version that still exercises the bug.
154
- - Compare:
155
- - event-table counts before and after
156
- - dominant skip reasons before and after
157
- - whether the runtime reaches the intended decision stage
158
- - whether the harness still resembles the user’s requested market conditions
159
- - Do not claim success based only on unit tests when the original issue was a simulation-toolchain integration problem.
160
-
161
- ## Common failure patterns
162
-
163
- - **Bounded-run contract drift**: the analyst assumes the wrong duration env var, CLI flag, or shutdown path, so the run exceeds the promised window and the captured evidence no longer matches the requested contract.
164
- - **Wrong artifact source**: the analyst inspects an older SQLite file or the wrong event database and concludes that runtime behavior is missing.
165
- - **Preset says one thing, harness does another**: scenario names sound right, but the actual matrix or oracle mode does not match the user’s intent.
166
- - **Stub realism drift**: local quote, swap, or oracle stubs distort pricing, accounts, or program IDs enough to create false failures or false profits.
167
- - **Overloaded “unknown” failures**: logs contain structured reasons, but the first-pass analysis never decomposes them.
168
- - **Continuous-mode self-sabotage**: a stress regime intended to stale pull oracles instead makes the runtime’s own primary feeds unusable.
169
- - **Quote budget starvation**: local filtering improves behavior but still lets low-value cross-mint candidates consume scarce quote capacity before higher-value paths can finish.
170
- - **Blame assigned too early**: the first visible error gets labeled as either bot or tooling before official docs, upstream source, and run artifacts confirm that attribution.
171
- - **Phase confusion**: event counts are interpreted as verification or execution attempts even though the run stopped much earlier in candidate quote or pre-submit simulation.
172
- - **Quote-count misread**: a large total quote count is mistaken for many distinct opportunities when the runtime actually spent repeated exploration quotes on a smaller set of positions.
173
-
174
- ## Output checklist
175
-
176
- - Name the exact scenario, preset, duration, and run directory.
177
- - State whether the root cause was product logic, toolchain realism, or both.
178
- - For protocol-sensitive issues, name the official docs or upstream source used to justify that attribution.
179
- - Cite the artifact types used: preset, logs, SQLite tables, and code paths.
180
- - Explain the failing stage in the liquidation pipeline and whether the key counts represent positions, attempts, quotes, or executed outcomes.
181
- - Summarize the narrow fix and the regression test or rerun evidence.
182
- - If the final scenario should be reused, state where the preset or docs were added.
183
- - If official docs disproved part of the planned simulation design, state which spec or plan artifacts were narrowed and why.
184
-
185
- ## Example invocation
186
-
187
- `Use $production-sim-debug to run the repository's production local simulation for 5 minutes with the named preset, explain why liquidations did not land, and fix any local harness or runtime-alignment issues that make the simulation unrealistic.`
@@ -1,4 +0,0 @@
1
- interface:
2
- display_name: "Production Sim Debug"
3
- short_description: "Diagnose production/local simulation toolchain and outcome distortion"
4
- default_prompt: "Use $production-sim-debug to investigate a production/local simulation run, explain why outcomes diverged, and fix the harness or runtime alignment if needed."