@laitszkin/apollo-toolkit 2.13.2 → 2.13.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/AGENTS.md CHANGED
@@ -42,7 +42,6 @@ This repository enables users to install and run a curated set of reusable agent
42
42
  - Users can prepare and open open-source pull requests from existing changes.
43
43
  - Users can generate storyboard image sets from chapters, novels, articles, or scripts.
44
44
  - Users can configure OpenClaw from official documentation, including `~/.openclaw/openclaw.json`, skills loading, SecretRefs, CLI edits, and validation or repair workflows.
45
- - Users can investigate production or local simulation runs, calibrate reusable presets, and fix toolchain realism gaps between harness behavior and expected on-chain behavior.
46
45
  - Users can record multi-account spending and balance changes in monthly Excel ledgers with summary analytics and charts.
47
46
  - Users can recover missing or archived `docs/plans/...` spec sets from issue context, git history, and repository evidence before continuing feature work.
48
47
  - Users can review the current git change set from an unbiased reviewer perspective to find abstraction opportunities and simplification candidates.
package/CHANGELOG.md CHANGED
@@ -4,6 +4,19 @@ All notable changes to this repository are documented in this file.
4
4
 
5
5
  ## [Unreleased]
6
6
 
7
+ ## [v2.13.4] - 2026-04-05
8
+
9
+ ### Changed
10
+ - Update `learn-skill-from-conversations` so it must inventory the current repository's existing skills first, weigh repeated user corrections and error-driven lessons more heavily, extract duplicated workflow fragments into shared skills when warranted, wrap repeatedly customized external skills, and keep project-specific tooling patterns in the owning project's `~/.codex/skills/`.
11
+
12
+ ### Fixed
13
+ - Synchronize `package-lock.json` metadata with the current package version and CLI bin aliases before release publication.
14
+
15
+ ## [v2.13.3] - 2026-04-05
16
+
17
+ ### Removed
18
+ - Remove `production-sim-debug` skill as it is no longer actively maintained or needed.
19
+
7
20
  ## [v2.13.2] - 2026-04-05
8
21
 
9
22
  ### Changed
package/README.md CHANGED
@@ -35,7 +35,6 @@ A curated skill catalog for Codex, OpenClaw, Trae, and Claude Code with a manage
35
35
  - open-source-pr-workflow
36
36
  - openai-text-to-image-storyboard
37
37
  - openclaw-configuration
38
- - production-sim-debug
39
38
  - recover-missing-plan
40
39
  - record-spending
41
40
  - resolve-review-comments
@@ -10,7 +10,12 @@ This skill extracts the latest conversations from `~/.codex/sessions` and `~/.co
10
10
  - Stops immediately when there are no recent sessions
11
11
  - Cleans up `sessions` files older than 7 days after reading
12
12
  - Deletes `archived_sessions` files after reading them
13
+ - Reads existing skills in the current working repository before proposing new ones
14
+ - Prioritizes repeated user corrections, reported errors, tool failures, and reusable workflow gaps
15
+ - Encourages extracting duplicated workflow fragments into shared skills when several skills need the same pattern
16
+ - Wraps repeatedly customized external skills in a local skill when that produces a more reusable workflow
13
17
  - Defaults to creating a new skill unless strong overlap is confirmed
18
+ - Keeps project-specific tool workflows out of the shared catalog and places them in the relevant project's `~/.codex/skills/`
14
19
  - Validates each changed skill with `quick_validate.py`
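The 7-day retention rule in the list above can be sketched as a small cleanup routine. This is illustrative only, not toolkit code: the session directory layout and the `.jsonl` extension are assumptions.

```python
# Illustrative sketch of the "clean up sessions files older than 7 days" rule.
# The file layout under the sessions root and the .jsonl suffix are assumptions.
import time
from pathlib import Path

def prune_old_sessions(root: Path, max_age_days: int = 7) -> list[Path]:
    """Delete session files older than max_age_days and return what was removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in root.rglob("*.jsonl"):
        # Compare each file's modification time against the retention cutoff.
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```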
15
20
 
16
21
  ## Project Structure
@@ -40,7 +45,7 @@ python3 scripts/extract_recent_conversations.py --lookback-minutes 1440
40
45
  ```
41
46
 
42
47
  - If output is `NO_RECENT_CONVERSATIONS`, no action is required.
43
- - Otherwise, review extracted `[USER]` / `[ASSISTANT]` messages and apply updates through `skill-creator`.
48
+ - Otherwise, review extracted `[USER]` / `[ASSISTANT]` messages, compare the lessons against existing skills in the current repository, and apply updates through `skill-creator`.
44
49
 
45
50
  ## License
46
51
 
@@ -15,9 +15,9 @@ description: Learn and evolve the local skill library from recent Codex conversa
15
15
  ## Standards
16
16
 
17
17
  - Evidence: Extract recent Codex session history first and derive reusable lessons only from actual conversation patterns.
18
- - Execution: Inventory the current skill catalog before editing, prioritize repeated requests, user corrections, reported errors, and post-completion follow-up asks that reveal missing closure, then prefer a focused update to the strongest related skill or create a new skill only when the overlap is weak.
19
- - Quality: Take no action when there are no recent sessions, avoid unrelated broad refactors, and validate every changed skill.
20
- - Output: Report the analyzed sessions, extracted lessons, created or updated skills, and the reasoning behind each decision.
18
+ - Execution: Inventory the current working directory's existing skills before editing, prioritize repeated requests, user corrections, tool failures, logic bugs, architecture mismatches, documentation drift, and post-completion follow-up asks that reveal missing closure, then prefer a focused update to the strongest related skill or create a new skill only when the overlap is weak.
19
+ - Quality: Take no action when there are no recent sessions, avoid unrelated broad refactors, keep shared skills cross-project reusable, route project-specific tooling patterns into the relevant project's `~/.codex/skills/`, and validate every changed skill.
20
+ - Output: Report the analyzed sessions, extracted lessons, created or updated skills, shared-vs-project-specific placement decisions, and the reasoning behind each decision.
21
21
 
22
22
  ## Overview
23
23
 
@@ -46,10 +46,15 @@ python3 ~/.codex/skills/learn-skill-from-conversations/scripts/extract_recent_co
46
46
  ### 2) Derive reusable lessons
47
47
 
48
48
  - Identify repeated user needs, recurring friction, and repeated manual workflows.
49
+ - Focus especially on repeated needs, repeated user corrections, and user-reported errors, then ask how a skill can prevent the same failure mode from recurring.
49
50
  - Give extra weight to moments where the user corrected the agent, rejected an earlier interpretation, or pointed out a missing preference or requirement.
50
51
  - Give extra weight to user-reported errors, regressions, or avoidable mistakes, then ask how a skill can prevent repeating that failure mode.
52
+ - Treat tool-call failures, broken code paths, logic mistakes, weak architecture choices, and outputs that drifted from official documentation as valuable evidence when they expose a missing reusable guardrail or workflow.
51
53
  - Treat a user follow-up that asks for cleanup or an omitted finalization step immediately after the assistant reported completion as evidence that the workflow's completion criteria were incomplete.
52
54
  - When that kind of follow-up recurs, tighten the owning skill's completion checklist before considering any new-skill extraction.
55
+ - Even when a user request was highly specific, check whether the underlying workflow can be generalized into a reusable skill for the same class of tasks.
56
+ - When multiple existing skills use a near-identical workflow fragment, consider extracting that fragment into a dedicated shared skill instead of leaving the duplication in place.
57
+ - When an external skill is repeatedly used with the same user-specific customization layer, prefer wrapping it in a new local skill that encodes those standing conventions.
53
58
  - Ignore one-off issues that do not provide reusable value.
54
59
  - Distinguish between:
55
60
  - repeated trigger intent that deserves a new skill
@@ -58,7 +63,7 @@ python3 ~/.codex/skills/learn-skill-from-conversations/scripts/extract_recent_co
58
63
 
59
64
  ### 3) Decide new skill vs existing skill (default: new skill)
60
65
 
61
- - First read the relevant skills already present in the working repository so you do not create a duplicate under a different name.
66
+ - First read the relevant skills already present in the repository in the current working directory (for example `apollo-toolkit`) so you do not create a duplicate under a different name.
62
67
  - Prefer creating a new skill.
63
68
  - Edit an existing skill only when the lesson is strongly related.
64
69
  - Treat relation as strong only when all three conditions hold:
@@ -67,12 +72,17 @@ python3 ~/.codex/skills/learn-skill-from-conversations/scripts/extract_recent_co
67
72
  - The update does not dilute the existing skill's scope.
68
73
  - When the recurring lesson is mainly about preventing a known mistake, prefer updating the existing skill that should have prevented it instead of creating a parallel skill.
69
74
  - When several skills repeat the same narrow workflow fragment, prefer extracting that fragment into a dedicated shared skill instead of copying the same guidance again.
75
+ - When the strongest candidate is an external skill but the user repeatedly adds the same customization or policy layer, create a wrapper skill that calls that external skill while encoding the recurring local conventions.
76
+ - Decide whether the lesson is cross-project reusable before choosing the destination:
77
+ - Cross-project reusable skills belong in the shared skill library.
78
+ - Project-specific workflows, especially ones tied to custom tools from a single repository, belong in that project's `~/.codex/skills/` directory instead of the shared catalog.
70
79
  - If uncertain, create a new skill instead of expanding an old one.
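The decision rules above can be encoded compactly. This is a hypothetical sketch, not toolkit code: it captures the three-condition strong-overlap test, the "default to new skill" fallback, and the shared-vs-project placement choice; the return labels are invented for illustration.

```python
# Hypothetical encoding of the decision rules in this section.

def choose_action(same_trigger: bool, same_workflow: bool, keeps_scope: bool) -> str:
    """Edit an existing skill only when all three overlap conditions hold."""
    if same_trigger and same_workflow and keeps_scope:
        return "update-existing-skill"
    # Default, and the answer whenever uncertain: create a new skill.
    return "create-new-skill"

def choose_destination(cross_project_reusable: bool) -> str:
    """Cross-project lessons go to the shared library; others stay project-local."""
    return "shared-skill-library" if cross_project_reusable else "project ~/.codex/skills/"

print(choose_action(True, True, False))  # create-new-skill
print(choose_destination(False))         # project ~/.codex/skills/
```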
71
80
 
72
81
  ### 4) Apply changes through skill-creator
73
82
 
74
83
  - Explicitly follow `$skill-creator` workflow before editing skills.
75
- - For new skills, initialize with `~/.codex/skills/.system/skill-creator/scripts/init_skill.py`, then complete `SKILL.md` and required resources.
84
+ - For new shared skills, initialize with `~/.codex/skills/.system/skill-creator/scripts/init_skill.py`, then complete `SKILL.md` and required resources.
85
+ - For new project-specific skills, create or update them under the relevant project's `~/.codex/skills/` directory instead of the shared catalog.
76
86
  - For existing skills, make minimal focused edits and keep behavior consistent.
77
87
 
78
88
  ### 5) Validate every changed skill
@@ -87,10 +97,11 @@ python3 ~/.codex/skills/.system/skill-creator/scripts/quick_validate.py <skill-p
87
97
 
88
98
  ### 6) Report result
89
99
 
90
- - Summarize analyzed sessions, repeated needs, user corrections, error-driven lessons, created/updated skills, and why each decision was made.
100
+ - Summarize analyzed sessions, repeated needs, user corrections, error-driven lessons, created/updated skills, placement decisions, and why each decision was made.
91
101
 
92
102
  ## Guardrails
93
103
 
94
104
  - Take no action when there are no sessions in the last 24 hours.
95
105
  - Avoid broad refactors across unrelated skills.
96
106
  - Avoid duplicate skills when an existing skill is strongly related.
107
+ - Do not promote project-specific tool usage into the shared catalog unless the workflow is clearly reusable across repositories.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@laitszkin/apollo-toolkit",
3
- "version": "2.13.2",
3
+ "version": "2.13.4",
4
4
  "description": "Apollo Toolkit npm installer for managed skill copying across Codex, OpenClaw, and Trae.",
5
5
  "license": "MIT",
6
6
  "author": "LaiTszKin",
@@ -1,21 +0,0 @@
1
- MIT License
2
-
3
- Copyright (c) 2026 LaiTszKin
4
-
5
- Permission is hereby granted, free of charge, to any person obtaining a copy
6
- of this software and associated documentation files (the "Software"), to deal
7
- in the Software without restriction, including without limitation the rights
8
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
- copies of the Software, and to permit persons to whom the Software is
10
- furnished to do so, subject to the following conditions:
11
-
12
- The above copyright notice and this permission notice shall be included in all
13
- copies or substantial portions of the Software.
14
-
15
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
- SOFTWARE.
@@ -1,91 +0,0 @@
1
- # Production Sim Debug
2
-
3
- An agent skill for investigating production or local simulation runs when the observed behavior diverges from the intended market scenario or expected liquidation/remediation outcomes.
4
-
5
- This skill helps agents reproduce a bounded simulation run, inspect the real preset and runtime artifacts, separate product bugs from local harness drift, and apply the smallest realistic fix before rerunning the same scenario.
6
-
7
- ## What this skill provides
8
-
9
- - A workflow for bounded production/local simulation diagnosis.
10
- - A decision tree for separating runtime logic bugs from harness, stub, preset, and persistence issues.
11
- - A repeatable way to audit the active run directory, logs, and event database before drawing conclusions.
12
- - Guidance for turning recurring ad hoc scenarios into named presets and documented test cases.
13
- - Emphasis on rerunning the same scenario after a fix instead of relying only on unit tests.
14
-
15
- ## Repository structure
16
-
17
- - `SKILL.md`: Main skill definition, workflow, and output contract.
18
- - `agents/openai.yaml`: Agent interface metadata and default prompt.
19
-
20
- ## Installation
21
-
22
- 1. Clone this repository.
23
- 2. Copy this folder to your Codex skills directory:
24
-
25
- ```bash
26
- mkdir -p "$CODEX_HOME/skills"
27
- cp -R production-sim-debug "$CODEX_HOME/skills/production-sim-debug"
28
- ```
29
-
30
- ## Usage
31
-
32
- Invoke the skill in your prompt:
33
-
34
- ```text
35
- Use $production-sim-debug to run this repository's production local simulation with the named preset for 5 minutes, explain why remediations or liquidations did not land, and fix any harness or runtime-alignment issues you confirm.
36
- ```
37
-
38
- Best results come from including:
39
-
40
- - workspace path
41
- - canonical simulation entrypoint
42
- - preset or scenario name
43
- - run duration
44
- - expected market shape or success criteria
45
- - the run directory to inspect, if it already exists
46
- - whether toolchain fixes are in scope or the task is read-only
47
-
48
- If the repository already has a named preset system, prefer using it instead of describing the scenario only in prose.
49
-
50
- ## Example
51
-
52
- ### Input prompt
53
-
54
- ```text
55
- Use $production-sim-debug for this repository.
56
-
57
- Workspace: /workspace/pangu
58
- Entrypoint: ./scripts/run-production-local-sim.sh stress-test-1
59
- Duration: 5 minutes
60
- Expectations:
61
- - Jupiter free tier
62
- - mostly oracle-blocked positions that can be unlocked by remediation
63
- - some directly executable opportunities
64
- - evidence-backed explanation for why liquidations did or did not land
65
- ```
66
-
67
- ### Expected response shape
68
-
69
- ```text
70
- 1) Scenario contract
71
- - Named preset, duration, and run directory used.
72
-
73
- 2) Observed outcomes
74
- - Event-table counts, dominant skip reasons, and runtime stage reached.
75
-
76
- 3) Root cause
77
- - Whether the main blocker was product logic, quote budget, preset design, or harness/stub drift.
78
-
79
- 4) Fixes applied
80
- - Toolchain or runtime fixes with file paths.
81
-
82
- 5) Validation
83
- - Rerun or targeted tests proving the intended stage now executes.
84
-
85
- 6) Remaining gaps
86
- - Any realism differences still left between local simulation and chain behavior.
87
- ```
88
-
89
- ## License
90
-
91
- MIT. See `LICENSE`.
@@ -1,187 +0,0 @@
1
- ---
2
- name: production-sim-debug
3
- description: Investigate production or local simulation runs for runtime-toolchain drift, harness bugs, preset mistakes, unrealistic local stubs, or mismatches between expected and observed liquidation outcomes. Use when users ask to run bounded production simulations, explain why simulated liquidations or remediations did not happen, calibrate presets, or fix local simulation tooling so it better matches real on-chain behavior.
4
- ---
5
-
6
- # Production Sim Debug
7
-
8
- ## Dependencies
9
-
10
- - Required: `systematic-debug` for evidence-first root-cause analysis when a simulation shows failing or missing expected behavior.
11
- - Conditional: `scheduled-runtime-health-check` when the user wants a bounded production/local simulation run executed and observed; `read-github-issue` when the requested simulation work is driven by a remote issue; `marginfi-development` when liquidation, health, receivership, or instruction-order conclusions depend on official marginfi docs/source; `jupiter-development` when swap, quote, routing, or rate-limit conclusions depend on official Jupiter docs; `open-github-issue` when confirmed toolchain gaps should be published.
12
- - Optional: none.
13
- - Fallback: If the relevant simulation entrypoint, preset, logs, or run artifacts cannot be found, stop and report the missing evidence instead of inferring behavior from stale docs or memory.
14
-
15
- ## Standards
16
-
17
- - Evidence: Base conclusions on the actual preset, runtime command, logs, SQLite event store, local stub responses, the code paths that generated them, and official protocol or validator documentation whenever feasibility or instruction legality is in question.
18
- - Execution: Reproduce with the exact scenario first, verify the bounded-run contract against the actual script/env implementation before launch, separate product logic failures from simulation-toolchain failures, verify protocol-sensitive claims against official docs or upstream source before changing code or specs, make the smallest realistic toolchain fix, and rerun the same bounded scenario to validate.
19
- - Quality: Prefer harness or stub fixes that improve realism over one-off scenario hacks, avoid duplicating existing workflow skills, and record reusable presets when a scenario becomes part of the regular test suite.
20
- - Output: Return the scenario contract, observed outcomes, root-cause chain, fixes applied, validation evidence, and any remaining realism gaps.
21
-
22
- ## Goal
23
-
24
- Use this skill to debug simulation workflows where the repository exposes a production-like local run path, but the observed outcomes are distorted by presets, harness logic, local stubs, event persistence, or runtime scheduling constraints.
25
-
26
- ## Workflow
27
-
28
- ### 1) Lock the simulation contract before touching code
29
-
30
- - Identify the exact entrypoint, preset, duration, runtime mode, and rate-limit tier the user expects.
31
- - Read the preset or scenario definition from the repository before assuming what the test means.
32
- - Capture the intended success criteria explicitly, such as:
33
- - successful liquidation count
34
- - remediation count
35
- - oracle-block registration
36
- - profit ranking behavior
37
- - quote budget behavior
38
- - If the scenario is ad hoc but likely to recur, prefer turning it into a named preset instead of leaving it as an undocumented shell invocation.
39
-
40
- ### 2) Reproduce with the real bounded run path
41
-
42
- - Use the same production/local simulation script the repository already treats as canonical.
43
- - Prefer a bounded run window with a stable run name and output directory.
44
- - As soon as the harness prints the active run name or output directory, record it and treat that path as the canonical artifact root for the rest of the investigation.
45
- - Before launch, read the script or wrapper that enforces the run duration and confirm the real control surface, such as the exact env var name, CLI flag, shutdown helper, and artifact path conventions.
46
- - Do not assume a generic `RUNTIME_SECS`-style variable is wired correctly; verify the actual variable names and stop path from code or scripts first.
47
- - Save and inspect the exact artifacts produced by that run:
48
- - main runtime log
49
- - actor or stub logs
50
- - generated env/config files
51
- - SQLite or other persistence outputs
52
- - scenario manifest or preset-resolved settings
53
- - Do not trust older run directories when the user asks about a new execution.
54
- - If the run exceeds the agreed bounded window, stop it promptly, preserve the partial artifacts, and treat the overrun itself as a toolchain bug or contract mismatch to diagnose.
55
-
56
- ### 3) Audit the artifact chain before diagnosing product logic
57
-
58
- - Confirm that you are reading the correct database and log files for the active run.
59
- - Verify that the event tables you expect are actually the ones written by the runtime.
60
- - When the run appears "too clean" or fully zeroed, inspect startup selection counters first, such as candidate pool size, listener/tracked-position counts, or the repository's equivalent admission signals, before concluding there were simply no opportunities.
61
- - Check whether missing results come from:
62
- - no candidate selection
63
- - no worker completion
64
- - planner failure
65
- - event persistence mismatch
66
- - reading the wrong file
67
- - Treat this artifact audit as mandatory; repeated failures in the recent chats came from toolchain alignment errors before they came from liquidation logic.
68
-
69
- ### 4) Separate product failures from toolchain realism failures
70
-
71
- - When the suspected blocker touches protocol rules, instruction legality, quote semantics, or liquidation invariants, verify the claim against the relevant official docs or upstream source before assigning blame.
72
- - When the current spec or planned fix assumes a local-simulation capability, verify that the capability is actually supported by the validator and program ownership model before implementing it.
73
- - For every major blocker, explicitly classify the result as one of:
74
- - production bot problem
75
- - simulation environment problem
76
- - both
77
- - Treat "both" as a first-class result when a bot bug and a local-harness realism gap are stacked in the same flow.
78
-
79
- - Classify each blocker into one of these buckets:
80
- - preset design mismatch
81
- - runtime scheduling or budget behavior
82
- - stub or mock response unrealism
83
- - local validator or cloned-state setup drift
84
- - account ordering / remaining-account mismatch
85
- - event-generation or persistence bug
86
- - genuine product logic bug
87
- - If the symptom is caused by the local harness, fix the harness instead of masking it in runtime logic.
88
- - If a local stub inflates or distorts profitability, preserve the runtime behavior and calibrate the stub.
89
- - If a scenario intentionally stresses one dimension, make sure the harness is not accidentally stressing unrelated dimensions.
90
-
91
- ### 4.3) Collapse infeasible simulation designs quickly
92
-
93
- - If official docs or upstream source prove that the proposed local-simulation design is impossible under the current architecture, stop trying to force the implementation through.
94
- - Treat this as a first-class debugging outcome, not as an implementation blocker to hand-wave away.
95
- - Name the precise external constraint, such as:
96
- - validator preload behavior only applying at genesis/startup
97
- - account data mutability being restricted to the owner program
98
- - protocol instruction allowlists rejecting the proposed transaction shape
99
- - When a live spec or plan still claims that infeasible design as in scope, update the spec artifacts immediately so they only describe the remaining feasible scope.
100
- - Prefer narrowing the scenario to the strongest still-valid readiness or realism checks rather than leaving impossible tasks marked as pending.
101
-
102
- ### 4.1) Map the observed failure to the real pipeline stage
103
-
104
- - Do not treat every `liquidation_event` row as evidence that the run reached verification or execution.
105
- - Reconstruct the stage explicitly, such as:
106
- - candidate discovery
107
- - local estimate
108
- - solver candidate quote
109
- - verification or pre-submit re-quote
110
- - simulation
111
- - execution
112
- - When logs or event rows expose `stage`, `bucket`, `reason`, or similar structured fields, use them to explain exactly where the attempt stopped.
113
- - When the user is confused by counts, distinguish:
114
- - unique positions
115
- - candidate attempts
116
- - quote attempts
117
- - verification attempts
118
- - executed liquidations
119
-
120
- ### 4.2) Audit quote-budget behavior before calling the strategy broken
121
-
122
- - Check whether a high quote count reflects many unique positions or repeated coarse/refinement exploration on the same few positions.
123
- - Trace how the runtime reserves verification capacity versus non-verification capacity, and explain which bucket was exhausted.
124
- - If the strategy relies on local oracle estimates before quoting, verify whether the admission threshold is merely "positive estimate" or something stricter before assuming those candidates were all strong.
125
- - When quote pressure appears unreasonable, tie the explanation back to the actual solver-step count, coarse/refinement selection logic, and the number of cross-mint candidates in the run.
126
-
127
- ### 5) Trace the full decision tree for missed liquidations or remediations
128
-
129
- - Follow the candidate from discovery through:
130
- - local profitability estimate
131
- - health precheck
132
- - oracle-block classification
133
- - remediation registration and rearm
134
- - quote admission
135
- - quote calibration
136
- - pre-submit verification
137
- - final execution or skip reason
138
- - When the runtime reports a generic or overloaded failure label, reopen the logs and derive a finer-grained breakdown before proposing fixes.
139
- - Distinguish fail-closed behavior from broken behavior; not all skipped liquidations are bugs.
140
-
141
- ### 6) Fix the narrowest realistic cause
142
-
143
- - Prefer minimal fixes that improve realism or observability at the root cause:
144
- - add preset support to shell tooling instead of hardcoding another branch
145
- - make oracle-blocked paths avoid external quote I/O when a local estimate is sufficient
146
- - make stubs preserve run-specific metadata instead of falling back to unrealistic defaults
147
- - keep continuous oracle updates realistic without breaking the runtime's own core feeds
148
- - Add or update regression tests when the bug is in the harness, runtime decision tree, or event persistence path.
149
- - If the scenario becomes a durable benchmark, add or update the named preset and the developer docs in the same change.
150
-
151
- ### 7) Re-run the same scenario and compare outcomes
152
-
153
- - After the fix, rerun the same scenario or the shortest bounded version that still exercises the bug.
154
- - Compare:
155
- - event-table counts before and after
156
- - dominant skip reasons before and after
157
- - whether the runtime reaches the intended decision stage
158
- - whether the harness still resembles the user’s requested market conditions
159
- - Do not claim success based only on unit tests when the original issue was a simulation-toolchain integration problem.
160
-
161
- ## Common failure patterns
162
-
163
- - **Bounded-run contract drift**: the analyst assumes the wrong duration env var, CLI flag, or shutdown path, so the run exceeds the promised window and the captured evidence no longer matches the requested contract.
164
- - **Wrong artifact source**: the analyst inspects an older SQLite file or the wrong event database and concludes that runtime behavior is missing.
165
- - **Preset says one thing, harness does another**: scenario names sound right, but the actual matrix or oracle mode does not match the user’s intent.
166
- - **Stub realism drift**: local quote, swap, or oracle stubs distort pricing, accounts, or program IDs enough to create false failures or false profits.
167
- - **Overloaded “unknown” failures**: logs contain structured reasons, but the first-pass analysis never decomposes them.
168
- - **Continuous-mode self-sabotage**: a stress regime intended to stale pull oracles instead makes the runtime’s own primary feeds unusable.
169
- - **Quote budget starvation**: local filtering improves behavior but still lets low-value cross-mint candidates consume scarce quote capacity before higher-value paths can finish.
170
- - **Blame assigned too early**: the first visible error gets labeled as either bot or tooling before official docs, upstream source, and run artifacts confirm that attribution.
171
- - **Phase confusion**: event counts are interpreted as verification or execution attempts even though the run stopped much earlier in candidate quote or pre-submit simulation.
172
- - **Quote-count misread**: a large total quote count is mistaken for many distinct opportunities when the runtime actually spent repeated exploration quotes on a smaller set of positions.
173
-
174
- ## Output checklist
175
-
176
- - Name the exact scenario, preset, duration, and run directory.
177
- - State whether the root cause was product logic, toolchain realism, or both.
178
- - For protocol-sensitive issues, name the official docs or upstream source used to justify that attribution.
179
- - Cite the artifact types used: preset, logs, SQLite tables, and code paths.
180
- - Explain the failing stage in the liquidation pipeline and whether the key counts represent positions, attempts, quotes, or executed outcomes.
181
- - Summarize the narrow fix and the regression test or rerun evidence.
182
- - If the final scenario should be reused, state where the preset or docs were added.
183
- - If official docs disproved part of the planned simulation design, state which spec or plan artifacts were narrowed and why.
184
-
185
- ## Example invocation
186
-
187
- `Use $production-sim-debug to run the repository's production local simulation for 5 minutes with the named preset, explain why liquidations did not land, and fix any local harness or runtime-alignment issues that make the simulation unrealistic.`
@@ -1,4 +0,0 @@
1
- interface:
2
- display_name: "Production Sim Debug"
3
- short_description: "Diagnose production/local simulation toolchain and outcome distortion"
4
- default_prompt: "Use $production-sim-debug to investigate a production/local simulation run, explain why outcomes diverged, and fix the harness or runtime alignment if needed."