npm - @laitszkin/apollo-toolkit - Versions diffs - 2.14.21 → 2.14.23 - Mend

@laitszkin/apollo-toolkit 2.14.21 → 2.14.23


Files changed (5)

package/CHANGELOG.md +12 -0
package/improve-observability/SKILL.md +12 -1
package/package.json +1 -1
package/scheduled-runtime-health-check/SKILL.md +4 -1
package/systematic-debug/SKILL.md +11 -5

package/CHANGELOG.md

@@ -7,6 +7,18 @@ All notable changes to this repository are documented in this file.
 ### Changed
 - None yet.
+## [v2.14.23] - 2026-04-18
+### Changed
+- Strengthen `scheduled-runtime-health-check` so bounded runtime investigations must explicitly choose and report the highest-fidelity execution mode that matches the user's claim, instead of silently substituting a lower-fidelity harness for production-like behavior.
+- Strengthen `systematic-debug` so runtime bug investigations must reproduce failures in the same runtime mode as the observed claim, and treat scenario or harness reruns as lower-fidelity evidence unless that limitation is made explicit.
+- Strengthen `improve-observability` so aggregate success counters must stay reconcilable with per-entity detail records across harness and production paths, treating missing detail rows as an observability bug.
+## [v2.14.22] - 2026-04-17
+### Changed
+- Strengthen `systematic-debug` so failing-test investigations must classify each symptom as stale test contract, test-harness interference, or real product bug, and must treat isolated-only passes as evidence to inspect shared-state and parallel-test interference before changing product code.
 ## [v2.14.21] - 2026-04-16
 ### Changed

package/improve-observability/SKILL.md

@@ -15,7 +15,7 @@ description: Add focused observability to an existing system so opaque workflows
 ## Standards
 - Evidence: Read the real execution path and current telemetry before deciding where visibility actually disappears.
-- Execution: Add the smallest useful instrumentation around decision points, scope contracts, outcomes, and failure reasons.
+- Execution: Add the smallest useful instrumentation around decision points, scope contracts, outcomes, failure reasons, and any cross-path lifecycle gaps between summary counters and detailed outcome records.
 - Quality: Keep changes behavior-neutral, use structured high-signal telemetry, avoid secrets, and lock the signals with tests.
 - Output: Report which stages are now observable, which fields or metrics to inspect, and which tests validate the instrumentation.
@@ -41,6 +41,7 @@ Do not use this skill for generic bug fixing when the main request is behavior c
 - Read the relevant entrypoints, orchestration layers, and current telemetry before editing.
 - Identify the exact stages where information disappears: validation, branching, external calls, persistence, retries, settlement, cleanup, or error handling.
+- When the same business event can flow through multiple execution paths such as harness, replay, batch worker, or production runtime, compare those paths explicitly and find where their observability contract diverges.
 - Reuse the project's existing logger, tracing library, metric naming style, and error taxonomy.
 ### 2. Choose the smallest useful signals
@@ -53,6 +54,7 @@ Add instrumentation only where it helps answer a concrete debugging question. Pr
 - explicit logs for skipped paths and early returns
 - metrics or counters for outcome classes when aggregates matter
 - trace spans only when the project already uses tracing or timing data is necessary
+- paired detail records or structured child events when an aggregate success counter would otherwise hide which entities actually completed downstream follow-up work
 Avoid logging secrets, full payload dumps, or highly volatile text that breaks searchability.
@@ -69,6 +71,15 @@ For each critical stage, make these states observable when relevant:
 If a failure is already logged, improve its context instead of duplicating another generic error line.
+### 3.2 Keep aggregate and detail telemetry in lockstep
+When a system reports aggregate counts such as `success_count`, `processed_count`, or `remediation_success_count`, ensure operators can reconcile those counts back to detailed records.
+- emit or persist one detail record per counted entity when feasible
+- carry the same identifiers and outcome stage across both aggregate and detailed telemetry
+- treat "aggregate says success but detail table is empty" as an observability bug, not as an acceptable reporting gap
+- if multiple runtime modes claim the same business event, keep the critical observability fields aligned across those modes unless the output contract intentionally differs
 ### 3.1 Preserve cross-stage scope contracts
 When a workflow derives scope in one stage and consumes it later, make that contract observable end-to-end.
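
Illustration (not part of the package): the reconciliation rule added in section 3.2 above can be sketched as a small check that compares aggregate counters with per-entity detail rows. The function name `reconcile`, the `outcome` and `entity_id` fields, and the `<outcome>_count` naming scheme are assumptions made for this sketch; the skill itself does not prescribe a schema.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping


def reconcile(
    aggregate: Mapping[str, int],
    details: Iterable[Mapping[str, str]],
) -> dict[str, tuple[int, int]]:
    """Return {counter_name: (aggregate_value, detail_row_count)} for every mismatch.

    Assumption: each detail row carries an "outcome" field, and the matching
    aggregate counter is named "<outcome>_count" (for example "success_count").
    """
    # Count detail rows per outcome, keyed the same way as the aggregate counters.
    per_outcome = Counter(f"{row['outcome']}_count" for row in details)

    mismatches: dict[str, tuple[int, int]] = {}
    # Check counters that exist on either side so neither can hide a gap.
    for name in sorted(set(aggregate) | set(per_outcome)):
        agg = aggregate.get(name, 0)
        det = per_outcome.get(name, 0)
        if agg != det:
            mismatches[name] = (agg, det)
    return mismatches


if __name__ == "__main__":
    # Hypothetical data: the aggregate claims two successes, only one detail row exists.
    rows = [{"entity_id": "order-1", "outcome": "success"}]
    print(reconcile({"success_count": 2}, rows))
```

An empty result means every counted entity has a matching detail row. A non-empty result is the "aggregate says success but detail table is empty" situation that the skill now treats as an observability bug, and a test could assert on it for both the harness path and the production path.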

package/package.json

@@ -1,6 +1,6 @@
 {
   "name": "@laitszkin/apollo-toolkit",
-  "version": "2.14.21",
+  "version": "2.14.23",
   "description": "Apollo Toolkit npm installer for managed skill copying across Codex, OpenClaw, and Trae.",
   "license": "MIT",
   "author": "LaiTszKin",

package/scheduled-runtime-health-check/SKILL.md

@@ -15,7 +15,7 @@ description: Use a background terminal to run a user-specified command immediate
 ## Standards
 - Evidence: Anchor every conclusion to the requested command, execution window, startup/shutdown timestamps, one canonical run folder or artifact root, captured logs, and concrete runtime signals.
-- Execution: Collect the run contract, verify the real stop mechanism before launch, use a background terminal, optionally update the code only when the user asks, execute the requested command immediately or in the requested window, record the canonical run folder once the process materializes it, capture logs, stop cleanly when bounded, then delegate log review to `analyse-app-logs` only when findings are requested or needed.
+- Execution: Collect the run contract, verify the real stop mechanism before launch, choose the highest-fidelity execution mode that matches the user's intent, use a background terminal, optionally update the code only when the user asks, execute the requested command immediately or in the requested window, record the canonical run folder once the process materializes it, capture logs, stop cleanly when bounded, then delegate log review to `analyse-app-logs` only when findings are requested or needed.
 - Quality: Keep scheduling, execution, and shutdown deterministic; separate confirmed findings from hypotheses; and mark each assessed module healthy/degraded/failed/unknown with reasons.
 - Output: Return the run configuration, execution status, log locations, optional code-update result, optional module health by area, confirmed issues, potential issues, observability gaps, and scheduler status when applicable.
@@ -46,6 +46,7 @@ This skill is an orchestration layer. It owns the background terminal session, o
 - Prefer one bounded observation window over open-ended monitoring.
 - Use one dedicated background terminal session per requested run so execution and logs stay correlated.
 - Record the canonical run directory, artifact root, or other generated output location as soon as it exists, and use that as the source of truth for later analysis.
+- When a repository exposes both synthetic harnesses and production-like runtime entrypoints, prefer the production-like path for claims about real runtime, market, or operator behavior; use the lower-fidelity harness only when the user explicitly asked for it or when it is the only safe reproduction surface.
 - Treat code update as optional and only perform it when the user explicitly requests it.
 - Treat startup, steady-state, and shutdown as part of the same investigation.
 - Do not call a module healthy unless there is at least one positive signal for it.
@@ -58,6 +59,7 @@ This skill is an orchestration layer. It owns the background terminal session, o
 1. Define the run contract
    - Confirm or derive the workspace, execution command, optional code-update step, optional schedule, optional duration, readiness signal, log locations, and whether post-run findings are required.
    - Derive commands from trustworthy sources first: `package.json`, `Makefile`, `docker-compose.yml`, `Procfile`, scripts, or project docs.
+   - If multiple commands exist for the same workflow, rank them by fidelity and state explicitly which mode you are choosing: production-like runtime, bounded integration harness, or synthetic scenario replay.
    - If no trustworthy execution command or stop method can be found, stop and ask only for the missing command rather than guessing.
 2. Prepare the background terminal run
    - Use a dedicated background terminal session for the whole workflow.
@@ -77,6 +79,7 @@ This skill is an orchestration layer. It owns the background terminal session, o
 5. Run and capture readiness
    - Execute the requested command in the same background terminal.
    - As soon as the command emits or creates its canonical run directory, artifact root, or equivalent output location, record that path and reuse it for every later check.
+   - Report the exact runtime mode used in the evidence record so later analysis does not accidentally treat synthetic-harness results as proof about production behavior.
    - Wait for a concrete readiness signal when the command is expected to stay up, such as a health endpoint, listening-port log, worker boot line, or queue-consumer ready message.
    - If readiness never arrives, stop the run, preserve logs, and treat it as a failed startup window.
 6. Observe and stop when bounded
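
Illustration (not part of the package): the fidelity ranking added to step 1 and the mode reporting added to step 5 could look roughly like the sketch below. The three mode names come from the diff; the `Candidate` type, the `safe` flag, the `choose_command` and `evidence_record` helpers, and the example commands are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Sequence


class Fidelity(IntEnum):
    """Execution modes named in the skill, ordered from lowest to highest fidelity."""

    SYNTHETIC_SCENARIO_REPLAY = 1
    BOUNDED_INTEGRATION_HARNESS = 2
    PRODUCTION_LIKE_RUNTIME = 3


@dataclass(frozen=True)
class Candidate:
    command: str          # hypothetical command discovered in package.json, Makefile, etc.
    fidelity: Fidelity
    safe: bool = True     # False when the command is not a safe reproduction surface


def choose_command(
    candidates: Sequence[Candidate],
    requested: Optional[Fidelity] = None,
) -> Candidate:
    """Pick the highest-fidelity safe command, or the mode the user explicitly asked for."""
    usable = [c for c in candidates if c.safe]
    if requested is not None:
        usable = [c for c in usable if c.fidelity == requested]
    if not usable:
        # Mirrors the skill's rule: stop and ask instead of guessing.
        raise LookupError("no trustworthy command for this run; ask the user")
    return max(usable, key=lambda c: c.fidelity)


def evidence_record(chosen: Candidate) -> dict[str, str]:
    """State the runtime mode explicitly so later analysis cannot misread it."""
    return {"command": chosen.command, "runtime_mode": chosen.fidelity.name}


if __name__ == "__main__":
    found = [
        Candidate("npm run scenario:replay", Fidelity.SYNTHETIC_SCENARIO_REPLAY),
        Candidate("npm run start", Fidelity.PRODUCTION_LIKE_RUNTIME),
    ]
    print(evidence_record(choose_command(found)))
```

Storing the mode next to the command keeps a synthetic-harness run from being quoted later as proof about production behavior, which is the failure the changelog entry describes.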

package/systematic-debug/SKILL.md

@@ -15,9 +15,9 @@ description: "Systematic debugging workflow for program issues: understand obser
 ## Standards
 - Evidence: Gather expected versus observed behavior from code and runtime facts before deciding on a cause, and when the issue involves a runtime pipeline or bounded run, anchor the investigation to one canonical artifact root or run directory instead of mixed terminal snippets from multiple runs.
-- Execution: Inspect the relevant paths, reproduce every plausible cause with tests or bounded reruns, map each observed failure to a concrete pipeline stage, distinguish toolchain/platform faults from application-logic faults, then apply the minimal fix.
-- Quality: Keep scope focused on the bug, prefer existing test patterns, and explicitly rule out hypotheses that could not be reproduced.
-- Output: Deliver the plausible-cause list, the canonical evidence source, reproduction tests or reruns, validated fix summary, and passing-test confirmation.
+- Execution: Inspect the relevant paths, reproduce every plausible cause with tests or bounded reruns, choose a reproduction mode whose fidelity matches the user's claim, map each observed failure to a concrete pipeline stage, distinguish toolchain/platform faults from application-logic faults, classify failing tests as stale test contract vs test-harness interference vs real product bug, then apply the minimal fix at the true owner.
+- Quality: Keep scope focused on the bug, prefer existing test patterns, explicitly rule out hypotheses that could not be reproduced, and when failures disappear in isolated reruns treat shared-state or parallel-test interference as a first-class hypothesis instead of silently dismissing the original failure.
+- Output: Deliver the plausible-cause list, the canonical evidence source, reproduction tests or reruns, the final failure classification for each investigated symptom, validated fix summary, and passing-test confirmation.
 ## Core Principles
@@ -25,7 +25,10 @@ description: "Systematic debugging workflow for program issues: understand obser
 - Cover all plausible causes with reproducible tests instead of guessing a single cause.
 - Keep fixes minimal, focused, and validated by passing tests.
 - When logs or runtime artifacts exist, treat one run as canonical and compare every conclusion against that same run's generated artifacts, not against ad hoc console recollection.
+- When a repository has both scenario or harness runs and a production-like runtime, do not treat the lower-fidelity mode as proof about the higher-fidelity mode unless you explicitly state that limitation and the user agrees.
 - When the failing flow crosses multiple layers, identify the last confirmed successful stage before assigning blame.
+- When tests fail, separate stale assertions and fixture drift from real implementation regressions before changing product code.
+- If failures only appear under parallel execution or shared shell-out paths, investigate test isolation, shared locks, temp directories, run-name collisions, and environment leakage before blaming the product.
 ## Trigger Conditions
@@ -43,8 +46,8 @@ Also auto-invoke this skill when mismatch evidence appears during normal executi
 1. **Understand and inspect**: Parse expected vs observed behavior, explore relevant code paths, record the canonical failing run or artifact root when runtime output is involved, and build a list of plausible root causes.
 2. **Map the failure boundary**: Break the flow into concrete stages such as setup, startup, readiness, steady-state execution, persistence, and shutdown, then identify the last stage that is confirmed to have succeeded.
-3. **Reproduce with tests or bounded reruns**: Write or extend tests that reproduce every plausible cause, and when the bug depends on runtime orchestration rerun the same bounded command or scenario instead of switching contexts mid-investigation.
-4. **Diagnose and confirm**: Use reproduction evidence to confirm the true root cause, explicitly rule out non-causes, and classify whether the fault belongs to the toolchain/platform layer, the orchestration layer, or application logic.
+3. **Reproduce with tests or bounded reruns**: Write or extend tests that reproduce every plausible cause, and when the bug depends on runtime orchestration rerun the same bounded command or the same runtime mode instead of switching contexts mid-investigation. If the user is asking about real runtime or market behavior, prefer the production-like bounded run over a synthetic scenario replay unless safety or tooling constraints make that impossible. When a failing test passes in isolation, rerun it under the original suite shape to determine whether the real cause is stale expectations, fixture drift, or shared-state interference.
+4. **Diagnose and confirm**: Use reproduction evidence to confirm the true root cause, explicitly rule out non-causes, and classify whether each investigated failure belongs to the toolchain/platform layer, test contract drift, test-harness interference, orchestration, or application logic.
 5. **Fix and validate**: Implement focused fixes and iterate until all reproduction tests or bounded reruns pass.
 ## Implementation Guidelines
@@ -55,11 +58,14 @@ Also auto-invoke this skill when mismatch evidence appears during normal executi
 - If a hypothesized cause cannot be reproduced, document why and deprioritize it explicitly.
 - For long-running or generated-artifact workflows, record the exact command, timestamps, and artifact paths before inspecting outputs so later comparisons stay on the same evidence set.
 - Do not mix baseline data and rerun data casually; compare the same scenario or command across runs and call out when a conclusion comes from a rerun rather than the original failure.
+- When test fixtures or assertions no longer match the implemented contract, update the tests instead of weakening the product behavior to satisfy stale expectations.
+- When tests shell out to shared local infrastructure, add deterministic isolation such as mutexes, unique temp roots, or serialized sections before accepting flakes as inevitable.
 ## Deliverables
 - Plausible root-cause list tied to concrete code paths
 - Canonical failing run or artifact root when runtime evidence exists
 - Reproduction tests or bounded reruns for each plausible cause
+- Failure classification for each symptom: stale contract, harness interference, or real bug
 - Fix summary mapped to failing-then-passing tests
 - Final confirmation that all related tests pass
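
Illustration (not part of the package): the three-way failure classification introduced in v2.14.22 can be read as a small triage rule. The sketch below is a simplification under stated assumptions: the `classify` function and its three boolean inputs are hypothetical, and a real investigation would still confirm each label with the reproduction evidence the skill requires.

```python
from enum import Enum


class FailureClass(Enum):
    STALE_TEST_CONTRACT = "stale test contract"
    TEST_HARNESS_INTERFERENCE = "test-harness interference"
    REAL_PRODUCT_BUG = "real product bug"


def classify(
    fails_in_suite: bool,
    fails_in_isolation: bool,
    test_matches_contract: bool,
) -> FailureClass:
    """First-pass label for one failing-test symptom.

    fails_in_suite: the test fails under the original suite shape (parallel, shared state).
    fails_in_isolation: the test still fails when run alone.
    test_matches_contract: fixtures and assertions agree with the implemented contract.
    """
    if not (fails_in_suite or fails_in_isolation):
        raise ValueError("the test does not fail in either mode; nothing to classify")
    if not test_matches_contract:
        # Update the test rather than weakening product behavior.
        return FailureClass.STALE_TEST_CONTRACT
    if fails_in_suite and not fails_in_isolation:
        # An isolated-only pass points at shared state or parallel-test interference.
        return FailureClass.TEST_HARNESS_INTERFERENCE
    return FailureClass.REAL_PRODUCT_BUG


if __name__ == "__main__":
    print(classify(fails_in_suite=True, fails_in_isolation=False, test_matches_contract=True).value)
```

The ordering is a design choice of this sketch: contract drift is checked first because a stale assertion can fail in any mode, and the isolated-only pass is checked before blaming product code, matching the skill's instruction to inspect shared locks, temp directories, and environment leakage first.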