npm - @laitszkin/apollo-toolkit - Versions diffs - 2.14.20 → 2.14.22 - Mend

@laitszkin/apollo-toolkit 2.14.20 → 2.14.22

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/CHANGELOG.md +10 -0
package/implement-specs-with-worktree/SKILL.md +8 -2
package/package.json +1 -1
package/systematic-debug/SKILL.md +10 -5

package/CHANGELOG.md CHANGED

@@ -7,6 +7,16 @@ All notable changes to this repository are documented in this file.
 ### Changed
 - None yet.
+## [v2.14.22] - 2026-04-17
+### Changed
+- Strengthen `systematic-debug` so failing-test investigations must classify each symptom as stale test contract, test-harness interference, or real product bug, and must treat isolated-only passes as evidence to inspect shared-state and parallel-test interference before changing product code.
+## [v2.14.21] - 2026-04-16
+### Changed
+- Tighten `implement-specs-with-worktree` so branch/worktree setup uses direct `git` ref checks and requires an explicit re-check of repo state before retrying after ambiguous creation failures.
 ## [v2.14.20] - 2026-04-15
 ### Changed

package/implement-specs-with-worktree/SKILL.md CHANGED

@@ -24,8 +24,8 @@ description: >-
 ## Standards
 - Evidence: Read and understand the complete specs set before starting implementation, identify the authoritative parent branch that the worktree should inherit from, verify whether the requested scope is already implemented on that parent branch or current main working tree, and when the requested plan path is missing from the current worktree verify where the authoritative copy actually lives before substituting any nearby spec.
-- Execution: Create or use an isolated worktree for implementation only when the requested spec still needs work, sync the exact approved plan set into that worktree when it is missing there, create the worktree branch from the same parent branch as the worktree base, use the spec-set name as the canonical branch/worktree name, follow the implementation standards from the dependent skills, and commit to a local branch when done.
-- Quality: Complete all planned tasks, run relevant tests, backfill the spec documents with actual completion status, and avoid dragging unrelated sibling specs into the worktree just because they share a batch directory.
+- Execution: Create or use an isolated worktree for implementation only when the requested spec still needs work, sync the exact approved plan set into that worktree when it is missing there, create the worktree branch from the same parent branch as the worktree base, use the spec-set name as the canonical branch/worktree name, prefer direct `git` ref checks over brittle shell inference when deciding whether a branch or worktree already exists, and commit to a local branch when done.
+- Quality: Complete all planned tasks, run relevant tests, backfill the spec documents with actual completion status, avoid dragging unrelated sibling specs into the worktree just because they share a batch directory, and if branch/worktree creation reports ambiguous state re-check the actual git refs and worktree list before retrying.
 - Output: Keep the worktree branch clean with only the intended implementation commits.
 ## Goal
@@ -85,6 +85,12 @@ If not already in a worktree, or if the user explicitly requests a fresh worktre
   git worktree add ../<spec-name> <branch-name>
   ```
 - Move into the new worktree directory and begin work there.
+- When checking whether the target branch or worktree already exists, use direct git evidence instead of shell heuristics:
+  ```bash
+  git show-ref --verify --quiet refs/heads/<branch-name>
+  git worktree list --porcelain
+  ```
+- If branch creation or worktree creation fails in a way that leaves the state unclear, stop and re-read `git show-ref` plus `git worktree list --porcelain` before retrying; do not guess from wrapper output or compound shell conditionals.
 Use branch naming from `references/branch-naming.md`.
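The two checks added in this hunk could be wrapped as small helpers, roughly as sketched below. This is an illustration only: the function names `branch_exists`, `worktree_has_branch`, and `report_state` are hypothetical and do not appear in the package. It assumes that `git worktree list --porcelain` prints a `branch refs/heads/<name>` line for each worktree that has a branch checked out.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of the toolkit).

# Exit 0 only if the local branch ref exists; no output is parsed.
branch_exists() {
  git show-ref --verify --quiet "refs/heads/$1"
}

# Exit 0 if any registered worktree has the given branch checked out.
# Matches the exact porcelain line instead of guessing from directory names.
worktree_has_branch() {
  git worktree list --porcelain | grep -Fxq "branch refs/heads/$1"
}

# Print the observed state so a retry decision rests on git evidence.
report_state() {
  local branch=$1
  if branch_exists "$branch"; then
    echo "branch: present"
  else
    echo "branch: absent"
  fi
  if worktree_has_branch "$branch"; then
    echo "worktree: present"
  else
    echo "worktree: absent"
  fi
}
```

After an ambiguous failure, running `report_state <branch-name>` before any retry would show which of the two steps, if either, actually took effect.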

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@laitszkin/apollo-toolkit",
-  "version": "2.14.20",
+  "version": "2.14.22",
   "description": "Apollo Toolkit npm installer for managed skill copying across Codex, OpenClaw, and Trae.",
   "license": "MIT",
   "author": "LaiTszKin",

package/systematic-debug/SKILL.md CHANGED

@@ -15,9 +15,9 @@ description: "Systematic debugging workflow for program issues: understand obser
 ## Standards
 - Evidence: Gather expected versus observed behavior from code and runtime facts before deciding on a cause, and when the issue involves a runtime pipeline or bounded run, anchor the investigation to one canonical artifact root or run directory instead of mixed terminal snippets from multiple runs.
-- Execution: Inspect the relevant paths, reproduce every plausible cause with tests or bounded reruns, map each observed failure to a concrete pipeline stage, distinguish toolchain/platform faults from application-logic faults, then apply the minimal fix.
-- Quality: Keep scope focused on the bug, prefer existing test patterns, and explicitly rule out hypotheses that could not be reproduced.
-- Output: Deliver the plausible-cause list, the canonical evidence source, reproduction tests or reruns, validated fix summary, and passing-test confirmation.
+- Execution: Inspect the relevant paths, reproduce every plausible cause with tests or bounded reruns, map each observed failure to a concrete pipeline stage, distinguish toolchain/platform faults from application-logic faults, classify failing tests as stale test contract vs test-harness interference vs real product bug, then apply the minimal fix at the true owner.
+- Quality: Keep scope focused on the bug, prefer existing test patterns, explicitly rule out hypotheses that could not be reproduced, and when failures disappear in isolated reruns treat shared-state or parallel-test interference as a first-class hypothesis instead of silently dismissing the original failure.
+- Output: Deliver the plausible-cause list, the canonical evidence source, reproduction tests or reruns, the final failure classification for each investigated symptom, validated fix summary, and passing-test confirmation.
 ## Core Principles
@@ -26,6 +26,8 @@ description: "Systematic debugging workflow for program issues: understand obser
 - Keep fixes minimal, focused, and validated by passing tests.
 - When logs or runtime artifacts exist, treat one run as canonical and compare every conclusion against that same run's generated artifacts, not against ad hoc console recollection.
 - When the failing flow crosses multiple layers, identify the last confirmed successful stage before assigning blame.
+- When tests fail, separate stale assertions and fixture drift from real implementation regressions before changing product code.
+- If failures only appear under parallel execution or shared shell-out paths, investigate test isolation, shared locks, temp directories, run-name collisions, and environment leakage before blaming the product.
 ## Trigger Conditions
@@ -43,8 +45,8 @@ Also auto-invoke this skill when mismatch evidence appears during normal executi
 1. **Understand and inspect**: Parse expected vs observed behavior, explore relevant code paths, record the canonical failing run or artifact root when runtime output is involved, and build a list of plausible root causes.
 2. **Map the failure boundary**: Break the flow into concrete stages such as setup, startup, readiness, steady-state execution, persistence, and shutdown, then identify the last stage that is confirmed to have succeeded.
-3. **Reproduce with tests or bounded reruns**: Write or extend tests that reproduce every plausible cause, and when the bug depends on runtime orchestration rerun the same bounded command or scenario instead of switching contexts mid-investigation.
-4. **Diagnose and confirm**: Use reproduction evidence to confirm the true root cause, explicitly rule out non-causes, and classify whether the fault belongs to the toolchain/platform layer, the orchestration layer, or application logic.
+3. **Reproduce with tests or bounded reruns**: Write or extend tests that reproduce every plausible cause, and when the bug depends on runtime orchestration rerun the same bounded command or scenario instead of switching contexts mid-investigation. When a failing test passes in isolation, rerun it under the original suite shape to determine whether the real cause is stale expectations, fixture drift, or shared-state interference.
+4. **Diagnose and confirm**: Use reproduction evidence to confirm the true root cause, explicitly rule out non-causes, and classify whether each investigated failure belongs to the toolchain/platform layer, test contract drift, test-harness interference, orchestration, or application logic.
 5. **Fix and validate**: Implement focused fixes and iterate until all reproduction tests or bounded reruns pass.
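The isolated-versus-suite comparison in step 3 might be sketched as below. This is a rough illustration, not the skill's own tooling: `classify_rerun` and its output labels are hypothetical, and the two arguments are whatever commands run the single test and the original suite shape in a given project.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare an isolated rerun with the original suite shape.
# $1 = command that runs only the failing test
# $2 = command that runs it the way the suite originally ran it
classify_rerun() {
  local isolated_cmd=$1 suite_cmd=$2
  local isolated=0 suite=0
  sh -c "$isolated_cmd" >/dev/null 2>&1 || isolated=1
  sh -c "$suite_cmd" >/dev/null 2>&1 || suite=1

  if [ "$isolated" -eq 0 ] && [ "$suite" -ne 0 ]; then
    # Passes alone, fails in the suite: look at shared state first.
    echo "suspect-harness-interference"
  elif [ "$isolated" -ne 0 ]; then
    # Fails even alone: decide between stale test contract and real bug.
    echo "inspect-contract-or-product"
  else
    # Passes both ways: the original failure was not reproduced.
    echo "not-reproduced"
  fi
}
```

The label is only a starting hypothesis. The final classification for each symptom still comes from the evidence gathered in step 4.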
 ## Implementation Guidelines
@@ -55,11 +57,14 @@ Also auto-invoke this skill when mismatch evidence appears during normal executi
 - If a hypothesized cause cannot be reproduced, document why and deprioritize it explicitly.
 - For long-running or generated-artifact workflows, record the exact command, timestamps, and artifact paths before inspecting outputs so later comparisons stay on the same evidence set.
 - Do not mix baseline data and rerun data casually; compare the same scenario or command across runs and call out when a conclusion comes from a rerun rather than the original failure.
+- When test fixtures or assertions no longer match the implemented contract, update the tests instead of weakening the product behavior to satisfy stale expectations.
+- When tests shell out to shared local infrastructure, add deterministic isolation such as mutexes, unique temp roots, or serialized sections before accepting flakes as inevitable.
 ## Deliverables
 - Plausible root-cause list tied to concrete code paths
 - Canonical failing run or artifact root when runtime evidence exists
 - Reproduction tests or bounded reruns for each plausible cause
+- Failure classification for each symptom: stale contract, harness interference, or real bug
 - Fix summary mapped to failing-then-passing tests
 - Final confirmation that all related tests pass
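The deterministic isolation mentioned in the new implementation guideline (unique temp roots, mutexes, serialized sections) could look roughly like the sketch below. The helper names `with_isolated_tmp` and `with_lock` and the `TEST_TMP_ROOT` variable are hypothetical. The lock uses `mkdir`, which is atomic on local filesystems. It is an assumption of this sketch, not something the package prescribes.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for test isolation (names are illustrative).

# Run a command with its own temp root exported as TEST_TMP_ROOT,
# then remove that root and return the command's exit status.
with_isolated_tmp() {
  local root
  root=$(mktemp -d "${TMPDIR:-/tmp}/test-root.XXXXXX") || return 1
  ( export TEST_TMP_ROOT="$root"; "$@" )
  local status=$?
  rm -rf "$root"
  return "$status"
}

# Serialize a section across processes using a lock directory.
# mkdir either creates the directory or fails, so only one caller proceeds.
with_lock() {
  local lockdir=$1
  shift
  until mkdir "$lockdir" 2>/dev/null; do
    sleep 1
  done
  "$@"
  local status=$?
  rmdir "$lockdir"
  return "$status"
}
```

A test that shells out to shared local infrastructure might then run as `with_lock /tmp/shared-infra.lock with_isolated_tmp ./run-one-test.sh`, where the lock path and script name are placeholders. A lock directory left behind by a killed process would need manual cleanup, which this sketch does not handle.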