npm - @laitszkin/apollo-toolkit - Versions diffs - 3.9.1 → 3.9.3 - Mend

@laitszkin/apollo-toolkit 3.9.1 → 3.9.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (54)

package/CHANGELOG.md CHANGED

@@ -34,6 +34,18 @@ All notable changes to this repository are documented in this file.
 ### Added
 - (None yet)
+## [v3.9.3] - 2026-05-07
+### Changed
+- `solve-issues-found-during-review`: add explicit completion criteria (spec conformance plus full closure of security, edge-case, and related ancillary review streams), tighten dependencies and closing report gates.
+## [v3.9.2] - 2026-05-06
+### Changed
+- Rename skill `harden-app-security` → `discover-security-issues` and realign catalog references, agent prompts, and `test/skill-workflows.test.js`.
+- Refactor `discover-edge-cases`, `discover-security-issues`, and `review-change-set` for clearer dependencies, workflows, and agent-facing copy.
+- Standardize git submission: skills that record or publish changes now depend on **`commit-and-push`** (`implement-specs*`, `implement-specs-with-subagents`, `merge-conflict-resolver` when committing, `open-source-pr-workflow`, `resolve-review-comments`, `solve-issues-found-during-review`, `develop-new-features`, `enhance-existing-features`); **`commit-and-push`** runs **push** only when the user explicitly requests a remote update.
 ## [v3.9.1] - 2026-05-06
 ### Changed

package/README.md CHANGED

@@ -21,7 +21,7 @@ A curated skill catalog for Codex, OpenClaw, Trae, Agents, and Claude Code with
 - financial-research
 - read-github-issue
 - generate-spec
-- harden-app-security
+- discover-security-issues
 - implement-specs
 - implement-specs-with-subagents
 - implement-specs-with-worktree
@@ -204,7 +204,7 @@ Compatibility note:
 - `recover-missing-plan` is a local skill used by `enhance-existing-features` and `ship-github-issue-fix` when a referenced `docs/plans/...` spec set is missing or archived.
 - `maintain-skill-catalog` can conditionally use `find-skills`, but its install source is not verified in this repository, so it is intentionally omitted from the table.
 - `read-github-issue` uses GitHub CLI (`gh`) directly for remote issue discovery and inspection, so it does not add any extra skill dependency.
-- `review-spec-related-changes` is a local skill that depends on `review-change-set`, `discover-edge-cases`, and `harden-app-security` for secondary code-practice checks after business-goal completion is reviewed against the governing specs.
+- `review-spec-related-changes` is a local skill that depends on `review-change-set`, `discover-edge-cases`, and `discover-security-issues` for secondary code-practice checks after business-goal completion is reviewed against the governing specs.
 ## Release publishing

package/analyse-app-logs/scripts/__pycache__/filter_logs_by_time.cpython-312.pyc CHANGED

Binary file

package/analyse-app-logs/scripts/__pycache__/log_cli_utils.cpython-312.pyc CHANGED

Binary file

package/analyse-app-logs/scripts/__pycache__/search_logs.cpython-312.pyc CHANGED

Binary file

package/commit-and-push/README.md CHANGED

@@ -31,6 +31,6 @@ When the diff includes code changes, `review-change-set` is still a conditional
 Apply the same rule to every other conditional gate: if its scenario is met during classification, it becomes blocking before commit rather than a best-effort follow-up.
-That includes risk-driven review gates such as `discover-edge-cases` and `harden-app-security` whenever the change surface makes them applicable.
+That includes risk-driven review gates such as `discover-edge-cases` and `discover-security-issues` whenever the change surface makes them applicable.
 For release workflows, use `version-release`.

package/commit-and-push/SKILL.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: commit-and-push
 description: >-
-  Commit and push only (no semver): inspect staged vs unstaged, classify scopes, run mandated reviews (`review-change-set`, conditional `discover-edge-cases`/`harden-app-security`), **`submission-readiness-check`** BEFORE final commit honoring `CHANGELOG.md` Unreleased + `archive-specs` redirections, preserve intentional staging splits, forbid UI git stubs, VERIFY remote hashes post-push **`version-release` elsewhere**.
+  Commit and push only (no semver): inspect staged vs unstaged, classify scopes, run mandated reviews (`review-change-set`, conditional `discover-edge-cases`/`discover-security-issues`), **`submission-readiness-check`** BEFORE final commit honoring `CHANGELOG.md` Unreleased + `archive-specs` redirections, preserve intentional staging splits, forbid UI git stubs, VERIFY remote hashes post-push **`version-release` elsewhere**.
   Use for “please commit”, “submit”, “push branch” lacking explicit semver/tag language **STOP** tagging here… BAD skip readiness red… GOOD staged subset untouched unrelated dirty files changelog mirrors diff… hashes `git rev-parse HEAD` versus upstream… archive specs before commit flagged…
 ---
@@ -10,7 +10,7 @@ description: >-
 ## Dependencies
 - Required: **`submission-readiness-check`** immediately before the **final** commit.
-- Conditional: **`archive-specs`** when readiness (or completed specs) requires doc conversion or categorized `docs/` alignment; **`review-change-set`** for every **code-affecting** scope; **`discover-edge-cases`** and **`harden-app-security`** become **required** when classification/risk indicates (same scope)—treat as blocking, not polish.
+- Conditional: **`archive-specs`** when readiness (or completed specs) requires doc conversion or categorized `docs/` alignment; **`review-change-set`** for every **code-affecting** scope; **`discover-edge-cases`** and **`discover-security-issues`** become **required** when classification/risk indicates (same scope)—treat as blocking, not polish.
 - Optional: none.
 - Fallback: Any **required** dependency unavailable ⇒ **MUST** stop and report—**MUST NOT** “light” commit.
@@ -18,7 +18,7 @@ description: >-
 - **MUST** use real `git` mutations (`git add`, `git commit`, `git push`, `git stash`, etc.); **MUST NOT** treat UI tokens (`::git-commit`, IDE buttons) as proof of history.
 - **MUST** run **`submission-readiness-check`** before final commit; unresolved readiness (e.g. stale/missing `CHANGELOG.md` **Unreleased**, doc drift) **blocks** commit.
-- Code-affecting: **`review-change-set` MANDATORY**; unresolved confirmed findings **block**. When risk profile matches, **`discover-edge-cases`** / **`harden-app-security`** equally blocking.
+- Code-affecting: **`review-change-set` MANDATORY**; unresolved confirmed findings **block**. When risk profile matches, **`discover-edge-cases`** / **`discover-security-issues`** equally blocking.
 - **`archive-specs`**: when readiness says convert/archive or `docs/` mismatch—**MUST** run **before** final commit, not as a vague follow-up.
 - **MUST** reconcile **staged vs unstaged** with user intent—**MUST NOT** broaden scope by auto-staging unrelated files when user staged a subset.
 - **`CHANGELOG.md` `Unreleased`**: for code-affecting or user-visible docs, **MUST** reflect this change before commit; reopen diff after edits to match commit scope.
@@ -27,14 +27,14 @@ description: >-
 - **MUST NOT** run version bump, tag, or GitHub release (**use `version-release`**).
 - Clean worktree requests: **MUST** inspect `HEAD`, upstream, last commit—**MUST NOT** fabricate “pushed” when already satisfied or impossible.
-**Repository regression checks (verbatim requirements):** Treat root `CHANGELOG.md` `Unreleased` coverage as mandatory for code-affecting or user-visible changes. Re-open the final `CHANGELOG.md` diff after readiness updates. **`review-change-set` is required for code-affecting changes**; Run `review-change-set` for every code-affecting change before continuing; treat unresolved review findings as blocking. Any conditional gate whose trigger is confirmed by this classification becomes mandatory before commit. Treat every scenario-matched gate as blocking before commit. **`discover-edge-cases` and `harden-app-security` are important review gates**—when their scenario is met, treat them as blocking review gates, not optional polish.
+**Repository regression checks (verbatim requirements):** Treat root `CHANGELOG.md` `Unreleased` coverage as mandatory for code-affecting or user-visible changes. Re-open the final `CHANGELOG.md` diff after readiness updates. **`review-change-set` is required for code-affecting changes**; Run `review-change-set` for every code-affecting change before continuing; treat unresolved review findings as blocking. Any conditional gate whose trigger is confirmed by this classification becomes mandatory before commit. Treat every scenario-matched gate as blocking before commit. **`discover-edge-cases` and `discover-security-issues` are important review gates**—when their scenario is met, treat them as blocking review gates, not optional polish.
 ## Standards (summary)
 - **Evidence**: `git status`/`diff`; classification drives gates; changelog diff matches commit.
 - **Execution**: Inspect → classify → (deps) → readiness → commit → push verify.
 - **Quality**: No gate bypass; sequential git ops; preserve intentional commit boundaries.
-- **Output**: Conventional commit message + confirmed remote + note stash/scope if any.
+- **Output**: Conventional commit message + confirmed remote **when push ran** + note stash/scope if any.
 ## References
@@ -54,15 +54,16 @@ description: >-
 3. **Branch target** — Honor user branch; if switch needed, protect unrelated changes; cherry-pick/replay off wrong branch safely; worktree cases: identify authoritative target **before** replay.
    - **Pause →** Am I about to merge noise because diff > issue scope—should I stop and narrow first?
-4. **Code-affecting gates** — `review-change-set` always; add `discover-edge-cases` / `harden-app-security` when risk/trigger says so; fix or document blockers; re-test material logic.
+4. **Code-affecting gates** — `review-change-set` always; add `discover-edge-cases` / `discover-security-issues` when risk/trigger says so; fix or document blockers; re-test material logic.
 5. **Readiness** — Run **`submission-readiness-check`**; if it routes to **`archive-specs`**, run that **now**; fix `Unreleased` bullets; recheck changelog vs staged intent.
    - **Pause →** Could I commit while readiness still red—**why not**?
 6. **Commit** — Respect staging; separate commits if user asked; Conventional message per `references/commit-messages.md`.
-7. **Push** — Sequential; verify remote hash; sync local branch after if user asked; worktree cleanup **only after** target branch verified good.
-   - **Pause →** What two hashes prove remote == local?
+7. **Push** — **Only** when the user requested remote update (`push`, `publish`, PR branch sync, explicit upstream publish, or equivalent). If the user asked **only** for a **local** commit with **no** remote publish in this thread, finish after step 6, state local `HEAD`, and **do not** push.
+   - **Pause →** Did the user **explicitly** ask to update a remote, or only to record commits locally?
+   - **Pause →** What two hashes prove remote == local when push **did** run?
 ## Sample hints
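
Reviewer note: the revised step 7 gates the push on an explicit user request and then asks which two hashes prove that remote and local agree. The following is a minimal Python sketch of that gate, not the skill's actual implementation. The function names (`rev_parse`, `push_verified`, `finish_submission`) and the `push_requested` flag are hypothetical; the only git calls used are `git rev-parse` and `git push`, and `@{u}` is assumed to resolve because the branch has an upstream configured.

```python
import subprocess


def rev_parse(ref: str, cwd: str = ".") -> str:
    """Return the full commit hash that `ref` resolves to in the repo at `cwd`."""
    result = subprocess.run(
        ["git", "rev-parse", ref],
        cwd=cwd, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def push_verified(local_hash: str, upstream_hash: str) -> bool:
    """The two hashes must be non-empty and identical to count as verified."""
    return bool(local_hash) and local_hash == upstream_hash


def finish_submission(push_requested: bool, cwd: str = ".") -> str:
    """Hypothetical wrapper for steps 6-7: report local HEAD, push only on request."""
    local = rev_parse("HEAD", cwd)
    if not push_requested:
        # Local-only request: stop after the commit and state HEAD, never push.
        return f"local commit only; HEAD={local}"
    subprocess.run(["git", "push"], cwd=cwd, check=True)
    upstream = rev_parse("@{u}", cwd)  # assumes an upstream branch is configured
    if not push_verified(local, upstream):
        raise RuntimeError(f"push not verified: HEAD={local} upstream={upstream}")
    return f"pushed; HEAD={local} matches upstream"
```

Keeping the comparison in a pure function means the "two hashes" check can be exercised without touching a real repository.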

package/commit-and-push/agents/openai.yaml CHANGED

@@ -1,4 +1,4 @@
 interface:
   display_name: "Commit and Push"
   short_description: "Submit local changes with commit and push only"
-  default_prompt: "Use $commit-and-push to inspect the current git state and classify the diff. Treat every conditional gate whose scenario is met as blocking before any commit: if the change set includes code changes, run $review-change-set; if the reviewed risk profile says edge-case or security review is needed, run $discover-edge-cases and $harden-app-security as blocking gates too; if completed specs should be converted or docs need normalization, ensure $archive-specs runs through $submission-readiness-check; if changelog synchronization is needed, complete it before continuing. Then run any additional required code-quality skills, hand the repository to $submission-readiness-check so it can synchronize completed plan archives, project docs, AGENTS.md/CLAUDE.md, and CHANGELOG.md before any commit, confirm root CHANGELOG.md Unreleased reflects the actual pending change set, preserve user staging intent, create a concise Conventional Commit, and push to the intended branch without any versioning or release steps."
+  default_prompt: "Use $commit-and-push to inspect the current git state and classify the diff. Treat every conditional gate whose scenario is met as blocking before any commit: if the change set includes code changes, run $review-change-set; if the reviewed risk profile says edge-case or security review is needed, run $discover-edge-cases and $discover-security-issues as blocking gates too; if completed specs should be converted or docs need normalization, ensure $archive-specs runs through $submission-readiness-check; if changelog synchronization is needed, complete it before continuing. Then run any additional required code-quality skills, hand the repository to $submission-readiness-check so it can synchronize completed plan archives, project docs, AGENTS.md/CLAUDE.md, and CHANGELOG.md before any commit, confirm root CHANGELOG.md Unreleased reflects the actual pending change set, preserve user staging intent, create a concise Conventional Commit, and push to the intended branch without any versioning or release steps."

package/develop-new-features/SKILL.md CHANGED

@@ -11,9 +11,9 @@ description: >-
 ## Dependencies
 - Required: `generate-spec` for shared planning artifacts and `test-case-strategy` for risk-driven test selection, oracles, and unit drift checks before coding.
-- Conditional: none.
+- Conditional: **`commit-and-push`** when the user requests **git commit** and/or **push** after delivery—**MUST** delegate final submission to **`commit-and-push`** (implementation detail: often via **`implement-specs`**, which already requires it).
 - Optional: none.
-- Fallback: **`generate-spec`** **or** **`test-case-strategy`** missing ⇒ **stop** (no improvised planning/tests).
+- Fallback: **`generate-spec`** **or** **`test-case-strategy`** missing ⇒ **stop** (no improvised planning/tests). If the user requested **commit/push** and **`commit-and-push`** is unavailable, **MUST** stop and report.
 ## Non-negotiables

package/discover-edge-cases/README.md CHANGED

@@ -11,7 +11,7 @@ It does not write tests, patch code, or open PRs.
 It follows a strict workflow:
 1. Detect whether `git diff` exists.
 2. Inspect only changed files plus minimal dependencies, or perform a full-project scan when no diff exists.
-3. Run `harden-app-security` as an adversarial dependency for code-affecting scope.
+3. Run `discover-security-issues` as an adversarial dependency for code-affecting scope.
 4. Probe the highest-risk edge cases and gather concrete evidence.
 5. Reproduce confirmed issues at least twice and check nearby variants.
 6. Prioritize confirmed findings and report hardening guidance only.
@@ -31,7 +31,7 @@ Use this skill when a task asks you to:
 - Treat prior authorship as irrelevant; even code written earlier in the same conversation must be challenged like third-party code.
 - Decisions must be evidence-based; speculative ideas stay marked as hypotheses.
 - Keep only reproducible findings with exact evidence.
-- Run `harden-app-security` as a required adversarial cross-check for code-affecting scope.
+- Run `discover-security-issues` as a required adversarial cross-check for code-affecting scope.
 - Report recommended fixes and test ideas, but do not implement them in this skill.
 ## External API requirements

package/discover-edge-cases/SKILL.md CHANGED

@@ -1,6 +1,8 @@
 ---
 name: discover-edge-cases
-description: Discover reproducible edge-case risks in changed code or a selected codebase scope, prove them with concrete evidence, and report prioritized findings without modifying implementation. Use when users ask to find edge cases, assess hardening gaps, or validate that unusual inputs and error paths are covered.
+description: >-
+  Diff-first (or full-repo) discovery of **reproducible** edge-case risks: boundaries, null/empty, failure paths, concurrency, observability; evidence via code/tests/runtime—**no edits, no new tests, no PRs**. For code-affecting scope, cross-check with **`discover-security-issues`** before final report.
+  Use for edge-case review, hardening gaps, unusual inputs/error paths, pre-merge risk pass **STOP** implementation or “just fix it here”… BAD unproven alarm list… GOOD path:line + double repro…
 ---
 # Discover Edge Cases
@@ -8,113 +10,82 @@ description: Discover reproducible edge-case risks in changed code or a selected
 ## Dependencies
 - Required: none.
-- Conditional: `harden-app-security` for code-affecting scopes before finalizing the report.
+- Conditional: **`discover-security-issues`** on **code-affecting** scope before finalizing the report (adversarial security pass).
 - Optional: none.
-- Fallback: If the required security cross-check is unavailable for a code-affecting scope, stop and report the missing dependency.
+- Fallback: If that security cross-check is **required** but unavailable, **MUST** stop and report the missing dependency.
-## Standards
+## Non-negotiables
-- Evidence: Keep only reproducible findings backed by code, tests, runtime output, or direct reproduction steps.
-- Execution: Determine scope first, run focused probes, confirm reproducibility, then report findings without remediation.
-- Quality: Separate confirmed findings from hypotheses and cover boundary, failure, stateful, and observability edge cases that matter to the scope.
-- Output: Return prioritized findings, edge-case evidence, risk assessment, hardening guidance, and residual risk only.
+- **Discovery-only**: **MUST NOT** edit code, add/modify tests, or open PRs.
+- **MUST** keep only **reproducible** findings; label guesses as **hypotheses**.
+- **MUST** reproduce each **confirmed** issue **at least twice** (same trigger); vary neighbors (empty vs null, malformed vs wrong-type).
+- **MUST** discard authorship bias—including code from earlier in the conversation.
+- If remediation is requested: finish this pass first; hand off **confirmed** items to an implementation workflow.
-## Non-negotiable Boundaries
+## Standards (summary)
-- This skill is discovery-only: do not edit code, do not add or modify tests, and do not open PRs.
-- Keep only reproducible findings with clear evidence.
-- Mark unverified ideas as hypotheses and separate them from confirmed findings.
-- If the task also requires remediation, finish this discovery pass first, then hand off confirmed findings to another implementation workflow.
-- Discard authorship bias completely: treat code written earlier in the conversation or by this agent as untrusted until evidence proves otherwise.
+- **Evidence**: `path:line`, commands/inputs, test output, or runtime symptoms—no intent-only claims.
+- **Execution**: Scope → baseline read → focused probes (2–5 high-impact) → validate → prioritize → report.
+- **Quality**: Prefer fewer strong findings; flag data integrity, silent failure, retry storms, cross-module propagation.
+- **Output**: Prioritized findings, reproduction, risk, hardening **advice**, residual risk/hypotheses.
 ## Workflow
-### 1) Determine scan scope (required)
+**Chain-of-thought:** Answer **`Pause →`** each step; if scope is wrong, fix before probing.
-- Run `git diff --name-only` first.
-- If diff exists: inspect only changed files plus the minimum dependency chain required to validate suspected edge cases.
-- If no diff exists: scan the full project, prioritizing core domain logic, external API boundaries, stateful workflows, and concurrency-sensitive modules.
-- If no actionable issue is found, report `No actionable edge-case finding identified` and stop.
+### 1) Determine scan scope
-### 2) Build a factual baseline
+- `git diff --name-only` first.
+- **With diff**: changed files + minimum dependency chain to validate suspected edges.
+- **No diff**: whole project, prioritizing domain logic, external boundaries, stateful/concurrent modules.
+- If nothing actionable after honest pass: report `No actionable edge-case finding identified` and stop.
+   - **Pause →** Can I name the **smallest file set** I must read—not the whole monorepo by default?
-- Read the relevant code paths end-to-end before judging behavior.
-- Re-derive behavior from code, tests, runtime output, and reproduced inputs only; ignore prior intent, authorship, or confidence from earlier turns.
-- Clarify input/output contracts: types, valid ranges, null handling, ordering assumptions, retry/error behavior, and state transitions.
-- Run existing tests or a minimal reproduction when needed to confirm actual vs expected behavior.
-- Record exact evidence with file references (`path:line`) and observable symptoms.
+### 2) Build factual baseline
-### 3) Execute focused edge-case probes
+- Read end-to-end before judging; derive behavior from code, tests, runtime only.
+- Clarify contracts: types, ranges, null, ordering, retries, state transitions.
+   - **Pause →** What did I **execute** (test/command) vs only read?
-Prioritize 2-5 high-risk cases directly tied to the selected scope:
+### 3) Focused probes (prioritize 2–5)
-- Empty collections / empty strings / None / null
-- Boundary values: 0, 1, -1, max/min limits, overflow
-- Duplicate, ordering, sorting, or deduplication assumptions
-- Exception paths: external dependency failure, timeout, retry, or partial data missing
-- Invalid formats: malformed strings, invalid date/timezone, or unexpected types
-- Concurrency/reentrancy: repeated calls, state contamination, or race windows
-- Architecture-level edge cases: backpressure, resource exhaustion, timeout propagation, or partial commit/rollback behavior
+Target high-risk patterns tied to scope:
-For broader coverage, load references as needed:
+- Empty/null/malformed/unexpected types; boundaries (0, 1, min/max, overflow); duplicates/order.
+- Dependency failure: timeout, partial data, retry loops; invalid formats.
+- Concurrency/reentrancy; architecture edges: backpressure, exhaustion, partial commit/rollback.
+- **HTTP/API** (if in scope): 429/500 behavior; logging with status/id/retry/latency (no silent fails).
-- `references/architecture-edge-cases.md`
-- `references/code-edge-cases.md`
+Load as needed: `references/architecture-edge-cases.md`, `references/code-edge-cases.md`.
+   - **Pause →** Would **discover-security-issues** flag this sink if it is auth/input injection—did I schedule that pass for code changes?
-#### External API checks
+### 4) Confirm reproducibility
-If the scope includes external API calls, validate:
+- Two passes per confirmed issue; note variants tried; keep unconfirmed as hypotheses.
-- observable health/availability handling,
-- degradation behavior for at least HTTP 429 and 500,
-- actionable error logging (status code, request id, retry count, latency) to avoid silent failures.
+### 5) Prioritize
-### 4) Confirm reproducibility
+- User impact, frequency/exploitability, blast radius; call out integrity, state corruption, silent failure.
+### 6) Security cross-check (code-affecting)
+- Run **`discover-security-issues`** on the **same** scope; integrate **confirmed** security items (do not duplicate as edge trivia unless distinct).
+### 7) Report only
+Deliver: (1) Findings—title, severity, evidence, repro, broken invariant; (2) Edge evidence—preconditions, observation, variants; (3) Risk—impact/likelihood/scope; (4) Hardening guidance (advisory); (5) Residual risk—hypotheses, next checks.
+## Minimum coverage (apply what fits scope)
+- Input validation; boundary behavior; failure/degraded modes; state/idempotency/concurrency/rollback; actionable observability.
+## Sample hints
+- **Diff**: One new parser → empty string + max length + malformed delimiter **before** “maybe SQL.”
+- **No diff**: Start at payment/state machine module—highest consequence.
+- **Handoff**: Five confirmed edges → remediation skill gets **numbered list + repro**—not this skill patching.
+## References
-- Reproduce each confirmed issue at least twice through the same trigger path.
-- For high-risk findings, try nearby variants such as boundary neighbors, empty vs null, malformed vs well-typed invalid input, repeated calls, and stale ordering.
-- Capture the exact command, request, or input together with the observed failure or missing protection.
-- Keep unverified ideas as hypotheses only.
-### 5) Prioritize confirmed findings
-- Rank findings by user impact, exploitability or frequency, and blast radius.
-- Call out data-integrity, state corruption, silent failure, retry storm, and cross-module propagation risks explicitly.
-- Prefer fewer, stronger findings over many speculative ones.
-### 6) Report findings only
-Deliver:
-1. Findings (highest risk first)
-   - Title and severity/priority
-   - Evidence (`path:line`)
-   - Reproduction steps or triggering input
-   - Broken expectation/invariant
-2. Edge-case evidence
-   - Preconditions
-   - Observed behavior
-   - Reproducibility notes and nearby variant results
-3. Risk assessment
-   - Impact, likelihood, and scope
-   - Why this matters in system context
-4. Hardening guidance (advice only)
-   - Recommended fix direction
-   - Suggested test coverage to add during remediation
-5. Residual risk
-   - Hypotheses, unknowns, and next validation ideas
-## Minimum Coverage
-Apply all relevant checks for the selected scope:
-- Input validation: empty/null/malformed/unexpected-type handling
-- Boundary behavior: zero/one/min/max/overflow/ordering edges
-- Failure behavior: timeout, retry, partial dependency failure, degraded mode
-- Stateful behavior: idempotency, replay, concurrency, rollback, duplicate processing
-- Observability: actionable errors and logging for failures that would otherwise be silent
-## Resources
-- `references/architecture-edge-cases.md`: cross-module/system-level edge-case checklist.
-- `references/code-edge-cases.md`: code-level input, boundary, and error-path checklist.
+- `references/architecture-edge-cases.md` — system-level checklist.
+- `references/code-edge-cases.md` — code-level input/error/concurrency checklist.
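
Reviewer note: step 1 of the rewritten workflow chooses the scan scope from `git diff --name-only`: changed files when a diff exists, the whole project otherwise. A minimal Python sketch of that decision follows. The names `select_scope` and `current_scope` and the returned dictionary shape are assumptions made for illustration; the skill itself does not ship this code.

```python
import subprocess


def select_scope(diff_output: str) -> dict:
    """Map raw `git diff --name-only` output to a scan mode and file list."""
    changed = [line.strip() for line in diff_output.splitlines() if line.strip()]
    if changed:
        # With diff: start from the changed files; the minimum dependency
        # chain is added later by whoever reads the code.
        return {"mode": "diff", "files": changed}
    # No diff: fall back to a full-project scan.
    return {"mode": "full-project", "files": []}


def current_scope(cwd: str = ".") -> dict:
    """Run `git diff --name-only` in `cwd` and classify the result."""
    output = subprocess.run(
        ["git", "diff", "--name-only"],
        cwd=cwd, check=True, capture_output=True, text=True,
    ).stdout
    return select_scope(output)
```

For example, `select_scope("src/parser.py\n")` yields diff mode with a single file, while an empty string yields full-project mode.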

package/discover-edge-cases/agents/openai.yaml CHANGED

@@ -1,4 +1,4 @@
 interface:
   display_name: "Discover Edge Cases"
-  short_description: "Discover reproducible edge-case risks and coverage gaps"
-  default_prompt: "Use $discover-edge-cases to scan the current diff first (or the full codebase when there is no diff), discard any bias toward code written earlier in the conversation, run $harden-app-security as an adversarial cross-check for code-affecting scope, identify the highest-risk reproducible edge-case findings, validate them with concrete evidence, prioritize the confirmed risks, and report hardening and test recommendations without modifying code."
+  short_description: "Find reproducible edge-case risks with evidence-only reporting"
+  default_prompt: "Use $discover-edge-cases to scan the current diff first (or the full codebase when there is no diff), discard any bias toward code written earlier in the conversation, run $discover-security-issues as an adversarial cross-check for code-affecting scope, identify the highest-risk reproducible edge-case findings, validate them with concrete evidence, prioritize the confirmed risks, and report hardening and test recommendations without modifying code."

package/{harden-app-security → discover-security-issues}/CHANGELOG.md RENAMED

@@ -4,6 +4,11 @@ All notable changes to this project will be documented in this file.
 The format is based on Keep a Changelog and this project follows Semantic Versioning.
+## [v0.0.3] - 2026-05-06
+### Changed
+- Rename skill directory and identifier from `harden-app-security` to `discover-security-issues`; refresh `SKILL.md`, `README.md`, and agent display metadata to match discovery-only semantics.
 ## [v0.0.2] - 2026-03-11
 ### Changed

package/discover-security-issues/README.md ADDED

@@ -0,0 +1,35 @@
+# discover-security-issues
+Evidence-first, **discovery-only** adversarial security workflow across agent, financial, and general software surfaces.
+## What this skill provides
+- Reproduce exploitable behavior with payloads, requests, and `path:line` proof—**no patches or PRs**.
+- Modules: `agent-system`, `financial-program`, `software-system`, and `combined` (cross-boundary chains).
+- Catalog-driven scenarios (SQLi, XSS, CSRF, SSRF, IDOR, prompt injection, money-path races, …).
+- Prioritized reporting plus advisory hardening notes and residual risk.
+## Layout
+- `SKILL.md` — workflow, modules, output shape.
+- `agents/openai.yaml` — metadata and default prompt.
+- `references/*` — attack catalogs and optional test-pattern snippets.
+## Typical use
+1. Pick module(s) and trust boundaries.
+2. Walk selected reference catalogs; record only **double-reproduced** issues.
+3. Prioritize and report; stop before implementation—hand off confirmed findings if fixes are needed.
+## Example
+```text
+Use $discover-security-issues in discovery-only mode.
+Module: combined (agent-system + software-system).
+Focus: prompt injection to privileged tools, SQL injection, IDOR.
+Deliver severity-ordered findings with exploit steps and path:line evidence.
+```
+## License
+MIT. See [LICENSE](LICENSE).
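
Reviewer note: the new README asks for recording only double-reproduced issues and for severity-ordered reporting. One way to picture that triage is the Python sketch below. The `Finding` type, the `repro_count` field, and the lowercase severity labels are hypothetical choices for this example, not part of the package.

```python
from dataclasses import dataclass

# Assumed severity labels, highest priority first.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    title: str
    severity: str      # one of the SEVERITY_ORDER keys
    repro_count: int   # how many times the same trigger path reproduced


def triage(findings: list[Finding]) -> tuple[list[Finding], list[Finding]]:
    """Split findings into confirmed (reproduced at least twice) and hypotheses."""
    confirmed = [f for f in findings if f.repro_count >= 2]
    hypotheses = [f for f in findings if f.repro_count < 2]
    # Confirmed findings are reported highest severity first; sort is stable.
    confirmed.sort(key=lambda f: SEVERITY_ORDER[f.severity])
    return confirmed, hypotheses
```

A finding reproduced only once stays in the hypotheses list regardless of its claimed severity, which mirrors the discovery-only rule that unconfirmed items are reported as residual risk.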

package/discover-security-issues/SKILL.md ADDED

@@ -0,0 +1,88 @@
+---
+name: discover-security-issues
+description: >-
+  Discovery-only adversarial audit: map trust boundaries, run module catalogs (`agent-system`, `financial-program`, `software-system`, `combined`), reproduce exploitable behavior with payloads/commands and `path:line` evidence; prioritize impact × exploitability—**no code edits, no PRs, no auto-remediation**.
+  Use for security review, vuln hunting, SQLi/XSS/auth/IDOR checks, agent prompt-injection/tool abuse, money-path races **STOP** when user wants patches shipped—hand off findings… BAD single vague “looks fine”… GOOD two-pass repro, hypothesis vs confirmed…
+---
+# Discover Security Issues
+## Dependencies
+- Required: none.
+- Conditional: none.
+- Optional: none.
+- Fallback: not applicable.
+## Non-negotiables
+- **Discovery-only**: **MUST NOT** edit code, apply patches, open PRs, or run “fix workflows.”
+- **MUST** keep only **reproducible** issues with exploit evidence; separate **hypotheses** from **confirmed** findings.
+- **MUST** reproduce each confirmed exploit **at least twice** on the same path; use nearby payload variants for high-risk sinks.
+- **MUST** discard authorship bias—treat all code as untrusted until evidence proves behavior.
+## Standards (summary)
+- **Evidence**: Payload/precondition, observable failure, `path:line`, commands or requests that reproduce.
+- **Execution**: Pick modules → boundaries → scenarios from references → validate → prioritize → report only.
+- **Quality**: Rank by impact, exploitability, reach; unknowns listed under residual risk.
+- **Output**: Findings (severity-ordered), attack evidence, risk notes, hardening **advice** (not patches), residual risk.
+## Workflow
+**Chain-of-thought:** After each step, satisfy **`Pause →`** before continuing; halt on missing scope or contradictory module choice.
+### 1) Scope and modules
+- Choose one or more of: `agent-system`, `financial-program`, `software-system`, `combined` (cross-boundary chains).
+- List untrusted inputs, privileged actions, and protected assets; state invariants that must hold.
+   - **Pause →** Which module catalogs did I **open** (file names)—not guessed from memory?
+### 2) Execute scenarios from references
+- **Agent**: `references/agent-attack-catalog.md`; optional `references/security-test-patterns-agent.md` (prompt injection, tool abuse, memory/exfil paths).
+- **Financial**: `references/red-team-extreme-scenarios.md`, `references/risk-checklist.md`; optional `references/security-test-patterns-finance.md` (authz, replay/race, idempotency, precision, lifecycle).
+- **Software**: `references/common-software-attack-catalog.md` (SQL/NoSQL/command injection, XSS, CSRF, SSRF, traversal, upload, session/JWT, IDOR/BOLA, deserialization, misconfig).
+- **Combined**: relevant subsets **plus** chains (e.g. injection → privileged API).
+   - **Pause →** Did I record **payload + preconditions + observed behavior** for each candidate—not just “maybe vulnerable”?
+### 3) Validate reproducibility
+- Re-run each confirmed path twice; add encoding/casing/delimiter variants on hot sinks.
+   - **Pause →** Is anything still “likely” without a second repro? If so, downgrade it to hypothesis.
+### 4) Prioritize
+- Order Critical/High → Medium → Low using impact, exploitability, blast radius (multi-tenant / cross-tenant called out).
+### 5) Report only
+- Deliver (see **Output shape** below): findings, attack evidence, prioritization, hardening guidance (advisory), residual risk.
+## Minimum coverage (apply per selected module)
+- **Core**: trust boundaries, authn/authz, input → dangerous sink paths, secrets/sensitive data handling.
+- **Agent**: prompt/indirect injection, unauthorized tools/actions, exfil, memory poisoning resistance.
+- **Financial**: object-level authz, replay/race/idempotency, precision, oracle/side-effect safety, failure consistency.
+- **Software**: injection families, XSS/CSRF/SSRF, traversal/upload, session/JWT, brute-force/rate limits, debug/CORS/secrets exposure.
+- **Combined**: module checks + realistic cross-boundary chains.
+## Output shape
+1. **Findings** (high → low): title, severity, evidence (`path:line`), reproduction steps/payload, impacted invariant/asset.
+2. **Attack evidence**: preconditions, commands/requests, observed insecure behavior, variant results.
+3. **Risk prioritization**: impact, exploitability, reach; why it matters in **this** system.
+4. **Hardening guidance** (advice only): fix direction, validation focus post-remediation.
+5. **Residual risk**: hypotheses, assumptions, follow-up probes.
+## Sample hints
+- **Module**: Web API + Claude tool-use → `combined` (software + agent); deposits/withdrawals → include `financial-program`.
+- **Evidence**: “SQLi possible” without two runs + exact parameter → stays **hypothesis** until repro’d.
+- **Stop line**: User says “patch it now” → finish report; hand off to implementation skills—**do not** self-patch here.
+## References
+- `references/agent-attack-catalog.md`, `references/security-test-patterns-agent.md`
+- `references/red-team-extreme-scenarios.md`, `references/risk-checklist.md`, `references/security-test-patterns-finance.md`
+- `references/common-software-attack-catalog.md`, `references/test-snippets.md` (optional snippets)
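
Reader note: the added skill text above keeps a finding as confirmed only after at least two reproductions on the same path, lists everything else as a hypothesis under residual risk, and orders confirmed findings by severity. A minimal Python sketch of that bookkeeping follows. The `Finding` class, its field names, and `build_report` are hypothetical illustrations and are not part of the package.

```python
from dataclasses import dataclass

# Lower rank sorts first: Critical/High, then Medium, then Low.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    """Hypothetical record for one candidate issue (not the package's schema)."""
    title: str
    severity: str         # "critical" | "high" | "medium" | "low"
    location: str         # evidence in path:line form
    repro_count: int = 0  # successful reproductions on the same path

    @property
    def status(self) -> str:
        # The skill requires at least two reproductions before "confirmed".
        return "confirmed" if self.repro_count >= 2 else "hypothesis"


def build_report(findings):
    """Split findings into severity-ordered confirmed items and residual risk."""
    confirmed = sorted(
        (f for f in findings if f.status == "confirmed"),
        key=lambda f: SEVERITY_ORDER[f.severity],
    )
    residual = [f for f in findings if f.status == "hypothesis"]
    return {"findings": confirmed, "residual_risk": residual}
```

Because `sorted` is stable, confirmed findings of equal severity keep the order in which they were recorded.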

package/discover-security-issues/agents/openai.yaml ADDED

@@ -0,0 +1,4 @@
+interface:
+  display_name: "Discover Security Issues"
+  short_description: "Discovery-only adversarial audit: reproducible exploits across agent, finance, and software stacks"
+  default_prompt: "Use $discover-security-issues to run a discovery-only adversarial audit. Reproduce exploitable vulnerabilities with concrete evidence and severity prioritization across agent-system, financial-program, and software-system scopes (including SQL injection and common web flaws). Do not apply code fixes or PR actions."

package/docs-to-voice/scripts/__pycache__/docs_to_voice.cpython-312.pyc CHANGED

Binary file

package/enhance-existing-features/SKILL.md CHANGED

@@ -11,9 +11,9 @@ description: >-
 ## Dependencies
 - Required: `test-case-strategy` for risk selection, oracles, drift checks.
-- Conditional: **`generate-spec`** when spec triggers below fire; **`recover-missing-plan`** when user-named `docs/plans/...` is missing/archived/mismatched.
+- Conditional: **`generate-spec`** when spec triggers below fire; **`recover-missing-plan`** when user-named `docs/plans/...` is missing/archived/mismatched; **`commit-and-push`** when the user requests **git commit** and/or **push** to persist completed work—**MUST** delegate final submission to **`commit-and-push`** (often via **`implement-specs`** / **`implement-specs-with-worktree`** when a spec path is active).
 - Optional: none.
-- Fallback: **`test-case-strategy`** unavailable ⇒ **stop**. Spec path required but **`generate-spec`** unavailable ⇒ **stop**.
+- Fallback: **`test-case-strategy`** unavailable ⇒ **stop**. Spec path required but **`generate-spec`** unavailable ⇒ **stop**. If the user requested **commit/push** and **`commit-and-push`** is unavailable, **MUST** stop and report.
 ## Non-negotiables

package/generate-spec/scripts/__pycache__/create-specscpython-312.pyc CHANGED

Binary file

package/implement-specs/SKILL.md CHANGED

@@ -1,19 +1,19 @@
 ---
 name: implement-specs
 description: >-
-  Land an approved `docs/plans/{YYYY-MM-DD}/{change}` (or batch member path) on the currently checked-out branch: read the full planning bundle + `coordination.md` when relevant, execute every in-scope `tasks.md` item, backfill honest checklist/spec state, commit locally—**do not** create branches/worktrees or push unless the user explicitly widens the request mid-thread.
+  Land an approved `docs/plans/{YYYY-MM-DD}/{change}` (or batch member path) on the currently checked-out branch: read the full planning bundle + `coordination.md` when relevant, execute every in-scope `tasks.md` item, backfill honest checklist/spec state, then **finalize through `commit-and-push`**—**do not** create branches/worktrees or widen to push/release unless the user explicitly asks mid-thread.
   Choose this for “implement on this branch” scenarios. If isolation is required use **`implement-specs-with-worktree`**; if multiple specs need delegated workers use **`implement-specs-with-subagents`**.
-  Good: stay on `feature/foo`, finish tasks, `git commit`. Bad: `git worktree add` purely to avoid dirty trees—wrong skill unless user re-scoped.
+  Good: stay on `feature/foo`, finish tasks, run **`commit-and-push`**. Bad: `git worktree add` purely to avoid dirty trees—wrong skill unless user re-scoped.
 ---
 # Implement Specs
 ## Dependencies
-- Required: `enhance-existing-features` and `develop-new-features` for implementation standards.
+- Required: `enhance-existing-features` and `develop-new-features` for implementation standards; **`commit-and-push`** for the **final** implementation commit (and push when the user explicitly requests remote update).
 - Conditional: `generate-spec` if spec files need clarification or updates; `recover-missing-plan` if the requested plan path is missing from the current checkout.
 - Optional: none.
-- Fallback: If `enhance-existing-features` or `develop-new-features` is unavailable, **MUST** stop immediately and report the missing dependency. Do not improvise substitute standards.
+- Fallback: If **`enhance-existing-features`**, **`develop-new-features`**, or **`commit-and-push`** is unavailable, **MUST** stop immediately and report the missing dependency. Do not improvise substitute standards or ungated `git commit`.
 ## Non-negotiables
@@ -21,8 +21,8 @@ description: >-
 - **MUST NOT** create a branch, switch branches, or add or use a `git worktree` for this work unless the user explicitly changes the request in the same conversation.
 - **MUST** treat the approved `tasks.md` / contracts as the scope boundary: complete every item that is in scope for this request, run the relevant tests, and **MUST** backfill the planning documents with factual completion status (no aspirational checkboxes).
 - **MUST NOT** expand scope to unrelated sibling spec directories solely because they share a batch folder.
-- **MUST** commit the finished work to the **current** branch as a focused implementation commit (split only when an unavoidable checkpoint is required); the combined result **MUST** contain only the intended changes.
-- **MUST NOT** `git push`, tag, or perform release steps unless the user explicitly asks.
+- **MUST** finalize the implementation through **`commit-and-push`** after staging the intended change set (shared readiness, reviews per that skill’s classification, conventional commit message); **MUST NOT** complete the deliverable with a bare `git commit`, IDE-only commit, or other shortcut that skips **`submission-readiness-check`** / mandated gates.
+- **MUST NOT** `git push`, tag, or perform release steps **outside** **`commit-and-push`** (unless **`version-release`** / **`open-source-pr-workflow`** explicitly applies per user request).
 - If the plan path is missing or ambiguous: **MUST** use `recover-missing-plan` or other verifiable repository evidence to locate the authoritative plan; **MUST NOT** substitute a nearby path by guess. After recovery, **MUST** re-read the recovered files before coding so implementation and backfill target the same snapshot.
 ## Standards (summary)
@@ -55,8 +55,8 @@ description: >-
    - **Pause →** If I checked a box, can I point to **commit + test run** (or equivalent) that makes that check true—no wishful checking?
    - **Pause →** Did any scope shrink or shift during implementation; if so, is the plan text updated **honestly**?
-5. **Commit** — Commit on the current branch; keep the diff limited to this spec’s intent.
-   - **Pause →** Does `git diff` show only this spec’s intended surface, or do I need to revert irrelevant noise first?
+5. **Submit** — Stage the intended implementation/backfill diff. Run **`commit-and-push`** through commit using that staged intent (and **push** only when the user explicitly requested remote update). Keep scope to this spec only; split into multiple submission passes only when an unavoidable checkpoint requires separate commits.
+   - **Pause →** Does `git diff --cached` (or the equivalent staged view) show only this spec’s intended surface, or do I need to unstage/revert noise first?
    - **Pause →** Am I on the **same** branch I named in step 2, without a silent branch switch?
 6. **Report** — State current branch, commit hash, tests run, and which plan files were backfilled.
@@ -76,3 +76,4 @@ If this skill directory contains `references/implement-specs-common.md`, treat i
 - `enhance-existing-features`: brownfield implementation standards
 - `develop-new-features`: greenfield implementation standards
 - `recover-missing-plan`: missing or mismatched plan recovery
+- **`commit-and-push`**: final commit/readiness (push only when user requests remote update)
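
Reader note: the revised Submit step above asks whether `git diff --cached` shows only the spec's intended surface before handing off to `commit-and-push`. One way to make that pause mechanical is sketched below in Python. The function names and the idea of an allowed-prefix list are assumptions for illustration; the package does not ship this helper.

```python
import subprocess


def staged_paths(repo_dir="."):
    """List paths staged in the index via `git diff --cached --name-only`."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def out_of_scope(paths, allowed_prefixes):
    """Return staged paths that fall outside every allowed prefix.

    `allowed_prefixes` is a hypothetical, caller-supplied list such as the
    source directory touched by the spec plus its plan folder.
    """
    prefixes = tuple(allowed_prefixes)
    return [p for p in paths if not p.startswith(prefixes)]
```

A non-empty result from `out_of_scope(staged_paths(), [...])` would be the signal to unstage or revert noise before submission.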

package/implement-specs-with-subagents/SKILL.md CHANGED

@@ -10,7 +10,7 @@ description: >-
 ## Dependencies
-- Required: `implement-specs-with-worktree` (every implementation subagent **MUST** follow this skill for its assigned directory); `merge-changes-from-local-branches` (phase integration **MUST NOT** skip this between phases).
+- Required: `implement-specs-with-worktree` (every implementation subagent **MUST** follow this skill for its assigned directory); `merge-changes-from-local-branches` (phase integration **MUST NOT** skip this between phases); **`commit-and-push`** (integration-branch **preparation** commits and **any** post-merge submission the user expects on that branch—**MUST NOT** bypass for bare `git commit` / ungated push when this skill owns the branch).
 - Conditional: `generate-spec` if the batch is not implementation-ready; `review-change-set` only when the user asks for post-merge review.
 - Fallback: If independent subagents cannot run, **MUST** report that limitation. Serial `implement-specs-with-worktree` across specs is allowed **only** after the user explicitly approves that fallback.
@@ -19,7 +19,7 @@ description: >-
 - **MUST NOT** delegate until `coordination.md`, when present, explicitly allows parallel implementation—or when absent, until you have verified no contradiction to parallel work.
 - **MUST** enumerate exact in-scope spec directories **before** any subagent starts; **MUST NOT** delegate archived, sibling, or “nearest guess” directories unless the user explicitly includes them.
 - **MUST** verify each delegated directory contains `spec.md`, `tasks.md`, `checklist.md`, `contract.md`, and `design.md` before launch.
-- **MUST** complete, verify, and **commit** documented shared preparation on the integration branch **before** any implementation subagent starts when `preparation.md` exists or specs mandate pre-work; the coordinating agent **MUST NOT** delegate that preparation. Subagents **MUST NOT** start until this branch is clean at the preparation commit (or there is no required preparation).
+- **MUST** complete, verify, and **finalize** documented shared preparation on the integration branch **through `commit-and-push`** before any implementation subagent starts when `preparation.md` exists or specs mandate pre-work; the coordinating agent **MUST NOT** delegate that preparation. Subagents **MUST NOT** start until this branch is clean at the preparation commit (or there is no required preparation).
 - **MUST** build a directed dependency graph from `coordination.md` plus each spec’s `spec.md` / `design.md` (edges: *provider spec must merge before consumer spec*). **MUST** partition specs into phases by topological **layers**: Phase 1 = specs with **no** in-batch prerequisites; for *k* ≥ 2, Phase *k* = specs whose in-batch prerequisites all appear in phases before *k*. **MUST NOT** start phase *k* until phase *k − 1* is fully merged into the integration branch (or you have an explicit user override). If the graph has a cycle, **MUST** stop and report it.
 - **MUST** assign **exactly one** spec directory per implementation subagent; **MUST NOT** assign multiple directories to one implementation subagent.
 - **MUST** cap **active** implementation subagents at **four**; **MUST** start them **one at a time** with confirmation each is running before the next start; **MUST** back off on rate limits (no burst launches). Four is a ceiling, not a quota.
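
Reader note: the phase rule in the hunk above (Phase 1 holds specs with no in-batch prerequisites, Phase *k* holds specs whose prerequisites all sit in earlier phases, and a cycle means stop and report) amounts to layering a dependency graph. A minimal Python sketch follows, assuming the graph has already been extracted into a plain dict. The function name `plan_phases` and the input shape are hypothetical.

```python
def plan_phases(prerequisites):
    """Partition specs into topological layers.

    `prerequisites` maps each spec name to the set of in-batch specs that
    must be merged before it. Returns a list of phases, each a sorted list
    of spec names. Raises ValueError if the graph contains a cycle.
    """
    remaining = {spec: set(deps) for spec, deps in prerequisites.items()}
    # Assumption: a prerequisite that is not itself a key is treated as an
    # in-batch spec with no prerequisites of its own.
    for deps in list(remaining.values()):
        for dep in deps:
            remaining.setdefault(dep, set())

    phases = []
    merged = set()
    while remaining:
        # A spec is ready once all of its prerequisites are in earlier phases.
        layer = sorted(s for s, deps in remaining.items() if deps <= merged)
        if not layer:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        phases.append(layer)
        merged.update(layer)
        for spec in layer:
            del remaining[spec]
    return phases
```

The sketch only computes the layers. The cap of four active subagents and the one-at-a-time launch rule would apply when scheduling the specs inside each layer.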
@@ -43,7 +43,7 @@ description: >-
    - **Pause →** Can I quote **verbatim** why each enumerated directory is in scope—not “probably related”?
    - **Pause →** What **exact sentence** from `coordination.md` gates parallel readiness, if any—or what absence did I infer from—and is that justified?
-2. **Preparation (blocking when required)** — Coordinating agent executes shared prep only: read tasks/outputs/verification hooks, satisfy scope, run listed checks, **commit** on the integration branch, record prep commit hash in ledger, leave working tree clean. If prep fails, **stop** (no subagents).
+2. **Preparation (blocking when required)** — Coordinating agent executes shared prep only: read tasks/outputs/verification hooks, satisfy scope, run listed checks, **finalize on the integration branch through `commit-and-push`** (push only if the user requested remote update), record prep commit hash in ledger, leave working tree clean. If prep fails, **stop** (no subagents).
    - **Pause →** Am I silently delegating preparation to a **subagent** when the coordinating agent owns it—is that happening?
    - **Pause →** What **commit hash** and **clean-tree** confirmation will subagents inherit as baseline?