@laitszkin/apollo-toolkit 3.9.0 → 3.9.2

Files changed (54)
  1. package/CHANGELOG.md +12 -0
  2. package/README.md +2 -2
  3. package/analyse-app-logs/scripts/__pycache__/filter_logs_by_time.cpython-312.pyc +0 -0
  4. package/analyse-app-logs/scripts/__pycache__/log_cli_utils.cpython-312.pyc +0 -0
  5. package/analyse-app-logs/scripts/__pycache__/search_logs.cpython-312.pyc +0 -0
  6. package/commit-and-push/README.md +1 -1
  7. package/commit-and-push/SKILL.md +9 -8
  8. package/commit-and-push/agents/openai.yaml +1 -1
  9. package/develop-new-features/SKILL.md +2 -2
  10. package/discover-edge-cases/README.md +2 -2
  11. package/discover-edge-cases/SKILL.md +61 -90
  12. package/discover-edge-cases/agents/openai.yaml +2 -2
  13. package/{harden-app-security → discover-security-issues}/CHANGELOG.md +5 -0
  14. package/discover-security-issues/README.md +35 -0
  15. package/discover-security-issues/SKILL.md +88 -0
  16. package/discover-security-issues/agents/openai.yaml +4 -0
  17. package/docs-to-voice/scripts/__pycache__/docs_to_voice.cpython-312.pyc +0 -0
  18. package/enhance-existing-features/SKILL.md +2 -2
  19. package/generate-spec/scripts/__pycache__/create-specscpython-312.pyc +0 -0
  20. package/implement-specs/SKILL.md +9 -8
  21. package/implement-specs-with-subagents/SKILL.md +9 -6
  22. package/implement-specs-with-worktree/SKILL.md +4 -4
  23. package/katex/scripts/__pycache__/render_katex.cpython-312.pyc +0 -0
  24. package/merge-conflict-resolver/SKILL.md +3 -3
  25. package/open-github-issue/scripts/__pycache__/open_github_issue.cpython-312.pyc +0 -0
  26. package/open-source-pr-workflow/SKILL.md +12 -7
  27. package/package.json +1 -1
  28. package/read-github-issue/scripts/__pycache__/find_issues.cpython-312.pyc +0 -0
  29. package/read-github-issue/scripts/__pycache__/read_issue.cpython-312.pyc +0 -0
  30. package/resolve-review-comments/SKILL.md +14 -8
  31. package/resolve-review-comments/scripts/__pycache__/review_threads.cpython-312.pyc +0 -0
  32. package/review-change-set/README.md +3 -3
  33. package/review-change-set/SKILL.md +50 -65
  34. package/review-change-set/agents/openai.yaml +2 -2
  35. package/review-spec-related-changes/README.md +1 -1
  36. package/review-spec-related-changes/SKILL.md +4 -4
  37. package/review-spec-related-changes/agents/openai.yaml +1 -1
  38. package/solve-issues-found-during-review/README.md +1 -1
  39. package/solve-issues-found-during-review/SKILL.md +3 -3
  40. package/text-to-short-video/scripts/__pycache__/enforce_video_aspect_ratio.cpython-312.pyc +0 -0
  41. package/version-release/README.md +1 -1
  42. package/version-release/SKILL.md +2 -2
  43. package/version-release/agents/openai.yaml +1 -1
  44. package/harden-app-security/README.md +0 -46
  45. package/harden-app-security/SKILL.md +0 -127
  46. package/harden-app-security/agents/openai.yaml +0 -4
  47. /package/{harden-app-security → discover-security-issues}/LICENSE +0 -0
  48. /package/{harden-app-security → discover-security-issues}/references/agent-attack-catalog.md +0 -0
  49. /package/{harden-app-security → discover-security-issues}/references/common-software-attack-catalog.md +0 -0
  50. /package/{harden-app-security → discover-security-issues}/references/red-team-extreme-scenarios.md +0 -0
  51. /package/{harden-app-security → discover-security-issues}/references/risk-checklist.md +0 -0
  52. /package/{harden-app-security → discover-security-issues}/references/security-test-patterns-agent.md +0 -0
  53. /package/{harden-app-security → discover-security-issues}/references/security-test-patterns-finance.md +0 -0
  54. /package/{harden-app-security → discover-security-issues}/references/test-snippets.md +0 -0
package/CHANGELOG.md CHANGED
@@ -34,6 +34,18 @@ All notable changes to this repository are documented in this file.
  ### Added
  - (None yet)
 
+ ## [v3.9.2] - 2026-05-06
+
+ ### Changed
+ - Rename skill `harden-app-security` → `discover-security-issues` and realign catalog references, agent prompts, and `test/skill-workflows.test.js`.
+ - Refactor `discover-edge-cases`, `discover-security-issues`, and `review-change-set` for clearer dependencies, workflows, and agent-facing copy.
+ - Standardize git submission: skills that record or publish changes now depend on **`commit-and-push`** (`implement-specs*`, `implement-specs-with-subagents`, `merge-conflict-resolver` when committing, `open-source-pr-workflow`, `resolve-review-comments`, `solve-issues-found-during-review`, `develop-new-features`, `enhance-existing-features`); **`commit-and-push`** runs **push** only when the user explicitly requests a remote update.
+
+ ## [v3.9.1] - 2026-05-06
+
+ ### Changed
+ - `implement-specs-with-subagents`: require full multi-phase reconciliation (repeat run/merge steps until every non-blocked in-scope spec is merged or explicitly blocked); forbid early completion narratives while later phases or unmerged successful branches remain.
+
  ## [v3.9.0] - 2026-05-05
 
  ### Changed
package/README.md CHANGED
@@ -21,7 +21,7 @@ A curated skill catalog for Codex, OpenClaw, Trae, Agents, and Claude Code with
  - financial-research
  - read-github-issue
  - generate-spec
- - harden-app-security
+ - discover-security-issues
  - implement-specs
  - implement-specs-with-subagents
  - implement-specs-with-worktree
@@ -204,7 +204,7 @@ Compatibility note:
  - `recover-missing-plan` is a local skill used by `enhance-existing-features` and `ship-github-issue-fix` when a referenced `docs/plans/...` spec set is missing or archived.
  - `maintain-skill-catalog` can conditionally use `find-skills`, but its install source is not verified in this repository, so it is intentionally omitted from the table.
  - `read-github-issue` uses GitHub CLI (`gh`) directly for remote issue discovery and inspection, so it does not add any extra skill dependency.
- - `review-spec-related-changes` is a local skill that depends on `review-change-set`, `discover-edge-cases`, and `harden-app-security` for secondary code-practice checks after business-goal completion is reviewed against the governing specs.
+ - `review-spec-related-changes` is a local skill that depends on `review-change-set`, `discover-edge-cases`, and `discover-security-issues` for secondary code-practice checks after business-goal completion is reviewed against the governing specs.
 
  ## Release publishing
 
package/commit-and-push/README.md CHANGED
@@ -31,6 +31,6 @@ When the diff includes code changes, `review-change-set` is still a conditional
 
  Apply the same rule to every other conditional gate: if its scenario is met during classification, it becomes blocking before commit rather than a best-effort follow-up.
 
- That includes risk-driven review gates such as `discover-edge-cases` and `harden-app-security` whenever the change surface makes them applicable.
+ That includes risk-driven review gates such as `discover-edge-cases` and `discover-security-issues` whenever the change surface makes them applicable.
 
  For release workflows, use `version-release`.
package/commit-and-push/SKILL.md CHANGED
@@ -1,7 +1,7 @@
  ---
  name: commit-and-push
  description: >-
- Commit and push only (no semver): inspect staged vs unstaged, classify scopes, run mandated reviews (`review-change-set`, conditional `discover-edge-cases`/`harden-app-security`), **`submission-readiness-check`** BEFORE final commit honoring `CHANGELOG.md` Unreleased + `archive-specs` redirections, preserve intentional staging splits, forbid UI git stubs, VERIFY remote hashes post-push **`version-release` elsewhere**.
+ Commit and push only (no semver): inspect staged vs unstaged, classify scopes, run mandated reviews (`review-change-set`, conditional `discover-edge-cases`/`discover-security-issues`), **`submission-readiness-check`** BEFORE final commit honoring `CHANGELOG.md` Unreleased + `archive-specs` redirections, preserve intentional staging splits, forbid UI git stubs, VERIFY remote hashes post-push **`version-release` elsewhere**.
  Use for “please commit”, “submit”, “push branch” lacking explicit semver/tag language **STOP** tagging here… BAD skip readiness red… GOOD staged subset untouched unrelated dirty files changelog mirrors diff… hashes `git rev-parse HEAD` versus upstream… archive specs before commit flagged…
  ---
 
@@ -10,7 +10,7 @@ description: >-
  ## Dependencies
 
  - Required: **`submission-readiness-check`** immediately before the **final** commit.
- - Conditional: **`archive-specs`** when readiness (or completed specs) requires doc conversion or categorized `docs/` alignment; **`review-change-set`** for every **code-affecting** scope; **`discover-edge-cases`** and **`harden-app-security`** become **required** when classification/risk indicates (same scope)—treat as blocking, not polish.
+ - Conditional: **`archive-specs`** when readiness (or completed specs) requires doc conversion or categorized `docs/` alignment; **`review-change-set`** for every **code-affecting** scope; **`discover-edge-cases`** and **`discover-security-issues`** become **required** when classification/risk indicates (same scope)—treat as blocking, not polish.
  - Optional: none.
  - Fallback: Any **required** dependency unavailable ⇒ **MUST** stop and report—**MUST NOT** “light” commit.
 
@@ -18,7 +18,7 @@ description: >-
 
  - **MUST** use real `git` mutations (`git add`, `git commit`, `git push`, `git stash`, etc.); **MUST NOT** treat UI tokens (`::git-commit`, IDE buttons) as proof of history.
  - **MUST** run **`submission-readiness-check`** before final commit; unresolved readiness (e.g. stale/missing `CHANGELOG.md` **Unreleased**, doc drift) **blocks** commit.
- - Code-affecting: **`review-change-set` MANDATORY**; unresolved confirmed findings **block**. When risk profile matches, **`discover-edge-cases`** / **`harden-app-security`** equally blocking.
+ - Code-affecting: **`review-change-set` MANDATORY**; unresolved confirmed findings **block**. When risk profile matches, **`discover-edge-cases`** / **`discover-security-issues`** equally blocking.
  - **`archive-specs`**: when readiness says convert/archive or `docs/` mismatch—**MUST** run **before** final commit, not as a vague follow-up.
  - **MUST** reconcile **staged vs unstaged** with user intent—**MUST NOT** broaden scope by auto-staging unrelated files when user staged a subset.
  - **`CHANGELOG.md` `Unreleased`**: for code-affecting or user-visible docs, **MUST** reflect this change before commit; reopen diff after edits to match commit scope.
@@ -27,14 +27,14 @@ description: >-
  - **MUST NOT** run version bump, tag, or GitHub release (**use `version-release`**).
  - Clean worktree requests: **MUST** inspect `HEAD`, upstream, last commit—**MUST NOT** fabricate “pushed” when already satisfied or impossible.
 
- **Repository regression checks (verbatim requirements):** Treat root `CHANGELOG.md` `Unreleased` coverage as mandatory for code-affecting or user-visible changes. Re-open the final `CHANGELOG.md` diff after readiness updates. **`review-change-set` is required for code-affecting changes**; Run `review-change-set` for every code-affecting change before continuing; treat unresolved review findings as blocking. Any conditional gate whose trigger is confirmed by this classification becomes mandatory before commit. Treat every scenario-matched gate as blocking before commit. **`discover-edge-cases` and `harden-app-security` are important review gates**—when their scenario is met, treat them as blocking review gates, not optional polish.
+ **Repository regression checks (verbatim requirements):** Treat root `CHANGELOG.md` `Unreleased` coverage as mandatory for code-affecting or user-visible changes. Re-open the final `CHANGELOG.md` diff after readiness updates. **`review-change-set` is required for code-affecting changes**; Run `review-change-set` for every code-affecting change before continuing; treat unresolved review findings as blocking. Any conditional gate whose trigger is confirmed by this classification becomes mandatory before commit. Treat every scenario-matched gate as blocking before commit. **`discover-edge-cases` and `discover-security-issues` are important review gates**—when their scenario is met, treat them as blocking review gates, not optional polish.
 
  ## Standards (summary)
 
  - **Evidence**: `git status`/`diff`; classification drives gates; changelog diff matches commit.
  - **Execution**: Inspect → classify → (deps) → readiness → commit → push verify.
  - **Quality**: No gate bypass; sequential git ops; preserve intentional commit boundaries.
- - **Output**: Conventional commit message + confirmed remote + note stash/scope if any.
+ - **Output**: Conventional commit message + confirmed remote **when push ran** + note stash/scope if any.
 
  ## References
 
@@ -54,15 +54,16 @@ description: >-
  3. **Branch target** — Honor user branch; if switch needed, protect unrelated changes; cherry-pick/replay off wrong branch safely; worktree cases: identify authoritative target **before** replay.
  - **Pause →** Am I about to merge noise because diff > issue scope—should I stop and narrow first?
 
- 4. **Code-affecting gates** — `review-change-set` always; add `discover-edge-cases` / `harden-app-security` when risk/trigger says so; fix or document blockers; re-test material logic.
+ 4. **Code-affecting gates** — `review-change-set` always; add `discover-edge-cases` / `discover-security-issues` when risk/trigger says so; fix or document blockers; re-test material logic.
 
  5. **Readiness** — Run **`submission-readiness-check`**; if it routes to **`archive-specs`**, run that **now**; fix `Unreleased` bullets; recheck changelog vs staged intent.
  - **Pause →** Could I commit while readiness still red—**why not**?
 
  6. **Commit** — Respect staging; separate commits if user asked; Conventional message per `references/commit-messages.md`.
 
- 7. **Push** — Sequential; verify remote hash; sync local branch after if user asked; worktree cleanup **only after** target branch verified good.
- - **Pause →** What two hashes prove remote == local?
+ 7. **Push** — **Only** when the user requested remote update (`push`, `publish`, PR branch sync, explicit upstream publish, or equivalent). If the user asked **only** for a **local** commit with **no** remote publish in this thread, finish after step 6, state local `HEAD`, and **do not** push.
+ - **Pause →** Did the user **explicitly** ask to update a remote, or only to record commits locally?
+ - **Pause →** What two hashes prove remote == local when push **did** run?
 
  ## Sample hints
 
package/commit-and-push/agents/openai.yaml CHANGED
@@ -1,4 +1,4 @@
  interface:
  display_name: "Commit and Push"
  short_description: "Submit local changes with commit and push only"
- default_prompt: "Use $commit-and-push to inspect the current git state and classify the diff. Treat every conditional gate whose scenario is met as blocking before any commit: if the change set includes code changes, run $review-change-set; if the reviewed risk profile says edge-case or security review is needed, run $discover-edge-cases and $harden-app-security as blocking gates too; if completed specs should be converted or docs need normalization, ensure $archive-specs runs through $submission-readiness-check; if changelog synchronization is needed, complete it before continuing. Then run any additional required code-quality skills, hand the repository to $submission-readiness-check so it can synchronize completed plan archives, project docs, AGENTS.md/CLAUDE.md, and CHANGELOG.md before any commit, confirm root CHANGELOG.md Unreleased reflects the actual pending change set, preserve user staging intent, create a concise Conventional Commit, and push to the intended branch without any versioning or release steps."
+ default_prompt: "Use $commit-and-push to inspect the current git state and classify the diff. Treat every conditional gate whose scenario is met as blocking before any commit: if the change set includes code changes, run $review-change-set; if the reviewed risk profile says edge-case or security review is needed, run $discover-edge-cases and $discover-security-issues as blocking gates too; if completed specs should be converted or docs need normalization, ensure $archive-specs runs through $submission-readiness-check; if changelog synchronization is needed, complete it before continuing. Then run any additional required code-quality skills, hand the repository to $submission-readiness-check so it can synchronize completed plan archives, project docs, AGENTS.md/CLAUDE.md, and CHANGELOG.md before any commit, confirm root CHANGELOG.md Unreleased reflects the actual pending change set, preserve user staging intent, create a concise Conventional Commit, and push to the intended branch without any versioning or release steps."
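Editor's sketch (not part of the package): step 7 of the revised `commit-and-push` workflow pushes only on an explicit request and, when a push did run, asks which two hashes prove remote == local. The Python below is one minimal way to picture that rule. It assumes the current branch has an upstream configured so that `@{u}` resolves; `run_git` and `finish_submission` are hypothetical names that do not appear in the toolkit.

```python
import subprocess
from typing import Callable, Dict, Optional


def run_git(*args: str) -> str:
    """Run a git subcommand in the current repository and return its trimmed stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def finish_submission(
    push_requested: bool, git: Callable[..., str] = run_git
) -> Dict[str, Optional[object]]:
    """Finish after the commit step; push and verify only when explicitly requested."""
    local_head = git("rev-parse", "HEAD")
    if not push_requested:
        # Local-only request: report HEAD and stop without touching the remote.
        return {"pushed": False, "local_head": local_head,
                "remote_head": None, "verified": None}

    git("push")
    # Assumption: the upstream tracking ref reflects the remote after a successful push.
    remote_head = git("rev-parse", "@{u}")
    return {"pushed": True, "local_head": local_head,
            "remote_head": remote_head, "verified": local_head == remote_head}
```

The two hashes compared here are `git rev-parse HEAD` and `git rev-parse @{u}`. The second reads the remote-tracking ref that a successful `git push` updates locally; querying the server directly with `git ls-remote` would be a stricter alternative.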
package/develop-new-features/SKILL.md CHANGED
@@ -11,9 +11,9 @@ description: >-
  ## Dependencies
 
  - Required: `generate-spec` for shared planning artifacts and `test-case-strategy` for risk-driven test selection, oracles, and unit drift checks before coding.
- - Conditional: none.
+ - Conditional: **`commit-and-push`** when the user requests **git commit** and/or **push** after delivery—**MUST** delegate final submission to **`commit-and-push`** (implementation detail: often via **`implement-specs`**, which already requires it).
  - Optional: none.
- - Fallback: **`generate-spec`** **or** **`test-case-strategy`** missing ⇒ **stop** (no improvised planning/tests).
+ - Fallback: **`generate-spec`** **or** **`test-case-strategy`** missing ⇒ **stop** (no improvised planning/tests). If the user requested **commit/push** and **`commit-and-push`** is unavailable, **MUST** stop and report.
 
  ## Non-negotiables
 
package/discover-edge-cases/README.md CHANGED
@@ -11,7 +11,7 @@ It does not write tests, patch code, or open PRs.
  It follows a strict workflow:
  1. Detect whether `git diff` exists.
  2. Inspect only changed files plus minimal dependencies, or perform a full-project scan when no diff exists.
- 3. Run `harden-app-security` as an adversarial dependency for code-affecting scope.
+ 3. Run `discover-security-issues` as an adversarial dependency for code-affecting scope.
  4. Probe the highest-risk edge cases and gather concrete evidence.
  5. Reproduce confirmed issues at least twice and check nearby variants.
  6. Prioritize confirmed findings and report hardening guidance only.
@@ -31,7 +31,7 @@ Use this skill when a task asks you to:
  - Treat prior authorship as irrelevant; even code written earlier in the same conversation must be challenged like third-party code.
  - Decisions must be evidence-based; speculative ideas stay marked as hypotheses.
  - Keep only reproducible findings with exact evidence.
- - Run `harden-app-security` as a required adversarial cross-check for code-affecting scope.
+ - Run `discover-security-issues` as a required adversarial cross-check for code-affecting scope.
  - Report recommended fixes and test ideas, but do not implement them in this skill.
 
  ## External API requirements
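Editor's sketch (not part of the package): steps 1 and 2 of the `discover-edge-cases` README workflow choose between a diff-scoped review and a full-project scan based on `git diff --name-only`. A minimal Python illustration of that decision follows. The function names and the shape of the returned dictionary are assumptions made for this example only.

```python
import subprocess
from typing import Dict, List


def parse_name_only(output: str) -> List[str]:
    """Split `git diff --name-only` output into clean, non-empty paths."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def choose_scope(changed_files: List[str]) -> Dict[str, object]:
    """Pick diff mode when files changed, otherwise fall back to a full-project scan."""
    files = sorted({f for f in changed_files if f.strip()})
    if files:
        # Diff exists: start from changed files; dependencies are added by the reviewer.
        return {"mode": "diff", "files": files}
    return {"mode": "full-project", "files": []}


def changed_files_from_git() -> List[str]:
    """Read the changed file list from the working tree of the current repository."""
    result = subprocess.run(
        ["git", "diff", "--name-only"], check=True, capture_output=True, text=True
    )
    return parse_name_only(result.stdout)
```

In a repository, `choose_scope(changed_files_from_git())` would give the starting file set. Extending it with the minimum dependency chain stays a manual judgment, as the workflow describes.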
package/discover-edge-cases/SKILL.md CHANGED
@@ -1,6 +1,8 @@
  ---
  name: discover-edge-cases
- description: Discover reproducible edge-case risks in changed code or a selected codebase scope, prove them with concrete evidence, and report prioritized findings without modifying implementation. Use when users ask to find edge cases, assess hardening gaps, or validate that unusual inputs and error paths are covered.
+ description: >-
+ Diff-first (or full-repo) discovery of **reproducible** edge-case risks: boundaries, null/empty, failure paths, concurrency, observability; evidence via code/tests/runtime—**no edits, no new tests, no PRs**. For code-affecting scope, cross-check with **`discover-security-issues`** before final report.
+ Use for edge-case review, hardening gaps, unusual inputs/error paths, pre-merge risk pass **STOP** implementation or “just fix it here”… BAD unproven alarm list… GOOD path:line + double repro…
  ---
 
  # Discover Edge Cases
@@ -8,113 +10,82 @@ description: Discover reproducible edge-case risks in changed code or a selected
  ## Dependencies
 
  - Required: none.
- - Conditional: `harden-app-security` for code-affecting scopes before finalizing the report.
+ - Conditional: **`discover-security-issues`** on **code-affecting** scope before finalizing the report (adversarial security pass).
  - Optional: none.
- - Fallback: If the required security cross-check is unavailable for a code-affecting scope, stop and report the missing dependency.
+ - Fallback: If that security cross-check is **required** but unavailable, **MUST** stop and report the missing dependency.
 
- ## Standards
+ ## Non-negotiables
 
- - Evidence: Keep only reproducible findings backed by code, tests, runtime output, or direct reproduction steps.
- - Execution: Determine scope first, run focused probes, confirm reproducibility, then report findings without remediation.
- - Quality: Separate confirmed findings from hypotheses and cover boundary, failure, stateful, and observability edge cases that matter to the scope.
- - Output: Return prioritized findings, edge-case evidence, risk assessment, hardening guidance, and residual risk only.
+ - **Discovery-only**: **MUST NOT** edit code, add/modify tests, or open PRs.
+ - **MUST** keep only **reproducible** findings; label guesses as **hypotheses**.
+ - **MUST** reproduce each **confirmed** issue **at least twice** (same trigger); vary neighbors (empty vs null, malformed vs wrong-type).
+ - **MUST** discard authorship bias—including code from earlier in the conversation.
+ - If remediation is requested: finish this pass first; hand off **confirmed** items to an implementation workflow.
 
- ## Non-negotiable Boundaries
+ ## Standards (summary)
 
- - This skill is discovery-only: do not edit code, do not add or modify tests, and do not open PRs.
- - Keep only reproducible findings with clear evidence.
- - Mark unverified ideas as hypotheses and separate them from confirmed findings.
- - If the task also requires remediation, finish this discovery pass first, then hand off confirmed findings to another implementation workflow.
- - Discard authorship bias completely: treat code written earlier in the conversation or by this agent as untrusted until evidence proves otherwise.
+ - **Evidence**: `path:line`, commands/inputs, test output, or runtime symptoms—no intent-only claims.
+ - **Execution**: Scope baseline read focused probes (2–5 high-impact) → validate → prioritize → report.
+ - **Quality**: Prefer fewer strong findings; flag data integrity, silent failure, retry storms, cross-module propagation.
+ - **Output**: Prioritized findings, reproduction, risk, hardening **advice**, residual risk/hypotheses.
 
  ## Workflow
 
- ### 1) Determine scan scope (required)
+ **Chain-of-thought:** Answer **`Pause →`** each step; if scope is wrong, fix before probing.
 
- - Run `git diff --name-only` first.
- - If diff exists: inspect only changed files plus the minimum dependency chain required to validate suspected edge cases.
- - If no diff exists: scan the full project, prioritizing core domain logic, external API boundaries, stateful workflows, and concurrency-sensitive modules.
- - If no actionable issue is found, report `No actionable edge-case finding identified` and stop.
+ ### 1) Determine scan scope
 
- ### 2) Build a factual baseline
+ - `git diff --name-only` first.
+ - **With diff**: changed files + minimum dependency chain to validate suspected edges.
+ - **No diff**: whole project, prioritizing domain logic, external boundaries, stateful/concurrent modules.
+ - If nothing actionable after honest pass: report `No actionable edge-case finding identified` and stop.
+ - **Pause →** Can I name the **smallest file set** I must read—not the whole monorepo by default?
 
- - Read the relevant code paths end-to-end before judging behavior.
- - Re-derive behavior from code, tests, runtime output, and reproduced inputs only; ignore prior intent, authorship, or confidence from earlier turns.
- - Clarify input/output contracts: types, valid ranges, null handling, ordering assumptions, retry/error behavior, and state transitions.
- - Run existing tests or a minimal reproduction when needed to confirm actual vs expected behavior.
- - Record exact evidence with file references (`path:line`) and observable symptoms.
+ ### 2) Build factual baseline
 
- ### 3) Execute focused edge-case probes
+ - Read end-to-end before judging; derive behavior from code, tests, runtime only.
+ - Clarify contracts: types, ranges, null, ordering, retries, state transitions.
+ - **Pause →** What did I **execute** (test/command) vs only read?
 
- Prioritize 2-5 high-risk cases directly tied to the selected scope:
+ ### 3) Focused probes (prioritize 2–5)
 
- - Empty collections / empty strings / None / null
- - Boundary values: 0, 1, -1, max/min limits, overflow
- - Duplicate, ordering, sorting, or deduplication assumptions
- - Exception paths: external dependency failure, timeout, retry, or partial data missing
- - Invalid formats: malformed strings, invalid date/timezone, or unexpected types
- - Concurrency/reentrancy: repeated calls, state contamination, or race windows
- - Architecture-level edge cases: backpressure, resource exhaustion, timeout propagation, or partial commit/rollback behavior
+ Target high-risk patterns tied to scope:
 
- For broader coverage, load references as needed:
+ - Empty/null/malformed/unexpected types; boundaries (0, 1, min/max, overflow); duplicates/order.
+ - Dependency failure: timeout, partial data, retry loops; invalid formats.
+ - Concurrency/reentrancy; architecture edges: backpressure, exhaustion, partial commit/rollback.
+ - **HTTP/API** (if in scope): 429/500 behavior; logging with status/id/retry/latency (no silent fails).
 
- - `references/architecture-edge-cases.md`
- - `references/code-edge-cases.md`
+ Load as needed: `references/architecture-edge-cases.md`, `references/code-edge-cases.md`.
+ - **Pause →** Would **discover-security-issues** flag this sink if it is auth/input injection—did I schedule that pass for code changes?
 
- #### External API checks
+ ### 4) Confirm reproducibility
 
- If the scope includes external API calls, validate:
+ - Two passes per confirmed issue; note variants tried; keep unconfirmed as hypotheses.
 
- - observable health/availability handling,
- - degradation behavior for at least HTTP 429 and 500,
- - actionable error logging (status code, request id, retry count, latency) to avoid silent failures.
+ ### 5) Prioritize
 
- ### 4) Confirm reproducibility
+ - User impact, frequency/exploitability, blast radius; call out integrity, state corruption, silent failure.
+
+ ### 6) Security cross-check (code-affecting)
+
+ - Run **`discover-security-issues`** on the **same** scope; integrate **confirmed** security items (do not duplicate as edge trivia unless distinct).
+
+ ### 7) Report only
+
+ Deliver: (1) Findings—title, severity, evidence, repro, broken invariant; (2) Edge evidence—preconditions, observation, variants; (3) Risk—impact/likelihood/scope; (4) Hardening guidance (advisory); (5) Residual risk—hypotheses, next checks.
+
+ ## Minimum coverage (apply what fits scope)
+
+ - Input validation; boundary behavior; failure/degraded modes; state/idempotency/concurrency/rollback; actionable observability.
+
+ ## Sample hints
+
+ - **Diff**: One new parser → empty string + max length + malformed delimiter **before** “maybe SQL.”
+ - **No diff**: Start at payment/state machine module—highest consequence.
+ - **Handoff**: Five confirmed edges → remediation skill gets **numbered list + repro**—not this skill patching.
+
+ ## References
 
- - Reproduce each confirmed issue at least twice through the same trigger path.
- - For high-risk findings, try nearby variants such as boundary neighbors, empty vs null, malformed vs well-typed invalid input, repeated calls, and stale ordering.
- - Capture the exact command, request, or input together with the observed failure or missing protection.
- - Keep unverified ideas as hypotheses only.
-
- ### 5) Prioritize confirmed findings
-
- - Rank findings by user impact, exploitability or frequency, and blast radius.
- - Call out data-integrity, state corruption, silent failure, retry storm, and cross-module propagation risks explicitly.
- - Prefer fewer, stronger findings over many speculative ones.
-
- ### 6) Report findings only
-
- Deliver:
-
- 1. Findings (highest risk first)
- - Title and severity/priority
- - Evidence (`path:line`)
- - Reproduction steps or triggering input
- - Broken expectation/invariant
- 2. Edge-case evidence
- - Preconditions
- - Observed behavior
- - Reproducibility notes and nearby variant results
- 3. Risk assessment
- - Impact, likelihood, and scope
- - Why this matters in system context
- 4. Hardening guidance (advice only)
- - Recommended fix direction
- - Suggested test coverage to add during remediation
- 5. Residual risk
- - Hypotheses, unknowns, and next validation ideas
-
- ## Minimum Coverage
-
- Apply all relevant checks for the selected scope:
-
- - Input validation: empty/null/malformed/unexpected-type handling
- - Boundary behavior: zero/one/min/max/overflow/ordering edges
- - Failure behavior: timeout, retry, partial dependency failure, degraded mode
- - Stateful behavior: idempotency, replay, concurrency, rollback, duplicate processing
- - Observability: actionable errors and logging for failures that would otherwise be silent
-
- ## Resources
-
- - `references/architecture-edge-cases.md`: cross-module/system-level edge-case checklist.
- - `references/code-edge-cases.md`: code-level input, boundary, and error-path checklist.
+ - `references/architecture-edge-cases.md` system-level checklist.
+ - `references/code-edge-cases.md` code-level input/error/concurrency checklist.
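Editor's sketch (not part of the package): the revised `discover-edge-cases` skill requires each confirmed issue to reproduce at least twice through the same trigger and keeps everything else as a hypothesis. The Python below illustrates that rule under a simplifying assumption: a trigger is modelled as a callable that returns `True` when the faulty behavior is observed. `classify_finding` is a hypothetical helper name.

```python
from typing import Callable


def classify_finding(trigger: Callable[[], bool], attempts: int = 2) -> str:
    """Label a finding 'confirmed' only if every attempt reproduces it, else 'hypothesis'."""
    if attempts < 2:
        raise ValueError("the workflow requires at least two reproduction attempts")
    # Run every attempt, even after a miss, so the record shows all outcomes.
    outcomes = [bool(trigger()) for _ in range(attempts)]
    return "confirmed" if all(outcomes) else "hypothesis"
```

A trigger that fails intermittently therefore stays a hypothesis. Nearby variants (empty vs null, malformed vs wrong-type) would each be separate triggers with their own classification.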
package/discover-edge-cases/agents/openai.yaml CHANGED
@@ -1,4 +1,4 @@
  interface:
  display_name: "Discover Edge Cases"
- short_description: "Discover reproducible edge-case risks and coverage gaps"
- default_prompt: "Use $discover-edge-cases to scan the current diff first (or the full codebase when there is no diff), discard any bias toward code written earlier in the conversation, run $harden-app-security as an adversarial cross-check for code-affecting scope, identify the highest-risk reproducible edge-case findings, validate them with concrete evidence, prioritize the confirmed risks, and report hardening and test recommendations without modifying code."
+ short_description: "Find reproducible edge-case risks with evidence-only reporting"
+ default_prompt: "Use $discover-edge-cases to scan the current diff first (or the full codebase when there is no diff), discard any bias toward code written earlier in the conversation, run $discover-security-issues as an adversarial cross-check for code-affecting scope, identify the highest-risk reproducible edge-case findings, validate them with concrete evidence, prioritize the confirmed risks, and report hardening and test recommendations without modifying code."
@@ -4,6 +4,11 @@ All notable changes to this project will be documented in this file.
4
4
 
5
5
  The format is based on Keep a Changelog and this project follows Semantic Versioning.
6
6
 
7
+ ## [v0.0.3] - 2026-05-06
8
+
9
+ ### Changed
10
+ - Rename skill directory and identifier from `harden-app-security` to `discover-security-issues`; refresh `SKILL.md`, `README.md`, and agent display metadata to match discovery-only semantics.
11
+
7
12
  ## [v0.0.2] - 2026-03-11
8
13
 
9
14
  ### Changed
@@ -0,0 +1,35 @@
1
+ # discover-security-issues
2
+
3
+ Evidence-first, **discovery-only** adversarial security workflow across agent, financial, and general software surfaces.
4
+
5
+ ## What this skill provides
6
+
7
+ - Reproduce exploitable behavior with payloads, requests, and `path:line` proof—**no patches or PRs**.
8
+ - Modules: `agent-system`, `financial-program`, `software-system`, and `combined` (cross-boundary chains).
9
+ - Catalog-driven scenarios (SQLi, XSS, CSRF, SSRF, IDOR, prompt injection, money-path races, …).
10
+ - Prioritized reporting plus advisory hardening notes and residual risk.
11
+
12
+ ## Layout
13
+
14
+ - `SKILL.md` — workflow, modules, output shape.
15
+ - `agents/openai.yaml` — metadata and default prompt.
16
+ - `references/*` — attack catalogs and optional test-pattern snippets.
17
+
18
+ ## Typical use
19
+
20
+ 1. Pick module(s) and trust boundaries.
21
+ 2. Walk selected reference catalogs; record only **double-reproduced** issues.
22
+ 3. Prioritize and report; stop before implementation—hand off confirmed findings if fixes are needed.
23
+
24
+ ## Example
25
+
26
+ ```text
27
+ Use $discover-security-issues in discovery-only mode.
28
+ Module: combined (agent-system + software-system).
29
+ Focus: prompt injection to privileged tools, SQL injection, IDOR.
30
+ Deliver severity-ordered findings with exploit steps and path:line evidence.
31
+ ```
32
+
33
+ ## License
34
+
35
+ MIT. See [LICENSE](LICENSE).
@@ -0,0 +1,88 @@
1
+ ---
2
+ name: discover-security-issues
3
+ description: >-
4
+ Discovery-only adversarial audit: map trust boundaries, run module catalogs (`agent-system`, `financial-program`, `software-system`, `combined`), reproduce exploitable behavior with payloads/commands and `path:line` evidence; prioritize impact × exploitability—**no code edits, no PRs, no auto-remediation**.
5
+ Use for security review, vuln hunting, SQLi/XSS/auth/IDOR checks, agent prompt-injection/tool abuse, and money-path races. **STOP** when the user wants patches shipped and hand off findings… BAD: a single vague “looks fine”… GOOD: two-pass repro, hypothesis vs confirmed…
6
+ ---
7
+
8
+ # Discover Security Issues
9
+
10
+ ## Dependencies
11
+
12
+ - Required: none.
13
+ - Conditional: none.
14
+ - Optional: none.
15
+ - Fallback: not applicable.
16
+
17
+ ## Non-negotiables
18
+
19
+ - **Discovery-only**: **MUST NOT** edit code, apply patches, open PRs, or run “fix workflows.”
20
+ - **MUST** keep only **reproducible** issues with exploit evidence; separate **hypotheses** from **confirmed** findings.
21
+ - **MUST** reproduce each confirmed exploit **at least twice** on the same path; use nearby payload variants for high-risk sinks.
22
+ - **MUST** discard authorship bias—treat all code as untrusted until evidence proves behavior.
23
+
24
+ ## Standards (summary)
25
+
26
+ - **Evidence**: Payload/precondition, observable failure, `path:line`, commands or requests that reproduce.
27
+ - **Execution**: Pick modules → boundaries → scenarios from references → validate → prioritize → report only.
28
+ - **Quality**: Rank by impact, exploitability, reach; unknowns listed under residual risk.
29
+ - **Output**: Findings (severity-ordered), attack evidence, risk notes, hardening **advice** (not patches), residual risk.
30
+
31
+ ## Workflow
32
+
33
+ **Chain-of-thought:** After each step, satisfy **`Pause →`** before continuing; halt on missing scope or contradictory module choice.
34
+
35
+ ### 1) Scope and modules
36
+
37
+ - Choose one or more of: `agent-system`, `financial-program`, `software-system`, `combined` (cross-boundary chains).
38
+ - List untrusted inputs, privileged actions, and protected assets; state invariants that must hold.
39
+ - **Pause →** Which module catalogs did I **open** (file names)—not guessed from memory?
40
+
41
+ ### 2) Execute scenarios from references
42
+
43
+ - **Agent**: `references/agent-attack-catalog.md`; optional `references/security-test-patterns-agent.md` (prompt injection, tool abuse, memory/exfil paths).
44
+ - **Financial**: `references/red-team-extreme-scenarios.md`, `references/risk-checklist.md`; optional `references/security-test-patterns-finance.md` (authz, replay/race, idempotency, precision, lifecycle).
45
+ - **Software**: `references/common-software-attack-catalog.md` (SQL/NoSQL/command injection, XSS, CSRF, SSRF, traversal, upload, session/JWT, IDOR/BOLA, deserialization, misconfig).
46
+ - **Combined**: relevant subsets **plus** chains (e.g. injection → privileged API).
47
+ - **Pause →** Did I record **payload + preconditions + observed behavior** for each candidate—not just “maybe vulnerable”?
48
+
49
+ ### 3) Validate reproducibility
50
+
51
+ - Re-run each confirmed path twice; add encoding/casing/delimiter variants on hot sinks.
52
+ - **Pause →** Is anything still “likely” without a second repro—downgrade to hypothesis?
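
The two-run rule in this step can be sketched in a few lines of Python. This is only an illustration: the `Candidate` record, its `successful_repros` counter, and the particular variants generated are assumptions, not part of the skill.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class Candidate:
    """Hypothetical record for one suspected issue."""
    title: str
    successful_repros: int  # successful runs on the same path (assumed counter)


def classify(candidate: Candidate, required_repros: int = 2) -> str:
    """Keep a finding as confirmed only after the required number of repros."""
    if candidate.successful_repros >= required_repros:
        return "confirmed"
    return "hypothesis"


def payload_variants(payload: str) -> list[str]:
    """Return the payload plus an encoded and a case-swapped variant, deduplicated."""
    variants = [payload, quote(payload, safe=""), payload.swapcase()]
    return list(dict.fromkeys(variants))  # preserves order, drops duplicates


candidates = [
    Candidate("SQLi in search parameter", successful_repros=2),
    Candidate("IDOR on invoice endpoint", successful_repros=1),
]
labels = {c.title: classify(c) for c in candidates}
variants = payload_variants("' OR 1=1")
```

A candidate with a single successful run stays a hypothesis and would be listed under residual risk rather than under findings.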
53
+
54
+ ### 4) Prioritize
55
+
56
+ - Order Critical/High → Medium → Low using impact, exploitability, blast radius (multi-tenant / cross-tenant called out).
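
One way to make this ordering concrete is a small scoring sketch. The 1 to 5 scales, the product score, and the severity thresholds are assumptions chosen for illustration; the skill names the ranking factors but does not define numeric bands.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical finding record; the scales are illustrative assumptions."""
    title: str
    impact: int          # 1 (low) .. 5 (critical)
    exploitability: int  # 1 (hard) .. 5 (trivial)
    cross_tenant: bool = False  # blast radius flag, called out separately


def score(finding: Finding) -> int:
    """Impact multiplied by exploitability."""
    return finding.impact * finding.exploitability


def severity(finding: Finding) -> str:
    """Map the score onto assumed severity bands."""
    s = score(finding)
    if s >= 20:
        return "Critical"
    if s >= 12:
        return "High"
    if s >= 6:
        return "Medium"
    return "Low"


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Highest score first; cross-tenant reach breaks ties."""
    return sorted(findings, key=lambda f: (score(f), f.cross_tenant), reverse=True)


findings = [
    Finding("Verbose error message", impact=2, exploitability=2),
    Finding("Cross-tenant IDOR", impact=5, exploitability=4, cross_tenant=True),
    Finding("Reflected XSS", impact=3, exploitability=4),
]
ordered = [(f.title, severity(f)) for f in prioritize(findings)]
```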
57
+
58
+ ### 5) Report only
59
+
60
+ Deliver (see **Output shape** below): findings, attack evidence, prioritization, hardening guidance (advisory), residual risk.
61
+
62
+ ## Minimum coverage (apply per selected module)
63
+
64
+ - **Core**: trust boundaries, authn/authz, input → dangerous sink paths, secrets/sensitive data handling.
65
+ - **Agent**: prompt/indirect injection, unauthorized tools/actions, exfil, memory poisoning resistance.
66
+ - **Financial**: object-level authz, replay/race/idempotency, precision, oracle/side-effect safety, failure consistency.
67
+ - **Software**: injection families, XSS/CSRF/SSRF, traversal/upload, session/JWT, brute-force/rate limits, debug/CORS/secrets exposure.
68
+ - **Combined**: module checks + realistic cross-boundary chains.
69
+
70
+ ## Output shape
71
+
72
+ 1. **Findings** (high → low): title, severity, evidence (`path:line`), reproduction steps/payload, impacted invariant/asset.
73
+ 2. **Attack evidence**: preconditions, commands/requests, observed insecure behavior, variant results.
74
+ 3. **Risk prioritization**: impact, exploitability, reach; why it matters in **this** system.
75
+ 4. **Hardening guidance** (advice only): fix direction, validation focus post-remediation.
76
+ 5. **Residual risk**: hypotheses, assumptions, follow-up probes.
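
As a rough illustration of the first item in this output shape, a single finding could be rendered as follows. The class names, field names, and the example path are hypothetical; only the `path:line` evidence format comes from the skill text.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    path: str
    line: int

    def ref(self) -> str:
        """Format the location in the `path:line` form the skill asks for."""
        return f"{self.path}:{self.line}"


@dataclass
class ReportEntry:
    """Hypothetical container for one confirmed finding."""
    title: str
    severity: str
    evidence: Evidence
    repro_steps: list[str]
    impacted_asset: str


def render(entry: ReportEntry) -> str:
    """Render one finding as a short Markdown fragment."""
    lines = [
        f"### {entry.title} ({entry.severity})",
        f"- Evidence: `{entry.evidence.ref()}`",
        f"- Impacted asset: {entry.impacted_asset}",
        "- Reproduction:",
    ]
    lines += [f"  {i}. {step}" for i, step in enumerate(entry.repro_steps, 1)]
    return "\n".join(lines)


entry = ReportEntry(
    title="IDOR on order lookup",
    severity="High",
    evidence=Evidence("src/api/orders.py", 42),  # illustrative path, not a real file
    repro_steps=["Log in as user A", "Request an order that belongs to user B"],
    impacted_asset="order records of other users",
)
rendered = render(entry)
```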
77
+
78
+ ## Sample hints
79
+
80
+ - **Module**: Web API + Claude tool-use → `combined` (software + agent); deposits/withdrawals → include `financial-program`.
81
+ - **Evidence**: “SQLi possible” without two runs + exact parameter → stays **hypothesis** until repro’d.
82
+ - **Stop line**: User says “patch it now” → finish report; hand off to implementation skills—**do not** self-patch here.
83
+
84
+ ## References
85
+
86
+ - `references/agent-attack-catalog.md`, `references/security-test-patterns-agent.md`
87
+ - `references/red-team-extreme-scenarios.md`, `references/risk-checklist.md`, `references/security-test-patterns-finance.md`
88
+ - `references/common-software-attack-catalog.md`, `references/test-snippets.md` (optional snippets)
@@ -0,0 +1,4 @@
1
+ interface:
2
+ display_name: "Discover Security Issues"
3
+ short_description: "Discovery-only adversarial audit: reproducible exploits across agent, finance, and software stacks"
4
+ default_prompt: "Use $discover-security-issues to run a discovery-only adversarial audit. Reproduce exploitable vulnerabilities with concrete evidence and severity prioritization across agent-system, financial-program, and software-system scopes (including SQL injection and common web flaws). Do not apply code fixes or PR actions."
@@ -11,9 +11,9 @@ description: >-
11
11
  ## Dependencies
12
12
 
13
13
  - Required: `test-case-strategy` for risk selection, oracles, drift checks.
14
- - Conditional: **`generate-spec`** when spec triggers below fire; **`recover-missing-plan`** when user-named `docs/plans/...` is missing/archived/mismatched.
14
+ - Conditional: **`generate-spec`** when spec triggers below fire; **`recover-missing-plan`** when user-named `docs/plans/...` is missing/archived/mismatched; **`commit-and-push`** when the user requests **git commit** and/or **push** to persist completed work—**MUST** delegate final submission to **`commit-and-push`** (often via **`implement-specs`** / **`implement-specs-with-worktree`** when a spec path is active).
15
15
  - Optional: none.
16
- - Fallback: **`test-case-strategy`** unavailable ⇒ **stop**. Spec path required but **`generate-spec`** unavailable ⇒ **stop**.
16
+ - Fallback: **`test-case-strategy`** unavailable ⇒ **stop**. Spec path required but **`generate-spec`** unavailable ⇒ **stop**. If the user requested **commit/push** and **`commit-and-push`** is unavailable, **MUST** stop and report.
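
The stop conditions in the updated Fallback line amount to a small gate. The sketch below assumes the caller already knows which skills are available and what the user asked for; the function name and its parameters are hypothetical.

```python
def missing_dependencies(
    available: set[str],
    needs_spec: bool,
    wants_commit_or_push: bool,
) -> list[str]:
    """Return the skills whose absence forces a stop; an empty list means proceed."""
    missing = []
    # Always required.
    if "test-case-strategy" not in available:
        missing.append("test-case-strategy")
    # Required only when the spec path is active.
    if needs_spec and "generate-spec" not in available:
        missing.append("generate-spec")
    # Required only when the user asked to commit and/or push.
    if wants_commit_or_push and "commit-and-push" not in available:
        missing.append("commit-and-push")
    return missing


blocked = missing_dependencies(
    {"test-case-strategy"}, needs_spec=False, wants_commit_or_push=True
)
clear = missing_dependencies(
    {"test-case-strategy", "commit-and-push"}, needs_spec=False, wants_commit_or_push=True
)
```

A non-empty result corresponds to the "stop and report" branch; nothing is substituted for the missing skill.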
17
17
 
18
18
  ## Non-negotiables
19
19
 
@@ -1,19 +1,19 @@
1
1
  ---
2
2
  name: implement-specs
3
3
  description: >-
4
- Land an approved `docs/plans/{YYYY-MM-DD}/{change}` (or batch member path) on the currently checked-out branch: read the full planning bundle + `coordination.md` when relevant, execute every in-scope `tasks.md` item, backfill honest checklist/spec state, commit locally—**do not** create branches/worktrees or push unless the user explicitly widens the request mid-thread.
4
+ Land an approved `docs/plans/{YYYY-MM-DD}/{change}` (or batch member path) on the currently checked-out branch: read the full planning bundle + `coordination.md` when relevant, execute every in-scope `tasks.md` item, backfill honest checklist/spec state, then **finalize through `commit-and-push`**—**do not** create branches/worktrees or widen to push/release unless the user explicitly asks mid-thread.
5
5
  Choose this for “implement on this branch” scenarios. If isolation is required use **`implement-specs-with-worktree`**; if multiple specs need delegated workers use **`implement-specs-with-subagents`**.
6
- Good: stay on `feature/foo`, finish tasks, `git commit`. Bad: `git worktree add` purely to avoid dirty trees—wrong skill unless user re-scoped.
6
+ Good: stay on `feature/foo`, finish tasks, run **`commit-and-push`**. Bad: `git worktree add` purely to avoid dirty trees—wrong skill unless user re-scoped.
7
7
  ---
8
8
 
9
9
  # Implement Specs
10
10
 
11
11
  ## Dependencies
12
12
 
13
- - Required: `enhance-existing-features` and `develop-new-features` for implementation standards.
13
+ - Required: `enhance-existing-features` and `develop-new-features` for implementation standards; **`commit-and-push`** for the **final** implementation commit (and push when the user explicitly requests remote update).
14
14
  - Conditional: `generate-spec` if spec files need clarification or updates; `recover-missing-plan` if the requested plan path is missing from the current checkout.
15
15
  - Optional: none.
16
- - Fallback: If `enhance-existing-features` or `develop-new-features` is unavailable, **MUST** stop immediately and report the missing dependency. Do not improvise substitute standards.
16
+ - Fallback: If **`enhance-existing-features`**, **`develop-new-features`**, or **`commit-and-push`** is unavailable, **MUST** stop immediately and report the missing dependency. Do not improvise substitute standards or ungated `git commit`.
17
17
 
18
18
  ## Non-negotiables
19
19
 
@@ -21,8 +21,8 @@ description: >-
21
21
  - **MUST NOT** create a branch, switch branches, or add or use a `git worktree` for this work unless the user explicitly changes the request in the same conversation.
22
22
  - **MUST** treat the approved `tasks.md` / contracts as the scope boundary: complete every item that is in scope for this request, run the relevant tests, and **MUST** backfill the planning documents with factual completion status (no aspirational checkboxes).
23
23
  - **MUST NOT** expand scope to unrelated sibling spec directories solely because they share a batch folder.
24
- - **MUST** commit the finished work to the **current** branch as a focused implementation commit (split only when an unavoidable checkpoint is required); the combined result **MUST** contain only the intended changes.
25
- - **MUST NOT** `git push`, tag, or perform release steps unless the user explicitly asks.
24
+ - **MUST** finalize the implementation through **`commit-and-push`** after staging the intended change set (shared readiness, reviews per that skill’s classification, conventional commit message); **MUST NOT** complete the deliverable with a bare `git commit`, IDE-only commit, or other shortcut that skips **`submission-readiness-check`** / mandated gates.
25
+ - **MUST NOT** `git push`, tag, or perform release steps **outside** **`commit-and-push`** (unless **`version-release`** / **`open-source-pr-workflow`** explicitly applies per user request).
26
26
  - If the plan path is missing or ambiguous: **MUST** use `recover-missing-plan` or other verifiable repository evidence to locate the authoritative plan; **MUST NOT** substitute a nearby path by guess. After recovery, **MUST** re-read the recovered files before coding so implementation and backfill target the same snapshot.
27
27
 
28
28
  ## Standards (summary)
@@ -55,8 +55,8 @@ description: >-
55
55
  - **Pause →** If I checked a box, can I point to **commit + test run** (or equivalent) that makes that check true—no wishful checking?
56
56
  - **Pause →** Did any scope shrink or shift during implementation; if so, is the plan text updated **honestly**?
57
57
 
58
- 5. **Commit** — Commit on the current branch; keep the diff limited to this spec’s intent.
59
- - **Pause →** Does `git diff` show only this spec’s intended surface, or do I need to revert irrelevant noise first?
58
+ 5. **Submit** — Stage the intended implementation/backfill diff. Run **`commit-and-push`** through commit using that staged intent (and **push** only when the user explicitly requested remote update). Keep scope to this spec only; split into multiple submission passes only when an unavoidable checkpoint requires separate commits.
59
+ - **Pause →** Does `git diff --cached` (or the equivalent staged view) show only this spec’s intended surface, or do I need to unstage/revert noise first?
60
60
  - **Pause →** Am I on the **same** branch I named in step 2, without a silent branch switch?
61
61
 
62
62
  6. **Report** — State current branch, commit hash, tests run, and which plan files were backfilled.
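
The staged-scope pause in step 5 can be approximated with a small helper. This is a sketch under assumptions: the allowed prefixes are supplied by the caller, the example paths are hypothetical, and the check is not part of `commit-and-push` itself. It relies only on `git diff --cached --name-only`, which lists staged file paths.

```python
import subprocess


def staged_paths(repo_dir: str = ".") -> list[str]:
    """List staged file paths using `git diff --cached --name-only`."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def out_of_scope(paths: list[str], allowed_prefixes: list[str]) -> list[str]:
    """Return staged paths that fall outside every allowed file or directory prefix."""
    def in_scope(path: str) -> bool:
        return any(
            path == prefix or path.startswith(prefix.rstrip("/") + "/")
            for prefix in allowed_prefixes
        )
    return [p for p in paths if not in_scope(p)]


# Illustrative data only; in a real checkout the first argument would be staged_paths().
noise = out_of_scope(
    ["src/feature/handler.py", "docs/plans/2026-01-01/example-change/tasks.md", "README.md"],
    ["src/feature", "docs/plans/2026-01-01/example-change"],
)
```

Anything returned by `out_of_scope` would need to be unstaged or reverted before the submission step runs.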
@@ -76,3 +76,4 @@ If this skill directory contains `references/implement-specs-common.md`, treat i
76
76
  - `enhance-existing-features`: brownfield implementation standards
77
77
  - `develop-new-features`: greenfield implementation standards
78
78
  - `recover-missing-plan`: missing or mismatched plan recovery
79
+ - **`commit-and-push`**: final commit/readiness (push only when user requests remote update)