npm - mindforge-cc - Versions diffs - 11.5.1 → 11.6.0 - Mend

mindforge-cc 11.5.1 → 11.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (170)

package/.agent/mindforge/skill-tdd.md +53 -0
package/.agent/mindforge/skills-index.md +118 -0
package/.agent/mindforge/systematic-debug.md +60 -0
package/.agent/skills/1password-skill/SKILL.md +156 -0
package/.agent/skills/1password-skill/references/cli-examples.md +31 -0
package/.agent/skills/1password-skill/references/get-started.md +21 -0
package/.agent/skills/article-illustrator/SKILL.md +199 -0
package/.agent/skills/article-illustrator/references/prompt-construction.md +426 -0
package/.agent/skills/article-illustrator/references/style-presets.md +80 -0
package/.agent/skills/article-illustrator/references/styles.md +224 -0
package/.agent/skills/article-illustrator/references/usage.md +50 -0
package/.agent/skills/article-illustrator/references/workflow.md +332 -0
package/.agent/skills/arxiv/SKILL.md +275 -0
package/.agent/skills/blogwatcher/SKILL.md +130 -0
package/.agent/skills/code-wiki/SKILL.md +438 -0
package/.agent/skills/code-wiki/templates/README.md +31 -0
package/.agent/skills/code-wiki/templates/architecture.md +30 -0
package/.agent/skills/code-wiki/templates/getting-started.md +47 -0
package/.agent/skills/code-wiki/templates/module.md +38 -0
package/.agent/skills/codebase-inspection/SKILL.md +109 -0
package/.agent/skills/comic-creator/SKILL.md +240 -0
package/.agent/skills/comic-creator/references/analysis-framework.md +176 -0
package/.agent/skills/comic-creator/references/auto-selection.md +71 -0
package/.agent/skills/comic-creator/references/base-prompt.md +98 -0
package/.agent/skills/comic-creator/references/character-template.md +180 -0
package/.agent/skills/comic-creator/references/ohmsha-guide.md +85 -0
package/.agent/skills/comic-creator/references/partial-workflows.md +106 -0
package/.agent/skills/comic-creator/references/storyboard-template.md +143 -0
package/.agent/skills/comic-creator/references/workflow.md +401 -0
package/.agent/skills/concept-diagrams/SKILL.md +355 -0
package/.agent/skills/concept-diagrams/references/dashboard-patterns.md +43 -0
package/.agent/skills/concept-diagrams/references/infrastructure-patterns.md +144 -0
package/.agent/skills/concept-diagrams/references/physical-shape-cookbook.md +42 -0
package/.agent/skills/creative-ideation/SKILL.md +144 -0
package/.agent/skills/creative-ideation/references/full-prompt-library.md +110 -0
package/.agent/skills/devops-cli/SKILL.md +149 -0
package/.agent/skills/devops-cli/references/app-discovery.md +112 -0
package/.agent/skills/devops-cli/references/authentication.md +59 -0
package/.agent/skills/devops-cli/references/cli-reference.md +104 -0
package/.agent/skills/devops-cli/references/running-apps.md +171 -0
package/.agent/skills/devops-watchers/SKILL.md +103 -0
package/.agent/skills/docker-management/SKILL.md +273 -0
package/.agent/skills/domain-intel/SKILL.md +96 -0
package/.agent/skills/duckduckgo-search/SKILL.md +230 -0
package/.agent/skills/github-auth/SKILL.md +240 -0
package/.agent/skills/github-code-review/SKILL.md +474 -0
package/.agent/skills/github-code-review/references/review-output-template.md +74 -0
package/.agent/skills/github-issues/SKILL.md +363 -0
package/.agent/skills/github-issues/templates/bug-report.md +35 -0
package/.agent/skills/github-issues/templates/feature-request.md +31 -0
package/.agent/skills/github-pr-workflow/SKILL.md +360 -0
package/.agent/skills/github-pr-workflow/references/ci-troubleshooting.md +183 -0
package/.agent/skills/github-pr-workflow/references/conventional-commits.md +71 -0
package/.agent/skills/github-pr-workflow/templates/pr-body-bugfix.md +35 -0
package/.agent/skills/github-pr-workflow/templates/pr-body-feature.md +33 -0
package/.agent/skills/github-repo-management/SKILL.md +509 -0
package/.agent/skills/github-repo-management/references/github-api-cheatsheet.md +161 -0
package/.agent/skills/godmode/SKILL.md +396 -0
package/.agent/skills/godmode/references/jailbreak-templates.md +128 -0
package/.agent/skills/godmode/references/refusal-detection.md +142 -0
package/.agent/skills/hyperframes/SKILL.md +182 -0
package/.agent/skills/hyperframes/references/cli.md +185 -0
package/.agent/skills/hyperframes/references/composition.md +129 -0
package/.agent/skills/hyperframes/references/features.md +289 -0
package/.agent/skills/hyperframes/references/gsap.md +136 -0
package/.agent/skills/hyperframes/references/troubleshooting.md +137 -0
package/.agent/skills/hyperframes/references/website-to-video.md +145 -0
package/.agent/skills/jupyter-live-kernel/SKILL.md +160 -0
package/.agent/skills/kanban-orchestrator/SKILL.md +209 -0
package/.agent/skills/kanban-worker/SKILL.md +188 -0
package/.agent/skills/llm-wiki/SKILL.md +499 -0
package/.agent/skills/meme-generation/SKILL.md +122 -0
package/.agent/skills/node-inspect-debugger/SKILL.md +312 -0
package/.agent/skills/obsidian/SKILL.md +60 -0
package/.agent/skills/osint-investigation/SKILL.md +269 -0
package/.agent/skills/osint-investigation/templates/source-template.md +59 -0
package/.agent/skills/oss-forensics/SKILL.md +422 -0
package/.agent/skills/oss-forensics/references/evidence-types.md +89 -0
package/.agent/skills/oss-forensics/references/github-archive-guide.md +184 -0
package/.agent/skills/oss-forensics/references/investigation-templates.md +131 -0
package/.agent/skills/oss-forensics/references/recovery-techniques.md +164 -0
package/.agent/skills/oss-forensics/templates/forensic-report.md +151 -0
package/.agent/skills/oss-forensics/templates/malicious-package-report.md +43 -0
package/.agent/skills/parallel-cli/SKILL.md +384 -0
package/.agent/skills/pinggy-tunnel/SKILL.md +302 -0
package/.agent/skills/pixel-art/SKILL.md +209 -0
package/.agent/skills/pixel-art/references/palettes.md +49 -0
package/.agent/skills/plan/SKILL.md +331 -0
package/.agent/skills/polymarket/SKILL.md +75 -0
package/.agent/skills/polymarket/references/api-endpoints.md +220 -0
package/.agent/skills/python-debugpy/SKILL.md +368 -0
package/.agent/skills/requesting-code-review/SKILL.md +273 -0
package/.agent/skills/research-paper-writing/SKILL.md +2367 -0
package/.agent/skills/research-paper-writing/references/autoreason-methodology.md +394 -0
package/.agent/skills/research-paper-writing/references/checklists.md +434 -0
package/.agent/skills/research-paper-writing/references/citation-workflow.md +563 -0
package/.agent/skills/research-paper-writing/references/experiment-patterns.md +728 -0
package/.agent/skills/research-paper-writing/references/human-evaluation.md +476 -0
package/.agent/skills/research-paper-writing/references/paper-types.md +481 -0
package/.agent/skills/research-paper-writing/references/reviewer-guidelines.md +433 -0
package/.agent/skills/research-paper-writing/references/sources.md +191 -0
package/.agent/skills/research-paper-writing/references/writing-guide.md +474 -0
package/.agent/skills/research-paper-writing/templates/README.md +251 -0
package/.agent/skills/rest-graphql-debug/SKILL.md +507 -0
package/.agent/skills/s6-container-supervision/SKILL.md +171 -0
package/.agent/skills/scrapling/SKILL.md +328 -0
package/.agent/skills/sherlock/SKILL.md +186 -0
package/.agent/skills/simplify-code/SKILL.md +168 -0
package/.agent/skills/skill-authoring/SKILL.md +158 -0
package/.agent/skills/spike/SKILL.md +190 -0
package/.agent/skills/subagent-driven-development/SKILL.md +345 -0
package/.agent/skills/subagent-driven-development/references/context-budget-discipline.md +53 -0
package/.agent/skills/subagent-driven-development/references/gates-taxonomy.md +93 -0
package/.agent/skills/systematic-debugging/SKILL.md +360 -0
package/.agent/skills/test-driven-development/SKILL.md +336 -0
package/.agent/skills/video-orchestrator/SKILL.md +194 -0
package/.agent/skills/video-orchestrator/references/examples.md +227 -0
package/.agent/skills/video-orchestrator/references/intake.md +166 -0
package/.agent/skills/video-orchestrator/references/kanban-setup.md +278 -0
package/.agent/skills/video-orchestrator/references/monitoring.md +180 -0
package/.agent/skills/video-orchestrator/references/role-archetypes.md +298 -0
package/.agent/skills/video-orchestrator/references/tool-matrix.md +317 -0
package/.agent/skills/web-pentest/SKILL.md +332 -0
package/.agent/skills/web-pentest/references/bypass-techniques.md +133 -0
package/.agent/skills/web-pentest/references/exploitation-techniques.md +204 -0
package/.agent/skills/web-pentest/references/scope-enforcement.md +110 -0
package/.agent/skills/web-pentest/references/vuln-taxonomy.md +81 -0
package/.agent/skills/web-pentest/templates/authorization.md +69 -0
package/.agent/skills/web-pentest/templates/pentest-report.md +178 -0
package/.claude/commands/mindforge/skill-tdd.md +53 -0
package/.claude/commands/mindforge/skills-index.md +118 -0
package/.claude/commands/mindforge/systematic-debug.md +60 -0
package/.mindforge/config.json +2 -2
package/.mindforge/memory/sync-manifest.json +1 -1
package/.mindforge/skills/arxiv/SKILL.md +294 -0
package/.mindforge/skills/blogwatcher/SKILL.md +147 -0
package/.mindforge/skills/code-wiki/SKILL.md +457 -0
package/.mindforge/skills/codebase-inspection/SKILL.md +126 -0
package/.mindforge/skills/concept-diagrams/SKILL.md +373 -0
package/.mindforge/skills/creative-ideation/SKILL.md +162 -0
package/.mindforge/skills/domain-intel/SKILL.md +116 -0
package/.mindforge/skills/duckduckgo-search/SKILL.md +249 -0
package/.mindforge/skills/github-code-review/SKILL.md +493 -0
package/.mindforge/skills/github-issues/SKILL.md +382 -0
package/.mindforge/skills/github-pr-workflow/SKILL.md +379 -0
package/.mindforge/skills/jupyter-live-kernel/SKILL.md +179 -0
package/.mindforge/skills/kanban-orchestrator/SKILL.md +227 -0
package/.mindforge/skills/kanban-worker/SKILL.md +206 -0
package/.mindforge/skills/meme-generation/SKILL.md +141 -0
package/.mindforge/skills/obsidian/SKILL.md +80 -0
package/.mindforge/skills/osint-investigation/SKILL.md +288 -0
package/.mindforge/skills/oss-forensics/SKILL.md +421 -0
package/.mindforge/skills/pixel-art/SKILL.md +228 -0
package/.mindforge/skills/plan/SKILL.md +350 -0
package/.mindforge/skills/requesting-code-review/SKILL.md +292 -0
package/.mindforge/skills/research-paper-writing/SKILL.md +2384 -0
package/.mindforge/skills/scrapling/SKILL.md +345 -0
package/.mindforge/skills/sherlock/SKILL.md +203 -0
package/.mindforge/skills/simplify-code/SKILL.md +187 -0
package/.mindforge/skills/spike/SKILL.md +209 -0
package/.mindforge/skills/subagent-driven-development/SKILL.md +364 -0
package/.mindforge/skills/systematic-debugging/SKILL.md +379 -0
package/.mindforge/skills/test-driven-development/SKILL.md +355 -0
package/.mindforge/skills/web-pentest/SKILL.md +327 -0
package/CHANGELOG.md +43 -0
package/MINDFORGE.md +2 -2
package/README.md +39 -3
package/RELEASENOTES.md +55 -0
package/docs/getting-started.md +42 -5
package/package.json +1 -1

package/.mindforge/skills/test-driven-development/SKILL.md ADDED

@@ -0,0 +1,355 @@
+---
+name: test-driven-development
+description: "TDD: enforce RED-GREEN-REFACTOR, tests before code."
+version: 1.1.0
+status: stable
+min_mindforge_version: 11.5.1
+triggers: test driven development, tdd methodology, red green refactor, write test first, test before code, failing test first, tdd cycle, write failing test, make test pass, red-green-refactor, test first approach
+---
+# Test-Driven Development (TDD)
+## Overview
+Write the test first. Watch it fail. Write minimal code to pass.
+**Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.
+**Violating the letter of the rules is violating the spirit of the rules.**
+## When to Use
+**Always:**
+- New features
+- Bug fixes
+- Refactoring
+- Behavior changes
+**Exceptions (ask the user first):**
+- Throwaway prototypes
+- Generated code
+- Configuration files
+Thinking "skip TDD just this once"? Stop. That's rationalization.
+## The Iron Law
+```
+NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
+```
+Write code before the test? Delete it. Start over.
+**No exceptions:**
+- Don't keep it as "reference"
+- Don't "adapt" it while writing tests
+- Don't look at it
+- Delete means delete
+Implement fresh from tests. Period.
+## Red-Green-Refactor Cycle
+### RED — Write Failing Test
+Write one minimal test showing what should happen.
+**Good test:**
+```python
+def test_retries_failed_operations_3_times():
+    attempts = 0
+    def operation():
+        nonlocal attempts
+        attempts += 1
+        if attempts < 3:
+            raise Exception('fail')
+        return 'success'
+    result = retry_operation(operation)
+    assert result == 'success'
+    assert attempts == 3
+```
+Clear name, tests real behavior, one thing.
+**Bad test:**
+```python
+def test_retry_works():
+    mock = MagicMock()
+    mock.side_effect = [Exception(), Exception(), 'success']
+    result = retry_operation(mock)
+    assert result == 'success'  # What about retry count? Timing?
+```
+Vague name, tests mock not real code.
+**Requirements:**
+- One behavior per test
+- Clear descriptive name ("and" in name? Split it)
+- Real code, not mocks (unless truly unavoidable)
+- Name describes behavior, not implementation
+### Verify RED — Watch It Fail
+**MANDATORY. Never skip.**
+```bash
+# Use terminal tool to run the specific test
+pytest tests/test_feature.py::test_specific_behavior -v
+```
+Confirm:
+- Test fails (a real failure, not an error caused by a typo)
+- Failure message is expected
+- Fails because the feature is missing
+**Test passes immediately?** You're testing existing behavior. Fix the test.
+**Test errors?** Fix the error, re-run until it fails correctly.
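The distinction between "fails" and "errors" can be partly read off pytest's exit code. The sketch below is only an illustration: `PytestExit` mirrors the documented `pytest.ExitCode` values, while `classify_red_run` is a hypothetical helper that is not part of this skill. Exit code 1 also covers tests that raise unexpected exceptions, so the failure message still has to be read.

```python
from enum import IntEnum


class PytestExit(IntEnum):
    """Exit codes documented by pytest (same values as pytest.ExitCode)."""
    OK = 0
    TESTS_FAILED = 1
    INTERRUPTED = 2
    INTERNAL_ERROR = 3
    USAGE_ERROR = 4
    NO_TESTS_COLLECTED = 5


def classify_red_run(exit_code: int) -> str:
    """Hypothetical helper: interpret a pytest run made during Verify RED."""
    if exit_code == PytestExit.TESTS_FAILED:
        return "red: test ran and failed; still read the failure message"
    if exit_code == PytestExit.OK:
        return "not red: test passed immediately; fix the test"
    if exit_code == PytestExit.NO_TESTS_COLLECTED:
        return "not red: no test matched; check the test path or name"
    # Interrupted runs, internal errors, and usage errors never reached the test.
    return "not red: run did not complete; fix the error and re-run"
```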
+### GREEN — Minimal Code
+Write the simplest code to pass the test. Nothing more.
+**Good:**
+```python
+def add(a, b):
+    return a + b  # Nothing extra
+```
+**Bad:**
+```python
+def add(a, b):
+    result = a + b
+    logging.info(f"Adding {a} + {b} = {result}")  # Extra!
+    return result
+```
+Don't add features, refactor other code, or "improve" beyond the test.
+**Cheating is OK in GREEN:**
+- Hardcode return values
+- Copy-paste
+- Duplicate code
+- Skip edge cases
+We'll fix it in REFACTOR.
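As a sketch of what GREEN might look like for the retry test in the RED section: the skill does not show `retry_operation`, so the body below is an assumption, written to do only what that test demands (three attempts, return the first success).

```python
MAX_ATTEMPTS = 3  # assumption: taken from the test name, not from the skill


def retry_operation(operation):
    """Call operation up to MAX_ATTEMPTS times and return its first success."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return operation()
        except Exception:
            # Re-raise only when the final attempt has also failed.
            if attempt == MAX_ATTEMPTS:
                raise
```

Delays, backoff, and configurable limits are deliberately absent; each would need its own failing test first.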
+### Verify GREEN — Watch It Pass
+**MANDATORY.**
+```bash
+# Run the specific test
+pytest tests/test_feature.py::test_specific_behavior -v
+# Then run ALL tests to check for regressions
+pytest tests/ -q
+```
+Confirm:
+- Test passes
+- Other tests still pass
+- Output pristine (no errors, warnings)
+**Test fails?** Fix the code, not the test.
+**Other tests fail?** Fix regressions now.
+### REFACTOR — Clean Up
+After green only:
+- Remove duplication
+- Improve names
+- Extract helpers
+- Simplify expressions
+Keep tests green throughout. Don't add behavior.
+**If tests fail during refactor:** Undo immediately. Take smaller steps.
+### Repeat
+Next failing test for next behavior. One cycle at a time.
+## Why Order Matters
+**"I'll write tests after to verify it works"**
+Tests written after code pass immediately. Passing immediately proves nothing:
+- Might test the wrong thing
+- Might test implementation, not behavior
+- Might miss edge cases you forgot
+- You never saw it catch the bug
+Test-first forces you to see the test fail, proving it actually tests something.
+**"I already manually tested all the edge cases"**
+Manual testing is ad-hoc. You think you tested everything but:
+- No record of what you tested
+- Can't re-run when code changes
+- Easy to forget cases under pressure
+- "It worked when I tried it" ≠ comprehensive
+Automated tests are systematic. They run the same way every time.
+**"Deleting X hours of work is wasteful"**
+Sunk cost fallacy. The time is already gone. Your choice now:
+- Delete and rewrite with TDD (high confidence)
+- Keep it and add tests after (low confidence, likely bugs)
+The "waste" is keeping code you can't trust.
+**"TDD is dogmatic, being pragmatic means adapting"**
+TDD IS pragmatic:
+- Finds bugs before commit (faster than debugging after)
+- Prevents regressions (tests catch breaks immediately)
+- Documents behavior (tests show how to use code)
+- Enables refactoring (change freely, tests catch breaks)
+"Pragmatic" shortcuts = debugging in production = slower.
+**"Tests after achieve the same goals — it's spirit not ritual"**
+No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"
+Tests-after are biased by your implementation. You test what you built, not what's required. Tests-first force edge case discovery before implementing.
+## Common Rationalizations
+| Excuse | Reality |
+|--------|---------|
+| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
+| "I'll test after" | Tests passing immediately prove nothing. |
+| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
+| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
+| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
+| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
+| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
+| "Test hard = design unclear" | Listen to the test. Hard to test = hard to use. |
+| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
+| "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. |
+| "Existing code has no tests" | You're improving it. Add tests for the code you touch. |
+## Red Flags — STOP and Start Over
+If you catch yourself doing any of these, delete the code and restart with TDD:
+- Code before test
+- Test after implementation
+- Test passes immediately on first run
+- Can't explain why test failed
+- Tests added "later"
+- Rationalizing "just this once"
+- "I already manually tested it"
+- "Tests after achieve the same purpose"
+- "Keep as reference" or "adapt existing code"
+- "Already spent X hours, deleting is wasteful"
+- "TDD is dogmatic, I'm being pragmatic"
+- "This is different because..."
+**All of these mean: Delete code. Start over with TDD.**
+## Verification Checklist
+Before marking work complete:
+- [ ] Every new function/method has a test
+- [ ] Watched each test fail before implementing
+- [ ] Each test failed for expected reason (feature missing, not typo)
+- [ ] Wrote minimal code to pass each test
+- [ ] All tests pass
+- [ ] Output pristine (no errors, warnings)
+- [ ] Tests use real code (mocks only if unavoidable)
+- [ ] Edge cases and errors covered
+Can't check all boxes? You skipped TDD. Start over.
+## When Stuck
+| Problem | Solution |
+|---------|----------|
+| Don't know how to test | Write the wished-for API. Write the assertion first. Ask the user. |
+| Test too complicated | Design too complicated. Simplify the interface. |
+| Must mock everything | Code too coupled. Use dependency injection. |
+| Test setup huge | Extract helpers. Still complex? Simplify the design. |
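The "use dependency injection" advice in the table can be illustrated with a small sketch. `retry_with_delay` and its parameters are hypothetical names, not part of this skill; the point is that the pause function is passed in, so the test records the pauses with a plain list instead of patching `time.sleep`.

```python
import time


def retry_with_delay(operation, attempts=3, delay_seconds=0.5, sleep=time.sleep):
    """Retry operation, pausing between attempts via an injected sleep function."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay_seconds)


def test_waits_between_failed_attempts():
    pauses = []  # stands in for the clock: records each requested pause
    calls = 0

    def flaky():
        nonlocal calls
        calls += 1
        if calls < 3:
            raise RuntimeError("fail")
        return "success"

    result = retry_with_delay(flaky, sleep=pauses.append)
    assert result == "success"
    assert pauses == [0.5, 0.5]
```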
+## Integration
+### Running Tests
+Use the `terminal` tool to run tests at each step:
+```python
+# RED — verify failure
+terminal("pytest tests/test_feature.py::test_name -v")
+# GREEN — verify pass
+terminal("pytest tests/test_feature.py::test_name -v")
+# Full suite — verify no regressions
+terminal("pytest tests/ -q")
+```
+### With delegate_task
+When dispatching subagents for implementation, enforce TDD in the goal:
+```python
+delegate_task(
+    goal="Implement [feature] using strict TDD",
+    context="""
+    Follow test-driven-development skill:
+    1. Write failing test FIRST
+    2. Run test to verify it fails
+    3. Write minimal code to pass
+    4. Run test to verify it passes
+    5. Refactor if needed
+    6. Commit
+    Project test command: pytest tests/ -q
+    Project structure: [describe relevant files]
+    """,
+    toolsets=['terminal', 'file']
+)
+```
+### With systematic-debugging
+Bug found? Write failing test reproducing it. Follow TDD cycle. The test proves the fix and prevents regression.
+Never fix bugs without a test.
+## Testing Anti-Patterns
+- **Testing mock behavior instead of real behavior** — mocks should verify interactions, not replace the system under test
+- **Testing implementation details** — test behavior/results, not internal method calls
+- **Happy path only** — always test edge cases, errors, and boundaries
+- **Brittle tests** — tests should verify behavior, not structure; refactoring shouldn't break them
+## Final Rule
+```
+Production code → test exists and failed first
+Otherwise → not TDD
+```
+No exceptions without the user's explicit permission.
+## Mandatory actions when this skill is active
+Before applying this skill:
+- [ ] Read the task requirements fully before acting
+- [ ] Confirm you understand the goal and constraints
+- [ ] Check for existing work or prior context in the codebase
+While working:
+- [ ] Follow the methodology described above step by step
+- [ ] Document any decisions or findings as you go
+After completing:
+- [ ] Self-check: does the output satisfy the original requirement?
+- [ ] Verify no regressions or unintended side effects

package/.mindforge/skills/web-pentest/SKILL.md ADDED

@@ -0,0 +1,327 @@
+---
+name: web-pentest
+description: "Authorized web application penetration testing — reconnaissance, vulnerability analysis, proof-based exploitation, and professional reporting."
+version: 1.0.0
+status: stable
+min_mindforge_version: 11.5.1
+triggers: web penetration test, pentest this app, security test web app, OWASP test, authorized pentest, web application pentest, web security testing, penetration testing, pentest web, webapp security test, pentest application, web app security audit
+---
+# Web Application Penetration Testing
+A phased pentesting workflow for running web applications.
+Built around three rules:
+1. No exploit, no report — every finding requires reproducible evidence.
+2. Bounded scope — every active request goes against a target the operator
+   pre-declared. Off-scope hosts are refused.
+3. Bypass exhaustion before false-positive dismissal — a "blocked" payload
+   is not a clean bill of health until you've tried the bypass set.
+---
+## ⚠️ Hard Guardrails — Read Before Every Engagement
+Violating any of these invalidates the engagement and may be illegal.
+1. **Authorization gate.** Before the first active scan in a session, you
+   MUST confirm with the user, in writing, that they own or have written
+   authorization to test the target. Record the acknowledgement in
+   `engagement/authorization.md` (see template). No acknowledgement → no
+   active scanning. Reading public pages with `curl` is fine; sending
+   payloads is not.
+2. **Scope allowlist.** Maintain `engagement/scope.txt` — one hostname or
+   CIDR per line. Every `nmap`, `curl`, `whatweb`, browser navigation, or
+   payload-bearing request MUST be against an entry in scope. If a target
+   redirects you off-scope (3xx to a different host, a link in HTML),
+   STOP and confirm with the user before following.
+3. **No production systems without paper.** If the user hasn't told you
+   "yes, prod is in scope and I have written sign-off," assume not. Default
+   targets are staging, local docker, dedicated test instances.
+4. **Cloud metadata is off by default.** Do not probe `169.254.169.254`,
+   `metadata.google.internal`, `100.100.100.200`, `[fd00:ec2::254]`, or
+   equivalent unless the engagement explicitly includes SSRF-to-metadata
+   as a goal AND the target is one you control. The agent's browser tool
+   can reach these from inside your own infrastructure — don't.
+5. **Destructive payloads need approval.** SQLi payloads that DROP/DELETE,
+   filesystem-write SSTI, command injection with `rm`/`shutdown`/`mkfs`,
+   anything that mutates beyond a single test row → ASK FIRST. The
+   `approval.py` system catches some; don't rely on it alone.
+6. **Aux-client leakage risk.** This skill produces sessions full of SQLi/XSS/RCE
+   payloads, captured credentials, and JWT tokens. Anything sensitive you write to
+   the conversation can be replayed in context compression passes.
+   Mitigation:
+   - Redact captured tokens/credentials to the LAST 6 CHARS before logging
+     them in any message. Full values go to `engagement/evidence/` files,
+     never into chat history.
+7. **Rate limit yourself.** Default 200ms between active requests against
+   any single host. The recon-scan.sh script enforces this. Don't bypass
+   it without operator approval.
+8. **Authority of the report.** This skill produces a security
+   assessment, not a "PASS." Even a clean run is "no exploitable issues
+   FOUND in scope X within time T using methods Y" — not "the application
+   is secure." Mirror that language in the report.
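Guardrail 7's 200 ms spacing could also be enforced in-process. The class below is a minimal sketch under that assumption; `HostRateLimiter` is a hypothetical name and says nothing about how `recon-scan.sh` implements its limit. The clock and sleep functions are injected so the behavior can be tested without real waiting.

```python
import time


class HostRateLimiter:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, min_interval=0.2, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_sent = {}  # host -> time the last request was released

    def wait(self, host):
        """Block until at least min_interval has passed since the last request to host."""
        now = self._clock()
        last = self._last_sent.get(host)
        if last is not None:
            remaining = self.min_interval - (now - last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_sent[host] = now
```

Calling `limiter.wait(host)` immediately before each active request keeps the spacing per host, so requests to different in-scope hosts do not delay each other.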
+---
+## Phase 0: Engagement Setup
+Before any scanning happens, create the engagement directory and
+authorization acknowledgement.
+```bash
+ENGAGEMENT=engagement-$(date +%Y%m%d-%H%M%S)
+mkdir -p "$ENGAGEMENT"/{evidence,findings,reports}
+cd "$ENGAGEMENT"
+```
+1. **Ask the user (verbatim):**
+   > "Confirm: (a) the target URL is [X], (b) you own this application
+   > or have written authorization to test it, and (c) the engagement
+   > may run for up to [N] hours starting now. Reply 'authorized' to
+   > proceed."
+2. **Wait for explicit `authorized` response.** Any other answer means STOP.
+3. **Record authorization** to `engagement/authorization.md` using the
+   template in `templates/authorization.md`. Include:
+   - Target URL(s) and IP(s)
+   - Authorization basis (ownership / written authz from $name)
+   - Engagement window
+   - Out-of-scope items (production, third-party services, etc.)
+   - Operator name (the user driving this session)
+4. **Build scope.txt:**
+   ```
+   localhost
+   127.0.0.1
+   staging.example.com
+   192.168.1.0/24    # internal lab only, with operator OK
+   ```
+5. **Read** `references/scope-enforcement.md` before issuing the first
+   active request — that doc has the host-extraction rules you apply
+   to every command/URL before it goes out.
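The scope allowlist lends itself to a mechanical pre-send check. The sketch below is an assumption about how such a check might look; the actual host-extraction rules live in `references/scope-enforcement.md`, which is not shown here. `load_scope` and `in_scope` are hypothetical names, and hostnames are matched exactly (no wildcard or subdomain matching).

```python
import ipaddress
from urllib.parse import urlsplit


def load_scope(lines):
    """Parse scope.txt lines into (hostnames, networks); '#' starts a comment."""
    hosts, networks = set(), []
    for raw in lines:
        entry = raw.split("#", 1)[0].strip()
        if not entry:
            continue
        try:
            # Accepts both single addresses and CIDR ranges.
            networks.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            hosts.add(entry.lower())
    return hosts, networks


def in_scope(url, hosts, networks):
    """Return True only if the URL's host is a scope entry or inside a scoped network."""
    host = urlsplit(url).hostname
    if host is None:  # no scheme or no host: refuse rather than guess
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return host.lower() in hosts
    return any(addr in net for net in networks)
```

Refusing URLs without a parseable host is a deliberate choice here: an ambiguous target is treated as off-scope.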
+---
+## Phase 1: Pre-Recon (Code Analysis, optional)
+Skip if no source access (black-box engagement).
+If you have read access to the application source:
+1. **Map the architecture** — framework, routing, middleware stack
+2. **Inventory sinks** — every `execute(`, `os.system(`, `eval(`,
+   template render, file read/write, redirect target
+3. **Map auth** — session cookie vs JWT, OAuth flows, password reset,
+   privileged endpoints
+4. **Identify trust boundaries** — what's authenticated, what's not,
+   what comes from `request.*`
+5. **Backward taint** from each sink to a request source. Early-terminate
+   when proper sanitization is found (parameterized queries, allowlists,
+   `shlex.quote`, well-known escapers).
+Output: `evidence/pre-recon.md` — architecture map, sink inventory,
+suspected vulnerable code paths.
+This is OFFLINE work. No traffic to the target.
+---
+## Phase 2: Recon (Live, Read-Only)
+Maps the attack surface. All requests are GETs of public pages, no
+payloads yet. Still scope-bounded.
+1. **Verify scope.** Resolve every target hostname → IP. Confirm IPs are
+   in scope (avoids the "DNS points somewhere unexpected" trap).
+2. **Network surface** (only if scope permits port scanning):
+   ```bash
+   nmap -sT -T3 --top-ports 100 -oN evidence/nmap.txt $TARGET
+   ```
+   Use `-T3` (the nmap default), not `-T4`/`-T5`. The slower timing is less
+   likely to trip IDS/IPS in shared environments.
+3. **Tech fingerprint:**
+   ```bash
+   whatweb -v $TARGET_URL > evidence/whatweb.txt
+   curl -sIk $TARGET_URL > evidence/headers.txt
+   ```
+4. **Endpoint discovery:**
+   - Crawl the app with the browser tool (`browser_navigate`,
+     `browser_get_images`, follow links).
+   - Inspect `robots.txt`, `sitemap.xml`, `.well-known/*`.
+   - Use the developer tools network panel via browser tool to capture
+     XHR/fetch calls.
+5. **Auth surface:** Identify login, registration, password reset,
+   session cookie names, token formats. Do NOT send credentials yet —
+   just observe.
+6. **Correlate with pre-recon** (if you have source). For each
+   `evidence/pre-recon.md` finding, mark whether the live surface
+   confirms it's reachable.
+Output: `evidence/recon.md` — endpoints, technologies, auth model,
+input vectors.
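The "verify scope" step (resolve each hostname, then confirm the IPs are in scope) can be sketched as follows. `resolve_ips` and `out_of_scope_ips` are hypothetical helpers; the resolver is injectable so the check can be exercised without touching DNS. In real use the default `socket.getaddrinfo` performs the lookup.

```python
import ipaddress
import socket


def resolve_ips(hostname, resolver=socket.getaddrinfo):
    """Return the sorted set of IP strings a hostname resolves to."""
    infos = resolver(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def out_of_scope_ips(hostname, networks, resolver=socket.getaddrinfo):
    """List resolved IPs that fall outside every scoped network."""
    nets = [ipaddress.ip_network(n, strict=False) for n in networks]
    stray = []
    for ip in resolve_ips(hostname, resolver):
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in nets):
            stray.append(ip)
    return stray
```

A non-empty result is the "DNS points somewhere unexpected" case: stop and confirm with the operator before sending anything to that host.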
+---
+## Phase 3: Vulnerability Analysis
+One delegate_task per vulnerability class. Each agent reads
+`evidence/recon.md` (+ `evidence/pre-recon.md` if present), produces
+`findings/<class>-queue.json` using `templates/exploitation-queue.json`.
+Use `delegate_task` with these focused subagents (parallel where possible):
+| Class | Goal | Reference |
+|-------|------|-----------|
+| `injection` | SQLi, command, path traversal, SSTI, LFI/RFI, deserialization | `references/vuln-taxonomy.md` (slot types) |
+| `xss` | Reflected, stored, DOM-based | `references/vuln-taxonomy.md` (render contexts) |
+| `auth` | Login bypass, JWT confusion, session fixation, OAuth flaws | `references/exploitation-techniques.md` |
+| `authz` | IDOR, vertical/horizontal escalation, business logic | `references/exploitation-techniques.md` |
+| `ssrf` | Internal reachability, metadata, protocol smuggling | Skip metadata unless explicitly authorized |
+| `infra` | Misconfig, info disclosure, default creds, exposed admin | `references/exploitation-techniques.md` |
+Each queue entry has: id, vuln class, source (file:line if known),
+endpoint, parameter, slot type, suspected defense, verdict
+(`identified` / `partial` / `confirmed` / `critical`), witness payload,
+confidence (0-1), notes.
+The analysis phase doesn't send malicious payloads yet — it stages them.
+The exploitation phase actually fires them.
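The queue schema in `templates/exploitation-queue.json` is not shown in this diff, so the following is only a guess at its shape, built from the field list above. The attribute names are assumptions; the verdict values come from this document, including `false_positive`, which Phase 4 writes after bypass exhaustion.

```python
import json
from dataclasses import asdict, dataclass

# Verdicts named in this skill; "false_positive" is only set in Phase 4.
VERDICTS = {"identified", "partial", "confirmed", "critical", "false_positive"}


@dataclass
class QueueEntry:
    """One staged candidate in findings/<class>-queue.json (field names assumed)."""
    id: str
    vuln_class: str
    endpoint: str
    parameter: str
    slot_type: str
    suspected_defense: str
    verdict: str
    witness_payload: str
    confidence: float
    source: str = ""  # file:line if known
    notes: str = ""

    def __post_init__(self):
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def dump_queue(entries):
    """Serialize queue entries to a JSON array."""
    return json.dumps([asdict(entry) for entry in entries], indent=2)
```

Validating the verdict and confidence at construction time keeps malformed entries out of the queue before the exploitation phase reads it.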
+---
+## Phase 4: Exploitation (Proof-Based, Conditional)
+Run a sub-agent only for classes whose analysis queue has actionable
+entries (`identified` or `partial`).
+For each candidate:
+1. **Pre-send check** — host in scope? auth gate satisfied? payload
+   approved if destructive?
+2. **Send the witness payload** — minimal proof. SQLi: `' AND 1=1--`
+   then `' AND 1=2--`. XSS: a benign marker like
+   `<svg/onload=console.log("HERMES-PENTEST-XSS")>`. Never `alert(1)` in
+   stored XSS — it'll fire for other users in shared environments.
+3. **Verify the witness fires** — for blind injection, use a sleep
+   probe (`SLEEP(5)`) and time the response. For SSRF, use a
+   tester-controlled callback host you own (NOT a public service like
+   webhook.site for sensitive engagements, since a third-party host becomes
+   an exfiltration path for whatever the callback carries).
+4. **Promote level:**
+   - **L1 Identified** — pattern matched, no behavior change
+   - **L2 Partial** — sink reached, but defense in place
+   - **L3 Confirmed** — payload changed app behavior in observable way
+   - **L4 Critical** — data extracted, code executed, access escalated
+5. **Bypass exhaustion before classifying as FP.** For each candidate
+   that blocks: try at least the bypass set in
+   `references/bypass-techniques.md` for that class. Only after the set
+   is exhausted may you write `verdict: false_positive`.
+6. **Record evidence** for every L3/L4:
+   - Full request (method, URL, headers, body)
+   - Response (status, headers, relevant body excerpt)
+   - Reproducer command (curl one-liner)
+   - Impact statement
+Output: `findings/exploitation-evidence.md`
+**Redact in evidence files:**
+- Any captured credentials/tokens → last 6 chars only in chat;
+  full value to `findings/secrets-vault.md` (gitignored).
+- Other users' PII → redact.
+- Your test credentials → fine to keep.
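The "last 6 chars" redaction rule is simple enough to sketch. `redact` is a hypothetical helper; replacing the hidden part with a fixed marker (instead of one mask character per hidden character) is a design assumption that avoids revealing the secret's length, and values of 6 characters or fewer are hidden entirely.

```python
def redact(secret, keep=6):
    """Return a chat-safe form of a captured secret: marker plus the last `keep` chars."""
    if len(secret) <= keep:
        # Too short to show a suffix without exposing the whole value.
        return "[redacted]"
    return "[redacted]" + secret[-keep:]
```

For example, `redact("eyJhbGciOiJIUzI1NiJ9.abcdef")` yields `[redacted]abcdef`, which is enough to correlate a chat message with the full value stored on disk.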
+---
+## Phase 5: Reporting
+Generate the final report using `templates/pentest-report.md`. Sections:
+1. Executive summary
+2. Engagement scope (from `engagement/scope.txt`)
+3. Authorization (from `engagement/authorization.md`)
+4. Findings (L3/L4 only — proof-required). Per finding:
+   - Title, severity (CVSS 3.1), CWE
+   - Affected endpoint(s)
+   - Proof (request + response excerpt)
+   - Reproduction steps
+   - Impact
+   - Remediation
+5. Not-exploited candidates (L1/L2 with notes on what blocked them)
+6. Out-of-scope observations
+7. Methodology / tools used
+8. Limitations and what was NOT tested
+**Severity policy:** CVSS only for L3/L4. L1/L2 are "candidates pending
+verification" — don't assign CVSS to unverified findings.
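The split between proven findings (L3/L4) and unverified candidates (L1/L2) might be applied mechanically when assembling the report. The sketch below assumes findings are available as `(title, level)` pairs; `Level` and `split_for_report` are hypothetical names mirroring the L1 to L4 scale from Phase 4.

```python
from enum import IntEnum


class Level(IntEnum):
    """Proof levels from Phase 4."""
    IDENTIFIED = 1
    PARTIAL = 2
    CONFIRMED = 3
    CRITICAL = 4


def split_for_report(findings):
    """Split (title, level) pairs into proven findings and unverified candidates."""
    proven = [f for f in findings if f[1] >= Level.CONFIRMED]
    candidates = [f for f in findings if f[1] < Level.CONFIRMED]
    return proven, candidates
```

Only the `proven` list would receive CVSS scores; the `candidates` list feeds the "Not-exploited candidates" section.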
+---
+## When to Stop
+- The user revokes authorization.
+- A candidate finding clearly impacts production data and you don't have
+  approval for destructive testing — STOP and ask.
+- The target starts returning 503/429 storms — back off, reconvene with
+  the operator.
+- You discover something *outside* the contracted scope (e.g. an exposed
+  customer database while testing an unrelated endpoint). STOP, document,
+  report to the operator. Do not pivot without explicit approval — that
+  pivot is what makes pentesting illegal.
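The 503/429 stop condition could be watched for automatically. This is a rough sketch only: the skill does not define what counts as a "storm", so the window size and threshold in the hypothetical `BackoffMonitor` below are assumptions to be agreed with the operator.

```python
from collections import deque


class BackoffMonitor:
    """Track recent responses and flag a sustained run of 429/503 statuses."""

    def __init__(self, window=20, threshold=0.5):
        self._recent = deque(maxlen=window)  # True for each throttled response
        self.threshold = threshold

    def record(self, status_code):
        """Record one response status code."""
        self._recent.append(status_code in (429, 503))

    def should_stop(self):
        """Return True once a full window is at or above the throttled-response threshold."""
        if len(self._recent) < self._recent.maxlen:
            return False
        return sum(self._recent) / len(self._recent) >= self.threshold
```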
+---
+## What This Skill Does NOT Cover
+- Network-layer pentesting beyond port scanning (no Metasploit,
+  Cobalt Strike, AD attacks, network protocol fuzzing).
+- Reverse engineering / binary analysis (see issue #383).
+- Source-only static analysis (see issue #382).
+- Active social engineering / phishing.
+- Anything against systems the operator hasn't pre-authorized.
+If the engagement needs any of these, escalate to a professional
+pentester. This skill complements professional pentesting; it does
+not replace it.
+---
+## Further Reading
+- `references/scope-enforcement.md` — how to bound every active request
+- `references/vuln-taxonomy.md` — slot types, render contexts, OWASP map
+- `references/exploitation-techniques.md` — per-class payload patterns
+- `references/bypass-techniques.md` — common WAF/filter bypasses
+- `templates/authorization.md` — engagement authorization template
+- `templates/pentest-report.md` — final report template
+- `templates/exploitation-queue.json` — per-class finding queue schema
+- `scripts/recon-scan.sh` — rate-limited nmap+whatweb+headers wrapper
+## Mandatory actions when this skill is active
+Before applying this skill:
+- [ ] Read the task requirements fully before acting
+- [ ] Confirm you understand the goal and constraints
+- [ ] Check for existing work or prior context in the codebase
+While working:
+- [ ] Follow the methodology described above step by step
+- [ ] Document any decisions or findings as you go
+After completing:
+- [ ] Self-check: does the output satisfy the original requirement?
+- [ ] Verify no regressions or unintended side effects