mindforge-cc 11.5.0 → 11.6.0

Files changed (177)
  1. package/.agent/mindforge/skill-tdd.md +53 -0
  2. package/.agent/mindforge/skills-index.md +118 -0
  3. package/.agent/mindforge/systematic-debug.md +60 -0
  4. package/.agent/skills/1password-skill/SKILL.md +156 -0
  5. package/.agent/skills/1password-skill/references/cli-examples.md +31 -0
  6. package/.agent/skills/1password-skill/references/get-started.md +21 -0
  7. package/.agent/skills/article-illustrator/SKILL.md +199 -0
  8. package/.agent/skills/article-illustrator/references/prompt-construction.md +426 -0
  9. package/.agent/skills/article-illustrator/references/style-presets.md +80 -0
  10. package/.agent/skills/article-illustrator/references/styles.md +224 -0
  11. package/.agent/skills/article-illustrator/references/usage.md +50 -0
  12. package/.agent/skills/article-illustrator/references/workflow.md +332 -0
  13. package/.agent/skills/arxiv/SKILL.md +275 -0
  14. package/.agent/skills/blogwatcher/SKILL.md +130 -0
  15. package/.agent/skills/code-wiki/SKILL.md +438 -0
  16. package/.agent/skills/code-wiki/templates/README.md +31 -0
  17. package/.agent/skills/code-wiki/templates/architecture.md +30 -0
  18. package/.agent/skills/code-wiki/templates/getting-started.md +47 -0
  19. package/.agent/skills/code-wiki/templates/module.md +38 -0
  20. package/.agent/skills/codebase-inspection/SKILL.md +109 -0
  21. package/.agent/skills/comic-creator/SKILL.md +240 -0
  22. package/.agent/skills/comic-creator/references/analysis-framework.md +176 -0
  23. package/.agent/skills/comic-creator/references/auto-selection.md +71 -0
  24. package/.agent/skills/comic-creator/references/base-prompt.md +98 -0
  25. package/.agent/skills/comic-creator/references/character-template.md +180 -0
  26. package/.agent/skills/comic-creator/references/ohmsha-guide.md +85 -0
  27. package/.agent/skills/comic-creator/references/partial-workflows.md +106 -0
  28. package/.agent/skills/comic-creator/references/storyboard-template.md +143 -0
  29. package/.agent/skills/comic-creator/references/workflow.md +401 -0
  30. package/.agent/skills/concept-diagrams/SKILL.md +355 -0
  31. package/.agent/skills/concept-diagrams/references/dashboard-patterns.md +43 -0
  32. package/.agent/skills/concept-diagrams/references/infrastructure-patterns.md +144 -0
  33. package/.agent/skills/concept-diagrams/references/physical-shape-cookbook.md +42 -0
  34. package/.agent/skills/creative-ideation/SKILL.md +144 -0
  35. package/.agent/skills/creative-ideation/references/full-prompt-library.md +110 -0
  36. package/.agent/skills/devops-cli/SKILL.md +149 -0
  37. package/.agent/skills/devops-cli/references/app-discovery.md +112 -0
  38. package/.agent/skills/devops-cli/references/authentication.md +59 -0
  39. package/.agent/skills/devops-cli/references/cli-reference.md +104 -0
  40. package/.agent/skills/devops-cli/references/running-apps.md +171 -0
  41. package/.agent/skills/devops-watchers/SKILL.md +103 -0
  42. package/.agent/skills/docker-management/SKILL.md +273 -0
  43. package/.agent/skills/domain-intel/SKILL.md +96 -0
  44. package/.agent/skills/duckduckgo-search/SKILL.md +230 -0
  45. package/.agent/skills/github-auth/SKILL.md +240 -0
  46. package/.agent/skills/github-code-review/SKILL.md +474 -0
  47. package/.agent/skills/github-code-review/references/review-output-template.md +74 -0
  48. package/.agent/skills/github-issues/SKILL.md +363 -0
  49. package/.agent/skills/github-issues/templates/bug-report.md +35 -0
  50. package/.agent/skills/github-issues/templates/feature-request.md +31 -0
  51. package/.agent/skills/github-pr-workflow/SKILL.md +360 -0
  52. package/.agent/skills/github-pr-workflow/references/ci-troubleshooting.md +183 -0
  53. package/.agent/skills/github-pr-workflow/references/conventional-commits.md +71 -0
  54. package/.agent/skills/github-pr-workflow/templates/pr-body-bugfix.md +35 -0
  55. package/.agent/skills/github-pr-workflow/templates/pr-body-feature.md +33 -0
  56. package/.agent/skills/github-repo-management/SKILL.md +509 -0
  57. package/.agent/skills/github-repo-management/references/github-api-cheatsheet.md +161 -0
  58. package/.agent/skills/godmode/SKILL.md +396 -0
  59. package/.agent/skills/godmode/references/jailbreak-templates.md +128 -0
  60. package/.agent/skills/godmode/references/refusal-detection.md +142 -0
  61. package/.agent/skills/hyperframes/SKILL.md +182 -0
  62. package/.agent/skills/hyperframes/references/cli.md +185 -0
  63. package/.agent/skills/hyperframes/references/composition.md +129 -0
  64. package/.agent/skills/hyperframes/references/features.md +289 -0
  65. package/.agent/skills/hyperframes/references/gsap.md +136 -0
  66. package/.agent/skills/hyperframes/references/troubleshooting.md +137 -0
  67. package/.agent/skills/hyperframes/references/website-to-video.md +145 -0
  68. package/.agent/skills/jupyter-live-kernel/SKILL.md +160 -0
  69. package/.agent/skills/kanban-orchestrator/SKILL.md +209 -0
  70. package/.agent/skills/kanban-worker/SKILL.md +188 -0
  71. package/.agent/skills/llm-wiki/SKILL.md +499 -0
  72. package/.agent/skills/meme-generation/SKILL.md +122 -0
  73. package/.agent/skills/node-inspect-debugger/SKILL.md +312 -0
  74. package/.agent/skills/obsidian/SKILL.md +60 -0
  75. package/.agent/skills/osint-investigation/SKILL.md +269 -0
  76. package/.agent/skills/osint-investigation/templates/source-template.md +59 -0
  77. package/.agent/skills/oss-forensics/SKILL.md +422 -0
  78. package/.agent/skills/oss-forensics/references/evidence-types.md +89 -0
  79. package/.agent/skills/oss-forensics/references/github-archive-guide.md +184 -0
  80. package/.agent/skills/oss-forensics/references/investigation-templates.md +131 -0
  81. package/.agent/skills/oss-forensics/references/recovery-techniques.md +164 -0
  82. package/.agent/skills/oss-forensics/templates/forensic-report.md +151 -0
  83. package/.agent/skills/oss-forensics/templates/malicious-package-report.md +43 -0
  84. package/.agent/skills/parallel-cli/SKILL.md +384 -0
  85. package/.agent/skills/pinggy-tunnel/SKILL.md +302 -0
  86. package/.agent/skills/pixel-art/SKILL.md +209 -0
  87. package/.agent/skills/pixel-art/references/palettes.md +49 -0
  88. package/.agent/skills/plan/SKILL.md +331 -0
  89. package/.agent/skills/polymarket/SKILL.md +75 -0
  90. package/.agent/skills/polymarket/references/api-endpoints.md +220 -0
  91. package/.agent/skills/python-debugpy/SKILL.md +368 -0
  92. package/.agent/skills/requesting-code-review/SKILL.md +273 -0
  93. package/.agent/skills/research-paper-writing/SKILL.md +2367 -0
  94. package/.agent/skills/research-paper-writing/references/autoreason-methodology.md +394 -0
  95. package/.agent/skills/research-paper-writing/references/checklists.md +434 -0
  96. package/.agent/skills/research-paper-writing/references/citation-workflow.md +563 -0
  97. package/.agent/skills/research-paper-writing/references/experiment-patterns.md +728 -0
  98. package/.agent/skills/research-paper-writing/references/human-evaluation.md +476 -0
  99. package/.agent/skills/research-paper-writing/references/paper-types.md +481 -0
  100. package/.agent/skills/research-paper-writing/references/reviewer-guidelines.md +433 -0
  101. package/.agent/skills/research-paper-writing/references/sources.md +191 -0
  102. package/.agent/skills/research-paper-writing/references/writing-guide.md +474 -0
  103. package/.agent/skills/research-paper-writing/templates/README.md +251 -0
  104. package/.agent/skills/rest-graphql-debug/SKILL.md +507 -0
  105. package/.agent/skills/s6-container-supervision/SKILL.md +171 -0
  106. package/.agent/skills/scrapling/SKILL.md +328 -0
  107. package/.agent/skills/sherlock/SKILL.md +186 -0
  108. package/.agent/skills/simplify-code/SKILL.md +168 -0
  109. package/.agent/skills/skill-authoring/SKILL.md +158 -0
  110. package/.agent/skills/spike/SKILL.md +190 -0
  111. package/.agent/skills/subagent-driven-development/SKILL.md +345 -0
  112. package/.agent/skills/subagent-driven-development/references/context-budget-discipline.md +53 -0
  113. package/.agent/skills/subagent-driven-development/references/gates-taxonomy.md +93 -0
  114. package/.agent/skills/systematic-debugging/SKILL.md +360 -0
  115. package/.agent/skills/test-driven-development/SKILL.md +336 -0
  116. package/.agent/skills/video-orchestrator/SKILL.md +194 -0
  117. package/.agent/skills/video-orchestrator/references/examples.md +227 -0
  118. package/.agent/skills/video-orchestrator/references/intake.md +166 -0
  119. package/.agent/skills/video-orchestrator/references/kanban-setup.md +278 -0
  120. package/.agent/skills/video-orchestrator/references/monitoring.md +180 -0
  121. package/.agent/skills/video-orchestrator/references/role-archetypes.md +298 -0
  122. package/.agent/skills/video-orchestrator/references/tool-matrix.md +317 -0
  123. package/.agent/skills/web-pentest/SKILL.md +332 -0
  124. package/.agent/skills/web-pentest/references/bypass-techniques.md +133 -0
  125. package/.agent/skills/web-pentest/references/exploitation-techniques.md +204 -0
  126. package/.agent/skills/web-pentest/references/scope-enforcement.md +110 -0
  127. package/.agent/skills/web-pentest/references/vuln-taxonomy.md +81 -0
  128. package/.agent/skills/web-pentest/templates/authorization.md +69 -0
  129. package/.agent/skills/web-pentest/templates/pentest-report.md +178 -0
  130. package/.claude/commands/mindforge/skill-tdd.md +53 -0
  131. package/.claude/commands/mindforge/skills-index.md +118 -0
  132. package/.claude/commands/mindforge/systematic-debug.md +60 -0
  133. package/.mindforge/config.json +2 -2
  134. package/.mindforge/memory/sync-manifest.json +1 -1
  135. package/.mindforge/skills/arxiv/SKILL.md +294 -0
  136. package/.mindforge/skills/blogwatcher/SKILL.md +147 -0
  137. package/.mindforge/skills/code-wiki/SKILL.md +457 -0
  138. package/.mindforge/skills/codebase-inspection/SKILL.md +126 -0
  139. package/.mindforge/skills/concept-diagrams/SKILL.md +373 -0
  140. package/.mindforge/skills/creative-ideation/SKILL.md +162 -0
  141. package/.mindforge/skills/domain-intel/SKILL.md +116 -0
  142. package/.mindforge/skills/duckduckgo-search/SKILL.md +249 -0
  143. package/.mindforge/skills/github-code-review/SKILL.md +493 -0
  144. package/.mindforge/skills/github-issues/SKILL.md +382 -0
  145. package/.mindforge/skills/github-pr-workflow/SKILL.md +379 -0
  146. package/.mindforge/skills/jupyter-live-kernel/SKILL.md +179 -0
  147. package/.mindforge/skills/kanban-orchestrator/SKILL.md +227 -0
  148. package/.mindforge/skills/kanban-worker/SKILL.md +206 -0
  149. package/.mindforge/skills/meme-generation/SKILL.md +141 -0
  150. package/.mindforge/skills/obsidian/SKILL.md +80 -0
  151. package/.mindforge/skills/osint-investigation/SKILL.md +288 -0
  152. package/.mindforge/skills/oss-forensics/SKILL.md +421 -0
  153. package/.mindforge/skills/pixel-art/SKILL.md +228 -0
  154. package/.mindforge/skills/plan/SKILL.md +350 -0
  155. package/.mindforge/skills/requesting-code-review/SKILL.md +292 -0
  156. package/.mindforge/skills/research-paper-writing/SKILL.md +2384 -0
  157. package/.mindforge/skills/scrapling/SKILL.md +345 -0
  158. package/.mindforge/skills/sherlock/SKILL.md +203 -0
  159. package/.mindforge/skills/simplify-code/SKILL.md +187 -0
  160. package/.mindforge/skills/spike/SKILL.md +209 -0
  161. package/.mindforge/skills/subagent-driven-development/SKILL.md +364 -0
  162. package/.mindforge/skills/systematic-debugging/SKILL.md +379 -0
  163. package/.mindforge/skills/test-driven-development/SKILL.md +355 -0
  164. package/.mindforge/skills/web-pentest/SKILL.md +327 -0
  165. package/CHANGELOG.md +88 -0
  166. package/MINDFORGE.md +3 -3
  167. package/README.md +38 -3
  168. package/RELEASENOTES.md +100 -0
  169. package/bin/dashboard/api-router.js +10 -1
  170. package/bin/governance/approve.js +5 -1
  171. package/bin/memory/federated-sync.js +11 -2
  172. package/bin/memory/knowledge-capture.js +10 -1
  173. package/bin/memory/pillar-health-tracker.js +9 -1
  174. package/bin/review/ads-engine.js +2 -2
  175. package/bin/security/trust-boundaries.js +5 -0
  176. package/docs/getting-started.md +42 -5
  177. package/package.json +1 -1
@@ -0,0 +1,355 @@
1
+ ---
2
+ name: test-driven-development
3
+ description: "TDD: enforce RED-GREEN-REFACTOR, tests before code."
4
+ version: 1.1.0
5
+ status: stable
6
+ min_mindforge_version: 11.5.1
7
+ triggers: test driven development, tdd methodology, red green refactor, write test first, test before code, failing test first, tdd cycle, write failing test, make test pass, red-green-refactor, tdd methodology, test first approach
8
+ ---
9
+
10
+ # Test-Driven Development (TDD)
11
+
12
+ ## Overview
13
+
14
+ Write the test first. Watch it fail. Write minimal code to pass.
15
+
16
+ **Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.
17
+
18
+ **Violating the letter of the rules is violating the spirit of the rules.**
19
+
20
+ ## When to Use
21
+
22
+ **Always:**
23
+ - New features
24
+ - Bug fixes
25
+ - Refactoring
26
+ - Behavior changes
27
+
28
+ **Exceptions (ask the user first):**
29
+ - Throwaway prototypes
30
+ - Generated code
31
+ - Configuration files
32
+
33
+ Thinking "skip TDD just this once"? Stop. That's rationalization.
34
+
35
+ ## The Iron Law
36
+
37
+ ```
+ NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
+ ```
40
+
41
+ Write code before the test? Delete it. Start over.
42
+
43
+ **No exceptions:**
44
+ - Don't keep it as "reference"
45
+ - Don't "adapt" it while writing tests
46
+ - Don't look at it
47
+ - Delete means delete
48
+
49
+ Implement fresh from tests. Period.
50
+
51
+ ## Red-Green-Refactor Cycle
52
+
53
+ ### RED — Write Failing Test
54
+
55
+ Write one minimal test showing what should happen.
56
+
57
+ **Good test:**
58
+ ```python
+ def test_retries_failed_operations_3_times():
+     attempts = 0
+     def operation():
+         nonlocal attempts
+         attempts += 1
+         if attempts < 3:
+             raise Exception('fail')
+         return 'success'
+
+     result = retry_operation(operation)
+
+     assert result == 'success'
+     assert attempts == 3
+ ```
73
+ Clear name, tests real behavior, one thing.
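For context, here is a minimal sketch of what the code under test might look like once the cycle reaches GREEN. The diff does not show `retry_operation`, so the signature, the `max_attempts` parameter, and the choice to re-raise the last error are all assumptions made for illustration.

```python
def retry_operation(operation, max_attempts=3):
    """Call `operation` until it succeeds or `max_attempts` calls have failed.

    Hypothetical implementation: the skill file only shows the test.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return operation()
        except Exception as error:  # broad on purpose: the test raises a bare Exception
            last_error = error
    # Every attempt failed, so surface the most recent error to the caller.
    raise last_error


def test_retries_failed_operations_3_times():
    attempts = 0

    def operation():
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise Exception('fail')
        return 'success'

    result = retry_operation(operation)

    assert result == 'success'
    assert attempts == 3
```

Run against an empty `retry_operation` stub, the same test would fail at the first assertion, which is the RED state the skill asks you to observe before writing the loop.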
74
+
75
+ **Bad test:**
76
+ ```python
+ def test_retry_works():
+     mock = MagicMock()
+     mock.side_effect = [Exception(), Exception(), 'success']
+     result = retry_operation(mock)
+     assert result == 'success'  # What about retry count? Timing?
+ ```
83
+ Vague name, tests mock not real code.
84
+
85
+ **Requirements:**
86
+ - One behavior per test
87
+ - Clear descriptive name ("and" in name? Split it)
88
+ - Real code, not mocks (unless truly unavoidable)
89
+ - Name describes behavior, not implementation
90
+
91
+ ### Verify RED — Watch It Fail
92
+
93
+ **MANDATORY. Never skip.**
94
+
95
+ ```bash
+ # Use terminal tool to run the specific test
+ pytest tests/test_feature.py::test_specific_behavior -v
+ ```
99
+
100
+ Confirm:
101
+ - Test fails (not errors from typos)
102
+ - Failure message is expected
103
+ - Fails because the feature is missing
104
+
105
+ **Test passes immediately?** You're testing existing behavior. Fix the test.
106
+
107
+ **Test errors?** Fix the error, re-run until it fails correctly.
108
+
109
+ ### GREEN — Minimal Code
110
+
111
+ Write the simplest code to pass the test. Nothing more.
112
+
113
+ **Good:**
114
+ ```python
+ def add(a, b):
+     return a + b  # Nothing extra
+ ```
118
+
119
+ **Bad:**
120
+ ```python
+ def add(a, b):
+     result = a + b
+     logging.info(f"Adding {a} + {b} = {result}")  # Extra!
+     return result
+ ```
126
+
127
+ Don't add features, refactor other code, or "improve" beyond the test.
128
+
129
+ **Cheating is OK in GREEN:**
130
+ - Hardcode return values
131
+ - Copy-paste
132
+ - Duplicate code
133
+ - Skip edge cases
134
+
135
+ We'll fix it in REFACTOR.
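One way to read "cheating is OK" is that a hard-coded value is acceptable only until the next failing test makes it untenable. The sketch below illustrates that progression; the `add_v1`/`add_v2` names and the `check_*` helpers are hypothetical and exist only so both stages fit in one listing.

```python
# Cycle 1, GREEN: the only behavior specified so far is 2 + 3 == 5,
# so a constant is the simplest code that passes.
def add_v1(a, b):
    return 5


def check_add_2_and_3(add):
    assert add(2, 3) == 5


# Cycle 2, RED: a second behavior is specified. It fails against add_v1,
# which proves the new check really exercises something.
def check_add_10_and_4(add):
    assert add(10, 4) == 14


# Cycle 2, GREEN: the hard-coded value can no longer satisfy both checks,
# so the real logic is written now, and not before.
def add_v2(a, b):
    return a + b
```

In a real project there would be a single `add` function edited in place, with each check written as an ordinary `test_*` function.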
136
+
137
+ ### Verify GREEN — Watch It Pass
138
+
139
+ **MANDATORY.**
140
+
141
+ ```bash
+ # Run the specific test
+ pytest tests/test_feature.py::test_specific_behavior -v
+
+ # Then run ALL tests to check for regressions
+ pytest tests/ -q
+ ```
148
+
149
+ Confirm:
150
+ - Test passes
151
+ - Other tests still pass
152
+ - Output pristine (no errors, warnings)
153
+
154
+ **Test fails?** Fix the code, not the test.
155
+
156
+ **Other tests fail?** Fix regressions now.
157
+
158
+ ### REFACTOR — Clean Up
159
+
160
+ After green only:
161
+ - Remove duplication
162
+ - Improve names
163
+ - Extract helpers
164
+ - Simplify expressions
165
+
166
+ Keep tests green throughout. Don't add behavior.
167
+
168
+ **If tests fail during refactor:** Undo immediately. Take smaller steps.
169
+
170
+ ### Repeat
171
+
172
+ Next failing test for next behavior. One cycle at a time.
173
+
174
+ ## Why Order Matters
175
+
176
+ **"I'll write tests after to verify it works"**
177
+
178
+ Tests written after code pass immediately. Passing immediately proves nothing:
179
+ - Might test the wrong thing
180
+ - Might test implementation, not behavior
181
+ - Might miss edge cases you forgot
182
+ - You never saw it catch the bug
183
+
184
+ Test-first forces you to see the test fail, proving it actually tests something.
185
+
186
+ **"I already manually tested all the edge cases"**
187
+
188
+ Manual testing is ad-hoc. You think you tested everything but:
189
+ - No record of what you tested
190
+ - Can't re-run when code changes
191
+ - Easy to forget cases under pressure
192
+ - "It worked when I tried it" ≠ comprehensive
193
+
194
+ Automated tests are systematic. They run the same way every time.
195
+
196
+ **"Deleting X hours of work is wasteful"**
197
+
198
+ Sunk cost fallacy. The time is already gone. Your choice now:
199
+ - Delete and rewrite with TDD (high confidence)
200
+ - Keep it and add tests after (low confidence, likely bugs)
201
+
202
+ The "waste" is keeping code you can't trust.
203
+
204
+ **"TDD is dogmatic, being pragmatic means adapting"**
205
+
206
+ TDD IS pragmatic:
207
+ - Finds bugs before commit (faster than debugging after)
208
+ - Prevents regressions (tests catch breaks immediately)
209
+ - Documents behavior (tests show how to use code)
210
+ - Enables refactoring (change freely, tests catch breaks)
211
+
212
+ "Pragmatic" shortcuts = debugging in production = slower.
213
+
214
+ **"Tests after achieve the same goals — it's spirit not ritual"**
215
+
216
+ No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"
217
+
218
+ Tests-after are biased by your implementation. You test what you built, not what's required. Tests-first force edge case discovery before implementing.
219
+
220
+ ## Common Rationalizations
221
+
222
+ | Excuse | Reality |
+ |--------|---------|
+ | "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
+ | "I'll test after" | Tests passing immediately prove nothing. |
+ | "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
+ | "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
+ | "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
+ | "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
+ | "Need to explore first" | Fine. Throw away exploration, start with TDD. |
+ | "Test hard = design unclear" | Listen to the test. Hard to test = hard to use. |
+ | "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
+ | "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. |
+ | "Existing code has no tests" | You're improving it. Add tests for the code you touch. |
235
+
236
+ ## Red Flags — STOP and Start Over
237
+
238
+ If you catch yourself doing any of these, delete the code and restart with TDD:
239
+
240
+ - Code before test
241
+ - Test after implementation
242
+ - Test passes immediately on first run
243
+ - Can't explain why test failed
244
+ - Tests added "later"
245
+ - Rationalizing "just this once"
246
+ - "I already manually tested it"
247
+ - "Tests after achieve the same purpose"
248
+ - "Keep as reference" or "adapt existing code"
249
+ - "Already spent X hours, deleting is wasteful"
250
+ - "TDD is dogmatic, I'm being pragmatic"
251
+ - "This is different because..."
252
+
253
+ **All of these mean: Delete code. Start over with TDD.**
254
+
255
+ ## Verification Checklist
256
+
257
+ Before marking work complete:
258
+
259
+ - [ ] Every new function/method has a test
260
+ - [ ] Watched each test fail before implementing
261
+ - [ ] Each test failed for expected reason (feature missing, not typo)
262
+ - [ ] Wrote minimal code to pass each test
263
+ - [ ] All tests pass
264
+ - [ ] Output pristine (no errors, warnings)
265
+ - [ ] Tests use real code (mocks only if unavoidable)
266
+ - [ ] Edge cases and errors covered
267
+
268
+ Can't check all boxes? You skipped TDD. Start over.
269
+
270
+ ## When Stuck
271
+
272
+ | Problem | Solution |
+ |---------|----------|
+ | Don't know how to test | Write the wished-for API. Write the assertion first. Ask the user. |
+ | Test too complicated | Design too complicated. Simplify the interface. |
+ | Must mock everything | Code too coupled. Use dependency injection. |
+ | Test setup huge | Extract helpers. Still complex? Simplify the design. |
278
+
279
+ ##
280
+
281
+ ### Running Tests
282
+
283
+ Use the `terminal` tool to run tests at each step:
284
+
285
+ ```python
286
+ # RED — verify failure
287
+ terminal("pytest tests/test_feature.py::test_name -v")
288
+
289
+ # GREEN — verify pass
290
+ terminal("pytest tests/test_feature.py::test_name -v")
291
+
292
+ # Full suite — verify no regressions
293
+ terminal("pytest tests/ -q")
294
+ ```
295
+
296
+ ### With delegate_task
297
+
298
+ When dispatching subagents for implementation, enforce TDD in the goal:
299
+
300
+ ```python
+ delegate_task(
+     goal="Implement [feature] using strict TDD",
+     context="""
+     Follow test-driven-development skill:
+     1. Write failing test FIRST
+     2. Run test to verify it fails
+     3. Write minimal code to pass
+     4. Run test to verify it passes
+     5. Refactor if needed
+     6. Commit
+
+     Project test command: pytest tests/ -q
+     Project structure: [describe relevant files]
+     """,
+     toolsets=['terminal', 'file']
+ )
+ ```
318
+
319
+ ### With systematic-debugging
320
+
321
+ Bug found? Write failing test reproducing it. Follow TDD cycle. The test proves the fix and prevents regression.
322
+
323
+ Never fix bugs without a test.
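As an illustration of the bug-fix flow, the sketch below assumes a hypothetical bug report: `average([])` raises `ZeroDivisionError` when the agreed behavior is to return `0.0`. The function and test names are invented for this example. The reproducing test is written first and fails against the unfixed function, and the guard clause is added only afterwards.

```python
def test_average_of_empty_list_is_zero():
    # Reproduces the reported bug: before the fix this raised ZeroDivisionError.
    assert average([]) == 0.0


def test_average_of_non_empty_list_is_unchanged():
    # Guards the existing behavior so the fix cannot regress it.
    assert average([2, 4]) == 3.0


def average(values):
    """Return the arithmetic mean, or 0.0 for an empty sequence."""
    if not values:  # the fix, added after the first test was seen failing
        return 0.0
    return sum(values) / len(values)
```

The reproducing test stays in the suite after the fix, so the same bug cannot return unnoticed.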
324
+
325
+ ## Testing Anti-Patterns
326
+
327
+ - **Testing mock behavior instead of real behavior** — mocks should verify interactions, not replace the system under test
328
+ - **Testing implementation details** — test behavior/results, not internal method calls
329
+ - **Happy path only** — always test edge cases, errors, and boundaries
330
+ - **Brittle tests** — tests should verify behavior, not structure; refactoring shouldn't break them
331
+
332
+ ## Final Rule
333
+
334
+ ```
+ Production code → test exists and failed first
+ Otherwise → not TDD
+ ```
338
+
339
+ No exceptions without the user's explicit permission.
340
+
341
+ ## Mandatory actions when this skill is active
342
+
343
+ Before applying this skill:
344
+ - [ ] Read the task requirements fully before acting
345
+ - [ ] Confirm you understand the goal and constraints
346
+ - [ ] Check for existing work or prior context in the codebase
347
+
348
+ While working:
349
+ - [ ] Follow the methodology described above step by step
350
+ - [ ] Document any decisions or findings as you go
351
+
352
+ After completing:
353
+ - [ ] Self-check: does the output satisfy the original requirement?
354
+ - [ ] Verify no regressions or unintended side effects
355
+
@@ -0,0 +1,327 @@
1
+ ---
2
+ name: web-pentest
3
+ description: "Authorized web application penetration testing — reconnaissance, vulnerability analysis, proof-based exploitation, and professional reporting."
4
+ version: 1.0.0
5
+ status: stable
6
+ min_mindforge_version: 11.5.1
7
+ triggers: web penetration test, pentest this app, security test web app, OWASP test, authorized pentest, web application pentest, web security testing, penetration testing, pentest web, webapp security test, pentest application, web app security audit
8
+ ---
9
+
10
+ # Web Application Penetration Testing
11
+
12
+ A phased pentesting workflow for running web applications.
13
+ Built around three rules:
14
+
15
+ 1. No exploit, no report — every finding requires reproducible evidence.
16
+ 2. Bounded scope — every active request goes against a target the operator
17
+ pre-declared. Off-scope hosts are refused.
18
+ 3. Bypass exhaustion before false-positive dismissal — a "blocked" payload
19
+ is not a clean bill of health until you've tried the bypass set.
20
+
21
+ ---
22
+
23
+ ## ⚠️ Hard Guardrails — Read Before Every Engagement
24
+
25
+ Violating any of these invalidates the engagement and may be illegal.
26
+
27
+ 1. **Authorization gate.** Before the first active scan in a session, you
28
+ MUST confirm with the user, in writing, that they own or have written
29
+ authorization to test the target. Record the acknowledgement in
30
+ `engagement/authorization.md` (see template). No acknowledgement → no
31
+ active scanning. Reading public pages with `curl` is fine; sending
32
+ payloads is not.
33
+
34
+ 2. **Scope allowlist.** Maintain `engagement/scope.txt` — one hostname or
35
+ CIDR per line. Every `nmap`, `curl`, `whatweb`, browser navigation, or
36
+ payload-bearing request MUST be against an entry in scope. If a target
37
+ redirects you off-scope (3xx to a different host, a link in HTML),
38
+ STOP and confirm with the user before following.
39
+
40
+ 3. **No production systems without paper.** If the user hasn't told you
41
+ "yes, prod is in scope and I have written sign-off," assume not. Default
42
+ targets are staging, local docker, dedicated test instances.
43
+
44
+ 4. **Cloud metadata is off by default.** Do not probe `169.254.169.254`,
45
+ `metadata.google.internal`, `100.100.100.200`, `[fd00:ec2::254]`, or
46
+ equivalent unless the engagement explicitly includes SSRF-to-metadata
47
+ as a goal AND the target is one you control. The agent's browser tool
48
+ can reach these from inside your own infrastructure — don't.
49
+
50
+ 5. **Destructive payloads need approval.** SQLi payloads that DROP/DELETE,
51
+ filesystem-write SSTI, command injection with `rm`/`shutdown`/`mkfs`,
52
+ anything that mutates beyond a single test row → ASK FIRST. The
53
+ `approval.py` system catches some; don't rely on it alone.
54
+
55
+ 6. **Aux-client leakage risk.** This skill produces sessions full of SQLi/XSS/RCE
56
+ payloads, captured credentials, and JWT tokens. Anything sensitive you write to
57
+ the conversation can be replayed in context compression passes.
58
+ Mitigation:
59
+ - Redact captured tokens/credentials to the LAST 6 CHARS before logging
60
+ them in any message. Full values go to `engagement/evidence/` files,
61
+ never into chat history.
62
+
63
+ 7. **Rate limit yourself.** Default 200ms between active requests against
64
+ any single host. The recon-scan.sh script enforces this. Don't bypass
65
+ it without operator approval.
66
+
67
+ 8. **Authority of the report.** This skill produces a security
68
+ assessment, not a "PASS." Even a clean run is "no exploitable issues
69
+ FOUND in scope X within time T using methods Y" — not "the application
70
+ is secure." Mirror that language in the report.
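Guardrails 6 and 7 are mechanical enough to sketch in code. The following is a minimal illustration, not the skill's own implementation: the source says `recon-scan.sh` enforces the rate limit, and the `redact` and `HostThrottle` names, the `***` mask, and the injectable clock are assumptions made here.

```python
import time


def redact(secret, keep=6):
    """Mask a captured token, keeping only the last `keep` characters."""
    if len(secret) <= keep:
        return "***"  # too short to show any part of it safely
    return "***" + secret[-keep:]


class HostThrottle:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, min_interval=0.2, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval  # 200 ms, the default in guardrail 7
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> time of the previous request

    def wait(self, host):
        """Block until `host` may be contacted again, then record the time."""
        now = self._clock()
        last = self._last_request.get(host)
        if last is not None:
            remaining = self._min_interval - (now - last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request[host] = now
```

A caller would invoke `throttle.wait(host)` immediately before each active request and pass every captured value through `redact` before it appears in a message. The full value still belongs in an `engagement/evidence/` file.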
71
+
72
+ ---
73
+
74
+ ## Phase 0: Engagement Setup
75
+
76
+ Before any scanning happens, create the engagement directory and
77
+ authorization acknowledgement.
78
+
79
+ ```bash
+ ENGAGEMENT=engagement-$(date +%Y%m%d-%H%M%S)
+ mkdir -p "$ENGAGEMENT"/{evidence,findings,reports}
+ cd "$ENGAGEMENT"
+ ```
84
+
85
+ 1. **Ask the user (verbatim):**
86
+ > "Confirm: (a) the target URL is [X], (b) you own this application
87
+ > or have written authorization to test it, and (c) the engagement
88
+ > may run for up to [N] hours starting now. Reply 'authorized' to
89
+ > proceed."
90
+
91
+ 2. **Wait for explicit `authorized` response.** Any other answer means STOP.
92
+
93
+ 3. **Record authorization** to `engagement/authorization.md` using the
94
+ template in `templates/authorization.md`. Include:
95
+ - Target URL(s) and IP(s)
96
+ - Authorization basis (ownership / written authz from $name)
97
+ - Engagement window
98
+ - Out-of-scope items (production, third-party services, etc.)
99
+ - Operator name (the user driving this session)
100
+
101
+ 4. **Build scope.txt:**
102
+ ```
+ localhost
+ 127.0.0.1
+ staging.example.com
+ 192.168.1.0/24 # internal lab only, with operator OK
+ ```
108
+
109
+ 5. **Read** `references/scope-enforcement.md` before issuing the first
110
+ active request — that doc has the host-extraction rules you apply
111
+ to every command/URL before it goes out.
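The allowlist check itself can be sketched as follows. This is an assumption-laden illustration: the authoritative host-extraction rules live in `references/scope-enforcement.md`, which this diff does not show, and the `load_scope` and `in_scope` names are hypothetical. The sketch treats each `scope.txt` line as either an IP network (a bare address becomes a single-address network) or an exact hostname, and ignores `#` comments.

```python
import ipaddress
from urllib.parse import urlsplit


def load_scope(lines):
    """Split scope.txt lines into exact hostnames and IP networks."""
    hosts, networks = set(), []
    for raw in lines:
        entry = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not entry:
            continue
        try:
            networks.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            hosts.add(entry.lower())  # not an IP or CIDR, so treat as a hostname
    return hosts, networks


def in_scope(target, hosts, networks):
    """Return True only if the host part of `target` is on the allowlist."""
    # Accept both full URLs and bare "host[:port]" strings.
    host = urlsplit(target if "://" in target else "//" + target).hostname
    if not host:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return host.lower() in hosts  # hostnames must match exactly
    return any(address.version == net.version and address in net
               for net in networks)


SCOPE_LINES = [
    "localhost",
    "127.0.0.1",
    "staging.example.com",
    "192.168.1.0/24 # internal lab only, with operator OK",
]
```

Exact hostname matching is deliberately strict: a subdomain such as `admin.staging.example.com` is refused until the operator adds it to the file.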
112
+
113
+ ---
114
+
115
+ ## Phase 1: Pre-Recon (Code Analysis, optional)
116
+
117
+ Skip if no source access (black-box engagement).
118
+
119
+ If you have read access to the application source:
120
+
121
+ 1. **Map the architecture** — framework, routing, middleware stack
122
+ 2. **Inventory sinks** — every `execute(`, `os.system(`, `eval(`,
123
+ template render, file read/write, redirect target
124
+ 3. **Map auth** — session cookie vs JWT, OAuth flows, password reset,
125
+ privileged endpoints
126
+ 4. **Identify trust boundaries** — what's authenticated, what's not,
127
+ what comes from `request.*`
128
+ 5. **Backward taint** from each sink to a request source. Early-terminate
129
+ when proper sanitization is found (parameterized queries, allowlists,
130
+ `shlex.quote`, well-known escapers).
131
+
132
+ Output: `evidence/pre-recon.md` — architecture map, sink inventory,
133
+ suspected vulnerable code paths.
134
+
135
+ This is OFFLINE work. No traffic to the target.
136
+
137
+ ---
138
+
139
+ ## Phase 2: Recon (Live, Read-Only)
140
+
141
+ Maps the attack surface. All requests are GETs of public pages, no
142
+ payloads yet. Still scope-bounded.
143
+
144
+ 1. **Verify scope.** Resolve every target hostname → IP. Confirm IPs are
145
+ in scope (avoids the "DNS points somewhere unexpected" trap).
146
+
147
+ 2. **Network surface** (only if scope permits port scanning):
148
+ ```bash
+ nmap -sT -T3 --top-ports 100 -oN evidence/nmap.txt $TARGET
+ ```
151
+ Use `-T3` (default), not `-T4/-T5`. Stealthier and avoids tripping
152
+ IDS/IPS in shared environments.
153
+
154
+ 3. **Tech fingerprint:**
155
+ ```bash
+ whatweb -v $TARGET_URL > evidence/whatweb.txt
+ curl -sIk $TARGET_URL > evidence/headers.txt
+ ```
159
+
160
+ 4. **Endpoint discovery:**
161
+ - Crawl the app with the browser tool (`browser_navigate`,
162
+ `browser_get_images`, follow links).
163
+ - Inspect `robots.txt`, `sitemap.xml`, `.well-known/*`.
164
+ - Use the developer tools network panel via browser tool to capture
165
+ XHR/fetch calls.
166
+
167
+ 5. **Auth surface:** Identify login, registration, password reset,
168
+ session cookie names, token formats. Do NOT send credentials yet —
169
+ just observe.
170
+
171
+ 6. **Correlate with pre-recon** (if you have source). For each
172
+ `evidence/pre-recon.md` finding, mark whether the live surface
173
+ confirms it's reachable.
174
+
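The scope check in step 1 could be sketched as follows, assuming the allowed ranges are available as a list of CIDR strings (the real format of `engagement/scope.txt` is not shown here, so that is an assumption). The helper names `resolve` and `out_of_scope` are hypothetical.

```python
import ipaddress
import socket


def resolve(hostname: str) -> set[str]:
    """Return every address the local resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def out_of_scope(addresses, allowed_cidrs):
    """Return the addresses that fall outside every allowed network, sorted."""
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    rejected = []
    for addr in sorted(addresses):
        ip = ipaddress.ip_address(addr)
        if not any(ip in net for net in networks):
            rejected.append(addr)
    return rejected


# Static addresses stand in for resolve("target.example") so the sketch runs offline.
resolved = {"10.0.0.5", "203.0.113.9"}
print(out_of_scope(resolved, ["10.0.0.0/24"]))
```

A non-empty result is the "DNS points somewhere unexpected" case: the hostname is in scope on paper, but one of its addresses is not.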
175
+ Output: `evidence/recon.md` — endpoints, technologies, auth model,
176
+ input vectors.
177
+
178
+ ---
179
+
180
+ ## Phase 3: Vulnerability Analysis
181
+
182
+ One `delegate_task` per vulnerability class. Each agent reads
183
+ `evidence/recon.md` (+ `evidence/pre-recon.md` if present), produces
184
+ `findings/<class>-queue.json` using `templates/exploitation-queue.json`.
185
+
186
+ Use `delegate_task` with these focused subagents (parallel where possible):
187
+
188
+ | Class | Goal | Reference |
189
+ |-------|------|-----------|
190
+ | `injection` | SQLi, command, path traversal, SSTI, LFI/RFI, deserialization | `references/vuln-taxonomy.md` (slot types) |
191
+ | `xss` | Reflected, stored, DOM-based | `references/vuln-taxonomy.md` (render contexts) |
192
+ | `auth` | Login bypass, JWT confusion, session fixation, OAuth flaws | `references/exploitation-techniques.md` |
193
+ | `authz` | IDOR, vertical/horizontal escalation, business logic | `references/exploitation-techniques.md` |
194
+ | `ssrf` | Internal reachability, metadata, protocol smuggling | Skip metadata unless explicitly authorized |
195
+ | `infra` | Misconfig, info disclosure, default creds, exposed admin | `references/exploitation-techniques.md` |
196
+
197
+ Each queue entry has: id, vuln class, source (file:line if known),
198
+ endpoint, parameter, slot type, suspected defense, verdict
199
+ (`identified` / `partial` / `confirmed` / `critical`), witness payload,
200
+ confidence (0-1), notes.
201
+
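One way to picture a queue entry is the dataclass below. The authoritative schema lives in `templates/exploitation-queue.json`, which is not reproduced here, so the field names, the example values, and the `QueueEntry` class itself are assumptions derived only from the field list above.

```python
import json
from dataclasses import asdict, dataclass

# Verdict values as listed in the text above.
VERDICTS = ("identified", "partial", "confirmed", "critical")


@dataclass
class QueueEntry:
    id: str
    vuln_class: str
    endpoint: str
    parameter: str
    slot_type: str
    verdict: str
    confidence: float
    source: str = ""             # file:line, if known
    suspected_defense: str = ""
    witness_payload: str = ""    # staged only; not sent during analysis
    notes: str = ""

    def __post_init__(self):
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


# Hypothetical entry, for illustration only.
entry = QueueEntry(
    id="INJ-001",
    vuln_class="injection",
    endpoint="/search",
    parameter="q",
    slot_type="sql-string",
    verdict="identified",
    confidence=0.4,
    source="app/views.py:88",
)
print(json.dumps(asdict(entry), indent=2))
```

Validating at construction time keeps malformed entries out of the queue before the exploitation phase reads it.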
202
+ The analysis phase doesn't send malicious payloads yet — it stages them.
203
+ The exploitation phase actually fires them.
204
+
205
+ ---
206
+
207
+ ## Phase 4: Exploitation (Proof-Based, Conditional)
208
+
209
+ Only run a subagent per class where the analysis queue has actionable
210
+ entries (`identified` or `partial`).
211
+
212
+ For each candidate:
213
+
214
+ 1. **Pre-send check** — host in scope? auth gate satisfied? payload
215
+ approved if destructive?
216
+ 2. **Send the witness payload** — minimal proof. SQLi: `' AND 1=1--`
217
+ then `' AND 1=2--`. XSS: a benign marker like
218
+ `<svg/onload=console.log("HERMES-PENTEST-XSS")>`. Never `alert(1)` in
219
+ stored XSS — it'll fire for other users in shared environments.
220
+ 3. **Verify the witness fires** — for blind injection, use a sleep
221
+ probe (`SLEEP(5)`) and time the response. For SSRF, use a
222
+ tester-controlled callback host (NOT a public service like
223
+ webhook.site on sensitive engagements, where it becomes an exfil path).
224
+ 4. **Promote level:**
225
+ - **L1 Identified** — pattern matched, no behavior change
226
+ - **L2 Partial** — sink reached, but defense in place
227
+ - **L3 Confirmed** — payload changed app behavior in observable way
228
+ - **L4 Critical** — data extracted, code executed, access escalated
229
+ 5. **Bypass exhaustion before classifying as FP.** For each candidate
230
+ that blocks: try at least the bypass set in
231
+ `references/bypass-techniques.md` for that class. Only after the set
232
+ is exhausted may you write `verdict: false_positive`.
233
+ 6. **Record evidence** for every L3/L4:
234
+ - Full request (method, URL, headers, body)
235
+ - Response (status, headers, relevant body excerpt)
236
+ - Reproducer command (curl one-liner)
237
+ - Impact statement
238
+
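For the timing check in step 3, a single slow response is weak evidence, since network jitter alone can produce it. The sketch below compares repeated probe timings against a baseline; the function name `delay_confirmed`, the 0.8 margin, and the sample timings are assumptions, not values taken from the skill.

```python
import statistics


def delay_confirmed(baseline, probe, sleep_seconds, margin=0.8):
    """Return True only if every probe response was delayed by roughly sleep_seconds.

    baseline: response times (seconds) for the unmodified request
    probe:    response times (seconds) for the request carrying the sleep probe
    """
    if not baseline or not probe:
        raise ValueError("need at least one baseline and one probe sample")
    # Require most of the injected delay on top of the typical response time.
    threshold = statistics.median(baseline) + margin * sleep_seconds
    return all(t >= threshold for t in probe)


baseline_times = [0.21, 0.19, 0.25]
print(delay_confirmed(baseline_times, [5.2, 5.3], sleep_seconds=5))
print(delay_confirmed(baseline_times, [5.2, 0.3], sleep_seconds=5))
```

Requiring every probe to clear the threshold trades a little sensitivity for fewer false promotions to L3.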
239
+ Output: `findings/exploitation-evidence.md`
240
+
241
+ **Redact in evidence files:**
242
+ - Any captured credentials/tokens → last 6 chars only in chat;
243
+ full value to `findings/secrets-vault.md` (gitignored).
244
+ - Other users' PII → redact.
245
+ - Your test credentials → fine to keep.
246
+
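The "last 6 chars only" rule for captured secrets could look like the helper below. The name `redact_for_chat`, the `***` prefix, and the sample token are assumptions for illustration; writing the full value to `findings/secrets-vault.md` is left out of the sketch.

```python
def redact_for_chat(secret: str, keep: int = 6) -> str:
    """Return a chat-safe form of a captured secret: only the last `keep` characters."""
    if len(secret) <= keep:
        # Too short to show a suffix without revealing the whole value.
        return "***"
    return "***" + secret[-keep:]


# Hypothetical token value, not a real credential.
print(redact_for_chat("tok_live_abcdef123456"))
print(redact_for_chat("short"))
```

The fixed `***` prefix avoids leaking the secret's length, which a same-length mask would reveal.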
247
+ ---
248
+
249
+ ## Phase 5: Reporting
250
+
251
+ Generate the final report using `templates/pentest-report.md`. Sections:
252
+
253
+ 1. Executive summary
254
+ 2. Engagement scope (from `engagement/scope.txt`)
255
+ 3. Authorization (from `engagement/authorization.md`)
256
+ 4. Findings (L3/L4 only — proof-required). Per finding:
257
+ - Title, severity (CVSS 3.1), CWE
258
+ - Affected endpoint(s)
259
+ - Proof (request + response excerpt)
260
+ - Reproduction steps
261
+ - Impact
262
+ - Remediation
263
+ 5. Not-exploited candidates (L1/L2 with notes on what blocked them)
264
+ 6. Out-of-scope observations
265
+ 7. Methodology / tools used
266
+ 8. Limitations and what was NOT tested
267
+
268
+ **Severity policy:** CVSS only for L3/L4. L1/L2 are "candidates pending
269
+ verification" — don't assign CVSS to unverified findings.
270
+
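The split between report sections 4 and 5, together with the severity policy, can be sketched as a small filter. The `Level` enum mirrors the L1 to L4 ladder from Phase 4; the dictionary keys (`id`, `level`, `cvss`) and the sample findings are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    IDENTIFIED = 1
    PARTIAL = 2
    CONFIRMED = 3
    CRITICAL = 4


def split_for_report(findings):
    """Separate proven findings (L3/L4) from unverified candidates (L1/L2)."""
    proven = [f for f in findings if f["level"] >= Level.CONFIRMED]
    candidates = [f for f in findings if f["level"] < Level.CONFIRMED]
    # Severity policy: unverified candidates must not carry a CVSS score.
    scored = [f["id"] for f in candidates if f.get("cvss") is not None]
    if scored:
        raise ValueError(f"CVSS assigned to unverified findings: {scored}")
    return proven, candidates


# Hypothetical findings, for illustration only.
findings = [
    {"id": "INJ-001", "level": Level.CRITICAL, "cvss": 9.8},
    {"id": "XSS-002", "level": Level.PARTIAL, "cvss": None},
    {"id": "AUTH-003", "level": Level.CONFIRMED, "cvss": 6.5},
]
proven, candidates = split_for_report(findings)
print([f["id"] for f in proven])
print([f["id"] for f in candidates])
```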
271
+ ---
272
+
273
+ ## When to Stop
274
+
275
+ - The user revokes authorization.
276
+ - A candidate finding clearly impacts production data and you don't have
277
+ approval for destructive testing — STOP and ask.
278
+ - The target starts returning 503/429 storms — back off, reconvene with
279
+ the operator.
280
+ - You discover something *outside* the contracted scope (e.g. an exposed
281
+ customer database while testing an unrelated endpoint). STOP, document,
282
+ report to the operator. Do not pivot without explicit approval: an
283
+ unauthorized pivot is what turns a pentest into illegal access.
284
+
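The text does not define what counts as a 503/429 "storm", so the sketch below picks one possible definition: at least half of the last N responses were 429 or 503. The class name `StormDetector`, the window size, and the threshold are all assumptions to be tuned per engagement.

```python
from collections import deque


class StormDetector:
    """Track recent HTTP status codes and flag sustained 429/503 responses."""

    def __init__(self, window: int = 20, threshold: float = 0.5):
        self.recent = deque(maxlen=window)  # True where the status was 429 or 503
        self.threshold = threshold

    def record(self, status: int) -> None:
        self.recent.append(status in (429, 503))

    def should_stop(self) -> bool:
        # Wait for a full window so a single early 429 does not halt the run.
        if len(self.recent) < self.recent.maxlen:
            return False
        return sum(self.recent) / len(self.recent) >= self.threshold


detector = StormDetector(window=4)
for status in (200, 429, 503, 429):
    detector.record(status)
print(detector.should_stop())
```

When `should_stop()` turns true, the run pauses and the decision goes back to the operator rather than being retried automatically.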
285
+ ---
286
+
287
+ ## What This Skill Does NOT Cover
288
+
289
+ - Network-layer pentesting beyond port scanning (no Metasploit,
290
+ Cobalt Strike, AD attacks, network protocol fuzzing).
291
+ - Reverse engineering / binary analysis (see issue #383).
292
+ - Source-only static analysis (see issue #382).
293
+ - Active social engineering / phishing.
294
+ - Anything against systems the operator hasn't pre-authorized.
295
+
296
+ If the engagement needs any of these, escalate to a professional
297
+ pentester. This skill complements professional pentesting; it does
298
+ not replace it.
299
+
300
+ ---
301
+
302
+ ## Further Reading
303
+
304
+ - `references/scope-enforcement.md` — how to bound every active request
305
+ - `references/vuln-taxonomy.md` — slot types, render contexts, OWASP map
306
+ - `references/exploitation-techniques.md` — per-class payload patterns
307
+ - `references/bypass-techniques.md` — common WAF/filter bypasses
308
+ - `templates/authorization.md` — engagement authorization template
309
+ - `templates/pentest-report.md` — final report template
310
+ - `templates/exploitation-queue.json` — per-class finding queue schema
311
+ - `scripts/recon-scan.sh` — rate-limited nmap+whatweb+headers wrapper
312
+
313
+ ## Mandatory actions when this skill is active
314
+
315
+ Before applying this skill:
316
+ - [ ] Read the task requirements fully before acting
317
+ - [ ] Confirm you understand the goal and constraints
318
+ - [ ] Check for existing work or prior context in the codebase
319
+
320
+ While working:
321
+ - [ ] Follow the methodology described above step by step
322
+ - [ ] Document any decisions or findings as you go
323
+
324
+ After completing:
325
+ - [ ] Self-check: does the output satisfy the original requirement?
326
+ - [ ] Verify no regressions or unintended side effects
327
+