mindforge-cc 11.5.1 → 11.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (170)
  1. package/.agent/mindforge/skill-tdd.md +53 -0
  2. package/.agent/mindforge/skills-index.md +118 -0
  3. package/.agent/mindforge/systematic-debug.md +60 -0
  4. package/.agent/skills/1password-skill/SKILL.md +156 -0
  5. package/.agent/skills/1password-skill/references/cli-examples.md +31 -0
  6. package/.agent/skills/1password-skill/references/get-started.md +21 -0
  7. package/.agent/skills/article-illustrator/SKILL.md +199 -0
  8. package/.agent/skills/article-illustrator/references/prompt-construction.md +426 -0
  9. package/.agent/skills/article-illustrator/references/style-presets.md +80 -0
  10. package/.agent/skills/article-illustrator/references/styles.md +224 -0
  11. package/.agent/skills/article-illustrator/references/usage.md +50 -0
  12. package/.agent/skills/article-illustrator/references/workflow.md +332 -0
  13. package/.agent/skills/arxiv/SKILL.md +275 -0
  14. package/.agent/skills/blogwatcher/SKILL.md +130 -0
  15. package/.agent/skills/code-wiki/SKILL.md +438 -0
  16. package/.agent/skills/code-wiki/templates/README.md +31 -0
  17. package/.agent/skills/code-wiki/templates/architecture.md +30 -0
  18. package/.agent/skills/code-wiki/templates/getting-started.md +47 -0
  19. package/.agent/skills/code-wiki/templates/module.md +38 -0
  20. package/.agent/skills/codebase-inspection/SKILL.md +109 -0
  21. package/.agent/skills/comic-creator/SKILL.md +240 -0
  22. package/.agent/skills/comic-creator/references/analysis-framework.md +176 -0
  23. package/.agent/skills/comic-creator/references/auto-selection.md +71 -0
  24. package/.agent/skills/comic-creator/references/base-prompt.md +98 -0
  25. package/.agent/skills/comic-creator/references/character-template.md +180 -0
  26. package/.agent/skills/comic-creator/references/ohmsha-guide.md +85 -0
  27. package/.agent/skills/comic-creator/references/partial-workflows.md +106 -0
  28. package/.agent/skills/comic-creator/references/storyboard-template.md +143 -0
  29. package/.agent/skills/comic-creator/references/workflow.md +401 -0
  30. package/.agent/skills/concept-diagrams/SKILL.md +355 -0
  31. package/.agent/skills/concept-diagrams/references/dashboard-patterns.md +43 -0
  32. package/.agent/skills/concept-diagrams/references/infrastructure-patterns.md +144 -0
  33. package/.agent/skills/concept-diagrams/references/physical-shape-cookbook.md +42 -0
  34. package/.agent/skills/creative-ideation/SKILL.md +144 -0
  35. package/.agent/skills/creative-ideation/references/full-prompt-library.md +110 -0
  36. package/.agent/skills/devops-cli/SKILL.md +149 -0
  37. package/.agent/skills/devops-cli/references/app-discovery.md +112 -0
  38. package/.agent/skills/devops-cli/references/authentication.md +59 -0
  39. package/.agent/skills/devops-cli/references/cli-reference.md +104 -0
  40. package/.agent/skills/devops-cli/references/running-apps.md +171 -0
  41. package/.agent/skills/devops-watchers/SKILL.md +103 -0
  42. package/.agent/skills/docker-management/SKILL.md +273 -0
  43. package/.agent/skills/domain-intel/SKILL.md +96 -0
  44. package/.agent/skills/duckduckgo-search/SKILL.md +230 -0
  45. package/.agent/skills/github-auth/SKILL.md +240 -0
  46. package/.agent/skills/github-code-review/SKILL.md +474 -0
  47. package/.agent/skills/github-code-review/references/review-output-template.md +74 -0
  48. package/.agent/skills/github-issues/SKILL.md +363 -0
  49. package/.agent/skills/github-issues/templates/bug-report.md +35 -0
  50. package/.agent/skills/github-issues/templates/feature-request.md +31 -0
  51. package/.agent/skills/github-pr-workflow/SKILL.md +360 -0
  52. package/.agent/skills/github-pr-workflow/references/ci-troubleshooting.md +183 -0
  53. package/.agent/skills/github-pr-workflow/references/conventional-commits.md +71 -0
  54. package/.agent/skills/github-pr-workflow/templates/pr-body-bugfix.md +35 -0
  55. package/.agent/skills/github-pr-workflow/templates/pr-body-feature.md +33 -0
  56. package/.agent/skills/github-repo-management/SKILL.md +509 -0
  57. package/.agent/skills/github-repo-management/references/github-api-cheatsheet.md +161 -0
  58. package/.agent/skills/godmode/SKILL.md +396 -0
  59. package/.agent/skills/godmode/references/jailbreak-templates.md +128 -0
  60. package/.agent/skills/godmode/references/refusal-detection.md +142 -0
  61. package/.agent/skills/hyperframes/SKILL.md +182 -0
  62. package/.agent/skills/hyperframes/references/cli.md +185 -0
  63. package/.agent/skills/hyperframes/references/composition.md +129 -0
  64. package/.agent/skills/hyperframes/references/features.md +289 -0
  65. package/.agent/skills/hyperframes/references/gsap.md +136 -0
  66. package/.agent/skills/hyperframes/references/troubleshooting.md +137 -0
  67. package/.agent/skills/hyperframes/references/website-to-video.md +145 -0
  68. package/.agent/skills/jupyter-live-kernel/SKILL.md +160 -0
  69. package/.agent/skills/kanban-orchestrator/SKILL.md +209 -0
  70. package/.agent/skills/kanban-worker/SKILL.md +188 -0
  71. package/.agent/skills/llm-wiki/SKILL.md +499 -0
  72. package/.agent/skills/meme-generation/SKILL.md +122 -0
  73. package/.agent/skills/node-inspect-debugger/SKILL.md +312 -0
  74. package/.agent/skills/obsidian/SKILL.md +60 -0
  75. package/.agent/skills/osint-investigation/SKILL.md +269 -0
  76. package/.agent/skills/osint-investigation/templates/source-template.md +59 -0
  77. package/.agent/skills/oss-forensics/SKILL.md +422 -0
  78. package/.agent/skills/oss-forensics/references/evidence-types.md +89 -0
  79. package/.agent/skills/oss-forensics/references/github-archive-guide.md +184 -0
  80. package/.agent/skills/oss-forensics/references/investigation-templates.md +131 -0
  81. package/.agent/skills/oss-forensics/references/recovery-techniques.md +164 -0
  82. package/.agent/skills/oss-forensics/templates/forensic-report.md +151 -0
  83. package/.agent/skills/oss-forensics/templates/malicious-package-report.md +43 -0
  84. package/.agent/skills/parallel-cli/SKILL.md +384 -0
  85. package/.agent/skills/pinggy-tunnel/SKILL.md +302 -0
  86. package/.agent/skills/pixel-art/SKILL.md +209 -0
  87. package/.agent/skills/pixel-art/references/palettes.md +49 -0
  88. package/.agent/skills/plan/SKILL.md +331 -0
  89. package/.agent/skills/polymarket/SKILL.md +75 -0
  90. package/.agent/skills/polymarket/references/api-endpoints.md +220 -0
  91. package/.agent/skills/python-debugpy/SKILL.md +368 -0
  92. package/.agent/skills/requesting-code-review/SKILL.md +273 -0
  93. package/.agent/skills/research-paper-writing/SKILL.md +2367 -0
  94. package/.agent/skills/research-paper-writing/references/autoreason-methodology.md +394 -0
  95. package/.agent/skills/research-paper-writing/references/checklists.md +434 -0
  96. package/.agent/skills/research-paper-writing/references/citation-workflow.md +563 -0
  97. package/.agent/skills/research-paper-writing/references/experiment-patterns.md +728 -0
  98. package/.agent/skills/research-paper-writing/references/human-evaluation.md +476 -0
  99. package/.agent/skills/research-paper-writing/references/paper-types.md +481 -0
  100. package/.agent/skills/research-paper-writing/references/reviewer-guidelines.md +433 -0
  101. package/.agent/skills/research-paper-writing/references/sources.md +191 -0
  102. package/.agent/skills/research-paper-writing/references/writing-guide.md +474 -0
  103. package/.agent/skills/research-paper-writing/templates/README.md +251 -0
  104. package/.agent/skills/rest-graphql-debug/SKILL.md +507 -0
  105. package/.agent/skills/s6-container-supervision/SKILL.md +171 -0
  106. package/.agent/skills/scrapling/SKILL.md +328 -0
  107. package/.agent/skills/sherlock/SKILL.md +186 -0
  108. package/.agent/skills/simplify-code/SKILL.md +168 -0
  109. package/.agent/skills/skill-authoring/SKILL.md +158 -0
  110. package/.agent/skills/spike/SKILL.md +190 -0
  111. package/.agent/skills/subagent-driven-development/SKILL.md +345 -0
  112. package/.agent/skills/subagent-driven-development/references/context-budget-discipline.md +53 -0
  113. package/.agent/skills/subagent-driven-development/references/gates-taxonomy.md +93 -0
  114. package/.agent/skills/systematic-debugging/SKILL.md +360 -0
  115. package/.agent/skills/test-driven-development/SKILL.md +336 -0
  116. package/.agent/skills/video-orchestrator/SKILL.md +194 -0
  117. package/.agent/skills/video-orchestrator/references/examples.md +227 -0
  118. package/.agent/skills/video-orchestrator/references/intake.md +166 -0
  119. package/.agent/skills/video-orchestrator/references/kanban-setup.md +278 -0
  120. package/.agent/skills/video-orchestrator/references/monitoring.md +180 -0
  121. package/.agent/skills/video-orchestrator/references/role-archetypes.md +298 -0
  122. package/.agent/skills/video-orchestrator/references/tool-matrix.md +317 -0
  123. package/.agent/skills/web-pentest/SKILL.md +332 -0
  124. package/.agent/skills/web-pentest/references/bypass-techniques.md +133 -0
  125. package/.agent/skills/web-pentest/references/exploitation-techniques.md +204 -0
  126. package/.agent/skills/web-pentest/references/scope-enforcement.md +110 -0
  127. package/.agent/skills/web-pentest/references/vuln-taxonomy.md +81 -0
  128. package/.agent/skills/web-pentest/templates/authorization.md +69 -0
  129. package/.agent/skills/web-pentest/templates/pentest-report.md +178 -0
  130. package/.claude/commands/mindforge/skill-tdd.md +53 -0
  131. package/.claude/commands/mindforge/skills-index.md +118 -0
  132. package/.claude/commands/mindforge/systematic-debug.md +60 -0
  133. package/.mindforge/config.json +2 -2
  134. package/.mindforge/memory/sync-manifest.json +1 -1
  135. package/.mindforge/skills/arxiv/SKILL.md +294 -0
  136. package/.mindforge/skills/blogwatcher/SKILL.md +147 -0
  137. package/.mindforge/skills/code-wiki/SKILL.md +457 -0
  138. package/.mindforge/skills/codebase-inspection/SKILL.md +126 -0
  139. package/.mindforge/skills/concept-diagrams/SKILL.md +373 -0
  140. package/.mindforge/skills/creative-ideation/SKILL.md +162 -0
  141. package/.mindforge/skills/domain-intel/SKILL.md +116 -0
  142. package/.mindforge/skills/duckduckgo-search/SKILL.md +249 -0
  143. package/.mindforge/skills/github-code-review/SKILL.md +493 -0
  144. package/.mindforge/skills/github-issues/SKILL.md +382 -0
  145. package/.mindforge/skills/github-pr-workflow/SKILL.md +379 -0
  146. package/.mindforge/skills/jupyter-live-kernel/SKILL.md +179 -0
  147. package/.mindforge/skills/kanban-orchestrator/SKILL.md +227 -0
  148. package/.mindforge/skills/kanban-worker/SKILL.md +206 -0
  149. package/.mindforge/skills/meme-generation/SKILL.md +141 -0
  150. package/.mindforge/skills/obsidian/SKILL.md +80 -0
  151. package/.mindforge/skills/osint-investigation/SKILL.md +288 -0
  152. package/.mindforge/skills/oss-forensics/SKILL.md +421 -0
  153. package/.mindforge/skills/pixel-art/SKILL.md +228 -0
  154. package/.mindforge/skills/plan/SKILL.md +350 -0
  155. package/.mindforge/skills/requesting-code-review/SKILL.md +292 -0
  156. package/.mindforge/skills/research-paper-writing/SKILL.md +2384 -0
  157. package/.mindforge/skills/scrapling/SKILL.md +345 -0
  158. package/.mindforge/skills/sherlock/SKILL.md +203 -0
  159. package/.mindforge/skills/simplify-code/SKILL.md +187 -0
  160. package/.mindforge/skills/spike/SKILL.md +209 -0
  161. package/.mindforge/skills/subagent-driven-development/SKILL.md +364 -0
  162. package/.mindforge/skills/systematic-debugging/SKILL.md +379 -0
  163. package/.mindforge/skills/test-driven-development/SKILL.md +355 -0
  164. package/.mindforge/skills/web-pentest/SKILL.md +327 -0
  165. package/CHANGELOG.md +43 -0
  166. package/MINDFORGE.md +2 -2
  167. package/README.md +39 -3
  168. package/RELEASENOTES.md +55 -0
  169. package/docs/getting-started.md +42 -5
  170. package/package.json +1 -1
@@ -0,0 +1,345 @@
1
+ ---
2
+ name: scrapling
3
+ description: "Web scraping with Scrapling - HTTP fetching, stealth browser automation, Cloudflare bypass, and spider crawling via CLI and Python."
4
+ version: 1.0.0
5
+ status: stable
6
+ min_mindforge_version: 11.5.1
7
+ triggers: scrape website, web scraping, extract web content, scrape page, web page scraping, extract from website, html scraping, scrape data, web extraction, crawl page, scrape url, web content extraction
8
+ ---
9
+
10
+ # Scrapling
11
+
12
+ [Scrapling](https://github.com/D4Vinci/Scrapling) is a web scraping framework with anti-bot bypass, stealth browser automation, and a spider framework. It provides three fetching strategies (HTTP, dynamic JS, stealth/Cloudflare) and a full CLI.
13
+
14
+ **This skill is for educational and research purposes only.** Users must comply with local/international data scraping laws and respect website Terms of Service.
15
+
16
+ ## When to Use
17
+
18
+ - Scraping static HTML pages (faster than browser tools)
19
+ - Scraping JS-rendered pages that need a real browser
20
+ - Bypassing Cloudflare Turnstile or bot detection
21
+ - Crawling multiple pages with a spider
22
+ - When the built-in `web_extract` tool does not return the data you need
23
+
24
+ ## Installation
25
+
26
+ ```bash
27
+ pip install "scrapling[all]"
28
+ scrapling install
29
+ ```
30
+
31
+ Minimal install (HTTP only, no browser):
32
+ ```bash
33
+ pip install scrapling
34
+ ```
35
+
36
+ With browser automation only:
37
+ ```bash
38
+ pip install "scrapling[fetchers]"
39
+ scrapling install
40
+ ```
41
+
42
+ ## Quick Reference
43
+
44
+ | Approach | Class | Use When |
45
+ |----------|-------|----------|
46
+ | HTTP | `Fetcher` / `FetcherSession` | Static pages, APIs, fast bulk requests |
47
+ | Dynamic | `DynamicFetcher` / `DynamicSession` | JS-rendered content, SPAs |
48
+ | Stealth | `StealthyFetcher` / `StealthySession` | Cloudflare, anti-bot protected sites |
49
+ | Spider | `Spider` | Multi-page crawling with link following |
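The table above can be read as a small decision rule. The sketch below is only an illustration of that reading: `choose_fetcher` and its three boolean parameters are hypothetical names, not part of Scrapling, and the precedence (crawling first, then anti-bot, then JS) is an assumption.

```python
def choose_fetcher(needs_js=False, anti_bot=False, multi_page=False):
    """Return the class name suggested by the Quick Reference table.

    Hypothetical helper: the precedence used here is an assumption,
    not something the Scrapling documentation prescribes.
    """
    if multi_page:
        return "Spider"            # multi-page crawling with link following
    if anti_bot:
        return "StealthyFetcher"   # Cloudflare / anti-bot protected sites
    if needs_js:
        return "DynamicFetcher"    # JS-rendered content, SPAs
    return "Fetcher"               # static pages, APIs, fast bulk requests


print(choose_fetcher())                 # static page
print(choose_fetcher(needs_js=True))    # SPA
print(choose_fetcher(anti_bot=True))    # protected site
```

Starting with the cheapest option that works keeps runs fast, since the browser-based fetchers cost far more per request than plain HTTP.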
50
+
51
+ ## CLI Usage
52
+
53
+ ### Extract Static Page
54
+
55
+ ```bash
56
+ scrapling extract get 'https://example.com' output.md
57
+ ```
58
+
59
+ With CSS selector and browser impersonation:
60
+
61
+ ```bash
62
+ scrapling extract get 'https://example.com' output.md \
63
+ --css-selector '.content' \
64
+ --impersonate 'chrome'
65
+ ```
66
+
67
+ ### Extract JS-Rendered Page
68
+
69
+ ```bash
70
+ scrapling extract fetch 'https://example.com' output.md \
71
+ --css-selector '.dynamic-content' \
72
+ --disable-resources \
73
+ --network-idle
74
+ ```
75
+
76
+ ### Extract Cloudflare-Protected Page
77
+
78
+ ```bash
79
+ scrapling extract stealthy-fetch 'https://protected-site.com' output.html \
80
+ --solve-cloudflare \
81
+ --block-webrtc \
82
+ --hide-canvas
83
+ ```
84
+
85
+ ### POST Request
86
+
87
+ ```bash
88
+ scrapling extract post 'https://example.com/api' output.json \
89
+ --json '{"query": "search term"}'
90
+ ```
91
+
92
+ ### Output Formats
93
+
94
+ The output format is determined by the file extension:
95
+ - `.html` -- raw HTML
96
+ - `.md` -- converted to Markdown
97
+ - `.txt` -- plain text
98
+ - `.json` / `.jsonl` -- JSON
99
+
100
+ ## Python: HTTP Scraping
101
+
102
+ ### Single Request
103
+
104
+ ```python
105
+ from scrapling.fetchers import Fetcher
106
+
107
+ page = Fetcher.get('https://quotes.toscrape.com/')
108
+ quotes = page.css('.quote .text::text').getall()
109
+ for q in quotes:
110
+     print(q)
111
+ ```
112
+
113
+ ### Session (Persistent Cookies)
114
+
115
+ ```python
116
+ from scrapling.fetchers import FetcherSession
117
+
118
+ with FetcherSession(impersonate='chrome') as session:
119
+     page = session.get('https://example.com/', stealthy_headers=True)
120
+     links = page.css('a::attr(href)').getall()
121
+     for link in links[:5]:
122
+         sub = session.get(link)
123
+         print(sub.css('h1::text').get())
124
+ ```
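Values pulled out with `a::attr(href)` are often relative (`/about`, `page2.html`) or not HTTP links at all (`mailto:`), so they may need normalizing before being passed back to `session.get`. The following is a minimal standard-library sketch of that step; `absolutize` is a hypothetical helper name, and it deliberately does not call Scrapling.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def absolutize(base_url, hrefs):
    """Resolve hrefs against base_url, keeping unique http(s) URLs in order."""
    seen = set()
    result = []
    for href in hrefs:
        # Resolve relative references, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, ...
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


links = absolutize(
    "https://example.com/a/",
    ["/x", "b", "#top", "mailto:a@b.c", "/x"],
)
print(links)
```

In the session example, the list returned here would take the place of `links` before the loop.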
125
+
126
+ ### POST / PUT / DELETE
127
+
128
+ ```python
129
+ page = Fetcher.post('https://api.example.com/data', json={"key": "value"})
130
+ page = Fetcher.put('https://api.example.com/item/1', data={"name": "updated"})
131
+ page = Fetcher.delete('https://api.example.com/item/1')
132
+ ```
133
+
134
+ ### With Proxy
135
+
136
+ ```python
137
+ page = Fetcher.get('https://example.com', proxy='http://user:pass@proxy:8080')
138
+ ```
139
+
140
+ ## Python: Dynamic Pages (JS-Rendered)
141
+
142
+ For pages that require JavaScript execution (SPAs, lazy-loaded content):
143
+
144
+ ```python
145
+ from scrapling.fetchers import DynamicFetcher
146
+
147
+ page = DynamicFetcher.fetch('https://example.com', headless=True)
148
+ data = page.css('.js-loaded-content::text').getall()
149
+ ```
150
+
151
+ ### Wait for Specific Element
152
+
153
+ ```python
154
+ page = DynamicFetcher.fetch(
155
+     'https://example.com',
156
+     wait_selector='.results', wait_selector_state='visible',
157
+     network_idle=True,
158
+ )
159
+ ```
160
+
161
+ ### Disable Resources for Speed
162
+
163
+ Blocks fonts, images, media, stylesheets (~25% faster):
164
+
165
+ ```python
166
+ from scrapling.fetchers import DynamicSession
167
+
168
+ with DynamicSession(headless=True, disable_resources=True, network_idle=True) as session:
169
+     page = session.fetch('https://example.com')
170
+     items = page.css('.item::text').getall()
171
+ ```
172
+
173
+ ### Custom Page Automation
174
+
175
+ ```python
176
+ from playwright.sync_api import Page
177
+ from scrapling.fetchers import DynamicFetcher
178
+
179
+ def scroll_and_click(page: Page):
180
+     page.mouse.wheel(0, 3000)
181
+     page.wait_for_timeout(1000)
182
+     page.click('button.load-more')
183
+     page.wait_for_selector('.extra-results')
184
+
185
+ page = DynamicFetcher.fetch('https://example.com', page_action=scroll_and_click)
186
+ results = page.css('.extra-results .item::text').getall()
187
+ ```
188
+
189
+ ## Python: Stealth Mode (Anti-Bot Bypass)
190
+
191
+ For Cloudflare-protected or heavily fingerprinted sites:
192
+
193
+ ```python
194
+ from scrapling.fetchers import StealthyFetcher
195
+
196
+ page = StealthyFetcher.fetch(
197
+     'https://protected-site.com',
198
+     headless=True,
199
+     solve_cloudflare=True,
200
+     block_webrtc=True,
201
+     hide_canvas=True,
202
+ )
203
+ content = page.css('.protected-content::text').getall()
204
+ ```
205
+
206
+ ### Stealth Session
207
+
208
+ ```python
209
+ from scrapling.fetchers import StealthySession
210
+
211
+ with StealthySession(headless=True, solve_cloudflare=True) as session:
212
+     page1 = session.fetch('https://protected-site.com/page1')
213
+     page2 = session.fetch('https://protected-site.com/page2')
214
+ ```
215
+
216
+ ## Element Selection
217
+
218
+ All fetchers return a `Selector` object with these methods:
219
+
220
+ ### CSS Selectors
221
+
222
+ ```python
223
+ page.css('h1::text').get() # First h1 text
224
+ page.css('a::attr(href)').getall() # All link hrefs
225
+ page.css('.quote .text::text').getall() # Nested selection
226
+ ```
227
+
228
+ ### XPath
229
+
230
+ ```python
231
+ page.xpath('//div[@class="content"]/text()').getall()
232
+ page.xpath('//a/@href').getall()
233
+ ```
234
+
235
+ ### Find Methods
236
+
237
+ ```python
238
+ page.find_all('div', class_='quote') # By tag + attribute
239
+ page.find_by_text('Read more', tag='a') # By text content
240
+ page.find_by_regex(r'\$\d+\.\d{2}') # By regex pattern
241
+ ```
242
+
243
+ ### Similar Elements
244
+
245
+ Find elements with similar structure (useful for product listings, etc.):
246
+
247
+ ```python
248
+ first_product = page.css('.product')[0]
249
+ all_similar = first_product.find_similar()
250
+ ```
251
+
252
+ ### Navigation
253
+
254
+ ```python
255
+ el = page.css('.target')[0]
256
+ el.parent # Parent element
257
+ el.children # Child elements
258
+ el.next_sibling # Next sibling
259
+ el.prev_sibling # Previous sibling
260
+ ```
261
+
262
+ ## Python: Spider Framework
263
+
264
+ For multi-page crawling with link following:
265
+
266
+ ```python
267
+ from scrapling.spiders import Spider, Request, Response
268
+
269
+ class QuotesSpider(Spider):
270
+     name = "quotes"
271
+     start_urls = ["https://quotes.toscrape.com/"]
272
+     concurrent_requests = 10
273
+     download_delay = 1
274
+
275
+     async def parse(self, response: Response):
276
+         for quote in response.css('.quote'):
277
+             yield {
278
+                 "text": quote.css('.text::text').get(),
279
+                 "author": quote.css('.author::text').get(),
280
+                 "tags": quote.css('.tag::text').getall(),
281
+             }
282
+
283
+         next_page = response.css('.next a::attr(href)').get()
284
+         if next_page:
285
+             yield response.follow(next_page)
286
+
287
+ result = QuotesSpider().start()
288
+ print(f"Scraped {len(result.items)} quotes")
289
+ result.items.to_json("quotes.json")
290
+ ```
291
+
292
+ ### Multi-Session Spider
293
+
294
+ Route requests to different fetcher types:
295
+
296
+ ```python
297
+ from scrapling.fetchers import FetcherSession, AsyncStealthySession
298
+
299
+ class SmartSpider(Spider):
300
+     name = "smart"
301
+     start_urls = ["https://example.com/"]
302
+
303
+     def configure_sessions(self, manager):
304
+         manager.add("fast", FetcherSession(impersonate="chrome"))
305
+         manager.add("stealth", AsyncStealthySession(headless=True), lazy=True)
306
+
307
+     async def parse(self, response: Response):
308
+         for link in response.css('a::attr(href)').getall():
309
+             if "protected" in link:
310
+                 yield Request(link, sid="stealth")
311
+             else:
312
+                 yield Request(link, sid="fast", callback=self.parse)
313
+ ```
314
+
315
+ ### Pause/Resume Crawling
316
+
317
+ ```python
318
+ spider = QuotesSpider(crawldir="./crawl_checkpoint")
319
+ spider.start() # Ctrl+C to pause, re-run to resume from checkpoint
320
+ ```
321
+
322
+ ## Pitfalls
323
+
324
+ - **Browser install required**: run `scrapling install` after pip install -- without it, `DynamicFetcher` and `StealthyFetcher` will fail
325
+ - **Timeouts**: DynamicFetcher/StealthyFetcher timeout is in **milliseconds** (default 30000), Fetcher timeout is in **seconds**
326
+ - **Cloudflare bypass**: `solve_cloudflare=True` adds 5-15 seconds to fetch time -- only enable when needed
327
+ - **Resource usage**: StealthyFetcher runs a real browser -- limit concurrent usage
328
+ - **Legal**: always check robots.txt and website ToS before scraping. This library is for educational and research purposes
329
+ - **Python version**: requires Python 3.10+
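The timeout pitfall is easy to trip over when one code path serves several fetcher types. A minimal sketch of a unit-converting helper follows; `timeout_for` and the `"http"` / `"dynamic"` / `"stealth"` labels are hypothetical names chosen for illustration, and only the units (seconds versus milliseconds) come from the note above.

```python
def timeout_for(fetcher_kind, seconds):
    """Convert a timeout given in seconds to the unit each fetcher expects.

    Assumption (from the pitfall above): Fetcher takes seconds, while
    DynamicFetcher and StealthyFetcher take milliseconds.
    """
    if fetcher_kind == "http":
        return seconds
    if fetcher_kind in ("dynamic", "stealth"):
        return int(seconds * 1000)
    raise ValueError(f"unknown fetcher kind: {fetcher_kind!r}")


print(timeout_for("http", 30))      # seconds, for Fetcher
print(timeout_for("dynamic", 30))   # milliseconds, for DynamicFetcher
```

Keeping seconds as the single unit in calling code and converting at the boundary avoids a 30-millisecond timeout being passed to a browser fetch by mistake.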
330
+
331
+ ## Mandatory actions when this skill is active
332
+
333
+ Before applying this skill:
334
+ - [ ] Read the task requirements fully before acting
335
+ - [ ] Confirm you understand the goal and constraints
336
+ - [ ] Check for existing work or prior context in the codebase
337
+
338
+ While working:
339
+ - [ ] Follow the methodology described above step by step
340
+ - [ ] Document any decisions or findings as you go
341
+
342
+ After completing:
343
+ - [ ] Self-check: does the output satisfy the original requirement?
344
+ - [ ] Verify no regressions or unintended side effects
345
+
@@ -0,0 +1,203 @@
1
+ ---
2
+ name: sherlock
3
+ description: "OSINT username search across 400+ social networks. Hunt down social media accounts by username."
4
+ version: 1.0.0
5
+ status: stable
6
+ min_mindforge_version: 11.5.1
7
+ triggers: sherlock, username investigation, find accounts, OSINT username, social media investigation, find social accounts, username search, account discovery, username osint, find profiles, sherlock username, account investigation
8
+ ---
9
+
10
+ # Sherlock OSINT Username Search
11
+
12
+ Hunt down social media accounts by username across 400+ social networks using the [Sherlock Project](https://github.com/sherlock-project/sherlock).
13
+
14
+ ## When to Use
15
+
16
+ - User asks to find accounts associated with a username
17
+ - User wants to check username availability across platforms
18
+ - User is conducting OSINT or reconnaissance research
19
+ - User asks "where is this username registered?" or similar
20
+
21
+ ## Requirements
22
+
23
+ - Sherlock CLI installed: `pipx install sherlock-project` or `pip install sherlock-project`
24
+ - Alternatively: Docker available (`docker run -it --rm sherlock/sherlock`)
25
+ - Network access to query social platforms
26
+
27
+ ## Procedure
28
+
29
+ ### 1. Check if Sherlock is Installed
30
+
31
+ **Before doing anything else**, verify sherlock is available:
32
+
33
+ ```bash
34
+ sherlock --version
35
+ ```
36
+
37
+ If the command fails:
38
+ - Offer to install: `pipx install sherlock-project` (recommended) or `pip install sherlock-project`
39
+ - **Do NOT** try multiple installation methods — pick one and proceed
40
+ - If installation fails, inform the user and stop
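When the check is driven from a script rather than typed by hand, the same "is it installed?" question can be answered without running the tool at all. This is a small sketch using only the standard library; `find_cli` is a hypothetical helper name.

```python
import shutil


def find_cli(name="sherlock"):
    """Return the full path of an executable on PATH, or None if absent."""
    return shutil.which(name)


path = find_cli()
if path is None:
    print("sherlock not found; offer: pipx install sherlock-project")
else:
    print(f"sherlock available at {path}")
```

A `None` result corresponds to the "command fails" branch above: offer one installation method and stop if it does not succeed.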
41
+
42
+ ### 2. Extract Username
43
+
44
+ **Extract the username directly from the user's message if clearly stated.**
45
+
46
+ Examples where you should **NOT** use clarify:
47
+ - "Find accounts for nasa" → username is `nasa`
48
+ - "Search for johndoe123" → username is `johndoe123`
49
+ - "Check if alice exists on social media" → username is `alice`
50
+ - "Look up user bob on social networks" → username is `bob`
51
+
52
+ **Only use clarify if:**
53
+ - Multiple potential usernames mentioned ("search for alice or bob")
54
+ - Ambiguous phrasing ("search for my username" without specifying)
55
+ - No username mentioned at all ("do an OSINT search")
56
+
57
+ When extracting, take the **exact** username as stated — preserve case, numbers, underscores, etc.
58
+
59
+ ### 3. Build Command
60
+
61
+ **Default command** (use this unless user specifically requests otherwise):
62
+ ```bash
63
+ sherlock --print-found --no-color "<username>" --timeout 90
64
+ ```
65
+
66
+ **Optional flags** (only add if user explicitly requests):
67
+ - `--nsfw` — Include NSFW sites (only if user asks)
68
+ - `--tor` — Route through Tor (only if user asks for anonymity)
69
+
70
+ **Do NOT ask about options via clarify** — just run the default search. Users can request specific options if needed.
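If the command is assembled in code, building it as an argument list avoids shell-quoting problems with unusual usernames. The sketch below mirrors the default command and the two optional flags described above; `build_sherlock_argv` is a hypothetical helper, and rejecting usernames that start with `-` is an added assumption to keep them from being read as options.

```python
import shlex


def build_sherlock_argv(username, nsfw=False, tor=False, timeout=90):
    """Build the default Sherlock command as an argv list."""
    if not username or username.startswith("-"):
        raise ValueError("username must be non-empty and not start with '-'")
    argv = ["sherlock", "--print-found", "--no-color"]
    if nsfw:
        argv.append("--nsfw")   # only when the user asks for NSFW sites
    if tor:
        argv.append("--tor")    # only when the user asks for anonymity
    argv += [username, "--timeout", str(timeout)]
    return argv


argv = build_sherlock_argv("johndoe123")
print(shlex.join(argv))  # shell-safe rendering for logs or display
```

The list form can be handed to `subprocess.run(argv)` directly, while `shlex.join` gives a correctly quoted string where a single command line is required.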
71
+
72
+ ### 4. Execute Search
73
+
74
+ Run via the `terminal` tool. The command typically takes 30-120 seconds depending on network conditions and site count.
75
+
76
+ **Example terminal call:**
77
+ ```json
78
+ {
79
+   "command": "sherlock --print-found --no-color \"target_username\"",
80
+   "timeout": 180
81
+ }
82
+ ```
83
+
84
+ ### 5. Parse and Present Results
85
+
86
+ Sherlock outputs found accounts in a simple format. Parse the output and present:
87
+
88
+ 1. **Summary line:** "Found X accounts for username 'Y'"
89
+ 2. **Categorized links:** Group by platform type if helpful (social, professional, forums, etc.)
90
+ 3. **Output file location:** Sherlock saves results to `<username>.txt` by default
91
+
92
+ **Example output parsing:**
93
+ ```
94
+ [+] Instagram: https://instagram.com/username
95
+ [+] Twitter: https://twitter.com/username
96
+ [+] GitHub: https://github.com/username
97
+ ```
98
+
99
+ Present findings as clickable links when possible.
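The parsing step can be sketched as follows, assuming the `[+] Site: URL` line shape shown in the example output; lines that do not match (banners, status messages) are simply ignored. `parse_found` and `summarize` are hypothetical helper names.

```python
import re

# Matches lines such as "[+] GitHub: https://github.com/username".
FOUND_LINE = re.compile(r"^\[\+\]\s+(?P<site>.+?):\s+(?P<url>https?://\S+)$")


def parse_found(output):
    """Return (site, url) pairs for every '[+]' line in Sherlock's output."""
    results = []
    for line in output.splitlines():
        match = FOUND_LINE.match(line.strip())
        if match:
            results.append((match["site"], match["url"]))
    return results


def summarize(username, results):
    """Build the summary line described in step 5."""
    return f"Found {len(results)} accounts for username '{username}'"


sample = """\
[+] Instagram: https://instagram.com/username
[+] Twitter: https://twitter.com/username
[+] GitHub: https://github.com/username
"""
found = parse_found(sample)
print(summarize("username", found))
for site, url in found:
    print(f"- {site}: {url}")
```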
100
+
101
+ ## Pitfalls
102
+
103
+ ### No Results Found
104
+ If Sherlock finds no accounts, this is often correct — the username may not be registered on checked platforms. Suggest:
105
+ - Checking spelling/variation
106
+ - Trying similar usernames with the `{?}` wildcard, which Sherlock expands to `_`, `-`, and `.`: `sherlock "user{?}name"`
107
+ - The user may have privacy settings or deleted accounts
108
+
109
+ ### Timeout Issues
110
+ Some sites are slow or block automated requests. Use `--timeout 120` to increase wait time, or `--site` to limit scope.
111
+
112
+ ### Tor Configuration
113
+ `--tor` requires a running Tor daemon. If the user wants anonymity but Tor isn't available, suggest:
114
+ - Installing Tor service
115
+ - Using `--proxy` with an alternative proxy
116
+
117
+ ### False Positives
118
+ Some sites always return "found" due to their response structure. Cross-reference unexpected results with manual checks.
119
+
120
+ ### Rate Limiting
121
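+ Aggressive searches may trigger rate limits. For bulk username searches, add delays between calls. Note that `--local` only forces Sherlock to use its local `data.json` site list instead of downloading the latest one; it does not cache search results.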
122
+
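For bulk searches, the delay between calls can be sketched as a small loop. Everything here is illustrative: `search_many` is a hypothetical helper, `run_one` stands in for whatever function actually invokes Sherlock for one username, and the 5-second default is an arbitrary assumption rather than a recommended value.

```python
import time


def search_many(usernames, run_one, delay_seconds=5.0, sleep=time.sleep):
    """Run run_one(username) for each username, pausing between calls."""
    results = {}
    for index, username in enumerate(usernames):
        if index > 0:
            sleep(delay_seconds)  # pause before every call except the first
        results[username] = run_one(username)
    return results


# Demo with a stand-in runner and no real waiting.
demo = search_many(
    ["alice", "bob"],
    run_one=lambda name: f"searched {name}",
    delay_seconds=0,
)
print(demo)
```

Passing `sleep` as a parameter keeps the loop testable without real pauses.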
123
+ ## Installation
124
+
125
+ ### pipx (recommended)
126
+ ```bash
127
+ pipx install sherlock-project
128
+ ```
129
+
130
+ ### pip
131
+ ```bash
132
+ pip install sherlock-project
133
+ ```
134
+
135
+ ### Docker
136
+ ```bash
137
+ docker pull sherlock/sherlock
138
+ docker run -it --rm sherlock/sherlock <username>
139
+ ```
140
+
141
+ ### Linux packages
142
+ Available on Debian 13+, Ubuntu 22.10+, Homebrew, Kali, BlackArch.
143
+
144
+ ## Ethical Use
145
+
146
+ This tool is for legitimate OSINT and research purposes only. Remind users:
147
+ - Only search usernames they own or have permission to investigate
148
+ - Respect platform terms of service
149
+ - Do not use for harassment, stalking, or illegal activities
150
+ - Consider privacy implications before sharing results
151
+
152
+ ## Verification
153
+
154
+ After running sherlock, verify:
155
+ 1. Output lists found sites with URLs
156
+ 2. `<username>.txt` file created (default output) if using file output
157
+ 3. If `--print-found` used, output should only contain `[+]` lines for matches
158
+
159
+ ## Example Interaction
160
+
161
+ **User:** "Can you check if the username 'johndoe123' exists on social media?"
162
+
163
+ **Agent procedure:**
164
+ 1. Check `sherlock --version` (verify installed)
165
+ 2. Username provided — proceed directly
166
+ 3. Run: `sherlock --print-found --no-color "johndoe123" --timeout 90`
167
+ 4. Parse output and present links
168
+
169
+ **Response format:**
170
+ > Found 12 accounts for username 'johndoe123':
171
+ >
172
+ > • https://twitter.com/johndoe123
173
+ > • https://github.com/johndoe123
174
+ > • https://instagram.com/johndoe123
175
+ > • [... additional links]
176
+ >
177
+ > Results saved to: johndoe123.txt
178
+
179
+ ---
180
+
181
+ **User:** "Search for username 'alice' including NSFW sites"
182
+
183
+ **Agent procedure:**
184
+ 1. Check sherlock installed
185
+ 2. Username + NSFW flag both provided
186
+ 3. Run: `sherlock --print-found --no-color --nsfw "alice" --timeout 90`
187
+ 4. Present results
188
+
189
+ ## Mandatory actions when this skill is active
190
+
191
+ Before applying this skill:
192
+ - [ ] Read the task requirements fully before acting
193
+ - [ ] Confirm you understand the goal and constraints
194
+ - [ ] Check for existing work or prior context in the codebase
195
+
196
+ While working:
197
+ - [ ] Follow the methodology described above step by step
198
+ - [ ] Document any decisions or findings as you go
199
+
200
+ After completing:
201
+ - [ ] Self-check: does the output satisfy the original requirement?
202
+ - [ ] Verify no regressions or unintended side effects
203
+