bmad-plus 0.1.0

Files changed (53)
  1. package/CHANGELOG.md +75 -0
  2. package/README.md +482 -0
  3. package/osint-agent-package/README.md +88 -0
  4. package/osint-agent-package/SETUP_KEYS.md +108 -0
  5. package/osint-agent-package/agents/osint-investigator.md +80 -0
  6. package/osint-agent-package/install.ps1 +87 -0
  7. package/osint-agent-package/install.sh +76 -0
  8. package/osint-agent-package/skills/bmad-osint-investigate/SKILL.md +147 -0
  9. package/osint-agent-package/skills/bmad-osint-investigate/osint/SKILL.md +452 -0
  10. package/osint-agent-package/skills/bmad-osint-investigate/osint/assets/dossier-template.md +116 -0
  11. package/osint-agent-package/skills/bmad-osint-investigate/osint/references/content-extraction.md +100 -0
  12. package/osint-agent-package/skills/bmad-osint-investigate/osint/references/enrichment-databases-fr.md +148 -0
  13. package/osint-agent-package/skills/bmad-osint-investigate/osint/references/platforms.md +130 -0
  14. package/osint-agent-package/skills/bmad-osint-investigate/osint/references/psychoprofile.md +69 -0
  15. package/osint-agent-package/skills/bmad-osint-investigate/osint/references/tools.md +281 -0
  16. package/osint-agent-package/skills/bmad-osint-investigate/osint/scripts/_http.py +101 -0
  17. package/osint-agent-package/skills/bmad-osint-investigate/osint/scripts/apify.py +260 -0
  18. package/osint-agent-package/skills/bmad-osint-investigate/osint/scripts/brightdata.py +101 -0
  19. package/osint-agent-package/skills/bmad-osint-investigate/osint/scripts/diagnose.py +141 -0
  20. package/osint-agent-package/skills/bmad-osint-investigate/osint/scripts/exa.py +79 -0
  21. package/osint-agent-package/skills/bmad-osint-investigate/osint/scripts/jina.py +71 -0
  22. package/osint-agent-package/skills/bmad-osint-investigate/osint/scripts/mcp-client.py +136 -0
  23. package/osint-agent-package/skills/bmad-osint-investigate/osint/scripts/parallel.py +85 -0
  24. package/osint-agent-package/skills/bmad-osint-investigate/osint/scripts/perplexity.py +102 -0
  25. package/osint-agent-package/skills/bmad-osint-investigate/osint/scripts/tavily.py +72 -0
  26. package/osint-agent-package/skills/bmad-osint-investigate/osint/scripts/volley.py +208 -0
  27. package/osint-agent-package/skills/bmad-osint-investigator/SKILL.md +15 -0
  28. package/package.json +51 -0
  29. package/readme-international/README.de.md +392 -0
  30. package/readme-international/README.es.md +484 -0
  31. package/readme-international/README.fr.md +482 -0
  32. package/src/bmad-plus/agents/agent-architect-dev/SKILL.md +96 -0
  33. package/src/bmad-plus/agents/agent-architect-dev/bmad-skill-manifest.yaml +13 -0
  34. package/src/bmad-plus/agents/agent-maker/SKILL.md +201 -0
  35. package/src/bmad-plus/agents/agent-maker/bmad-skill-manifest.yaml +13 -0
  36. package/src/bmad-plus/agents/agent-orchestrator/SKILL.md +137 -0
  37. package/src/bmad-plus/agents/agent-orchestrator/bmad-skill-manifest.yaml +13 -0
  38. package/src/bmad-plus/agents/agent-quality/SKILL.md +83 -0
  39. package/src/bmad-plus/agents/agent-quality/bmad-skill-manifest.yaml +13 -0
  40. package/src/bmad-plus/agents/agent-shadow/SKILL.md +71 -0
  41. package/src/bmad-plus/agents/agent-shadow/bmad-skill-manifest.yaml +13 -0
  42. package/src/bmad-plus/agents/agent-strategist/SKILL.md +80 -0
  43. package/src/bmad-plus/agents/agent-strategist/bmad-skill-manifest.yaml +13 -0
  44. package/src/bmad-plus/data/role-triggers.yaml +209 -0
  45. package/src/bmad-plus/module-help.csv +10 -0
  46. package/src/bmad-plus/module.yaml +174 -0
  47. package/src/bmad-plus/skills/bmad-plus-autopilot/SKILL.md +99 -0
  48. package/src/bmad-plus/skills/bmad-plus-parallel/SKILL.md +93 -0
  49. package/src/bmad-plus/skills/bmad-plus-sync/SKILL.md +69 -0
  50. package/tools/bmad-plus-npx.js +33 -0
  51. package/tools/cli/bmad-plus-cli.js +50 -0
  52. package/tools/cli/commands/install.js +437 -0
  53. package/tools/cli/commands/uninstall.js +70 -0

package/osint-agent-package/skills/bmad-osint-investigate/osint/SKILL.md

---
name: osint
description: >
  Conduct deep OSINT research on individuals. Build full digital footprint, psychoprofile
  (MBTI/Big Five), career history, social graph with confidence scores. Recursive
  self-evaluation until completeness threshold is met. Includes internal intelligence
  (Telegram history, email, vault contacts) before going external.
  Use when: "osint", "досье", "research person", "find everything about", "пробей",
  "разведка", "due diligence", "background check", "digital footprint",
  "найди всё про", "собери информацию", "кто это", "профиль человека".
  NOT for: company/product research without a named person, competitive analysis,
  market research, content generation, or general web scraping tasks.
---

# OSINT Skill v3.2

Systematic intelligence gathering on individuals. From a name or handle to a scored
dossier with psychoprofile, career map, and entry points.

## Phase Router

Determine entry point from context:

- New name/handle/URL, "пробей", "find out about" → Phase 0 (full cycle)
- "Add LinkedIn/Instagram data" to existing dossier → Phase 2 (extraction)
- "Build psychoprofile" from existing data → Phase 4
- "Rate completeness" of existing dossier → Phase 5
- "Reformat" or "present" findings → Phase 6

Default (full research request): Phase 0 → 1 → 1.5 → 2 → 3 → 4 → 5 → 6.

## Environment

All API keys via environment variables. Never hardcode tokens.

- `PERPLEXITY_API_KEY` — Perplexity Sonar (fast answers + deep research)
- `EXA_API_KEY` — Exa AI (semantic search, company/people research, deep research)
- `TAVILY_API_KEY` — Tavily (agent-optimized search + extract, $0.005/req basic)
- `APIFY_API_TOKEN` — Apify scraping (LinkedIn, Instagram, Facebook)
- `JINA_API_KEY` — Jina reader/search/deepsearch
- `PARALLEL_API_KEY` — Parallel AI search
- `BRIGHTDATA_MCP_URL` — Bright Data MCP endpoint (full URL with token)
- `MCPORTER_CONFIG` — mcporter config path
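
A minimal sketch of how a script can fail fast when a key is absent, in the spirit of the validation described below under Scripts. The `check_env` helper and the signup URLs are illustrative assumptions, not part of the package:

```python
import os
import sys

# Illustrative subset of the variables listed above; the hint URLs are assumptions.
KEY_HINTS = {
    "PERPLEXITY_API_KEY": "https://www.perplexity.ai",
    "EXA_API_KEY": "https://exa.ai",
    "TAVILY_API_KEY": "https://tavily.com",
}

def check_env(keys=KEY_HINTS):
    """Return missing variable names, printing a descriptive hint for each."""
    missing = [name for name in keys if not os.environ.get(name)]
    for name in missing:
        print(f"Missing {name}. Get a key at {keys[name]}", file=sys.stderr)
    return missing
```

A wrapper script would call `sys.exit(1)` when `check_env()` returns a non-empty list, matching the "exit with descriptive error" behavior.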

## Scripts

Run from the skill dir: `bash scripts/<name>.sh`.
Each validates env vars, exits with descriptive error + URL to get the key.

**Search & Research:**
- `diagnose.sh` — run FIRST. Capability map of all tools.
- `perplexity.sh` — `search <query>` | `sonar <query>` (AI answer) | `deep <query>` (deep research)
- `tavily.sh` — `search <query>` (basic $0.005) | `deep <query>` (advanced) | `extract <url>`
- `exa.sh` — `search <query>` | `company <name>` | `people <name>` | `crawl <url>` | `deep <prompt>`
- `first-volley.sh "Name" "context"` — parallel search, all engines at once.
- `merge-volley.sh <outdir>` — deduplicate and merge first-volley results.

**Scraping:**
- `apify.sh` — `linkedin <url>` | `instagram <handle>` | `run` | `results` | `store-search`
- `run-actor.sh` — **universal Apify runner (55+ actors).** Embedded from [apify/agent-skills](https://github.com/apify/agent-skills).
  Quick answer: `bash scripts/run-actor.sh "actor/id" '{"input":"json"}'`
  Export: `bash scripts/run-actor.sh "actor/id" '{"input":"json"}' --output /tmp/out.csv`
- `jina.sh` — `read <url>` | `search <query>` | `deepsearch <query>`
- `parallel.sh` — `search <query>` | `extract <url>`
- `brightdata.sh` — `scrape <url>` | `scrape-batch` | `search` | `search-geo <cc>` | `search-yandex`

## Research Escalation Flow

**Principle: from cheap to expensive, from fast to deep.**

### Level 1: Quick Answers (seconds, ~$0.00)
ALWAYS start here. Get quick context before digging.
Launch ALL of these in parallel:
```bash
# Perplexity Sonar — AI answer with citations
bash skills/osint/scripts/perplexity.sh sonar "Who is <Name>, <context>"
# Brave Search — classic web search
web_search "<Name> <company> <role>"
# Tavily — agent-optimized search with AI answer
bash skills/osint/scripts/tavily.sh search "<Name> <context>"
# Exa — semantic search + company/people research
bash skills/osint/scripts/exa.sh search "<Name> <context>"
bash skills/osint/scripts/exa.sh people "<Name>"
```
→ You get: quick facts, links, context.
→ Decide: enough? → Phase 6. Need more? → Level 2.

### Level 2: Source Verification (seconds to minutes, ~$0.01)
Verify the Level 1 sources via fetch:
```bash
# Read the URLs you found
web_fetch "<url_from_perplexity>"
bash skills/osint/scripts/jina.sh read "<url>"
bash skills/osint/scripts/parallel.sh extract "<url>"
```
→ You get: confirmed facts, cross-references.
→ Matches? → enrich the dossier. Need to go deeper? → Level 3.

### Level 3: Social Media Deep Dive (~$0.01-0.10)
Bring in scraping for social networks:
```bash
# LinkedIn
bash skills/osint/scripts/apify.sh linkedin "<url>"
# Instagram
bash skills/osint/scripts/apify.sh instagram "<handle>"
# Facebook, blocked sites
bash skills/osint/scripts/brightdata.sh scrape "<url>"
```
→ You get: structured profiles, photos, connections.

### Level 4: Deep Research (~$0.05-0.50)
If you need to dig even deeper, build a detailed prompt and send it to deep research.
Launch ALL in parallel (30-60 s each):
```bash
# Perplexity Deep Research
bash skills/osint/scripts/perplexity.sh deep "<detailed research prompt about Name>"
# Exa Deep Research
bash skills/osint/scripts/exa.sh deep "<detailed prompt>"
# Parallel AI Deep Search
bash skills/osint/scripts/parallel.sh search "<detailed query>"
# Jina DeepSearch
bash skills/osint/scripts/jina.sh deepsearch "<query>"
```

**Rule:** the Level 4 prompt must be DETAILED. Include everything you already know
from Levels 1-3 so deep research digs further instead of repeating basic facts.

## Swarm Mode (DEFAULT)

OSINT research runs as a **swarm of parallel sub-agents on Sonnet**.
The main agent is the coordinator — it does NOT scrape itself.

### How it works:
1. Main agent runs Phase 0 (tooling check) and Phase 1 (seed collection) to get initial context
2. Main agent spawns 3-5 sub-agents via `sessions_spawn` with `model: sonnet`, `mode: run`
3. Each sub-agent gets a focused task + all known data from Phase 1
4. Sub-agents return results → main agent merges them into the dossier

### Task split pattern:
- **Agent 1: YouTube/Content** — extract transcripts via Apify (NOT yt-dlp, NOT BrightData — YouTube blocks them). 3-5 videos, speech style, topics. Use `streamers/youtube-channel-scraper` for channel data
- **Agent 2: Facebook deep** — BrightData scrape: profile, posts, about, photos, friends (use m.facebook.com for more data). For public Pages: `apify/facebook-pages-scraper` + `apify/facebook-page-contact-information`
- **Agent 3: Social platforms** — Instagram (Apify + tagged/comments scrapers), DOU, company websites, LinkedIn (BrightData). Contact enrichment: `vdrmota/contact-info-scraper` on found websites
- **Agent 4: TikTok + Regional** — TikTok profile/videos (`clockworks/tiktok-profile-scraper`), local registries, press, university records, Yandex search, Google Maps (`compass/crawler-google-places` if business owner)
- **Agent 5: Deep research** — Perplexity deep, Exa deep, Parallel deep (if needed)

### Rules:
- Always pass ALL known data to each sub-agent (names, URLs, emails, phones, context)
- Each sub-agent saves results to `/tmp/osint-<subject>-<task>.md`
- Main agent waits for all results, then runs Phases 3-6 (cross-reference, psychoprofile, dossier)
- Budget: each sub-agent ≤$0.15, total swarm ≤$0.50
- YouTube transcripts: use **Apify** actors, NOT BrightData or yt-dlp (both blocked by YouTube)

### Why swarm:
- 5 agents × 5 min each, run in parallel ≈ 10 min total (vs 30+ min sequential)
- Sonnet is 5x cheaper than Opus
- Parallel scraping avoids stacking rate limits on a single IP
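
The fan-out/merge shape of the swarm can be sketched with plain threads. Here `run_subagent` is a stand-in for the real `sessions_spawn` call, and the task names follow the split pattern above:

```python
from concurrent.futures import ThreadPoolExecutor

# One focused task per sub-agent, mirroring the task split pattern above.
TASKS = ["youtube-content", "facebook-deep", "social-platforms",
         "tiktok-regional", "deep-research"]

def run_subagent(task, shared_context):
    # Stand-in: the real coordinator spawns a Sonnet sub-agent that writes
    # its findings to /tmp/osint-<subject>-<task>.md.
    return {"task": task, "findings": f"{task} results given: {shared_context}"}

def run_swarm(shared_context):
    # Fan out all sub-agents at once, wait for all, then merge into one dict.
    with ThreadPoolExecutor(max_workers=len(TASKS)) as pool:
        results = pool.map(lambda t: run_subagent(t, shared_context), TASKS)
        return {r["task"]: r["findings"] for r in results}
```

The design point is that every sub-agent receives the same shared context and the coordinator only merges; it never scrapes.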

---

## Phase 0: Tooling Self-Check

1. Execute `bash skills/osint/scripts/diagnose.sh`.
2. Log available vs missing tools.
3. Check internal tools: `tg.py` (Telegram history), `himalaya` (email), vault contacts.
4. If Bright Data unavailable → Facebook and LinkedIn deep scrape limited. Inform user.
5. If Apify unavailable → Instagram and LinkedIn structured data limited.
6. Proceed with available toolset.

## Phase 1: Seed Collection

**Start with Level 1 (quick answers) ALWAYS before heavy scraping.**

1. Parse user input. Extract identifiers: names, handles, URLs, companies, locations.
2. **Perplexity fast pass:**
   ```bash
   bash skills/osint/scripts/perplexity.sh search "Who is <Name>, <context>"
   ```
3. **Brave + Parallel in parallel:**
   ```bash
   web_search "<Name> <company>"
   bash skills/osint/scripts/first-volley.sh "Full Name" "context"
   ```
4. **Review Perplexity citations** — fetch and verify top sources:
   ```bash
   web_fetch "<citation_url_1>"
   web_fetch "<citation_url_2>"
   ```
5. Parse & merge: `bash skills/osint/scripts/merge-volley.sh /tmp/osint-<timestamp>`.
6. Collect all identifiers into a seed list. Deduplicate.
7. Flag name collisions (common names → verify with company/location cross-reference).
8. **Decision point:** enough context? → skip to Phase 4. Need social media? → Phase 2. Need deep dive? → Level 4 (deep research).

**Rate limiting:** wait 1s between Brave queries, 2s between Jina calls.
Do NOT hammer APIs in tight loops — stagger parallel launches.
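
The stagger rule can be sketched roughly as follows; the `staggered` helper is hypothetical, with the delay values taken from the limits above:

```python
import time

def staggered(calls, delay_s=1.0):
    """Run API calls sequentially with a fixed pause instead of a tight loop."""
    results = []
    for i, call in enumerate(calls):
        if i:  # no pause before the first call
            time.sleep(delay_s)  # e.g. 1s between Brave queries, 2s between Jina calls
        results.append(call())
    return results
```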

## Phase 1.5: Internal Intelligence

**Before going external, check what we already know.** This phase mines local sources
that may contain gold — prior conversations, emails, vault contacts.

### Telegram History
If `tg.py` is available (check Phase 0):
```bash
# Search by name/handle in Telegram
python3 skills/telegram/scripts/tg.py search "Name" 20
# If we have their username/id — read conversation history
python3 skills/telegram/scripts/tg.py history <username_or_id> 50
```

**What to extract from Telegram history:**
- Communication style (formal/informal, language, emoji patterns)
- Topics discussed — what they care about, what they ask for
- Response patterns — reply speed, active hours → timezone
- Shared links/files — projects they work on
- How they address the user — relationship dynamics
- Mentioned colleagues, partners, competitors → social graph seeds
- Pricing discussions, deal terms (if business contact)

⚠️ **Telegram history is Grade A intelligence** — unfiltered, real-time, authentic.
Weight it higher than curated LinkedIn/Instagram profiles.
⚠️ **Privacy:** internal intelligence stays in the dossier. Never quote DMs in public outputs.

### Email History
If `himalaya` is available:
```bash
# Search emails by name or domain
~/.local/bin/himalaya search "from:name@domain.com OR to:name@domain.com" -f INBOX
# Or by name
~/.local/bin/himalaya search "Name Surname" -f INBOX
~/.local/bin/himalaya search "Name Surname" -f Sent
```

**What to extract from email:**
- Formal communication style vs Telegram style (contrast = insight)
- Business proposals, invoices → financial relationship
- CC'd people → organizational map
- Signature block → title, phone, company, social links (often richer than LinkedIn)

### Vault / CRM Check
```bash
# Check if we already have a card
grep -rl "Name" vault/crm/ vault/contacts/ 2>/dev/null
# Check MOC indexes (adjust paths to your vault structure)
grep -i "name" vault/MOC/*.md 2>/dev/null
```

**If a vault card exists:** read it, note last_accessed, existing tags, prior interactions.
Don't duplicate — enrich the existing card after research completes.

### Node Camera/Location (if paired device available)
If meeting in person and a node is available, `nodes camera_snap` can capture context.
Only with explicit user permission.

### Internal Intelligence Summary
After Phase 1.5, you should know:
- Do we have a prior relationship? (cold/warm/hot contact)
- What language do they prefer?
- What's their communication style?
- Any existing business context?
- Social graph seeds from conversations

This context shapes Phase 2 priorities — if we already know their career from emails,
focus external research on psychoprofile and social media instead.

## Phase 2: Platform Extraction

Read `references/platforms.md` ONLY when you need URL patterns or extraction signals.

Tool priority (primary → fallback). **If the primary fails, switch immediately. Never retry the same tool.**

- LinkedIn: `apify.sh linkedin` → `brightdata.sh scrape` → `jina.sh read`
- Instagram: `apify.sh instagram` → `brightdata.sh scrape`
- Instagram deep: `run-actor.sh "apify/instagram-tagged-scraper"` (who tags them), `apify/instagram-comment-scraper` (sentiment)
- Facebook personal: `brightdata.sh scrape` → none (only Bright Data works)
- Facebook pages/groups: `run-actor.sh "apify/facebook-pages-scraper"` → `brightdata.sh scrape`
- TikTok: `run-actor.sh "clockworks/tiktok-profile-scraper"` → `clockworks/tiktok-scraper` (comprehensive)
- TikTok discovery: `run-actor.sh "clockworks/tiktok-user-search-scraper"` (find by keywords)
- YouTube: `run-actor.sh "streamers/youtube-channel-scraper"` → `jina.sh read` → `brightdata.sh scrape`
- Telegram channels: `web_fetch t.me/s/{channel}` → `jina.sh read`
- Twitter/X: `python3 scripts/twitter.py tweet <url>` → `jina.sh read`
- Google Maps (businesses): `run-actor.sh "compass/crawler-google-places"`
- Contact enrichment: `run-actor.sh "vdrmota/contact-info-scraper"` (extract emails/phones from any URL)
- Any site: `jina.sh read` → `brightdata.sh scrape`

**run-actor.sh** = universal Apify runner (embedded, 55+ actors). See `references/tools.md` for the full actor catalog.

Read `references/tools.md` ONLY when troubleshooting a failed tool.

### ⚠️ Content Platform Rule (CRITICAL)

When you find YouTube, podcast, blog, or conference talks — read `references/content-extraction.md` **immediately** and extract 3-5 pieces of content on the spot.

Do NOT just note the URL. Extract transcripts/text NOW.
A 20-minute YouTube video reveals more about a person than their entire LinkedIn.
Content platforms are the #1 source for the psychoprofile — skipping them = shallow dossier.

### OpSec-Aware Targets

If initial searches return unusually little for someone who should have a footprint:

1. **Wayback Machine:** `web_fetch "https://web.archive.org/web/2024*/target-url"` — deleted profiles, old bios
2. **Google Cache:** `web_search "cache:domain.com/path"` — recently removed pages
3. **Yandex Cache:** `brightdata.sh search-yandex "Name"` — Yandex indexes the CIS deeper and caches longer
4. **Username variations:** try transliteration (Иванов → ivanov, ivanoff), birth year suffixes, company abbreviations
5. **Reverse image search:** if a photo is found, check for other profiles using the same avatar
6. **Conference archives:** speaker bios often survive after profiles are deleted

## Phase 3: Cross-Reference & Confidence Scoring

### Step 1: Fact Table
List every claim as a row: fact | source 1 | source 2 | grade.

### Step 2: Cross-check key facts
For each critical fact (employer, role, location, education):
- Compare LinkedIn title vs Telegram signature vs email signature vs company website
- If 2+ match → Grade A
- If only 1 source → Grade B
- If inferred (timezone from messages, geotag) → Grade C
- If single unverified mention → Grade D

### Step 3: Resolve contradictions
If LinkedIn says "CEO" but the company site says "Co-founder" — flag it explicitly. Include both with sources. Do NOT silently pick one.

### Step 4: Name collision check
If the name is common — verify that at least 2 facts (company + city, or photo + company) link to the same person. If unsure, split into separate entities.

### Confidence grades:

- **A (confirmed)**: 2+ independent sources, or official/verified profile, or direct Telegram/email conversation
- **B (probable)**: 1 credible source (LinkedIn, official media, company site)
- **C (inferred)**: indirect evidence (photo geotag, timezone from message patterns, connections)
- **D (unverified)**: single mention, could be wrong

Internal intelligence (Phase 1.5) counts as an independent source.
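
The grading ladder can be expressed as a small function. This is a sketch; the boolean flags are assumptions about how evidence might be represented, not part of the package:

```python
def grade_fact(independent_sources, official=False, inferred=False):
    """Map the evidence for one claim to a confidence grade per the ladder above."""
    if independent_sources >= 2 or official:
        return "A"  # confirmed: 2+ sources, or an official/verified profile
    if independent_sources == 1 and not inferred:
        return "B"  # probable: one credible source
    if inferred:
        return "C"  # inferred: indirect evidence only
    return "D"      # unverified: single mention, could be wrong
```

Because internal intelligence counts as an independent source, Telegram or email corroboration of a LinkedIn fact lifts it from B to A.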

## Phase 4: Psychoprofile

Read `references/psychoprofile.md` ONLY at this phase.

1. Collect text samples: posts, bios, interviews, channel content, **Telegram messages** (highest signal).
2. Assess MBTI per dimension with cited behavioral evidence and confidence (high/medium/low).
3. Quantify writing style: sentence length, emoji density, self-reference rate.
4. **Compare formal (LinkedIn/email) vs informal (Telegram/Instagram) voice** — the delta reveals the real person.
5. Deduce values from actions, not self-reported claims.
6. Zodiac ONLY if DOB confirmed (Grade A or B).

## Phase 5: Completeness Evaluation (Recursive)

### Axis 1: Data Coverage (pass/fail per dimension)

9 mandatory checks. If any fail, flag as a critical gap:

1. Subject correctly identified? (not a namesake)
2. Current role/company confirmed?
3. At least 2 social platforms found?
4. At least 1 contact method (email/phone/messenger)?
5. Career history has 2+ verifiable positions?
6. Location (current) established?
7. At least 1 photo found?
8. No unresolved contradictions between sources?
9. Internal intelligence checked? (Telegram/email/vault — even if empty)

### Axis 2: Depth Score (8 weighted criteria)

| Dimension | Weight | What to score (1-10) |
|-----------|--------|----------------------|
| Identity | 0.15 | Full name, DOB, location, education, photo |
| Career | 0.20 | Completeness of work history, current role clarity |
| Digital footprint | 0.15 | Number of platforms found, account activity level |
| Psychoprofile | 0.15 | MBTI confidence, writing style quantified, values deduced |
| Internal intel | 0.10 | Telegram/email history depth, vault data |
| Personal life | 0.05 | Family, hobbies, lifestyle, pets |
| Cross-reference | 0.10 | How many facts are A-grade, contradiction count |
| Actionability | 0.10 | Entry points identified, approach strategy clear |

Weighted sum (1-10) = **Depth Score**.
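
As a sketch, the weighted sum can be computed like this; the dimension keys are illustrative, while the weights are copied from the table above:

```python
# Weights from the Axis 2 table; they sum to 1.0.
WEIGHTS = {
    "identity": 0.15, "career": 0.20, "digital_footprint": 0.15,
    "psychoprofile": 0.15, "internal_intel": 0.10, "personal_life": 0.05,
    "cross_reference": 0.10, "actionability": 0.10,
}

def depth_score(scores):
    """Weighted sum of per-dimension 1-10 scores -> overall Depth Score."""
    assert set(scores) == set(WEIGHTS), "score every dimension"
    return round(sum(WEIGHTS[k] * scores[k] for k in WEIGHTS), 2)
```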

### Axis 3: Source Diversity

Count unique source types used (max 12):
LinkedIn, Instagram, Facebook, Telegram DM, Telegram channel, VK, Twitter/X,
company website, press/media articles, conference profiles, government/business registries,
email correspondence.

- 8+ source types = Excellent
- 5-7 = Good
- 2-4 = Shallow
- 1 = Insufficient

### Gap Analysis

| Depth Score | Coverage | Diagnosis | Action |
|-------------|----------|-----------|--------|
| 8+ | All pass | Strong dossier | Proceed to Phase 6 |
| 8+ | Some fail | Deep but blind spots | Target failed checks, 1 more cycle |
| <7 | All pass | Wide but shallow | Deepen via interviews/articles/deepsearch |
| <7 | Some fail | Restart needed | Different search angle, new tool combination |

### Stopping Criteria

- **(a)** Depth Score ≥ 8.0 AND all coverage checks pass → exit to Phase 6
- **(b)** 3 cycles completed → deliver best available with honest assessment
- **(c)** Two cycles with delta < 0.5 → plateau reached, deliver with note
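
A sketch of how the three exits compose into one check per cycle; the function name and return strings are illustrative:

```python
def should_stop(cycle, depth, coverage_ok, score_history):
    """Return a stop reason, or None to run another research cycle."""
    if depth >= 8.0 and coverage_ok:
        return "exit to Phase 6"             # (a) threshold met
    if cycle >= 3:
        return "deliver best available"      # (b) cycle cap, honest assessment
    if len(score_history) >= 2 and abs(score_history[-1] - score_history[-2]) < 0.5:
        return "plateau, deliver with note"  # (c) diminishing returns
    return None
```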

### Calibration Benchmarks

- **9-10**: full career timeline, 5+ platforms, confirmed DOB, psychoprofile with high confidence, family/hobbies known, multiple entry points, Telegram history analyzed. Equivalent to a professional PI report.
- **7-8**: career outline, 3+ platforms, most facts B-grade or above, psychoprofile with medium confidence. Solid due diligence.
- **5-6**: basic bio, 1-2 platforms, some gaps. Quick background check level.
- **<5**: minimal data found. Name + current role at best. Flag as insufficient.

## Phase 6: Dossier Output

Read `assets/dossier-template.md` before rendering. Follow the template structure exactly.
No markdown tables in the output (Telegram cannot render them). Bullet lists only.
Report Depth Score, source count, source types, and total API spend.

If internal intelligence was used, add a separate **"из переписки"** ("from correspondence") section
(marked as internal/confidential, not for sharing outside).

## Budget

- ≤$0.50 per target: spend without asking.
- >$0.50: ask the user before proceeding.
- Track cumulative spend per research session.
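
A minimal spend tracker consistent with those thresholds; the class name and API are assumptions for this sketch:

```python
class BudgetTracker:
    """Cumulative API spend for one research session."""
    def __init__(self, ask_threshold=0.50):
        self.spent = 0.0
        self.ask_threshold = ask_threshold

    def charge(self, cost):
        """Record a cost; True means pause and ask the user before spending more."""
        self.spent += cost
        return self.spent > self.ask_threshold
```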

## Troubleshooting

- **All tools return empty**: the target has minimal digital presence. Try Bright Data Yandex search (better for the CIS region), or search by company + role instead of name.
- **Wrong person keeps appearing**: add the company name, city, or role to all queries. Use quotes around the full name.
- **LinkedIn blocked**: use `brightdata.sh scrape` as primary instead of Apify.
- **Apify actor dead/changed**: check `apify.sh store-search "linkedin scraper"` for alternatives. Actors on Apify are volatile — always have a Bright Data fallback.
- **Depth Score stuck at 6-7**: likely missing press/media articles or internal intel. Search industry publications (AdIndex, Sostav, Forbes, Kommersant for the Russian market). Try `jina.sh deepsearch`. Check Telegram history.
- **No social media found**: the person may use pseudonyms. Search by email, phone, or company employee page. Search the Apify store: `bash scripts/apify.sh store-search "people search"`. If `mcpc` is installed: `APIFY_TOKEN=$APIFY_API_TOKEN mcpc --json mcp.apify.com --header "Authorization: Bearer $APIFY_TOKEN" tools-call search-actors keywords:="people search" limit:=10`. Check Telegram contacts by phone.
- **TikTok scraper fails**: try `clockworks/free-tiktok-scraper` (free tier) as a fallback. TikTok usernames often differ from other platforms — search by real name via `clockworks/tiktok-user-search-scraper`.
- **Need emails from a website**: use `vdrmota/contact-info-scraper` — it crawls the site and extracts all contact info.
- **Rate limited (429)**: back off 5s, then 15s. Switch to the fallback tool. Never retry immediately.
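
The 429 recovery rule can be sketched as follows; the helper is hypothetical and `RuntimeError` is a stand-in for whatever the tool raises on a 429:

```python
import time

def call_with_backoff(primary, fallback, delays=(5, 15)):
    """Try the primary tool with 5s/15s backoff on 429, then switch tools."""
    for delay in (0, *delays):
        if delay:
            time.sleep(delay)  # never retry immediately
        try:
            return primary()
        except RuntimeError:  # stand-in for an HTTP 429 from the tool
            continue
    return fallback()  # stop hammering the rate-limited tool; switch
```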

## Anti-Patterns

1. Never start with a single tool. Launch all available tools in parallel.
2. Never retry a failed tool more than once. Switch to the fallback.
3. Never guess DOB, family, or zodiac.
4. Never attribute data without cross-referencing against namesakes.
5. Never include unsourced facts.
6. Never reveal OSINT methods in public messages.
7. Never exceed 3 recursive cycles. Diminishing returns.
8. Never rate Depth Score 9+ without justification.
9. Never skip the psychoprofile. Without it, the dossier = a Wikipedia article.
10. Never skip Phase 1.5 (internal intel). Telegram history is often the richest source.
11. Never quote DMs verbatim in shareable outputs. Summarize and cite.
12. Never hammer APIs without rate limiting. Stagger requests.

package/osint-agent-package/skills/bmad-osint-investigate/osint/assets/dossier-template.md

🔍 **OSINT: {FULL_NAME}**
*{one_line_summary}*

---

**контакты и соцсети**
- telegram: {handle} {status}
- linkedin: {url} {status}
- instagram: {handle} {status}
- facebook: {url} {status}
- email: {email} {status}
- телефон: {phone} {status}
- другое: {any_other}

---

**биография**

- дата рождения: {dob_only_if_confirmed_A_or_B}
- возраст: {age}
- место рождения: {birthplace}
- образование: {education}

**карьера:**
- **{year}** — {role}, {company} ({age_at_time})
- **{year}** — {role}, {company}
- ...
- **сейчас** — {current_role}, {current_company}

паттерн карьеры: {career_pattern_description}

---

**текущая ситуация**
- роль: {current_role}
- компания: {current_company} — {company_description}
- локация: {current_location}

---

**из переписки** 🔒
*секция на основе Telegram/email — не для внешнего использования*

- характер общения: {formal_informal_emoji_patterns}
- темы: {what_they_discuss_ask_about}
- стиль: {response_speed_active_hours_language}
- контекст отношений: {cold_warm_hot_business_personal}
- упомянутые люди: {colleagues_partners_from_conversations}
- {any_deal_terms_pricing_discussed}

---

**психопрофиль**

**MBTI: {type} ({confidence})**
- E/I: {assessment} — {evidence}
- S/N: {assessment} — {evidence}
- T/F: {assessment} — {evidence}
- J/P: {assessment} — {evidence}

**стиль коммуникации:**
- голос: {voice_description}
- ритм: {rhythm_description}
- эмодзи: {emoji_pattern}
- формальный vs неформальный: {delta_between_linkedin_and_telegram}
- ключевые маркеры: {markers}

**ценности (по действиям):**
- {value_1}
- {value_2}
- {value_3}

---

**личное**
- семья: {family_only_confirmed}
- питомцы: {pets_only_confirmed}
- увлечения: {hobbies}
- стиль жизни: {lifestyle}

---

**наблюдения**
- {observation_1_contradictions_or_patterns}
- {observation_2}
- {observation_3}

---

**карта достоверности**
- A (подтверждено, 2+ источника): {list_of_A_facts}
- B (вероятно, 1 надёжный источник): {list_of_B_facts}
- C (выведено, косвенные данные): {list_of_C_facts}
- D (не подтверждено): {list_of_D_facts}

---

**точки входа**
- {approach_1_shared_connections_or_topics}
- {approach_2}
- {approach_3}

---

**пробелы**
- {gap_1_what_not_found}
- {gap_2}

---

**метрики**
- depth score: {score}/10
- coverage: {passed}/{total} checks
- источников: {source_count} ({source_types_list})
- внутренняя разведка: {yes_no_what_checked}
- потрачено: ${total_cost}

package/osint-agent-package/skills/bmad-osint-investigate/osint/references/content-extraction.md

# Content Extraction Guide

When you find a content platform (YouTube, podcast, blog, conference talks) -
**extract everything immediately.** Don't just note the URL and move on.

## Principle

Content platforms are the richest source for the psychoprofile and real personality.
LinkedIn is a resume. Instagram is a highlight reel. YouTube/podcasts are unfiltered.

A person talking for 20 minutes on camera reveals more than 100 LinkedIn posts.

## YouTube

### Discovery
```bash
# Search for the channel
web_search "site:youtube.com <Name> <context>"
bash skills/osint/scripts/exa.sh search "<Name> youtube channel"
```

### Channel metadata
```bash
# Fetch the channel page - subscriber count, video count, about
web_fetch "https://www.youtube.com/@<handle>"
# Or the about page
bash skills/osint/scripts/jina.sh read "https://www.youtube.com/@<handle>/about"
```

### Transcript extraction (CRITICAL)
Pick the 3-5 most viewed or most recent videos. Extract transcripts:
```bash
# Fetch the video page - title, description, comments
web_fetch "https://www.youtube.com/watch?v=<id>"
# Get the auto-generated transcript via Jina
bash skills/osint/scripts/jina.sh read "https://www.youtube.com/watch?v=<id>"
```

If transcripts are unavailable via fetch, try:
```bash
bash skills/osint/scripts/brightdata.sh scrape "https://www.youtube.com/watch?v=<id>"
```

### What to extract from YouTube:
- **Topics** - what they talk about = what they care about
- **Speaking style** - formal/casual, speed, filler words, humor
- **Vocabulary** - jargon level, code-switching between languages
- **Self-presentation** - humble/confident/arrogant, claims vs evidence
- **Recurring themes** - what keeps coming back across videos
- **Guest interactions** - how they treat guests, who they invite
- **Comment section** - audience demographics, sentiment
- **Upload frequency** - consistency = discipline trait
- **Video titles** - clickbait vs informative = marketing approach
- **Playlists** - how they organize knowledge = thinking structure

### Budget: 3-5 video transcripts via Jina = ~$0.02-0.05

## Podcasts / Audio Appearances

Search for podcast appearances:
```bash
web_search "<Name> podcast interview"
bash skills/osint/scripts/exa.sh search "<Name> podcast episode guest"
```

Podcast interviews are gold - the host asks personal questions.
Look for: origin story, career pivots, failures mentioned, mentors named.

## Blog / Personal Website

```bash
web_fetch "<personal-site>"
bash skills/osint/scripts/jina.sh read "<blog-url>"
# Crawl multiple pages
bash skills/osint/scripts/exa.sh crawl "<blog-url>"
```

Extract: writing style, topics, frequency, tone evolution over time.

## Conference Talks / Speaker Bios

```bash
web_search "<Name> speaker conference talk"
web_search "site:youtube.com <Name> conference keynote"
```

Speaker bios at conferences are often more honest than LinkedIn -
written for a specific audience, they include unique details.

## Rule: Immediate Deep Extraction

When you discover ANY content platform during Phases 1-2:

1. **STOP** the current task
2. **Extract** 3-5 pieces of content immediately
3. **Analyze** for psychoprofile signals
4. **Resume** the original task with enriched data

Do NOT bookmark for later. Later = never.
The insight from 3 YouTube transcripts outweighs 10 LinkedIn connections.