thoughtleaders-cli 0.6.55__tar.gz → 0.6.56__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (125)
  1. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/.claude-plugin/plugin.json +1 -1
  2. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/PKG-INFO +1 -1
  3. thoughtleaders_cli-0.6.56/agents/youtube-comment-classifier.md +50 -0
  4. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/pyproject.toml +1 -1
  5. thoughtleaders_cli-0.6.56/skills/channel-authenticity/.gitignore +2 -0
  6. thoughtleaders_cli-0.6.56/skills/channel-authenticity/SKILL.md +127 -0
  7. thoughtleaders_cli-0.6.56/skills/channel-authenticity/references/comment-patterns.md +45 -0
  8. thoughtleaders_cli-0.6.56/skills/channel-authenticity/references/peer-cohort.md +47 -0
  9. thoughtleaders_cli-0.6.56/skills/channel-authenticity/references/red-flags.md +77 -0
  10. thoughtleaders_cli-0.6.56/skills/channel-authenticity/references/scoring.md +96 -0
  11. thoughtleaders_cli-0.6.56/skills/channel-authenticity/scripts/analyze_channel.py +197 -0
  12. thoughtleaders_cli-0.6.56/skills/channel-authenticity/scripts/anomaly_detector.py +211 -0
  13. thoughtleaders_cli-0.6.56/skills/channel-authenticity/scripts/comment_analyzer.py +385 -0
  14. thoughtleaders_cli-0.6.56/skills/channel-authenticity/scripts/comment_scraper.py +94 -0
  15. thoughtleaders_cli-0.6.56/skills/channel-authenticity/scripts/engagement_ratios.py +159 -0
  16. thoughtleaders_cli-0.6.56/skills/channel-authenticity/scripts/peer_cohort.py +190 -0
  17. thoughtleaders_cli-0.6.56/skills/channel-authenticity/scripts/report.py +135 -0
  18. thoughtleaders_cli-0.6.56/skills/channel-authenticity/scripts/resolve_channel.py +133 -0
  19. thoughtleaders_cli-0.6.56/skills/channel-authenticity/scripts/score.py +76 -0
  20. thoughtleaders_cli-0.6.56/skills/channel-authenticity/scripts/tl_cli.py +236 -0
  21. thoughtleaders_cli-0.6.56/skills/channel-authenticity/scripts/video_integrity.py +299 -0
  22. thoughtleaders_cli-0.6.56/skills/channel-authenticity/scripts/view_curves.py +133 -0
  23. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/__init__.py +1 -1
  24. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/.claude-plugin/marketplace.json +0 -0
  25. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/.github/workflows/python-publish.yml +0 -0
  26. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/.gitignore +0 -0
  27. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/AGENTS.md +0 -0
  28. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/API.md +0 -0
  29. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/CLAUDE.md +0 -0
  30. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/LICENSE +0 -0
  31. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/README.md +0 -0
  32. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/agents/tl-analyst.md +0 -0
  33. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/hooks/hooks.json +0 -0
  34. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/hooks/scripts/load-tl-skill.mjs +0 -0
  35. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/hooks/scripts/post-usage.sh +0 -0
  36. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/hooks/scripts/pre-check.sh +0 -0
  37. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl/SKILL.md +0 -0
  38. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl/references/business-glossary.md +0 -0
  39. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl/references/elasticsearch-schema.md +0 -0
  40. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl/references/firebolt-schema.md +0 -0
  41. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl/references/postgres-schema.md +0 -0
  42. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-import/SKILL.md +0 -0
  43. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-keyword-research/SKILL.md +0 -0
  44. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-keyword-research/scripts/probe.py +0 -0
  45. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/SKILL.md +0 -0
  46. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/examples/e2e_findings.md +0 -0
  47. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/examples/golden_queries.md +0 -0
  48. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/references/columns_brands.md +0 -0
  49. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/references/columns_channels.md +0 -0
  50. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/references/columns_content.md +0 -0
  51. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/references/columns_sponsorships.md +0 -0
  52. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/references/intelligence_filterset_schema.json +0 -0
  53. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/references/intelligence_widget_schema.json +0 -0
  54. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/references/report_glossary.md +0 -0
  55. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/references/sortable_columns.json +0 -0
  56. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/references/sponsorship_filterset_schema.json +0 -0
  57. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/references/sponsorship_widget_schema.json +0 -0
  58. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/references/widgets.md +0 -0
  59. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/tools/column_builder.md +0 -0
  60. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/tools/database_query.md +0 -0
  61. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/tools/name_resolver.md +0 -0
  62. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/tools/sample_judge.md +0 -0
  63. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/tools/similar_channels.md +0 -0
  64. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/tools/topic_matcher.md +0 -0
  65. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-report-builder/tools/widget_builder.md +0 -0
  66. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-save-report/SKILL.md +0 -0
  67. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-save-report/references/columns_brands.md +0 -0
  68. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-save-report/references/columns_channels.md +0 -0
  69. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-save-report/references/columns_content.md +0 -0
  70. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-save-report/references/columns_sponsorships.md +0 -0
  71. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-save-report/references/intelligence_filterset_schema.json +0 -0
  72. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-save-report/references/intelligence_widget_schema.json +0 -0
  73. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-save-report/references/report_glossary.md +0 -0
  74. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-save-report/references/sortable_columns.json +0 -0
  75. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-save-report/references/sponsorship_filterset_schema.json +0 -0
  76. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-save-report/references/sponsorship_widget_schema.json +0 -0
  77. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/skills/tl-save-report/references/widgets.md +0 -0
  78. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/_completions.py +0 -0
  79. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/auth/__init__.py +0 -0
  80. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/auth/commands.py +0 -0
  81. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/auth/finalize.py +0 -0
  82. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/auth/login.py +0 -0
  83. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/auth/pkce.py +0 -0
  84. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/auth/token_store.py +0 -0
  85. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/client/__init__.py +0 -0
  86. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/client/errors.py +0 -0
  87. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/client/http.py +0 -0
  88. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/__init__.py +0 -0
  89. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/_comments_common.py +0 -0
  90. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/balance.py +0 -0
  91. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/brands.py +0 -0
  92. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/bulk_import.py +0 -0
  93. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/changelog.py +0 -0
  94. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/channels.py +0 -0
  95. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/credits.py +0 -0
  96. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/db.py +0 -0
  97. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/deals.py +0 -0
  98. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/describe.py +0 -0
  99. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/doctor.py +0 -0
  100. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/matches.py +0 -0
  101. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/proposals.py +0 -0
  102. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/recommender.py +0 -0
  103. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/reports.py +0 -0
  104. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/schema.py +0 -0
  105. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/setup.py +0 -0
  106. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/snapshots.py +0 -0
  107. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/sponsorships.py +0 -0
  108. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/uploads.py +0 -0
  109. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/commands/whoami.py +0 -0
  110. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/config.py +0 -0
  111. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/filters.py +0 -0
  112. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/hints.py +0 -0
  113. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/main.py +0 -0
  114. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/output/__init__.py +0 -0
  115. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/output/formatter.py +0 -0
  116. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/src/tl_cli/self_update.py +0 -0
  117. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/tests/__init__.py +0 -0
  118. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/tests/test_auth.py +0 -0
  119. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/tests/test_describe.py +0 -0
  120. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/tests/test_filters.py +0 -0
  121. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/tests/test_http_auth.py +0 -0
  122. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/tests/test_output.py +0 -0
  123. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/tests/test_reports.py +0 -0
  124. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/tests/test_sponsorships.py +0 -0
  125. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.6.56}/uv.lock +0 -0
@@ -1,6 +1,6 @@
  {
  "name": "tl-cli",
- "version": "0.6.55",
+ "version": "0.6.56",
  "description": "ThoughtLeaders CLI — query sponsorship deals, channels, brands, uploads, and intelligence from the terminal",
  "author": {
  "name": "ThoughtLeaders",
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: thoughtleaders-cli
- Version: 0.6.55
+ Version: 0.6.56
  Summary: ThoughtLeaders CLI — query sponsorship data, channels, brands, and intelligence
  Project-URL: Homepage, https://thoughtleaders.io
  Project-URL: Repository, https://github.com/ThoughtLeaders-io/thoughtleaders-cli
@@ -0,0 +1,50 @@
+ ---
+ name: youtube-comment-classifier
+ description: >
+ Classifies a batch of YouTube comments as organic vs bot/spam/template for
+ the channel-authenticity skill's fake-engagement detection. Use when you
+ have a JSON array of scraped comments and need a fast, cheap per-comment
+ authenticity judgment. Returns strict JSON only.
+ model: haiku
+ tools: Read
+ color: yellow
+ ---
+
+ # YouTube Comment Authenticity Classifier
+
+ You judge whether YouTube comments come from a real, engaged human audience or
+ from engagement padding (bots, comment farms, generic filler). You are used by
+ the `channel-authenticity` skill to vet channels before ThoughtLeaders books a
+ paid sponsorship, so false "organic" verdicts cost real money — be skeptical.
+
+ ## Input
+
+ A JSON array of objects: `[{"i": <int>, "text": "<comment>", "author": "<handle>"}, ...]`
+ The user message contains ONLY this array (possibly large). Channel context
+ (niche/language) may be provided in a leading line — use it if present.
+
+ ## Labels (choose exactly one per comment)
+
+ - **organic** — specific, on-topic, references the actual video/creator,
+ asks a real question, shares a relevant experience, natural language with
+ normal variation. Mild praise that names something specific counts.
+ - **generic-template** — vague praise that could be pasted on any video:
+ "nice video", "great content", "thanks for sharing", "first", lone emoji
+ strings, "love it ❤️". On-language but contentless.
+ - **bot-like** — off-topic, off-language for the channel, gibberish,
+ random-looking handle + 1–3 word body, repeated near-identical phrasing,
+ engagement bait.
+ - **promotional** — self-promo, "check out my channel", links, services.
+ - **spam** — scams, adult/crypto bait, malicious or nonsensical repetition.
+
+ When torn between organic and generic-template, prefer generic-template
+ unless the comment clearly engages with the specific video.
+
+ ## Output — STRICT
+
+ Return ONLY a JSON array, no prose, no markdown fence:
+
+ `[{"i": 0, "label": "organic"}, {"i": 1, "label": "bot-like"}, ...]`
+
+ One object per input comment, same `i` values, same length. No extra keys.
+ If the input is empty, return `[]`.
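The agent's output contract above (one object per input comment, matching `i` values, no extra keys, one of five labels) can be checked mechanically before a reply is handed to the finalize step. A minimal sketch, assuming both the batch and the reply have already been parsed from JSON; `validate_reply` is a hypothetical helper for illustration, not a function shipped in this package.

```python
import json
from collections import Counter

# The five labels the classifier prompt allows.
LABELS = {"organic", "generic-template", "bot-like", "promotional", "spam"}


def validate_reply(batch: list[dict], reply: list[dict]) -> list[str]:
    """Hypothetical checker: return contract violations (empty list = valid reply)."""
    problems: list[str] = []
    if len(reply) != len(batch):
        problems.append(f"length mismatch: {len(reply)} labels for {len(batch)} comments")
    # The reply must carry the same multiset of `i` values as the input batch.
    if Counter(r.get("i") for r in reply) != Counter(c["i"] for c in batch):
        problems.append("i values do not match the input batch")
    for r in reply:
        if set(r) != {"i", "label"}:
            problems.append(f"unexpected keys in {r!r}")
        elif r["label"] not in LABELS:
            problems.append(f"unknown label {r['label']!r} for i={r['i']}")
    return problems


batch = json.loads('[{"i": 0, "text": "nice video", "author": "@a"},'
                   ' {"i": 1, "text": "where is part 2?", "author": "@b"}]')
reply = json.loads('[{"i": 0, "label": "generic-template"}, {"i": 1, "label": "organic"}]')
print(validate_reply(batch, reply))  # []
```

An empty batch paired with an empty reply `[]` also validates, matching the prompt's empty-input rule.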
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
  [project]
  name = "thoughtleaders-cli"
- version = "0.6.55"
+ version = "0.6.56"
  description = "ThoughtLeaders CLI — query sponsorship data, channels, brands, and intelligence"
  readme = "README.md"
  license = "MIT"
@@ -0,0 +1,2 @@
+ # Generated at runtime by peer_cohort.py — niche engagement medians, cached per run.
+ references/peer-cohort-cache.json
@@ -0,0 +1,127 @@
+ ---
+ name: channel-authenticity
+ description: >
+ Detect non-organic views / fake engagement / bot comments on a YouTube
+ channel before booking (or after delivering) a sponsorship. Use when asked
+ to vet a channel, check if views/comments are real, investigate suspicious
+ engagement, audit a sponsorship delivery, or whenever someone shares a
+ YouTube channel/handle/URL and asks "is this real / safe to buy an ad on".
+ Triggers: "fake views", "bot comments", "non-organic", "is this channel
+ legit", "vet this channel", "engagement looks off", "audit this sponsorship".
+ ---
+
+ # Channel Authenticity
+
+ Takes a channel (handle / URL / numeric id / name) — or `adlink:<id>` for a
+ sponsorship drill-down — and returns a 0–100 authenticity score plus ranked
+ red-flag findings. Built and calibrated from real bought-view and comment-farm
+ investigations.
+
+ ## Hard rules
+
+ - **One mode. Every run does everything.** No flags, no opt-in tiers. Groups
+ A, B, and C all run, every time.
+ - **Comment scraping (Group C) is mandatory and never skipped.** Metrics and
+ view-curves can be hand-waved ("the algorithm", "we ran ads"); reading what
+ the audience actually says is the only direct proof. A run without it is
+ invalid.
+ - **Data access is CLI-only.** Everything goes through `tl_cli.py` → the
+ `tl` CLI (`tl db pg/fb/es`, `tl channels similar`). No database credentials
+ are ever used. If the CLI isn't authenticated the skill fails fast with a
+ clear message.
+
+ ## Setup check
+
+ ```bash
+ cd .claude/skills/channel-authenticity/scripts
+ python3 tl_cli.py preflight # must print "OK"
+ ```
+ If this errors with `cli_unavailable`, tell the user to run `tl auth login`
+ (or set `TL_API_KEY`). Comment scraping additionally needs `yt-dlp`
+ (`pip install yt-dlp`) — it uses the android InnerTube client so **no cookies
+ or API key are required**.
+
+ ## How to run (three phases — a classifier subagent sits between two CLI passes)
+
+ **Phase 1 — collect.** From the `scripts/` dir:
+ ```bash
+ python3 analyze_channel.py "<handle|url|id|name|adlink:ID>"
+ ```
+ This runs Groups A + B + C (rule-based), scrapes ≥10 latest longforms
+ (+ highest-view + most-recently-sponsored), and prints a JSON envelope with
+ `state_path`, `llm_batch_path`, and `llm_batch_size`.
+
+ If the ref matches **multiple channels** (common for names with localized
+ dupes), Phase 1 exits (code 4) with `{"error":"ambiguous_channel",
+ "candidates":[{id,name,subscribers}…]}` instead of guessing. Show the
+ candidates to the user — they're ordered by subscriber count, highest first
+ (the most likely intended) — let them pick, then re-run Phase 1 with that
+ numeric id.
+
+ **Phase 2 — classify comments (run the subagent TWICE).** Read
+ `llm_batch_path` (a JSON array of `{i, text, author}`) and send it to the
+ `youtube-comment-classifier` agent via the **Agent tool**
+ (`subagent_type: youtube-comment-classifier`) **twice** — two separate calls on
+ the same batch. Prepend one context line: `channel niche: cat
+ <content_category>, language <language>` (both values are in the envelope).
+ Each call returns a strict JSON array
+ `[{"i":N,"label":"organic|generic-template|bot-like|promotional|spam"}]`; save
+ each reply verbatim to its own file (e.g. `/tmp/ca_llm1.json`,
+ `/tmp/ca_llm2.json`).
+
+ Why twice: single-pass LLM labeling wobbles ±10pts, so finalize majority-votes
+ the two passes to keep the reported organic share stable. Sophisticated
+ AI-comment farms read as clean English at normal volume — only the classifier
+ catches them, so this pass is essential.
+
+ If the batch is empty (channel had almost no comments), skip the subagent and
+ pass an empty array `[]` — near-zero comments is itself the loudest signal,
+ and Group C already penalizes it.
+
+ **Phase 3 — finalize** (pass both classifier files):
+ ```bash
+ python3 analyze_channel.py --finalize <state_path> /tmp/ca_llm1.json /tmp/ca_llm2.json
+ ```
+ This applies the LLM verdict, computes the composite score, writes the final
+ JSON + markdown report to `/tmp`, and prints the report. Present that report
+ to the user (it's already formatted — peer comparison, group scores, ranked
+ flags, verdict).
+
+ ## Scoring (see references/scoring.md)
+
+ Three groups, each scored 0–100 independently (start at 100, subtract fixed
+ per-flag penalties). **Final = simple mean of the three.** Two hard
+ overrides force `FRAUD_LIKELY` (score capped at 39) regardless of the mean:
+ (1) Group C — non-organic audience (<30% organic from the classifier, or a
+ dead comment section); (2) Group B — concealed/misrepresented performance
+ (≥2 sold+published sponsored videos deleted/unlisted, or one with ≥5k views;
+ or ≥3 high-view videos scrubbed with ≥15% of tracked views gone).
+
+ Bands: ≥90 CLEAN · ≥70 MINOR_FLAGS · ≥40 MIXED · <40 FRAUD_LIKELY.
+
+ ## What each group checks
+
+ - **Group A — engagement & peer ratios** (`engagement_ratios.py`,
+ `peer_cohort.py`): like/comment rates measured against a niche-matched peer
+ baseline, plus audience-size sanity checks across longforms vs shorts.
+ - **Group B — view-curve anomalies + video integrity** (`view_curves.py`,
+ `anomaly_detector.py`, `video_integrity.py`): view-over-time curves that
+ don't behave like organic growth (bursts without engagement, guarantee
+ cliffs at round numbers, frozen likes, subs flat while views surge), plus
+ intent-aware detection of deleted/unlisted videos used to conceal or
+ misrepresent performance (benign re-uploads are excluded).
+ - **Group C — comment content** (`comment_scraper.py`, `comment_analyzer.py`
+ + classifier subagent): whether the comments are a real, engaged audience —
+ scarcity vs views, templating and near-duplicates, language mismatch,
+ bot-handle patterns, and the classifier's organic-share verdict.
+
+ Full catalogue + thresholds: `references/red-flags.md`. The exact `tl` queries
+ each check issues live in the scripts; the underlying channel/video/adlink
+ schema is documented in the `tl` skill (`skills/tl/references/`).
+
+ ## After a run
+
+ Offer to log the verdict (channel, score, top flags, date) to a "Channel
+ Vetting Log" sheet via the `gws` skill if the user wants an audit trail.
+ If you discover a new robust signal, add it to `references/red-flags.md` and
+ a penalty to `references/scoring.md` (self-improvement).
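The composite described in the Scoring section (each group starts at 100 minus its penalties, the final score is the simple mean, either hard override caps it at 39, then four bands) can be sketched as follows. This is a minimal illustration, not `score.py`: the names `Flag`, `group_score`, `is_hard_fail`, `band`, and `composite` are hypothetical, the example penalties (15 for bot-handle share, 12 for a warning-level scrub) are borrowed from `comment-patterns.md` and `red-flags.md`, and flooring a group at 0 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Flag:
    group: str    # "A", "B" or "C"
    code: str     # e.g. "C_bot_usernames"
    penalty: int  # points subtracted from the group's starting 100


def group_score(flags: list[Flag], group: str) -> int:
    # Start at 100 and subtract this group's penalties; flooring at 0 is an assumption.
    return max(0, 100 - sum(f.penalty for f in flags if f.group == group))


def is_hard_fail(organic_share: float, dead_comments: bool,
                 concealed_sponsored: int, concealed_max_views: int,
                 scrubbed_videos: int, scrubbed_view_share: float) -> bool:
    # (1) Group C: under 30% organic from the classifier, or a dead comment section.
    group_c = organic_share < 0.30 or dead_comments
    # (2a) Group B: 2+ concealed sold+published sponsored videos, or one with 5k+ views.
    sponsored = concealed_sponsored >= 2 or (concealed_sponsored >= 1 and concealed_max_views >= 5_000)
    # (2b) Group B: 3+ high-view videos scrubbed with at least 15% of tracked views gone.
    scrub = scrubbed_videos >= 3 and scrubbed_view_share >= 0.15
    return group_c or sponsored or scrub


def band(score: float) -> str:
    if score >= 90:
        return "CLEAN"
    if score >= 70:
        return "MINOR_FLAGS"
    if score >= 40:
        return "MIXED"
    return "FRAUD_LIKELY"


def composite(flags: list[Flag], hard_fail: bool) -> tuple[dict[str, int], float, str]:
    scores = {g: group_score(flags, g) for g in "ABC"}
    final = sum(scores.values()) / len(scores)  # simple mean of the three groups
    if hard_fail:
        final = min(final, 39)  # the cap alone forces the FRAUD_LIKELY band
    return scores, final, band(final)


flags = [Flag("B", "B_high_view_video_scrub", 12), Flag("C", "C_bot_usernames", 15)]
print(composite(flags, hard_fail=False))
# ({'A': 100, 'B': 88, 'C': 85}, 91.0, 'CLEAN')
```

Passing `hard_fail=True` with the same flags caps the score at 39 and returns `FRAUD_LIKELY`, which is how the overrides win regardless of the mean.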
@@ -0,0 +1,45 @@
+ # Comment patterns
+
+ The generic-template phrase library and handle regexes used by
+ `comment_analyzer.py`. Extend as new padding patterns show up; keep the code
+ list (`GENERIC` in `comment_analyzer.py`) and this doc in sync.
+
+ ## Generic-template phrases (case-insensitive substring/exact)
+
+ ```
+ nice video, great video, great content, thanks for sharing, first,
+ love this, love it, keep it up, keep going, awesome, amazing, good job,
+ well done, very nice, so good, best video, informative, helpful,
+ thank you so much, wow, super, 👍, 🔥, ❤, great work, nice one,
+ good video, very helpful, excellent
+ ```
+
+ A comment counts as generic if its lowercased, punctuation-stripped form is
+ exactly one of these OR is ≤25 chars and contains one. Lone emoji strings are
+ caught separately by the emoji-only check.
+
+ ## Bot-handle regex
+
+ - `^@?[a-z]+[-_]?[0-9]{4,}$` — letters then 4+ digits (YouTube
+ auto-suffix style, e.g. `@viewer8821`, `@john_doe4417`). High-signal in bulk.
+
+ > Note: YouTube now appends short suffixes to many *real* handles too, so
+ > bot-handle share is a **supporting** signal (penalty 15), never decisive on
+ > its own. The decisive comment signals are scarcity and LLM-not-organic.
+
+ ## Language
+
+ Channel `language == 'en'`: a comment "matches" if ≥60% of its alphabetic
+ chars are ASCII letters. Emoji/number-only comments are excluded from the
+ denominator (handled by emoji-only / length checks instead). For non-English
+ channels the language check is skipped (we lack reliable per-language
+ baselines — revisit if we onboard many non-en channels).
+
+ ## What good looks like (contrast)
+
+ Real audiences on a tech channel reference specifics ("the 72→82 jump
+ convinced me", "where is part 1 and 2a?"), ask operational questions, argue,
+ and reply to each other. Padding is short, vague, off-language, emoji-heavy,
+ or planted product mentions ("X was built specifically for…", "signed up just
+ now with the launch code"). The Haiku classifier exists to catch the
+ planted-promotional class that keyword rules miss.
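The three rule-based checks documented above (generic-template matching, the bot-handle regex, and the English language-match ratio) can be sketched in a few functions. This is an illustrative sketch, not the code in `comment_analyzer.py`: the function names are hypothetical, "punctuation-stripped" is assumed to mean removing ASCII punctuation only (so the emoji entries in the phrase list can still match), and lowercasing handles before applying the regex is an assumption.

```python
import re
import string
from typing import Optional

# Phrase library from the list above; keep in sync with GENERIC in comment_analyzer.py.
GENERIC = [
    "nice video", "great video", "great content", "thanks for sharing", "first",
    "love this", "love it", "keep it up", "keep going", "awesome", "amazing", "good job",
    "well done", "very nice", "so good", "best video", "informative", "helpful",
    "thank you so much", "wow", "super", "👍", "🔥", "❤", "great work", "nice one",
    "good video", "very helpful", "excellent",
]
BOT_HANDLE = re.compile(r"^@?[a-z]+[-_]?[0-9]{4,}$")
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)  # ASCII punctuation only


def normalize(text: str) -> str:
    # Lowercase, drop ASCII punctuation (emoji survive), collapse whitespace.
    return " ".join(text.lower().translate(_STRIP_PUNCT).split())


def is_generic(text: str) -> bool:
    norm = normalize(text)
    # Exact phrase match, or a short (<=25 chars) comment that contains any phrase.
    return norm in GENERIC or (len(norm) <= 25 and any(p in norm for p in GENERIC))


def is_bot_handle(handle: str) -> bool:
    return BOT_HANDLE.match(handle.lower()) is not None


def matches_english(text: str) -> Optional[bool]:
    # None means no alphabetic chars (emoji/number-only): excluded from the denominator.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return None
    return sum(ch.isascii() for ch in letters) / len(letters) >= 0.6


print(is_generic("Nice video!!"))                 # True
print(is_generic("the 72→82 jump convinced me"))  # False (27 chars, no exact match)
print(is_bot_handle("@viewer8821"))               # True
print(matches_english("👍👍"))                     # None
```

Read literally, the regex allows at most one `-` or `_` and only directly before the digits, so `@viewer8821` and `@john_4417` match while the `@john_doe4417` example in the list above does not.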
@@ -0,0 +1,47 @@
+ # Peer cohort
+
+ Group A's like:view / comment:view thresholds are **relative to a
+ niche-matched peer baseline**, not absolute — engagement norms vary wildly by
+ niche (gaming ≠ finance ≠ tech tutorials), so a fixed cutoff would
+ false-flag low-engagement-but-honest niches and miss high-engagement niches
+ being inflated.
+
+ ## How the cohort is built (`peer_cohort.py`)
+
+ 1. **Preferred:** `tl channels similar <id> --limit 24` (the recommender).
+ Best niche match.
+ 2. **Fallback** (recommender empty): PG cohort —
+ same `content_category` + `language`, `is_active`, `reach` within ±50%,
+ `last_published` within 60 days, excluding the subject channel.
+ 3. For up to 12 peers, pull each peer's last 10 longforms via `tl db es`,
+ require ≥5,000 aggregate views and ≥3 videos (skip dead peers).
+ 4. Baseline = **median** of peers' like-rate and comment-rate, plus the 25th
+ percentile for context.
+
+ A subject channel flags when its longform rate is **< 0.4× the peer median**.
+ 0.4× is intentionally generous — we only fire on gross deviation, not normal
+ variance. (The origin fraud case ran 0.008× the median; real channels cluster
+ 0.7–1.5×.)
+
+ ## Caching
+
+ Result cached in `peer-cohort-cache.json` keyed by
+ `content_category|language|reach_bucket`, TTL 30 days. Buckets:
+ `<10k, 10-50k, 50-150k, 150-500k, 500k-1m, 1-5m, 5m+`. This avoids re-spending
+ recommender credits and re-querying ES on every run. Force a rebuild by
+ deleting the cache file or calling `get_baseline(ch, refresh=True)`.
+
+ ## Last-resort fallback
+
+ If no usable peers at all (rare — niche too small / all peers dead), a generic
+ English-tech floor is used (`like 2%, comment 0.25%`) and `source` is recorded
+ as `fallback-generic` in the metrics so the report consumer knows the
+ baseline was weak. Prefer widening the reach band over trusting this.
+
+ ## Caveats
+
+ - The PG fallback uses `content_category` which is coarse; the recommender
+ (`tl channels similar`) is materially better — prefer it.
+ - Reach buckets are wide on purpose (engagement scales sub-linearly with
+ size); don't narrow them without re-checking the false-positive rate on a
+ known-clean channel.
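The bucketing, median baseline, and 0.4× trigger described above fit in a short sketch. It is illustrative rather than `peer_cohort.py` itself: the function names and the per-peer input shape (lists of per-video views, likes, and comments) are hypothetical, buckets are assumed to be lower-bound inclusive because the doc gives only the labels, and each peer's rate is taken as total likes (or comments) over total views across its videos. Fetching peers through `tl channels similar` or `tl db es`, and the 30-day cache file, are left out.

```python
from statistics import median, quantiles

# Exclusive upper bounds for the documented bucket labels; boundary handling is an assumption.
_BUCKETS = [(10_000, "<10k"), (50_000, "10-50k"), (150_000, "50-150k"),
            (500_000, "150-500k"), (1_000_000, "500k-1m"), (5_000_000, "1-5m")]
# Documented last-resort floor: like 2%, comment 0.25%.
FALLBACK = {"like": 0.02, "comment": 0.0025, "source": "fallback-generic"}


def reach_bucket(reach: int) -> str:
    for upper, label in _BUCKETS:
        if reach < upper:
            return label
    return "5m+"


def cache_key(content_category: str, language: str, reach: int) -> str:
    return f"{content_category}|{language}|{reach_bucket(reach)}"


def peer_baseline(peers: list[dict]) -> dict:
    """peers: hypothetical shape, each with per-video 'views', 'likes', 'comments' lists."""
    like_rates, comment_rates = [], []
    for p in peers[:12]:  # up to 12 peers
        views = sum(p["views"])
        if views < 5_000 or len(p["views"]) < 3:
            continue  # dead peer: skip
        like_rates.append(sum(p["likes"]) / views)  # ratio of sums (assumption)
        comment_rates.append(sum(p["comments"]) / views)
    if not like_rates:
        return dict(FALLBACK)
    baseline = {"like": median(like_rates), "comment": median(comment_rates)}
    if len(like_rates) >= 2:  # quantiles() needs two or more points on older Pythons
        baseline["like_p25"] = quantiles(like_rates, n=4, method="inclusive")[0]
        baseline["comment_p25"] = quantiles(comment_rates, n=4, method="inclusive")[0]
    return baseline


def flags_vs_peers(subject_rate: float, peer_median: float, factor: float = 0.4) -> bool:
    return subject_rate < factor * peer_median


print(cache_key("tech", "en", 120_000))  # tech|en|50-150k
print(flags_vs_peers(0.00027, 0.034))    # True
```

With the origin case's like-rate of 0.027% against a 3.4% peer median the ratio is about 0.008×, well under the 0.4× trigger; a channel at 2.5% against the same median (about 0.74×) would not flag.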
@@ -0,0 +1,77 @@
+ # Red-flag catalogue
+
+ Every signal the skill checks, why it matters, and the threshold. Codes match
+ the `flags[].code` in the JSON output. Real cases are referenced by anonymized
+ label only.
+
+ ## Group A — engagement & peer ratios (`engagement_ratios.py`)
+
+ | code | trigger | why |
+ |---|---|---|
+ | `A_like_rate_vs_peers` | longform like:view < 0.4× peer-cohort median | Paid/bot views don't like. Origin case: 0.027% vs 3.4% peer median (125×). |
+ | `A_comment_rate_vs_peers` | longform comment:view < 0.4× peer median | Same logic for comments. Origin case: 78× below. |
+ | `A_views_to_subs` | avg longform views > 20% of subs | Healthy 1–15%. Origin case 28%. Implies non-subscriber/external traffic. |
+ | `A_longform_shorts_gap` | shorts like-rate ≥ 5× longform like-rate (shorts ≥0.3%) | Organic shorts + dead longforms ⇒ longforms are the promoted units. The smoking gun on the origin case (20×). |
+ | `A_organic_floor` | ≥ half of longforms exceed 5× the median-short view count | Non-viral shorts ≈ true audience size. Origin case median short = 688 views vs 180k longform. |
+ | `A_per_video_outliers` | ≥⅓ of longforms >1.5σ below the channel's own like:view mean | One real audience produces consistent ratios; promoted videos don't. |
+
+ Peer baseline: niche-matched (`tl channels similar`, fallback PG cohort:
+ same content_category+language, active, reach ±50%, published <60d), median
+ of each peer's last-10-longform like/comment rates. Cached 30 days.
+
22
+ ## Group B — view-curve time-series (`anomaly_detector.py`)
23
+
24
+ | code | trigger | why |
25
+ |---|---|---|
26
+ | `B_burst_without_engagement` | a Δ-segment with Δviews/day > 3× rolling mean, >5k views, and segment like-rate < ½ lifetime | Real virality brings likes; injected views don't. |
27
+ | `B_engagement_incoherence` | Pearson r(Δviews, Δlikes+Δcomments) < 0.2 over the curve | Organic videos: views and engagement move together (r>0.6). Fraud: decoupled. |
28
+ | `B_guarantee_cliff` | plateau within 5% of a round number (50k/100k/250k/500k/1M…) by age ≤60 then flat | A bought-view case: bought to a 500k guarantee, cliffed at 581k. |
29
+ | `B_slow_start_late_spike` | views@2 < 25th-pctile-ish (< 0.15× final) AND views@10/views@2 > 8 | Paid traffic switched on days after publish — classic bought-view signature. |
30
+ | `B_latelife_drip_frozen_likes` | age ≥20 segment with >3k new views but ≤1 new like | Post-publish ad campaigns drip views with zero engagement. Seen on every video of the origin case. |
31
+ | `B_subs_flat_while_views_surge` | < 30 new subs per 100k channel views over snapshot window | A bought-view case: 27 subs / 580k views. Viewers don't convert ⇒ not real interest. |
32
+
33
+ Interpolation (`view_curves.py`) is self-contained (linear + log bracket
34
+ interpolation, per-segment deltas) — no external dependency.
35
+
36
+ ### Group B add-on — video integrity (`video_integrity.py`)
37
+
38
+ Deletion/unlisting is **not** a signal by itself — channels legitimately
39
+ re-upload and clean house. The signal is deletion used to **conceal or
40
+ misrepresent performance**. Source: ES `offline_since` (exists ⇒ video gone)
41
+ and `content_aspects` containing `'unlisted'`. Intent is inferred from
42
+ view count, age-at-removal, and whether the video was a paid sponsorship.
43
+
44
+ | code | trigger | why |
45
+ |---|---|---|
46
+ | `B_sponsored_video_concealed` | a SOLD+PUBLISHED adlink's video is now offline/unlisted | Brand paid, ad went live, delivery then hidden. Bad-faith + finance/delivery alarm. Hard-fail if ≥2, or one with ≥5k views. |
47
+ | `B_high_view_video_scrub` | offline video(s) above the channel's high-view bar (max of 50k or 25% of median) | You don't delete a 2M-view video by accident. Penalty scales by the **share of tracked views** gone, not raw count (big channels always shed a few old high-view videos): ≥15% → −25 critical; ≥3% → −12 warning; <3% → recorded, not penalized. Hard-fail only if ≥3 videos AND ≥15% of views vanished. |
48
+ | `B_unlisted_with_traffic` | unlisted video still carrying ≥20k views | Hidden from channel page/subscribers while accruing views — running content the organic audience never sees. |
49
+
50
+ Benign (recorded in metrics, **not** penalized): a removal ≤7d after publish
51
+ of a non-sponsored video with <5k views, i.e. a re-upload or mistake (e.g. a
52
+ 713-view video pulled 2 days after publish).
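The video-integrity rules can be sketched as a benign filter followed by two checks. This is a hypothetical outline, not the code in `video_integrity.py`. The `RemovedVideo` type and all function names are invented. `tracked_views` is assumed to be the total views across all tracked videos, including removed ones. "25% of median" is read literally as 25% of the channel's median views per video.

```python
from dataclasses import dataclass


@dataclass
class RemovedVideo:
    views: int
    days_live: int   # days between publish and going offline/unlisted
    sponsored: bool  # a SOLD+PUBLISHED adlink points at this video


def is_benign(v: RemovedVideo) -> bool:
    """Re-upload or mistake: pulled within 7 days, under 5k views, not sponsored."""
    return v.days_live <= 7 and v.views < 5_000 and not v.sponsored


def sponsored_concealed_hard_fail(removed: list[RemovedVideo]) -> bool:
    """Hard-fail if >= 2 sponsored videos are hidden, or one with >= 5k views."""
    sponsored = [v for v in removed if v.sponsored]
    return len(sponsored) >= 2 or any(v.views >= 5_000 for v in sponsored)


def high_view_scrub(removed: list[RemovedVideo], tracked_views: int, median_views: float) -> tuple[int, bool]:
    """Return (penalty, hard_fail) for B_high_view_video_scrub, scaled by view share."""
    bar = max(50_000, 0.25 * median_views)
    scrubbed = [v for v in removed if not is_benign(v) and v.views > bar]
    share = sum(v.views for v in scrubbed) / tracked_views if tracked_views else 0.0
    penalty = 25 if share >= 0.15 else 12 if share >= 0.03 else 0
    return penalty, len(scrubbed) >= 3 and share >= 0.15
```

Because a high-view or sponsored video can never pass `is_benign`, the filter matters mainly for the metrics. It only changes scoring for low-view removals.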
53
+
54
+ ## Group C — comment content (`comment_analyzer.py` + Haiku subagent)
55
+
56
+ | code | trigger | why |
57
+ |---|---|---|
58
+ | `C_comment_scarcity` | viewer comments < 15% of a floor of 1 comment per 2,000 views (evaluated only when ≥50k views were scraped) | The single loudest signal. Origin case: ~21 comments across ~1.8M scraped views. Measured on freshly scraped comments, so it can't reflect a stale count. |
59
+ | `C_language_mismatch` | <60% of comments in the channel's language (English-language channels only) | Off-language comment farms, e.g. foreign-language or emoji junk flooding an English channel. |
60
+ | `C_generic_templates` | >40% generic ("nice video", lone emoji…) | Padding. Library in `comment-patterns.md`. |
61
+ | `C_length_uniform` | ≥70% ≤5 words AND median <8 words | Bots cluster short; real audiences have a long tail. |
62
+ | `C_emoji_only` | >25% emoji-only / no real text | Filler. |
63
+ | `C_bot_usernames` | >30% handles match `^@?[a-z]+[-_]?\d{4,}$` **AND** LLM organic share < 55% | YouTube's own default handles match this pattern too, so it's only a tell when the audience is independently suspect — fires as corroboration in the LLM step, never on format alone. |
64
+ | `C_near_duplicates` | largest cluster of near-duplicate comments (token Jaccard > 0.7) holds >10% of comments | Templated posting. |
65
+ | `C_low_reply_ratio` | <5% of top comments have any reply | Real audiences converse. |
66
+ | `C_no_creator_engagement` | creator hearts 0 comments | Creator ignores a section they know is fake. |
67
+ | `C_commenter_churn` | <2% of commenters appear on >1 video | No recurring fanbase; throwaway accounts. |
68
+ | `C_time_clustered` | >50% of comments in first hour on a weeks-old video | Burst posting. |
69
+ | `C_llm_not_organic` | Haiku classifier <50% organic | Catches subtle patterns rules miss. <30% ⇒ hard override → FRAUD_LIKELY. |
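Several of the rule-based Group C checks are simple aggregates over the scraped comments. The sketch below covers three of them and is not the code in `comment_analyzer.py`. The names are invented, and tokens are assumed to be lowercased `\w+` runs. A "cluster" is read as single-link: comments are joined whenever any pair exceeds the Jaccard threshold, and singletons don't count.

```python
import re
import statistics
from itertools import combinations


def comment_scarcity(viewer_comments: int, scraped_views: int) -> bool:
    """C_comment_scarcity: fewer than 15% of a 1-comment-per-2,000-views floor."""
    if scraped_views < 50_000:
        return False  # too little traffic to judge
    floor = scraped_views / 2_000
    return viewer_comments < 0.15 * floor


def length_uniform(comments: list[str]) -> bool:
    """C_length_uniform: >= 70% of comments are <= 5 words AND median < 8 words."""
    lengths = [len(c.split()) for c in comments]
    if not lengths:
        return False
    short_share = sum(1 for k in lengths if k <= 5) / len(lengths)
    return short_share >= 0.70 and statistics.median(lengths) < 8


def largest_duplicate_cluster_share(comments: list[str], threshold: float = 0.7) -> float:
    """Share of comments in the biggest group linked by token Jaccard > threshold."""
    toks = [set(re.findall(r"\w+", c.lower())) for c in comments]
    parent = list(range(len(comments)))  # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(comments)), 2):
        union = toks[i] | toks[j]
        if union and len(toks[i] & toks[j]) / len(union) > threshold:
            parent[find(i)] = find(j)
    sizes: dict[int, int] = {}
    for i in range(len(comments)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    largest = max(sizes.values(), default=0)
    return largest / len(comments) if largest >= 2 else 0.0
```

With the origin case's figures, `comment_scarcity(21, 1_800_000)` fires: the floor is 900 comments, so the cutoff is 135. The pairwise loop is O(n²), which is fine for a few hundred scraped comments.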
70
+
71
+ ## Contributing new signals
72
+
73
+ Found a robust new tell? Add a row here, add a penalty + severity to the
74
+ relevant `PENALTIES` dict in the script, and document the penalty in
75
+ `scoring.md`. Keep thresholds evidence-based, but **reference cases by
76
+ anonymized label only — never a channel name, id, or handle** (this skill
77
+ ships in a public repo).
@@ -0,0 +1,96 @@
1
+ # Scoring
2
+
3
+ Deliberately simple (per the approved plan). Three check groups, each scored
4
+ **independently 0–100**: start at 100, subtract the fixed per-flag penalty for
5
+ every triggered flag, floor at 0. **Final score = simple mean of the three
6
+ group sub-scores.** No weighting matrix, no bonuses, no per-group caps.
7
+
8
+ ```
9
+ final = (A + B + C) / 3
10
+ ```
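The per-group arithmetic and the final mean fit in a few lines. This is a minimal sketch with hypothetical function names. It takes the list of penalties for the flags that fired, whatever the scripts' actual data shapes are.

```python
def group_score(penalties: list[float]) -> float:
    """Start at 100, subtract every triggered flag's penalty, floor at 0."""
    return max(0.0, 100.0 - sum(penalties))


def final_score(a: float, b: float, c: float) -> float:
    """Unweighted mean of the three group sub-scores."""
    return (a + b + c) / 3


print(final_score(0, 15, 27))  # → 14.0 (origin fraud case)
```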
11
+
12
+ ## Verdict bands
13
+
14
+ | score | verdict | advice |
15
+ |---|---|---|
16
+ | ≥ 90 | CLEAN | Safe to book at standard rates |
17
+ | ≥ 70 | MINOR_FLAGS | Book but note caveats to the AM |
18
+ | ≥ 40 | MIXED | Manual review; consider rate reduction |
19
+ | < 40 | FRAUD_LIKELY | Do not book without senior sign-off + heavy discount |
20
+
21
+ ## Hard overrides
22
+
23
+ If either trigger fires, the verdict is forced to **FRAUD_LIKELY** and the
24
+ score capped at 39 regardless of the mean:
25
+
26
+ 1. **Group C — non-organic audience:** Haiku classifier organic share
27
+ **< 30%** (`group_c.hard_fail`), or an effectively dead comment section
28
+ (fewer than 8 viewer comments, which triggers an early exit). Fake comments are the most direct proof of
29
+ a fake audience.
30
+ 2. **Group B — concealed/misrepresented performance** (`group_b.hard_fail`):
31
+ ≥2 sold+published sponsored videos offline/unlisted (or one with ≥5k
32
+ views); OR ≥3 high-view videos scrubbed AND ≥15% of all tracked views
33
+ gone. Using deletion to hide paid delivery or strike-bait is bad faith,
34
+ not housekeeping.
35
+
36
+ Neither fires for benign deletion (low-view, young, non-sponsored re-uploads
37
+ are excluded before scoring).
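The verdict bands and both hard overrides combine into one mapping. In this sketch the names are hypothetical, and the Group B flag is taken as a precomputed boolean. Either override caps the score at 39 and forces FRAUD_LIKELY; otherwise the score falls through the bands from the top.

```python
def group_c_hard_fail(organic_share: float, viewer_comments: int) -> bool:
    """Haiku organic share < 30%, or a dead section (< 8 viewer comments)."""
    return organic_share < 0.30 or viewer_comments < 8


def verdict(score: float, b_hard_fail: bool = False, c_hard_fail: bool = False) -> tuple[float, str]:
    """Map a final score to its band; either hard override forces FRAUD_LIKELY at <= 39."""
    if b_hard_fail or c_hard_fail:
        return min(score, 39.0), "FRAUD_LIKELY"
    if score >= 90:
        return score, "CLEAN"
    if score >= 70:
        return score, "MINOR_FLAGS"
    if score >= 40:
        return score, "MIXED"
    return score, "FRAUD_LIKELY"
```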
38
+
39
+ ## Penalties (authoritative list)
40
+
41
+ Penalties live in each script's `PENALTIES` dict; this table mirrors them.
42
+ Severity drives report ordering/icons only, not math.
43
+
44
+ ### Group A — `engagement_ratios.py`
45
+ | code | penalty | severity |
46
+ |---|---|---|
47
+ | A_like_rate_vs_peers | 30 | critical |
48
+ | A_comment_rate_vs_peers | 25 | critical |
49
+ | A_longform_shorts_gap | 25 | critical |
50
+ | A_views_to_subs | 15 | warning |
51
+ | A_organic_floor | 15 | warning |
52
+ | A_per_video_outliers | 10 | info |
53
+
54
+ ### Group B — `anomaly_detector.py` + `video_integrity.py`
55
+ | code | penalty | severity |
56
+ |---|---|---|
57
+ | B_burst_without_engagement | 25 | critical |
58
+ | B_engagement_incoherence | 25 | critical |
59
+ | B_latelife_drip_frozen_likes | 20 | critical |
60
+ | B_guarantee_cliff | 15 | warning |
61
+ | B_slow_start_late_spike | 15 | warning |
62
+ | B_subs_flat_while_views_surge | 15 | warning |
63
+ | B_sponsored_video_concealed | 30 | critical |
64
+ | B_high_view_video_scrub | 25 if ≥15% of tracked views gone / 12 if ≥3% / 0 if <3% | critical / warning / none (scaled by view share) |
65
+ | B_unlisted_with_traffic | 15 | warning |
66
+
67
+ ### Group C — `comment_analyzer.py`
68
+ | code | penalty | severity |
69
+ |---|---|---|
70
+ | C_comment_scarcity | 35 | critical |
71
+ | C_llm_not_organic | 30 | critical |
72
+ | C_language_mismatch | 20 | critical |
73
+ | C_generic_templates | 18 | warning |
74
+ | C_bot_usernames | 15 | warning |
75
+ | C_near_duplicates | 15 | warning |
76
+ | C_length_uniform | 12 | warning |
77
+ | C_commenter_churn | 12 | warning |
78
+ | C_emoji_only | 10 | info |
79
+ | C_low_reply_ratio | 8 | info |
80
+ | C_sentiment_uniform | 8 | info |
81
+ | C_time_clustered | 8 | info |
82
+
83
+ `C_bot_usernames` is **conditional**: the auto-generated-handle share is always
84
+ recorded as a metric, but the −15 only applies in the LLM step when organic
85
+ share is also low (< 55%). YouTube's own default handles are letters+digits,
86
+ so on a healthy-organic channel a high share is noise — it fires only as
87
+ corroboration when the audience is independently suspect.
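The conditional rule can be sketched as follows. The auto-handle share is always computed and returned, and the penalty applies only when the LLM organic share is also low. The regex is the pattern quoted in the Group C red-flag table. The function name is hypothetical, and handles are assumed to be matched as-is, without case folding.

```python
import re

# Pattern from the Group C red-flag table; it also matches YouTube's default handles.
BOT_HANDLE = re.compile(r"^@?[a-z]+[-_]?\d{4,}$")


def bot_username_check(handles: list[str], organic_share: float) -> tuple[float, int]:
    """Return (auto-handle share, penalty). The share is always recorded;
    the 15-point penalty applies only when >30% match AND organic share < 55%."""
    if not handles:
        return 0.0, 0
    share = sum(1 for h in handles if BOT_HANDLE.match(h)) / len(handles)
    penalty = 15 if share > 0.30 and organic_share < 0.55 else 0
    return share, penalty
```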
88
+
89
+ Penalties intentionally let two criticals in a group drive it near zero — a
90
+ channel with two independent strong fraud signals in one dimension should not
91
+ score "mixed". Tune here as cases accumulate; record the reasoning in the
92
+ commit message.
93
+
94
+ ## Reference result
95
+
96
+ Origin fraud case (AI/coding channel): A=0, B=15, C=27 → (0 + 15 + 27) / 3 = **14.0 FRAUD_LIKELY**.