thoughtleaders-cli 0.6.55__tar.gz → 0.7.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (128)
  1. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/.claude-plugin/plugin.json +1 -1
  2. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/AGENTS.md +17 -33
  3. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/PKG-INFO +7 -6
  4. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/README.md +6 -5
  5. thoughtleaders_cli-0.7.0/agents/youtube-comment-classifier.md +50 -0
  6. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/pyproject.toml +1 -1
  7. thoughtleaders_cli-0.7.0/skills/channel-authenticity/.gitignore +2 -0
  8. thoughtleaders_cli-0.7.0/skills/channel-authenticity/SKILL.md +127 -0
  9. thoughtleaders_cli-0.7.0/skills/channel-authenticity/references/comment-patterns.md +45 -0
  10. thoughtleaders_cli-0.7.0/skills/channel-authenticity/references/peer-cohort.md +47 -0
  11. thoughtleaders_cli-0.7.0/skills/channel-authenticity/references/red-flags.md +77 -0
  12. thoughtleaders_cli-0.7.0/skills/channel-authenticity/references/scoring.md +96 -0
  13. thoughtleaders_cli-0.7.0/skills/channel-authenticity/scripts/_io_utf8.py +53 -0
  14. thoughtleaders_cli-0.7.0/skills/channel-authenticity/scripts/analyze_channel.py +199 -0
  15. thoughtleaders_cli-0.7.0/skills/channel-authenticity/scripts/anomaly_detector.py +213 -0
  16. thoughtleaders_cli-0.7.0/skills/channel-authenticity/scripts/comment_analyzer.py +386 -0
  17. thoughtleaders_cli-0.7.0/skills/channel-authenticity/scripts/comment_scraper.py +96 -0
  18. thoughtleaders_cli-0.7.0/skills/channel-authenticity/scripts/engagement_ratios.py +160 -0
  19. thoughtleaders_cli-0.7.0/skills/channel-authenticity/scripts/peer_cohort.py +192 -0
  20. thoughtleaders_cli-0.7.0/skills/channel-authenticity/scripts/report.py +138 -0
  21. thoughtleaders_cli-0.7.0/skills/channel-authenticity/scripts/resolve_channel.py +135 -0
  22. thoughtleaders_cli-0.7.0/skills/channel-authenticity/scripts/score.py +79 -0
  23. thoughtleaders_cli-0.7.0/skills/channel-authenticity/scripts/tl_cli.py +241 -0
  24. thoughtleaders_cli-0.7.0/skills/channel-authenticity/scripts/video_integrity.py +301 -0
  25. thoughtleaders_cli-0.7.0/skills/channel-authenticity/scripts/view_curves.py +135 -0
  26. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl/SKILL.md +4 -6
  27. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl/references/postgres-schema.md +3 -1
  28. thoughtleaders_cli-0.7.0/skills/tl-views-guarantee/SKILL.md +117 -0
  29. thoughtleaders_cli-0.7.0/skills/tl-views-guarantee/scripts/vg.py +371 -0
  30. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/__init__.py +1 -1
  31. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/brands.py +36 -27
  32. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/channels.py +18 -8
  33. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/doctor.py +9 -0
  34. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/.claude-plugin/marketplace.json +0 -0
  35. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/.github/workflows/python-publish.yml +0 -0
  36. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/.gitignore +0 -0
  37. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/API.md +0 -0
  38. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/CLAUDE.md +0 -0
  39. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/LICENSE +0 -0
  40. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/agents/tl-analyst.md +0 -0
  41. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/hooks/hooks.json +0 -0
  42. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/hooks/scripts/load-tl-skill.mjs +0 -0
  43. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/hooks/scripts/post-usage.sh +0 -0
  44. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/hooks/scripts/pre-check.sh +0 -0
  45. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl/references/business-glossary.md +0 -0
  46. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl/references/elasticsearch-schema.md +0 -0
  47. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl/references/firebolt-schema.md +0 -0
  48. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-import/SKILL.md +0 -0
  49. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-keyword-research/SKILL.md +0 -0
  50. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-keyword-research/scripts/probe.py +0 -0
  51. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/SKILL.md +0 -0
  52. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/examples/e2e_findings.md +0 -0
  53. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/examples/golden_queries.md +0 -0
  54. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/references/columns_brands.md +0 -0
  55. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/references/columns_channels.md +0 -0
  56. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/references/columns_content.md +0 -0
  57. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/references/columns_sponsorships.md +0 -0
  58. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/references/intelligence_filterset_schema.json +0 -0
  59. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/references/intelligence_widget_schema.json +0 -0
  60. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/references/report_glossary.md +0 -0
  61. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/references/sortable_columns.json +0 -0
  62. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/references/sponsorship_filterset_schema.json +0 -0
  63. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/references/sponsorship_widget_schema.json +0 -0
  64. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/references/widgets.md +0 -0
  65. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/tools/column_builder.md +0 -0
  66. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/tools/database_query.md +0 -0
  67. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/tools/name_resolver.md +0 -0
  68. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/tools/sample_judge.md +0 -0
  69. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/tools/similar_channels.md +0 -0
  70. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/tools/topic_matcher.md +0 -0
  71. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-report-builder/tools/widget_builder.md +0 -0
  72. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-save-report/SKILL.md +0 -0
  73. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-save-report/references/columns_brands.md +0 -0
  74. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-save-report/references/columns_channels.md +0 -0
  75. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-save-report/references/columns_content.md +0 -0
  76. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-save-report/references/columns_sponsorships.md +0 -0
  77. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-save-report/references/intelligence_filterset_schema.json +0 -0
  78. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-save-report/references/intelligence_widget_schema.json +0 -0
  79. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-save-report/references/report_glossary.md +0 -0
  80. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-save-report/references/sortable_columns.json +0 -0
  81. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-save-report/references/sponsorship_filterset_schema.json +0 -0
  82. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-save-report/references/sponsorship_widget_schema.json +0 -0
  83. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/skills/tl-save-report/references/widgets.md +0 -0
  84. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/_completions.py +0 -0
  85. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/auth/__init__.py +0 -0
  86. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/auth/commands.py +0 -0
  87. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/auth/finalize.py +0 -0
  88. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/auth/login.py +0 -0
  89. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/auth/pkce.py +0 -0
  90. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/auth/token_store.py +0 -0
  91. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/client/__init__.py +0 -0
  92. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/client/errors.py +0 -0
  93. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/client/http.py +0 -0
  94. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/__init__.py +0 -0
  95. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/_comments_common.py +0 -0
  96. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/balance.py +0 -0
  97. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/bulk_import.py +0 -0
  98. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/changelog.py +0 -0
  99. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/credits.py +0 -0
  100. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/db.py +0 -0
  101. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/deals.py +0 -0
  102. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/describe.py +0 -0
  103. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/matches.py +0 -0
  104. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/proposals.py +0 -0
  105. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/recommender.py +0 -0
  106. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/reports.py +0 -0
  107. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/schema.py +0 -0
  108. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/setup.py +0 -0
  109. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/snapshots.py +0 -0
  110. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/sponsorships.py +0 -0
  111. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/uploads.py +0 -0
  112. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/commands/whoami.py +0 -0
  113. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/config.py +0 -0
  114. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/filters.py +0 -0
  115. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/hints.py +0 -0
  116. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/main.py +0 -0
  117. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/output/__init__.py +0 -0
  118. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/output/formatter.py +0 -0
  119. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/src/tl_cli/self_update.py +0 -0
  120. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/tests/__init__.py +0 -0
  121. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/tests/test_auth.py +0 -0
  122. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/tests/test_describe.py +0 -0
  123. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/tests/test_filters.py +0 -0
  124. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/tests/test_http_auth.py +0 -0
  125. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/tests/test_output.py +0 -0
  126. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/tests/test_reports.py +0 -0
  127. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/tests/test_sponsorships.py +0 -0
  128. {thoughtleaders_cli-0.6.55 → thoughtleaders_cli-0.7.0}/uv.lock +0 -0
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "tl-cli",
3
- "version": "0.6.55",
3
+ "version": "0.7.0",
4
4
  "description": "ThoughtLeaders CLI — query sponsorship deals, channels, brands, uploads, and intelligence from the terminal",
5
5
  "author": {
6
6
  "name": "ThoughtLeaders",
@@ -1,6 +1,6 @@
1
1
  # Project Overview
2
2
 
3
- **tl-cli** is a Python CLI for querying ThoughtLeaders sponsorship data (sponsorships, channels, brands, uploads, snapshots, reports, recommender). Built with Typer + Rich + httpx. Designed as an "agent-first tool" — the CLI handles structured commands and output, while the user's AI agent (Claude) provides intelligence.
3
+ **tl-cli** is a Python CLI for querying ThoughtLeaders sponsorship data (sponsorships, channels, brands, uploads, snapshots, reports, recommender). Built with Typer + Rich + httpx. Designed as an "AI agent-first tool" — the CLI handles data commands and output, while the user's AI agent (Claude) provides decision making.
4
4
 
5
5
  # Architecture
6
6
 
@@ -11,25 +11,16 @@
11
11
  ## Command Pattern (all data commands follow this)
12
12
 
13
13
  Every data command in `src/tl_cli/commands/` uses explicit Typer subcommands:
14
- - `list` — list/search with `key:value` filters as positional args
15
14
  - `show` — detail view by ID
16
15
  - `history` — historical data list
17
16
  - `create` / `add` — create new records (where applicable)
18
17
 
19
18
  When adding a new data command, follow this pattern. See `sponsorships.py` for the reference implementation.
20
19
 
21
- `deals`, `matches`, and `proposals` are shortcut commands that delegate to sponsorships' `do_list`/`do_show`/`do_create` with a pre-set status filter. They reject explicit `status:` filters — users should use `tl sponsorships list` for finer-grained status filtering.
22
-
23
- `recommender` (`commands/recommender.py`) wraps the recommender API at `/api/cli/v1/recommender/*` — `tags` (free), `top-channels` / `top-profiles` / `top-brands`, `inspect-channel`, `inspect-brand`, `similar-to-profile` (all 25 credits flat, Intelligence-gated). The three `top-*` URLs share one server resolver; `top-brands` dedupes the underlying profile rows by brand. Channel→channel and brand→brand similarity stay on `tl channels similar` / `tl brands similar`. When updating the SKILL or examples, prefer steering category/topic discovery (e.g. "Cooking channels") to `tl recommender top-channels "<tag>"` rather than `WHERE content_category = <code>` SQL — the recommender is ranked, not equality-based. The underlying recommender code uses "element"/"field_name" terminology; the CLI/API layer renames these to "tag" at the boundary.
24
-
25
- ## Filter Parsing (`filters.py`)
26
-
27
- `parse_filters()` handles `key:value` and `key:"quoted value"` syntax. Returns `dict[str, str]` passed as query params. Date filter keys (listed in `DATE_FILTER_KEYS` — e.g. `since`, `created-at`, `created-at-start`, `publish-date-end`) accept keywords `today`, `yesterday`, `tomorrow`. Sponsorship date fields (`created-at`, `publish-date`, `purchase-date`, `send-date`) each expose three filter shapes: bare `<field>:<date>` matches within that date/period, and `<field>-start:` / `<field>-end:` give inclusive lower/upper bounds (both sides inclusive; partial dates expand to the whole period). Empty-string values result in `IS NULL` queries on the backend.
28
-
29
20
  ## Auth Flow (`auth/`)
30
21
 
31
22
  - **PKCE + Auth0**: Browser-based login with localhost callback server (`login.py`)
32
- - **Token Storage** (`token_store.py`): OS keyring primary, `~/.config/tl/credentials.json` fallback (0o600)
23
+ - **Token Storage** (`token_store.py`): OS keyring primary, `~/.config/tl/credentials.json` fallback (chmod 0o600)
33
24
  - **Env override**: `TL_API_KEY` env var takes priority over keyring (for CI)
34
25
  - **Auto-refresh**: `TLClient` refreshes expired tokens on 401
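The credential precedence listed in the Auth Flow bullets above (env var first, then keyring, then the file fallback) could be sketched roughly as follows. This is an illustration only, not the code in `token_store.py`: the function names, the injected `keyring_get` callable, and the `"token"` JSON key are all hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Callable, Mapping, Optional


def resolve_token(
    env: Mapping[str, str],
    keyring_get: Callable[[], Optional[str]],
    fallback_path: Path,
) -> Optional[str]:
    """Return the first credential found, in the documented priority order."""
    # 1. TL_API_KEY takes priority over everything else (useful in CI).
    api_key = env.get("TL_API_KEY")
    if api_key:
        return api_key
    # 2. OS keyring is the primary store (injected here so the sketch stays testable).
    stored = keyring_get()
    if stored:
        return stored
    # 3. File fallback; the "token" key is an assumption made for this sketch.
    if fallback_path.exists():
        return json.loads(fallback_path.read_text()).get("token")
    return None


def write_fallback(path: Path, token: str) -> None:
    """Write the fallback file readable and writable by the owner only (0o600)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump({"token": token}, handle)
    # chmod again in case the file already existed with looser permissions.
    os.chmod(path, 0o600)
```

Passing the environment and keyring lookup in as arguments keeps the ordering logic easy to exercise without touching a real keyring.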
35
26
 
@@ -52,6 +43,8 @@ TTY-aware: Rich tables in terminal, JSON when piped. Flags: `--json`, `--csv`, `
52
43
  The CLI integrates with AI coding agents via skills, commands, agents, and hooks.
53
44
 
54
45
  - **Claude Code** - `tl setup claude`
46
+ - **Gemini** - `tl setup gemini`
47
+ - **Codex** - `tl setup codex`
55
48
  - **OpenCode** - `tl setup opencode`
56
49
 
57
50
  This repo is also a Claude Code plugin, and can directly be installed as one.
@@ -60,8 +53,7 @@ This repo is also a Claude Code plugin, and can directly be installed as one.
60
53
 
61
54
  - **`tl`** — the main skill for querying ThoughtLeaders data. Default for any sponsorship / channel / brand / upload / report question.
62
55
  - **`tl-keyword-research`** — invoke whenever the user wants to find videos or channels by **content keywords** (topics, concepts, niches) that aren't covered by a curated recommender tag, OR to validate that a candidate channel's content actually touches a given topic. Returns `{operator, keywords:[{keyword,count}]}` from a ranked ES probe over `title` / `summary` / `transcript`; the caller then runs the actual content search with the surviving high-count terms. **Do not compose keyword sets by hand for `tl db es` content searches — delegate to this skill first.** See `skills/tl/SKILL.md` → *Channel & video discovery* for the four-path decision tree and when to use this vs the recommender / raw SQL.
63
- - **`tl-report-builder`** — invoke when the user wants to build, refine, or save a platform report (campaign config, FilterSet, columns, widgets). Multi-phase flow: routing schema + validation columns widgets.
64
- - **`tl-import`**, **`tl-save-report`**, **`adapt-tl-data`** — narrower workflows; the skill files document their own triggers.
56
+ - **`tl-import`**, **`tl-save-report`**, **`adapt-tl-data`**, **`tl-views-guarantee`** narrower workflows; the skill files document their own triggers.
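As a rough illustration of the `tl-keyword-research` result shape described in the bullets above (`{operator, keywords:[{keyword,count}]}`), a caller might keep only the higher-count terms before running the real content search. The `min_count` threshold and the function name are hypothetical, and the sample operator value used in any call is an assumption; the skill files remain the authority on how to pick surviving terms.

```python
from typing import Any, Dict, List


def surviving_keywords(probe: Dict[str, Any], min_count: int = 1) -> List[str]:
    """Return keywords with at least min_count hits, highest count first."""
    ranked = sorted(probe["keywords"], key=lambda item: item["count"], reverse=True)
    return [item["keyword"] for item in ranked if item["count"] >= min_count]
```

The returned list can then be combined with the probe's `operator` when composing the follow-up search.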
65
57
 
66
58
  ### Skill content boundaries
67
59
 
@@ -73,21 +65,6 @@ Skills under `skills/` are split into a `SKILL.md` and one or more `references/*
73
65
 
74
66
  When adding or updating skill content, place the fact in its single home and link from the others. Do not duplicate or "quick-recap" content across files — recaps are the highest drift surface.
75
67
 
76
- #### Anti-pattern: skill-local schema references
77
-
78
- When a dependent skill (e.g. `tl-report-builder`) needs to reference a schema fact (table layout, columns, fetch SQL, hallucinated-column markers), **link to the canonical home in `skills/tl/references/`** — do not create a new `<skill>/references/*.md` file that mirrors or paraphrases that content.
79
-
80
- Concrete regression marker: an earlier branch added `skills/tl-report-builder/references/data_plane.md` to consolidate the `thoughtleaders_topics` fetch query out of inline tool text. That had the right *shape* (don't restate schema in tool prose) but the wrong *home* — it forked schema facts into a parallel reference file that would silently drift from `skills/tl/references/postgres-schema.md`. The fix was to land the columns + fetch SQL + regression markers in `postgres-schema.md` (and upstream in `tl-data/references/postgres-schema.md`), delete the local file, and rewire the references via Markdown links.
81
-
82
- Rule of thumb: if you are about to write *"here's the SQL to query this table"* or *"these columns don't exist on this table"* anywhere outside `skills/tl/references/`, stop. Add the fact to the canonical reference, then link to the anchor. Same for business facts and the glossary.
83
-
84
- Skill-local `references/*.md` ARE appropriate when the content is **skill-shaped**, not schema-shaped:
85
- - Column metadata for a specific report type (sortable columns, formula templates) — `tl-report-builder/references/columns_*.md`
86
- - JSON schemas for tool-specific request/response shapes — `tl-report-builder/references/*_schema.json`
87
- - Disambiguation tables, defaults, and pitfall catalogues that exist only in this skill's flow — `tl-report-builder/references/report_glossary.md`
88
-
89
- If you are unsure whether a fact is schema-shaped or skill-shaped, ask: "would another TL skill (analyst, finance, mbn-outreach) ever need this fact?" If yes, it's schema/business-shaped — promote it to the canonical home.
90
-
91
68
  ## API Response Envelope
92
69
 
93
70
  All list endpoints return: `{ results, total, limit, offset, usage: { credits_charged, credit_rate, balance_remaining }, _breadcrumbs }`.
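A minimal sketch of walking a paginated list endpoint using the envelope fields named above. It assumes `offset` and `limit` behave as conventional paging parameters and that `total` counts all matching rows; neither is confirmed by this diff. The `fetch_page` callable is a hypothetical stand-in for whatever performs the request.

```python
from typing import Any, Callable, Dict, Iterator

Envelope = Dict[str, Any]


def iter_results(
    fetch_page: Callable[[int, int], Envelope],
    limit: int = 50,
) -> Iterator[Any]:
    """Yield every item across pages, stopping once `total` has been reached."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        results = page["results"]
        yield from results
        offset += len(results)
        # Stop on an empty page too, so an inconsistent total cannot loop forever.
        if not results or offset >= page["total"]:
            break
```

Each page also carries `usage.credits_charged`, so a caller that cares about spend could total that field per page rather than only reading the final `balance_remaining`.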
@@ -109,9 +86,11 @@ The version string is defined in three files and all three must be updated toget
109
86
  - `.claude-plugin/plugin.json` — `"version": "x.y.z"`
110
87
  - `src/tl_cli/__init__.py` — `__version__ = "x.y.z"`
111
88
 
112
- ## Important Constraint
89
+ ## Creating a release
113
90
 
114
- `tl snapshots video` requires `--channel` flag Firebolt queries without a channel partition are unbounded.
91
+ A "release" means using the `gh` command to create a release on GitHub, named like the current package version number.
92
+
93
+ Warn the user if they are creating a release and the latest commit didn't bump the version number, and ask for confirmation before releasing.
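The version and release rules above could be checked mechanically along these lines. This is a hedged sketch, not part of the package: it covers only the two version files visible in this hunk (`.claude-plugin/plugin.json` and `src/tl_cli/__init__.py`), takes file contents as strings, and all function names are hypothetical.

```python
import json
import re


def plugin_version(plugin_json_text: str) -> str:
    """Read the "version" field from plugin.json content."""
    return json.loads(plugin_json_text)["version"]


def init_version(init_py_text: str) -> str:
    """Read the __version__ assignment from __init__.py content."""
    match = re.search(
        r'^__version__\s*=\s*["\']([^"\']+)["\']', init_py_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def versions_agree(*versions: str) -> bool:
    """True when every version string passed in is identical."""
    return len(set(versions)) == 1


def needs_release_confirmation(previous_version: str, current_version: str) -> bool:
    """True when the latest commit did not bump the version, so the user should confirm."""
    return previous_version == current_version
```

How the previous version is obtained (for example, from the parent commit) is left to the caller.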
115
94
 
116
95
  ## Coding
117
96
 
@@ -119,14 +98,19 @@ The version string is defined in three files and all three must be updated toget
119
98
  * Do not let server implementation details into skill files (anything under `skills/`). Skills describe *what the CLI does* from the user's seat — observable command surface, inputs, outputs, examples. Do not say "the server enforces X", "the API validates Y on its side", "the backend rejects Z" — those are mechanism notes that drift the moment the server changes. State the user-visible behaviour ("unknown keys come back as 400") without naming where it's enforced.
120
99
  * **All `import` and `from X import Y` statements live at the top of the Python module file** — after the module docstring, before any code. No inline imports inside function bodies, no lazy imports for "speed" or "optional dependency" reasons. `from __future__ import …` goes at the very top (Python requires that). The only legitimate inline-import exception is **platform-conditional imports** that cannot succeed on the other platform (e.g. `import msvcrt` on Linux, `import termios`/`tty` on Windows) — those stay inside their `if sys.platform == …:` guard. If a circular-import problem makes a top-level import impossible, fix the circular dependency rather than working around it with an inline import.
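For illustration, the import rule above might look like this in a module: every import sits at the top, and the only conditional ones are the platform-specific modules behind a `sys.platform` guard. The module content and the `key_reader_backend` helper are hypothetical and exist only to show the layout.

```python
"""Hypothetical module showing the import layout described above."""
from __future__ import annotations  # must come first, right after the docstring

import sys

# Platform-conditional imports stay inside their guard: each branch imports
# modules that do not exist on the other platform.
if sys.platform == "win32":
    import msvcrt
else:
    import termios
    import tty


def key_reader_backend() -> str:
    """Name the module that would serve raw key reads on this platform."""
    if sys.platform == "win32":
        return msvcrt.__name__
    return termios.__name__
```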
121
100
 
101
+ # Updating
102
+
103
+ The `tl update --force` command will force an update of the `thoughtleaders-cli` package.
104
+ The auto-update feature keeps the package updated, by checking (cached) on each command invocation.
105
+
122
106
  # Git commit rules
123
107
 
124
108
  Do not reference internal architecture of the ThoughtLeaders app in commit messages.
125
109
 
126
- When a feature is purely server-side but changes the data the CLI receives (e.g. adding, removing, or renaming a field on a response, changing a credit rate, expanding an enum), make a forced empty commit on the tl-cli repo (`git commit --allow-empty`) describing the change. This keeps the CLI repo's history a complete log of what users see, even when no client code had to change.
110
+ When a feature is purely server-side but changes the data the CLI receives (e.g. adding, removing, or renaming a field on a response, changing a credit rate, expanding an enum), make a forced empty commit on the tl-cli repo (`git commit --allow-empty`) describing the change. This keeps the CLI repo's history a complete log of what users see, even when no client code had to change. The `tl changelog` command will read this log to show to the users.
127
111
 
128
112
  # Be aware of tests
129
113
 
130
- For every feature or change, explicitly consider whether tests need to be added or updated — new endpoint, new model field, new CLI command, new validation rule, new error path, anything that changes user-visible behaviour. Don't ship a feature without asking "what test covers this?" If no test does and the surface is non-trivial, write one. This applies across all repos involved in the change (server-side changes that ripple into the CLI need both server tests and CLI tests updated).
114
+ For every feature or change, explicitly consider whether tests need to be added or updated, on this repo or on the server repo — new endpoint, new model field, new CLI command, new validation rule, new error path, anything that changes user-visible behaviour. Don't ship a feature without asking "what test covers this?" If no test does and the surface is non-trivial, write one. This applies across all repos involved in the change (server-side changes that ripple into the CLI need both server tests and CLI tests updated).
131
115
 
132
- Be sure to check if tests need to be updated when changing any data structures or function names, in all repos involved in the change.
116
+ Be sure to check if tests need to be updated when changing any data structures or function names, in all repos involved in the change
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: thoughtleaders-cli
3
- Version: 0.6.55
3
+ Version: 0.7.0
4
4
  Summary: ThoughtLeaders CLI — query sponsorship data, channels, brands, and intelligence
5
5
  Project-URL: Homepage, https://thoughtleaders.io
6
6
  Project-URL: Repository, https://github.com/ThoughtLeaders-io/thoughtleaders-cli
@@ -37,7 +37,7 @@ ThoughtLeaders CLI — query sponsorship data, channels, brands, and intelligenc
37
37
  ### For account managers and sales
38
38
 
39
39
  - **Pipeline reporting on the fly.** *"How many deals did we close in Q1?"*, *"What's my weighted pipeline by sales owner?"*, *"Which proposals are stuck in `pending` for more than 14 days?"* — one raw SQL or one structured command, instead of waiting on a dashboard.
40
- - **Brand intelligence in seconds.** *"What channels does Nike sponsor?"*, *"Which brands sponsor `MrBeast`?"*, *"What's Holafly's sponsorship history through us vs. through everyone?"* — answers are one `tl brands history` or one `tl db es` call away.
40
+ - **Brand intelligence in seconds.** *"What channels does Nike sponsor?"*, *"Which brands sponsor `MrBeast`?"*, *"What's Holafly's sponsorship history through us vs. through everyone?"* — answers are one `tl db es` call away.
41
41
  - **Vetting candidates before a pitch.** Look up a channel by ID, name, YouTube URL, or `@handle`; pull its adspots, audience demographics, evergreenness score, and detected sponsor history before drafting the IO.
42
42
  - **Pre-flight before booking.** Confirm MSN/TPP membership, integration availability, and persona/plan eligibility for a brand profile with one SQL join.
43
43
 
@@ -66,6 +66,7 @@ ThoughtLeaders CLI — query sponsorship data, channels, brands, and intelligenc
66
66
  - [jq](https://stedolan.github.io/jq/)
67
67
  - [ripgrep](https://github.com/BurntSushi/ripgrep)
68
68
  - [duckdb](https://duckdb.org/)
69
+ - [yt-dlp](https://github.com/yt-dlp/yt-dlp)
69
70
 
70
71
  For automated installs on MacOS, prefer installing Homebrew and use the pipx package manager, but ask the user if they have the admin access (sudo) password first. If not, proceed by using `uv` and installing everything locally.
71
72
 
@@ -170,8 +171,7 @@ tl recommender brands-for-channel 12345 # Brands most likely to sponsor
170
171
 
171
172
  # Brand intelligence
172
173
  tl brands show Nike
173
- tl brands history Nike # Detected sponsorships from ES
174
- tl brands history-stats Nike # Aggregate roll-up (totals, first/last seen, top channels)
174
+ tl brands find Nike # Resolve a string → single brand id
175
175
 
176
176
  # Search videos and transcripts via Elasticsearch
177
177
  tl db es '{"size":20,"query":{"term":{"channel.id":12345}},"_source":["title","views"]}'
@@ -196,7 +196,7 @@ tl balance
196
196
 
197
197
  # Health check — auth, connectivity, version, latency, and required external tools.
198
198
  # Run this first when something feels off; it surfaces token expiry,
199
- # missing `jq`/`rg`/`duckdb`, and slow endpoints in one snapshot.
199
+ # missing `jq`/`rg`/`duckdb`/`yt-dlp`, and slow endpoints in one snapshot.
200
200
  tl doctor
201
201
  ```
202
202
 
@@ -276,9 +276,10 @@ Each agent discovers the skill automatically and uses it when you ask about spon
276
276
 
277
277
  The plugin ships several focused skills (installed by all the `tl setup *` commands):
278
278
 
279
- - **`tl`** — the data-analyst skill. Defaults to raw database queries via `tl db pg|fb|es` for anything non-trivial; uses the structured `tl <resource> show` / `find` / `similar` / `history` commands for single-record lookups and the special cases they were built for (similarity search, ID resolution, sponsorship history). Comes with full schema references for Postgres, Elasticsearch, and Firebolt under `references/`.
279
+ - **`tl`** — the data-analyst skill. Defaults to raw database queries via `tl db pg|fb|es` for anything non-trivial; uses the structured `tl <resource> show` / `find` / `similar` commands for single-record lookups and similarity / ID-resolution special cases. Comes with full schema references for Postgres, Elasticsearch, and Firebolt under `references/`.
280
280
  - **`tl-report-builder`** — builds TL reports (channels / brands / sponsorships / videos) from natural-language requests. Produces an in-chat preview by default; saves a real campaign when the user is explicit ("save", "create the report").
281
281
  - **`tl-import`** / **`bulk-import`** — superuser-only; bulk-add or exclude lists of channels, brands, videos, or sponsorships against a report.
282
+ - **`tl-views-guarantee`** — sizes a multi-video sponsorship buy for a channel, returning the video bundle size, views guarantee, and likelihood to hit.
282
283
 
283
284
  ## Output Formats
284
285
 
@@ -9,7 +9,7 @@ ThoughtLeaders CLI — query sponsorship data, channels, brands, and intelligenc
9
9
  ### For account managers and sales
10
10
 
11
11
  - **Pipeline reporting on the fly.** *"How many deals did we close in Q1?"*, *"What's my weighted pipeline by sales owner?"*, *"Which proposals are stuck in `pending` for more than 14 days?"* — one raw SQL or one structured command, instead of waiting on a dashboard.
12
- - **Brand intelligence in seconds.** *"What channels does Nike sponsor?"*, *"Which brands sponsor `MrBeast`?"*, *"What's Holafly's sponsorship history through us vs. through everyone?"* — answers are one `tl brands history` or one `tl db es` call away.
12
+ - **Brand intelligence in seconds.** *"What channels does Nike sponsor?"*, *"Which brands sponsor `MrBeast`?"*, *"What's Holafly's sponsorship history through us vs. through everyone?"* — answers are one `tl db es` call away.
13
13
  - **Vetting candidates before a pitch.** Look up a channel by ID, name, YouTube URL, or `@handle`; pull its adspots, audience demographics, evergreenness score, and detected sponsor history before drafting the IO.
14
14
  - **Pre-flight before booking.** Confirm MSN/TPP membership, integration availability, and persona/plan eligibility for a brand profile with one SQL join.
15
15
 
@@ -38,6 +38,7 @@ ThoughtLeaders CLI — query sponsorship data, channels, brands, and intelligenc
38
38
  - [jq](https://stedolan.github.io/jq/)
39
39
  - [ripgrep](https://github.com/BurntSushi/ripgrep)
40
40
  - [duckdb](https://duckdb.org/)
41
+ - [yt-dlp](https://github.com/yt-dlp/yt-dlp)
41
42
 
42
43
  For automated installs on macOS, prefer installing Homebrew and using the pipx package manager, but first ask the user whether they have the admin (sudo) password. If not, proceed by using `uv` and installing everything locally.
43
44
 
@@ -142,8 +143,7 @@ tl recommender brands-for-channel 12345 # Brands most likely to sponsor
142
143
 
143
144
  # Brand intelligence
144
145
  tl brands show Nike
145
- tl brands history Nike # Detected sponsorships from ES
146
- tl brands history-stats Nike # Aggregate roll-up (totals, first/last seen, top channels)
146
+ tl brands find Nike # Resolve a string → single brand id
147
147
 
148
148
  # Search videos and transcripts via Elasticsearch
149
149
  tl db es '{"size":20,"query":{"term":{"channel.id":12345}},"_source":["title","views"]}'
@@ -168,7 +168,7 @@ tl balance
168
168
 
169
169
  # Health check — auth, connectivity, version, latency, and required external tools.
170
170
  # Run this first when something feels off; it surfaces token expiry,
171
- # missing `jq`/`rg`/`duckdb`, and slow endpoints in one snapshot.
171
+ # missing `jq`/`rg`/`duckdb`/`yt-dlp`, and slow endpoints in one snapshot.
172
172
  tl doctor
173
173
  ```
174
174
 
@@ -248,9 +248,10 @@ Each agent discovers the skill automatically and uses it when you ask about spon
248
248
 
249
249
  The plugin ships several focused skills (installed by all the `tl setup *` commands):
250
250
 
251
- - **`tl`** — the data-analyst skill. Defaults to raw database queries via `tl db pg|fb|es` for anything non-trivial; uses the structured `tl <resource> show` / `find` / `similar` / `history` commands for single-record lookups and the special cases they were built for (similarity search, ID resolution, sponsorship history). Comes with full schema references for Postgres, Elasticsearch, and Firebolt under `references/`.
251
+ - **`tl`** — the data-analyst skill. Defaults to raw database queries via `tl db pg|fb|es` for anything non-trivial; uses the structured `tl <resource> show` / `find` / `similar` commands for single-record lookups and similarity / ID-resolution special cases. Comes with full schema references for Postgres, Elasticsearch, and Firebolt under `references/`.
252
252
  - **`tl-report-builder`** — builds TL reports (channels / brands / sponsorships / videos) from natural-language requests. Produces an in-chat preview by default; saves a real campaign when the user is explicit ("save", "create the report").
253
253
  - **`tl-import`** / **`bulk-import`** — superuser-only; bulk-add or exclude lists of channels, brands, videos, or sponsorships against a report.
254
+ - **`tl-views-guarantee`** — sizes a multi-video sponsorship buy for a channel, returning the video bundle size, views guarantee, and likelihood to hit.
254
255
 
255
256
  ## Output Formats
256
257
 
@@ -0,0 +1,50 @@
1
+ ---
2
+ name: youtube-comment-classifier
3
+ description: >
4
+ Classifies a batch of YouTube comments as organic vs bot/spam/template for
5
+ the channel-authenticity skill's fake-engagement detection. Use when you
6
+ have a JSON array of scraped comments and need a fast, cheap per-comment
7
+ authenticity judgment. Returns strict JSON only.
8
+ model: haiku
9
+ tools: Read
10
+ color: yellow
11
+ ---
12
+
13
+ # YouTube Comment Authenticity Classifier
14
+
15
+ You judge whether YouTube comments come from a real, engaged human audience or
16
+ from engagement padding (bots, comment farms, generic filler). You are used by
17
+ the `channel-authenticity` skill to vet channels before ThoughtLeaders books a
18
+ paid sponsorship, so false "organic" verdicts cost real money — be skeptical.
19
+
20
+ ## Input
21
+
22
+ A JSON array of objects: `[{"i": <int>, "text": "<comment>", "author": "<handle>"}, ...]`
23
+ The user message contains ONLY this array (possibly large). Channel context
24
+ (niche/language) may be provided in a leading line — use it if present.
25
+
26
+ ## Labels (choose exactly one per comment)
27
+
28
+ - **organic** — specific, on-topic, references the actual video/creator,
29
+ asks a real question, shares a relevant experience, natural language with
30
+ normal variation. Mild praise that names something specific counts.
31
+ - **generic-template** — vague praise that could be pasted on any video:
32
+ "nice video", "great content", "thanks for sharing", "first", lone emoji
33
+ strings, "love it ❤️". On-language but contentless.
34
+ - **bot-like** — off-topic, off-language for the channel, gibberish,
35
+ random-looking handle + 1–3 word body, repeated near-identical phrasing,
36
+ engagement bait.
37
+ - **promotional** — self-promo, "check out my channel", links, services.
38
+ - **spam** — scams, adult/crypto bait, malicious or nonsensical repetition.
39
+
40
+ When torn between organic and generic-template, prefer generic-template
41
+ unless the comment clearly engages with the specific video.
42
+
43
+ ## Output — STRICT
44
+
45
+ Return ONLY a JSON array, no prose, no markdown fence:
46
+
47
+ `[{"i": 0, "label": "organic"}, {"i": 1, "label": "bot-like"}, ...]`
48
+
49
+ One object per input comment, same `i` values, same length. No extra keys.
50
+ If the input is empty, return `[]`.
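A caller may want to check a reply against the output contract above before trusting it. The following is a minimal sketch, not code from the package: `validate_reply` and `ALLOWED_LABELS` are hypothetical names, and only the label set and the "same `i` values, same length, no extra keys" rules come from the agent definition.

```python
import json

# The five labels listed in the agent definition.
ALLOWED_LABELS = {"organic", "generic-template", "bot-like", "promotional", "spam"}


def validate_reply(batch, reply_text):
    """Parse a classifier reply and raise ValueError if it breaks the contract.

    `batch` is the input array of {"i", "text", "author"} objects;
    `reply_text` is the raw string returned by the subagent.
    """
    # json.loads raises a ValueError subclass on prose or a markdown fence.
    labels = json.loads(reply_text)
    if not isinstance(labels, list):
        raise ValueError("reply must be a JSON array")
    if len(labels) != len(batch):
        raise ValueError("reply length differs from input length")

    seen = set()
    for item in labels:
        if not isinstance(item, dict) or set(item) != {"i", "label"}:
            raise ValueError("each item needs exactly the keys 'i' and 'label'")
        if item["label"] not in ALLOWED_LABELS:
            raise ValueError(f"unknown label: {item['label']!r}")
        seen.add(item["i"])

    if seen != {comment["i"] for comment in batch}:
        raise ValueError("reply 'i' values do not match the input")
    return labels
```

Rejecting a malformed reply and re-running the subagent is one possible policy; the package itself may handle this differently.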
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
4
4
 
5
5
  [project]
6
6
  name = "thoughtleaders-cli"
7
- version = "0.6.55"
7
+ version = "0.7.0"
8
8
  description = "ThoughtLeaders CLI — query sponsorship data, channels, brands, and intelligence"
9
9
  readme = "README.md"
10
10
  license = "MIT"
@@ -0,0 +1,2 @@
1
+ # Generated at runtime by peer_cohort.py — niche engagement medians, cached per run.
2
+ references/peer-cohort-cache.json
@@ -0,0 +1,127 @@
1
+ ---
2
+ name: channel-authenticity
3
+ description: >
4
+ Detect non-organic views / fake engagement / bot comments on a YouTube
5
+ channel before booking (or after delivering) a sponsorship. Use when asked
6
+ to vet a channel, check if views/comments are real, investigate suspicious
7
+ engagement, audit a sponsorship delivery, or whenever someone shares a
8
+ YouTube channel/handle/URL and asks "is this real / safe to buy an ad on".
9
+ Triggers: "fake views", "bot comments", "non-organic", "is this channel
10
+ legit", "vet this channel", "engagement looks off", "audit this sponsorship".
11
+ ---
12
+
13
+ # Channel Authenticity
14
+
15
+ Takes a channel (handle / URL / numeric id / name) — or `adlink:<id>` for a
16
+ sponsorship drill-down — and returns a 0–100 authenticity score plus ranked
17
+ red-flag findings. Built and calibrated from real bought-view and comment-farm
18
+ investigations.
19
+
20
+ ## Hard rules
21
+
22
+ - **One mode. Every run does everything.** No flags, no opt-in tiers. Groups
23
+ A, B, and C all run, every time.
24
+ - **Comment scraping (Group C) is mandatory and never skipped.** Metrics and
25
+ view-curves can be hand-waved ("the algorithm", "we ran ads"); reading what
26
+ the audience actually says is the only direct proof. A run without it is
27
+ invalid.
28
+ - **Data access is CLI-only.** Everything goes through `tl_cli.py` and the
29
+ `tl` CLI (`tl db pg/fb/es`, `tl channels similar`).
30
+ - Specify the `utf-8` encoding explicitly for all data processing in every
31
+ script you create.
32
+
33
+ ## Setup check
34
+
35
+ ```bash
36
+ cd .claude/skills/channel-authenticity/scripts
37
+ python3 tl_cli.py preflight # must print "OK"
38
+ ```
39
+ If this errors with `cli_unavailable`, tell the user to run `tl auth login`
40
+ (or set `TL_API_KEY`). Comment scraping additionally needs `yt-dlp`
41
+ (`pip install yt-dlp`); it uses the Android InnerTube client, so **no cookies
42
+ or API key are required**.
43
+
44
+ ## How to run (three phases — a classifier subagent sits between two CLI passes)
45
+
46
+ **Phase 1 — collect.** From the `scripts/` dir:
47
+ ```bash
48
+ python3 analyze_channel.py "<handle|url|id|name|adlink:ID>"
49
+ ```
50
+ This runs Groups A + B + C (rule-based), scrapes the ≥10 latest longforms
51
+ (+ highest-view + most-recently-sponsored), and prints a JSON envelope with
52
+ `state_path`, `llm_batch_path`, and `llm_batch_size`.
53
+
54
+ If the ref matches **multiple channels** (common for names with localized
55
+ dupes), Phase 1 exits (code 4) with `{"error":"ambiguous_channel",
56
+ "candidates":[{id,name,subscribers}…]}` instead of guessing. Show the
57
+ candidates to the user — they're ordered by subscriber count, highest first
58
+ (the most likely intended) — let them pick, then re-run Phase 1 with that
59
+ numeric id.
60
+
61
+ **Phase 2 — classify comments (run the subagent TWICE).** Read
62
+ `llm_batch_path` (a JSON array of `{i, text, author}`) and send it to the
63
+ `youtube-comment-classifier` agent via the **Agent tool**
64
+ (`subagent_type: youtube-comment-classifier`) **twice** — two separate calls on
65
+ the same batch. Prepend one context line: `channel niche: cat
66
+ <content_category>, language <language>` (both values are in the envelope).
67
+ Each call returns a strict JSON array
68
+ `[{"i":N,"label":"organic|generic-template|bot-like|promotional|spam"}]`; save
69
+ each reply verbatim to its own file (e.g. `/tmp/ca_llm1.json`,
70
+ `/tmp/ca_llm2.json`).
71
+
72
+ Why twice: single-pass LLM labeling wobbles by ±10 pts, so the finalize step majority-votes
73
+ the two passes to keep the reported organic share stable. Sophisticated
74
+ AI-comment farms read as clean English at normal volume — only the classifier
75
+ catches them, so this pass is essential.
76
+
77
+ If the batch is empty (channel had almost no comments), skip the subagent and
78
+ pass an empty array `[]` — near-zero comments is itself the loudest signal,
79
+ and Group C already penalizes it.
80
+
81
+ **Phase 3 — finalize** (pass both classifier files):
82
+ ```bash
83
+ python3 analyze_channel.py --finalize <state_path> /tmp/ca_llm1.json /tmp/ca_llm2.json
84
+ ```
85
+ This applies the LLM verdict, computes the composite score, writes the final
86
+ JSON + markdown report to `/tmp`, and prints the report. Present that report
87
+ to the user (it's already formatted — peer comparison, group scores, ranked
88
+ flags, verdict).
89
+
90
+ ## Scoring (see references/scoring.md)
91
+
92
+ Three groups, each scored 0–100 independently (start at 100, subtract fixed
93
+ per-flag penalties). **Final = simple mean of the three.** Two hard
94
+ overrides force `FRAUD_LIKELY` (score capped at 39) regardless of the mean:
95
+ (1) Group C — non-organic audience (<30% organic from the classifier, or a
96
+ dead comment section); (2) Group B — concealed/misrepresented performance
97
+ (≥2 sold+published sponsored videos deleted/unlisted, or one with ≥5k views;
98
+ or ≥3 high-view videos scrubbed with ≥15% of tracked views gone).
99
+
100
+ Bands: ≥90 CLEAN · ≥70 MINOR_FLAGS · ≥40 MIXED · <40 FRAUD_LIKELY.
101
+
102
+ ## What each group checks
103
+
104
+ - **Group A — engagement & peer ratios** (`engagement_ratios.py`,
105
+ `peer_cohort.py`): like/comment rates measured against a niche-matched peer
106
+ baseline, plus audience-size sanity checks across longforms vs shorts.
107
+ - **Group B — view-curve anomalies + video integrity** (`view_curves.py`,
108
+ `anomaly_detector.py`, `video_integrity.py`): view-over-time curves that
109
+ don't behave like organic growth (bursts without engagement, guarantee
110
+ cliffs at round numbers, frozen likes, subs flat while views surge), plus
111
+ intent-aware detection of deleted/unlisted videos used to conceal or
112
+ misrepresent performance (benign re-uploads are excluded).
113
+ - **Group C — comment content** (`comment_scraper.py`, `comment_analyzer.py`
114
+ + classifier subagent): whether the comments are a real, engaged audience —
115
+ scarcity vs views, templating and near-duplicates, language mismatch,
116
+ bot-handle patterns, and the classifier's organic-share verdict.
117
+
118
+ Full catalogue + thresholds: `references/red-flags.md`. The exact `tl` queries
119
+ each check issues live in the scripts; the underlying channel/video/adlink
120
+ schema is documented in the `tl` skill (`skills/tl/references/`).
121
+
122
+ ## After a run
123
+
124
+ Offer to log the verdict (channel, score, top flags, date) to a "Channel
125
+ Vetting Log" sheet via the `gws` skill if the user wants an audit trail.
126
+ If you discover a new robust signal, add it to `references/red-flags.md` and
127
+ a penalty to `references/scoring.md` (self-improvement).
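The scoring rules in this skill file (per-group penalties, simple mean, hard-override cap, bands) can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's scoring code: `group_score` and `composite` are hypothetical names, and clamping a group score at 0 is an assumption the text does not state.

```python
# Band floors from the "Bands" line: >=90 CLEAN, >=70 MINOR_FLAGS, >=40 MIXED.
BANDS = [(90, "CLEAN"), (70, "MINOR_FLAGS"), (40, "MIXED")]


def group_score(penalties):
    """Start at 100 and subtract fixed per-flag penalties.

    Clamping at 0 is an assumption; the source only says "start at 100,
    subtract fixed per-flag penalties".
    """
    return max(0, 100 - sum(penalties))


def composite(group_a, group_b, group_c, hard_override=False):
    """Return (final_score, verdict) for three group scores.

    The final score is the simple mean of the three groups. When a hard
    override fires, the score is capped at 39, which always lands in the
    FRAUD_LIKELY band.
    """
    final = (group_a + group_b + group_c) / 3
    if hard_override:
        final = min(final, 39)
    for floor, verdict in BANDS:
        if final >= floor:
            return final, verdict
    return final, "FRAUD_LIKELY"
```

For example, `composite(100, 100, 40, hard_override=True)` would report `FRAUD_LIKELY` even though the plain mean is 80, which is the point of the override.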
@@ -0,0 +1,45 @@
1
+ # Comment patterns
2
+
3
+ The generic-template phrase library and handle regexes used by
4
+ `comment_analyzer.py`. Extend as new padding patterns show up; keep the code
5
+ list (`GENERIC` in `comment_analyzer.py`) and this doc in sync.
6
+
7
+ ## Generic-template phrases (case-insensitive substring/exact)
8
+
9
+ ```
10
+ nice video, great video, great content, thanks for sharing, first,
11
+ love this, love it, keep it up, keep going, awesome, amazing, good job,
12
+ well done, very nice, so good, best video, informative, helpful,
13
+ thank you so much, wow, super, 👍, 🔥, ❤, great work, nice one,
14
+ good video, very helpful, excellent
15
+ ```
16
+
17
+ A comment counts as generic if its lowercased, punctuation-stripped form is
18
+ exactly one of these OR is ≤25 chars and contains one. Lone emoji strings are
19
+ caught separately by the emoji-only check.
20
+
21
+ ## Bot-handle regex
22
+
23
+ - `^@?[a-z]+[-_]?[0-9]{4,}$` — letters then 4+ digits (YouTube
24
+ auto-suffix style, e.g. `@viewer8821`, `@johndoe_4417`). High-signal in bulk.
25
+
26
+ > Note: YouTube now appends short suffixes to many *real* handles too, so
27
+ > bot-handle share is a **supporting** signal (penalty 15), never decisive on
28
+ > its own. The decisive comment signals are scarcity and LLM-not-organic.
29
+
30
+ ## Language
31
+
32
+ Channel `language == 'en'`: a comment "matches" if ≥60% of its alphabetic
33
+ chars are ASCII letters. Emoji/number-only comments are excluded from the
34
+ denominator (handled by emoji-only / length checks instead). For non-English
35
+ channels the language check is skipped (we lack reliable per-language
36
+ baselines — revisit if we onboard many non-en channels).
37
+
38
+ ## What good looks like (contrast)
39
+
40
+ Real audiences on a tech channel reference specifics ("the 72→82 jump
41
+ convinced me", "where is part 1 and 2a?"), ask operational questions, argue,
42
+ and reply to each other. Padding is short, vague, off-language, emoji-heavy,
43
+ or planted product mentions ("X was built specifically for…", "signed up just
44
+ now with the launch code"). The Haiku classifier exists to catch the planted-
45
+ promotional class that keyword rules miss.
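The three rule-based checks described in this reference (generic-template matching, the bot-handle regex, and the English-language ratio) could look roughly like the sketch below. It is not the contents of `comment_analyzer.py`: the function names are hypothetical, `GENERIC` holds only a subset of the phrase library, and stripping ASCII punctuation via `string.punctuation` is an assumed normalization.

```python
import re
import string

# Subset of the phrase library above, for illustration only.
GENERIC = {
    "nice video", "great video", "great content", "thanks for sharing",
    "first", "love it", "keep it up", "awesome", "wow", "good job",
}

# Bot-handle pattern exactly as documented (lowercase letters, optional
# separator, then 4+ digits). Whether the real script ignores case is unknown.
BOT_HANDLE = re.compile(r"^@?[a-z]+[-_]?[0-9]{4,}$")

_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def is_generic(text):
    """True if the normalized comment is a library phrase, or is short
    (<=25 chars) and contains one."""
    norm = text.lower().translate(_STRIP_PUNCT).strip()
    if norm in GENERIC:
        return True
    return len(norm) <= 25 and any(phrase in norm for phrase in GENERIC)


def is_bot_handle(handle):
    """True if the handle matches the documented letters-then-digits pattern."""
    return BOT_HANDLE.match(handle) is not None


def matches_english(text):
    """True if >=60% of alphabetic chars are ASCII letters.

    Returns None for comments with no alphabetic chars, so they can be
    excluded from the denominator as the doc describes.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return None
    ascii_count = sum(1 for ch in letters if ch.isascii())
    return ascii_count / len(letters) >= 0.60
```

Note that the documented pattern allows a separator only directly before the digits, so a handle with letters after the separator would not match.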
@@ -0,0 +1,47 @@
1
+ # Peer cohort
2
+
3
+ Group A's like:view / comment:view thresholds are **relative to a
4
+ niche-matched peer baseline**, not absolute — engagement norms vary wildly by
5
+ niche (gaming ≠ finance ≠ tech tutorials), so a fixed cutoff would
6
+ false-flag low-engagement-but-honest niches and miss high-engagement niches
7
+ being inflated.
8
+
9
+ ## How the cohort is built (`peer_cohort.py`)
10
+
11
+ 1. **Preferred:** `tl channels similar <id> --limit 24` (the recommender).
12
+ Best niche match.
13
+ 2. **Fallback** (recommender empty): PG cohort —
14
+ same `content_category` + `language`, `is_active`, `reach` within ±50%,
15
+ `last_published` within 60 days, excluding the subject channel.
16
+ 3. For up to 12 peers, pull each peer's last 10 longforms via `tl db es`,
17
+ require ≥5,000 aggregate views and ≥3 videos (skip dead peers).
18
+ 4. Baseline = **median** of peers' like-rate and comment-rate, plus the 25th
19
+ percentile for context.
20
+
21
+ A subject channel flags when its longform rate is **< 0.4× the peer median**.
22
+ 0.4× is intentionally generous — we only fire on gross deviation, not normal
23
+ variance. (The origin fraud case ran 0.008× the median; real channels cluster
24
+ 0.7–1.5×.)
25
+
26
+ ## Caching
27
+
28
+ Result cached in `peer-cohort-cache.json` keyed by
29
+ `content_category|language|reach_bucket`, TTL 30 days. Buckets:
30
+ `<10k, 10-50k, 50-150k, 150-500k, 500k-1m, 1-5m, 5m+`. This avoids re-spending
31
+ recommender credits and re-querying ES on every run. Force a rebuild by
32
+ deleting the cache file or calling `get_baseline(ch, refresh=True)`.
33
+
34
+ ## Last-resort fallback
35
+
36
+ If no usable peers exist at all (rare: niche too small / all peers dead), a generic
37
+ English-tech floor is used (`like 2%, comment 0.25%`) and `source` is recorded
38
+ as `fallback-generic` in the metrics so the report consumer knows the
39
+ baseline was weak. Prefer widening the reach band over trusting this.
40
+
41
+ ## Caveats
42
+
43
+ - The PG fallback uses `content_category` which is coarse; the recommender
44
+ (`tl channels similar`) is materially better — prefer it.
45
+ - Reach buckets are wide on purpose (engagement scales sub-linearly with
46
+ size); don't narrow them without re-checking the false-positive rate on a
47
+ known-clean channel.
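The baseline, the 0.4× trigger, and the cache key described in this reference can be sketched as below. This is a rough illustration, not `peer_cohort.py`: every function name is hypothetical, the treatment of exact bucket boundaries is an assumption, and the percentile method (the default of `statistics.quantiles`) may differ from what the script uses.

```python
from statistics import median, quantiles

# Upper bounds for the documented reach buckets; anything above is "5m+".
# Which side an exact boundary (e.g. 10,000) falls on is an assumption.
REACH_BUCKETS = [
    (10_000, "<10k"), (50_000, "10-50k"), (150_000, "50-150k"),
    (500_000, "150-500k"), (1_000_000, "500k-1m"), (5_000_000, "1-5m"),
]


def reach_bucket(reach):
    """Map a channel's reach to one of the documented bucket labels."""
    for upper, label in REACH_BUCKETS:
        if reach < upper:
            return label
    return "5m+"


def cache_key(content_category, language, reach):
    """Build the content_category|language|reach_bucket cache key."""
    return f"{content_category}|{language}|{reach_bucket(reach)}"


def baseline(peer_rates):
    """Median of the peers' rates, plus the 25th percentile for context.

    Needs at least two peers because statistics.quantiles requires them.
    """
    return {"median": median(peer_rates), "p25": quantiles(peer_rates, n=4)[0]}


def flags_vs_peers(subject_rate, peer_median, factor=0.4):
    """True when the subject's rate is below factor x the peer median."""
    return subject_rate < factor * peer_median
```

With the origin-case figures quoted in the red-flag catalogue (a 0.027% like rate against a 3.4% peer median), `flags_vs_peers(0.00027, 0.034)` fires, while a channel at 3% does not.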
@@ -0,0 +1,77 @@
1
+ # Red-flag catalogue
2
+
3
+ Every signal the skill checks, why it matters, and the threshold. Codes match
4
+ the `flags[].code` in the JSON output. Real cases are referenced by anonymized
5
+ label only.
6
+
7
+ ## Group A — engagement & peer ratios (`engagement_ratios.py`)
8
+
9
+ | code | trigger | why |
10
+ |---|---|---|
11
+ | `A_like_rate_vs_peers` | longform like:view < 0.4× peer-cohort median | Paid/bot views don't like. Origin case: 0.027% vs 3.4% peer median (125×). |
12
+ | `A_comment_rate_vs_peers` | longform comment:view < 0.4× peer median | Same logic for comments. Origin case: 78× below. |
13
+ | `A_views_to_subs` | avg longform views > 20% of subs | Healthy 1–15%. Origin case 28%. Implies non-subscriber/external traffic. |
14
+ | `A_longform_shorts_gap` | shorts like-rate ≥ 5× longform like-rate (shorts ≥0.3%) | Organic shorts + dead longforms ⇒ longforms are the promoted units. The smoking gun on the origin case (20×). |
15
+ | `A_organic_floor` | ≥ half of longforms exceed 5× the median-short view count | Non-viral shorts ≈ true audience size. Origin case median short = 688 views vs 180k longform. |
16
+ | `A_per_video_outliers` | ≥⅓ of longforms >1.5σ below the channel's own like:view mean | One real audience produces consistent ratios; promoted videos don't. |
17
+
18
+ Peer baseline: niche-matched (`tl channels similar`, fallback PG cohort:
19
+ same content_category+language, active, reach ±50%, published <60d), median
20
+ of each peer's last-10-longform like/comment rates. Cached 30 days.
21
+
22
+ ## Group B — view-curve time-series (`anomaly_detector.py`)
23
+
24
+ | code | trigger | why |
25
+ |---|---|---|
26
+ | `B_burst_without_engagement` | a Δ-segment with Δviews/day > 3× rolling mean, >5k views, and segment like-rate < ½ lifetime | Real virality brings likes; injected views don't. |
27
+ | `B_engagement_incoherence` | Pearson r(Δviews, Δlikes+Δcomments) < 0.2 over the curve | Organic videos: views and engagement move together (r>0.6). Fraud: decoupled. |
28
+ | `B_guarantee_cliff` | plateau within 5% of a round number (50k/100k/250k/500k/1M…) by age ≤60 then flat | A bought-view case: bought to a 500k guarantee, cliffed at 581k. |
29
+ | `B_slow_start_late_spike` | views@2 < 25th-pctile-ish (< 0.15× final) AND views@10/views@2 > 8 | Paid traffic switched on days after publish — classic bought-view signature. |
30
+ | `B_latelife_drip_frozen_likes` | age ≥20 segment with >3k new views but ≤1 new like | Post-publish ad campaigns drip views with zero engagement. Seen on every video of the origin case. |
31
+ | `B_subs_flat_while_views_surge` | < 30 new subs per 100k channel views over snapshot window | A bought-view case: 27 subs / 580k views. Viewers don't convert ⇒ not real interest. |
32
+
33
+ Interpolation (`view_curves.py`) is self-contained (linear + log bracket
34
+ interpolation, per-segment deltas) — no external dependency.
35
+
36
+ ### Group B add-on — video integrity (`video_integrity.py`)
37
+
38
+ Deletion/unlisting is **not** a signal by itself — channels legitimately
39
+ re-upload and clean house. The signal is deletion used to **conceal or
40
+ misrepresent performance**. Source: ES `offline_since` (exists ⇒ video gone)
41
+ and `content_aspects` containing `'unlisted'`. Intent is inferred from
42
+ view count, age-at-removal, and whether the video was a paid sponsorship.
43
+
44
+ | code | trigger | why |
45
+ |---|---|---|
46
+ | `B_sponsored_video_concealed` | a SOLD+PUBLISHED adlink's video is now offline/unlisted | Brand paid, ad went live, delivery then hidden. Bad-faith + finance/delivery alarm. Hard-fail if ≥2, or one with ≥5k views. |
47
+ | `B_high_view_video_scrub` | offline video(s) above the channel's high-view bar (max of 50k or 25% of median) | You don't delete a 2M-view video by accident. Penalty scales by the **share of tracked views** gone, not raw count (big channels always shed a few old high-view videos): ≥15% → −25 critical; ≥3% → −12 warning; <3% → recorded, not penalized. Hard-fail only if ≥3 videos AND ≥15% of views vanished. |
48
+ | `B_unlisted_with_traffic` | unlisted video still carrying ≥20k views | Hidden from channel page/subscribers while accruing views — running content the organic audience never sees. |
49
+
50
+ Benign (recorded in metrics, **not** penalized): removed ≤7d after publish,
51
+ <5k views, non-sponsored (re-upload/mistake — e.g. a 713-view video pulled
52
+ 2 days after publish).
53
+
54
+ ## Group C — comment content (`comment_analyzer.py` + Haiku subagent)
55
+
56
+ | code | trigger | why |
57
+ |---|---|---|
58
+ | `C_comment_scarcity` | viewer comments < 15% of a 1-per-2,000-views floor (scraped ≥50k views) | The single loudest signal. Origin case: ~21 comments across ~1.8M scraped views. Measured on freshly-scraped comments so it can't be a stale count. |
59
+ | `C_language_mismatch` | <60% of comments in channel language (en channels) | Off-language comment farms — e.g. off-language/emoji junk flooding an English channel. |
60
+ | `C_generic_templates` | >40% generic ("nice video", lone emoji…) | Padding. Library in `comment-patterns.md`. |
61
+ | `C_length_uniform` | ≥70% of comments are ≤5 words AND median length <8 words | Bots cluster short; real audiences have a long tail. |
62
+ | `C_emoji_only` | >25% emoji-only / no real text | Filler. |
63
+ | `C_bot_usernames` | >30% handles match `^@?[a-z]+[-_]?\d{4,}$` **AND** LLM organic share < 55% | YouTube's own default handles match this pattern too, so it's only a tell when the audience is independently suspect — fires as corroboration in the LLM step, never on format alone. |
64
+ | `C_near_duplicates` | largest near-duplicate cluster (token-Jaccard > 0.7) holds >10% of comments | Templated posting. |
65
+ | `C_low_reply_ratio` | <5% of top comments have any reply | Real audiences converse. |
66
+ | `C_no_creator_engagement` | creator hearts 0 comments | Creator ignores a section they know is fake. |
67
+ | `C_commenter_churn` | <2% commenters appear on >1 video | No recurring fanbase; throwaway accounts. |
68
+ | `C_time_clustered` | >50% of comments in first hour on a weeks-old video | Burst posting. |
69
+ | `C_llm_not_organic` | Haiku classifier <50% organic | Catches subtle patterns rules miss. <30% ⇒ hard override → FRAUD_LIKELY. |
70
+
71
+ ## Contributing new signals
72
+
73
+ Found a robust new tell? Add a row here, add a penalty + severity to the
74
+ relevant `PENALTIES` dict in the script, and document the penalty in
75
+ `scoring.md`. Keep thresholds evidence-based, but **reference cases by
76
+ anonymized label only — never a channel name, id, or handle** (this skill
77
+ ships in a public repo).