thoughtleaders-cli 0.6.47__tar.gz → 0.6.52__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (96)
  1. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/.claude-plugin/plugin.json +1 -1
  2. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/AGENTS.md +8 -2
  3. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/API.md +7 -7
  4. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/PKG-INFO +1 -1
  5. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/pyproject.toml +1 -1
  6. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl/SKILL.md +160 -38
  7. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl/references/elasticsearch-schema.md +4 -0
  8. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-import/SKILL.md +1 -1
  9. thoughtleaders_cli-0.6.52/skills/tl-keyword-research/SKILL.md +165 -0
  10. thoughtleaders_cli-0.6.52/skills/tl-keyword-research/scripts/probe.py +156 -0
  11. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/SKILL.md +13 -26
  12. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/examples/e2e_findings.md +13 -8
  13. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/examples/golden_queries.md +6 -6
  14. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/tools/column_builder.md +1 -1
  15. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/__init__.py +1 -1
  16. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/bulk_import.py +5 -1
  17. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/credits.py +12 -1
  18. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/reports.py +13 -6
  19. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/schema.py +11 -4
  20. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/setup.py +27 -9
  21. thoughtleaders_cli-0.6.52/src/tl_cli/commands/uploads.py +41 -0
  22. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/whoami.py +2 -2
  23. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/config.py +0 -1
  24. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/main.py +7 -6
  25. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/output/formatter.py +3 -4
  26. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/self_update.py +14 -3
  27. thoughtleaders_cli-0.6.47/skills/tl-report-builder/tools/keyword_research.md +0 -217
  28. thoughtleaders_cli-0.6.47/src/tl_cli/commands/ask.py +0 -54
  29. thoughtleaders_cli-0.6.47/src/tl_cli/commands/uploads.py +0 -86
  30. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/.claude-plugin/marketplace.json +0 -0
  31. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/.github/workflows/python-publish.yml +0 -0
  32. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/.gitignore +0 -0
  33. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/CLAUDE.md +0 -0
  34. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/LICENSE +0 -0
  35. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/README.md +0 -0
  36. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/agents/tl-analyst.md +0 -0
  37. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/hooks/hooks.json +0 -0
  38. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/hooks/scripts/load-tl-skill.mjs +0 -0
  39. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/hooks/scripts/post-usage.sh +0 -0
  40. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/hooks/scripts/pre-check.sh +0 -0
  41. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl/references/business-glossary.md +0 -0
  42. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl/references/firebolt-schema.md +0 -0
  43. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl/references/postgres-schema.md +0 -0
  44. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/references/columns_brands.md +0 -0
  45. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/references/columns_channels.md +0 -0
  46. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/references/columns_content.md +0 -0
  47. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/references/columns_sponsorships.md +0 -0
  48. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/references/intelligence_filterset_schema.json +0 -0
  49. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/references/intelligence_widget_schema.json +0 -0
  50. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/references/report_glossary.md +0 -0
  51. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/references/sortable_columns.json +0 -0
  52. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/references/sponsorship_filterset_schema.json +0 -0
  53. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/references/sponsorship_widget_schema.json +0 -0
  54. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/references/widgets.md +0 -0
  55. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/tools/database_query.md +0 -0
  56. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/tools/name_resolver.md +0 -0
  57. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/tools/sample_judge.md +0 -0
  58. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/tools/similar_channels.md +0 -0
  59. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/tools/topic_matcher.md +0 -0
  60. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/skills/tl-report-builder/tools/widget_builder.md +0 -0
  61. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/_completions.py +0 -0
  62. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/auth/__init__.py +0 -0
  63. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/auth/commands.py +0 -0
  64. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/auth/finalize.py +0 -0
  65. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/auth/login.py +0 -0
  66. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/auth/pkce.py +0 -0
  67. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/auth/token_store.py +0 -0
  68. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/client/__init__.py +0 -0
  69. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/client/errors.py +0 -0
  70. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/client/http.py +0 -0
  71. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/__init__.py +0 -0
  72. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/_comments_common.py +0 -0
  73. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/balance.py +0 -0
  74. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/brands.py +0 -0
  75. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/changelog.py +0 -0
  76. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/channels.py +0 -0
  77. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/db.py +0 -0
  78. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/deals.py +0 -0
  79. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/describe.py +0 -0
  80. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/doctor.py +0 -0
  81. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/matches.py +0 -0
  82. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/proposals.py +0 -0
  83. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/recommender.py +0 -0
  84. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/snapshots.py +0 -0
  85. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/commands/sponsorships.py +0 -0
  86. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/filters.py +0 -0
  87. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/hints.py +0 -0
  88. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/src/tl_cli/output/__init__.py +0 -0
  89. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/tests/__init__.py +0 -0
  90. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/tests/test_auth.py +0 -0
  91. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/tests/test_filters.py +0 -0
  92. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/tests/test_http_auth.py +0 -0
  93. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/tests/test_output.py +0 -0
  94. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/tests/test_reports.py +0 -0
  95. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/tests/test_sponsorships.py +0 -0
  96. {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.52}/uv.lock +0 -0
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "tl-cli",
3
- "version": "0.6.47",
3
+ "version": "0.6.52",
4
4
  "description": "ThoughtLeaders CLI — query sponsorship deals, channels, brands, uploads, and intelligence from the terminal",
5
5
  "author": {
6
6
  "name": "ThoughtLeaders",
@@ -56,6 +56,13 @@ The CLI integrates with AI coding agents via skills, commands, agents, and hooks
56
56
 
57
57
  This repo is also a Claude Code plugin, and can directly be installed as one.
58
58
 
59
+ ### Bundled skills — when to invoke
60
+
61
+ - **`tl`** — the main skill for querying ThoughtLeaders data. Default for any sponsorship / channel / brand / upload / report question.
62
+ - **`tl-keyword-research`** — invoke whenever the user wants to find videos or channels by **content keywords** (topics, concepts, niches) that aren't covered by a curated recommender tag, OR to validate that a candidate channel's content actually touches a given topic. Returns `{operator, keywords:[{keyword,count}]}` from a ranked ES probe over `title` / `summary` / `transcript`; the caller then runs the actual content search with the surviving high-count terms. **Do not compose keyword sets by hand for `tl db es` content searches — delegate to this skill first.** See `skills/tl/SKILL.md` → *Channel & video discovery* for the four-path decision tree and when to use this vs the recommender / raw SQL.
63
+ - **`tl-report-builder`** — invoke when the user wants to build, refine, or save a platform report (campaign config, FilterSet, columns, widgets). Multi-phase flow: routing → schema + validation → columns → widgets.
64
+ - **`tl-import`**, **`tl-save-report`**, **`adapt-tl-data`** — narrower workflows; the skill files document their own triggers.
65
+
59
66
  ### Skill content boundaries
60
67
 
61
68
  Skills under `skills/` are split into a `SKILL.md` and one or more `references/*.md` files. To prevent drift, each fact has exactly one home:
@@ -90,7 +97,6 @@ All list endpoints return: `{ results, total, limit, offset, usage: { credits_ch
90
97
  - `TL_API_URL` — API base (default: `https://app.thoughtleaders.io`)
91
98
  - `TL_API_KEY` — Bearer token override for CI/scripts
92
99
  - `TL_AUTH0_DOMAIN`, `TL_AUTH0_CLIENT_ID`, `TL_AUTH0_AUDIENCE` — Auth0 config
93
- - `TL_LLM_KEY` — User's own LLM key for `tl ask` (avoids surcharge)
94
100
 
95
101
  ## Credit System
96
102
 
@@ -111,7 +117,7 @@ The version string is defined in three files and all three must be updated toget
111
117
 
112
118
  * Do not reference internal architecture of the ThoughtLeaders app in comments or skills. Specifically: do not reference internal table names, field names, API endpoints, Python modules or functions (including the sanitizer).
113
119
  * Do not let server implementation details into skill files (anything under `skills/`). Skills describe *what the CLI does* from the user's seat — observable command surface, inputs, outputs, examples. Do not say "the server enforces X", "the API validates Y on its side", "the backend rejects Z" — those are mechanism notes that drift the moment the server changes. State the user-visible behaviour ("unknown keys come back as 400") without naming where it's enforced.
114
- * Place all imports at the start of the Python module file
120
+ * **All `import` and `from X import Y` statements live at the top of the Python module file** — after the module docstring, before any code. No inline imports inside function bodies, no lazy imports for "speed" or "optional dependency" reasons. `from __future__ import …` goes at the very top (Python requires that). The only legitimate inline-import exception is **platform-conditional imports** that cannot succeed on the other platform (e.g. `import msvcrt` on Linux, `import termios`/`tty` on Windows) — those stay inside their `if sys.platform == …:` guard. If a circular-import problem makes a top-level import impossible, fix the circular dependency rather than working around it with an inline import.
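A minimal sketch of the import rule above, assuming a hypothetical helper module that reads a single keypress. The function name `read_key` is illustrative and not part of the CLI. Ordinary imports sit at the top of the module, and the platform-conditional imports stay inside their `sys.platform` guard, which is also at module level rather than inside a function body.

```python
"""Read one keypress without waiting for Enter (illustrative module)."""
import sys

# Platform-conditional imports: each branch can only succeed on its own
# platform, so they stay inside the guard instead of being imported lazily
# inside a function.
if sys.platform == "win32":
    import msvcrt
else:
    import termios
    import tty


def read_key() -> str:
    """Return a single character typed by the user (hypothetical helper)."""
    if sys.platform == "win32":
        return msvcrt.getwch()
    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        tty.setraw(fd)  # switch the terminal to raw mode for one read
        return sys.stdin.read(1)
    finally:
        # Always restore the previous terminal settings.
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
```

Keeping the guard at module level means a missing module fails at import time on the affected platform, instead of surfacing later inside a call.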
115
121
 
116
122
  # Git commit rules
117
123
 
@@ -26,7 +26,7 @@ All requests must carry both of:
26
26
  | `Authorization` | `Bearer <api_key>` | The credential. |
27
27
  | `X-TL-Auth` | `API-KEY` | Opts into API-key auth. Without it the server interprets the Bearer as an Auth0 JWT and rejects the API-key string. |
28
28
 
29
- Create an API key from Django Admin → **API keys** → Add. The shown 64-char hex value is the only secret — keep it out of version control. Keys carry an `is_active` flag and an `expires_at` (defaults to 1 year from creation). Inactive or expired keys return `401`; if the owning user is deactivated the request fails with `403`.
29
+ Inactive or expired keys return `401`; if the owning user is deactivated the request fails with `403`.
30
30
 
31
31
  A quick set of shell variables used throughout this page:
32
32
 
@@ -76,7 +76,7 @@ Multi-row responses share one shape:
76
76
  "usage": {
77
77
  "credits_charged": 4.12,
78
78
  "credit_rate": 1.4,
79
- "balance_remaining": 9_995.88
79
+ "balance_remaining": 9995.88
80
80
  },
81
81
  "_breadcrumbs": [
82
82
  { "hint": "next page", "command": "..." }
@@ -127,7 +127,7 @@ print(get('/whoami'))
127
127
  "name": "Acme Marketing",
128
128
  "plan": "Intelligence",
129
129
  "is_managed_services": false,
130
- "credits_balance": 9_995.88
130
+ "credits_balance": 9995.88
131
131
  },
132
132
  "associated_profiles": [ ... ],
133
133
  "brands": [ ... ]
@@ -154,7 +154,7 @@ print(get('/balance'))
154
154
 
155
155
  ```json
156
156
  {
157
- "balance": 9_995.88,
157
+ "balance": 9995.88,
158
158
  "allow_overage": false,
159
159
  "recent_usage": [
160
160
  {
@@ -200,13 +200,13 @@ print(post('/raw/pg', {'query': sql}))
200
200
  ```json
201
201
  {
202
202
  "results": [
203
- {"id": 12345, "channel_name": "MrBeast", "reach": 320_000_000},
203
+ {"id": 12345, "channel_name": "MrBeast", "reach": 320000000},
204
204
  ...
205
205
  ],
206
206
  "total": 5,
207
207
  "limit": 5,
208
208
  "offset": 0,
209
- "usage": { "credits_charged": 1.84, "credit_rate": 1.4, "balance_remaining": 9_994.04 }
209
+ "usage": { "credits_charged": 1.84, "credit_rate": 1.4, "balance_remaining": 9994.04 }
210
210
  }
211
211
  ```
212
212
 
@@ -381,7 +381,7 @@ def post(path, body):
381
381
  r.raise_for_status()
382
382
  return r.json()
383
383
 
384
- # 1) Find Holafly's brand id (free name lookup endpoint covered elsewhere — use db_pg here)
384
+ # 1) Find Holafly's brand id
385
385
  brand = post('/raw/pg', {'query':
386
386
  "SELECT id FROM thoughtleaders_brand WHERE name = 'Holafly' LIMIT 1 OFFSET 0"
387
387
  })['results'][0]['id']
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: thoughtleaders-cli
3
- Version: 0.6.47
3
+ Version: 0.6.52
4
4
  Summary: ThoughtLeaders CLI — query sponsorship data, channels, brands, and intelligence
5
5
  Project-URL: Homepage, https://thoughtleaders.io
6
6
  Project-URL: Repository, https://github.com/ThoughtLeaders-io/thoughtleaders-cli
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
4
4
 
5
5
  [project]
6
6
  name = "thoughtleaders-cli"
7
- version = "0.6.47"
7
+ version = "0.6.52"
8
8
  description = "ThoughtLeaders CLI — query sponsorship data, channels, brands, and intelligence"
9
9
  readme = "README.md"
10
10
  license = "MIT"
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  name: tl
3
3
  description: |
4
- Query and analyze YouTube sponsorship data using the `tl` CLI. Use this skill for data exploration and questions about channels, brands and sponsorships: counts, metrics, trends, time-series, distributions, single-record drill-downs, revenue / pipeline-weighting math, view-curve analysis, cross-source business questions. Examples: "How many deals did we close last quarter?", "What's the weighted pipeline by sales owner?", "Show me the view curve for video X", "Find mentions of Surfshark in transcripts", "Investigate this video".
4
+ Query and analyze YouTube sponsorship data using the `tl` CLI. Use this skill for finding channels, brands and sponsorships, and for data exploration, including counts, metrics, trends, time-series, distributions, single-record drill-downs, revenue / pipeline-weighting math, view-curve analysis, cross-source business questions. Examples: "How many deals did we close last quarter?", "What's the weighted pipeline by sales owner?", "Show me the view curve for video X", "Find mentions of Surfshark in transcripts", "Investigate this video", "Find channels...", "Find brands...".
5
5
  ---
6
6
 
7
7
  # ThoughtLeaders Data Analyst
@@ -140,13 +140,13 @@ Unless the user specifically asks for running a specific report or showing the r
140
140
  3. **Decide the method of discovery**: If the user wants to explore certain topics, use the recommender commands. If it's more about filtering, construct a query for PG or ES.
141
141
  4. **Always use --json**: Parse JSON output for multi-step analysis.
142
142
  5. **Chain commands**: For complex questions, chain multiple `tl` commands, shell commands, and other tools.
143
- 6. **Format results**: When the user asks for a list or tabular data, present the results as a well-formatted markdown table. Pick the most relevant columns and use clear headers. In this case, ask the user if they want to save the list as a report, and invoke the `tl-save-report` skill.
143
+ 6. **Format results**: When the user asks for a list or tabular data, present the results as a well-formatted markdown table. Pick the most relevant columns and use clear headers. Sort the result by relevant criteria - if the user asked for "top performers", order by the performance metric; if the user asked for "most recent", sort by the pertinent date desc. When the result is tabular, ask the user if they want to save the list as a report, and invoke the `tl-save-report` skill.
144
144
 
145
- Prefer writing shell code, `jq` commands, or `duckdb` commands that fetch or analysise large sets of data instead of analysing it yourself. Create temporary files in `/tmp` that can be analysed later in different ways. Before analysing a potentially large result set, first try fetching just a single result with `LIMIT 1` without `jq` etc, to see the shape of the data and any error messages.
145
+ Prefer writing shell code, `jq` commands, or `duckdb` commands that fetch or analysise large sets of data instead of analysing it yourself. On Mac and Linux, create temporary files in `/tmp` that can be analysed later in different ways. On Windows, create them in `%USERPROFILE%\AppData\Local\Temp`. Before analysing a potentially large result set, first try fetching just a single result with `LIMIT 1` without `jq` etc, to see the shape of the data and any error messages.
146
146
 
147
147
  ## Available Commands
148
148
 
149
- Note that if you're working on Windows, you must set up UTF-8 in the console, because all of these commands return UTF-8 data.
149
+ Note that if you're working on Windows, you must set up UTF-8 in the terminal with `PYTHONIOENCODING=utf-8 tl ...`, because all of these commands return UTF-8 data.
150
150
 
151
151
  ### Data queries
152
152
 
@@ -181,10 +181,6 @@ tl channels history <id-or-name> # Sponsorship history
181
181
  tl channels similar <id-or-name> # Similarity recommender (Intelligence plan)
182
182
  tl brands show <id-or-name> # Brand detail
183
183
  tl brands find <query> # Resolve a string to {id, name}; matches name, slug, domain, or keyword
184
- tl brands history <id-or-name> # Sponsorship history
185
- tl brands history <query> --channel <id> # Brand mentions on specific channel
186
- tl brands history-stats <id-or-name> # Aggregate roll-up: counts, total/avg/median views, first/last seen, by-year, top channels
187
- tl brands history-stats <q> --channel <id> # Same roll-up, narrowed to one channel
188
184
  tl brands similar <id-or-name> # Find similar brands via similarity search
189
185
  tl recommender tags [query] # List similarity tag names — categories, demographics, formats
190
186
  tl recommender top-channels "<tag>" # Top channels loaded on a similarity tag (Intelligence)
@@ -391,7 +387,7 @@ See [references/postgres-schema.md](references/postgres-schema.md) for the accep
391
387
  ### Three sources, each authoritative for different things
392
388
 
393
389
  - **Postgres** — deals, pipeline, brands, channels, users, organizations, profiles, revenue. Source of truth for deal state. Reachable via the structured `tl` commands or raw `tl db pg`.
394
- - **Elasticsearch** — videos, transcripts, brand mentions, **current** channel/video metrics, demographics. Reachable via `tl uploads`, `tl channels`, and `tl db es`.
390
+ - **Elasticsearch** — videos, transcripts, brand mentions, **current** channel/video metrics, demographics. Reachable via `tl db es`.
395
391
  - **Firebolt** — **historical** time-series snapshots only (view curves over time, subscriber-growth trends). Reachable via `tl snapshots` (preferred) or `tl db fb`.
396
392
 
397
393
  **Use Firebolt only when you need a value AT A POINT IN TIME that no longer exists in the current ES/PG snapshot.** For "current views/subs", use ES.
@@ -449,9 +445,18 @@ tl changelog since v0.4.10 # Notes from v0.4.10 to latest
449
445
  tl changelog --md > CHANGELOG.md # Capture for a doc
450
446
  ```
451
447
 
452
- #### Channel discovery — recommender first, raw SQL second
448
+ #### Channel & video discovery — pick the path for the question shape
453
449
 
454
- For category- or demographic-driven discovery, **use the recommender, not `content_category` SQL.** The recommender ranks channels by how strongly they load on a category/demographic tag (similarity scores), instead of forcing exact equality on a single integer code. It also returns the matching brand profiles alongside the channels - useful when the user actually wants to know "who buys this kind of inventory."
450
+ Four first-class paths, each with a different signal. **Pick by the SHAPE of the user's question, not by habit.** "Recommender first" is the right default only for path 2 - for paths 1, 3, and 4 the recommender is the wrong tool.
451
+
452
+ **1. Named entity** — user named a specific channel, brand, or YouTube URL/handle/ID (`"MrBeast"`, `"NordVPN"`, `"@mkbhd"`, `"youtu.be/..."`). Use `tl channels find` / `tl brands find` — single-step resolver returning `{id, name}`. Cheap, deterministic, no expansion.
453
+
454
+ ```bash
455
+ tl channels find "MrBeast"
456
+ tl brands find "NordVPN"
457
+ ```
458
+
459
+ **2. Curated tag / category / demographic** — user named a topic that maps cleanly to a recommender tag (`"Cooking"`, `"Tech"`, `"USA share"`, content categories, format hints). Use the recommender — it ranks channels by how strongly they load on a tag, returning ranked similarity scores instead of forcing exact equality. It also returns matching brand profiles alongside the channels — useful when the user wants to know "who buys this kind of inventory."
455
460
 
456
461
  ```bash
457
462
  # Discover the right tag name first (free)
@@ -464,7 +469,41 @@ tl recommender top-channels "Tech" --limit 30
464
469
  tl recommender top-brands "USA share" mbn:yes --limit 50
465
470
  ```
466
471
 
467
- Use `tl db pg` only for predicates the recommender can't express — pure attribute filters (`is_tl_channel`, `language`, `demographic_device_primary`), aggregations, and joins. Run `tl schema pg` once to confirm the live column set; the columns referenced below are stable.
472
+ **Available filters on the recommender commands:**
473
+
474
+ | Command | Filters |
475
+ | --- | --- |
476
+ | `top-channels` | `msn:<yes\|no\|all>` (default all), `exclude-for-profile:<id>` |
477
+ | `top-profiles` | `mbn:<yes\|no\|all>` (default all), `exclude-for-channel:<id>` |
478
+ | `top-brands` | `mbn:<yes\|no\|all>` (default all) |
479
+ | `channels-for-profile` | `language:<iso>` (default `en`), `msn:<yes\|no>` (default `no`) |
480
+ | `channels-for-brand` | same as `channels-for-profile` |
481
+ | `brands-for-channel` | `mbn:<yes\|no\|all>` (default `all`) |
482
+
483
+ Use `tl recommender top` for category/topic discovery (it's ranked) and `tl channels similar` / `tl brands similar` for 1:1 lookalike searches. This is the fast path.
484
+
485
+ **Hand-off to path 3 when the tag doesn't fit** If `tl recommender tags <hint>` returns no clean match, the user's intent cannot be represented by recommender tags — drop to path 3, do NOT fake-fit a loose adjacent tag. E.g. `"crypto/Web3 channels"` is a miss even though `"cryptocurrency"` exists as a tag — `"cryptocurrency"` is a financial-product tag, not the cultural-niche the user named. Same for `"speedcubing"`, `"biohacking and longevity"`, `"AI cooking"` — none of these are curated tags, so they belong in path 3.
486
+
487
+ **Also fall through to path 3 — NOT path 4 — when the recommender returns errors.** If `tl recommender top-channels "<tag>"` 5xx's or times out, the right fallback is path 3 (run the keyword-research skill against ES), not path 4 (PG `ILIKE` on `channel_name`). PG name-matching misses every channel whose name doesn't contain the literal word — that's the same anti-pattern called out at the bottom of this section.
488
+
489
+ **Also fall through to path 3 if the user wants to broaden the search.** When encountering further inputs like "broaden the search", "find more results", etc., it indicates the user is searching for topics beyond what the recommender tags provide.
490
+
491
+ **3. Content keywords beyond tags — invoke the `tl-keyword-research` skill** — user described content the channel OR video ACTUALLY TALKS ABOUT, and it isn't a curated tag. Triggers:
492
+
493
+ - **Channel search by topic** — `"crypto/Web3 channels"`, `"speedcubing channels"`, `"channels about biohacking and longevity"`, `"both 3D printing and miniature painting"`.
494
+ - **Video search by topic** — `"videos where creators discuss budget meal prep"`, `"uploads about [topic]"`, `"find videos that talk about X"`.
495
+ - **Channel–brand fit check** — does this candidate channel's content actually touch the brand's category? (Use with `channel.id` filter on the downstream ES query.)
496
+ - **Validating a recommender / SQL shortlist** — sample-check that the top-N channels really cover the niche.
497
+
498
+ **Do NOT compose keyword sets by hand for `tl db es`.** Always run the skill's script first. It broadens the user input, probes each candidate via `multi_match phrase`, and returns ranked counts:
499
+
500
+ ```json
501
+ {"operator": "OR", "keywords": [{"keyword": "crypto", "count": 18742}, {"keyword": "bitcoin", "count": 15103}, {"keyword": "rugpull", "count": 0}]}
502
+ ```
503
+
504
+ Then run the actual content search via `tl db es` (`multi_match` on the `title`, `summary`, `transcript` fields) with the surviving high-count keywords. The skill's full procedure (Phase 1 = seed expansion by you; Phase 2 = the script) is in the `tl-keyword-research` skill file.
505
+
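The hand-off from the probe output to the content search might look like the sketch below. It is not the skill's own implementation. The `MIN_COUNT` cut-off, the helper names, and the mapping of any non-`OR` operator to `must` are assumptions made for illustration. The sample only shows `"operator": "OR"`, and the `doc_type` filter is borrowed from the ES examples elsewhere in this file.

```python
import json

# Probe output copied from the sample response above.
probe_output = json.loads(
    '{"operator": "OR", "keywords": ['
    '{"keyword": "crypto", "count": 18742}, '
    '{"keyword": "bitcoin", "count": 15103}, '
    '{"keyword": "rugpull", "count": 0}]}'
)

MIN_COUNT = 100  # hypothetical threshold for a "surviving" keyword


def surviving_keywords(probe, min_count=MIN_COUNT):
    """Keep only keywords whose probe count reaches the threshold."""
    return [k["keyword"] for k in probe["keywords"] if k["count"] >= min_count]


def build_query(probe, size=50):
    """Build an ES request body with one phrase multi_match per keyword."""
    clauses = [
        {
            "multi_match": {
                "query": keyword,
                "type": "phrase",
                "fields": ["title", "summary", "transcript"],
            }
        }
        for keyword in surviving_keywords(probe)
    ]
    # Assumption: "OR" maps to `should`; anything else is treated as AND.
    combine = "should" if probe["operator"] == "OR" else "must"
    bool_body = {"filter": [{"term": {"doc_type": "article"}}], combine: clauses}
    if combine == "should":
        # Without this, a bool query that has a filter treats `should` as optional.
        bool_body["minimum_should_match"] = 1
    return {"size": size, "query": {"bool": bool_body}}


query = build_query(probe_output)
print(json.dumps(query))
```

The printed JSON can be passed as the single argument to `tl db es '<body>' --json`. Zero-count terms such as `rugpull` are dropped before the query is built.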
506
+ **4. Pure attribute filter** — user wants channels filtered by metadata like: `is_tl_channel`, `language`, `demographic_device_primary`, country share in `demographic_geo` JSON, aggregations, joins. Use `tl db pg` with a SELECT on `thoughtleaders_channel`. Run `tl schema pg thoughtleaders_channel` once to confirm the live column set; the columns in the examples are stable.
468
507
 
469
508
  ```bash
470
509
  # All TPP (TL-managed) channels — pure attribute filter, not a category query
@@ -483,14 +522,17 @@ tl db pg "SELECT id, channel_name, demographic_device_primary, total_views
483
522
  LIMIT 100 OFFSET 0"
484
523
  ```
485
524
 
486
- For per-country share beyond the recommender's "USA share" tag, use the `demographic_geo` jsonb in raw SQL: `(demographic_geo->>'gb')::int >= 25`. Same pattern with `demographic_device->>'mobile'` for non-primary device shares.
525
+ For per-country share beyond the recommender's "USA share" tag, use the `demographic_geo` JSONB field in raw SQL: `(demographic_geo->>'gb')::int >= 25`. Same pattern with `demographic_device->>'mobile'` for non-primary device shares.
487
526
 
488
527
  **MSN status (`media_selling_network_join_date`) is scrubbed from the advertiser sandbox view.** Raw SQL can't filter on it from an advertiser context. For MSN-only / non-MSN lookups, run the same raw SQL with `media_selling_network_join_date IS [NOT] NULL` from a context that has access to it (full-access role), or rely on the recommender's MSN-aware filters: `tl recommender top-channels "<tag>" msn:yes|no|all`.
489
528
 
529
+ **Anti-pattern: defaulting to `ILIKE` on `channel_name` for off-tag topic queries.** If the question is "channels about X" where X is a topic / concept / niche (not a literal substring you expect in channel names), reach for path 3 (`tl-keyword-research`), not `WHERE channel_name ILIKE '%X%'`. Channel-name `ILIKE` misses channels whose name doesn't literally contain X but whose content does; the keyword-research skill catches them via `title` / `summary` / `transcript`. Use `channel_name ILIKE` only when you actually expect the channel's name to contain the term (e.g. `"Crypto"` in `"My Happy Crypto"`) as a supplementary signal alongside path 3, not as a replacement for it.
530
+
490
531
  ### Output flags
491
- - `--json` — structured JSON (use this for parsing)
492
- - `--csv` — CSV output
493
- - `--md` — Markdown table
532
+ - `--json` — structured JSON output format (use this for parsing)
533
+ - `--toon` — [TOON](https://toonformat.dev/guide/getting-started.html) output format (efficient for large data sets while keeping metadata)
534
+ - `--csv` — CSV output format
535
+ - `--md` — Markdown table for user presentation only
494
536
  - `--limit N` — max results
495
537
  - `--offset N` — pagination
496
538
 
@@ -536,7 +578,7 @@ When presenting sponsorship status data, always use human-readable labels — ne
536
578
 
537
579
  ## Examples
538
580
 
539
- "Show me my sold sponsorships this quarter":
581
+ ### "Show me my sold sponsorships this quarter":
540
582
  ```bash
541
583
  tl db pg "SELECT al.id, al.weighted_price, al.purchase_date, b.name AS brand
542
584
  FROM thoughtleaders_adlink al
@@ -549,24 +591,117 @@ tl db pg "SELECT al.id, al.weighted_price, al.purchase_date, b.name AS brand
549
591
  LIMIT 500 OFFSET 0" --json
550
592
  ```
551
593
 
552
- "What channels does Nike sponsor?":
594
+ ### Brand sponsorship history — what channels does Nike sponsor?
595
+
596
+ Resolve the brand to an ID, then probe ES for articles where the brand appears in `sponsored_brand_mentions`. Channel names live in PG (the ES article doc only carries `channel.id`), so the third call joins them in.
597
+
553
598
  ```bash
554
- tl brands history Nike --json
599
+ # 1. Resolve "Nike" → brand ID
600
+ tl brands find Nike --json # → results[0].id, say 21416
601
+
602
+ # 2. Recent sponsored videos for that brand (sorted by publication_date desc)
603
+ tl db es '{
604
+ "size": 50,
605
+ "track_total_hits": true,
606
+ "query": {"bool": {"filter": [
607
+ {"term": {"doc_type": "article"}},
608
+ {"term": {"sponsored_brand_mentions": "21416"}}
609
+ ]}},
610
+ "sort": [{"publication_date": "desc"}],
611
+ "_source": ["title", "channel.id", "publication_date", "views"]
612
+ }' --json > /tmp/nike_history.json
613
+
614
+ # 3. Resolve channel.id → channel_name (one PG round-trip for the whole page)
615
+ jq -r '[.results[].channel.id] | unique | map(tostring) | join(",")' /tmp/nike_history.json \
616
+ | xargs -I CH_IDS tl db pg "SELECT id, channel_name FROM thoughtleaders_channel WHERE id IN (CH_IDS)" --json
617
+
618
+ # Narrow to a single channel:
619
+ tl db es '{
620
+ "size": 50,
621
+ "track_total_hits": true,
622
+ "query": {"bool": {"filter": [
623
+ {"term": {"doc_type": "article"}},
624
+ {"term": {"sponsored_brand_mentions": "21416"}},
625
+ {"term": {"channel.id": 5607}}
626
+ ]}},
627
+ "sort": [{"publication_date": "desc"}],
628
+ "_source": ["title", "publication_date", "views"]
629
+ }'
630
+
631
+ # Was the video a TL-brokered deal? Cross-check ES video_id against AdLink.article_id:
632
+ tl db pg "SELECT article_id FROM thoughtleaders_adlink
633
+ WHERE article_id IN ('1247603:8LskGvKUA9I', '1247603:abc123')"
555
634
  ```
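
Step 3 above can also be done without `jq` / `xargs`. What follows is a minimal Python sketch, assuming the saved ES response exposes hits under `results` with a nested `channel.id` (as the `jq` path suggests) and that the PG call returns rows with `id` and `channel_name`. The helper names and the sample rows are hypothetical, not part of the CLI.

```python
def channel_lookup_sql(results):
    """Build the PG lookup for the distinct channel IDs in one ES result page."""
    # int() both validates and normalizes the IDs before they reach the SQL text.
    ids = sorted({int(hit["channel"]["id"]) for hit in results})
    if not ids:
        return None
    id_list = ",".join(str(i) for i in ids)
    return (
        "SELECT id, channel_name FROM thoughtleaders_channel "
        f"WHERE id IN ({id_list})"
    )


def attach_channel_names(results, channel_rows):
    """Client-side join: add channel_name to each hit (None if unresolved)."""
    names = {int(row["id"]): row["channel_name"] for row in channel_rows}
    return [
        {**hit, "channel_name": names.get(int(hit["channel"]["id"]))}
        for hit in results
    ]


# Made-up sample data, for illustration only.
sample_results = [
    {"title": "Video A", "channel": {"id": 5607}},
    {"title": "Video B", "channel": {"id": 12}},
    {"title": "Video C", "channel": {"id": 5607}},
]
sample_rows = [{"id": 5607, "channel_name": "Example Channel"}]

sql = channel_lookup_sql(sample_results)
joined = attach_channel_names(sample_results, sample_rows)
print(sql)
```

Casting every ID through `int()` keeps the `IN (...)` list free of anything that is not a number, which matters because the SQL is assembled as a string.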
556
635
 
557
- "Compare view curves for two videos":
636
+ ### Brand sponsorship roll-up: totals, first/last seen, top channels, by-year
637
+
638
+ The same ES filter (`doc_type=article` + `sponsored_brand_mentions=<id>`) with `size:0` + aggregations replaces a roll-up call. ES accepts **one aggregation total per request** (top-level + sub-aggs all count), so what would be a single server-side roll-up here splits into a few `tl db es` calls and one client-side join.
639
+
640
+ ```bash
641
+ # Totals + time range (one aggregation total — the four metric aggs are siblings under aggs and bill as a single body)
642
+ tl db es '{
643
+ "size": 0,
644
+ "track_total_hits": true,
645
+ "query": {"bool": {"filter": [
646
+ {"term": {"doc_type": "article"}},
647
+ {"term": {"sponsored_brand_mentions": "21416"}}
648
+ ]}},
649
+ "aggs": {
650
+ "views_sum": {"sum": {"field": "views"}},
651
+ "views_avg": {"avg": {"field": "views"}},
652
+ "first_seen": {"min": {"field": "publication_date"}},
653
+ "last_seen": {"max": {"field": "publication_date"}}
654
+ }
655
+ }'
656
+
657
+ # By-year breakdown (date_histogram only — no sub-agg, that would push over the one-agg cap)
658
+ tl db es '{
659
+ "size": 0,
660
+ "query": {"bool": {"filter": [
661
+ {"term": {"doc_type": "article"}},
662
+ {"term": {"sponsored_brand_mentions": "21416"}}
663
+ ]}},
664
+ "aggs": {
665
+ "by_year": {"date_histogram": {
666
+ "field": "publication_date", "calendar_interval": "year",
667
+ "format": "yyyy", "min_doc_count": 1
668
+ }}
669
+ }
670
+ }'
671
+
672
+ # Top channels by sponsored-video count (terms agg only — for views per channel, run a second call per channel)
673
+ tl db es '{
674
+ "size": 0,
675
+ "query": {"bool": {"filter": [
676
+ {"term": {"doc_type": "article"}},
677
+ {"term": {"sponsored_brand_mentions": "21416"}}
678
+ ]}},
679
+ "aggs": {
680
+ "by_channel": {"terms": {"field": "channel.id", "size": 10, "order": {"_count": "desc"}}}
681
+ }
682
+ }'
683
+
684
+ # TL-brokered deal count for the brand (PG, not ES — adlinks where the brand is on the creator profile)
685
+ tl db pg "SELECT COUNT(*) AS tl_brokered
686
+ FROM thoughtleaders_adlink al
687
+ JOIN thoughtleaders_profile p ON p.id = al.creator_profile_id
688
+ JOIN thoughtleaders_profile_brands pb ON pb.profile_id = p.id
689
+ WHERE pb.brand_id = 21416 AND al.article_id IS NOT NULL"
690
+ ```
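
The client-side join mentioned above could look roughly like this. It is a sketch only: it assumes each `tl db es` call returns the standard Elasticsearch `aggregations` object unchanged (the CLI may wrap it differently), `build_rollup` is a hypothetical helper, and the sample responses are made up.

```python
def build_rollup(totals, by_year, by_channel):
    """Merge the three aggregation responses into one roll-up dict."""
    aggs = totals["aggregations"]
    return {
        "views_sum": aggs["views_sum"]["value"],
        "views_avg": aggs["views_avg"]["value"],
        # min/max over a date field carry a formatted value_as_string in ES.
        "first_seen": aggs["first_seen"].get("value_as_string"),
        "last_seen": aggs["last_seen"].get("value_as_string"),
        "by_year": {
            b["key_as_string"]: b["doc_count"]
            for b in by_year["aggregations"]["by_year"]["buckets"]
        },
        "top_channels": [
            (b["key"], b["doc_count"])
            for b in by_channel["aggregations"]["by_channel"]["buckets"]
        ],
    }


# Made-up responses in the assumed shape.
totals = {"aggregations": {
    "views_sum": {"value": 300.0},
    "views_avg": {"value": 100.0},
    "first_seen": {"value": 1.6145568e12, "value_as_string": "2021-03-01T00:00:00.000Z"},
    "last_seen": {"value": 1.6540416e12, "value_as_string": "2022-06-01T00:00:00.000Z"},
}}
by_year = {"aggregations": {"by_year": {"buckets": [
    {"key_as_string": "2021", "doc_count": 1},
    {"key_as_string": "2022", "doc_count": 2},
]}}}
by_channel = {"aggregations": {"by_channel": {"buckets": [
    {"key": 5607, "doc_count": 2},
    {"key": 12, "doc_count": 1},
]}}}

rollup = build_rollup(totals, by_year, by_channel)
print(rollup["by_year"], rollup["top_channels"])
```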
691
+
692
+ ### "Compare view curves for two videos":
558
693
  ```bash
559
694
  tl snapshots video abc123 --channel 456 --json
560
695
  tl snapshots video def789 --channel 456 --json
561
696
  ```
562
697
 
563
- "Run my Q1 pipeline report":
698
+ ### "Run my Q1 pipeline report":
564
699
  ```bash
565
700
  tl reports --json # Find the report ID first
566
701
  tl reports run 42 --json
567
702
  ```
568
703
 
569
- "Look up a channel or brand from whatever the user pasted":
704
+ ### "Look up a channel or brand from whatever the user pasted":
570
705
  ```bash
571
706
  # Channel: accepts name, slug, YouTube channel URL, handle (@…), raw channel ID
572
707
  # (UC…), or any video URL. On ambiguity returns 400 with candidate {id, name};
@@ -596,7 +731,7 @@ tl recommender top-channels "Cooking" msn:yes --limit 100 --json \
596
731
  LIMIT 50 OFFSET 0" --json
597
732
  ```
598
733
 
599
- "Show sold sponsorships targeting mobile US audiences":
734
+ ### "Show sold sponsorships targeting mobile US audiences":
600
735
  ```bash
601
736
  tl db pg "SELECT al.id, c.channel_name, c.demographic_device_primary, c.demographic_usa_share, al.weighted_price
602
737
  FROM thoughtleaders_adlink al
@@ -608,7 +743,7 @@ tl db pg "SELECT al.id, c.channel_name, c.demographic_device_primary, c.demograp
608
743
  LIMIT 500 OFFSET 0" --json
609
744
  ```
610
745
 
611
- "Find channels similar to one I know" (similarity recommender, 25 credits per call):
746
+ ### "Find channels similar to one I know" (similarity recommender):
612
747
  ```bash
613
748
  tl channels similar 29834 --limit 10 # by ID (defaults to msn:yes, tpp:both)
614
749
  tl channels similar "Tremending girls" --limit 5 # by unique name
@@ -620,7 +755,7 @@ tl channels similar 29834 min-subs:1000000 exclude:477487 --limit 15 # client-s
620
755
  ```
621
756
  **Both `tl channels show` and `tl channels similar` accept either a numeric channel ID or a channel name.** Name arguments are case-insensitive partial matches; if more than one active channel matches, the command prints a candidates table (channel_id, subscribers, name) and exits 1 so you can retry with a specific ID. The `msn` filter on `similar` is tri-state: `yes` (only MSN channels — the default), `no` (only non-MSN channels), `both` (no MSN filter). `tl channels look-alike` is a hidden alias for `similar` that matches the internal "look-alike channels" terminology.
622
757
 
623
- "Browse the recommender" (categories, demographics, formats — `tl recommender tags` is free):
758
+ ### "Browse the recommender" (categories, demographics, formats):
624
759
  ```bash
625
760
  tl recommender tags # Full tag list (free)
626
761
  tl recommender tags cooking # Search tag names by substring
@@ -637,16 +772,3 @@ tl recommender channels-for-brand 6037 msn:yes language:en --limit 30
637
772
  tl recommender brands-for-channel 29834 --limit 30 # Brands likely to sponsor this channel
638
773
  tl recommender brands-for-channel "MrBeast" mbn:yes --limit 30 # Same, restricted to MBN brand profiles
639
774
  ```
640
-
641
- **Filters on the recommender commands:**
642
-
643
- | Command | Filters |
644
- | --- | --- |
645
- | `top-channels` | `msn:<yes\|no\|all>` (default all), `exclude-for-profile:<id>` |
646
- | `top-profiles` | `mbn:<yes\|no\|all>` (default all), `exclude-for-channel:<id>` |
647
- | `top-brands` | `mbn:<yes\|no\|all>` (default all) |
648
- | `channels-for-profile` | `language:<iso>` (default `en`), `msn:<yes\|no>` (default `no`) |
649
- | `channels-for-brand` | same as `channels-for-profile` |
650
- | `brands-for-channel` | `mbn:<yes\|no\|all>` (default `all`) |
651
-
652
- Use `tl recommender top` for category/topic discovery (it's ranked) and `tl channels similar` / `tl brands similar` for 1:1 lookalike searches.
@@ -261,6 +261,10 @@ tl db es '{
261
261
  }'
262
262
  ```
263
263
 
264
+ ## Text analyzer behavior
265
+
266
+ `text` fields on article docs (`title`, `summary`, `transcript`) appear to use the `standard` analyzer (tokenize + lowercase, no stemmer, no English-possessive filter), so inflections, plurals, and possessives are each indexed as distinct terms. For example: `bitcoin` (4,466,300) vs `bitcoins` (489,262). For stemming-style recall, expand the query side with a `bool.should` over the variants.
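
A minimal sketch of the query-side expansion, assuming phrase-type `multi_match` clauses over the three text fields. The helper name `variants_query` is hypothetical, and the caller supplies the variant list.

```python
import json

TEXT_FIELDS = ["title", "summary", "transcript"]


def variants_query(variants, fields=TEXT_FIELDS):
    """Build an ES body that matches any of the given surface forms."""
    should = [
        {"multi_match": {"query": v, "type": "phrase", "fields": list(fields)}}
        for v in variants
    ]
    return {
        "size": 0,
        "track_total_hits": True,
        "query": {
            "bool": {
                "filter": [{"term": {"doc_type": "article"}}],
                "should": should,
                # Needed because a bool query that also has a filter clause
                # otherwise treats should clauses as optional.
                "minimum_should_match": 1,
            }
        },
    }


body = variants_query(["bitcoin", "bitcoins"])
print(json.dumps(body))
```

The printed body can be passed as the argument to `tl db es '...'`.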
267
+
264
268
  ## Notes & gotchas
265
269
 
266
270
  - **Composite IDs:** `tl-platform.id` and `_id` are `<channel_id>:<youtube_id>`. The `youtube_id` portion alone is what Firebolt's `article_metrics.id` stores.
@@ -135,7 +135,7 @@ Before running, confirm:
135
135
  2. **Entity type** — one of `channels` / `brands` / `articles` / `sponsorships`. Infer from context, but translate user-facing vocabulary:
136
136
  - YouTube URLs / handles / `UC…` IDs → `channels`
137
137
  - Domains / brand slugs → `brands`
138
- - "videos" / "uploads" / video URLs / video IDs → `articles` *(the CLI calls them uploads in `tl uploads list`, but `bulk-import` expects `articles` same concept, legacy naming)*
138
+ - "videos" / "uploads" / video URLs / video IDs → `articles` *(the CLI surfaces them as uploads, e.g. `tl uploads show <id>`, but `bulk-import` expects `articles`; same concept, legacy naming)*
139
139
  - "adlinks" / "deals" / "sponsorships" / numeric AdLink IDs → `sponsorships`
140
140
  3. **Identifiers** — the list. Accepted shapes per entity:
141
141
  - **channels**: numeric DB IDs, YouTube channel IDs (`UC…`), `@handles`, full YouTube URLs (`/@…`, `/channel/UC…`, `/user/…`)
@@ -0,0 +1,165 @@
1
+ ---
2
+ name: tl-keyword-research
3
+ description: |
4
+ Broaden and rank a set of content-search keywords. Invoke when the user wants to find videos or channels by content keywords (topics, concepts, niches) — not by ID or exact name. Takes one or more seed keywords (or an NL phrase), proposes related candidates, probes Elasticsearch for each one against the `title` / `summary` / `transcript` fields, and returns a strict JSON object `{"keywords":[{"keyword","count"},...]}` sorted descending by document count. The output is meant to feed the next step (typically a `tl db es` content search with the surviving high-count keywords).
5
+ ---
6
+
7
+ # tl-keyword-research
8
+
9
+ Widen and rank content-search keywords before running the actual ES content search. Two phases: the agent expands the seed keyword(s) into a broader candidate set; the bundled script probes ES for each candidate and returns the ranked counts.
10
+
11
+ ## When to invoke
12
+
13
+ Invoke this skill — directly, or as a delegated step from another skill / agent — when:
14
+
15
+ - The user wants to find **videos or channels by content keywords** (topics, concepts, niches), not by ID or by exact name.
16
+ - The user supplies at least one seed keyword, or an NL phrase from which seeds can be derived.
17
+ - The goal is to **widen** the keyword set the user came in with before running the actual content search.
18
+
19
+ Skip when:
20
+
21
+ - The user has explicit channel / brand IDs or names → use `tl channels find` / `tl brands find` instead.
22
+ - The user's intent maps cleanly to an existing recommender tag (e.g. "Cooking channels") → use `tl recommender top-channels "<tag>"` instead. Recommender tags are curated; don't re-discover them through keyword text matching.
23
+
24
+ ## Inputs
25
+
26
+ - **Seed keywords** — one or more strings supplied by the caller (or extracted from an NL phrase).
27
+ - **Optional time window** — `--since YYYY-MM-DD` and / or `--until YYYY-MM-DD`. Scopes the probes to `publication_date` within that range. Default: all-time.
28
+
29
+ ## Two phases
30
+
31
+ ### Phase 1 — Expand (you, the agent)
32
+
33
+ Take the seed keyword(s) and broaden them with:
34
+
35
+ - **Synonyms** — `"crypto"` → `"cryptocurrency"`, `"digital currency"`.
36
+ - **Sub-areas / adjacent concepts** — `"crypto"` → `"bitcoin"`, `"ethereum"`, `"DeFi"`, `"NFT"`, `"blockchain"`, `"Web3"`.
37
+ - **Specific multi-word phrases** — `"crypto"` → `"how to buy bitcoin"`, `"smart contract"`.
38
+ - **Inflectional variants** — ES text fields aren't stemmed (see the [ES schema reference](../tl/references/elasticsearch-schema.md#text-analyzer-behavior)), so each surface form is counted independently. Propose singular, plural, base verb, `-ing` form, and irregular past tense as needed; skip possessives — they rarely add reach. For example: `"review"` / `"reviews"`, `"invest"` / `"investing"`, `"swim"` / `"swam"`.
39
+ - **Reasonable alternate spellings / abbreviations** — `"ethereum"` → `"ETH"`.
40
+
41
+ Produce **5–15** candidates including the seed(s). Cap at ~20 — every candidate costs one ES probe.
42
+
43
+ Hard rules:
44
+
45
+ - DO propose generic topic / concept terms.
46
+ - **Brand names — only mirror the seeds.** If the seed set is purely topic-shaped (`"crypto"`, `"productivity"`, `"home renovation"`), do NOT introduce brand names; brands should be resolved by `tl brands find` to integer IDs and queried through `sponsored_brand_mentions` / `organic_brand_mentions`, not by free-text match. Only if the seeds **already contain at least one brand name** (e.g. the caller is hunting for competitor coverage or adjacent sponsorship mentions in transcripts) is it appropriate to expand with adjacent brand names in the same category — e.g. seed `"NordVPN"` → `"Surfshark"`, `"ExpressVPN"`, `"Mullvad"` is fine; seed `"crypto"` → adding `"Coinbase"` is not.
47
+ - DON'T propose specific channel names (e.g. `"MrBeast"`). Same path: `tl channels find`.
48
+ - DON'T propose random-letter junk to pad the list.
49
+
50
+ #### Determine AND vs OR semantics
51
+
52
+ Decide upfront how the caller will combine the keywords downstream, and pass the result to the script with `--operator AND|OR`. The decision shapes both the expansion (see "Expansion shape under `AND`" after the list) and the output envelope:
53
+
54
+ - **Default `OR`.** Most off-taxonomy queries are union-style ("crypto channels" matches any of crypto / bitcoin / Web3 / …).
55
+ - **`AND` only when the user's phrasing carries clear intersection semantics:**
56
+ - **Composite noun phrases** — `"AI cooking"`, `"Roman naval warfare"`, `"vegan keto"`.
57
+ - **Explicit conjunctions** — `"both X and Y"`, `"covering both X and Y"`.
58
+ - When in doubt, OR.
59
+
60
+ **Expansion shape under `AND`:** keep candidates **inside the intersection** — don't broaden across each component independently. For `"Roman naval warfare"`, expand within Roman-naval territory (`Punic Wars`, `Roman navy`, `trireme`, `Battle of Actium`); do NOT add generic Roman-empire or generic naval-warfare terms, because the downstream AND combine would then over-match unrelated channels.
61
+
62
+ ### Phase 2 — Rank (mechanical, via the bundled script)
63
+
64
+ Run the bundled script. It takes the candidate list, sends one `size:0` + `track_total_hits` phrase probe per keyword to `tl db es` against `["title", "summary", "transcript"]`, and prints the ranked JSON on stdout.
65
+
66
+ Three invocations cover almost every case. **Pick by the question shape** (channel vs video vs AND-composite):
67
+
68
+ ```bash
69
+ # (a) Channel search by topic — default fields (title, summary, transcript)
70
+ python3 skills/tl-keyword-research/scripts/probe.py crypto bitcoin DeFi Web3 blockchain "smart contract"
71
+
72
+ # (b) Video search by topic — REQUIRED: pass --fields title,summary
73
+ # The default field set includes `transcript`, which inflates counts via
74
+ # incidental mentions inside long videos. For video-level discovery the
75
+ # downstream ES query also uses title+summary, so the probe MUST match.
76
+ python3 skills/tl-keyword-research/scripts/probe.py --fields title,summary \
77
+ "budget meal prep" "cheap meal prep" "meal prep on a budget" "frugal recipes"
78
+
79
+ # (c) Composite noun ("both X and Y") — pass --operator AND so candidates stay
80
+ # inside the intersection (don't broaden each component independently)
81
+ python3 skills/tl-keyword-research/scripts/probe.py --operator AND \
82
+ "3d printing" "miniature painting" "tabletop miniatures" "resin printing minis"
83
+ ```
84
+
85
+
86
+ **The same three shapes, invoked via `<SKILL_DIR>` instead of the repo-relative path:**
87
+
88
+ ```bash
89
+ # (a) Channel search by topic — default fields (title, summary, transcript)
90
+ python3 <SKILL_DIR>/scripts/probe.py crypto bitcoin DeFi
91
+
92
+ # (b) Video search by topic — REQUIRED: pass --fields title,summary
93
+ # Without it, the probe includes transcript matches (noise from passing
94
+ # mentions inside long videos), and the count won't match the field set
95
+ # the downstream ES query uses for video-level discovery.
96
+ python3 <SKILL_DIR>/scripts/probe.py --fields title,summary \
97
+ "budget meal prep" "cheap meal prep" "meal prep on a budget"
98
+
99
+ # (c) Composite-noun phrase ("both X and Y" / "X-themed Y") — pass --operator AND
100
+ # to keep candidates inside the intersection
101
+ python3 <SKILL_DIR>/scripts/probe.py --operator AND \
102
+ "Roman naval warfare" "Punic Wars" trireme "Roman navy"
103
+ ```
104
+
105
+ Other input / scoping forms:
106
+
107
+ ```bash
108
+ # JSON array on stdin
109
+ echo '["crypto","bitcoin","DeFi"]' | python3 <SKILL_DIR>/scripts/probe.py
110
+
111
+ # Newline-separated on stdin
112
+ printf 'crypto\nbitcoin\nDeFi\n' | python3 <SKILL_DIR>/scripts/probe.py
113
+
114
+ # Time window (optional, applies to publication_date)
115
+ python3 <SKILL_DIR>/scripts/probe.py --since 2025-01-01 --until 2026-01-01 crypto bitcoin
116
+ ```
117
+
118
+ The script:
119
+
120
+ 1. Reads keywords from argv (preferred) or stdin (JSON array or newline-separated). Deduplicates case-insensitively; the first spelling wins.
121
+ 2. For each keyword, sends a `multi_match` phrase query against `["title", "summary", "transcript"]` with `size:0` and `track_total_hits:true`. Optionally scopes by `publication_date`.
122
+ 3. Reads `total` from the response envelope (falls back to `hits.total.value` if absent).
123
+ 4. Sorts descending by count.
124
+ 5. Prints the canonical JSON object on stdout.
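
Steps 2 and 3 can be sketched as follows. This is not the bundled script's code, just an illustration under stated assumptions: the function names are hypothetical, `--until` is treated as inclusive (`lte`), and no `doc_type` filter is added because the text above does not say whether the real probe uses one.

```python
DEFAULT_FIELDS = ("title", "summary", "transcript")


def probe_body(keyword, fields=DEFAULT_FIELDS, since=None, until=None):
    """Count-only phrase probe for one keyword, optionally date-scoped."""
    date_range = {}
    if since:
        date_range["gte"] = since
    if until:
        date_range["lte"] = until  # assumption: upper bound is inclusive
    filters = [{"range": {"publication_date": date_range}}] if date_range else []
    return {
        "size": 0,
        "track_total_hits": True,
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": keyword,
                            "type": "phrase",
                            "fields": list(fields),
                        }
                    }
                ],
                "filter": filters,
            }
        },
    }


def read_total(response):
    """Prefer the envelope's `total`; fall back to hits.total.value."""
    if "total" in response:
        return int(response["total"])
    return int(response["hits"]["total"]["value"])


body = probe_body("smart contract", since="2025-01-01")
print(body["query"]["bool"]["filter"])
```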
125
+
126
+ If a single probe fails (auth, transport, server error), the script exits non-zero and writes the error to stderr — partial output is not produced.
127
+
128
+ ## Output (strict)
129
+
130
+ A **single JSON object** on stdout — no prose, no markdown fences:
131
+
132
+ ```json
133
+ {
134
+ "operator": "OR",
135
+ "keywords": [
136
+ {"keyword": "crypto", "count": 18742},
137
+ {"keyword": "bitcoin", "count": 15103},
138
+ {"keyword": "DeFi", "count": 4221},
139
+ {"keyword": "rugpull", "count": 0}
140
+ ]
141
+ }
142
+ ```
143
+
144
+ - `operator` is always present and is one of `"OR"` (default) or `"AND"`. It echoes whatever was passed via `--operator` and tells the caller how to combine the surviving keywords downstream (`bool.should` for OR, `bool.must` for AND, or the FilterSet equivalent).
145
+ - `keywords` sorted **descending** by `count`.
146
+ - **Zero-count entries are kept** — they signal that the agent's suggestion didn't match anything in the corpus, which is informative to the caller.
147
+ - **Deduplicated case-insensitively** — `"Crypto"` and `"crypto"` collapse to one entry; the first spelling wins.
148
+ - Each entry has exactly two keys: `keyword` (string) and `count` (integer).
149
+ - The seed keyword(s) are always included in the output, ranked alongside the suggestions.
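
The envelope rules above can be sketched in a few lines. This is a minimal illustration: `dedupe` and `build_envelope` are hypothetical names, `str.lower()` stands in for whatever case folding the script actually uses, and the counts are the ones from the example above.

```python
import json


def dedupe(keywords):
    """Case-insensitive de-duplication; the first spelling wins."""
    seen, kept = set(), []
    for kw in keywords:
        key = kw.lower()
        if key not in seen:
            seen.add(key)
            kept.append(kw)
    return kept


def build_envelope(counts, operator="OR"):
    """counts: (keyword, count) pairs. Zero-count entries are kept."""
    # sorted() is stable, so ties keep their input order.
    ranked = sorted(counts, key=lambda pair: pair[1], reverse=True)
    return {
        "operator": operator,
        "keywords": [{"keyword": k, "count": int(c)} for k, c in ranked],
    }


unique = dedupe(["Crypto", "crypto", "bitcoin", "DeFi", "rugpull"])
envelope = build_envelope(
    [("rugpull", 0), ("DeFi", 4221), ("crypto", 18742), ("bitcoin", 15103)]
)
print(json.dumps(envelope))
```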
150
+
151
+ The skill's responsibility ends at the ranked JSON. The caller decides what to do with it — typically running `tl db es` with a `multi_match` over the surviving high-count keywords against the same `title` / `summary` / `transcript` fields.
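
One way a caller might turn the ranked envelope into the follow-up search body is sketched here. The `min_count` threshold, the phrase-type matching, and the function name are assumptions made for illustration. The skill itself does not prescribe them.

```python
DEFAULT_FIELDS = ("title", "summary", "transcript")


def downstream_query(envelope, fields=DEFAULT_FIELDS, min_count=1):
    """Combine surviving keywords with bool.should (OR) or bool.must (AND)."""
    survivors = [
        entry["keyword"]
        for entry in envelope["keywords"]
        if entry["count"] >= min_count
    ]
    clauses = [
        {"multi_match": {"query": kw, "type": "phrase", "fields": list(fields)}}
        for kw in survivors
    ]
    if envelope["operator"] == "AND":
        bool_query = {"must": clauses}
    else:
        bool_query = {"should": clauses, "minimum_should_match": 1}
    return {"query": {"bool": bool_query}}


example = {
    "operator": "OR",
    "keywords": [
        {"keyword": "crypto", "count": 18742},
        {"keyword": "bitcoin", "count": 15103},
        {"keyword": "DeFi", "count": 4221},
        {"keyword": "rugpull", "count": 0},
    ],
}
or_query = downstream_query(example)
and_query = downstream_query({**example, "operator": "AND"})
print(len(or_query["query"]["bool"]["should"]))
```

For video-level discovery, pass `fields=("title", "summary")` so the search matches the field set used in the probe.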
152
+
153
+ ## Cost
154
+
155
+ Each probe is `size:0` + `track_total_hits:true` with no aggregations — no rows are returned. At raw-DB pricing, expect roughly 1–2 credits per probe. For 10 keywords, expect ~10–20 credits total. Run `tl describe show db` to see the current rate.
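
The arithmetic behind that estimate, as a tiny sketch. The 1 to 2 credits per probe range is taken from the paragraph above and may not reflect the current rate, and `estimate_credits` is a hypothetical helper.

```python
def estimate_credits(n_keywords, per_probe=(1, 2)):
    """Return the (low, high) credit estimate for one probe per keyword."""
    low, high = per_probe
    return n_keywords * low, n_keywords * high


low, high = estimate_credits(10)
print(f"{low}-{high} credits")
```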
156
+
157
+ ## Self-check before emitting
158
+
159
+ 1. Output is a single valid JSON object on stdout — no prose, no fences.
160
+ 2. `operator` is `"AND"` only when the user phrasing carries clear intersection semantics (composite-noun phrase or explicit "both X and Y"); otherwise `"OR"`.
161
+ 3. Under `operator: "AND"`, candidates stay inside the intersection — no broadening across components independently.
162
+ 4. Every keyword is a generic topic term: no channel names, and brand names only when the seeds already include at least one brand.
163
+ 5. `keywords` array is sorted descending by `count`.
164
+ 6. Each entry has exactly `keyword` (string) and `count` (integer).
165
+ 7. The seed keyword(s) appear in the output.