thoughtleaders-cli 0.6.47__tar.gz → 0.6.51__tar.gz
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/.claude-plugin/plugin.json +1 -1
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/AGENTS.md +7 -1
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/API.md +7 -7
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/PKG-INFO +1 -1
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/pyproject.toml +1 -1
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl/SKILL.md +66 -33
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl/references/elasticsearch-schema.md +4 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-import/SKILL.md +1 -1
- thoughtleaders_cli-0.6.51/skills/tl-keyword-research/SKILL.md +165 -0
- thoughtleaders_cli-0.6.51/skills/tl-keyword-research/scripts/probe.py +156 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/SKILL.md +13 -26
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/examples/e2e_findings.md +13 -8
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/examples/golden_queries.md +6 -6
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/tools/column_builder.md +1 -1
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/__init__.py +1 -1
- thoughtleaders_cli-0.6.51/src/tl_cli/commands/uploads.py +41 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/config.py +0 -1
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/main.py +7 -6
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/self_update.py +14 -3
- thoughtleaders_cli-0.6.47/skills/tl-report-builder/tools/keyword_research.md +0 -217
- thoughtleaders_cli-0.6.47/src/tl_cli/commands/ask.py +0 -54
- thoughtleaders_cli-0.6.47/src/tl_cli/commands/uploads.py +0 -86
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/.claude-plugin/marketplace.json +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/.github/workflows/python-publish.yml +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/.gitignore +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/CLAUDE.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/LICENSE +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/README.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/agents/tl-analyst.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/hooks/hooks.json +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/hooks/scripts/load-tl-skill.mjs +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/hooks/scripts/post-usage.sh +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/hooks/scripts/pre-check.sh +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl/references/business-glossary.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl/references/firebolt-schema.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl/references/postgres-schema.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/references/columns_brands.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/references/columns_channels.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/references/columns_content.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/references/columns_sponsorships.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/references/intelligence_filterset_schema.json +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/references/intelligence_widget_schema.json +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/references/report_glossary.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/references/sortable_columns.json +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/references/sponsorship_filterset_schema.json +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/references/sponsorship_widget_schema.json +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/references/widgets.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/tools/database_query.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/tools/name_resolver.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/tools/sample_judge.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/tools/similar_channels.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/tools/topic_matcher.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-report-builder/tools/widget_builder.md +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/_completions.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/auth/__init__.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/auth/commands.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/auth/finalize.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/auth/login.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/auth/pkce.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/auth/token_store.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/client/__init__.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/client/errors.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/client/http.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/__init__.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/_comments_common.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/balance.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/brands.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/bulk_import.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/changelog.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/channels.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/credits.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/db.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/deals.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/describe.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/doctor.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/matches.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/proposals.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/recommender.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/reports.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/schema.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/setup.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/snapshots.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/sponsorships.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/commands/whoami.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/filters.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/hints.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/output/__init__.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/src/tl_cli/output/formatter.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/tests/__init__.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/tests/test_auth.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/tests/test_filters.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/tests/test_http_auth.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/tests/test_output.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/tests/test_reports.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/tests/test_sponsorships.py +0 -0
- {thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/uv.lock +0 -0
|
@@ -56,6 +56,13 @@ The CLI integrates with AI coding agents via skills, commands, agents, and hooks
|
|
|
56
56
|
|
|
57
57
|
This repo is also a Claude Code plugin, and can directly be installed as one.
|
|
58
58
|
|
|
59
|
+
### Bundled skills — when to invoke
|
|
60
|
+
|
|
61
|
+
- **`tl`** — the main skill for querying ThoughtLeaders data. Default for any sponsorship / channel / brand / upload / report question.
|
|
62
|
+
- **`tl-keyword-research`** — invoke whenever the user wants to find videos or channels by **content keywords** (topics, concepts, niches) that aren't covered by a curated recommender tag, OR to validate that a candidate channel's content actually touches a given topic. Returns `{operator, keywords:[{keyword,count}]}` from a ranked ES probe over `title` / `summary` / `transcript`; the caller then runs the actual content search with the surviving high-count terms. **Do not compose keyword sets by hand for `tl db es` content searches — delegate to this skill first.** See `skills/tl/SKILL.md` → *Channel & video discovery* for the four-path decision tree and when to use this vs the recommender / raw SQL.
|
|
63
|
+
- **`tl-report-builder`** — invoke when the user wants to build, refine, or save a platform report (campaign config, FilterSet, columns, widgets). Multi-phase flow: routing → schema + validation → columns → widgets.
|
|
64
|
+
- **`tl-import`**, **`tl-save-report`**, **`adapt-tl-data`** — narrower workflows; the skill files document their own triggers.
|
|
65
|
+
|
|
59
66
|
### Skill content boundaries
|
|
60
67
|
|
|
61
68
|
Skills under `skills/` are split into a `SKILL.md` and one or more `references/*.md` files. To prevent drift, each fact has exactly one home:
|
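As a rough illustration of how a caller might consume the `tl-keyword-research` output described in this hunk, the sketch below keeps only the keywords whose probe count clears a cutoff. The `surviving_keywords` helper and its `min_count` parameter are hypothetical: the text only says to continue with the "surviving high-count terms" and does not define a threshold. The sample payload is the one quoted in the `skills/tl/SKILL.md` hunk.

```python
import json
from typing import List


def surviving_keywords(probe_output: str, min_count: int = 1) -> List[str]:
    """Return keywords whose probe count is at least min_count, highest first.

    Assumes the {operator, keywords:[{keyword,count}]} shape quoted above.
    min_count is an illustrative cutoff, not a documented default.
    """
    payload = json.loads(probe_output)
    ranked = sorted(payload["keywords"], key=lambda k: k["count"], reverse=True)
    return [k["keyword"] for k in ranked if k["count"] >= min_count]


sample = (
    '{"operator": "OR", "keywords": ['
    '{"keyword": "crypto", "count": 18742}, '
    '{"keyword": "bitcoin", "count": 15103}, '
    '{"keyword": "rugpull", "count": 0}]}'
)
print(surviving_keywords(sample))  # ['crypto', 'bitcoin']
```

A zero-count term such as `rugpull` is dropped here; a stricter cutoff would be a per-query judgment call rather than something the skill prescribes.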
|
@@ -90,7 +97,6 @@ All list endpoints return: `{ results, total, limit, offset, usage: { credits_ch
|
|
|
90
97
|
- `TL_API_URL` — API base (default: `https://app.thoughtleaders.io`)
|
|
91
98
|
- `TL_API_KEY` — Bearer token override for CI/scripts
|
|
92
99
|
- `TL_AUTH0_DOMAIN`, `TL_AUTH0_CLIENT_ID`, `TL_AUTH0_AUDIENCE` — Auth0 config
|
|
93
|
-
- `TL_LLM_KEY` — User's own LLM key for `tl ask` (avoids surcharge)
|
|
94
100
|
|
|
95
101
|
## Credit System
|
|
96
102
|
|
|
@@ -26,7 +26,7 @@ All requests must carry both of:
|
|
|
26
26
|
| `Authorization` | `Bearer <api_key>` | The credential. |
|
|
27
27
|
| `X-TL-Auth` | `API-KEY` | Opts into API-key auth. Without it the server interprets the Bearer as an Auth0 JWT and rejects the API-key string. |
|
|
28
28
|
|
|
29
|
-
|
|
29
|
+
Inactive or expired keys return `401`; if the owning user is deactivated the request fails with `403`.
|
|
30
30
|
|
|
31
31
|
A quick set of shell variables used throughout this page:
|
|
32
32
|
|
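The header table and the new `401` / `403` sentence in this hunk can be sketched as a pair of small helpers. The function names are hypothetical, and the status-code mapping covers only the two cases the text documents; a real server may return these codes for other reasons as well.

```python
from typing import Dict, Optional


def api_key_headers(api_key: str) -> Dict[str, str]:
    """Build the two headers that every API-key request must carry."""
    return {
        "Authorization": f"Bearer {api_key}",  # the credential itself
        "X-TL-Auth": "API-KEY",  # opts into API-key auth instead of an Auth0 JWT
    }


def explain_auth_failure(status_code: int) -> Optional[str]:
    """Map the documented 401/403 cases to a hint; other codes return None."""
    if status_code == 401:
        return "API key is inactive or expired"
    if status_code == 403:
        return "the user who owns the API key is deactivated"
    return None


print(api_key_headers("example-key"))
print(explain_auth_failure(401))
```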
|
@@ -76,7 +76,7 @@ Multi-row responses share one shape:
|
|
|
76
76
|
"usage": {
|
|
77
77
|
"credits_charged": 4.12,
|
|
78
78
|
"credit_rate": 1.4,
|
|
79
|
-
"balance_remaining":
|
|
79
|
+
"balance_remaining": 9995.88
|
|
80
80
|
},
|
|
81
81
|
"_breadcrumbs": [
|
|
82
82
|
{ "hint": "next page", "command": "..." }
|
|
@@ -127,7 +127,7 @@ print(get('/whoami'))
|
|
|
127
127
|
"name": "Acme Marketing",
|
|
128
128
|
"plan": "Intelligence",
|
|
129
129
|
"is_managed_services": false,
|
|
130
|
-
"credits_balance":
|
|
130
|
+
"credits_balance": 9995.88
|
|
131
131
|
},
|
|
132
132
|
"associated_profiles": [ ... ],
|
|
133
133
|
"brands": [ ... ]
|
|
@@ -154,7 +154,7 @@ print(get('/balance'))
|
|
|
154
154
|
|
|
155
155
|
```json
|
|
156
156
|
{
|
|
157
|
-
"balance":
|
|
157
|
+
"balance": 9995.88,
|
|
158
158
|
"allow_overage": false,
|
|
159
159
|
"recent_usage": [
|
|
160
160
|
{
|
|
@@ -200,13 +200,13 @@ print(post('/raw/pg', {'query': sql}))
|
|
|
200
200
|
```json
|
|
201
201
|
{
|
|
202
202
|
"results": [
|
|
203
|
-
{"id": 12345, "channel_name": "MrBeast", "reach":
|
|
203
|
+
{"id": 12345, "channel_name": "MrBeast", "reach": 320000000},
|
|
204
204
|
...
|
|
205
205
|
],
|
|
206
206
|
"total": 5,
|
|
207
207
|
"limit": 5,
|
|
208
208
|
"offset": 0,
|
|
209
|
-
"usage": { "credits_charged": 1.84, "credit_rate": 1.4, "balance_remaining":
|
|
209
|
+
"usage": { "credits_charged": 1.84, "credit_rate": 1.4, "balance_remaining": 9994.04 }
|
|
210
210
|
}
|
|
211
211
|
```
|
|
212
212
|
|
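The example values filled in by these hunks are arithmetically consistent with one another, which a short sketch can check. It assumes (this is a reading of the examples, not something the text states) that `balance_remaining` is the balance after `credits_charged` has been deducted. `balance_before` is a hypothetical helper, and `Decimal` is used only to avoid float rounding noise.

```python
from decimal import Decimal


def balance_before(usage: dict) -> Decimal:
    """Reconstruct the pre-call balance from a response's usage block."""
    remaining = Decimal(str(usage["balance_remaining"]))
    charged = Decimal(str(usage["credits_charged"]))
    return remaining + charged


# usage block from the /raw/pg example above
usage = {"credits_charged": 1.84, "credit_rate": 1.4, "balance_remaining": 9994.04}
print(balance_before(usage))  # 9995.88
```

Under that assumption, the `/raw/pg` example starts from the same `9995.88` balance shown in the `/whoami` and `/balance` examples.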
|
@@ -381,7 +381,7 @@ def post(path, body):
|
|
|
381
381
|
r.raise_for_status()
|
|
382
382
|
return r.json()
|
|
383
383
|
|
|
384
|
-
# 1) Find Holafly's brand id
|
|
384
|
+
# 1) Find Holafly's brand id
|
|
385
385
|
brand = post('/raw/pg', {'query':
|
|
386
386
|
"SELECT id FROM thoughtleaders_brand WHERE name = 'Holafly' LIMIT 1 OFFSET 0"
|
|
387
387
|
})['results'][0]['id']
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: thoughtleaders-cli
|
|
3
|
-
Version: 0.6.
|
|
3
|
+
Version: 0.6.51
|
|
4
4
|
Summary: ThoughtLeaders CLI — query sponsorship data, channels, brands, and intelligence
|
|
5
5
|
Project-URL: Homepage, https://thoughtleaders.io
|
|
6
6
|
Project-URL: Repository, https://github.com/ThoughtLeaders-io/thoughtleaders-cli
|
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: tl
|
|
3
3
|
description: |
|
|
4
|
-
Query and analyze YouTube sponsorship data using the `tl` CLI. Use this skill for
|
|
4
|
+
Query and analyze YouTube sponsorship data using the `tl` CLI. Use this skill for finding channels, brands and sponsorships, and for data exploration, including counts, metrics, trends, time-series, distributions, single-record drill-downs, revenue / pipeline-weighting math, view-curve analysis, cross-source business questions. Examples: "How many deals did we close last quarter?", "What's the weighted pipeline by sales owner?", "Show me the view curve for video X", "Find mentions of Surfshark in transcripts", "Investigate this video", "Find channels...", "Find brands...".
|
|
5
5
|
---
|
|
6
6
|
|
|
7
7
|
# ThoughtLeaders Data Analyst
|
|
@@ -140,13 +140,13 @@ Unless the user specifically asks for running a specific report or showing the r
|
|
|
140
140
|
3. **Decide the method of discovery**: If the user wants to explore certain topics, use the recommender commands. If it's more about filtering, construct a query for PG or ES.
|
|
141
141
|
4. **Always use --json**: Parse JSON output for multi-step analysis.
|
|
142
142
|
5. **Chain commands**: For complex questions, chain multiple `tl` commands, shell commands, and other tools.
|
|
143
|
-
6. **Format results**: When the user asks for a list or tabular data, present the results as a well-formatted markdown table. Pick the most relevant columns and use clear headers.
|
|
143
|
+
6. **Format results**: When the user asks for a list or tabular data, present the results as a well-formatted markdown table. Pick the most relevant columns and use clear headers. Sort the result by relevant criteria - if the user asked for "top performers", order by the performance metric; if the user asked for "most recent", sort by the pertinent date desc. When the result is tabular, ask the user if they want to save the list as a report and, if they do, invoke the `tl-save-report` skill.
|
|
144
144
|
|
|
145
|
-
Prefer writing shell code, `jq` commands, or `duckdb` commands that fetch or analyse large sets of data instead of analysing it yourself.
|
|
145
|
+
Prefer writing shell code, `jq` commands, or `duckdb` commands that fetch or analyse large sets of data instead of analysing it yourself. On Mac and Linux, create temporary files in `/tmp` that can be analysed later in different ways. On Windows, create them in `%USERPROFILE%\AppData\Local\Temp`. Before analysing a potentially large result set, first try fetching just a single result with `LIMIT 1` and without `jq` etc., to see the shape of the data and any error messages.
|
|
146
146
|
|
|
147
147
|
## Available Commands
|
|
148
148
|
|
|
149
|
-
Note that if you're working on Windows, you must set up UTF-8 in the
|
|
149
|
+
Note that if you're working on Windows, you must set up UTF-8 in the terminal with `PYTHONIOENCODING=utf-8 tl ...`, because all of these commands return UTF-8 data.
|
|
150
150
|
|
|
151
151
|
### Data queries
|
|
152
152
|
|
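The temp-file and UTF-8 guidance in this hunk could be wrapped in a small planning helper like the one below. This is a sketch under stated assumptions: `plan_tl_run` and its parameters are hypothetical, and it only prepares the argument list, environment, and output path; it does not execute `tl`. The `tl db pg "<sql>" --json` form is taken from the examples later in this file.

```python
import os
import sys
from pathlib import Path
from typing import Dict, List, Tuple


def plan_tl_run(
    sql: str, name: str = "tl_result.json", platform: str = sys.platform
) -> Tuple[List[str], Dict[str, str], Path]:
    """Return (argv, env, output_path) for a `tl db pg` call.

    Follows the guidance above: /tmp on Mac and Linux,
    %USERPROFILE%\\AppData\\Local\\Temp on Windows, and
    PYTHONIOENCODING=utf-8 on Windows.
    """
    env = dict(os.environ)
    if platform.startswith("win"):
        env["PYTHONIOENCODING"] = "utf-8"  # tl commands return UTF-8 data
        base = Path(os.environ.get("USERPROFILE", "")) / "AppData" / "Local" / "Temp"
    else:
        base = Path("/tmp")
    argv = ["tl", "db", "pg", sql, "--json"]
    return argv, env, base / name


# Probe with LIMIT 1 first to see the shape of the data, as advised above.
argv, env, out_path = plan_tl_run("SELECT id FROM thoughtleaders_brand LIMIT 1 OFFSET 0")
print(argv, out_path)
```

The returned triple can be handed to `subprocess.run(argv, env=env, ...)` with stdout redirected to `out_path`, so that later `jq` or `duckdb` passes reuse the same file.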
|
@@ -391,7 +391,7 @@ See [references/postgres-schema.md](references/postgres-schema.md) for the accep
|
|
|
391
391
|
### Three sources, each authoritative for different things
|
|
392
392
|
|
|
393
393
|
- **Postgres** — deals, pipeline, brands, channels, users, organizations, profiles, revenue. Source of truth for deal state. Reachable via the structured `tl` commands or raw `tl db pg`.
|
|
394
|
-
- **Elasticsearch** — videos, transcripts, brand mentions, **current** channel/video metrics, demographics. Reachable via `tl
|
|
394
|
+
- **Elasticsearch** — videos, transcripts, brand mentions, **current** channel/video metrics, demographics. Reachable via `tl db es`.
|
|
395
395
|
- **Firebolt** — **historical** time-series snapshots only (view curves over time, subscriber-growth trends). Reachable via `tl snapshots` (preferred) or `tl db fb`.
|
|
396
396
|
|
|
397
397
|
**Use Firebolt only when you need a value AT A POINT IN TIME that no longer exists in the current ES/PG snapshot.** For "current views/subs", use ES.
|
|
@@ -449,9 +449,18 @@ tl changelog since v0.4.10 # Notes from v0.4.10 to latest
|
|
|
449
449
|
tl changelog --md > CHANGELOG.md # Capture for a doc
|
|
450
450
|
```
|
|
451
451
|
|
|
452
|
-
#### Channel discovery —
|
|
452
|
+
#### Channel & video discovery — pick the path for the question shape
|
|
453
453
|
|
|
454
|
-
|
|
454
|
+
Four first-class paths, each with a different signal. **Pick by the SHAPE of the user's question, not by habit.** "Recommender first" is the right default only for path 2 — for paths 1, 3, and 4 the recommender is the wrong tool.
|
|
455
|
+
|
|
456
|
+
**1. Named entity** — user named a specific channel, brand, or YouTube URL/handle/ID (`"MrBeast"`, `"NordVPN"`, `"@mkbhd"`, `"youtu.be/..."`). Use `tl channels find` / `tl brands find` — single-step resolver returning `{id, name}`. Cheap, deterministic, no expansion.
|
|
457
|
+
|
|
458
|
+
```bash
|
|
459
|
+
tl channels find "MrBeast"
|
|
460
|
+
tl brands find "NordVPN"
|
|
461
|
+
```
|
|
462
|
+
|
|
463
|
+
**2. Curated tag / category / demographic** — user named a topic that maps cleanly to a recommender tag (`"Cooking"`, `"Tech"`, `"USA share"`, content categories, format hints). Use the recommender — it ranks channels by how strongly they load on a tag, returning ranked similarity scores instead of forcing exact equality. It also returns matching brand profiles alongside the channels — useful when the user wants to know "who buys this kind of inventory."
|
|
455
464
|
|
|
456
465
|
```bash
|
|
457
466
|
# Discover the right tag name first (free)
|
|
@@ -464,7 +473,41 @@ tl recommender top-channels "Tech" --limit 30
|
|
|
464
473
|
tl recommender top-brands "USA share" mbn:yes --limit 50
|
|
465
474
|
```
|
|
466
475
|
|
|
467
|
-
|
|
476
|
+
**Available filters on the recommender commands:**
|
|
477
|
+
|
|
478
|
+
| Command | Filters |
|
|
479
|
+
| --- | --- |
|
|
480
|
+
| `top-channels` | `msn:<yes\|no\|all>` (default all), `exclude-for-profile:<id>` |
|
|
481
|
+
| `top-profiles` | `mbn:<yes\|no\|all>` (default all), `exclude-for-channel:<id>` |
|
|
482
|
+
| `top-brands` | `mbn:<yes\|no\|all>` (default all) |
|
|
483
|
+
| `channels-for-profile` | `language:<iso>` (default `en`), `msn:<yes\|no>` (default `no`) |
|
|
484
|
+
| `channels-for-brand` | same as `channels-for-profile` |
|
|
485
|
+
| `brands-for-channel` | `mbn:<yes\|no\|all>` (default `all`) |
|
|
486
|
+
|
|
487
|
+
Use `tl recommender top` for category/topic discovery (it's ranked) and `tl channels similar` / `tl brands similar` for 1:1 lookalike searches. This is the fast path.
|
|
488
|
+
|
|
489
|
+
**Hand-off to path 3 when the tag doesn't fit.** If `tl recommender tags <hint>` returns no clean match, the user's intent cannot be represented by recommender tags: drop to path 3 and do NOT fake-fit a loose adjacent tag. E.g. `"crypto/Web3 channels"` is a miss even though `"cryptocurrency"` exists as a tag, because `"cryptocurrency"` is a financial-product tag, not the cultural niche the user named. The same goes for `"speedcubing"`, `"biohacking and longevity"`, and `"AI cooking"`: none of these are curated tags, so they belong in path 3.
|
|
490
|
+
|
|
491
|
+
**Also fall through to path 3 — NOT path 4 — when the recommender returns errors.** If `tl recommender top-channels "<tag>"` 5xx's or times out, the right fallback is path 3 (run the keyword-research skill against ES), not path 4 (PG `ILIKE` on `channel_name`). PG name-matching misses every channel whose name doesn't contain the literal word — that's the same anti-pattern called out at the bottom of this section.
|
|
492
|
+
|
|
493
|
+
**Also fall through to path 3 if the user wants to broaden the search.** Follow-up requests such as "broaden the search" or "find more results" indicate that the user is searching for topics beyond what the recommender tags provide.
|
|
494
|
+
|
|
495
|
+
**3. Content keywords beyond tags — invoke the `tl-keyword-research` skill** — user described content the channel OR video ACTUALLY TALKS ABOUT, and it isn't a curated tag. Triggers:
|
|
496
|
+
|
|
497
|
+
- **Channel search by topic** — `"crypto/Web3 channels"`, `"speedcubing channels"`, `"channels about biohacking and longevity"`, `"both 3D printing and miniature painting"`.
|
|
498
|
+
- **Video search by topic** — `"videos where creators discuss budget meal prep"`, `"uploads about [topic]"`, `"find videos that talk about X"`.
|
|
499
|
+
- **Channel–brand fit check** — does this candidate channel's content actually touch the brand's category? (Use with `channel.id` filter on the downstream ES query.)
|
|
500
|
+
- **Validating a recommender / SQL shortlist** — sample-check that the top-N channels really cover the niche.
|
|
501
|
+
|
|
502
|
+
**Do NOT compose keyword sets by hand for `tl db es`.** Always run the skill's script first. It broadens the user input, probes each candidate via `multi_match phrase`, and returns ranked counts:
|
|
503
|
+
|
|
504
|
+
```json
|
|
505
|
+
{"operator": "OR", "keywords": [{"keyword": "crypto", "count": 18742}, {"keyword": "bitcoin", "count": 15103}, {"keyword": "rugpull", "count": 0}]}
|
|
506
|
+
```
|
|
507
|
+
|
|
508
|
+
Then run the actual content search via `tl db es` (`multi_match` on the `title`, `summary`, `transcript` fields) with the surviving high-count keywords. The skill's full procedure (Phase 1 = seed expansion by you; Phase 2 = the script) is in the `tl-keyword-research` skill file.
|
|
509
|
+
|
|
510
|
+
**4. Pure attribute filter** — user wants channels filtered by metadata like: `is_tl_channel`, `language`, `demographic_device_primary`, country share in `demographic_geo` JSON, aggregations, joins. Use `tl db pg` with a SELECT on `thoughtleaders_channel`. Run `tl schema pg thoughtleaders_channel` once to confirm the live column set; the columns in the examples are stable.
|
|
468
511
|
|
|
469
512
|
```bash
|
|
470
513
|
# All TPP (TL-managed) channels — pure attribute filter, not a category query
|
|
@@ -483,14 +526,17 @@ tl db pg "SELECT id, channel_name, demographic_device_primary, total_views
|
|
|
483
526
|
LIMIT 100 OFFSET 0"
|
|
484
527
|
```
|
|
485
528
|
|
|
486
|
-
For per-country share beyond the recommender's "USA share" tag, use the `demographic_geo`
|
|
529
|
+
For per-country share beyond the recommender's "USA share" tag, use the `demographic_geo` JSONB field in raw SQL: `(demographic_geo->>'gb')::int >= 25`. Same pattern with `demographic_device->>'mobile'` for non-primary device shares.
|
|
487
530
|
|
|
488
531
|
**MSN status (`media_selling_network_join_date`) is scrubbed from the advertiser sandbox view.** Raw SQL can't filter on it from an advertiser context. For MSN-only / non-MSN lookups, run the same raw SQL with `media_selling_network_join_date IS [NOT] NULL` from a context that has access to it (full-access role), or rely on the recommender's MSN-aware filters: `tl recommender top-channels "<tag>" msn:yes|no|all`.
|
|
489
532
|
|
|
533
|
+
**Anti-pattern: defaulting to `ILIKE` on `channel_name` for off-tag topic queries.** If the question is "channels about X" where X is a topic / concept / niche (not a literal substring you expect in channel names), reach for path 3 (`tl-keyword-research`), not `WHERE channel_name ILIKE '%X%'`. Channel-name `ILIKE` misses channels whose name doesn't literally contain X but whose content does; the keyword-research skill catches them via `title` / `summary` / `transcript`. Use `channel_name ILIKE` only when you actually expect the channel's name to contain the term (e.g. `"Crypto"` in `"My Happy Crypto"`) as a supplementary signal alongside path 3, not as a replacement for it.
|
|
534
|
+
|
|
490
535
|
### Output flags
|
|
491
|
-
- `--json` — structured JSON (use this for parsing)
|
|
492
|
-
- `--
|
|
493
|
-
- `--
|
|
536
|
+
- `--json` — structured JSON output format (use this for parsing)
|
|
537
|
+
- `--toon` — [TOON](https://toonformat.dev/guide/getting-started.html) output format (efficient for large data sets while keeping metadata)
|
|
538
|
+
- `--csv` — CSV output format
|
|
539
|
+
- `--md` — Markdown table for user presentation only
|
|
494
540
|
- `--limit N` — max results
|
|
495
541
|
- `--offset N` — pagination
|
|
496
542
|
|
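The four discovery paths and their fall-through rules from the preceding hunks can be summarized as a small routing function. This is only a sketch: the boolean inputs stand in for judgments the agent makes when reading the question, and the precedence between them (named entity first, then pure attribute filter, then tag match) is an assumption, since the text defines the paths and the fall-through cases but not a strict evaluation order.

```python
from enum import Enum


class DiscoveryPath(Enum):
    NAMED_ENTITY = 1       # tl channels find / tl brands find
    RECOMMENDER_TAG = 2    # tl recommender ...
    KEYWORD_RESEARCH = 3   # tl-keyword-research skill, then tl db es
    ATTRIBUTE_FILTER = 4   # tl db pg on thoughtleaders_channel


def pick_path(
    *,
    names_entity: bool = False,
    attribute_filter_only: bool = False,
    clean_tag_match: bool = False,
    recommender_failed: bool = False,
    wants_broader: bool = False,
) -> DiscoveryPath:
    """Choose a discovery path from the shape of the user's question."""
    if names_entity:
        return DiscoveryPath.NAMED_ENTITY
    if attribute_filter_only:
        return DiscoveryPath.ATTRIBUTE_FILTER
    # Path 2 only when a tag fits cleanly, the recommender works,
    # and the user has not asked to broaden the search.
    if clean_tag_match and not recommender_failed and not wants_broader:
        return DiscoveryPath.RECOMMENDER_TAG
    # Topic queries with no clean tag, recommender errors, and
    # "broaden the search" requests all fall through to path 3.
    return DiscoveryPath.KEYWORD_RESEARCH


print(pick_path(clean_tag_match=True, recommender_failed=True))
```

Note that a recommender error routes to path 3 rather than path 4, matching the warning against falling back to `ILIKE` on `channel_name`.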
|
@@ -536,7 +582,7 @@ When presenting sponsorship status data, always use human-readable labels — ne
|
|
|
536
582
|
|
|
537
583
|
## Examples
|
|
538
584
|
|
|
539
|
-
"Show me my sold sponsorships this quarter":
|
|
585
|
+
### "Show me my sold sponsorships this quarter":
|
|
540
586
|
```bash
|
|
541
587
|
tl db pg "SELECT al.id, al.weighted_price, al.purchase_date, b.name AS brand
|
|
542
588
|
FROM thoughtleaders_adlink al
|
|
@@ -549,24 +595,24 @@ tl db pg "SELECT al.id, al.weighted_price, al.purchase_date, b.name AS brand
|
|
|
549
595
|
LIMIT 500 OFFSET 0" --json
|
|
550
596
|
```
|
|
551
597
|
|
|
552
|
-
"What channels does Nike sponsor?":
|
|
598
|
+
### "What channels does Nike sponsor?":
|
|
553
599
|
```bash
|
|
554
600
|
tl brands history Nike --json
|
|
555
601
|
```
|
|
556
602
|
|
|
557
|
-
"Compare view curves for two videos":
|
|
603
|
+
### "Compare view curves for two videos":
|
|
558
604
|
```bash
|
|
559
605
|
tl snapshots video abc123 --channel 456 --json
|
|
560
606
|
tl snapshots video def789 --channel 456 --json
|
|
561
607
|
```
|
|
562
608
|
|
|
563
|
-
"Run my Q1 pipeline report":
|
|
609
|
+
### "Run my Q1 pipeline report":
|
|
564
610
|
```bash
|
|
565
611
|
tl reports --json # Find the report ID first
|
|
566
612
|
tl reports run 42 --json
|
|
567
613
|
```
|
|
568
614
|
|
|
569
|
-
"Look up a channel or brand from whatever the user pasted":
|
|
615
|
+
### "Look up a channel or brand from whatever the user pasted":
|
|
570
616
|
```bash
|
|
571
617
|
# Channel: accepts name, slug, YouTube channel URL, handle (@…), raw channel ID
|
|
572
618
|
# (UC…), or any video URL. On ambiguity returns 400 with candidate {id, name};
|
|
@@ -596,7 +642,7 @@ tl recommender top-channels "Cooking" msn:yes --limit 100 --json \
|
|
|
596
642
|
LIMIT 50 OFFSET 0" --json
|
|
597
643
|
```
|
|
598
644
|
|
|
599
|
-
"Show sold sponsorships targeting mobile US audiences":
|
|
645
|
+
### "Show sold sponsorships targeting mobile US audiences":
|
|
600
646
|
```bash
|
|
601
647
|
tl db pg "SELECT al.id, c.channel_name, c.demographic_device_primary, c.demographic_usa_share, al.weighted_price
|
|
602
648
|
FROM thoughtleaders_adlink al
|
|
@@ -608,7 +654,7 @@ tl db pg "SELECT al.id, c.channel_name, c.demographic_device_primary, c.demograp
|
|
|
608
654
|
LIMIT 500 OFFSET 0" --json
|
|
609
655
|
```
|
|
610
656
|
|
|
611
|
-
"Find channels similar to one I know" (similarity recommender
|
|
657
|
+
### "Find channels similar to one I know" (similarity recommender):
|
|
612
658
|
```bash
|
|
613
659
|
tl channels similar 29834 --limit 10 # by ID (defaults to msn:yes, tpp:both)
|
|
614
660
|
tl channels similar "Tremending girls" --limit 5 # by unique name
|
|
@@ -620,7 +666,7 @@ tl channels similar 29834 min-subs:1000000 exclude:477487 --limit 15 # client-s
|
|
|
620
666
|
```
|
|
621
667
|
**Both `tl channels show` and `tl channels similar` accept either a numeric channel ID or a channel name.** Name arguments are case-insensitive partial matches; if more than one active channel matches, the command prints a candidates table (channel_id, subscribers, name) and exits 1 so you can retry with a specific ID. The `msn` filter on `similar` is tri-state: `yes` (only MSN channels — the default), `no` (only non-MSN channels), `both` (no MSN filter). `tl channels look-alike` is a hidden alias for `similar` that matches the internal "look-alike channels" terminology.
|
|
622
668
|
|
|
623
|
-
"Browse the recommender" (categories, demographics, formats
|
|
669
|
+
### "Browse the recommender" (categories, demographics, formats):
|
|
624
670
|
```bash
|
|
625
671
|
tl recommender tags # Full tag list (free)
|
|
626
672
|
tl recommender tags cooking # Search tag names by substring
|
|
@@ -637,16 +683,3 @@ tl recommender channels-for-brand 6037 msn:yes language:en --limit 30
|
|
|
637
683
|
tl recommender brands-for-channel 29834 --limit 30 # Brands likely to sponsor this channel
|
|
638
684
|
tl recommender brands-for-channel "MrBeast" mbn:yes --limit 30 # Same, restricted to MBN brand profiles
|
|
639
685
|
```
|
|
640
|
-
|
|
641
|
-
**Filters on the recommender commands:**
|
|
642
|
-
|
|
643
|
-
| Command | Filters |
|
|
644
|
-
| --- | --- |
|
|
645
|
-
| `top-channels` | `msn:<yes\|no\|all>` (default all), `exclude-for-profile:<id>` |
|
|
646
|
-
| `top-profiles` | `mbn:<yes\|no\|all>` (default all), `exclude-for-channel:<id>` |
|
|
647
|
-
| `top-brands` | `mbn:<yes\|no\|all>` (default all) |
|
|
648
|
-
| `channels-for-profile` | `language:<iso>` (default `en`), `msn:<yes\|no>` (default `no`) |
|
|
649
|
-
| `channels-for-brand` | same as `channels-for-profile` |
|
|
650
|
-
| `brands-for-channel` | `mbn:<yes\|no\|all>` (default `all`) |
|
|
651
|
-
|
|
652
|
-
Use `tl recommender top` for category/topic discovery (it's ranked) and `tl channels similar` / `tl brands similar` for 1:1 lookalike searches.
|
{thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl/references/elasticsearch-schema.md
RENAMED
|
@@ -261,6 +261,10 @@ tl db es '{
|
|
|
261
261
|
}'
|
|
262
262
|
```
|
|
263
263
|
|
|
264
|
+
## Text analyzer behavior
|
|
265
|
+
|
|
266
|
+
`text` fields on article docs (`title`, `summary`, `transcript`) appear to use the `standard` analyzer (tokenize + lowercase, no stemmer, no English-possessive filter), so inflections, plurals, and possessives are each indexed as distinct terms. For example: `bitcoin` (4,466,300) vs `bitcoins` (489,262). For stemming-style recall, expand the query side with a `bool.should` over the variants.
|
|
267
|
+
|
|
264
268
|
## Notes & gotchas
|
|
265
269
|
|
|
266
270
|
- **Composite IDs:** `tl-platform.id` and `_id` are `<channel_id>:<youtube_id>`. The `youtube_id` portion alone is what Firebolt's `article_metrics.id` stores.
|
|
@@ -135,7 +135,7 @@ Before running, confirm:
|
|
|
135
135
|
2. **Entity type** — one of `channels` / `brands` / `articles` / `sponsorships`. Infer from context, but translate user-facing vocabulary:
|
|
136
136
|
- YouTube URLs / handles / `UC…` IDs → `channels`
|
|
137
137
|
- Domains / brand slugs → `brands`
|
|
138
|
-
- "videos" / "uploads" / video URLs / video IDs → `articles` *(the CLI
|
|
138
|
+
- "videos" / "uploads" / video URLs / video IDs → `articles` *(the CLI surfaces them as uploads — `tl uploads show <id>` — but `bulk-import` expects `articles`; same concept, legacy naming)*
|
|
139
139
|
- "adlinks" / "deals" / "sponsorships" / numeric AdLink IDs → `sponsorships`
|
|
140
140
|
3. **Identifiers** — the list. Accepted shapes per entity:
|
|
141
141
|
- **channels**: numeric DB IDs, YouTube channel IDs (`UC…`), `@handles`, full YouTube URLs (`/@…`, `/channel/UC…`, `/user/…`)
|
|
@@ -0,0 +1,165 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: tl-keyword-research
|
|
3
|
+
description: |
|
|
4
|
+
Broaden and rank a set of content-search keywords. Invoke when the user wants to find videos or channels by content keywords (topics, concepts, niches), not by ID or exact name. Takes one or more seed keywords (or an NL phrase), proposes related candidates, probes Elasticsearch for each one against the `title` / `summary` / `transcript` fields, and returns a strict JSON object `{"operator":"OR|AND","keywords":[{"keyword","count"},...]}` with `keywords` sorted descending by document count. The output is meant to feed the next step (typically a `tl db es` content search with the surviving high-count keywords).
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# tl-keyword-research
|
|
8
|
+
|
|
9
|
+
Widen and rank content-search keywords before running the actual ES content search. Two phases: the agent expands the seed keyword(s) into a broader candidate set; the bundled script probes ES for each candidate and returns the ranked counts.
|
|
10
|
+
|
|
11
|
+
## When to invoke
|
|
12
|
+
|
|
13
|
+
Invoke this skill — directly, or as a delegated step from another skill / agent — when:
|
|
14
|
+
|
|
15
|
+
- The user wants to find **videos or channels by content keywords** (topics, concepts, niches), not by ID or by exact name.
|
|
16
|
+
- The user supplies at least one seed keyword, or an NL phrase from which seeds can be derived.
|
|
17
|
+
- The goal is to **widen** the keyword set the user came in with before running the actual content search.
|
|
18
|
+
|
|
19
|
+
Skip when:
|
|
20
|
+
|
|
21
|
+
- The user has explicit channel / brand IDs or names → use `tl channels find` / `tl brands find` instead.
|
|
22
|
+
- The user's intent maps cleanly to an existing recommender tag (e.g. "Cooking channels") → use `tl recommender top-channels "<tag>"` instead. Recommender tags are curated; don't re-discover them through keyword text matching.
|
|
23
|
+
|
|
24
|
+
## Inputs
|
|
25
|
+
|
|
26
|
+
- **Seed keywords** — one or more strings supplied by the caller (or extracted from an NL phrase).
|
|
27
|
+
- **Optional time window** — `--since YYYY-MM-DD` and / or `--until YYYY-MM-DD`. Scopes the probes to `publication_date` within that range. Default: all-time.
|
|
28
|
+
|
|
29
|
+
## Two phases
|
|
30
|
+
|
|
31
|
+
### Phase 1 — Expand (you, the agent)
|
|
32
|
+
|
|
33
|
+
Take the seed keyword(s) and broaden them with:
|
|
34
|
+
|
|
35
|
+
- **Synonyms** — `"crypto"` → `"cryptocurrency"`, `"digital currency"`.
|
|
36
|
+
- **Sub-areas / adjacent concepts** — `"crypto"` → `"bitcoin"`, `"ethereum"`, `"DeFi"`, `"NFT"`, `"blockchain"`, `"Web3"`.
|
|
37
|
+
- **Specific multi-word phrases** — `"crypto"` → `"how to buy bitcoin"`, `"smart contract"`.
|
|
38
|
+
- **Inflectional variants** — ES text fields aren't stemmed (see the [ES schema reference](../tl/references/elasticsearch-schema.md#text-analyzer-behavior)), so each surface form is counted independently. Propose singular, plural, base verb, `-ing` form, and irregular past tense as needed; skip possessives — they rarely add reach. For example: `"review"` / `"reviews"`, `"invest"` / `"investing"`, `"swim"` / `"swam"`.
|
|
39
|
+
- **Reasonable alternate spellings / abbreviations** — `"ethereum"` → `"ETH"`.
|
|
40
|
+
|
|
41
|
+
Aim for **5–15** candidates including the seed(s), with a hard cap of ~20: every candidate costs one ES probe.
|
|
42
|
+
|
|
43
|
+
Hard rules:
|
|
44
|
+
|
|
45
|
+
- DO propose generic topic / concept terms.
|
|
46
|
+
- **Brand names — only mirror the seeds.** If the seed set is purely topic-shaped (`"crypto"`, `"productivity"`, `"home renovation"`), do NOT introduce brand names; brands should be resolved by `tl brands find` to integer IDs and queried through `sponsored_brand_mentions` / `organic_brand_mentions`, not by free-text match. Only if the seeds **already contain at least one brand name** (e.g. the caller is hunting for competitor coverage or adjacent sponsorship mentions in transcripts) is it appropriate to expand with adjacent brand names in the same category — e.g. seed `"NordVPN"` → `"Surfshark"`, `"ExpressVPN"`, `"Mullvad"` is fine; seed `"crypto"` → adding `"Coinbase"` is not.
|
|
47
|
+
- DON'T propose specific channel names (e.g. `"MrBeast"`). Same path: `tl channels find`.
|
|
48
|
+
- DON'T propose random-letter junk to pad the list.
|
|
49
|
+
|
|
50
|
+
#### Determine AND vs OR semantics
|
|
51
|
+
|
|
52
|
+
Decide upfront how the caller will combine the keywords downstream, and pass the result to the script with `--operator AND|OR`. The decision shapes both the expansion (see "Expansion shape under `AND`" after the list below) and the output envelope:
|
|
53
|
+
|
|
54
|
+
- **Default `OR`.** Most off-taxonomy queries are union-style ("crypto channels" matches any of crypto / bitcoin / Web3 / …).
|
|
55
|
+
- **`AND` only when the user's phrasing carries clear intersection semantics:**
|
|
56
|
+
- **Composite noun phrases** — `"AI cooking"`, `"Roman naval warfare"`, `"vegan keto"`.
|
|
57
|
+
- **Explicit conjunctions** — `"both X and Y"`, `"covering both X and Y"`.
|
|
58
|
+
- When in doubt, OR.
|
|
59
|
+
|
|
60
|
+
**Expansion shape under `AND`:** keep candidates **inside the intersection** — don't broaden across each component independently. For `"Roman naval warfare"`, expand within Roman-naval territory (`Punic Wars`, `Roman navy`, `trireme`, `Battle of Actium`); do NOT add generic Roman-empire or generic naval-warfare terms, because the downstream AND combine would then over-match unrelated channels.
|
|
61
|
+
|
|
62
|
+
### Phase 2 — Rank (mechanical, via the bundled script)
|
|
63
|
+
|
|
64
|
+
Run the bundled script. It takes the candidate list, sends one `size:0` + `track_total_hits` phrase probe per keyword to `tl db es` against `["title", "summary", "transcript"]`, and prints the ranked JSON on stdout.
|
|
65
|
+
|
|
66
|
+
Three invocations cover almost every case. **Pick by the question shape** (channel vs video vs AND-composite):
|
|
67
|
+
|
|
68
|
+
```bash
|
|
69
|
+
# (a) Channel search by topic — default fields (title, summary, transcript)
|
|
70
|
+
python3 skills/tl-keyword-research/scripts/probe.py crypto bitcoin DeFi Web3 blockchain "smart contract"
|
|
71
|
+
|
|
72
|
+
# (b) Video search by topic — REQUIRED: pass --fields title,summary
|
|
73
|
+
# The default field set includes `transcript`, which inflates counts via
|
|
74
|
+
# incidental mentions inside long videos. For video-level discovery the
|
|
75
|
+
# downstream ES query also uses title+summary, so the probe MUST match.
|
|
76
|
+
python3 skills/tl-keyword-research/scripts/probe.py --fields title,summary \
|
|
77
|
+
"budget meal prep" "cheap meal prep" "meal prep on a budget" "frugal recipes"
|
|
78
|
+
|
|
79
|
+
# (c) Composite noun ("both X and Y") — pass --operator AND so candidates stay
|
|
80
|
+
# inside the intersection (don't broaden each component independently)
|
|
81
|
+
python3 skills/tl-keyword-research/scripts/probe.py --operator AND \
|
|
82
|
+
"3d printing" "miniature painting" "tabletop miniatures" "resin printing minis"
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
|
|
86
|
+
**Pick the invocation shape by what the user is searching for:**
|
|
87
|
+
|
|
88
|
+
```bash
|
|
89
|
+
# (a) Channel search by topic — default fields (title, summary, transcript)
|
|
90
|
+
python3 <SKILL_DIR>/scripts/probe.py crypto bitcoin DeFi
|
|
91
|
+
|
|
92
|
+
# (b) Video search by topic — REQUIRED: pass --fields title,summary
|
|
93
|
+
# Without it, the probe includes transcript matches (noise from passing
|
|
94
|
+
# mentions inside long videos), and the count won't match the field set
|
|
95
|
+
# the downstream ES query uses for video-level discovery.
|
|
96
|
+
python3 <SKILL_DIR>/scripts/probe.py --fields title,summary \
|
|
97
|
+
"budget meal prep" "cheap meal prep" "meal prep on a budget"
|
|
98
|
+
|
|
99
|
+
# (c) Composite-noun phrase ("both X and Y" / "X-themed Y") — pass --operator AND
|
|
100
|
+
# to keep candidates inside the intersection
|
|
101
|
+
python3 <SKILL_DIR>/scripts/probe.py --operator AND \
|
|
102
|
+
"Roman naval warfare" "Punic Wars" trireme "Roman navy"
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
Other input / scoping forms:
|
|
106
|
+
|
|
107
|
+
```bash
|
|
108
|
+
# JSON array on stdin
|
|
109
|
+
echo '["crypto","bitcoin","DeFi"]' | python3 <SKILL_DIR>/scripts/probe.py
|
|
110
|
+
|
|
111
|
+
# Newline-separated on stdin
|
|
112
|
+
printf 'crypto\nbitcoin\nDeFi\n' | python3 <SKILL_DIR>/scripts/probe.py
|
|
113
|
+
|
|
114
|
+
# Time window (optional, applies to publication_date)
|
|
115
|
+
python3 <SKILL_DIR>/scripts/probe.py --since 2025-01-01 --until 2026-01-01 crypto bitcoin
|
|
116
|
+
```
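When the probe is launched from Python rather than a shell, the same invocations can be assembled as an argv list. The sketch below is illustrative only: `build_probe_argv` is a hypothetical helper, the default script path is an assumption standing in for `<SKILL_DIR>/scripts/probe.py`, and only the flags shown in the examples above (`--fields`, `--operator`, `--since`, `--until`) are used.

```python
import shlex


def build_probe_argv(keywords, script="scripts/probe.py", fields=None,
                     operator=None, since=None, until=None):
    """Assemble the argv list for one probe run (hypothetical helper)."""
    argv = ["python3", script]
    if fields:
        argv += ["--fields", ",".join(fields)]
    if operator:
        argv += ["--operator", operator]
    if since:
        argv += ["--since", since]
    if until:
        argv += ["--until", until]
    # Keywords go last, one argv entry each, so multi-word phrases stay intact.
    return argv + list(keywords)


video_argv = build_probe_argv(
    ["budget meal prep", "cheap meal prep"], fields=["title", "summary"]
)
print(shlex.join(video_argv))
```

Passing the list to `subprocess.run(video_argv, capture_output=True, text=True)` avoids shell quoting issues with multi-word phrases.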
|
|
117
|
+
|
|
118
|
+
The script:
|
|
119
|
+
|
|
120
|
+
1. Reads keywords from argv (preferred) or stdin (JSON array or newline-separated). Deduplicates case-insensitively; the first spelling wins.
|
|
121
|
+
2. For each keyword, sends a `multi_match` phrase query against `["title", "summary", "transcript"]` (the default; override with `--fields`) with `size:0` and `track_total_hits:true`. Optionally scopes by `publication_date` via `--since` / `--until`.
|
|
122
|
+
3. Reads `total` from the response envelope (falls back to `hits.total.value` if absent).
|
|
123
|
+
4. Sorts descending by count.
|
|
124
|
+
5. Prints the canonical JSON object on stdout.
|
|
125
|
+
|
|
126
|
+
If a single probe fails (auth, transport, server error), the script exits non-zero and writes the error to stderr — partial output is not produced.
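The steps above can be sketched in a few functions. This is not the bundled `probe.py`; it is a simplified illustration under stated assumptions: the function names are hypothetical, the ES round trip is replaced by a caller-supplied `count_fn`, and treating `--until` as an inclusive bound (`lte`) is a guess.

```python
def dedupe_keep_first(keywords):
    """Case-insensitive dedupe; the first spelling wins."""
    seen, out = set(), []
    for kw in keywords:
        key = kw.strip().lower()
        if key and key not in seen:
            seen.add(key)
            out.append(kw.strip())
    return out


def probe_body(keyword, fields=("title", "summary", "transcript"),
               since=None, until=None):
    """One count-only phrase probe, optionally scoped by publication_date."""
    match = {"multi_match": {"query": keyword, "type": "phrase",
                             "fields": list(fields)}}
    date_range = {}
    if since:
        date_range["gte"] = since
    if until:
        date_range["lte"] = until  # assumption: inclusive upper bound
    if date_range:
        query = {"bool": {"must": [match],
                          "filter": [{"range": {"publication_date": date_range}}]}}
    else:
        query = match
    return {"size": 0, "track_total_hits": True, "query": query}


def rank(keywords, count_fn, operator="OR"):
    """Count each deduplicated keyword via count_fn and sort descending."""
    rows = [{"keyword": kw, "count": count_fn(kw)}
            for kw in dedupe_keep_first(keywords)]
    rows.sort(key=lambda r: r["count"], reverse=True)
    return {"operator": operator, "keywords": rows}


# Stand-in for the ES round trip; these counts are placeholders, not real data.
fake_counts = {"crypto": 30, "bitcoin": 20, "rugpull": 0}
result = rank(["crypto", "Crypto", "rugpull", "bitcoin"],
              lambda kw: fake_counts[kw.lower()])
print(result)
```

Keeping the counting step behind `count_fn` makes the dedupe and ranking logic easy to exercise without touching Elasticsearch.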
|
|
127
|
+
|
|
128
|
+
## Output (strict)
|
|
129
|
+
|
|
130
|
+
A **single JSON object** on stdout — no prose, no markdown fences:
|
|
131
|
+
|
|
132
|
+
```json
|
|
133
|
+
{
|
|
134
|
+
"operator": "OR",
|
|
135
|
+
"keywords": [
|
|
136
|
+
{"keyword": "crypto", "count": 18742},
|
|
137
|
+
{"keyword": "bitcoin", "count": 15103},
|
|
138
|
+
{"keyword": "DeFi", "count": 4221},
|
|
139
|
+
{"keyword": "rugpull", "count": 0}
|
|
140
|
+
]
|
|
141
|
+
}
|
|
142
|
+
```
|
|
143
|
+
|
|
144
|
+
- `operator` is always present and is one of `"OR"` (default) or `"AND"`. It echoes whatever was passed via `--operator` and tells the caller how to combine the surviving keywords downstream (`bool.should` for OR, `bool.must` for AND, or the FilterSet equivalent).
|
|
145
|
+
- `keywords` sorted **descending** by `count`.
|
|
146
|
+
- **Zero-count entries are kept** — they signal that the agent's suggestion didn't match anything in the corpus, which is informative to the caller.
|
|
147
|
+
- **Deduplicated case-insensitively** — `"Crypto"` and `"crypto"` collapse to one entry; the first spelling wins.
|
|
148
|
+
- Each entry has exactly two keys: `keyword` (string) and `count` (integer).
|
|
149
|
+
- The seed keyword(s) are always included in the output, ranked alongside the suggestions.
|
|
150
|
+
|
|
151
|
+
The skill's responsibility ends at the ranked JSON. The caller decides what to do with it, typically running `tl db es` with a `multi_match` over the surviving high-count keywords against the same fields the probe used (`title` / `summary` / `transcript` by default, or `title` / `summary` for video-level discovery).
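A caller consuming the envelope might combine the surviving keywords as sketched below. This is an illustration under assumptions: `downstream_query` and its `min_count` threshold are hypothetical, and the sample input reuses two entries from the example output above.

```python
import json


def downstream_query(probe_output, fields, min_count=1):
    """Combine surviving keywords according to the envelope's operator."""
    survivors = [k["keyword"] for k in probe_output["keywords"]
                 if k["count"] >= min_count]
    clauses = [
        {"multi_match": {"query": kw, "type": "phrase", "fields": list(fields)}}
        for kw in survivors
    ]
    if probe_output["operator"] == "AND":
        # Intersection: every surviving keyword must match.
        return {"query": {"bool": {"must": clauses}}}
    # Union: any surviving keyword may match.
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


probe_output = json.loads(
    '{"operator": "OR", "keywords": ['
    '{"keyword": "crypto", "count": 18742}, {"keyword": "rugpull", "count": 0}]}'
)
query = downstream_query(probe_output, ["title", "summary", "transcript"])
print(json.dumps(query))
```

The `fields` argument should mirror whatever field set the probe ran against, so the counts stay comparable with the final search.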
|
|
152
|
+
|
|
153
|
+
## Cost
|
|
154
|
+
|
|
155
|
+
Each probe is `size:0` + `track_total_hits:true` with no aggregations — no rows are returned. At raw-DB pricing, expect roughly 1–2 credits per probe. For 10 keywords, expect ~10–20 credits total. Run `tl describe show db` to see the current rate.
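The estimate is simple multiplication. A small sketch, assuming the rough 1–2 credits-per-probe range quoted above (the helper name is hypothetical; check the live rate before relying on it):

```python
def estimate_credits(n_keywords, per_probe=(1, 2)):
    """Return the (low, high) credit estimate for n_keywords probes."""
    low, high = per_probe
    return n_keywords * low, n_keywords * high


print(estimate_credits(10))  # (10, 20)
```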
|
|
156
|
+
|
|
157
|
+
## Self-check before emitting
|
|
158
|
+
|
|
159
|
+
1. Output is a single valid JSON object on stdout — no prose, no fences.
|
|
160
|
+
2. `operator` is `"AND"` only when the user phrasing carries clear intersection semantics (composite-noun phrase or explicit "both X and Y"); otherwise `"OR"`.
|
|
161
|
+
3. Under `operator: "AND"`, candidates stay inside the intersection — no broadening across components independently.
|
|
162
|
+
4. Every keyword is a generic term: no channel names, and brand names only when the seeds already contain at least one brand.
|
|
163
|
+
5. `keywords` array is sorted descending by `count`.
|
|
164
|
+
6. Each entry has exactly `keyword` (string) and `count` (integer).
|
|
165
|
+
7. The seed keyword(s) appear in the output.
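The mechanical items in this checklist (envelope shape, sort order, entry keys, seed presence) can be verified in code; the judgment items (operator choice, staying inside the intersection, generic terms) cannot. A minimal sketch, with `check_envelope` as a hypothetical helper name:

```python
def check_envelope(obj, seeds):
    """Return a list of self-check failures; an empty list means it passes."""
    problems = []
    if obj.get("operator") not in ("OR", "AND"):
        problems.append("operator must be OR or AND")
    rows = obj.get("keywords")
    if not isinstance(rows, list):
        return problems + ["keywords must be a list"]
    for row in rows:
        ok = (isinstance(row, dict)
              and set(row) == {"keyword", "count"}
              and isinstance(row["keyword"], str)
              and isinstance(row["count"], int)
              and not isinstance(row["count"], bool))  # bool is an int subclass
        if not ok:
            return problems + ["each entry needs exactly keyword (str) and count (int)"]
    counts = [row["count"] for row in rows]
    if counts != sorted(counts, reverse=True):
        problems.append("keywords not sorted descending by count")
    present = {row["keyword"].lower() for row in rows}
    missing = [s for s in seeds if s.lower() not in present]
    if missing:
        problems.append(f"missing seeds: {missing}")
    return problems
```

Running it on the parsed stdout just before emitting catches shape regressions cheaply; an empty list means the mechanical checks pass.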
|