PyPI - thoughtleaders-cli - Versions diffs - 0.6.47__tar.gz → 0.6.51__tar.gz - Mend

thoughtleaders-cli 0.6.47tar.gz → 0.6.51tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (96) hide show

{thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/.claude-plugin/plugin.json RENAMED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "tl-cli",
-  "version": "0.6.47",
+  "version": "0.6.51",
   "description": "ThoughtLeaders CLI — query sponsorship deals, channels, brands, uploads, and intelligence from the terminal",
   "author": {
     "name": "ThoughtLeaders",

{thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/AGENTS.md RENAMED Viewed

@@ -56,6 +56,13 @@ The CLI integrates with AI coding agents via skills, commands, agents, and hooks
 This repo is also a Claude Code plugin, and can directly be installed as one.
+### Bundled skills — when to invoke
+- **`tl`** — the main skill for querying ThoughtLeaders data. Default for any sponsorship / channel / brand / upload / report question.
+- **`tl-keyword-research`** — invoke whenever the user wants to find videos or channels by **content keywords** (topics, concepts, niches) that aren't covered by a curated recommender tag, OR to validate that a candidate channel's content actually touches a given topic. Returns `{operator, keywords:[{keyword,count}]}` from a ranked ES probe over `title` / `summary` / `transcript`; the caller then runs the actual content search with the surviving high-count terms. **Do not compose keyword sets by hand for `tl db es` content searches — delegate to this skill first.** See `skills/tl/SKILL.md` → *Channel & video discovery* for the four-path decision tree and when to use this vs the recommender / raw SQL.
+- **`tl-report-builder`** — invoke when the user wants to build, refine, or save a platform report (campaign config, FilterSet, columns, widgets). Multi-phase flow: routing → schema + validation → columns → widgets.
+- **`tl-import`**, **`tl-save-report`**, **`adapt-tl-data`** — narrower workflows; the skill files document their own triggers.
 ### Skill content boundaries
 Skills under `skills/` are split into a `SKILL.md` and one or more `references/*.md` files. To prevent drift, each fact has exactly one home:
@@ -90,7 +97,6 @@ All list endpoints return: `{ results, total, limit, offset, usage: { credits_ch
 - `TL_API_URL` — API base (default: `https://app.thoughtleaders.io`)
 - `TL_API_KEY` — Bearer token override for CI/scripts
 - `TL_AUTH0_DOMAIN`, `TL_AUTH0_CLIENT_ID`, `TL_AUTH0_AUDIENCE` — Auth0 config
-- `TL_LLM_KEY` — User's own LLM key for `tl ask` (avoids surcharge)
 ## Credit System

{thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/API.md RENAMED Viewed

@@ -26,7 +26,7 @@ All requests must carry both of:
 | `Authorization` | `Bearer <api_key>` | The credential. |
 | `X-TL-Auth` | `API-KEY` | Opts into API-key auth. Without it the server interprets the Bearer as an Auth0 JWT and rejects the API-key string. |
-Create an API key from Django Admin → **API keys** → Add. The shown 64-char hex value is the only secret — keep it out of version control. Keys carry an `is_active` flag and an `expires_at` (defaults to 1 year from creation). Inactive or expired keys return `401`; if the owning user is deactivated the request fails with `403`.
+Inactive or expired keys return `401`; if the owning user is deactivated the request fails with `403`.
 A quick set of shell variables used throughout this page:
@@ -76,7 +76,7 @@ Multi-row responses share one shape:
   "usage": {
     "credits_charged": 4.12,
     "credit_rate": 1.4,
-    "balance_remaining": 9_995.88
+    "balance_remaining": 9995.88
   },
   "_breadcrumbs": [
     { "hint": "next page", "command": "..." }
@@ -127,7 +127,7 @@ print(get('/whoami'))
     "name": "Acme Marketing",
     "plan": "Intelligence",
     "is_managed_services": false,
-    "credits_balance": 9_995.88
+    "credits_balance": 9995.88
   },
   "associated_profiles": [ ... ],
   "brands": [ ... ]
@@ -154,7 +154,7 @@ print(get('/balance'))
 ```json
 {
-  "balance": 9_995.88,
+  "balance": 9995.88,
   "allow_overage": false,
   "recent_usage": [
     {
@@ -200,13 +200,13 @@ print(post('/raw/pg', {'query': sql}))
 ```json
 {
   "results": [
-    {"id": 12345, "channel_name": "MrBeast", "reach": 320_000_000},
+    {"id": 12345, "channel_name": "MrBeast", "reach": 320000000},
     ...
   ],
   "total": 5,
   "limit": 5,
   "offset": 0,
-  "usage": { "credits_charged": 1.84, "credit_rate": 1.4, "balance_remaining": 9_994.04 }
+  "usage": { "credits_charged": 1.84, "credit_rate": 1.4, "balance_remaining": 9994.04 }
 }
 ```
@@ -381,7 +381,7 @@ def post(path, body):
     r.raise_for_status()
     return r.json()
-# 1) Find Holafly's brand id (free name lookup endpoint covered elsewhere — use db_pg here)
+# 1) Find Holafly's brand id
 brand = post('/raw/pg', {'query':
     "SELECT id FROM thoughtleaders_brand WHERE name = 'Holafly' LIMIT 1 OFFSET 0"
 })['results'][0]['id']

{thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: thoughtleaders-cli
-Version: 0.6.47
+Version: 0.6.51
 Summary: ThoughtLeaders CLI — query sponsorship data, channels, brands, and intelligence
 Project-URL: Homepage, https://thoughtleaders.io
 Project-URL: Repository, https://github.com/ThoughtLeaders-io/thoughtleaders-cli

{thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/pyproject.toml RENAMED Viewed

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "thoughtleaders-cli"
-version = "0.6.47"
+version = "0.6.51"
 description = "ThoughtLeaders CLI — query sponsorship data, channels, brands, and intelligence"
 readme = "README.md"
 license = "MIT"

{thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl/SKILL.md RENAMED Viewed

@@ -1,7 +1,7 @@
 ---
 name: tl
 description: |
-  Query and analyze YouTube sponsorship data using the `tl` CLI. Use this skill for data exploration and questions about channels, brands and sponsorships: counts, metrics, trends, time-series, distributions, single-record drill-downs, revenue / pipeline-weighting math, view-curve analysis, cross-source business questions. Examples: "How many deals did we close last quarter?", "What's the weighted pipeline by sales owner?", "Show me the view curve for video X", "Find mentions of Surfshark in transcripts", "Investigate this video".
+  Query and analyze YouTube sponsorship data using the `tl` CLI. Use this skill for finding channels, brands and sponsorships, and for data exploration, including counts, metrics, trends, time-series, distributions, single-record drill-downs, revenue / pipeline-weighting math, view-curve analysis, cross-source business questions. Examples: "How many deals did we close last quarter?", "What's the weighted pipeline by sales owner?", "Show me the view curve for video X", "Find mentions of Surfshark in transcripts", "Investigate this video", "Find channels...", "Find brands...".
 ---
 # ThoughtLeaders Data Analyst
@@ -140,13 +140,13 @@ Unless the user specifically asks for running a specific report or showing the r
 3. **Decide the method of discovery**: If the user want to explore certain topics, use the recommender commands. If it's more about filtering, construct a query for PG or ES.
 4. **Always use --json**: Parse JSON output for multi-step analysis.
 5. **Chain commands**: For complex questions, chain multiple `tl` commands, shell commands, and other tools.
-6. **Format results**: When the user asks for a list or tabular data, present the results as a well-formatted markdown table. Pick the most relevant columns and use clear headers. In this case, ask the user if they want to save the list as a report, and invoke the `tl-save-report` skill.
+6. **Format results**: When the user asks for a list or tabular data, present the results as a well-formatted markdown table. Pick the most relevant columns and use clear headers. Sort the result by relevant criteria - if the user asked for "top performers", order by the performance metric; if the user asked for "most recent", sort by the pertinent date desc. When the result is tabular, ask the user if they want to save the list as a report, and invoke the `tl-save-report` skill.
-Prefer writing shell code, `jq` commands, or `duckdb` commands that fetch or analysise large sets of data instead of analysing it yourself. Create temporary files in `/tmp` that can be analysed later in different ways. Before analysing a potentially large result set, first try fetching just a single result with `LIMIT 1` without `jq` etc, to see the shape of the data and any error messages.
+Prefer writing shell code, `jq` commands, or `duckdb` commands that fetch or analysise large sets of data instead of analysing it yourself. On Mac and Linux, create temporary files in `/tmp` that can be analysed later in different ways. On Windows, create them in `%USERPROFILE%\AppData\Local\Temp`.  Before analysing a potentially large result set, first try fetching just a single result with `LIMIT 1` without `jq` etc, to see the shape of the data and any error messages.
 ## Available Commands
-Note that if you're working on Windows, you must set up UTF-8 in the console, because all of these commands return UTF-8 data.
+Note that if you're working on Windows, you must set up UTF-8 in the terminal with `PYTHONIOENCODING=utf-8 tl ...`, because all of these commands return UTF-8 data.
 ### Data queries
@@ -391,7 +391,7 @@ See [references/postgres-schema.md](references/postgres-schema.md) for the accep
 ### Three sources, each authoritative for different things
 - **Postgres** — deals, pipeline, brands, channels, users, organizations, profiles, revenue. Source of truth for deal state. Reachable via the structured `tl` commands or raw `tl db pg`.
-- **Elasticsearch** — videos, transcripts, brand mentions, **current** channel/video metrics, demographics. Reachable via `tl uploads`, `tl channels`, and `tl db es`.
+- **Elasticsearch** — videos, transcripts, brand mentions, **current** channel/video metrics, demographics. Reachable via `tl db es`.
 - **Firebolt** — **historical** time-series snapshots only (view curves over time, subscriber-growth trends). Reachable via `tl snapshots` (preferred) or `tl db fb`.
 **Use Firebolt only when you need a value AT A POINT IN TIME that no longer exists in the current ES/PG snapshot.** For "current views/subs", use ES.
@@ -449,9 +449,18 @@ tl changelog since v0.4.10             # Notes from v0.4.10 to latest
 tl changelog --md > CHANGELOG.md       # Capture for a doc
 ```
-#### Channel discovery — recommender first, raw SQL second
+#### Channel & video discovery — pick the path for the question shape
-For category- or demographic-driven discovery, **use the recommender, not `content_category` SQL.** The recommender ranks channels by how strongly they load on a category/demographic tag (similarity scores), instead of forcing exact equality on a single integer code. It also returns the matching brand profiles alongside the channels — useful when the user actually wants to know "who buys this kind of inventory."
+Four first-class paths, each with a different signal. **Pick by the SHAPE of the user's question, not by habit.** "Recommender first" is the right default only for path 2 — for paths 1, 3, and 4 the recommender is the wrong tool.
+**1. Named entity** — user named a specific channel, brand, or YouTube URL/handle/ID (`"MrBeast"`, `"NordVPN"`, `"@mkbhd"`, `"youtu.be/..."`). Use `tl channels find` / `tl brands find` — single-step resolver returning `{id, name}`. Cheap, deterministic, no expansion.
+```bash
+tl channels find "MrBeast"
+tl brands find "NordVPN"
+```
+**2. Curated tag / category / demographic** — user named a topic that maps cleanly to a recommender tag (`"Cooking"`, `"Tech"`, `"USA share"`, content categories, format hints). Use the recommender — it ranks channels by how strongly they load on a tag, returning ranked similarity scores instead of forcing exact equality. It also returns matching brand profiles alongside the channels — useful when the user wants to know "who buys this kind of inventory."
 ```bash
 # Discover the right tag name first (free)
@@ -464,7 +473,41 @@ tl recommender top-channels "Tech" --limit 30
 tl recommender top-brands "USA share" mbn:yes --limit 50
 ```
-Use `tl db pg` only for predicates the recommender can't express — pure attribute filters (`is_tl_channel`, `language`, `demographic_device_primary`), aggregations, and joins. Run `tl schema pg` once to confirm the live column set; the columns referenced below are stable.
+**Available filters on the recommender commands:**
+| Command | Filters |
+| --- | --- |
+| `top-channels` | `msn:<yes\|no\|all>` (default all), `exclude-for-profile:<id>` |
+| `top-profiles` | `mbn:<yes\|no\|all>` (default all), `exclude-for-channel:<id>` |
+| `top-brands` | `mbn:<yes\|no\|all>` (default all) |
+| `channels-for-profile` | `language:<iso>` (default `en`), `msn:<yes\|no>` (default `no`) |
+| `channels-for-brand` | same as `channels-for-profile` |
+| `brands-for-channel` | `mbn:<yes\|no\|all>` (default `all`) |
+Use `tl recommender top` for category/topic discovery (it's ranked) and `tl channels similar` / `tl brands similar` for 1:1 lookalike searches. This is the fast path.
+**Hand-off to path 3 when the tag doesn't fit** If `tl recommender tags <hint>` returns no clean match, the user's intent cannot be represented by recommender tags — drop to path 3, do NOT fake-fit a loose adjacent tag. E.g. `"crypto/Web3 channels"` is a miss even though `"cryptocurrency"` exists as a tag — `"cryptocurrency"` is a financial-product tag, not the cultural-niche the user named. Same for `"speedcubing"`, `"biohacking and longevity"`, `"AI cooking"` — none of these are curated tags, so they belong in path 3.
+**Also fall through to path 3 — NOT path 4 — when the recommender returns errors.** If `tl recommender top-channels "<tag>"` 5xx's or times out, the right fallback is path 3 (run the keyword-research skill against ES), not path 4 (PG `ILIKE` on `channel_name`). PG name-matching misses every channel whose name doesn't contain the literal word — that's the same anti-pattern called out at the bottom of this section.
+**Also fall through to path 3 if the user wants to broaden the search.** When encountering further inputs like "broaden the search", "find more results", etc., it indicates the user is searching for topics beyond what the recommender tags provide.
+**3. Content keywords beyond tags — invoke the `tl-keyword-research` skill** — user described content the channel OR video ACTUALLY TALKS ABOUT, and it isn't a curated tag. Triggers:
+- **Channel search by topic** — `"crypto/Web3 channels"`, `"speedcubing channels"`, `"channels about biohacking and longevity"`, `"both 3D printing and miniature painting"`.
+- **Video search by topic** — `"videos where creators discuss budget meal prep"`, `"uploads about [topic]"`, `"find videos that talk about X"`.
+- **Channel–brand fit check** — does this candidate channel's content actually touch the brand's category? (Use with `channel.id` filter on the downstream ES query.)
+- **Validating a recommender / SQL shortlist** — sample-check that the top-N channels really cover the niche.
+**Do NOT compose keyword sets by hand for `tl db es`.** Always run the skill's script first. It broadens the user input, probes each candidate via `multi_match phrase`, and returns ranked counts:
+```json
+{"operator": "OR", "keywords": [{"keyword": "crypto", "count": 18742}, {"keyword": "bitcoin", "count": 15103}, {"keyword": "rugpull", "count": 0}]}
+```
+Then run the actual content search via `tl db es` (`multi_match` on the `title`, `summary`, `transcript` fields) with the surviving high-count keywords. The skill's full procedure (Phase 1 = seed expansion by you; Phase 2 = the script) is in the `tl-keyword-research` skill file.
+**4. Pure attribute filter** — user wants channels filtered by metadata like: `is_tl_channel`, `language`, `demographic_device_primary`, country share in `demographic_geo` JSON, aggregations, joins. Use `tl db pg` with a SELECT on `thoughtleaders_channel`. Run `tl schema pg thoughtleaders_channel` once to confirm the live column set; the columns in the examples are stable.
 ```bash
 # All TPP (TL-managed) channels — pure attribute filter, not a category query
@@ -483,14 +526,17 @@ tl db pg "SELECT id, channel_name, demographic_device_primary, total_views
           LIMIT 100 OFFSET 0"
 ```
-For per-country share beyond the recommender's "USA share" tag, use the `demographic_geo` jsonb in raw SQL: `(demographic_geo->>'gb')::int >= 25`. Same pattern with `demographic_device->>'mobile'` for non-primary device shares.
+For per-country share beyond the recommender's "USA share" tag, use the `demographic_geo` JSONB field in raw SQL: `(demographic_geo->>'gb')::int >= 25`. Same pattern with `demographic_device->>'mobile'` for non-primary device shares.
 **MSN status (`media_selling_network_join_date`) is scrubbed from the advertiser sandbox view.** Raw SQL can't filter on it from an advertiser context. For MSN-only / non-MSN lookups, run the same raw SQL with `media_selling_network_join_date IS [NOT] NULL` from a context that has access to it (full-access role), or rely on the recommender's MSN-aware filters: `tl recommender top-channels "<tag>" msn:yes|no|all`.
+**Anti-pattern: defaulting to `ILIKE` on `channel_name` for off-tag topic queries.** If the question is "channels about X" where X is a topic / concept / niche (not a literal substring you expect in channel names), reach for path 3 (`tl-keyword-research`), not `WHERE channel_name ILIKE '%X%'`. Channel-name `ILIKE` misses channels whose name doesn't literally contain X but whose content does; the keyword-research skill catches them via `title` / `summary` / `transcript`. Use `channel_name ILIKE` only when you actually expect the channel's name to contain the term (e.g. `"Crypto"` in `"My Happy Crypto"`) as a supplementary signal alongside path 3, not as a replacement for it.
 ### Output flags
-- `--json` — structured JSON (use this for parsing)
-- `--csv` — CSV output
-- `--md` — Markdown table
+- `--json` — structured JSON output format (use this for parsing)
+- `--toon` — [TOON](https://toonformat.dev/guide/getting-started.html) output format (efficient for large data sets while keeping metadata)
+- `--csv` — CSV output format
+- `--md` — Markdown table for user presentation only
 - `--limit N` — max results
 - `--offset N` — pagination
@@ -536,7 +582,7 @@ When presenting sponsorship status data, always use human-readable labels — ne
 ## Examples
-"Show me my sold sponsorships this quarter":
+### "Show me my sold sponsorships this quarter":
 ```bash
 tl db pg "SELECT al.id, al.weighted_price, al.purchase_date, b.name AS brand
           FROM thoughtleaders_adlink al
@@ -549,24 +595,24 @@ tl db pg "SELECT al.id, al.weighted_price, al.purchase_date, b.name AS brand
           LIMIT 500 OFFSET 0" --json
 ```
-"What channels does Nike sponsor?":
+### "What channels does Nike sponsor?":
 ```bash
 tl brands history Nike --json
 ```
-"Compare view curves for two videos":
+### "Compare view curves for two videos":
 ```bash
 tl snapshots video abc123 --channel 456 --json
 tl snapshots video def789 --channel 456 --json
 ```
-"Run my Q1 pipeline report":
+### "Run my Q1 pipeline report":
 ```bash
 tl reports --json  # Find the report ID first
 tl reports run 42 --json
 ```
-"Look up a channel or brand from whatever the user pasted":
+### "Look up a channel or brand from whatever the user pasted":
 ```bash
 # Channel: accepts name, slug, YouTube channel URL, handle (@…), raw channel ID
 # (UC…), or any video URL. On ambiguity returns 400 with candidate {id, name};
@@ -596,7 +642,7 @@ tl recommender top-channels "Cooking" msn:yes --limit 100 --json \
                           LIMIT 50 OFFSET 0" --json
 ```
-"Show sold sponsorships targeting mobile US audiences":
+### "Show sold sponsorships targeting mobile US audiences":
 ```bash
 tl db pg "SELECT al.id, c.channel_name, c.demographic_device_primary, c.demographic_usa_share, al.weighted_price
           FROM thoughtleaders_adlink al
@@ -608,7 +654,7 @@ tl db pg "SELECT al.id, c.channel_name, c.demographic_device_primary, c.demograp
           LIMIT 500 OFFSET 0" --json
 ```
-"Find channels similar to one I know" (similarity recommender, 25 credits per call):
+### "Find channels similar to one I know" (similarity recommender):
 ```bash
 tl channels similar 29834 --limit 10                         # by ID (defaults to msn:yes, tpp:both)
 tl channels similar "Tremending girls" --limit 5             # by unique name
@@ -620,7 +666,7 @@ tl channels similar 29834 min-subs:1000000 exclude:477487 --limit 15  # client-s
 ```
 **Both `tl channels show` and `tl channels similar` accept either a numeric channel ID or a channel name.** Name arguments are case-insensitive partial matches; if more than one active channel matches, the command prints a candidates table (channel_id, subscribers, name) and exits 1 so you can retry with a specific ID. The `msn` filter on `similar` is tri-state: `yes` (only MSN channels — the default), `no` (only non-MSN channels), `both` (no MSN filter). `tl channels look-alike` is a hidden alias for `similar` that matches the internal "look-alike channels" terminology.
-"Browse the recommender" (categories, demographics, formats — `tl recommender tags` is free):
+### "Browse the recommender" (categories, demographics, formats):
 ```bash
 tl recommender tags                                            # Full tag list (free)
 tl recommender tags cooking                                    # Search tag names by substring
@@ -637,16 +683,3 @@ tl recommender channels-for-brand 6037 msn:yes language:en --limit 30
 tl recommender brands-for-channel 29834 --limit 30                   # Brands likely to sponsor this channel
 tl recommender brands-for-channel "MrBeast" mbn:yes --limit 30       # Same, restricted to MBN brand profiles
 ```
-**Filters on the recommender commands:**
-| Command | Filters |
-| --- | --- |
-| `top-channels` | `msn:<yes\|no\|all>` (default all), `exclude-for-profile:<id>` |
-| `top-profiles` | `mbn:<yes\|no\|all>` (default all), `exclude-for-channel:<id>` |
-| `top-brands` | `mbn:<yes\|no\|all>` (default all) |
-| `channels-for-profile` | `language:<iso>` (default `en`), `msn:<yes\|no>` (default `no`) |
-| `channels-for-brand` | same as `channels-for-profile` |
-| `brands-for-channel` | `mbn:<yes\|no\|all>` (default `all`) |
-Use `tl recommender top` for category/topic discovery (it's ranked) and `tl channels similar` / `tl brands similar` for 1:1 lookalike searches.

{thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl/references/elasticsearch-schema.md RENAMED Viewed

@@ -261,6 +261,10 @@ tl db es '{
 }'
 ```
+## Text analyzer behavior
+`text` fields on article docs (`title`, `summary`, `transcript`) appear to use the `standard` analyzer (tokenize + lowercase, no stemmer, no English-possessive filter), so inflections, plurals, and possessives are each indexed as distinct terms. For example: `bitcoin` (4,466,300) vs `bitcoins` (489,262). For stemming-style recall, expand the query side with a `bool.should` over the variants.
 ## Notes & gotchas
 - **Composite IDs:** `tl-platform.id` and `_id` are `<channel_id>:<youtube_id>`. The `youtube_id` portion alone is what Firebolt's `article_metrics.id` stores.

{thoughtleaders_cli-0.6.47 → thoughtleaders_cli-0.6.51}/skills/tl-import/SKILL.md RENAMED Viewed

@@ -135,7 +135,7 @@ Before running, confirm:
 2. **Entity type** — one of `channels` / `brands` / `articles` / `sponsorships`. Infer from context, but translate user-facing vocabulary:
    - YouTube URLs / handles / `UC…` IDs → `channels`
    - Domains / brand slugs → `brands`
-   - "videos" / "uploads" / video URLs / video IDs → `articles` *(the CLI calls them uploads in `tl uploads list`, but `bulk-import` expects `articles` — same concept, legacy naming)*
+   - "videos" / "uploads" / video URLs / video IDs → `articles` *(the CLI surfaces them as uploads — `tl uploads show <id>` — but `bulk-import` expects `articles`; same concept, legacy naming)*
    - "adlinks" / "deals" / "sponsorships" / numeric AdLink IDs → `sponsorships`
 3. **Identifiers** — the list. Accepted shapes per entity:
    - **channels**: numeric DB IDs, YouTube channel IDs (`UC…`), `@handles`, full YouTube URLs (`/@…`, `/channel/UC…`, `/user/…`)

thoughtleaders_cli-0.6.51/skills/tl-keyword-research/SKILL.md ADDED Viewed

@@ -0,0 +1,165 @@
+---
+name: tl-keyword-research
+description: |
+  Broaden and rank a set of content-search keywords. Invoke when the user wants to find videos or channels by content keywords (topics, concepts, niches) — not by ID or exact name. Takes one or more seed keywords (or an NL phrase), proposes related candidates, probes Elasticsearch for each one against the `title` / `summary` / `transcript` fields, and returns a strict JSON object `{"keywords":[{"keyword","count"},...]}` sorted descending by document count. The output is meant to feed the next step (typically a `tl db es` content search with the surviving high-count keywords).
+---
+# tl-keyword-research
+Widen and rank content-search keywords before running the actual ES content search. Two phases: the agent expands the seed keyword(s) into a broader candidate set; the bundled script probes ES for each candidate and returns the ranked counts.
+## When to invoke
+Invoke this skill — directly, or as a delegated step from another skill / agent — when:
+- The user wants to find **videos or channels by content keywords** (topics, concepts, niches), not by ID or by exact name.
+- The user supplies at least one seed keyword, or an NL phrase from which seeds can be derived.
+- The goal is to **widen** the keyword set the user came in with before running the actual content search.
+Skip when:
+- The user has explicit channel / brand IDs or names → use `tl channels find` / `tl brands find` instead.
+- The user's intent maps cleanly to an existing recommender tag (e.g. "Cooking channels") → use `tl recommender top-channels "<tag>"` instead. Recommender tags are curated; don't re-discover them through keyword text matching.
+## Inputs
+- **Seed keywords** — one or more strings supplied by the caller (or extracted from an NL phrase).
+- **Optional time window** — `--since YYYY-MM-DD` and / or `--until YYYY-MM-DD`. Scopes the probes to `publication_date` within that range. Default: all-time.
+## Two phases
+### Phase 1 — Expand (you, the agent)
+Take the seed keyword(s) and broaden them with:
+- **Synonyms** — `"crypto"` → `"cryptocurrency"`, `"digital currency"`.
+- **Sub-areas / adjacent concepts** — `"crypto"` → `"bitcoin"`, `"ethereum"`, `"DeFi"`, `"NFT"`, `"blockchain"`, `"Web3"`.
+- **Specific multi-word phrases** — `"crypto"` → `"how to buy bitcoin"`, `"smart contract"`.
+- **Inflectional variants** — ES text fields aren't stemmed (see the [ES schema reference](../tl/references/elasticsearch-schema.md#text-analyzer-behavior)), so each surface form is counted independently. Propose singular, plural, base verb, `-ing` form, and irregular past tense as needed; skip possessives — they rarely add reach. For example: `"review"` / `"reviews"`, `"invest"` / `"investing"`, `"swim"` / `"swam"`.
+- **Reasonable alternate spellings / abbreviations** — `"ethereum"` → `"ETH"`.
+Produce **5–15** candidates including the seed(s). Cap at ~20 — every candidate costs one ES probe.
+Hard rules:
+- DO propose generic topic / concept terms.
+- **Brand names — only mirror the seeds.** If the seed set is purely topic-shaped (`"crypto"`, `"productivity"`, `"home renovation"`), do NOT introduce brand names; brands should be resolved by `tl brands find` to integer IDs and queried through `sponsored_brand_mentions` / `organic_brand_mentions`, not by free-text match. Only if the seeds **already contain at least one brand name** (e.g. the caller is hunting for competitor coverage or adjacent sponsorship mentions in transcripts) is it appropriate to expand with adjacent brand names in the same category — e.g. seed `"NordVPN"` → `"Surfshark"`, `"ExpressVPN"`, `"Mullvad"` is fine; seed `"crypto"` → adding `"Coinbase"` is not.
+- DON'T propose specific channel names (e.g. `"MrBeast"`). Same path: `tl channels find`.
+- DON'T propose random-letter junk to pad the list.
+#### Determine AND vs OR semantics
+Decide upfront how the caller will combine the keywords downstream, and pass the result to the script with `--operator AND|OR`. The decision shapes both the expansion (next bullet) and the output envelope:
+- **Default `OR`.** Most off-taxonomy queries are union-style ("crypto channels" matches any of crypto / bitcoin / Web3 / …).
+- **`AND` only when the user's phrasing carries clear intersection semantics:**
+  - **Composite noun phrases** — `"AI cooking"`, `"Roman naval warfare"`, `"vegan keto"`.
+  - **Explicit conjunctions** — `"both X and Y"`, `"covering both X and Y"`.
+- When in doubt, OR.
+**Expansion shape under `AND`:** keep candidates **inside the intersection** — don't broaden across each component independently. For `"Roman naval warfare"`, expand within Roman-naval territory (`Punic Wars`, `Roman navy`, `trireme`, `Battle of Actium`); do NOT add generic Roman-empire or generic naval-warfare terms, because the downstream AND combine would then over-match unrelated channels.
+### Phase 2 — Rank (mechanical, via the bundled script)
+Run the bundled script. It takes the candidate list, sends one `size:0` + `track_total_hits` phrase probe per keyword to `tl db es` against `["title", "summary", "transcript"]`, and prints the ranked JSON on stdout.
+Three invocations cover almost every case. **Pick by the question shape** (channel vs video vs AND-composite):
+```bash
+# (a) Channel search by topic — default fields (title, summary, transcript)
+python3 skills/tl-keyword-research/scripts/probe.py crypto bitcoin DeFi Web3 blockchain "smart contract"
+# (b) Video search by topic — REQUIRED: pass --fields title,summary
+#     The default field set includes `transcript`, which inflates counts via
+#     incidental mentions inside long videos. For video-level discovery the
+#     downstream ES query also uses title+summary, so the probe MUST match.
+python3 skills/tl-keyword-research/scripts/probe.py --fields title,summary \
+  "budget meal prep" "cheap meal prep" "meal prep on a budget" "frugal recipes"
+# (c) Composite noun ("both X and Y") — pass --operator AND so candidates stay
+#     inside the intersection (don't broaden each component independently)
+python3 skills/tl-keyword-research/scripts/probe.py --operator AND \
+  "3d printing" "miniature painting" "tabletop miniatures" "resin printing minis"
+```
+**Pick the invocation shape by what the user is searching for:**
+```bash
+# (a) Channel search by topic — default fields (title, summary, transcript)
+python3 <SKILL_DIR>/scripts/probe.py crypto bitcoin DeFi
+# (b) Video search by topic — REQUIRED: pass --fields title,summary
+#     Without it, the probe includes transcript matches (noise from passing
+#     mentions inside long videos), and the count won't match the field set
+#     the downstream ES query uses for video-level discovery.
+python3 <SKILL_DIR>/scripts/probe.py --fields title,summary \
+  "budget meal prep" "cheap meal prep" "meal prep on a budget"
+# (c) Composite-noun phrase ("both X and Y" / "X-themed Y") — pass --operator AND
+#     to keep candidates inside the intersection
+python3 <SKILL_DIR>/scripts/probe.py --operator AND \
+  "Roman naval warfare" "Punic Wars" trireme "Roman navy"
+```
+Other input / scoping forms:
+```bash
+# JSON array on stdin
+echo '["crypto","bitcoin","DeFi"]' | python3 <SKILL_DIR>/scripts/probe.py
+# Newline-separated on stdin
+printf 'crypto\nbitcoin\nDeFi\n' | python3 <SKILL_DIR>/scripts/probe.py
+# Time window (optional, applies to publication_date)
+python3 <SKILL_DIR>/scripts/probe.py --since 2025-01-01 --until 2026-01-01 crypto bitcoin
+```
+The script:
+1. Reads keywords from argv (preferred) or stdin (JSON array or newline-separated). Deduplicates case-insensitively; the first spelling wins.
+2. For each keyword, sends a `multi_match` phrase query against `["title", "summary", "transcript"]` with `size:0` and `track_total_hits:true`. Optionally scopes by `publication_date`.
+3. Reads `total` from the response envelope (falls back to `hits.total.value` if absent).
+4. Sorts descending by count.
+5. Prints the canonical JSON object on stdout.
+If a single probe fails (auth, transport, server error), the script exits non-zero and writes the error to stderr — partial output is not produced.
+## Output (strict)
+A **single JSON object** on stdout — no prose, no markdown fences:
+```json
+{
+  "operator": "OR",
+  "keywords": [
+    {"keyword": "crypto",  "count": 18742},
+    {"keyword": "bitcoin", "count": 15103},
+    {"keyword": "DeFi",    "count": 4221},
+    {"keyword": "rugpull", "count": 0}
+  ]
+}
+```
+- `operator` is always present and is one of `"OR"` (default) or `"AND"`. It echoes whatever was passed via `--operator` and tells the caller how to combine the surviving keywords downstream (`bool.should` for OR, `bool.must` for AND, or the FilterSet equivalent).
+- `keywords` sorted **descending** by `count`.
+- **Zero-count entries are kept** — they signal that the agent's suggestion didn't match anything in the corpus, which is informative to the caller.
+- **Deduplicated case-insensitively** — `"Crypto"` and `"crypto"` collapse to one entry; the first spelling wins.
+- Each entry has exactly two keys: `keyword` (string) and `count` (integer).
+- The seed keyword(s) are always included in the output, ranked alongside the suggestions.
+The skill's responsibility ends at the ranked JSON. The caller decides what to do with it — typically running `tl db es` with a `multi_match` over the surviving high-count keywords against the same `title` / `summary` / `transcript` fields.
+## Cost
+Each probe is `size:0` + `track_total_hits:true` with no aggregations — no rows are returned. At raw-DB pricing, expect roughly 1–2 credits per probe. For 10 keywords, expect ~10–20 credits total. Run `tl describe show db` to see the current rate.
+## Self-check before emitting
+1. Output is a single valid JSON object on stdout — no prose, no fences.
+2. `operator` is `"AND"` only when the user phrasing carries clear intersection semantics (composite-noun phrase or explicit "both X and Y"); otherwise `"OR"`.
+3. Under `operator: "AND"`, candidates stay inside the intersection — no broadening across components independently.
+4. Every keyword is a generic term (no specific brand or channel names).
+5. `keywords` array is sorted descending by `count`.
+6. Each entry has exactly `keyword` (string) and `count` (integer).
+7. The seed keyword(s) appear in the output.

thoughtleaders-cli 0.6.47__tar.gz → 0.6.51__tar.gz

thoughtleaders-cli 0.6.47tar.gz → 0.6.51tar.gz