thoughtleaders-cli 0.6.20__tar.gz → 0.6.22__tar.gz

This diff compares publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between those versions as they appear in their public registries.
Files changed (91)
  1. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/.claude-plugin/plugin.json +1 -1
  2. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/AGENTS.md +17 -2
  3. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/PKG-INFO +1 -1
  4. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/pyproject.toml +1 -1
  5. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl/SKILL.md +8 -1
  6. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl/references/postgres-schema.md +41 -1
  7. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/SKILL.md +209 -36
  8. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/references/report_glossary.md +7 -5
  9. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/tools/topic_matcher.md +4 -0
  10. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/__init__.py +1 -1
  11. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/.claude-plugin/marketplace.json +0 -0
  12. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/.github/workflows/python-publish.yml +0 -0
  13. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/.gitignore +0 -0
  14. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/CLAUDE.md +0 -0
  15. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/LICENSE +0 -0
  16. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/README.md +0 -0
  17. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/agents/tl-analyst.md +0 -0
  18. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/commands/tl-balance.md +0 -0
  19. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/commands/tl-reports.md +0 -0
  20. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/commands/tl-sponsorships.md +0 -0
  21. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/commands/tl.md +0 -0
  22. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/docs/architecture.md +0 -0
  23. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/hooks/hooks.json +0 -0
  24. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/hooks/scripts/post-usage.sh +0 -0
  25. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/hooks/scripts/pre-check.sh +0 -0
  26. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl/references/business-glossary.md +0 -0
  27. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl/references/elasticsearch-schema.md +0 -0
  28. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl/references/firebolt-schema.md +0 -0
  29. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/examples/e2e_findings.md +0 -0
  30. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/examples/golden_queries.md +0 -0
  31. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/references/columns_brands.md +0 -0
  32. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/references/columns_channels.md +0 -0
  33. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/references/columns_content.md +0 -0
  34. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/references/columns_sponsorships.md +0 -0
  35. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/references/intelligence_filterset_schema.json +0 -0
  36. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/references/intelligence_widget_schema.json +0 -0
  37. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/references/sortable_columns.json +0 -0
  38. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/references/sponsorship_filterset_schema.json +0 -0
  39. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/references/sponsorship_widget_schema.json +0 -0
  40. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/references/widgets.md +0 -0
  41. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/tools/column_builder.md +0 -0
  42. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/tools/database_query.md +0 -0
  43. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/tools/keyword_research.md +0 -0
  44. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/tools/name_resolver.md +0 -0
  45. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/tools/sample_judge.md +0 -0
  46. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/tools/similar_channels.md +0 -0
  47. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/skills/tl-report-builder/tools/widget_builder.md +0 -0
  48. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/_completions.py +0 -0
  49. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/auth/__init__.py +0 -0
  50. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/auth/commands.py +0 -0
  51. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/auth/login.py +0 -0
  52. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/auth/pkce.py +0 -0
  53. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/auth/token_store.py +0 -0
  54. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/client/__init__.py +0 -0
  55. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/client/errors.py +0 -0
  56. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/client/http.py +0 -0
  57. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/__init__.py +0 -0
  58. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/_comments_common.py +0 -0
  59. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/ask.py +0 -0
  60. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/balance.py +0 -0
  61. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/brands.py +0 -0
  62. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/changelog.py +0 -0
  63. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/channels.py +0 -0
  64. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/db.py +0 -0
  65. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/deals.py +0 -0
  66. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/describe.py +0 -0
  67. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/doctor.py +0 -0
  68. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/matches.py +0 -0
  69. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/proposals.py +0 -0
  70. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/recommender.py +0 -0
  71. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/reports.py +0 -0
  72. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/schema.py +0 -0
  73. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/setup.py +0 -0
  74. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/snapshots.py +0 -0
  75. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/sponsorships.py +0 -0
  76. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/uploads.py +0 -0
  77. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/commands/whoami.py +0 -0
  78. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/config.py +0 -0
  79. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/filters.py +0 -0
  80. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/hints.py +0 -0
  81. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/main.py +0 -0
  82. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/output/__init__.py +0 -0
  83. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/output/formatter.py +0 -0
  84. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/src/tl_cli/self_update.py +0 -0
  85. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/tests/__init__.py +0 -0
  86. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/tests/test_auth.py +0 -0
  87. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/tests/test_filters.py +0 -0
  88. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/tests/test_output.py +0 -0
  89. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/tests/test_reports.py +0 -0
  90. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/tests/test_sponsorships.py +0 -0
  91. {thoughtleaders_cli-0.6.20 → thoughtleaders_cli-0.6.22}/uv.lock +0 -0
@@ -1,6 +1,6 @@
  {
  "name": "tl-cli",
- "version": "0.6.20",
+ "version": "0.6.22",
  "description": "ThoughtLeaders CLI — query sponsorship deals, channels, brands, uploads, and intelligence from the terminal",
  "author": {
  "name": "ThoughtLeaders",
@@ -61,11 +61,26 @@ This repo is also a Claude Code plugin, and can directly be installed as one.
  Skills under `skills/` are split into a `SKILL.md` and one or more `references/*.md` files. To prevent drift, each fact has exactly one home:
 
  - **CLI-shaped facts live in `SKILL.md`** — command surface, flags, filter syntax, output shapes, workflow, credit-cost curve, status-label mapping the CLI emits.
- - **Schema-shaped facts live in `references/`** — table/column catalogues, accepted-query rules for raw DB engines (PG/ES/Firebolt), index constraints, field types, ID formats.
- - **Business-shaped facts live in `references/business-glossary.md`** (or the equivalent glossary file) — revenue/pipeline definitions, performance grades, ownership semantics, MSN/TPP meaning, team rosters.
+ - **Schema-shaped facts live in `skills/tl/references/`** — table/column catalogues, accepted-query rules for raw DB engines (PG/ES/Firebolt), index constraints, field types, ID formats. This directory is the **single canonical home** for schema facts inside this plugin. It is a managed sync of the upstream `thoughtleaders-skills/tl-data/references/` (the source of truth across all TL agent surfaces); changes that originate here should be propagated upstream, and vice versa.
+ - **Business-shaped facts live in `skills/tl/references/business-glossary.md`** (or the equivalent glossary file) — revenue/pipeline definitions, performance grades, ownership semantics, MSN/TPP meaning, team rosters.
 
  When adding or updating skill content, place the fact in its single home and link from the others. Do not duplicate or "quick-recap" content across files — recaps are the highest drift surface.
 
+ #### Anti-pattern: skill-local schema references
+
+ When a dependent skill (e.g. `tl-report-builder`) needs to reference a schema fact (table layout, columns, fetch SQL, hallucinated-column markers), **link to the canonical home in `skills/tl/references/`** — do not create a new `<skill>/references/*.md` file that mirrors or paraphrases that content.
+
+ Concrete regression marker: an earlier branch added `skills/tl-report-builder/references/data_plane.md` to consolidate the `thoughtleaders_topics` fetch query out of inline tool text. That had the right *shape* (don't restate schema in tool prose) but the wrong *home* — it forked schema facts into a parallel reference file that would silently drift from `skills/tl/references/postgres-schema.md`. The fix was to land the columns + fetch SQL + regression markers in `postgres-schema.md` (and upstream in `tl-data/references/postgres-schema.md`), delete the local file, and rewire the references via Markdown links.
+
+ Rule of thumb: if you are about to write *"here's the SQL to query this table"* or *"these columns don't exist on this table"* anywhere outside `skills/tl/references/`, stop. Add the fact to the canonical reference, then link to the anchor. Same for business facts and the glossary.
+
+ Skill-local `references/*.md` ARE appropriate when the content is **skill-shaped**, not schema-shaped:
+ - Column metadata for a specific report type (sortable columns, formula templates) — `tl-report-builder/references/columns_*.md`
+ - JSON schemas for tool-specific request/response shapes — `tl-report-builder/references/*_schema.json`
+ - Disambiguation tables, defaults, and pitfall catalogues that exist only in this skill's flow — `tl-report-builder/references/report_glossary.md`
+
+ If you are unsure whether a fact is schema-shaped or skill-shaped, ask: "would another TL skill (analyst, finance, mbn-outreach) ever need this fact?" If yes, it's schema/business-shaped — promote it to the canonical home.
+
  ## API Response Envelope
 
  All list endpoints return: `{ results, total, limit, offset, usage: { credits_charged, credit_rate, balance_remaining }, _breadcrumbs }`.
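Reviewer sketch (not part of the package): one way a client might walk the envelope described above, paging with `limit`/`offset`/`total` and summing `usage.credits_charged`. Only the field names come from the envelope line; `iter_results`, `total_credits`, the in-memory `fake_fetch`, the one-credit-per-row pricing, and the empty `_breadcrumbs` list are all hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, Iterator, List

Envelope = Dict[str, Any]


def iter_results(fetch_page: Callable[[int, int], Envelope], limit: int = 50) -> Iterator[Any]:
    """Yield every item from a paginated list endpoint, one page at a time."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        rows = page["results"]
        yield from rows
        offset += len(rows)
        # Stop on an empty page or once every row reported by `total` was seen.
        if not rows or offset >= page["total"]:
            break


def total_credits(pages: List[Envelope]) -> float:
    """Sum the credits charged across the pages that were fetched."""
    return sum(page["usage"]["credits_charged"] for page in pages)


# Hypothetical in-memory endpoint, used only to exercise the helpers.
ROWS = [{"id": i} for i in range(5)]
seen_pages: List[Envelope] = []


def fake_fetch(limit: int, offset: int) -> Envelope:
    chunk = ROWS[offset:offset + limit]
    page = {
        "results": chunk,
        "total": len(ROWS),
        "limit": limit,
        "offset": offset,
        # Assumed pricing for the demo: one credit per returned row.
        "usage": {"credits_charged": len(chunk), "credit_rate": 1, "balance_remaining": 0},
        "_breadcrumbs": [],
    }
    seen_pages.append(page)
    return page


items = list(iter_results(fake_fetch, limit=2))
print(len(items), total_credits(seen_pages))
```

Passing the page fetcher in as a callable keeps the loop independent of any HTTP client, so the same helper can be pointed at a real request function later.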
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: thoughtleaders-cli
- Version: 0.6.20
+ Version: 0.6.22
  Summary: ThoughtLeaders CLI — query sponsorship data, channels, brands, and intelligence
  Project-URL: Homepage, https://thoughtleaders.io
  Project-URL: Repository, https://github.com/ThoughtLeaders-io/thoughtleaders-cli
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
  [project]
  name = "thoughtleaders-cli"
- version = "0.6.20"
+ version = "0.6.22"
  description = "ThoughtLeaders CLI — query sponsorship data, channels, brands, and intelligence"
  readme = "README.md"
  license = "MIT"
@@ -1,6 +1,13 @@
  ---
  name: tl
- description: Query and analyze ThoughtLeaders business data using the `tl` CLI. Default to raw database queries via `tl db pg|fb|es` for anything non-trivial (joins, aggregations, multi-condition filters, anything that would otherwise need post-processing); use the structured resource commands (sponsorships, deals, channels, brands, uploads, snapshots, reports) only for trivially simple lookups (single-record show by ID, plain filtered lists). Triggers on questions about deals, sponsorships, pipeline, revenue, brands, channels, MSN, TPP, uploads/videos, transcripts, brand mentions, view-curves, sales numbers, reports, or any cross-source business analysis ("how many deals", "pipeline report", "weighted pipeline", "channel data", "brand lookup", "view curve", "find mentions of", "investigate this video", "query the database"). You ARE the AI layer — do not use `tl ask`.
+ description: |
+ Query and analyze ThoughtLeaders business data using the `tl` CLI. Default to raw database queries via `tl db pg|fb|es` for anything non-trivial (joins, aggregations, multi-condition filters, anything that would otherwise need post-processing); use the structured resource commands (sponsorships, deals, channels, brands, uploads, snapshots, reports) only for trivially simple lookups (single-record show by ID, plain filtered lists). You ARE the AI layer — do not use `tl ask`.
+
+ **Use this skill for ANALYTICAL questions**: counts, metrics, trends, time-series, distributions, single-record drill-downs, revenue / pipeline-weighting math, view-curve analysis, cross-source business questions. *"How many deals did we close last quarter?"*, *"What's the weighted pipeline by sales owner?"*, *"Show me the view curve for video X"*, *"Find mentions of Surfshark in transcripts"*, *"Investigate this video"*.
+
+ **DEFER to `tl-cli:tl-report-builder`** when the user wants a **LIST of entities with filters** — channels, videos, brands, or sponsorships shaped as a report deliverable, regardless of whether they say "report" or "campaign". *"Show me partnerships from last quarter for beauty creators"*, *"Find me gaming channels with 100K+ subs"*, *"List the brands flagged as Managed Services"*, *"All sponsorships for channel X"*, *"Build me a TPP fintech list"* — every one of these goes to `tl-cli:tl-report-builder`, not here. The report-builder owns the four report types (content / brands / channels / sponsorships) and the preview/save flow; using this skill instead produces ad-hoc data dumps that bypass the saved-report system.
+
+ Quick routing test: *"would the answer to this prompt be a TL report (a list of entities I'd want to come back to)?"* — if yes, route to `tl-cli:tl-report-builder`. If no (it's a number, a chart, a single record, or an exploratory analysis), use this skill.
  ---
 
  # ThoughtLeaders Data Analyst
@@ -1,5 +1,7 @@
  # ThoughtLeaders PostgreSQL Schema Reference
 
+ > **Canonical home (within this plugin).** This file is the single source of truth for TL Postgres schema facts inside `tl-cli` (tables, columns, fetch SQL, hallucinated-column markers, join paths). Dependent skills here — most notably `tl-report-builder` — must **link to entries in this file** rather than restate columns / fetch SQL / "do not exist" markers in their own `references/*.md`. Forking schema content into a parallel `<skill>/references/*.md` produces silent drift; that anti-pattern is what this preamble exists to prevent. Upstream source of truth is `thoughtleaders-skills/tl-data/references/postgres-schema.md`; this file is a managed sync.
+
  ## How to query
 
  ```bash
@@ -191,7 +193,8 @@ A channel can have multiple adspots (different sellers: talent manager, direct,
  | `id` | int | Primary key |
  | `channel_name` | varchar | Display name. ⚠️ The column is `channel_name`, NOT `name`. |
  | `external_channel_id` | varchar | YouTube channel ID (e.g., `UCxxxxxx`). ⚠️ There is NO `youtube_id` column — use this one. |
- | `url` | varchar | Channel URL |
+ | `url` | varchar | Channel URL (external — usually the YouTube URL). |
+ | `slug` | varchar | TL platform slug. Used to build the canonical TL channel URL: `https://app.thoughtleaders.io/youtube/<slug>`. Prefer this over `url` when linking to a channel from any AM-facing surface (reports, samples, Slack posts) — the TL URL keeps the user inside the platform and is the company's hyperlink contract. Falls back to an ID-based TL path if `slug` is NULL; never fall back to the external YouTube URL. |
  | `reach` | bigint | Subscriber count. ⚠️ There is NO `subscribers` column — `reach` is the subscriber count. Many internal docs and outputs use the word "subscribers"; in SQL, always query `reach`. |
  | `media_selling_network_join_date` | date/timestamptz | When channel joined MSN. **MSN membership = this column IS NOT NULL.** |
  | `is_tl_channel` | boolean | True = TPP/VIP channel (the small VIP subset, ~144 channels at 100k+ reach). ⚠️ **`is_tl_channel` is NOT the MSN flag.** Naive `WHERE is_tl_channel = true` as an "MSN filter" silently drops ~98% of the MSN pool (8,652 → 144 at 100k+). For MSN, use `media_selling_network_join_date IS NOT NULL`. |
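Reviewer sketch (not part of the package): the column notes in this hunk can be folded into two small helpers, one that builds an MSN filter on `media_selling_network_join_date` and `reach`, and one that links a channel through its TL `slug`. The function names are hypothetical. The ID-based fallback path is not spelled out in the hunk, so the sketch returns the plain channel name when `slug` is missing rather than guessing that path or using the external YouTube URL.

```python
from typing import Optional

TL_CHANNEL_URL = "https://app.thoughtleaders.io/youtube/{slug}"


def msn_where_clause(min_subscribers: int = 0) -> str:
    """Build a WHERE fragment for MSN channels; the subscriber count lives in `reach`."""
    clause = "media_selling_network_join_date IS NOT NULL"
    if min_subscribers > 0:
        clause += f" AND reach >= {int(min_subscribers)}"
    return clause


def tl_channel_link(channel_name: str, slug: Optional[str]) -> str:
    """Return a Markdown link to the TL channel page, or the plain name without a slug."""
    if slug:
        return f"[{channel_name}]({TL_CHANNEL_URL.format(slug=slug)})"
    # slug is NULL: the ID-based TL path is not given in this hunk, so this
    # sketch does not invent it and never falls back to the YouTube URL.
    return channel_name


print(msn_where_clause(100_000))
print(tl_channel_link("Crypto Journey", "crypto-journey"))
```

Filtering on the join date instead of `is_tl_channel` follows the warning in the last row of the hunk.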
@@ -233,6 +236,43 @@ Source of truth: `thoughtleaders.taxonomies.ContentCategory` (Django `IntEnum` i
  | 21 | HEALTH_FITNESS | Health & Fitness |
  | 22 | MUSIC | Music |
 
+ ### `thoughtleaders_topics` (Curated Topic Taxonomy)
+
+ A small (under 20 rows), live-edited taxonomy of curated topics. Each row bundles a topic name with a one-paragraph description and a `keywords` JSONB array of head + long-tail terms. The report-builder skill's `topic_matcher` tool consumes this table to match a user's natural-language report query against curated topics; downstream filter-building uses the matched topics' `keywords` arrays as keyword groups. The taxonomy is **actively migrating** — content drifts week to week — so consumers must fetch live, never bundle a snapshot.
+
+ #### Fetch query (canonical — use verbatim)
+
+ ```bash
+ tl db pg --json "SELECT id, name, description, keywords FROM thoughtleaders_topics ORDER BY id LIMIT 100 OFFSET 0"
+ ```
+
+ The table has fewer than 20 rows; client-side filtering after a full fetch is free. **Do not push name-pattern WHERE clauses into the SQL** — the agent has guessed `WHERE is_active = TRUE` and `WHERE name ILIKE ANY(...)` in past runs and burnt round-trips on hallucinated columns.
+
+ | Column | Type | Description |
+ |--------|------|-------------|
+ | `id` | int | Primary key |
+ | `name` | varchar | Topic display name (e.g. "Artificial Intelligence", "PC Games") |
+ | `description` | varchar | One-paragraph human-readable description; the matcher uses it for tie-breaks |
+ | `keywords` | jsonb | Array of curated keyword strings — the matcher's primary signal. Mixes head terms (`"cooking"`) with long-tail (`"5-ingredient meals"`); long-tail matches are still strong signals, don't downgrade them. |
+ | `created_at` | timestamptz | Rarely needed |
+ | `updated_at` | timestamptz | Rarely needed |
+ | `source` | varchar | Provenance, rarely needed |
+
+ #### Columns that DO NOT exist on `thoughtleaders_topics`
+
+ Common hallucinations the agent has tried in real runs (each wasted a round-trip). All return *"column '\<name\>' does not exist"*:
+
+ - ❌ `is_active`
+ - ❌ `type` (topics are not subtyped at the schema level)
+ - ❌ `parent_id` (topics are flat, not hierarchical)
+ - ❌ `slug`, `topic_id` (the PK is `id`), `archived`, `is_published`
+
+ Cited regression markers from real runs:
+ - AI/marketing channels run: tried `thoughtleaders_topic` (singular — table doesn't exist), then `WHERE is_active = TRUE`. Three round-trips before consulting `information_schema`.
+ - Travel/digital-nomad run: tried `SELECT id, name, type, parent_id FROM thoughtleaders_topics WHERE name ILIKE ANY(...)`.
+
+ If a query against this table errors with *"column '\<X\>' does not exist"*, that's the regression marker — go back to the verbatim fetch above.
+
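Reviewer sketch (not part of the package): the "fetch everything, filter client-side" advice above could look roughly like this. `FETCH_SQL` repeats the canonical query and `HALLUCINATED_COLUMNS` repeats the list of non-existent columns. Everything else is an assumption made for illustration: the sample rows and their IDs are invented, the JSON row shape returned by `tl db pg --json` is guessed, and the substring scoring is a stand-in for whatever `topic_matcher` really does.

```python
import json
import re
from typing import Dict, List, Set

FETCH_SQL = (
    "SELECT id, name, description, keywords "
    "FROM thoughtleaders_topics ORDER BY id LIMIT 100 OFFSET 0"
)

# Columns the section above lists as not existing on thoughtleaders_topics.
HALLUCINATED_COLUMNS = {
    "is_active", "type", "parent_id", "slug", "topic_id", "archived", "is_published",
}


def hallucinated_columns_in(sql: str) -> Set[str]:
    """Return any known-bad column names that appear as identifiers in the SQL."""
    words = set(re.findall(r"[a-z_]+", sql.lower()))
    return words & HALLUCINATED_COLUMNS


def match_topics(rows: List[Dict], query: str) -> List[Dict]:
    """Rank topics by how many curated keywords occur in the query text."""
    text = query.lower()
    scored = []
    for row in rows:
        hits = [kw for kw in row["keywords"] if kw.lower() in text]
        if hits:
            scored.append((len(hits), row))
    # Most keyword hits first; ties broken by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]["id"]))
    return [row for _, row in scored]


# Invented sample rows standing in for the parsed output of the fetch query.
rows = json.loads("""[
  {"id": 1, "name": "Cooking", "description": "Home cooking",
   "keywords": ["cooking", "5-ingredient meals"]},
  {"id": 2, "name": "PC Games", "description": "PC gaming",
   "keywords": ["pc games", "gaming pc"]}
]""")

matched = match_topics(rows, "Find channels about 5-ingredient meals and cooking hacks")
print([row["name"] for row in matched])
```

Running `hallucinated_columns_in` on a drafted query before sending it is a cheap way to catch the regression described above without spending a round-trip.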
  ### `auth_user` (Django Users)
 
  Standard Django user table. Used for owner lookups.
@@ -1,6 +1,19 @@
  ---
  name: tl-report-builder
- description: Build TL saved-report configurations from natural-language requests. Generates a valid JSON campaign schema (filterset + columns + widgets + pagination) for the four report types — content (1), brands (2), channels (3), sponsorships (8) — plus a few key takeaway insights about the result. Use when a TL team member asks to build, create, or save a report. Triggers on phrasings like "build a report", "create a campaign", "make a report on", "save a dashboard for", "find me channels for outreach", "all sponsorships for X", "report on Y brand", "channels matching Z".
+ description: |
+ Build TL reports from natural-language requests. Produces an in-chat preview (sample-rows table + filter summary + takeaways) by default, or auto-saves a TL report when the user's wording is explicit about it ("save", "create the report", "make a campaign for me to come back to"). Covers the four report types: content/videos (1), brands (2), channels (3), sponsorships/deals (8).
+
+ **Use this skill — NOT `tl-cli:tl` — for ANY request that returns a list of channels, videos, brands, or sponsorships with filters applied**, even when the user doesn't say "report" or "campaign". This is the canonical path for list-shaped requests.
+
+ Triggers on every variant of "list me / find me / show me / give me / pull me / build me / make me X with filters Y", including:
+ - **Channels**: "Find me gaming channels with 100K+ subs", "show me TPP fintech creators in MSN", "channels we haven't pitched to <brand>", "look-alike channels to X", "non-MSN travel channels", "build me a list of <niche> creators", "channels matching <criteria>".
+ - **Brands**: "all brands flagged as Managed Services", "brand activity report for these specific brands: ...", "brands sponsoring <channel> in the past 6 months", "competitor brands of X".
+ - **Sponsorships / deals**: "**show me partnerships from last quarter** for <niche> creators", "Q1 2026 sold sponsorships for personal investing", "all proposal_approved deals owned by <user>", "list sponsorships with status sold and send_date 2026-05-07", "sponsorships for channel <name>".
+ - **Videos / uploads**: "videos sponsored by <brand>", "wellness videos but exclude anything sponsored by Nike or Adidas".
+
+ Save-intent variants ("save a campaign of …", "create the report …", "make a TL report for …") trigger auto-save; everything else previews. Off-taxonomy keywords ("crypto / Web3"), brand-exclusion logic ("not pitched to X"), demographic floors ("US audience ≥30%"), TPP/MSN scoping, and competitive-pitch shapes are all this skill's job — not the general `tl-cli:tl` data-analyst skill.
+
+ **Skip this skill** only for: counts, metrics, trends, single-record show-by-ID lookups, raw exploratory queries, or analytical questions that aren't shaped as "give me a list". Those go to `tl-cli:tl`.
  ---
 
  # TL Report Builder Skill
@@ -36,6 +49,43 @@ Internally this skill thinks in phases (1–4), report types (1, 2, 3, 8), tool
  - Report-type numbers (`Type 1`, `Type 3`, `Type 8`) — say "channels report", "deals report", etc.
  - Identifier-shaped names from `tools/` and `references/` — anything that reads like a code symbol (the `snake_case` tool / step / metadata names defined in this skill, the JSON keys you see in `references/*_schema.json`, internal data-layer model names). If a term reads like a programmer typed it, it doesn't belong in front of the user.
  - JSON-y decision codes and classification codes the user has no reason to recognize (verdict strings emitted from validation, count-bucket labels emitted alongside them — anything that's a literal value in the validation output JSON).
+ - **Internal mechanism phrases** that describe HOW the skill works rather than WHAT the user is getting. Forbidden examples (verbatim regression markers — never say any of these to the user):
+ - *"held in working memory"*
+ - *"per the skill's rules"*
+ - *"in working memory"*
+ - *"the campaign config (held in working memory; not echoed to chat per the skill's rules)"*
+ - *"campaign config JSON"* / *"the config"* / *"the JSON"* — when describing what the report contains, name the FILTERS, not the storage shape
+ - *"per the policy"* / *"the orchestration"* / *"the FilterSet"*
+ - **`Campaign` / `Campaign #N` / `campaign_id`** — these are Django model jargon. The TL platform calls these **reports**. Always say "**TL report**" / "**report #N**" / "**report id**" in user-facing text (chat replies, save-success messages, the save tail). The internal data model is named `Campaign` for historical reasons; the user has never heard that name. *"Report saved. … (Campaign #23801)"* is a leak — write *"TL report saved. … (report #23801)"* instead.
+ - **`reach` / "by reach" / "Reach"** — internal SQL column name. The user-facing term is **subscribers** (canonical mapping lives in `thoughtleaders-skills/tl-data/references/business-glossary.md`: *"AMs say subscribers, SQL says reach"*). Use `reach_from` / `reach_to` when emitting the FilterSet, but **always narrate as "subscribers"** — in sample-table column headers ("Subscribers", not "Reach"), in distribution stats ("By subscribers: 1M+ → 2", not "By reach: …"), in filter summaries ("only channels with 100K+ subscribers"), in takeaways. *"By reach: 1M+ → 2 · 100K–1M → 57 · 10K–100K → 128"* is a leak.
+ - **Raw internal IDs appended to names in sample-table rows** — e.g. `"Crypto Journey (id 1178513)"`, `"Altcoin Daily (id 28151)"`, `"FRÉ Skincare (brand_id 14625)"`. The numeric ID is implementation detail; the user is browsing channels/brands by name, not by primary key. **Show the name only**, hyperlinked per rule 20a (`[Crypto Journey](https://app.thoughtleaders.io/youtube/crypto-journey)`). The Markdown link is the addressable identifier — no raw ID needed alongside it. **Exception**: include the ID inline only when the user explicitly asked for it (*"give me the IDs too"*, *"include channel IDs"*) or when there's a real disambiguation case (two same-named channels in the sample). Otherwise the parenthetical `(id N)` is noise. *"Crypto Journey (id 1178513)"* in a normal sample row is a leak; write *"[Crypto Journey](https://app.thoughtleaders.io/youtube/crypto-journey)"*.
+
+ These are internal terms from this SKILL.md. They describe the skill's own implementation, not the user's report. If you find yourself about to type any of these, stop and re-write the sentence as a plain-English summary of what the report does (see "Filter summary" pattern below).
+
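Reviewer sketch (not part of the package): the leak list above is specific enough to be checked mechanically before a reply is sent. The patterns below cover only a few of the regression markers quoted in this hunk; `LEAK_PATTERNS`, its labels, and `find_leaks` are hypothetical names, and the regular expressions are one possible reading of the rules rather than the skill's actual enforcement.

```python
import re
from typing import List

LEAK_PATTERNS = {
    # "Campaign #N" and "campaign_id" are data-model jargon; users see "report".
    "campaign jargon": re.compile(r"\bcampaign\s*#\d+|\bcampaign_id\b", re.IGNORECASE),
    # "By reach: ..." should be narrated as subscribers.
    "reach wording": re.compile(r"\bby reach\b", re.IGNORECASE),
    # "(id 1178513)" or "(brand_id 14625)" appended to a name.
    "raw id": re.compile(r"\((?:id|brand_id)\s+\d+\)"),
    # Phrases describing how the skill works rather than what the user gets.
    "mechanism phrase": re.compile(
        r"working memory|per the skill's rules|per the policy|the orchestration|the FilterSet",
        re.IGNORECASE,
    ),
}


def find_leaks(text: str) -> List[str]:
    """Return the labels of every leak pattern found in user-facing text."""
    return [label for label, pattern in LEAK_PATTERNS.items() if pattern.search(text)]


print(find_leaks("Report saved. (Campaign #23801)"))
print(find_leaks("TL report saved. (report #23801)"))
```

A check like this only catches literal phrasings, so it complements the rewrite guidance above and does not replace it.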
+ **Filter summary pattern** — when narrating WHAT the report (saved or previewed) actually contains, use **outcome-focused plain English**, not "the config". Translate each filter into a sentence describing what the user will see:
+
+ | Internal field | User-facing summary phrasing |
+ |---|---|
+ | `topics: [98]` | "results will be focused on the gaming/PC games topic" |
+ | `keywords` / `keyword_groups[]` | "results will be filtered to channels mentioning <keywords>" |
+ | `reach_from: 100000` | "only channels with 100K+ subscribers will be included" |
+ | `languages: ["en"]` | "only English-speaking channels will be included" |
+ | `creator_countries: ["US"]` | "results will be limited to US creators" |
+ | `min_demographic_usa_share: 50` | "only channels with strong US audiences will be included" |
+ | `channel_formats: [4]` | "only YouTube long-form video channels will be included" (omit if it's the default) |
+ | `msn_channels_only: true` | "results will be limited to MSN channels" |
+ | `is_tl_channel = TRUE` (resolved into `channels` M2M) | "results will be limited to TPP channels" |
+ | `outreach_email: "exists"` | "only outreach-ready creators (with email on file) will be included" |
+ | `tl_sponsorships_only: true` | "results will prioritise creators with proven TL sponsorship history" |
+ | `cross_references[].exclude_proposed_to_brand: ["Webull"]` | "channels already pitched to Webull will automatically be excluded" |
+ | `cross_references[].include_sponsored_by_mbn` | "results will be limited to creators MBN brands are working with" |
+ | `sort: "-reach"` | "results will be sorted by largest subscriber count first" |
+ | `sort: "-mentions_count"` | "results will be sorted by strongest sponsorship performance" |
+ | `start_date` / `end_date` / `days_ago` | "results will cover <date range / last N days>" |
+ | `columns: { ... }` (the chosen column set) | "outreach-ready columns will be included automatically" — don't list them by code-name; describe the focus (outreach / discovery / pricing / pipeline) |
+ | `widgets: [...]` | "performance widgets will be added automatically" — describe the focus, not the aggregator names |
+
+ Compose 4–7 of these into a short bulleted summary directly in chat. Use the user's own brand and keyword wording verbatim where possible. Don't list every filter — only the ones that meaningfully shape what the user will see.
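Reviewer sketch (not part of the package): the mapping table above translates naturally into a small lookup. The function below handles a handful of rows, reusing the table's wording, and caps the summary length. The flat dictionary shape of the filter set and the names `summarize_filters` and `format_count` are assumptions made for illustration only.

```python
from typing import Any, Dict, List


def format_count(n: int) -> str:
    """Render 100000 as '100K' and 1000000 as '1M'; other values get thousands separators."""
    if n >= 1_000_000 and n % 1_000_000 == 0:
        return f"{n // 1_000_000}M"
    if n >= 1_000 and n % 1_000 == 0:
        return f"{n // 1_000}K"
    return f"{n:,}"


def summarize_filters(filters: Dict[str, Any], max_items: int = 7) -> List[str]:
    """Turn internal filter fields into plain-English bullets, in table order."""
    bullets: List[str] = []
    if "reach_from" in filters:
        # The SQL column is `reach`, but the narration always says subscribers.
        bullets.append(
            f"only channels with {format_count(filters['reach_from'])}+ subscribers will be included"
        )
    if filters.get("languages") == ["en"]:
        bullets.append("only English-speaking channels will be included")
    if filters.get("creator_countries") == ["US"]:
        bullets.append("results will be limited to US creators")
    if filters.get("msn_channels_only"):
        bullets.append("results will be limited to MSN channels")
    if filters.get("outreach_email") == "exists":
        bullets.append("only outreach-ready creators (with email on file) will be included")
    if filters.get("sort") == "-reach":
        bullets.append("results will be sorted by largest subscriber count first")
    return bullets[:max_items]


summary = summarize_filters({"reach_from": 100000, "msn_channels_only": True, "sort": "-reach"})
for line in summary:
    print(f"- {line}")
```

Keeping the phrasing in one place makes it harder for internal field names such as `reach_from` to slip into a reply.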
39
89
 
40
90
  **Allowed**: specific channel / brand / video / advertiser names from the data, the user's own keywords, plain words like "results", "matches", "sample", "noise", "filter", "search", "report", "column", "chart". Plain-English words that happen to coincide with an internal label *as English* (e.g. "the result is narrow", "a normal-size result") are fine — the test is whether the user reads it as English or as a code symbol. The same word as `count_classification: "narrow"` is forbidden; in "the result is narrow" it's fine.
41
91
 
@@ -55,13 +105,13 @@ Internally this skill thinks in phases (1–4), report types (1, 2, 3, 8), tool
55
105
  | Phase 4 — widget builder | "Choosing the charts and dashboards…" |
56
106
  | Phase 4 — final composition | "Putting the final report together…" |
57
107
  | Preview path (default) — show takeaways + sample table | "Here's what matches…" / "Found N channels — top by reach:" / "Top videos that match:" |
58
- | Preview tail (ambiguous middle — close with this) | *"If you want this saved as a campaign you can come back to, say save."* |
59
- | Save step (write JSON to `/tmp/<name>.json`, then `tl reports create --config-file /tmp/<name>.json --yes`) | "Saving the report…" |
60
- | Save success (only after the CLI command returns success) | "Report saved." + link from the CLI response (do NOT echo the JSON config back; do NOT say "saved as <path>.json" — the temp file is transport, not the deliverable) |
108
+ | Preview tail (ambiguous middle — close with this) | *"If you want this saved as a TL report you can come back to, say save."* |
109
+ | Save step (resolve portable temp path → write JSON → verify file exists → `tl reports create --config-file <path> --yes`) | "Saving the report…" |
110
+ | Save success (only after the CLI command returns success) | "**TL report saved.**" + link from the CLI response, optionally with "(report #N)" using the campaign_id rendered as a *report* number — never as "Campaign #N". Do NOT echo the JSON config back; do NOT say "saved as <path>.json" (temp file is transport, not the deliverable); do NOT say "Campaign #..." (Django model jargon — say "report"). |
61
111
  | Save failure | "Couldn't save the report: <plain-English reason>" — surface the CLI's stderr verbatim if it's user-readable, otherwise summarise |
62
112
  | User says "save" / "yes save it" / "save it" after a preview | "Saving…" — re-use the config from working memory; do NOT re-run Phases 1–4 |
63
113
  | Mode B follow-up (looks_wrong) | "The top results don't look right — here are your options…" |
64
- | Mode C (3 retries exhausted) | "I couldn't build a sensible result for this — here's what I tried…" |
114
+ | Mode C (1 retry exhausted on empty/too_broad; data shape genuinely doesn't fit) | "I couldn't build a sensible result for this — here's what I tried…" |
65
115
 
66
116
  **Report-type → user-facing label**:
67
117
 
@@ -108,7 +158,7 @@ Here's a real request and the gap between leaky narration (the failure mode) and
108
158
  >
109
159
  > Putting the final report together now — outreach-focused columns (deal history, contact info, demographic fit) and the standard channels-report charts. *(saving…)*
110
160
  >
111
- > **Report saved.** [Investing — unpitched investing channels](https://app.thoughtleaders.io/...) (Campaign #12345)
161
+ > **TL report saved.** [Investing — unpitched investing channels](https://app.thoughtleaders.io/...) (report #12345)
112
162
  >
113
163
  > **Worth knowing**:
114
164
  > - 668 channels already pitched to investing.com are excluded.
@@ -151,9 +201,9 @@ Same architecture, different intent. The prompt is exploratory; the policy says
151
201
  > - Topic anchored on the curated investing keyword set; the spot-check looked clean (1 Pokémon channel slipped in via a "trading" keyword match — easy to filter further if needed).
152
202
  > - Sort is most-recently-active first so dormant channels don't crowd the top.
153
203
  >
154
- > *If you want this saved as a campaign you can come back to, just say save.*
204
+ > *If you want this saved as a TL report you can come back to, just say save.*
155
205
 
156
- If the user replies *"yes save it"* or *"save"* → run the save step (write to `/tmp/`, invoke `tl reports create --config-file …`) using the **same config that's already in working memory**. Don't re-run Phases 1–4. The follow-up reply is just the takeaways + saved-report link.
206
+ If the user replies *"yes save it"* or *"save"* → run the save step (resolve a portable temp path → write → verify → invoke `tl reports create --config-file <that-exact-path> --yes`; see Save-or-preview policy step 1+2 for the full mechanics) using the **same config that's already in working memory**. Don't re-run Phases 1–4. The follow-up reply is just the takeaways + saved-report link.
157
207
 
158
208
  What changes between save-mode and preview-mode:
159
209
 
@@ -162,7 +212,7 @@ What changes between save-mode and preview-mode:
162
212
  | Phases 1–4 run? | Yes | Yes (identical) |
163
213
  | Campaign row in DB? | Yes | No |
164
214
  | What ends in chat | Takeaways + saved-report URL | Takeaways + sample table + "say save" tail |
165
- | `/tmp/<slug>.json` written? | Yes (transport for `tl reports create`) | No (config stays in working memory) |
215
+ | Portable-temp transport file (`<system-temp>/tl-report-builder-<slug>.json`) written? | Yes (transport for `tl reports create`) | No (config stays in working memory) |
166
216
  | `tl reports create` invoked? | Yes (`--config-file <path> --yes`) | No |
167
217
  | Campaign-config JSON in chat? | **No** | **No** |
168
218
 
@@ -201,7 +251,7 @@ USER_QUERY
201
251
  │ – db_count → threshold classify │
202
252
  │ – db_sample (LIMIT 10) → sample_judge │
203
253
  │ – Decide: proceed | retry | alternatives | fail │
204
- │ – Retry with feedback to T1/T2 (cap 3) on empty/too_broad │
254
+ │ – Retry with feedback to T1/T2 (cap 1) on empty/too_broad │
205
255
  │ │
206
256
  │ ┌─── Conditional Tool Invocation (within Phase 2 only) ─────────────┐ │
207
257
  │ │ T1 tools/topic_matcher.md — fires per criteria │ │
@@ -224,7 +274,7 @@ USER_QUERY
224
274
  │ timed out — confirm narrowing with the user │
225
275
  │ • Validation: sample_judge returned looks_wrong → Mode B prompt │
226
276
  │ (save anyway / refine / cancel) │
227
- │ • Validation: 3 retries exhausted on empty/too_broad → fail mode │
277
+ │ • Validation: 1 retry exhausted on empty/too_broad → alternatives │
228
278
  └──────────────────────────────────┬──────────────────────────────────────┘
229
279
  │ validated schema
230
280
 
@@ -309,10 +359,32 @@ There is no fifth phase. Phase 4's output IS the deliverable. The skill itself n
309
359
  > - **Default to preview**, then close the reply with one line: *"If you want to save this as a campaign you can come back to, just say save."*
310
360
  > - Conservative move — never persist on ambiguity. If the user wanted it saved they will say so.
311
361
  >
312
- > **Save mechanics** (when save is triggered): two strict steps. **Step 1 alone is not the save** — the file write is just transport for step 2. Saying "Saved as foo.json" or "Saved to <path>" after only doing step 1 is a regression bug.
362
+ > **Save mechanics** (when save is triggered): three strict steps. **Steps 1–2 alone are not the save**: the file write is just transport for step 3. Saying "Saved as foo.json" or "Saved to <path>" after only staging the file is a regression bug.
363
+ >
364
+ > 1. **Resolve a portable temp path FIRST** — never hardcode `/tmp/`. Use `Bash` to query the system temp directory at runtime so the path works on Linux, macOS, AND Windows:
365
+ > ```bash
366
+ > python -c "import tempfile, os; print(os.path.join(tempfile.gettempdir(), 'tl-report-builder-<short-slug>.json'))"
367
+ > ```
368
+ > Capture the printed path verbatim. On Linux/macOS this resolves to something like `/tmp/tl-report-builder-foo.json`; on Windows it resolves to `C:\Users\<user>\AppData\Local\Temp\tl-report-builder-foo.json`. **Hardcoding `/tmp/` on Windows silently fails** — the Write tool may report success but the file lands somewhere the CLI can't read in step 3. The `python -c "import tempfile..."` pattern works on every platform Claude Code runs on.
369
+ > 2. **Write the JSON to that resolved path via the `Write` tool, then verify it landed.** Immediately after the write, run `Bash`:
370
+ > ```bash
371
+ > test -f "<resolved-path>" && wc -c "<resolved-path>" || echo "MISSING"
372
+ > ```
373
+ > If the verification reports `MISSING` (or the byte count is 0), STOP and surface a clean error to the user — **do NOT instruct them to save it themselves** (that would conflict with rule 15's ban on user self-save fallbacks). Phrase it as a bug-report-shaped message acknowledging the save couldn't run, with the JSON attached as a recovery artifact (not as a save instruction):
313
374
  >
314
- > 1. **Write the JSON to `/tmp/`** via the `Write` tool. The path **MUST** be under the system temp directory (`/tmp/` on Linux/macOS, `%TEMP%` / `$TMPDIR` on whatever platform the agent is running on). Use a name like `/tmp/tl-report-builder-<short-slug>.json`. **Never write to the user's current working directory or any project path** — the file is a transport, not a deliverable, and leaving `foo_report.json` in the user's repo or cwd pollutes their workspace. If the system temp dir isn't writable, fall back to another temp-shaped location, never to cwd.
315
- > 2. **Invoke `tl reports create --config-file <that-same-tmp-path> --yes`** via the `Bash` tool. This is what actually saves the report. Read the CLI's response: success returns a `campaign_id` and `report_url` to echo to the user; failure returns a non-zero exit and an error message — surface that error verbatim, do NOT silently mark the report as saved.
375
+ > ```
376
+ > Couldn't save the report: the temp directory at <resolved-path>
377
+ > isn't writable, so I couldn't stage the config for the CLI. This
378
+ > is a bug in the skill / environment, not something you need to do.
379
+ >
380
+ > The validated config is below as a recovery artifact in case you
381
+ > want to retry from a different machine. I haven't sent it to TL.
382
+ >
383
+ > <inline JSON in a code block, fenced>
384
+ > ```
385
+ >
386
+ > Do not invoke the CLI in this branch; that would just produce a confusing "No such file or directory" error. The inline JSON is a fallback **artifact**, not an instruction — the user is not expected to run anything themselves.
387
+ > 3. **Invoke `tl reports create --config-file <that-same-resolved-path> --yes`** via the `Bash` tool. This is what actually saves the report. Read the CLI's response: success returns a `campaign_id` and `report_url` to echo to the user; failure returns a non-zero exit and an error message — surface that error verbatim, do NOT silently mark the report as saved. **Use the EXACT same path string** the verification step in (2) confirmed; don't paraphrase or convert slashes between Unix/Windows styles. Never write to the user's current working directory or any project path — the file is a transport, not a deliverable.
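The three save steps can be sketched in Python as follows. This is an illustration only: the skill itself performs the write through the `Write` tool and the checks through `Bash`, and `stage_config` and `save_command` are hypothetical helper names. The CLI flags are the ones quoted in step 3.

```python
import json
import os
import tempfile


def stage_config(config: dict, slug: str) -> str:
    """Steps 1-2: resolve a portable temp path, write the JSON, verify it landed."""
    # Step 1: ask the OS for its temp directory instead of hardcoding /tmp/
    path = os.path.join(tempfile.gettempdir(), f"tl-report-builder-{slug}.json")
    # Step 2: write, then confirm the file exists and is non-empty
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh)
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        raise RuntimeError(f"MISSING: {path}")
    return path


def save_command(path: str) -> list[str]:
    """Step 3: argv for the CLI call; the verified path string is passed through unchanged."""
    return ["tl", "reports", "create", "--config-file", path, "--yes"]
```

Passing the result of `save_command` to `subprocess.run` as an argument list (no shell) sidesteps slash conversion and quoting, and a non-zero return code maps to the save-failure branch.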
316
388
  >
317
389
  > **Preview mechanics** (default): show **the sample-rows table FIRST**, then takeaways, then the closing "say save" tail. The table is the deliverable in preview mode — takeaways describe it, but the table itself is what the user asked for. **Skipping the table is a regression bug** (Phase 4 hard rule 14). Use the `db_sample` rows Phase 2 already collected (top 5–10 by sort key) and format as a tight Markdown table with 2–4 type-relevant columns:
318
390
  > - Type 3 (channels): `Channel | Subscribers | Last published`
@@ -320,11 +392,15 @@ There is no fifth phase. Phase 4's output IS the deliverable. The skill itself n
320
392
  > - Type 2 (brands): `Brand | Mentions | Channels`
321
393
  > - Type 8 (deals/sponsorships): `Channel | Brand | Status | Send date`
322
394
  >
323
- > After the table, give 2–4 takeaways (count, niche fit, noise warnings, sort note). Then close with a one-liner: *"If you want this saved as a campaign you can come back to, say save."* (Skip the closing line only when the user's prompt was clearly purely informational like "are there any …".)
395
+ > After the table, give 2–4 takeaways (count, niche fit, noise warnings, sort note). Then close with the **save tail**: *"If you want this saved as a TL report you can come back to, just say save."*
396
+ >
397
+ > **The save tail is MANDATORY in every preview reply** — including when the user's wording sounds informational ("find me…", "show me…", "are there any…"). The previous "skip when purely informational" exemption was too easy to over-apply: a real run for *"Find creators for FRÉ Skincare — should be female creator, US-based, majority female audience, filter out everyone already submitted, include a CPM column, min 2,000 projected views"* produced a polished preview with notes-for-the-AM and follow-up refinement options — but no save tail — even though the prompt was clearly designing a TL report (specific filters, custom column, brand-exclusion intent). Cost of including the tail when the user didn't want it: one ignorable line. Cost of skipping it when they did: they don't know the option exists. Always include it.
398
+ >
399
+ > If the user's preview-intent prompt happens to also include implicit save signals (specific column requests, structural design choices, request for a "list" they intend to act on), append a slightly more directive variant of the tail: *"If you want this as a saved TL report, just say save."* Same outcome; the tail is always there.
324
400
  >
325
- > **The JSON config never appears in chat in either path.** In save mode it's in the `/tmp/` file; in preview mode it stays in working memory. JSON in chat is implementation noise and a regression we already shipped a fix for once.
401
+ > **The JSON config never appears in chat in either path.** In save mode it lives in the portable-temp transport file; in preview mode it stays in working memory. JSON in chat is implementation noise and a regression we already shipped a fix for once.
326
402
  >
327
- > **Edits** to a saved report use `tl reports update <id> '<json>'` — same shell-quoting caveat as save: when the patch contains apostrophes, write to a `/tmp/` file and use `tl reports update <id> "$(cat /tmp/<patch>.json)"`. Don't tell users to paste JSON into the platform UI; that's an obsolete pre-v0.6.12 fallback.
403
+ > **Edits** to a saved report use `tl reports update <id> '<json>'` — same shell-quoting caveat as save: when the patch contains apostrophes, write to a portable temp file (resolved at runtime per step 1) and use `tl reports update <id> "$(cat <that-path>)"`. Don't tell users to paste JSON into the platform UI; that's an obsolete pre-v0.6.12 fallback.
328
404
  >
329
405
  > **Reads via `tl db es` / `tl db pg` (engine routed by report type — see Step 2.V1), writes via the CLI** is the architectural split.
330
406
 
@@ -398,8 +474,41 @@ Each tool fires only when its criteria are explicitly met (no automatic / specul
398
474
  ### T1 — `tools/topic_matcher.md`
399
475
  **Fires when**: `ReportType ∈ {1, 2, 3}` AND USER_QUERY mentions a topic concept that could plausibly map to a curated topic in `thoughtleaders_topics`.
400
476
  **Skipped when**: `ReportType == 8` (sponsorships don't use topic matching at the SQL level) OR USER_QUERY is purely an entity-name lookup ("emails for these channels").
477
+ **How to fetch the live topics**: see the `tl-cli:tl` skill's Postgres-schema reference — [`tl/references/postgres-schema.md` → `thoughtleaders_topics`](../tl/references/postgres-schema.md#thoughtleaders_topics-curated-topic-taxonomy). That's the canonical home for the fetch query, column list, and "do not guess" regression markers. Don't restate the SQL here.
401
478
  **Output**: per-topic verdicts (strong/weak/none) + summary. If `summary.strong_matches` non-empty, the topic's curated `keywords[]` array drives the FilterSet's `keywords` field (with per-position `content_fields` set via `keyword_content_fields_map` when a keyword targets a non-default match surface). Phase 2 may also emit the matched topic IDs directly via the FilterSet's `topics` field — both paths are valid; pick by intent.
402
479
 
480
+ **Narrow-first FilterSet assembly (mandatory — applies to topic-strong + keyword_research paths both)**: Phase 2c MUST assemble the FilterSet with the **narrowest viable shape first**, then validate. Expand only if the count is below the type's narrow threshold. The two narrowing levers, **ranked by impact on noisy-niche / multilingual runs**:
481
+
482
+ **Lever 1 (HIGHEST impact) — Field selection (Type 3 / channel discovery)**
483
+
484
+ The initial FilterSet's `content_fields` for Type 3 MUST be `["channel.channel_name", "channel_description"]` ONLY. Do NOT include `channel_description_ai` or `channel_topic_description` on the first cycle. (Use the schema enum values verbatim — these are the platform-recognised content-field names from `intelligence_filterset_schema.json`. The FilterSet rejects unknown values.)
485
+
486
+ The mechanism: the two AI-summarised content fields (`channel_description_ai`, `channel_topic_description`) catalogue **every topic a channel has ever touched** — including incidental mentions, format crossovers, and adjacent-niche tags. A channel whose primary niche is X but that ran one video about Y will still match queries against `channel_topic_description` for Y. For channel-discovery intent, that's pure noise: you wanted channels *about* Y, not channels that *once mentioned* Y. The channel's own `channel.channel_name` + `channel_description` answer the channel-discovery question (*"is this channel ABOUT the niche?"*); the AI-summarised fields answer a strictly broader question (*"has this channel ever mentioned the niche?"*) that surfaces too many false positives at discovery time.
487
+
488
+ Mechanism implication — **once `content_fields` is right, even a broad keyword set converges; even a tight keyword set produces noisy results with the AI-summarised fields included.** Field selection is the bigger dial; keyword pruning is the fine-tune. Restricting fields first reaches a clean count in one cycle; pruning keywords without addressing the AI-fields noise source typically takes 2–3 narrowing cycles.
489
+
490
+ *Regression-marker anchor — multilingual niche-language preview (LATAM cooking, several thousand candidates)*: the cycle that finally converged kept the same keyword set as the previous noisy cycle but reduced `content_fields` from 4 fields (incl. the two AI-summarised ones) to 2 (`channel.channel_name` + `channel_description` only). Result went from ~4,000 noisy → ~1,350 signal-rich (80% on-target) in a single pass — bigger gain than the prior keyword-pruning cycles delivered combined. The principle is general (applies to fitness/wellness, beauty, aviation, any niche-discovery preview where channels cross over into adjacent topics); LATAM cooking is the run that calibrated it.
491
+
492
+ Expand to AI fields only if `db_count` is at or below the narrow threshold (Type 3: ≤ 50 channels); that is the expansion path that re-opens the broader recall, not the default starting shape.
493
+
494
+ **Lever 2 — Keyword selection**
495
+
496
+ For topic-strong matches: include topic.keywords[] entries that fit the user's language scope. Concretely, for a multilingual prompt (LATAM, EU, Asia-Pacific, etc.) include **5–8 native-language head terms** — NOT just 2–3 English head terms (which would lose recall in non-English markets).
497
+
498
+ Drop **generic-overlap terms** — head terms that match the niche literally but also surface high volumes of adjacent-niche channels. Heuristic: any single-word generic that a lifestyle / family / entertainment channel might use about the niche in passing is a generic-overlap term. Keep the niche-specific terms — multi-word phrases or native-language vocabulary that lifestyle channels wouldn't casually use.
499
+
500
+ *Concrete example from the LATAM cooking calibration run*: in `topic.keywords` for the Cooking topic, the head terms include both niche-specific Spanish/Portuguese phrases (`"recetas"`, `"cocina"`, `"receitas"`, `"culinária"`, `"gastronomia"`) and generic-overlap terms (`"food"`, `"chef"`, `"comida"`). The generic-overlap terms each matched several hundred extra lifestyle/family channels that mention food in passing; dropping them tightened the result from ~5,700 channels at 60% signal to ~4,050 at higher signal-density without sacrificing genuine cooking creators. Same pattern applies in other niches — fitness ("body", "training" are generic; "pilates", "calisthenics" are niche-specific); finance ("money", "rich" generic; "ETF", "options", "yield farming" niche-specific); etc.
501
+
502
+ For `keyword_research` outputs: include the `core_head` tier (2–4 terms) **plus** the `sub_segment` tier (3–6 terms) — i.e. the upper two tiers, ~5–10 terms total. The `long_tail` tier is held back as expansion fuel.
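A minimal sketch of Lever 2, assuming keywords arrive as plain lists and the `keyword_research` output is a dict keyed by tier name (that shape is an assumption; only the tier names `core_head`, `sub_segment`, and `long_tail` come from the text). Both helper names are hypothetical.

```python
def prune_generic_overlap(keywords: list[str], generic: set[str]) -> list[str]:
    """Drop generic-overlap head terms, keeping niche-specific ones in order."""
    return [kw for kw in keywords if kw.lower() not in generic]


def initial_research_keywords(research: dict[str, list[str]]) -> list[str]:
    """Take the upper two tiers; long_tail is held back as expansion fuel."""
    return research.get("core_head", []) + research.get("sub_segment", [])
```

With the cooking example, `prune_generic_overlap(["recetas", "food", "cocina", "chef"], {"food", "chef", "comida"})` keeps only `recetas` and `cocina`. Which terms count as generic is a per-niche judgment call, not something this helper decides.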
503
+
504
+ **Expansion trigger — Type 3 (CHANNELS) only** (strictly one validation cycle, no second attempt): if the initial Type 3 FilterSet's `db_count` is `narrow` or `very_narrow` per the Step 2.V3 threshold table (`db_count` ≤ 50 channels), do **one** expansion step — add `channel_description_ai` and `channel_topic_description` to `content_fields` (the schema enum values). This opens recall to the AI-summarised surface. After this single expansion cycle, if the count is still narrow OR if it overshoots, do NOT compose another FilterSet — emit `decision: "alternatives"` and surface to the user with the count + the failing shape. Further widening (keyword changes, threshold relaxation) is the user's call via the alternatives prompt, not skill-side iteration.
505
+
506
+ Why one and not two: field-set expansion is the high-leverage lever (re-opens AI-summarised recall); a follow-up keyword-set expansion adds 30–90s of LLM + ES round-trip for marginal extra recall, contradicting the speed-up goal. If field-expansion alone doesn't reach the narrow threshold, the underlying issue is data sparsity in this niche — a second skill-side cycle won't fix that; user judgment will.
507
+
508
+ **For Type 1 (CONTENT) and Type 2 (BRANDS), this expansion rule does NOT apply.** The mechanism above is Type 3-specific because the lever it pulls (the AI-summarised channel-level fields `channel_description_ai` / `channel_topic_description`) only exists in Type 3's default `content_fields`. Types 1 and 2 default to video-level fields (`content`, `title`, `transcript` per the schema's `_tl_default_by_report_type`) which have no AI-summary surface to expand into. When a Type 1 / Type 2 FilterSet hits `narrow` / `very_narrow` on the same Step 2.V3 thresholds, **route directly to `decision: "alternatives"`** — there is no skill-side expansion to try first. The user's refinement options (via the alternatives prompt) are the only widening path.
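The one-cycle expansion rule can be sketched as a pure function. The field names are the schema enum values quoted above; `expand_once` and the dict shape of the FilterSet are assumptions made for illustration.

```python
from typing import Optional

NARROW_FIELDS = ["channel.channel_name", "channel_description"]
AI_FIELDS = ["channel_description_ai", "channel_topic_description"]


def expand_once(filterset: dict, report_type: int, db_count: int) -> Optional[dict]:
    """Return an expanded copy for Type 3 narrow counts, else None (no expansion)."""
    fields = filterset.get("content_fields", [])
    already_expanded = any(f in fields for f in AI_FIELDS)
    # Only Type 3 with a very_narrow / narrow count (1-50) may expand, and only once
    if report_type != 3 or not (1 <= db_count <= 50) or already_expanded:
        return None
    # Build a new dict so the caller's FilterSet is left untouched
    return {**filterset, "content_fields": list(fields) + AI_FIELDS}
```

A `None` result means there is nothing left to try on the skill side: Types 1 and 2 with a narrow count, and a second Type 3 attempt, go to `decision: "alternatives"`, while `empty` and `too_broad` counts follow the Step 2.V5 retry path instead.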
509
+
510
+ **Why field-selection-first matters more than keyword-count-first**: the historical broad-first pattern fails specifically because cycles 1 and 2 typically tighten keywords without addressing the AI-fields noise source — only when fields finally get tightened (often cycle 3 or later) does the result converge. Doing fields first (lever 1) reaches the converged shape immediately; the keyword-count axis (lever 2) is a second-order adjustment that fine-tunes within an already-clean field set. The calibration run that proved this in production (LATAM cooking, ~3 minutes wasted in cycles 1–2) is documented above; the principle applies to any niche-discovery preview where AI-summarized fields cross over into adjacent topics.
511
+
403
512
  ### T2 — `tools/keyword_research.md`
404
513
  **Fires when**: `ReportType ∈ {1, 2, 3}` AND `topic_matcher.summary.strong_matches.length == 0` AND no entity-name anchor is present in USER_QUERY (i.e., the user did not name specific channels or brands, and did not use look-alike phrasing like "similar to X").
405
514
  **Skipped when**: any of the above conditions fail. **Crucially, skipped when the user enumerates specific channels or brands** — those provide the filter anchor; keyword research is wasted work.
@@ -426,9 +535,13 @@ Each tool fires only when its criteria are explicitly met (no automatic / specul
426
535
 
427
536
  ### Phase 2 validation sub-tool
428
537
 
429
- **`tools/sample_judge.md`** — fires inside Phase 2's validation step.
430
- **Fires when**: `ReportType ∈ {1, 2, 3}` AND `db_count` classification is `narrow` / `normal` / `broad` (i.e., not `empty` and not `too_broad`; those go straight to retry without sample inspection).
431
- **Skipped when**: type 8 (deal sample shape ≠ channel sample shape) OR `db_count` was `empty` / `too_broad` (retry path).
538
+ **`tools/sample_judge.md`** — fires inside Phase 2's validation step (Step 2.V4).
539
+ **Fires when**: `ReportType ∈ {1, 2, 3}` AND the **post-V3-routing** `db_count` classification is `normal` (51–10000) or `broad` (10001–50000). "Post-V3-routing" means: the count actually routed to Step 2.V4 per the Step 2.V3 threshold table above — that is, either the **initial** count landed in `normal` / `broad`, OR (Type 3 only) the **post-Lever-1-expansion** count landed in `normal` / `broad`.
540
+ **Skipped when** any of:
541
+ - Type 8 — deal sample shape ≠ channel sample shape; sample_judge is not configured for sponsorship rows.
542
+ - Initial `db_count` is `empty` / `too_broad` — V3 routes to the Step 2.V5 retry path, not to sampling.
543
+ - Initial `db_count` is `very_narrow` / `narrow` AND `ReportType ∈ {1, 2}` — V3 routes directly to `decision: "alternatives"`; no sample inspection because there's no Lever-1 expansion path for Types 1/2.
544
+ - Initial `db_count` is `very_narrow` / `narrow` AND `ReportType == 3` — V3 routes to Lever 1 expansion first (one cycle, see Lever 1 above). If the **post-expansion** count is still `very_narrow` / `narrow` — or if it's `empty` / `too_broad` — it routes to `decision: "alternatives"` per the post-expansion table; sample_judge does NOT fire. (Post-expansion empty/too_broad does NOT re-enter V5 retry — the one-cycle cap is total, not per-direction.) sample_judge fires on Type 3 narrow-initial cases ONLY when the post-expansion count reclassifies to `normal` or `broad`.
432
545
  **Output**: `{ judgment: matches_intent | looks_wrong | uncertain, reasoning, noise_signals, matching_signals }`. `looks_wrong` triggers a Phase 2 follow-up to the user with structured options (save anyway / refine / cancel). `widget_builder` (Phase 4) only fires once Phase 2 emits a validated FilterSet.
433
546
 
434
547
  ### Phase 3 sub-tool
@@ -509,6 +622,8 @@ Compose an ES search body. The index is fixed server-side; the client only sends
509
622
 
510
623
  The `must` array carries one `multi_match` entry per keyword, combined per `keyword_operator`: AND → list every `multi_match` inside `must` (each is required); OR → move them to a sibling `should` array and add `"minimum_should_match": 1`. The example above shows the single-keyword case; multi-keyword extensions follow that pattern.
511
624
 
625
+ > ⚠️ **The `fields` array inside `multi_match` uses ES document field paths**, NOT the FilterSet `content_fields` enum values. ES uses `["name", "description", "ai.description", "ai.topic_descriptions"]`; the FilterSet enum (documented in [Lever 1 — Field selection](#lever-1-highest-impact--field-selection-type-3--channel-discovery) above and in [`intelligence_filterset_schema.json`](references/intelligence_filterset_schema.json)) uses `["channel.channel_name", "channel_description", "channel_description_ai", "channel_topic_description"]`. They're two different APIs touching the same underlying data — keep them distinct when composing the validation query vs the FilterSet emission.
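A sketch of the `keyword_operator` handling described above, covering only the `bool` clause rather than the full search body. The ES field paths are the ones listed in the warning; `build_bool` is a hypothetical helper name.

```python
ES_FIELDS = ["name", "description", "ai.description", "ai.topic_descriptions"]


def build_bool(keywords: list[str], operator: str = "AND", fields: list[str] = ES_FIELDS) -> dict:
    """One multi_match per keyword; AND goes in must, OR goes in should."""
    clauses = [{"multi_match": {"query": kw, "fields": list(fields)}} for kw in keywords]
    if operator == "OR":
        # At least one keyword has to match
        return {"bool": {"should": clauses, "minimum_should_match": 1}}
    # AND: every keyword is required
    return {"bool": {"must": clauses}}
```

For a narrow first cycle the caller would pass only `["name", "description"]` as `fields`, which is the ES-side counterpart of the two-field FilterSet shape.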
626
+
512
627
  For `db_count` on type 3: read `aggregations.distinct_channels.value`, NOT `total`. The `total` field counts documents (channel-doc duplicates included); `distinct_channels` counts unique channel IDs.
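As a small illustration, assuming the ES response has already been parsed into a dict (the exact shape of the `total` field is not shown here, so the sketch does not touch it), the count could be read like this; `read_db_count` is a hypothetical name:

```python
def read_db_count(response: dict) -> int:
    """Read the unique-channel count from the aggregation, not the document total."""
    return int(response["aggregations"]["distinct_channels"]["value"])
```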
513
628
 
514
629
  For `db_sample` (size 10) on type 3: same `query` body, plus:
@@ -1099,14 +1214,33 @@ For PG queries (type 8 or smoke-check fallback):
1099
1214
 
1100
1215
  ### Step 2.V3 — Apply threshold rules
1101
1216
 
1102
- | `db_count` | classification | next |
1217
+ The routing has two tables: **initial** (first validation cycle, all types) and **post-expansion** (Type 3 only, fires once after a Lever 1 expansion). Each cell maps a classified `db_count` to exactly one downstream action; no cell loops back to a previous step.
1218
+
1219
+ **Initial routing** (any type, first validation cycle):
1220
+
1221
+ | `db_count` | classification | next (Type 3) | next (Type 1 / 2) |
1222
+ |---|---|---|---|
1223
+ | 0 | `empty` | Step 2.V5 (retry — broaden) | Step 2.V5 (retry — broaden) |
1224
+ | 1–4 | `very_narrow` | **Lever 1 expansion** (one cycle: add `channel_description_ai` + `channel_topic_description`); on the post-expansion `db_count`, use the **post-expansion routing table below** — NOT this table | **`decision: "alternatives"`** — no skill-side expansion path for Type 1/2 |
1225
+ | 5–50 | `narrow` | **Lever 1 expansion** (same as above) | **`decision: "alternatives"`** |
1226
+ | 51–10000 | `normal` | Step 2.V4 (sample) | Step 2.V4 (sample) |
1227
+ | 10001–50000 | `broad` | Step 2.V4 (sample); proceed with narrow-suggest | Step 2.V4 (sample); proceed with narrow-suggest |
1228
+ | > 50000 | `too_broad` | Step 2.V5 (retry — narrow) | Step 2.V5 (retry — narrow) |
1229
+
1230
+ **Post-expansion routing** (Type 3 only, after the single Lever 1 expansion cycle):
1231
+
1232
+ | post-expansion `db_count` | classification | next |
1103
1233
  |---|---|---|
1104
- | 0 | `empty` | Step 2.V5 (retry: broaden) |
1105
- | 1–4 | `very_narrow` | Step 2.V4 (sample); proceed with warning |
1106
- | 5–50 | `narrow` | Step 2.V4 (sample); proceed with note |
1234
+ | 0 | `empty` | **`decision: "alternatives"`** (no second narrowing cycle; expansion is one cycle by Lever 1's definition) |
1235
+ | 1–4 | `very_narrow` | **`decision: "alternatives"`** |
1236
+ | 5–50 | `narrow` | **`decision: "alternatives"`** |
1107
1237
  | 51–10000 | `normal` | Step 2.V4 (sample) |
1108
1238
  | 10001–50000 | `broad` | Step 2.V4 (sample); proceed with narrow-suggest |
1109
- | > 50000 | `too_broad` | Step 2.V5 (retry: narrow) |
1239
+ | > 50000 | `too_broad` | **`decision: "alternatives"`** (no second narrowing cycle; the expansion overshot, so emit the count and let the user decide whether to add a stricter filter) |
1240
+
1241
+ Only `normal` and `broad` route post-expansion to sample inspection. Everything else — `empty`, `very_narrow`, `narrow`, `too_broad` — routes to `alternatives` with no further skill-side cycles. This is the unambiguous "one cycle by definition" cap from Lever 1, enforced as a table.
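The two routing tables can be expressed as a pair of small functions. This is a sketch of the tables as written, with hypothetical function names and short string labels standing in for the step names.

```python
def classify(db_count: int) -> str:
    """Map a count to its Step 2.V3 classification."""
    if db_count == 0:
        return "empty"
    if db_count <= 4:
        return "very_narrow"
    if db_count <= 50:
        return "narrow"
    if db_count <= 10000:
        return "normal"
    if db_count <= 50000:
        return "broad"
    return "too_broad"


def route(db_count: int, report_type: int, post_expansion: bool = False) -> str:
    """Return 'sample' (V4), 'retry' (V5), 'expand' (Lever 1) or 'alternatives'."""
    label = classify(db_count)
    if label in ("normal", "broad"):
        return "sample"
    if post_expansion:
        # After the single expansion cycle, everything else ends in alternatives
        return "alternatives"
    if label in ("empty", "too_broad"):
        return "retry"
    # very_narrow / narrow on the initial cycle
    return "expand" if report_type == 3 else "alternatives"
```

Because `route` never returns a label that points back to an earlier step, the "no cell loops back" property of the tables holds by construction.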
1242
+
1243
+ The pre-`d395ae2` initial table said `very_narrow` / `narrow` go to *"sample; proceed with warning/note."* That was the historical universal-flow behaviour (used pre-Lever-1). The new Lever 1 rule for Type 3 replaces "proceed with warning" with "expand once first"; the Type 1/2 rule replaces it with "alternatives." Old prose elsewhere in the file describing "proceed with warning" on narrow counts is stale relative to these tables — follow the tables.
1110
1244
 
1111
1245
  ### Step 2.V4 — Run sample query, then `sample_judge`
1112
1246
 
@@ -1132,7 +1266,7 @@ Decision based on judgment:
1132
1266
  - `looks_wrong` → `decision: "alternatives"` — Mode-B follow-up to user (save anyway / refine / cancel). Skip Phase 3 + Phase 4.
1133
1267
  - `uncertain` → `decision: "alternatives"` favoring "Refine" — surface ambiguity rather than ship silently.
1134
1268
 
1135
- ### Step 2.V5 — Retry orchestration (cap: 3)
1269
+ ### Step 2.V5 — Retry orchestration (cap: 1)
1136
1270
 
1137
1271
  When `db_count` is `empty` or `too_broad`, emit structured feedback to whichever upstream signal produced the failing FilterSet:
1138
1272
 
@@ -1141,11 +1275,19 @@ When `db_count` is `empty` or `too_broad`, emit structured feedback to whichever
1141
1275
  | Matched topics → `keywords` field | re-compose FilterSet with broader keywords from `topic.keywords[]` (beyond head) or relax operator AND→OR | `{issue, suggestion, previous_filterset}` |
1142
1276
  | `keyword_research` output | re-invoke T2 with the failing keywords + retry hint | `{issue, suggestion}` |
1143
1277
 
1144
- Cap at **3 retries total**. After 3, `decision: "fail"` with diagnostic; better to honestly fail than infinite-loop.
1278
+ Cap at **1 retry**. After 1 retry, if the second cycle still returns `empty` or `too_broad`, emit `decision: "alternatives"` and surface the count + the failing FilterSet to the user; let them pick refine / save anyway / cancel.
1279
+
1280
+ **Why 1, not 3** (mechanism + calibration evidence):
1145
1281
 
1146
- **What does NOT trigger retry**:
1147
- - `sample_judge` returning `looks_wrong`: substantive failure (data sparsity or noise), not a shape failure. Retrying produces more noise. Go straight to `alternatives`.
1148
- - `db_count` in `narrow` (1–4): proceed with warning; retry would lose the small but real signal.
1282
+ - Each retry costs **30–90 seconds** of full Phase 2c → Phase 3 cycle (LLM compose + ES count + ES sample + sample_judge LLM).
1283
+ - After the first retry, if the count is *still* empty/too_broad, the underlying failure shape is almost always **data sparsity / inherent niche-language noise**, not a shape issue further iteration can fix. The 2nd and 3rd retries usually fail the same way as the first, costing 60–180s for the same signal.
1284
+ - The shape-mismatch case (the one retry IS valuable for: wrong AND/OR, missing field) is almost always caught on the first retry. So 1 retry catches the only failure mode where iteration helps; capping at 1 just bails on the failure modes where iteration doesn't.
1285
+
1286
+ *Calibration evidence — multilingual niche-discovery runs (LATAM cooking, fitness/wellness)*: in the historical 3-cap regime, runs that hit the retry path consistently went broad → tighter → AI-anchored → name+description-only over three cycles. Cycles 2 and 3 each saved ~10% additional noise but added 60s+ each. The user value of "10% less noise on the long tail" is small relative to the 2+ extra minutes per run; better to surface the noise after one retry and let the user decide. The principle generalises to any noisy-niche shape (beauty, aviation, crypto-vs-finance edge, etc.).
1287
+
1288
+ **What does NOT trigger retry** (unchanged):
1289
+ - `sample_judge` returning `looks_wrong` — substantive failure (data sparsity or noise), not a shape failure. Retrying produces more noise. Go straight to `alternatives`. **A noisy spot-check is NOT a license for the agent to self-initiate a keyword-refinement loop.** The agent has been observed running `looks_wrong → tighten → re-validate → looks_wrong → tighten → re-validate` cycles outside the official retry path on multilingual niche-discovery prompts (LATAM cooking being one documented case), costing ~3 minutes for marginal noise reduction. If the first sample looks noisy, surface it via `alternatives`; do not silently iterate. The agent does not have license to chain validation cycles based on its own subjective noise judgment — that's the user's call after the alternatives prompt.
1290
+ - `db_count` in `narrow` (5–50) or `very_narrow` (1–4) — does NOT trigger the V5 retry path (V5 is for `empty` and `too_broad` only). The narrow / very_narrow routing is owned by Step 2.V3's threshold table: **Type 3 → Lever 1 expansion (one cycle); Type 1 / Type 2 → `decision: "alternatives"`.** Neither path involves V5. The pre-`d395ae2` text here said "narrow (1–4) — proceed with warning" — that's stale on two counts (1–4 is `very_narrow`, not `narrow`, per V3's bucket labels; AND "proceed with warning" is the historical universal-flow behaviour that the new Lever 1 rule replaces).
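
The retry cap described in Step 2.V5 can be sketched as a bounded loop. This is an illustration only: `validate_with_retry`, the two callables it takes, and the `"route_via_v3"` value are hypothetical names introduced here, not part of the skill or the CLI. Only the cap of 1, the `empty` / `too_broad` trigger set, and the `alternatives` outcome come from the text above.

```python
from typing import Callable, Dict

RETRY_CAP = 1                             # Step 2.V5 cap
RETRY_BUCKETS = {"empty", "too_broad"}    # the only buckets that trigger V5


def validate_with_retry(
    filterset: Dict,
    count_bucket: Callable[[Dict], str],      # hypothetical: runs the count, returns a bucket label
    recompose: Callable[[Dict, str], Dict],   # hypothetical: re-composes the FilterSet from feedback
) -> Dict:
    """Run at most RETRY_CAP re-composition cycles, then stop iterating."""
    bucket = count_bucket(filterset)
    retries = 0
    while bucket in RETRY_BUCKETS and retries < RETRY_CAP:
        filterset = recompose(filterset, bucket)
        retries += 1
        bucket = count_bucket(filterset)

    if bucket in RETRY_BUCKETS:
        # Still failing after the single retry: hand the decision to the user.
        decision = "alternatives"
    else:
        # Hypothetical label: all other buckets are routed by the V3 table, not V5.
        decision = "route_via_v3"
    return {"decision": decision, "bucket": bucket,
            "filterset": filterset, "retries": retries}
```

Note that `narrow` and `very_narrow` never enter the loop, matching the rule that those buckets are owned by the Step 2.V3 table.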
1149
1291
 
1150
1292
  ### Step 2.V6 — Compose decision output
1151
1293
 
@@ -1406,10 +1548,10 @@ Pseudo-shape (not runnable JSON — `<int>`, `|`-unions, and `/* notes */` are p
1406
1548
  5. **Takeaways cite specifics.** Numbers, names, intent labels. Vague takeaways ("the report looks good") add no value.
1407
1549
  6. **No new filters or columns in Phase 4.** Phase 4 doesn't reshape the FilterSet or add columns — it picks widgets, validates, and composes. Reshape requires looping back to Phase 2 or 3.
1408
1550
  7. **Type-8 axis consistency.** Both `_over_<axis>` histograms in the same type-8 report use the SAME axis (per `sponsorship_widget_schema.json`'s `_tl_axis_branching`).
1409
- 8. **Don't echo `campaign_config_json` back to chat — ever.** In save mode the JSON lives in the `/tmp/` transport file passed to `tl reports create --config-file <path> --yes`. In preview mode it stays in working memory. **There is no flow where the campaign-config JSON belongs in the chat output.** See the Save-or-preview policy at the top of this file for the full split between save mode and preview mode.
1551
+ 8. **Don't echo `campaign_config_json` back to chat — ever.** In save mode the JSON lives in the portable-temp transport file passed to `tl reports create --config-file <path> --yes` (path resolved at runtime per Save-or-preview policy step 1). In preview mode it stays in working memory. **There is no flow where the campaign-config JSON belongs in the chat output.** See the Save-or-preview policy at the top of this file for the full split between save mode and preview mode.
1410
1552
  9. **When saving, use `--config-file <path>`, not `--config '<json>'`.** Passing JSON inline through a single-quoted shell argument breaks the moment any string value contains an apostrophe (which is common — "McDonald's", "L'Oréal", channel/title text). The temp-file transport sidesteps shell quoting entirely.
1411
- 10. **Temp file MUST be under `/tmp/`** (or `$TMPDIR` / `%TEMP%`, the system temp directory). Never write the transport file to the user's current working directory, project root, repo, or any other path they might be looking at. Pollution of cwd with `foo_report.json` is a regression bug.
1412
- 11. **Writing the file is NOT saving the report.** The save happens when `tl reports create --config-file <path> --yes` returns success. Until that command's exit code is read, the report does not exist. **Never tell the user "saved as <path>.json"** — that confuses the transport file (which is throwaway) with the saved Campaign (which is what they asked for). The save-success message must come from the CLI response: a `campaign_id` and `report_url`.
1553
+ 10. **Temp file MUST be under the system temp directory**, resolved at runtime via `python -c "import tempfile, os; print(os.path.join(tempfile.gettempdir(), '<name>'))"` so the path is correct on every platform (Linux/macOS: typically `/tmp/...`; Windows: `C:\Users\<user>\AppData\Local\Temp\...`). Never hardcode `/tmp/`; that fails silently on Windows. Never write the transport file to the user's current working directory, project root, repo, or any other path they might be looking at. Pollution of cwd with `foo_report.json` is a regression bug.
1554
+ 11. **Writing the file is NOT saving the report.** The save happens when `tl reports create --config-file <path> --yes` returns success. Until that command's exit code is read, the report does not exist. **Never tell the user "saved as <path>.json"** — that confuses the transport file (which is throwaway) with the saved TL report (which is what they asked for). The save-success message must come from the CLI response: a `campaign_id` (rendered to the user as **"report #N"**, NOT "Campaign #N") and `report_url`.
1413
1555
  12. **Default to preview, not save.** Phases 1–4 always run, but the chat output is takeaways + a sample-rows table by default. **Only save when the user's prompt contains explicit save intent** — see the Save-or-preview policy near the top for the trigger word lists. Ambiguous middle ("build a report on X", "create a campaign for Y") → preview + the closing "say save" tail. Save is the explicit, opt-in path; preview is the conservative default.
1414
1556
  13. **In preview mode the agent does not invoke `tl reports create`** and does not write a temp file. The campaign config stays in working memory. If the user follows up with "save" / "yes" / "go ahead", re-use that same in-memory config — do not re-run Phases 1–4.
1415
1557
  14. **Preview output MUST include a sample-rows table.** Use the `db_sample` rows Phase 2 already collected (top 5–10 by sort key) and render them as a tight Markdown table with type-specific columns per the Save-or-preview policy:
@@ -1418,14 +1560,45 @@ Pseudo-shape (not runnable JSON — `<int>`, `|`-unions, and `/* notes */` are p
1418
1560
  - Type 2 (brands): `Brand | Mentions | Channels`
1419
1561
  - Type 8 (deals/sponsorships): `Channel | Brand | Status | Send date`
1420
1562
  **Takeaways alone are not a preview** — the user asked for results; takeaways describe the result, the table IS the result. Skipping the sample table because the result feels narrow, or because the prompt felt "report-y", is a regression bug. The table comes from data Phase 2 already pulled; it costs nothing extra to render.
1421
- 15. **When save intent is detected, the agent MUST invoke `tl reports create` itself.** Telling the user "Save it via POST to the report-creation API endpoint when ready" or "to save, run `tl reports create --config '<json>'`" or any other form of "you save it yourself" is a regression bug; that's the obsolete pre-v0.6.12 fallback. If the prompt contains any save-intent word (see Save-or-preview policy: "save", "create the report", "create a campaign", "make a campaign for me to come back to", "publish", "persist") the flow is: write to `/tmp/<slug>.json` with the `Write` tool → run `tl reports create --config-file /tmp/<slug>.json --yes` with `Bash` → echo the campaign_id + report_url from the CLI's response. The user never sees the JSON, never gets told to do something themselves. If the CLI returns an error, surface it; do not fall back to "here's the JSON, you do it".
1563
+ 15. **When save intent is detected, the agent MUST invoke `tl reports create` itself.** Telling the user "Save it via POST to the report-creation API endpoint when ready" or "to save, run `tl reports create --config '<json>'`" or any other form of "you save it yourself" is a regression bug; that's the obsolete pre-v0.6.12 fallback. If the prompt contains any save-intent word (see Save-or-preview policy: "save", "create the report", "create a campaign", "make a campaign for me to come back to", "publish", "persist") the flow is the three steps in Save-or-preview policy step 1+2+3: **resolve a portable temp path → Write the JSON → verify the file exists → invoke `tl reports create --config-file <that-exact-path> --yes`** → echo the campaign_id + report_url from the CLI's response. The user never sees the JSON, never gets told to do something themselves. If the CLI returns an error, surface it; do not fall back to "here's the JSON, you do it".
1422
1564
  16. **Forbidden phrases** (these are regression markers — if you see yourself about to type any of these, stop and re-read rule 15):
1423
1565
  - "Save it via POST to the report-creation API endpoint when ready"
1424
1566
  - "Save it via the report-creation API endpoint when ready"
1425
1567
  - "to save, run `tl reports create --config '<json>'`"
1426
1568
  - "Saved as <path>.json" (without a campaign_id from the CLI)
1427
1569
  - "Saved to <path>" (without a campaign_id)
1570
+ - "held in working memory; not echoed to chat per the skill's rules"
1571
+ - "the campaign config (held in working memory…)"
1572
+ - "per the skill's rules" / "per the policy"
1428
1573
  - Any instruction telling the user to take a save action themselves when the original prompt was a save-intent prompt.
1574
+ 17. **Always render a plain-English filter summary in the user-facing reply** — both in save mode and preview mode. The summary is 4–7 short bullets describing **what the report contains**, not how it's stored. Use the "Filter summary pattern" translation table in the user-facing-language section near the top of this file. Mention only the filters that meaningfully shape what the user will see; skip platform defaults (e.g. don't bullet `channel_formats: [4]` when it's the type-3 default). Use the user's own brand and keyword wording verbatim where it fits. Example: *"results will be focused on fintech creators in MSN; only English-speaking channels with strong US audiences will be included; channels already pitched to Webull will be automatically excluded; results will prioritise creators with proven sponsorship history; outreach-ready columns and performance widgets will be added automatically"*. **Don't describe the report as "the config" or "the JSON" or "held in working memory"** — those are internal terms; the user wants to know what the report does.
1575
+ 18. **Save-mode preflight on the temp file is mandatory.** Per the Save-or-preview policy step 1+2: resolve a portable temp path via `python -c "import tempfile, os; print(...)"` BEFORE writing, then verify with `test -f <path>` AFTER writing. Hardcoding `/tmp/` on Windows fails silently. If the verification fails, surface a clean error explaining what happened (path, why) and offer the user the inline JSON as a fallback. Do not invoke `tl reports create --config-file` if the file isn't confirmed to exist — that just produces a confusing "No such file or directory" error.
1576
+ 19. **Narrate at phase-outcome level, not tool-call level.** The user doesn't need to see "Ran 19 commands, read 2 files" enumerated, or the raw text of every `tl db pg` query the skill issued during validation. Surface the phase outcomes in plain English: "Looking up StoryBlocks in the brand list… found it (47 deals on file)." not "Ran tl db pg --json 'SELECT id, name FROM thoughtleaders_brand WHERE name ILIKE %StoryBlocks%' which returned: {results: [{id: 868, name: 'StoryBlocks'}], total: 1, ...}". The harness shows tool-call detail in collapsible UI; the skill's narration is the high-level story alongside it.
1577
+ 20. **Save tail is mandatory in every preview reply.** The previous "skip when the prompt is purely informational" exemption was over-applied (a live FRÉ Skincare run skipped the tail even though the prompt was clearly designing a TL report — specific filters, custom column, brand-exclusion logic; a live aviation/non-MSN run skipped it again, closing only with a refinement offer — *"If you want me to tighten to fixed-wing-only or drop drones, say the word and I'll re-filter."* — which is **not** the same thing). Always close the preview with *"If you want this as a saved TL report, just say save."* The line is one ignorable sentence if the user didn't want a save; if they did, it's the only signal that telling the agent to save is even an option.
1578
+
1579
+ **Refinement offers do NOT substitute for the save tail.** Both can appear in the same closing — refinements first ("Want to tighten to fixed-wing only? Drop drones? Add a CPM column?"), then the save tail on its own line ("If you want this as a saved TL report, just say save."). The save tail is **always last** so it's the line the user sees most recently when they read the reply bottom-up.
1580
+ 20a. **Channel/video/brand names in the sample-rows table MUST be hyperlinked to the TL platform page** (not to YouTube). The user is browsing the result *in TL*; the link is the affordance to drill into a row's full TL profile. URL patterns:
1581
+
1582
+ | Sample-table column | Link target | Slug source |
1583
+ |---|---|---|
1584
+ | **Channel** (type 3 / type 8) | `https://app.thoughtleaders.io/youtube/<slug>` | `thoughtleaders_channel.slug` (resolve in Phase 2 alongside the sample) |
1585
+ | **Brand** (type 2) | `https://app.thoughtleaders.io/brands/<slug>` | brand-side equivalent slug |
1586
+ | **Title** (type 1 / videos) | `https://app.thoughtleaders.io/articles/<id>` (or whatever the platform's video-detail URL is) | the article id |
1587
+
1588
+ Render as Markdown links in the table cell — *not* the bare ID, *not* the YouTube URL, *not* both. Example for type 3:
1589
+
1590
+ ```
1591
+ | Channel | Subscribers | Last published |
1592
+ |--------------------------|------------:|----------------|
1593
+ | [Jubilee](https://app.thoughtleaders.io/youtube/jubilee) | 12.4M | 2 days ago |
1594
+ | [PewDiePie](https://app.thoughtleaders.io/youtube/pewdiepie) | 110M | yesterday |
1595
+ ```
1596
+
1597
+ If the slug is missing or empty for a row, fall back to the ID-based path the platform exposes (e.g. `https://app.thoughtleaders.io/youtube/id-<channel_id>`); never fall back to the YouTube URL — that takes the user *away* from TL. The Phase 2 sample query must include the slug column alongside the rendered fields, otherwise the table can't link properly.
1598
+
1599
+ 21. **No side-channel deliverables.** The skill produces exactly two output shapes: (a) a saved TL Campaign + a campaign URL (save mode), or (b) an in-chat preview with the sample-rows table + takeaways + save tail (preview mode). It does NOT write CSVs, Markdown reports, or any other "data dump" file to disk as a deliverable. A real run for FRÉ Skincare wrote a CSV to `<temp>\fre-skincare-shortlist.csv` and pointed the user at it as the "full list" — that's a fabricated alternative deliverable that bypasses the TL report-creation flow. If the user wants more than the preview shows, the answer is "save it as a campaign and run it" — not "I'll dump CSV". The only filesystem write the skill is allowed to make is the `<system-temp>/tl-report-builder-<slug>.json` transport file used in step 1 of the save mechanics, and even that is a transport (deleted whenever) — never a deliverable.
1600
+ 22. **Phases 1–4 always run; the skill never short-circuits to a chat-only data answer.** When the skill is invoked, the output is **always** a Campaign (save mode) or a Phase-4 preview (preview mode). Bypassing Phase 1–4 to produce a verification table, an analyst summary, a list cross-check, or any other "I'll just answer this directly in chat" deliverable is a regression bug. Real example to internalise: a prompt of *"Brands sponsoring Linus Tech Tips in the past 6 months: dbrand, Private Internet Access, Squarespace, Vessi, Secretlab, UGREEN, Odoo, Dell, Razer, Saily"* should route through Phase 1 → Type 2 brands report scoped to channel 1788 + last 180 days → Phases 2/3/4 → preview with the user's seed brands as a starting filter and the takeaways calling out *"your seed list is accurate but incomplete — TL data shows 60 distinct sponsors over 131 videos; top missing are War Thunder (7), Boot.dev (6), DeleteMe (6)…"*. Instead, a recent run produced exactly that analytical content **as a free-floating markdown table in chat** — no FilterSet emitted, no columns picked, no widgets, no save option. The analytical insight is welcome as a takeaway; it is **not** a substitute for the report. If you find yourself replying with a markdown table directly, ask: am I about to ship a Phase-4 preview, or am I bypassing the phases? The answer must always be the former.
1601
+ 23. **No ad-hoc data-engineering pipelines.** The skill does NOT write Python consolidation scripts, multi-stage CSV merge tools, dedupe scripts, false-positive filters as standalone files, or any other custom data pipeline as part of producing the deliverable. The data plane is fixed: `tl db pg` (PG), `tl db es` (ES), `tl db fb` (Firebolt). Phase 2 issues queries against these directly to compose a FilterSet and validate it; that's the entire data-side surface. A real aviation/non-MSN run produced this anti-pattern: the agent issued five separate PG queries each writing a CSV (`/tmp/aviation_by_name.csv`, `/tmp/aviation_desc.csv`, `/tmp/aviation_desc2.csv`, `/tmp/aviation_desc3.csv`, `/tmp/aviation_pilot_desc.csv`), wrote a `consolidate_aviation.py` script to merge + dedupe + filter false positives, hit a Windows-vs-Linux `/tmp/` path mismatch, debugged it with `cygpath`, eventually rewrote the script to use `%LOCALAPPDATA%\Temp`, then produced `aviation_consolidated.csv` as the "full list". **None of this is the skill's job.** The right shape: one ES query with `terms` / `bool.should` filters covering the niche keywords + the `creator_countries` filter + `msn_channels_only: false` + `is_active: true` → get count + sample → emit the FilterSet → preview. If the skill's narration is starting to read like a data engineer's bash session ("Run consolidation script", "Try /tmp path resolution", "Resolve /tmp via cygpath", "Find where /tmp files actually are"), stop — the skill has gone off the rails. Restart from Phase 1 with a single composed query.
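
The linking rule in 20a can be illustrated with a short sketch. The helper names `channel_link` and `render_channel_table` and the row keys (`name`, `slug`, `channel_id`, `subscribers`, `last_published`) are assumptions made for this example; the `id-<channel_id>` fallback path is the one the rule itself offers only as an "e.g.", so treat it as unconfirmed.

```python
from typing import Dict, List, Optional

TL_BASE = "https://app.thoughtleaders.io"


def channel_link(name: str, slug: Optional[str], channel_id: int) -> str:
    """Build a Markdown link to the TL channel page, never to YouTube."""
    # Fall back to the ID-based path when the slug is missing or empty.
    # Assumption: the `id-<channel_id>` pattern is taken from rule 20a's example.
    path = slug if slug else f"id-{channel_id}"
    safe_name = name.replace("|", "\\|")  # keep table cells intact
    return f"[{safe_name}]({TL_BASE}/youtube/{path})"


def render_channel_table(rows: List[Dict]) -> str:
    """Render type-3 sample rows as a Markdown table with linked channel names."""
    lines = [
        "| Channel | Subscribers | Last published |",
        "|---|---:|---|",
    ]
    for row in rows:
        link = channel_link(row["name"], row.get("slug"), row["channel_id"])
        lines.append(f"| {link} | {row['subscribers']} | {row['last_published']} |")
    return "\n".join(lines)
```

Because the link is built from the slug, the Phase 2 sample query has to return the slug column; the fallback branch exists only for rows where it comes back empty.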
1429
1602
 
1430
1603
  ## Follow-Up Interactions
1431
1604
 
@@ -1440,7 +1613,7 @@ Every phase has explicit conditions where it must pause and ask the user, rather
1440
1613
  | **2** | T4 returned ambiguous name resolution (>1 active candidate per name) | "Which one of these did you mean?" + option list |
1441
1614
  | **2** | T3 cross-reference returned unexpectedly large or zero result set | "The preliminary query matched [N] entities — narrow the date range or status filter?" |
1442
1615
  | **2** | Validation: sample_judge returned `looks_wrong` (G11-class noise) | Mode B prompt: save anyway / refine / cancel — plain English only, citing 2–3 specific sample names; never expose internal terms (phase numbers, tool names, `validation_concerns`, `db_count`, `looks_wrong`). See "User-facing rendering (Mode B)" in the Phase 2 section. |
1443
- | **2** | Validation: 3 retries exhausted on empty/too_broad | Surface diagnostic + suggest the user reformulate the request |
1616
+ | **2** | Validation: 1 retry exhausted on empty/too_broad | Emit `decision: "alternatives"` — surface the count + failing shape; let the user choose refine / save anyway / cancel. Skill does NOT chain further validation cycles. |
1444
1617
  | **3** | Column template + extra columns the user listed differ from each other | "Use the template's columns, the columns you listed, or both?" |
1445
1618
  | **3** | Selected columns incompatible (e.g., requested `Views` on a type 3 report) | "[column] isn't available for [report type]; closest is [alternative]" |
1446
1619
  | **3** | No columns provided AND no clear intent | "I'll use [type]'s default set unless you want a different focus (outreach / discovery / sponsorship-pitch)" |
@@ -1480,7 +1653,7 @@ USER: Build me a report of gaming channels with 100K+ subscribers in English
1480
1653
 
1481
1654
  Claude follows this SKILL.md, executing each phase in order. No external command needed — the skill IS the orchestration; `tl db pg` is invoked from within Phase 2/3/4 as needed; tools fire conditionally per their criteria.
1482
1655
 
1483
- > **Save vs preview**: by default the skill runs Phases 1–4 and replies with takeaways + a sample-rows table — **no save**. Only when the user's prompt contains explicit save intent ("save", "create the report", "make a campaign for me to come back to") does the skill (1) write the JSON to a `/tmp/<slug>.json` file via the `Write` tool, then (2) run `tl reports create --config-file /tmp/<slug>.json --yes` via `Bash`. The file transport is shell-safe; passing the JSON inline as `--config '<json>'` breaks the moment any value contains an apostrophe ("McDonald's", "L'Oréal"). The user sees the takeaways and (in save mode) the resulting campaign link. **The JSON config never appears in chat in either mode.** For edits to an existing saved report, use `tl reports update <report_id> '<json patch>'` (same shell-quoting caveat — use a `/tmp/` file when the patch contains apostrophes). Do NOT tell users to paste into the platform UI — that's an obsolete fallback from before the CLI commands existed. See the Save-or-preview policy near the top for the full trigger word lists.
1656
+ > **Save vs preview**: by default the skill runs Phases 1–4 and replies with takeaways + a sample-rows table — **no save**. Only when the user's prompt contains explicit save intent ("save", "create the report", "make a campaign for me to come back to") does the skill run the three save-mechanics steps: (1) **resolve a portable temp path** via `python -c "import tempfile, os; print(...)"`, (2) **Write** the JSON to that path and **verify** with `test -f`, (3) run `tl reports create --config-file <that-exact-path> --yes` via `Bash`. The file transport is shell-safe; passing the JSON inline as `--config '<json>'` breaks the moment any value contains an apostrophe ("McDonald's", "L'Oréal"). Hardcoding `/tmp/` fails on Windows. The user sees the takeaways and (in save mode) the resulting campaign link. **The JSON config never appears in chat in either mode.** For edits to an existing saved report, use `tl reports update <report_id> '<json patch>'` (same shell-quoting caveat — use a portable temp file when the patch contains apostrophes). Do NOT tell users to paste into the platform UI — that's an obsolete fallback from before the CLI commands existed. See the Save-or-preview policy near the top for the full trigger word lists.
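
The three save-mechanics steps can be sketched in Python as follows. This is a hedged illustration, not the skill's implementation: the helper names `prepare_transport_file` and `build_create_command` are hypothetical, and the sketch only builds the argument list for `tl reports create --config-file <path> --yes` rather than running it. The `tl-report-builder-<slug>.json` file name follows the transport-file naming mentioned in rule 21.

```python
import json
import os
import tempfile
from typing import Dict, List


def prepare_transport_file(config: Dict, slug: str) -> str:
    """Steps 1 and 2: resolve a portable temp path, write the JSON, verify it exists."""
    # tempfile.gettempdir() resolves the platform's temp directory,
    # so nothing here hardcodes /tmp/.
    path = os.path.join(tempfile.gettempdir(), f"tl-report-builder-{slug}.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, ensure_ascii=False)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"transport file was not written: {path}")
    return path


def build_create_command(path: str) -> List[str]:
    """Step 3: argument list for the CLI call. Passing a list (not a shell
    string) means apostrophes in the config never touch shell quoting."""
    return ["tl", "reports", "create", "--config-file", path, "--yes"]
```

A caller would pass the returned list to something like `subprocess.run(..., capture_output=True)` and read the `campaign_id` and `report_url` from the CLI's response; that part is left out because the response format is not shown in this section.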
1484
1657
 
1485
1658
  ## Reference Files
1486
1659
 
@@ -115,13 +115,15 @@ When two fields look similar, use this table to pick.
115
115
  | "Channels created on TL since X" | `createdat_from` (+ `createdat_to`) | This is the TL-side AdLink/Channel record creation, not the YouTube publish date. |
116
116
  | Sponsorship send/publish | `start_date` / `end_date` (type 8 reuses these for send_date) | Type 8's date semantics shift — the schema docstring explains. |
117
117
 
118
- ### Reach / size signals
118
+ ### Channel-size signals
119
119
 
120
- | User intent | Fields to use | Why |
120
+ > **Vocabulary**: SQL/internal term is `reach`, user-facing term is **subscribers**. See `thoughtleaders-skills/tl-data/references/business-glossary.md` for the canonical mapping (line 149: *"AMs say subscribers, SQL says reach"*). When emitting the FilterSet, use the field name `reach_from` / `reach_to`. When narrating to the user (chat replies, sample-table column headers, takeaways, filter summaries), say **"subscribers"** — never **"reach"**. *"By reach: 1M+ → 2 · 100K–1M → 57"* is a leak; write *"By subscribers: 1M+ → 2 · 100K–1M → 57"*.
121
+
122
+ | User intent | Field to emit | Narrate as |
121
123
  |---|---|---|
122
- | "Big channels" / "mid-size or bigger" / size floor | `reach_from` (+ `reach_to`) | Reach is TL's preferred size metric — handles podcasts/newsletters too, not just subs. |
123
- | "Channels expecting >X projected views per video" | `projected_views_from` (+ `projected_views_to`) | PV is a forward-looking estimate; better than Reach when intent is sponsor-deal pricing. |
124
- | Raw YouTube views per video | `youtube_views_from` (+ `youtube_views_to`) | Per-upload metric only meaningful for type 1 (CONTENT). |
124
+ | "Big channels" / "100K+ subscribers" / size floor | `reach_from` (+ `reach_to`) | "subscribers" / "channel size" |
125
+ | "Channels expecting >X projected views per video" | `projected_views_from` (+ `projected_views_to`) | "projected views" — PV is a forward-looking estimate; better than subscribers when intent is sponsor-deal pricing |
126
+ | Raw YouTube views per video | `youtube_views_from` (+ `youtube_views_to`) | "views per video" — per-upload metric, only meaningful for type 1 (CONTENT) |
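
The emit-versus-narrate split in the table above can be expressed as a lookup. This is a small sketch under stated assumptions: the `narrate_field` helper and the `NARRATION` mapping are hypothetical, and the mapping covers only the three field families listed in the table.

```python
# Internal field family -> user-facing wording, per the channel-size table.
NARRATION = {
    "reach": "subscribers",
    "projected_views": "projected views",
    "youtube_views": "views per video",
}


def narrate_field(field_name: str) -> str:
    """Translate a FilterSet field such as `reach_from` into the term
    used in chat replies, table headers, and takeaways."""
    base = field_name
    for suffix in ("_from", "_to"):
        if field_name.endswith(suffix):
            base = field_name[: -len(suffix)]
            break
    if base not in NARRATION:
        # Fail loudly rather than leak an internal term into user-facing text.
        raise KeyError(f"no user-facing term registered for {field_name!r}")
    return NARRATION[base]
```

Raising on unknown fields is a design choice for the sketch: it makes a missing translation visible instead of letting a word like "reach" slip into a reply.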
125
127
 
126
128
  ### Demographic shares
127
129
 
@@ -24,6 +24,10 @@ The orchestration injects two values:
24
24
  ```
25
25
  The topics list changes over time; never assume a fixed count or fixed IDs.
26
26
 
27
+ ### How to fetch the topics
28
+
29
+ The fetch query, the column list, and the negative-column regression markers all live in the canonical Postgres-schema reference in the `tl-cli:tl` skill: **[`tl/references/postgres-schema.md` → `thoughtleaders_topics`](../../tl/references/postgres-schema.md#thoughtleaders_topics-curated-topic-taxonomy)**. Schema-shaped facts belong in that reference, not in tool text. Use the verbatim fetch query documented there. **Do not restate or paraphrase the schema here.** If you find yourself about to type `SELECT … FROM thoughtleaders_topics …` from memory, stop and consult the reference file instead. This tool's job is to score topics against the user query; the schema reference's job is to say what the underlying table looks like.
30
+
27
31
  ---
28
32
 
29
33
  ## Output schema (strict)
@@ -1,3 +1,3 @@
1
1
  """ThoughtLeaders CLI — query sponsorship data, channels, brands, and intelligence."""
2
2
 
3
- __version__ = "0.6.20"
3
+ __version__ = "0.6.22"