@ainyc/canonry 4.27.2 → 4.29.0

package/README.md CHANGED
@@ -5,7 +5,7 @@
5
5
  **Agent-first AEO operating platform. Open source. Self-hosted.**
6
6
 
7
7
  - Track citations across Gemini, ChatGPT, Claude, Perplexity, and local LLMs
8
- - Watch AI engines crawl and refer traffic via [server-log ingestion](skills/canonry-setup/references/server-side-traffic.md) — Cloud Run today, more sources coming
8
+ - Watch AI engines crawl and refer traffic via [server-log ingestion](skills/canonry-setup/references/server-side-traffic.md) — Cloud Run logs and the WordPress Traffic Logger plugin today
9
9
  - Diagnose against real traffic with built-in [GSC](docs/google-search-console-setup.md), [GA4](docs/google-analytics-setup.md), and [Bing Webmaster](docs/bing-webmaster-setup.md)
10
10
  - Execute fixes via [WordPress](docs/wordpress-setup.md), JSON-LD schema, and indexing submissions
11
11
  - Manage many clients declaratively — config-as-code YAML + `canonry apply`
@@ -66,7 +66,7 @@ Configure during `canonry init`, in the dashboard `/settings`, or as env vars.
66
66
  | **Architecture & data model** | [docs/architecture.md](docs/architecture.md) · [docs/data-model.md](docs/data-model.md) |
67
67
  | **Aero — built-in agent** | [skills/aero/SKILL.md](skills/aero/SKILL.md) |
68
68
  | **MCP — Claude Desktop / Cursor / Codex** | [docs/mcp.md](docs/mcp.md) |
69
- | **Integrations** | [GSC](docs/google-search-console-setup.md) · [GA4](docs/google-analytics-setup.md) · [Bing](docs/bing-webmaster-setup.md) · [WordPress](docs/wordpress-setup.md) · [Server-side traffic (Cloud Run logs)](skills/canonry-setup/references/server-side-traffic.md) |
69
+ | **Integrations** | [GSC](docs/google-search-console-setup.md) · [GA4](docs/google-analytics-setup.md) · [Bing](docs/bing-webmaster-setup.md) · [WordPress](docs/wordpress-setup.md) · [Server-side traffic (Cloud Run + WordPress logs)](skills/canonry-setup/references/server-side-traffic.md) |
70
70
  | **Deployment** — Docker, Railway, Render, systemd, Tailscale | [docs/deployment.md](docs/deployment.md) |
71
71
  | **API** — 118+ endpoints | `GET /api/v1/openapi.json` (no auth) |
72
72
  | **Skills bundle** for Claude Code / Codex | `canonry skills install` ([details](skills/canonry-setup/SKILL.md)) |
@@ -45,6 +45,7 @@ Detailed playbooks live alongside this file. Read them on demand when the task m
45
45
  |---|---|
46
46
  | `references/orchestration.md` | Planning a multi-step or recurring workflow (baseline, weekly review, content-gap analysis) |
47
47
  | `references/regression-playbook.md` | A query lost its citation and you need to triage and respond |
48
+ | `references/aeo-discovery.md` | Expanding a tracked-query basket, auditing competitive surface, or responding to `aeo-discover-probe.completed` |
48
49
  | `references/memory-patterns.md` | Deciding whether to remember a fact in agent memory or re-query canonry |
49
50
  | `references/reporting.md` | Producing a client-facing weekly or monthly summary |
50
51
  | `references/wordpress-elementor-mcp.md` | Editing WordPress pages with the Elementor MCP integration |
@@ -45,9 +45,53 @@ Per session: ~$1 at the default probe budget (100 queries × 1 Gemini grounded c
45
45
  Things to call out without being asked:
46
46
 
47
47
  - **High wasted-surface ratio** (≥ 40% of probes, or > cited count at ≥ 20%) → the project is missing from its own competitive space. The auto-written `discovery.basket-divergence` insight flags this as `high` severity.
48
- - **New competitor domains** in `competitorMap` that aren't already in the project's tracked competitor list → suggest adding via `canonry competitor add <project> <domain>`. PR 2's `canonry discover promote` will automate this.
48
+ - **Recurring new competitor domains** in `competitorMap` that aren't already in the project's tracked competitor list → `canonry discover promote` adopts domains with at least 2 hits automatically alongside the queries; or add them à la carte with `canonry competitor add <project> <domain>`.
49
49
  - **Aspirational greenfield** queries with no tracked competitor and no canonical cite → low-friction content opportunities.
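
The wasted-surface threshold in the first bullet above can be read as a small predicate. A minimal Python sketch, assuming one interpretation of the rule: flag the session when wasted-surface probes are at least 40% of all probes, or when they outnumber cited probes while making up at least 20% of probes. The function name and its arguments are hypothetical and not part of canonry.

```python
def is_high_wasted_surface(wasted: int, cited: int, total_probes: int) -> bool:
    """Hypothetical helper: one reading of the 'high wasted-surface ratio' rule."""
    if total_probes <= 0:
        return False  # no probes ran, so there is nothing to flag

    ratio = wasted / total_probes

    # Rule 1: wasted-surface makes up at least 40% of all probes.
    if ratio >= 0.40:
        return True

    # Rule 2 (assumed reading): wasted-surface outnumbers cited
    # AND makes up at least 20% of all probes.
    return wasted > cited and ratio >= 0.20
```

For example, 25 wasted-surface probes against 10 cited probes in a 100-probe session would be flagged under the second rule, while 25 against 30 would not.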
50
50
 
51
+ ## Promoting a session into the tracked basket
52
+
53
+ Once a session is `completed`, preview first unless the operator has already approved the write:
54
+
55
+ ```bash
56
+ canonry discover promote preview <project> <session-id>
57
+ ```
58
+
59
+ Or the MCP equivalent: `canonry_discover_promote_preview` with `{ project, sessionId }`.
60
+
61
+ The preview returns every bucket so you can explain the tradeoff:
62
+
63
+ - `cited` — already grounded to the project, safe to track.
64
+ - `aspirational` — greenfield ICP-fit opportunities, safe to track as a growth basket.
65
+ - `wasted-surface` — competitor-cited but project-missing. Treat as content-planning evidence first; do not add it to the weekly tracked basket unless the operator explicitly wants those off-ICP competitor gaps tracked.
66
+ - `suggestedCompetitors` — recurring domains not already tracked. The promote path only auto-adopts domains with at least 2 hits.
67
+
68
+ Promote with one of these paths:
69
+
70
+ ```bash
71
+ canonry discover promote <project> <session-id> # cited + aspirational buckets + recurring competitor domains
72
+ canonry discover promote <project> <session-id> --bucket aspirational # scope to a bucket subset (repeatable / comma-separated)
73
+ canonry discover promote <project> <session-id> --bucket wasted-surface # explicitly track off-ICP competitor gaps
74
+ canonry discover promote <project> <session-id> --no-competitors # queries only, skip the competitor merge
75
+ ```
76
+
77
+ Or the MCP equivalent:
78
+
79
+ ```json
80
+ { "project": "<project>", "sessionId": "<session-id>" }
81
+ ```
82
+
83
+ That default request promotes `cited` + `aspirational` queries and recurring competitors. For scoped writes, pass `request`:
84
+
85
+ ```json
86
+ { "project": "<project>", "sessionId": "<session-id>", "request": { "buckets": ["aspirational"], "includeCompetitors": false } }
87
+ ```
88
+
89
+ - **Default is cited + aspirational.** `wasted-surface` queries are off-ICP competitor gaps; promote them only when the operator explicitly wants those tracked in the weekly basket.
90
+ - **Competitor promotion requires recurrence.** Default competitor merge ignores one-off domains and adopts only domains with at least 2 hits.
91
+ - **Add-only and idempotent.** Queries and competitor domains already tracked are returned under `skipped`, never inserted twice. Re-running a promote is safe.
92
+ - **Completed sessions only.** Promoting a `queued`/`seeding`/`probing`/`failed` session is rejected — the buckets aren't final.
93
+ - Promoted rows carry `provenance="discovery:<sessionId>"`, so a tracked query can always be traced back to the session that surfaced it.
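
The promotion rules listed above (default buckets, the 2-hit competitor floor, add-only and idempotent writes) can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not canonry's implementation: the probe shape (`query`, `bucket`, `competitor_domains`), the function name, and the choice to count competitor hits across every probe in the session are all hypothetical.

```python
from collections import Counter

DEFAULT_BUCKETS = ("cited", "aspirational")  # wasted-surface is opt-in
MIN_COMPETITOR_HITS = 2                      # one-off domains are ignored


def plan_promotion(probes, tracked_queries, tracked_competitors,
                   buckets=DEFAULT_BUCKETS, include_competitors=True):
    """Hypothetical model of a promote call; returns what would be added or skipped."""
    new_queries, skipped_queries, seen = [], [], set()
    for probe in probes:
        query = probe["query"]
        if probe["bucket"] not in buckets or query in seen:
            continue
        seen.add(query)
        # Add-only: anything already tracked is reported, never re-inserted.
        if query in tracked_queries:
            skipped_queries.append(query)
        else:
            new_queries.append(query)

    new_competitors, skipped_competitors = [], []
    if include_competitors:
        # Assumption: hits are counted across all probes in the session.
        hits = Counter(d for p in probes for d in p.get("competitor_domains", []))
        for domain, count in sorted(hits.items()):
            if count < MIN_COMPETITOR_HITS:
                continue
            if domain in tracked_competitors:
                skipped_competitors.append(domain)
            else:
                new_competitors.append(domain)

    return {
        "queries": new_queries,
        "competitors": new_competitors,
        "skipped": {"queries": skipped_queries, "competitors": skipped_competitors},
    }
```

Feeding the result back in as the tracked sets and calling the function again yields empty `queries` and `competitors` lists, which is the idempotence property the bullets describe.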
94
+
51
95
  ## When you wake on `aeo-discover-probe.completed`
52
96
 
53
97
  The follow-up payload `RunCoordinator` queues for you includes:
@@ -62,14 +106,15 @@ Respond with:
62
106
 
63
107
  1. A one-line headline naming the dominant bucket.
64
108
  2. The top 2-3 wasted-surface queries (call `canonry_discover_session_get` to fetch them — don't guess).
65
- 3. The top 1-2 new competitor domains worth tracking.
66
- 4. A single recommended next step. Examples: "add competitor.com to the tracked list", "the wasted-surface set warrants a content plan around X", "the aspirational set is greenfield — pick the 3 with highest commercial intent and write content".
109
+ 3. The top 1-2 recurring new competitor domains worth tracking, ignoring one-hit domains unless the operator asks for the full long tail.
110
+ 4. A single recommended next step. Examples: "preview and promote cited + aspirational findings (`canonry discover promote preview`, then `canonry discover promote`)", "the wasted-surface set warrants a content plan around X before tracking", "the aspirational set is greenfield — pick the 3 with highest commercial intent and write content".
111
+
112
+ Do not recommend "promote everything" as the default. The safe path is: inspect session detail, preview promotion candidates, then promote the default cited + aspirational set. Escalate `wasted-surface` to tracking only when the operator deliberately chooses that tradeoff.
67
113
 
68
114
  Keep it tight. The operator wakes to a short, decision-ready summary, not a full report.
69
115
 
70
116
  ## What discovery does NOT do (yet)
71
117
 
72
- - **No promotion.** PR 2 ships `canonry discover promote` which adopts queries into the project's tracked basket with `provenance='discovery:<sessionId>'`. Until then, the operator merges manually via `canonry query add` / `canonry competitor add`.
73
118
  - **No multi-provider amplification.** v1 probes Gemini only. v2 will probe across Gemini + ChatGPT + Claude in one session (the schema is already shaped for it — `discovery_probes` has no `UNIQUE(session_id, query)` exactly because of this).
74
119
  - **No re-run drift.** Each session is independent. Comparing sessions over time is on the PR 4 / PR 5 roadmap.
75
120
 
@@ -78,6 +123,22 @@ Keep it tight. The operator wakes to a short, decision-ready summary, not a full
78
123
  - **Gemini not configured** → orchestrator throws early; `runs.status='failed'` with `Gemini provider is not configured.` Surface as "configure Gemini before running discovery" — link to `canonry init` or `~/.canonry/config.yaml`.
79
124
  - **Vertex-only Gemini** → embeddings step throws (Vertex embeddings deferred). Same surface, "use a Gemini API key for now."
80
125
  - **ICP missing** → route returns 400 with `VALIDATION_ERROR`. Ask the operator for the ICP description in plain language.
126
+ - **Seed collapse (hyperlocal/niche businesses)** → 40 raw seeds collapse to 1-2 canonical queries after embedding+clustering, even at low dedup thresholds. This happens when Gemini generates seed queries that all live in the same semantic pocket (e.g. all variants of "boutique hotel Venice Beach"). The embedding model sees them as near-identical, so clustering produces one representative.
127
+
128
+ **Diagnostic signal:** `seedCountRaw / seedCount > 10:1` (e.g. 40 raw → 1 selected).
129
+
130
+ **Remediation:** break the ICP into 3-5 distinct purchase-intent angles and run one session per angle via `--icp-angle`:
131
+
132
+ ```bash
133
+ canonry discover run <project> \
134
+ --icp-angle "romantic anniversary stay in Venice Beach" \
135
+ --icp-angle "best rooftop bars and dining hotels LA" \
136
+ --icp-angle "walkable Venice Beach hotels near Abbot Kinney" \
137
+ --icp-angle "design-forward boutique hotels for creative professionals" \
138
+ --wait
139
+ ```
140
+
141
+ Each angle generates its own 40-seed cluster independently, so aggregate coverage grows while per-session dedup stays clean. The `--wait` output prints a combined summary with the session ID and a `promote` command for each session. Promote the sessions individually after reviewing previews.
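
The seed-collapse diagnostic above reduces to a ratio check. A minimal Python sketch, assuming the two counts are available as plain integers named after the `seedCountRaw` and `seedCount` fields; the function name and the handling of zero counts are hypothetical choices made for illustration.

```python
def is_seed_collapse(seed_count_raw: int, seed_count: int,
                     max_ratio: float = 10.0) -> bool:
    """Hypothetical check for the 'seedCountRaw / seedCount > 10:1' signal."""
    if seed_count_raw <= 0:
        return False  # nothing was seeded, so nothing could collapse
    if seed_count <= 0:
        return True   # assumed: seeds existed but none survived clustering
    return seed_count_raw / seed_count > max_ratio
```

With the example from the text, 40 raw seeds reduced to 1 or 2 canonical queries gives ratios of 40 and 20, both above the 10:1 line, whereas 40 raw seeds reduced to 4 sits exactly at 10:1 and is not flagged.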
81
142
 
82
143
  ## Memory hygiene
83
144
 
@@ -88,7 +88,7 @@ GA4 is a first-class signal alongside citation tracking. Connect once with `cano
88
88
  | `references/aeo-analysis.md` | Interpreting sweep output, diagnosing regressions, planning content fixes |
89
89
  | `references/indexing.md` | Submitting URLs, checking GSC/Bing coverage, fixing indexing gaps |
90
90
  | `references/wordpress-integration.md` | Connecting to WordPress, editing pages, pushing staging → live |
91
- | `references/server-side-traffic.md` | Wiring server-log evidence (Cloud Run today; WordPress / others later) for AI Visibility — Server-Side. Connect, sync, manage sources, troubleshoot. |
91
+ | `references/server-side-traffic.md` | Wiring server-log evidence (Cloud Run + WordPress adapters; more planned) for AI Visibility — Server-Side. Connect, sync, manage sources, troubleshoot. |
92
92
 
93
93
  ---
94
94
 
@@ -230,13 +230,16 @@ canonry google request-indexing <project> --all-unindexed # push all unknown pag
230
230
  canonry discover run <project> --icp "..." --wait --format json # full pipeline: seed → embed → cluster → probe → bucket
231
231
  canonry discover run <project> --icp "..." --dedup-threshold 0.85 # tune cosine threshold (default 0.85)
232
232
  canonry discover run <project> --icp "..." --max-probes 100 # per-session probe budget (default 100, hard cap 500)
233
+ canonry discover run <project> --icp-angle "angle 1" --icp-angle "angle 2" --wait # multi-angle: one session per ICP angle, useful for hyperlocal/niche businesses
233
234
 
234
235
  canonry discover list <project> # newest-first session list
235
236
  canonry discover show <project> <session-id> # per-query probe rows + buckets
236
- canonry discover promote preview <project> <session-id> # preview the basket PR 2 will write (read-only)
237
+ canonry discover promote preview <project> <session-id> # preview bucketed candidates + recurring suggested competitors (read-only)
238
+ canonry discover promote <project> <session-id> # adopt cited + aspirational queries + recurring competitors
239
+ canonry discover promote <project> <session-id> --bucket aspirational --no-competitors # scope to a bucket subset / skip competitor merge
237
240
  ```
238
241
 
239
- Discovery requires Gemini configured (API key today; Vertex-mode embeddings are deferred). The pipeline writes a `discovery_sessions` row, a `runs` row (kind `aeo-discover-probe`), and one `discovery.basket-divergence` insight when the session completes. Aero wakes unprompted with the bucket-count payload so the operator can act without polling.
242
+ Discovery requires Gemini configured (API key today; Vertex-mode embeddings are deferred). The pipeline writes a `discovery_sessions` row, a `runs` row (kind `aeo-discover-probe`), and one `discovery.basket-divergence` insight when the session completes. Aero wakes unprompted with the bucket-count payload so the operator can act without polling. `discover promote` defaults to cited + aspirational queries and recurring competitor domains; include `--bucket wasted-surface` explicitly for off-ICP competitor gaps. Promotion is add-only and idempotent — queries/domains already tracked are reported as skipped, never inserted twice — and only works on `completed` sessions; promoted rows carry `provenance="discovery:<sessionId>"`.
240
243
 
241
244
  ## Bing Webmaster Tools
242
245