@ainyc/canonry 4.28.0 → 4.29.0

package/README.md CHANGED
@@ -5,7 +5,7 @@
5
5
  **Agent-first AEO operating platform. Open source. Self-hosted.**
6
6
 
7
7
  - Track citations across Gemini, ChatGPT, Claude, Perplexity, and local LLMs
8
- - Watch AI engines crawl and refer traffic via [server-log ingestion](skills/canonry-setup/references/server-side-traffic.md) — Cloud Run today, more sources coming
8
+ - Watch AI engines crawl and refer traffic via [server-log ingestion](skills/canonry-setup/references/server-side-traffic.md) — Cloud Run logs and the WordPress Traffic Logger plugin today
9
9
  - Diagnose against real traffic with built-in [GSC](docs/google-search-console-setup.md), [GA4](docs/google-analytics-setup.md), and [Bing Webmaster](docs/bing-webmaster-setup.md)
10
10
  - Execute fixes via [WordPress](docs/wordpress-setup.md), JSON-LD schema, and indexing submissions
11
11
  - Manage many clients declaratively — config-as-code YAML + `canonry apply`
@@ -66,7 +66,7 @@ Configure during `canonry init`, in the dashboard `/settings`, or as env vars.
66
66
  | **Architecture & data model** | [docs/architecture.md](docs/architecture.md) · [docs/data-model.md](docs/data-model.md) |
67
67
  | **Aero — built-in agent** | [skills/aero/SKILL.md](skills/aero/SKILL.md) |
68
68
  | **MCP — Claude Desktop / Cursor / Codex** | [docs/mcp.md](docs/mcp.md) |
69
- | **Integrations** | [GSC](docs/google-search-console-setup.md) · [GA4](docs/google-analytics-setup.md) · [Bing](docs/bing-webmaster-setup.md) · [WordPress](docs/wordpress-setup.md) · [Server-side traffic (Cloud Run logs)](skills/canonry-setup/references/server-side-traffic.md) |
69
+ | **Integrations** | [GSC](docs/google-search-console-setup.md) · [GA4](docs/google-analytics-setup.md) · [Bing](docs/bing-webmaster-setup.md) · [WordPress](docs/wordpress-setup.md) · [Server-side traffic (Cloud Run + WordPress logs)](skills/canonry-setup/references/server-side-traffic.md) |
70
70
  | **Deployment** — Docker, Railway, Render, systemd, Tailscale | [docs/deployment.md](docs/deployment.md) |
71
71
  | **API** — 118+ endpoints | `GET /api/v1/openapi.json` (no auth) |
72
72
  | **Skills bundle** for Claude Code / Codex | `canonry skills install` ([details](skills/canonry-setup/SKILL.md)) |
@@ -123,6 +123,22 @@ Keep it tight. The operator wakes to a short, decision-ready summary, not a full
123
123
  - **Gemini not configured** → orchestrator throws early; `runs.status='failed'` with `Gemini provider is not configured.` Surface as "configure Gemini before running discovery" — link to `canonry init` or `~/.canonry/config.yaml`.
124
124
  - **Vertex-only Gemini** → embeddings step throws (Vertex embeddings deferred). Same surface, "use a Gemini API key for now."
125
125
  - **ICP missing** → route returns 400 with `VALIDATION_ERROR`. Ask the operator for the ICP description in plain language.
126
+ - **Seed collapse (hyperlocal/niche businesses)** → 40 raw seeds collapse to 1-2 canonical queries after embedding and clustering, even at low dedup thresholds. This happens when Gemini generates seed queries that all live in the same semantic pocket (e.g. all variants of "boutique hotel Venice Beach"). The embedding model sees them as near-identical, so clustering keeps a single representative.
127
+
128
+ **Diagnostic signal:** `seedCountRaw / seedCount > 10`, i.e. a raw-to-selected ratio above 10:1 (e.g. 40 raw → 1 selected).
129
+
130
+ **Remediation:** break the ICP into 3-5 distinct purchase-intent angles and run one session per angle via `--icp-angle`:
131
+
132
+ ```bash
133
+ canonry discover run <project> \
134
+ --icp-angle "romantic anniversary stay in Venice Beach" \
135
+ --icp-angle "best rooftop bars and dining hotels LA" \
136
+ --icp-angle "walkable Venice Beach hotels near Abbot Kinney" \
137
+ --icp-angle "design-forward boutique hotels for creative professionals" \
138
+ --wait
139
+ ```
140
+
141
+ Each angle generates its own 40-seed cluster independently, so aggregate coverage grows while per-session dedup stays clean. The `--wait` output prints a combined summary with each session's ID and a `promote` command for it. Review each session's preview, then promote the sessions individually.
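The collapse and its diagnostic can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not canonry's implementation: the greedy cosine dedup, the 2-D stand-in vectors, and the helper names `greedy_dedup` and `is_seed_collapse` are all hypothetical. Only the 0.85 default threshold and the `seedCountRaw` / `seedCount` ratio come from this document.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_dedup(vectors, threshold=0.85):
    """Keep a vector only if it is below `threshold` similarity to every kept one.

    Assumption: a simple greedy pass standing in for the real clustering step.
    """
    kept = []
    for v in vectors:
        if all(cosine(v, k) < threshold for k in kept):
            kept.append(v)
    return kept


def is_seed_collapse(seed_count_raw, seed_count, max_ratio=10):
    """Flag a session whose raw-to-selected seed ratio exceeds `max_ratio`."""
    if seed_count <= 0:
        # Hypothetical handling: nothing selected from a non-empty seed set.
        return seed_count_raw > 0
    return seed_count_raw / seed_count > max_ratio


# 40 toy "embeddings" that all sit in one semantic pocket.
pocket = [(1.0, 0.001 * i) for i in range(40)]
selected = greedy_dedup(pocket)
collapsed = is_seed_collapse(len(pocket), len(selected))

# Three clearly different directions survive the same threshold.
angles = greedy_dedup([(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
```

In this sketch the pocket of 40 near-identical vectors reduces to a single representative and trips the ratio check, while the three distinct directions all survive. That is the intuition behind splitting the ICP into separate angles.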
126
142
 
127
143
  ## Memory hygiene
128
144
 
@@ -88,7 +88,7 @@ GA4 is a first-class signal alongside citation tracking. Connect once with `cano
88
88
  | `references/aeo-analysis.md` | Interpreting sweep output, diagnosing regressions, planning content fixes |
89
89
  | `references/indexing.md` | Submitting URLs, checking GSC/Bing coverage, fixing indexing gaps |
90
90
  | `references/wordpress-integration.md` | Connecting to WordPress, editing pages, pushing staging → live |
91
- | `references/server-side-traffic.md` | Wiring server-log evidence (Cloud Run today; WordPress / others later) for AI Visibility — Server-Side. Connect, sync, manage sources, troubleshoot. |
91
+ | `references/server-side-traffic.md` | Wiring server-log evidence (Cloud Run + WordPress adapters; more planned) for AI Visibility — Server-Side. Connect, sync, manage sources, troubleshoot. |
92
92
 
93
93
  ---
94
94
 
@@ -230,6 +230,7 @@ canonry google request-indexing <project> --all-unindexed # push all unknown pag
230
230
  canonry discover run <project> --icp "..." --wait --format json # full pipeline: seed → embed → cluster → probe → bucket
231
231
  canonry discover run <project> --icp "..." --dedup-threshold 0.85 # tune cosine threshold (default 0.85)
232
232
  canonry discover run <project> --icp "..." --max-probes 100 # per-session probe budget (default 100, hard cap 500)
233
+ canonry discover run <project> --icp-angle "angle 1" --icp-angle "angle 2" --wait # multi-angle: one session per ICP angle, useful for hyperlocal/niche businesses
233
234
 
234
235
  canonry discover list <project> # newest-first session list
235
236
  canonry discover show <project> <session-id> # per-query probe rows + buckets