@ainyc/canonry 4.24.1 → 4.26.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # Canonry <img src="apps/web/public/favicon-32.png" alt="Canonry canary icon" width="24" />
2
2
 
3
- [![npm version](https://img.shields.io/npm/v/@ainyc/canonry)](https://www.npmjs.com/package/@ainyc/canonry) [![License: FSL-1.1-ALv2](https://img.shields.io/badge/License-FSL--1.1--ALv2-blue.svg)](https://fsl.software/) [![Node.js >= 22.14](https://img.shields.io/badge/node-%3E%3D22.14-brightgreen)](https://nodejs.org)
3
+ [![npm version](https://img.shields.io/npm/v/@ainyc/canonry)](https://www.npmjs.com/package/@ainyc/canonry) [![Node.js >= 22.14](https://img.shields.io/badge/node-%3E%3D22.14-brightgreen)](https://nodejs.org)
4
4
 
5
5
  **Agent-first AEO operating platform. Open source. Self-hosted.**
6
6
 
@@ -0,0 +1,89 @@
1
+ ---
2
+ name: aeo-discovery
3
+ description: How to operate the tracked-basket discovery pipeline. Read when an operator asks to expand a project's basket, audit its competitive surface, or you wake unprompted on `aeo-discover-probe.completed`.
4
+ ---
5
+
6
+ # AEO Discovery (Tracked-Basket Expansion)
7
+
8
+ Discovery turns a free-text ICP description into a deduped basket of representative queries, probes each against Gemini grounding, and classifies the results into three buckets:
9
+
10
+ - **cited** — the project's canonical (or owned) domain appears in the grounding sources
11
+ - **wasted-surface** — a tracked competitor is cited but the project is not
12
+ - **aspirational** — neither the project nor a tracked competitor is cited (greenfield)
13
+
14
+ Plus a competitor map: every non-canonical domain that shows up in probe citations, ranked by hit count, so the operator can spot recurring competitors that aren't yet on the watchlist.
15
+
16
+ ## When discovery is the right move
17
+
18
+ - Operator says "expand my tracked queries", "audit my basket", "what am I missing", "find competitors I should track".
19
+ - Recurring `wasted-surface` shows up in regression analysis — the project keeps losing on queries adjacent to its tracked basket.
20
+ - A new ICP is being onboarded and the operator only has a domain + tagline, no curated query list.
21
+
22
+ ## Triggering a session
23
+
24
+ The operator runs:
25
+
26
+ ```bash
27
+ canonry discover run <project> --icp "..." --wait
28
+ ```
29
+
30
+ Or the MCP equivalent: `canonry_discover_run_start` with `{ project, request: { icpDescription, dedupThreshold?, maxProbes? } }`. The endpoint returns `{ runId, sessionId, status: "running" }` immediately and finishes the work in the background. Poll `canonry_discover_session_get` until `status` is `completed` or `failed`.
31
+
32
+ ICP fallback: if the request omits `icpDescription`, the route uses `projects.icp_description` if set. Surface a clear "needs an ICP" prompt if neither is available.
33
+
34
+ ## Cost + budget
35
+
36
+ Per session: ~$1 at the default probe budget (100 queries × 1 Gemini grounded call each, plus a single batched embed call for ~$0.0002). Hard cap: 500 probes per session, enforced both client-side (Zod) and server-side. Recommend the default (100) unless the operator has a specific reason.
37
+
38
+ ## Reading the result
39
+
40
+ `canonry_discover_session_get` returns:
41
+
42
+ - Session-level: `seedCountRaw` vs `seedCount` (proves embedding dedup did real work), bucket counts, `competitorMap` (top recurring non-tracked domains).
43
+ - Per-probe: query, bucket, citation state, the cited domains list.
44
+
45
+ Things to call out without being asked:
46
+
47
+ - **High wasted-surface ratio** (≥ 40% of probes, or > cited count at ≥ 20%) → the project is missing from its own competitive space. The auto-written `discovery.basket-divergence` insight flags this as `high` severity.
48
+ - **New competitor domains** in `competitorMap` that aren't already in the project's tracked competitor list → suggest adding via `canonry competitor add <project> <domain>`. PR 2's `canonry discover promote` will automate this.
49
+ - **Aspirational greenfield** queries with no tracked competitor and no canonical cite → low-friction content opportunities.
50
+
51
+ ## When you wake on `aeo-discover-probe.completed`
52
+
53
+ The follow-up payload `RunCoordinator` queues for you includes:
54
+
55
+ ```
56
+ [system] Discovery run <runId> completed for project <name> (session <sessionId>).
57
+ Buckets — cited:<n>, wasted-surface:<n>, aspirational:<n> (<probeCount> probes; seed provider: gemini).
58
+ Top recurring competitor domains: <domain1>(<hits>), <domain2>(<hits>), …
59
+ ```
60
+
61
+ Respond with:
62
+
63
+ 1. A one-line headline naming the dominant bucket.
64
+ 2. The top 2-3 wasted-surface queries (call `canonry_discover_session_get` to fetch them — don't guess).
65
+ 3. The top 1-2 new competitor domains worth tracking.
66
+ 4. A single recommended next step. Examples: "add competitor.com to the tracked list", "the wasted-surface set warrants a content plan around X", "the aspirational set is greenfield — pick the 3 with highest commercial intent and write content".
67
+
68
+ Keep it tight. The operator wakes to a short, decision-ready summary, not a full report.
69
+
70
+ ## What discovery does NOT do (yet)
71
+
72
+ - **No promotion.** PR 2 ships `canonry discover promote` which adopts queries into the project's tracked basket with `provenance='discovery:<sessionId>'`. Until then, the operator merges manually via `canonry query add` / `canonry competitor add`.
73
+ - **No multi-provider amplification.** v1 probes Gemini only. v2 will probe across Gemini + ChatGPT + Claude in one session (the schema is already shaped for it — `discovery_probes` has no `UNIQUE(session_id, query)` exactly because of this).
74
+ - **No re-run drift.** Each session is independent. Comparing sessions over time is on the PR 4 / PR 5 roadmap.
75
+
76
+ ## Failure modes
77
+
78
+ - **Gemini not configured** → orchestrator throws early; `runs.status='failed'` with `Gemini provider is not configured.` Surface as "configure Gemini before running discovery" — link to `canonry init` or `~/.canonry/config.yaml`.
79
+ - **Vertex-only Gemini** → embeddings step throws (Vertex embeddings deferred). Same surface, "use a Gemini API key for now."
80
+ - **ICP missing** → route returns 400 with `VALIDATION_ERROR`. Ask the operator for the ICP description in plain language.
81
+
82
+ ## Memory hygiene
83
+
84
+ After a discovery session, store a one-liner in `agent_memory` if the operator validates a non-obvious call. Examples:
85
+
86
+ - `discovery:icp-style` — phrasing they responded well to
87
+ - `discovery:competitor-watchlist` — domains they explicitly accepted/rejected from the suggested list
88
+
89
+ Skip routine results — only memory-worthy material is what would help a future session avoid re-asking the same question.
@@ -52,6 +52,7 @@ canonry snapshot "Acme Corp" --domain acme.example.com --format json
52
52
 
53
53
  canonry run <project> # sweep all configured providers
54
54
  canonry run <project> --provider gemini # single provider only
55
+ canonry run <project> --query "alpha" --query "beta" # scope sweep to a subset of tracked queries (repeatable)
55
56
  canonry run <project> --wait # block until complete
56
57
  canonry run <project> --location <label> # run with specific location context
57
58
  canonry run <project> --all-locations # run for every configured location
@@ -223,6 +224,20 @@ canonry google request-indexing <project> <url> # push URL to Google
223
224
  canonry google request-indexing <project> --all-unindexed # push all unknown pages
224
225
  ```
225
226
 
227
+ ## Discovery (Tracked-Basket Expansion)
228
+
229
+ ```bash
230
+ canonry discover run <project> --icp "..." --wait --format json # full pipeline: seed → embed → cluster → probe → bucket
231
+ canonry discover run <project> --icp "..." --dedup-threshold 0.85 # tune cosine threshold (default 0.85)
232
+ canonry discover run <project> --icp "..." --max-probes 100 # per-session probe budget (default 100, hard cap 500)
233
+
234
+ canonry discover list <project> # newest-first session list
235
+ canonry discover show <project> <session-id> # per-query probe rows + buckets
236
+ canonry discover promote preview <project> <session-id> # preview the basket PR 2 will write (read-only)
237
+ ```
238
+
239
+ Discovery requires Gemini configured (API key today; Vertex-mode embeddings are deferred). The pipeline writes a `discovery_sessions` row, a `runs` row (kind `aeo-discover-probe`), and one `discovery.basket-divergence` insight when the session completes. Aero wakes unprompted with the bucket-count payload so the operator can act without polling.
240
+
226
241
  ## Bing Webmaster Tools
227
242
 
228
243
  ```bash