npm - @ainyc/canonry - Versions diffs - 4.24.1 → 4.26.0 - Mend

@ainyc/canonry 4.24.1 → 4.26.0

Files changed (14)

package/README.md +1 -1
package/assets/agent-workspace/skills/aero/references/aeo-discovery.md +89 -0
package/assets/agent-workspace/skills/canonry-setup/references/canonry-cli.md +15 -0
package/assets/assets/{index-BzD9HUxc.js → index-X1r0qycv.js} +84 -84
package/assets/index.html +1 -1
package/dist/{chunk-6EJ54OX7.js → chunk-2FAEQ56I.js} +92 -2
package/dist/{chunk-E5PZ23OS.js → chunk-H4RE4WLW.js} +1130 -305
package/dist/{chunk-EUGCQSFC.js → chunk-HVW665A4.js} +135 -1
package/dist/{chunk-OYYFXKRK.js → chunk-PN24DAGC.js} +127 -2
package/dist/cli.js +440 -125
package/dist/index.js +4 -4
package/dist/{intelligence-service-NVN2PAR7.js → intelligence-service-VUBODIGG.js} +2 -2
package/dist/mcp.js +9 -3
package/package.json +10 -10

package/README.md CHANGED

@@ -1,6 +1,6 @@
 # Canonry <img src="apps/web/public/favicon-32.png" alt="Canonry canary icon" width="24" />
-[![npm version](https://img.shields.io/npm/v/@ainyc/canonry)](https://www.npmjs.com/package/@ainyc/canonry) [![License: FSL-1.1-ALv2](https://img.shields.io/badge/License-FSL--1.1--ALv2-blue.svg)](https://fsl.software/) [![Node.js >= 22.14](https://img.shields.io/badge/node-%3E%3D22.14-brightgreen)](https://nodejs.org)
+[![npm version](https://img.shields.io/npm/v/@ainyc/canonry)](https://www.npmjs.com/package/@ainyc/canonry) [![Node.js >= 22.14](https://img.shields.io/badge/node-%3E%3D22.14-brightgreen)](https://nodejs.org)
 **Agent-first AEO operating platform. Open source. Self-hosted.**

package/assets/agent-workspace/skills/aero/references/aeo-discovery.md ADDED

@@ -0,0 +1,89 @@
+---
+name: aeo-discovery
+description: How to operate the tracked-basket discovery pipeline. Read when an operator asks to expand a project's basket, audit its competitive surface, or you wake unprompted on `aeo-discover-probe.completed`.
+---
+# AEO Discovery (Tracked-Basket Expansion)
+Discovery turns a free-text ICP description into a deduped basket of representative queries, probes each against Gemini grounding, and classifies the results into three buckets:
+- **cited** — the project's canonical (or owned) domain appears in the grounding sources
+- **wasted-surface** — a tracked competitor is cited but the project is not
+- **aspirational** — neither the project nor a tracked competitor is cited (greenfield)
+Plus a competitor map: every non-canonical domain that shows up in probe citations, ranked by hit count, so the operator can spot recurring competitors that aren't yet on the watchlist.
+## When discovery is the right move
+- Operator says "expand my tracked queries", "audit my basket", "what am I missing", "find competitors I should track".
+- Recurring `wasted-surface` shows up in regression analysis — the project keeps losing on queries adjacent to its tracked basket.
+- A new ICP is being onboarded and the operator only has a domain + tagline, no curated query list.
+## Triggering a session
+The operator runs:
+```bash
+canonry discover run <project> --icp "..." --wait
+```
+Or the MCP equivalent: `canonry_discover_run_start` with `{ project, request: { icpDescription, dedupThreshold?, maxProbes? } }`. The endpoint returns `{ runId, sessionId, status: "running" }` immediately and finishes the work in the background. Poll `canonry_discover_session_get` until `status` is `completed` or `failed`.
+ICP fallback: if the request omits `icpDescription`, the route uses `projects.icp_description` if set. Surface a clear "needs an ICP" prompt if neither is available.
+## Cost + budget
+Per session: ~$1 at the default probe budget (100 queries × 1 Gemini grounded call each, plus a single batched embed call for ~$0.0002). Hard cap: 500 probes per session, enforced both client-side (Zod) and server-side. Recommend the default (100) unless the operator has a specific reason.
+## Reading the result
+`canonry_discover_session_get` returns:
+- Session-level: `seedCountRaw` vs `seedCount` (proves embedding dedup did real work), bucket counts, `competitorMap` (top recurring non-tracked domains).
+- Per-probe: query, bucket, citation state, the cited domains list.
+Things to call out without being asked:
+- **High wasted-surface ratio** (≥ 40% of probes, or > cited count at ≥ 20%) → the project is missing from its own competitive space. The auto-written `discovery.basket-divergence` insight flags this as `high` severity.
+- **New competitor domains** in `competitorMap` that aren't already in the project's tracked competitor list → suggest adding via `canonry competitor add <project> <domain>`. PR 2's `canonry discover promote` will automate this.
+- **Aspirational greenfield** queries with no tracked competitor and no canonical cite → low-friction content opportunities.
+## When you wake on `aeo-discover-probe.completed`
+The follow-up payload `RunCoordinator` queues for you includes:
+```
+[system] Discovery run <runId> completed for project <name> (session <sessionId>).
+Buckets — cited:<n>, wasted-surface:<n>, aspirational:<n> (<probeCount> probes; seed provider: gemini).
+Top recurring competitor domains: <domain1>(<hits>), <domain2>(<hits>), …
+```
+Respond with:
+1. A one-line headline naming the dominant bucket.
+2. The top 2-3 wasted-surface queries (call `canonry_discover_session_get` to fetch them — don't guess).
+3. The top 1-2 new competitor domains worth tracking.
+4. A single recommended next step. Examples: "add competitor.com to the tracked list", "the wasted-surface set warrants a content plan around X", "the aspirational set is greenfield — pick the 3 with highest commercial intent and write content".
+Keep it tight. The operator wakes to a short, decision-ready summary, not a full report.
+## What discovery does NOT do (yet)
+- **No promotion.** PR 2 ships `canonry discover promote` which adopts queries into the project's tracked basket with `provenance='discovery:<sessionId>'`. Until then, the operator merges manually via `canonry query add` / `canonry competitor add`.
+- **No multi-provider amplification.** v1 probes Gemini only. v2 will probe across Gemini + ChatGPT + Claude in one session (the schema is already shaped for it — `discovery_probes` has no `UNIQUE(session_id, query)` exactly because of this).
+- **No re-run drift.** Each session is independent. Comparing sessions over time is on the PR 4 / PR 5 roadmap.
+## Failure modes
+- **Gemini not configured** → orchestrator throws early; `runs.status='failed'` with `Gemini provider is not configured.` Surface as "configure Gemini before running discovery" — link to `canonry init` or `~/.canonry/config.yaml`.
+- **Vertex-only Gemini** → embeddings step throws (Vertex embeddings deferred). Same surface, "use a Gemini API key for now."
+- **ICP missing** → route returns 400 with `VALIDATION_ERROR`. Ask the operator for the ICP description in plain language.
+## Memory hygiene
+After a discovery session, store a one-liner in `agent_memory` if the operator validates a non-obvious call. Examples:
+- `discovery:icp-style` — phrasing they responded well to
+- `discovery:competitor-watchlist` — domains they explicitly accepted/rejected from the suggested list
+Skip routine results — only memory-worthy material is what would help a future session avoid re-asking the same question.
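
The added skill file describes three buckets, a competitor map ranked by hit count, and a "high wasted-surface" heuristic. The following is a minimal sketch of that logic in TypeScript, not the package's actual implementation. The type names, field names, and the `normalize` helper are hypothetical. Counting a domain at most once per probe is an assumption. The threshold function reads "≥ 40% of probes, or > cited count at ≥ 20%" as "ratio at least 0.4, or ratio at least 0.2 while wasted-surface outnumbers cited", which is one plausible interpretation.

```typescript
// Hypothetical shapes for illustration only; not the package's schema.
type Bucket = "cited" | "wasted-surface" | "aspirational";

interface ProbeResult {
  query: string;
  citedDomains: string[]; // domains found in the grounding sources
}

interface ProjectContext {
  ownedDomains: string[]; // canonical domain plus any owned domains
  trackedCompetitors: string[];
}

// Lowercase and drop a leading "www." so trivial variants compare equal.
function normalize(domain: string): string {
  return domain.trim().toLowerCase().replace(/^www\./, "");
}

// cited: project domain appears; wasted-surface: a tracked competitor
// appears but the project does not; aspirational: neither appears.
function classifyProbe(probe: ProbeResult, ctx: ProjectContext): Bucket {
  const cited = new Set(probe.citedDomains.map(normalize));
  if (ctx.ownedDomains.some((d) => cited.has(normalize(d)))) return "cited";
  if (ctx.trackedCompetitors.some((d) => cited.has(normalize(d)))) {
    return "wasted-surface";
  }
  return "aspirational";
}

// Count hits for every non-owned domain, highest count first.
// Assumption: a domain counts once per probe, even if cited repeatedly.
function buildCompetitorMap(
  probes: ProbeResult[],
  ctx: ProjectContext,
): Array<[string, number]> {
  const owned = new Set(ctx.ownedDomains.map(normalize));
  const hits = new Map<string, number>();
  probes.forEach((probe) => {
    const unique = Array.from(new Set(probe.citedDomains.map(normalize)));
    unique.forEach((domain) => {
      if (owned.has(domain)) return;
      hits.set(domain, (hits.get(domain) ?? 0) + 1);
    });
  });
  return Array.from(hits.entries()).sort((a, b) => b[1] - a[1]);
}

// One reading of the "high wasted-surface ratio" rule in the skill file.
function isHighWastedSurface(counts: Record<Bucket, number>): boolean {
  const wasted = counts["wasted-surface"];
  const total = counts.cited + wasted + counts.aspirational;
  if (total === 0) return false;
  const ratio = wasted / total;
  return ratio >= 0.4 || (ratio >= 0.2 && wasted > counts.cited);
}
```

In this sketch the owned-domain check runs first, so a probe that cites both the project and a tracked competitor lands in `cited`, matching the bucket definitions in the file. The real pipeline may normalize domains or weight hits differently.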

package/assets/agent-workspace/skills/canonry-setup/references/canonry-cli.md CHANGED

@@ -52,6 +52,7 @@ canonry snapshot "Acme Corp" --domain acme.example.com --format json
 canonry run <project>                             # sweep all configured providers
 canonry run <project> --provider gemini           # single provider only
+canonry run <project> --query "alpha" --query "beta"  # scope sweep to a subset of tracked queries (repeatable)
 canonry run <project> --wait                      # block until complete
 canonry run <project> --location <label>          # run with specific location context
 canonry run <project> --all-locations             # run for every configured location
@@ -223,6 +224,20 @@ canonry google request-indexing <project> <url>           # push URL to Google
 canonry google request-indexing <project> --all-unindexed # push all unknown pages
 ```
+## Discovery (Tracked-Basket Expansion)
+```bash
+canonry discover run <project> --icp "..." --wait --format json    # full pipeline: seed → embed → cluster → probe → bucket
+canonry discover run <project> --icp "..." --dedup-threshold 0.85  # tune cosine threshold (default 0.85)
+canonry discover run <project> --icp "..." --max-probes 100         # per-session probe budget (default 100, hard cap 500)
+canonry discover list <project>                                     # newest-first session list
+canonry discover show <project> <session-id>                        # per-query probe rows + buckets
+canonry discover promote preview <project> <session-id>             # preview the basket PR 2 will write (read-only)
+```
+Discovery requires Gemini configured (API key today; Vertex-mode embeddings are deferred). The pipeline writes a `discovery_sessions` row, a `runs` row (kind `aeo-discover-probe`), and one `discovery.basket-divergence` insight when the session completes. Aero wakes unprompted with the bucket-count payload so the operator can act without polling.
 ## Bing Webmaster Tools
 ```bash