@ainyc/canonry 4.18.1 → 4.19.0

@@ -88,6 +88,7 @@ GA4 is a first-class signal alongside citation tracking. Connect once with `cano
  | `references/aeo-analysis.md` | Interpreting sweep output, diagnosing regressions, planning content fixes |
  | `references/indexing.md` | Submitting URLs, checking GSC/Bing coverage, fixing indexing gaps |
  | `references/wordpress-integration.md` | Connecting to WordPress, editing pages, pushing staging → live |
+ | `references/server-side-traffic.md` | Wiring server-log evidence (Cloud Run today; WordPress / others later) for AI Visibility — Server-Side. Connect, sync, manage sources, troubleshoot. |
 
  ---
 
@@ -0,0 +1,167 @@
+ # Server-side traffic (AI Visibility — Server-Side)
+
+ Server-side traffic ingestion captures **what AI engines actually do in
+ your server logs** — bots crawling pages, AI products sending
+ click-through arrivals — in addition to the citation data that measures
+ **what models say** about you. The two surfaces are independent.
+
+ ## When to use it
+
+ Reach for server-side traffic when an analyst or operator asks:
+
+ - *"Is GPTBot / ClaudeBot / PerplexityBot actually fetching my pages?"*
+ - *"Which paths are AI engines paying attention to?"*
+ - *"Are users clicking through from chatgpt.com / claude.ai / etc.?"*
+ - *"My citation rate is fine but there's no traffic — why?"*
+
+ GA4 referrals (chatgpt.com → your site) catch click-throughs after they
+ land. Server logs catch the upstream bot activity AND referrals at the
+ edge — including arrivals GA4 missed because of cookie consent, ad
+ blockers, or analytics gaps.
+
+ ## Architecture
+
+ Three tables, populated from server-log adapters:
+
+ | Table | What's in it |
+ |---|---|
+ | `crawler_events_hourly` | One row per `(project, source, hour, bot, verification, path, status)` — bot crawls rolled up by hour |
+ | `ai_referral_events_hourly` | One row per `(project, source, hour, product, source_domain, evidence_type, landing_path, status)` — click-through arrivals rolled up by hour |
+ | `raw_event_samples` | Bounded forensic samples (≤100 per sync) for spot-checking |
+
+ Each `traffic_sources` row is one server-log integration for a project.
+ Today's only adapter is `cloud-run`; future adapters slot in by
+ implementing the same contract.
+
+ ## Connecting a Cloud Run source
+
+ ```bash
+ # 1. Create a service account in the Cloud project that hosts the Cloud Run
+ #    service. Grant it `roles/logging.viewer`. Download the JSON key.
+
+ # 2. Connect from canonry CLI:
+ canonry traffic connect cloud-run <project> \
+   --gcp-project <gcp-project-id> \
+   --service-account-key <path/to/key.json>
+
+ # 3. (Optional) narrow to a specific service or location:
+ canonry traffic connect cloud-run <project> \
+   --gcp-project <id> \
+   --service-account-key <path> \
+   --service my-service-name \
+   --location us-east1
+ ```
+
+ Credentials are stored in `~/.canonry/config.yaml` (not the DB). The
+ canonical key lives only on the host that runs `canonry serve`. The
+ sync flow does NOT echo the private key back in any response.
+
+ ## Syncing data
+
+ ```bash
+ # Manual sync — defaults to a 30-day lookback on the first run; subsequent
+ # runs are clamped forward to lastSyncedAt to avoid re-pulling.
+ canonry traffic sync <project> --source <id>
+
+ # Override the lookback window (minutes):
+ canonry traffic sync <project> --source <id> --since-minutes 4320 # 3 days
+ ```
+
+ Cross-sync dedupe via the `last_event_ids` ring buffer means re-running a
+ sync over an overlapping window cannot double-count rolled-up hourly
+ hits. Safe to schedule (see "Scheduling" below) or trigger from CI.
+
+ ## Inspecting source state
+
+ ```bash
+ # All sources with last-24h totals + latest sync run (single-call):
+ canonry traffic status <project> --format json
+
+ # Just the source list:
+ canonry traffic sources <project> --format json
+
+ # Windowed events (defaults to last 24h):
+ canonry traffic events <project> --kind crawler --limit 200 --format json
+ canonry traffic events <project> --kind ai-referral --since 2026-04-01 --until 2026-04-30
+ ```
+
+ The `traffic status` composite returns the same per-source detail
+ (24h crawler hits, AI-referral arrivals, raw-event-sample count, latest
+ sync-run summary) whether you reach it via the CLI, the API, or the
+ MCP `canonry_traffic_status` tool.
+
+ ## Where the data shows up
+
+ | Surface | What's rendered |
+ |---|---|
+ | Project dashboard `/projects/:name/activity` | Live source table + 24h totals + GA4 referrals (combined view) |
+ | Top-level `/traffic` route | Cross-project source admin (connect, sync, archive) |
+ | `canonry report <project>` (HTML + SPA) | "AI Visibility — Server-Side" section, ranked above Indexing Health |
+ | `canonry doctor --project <name>` | `traffic.source.connected`, `recent-data`, `credentials`, `scopes` checks |
+ | MCP toolkit `traffic` | Tools: `canonry_traffic_status`, `_sources_list`, `_source_get`, `_events`, `_connect_cloud_run`, `_sync` |
+
+ ## Doctor signals
+
+ The doctor checks are adapter-agnostic. When they fail or warn:
+
+ | Check | Code | What to do |
+ |---|---|---|
+ | `traffic.source.connected` | `traffic.source.none` | No source — `canonry traffic connect cloud-run …` |
+ | `traffic.source.connected` | `traffic.source.all-errored` | Re-connect the source. The check's `details.lastError` shows the underlying reason. |
+ | `traffic.source.recent-data` | `traffic.recent-data.stale` | Last sync was >7d ago. Run `canonry traffic sync …` or schedule a recurring sync. |
+ | `traffic.source.recent-data` | `traffic.recent-data.empty` | Source connected but no data in 30d. Verify config and credentials with `canonry traffic sources <project>`. |
+ | `traffic.source.credentials` | `traffic.credentials.resolve-failed` | Service-account key in `~/.canonry/config.yaml` is invalid or expired. Re-connect. |
+
+ ## Scheduling
+
+ `canonry schedule` supports `--kind traffic-sync`. Recurring syncs are
+ safe because of the `last_event_ids` cross-sync dedupe ring buffer
+ described above. Recommended cadence:
+
+ | Cadence | Use case |
+ |---|---|
+ | `0 */6 * * *` (every 6h) | Production agencies tracking active client sites |
+ | `0 0 * * *` (daily) | Lower-traffic sites or local dev |
+ | Manual only | First few weeks while validating data |
+
+ ## Telemetry
+
+ Every successful or failed sync emits a `traffic.synced` event to the
+ canonry telemetry pipeline:
+
+ ```jsonc
+ {
+   "event": "traffic.synced",
+   "errorCode": "PROVIDER_AUTH", // present only when status='failed'
+   "properties": {
+     "status": "completed" | "failed",
+     "sourceType": "cloud-run", // adapter type
+     "sourceId": "<uuid>", // opaque
+     "pulledEvents": 234,
+     "crawlerHits": 200,
+     "aiReferralHits": 12,
+     "durationMs": 4150
+   }
+ }
+ ```
+
+ Counts are aggregate. The sourceId is an opaque UUID. No raw paths,
+ domains, or PII are surfaced.
+
+ ## Limits & caveats
+
+ - **Path-level citation cross-reference is not implemented yet.** The
+   citation store is domain-grain (`query_snapshots.cited_domains`). A
+   future iteration that lands URL-grain citation evidence will extend
+   the `topCrawledPaths` entry with a `citationState` flag. Until then,
+   treat the report's crawled-paths table as "engine attention" — the
+   signal is that the bot fetched the path, not whether it was cited.
+ - **Verified vs unverified.** The headline numbers count only
+   rDNS-verified hits. Unverified bots claim a known UA but couldn't be
+   cross-confirmed via reverse-DNS — they may be the real bot or an
+   imitator. Don't promote unverified counts in client-facing copy.
+ - **Cloud Run only in v1.** WordPress plugin and other adapters are
+   planned. The doctor checks and the report renderer are already
+   adapter-agnostic — adding a new adapter is just a new entry in
+   `traffic_sources.source_type` and a `TrafficSourceValidator`
+   registration.
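
The hourly rollup described under "Architecture" in the added file can be illustrated with a minimal sketch. It assumes a hypothetical in-memory event shape whose field names mirror the documented `crawler_events_hourly` key; the helpers `hourBucket` and `rollupCrawlerEvents` are illustrative names and are not taken from the package.

```javascript
// Sketch only: event fields and helper names are assumptions, not canonry's schema.

// Truncate an ISO-8601 timestamp to the start of its UTC hour.
function hourBucket(isoTimestamp) {
  const d = new Date(isoTimestamp);
  d.setUTCMinutes(0, 0, 0);
  return d.toISOString();
}

// Collapse raw crawler events into one row per
// (project, source, hour, bot, verification, path, status), counting hits.
function rollupCrawlerEvents(events) {
  const rows = new Map();
  for (const e of events) {
    const hour = hourBucket(e.timestamp);
    const key = JSON.stringify([e.project, e.source, hour, e.bot, e.verification, e.path, e.status]);
    const row = rows.get(key);
    if (row) {
      row.hits += 1;
    } else {
      rows.set(key, {
        project: e.project,
        source: e.source,
        hour,
        bot: e.bot,
        verification: e.verification,
        path: e.path,
        status: e.status,
        hits: 1
      });
    }
  }
  return [...rows.values()];
}
```

Keying on the full tuple means two fetches of the same path by the same bot within one hour become a single row with `hits: 2`, while a fetch in the next hour starts a new row.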
@@ -17118,6 +17118,7 @@ async function trafficRoutes(app, opts) {
      Math.min(windowEnd.getTime(), Math.max(requestedStartMs, lastSyncedMs))
    );
    const startedAt = windowEnd.toISOString();
+   const syncStartedAtMs = windowEnd.getTime();
    const runId = crypto20.randomUUID();
    app.db.insert(runs).values({
      id: runId,
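
The clamp expression at the top of this hunk matches the documented behaviour that later syncs are "clamped forward to lastSyncedAt". A minimal sketch of that arithmetic follows. The function name `resolveWindowStart`, its parameters, and the choice to treat a missing `lastSyncedAt` as epoch 0 are assumptions made for illustration; only the `Math.min`/`Math.max` expression comes from the diff.

```javascript
// Sketch only: the surrounding function and its inputs are hypothetical.
function resolveWindowStart(windowEnd, sinceMinutes, lastSyncedAt) {
  const requestedStartMs = windowEnd.getTime() - sinceMinutes * 60000;
  // Assumption: a source that has never synced contributes no lower bound.
  const lastSyncedMs = lastSyncedAt ? new Date(lastSyncedAt).getTime() : 0;
  // Same shape as the expression in the hunk: never start before the last
  // sync, and never start after the end of the window.
  return new Date(
    Math.min(windowEnd.getTime(), Math.max(requestedStartMs, lastSyncedMs))
  );
}
```

With a three-day lookback (`4320` minutes) and a last sync twelve hours before `windowEnd`, the start is the last-sync time rather than three days back, so the overlapping range is not pulled again.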
@@ -17129,19 +17130,32 @@ async function trafficRoutes(app, opts) {
      startedAt,
      createdAt: startedAt
    }).run();
-   const markFailed = (msg) => {
+   const markFailed = (msg, errorCode) => {
      const failedAt = (/* @__PURE__ */ new Date()).toISOString();
      app.db.transaction((tx) => {
        tx.update(runs).set({ status: RunStatuses.failed, error: msg, finishedAt: failedAt }).where(eq23(runs.id, runId)).run();
        tx.update(trafficSources).set({ status: TrafficSourceStatuses.error, lastError: msg, updatedAt: failedAt }).where(eq23(trafficSources.id, sourceRow.id)).run();
      });
+     try {
+       opts.onTrafficSynced?.({
+         status: "failed",
+         sourceType: sourceRow.sourceType,
+         sourceId: sourceRow.id,
+         pulledEvents: 0,
+         crawlerHits: 0,
+         aiReferralHits: 0,
+         durationMs: Date.now() - syncStartedAtMs,
+         errorCode
+       });
+     } catch {
+     }
    };
    let accessToken;
    try {
      accessToken = await resolveAccessToken2(credential);
    } catch (e) {
      const msg = e instanceof Error ? e.message : String(e);
-     markFailed(msg);
+     markFailed(msg, "PROVIDER_AUTH");
      throw providerError(`Failed to resolve Cloud Run access token: ${msg}`);
    }
    let allEvents = [];
@@ -17158,7 +17172,7 @@ async function trafficRoutes(app, opts) {
      allEvents = page.events;
    } catch (e) {
      const msg = e instanceof Error ? e.message : String(e);
-     markFailed(msg);
+     markFailed(msg, "PROVIDER_PULL");
      throw providerError(`Cloud Run pull failed: ${msg}`);
    }
    const seenEventIds = new Set(parseJsonColumn(sourceRow.lastEventIds, []));
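
The `seenEventIds` set on the last context line is seeded from the source row's `lastEventIds` column, which the added documentation calls a cross-sync dedupe ring buffer. The diff does not show how that buffer is filtered or trimmed, so the following is only a sketch of one plausible approach. `dedupeAndAdvance`, the `id` field on events, and the `capacity` value are all hypothetical.

```javascript
// Sketch only: names, event shape, and capacity are assumptions.
function dedupeAndAdvance(lastEventIds, events, capacity) {
  const seen = new Set(lastEventIds);
  const fresh = [];
  for (const e of events) {
    if (seen.has(e.id)) continue; // already counted in an earlier sync (or earlier in this page)
    seen.add(e.id);
    fresh.push(e);
  }
  // Keep only the most recent `capacity` ids so the stored column stays bounded.
  const nextIds = [...lastEventIds, ...fresh.map((e) => e.id)].slice(-capacity);
  return { fresh, nextIds };
}
```

Only `fresh` would feed the hourly rollups; `nextIds` would be written back to the source row for the next run. A bounded buffer trades memory for the possibility that very old ids fall out of the window.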
@@ -17292,6 +17306,18 @@ async function trafficRoutes(app, opts) {
      entityType: "traffic_source",
      entityId: sourceRow.id
    });
+   try {
+     opts.onTrafficSynced?.({
+       status: "completed",
+       sourceType: sourceRow.sourceType,
+       sourceId: sourceRow.id,
+       pulledEvents: report.totals.normalizedEvents,
+       crawlerHits: report.totals.crawlerHits,
+       aiReferralHits: report.totals.aiReferralHits,
+       durationMs: Date.now() - syncStartedAtMs
+     });
+   } catch {
+   }
    const response = {
      sourceId: sourceRow.id,
      runId,
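
Both the failure path and the success path wrap the optional `onTrafficSynced` hook in an empty `try`/`catch`, so a telemetry fault cannot change the outcome of a sync. The pattern can be isolated as below; `emitTrafficSynced` is a hypothetical helper name, since the bundled code inlines the block at each call site.

```javascript
// Sketch only: the helper is illustrative; the package inlines this logic.
function emitTrafficSynced(opts, payload) {
  try {
    // Optional call: does nothing when no hook was registered.
    opts.onTrafficSynced?.(payload);
  } catch {
    // Deliberately swallowed: telemetry must never fail the sync itself.
  }
}

// Usage: a hook that throws does not propagate to the caller.
emitTrafficSynced(
  { onTrafficSynced: () => { throw new Error("telemetry backend down"); } },
  { status: "completed", sourceType: "cloud-run", pulledEvents: 0 }
);
```

The cost of this choice is that hook errors are silent, so a broken telemetry pipeline would have to be noticed by other means.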
@@ -18633,7 +18659,8 @@ async function apiRoutes(app, opts) {
    await api.register(trafficRoutes, {
      cloudRunCredentialStore: opts.cloudRunCredentialStore,
      pullCloudRunEvents: opts.pullCloudRunEvents,
-     resolveCloudRunAccessToken: opts.resolveCloudRunAccessToken
+     resolveCloudRunAccessToken: opts.resolveCloudRunAccessToken,
+     onTrafficSynced: opts.onTrafficSynced
    });
    await api.register(backlinksRoutes, {
      getBacklinksStatus: opts.getBacklinksStatus,
@@ -25830,6 +25857,17 @@ async function createServer(opts) {
    wordpressConnectionStore,
    ga4CredentialStore,
    cloudRunCredentialStore,
+   onTrafficSynced: (event) => {
+     trackEvent("traffic.synced", {
+       status: event.status,
+       sourceType: event.sourceType,
+       sourceId: event.sourceId,
+       pulledEvents: event.pulledEvents,
+       crawlerHits: event.crawlerHits,
+       aiReferralHits: event.aiReferralHits,
+       durationMs: event.durationMs
+     }, event.errorCode ? { errorCode: event.errorCode } : void 0);
+   },
    onRunCreated: (runId, projectId, providers2, location) => {
      jobRunner.executeRun(runId, projectId, providers2, location).catch((err) => {
        app.log.error({ runId, err }, "Job runner failed");
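
The `onTrafficSynced` handler added in this hunk keeps `errorCode` out of the properties object and passes it as a separate third argument only when it is set, which lines up with the documented event shape where `errorCode` sits beside `properties`. A small sketch of that split follows. `toTrackArgs` is a hypothetical helper; it only builds the argument list and does not call the real `trackEvent`, whose full signature is not shown in the diff.

```javascript
// Sketch only: returns the arguments the handler would pass to trackEvent.
function toTrackArgs(event) {
  const properties = {
    status: event.status,
    sourceType: event.sourceType,
    sourceId: event.sourceId,
    pulledEvents: event.pulledEvents,
    crawlerHits: event.crawlerHits,
    aiReferralHits: event.aiReferralHits,
    durationMs: event.durationMs
  };
  // errorCode travels outside `properties`, and only on failures.
  const extra = event.errorCode ? { errorCode: event.errorCode } : undefined;
  return ["traffic.synced", properties, extra];
}
```

Copying the fields one by one, rather than spreading `event`, means a field added to the route payload later is not forwarded to telemetry by accident.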
package/dist/cli.js CHANGED
@@ -20,7 +20,7 @@ import {
    setTelemetrySource,
    showFirstRunNotice,
    trackEvent
- } from "./chunk-7VDM3JBI.js";
+ } from "./chunk-OHPZXTFC.js";
  import {
    CliError,
    EXIT_SYSTEM_ERROR,
package/dist/index.js CHANGED
@@ -1,6 +1,6 @@
  import {
    createServer
- } from "./chunk-7VDM3JBI.js";
+ } from "./chunk-OHPZXTFC.js";
  import {
    loadConfig
  } from "./chunk-P3SFTXHG.js";
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@ainyc/canonry",
-   "version": "4.18.1",
+   "version": "4.19.0",
    "type": "module",
    "description": "Agent-first open-source AEO operating platform - track how answer engines cite your domain",
    "license": "FSL-1.1-ALv2",
@@ -63,19 +63,19 @@
      "@ainyc/canonry-db": "0.0.0",
      "@ainyc/canonry-intelligence": "0.0.0",
      "@ainyc/canonry-integration-bing": "0.0.0",
-     "@ainyc/canonry-api-routes": "0.0.0",
-     "@ainyc/canonry-integration-commoncrawl": "0.0.0",
      "@ainyc/canonry-integration-cloud-run": "0.0.0",
      "@ainyc/canonry-contracts": "0.0.0",
      "@ainyc/canonry-integration-google": "0.0.0",
      "@ainyc/canonry-integration-traffic": "0.0.0",
-     "@ainyc/canonry-integration-wordpress": "0.0.0",
+     "@ainyc/canonry-api-routes": "0.0.0",
      "@ainyc/canonry-provider-cdp": "0.0.0",
+     "@ainyc/canonry-integration-wordpress": "0.0.0",
      "@ainyc/canonry-provider-claude": "0.0.0",
-     "@ainyc/canonry-provider-gemini": "0.0.0",
+     "@ainyc/canonry-integration-commoncrawl": "0.0.0",
      "@ainyc/canonry-provider-local": "0.0.0",
+     "@ainyc/canonry-provider-openai": "0.0.0",
      "@ainyc/canonry-provider-perplexity": "0.0.0",
-     "@ainyc/canonry-provider-openai": "0.0.0"
+     "@ainyc/canonry-provider-gemini": "0.0.0"
    },
    "scripts": {
      "build": "tsx scripts/copy-agent-assets.ts && tsup && tsx build-web.ts",