@ainyc/canonry 4.18.1 → 4.19.1
- package/assets/agent-workspace/skills/canonry-setup/SKILL.md +1 -0
- package/assets/agent-workspace/skills/canonry-setup/references/server-side-traffic.md +167 -0
- package/assets/assets/{index-dLsgu2ck.js → index-CVqSCXSn.js} +103 -103
- package/assets/assets/index-QBgWzl2L.css +1 -0
- package/assets/index.html +2 -2
- package/dist/{chunk-7VDM3JBI.js → chunk-OHPZXTFC.js} +42 -4
- package/dist/cli.js +1 -1
- package/dist/index.js +1 -1
- package/package.json +8 -8
- package/assets/assets/index-4fWsYFLp.css +0 -1
@@ -88,6 +88,7 @@ GA4 is a first-class signal alongside citation tracking. Connect once with `cano
 | `references/aeo-analysis.md` | Interpreting sweep output, diagnosing regressions, planning content fixes |
 | `references/indexing.md` | Submitting URLs, checking GSC/Bing coverage, fixing indexing gaps |
 | `references/wordpress-integration.md` | Connecting to WordPress, editing pages, pushing staging → live |
+| `references/server-side-traffic.md` | Wiring server-log evidence (Cloud Run today; WordPress / others later) for AI Visibility — Server-Side. Connect, sync, manage sources, troubleshoot. |
 
 ---
 
@@ -0,0 +1,167 @@
+# Server-side traffic (AI Visibility — Server-Side)
+
+Server-side traffic ingestion captures **what AI engines actually do in
+your server logs** — bots crawling pages, AI products sending
+click-through arrivals — in addition to the citation data that measures
+**what models say** about you. The two surfaces are independent.
+
+## When to use it
+
+Reach for server-side traffic when an analyst or operator asks:
+
+- *"Is GPTBot / ClaudeBot / PerplexityBot actually fetching my pages?"*
+- *"Which paths are AI engines paying attention to?"*
+- *"Are users clicking through from chatgpt.com / claude.ai / etc.?"*
+- *"My citation rate is fine but there's no traffic — why?"*
+
+GA4 referrals (chatgpt.com → your site) catch click-throughs after they
+land. Server logs catch the upstream bot activity AND referrals at the
+edge — including arrivals GA4 missed because of cookie consent, ad
+blockers, or analytics gaps.
+
+## Architecture
+
+Three tables, populated from server-log adapters:
+
+| Table | What's in it |
+|---|---|
+| `crawler_events_hourly` | One row per `(project, source, hour, bot, verification, path, status)` — bot crawls rolled up by hour |
+| `ai_referral_events_hourly` | One row per `(project, source, hour, product, source_domain, evidence_type, landing_path, status)` — click-through arrivals rolled up by hour |
+| `raw_event_samples` | Bounded forensic samples (≤100 per sync) for spot-checking |
+
+Each `traffic_sources` row is one server-log integration for a project.
+Today's only adapter is `cloud-run`; future adapters slot in by
+implementing the same contract.
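To make the rollup grain concrete, here is a minimal Python sketch of how raw crawler hits could collapse into one count per `(project, source, hour, bot, verification, path, status)`. The `CrawlerHit` shape, the `rollup_hourly` helper, and the choice of UTC hour buckets are assumptions made for illustration; they are not taken from canonry's adapter code.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CrawlerHit:
    """Hypothetical raw log event; fields mirror the rollup key above."""
    project: str
    source: str
    timestamp: datetime  # expected to be timezone-aware
    bot: str
    verification: str    # e.g. "verified" or "unverified"
    path: str
    status: int


def rollup_hourly(hits):
    """Count hits per (project, source, hour, bot, verification, path, status)."""
    counts = Counter()
    for hit in hits:
        # Truncate the timestamp to the start of its UTC hour (assumed bucket).
        hour = hit.timestamp.astimezone(timezone.utc).replace(
            minute=0, second=0, microsecond=0
        )
        key = (hit.project, hit.source, hour, hit.bot,
               hit.verification, hit.path, hit.status)
        counts[key] += 1
    return counts
```

Two hits on the same path within the same hour land in one row with a count of 2, while a hit in the next hour starts a new row.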
+
+## Connecting a Cloud Run source
+
+```bash
+# 1. Create a service account in the Cloud project that hosts the Cloud Run
+#    service. Grant it `roles/logging.viewer`. Download the JSON key.
+
+# 2. Connect from canonry CLI:
+canonry traffic connect cloud-run <project> \
+  --gcp-project <gcp-project-id> \
+  --service-account-key <path/to/key.json>
+
+# 3. (Optional) narrow to a specific service or location:
+canonry traffic connect cloud-run <project> \
+  --gcp-project <id> \
+  --service-account-key <path> \
+  --service my-service-name \
+  --location us-east1
+```
+
+Credentials are stored in `~/.canonry/config.yaml` (not the DB). The
+canonical key lives only on the host that runs `canonry serve`. The
+sync flow does NOT echo the private key back in any response.
+
+## Syncing data
+
+```bash
+# Manual sync — defaults to a 30-day lookback on the first run; subsequent
+# runs are clamped forward to lastSyncedAt to avoid re-pulling.
+canonry traffic sync <project> --source <id>
+
+# Override the lookback window (minutes):
+canonry traffic sync <project> --source <id> --since-minutes 4320   # 3 days
+```
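The window logic described in the comments above can be sketched in Python as follows. This is an illustration under stated assumptions: `sync_window_start` is a hypothetical helper, and the sketch assumes an explicit `--since-minutes` value is honored as given while only the default 30-day window is clamped to `lastSyncedAt`. How canonry combines the two is not stated in the source.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_LOOKBACK = timedelta(days=30)


def sync_window_start(now, last_synced_at=None, since_minutes=None):
    """Return the start of the log window to pull (hypothetical helper)."""
    if since_minutes is not None:
        # Assumption: an explicit override is used as given.
        return now - timedelta(minutes=since_minutes)
    start = now - DEFAULT_LOOKBACK
    if last_synced_at is not None:
        # Clamp forward so time that was already synced is not pulled again.
        start = max(start, last_synced_at)
    return start
```

On a first run (`last_synced_at` is `None`) the window starts 30 days back; on later runs it starts at the previous sync time whenever that is more recent.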
+
+Cross-sync dedupe via the `last_event_ids` ring buffer means re-running a
+sync over an overlapping window cannot double-count rolled-up hourly
+hits. Safe to schedule (see "Scheduling" below) or trigger from CI.
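A ring buffer of recently seen event IDs is a common way to get this kind of dedupe. The Python sketch below shows the general idea only; `EventIdRing`, `filter_new_events`, the `"id"` field, and the capacity are hypothetical and do not describe how canonry stores `last_event_ids`.

```python
from collections import deque


class EventIdRing:
    """Bounded memory of recently seen event IDs (illustrative sketch)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._order = deque()  # IDs in arrival order, oldest first
        self._seen = set()     # same IDs, for O(1) membership checks

    def add_if_new(self, event_id):
        """Record the ID and return True, or return False for a duplicate."""
        if event_id in self._seen:
            return False
        if len(self._order) == self._capacity:
            # Evict the oldest ID so the buffer stays bounded.
            self._seen.discard(self._order.popleft())
        self._order.append(event_id)
        self._seen.add(event_id)
        return True


def filter_new_events(events, ring):
    """Keep only events whose ID was not seen in an earlier sync."""
    return [event for event in events if ring.add_if_new(event["id"])]
```

In this sketch, an event that appears in two overlapping pulls is counted only the first time. An ID that has already been evicted from the buffer would be treated as new again, so the capacity has to cover the largest expected overlap.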
+
+## Inspecting source state
+
+```bash
+# All sources with last-24h totals + latest sync run (single-call):
+canonry traffic status <project> --format json
+
+# Just the source list:
+canonry traffic sources <project> --format json
+
+# Windowed events (defaults to last 24h):
+canonry traffic events <project> --kind crawler --limit 200 --format json
+canonry traffic events <project> --kind ai-referral --since 2026-04-01 --until 2026-04-30
+```
+
+The `traffic status` composite returns the same per-source detail
+(24h crawler hits, AI-referral arrivals, raw-event-sample count, latest
+sync-run summary) whether you reach it via the CLI, the API, or the
+MCP `canonry_traffic_status` tool.
+
+## Where the data shows up
+
+| Surface | What's rendered |
+|---|---|
+| Project dashboard `/projects/:name/activity` | Live source table + 24h totals + GA4 referrals (combined view) |
+| Top-level `/traffic` route | Cross-project source admin (connect, sync, archive) |
+| `canonry report <project>` (HTML + SPA) | "AI Visibility — Server-Side" section, ranked above Indexing Health |
+| `canonry doctor --project <name>` | `traffic.source.connected`, `recent-data`, `credentials`, `scopes` checks |
+| MCP toolkit `traffic` | Tools: `canonry_traffic_status`, `_sources_list`, `_source_get`, `_events`, `_connect_cloud_run`, `_sync` |
+
+## Doctor signals
+
+The doctor checks are adapter-agnostic. When they fail or warn:
+
+| Check | Code | What to do |
+|---|---|---|
+| `traffic.source.connected` | `traffic.source.none` | No source — `canonry traffic connect cloud-run …` |
+| `traffic.source.connected` | `traffic.source.all-errored` | Re-connect the source. The check's `details.lastError` shows the underlying reason. |
+| `traffic.source.recent-data` | `traffic.recent-data.stale` | Last sync was >7d ago. Run `canonry traffic sync …` or schedule a recurring sync. |
+| `traffic.source.recent-data` | `traffic.recent-data.empty` | Source connected but no data in 30d. Verify config and credentials with `canonry traffic sources <project>`. |
+| `traffic.source.credentials` | `traffic.credentials.resolve-failed` | Service-account key in `~/.canonry/config.yaml` is invalid or expired. Re-connect. |
+
+## Scheduling
+
+`canonry schedule` supports `--kind traffic-sync`. Recurring syncs are
+safe because of the `last_event_ids` cross-sync dedupe ring buffer
+described above. Recommended cadence:
+
+| Cadence | Use case |
+|---|---|
+| `0 */6 * * *` (every 6h) | Production agencies tracking active client sites |
+| `0 0 * * *` (daily) | Lower-traffic sites or local dev |
+| Manual only | First few weeks while validating data |
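As a quick sanity check on the cadences above, this Python sketch expands the hour field of a five-field cron expression into the hours at which a sync would fire. It handles only the three simple forms used here (`*`, `*/N`, and a single number), and `sync_hours` is a hypothetical helper rather than part of canonry.

```python
def sync_hours(cron_expression):
    """Return the hours (0-23) selected by the hour field of a cron expression."""
    fields = cron_expression.split()
    if len(fields) != 5:
        raise ValueError("expected a five-field cron expression")
    hour_field = fields[1]
    if hour_field == "*":
        return list(range(24))
    if hour_field.startswith("*/"):
        # A step value N selects every N-th hour starting from 0.
        return list(range(0, 24, int(hour_field[2:])))
    return [int(hour_field)]
```

For `0 */6 * * *` this gives four runs per day (00:00, 06:00, 12:00, 18:00), and for `0 0 * * *` a single run at midnight.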
+
+## Telemetry
+
+Every successful or failed sync emits a `traffic.synced` event to the
+canonry telemetry pipeline:
+
+```jsonc
+{
+  "event": "traffic.synced",
+  "errorCode": "PROVIDER_AUTH",       // present only when status='failed'
+  "properties": {
+    "status": "completed",            // "completed" or "failed"
+    "sourceType": "cloud-run",        // adapter type
+    "sourceId": "<uuid>",             // opaque
+    "pulledEvents": 234,
+    "crawlerHits": 200,
+    "aiReferralHits": 12,
+    "durationMs": 4150
+  }
+}
+```
+
+Counts are aggregate. The sourceId is an opaque UUID. No raw paths,
+domains, or PII are surfaced.
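For consumers of this event, a small typed parser can catch shape drift early. The Python sketch below assumes the payload arrives as a decoded JSON object with exactly the fields shown above; the `TrafficSyncedEvent` class, the `parse_traffic_synced` function, and the snake_case attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrafficSyncedEvent:
    status: str
    source_type: str
    source_id: str
    pulled_events: int
    crawler_hits: int
    ai_referral_hits: int
    duration_ms: int
    error_code: Optional[str] = None


def parse_traffic_synced(payload):
    """Validate a decoded traffic.synced payload and return a typed event."""
    if payload.get("event") != "traffic.synced":
        raise ValueError("not a traffic.synced event")
    props = payload["properties"]
    status = props["status"]
    if status not in ("completed", "failed"):
        raise ValueError(f"unexpected status: {status!r}")
    return TrafficSyncedEvent(
        status=status,
        source_type=props["sourceType"],
        source_id=props["sourceId"],
        pulled_events=props["pulledEvents"],
        crawler_hits=props["crawlerHits"],
        ai_referral_hits=props["aiReferralHits"],
        duration_ms=props["durationMs"],
        # errorCode is documented as present only on failed syncs.
        error_code=payload.get("errorCode") if status == "failed" else None,
    )
```

Because the event carries only aggregate counts and an opaque ID, the parsed object holds no paths or domains.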
+
+## Limits & caveats
+
+- **Path-level citation cross-reference is not implemented yet.** The
+  citation store is domain-grain (`query_snapshots.cited_domains`). A
+  future iteration that lands URL-grain citation evidence will extend
+  the `topCrawledPaths` entry with a `citationState` flag. Until then,
+  treat the report's crawled-paths table as "engine attention" — the
+  signal is the bot fetched it, not whether it was cited.
+- **Verified vs unverified.** The headline numbers count only
+  rDNS-verified hits. Unverified bots claim a known UA but couldn't be
+  cross-confirmed via reverse-DNS — they may be the real bot or an
+  imitator. Don't promote unverified counts in client-facing copy.
+- **Cloud Run only in v1.** WordPress plugin and other adapters are
+  planned. The doctor checks and the report renderer are already
+  adapter-agnostic — adding a new adapter is just a new entry in
+  `traffic_sources.source_type` and a `TrafficSourceValidator`
+  registration.
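To illustrate what reverse-DNS verification generally involves, here is a Python sketch of a forward-confirmed reverse DNS check: resolve the client IP to a hostname, check the hostname against an expected domain suffix, then resolve that hostname forward and confirm it maps back to the same IP. This shows the general technique, not canonry's verifier; `is_verified_bot` and the `crawler.example` suffix are hypothetical, and the resolver functions are injectable so the logic can be exercised without network access.

```python
import socket


def is_verified_bot(ip, allowed_suffixes,
                    reverse=socket.gethostbyaddr,
                    forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check (illustrative sketch, IPv4 only)."""
    try:
        hostname = reverse(ip)[0]
    except OSError:
        return False  # no PTR record or lookup failure
    host = hostname.rstrip(".").lower()
    # The PTR hostname must sit under one of the expected domains.
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return False
    try:
        addresses = forward(host)[2]
    except OSError:
        return False
    # The forward lookup must map back to the original IP.
    return ip in addresses
```

A request whose user agent names a known bot but whose IP fails this check would fall into the "unverified" bucket described above.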