@ainyc/canonry 4.72.3 → 4.72.4

@@ -66,8 +66,45 @@ sync flow does NOT echo the private key back in any response.
  ## Connecting a WordPress source
 
  The WordPress adapter pulls events from the **Canonry Traffic Logger**
- WordPress plugin, which captures every non-admin GET page-load and
- exposes a paginated REST endpoint protected by an Application Password.
+ WordPress plugin, which captures every non-admin GET page-load **that
+ reaches PHP** and exposes a paginated REST endpoint protected by an
+ Application Password.
+
+ > **Cache blind spot.** The plugin is a PHP hook, so it only sees
+ > requests that execute WordPress. A full-page cache (LiteSpeed, WP
+ > Rocket, W3 Total Cache, WP Super Cache) or CDN serves cached pages
+ > before PHP runs, so cache-served page views, including live AI
+ > user-fetches (Claude-User, ChatGPT-User), are NOT logged. Bot crawls
+ > of uncached endpoints (sitemap, feeds, assets, cache misses) still
+ > come through, which can make capture look healthy while real page
+ > views go uncounted. Exclude AI user-agents from the cache (and any
+ > CDN), or capture from access/edge logs instead. The
+ > `traffic.source.cache-blindspot` doctor check warns whenever a
+ > WordPress source is connected.
+
+ **Which user-agents to exclude from the cache** (one per line in
+ LiteSpeed's "Do Not Cache User Agents", WP Rocket's
+ `rocket_cache_reject_ua`, or W3TC / WP Super Cache "Rejected User
+ Agents"):
+
+ ```
+ Claude-User
+ ClaudeBot
+ ChatGPT-User
+ OAI-SearchBot
+ GPTBot
+ PerplexityBot
+ Perplexity-User
+ ```
+
+ These are the answer-engine fetchers in both live-user-fetch (`*-User`)
+ and crawler forms. Do NOT add `Googlebot` or `Bingbot`: caching helps
+ search crawlers (page speed is a ranking signal, and cached pages let
+ them crawl more per visit, which matters most on crawl-budget-starved
+ sites), and their crawl stats are already authoritative in GSC and Bing
+ Webmaster Tools. Rule of thumb: bypass cache only for agents you cannot
+ measure elsewhere and that gain nothing from being cached. Answer-engine
+ fetchers fit both; search crawlers fit neither.
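
The exclusion list added in this hunk can be checked against real User-Agent headers with a small matcher. The sketch below is illustrative only: `AI_FETCHER_TOKENS` and `shouldBypassCache` are hypothetical names that do not appear in the package, and the case-insensitive substring comparison is an assumption about how a cache plugin would match these entries, not documented behavior of LiteSpeed, WP Rocket, W3 Total Cache, or WP Super Cache.

```javascript
// Hypothetical helper, not part of canonry or of any cache plugin.
// The tokens are copied from the exclusion list in the hunk above.
const AI_FETCHER_TOKENS = [
  "Claude-User",
  "ClaudeBot",
  "ChatGPT-User",
  "OAI-SearchBot",
  "GPTBot",
  "PerplexityBot",
  "Perplexity-User",
];

// Returns true when the User-Agent header contains one of the tokens.
// Assumption: matching is a case-insensitive substring test.
function shouldBypassCache(userAgent) {
  if (typeof userAgent !== "string" || userAgent.length === 0) return false;
  const ua = userAgent.toLowerCase();
  return AI_FETCHER_TOKENS.some((token) => ua.includes(token.toLowerCase()));
}
```

A search-crawler header such as `Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)` contains none of the tokens, so it stays on the cached path, which matches the rule of thumb in the added text.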
 
  ```bash
  # 1. Install the plugin. Download the latest release zip from the
@@ -223,7 +260,7 @@ MCP `canonry_traffic_status` tool.
  | Project dashboard `/projects/:name/activity` | Live source table + 24h totals + GA4 referrals (combined view) |
  | Top-level `/traffic` route | Cross-project source admin (connect, sync, archive) |
  | `cnry report <project>` (HTML + SPA) | "AI Visibility — Server-Side" section, ranked above Indexing Health |
- | `cnry doctor --project <name>` | `traffic.source.connected`, `recent-data`, `credentials`, `scopes` checks |
+ | `cnry doctor --project <name>` | `traffic.source.connected`, `recent-data`, `credentials`, `scopes`, `cache-blindspot` checks |
  | MCP toolkit `traffic` | Tools: `canonry_traffic_status`, `_sources_list`, `_source_get`, `_events`, `_connect_cloud_run`, `_sync` |
 
  ## Doctor signals
@@ -237,6 +274,7 @@ The doctor checks are adapter-agnostic. When they fail or warn:
  | `traffic.source.recent-data` | `traffic.recent-data.stale` | Last sync was >7d ago. Run `cnry traffic sync …` or schedule a recurring sync. |
  | `traffic.source.recent-data` | `traffic.recent-data.empty` | Source connected but no data in 30d. Verify config and credentials with `cnry traffic sources <project>`. |
  | `traffic.source.credentials` | `traffic.credentials.resolve-failed` | Service-account key in `~/.canonry/config.yaml` is invalid or expired. Re-connect. |
+ | `traffic.source.cache-blindspot` | `traffic.cache-blindspot.wordpress-plugin` | A WordPress source is connected, so the plugin cannot see cache-served page views. Exclude AI user-agents from the page cache and any CDN, or switch to a log/edge source. Warns only, not a failure. |
 
  ## Scheduling
 
@@ -276,6 +314,21 @@ domains, or PII are surfaced.
 
  ## Limits & caveats
 
+ - **The WordPress plugin is blind to cache-served traffic.** The
+ `wordpress` adapter logs only requests that reach PHP. A full-page
+ cache or CDN serves cached pages from the edge, so cache-served page
+ views, including live AI user-fetches (Claude-User, ChatGPT-User),
+ never reach the plugin and go uncounted, even though bot crawls of
+ uncached endpoints (sitemap, assets) still appear. On a cached
+ WordPress site, treat the plugin's page-view counts as a floor, not a
+ total. Either exclude AI user-agents from the cache + CDN, or capture
+ cache-independent via a `cloud-run` / `vercel` / edge-log source. The
+ `traffic.source.cache-blindspot` doctor check surfaces this. Adapter
+ coverage differs: `vercel` ingests edge request-logs so cache hits are
+ captured (it records the `cache` HIT/MISS label), and `cloud-run` logs
+ every request that reaches the service, missing only what a CDN placed
+ in front of Cloud Run serves from its own edge cache. Only the
+ hook-based `wordpress` adapter has the always-present blind spot.
  - **Path-level citation cross-reference is not implemented yet.** The
  citation store is domain-grain (`query_snapshots.cited_domains`). A
  future iteration that lands URL-grain citation evidence will extend
@@ -30949,11 +30949,41 @@ var scopesCheck3 = {
      return summarizePerSourceResults("scopes", "scopes", results);
    }
  };
+ var cacheBlindSpotCheck = {
+   id: "traffic.source.cache-blindspot",
+   category: CheckCategories.integrations,
+   scope: CheckScopes.project,
+   title: "WordPress traffic cache blind spot",
+   run: (ctx) => {
+     if (!ctx.project) return skippedNoProject4();
+     const wpSources = loadProbes(ctx).filter(
+       (s) => s.sourceType === TrafficSourceTypes.wordpress
+     );
+     if (wpSources.length === 0) {
+       return {
+         status: CheckStatuses.skipped,
+         code: "traffic.cache-blindspot.no-wordpress-source",
+         summary: "No WordPress traffic source connected, so the plugin cache blind spot does not apply (log and edge adapters see cache-served requests)."
+       };
+     }
+     return {
+       status: CheckStatuses.warn,
+       code: "traffic.cache-blindspot.wordpress-plugin",
+       summary: `${wpSources.length} WordPress traffic source(s) capture via the Canonry Traffic Logger plugin, which only logs requests that execute PHP. A full-page cache (LiteSpeed, WP Rocket, W3 Total Cache, WP Super Cache) or CDN serves cached pages before PHP runs, so cache-served page views, including live AI user-fetches such as Claude-User and ChatGPT-User, are not captured. Bot crawls of uncached endpoints (sitemap, feeds, assets, cache misses) still appear, which can make capture look healthy while real page views go uncounted.`,
+       remediation: 'Exclude AI user-agents from the page cache so their requests reach PHP: LiteSpeed Cache has "Do Not Cache User Agents" under Cache > Excludes; WP Rocket uses the `rocket_cache_reject_ua` filter; W3 Total Cache and WP Super Cache have a "Rejected User Agents" box. Mirror the rule at any CDN in front. For cache-independent capture, ingest from server or edge access logs (a `cloud-run` or `vercel` source, or an edge worker) instead of the WordPress plugin.',
+       details: {
+         wordpressSourceCount: wpSources.length,
+         wordpressSourceIds: wpSources.map((s) => s.id)
+       }
+     };
+   }
+ };
  var TRAFFIC_SOURCE_CHECKS = [
    sourceConnectedCheck,
    recentDataCheck,
    credentialsCheck,
-   scopesCheck3
+   scopesCheck3,
+   cacheBlindSpotCheck
  ];
 
  // ../api-routes/src/doctor/checks/wordpress-publish.ts
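
The branching in the new `cacheBlindSpotCheck` can be exercised in isolation. The sketch below mirrors the two return paths shown in the hunk above, but it is a simplified stand-in and not the bundled code: `evaluateCacheBlindSpot` is a hypothetical name, the source list is passed in directly instead of coming from `loadProbes(ctx)`, the `summary` and `remediation` strings are omitted, and the string values given to `CheckStatuses` and `TrafficSourceTypes` are assumptions because the diff does not show their definitions.

```javascript
// Stand-ins for bundle internals. The real enum values are not visible
// in this diff, so these strings are assumptions.
const CheckStatuses = { skipped: "skipped", warn: "warn" };
const TrafficSourceTypes = { wordpress: "wordpress" };

// Hypothetical wrapper with the same branching as cacheBlindSpotCheck.run:
// skip when no WordPress source exists, otherwise always warn.
function evaluateCacheBlindSpot(sources) {
  const wpSources = sources.filter(
    (s) => s.sourceType === TrafficSourceTypes.wordpress
  );
  if (wpSources.length === 0) {
    return {
      status: CheckStatuses.skipped,
      code: "traffic.cache-blindspot.no-wordpress-source",
    };
  }
  return {
    status: CheckStatuses.warn,
    code: "traffic.cache-blindspot.wordpress-plugin",
    details: {
      wordpressSourceCount: wpSources.length,
      wordpressSourceIds: wpSources.map((s) => s.id),
    },
  };
}
```

The check has no passing state in this hunk: it warns for any project with at least one WordPress source, whether or not a page cache is actually present, and reports the affected source ids in `details`.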
@@ -95,7 +95,7 @@ import {
    runs,
    schedules,
    usageCounters
- } from "./chunk-HSX32G47.js";
+ } from "./chunk-HOKVBMOD.js";
  import {
    AGENT_MEMORY_VALUE_MAX_BYTES,
    AGENT_PROVIDER_IDS,
@@ -5620,7 +5620,7 @@ function readStoredGroundingSources(rawResponse) {
    return result;
  }
  async function backfillInsightsCommand(project, opts) {
-   const { IntelligenceService: IntelligenceService2 } = await import("./intelligence-service-ZW3ARLJT.js");
+   const { IntelligenceService: IntelligenceService2 } = await import("./intelligence-service-CSW4R4I7.js");
    const config = loadConfig();
    const db = createClient(config.database);
    migrate(db);
package/dist/cli.js CHANGED
@@ -27,7 +27,7 @@ import {
    setTelemetrySource,
    showFirstRunNotice,
    trackEvent
- } from "./chunk-SIB4NMEH.js";
+ } from "./chunk-SRBO33HB.js";
  import {
    CliError,
    EXIT_SYSTEM_ERROR,
@@ -52,7 +52,7 @@ import {
    projects,
    queries,
    renderReportHtml
- } from "./chunk-HSX32G47.js";
+ } from "./chunk-HOKVBMOD.js";
  import {
    CcReleaseSyncStatuses,
    CheckScopes,
package/dist/index.js CHANGED
@@ -1,10 +1,10 @@
  import {
    createServer
- } from "./chunk-SIB4NMEH.js";
+ } from "./chunk-SRBO33HB.js";
  import {
    loadConfig
  } from "./chunk-ZUBBADMR.js";
- import "./chunk-HSX32G47.js";
+ import "./chunk-HOKVBMOD.js";
  import "./chunk-JXFNERK4.js";
  export {
    createServer,
@@ -1,6 +1,6 @@
  import {
    IntelligenceService
- } from "./chunk-HSX32G47.js";
+ } from "./chunk-HOKVBMOD.js";
  import "./chunk-JXFNERK4.js";
  export {
    IntelligenceService
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@ainyc/canonry",
-   "version": "4.72.3",
+   "version": "4.72.4",
    "type": "module",
    "description": "Agent-first open-source AEO operating platform - track how answer engines cite your domain",
    "license": "FSL-1.1-ALv2",
@@ -63,21 +63,21 @@
      "tsup": "^8.5.1",
      "tsx": "^4.19.0",
      "@ainyc/canonry-api-client": "0.0.0",
-     "@ainyc/canonry-config": "0.0.0",
      "@ainyc/canonry-api-routes": "0.0.0",
+     "@ainyc/canonry-config": "0.0.0",
      "@ainyc/canonry-contracts": "0.0.0",
-     "@ainyc/canonry-db": "0.0.0",
+     "@ainyc/canonry-integration-bing": "0.0.0",
+     "@ainyc/canonry-integration-cloud-run": "0.0.0",
      "@ainyc/canonry-integration-commoncrawl": "0.0.0",
+     "@ainyc/canonry-db": "0.0.0",
      "@ainyc/canonry-integration-google": "0.0.0",
      "@ainyc/canonry-integration-google-business-profile": "0.0.0",
-     "@ainyc/canonry-integration-cloud-run": "0.0.0",
      "@ainyc/canonry-integration-google-places": "0.0.0",
-     "@ainyc/canonry-integration-bing": "0.0.0",
-     "@ainyc/canonry-integration-traffic": "0.0.0",
      "@ainyc/canonry-integration-wordpress": "0.0.0",
+     "@ainyc/canonry-integration-traffic": "0.0.0",
+     "@ainyc/canonry-intelligence": "0.0.0",
      "@ainyc/canonry-provider-cdp": "0.0.0",
      "@ainyc/canonry-provider-claude": "0.0.0",
-     "@ainyc/canonry-intelligence": "0.0.0",
      "@ainyc/canonry-provider-gemini": "0.0.0",
      "@ainyc/canonry-provider-local": "0.0.0",
      "@ainyc/canonry-provider-openai": "0.0.0",